Dec 23 2010

The Uncanny Valley

I recently saw the movie Tron – the sequel that just came out last week. Not a bad sequel for a Disney movie – the plot and characters were a bit thin, but there was some nice eye-candy. The movie did try one thing that few have before it – it tried to accurately simulate a realistic person in CG. They needed a young Jeff Bridges so they created him in CG. It was pretty good, but just slightly off. I especially noticed it around the mouth when he spoke – it was creepy.

Despite advances in computer graphics and animation, it is not yet possible to create a convincing human. That’s partly why so many CG movies feature bugs, toys, robots, and dragons – they look seamless. But the human ability to discriminate human expression is remarkable, and so subtle that even the slightest imperfections are noticeable and tend to provoke an emotional response.

This phenomenon has been dubbed the “uncanny valley.” The term refers to a map of emotional acceptance on the vertical axis and accuracy of human simulation on the horizontal axis. As simulations appear more human we tend to accept them more, until they get close to realistic but not quite – then acceptance plummets into the uncanny valley, where it turns into revulsion.

This is all still a bit speculative, but accords well with subjective experience. The idea is that we recognize cartoons as fake. The big eyes and pouting expression can still evoke an emotional response – if things act like agents we tend to treat them as if they were alive. But we know that they are not human because they are cartoony. When a simulation gets close enough to human, however, being a bit off provokes revulsion. The speculation is that this triggers evolved revulsion against corpses and unhealthy or disfigured people.

For this reason many modern CG movies, if they portray people, stay clear of the uncanny valley and look deliberately cartoony.

One recent exception was The Polar Express, starring Tom Hanks. This movie was smack in the middle of the uncanny valley. Many viewers also noticed a closely related aspect of the CG being slightly off – the characters often had a “dead” expression on their faces. This is a specific aspect of the human ability to discriminate faces – there is something about a human face that looks alive. We know when the lights are on and somebody is home.

Recently researchers have explored this phenomenon. They took realistic dolls and then found volunteers who looked similar to the dolls. They then used computer software to morph images of the dolls and the people, so there was a continuum of images between the two (you can see a couple of examples at the link above). Then they had subjects look at the morphing video and determine the point at which the resulting image looked “alive.” They found that this turning point was about two-thirds of the way from the doll image to the human image. They also found that the eyes were the most important feature in determining “aliveness.”

Again – this fits with subjective experience. It’s interesting to look at the morphing videos – to see a person’s face slowly become “dead” and mask-like.

The deeper context of all of this is that our brains are hardwired to pay close attention to what is an agent (an actor in the world capable of motivation), what is human, what is healthy, and what is alive. Our visual processing is organized on these concepts – we actually process visual information for agents along different visual pathways than for non-agents or inanimate objects. This visual processing is also tied to the emotional centers in our brain, which is why we can have powerful emotional responses to purely visual stimuli.

What I find interesting is that, according to our hardwiring, the concept of “agent” does not have 100% overlap with the concept of “alive.” It makes sense it would not completely overlap with “human” – as there can be important non-human agents in the world, like that predator who is about to eat us. But the rules for determining agency are not tied to the rules for determining what is alive. Agents are anything that seems to have a will of their own, even if they do not possess any other features of being alive. This is probably why we can emotionally assign agency to things that are clearly non-living, like toys and cartoons. This may also be why we feel glitchy software is deliberately trying to frustrate us, for example.

The even deeper neurological context here is that our brains, in many respects, are organized thematically, in terms of how they process and assign importance to information. Far from being passive recorders of information, our brains subconsciously make many choices about which information is important and what it means. Emotions are our automatic reactions to this subconscious processing. Therefore it must have been evolutionarily important for our ancestors to automatically have a feel for which things in their environment were likely to be agents (erring hugely on the side of false positives), which things were alive, and which things were tainted and should be avoided.

The reaction of revulsion to the dead eyes of Tom Hanks in The Polar Express likely results from this hardwiring.

Finally, I wonder how long it will take before CG is advanced enough that it can represent a realistic human that can fool the average human. My sense is that we are still off in level of detail by an order of magnitude, at least. Getting the mouth to move realistically will probably be the last hurdle. Until then we are stuck in the uncanny valley.

17 Responses to “The Uncanny Valley”

  1. rafal on 23 Dec 2010 at 10:04 am

    The problem with realistic facial expressions will probably be overcome with simply casting real actors and motion capturing their facial expressions with increasing accuracy.

    In a sense that’s bypassing the issue, since you literally copy the facial movement of a real person.

    An interesting example is the upcoming game L.A. Noire which uses such technology. It’s not quite perfected yet, and obviously real time rendering limits how realistic the models are, but I think it may be more balanced when it comes to the visuals and movements. It seems that balance is very important.

    Video showing the technology: http://www.youtube.com/watch?v=q2EG5J05048

  2. mkimble1 on 23 Dec 2010 at 10:09 am

    Reminds me of a quote from John Singer Sargent. “A portrait is a painting of a person where there’s something wrong with the mouth.”

  3. tmac57 on 23 Dec 2010 at 10:28 am

    I thought that Avatar very successfully overcame this problem.
    Here is a short clip where James Cameron discusses the ‘uncanny valley’ problem: http://www.youtube.com/watch?v=1wK1Ixr-UmM

  4. superdave on 23 Dec 2010 at 10:29 am

    What did you think of the Na’vi in Avatar? They were motion-captured people and I thought they did a great job. At times I felt more like I was watching a human with a painted face than a CGI image.

    Also, I wonder how a person with Autism or face blindness would perform on that doll morph test.

  5. Steven Novella on 23 Dec 2010 at 11:02 am

    The Na’vi were good – but the problem was solved by having them be not human. There were still subtle problems with the CG, but it was not distracting because the characters were not human.

  6. NinjaChurch on 23 Dec 2010 at 12:53 pm

    The facial animations for CLU in Tron were done with motion capture. Based on some videos I’ve seen, the L.A. Noire game looks like it uses some more advanced motion capturing techniques. I’d be interested to see how far we can go and if we’ll be able to animate human faces without motion capture from actors some day.

  7. HHC on 23 Dec 2010 at 12:53 pm

    To what degree is culture active in creating a revulsion to things which lack agency? For example, when the Taliban controlled Kabul, they forbade the possession of stuffed animals or dolls by Afghan children. They would terrorize the children by going from house to house ripping the limbs off toys.

  8. SourDour on 23 Dec 2010 at 3:09 pm

    This is pure conjecture on my part, but this is how I see it:

    When a person opens his or her mouth wide, the entire face reacts. Little wrinkles appear around the eyes and so on. If we see a person with the mouth wide open (or any other extreme facial expression) without the secondary telltale signs, we know immediately something’s off. CG characters tend to have skins that don’t react realistically to either deformations or light, which I think is the underlying problem. Or at least one of the underlying problems.

  9. SRFWP1 on 23 Dec 2010 at 4:03 pm

    Steve,

    While watching TRON, I just imagined that the slightly unreal look Jeff Bridges had was due to his actually being (canon-wise) a CG being in the grid, and not due to not-quite-there special effects. It did add a bit to his non-human nature.

    And how can you talk about the Uncanny Valley without mentioning the trailer before TRON, Mars Needs Moms? That trailer marks the first time I’ve ever felt “weird” watching CG characters. It’s pretty freaky, maybe even more so in 3D. And funnily enough, made by the Polar Express studio…

    And on the topic of eyes, even in videogames developers spend a large amount of effort getting them right, from the reflections off the wet outer layer, to mimicking saccades and gaze behaviour.

    -SRFWP

  10. daedalus2u on 23 Dec 2010 at 7:21 pm

    I have written about how I think the revulsion of the uncanny valley triggers xenophobia.

    http://daedalus2u.blogspot.com/2010/03/physiology-behind-xenophobia.html

    My hypothesis is that when two people meet, they instinctively do a version of a Turing Test, where they try to communicate with the other person and see if they are “human enough” to communicate with, to understand, and maybe close enough genetically to potentially mate with. If the “error rate” in the communication is too high, then the uncanny valley is triggered and that triggers xenophobia.

    This level of detail can’t be coded for genetically; it has to be generated during neurodevelopment. I suspect that the detail is generated during language acquisition, and that during language acquisition, individuals acquire “details” of communication, accents, use of idioms, intonations, vocabulary, word use and many other things that “label” them as having acquired a first language during a specific cultural moment. That is why you can usually tell how old someone is by listening to their language. I think that when the ability to acquire a first language is lost, the “communication protocols” are to some extent “frozen”. (This is a tremendous simplification of an enormously complex and idiosyncratic process.)

    I don’t think that the uncanny valley can be overcome simply with better computer animation. I think that it is fundamentally the communication protocols that are sensed and not simply human-like movements. Humans from a different culture trigger xenophobia. Presumably there is physiology behind that triggering of xenophobia. Humans can get over their feelings of xenophobia. I think that this happens by learning the “communication protocols” of “the other” and so they are not “the other” any more because what they are communicating can now be understood.

    It is not possible to simply mimic communication by mimicking communication protocols; mimicked communication is easily recognized as gibberish. Gibberish in body language will always be recognized as such.

    This is why, in The Voyage Home, Kirk and Spock had to go back in time to actually acquire humpback whales: the sounds could be duplicated artificially but not the information content.

    However, if a generation of humans grows up accepting computer generated features as “human enough”, then they probably will not invoke xenophobia in such people. This may not be so much due to improvements in animation, but due to changes in people becoming more accustomed to it.

    I think this is very similar to how the children of bigots don’t always grow up to be bigots, so long as they are exposed to the objects of their parents’ bigotry as children. This is one reason why I think it is so important to have integrated multi-cultural schools and for everyone to be mainstreamed into those schools.

  11. daedalus2u on 23 Dec 2010 at 10:42 pm

    Just to add to my previous comment: that is also why I completely reject the idea that women and girls should be allowed to cover themselves. My hypothesis is that when women and girls are covered, men and boys do not learn to read the body language of women and girls, and so women and girls trigger xenophobia in such men. Such men can then treat women as if they are non-human, because if someone does trigger xenophobia in you, then you feel in your heart of hearts as if they are non-human. Not everyone can overcome their feelings and treat objects they perceive to be non-human as human beings with full human rights.

  12. Steven Novella on 24 Dec 2010 at 2:27 pm

    SRFWP – I thought of that, but other characters in the tron world did not look CG, and young Jeff Bridges in the real world was also CG, so that argument breaks down.

  13. Draal on 24 Dec 2010 at 4:20 pm

    “The term refers to a map of emotional acceptance on the vertical access and accuracy of human simulation on the horizontal access.”

    Misspelled axis.

  14. Draal on 24 Dec 2010 at 4:26 pm

    Daedalus,
    The ability to tell someone’s age by their voice depends greatly on hormone levels like testosterone.
    http://answers.google.com/answers/threadview/id/276894.html

  15. Draal on 24 Dec 2010 at 4:33 pm

    One possible way to improve CG is to understand the sex appeal of human faces.
    http://dsc.discovery.com/videos/science-of-sex-appeal-attractive-facial-features.html

  16. BillyJoe7 on 25 Dec 2010 at 2:52 am

    daedalus,

    “I completely reject the idea that women and girls should be allowed to cover themselves. My hypothesis is that when women and girls are covered, that men and boys do not learn [to] read the body language of women and girls, and so women and girls trigger xenophobia in such men, and so such men can then treat women as if they are non-human”

    The intermediary here is emotional connection. If you are not emotionally connected to someone, it is possible to treat them as non-human, as objects. The burqa acts as a block to achieving an emotional connection with its wearer. It is also the reason why it is much easier to hate [insert ethnic group] when you haven’t established an emotional connection with an individual of that ethnic group. Which gets back to the desirability of all schools being multicultural.

    (I went through secondary school with a group of friends who were Australian, Dutch, Italian, Polish, and Chinese. It was only much later on that I realised that we were a group of such diverse ethnic backgrounds.)

  17. daedalus2u on 25 Dec 2010 at 8:56 am

    BillyJoe, it is more fundamental than that. When boys only observe girls and women wearing burqas, they do not develop the neuroanatomy to be able to acquire an emotional connection to girls and women.

    Your experience illustrates my point. You didn’t realize at the time that the group was ethnically diverse because at that age you were still acquiring the neuroanatomy that differentiates who is “the other”.

    Unless segregation is enforced during those formative periods, people will acquire the neuroanatomy to recognize other ethnic groups as fully human and xenophobia won’t be triggered.
