Mar 22 2021

Breaking Through the Uncanny Valley

In 1970 robotics professor Masahiro Mori described what he called “Bukimi no tani genshō,” which was later translated as “uncanny valley.” The term refers to an observed phenomenon, first noted in robots but also applicable to digital recreations: the more human-like a robot is, the greater the emotional affinity people feel toward it. However, as the imitation approaches complete realism, affinity takes a sharp dip. People become uneasy and even repulsed by the not-quite-human face, and affinity rises again only as perfection is achieved. That dip in emotional affinity for near-human imitation is the uncanny valley. Both roboticists and digital artists have been trying to break through that valley ever since it was identified.

Perhaps the most notorious example of this phenomenon in modern popular culture is the CG portrayal of Tom Hanks in the movie The Polar Express. There is something dead about his eyes, which gives him the eerie appearance of an animated corpse. Even the most advanced current CG does not quite break through, but it’s getting damn close.

There are two neurological phenomena at work here. The first, as I have discussed before, is agency detection. Our brains divide the world into things with agency (the ability to act of their own will) and things without. Our brains are then wired to connect our perception of things with agency to the limbic system and assign some emotional significance to them. This is why we feel something about our pets but not about a rock. Of course this is an oversimplification, because our brains are massively parallel processors with many circuits all working at the same time. But this is definitely a fundamental component that explains a lot about our reaction to certain things. This concept has even been pushed to its limits: in 1944 researchers made a short animated film of simple two-dimensional shapes interacting with each other on a screen. Subjects spontaneously imbued these shapes with agency and provided elaborate interpretations of their actions and motivations. (A triangle is about as far from the uncanny valley, in the other direction, as you can get.)

Neurologically, then, we can see why we have no problem identifying with cartoon characters, responding to their personalities, and even getting invested in their stories. Added to this is the brain wiring behind seeing faces and reading emotion. This too can be stripped down to basic components, which is why we can interpret emoji. Cartoonists learn how to convey a range of emotions with a few lines. This in turn is partly due to the fact that we have a large area of our visual cortex dedicated to processing information about human faces. Even infants will prefer to look at a human face over other stimuli. We also have a tendency to interpret visual stimuli as faces, which is why we see faces everywhere, such as in low-res images from the surface of Mars or in the bark of a tree.

But this hypersensitivity to human faces and emotional expressions is a double-edged sword for animators and roboticists. It means they can achieve their goal of conveying agency and human emotion with a minimalist approach, and the effect only gets stronger as their creations become more human-like. But there’s a limit. Think of the style of many modern animated movies, like Brave. Animators have figured out how to create high-quality, life-like human characters, but there is a deliberately cartoon-like aspect to them that stops well short of the uncanny valley.

Step over that line, however, and you enter the uncanny valley. This too emerges from the fact that our visual cortex is so attentive to tiny facial details, especially movement and emotion. Neuroscientists are not exactly sure why we react to the uncanny valley the way we do, but there are some hypotheses. Perhaps it triggers a sense that the person is ill, injured, malformed, or even dead, which might provoke a self-protective disgust response to avoid contamination. It may also trigger a sense of deception or ill intent: just as we can respond negatively to a fake smile because it does not feel genuine, a not-quite-human face may strike us as false.

Regardless of the emotional circuitry, the underlying problem for animators and roboticists is that the human brain is exquisitely sensitive to the human face and its normal movement and expressions. How the mouth moves when people speak seems to be a special area of focus.

CG animators are getting pretty close to breaking through the uncanny valley, but I still have not seen an example where I could not immediately tell it was CG. Roboticists are having a much harder time, because they have to physically duplicate the movement of all the complex facial muscles and the overlying skin. Of course they can abandon this strategy and use “robot faces” that do not try to imitate humans but are still fully capable of triggering emotional recognition. That will likely be a practical strategy for a while. Nevertheless, the goal remains to produce human-like robots with full human facial expressiveness that people respond to positively and that do not trigger the uncanny valley.

Progress here is happening, although it’s hard to predict when success will be achieved. Researchers at Osaka University, for example, are using motion capture to compare the movement of android faces and human faces. Their goal is to identify the specific differences between the two, to guide further development of more human-like robot faces. They identified, for example, that when human faces express emotions the skin forms curved flow lines, while android faces form straight flow lines. There are also differences in the way the skin undulates as it moves. Identifying the tiny details that separate convincing human faces from the uncanny valley is the first step, but then there remains the substantial engineering problem of making those differences go away.

While it appears we are getting very close, I do wonder how long it will take for CG and robots to break through the uncanny valley. Close is not good enough, and the human brain is simply a superb detector and analyzer of human faces. I suspect there will be a long tail, where the differences become more and more subtle but take a long time to disappear completely. We will probably find out for CG this decade, and for robots in the following decade or two.