May 14, 2018

AI Accurately Mimicking Humans

Recently Google pushed the envelope a bit further with its Duplex chat bot, an artificially intelligent (AI) system designed to mimic natural-sounding human speech and interaction. Take a listen to the conversations at the link above; they are pretty convincing.

I do think that, knowing ahead of time I was listening to a bot, I could sense the computer algorithm at work rather than a real person. However, I do wonder if I would have detected it had I been blinded to whether the conversation was with a bot, and especially if I wasn’t even alerted to the possibility.

It seems that every time there is an incremental advance toward more human-looking or human-acting robots or software, it sparks a conversation about the implications of the technology. Google Duplex is no exception. Not long after Google announced and demonstrated the software, the company had to promise that Duplex would always warn people when they are speaking with a bot. That is interesting, because it may defeat the whole point.

Google says it developed the technology so that people will feel at ease when doing business with AI, because the conversation will feel natural. However, (at least some) people feel creeped out instead. People may feel deceived, even violated, if they discover that what they thought was a person on the other end was in fact a machine.

That is an interesting psychological question in itself, one that will likely become increasingly relevant as our world fills with AI and robots.

Google Duplex

A bit of background on the technology itself – Google says that Duplex is the result of deep learning using a recurrent neural network. They also limited the scope of what Duplex was trained to do – just make an appointment.
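To make that slightly less abstract, here is a minimal sketch of what a recurrent network looks like in code – a model that reads an utterance one token at a time while carrying a hidden state forward, which is what lets it use earlier context. This is purely illustrative (written in PyTorch, with made-up dimensions and intent labels); Google has not published Duplex’s actual architecture.

```python
# A minimal sketch of a recurrent network for utterance understanding.
# Purely illustrative: Google has not published Duplex's architecture,
# and every dimension and label below is a made-up assumption.
import torch
import torch.nn as nn

class UtteranceRNN(nn.Module):
    def __init__(self, vocab_size=5000, embed_dim=64,
                 hidden_dim=128, num_intents=4):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)  # token ids -> vectors
        self.rnn = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.classify = nn.Linear(hidden_dim, num_intents)

    def forward(self, token_ids):
        x = self.embed(token_ids)         # (batch, seq_len, embed_dim)
        _, hidden = self.rnn(x)           # hidden state carries context forward
        return self.classify(hidden[-1])  # scores over a small set of intents,
                                          # e.g. confirm / decline / ask / other

# The narrow scope -- "just make an appointment" -- keeps the space of
# things the model must understand small, which is what makes it tractable.
model = UtteranceRNN()
utterance = torch.randint(0, 5000, (1, 12))  # one fake 12-token utterance
print(model(utterance).shape)                # torch.Size([1, 4])
```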

So, to be clear, Duplex is not self-aware. It is not the full artificial intelligence of science fiction. It is a sophisticated chat bot limited to a specific function.

The two technological challenges Google had to overcome were creating speech that sounds realistic, and understanding what a person is saying when they think they are talking to another person. When people know they are talking to a computer program, they speak more slowly and clearly. When they assume they are talking to another person, they speak faster, leave out words and depend on context, and make many corrections (often mid-sentence, with false starts and repetitions).

In one of the examples, Duplex calls an Asian restaurant and has to deal with poor sound quality, background noise, and a heavy accent. The person on the other end also spoke only broken English and often misunderstood what Duplex said, causing confusion. Duplex handled the whole thing remarkably well. (Perhaps we are seeing only cherry-picked examples, but still.)

Duplex, in turn, has to produce speech of its own in a natural-sounding way. This includes adding “ums” and pauses, some unnecessary words, and natural inflections. Humans have very good ears for human speech. A lot of our gray matter is dedicated to processing this information, so we can pick up on the slightest emotion in another’s speech, for example.
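As a toy illustration of that idea – and this is only my own sketch, not how Duplex actually decides where hesitations go – disfluency injection can be as simple as sprinkling fillers and pause markers into clean text before it is handed to a speech synthesizer:

```python
import random

# Toy disfluency injection: sprinkle fillers and pauses into clean text
# before speech synthesis. My own illustration of the general idea only;
# Google has not published how Duplex places its "ums" and pauses.
FILLERS = ["um,", "uh,"]

def add_disfluencies(text, filler_prob=0.15, pause_prob=0.1, seed=None):
    rng = random.Random(seed)
    out = []
    for word in text.split():
        if rng.random() < filler_prob:
            out.append(rng.choice(FILLERS))  # hesitate before a word
        out.append(word)
        if rng.random() < pause_prob:
            out.append("<pause>")            # marker a TTS engine could honor
    return " ".join(out)

print(add_disfluencies("I would like to book a table for four on Thursday"))
# e.g. "I would like to um, book a table <pause> for four on Thursday"
```

A real system would presumably place these based on syntax and prosody rather than random chance, but the effect on the listener is the same kind of engineered naturalness.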

Our sensitivity to speech is similar to the situation with human faces – we have a lot of cortex dedicated to processing human faces and the emotions and other information they convey. Simulating a human face closely, but not closely enough, results in the “uncanny valley.” The result can be extremely creepy because it looks almost human, but the subtle differences make the figure look like an animated corpse, or simply unnatural – hence the creep factor.

Duplex also has to mimic the human mind, at least within its very limited context. It needs to respond to what the other person is saying in a realistic way, which can involve layers of context. What if the person tells a joke, or peppers the conversation with small talk? That is where I find chat bots tend to break down – with the unexpected. They fall back on pat phrases, which sort of work, but not really, because it’s obvious they are not responding specifically to what the other person said. Often such responses are a clue that the other party is not really listening.

These kinds of responses create “awkward” moments. The human on the other end may interpret them as the person having a hard time hearing, or as the person being socially awkward. But that hardly accomplishes Google’s stated goal of making people feel comfortable, which is why they worked hard to avoid such moments.
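For contrast, here is a deliberately crude sketch of where those pat-phrase moments come from: many simple chat bots just match keywords against a few scripted intents and fall back to a generic response for everything else. (Again, my own toy example, not Duplex’s actual logic – the whole point of Duplex is to do better than this.)

```python
# A deliberately crude chat bot, to show where pat-phrase fallbacks
# come from. My own toy example, not Duplex's actual logic.
RESPONSES = {
    "appointment": "Sure, what day works for you?",
    "hours": "We are open nine to five.",
    "price": "A standard cut is thirty dollars.",
}
FALLBACK = "Okay, great. Anything else I can help with?"  # the pat phrase

def reply(utterance):
    for keyword, response in RESPONSES.items():
        if keyword in utterance.lower():
            return response
    return FALLBACK  # fires on jokes, small talk, anything unexpected

print(reply("Can I make an appointment?"))  # matches a script
print(reply("Ha, busy day today, huh?"))    # -> the giveaway pat phrase
```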

From a technological point of view, software and robot developers are making incredible progress in simulating humans. We are not quite there yet, but we are getting very close in some respects. We can see what’s coming. With Duplex (again, in a very limited context) we may have crossed a line where people will be interacting with software and not realize it.

This raises the big question, which Google had to quickly address – what are the ethics involved? Is it wrong to “deceive” a person into thinking they are talking to a human when they are talking to software? I put the word “deceive” in quotes, because that does not appear to be the purpose (if we take Google at their word). Google states they want to make people feel comfortable when dealing with automated services. They are not trying to manipulate them for any nefarious or selfish end.

As the BBC reports:

The demo was called “horrifying” by Zeynep Tufekci, an associate professor at the University of North Carolina who regularly comments on the ways technology and society impact on each other.

In a tweet, Prof Tufekci said the idea of mimicking human speech so realistically was “horrible and obviously wrong”.

In a later comment, she added: “This is straight up, deliberate deception. Not okay.”

I don’t think it is so obvious, or that the point is deception. This strikes me as a typical reaction to any new technology that breaks new ground. It is the same type of reaction we saw with in vitro fertilization, GMOs, and implanting animal parts. We recoil at the “unnatural.”

But while I don’t think the answer is obvious, there is a legitimate ethical question here – do people have the right to be informed up front when dealing with something (software or robot) meant to mimic a human to such a degree that its true nature may not be apparent?

I think it’s reasonable to proceed initially with caution, and there doesn’t seem to be a downside to requiring full disclosure at this point. It is an interesting thought experiment – how would I react to having an interaction with a bot and not knowing it? I suppose if everything went smoothly, I wouldn’t care.

But what if the interaction is a little frustrating, and I am being patient with what I think is a person, when it turns out I have been dealing with inefficient code? Will I feel cheated, having wasted my time and courtesy on inanimate, unaware software?

Robocalls are also off-putting partly because I feel like I am being asked to give a little of my time, but no one is giving their time on the other end. A machine can call thousands of people hoping to get a hit, and then expect me to spend my precious time dealing with the result. If you are going to waste my time, I want to feel there is a person on the other end wasting their time.

Also, if I know I am dealing with a computer, fine, but let’s get this over with as efficiently as possible. That’s the point, right? I might feel cheated if I spent my time being patient and courteous for no reason. I was deprived of the opportunity to optimize my end of the interaction for a machine.

Finally – are people simply bothered because they invested a little bit of human emotion in a machine under false pretenses? It does feel like it breaks an unspoken social contract of some sort.

This is mostly speculation on my part, but these are possible concerns that come to mind. This will have to be an active area of psychological research as this technology develops.

I also think that it is very likely that as this technology gets incorporated more into our society, people will adapt. Our feelings will change. Everything I wrote above may seem hopelessly quaint in 20-30 years. It will be interesting to see how this technology plays out in the future.

For now, there is an undeniable creep factor involved in dealing with chat bots that are realistic enough to fool you into thinking you are talking with a live person.
