May 16 2019

Training an AI to See Like a Human

The synergy between AI research and neuroscience is fascinating, and becoming more so. Knowledge of how organic brains function is informing our approaches to artificial intelligence, and research into AI is informing our understanding of neuroscience. I think this process will eventually lead to an artificial human brain, but it’s very hard to predict how long this will take.

Meanwhile, we are beginning to see the fruits of AI algorithms that employ neural nets or try to recapitulate organic learning in some way. A team at the University of Texas has published a paper presenting yet another example of this – teaching an AI to quickly gather visual information about its environment from a few “glimpses.”

Human perception evolved to be fast and efficient, to infer our environment from as little information as possible. The advantage of this is speed – there is obviously an advantage to perceiving something flying at your head, or a predator stalking you, as quickly as possible. Our brains use algorithms to make high-probability inferences, constructing our perception of the environment from tiny slices of information. This process works very well, and most of the time we perceive our environment accurately enough to move around and interact with it.

However, this system is not perfect. It occasionally makes incorrect inferences or assumptions and reconstructs reality wrongly. We experience such occasions as optical illusions (if we ever notice the incorrect construction; otherwise we persist in the wrong perception). This happens, for example, when our brains receive insufficient or ambiguous information. When observing objects in the sky, we may lack a reference by which to judge size and distance. Low light conditions may obscure much of what we see. Unusual objects or environments may challenge the algorithms’ assumptions.

Evolution has essentially given us a trade-off – we use the limited processing and perception of our brains to quickly construct a picture of reality, accepting that there will be flaws. We likely evolved an optimal balance between speed and accuracy, within the limits of our brain’s raw power. There is also another element in the mix: attention. We can shift our attention to adjust the balance between speed and accuracy, focusing closely on something and taking the time to analyze it carefully. Alternatively, we could be on the alert for anything unexpected, or we might be focused internally and paying little attention to the world. Our attention allocates our limited resources as the situation requires.

Computers face many similar challenges. They have limited processing power or communication speed, and programmers are always looking for ways to do more, faster, with less. I remember the early days of video streaming. It was rough. Then programmers realized they did not have to send every frame of video in its entirety. They only really had to send information about which pixels changed from the previous frame. This dramatically improved the smoothness and speed of streaming video, but also created some artifacts (weirdly shifting foregrounds and backgrounds). As the tech improved, these artifacts disappeared. Throughput also improved, and now video streaming is much better.
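The core idea of sending only what changed can be shown in a few lines. This is a minimal sketch of frame differencing under simplifying assumptions: frames are flat lists of integer grayscale pixel values, and `encode_delta` and `apply_delta` are hypothetical names chosen for illustration. Real codecs such as H.264 combine this kind of inter-frame prediction with motion compensation and lossy compression, whereas this toy version is lossless.

```python
from typing import List, Tuple

Frame = List[int]              # flattened grayscale pixels (hypothetical format)
Delta = List[Tuple[int, int]]  # (pixel index, new value) pairs


def encode_delta(prev: Frame, curr: Frame) -> Delta:
    """Record only the pixels whose value changed since the previous frame."""
    if len(prev) != len(curr):
        raise ValueError("frames must have the same size")
    return [(i, v) for i, (p, v) in enumerate(zip(prev, curr)) if p != v]


def apply_delta(prev: Frame, delta: Delta) -> Frame:
    """Rebuild the current frame from the previous frame plus the changes."""
    frame = list(prev)
    for i, v in delta:
        frame[i] = v
    return frame


# Usage: a 3x3 frame in which a single pixel changes
f0 = [0, 0, 0, 0, 255, 0, 0, 0, 0]
f1 = [0, 0, 0, 0, 255, 0, 0, 0, 128]
d = encode_delta(f0, f1)
assert apply_delta(f0, d) == f1
print(d)  # [(8, 128)]
```

Only one (index, value) pair is sent instead of nine pixels; the savings grow with frame size whenever most of the scene stays still.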

In similar fashion, the new study involves an AI algorithm whose purpose is to allow a computer to take snapshots of its environment and quickly infer its entire surroundings. Processing each image takes time, and gathering visual information about a 360 degree environment would be relatively slow. There may be situations, like a search and rescue, in which time is of the essence. The AI is first trained on thousands of images. Its task is to decide which snapshot to take next in order to infer its entire environment as quickly and accurately as possible. The authors write:

We propose a reinforcement learning solution, where the agent is rewarded for reducing its uncertainty about the unobserved portions of its environment. Specifically, the agent is trained to select a short sequence of glimpses, after which it must infer the appearance of its full environment.

This is similar to the video streaming approach – ignore information that is not of value or is not needed. There is a certain predictability to reality – if you glimpse the bottom of a column, it probably goes all the way up to the ceiling. Walls probably don’t suddenly stop for no reason. Part of a tree is probably connected to a whole tree.
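The reward the authors describe, reducing uncertainty about the unobserved parts of the environment, can be illustrated with a toy model. This sketch is not their method: it swaps their learned reinforcement-learning policy for a simple greedy rule, and the uncertainty model is a hypothetical assumption. The 360 degree view is split into a ring of equal views, each starting with an uncertainty of 1.0; glimpsing a view makes it fully known, and because neighbouring views are somewhat predictable (the column that continues to the ceiling), each of its two neighbours keeps only `1 - predictability` of its uncertainty.

```python
from typing import List, Tuple


def observe(u: List[float], i: int, predictability: float) -> List[float]:
    """Hypothetical model: glimpsing view i makes it certain and partly reveals
    its two neighbours on the ring (they keep 1 - predictability of their uncertainty)."""
    n = len(u)
    new = list(u)
    new[i] = 0.0
    for j in ((i - 1) % n, (i + 1) % n):
        new[j] *= 1.0 - predictability
    return new


def uncertainty_reduction(before: List[float], after: List[float]) -> float:
    """Reward signal: how much total uncertainty a glimpse removed."""
    return sum(before) - sum(after)


def greedy_glimpses(n_views: int, budget: int,
                    predictability: float = 0.5) -> Tuple[List[int], List[float]]:
    """Pick `budget` glimpses, each time taking the one with the largest reward."""
    if n_views < 3:
        raise ValueError("need at least 3 views so each view has two distinct neighbours")
    u = [1.0] * n_views  # every view starts fully unknown
    chosen: List[int] = []
    for _ in range(budget):
        # max() returns the first view among ties, so the choice is deterministic
        best = max(range(n_views),
                   key=lambda i: uncertainty_reduction(u, observe(u, i, predictability)))
        u = observe(u, best, predictability)
        chosen.append(best)
    return chosen, u


chosen, remaining = greedy_glimpses(n_views=8, budget=3)
print(chosen)          # [0, 3, 5]
print(sum(remaining))  # 2.25 (down from 8.0)
```

Notice that the greedy rule spreads its glimpses out rather than looking at adjacent views, because a view next to one already seen is partly predictable and therefore worth less. A trained agent would learn which glimpses pay off from its training images rather than from a hand-written rule like `observe`, but the goal is the same: each glimpse should buy as much information as possible.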

As such algorithms “evolve” (in the technology sense), they will likely optimize the balance of speed and accuracy. I can also imagine that, like the human brain, the algorithm may eventually include a process for adjusting the algorithm itself based on the current situation. If the environment is more chaotic, for example, it may need to slow down and improve accuracy. A demolished building is likely much less predictable than an intact one. The AI may also need to allocate its “attention.” If it sees something of interest (like a wounded victim), it will focus on it, slow down, and gather detailed information.
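One way to picture this kind of self-adjustment is a stopping rule: keep taking glimpses until the remaining uncertainty falls below a tolerance, so that a less predictable scene automatically costs more glimpses. This is a hypothetical sketch, not something described in the study. It reuses a simple assumed model in which the views form a ring, a glimpsed view becomes certain, and its two neighbours keep `1 - predictability` of their uncertainty; a value of 0.0 stands in for a chaotic scene such as a demolished building, where seeing one spot says nothing about the next.

```python
from typing import List


def glimpse(u: List[float], i: int, predictability: float) -> List[float]:
    """Hypothetical model: view i becomes certain; its two ring neighbours
    keep (1 - predictability) of their uncertainty."""
    n = len(u)
    new = list(u)
    new[i] = 0.0
    for j in ((i - 1) % n, (i + 1) % n):
        new[j] *= 1.0 - predictability
    return new


def glimpses_needed(n_views: int, predictability: float, tolerance: float) -> int:
    """Greedily glimpse until mean uncertainty is at or below `tolerance`."""
    if n_views < 3:
        raise ValueError("need at least 3 views")
    u = [1.0] * n_views
    count = 0
    while sum(u) / n_views > tolerance and count < n_views:
        # choose the glimpse that removes the most total uncertainty
        best = max(range(n_views),
                   key=lambda i: sum(u) - sum(glimpse(u, i, predictability)))
        u = glimpse(u, best, predictability)
        count += 1
    return count


print(glimpses_needed(8, predictability=0.5, tolerance=0.3))  # 3: orderly scene
print(glimpses_needed(8, predictability=0.0, tolerance=0.3))  # 6: chaotic scene
```

The same accuracy target takes twice as many glimpses in the unpredictable scene, which is the speed-for-accuracy trade the paragraph describes.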

The really cool thing is that AIs are trying to do the same things that organic brains do, and are facing the same limitations and trade-offs. The solutions that evolution stumbled upon are likely to be useful in the development of AI as well.

One critical difference, however, is that the pace of technological advancement is much faster than evolution. We could not simply evolve a brain with 10 times the processing power quickly (well, we did, but it took millions of years). But our AIs may become 10 times faster within a generation. It’s always better to be more efficient than less efficient, but improvements in raw power may obviate the need for some trade-offs. Future AIs may be able to process their entire 360 degree environment in detail so quickly that we don’t have to sacrifice accuracy for speed.

But as we solve the problems of using AI to navigate and interact with the real world, to learn and to problem-solve, we are increasingly finding and using solutions similar to those that evolved naturally. It’s a fascinating process, and I think it will lead to a greater and deeper understanding of the nature of intelligence itself.

 
