Someone across the room throws you a ball and you catch it. Simple, right?
Actually, this is one of the most complex processes we have ever attempted to comprehend, let alone recreate. Inventing a machine that sees like we do is a deceptively difficult task, not just because it's hard to make computers do it, but because we are not entirely sure how we do it in the first place.
What actually
happens is this:
The image of the ball passes through your eye and strikes your retina, which does some elementary analysis and sends
it along to the brain, where the visual cortex more thoroughly analyses the
image. It then sends it out to the rest of the cortex, which compares it to
everything it already knows, classifies the objects and dimensions, and finally
decides on something to do: raise your hand and catch the ball.
This takes place in a tiny fraction of a second, with almost no
conscious effort, and almost never fails. So recreating human vision is not
just a hard problem, it is a set of them, each of which relies on the others.
Well, no one ever
said this would be easy. Except, perhaps, AI pioneer Marvin Minsky, who
famously instructed a graduate student in 1966 to "connect a camera to a
computer and have it describe what it sees". Pity the kid: 50 years later,
we are still working on it.
Serious research began in the 1950s and proceeded along three distinct lines: replicating the eye (difficult); replicating the visual cortex (very difficult); and replicating the brain (arguably the most difficult problem ever attempted).
To see:
Reinventing the
eye is the area where we have had the most success. Over the past few decades, we
have created sensors and image processors that match and in some ways exceed
the human eye's capabilities. With larger, more optically perfect lenses and
semiconductor sub-pixels fabricated at nanometer scales, the precision and sensitivity of modern cameras are nothing short of incredible. Cameras can also
record thousands of images per second and detect distances with great
precision.
Yet despite the
high fidelity of their outputs, these devices are in many ways no better than a
pinhole camera from the 19th century: They merely record the distribution of
photons coming in a given direction. The best camera sensor ever made couldn't
recognise a ball, much less be able to catch it.
The hardware, in
other words, is severely limited without the software, which, it turns out, is
by far the greater problem to solve. But modern camera technology does provide
a rich and flexible platform on which to work.
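To make that concrete, here is a minimal sketch (in Python with NumPy, using a synthetic frame as a stand-in for real sensor output) of what a camera actually hands to software: a bare grid of brightness values, with no notion of balls, edges or objects.

```python
import numpy as np

# Hypothetical stand-in for one frame of raw sensor output:
# an 8-bit grayscale grid of per-pixel brightness values.
height, width = 480, 640
frame = np.random.randint(0, 256, size=(height, width), dtype=np.uint8)

# Everything the "camera" knows about the scene is in this array.
print(frame.shape)       # (480, 640) -- just a grid of numbers
print(frame[240, 320])   # a single brightness value at one pixel
print(frame.mean())      # average brightness; no objects, no ball
```

Turning that grid of numbers into "a ball, heading this way" is the software's job, and that is where the real difficulty begins.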
To describe:
This is not the
place for a complete course on visual neuroanatomy, but suffice it to say that our brains are built from the ground up with seeing in mind, so to speak. More of the brain is dedicated to vision than to any other task, and that
specialisation goes all the way down to the cells themselves. Billions of them
work together to extract patterns from the noisy, disorganised signal from the
retina.
Continue reading in Part 2.