Someone across the room throws you a ball and you catch it. Simple, right?
Actually, this is one of the most complex processes we have ever attempted to comprehend, let alone recreate. Inventing a machine that sees like we do is a deceptively difficult task, not just because it's hard to make computers do it, but because we are not entirely sure how we do it in the first place.
What actually
happens is this:
The image of the ball passes through your eye and strikes your retina, which does some elementary analysis and sends
it along to the brain, where the visual cortex more thoroughly analyses the
image. It then sends it out to the rest of the cortex, which compares it to
everything it already knows, classifies the objects and dimensions, and finally
decides on something to do: raise your hand and catch the ball.
This takes place in a tiny fraction of a second, with almost no
conscious effort, and almost never fails. So recreating human vision is not
just a hard problem, it is a set of them, each of which relies on the others.
Well, no one ever
said this would be easy. Except, perhaps, AI pioneer Marvin Minsky, who
famously instructed a graduate student in 1966 to "connect a camera to a
computer and have it describe what it sees". Pity the kid: 50 years later,
we are still working on it.
Serious research began in the 1950s and proceeded along three distinct lines: replicating the eye (difficult); replicating the visual cortex (very difficult); and replicating the brain (arguably the most difficult problem ever attempted).
To see:
Reinventing the
eye is the area where we have had the most success. Over the past few decades, we
have created sensors and image processors that match and in some ways exceed
the human eye's capabilities. With larger, more optically perfect lenses and
semiconductor sub-pixels fabricated at nanometer scales, the precision and sensitivity of modern cameras are nothing short of incredible. Cameras can also
record thousands of images per second and detect distances with great
precision.
Yet despite the
high fidelity of their outputs, these devices are in many ways no better than a
pinhole camera from the 19th century: They merely record the distribution of
photons coming in a given direction. The best camera sensor ever made couldn't
recognise a ball, much less be able to catch it.
The hardware, in
other words, is severely limited without the software, which, it turns out, is
by far the greater problem to solve. But modern camera technology does provide
a rich and flexible platform on which to work.
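To make that concrete, here is a minimal sketch (in Python with NumPy, using a synthetic frame as a stand-in for real sensor output) of what a camera actually hands to software: a bare grid of brightness values, with no notion of balls, edges or objects.

```python
import numpy as np

# Hypothetical stand-in for one frame of raw sensor output:
# an 8-bit grayscale grid of per-pixel brightness values.
height, width = 480, 640
frame = np.random.randint(0, 256, size=(height, width), dtype=np.uint8)

# Everything the "camera" knows about the scene is in this array.
print(frame.shape)       # (480, 640) -- just a grid of numbers
print(frame[240, 320])   # a single brightness value at one pixel
print(frame.mean())      # average brightness; no objects, no ball
```

Turning that grid of numbers into "a ball, heading this way" is the software's job, and that is where the real difficulty begins.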
To describe:
This is not the
place for a complete course on visual neuroanatomy, but suffice it to say that our brains are built from the ground up with seeing in mind, so to speak. More of the brain is dedicated to vision than to any other task, and that
specialisation goes all the way down to the cells themselves. Billions of them
work together to extract patterns from the noisy, disorganised signal from the
retina.
Continue reading in Part 2.