
Teaching a car to see what's around

In a test by KurzweilAI using a Google Maps image of Market Street in San Francisco, the SegNet system accurately identified the various elements, even hard-to-see pedestrians (shown in brown on the left) and road markings. Credit: KurzweilAI/Cambridge University/Google

For a car to drive all by itself, it has to answer three questions:

1. where am I right now?

2. what's around me?

3. where do I want to go?

They are all tricky questions. Knowing where it is, for example, requires understanding its orientation (sound familiar? Has your smartphone ever told you which road to take while you had no idea which way you were facing with respect to the map on the screen?). The second question, however, is probably the trickiest for a vehicle. Distinguishing a pedestrian from the photo of a person on a billboard is quite easy for us but quite complex for a computer (and sometimes we, too, are fooled...).

Answering this second question has been the goal of researchers at the University of Cambridge, who recently published their results in a paper on IEEE Xplore.

They created a piece of software, SegNet, that looks at an image and classifies each pixel into one of 12 categories, such as road, pedestrian, vehicle, cyclist, building and so on.
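To make the idea concrete, here is a minimal sketch of this kind of per-pixel classification in PyTorch. It is not SegNet itself: torchvision's FCN-ResNet50 stands in for the real network (whose classes differ from the article's 12), and the file name is a placeholder.

```python
# A minimal sketch of per-pixel classification, in the spirit of SegNet.
# Assumption: torchvision's FCN-ResNet50 stands in for SegNet, and
# "street_scene.jpg" is a placeholder image path.
import torch
from torchvision import models, transforms
from PIL import Image

model = models.segmentation.fcn_resnet50(weights="DEFAULT").eval()

preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("street_scene.jpg").convert("RGB")
with torch.no_grad():
    # The network outputs a score for every class at every pixel
    scores = model(preprocess(img).unsqueeze(0))["out"]  # (1, C, H, W)

# Each pixel is assigned the class with the highest score
labels = scores.argmax(dim=1).squeeze(0)  # (H, W) map of class indices
```

The resulting label map is exactly the kind of output shown in the image above: every pixel tagged as road, pedestrian, vehicle and so on.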

Each of these categories is relevant for understanding the surroundings and deciding what to do next. Clearly a building does not move, whilst cars, cyclists and pedestrians do, although at different speeds and potentially in different directions.

SegNet is able to distinguish shadows from objects and has reached an accuracy of 90%. That may still seem low, and in some cases it is, but it is far better than much more expensive systems based on radars and lasers, and even better than our human recognition capability!

To achieve this level of image understanding, undergraduate students at Cambridge manually labelled 5,000 images (taking, on average, half an hour each), identifying for each pixel the class of object it belongs to. This knowledge was then digested by SegNet.
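"Digesting" those hand-labelled images means supervised training: the network's per-pixel class scores are compared against the human labels, and the errors are pushed back through the network. The sketch below shows that loop under stated assumptions: the tiny two-layer model and the random tensors merely stand in for a real encoder-decoder architecture and the real labelled dataset.

```python
# Hypothetical training sketch: how hand-labelled images are "digested".
# The model and data are toy stand-ins, not the actual SegNet setup.
import torch
import torch.nn as nn

NUM_CLASSES = 12  # the article's 12 categories

# A tiny stand-in for SegNet's encoder-decoder (the real one is far deeper)
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, NUM_CLASSES, 1),  # per-pixel class scores
)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)
loss_fn = nn.CrossEntropyLoss()  # compares scores to hand-made labels

# Fake batch standing in for the 5,000 hand-labelled images:
# images (B, 3, H, W) and one class index per pixel (B, H, W)
images = torch.rand(4, 3, 64, 64)
labels = torch.randint(0, NUM_CLASSES, (4, 64, 64))

for epoch in range(10):
    optimizer.zero_grad()
    scores = model(images)          # (B, 12, H, W)
    loss = loss_fn(scores, labels)  # per-pixel cross-entropy
    loss.backward()                 # push the errors back through the net
    optimizer.step()
```

The half-hour-per-image labelling effort is what makes this possible: the loss can only be computed because a human has assigned a class to every pixel.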

In the short term, SegNet is expected to give robot vacuum cleaners a better understanding of their environment. In the longer term, a successor of SegNet is expected to become an integral part of self-driving cars.

Author - Roberto Saracco
