Be careful of the potato chip bag

With this photo MIT researchers announced a system to extract sounds from video ... Credit: MIT

"Any sufficiently advanced technology is indistinguishable from magic."

I couldn't help but remember this phrase from Arthur C. Clarke as I read the MIT news about a way to extract sound from video.

Through ever better signal processing, researchers have been able to extract your pulse from a cell phone camera filming your face. The required information comes from the tiny changes in hue that your skin undergoes as more or less blood flows through the capillaries of your face in response to each heartbeat. That was an amazing feat, but it did not strike me as "impossible".
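To make the pulse idea concrete, here is a minimal sketch in Python. It assumes the video has already been reduced to one number per frame (say, the average skin hue of the face region) and simply looks for the strongest rhythm within a plausible heart-rate range. The function name, the frequency range and the synthetic signal are my own illustrative assumptions, not the method the researchers used.

```python
import math

def dominant_frequency(samples, fps, f_min, f_max, step=0.01):
    """Return the frequency (Hz) in [f_min, f_max] with the largest spectral power.

    samples: one value per video frame (hypothetical average skin hue).
    fps:     frames per second of the video.
    """
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]  # remove the constant skin tone
    best_f, best_power = f_min, -1.0
    steps = int(round((f_max - f_min) / step))
    for k in range(steps + 1):
        f = f_min + k * step
        # Correlate the signal with a cosine and a sine at frequency f.
        re = sum(c * math.cos(2 * math.pi * f * i / fps) for i, c in enumerate(centered))
        im = sum(c * math.sin(2 * math.pi * f * i / fps) for i, c in enumerate(centered))
        power = re * re + im * im
        if power > best_power:
            best_f, best_power = f, power
    return best_f

# Synthetic stand-in for real footage: 10 seconds at 30 frames per second,
# with a faint 1.2 Hz (72 beats per minute) ripple on top of a constant hue.
fps = 30
hue = [100.0 + 0.5 * math.sin(2 * math.pi * 1.2 * i / fps) for i in range(10 * fps)]

# Search only between 0.7 Hz and 3.0 Hz, i.e. 42 to 180 beats per minute.
pulse_bpm = 60 * dominant_frequency(hue, fps, 0.7, 3.0)
print(round(pulse_bpm))
```

Real footage is of course far noisier than this clean sine wave, which is exactly why the signal processing is the hard part.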

But now further sophistication in signal processing has allowed a joint team of MIT, Microsoft and Adobe researchers to extract sound by looking at the extremely tiny vibrations that sound waves create in objects, such as a simple potato chip bag. And this really is magic!

To reconstruct the sound, the camera has to film the object at a frame rate higher than the frequency of the sound one wants to recover. In the experiment they placed a person in a soundproof room behind soundproof glass and filmed the potato chip bag from outside the room, at a distance of 5 m, as the person talked. Then they analysed the micro vibrations of the potato chip bag and reconstructed the sound: the voice of that person.
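Why the frame rate matters can be shown with a few lines of Python. Each frame is one sample of the vibration, and a tone sampled too slowly "folds" down to a lower, false frequency (aliasing). Standard sampling theory, the Nyquist criterion, is in fact stricter than the rule of thumb above: a tone is captured faithfully only if the frame rate is more than twice its frequency. The function name and the example tones below are my own illustrative choices.

```python
def apparent_frequency(tone_hz, fps):
    """Frequency (Hz) a pure tone appears to have when sampled at fps frames per second."""
    folded = tone_hz % fps
    # Anything above half the frame rate is mirrored back below it.
    return min(folded, fps - folded)

print(apparent_frequency(440, 60))     # → 20: at 60 fps a 440 Hz tone looks like a 20 Hz one
print(apparent_frequency(440, 2000))   # → 440: at 2,000 fps the tone is captured as it is
print(apparent_frequency(1500, 2000))  # → 500: above half the frame rate the tone still folds
```

This is why an ordinary 60 frames per second video cannot, by simple sampling alone, carry the pitch of a human voice.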

This requires a special camera, capable of frame rates on the order of 60 thousand frames per second, well above what the cameras in our smartphones offer but below what the fastest special cameras can do (up to 100,000 frames per second).

They also showed that with a normal camera filming at 60 frames per second (something some cell phones can do today) they can derive a signature of a person's voice, enough to tell whether a man or a woman was talking and even to match the sound against the voice of a specific person whose signature is on file. This is not easy at all, since 60 frames per second is really low, but the researchers have found a way of examining neighbouring pixels that effectively multiplies the sampling frequency.

Also, the detection of such tiny vibrations is a real challenge. Clearly it is beyond our 

Interestingly, they have also been able to determine, just by looking at the video, the material the object is made of: is the potato chip bag plastic, tin or paper foil? It turns out that different materials absorb sound waves differently, and each produces a specific signature that leads to the identification of the substance.

Author - Roberto Saracco

© 2010-2018 EIT Digital IVZW. All rights reserved.