AI fools humans with fake sound effects
By Lance Ulanoff
The auditory Turing test has been defeated.
When MIT Computer Science and Artificial Intelligence Lab (CSAIL) researchers showed test subjects videos of a drumstick hitting and brushing against various objects, the subjects were fooled into believing that the sounds they heard actually came from the objects and materials on screen.
They did not.
Instead, the audio for every clip was chosen by a computer programmed to analyze the video and pull the right sounds from its own library of samples. The subjects were none the wiser.
The team’s work is described in a new paper released Monday and being presented next week at the Computer Vision and Pattern Recognition conference in Las Vegas.
To be clear, there really isn’t any such thing as an auditory Turing test. When computer scientist Alan Turing proposed his test in 1950, he predicted that a computer conducting a text-based chat with a human could fool that person into believing another human was on the other side of the conversation at least 30 percent of the time.
Turing never said anything about an AI Foley artist fooling people into thinking they were hearing real audio.
In fact, beating an audio Turing test wasn’t even the original goal of the MIT CSAIL research team.
“We were more motivated by the high level goal of getting a computer algorithm to interact with the world and see what happens,” said study lead author Andrew Owens.
Training
In order for MIT’s audio AI to work, it had to learn a lot about the relationship between objects, materials and sound. Usually, to build these kinds of correlations, scientists rely on extensive human annotation. They feed, for instance, millions of images into a system, labeling each one as they go: “This is a dog,” “This is a cat,” “This is a tree.”
In that way, a smart AI can then use pattern matching to identify images of dogs, cats and trees that it sees in the future.
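As a rough sketch of the difference (toy data, not from the study; the file names and structure here are assumptions), conventional supervised training pairs every example with a human-written label, while the MIT approach lets each video’s own audio track serve as the supervisory signal:

```python
# Conventional supervised setup: every training image carries a human label.
# (Hypothetical toy data, just to illustrate the annotation burden.)
labeled_data = [
    ("img_0001.jpg", "dog"),
    ("img_0002.jpg", "cat"),
    ("img_0003.jpg", "tree"),
]

# The self-supervised alternative skips the labels: the audio recorded
# alongside each video is the training signal, no human annotation needed.
unlabeled_data = [
    ("clip_0001.mp4", "clip_0001.wav"),
    ("clip_0002.mp4", "clip_0002.wav"),
]
```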
“We wanted to train an algorithm that can learn just by observing without explicit labels. Sound happened to be a concrete way of studying that,” Owens told Mashable.
To start, the team recorded almost 1,000 videos of a drumstick hitting and scratching various objects.
The algorithm assigned numerical values to the audio frequencies produced by each action (different objects and materials make different sounds), and those values were then tied to specific video frames and the images within those frames.
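For a sense of what that feature extraction might look like in practice (a minimal sketch, not the team’s actual pipeline; the librosa library, the frame rate, the sample rate and the file name are all assumptions), one could compute a frequency-energy vector for every video frame:

```python
# Sketch: per-video-frame audio features (hypothetical, not the paper's code).
import numpy as np
import librosa

VIDEO_FPS = 30       # assumed video frame rate
SAMPLE_RATE = 22050  # assumed audio sample rate

def frame_audio_features(wav_path: str) -> np.ndarray:
    """Return one frequency-energy vector per video frame."""
    audio, _ = librosa.load(wav_path, sr=SAMPLE_RATE)
    hop = SAMPLE_RATE // VIDEO_FPS  # audio samples per video frame
    spectrum = np.abs(librosa.stft(audio, n_fft=1024, hop_length=hop))
    return spectrum.T               # rows line up with video frames

features = frame_audio_features("drumstick_hit.wav")  # hypothetical file
print(features.shape)  # (num_video_frames, num_frequency_bins)
```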
The team then shot new silent video of, once again, a drumstick hitting and scratching things.
The algorithm’s job was to analyze the video, look for patterns and then match them with sound samples from its library (it never created sounds out of whole cloth).
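That matching step can be pictured as a nearest-neighbor lookup over feature vectors (a deliberately simplified sketch; the study’s actual system uses a neural network to predict sound features from the video before retrieval, and the arrays below are placeholders):

```python
# Sketch: pick the library clip whose features best match the features
# predicted from a silent video. (Hypothetical simplification.)
import numpy as np

def pick_sound(predicted: np.ndarray, library: np.ndarray) -> int:
    """Return the index of the closest sound sample in the library."""
    distances = np.linalg.norm(library - predicted, axis=1)
    return int(np.argmin(distances))

library = np.random.rand(1000, 128)    # placeholder: 1,000 clips, 128-dim features
predicted = np.random.rand(128)        # placeholder: features from silent video
print(pick_sound(predicted, library))  # index of best-matching sample
```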
The algorithm did get fooled on occasion, though. If the drumstick moved toward a surface but did not actually hit it, the algorithm still dropped in an impact sound (see the video above).
Image: Foley artist Joo Fuerst records sounds made with a paintbrush in his studio in an old cowshed near Erkheim, Germany, 13 October 2015. Credit: Karl-Josef Hildenbrand/picture-alliance/dpa/AP Images
To test how well the algorithm did its job, the MIT researchers asked 400 online subjects to view pairs of videos and pick the one playing the real soundtrack. According to the study, subjects chose the fake sounds over the real ones twice as often.
Why would you do this?
There are practical applications for a program that can take visual information and apply sound. The first one is rather obvious: film production.
Movie studios already employ Foley artists to add sounds to movies in post-production. They usually create footsteps, doors closing, rain and horse trots with everything but the actual objects in the film. An audio algorithm could analyze the movie and pull the right sounds from a database.
Owens said that an algorithm like this could also be used in object recognition for robotics. It could, in essence, “use sound to learn about the world,” he said, adding, “Sound is a signal that robots can learn with that helps them associate objects with different ways they can interact with them.”