AI Can Simulate Your Voice From Just 3 Seconds of Audio

Society has barely recovered from the shock of the ChatGPT release, and now Microsoft has jumped into the fray with major news of its own: a new AI capable of reproducing any person's voice with surprising realism. All this artificial intelligence needs is to listen to the voice it must simulate for as little as 3 seconds, and with only that information it can pronounce phrases entered by an operator in the exact tone of voice of the 'victim.'

We say the ‘victim’ because this new AI seems to lend itself to many harmful uses on the internet. If we already had enough cause for concern with deepfakes, we must now add voice forgery to the horrors wrought by 21st-century computer technology.

The new audio synthesizer is called VALL-E


Presented on January 5, VALL-E – that's what this AI is called – seems to be designed for purposes such as accessibility, giving apps a voice, or audiobook narration. For example, VALL-E could have made Stephen Hawking's robotic voice sound exactly like his real voice before he was affected by ALS (amyotrophic lateral sclerosis). In this sense, it does seem that it could considerably improve the lives of millions of people.

As for its use in the audiobook market, VALL-E could succeed in lowering the cost of narration, putting thousands of broadcasters and voice professionals around the world out of work. Content writers who lose their jobs because of ChatGPT, then, will not be able to retrain as narrators because of VALL-E, and will have to think of another profession to get ahead.

The risks of impersonation

As has already happened with deepfakes, this new artificial intelligence is highly susceptible to malicious use. With deepfakes, disinformation agents could already manipulate a video to make it appear as if Joe Biden or Pedro Sánchez were saying something they never actually said. With VALL-E, they won't even need to hire a voice actor to simulate the voice behind this fake speech, because the new AI is more than capable of imitating the voice of any politician or other public figure.

The danger may be even greater for ordinary internet users. After all, if a deepfake of a high-profile person like Joe Biden is released, the media will most likely be quick to verify it and report it as fake as soon as it is detected. Ordinary people do not have that privilege. They can be blackmailed with false videos published on the internet that seriously harm them in their work and personal lives.

VALL-E, therefore, is one more reason why protecting our data on the internet is necessary. It is wise to minimize the content we share publicly on social networks, use strong passwords to protect access to our accounts, cancel online accounts we no longer use, and use a VPN on phone and laptop to encrypt the information we send over the internet.

The law is two steps behind the technology

Despite the growing risk of these digital threats, the legal framework does not yet seem prepared to deal with them. While countries like China have drawn up specific laws against deepfakes (the new Chinese legislation came into force on January 10), in the US and Europe we will have to wait a bit longer. In the meantime, we will need to fit current laws to this new landscape of technological capabilities, which is not always easy.

The only hope – sadly a temporary one – lies in the fact that technologies like VALL-E are owned by Microsoft and other big tech companies, which try to keep them secure. And Microsoft does not seem interested in spoofing anyone's voice for illegitimate purposes. However, Big Tech has already shown us in the past that it is not always capable of protecting its own technology. Its platforms have been hacked on a regular basis, and if VALL-E falls into the wrong hands, the problems it could cause are unimaginable.