Sound clips of Donald Trump reading the ‘Three Little Pigs’ nursery rhyme aloud and Tom Hanks reciting Pulp Fiction’s ‘Ezekiel 25:17’ may sound realistic, but they were generated by artificial intelligence. 

A developer created a tool, dubbed Tortoise TTS (Text-to-Speech), capable of replicating a person’s voice after analyzing 20 seconds of an audio clip with them speaking.

And DailyMail.com asked the AI to clone the voices of the former president and actor.

Shashank Jain, the creator of Tortoise TTS, said his main idea was to create a tool that allows us to generate podcasts based on text.  

‘With the arrival of ChatGPT, we can generate conversations in the format we want, provide the feed to the tool I created and out comes a podcast between two speakers of our choice,’ he told DailyMail.com.

The sound clips were created with a text-to-speech AI developed by Shashank Jain, who said it was designed to generate podcasts. DailyMail.com had the AI generate Donald Trump's voice to read 'The Three Little Pigs'


And just as Microsoft is not releasing its voice-cloning VALL-E due to fears of misuse, Jain also plans to keep Tortoise safeguarded from bad actors.

Using AI to write essays, create music and replicate someone’s voice was once seen as something from a science-fiction film, but is now becoming the way of the world.

Jain shared his technology on Twitter after Microsoft announced VALL-E, tweeting that the technology already exists.

He said text is first fed to ChatGPT, OpenAI's popular chatbot, to generate a textual conversation between two speakers on a chosen topic.

‘Once that is done, the text is fed to my tool, which then creates the podcast based on audio samples of two characters (Musk and Hanks in this case) and text conversation between the two,’ said Jain.

‘My main reason was just to do this as a hobby and not do anything commercial with it. 

‘Microsoft VALL-E promises to do the same and architecture wise also uses Transformers architecture underlying. 

‘Microsoft has not made its model public yet mainly due to concerns of misuse of voices.’

The tool is capable of replicating a person's voice after analyzing 20 seconds of an audio clip with them speaking. DailyMail.com also asked the AI to clone Tom Hanks' voice


The digital voice of Tom Hanks recites Pulp Fiction's 'Ezekiel 25:17', which was delivered by actor Samuel L Jackson in the 1994 film


Microsoft announced VALL-E in January, touting its ability to clone someone’s voice after analyzing just three seconds of an audio clip of them speaking.

The technology sparked controversy among the public, who fear it is a tool for scammers to steal your voice.

A telephone scammer could use the system to capture just three seconds of your voice and replicate it, which would also include your emotional range and acoustic environment.

This would allow bad actors to bypass systems that use your voice as a password.

While the AI sparks fear among some users, others see the technology as a way for people who have lost their voice to throat disease, ALS or an injury to regain their speech.

However, some Twitter users have raised an important question – do you own the sound of your voice?

The Microsoft VALL-E team has addressed the ethics question with a statement: ‘The experiments in this work were carried out under the assumption that the user of the model is the target speaker and has been approved by the speaker.

However, when the model is generalized to unseen speakers, relevant components should be accompanied by speech editing models, including the protocol to ensure that the speaker agrees to execute the modification and the system to detect the edited speech.’

VALL-E was trained on 60,000 hours of English speech and Microsoft claims it can replicate American, British and several European-sounding accents.

VALL-E can only turn written text into speech, but this is enough for someone to use the technology to steal your voice and ‘put words in your mouth.’

Microsoft has not yet released it to the public, but the company has high hopes for its AI – it is poised to revolutionize how we hear audiobooks and smart assistants.

The creators of VALL-E said the AI tool is designed for high-quality text-to-speech applications.

This includes editing speech in a recording of a person – such as an audiobook.

VALL-E analyzes how the person in the audio clip sounds, breaks that information into different components, then uses its training data to find something similar and combines the two.




