AI Narration: The Future for Audiobooks?

Artificial intelligence (AI) is already used for so many of our daily tasks that we may even forget its presence. It can park your car, play your favorite song, and even lock the doors to your home while you’re miles away.

It shouldn’t really be a surprise, then, that the audiobook industry is looking toward AI to make the narration process more efficient. But the thought of manipulating real voices, and possibly replacing them, leaves many questions and even more concerns that need to be addressed.

New Developments in AI Narration

The use of AI in narration and voice-over productions is not new. Remember listening to the first GPS navigators? You’d be lucky if you could even recognize the street names they were dictating.

While AI assistants such as Apple’s Siri and Amazon’s Alexa have seen significant improvements over the past few years, they’re still far from matching a human voice and all its rhythms, intonations, inflections, and other natural qualities.

Tech company NVIDIA’s invention of the graphics processing unit (GPU), and now GPU deep learning, have revolutionized modern AI. In August of 2021, the company announced new research and tools that can capture the natural speech qualities older technologies lacked by allowing speakers to train the AI system with their own voices.

Meanwhile, NaturalReader is one company that offers a free online text-to-speech converter that actually produces pretty natural-sounding recordings, in case you’d like to try it out yourself. (Note that this only narrates the text you feed it; it doesn’t clone your voice and create recordings on its own.)

The implications of this technological progress are enormous, and as we’ll discuss later, not without controversy. Among the many benefits are the ability to bring to life the voices of lost loved ones, help people who have lost their voices and speaking abilities, and yield faster, less expensive productions of films, digital productions, and of course, audiobooks.

AI-Narrated Audiobooks

In late 2020, Google Play Books announced that it would offer automatic narration for books that do not currently have an audio version. This will not be applied to every book, and the decision will still ultimately be left to publishers.

We’ve already discussed some of the pros and cons of AI writing machines, and such technology has already been used by media outlets and even authors who are struggling with writer’s block. But why would publishers be interested in replacing human audiobook narrators with AI technology?

For one, the technology can be considerably cheaper than a human narrator. But perhaps more importantly, it would make audio files much easier to change in post-production. Rather than calling back the human narrator to re-record segments, you can have AI change the file by simply updating the text.

Another benefit is the potential for audiobook listeners to customize their experience. Much like you can customize the voice of your AI assistant, you could opt to have a British female voice read Pride and Prejudice, or an American male voice read The Old Man and the Sea, for example.

Currently, Google Play Books has a beta version available, and is working on making the tool available to publishers. You can already try out some free audio versions of books from the public domain, which were created with AI.

Speechki

Perhaps no one is tackling the AI narration revolution as head-on as Speechki, a new recording platform that uses AI synthetic voices to record audiobooks in just 15 minutes.

Founded in 2019, Speechki works by allowing publishers or authors to upload their text, select one of 251 voices in 72 languages, customize the sound, then get their audio file in their desired format, which can be fine-tuned by a proof-listener in just a few hours.

In a call with the company, Speechki confirmed to us that they have already completed over 800 audiobooks with AI that are currently available, and have secured a deal to do over 1,000.

Currently, audiobooks make up just 5% of the book market, but Speechki projects a 24.4% growth rate between 2020 and 2027.

In an interview with the Audio Publishers Association, Speechki’s co-founder and CEO Dima Abramov explained that the company is not aiming to replace human narrators, but simply open more opportunities for listeners, including those who are vision-impaired or disabled, as content opportunities are “currently severely limited.”

AI Narration Controversy

One example of just how advanced AI narration has become can be found in the recent CNN documentary Roadrunner, which covers the life and death of Anthony Bourdain.

The filmmakers used AI to create a model of Bourdain’s voice, which naturally stirred up ethical debates over whether putting words in a dead man’s mouth is ever justified. In one instance, the words were actually Bourdain’s, taken from an email he wrote to a friend, but still, the use of AI in the film was not disclosed to viewers.

But as LA Times journalist Matt Pearce sums up the dilemma: “The most important thing about a documentary deepfaking Anthony Bourdain’s voice isn’t that it happened, but that it happened and almost nobody noticed.”

Granted, the film featured a total of just 45 seconds of AI narration. Over longer periods, the technology is unlikely to fool many listeners. However, the more authentic audio that is fed to these machines, the more they learn and improve, so it’s only a matter of time before we hear AI narrations that are so realistic you wouldn’t notice even over the course of an entire film.

Of course, this is troubling news for voice actors and audio narrators. Because we’re still on the frontier of this new technology, the rules remain unclear, and most are being written along the way. One potential concern is that if voice actors sign any type of agreement allowing a company to synthesize their voice, the company might treat that voice as its property and use it in any way it pleases, simply because the actor completed one job for it.

Deepfakes

Another potential danger doesn’t require more than 45 seconds to make an impact. AI has already been used to create deepfake sound bites of presidents, celebrities, and other influential figures apparently saying things that they never actually said.

The damaging implications here are obvious, and something we clearly need to confront. Currently, AI companies forbid clients from cloning the voices of celebrities without permission, but that hasn’t stopped a few troubling cases from appearing online.

An Ear to the Future

As with any new technology, there are reasons to celebrate, but also to be wary. Our voices might be the greatest powers we possess, so any development that stands to manipulate them or diminish their power should be approached with extreme caution.

On the flip side, authors like Joanna Penn point to the more positive possibilities, like speakers being able to license their voices to narrate other people’s audiobooks or play a part in a podcast drama.

Some voice actors have already done this, preserving their voices with AI and allowing companies to license those voices to say whatever they need. Then there are the infinite sentimental benefits, like being able to preserve the voice of a lost loved one.

There’s still much to see (and hear) about how this technology will shape the audiobook industry, as well as our discussions on individual rights, which makes this an exciting time for publishers, voice actors, and listeners alike.

What do you think about the use of AI in narration? Share your thoughts in the comments below!

The post AI Narration: The Future for Audiobooks? appeared first on TCK Publishing.
