When your voice runs away from home
Your daughter, on vacation, calls you in a state of panic to let you know that she and her friend are having trouble with a payment. Please can you send some money to her friend so they can pay for the hotel room. What do you do? Your safest option is to hang up and call your daughter back. It may sound exactly like Sophie but that doesn't mean it is.
Fraud using cloned voices is becoming more common. Phone calls where people are conned out of their money have been going on for decades, but the novelty today is that the voice at the other end can be a voice you know. Perhaps even a voice you share a home with.
My podcast colleague James and I have been joking about the existence of many hundreds of hours of sound material with our voices after more than a decade of shows and talks published online. Our voices can be made to say anything. I've made my family aware of this dilemma.
Your voice is also at risk
The truth is that your voice can be just as vulnerable today. The new tools only need a recorded minute of your voice to generate a believable copy. Sixty seconds. Actually, Microsoft say they can do it in three seconds. And then your voice can be made to say anything.
Anyone who thinks their voice has not been recorded to any useful extent is likely overlooking all the phone calls recorded for "educational purposes", all the video meetings where you may be asking questions, the parties where people are constantly filming, and all the omnipresent microphones in our vicinity that can often be activated remotely. It's almost hard not to mistakenly record yourself from time to time.
Imagine: Your boss calls you into their office to ask about the circumstances of your phone call last night where you quit your job. But you haven't called. A funny "practical joke" by a colleague or a premonition of worse things to come?
What can happen?
Last week many outlets reported on Jennifer, who received a phone call from a kidnapper in which her daughter first said "Mom, I messed up..." and then, while the kidnapper was stating his demands, sobbed in the background, "Mom, please help me".
The scheme failed when a separate phone call revealed her daughter was safe and sound with a friend. But Jennifer was convinced her daughter was with the supposed kidnapper throughout that fraudulent call.
In Australia voices are used to verify identity with banks and with the tax authority. It's been shown that voice clones can be used to trick the systems into giving account access. Thankfully a PIN code is also often needed, which can help in stopping a large portion of these attempts.
But think about the times you've been asked to verify a subscription or contract through a recording of your voice. Your voice. That you assume no one else has.
A chat channel on Telegram lets you order swatting services that make use of voice clones. Swatting is a phenomenon in which criminals trick emergency services into sending police or emergency response teams to someone's address. In the latest episode of Cyber we can hear about it being used to send response teams to schools where computer-generated voices claim to have placed explosives.
How will you know if the next crime or swindle will involve your voice, or the voice of someone close to you? Can your voice even be claimed to belong to you anymore?
Of course good uses exist
Already in 2007 I experimented with using a service known as ReadSpeaker to have my blog posts read aloud by an artificial voice. This allowed more people access to my articles and was of course positive for my website in terms of accessibility.
The modern voice clones provide a quality that leads to further accessibility improvements: the listening experience is more pleasing. And these voices can also be used to convert news and articles to podcast episodes, without any human having to utter a single word.
As I personally read many of my blog posts for podcast publishing I could in theory automate this task. So far I'm somewhat skeptical towards going down this route, as I would lose so many other benefits of reading aloud, benefits that improve both the original write-up and audio production values.
If there is a choice between making text accessible in an automated manner (not so thorough) and not doing it at all, it makes sense to prefer that it actually happens. But a paradigm shift where cloned voices become prevalent will also mean that many voice professionals (in industries like radio commercials, audiobooks and cartoons) will struggle to find work. In fact, many voice actors have already been coerced into signing away the rights to their own voices.
Efficiency needs both consideration and compassion
For creators like myself there are of course further benefits to offering content in more formats and thereby also reaching more people in ways that align with the preferences of the person listening, reading or shifting between these formats.
And already, people with disabilities that lead to the loss of their voices can be given the choice to communicate through tech tools that sound like the voices they once had, or wish they had been given. A technology relying on voice banking is made more efficient when many recorded hours of a voice aren't required, even though many still wish for more inclusion of people with these needs in the tool development process.
Whether we should be able to generate new music with celebrity voices, and whether you should be allowed to use the voices of deceased family members to give voice to your digital assistant, are examples of dilemmas that need to be reckoned with in the near future. Does it matter if the music we listen to is AI-generated if we aren't aware of this?
I still believe it would be a good idea to, as we say in Sweden, hurry slowly, from the Latin Festina lente. Essentially: when tasks are rushed, mistakes are made and beneficial long-term results fail to be achieved.
There is a balance between urgency and diligence, and often the former takes precedence at the expense of the latter.
Risk awareness requires transparency
My message, as always, is that as benefits of digitalisation are promoted we also need to be open and clear about all the problems created. It's only then we become better at recognising, mitigating and managing risk. And we can begin to build the social force that compels organisations to be more careful in an otherwise unrestrained hustle.
In the next episode of my English podcast Carefully I will use a voice clone for parts of the sound. Subscribe if you want to try making out which parts of the episode are not me talking. To be revealed at the end.
In Sweden we have a bit of an advantage (though some would say hindrance) in that these cloning tools perform at their best with English voices and pronunciation. Something that can also provide a false sense of security. But it may just give us the opportunity to reflect more on suitable use and concerning risks before the larger wave of synthetic voices strikes.
But, be aware, your voice can already be made to speak more than 30 languages. And it will not stumble when "reading". While it of course won't understand the text, in other ways your voice clone is already more capable than you.
What are your thoughts on this development? What benefits and risks do you see for yourself and others?
Sound bite
Example of my voice having been cloned from my English podcast. I used 13 minutes of sound and the cloning process took less than a minute.