Automated voice impersonation

I’ve written before about some problems with biometric security: it seems convenient to be able to use facial recognition to log in to your computer, until you find your co-workers doing it with colour photocopies of your picture.

Computers aren’t the only context in which we use biometrics for identification. “Don’t you recognize my voice?” has served as authentication over the phone for decades, whether implicitly or explicitly. Now the day is approaching when anybody’s voice can be faked and made to say anything you like.

Expect disruption at every level, from teens pranking each other, to abusive harassers terrifying victims in new ways, to more election-altering political fraud.

5 thoughts on “Automated voice impersonation”

  1. This might actually make some types of crime harder to commit. You can’t easily blackmail someone with an audio tape if anyone is able to replicate the content and modify it at will with a bit of software.

  2. On the plus side, we will be able to simulate dramatic readings of poems and literature by any author who has ever been recorded on the public record. Philip Pullman read by Philip Pullman, where it hasn’t already been done. Stephen Fry by Stephen Fry.

  3. Until recently, voice cloning—or voice banking, as it was then known—was a bespoke industry which served those at risk of losing the power of speech to cancer or surgery. Creating a synthetic copy of a voice was a lengthy and pricey process. It meant recording many phrases, each spoken many times, with different emotional emphases and in different contexts (statement, question, command and so forth), in order to cover all possible pronunciations. Acapela Group, a Belgian voice-banking company, charges €3,000 ($3,200) for a process that requires eight hours of recording. Other firms charge more and require a speaker to spend days in a sound studio.

    Not any more. Software exists that can store slivers of recorded speech a mere five milliseconds long, each annotated with a precise pitch. These can be shuffled together to make new words, and tweaked individually so that they fit harmoniously into their new sonic homes (a toy sketch of this select-and-concatenate idea follows the comments). This is much cheaper than conventional voice banking, and permits novel uses to be developed. With little effort, a wife can lend her voice to her blind husband’s screen-reading software. A boss can give his to workplace robots. A Facebook user can listen to a post apparently read aloud by its author. Parents often away on business can personalise their children’s wirelessly connected talking toys. And so on. At least, that is the vision of Gershon Silbert, boss of VivoText, a voice-cloning firm in Tel Aviv.

    More troubling, any voice—including that of a stranger—can be cloned if decent recordings are available on YouTube or elsewhere. Researchers at the University of Alabama, Birmingham, led by Nitesh Saxena, were able to use Festvox to clone voices based on only five minutes of speech retrieved online. When tested against voice-biometrics software like that used by many banks to block unauthorised access to accounts, more than 80% of the fake voices tricked the computer. Alan Black, one of Festvox’s developers, reckons systems that rely on voice-ID software are now “deeply, fundamentally insecure”.
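
The passage quoted in the comment above sketches, at a high level, how unit-selection synthesis works: tiny pitch-annotated slivers of recorded speech are chosen to match a target sequence and stitched together. The Python sketch below is a deliberately simplified illustration of that select-and-concatenate idea; the SpeechUnit structure, the nearest-pitch selection rule, and the toy data are all invented for illustration, and it does not represent VivoText's, Festvox's, or any production system's actual code.

```python
"""Toy illustration of the unit-selection idea described in the comment
above: a database of very short, pitch-annotated slivers of speech is
searched for units matching a target sequence, which are then concatenated.
Everything here (SpeechUnit, the nearest-pitch rule, the sample data) is
invented for illustration and is not how VivoText or Festvox actually work.
"""

from dataclasses import dataclass


@dataclass
class SpeechUnit:
    """A few milliseconds of recorded speech with its phone label and pitch."""
    phone: str        # e.g. "h", "ah"
    pitch_hz: float   # fundamental frequency of this sliver
    samples: list     # raw audio samples (placeholder values here)


def select_unit(db: list[SpeechUnit], phone: str, target_pitch: float) -> SpeechUnit:
    """Pick the stored sliver of the right phone whose pitch is closest to the target."""
    candidates = [u for u in db if u.phone == phone]
    if not candidates:
        raise ValueError(f"no recorded units for phone {phone!r}")
    return min(candidates, key=lambda u: abs(u.pitch_hz - target_pitch))


def synthesise(db: list[SpeechUnit], targets: list[tuple[str, float]]) -> list:
    """Concatenate the best-matching unit for each (phone, pitch) target.

    A real system would also shift each unit's pitch and smooth the joins so
    the slivers "fit harmoniously into their new sonic homes"; this sketch
    simply strings the samples together.
    """
    audio: list = []
    for phone, pitch in targets:
        audio.extend(select_unit(db, phone, pitch).samples)
    return audio


if __name__ == "__main__":
    # A pretend unit database, as might be built from a few minutes of
    # recordings scraped from the public record.
    db = [
        SpeechUnit("h", 110.0, [0.1, 0.2]),
        SpeechUnit("ah", 118.0, [0.3, 0.1]),
        SpeechUnit("ah", 140.0, [0.2, 0.4]),
        SpeechUnit("i", 125.0, [0.0, 0.2]),
    ]
    # Target: the word "hi" with a gently rising pitch contour.
    fake_word = synthesise(db, [("h", 112.0), ("ah", 120.0), ("i", 128.0)])
    print(len(fake_word), "samples of synthetic speech")
```

Production systems typically search over whole unit sequences with target and join costs, pitch-shift the units, and cross-fade at the boundaries, but the basic pattern is the same: given enough recorded slivers of a voice, new words can be assembled that the speaker never said.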
