OpenAI Shares More About Its Voice Cloning Tool
- By Paul Mah
- April 03, 2024
OpenAI has published a preview of Voice Engine, its model for creating custom voices.
First unveiled last year, Voice Engine uses text input and an audio sample to generate natural-sounding speech. Notably, a single 15-second sample is enough to create an emotive and realistic voice that closely resembles the original speaker.
To be clear, Voice Engine is already in use at OpenAI – it powers the preset voices in OpenAI’s text-to-speech API as well as the ChatGPT Voice and Read Aloud features.
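For readers who want to see what the existing speech API looks like in practice, here is a minimal sketch that builds a request to OpenAI's publicly documented speech endpoint using only the Python standard library. The helper names `build_speech_request` and `synthesize_to_file` are hypothetical and exist only for this illustration; the `tts-1` model and `alloy` voice are values listed in OpenAI's public documentation at the time of writing and may change. Note that this API offers preset voices only and does not expose voice cloning.

```python
import json
import os
import urllib.request

# Publicly documented endpoint for OpenAI's speech API.
SPEECH_URL = "https://api.openai.com/v1/audio/speech"


def build_speech_request(text, voice="alloy", model="tts-1", api_key=None):
    """Build (but do not send) a speech request.

    Hypothetical helper for illustration; not part of any OpenAI SDK.
    The defaults are assumptions taken from OpenAI's public docs.
    """
    # Fall back to the conventional environment variable if no key is passed.
    api_key = api_key or os.environ.get("OPENAI_API_KEY", "")
    payload = {"model": model, "input": text, "voice": voice}
    return urllib.request.Request(
        SPEECH_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(request, path="speech.mp3"):
    """Send the request and save the returned audio bytes.

    Requires a valid API key and network access.
    """
    with urllib.request.urlopen(request) as resp, open(path, "wb") as out:
        out.write(resp.read())
    return path
```

Calling `synthesize_to_file(build_speech_request("Hello from a preset voice."))` would, given a valid key, write the generated audio to `speech.mp3`.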
In a blog post, OpenAI highlighted several examples of how Voice Engine is used in private testing with a small group of partners.
For example, an educational firm has tapped into Voice Engine to provide reading assistance to non-readers and children through natural-sounding, emotive voices representing a wider range of speakers than is possible with preset voices.
Another firm has used Voice Engine to build an AI visual storytelling platform that creates custom, human-like avatars for a variety of content. It also uses Voice Engine for video translation, rendering a speaker's voice in multiple languages – while retaining the original accent.
According to a TechCrunch report, the Voice Engine model was trained on a mix of licensed and publicly available data. However, it doesn’t need any fine-tuning to clone a voice.
This is possible because Voice Engine uses a combination of a diffusion process and a transformer to generate speech. The model simultaneously analyzes the speech sample it draws from and the text meant to be read aloud, generating a matching voice without having to build a custom model per speaker.
Voice Engine is not currently available to the public. OpenAI says it is committed to developing safe and broadly beneficial AI, and is taking a cautious and informed approach before a broader release.
“We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small-scale tests, we will make a more informed decision about whether and how to deploy this technology at scale,” said OpenAI in its blog post.
With great power comes great responsibility. OpenAI says users should obtain explicit consent before cloning someone's voice, and it has called for open dialogue about the ethical implications of voice cloning technology to ensure responsible development and use.
Ultimately, while Voice Engine holds immense potential, responsible implementation and adherence to ethical guidelines are essential to prevent misuse. But are we ready for it?
Image credit: iStock/Mongkol Akarasirithada
Paul Mah
Paul Mah is the editor of DSAITrends, where he reports on the latest developments in data science and AI. A former system administrator, programmer, and IT lecturer, he enjoys writing both code and prose.