OpenAI Previews Voice Engine: Realistic AI-Generated Speech From Short Audio Samples

OpenAI has shared preliminary insights and results from a small-scale preview of its Voice Engine model. Voice Engine takes text input and a single 15-second audio sample and generates natural-sounding speech that closely resembles the original speaker’s voice. The model was first developed in late 2022 and powers the preset voices in OpenAI’s text-to-speech API, as well as ChatGPT Voice and Read Aloud.

Voice Engine’s ability to create emotive, realistic voices from small audio samples is impressive, but OpenAI is taking a cautious approach to deploying the technology more broadly. The company recognizes the potential for misuse of synthetic voices and wants to start a dialogue with stakeholders on how to deploy the technology responsibly and in a way that benefits society. The results of the small-scale preview will help inform OpenAI’s decision on whether and how to release the model more widely.

Early Applications and Benefits

To explore the potential uses and value of Voice Engine, OpenAI has been privately testing the technology with a small group of trusted partners. Some promising early applications include:

Enhancing education and reading assistance. Age of Learning, an educational technology company, has been using Voice Engine to generate voice-overs and real-time personalized responses for interacting with students. The model allows the company to create content with natural, emotive voices for a wider range of students.
Enabling content translation. Companies like HeyGen are using Voice Engine to translate videos and podcasts into multiple languages while preserving the speaker’s original voice and accent. This allows creators and businesses to fluently reach global audiences.
Improving essential services in remote communities. Dimagi is building tools powered by Voice Engine and GPT-4 to provide interactive feedback to community health workers in their primary languages, including Swahili and Sheng. This helps workers develop skills to deliver critical services.
Supporting non-verbal individuals. Livox’s AI-based augmentative and alternative communication (AAC) devices use Voice Engine to give people with speech disabilities unique, non-robotic voices across many languages. Users can choose a voice that represents them and keep it consistently.
Helping patients recover their voice. The Norman Prince Neurosciences Institute at the Lifespan health system has piloted a program that uses short audio samples to restore the voices of patients who have lost fluent speech due to medical conditions.

Safety Considerations and Responsible Deployment

OpenAI acknowledges the serious risks associated with generating speech that mimics real people’s voices, especially in the context of an election year. The company is actively engaging with government entities, media, educators, civil society groups, and other stakeholders to incorporate their feedback into the development process.

Partners testing Voice Engine must adhere to usage policies that prohibit impersonation without consent. They must also obtain informed permission from the original speakers and clearly disclose that the voices are AI-generated. OpenAI has also implemented safety measures such as watermarking and proactive monitoring.

Looking ahead, OpenAI believes that synthetic voice technology should be accompanied by mechanisms that verify the original speaker’s consent and that prevent the creation of voices too similar to those of prominent figures. The company encourages steps to bolster societal resilience, such as:

Phasing out voice-based authentication for sensitive information
Exploring policies to protect individuals’ voice rights
Educating the public on AI capabilities and limitations
Accelerating the development of tools that track the origin of AI-generated content

OpenAI aims to openly share the possibilities of AI while carefully considering the implications. By previewing Voice Engine, the company hopes to motivate important conversations around the challenges and opportunities presented by increasingly convincing generative AI models.

OpenAI says it looks forward to continuing to engage with policymakers, researchers, developers, and creatives as synthetic voice technology advances.
