OpenAI Misses Easy Layup to Champion Property Rights, Opting Instead for “Dialogue” About “How Society Can Adapt”
Putting the brakes on public deployment of its voice synthesis AI, OpenAI missed an easy opportunity to take the high road. Rather than remind its audience that unauthorized voice impersonation violates personal property rights, OpenAI calls for “dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities.”
On March 29, 2024, OpenAI published a blog post entitled “Navigating the Challenges and Opportunities of Synthetic Voices - We’re sharing lessons from a small-scale preview of Voice Engine, a model for creating custom voices.” Unfortunately, OpenAI doesn’t take the opportunity to acknowledge the threat to personal property rights. In a remarkably tone-deaf moment, OpenAI also ignores the modernization of personal-rights law currently sweeping the US, seen for example in Tennessee’s ELVIS Act, which codifies a person’s property rights in their own voice and was signed into law only last week. (See the Briefing Room comment.)
Here’s an abbreviated version of OpenAI’s post:
___________________________________________________________
… Today we are sharing preliminary insights and results from a small-scale preview of a model called Voice Engine, which uses text input and a single 15-second audio sample to generate natural-sounding speech that closely resembles the original speaker. It is notable that a small model with a single 15-second sample can create emotive and realistic voices.
… we are taking a cautious and informed approach to a broader release due to the potential for synthetic voice misuse. We hope to start a dialogue on the responsible deployment of synthetic voices, and how society can adapt to these new capabilities. Based on these conversations and the results of these small scale tests, we will make a more informed decision about whether and how to deploy this technology at scale.
Early applications of Voice Engine
To better understand the potential uses of this technology, late last year we started privately testing it... These small scale deployments are helping to inform our approach, safeguards, and thinking about how Voice Engine could be used for good across various industries. A few early examples include:
Providing reading assistance to non-readers and children through natural-sounding, emotive voices representing a wider range of speakers than what's possible with preset voices. … use Voice Engine and GPT-4 to create real-time, personalized responses to interact with students.
…
Translating content, like videos and podcasts, so creators and businesses can reach more people around the world, fluently and in their own voices. … Voice Engine for video translation, so they can translate a speaker's voice into multiple languages and reach a global audience. When used for translation, Voice Engine preserves the native accent of the original speaker: for example generating English with an audio sample from a French speaker would produce speech with a French accent.
…
Reaching global communities, by improving essential service delivery in remote settings. … tools for community health workers to provide a variety of essential services … To help these workers develop their skills … uses Voice Engine and GPT-4 to give interactive feedback in each worker's primary language …
Supporting people who are non-verbal, such as therapeutic applications for individuals with conditions that affect speech and educational enhancements for those with learning needs. …
Helping patients recover their voice, for those suffering from sudden or degenerative speech conditions. …
Building Voice Engine safely
We recognize that generating speech that resembles people's voices has serious risks, which are especially top of mind in an election year. …
The partners testing Voice Engine today have agreed to our usage policies, which prohibit the impersonation of another individual or organization without consent or legal right. In addition, our terms with these partners require explicit and informed consent from the original speaker and we don’t allow developers to build ways for individual users to create their own voices. Partners must also clearly disclose to their audience that the voices they're hearing are AI-generated. Finally, we have implemented a set of safety measures, including watermarking to trace the origin of any audio generated by Voice Engine, as well as proactive monitoring of how it's being used.
We believe that any broad deployment of synthetic voice technology should be accompanied by voice authentication experiences that verify that the original speaker is knowingly adding their voice to the service and a no-go voice list that detects and prevents the creation of voices that are too similar to prominent figures.
Looking ahead
… In line with our approach to AI safety and our voluntary commitments, we are choosing to preview but not widely release this technology at this time. We hope this preview of Voice Engine both underscores its potential and also motivates the need to bolster societal resilience against the challenges brought by ever more convincing generative models. Specifically, we encourage steps like:
Phasing out voice based authentication as a security measure for accessing bank accounts and other sensitive information
Exploring policies to protect the use of individuals' voices in AI
Educating the public in understanding the capabilities and limitations of AI technologies, including the possibility of deceptive AI content
Accelerating the development and adoption of techniques for tracking the origin of audiovisual content, so it's always clear when you're interacting with a real person or with an AI
It's important that people around the world understand where this technology is headed, whether we ultimately deploy it widely ourselves or not. We look forward to continuing to engage in conversations around the challenges and opportunities of synthetic voices with policymakers, researchers, developers and creatives.
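___________________________________________________________

Technically, the safeguards OpenAI proposes are straightforward to prototype. Here is a minimal sketch of the “no-go voice list” idea from the post, assuming a hypothetical embed_voice() function that maps an audio sample to a fixed-length speaker embedding; the 0.85 similarity threshold is likewise illustrative, not anything OpenAI has published:

    # Minimal sketch of a "no-go voice list" screen. embed_voice() and the
    # 0.85 threshold are hypothetical illustrations, not an actual OpenAI API.
    import numpy as np

    def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
        """Cosine similarity between two speaker embeddings."""
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def violates_no_go_list(candidate: np.ndarray,
                            blocked: list[np.ndarray],
                            threshold: float = 0.85) -> bool:
        """Reject a requested voice too similar to any protected voice."""
        return any(cosine_similarity(candidate, b) >= threshold for b in blocked)

    # Usage: screen a cloning request before any synthesis happens.
    # candidate = embed_voice("sample.wav")                  # hypothetical
    # if violates_no_go_list(candidate, protected_voices):
    #     raise PermissionError("Voice too similar to a listed figure")

The hard part, of course, is not this check but deciding who belongs on the list and who gets to put them there.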
Clearly, voice synthesis AI can be used for good (e.g., as an educational tool, a translator, or a voice prosthetic). The elephant in the room, however, is that these may be corner cases. Would OpenAI be better served by taking the personal property rights issues head-on and openly committing to developing authorization frameworks and requiring their use? We think so. Instead, OpenAI squishily characterizes the issues as “safety” concerns in an “election year,” and describes how it “believes” the technology should be used. What do you think?
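For our part, the authorization framework we have in mind need not be exotic. Here is a minimal sketch of a consent-gated synthesis check (every name in it is hypothetical, not any actual OpenAI interface): synthesis is refused unless the voice’s owner has a signed, unexpired consent record on file that covers the requested use.

    # Minimal sketch of a consent-gated synthesis check. All names here
    # (ConsentRecord, is_authorized, ...) are hypothetical illustrations.
    from dataclasses import dataclass
    from datetime import datetime, timezone

    @dataclass
    class ConsentRecord:
        speaker_id: str           # stable identifier for the voice's owner
        permitted_uses: set[str]  # e.g. {"translation", "accessibility"}
        expires: datetime         # consent should not be open-ended
        signature: bytes          # owner's signature over these terms

    def is_authorized(record: ConsentRecord | None, use: str) -> bool:
        """Refuse synthesis without explicit, current, use-specific consent."""
        if record is None:
            return False                         # no consent on file
        if datetime.now(timezone.utc) > record.expires:
            return False                         # consent has lapsed
        # A real framework would also verify record.signature against the
        # owner's public key before honoring the record.
        return use in record.permitted_uses

The particulars matter less than the posture: consent as a precondition enforced in code, not a usage-policy promise.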