ATD Blog
WellSaid Labs develops text-to-speech technology that creates lifelike synthetic voices from the voices of real people.
Wed Jul 31 2024
The generative artificial intelligence (AI) market emerged in January 2023, and it is now estimated that half of the Fortune 500 use “AI voice.” Google Trends data shows that searches for phrases like “AI voice,” “AI tools,” and “generative AI” surged in the winter of 2023 and have stayed high since. This surge has profoundly influenced how voice is created across industries and has shifted the terminology: what was once called text-to-speech is now AI voice, AI speech, or AI voice cloning.
Amid the buzz that AI will change everything from predictive analytics to video editing, AI voice has quietly emerged as a practical tool for learning and development teams. Surveys indicate that 70 percent of learning and development professionals have recognized the potential of AI and are exploring its role in their daily tasks.
Here are five aspects of how teams are evaluating AI voice in 2024.
1. How are the AI voices sourced?
2. How will we create voice with text-to-speech or voice cloning?
3. Evaluate vendors, collect voice samples, and set up a test for colleagues.
4. Cross-check your workflow, voice naturalness over multiple projects, and ability to fine-tune emphasis and emotion.
5. Project ROI over multiple years.
Not all AI voices are created equal, and many AI voices have similar origins. Entry-level AI voice platforms lease AI voices from Google, Amazon, Microsoft, or IBM, filter them, and call them their own. Truly breakthrough vendors have a proprietary approach and have trained their voices through different ethical lenses and modeling. Closed data sets, with voices from professional voice actors, have been one of the leading approaches to satisfying copyright and compensation concerns. Ask vendors how they source their voices, cross-check with your organization for alignment, and be wary of cloning your voice through a platform and releasing it under unrestricted copyright.
If you’re accustomed to recording your voice, AI voice cloning, which uses your own narration to create an AI voice, is one approach. This workflow lends itself well to solo content creators or trainers who want to scale their voiceovers. Text-to-speech works for teams that exchange scripts and work in an environment where scripts are approved and edits are completed over time. It’s not uncommon for a vendor to offer both approaches. Usually, the question is about the fidelity and cost of modeling a custom voice.
Once you have narrowed your vendors down, initiate a trial with each of them and collect samples of your scripts rendered in different voices for review. Then, set up a blind test and ask colleagues to rate the samples. Make sure to render clips longer than one minute so that mispronounced words have a chance to surface; platforms that need the most cleanup work for pronunciation, emphasis, and emotion will be more frustrating in the long run.
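One way to keep the blind test honest is to shuffle the clips and hide the vendor names behind neutral labels before sharing them. The sketch below is only an illustration under assumptions: the function names, the 1-to-5 rating scale, and the vendor and file names are hypothetical and not taken from any vendor's tooling.

```python
import random


def build_blind_test(samples, seed=None):
    """Shuffle (vendor, clip) pairs and hide vendors behind neutral labels.

    Returns a ballot of (label, clip) pairs to share with colleagues and an
    answer key mapping each label back to its vendor. Clip file names should
    not reveal the vendor either; rename files before sharing if they do.
    """
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    ballot = []
    key = {}
    for number, (vendor, clip) in enumerate(shuffled, start=1):
        label = f"Sample {number}"
        ballot.append((label, clip))
        key[label] = vendor
    return ballot, key


def average_scores(ratings, key):
    """Average each vendor's scores across all colleagues.

    `ratings` is a list of dicts, one per colleague, mapping a ballot label
    to a score (assumed here to be on a 1-to-5 scale).
    """
    scores_by_vendor = {}
    for colleague_ratings in ratings:
        for label, score in colleague_ratings.items():
            scores_by_vendor.setdefault(key[label], []).append(score)
    return {
        vendor: sum(scores) / len(scores)
        for vendor, scores in scores_by_vendor.items()
    }


if __name__ == "__main__":
    # Hypothetical vendors and clip names, for illustration only.
    samples = [("Vendor A", "clip_01.wav"), ("Vendor B", "clip_02.wav")]
    ballot, key = build_blind_test(samples, seed=7)
    ratings = [{"Sample 1": 4, "Sample 2": 3}, {"Sample 1": 5, "Sample 2": 2}]
    print(ballot)
    print(average_scores(ratings, key))
```

Keep the answer key private until all ratings are in, then use the averages as one input alongside the cleanup effort each platform required.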
List all teammates who will share access to the AI voice studio. Many organizations start small and add seats as they adopt AI voices in different departments and use cases. Most vendors export audio in conventional formats like WAV and MP3. As with any other tool, make sure the trial leaves you confident that the UI is easy to navigate.
Some editors offer robust emotional controls. The upside is creative control; the downside is that additional editing takes time. Determining which vendor renders the best script in the shortest time will pay dividends, especially if your organization can only onboard new tools occasionally.
Traditional voiceover costs between $30 and $55 per finished minute of audio. Using this as a benchmark, your team can calculate the financial impact of traditional voiceovers vs. AI voice platforms.
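The comparison can be sketched in a few lines of Python. Only the $30 to $55 per finished minute range comes from the benchmark above; the annual minutes, seat count, per-seat price, and the flat per-seat pricing model are hypothetical placeholders, so substitute your own volume and your vendor's actual quote.

```python
# Benchmark from the article: traditional voiceover cost per finished minute.
TRADITIONAL_RATE_LOW = 30.0
TRADITIONAL_RATE_HIGH = 55.0


def traditional_cost(minutes_per_year, years, rate_per_minute):
    """Total traditional voiceover spend over the projection period."""
    return minutes_per_year * years * rate_per_minute


def ai_voice_cost(seats, annual_price_per_seat, years):
    """Total AI voice platform spend, assuming flat per-seat annual pricing."""
    return seats * annual_price_per_seat * years


def projected_savings(minutes_per_year, years, seats, annual_price_per_seat):
    """Return (low, high) savings across the traditional rate range."""
    ai_total = ai_voice_cost(seats, annual_price_per_seat, years)
    low = traditional_cost(minutes_per_year, years, TRADITIONAL_RATE_LOW) - ai_total
    high = traditional_cost(minutes_per_year, years, TRADITIONAL_RATE_HIGH) - ai_total
    return low, high


if __name__ == "__main__":
    # Hypothetical inputs: 200 finished minutes a year, 3 years,
    # 2 seats at an assumed $1,000 per seat per year.
    low, high = projected_savings(
        minutes_per_year=200, years=3, seats=2, annual_price_per_seat=1000.0
    )
    print(f"Projected savings over 3 years: ${low:,.0f} to ${high:,.0f}")
```

A negative result means the platform costs more than traditional voiceover at that volume. The sketch leaves out editing time and onboarding effort, which can be added as further cost terms.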