Money Talks: The Convergence of Artificial Intelligence and Voice Cloning
It’s no secret that voice cloning has become a lucrative industry, with countless investors eager to get in on the action.
Case in point: ElevenLabs, a startup that builds tools for creating and editing synthetic voices, recently announced that it has closed an $80 million Series B round. Co-led by Andreessen Horowitz, former GitHub CEO Nat Friedman, and entrepreneur Daniel Gross, the round brings the company’s total capital raised to $101 million. With a valuation of over $1 billion (up from ~$100 million last June), ElevenLabs is rapidly emerging as a leader in voice AI research and product deployment.
According to CEO Mati Staniszewski, the new capital will be used to advance product development, expand ElevenLabs’ infrastructure and team, conduct further AI research, and implement safety measures to ensure responsible and ethical development of AI technology.
“This new round of funding solidifies ElevenLabs’ position as the global leader in voice AI research and product deployment,” Staniszewski told TechCrunch in a recent email interview.
Founded in 2022 by Piotr Dabkowski, an ex-Google machine learning engineer, and Staniszewski, a former Palantir deployment strategist, ElevenLabs initially launched in beta just over a year ago. The idea for the company’s voice cloning tools came from the founders’ frustration with poorly dubbed American films and their belief that AI could do better.
ElevenLabs is perhaps best known for its browser-based speech generation app, which allows users to create lifelike voices with adjustable settings for intonation, emotion, cadence, and other key vocal characteristics. While the app is available for free to all users, paying customers can upload voice samples and use ElevenLabs’ advanced voice cloning technology.
The startup is also investing heavily in versions of its technology built for specific purposes, such as producing audiobooks, dubbing films and TV shows, and generating character voices for games and marketing campaigns.
Last year, ElevenLabs launched a “speech to speech” tool that preserves a speaker’s voice, prosody, and intonation while automatically removing background noise. This has proved particularly useful for film and TV dubbing, where it makes dialogue easier to translate and synchronize. Currently in the works are a new dubbing studio workflow, which includes tools for generating and editing transcripts and translations, and a subscription-based mobile app that can narrate webpages and text using ElevenLabs voices.
With these advancements, ElevenLabs has attracted high-profile clients such as Paradox Interactive, the game developer behind popular titles like Cities: Skylines 2 and Stellaris, and The Washington Post, as well as numerous other publishing, media, and entertainment companies. Staniszewski claims that ElevenLabs users have generated the equivalent of over 100 years of audio and that the platform is used by employees at 41% of Fortune 500 companies.
But with all the buzz surrounding ElevenLabs’ technology, some critics have also emerged.
One of the most notorious incidents involved users of the infamous message board 4chan, known for its conspiratorial content, who used ElevenLabs’ tools to share hateful messages imitating celebrities like actress Emma Watson. In another instance, The Verge’s James Vincent was able to maliciously clone voices in a matter of seconds, generating samples containing threats of violence, racism, and transphobia. In a separate case, Vice reporter Joseph Cox documented creating a clone convincing enough to fool a bank’s authentication system.
In response, ElevenLabs has taken steps to address these concerns. The company has attempted to identify and remove users who repeatedly violate its terms of service, which prohibit abuse, and has introduced a tool to detect speech created with its platform. In the coming months, the company plans to improve this detection tool so that it also flags audio generated by other voice-generating AI models. ElevenLabs is also partnering with unnamed “distribution players” to make the detection tool available on third-party platforms, according to Staniszewski.
ElevenLabs has also faced criticism from voice actors who claim the company uses samples of their voices without their consent. This raises concerns that the samples could be misused to promote content the actors do not endorse or to spread misinformation. In a recent Vice article, some victims described being harassed with ElevenLabs’ technology. In one example, a cloned voice was used to share an actor’s private information, including their home address.
Of course, there is also the larger concern of how platforms like ElevenLabs are changing the landscape of the voice acting industry.
An article by Motherboard explores how voice actors are increasingly being asked to give up the rights to their voices so that clients can use AI to generate synthetic versions that could eventually replace them, often without adequate compensation. The fear is that AI-generated vocals may eventually replace voice actors entirely, particularly for lower-paying entry-level work, leaving actors with no recourse.
Some companies are attempting to find a balance. Earlier this month, ElevenLabs’ competitor Replica Studios signed a deal with SAG-AFTRA, a media artist union, to create and license digital replicas of the union members’ voices. In a press release, the organizations stated that this partnership has established “fair” and “ethical” terms and conditions to ensure performer consent and negotiate compensation for the use of digital voice doubles in new works.
But even this has not fully appeased some voice actors, including SAG-AFTRA’s own members.
ElevenLabs’ solution is to create a marketplace for voices. Currently in its alpha stage, the marketplace is set to become widely available within the next few weeks. It allows users to create a voice, verify it, and share it with others. According to Staniszewski, when someone else uses a voice, its original creator receives compensation.
“Users always retain control over their voice’s availability and compensation terms,” he adds. “The marketplace is designed as a step towards harmonizing AI advancements with established industry practices, while also bringing a diverse set of voices to ElevenLabs’ platform.”
However, some voice actors may not be happy with the form of compensation offered by ElevenLabs – at least not initially. Currently, creators receive credit towards ElevenLabs’ premium services, which some may find ironic.
But as the best-funded synthetic voice startup, ElevenLabs is setting a high standard in the industry. They face competition from other emerging startups such as Papercup, Deepdub, Acapela, and Voice.ai, as well as tech giants like Amazon, Microsoft, and Google. However, with plans to increase their team from 40 to 100 employees by the end of the year, ElevenLabs is determined to stay at the forefront of the fast-growing synthetic voice market.