Deepgram, a well-funded voice recognition startup, today released Aura, its real-time text-to-speech API. Aura combines realistic voice models with a low-latency API, letting developers build real-time conversational AI agents. These agents could, in turn, stand in for customer service agents in call centers and other customer-facing settings.
According to Deepgram’s co-founder and CEO Scott Stephenson, high-quality voice models have been available for some time, but their cost and computing time have been significant roadblocks. Low-latency models, on the other hand, often produce robotic-sounding voices. With Aura, Deepgram says it has found the sweet spot: human-like voice models that render in less than half a second, at an affordable price.
“Everybody now is like: ‘hey, we need real-time voice AI bots that can perceive what is being said and that can understand and generate a response – and then they can speak back,’” Stephenson said. He emphasized that accuracy, low latency, and reasonable costs are all necessary for businesses to see the value in a product like Aura, especially given the high price of accessing large language models (LLMs).
Deepgram prices Aura at $0.015 per 1,000 characters, undercutting most of its competitors. That is slightly lower than Google’s WaveNet voices and Amazon’s Neural voices, both of which cost $0.016 per 1,000 characters. However, Deepgram understands that price isn’t the only factor for businesses, and their top-tier option may be more expensive.
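To put the per-character rates in perspective, here is a minimal Python sketch that converts them into a total bill for a given volume of text. The three rates are the ones quoted in this article; the function names and the 10-million-character monthly volume are hypothetical and chosen only for illustration, and actual bills may differ depending on tiers or discounts not covered here.

```python
# Per-1,000-character rates in USD, as quoted in the article.
RATES_PER_1K_CHARS = {
    "Deepgram Aura": 0.015,
    "Google WaveNet": 0.016,
    "Amazon Neural": 0.016,
}


def tts_cost(num_chars: int, rate_per_1k: float) -> float:
    """Return the cost in USD of synthesizing num_chars characters."""
    return num_chars / 1000 * rate_per_1k


def compare_costs(num_chars: int) -> dict:
    """Return the cost per provider, rounded to whole cents."""
    return {
        name: round(tts_cost(num_chars, rate), 2)
        for name, rate in RATES_PER_1K_CHARS.items()
    }


if __name__ == "__main__":
    monthly_chars = 10_000_000  # hypothetical monthly volume, not from the article
    for name, cost in compare_costs(monthly_chars).items():
        print(f"{name}: ${cost:,.2f}")
```

At that hypothetical volume, the $0.001 gap per 1,000 characters works out to about $10 a month, which suggests that at list prices the difference only becomes significant for very high-volume deployments.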
“We strive to hit a significant price point across all segments, while maintaining exceptional accuracy and speed,” Stephenson shared, describing Deepgram’s approach to developing their product. “This is no easy feat, but it’s what we’ve been focused on from the start. We spent four years building the necessary infrastructure to make this a reality.”
Aura currently offers approximately a dozen voice models, all trained on a dataset that Deepgram curated with voice actors. Like all of the company’s models, the Aura model was trained in-house.
To experience Aura for yourself, you can try Deepgram’s demo. After testing it out, I can tell you that while there may be the occasional odd pronunciation, the speed is what truly sets Aura apart. Deepgram’s existing high-quality speech-to-text model also complements it well. To highlight Aura’s response time, Deepgram notes the time it takes for the model to start speaking (usually less than 0.3 seconds) and how long the LLM takes to generate its response (usually under a second).