Archive for Deepgram

Guest Post: Why Voice AI Could Be the Career Move That Puts You on the Executive Shortlist

Posted in Commentary with tags on September 26, 2025 by itnerd

By Praveen Rangnath, CMO, Deepgram

If you’re in IT leadership, you’ve seen tech trends come and go. Some flare up in the headlines, get hyped at conferences, and then quietly fade away. Others — the rare ones — change the way entire industries operate.

Voice AI is one of the rare ones.

The problem is, a lot of people think they already understand it. They’ll nod along and say, “Oh yeah, like Siri or Alexa, right?” And if you leave it at that, you’ll miss the fact that Voice AI in the enterprise has almost nothing to do with consumer gadgets.

What we’re talking about is the ability to capture, understand, and act on every conversation your business has — with customers, employees, suppliers, regulators — in real time. It’s a shift from voice being something that “just happens” to voice becoming one of your most valuable data assets.

And here’s the kicker: if you’re the IT leader who drives that transformation, it won’t just make your company more competitive — it’ll make you more promotable.

Why This Is Bigger Than a Tech Upgrade

Think about all the conversations that happen in your organization in a single day:

  • A frustrated customer explaining a problem to support.
  • A nurse speaking with a patient about symptoms.
  • A technician describing a fault to a field supervisor.
  • A sales rep negotiating terms with a client.
  • Two department heads debating how to allocate budget.

Historically, those conversations disappear the second they end. Maybe you get a line or two typed into a CRM. Maybe a call recording sits on a server somewhere, unlistened to. But you’re losing context, insights, and opportunities every single time.

Voice AI changes the game. It can:

  • Transcribe those conversations in real time — accurately, even with accents, jargon, or background noise.
  • Understand the meaning, not just the words.
  • Detect urgency, emotion, and intent.
  • Feed that intelligence into your existing systems so it’s searchable, reportable, and actionable.

This isn’t about making calls “sound nicer.” It’s about turning the most human part of business — conversation — into something you can measure, learn from, and improve at scale.

Why IT Management Should Own This

If you’re thinking, “This sounds like a customer service project,” you’re not wrong — but you’re not right, either.

Customer service, sales, compliance, operations — they all want the benefits. But none of them can roll this out enterprise-wide without IT leadership guiding the architecture, governance, and integrations.

That means you have a rare opportunity:

  • Lead a high-impact initiative that crosses silos.
  • Control the data governance and compliance from day one (critical if you’re in finance, healthcare, or government).
  • Choose technology that scales, so this doesn’t become another point-solution mess.

When IT drives Voice AI adoption, it’s not just “supporting the business.” It’s reshaping how the business works. That’s the kind of strategic leadership that the C-suite and boards notice.

The Career Capital Play

Here’s the part some IT leaders miss: delivering a Voice AI initiative isn’t just good for the company — it’s good for your career.

  1. You’ll be seen as an innovator — not just the person who keeps the lights on. You’ll have a real-world example of bringing in a new capability that directly ties to revenue, customer satisfaction, and operational efficiency.
  2. You’ll get visibility with the C-suite — because every major function will be affected, and you’ll be the one making sure it all works.
  3. You’ll have hard metrics to show — cost savings, reduced call times, faster onboarding, higher CSAT, better compliance records. These are the kind of results that get repeated in performance reviews.
  4. You’ll build executive allies across departments — sales, marketing, operations, and compliance will all have wins they can point to because of your project. That makes you easier to promote.

If You Think You Already Understand It — This Could Be the Gap

A lot of tech leaders think Voice AI means “speech-to-text” plus maybe a chatbot. That’s like saying the internet is just “email plus websites.”

The real power is when Voice AI is:

  • Real-time — not hours later.
  • Context-aware — understanding your specific business language and workflows.
  • Integrated — feeding into CRM, ERP, analytics, and compliance systems automatically.
  • Scalable — able to handle every department, every language, every channel without breaking.

That’s when it stops being a tool and becomes infrastructure, and infrastructure projects with measurable business wins are the ones that get you invited to the big table.

Your Next Moves

If you want this to be a career-making initiative, here’s where you can start:

  1. Map Your Conversation Ecosystem — Where in your organization do high-value voice interactions happen daily?
  2. Identify High-Impact Use Cases — Pick two or three where you can prove ROI quickly (e.g., customer support, compliance-heavy calls, field service updates).
  3. Get Cross-Functional Buy-In Early — Loop in operations, CX, compliance, and sales from day one.
  4. Test for Accuracy First — Before you get dazzled by AI features, nail transcription quality. Everything else depends on it.
  5. Plan for Scale — Choose solutions that can grow beyond your pilot without creating security or integration headaches.

Bottom line: Voice AI is more than a technology trend — it’s a platform for delivering visible, measurable business wins. If you own it, you don’t just modernize the company. You modernize your own career path.

The leaders who make this move now won’t just be part of the conversation. They’ll be running it.

Deepgram and AWS to Present Live Webinar: “Building AI Voice Agents with Deepgram + AWS Bedrock” 

Posted in Commentary with tags on September 8, 2025 by itnerd

Deepgram is joining with Amazon Web Services (AWS) to present a live webinar titled "Building AI Voice Agents with Deepgram + AWS Bedrock" tomorrow, Tuesday, September 9, from 10:30 AM to 11:30 AM PDT.

Deepgram’s Voice Agent API brings lightning-fast speech-to-text and lifelike text-to-speech together with event hooks and speaker diarization, all in real time. Amazon Bedrock gives you instant access to leading foundation models like Claude and Titan, with built-in safety, compliance, and flexibility, making it well suited to powering voice agents with real intelligence.

Attendees will learn how to build scalable, responsive AI voice agents that actually work in production.

What You’ll See & Learn:

  • Build & Deploy in Minutes – See how Deepgram’s streaming API and Bedrock’s managed LLMs make real-time, voice-driven agents possible without stitching together brittle services.
  • Smarts + Speed in Action – Watch a live demo that showcases accurate transcription, rapid LLM responses, and the power of few-shot or RAG-based responses, all with sub-second latency.
  • Enterprise-Ready Architecture – Learn how to deploy with VPC, IAM, encryption, and autoscaling, all while controlling cost and optimizing performance.
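
For readers curious what the Deepgram-to-Bedrock handoff looks like in code, here is a minimal sketch: it packages a finished Deepgram transcript as a request body for Bedrock's Converse API. The model ID, prompt wording, and helper function are illustrative assumptions on my part, not material from the webinar.

```python
# Illustrative sketch: wiring a Deepgram transcript into an Amazon Bedrock
# Converse call. Model ID and prompt are placeholder assumptions.

def build_bedrock_request(transcript: str,
                          model_id: str = "anthropic.claude-3-haiku-20240307-v1:0") -> dict:
    """Package a call transcript as keyword arguments for Bedrock Converse."""
    return {
        "modelId": model_id,
        "messages": [
            {"role": "user",
             "content": [{"text": f"Summarize this call transcript:\n{transcript}"}]}
        ],
        "inferenceConfig": {"maxTokens": 256, "temperature": 0.2},
    }

# In production you would pass these keyword arguments to
# boto3.client("bedrock-runtime").converse(**request).
request = build_bedrock_request("Customer reports a billing error on invoice 4412.")
print(request["messages"][0]["role"])  # → user
```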

Learn more and register here: https://luma.com/d3qf3t8s, or I would be happy to get you signed up to attend.

Deepgram’s Unfiltered Views on The Announcement From OpenAI

Posted in Commentary with tags on September 2, 2025 by itnerd

OpenAI just made an announcement titled “Introducing gpt-realtime and Realtime API updates for production voice agents,” which can be found here: https://openai.com/index/introducing-gpt-realtime/

Scott Stephenson, CEO and Founder of Deepgram, would like to respectfully offer the following thoughts on this news:

“OpenAI’s new model shows progress, but the benchmarks make it clear: latency, turn-taking, and lack of control remain its Achilles’ heel in real conversations,” said Scott Stephenson, CEO and Founder, Deepgram. “When you measure what makes conversations actually work — speed, politeness, and turn-taking — Deepgram still leads the pack. The benchmarks confirm what users feel: conversations with Deepgram just flow more naturally.”

Stephenson continued, “Why does this matter? In real-world deployments, people don’t judge a voice agent by its feature set — they judge it by how the conversation feels. Latency and turn-taking aren’t technical footnotes; they’re the difference between a helpful interaction and a frustrating one. That’s why benchmarks that measure conversational flow, not just functionality, are the true indicator of readiness for production.”

Benchmarks That Back It Up 

  • #1 across all tests: Deepgram ranked highest under every VAQI weighting — equal, politeness-heavy, and latency-heavy.
  • Politer conversations: Fewest interruptions, meaning agents don’t talk over users. 
  • Faster responses: Sub-second average latency (0.85s) vs. OpenAI’s 2.55s. 
  • Smarter timing: Strong turn-taking with a competitive miss rate (0.427). 
  • Consistent edge: Even when benchmarks shifted priorities, results held — Deepgram stayed on top.

Source: VAQI Benchmark, August 2025

Deepgram published a blog today with further details: https://deepgram.com/learn/vaqi-openai-gpt-realtime-test-with-sensitivity-analysis

Deepgram Signs Strategic Collaboration Agreement with AWS

Posted in Commentary with tags on August 19, 2025 by itnerd

Deepgram today announced that it has signed a strategic collaboration agreement (SCA) with Amazon Web Services (AWS). The multi-year agreement deepens Deepgram’s relationship with AWS and reflects a shared commitment to accelerating the development and adoption of generative voice AI. As part of the collaboration, Deepgram will expand co-selling and go-to-market efforts, integrate more deeply with AWS services, and empower enterprises to build scalable, high-accuracy voice applications across a wide range of use cases. 

Innovative startups and Fortune 100 enterprises alike are already transforming customer experiences using Deepgram and AWS. One Fortune 20 healthcare company uses Deepgram’s speech models on secure, scalable AWS infrastructure to modernize its contact center operations and deliver faster, more personalized customer support.

As a Generative AI Competency Partner and long-standing AWS Partner Network (APN) member, Deepgram offers a full-featured voice AI platform that includes speech-to-text (STT), text-to-speech (TTS), and speech-to-speech (STS) capabilities. Additionally, Deepgram’s Dedicated deployment and EU endpoints run entirely on AWS infrastructure, enabling enterprise customers to meet global requirements for data residency, security, and compliance.

Deepgram’s infrastructure is deeply integrated with AWS, enabling customers to deploy its platform on Amazon EKS for scalable container orchestration, store data securely with Amazon S3, and manage containers using Amazon ECR. Customers can also use Amazon API Gateway and AWS Lambda to securely orchestrate interactions between Deepgram’s voice AI APIs and other services, including Amazon Bedrock hosted models and enterprise systems. Whether deployed in a customer-owned VPC or as a fully managed SaaS environment, Deepgram offers the flexibility required to maintain compliance, ensure data control, and operate efficiently at scale. Looking ahead, Deepgram plans to expand availability through AWS services like Amazon SageMaker and Amazon Bedrock to further streamline AI model deployment and orchestration.

Deepgram’s speech-to-text API can also be integrated into Amazon Connect, enabling best-in-class STT speed and accuracy for real-time transcription and voice automation within contact center environments. This helps enterprises improve agent productivity, automate call summaries, and enhance customer experiences.

As part of the SCA, Deepgram will invest in building GenAI-enabled capabilities on AWS, deliver new case studies and proof-of-concepts for enterprise customers, and continue optimizing its models and services for the AWS ecosystem.

Deepgram’s availability in AWS Marketplace also simplifies procurement for engineering and infrastructure teams by enabling usage-based pricing, unified billing, and rapid deployment within existing AWS environments.

Learn more about the partnership by visiting deepgram.com/aws, or start building with Deepgram on AWS today by exploring their listing on the AWS Marketplace.

Deepgram Expands Internationally, Launches Managed Single-Tenant Deployment Option

Posted in Commentary with tags on July 30, 2025 by itnerd

Voice AI is rapidly becoming foundational infrastructure across industries, powering real-time agents, compliance-sensitive workflows, and multilingual applications at scale. As global adoption accelerates, so does the demand for flexible deployment models, regional hosting, and production-grade reliability.

To meet that demand, Deepgram is announcing two major infrastructure expansions:

  • The general availability of Deepgram Dedicated, a fully managed, single-tenant runtime
  • The early access launch of our EU-hosted API endpoint, enabling in-region inference for European workloads

These launches reflect a broader shift in how voice AI is being deployed, and they come at a time of growing industry validation. This month, Deepgram Nova-3 was named a 2025 Voice AI Technology Excellence Award winner by TMC’s CUSTOMER magazine, recognizing its leadership in accuracy, real-time multilingual transcription, and self-serve customization.

Together, these milestones reinforce Deepgram’s commitment to providing voice AI infrastructure that supports enterprise-scale performance, compliance, and geographic flexibility.

What It Means to Go Global with Voice AI

Going global starts with supporting the world’s languages. Deepgram already supports over 36 languages for customers worldwide and will continue expanding language coverage throughout 2025. 

But language support is only the beginning.

For engineering teams building production-grade systems, global voice AI also requires solving for infrastructure and compliance demands as workloads expand across regions. As enterprises scale voice workloads globally, we continue to hear two common friction points: the growing complexity of managing infrastructure across regions and tightening data policies, particularly in the EU, that require stricter control over where and how voice data is processed.

These demands include:

  • Ultra low-latency inference paths. Real-time applications require models to run as close to the end user as possible to minimize round-trip time and meet interaction thresholds.
  • Data residency and legal jurisdiction. Voice data often must be processed and stored within specific geographic boundaries to meet regulatory requirements such as GDPR.
  • Single-tenant isolation for sensitive workloads. Some environments require dedicated infrastructure to enforce data segregation, meet compliance standards, or satisfy internal security policies.
  • Scalable operations without added DevOps burden. Expanding voice workloads across regions should not require a proportional increase in infrastructure engineering.

Deepgram’s platform was designed with these requirements in mind, providing the foundation needed to operationalize voice AI reliably and securely across global environments.

Introducing Deepgram Dedicated: A Managed, Single-Tenant Runtime

Enterprises adopting voice AI at scale often face a difficult tradeoff: maintain control over infrastructure and data by self-hosting, or prioritize ease of use through shared, multi-tenant cloud APIs. Self-hosting offers isolation and regional control, but introduces significant ongoing operational complexity. Managed service providers can help bridge the gap, but they often lack product-level expertise and introduce dependency overhead that slows down feature adoption.

Now generally available, Deepgram Dedicated closes this gap. It is a fully managed, single-tenant deployment of Deepgram’s voice AI platform that offers the control and flexibility of self-hosted infrastructure without the burden of operating it. Over the past six months, it has been deployed with a select group of enterprise customers in early production across a range of use cases, from real-time contact center platforms to globally distributed voice agents.

Teams gain regional isolation, performance control, and compliance alignment while offloading infrastructure management to Deepgram. Deepgram Dedicated currently runs on AWS, with support for additional cloud providers on the roadmap.

Key Highlights:

  • Single-tenant architecture: Each deployment runs on isolated compute, avoiding noisy neighbor effects and supporting strict data segregation.
  • Unified voice AI stack: Run speech-to-text, text-to-speech, and speech-to-speech workloads in a single runtime with consistent API behavior.
  • Multi-cluster design: Separates real-time, pre-recorded, and agent workloads onto specialized clusters to maximize performance, ensure high availability, and enable strict workload isolation.
  • Region-specific infrastructure: Deploy in your preferred cloud region to meet compliance requirements, enable ultra-low latency, and align with internal policies, including support for country-level deployments.
  • SLA-backed performance: Optional SLAs ensure predictable uptime and latency with defined targets monitored and enforced by Deepgram.

In one modeled scenario, a customer supporting 1,000 concurrent real-time streams would spend approximately $467K USD annually if self-hosting. This includes $250K in DevOps headcount and $98K in infrastructure costs.

Running the same workload on Deepgram Dedicated lowers total OPEX by approximately $98K USD per year. It also reduces engineering overhead and improves deployment reliability through platform-managed SLAs and regional isolation, giving teams more time to focus on higher-impact work.

EU-Hosted API Endpoint: In-Region Inference for European Voice Workloads

Voice AI adoption is accelerating across Europe, driven by demand for real-time applications in finance, public services, retail, and telecommunications. To date, more than two dozen customers and prospects have expressed interest in EU-based infrastructure, highlighting growing demand for in-region processing that meets local performance expectations and regulatory requirements without compromising model quality or flexibility.

To support this, Deepgram is launching early access to api.eu.deepgram.com, a new EU-hosted speech-to-text API endpoint that delivers in-region inference with full feature parity and consistent performance. The EU endpoint is hosted in AWS EU regions, with additional hosting options under consideration.

Key Highlights:

  • Voice data stays within the EU: All processing occurs inside EU-based AWS regions, ensuring no cross-border data transfer.
  • Latency improvements for EU-based users: Localized inference reduces round-trip time for applications serving users in or near the EU.
  • No code changes required: Existing integrations can migrate by updating the base URL, with no other changes needed.
  • Supports GDPR compliance and auditability: The deployment is fully isolated within the EU legal boundary and aligned with regional data protection standards.

This endpoint is well-suited for European ISVs, compliance-focused enterprises, and global teams looking to reduce latency and streamline deployment in the EU.
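To make the “update the base URL” migration concrete, here is a hedged Python sketch that assembles (but does not send) a pre-recorded transcription request. The `/v1/listen` path and query parameters follow Deepgram’s public documentation; the API key is a placeholder.

```python
# Sketch of migrating an existing Deepgram speech-to-text integration to the
# EU-hosted endpoint: per the announcement, only the base URL changes.

DEFAULT_BASE = "https://api.deepgram.com"
EU_BASE = "https://api.eu.deepgram.com"

def transcription_request(base_url: str, api_key: str, model: str = "nova-3") -> dict:
    """Build the pieces of a pre-recorded transcription request (not sent here)."""
    return {
        "url": f"{base_url}/v1/listen",
        "headers": {"Authorization": f"Token {api_key}",
                    "Content-Type": "audio/wav"},
        "params": {"model": model, "smart_format": "true"},
    }

# Switching region is a one-line change:
us_req = transcription_request(DEFAULT_BASE, "YOUR_API_KEY")
eu_req = transcription_request(EU_BASE, "YOUR_API_KEY")
print(eu_req["url"])  # → https://api.eu.deepgram.com/v1/listen
```

An existing integration would pass the same audio payload and parameters to either URL, for example with `requests.post(eu_req["url"], headers=eu_req["headers"], params=eu_req["params"], data=audio_bytes)`.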

Why This Matters: A Global-Ready Voice AI Platform

With these additions, Deepgram now supports a range of deployment options, including multi-tenant hosted APIs, fully managed single-tenant deployments, and customer-operated self-hosted infrastructure. This flexibility allows engineering teams to choose the right model based on their application requirements, compliance obligations, and operational preferences. For some, the hosted API provides a fast path to integration. Others may require the regional data residency of the EU endpoint or the isolation and control of a Dedicated deployment. Teams with existing DevOps capacity may opt for self-hosting to align with internal security policies or infrastructure standards.

What differentiates Deepgram is the ability to deliver true flexibility across deployment models. Teams can build and scale voice AI systems using consistent APIs and model performance, while choosing the infrastructure that fits their environment. Looking ahead, the roadmap includes customer VPC deployments, BYOC support, and expanded region availability across Asia-Pacific, EMEA, and LATAM.

Start Building for Your Environment

If you’re building voice applications that require global reach, regulatory alignment, or low-latency performance, now is the time to explore your deployment options. Demand is high, and we’re expanding access selectively.

Deepgram now runs where your business runs. No trade-offs. No overhead. Just voice AI on your terms.

Deepgram Launches Saga: The Voice OS for Developers

Posted in Commentary with tags on July 8, 2025 by itnerd

Deepgram, the leading voice AI platform for enterprise use cases, today announced the launch of Deepgram Saga, a Voice Operating System (OS) designed specifically for developers. Saga is a universal voice interface that embeds directly into developer workflows, allowing users to control their tech stack through natural speech. Unlike traditional voice assistants that pull developers out of their flow, Saga sits on top of existing tools, transforming rough ideas into precise AI coding prompts, executing multi-step workflows across platforms via Model Context Protocol (MCP), and eliminating the constant context switching that fragments modern development.

In today’s development environment, engineers routinely juggle 8+ tools across multiple monitors, constantly translating thoughts into clicks, rough ideas into overly specific prompts, and context into commands. This fragmentation creates a “quiet tax” on productivity — time lost to alt-tabbing, window hunting, and manual navigation between coding, testing, and deployment tools. Saga eliminates this friction by providing a voice-native AI interface that interprets developer intent and executes actions across the entire tech stack, enabling developers to stay in flow while building software.

Voice-First Workflow Control

Saga addresses the core challenges facing AI-native developers and early-stage builders who need to move fast without getting bogged down in tool complexity. 

Key capabilities include:

  • Developer Ecosystem Friendly: Whether vibe coding with Cursor or Windsurf, maintaining status updates in Linear, Asana, Jira or Slack, extracting CSS from Figma designs, or just executing operational day-to-day tasks within Google Docs, Gmail or Google Sheets, Saga lives alongside the tools developers already know, love, and use every day.
  • Intelligent Prompt Generation: Developers can speak vague ideas like “Build a Slack bot that reacts to emoji,” and Saga transforms these into crystal-clear, one-shot prompts for tools like Cursor, eliminating the trial-and-error cycle of “vibe coding.”
  • End-to-End Workflow Execution: A single voice command like “Run tests, commit changes, deploy, and update the team” triggers coordinated actions across the entire development stack — no tabs, manual commands, or context switching required.
  • Real-Time Documentation: Saga captures stream-of-consciousness thinking and transforms it into structured documentation, tickets, or PR descriptions, allowing developers to rubber-duck their way to clean documentation without breaking their train of thought.
  • Contextual Tool Integration: Rather than requiring developers to switch to separate AI chat windows, Saga surfaces answers and executes actions inline, layered over existing development tools.
  • Natural Code Generation: Developers can speak requests like “Get me the top 10 users who signed up in the last week” and receive instant SQL or JavaScript snippets without needing to Google syntax or write boilerplate.
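
To make that last bullet concrete, here is the kind of query such a spoken request might produce, run against a hypothetical SQLite `users` table purely for illustration (the table, column names, and dates are my own invention, not Saga output):

```python
# Illustrative only: a "top 10 users who signed up in the last week" query of
# the sort described above, against a hypothetical in-memory users table.
import sqlite3
from datetime import datetime, timedelta

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, signed_up_at TEXT)")

# Seed 15 users, one signup per day going back from a fixed "today".
now = datetime(2025, 7, 8)
for i in range(15):
    conn.execute("INSERT INTO users VALUES (?, ?)",
                 (f"user{i}", (now - timedelta(days=i)).isoformat()))

cutoff = (now - timedelta(days=7)).isoformat()
rows = conn.execute(
    "SELECT name, signed_up_at FROM users "
    "WHERE signed_up_at >= ? ORDER BY signed_up_at DESC LIMIT 10",
    (cutoff,),
).fetchall()
print(len(rows))  # → 8 (signups 0 to 7 days old)
```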

Built for AI-Native Development with MCP

Saga is specifically designed for the new generation of technical users who rely on AI agents, use tools like Cursor and Windsurf daily, and treat their workflow like a programmable operating system. The platform integrates seamlessly with existing developer tools through MCP (Model Context Protocol) and other standard interfaces, ensuring teams can adopt Saga without disrupting their current setup.

Enterprise-Grade Voice Intelligence

Built on Deepgram’s world-class speech-to-text, text-to-speech, and voice agent APIs, Saga delivers the accuracy and responsiveness required for mission-critical development workflows. The platform understands technical context, domain-specific terminology, and the nuanced language developers use when thinking through complex problems.

Unlike consumer voice assistants that require rigid command structures, Saga interprets natural, conversational speech and translates it into precise technical actions. This approach eliminates the cognitive overhead of remembering specific voice commands while maintaining the reliability enterprises need for production development environments.

Start Building with Saga

Experience how voice can transform your development workflow with Deepgram Saga. The platform is designed for developers who want fewer clicks and more execution, enabling faster iteration cycles and reduced context switching. 

Additional Resources

  • Get started with Deepgram’s quickstart guides

Deepgram Expands Aura-2 Text-to-Speech Platform with High-Fidelity Spanish Voice Models 

Posted in Commentary with tags on June 30, 2025 by itnerd

Deepgram has officially expanded its Aura-2 text-to-speech (TTS) API with a new suite of high-quality Spanish voice models, bringing realistic, expressive, and business-ready voice synthesis to Spanish-speaking markets.

This launch marks a major step in Deepgram’s mission to enable real-time, natural-sounding voice experiences across global industries. The new Spanish voices are optimized for enterprise use cases, from customer support and IVR systems to healthcare and education, featuring precise pronunciation for currencies, timestamps, acronyms, emails, and more.

HIGHLIGHTS:

  • 10 new Spanish Aura-2 voice models tailored for professional use
  • Support for Mexican, Peninsular, Colombian, and Latin American accents
  • Models designed for diverse applications including advertising, IVR, storytelling, and customer service
  • Support for code-switching in select models (English ↔ Spanish)
  • Available now via REST and WebSocket APIs

Voices like “Celeste” (Colombian, energetic and friendly) and “Nestor” (Peninsular, calm and confident) are just a couple of the expressive voices now available.

It is available now for use via Deepgram’s hosted TTS API platform. 
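For developers who want to try one of the new voices, here is a minimal sketch of a hosted TTS request. The `/v1/speak` path follows Deepgram’s public API, but the model identifier `aura-2-celeste-es` is an assumption based on the voice names above; check the developer documentation for exact IDs.

```python
# Sketch of a text-to-speech request to Deepgram's hosted Aura-2 API.
# The model ID is an assumed example; the API key is a placeholder.

def speak_request(text: str,
                  model: str = "aura-2-celeste-es",
                  api_key: str = "YOUR_API_KEY") -> dict:
    """Assemble the pieces of a Deepgram /v1/speak request (not sent here)."""
    return {
        "url": "https://api.deepgram.com/v1/speak",
        "params": {"model": model},
        "headers": {"Authorization": f"Token {api_key}",
                    "Content-Type": "application/json"},
        "json": {"text": text},
    }

req = speak_request("Su saldo actual es de $1,250.00 MXN.")
# e.g. requests.post(req["url"], params=req["params"],
#                    headers=req["headers"], json=req["json"])
```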

Here is a blog with details: https://deepgram.com/changelog/aura-2-spanish-tts 

Developers and product teams can find implementation examples and model specifications in the Deepgram Developer Documentation, here: https://developers.deepgram.com/docs/tts-models#aura-2-all-available-voices

Deepgram CEO Scott Stephenson Launches “The Scott Stephenson AI Show” — A No-Hype, Deep-Dive Podcast on the AI Revolution

Posted in Commentary with tags on June 23, 2025 by itnerd

Deepgram today announced the launch of The Scott Stephenson AI Show, a new podcast hosted by Scott Stephenson, CEO and Co-Founder of Deepgram. In each episode, Stephenson explores the fast-changing world of artificial intelligence (AI), cutting through the hype and digging into what’s actually happening under the hood of today’s most powerful AI technologies. Stephenson brings his signature candor, industry insight, and curiosity to every topic, offering unfiltered perspectives on what’s working, what’s hype, and what’s next.

Episode 1

In the first episode, Scott unpacks the concept of vibe coding, the rising trend where developers interact with AI in a product manager-like mindset, using natural language and feedback instead of conventional code. He also explores the emerging era of AI agents, A2A (agent-to-agent) communication, MCP (Model Context Protocol), and how these breakthroughs will reshape engineering and business workflows.

Episodes will be released every two weeks. Future episodes will feature conversations on evaluating GenAI models, whom to trust, and the constraints and accelerators shaping the pace of innovation.

Where to Watch and Subscribe:

Deepgram Launches Voice Agent API

Posted in Commentary with tags on June 16, 2025 by itnerd

Deepgram today announced the general availability (GA) of its Voice Agent API, a single, unified voice-to-voice interface that gives developers full control to build context-aware voice agents that power natural, responsive conversations. Combining speech-to-text, text-to-speech, and large language model (LLM) orchestration with contextualized conversational logic into a unified architecture, the Voice Agent API gives developers the choice of using Deepgram’s fully integrated stack (leveraging industry-leading Nova-3 STT and Aura-2 TTS models) or bringing their own LLM and TTS models. It delivers the simplicity developers love and the controllability enterprises need to deploy real-time, intelligent voice agents at scale. Today, companies like Aircall, Jack in the Box, StreamIt, and OpenPhone are building voice agents with Deepgram to save costs, reduce wait times, and increase customer loyalty.

In today’s market, teams building voice agents are often forced to choose between two extremes: rigid, low-code platforms that lack customization, or DIY toolchains that require stitching together STT, TTS, and LLMs with significant engineering effort. Deepgram’s Voice Agent API eliminates this tradeoff by providing a unified API that simplifies development without sacrificing control. Developers can build faster with less complexity, while enterprises retain full control over orchestration, deployment, and model behavior, without compromising on performance or reliability.

Developer Simplicity and Faster Time to Market

For teams taking the DIY route, the challenge isn’t just connecting models but also building and operating the entire runtime layer that makes real-time conversations work. Teams must manage live audio streaming, accurately detect when a user has finished speaking, coordinate model responses, handle mid-sentence interruptions, and maintain a natural conversational cadence. While some platforms offer partial orchestration features, most APIs do not provide a fully integrated runtime. As a result, developers are often left to manage streaming, session state, and coordination logic across fragmented services, which adds complexity and delays time to production.

Deepgram’s Voice Agent API removes this burden by providing a single, unified API that integrates speech-to-text, LLM reasoning, and text-to-speech with built-in support for real-time conversational dynamics. Capabilities such as barge-in handling and turn-taking prediction are model-driven and managed natively within the platform. This eliminates the need to stitch together multiple vendors or maintain custom orchestration, enabling faster prototyping, reduced complexity, and more time focused on building high-quality experiences.
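As a rough sketch of what a single, unified API means in practice, here is a hypothetical session configuration showing the listen/think/speak split the paragraph describes. The field names and model IDs are illustrative assumptions for this post, not Deepgram’s exact wire format; consult the Voice Agent API documentation for the real schema.

```python
# Hypothetical Voice Agent session settings, illustrating the unified
# STT + LLM + TTS architecture. Field names and model IDs are assumptions.

agent_settings = {
    "type": "Settings",
    "agent": {
        "listen": {"model": "nova-3"},           # Deepgram STT
        "think": {                               # bring-your-own LLM slot
            "provider": "open_ai",
            "model": "gpt-4o-mini",
            "instructions": "You are a concise support agent.",
        },
        "speak": {"model": "aura-2-thalia-en"},  # Deepgram TTS (assumed ID)
    },
}

# A client would open a WebSocket to the Voice Agent endpoint, send this
# settings message first, then stream microphone audio while playing back
# the agent's audio responses; barge-in and turn-taking are handled by the
# platform rather than by client-side orchestration code.
print(sorted(agent_settings["agent"]))  # → ['listen', 'speak', 'think']
```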

In addition to the Voice Agent API, organizations seeking broader integrations can leverage Deepgram’s extensive partner ecosystem, including Kore.ai, OneReach.ai, Twilio and others, to access comprehensive conversational AI solutions and services powered by Deepgram APIs.  

Maximum Control and Flexibility

While the Voice Agent API streamlines development, it also gives teams deep control over performance, behavior, and scalability in production. Built on Deepgram’s Enterprise Runtime and full model ownership across the entire voice AI stack, the platform enables model-level optimization at every layer of the interaction loop. This allows for precise tuning of latency, barge-in handling, turn-taking, and domain-specific behavior in ways not possible with disconnected components.

Key capabilities include:

  • Flexible Deployment: Run the complete voice stack in cloud, VPC, or on-prem environments to meet enterprise requirements for security, compliance, and performance.
  • Runtime-Level Orchestration: Deepgram’s runtime supports mid-session control, real-time prompt updates, model switching, and event-driven signaling to adapt agent behavior dynamically.
  • Bring-Your-Own Models: Teams can integrate their own LLMs or TTS systems while retaining Deepgram’s orchestration, streaming pipeline, and real-time responsiveness.

This tightly coordinated design translates directly into measurable performance gains. In recent benchmark testing using the Voice Agent Quality Index (VAQI), Deepgram achieved the highest overall score among all evaluated providers (see Figure 1). VAQI is a composite benchmark that measures the core elements of voice agent quality: latency (how quickly the agent responds), interruption rate (how often it cuts users off), and response coverage (how often it misses valid input).

Deepgram outperformed OpenAI by 6.4% and ElevenLabs by 29.3%, reflecting the advantage of its integrated architecture and model-driven turn-taking. The result is smooth, responsive conversations without missed inputs, premature responses, or unnatural delays.
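The exact VAQI weighting is not published in this post, so the function below is only an equal-weight illustration of how a composite of latency, interruption rate, and response coverage could be combined into a single score; the normalization ceiling of 2000 ms is an arbitrary assumption.

```python
def vaqi_style_score(latency_ms: float, interruption_rate: float,
                     miss_rate: float, max_latency_ms: float = 2000.0) -> float:
    """Illustrative composite in the spirit of VAQI: each component is
    normalized to [0, 1] (higher is better) and averaged with equal
    weights. The real VAQI weighting may differ."""
    latency_score = max(0.0, 1.0 - latency_ms / max_latency_ms)
    interruption_score = 1.0 - interruption_rate   # fewer cut-offs is better
    coverage_score = 1.0 - miss_rate               # fewer missed inputs is better
    return round(100 * (latency_score + interruption_score + coverage_score) / 3, 1)

# Example: an agent responding in 500 ms, interrupting 5% of the time,
# and missing 2% of valid inputs.
score = vaqi_style_score(500, 0.05, 0.02)
```

The point of a composite like this is that an agent cannot win by optimizing one axis alone: shaving latency while interrupting users more often moves the score very little.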

Cost-Effectiveness at Scale

In addition to control and performance, the Voice Agent API is built for cost efficiency across large-scale deployments. When teams run entirely on Deepgram’s vertically integrated stack, pricing is fully consolidated at a flat rate of $4.50 per hour (see Figure 2). This provides predictable, all-in-one billing that simplifies planning and scales with usage. Deepgram’s vertically integrated runtime also delivers unmatched compute efficiency, optimizing every stage of the speech pipeline to minimize infrastructure costs while maintaining real-time responsiveness.
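The arithmetic behind "predictable, all-in-one billing" is simple enough to sketch directly; the helper names below are ours, but the $4.50/hour rate is the consolidated figure quoted in this post.

```python
FLAT_RATE_PER_HOUR = 4.50  # consolidated Voice Agent API rate quoted above

def monthly_cost(hours_per_day: float, days: int = 30) -> float:
    """Projected monthly spend under the flat consolidated rate."""
    return round(hours_per_day * days * FLAT_RATE_PER_HOUR, 2)

def hours_for_budget(budget: float) -> float:
    """How many agent-hours a given budget buys at the flat rate."""
    return round(budget / FLAT_RATE_PER_HOUR, 1)

# 10 agent-hours/day for a month, and what $200 in credits buys:
projected = monthly_cost(10)        # $1350.00
credit_hours = hours_for_budget(200)  # ~44.4 hours
```

Note that $200 in credits at this rate works out to roughly 44 hours, consistent with the "over 40 hours" figure cited later in this post.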

For teams that bring their own LLM or TTS models, Deepgram offers built-in rate reductions, enabling even lower total cost of ownership for production-scale deployments.

Start Building with the Voice Agent API

Experience how fast and flexible voice agents can be with Deepgram’s unified voice-to-voice API. Explore the API in our interactive playground, review the documentation, or integrate in minutes using our SDK. New users receive $200 in free credits, enough to process over 40 hours of real-time voice agent usage. Start building natural, responsive conversations with infrastructure built for real-time performance and enterprise scale.

Additional Resources:

  • Explore the blog for an in-depth breakdown of Voice Agent API’s capabilities
  • Watch a fun demo of Deepgram’s voice agent API
  • Try Deepgram’s interactive demo
  • Get $200 in free credits and try Deepgram for yourself

Introducing Nova-3 Medical: The Most Accurate Medical Transcription Model in the World 

Posted in Commentary with tags on March 3, 2025 by itnerd

Deepgram today announced the launch of Nova‑3 Medical, its next‑generation AI-powered speech‑to‑text (STT) model specifically engineered for the healthcare industry. Designed to meet the rigorous demands of clinical environments, Nova‑3 Medical enables developers to build highly accurate, customizable, and secure voice AI products and solutions tailored for healthcare settings. It seamlessly integrates with Deepgram’s enterprise runtime platform—including advanced text-to-speech (TTS) and speech-to-speech (STS) capabilities—providing a comprehensive suite of AI-driven tools that deliver enterprise-grade performance, adaptability, and cost efficiency. From streamlining clinical documentation to revolutionizing therapeutic scribing, Deepgram powers transformative medical transcription applications for industry leaders, driving exceptional outcomes across the healthcare spectrum.

Meeting the Growing Demand for AI-Powered Healthcare Transcription

As healthcare rapidly digitizes—with the widespread adoption of electronic health records, telemedicine, and digital health platforms—the demand for AI-powered transcription has never been greater. Traditional off-the-shelf speech-to-text models often struggle with the complexities of clinical terminology, leading to transcription errors and “hallucinations” that can compromise patient care. With the medical transcription market projected to grow from USD 85.3 billion in 2023 to USD 190.2 billion by 2032, developers building voice-AI applications for healthcare need infrastructure that not only delivers exceptional accuracy and speed but also provides the flexibility to meet diverse regulatory and operational requirements.

Built to meet these demands, Nova-3 Medical leverages advanced machine learning and specialized medical vocabulary training to set a new standard in healthcare transcription. Engineered for real-world clinical environments, the model accurately captures specialized medical terms, acronyms, and clinical jargon—even in challenging far-field audio conditions where providers step away from recording devices such as desktops and tablets. Moreover, it delivers structured transcriptions that seamlessly integrate with clinical workflows and EHR systems, ensuring vital patient data is accurately organized and readily accessible. Its flexible, self‑service customization—featuring Keyterm Prompting for up to 100 key terms—allows developers to tailor the solution to the unique needs of various medical specialties while versatile deployment options, including on‑premises and VPC configurations, ensure enterprise‑grade security and HIPAA compliance.
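As a rough sketch of how Keyterm Prompting is wired into a request, the snippet below builds a transcription URL with repeated `keyterm` query parameters. The `keyterm` parameter name and the `/v1/listen` endpoint follow Deepgram’s published API shape for Nova-3 models, but both should be confirmed against the current API reference; the drug and condition terms are made-up examples.

```python
from urllib.parse import urlencode

# Illustrative request URL for keyterm prompting with Nova-3 Medical.
# Verify parameter names against Deepgram's current API reference.
base = "https://api.deepgram.com/v1/listen"
params = [("model", "nova-3-medical"), ("smart_format", "true")]
# Up to 100 key terms can be supplied; each is a separate query parameter.
params += [("keyterm", term) for term in ["metoprolol", "tachycardia", "HbA1c"]]
url = f"{base}?{urlencode(params)}"
```

The request body (audio bytes or a remote URL) and the API key header are omitted here; the point is that specialty vocabulary is a per-request parameter rather than a retraining exercise.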

Benchmarking Nova-3 Medical: Accuracy, Speed, and Efficiency

Nova-3 Medical delivers industry-leading transcription accuracy, optimizing both overall word recognition and critical medical term accuracy for voice-driven healthcare applications.

WER Comparison (see Figure 1)

With a median Word Error Rate (WER) of 3.45%, Nova-3 Medical outperforms competing models, achieving a 63.6% reduction in errors compared to the next best competitor. This improvement enhances documentation precision, minimizes manual corrections, and streamlines workflows for healthcare providers.

KER Comparison (see Figure 2)

However, medical transcription accuracy isn’t limited to WER—correctly capturing critical medical terms is essential for minimizing patient care risks. Nova-3 Medical achieves a Keyword Error Rate (KER) of 6.79%, marking a 40.35% reduction in errors compared to the next best competitor. This ensures that fewer critical drug names, conditions, and procedures are misrecognized, reducing the chances of transcription errors that could lead to miscommunication, improper documentation, or even patient safety risks.
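To make the WER/KER distinction concrete, here is a minimal implementation of both metrics. The WER function is the standard word-level edit distance; the KER function is a deliberate simplification (share of reference keywords absent from the hypothesis) and may not match the scoring used in the benchmark above.

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word Error Rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.lower().split(), hypothesis.lower().split()
    # dp[i][j] = edit distance between ref[:i] and hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i
    for j in range(len(hyp) + 1):
        dp[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,          # deletion
                           dp[i][j - 1] + 1,          # insertion
                           dp[i - 1][j - 1] + cost)   # substitution
    return dp[-1][-1] / len(ref)

def ker(reference: str, hypothesis: str, keywords: list) -> float:
    """Simplified keyword error rate: fraction of reference keywords
    that never appear in the hypothesis (benchmark scoring may differ)."""
    hyp_words = set(hypothesis.lower().split())
    present = [k for k in keywords if k.lower() in reference.lower()]
    missed = [k for k in present if k.lower() not in hyp_words]
    return len(missed) / len(present) if present else 0.0

ref = "patient prescribed metoprolol twice daily"
hyp = "patient prescribed metropolol twice daily"
overall_wer = wer(ref, hyp)                    # one substitution in five words
drug_ker = ker(ref, hyp, ["metoprolol"])       # the one drug name is wrong
```

The example shows exactly why KER matters: a transcript with a modest 20% WER can still have a 100% error rate on the one term that affects patient safety.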

In addition to transcription accuracy, Nova-3 Medical excels in real-time applications, where speed and scalability are crucial. Optimized for real-time use, Nova‑3 Medical transcribes speech 5 to 40 times faster than most alternative speech recognition vendors, making it ideal for telemedicine and digital health platforms, and its scalable architecture lets healthcare tech companies maintain high performance without incurring excessive costs as transcription volumes grow. Starting at $0.0077 per minute of streaming audio, Nova‑3 Medical costs less than half as much as leading cloud providers, reducing operational expenses and enabling companies to reinvest in innovation, accelerate product development, and offer competitive pricing to drive market adoption.
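For teams sizing a deployment, the quoted streaming rate translates directly into a monthly figure; the helper below is ours, but the $0.0077/minute rate is the one cited in this post.

```python
NOVA3_MEDICAL_PER_MIN = 0.0077  # streaming rate quoted in this post

def monthly_streaming_cost(minutes: float) -> float:
    """Projected monthly spend at the quoted Nova-3 Medical streaming rate."""
    return round(minutes * NOVA3_MEDICAL_PER_MIN, 2)

# e.g. 100,000 minutes of clinical audio per month:
cost = monthly_streaming_cost(100_000)  # $770.00
# A provider at more than twice this rate would exceed $0.0154/min,
# i.e. over $1,540 for the same volume.
```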

Visit Deepgram at Booth #136 in the AI Pavilion at HIMSS25, March 3-6, 2025, to see Nova-3 Medical in action, and don’t miss these sessions:

Session: From AI Scribes to EHR Automation: How Deepgram Enables Healthtech with Voice AI and Amazon Bedrock

When: Tuesday, March 4, 3:40 PM to 4:00 PM

Where: AI Pavilion, Venetian, Level 2, Hall A

Session: Voice AI Mixer with Deepgram & OneReach.ai

When: Wednesday, March 5, 6:00 PM to 7:30 PM

Where: Venetian, Palazzo Ballroom, Palazzo A

For more information about Nova‑3 Medical and how it is revolutionizing healthcare transcription, please visit www.deepgram.com.