Archive for Deepgram

Deepgram Launches Flux Multilingual

Posted in Commentary with tags on April 29, 2026 by itnerd

Deepgram today announced the general availability (GA) of Flux Multilingual, expanding its conversational speech recognition model beyond English to support 10 languages, with the ability to automatically detect, understand, and switch languages dynamically within a single conversation in real time. Developers, enterprises, and product teams building voice agents now have access to the first multilingual real-time conversational speech recognition model, delivering accurate turn-taking, interruption handling, low latency, and natural human-like conversations at global scale.

Traditional automatic speech recognition (ASR) is designed for transcription. Flux introduced a new approach, conversational speech recognition (CSR), built from the ground up to understand dialogue flow and enable real-time interaction. Flux has rapidly become foundational infrastructure for real-time voice agents, powering production systems that developers trust to deliver fast, natural conversational experiences with best-in-class accuracy in turn detection and speech recognition. Prior to today’s release, extending these experiences across multiple languages required stitching together multilingual transcription models, language detection, and routing logic, introducing latency, complexity, and brittle user experiences. Flux Multilingual replaces that complexity with a single model and API, making it possible to build conversational voice agents across 10 languages without re-architecting systems or sacrificing performance.

With native support for turn-taking, interruptions, and code-switching within a single interaction, voice applications remain fluid, responsive, and natural regardless of language or region. Flux Multilingual delivers monolingual-grade accuracy across languages. Developers can guide the model with language hints or let it auto-detect, adapting in real time even mid-conversation.

Flux Multilingual Capabilities

Supported Languages

English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch

Ultra-low latency conversational speech recognition, now global

Flux Multilingual is built for understanding and interaction, not just transcription. It uses model-based turn detection, not simple silence detection, to deliver accurate end-of-turn decisions in under 400 milliseconds, keeping conversations fluid and responsive across languages.

Monolingual-grade accuracy with real-time language control

Flux Multilingual delivers monolingual-grade accuracy across languages, with flexible real-time control through language hints or automatic detection, native code-switching, and dynamic adaptation as conversations evolve.

Build and scale global voice agents with one model

Flux Multilingual supports 10 languages in a single conversational model, enabling teams to build and deploy voice agents globally with one integration: one model, one API, ten languages, and no additional infrastructure or model orchestration required.

Key Features

  • Native turn detection and interruption handling for natural dialogue flow
  • Low-latency streaming transcription for real-time responsiveness
  • Automatic language detection and language hint support for accuracy control 
  • Mid-session configurability for dynamic language adaptation
  • Native code-switching within a single conversation
  • Fully compatible with existing Flux API integrations 

Flux Multilingual is now generally available (GA). As part of the launch, Deepgram is offering a limited-time promotional rate on streaming speech-to-text, including Flux Multilingual and Nova-3 models.

Flux Multilingual is available via Deepgram’s Cloud API or as a self-hosted deployment, with support for EU endpoints, SDKs, and seamless integration into voice agent architectures. Developers can get started today at deepgram.com or try Flux Multilingual directly in the Deepgram Playground.

Penguin Solutions Selected by Deepgram to Enable Deployment of Optimized AI Inference Infrastructure for Enterprise Voice AI

Posted in Commentary with tags on March 17, 2026 by itnerd

Penguin Solutions today announced a strategic collaboration with Deepgram and Dell Technologies to architect and deploy a fully optimized, production-ready infrastructure aligned to Deepgram’s demanding enterprise voice AI requirements. By leveraging its unique expertise in designing, building, deploying, and managing AI infrastructure with Dell PowerEdge servers and Dell PowerScale storage optimized for AI workloads, Penguin Solutions delivered an optimal solution to support and enhance Deepgram’s innovative Speech-to-Text (STT), Text-to-Speech (TTS), and Voice Agent capabilities, while ensuring maximum reliability and performance.  

As enterprise adoption of generative AI accelerates, organizations must adhere to stricter service level agreements (SLAs), which require infrastructure that can ensure low latency and high concurrent usage. This Penguin-led deployment addresses these challenges by combining Deepgram’s innovative voice AI models with a purpose-built architectural design, a highly efficient deployment, and ongoing performance optimization.

Drawing on its extensive experience with HPC and AI infrastructure, Penguin Solutions ensures that the underlying infrastructure meets the specific demands of Deepgram’s neural networks. The architecture also incorporates Dell PowerScale storage and Dell PowerEdge XE7745 servers with NVIDIA RTX PRO 6000 Blackwell Server Edition GPUs, which provide efficient inferencing that enables data-intensive voice applications to operate seamlessly in real-time environments.

The Deepgram-Penguin Solutions-Dell collaboration comprises a comprehensive approach for enterprises looking to modernize their customer and employee experiences. With Deepgram’s API-driven voice capabilities, Penguin Solutions’ AI services, and Dell’s powerful AI infrastructure, organizations can achieve highly accurate, real-time transcription and speech synthesis—all while maintaining strict data governance and control.

Those attending the NVIDIA GTC AI Conference and Expo, March 16-19, 2026, in San Jose, CA, can learn more about the collaboration at Dell's Booth #721 on March 17 at 3:30 p.m., when Penguin Solutions, Deepgram, and Dell present the session "Powering Enterprise Voice AI: Deepgram's Agentic Solution." Attendees can also stop by Penguin Solutions' Booth #1031 to speak with an AI factory platform expert.

Flux Voice AI Platform Now Supports On-the-Fly Configuration 

Posted in Commentary with tags on March 4, 2026 by itnerd

Deepgram just announced Flux "on-the-fly configuration" for its voice AI platform, which lets developers dynamically update speech recognition settings — such as keyterms and end-of-turn detection — during a live voice conversation without disconnecting or restarting the audio stream.
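
In practice, a mid-stream update like this is typically a small JSON control message sent on the already-open websocket while audio keeps flowing. The sketch below shows the general shape of such a message; the `"UpdateConfig"` type and the `keyterms` / `eot_threshold` field names are hypothetical stand-ins, and the real schema is defined in Deepgram's documentation linked below.

```python
import json

def make_config_update(keyterms=None, eot_threshold=None) -> str:
    """Serialize a mid-stream settings update as a JSON text frame.

    The message shape ("type": "UpdateConfig") and field names are
    assumptions for illustration, not Deepgram's published schema.
    """
    msg = {"type": "UpdateConfig"}
    if keyterms is not None:
        msg["keyterms"] = list(keyterms)      # e.g. product or account names
    if eot_threshold is not None:
        msg["eot_threshold"] = eot_threshold  # end-of-turn sensitivity
    return json.dumps(msg)

# During a live call, this string would be sent as a text frame on the
# open websocket; binary audio frames continue uninterrupted alongside it.
update = make_config_update(keyterms=["order ID", "SKU"], eot_threshold=0.7)
```

The point of the design is that reconfiguration is an in-band message rather than a reconnect, so the caller never hears a gap.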

This matters because businesses can now automate more customer interactions with voice AI without frustrating users, lowering support costs while maintaining a natural, reliable experience.

Employees love it because when voice AI reliably handles repetitive questions and routine tasks, they spend less time on frustrating, high-volume calls and more time on meaningful work, which tends to improve job satisfaction and reduce burnout. What’s more, when routine work is automated, employees often shift toward higher-value roles such as handling complex cases, supervising AI systems, improving workflows, or managing customer outcomes, which can lead to greater responsibility, new skills, and potentially higher pay.

Deepgram just published a blog detailing the new offering — it can be found here: https://deepgram.com/learn/flux-on-the-fly-configuration

Deepgram and IBM Introduce Advanced Voice Capabilities for Enterprise AI

Posted in Commentary with tags on February 24, 2026 by itnerd

IBM and Deepgram today announced a collaboration to integrate Deepgram’s industry-leading speech-to-text and text-to-speech capabilities into IBM’s watsonx Orchestrate generative AI solution.

To address client needs for highly performant, enterprise-grade transcription and real-time captioning, IBM will embed Deepgram’s capabilities into watsonx Orchestrate. This collaboration makes Deepgram IBM’s first voice partner, bringing voice AI technology that helps enterprises automate their operations and meet the growing demand for conversational AI technology, including advanced speech-to-text voice recognition so users can interact with digital agents using natural speech.

Many organizations are adopting AI-powered speech-to-text systems to automate transcription while handling real-world audio conditions, including background noise, diverse accents, and real-life dialog. This integration addresses these challenges by offering a wider range of languages and dialects, including dozens of Arabic and Indian variants, along with voices that reflect regional accents. It also adds options for custom tuning, real-time captioning and natural-sounding speech.

These technologies open new possibilities for enhanced automated customer care and support, call analysis, and voice-driven data entry in fields like healthcare and finance.

Voice interfaces are quickly becoming essential for enterprise AI, and this collaboration strengthens IBM’s role in delivering modern, flexible solutions to its clients. For Deepgram, it expands access to new customers through a trusted enterprise partner and reinforces its position as a reliable, real-time voice platform built for large-scale use.

Deepgram Expands Language Coverage with Hebrew, Persian, and Urdu

Posted in Commentary with tags on February 12, 2026 by itnerd

Deepgram has expanded its Voice AI platform to include Hebrew, Persian, and Urdu — three monolingual right-to-left (RTL) languages.

Some key points:

  • This isn’t just about adding new languages; it shows Deepgram has solved one of the hardest problems in global Voice AI and is now enabling enterprise-grade voice systems across some of the world’s fastest-growing markets.
  • Deepgram just removed one of the biggest blockers to scaling voice AI in the Middle East and South Asia.
  • This enables AI agents to operate natively in RTL markets.
  • Companies no longer need patchwork vendors for global deployments. This benefits companies of any size and levels the playing field for smaller companies that assumed global Voice AI was out of reach.
  • This opens major revenue markets that were previously hard to serve.
  • This opens doors across virtually every industry from finance and healthcare, to retail, education, and manufacturing, to hospitality, oil & gas, and government, to travel, tourism, and entertainment, and more…

Find out more here: Speech-to-Text for Hebrew, Persian, and Urdu on Nova-3

Deepgram Raises $130M Series C at $1.3B Valuation to Power the Voice AI Economy

Posted in Commentary with tags on January 13, 2026 by itnerd

Deepgram today announced it has raised $130 million in Series C funding at a $1.3 billion valuation. The round was led by AVP, an independent global investment platform dedicated to high-growth technology companies across Europe and North America. 

All major existing investors joined the round, including Alkeon, In-Q-Tel, Madrona, Tiger, Wing, Y Combinator, and funds and accounts managed by BlackRock. Several new investors, including Alumni Ventures and Princeville Capital, invested in the round, in addition to industry leaders such as Twilio, ServiceNow Ventures, SAP, and Citi Ventures. University of Michigan and Columbia University also invested, joining other existing academic investors such as Stanford University.

With this investment, Deepgram is ideally positioned to deliver the real-time frontier Voice AI models and platform required to reliably power billions of live conversations with the naturalness, latency, and accuracy of human voice. AVP was selected as lead investor for its deep expertise scaling category-defining companies globally and its ability to support Deepgram’s international expansion, including Europe and other key markets.

Powered by Deepgram

Today, more than 1,300 organizations build Voice AI functionality powered by Deepgram APIs. Those APIs form a foundational infrastructure layer for a global set of offerings that deliver real-time, accurate, and reliable speech understanding, speech generation, analytics, orchestration, and fully autonomous voice agents.

Deepgram’s industry-leading offerings include:

  • Aura-2, the world’s most professional, cost-effective, and enterprise-grade text-to-speech model
  • Nova-3, the world’s most accurate, real-time and reliable speech-to-text model
  • Flux, the world’s first Conversational Speech Recognition model built specifically to solve the biggest problem in voice agents – interruptions
  • Voice Agent API, the world’s only enterprise-ready, real-time, and cost-effective conversational AI API
  • Saga, the Voice OS

All Deepgram models can be customized to domain-specific terminology and acoustic environments and deployed as cloud APIs or through self-hosted and on-premises options. A full SDK library is available to simplify development and accelerate production timelines.

See the Powered by Deepgram page to learn more about how the most innovative AI organizations in the world build Voice AI functionality powered by Deepgram. 

Deepgram Acquires OfOne to Expand Real-Time Voice Automation into Restaurants

Deepgram also announced today the acquisition of OfOne, an AI-native voice platform created for restaurants and the quick-service drive-thru market. OfOne has consistently delivered more than 95% containment, with high employee satisfaction scores and strong operational impact for national QSR brands.

The OfOne team has joined Deepgram, and its technology now anchors Deepgram for Restaurants, an offering built to help restaurants improve customer experience, increase order accuracy, and support overstretched staff with real-time AI assistance. Additional functionality and expanded integrations will be delivered in the coming months.

Expansion of Patent Portfolio

New funding will also accelerate Deepgram’s expansion of its intellectual property, building on a patent portfolio filed continuously since 2016, with several key U.S. patents granted in 2025. US 12,380,880 for End-to-End Automatic Speech Recognition With Transformer establishes a novel method for integrating and training ASR and transformer models as a single system, leading to improvements in accuracy and speed. This is complemented by US 12,334,075 for Hardware-Efficient Automatic Speech Recognition, which utilizes intelligent batching and parallel processing to ensure optimal hardware use, directly reducing latency and cost for customers handling massive volumes of voice data. Most recently, US 12,499,875 for Deep Learning Internal State Index-Based Search and Classification protects techniques for leveraging internal neural representations to enable faster audio search and more accurate classification at scale. These newly granted patents solidify Deepgram’s leadership in core deep learning architecture, representation learning, and deployment efficiency.

New Voice AI Collaboration Hub in San Francisco

Deepgram is opening a new Voice AI Collaboration Hub in San Francisco to bring the voice AI community together in person. Designed for meaningful collaboration with customers, partners, and builders, the space will host hands-on working sessions, live demonstrations, executive briefings, community meetups, and developer hackathons – creating a shared environment where ideas turn into products and the future of Voice AI is built together.

Deepgram Brings Low-Latency Speech Recognition and TTS to Amazon Connect

Posted in Commentary with tags on December 1, 2025 by itnerd

Deepgram today announced integration of its enterprise-grade speech-to-text (STT) and text-to-speech (TTS) models with Amazon Connect and Amazon Lex, enabling real-time transcription, low-latency voice bots, and analytics within customers’ existing AWS environments.

With this launch, teams can use Deepgram’s models natively in Amazon Lex for natural conversational experiences and pair them with Amazon Connect to unlock real-time transcription, quality monitoring, and automation without heavy custom engineering for customer experience scenarios.

Real-time transcription and analytics in Amazon Connect enable live coaching, compliance monitoring, and automated workflows built on a documented integration pattern. Native STT and TTS support in Amazon Lex delivers ultra-low latency, natural-sounding voice experiences and accurate understanding in noisy, high-variance environments. The integration fits seamlessly into existing AWS operations, allowing customers to deploy inside their AWS environment, keep data within AWS, and streamline procurement through AWS Marketplace.

Deepgram’s integrations with Amazon Connect and Amazon Lex are available for customers building on AWS today, with live demonstrations planned at AWS re:Invent in Las Vegas, December 1–5, 2025, in Deepgram Booth #690.

Learn more and explore deployment resources here.

Deepgram Launches Streaming Speech, Text, and Voice Agents on Amazon SageMaker AI

Posted in Commentary with tags on December 1, 2025 by itnerd

Deepgram today announced native integration with Amazon SageMaker AI, delivering streaming, real-time speech-to-text (STT), text-to-speech (TTS), and the Voice Agent API as Amazon SageMaker AI real-time endpoints, no custom pipelines or orchestration required. Teams can now build, deploy, and scale voice-powered applications inside their existing AWS workflows while maintaining the security and compliance benefits of their AWS environment.

Native streaming via Amazon SageMaker endpoints means no workarounds or hoops to jump through, just clean, real-time inferences through the SageMaker API. The integration enables sub-second latency and enterprise-grade reliability for high-scale use cases like contact centers, trading floors, and live analytics.

Built to run on AWS, the solution supports streaming responses via InvokeEndpointWithResponseStream and keeps data within AWS. Customers can deploy Deepgram in their Amazon Virtual Private Cloud (Amazon VPC) or as a managed service, aligning with stringent data residency and compliance requirements.
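
For a sense of what the streaming call looks like in practice, the sketch below assembles the request parameters for boto3's `invoke_endpoint_with_response_stream` (a real `sagemaker-runtime` operation). The endpoint name and the JSON body shape are placeholders; the actual input schema is defined by the deployed Deepgram container.

```python
import json

# Hypothetical endpoint name and payload for illustration only.
request = {
    "EndpointName": "deepgram-stt-realtime",
    "ContentType": "application/json",
    "Body": json.dumps({"action": "start", "encoding": "linear16"}),
}

# With AWS credentials configured, the streaming invocation would be:
#   import boto3
#   runtime = boto3.client("sagemaker-runtime")
#   resp = runtime.invoke_endpoint_with_response_stream(**request)
#   for event in resp["Body"]:                 # EventStream of PayloadPart events
#       chunk = event["PayloadPart"]["Bytes"]  # incremental response bytes
```

Because the response arrives as an EventStream of payload parts, the client can act on partial transcripts as they are produced instead of waiting for a complete response, which is what makes sub-second interactive latency possible on a standard SageMaker endpoint.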

The integration is also backed by a strong relationship with AWS. Deepgram is an AWS Generative AI Competency Partner and has signed a multi-year Strategic Collaboration Agreement (SCA) with AWS to accelerate enterprise adoption.

The integration is available to customers building on AWS, with live demonstrations planned at AWS re:Invent in Las Vegas, December 1–5, 2025, in Deepgram Booth #690. Learn more about our AWS partnership and technical implementation on Deepgram’s AWS partner page, and read the AWS blog: “Introducing bidirectional streaming for real-time inference on Amazon SageMaker AI.” 

Deepgram Launches Flux – The World’s First Conversational Speech Recognition Model 

Posted in Commentary with tags on October 2, 2025 by itnerd

Deepgram, the world’s most realistic and real-time Voice AI platform, today announced from VapiCon 2025 the launch of Flux, the world’s first conversational speech recognition (CSR) model designed specifically for real-time voice agents. Unlike traditional automatic speech recognition (ASR), which was built for passive transcription use cases like captions or meeting notes, Flux is trained to understand the nuances of dialogue. It doesn’t just capture what was said. It knows when a speaker has finished, when to respond, and how to keep the flow of conversation natural and engaging.

The global voice AI agents market is projected to reach nearly $47.5 billion by 2034, growing at a compound annual rate of about 34.8%. This growth is primarily due to the enterprise shift toward automated customer self-service, smarter agent assist tools, and embedded conversational experiences across industries. But traditional STT systems weren’t designed to participate in live dialogue. To recreate conversational flow, developers have been forced to piece together transcription, voice activity detection, and turn-taking logic — a patchwork that leads to latency, errors, and frustrating user experiences.

Flux eliminates these problems by embedding turn-taking directly into recognition. It transforms speech recognition from a passive recorder into an active conversational partner. This provides developers with the tools to build responsive, human-like voice agents without the complexity of workaround code or endless threshold tuning.

What Flux Delivers:

  • Embedded turn-taking intelligence – Conversation-aware recognition that handles timing inside the model itself, with context-aware turn detection and native barge-in handling for fluid exchanges.
  • Lightning-fast performance – Ultra-low latency where it matters most with ~260ms end-of-turn detection, plus distinct events to support eager response generation before a turn is complete.
  • Simpler development – Turn-complete transcripts and structured conversational cues replace fragile client-side logic, so teams can ship production-ready agents in weeks, not months.
  • Enterprise-ready scalability – Nova-3 level accuracy, GPU-efficient concurrency with 100+ streams per GPU, and predictable costs that avoid the hidden overhead of bolted-on systems.
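
The "distinct events to support eager response generation" pattern above can be sketched as a small dispatcher: start drafting a reply when the model signals a probable end of turn, cancel if the speaker resumes, and commit only on a confirmed end of turn. The event names used here (EagerEndOfTurn, TurnResumed, EndOfTurn) and the message fields are illustrative assumptions; match them to the event types in Deepgram's Flux documentation.

```python
import json

def handle_flux_message(raw: str, agent) -> None:
    """Route one Flux server message to agent callbacks.

    Event names and fields are illustrative stand-ins, not a
    definitive rendering of the Flux message schema.
    """
    msg = json.loads(raw)
    event = msg.get("event")
    if event == "EagerEndOfTurn":
        # Speaker has probably finished: start drafting a reply early.
        agent.start_draft(msg.get("transcript", ""))
    elif event == "TurnResumed":
        # Speaker kept talking: discard the speculative draft.
        agent.cancel_draft()
    elif event == "EndOfTurn":
        # Turn confirmed complete: commit and speak the response.
        agent.commit(msg.get("transcript", ""))
```

The speculative draft/cancel/commit structure is what lets an agent hide LLM and TTS latency behind the tail end of the user's turn instead of adding it afterward.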

Who It’s For: 

  • Voice AI builders – Developers, engineering leads, and AI teams creating real-time agents.
  • Enterprise innovators – Leaders modernizing customer experience with agent assist and conversational AI platforms.
  • Ecosystem partners – Platform providers, consultancies, and cloud architects looking to integrate CSR into larger AI stacks.

Flux is generally available (GA) today. Developers can start building with CSR immediately.

To celebrate the launch, Deepgram is announcing OktoberTalk – making Flux FREE to use for the entire month of October. Developers can use Flux to build and test real-time voice agents at no cost, with support for up to 50 concurrent connections. The goal: remove every barrier to experimentation so teams can experience how conversational speech recognition changes what’s possible in voice AI. 

Abby Connect Scales Personalized Service and Launches AI Receptionist with Deepgram’s Real-Time Speech-to-Text

Posted in Commentary with tags on September 29, 2025 by itnerd

Deepgram today announced that Abby Connect, a premier virtual receptionist service, has successfully launched its new AI Receptionist product line built on Deepgram’s real-time speech-to-text technology. By choosing Deepgram, Abby Connect is scaling its high-touch customer experience while meeting the demanding needs of industries such as law, healthcare, and home services.

For more than 20 years, Abby Connect has built its reputation on creating a warm, human first impression for every call. But scaling that personal service 24/7 – while managing rising client demand and costs – presented a major challenge. Abby Connect turned to Deepgram to help strike the right balance between efficiency and empathy.

Why Abby Connect Chose Deepgram 

After evaluating Google Cloud Speech-to-Text, AWS Transcribe, AssemblyAI, and Whisper, Abby Connect found Deepgram’s performance to be unmatched:

  • Accuracy in the Real World – Deepgram outperformed competitors on noisy calls, including from HVAC job sites.
  • Low Latency for Natural Conversations – Sub-300ms streaming latency enabled real-time, two-way AI dialogue without delays.
  • Ease of Integration – Developer-friendly APIs and transparent pricing simplified rollout.
  • Domain Customization – Tuned for industry-specific terminology, from legal to medical.

Results Delivered

By leveraging both Deepgram’s real-time and pre-recorded transcription APIs, Abby Connect achieved measurable results:

  • New AI Receptionist Product Line – Successfully launched, automating repetitive call types like scheduling and FAQs.
  • 5x Boost in QA Productivity – Quality assurance teams now review five times more calls per day.
  • 30% Reduction in Audit Time – Faster reviews mean stronger agent coaching and more consistent service.
  • Scale to 100,000+ Calls per Month – Deepgram reliably transcribes massive call volumes to power both AI and human workflows.

Abby Connect is now exploring how to extend Deepgram-powered transcription into even more advanced conversational AI, including large language models trained on call data to detect intent, measure sentiment, and enable smarter escalations.

To learn more, please read the Abby Connect case study found here: https://deepgram.com/customers/abby-connect