Conversational voice agents may represent the most significant opportunity in Vertical AI today. While many early voice AI breakouts have focused on speech-to-text transcription use cases (Abridge in medical scribes, Rilla in sales coaching, etc.), we believe the largest long-term opportunities will emerge from conversational, speech-to-speech uses. We are not bearish on transcription by any means. But conversational voice minimizes friction by doubling down on our most natural form of interface, and it promises to redefine our relationship with software altogether. As AI whittles away at UI, speech-to-speech will unlock substantial new market opportunities, even if its adoption curve lags behind other generative use cases today.
Quality, trust, and reliability are often cited as the biggest hurdles for conversational voice agents. Common customer complaints stem from performance and reliability issues, including latency, hallucinations, and dropped calls. Turn-taking in everyday human conversation is fast: median response gaps between speakers are often reported to be under 300ms.1
People love phones—they’re universally accessible, and Apple et al. have spent decades building an optimized human experience—but they are still imperfect as far as business solutions go. According to a recent Deepgram survey,2 80% of enterprises use some form of voice agent—mostly traditional IVR or newer AI-powered SaaS. Only 21%, however, are “very satisfied” with them.
New native voice models have addressed many of these challenges. Early AI voice pipelines worked as follows: speech is transcribed into text, an LLM generates a text response, and finally, speech is synthesized from that response. Each step added delay, vocal cues were lost, and performance generally suffered. And while flashy demos enticed many buyers to experiment, early Voice AI risked the same churn challenges that mainline consumer AI products have faced.3
Native voice models show substantial progress in handling many of these UX issues. They work with audio directly, allowing them to pick up on pronunciation, speed, tone, and other nuances that only direct audio processing can capture. Additionally, newer models have much higher contextual awareness: they can listen and speak simultaneously, retain information from earlier in the conversation on the fly, and interpret the meaning behind spoken words without a text intermediary.
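To make the latency math concrete, here is a minimal toy sketch of the cascaded pipeline described above (every function name and latency figure is hypothetical, standing in for real streaming providers). Because each stage blocks on the one before it, per-turn latency is the sum of the stages:

```python
import time

# Toy stand-ins for real providers; the sleep calls simulate plausible
# per-stage processing latency (illustrative numbers only).
def speech_to_text(audio: bytes) -> str:
    time.sleep(0.30)                      # transcription latency
    return "caller asks to reschedule"    # tone, pace, emphasis are lost here

def generate_reply(transcript: str) -> str:
    time.sleep(0.40)                      # LLM generation latency
    return "Sure, what day works for you?"

def text_to_speech(reply: str) -> bytes:
    time.sleep(0.25)                      # synthesis latency
    return b"<reply audio>"

def cascaded_turn(audio: bytes) -> bytes:
    """Each stage blocks on the previous one, so latencies add."""
    return text_to_speech(generate_reply(speech_to_text(audio)))

start = time.time()
cascaded_turn(b"<caller audio>")
print(f"turn latency: {time.time() - start:.2f}s")
# ~0.95s end-to-end: roughly triple the ~300ms response gap humans
# expect. Native speech-to-speech models avoid this stacking (and the
# loss of vocal cues) by consuming and emitting audio in one model.
```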
While native voice models themselves continue to improve, the underlying AI infrastructure is also making rapid progress. New voice-specific developer platforms—layered on top of native voice models—provide startups with the tooling to build and scale high-quality conversational voice agents, abstracting away the complexity of optimizing models and infrastructure. Product and engineering resources can be focused on customer and vertical-specific needs rather than painstakingly fine-tuning an imperfect voice model to fit a use case.
While not a one-to-one comparison, the impact of developer tools like Twilio and Plaid in enabling application-layer innovation is a reasonable analogy.
The Conversational Voice Advantage
Early conversational voice agents also possess a unique adoption advantage: despite the phone’s shortcomings, phone-based communication remains mission-critical for nearly every vertical market. Our friends at Bessemer covered this well in their piece on Voice AI:
Across industries such as healthcare, legal services, home services, insurance, logistics, and others, businesses rely on phone-based communication to convey complex information more effectively, provide personalized services or advice, handle high-value transactions, and address urgent, time-sensitive needs.
Yet a vast majority of calls go unanswered. For instance, SMBs miss 62% of their calls on average, missing out on addressing customer needs and winning more business.
Additionally, unlike AI implementations that demand significant systems integration or workflow redesign, voice agents can seamlessly join any phone-based communication. Founder enthusiasm is strong and growing. Companies building with voice represented 22% of the most recent YC class, with nearly all focused on vertical applications.4 This concentration isn't coincidental—it reflects the reality that voice works best when deeply embedded in industry-specific workflows and processes.
From a market perspective, we’ve seen a few clear, consistent patterns emerge in voice agent adoption:
Phone-Centric: A vertical market where the phone is the preferred medium for customer-facing sales, support, ops, etc.
Measurable Outcomes: Calls are constrained in length and complexity, with a pre-defined range of outcomes.
Revenue-Generating: Customers can book more business, collect more money, or save on back-office costs.
You're not alone if you think such patterns can apply to almost any vertical market. We’ll dive into some deeper characteristics of early winning playbooks and future opportunities on the horizon.
Voice as the Wedge
In our piece earlier this year on emerging playbooks in Vertical AI, we summarized the three critical ingredients for a successful AI-native vertical wedge product:
Serving as the authoring layer (i.e., launch point) for a valuable internal workflow.
Powering an essential workflow that unlocks revenue, perhaps agnostic to peripheral systems to start, but progressively absorbing downstream systems.
Becoming the highest engagement application, leveraging usage to build a moat.
Given voice is a modality—not a business model—it shouldn’t surprise you that we believe this same framework applies to voice AI. We’ve seen successful voice agents operate in three primary flavors of authoring layer:
Inbound: Voice agents prevent lost revenue by answering every call. Overflow or after-hours use cases lower the adoption hurdles. Authors: Lead / account objects in the system of record.
Outbound: Voice agents enable businesses to conduct hundreds of concurrent outbound calls for lead qualification, appointment setting, and customer follow-up. Authors: Lead / account objects in the system of record (which may even generate prospects); calendar objects in the business management system or ERP analog.
Back-Office: Voice agents streamline internal operations by automating routine communication tasks like vendor coordination, appointment confirmations, and status updates. Authors: Lead / account objects in the system of record; POs and other objects in the procurement system of record; calendar objects in the business management system or ERP analog.
One critical element we’ve seen across these playbooks is that domain experience and maniacal focus on the vertical customer workflow are essential product differentiators. Sitting at the nexus of LLM model development, industry-specific innovation, and consumer behavior change requires a unique combination of talents from successful founding teams. And in truth, some of these initial voice applications are such a breath of fresh air for buyers—“no more cancellations,” “my calendar is full all of a sudden,” “no missed calls”—that some are growing, fast.5 This obscures, in our view, a coming wave of competition and commoditization. In the short and medium term, startups will win on workflow adaptation, performance, and UX, but there is little long-term defensibility in those components—and it’s unclear to what extent agentic architectures will be moats. Of course, growth (and capital) can win the right to find defensibility over time, so we don’t discount the value of being early and moving fast.
In vertical software, our data shows that domain expertise correlates strongly with long-term success. There’s a strong case to be made that this correlation will only grow as AI matures. Founders steeped in their vertical understand the practical pain points, underlying workflows, compliance requirements, stakeholder mindsets, historical frustrations, semantic norms, and other market-specific nuances. That advantage manifests in startup product decisions, go-to-market segmentation, account management interactions, pricing, and partnership priorities. Beyond driving adoption, this domain expertise is key to achieving workflow defensibility over time.
While our thinking is in utero here, we have been considering the signals indicative of vertical alignment in voice AI. At a high level, vertically aligned voice agents—like successful Vertical SaaS before them—should leverage what is unique to their industry to build differentiation: parsing specialized vocabulary or documents with context generated from first-party data; understanding industry-specific workflows, with discrete rules-based or deterministic models underlying certain decisions; and integrating deeply, perhaps even partnering, with incumbent systems of record. For example, a voice agent for trucking logistics may use off-the-shelf live voice infrastructure but access proprietary models for route optimization or regulatory requirements. Vertical focus allows companies to achieve the reliability, simplicity, and efficacy thresholds required for high-stakes interactions by deeply understanding their target market's specific edge cases and requirements.
The Voice Agent Playbook
The most ambitious Vertical Voice startups aren't stopping at single use cases. They're using voice as a wedge to capture broader platform opportunities within their target markets. From our earlier piece on Vertical AI playbooks:
We see AI wedges as—ideally—the launch point for a particular high-value workflow. A motion that doesn’t just automate something; it results in an output that will be used and increasingly relied upon by the business. We call this species of wedge solution the “Authoring Layer.” Because it is the first step in a process, it often does not require substantial integrations with pre-existing SaaS. It creates something that a human would’ve created—documentation, notes, a lead, an appointment—and hands it off to another system as a human would have.
Ideal voice wedges should naturally become coordination hubs, where the “authoring layer” triggers downstream actions across multiple systems. For example, a healthcare voice agent doesn't just take a booking—it writes the appointment to the PMS, updates the EHR, sends reminders, manages waitlists, coordinates with insurance verification systems, and perhaps even supports care management through outbound calls and texts.
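As a rough sketch of what that coordination-hub pattern could look like in code (the object shape and handler names are hypothetical, with print statements standing in for real PMS, EHR, messaging, and eligibility API calls):

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Appointment:
    patient_id: str
    provider_id: str
    start_time: str  # ISO-8601 timestamp

# Downstream handlers; in practice each would call the real PMS, EHR,
# messaging, and eligibility APIs of whatever systems the practice runs.
def write_to_pms(appt: Appointment) -> None:
    print(f"PMS: booked {appt.start_time} with {appt.provider_id}")

def update_ehr(appt: Appointment) -> None:
    print(f"EHR: visit note stubbed for {appt.patient_id}")

def send_reminder(appt: Appointment) -> None:
    print(f"SMS: reminder queued for {appt.patient_id}")

def verify_insurance(appt: Appointment) -> None:
    print(f"Eligibility: coverage check started for {appt.patient_id}")

HANDLERS: list[Callable[[Appointment], None]] = [
    write_to_pms, update_ehr, send_reminder, verify_insurance,
]

def on_appointment_authored(appt: Appointment) -> None:
    """Fan the single authored object out to every downstream system."""
    for handler in HANDLERS:
        handler(appt)

on_appointment_authored(Appointment("p-123", "dr-9", "2025-07-01T09:00"))
```

The design point: one authored object, many downstream effects. The more systems key off that object, the stickier the authoring layer becomes.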
This strategy follows a predictable pattern from our Vertical AI playbook…
Identify an impactful constraint to revenue that the voice agent can address with sufficient performance.
Build an easy-to-adopt wedge with these characteristics:
Powering an essential workflow that unlocks revenue growth.
Serving as the authoring layer for a valuable internal workflow.
Build trust through high-quality and consistent performance.
Drive adoption of the wedge and integrate with existing systems of record as needed to maximize data ingestion & stickiness.
Leverage the starting point, or authoring layer, as the base unit for a new system of record… the System of Intelligence.
…with some subtle but important differences.
Just as we posited earlier regarding conversational voice, we believe the hurdles for voice agents are quality, reliability, and trust. Unlike transcription or other asynchronous generative AI use cases, the real-time requirements of natural human voice interaction leave little wiggle room for consistent underperformance. Async software-driven customer communications need only be “just good enough,” as humans can review and edit output as they see fit. An unreliable voice product—with much more limited room for human-in-the-loop—will struggle with adoption and retention, no matter the prospective gains when it works.
With voice AI, there is a generalized bet we (and probably most readers here) are making around consumer adoption. Human comfort with voice AI—including the willingness to share information with, purchase from, or make important decisions in concert with a “fake person”—is a behavioral and sociological journey that we are just a few steps into. Success for all voice startups, vertical or not, will require relentless iteration on the core voice experience.
The fruits of that labor on voice AI will in part accrue to infrastructure plays—certainly compute and storage, and probably foundation models too. But as we have seen in every technology epoch, unlocks at the infrastructure layers have exponential tailwinds for the application layer. And at the application layer, infrastructure is a commodity. Regardless of modality, agentic vertical applications will need the same thing vertical software did to succeed: a workflow that solves a hair-on-fire pain point. In voice, this means continuous refinement of the agent's capabilities. That may mean a whole lot more than just adapting text- or UI-based systems: optimizing call completion and containment rates, incorporating vertical data so agents handle industry language properly, reinventing product flows to mirror the cadence of spoken thought, and simply understanding how executing jobs to be done differs in the conversational context.
Building a successful vertical-specific voice product that is not only high-quality and reliable but also trusted by customers to handle critical aspects of their livelihood… is a tall order for startups. But it is necessary to earn the right to grow from “hot point solution” into the systems of intelligence we believe will capture a huge portion of AI-native enterprise value.
A Voice-First Vertical Future
Conversational voice agents represent more than an incremental improvement in communication—they fundamentally change how companies interact with customers and manage operations. More, they enable a new kind of authoring layer that is fertile ground for Vertical AI wedge products. By starting with focused, high-value use cases in specific industries, the best companies can grow quickly and build defensibility as they expand into comprehensive systems of record.
The Vertical Voice AI opportunity is substantially larger than the current scope of phone-based operations would imply. It has the potential to reimagine software from the ground up. Much of the narrative around the value of B2B generative AI to date has centered on labor replacement. Understandably so, since the profit potential is so tangible and immediate for buyers. We posit, though, that the greater impact of AI will be its ability to enhance workforce productivity—two sides of the same coin, but we think the distinction matters.
Functions that can be 100% automated are appealing for obvious reasons, but in the real world—especially the enterprise world—such cases are the minority. For example, Five9 (a leading customer support SaaS platform now wading into AI) shared examples of voice AI cost savings in its latest earnings call:
The first is a fast food chain with over 3,000 restaurants globally. Following the deployment of our AI Agents, they experienced a nearly 40% improvement in containment rate… The second example is a global payment processing provider in the UK. They deployed our AI Agents and experienced a 10% increase in self-service in the first year, as well as a 50% containment rate.6
“Containment rate” is the share of calls completely resolved without human intervention. Five9 considered 50% containment significant enough to brag about in an SEC filing. Five9 may not be the cutting edge of voice AI, but the point stands: humans are necessary for interactions of even moderate complexity, and likely will be for a while. But revenue per employee should inexorably rise. In that case, will labor replacement really be the net outcome?
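To put that metric in concrete terms, a toy back-of-envelope (the call volume below is ours, purely illustrative):

```python
# Illustrative containment arithmetic; the call volume is made up,
# not from Five9's filing.
total_calls = 10_000
containment_rate = 0.50   # share resolved with no human involved

ai_resolved = int(total_calls * containment_rate)
needs_human = total_calls - ai_resolved

print(f"AI fully resolved: {ai_resolved:,}")    # 5,000
print(f"Still reach a human: {needs_human:,}")  # 5,000
# Even at a headline 50% containment rate, half of all calls still
# reach a person: productivity per employee rises, but the human
# doesn't disappear.
```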
If your business could generate and support revenue at a lower cost overnight, the basic profit incentive would be to grow, not fire everyone. So the role of the human doesn’t diminish… it simply changes.
There are additional axes of productivity growth that humans can facilitate. Imagine, for instance, voice-first project management for workers in the field. In construction, foremen, contractors, and employees could manage and coordinate job sites through voice alone. In trade services, technicians could trade verbal observations (perhaps even recorded machine sounds) for detailed issue-resolution steps. Farm workers could describe what they see in their fields to receive real-time guidance on pest identification or crop yield optimization. It’s no surprise many speculated Jony Ive’s collaboration with OpenAI would take the form of a watch, bracelet7, or most recently, sunglasses.8
These platforms would all be integrated with various backends to inform asks and store records—in the near future, some of those sources may be third-party companies with their own agents (and there is a healthy discourse on how that architecture will emerge9). This frees humans from onerous software UIs, at least in daily or line-of-business use. The power of humans and voice AI in combination comes down to bandwidth. Software UI has been a dilatory, constraining second language that limits throughput of thought and action. Many across vertical workforces have barely had time to learn this “language”—what could they do if we empowered them to operate in their native tongue? How many millions will be voice-native users of technology?
We’ll make the call right now: in many scenarios, users will be more comfortable talking to AI than to humans. No judgment, no pressure, no forgotten follow-ups, no “turning it on”… just 24/7, 100% active listening around your needs and nuances. Pure conjecture, but if this behavioral hypothesis proves true, it will have societal implications—some unalloyed goods and others more questionable.
One of the most interesting emerging use cases of Voice AI we have been thinking about centers on multi-party transactions and brokering. Digital marketplaces have struggled to gain traction in many segments, often because of participants’ unwillingness to share margin or the complexity of facilitating the transaction. But a voice-first digital brokering model could unlock value. Parties (both human and agentic) could interact through voice—instead of endless email chains—to move transactions forward. In logistics, for example, voice-first freight matching, negotiation, and dispatching is already live in the market, with clear uptake.
On the SMB side—or in verticals and functions where software adoption is already low due to price, complexity, or lack of options—AI brokering layers could gain particularly fast traction. Like younger generations’ relationship with mobile vs. desktop: why learn a SaaS interface when you could jump straight to voice? Need to buy? Call your procurement “agent,” describe your needs, let it talk to the other side of the network (or other agents) and call you back with offers, handling payments silently. Need info on a product? Need to train your employees on new SKUs? Need to coach them on their client service skills? Just call. If there’s one thing we expect out of Voice AI, it’s a comeback for phone numbers on startup websites.
Another classic challenge of B2B Marketplaces is intent discovery. Consumer-grade marketing rarely captures buying intent when a good isn’t already purchased through channels that ad networks touch. A voice-first GPO (Group Purchasing Organization) could uncover needs and match supply and demand at scale in a way no team of humans could. Imagine 100 reps calling businesses with no qualification, recording every detail of their needs into a CRM, then painstakingly matching them when and if supply materialized. Silly and expensive. Boardy foreshadows how AI can position itself more selflessly and less transactionally, playing for the long game.
We’re calling businesses employing this AI-native, network-driven model “Dark Marketplaces.” We recently backed a founder pursuing a strategy along these lines in retail. But it has broad applicability across verticals, healthcare and travel among them. It could also meaningfully reinvigorate labor marketplaces, especially in highly fragmented, specialized verticals where supply aggregation (and retention) has proven exceptionally difficult. We are working on an upcoming essay on AI-Native B2BMs, so more on that to come.
The Emerging Vertical Voice Framework
These ideas aren’t exhaustive and are as much informed musing as anything. But we share them to illustrate our developing hypothesis on voice AI and its potential to evolve the basic form factor of software. We’ll end with a few potential hallmarks of successful Voice-First Vertical AI:
Reimagine how to solve the pain point voice-first, rather than adding voice to an existing workflow (still solving key data / integration flows on the backend).
Leverage voice's unique capabilities, including emotion detection, hands-free operation, low marginal cost of data input, and smart / on-demand interaction.
Use voice as sensory input, not just a command interface.
Enhance workforce productivity rather than just automating existing workflows.
Lean into information asymmetry to build defensibility, capturing and contextualizing vertical, domain-specific, and tribal knowledge. Our friends at NFX touched on this point in their piece10 and we agree.
Build trust through consistency in high-stakes vertical-specific interactions, building buy-in with multiple stakeholders in an organization (and industry).
Thanks for reading Euclid Insights! Additional sources here.11 If you know a founder thinking through Vertical Voice (or multi-modal Vertical AI generally), we’d love to help. Just reach out via LinkedIn, email, or here on Substack in the comments below.
Meyer (2023). Timing in Conversation. Journal of Cognition.
Francisco (2025). The State of Voice AI 2025. Deepgram.
Data/slide from Altimeter Capital, via:
Goel (2024). State of Voice AI Report. Cartesia.
Palazzolo (2025). VCs Are Keeping an Ear Out for Voice AI Startups. The Information
Five9 (2025). Q1 2025 Earnings Call.
Liberty RPF (2025). 566: The Product Era of AI…. Liberty’s Highlights.
Basu (2025). AI race goes supersonic in milestone-packed week. Axios.
MarkTechPost (2025). Building the Internet of Agents…. MarkTechPost.
Piñol, Mahoney (2025). Voice AI is Working. Here’s Where It Wins First. NFX.
Droesch, Sarycheva, Frost (2024). Roadmap: Voice AI. Bessemer Venture Partners.
Moore (2025). AI Voice Update 2025. A16Z.