What's Working in Vertical Voice AI
With Mike Droesch, Partner at Bessemer Venture Partners
Mike Droesch is a Partner at Bessemer Venture Partners, where he’s spent nearly nine years investing across vertical SaaS, B2B marketplaces, supply chain, and — for the last two — leading the firm’s voice AI roadmap. He backed VAPI, which just closed its Series B at a $500M valuation. BVP is also in Abridge in healthcare, Rilla in home services, and Axiamatic in enterprise transformation. Before venture, Mike was an engineer and management consultant. He’s one of the most methodical vertical-specific roadmap builders in the business — tune in to learn what he’s seeing in Vertical Voice AI.
Check out our latest episode of Verticals with Mike and continue below for our breakdown of the landscape and the thesis that stuck with us most: voice as the new process mining, and the fuel for future Vertical AI defensibility.
A quick word from our sponsor on the Verticals podcast, Parafin. They’ve surpassed $100M in revs, extending >$25B in financing to their customers’ customers. They power embedded capital for platforms like DoorDash and Jobber, and are purpose-built for vertical platforms. Click here to explore a custom program for your platform today.
Today’s Episode
Voice AI has produced some of the fastest-growing vertical startups over the last two years. It’s also addressing some of the biggest open questions around application-layer defensibility. The infrastructure to make a voice agent that sounds human and can call tools is increasingly commoditized. So where does the moat come from? And for founders building voice-first products — what separates a feature from a company?
Mike has a view on this, shaped by two years of running Bessemer’s voice AI roadmap and many more backing companies across the stack — from VAPI at the platform layer to Abridge and Rilla at the application layer. The conversation covered the full landscape: build vs. buy decisions in voice infrastructure, why regulated industries adopted fastest (counterintuitively), where voice is overdone vs. underdone by vertical, and AI-native monetization models.
The most trenchant insight, and the one most relevant to how we think about defensibility at Euclid, is Mike’s thesis on voice as a fundamentally new data layer. Your system of record captures end states. Voice captures everything upstream: how decisions actually get made, how processes actually work, what sentiment and context look like in real time. That data has never existed in structured form before. Perhaps it’s never even been written down. And in a world where AI can actually make sense of it, the companies that capture it first will build something their competitors can’t replicate.
The Voice Stack: Build vs. Buy
When Mike started the voice roadmap two years ago, the impetus wasn’t platforms — it was the fact that vertical voice agents in home services, restaurants, and healthcare were starting to see incredible traction. But when his team dug in, they found something surprising: standing up a production-grade voice infrastructure stack required a 10-plus person engineering team full-time. That’s what made the VAPI thesis so powerful.
If voice isn’t your core product (say you’re a ServiceTitan extending into a voice customer service agent, or a bank building an internal training tool), you’re going to reach for a developer platform rather than staff an entire engineering team around infrastructure that isn’t your differentiator. À la Twilio. But if voice is your core product, you probably want to control every detail of the stack. Mike sees plenty of voice startups doing exactly that.
Where VAPI has found particular resonance is regulated industries: healthcare, insurance, financial services. These are verticals that need discrete conversational eval gates — validate the customer’s birthday before releasing health information, for instance — with strict governance around every leg of the interaction.
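To make the idea of a conversational gate concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not VAPI’s API or any vendor’s implementation: `CallSession`, `verify_identity`, and `release_health_info` are hypothetical names. The point is simply that the sensitive tool refuses to run until the verification step has passed, and that every leg of the interaction leaves an audit entry.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class CallSession:
    """Hypothetical per-call state (an assumption, not a VAPI structure)."""
    caller_id: str
    dob_on_file: date
    identity_verified: bool = False
    audit_log: list = field(default_factory=list)


def verify_identity(session: CallSession, stated_dob: date) -> bool:
    """Gate step: compare the birthday the caller states with the one on file."""
    session.identity_verified = stated_dob == session.dob_on_file
    session.audit_log.append(("verify_identity", session.identity_verified))
    return session.identity_verified


def release_health_info(session: CallSession, record: str) -> str:
    """Tool the agent may call; it refuses until the gate has passed."""
    if not session.identity_verified:
        session.audit_log.append(("release_health_info", "blocked"))
        return "I need to verify your date of birth before I can share that."
    session.audit_log.append(("release_health_info", "released"))
    return record
```

In this sketch the gate lives in ordinary code rather than in the prompt, so the model cannot talk its way past it, and the audit log gives a compliance team something to review after the call.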
The Trust Gap
The technology bottlenecks in voice have largely been solved. End-to-end turn latency is down to roughly 500ms, snappy enough to feel close to a natural conversation. LLMs support tool calling well. The bottleneck now is trust.
Mike shared a telling anecdote: two years ago, a roofing company with a $30K average deal size wouldn’t touch voice agents. Every lead was potentially worth $30,000 — they were going to staff a human every time. Today, that same company would adopt. What changed wasn’t the underlying technology. It was the eval infrastructure. As eval suites get more reliable, businesses extend more autonomy to agents. More autonomy means more workflows handled end-to-end.
There is a simple implication for founders: don’t build a voice agent company unless you can own the customer’s entire journey — not just have a delightful conversation and hand it off to a human. If you can’t fully resolve the query, you might actually be delivering your customer more work. Scheduling is a good example of something low-stakes enough that businesses will fully trust an agent. The question is whether you’re in a use case where you can push into higher-dollar, higher-stakes territory. If the answer is no, or if you can’t articulate how you get there, the step-function scalability that makes voice agents venture-interesting might not be there.
Voice as the New Process Mining
Here’s the thesis that resonated most with how we think about the future of vertical defensibility.
Your system of record — your CRM, your ERP, your EHR — captures the end state. The deal closed. The claim resolved. The migration shipped. It tells you what happened. As Mike put it, it doesn’t capture any of the discussion about how you got there. Voice does.
Consider what that upstream context actually contains. In a sales call, it’s not just what the rep said; it’s how they said it. How much time they spent listening versus talking. How well they read the prospect’s sentiment. In Rilla’s case, that data is the foundation of a coaching product for home services reps that couldn’t exist without voice. But the insight goes way beyond sales.
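To make the listening-versus-talking signal concrete, here is a rough sketch of how it could be computed from a diarized transcript. This is not Rilla’s method. It assumes the transcript has already been split into hypothetical `(speaker, start_seconds, end_seconds)` segments, and the function names are ours.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum speaking time in seconds per speaker from (speaker, start, end) tuples."""
    totals = defaultdict(float)
    for speaker, start, end in segments:
        # Ignore malformed segments whose end precedes their start.
        totals[speaker] += max(0.0, end - start)
    return dict(totals)


def talk_share(segments, speaker):
    """Fraction of total speaking time taken by one speaker (0.0 if no speech)."""
    totals = talk_time_by_speaker(segments)
    total = sum(totals.values())
    return totals.get(speaker, 0.0) / total if total else 0.0


# Hypothetical example call: the rep talks for 40 seconds, the prospect for 60.
example_segments = [
    ("rep", 0.0, 30.0),
    ("prospect", 30.0, 50.0),
    ("rep", 50.0, 60.0),
    ("prospect", 60.0, 100.0),
]
rep_share = talk_share(example_segments, "rep")
```

For the example call, `rep_share` comes out to 0.4, meaning the rep spent 60% of the conversation listening. None of this exists in a CRM record of the closed deal, which is the gap the thesis points at.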
Mike pointed to Axiamatic, which raised $54M from Greylock and Bessemer to apply this to enterprise transformations. When a company runs an SAP migration, consultants run dozens of parallel workshops where stakeholders describe how business processes actually work — not how the spec says they should. Traditionally, one person was taking notes. Now every session is transcribed and fed to agents that build a migration plan no single human could assemble, because no single human could sit in on all those workshops simultaneously.
Then there’s Qualitate, an AI-native expert network that uses voice agents to conduct primary research at scale. Mike shared an example of a corporate buyer running an M&A process that commissioned 200 expert network calls over a single weekend. By Monday morning, the buyer had fully compiled, structured intelligence in hand, work that would have taken an analyst team weeks. Qualitate has now conducted over 350,000 minutes of AI-moderated expert discussions, and every conversation trains the next one.
This maps directly to how we think about moats. In “Dude, Where’s My Moat?” we argued that usage and data loops are the most critical moats at the growth stage. Voice dramatically widens the aperture of what those loops can capture. The keyboard was always a bottleneck — low bandwidth, high friction, lossy. Voice is high bandwidth, low friction, and captures signal that was previously trapped in someone’s head. We explored some of these dynamics in our piece on voice-first playbooks in vertical AI, and Mike’s portfolio is proving the thesis out.
Use Cases That Didn’t Exist Before
What excites Mike most isn’t the customer service replacement stuff — he views that as table stakes at this point. It’s the use cases that flat-out were not possible before voice AI. Recruiting interviews at scale. Sales reps logging mandatory training hours against an AI opponent before ever touching a real customer. Mixed-modality support where an agent watches your cursor navigate an app while talking you through it. Construction inspections where a roofer talks to an app and takes photos simultaneously, stitching together real-world data that was never captured digitally.
And then there’s the ambient layer. Every meeting recorded by Granola. Mike mentioned a CEO who wears an always-on wrist recorder and feeds his entire day into Claude. We’re moving from voice as a channel to voice as a continuous intelligence layer — and the companies that figure out how to capture and compound on that context will own something no system of record ever could.
The Takeaway for Vertical Founders
Defensibility in Voice AI comes in layers: infrastructure is table stakes, distribution (developer community, organic flywheel) is the near-term differentiator, and the long-term moat is the compounding loop of conversations → better evals → more agent autonomy → more conversations. True data gravity.
For founders building voice-first Vertical AI products, the implication is direct. If your product touches a workflow where conversations happen (sales, consulting, intake, support, training, field work), the voice data layer isn’t a feature. It’s the foundation of your long-term defensibility. The system of record captures the scoreboard. Voice captures the game. Or perhaps another analogy is better: in an era where AI can watch the film, the company with the best footage wins.
Jump to Key Moments from the Episode
00:00 Why Voice AI could transform software forever
03:15 The infrastructure powering the Voice AI boom
06:04 Why healthcare and finance are adopting Voice AI fast
11:15 The hidden moat behind Voice AI platforms
15:34 Why humans still matter in AI voice workflows
19:45 Voice AI use cases nobody saw coming
24:26 The real future of AI interfaces
29:11 Which Voice AI markets are overcrowded… and which aren’t
34:41 How AI pricing models are changing software economics
37:53 What investors actually look for in AI startups now
41:32 The biggest mistake Voice AI founders make
42:55 Why owning the entire workflow matters
Subscribe to Verticals to get new episodes every week, available wherever you watch or listen.


