xAI Launches Voice Agent Builder at $0.05 Per Minute, Undercutting ElevenLabs and Vapi on Price

By Scott Allan 6h ago 3 min read

Elon Musk’s AI company just cut the cost of building AI phone agents to roughly one-third to one-fifth of what incumbents charge, and the timing is designed to pull enterprise customers away from those incumbents before they lock in annual contracts.

xAI launched the Grok Voice Agent Builder in beta on July 1, a no-code platform that turns a plain-language description of a phone call into a working voice agent in roughly two minutes. The platform bundles telephony, document retrieval, tool-calling, guardrails, and call observability into a single speech-to-speech interface, as eesel AI detailed in its analysis. At $0.05 per minute of agent audio plus $0.01 per minute for telephony, or $0.06 per minute combined, the pricing comes in well below the per-minute rates of established voice AI vendors.

Why the Pricing Matters More Than the Product

The voice AI market has been a fragmented, expensive mess. Most enterprise deployments require stitching together three separate vendors: a speech-to-text provider, a language model, and a text-to-speech engine. Each adds latency, each bills separately, and the integration overhead means that deploying a single AI phone agent can cost hundreds of engineering hours before a single call is made.

xAI is collapsing that stack into one platform at a price point that makes the three-vendor model look economically irrational. Let’s Data Science reported that the platform supports over 25 languages and delivers sub-second response times, built around Grok’s native voice model rather than a bolted-together pipeline.

The competitive implications are immediate. ElevenLabs charges roughly $0.18 to $0.30 per minute for comparable voice synthesis. Vapi, which has raised significant venture capital on the premise that voice AI infrastructure is a platform business, charges per-minute rates that are multiples of what xAI is offering. At $0.06 total per minute (audio plus telephony), xAI is not competing on features. It is competing on cost destruction.
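The arithmetic behind that comparison can be sketched in a few lines of Python. This is a rough illustration, not a pricing calculator: the rates are the figures quoted in this article, the function names are made up for the example, and it assumes the ElevenLabs range is directly comparable to xAI's combined rate, which the quoted figures do not confirm (they cover voice synthesis, and may or may not include telephony).

```python
# Per-minute rates in USD, as quoted in this article.
XAI_AUDIO = 0.05
XAI_TELEPHONY = 0.01
ELEVENLABS_RANGE = (0.18, 0.30)  # "roughly", per the article


def xai_total_per_minute() -> float:
    """Combined xAI rate: agent audio plus telephony."""
    return round(XAI_AUDIO + XAI_TELEPHONY, 2)


def price_ratio(competitor_rate: float) -> float:
    """How many times more expensive a competitor rate is than xAI's."""
    return competitor_rate / xai_total_per_minute()


def monthly_cost(rate_per_minute: float, minutes: int) -> float:
    """Usage-metered cost for a given number of call minutes."""
    return rate_per_minute * minutes


if __name__ == "__main__":
    low, high = ELEVENLABS_RANGE
    print(f"xAI combined rate: ${xai_total_per_minute():.2f}/min")
    print(f"ElevenLabs is about {price_ratio(low):.1f}x to {price_ratio(high):.1f}x that rate")
```

On these numbers the gap works out to roughly three to five times, which is where the "one-third to one-fifth" framing comes from.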

The No-Code Play

The builder’s no-code interface is the second strategic lever. NewsBytesApp confirmed that setup requires no programming: users write a natural-language description of how calls should flow, attach documents and tools, configure guardrails, and the agent is live. Integration hooks include Gmail, Outlook, Google Calendar, Linear, Notion, and OneDrive.

This matters because the voice AI market’s current bottleneck is not technology. It is implementation. Companies know they want AI phone agents for customer support, appointment scheduling, lead qualification, and inbound sales. They have been unable to deploy them at scale because the engineering cost of building and maintaining a multi-vendor voice stack exceeds the labor savings for all but the highest-volume call centers.
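That break-even logic can be written down as a simple formula. The sketch below is an assumption-laden illustration: the function name and its parameters are hypothetical, and the article gives no figures for build cost or human call-handling cost, so any values passed in are the reader's own estimates.

```python
from typing import Optional


def breakeven_minutes(
    build_cost: float,
    human_rate_per_min: float,
    agent_rate_per_min: float,
) -> Optional[float]:
    """Call minutes needed before per-minute savings repay the up-front build cost.

    All three inputs are estimates supplied by the caller (hypothetical
    parameters, not figures from the article). Returns None when the agent
    is not cheaper per minute than the human baseline, because the build
    cost is then never recovered.
    """
    saving_per_min = human_rate_per_min - agent_rate_per_min
    if saving_per_min <= 0:
        return None
    return build_cost / saving_per_min
```

The shape of the formula is the point: the larger the up-front engineering cost, the more call volume is needed to recover it, which is why only the highest-volume call centers clear the bar with a multi-vendor stack. A platform that pushes the build cost toward zero pushes the break-even volume toward zero with it.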

A no-code platform at $0.06 per minute changes that calculus for every mid-market company that answers phones. The total addressable market is not the enterprise voice AI segment. It is every business with a phone line and a desire to reduce call handling costs.

The xAI Ecosystem Angle

For xAI, which has been described as a “money furnace” burning through capital on Grok development and infrastructure, the Voice Agent Builder represents something new: a product with direct, per-minute revenue attached to it. Unlike the Grok chatbot, which generates revenue indirectly through X Premium subscriptions, voice agents create a usage-metered revenue stream that scales linearly with adoption.

The beta launch is also a data play. Every voice agent interaction generates training data that improves Grok’s voice model, creating a flywheel where lower prices drive higher volume, which generates more data, which improves the model, which justifies even lower prices. It is the same playbook that AWS used to dominate cloud computing, and it only works if you are willing to price below cost in the early innings.

What Incumbents Need to Worry About

xAI’s own benchmark claims are worth noting with caveats. The company says its Grok Voice Think Fast 1.0 model scores 67.3% on its self-administered tau-voice Bench, ahead of Gemini 3.1 Flash Live and GPT Realtime 1.5. Self-administered benchmarks are marketing, not science. But the benchmark claim signals that xAI believes its voice model is competitive on quality, not just price, and that it is willing to make that argument publicly.

For ElevenLabs, Vapi, and the constellation of voice AI startups that have raised venture capital on the assumption that voice infrastructure is a defensible platform business, xAI’s entry introduces a brutal pricing dynamic. When a well-capitalized competitor enters your market at one-third to one-fifth of your price point with a no-code interface, the moat you thought you had turns out to be a speed bump. The question is whether the incumbents can differentiate on quality, customization, and enterprise support fast enough to justify the premium, or whether price wins the way it usually does in infrastructure markets.