
Anthropic and Microsoft in Talks for the Maia 200 Chip: Claude Inference Goes Custom as Nvidia Faces Its First Real ASIC Threat

Anthropic is in talks to run Claude on Microsoft's Maia 200 inference chip, a fourth silicon program that turns up the pressure on Nvidia's inference moat.

By Scott Allan, May 25


Anthropic is in talks to run Claude on Microsoft’s custom Maia 200 inference chip, according to a CNBC report on May 21. The move would make the Claude maker the first frontier AI lab to spread gigawatt-scale workloads across four separate silicon programs at once. For Nvidia, whose stock slipped about 2% the following Friday, this is the story Wall Street has quietly braced for: the biggest AI labs are starting to route real production traffic onto chips that are not Nvidia’s.

The detail that matters is the word inference. Training is where Nvidia’s moat is widest. Inference, the running of a model once it is built, is the high-volume, cost-sensitive workload that custom silicon was designed to eat first.

A Fourth Lane of Silicon

If the talks convert, Anthropic would run Claude across Nvidia GPUs, Amazon’s Trainium, Google’s TPUs, and now Microsoft’s Maia: four distinct chip families under contract at the same time. No frontier lab has done that. CNBC was careful to note that this is still a negotiation, not a signed deal, with discussions in early stages and centered on inference, set against the backdrop of Microsoft’s $5 billion investment in Anthropic. The Information broke the story first; CNBC and others corroborated it.

The strategic logic is blunt. A lab that depends on one supplier for the scarcest input in technology is a lab that does not control its own cost curve. Four suppliers is negotiating power. For Microsoft, the upside is just as concrete: every Claude token served on a Maia chip instead of a rented Nvidia card is margin that stays inside Azure rather than flowing to Santa Clara.

What the Maia 200 Actually Is

Microsoft launched the Maia 200 on January 26 of this year, built on TSMC’s 3-nanometer process with 216GB of HBM3e memory delivering 7 terabytes per second of bandwidth. The headline efficiency claim deserves precision, because it has been widely garbled. The roughly 30% improvement Microsoft touts, the figure CEO Satya Nadella cited on the April earnings call, is in tokens per dollar measured against Microsoft’s own previous fleet silicon. It is not a blanket claim against Nvidia. Microsoft does claim that its FP4 throughput beats Amazon’s Trainium3 and that its FP8 throughput sits above Google’s latest TPU.
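Because the 30% figure is stated in tokens per dollar, it does not translate into a 30% drop in cost per token. A minimal sketch of the conversion follows, assuming the gain applies uniformly across workloads; the function name is illustrative, not anything Microsoft publishes.

```python
def cost_per_token_change(tokens_per_dollar_gain: float) -> float:
    """Fractional change in cost per token for a fractional gain in tokens per dollar.

    Cost per token is the reciprocal of tokens per dollar, so a gain of g
    scales cost by 1 / (1 + g).
    """
    return 1.0 / (1.0 + tokens_per_dollar_gain) - 1.0


# The 30% tokens-per-dollar improvement quoted in the article.
change = cost_per_token_change(0.30)
print(f"Cost per token changes by {change:.1%}")  # Cost per token changes by -23.1%
```

In other words, the same claim read as a cost figure is closer to a 23% reduction than a 30% one.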

Until now, Maia 200 has been publicly tied to running OpenAI’s GPT-5.2. Putting Claude on the same accelerator would be a first, and a notable one given how closely Anthropic and Microsoft, OpenAI’s largest backer, have been circling each other.

The Compute Problem Anthropic Cannot Spend Its Way Out Of

Behind the chip talks is a capacity crisis. Speaking at Anthropic’s developer conference in early May, CEO Dario Amodei said the company had planned for tenfold growth and instead saw growth running at roughly eightyfold on an annualized basis in the first quarter, adding bluntly that “this is the reason we have had difficulties with compute.” That is the context for a spending spree that already includes a deal to pay xAI about $1.25 billion a month, through May 2029, to rent the Colossus 1 supercomputer in Memphis. The roughly $45 billion commitment was disclosed inside SpaceX’s IPO filing.
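The two figures in the xAI deal can be checked against each other. A quick sketch, assuming a flat monthly rate over the whole term, which the reporting cited here does not confirm:

```python
# Figures as reported in the article, in billions of US dollars.
MONTHLY_RENT_USD_BN = 1.25
TOTAL_COMMITMENT_USD_BN = 45.0

# Assumption: the rate is constant, so term length = total / monthly rate.
implied_months = TOTAL_COMMITMENT_USD_BN / MONTHLY_RENT_USD_BN

print(f"{implied_months:.0f} months, or {implied_months / 12:.0f} years")
# 36 months, or 3 years
```

A 36-month term ending in May 2029 is what a $45 billion total implies at that rate.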

This is the same capital intensity that pushed Anthropic into a $30 billion funding round at a $900 billion valuation earlier this month. The valuation is the headline. The compute bill is the reason.

Why This Is Nvidia’s Slow-Motion Nightmare

Nvidia just printed a record quarter, so the threat is not this week’s revenue. The threat is the shape of the next three years. If Maia 200 hits its cost-per-token claim in production, every hyperscaler with its own chip program has a reason to push more inference onto in-house silicon, and the most profitable, highest-volume slice of AI compute slowly migrates away from merchant GPUs. Nvidia keeps training. The margin pressure shows up at the inference layer first, one workload at a time.

Inference is also the faster-growing half of AI compute as models shift from being built to being used at scale, which means the workload now drifting toward custom chips is the one expanding most quickly. That is why a report about early-stage talks, not even a signed contract, was enough to nick Nvidia’s stock. The market is not pricing this quarter. It is pricing the precedent.

The Tell to Watch

Three things will tell you whether this is a real shift or a negotiating leak. Whether the talks convert into a signed, dollar-denominated commitment. Whether Claude actually runs on Maia in production rather than a test cluster. And whether a second frontier lab follows Anthropic onto a hyperscaler’s custom chip. The first lab to diversify off Nvidia at scale sets the template. The rest of the industry is watching the same tape Nvidia is.