OpenAI and Broadcom Unveil Jalapeño, a Custom Inference Chip That Cuts AI Costs by 50 Percent

By Scott Allan 3h ago 3 min read

OpenAI just revealed its first piece of custom silicon, and it is aimed squarely at the economics that have made running large language models ruinously expensive. The chip, called Jalapeño, was co-developed with Broadcom and promises roughly 50 percent cost savings over the merchant GPUs that currently dominate inference workloads.

Nine Months From Blank Sheet to Tape-Out

The speed of the development cycle is the headline within the headline. TechCrunch reported on Tuesday that Jalapeño went from initial design to manufacturing tape-out in just nine months, a timeline Broadcom CEO Hock Tan called the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors. OpenAI said its own AI models were used to accelerate portions of the chip design process, a claim that, if validated at scale, has implications well beyond this single product.

The chip is a reticle-sized ASIC, meaning it pushes right up against the maximum die area that current lithography equipment can expose in a single shot. That is the same class of ambition that defines Nvidia’s Blackwell and AMD’s Instinct MI350: go as large as the physics will allow, then optimize everything inside that envelope for a single workload.

The Business Case: Follow the Money

The 50 percent cost reduction figure, cited by Broadcom in its investor release, compares Jalapeño to “typical AI graphics processing units” on inference tasks. That is a deliberately broad benchmark, and the production reality will depend on workload mix, utilization rates, and how quickly OpenAI can ramp deployment. But the directional signal is unmistakable: OpenAI is building hardware economics into its competitive moat.

The timing is not accidental. OpenAI confidentially filed its S-1 with the SEC earlier this month, and a potential public listing as early as September 2026 would land better with investors if the company can demonstrate a credible path to margin improvement. Inference is the cost center that scales with every ChatGPT query, every API call, every agent interaction. Cutting that cost, even by less than the full 50 percent, changes the unit economics of the entire business.
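To see why even a partial shift matters, here is a minimal back-of-the-envelope sketch. It assumes the 50 percent figure holds for whatever share of inference moves to the new chip and that the rest of the fleet stays at today's GPU cost. The function name and the example shares are hypothetical illustrations, not figures from OpenAI or Broadcom.

```python
def blended_cost_ratio(share_on_custom: float, custom_cost_ratio: float = 0.5) -> float:
    """Fleet-wide inference cost relative to an all-GPU baseline of 1.0.

    share_on_custom: fraction of inference work moved to the custom chip (0 to 1).
    custom_cost_ratio: cost of the custom chip relative to GPUs; 0.5 reflects
    the roughly 50 percent savings claim and is an assumption, not a measurement.
    """
    if not 0.0 <= share_on_custom <= 1.0:
        raise ValueError("share_on_custom must be between 0 and 1")
    # Work left on GPUs costs 1.0 per unit; migrated work costs custom_cost_ratio.
    return (1.0 - share_on_custom) + share_on_custom * custom_cost_ratio


# Hypothetical migration shares, for illustration only.
for share in (0.25, 0.5, 1.0):
    ratio = blended_cost_ratio(share)
    print(f"{share:.0%} of inference on custom silicon: {1 - ratio:.1%} total savings")
```

Under these assumptions, moving a quarter of inference to the new chip trims the total bill by 12.5 percent, and the full 50 percent only arrives if every workload migrates and the per-chip claim holds in production.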

Broadcom, meanwhile, gets a showcase customer for its custom-silicon division at a moment when the merchant GPU duopoly of Nvidia and AMD faces its first serious threat from purpose-built alternatives. Qualcomm’s confirmed $3.9 billion acquisition of Modular and its rumored $8 billion to $10 billion pursuit of Tenstorrent, both announced this week, underscore how aggressively the industry is moving toward workload-specific hardware. The GPU-for-everything era is not over, but the assumption that it will last forever is.

What This Means for Nvidia

Nvidia’s inference revenue has been the fastest-growing segment of an already explosive business. Jensen Huang has repeatedly told investors that inference will eventually dwarf training as a revenue driver, a thesis that depends on customers continuing to buy Nvidia hardware for both halves of the AI workload. Every custom chip that peels off inference demand chips away at that thesis.

The market seems to agree that this matters. Nvidia stock dropped 6 percent in a broad chip selloff earlier this week. Broadcom’s own AI chip forecast fell short of expectations in its most recent quarter, but the Jalapeño announcement reframes the narrative: Broadcom may not need to sell more of its own chips if it can build custom ones for the largest AI companies on the planet.

Google has been running custom TPUs for years. Amazon has Trainium and Inferentia. Meta is developing its own MTIA accelerators. OpenAI was the conspicuous holdout among frontier AI labs, relying almost entirely on Nvidia’s hardware. That changed on Tuesday.

Deployment Timeline and the Bigger Picture

Jalapeño is slated for initial deployment by end of 2026, with broader rollout continuing into 2027 and beyond. CNN reported that OpenAI described the chip as the first in a “multigeneration computing platform,” signaling that this is not a one-off experiment but a sustained strategic commitment.

For the broader AI infrastructure market, the implications are significant. If the largest model providers all move toward custom silicon for inference, the addressable market for merchant GPUs narrows to training workloads and the long tail of smaller customers who cannot justify custom chip programs. Nvidia’s CUDA software ecosystem remains a formidable moat, but hardware margins face compression from both custom ASICs and AMD’s aggressive pricing on its Instinct line.

The next six months will determine whether Jalapeño delivers on its cost promises at production scale. For now, the signal is clear: the company that built the most expensive AI models in history is no longer content to let someone else control the cost of running them.