OpenAI is no longer content to rent its compute from Nvidia. On June 24, the company unveiled Jalapeno, its first custom-designed artificial intelligence chip, built in partnership with Broadcom and targeting a 50% reduction in inference costs compared with current-generation GPUs.
The chip was designed end to end in nine months, with OpenAI’s own AI models contributing to the architecture work.
What Jalapeno Actually Is
Jalapeno is not a training chip. It is an inference accelerator, purpose-built for the specific computational patterns that emerge when running large language models in production. TechCrunch reported that the architecture was optimized around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models, rather than the general-purpose flexibility that makes Nvidia’s GPUs dominant across the broader AI market.
The division of labor is clean. OpenAI handles the underlying architecture design, Broadcom is responsible for silicon implementation and network hardware, and Celestica manages board and rack system integration. Early lab testing shows performance on par with Nvidia’s Blackwell chips and Google’s Tensor Processing Units, with the cost advantage coming from specialization rather than raw compute power.
The deployment timeline is measured. Small prototype runs begin in late 2026. Scaling starts in 2027. The meaningful ramp arrives in 2028. This is not a chip that changes OpenAI’s cost structure next quarter. It is a chip that changes the structural economics of inference over the next three years.
Why OpenAI Is Building Its Own Silicon
The strategic logic is identical to what drove Google to build TPUs, Amazon to build Trainium and Inferentia, and Apple to build M-series chips: when you run enough compute, the economics of custom silicon beat the economics of buying general-purpose hardware from someone else. That is especially true when that someone else is Nvidia, whose margins reflect its dominant position in AI training and inference GPUs.
OpenAI’s reasoning model o3-pro-medium is currently priced at $100 per million output tokens. Claude’s most capable models, by comparison, run at $15 per million tokens. At the scale OpenAI operates, with hundreds of millions of ChatGPT users generating inference workloads around the clock, a 50% reduction in per-token cost could translate to billions of dollars in annual savings. The math is straightforward, and the fact that OpenAI is publishing the cost target openly suggests the company is confident in the engineering.
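To make that arithmetic concrete, here is a rough back-of-the-envelope sketch. It is illustrative only: the function names are made up for this example, the annual token volume is a hypothetical placeholder (OpenAI has not published one here), and the $15 figure is a list price quoted above standing in for an internal serving cost, which is a loud assumption since prices and costs are not the same thing.

```python
def annual_savings(tokens_per_year: float, cost_per_million: float, reduction: float) -> float:
    """Dollars saved per year if per-token cost falls by `reduction` (0.0 to 1.0)."""
    baseline_spend = tokens_per_year / 1_000_000 * cost_per_million
    return baseline_spend * reduction


def tokens_for_target(target_savings: float, cost_per_million: float, reduction: float) -> float:
    """Annual token volume needed for the cost reduction to save `target_savings` dollars."""
    savings_per_million_tokens = cost_per_million * reduction
    return target_savings / savings_per_million_tokens * 1_000_000


if __name__ == "__main__":
    # ASSUMPTION: one quadrillion tokens per year is a placeholder, not a reported figure.
    hypothetical_tokens_per_year = 1e15
    # ASSUMPTION: a $15-per-million list price is used as a stand-in for serving cost.
    assumed_cost_per_million = 15.0
    target_reduction = 0.50  # the 50% target cited in the article

    saved = annual_savings(hypothetical_tokens_per_year, assumed_cost_per_million, target_reduction)
    needed = tokens_for_target(1e9, assumed_cost_per_million, target_reduction)
    print(f"Savings at the assumed volume: ${saved:,.0f}")
    print(f"Tokens per year needed to save $1B: {needed:,.0f}")
```

The second function is arguably the more useful one: rather than guessing OpenAI's volume, it asks how many tokens per year would have to be served before a 50% cut is worth a billion dollars under the assumed cost.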
The competitive framing matters too. CNBC reported that OpenAI describes Jalapeno as part of a strategy to “build the full stack,” from model architecture to silicon to serving infrastructure. That language mirrors what Google, Amazon, and Meta have already done. It also signals to Nvidia that its largest customers are systematically reducing their dependence on Nvidia hardware, even as they continue to buy massive quantities of it for training workloads.
The Nvidia Implications
Nvidia is not losing this market today. But the direction is clear. Every major AI company is investing in custom silicon for inference, the workload that generates revenue, while continuing to buy Nvidia GPUs for training, the workload that consumes capital. Over time, training workloads may also face competition from custom ASICs and alternative architectures.
The Jalapeno announcement lands at a moment when Nvidia’s stock has already corrected from its highs and the semiconductor sector is repricing around concerns about the sustainability of AI capex growth. Nvidia’s moat remains its CUDA software ecosystem and its dominance in training. But inference is where the money is made, and inference is where custom silicon has the most advantage.
What This Means for the AI Cost Curve
The broader significance of Jalapeno is what it says about the trajectory of AI costs. If OpenAI can deliver 50% inference cost savings by 2028, and Google, Amazon, and Meta are achieving similar gains with their own custom chips, the cost of running AI at scale will drop far faster than most market projections assume.
That is good news for enterprises demanding cheaper AI (see the tokenmaxxing-to-efficiency shift happening right now). It is worse news for AI companies whose current pricing assumes GPU-era cost structures. And it is a structural headwind for Nvidia’s inference revenue, even if the company’s training business remains untouchable for the foreseeable future.
The chip race is no longer about who can build the best general-purpose accelerator. It is about who can build the cheapest way to serve the specific models they already run. OpenAI just entered that race with a clear cost target and a partner in Broadcom that knows how to ship silicon at scale.
For the broader market, Jalapeno is a signal that the AI cost curve is bending faster than expected. The companies that control both the models and the silicon to run them will set the price floor for intelligence. Everyone else will pay retail.