
OpenAI and Broadcom Unveil Jalapeno, a Custom Inference Chip Targeting 50% Cost Savings

OpenAI is no longer content to rent its compute. The company unveiled Jalapeno on Tuesday, its first custom-designed AI chip, built in partnership with Broadcom and engineered from the ground up for large language model inference. Early lab testing shows the chip delivering roughly 50% cost savings per inference token compared to current-generation GPUs, according to Broadcom CEO Hock Tan.

If those numbers hold in production, Jalapeno could fundamentally reshape the economics of running AI at scale.

From Concept to Silicon in Nine Months

The development timeline is the first surprise. OpenAI and Broadcom went from initial design to manufacturing tape-out in nine months, which Broadcom described as potentially the fastest ASIC development cycle ever achieved in high-performance advanced semiconductors. That speed reflects both the urgency of OpenAI’s cost problem and the maturity of Broadcom’s custom silicon operation, which already builds chips for Google, Meta, and other hyperscalers.

TechCrunch reported the unveiling as a significant milestone in OpenAI’s strategy to control its own infrastructure stack. The chip is named after the pepper, not the city, and is optimized specifically for inference, the process of running already-trained AI models in response to user queries, rather than for training.

Why Inference Economics Matter More Than Training

Training gets the headlines. Inference pays the bills. Every time a user sends a message to ChatGPT, every API call from a developer, every agent task running in the background: that is inference, and it accounts for the vast majority of OpenAI’s compute spend. Cutting inference costs by 50% does not just improve margins. It also makes viable new product categories that are currently too expensive to run at consumer price points, from always-on AI agents to real-time multimodal processing.

OpenAI is designing the full infrastructure stack beneath its models: chip architecture, kernels, memory systems, networking, scheduling, deployment systems, and product experience. Each layer is optimized around the same goal: making models faster, more reliable, and more affordable.

The Nvidia Question

The obvious question is what this means for Nvidia. The short answer: not much in the near term, a lot over the long term. Jalapeno is an inference-only chip. OpenAI still relies on Nvidia’s H100 and B200 GPUs for training, and that dependency is not going away soon. But inference is the faster-growing workload, and every custom chip OpenAI deploys is a GPU it does not buy from Jensen Huang.

Google has been running its own TPUs for years. Amazon has Trainium and Inferentia. Meta is working with Broadcom on its own custom silicon. The pattern is consistent: as AI workloads mature, the largest consumers of compute are building their own chips to escape Nvidia’s pricing power. Nvidia’s gross margins in its data center segment sit above 75%. That is the number every hyperscaler is looking at when they decide to design their own silicon.
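
To see why that margin figure draws so much attention, here is a back-of-the-envelope sketch. It assumes the standard definition of gross margin, (revenue minus cost of goods sold) divided by revenue, and uses 75% as a floor. The function name is illustrative, and the figure is segment-wide, so this is not a per-chip price estimate.

```python
def price_to_cost_multiple(gross_margin: float) -> float:
    """Revenue earned per dollar of cost of goods sold.

    gross_margin is a fraction, e.g. 0.75 for 75%.
    Derived from: margin = (revenue - cost) / revenue.
    """
    if not 0.0 <= gross_margin < 1.0:
        raise ValueError("gross_margin must be in [0, 1)")
    return 1.0 / (1.0 - gross_margin)


# Assumption: 0.75 is the floor quoted in the article, not an exact figure.
multiple = price_to_cost_multiple(0.75)
print(f"Buyers pay roughly {multiple:.1f}x the cost of goods sold")
```

At a 75% gross margin, buyers are paying about four times what the hardware costs to produce, before the seller's research and operating expenses. That gap is the room a custom chip program is trying to capture.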

Deployment Timeline and What to Watch

Initial deployment of Jalapeno chips is targeted for late 2026, with ramp-up in 2027 and full-scale production in the first half of 2028. That means the cost savings, if they materialize, will be gradual. OpenAI’s compute costs will not drop 50% overnight. But the trajectory matters: a company that was entirely dependent on rented GPU capacity two years ago is now designing and deploying its own purpose-built silicon.
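
A rough sketch of why the savings arrive gradually: if only a share of inference tokens runs on the new chip, the blended cost falls in proportion to that share. The 50% figure is the early lab number cited above. The migration shares are hypothetical values chosen for illustration, not anything OpenAI or Broadcom has disclosed, and the model ignores development and deployment costs.

```python
LAB_SAVINGS = 0.50  # per-token savings from early lab testing, per the article


def blended_cost(share_on_custom: float, savings: float = LAB_SAVINGS) -> float:
    """Inference cost as a fraction of the all-GPU baseline.

    share_on_custom: fraction of tokens served by the custom chip (0 to 1).
    savings: per-token cost reduction on the custom chip (0 to 1).
    """
    if not 0.0 <= share_on_custom <= 1.0:
        raise ValueError("share_on_custom must be in [0, 1]")
    if not 0.0 <= savings <= 1.0:
        raise ValueError("savings must be in [0, 1]")
    return 1.0 - share_on_custom * savings


# Hypothetical migration shares, for illustration only.
for share in (0.10, 0.25, 0.50, 1.00):
    cost = blended_cost(share)
    print(f"{share:.0%} of tokens migrated -> {cost:.1%} of baseline cost")
```

Under these assumptions, moving a quarter of inference traffic to Jalapeno would bring total inference cost to 87.5% of the baseline. The full 50% reduction would apply only if every token ran on the new chip and the lab figure held in production.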

The competitive race in AI infrastructure is shifting from “who has the best model” to “who can run the best model most cheaply.” Jalapeno is OpenAI’s first serious entry in that race.