OpenAI Unveils Jalapeño, Its First Custom AI Chip Built With Broadcom, to Cut Nvidia Dependence

OpenAI on Wednesday unveiled Jalapeño, its first custom-designed AI inference chip, built in collaboration with Broadcom. The announcement marks a significant shift in OpenAI's infrastructure strategy: the company has been almost entirely dependent on Nvidia GPUs since its founding, and Jalapeño represents the first concrete step toward building hardware that OpenAI designs to its own specifications rather than buying off the shelf.

The chip is an inference processor — meaning it is designed to run already-trained AI models in response to user requests, not to train models from scratch. That is the correct focus for OpenAI's immediate cost problem: inference for products like ChatGPT and the API runs continuously at massive scale, and Nvidia GPUs, while excellent at training, carry significant cost overhead when used primarily for inference workloads. A purpose-built inference chip can eliminate the hardware and power overheads of general-purpose GPU architecture.

Performance and Cost Claims

OpenAI president Greg Brockman described the chip's design philosophy in terms of workload fit: "We have a deep understanding of the workload. How can we build something that will accelerate what's possible?" Early testing results show "significantly better performance-per-watt than current state-of-the-art alternatives," according to the company, with particular benefits for "low operating cost when running real-time coding models." Specific benchmark figures were not released.

The performance-per-watt framing is significant. Power consumption is increasingly the binding constraint in AI data centers — not compute capacity or memory bandwidth. A chip that delivers the same inference throughput at lower wattage reduces electricity costs and unlocks more capacity within fixed power budgets. For a company running inference at the scale OpenAI does, even modest efficiency gains compound into material cost reductions.

The Broadcom Partnership

Broadcom is the natural partner for this kind of project. The company has extensive experience designing custom application-specific integrated circuits (ASICs) for hyperscalers — including the TPU chips Google has used to build its AI infrastructure for over a decade. Broadcom handled the silicon design and manufacturing coordination; OpenAI contributed the workload specifications and model architecture knowledge that informed the chip's design.

The manufacturing process node and foundry partner were not disclosed. Given the timeline and the emphasis on inference rather than training, TSMC's 3nm or 4nm nodes are the most likely candidates, though OpenAI has not confirmed this.

Why Now, and Why Inference-First

OpenAI is not the first large AI lab to build custom silicon. Google has run its AI infrastructure on TPUs since 2016. Amazon's Trainium chips power parts of AWS's AI workloads. Meta has deployed custom inference chips across its recommendation systems. Microsoft's Maia project, developed in partnership with OpenAI, has been in development for several years. But Jalapeño is the first chip OpenAI has designed with its own name on it, signalling a strategic shift rather than just a supplier relationship.

The emphasis on inference reflects OpenAI's current economics. Training large models is a one-time cost per model version; inference is continuous and scales directly with user growth. As ChatGPT has grown past one billion monthly active users and OpenAI's API business has expanded, inference has become the dominant driver of compute spending. Owning the chip layer for inference gives OpenAI direct control over its largest and fastest-growing cost center.

Implications for Nvidia

Jalapeño is not a threat to Nvidia's training business — training frontier models at the scale OpenAI operates requires the kind of flexible, massively parallel compute that Nvidia GPUs provide and purpose-built ASICs cannot match in the short term. But inference is a different story. If Jalapeño performs as advertised and scales to production deployment, OpenAI could shift a meaningful portion of its inference workload off Nvidia hardware.

The broader trend is clear: every major AI lab and cloud provider is developing alternatives to Nvidia for specific workloads. Nvidia's dominance in AI hardware is real but not permanent, and inference — being more predictable in its workload characteristics than training — is the easiest segment to displace with custom silicon. Jalapeño, as first reported by TechCrunch, is currently in testing with no production deployment date announced.