OpenAI Built Its First Custom Chip in Nine Months — and It’s Already Running GPT-5.3

OpenAI and Broadcom unveiled Jalapeño, a purpose-built LLM inference chip developed from scratch in just nine months. Engineering samples are already running GPT-5.3-Codex-Spark at target speed and power — and early testing shows performance-per-watt substantially better than anything on the market.

For the first time, OpenAI has its own silicon.

On June 24, OpenAI and Broadcom unveiled Jalapeño, the company’s first custom-built intelligence processor — an inference ASIC designed from the ground up for large language models. Engineering samples are already humming in the lab, running production ML workloads including GPT-5.3-Codex-Spark at target clock speed and power.

The chip went from design start to working engineering samples in nine months, with OpenAI’s own models helping accelerate the layout and verification. That is blisteringly fast even by the standards of an industry where taking a new chip from design to tape-out normally takes 18 to 24 months.

"Jalapeño was designed from the ground up for LLM inference using detailed insights from our close collaboration with OpenAI researchers," said Richard Ho, who leads OpenAI’s hardware program. "We optimized the architecture around the kernels, memory movement, networking, and serving patterns that matter most for frontier AI models."

What Makes It Different

Jalapeño is not a repurposed training GPU or a general-purpose AI accelerator. OpenAI stresses it is a purpose-built inference ASIC: a massive, reticle-sized chip with one large compute chiplet surrounded by six HBM stacks. The architecture was designed around the practical bottlenecks of inference at scale: costly data movement, the balance between compute and memory, networking efficiency, and overall system behavior.

The result, according to early testing, is performance-per-watt substantially better than current state-of-the-art hardware. The companies have not disclosed exact benchmarks — a detailed technical paper is promised in the coming months — but the claim is bold: this chip will execute OpenAI’s most important workloads close to the hardware’s theoretical limits.

Why It Matters

OpenAI currently runs its inference on Nvidia GPUs, as does nearly every major AI lab. Custom inference silicon changes the economics. Google’s TPUs and Amazon’s Trainium chips have already proven that purpose-built accelerators can slash per-query costs for cloud AI workloads. Jalapeño is OpenAI’s first move in that direction, and the company plans to deploy it at gigawatt scale with data center partners over multiple chip generations.

The inference-only focus is deliberate. Training frontier models will likely still require Nvidia’s general-purpose GPUs for the foreseeable future. But inference, the process of running a trained model to answer user queries, accounts for a large share of operating cost for AI product companies, and that cost grows with every query served. At that scale, even single-digit percentage improvements in inference efficiency could reshape OpenAI’s economics.

"We have a deep understanding of the workload," OpenAI president Greg Brockman explained when the Broadcom partnership was first announced. "We’ve really been looking for specific workloads that are underserved, and asking: how can we build something that will be able to accelerate what’s possible?"

Jalapeño is the first answer to that question. And with engineering samples already lighting up in the lab, the second generation is almost certainly already on the drawing board.

Sources: TechCrunch, Tom’s Hardware, Broadcom Investors

Content on Anagnorisis is summarized, paraphrased, and editorialized from publicly available sources for length and clarity. Original sources are linked where available. All trademarks belong to their respective owners.
