Surface

OpenAI Deploys First Production Model on Cerebras Chips, Bypassing Nvidia

OpenAI has launched GPT-5.3-Codex-Spark on Cerebras wafer-scale chips instead of Nvidia GPUs, delivering 1,000 tokens per second — 15x faster than previous versions. The move validates alternative chip architectures for production AI workloads.

VERIFIED · Confidence: 80%

OpenAI has launched GPT-5.3-Codex-Spark, its first production AI model running on Cerebras wafer-scale chips instead of Nvidia GPUs. The model generates approximately 1,000 tokens per second — roughly 15 times faster than previous versions — and is designed specifically for real-time coding assistance. It is currently available as a research preview to ChatGPT Pro subscribers.

The move marks a significant shift in OpenAI's hardware strategy. Until now, the company had relied almost exclusively on Nvidia's GPU infrastructure for its production models. By deploying on Cerebras's Wafer Scale Engine 3 accelerators, OpenAI is validating an alternative chip architecture for commercial AI workloads — a development that could reshape AI infrastructure economics and reduce the industry's dependence on a single chip supplier.
