Hourly · 2026-07-04 12:00 UTC
AMD MI355X Delivers 2,626 tok/s on GLM 5.2 — 80% of Blackwell Speed at Half the Cost
Wafer served Z.AI's GLM 5.2 on AMD MI355X GPUs at 2,626 tok/s per node: about 80% of NVIDIA B200 throughput at less than half the cost per token. With Blackwell supply tightening and inference demand surging, AMD's Instinct MI350 series is closing the gap faster than expected.
Inference supply can't keep up with the torrent of frontier model releases, and NVIDIA Blackwell GPUs are getting scarce and expensive. Wafer just showed there's another path — and it runs on AMD.
Serving Z.AI's GLM 5.2 on AMD MI355X accelerators, Wafer hit <strong>2,626 tok/s per node</strong> on a 20k-input / 1k-output workload with a 60% cache hit rate. That's roughly 80% of what a B200 node delivers, while an MI355X costs about 2.75× less per GPU than a B300. Bottom line: <strong>the MI355X is winning on performance per dollar, and fast</strong>.
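The cost-per-token claim comes from combining the two ratios quoted above. Below is a minimal Python sketch of that arithmetic. It assumes equal GPU counts per node and that the per-GPU price ratio carries over to hourly serving cost. Note that the source pairs B200 throughput with B300 pricing; the sketch simply takes both figures as given. `relative_cost_per_token` is a hypothetical helper.

```python
def relative_cost_per_token(throughput_ratio: float, price_ratio: float) -> float:
    """Cost per token of system A as a fraction of system B's.

    throughput_ratio: A's tokens/s divided by B's tokens/s, per node.
    price_ratio: A's price divided by B's price, assumed proportional
    to hourly serving cost (an assumption, not a figure from Wafer).
    """
    # Cost per token = cost per hour / tokens per hour, so the two ratios divide.
    return price_ratio / throughput_ratio


# Figures quoted in the article: ~0.80x B200 throughput, ~1/2.75 of a B300's price.
ratio = relative_cost_per_token(throughput_ratio=0.80, price_ratio=1 / 2.75)
print(f"MI355X cost per token: {ratio:.2f}x the NVIDIA figure ({1 / ratio:.1f}x cheaper)")
# prints "MI355X cost per token: 0.45x the NVIDIA figure (2.2x cheaper)"
```

The resulting gap of roughly 2.2× is consistent with the headline's "half the cost" framing.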
Single-stream speed tells the same story: <strong>213 tok/s</strong> on a 10k-input / 1.5k-output request, measured to Artificial Analysis standards. It won't top the raw-speed leaderboard, but it leads on cost-adjusted throughput, the metric that matters when you're serving billions of tokens.
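For a single stream, tokens/s converts directly into per-token latency. This small sketch assumes a constant decode rate and ignores time-to-first-token for the 10k-token prompt. `decode_timing` is a hypothetical helper.

```python
def decode_timing(output_tok_per_s: float, output_tokens: int) -> tuple[float, float]:
    """Return (ms per output token, seconds to stream all output tokens).

    Assumes a constant decode rate and ignores time-to-first-token (prefill).
    """
    ms_per_token = 1000.0 / output_tok_per_s
    decode_seconds = output_tokens / output_tok_per_s
    return ms_per_token, decode_seconds


ms, secs = decode_timing(213, 1500)
print(f"{ms:.2f} ms/token, ~{secs:.1f} s for 1.5k output tokens")
# prints "4.69 ms/token, ~7.0 s for 1.5k output tokens"
```

In other words, a single user sees a new token roughly every 4.7 ms. The 1.5k-token reply takes about 7 s to stream once the first token arrives.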
Under the hood, Wafer quantized the bf16 model to MXFP4 using AMD Quark. The conversion was effectively lossless: GSM8K, GPQA-Diamond, and tau2 scores stayed within noise of the bf16 model. They ran inference on SGLang after hitting roadblocks with vLLM (no working MXFP4 + GlmMoeDsa path) and ATOM (output degradation at long context). Getting speculative decoding working required two custom fixes to the ROCm image. That is a sign AMD's software gap is real, though it is shrinking by the week as AI-assisted optimization accelerates.
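MXFP4 is the OCP Microscaling 4-bit format. Each block of 32 values shares one power-of-two (E8M0) scale, and each value is stored as a 4-bit E2M1 float. The from-scratch Python sketch below illustrates that encoding. It is not AMD Quark's API or implementation, and it ignores E8M0 range limits and bit packing. The names `quantize_fp4`, `quantize_block`, and `mxfp4_roundtrip` are hypothetical.

```python
import math

# Magnitudes representable by FP4 E2M1, the MXFP4 element type (OCP MX spec).
# Even indices have an even (zero) mantissa bit, which the tie-break relies on.
FP4_E2M1_VALUES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 32  # MXFP4 shares one scale across each block of 32 elements
E2M1_EMAX = 2    # exponent of E2M1's largest normal value (6.0 = 1.5 * 2**2)


def quantize_fp4(x: float) -> float:
    """Round x to the nearest E2M1 value, saturating at +/-6 (this sketch's choice).

    Ties go to the even-mantissa neighbor (round half to even).
    """
    mag = min(abs(x), 6.0)
    _, nearest = min(
        enumerate(FP4_E2M1_VALUES),
        key=lambda iv: (abs(iv[1] - mag), iv[0] % 2),
    )
    return math.copysign(nearest, x)


def quantize_block(block: list[float]) -> tuple[float, list[float]]:
    """Quantize one block to (shared power-of-two scale, E2M1 elements)."""
    max_abs = max(abs(v) for v in block)
    if max_abs == 0.0:
        return 1.0, [0.0] * len(block)
    # Shared exponent as in the OCP MX spec: floor(log2(amax)) - emax_elem.
    # E8M0 range limits (2**-127 .. 2**127) are ignored in this sketch.
    shared_exp = math.floor(math.log2(max_abs)) - E2M1_EMAX
    scale = 2.0 ** shared_exp
    return scale, [quantize_fp4(v / scale) for v in block]


def mxfp4_roundtrip(values: list[float]) -> list[float]:
    """Quantize to MXFP4 and dequantize back, block by block."""
    out: list[float] = []
    for start in range(0, len(values), BLOCK_SIZE):
        scale, elems = quantize_block(values[start:start + BLOCK_SIZE])
        out.extend(scale * e for e in elems)
    return out


print(mxfp4_roundtrip([0.1, 1.2, -3.0, 5.5]))  # prints [0.0, 1.0, -3.0, 6.0]
```

Each element costs 4 bits, plus one 8-bit scale shared by 32 elements. That works out to about 4.25 bits per weight versus 16 for bf16, or roughly 3.8× less weight memory. Values far below their block's maximum (like 0.1 above) round to zero, which is why benchmark checks such as GSM8K and GPQA-Diamond matter after conversion.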
The takeaway: NVIDIA's monopoly on frontier inference is no longer a given. For the first time, a competitive AMD stack is delivering usable, cost-effective performance on a top-tier model — not in a lab, but in production. The inference cost curve just bent downward.
Sources: Wafer — Performance per dollar is getting faster and cheaper
Content on Anagnorisis is summarized, paraphrased, and editorialized from publicly available sources for length and clarity. Original sources are linked where available. All trademarks belong to their respective owners.

