GPT-5.5 Hits an Invisible Wall: 44% of Complex Tasks Land at Exactly 516 Reasoning Tokens

A statistical analysis of roughly 390,000 Codex responses finds that GPT-5.5 reasoning-token counts cluster at a fixed boundary, and the pattern has grown more than 300-fold since February.

A statistical investigation into OpenAI's Codex platform has uncovered a striking anomaly in GPT-5.5's behavior: the model's reasoning tokens disproportionately land at exactly 516, a pattern that has intensified dramatically since February and correlates with degraded performance on complex tasks.

The analysis, filed on the openai/codex issue tracker, examined 390,195 response-level token records across 865 sessions from February through June 2026. The findings are stark:

  • GPT-5.5 accounts for only 19.3% of all Codex responses, but 82% of all responses that land exactly at 516 reasoning tokens
  • 44% of GPT-5.5 responses that use 516 or more reasoning tokens stop at precisely 516, compared with just 1.3% for all other models
  • The exact-516 share has grown from 0.11% of responses in February to 35.84% in June, a more than 300-fold increase (35.84 / 0.11 ≈ 326)
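
The figures above are simple ratios over response-level records. A minimal Python sketch of how they could be computed follows, assuming each record is reduced to a hypothetical (model, reasoning_tokens) pair. The function names, record shape, and toy data are illustrative and are not taken from the issue's actual tooling.

```python
BOUNDARY = 516  # the reasoning-token value the analysis singles out


def boundary_stats(records, model):
    """Compute three ratios for `model` from (model_name, reasoning_tokens) pairs.

    The record shape is an assumption made for this sketch.
    """
    total = model_total = 0
    exact_all = exact_model = 0
    at_or_above_model = 0
    for name, tokens in records:
        total += 1
        is_model = name == model
        if is_model:
            model_total += 1
            if tokens >= BOUNDARY:
                at_or_above_model += 1
        if tokens == BOUNDARY:
            exact_all += 1
            if is_model:
                exact_model += 1
    return {
        # share of all responses produced by this model
        "share_of_responses": model_total / total if total else 0.0,
        # share of all exact-boundary responses produced by this model
        "share_of_exact_hits": exact_model / exact_all if exact_all else 0.0,
        # among this model's responses at or above the boundary,
        # the fraction that stop exactly on it
        "exact_given_at_or_above": (
            exact_model / at_or_above_model if at_or_above_model else 0.0
        ),
    }


def fold_increase(before_pct, after_pct):
    """Ratio between two percentage shares."""
    return after_pct / before_pct


# Toy data, not real telemetry.
toy_records = [
    ("gpt-5.5", 516), ("gpt-5.5", 516), ("gpt-5.5", 900), ("gpt-5.5", 100),
    ("other", 516), ("other", 700), ("other", 50), ("other", 20),
]
toy_stats = boundary_stats(toy_records, "gpt-5.5")
```

Applied to the reported February and June shares, fold_increase(0.11, 35.84) gives roughly 326.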

When GPT-5.5 responses hit this 516-token boundary, performance appears to degrade. A related earlier issue documented a case where a GPT-5.5 run ending at exactly 516 reasoning tokens returned the wrong answer. The new analysis adds aggregate evidence: as the exact-516 clustering intensified month-over-month, overall reasoning-token intensity decreased across the board.

The pattern is by far the strongest in GPT-5.5. GPT-5.3-codex and GPT-5.3-codex-spark show zero exact-516 events in the dataset. GPT-5.2 shows 0.34%. GPT-5.4 shows 19.8%, elevated but well below GPT-5.5's 44%.

The investigator is careful not to claim this proves hidden chain-of-thought truncation. The narrower claim is that Codex telemetry shows a GPT-5.5-specific fixed-token clustering anomaly consistent with thresholded reasoning-budget behavior, and that users are noticing the quality impact.
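
One way to check for a fixed-token cluster of this kind is to compare the number of responses at the suspected boundary with the counts at neighbouring token values. The sketch below is an assumption about how such a check could look, not the investigator's method. The spike_ratio helper, the window size, and the sample data are all hypothetical.

```python
from collections import Counter


def spike_ratio(token_counts, value, window=5):
    """Count at `value` divided by the mean count of its neighbours.

    Neighbours are the integer token counts within +/- `window` of `value`,
    excluding `value` itself. A ratio near 1 means no visible spike.
    """
    hist = Counter(token_counts)
    neighbours = [
        hist[v]
        for v in range(value - window, value + window + 1)
        if v != value
    ]
    baseline = sum(neighbours) / len(neighbours)
    if baseline == 0:
        # No neighbouring responses at all: any hits form an isolated spike.
        return float("inf") if hist[value] else 0.0
    return hist[value] / baseline


# Toy data, not real telemetry: one response at each value from 511 to 521,
# plus 50 extra responses that stop at exactly 516.
flat = list(range(511, 522))
spiky = flat + [516] * 50
```

On this toy data the flat sample yields a ratio of 1.0, while the spiky sample yields 51.0. A smooth distribution of reasoning lengths would be expected to stay close to 1, which is why a large ratio at a single value is consistent with, though not proof of, a threshold.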

The issue has drawn 190 reactions and 63 comments on GitHub, with the HN discussion reaching 273 points and 105 comments.

Sources: GitHub Issue #30364, Hacker News Discussion

Content on Anagnorisis is summarized, paraphrased, and editorialized from publicly available sources for length and clarity. Original sources are linked where available. All trademarks belong to their respective owners.
