Debugging Code World Models

Debugging CWMs: Where Traces Help (and Why They Fail)

February 2026

Code World Models (CWMs) learn to simulate program execution by predicting explicit runtime state after each executed command. This post analyzes their behavior through two lenses—local semantic execution and long-horizon state tracking—and identifies two dominant failure regimes: token-budget exhaustion from long traces and brittleness in string-valued state due to subword tokenization.
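To make "explicit runtime state after each executed command" concrete, here is a minimal sketch of what such a trace might look like for ordinary Python code. It is an illustration, not the trace format CWMs are trained on: the helper `record_trace`, the toy function `count_up`, and the use of `len(repr(...))` as a rough stand-in for token count are all assumptions made for this example.

```python
import sys


def record_trace(func, *args):
    """Run func(*args) and record (line_offset, locals) after each executed line.

    Hypothetical helper for illustration only. A 'line' event fires *before*
    a line runs, so the locals seen at the next event are the state *after*
    the previously announced line.
    """
    steps = []
    code = func.__code__
    pending = None  # offset of the line that is about to execute

    def tracer(frame, event, arg):
        nonlocal pending
        if frame.f_code is not code:
            return None  # ignore nested calls into other functions
        if event in ("line", "return") and pending is not None:
            # State after the pending line has finished executing.
            steps.append((pending, dict(frame.f_locals)))
        if event == "line":
            pending = frame.f_lineno - code.co_firstlineno
        elif event == "return":
            pending = None
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(previous)
    return result, steps


def count_up(n):
    total = 0
    for i in range(n):
        total += i
    return total


if __name__ == "__main__":
    for n in (2, 20, 200):
        result, steps = record_trace(count_up, n)
        # Character count of the serialized trace: a crude proxy for tokens.
        print(n, result, len(steps), len(repr(steps)))
```

Even for this four-line function, the number of recorded steps grows with the loop count, and every step repeats the full set of locals. That is the shape of the token-budget problem: the program stays short while its trace does not.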

Code World Models, Semantic Execution, Tokenization, State-Tracking
