Debugging CWMs: Where Traces Help (and Why They Fail)
February 2026
Code World Models (CWMs) learn to simulate program execution by predicting explicit runtime state after each executed command. This post analyzes their behavior through two lenses—local semantic execution and long-horizon state tracking—and identifies two dominant failure regimes: token-budget exhaustion from long traces and brittleness in string-valued state due to subword tokenization.