PAPER 002 JUNE 2026 · UPCOMING

Reward-hacking patterns in LLM agent benchmarks.

We catalog reward-hacking strategies observed across 200+ trajectory annotations on Terminal-Bench, propose a taxonomy, and recommend evaluation patterns that resist gaming.

Evaluation
ML systems
