Frontier models are converging.
The optimization stack
- 1.Continuous evals.
- 2.Trace mining.
- 3.Reward modeling.
Why most teams stop at evals
"Once the loop closed, our agent quality improved every week without anyone explicitly working on it."
What 'self-improving' is not
Operating discipline
- Version everything.
- Reward-model conflicts must surface, not silently overwrite.
- Every shipped change has a rollback plan and a measured outcome.