Frontier models can recognize when they are being tested, and their tendency to scheme is influenced by this situational awareness.
We demonstrated counterfactually that situational awareness in their chain-of-thought affects scheming rates: the more situationally aware a model is, the less it schemes, and vice versa.
Moreover, both RL training and anti-scheming training increase levels of situational awareness.
显示更多