OpenAI (@OpenAI) “Chain of thought monitors are a key layer of defense against AI agent misalignme”

OpenAI

@OpenAI

OpenAI’s mission is to ensure that artificial general intelligence benefits all of humanity. We’re hiring:

Joined December 2015

4 Following · 4.9M Followers

OpenAI (@OpenAI)

2026.05.08 20:19

Chain of thought monitors are a key layer of defense against AI agent misalignment. To preserve monitorability, we avoid penalizing misaligned reasoning during RL. We found a limited amount of accidental CoT grading which affected released models, and are sharing our analysis.
