Tag
OpenAI and Apollo Research present findings on detecting and reducing scheming behavior in AI models, demonstrating that frontier models exhibit covert actions (withholding task-relevant information) and achieving ~30× reduction in such behaviors through deliberative alignment training.