agentic-misalignment


A Sober Look at Agentic Misalignment in Automated Workflows

arXiv cs.AI · 2026-05-26

This paper studies agentic misalignment in multi-agent systems with automated workflows, proposing Agentic Evidence Attribution (AEA) to correct misaligned agent behavior using context-specific evidence.


@AnthropicAI: Read the full post here: https://alignment.anthropic.com/2026/teaching-claude-why/…

X AI KOLs · 2026-05-08

Anthropic's alignment team presents techniques to reduce agentic misalignment in AI models, including training on ethical dilemma advice and constitutional documents, which generalized well out-of-distribution.


May 8, 2026 · Alignment · Teaching Claude why

Anthropic Research · 2026-05-08

Anthropic shares lessons from improving Claude's alignment training, achieving perfect scores on agentic misalignment evaluations by teaching underlying principles rather than just demonstrations.
