agent-interpretation

#agent-interpretation

Radical AI Interpretability

arXiv cs.AI ↗ · 4d ago Cached

This paper develops a framework for interpreting AI systems as agents, drawing on radical interpretation philosophy and mechanistic interpretability tools, addressing how to trust AI systems by understanding their beliefs, desires, and meanings.

0 favorites 0 likes

agent-interpretation

Radical AI Interpretability

Submit Feedback