agent-interpretation

Radical AI Interpretability

arXiv cs.AI · 4d ago

This paper develops a framework for interpreting AI systems as agents. It draws on the philosophy of radical interpretation and on mechanistic interpretability tools to address how AI systems can be trusted by understanding their beliefs, desires, and meanings.
