Tag
This paper benchmarks seven LLM feedback agents on propositional logic tutoring. The agents perform well on optimal steps but systematically misdiagnose both valid but suboptimal solutions and incorrect ones, which limits their usefulness for adaptive tutoring.