texas-holdem

#texas-holdem

Doing What They Say, Not What They Reason: Locating the Faithfulness Gap in LLM Agents

arXiv cs.AI · 3d ago

This paper decomposes the faithfulness gap in LLM agents into reasoning→conclusion and conclusion→action steps using Texas Hold'em poker as a controlled environment. It finds that the conclusion→action step is reliable, while the reasoning→conclusion step is the primary source of inconsistency.

DexHoldem: Playing Texas Hold'em with Dexterous Embodied System

Hugging Face Daily Papers · 2026-05-18

DexHoldem is a real-world benchmark for evaluating embodied agents in dexterous manipulation tasks, using Texas Hold'em with a ShadowHand to test primitive execution, perception, and decision-making in a closed-loop setting.
