Tag
This paper introduces the Context Gathering Decision Process (CGDP), a POMDP framework to model LLM agent search behavior, proposing interventions that improve multi-hop reasoning and reduce token usage without performance degradation.
This paper introduces NM-PPG, a non-myopic active feature acquisition method using pathwise policy gradients to optimize sequential feature selection in costly prediction scenarios.