Tag
This paper proposes a framework linking partially observable Markov decision processes (POMDPs) with biochemical reaction dynamics to model phototaxis in unicellular algae, using inverse reinforcement learning to infer behavioral objectives from experimental trajectories.
This paper proposes a POMDP framework for multi-objective decision making in lithium production, addressing geological, demand, and pricing uncertainties to optimize mine opening and extraction method selection. The approach outperforms human-inspired heuristics by dynamically adapting to shifting price regimes through belief state planning.
This paper models cancer treatment as a belief-space planning problem using active inference, deriving an expected free-energy objective that unifies goal-directed control and information acquisition under measurement budgets. The framework is validated on real clinical data from the AACR Project GENIE, demonstrating simultaneous patient categorization and high treatment efficacy.
This paper presents the first theoretical model for out-of-distribution generalization in reinforcement learning, showing that smaller abstract state spaces enable cross-scale generalization in POMDPs.
This paper presents a controlled study of compound LLM agent design in an adversarial POMDP (CybORG CAGE-2), systematically varying context, reasoning, and hierarchy across five model families. Key findings: programmatic state abstraction yields large returns per token, hierarchy without deliberation tools achieves the best absolute performance, and context engineering is more cost-effective than deeper reasoning.
This paper proposes Action-Conditioned Risk Gating, a lightweight reinforcement learning method for risk-sensitive control under partial observability that uses a compact finite-history proxy state and an action-conditioned near-term risk predictor to balance safety and performance.
This paper presents a novel framework for synthesizing finite-state controllers for partially observable Markov decision processes (POMDPs) by integrating sampling, automata learning, and model checking. The approach provides formal guarantees for threshold-safety problems that elude existing formal synthesis tools.
This paper proposes an attention-guided decision framework for hospital pharmacists managing drug shortages, modeling bounded rationality by dynamically decomposing drugs into urgent and monitoring subsets, and shows that selective attention enables stable decision-making without full state reasoning.
This paper introduces Pinductor, a method that uses language model priors to efficiently learn POMDP world models from limited observation-action data, achieving performance comparable to methods with privileged hidden state access while surpassing traditional tabular approaches.
This paper introduces the Context Gathering Decision Process (CGDP), a POMDP framework to model LLM agent search behavior, proposing interventions that improve multi-hop reasoning and reduce token usage without performance degradation.
This paper introduces NM-PPG, a non-myopic active feature acquisition method using pathwise policy gradients to optimize sequential feature selection in costly prediction scenarios.