pomdp

#pomdp

Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration

arXiv cs.LG · 3d ago

This paper proposes a framework linking partially observable Markov decision processes (POMDPs) with biochemical reaction dynamics to model phototaxis in unicellular algae, using inverse reinforcement learning to infer behavioral objectives from experimental trajectories.

Optimizing Lithium Production Decisions under Geological, Demand, and Pricing Uncertainties: A POMDP Framework for Multi-Objective Decision Making

arXiv cs.AI · 2026-06-18

This paper proposes a POMDP framework for multi-objective decision making in lithium production, addressing geological, demand, and pricing uncertainties to optimize mine-opening decisions and the choice of extraction method. The approach outperforms human-inspired heuristics by using belief-state planning to adapt dynamically to shifting price regimes.

Belief-Space Control for Personalized Cancer Treatment via Active Inference

arXiv cs.AI · 2026-06-10

This paper models cancer treatment as a belief-space planning problem using active inference, deriving an expected free-energy objective that unifies goal-directed control and information acquisition under measurement budgets. The framework is validated on real clinical data from the AACR Project GENIE, demonstrating simultaneous patient categorization and high treatment efficacy.

Smaller Abstract State Spaces Enable Cross-Scale Generalization in Reinforcement Learning

arXiv cs.LG · 2026-05-21

This paper presents the first theoretical model for out-of-distribution generalization in reinforcement learning, showing that smaller abstract state spaces enable cross-scale generalization in POMDPs.

Context, Reasoning, and Hierarchy: A Cost-Performance Study of Compound LLM Agent Design in an Adversarial POMDP

arXiv cs.AI · 2026-05-18

This paper presents a controlled study of compound LLM agent design in an adversarial POMDP (CybORG CAGE-2), systematically varying context, reasoning, and hierarchy across five model families. Key findings: programmatic state abstraction yields large returns per token, hierarchy without deliberation tools achieves the best absolute performance, and context engineering is more cost-effective than deeper reasoning.

Action-Conditioned Risk Gating for Safety-Critical Control under Partial Observability

arXiv cs.LG · 2026-05-15

This paper proposes Action-Conditioned Risk Gating, a lightweight reinforcement learning method for risk-sensitive control under partial observability that uses a compact finite-history proxy state and an action-conditioned near-term risk predictor to balance safety and performance.

Synthesizing POMDP Policies: Sampling Meets Model-checking via Learning

arXiv cs.AI · 2026-05-15

This paper presents a framework for synthesizing finite-state controllers for partially observable Markov decision processes (POMDPs) by integrating sampling, automata learning, and model checking. The approach provides formal guarantees for threshold-safety problems that elude existing formal synthesis tools.

Modeling Bounded Rationality in Drug Shortage Pharmacists Using Attention-Guided Dynamic Decomposition

arXiv cs.AI · 2026-05-15

This paper proposes an attention-guided decision framework for hospital pharmacists managing drug shortages, modeling bounded rationality by dynamically decomposing drugs into urgent and monitoring subsets, and shows that selective attention enables stable decision-making without full state reasoning.

Learning POMDP World Models from Observations with Language-Model Priors

Hugging Face Daily Papers · 2026-05-13

This paper introduces Pinductor, a method that uses language model priors to efficiently learn POMDP world models from limited observation-action data, achieving performance comparable to methods with privileged hidden state access while surpassing traditional tabular approaches.

The Context Gathering Decision Process: A POMDP Framework for Agentic Search

arXiv cs.AI · 2026-05-11

This paper introduces the Context Gathering Decision Process (CGDP), a POMDP framework to model LLM agent search behavior, proposing interventions that improve multi-hop reasoning and reduce token usage without performance degradation.

Non-Myopic Active Feature Acquisition via Pathwise Policy Gradients

arXiv cs.LG · 2026-05-08

This paper introduces NM-PPG, a non-myopic active feature acquisition method that uses pathwise policy gradients to optimize sequential feature selection in prediction settings where acquiring features is costly.
