Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration
Summary
This paper proposes a framework linking partially observable Markov decision processes (POMDPs) with biochemical reaction dynamics to model phototaxis in unicellular algae, using inverse reinforcement learning to infer behavioral objectives from experimental trajectories.
# Implementation of reinforcement learning in chemical reaction networks: application to phototaxis as curiosity-driven exploration

Source: [https://arxiv.org/abs/2606.26168](https://arxiv.org/abs/2606.26168) [View PDF](https://arxiv.org/pdf/2606.26168)

> Abstract: Living systems navigate environments using noisy and incomplete sensory signals. In unicellular algae, phototaxis is often modeled as a mechanistic run-tumble process driven by stimulus-response rules. However, such descriptions overlook how organisms actively sample their environment to reduce sensory ambiguity. From a minimal cognition perspective, we reframe this navigation as a subjective, information-driven sensorimotor process. To this end, we propose a framework linking a Partially Observable Markov Decision Process (POMDP) with biochemical reaction dynamics. Environmental variables are hidden, while the cell updates a minimal internal state from each observation through a memoryless Bayesian step. These internal dynamics balance orienting toward light with exploratory reorientation and can be implemented through Chemical-Reaction-Network Ordinary Differential Equations (CRN-ODEs). Our model includes a biophysical observation process for photoreception and a chemically computable polynomial bound on information gain. Using Inverse Reinforcement Learning (IRL) on 30 experimentally recorded Chlamydomonas trajectories, we infer the behavioral objective consistent with observed phototactic motion and benchmark the resulting dynamics with standard Stochastic Simulation Algorithm (SSA) baselines. Our model reproduces the empirical alignment-to-light distribution, comparable to objective SSA baselines on this dataset. Within this framework, run-tumble alternation emerges as an information-acquisition strategy: tumbling reorients the cell to sample new sensory configurations and resolve sensor ambiguity, demonstrating how intracellular biochemical networks can support adaptive information-seeking behavior in cellular navigation.

## Submission history

From: Gregoire Sergeant-Perthuis [[view email](https://arxiv.org/show-email/576731b7/2606.26168)] [via CCSD proxy]

**[v1]** Wed, 24 Jun 2026 08:11:14 UTC (1,163 KB)
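The abstract benchmarks the model against standard Stochastic Simulation Algorithm (SSA) baselines. As a rough illustration of what an SSA computes, the sketch below runs Gillespie's direct method on a toy two-state switching network. Everything specific here is an assumption made for illustration and is not taken from the paper: the `run`/`tumble` species, the rate constants `K_TUMBLE` and `K_RUN`, the population of 50 cells, and the `gillespie` helper itself.

```python
import random

def gillespie(initial_counts, reactions, t_end, seed=0):
    """Gillespie direct-method SSA.

    reactions: list of (propensity_fn, stoichiometry) pairs, where
    propensity_fn maps the current counts to a rate and stoichiometry
    maps species names to integer changes.
    """
    rng = random.Random(seed)
    t = 0.0
    counts = dict(initial_counts)
    trajectory = [(t, dict(counts))]
    while True:
        propensities = [rate_fn(counts) for rate_fn, _ in reactions]
        total = sum(propensities)
        if total <= 0.0:
            break  # no reaction can fire any more
        # Waiting time to the next event is exponential with rate `total`.
        dt = rng.expovariate(total)
        if t + dt > t_end:
            break
        t += dt
        # Pick which reaction fires, with probability proportional to its propensity.
        index = rng.choices(range(len(reactions)), weights=propensities)[0]
        for species, delta in reactions[index][1].items():
            counts[species] += delta
        trajectory.append((t, dict(counts)))
    return trajectory

# Hypothetical toy network (not from the paper): cells switch run <-> tumble.
K_TUMBLE = 1.0  # assumed rate of run -> tumble, per cell per unit time
K_RUN = 4.0     # assumed rate of tumble -> run, per cell per unit time
reactions = [
    (lambda c: K_TUMBLE * c["run"], {"run": -1, "tumble": +1}),
    (lambda c: K_RUN * c["tumble"], {"run": +1, "tumble": -1}),
]
initial = {"run": 50, "tumble": 0}
trajectory = gillespie(initial, reactions, t_end=10.0, seed=1)
```

Each entry of `trajectory` is a `(time, counts)` pair recorded after one reaction event. In this toy network the total number of cells is conserved, which is a convenient sanity check on the stoichiometry.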
Similar Articles
Synergizing Physically Constrained MCMC and Chemical-Informed Gaussian Processes for Reaction Network Discovery
This paper presents PC-MCMC-CIGP, a gray-box workflow that combines spike-and-slab topology sampling with physical constraints and a Chemical-Informed Gaussian Process for reaction network discovery. The method demonstrates improved yield on styrene epoxidation and distinguishes elementary pathways from deceptive fits on a hydrogen-bromine benchmark.
Uncertainty-aware reinforcement learning for chemical language models
Proposes two complementary approaches to incorporate predictive uncertainty into reinforcement learning for chemical language models, improving robustness and increasing true hit rate by 0.25 in de novo molecular design.
Reduction of Probabilistic Chemical Reaction Networks
This paper presents a method to reduce the size of chemical reaction networks (CRNs) implementing probabilistic inference by leveraging factor graph reduction techniques, resulting in smaller CRNs while preserving belief propagation fixed points on surviving variables.
Generative-Model Predictive Planning for Navigation in Partially Observable Environments
This paper introduces BeliefDiffusion, a framework combining diffusion models to represent multimodal belief distributions and Model Predictive Control for planning in partially observable environments, achieving better navigation success and path efficiency than baselines.
Reinforcement learning with prediction-based rewards
OpenAI introduces Random Network Distillation (RND), a prediction-based method for encouraging exploration in RL agents through curiosity, achieving human-level performance on Montezuma's Revenge without demonstrations or game state access.