Tag
This paper presents a theoretical framework for deep reinforcement learning in continuous environments, modeling it as a continuous-time stochastic process using stochastic control theory. The authors characterize the dynamics of an actor-critic algorithm in the infinite-width limit of two-layer networks and derive an equation for the infinitesimal change in the state distribution under a vanishingly small learning rate.
This paper proposes a continuity criterion for extending discrete-time causal prior-data fitted networks to continuous time using stochastic differential equations. It introduces a taxonomy and a fine-grid integration method that outperforms naive integration on irregular observation schedules.
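
To illustrate why integration granularity matters on an irregular observation schedule, here is a minimal sketch using a generic Euler-Maruyama scheme. It is not the paper's method: the function name `euler_maruyama`, the `max_dt` parameter, the linear drift, and the schedule are all hypothetical choices made for illustration. The noise scale is set to zero so the result can be compared with the closed-form solution of dx/dt = -theta * x.

```python
import math
import random


def euler_maruyama(x0, times, drift, sigma, max_dt=None, seed=0):
    """Integrate dX = drift(X) dt + sigma dW across the given observation times.

    With max_dt=None, each observation gap is covered by one step (naive).
    Otherwise each gap is split into equal substeps of length about max_dt or less.
    Returns the state at every observation time.
    """
    rng = random.Random(seed)
    x = x0
    path = [x0]
    for t_prev, t_next in zip(times, times[1:]):
        gap = t_next - t_prev
        n_sub = 1 if max_dt is None else max(1, math.ceil(gap / max_dt))
        dt = gap / n_sub
        for _ in range(n_sub):
            # Euler-Maruyama update: drift term plus scaled Gaussian increment.
            x += drift(x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


# Hypothetical irregular schedule with both short and long gaps.
times = [0.0, 0.1, 0.9, 1.0, 2.5, 2.6]
theta = 2.0  # assumed mean-reversion rate


def drift(x):
    return -theta * x


# sigma=0.0 makes the problem deterministic, so exp(-theta * t) is exact.
naive = euler_maruyama(1.0, times, drift, sigma=0.0)
fine = euler_maruyama(1.0, times, drift, sigma=0.0, max_dt=0.01)
exact = [math.exp(-theta * t) for t in times]

naive_err = max(abs(a - b) for a, b in zip(naive, exact))
fine_err = max(abs(a - b) for a, b in zip(fine, exact))
print(f"naive max error: {naive_err:.4f}")
print(f"fine-grid max error: {fine_err:.4f}")
```

In this toy setting, the single-step scheme becomes unstable across the long gaps, because the step size exceeds what the drift tolerates, while sub-stepping on a fine grid stays close to the exact solution. Setting `sigma` above zero turns the same routine into a stochastic simulation.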