A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation
Summary
A unified Python framework uses PPO-based deep reinforcement learning to control HVAC air handling units with economizer logic and CO2-constrained ventilation, and reports better energy efficiency and temperature stability than GA-tuned PID and On-Off controllers.
Source: [https://arxiv.org/abs/2605.24406](https://arxiv.org/abs/2605.24406) [View PDF](https://arxiv.org/pdf/2605.24406)

> Abstract: Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.

## Submission history

From: Mahdi Alibeigi

**[v1]** Sat, 23 May 2026 05:31:09 UTC (1,148 KB)
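
The abstract describes a dynamic CO2 mass balance and a "Hierarchical Flow Logic" that overrides agent actions that would push CO2 above 1000 ppm, but it gives no equations or code. The following is only a minimal sketch of one possible reading, not the authors' implementation. It assumes a well-mixed single zone, an explicit-Euler time step, and an override that raises the outdoor airflow to the smallest value that keeps the next-step concentration at the limit. The names `Zone`, `co2_step`, `min_flow_for_limit`, and `apply_flow_override`, and all parameter values other than the 1000 ppm threshold, are hypothetical.

```python
from dataclasses import dataclass

CO2_LIMIT_PPM = 1000.0  # threshold stated in the abstract


@dataclass
class Zone:
    # All defaults below are illustrative assumptions, not values from the paper.
    volume_m3: float = 300.0                  # zone air volume
    co2_outdoor_ppm: float = 420.0            # outdoor CO2 concentration
    gen_m3s_per_person: float = 5.0e-6        # CO2 generated per occupant


def co2_step(zone, co2_ppm, outdoor_flow_m3s, occupants, dt_s):
    """One explicit-Euler step of V * dC/dt = G + Q * (C_out - C), with C in ppm."""
    generation = occupants * zone.gen_m3s_per_person * 1e6  # ppm * m3/s
    dilution = outdoor_flow_m3s * (zone.co2_outdoor_ppm - co2_ppm)
    return co2_ppm + dt_s * (generation + dilution) / zone.volume_m3


def min_flow_for_limit(zone, co2_ppm, occupants, dt_s, limit=CO2_LIMIT_PPM):
    """Smallest outdoor airflow (m3/s) keeping the next-step CO2 at or below limit."""
    excess = co2_ppm - zone.co2_outdoor_ppm
    if excess <= 0.0:
        return 0.0  # outdoor air cannot dilute a zone that is already at or below outdoor level
    generation = occupants * zone.gen_m3s_per_person * 1e6
    # Rearranged from co2_step(...) <= limit, solved for the airflow.
    required = (generation - (limit - co2_ppm) * zone.volume_m3 / dt_s) / excess
    return max(0.0, required)


def apply_flow_override(zone, co2_ppm, agent_flow_m3s, occupants, dt_s, max_flow_m3s):
    """Return (flow, overridden): the agent's flow, raised if it would breach the limit."""
    required = min_flow_for_limit(zone, co2_ppm, occupants, dt_s)
    flow = min(max(agent_flow_m3s, required), max_flow_m3s)
    return flow, flow > agent_flow_m3s


if __name__ == "__main__":
    zone = Zone()
    # The agent requests no outdoor air while the zone is close to the limit.
    flow, overridden = apply_flow_override(zone, 990.0, 0.0, 20, 60.0, 1.0)
    print(flow, overridden, co2_step(zone, 990.0, flow, 20, 60.0))
```

In this sketch the override sits between the policy output and the plant model, so the agent still chooses the airflow whenever its choice already satisfies the CO2 constraint. If the fan's maximum flow is too low to meet the limit in one step, the clamp to `max_flow_m3s` means the concentration can still exceed 1000 ppm.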
Similar Articles
Plan online, learn offline: Efficient learning and exploration via model-based control
OpenAI proposes POLO (Plan Online, Learn Offline), a framework combining model-based control with value function learning and coordinated exploration to enable efficient learning on complex control tasks like humanoid locomotion and dexterous manipulation with minimal real-world experience.
GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios
GenPO++ proposes a reversible generative policy optimization framework that uses history states as auxiliary memory in a high-order reversible ODE solver, enabling exact inversion and Jacobian-free likelihood-ratio computation for flow-based policies in reinforcement learning. It achieves competitive performance on large-scale control, fine-tuning, and real-world robotic tasks while improving stability and efficiency.
Proximal Policy Optimization
OpenAI introduces Proximal Policy Optimization (PPO), a reinforcement learning algorithm that matches or outperforms state-of-the-art methods while being simpler to implement and tune. PPO uses a novel clipped objective function to constrain policy updates and has since become OpenAI's default RL algorithm.
Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems
This paper presents a Mahalanobis-guided latent out-of-distribution detection method using a VAE to switch between a reinforcement learning controller and an extremum seeking controller in time-varying systems, validated in particle accelerator control.
DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning
DiffAero is a GPU-accelerated, fully differentiable simulation framework for quadrotor control policy learning that supports environment- and agent-level parallelism, multiple dynamics models, and customizable sensors. It enables robust flight policy learning in hours on consumer-grade hardware and is released as open-source.