A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation

arXiv cs.LG 05/26/26, 04:00 AM Papers

deep-reinforcement-learning hvac building-energy ppo co2-constrained-ventilation economizer control-systems

Summary

This paper presents a unified Python framework that uses PPO-based deep reinforcement learning to control HVAC air handling units with economizer logic and CO2-constrained ventilation. It reports better energy efficiency and temperature stability than GA-tuned PID and On-Off controllers.

arXiv:2605.24406v1. Abstract: Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.
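
The abstract names a second-order resistor-capacitor thermal model coupled with a dynamic CO2 mass balance but gives no equations. A minimal sketch of one simulation step follows, assuming a two-node (zone air plus envelope) network integrated with explicit Euler. The `ZoneParams` class, the `step` function, and every numeric value are hypothetical illustrations, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ZoneParams:
    """Illustrative zone parameters (assumed values, not from the paper)."""
    c_air: float = 5.0e5     # zone-air thermal capacitance [J/K]
    c_wall: float = 5.0e6    # envelope thermal capacitance [J/K]
    r_in: float = 0.005      # air-to-wall thermal resistance [K/W]
    r_out: float = 0.02      # wall-to-outdoor thermal resistance [K/W]
    volume: float = 300.0    # zone air volume [m^3]
    co2_gen: float = 5.0e-6  # CO2 generation per occupant [m^3/s]


def step(t_air, t_wall, co2_ppm, t_out, q_hvac, oa_flow, n_occ,
         co2_out_ppm=420.0, dt=60.0, p=ZoneParams()):
    """Advance the zone state by one explicit-Euler step of dt seconds.

    t_air, t_wall, t_out : temperatures [deg C]
    q_hvac               : heat added to zone air by the AHU [W] (negative = cooling)
    oa_flow              : outdoor-air volume flow [m^3/s]
    n_occ                : number of occupants
    """
    # Two-node RC network: the air node exchanges heat with the wall node,
    # and the wall node exchanges heat with the outdoors.
    dt_air = ((t_wall - t_air) / p.r_in + q_hvac) / p.c_air
    dt_wall = ((t_air - t_wall) / p.r_in + (t_out - t_wall) / p.r_out) / p.c_wall

    # Well-mixed CO2 balance: V * dC/dt = G * 1e6 + Q_oa * (C_out - C), in ppm.
    d_co2 = (n_occ * p.co2_gen * 1e6 + oa_flow * (co2_out_ppm - co2_ppm)) / p.volume

    return t_air + dt * dt_air, t_wall + dt * dt_wall, co2_ppm + dt * d_co2


# Ten occupants with no outdoor air: CO2 climbs by about 10 ppm per minute.
print(step(22.0, 22.0, 800.0, 22.0, 0.0, 0.0, 10))
```

In a reinforcement-learning setup, a function like this would sit inside the environment's transition, with the agent's action mapped to `q_hvac` and `oa_flow`.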

Cached at: 05/26/26, 09:06 AM

# A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation
Source: [https://arxiv.org/abs/2605.24406](https://arxiv.org/abs/2605.24406)
[View PDF](https://arxiv.org/pdf/2605.24406)

> Abstract: Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.
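
The abstract describes two rule layers around the agent: an override that keeps CO2 at or below 1000 ppm, and an enthalpy-based economizer. One way such a hierarchy could look is sketched below. Only the 1000 ppm threshold comes from the abstract. The function names, the priority order, the open-to-maximum economizer behavior, and the default values are assumptions for illustration. Saturation pressure uses the Magnus approximation, and enthalpy uses the standard moist-air formula in kJ per kg of dry air.

```python
import math

CO2_LIMIT_PPM = 1000.0  # threshold stated in the abstract


def humidity_ratio(t_c, rh, p_atm=101325.0):
    """Humidity ratio [kg/kg] from dry-bulb temperature [deg C] and RH (0..1)."""
    p_ws = 610.94 * math.exp(17.625 * t_c / (t_c + 243.04))  # Magnus formula [Pa]
    p_w = rh * p_ws
    return 0.622 * p_w / (p_atm - p_w)


def enthalpy(t_c, rh):
    """Moist-air specific enthalpy [kJ/kg dry air]."""
    w = humidity_ratio(t_c, rh)
    return 1.006 * t_c + w * (2501.0 + 1.86 * t_c)


def economizer_enabled(t_out, rh_out, t_ret, rh_ret, cooling_needed):
    """Free cooling is worthwhile only if outdoor air carries less enthalpy."""
    return cooling_needed and enthalpy(t_out, rh_out) < enthalpy(t_ret, rh_ret)


def min_oa_flow(co2_ppm, n_occ, volume, dt, co2_out=420.0, gen=5.0e-6):
    """Smallest outdoor-air flow [m^3/s] keeping next-step CO2 <= limit.

    Solves the explicit-Euler well-mixed balance for the flow:
    co2 + dt * (n * gen * 1e6 + q * (co2_out - co2)) / V <= limit
    """
    dilution = co2_ppm - co2_out
    if dilution <= 0.0:
        return 0.0  # outdoor air cannot dilute; nothing to enforce
    needed = n_occ * gen * 1e6 - (CO2_LIMIT_PPM - co2_ppm) * volume / dt
    return max(0.0, needed / dilution)


def resolve_oa_flow(agent_flow, co2_ppm, n_occ, econ_on, q_min, q_max,
                    volume=300.0, dt=60.0):
    """Assumed priority: IAQ override > economizer > agent request."""
    flow = q_max if econ_on else agent_flow  # economizer assumed to open fully
    flow = max(flow, min_oa_flow(co2_ppm, n_occ, volume, dt))  # IAQ override
    # Clipping to the damper range means the limit cannot be guaranteed
    # when the required flow exceeds q_max.
    return min(max(flow, q_min), q_max)


# The agent asks for zero outdoor air at 1000 ppm with ten occupants;
# the override raises the flow to hold CO2 at the limit.
print(resolve_oa_flow(0.0, 1000.0, 10, False, 0.0, 1.0))
```

Keeping the override outside the policy network means the CO2 constraint holds regardless of what the agent has learned, at the cost of the agent occasionally seeing an executed action that differs from the one it chose.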

## Submission history

From: Mahdi Alibeigi. **[v1]** Sat, 23 May 2026 05:31:09 UTC (1,148 KB)

Similar Articles

Plan online, learn offline: Efficient learning and exploration via model-based control

GenPO++: Generative Policy Optimization with Jacobian-free Likelihood Ratios

Proximal Policy Optimization

Mahalanobis-Guided Latent OOD Detection for Hybrid ES-DRL Control in Time-Varying Systems

DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning
