A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation

arXiv cs.LG Papers

Summary

This paper presents a unified Python framework that uses PPO-based deep reinforcement learning to control HVAC systems with economizer logic and CO2-constrained ventilation. It reports better temperature stability and energy efficiency than GA-tuned PID and traditional On-Off controllers.

arXiv:2605.24406v1

Abstract: Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.
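
The abstract mentions an enthalpy-based economiser without giving its rule. As a rough illustration only, the sketch below compares outdoor and return-air enthalpy using the common psychrometric approximation h = 1.006·T + w·(2501 + 1.86·T) in kJ/kg of dry air. The function names, the example air states, and the simple "outdoor enthalpy below return enthalpy" criterion are assumptions made here, not the paper's implementation.

```python
def moist_air_enthalpy(t_c: float, w: float) -> float:
    """Approximate specific enthalpy of moist air in kJ/kg dry air.

    t_c: dry-bulb temperature in degrees Celsius
    w:   humidity ratio in kg water vapour per kg dry air
    """
    return 1.006 * t_c + w * (2501.0 + 1.86 * t_c)


def economizer_enabled(t_out: float, w_out: float,
                       t_ret: float, w_ret: float) -> bool:
    """Hypothetical rule: allow free cooling only when outdoor air
    carries less enthalpy than return air."""
    return moist_air_enthalpy(t_out, w_out) < moist_air_enthalpy(t_ret, w_ret)


# Assumed return-air state: 24 degC, humidity ratio 0.009 kg/kg.
cool_dry = economizer_enabled(15.0, 0.006, 24.0, 0.009)
# Outdoor air is cooler than return air but much more humid.
cool_humid = economizer_enabled(22.0, 0.016, 24.0, 0.009)
print(cool_dry, cool_humid)
```

The second case is the reason to compare enthalpy rather than dry-bulb temperature alone: air at 22 °C is cooler than the return air, yet its moisture content makes it the more expensive stream to condition.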

# A Unified Python Framework for Direct PPO-based Control of AHUs with Economizer Logic and CO2-Constrained Ventilation
Source: [https://arxiv.org/abs/2605.24406](https://arxiv.org/abs/2605.24406)
[View PDF](https://arxiv.org/pdf/2605.24406)

> Abstract: Optimizing HVAC (Heating, Ventilation and Air Conditioning) can enhance a building's energy efficiency while providing comfort levels for its occupants. Using conventional control systems to maintain HVAC functions is often difficult because of the nonlinear characteristics of a building envelope as it experiences stochastic load variations over time. This paper presents a new approach to optimizing HVAC systems through the use of Deep Reinforcement Learning (DRL) algorithms and the Proximal Policy Optimization (PPO) algorithm implemented in a custom Python performance environment. The DRL system uses a second order resistor-capacitor thermal model and an integrated dynamic mass balance of CO2 to replicate the complex physics associated with buildings. One major innovation of this study is a "Hierarchical Flow Logic," which provides the means to ensure that indoor air quality (IAQ) is maintained by overriding the accepted actions of the agent that cause CO2 to exceed 1000 ppm. In addition, an enthalpy-based economiser is used to create free cooling from the outdoor environment. The experimental data shows that compared to PID controllers tuned by GA or traditional On-Off controls, a PPO agent has better temperature stability and energy efficiency overall. An end-to-end pipeline provides an avenue for robust and generalized solutions to help implement smart building energy management within the context of real hardware implementation.
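
To make the "Hierarchical Flow Logic" idea more concrete, here is a minimal sketch of one way an agent's outdoor-air request could be overridden when a single-zone CO2 mass balance predicts a breach of 1000 ppm. Only the 1000 ppm threshold comes from the abstract. The well-mixed balance V·dC/dt = G + Q·(C_out − C), the forward-Euler step, all constants, and the function names are assumptions chosen for illustration and may differ from the paper's model.

```python
CO2_LIMIT_PPM = 1000.0  # threshold stated in the abstract
C_OUT_PPM = 420.0       # assumed outdoor CO2 concentration
GEN_M3S = 5.2e-6        # assumed CO2 generation per occupant, m^3/s
VOLUME_M3 = 300.0       # assumed zone air volume
DT_S = 60.0             # assumed control time step, seconds
Q_MAX = 1.0             # assumed maximum outdoor airflow, m^3/s


def co2_next(c_ppm: float, q_oa: float, n_occ: int) -> float:
    """One forward-Euler step of the well-mixed CO2 balance (ppm)."""
    gen_ppm = n_occ * GEN_M3S * 1e6  # occupant source in ppm * m^3/s
    return c_ppm + DT_S / VOLUME_M3 * (gen_ppm + q_oa * (C_OUT_PPM - c_ppm))


def override_flow(q_agent: float, c_ppm: float, n_occ: int) -> float:
    """Return the agent's flow if it is predicted safe; otherwise raise
    it to the smallest flow that keeps the next step at the limit."""
    q = min(max(q_agent, 0.0), Q_MAX)
    if co2_next(c_ppm, q, n_occ) <= CO2_LIMIT_PPM:
        return q  # agent action accepted unchanged
    if c_ppm <= C_OUT_PPM:
        return q  # outdoor air cannot dilute the zone; nothing to gain
    gen_ppm = n_occ * GEN_M3S * 1e6
    # Solve co2_next(c, q_req, n) == limit for q_req.
    headroom = (CO2_LIMIT_PPM - c_ppm) * VOLUME_M3 / DT_S
    q_req = (gen_ppm - headroom) / (c_ppm - C_OUT_PPM)
    return min(max(q, q_req), Q_MAX)


# The agent asks for no outdoor air while the zone sits at 990 ppm.
q_safe = override_flow(0.0, 990.0, 20)
# The agent's request is already safe in a lightly loaded zone.
q_kept = override_flow(0.5, 600.0, 5)
print(q_safe, q_kept)
```

In this sketch the override sits outside the policy, so the agent remains free to optimize energy and temperature while the ventilation floor is enforced by the supervisory rule.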

## Submission history

From: Mahdi Alibeigi

**[v1]** Sat, 23 May 2026 05:31:09 UTC (1,148 KB)
