UniWind: Toward Unified Day-Ahead Wind Power Forecasting via Physics-Informed State Routing

arXiv cs.LG Papers

Summary

Introduces UniWind, a physics-informed machine learning model for day-ahead wind power forecasting that combines physical prior estimation with latent state encoding to handle operational states like shutdowns and curtailment, achieving robust performance across multiple real-world datasets.

arXiv:2607.01670v1 Announce Type: new

Cached at: 07/03/26, 05:43 AM

# UniWind: Toward Unified Day-Ahead Wind Power Forecasting via Physics-Informed State Routing
Source: [https://arxiv.org/html/2607.01670](https://arxiv.org/html/2607.01670)
Ronghui Xu¹, Tongxin Wu², Guozhen Zhang³, Yihan Li², Chenjuan Guo¹, Bin Yang¹\*, Yong Li⁴\*

¹East China Normal University, Shanghai, China; ²Southern University of Science and Technology, Shenzhen, China; ³TsingRoc.ai, Beijing, China; ⁴Tsinghua University, Beijing, China

\*Corresponding authors: byang@dase.ecnu.edu.cn, liyong07@tsinghua.edu.cn

###### Abstract

Day-ahead wind power forecasting is essential for cost-effective power-system operation. It is primarily driven by future meteorological conditions while retaining temporal dependencies in power generation. In practice, observed wind-farm power often entangles physically available power with local environmental effects and latent operational states, such as shutdowns and curtailment. Existing physical models provide useful constraints but adapt poorly across wind farms, whereas data-driven models can capture rich correlations but often conflate meteorological effects with state-induced deviations. In this study, we propose UniWind, a wind power forecasting model based on physics-informed state routing. UniWind first employs a Physical Prior Estimator to construct a site-calibrated physical prior by combining site-conditioned monotonic warping with a shared physical power curve. It further applies a physical upper-bound constraint to shape this prior as a soft envelope of available wind power generation. UniWind then uses a Latent State Encoder to model operating-state embeddings and transforms the physical prior into final power forecasts through a State-aware Power Corrector, which applies knowledge-guided supervised state routing and bounded, state-specific expert correction. Full-shot and cross-farm zero-shot experiments on more than 20 real-world datasets demonstrate the accuracy and robustness of UniWind.

## 1 Introduction

![Figure 1](https://arxiv.org/html/2607.01670v1/x1.png)

Figure 1: Comparison of UniWind with other wind power forecasting models.

Day-ahead wind power forecasting plays a central role in cost-effective power-system operation Wu et al. ([2026](https://arxiv.org/html/2607.01670#bib.bib40)), as electricity-market decisions must be made before power delivery Hanifi et al. ([2020](https://arxiv.org/html/2607.01670#bib.bib12)); Xu et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib24)). Compared with conventional time series forecasting Stitsyuk and Choi ([2025](https://arxiv.org/html/2607.01670#bib.bib27)); Liu et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib26)); Wang et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib25)); Nie et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib28)); Fang et al. ([2026b](https://arxiv.org/html/2607.01670#bib.bib38)), day-ahead wind power forecasting exhibits a stronger dependence on meteorological conditions at the corresponding future time steps than on historical power. For example, high historical power on previous days does not imply high future power if the future wind speed is low. Therefore, this task should be viewed as a future-meteorology-conditioned power realization forecasting problem.

![Figure 2](https://arxiv.org/html/2607.01670v1/x2.png)

Figure 2: Operational states in a wind power sequence.

Existing wind power forecasting methods can be categorized into physically driven and data-driven approaches. Physical models Focken et al. ([2001](https://arxiv.org/html/2607.01670#bib.bib11)); Li et al. ([2013](https://arxiv.org/html/2607.01670#bib.bib8)); Wang et al. ([2018](https://arxiv.org/html/2607.01670#bib.bib10)), depicted in Figure [1](https://arxiv.org/html/2607.01670#S1.F1)(a), provide reliable power predictions through physical constraints and structured inductive biases. However, they struggle to capture local endogenous factors, such as terrain-induced circulation and wind-farm operational states. As shown in Figure [2](https://arxiv.org/html/2607.01670#S1.F2), turbines may switch among three states: shutdown with zero output, regular operation where power follows the physical wind-speed response, and curtailment where output is intentionally constrained below the available power potential. These latent-state changes alter the wind-to-power mapping, limiting the adaptability of fixed physical formulations across different wind farms and operational conditions. Data-driven methods Sideratos and Hatziargyriou ([2007](https://arxiv.org/html/2607.01670#bib.bib13)); Ozkan and Karagoz ([2015](https://arxiv.org/html/2607.01670#bib.bib14)); Karijadi et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib7)); Zhang et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib2)); Chang et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib4)) can capture richer correlations from real-world data. Nevertheless, as shown in Figure [1](https://arxiv.org/html/2607.01670#S1.F1)(b), they tend to entangle physical power potential with hidden-state effects, making it difficult to determine whether power deviations arise from meteorological changes or latent operational states. This weakens their robustness under diverse wind and operational conditions. Therefore, robust day-ahead wind power forecasting requires explicitly modeling the dependency from meteorological conditions to physical power potential and further to realized power under hidden-state interventions, as illustrated in Figure [1](https://arxiv.org/html/2607.01670#S1.F1)(c). However, constructing such a robust model poses two key challenges:

First, estimating power generation potential requires balancing universal physical principles with site-specific adaptation. Wind power generation follows physical relationships that provide reusable structure across wind farms. However, real-world generation potential is also shaped by site-specific factors, such as local environmental effects, that are often unobserved. Existing fixed physical formulas Prósper et al. ([2019](https://arxiv.org/html/2607.01670#bib.bib9)); Wang et al. ([2018](https://arxiv.org/html/2607.01670#bib.bib10)) may be too rigid to capture site-dependent wind-to-power responses, whereas unconstrained data-driven methods Buhan and Çadırcı ([2015](https://arxiv.org/html/2607.01670#bib.bib15)); Hong and Rioflorido ([2019](https://arxiv.org/html/2607.01670#bib.bib16)) may overfit noisy site-specific correlations or violate physical principles. Therefore, an effective physical prior should preserve wind-energy physics while enabling site-level calibration under appropriate constraints.

Second, predicting realized power requires modeling the effects of unobservable, time-varying future operational states. Even when the physical generation potential is similar, the actual output may differ due to different latent states such as shutdown or curtailment. These dynamic states are often unobservable at forecasting time and are entangled with meteorological variation. Meanwhile, existing hidden-state-aware methods Liu et al. ([2025b](https://arxiv.org/html/2607.01670#bib.bib31)); Zhou et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib32)) typically infer hidden states directly from historical sequences, making it difficult to distinguish true changes in physically available power from hidden-state interventions. As a result, they may average over multiple states and fail to capture abrupt transitions or abnormal dynamics. This motivates explicitly inferring future states and designing state-specific corrections.

To this end, we propose UniWind, a physics-informed state-routing model for day-ahead wind power forecasting. UniWind decouples physically available power from operationally induced deviations, thereby modeling both meteorology-driven generation potential and its state-dependent realization. To address the first challenge, UniWind introduces a Physical Prior Estimator that combines site-conditioned monotone warping and a shared physical power curve to construct a site-calibrated physical prior. As the physical prior represents the ideal generation potential under given meteorological conditions, we treat it as an upper bound on realized power and construct an upper-bound constraint accordingly. To address the second challenge, UniWind uses a Latent State Encoder to infer historical operating patterns from prior-power discrepancies and retrieve future state embeddings from similar meteorological contexts. Since future operational states are dynamic and require distinct correction mechanisms, a State-aware Power Corrector then routes each timestamp to supervised operating-state experts and applies bounded state-specific corrections to obtain realized power forecasts. Our contributions are summarized as follows:

- We propose UniWind, a physics-informed state-routing model for day-ahead wind power forecasting that explicitly factorizes the forecasting process into meteorology-driven physical power potential and state-dependent realized generation.
- We develop a site-calibrated Physical Prior Estimator that learns an available-power prior by combining shared wind-energy structure with site-conditioned monotone calibration and physical upper-bound regularization.
- We introduce a Latent State Encoder and a State-aware Power Corrector to model unobservable operational states and their effects on wind-farm generation. The encoder retrieves future state representations from historical weather-state patterns, while the corrector uses knowledge-supervised state routing and bounded state-specific experts to transform the physical prior into realized power forecasts.
- We conduct extensive experiments on more than 20 real-world wind-farm datasets that cover full-shot and cross-farm zero-shot settings, and demonstrate the accuracy and robustness of UniWind.

## 2 Related Work

Wind Power Forecasting. Current research on wind power forecasting primarily relies on physical models or data-driven approaches. Physical models Focken et al. ([2001](https://arxiv.org/html/2607.01670#bib.bib11)); Li et al. ([2013](https://arxiv.org/html/2607.01670#bib.bib8)); Wang et al. ([2018](https://arxiv.org/html/2607.01670#bib.bib10)) estimate wind power by propagating meteorological forecasts through physical and engineering assumptions. For example, Li et al. ([2013](https://arxiv.org/html/2607.01670#bib.bib8)) first simulate steady-state flow fields under discrete inflow wind conditions to construct a database, and then predict short-term wind power by querying this database using wind inputs. Data-driven models learn mappings from meteorological variables, historical power, and site features to future generation. Statistical methods Sideratos and Hatziargyriou ([2007](https://arxiv.org/html/2607.01670#bib.bib13)); Ozkan and Karagoz ([2015](https://arxiv.org/html/2607.01670#bib.bib14)); Buhan and Çadırcı ([2015](https://arxiv.org/html/2607.01670#bib.bib15)); Phan et al. ([2021](https://arxiv.org/html/2607.01670#bib.bib17)); Ju et al. ([2019](https://arxiv.org/html/2607.01670#bib.bib18)) typically rely on handcrafted feature engineering. Tree-based approaches, such as XGBoost Phan et al. ([2021](https://arxiv.org/html/2607.01670#bib.bib17)) and LightGBM Ju et al. ([2019](https://arxiv.org/html/2607.01670#bib.bib18)), combine diverse meteorological features and often serve as strong practical baselines. Deep learning models Fan et al. ([2020](https://arxiv.org/html/2607.01670#bib.bib39)); Karijadi et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib7)); Keisler and Le Naour ([2024](https://arxiv.org/html/2607.01670#bib.bib3)); Zhang et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib2)); Chang et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib4)) learn nonlinear weather-power dependencies from large-scale observations. More recently, following the emergence of foundation-model research Fang et al. ([2026a](https://arxiv.org/html/2607.01670#bib.bib37)); Wang et al. ([2025b](https://arxiv.org/html/2607.01670#bib.bib41)), WindFM Fan et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib5)) has been developed for wind energy prediction in unseen scenarios. Although these methods capture rich empirical correlations, real-world wind-farm data often contain measurement noise, local environmental effects, and latent operational states, such as shutdowns and curtailment. Consequently, physical models may fail to adapt to local endogenous conditions, whereas data-driven models may absorb hidden operational effects as noise or spurious correlations. Neither line of work explicitly separates physical generation potential from state-dependent realized power, which is the key gap addressed by UniWind.

Solar Power Forecasting. Solar power forecasting models use weather conditions and historical observations, but rely more on cloud-sensitive multimodal signals Boussif et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib19)); Ma et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib20)); Wang et al. ([2025a](https://arxiv.org/html/2607.01670#bib.bib21)); Li et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib22)). For example, CrossViViT Boussif et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib19)) uses satellite imagery as spatio-temporal context and combines it with station time series for probabilistic forecasting. FusionSF Ma et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib20)) discretizes heterogeneous modalities into a shared codebook for robust solar power forecasting. However, wind power depends mainly on wind-speed dynamics rather than irradiance, and its uncertainty is more affected by turbine operation than by cloud motion. These differences limit the direct transfer of solar forecasting models to wind power prediction.

Time-Series Forecasting for Wind Power. Time-series forecasting is widely used for wind power prediction because it directly learns temporal dependencies from historical sequences Zhao et al. ([2016](https://arxiv.org/html/2607.01670#bib.bib23)); Xu et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib24)); Wang et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib25)). Recent models such as PatchTST Nie et al. ([2023](https://arxiv.org/html/2607.01670#bib.bib28)), iTransformer Liu et al. ([2024](https://arxiv.org/html/2607.01670#bib.bib26)), and xPatch Stitsyuk and Choi ([2025](https://arxiv.org/html/2607.01670#bib.bib27)) improve long-horizon forecasting through patch-based, variable-centric, and multi-period representations. Foundation models such as Chronos Ansari et al. ([2025](https://arxiv.org/html/2607.01670#bib.bib29)) and Moirai Liu et al. ([2025a](https://arxiv.org/html/2607.01670#bib.bib30)) leverage large-scale heterogeneous time-series pretraining for downstream adaptation. Nonetheless, these generic sequence forecasting models are often dominated by historical power patterns and may inadequately capture how future meteorological changes drive wind generation.

## 3 Methodology

In this section, we present UniWind, a unified wind power forecasting model that combines physics-informed priors with latent operational state dynamics.

### 3.1 Problem Statement

Given the numerical weather prediction (NWP) sequence $\boldsymbol{w}\in\mathbb{R}^{(T_1+T_2)\times N_w}$, the historical power sequence $\boldsymbol{x}\in\mathbb{R}^{T_1}$, and the site features $\boldsymbol{c}\in\mathbb{R}^{N_s}$ of the target wind farm, our goal is to learn a forecasting model $f_{\theta}$ that predicts wind power $\hat{\boldsymbol{y}}\in\mathbb{R}^{T_2}$:

$$\hat{\boldsymbol{y}}=f_{\theta}\left(\boldsymbol{w},\boldsymbol{x},\boldsymbol{c}\right),\tag{1}$$

where $T_1$ and $T_2$ denote the lengths of the historical and forecasting horizons, respectively. $N_w$ is the number of meteorological features, and $N_s$ is the number of static site features, such as longitude, latitude, and rated power.
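To make the shapes in Eq. (1) concrete, the sketch below wraps one sample in a small container and checks its dimensions. It is a minimal illustration in plain Python (lists rather than tensors); the class name `ForecastInput` and its `check` method are hypothetical and not part of UniWind.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ForecastInput:
    """Hypothetical container for one forecasting sample (plain lists, not tensors)."""

    w: List[List[float]]  # NWP sequence, shape (T1 + T2) x N_w
    x: List[float]        # historical power, length T1
    c: List[float]        # static site features, length N_s

    def check(self, t1: int, t2: int, n_w: int, n_s: int) -> None:
        """Raise ValueError if any field disagrees with the stated shapes."""
        if len(self.w) != t1 + t2 or any(len(row) != n_w for row in self.w):
            raise ValueError("w must have shape (T1 + T2) x N_w")
        if len(self.x) != t1:
            raise ValueError("x must have length T1")
        if len(self.c) != n_s:
            raise ValueError("c must have length N_s")


# Example: 2 historical steps, 3 future steps, 2 weather features, 3 site features.
sample = ForecastInput(w=[[8.0, 1.2]] * 5, x=[10.0, 12.5], c=[120.3, 31.2, 50.0])
sample.check(t1=2, t2=3, n_w=2, n_s=3)  # passes; the model would output T2 = 3 values
```

Note that the weather sequence spans both windows, while the power input covers only the past $T_1$ steps; this asymmetry is what makes the task future-meteorology-conditioned.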

![Figure 3](https://arxiv.org/html/2607.01670v1/fig/framework.jpg)

Figure 3: Overall framework of UniWind.

### 3.2 Overall Framework

As shown in Figure [3](https://arxiv.org/html/2607.01670#S3.F3), UniWind consists of three stages. First, the Physical Prior Estimator combines density-aware wind-speed normalization, site-conditioned physical calibration, and a shared learnable power curve to estimate the physical prior under ideal conditions. A physical upper-bound constraint further encourages this prior to serve as a soft envelope for available generation. Second, the Latent State Encoder characterizes historical operational states using discrepancies between observed power and the physical prior. It then retrieves future state patterns from historical states under similar meteorological conditions, refines the operational-state embeddings, and fuses them with weather embeddings to obtain the final state embeddings. Finally, the State-aware Power Corrector routes the final state embedding at each time step to supervised operational-state experts and aggregates bounded, state-specific corrections to produce the final prediction.

### 3.3 Physical Prior Estimator

Given that wind power is governed by strong site-specific physical constraints, we propose a Physical Prior Estimator to construct site-calibrated estimates of physically available power, providing a reliable prior for latent state modeling and state-aware correction.

Physical normalization. To remove first-order variations in air density, we apply a density-aware wind-speed adjustment Pandit et al. ([2020](https://arxiv.org/html/2607.01670#bib.bib1)) to obtain the equivalent wind speed $\tilde{\boldsymbol{v}}=\boldsymbol{v}\left(\frac{\boldsymbol{\rho}}{\rho_0}\right)^{1/3}$, which follows from the wind-energy relation $P_{\mathrm{wind}}\propto\rho v^{3}$: with this choice, $\rho_0\tilde{v}^{3}=\rho v^{3}$, so the equivalent speed carries the same kinetic power at the reference density. Here, $\boldsymbol{v}\in\mathbb{R}^{T_1+T_2}$ and $\boldsymbol{\rho}\in\mathbb{R}^{T_1+T_2}$ denote the raw wind speed and air density over the past $T_1$ and future $T_2$ timestamps, respectively, and $\rho_0=1.225\,\mathrm{kg/m^{3}}$ denotes the reference air density.
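As a worked check of the density adjustment, the sketch below computes $\tilde{\boldsymbol{v}}$ element-wise. Only the formula and $\rho_0=1.225$ come from the text; the function name `equivalent_wind_speed` is hypothetical. Air at 90% of the reference density turns a raw 10 m/s into roughly 9.65 m/s, because $0.9^{1/3}\approx 0.9655$.

```python
RHO_0 = 1.225  # reference air density in kg/m^3, as stated in the text


def equivalent_wind_speed(v, rho, rho0=RHO_0):
    """Density-adjusted wind speed v * (rho / rho0) ** (1/3), element-wise.

    v and rho cover the same T1 + T2 timestamps.
    """
    if len(v) != len(rho):
        raise ValueError("v and rho must have the same length")
    return [vi * (ri / rho0) ** (1.0 / 3.0) for vi, ri in zip(v, rho)]


v_tilde = equivalent_wind_speed([10.0, 10.0], [1.225, 0.9 * 1.225])
print([round(s, 2) for s in v_tilde])  # [10.0, 9.65]
```

The invariant worth testing is $\rho_0\tilde{v}^3=\rho v^3$: the adjustment changes the speed but not the available kinetic power.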

Site-conditioned monotone warp. To adapt the physical prior to different sites while remaining sufficiently constrained to avoid non-physical wind-speed inversions, we design a site-conditioned monotone warp that maps equivalent wind speed to effective wind speed as follows:

$$\hat{\boldsymbol{v}}=\max\left(0,\;\exp(a)\,(1+\tilde{\boldsymbol{v}})^{\exp(b)}-1\right),\qquad[a,b]^{\top}=h_{\phi}(\boldsymbol{c})\in\mathbb{R}^{2}.\tag{2}$$

Here, $\boldsymbol{c}\in\mathbb{R}^{N_s}$ denotes the site features, $h_{\phi}(\cdot)$ is an MLP-based site-conditioned parameter head, $a$ shifts the effective speed scale, and $b$ controls the local stretch of the curve. The mapping from $\tilde{\boldsymbol{v}}$ to $\hat{\boldsymbol{v}}$ preserves the ordering of wind speeds: since $\exp(a)>0$ and $\exp(b)>0$, the term $\exp(a)(1+\tilde{v})^{\exp(b)}$ is strictly increasing in $\tilde{v}$ for $\tilde{v}\ge 0$, and $\max(0,\cdot)$ is non-decreasing, so a larger equivalent wind speed cannot yield a smaller effective wind speed. A formal proof is provided in Appendix [C.1](https://arxiv.org/html/2607.01670#A3.SS1).
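A minimal sketch of Eq. (2), assuming $a$ and $b$ have already been produced by the site head $h_\phi$ (here they are passed in as plain floats; the function name `monotone_warp` is hypothetical). With $a=b=0$ the warp reduces to the identity, which makes a convenient sanity check.

```python
import math


def monotone_warp(v_tilde, a, b):
    """Eq. (2): v_hat = max(0, exp(a) * (1 + v_tilde) ** exp(b) - 1), element-wise.

    a and b would come from the site head h_phi(c); here they are plain floats.
    Assumes v_tilde >= 0, so the base 1 + v_tilde is positive.
    """
    scale, power = math.exp(a), math.exp(b)  # both strictly positive
    return [max(0.0, scale * (1.0 + v) ** power - 1.0) for v in v_tilde]


speeds = [0.0, 3.5, 12.0]
print(monotone_warp(speeds, 0.0, 0.0))   # identity: [0.0, 3.5, 12.0]
print(monotone_warp(speeds, 0.1, -0.2))  # a site that rescales and compresses speeds
```

Because $a$ and $b$ enter only through $\exp(\cdot)$, any real output of the head keeps the warp monotone, so no extra constraint on $h_\phi$ is needed.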

To further align the shared power curve with the target wind farm, UniWind learns site-conditioned anchors $[v_b,v_r]^{\top}\in\mathbb{R}^{2}$, with $v_r>v_b$, in the same manner as $a$ and $b$. Here, $v_b$ and $v_r$ denote cut-in-like and rated-speed-like anchors. We then normalize the effective wind speed as $\boldsymbol{v}^{*}=\frac{\hat{\boldsymbol{v}}-v_b}{v_r-v_b}$.
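The text states only that the anchors satisfy $v_r>v_b$. One common way to enforce such an ordering is to add a positive increment to the lower anchor; the softplus parameterization below is our assumption, not necessarily the paper's, and `u_b`, `u_r` are hypothetical stand-ins for raw outputs of a site-conditioned head.

```python
import math


def softplus(u):
    """Numerically stable log(1 + exp(u)); never negative."""
    return max(u, 0.0) + math.log1p(math.exp(-abs(u)))


def site_anchors(u_b, u_r, min_gap=1e-3):
    """Hypothetical mapping from raw head outputs to anchors with v_r > v_b >= 0."""
    v_b = softplus(u_b)                  # cut-in-like anchor
    v_r = v_b + softplus(u_r) + min_gap  # rated-speed-like anchor, strictly above v_b
    return v_b, v_r


def normalize_speed(v_hat, v_b, v_r):
    """v* = (v_hat - v_b) / (v_r - v_b): 0 at the cut-in anchor, 1 at the rated anchor."""
    span = v_r - v_b
    return [(v - v_b) / span for v in v_hat]


print(normalize_speed([3.0, 7.5, 12.0], v_b=3.0, v_r=12.0))  # [0.0, 0.5, 1.0]
```

After this step, values below 0 correspond to speeds under the cut-in-like anchor and values above 1 to speeds past the rated-speed-like anchor, which is the range a shared power curve can then interpret.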

Shared physical power curve. After site-conditioned wind-speed calibration, we introduce a shared physical power curve as a smooth, learnable physical prior inspired by classical wind-turbine power curves Wang et al. ([2019](https://arxiv.org/html/2607.01670#bib.bib33)). Rather than reproducing a specific manufacturer curve, this curve captures the common physical structure of wind generation: near-zero power before cut-in, nonlinear growth in the sub-rated region, and saturation near rated capacity. The physical prior power is generated as follows:

$$\boldsymbol{p}^{\mathrm{prior}}=P_{\mathrm{cap}}\,f_{\mathrm{curve}}(\boldsymbol{v}^{*}),\tag{3}$$

where $P_{\mathrm{cap}}\in\mathbb{R}$ denotes the rated capacity of the target wind farm, and $f_{\mathrm{curve}}(\cdot)$ maps the normalized effective wind speed $\boldsymbol{v}^{*}$ to a capacity factor in $[0,1]$. The detailed parameterization of $f_{\mathrm{curve}}$ is provided in Appendix [C.2](https://arxiv.org/html/2607.01670#A3.SS2).
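Since the Appendix C.2 parameterization is not reproduced here, the sketch below uses a stand-in curve with the three properties listed above (flat at zero below cut-in, nonlinear growth, saturation at rated capacity): a clipped smoothstep raised to a hypothetical shape exponent `gamma`. It illustrates how Eq. (3) turns $\boldsymbol{v}^*$ into power; it is not the paper's $f_{\mathrm{curve}}$.

```python
def capacity_factor(v_star, gamma=1.0):
    """Stand-in for f_curve (NOT the Appendix C.2 form): smooth, monotone, in [0, 1].

    gamma > 0 is a hypothetical learnable shape exponent.
    """
    u = min(max(v_star, 0.0), 1.0)    # 0 below the cut-in anchor, 1 above rated
    smooth = u * u * (3.0 - 2.0 * u)  # smoothstep: zero slope at both ends
    return smooth ** gamma


def physical_prior(v_star_seq, p_cap, gamma=1.0):
    """Eq. (3): p_prior = P_cap * f_curve(v*), element-wise."""
    return [p_cap * capacity_factor(v, gamma) for v in v_star_seq]


print(physical_prior([-0.2, 0.5, 1.3], p_cap=100.0))  # [0.0, 50.0, 100.0]
```

Any curve used here should stay monotone and bounded in $[0,1]$ so that $P_{\mathrm{cap}}$ remains a hard ceiling on the prior.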

Physical upper-bound constraint. We further shape the physical prior into a soft upper envelope of available generation. The prior should be high enough to cover observed power, while remaining close to real power under regular operation. We therefore define

$$\mathcal{L}_{\mathrm{prior}}=\frac{1}{|\Omega|}\sum_{t\in\Omega}\left[\boldsymbol{p}_{t}-\boldsymbol{p}_{t}^{\mathrm{prior}}\right]_{+}+\lambda_{\mathrm{tight}}\frac{1}{|\Omega_{\mathrm{reg}}|}\sum_{t\in\Omega_{\mathrm{reg}}}\left[\boldsymbol{p}_{t}^{\mathrm{prior}}-\boldsymbol{p}_{t}-\epsilon_{\mathrm{slack}}\right]_{+}.\tag{4}$$

Here, $\Omega$ and $\Omega_{\mathrm{reg}}$ denote the set of observed time steps and the subset labeled as the regular state, respectively, and $\boldsymbol{p}_t$ is the observed power at step $t$. The construction of the regular label is detailed in Appendix [C.3](https://arxiv.org/html/2607.01670#A3.SS3). We define $[x]_{+}=\max(x,0)$, and $\epsilon_{\mathrm{slack}}$ is the allowed slack between the physical prior and observed power. The weight $\lambda_{\mathrm{tight}}$ controls the regular-state tightness. The first term penalizes observed power above the prior at every step; the second penalizes a prior that exceeds observed power by more than $\epsilon_{\mathrm{slack}}$ only on regular steps, so shutdown and curtailment periods do not pull the prior down.
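The two hinge terms of Eq. (4) can be checked on a toy example. In this sketch (plain Python; function and argument names are hypothetical), `is_regular` marks membership in $\Omega_{\mathrm{reg}}$, every input step is treated as observed, and returning 0 for the tightness term when no regular step exists is our own assumption.

```python
def hinge(z):
    """[z]_+ = max(z, 0)."""
    return max(z, 0.0)


def prior_loss(p, p_prior, is_regular, lambda_tight=1.0, eps_slack=0.0):
    """Eq. (4) over observed steps; is_regular[t] marks membership in Omega_reg."""
    # Coverage term: penalize observed power that exceeds the prior (all steps).
    cover = sum(hinge(pt - qt) for pt, qt in zip(p, p_prior)) / len(p)
    # Tightness term: on regular steps only, penalize a prior more than eps above p.
    reg = [(pt, qt) for pt, qt, r in zip(p, p_prior, is_regular) if r]
    tight = sum(hinge(qt - pt - eps_slack) for pt, qt in reg) / len(reg) if reg else 0.0
    return cover + lambda_tight * tight


loss = prior_loss(p=[10.0, 50.0, 0.0], p_prior=[12.0, 45.0, 30.0],
                  is_regular=[True, True, False], lambda_tight=0.5, eps_slack=1.0)
print(round(loss, 4))  # 5/3 + 0.5 * 0.5 = 1.9167
```

In this example the coverage term is 5/3 (only the second step exceeds its prior), the tightness term is 1/2 (the first step's prior sits 2 above observed power, 1 beyond the slack), and the third step, which is not regular, contributes nothing even though its prior is 30 above the observed zero output.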

### 3.4 Latent State Encoder

By encoding historical deviations and retrieving future state patterns from weather context, this module produces state embeddings that explain the gap between ideal physical availability and actual wind-farm generation.

Historical state encoding. We use historical discrepancies between observed power and theoretically available power to infer recent latent operating behavior. Let $1{:}T_1$ denote the historical window and $T_1{+}1{:}T_1{+}T_2$ denote the future window. Given the historical real power $\boldsymbol{x}$ and the historical physical prior $\boldsymbol{p}^{\mathrm{prior}}_{1:T_1}$, UniWind constructs discrepancy-aware historical state tokens:

$$\boldsymbol{\psi}=\left[\boldsymbol{x},\;\boldsymbol{p}^{\mathrm{prior}}_{1:T_1},\;\boldsymbol{e},\;\Delta\boldsymbol{x},\;\Delta\boldsymbol{p}^{\mathrm{prior}}_{1:T_1},\;\Delta\boldsymbol{e},\;\boldsymbol{r},\;\Delta\boldsymbol{r}\right].\tag{5}$$

Here, $\boldsymbol{\psi}\in\mathbb{R}^{T_1\times 8}$ denotes the historical state-token sequence, $\boldsymbol{e}=\boldsymbol{x}-\boldsymbol{p}^{\mathrm{prior}}_{1:T_1}$ denotes the residual sequence, $\boldsymbol{r}$ denotes the power-to-prior ratio sequence, and $\Delta$ denotes the first-order temporal difference operator. These tokens are then encoded into $D$-dimensional historical state embeddings:

$$\boldsymbol{s}^{h}=E_{\mathrm{hist}}(\boldsymbol{\psi}).\tag{6}$$

Here, $E_{\mathrm{hist}}(\cdot)$ denotes the historical state encoder, which consists of token normalization, an MLP projection, and a multi-scale temporal convolution block, and $\boldsymbol{s}^{h}\in\mathbb{R}^{T_1\times D}$ denotes the historical state embedding sequence.
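A sketch of the token construction in Eq. (5), which feeds $E_{\mathrm{hist}}$. Two details are assumptions because the text does not specify them: the ratio uses a small `eps` in the denominator to avoid division by zero when the prior is zero, and $\Delta$ pads the first step with 0 so that every column has length $T_1$. The function names are hypothetical.

```python
def first_diff(seq):
    """Delta operator; the first step is padded with 0 (an assumption)."""
    return [0.0] + [b - a for a, b in zip(seq, seq[1:])]


def state_tokens(x, p_prior_hist, eps=1e-6):
    """Build psi (T1 rows x 8 columns) in the column order of Eq. (5)."""
    e = [xi - pi for xi, pi in zip(x, p_prior_hist)]          # residual x - p_prior
    r = [xi / (pi + eps) for xi, pi in zip(x, p_prior_hist)]  # power-to-prior ratio
    cols = [x, p_prior_hist, e,
            first_diff(x), first_diff(p_prior_hist), first_diff(e),
            r, first_diff(r)]
    return [list(row) for row in zip(*cols)]


psi = state_tokens(x=[0.0, 20.0, 20.0], p_prior_hist=[40.0, 40.0, 60.0])
print(len(psi), len(psi[0]))  # 3 8
```

Intuitively, a ratio near 0 under a high prior or a ratio pinned below 1 while the prior rises are the kinds of discrepancy patterns these tokens expose; the encoder, not a hand-written rule, learns how to interpret them.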

Weather encoding. UniWind encodes the full weather sequence as follows:

$$\tilde{\boldsymbol{h}}^{w}=E_{\mathrm{w}}(\boldsymbol{w}),\qquad\boldsymbol{h}^{w}=T_{\mathrm{w}}(\tilde{\boldsymbol{h}}^{w}).\tag{7}$$

Here, $\boldsymbol{w}\in\mathbb{R}^{(T_1+T_2)\times N_w}$ is the NWP sequence, $E_{\mathrm{w}}(\cdot)$ is an MLP-based per-step weather encoder, $T_{\mathrm{w}}(\cdot)$ is a temporal weather encoder with positional and time information, and $\boldsymbol{h}^{w}\in\mathbb{R}^{(T_1+T_2)\times D}$ is the context-aware weather embedding sequence.

Future latent state retrieval. To transfer operating-state patterns from similar historical meteorological conditions to future horizons, we treat future weather embeddings as queries, historical weather embeddings as keys, and historical state embeddings as values, thereby retrieving future latent state embeddings as follows:

$$\tilde{\boldsymbol{s}}^{f}=\operatorname{Attn}\left(Q\boldsymbol{h}^{w}_{T_1+1:T_1+T_2},\;K\boldsymbol{h}^{w}_{1:T_1},\;V\boldsymbol{s}^{h}_{1:T_1}\right).\tag{8}$$

Here, $\tilde{\boldsymbol{s}}^{f}\in\mathbb{R}^{T_2\times D}$ denotes the retrieved future state sequence, and $Q$, $K$, and $V$ denote learnable query, key, and value projections, respectively.
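Eq. (8) is a cross-attention lookup: each future step looks for historical steps with similar weather and borrows their state embeddings. The single-head, scaled dot-product version below is a sketch; it takes $Q$, $K$, $V$ as identity maps and assumes a standard softmax over historical steps, whereas the paper's head count and scaling are not stated here. The function name is hypothetical.

```python
import math


def retrieve_future_states(h_future, h_hist, s_hist):
    """Single-head scaled dot-product attention for Eq. (8).

    h_future: T2 x D future weather embeddings (queries)
    h_hist:   T1 x D historical weather embeddings (keys)
    s_hist:   T1 x D historical state embeddings (values)
    Q, K, V are taken as identity maps here; the paper learns them.
    """
    d = len(h_hist[0])
    out = []
    for q in h_future:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in h_hist]
        m = max(scores)                         # subtract max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]  # softmax over historical steps
        out.append([sum(w * v[j] for w, v in zip(weights, s_hist))
                    for j in range(len(s_hist[0]))])
    return out


# A future step whose weather resembles historical step 0 retrieves mostly state 0.
h_hist = [[3.0, 0.0], [0.0, 3.0]]
s_hist = [[1.0, 0.0], [0.0, 1.0]]
retrieved = retrieve_future_states([[3.0, 0.0]], h_hist, s_hist)
print(retrieved)
```

Because the output is a convex combination of historical state embeddings, the retrieved future states can only recombine operating patterns already seen in the history window.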

Latent state refining. Future operational states are correlated with past operational states and influenced by future weather conditions. Therefore, UniWind refines the full state sequence $[\boldsymbol{s}^{h};\tilde{\boldsymbol{s}}^{f}]$ using a causal TCN-based refiner. It then fuses the refined state features with weather features through projection and an MLP to obtain the final state embedding sequence $\boldsymbol{z}\in\mathbb{R}^{(T_1+T_2)\times D}$.
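The refiner is described only as a causal TCN. The basic building block of such a network is a causal (optionally dilated) convolution, sketched below for a single channel; the function name, kernel, and dilation are illustrative assumptions, not the paper's configuration.

```python
def causal_conv1d(seq, kernel, dilation=1):
    """Causal dilated convolution over one channel.

    y[t] = sum_k kernel[k] * seq[t - k * dilation], with zeros before t = 0,
    so y[t] depends only on steps up to t.
    """
    out = []
    for t in range(len(seq)):
        acc = 0.0
        for k, w_k in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w_k * seq[idx]
        out.append(acc)
    return out


print(causal_conv1d([1.0, 2.0, 3.0, 4.0], kernel=[1.0, 1.0], dilation=2))  # [1.0, 2.0, 4.0, 6.0]
```

Applied over the concatenated sequence $[\boldsymbol{s}^{h};\tilde{\boldsymbol{s}}^{f}]$, causality means each future position can draw on the historical and earlier future states before it, while no position sees later steps.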

### 3.5 State-aware Power Corrector

To convert ideal available power into realistic forecasts under hidden operating conditions, we propose the State-aware Power Corrector, which learns state-dependent corrections to the physical prior using the inferred state embeddings.

Site-context conditioning. The same latent operating pattern may require different corrections across wind farms. To incorporate such persistent site information, UniWind encodes site context and injects it into the final state embeddings:

$$\boldsymbol{c}^{\mathrm{ctx}}=E_{\mathrm{ctx}}(\boldsymbol{c},\boldsymbol{x}),\qquad\tilde{\boldsymbol{z}}=\operatorname{LN}(\boldsymbol{z}+\boldsymbol{c}^{\mathrm{ctx}}).\tag{9}$$

Here, $E_{\mathrm{ctx}}(\cdot)$ is an MLP-based site-context encoder that fuses site features and historical power statistics, $\boldsymbol{c}^{\mathrm{ctx}}\in\mathbb{R}^{D}$ is the correction-side site embedding, $\tilde{\boldsymbol{z}}\in\mathbb{R}^{(T_1+T_2)\times D}$ is the site-aware latent state sequence, and $\operatorname{LN}(\cdot)$ denotes layer normalization.

Supervised state routing. Different operational states require distinct correction strategies. Accordingly, UniWind routes each state embedding to state-specific experts and supervises the routing process with state-prior labels:

$$\hat{\boldsymbol{l}}=\operatorname{softmax}\left(R(\tilde{\boldsymbol{z}})\right),\qquad\mathcal{L}_{\mathrm{router}}=\operatorname{CE}(\hat{\boldsymbol{l}},\boldsymbol{l}).\tag{10}$$

Here, $R(\cdot)$ is an MLP-based state router, and $\hat{\boldsymbol{l}}\in\mathbb{R}^{(T_1+T_2)\times 3}$ contains the probabilities of the three operating-state experts: regular, curtailment, and shutdown. $\operatorname{CE}(\cdot,\cdot)$ denotes the cross-entropy loss. The state-prior labels for the historical and future windows, denoted by $\boldsymbol{l}$, are derived from wind-power behavior rather than ground-truth operational annotations. Their construction is detailed in Appendix [C.3](https://arxiv.org/html/2607.01670#A3.SS3).
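Eq. (10) is a per-step softmax followed by cross-entropy against the state-prior labels. The sketch below computes the loss through a log-sum-exp for numerical stability and averages over time steps, which is an assumption; the expert order follows the text (regular, curtailment, shutdown), and all function names are hypothetical.

```python
import math

EXPERTS = ("regular", "curtailment", "shutdown")  # order as listed in the text


def log_softmax(logits):
    """log softmax via log-sum-exp, stable for large logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def route(logits_seq):
    """l_hat: per-step probabilities over the three experts."""
    return [[math.exp(lp) for lp in log_softmax(logits)] for logits in logits_seq]


def router_loss(logits_seq, labels):
    """Mean cross-entropy against integer state-prior labels (indices into EXPERTS)."""
    nll = [-log_softmax(logits)[y] for logits, y in zip(logits_seq, labels)]
    return sum(nll) / len(nll)


logits = [[2.0, 0.0, -1.0], [0.0, 0.0, 0.0]]
print(route(logits)[0])             # mostly "regular"
print(router_loss(logits, [0, 2]))
```

A uniform router pays $\ln 3$ per step, which is a useful baseline when monitoring $\mathcal{L}_{\mathrm{router}}$ during training.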

State\-specific correction\.To make the correction features both state\-specific and temporally contextualized, UniWind first transforms the site\-aware latent sequence𝒛~\\tilde\{\\boldsymbol\{z\}\}with expert\-specific MLPs\. The resulting expert\-conditioned features are weighted by the routing probabilities and then refined by a shared self\-attention layer, producing an interaction\-aware correction embedding𝒛′∈ℝ\(T1\+T2\)×D\\boldsymbol\{z\}^\{\\prime\}\\in\\mathbb\{R\}^\{\(T\_\{1\}\+T\_\{2\}\)\\times D\}\. Based on this embedding, UniWind assigns expert\-specific scaling and upper\-bound parameters:

\(𝜶sd,𝜷sd\)=\(𝟎,𝟎\),\(𝜶cur,𝜷cur\)=\(𝟏,σ​\(Hβ​\(𝒛′\)\)\),\(𝜶reg,𝜷reg\)=\(σ​\(Hα​\(𝒛′\)\),𝟏\),\\displaystyle\(\\boldsymbol\{\\alpha\}^\{\\mathrm\{sd\}\},\\boldsymbol\{\\beta\}^\{\\mathrm\{sd\}\}\)=\(\\boldsymbol\{0\},\\boldsymbol\{0\}\),\\qquad\(\\boldsymbol\{\\alpha\}^\{\\mathrm\{cur\}\},\\boldsymbol\{\\beta\}^\{\\mathrm\{cur\}\}\)=\\left\(\\boldsymbol\{1\},\\sigma\(H\_\{\\beta\}\(\\boldsymbol\{z\}^\{\\prime\}\)\)\\right\),\\qquad\(\\boldsymbol\{\\alpha\}^\{\\mathrm\{reg\}\},\\boldsymbol\{\\beta\}^\{\\mathrm\{reg\}\}\)=\\left\(\\sigma\(H\_\{\\alpha\}\(\\boldsymbol\{z\}^\{\\prime\}\)\),\\boldsymbol\{1\}\\right\),\(11\)whereHα​\(⋅\)H\_\{\\alpha\}\(\\cdot\)andHβ​\(⋅\)H\_\{\\beta\}\(\\cdot\)denote MLP\-based parameter heads\. The shutdown expertsd\\mathrm\{sd\}sets both the scaling factor and upper bound to zero, the curtailment expertcur\\mathrm\{cur\}learns an upper bound ratio, and the regular expertreg\\mathrm\{reg\}learns a multiplicative correction under the rated\-capacity bound\.

For each expert $e\in\{\mathrm{sd},\mathrm{reg},\mathrm{cur}\}$, the bounded expert output is computed as follows:

$$
\hat{\boldsymbol{p}}^{e}=\operatorname{clip}\left(\boldsymbol{\alpha}^{e}\odot\boldsymbol{p}^{\mathrm{prior}},\,0,\,\boldsymbol{\beta}^{e}P_{\mathrm{cap}}\right).\tag{12}
$$

The final forecast is the routing-weighted aggregation of the expert outputs:

$$
\hat{\boldsymbol{p}}=\sum_{e\in\{\mathrm{sd},\mathrm{reg},\mathrm{cur}\}}\hat{\boldsymbol{l}}_{e}\odot\hat{\boldsymbol{p}}^{e},\tag{13}
$$

where $\hat{\boldsymbol{p}}\in\mathbb{R}^{T_1+T_2}$ is the predicted power sequence and $\hat{\boldsymbol{l}}_{e}$ denotes the routing probability for expert $e$. For clarity, we denote the historical segment $\hat{\boldsymbol{p}}_{1:T_1}$ as $\hat{\boldsymbol{x}}$ and the future segment $\hat{\boldsymbol{p}}_{T_1+1:T_1+T_2}$ as $\hat{\boldsymbol{y}}$.
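Eqs. (11) to (13) can be traced step by step with the plain-Python sketch below. It is an illustration under stated assumptions, not the authors' implementation. The parameter heads $\sigma(H_\alpha(\boldsymbol{z}'))$ and $\sigma(H_\beta(\boldsymbol{z}'))$ and the routing probabilities are passed in as given per-step numbers. The expert-specific MLPs and the self-attention layer that produce them are omitted. The function and argument names are hypothetical.

```python
def clip(value, low, high):
    """Elementwise clip used in Eq. (12)."""
    return min(max(value, low), high)


def state_routed_forecast(p_prior, route_probs, alpha_reg, beta_cur, p_cap):
    """Sketch of Eqs. (11)-(13) over a sequence of time steps.

    p_prior:     physical prior p^prior_t for each step.
    route_probs: per-step dict of routing probabilities keyed "sd", "reg", "cur".
    alpha_reg:   per-step sigma(H_alpha(z'))_t in (0, 1), the regular-state scaling.
    beta_cur:    per-step sigma(H_beta(z'))_t in (0, 1), the curtailment bound ratio.
    p_cap:       rated capacity P_cap.
    """
    forecast = []
    for t, prior in enumerate(p_prior):
        # Eq. (11): (alpha, beta) per expert; shutdown is fixed at (0, 0).
        params = {
            "sd": (0.0, 0.0),
            "cur": (1.0, beta_cur[t]),
            "reg": (alpha_reg[t], 1.0),
        }
        p_hat = 0.0
        for expert, (alpha, beta) in params.items():
            # Eq. (12): scale the prior, then bound it to [0, beta * P_cap].
            p_expert = clip(alpha * prior, 0.0, beta * p_cap)
            # Eq. (13): accumulate the routing-weighted expert outputs.
            p_hat += route_probs[t][expert] * p_expert
        forecast.append(p_hat)
    return forecast


y = state_routed_forecast(
    p_prior=[40.0, 90.0],
    route_probs=[{"sd": 0.0, "reg": 1.0, "cur": 0.0},
                 {"sd": 0.2, "reg": 0.3, "cur": 0.5}],
    alpha_reg=[0.9, 0.8],
    beta_cur=[0.5, 0.6],
    p_cap=100.0,
)
```

In the second step, the curtailment expert caps the prior of 90 at $0.6 \times 100 = 60$. The regular expert scales it down to 72, and the shutdown expert contributes zero. The routing weights then blend these three outputs into a single forecast of about 51.6.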

### 3.6 Training Objectives

Because reconstructing the historical window stabilizes the learned operational states and predicting the future window directly optimizes the forecasting target, we define the mean squared error loss over both segments:

$$
\mathcal{L}_{\mathrm{mse}}=\left\lVert\hat{\boldsymbol{x}}-\boldsymbol{x}\right\rVert_{F}^{2}+\left\lVert\hat{\boldsymbol{y}}-\boldsymbol{y}\right\rVert_{F}^{2},\tag{14}
$$

where $\boldsymbol{x}$ and $\boldsymbol{y}$ denote the ground-truth historical and future power sequences, respectively.

The final training objective combines forecasting supervision, physical\-prior regularization, and supervised state routing:

$$
\mathcal{L}=\mathcal{L}_{\mathrm{mse}}+\mathcal{L}_{\mathrm{prior}}+\lambda_{\mathrm{router}}\mathcal{L}_{\mathrm{router}},\tag{15}
$$

where $\lambda_{\mathrm{router}}$ is the weight of the router-supervision loss.
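The sketch below shows how Eqs. (14) and (15) combine for one-dimensional sequences, where the Frobenius norm reduces to the ordinary squared L2 norm. Two assumptions apply. $\mathcal{L}_{\mathrm{prior}}$, which belongs to the Physical Prior Estimator, and $\mathcal{L}_{\mathrm{router}}$ from Eq. (10) are treated as precomputed scalars. The default `lambda_router=0.4` is taken from Appendix B.1. The function names are hypothetical.

```python
def squared_error(pred, target):
    """Squared L2 (Frobenius) norm of pred - target for 1-D sequences."""
    return sum((p - t) ** 2 for p, t in zip(pred, target))


def total_loss(x_hat, x, y_hat, y, loss_prior, loss_router, lambda_router=0.4):
    """Sketch of Eqs. (14)-(15).

    x_hat, x: reconstructed and observed historical power (length T1).
    y_hat, y: predicted and observed future power (length T2).
    loss_prior, loss_router: scalars assumed to be computed elsewhere.
    """
    loss_mse = squared_error(x_hat, x) + squared_error(y_hat, y)  # Eq. (14)
    return loss_mse + loss_prior + lambda_router * loss_router    # Eq. (15)


loss = total_loss([1.0, 2.0], [1.0, 1.0], [3.0], [1.0], loss_prior=0.5, loss_router=1.0)
```

Here the reconstruction term contributes 1, the forecast term contributes 4, and the regularizers add $0.5 + 0.4 \times 1.0$, for a total of 5.9.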

## 4 Experiments

### 4.1 Experimental Setup

Datasets. We evaluate UniWind on 24 real-world datasets collected from distinct wind farms. These include the Penmanshiel (Plumley, [2022b](https://arxiv.org/html/2607.01670#bib.bib35)) and Kelmarsh (Plumley, [2022a](https://arxiv.org/html/2607.01670#bib.bib36)) wind farms in the UK, while the remaining datasets span three provinces in China: Shandong, Shanxi, and Anhui. The corresponding NWP data are obtained from the European Centre for Medium-Range Weather Forecasts (ECMWF; https://www.ecmwf.int/en/forecasts/datasets) and the Global Forecast System (GFS; https://www.ncep.noaa.gov/products/global-forecast-system). All datasets are resampled to a uniform 15-minute interval. Detailed summaries of the datasets are provided in Appendix Tables [3](https://arxiv.org/html/2607.01670#A1.T3) and [4](https://arxiv.org/html/2607.01670#A1.T4). For each dataset, we use the last two months for testing, the preceding two months for validation, and all earlier data for training, removing windows that overlap the split boundaries.
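The chronological split with boundary removal can be sketched as follows. The sketch assumes that each sample is a window of $T_1+T_2=288$ consecutive 15-minute steps (see Setup). It also assumes that "removing overlapping windows" means keeping only windows that lie entirely inside one split. The paper does not spell out the exact rule. "Two months" is approximated by a hypothetical 60 days, and the function name is illustrative.

```python
STEPS_PER_DAY = 96   # 15-minute resolution
WINDOW = 96 + 192    # T1 + T2 steps per sample


def split_windows(n_steps, test_steps, val_steps, window=WINDOW, stride=1):
    """Return (train, val, test) lists of window start indices.

    The last `test_steps` steps are test, the preceding `val_steps` are
    validation, and everything earlier is training. A window is kept only if
    it lies entirely inside one split, so no window straddles a boundary.
    """
    val_start = n_steps - test_steps - val_steps
    test_start = n_steps - test_steps

    def starts(lo, hi):
        # A window starting at s covers [s, s + window); keep it only if it
        # ends at or before the split boundary hi.
        return list(range(lo, hi - window + 1, stride))

    return (starts(0, val_start),
            starts(val_start, test_start),
            starts(test_start, n_steps))


n_steps = 365 * STEPS_PER_DAY       # one hypothetical year of data
two_months = 60 * STEPS_PER_DAY     # assumed length of "two months"
train, val, test = split_windows(n_steps, two_months, two_months)
```

Dropping boundary-crossing windows costs $T_1+T_2-1$ candidate windows per boundary. In exchange, no target step in one split can appear as an input in another.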

Baselines. We compare UniWind with representative and state-of-the-art models for day-ahead wind power forecasting, grouped into five categories:

- Physical models. We include PowerCurve (Haas et al., [2024](https://arxiv.org/html/2607.01670#bib.bib34)), a physics-based wind-turbine power-curve model that converts meteorological conditions into power estimates using predefined power-curve assumptions.
- Statistical models. We consider the widely used tree-based models LightGBM (Ju et al., [2019](https://arxiv.org/html/2607.01670#bib.bib18)) and XGBoost (Phan et al., [2021](https://arxiv.org/html/2607.01670#bib.bib17)), which learn nonlinear mappings from meteorological covariates to wind power.
- Time-series forecasting models. We include competitive long-horizon time-series forecasting models: PatchTST (Nie et al., [2023](https://arxiv.org/html/2607.01670#bib.bib28)), iTransformer (Liu et al., [2024](https://arxiv.org/html/2607.01670#bib.bib26)), TimeMixer (Wang et al., [2024](https://arxiv.org/html/2607.01670#bib.bib25)), and xPatch (Stitsyuk and Choi, [2025](https://arxiv.org/html/2607.01670#bib.bib27)).
- Renewable-energy forecasting models. We include the solar power forecasting models CrossViViT (Boussif et al., [2023](https://arxiv.org/html/2607.01670#bib.bib19)) and FusionSF (Ma et al., [2024](https://arxiv.org/html/2607.01670#bib.bib20)), as well as the wind power forecasting model 2DXformer (Zhang et al., [2025](https://arxiv.org/html/2607.01670#bib.bib2)).
- Foundation models. To evaluate zero-shot generalization, we compare UniWind with the wind-specific foundation model WindFM (Fan et al., [2025](https://arxiv.org/html/2607.01670#bib.bib5)) and with the general time-series foundation models Chronos-2 (Ansari et al., [2025](https://arxiv.org/html/2607.01670#bib.bib29)) and Moirai (Liu et al., [2025a](https://arxiv.org/html/2607.01670#bib.bib30)).

The input data used by each baseline are listed in Appendix Table [5](https://arxiv.org/html/2607.01670#A2.T5).

Setup. We evaluate all models using Mean Absolute Error (MAE) and Root Mean Squared Error (RMSE). The historical and future window lengths, $T_1$ and $T_2$, are set to 96 and 192 time steps (one and two days at 15-minute resolution), respectively, and evaluation is conducted on the last 96 steps of the future window. The first 96 future steps therefore form a one-day gap between the end of the historical window and the evaluation window. This gap reflects the operational requirement that generation be forecast before the target delivery day. In the full-shot setting, each dataset is trained and evaluated independently on its own training and test sets. Due to space limitations, we report full-shot results only for SD_A, SX_A, AH_A, and UK_Penmanshiel, with regional average performance provided in Appendix Table [6](https://arxiv.org/html/2607.01670#A4.T6). In the zero-shot setting, WindFM, Chronos-2, and Moirai are evaluated directly using their officially released checkpoints. Moirai-PT and UniWind are instead trained on all datasets except SD_A, SX_A, AH_A, and UK_Penmanshiel, and are evaluated separately on these four held-out datasets.
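The window layout and metrics above can be illustrated with the minimal sketch below. It keeps only the last 96 of the $T_2=192$ future steps, which skips the one-day gap, and then computes MAE and RMSE on that slice. The sketch scores a single sample. It assumes that errors are computed on denormalized power, as stated in Appendix A, and it does not specify how the paper aggregates over test samples. The function names are hypothetical.

```python
import math

T1, T2, EVAL = 96, 192, 96   # history length, future length, evaluated steps


def eval_slice(y_future):
    """Keep the last EVAL steps of a length-T2 future window.

    The first T2 - EVAL = 96 steps (one day at 15-minute resolution) are the
    gap between the end of the history and the target delivery day.
    """
    assert len(y_future) == T2
    return y_future[T2 - EVAL:]


def mae_rmse(pred, target):
    """MAE and RMSE over paired sequences of equal length."""
    errors = [p - t for p, t in zip(pred, target)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


y_true = [10.0] * T2
# Errors inside the gap are ignored; the evaluated day is off by +2 then -2.
y_pred = [0.0] * (T2 - EVAL) + [12.0] * (EVAL // 2) + [8.0] * (EVAL // 2)
mae, rmse = mae_rmse(eval_slice(y_pred), eval_slice(y_true))
```

The large errors placed in the first (gap) day do not affect the scores. Only the delivery-day errors of $\pm 2$ count, so MAE and RMSE both equal 2.0.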

### 4.2 Experimental Results

#### 4.2.1 Overall Performance

Table [1](https://arxiv.org/html/2607.01670#S4.T1) presents the overall performance. We compare the full-shot results of UniWind with those of end-to-end models, and its zero-shot results with those of foundation models. The best and second-best results are highlighted in bold and underlined, respectively.

Table 1: Prediction performance comparison on four datasets in terms of MAE and RMSE.

Full-shot prediction. As shown in Table [1](https://arxiv.org/html/2607.01670#S4.T1), UniWind achieves the best full-shot performance on all four representative datasets and under both evaluation metrics. Similar trends are observed in the regional average results reported in Appendix Table [6](https://arxiv.org/html/2607.01670#A4.T6). Among the baselines, the physics-only PowerCurve performs poorly. This suggests that fixed power-curve assumptions cannot adequately capture site-specific environmental effects or latent operational states in real-world wind farms. The statistical models LightGBM and XGBoost remain strong practical baselines on several datasets, but their performance is less stable across regions because they primarily learn empirical weather-power correlations. Time-series forecasting models use historical power and historical NWP sequences without future NWP inputs, and they generally underperform. This suggests that historical information alone is insufficient for day-ahead wind power forecasting. Renewable-energy forecasting models, including CrossViViT, FusionSF, and 2DXformer, additionally incorporate future NWP information and therefore have richer meteorological context; FusionSF is particularly competitive on AH_A. However, these models still lag behind UniWind on all reported metrics, as they do not fully disentangle physically available power from latent operational effects. By combining a site-calibrated physical prior with state-aware correction, UniWind better preserves transferable wind-power structure while adapting to sample-specific operational states.

Zero-shot prediction. Table [1](https://arxiv.org/html/2607.01670#S4.T1) reports zero-shot results against wind-specific and general time-series foundation models. UniWind substantially outperforms WindFM, Chronos-2, Moirai, and Moirai-PT on all datasets and metrics. Among the baselines, WindFM performs best on SD_A, whereas Moirai-PT performs best on SX_A and UK_Penmanshiel. However, all foundation-model baselines lag behind UniWind. This suggests that neither wind-domain pretraining nor generic time-series pretraining is sufficient for zero-shot wind power forecasting when future NWP and physical priors are absent. These results suggest that UniWind transfers more effectively across wind farms because it combines future NWP covariates with transferable physical-prior modeling and state-aware correction.

#### 4.2.2 Performance under Varied Conditions

Figure [4](https://arxiv.org/html/2607.01670#S4.F4) evaluates UniWind under different meteorological and operational conditions. We compare UniWind with three representative baselines: XGBoost, PatchTST, and FusionSF.

![Figure 4(a)](https://arxiv.org/html/2607.01670v1/x3.png) (a) Low-wind, regular state
![Figure 4(b)](https://arxiv.org/html/2607.01670v1/x4.png) (b) Low-wind, abnormal state
![Figure 4(c)](https://arxiv.org/html/2607.01670v1/x5.png) (c) High-wind, regular state
![Figure 4(d)](https://arxiv.org/html/2607.01670v1/x6.png) (d) High-wind, abnormal state

Figure 4: Performance comparison under varied conditions on the SD_A dataset.

We evaluate robustness under joint meteorological and operational conditions by dividing the SD_A test samples into four regimes. Wind speeds below $9\,\mathrm{m/s}$ (the 90th percentile on SD_A) are treated as low-wind conditions, and the remaining samples as high-wind conditions. Regular and abnormal states are determined using the state-prior labels defined in Appendix [C.3](https://arxiv.org/html/2607.01670#A3.SS3), with shutdown and curtailment grouped as abnormal states. As shown in Figure [4](https://arxiv.org/html/2607.01670#S4.F4), UniWind achieves the lowest MAE in all four regimes. XGBoost is competitive in regular states, but its advantage diminishes once abnormal operation is introduced. PatchTST performs poorly in most regimes and suffers particularly large errors at high wind speeds. FusionSF degrades at low wind speeds. In contrast, UniWind remains strong across all regimes, with the largest gains in the high-wind abnormal regime. This supports the core motivation of UniWind. It separates wind-speed-driven variations in physical power from deviations caused by operational states. As a result, it can follow the physical wind-to-power relationship in regular regimes while correcting deviations in abnormal states.
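The four-regime breakdown can be reproduced with a short grouping routine. The sketch below makes several assumptions that the text does not specify. Wind speed, state label, and absolute error are available per time step. The low/high threshold is the 90th percentile computed with linear interpolation, the same convention as NumPy's default `percentile`. State labels are the strings `"regular"`, `"curtailment"`, and `"shutdown"`. All names are hypothetical.

```python
def percentile(values, q):
    """Linear-interpolation percentile (NumPy's default convention)."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


ABNORMAL = {"shutdown", "curtailment"}


def regime_mae(wind_speed, state_label, abs_error, q=90):
    """Split steps into wind/state regimes and return (threshold, MAE per regime)."""
    threshold = percentile(wind_speed, q)
    groups = {}
    for ws, state, err in zip(wind_speed, state_label, abs_error):
        wind = "low" if ws < threshold else "high"
        op = "abnormal" if state in ABNORMAL else "regular"
        groups.setdefault((wind, op), []).append(err)
    return threshold, {key: sum(v) / len(v) for key, v in groups.items()}


wind = [float(i) for i in range(1, 11)]
states = ["regular"] * 5 + ["curtailment"] * 4 + ["shutdown"]
errors = [1.0] * 5 + [3.0] * 4 + [5.0]
threshold, mae_by_regime = regime_mae(wind, states, errors)
```

The paper fixes the SD_A threshold at 9 m/s. Computing the threshold from the data, as done here, only reproduces that value when applied to the same wind-speed distribution.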

#### 4.2.3 Ablation Studies

![Figure 5](https://arxiv.org/html/2607.01670v1/x7.png)

Figure 5: Ablation study on the SD_A dataset.

To evaluate the contribution of each component, we construct four variants of UniWind:

- w/o phy. removes the physical prior and replaces it with a learnable parameter.
- w/o cor. directly uses the physical prior as the final prediction.
- w/o upper removes the physical upper-bound constraint from the Physical Prior Estimator.
- w/o state removes the supervised state-routing loss $\mathcal{L}_{\mathrm{router}}$.

As shown in Figure [5](https://arxiv.org/html/2607.01670#S4.F5), all variants perform worse than the full UniWind, indicating that each component contributes to forecasting accuracy. The largest degradation appears in w/o cor. This indicates that directly using the physical prior is insufficient, because realized wind power is strongly affected by latent operational states. w/o phy. also shows a clear performance drop, confirming that the physical prior provides an important available-power reference for subsequent correction. Removing the upper-bound constraint or the state-routing loss leads to smaller but still noticeable degradation. This suggests that physical regularization and knowledge-guided state supervision further stabilize the learned prior and improve state-aware correction.

#### 4.2.4 Efficiency Comparison

Table 2: Efficiency comparison on the SD_A dataset.

Table [2](https://arxiv.org/html/2607.01670#S4.T2) compares the computational efficiency of UniWind with representative foundation-model baselines on the SD_A dataset. UniWind has only 4.0M parameters, far fewer than Chronos-2 and Moirai, although it is moderately larger than WindFM. In terms of inference efficiency, UniWind requires 0.72 ms per sample, comparable to WindFM and substantially faster than Chronos-2 and Moirai. These results show that UniWind improves forecasting accuracy without relying on foundation-model scale, making it practical for repeated day-ahead forecasting across wind farms.

## 5 Conclusion, Limitations, and Future Work

In this paper, we propose UniWind, a day\-ahead wind power forecasting model that captures the structured dependency from meteorological conditions to physically available power and realized generation under latent operational states\. UniWind first constructs a site\-calibrated physical prior through site\-conditioned monotone warping, a shared physical power curve, and a physical upper\-bound constraint\. It then uses a State\-aware Power Corrector to transform this available\-power prior into final forecasts through knowledge\-guided state routing and bounded state\-specific correction\. Extensive experiments on more than 20 real\-world wind\-farm datasets demonstrate that UniWind achieves strong full\-shot and zero\-shot performance and remains effective across diverse operating conditions\. We believe that UniWind can promote the safe, economical, and efficient integration of wind power into future low\-carbon energy systems\.

Like other NWP-driven day-ahead forecasting methods, UniWind relies on the quality of its NWP inputs. The site-calibrated physical prior and state-aware correction may help mitigate this dependence. Future work will explore integrating additional meteorological observations to better capture local weather dynamics and further improve forecasting reliability.

## References

- [1] A. F. Ansari, O. Shchur, J. Küken, A. Auer, B. Han, P. Mercado, S. S. Rangapuram, H. Shen, L. Stella, X. Zhang, et al. (2025). Chronos-2: from univariate to universal forecasting. arXiv preprint arXiv:2510.15821.
- [2] Boussif et al. (2023). Improving day-ahead solar irradiance time series forecasting by leveraging spatio-temporal context. NeurIPS 36, pp. 2342–2367.
- [3] S. Buhan and I. Çadırcı (2015). Multistage wind-electric power forecast by using a combination of advanced statistical methods. IEEE Trans. Ind. Inform. 11(5), pp. 1231–1242.
- [4] T. Chang, D. Chen, and D. Wang (2025). Multivariate wind power time series forecasting with noise-filtering neural ODEs. In CIKM, pp. 221–230.
- [5] H. Fan, Y. Shi, Z. Fu, S. Chen, W. Wei, W. Xu, and J. Li (2025). WindFM: an open-source foundation model for zero-shot wind power forecasting. arXiv preprint arXiv:2509.06311.
- [6] H. Fan, X. Zhang, S. Mei, K. Chen, and X. Chen (2020). M2GSNet: multi-modal multi-task graph spatiotemporal network for ultra-short-term wind farm cluster power prediction. Applied Sciences 10(21), pp. 7915.
- [7] Y. Fang, H. Miao, Y. Liang, L. Deng, Y. Cui, X. Zeng, Y. Xia, Y. Zhao, T. B. Pedersen, C. S. Jensen, et al. (2026). Unraveling spatio-temporal foundation models via the pipeline lens: a comprehensive review. IEEE Trans. Knowl. Data Eng.
- [8] Y. Fang, S. Wang, Y. Liang, Z. Ye, Y. Xiang, Y. Zhao, and K. Zheng (2026). Efficient high-dimensional time series forecasting with transformers: a channel reordering perspective. In WWW, pp. 7223–7234.
- [9] U. Focken, M. Lange, and H. Waldl (2001). Previento: a wind power prediction system with an innovative upscaling algorithm. In EWEC, Vol. 276.
- [10] Haas et al. (2024). Wind-python/windpowerlib: update release. DOI: [10.5281/zenodo.10685057](https://dx.doi.org/10.5281/zenodo.10685057).
- [11] S. Hanifi, X. Liu, Z. Lin, and S. Lotfian (2020). A critical review of wind power forecasting methods—past, present and future. Energies 13(15), pp. 3764.
- [12] Y. Hong and C. L. P. P. Rioflorido (2019). A hybrid deep learning-based neural network for 24-h ahead wind power forecasting. Applied Energy 250, pp. 530–539.
- [13] Y. Ju, G. Sun, Q. Chen, M. Zhang, H. Zhu, and M. U. Rehman (2019). A model combining convolutional neural network and LightGBM algorithm for ultra-short-term wind power forecasting. IEEE Access 7, pp. 28309–28318.
- [14] I. Karijadi, S. Chou, and A. Dewabharata (2023). Wind power forecasting based on hybrid CEEMDAN-EWT deep learning method. Renewable Energy 218, pp. 119357.
- [15] J. Keisler and E. Le Naour (2024). WindDragon: enhancing wind power forecasting with automated deep learning. In Workshop "Tackling Climate Change with Machine Learning", ICLR.
- [16] L. Li, H. Shuang, W. Yi-mei, et al. (2013). A physical approach of the short-term wind power prediction based on CFD pre-calculated flow fields. Journal of Hydrodynamics, Ser. B 25(1), pp. 56–61.
- [17] R. Li, Y. Xie, X. Jia, D. Wang, Y. Li, Y. Zhang, Z. Wang, and Z. Li (2024). SolarCube: an integrative benchmark dataset harnessing satellite and in-situ observations for large-scale solar energy forecasting. NeurIPS 37, pp. 3499–3513.
- [18] X. Liu, J. Liu, G. Woo, T. Aksu, Y. Liang, R. Zimmermann, C. Liu, J. Li, S. Savarese, C. Xiong, et al. (2025). Moirai-MoE: empowering time series foundation models with sparse mixture of experts. In ICML.
- [19] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long (2024). iTransformer: inverted transformers are effective for time series forecasting. In ICLR.
- [20] Z. Liu, M. Cheng, G. Zhao, J. Yang, Q. Liu, and E. Chen (2025). Improving time series forecasting via instance-aware post-hoc revision. In NeurIPS.
- [21] Z. Ma, W. Wang, T. Zhou, C. Chen, B. Peng, L. Sun, and R. Jin (2024). FusionSF: fuse heterogeneous modalities in a vector quantized framework for robust solar power forecasting. In SIGKDD, pp. 5532–5543.
- [22] Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam (2023). A time series is worth 64 words: long-term forecasting with transformers. In ICLR.
- [23] M. B. Ozkan and P. Karagoz (2015). A novel wind power forecast model: statistical hybrid wind power forecast technique (SHWIP). IEEE Trans. Ind. Inform. 11(2), pp. 375–387.
- [24] R. K. Pandit, D. Infield, and A. Kolios (2020). Gaussian process power curve models incorporating wind turbine operational variables. Energy Reports 6, pp. 1658–1669.
- [25] Q. T. Phan, Y. K. Wu, and Q. D. Phan (2021). A hybrid wind power forecasting model with XGBoost, data preprocessing considering different NWPs. Applied Sciences 11(3), pp. 1100.
- [26] Plumley (2022a). Kelmarsh wind farm dataset.
- [27] Plumley (2022b). Penmanshiel wind farm dataset.
- [28] M. A. Prósper, C. Otero-Casal, F. C. Fernández, and G. Miguez-Macho (2019). Wind power forecasting for a real onshore wind farm on complex terrain using WRF high resolution simulations. Renewable Energy 135, pp. 674–686.
- [29] G. Sideratos and N. D. Hatziargyriou (2007). An advanced statistical method for wind power forecasting. IEEE Trans. Power Syst. 22(1), pp. 258–265.
- [30] A. Stitsyuk and J. Choi (2025). xPatch: dual-stream time series forecasting with exponential seasonal-trend decomposition. In AAAI, Vol. 39, pp. 20601–20609.
- [31] J. Wang, B. Peng, W. Wang, Y. Hu, Y. Chen, P. Niu, and L. Sun (2025). SolarMAE: a unified framework for regional centralized and distributed solar power forecasting with weather pre-training. In CIKM, pp. 6093–6101.
- [32] S. Wang, H. Wu, X. Shi, T. Hu, H. Luo, L. Ma, J. Y. Zhang, and J. Zhou (2024). TimeMixer: decomposable multiscale mixing for time series forecasting. In ICLR.
- [33] Y. Wang, Y. Qiu, P. Chen, Y. Shu, Z. Rao, L. Pan, B. Yang, and C. Guo (2025). LightGTS: a lightweight general time series forecasting model. In ICML, pp. 64109–64126.
- [34] Y. Wang, Y. Liu, L. Li, D. Infield, and S. Han (2018). Short-term wind power forecasting based on clustering pre-calculated CFD method. Energies 11(4), pp. 854.
- [35] Y. Wang, Q. Hu, L. Li, A. M. Foley, and D. Srinivasan (2019). Approaches to wind power curve modeling: a review and discussion. Renew. Sustain. Energy Rev. 116, pp. 109422.
- [36] J. Wu, Y. Shen, R. Yang, H. Fan, and Y. Duan (2026). Electric-carbon market coupling and price transmission mechanism in China: an empirical analysis and development barriers study. J. Clean. Prod. 544, pp. 147692.
- [37] X. Xu, Q. Cao, R. Deng, Z. Guo, Y. Chen, and J. Yan (2025). A cross-dataset benchmark for neural network-based wind power forecasting. Renewable Energy 254, pp. 123463.
- [38] Y. Zhang, J. Jiang, Y. Yan, L. Yang, and P. Zhang (2025). 2DXformer: dual transformers for wind power forecasting with dual exogenous variables. arXiv preprint arXiv:2505.01286.
- [39] Y. Zhao, L. Ye, Z. Li, X. Song, Y. Lang, and J. Su (2016). A novel bidirectional mechanism based on time series model for wind power forecasting. Applied Energy 177, pp. 793–803.
- [40] L. Zhou, M. Poli, W. Xu, S. Massaroli, and S. Ermon (2023). Deep latent state space models for time-series generation. In ICML, pp. 42625–42643.

## Appendix A Datasets

We provide detailed information on the datasets used in this study. The wind power data were collected from 24 wind farms across four geographically distinct regions: Anhui, Shandong, and Shanxi in China, and the United Kingdom. Table [3](https://arxiv.org/html/2607.01670#A1.T3) summarizes the basic information of these wind power datasets, including their temporal coverage, temporal interval, and installed capacity. Table [4](https://arxiv.org/html/2607.01670#A1.T4) reports the numerical weather prediction (NWP) datasets used as meteorological covariates. Since GFS data are unavailable for the UK datasets, only ECMWF data are used for this region. In addition, only five ECMWF features are available for the UK datasets: `temperature2m`, `surfacePressure`, `windSpeed100m`, `windDirection100m`, and `airDensity`. Accordingly, we use only these five features for training and evaluation on the UK datasets and adopt the same feature set in the zero-shot experiments to ensure input consistency.

The raw data sources have different temporal resolutions\. During preprocessing, we resample the wind power and NWP sequences by upsampling or downsampling as needed and align all variables to a unified 15\-minute interval\. We match each wind farm with the corresponding NWP grid point based on its latitude and longitude\. For wind\-farm datasets using two meteorological sources, we concatenate the feature dimensions from both sources as the weather sequence\. To avoid information leakage among the training, validation, and test sets, we remove overlapping sequences around the split boundaries\. All reported prediction results are computed after the model outputs are denormalized\.
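Matching a wind farm to an NWP grid point by latitude and longitude can be sketched as a nearest-neighbour search. The paper does not state the distance measure. This sketch assumes great-circle (haversine) distance on a spherical Earth with a mean radius of 6371 km. The grid coordinates and function names are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_grid_point(farm_lat, farm_lon, grid_points):
    """Return the (lat, lon) grid point closest to the wind farm."""
    return min(grid_points, key=lambda g: haversine_km(farm_lat, farm_lon, g[0], g[1]))


# Hypothetical 0.25-degree NWP grid cell corners around a hypothetical farm.
grid = [(36.0, 118.0), (36.25, 118.0), (36.0, 118.25), (36.25, 118.25)]
match = nearest_grid_point(36.2, 118.05, grid)
```

Haversine distance is used instead of raw degree differences because a degree of longitude shrinks with latitude. Near the grid spacing, a plain Euclidean distance in degrees could therefore pick the wrong neighbour.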

Table 3: The basic information of the wind power datasets.

Table 4: The basic information of the NWP datasets.
## Appendix B Experimental Details

### B.1 Implementation Details

UniWind is implemented in PyTorch, and all experiments are conducted on an NVIDIA GeForce RTX 2080 Ti GPU (11 GB). All baselines follow their original designs and use the parameter settings reported in their respective papers. For UniWind, we use Adam with an initial learning rate of $1\times 10^{-4}$ and a weight decay of $1\times 10^{-5}$, together with a ReduceLROnPlateau scheduler with a decay factor of 0.5 and a patience of 5. The hyperparameters $\epsilon_{\mathrm{slack}}$, $\lambda_{\mathrm{tight}}$, and $\lambda_{\mathrm{router}}$ are set to 1, 0.01, and 0.4, respectively. The hidden dimension $D$ is 256. Both the weather encoder and the state retriever use 4 attention heads, and the batch size is 32.
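The settings above can be collected into a configuration, and the plateau rule can be checked without installing PyTorch. In PyTorch itself, one would typically write `torch.optim.Adam(params, lr=1e-4, weight_decay=1e-5)` together with `torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.5, patience=5)`. The class below is a simplified, dependency-free re-implementation of only the core rule: mode `"min"`, without PyTorch's default relative improvement threshold or cooldown. It assumes the scheduler monitors the validation loss, which the paper does not state. The `CONFIG` key names and the class name are hypothetical.

```python
CONFIG = {
    "lr": 1e-4,
    "weight_decay": 1e-5,
    "plateau_factor": 0.5,
    "plateau_patience": 5,
    "eps_slack": 1.0,
    "lambda_tight": 0.01,
    "lambda_router": 0.4,
    "hidden_dim": 256,
    "attention_heads": 4,
    "batch_size": 32,
}


class PlateauHalving:
    """Simplified ReduceLROnPlateau logic (mode='min', no threshold or cooldown).

    Multiplies the learning rate by `factor` once the monitored loss has
    failed to improve for more than `patience` consecutive epochs.
    """

    def __init__(self, lr, factor, patience):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        if self.bad_epochs > self.patience:
            self.lr *= self.factor
            self.bad_epochs = 0   # counter restarts after each reduction
        return self.lr


sched = PlateauHalving(CONFIG["lr"], CONFIG["plateau_factor"], CONFIG["plateau_patience"])
# Two improving epochs, then six epochs without improvement.
lrs = [sched.step(loss) for loss in [1.0, 0.9] + [0.95] * 6]
```

With a patience of 5, the learning rate stays at $10^{-4}$ through five stagnant epochs and is halved on the sixth.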

### B.2 Baselines

We provide detailed descriptions of the baseline models used in our experiments\. These baselines cover physical modeling, statistical learning, time\-series forecasting, renewable\-energy forecasting, and foundation models\.

Table 5: Input information used by baselines.

| Model | Historical NWP | Future NWP | Historical Power |
|---|---|---|---|
| PowerCurve [[10](https://arxiv.org/html/2607.01670#bib.bib34)] | – | √ | – |
| LightGBM [[13](https://arxiv.org/html/2607.01670#bib.bib18)] | – | √ | √ |
| XGBoost [[25](https://arxiv.org/html/2607.01670#bib.bib17)] | – | √ | √ |
| PatchTST [[22](https://arxiv.org/html/2607.01670#bib.bib28)] | √ | – | √ |
| iTransformer [[19](https://arxiv.org/html/2607.01670#bib.bib26)] | √ | – | √ |
| TimeMixer [[32](https://arxiv.org/html/2607.01670#bib.bib25)] | √ | – | √ |
| xPatch [[30](https://arxiv.org/html/2607.01670#bib.bib27)] | √ | – | √ |
| Chronos-2 [[1](https://arxiv.org/html/2607.01670#bib.bib29)] | – | – | √ |
| Moirai [[18](https://arxiv.org/html/2607.01670#bib.bib30)] | – | – | √ |
| Moirai-PT [[18](https://arxiv.org/html/2607.01670#bib.bib30)] | – | – | √ |
| WindFM [[5](https://arxiv.org/html/2607.01670#bib.bib5)] | √ | – | √ |
| CrossViViT [[2](https://arxiv.org/html/2607.01670#bib.bib19)] | √ | √ | √ |
| 2DXformer [[38](https://arxiv.org/html/2607.01670#bib.bib2)] | √ | √ | √ |
| FusionSF [[21](https://arxiv.org/html/2607.01670#bib.bib20)] | √ | √ | √ |

- PowerCurve [[10](https://arxiv.org/html/2607.01670#bib.bib34)]: A physics-based wind power model that converts meteorological conditions into power estimates according to predefined wind-turbine power-curve assumptions.
- •LightGBM\[[13](https://arxiv.org/html/2607.01670#bib.bib18)\]: A gradient boosting decision tree model that learns nonlinear mappings from meteorological covariates to wind power generation\. It is widely used as a strong statistical baseline for tabular forecasting tasks\. We concatenate future NWP and historical power data as input to LightGBM\.
- •XGBoost\[[25](https://arxiv.org/html/2607.01670#bib.bib17)\]: An efficient tree\-based boosting model that captures nonlinear feature interactions from future NWP variables and provides a strong practical baseline for wind power forecasting\. We use future NWP data and historical power data as inputs to XGBoost\.
- •PatchTST\[[22](https://arxiv.org/html/2607.01670#bib.bib28)\]: A patch\-based Transformer model for long\-term time\-series forecasting\. It segments historical power sequences into patches and models temporal dependencies with a channel\-independent design\.
- •iTransformer\[[19](https://arxiv.org/html/2607.01670#bib.bib26)\]: A Transformer\-based time\-series forecasting model that treats variables as tokens and learns multivariate temporal dependencies through an inverted attention structure\.
- •TimeMixer\[[32](https://arxiv.org/html/2607.01670#bib.bib25)\]: A time\-series forecasting model that mixes temporal patterns across multiple scales, enabling it to capture both short\-term and long\-term dependencies from historical power observations\.
- •xPatch\[[30](https://arxiv.org/html/2607.01670#bib.bib27)\]: A long\-horizon forecasting model that enhances patch\-based time\-series representations for historical sequence modeling\.
- •CrossViViT\[[2](https://arxiv.org/html/2607.01670#bib.bib19)\]: A renewable\-energy forecasting model originally designed to fuse spatio\-temporal weather context with station\-level time series\. In our experiments, this model is adapted for wind power prediction by incorporating historical and future NWP data as well as historical generation data\.
- •FusionSF\[[21](https://arxiv.org/html/2607.01670#bib.bib20)\]: A solar power forecasting model that fuses heterogeneous temporal and weather information\. Since satellite cloud imagery is unavailable, we replace the satellite\-image branch with historical NWP data\.
- •2DXformer\[[38](https://arxiv.org/html/2607.01670#bib.bib2)\]: A wind power forecasting model that captures temporal dependencies from historical weather and power sequences through Transformer\-based representation learning\. In our experiments, future NWP data are additionally incorporated to ensure consistency across the experimental datasets\.
- •WindFM\[[5](https://arxiv.org/html/2607.01670#bib.bib5)\]: A wind power foundation model pretrained on large\-scale wind power data\. We evaluate it in the zero\-shot setting using the released checkpoint\.
- •Chronos\-2\[[1](https://arxiv.org/html/2607.01670#bib.bib29)\]: A general\-purpose time\-series foundation model pretrained on heterogeneous time\-series corpora\. We evaluate it as a zero\-shot forecasting baseline using the released checkpoint\.
- •Moirai\[[18](https://arxiv.org/html/2607.01670#bib.bib30)\]: A universal time\-series foundation model for zero\-shot forecasting across diverse domains\. We evaluate it in the zero\-shot setting using the released checkpoint\.
- •Moirai\-PT\[[18](https://arxiv.org/html/2607.01670#bib.bib30)\]: A Moirai variant further pretrained on wind power datasets to assess whether additional wind\-domain pretraining improves zero\-shot wind power forecasting\.

Table [5](https://arxiv.org/html/2607.01670#A2.T5) summarizes the input information used by each baseline. Here, √ indicates that the corresponding input type is used by the model, and “–” indicates that it is not used.
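For the tree-based baselines (LightGBM and XGBoost), future NWP and historical power are concatenated into one tabular input. The exact feature layout is not specified, so the sketch below shows one plausible arrangement: each sample's future NWP window of shape (H, F) is flattened and joined with its historical power window of length L. The function name `build_tabular_features` and the 96-step, 5-variable example shapes are hypothetical.

```python
import numpy as np

def build_tabular_features(future_nwp: np.ndarray, hist_power: np.ndarray) -> np.ndarray:
    """Flatten future NWP and append historical power (hypothetical layout).

    future_nwp: (N, H, F) NWP variables over the forecast horizon, per sample
    hist_power: (N, L)    past power observations, per sample
    returns:    (N, H*F + L) design matrix for a gradient-boosted tree model
    """
    if future_nwp.ndim != 3 or hist_power.ndim != 2:
        raise ValueError("expected future_nwp of shape (N, H, F) and hist_power of shape (N, L)")
    if future_nwp.shape[0] != hist_power.shape[0]:
        raise ValueError("future_nwp and hist_power must have the same number of samples")
    n = future_nwp.shape[0]
    return np.concatenate([future_nwp.reshape(n, -1), hist_power], axis=1)

rng = np.random.default_rng(0)
nwp = rng.normal(size=(8, 96, 5))   # 8 samples, 96 horizon steps, 5 NWP variables
power = rng.random(size=(8, 96))    # 96 past power values per sample
X = build_tabular_features(nwp, power)
print(X.shape)  # (8, 576)
```

Flattening keeps each (step, variable) pair as its own column, which tree models can split on directly; a per-step layout with one row per target timestamp would be an equally reasonable alternative.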

## Appendix C Supplementary Methods

### C.1 Supplementary Proof

###### Proposition 1 (Monotonicity of the site-conditioned warp).

For a given wind farm with site features $\boldsymbol{c}$, let $[a,b]^{\top}=h_{\phi}(\boldsymbol{c})\in\mathbb{R}^{2}$, where $h_{\phi}(\cdot)$ is the site-conditioned parameter head, $a$ is the site-specific speed-scale parameter, and $b$ is the site-specific stretch parameter. For any timestamp $t$, let $\tilde{v}_{t}\geq 0$ denote the equivalent wind speed and define the unclipped warp

$$g(\tilde{v}_{t})=\exp(a)(1+\tilde{v}_{t})^{\exp(b)}-1. \tag{16}$$

Then $g(\tilde{v}_{t})$ is strictly increasing with respect to $\tilde{v}_{t}$. Consequently, the clipped effective wind speed $\hat{v}_{t}=\max(0,g(\tilde{v}_{t}))$ is non-decreasing with respect to $\tilde{v}_{t}$.

###### Proof\.

For a fixed site, $a$ and $b$ are constants with respect to the equivalent wind speed $\tilde{v}_{t}$. For $\tilde{v}_{t}\geq 0$, the derivative of the unclipped warp is

$$\frac{\partial g(\tilde{v}_{t})}{\partial\tilde{v}_{t}}=\exp(a)\exp(b)(1+\tilde{v}_{t})^{\exp(b)-1}. \tag{17}$$

Since $\exp(a)>0$, $\exp(b)>0$, and $1+\tilde{v}_{t}>0$ (so that any real power $(1+\tilde{v}_{t})^{\exp(b)-1}$ is also positive), we have

$$\frac{\partial g(\tilde{v}_{t})}{\partial\tilde{v}_{t}}>0. \tag{18}$$

Therefore, $g(\tilde{v}_{t})$ is strictly increasing on $\tilde{v}_{t}\geq 0$.

It remains to consider the clipping operation. The function $m(x)=\max(0,x)$ is non-decreasing in $x$. Since $g(\tilde{v}_{t})$ is strictly increasing in $\tilde{v}_{t}$, the composition $m(g(\tilde{v}_{t}))$ is non-decreasing in $\tilde{v}_{t}$. Thus, the clipped effective wind speed $\hat{v}_{t}=\max(0,g(\tilde{v}_{t}))$ preserves wind-speed ordering: a larger equivalent wind speed cannot produce a smaller effective wind speed. The result applies element-wise to the vector form used in the main text. ∎
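Proposition 1 can be checked numerically. The sketch below implements Eqs. (16) and (17) directly with NumPy and verifies, on an illustrative grid of equivalent wind speeds, that the unclipped warp is strictly increasing, that the clipped effective wind speed is non-decreasing, and that central finite differences match the analytic derivative. The $(a, b)$ pairs are arbitrary values chosen for illustration, not learned site parameters.

```python
import numpy as np

def unclipped_warp(v_tilde, a, b):
    """g(v) = exp(a) * (1 + v) ** exp(b) - 1, as in Eq. (16)."""
    return np.exp(a) * (1.0 + v_tilde) ** np.exp(b) - 1.0

def warp_derivative(v_tilde, a, b):
    """Analytic derivative of g, as in Eq. (17)."""
    return np.exp(a) * np.exp(b) * (1.0 + v_tilde) ** (np.exp(b) - 1.0)

def effective_wind_speed(v_tilde, a, b):
    """Clipped effective wind speed v_hat = max(0, g(v))."""
    return np.maximum(0.0, unclipped_warp(v_tilde, a, b))

v = np.linspace(0.0, 25.0, 501)  # illustrative grid of equivalent wind speeds
for a, b in [(-0.5, -1.0), (0.0, 0.0), (0.3, 0.8)]:  # arbitrary site parameters
    g = unclipped_warp(v, a, b)
    assert np.all(np.diff(g) > 0)                               # strictly increasing
    assert np.all(np.diff(effective_wind_speed(v, a, b)) >= 0)  # clipped: non-decreasing
    assert np.all(warp_derivative(v, a, b) > 0)                 # Eq. (18)
    # Central differences agree with the analytic derivative of Eq. (17).
    assert np.allclose(np.gradient(g, v)[1:-1], warp_derivative(v, a, b)[1:-1], rtol=1e-2)
print("monotonicity checks passed")
```

For $(a,b)=(-0.5,-1)$ the warp is negative near $\tilde{v}_{t}=0$, so the clipping branch is exercised and the clipped speed is flat at zero before rising, which is exactly the "non-decreasing but not strictly increasing" case in the proposition.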

### C.2 Shared Physical Power Curve

The shared canonical curve $f_{\mathrm{curve}}(\cdot)$ maps the normalized effective wind speed $\boldsymbol{v}^{*}$ to a capacity factor in $[0,1]$, and is inspired by the classical wind-turbine power-curve structure \[[35](https://arxiv.org/html/2607.01670#bib.bib33)\]. Specifically, a typical power curve has near-zero output before the cut-in region, nonlinear power growth in the sub-rated region, and a rated-power plateau after the rated-speed region. The nonlinear rising segment is physically motivated by the wind-energy relation $P_{\mathrm{wind}}\propto\rho v^{3}$. We do not model the high-wind cut-out regime in this shared available-power curve, because abnormal shutdowns and curtailment effects are handled by the State-aware Power Corrector.

We parameterize this physical prior as an element\-wise smooth approximation:

$$f_{\mathrm{curve}}(\boldsymbol{v}^{*})=\operatorname{clip}\left(g_{\mathrm{on}}(\boldsymbol{v}^{*})\,g_{\mathrm{mid}}(\boldsymbol{v}^{*})\,h(\boldsymbol{v}^{*})+g_{\mathrm{rated}}(\boldsymbol{v}^{*}),0,1\right), \tag{19}$$

where $\operatorname{clip}(\cdot,0,1)$ constrains the output to a valid capacity factor. The cut-in gate and rated-region gate are defined as

$$g_{\mathrm{on}}(\boldsymbol{v}^{*})=\sigma(\tau_{\mathrm{on}}\boldsymbol{v}^{*}),\qquad g_{\mathrm{rated}}(\boldsymbol{v}^{*})=\sigma(\tau_{\mathrm{rated}}(\boldsymbol{v}^{*}-1)), \tag{20}$$

where $\sigma(\cdot)$ is the sigmoid function, $\tau_{\mathrm{on}}>0$ controls the sharpness of the cut-in transition, and $\tau_{\mathrm{rated}}>0$ controls the sharpness of the transition into the rated region. The sub-rated gate and rising segment are

$$g_{\mathrm{mid}}(\boldsymbol{v}^{*})=1-g_{\mathrm{rated}}(\boldsymbol{v}^{*}),\qquad h(\boldsymbol{v}^{*})=k\,\mathrm{softplus}(\boldsymbol{v}^{*})^{n}, \tag{21}$$

where $g_{\mathrm{mid}}(\cdot)$ suppresses the rising segment after the rated region, $\mathrm{softplus}(x)=\log(1+\exp(x))$ provides a smooth positive basis for the rising segment, $k>0$ controls the sub-rated scale, and $n>0$ controls the growth curvature. We use a learnable exponent $n$ instead of a fixed cubic power because $P_{\mathrm{wind}}\propto\rho v^{3}$ describes available wind energy, whereas the effective wind-farm power curve can deviate from an exact cubic law due to turbine efficiency. In implementation, the learnable parameter $n$ is bounded around the physically motivated cubic regime, while $k$, $\tau_{\mathrm{on}}$, and $\tau_{\mathrm{rated}}$ are constrained to positive ranges. This preserves the primary physical shape of a classical power curve.
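Eqs. (19) to (21) translate into a few vectorized operations. The sketch below assumes that $\boldsymbol{v}^{*}$ is normalized so that 0 marks cut-in and 1 marks rated speed, which matches where the gates in Eq. (20) are centered but is not stated explicitly. The reparameterizations `positive` and `bounded_exponent` (including the half-width of 0.5 around $n=3$) and the example values of $k$, $\tau_{\mathrm{on}}$, and $\tau_{\mathrm{rated}}$ are hypothetical choices for illustration only.

```python
import numpy as np

def sigmoid(x):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * x))

def softplus(x):
    # log(1 + exp(x)) without overflow.
    return np.logaddexp(0.0, x)

def positive(raw):
    # Hypothetical reparameterization keeping k, tau_on, tau_rated > 0.
    return softplus(raw)

def bounded_exponent(raw, center=3.0, half_width=0.5):
    # Hypothetical bound keeping n within [center - half_width, center + half_width].
    return center + half_width * np.tanh(raw)

def f_curve(v_star, k, n, tau_on, tau_rated):
    g_on = sigmoid(tau_on * v_star)                        # cut-in gate, Eq. (20)
    g_rated = sigmoid(tau_rated * (v_star - 1.0))          # rated-region gate, Eq. (20)
    g_mid = 1.0 - g_rated                                  # sub-rated gate, Eq. (21)
    h = k * softplus(v_star) ** n                          # rising segment, Eq. (21)
    return np.clip(g_on * g_mid * h + g_rated, 0.0, 1.0)   # Eq. (19)

v_star = np.linspace(-1.0, 2.0, 7)
cf = f_curve(v_star, k=0.44, n=bounded_exponent(0.0), tau_on=10.0, tau_rated=20.0)
print(np.round(cf, 3))
```

With these example values, the output is essentially zero well below cut-in, rises through the sub-rated region, and saturates at 1 above rated speed. The choice $k\approx 1/\mathrm{softplus}(1)^{3}$ makes the rising segment meet the plateau near $\boldsymbol{v}^{*}=1$, though the learned parameters need not satisfy this.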

### C.3 State-Prior Label Construction

These state-prior labels are weak supervisory signals derived from wind-power behavior rather than ground-truth operational annotations, so the state router can be supervised without manually annotated operational states. For each wind farm, we first construct an empirical available-power curve from the wind-speed and power pairs in the training set. Let $\bar{v}_{t}$ and $p_{t}$ denote the representative wind speed and the observed power at timestamp $t$, respectively. We divide the wind-speed range between the $1\%$ and $99.5\%$ quantiles into bins and estimate the available power in each bin using an upper empirical quantile:

$$q_{j}=\operatorname{Quantile}_{0.9}\left(\left\{p_{t}^{\mathrm{clip}}:\bar{v}_{t}\in B_{j}\right\}\right), \tag{22}$$

where $B_{j}$ is the $j$-th wind-speed bin and $q_{j}$ is the empirical available-power estimate for that bin. Missing bins are linearly interpolated. The resulting curve is smoothed by short moving-window filters, and monotonicity is enforced by a cumulative maximum. Interpolating this monotone curve at $\bar{v}_{t}$ gives the expected available power $q_{t}^{\mathrm{exp}}\in[0,P_{\mathrm{cap}}]$.
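The curve-construction steps above (quantile binning, interpolation of empty bins, smoothing, cumulative maximum, and interpolation back to each timestamp) can be sketched in NumPy as follows. The number of bins, the single odd-length moving-average window, edge padding, and the synthetic data are all assumptions, since the text only says "bins" and "short moving-window filters". `p_clip` is taken as given power already clipped to $[0, P_{\mathrm{cap}}]$.

```python
import numpy as np

def empirical_available_power(v_bar, p_clip, n_bins=50, window=3, q=0.9):
    """Binned upper-quantile power curve, smoothed and made monotone.

    n_bins and window (odd) are hypothetical choices not fixed by the text.
    """
    lo, hi = np.quantile(v_bar, [0.01, 0.995])
    edges = np.linspace(lo, hi, n_bins + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    idx = np.digitize(v_bar, edges) - 1            # bin j covers [edges[j], edges[j+1])
    q_bins = np.full(n_bins, np.nan)
    for j in range(n_bins):
        members = p_clip[idx == j]
        if members.size > 0:
            q_bins[j] = np.quantile(members, q)    # Eq. (22)
    ok = ~np.isnan(q_bins)
    q_bins = np.interp(centers, centers[ok], q_bins[ok])  # fill empty bins linearly
    pad = window // 2
    smoothed = np.convolve(np.pad(q_bins, pad, mode="edge"),
                           np.ones(window) / window, mode="valid")
    return centers, np.maximum.accumulate(smoothed)  # cumulative max gives monotone curve

def expected_available_power(v_bar, centers, curve, p_cap):
    """Interpolate the monotone curve at each timestamp's wind speed (q_t^exp)."""
    return np.clip(np.interp(v_bar, centers, curve), 0.0, p_cap)

# Synthetic example: a cubic-like curve with random derating (illustrative only).
rng = np.random.default_rng(0)
p_cap = 100.0
v_bar = rng.uniform(0.0, 20.0, size=5000)
ideal = p_cap * np.clip((v_bar - 3.0) / 9.0, 0.0, 1.0) ** 3
p_clip = np.clip(ideal * rng.uniform(0.6, 1.0, size=v_bar.size), 0.0, p_cap)
centers, curve = empirical_available_power(v_bar, p_clip)
q_exp = expected_available_power(v_bar, centers, curve, p_cap)
```

Using the 0.9 quantile rather than the mean makes each bin estimate track the upper envelope of production, so derated or curtailed samples pull it down only slightly.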

We then assign state-prior labels by comparing the measured power with the expected available power. Define the power ratio and the local power variability as

$$r_{t}^{\mathrm{exp}}=\frac{p_{t}^{\mathrm{clip}}}{\max(q_{t}^{\mathrm{exp}},10^{-6})},\qquad\sigma_{t}^{(4)}=\operatorname{Std}\left(p_{t-1:t+2}^{\mathrm{clip}}\right). \tag{23}$$

Here, $r_{t}^{\mathrm{exp}}$ measures the realized power ratio relative to the expected available power, and $\sigma_{t}^{(4)}$ is the centered four-step local standard deviation used to identify flat curtailed segments. We use the thresholds

$$\eta_{\mathrm{avail}}=0.2,\quad\eta_{\mathrm{pow}}=0.05,\quad\eta_{\mathrm{ratio}}=0.7,\quad\gamma_{\mathrm{std}}=1. \tag{24}$$

A timestamp is considered to have sufficient available power when $q_{t}^{\mathrm{exp}}\geq\eta_{\mathrm{avail}}P_{\mathrm{cap}}$. The shutdown and curtailment masks are then defined as

$$m_{t}^{\mathrm{sd}}=\left[q_{t}^{\mathrm{exp}}\geq\eta_{\mathrm{avail}}P_{\mathrm{cap}}\right]\wedge\left[p_{t}^{\mathrm{clip}}\leq\eta_{\mathrm{pow}}P_{\mathrm{cap}}\right], \tag{25}$$

$$m_{t}^{\mathrm{cur}}=\left[q_{t}^{\mathrm{exp}}\geq\eta_{\mathrm{avail}}P_{\mathrm{cap}}\right]\wedge\left[r_{t}^{\mathrm{exp}}\leq\eta_{\mathrm{ratio}}\right]\wedge\left[\sigma_{t}^{(4)}\leq\gamma_{\mathrm{std}}\right]. \tag{26}$$

Here, $m_{t}^{\mathrm{sd}}$ identifies near-zero generation despite sufficient available power, while $m_{t}^{\mathrm{cur}}$ identifies sustained under-production with low local variability. The final state-prior label $l_{t}\in\{0,1,2\}$ is assigned as

$$l_{t}=\begin{cases}0, & m_{t}^{\mathrm{sd}},\\ 1, & m_{t}^{\mathrm{cur}}\ \mathrm{and}\ \neg m_{t}^{\mathrm{sd}},\\ 2, & \mathrm{otherwise},\end{cases} \tag{27}$$

where 0, 1, and 2 correspond to shutdown, curtailment, and regular generation, respectively. Shutdown is given priority when the shutdown and curtailment masks overlap.
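The masking rules in Eqs. (23) to (27) can be sketched as vectorized NumPy code. Two details are assumptions not fixed by the text: the four-step window is truncated at the series edges, and the standard deviation uses NumPy's default population form (`ddof=0`). The example also assumes that power and $\gamma_{\mathrm{std}}$ are expressed in the same units, with a hypothetical $P_{\mathrm{cap}}=100$.

```python
import numpy as np

SHUTDOWN, CURTAILMENT, REGULAR = 0, 1, 2

def centered_rolling_std(p, before=1, after=2):
    """Std over p[t-1 .. t+2]; windows are truncated at the series edges (assumption)."""
    n = len(p)
    out = np.empty(n)
    for t in range(n):
        out[t] = np.std(p[max(0, t - before): min(n, t + after + 1)])  # ddof=0 (assumption)
    return out

def state_prior_labels(p_clip, q_exp, p_cap, eta_avail=0.2, eta_pow=0.05,
                       eta_ratio=0.7, gamma_std=1.0):
    r_exp = p_clip / np.maximum(q_exp, 1e-6)                          # Eq. (23), ratio
    sigma4 = centered_rolling_std(p_clip)                             # Eq. (23), variability
    available = q_exp >= eta_avail * p_cap
    m_sd = available & (p_clip <= eta_pow * p_cap)                    # Eq. (25)
    m_cur = available & (r_exp <= eta_ratio) & (sigma4 <= gamma_std)  # Eq. (26)
    labels = np.full(p_clip.shape, REGULAR, dtype=np.int64)
    labels[m_cur] = CURTAILMENT
    labels[m_sd] = SHUTDOWN  # written last so shutdown wins on overlap, Eq. (27)
    return labels

p_cap = 100.0
p_clip = np.array([0, 0, 40, 40, 40, 40, 40, 75, 90, 5], dtype=float)
q_exp = np.array([80, 80, 80, 80, 80, 80, 80, 80, 95, 10], dtype=float)
labels = state_prior_labels(p_clip, q_exp, p_cap)
print(labels)  # [0 0 2 1 1 2 2 2 2 2]
```

In this toy series, the two near-zero values at $t=0,1$ are shutdowns. Only $t=3,4$ are curtailed: their four-step windows are perfectly flat at half the expected power. The last value (5) stays regular because its expected available power (10) falls below $\eta_{\mathrm{avail}}P_{\mathrm{cap}}$.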

## Appendix D Additional Results

### D.1 Average Performance Comparison across Regions

Table [6](https://arxiv.org/html/2607.01670#A4.T6) reports the average full-shot performance across four regions. Since the rated power varies across wind farms, we report normalized metrics, defined as NMAE = MAE / rated power × 100% and NRMSE = RMSE / rated power × 100%. UniWind achieves the best NMAE and NRMSE in all regions. These consistent gains show that UniWind remains robust across different regional distributions, while statistical and renewable-energy forecasting baselines exhibit more region-dependent performance.
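The capacity-normalized metrics above reduce to dividing the usual MAE and RMSE by the farm's rated power. The helper below is a minimal sketch of that definition; the function name and the three-point example are illustrative.

```python
import numpy as np

def nmae_nrmse(y_true, y_pred, rated_power):
    """Capacity-normalized MAE and RMSE, both in percent."""
    err = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    nmae = np.mean(np.abs(err)) / rated_power * 100.0
    nrmse = np.sqrt(np.mean(err ** 2)) / rated_power * 100.0
    return nmae, nrmse

nmae, nrmse = nmae_nrmse([10.0, 20.0, 30.0], [12.0, 18.0, 30.0], rated_power=50.0)
print(round(nmae, 3), round(nrmse, 3))  # 2.667 3.266
```

Normalizing by rated power rather than by observed power keeps the metrics finite during low-output and shutdown periods and makes farms of different capacity comparable.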

Table 6: Average performance comparison of end-to-end prediction across regions (%).

### D.2 Ablation Studies

The experimental results of the ablation study on the SD\_A, SX\_A, AH\_A, and UK\_Penmanshiel datasets are shown in Figure [6](https://arxiv.org/html/2607.01670#A4.F6).

![Ablation results on SD_A](https://arxiv.org/html/2607.01670v1/x8.png) (a) SD\_A
![Ablation results on SX_A](https://arxiv.org/html/2607.01670v1/x9.png) (b) SX\_A
![Ablation results on AH_A](https://arxiv.org/html/2607.01670v1/x10.png) (c) AH\_A
![Ablation results on UK_Penmanshiel](https://arxiv.org/html/2607.01670v1/x11.png) (d) UK\_Penmanshiel

Figure 6: Ablation studies.

### D.3 State Statistics

Table [7](https://arxiv.org/html/2607.01670#A4.T7) summarizes the average proportions of state-prior labels constructed by the procedure in Appendix [C.3](https://arxiv.org/html/2607.01670#A3.SS3). The regular state dominates all regions, but its proportion varies substantially, from 44.8% in the UK to 84.3% in AH. The abnormal states also show clear regional differences, indicating that wind farms in different regions exhibit distinct operating-state distributions.
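Per-region state proportions like those in Table 7 can be computed from the label arrays with a simple count. The text does not say whether the "average proportion" pools all timestamps in a region or averages per-farm proportions. The sketch below assumes the latter (each farm weighted equally), and the two toy farms are illustrative.

```python
import numpy as np

def state_proportions(labels, n_states=3):
    """Fraction of timestamps in each state (0 shutdown, 1 curtailment, 2 regular)."""
    counts = np.bincount(np.asarray(labels), minlength=n_states)
    return counts / counts.sum()

def region_average(per_farm_labels, n_states=3):
    # Assumption: mean of per-farm proportions, each farm weighted equally.
    return np.mean([state_proportions(l, n_states) for l in per_farm_labels], axis=0)

farms = [np.array([0, 2, 2, 2]), np.array([1, 1, 2, 2])]
props = region_average(farms)
print(props)
```

Pooling all timestamps instead would weight farms by their record length; the two conventions coincide only when every farm contributes the same number of samples, as in this toy example.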

Table 7: The average proportion of states in each region.

### D.4 Parameter Sensitivity Analyses

![Sensitivity to the router-supervision weight](https://arxiv.org/html/2607.01670v1/x12.png) (a) $\lambda_{\mathrm{router}}$
![Sensitivity to the physical upper-bound weight](https://arxiv.org/html/2607.01670v1/x13.png) (b) $\lambda_{\mathrm{upper}}$

Figure 7: Sensitivity analysis on the SD\_A dataset.

Figure [7](https://arxiv.org/html/2607.01670#A4.F7) shows the sensitivity of UniWind to the router-supervision weight $\lambda_{\mathrm{router}}$ and the physical upper-bound weight $\lambda_{\mathrm{upper}}$ on the SD\_A dataset. When either weight is set to 0, the MAE increases, indicating that both knowledge-guided state supervision and physical upper-bound regularization contribute to stable forecasting. For nonzero values, the performance remains relatively stable, suggesting that UniWind is not highly sensitive to the exact choice of these hyperparameters. Based on the validation performance, we set $\lambda_{\mathrm{router}}=0.4$ and $\lambda_{\mathrm{upper}}=0.01$ in the final model.

### D.5 Case Studies

![Case study on the SD_A dataset](https://arxiv.org/html/2607.01670v1/x14.png)

Figure 8: Case study on the SD\_A dataset. The physical prior follows the wind-speed-driven available power and saturates near the rated power, while UniWind corrects this prior into the final realized-power forecast.

Figure [8](https://arxiv.org/html/2607.01670#A4.F8) provides a qualitative example on the SD\_A dataset, illustrating how UniWind separates physically available power from realized generation. The physical prior generally follows the wind-speed profile: it increases rapidly when wind speed enters the productive range, decreases under low-wind conditions, and forms a plateau during high-wind periods. This plateau does not indicate a forecasting error; it reflects the saturation of the wind farm at its rated power once the wind speed reaches the rated region. However, the realized power can remain substantially below this available-power envelope because of latent operating conditions and site-specific effects. UniWind therefore does not use the physical prior directly as the final output. Instead, the State-aware Power Corrector adjusts the prior according to the inferred operating state, enabling the prediction to follow the ground-truth trajectory during both high-potential intervals and low-output periods. In particular, the prediction tracks the ground truth when the physical prior stays near the rated-power plateau, and it also captures the sharp drops and recoveries when the available power changes rapidly. This case illustrates that the proposed physical prior provides a meaningful upper-envelope reference, while state-aware correction is needed to translate this reference into realistic day-ahead power forecasts.
