AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion
Summary
AirCast-SR is a diffusion-based foundation model that downscales global AI weather forecasts from 0.25° to 1 km resolution at hourly cadence, producing 67-hour forecasts with near-zero bias and structural realism, while running inference in minutes on a single commodity GPU.
# AirCast-SR: A Foundation Model for Kilometer-Scale Atmospheric Super-Resolution via Latent Consistency Diffusion
Source: [https://arxiv.org/html/2605.26130](https://arxiv.org/html/2605.26130)
- Somnath Luitel†, Department of Earth, Environmental, and Atmospheric Sciences, Western Kentucky University, Bowling Green, KY, USA
- Manmeet Singh†,∗, Department of Earth, Environmental, and Atmospheric Sciences, Western Kentucky University, Bowling Green, KY, USA
- Joshua Durkee, Department of Earth, Environmental, and Atmospheric Sciences, Western Kentucky University, Bowling Green, KY, USA
- Naveen Sudharsan, The University of Texas at Austin, Austin, TX, USA
- Prabhjot Singh, The University of Texas at Austin, Austin, TX, USA
- Cenlin He, NSF National Center for Atmospheric Research, Boulder, CO, USA
- Harsh Kamath, The University of Texas at Austin, Austin, TX, USA
- Zong-Liang Yang, The University of Texas at Austin, Austin, TX, USA
- Krishnagopal Halder, Leibniz Centre for Agricultural Landscape Research (ZALF), Berlin, Germany
- Sandeep Juneja, Ashoka University, Sonipat, India
- Parthasarathi Mukhopadhyay, Ashoka University, Sonipat, India
- Saptarishi Dhanuka, Ashoka University, Sonipat, India
- Amit Kumar Srivastava, Leibniz Centre for Agricultural Landscape Research (ZALF), Berlin, Germany
###### Abstract
†Equal contribution (co-first authors). ∗Corresponding author: manmeet.singh@wku.edu
Kilometer-scale weather prediction is computationally prohibitive for traditional numerical weather prediction (NWP) models, restricting fine-grained forecasts to a small set of operational systems run on supercomputing infrastructure. Here we introduce AirCast-SR, a diffusion-based foundation model that downscales global AI weather forecasts from 0.25° (~28 km) to 1 km resolution at hourly cadence, producing 67-hour forecasts of seven coupled near-surface variables simultaneously. AirCast-SR couples a three-dimensional U-Net to a Latent Consistency Model (LCM) diffusion framework, trained on patch-based samples over the contiguous United States (CONUS) with GraphCast forecasts as input and NOAA’s Analysis of Record for Calibration (AORC) as the target. Across two extreme events (Winter Storm Elliott in December 2022 and the June 2022 summer convective episode) and one spring transition case (March 2023), AirCast-SR achieves near-zero systematic bias across all variables and all lead times to 48 h, while its radial power spectral density tracks the AORC reference at wavelengths from 10 km to 100 km, where coarser models lose spectral power. The model approaches but does not yet exceed the operational High-Resolution Rapid Refresh (HRRR) on pointwise skill for most variables; instead, it delivers broader coupled-variable coverage, structural realism via diffusion (the perception–distortion regime of generative super-resolution), and inference in minutes on a single commodity GPU. Without retraining, AirCast-SR generalises zero-shot to India and Germany when verified against surface station observations from StationBench.
Released with open weights, AirCast\-SR establishes a foundation for kilometer\-scale AI weather services and a clear scaling path—more compute, more training years, ensemble inference—toward eventually exceeding operational NWP systems\.
## I Introduction
The past three years have witnessed a transformative shift in weather prediction, with AI-based models achieving forecast skill comparable to or exceeding state-of-the-art numerical weather prediction (NWP) systems at global scales\[[1](https://arxiv.org/html/2605.26130#bib.bib1),[2](https://arxiv.org/html/2605.26130#bib.bib2),[3](https://arxiv.org/html/2605.26130#bib.bib3),[4](https://arxiv.org/html/2605.26130#bib.bib4)\]. Models such as GraphCast\[[1](https://arxiv.org/html/2605.26130#bib.bib1)\], Pangu-Weather\[[2](https://arxiv.org/html/2605.26130#bib.bib2)\], and GenCast\[[5](https://arxiv.org/html/2605.26130#bib.bib5)\] produce medium-range forecasts at 0.25° to 1° resolution in seconds on a single GPU, democratizing access to global weather forecasts. Yet the resolution of these global AI models, typically 25 km or coarser, leaves a structural gap for the growing class of applications that demand kilometer-scale, hourly meteorological fields\[[6](https://arxiv.org/html/2605.26130#bib.bib6)\]: renewable-energy management, precision agriculture, urban hydrology, wildfire spread, and local hazard response.
Bridging this gap has traditionally required computationally expensive limited-area NWP models such as NOAA’s High-Resolution Rapid Refresh (HRRR)\[[7](https://arxiv.org/html/2605.26130#bib.bib7)\], which operates at 3 km resolution over CONUS and is refreshed hourly on supercomputing infrastructure. HRRR sets the operational benchmark for short-range, kilometer-scale forecasting over the United States. Its computational cost, however, restricts both the spatial domain and the ensemble size that can be sustained, and no operational system delivers the full coupled set of near-surface variables (temperature, humidity, wind components, pressure, precipitation, and longwave radiation) at 1 km hourly resolution over arbitrary continental domains worldwide.
Statistical and machine-learning-based downscaling offers a path to kilometer-scale fields at drastically reduced cost. Classical approaches include bias correction and spatial disaggregation (BCSD)\[[8](https://arxiv.org/html/2605.26130#bib.bib8)\], constructed analogues\[[9](https://arxiv.org/html/2605.26130#bib.bib9)\], and regression-based methods\[[10](https://arxiv.org/html/2605.26130#bib.bib10)\]. More recent deep learning approaches have applied convolutional networks\[[11](https://arxiv.org/html/2605.26130#bib.bib11),[12](https://arxiv.org/html/2605.26130#bib.bib12)\], generative adversarial networks (GANs)\[[13](https://arxiv.org/html/2605.26130#bib.bib13),[14](https://arxiv.org/html/2605.26130#bib.bib14)\], and diffusion models\[[15](https://arxiv.org/html/2605.26130#bib.bib15)\] to atmospheric downscaling. Among these, diffusion-based models have shown particular promise: they generate physically realistic fine-scale structure without the mode collapse that plagues GANs\[[16](https://arxiv.org/html/2605.26130#bib.bib16),[17](https://arxiv.org/html/2605.26130#bib.bib17)\], and they sit naturally within the *perception–distortion* trade-off of generative super-resolution\[[25](https://arxiv.org/html/2605.26130#bib.bib25)\], where structural realism (sharp gradients, mesoscale variability) is recovered at the cost of some pointwise accuracy.
In this work, we introduce AirCast-SR (Super-Resolution), a diffusion-based foundation model that addresses three coupled requirements simultaneously. First, a *capability* requirement: simultaneous prediction of seven coupled near-surface variables (precipitation, 2-m temperature, 2-m specific humidity, 10-m $u$ and $v$ wind components, surface pressure, and downward longwave radiation) at 1 km and 1-hour resolution over a 67-hour forecast window, a configuration not delivered by any existing operational system globally. Second, a *cost* requirement: full inference over CONUS in minutes on a single commodity GPU, in contrast to the supercomputing budgets required by limited-area NWP. Third, a *generalisation* requirement: the model is trained on CONUS but expected to transfer zero-shot to other continents through patch-based learning over physical features (topography, solar forcing) rather than regional patterns. AirCast-SR takes 0.25° GraphCast forecasts at 6-hourly intervals as conditioning input and produces simultaneous 1-km hourly predictions of the seven near-surface variables. We evaluate it against AORC reference data using HRRR as the operational benchmark and GraphCast as the AI baseline.
## II Methods
### II.1 Model Architecture
AirCast-SR is built on a 3D U-Net architecture\[[18](https://arxiv.org/html/2605.26130#bib.bib18),[19](https://arxiv.org/html/2605.26130#bib.bib19)\] operating within a Latent Consistency Model (LCM)\[[20](https://arxiv.org/html/2605.26130#bib.bib20)\] diffusion framework. The 3D architecture jointly processes spatial (1 km) and temporal (hourly) dimensions in a single forward pass, preserving multi-variable physical coherence across the 67-hour forecast window. The denoiser is a `UNet3DConditionModel` with 27 input channels (7 target variables + 20 conditioning channels), block channels (64, 128, 256, 512), four encoder/decoder stages, two layers per block, group normalisation with eight groups, and cross-attention with eight heads. The 20 conditioning channels are: 17 GraphCast atmospheric variables sampled at three pressure levels (bilinearly interpolated from 0.25° to the 1 km target grid), normalised topography, sky-view factor, and the cosine of the solar zenith angle. Temporal interpolation from $T_{\text{cond}} = 12$ (the 6-hourly GraphCast horizon) to $T_{\text{target}} = 67$ uses trilinear interpolation.
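The channel bookkeeping described above can be sketched in NumPy. This is a minimal illustration, not the released implementation: it interpolates along the time axis only, assumes the conditioning fields are already on the target grid, and uses endpoint-aligned linear weights. The function names `interp_time` and `build_denoiser_input` are hypothetical.

```python
import numpy as np

def interp_time(cond: np.ndarray, t_target: int) -> np.ndarray:
    """Linearly interpolate a (C, T, H, W) stack along time to t_target frames.

    Endpoint-aligned: the first and last source frames are preserved
    (an assumption; the text does not specify the alignment convention).
    """
    t_cond = cond.shape[1]
    pos = np.linspace(0.0, t_cond - 1, t_target)   # fractional source index
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, t_cond - 1)
    w = (pos - lo)[None, :, None, None]            # weight of the later frame
    return (1.0 - w) * cond[:, lo] + w * cond[:, hi]

def build_denoiser_input(noisy_target: np.ndarray, cond: np.ndarray) -> np.ndarray:
    """Concatenate the 7 noisy target channels with the 20 conditioning channels."""
    cond_hourly = interp_time(cond, noisy_target.shape[1])
    return np.concatenate([noisy_target, cond_hourly], axis=0)

rng = np.random.default_rng(0)
cond = rng.random((20, 12, 8, 8))            # 20 channels, T_cond = 12
noisy = rng.standard_normal((7, 67, 8, 8))   # 7 variables, T_target = 67
x = build_denoiser_input(noisy, cond)
print(x.shape)
```

The channel axis comes first here purely for readability; a real 3D U-Net wrapper would also carry a batch dimension.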
Rather than the standard denoising diffusion probabilistic model (DDPM) approach, which requires hundreds of denoising steps\[[16](https://arxiv.org/html/2605.26130#bib.bib16)\], we adopt the LCM framework\[[20](https://arxiv.org/html/2605.26130#bib.bib20)\], which distils the diffusion process into a consistency function, enabling high-quality generation in 4–25 denoising steps. The noise schedule uses 1000 training timesteps with a scaled-linear $\beta$ schedule. During training, Gaussian noise $\boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ is added to normalised targets at a uniformly sampled timestep $k$, and the model predicts the noise via mean-squared-error loss. At inference, generation starts from $\mathbf{z}_T \sim \mathcal{N}(\mathbf{0}, \mathbf{I})$ and iterates through the LCM solver. Although AirCast-SR is used here as a point estimator, with one realisation per random seed, the underlying model is fundamentally probabilistic: independent random seeds produce statistically distinct realisations, providing a natural route to ensemble inference and probabilistic skill evaluation in subsequent work.
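The training objective in the paragraph above can be written out as a short NumPy sketch. The schedule endpoints `beta_start` and `beta_end` are placeholder values (the text gives only the schedule type and the 1000 timesteps), the "scaled-linear" form is taken to mean linear in $\sqrt{\beta}$, and `zero_denoiser` is a hypothetical stand-in for the 3D U-Net.

```python
import numpy as np

def scaled_linear_betas(n_steps=1000, beta_start=0.00085, beta_end=0.012):
    """Scaled-linear schedule: linear in sqrt(beta), then squared.

    beta_start / beta_end are placeholder values, not taken from the paper.
    """
    return np.linspace(beta_start ** 0.5, beta_end ** 0.5, n_steps) ** 2

def add_noise(x0, eps, k, alphas_cumprod):
    """Forward process: z_k = sqrt(abar_k) * x0 + sqrt(1 - abar_k) * eps."""
    abar = alphas_cumprod[k]
    return np.sqrt(abar) * x0 + np.sqrt(1.0 - abar) * eps

def noise_prediction_loss(denoiser, x0, cond, alphas_cumprod, rng):
    """Loss for one training step: MSE between true and predicted noise."""
    k = int(rng.integers(0, len(alphas_cumprod)))    # uniformly sampled timestep
    eps = rng.standard_normal(x0.shape)
    z_k = add_noise(x0, eps, k, alphas_cumprod)
    eps_hat = denoiser(z_k, k, cond)
    return float(np.mean((eps_hat - eps) ** 2))

alphas_cumprod = np.cumprod(1.0 - scaled_linear_betas())
rng = np.random.default_rng(0)
x0 = rng.random((7, 67, 16, 16))                     # normalised targets in [0, 1]
zero_denoiser = lambda z, k, cond: np.zeros_like(z)  # stand-in for the 3D U-Net
loss = noise_prediction_loss(zero_denoiser, x0, None, alphas_cumprod, rng)
```

A denoiser that always predicts zero noise gives a loss close to 1, the variance of the injected noise, which is a convenient sanity baseline when wiring up a training loop.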
### II.2 Training Data and Procedure
GraphCast reforecasts at 0.25° resolution (6-hourly) serve as input; NOAA AORC v1.1 fields at 1 km hourly resolution\[[21](https://arxiv.org/html/2605.26130#bib.bib21)\] serve as the target. The training domain is CONUS and the training period is calendar year 2021 (~300 initialisation times). All variables are min–max normalised to $[0, 1]$; precipitation uses $\log(1+x)$-transformed min–max scaling. Training samples are 64 × 64 pixel patches drawn at random spatial locations within CONUS at each step, with batch size 1, AdamW optimiser, learning rate $10^{-4}$, weight decay $10^{-2}$, and checkpoint selection on global best validation loss. GraphCast conditioning fields are pre-materialised per initialisation time for I/O efficiency.
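The normalisation and patch sampling described above might look as follows. This is a sketch under stated assumptions: the minimum precipitation is taken to be 0, the scaling bound `p_max` is computed from synthetic data rather than from training-set statistics, and all function names are hypothetical.

```python
import numpy as np

def minmax_normalise(x, lo, hi):
    """Map x from [lo, hi] to [0, 1]."""
    return (x - lo) / (hi - lo)

def minmax_denormalise(y, lo, hi):
    """Inverse of minmax_normalise."""
    return y * (hi - lo) + lo

def precip_normalise(p, p_max):
    """log(1 + x) transform followed by min-max scaling (minimum assumed 0)."""
    return np.log1p(p) / np.log1p(p_max)

def precip_denormalise(y, p_max):
    """Inverse of precip_normalise."""
    return np.expm1(y * np.log1p(p_max))

def random_patch(field, size, rng):
    """Crop a size x size patch at a random location from a (..., H, W) array."""
    h, w = field.shape[-2:]
    i = int(rng.integers(0, h - size + 1))
    j = int(rng.integers(0, w - size + 1))
    return field[..., i:i + size, j:j + size]

rng = np.random.default_rng(0)
precip = rng.gamma(shape=0.5, scale=4.0, size=(67, 100, 120))  # synthetic mm/h
p_max = float(precip.max())
patch = random_patch(precip_normalise(precip, p_max), 64, rng)
```

The log transform compresses the heavy right tail of precipitation before scaling, so that light rain is not squeezed into a sliver of the $[0, 1]$ range.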
##### Temporal train/test separation\.
All evaluation cases reported in this work fall *outside* the 2021 training period. The CONUS case studies are December 22, 2022 (Winter Storm Elliott), June 12, 2022 (a continental-scale convective episode), and March 31, 2023 (a spring frontal transition); the international cases are initialised on March 3, 2023 (India) and August 16, 2023 (Germany). No data from these initialisation times, neither the GraphCast conditioning fields nor the AORC targets, were seen during training, validation, or hyper-parameter selection. This strict chronological separation is essential for honest evaluation of weather forecasting models, where random temporal sampling produces optimistic results due to autocorrelation in the underlying meteorology\[[24](https://arxiv.org/html/2605.26130#bib.bib24)\].
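A leakage check for this chronological split takes only a few lines of standard-library Python. The dates are those quoted above; the time of day of each initialisation is not given in the text, so midnight is assumed, and `overlaps_training` is a hypothetical helper.

```python
from datetime import datetime, timedelta

TRAIN_YEAR = 2021      # training period stated in the text
FORECAST_HOURS = 67    # length of the forecast window

# Initialisation dates quoted in the text (time of day assumed to be 00:00).
EVAL_CASES = {
    "Winter Storm Elliott": datetime(2022, 12, 22),
    "June 2022 convective episode": datetime(2022, 6, 12),
    "Spring frontal transition": datetime(2023, 3, 31),
    "India (zero-shot)": datetime(2023, 3, 3),
    "Germany (zero-shot)": datetime(2023, 8, 16),
}

def overlaps_training(init, hours=FORECAST_HOURS, train_year=TRAIN_YEAR):
    """True if any part of the forecast window falls inside the training year."""
    start, end = init, init + timedelta(hours=hours)
    train_start = datetime(train_year, 1, 1)
    train_end = datetime(train_year + 1, 1, 1)
    return start < train_end and end > train_start

leaks = [name for name, init in EVAL_CASES.items() if overlaps_training(init)]
print(leaks)
```

Checking the whole forecast window rather than just the initialisation date also catches forecasts launched shortly before the training year whose target hours run into it.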
### II.3 Inference
At inference, the target domain is tiled into 256 × 256 patches with 128-pixel stride (50% overlap). Patches are independently denoised and merged via cosine-tapered spatial blending\[[22](https://arxiv.org/html/2605.26130#bib.bib22)\], eliminating boundary artefacts. The model supports 4, 8, 25, or 50 denoising steps; results below use 25 steps. A full CONUS forecast at 1 km hourly resolution for 67 lead hours completes in minutes on a single NVIDIA A100 GPU.
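The tiling and blending step can be illustrated with a small NumPy sketch. The exact taper used by the authors is not given here, so a strictly positive raised-cosine window is assumed; `process` is a hypothetical stand-in for denoising a single patch, and the domain is assumed to be at least one patch wide in each direction.

```python
import numpy as np

def tile_starts(size, patch, stride):
    """Start indices so that patches of width `patch` cover [0, size)."""
    starts = list(range(0, size - patch + 1, stride))
    if starts[-1] + patch < size:      # add a final tile flush with the edge
        starts.append(size - patch)
    return starts

def cosine_window(patch):
    """2-D raised-cosine taper, strictly positive so border pixels keep weight."""
    w1d = np.sin(np.pi * (np.arange(patch) + 0.5) / patch) ** 2
    return np.outer(w1d, w1d)

def blend_patches(process, field, patch=256, stride=128):
    """Apply `process` to overlapping tiles of an (H, W) field and blend them."""
    h, w = field.shape
    win = cosine_window(patch)
    acc = np.zeros((h, w))
    weight = np.zeros((h, w))
    for i in tile_starts(h, patch, stride):
        for j in tile_starts(w, patch, stride):
            out = process(field[i:i + patch, j:j + patch])
            acc[i:i + patch, j:j + patch] += win * out
            weight[i:i + patch, j:j + patch] += win
    return acc / weight                # weighted average over overlapping tiles

field = np.random.default_rng(0).random((600, 900))
blended = blend_patches(lambda tile: tile, field)  # identity stands in for the denoiser
```

With an identity `process` the blended output reproduces the input, which is a useful check that the weights are normalised correctly before a real model is plugged in.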
## III Results
We evaluate AirCast-SR on three CONUS case studies (two extreme events and one transition) and two international domains for zero-shot transferability. The cases were selected to span the range of high-impact regimes that operational kilometer-scale forecasting must serve: *Winter Storm Elliott* (initialisation December 22, 2022), a historic Arctic cold-air outbreak that brought blizzard conditions and record-low pressures across the eastern United States; the *June 2022 continental convective episode* (initialisation June 12, 2022), a multi-day pattern of organised mesoscale convective systems across the Plains and Midwest; and a *spring frontal transition* (initialisation March 31, 2023), representative of the cool-season-to-warm-season pattern shifts that dominate seasonal predictability over CONUS. All three cases are entirely outside the 2021 training period. Predictions are compared against AORC v1.1 (the verification reference at 1 km hourly resolution), HRRR (the 3 km operational NWP benchmark)\[[7](https://arxiv.org/html/2605.26130#bib.bib7)\], and the raw GraphCast input (the 0.25° 6-hourly AI baseline).
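The subsections that follow report Pearson correlation $r$, mean bias, and RMSE against the reference. A minimal sketch of these three scores is given below, assuming forecast and reference are already on a common grid and that every grid point is weighted equally; regridding HRRR or GraphCast to the reference grid is not shown, and `skill_metrics` is a hypothetical name.

```python
import numpy as np

def skill_metrics(forecast, reference):
    """Pearson r, mean bias and RMSE over all finite grid points."""
    f = np.asarray(forecast, dtype=float).ravel()
    o = np.asarray(reference, dtype=float).ravel()
    ok = np.isfinite(f) & np.isfinite(o)   # skip missing values in either field
    f, o = f[ok], o[ok]
    err = f - o
    return {
        "r": float(np.corrcoef(f, o)[0, 1]),
        "bias": float(err.mean()),
        "rmse": float(np.sqrt(np.mean(err ** 2))),
    }

rng = np.random.default_rng(0)
reference = rng.normal(280.0, 10.0, size=(100, 100))                # synthetic 2-m temperature, K
forecast = reference + 0.5 + rng.normal(0.0, 2.0, size=(100, 100))  # biased, noisy copy
scores = skill_metrics(forecast, reference)
```

Because RMSE² equals bias² plus the error variance, a forecast can have a high correlation and still carry a persistent offset, which is why the text reports bias separately.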
#### III.0.1 Surface Pressure
Surface pressure (Table [3](https://arxiv.org/html/2605.26130#S4.T3); Fig. [2](https://arxiv.org/html/2605.26130#S4.F2)) is the cleanest demonstration of AirCast-SR’s near-zero-bias property. Across all three cases and all lead times to 48 h, AirCast-SR maintains $r > 0.96$ against AORC, with absolute bias below 7 Pa. In the spring case at +6 h lead, its bias of 0.3 Pa is more than an order of magnitude smaller than HRRR’s systematic bias of 11.7 Pa (Fig. [2](https://arxiv.org/html/2605.26130#S4.F2)). The spatial maps reveal that AirCast-SR resolves the synoptic pressure pattern (the deep low associated with Storm Elliott, the spring frontal trough) without introducing the smooth-field artefacts characteristic of pure-regression downscaling.
#### III.0.2 2-m Temperature
AirCast-SR demonstrates strong temperature prediction skill across seasons (Table [2](https://arxiv.org/html/2605.26130#S4.T2); Fig. [3](https://arxiv.org/html/2605.26130#S4.F3)). For Winter Storm Elliott at 6-hour lead, AirCast-SR achieves $r = 0.972$ against AORC with bias of only −0.007 K. Across all three cases at the 6-hour lead time, AirCast-SR holds $r$ between 0.88 (summer) and 0.97 (winter), with bias never exceeding 0.01 K in absolute value. The spatial maps in Fig. [3](https://arxiv.org/html/2605.26130#S4.F3) reveal that AirCast-SR resolves valley cold pools, lake-effect temperature gradients, and terrain-modulated thermal contrasts that are entirely absent in the GraphCast 28 km fields. HRRR retains a marginal pointwise advantage at very short lead times, consistent with its operational data assimilation, but AirCast-SR’s bias is consistently smaller, a property that matters more than marginal correlation gains for applications where errors compound across aggregation periods (e.g., growing-season degree-day accumulation, building-energy load forecasting).
#### III.0.3 Precipitation
Precipitation is the most challenging variable (Table [1](https://arxiv.org/html/2605.26130#S4.T1); Fig. [4](https://arxiv.org/html/2605.26130#S4.F4)). For 6-hour accumulated precipitation in the summer convective case, AirCast-SR achieves $r = 0.43$ at 18-hour lead, outperforming HRRR’s $r = 0.26$. Across other lead times in the summer case, AirCast-SR maintains a competitive or leading position against HRRR ($r = 0.39$–$0.43$ vs HRRR’s $0.09$–$0.26$ at leads ≥ 12 h), reflecting the model’s ability to generate physically realistic convective structure where HRRR struggles with the intrinsic predictability limits of organised summer convection. In the winter Storm Elliott case, HRRR retains a clear advantage at short lead times ($r = 0.64$ vs AirCast-SR’s $0.40$ at 6 h), but AirCast-SR closes the gap by 18 h ($r = 0.74$ vs $0.78$) and exceeds HRRR by 36 h. GraphCast leads on pointwise correlation in winter and spring, where the larger-scale frontal forcing is well captured at 0.25°; AirCast-SR’s value is in the simultaneous combination of high resolution, near-zero bias, and competitive skill.
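The 6-hour accumulations verified above can be derived from hourly output along the following lines. The sketch assumes the hourly fields hold totals for each hour in mm and that windows are non-overlapping and counted from initialisation; neither convention is stated in the text, and `accumulate` is a hypothetical helper.

```python
import numpy as np

def accumulate(hourly, window=6):
    """Sum hourly totals (T, H, W) into non-overlapping `window`-hour totals."""
    n = hourly.shape[0] // window      # an incomplete final window is dropped
    trimmed = hourly[: n * window]
    return trimmed.reshape(n, window, *hourly.shape[1:]).sum(axis=1)

hourly = np.ones((67, 4, 4))           # 67 lead hours of 1 mm per hour
six_hourly = accumulate(hourly)
print(six_hourly.shape)
```

With a 67-hour forecast this yields 11 complete windows and leaves the final hour unused.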
#### III.0.4 Other Variables
Downward longwave radiation (Table [5](https://arxiv.org/html/2605.26130#S4.T5); Fig. [5](https://arxiv.org/html/2605.26130#S4.F5)): AirCast-SR outperforms HRRR on both correlation ($r = 0.80$ vs HRRR’s $0.79$) and RMSE (28.8 vs 36.8 W m$^{-2}$) at the 2-hour lead in the winter case, with near-zero bias compared to HRRR’s negative systematic bias. AirCast-SR maintains correlations between 0.67 and 0.88 across all cases and leads, while HRRR shows larger lead-time-dependent biases.
2-m specific humidity (Table [4](https://arxiv.org/html/2605.26130#S4.T4); Fig. [6](https://arxiv.org/html/2605.26130#S4.F6)): AirCast-SR achieves $r = 0.90$ in the winter case at 6-h lead with bias below $0.03 \times 10^{-3}$ kg kg$^{-1}$, capturing the moisture distribution along the storm front.
10-m wind components (Tables [6](https://arxiv.org/html/2605.26130#S4.T6),[7](https://arxiv.org/html/2605.26130#S4.T7); Figs. [7](https://arxiv.org/html/2605.26130#S4.F7),[8](https://arxiv.org/html/2605.26130#S4.F8)): For the $u$-component at 24-h lead in the winter case, AirCast-SR achieves $r = 0.79$ with near-zero bias (0.002 m s$^{-1}$), while GraphCast exhibits a systematic positive bias of 0.76 m s$^{-1}$. HRRR retains the highest correlation ($r = 0.94$), with a correspondingly small RMSE. The $v$-component case is more striking: in the winter Storm Elliott scene, AirCast-SR ($r = 0.75$) outperforms GraphCast ($r = 0.56$) on correlation while maintaining near-zero bias (0.003 m s$^{-1}$) versus GraphCast’s 1.94 m s$^{-1}$ bias; HRRR again leads on pointwise metrics ($r = 0.93$) but with non-negligible bias.
### III.1 Spectral Analysis and the Perception–Distortion Trade-off
A defining signature of diffusion-based downscaling is the recovery of physically realistic fine-scale variability that pure regression methods cannot generate. Radial power spectral density (PSD) analysis confirms that AirCast-SR preserves atmospheric structure at wavelengths of 10 km to 100 km, where GraphCast’s spectrum drops by orders of magnitude. For temperature, AirCast-SR’s spectral slope tracks the AORC reference from 1000 km down to 10 km; for precipitation, the gap is more dramatic, with AirCast-SR maintaining power at sub-100 km wavelengths where GraphCast loses spectral energy entirely.
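A radial PSD of the kind described above can be computed with a 2-D FFT followed by azimuthal averaging. The sketch below assumes a square field on a uniform grid and integer-wavenumber bins; windowing, detrending, and the normalisation convention used for the paper's figures are not specified in the text, and `radial_psd` is a hypothetical name.

```python
import numpy as np

def radial_psd(field, dx_km=1.0):
    """Azimuthally averaged power spectrum of a square (N, N) field.

    Returns (wavelengths_km, psd) for integer radial wavenumbers 1..N//2.
    """
    n = field.shape[0]
    spec = np.fft.fftshift(np.fft.fft2(field - field.mean()))
    power = np.abs(spec) ** 2 / field.size
    ky, kx = np.indices(field.shape)
    c = n // 2                                    # zero frequency after fftshift
    r = np.hypot(kx - c, ky - c).round().astype(int)
    k_max = n // 2
    mask = (r >= 1) & (r <= k_max)
    sums = np.bincount(r[mask], weights=power[mask], minlength=k_max + 1)[1:]
    counts = np.bincount(r[mask], minlength=k_max + 1)[1:]
    wavelengths = n * dx_km / np.arange(1, k_max + 1)
    return wavelengths, sums / counts

n = 128
x = np.arange(n)
field = np.sin(2 * np.pi * x / 16.0)[None, :] * np.ones((n, 1))  # 16 km wave along x
wavelengths, psd = radial_psd(field, dx_km=1.0)
peak_wavelength = float(wavelengths[np.argmax(psd)])
```

For the synthetic plane wave the spectrum peaks at the 16 km wavelength that was put in; comparing such curves for a downscaled field and its reference shows at which scales power is lost.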
This spectral fidelity is the central reason AirCast-SR’s pointwise correlation can be lower than HRRR’s while its physical realism is qualitatively higher: the model lives in the structural-realism regime of the *perception–distortion* trade-off\[[25](https://arxiv.org/html/2605.26130#bib.bib25)\]. Pixel-wise mean-squared-error minimisers necessarily produce blurred fields that score well on pointwise metrics but fail to capture the small-scale variability that drives downstream applications (precipitation extremes for hydrological forcing, wind ramps for renewable energy assessment, surface gradient features for hazard prediction). AirCast-SR explicitly trades a small amount of pointwise accuracy against AORC for a large gain in physically realistic mesoscale structure, a trade that is desirable for the application classes that motivated kilometer-scale prediction in the first place.
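The trade-off argued above can be reproduced with a toy example that has nothing to do with the actual model: a "truth" made of a predictable large-scale part plus unpredictable fine-scale noise, scored against a smooth conditional-mean forecast and against a single generative-style sample. All quantities are synthetic and the setup is purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
large_scale = rng.standard_normal(n)            # predictable component
truth = large_scale + rng.standard_normal(n)    # plus unpredictable fine scale

# "Regression" forecast: the conditional mean, MSE-optimal but too smooth.
smooth = large_scale
# "Generative" forecast: one sample with realistic but independent fine scale.
sample = large_scale + rng.standard_normal(n)

def rmse(a, b):
    return float(np.sqrt(np.mean((a - b) ** 2)))

results = {
    "smooth": {"rmse": rmse(smooth, truth), "variance": float(smooth.var())},
    "sample": {"rmse": rmse(sample, truth), "variance": float(sample.var())},
    "truth": {"variance": float(truth.var())},
}
```

The smooth forecast has the lower RMSE but only about half the variance of the truth, whereas the sampled forecast matches the truth's variance at the price of a larger pointwise error.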
### III.2 Zero-Shot Global Transferability
The most direct test of foundation-model behaviour is whether the model generalises to domains absent from training without any retraining or fine-tuning. We evaluate AirCast-SR over India (524 stations, March 2023) and Germany (1,338 stations, August 2023) using StationBench surface observations\[[23](https://arxiv.org/html/2605.26130#bib.bib23)\] as the verification reference (Table [8](https://arxiv.org/html/2605.26130#S4.T8)).
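Verifying a gridded field against stations requires sampling the grid at station coordinates. The text does not say how this matching was done, so the sketch below simply assumes nearest-grid-point sampling on a regular latitude/longitude grid; `sample_at_stations`, the grid extent, and the station coordinates are all hypothetical.

```python
import numpy as np

def sample_at_stations(field, grid_lat, grid_lon, st_lat, st_lon):
    """Nearest-grid-point values of a (lat, lon) field at station coordinates.

    Assumes a regular grid described by 1-D latitude and longitude vectors.
    """
    st_lat = np.asarray(st_lat, dtype=float)
    st_lon = np.asarray(st_lon, dtype=float)
    i = np.abs(grid_lat[:, None] - st_lat[None, :]).argmin(axis=0)
    j = np.abs(grid_lon[:, None] - st_lon[None, :]).argmin(axis=0)
    return field[i, j]

grid_lat = np.linspace(8.0, 36.0, 281)     # 0.1-degree toy grid, not the 1 km grid
grid_lon = np.linspace(68.0, 97.0, 291)
field = grid_lat[:, None] + 0.0 * grid_lon[None, :]   # value equals latitude
st_lat = [12.97, 19.08, 28.61]             # illustrative station coordinates
st_lon = [77.59, 72.88, 77.21]
values = sample_at_stations(field, grid_lat, grid_lon, st_lat, st_lon)
```

Station elevation differences and representativeness error are ignored here; an operational verification would usually need at least a height correction for temperature.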
Over India, AirCast-SR achieves 2-m temperature correlations of $r = 0.83$–$0.89$ across all lead times from 1 to 48 h, with bias of essentially zero at every lead. This is a notable result for a model trained exclusively on CONUS data: the model resolves Himalayan cold-temperature signals and Indo-Gangetic thermal gradients at 1 km resolution, indicating that it has learned a generalisable representation of topographic and diurnal forcing rather than a CONUS-specific surface mapping. GraphCast achieves higher pointwise correlations over India ($r = 0.93$–$0.95$) but exhibits systematic negative biases reaching −1.5 K; AirCast-SR’s near-zero bias more than compensates for the modest correlation gap in applications where systematic errors aggregate over time.
Over Germany, both models perform less well than over CONUS or India, with AirCast-SR correlations between $r = 0.37$ and $r = 0.70$ depending on lead time. The reduced skill is consistent with the greater *climatological distance* between the CONUS training domain and central Europe: the typical air-mass histories, land-cover spectra, and synoptic regimes over Germany differ more markedly from CONUS than do those over India, where many Indo-Gangetic and peninsular regimes share dynamical analogues with CONUS subtropical-to-temperate transitions. As over India, AirCast-SR’s bias remains near zero while GraphCast’s systematic errors range from +0.22 to −0.78 K. Lower zero-shot skill over climatologically distant domains is the expected behaviour for a foundation model: the architecture provides a transfer baseline that targeted fine-tuning, even with a single year of regional reanalysis or station data, is expected to substantially improve.
## IV Discussion
### IV.1 Relationship to operational HRRR and concurrent diffusion downscaling
AirCast-SR does not yet exceed the operational HRRR system on pointwise skill for most variables and most lead times. We state this explicitly because it is the relevant question for an operational forecasting community evaluating an AI alternative. The places where AirCast-SR currently leads HRRR are diagnostic but specific: summer convective precipitation at intermediate-to-long lead times (where HRRR’s deterministic forecast loses skill rapidly with lead time), downward longwave radiation in the winter case, and, most consistently, systematic bias across all variables and all lead times. AirCast-SR’s value at this stage is therefore not a claim of operational superiority but rather a demonstration of three distinct attributes that operational NWP cannot match simultaneously: the simultaneous coupled prediction of seven near-surface variables at 1 km hourly resolution; deployment over arbitrary global domains via patch-based zero-shot transfer; and inference in minutes on a single commodity GPU rather than the supercomputing infrastructure required by limited-area NWP.
The performance ceiling of the present configuration is not the architecture but the resources behind it\. AirCast\-SR was trained on a single calendar year of CONUS reanalysis using a single GPU, with no ensemble inference, no chained multi\-year training, and no fine\-tuning on the international evaluation domains\. Each of these axes is straightforward to scale: training\-data extension to multi\-decadal AORC\-equivalent reanalyses, model\-size scaling along the lines demonstrated for global AI models\[[2](https://arxiv.org/html/2605.26130#bib.bib2),[1](https://arxiv.org/html/2605.26130#bib.bib1)\], ensemble inference exploiting the inherently probabilistic structure of the LCM, and regional fine\-tuning\. The expected trajectory under these scalings, by analogy with the global\-AI\-model literature, is for AirCast\-SR to first match and then exceed HRRR on pointwise metrics while retaining its present advantages in coupled\-variable coverage, generalisation, and cost\.
AirCast\-SR sits within a growing family of diffusion\-based atmospheric downscaling efforts, most prominently residual corrective diffusion at km\-scale \(CorrDiff\)\[[15](https://arxiv.org/html/2605.26130#bib.bib15)\]\. Direct head\-to\-head benchmarking against these concurrent approaches is left to future work; the methodological choices of AirCast\-SR \(3D U\-Net rather than 2D, full\-field rather than residual prediction, LCM consistency distillation rather than full DDPM sampling, patch\-based global zero\-shot rather than regional fixed\-domain training\) are complementary rather than directly competing, and the relative strengths of each design will depend on the application and evaluation regime\.
### IV.2 Limitations
AirCast\-SR has four current limitations that scope the conclusions of this work\.
*Pointwise skill gap to HRRR.* On most variables and lead times, AirCast-SR’s pointwise correlation with AORC is lower than HRRR’s. Two factors contribute. First, the perception–distortion trade-off\[[25](https://arxiv.org/html/2605.26130#bib.bib25)\] mandates a small loss of pointwise accuracy when the model is optimised for structural realism rather than the mean-squared-error minimum. Second, on the algorithmic side, the LCM-distilled inference is fast but loses some of the information-recovery quality of the full underlying DDPM; recent work on consistency model improvements\[[26](https://arxiv.org/html/2605.26130#bib.bib26)\] suggests this gap is closeable. Both factors point to specific architectural and training-objective improvements that are out of scope here.
*Deterministic outputs from a generative model.* The present results report a single sampled realisation per inference. The underlying LCM is generative; ensemble inference via independent random seeds, at modest additional cost, will enable probabilistic skill evaluation (CRPS, spread/skill ratio, rank histograms, reliability diagrams) and is the most immediate planned extension.
*Lower zero-shot skill over climatologically distant domains.* Germany correlations are weaker than India correlations, which we attribute to the greater climatological distance from CONUS training data. Targeted fine-tuning with even small amounts of regional reanalysis or station data is expected to close this gap substantially.
*Single training year.* Training used calendar year 2021 only, which limits the diversity of synoptic and convective regimes the model has seen. Multi-year training is straightforward but was not performed for this initial release.
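Two of the probabilistic scores named under the second limitation, CRPS and the spread/skill ratio, can be estimated from an ensemble of seeded realisations as sketched below. The CRPS uses the standard ensemble estimator $\mathbb{E}|X - y| - \tfrac{1}{2}\mathbb{E}|X - X'|$; the ensemble here is synthetic, and the function names are hypothetical.

```python
import numpy as np

def ensemble_crps(members, obs):
    """Mean CRPS via E|X - y| - 0.5 E|X - X'|, with X, X' drawn from the ensemble.

    members: (M, N) ensemble values; obs: (N,) verifying values.
    """
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    term1 = np.abs(members - obs[None, :]).mean(axis=0)
    term2 = np.abs(members[:, None, :] - members[None, :, :]).mean(axis=(0, 1))
    return float(np.mean(term1 - 0.5 * term2))

def spread_skill_ratio(members, obs):
    """Ensemble spread divided by the RMSE of the ensemble mean."""
    members = np.asarray(members, dtype=float)
    obs = np.asarray(obs, dtype=float)
    spread = np.sqrt(members.var(axis=0, ddof=1).mean())
    rmse = np.sqrt(np.mean((members.mean(axis=0) - obs) ** 2))
    return float(spread / rmse)

rng = np.random.default_rng(0)
obs = rng.standard_normal(5000)
members = rng.standard_normal((8, 5000))   # 8 synthetic "seeds", same distribution as obs
crps = ensemble_crps(members, obs)
ratio = spread_skill_ratio(members, obs)
```

For a one-member ensemble the CRPS reduces to the mean absolute error, which makes it directly comparable with the single-realisation results; a spread/skill ratio near 1 indicates that ensemble spread is a fair estimate of forecast error.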
### IV.3 Outlook
AirCast\-SR is released as an open\-weights foundation model\. Its design supports three modes of community use: direct deployment for applications that tolerate the present skill levels in exchange for kilometer\-scale coverage; regional fine\-tuning for operational use over specific domains; and architectural research exploring improved loss functions, physics\-informed constraints, ensemble inference, and extended variable sets including hail, icing, and visibility\. The near\-zero systematic bias—preserved across all seven variables, all lead times to 48 h, and all evaluated domains including the zero\-shot transfers—is a distinctive and practically important property: for climate services, hydrological modelling, and energy applications where biases compound over aggregation periods, a near\-zero\-bias model with moderate correlation outperforms a high\-correlation model with persistent bias\. Combined with the model’s spectral fidelity and multi\-variable coherence, AirCast\-SR provides a platform for the next generation of high\-resolution AI\-driven weather and climate services, with a clear scaling path toward eventually exceeding operational NWP systems on every relevant axis\.
Table 1: Precipitation skill metrics for 6-hour accumulated precipitation (mm/6h) across the three CONUS case studies. EM = AirCast-SR; HRRR = High-Resolution Rapid Refresh; GC = GraphCast. Bold indicates best performance per metric and lead time within each case.
Table 2: 2-m temperature skill metrics (K) across three CONUS case studies. EM = AirCast-SR; GC = GraphCast. Bold $r$ indicates best correlation per lead time.
Table 3: Surface pressure skill metrics (Pa) across three CONUS case studies. EM = AirCast-SR. Bold $r$ indicates best correlation per lead time; bold Bias indicates smallest absolute bias.
Table 4: 2-m specific humidity skill metrics ($\times 10^{-3}$ kg kg$^{-1}$) across three CONUS case studies. EM = AirCast-SR. Bold $r$ indicates best correlation per lead time; bold Bias indicates smallest absolute bias.
Table 5: Downward longwave radiation skill metrics (W m$^{-2}$) across three CONUS case studies. EM = AirCast-SR. Bold $r$ indicates best correlation per lead time; bold Bias indicates smallest absolute bias.
Table 6: 10-m $u$-wind skill metrics (m s$^{-1}$) across three CONUS case studies. EM = AirCast-SR; GC = GraphCast. Bold $r$ indicates best correlation per lead time; bold Bias indicates smallest absolute bias.
Table 7: 10-m $v$-wind skill metrics (m s$^{-1}$) across three CONUS case studies. EM = AirCast-SR; GC = GraphCast. Bold $r$ indicates best correlation per lead time; bold Bias indicates smallest absolute bias.
Table 8: Zero-shot 2-m temperature evaluation ($r$ / RMSE [K]). EM = AirCast-SR; GC = GraphCast.
Figure 1: AirCast-SR architecture and workflow. (a) End-to-end pipeline. (b) Latent Consistency Model architecture. (c) Patch-based inference. (d) Input (20 channels) and output (7 NOAA AORC variables) specifications.
Figure 2: Surface pressure, March 31, 2023 (spring), +6h lead. AirCast-SR achieves $r = 0.97$ with bias of only 0.3 Pa, compared to HRRR’s bias of 11.7 Pa. The near-zero bias is a consistent feature across all case studies and lead times (Table [3](https://arxiv.org/html/2605.26130#S4.T3)).
Figure 3: 2-m temperature, December 22, 2022 (winter), +6h lead. Left column: spatial maps. Upper right: radial power spectral density. Lower right: scatter plots. AirCast-SR achieves $r = 0.97$ with near-zero bias (−0.007 K).
Figure 4: 6-hour accumulated precipitation, June 12, 2022 (summer), +12h lead. AirCast-SR resolves mesoscale convective precipitation structure absent in the GraphCast input. The PSD confirms preservation of fine-scale variability.
Figure 5: Downward longwave radiation, December 22, 2022, +2h lead. AirCast-SR outperforms HRRR on correlation ($r = 0.80$ vs. $0.79$) and RMSE (28.8 vs. 36.8 W m$^{-2}$).
Figure 6: 2-m specific humidity, December 22, 2022 (winter), +6h lead. AirCast-SR achieves $r = 0.90$ with bias $< 0.03 \times 10^{-3}$ kg kg$^{-1}$, capturing the spatial moisture distribution.
Figure 7: 10-m $u$-wind, December 22, 2022 (winter), +24h lead. AirCast-SR achieves $r = 0.79$ with near-zero bias (0.002 m s$^{-1}$), while GraphCast shows a systematic positive bias of 0.76 m s$^{-1}$.
Figure 8: 10-m $v$-wind, December 22, 2022 (winter), +6h lead. AirCast-SR ($r = 0.75$) outperforms GraphCast ($r = 0.56$) on correlation while maintaining near-zero bias (0.003 m s$^{-1}$).
## References
- \[1\] R\. Lam, *et al\.*, “GraphCast: Learning skillful medium\-range global weather forecasting,” *Science*, 382, 1416–1421 \(2023\)\.
- \[2\] K\. Bi, *et al\.*, “Accurate medium\-range global weather forecasting with 3D neural networks,” *Nature*, 619, 533–538 \(2023\)\.
- \[3\] L\. Chen, *et al\.*, “FuXi: A cascade machine learning forecasting system for 15\-day global weather forecast,” *npj Climate and Atmospheric Science*, 6, 190 \(2023\)\.
- \[4\] S\. Lang, *et al\.*, “AIFS – ECMWF’s data\-driven forecasting system,” arXiv:2406\.01465 \(2024\)\.
- \[5\] I\. Price, *et al\.*, “GenCast: Diffusion\-based ensemble forecasting for medium\-range weather,” *Nature*, 637, 84–90 \(2025\)\.
- \[6\] P\. Bauer, A\. Thorpe, and G\. Brunet, “The quiet revolution of numerical weather prediction,” *Nature*, 525, 47–55 \(2015\)\.
- \[7\] D\. C\. Dowell, *et al\.*, “The High\-Resolution Rapid Refresh \(HRRR\): An hourly updating convection\-allowing forecast model,” *Weather and Forecasting*, 37, 1371–1395 \(2022\)\.
- \[8\] A\. W\. Wood, *et al\.*, “Hydrologic implications of dynamical and statistical approaches to downscaling climate model outputs,” *Climatic Change*, 62, 189–216 \(2004\)\.
- \[9\] H\. G\. Hidalgo, *et al\.*, “Downscaling with constructed analogues: Daily precipitation and temperature fields over the United States,” California Energy Commission Report CEC\-500\-2007\-123 \(2008\)\.
- \[10\] T\. Vandal, *et al\.*, “DeepSD: Generating high resolution climate change projections through single image super\-resolution,” in *Proc\. 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining \(KDD\)*, 1663–1672 \(2017\)\.
- \[11\] Y\. Sha, *et al\.*, “Deep\-learning\-based gridded downscaling of surface meteorological variables in complex terrain\. Part I: Daily maximum and minimum 2\-m temperature,” *Journal of Applied Meteorology and Climatology*, 59\(12\), 2057–2073 \(2020\)\.
- \[12\] F\. Wang, *et al\.*, “Deep learning for daily precipitation and temperature downscaling,” *Water Resources Research*, 57, e2020WR029308 \(2021\)\.
- \[13\] K\. Stengel, *et al\.*, “Adversarial super\-resolution of climatological wind and solar data,” *Proc\. National Academy of Sciences*, 117, 16805–16815 \(2020\)\.
- \[14\] L\. Harris, *et al\.*, “A generative deep learning approach to stochastic downscaling of precipitation forecasts,” *Journal of Advances in Modeling Earth Systems*, 14, e2022MS003120 \(2022\)\.
- \[15\] M\. Mardani, *et al\.*, “Residual corrective diffusion modeling for km\-scale atmospheric downscaling,” *Communications Earth & Environment*, 6, 124 \(2025\); preprint at arXiv:2309\.15214 \(2023\)\.
- \[16\] J\. Ho, *et al\.*, “Denoising diffusion probabilistic models,” in *Advances in Neural Information Processing Systems*, 33, 6840–6851 \(2020\)\.
- \[17\] Y\. Song, *et al\.*, “Score\-based generative modeling through stochastic differential equations,” in *Proc\. ICLR* \(2021\)\.
- \[18\] O\. Ronneberger, *et al\.*, “U\-Net: Convolutional networks for biomedical image segmentation,” in *Proc\. MICCAI*, 234–241 \(2015\)\.
- \[19\] Ö\. Çiçek, *et al\.*, “3D U\-Net: Learning dense volumetric segmentation from sparse annotation,” in *Proc\. MICCAI*, 424–432 \(2016\)\.
- \[20\] S\. Luo, *et al\.*, “Latent consistency models: Synthesizing high\-resolution images with few\-step inference,” arXiv:2310\.04378 \(2023\)\.
- \[21\] NOAA Office of Water Prediction, “Analysis of Record for Calibration \(AORC\), version 1\.1,” [https://hydrology\.nws\.noaa\.gov/aorc\-historic/](https://hydrology.nws.noaa.gov/aorc-historic/) \(2023\)\.
- \[22\] N\. Pielawski and C\. Wählby, “Introducing Hann windows for reducing edge\-effects in patch\-based image segmentation,” *PLoS ONE*, 15, e0229839 \(2020\)\.
- \[23\] ECMWF, “StationBench: Surface station benchmark for AI weather models,” software repository, [https://github\.com/ecmwf\-lab/stationbench](https://github.com/ecmwf-lab/stationbench) \(accessed 2024\)\.
- \[24\] M\. G\. Schultz, *et al\.*, “Can deep learning beat numerical weather prediction?,” *Phil\. Trans\. R\. Soc\. A*, 379, 20200097 \(2021\)\.
- \[25\] Y\. Blau and T\. Michaeli, “The perception–distortion tradeoff,” in *Proc\. IEEE/CVF Conf\. Computer Vision and Pattern Recognition \(CVPR\)*, 6228–6237 \(2018\)\.
- \[26\] Y\. Song, P\. Dhariwal, M\. Chen, and I\. Sutskever, “Consistency models,” in *Proc\. International Conference on Machine Learning \(ICML\)* \(2023\)\.