PRB-RUPFormer: A Recursive Unified Probabilistic Transformer for Residual PRB Forecasting
Summary
Proposes PRB-RUPFormer, a recursive unified probabilistic Transformer for forecasting residual Physical Resource Blocks in cellular networks, achieving high accuracy and uncertainty quantification on commercial LTE data.
# PRB-RUPFormer: A Recursive Unified Probabilistic Transformer for Residual PRB Forecasting
Source: [https://arxiv.org/html/2605.15363](https://arxiv.org/html/2605.15363)
Saad Masrur⋆,†, Yuxuan Jiang⋆, Matti Hiltunen⋆, Ajay Rajkumar⋆, and İsmail Güvenç† ⋆AT&T RAN Technology, Bedminster, NJ †Department of Electrical and Computer Engineering, North Carolina State University, Raleigh, NC \{smasrur,iguvenc\}@ncsu\.edu
###### Abstract
Accurate forecasting of residual Physical Resource Blocks (PRBs) is critical for proactive network slice provisioning, energy-efficient operation, and spectrum-aware decision making in cellular systems, where residual PRBs serve as a practical proxy for short- and medium-term spectrum availability. Existing PRB prediction methods typically rely only on historical PRB values and are trained independently per carrier or sector, limiting their ability to capture cross-carrier dependencies and providing no measure of forecast uncertainty. Moreover, point forecasts alone are insufficient for robust spectrum-aware control under highly variable traffic conditions. This paper proposes PRB-RUPFormer, a recursive unified probabilistic Transformer for residual PRB forecasting. The proposed model jointly processes multivariate KPI time series using temporal, seasonal, and carrier-aware embeddings, preserving inter-metric temporal coupling during recursive rollout and stabilizing long-horizon forecasting. A single shared model is trained across all carriers and sectors of an eNB, enabling efficient learning of joint traffic dynamics with low computational overhead. Forecast uncertainty is captured through quantile-based prediction intervals, providing confidence-aware estimates of future PRB availability. Evaluations on six months of commercial LTE network data from multiple U.S. locations demonstrate median MAE below 0.05 and hit probabilities above 0.80 for both one-day and seven-day recursive forecasts. These probabilistic predictions directly support spectrum-aware RAN functions such as dynamic carrier activation, congestion avoidance, and proactive spectrum sharing, making the proposed framework well-suited for dynamic spectrum access scenarios.
Index Terms— O\-RAN, PRB forecasting, Dynamic spectrum access, Transformer models, Probabilistic prediction, RAN intelligence\.
## I Introduction
Accurate forecasting of residual physical resource blocks \(PRBs\) is critical for a wide range of radio access network \(RAN\) functions, including network slice provisioning, spectrum management, quality\-of\-service \(QoS\) assurance, and energy\-efficient operation\. Traffic patterns in operational cellular networks exhibit strong temporal variability driven by diurnal usage cycles, user mobility, and event\-induced surges\. These dynamics directly affect PRB availability across carriers and sectors, and anticipating future PRB utilization enables proactive control actions that mitigate congestion and improve resource utilization\.
Figure 1: Integration of the proposed forecasting model into the O-RAN architecture, showing how probabilistic PRB predictions support rApp and xApp control functions.

From a spectrum management perspective, residual PRBs provide a fine-grained representation of short- and near-term spectrum availability at the carrier level. Since PRBs correspond to time-frequency resources scheduled at the physical layer, their residual count reflects the spectrum that may be dynamically reallocated, shared, or opportunistically reused. Forecasting residual PRBs therefore enables spectrum-aware decisions such as dynamic carrier activation, inter-carrier load balancing, and adaptive spectrum allocation across sectors or network slices. In emerging dynamic spectrum access (DSA) scenarios \[[1](https://arxiv.org/html/2605.15363#bib.bib12)\], where spectrum resources are increasingly shared across services and operators, anticipating future PRB availability is essential to reduce interference, prevent congestion, and improve spectral efficiency. By forecasting residual PRBs as a proxy for short- and medium-term spectrum availability, the proposed framework directly supports dynamic spectrum access and spectrum-sharing use cases envisioned in next-generation cellular systems.
These spectrum\-aware forecasting capabilities naturally align with the Open Radio Access Network \(O\-RAN\) architecture, which introduces an open, programmable, and disaggregated framework for data\-driven RAN control and automation\[[10](https://arxiv.org/html/2605.15363#bib.bib1)\]\. O\-RAN defines two types of RAN automation applications distinguished by the order of magnitude of their control\-loop latency\. rApps operate at the second level and are deployed in the non\-real\-time RAN Intelligent Controller \(RIC\), whereas xApps operate at the millisecond level and are deployed in the near\-real\-time RIC\. Both rApps and xApps require timely awareness of future network and spectrum conditions to enable predictive, closed\-loop control\. As shown in Fig\.[1](https://arxiv.org/html/2605.15363#S1.F1), the proposed probabilistic PRB forecasting module can be deployed as an O\-RAN rApp to provide future network condition awareness for advanced network automation, including network slicing\. Specifically, historical KPIs are processed in the cloud to train the forecasting model, which is deployed as a prediction rApp in the non\-real\-time RIC\. The resulting forecasts can be shared with other rApps via the R1 interface and communicated to near\-real\-time xApps through the A1 interface, enabling spectrum\-aware control actions such as dynamic resource allocation, congestion avoidance, and proactive spectrum sharing\. While this work focuses on forecasting methodology rather than O\-RAN implementation details, the figure highlights the practical relevance of probabilistic PRB prediction for RIC\-driven spectrum and resource management\.
To address the limitations of existing techniques, this paper introduces the recursive unified probabilistic Transformer (PRB-RUPFormer), a data-driven and computationally efficient Transformer-based architecture for residual PRB forecasting. Unlike conventional per-carrier or per-cell predictors, PRB-RUPFormer employs a single unified model that jointly forecasts residual PRBs across all carriers and sectors of an eNB, enabling it to capture inter-carrier and inter-sector dependencies that naturally arise from shared spectrum usage, carrier aggregation, and coordinated scheduling. This unified design significantly reduces model maintenance and training overhead in large-scale deployments, while improving generalization under data sparsity. The model integrates lightweight attention blocks with temporal and seasonal positional embeddings to efficiently learn long-range traffic dynamics, and performs recursive multi-step forecasting over horizons ranging from minutes to several days. By producing probabilistic forecasts with calibrated confidence intervals, PRB-RUPFormer provides reliable, uncertainty-aware predictions that are well-suited for spectrum-aware and closed-loop RAN control. The main contributions of this paper are summarized as follows:
- A unified, data-driven Transformer architecture that forecasts residual PRBs for all carriers and sectors of an eNB using a single model, capturing inter-carrier and inter-sector dependencies that are missed by per-carrier predictors.
- An efficient recursive forecasting strategy that extends short-horizon predictions to multi-day horizons without retraining separate models.
- Probabilistic residual PRB forecasts with calibrated confidence intervals, enabling uncertainty-aware spectrum and RAN control decisions.
- An extensive evaluation using commercial LTE network data across multiple locations, traffic conditions, and forecasting horizons.
The remainder of this paper is organized as follows. Section II reviews related work on PRB and traffic forecasting. Section III presents the system model and problem formulation. Section IV describes the proposed PRB-RUPFormer architecture and training methodology. Section V outlines the experimental setup and evaluation metrics. Numerical results are discussed in Section VI, followed by conclusions in Section VII.
## II Related Work
Early work on PRB forecasting primarily relied on statistical and autoregressive models that capture temporal correlations in PRB usage using historical observations alone\[[11](https://arxiv.org/html/2605.15363#bib.bib2)\]\. While effective for modeling short\-term trends, these approaches assume that future PRB utilization depends solely on past PRB values and do not incorporate the influence of traffic load, scheduling dynamics, user mobility, or control\-plane activity\. As a result, their ability to generalize under non\-stationary network conditions is limited\.
To capture nonlinear dependencies in cellular networks, machine learning techniques have been applied to traffic and resource forecasting tasks\[[4](https://arxiv.org/html/2605.15363#bib.bib5),[8](https://arxiv.org/html/2605.15363#bib.bib3),[7](https://arxiv.org/html/2605.15363#bib.bib4)\]\. For example, hybrid models combining KPI selection, neural regression, and seasonal ARIMA have been proposed for downlink throughput forecasting using per\-cell historical data\[[8](https://arxiv.org/html/2605.15363#bib.bib3)\]\. Other studies evaluate ensemble learning and neural models to identify KPIs that influence short\-term throughput prediction accuracy\[[7](https://arxiv.org/html/2605.15363#bib.bib4)\]\. However, these approaches focus on throughput rather than PRB availability, operate independently per cell, and produce point estimates that do not quantify prediction uncertainty, limiting their applicability to spectrum\-aware control and admission decisions\.
Recurrent neural networks, including LSTM and GRU architectures, have been widely used for cellular traffic forecasting\. LSTM\-based models have been applied to short\-horizon bearer\-level throughput and PRB demand prediction using scheduler\-level information\[[3](https://arxiv.org/html/2605.15363#bib.bib6)\], as well as to millisecond\-scale PRB utilization forecasting using large LTE datasets\[[9](https://arxiv.org/html/2605.15363#bib.bib7)\]\. While these methods capture temporal correlations, recurrent structures struggle with long\-range dependencies and typically operate on a per\-cell or per\-carrier basis\. Moreover, they generate deterministic point predictions and do not model inter\-carrier or inter\-sector coupling, which is critical for coordinated spectrum management\.
More recently, transformer\-based architectures have gained attention for network traffic forecasting due to their ability to model long\-range temporal dependencies via self\-attention mechanisms\[[5](https://arxiv.org/html/2605.15363#bib.bib8)\]\. Existing transformer\-based studies in cellular networks, however, primarily target single\-step or short\-horizon forecasting and are often designed for individual cells or carriers\. In addition, most of these approaches provide point estimates only, without uncertainty quantification, which restricts their use in spectrum sharing, slice admission control, and energy\-aware RAN optimization where robustness is essential\.
From a spectrum management perspective, prior work has highlighted the importance of forecasting resource demand to support proactive allocation and avoid over\- or under\-provisioning, particularly in the context of RAN slicing and shared spectrum environments\[[3](https://arxiv.org/html/2605.15363#bib.bib6),[2](https://arxiv.org/html/2605.15363#bib.bib13)\]\. However, existing solutions typically treat PRB demand prediction as a supporting component rather than a unified forecasting problem across carriers and sectors, and they do not explicitly address probabilistic forecasting of residual PRBs as a proxy for future spectrum availability\.
Within the O\-RAN ecosystem, forecasting is widely recognized as a key enabler for intelligent rApps and xApps supporting functions such as PRB utilization prediction, interference mitigation, and proactive mobility management\. Despite this recognition, the integration of probabilistic, multi\-carrier PRB forecasting into O\-RAN control workflows remains largely unexplored\. This work addresses this gap by proposing a unified probabilistic Transformer\-based architecture for residual PRB forecasting and outlining its deployment within O\-RAN\-compliant control loops, enabling spectrum\-aware, predictive RAN management\.
## III System Model
We consider a commercial LTE eNB deployment composed of multiple sectors, typically three sectors covering approximately 120 degrees each\. Each sector operates several component carriers with different center frequencies and bandwidths, forming a multi\-layer structure\. As illustrated in Fig\.[2](https://arxiv.org/html/2605.15363#S3.F2), these coverage layers overlap in practice, and their boundaries are not strict; the effective footprint of each carrier varies with path loss, shadowing, antenna configuration, and traffic load\. Training separate forecasting models for each carrier, as commonly done in prior studies, does not leverage the inherent coupling across carriers and sectors and leads to redundant model instances\. This motivates the use of a unified forecasting framework capable of capturing cross\-carrier and cross\-sector dependencies, which is essential in dynamic spectrum access scenarios where spectrum resources may be flexibly reassigned, shared, or opportunistically reused across carriers and sectors\.
Figure 2: Illustration of the LTE eNB configuration used in this study, consisting of three sectors, each operating seven carriers.

At the eNB, a large number of key performance indicators (KPIs) are measured at the transmission time interval (TTI) level and are commonly aggregated to coarser resolutions for monitoring and analytics in operational cellular networks. In this work, KPIs are aggregated over fixed 15-minute intervals, which is an industry-standard granularity for cell- and carrier-level performance reporting. At each discrete time index $t$, the network reports multiple KPIs, from which a subset is selected based on Pearson correlation with residual PRB utilization to capture the dominant drivers of PRB consumption and short-term spectrum availability. The forecasting target is the residual PRB ratio, defined for each carrier as:
$$r_t = \frac{N_{\text{PRB}}^{\text{tot}} - N_{\text{PRB}}^{\text{used}}(t)}{N_{\text{PRB}}^{\text{tot}}}, \tag{1}$$

where $N_{\text{PRB}}^{\text{tot}}$ denotes the total number of PRBs available to the carrier and $N_{\text{PRB}}^{\text{used}}(t)$ is the number of PRBs utilized during the 15-minute interval at time $t$. This normalized formulation bounds the residual PRB ratio to the interval $[0,1]$ and serves as a practical proxy for short-term spectrum availability, enabling meaningful comparison across carriers and supporting spectrum-aware decision making.
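As a quick illustration of Eq. (1), the following minimal sketch computes the residual PRB ratio for one carrier and one 15-minute interval. The function name and the example counts (100 total PRBs, 35 used) are hypothetical and chosen only for illustration; they do not come from the dataset.

```python
def residual_prb_ratio(n_total: float, n_used: float) -> float:
    """Residual PRB ratio r_t = (N_tot - N_used(t)) / N_tot, following Eq. (1)."""
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_used <= n_total:
        raise ValueError("n_used must lie in [0, n_total]")
    return (n_total - n_used) / n_total


# Hypothetical example: a carrier with 100 PRBs, 35 of them in use
# during one 15-minute interval.
ratio = residual_prb_ratio(100, 35)
```

Because the ratio is normalized by the carrier's own total, carriers with different bandwidths map onto the same $[0,1]$ scale.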
The selected KPI vector at time $t$ is defined as:

$$\begin{aligned} X_t = \big[\, & \text{PRB\_MEAN}_t,\; \text{PRB\_TOTAL}_t,\; \text{ACTIVE\_TTI}_t,\; \text{PRB\_PDSCH}_t,\; \text{PRB\_PUCCH}_t, \\ & \text{UE\_MAX}_t,\; \text{UE\_AVG}_t,\; \text{DL\_TPUT}_t,\; r_t \,\big]. \end{aligned} \tag{2}$$

In words, the vector $X_t$ includes KPIs that jointly capture traffic demand, scheduling behavior, and user activity, including mean PRB usage per TTI, total PRBs, active TTIs, PRBs allocated to shared and control channels, the maximum and average number of connected UEs, downlink throughput, and the residual PRB ratio. Together, these features provide the contextual information required for accurate and stable residual PRB forecasting.
Prior work on residual PRB forecasting often relies solely on historical PRB values, implicitly assuming that future PRB availability depends only on its recent history. While this simplifies recursive prediction, it ignores the influence of traffic load, scheduling activity, control signaling, and user dynamics, leading to poor generalization under non-stationary conditions, particularly in DSA scenarios. To address this limitation, we adopt a multivariate formulation in which the model observes the past $N$ KPI vectors:

$$\mathbf{X}^{\text{inp}}_{t} = \{X_{t-N+1}, \dots, X_{t}\}, \tag{3}$$

and predicts the next $M$ full KPI vectors, not just the residual PRB. Let

$$\mathbf{Y}^{\text{out}}_{t+1:t+M} = \{\hat{X}_{t+1}, \dots, \hat{X}_{t+M}\} \tag{4}$$

denote the sequence of predicted future KPI vectors. The forecasting problem is defined as learning a mapping

$$\mathbf{f}_{\theta} : \mathbf{X}_{t-N+1:t} \rightarrow \widehat{\mathbf{X}}_{t+1:t+M}, \tag{5}$$

where $\mathbf{f}_{\theta}$ is the model parameterized by $\theta$.
This formulation naturally enables recursive multi-step prediction. During inference, the model extends forecasting beyond a single horizon of length $M$ by feeding predicted KPI vectors back into the input window. After generating the first block of forecasts $\widehat{\mathbf{X}}_{t+1:t+M}$, the predicted vectors replace the oldest observations while the window length remains fixed at $N$. The updated window

$$\{X_{t-N+M+1}, \dots, X_{t},\ \widehat{X}_{t+1}, \dots, \widehat{X}_{t+M}\}, \tag{6}$$

is then fed back into the model to produce the next block of forecasts $\widehat{\mathbf{X}}_{t+M+1:t+2M}$. This process repeats, with each iteration using the most recent mix of observed and previously predicted KPIs, allowing the prediction horizon to grow without increasing the model input size. As illustrated in Fig. [3](https://arxiv.org/html/2605.15363#S3.F3), this sliding-window recursion supports long-horizon forecasting by propagating temporal dependencies across both observed and predicted KPI sequences. Joint KPI prediction further preserves inter-metric relationships critical for stable and accurate long-horizon forecasts.
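The sliding-window recursion of Eq. (6) can be sketched in a few lines. This is only an illustration under stated assumptions: `toy_model` is a hypothetical stand-in for $f_\theta$ (it simply repeats the window mean), the KPI vectors are one-dimensional for brevity, and the window sizes $N=4$, $M=2$ mirror the values the paper reports in its training setup.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def recursive_forecast(
    model: Callable[[Sequence[Vector]], List[Vector]],
    history: Sequence[Vector],
    n: int,
    m: int,
    horizon: int,
) -> List[Vector]:
    """Roll a block forecaster forward until `horizon` steps are produced.

    `model` maps a window of n KPI vectors to a block of m predicted vectors.
    """
    if len(history) < n:
        raise ValueError("need at least n observed vectors")
    window = list(history[-n:])            # latest N observed KPI vectors
    forecasts: List[Vector] = []
    while len(forecasts) < horizon:
        block = model(window)              # next M predicted vectors
        if len(block) != m:
            raise ValueError("model must return exactly m vectors")
        forecasts.extend(block)
        window = (window + block)[-n:]     # drop the oldest, keep length N
    return forecasts[:horizon]


def toy_model(window: Sequence[Vector]) -> List[Vector]:
    """Hypothetical stand-in for f_theta: repeat the window mean twice (M = 2)."""
    dim = len(window[0])
    mean = [sum(v[i] for v in window) / len(window) for i in range(dim)]
    return [mean[:] for _ in range(2)]


history = [[0.2], [0.4], [0.6], [0.8]]     # N = 4 one-dimensional "KPI vectors"
out = recursive_forecast(toy_model, history, n=4, m=2, horizon=5)
```

The input size never grows: after each block the window is truncated back to its last $N$ entries, so later iterations operate on a mix of observed and predicted vectors, and eventually on predictions only.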
Figure 3: Recursive multi-step forecasting where each block of $M$ predicted KPI vectors is appended to the recent history, and the model $f_{\theta}$ is reapplied using only the latest $N$ vectors to maintain a fixed input window.

Since future network behavior is uncertain, the model generates probabilistic forecasts

$$p_{\theta}\big(\mathbf{X}_{t+1:t+M} \mid \mathbf{X}_{t-N+1:t}\big), \tag{7}$$

for just the residual PRB component using quantile-based prediction (see Section IV-B4), while the remaining KPIs are predicted deterministically. Although probabilistic outputs are applied only to residual PRBs, the model jointly predicts all KPIs to preserve inter-KPI temporal coupling during recursive rollout. Predicting auxiliary KPIs prevents the accumulation of physically inconsistent states that can arise when residual PRBs are forecast in isolation, thereby stabilizing long-horizon recursion and ensuring that future PRB trajectories remain consistent with underlying traffic load, scheduling activity, and user dynamics.
## IV Proposed Model: PRB-RUPFormer
The proposed recursive unified probabilistic Transformer \(PRB\-RUPFormer\) is a unified Transformer\-based forecasting architecture designed to predict future KPI vectors and generate probabilistic forecasts for residual PRBs across all carriers of an eNB\. The model operates on multivariate KPI sequences, incorporates temporal and categorical embeddings, and supports recursive multi\-step prediction using a fixed\-length input window\. In this section, we elaborate the detailed architecture of the model\.
### IV-A Input Embedding Layer
In this layer, each KPI vector $X_t \in \mathbb{R}^{d}$ is first projected into an embedding space through a learnable linear transformation $W_{\text{proj}} \in \mathbb{R}^{d_{\text{emb}} \times d}$:

$$E_t = W_{\text{proj}} X_t. \tag{8}$$

To provide contextual and temporal structure, the projected vector $E_t \in \mathbb{R}^{d_{\text{emb}}}$ is augmented with auxiliary embeddings that encode temporal and categorical information, as described below.
#### IV-A1 Positional Embedding $P_t^{E}$
$P_t^{E}$ is a learned positional embedding that identifies the relative position within the $N$-length input window and helps the Transformer distinguish between early and recent time steps.
#### IV-A2 Calendar-Based Temporal Embeddings
To capture the periodic structure inherent in cellular traffic, the model augments each KPI vector with a set of calendar-based temporal embeddings. These embeddings encode the month, weekday, hour, and minute corresponding to each measurement time, allowing the Transformer to learn seasonal, weekly, and daily load patterns. Formally, each temporal component is implemented as a learned dictionary. The month embedding $P_t^{M}$ is represented by a $12 \times d_{\text{emb}}$ lookup table, where each month maps to a learnable vector. The weekday embedding $P_t^{W}$ uses a $7 \times d_{\text{emb}}$ table, enabling the model to internalize weekly usage cycles. Hour-of-day information is captured through a $24 \times d_{\text{emb}}$ embedding matrix $P_t^{H}$, while minute-level variation (with 15-minute aggregation) is encoded via a $4 \times d_{\text{emb}}$ table $P_t^{S}$ corresponding to the minute values $\{0, 15, 30, 45\}$.
Together, these learned embeddings provide the model with rich temporal context across multiple scales, enabling it to recognize long\-term seasonality \(e\.g\., monthly trends\), medium\-range patterns \(e\.g\., weekday effects\), and short\-term periodicity \(e\.g\., hourly traffic peaks\)\. This structure enhances the model’s ability to forecast future KPI behavior under varying temporal conditions\.
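To make the lookup-table sizes concrete, the sketch below maps a measurement timestamp to the four row indices that would select entries from the month, weekday, hour, and minute tables. The function name and the zero-based, Monday-first index convention are assumptions made for illustration; the paper does not specify how indices are assigned.

```python
from datetime import datetime


def calendar_indices(ts: datetime) -> dict:
    """Map a timestamp to row indices for the four calendar embedding tables."""
    if ts.minute % 15 != 0:
        raise ValueError("timestamp must lie on a 15-minute boundary")
    return {
        "month": ts.month - 1,       # 0..11 -> 12-row table
        "weekday": ts.weekday(),     # 0..6  -> 7-row table (Monday = 0, assumed)
        "hour": ts.hour,             # 0..23 -> 24-row table
        "minute": ts.minute // 15,   # 0..3  -> 4-row table for {0, 15, 30, 45}
    }


# Hypothetical measurement time: Friday, 14 March 2025, 17:45.
idx = calendar_indices(datetime(2025, 3, 14, 17, 45))
```

Since these indices depend only on the clock, they can be computed in advance for any future interval, which is what allows the decoder to receive calendar metadata for steps that have not yet been observed.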
#### IV-A3 Carrier Embedding $P_t^{C}$
Because a single unified model is trained across all carriers of all sectors, we introduce a carrier-identity embedding $P_t^{C}$, implemented as a $21 \times d_{\text{emb}}$ lookup table (three sectors with seven carriers each) that encodes the carrier index associated with each KPI vector. This embedding allows the model to internalize carrier-specific characteristics, such as frequency-dependent coverage ranges, scheduling behavior, and load profiles, while still leveraging shared temporal dynamics learned across carriers.
#### IV-A4 Combined Input Embedding
All embeddings are summed to produce the token representation for the Transformer encoder. The final input representation for time step $t$ is constructed as:

$$X_t^{E} = E_t + P_t^{E} + P_t^{M} + P_t^{W} + P_t^{H} + P_t^{S} + P_t^{C}, \tag{9}$$

where $E_t$ is the projected KPI vector and each $P_t$ term corresponds to a different temporal or categorical encoding. The overall embedding pipeline, including positional, temporal, and carrier-specific components, is illustrated in Fig. [4](https://arxiv.org/html/2605.15363#S4.F4).

Figure 4: The proposed embedding framework for PRB-RUPFormer.
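The summation in Eq. (9) can be sketched with plain lookup tables. This is a minimal illustration, not the authors' implementation: the tables are randomly initialised rather than learned, the embedding width `D_EMB = 8` is shrunk for readability (the paper reports $d_{\text{emb}} = 64$), the projection is assumed to be a $d_{\text{emb}} \times d$ matrix, and all names (`table`, `embed`, the index keys) are hypothetical.

```python
import random

D = 9        # KPI vector length, matching the nine entries of Eq. (2)
D_EMB = 8    # illustrative only; the paper reports d_emb = 64
N = 4        # input window length
rng = random.Random(0)


def table(rows: int) -> list:
    """A randomly initialised (rows x D_EMB) lookup table (learned in practice)."""
    return [[rng.gauss(0.0, 0.02) for _ in range(D_EMB)] for _ in range(rows)]


# Projection matrix of assumed shape (D_EMB x D).
W_proj = [[rng.gauss(0.0, 0.02) for _ in range(D)] for _ in range(D_EMB)]
tables = {
    "position": table(N), "month": table(12), "weekday": table(7),
    "hour": table(24), "minute": table(4), "carrier": table(21),
}


def embed(x: list, idx: dict) -> list:
    """Return X_t^E = W_proj x + the sum of the six looked-up embeddings."""
    token = [sum(w * xi for w, xi in zip(row, x)) for row in W_proj]
    for name, tab in tables.items():
        token = [a + b for a, b in zip(token, tab[idx[name]])]
    return token


x_t = [0.5] * D   # hypothetical (already normalised) KPI vector
idx_t = {"position": 3, "month": 2, "weekday": 4,
         "hour": 17, "minute": 3, "carrier": 20}
token = embed(x_t, idx_t)
```

Summing rather than concatenating keeps the token width at $d_{\text{emb}}$ regardless of how many categorical attributes are attached.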
### IV-B Working Mechanism of the Proposed PRB-RUPFormer
The proposed PRB-RUPFormer adopts a sequence-to-sequence Transformer architecture designed to model multivariate KPI dynamics and residual PRB uncertainty jointly. The model consists of a Transformer encoder that processes $N$ historical KPI vectors and a Transformer decoder that autoregressively generates forecasts for the next $M$ intervals. Although the architecture remains identical during training and inference, the decoder operates under different input conditions, resulting in distinct behaviors.
#### IV-B1 Encoder Architecture
The encoder consists of $\ell_e$ stacked layers, each containing multi-head self-attention (MHSA) followed by a position-wise feed-forward network, with residual connections and normalization at every sublayer \[[13](https://arxiv.org/html/2605.15363#bib.bib9)\]. The encoder receives a fixed-length window of $N$ past embedded inputs $\{X_{t-N+1}^{E}, \dots, X_{t}^{E}\}$ from ([9](https://arxiv.org/html/2605.15363#S4.E9)) and outputs a latent representation sequence $\{Z_{t-N+1}, \dots, Z_{t}\}$, where each vector captures long-range temporal dependencies and cross-KPI interactions. The resulting encoder representations summarize recent network dynamics and are used by the decoder to condition future forecasts.
#### IV-B2 Decoder Operation During Training
During training, the decoder uses a teacher forcing scheme \[[12](https://arxiv.org/html/2605.15363#bib.bib11)\], which stabilizes optimization and enables the model to learn accurate temporal dependencies. At each decoding step, the model is supplied with two types of inputs: (i) future categorical metadata, including the month, weekday, hour, minute, and carrier identity, all of which are known deterministically for future intervals; and (ii) continuous KPI values shifted by one time step, such that the decoder receives the ground truth continuous features from step $k-1$ when predicting step $k$. For the first prediction step, the continuous inputs are initialized to zero since no earlier ground truth values exist. A causal attention mask ensures that each prediction step only attends to earlier steps within the output horizon. Through cross-attention, the decoder aligns future predictions with the encoder representation $Z_{t-N+1:t}$, enabling it to integrate both historical context and future time metadata. The decoder prediction head generates

$$\widehat{X}_{t+1:t+M} = f_{\theta}(Z_{t-N+1:t}), \tag{10}$$

where each vector $\widehat{X}_{t+k}$, $k = 1, \dots, M$, contains: (i) deterministic predictions for all non-PRB KPIs; and (ii) quantile predictions $\widehat{r}_{t+k,q}$ for residual PRBs with $q \in \{0.1, 0.5, 0.9\}$. This hybrid formulation enables probabilistic modeling of PRB availability while keeping the output dimension compact.
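The teacher-forcing inputs described above can be sketched as a shift-by-one of the ground-truth KPI block, paired with a causal mask. Both helper names are hypothetical, and the mask convention (step $i$ may attend to positions $j \le i$, where position $i$ holds the ground truth from step $i-1$) is an assumption about how the causal constraint is realised, not a detail stated in the paper.

```python
def teacher_forcing_inputs(targets: list) -> list:
    """Shift ground-truth KPI vectors right by one step, zero-filling step 1."""
    if not targets:
        return []
    zeros = [0.0] * len(targets[0])
    return [zeros] + [list(v) for v in targets[:-1]]


def causal_mask(m: int) -> list:
    """mask[i][j] is True where decoder position i may attend to position j."""
    return [[j <= i for j in range(m)] for i in range(m)]


# Hypothetical M = 2 block of two-dimensional ground-truth KPI vectors.
targets = [[1.0, 2.0], [3.0, 4.0]]
decoder_inputs = teacher_forcing_inputs(targets)
mask = causal_mask(len(targets))
```

With this layout the decoder never sees the value it is asked to predict: the input at position $k$ carries the ground truth of step $k-1$, and the mask blocks all later positions.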
#### IV-B3 Decoder Operation During Inference
During inference, ground truth future continuous KPI values are unavailable, so the decoder operates in a fully autoregressive mode. A future metadata sequence containing categorical attributes such as month, weekday, hour, minute, and carrier ID is generated in advance since these values are known deterministically. The corresponding continuous inputs for these future steps are initialized to zero, forming the initial decoder input. The decoder then produces a full block of predictions for the next $M$ time steps. These predicted continuous values are fed back into the model: the newest $M$ predictions replace the oldest entries in the encoder input window, while the window length remains fixed at $N$. The updated window is then re-encoded and passed to the decoder to generate the next block of forecasts ([6](https://arxiv.org/html/2605.15363#S3.E6)).
The encoder\-decoder pipeline enables the model to learn both long\-term temporal structure from historical windows and short\-term fluctuations from localized KPI trends\. Deterministic outputs stabilize auxiliary KPI prediction, while quantile\-based residual PRB forecasting provides uncertainty estimates essential for slice admission, congestion avoidance, and RAN resource optimization\.
#### IV-B4 Loss Function and Probabilistic Training Objective
Future PRB availability is uncertain due to variations in user mobility, traffic bursts, and scheduling decisions. To model this uncertainty, PRB-RUPFormer predicts three quantiles of the residual PRB distribution: $q = 0.1$, $0.5$, and $0.9$. These correspond to pessimistic, median, and optimistic estimates of future residual PRBs. Formally, the model outputs $\widehat{r}_{t+k,q}$, which denotes the predicted $q$-quantile of the residual PRB at future step $t+k$. It is defined such that

$$\Pr\big(r_{t+k} \leq \widehat{r}_{t+k,q}\big) = q, \tag{11}$$

meaning that a fraction $q$ of the possible future outcomes for the residual PRB is expected to fall below the predicted value.
Each quantile is trained using the pinball (quantile) loss \[[6](https://arxiv.org/html/2605.15363#bib.bib10)\]:

$$L_q(y, \hat{y}) = \begin{cases} q\,(y - \hat{y}), & \text{if } y > \hat{y}, \\[4pt] (1 - q)\,(\hat{y} - y), & \text{if } y \leq \hat{y}, \end{cases} \tag{12}$$

which penalizes under- and over-prediction asymmetrically to ensure correct quantile behavior. The combined set of quantiles produces calibrated confidence intervals, such as the interval between the tenth and ninetieth percentiles, which has a nominal coverage of 80%. These intervals quantify uncertainty in future PRB availability and support more reliable slice admission, congestion avoidance, and energy-efficient resource allocation.
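The asymmetry of the pinball loss in Eq. (12) is easiest to see numerically. The sketch below evaluates the loss for a single observation; the function name and the example values (a true residual ratio of 0.5 and forecasts that miss by 0.25 in either direction) are hypothetical.

```python
def pinball_loss(y: float, y_hat: float, q: float) -> float:
    """Pinball (quantile) loss of Eq. (12) for a single observation."""
    if not 0.0 < q < 1.0:
        raise ValueError("q must lie strictly between 0 and 1")
    if y > y_hat:
        return q * (y - y_hat)          # forecast below the truth
    return (1.0 - q) * (y_hat - y)      # forecast at or above the truth


# Hypothetical case at q = 0.9: the same absolute error of 0.25,
# once as an under-prediction and once as an over-prediction.
under = pinball_loss(0.5, 0.25, 0.9)
over = pinball_loss(0.5, 0.75, 0.9)
```

At $q = 0.9$ the under-prediction costs roughly nine times as much as the over-prediction, which pushes the learned estimate upward until about 90% of outcomes fall below it. At $q = 0.5$ the two branches are weighted equally and the loss reduces to half the absolute error.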
For all other KPIs, the model generates deterministic predictions trained using standard L2 loss. Since the paper focuses primarily on accurate residual PRB forecasting, the loss terms are weighted to emphasize quantile prediction. Let $L_{\text{L2}}$ denote the mean squared error across non-PRB KPIs and $L_{\text{Q}}$ the sum of quantile losses ([12](https://arxiv.org/html/2605.15363#S4.E12)) across $q \in \{0.1, 0.5, 0.9\}$. The total training loss is

$$L_{\text{total}} = \alpha\, L_{\text{L2}} + \beta\, L_{\text{Q}}, \tag{13}$$

where $\alpha$ and $\beta$ control the relative emphasis on deterministic KPI prediction and probabilistic residual PRB forecasting. In practice, a larger $\beta$ is selected to prioritize accurate modeling of residual PRB uncertainty.
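A minimal sketch of the weighted objective in Eq. (13) for a single forecast step follows. It is illustrative only: the paper does not state how the loss is reduced over the batch and horizon, so this version handles one step, and the helper names and example values are hypothetical. The weights $\alpha = 0.9$ and $\beta = 1.2$ are the values the paper reports in its training setup.

```python
QUANTILES = (0.1, 0.5, 0.9)


def pinball(y: float, y_hat: float, q: float) -> float:
    """Pinball loss of Eq. (12), written in terms of the error y - y_hat."""
    diff = y - y_hat
    return q * diff if diff > 0 else (q - 1.0) * diff


def total_loss(kpi_true, kpi_pred, r_true, r_quantiles, alpha, beta):
    """L_total = alpha * L_L2 + beta * L_Q for one forecast step."""
    # Mean squared error over the deterministic (non-PRB) KPIs.
    l2 = sum((a - b) ** 2 for a, b in zip(kpi_true, kpi_pred)) / len(kpi_true)
    # Sum of pinball losses over the three predicted quantiles.
    lq = sum(pinball(r_true, r_quantiles[q], q) for q in QUANTILES)
    return alpha * l2 + beta * lq


# Hypothetical step: two auxiliary KPIs and a residual ratio of 0.5.
loss = total_loss(
    kpi_true=[1.0, 2.0], kpi_pred=[0.0, 2.0],
    r_true=0.5, r_quantiles={0.1: 0.4, 0.5: 0.5, 0.9: 0.6},
    alpha=0.9, beta=1.2,
)
```

In this example the L2 term is 0.5 and each of the outer quantiles contributes a pinball loss of about 0.01, so the residual PRB term is small because the true value sits inside the predicted interval.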
For all non-PRB KPIs, PRB-RUPFormer outputs single deterministic values, whereas residual PRBs are predicted through multiple quantiles. A single representative residual PRB value is therefore required for constructing the next input window during recursive inference ([6](https://arxiv.org/html/2605.15363#S3.E6)). For this purpose, the median estimate (the $q = 0.5$ quantile) is used as the residual PRB value for the subsequent prediction step. The median provides a stable central estimate that avoids optimistic or pessimistic bias.
## V Numerical Results and Analysis
### V-A Dataset Description and Training Setup
The evaluation is performed using operational LTE data collected from multiple commercial eNBs deployed across different locations in the United States. The dataset spans approximately six months of measurements and contains KPI records for eNBs configured with three sectors and seven carriers per sector. The first five months of data are used for training the proposed PRB-RUPFormer model, the subsequent 15 days serve as a validation set for hyperparameter tuning, and the final 15 days are held out for testing. This chronological partition ensures that all evaluations reflect true future prediction behavior under realistic network operating conditions.
The proposed PRB-RUPFormer model is implemented in PyTorch and executed on both a GPU platform (NVIDIA Tesla V100-PCIE-32GB) and a high-performance CPU server equipped with a 44-core, 88-thread x86_64 processor operating at up to 3.7 GHz. This setup enables evaluation of computational efficiency under heterogeneous deployment conditions. PRB-RUPFormer adopts a Transformer encoder–decoder architecture with a model dimensionality of $d_{\text{emb}} = 64$. The encoder consists of $\ell_e = 2$ layers, while the decoder contains $\ell_d = 3$ layers, each employing multi-head attention with eight heads and a feed-forward dimensionality of 256. A dropout rate of 0.1 is applied across all layers. All categorical and temporal embeddings, including positional and calendar-based components, are projected to the same dimensionality and learned jointly with the model parameters.
For all experiments, the input sequence length is fixed to $N = 4$ historical time steps, corresponding to one hour of aggregated KPI information. The model predicts $M = 2$ future steps in each decoding cycle. During inference, extended forecasting horizons are produced recursively by sliding the input window forward and replacing the oldest historical measurements with model-generated forecasts ([6](https://arxiv.org/html/2605.15363#S3.E6)). Training is performed for 200 epochs with a batch size of 400. The Adam optimizer is used with a learning rate of $10^{-4}$, a weight decay factor of $10^{-5}$, and gradient clipping with a maximum norm of 1.0 to prevent divergence. Early stopping is applied with a patience threshold of 10 epochs and a minimum improvement tolerance of $10^{-5}$. The weighting coefficients used in the training loss function ([13](https://arxiv.org/html/2605.15363#S4.E13)), namely $\alpha = 0.9$ and $\beta = 1.2$, are selected to place greater emphasis on modeling the uncertainty of residual PRBs, which is the primary objective of this work. In the results that follow, we plot the normalized residual PRB ratio. Following common practice for operational network datasets, residual PRB values are normalized to avoid disclosing absolute spectrum utilization levels.
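The window sizes above imply a fixed sampling interval and a fixed number of decoding cycles per forecast horizon. The short calculation below makes this explicit, assuming uniformly spaced samples and assuming the window advances by all $M = 2$ predicted steps in each cycle. The constant and function names are illustrative.

```python
STEPS_PER_HOUR = 4                    # N = 4 input steps span one hour
STEP_MINUTES = 60 // STEPS_PER_HOUR   # implied sampling interval in minutes
M_OUT = 2                             # steps predicted per decoding cycle

def rollout_plan(days):
    """Return (total steps, decoding cycles) for a horizon given in days."""
    steps = days * 24 * STEPS_PER_HOUR
    cycles = -(-steps // M_OUT)       # ceiling division
    return steps, cycles

for days in (1, 7):
    steps, cycles = rollout_plan(days)
    print(f"{days}-day horizon: {steps} steps, {cycles} decoding cycles")
```

Under these assumptions, a one-day forecast covers 96 steps in 48 cycles and a seven-day forecast covers 672 steps in 336 cycles, all generated without access to ground truth after the initial one-hour window.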
(a) One day ahead residual PRB prediction at Location A.
(b) One day ahead residual PRB prediction at Location B.
Figure 5: One day ahead recursive forecasting of residual PRBs for two representative LTE carriers deployed at two distinct eNB locations.
### V-B Evaluation Metrics
To assess the performance of PRB-RUPFormer in both short- and long-horizon forecasting, two complementary classes of metrics are considered: (i) deterministic metrics that evaluate the accuracy of the median point forecast, and (ii) probabilistic calibration metrics that measure the reliability of the predicted quantile intervals. Together, these metrics quantify how well the model captures both the central trend and the uncertainty structure of residual PRBs under realistic network variability.
#### V-B1 Median Forecast Error
The recursive forecasting pipeline relies on the median quantile ($q = 0.5$) for rollout, making its accuracy central to overall prediction quality. Let $r_{t+k}$ denote the ground-truth residual PRB ratio at step $k$, and $\widehat{r}_{t+k,0.5}$ the corresponding predicted median. The mean absolute error (MAE) is computed as
$$\mathrm{MAE} = \frac{1}{K}\sum_{k=1}^{K}\left| r_{t+k} - \widehat{r}_{t+k,0.5} \right|, \tag{14}$$
where $K$ is the forecast horizon. This metric provides a direct measure of the mismatch between the predicted and actual median trajectories.
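A minimal sketch of (14) is given below. It also returns the standard deviation of the absolute errors, which the results section reports alongside the MAE. The paper does not state whether a population or sample standard deviation is used, so the population form is assumed here. The input values are made up for illustration.

```python
from statistics import mean, pstdev

def median_forecast_errors(actual, predicted_median):
    """Return (MAE, std of absolute error) for the median forecast."""
    abs_err = [abs(r - r_hat) for r, r_hat in zip(actual, predicted_median)]
    return mean(abs_err), pstdev(abs_err)

# Toy normalized residual PRB ratios (hypothetical values).
actual = [0.60, 0.55, 0.70, 0.65]
median = [0.58, 0.57, 0.66, 0.65]
mae, err_std = median_forecast_errors(actual, median)
print(f"MAE = {mae:.4f}, std = {err_std:.4f}")
```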
#### V-B2 Hit Probability
Since PRB-RUPFormer predicts quantiles at $q = 0.1$, $0.5$, and $0.9$, an 80 percent prediction interval is defined as
$$\big[\widehat{r}_{t+k,0.1},\ \widehat{r}_{t+k,0.9}\big]. \tag{15}$$
The hit probability measures the proportion of time steps for which the true value lies within the predicted interval:
$$\mathrm{HitProb} = \frac{1}{K}\sum_{k=1}^{K}\mathbf{1}\left\{\widehat{r}_{t+k,0.1} \leq r_{t+k} \leq \widehat{r}_{t+k,0.9}\right\}. \tag{16}$$
A hit probability close to the nominal confidence level (e.g., 0.8) indicates well-calibrated uncertainty estimates. This metric is particularly important for network functions that depend on forecast reliability, such as proactive scheduling or slice resource guarantees.
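The hit probability in (16) reduces to counting how often the observation falls inside the predicted 10th to 90th percentile band. The sketch below shows this with hypothetical values. The function name and inputs are illustrative only.

```python
def hit_probability(actual, q10, q90):
    """Fraction of steps whose true value lies in [q10, q90] (inclusive)."""
    hits = sum(1 for r, lo, hi in zip(actual, q10, q90) if lo <= r <= hi)
    return hits / len(actual)

# Toy example: the fourth observation drops below the lower quantile.
actual = [0.60, 0.55, 0.70, 0.40, 0.65]
q10 = [0.50, 0.50, 0.60, 0.50, 0.60]
q90 = [0.70, 0.60, 0.80, 0.70, 0.70]
print(hit_probability(actual, q10, q90))
```

Here four of the five observations are covered, giving a hit probability of 0.8, which matches the nominal level of the interval. Values well above 0.8 would suggest intervals that are wider than necessary, and values below would suggest overconfident intervals.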
### V-C Experimental Results
#### V-C1 One Day Recursive Forecasting Evaluation
This subsection evaluates the one-day recursive forecasting performance of PRB-RUPFormer using an input window of $N = 4$ historical KPI samples. The model recursively predicts the next $M = 96$ time steps, corresponding to a full 24-hour horizon, without access to future ground-truth values. Fig. [5](https://arxiv.org/html/2605.15363#S5.F5) presents the resulting one-day ahead residual PRB forecasts for a representative LTE carrier at two geographically distinct eNB locations in the United States, referred to as Location A and Location B. For Location A, the proposed PRB-RUPFormer achieves a low median forecast error, with an MAE of 0.0182 on the residual PRB ratio. The predicted median trajectory closely follows the ground truth across both low-load nighttime periods and higher-variability daytime office hours. Forecast stability is reflected in a low standard deviation of the absolute median forecast error, equal to 0.0178. In addition, the hit probability reaches 0.949, well above the nominal 0.80 confidence level, indicating that the predicted 10th–90th percentile interval reliably captures the true residual PRB values even during rapid load changes.
In contrast, Location B exhibits more pronounced short\-term volatility, including sharper drops in residual PRBs during peak hours, indicative of sudden traffic bursts or scheduling irregularities\. Despite these challenges, the model maintains strong predictive performance, achieving an MAE of 0\.0204\. The increased variability in traffic conditions is reflected in a higher standard deviation of the absolute median forecast error, equal to 0\.0351, while the predicted uncertainty band appropriately expands to accommodate this variability\. The corresponding hit probability of 0\.898 remains above the nominal confidence level, confirming that the probabilistic forecasts remain well calibrated under more dynamic load patterns\.
Overall, these results highlight two important characteristics of the proposed model\. First, the median forecasts provide accurate point estimates across diverse operating regimes, as evidenced by consistently low MAE values at both locations\. Second, the adaptive width of the predicted confidence intervals enables robust uncertainty quantification, maintaining high hit probabilities even in the presence of abrupt traffic fluctuations\. Such behavior is particularly desirable for RAN control applications, where conservative yet informative estimates of future PRB availability are required to support proactive scheduling, congestion avoidance, and slice\-level decision making\.
#### V-C2 Seven Day Recursive Forecasting Evaluation
(a) Seven day ahead residual PRB prediction at Location A.
(b) Seven day ahead residual PRB prediction at Location B.
Figure 6: Seven day ahead recursive forecasting of residual PRBs for two representative LTE carriers deployed at two distinct eNB locations.
Next, we evaluate long-horizon performance by recursively forecasting residual PRBs over a seven-day horizon. In this setting, the model repeatedly rolls forward its predictions to form subsequent inputs, thereby testing robustness to error accumulation and nonstationary traffic dynamics across multiple diurnal cycles. Fig. [6](https://arxiv.org/html/2605.15363#S5.F6) reports results for the same two representative LTE carriers at Location A and Location B, where the median forecast ($q = 0.5$) is shown together with the 80% prediction interval ($q = 0.1$–$0.9$).
At Location A, the model tracks the repeating daily structure while remaining responsive to gradual level shifts across the week. The long-horizon median forecast yields $\mathrm{MAE} = 0.0298$ with $\mathrm{HitProb} = 0.8863$, indicating that the predicted 10th–90th percentile band contains the ground truth more often than the nominal 0.80 level. The wider uncertainty band observed during pronounced troughs reflects increased variability in residual PRB availability, which can arise from bursty traffic, scheduling changes, and short-term congestion episodes. Importantly, most sharp downward excursions remain covered by the prediction interval, suggesting that the quantile outputs capture the elevated uncertainty encountered during low-availability periods.
For Location B, the model achieves a smaller median error of $\mathrm{MAE} = 0.0175$ with $\mathrm{HitProb} = 0.8994$ over the full seven-day rollout. As seen in Fig. [6](https://arxiv.org/html/2605.15363#S5.F6)(b), the predicted median closely follows the day-to-day cycles, while the prediction interval remains tight yet still encloses most abrupt deviations. Overall, both locations maintain hit probabilities above the nominal 80% target, demonstrating that PRB-RUPFormer preserves probabilistic calibration under extended recursive forecasting. The observed differences between locations primarily reflect site-specific traffic volatility and the extent of abrupt residual PRB drops, which directly influence the required prediction-interval width and the difficulty of maintaining a low-error median trajectory over long horizons.
#### V-C3 Aggregate Carrier-Level Performance
In addition to the representative carrier results shown above, we also evaluate the 7-day recursive forecasting performance across all carriers deployed at Location A. Averaging over all source cells, the proposed PRB-RUPFormer achieves a mean absolute error of 0.0449, with a standard deviation of 0.0207, indicating consistent prediction accuracy across heterogeneous carriers. The corresponding average hit probability is 0.8174, which remains above the nominal 0.80 confidence level. These results demonstrate that the model generalizes well beyond individual carriers and maintains well-calibrated uncertainty estimates under diverse traffic and load conditions within the same eNB.
#### V-C4 Computational Complexity and Runtime Performance
TABLE I: Computational characteristics of PRB-RUPFormer.
The proposed PRB-RUPFormer is intentionally designed to be lightweight, containing fewer than 0.309 million parameters with a total model size of 1.18 MB. This compact architecture enables efficient training and real-time inference on both GPU and CPU platforms.
Table [I](https://arxiv.org/html/2605.15363#S5.T1) summarizes the computational performance. Training on an NVIDIA Tesla V100 completes in 11.03 minutes, while CPU training on a 44-core/88-thread processor requires 12.54 minutes, demonstrating minimal dependence on hardware acceleration. GPU memory usage remains low (≈80 MB), and both platforms achieve an inference latency of 5 ms, making the model suitable for deployment in latency-sensitive RAN applications. These results show that the model is computationally efficient, fast to train, and capable of real-time inference even on CPU-only systems, supporting practical deployment in RAN controllers and edge computing nodes.
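As a quick sanity check on the reported footprint, the calculation below relates the parameter count to the model size. It assumes 32-bit floating-point weights and treats 0.309 million as the parameter count. Neither the storage precision nor the unit convention (MB versus MiB) is stated in the paper, so this is an inference rather than a confirmed detail.

```python
NUM_PARAMS = 309_000      # about 0.309 million parameters (upper bound from the text)
BYTES_PER_PARAM = 4       # assumption: float32 weights

size_bytes = NUM_PARAMS * BYTES_PER_PARAM
size_mib = size_bytes / 2**20   # binary megabytes
size_mb = size_bytes / 1e6      # decimal megabytes
print(f"{size_mib:.2f} MiB ({size_mb:.2f} MB)")
```

Under these assumptions the weights occupy about 1.18 MiB, which agrees with the reported 1.18 MB if the size is measured in binary megabytes.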
## VI Conclusion
This paper presented PRB-RUPFormer, a recursive unified probabilistic Transformer for residual PRB forecasting across short- and long-term horizons in commercial LTE networks. The proposed framework jointly models multivariate KPI dynamics across all carriers and sectors of an eNB using temporal, seasonal, and carrier-aware embeddings, enabling a single lightweight model to capture cross-carrier and cross-sector coupling. By producing quantile-based prediction intervals and supporting recursive multi-step rollout, PRB-RUPFormer provides uncertainty-aware residual PRB forecasts over horizons ranging from minutes to multiple days. Evaluation on six months of operational LTE data from multiple U.S. locations demonstrates accurate median prediction performance and well-calibrated uncertainty estimates for both one-day and seven-day recursive forecasting, while maintaining low inference latency suitable for real-time deployment. These probabilistic residual PRB forecasts serve as a practical proxy for short- and medium-term spectrum availability and directly support spectrum-aware RAN functions such as dynamic carrier activation, congestion avoidance, and risk-aware spectrum sharing. Future work will extend the framework to multi-eNB and multi-operator scenarios, incorporate external context signals such as events and mobility indicators to improve robustness under abrupt non-stationarity, and integrate PRB forecasts into closed-loop DSA control policies for proactive spectrum allocation and sharing.
## Acknowledgment
The authors would like to thank Shankaranarayanan Puzhavakath Narayanan for insightful discussions and thoughtful comments that helped improve this work\.
## References
- \[1\] S. Bhattarai, J. J. Park, B. Gao, K. Bian, and W. Lehr (2016). An overview of dynamic spectrum sharing: ongoing initiatives, challenges, and a roadmap for future research. IEEE Trans. Cognitive Commun. Netw. 2(2), pp. 110–128. doi: [10.1109/TCCN.2016.2592921](https://dx.doi.org/10.1109/TCCN.2016.2592921).
- \[2\] A. Farajzadeh, H. Zheng, S. Dumoulin, T. Ha, H. Yanikomeroglu, and A. Ghasemi (2025). Data-driven spectrum demand prediction: a spatio-temporal framework with transfer learning. arXiv preprint arXiv:2508.03863.
- \[3\] C. Gutterman, E. Grinshpun, S. Sharma, and G. Zussman (2019). RAN resource usage prediction for a 5G slice broker. In Proc. ACM Int. Symp. Mobile Ad Hoc Netw. Comput. (MobiHoc), pp. 231–240.
- \[4\] F. Jiang, L. Liu, R. Cheng, S. Wang, J. Meng, S. Zhang, and M. Li (2023). Network downlink PRB utilization rate forecasting and evaluation method based on multi-feature construction. In Proc. IEEE Joint Int. Inf. Technol. Artif. Intell. Conf. (ITAIC), pp. 1419–1423.
- \[5\] V. Kasuluru, L. Blanco, and E. Zeydan (2023). On the use of probabilistic forecasting for network analysis in Open RAN. In Proc. IEEE Int. Mediterranean Conf. Commun. Netw. (MeditCom), pp. 258–263.
- \[6\] B. Lim, S. Ö. Arık, N. Loeff, and T. Pfister (2021). Temporal fusion transformers for interpretable multi-horizon time series forecasting. Int. J. Forecast. 37(4), pp. 1748–1764.
- \[7\] D. Minovski, N. Ögren, K. Mitra, and C. Åhlund (2021). Throughput prediction using machine learning in LTE and 5G networks. IEEE Trans. Mobile Comput. 22(3), pp. 1825–1840.
- \[8\] A. Mostafa, M. A. Elattar, and T. Ismail (2022). Downlink throughput prediction in LTE cellular networks using time series forecasting. In Proc. IEEE Int. Conf. Broadband Commun. Next Generation Networks Multimedia Appl. (CoBCom), pp. 1–4.
- \[9\] A. M. Nagib, H. Abou-Zeid, H. S. Hassanein, A. B. Sediq, and G. Boudreau (2021). Deep learning-based forecasting of cellular network utilization at millisecond resolutions. In Proc. IEEE Int. Conf. Commun. (ICC), pp. 1–6.
- \[10\] M. Polese, L. Bonati, S. D’Oro, S. Basagni, and T. Melodia (2023). Understanding O-RAN: architecture, interfaces, algorithms, security, and research challenges. IEEE Commun. Surv. Tutor. 25(2), pp. 1376–1411. doi: [10.1109/COMST.2023.3239220](https://dx.doi.org/10.1109/COMST.2023.3239220).
- \[11\] G. Premsankar, G. Piao, P. K. Nicholson, M. Di Francesco, and D. Lugones (2021). Data-driven energy conservation in cellular networks: a systems approach. IEEE Trans. Netw. Service Manag. 18(3), pp. 3567–3582.
- \[12\] I. Sutskever, O. Vinyals, and Q. V. Le (2014). Sequence to sequence learning with neural networks. In Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Vol. 27.
- \[13\] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017). Attention is all you need. In Proc. Adv. Neural Inf. Process. Syst. (NeurIPS), Vol. 30.