Semantics-Enhanced Retrieval-Augmented Time Series Forecasting

arXiv cs.AI Papers

Summary

Proposes SERAF, a multimodal retrieval-augmented framework for time series forecasting that uses both numerical similarity and self-generated textual descriptions to retrieve historical patterns, improving forecasting under non-stationarity. Experiments on seven real-world datasets show effectiveness over state-of-the-art baselines.

arXiv:2606.14941v1 Announce Type: new Abstract: Time series forecasting models often benefit from historical patterns. Inspired by Retrieval-Augmented Generation (RAG), recent research explored retrieving relevant historical time series segments to enhance forecasting. However, relying solely on time series similarity is often insufficient for retrieval under non-stationarity. To address this, we propose a multimodal approach: a **S**emantics-**E**nhanced **R**etrieval-**A**ugmented Time Series **F**orecasting framework, SERAF. Unlike mainstream approaches that depend only on time series similarity, SERAF conducts dual retrieval over the time series and their self-generated textual descriptions. It retrieves two complementary sets of historical patterns and corresponding futures, which are selectively and jointly used to guide future predictions. Experiments across seven real-world datasets demonstrate the effectiveness of SERAF in bridging numerical and semantic views of time series compared with state-of-the-art baselines.

Cached at: 06/16/26, 11:43 AM

# Semantics-Enhanced Retrieval-Augmented Time Series Forecasting
Source: [https://arxiv.org/html/2606.14941](https://arxiv.org/html/2606.14941)

Keywords: Time Series Forecasting, Retrieval-Augmented Generation, Multimodal Retrieval

## 1 Introduction

Multivariate time series forecasting predicts future trajectories from historical observations and is central to traffic (Lippi et al., [2013](https://arxiv.org/html/2606.14941#bib.bib8)), energy (Daut et al., [2017](https://arxiv.org/html/2606.14941#bib.bib6)), finance (Poon and Granger, [2003](https://arxiv.org/html/2606.14941#bib.bib19)), and climate (Price et al., [2025](https://arxiv.org/html/2606.14941#bib.bib7)). Methods have progressed from classical statistical models such as ARIMA (Box et al., [2015](https://arxiv.org/html/2606.14941#bib.bib9)) to learned forecasters such as DLinear, a lightweight yet robust linear model (Zeng et al., [2022](https://arxiv.org/html/2606.14941#bib.bib2)), and PatchTST, which uses patch-level time series representations (Nie et al., [2025](https://arxiv.org/html/2606.14941#bib.bib3)). More recently, LLM-based methods such as Time-LLM (Jin et al., [2023](https://arxiv.org/html/2606.14941#bib.bib4)) and GPT4TS (Zhou et al., [2023](https://arxiv.org/html/2606.14941#bib.bib5)) have used textual context to inject background knowledge into forecasting.

A widely adopted approach for supporting context-aware generation is retrieval-augmented generation (RAG), which retrieves relevant documents from large external databases and has become a key component of modern LLM pipelines (Lewis et al., [2020](https://arxiv.org/html/2606.14941#bib.bib10)). Motivated by RAG, recent studies on time series forecasting have explored retrieval-based approaches that build a database of historical segments and retrieve similar patterns from it, thereby explicitly leveraging the entire history to guide future prediction. Representative efforts in this direction include RAFT, which introduces multi-periodicity into retrieval-based forecasting (Han et al., [2025](https://arxiv.org/html/2606.14941#bib.bib1)); TimeRAG, which integrates retrieved sequences into LLM-based forecasters (Yang et al., [2025](https://arxiv.org/html/2606.14941#bib.bib14)); TRACE, which aligns external text with time series for multimodal retrieval (Chen et al., [2025](https://arxiv.org/html/2606.14941#bib.bib13)); TS-RAG, which augments time series foundation models (TSFMs) via adaptive retrieval (Ning et al., [2025](https://arxiv.org/html/2606.14941#bib.bib12)); and TimeRAF, which incorporates channel prompting into retrieval-augmented TSFMs (Zhang et al., [2025](https://arxiv.org/html/2606.14941#bib.bib11)).

However, for challenging non-stationary time series, most retrieval methods remain confined to time series similarity (Han et al., [2025](https://arxiv.org/html/2606.14941#bib.bib1); Yang et al., [2025](https://arxiv.org/html/2606.14941#bib.bib14); Ning et al., [2025](https://arxiv.org/html/2606.14941#bib.bib12); Zhang et al., [2025](https://arxiv.org/html/2606.14941#bib.bib11)), while multimodal approaches often rely on large external text corpora or LLM-generated descriptions, which limits their efficiency and scalability (Chen et al., [2025](https://arxiv.org/html/2606.14941#bib.bib13)). To address this gap, we propose SERAF, a Semantics-Enhanced Retrieval-Augmented Time Series Forecasting framework. Beyond retrieving relevant patterns via time series (TS) similarity, SERAF introduces a semantic retrieval module based on textual descriptions generated directly from time series segments. SERAF performs retrieval from both temporal and semantic perspectives and adaptively fuses the retrieved results to enhance forecasting, all within a lightweight pipeline that requires no external annotations or domain-specific texts.

This design is useful because two time series segments may differ in raw scale or local shape while sharing high-level attributes such as season, trend, and volatility. By indexing these attributes as text, SERAF can retrieve semantically relevant historical segments, together with their futures, that are not necessarily nearest under TS similarity, complementing numerical retrieval without relying on external text resources.

Our main contributions are summarized as follows:

- We propose a novel semantic retrieval strategy based on textual descriptions, enriching the retrieval process beyond purely numerical time series similarity.
- The textual descriptions are generated automatically from time series segments and require no external text, which keeps the pipeline efficient and scalable.
- Extensive experiments on seven real-world datasets demonstrate that SERAF improves forecasting accuracy over state-of-the-art baselines.

![Figure 1](https://arxiv.org/html/2606.14941v1/x1.png)

Figure 1: Overview of SERAF. Each retrieved TS pair contains a historical time series segment and its corresponding future.

![Figure 2](https://arxiv.org/html/2606.14941v1/x2.png)

Figure 2: An example of a textual description of a time series.

## 2 Method

### 2.1 Problem Formulation

Given a historical input sequence $\mathbf{X}_{t-L+1:t} = \{X_{t-L+1}, X_{t-L+2}, \dots, X_{t}\} \in \mathbb{R}^{L \times C}$, where $L$ is the look-back window and $C$ is the number of channels, the goal is to predict $\mathbf{Y}_{t+1:t+H} = \{Y_{t+1}, Y_{t+2}, \dots, Y_{t+H}\} \in \mathbb{R}^{H \times C}$ over the next $H$ time steps.

### 2.2 Overview

As shown in Figure [1](https://arxiv.org/html/2606.14941#S1.F1), SERAF performs retrieval-augmented forecasting using both temporal and semantic information. Given an input time series, a trainable encoder produces a naive prediction, while TS-similar historical segments are retrieved from a time series database. In parallel, a textual description of the input is embedded by a frozen text model to retrieve semantically similar descriptions from an aligned description database. The retrieved futures from both modalities are Gaussian-weighted, fused, gated with the naive prediction, and projected to produce the final prediction. This semantic retrieval dimension enriches the search space beyond temporal patterns and improves forecasting.

### 2.3 TS Database and TS Description Database

We construct the time series (TS) database by sliding a length-$L$ window over the training set with stride 1 for dense historical coverage. Each segment is paired with its future sequence of length $H$. The TS database is $D_T = \{(\mathbf{P}_T^i, \mathbf{F}_T^i)\}_{i=1}^{N}$, where each pair contains a historical segment $\mathbf{P}_T^i$ and its future $\mathbf{F}_T^i$.
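A minimal sketch of this construction follows, assuming the training split is a Python sequence of observations (scalars or per-channel vectors) and using 0-based slicing in code, whereas the paper indexes pairs from 1. `build_ts_database` is a hypothetical helper, not the authors' implementation.

```python
from typing import List, Sequence, Tuple, TypeVar

Obs = TypeVar("Obs")  # one observation: a float or a per-channel vector


def build_ts_database(
    train: Sequence[Obs], L: int, H: int
) -> List[Tuple[List[Obs], List[Obs]]]:
    """Slide a length-L window with stride 1 and pair it with the next H steps.

    Returns N = len(train) - L - H + 1 pairs (P_T^i, F_T^i), or [] if too short.
    """
    if L <= 0 or H <= 0:
        raise ValueError("L and H must be positive")
    n = len(train) - L - H + 1
    return [
        (list(train[s : s + L]), list(train[s + L : s + L + H]))
        for s in range(max(n, 0))
    ]


# Toy example: 10 univariate steps, look-back L=3, horizon H=2.
db = build_ts_database(list(range(10)), L=3, H=2)
print(len(db))  # 6
print(db[0])    # ([0, 1, 2], [3, 4])
```

For a training split of length $T$ this yields $N = T - L - H + 1$ pairs, so with stride 1 the database grows linearly with the training length.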

In parallel, we build a TS description database by generating a natural-language description for each pair in $D_T$ with a predefined template. As illustrated in Figure [2](https://arxiv.org/html/2606.14941#S1.F2), each description includes the time period, season, main trend, and main volatility. The time period and season are derived from timestamps, while the main trend and main volatility are the most frequent channel-level patterns. Trend is categorized as upward, downward, or stable, and volatility as high, medium, or low. We denote the database as $D_S = \{\mathbf{Q}_S^i\}_{i=1}^{N}$, where each description $\mathbf{Q}_S^i$ is aligned with $\mathbf{P}_T^i$ in $D_T$.
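The description template can be sketched as follows. The sketch keeps the structure described above (time period and season from timestamps, majority-vote trend and volatility across channels), but everything else is an assumption: the template wording (Figure 2's exact text is not reproduced here), the least-squares trend statistic, the volatility statistic on first differences, the numeric thresholds, and the Northern-Hemisphere meteorological season mapping. All function names are hypothetical, and each channel needs at least two points.

```python
from collections import Counter
from datetime import datetime
from statistics import mean, pstdev
from typing import List, Sequence

# Hypothetical cut-offs: the paper does not report its thresholds.
TREND_EPS = 0.1                # |relative change| below this counts as "stable"
VOL_LOW, VOL_HIGH = 0.05, 0.2  # bands on relative step-to-step variability
_EPS = 1e-8


def season_of(ts: datetime) -> str:
    # Meteorological seasons, Northern Hemisphere (an assumption).
    return ("winter", "spring", "summer", "autumn")[(ts.month % 12) // 3]


def trend_label(x: Sequence[float]) -> str:
    n = len(x)
    tm, xm = (n - 1) / 2, mean(x)
    slope = sum((i - tm) * (v - xm) for i, v in enumerate(x)) / sum(
        (i - tm) ** 2 for i in range(n)
    )
    # Least-squares change over the window, relative to the mean absolute level.
    rel_change = slope * (n - 1) / (mean(abs(v) for v in x) + _EPS)
    if rel_change > TREND_EPS:
        return "upward"
    if rel_change < -TREND_EPS:
        return "downward"
    return "stable"


def volatility_label(x: Sequence[float]) -> str:
    diffs = [b - a for a, b in zip(x, x[1:])]
    vol = pstdev(diffs) / (mean(abs(v) for v in x) + _EPS)
    if vol < VOL_LOW:
        return "low"
    return "medium" if vol < VOL_HIGH else "high"


def describe_segment(
    channels: List[Sequence[float]], start: datetime, end: datetime
) -> str:
    """Fill a hypothetical template; trend/volatility take the majority label over channels."""
    trend = Counter(trend_label(c) for c in channels).most_common(1)[0][0]
    vol = Counter(volatility_label(c) for c in channels).most_common(1)[0][0]
    return (
        f"From {start:%Y-%m-%d %H:%M} to {end:%Y-%m-%d %H:%M} ({season_of(start)}): "
        f"the main trend is {trend} and the main volatility is {vol}."
    )


chans = [[1, 2, 3, 4, 5], [2, 3, 4, 5, 6], [5, 5, 5, 5, 5]]
desc = describe_segment(chans, datetime(2016, 7, 1, 0, 0), datetime(2016, 7, 1, 4, 0))
print(desc)
```

Two of the three toy channels rise and one is flat, so the majority label is "upward"; none of them fluctuates step to step, so volatility is "low".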

### 2.4 Retrieval from Time Series Similarity

For TS retrieval, given the $j$-th input sequence $\mathbf{X}^j$, we compute the similarity score $\rho_{ij}$ between $\mathbf{X}^j$ and each historical segment $\mathbf{P}_T^i$ in $D_T$ using a similarity function $\mathrm{sim}$:

$$
\rho_{ij} = \mathrm{sim}(\mathbf{X}^j, \mathbf{P}_T^i), \quad i \in [1, N]. \tag{1}
$$

We use Pearson's correlation coefficient because it is unaffected by value shifts and positive rescaling, so it compares segments by the shape of their co-movement rather than by their absolute level (Han et al., [2025](https://arxiv.org/html/2606.14941#bib.bib1)). To avoid leakage, historical segments in $D_T$ that overlap with the input are excluded during training. The valid index set is:

$$
\mathcal{I}_{\text{valid}} =
\begin{cases}
\{\, i \in [1, N] \mid i \notin [\max(1,\, j-(L+H-1)),\ \min(N,\, j+(L+H-1))] \,\}, & \text{if training},\\
\{\, i \in [1, N] \,\}, & \text{otherwise}.
\end{cases}
\tag{2}
$$

The Top-$K$ index set $\mathcal{K}_T^j$ and retrieval set $\mathcal{R}_T^j$ are:

$$
\mathcal{K}_T^j = \text{Top-}K\{\rho_{ij} \mid i \in \mathcal{I}_{\text{valid}}\}, \quad |\mathcal{K}_T^j| = K, \tag{3}
$$

$$
\mathcal{R}_T^j = \{\, (\mathbf{P}_T^k, \mathbf{F}_T^k) \mid k \in \mathcal{K}_T^j \,\}. \tag{4}
$$

The $K$ retrieved segments are weighted by a Gaussian kernel that assigns larger weights to higher similarities:

$$
\alpha_T^{kj} = \frac{\exp\!\left(-\frac{(1-\rho_{kj})^2}{2\tau^2}\right)}{\sum_{i \in \mathcal{K}_T^j} \exp\!\left(-\frac{(1-\rho_{ij})^2}{2\tau^2}\right)}, \tag{5}
$$

where $\tau$ is the Gaussian bandwidth.
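As a quick numerical check of Eq. (5), the sketch below computes the weights in plain Python. Subtracting the largest exponent before normalizing is a standard numerical-stability step that leaves the result unchanged; `gaussian_weights` is a hypothetical name.

```python
from math import exp
from typing import List, Sequence


def gaussian_weights(sims: Sequence[float], tau: float) -> List[float]:
    """Eq. (5): normalized Gaussian kernel on the distance 1 - rho from a perfect match."""
    logits = [-((1.0 - r) ** 2) / (2.0 * tau ** 2) for r in sims]
    m = max(logits)  # shift by the max before exponentiating, as in a stable softmax
    unnorm = [exp(g - m) for g in logits]
    z = sum(unnorm)
    return [u / z for u in unnorm]


w = gaussian_weights([0.9, 0.8], tau=0.1)
print([round(v, 4) for v in w])  # [0.8176, 0.1824]
```

With the paper's $\tau = 0.1$, similarities of 0.9 and 0.8 receive weights of about 0.82 and 0.18; with $\tau = 1$ the same pair would be weighted almost equally (about 0.504 and 0.496), so a small bandwidth makes the aggregation in Eq. (6) strongly favor the closest matches.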

The TS\-similarity retrieved future is:

$$
\hat{\mathbf{F}}_T^j = \sum_{k \in \mathcal{K}_T^j} \alpha_T^{kj}\, \mathbf{F}_T^k. \tag{6}
$$
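The candidate selection of Eqs. (1)–(4) can be sketched in plain Python as follows. This is a single-channel sketch under stated assumptions: the paper does not say how Pearson scores are combined across channels, the 1-based indexing and the alignment between input index $j$ and database index are read off Eq. (2), and scoring a constant segment as 0 is our own choice. `pearson`, `valid_indices`, and `retrieve_ts` are hypothetical names.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length univariate segments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0.0 or sy == 0.0:
        return 0.0  # assumption: a constant segment is scored as uncorrelated
    return cov / (sx * sy)


def valid_indices(j: int, n: int, L: int, H: int, training: bool) -> List[int]:
    """Eq. (2), 1-based; assumes input j and database entry j start at the same step."""
    if not training:
        return list(range(1, n + 1))
    lo, hi = max(1, j - (L + H - 1)), min(n, j + (L + H - 1))
    return [i for i in range(1, n + 1) if not lo <= i <= hi]


def retrieve_ts(
    x: Sequence[float],
    segments: Sequence[Sequence[float]],
    j: int, L: int, H: int, k: int, training: bool,
) -> List[Tuple[int, float]]:
    """Eqs. (1) and (3): score valid segments P_T^i by Pearson correlation, keep the Top-K."""
    n = len(segments)
    scored = [(i, pearson(x, segments[i - 1])) for i in valid_indices(j, n, L, H, training)]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]


segments = [[1, 2, 3], [3, 2, 1], [1, 2, 4], [10, 20, 30]]
top = retrieve_ts([1, 2, 3], segments, j=1, L=3, H=1, k=2, training=False)
print(sorted(i for i, _ in top))  # [1, 4]
print(valid_indices(5, 12, L=2, H=1, training=True))  # [1, 2, 8, 9, 10, 11, 12]
```

The toy example shows the property that motivates Pearson's correlation here: segment 4 is segment 1 multiplied by 10, yet it ties with segment 1 for the top score. The retrieval set of Eq. (4) is then the database pairs at the returned indices. Scoring costs $O(NL)$ per query, so a practical version would vectorize it.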

### 2.5 Retrieval from Semantic Similarity

For semantic retrieval, we generate a textual description $\mathbf{Q}^j$ for the input $\mathbf{X}^j$ using the same template as in Figure [2](https://arxiv.org/html/2606.14941#S1.F2). This extracts attributes from the time series itself without external text. A frozen text embedding model encodes $\mathbf{Q}^j$ into $\mathbf{E}^j$ and each historical description $\mathbf{Q}_S^i$ into $\mathbf{E}_S^i$. Semantic similarity is computed as:

$$
s_{ij} = \mathrm{sim}(\mathbf{E}^j, \mathbf{E}_S^i), \quad i \in [1, N], \tag{7}
$$

where $\mathrm{sim}$ is cosine similarity. Analogous to Equation [4](https://arxiv.org/html/2606.14941#S2.E4), we retrieve the Top-$K$ TS pairs and define:

$$
\mathcal{R}_S^j = \{\, (\mathbf{P}_T^k, \mathbf{F}_T^k) \mid k \in \mathcal{K}_S^j \,\}, \tag{8}
$$

where $\mathcal{K}_S^j$ is the Top-$K$ index set under $s_{ij}$. The retrieved segments are weighted as in Eq. [5](https://arxiv.org/html/2606.14941#S2.E5), with $s_{kj}$ in place of $\rho_{kj}$, giving:

$$
\hat{\mathbf{F}}_S^j = \sum_{k \in \mathcal{K}_S^j} \alpha_S^{kj}\, \mathbf{F}_T^k. \tag{9}
$$
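A minimal sketch of the semantic Top-$K$ step around Eq. (7) follows, with toy two-dimensional vectors standing in for the embeddings. In the paper the embeddings come from the frozen all-MiniLM-L6-v2 model, whose outputs are 384-dimensional; that encoding step is not shown. The paper does not say whether the leakage exclusion of Eq. (2) also applies here, so this sketch omits it. `cosine` and `retrieve_semantic` are hypothetical names.

```python
from math import sqrt
from typing import List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    nu, nv = sqrt(sum(a * a for a in u)), sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def retrieve_semantic(
    query_emb: Sequence[float], desc_embs: Sequence[Sequence[float]], k: int
) -> List[Tuple[int, float]]:
    """Eq. (7) plus Top-K: rank description embeddings E_S^i by cosine similarity (1-based)."""
    scored = [(i, cosine(query_emb, e)) for i, e in enumerate(desc_embs, start=1)]
    scored.sort(key=lambda p: p[1], reverse=True)
    return scored[:k]


# Toy 2-D stand-ins for the description embeddings E_S^i.
E_S = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
hits = retrieve_semantic([1.0, 0.0], E_S, k=2)
print([i for i, _ in hits])  # [1, 3]
```

The returned indices point into the same TS database $D_T$, which is why Eq. (9) aggregates futures $\mathbf{F}_T^k$ even though the ranking came from text.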

### 2.6 Fusion and Final Prediction

To adaptively balance the contributions of semantic and temporal retrieval, their aggregated futures $\hat{\mathbf{F}}_S^j$ and $\hat{\mathbf{F}}_T^j$ are first fused with a learnable weight:

$$
\hat{\mathbf{F}}^j = w\, \hat{\mathbf{F}}_S^j + (1-w)\, \hat{\mathbf{F}}_T^j, \tag{10}
$$

where $w \in (0, 1)$ is a trainable parameter.

In addition to the two retrieval modules, the input $\mathbf{X}^j$ is passed through a linear TS encoder to produce a naive prediction $\hat{\mathbf{X}}^j$. This output is then integrated with $\hat{\mathbf{F}}^j$ through a gating mechanism that dynamically balances their relative contributions:

$$
\mathbf{G}^j = \beta\, \hat{\mathbf{X}}^j + (1-\beta)\, \hat{\mathbf{F}}^j, \quad \beta = \sigma\!\left(W[\hat{\mathbf{X}}^j; \hat{\mathbf{F}}^j]\right), \tag{11}
$$

where $[\,\cdot\,;\cdot\,]$ denotes concatenation, $W$ is a learnable projection, and $\sigma(\cdot)$ is the sigmoid function.
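The fusion and gating steps of Eqs. (10)–(11) can be sketched for a single channel as below. Two details are assumptions not stated in the paper: $w$ is kept in $(0, 1)$ by passing an unconstrained parameter $\theta$ through a sigmoid, and $W$ produces one gate value per forecast step (an $H \times 2H$ matrix without bias, matching the form of Eq. (11)). `sigmoid`, `fuse`, and `gate` are hypothetical names.

```python
from math import exp
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + exp(-z))
    e = exp(z)
    return e / (1.0 + e)


def fuse(f_sem: Sequence[float], f_ts: Sequence[float], theta: float) -> List[float]:
    """Eq. (10), assuming w = sigmoid(theta) so that w stays in (0, 1)."""
    w = sigmoid(theta)
    return [w * s + (1.0 - w) * t for s, t in zip(f_sem, f_ts)]


def gate(
    x_hat: Sequence[float], f_hat: Sequence[float], W: Sequence[Sequence[float]]
) -> List[float]:
    """Eq. (11) with a hypothetical element-wise gate: W has H rows of length 2H."""
    z = list(x_hat) + list(f_hat)  # concatenation [x_hat; f_hat]
    beta = [sigmoid(sum(wi * zi for wi, zi in zip(row, z))) for row in W]
    return [b * x + (1.0 - b) * f for b, x, f in zip(beta, x_hat, f_hat)]


f_hat = fuse([1.0, 2.0], [3.0, 4.0], theta=0.0)         # w = 0.5
g = gate([0.0, 0.0], f_hat, W=[[0.0] * 4, [0.0] * 4])  # beta = 0.5
print(f_hat, g)  # [2.0, 3.0] [1.0, 1.5]
```

With zero-initialized parameters both $w$ and $\beta$ equal 0.5, so the model starts from plain averages of the two retrieval futures and of the encoder and retrieval branches; training then moves these weights away from 0.5.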

Finally, the representation $\mathbf{G}^j$ is mapped by an output projection to the final prediction $\hat{\mathbf{Y}}^j$, and the model is trained by minimizing the mean squared error (MSE) loss.

## 3 Experiments

### 3.1 Experimental Settings

**Datasets.** We train and evaluate SERAF on seven widely used multivariate time series datasets: ETTh1, ETTh2, ETTm1, ETTm2, Exchange, Weather, and Electricity (Wu et al., [2021](https://arxiv.org/html/2606.14941#bib.bib15)).

Table 1: Comparison of SERAF and baselines on seven datasets. All results use the same input length of 720 and are averaged across four horizons (96, 192, 336, 720). Best results are shown in **bold** and second-best results are underlined.

**Baselines.** We compare SERAF against seven state-of-the-art time series forecasting models. Autoformer (Wu et al., [2021](https://arxiv.org/html/2606.14941#bib.bib15)) and PatchTST (Nie et al., [2025](https://arxiv.org/html/2606.14941#bib.bib3)) are Transformer-based forecasters. TimeMixer (Wang et al., [2024](https://arxiv.org/html/2606.14941#bib.bib20)) is an MLP-based multiscale mixing model. DLinear (Zeng et al., [2022](https://arxiv.org/html/2606.14941#bib.bib2)) employs a lightweight yet robust linear model. RAFT (Han et al., [2025](https://arxiv.org/html/2606.14941#bib.bib1)) enhances linear models with multi-period retrieval modules based on TS similarity. CycleNet (Lin et al., [2024](https://arxiv.org/html/2606.14941#bib.bib18)) explicitly captures periodic patterns on linear backbones. TimesNet (Wu et al., [2023](https://arxiv.org/html/2606.14941#bib.bib17)) detects dominant periods via Fourier analysis.

**Implementation details.** We follow the experimental settings of RAFT (Han et al., [2025](https://arxiv.org/html/2606.14941#bib.bib1)). The batch size is set to 32, and the Adam optimizer is used. The input length is set to 720 and $\tau$ is set to 0.1. We use all-MiniLM-L6-v2 as the text embedding model. All experiments are conducted on one NVIDIA A100 GPU. Each result reported in the tables is reproduced and averaged over three independent runs.
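For reproduction, the settings stated above can be collected in one place. The field names below are hypothetical, and the values not reported in this section (the retrieval size $K$ and the learning rate) are left as `None` rather than guessed.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class SERAFConfig:
    """Hyperparameters stated in Section 3.1; field names are hypothetical."""
    input_length: int = 720                       # look-back window L
    horizons: Tuple[int, ...] = (96, 192, 336, 720)
    batch_size: int = 32
    optimizer: str = "Adam"
    tau: float = 0.1                              # Gaussian bandwidth in Eq. (5)
    text_encoder: str = "all-MiniLM-L6-v2"        # frozen text embedding model
    num_runs: int = 3                             # results averaged over three runs
    top_k: Optional[int] = None                   # K: not reported in this section
    learning_rate: Optional[float] = None         # not reported in this section


cfg = SERAFConfig()
print(cfg.input_length, cfg.tau, cfg.text_encoder)
```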

### 3.2 Main Results

The forecasting results are shown in Table [1](https://arxiv.org/html/2606.14941#S3.T1). SERAF achieves the best MSE on all four ETT datasets and the best or tied-best MAE on three of them, with the second-best MAE on ETTh2. Compared with the retrieval-based baseline RAFT, SERAF matches or improves both metrics across all seven datasets, reducing the averaged MSE and MAE by 2.56% and 1.42%, respectively. Compared with CycleNet, SERAF reduces the averaged MSE and MAE by 2.95% and 1.34%, respectively. These results suggest that semantic retrieval consistently complements TS-similarity retrieval, although its gains may be less pronounced on datasets whose future dynamics are harder to capture through coarse semantic descriptions alone.

### 3.3 Ablation Study

We conduct three ablation studies on the ETTh2 and ETTm2 datasets to evaluate the contributions of key components in SERAF, as shown in Table [2](https://arxiv.org/html/2606.14941#S3.T2). The results are averaged across four forecasting horizons (96, 192, 336, 720), and Full (SERAF) uses the averaged results from Table [1](https://arxiv.org/html/2606.14941#S3.T1). Removing the semantic retrieval module (w/o text) causes the largest average drop, confirming that semantic similarity provides information complementary to pure time series similarity. Replacing the gating mechanism with simple concatenation (w/o gate), as in RAFT (Han et al., [2025](https://arxiv.org/html/2606.14941#bib.bib1)), generally degrades performance, highlighting the role of gating in adaptively balancing the encoder's prediction with the retrieval-enhanced signals. Finally, replacing the learnable weighted fusion with uniform averaging (w/o weight) also degrades performance, demonstrating the value of learning the relative weight of the two retrieval results.

Table 2: Ablation study of SERAF on the ETTh2 and ETTm2 datasets, with MSE and MAE averaged over four forecasting horizons (96, 192, 336, 720). Best results are shown in **bold**.

### 3.4 Inference Efficiency Analysis

We profile inference-stage efficiency on ETTh1 with forecasting horizon 96 and batch size 32. All compared backbones are lightweight MLP-based forecasters. As shown in Table [3](https://arxiv.org/html/2606.14941#S3.T3), SERAF uses the smallest forecasting backbone, with 0.088M parameters and 0.34 MiB of model GPU memory, compared with 0.138M parameters and 0.53 MiB for DLinear and 0.097M parameters and 0.37 MiB for RAFT. SERAF also achieves the fastest inference, requiring 0.0075 seconds per iteration, compared with 0.0203 and 0.0230 seconds per iteration for DLinear and RAFT, respectively.
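Assuming DLinear's default channel-shared setting (two linear maps from $L = 720$ to $H = 96$, each with a bias) and float32 weights, the DLinear figures above can be reproduced arithmetically, and the same 4-bytes-per-parameter conversion matches the SERAF and RAFT memory figures. This suggests that the memory column counts model weights only, and that SERAF's count excludes the frozen text encoder, which has far more than 0.088M parameters; both readings are our inference, not statements from the paper. The helper names are hypothetical.

```python
def dlinear_params(L: int, H: int) -> int:
    """Channel-shared DLinear: seasonal and trend Linear(L -> H) layers, each with a bias."""
    return 2 * (L * H + H)


def fp32_mib(num_params: int) -> float:
    """Weight memory at 4 bytes per float32 parameter, in MiB (2**20 bytes)."""
    return num_params * 4 / 2**20


n = dlinear_params(720, 96)
print(n, round(fp32_mib(n), 2))    # 138432 0.53
print(round(fp32_mib(88_000), 2))  # 0.34, SERAF's 0.088M parameters
print(round(fp32_mib(97_000), 2))  # 0.37, RAFT's 0.097M parameters
```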

Table 3: Inference-stage efficiency profiling on ETTh1 with horizon 96 and batch size 32.

## 4 Conclusion

In this paper, we proposed SERAF, which enhances retrieval-based forecasting by incorporating semantic retrieval over self-generated textual descriptions of time series. By constructing aligned time series and text databases, SERAF retrieves complementary historical patterns and adaptively fuses them through learnable weighted fusion, while a gating module balances the retrieval signals with the TS encoder's initial prediction. Experiments on seven real-world datasets demonstrate accuracy improvements over state-of-the-art baselines, and ablation studies confirm the contribution of each component. For future work, we will explore richer text templates and refined Top-$K$ retrieval to reduce redundancy and improve robustness, toward more interpretable and generalizable retrieval-augmented time series forecasting.

## References

- G. E. Box, G. M. Jenkins, G. C. Reinsel, and G. M. Ljung (2015). Time series analysis: forecasting and control. John Wiley & Sons.
- J. Chen, Z. Zhao, G. Nurbek, A. Feng, A. Maatouk, L. Tassiulas, Y. Gao, and R. Ying (2025). TRACE: grounding time series in context for multimodal embedding and retrieval. arXiv preprint arXiv:2506.09114.
- M. A. M. Daut, M. Y. Hassan, H. Abdullah, H. A. Rahman, M. P. Abdullah, and F. Hussin (2017). Building electrical energy consumption forecasting analysis using conventional and artificial intelligence methods: a review. Renewable and Sustainable Energy Reviews 70, pp. 1108–1118.
- S. Han, S. Lee, M. Cha, S. O. Arik, and J. Yoon (2025). Retrieval augmented time series forecasting. In Forty-second International Conference on Machine Learning.
- M. Jin, S. Wang, L. Ma, Z. Chu, J. Y. Zhang, X. Shi, P. Chen, Y. Liang, Y. Li, S. Pan, et al. (2023). Time-LLM: time series forecasting by reprogramming large language models. arXiv preprint arXiv:2310.01728.
- P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Küttler, M. Lewis, W. Yih, T. Rocktäschel, et al. (2020). Retrieval-augmented generation for knowledge-intensive NLP tasks. Advances in Neural Information Processing Systems 33, pp. 9459–9474.
- S. Lin, W. Lin, X. Hu, W. Wu, R. Mo, and H. Zhong (2024). CycleNet: enhancing time series forecasting through modeling periodic patterns. Advances in Neural Information Processing Systems 37, pp. 106315–106345.
- M. Lippi, M. Bertini, and P. Frasconi (2013). Short-term traffic flow forecasting: an experimental comparison of time-series analysis and supervised learning. IEEE Transactions on Intelligent Transportation Systems 14(2), pp. 871–882.
- Y. Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam (2025). A time series is worth 64 words: long-term forecasting with transformers. In The Eleventh International Conference on Learning Representations.
- K. Ning, Z. Pan, Y. Liu, Y. Jiang, J. Y. Zhang, K. Rasul, A. Schneider, L. Ma, Y. Nevmyvaka, and D. Song (2025). TS-RAG: retrieval-augmented generation based time series foundation models are stronger zero-shot forecaster. arXiv preprint arXiv:2503.07649.
- S. Poon and C. W. J. Granger (2003). Forecasting volatility in financial markets: a review. Journal of Economic Literature 41(2), pp. 478–539.
- I. Price, A. Sanchez-Gonzalez, F. Alet, T. R. Andersson, A. El-Kadi, D. Masters, T. Ewalds, J. Stott, S. Mohamed, P. Battaglia, et al. (2025). Probabilistic weather forecasting with machine learning. Nature 637(8044), pp. 84–90.
- S. Wang, H. Wu, X. Shi, T. Hu, H. Luo, L. Ma, J. Y. Zhang, and J. Zhou (2024). TimeMixer: decomposable multiscale mixing for time series forecasting. In The Twelfth International Conference on Learning Representations.
- H. Wu, T. Hu, Y. Liu, H. Zhou, J. Wang, and M. Long (2023). TimesNet: temporal 2D-variation modeling for general time series analysis. In The Eleventh International Conference on Learning Representations.
- H. Wu, J. Xu, J. Wang, and M. Long (2021). Autoformer: decomposition transformers with auto-correlation for long-term series forecasting. Advances in Neural Information Processing Systems 34, pp. 22419–22430.
- S. Yang, D. Wang, H. Zheng, and R. Jin (2025). TimeRAG: boosting LLM time series forecasting via retrieval-augmented generation. In ICASSP 2025 - 2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 1–5.
- A. Zeng, M. Chen, L. Zhang, and Q. Xu (2022). Are transformers effective for time series forecasting? arXiv preprint arXiv:2205.13504.
- H. Zhang, C. Xu, Y. Zhang, Z. Zhang, L. Wang, and J. Bian (2025). TimeRAF: retrieval-augmented foundation model for zero-shot time series forecasting. IEEE Transactions on Knowledge and Data Engineering.
- T. Zhou, P. Niu, L. Sun, R. Jin, et al. (2023). One fits all: power general time series analysis by pretrained LM. Advances in Neural Information Processing Systems 36, pp. 43322–43355.

Similar Articles

Stationarity-Aware Retrieval-Augmented Time Series Forecasting

arXiv cs.LG

SARAF is a Stationarity-Aware Retrieval-Augmented Forecasting framework that adaptively balances relevance and diversity in retrieval for time series forecasting, modulating diversification strength based on dataset-level stationarity to handle non-stationary regime shifts. Accepted to KDD 2026, it demonstrates competitive performance over strong baselines on eight real-world datasets.

Nested Spatio-Temporal Time Series Forecasting

arXiv cs.LG

This paper proposes a nested spatiotemporal forecasting framework that uses spectral clustering to construct semantically coherent macro-level regions, which provide top-down guidance for fine-grained micro-level predictions. Experiments on high-dimensional datasets show consistent improvements over state-of-the-art baselines.

LEAF: A Living Benchmark for Event-Augmented Forecasting

arXiv cs.LG

LEAF is a living benchmark for evaluating large language models on event-augmented forecasting tasks, such as future event probabilities and time series forecasting. It uses a recursive retrieval agent system and dual-agent cross-validation to provide relevant auxiliary text, and shows that LLMs can leverage complex events to improve predictive performance.