Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting

arXiv cs.CL Papers

Summary

Proposes TempoWave, a plug-and-play temporal wavelet digit interface that maps time series observations into digit-wise embeddings from multi-wavelet coefficients, improving LLM-based time series forecasting and achieving state-of-the-art on multiple benchmarks.

arXiv:2606.26487v1 Announce Type: new Abstract: Large language models (LLMs) are attractive for context-aware time series forecasting because they can integrate heterogeneous textual signals, yet their discrete, language-oriented tokenization and embedding interfaces are misaligned with continuous numerical values, often harming numerical ordering and forecasting reliability. We propose TempoWave, a plug-and-play temporal wavelet digit interface that maps each scalar observation into digit-wise embeddings constructed from multi-wavelet, multi-scale coefficients. By directly overriding standard token representations, TempoWave seamlessly exposes both fine-grained local fluctuations and macro global structures in a transformer-compatible form, ensuring that precise numerical formatting, distinct digit identity, and robustness to common normalization operations are maintained throughout the LLM pipeline. Experiments across five context-enriched forecasting benchmarks demonstrate that TempoWave consistently improves LLM-based forecasters over standard numeric tokenization and alternative embedding interfaces, achieving a new state-of-the-art. These results highlight the numeric interface as a key bottleneck and suggest that principled multi-resolution embeddings can better couple LLMs' contextual reasoning with precise forecasting. Our code is available at https://github.com/DC-research/TempoWAVE and our model can be accessed at https://huggingface.co/Melady/TempoWAVE.

Cached at: 06/26/26, 05:17 AM

# Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting
Source: [https://arxiv.org/html/2606.26487](https://arxiv.org/html/2606.26487)
Defu Cao1\*, Muyan Weng1, Jiao Sun1,3 & Yan Liu1

1University of Southern California, 2Meta, 3Google DeepMind

\{defucao, zijielei, muyanwen, jiaosun, yanliu\.cs\}@usc\.edu

\*Equal contribution\. Correspondence to: Defucao@usc\.edu\. This work was done while Zijie Lei was at USC\.

###### Abstract

Large language models \(LLMs\) are attractive for context\-aware time series forecasting because they can integrate heterogeneous textual signals, yet their discrete, language\-oriented tokenization and embedding interfaces are misaligned with continuous numerical values, often harming numerical ordering and forecasting reliability\. We propose TempoWave, a plug\-and\-play temporal wavelet digit interface that maps each scalar observation into digit\-wise embeddings constructed from multi\-wavelet, multi\-scale coefficients\. By directly overriding standard token representations, TempoWave seamlessly exposes both fine\-grained local fluctuations and macro global structures in a transformer\-compatible form, ensuring that precise numerical formatting, distinct digit identity, and robustness to common normalization operations are maintained throughout the LLM pipeline\. Experiments across five context\-enriched forecasting benchmarks demonstrate that TempoWave consistently improves LLM\-based forecasters over standard numeric tokenization and alternative embedding interfaces, achieving a new state\-of\-the\-art\. These results highlight the numeric interface as a key bottleneck and suggest that principled multi\-resolution embeddings can better couple LLMs’ contextual reasoning with precise forecasting\. Our code is available at [DC\-research/TempoWAVE](https://github.com/DC-research/TempoWAVE) and our model can be accessed at [Melady/TempoWAVE](https://huggingface.co/Melady/TempoWAVE)\.

## 1 Introduction

Time series analysis, the study of data points ordered chronologically, is indispensable across sectors such as finance, healthcare, and climate science Cao et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib6)\); Wang et al\. \([2026b](https://arxiv.org/html/2606.26487#bib.bib2)\)\. Accurate forecasting supports resource allocation, risk management, and early warning systems, yet the underlying data generating processes are often complex and evolving Cao et al\. \([2023a](https://arxiv.org/html/2606.26487#bib.bib113)\); Zhang et al\. \([2022](https://arxiv.org/html/2606.26487#bib.bib112)\)\. Practical time series typically exhibit non\-stationarity, mixed periodicities, regime shifts, long\- and short\-range temporal dependencies, and substantial noise Liu et al\. \([2022](https://arxiv.org/html/2606.26487#bib.bib341)\); Cao et al\. \([2020](https://arxiv.org/html/2606.26487#bib.bib371), [2021](https://arxiv.org/html/2606.26487#bib.bib82), [2023b](https://arxiv.org/html/2606.26487#bib.bib84)\)\. These properties make it difficult to learn models that simultaneously capture fine\-grained local fluctuations and long\-horizon global structure, while remaining robust under distribution shifts and limited supervision\.

In parallel, Large Language Models \(LLMs\) OpenAI \([2023](https://arxiv.org/html/2606.26487#bib.bib366)\) have become strong general\-purpose sequence learners\. They can exploit long contexts, perform in\-context pattern induction, and naturally integrate textual information Hu et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib122)\); Zhou and Yu \([2025](https://arxiv.org/html/2606.26487#bib.bib121)\); Zhang et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib209)\)\. This is particularly appealing for time series intelligence because many exogenous drivers that affect temporal dynamics are expressed in language, such as policy changes, market news, clinical narratives, and operational logs\. Moreover, the few\-shot and zero\-shot generalization behavior of LLMs suggests a promising pathway for domains where labeled time series data is scarce and task distribution varies across entities, locations, or time periods\.

Despite this promise, directly adapting LLMs to time series forecasting remains challenging\. LLMs are optimized for discrete token prediction, whereas time series forecasting fundamentally requires precise modeling of continuous values\. This mismatch can lead to unreliable numerical behavior even when the model captures high\-level temporal patterns Merrill et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib380)\); Ye et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib5)\)\. More critically, language\-oriented tokenization fragments numbers into sub\-tokens in ways that are not tied to magnitude, for example “2026” $\rightarrow$ “20” and “26”\. Such fragmentation breaks ordinal relations and obscures the continuity intrinsic to temporal processes\. As a result, two numerically close values may be mapped to very different token sequences, while numerically distant values can share sub\-tokens, introducing spurious similarity\. In LLM\-based forecasting pipelines, this translation layer between real\-valued sequences and discrete tokens becomes a principal bottleneck\.
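The fragmentation effect can be reproduced with a toy splitter. The sketch below is only an illustration: `two_digit_chunks` is a hypothetical stand-in that cuts a number string into two-character pieces, matching the “2026” example above, and is not the behavior of any particular LLM tokenizer.

```python
def two_digit_chunks(number: str) -> list[str]:
    """Hypothetical stand-in for subword tokenization: split into 2-character chunks."""
    return [number[i:i + 2] for i in range(0, len(number), 2)]


def shared_chunks(a: str, b: str) -> set[str]:
    """Chunks that appear in both numbers, regardless of position."""
    return set(two_digit_chunks(a)) & set(two_digit_chunks(b))


print(two_digit_chunks("2026"))        # the split used as the example in the text
print(shared_chunks("1999", "2000"))   # neighbours on the number line
print(shared_chunks("2026", "9926"))   # numerically far apart
```

Under this toy splitter, adjacent values such as 1999 and 2000 have no chunk in common, while 2026 and 9926 share one, which is the spurious similarity described above.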

Recent research has pursued several avenues to bridge the gap between LLMs and time series analysis\. One direction develops specialized foundation models tailored to time series Cao et al\. \([2026](https://arxiv.org/html/2606.26487#bib.bib3), [2024a](https://arxiv.org/html/2606.26487#bib.bib94)\)\. Another uses agentic or multimodal systems that couple LLMs with dedicated forecasting tools Ye et al\. \([2026](https://arxiv.org/html/2606.26487#bib.bib378)\); Li et al\. \([2026](https://arxiv.org/html/2606.26487#bib.bib88)\)\. A third direction focuses on input adaptations, including patching Nie et al\. \([2023](https://arxiv.org/html/2606.26487#bib.bib148)\), quantization Talukder et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib129)\), or converting time series into symbolic or textual representations Jia et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib382)\)\. While these approaches can improve usability and efficiency, they often trade away numerical faithfulness, blur fine\-grained variations, or rely on external components that reduce end\-to\-end differentiability and complicate analysis\. Consequently, there remains a persistent gap in representing continuous numerical values inside the standard transformer input space in a principled and information\-preserving manner\.

In this work, we focus on the representation bottleneck and ask whether LLMs can be equipped with numerically grounded embeddings that preserve quantitative relations and multi\-scale temporal structure while remaining compatible with standard transformer inputs\. An effective forecasting representation should support reasoning across resolutions\. Local changes are crucial for short\-term dynamics and anomaly\-sensitive regimes, while trends and seasonal components dominate long\-horizon behavior\. Motivated by wavelet analysis, which provides a natural multi\-resolution decomposition, we propose Multi\-Wavelet Number Embedding \(TempoWave\), an input embedding interface that maps each scalar observation into a dense vector encoding multi\-scale structure\. TempoWave is designed to be injected into LLM backbones without requiring language tokenization of numbers, thereby reducing the disconnect between numeric magnitude and the model’s discrete interface\.

Beyond forecasting accuracy, we also aim to understand how the embedding interface shapes numerical structure inside the model\. To this end, we introduce diagnostic analyses that probe whether local neighborhoods in representation space respect numeric ordering, a property we refer to as monotonic neighborhood consistency\. These analyses help explain when TempoWave improves forecasting and provide guidance for designing numerically grounded interfaces for LLMs\. Extensive experiments on diverse forecasting benchmarks show that TempoWave consistently improves LLM\-based forecasters compared to standard tokenization and common adaptation methods, and it is competitive with strong time\-series\-specific models in settings requiring precise numerical forecasting\.

Contributions\. Our contributions are as follows:

- A wavelet\-based numeric interface for LLM forecasting\. We propose *Multi\-Wavelet Number Embedding \(TempoWave\)*, an input embedding interface that maps each real\-valued numerical observation into a dense vector with multi\-resolution structure, enabling direct use of standard LLM backbones for numerical forecasting without relying on language tokenization of numbers\.
- Structurally faithful and stable numeric representation\. We analyze TempoWave from a multi\-scale signal processing perspective and establish properties related to numerical faithfulness and stability, including improved separability across values and robustness to common normalization operations used in transformers\.
- Comprehensive evaluation with diagnostic evidence\. We conduct extensive experiments on diverse forecasting benchmarks and show consistent gains over LLM baselines using tokenization and common input adaptations\. We further provide diagnostic analyses that probe monotonic neighborhood consistency, offering evidence for how TempoWave reshapes numerical neighborhoods and helping explain its forecasting improvements\.

## 2 Related Work

Research on applying Large Language Models \(LLMs\) to time series analysis has grown rapidly, motivated by LLMs’ strong sequence modeling and their ability to integrate contextual information\. Existing efforts can be grouped into three directions: time series foundation models trained on large\-scale temporal corpora, LLM\-centered agentic or multimodal systems, and input adaptation strategies that re\-design how numerical values are represented for transformer backbones\.

#### Time series foundation models\.

A major direction is to pre\-train foundation models directly on large and diverse time series collections to learn transferable temporal representations\. Representative examples include Chronos Ansari et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib105)\), TimesFM Das et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib74)\), and other large\-scale temporal pre\-training efforts Woo et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib104)\); Cao et al\. \([2024b](https://arxiv.org/html/2606.26487#bib.bib90)\); Yang et al\. \([2025a](https://arxiv.org/html/2606.26487#bib.bib85)\)\. These models demonstrate the benefits of scaling and pre\-training for forecasting, but they often rely on patching, discretization, or quantization to interface with transformers, which can blur fine\-grained numerical differences and introduce information loss in precision\-sensitive regimes\.

#### LLM agents and multimodal time series systems\.

Another line of work uses general\-purpose LLMs as reasoning engines within larger pipelines, where forecasting or numerical computation is delegated to specialized tools and the LLM performs orchestration and interpretation Yang et al\. \([2026](https://arxiv.org/html/2606.26487#bib.bib87)\); Weng et al\. \([2026](https://arxiv.org/html/2606.26487#bib.bib89)\); Yang et al\. \([2025b](https://arxiv.org/html/2606.26487#bib.bib83)\)\. Closely related are multimodal systems that jointly model time series and text, enabling context\-aware forecasting and question answering, for example TimeLLM Jin et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib203)\) and GPT4MTS Jia et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib382)\)\. While these approaches highlight the value of fusing textual signals, their performance still depends critically on how continuous values are encoded for transformer inputs, and numerical faithfulness can remain a bottleneck when the interface is token\-based or heavily discretized\.

#### Input adaptation and numeric representation for LLMs\.

A substantial body of work focuses on making numerical time series consumable by LLM backbones via input adaptation\. Common strategies include patch\-based representations Nie et al\. \([2023](https://arxiv.org/html/2606.26487#bib.bib148)\), discretization or binning Talukder et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib129)\), symbolic conversion of segments into strings Goswami et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib124)\), and embedding alignment methods that map time series embeddings into the language embedding space Gruver et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib91)\); Zeng et al\. \([2023](https://arxiv.org/html/2606.26487#bib.bib132)\)\. More generally, recent studies on numeracy in language models emphasize that tokenization and discrete interfaces can impede faithful numerical reasoning, motivating alternative representations that preserve quantitative structure Merrill et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib380)\); Gillman et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib17)\)\. These adaptations improve compatibility, but they often shift the burden to a preprocessing stage and may sacrifice precision, locality, or multi\-scale structure Wang et al\. \([2026a](https://arxiv.org/html/2606.26487#bib.bib86)\)\.

![Refer to caption](https://arxiv.org/html/2606.26487v1/x1.png)

Figure 1: Overview of the TempoWave\-based forecasting framework with digit\-level tokens\. The input prompt is tokenized once using a tokenizer augmented with dedicated digit tokens\. Text and context tokens use standard embeddings, while digit tokens are routed to the TempoWave module, which constructs digit embeddings via multi\-wavelet, multi\-scale coefficients and overrides the corresponding token embeddings\. The resulting embedding sequence is fed into an unchanged LLM backbone trained via supervised fine\-tuning \(SFT\)\. The model generates numeric tokens that are parsed, de\-normalized, and evaluated as real\-valued forecasts\.

#### Positioning of TempoWave\.

Our work addresses the above interface challenge by introducing Multi\-Wavelet Number Embedding, which constructs numerically grounded embeddings that encode multi\-resolution structure prior to ingestion by an LLM\. In contrast to purely symbolic conversion or coarse discretization, TempoWave aims to preserve quantitative relations while providing a multi\-scale representation inspired by wavelet analysis, supporting both accurate forecasting and diagnostic analysis of the induced numerical neighborhood structure\.

## 3 Methodology

### 3\.1 Overview

To bridge the mismatch between continuous\-valued time series and the discrete input interface of Large Language Models \(LLMs\), we propose Multi\-Wavelet Number Embedding \(TempoWave\), a numerically grounded embedding interface that intervenes only at the token embedding layer while keeping the LLM backbone unchanged\. As illustrated in Figure [1](https://arxiv.org/html/2606.26487#S2.F1), TempoWave enables LLMs to process numerical sequences by replacing the embeddings of digit tokens with multi\-resolution wavelet\-based representations\.

#### Numeric\-to\-token interface\.

Given a normalized time series value $x_t \in \mathbb{R}$, we first render it into a fixed\-precision string with $m_{prec}$ integer digits and $n_{prec}$ fractional digits \(e\.g\., V\.FFFF\)\. A single tokenization pass is then applied using a tokenizer augmented with *dedicated digit tokens*, where each digit $d_i \in \{0,\ldots,9\}$ is treated as an individual token\. As a result, the input prompt is converted into a mixed token sequence consisting of text/context tokens and digit tokens\.
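The rendering step can be sketched as follows. This is a minimal illustration, not the released implementation: the function name `render_digits`, the handling of the minus sign as a separate non-digit character, and the error raised on overflow are all assumptions made here.

```python
def render_digits(x: float, m_prec: int = 1, n_prec: int = 4) -> tuple[str, list[int]]:
    """Render x with m_prec integer and n_prec fractional digits; return (string, digits)."""
    sign = "-" if x < 0 else ""                      # assumption: sign kept outside the digit list
    width = m_prec + 1 + n_prec                      # integer digits + "." + fractional digits
    text = f"{abs(x):0{width}.{n_prec}f}"            # zero-padded fixed-precision string
    if len(text.split(".")[0]) > m_prec:
        raise ValueError("value does not fit in m_prec integer digits")
    digits = [int(ch) for ch in text if ch.isdigit()]  # one entry per digit token
    return sign + text, digits


print(render_digits(0.3918))    # V.FFFF layout from the text
print(render_digits(-0.3849))
```

With `m_prec = 1` and `n_prec = 4`, every value yields exactly five digits, so each number occupies a fixed number of digit positions in the prompt.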

Standard token embeddings are used for text and context tokens\. For digit tokens, TempoWave computes digit embeddings via multi\-wavelet, multi\-scale features and *overrides* the standard embeddings\. This routing mechanism ensures that numerical structure is injected at digit positions only, without altering the remaining token representations\.

#### TempoWave embedding override\.

For each digit token $d$, TempoWave computes a set of wavelet coefficients $W_{\psi,s}(d)$ over a predefined wavelet family $\Psi$ and scale set $S$\. These coefficients are concatenated into a digit feature vector $\phi(d)$ and mapped to the LLM embedding dimension via a fixed alignment function $g(\cdot)$, producing the digit embedding $E(d) \in \mathbb{R}^{D}$\. The resulting digit embeddings replace the standard embeddings at digit positions, yielding the final input embedding sequence $\mathbf{H}_0 \in \mathbb{R}^{T \times D}$, which is fed into the LLM\.

#### Context and training objective\.

For context\-aware forecasting, we fine\-tune the LLM via supervised fine\-tuning \(SFT\) on prompts that include \(i\) historical numeric values represented as fixed\-precision digit tokens, \(ii\) optional global descriptors such as Catch22 features Lubba et al\. \([2019](https://arxiv.org/html/2606.26487#bib.bib80)\), and \(iii\) situational context such as date and domain information\. The LLM backbone remains unchanged, and training minimizes the standard next\-token cross\-entropy loss to generate future numeric tokens corresponding to the next $k$ time steps\.

### 3\.2 TempoWave Construction

#### Wavelet dictionary\.

Let $\psi(t)$ denote a mother wavelet and $\psi_{s,\tau}(t)$ denote its scaled and translated version:

$$\psi_{s,\tau}(t)=\frac{1}{\sqrt{s}}\,\psi\left(\frac{t-\tau}{s}\right), \tag{1}$$

where $s>0$ is the scale and $\tau$ is the translation\. We select a set of wavelets $\Psi=\{\psi_1,\ldots,\psi_k\}$ and a set of scales $S=\{s_1,\ldots,s_l\}$\. For TempoWave we use a fixed translation \(typically $\tau=0$\)\.

#### Digit signal and wavelet coefficients\.

Each digit $d\in\{0,\ldots,9\}$ is normalized to $\tilde{d}=d/9\in[0,1]$\. To obtain wavelet coefficients without degeneracy from the zero\-mean property of admissible wavelets, we represent a digit as a discrete impulse on a fixed grid\. Let $B$ be the grid resolution and $t_r=\frac{r}{B-1}$ for $r=0,\ldots,B-1$\. Define the digit index $q(d)=\lfloor\tilde{d}\,(B-1)\rceil$, and the digit signal $f_d\in\mathbb{R}^{B}$ as a Kronecker delta:

$$(f_d)_r=\begin{cases}1,&r=q(d),\\ 0,&\text{otherwise}.\end{cases} \tag{2}$$

For each wavelet $\psi_i$ at scale $s_j$, we sample $\psi_{i,s_j,0}(t)$ on the same grid to obtain a discrete vector $\bm{\psi}_{i,s_j}\in\mathbb{R}^{B}$, and define the digit wavelet coefficient as

$$W_{\psi_i,s_j}(d):=\langle f_d,\bm{\psi}_{i,s_j}\rangle=(\bm{\psi}_{i,s_j})_{q(d)}. \tag{3}$$

This definition is consistent with the continuous formulation using an impulse and avoids the trivial zero coefficients produced by projecting constants onto zero\-mean wavelets\.
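The construction in Eqs. (1) to (3) can be sketched in NumPy. This is a minimal illustration under stated assumptions: the Mexican-hat mother wavelet, the scale 0.5, and the grid resolution `B = 64` are choices made here for the example, and the nearest-integer bracket in $q(d)$ is read as ordinary rounding. None of these values are taken from the paper.

```python
import numpy as np


def mexican_hat(t):
    """Mexican-hat (Ricker) mother wavelet, unnormalized; an illustrative choice."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)


def scaled_wavelet(psi, t, s, tau=0.0):
    """Eq. (1): psi_{s,tau}(t) = psi((t - tau) / s) / sqrt(s)."""
    return psi((t - tau) / s) / np.sqrt(s)


def digit_index(d, B):
    """q(d): nearest grid index to d/9 on a grid of B points over [0, 1]."""
    return int(np.rint(d * (B - 1) / 9))


def digit_signal(d, B):
    """Eq. (2): Kronecker delta placed at index q(d)."""
    f = np.zeros(B)
    f[digit_index(d, B)] = 1.0
    return f


def digit_coefficient(d, psi, s, B):
    """Eq. (3): inner product of the delta with the sampled wavelet."""
    grid = np.arange(B) / (B - 1)            # t_r = r / (B - 1)
    samples = scaled_wavelet(psi, grid, s)   # translation tau fixed to 0
    return float(digit_signal(d, B) @ samples)


B = 64  # assumed grid resolution
for d in (0, 4, 9):
    print(d, digit_index(d, B), digit_coefficient(d, mexican_hat, 0.5, B))
```

Because the digit signal is a delta, the inner product reduces to reading one sample of the wavelet, so in practice the vector `f` never needs to be materialized.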

#### Digit embedding and dimension matching\.

We concatenate multi\-wavelet, multi\-scale coefficients into a feature vector

$$\phi(d)=\mathrm{vec}\left(\left[W_{\psi_i,s_j}(d)\right]_{i=1..k,\,j=1..l}\right)\in\mathbb{R}^{kl}. \tag{4}$$

To interface with an LLM of embedding dimension $D$, we map $\phi(d)$ to a token embedding $E(d)\in\mathbb{R}^{D}$ via a fixed mapping $g(\cdot)$, which can be zero\-padding when $kl\leq D$ or a lightweight linear projection when $kl\neq D$:

$$E(d)=g(\phi(d))\in\mathbb{R}^{D}. \tag{5}$$

Since there are only ten digits, $E(0),\ldots,E(9)$ can be precomputed and cached as a small embedding table\.
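A cached digit table along the lines of Eqs. (4) and (5) might be built as below. This is a sketch under assumptions: the two wavelets (`mexican_hat`, `morlet_real`), the three scales, `B = 64`, and `D = 16` are illustrative values chosen here, and $g(\cdot)$ is taken to be zero-padding only.

```python
import numpy as np


def mexican_hat(t):
    """Mexican-hat (Ricker) wavelet, unnormalized; illustrative choice."""
    return (1.0 - t**2) * np.exp(-0.5 * t**2)


def morlet_real(t):
    """Real-valued Morlet-style wavelet; illustrative choice."""
    return np.cos(5.0 * t) * np.exp(-0.5 * t**2)


def build_digit_table(wavelets, scales, D, B=64):
    """Return a (10, D) table whose row d is E(d) = zero-padded phi(d)."""
    grid = np.arange(B) / (B - 1)
    q = np.rint(np.arange(10) * (B - 1) / 9).astype(int)   # q(d) for d = 0..9
    columns = [
        psi(grid / s)[q] / np.sqrt(s)    # coefficients of all ten digits, tau = 0
        for psi in wavelets              # wavelet index i (outer)
        for s in scales                  # scale index j (inner)
    ]
    phi = np.stack(columns, axis=1)      # shape (10, k * l)
    kl = phi.shape[1]
    if kl > D:
        raise ValueError("zero-padding needs k * l <= D; use a projection instead")
    table = np.zeros((10, D))
    table[:, :kl] = phi                  # g(.) realized as zero-padding
    return table


table = build_digit_table([mexican_hat, morlet_real], scales=[0.25, 0.5, 1.0], D=16)
print(table.shape)
```

The table is computed once and then used as a plain lookup, so the wavelet evaluation adds no cost at training or inference time.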

#### TempoWave for a real number and injection into LLMs\.

Let $x$ be formatted into a fixed\-precision digit sequence $(d_1,\ldots,d_{N_{dig}})$ with $N_{dig}=m_{prec}+n_{prec}$\. TempoWave represents $x$ as a sequence of digit token embeddings

$$\mathrm{TempoWave}(x)=\left[E(d_1),E(d_2),\ldots,E(d_{N_{dig}})\right]. \tag{6}$$

In the input prompt, each digit token is embedded by $E(d_i)$, while other tokens use the original LLM embedding lookup\. Standard positional encodings are applied as usual\.
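The routing described here, base lookup for ordinary tokens and table rows at digit positions, can be sketched with plain arrays. Everything concrete below is hypothetical: the vocabulary size, the digit token ids 10 to 19, and the random matrices standing in for the LLM embedding matrix and the cached digit table.

```python
import numpy as np


def override_digit_embeddings(token_ids, base_embeddings, digit_token_ids, digit_table):
    """Build H_0: base lookup everywhere, digit_table rows at digit positions."""
    token_ids = np.asarray(token_ids)
    h0 = base_embeddings[token_ids]          # fancy indexing returns a copy, shape (T, D)
    for digit, token_id in enumerate(digit_token_ids):
        h0[token_ids == token_id] = digit_table[digit]
    return h0


rng = np.random.default_rng(0)
vocab_size, D = 20, 8
base = rng.normal(size=(vocab_size, D))      # stand-in for the LLM embedding matrix
digit_table = rng.normal(size=(10, D))       # stand-in for the cached E(0), ..., E(9)
digit_token_ids = list(range(10, 20))        # hypothetical ids of the ten digit tokens
tokens = [3, 10, 7, 14, 19]                  # text, "0", text, "4", "9"
h0 = override_digit_embeddings(tokens, base, digit_token_ids, digit_table)
print(h0.shape)
```

Only the rows at digit positions differ from the standard lookup, which is why the backbone and the remaining token representations are left untouched.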

#### Summary of the generation algorithm\.

Given $x$, we \(1\) extract digits according to $(m_{prec},n_{prec})$, \(2\) compute $\phi(d)$ by evaluating wavelet samples at $q(d)$ for each $(\psi_i,s_j)$, \(3\) obtain $E(d)$ via $g(\cdot)$, and \(4\) assemble the digit\-embedding sequence $\mathrm{TempoWave}(x)$\.

### 3\.3 Representation Faithfulness in LLMs

TempoWave is designed to provide a faithful and stable numeric interface for large language models by explicitly accounting for both the discrete nature of digits and the architectural properties of Transformers\. A central challenge in this setting is the pervasive use of normalization layers, such as LayerNorm and RMSNorm, which rescale and re\-center token embeddings at every layer\. When numerical information is encoded primarily through absolute magnitudes, such normalization can severely distort or even collapse numerical distinctions, especially under deep stacking and autoregressive decoding\.

The key design principle of TempoWave is to encode digits through *structured multi\-scale patterns* rather than raw scalar values\. Each digit is mapped to a vector of wavelet coefficients across multiple wavelet families and scales, capturing characteristic geometric patterns in the coefficient space\. By concatenating these coefficients and applying a fixed dimension\-alignment mapping, TempoWave constructs digit embeddings whose identity is determined by relative patterns instead of absolute scale\. As a result, subsequent normalization operations mainly act as global affine transformations and do not destroy the structural differences between digits\.

From a representational standpoint, this construction induces a *finite digit codebook* in the embedding space\. Because the digit set is finite, injectivity of this codebook implies the existence of a positive separation margin between different digits, which guarantees robust nearest\-neighbor recoverability under small perturbations\. This property underlies digit recoverability and, by extension, numeracy preservation under fixed precision, since each digit can be recovered independently from its embedding\.
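The margin argument can be checked numerically: if the smallest pairwise distance in the codebook is $\delta$, any perturbation of norm below $\delta/2$ leaves the nearest codeword unchanged, by the triangle inequality. The sketch below uses a random 10-row codebook as a stand-in (an assumption; it is not the TempoWave table) and perturbs each row by 0.49 times the margin.

```python
import numpy as np


def separation_margin(codebook):
    """Smallest Euclidean distance between two distinct codebook rows."""
    diffs = codebook[:, None, :] - codebook[None, :, :]
    dist = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dist, np.inf)           # ignore zero self-distances
    return float(dist.min())


def recover_digit(vector, codebook):
    """Nearest-neighbor decoding: index of the closest codebook row."""
    return int(np.argmin(np.linalg.norm(codebook - vector, axis=1)))


rng = np.random.default_rng(0)
codebook = rng.normal(size=(10, 8))          # toy injective codebook (assumption)
margin = separation_margin(codebook)

recovered = []
for d in range(10):
    noise = rng.normal(size=8)
    noise *= 0.49 * margin / np.linalg.norm(noise)   # perturbation norm just below margin / 2
    recovered.append(recover_digit(codebook[d] + noise, codebook))

print(margin > 0, recovered == list(range(10)))
```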

The use of multiple wavelets and multiple scales further enhances this separation\. Concatenating coefficients across wavelet\-scale pairs cannot decrease pairwise distances between digit embeddings and typically increases them, thereby improving or maintaining the separation margin\. This explains why TempoWave exhibits enhanced discriminability compared to single\-scale or single\-frequency numeric encodings, as formally analyzed in the appendix\.
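The monotonicity claim follows from how squared Euclidean distance decomposes under concatenation. A short derivation, assuming distances are measured on the concatenated feature vector $\phi(d)$ of Eq. (4) (zero-padding preserves these distances, whereas a linear projection need not), with $(i_0, j_0)$ denoting any single wavelet-scale pair:

```latex
\begin{aligned}
\|\phi(d)-\phi(d')\|_2^2
  &= \sum_{i=1}^{k}\sum_{j=1}^{l}\bigl(W_{\psi_i,s_j}(d)-W_{\psi_i,s_j}(d')\bigr)^2 \\
  &\ge \bigl(W_{\psi_{i_0},s_{j_0}}(d)-W_{\psi_{i_0},s_{j_0}}(d')\bigr)^2 .
\end{aligned}
```

Every added wavelet-scale pair contributes a non-negative term, so the distance between two digits is at least as large as under any single pair, with strict increase whenever the new coefficient differs between $d$ and $d'$.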

Crucially, we also analyze how the induced digit codebook behaves under common normalization layers in Transformers\. We show that LayerNorm and RMSNorm can only collapse two embeddings under highly restricted affine conditions\. As long as the normalized digit codebook remains injective, digit identities remain uniquely recoverable after normalization\. Empirically, the multi\-wavelet construction yields well\-separated digit embeddings that remain distinct throughout the LLM\.
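The restricted collapse conditions can be illustrated with parameter-free versions of the two layers. The sketch assumes no learned gain or bias and a zero epsilon, which is a simplification of the layers used in practice: under it, LayerNorm maps two vectors to the same output when one is a positive scaling plus a constant shift of the other, and RMSNorm does so when one is a positive scaling of the other.

```python
import numpy as np


def layer_norm(x):
    """Parameter-free LayerNorm: subtract the mean, divide by the standard deviation."""
    return (x - x.mean()) / x.std()


def rms_norm(x):
    """Parameter-free RMSNorm: divide by the root mean square."""
    return x / np.sqrt(np.mean(x**2))


rng = np.random.default_rng(0)
x = rng.normal(size=16)
affine = 2.5 * x + 0.7        # positive scale plus constant shift
scaled = 2.5 * x              # positive scale only
other = rng.normal(size=16)   # unrelated vector

print(np.allclose(layer_norm(x), layer_norm(affine)))   # collapses under LayerNorm
print(np.allclose(rms_norm(x), rms_norm(affine)))       # shift keeps them apart under RMSNorm
print(np.allclose(rms_norm(x), rms_norm(scaled)))       # collapses under RMSNorm
print(np.allclose(layer_norm(x), layer_norm(other)))    # unrelated vectors stay distinct
```

A digit codebook therefore survives normalization as long as no two digit embeddings are related by one of these specific transformations.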

Table 1: Forecasting performance \(RMSE/MAE\) across five context\-enriched datasets\. TempoWAVE achieves new SOTA on 7/10 metrics and ranks second on the remaining three\. Best values are bolded, second\-best are underlined\. The last row reports the relative improvement \($\downarrow$\) of MWNE \(TempoWave\) over the previous best method for each metric\.

Table 2: Ablation Study: Forecasting performance \(RMSE/MAE\) across datasets under different context settings\.

Full\-context prompt example

Input: Historical load time series \(48 half\-hour points\), e\.g\., “…, \-0\.3849, \-0\.4859, \-0\.6162, \-0\.7185, …”

Context: Region: VIC; Dates: 2021\-05\-12 $\rightarrow$ 2021\-05\-13 \(weekday, non\-holiday\); Resolution: 30 minutes; Horizon: next 24 hours \(48 points\); Auxiliary features: daily weather statistics \(temperature, humidity, pressure\)\.

Instruction: Predict the next 48 load values using the time series and contextual information\. Similarity is assessed using Catch22 statistical descriptors to preserve autocorrelation, periodicity, and fluctuation patterns\.

Output: Predicted sequence only, e\.g\., “…, 0\.3918, 0\.3817, 0\.4148, 0\.4327, 0\.4201, …”

## 4 Experiments

### 4\.1 Experimental Setup

#### Datasets\.

We evaluate TempoWave on context\-enriched forecasting benchmarks where each time series segment is paired with additional textual or event\-based context\. First, we use the CGTSF dataset released via Hugging Face Datasets Wang et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib126)\), which contains three collections: MSPG \(solar power generation from 27 sites in Melbourne, 2021–2022, 15\-minute frequency\), LEU \(electricity usage from 16 London households, 2012–2013, 30\-minute frequency\), and PTF \(traffic flow from 32 detectors in Paris, 2012, hourly frequency\)\. Each example includes a historical numerical window and associated context such as background descriptions, weather information \(from Open\-Meteo\), date and holiday indicators, and curated news text when available\. We follow the official data splits and preprocessing protocol provided by the dataset source\.

We additionally use the context\-aware forecasting datasets from Wang et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib130)\), including Australia \(AUL\) and Bitcoin \(BIT\), which pair time series with relevant news articles\. For AUL and BIT, we follow the original preprocessing, normalization, and train/validation/test splits to ensure comparability with prior work\.

#### Task formulation\.

Given a historical window of observations and its associated context, the model predicts the next $k$ future values\. We adopt a generative formulation: each numeric value is rendered into a fixed\-precision string \(e\.g\., V\.FFFF\) and the model generates future values as token sequences\. During fine\-tuning, we minimize the standard cross\-entropy loss over next\-token prediction\.

#### Decoding and numeric parsing\.

At inference time, generated token sequences are converted back to real values by parsing the fixed\-precision numeric strings\. An example of the full\-context prompt is detailed in the accompanying text box\. If a generated output violates the numeric format \(e\.g\., missing digits or containing non\-numeric tokens\), we apply a deterministic fallback parsing rule; if parsing still fails, the prediction for that step is treated as invalid and is counted in the evaluation according to the protocol\. All formatting and parsing rules are fixed across methods to ensure a fair comparison\.
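A parser of this kind might look like the sketch below. The text does not specify the fallback rule, so the one used here (take the first decimal-looking substring of a malformed entry) is purely hypothetical, as are the function name `parse_forecast`, the comma-separated layout, and the V.FFFF pattern with one integer digit.

```python
import re
from typing import Optional

STRICT = re.compile(r"-?\d\.\d{4}")     # assumed V.FFFF layout with optional sign
LOOSE = re.compile(r"-?\d+\.\d+")       # hypothetical fallback: first decimal-looking substring


def parse_forecast(text: str, horizon: int) -> list[Optional[float]]:
    """Parse a comma-separated forecast; None marks a step that could not be parsed."""
    values: list[Optional[float]] = []
    for piece in [p.strip() for p in text.split(",")][:horizon]:
        if STRICT.fullmatch(piece):
            values.append(float(piece))             # well-formed fixed-precision value
        else:
            match = LOOSE.search(piece)             # deterministic fallback
            values.append(float(match.group()) if match else None)
    values += [None] * (horizon - len(values))      # missing steps count as invalid
    return values


print(parse_forecast("0.3918, 0.38x17, abc, -0.4148", horizon=5))
```

Returning `None` rather than a default number keeps invalid steps visible, so the evaluation protocol can decide how to count them.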

#### Evaluation metrics\.

Forecasting accuracy is measured using Mean Absolute Error \(MAE\) and Root Mean Squared Error \(RMSE\) across multiple prediction horizons\. Metrics are computed after inverting dataset\-specific normalization when applicable, following the evaluation protocol of the corresponding benchmarks\.
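The evaluation step can be sketched as follows. The z-score form of `denormalize` and the numbers in the example are assumptions for illustration; the actual inverse transform is dataset-specific and follows each benchmark's protocol.

```python
import numpy as np


def denormalize(z, mean, std):
    """Invert an assumed z-score normalization: x = z * std + mean."""
    return np.asarray(z, dtype=float) * std + mean


def mae(y_true, y_pred):
    """Mean Absolute Error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Toy normalized targets and predictions (made-up numbers).
y_true = denormalize([0.0, 1.0, 2.0], mean=10.0, std=2.0)
y_pred = denormalize([0.0, 1.0, 4.0], mean=10.0, std=2.0)
print(mae(y_true, y_pred), rmse(y_true, y_pred))
```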

#### Baselines and fairness\.

We compare TempoWave\-enhanced LLMs against a comprehensive set of baselines\. \(i\) LLM\-based baselines\. These methods use the same prompt templates and contextual inputs as TempoWave and differ only in the numeric interface, including standard tokenization and alternative input adaptation strategies\. \(ii\) Time\-series\-specific baselines\. We also report results from established forecasting models that primarily operate on numerical history, including DLinear Zeng et al\. \([2023](https://arxiv.org/html/2606.26487#bib.bib132)\), N\-BEATS Oreshkin et al\. \([2019](https://arxiv.org/html/2606.26487#bib.bib257)\), Informer Zhou et al\. \([2021](https://arxiv.org/html/2606.26487#bib.bib299)\), Autoformer Wu et al\. \([2021](https://arxiv.org/html/2606.26487#bib.bib312)\), and TimesNet Wu et al\. \([2023](https://arxiv.org/html/2606.26487#bib.bib331)\), as well as large\-scale time series foundation models such as Chronos Ansari et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib105)\) and Moirai Woo et al\. \([2024](https://arxiv.org/html/2606.26487#bib.bib104)\)\. \(iii\) Context\-aware and embedding\-interface baselines\. We include ChatTime Wang et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib126)\) as a representative multimodal LLM system for forecasting with text context, and FoNE Zhou et al\. \([2025](https://arxiv.org/html/2606.26487#bib.bib81)\) as an alternative numerical embedding interface\.

### 4\.2 Main Results

Table [1](https://arxiv.org/html/2606.26487#S3.T1) summarizes forecasting performance on five context\-enriched benchmarks spanning news\-driven series \(AUL, BIT\) and sensor or infrastructure series \(MSPG, PTF, LEU\)\. Overall, TempoWave establishes a new state of the art on 7 out of 10 reported metrics and achieves top\-2 performance on all metrics\. Relative to the previous best method per metric \(last row of Table [1](https://arxiv.org/html/2606.26487#S3.T1)\), TempoWave yields an average 7\.0% relative improvement on MAE across datasets, with the largest gains on LEU \(14\.4%\) and AUL \(11\.2%\)\.

#### Dataset\-wise improvements and robustness\.

TempoWave delivers the most consistent gains on news\-driven datasets\. On AUL, TempoWave improves both RMSE and MAE over the previous best by 7\.3% and 11\.2%, respectively\. On BIT, TempoWave achieves 4\.7% \(RMSE\) and 6\.3% \(MAE\) improvements over the previous best\. On MSPG, TempoWave achieves the best RMSE \(1\.9% improvement\) while remaining close to the best MAE \(within 2\.0% relative to the previous best\)\. For PTF and LEU, TempoWave attains the best MAE \(5\.3% and 14\.4% improvements\), and achieves the second\-best RMSE with small absolute gaps to the best baseline \(0\.0096 on PTF, 0\.0124 on LEU\)\.
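The per-dataset MAE figures quoted in this section are consistent with the reported 7.0% average under one reading. The check below assumes that the MSPG entry enters the average as a 2.0% shortfall (a negative improvement), which is an interpretation of "within 2.0% relative to the previous best" rather than a number stated in the text.

```python
# Relative MAE improvement over the previous best, in percent, as quoted in the text.
mae_gain_percent = {
    "AUL": 11.2,
    "BIT": 6.3,
    "MSPG": -2.0,   # assumption: the 2.0% gap to the best MAE counted as a negative gain
    "PTF": 5.3,
    "LEU": 14.4,
}

average = sum(mae_gain_percent.values()) / len(mae_gain_percent)
print(f"{average:.2f}")
```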

#### MAE improves more consistently than RMSE\.

A recurring pattern is that TempoWave improves MAE more consistently than RMSE\. In particular, TempoWave reduces MAE on 4/5 datasets, while RMSE improvements are observed on 3/5 datasets\. This discrepancy is expected because RMSE emphasizes rare large deviations, whereas MAE better reflects typical per\-step errors\.
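To make the difference between the two metrics concrete, the following minimal sketch compares MAE and RMSE on two hypothetical error sequences that differ only in one rare large miss\. The data and function names are illustrative and are not taken from the paper's experiments\.

```python
import math

def mae(errors):
    """Mean absolute error over a list of per-step errors."""
    return sum(abs(e) for e in errors) / len(errors)

def rmse(errors):
    """Root mean squared error over a list of per-step errors."""
    return math.sqrt(sum(e * e for e in errors) / len(errors))

# Hypothetical per-step errors: identical except for one rare large miss.
typical = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]
with_spike = typical[:-1] + [1.0]

for name, errs in [("typical", typical), ("with_spike", with_spike)]:
    print(f"{name}: MAE={mae(errs):.4f} RMSE={rmse(errs):.4f}")

# Squaring weights the single spike more heavily, so RMSE grows by a
# larger factor than MAE when the spike is introduced.
mae_ratio = mae(with_spike) / mae(typical)
rmse_ratio = rmse(with_spike) / rmse(typical)
print(f"MAE grows x{mae_ratio:.2f}, RMSE grows x{rmse_ratio:.2f}")
```

A method that lowers typical per\-step error but still misses occasional extreme values can therefore improve MAE without improving RMSE\.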

#### Comparison to numeric interfaces and time\-series baselines\.

Compared with the Fourier\-based numeric interface \(FoNE\) using the same LLM backbone, TempoWave is substantially more robust across domains\. The advantage is most prominent on BIT, where TempoWave reduces RMSE from 1\.71 to 0\.80 and MAE from 1\.52 to 0\.70, indicating improved generalization under highly non\-stationary and event\-driven dynamics\. Finally, TempoWave\-enhanced LLMs outperform classic time\-series forecasting models across all datasets in Table [1](https://arxiv.org/html/2606.26487#S3.T1), highlighting the benefit of combining external context with a numerically grounded embedding interface\.

## 5Analysis

### 5\.1Ablation Study on Contextual Information

To better understand how TempoWave interacts with different forms of contextual information in time series forecasting, we conduct a systematic ablation study over four context configurations, as summarized in Table [2](https://arxiv.org/html/2606.26487#S3.T2)\. Across five diverse datasets \(AUL, BIT, MSPG, PTF, and LEU\), we progressively remove components from the full context setting to isolate their individual and combined effects\.

#### Overall impact of contextual information\.

The results show a clear and consistent trend: incorporating richer contextual information leads to improved forecasting performance across all datasets\. The full context setting achieves the best RMSE on all five datasets and the best MAE on four out of five datasets\. In contrast, removing all contextual information results in the weakest performance, indicating that TempoWave alone, while effective, benefits substantially from complementary contextual signals\. This trend is particularly pronounced on news\-driven datasets such as AUL and BIT, where RMSE improves from 0\.3809 to 0\.3391 on AUL and from 0\.8356 to 0\.7979 on BIT when moving from no context to full context\.
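The RMSE values quoted above can be turned into relative reductions with a one\-line formula\. The helper below is a minimal sketch \(its name is hypothetical\) that uses only the no\-context and full\-context numbers stated in the text\.

```python
def relative_reduction_pct(baseline, improved):
    """Percentage by which `improved` is lower than `baseline`."""
    return 100.0 * (baseline - improved) / baseline

# RMSE values quoted in the text, as (no context, full context).
rmse_pairs = {"AUL": (0.3809, 0.3391), "BIT": (0.8356, 0.7979)}

for dataset, (no_ctx, full_ctx) in rmse_pairs.items():
    reduction = relative_reduction_pct(no_ctx, full_ctx)
    print(f"{dataset}: {reduction:.1f}% lower RMSE with full context")
```

This works out to roughly an 11% RMSE reduction on AUL and roughly 4\.5% on BIT\.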

#### Contribution of different context components\.

Comparing partial ablations reveals that different types of context contribute in distinct and complementary ways\. Removing Catch22 features \(*w/o Catch22*\) leads to noticeable degradation across most datasets, suggesting that statistical descriptors capturing autocorrelation, periodicity, and distributional properties provide strong global signals for forecasting\. Indeed, the *w/o Catch22* setting consistently underperforms the full context configuration\. Conversely, removing situational context \(*w/o situational context*\) primarily affects datasets with strong external dependencies, such as AUL and BIT, where date and domain\-related information play a more prominent role\.

#### Dataset\-specific behavior\.

The relative importance of context components varies across datasets\. For infrastructure and sensor\-driven datasets \(MSPG, PTF, and LEU\), Catch22 features alone already provide strong performance, in some cases matching or approaching the full context results\. For example, on MSPG, the *w/o situational context* setting achieves the best MAE \(0\.1901\), indicating that short\-term statistical regularities dominate forecasting performance\. In contrast, on AUL and BIT, which are influenced by external events and news, the full context setting yields the largest gains, highlighting the importance of integrating situational and textual information with TempoWave\.

![Refer to caption](https://arxiv.org/html/2606.26487v1/figs/Token_TempoWave.png)
Figure 2: Token ID difference distribution between predicted tokens and their reference counterparts for the TempoWave\-embedded Qwen 2\.5 1\.5B model, under the top\-10 prediction setting\. The histogram illustrates raw frequency, while the smoothed curve highlights the overall trend\. The sharp concentration around zero indicates strong local proximity in token prediction\.

![Refer to caption](https://arxiv.org/html/2606.26487v1/figs/Token_FoNE.png)
Figure 3: Token ID difference distribution between predicted tokens and their reference counterparts for the FoNE\-embedded Qwen 2\.5 1\.5B model \(baseline\), under the top\-10 prediction setting\. The histogram illustrates raw frequency, while the smoothed curve highlights the overall trend\. The sharp concentration around zero indicates strong local proximity in token prediction\.

![Refer to caption](https://arxiv.org/html/2606.26487v1/figs/Token_Chattime.png)
Figure 4: Token ID difference distribution between predicted tokens and their reference counterparts for the ChatTime\-7B\-Chat model \(baseline\), under the top\-10 prediction setting\. The histogram illustrates raw frequency, while the smoothed curve highlights the overall trend\. The sharp concentration around zero indicates strong local proximity in token prediction\.

### 5\.2Embedding Alignment via Next Token Proximity

To evaluate the semantic and structural alignment of different embedding strategies, we analyze the distribution of token ID proximity between the model’s predicted next token and the immediately preceding token in the input prompt\. This probing task is particularly informative in our setting, where tokens represent numerical values derived from time series data\. A well\-structured embedding should induce a smooth, symmetric distribution reflecting temporal continuity\. Our method, as shown in Figure [2](https://arxiv.org/html/2606.26487#S5.F2), exhibits a clear unimodal, approximately Gaussian distribution centered around zero, indicating that the model learns to predict numerically coherent tokens aligned with the underlying time series dynamics\. In contrast, the FoNE baseline in Figure [3](https://arxiv.org/html/2606.26487#S5.F3), evaluated with the same backbone but without an explicit numeric inductive bias, exhibits a flatter and more irregular distribution, indicating weaker alignment with underlying numeric trends\. More notably, a standard pretrained baseline without our embedding augmentation in Figure [4](https://arxiv.org/html/2606.26487#S5.F4) exhibits a sharp, anomalous spike in one bin, revealing a tendency to overfit by repeatedly predicting a fixed token, regardless of local context\. These results underscore the effectiveness of our embedding approach in capturing latent numerical semantics and encoding smooth transitions that mirror real\-world time series behavior\.
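The probing statistic described above can be sketched as a histogram of signed token ID gaps\. This is only an illustration under stated assumptions: the exact probing procedure is not specified in this section, the function names are hypothetical, the IDs are toy values, and the sketch assumes that numeric tokens with nearby IDs encode nearby values\.

```python
from collections import Counter

def token_id_differences(topk_predictions, reference_ids):
    """Signed ID gaps between each top-k candidate and its reference token.

    topk_predictions: list of lists of candidate token IDs, one list per position.
    reference_ids: list of reference token IDs, one per position.
    """
    diffs = []
    for candidates, ref in zip(topk_predictions, reference_ids):
        diffs.extend(cand - ref for cand in candidates)
    return diffs

def histogram(diffs, bin_width=1):
    """Count differences per bin; each bin is labelled by its lower edge."""
    return Counter((d // bin_width) * bin_width for d in diffs)

# Toy data (hypothetical IDs, top-3 instead of top-10 for brevity).
topk = [[100, 101, 99], [205, 204, 207], [51, 50, 52]]
refs = [100, 205, 51]

counts = histogram(token_id_differences(topk, refs))
for gap in sorted(counts):
    # Text bar chart: one '#' per occurrence of this gap.
    print(f"{gap:+d}: {'#' * counts[gap]}")
```

A distribution concentrated near zero in such a histogram corresponds to the unimodal shape described for Figure 2, while a single dominant bin far from zero would correspond to the fixed\-token behavior described for Figure 4\.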

## 6Conclusion

While directly applying large language models \(LLMs\) to time series analysis remains challenging due to the mismatch between continuous values and discrete token interfaces, the potential payoff is substantial\. In this paper, we proposed Multi\-Wavelet Number Embedding \(TempoWave\), a numerically grounded embedding interface that leverages multi\-resolution wavelet features to bridge the numerical–textual modality gap for time series forecasting\. Extensive experiments on five diverse benchmarks show that TempoWave consistently improves LLM\-based forecasters, outperforming strong specialized time series models and alternative numeric embedding approaches in most settings\. Empirically, TempoWave is more robust under non\-stationarity and extreme values, and exhibits favorable optimization behavior, including smoother training dynamics, resilience to digit\-level perturbations, and stable interaction with common normalization layers\. Ablation results further highlight that contextual information is complementary to TempoWave and contributes to the strongest overall performance\. Together, these findings advance LLM\-based forecasting by coupling LLMs’ contextual reasoning with a more faithful numeric interface\. A promising direction for future work is to investigate whether TempoWave also benefits non\-contextual forecasting pipelines that rely on discretization or binning\-based tokenization of time series values\.

## Ethical Statement

There are no ethical issues\.

## Acknowledgements

This work is partially supported by the NSF Award \#2425919, and NSF Award \#2413417\. The funding from these sources has been a cornerstone in enabling us to bring our project to fruition\. We are also deeply grateful to the anonymous reviewers for their rigorous review process\. Their detailed comments and constructive suggestions have significantly contributed to the improvement of this paper\.

## References

- A\. F\. Ansari, L\. Stella, C\. Turkmen, X\. Zhang, P\. Mercado, H\. Shen, O\. Shchur, S\. S\. Rangapuram, S\. P\. Arango, S\. Kapoor, et al\. \(2024\) Chronos: learning the language of time series\. arXiv preprint arXiv:2403\.07815\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px1.p1.1), [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- D\. Cao, J\. Enouen, Y\. Wang, X\. Song, C\. Meng, H\. Niu, and Y\. Liu \(2023a\) Estimating treatment effects from irregular time series observations with hidden confounders\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 37, pp\. 6897–6905\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- D\. Cao, M\. Gee, J\. Liu, H\. Wang, W\. Yang, R\. Wang, and Y\. Liu \(2025\) Conversational time series foundation models: towards explainable and effective forecasting\. arXiv preprint arXiv:2512\.16022\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- D\. Cao, F\. Jia, S\. O\. Arik, T\. Pfister, Y\. Zheng, W\. Ye, and Y\. Liu \(2024a\) TEMPO: prompt\-based generative pre\-trained transformer for time series forecasting\. In The Twelfth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=YH5w12OUuU)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1)\.
- D\. Cao, J\. Li, H\. Ma, and M\. Tomizuka \(2021\) Spectral temporal graph neural network for trajectory prediction\. In 2021 IEEE International Conference on Robotics and Automation \(ICRA\), pp\. 1839–1845\. External Links: [Document](https://dx.doi.org/10.1109/ICRA48506.2021.9561461)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- D\. Cao, Y\. Wang, J\. Duan, C\. Zhang, X\. Zhu, C\. Huang, Y\. Tong, B\. Xu, J\. Bai, J\. Tong, et al\. \(2020\) Spectral temporal graph neural network for multivariate time\-series forecasting\. Advances in neural information processing systems 33, pp\. 17766–17778\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- D\. Cao, W\. Ye, Y\. Zhang, S\. Griesemer, and Y\. Liu \(2026\) PINFDit: energy\-based physics\-informed diffusion transformers for general\-purpose time series tasks\. In The Fourteenth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=EphTlUJ4XN)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1)\.
- D\. Cao, W\. Ye, Y\. Zhang, and Y\. Liu \(2024b\) Timedit: general\-purpose diffusion transformers for time series foundation model\. arXiv preprint arXiv:2409\.02322\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px1.p1.1)\.
- D\. Cao, Y\. Zheng, P\. Hassanzadeh, S\. Lamba, X\. Liu, and Y\. Liu \(2023b\) Large scale financial time series forecasting with multi\-faceted model\. In Proceedings of the Fourth ACM International Conference on AI in Finance, ICAIF ’23, New York, NY, USA, pp\. 472–480\. External Links: ISBN 9798400702402, [Link](https://doi.org/10.1145/3604237.3626868), [Document](https://dx.doi.org/10.1145/3604237.3626868)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- A\. Das, W\. Kong, R\. Sen, and Y\. Zhou \(2024\) A decoder\-only foundation model for time\-series forecasting\. In Forty\-first International Conference on Machine Learning\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px1.p1.1)\.
- N\. Gillman, D\. Aggarwal, M\. Freeman, and C\. Sun \(2025\) Fourier head: helping large language models learn complex probability distributions\. In The Thirteenth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=4hPwLg7zD3)\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- M\. Goswami, K\. Szafer, A\. Choudhry, Y\. Cai, S\. Li, and A\. Dubrawski \(2024\) MOMENT: a family of open time\-series foundation models\. In International Conference on Machine Learning, pp\. 16115–16152\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- N\. Gruver, M\. Finzi, S\. Qiu, and A\. G\. Wilson \(2024\) Large language models are zero\-shot time series forecasters\. Advances in Neural Information Processing Systems 36\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- Y\. Hu, Q\. Li, D\. Zhang, J\. Yan, and Y\. Chen \(2025\) Context\-alignment: activating and enhancing LLMs capabilities in time series\. In The Thirteenth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=syC2764fPc)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p2.1)\.
- F\. Jia, K\. Wang, Y\. Zheng, D\. Cao, and Y\. Liu \(2024\) GPT4MTS: prompt\-based large language model for multimodal time\-series forecasting\. In The 14th Symposium on Educational Advances in Artificial Intelligence \(EAAI\-24\)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1), [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px2.p1.1)\.
- M\. Jin, S\. Wang, L\. Ma, Z\. Chu, J\. Y\. Zhang, X\. Shi, P\. Chen, Y\. Liang, Y\. Li, S\. Pan, and Q\. Wen \(2024\) Time\-LLM: time series forecasting by reprogramming large language models\. In The Twelfth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=Unb5CVPtae)\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px2.p1.1)\.
- J\. Li, D\. Cao, L\. Li, W\. Yang, Y\. Qin, C\. Yu, T\. Yang, R\. A\. Rossi, Y\. Liu, X\. Hu, et al\. \(2026\) “Someone hid it\!”: query\-agnostic black\-box attacks on LLM\-based retrieval\. In Forty\-third International Conference on Machine Learning\. External Links: [Link](https://openreview.net/forum?id=bzmt9wJ6uW)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1)\.
- Y\. Liu, H\. Wu, J\. Wang, and M\. Long \(2022\) Non\-stationary transformers: exploring the stationarity in time series forecasting\. In Advances in Neural Information Processing Systems\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- C\. H\. Lubba, S\. S\. Sethi, P\. Knaute, S\. R\. Schultz, B\. D\. Fulcher, and N\. S\. Jones \(2019\) Catch22: canonical time\-series characteristics: selected through highly comparative time\-series analysis\. Data mining and knowledge discovery 33\(6\), pp\. 1821–1852\. Cited by: [§3\.1](https://arxiv.org/html/2606.26487#S3.SS1.SSS0.Px3.p1.1)\.
- M\. A\. Merrill, M\. Tan, V\. Gupta, T\. Hartvigsen, and T\. Althoff \(2024\) Language models still struggle to zero\-shot reason about time series\. In EMNLP \(Findings\)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p3.1), [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- Y\. Nie, N\. H\. Nguyen, P\. Sinthong, and J\. Kalagnanam \(2023\) A time series is worth 64 words: long\-term forecasting with transformers\. In International Conference on Learning Representations \(ICLR ’23\)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1), [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- OpenAI \(2023\) GPT\-4 technical report\. External Links: 2303\.08774\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p2.1)\.
- B\. N\. Oreshkin, D\. Carpov, N\. Chapados, and Y\. Bengio \(2019\) N\-beats: neural basis expansion analysis for interpretable time series forecasting\. arXiv preprint arXiv:1905\.10437\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- S\. J\. Talukder, Y\. Yue, and G\. Gkioxari \(2024\) TOTEM: tokenized time series embeddings for general time series analysis\. Transactions on Machine Learning Research\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1), [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- C\. Wang, Q\. Qi, J\. Wang, H\. Sun, Z\. Zhuang, J\. Wu, L\. Zhang, and J\. Liao \(2025\) Chattime: a unified multimodal time series foundation model bridging numerical and textual data\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 12694–12702\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px1.p1.1), [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- X\. Wang, C\. Chang, D\. Cao, K\. Han, F\. Sun, Y\. Huang, M\. Wang, C\. Xu, X\. Luo, R\. Yan, et al\. \(2026a\) Position: beyond prediction: toward verifiable physiological waveform reasoning with foundation models and agentic LLMs\. In Forty\-third International Conference on Machine Learning Position Paper Track\. External Links: [Link](https://openreview.net/forum?id=cgpU6fhUXx)\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1)\.
- X\. Wang, K\. Han, Y\. Xu, X\. Luo, Y\. Sun, W\. Wang, and C\. Yang \(2026b\) SE\-diff: simulator and experience enhanced diffusion model for comprehensive ecg generation\. In The Fourteenth International Conference on Learning Representations\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- X\. Wang, M\. Feng, J\. Qiu, J\. Gu, and J\. Zhao \(2024\) From news to forecast: integrating event analysis in llm\-based time series forecasting with reflection\. In Neural Information Processing Systems\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px1.p2.1)\.
- M\. Weng, D\. Cao, W\. Yang, Y\. Sharma, and Y\. Liu \(2026\) Temporalbench: a benchmark for evaluating llm\-based agents on contextual and event\-informed time series tasks\. arXiv preprint arXiv:2602\.13272\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px2.p1.1)\.
- G\. Woo, C\. Liu, A\. Kumar, C\. Xiong, S\. Savarese, and D\. Sahoo \(2024\) Unified training of universal time series forecasting transformers\. In Forty\-first International Conference on Machine Learning\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px1.p1.1), [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- H\. Wu, T\. Hu, Y\. Liu, H\. Zhou, J\. Wang, and M\. Long \(2023\) TimesNet: temporal 2d\-variation modeling for general time series analysis\. In The Eleventh International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=ju_Uqw384Oq)\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- H\. Wu, J\. Xu, J\. Wang, and M\. Long \(2021\) Autoformer: decomposition transformers with auto\-correlation for long\-term series forecasting\. In Advances in Neural Information Processing Systems \(NeurIPS\), pp\. 101–112\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- W\. Yang, D\. Cao, and Y\. Liu \(2025a\) Foundation models for demand forecasting via dual\-strategy ensembling\. arXiv preprint arXiv:2507\.22053\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px1.p1.1)\.
- W\. Yang, D\. Cao, J\. Pang, M\. Weng, and Y\. Liu \(2026\) Adaptive collaboration with humans: metacognitive policy optimization for multi\-agent LLMs with continual learning\. In The Fourteenth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=IKVUB9Exuc)\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px2.p1.1)\.
- W\. Yang, M\. Weng, J\. Pang, D\. Cao, H\. Ping, P\. Zhang, S\. Li, Y\. Zhao, Q\. Yang, M\. Wang, et al\. \(2025b\) Toward evolutionary intelligence: llm\-based agentic systems with multi\-agent reinforcement learning\. Available at SSRN 5819182\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px2.p1.1)\.
- W\. Ye, J\. Liu, D\. Cao, W\. Yang, and Y\. Liu \(2025\) When llm meets time series: can llms perform multi\-step time series reasoning and inference\. arXiv preprint arXiv:2509\.01822\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p3.1)\.
- W\. Ye, W\. Yang, D\. Cao, Y\. Zhang, L\. Tang, J\. Cai, and Y\. Liu \(2026\) TS\-reasoner: domain\-oriented time series inference agents for reasoning and automated analysis\. Transactions on Machine Learning Research\. External Links: ISSN 2835\-8856, [Link](https://openreview.net/forum?id=yhy7Vigjcf)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p4.1)\.
- A\. Zeng, M\. Chen, L\. Zhang, and Q\. Xu \(2023\) Are transformers effective for time series forecasting?\. In Proceedings of the AAAI conference on artificial intelligence, Vol\. 37, pp\. 11121–11128\. Cited by: [§2](https://arxiv.org/html/2606.26487#S2.SS0.SSS0.Px3.p1.1), [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- Y\. Zhang, D\. Cao, and Y\. Liu \(2022\) Counterfactual neural temporal point process for estimating causal influence of misinformation on social media\. Advances in Neural Information Processing Systems 35, pp\. 10643–10655\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p1.1)\.
- Y\. Zhang, L\. Du, D\. Cao, Q\. Fu, and Y\. Liu \(2024\) Guiding large language models with divide\-and\-conquer program for discerning problem solving\. arXiv preprint arXiv:2402\.05359\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p2.1)\.
- H\. Zhou, S\. Zhang, J\. Peng, S\. Zhang, J\. Li, H\. Xiong, and W\. Zhang \(2021\) Informer: beyond efficient transformer for long sequence time\-series forecasting\. In Proceedings of AAAI\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- T\. Zhou, D\. Fu, M\. Soltanolkotabi, R\. Jia, and V\. Sharan \(2025\) FoNE: precise single\-token number embeddings via fourier features\. arXiv preprint arXiv:2502\.09741\. Cited by: [§4\.1](https://arxiv.org/html/2606.26487#S4.SS1.SSS0.Px5.p1.1)\.
- Z\. Zhou and R\. Yu \(2025\) Can LLMs understand time series anomalies?\. In The Thirteenth International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=LGafQ1g2D2)\. Cited by: [§1](https://arxiv.org/html/2606.26487#S1.p2.1)\.
