BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting
Summary
The paper proposes BatteryMFormer, a multi-level Transformer for early battery degradation trajectory forecasting that integrates aging-condition-aware decoding, meta degradation pattern memory, and dual-view encoding to capture multi-level degradation structures and SOC-localized variations, consistently outperforming state-of-the-art baselines across four battery domains.
# BatteryMFormer: Multi-level Learning for Battery Degradation Trajectory Forecasting

Source: [https://arxiv.org/html/2605.27044](https://arxiv.org/html/2605.27044)

- Ruifeng Tan, Sustainable Energy and Environment Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, rtan474@connect.hkust-gz.edu.cn
- Jintao Dong, School of Computer Science and Engineering, Central South University, Changsha, China, jintaodong@csu.edu.cn
- Weixiang Hong, Sustainable Energy and Environment Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, whong719@connect.hkust-gz.edu.cn
- Jia Li, Data Science and Analytics Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, jialee@ust.hk
- Jiaqiang Huang, Sustainable Energy and Environment Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, seejhuang@hkust-gz.edu.cn
- Tong-Yi Zhang, Material Genome Institute, Shanghai University, Shanghai, China; Advanced Materials Thrust and Sustainable Energy and Environment Thrust, The Hong Kong University of Science and Technology (Guangzhou), Guangzhou, China, mezhangt@hkust-gz.edu.cn

(2026)

###### Abstract

Early battery degradation trajectory forecasting (BDTF), which predicts the full-life state-of-health trajectory from early operational data, is critical for battery optimization, manufacturing, and deployment. Battery degradation data exhibit two key characteristics. First, degradation data present a multi-level structure, including regularities shared within aging conditions and trajectory patterns shared across batteries. Second, degradation-related variations in voltage-current profiles are often localized to specific state of charge (SOC) intervals. Existing approaches often fail to explicitly model these characteristics. To bridge this gap, we propose BatteryMFormer, a multi-level Transformer for early BDTF. BatteryMFormer integrates (1) an aging-condition-aware decoder that injects aging-condition priors via aging-condition-informed queries and aging-condition-aware attention, (2) a meta degradation pattern memory that learns and retrieves trajectory prototypes to guide long-horizon forecasting, and (3) a dual-view encoder that jointly captures temporal dynamics and SOC-localized variations from voltage and current time series. Extensive experiments on four battery domains show that BatteryMFormer consistently outperforms state-of-the-art baselines, marking a significant step toward reliable BDTF. Our code is available at [https://github.com/Ruifeng-Tan/BatteryMFormer](https://github.com/Ruifeng-Tan/BatteryMFormer).
Keywords: materials informatics, battery informatics, time series

Journal year: 2026. Copyright: cc. Conference: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD ’26), August 09–13, 2026, Jeju Island, Republic of Korea. DOI: 10.1145/3770855.3818948. ISBN: 979-8-4007-2259-2/2026/08. CCS: Information systems → Data mining.

## 1. Introduction

Rechargeable batteries are ubiquitous in modern industry, powering applications ranging from electric vehicles and grid-scale energy storage to portable electronics (Huang et al., [2022](https://arxiv.org/html/2605.27044#bib.bib74); Tao et al., [2023](https://arxiv.org/html/2605.27044#bib.bib34); Zhang et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib15); Tan et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib14)). In 2024, global battery shipments exceeded 1545 GWh and are projected to reach 4700 GWh by 2030 (Zheng et al., [2026](https://arxiv.org/html/2605.27044#bib.bib102); Fleischmann et al., [2023](https://arxiv.org/html/2605.27044#bib.bib17)). This rapid expansion highlights the need for advanced modeling frameworks to support battery optimization, manufacturing, and deployment (Li et al., [2025](https://arxiv.org/html/2605.27044#bib.bib18); Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Zhang et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib48); Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71); Attia et al., [2020](https://arxiv.org/html/2605.27044#bib.bib62)). In particular, battery degradation trajectory forecasting (BDTF), which predicts battery state-of-health (SOH) trajectories from beginning of life to end of life, occupies a critical frontier.
By forecasting the full-life degradation trajectory from early-stage operational data, BDTF enables accelerated degradation assessment and timely maintenance for battery-powered systems (Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22); Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19)).

Machine learning (ML) models have recently emerged as promising solutions to BDTF. Existing approaches primarily fall into feature-engineering-based methods and representation-learning-based methods. Feature-engineering-based methods design descriptors from voltage and current time series (Figure [1](https://arxiv.org/html/2605.27044#S1.F1)a) using domain knowledge (Tao et al., [2025](https://arxiv.org/html/2605.27044#bib.bib82); Li et al., [2024a](https://arxiv.org/html/2605.27044#bib.bib90); Meng et al., [2024](https://arxiv.org/html/2605.27044#bib.bib11)), but these features are often protocol-specific or dataset-specific and may be unavailable or ineffective across diverse aging conditions. Representation-learning-based methods instead focus on learning mappings from raw measurements to future SOH trajectories (Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22), [2022](https://arxiv.org/html/2605.27044#bib.bib83); Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Liu et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib8); Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19); Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98); Huang et al., [2024](https://arxiv.org/html/2605.27044#bib.bib9); Shen et al., [2025](https://arxiv.org/html/2605.27044#bib.bib89)). An intuitive modeling choice is to treat BDTF as generic time-series forecasting and extrapolate future SOH from historical SOH using generic time series forecasters (e.g., Informer (Zhou et al., [2021](https://arxiv.org/html/2605.27044#bib.bib79))) (Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22), [2022](https://arxiv.org/html/2605.27044#bib.bib83); Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Liu et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib8); Shen et al., [2025](https://arxiv.org/html/2605.27044#bib.bib89)). While this approach is effective in some settings, early-cycle SOH can be nearly indistinguishable across batteries whose long-horizon trajectories diverge substantially, so forecasting with SOH as the only input can be unsuitable for early BDTF (Figure [1](https://arxiv.org/html/2605.27044#S1.F1)b). This limitation has motivated growing interest in models that exploit fine-grained voltage–current profiles for forecasting (Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19); Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98); Huang et al., [2024](https://arxiv.org/html/2605.27044#bib.bib9)).

Figure 1. Motivation for multi-level learning in BDTF. (a) An example of partial operational voltage and current time series. (b) SOH trajectories under different aging conditions. (c) Schematic of three canonical trajectory shapes. (d) Examples of additional trajectory phenomena beyond the canonical shapes.

Despite these advances, current models still exhibit two critical research gaps. First, these methods operate at the *battery level* and do not explicitly model the *multi-level structure* of degradation.
Batteries under the same aging condition (e.g., specifications, formation, and operating conditions) exhibit consistent operational patterns, and prior work shows that batteries under similar aging conditions can be characterized by a small set of handcrafted descriptors (Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71); Weng et al., [2021](https://arxiv.org/html/2605.27044#bib.bib94); Kim et al., [2023](https://arxiv.org/html/2605.27044#bib.bib92); Tao et al., [2025](https://arxiv.org/html/2605.27044#bib.bib82); Li et al., [2022](https://arxiv.org/html/2605.27044#bib.bib83)). However, existing models fail to promote *aging-condition-consistent* representations. Moreover, although trajectories appear diverse, established battery knowledge (Attia et al., [2022](https://arxiv.org/html/2605.27044#bib.bib10)) suggests that their *global shapes are highly structured* and often fall into a small family of patterns linked to common mechanisms (Figure [1](https://arxiv.org/html/2605.27044#S1.F1)c). Additional phenomena such as initial capacity rise (Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71)) and capacity regeneration (Huang et al., [2024](https://arxiv.org/html/2605.27044#bib.bib9)) can occur (Figure [1](https://arxiv.org/html/2605.27044#S1.F1)d), but the space of plausible trajectories remains constrained. Second, degradation-relevant variations in voltage–current profiles often concentrate within specific SOC intervals (Figure [2](https://arxiv.org/html/2605.27044#S2.F2)), because underlying electrochemical mechanisms (e.g., phase transition) can manifest as localized signal variations along the SOC axis (Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Birkl et al., [2017](https://arxiv.org/html/2605.27044#bib.bib24)). Nevertheless, most methods either emphasize temporal modeling or treat SOC intervals uniformly, diluting localized signals.

To address these limitations, we propose BatteryMFormer (Battery Multi-level Transformer), a novel deep learning architecture that integrates multi-level learning across aging conditions, trajectory patterns, and battery-specific representations. BatteryMFormer consists of three major components: (1) an aging-condition-aware decoder that injects aging-condition priors via aging-condition-informed queries and aging-condition-aware attention to promote aging-condition-consistent representations; (2) a meta degradation pattern memory that learns and retrieves prototypical trajectory patterns to guide long-horizon forecasting; and (3) a dual-view encoder that captures complementary temporal dynamics and SOC-localized variations from voltage-current profiles.

The main contributions of this paper are summarized as follows:

- We identify and formalize the multi-level structure of early BDTF, including aging-condition regularities, trajectory patterns shared across batteries, and SOC-localized degradation signatures in operational data.
- We propose BatteryMFormer, a multi-level Transformer that integrates (i) an aging-condition-aware decoder, (ii) a meta degradation pattern memory, and (iii) a dual-view encoder with temporal and SOC perspectives.
- We conduct extensive experiments whose results demonstrate the superior performance of our approach across four battery domains from the largest public real-world battery lifetime database.

## 2. Preliminaries

### 2.1. Aging Condition

We use aging condition to denote the recorded experimental settings and battery specifications that determine a battery’s degradation regime. In this work, an aging condition is represented as a tuple of aging factors, including positive electrode, negative electrode, electrolyte, package structure, nominal capacity, manufacturer, formation protocol, charge protocol, discharge protocol, and operating temperature.
Different factor tuples correspond to different aging conditions. Batteries operated under different aging conditions can exhibit distinct degradation trajectories (Figure [1](https://arxiv.org/html/2605.27044#S1.F1)b) and patterns of voltage–current profiles (Figure [2](https://arxiv.org/html/2605.27044#S2.F2)) (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98); Zhang et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib48); Tan et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib14)).

### 2.2. Degradation Trajectory

Degradation trajectories are measured from repeated cycles, with each having a charge and discharge process. Following prior work (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98); Ma et al., [2022](https://arxiv.org/html/2605.27044#bib.bib63); Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71)), we compute the discharge capacity of cycle $i$ as

$$Cap_i = \int_{t_1}^{t_2} |I(t)|\,dt, \tag{1}$$

where $t_1$ and $t_2$ denote the start and end times of the discharge process, and $I(t)$ is the measured current at time $t$, with $|I(t)|$ used to make the definition invariant to sign conventions. The state of health (SOH) at cycle $i$ is defined as

$$\mathrm{SOH}_i = \frac{Cap_i}{Cap_0 \times DoD}, \tag{2}$$

where $DoD$ is the depth of discharge, and $Cap_0$ denotes the nominal capacity for all datasets except CALB, where $Cap_0$ is defined as the first-cycle discharge capacity following the CALB protocol in BatteryLife (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)).

Figure 2. SOC-localized degradation signatures in voltage–current profiles. Voltage–SOC (top) and current–SOC (bottom) curves over the first 100 cycles for two representative aging conditions. Although the global profiles evolve smoothly with cycling, aging-induced deviations can concentrate within specific SOC intervals (dashed boxes).

### 2.3. Task Formulation

Following prior work (Zhang et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib48); Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71); Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), we use the first $S \leq 100$ cycles as the early stage and forecast the SOH trajectory beyond the observation window. We denote by $\boldsymbol{a}$ the available aging-condition metadata of a battery, including recorded experimental settings and specifications. Let $\mathbf{X}_i$ denote the cycle-$i$ operational data, consisting of voltage and current time series (and any auxiliary variables derived from $\boldsymbol{a}$ and these early-cycle measurements, e.g., capacity and SOC). We define the early input as ordered sequences

$$\mathbf{G}_{1:S} = \bigl(\mathbf{X}_{1:S},\,\boldsymbol{a}\bigr),\qquad \mathbf{X}_{1:S} = [\mathbf{X}_1,\ldots,\mathbf{X}_S]. \tag{3}$$

We use $t_{\mathrm{eol}}$ to denote the end-of-life (EOL) cycle index, defined as the first cycle at which $\mathrm{SOH}$ falls below a threshold $\tau$ (Appendix [A](https://arxiv.org/html/2605.27044#A1)). Let $\mathbf{y}_{1:t_{\mathrm{eol}}} \in \mathbb{R}^{t_{\mathrm{eol}}}$ denote the measured SOH trajectory. The goal of early BDTF is to learn a forecasting model $f(\cdot)$ that predicts the future SOH trajectory given the first $S$ cycles:

$$\hat{\mathbf{y}}_{S+1:t_{\mathrm{eol}}} = f(\mathbf{G}_{1:S}). \tag{4}$$

Figure 3. An overview of BatteryMFormer. Left: dual-view encoder (temporal and SOC views). Middle: aging-condition-aware decoder. Right: details of meta degradation pattern memory and aging-condition-aware attention.
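
As a rough illustration of Equations (1), (2), and (4), the following Python sketch computes discharge capacity by trapezoidal integration, converts it to SOH, and splits a measured trajectory into the early window and the forecasting target. The function names, the trapezoidal rule, the choice of hours and amperes as units, and the fallback when EOL is never reached are assumptions made for illustration; they are not taken from the paper, which defines the threshold $\tau$ in its Appendix A.

```python
def discharge_capacity(times_h, currents_a):
    """Approximate Eq. (1) with the trapezoidal rule (an assumed discretization).

    times_h: sample times of the discharge process, in hours.
    currents_a: measured current in amperes at those times.
    Returns capacity in ampere-hours; abs() removes the sign convention.
    """
    cap = 0.0
    for k in range(1, len(times_h)):
        dt = times_h[k] - times_h[k - 1]
        cap += 0.5 * (abs(currents_a[k - 1]) + abs(currents_a[k])) * dt
    return cap


def soh(cap_i, cap_0, dod=1.0):
    """Eq. (2): SOH_i = Cap_i / (Cap_0 * DoD)."""
    return cap_i / (cap_0 * dod)


def eol_index(soh_traj, tau):
    """First 1-based cycle index whose SOH falls below tau, or None."""
    for cycle, value in enumerate(soh_traj, start=1):
        if value < tau:
            return cycle
    return None


def split_early_future(soh_traj, s_early, tau):
    """Return (y_{1:S}, y_{S+1:t_eol}) as in Eq. (4).

    If the trajectory never drops below tau, the whole remainder is used
    as the target (a choice made for this sketch only).
    """
    t_eol = eol_index(soh_traj, tau)
    if t_eol is None:
        t_eol = len(soh_traj)
    return soh_traj[:s_early], soh_traj[s_early:t_eol]
```

For example, a constant 2 A discharge lasting one hour gives a capacity of 2 Ah, which corresponds to an SOH of 0.8 for a cell with a 2.5 Ah reference capacity and full depth of discharge.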

## 3. Methodology

Figure [3](https://arxiv.org/html/2605.27044#S2.F3) presents the overall architecture of BatteryMFormer, a Transformer with multi-level inductive biases for early BDTF. BatteryMFormer encodes early operational data into complementary temporal and SOC tokens via a dual-view encoder (Section [3.1](https://arxiv.org/html/2605.27044#S3.SS1)), refines these tokens with an aging-condition-aware decoder (Section [3.2](https://arxiv.org/html/2605.27044#S3.SS2)), and retrieves prototypical trajectory patterns from a meta degradation pattern memory (Section [3.3](https://arxiv.org/html/2605.27044#S3.SS3)) to guide long-horizon forecasting.

### 3.1. Dual-View Encoder

The dual-view encoder maps early operational data into temporal-view and SOC-view tokens. Following BatteryLife (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), we obtain within-cycle capacity via ampere-hour counting from current time series to encode temporal information and additionally compute SOC (Appendix [B](https://arxiv.org/html/2605.27044#A2)). After resampling each cycle to $L$ data points, the first $S$ cycles are represented as $\mathbf{X} \in \mathbb{R}^{S \times L \times 4}$ with variables (voltage, current, capacity, SOC).

SOC view. To capture SOC-localized degradation signatures (Figure [2](https://arxiv.org/html/2605.27044#S2.F2)), we construct SOC-view tokens by modeling cross-cycle evolution within each SOC interval. Given $\mathbf{X} \in \mathbb{R}^{S \times L \times 4}$, where each cycle contains $L$ SOC-aligned points and 4 variables, we reshape the $i$-th cycle as $\mathbf{X}_i \in \mathbb{R}^{4 \times L}$, treating variables as channels.
We then apply a 1D convolution along the SOC axis:

$$\hat{\mathbf{Z}}_i = \mathrm{Conv1D}(\mathbf{X}_i) \in \mathbb{R}^{d \times M},\qquad M = \left\lfloor \frac{L-P}{P} \right\rfloor + 1, \tag{5}$$

where $P$ is both the patch length and stride. Stacking all cycles yields $\hat{\mathbf{Z}} \in \mathbb{R}^{S \times d \times M}$. For each SOC interval $m$, we collect $\hat{\mathbf{Z}}_{:,:,m} \in \mathbb{R}^{S \times d}$ across cycles and feed it into a shared temporal encoder implemented with feed-forward neural networks and GELU activations (Hendrycks and Gimpel, [2016](https://arxiv.org/html/2605.27044#bib.bib85)). The encoder aggregates information along the cycle axis and produces one SOC token:

$$\mathbf{t}^{\mathrm{soc}}_m = \mathrm{TempEnc}(\hat{\mathbf{Z}}_{:,:,m}) \in \mathbb{R}^{d}. \tag{6}$$

Concatenating all interval tokens yields $\mathbf{T}^{\mathrm{soc}} = [\mathbf{t}_1^{\mathrm{soc}};\ldots;\mathbf{t}_M^{\mathrm{soc}}] \in \mathbb{R}^{M \times d}$.

Temporal view. In parallel to the SOC view, we construct a temporal view that summarizes each early cycle as a cycle-level token to capture intra-cycle dynamics.
Following CyclePatch (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), we project the resampled multivariate series of cycle $i$ into a $d$-dimensional embedding and refine it with an intra-cycle encoder:

$$\hat{\mathbf{X}}_i = \mathrm{CyclePatch}(\mathbf{X}_i) = \mathrm{Flatten}(\mathbf{X}_i)\mathbf{W} + \mathbf{b}, \tag{7}$$

$$\mathbf{z}^{\mathrm{temporal}}_i = \mathrm{Intra\text{-}CycleEncoder}(\hat{\mathbf{X}}_i),\qquad i = 1,\ldots,S, \tag{8}$$

where $\mathbf{W}$ and $\mathbf{b}$ are learnable parameters. Stacking $\{\mathbf{z}^{\mathrm{temporal}}_i\}_{i=1}^{S}$ yields temporal tokens

$$\mathbf{H}^{\mathrm{temporal}} = [\mathbf{z}^{\mathrm{temporal}}_1;\ldots;\mathbf{z}^{\mathrm{temporal}}_S] \in \mathbb{R}^{S \times d}. \tag{9}$$

We further inject cycle-level descriptors by projecting $\mathbf{X}_f \in \mathbb{R}^{S \times d_f}$ to the token space and adding it to $\mathbf{H}^{\mathrm{temporal}}$:

$$\mathbf{T}^{\mathrm{temporal}} = \mathbf{H}^{\mathrm{temporal}} + \mathbf{X}_f\mathbf{W}_f + \mathbf{b}_f, \tag{10}$$

where $\mathbf{W}_f$ and $\mathbf{b}_f$ are learnable parameters. In this work, $\mathbf{X}_f$ consists of Coulombic efficiency and energy efficiency, which are commonly available on a per-cycle basis. Together, $\mathbf{T}^{\mathrm{temporal}} \in \mathbb{R}^{S \times d}$ and $\mathbf{T}^{\mathrm{soc}} \in \mathbb{R}^{M \times d}$ provide complementary inputs for subsequent decoding.
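
The SOC-view tokenization of Equation (5) can be sketched in plain Python as follows. The sketch assumes that the Conv1D kernel size equals the patch length $P$ (the text states only that $P$ is both patch length and stride), in which case the convolution reduces to applying one shared linear map to each non-overlapping patch. All function names and the toy weights are hypothetical, and the shared temporal encoder of Equation (6) is deliberately left out.

```python
def num_patches(L, P):
    """Number of SOC intervals M from Eq. (5): floor((L - P) / P) + 1."""
    return (L - P) // P + 1


def soc_patches(cycle, P):
    """Cut one cycle into M non-overlapping patches along the SOC axis.

    cycle: list of L points, each [voltage, current, capacity, soc].
    Returns M flat lists of 4 * P values in channel-major order.
    """
    M = num_patches(len(cycle), P)
    patches = []
    for m in range(M):
        window = cycle[m * P:(m + 1) * P]
        patches.append([pt[c] for c in range(4) for pt in window])
    return patches


def project(patch, weight, bias):
    """Shared linear map from a flattened patch to d dimensions.

    weight has d rows of length 4 * P; under the kernel-size assumption this
    matches what a stride-P Conv1D computes at one output position.
    """
    return [sum(w * x for w, x in zip(row, patch)) + b
            for row, b in zip(weight, bias)]


def group_by_soc_interval(X, P, weight, bias):
    """Regroup per-cycle patch embeddings by SOC interval.

    X: S cycles, each with L points of 4 variables.
    Returns M sequences; sequence m has S vectors of size d, i.e. the
    cross-cycle evolution of SOC interval m.
    """
    per_cycle = [[project(p, weight, bias) for p in soc_patches(cyc, P)]
                 for cyc in X]
    M = len(per_cycle[0])
    return [[per_cycle[i][m] for i in range(len(X))] for m in range(M)]
```

Each of the $M$ returned sequences would then be passed to the shared temporal encoder to obtain one SOC token. Note that with $L = 10$ and $P = 4$ the last two points of every cycle are dropped, which follows directly from the floor in Equation (5).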

### 3.2. Aging-Condition-Aware Decoder

Batteries operated under the same or similar aging conditions often exhibit consistent or similar degradation signatures (Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71); Weng et al., [2021](https://arxiv.org/html/2605.27044#bib.bib94); Kim et al., [2023](https://arxiv.org/html/2605.27044#bib.bib92); Tao et al., [2025](https://arxiv.org/html/2605.27044#bib.bib82)). To exploit such aging-condition-level regularities, we design an aging-condition-aware decoder (ACDecoder) with two mechanisms: (i) *aging-condition-informed queries*, which inject an aging-condition prior into the decoder states, and (ii) *aging-condition-aware attention*, which conditions attention on the aging-condition prior.

Aging-condition-informed queries. Inspired by (Ye et al., [2025](https://arxiv.org/html/2605.27044#bib.bib88)), ACDecoder starts from learnable generic queries $\mathbf{Q}_g \in \mathbb{R}^{\bar{s} \times d}$ and injects aging-condition information via additive conditioning. Let $\boldsymbol{a}$ denote the structured aging-condition metadata (Section [2.1](https://arxiv.org/html/2605.27044#S2.SS1)) and let $\pi(\boldsymbol{a})$ be the corresponding metadata-to-text prompt (Tan et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib14)).
We encode $\pi(\boldsymbol{a})$ using a language-based embedder:

$$\mathbf{z}^{ac} = \mathrm{LastValid}\bigl(\mathrm{Enc}(\pi(\boldsymbol{a}))\bigr) \in \mathbb{R}^{d_{\mathrm{enc}}}, \tag{11}$$

$$\mathbf{e}^{ac} = \mathbf{z}^{ac}\mathbf{W}_1 + \mathbf{b}_1 \in \mathbb{R}^{d}, \tag{12}$$

where $\mathrm{Enc}(\cdot)$ is a language-based embedder (Qwen3-Embedding-0.6B (Zhang et al., [2025c](https://arxiv.org/html/2605.27044#bib.bib44))), $\mathrm{LastValid}(\cdot)$ retrieves the embedding of the last non-padding token, and $\mathbf{W}_1 \in \mathbb{R}^{d_{\mathrm{enc}} \times d}$ and $\mathbf{b}_1 \in \mathbb{R}^{d}$ are learnable parameters. We then project $\mathbf{e}^{ac}$ to produce one prior vector per query token:

$$\hat{\mathbf{e}}^{ac}_i = \mathbf{e}^{ac}\mathbf{W}_{2,i} + \mathbf{b}_{2,i} \in \mathbb{R}^{d},\qquad i = 1,\ldots,\bar{s}, \tag{13}$$

$$\hat{\mathbf{E}}^{ac} = [\hat{\mathbf{e}}^{ac}_1;\ldots;\hat{\mathbf{e}}^{ac}_{\bar{s}}] \in \mathbb{R}^{\bar{s} \times d}, \tag{14}$$

$$\mathbf{X}_{de} = \mathbf{Q}_g + \hat{\mathbf{E}}^{ac}. \tag{15}$$

Here $\mathbf{W}_{2,i} \in \mathbb{R}^{d \times d}$ and $\mathbf{b}_{2,i} \in \mathbb{R}^{d}$ are learnable parameters. Each $\hat{\mathbf{e}}^{ac}_i$ provides a query-specific prior, yielding aging-condition-informed queries (ACQuery) $\mathbf{X}_{de}$ for conditioning different queries on different aspects of the aging-condition information.
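
Equations (12) to (15) amount to one shared affine map followed by a separate affine map per query token. The following is a minimal sketch of that computation, taking the language-model embedding $\mathbf{z}^{ac}$ as a given vector (no embedder is called here) and using hypothetical function names and toy weights.

```python
def affine(x, W, b):
    """Row-vector convention x W + b, where W has len(x) rows and len(b) columns."""
    return [sum(x[k] * W[k][j] for k in range(len(x))) + b[j]
            for j in range(len(b))]


def ac_queries(z_ac, W1, b1, W2, b2, Q_g):
    """Build aging-condition-informed queries.

    z_ac: embedding of the metadata prompt, length d_enc (assumed precomputed).
    W1, b1: shared projection to d dimensions (Eq. 12).
    W2, b2: one (d x d matrix, length-d bias) pair per query token (Eq. 13).
    Q_g: generic queries, one length-d vector per query token.
    Returns (E_hat, X_de) as in Eqs. (14) and (15).
    """
    e_ac = affine(z_ac, W1, b1)
    E_hat = [affine(e_ac, W2[i], b2[i]) for i in range(len(Q_g))]
    X_de = [[q + e for q, e in zip(Q_g[i], E_hat[i])]
            for i in range(len(Q_g))]
    return E_hat, X_de
```

Because each query token has its own $\mathbf{W}_{2,i}$ and $\mathbf{b}_{2,i}$, two queries can receive different priors from the same metadata embedding; `E_hat` is returned separately because the decoder reuses it inside the attention layers.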

Aging-condition-aware attention. Beyond query initialization, ACDecoder promotes aging-condition-consistent attention by modulating queries with $\hat{\mathbf{E}}^{ac}$. Given queries $\mathbf{Q}$ and key–value tokens $(\mathbf{K}, \mathbf{V})$, we define aging-condition-aware attention (ACAttention) as follows:

$$\mathrm{ACAttention}(\mathbf{Q}, \mathbf{K}, \mathbf{V}, \hat{\mathbf{E}}^{ac}) = \mathrm{Concat}(\mathrm{head}_1,\ldots,\mathrm{head}_h)\mathbf{W}^{O}, \tag{16}$$

$$\mathrm{head}_i = \mathrm{Attention}\bigl((\mathbf{Q} + \hat{\mathbf{E}}^{ac})\mathbf{W}_i^{Q},\ \mathbf{K}\mathbf{W}_i^{K},\ \mathbf{V}\mathbf{W}_i^{V}\bigr). \tag{17}$$

Here $\mathrm{Attention}(\cdot)$ is the attention in the standard Transformer (Vaswani et al., [2017](https://arxiv.org/html/2605.27044#bib.bib23)). This query modulation injects aging-condition priors into every attention operation, thereby promoting aging-condition-consistent decoding throughout the network.

ACDecoder layer. Let $\mathbf{T}^{\mathrm{temporal}} \in \mathbb{R}^{S \times d}$ and $\mathbf{T}^{\mathrm{soc}} \in \mathbb{R}^{M \times d}$ be the dual-view tokens (Section [3.1](https://arxiv.org/html/2605.27044#S3.SS1)), and $\mathbf{T} = [\mathbf{T}^{\mathrm{temporal}};\mathbf{T}^{\mathrm{soc}}] + \mathbf{P} \in \mathbb{R}^{(S+M) \times d}$, where $\mathbf{P} \in \mathbb{R}^{(S+M) \times d}$ is positional encoding (Vaswani et al., [2017](https://arxiv.org/html/2605.27044#bib.bib23)).
With $\mathbf{H}^{0} = \mathbf{X}_{de} \in \mathbb{R}^{\bar{s} \times d}$, the process in the $l$-th ACDecoder layer is

$$\mathbf{H}^{l}_{1} = \mathrm{LN}\Bigl(\mathbf{H}^{l-1} + \mathrm{ACAttention}\bigl(\mathbf{H}^{l-1}, \mathbf{H}^{l-1}, \mathbf{H}^{l-1}, \hat{\mathbf{E}}^{ac}\bigr)\Bigr), \tag{18}$$

$$\mathbf{H}^{l}_{2} = \mathrm{LN}\Bigl(\mathbf{H}^{l}_{1} + \mathrm{ACAttention}\bigl(\mathbf{H}^{l}_{1}, \mathbf{T}, \mathbf{T}, \hat{\mathbf{E}}^{ac}\bigr)\Bigr), \tag{19}$$

$$\mathbf{H}^{l} = \mathrm{LN}\Bigl(\mathbf{H}^{l}_{2} + \mathrm{FFN}(\mathbf{H}^{l}_{2})\Bigr),\qquad l = 1,\ldots,L_{de}, \tag{20}$$

where $\mathrm{LN}(\cdot)$ denotes LayerNorm (Ba et al., [2016](https://arxiv.org/html/2605.27044#bib.bib21)) and $\mathbf{H}^{l} \in \mathbb{R}^{\bar{s} \times d}$ is the query representation after $l$ layers.

### 3.3. Meta Degradation Pattern Memory

Established battery knowledge (Attia et al., [2022](https://arxiv.org/html/2605.27044#bib.bib10)) suggests that battery degradation trajectories share a small set of patterns across batteries. We call these shared trajectory prototypes *meta degradation patterns*, as they compose diverse real-world trajectories. Inspired by memory networks (Weston et al., [2015](https://arxiv.org/html/2605.27044#bib.bib87); Tan et al., [2023](https://arxiv.org/html/2605.27044#bib.bib86)), we propose a meta degradation pattern memory (MDPM) to store and retrieve such prototypes.
MDPM maintains $N_{\mathrm{mem}}$ learnable memory slots $\mathbf{\Omega} \in \mathbb{R}^{N_{\mathrm{mem}} \times d}$, where each slot $\mathbf{\Omega}_i \in \mathbb{R}^{d}$ stores one vector representation of a meta degradation pattern.

Pattern retrieval. Given decoder output $\mathbf{H}^{L_{de}} \in \mathbb{R}^{\bar{s} \times d}$, we transform it into a memory query for retrieving relevant patterns by cosine similarity:

$$\mathbf{q}_{mem} = \mathrm{FFN}\left(\mathrm{Flatten}(\mathbf{H}^{L_{de}})\right) \in \mathbb{R}^{d}, \tag{21}$$

$$s_i = \frac{\mathbf{q}_{mem}^{\top}\mathbf{\Omega}_i}{\|\mathbf{q}_{mem}\|_2\,\|\mathbf{\Omega}_i\|_2},\qquad i = 1,\ldots,N_{\mathrm{mem}}. \tag{22}$$

We select the top-$2$ memory slots with the largest similarity scores. Let $\mathcal{I}_2$ denote the corresponding index set. The relevant pattern embedding $\mathbf{h}_{mem}$ is retrieved as follows:

$$\alpha_i = \frac{\exp(s_i)}{\sum_{k \in \mathcal{I}_2}\exp(s_k)},\qquad i \in \mathcal{I}_2, \tag{23}$$

$$\mathbf{h}_{mem} = \sum_{i \in \mathcal{I}_2}\alpha_i\,\mathbf{\Omega}_i \in \mathbb{R}^{d}. \tag{24}$$
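
The retrieval step in Equations (22) to (24) can be sketched in a few lines of plain Python. The function names are hypothetical, the memory query is taken as given rather than produced by the FFN of Equation (21), and breaking ties between equal scores by slot order is an assumption of this sketch rather than something the paper specifies.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (Eq. 22)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve(q_mem, memory, k=2):
    """Top-k memory retrieval with a softmax restricted to the selected slots.

    q_mem: memory query of length d.
    memory: list of N_mem slots, each of length d.
    Returns (indices, weights, h_mem) following Eqs. (22)-(24).
    """
    scores = [cosine(q_mem, slot) for slot in memory]
    # Stable sort: slots with equal scores keep their original order.
    top = sorted(range(len(memory)), key=lambda i: scores[i], reverse=True)[:k]
    exps = [math.exp(scores[i]) for i in top]
    total = sum(exps)
    alphas = [e / total for e in exps]
    h_mem = [sum(a * memory[i][j] for a, i in zip(alphas, top))
             for j in range(len(q_mem))]
    return top, alphas, h_mem
```

Since cosine scores lie in $[-1, 1]$, the softmax over two slots never produces a weight above $e^{2}/(1 + e^{2}) \approx 0.88$, so the retrieved embedding is always a genuine blend of both selected prototypes.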

Memory learning. During training, we encourage the retrieved pattern embedding $\mathbf{h}_{mem}$ to align with a full-life trajectory embedding $\mathbf{e}_{trajectory}$:

$$\mathcal{L}_{align} = \frac{1}{N}\sum_{i=1}^{N}\left(1 - \frac{\mathbf{h}_{mem,i} \cdot \mathbf{e}_{trajectory,i}}{\|\mathbf{h}_{mem,i}\|_2\,\|\mathbf{e}_{trajectory,i}\|_2}\right), \tag{25}$$

where $N$ is the batch size, $\mathbf{e}_{trajectory,i} = \mathrm{TrajectoryEncoder}(\mathbf{y}_i)$, and $\mathbf{h}_{mem,i}$ is the retrieved pattern embedding for sample $i$. To ensure $\mathbf{e}_{trajectory}$ preserves trajectory information, we reconstruct the trajectory with a decoder:

$$\bar{\mathbf{y}} = \mathrm{TrajectoryDecoder}(\mathbf{e}_{trajectory}), \tag{26}$$

$$\mathcal{L}_{recover} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{O_i}\sum_{j=1}^{T_{\max}} mask_{ij}\left(\mathbf{y}_{ij} - \bar{\mathbf{y}}_{ij}\right)^{2}, \tag{27}$$

where $T_{\max} = 5000$ is the maximum horizon, set to cover the longest degradation trajectories in the database, $mask_{ij} \in \{0, 1\}$ indicates whether the ground-truth SOH $y_{ij}$ is available at cycle $j$ for sample $i$ and falls in the prediction region, and $O_i = \sum_{j=1}^{T_{\max}} mask_{ij}$ is the number of observed SOH measurements. Both $\mathrm{TrajectoryEncoder}(\cdot)$ and $\mathrm{TrajectoryDecoder}(\cdot)$ are feed-forward networks with GELU.
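
The two memory-learning losses, Equations (25) and (27), can be written out directly. The sketch below uses hypothetical function names and assumes that every sample has at least one unmasked cycle ($O_i > 0$); the paper does not say how an empty mask would be handled, so the sketch simply raises an error in that case.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) *
                  math.sqrt(sum(b * b for b in v)))


def align_loss(h_mem_batch, e_traj_batch):
    """Eq. (25): batch mean of (1 - cosine similarity)."""
    terms = [1.0 - cosine(h, e) for h, e in zip(h_mem_batch, e_traj_batch)]
    return sum(terms) / len(terms)


def masked_mse(y_true, y_pred, mask):
    """Eq. (27): per-sample squared error averaged over unmasked cycles,
    then averaged over the batch.

    y_true, y_pred, mask: lists of equal-length sequences (padded to T_max
    in the paper; any common length works here). mask entries are 0 or 1.
    """
    total = 0.0
    for y, y_hat, m in zip(y_true, y_pred, mask):
        o_i = sum(m)
        if o_i == 0:
            # Behavior for an empty mask is unspecified in the source.
            raise ValueError("sample has no unmasked cycles")
        total += sum(mk * (a - b) ** 2 for a, b, mk in zip(y, y_hat, m)) / o_i
    return total / len(y_true)
```

Dividing by $O_i$ inside the batch sum means that short and long trajectories contribute equally to the loss, rather than long trajectories dominating through their larger number of cycles.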

Fusion and prediction. We incorporate the retrieved degradation pattern into the forecasting head via gated fusion:

$$\bar{\mathbf{H}} = \mathrm{GELU}\left(\mathrm{Flatten}(\mathbf{H}^{L_{de}})\mathbf{W}_3 + \mathbf{b}_3\right)\mathbf{W}_4 + \mathbf{b}_4, \tag{28}$$

$$\boldsymbol{\beta} = \mathrm{Sigmoid}\left(\mathrm{FFN}\left([\bar{\mathbf{H}};\mathbf{h}_{mem}]\right)\right), \tag{29}$$

$$\hat{\mathbf{y}} = \mathrm{Head}\left(\bar{\mathbf{H}} + \boldsymbol{\beta} \odot \mathbf{h}_{mem}\right), \tag{30}$$

where $\boldsymbol{\beta} \in \mathbb{R}^{d}$ is a feature-wise gate, $\odot$ denotes element-wise multiplication, and $\mathrm{Head}(\cdot)$ is a linear projection that outputs the predicted degradation trajectory $\hat{\mathbf{y}}$.

### 3.4. Training of BatteryMFormer

BatteryMFormer is trained with the following objective:

$$\min_{\theta}\ \mathcal{L}(\theta) = \mathcal{L}_{pred} + \lambda_1\mathcal{L}_{align} + \lambda_2\mathcal{L}_{recover}, \tag{31}$$

$$\mathcal{L}_{pred} = \frac{1}{N}\sum_{i=1}^{N}\frac{1}{O_i}\sum_{j=1}^{T_{\max}} mask_{ij}\bigl(\mathbf{y}_{ij} - \hat{\mathbf{y}}_{ij}\bigr)^{2}, \tag{32}$$

where $\lambda_1$ and $\lambda_2$ weight the alignment and recovery losses, respectively.

Table 1. Statistics of four battery domains.

## 4. Experiments

### 4.1. Experimental Settings

Datasets. We evaluate our model and baselines on four battery domains from the largest public real-world battery lifetime database (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)).
Dataset statistics are reported in Table [1](https://arxiv.org/html/2605.27044#S3.T1).

- Li-ion. This domain contains lab-tested lithium-ion batteries (LIBs) aggregated from 13 subdatasets (Xing et al., [2013](https://arxiv.org/html/2605.27044#bib.bib54); He et al., [2011](https://arxiv.org/html/2605.27044#bib.bib12); Devie et al., [2018](https://arxiv.org/html/2605.27044#bib.bib80); Attia et al., [2020](https://arxiv.org/html/2605.27044#bib.bib62); Severson et al., [2019](https://arxiv.org/html/2605.27044#bib.bib71); Juarez-Robles et al., [2020](https://arxiv.org/html/2605.27044#bib.bib50), [2021](https://arxiv.org/html/2605.27044#bib.bib32); Preger et al., [2020](https://arxiv.org/html/2605.27044#bib.bib99); Mohtat et al., [2021](https://arxiv.org/html/2605.27044#bib.bib95); Weng et al., [2021](https://arxiv.org/html/2605.27044#bib.bib94); Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22); Ma et al., [2022](https://arxiv.org/html/2605.27044#bib.bib63); Zhu et al., [2022](https://arxiv.org/html/2605.27044#bib.bib6); BatteryArchive.org, [2024](https://arxiv.org/html/2605.27044#bib.bib100); Cui et al., [2024](https://arxiv.org/html/2605.27044#bib.bib93); Wang et al., [2024](https://arxiv.org/html/2605.27044#bib.bib91); Li et al., [2024b](https://arxiv.org/html/2605.27044#bib.bib5); Zhang et al., [2023a](https://arxiv.org/html/2605.27044#bib.bib69)). Most batteries are commercial, covering diverse operating conditions and widely used LIB chemistries.
- CALB. This domain consists of large-format commercial LIBs tested in a production environment (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)). Compared with Li-ion, CALB reflects industrial development toward larger capacities and package structures.
- Na-ion. This domain includes commercial sodium-ion batteries evaluated under diverse charge and discharge protocols (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)).
- Zn-ion. This domain contains zinc-ion batteries with varying electrolyte compositions and package structures, tested under different operating temperatures (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)).

Table 2. Overall model performance on four battery domains. The top-three results are shaded. The best results are shown in bold and the second-best results are underlined. The improvement denotes the relative improvement of BatteryMFormer over the second-best model.

Figure 4. Performance of the top-three models as the number of usable early cycles increases.

Metrics and dataset splits.
In line with prior work (Tao et al., [2025](https://arxiv.org/html/2605.27044#bib.bib82); Rahmanian et al., [2024](https://arxiv.org/html/2605.27044#bib.bib81)), we evaluate performance using mean absolute error (MAE) and mean absolute percentage error (MAPE), both computed on the original SOH values. We assess model generalizability under aging-condition-exclusive testing, where all test batteries come from aging conditions unseen during training and validation. For the Li-ion and Zn-ion domains, we generate three random splits while keeping the aging condition counts close to a 6:2:2 train/validation/test ratio. For CALB and Na-ion, where the number of aging conditions is limited, we use a leave-one-aging-condition-out protocol: one aging condition is held out for testing, and 25% of the remaining aging conditions are selected for validation while the rest are used for training. We report the mean and standard deviation over the resulting splits for each domain.

Baselines. We compare against state-of-the-art methods in two groups. (1) Battery-specific BDTF models: IC2ML (Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19)), CPTransformer (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), and CPMLP (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)). (2) Generic time-series forecasting models: Transformer-based methods (TimeMixer++ (Wang et al., [2025](https://arxiv.org/html/2605.27044#bib.bib28)), TimeBridge (Liu et al., [2025a](https://arxiv.org/html/2605.27044#bib.bib101)), iTransformer (Liu et al., [2024](https://arxiv.org/html/2605.27044#bib.bib27)), TimesFM (Das et al., [2024](https://arxiv.org/html/2605.27044#bib.bib45)), and PatchTST (Nie et al., [2023](https://arxiv.org/html/2605.27044#bib.bib36))), multi-layer perceptron (MLP)-based methods (PatchMLP (Tang and Zhang, [2025](https://arxiv.org/html/2605.27044#bib.bib42)) and DLinear (Zeng et al., [2023](https://arxiv.org/html/2605.27044#bib.bib29))), and a convolutional neural network (CNN)-based method (ConvTimeNet (Cheng et al., [2025](https://arxiv.org/html/2605.27044#bib.bib46))). Following the BatteryLife benchmark protocol (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), all baselines take voltage, current, and capacity sequences as input and predict the future degradation trajectory, except IC2ML and TimesFM. Following the original designs, IC2ML uses only the charging capacity-increment sequence as input, and TimesFM extrapolates future SOH values from the historical SOH sequence.

Implementation details. Following prior work (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), we resample each cycle to a unified length of $L=300$. All models are implemented in PyTorch and trained for up to 300 epochs with early stopping (patience 30) based on validation performance. For each model and domain, we evaluate at least 10 hyperparameter configurations and report the one with the best validation performance. All experiments are conducted on NVIDIA RTX 3090 GPUs. Additional implementation and preprocessing details are provided in Appendix [C](https://arxiv.org/html/2605.27044#A3) and Appendix [D](https://arxiv.org/html/2605.27044#A4), respectively.
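The aging-condition-exclusive protocol described under "Metrics and dataset splits" can be sketched as a grouped split, in which whole aging conditions (not individual batteries) are assigned to train, validation, or test. This is a hedged illustration, not the authors' code: the input format (a dict from cell ID to aging-condition label), all function names, the rounding of split sizes, and the seeding scheme are assumptions.

```python
import random
from collections import defaultdict


def _group_by_condition(cell_conditions):
    """Map each aging-condition label to the list of cells tested under it."""
    groups = defaultdict(list)
    for cell_id, condition in cell_conditions.items():
        groups[condition].append(cell_id)
    return groups


def _cells(groups, conditions):
    """Collect all cells belonging to the given aging conditions."""
    return sorted(cell for cond in conditions for cell in groups[cond])


def random_condition_split(cell_conditions, ratios=(0.6, 0.2, 0.2), seed=0):
    """Roughly 6:2:2 split over aging conditions; test takes the remainder."""
    groups = _group_by_condition(cell_conditions)
    conditions = sorted(groups)
    random.Random(seed).shuffle(conditions)
    n_train = round(ratios[0] * len(conditions))
    n_val = round(ratios[1] * len(conditions))
    return {
        "train": _cells(groups, conditions[:n_train]),
        "val": _cells(groups, conditions[n_train:n_train + n_val]),
        "test": _cells(groups, conditions[n_train + n_val:]),
    }


def leave_one_condition_out(cell_conditions, val_fraction=0.25, seed=0):
    """Yield one split per held-out aging condition.

    A fraction of the remaining conditions (at least one, by assumption)
    goes to validation and the rest to training.
    """
    groups = _group_by_condition(cell_conditions)
    rng = random.Random(seed)
    for held_out in sorted(groups):
        rest = [c for c in sorted(groups) if c != held_out]
        rng.shuffle(rest)
        n_val = max(1, round(val_fraction * len(rest)))
        yield {
            "train": _cells(groups, rest[n_val:]),
            "val": _cells(groups, rest[:n_val]),
            "test": _cells(groups, [held_out]),
        }
```

Splitting at the level of aging conditions is what guarantees that no test battery shares a condition with any training or validation battery; a per-battery random split would not give that guarantee.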
### 4.2. Overall Performance

Table 3. Ablation study of BatteryMFormer on four battery domains.

#### 4.2.1. Main results

Table [2](https://arxiv.org/html/2605.27044#S4.T2) compares BatteryMFormer with state-of-the-art baselines across four battery domains. BatteryMFormer achieves the best performance on all domains and metrics despite substantial differences in battery chemistry, data scale, and degradation characteristics. Compared with the second-best model in each domain, BatteryMFormer reduces MAPE by 11.07%, 8.49%, 17.66%, and 8.97%, and reduces MAE by 10.94%, 10.83%, 17.65%, and 11.83% on Li-ion, CALB, Na-ion, and Zn-ion, respectively. These consistent improvements demonstrate the effectiveness of BatteryMFormer for early BDTF across domains.

Notably, the strongest baseline varies across domains: IC2ML performs best among baselines on Li-ion, CALB, and Zn-ion, while TimeBridge achieves the best baseline performance on Na-ion. This suggests that the predictive patterns underlying battery degradation are domain-dependent, and different architectural inductive biases match these patterns to different extents. This heterogeneity in the effective patterns can also lead to pronounced performance instability. For example, DLinear performs competitively on Na-ion but incurs much larger errors on CALB, and TimesFM shows relatively high errors on Li-ion and Zn-ion. In contrast, BatteryMFormer consistently achieves the best performance across domains, indicating that it can capture a broader spectrum of degradation patterns in different battery datasets.

#### 4.2.2. Comparison under different numbers of usable cycles

We further evaluate model performance by varying the number of usable early cycles. Figure [4](https://arxiv.org/html/2605.27044#S4.F4) reports MAE and MAPE for the top-performing models on each domain.
BatteryMFormer achieves consistent improvements across a broad range of early-forecasting settings in all four domains. These results demonstrate the effectiveness of BatteryMFormer under different amounts of early degradation information. We also observe that prediction errors can increase on Li-ion and Na-ion when $S>25$, that is, when more than 25 early cycles are used. This reflects an open challenge in long-sequence time-series modeling: longer inputs do not guarantee improvements, and may instead introduce redundancy and optimization difficulty. In our setting, each cycle contains 300 points, so $S>25$ already yields more than 7,500 input points. Since adjacent battery cycles often change only marginally, longer inputs may dilute informative degradation signatures with redundant measurements. Similar error increases are also observed in other top-performing baselines and in broader time-series forecasting studies when modeling long input sequences (Nie et al., [2023](https://arxiv.org/html/2605.27044#bib.bib36); Tang and Zhang, [2025](https://arxiv.org/html/2605.27044#bib.bib42)). Despite this effect, BatteryMFormer generally outperforms the baselines, underscoring the advantage of the proposed multi-level learning strategy in handling long operational voltage and current time series for early BDTF.

### 4.3. Ablation Study

We conduct an ablation study of BatteryMFormer to evaluate the effectiveness of its key components, with the results summarized in Table [3](https://arxiv.org/html/2605.27044#S4.T3). "w/o SOCView" removes the SOC view from the dual-view encoder, and "w/o MDPM" removes the meta degradation pattern memory. For the aging-condition-aware decoder, "w/o ACQuery" removes aging-condition information from generic queries, "w/o ACAttention" replaces aging-condition-aware attention with standard attention, and "w/o ACDecoder" removes both mechanisms. "w/o LLM" replaces the language-based embedder with factor-wise lookup embeddings followed by projection.
Since variable-length protocols, such as multi-stage charge/discharge protocols, cannot be trivially encoded by fixed lookup embeddings due to out-of-vocabulary issues, this variant only uses positive electrode, negative electrode, operating temperature, package structure, and manufacturer as lookup-embedding factors. "CPTransformer-SI" provides CPTransformer with the same input information as BatteryMFormer, including voltage, current, capacity, SOC, aging-condition information, and cycle-level descriptors.

Table [3](https://arxiv.org/html/2605.27044#S4.T3) shows that the three major components of BatteryMFormer all contribute to performance. Removing the SOC view, MDPM, or ACDecoder consistently degrades results across all domains, indicating that learning SOC-localized patterns, trajectory-level prototypes, and aging-condition-informed representations is useful for early BDTF. Their contributions are domain-dependent: the SOC view brings particularly clear gains on Li-ion and Na-ion, while ACDecoder has more pronounced effects on CALB and Zn-ion. This suggests that the effective predictive patterns vary across battery domains, and that the components of BatteryMFormer capture complementary aspects of these patterns. CPTransformer-SI remains clearly worse than BatteryMFormer and performs close to CPTransformer in most domains, indicating that simply providing the same input information yields limited benefits. The advantage of BatteryMFormer therefore comes primarily from its multi-level learning architecture rather than from additional input variables alone. Finer-grained ACDecoder ablations further show that both ACQuery and ACAttention are important for aging-condition-aware pattern mining. Finally, replacing the LLM embedder with lookup embeddings consistently degrades performance, confirming that semantic aging-condition representations provide useful information beyond factor-wise embeddings.
While the mean difference is small on Li-ion and Na-ion, the larger standard deviations of w/o LLM suggest that the LLM embedder improves performance stability; on CALB and Zn-ion, it improves both accuracy and robustness. Collectively, these results validate the effectiveness of the proposed multi-level learning strategy.

Figure 5. Case study on three representative test batteries with superlinear, linear, and sublinear degradation. (a) Ground-truth vs. predicted SOH trajectories with the top two retrieved MDPM prototypes (weights shown). (b) Attention weights of ACDecoder cross-attention over temporal-view and SOC-view tokens; dashed line marks the boundary. (c) Token-wise attention weights (bars) and cumulative weight (lines) over token indices.

Figure 6. Differential voltage analysis and average attention weights of SOC-view tokens for a representative test battery. Highlighted regions denote the SOC intervals corresponding to the top-25% SOC-view tokens ranked by attention weight, with darker green indicating larger attention weights.

### 4.4. Case Study

To interpret how BatteryMFormer leverages the multi-level structure of early BDTF, we examine three representative test batteries exhibiting superlinear, linear, and sublinear degradation (Figure [5](https://arxiv.org/html/2605.27044#S4.F5)). We analyze (i) the top-two meta degradation patterns retrieved from MDPM, visualized by decoding the corresponding memory embeddings with the trajectory decoder, and (ii) the ACDecoder cross-attention weights over temporal-view and SOC-view tokens.

Figure [5](https://arxiv.org/html/2605.27044#S4.F5)a shows that MDPM retrieves prototypes that are consistent with the batteries' long-horizon degradation patterns. For the superlinear battery, the retrieved prototypes include two trajectory prototypes showing accelerated degradation with knee points, providing informative priors for long-range extrapolation.
For the linear battery, the retrieved prototypes are approximately linear and largely agree with the observed trend. For the sublinear battery, MDPM retrieves prototypes exhibiting a slowdown in degradation even though the input covers only the early, faster-decay stage, indicating that the MDPM stores diverse global trajectory shapes and can retrieve related trajectory prototypes to help BDTF for batteries subjected to aging conditions not covered by the training data.

Figure [5](https://arxiv.org/html/2605.27044#S4.F5)b–c further indicates that ACDecoder integrates both views. Most attention mass is assigned to temporal-view tokens, while SOC-view tokens receive a non-trivial share. Moreover, attention over SOC-view tokens is highly concentrated on a small subset, suggesting that degradation-relevant operational signatures are localized to specific SOC intervals and that BatteryMFormer can prioritize these regions through attention. Overall, this case study illustrates how MDPM supplies pattern-level priors and how the dual-view encoder with ACDecoder selectively aggregates temporal and SOC-localized cues to improve early BDTF.

We further interpret the SOC-view attention through differential voltage analysis (DVA) on a test battery (Figure [6](https://arxiv.org/html/2605.27044#S4.F6)). The top-25% SOC tokens ranked by attention weights are concentrated around the major DVA peaks and their shoulder regions, which are known to be sensitive to degradation-induced changes such as peak shifts, broadening, and shape distortion caused by lithium inventory loss, loss of active material, or polarization growth (Birkl et al., [2017](https://arxiv.org/html/2605.27044#bib.bib24); Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59)). This result suggests that the SOC view guides BatteryMFormer to selectively attend to particular SOC intervals.
The alignment with DVA features further indicates that the learned SOC-localized patterns can reflect electrochemical signatures associated with battery aging mechanisms.

Table 4. Results of data-efficient learning with 50% of the training batteries retained. Top-3 and Top-2 denote the third-best and second-best baselines selected from the overall comparison.

### 4.5. Data-Efficient Learning

Collecting full-life degradation trajectories is costly and can take months to years, making data-efficient BDTF an important practical requirement. To evaluate model robustness under limited lifetime data, we retain only 50% of the training batteries in each domain while keeping the validation and test sets unchanged. Table [4](https://arxiv.org/html/2605.27044#S4.T4) reports the results of BatteryMFormer and the top-performing baselines under this reduced-data setting. BatteryMFormer achieves the best performance across all four domains with only 50% of the training data. Compared with the strongest baseline, it reduces MAPE by 12.45%, 2.81%, 15.23%, and 17.69%, and reduces MAE by 12.17%, 5.22%, 14.98%, and 18.04% on Li-ion, CALB, Na-ion, and Zn-ion, respectively. The improvements are particularly pronounced on Na-ion and Zn-ion, where training data are limited and aging conditions are diverse. These results indicate that the proposed multi-level learning strategy can still extract informative degradation patterns from limited lifetime data, thereby improving data efficiency in early BDTF.

## 5. Related Work

We review existing approaches to battery degradation trajectory forecasting (BDTF) from two perspectives: feature engineering and representation learning.
Feature-engineering-based methods. These methods focus on extracting degradation-relevant descriptors from operational measurements (e.g., voltage, current, capacity, relaxation signals) and then fit data-driven predictors for future capacity/SOH trajectories (Tao et al., [2025](https://arxiv.org/html/2605.27044#bib.bib82); Li et al., [2024a](https://arxiv.org/html/2605.27044#bib.bib90); Meng et al., [2024](https://arxiv.org/html/2605.27044#bib.bib11)). A common practice is to extract features from regions where aging signatures are pronounced. For instance, Li et al. ([2024a](https://arxiv.org/html/2605.27044#bib.bib90)) construct descriptors from the late-charge capacity sequence and the post-charge relaxation voltage region and then use an LSTM as the forecaster; Tao et al. ([2025](https://arxiv.org/html/2605.27044#bib.bib82)) design features tailored to a 9-step charging protocol and apply a feed-forward network for trajectory prediction. While effective in curated settings, these handcrafted descriptors are often protocol- or dataset-specific (e.g., tied to particular voltage windows or multi-step procedures) and may not be available or predictive across diverse aging conditions (Ma et al., [2022](https://arxiv.org/html/2605.27044#bib.bib63); Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)).
Representation-learning-based methods. In contrast, methods in this research line learn forecasting-relevant representations directly from raw or minimally processed measurements using neural networks (Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22), [2022](https://arxiv.org/html/2605.27044#bib.bib83); Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Liu et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib8); Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19); Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98); Huang et al., [2024](https://arxiv.org/html/2605.27044#bib.bib9)). An intuitive strategy treats BDTF as generic time-series forecasting by extrapolating future SOH from historical SOH records using architectures such as LSTM or Transformer variants (Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22); Tan et al., [2024](https://arxiv.org/html/2605.27044#bib.bib59); Shen et al., [2025](https://arxiv.org/html/2605.27044#bib.bib89); Li et al., [2022](https://arxiv.org/html/2605.27044#bib.bib83)). However, SOH-only inputs can be weakly informative in early cycles, where early trajectories appear similar yet diverge substantially over the long horizon. Recent work therefore exploits fine-grained voltage-current profiles. The BatteryLife benchmark (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)) shows that directly applying generic forecasters to voltage and current time series can be suboptimal, and introduces CyclePatch to model intra-cycle dynamics and inter-cycle evolution more effectively; IC2ML (Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19)) further improves this paradigm by injecting auxiliary supervision to enhance learned representations. Our BatteryMFormer belongs to this representation-learning family, but advances beyond existing approaches by explicitly integrating multi-level inductive biases for early BDTF.
## 6. Limitations and Ethical Considerations

Limitations. First, using more early cycles yields long and redundant inputs; for example, more than 25 cycles already correspond to over 7,500 input points. This can compromise model performance on ultra-long inputs. Second, we evaluate on regular laboratory/production tests that are critical for battery optimization and production, whereas field data (e.g., EV logs) are often irregular and noisier due to varying usage and sensor noise. Applying BatteryMFormer to field conditions may require modified representations and preprocessing (e.g., handling inaccurate and irregular data records (Zhang et al., [2023b](https://arxiv.org/html/2605.27044#bib.bib77))).

Ethical considerations. This work utilizes publicly available battery lifetime datasets and contains no human-subject or personally identifiable data. In field deployment, forecast errors can induce suboptimal decisions (e.g., premature retirement or delayed maintenance), so models should be validated against the target operating distribution before high-stakes deployment.

## 7. Conclusion and Future Work

This paper highlights the value of explicitly modeling the multi-level structure in early battery degradation trajectory forecasting, spanning trajectory patterns, aging conditions, and battery-specific dynamics. We propose BatteryMFormer and demonstrate that our model delivers consistent improvements over state-of-the-art baselines across four battery domains. Ablation and case studies further confirm that each component contributes meaningfully to these gains. BatteryMFormer also outperforms the baselines when the training data are reduced. Future work will focus on improving the model's ability to handle long operational time series and on adapting the framework to irregular, noisy field data.
## 8. GenAI Disclosure

Generative AI tools were used to assist with language editing (e.g., improving clarity, grammar, and conciseness) of author-written text and to support code development (e.g., drafting or refactoring implementation snippets). These tools were not used to generate the experimental results reported in this work. All AI-assisted edits to the manuscript and code were reviewed and validated by the authors, who take full responsibility for the correctness, originality, and integrity of the work.

## 9. Acknowledgments

The authors acknowledge financial support from the National Key R&D Program of China (No. 2023YFB2503600). This work is also supported by research grants from the National Natural Science Foundation of China (Nos. 92372109, 62572418 and 52207230) and the Guangdong Provincial Talent Program (No. 2024TQ08X366). We also acknowledge support from the Wilson Tang Brilliant Energy Science and Technology Lab (BEST Lab) at The Hong Kong University of Science and Technology (Guangzhou).

## References

- P. M. Attia, A. Bills, F. B. Planella, P. Dechent, G. Dos Reis, M. Dubarry, P. Gasper, R. Gilchrist, S. Greenbank, D. Howey, et al. (2022) "Knees" in lithium-ion battery aging trajectories. Journal of The Electrochemical Society 169 (6), pp. 060517. Cited by: [§1](https://arxiv.org/html/2605.27044#S1.p3.1), [§3.3](https://arxiv.org/html/2605.27044#S3.SS3.p1.3).
- P. M. Attia, A. Grover, N. Jin, K. A. Severson, T. M. Markov, Y. Liao, M. H. Chen, B. Cheong, N. Perkins, Z. Yang, et al. (2020) Closed-loop optimization of fast-charging protocols for batteries with machine learning. Nature 578 (7795), pp. 397–402. Cited by: [§1](https://arxiv.org/html/2605.27044#S1.p1.1), [1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1).
- J. L. Ba, J. R. Kiros, and G. E.
Hinton \(2016\)Layer normalization\.External Links:1607\.06450,[Link](https://arxiv.org/abs/1607.06450)Cited by:[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p4.9)\. - \[4\]\(2024\)BatteryArchive\.org\.\(Website\)External Links:[Link](https://batteryarchive.org/index.html)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - C\. R\. Birkl, M\. R\. Roberts, E\. McTurk, P\. G\. Bruce, and D\. A\. Howey \(2017\)Degradation diagnostics for lithium ion cells\.Journal of Power Sources341,pp\. 373–386\.External Links:ISSN 0378\-7753,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jpowsour.2016.12.011),[Link](https://www.sciencedirect.com/science/article/pii/S0378775316316998)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§4\.4](https://arxiv.org/html/2605.27044#S4.SS4.p4.1)\. - M\. Cheng, J\. Yang, T\. Pan, Q\. Liu, Z\. Li, and S\. Wang \(2025\)ConvTimeNet: a deep hierarchical fully convolutional model for multivariate time series analysis\.InCompanion Proceedings of the ACM on Web Conference 2025,WWW ’25,New York, NY, USA,pp\. 171–180\.External Links:ISBN 9798400713316,[Link](https://doi.org/10.1145/3701716.3715214),[Document](https://dx.doi.org/10.1145/3701716.3715214)Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1)\. - X\. Cui, S\. D\. Kang, S\. Wang, J\. A\. Rose, H\. Lian, A\. Geslin, S\. B\. Torrisi, M\. Z\. Bazant, S\. Sun, and W\. C\. Chueh \(2024\)Data\-driven analysis of battery formation reveals the role of electrode utilization in extending cycle life\.Joule8\(11\),pp\. 3072 – 3087\(English\)\.External Links:ISSN 25424351,[Link](http://dx.doi.org/10.1016/j.joule.2024.07.024)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - A\. Das, W\. Kong, R\. Sen, and Y\. Zhou \(2024\)A decoder\-only foundation model for time\-series forecasting\.InProceedings of the 41st International Conference on Machine Learning,ICML’24\.Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1)\. 
- A\. Devie, G\. Baure, and M\. Dubarry \(2018\)Intrinsic variability in the degradation of a batch of commercial 18650 lithium\-ion cells\.Energies11\(5\)\.External Links:[Link](https://www.mdpi.com/1996-1073/11/5/1031),ISSN 1996\-1073,[Document](https://dx.doi.org/10.3390/en11051031)Cited by:[Appendix D](https://arxiv.org/html/2605.27044#A4.p4.13),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - J\. Fleischmann, M\. Hanicke, E\. Horetsky, D\. Ibrahim, S\. Jautelat, M\. Linder, P\. Schaufuss, L\. Torscht, and A\. van de Rijt \(2023\)Battery 2030: resilient, sustainable, and circular\.McKinsey & Company16,pp\. 2023\.Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1)\. - W\. He, N\. Williard, M\. Osterman, and M\. Pecht \(2011\)Prognostics of lithium\-ion batteries based on dempster–shafer theory and the bayesian monte carlo method\.Journal of Power Sources196\(23\),pp\. 10314–10321\.External Links:ISSN 0378\-7753,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jpowsour.2011.08.040),[Link](https://www.sciencedirect.com/science/article/pii/S0378775311015400)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - D\. Hendrycks and K\. Gimpel \(2016\)Gaussian error linear units \(gelus\)\.arXiv preprint arXiv:1606\.08415\.Cited by:[§3\.1](https://arxiv.org/html/2605.27044#S3.SS1.p2.8)\. - J\. Huang, S\. T\. Boles, and J\. Tarascon \(2022\)Sensing as the key to battery lifetime and sustainability\.Nature Sustainability5\(3\),pp\. 194 – 204\(English\)\.External Links:ISSN 23989629,[Link](http://dx.doi.org/10.1038/s41893-022-00859-y)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1)\. - X\. Huang, C\. Liang, S\. Tao, Y\. Che, N\. Bian, J\. Zhang, R\. Wang, Y\. Zhang, B\. Xia, and X\. Zhang \(2026\)IC2ML: unified battery state\-of\-health, degradation trajectory and remaining useful life prediction via intra\-cycle and inter\-cycle enhanced machine learning\.Journal of Power Sources666,pp\. 
239148\.External Links:ISSN 0378\-7753,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jpowsour.2025.239148),[Link](https://www.sciencedirect.com/science/article/pii/S0378775325029854)Cited by:[§C\.2](https://arxiv.org/html/2605.27044#A3.SS2.p4.1),[§1](https://arxiv.org/html/2605.27044#S1.p1.1),[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - Y\. Huang, P\. Zhang, J\. Lu, R\. Xiong, and Z\. Cai \(2024\)A transferable long\-term lithium\-ion battery aging trajectory prediction model considering internal resistance and capacity regeneration phenomenon\.Applied Energy360,pp\. 122825\.External Links:ISSN 0306\-2619,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.apenergy.2024.122825),[Link](https://www.sciencedirect.com/science/article/pii/S0306261924002083)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - D\. Juarez\-Robles, S\. Azam, J\. A\. Jeevarajan, and P\. P\. Mukherjee \(2021\)Degradation\-safety analytics in lithium\-ion cells and modules: part iii\. aging and safety of pouch format cells\.Journal of The Electrochemical Society168\(11\),pp\. 110501\.External Links:[Document](https://dx.doi.org/10.1149/1945-7111/ac30af),[Link](https://dx.doi.org/10.1149/1945-7111/ac30af)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - D\. Juarez\-Robles, J\. A\. Jeevarajan, and P\. P\. Mukherjee \(2020\)Degradation\-safety analytics in lithium\-ion cells: part i\. aging under charge/discharge cycling\.Journal of The Electrochemical Society167\(16\),pp\. 160510\.Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - M\. Kim, I\. Kim, J\. Kim, and J\. W\. Choi \(2023\)Lifetime prediction of lithium ion batteries by using the heterogeneity of graphite anodes\.ACS Energy Letters8\(7\),pp\. 
2946 – 2953\(English\)\.External Links:ISSN 23808195,[Link](http://dx.doi.org/10.1021/acsenergylett.3c00695)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p1.1)\. - J\. Li, Z\. Deng, Y\. Che, Y\. Xie, X\. Hu, and R\. Teodorescu \(2024a\)Degradation pattern recognition and features extrapolation for battery capacity trajectory prediction\.IEEE Transactions on Transportation Electrification10\(3\),pp\. 7565–7579\.External Links:[Document](https://dx.doi.org/10.1109/TTE.2023.3336618)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§5](https://arxiv.org/html/2605.27044#S5.p2.1)\. - J\. Li, Y\. Yang, H\. Su, J\. Liu, Y\. Chen, J\. Zhang, and L\. Pan \(2025\)LiPM: foundation model for lithium\-ion battery analysis\.InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\.2,KDD ’25,New York, NY, USA,pp\. 1412–1423\.External Links:ISBN 9798400714542,[Link](https://doi.org/10.1145/3711896.3737027),[Document](https://dx.doi.org/10.1145/3711896.3737027)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1)\. - T\. Li, Z\. Zhou, A\. Thelen, D\. A\. Howey, and C\. Hu \(2024b\)Predicting battery lifetime under varying usage conditions from early aging data\.Cell Reports Physical Science5\(4\),pp\. 101891\.External Links:ISSN 2666\-3864,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.xcrp.2024.101891),[Link](https://www.sciencedirect.com/science/article/pii/S2666386424001279)Cited by:[Appendix D](https://arxiv.org/html/2605.27044#A4.p4.13),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - W\. Li, N\. Sengupta, P\. Dechent, D\. Howey, A\. Annaswamy, and D\. U\. Sauer \(2021\)One\-shot battery degradation trajectory prediction with deep learning\.Journal of Power Sources506,pp\. 
230024\.External Links:ISSN 0378\-7753,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.jpowsour.2021.230024),[Link](https://www.sciencedirect.com/science/article/pii/S0378775321005528)Cited by:[Appendix D](https://arxiv.org/html/2605.27044#A4.p4.13),[§1](https://arxiv.org/html/2605.27044#S1.p1.1),[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - W\. Li, H\. Zhang, B\. van Vlijmen, P\. Dechent, and D\. U\. Sauer \(2022\)Forecasting battery capacity and power degradation with multi\-task learning\.Energy Storage Materials53,pp\. 453–466\.External Links:ISSN 2405\-8297,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.ensm.2022.09.013),[Link](https://www.sciencedirect.com/science/article/pii/S2405829722004998)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - P\. Liu, B\. Wu, Y\. Hu, N\. Li, T\. Dai, J\. Bao, and S\. Xia \(2025a\)TimeBridge: non\-stationarity matters for long\-term time series forecasting\.International Conference on Machine Learning\.Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1)\. - Q\. Liu, Z\. Shang, S\. Lu, Y\. Liu, Y\. Liu, and S\. Yu \(2025b\)Physics\-guided tl\-lstm network for early\-stage degradation trajectory prediction of lithium\-ion batteries\.Journal of Energy Storage106,pp\. 114736\.External Links:ISSN 2352\-152X,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.est.2024.114736),[Link](https://www.sciencedirect.com/science/article/pii/S2352152X24043226)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - Y\. Liu, T\. Hu, H\. Zhang, H\. Wu, S\. Wang, L\. Ma, and M\. 
Long \(2024\)ITransformer: inverted transformers are effective for time series forecasting\.InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7\-11, 2024,Vienna, Austria,pp\. 1–25\.External Links:[Link](https://openreview.net/forum?id=JePfAI8fah)Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1)\. - G\. Ma, S\. Xu, B\. Jiang, C\. Cheng, X\. Yang, Y\. Shen, T\. Yang, Y\. Huang, H\. Ding, and Y\. Yuan \(2022\)Real\-time personalized health status prediction of lithium\-ion batteries using deep transfer learning\.Energy and Environmental Science15\(10\),pp\. 4083 – 4094\(English\)\.External Links:ISSN 17545692,[Link](http://dx.doi.org/10.1039/d2ee01676a)Cited by:[§2\.2](https://arxiv.org/html/2605.27044#S2.SS2.p1.1),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1),[§5](https://arxiv.org/html/2605.27044#S5.p2.1)\. - J\. Meng, L\. Cai, S\. Yang, J\. Li, F\. Zhou, J\. Peng, and Z\. Song \(2024\)An empirical\-informed model for the early degradation trajectory prediction of lithium\-ion battery\.IEEE Transactions on Energy Conversion39\(4\),pp\. 2299–2311\.External Links:[Document](https://dx.doi.org/10.1109/TEC.2024.3385093)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§5](https://arxiv.org/html/2605.27044#S5.p2.1)\. - P\. Mohtat, S\. Lee, J\. B\. Siegel, and A\. G\. Stefanopoulou \(2021\)Reversible and irreversible expansion of lithium\-ion batteries under a wide range of stress factors\.Journal of The Electrochemical Society168\(10\),pp\. 100520\.Cited by:[Appendix D](https://arxiv.org/html/2605.27044#A4.p4.13),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - Y\. Nie, N\. H\. Nguyen, P\. Sinthong, and J\. Kalagnanam \(2023\)A time series is worth 64 words: long\-term forecasting with transformers\.InThe Eleventh International Conference on Learning Representations, ICLR 2023, Kigali, Rwanda, May 1\-5, 2023,Kigali, Rwanda,pp\. 
1–24\.External Links:[Link](https://openreview.net/forum?id=Jbdc0vTOcol)Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1),[§4\.2\.2](https://arxiv.org/html/2605.27044#S4.SS2.SSS2.p2.2)\. - Y\. Preger, H\. M\. Barkholtz, A\. Fresquez, D\. L\. Campbell, B\. W\. Juba, J\. Romàn\-Kustas, S\. R\. Ferreira, and B\. Chalamala \(2020\)Degradation of commercial lithium\-ion cells as a function of chemistry and cycling conditions\.Journal of The Electrochemical Society167\(12\),pp\. 120532\.External Links:[Document](https://dx.doi.org/10.1149/1945-7111/abae37),[Link](https://dx.doi.org/10.1149/1945-7111/abae37)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - F\. Rahmanian, R\. M\. Lee, D\. Linzner, K\. Michel, L\. Merker, B\. B\. Berkes, L\. Nuss, and H\. S\. Stein \(2024\)Attention towards chemistry agnostic and explainable battery lifetime prediction\.npj Computational Materials10\(1\) \(English\)\.External Links:ISSN 20573960,[Link](http://dx.doi.org/10.1038/s41524-024-01286-7)Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p2.1)\. - K\. A\. Severson, P\. M\. Attia, N\. Jin, N\. Perkins, B\. Jiang, Z\. Yang, M\. H\. Chen, M\. Aykol, P\. K\. Herring, D\. Fraggedakis, M\. Z\. Bazant, S\. J\. Harris, W\. C\. Chueh, and R\. D\. Braatz \(2019\)Data\-driven prediction of battery cycle life before capacity degradation\.Nature Energy4\(5\),pp\. 383 – 391\(English\)\.External Links:ISSN 20587546Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1),[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§2\.2](https://arxiv.org/html/2605.27044#S2.SS2.p1.1),[§2\.3](https://arxiv.org/html/2605.27044#S2.SS3.p1.5),[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p1.1),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - Q\. Shen, J\. Li, J\. Nie, Z\. Bao, and C\. Wang \(2025\)A lightweight multiscale signal learning framework for predicting battery degradation trajectory\.IEEE Sensors Journal25\(24\),pp\. 
44801 – 44812\(English\)\.External Links:ISSN 1530437X,[Link](http://dx.doi.org/10.1109/JSEN.2025.3625630)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - R\. Tan, W\. Hong, J\. Li, J\. Huang, and T\. Zhang \(2025a\)Pretrained battery transformer \(pbt\): a battery life prediction foundation model\.arXiv preprint arXiv:2512\.16334\.Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.27044#S2.SS1.p1.1),[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p2.4)\. - R\. Tan, W\. Hong, J\. Tang, X\. Lu, R\. Ma, X\. Zheng, J\. Li, J\. Huang, and T\. Zhang \(2025b\)BatteryLife: a comprehensive dataset and benchmark for battery life prediction\.InProceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\.2,KDD ’25,New York, NY, USA,pp\. 5789–5800\.External Links:ISBN 9798400714542,[Link](https://doi.org/10.1145/3711896.3737372),[Document](https://dx.doi.org/10.1145/3711896.3737372)Cited by:[Appendix A](https://arxiv.org/html/2605.27044#A1.p1.6),[§C\.1](https://arxiv.org/html/2605.27044#A3.SS1.p1.7),[Appendix D](https://arxiv.org/html/2605.27044#A4.p2.6),[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§2\.1](https://arxiv.org/html/2605.27044#S2.SS1.p1.1),[§2\.2](https://arxiv.org/html/2605.27044#S2.SS2.p1.1),[§2\.2](https://arxiv.org/html/2605.27044#S2.SS2.p1.10),[§2\.3](https://arxiv.org/html/2605.27044#S2.SS3.p1.5),[§3\.1](https://arxiv.org/html/2605.27044#S3.SS1.p1.3),[§3\.1](https://arxiv.org/html/2605.27044#S3.SS1.p3.2),[2nd item](https://arxiv.org/html/2605.27044#S4.I1.i2.p1.1),[3rd item](https://arxiv.org/html/2605.27044#S4.I1.i3.p1.1),[4th 
item](https://arxiv.org/html/2605.27044#S4.I1.i4.p1.1),[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p1.1),[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1),[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p4.1),[§5](https://arxiv.org/html/2605.27044#S5.p2.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - R\. Tan, X\. Lu, M\. Cheng, J\. Li, J\. Huang, and T\. Zhang \(2024\)Forecasting battery degradation trajectory under domain shift with domain generalization\.Energy Storage Materials72,pp\. 103725\.External Links:ISSN 2405\-8297,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.ensm.2024.103725),[Link](https://www.sciencedirect.com/science/article/pii/S2405829724005518)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1),[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§4\.4](https://arxiv.org/html/2605.27044#S4.SS4.p4.1),[§5](https://arxiv.org/html/2605.27044#S5.p3.1)\. - S\. Tan, B\. Ji, and Y\. Pan \(2023\)EMMN: emotional motion memory network for audio\-driven emotional talking face generation\.In2023 IEEE/CVF International Conference on Computer Vision \(ICCV\),Vol\.,pp\. 22089–22099\.External Links:[Document](https://dx.doi.org/10.1109/ICCV51070.2023.02024)Cited by:[§3\.3](https://arxiv.org/html/2605.27044#S3.SS3.p1.3)\. - P\. Tang and W\. Zhang \(2025\)Unlocking the power of patch: patch\-based mlp for long\-term time series forecasting\.Proceedings of the AAAI Conference on Artificial Intelligence39\(12\),pp\. 12640–12648\.External Links:[Link](https://ojs.aaai.org/index.php/AAAI/article/view/33378),[Document](https://dx.doi.org/10.1609/aaai.v39i12.33378)Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1),[§4\.2\.2](https://arxiv.org/html/2605.27044#S4.SS2.SSS2.p2.2)\. - S\. Tao, H\. Liu, C\. Sun, H\. Ji, G\. Ji, Z\. Han, R\. Gao, J\. Ma, R\. Ma, Y\. 
Chen,et al\.\(2023\)Collaborative and privacy\-preserving retired battery sorting for profitable direct recycling via federated machine learning\.Nature Communications14\(1\),pp\. 8032\.Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1)\. - S\. Tao, M\. Zhang, Z\. Zhao, H\. Li, R\. Ma, Y\. Che, X\. Sun, L\. Su, C\. Sun, X\. Chen, H\. Chang, S\. Zhou, Z\. Li, H\. Lin, Y\. Liu, W\. Yu, Z\. Xu, H\. Hao, S\. Moura, X\. Zhang, Y\. Li, X\. Hu, and G\. Zhou \(2025\)Non\-destructive degradation pattern decoupling for early battery trajectory prediction via physics\-informed learning\.Energy and Environmental Science18\(3\),pp\. 1544 – 1559\(English\)\.External Links:ISSN 17545692,[Link](http://dx.doi.org/10.1039/d4ee03839h)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1),[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p1.1),[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p2.1),[§5](https://arxiv.org/html/2605.27044#S5.p2.1)\. - A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, L\. Kaiser, and I\. Polosukhin \(2017\)Attention is all you need\.InAdvances in Neural Information Processing Systems,Vol\.2017\-December,Long Beach, CA, United states,pp\. 5999 – 6009\(English\)\.External Links:ISSN 10495258Cited by:[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p3.4),[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p4.6)\. - F\. Wang, Z\. Zhai, Z\. Zhao, Y\. Di, and X\. Chen \(2024\)Physics\-informed neural network for lithium\-ion battery degradation stable modeling and prognosis\.Nature Communications15\(1\),pp\. 4332\.External Links:[Link](https://doi.org/10.1038/s41467-024-48779-z)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - S\. Wang, J\. Li, X\. Shi, Z\. Ye, B\. Mo, W\. Lin, S\. Ju, Z\. Chu, and M\. Jin \(2025\)TIMEMIXER\+\+: a general time series pattern machine for universal predictive analysis\.Singapore, Singapore,pp\. 
1662 – 1698\(English\)\.Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1)\. - A\. Weng, P\. Mohtat, P\. M\. Attia, V\. Sulzer, S\. Lee, G\. Less, and A\. Stefanopoulou \(2021\)Predicting the impact of formation protocols on battery lifetime immediately after manufacturing\.Joule5\(11\),pp\. 2971 – 2992\(English\)\.External Links:ISSN 25424351,[Link](http://dx.doi.org/10.1016/j.joule.2021.09.015)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p3.1),[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p1.1),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - J\. Weston, S\. Chopra, and A\. Bordes \(2015\)Memory networks\.External Links:1410\.3916,[Link](https://arxiv.org/abs/1410.3916)Cited by:[§3\.3](https://arxiv.org/html/2605.27044#S3.SS3.p1.3)\. - Y\. Xing, E\. W\.M\. Ma, K\. Tsui, and M\. Pecht \(2013\)An ensemble model for predicting the remaining useful performance of lithium\-ion batteries\.Microelectronics Reliability53\(6\),pp\. 811 – 820\(English\)\.External Links:ISSN 00262714,[Link](http://dx.doi.org/10.1016/j.microrel.2012.12.003)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - J\. Ye, W\. Zhang, Z\. Li, J\. Li, and F\. Tsung \(2025\)MedSpaformer: a transferable transformer with multi\-granularity token sparsification for medical time series classification\.External Links:2503\.15578,[Link](https://arxiv.org/abs/2503.15578)Cited by:[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p2.4)\. - A\. Zeng, M\. Chen, L\. Zhang, and Q\. Xu \(2023\)Are transformers effective for time series forecasting?\.InThirty\-Seventh AAAI Conference on Artificial Intelligence, AAAI 2023, Thirty\-Fifth Conference on Innovative Applications of Artificial Intelligence, IAAI 2023, Thirteenth Symposium on Educational Advances in Artificial Intelligence, EAAI 2023, Washington, DC, USA, February 7\-14, 2023,B\. Williams, Y\. Chen, and J\. Neville \(Eds\.\),Washington, DC, USA,pp\. 
11121–11128\.External Links:[Link](https://doi.org/10.1609/aaai.v37i9.26317),[Document](https://dx.doi.org/10.1609/AAAI.V37I9.26317)Cited by:[§4\.1](https://arxiv.org/html/2605.27044#S4.SS1.p3.1)\. - H\. Zhang, X\. Gui, S\. Zheng, Z\. Lu, Y\. Li, and J\. Bian \(2023a\)BATTERYML: an open\-source platform for machine learning on battery degradation\.\(English\)\.External Links:ISSN 23318422,[Link](http://dx.doi.org/10.48550/arXiv.2310.14714)Cited by:[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\. - H\. Zhang, Y\. Li, S\. Zheng, Z\. Lu, X\. Gui, W\. Xu, and J\. Bian \(2025a\)Battery lifetime prediction across diverse ageing conditions with inter\-cell deep learning\.Nature Machine Intelligence7\(2\),pp\. 270–277\.External Links:[Document](https://dx.doi.org/10.1038/s42256-024-00972-x)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1),[§2\.1](https://arxiv.org/html/2605.27044#S2.SS1.p1.1),[§2\.3](https://arxiv.org/html/2605.27044#S2.SS3.p1.5)\. - J\. Zhang, S\. Zheng, W\. Cao, J\. Bian, and J\. Li \(2023b\)Warpformer: a multi\-scale modeling approach for irregular clinical time series\.InProceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,KDD ’23,New York, NY, USA,pp\. 3273–3285\.External Links:ISBN 9798400701030,[Link](https://doi.org/10.1145/3580305.3599543),[Document](https://dx.doi.org/10.1145/3580305.3599543)Cited by:[§6](https://arxiv.org/html/2605.27044#S6.p1.1)\. - T\. Zhang, R\. Tan, P\. Zhu, T\. Zhang, and J\. Huang \(2025b\)Unlocking ultrafast diagnosis of retired batteries via interpretable machine learning and optical fiber sensors\.ACS Energy Letters10,pp\. 862–871\.Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1)\. - Y\. Zhang, M\. Li, D\. Long, X\. Zhang, H\. Lin, B\. Yang, P\. Xie, A\. Yang, D\. Liu, J\. Lin, F\. Huang, and J\. 
Zhou \(2025c\)Qwen3 embedding: advancing text embedding and reranking through foundation models\.arXiv preprint arXiv:2506\.05176\.Cited by:[§3\.2](https://arxiv.org/html/2605.27044#S3.SS2.p2.9)\. - H\. Zheng, S\. Yang, W\. Xue, S\. Xiao, D\. Shen, W\. Dong, and X\. Zhang \(2026\)Self\-discharge estimation for lithium\-ion batteries based on formation data in production\.Engineering Applications of Artificial Intelligence169,pp\. 114180\.External Links:ISSN 0952\-1976,[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.engappai.2026.114180),[Link](https://www.sciencedirect.com/science/article/pii/S0952197626004616)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p1.1)\. - H\. Zhou, S\. Zhang, J\. Peng, S\. Zhang, J\. Li, H\. Xiong, and W\. Zhang \(2021\)Informer: beyond efficient transformer for long sequence time\-series forecasting\.Proceedings of the AAAI Conference on Artificial Intelligence35\(12\),pp\. 11106–11115\.External Links:[Link](https://ojs.aaai.org/index.php/AAAI/article/view/17325),[Document](https://dx.doi.org/10.1609/aaai.v35i12.17325)Cited by:[§1](https://arxiv.org/html/2605.27044#S1.p2.1)\. - J\. Zhu, Y\. Wang, Y\. Huang, R\. Bhushan Gopaluni, Y\. Cao, M\. Heere, M\. J\. Mühlbauer, L\. Mereacre, H\. Dai, X\. Liu,et al\.\(2022\)Data\-driven capacity estimation of commercial lithium\-ion batteries from voltage relaxation\.Nature communications13\(1\),pp\. 2261\.Cited by:[Appendix D](https://arxiv.org/html/2605.27044#A4.p4.13),[1st item](https://arxiv.org/html/2605.27044#S4.I1.i1.p1.1)\.

## Appendix A End-of-life Definition

We define end-of-life $t_{\mathrm{eol}}$ as the first cycle at which $\mathrm{SOH}$ falls below the threshold $\tau$. We use $\tau = 80\%$ for Li-ion, Na-ion, and Zn-ion. For CALB, since many batteries are not degraded to $80\%$ SOH within the measured data, we use $\tau = 90\%$ following BatteryLife (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)).
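The end-of-life definition above can be illustrated with a minimal sketch. It assumes SOH is stored as a fraction (so $\tau = 80\%$ is `0.80`) and that cycles are numbered from 1; the function name `end_of_life_cycle` is hypothetical and not taken from the paper's code.

```python
import numpy as np

def end_of_life_cycle(soh, tau=0.80):
    """Return the 1-based index of the first cycle with SOH < tau, or None.

    Assumption: `soh` holds one SOH value per cycle, expressed as a fraction.
    """
    soh = np.asarray(soh, dtype=float)
    below = np.flatnonzero(soh < tau)  # positions where SOH is under the threshold
    return int(below[0]) + 1 if below.size else None

# Illustrative values only; not data from the paper.
soh = [1.00, 0.95, 0.89, 0.85, 0.79, 0.81, 0.78]
t_eol_default = end_of_life_cycle(soh, tau=0.80)  # threshold used for Li-ion, Na-ion, Zn-ion
t_eol_calb = end_of_life_cycle(soh, tau=0.90)     # threshold used for CALB
print(t_eol_default, t_eol_calb)
```

Because the definition takes the first crossing, a later recovery above $\tau$ (cycle 6 in the toy data) does not change the result.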
## Appendix B Details of SOC Calculation

Each battery provides a protocol SOC interval $\bigl[\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}},\,\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}\bigr]$, where charging starts at $\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}}$ and ends at $\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}$; the subsequent discharge returns to $\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}}$ before the next cycle. Within each cycle $i$, we assume SOC varies linearly with the within-segment charge/discharge capacity change. Let $Q_{i,k}$ denote the capacity at point $k$ in cycle $i$. For charging, with segment endpoints $Q^{\mathrm{ch}}_{i,\mathrm{start}}$ and $Q^{\mathrm{ch}}_{i,\mathrm{end}}$, we compute

$$\mathrm{SOC}_{i,k}=\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}}+\frac{Q_{i,k}-Q^{\mathrm{ch}}_{i,\mathrm{start}}}{Q^{\mathrm{ch}}_{i,\mathrm{end}}-Q^{\mathrm{ch}}_{i,\mathrm{start}}}\left(\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}-\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}}\right). \tag{33}$$

For discharging, we use the same linear mapping but reverse the SOC direction from $\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}$ to $\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}}$:

$$\mathrm{SOC}_{i,k}=\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}+\frac{Q_{i,k}-Q^{\mathrm{dis}}_{i,\mathrm{start}}}{Q^{\mathrm{dis}}_{i,\mathrm{end}}-Q^{\mathrm{dis}}_{i,\mathrm{start}}}\left(\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}}-\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}\right). \tag{34}$$

SOC-aligned resampling. We re-parameterize each segment by SOC and resample voltage, current, and capacity on a uniform SOC grid. Concretely, for charging we interpolate each variable at $L/2$ equally spaced SOC values in $\bigl[\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{start}},\,\mathrm{SOC}^{\mathrm{ch}}_{\mathrm{end}}\bigr]$; for discharging we use $L/2$ equally spaced values in the reverse direction. We then concatenate the resampled charging and discharging sequences to form a length-$L$ per-cycle input, where the $k$-th point corresponds to a fixed SOC level (within the recorded interval), yielding SOC-aligned inputs for BatteryMFormer.

## Appendix C Further Implementation Details

This appendix provides additional implementation details to facilitate reproducibility. We first describe the unified input processing pipeline used for all baselines that leverage cycling data, and then explain how we adapt generic time-series forecasters and battery-specific baselines to the early BDTF setting.

### C.1. Input and Target Construction

Input processing. For each battery, we process the raw cycling record on a per-cycle basis. For cycle $i$, we resample the charging and discharging segments to $L/2$ uniformly spaced points (per segment) and concatenate them in the order *charge $\rightarrow$ discharge*, yielding length-$L$ sequences for voltage, current, and capacity.
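The linear SOC mapping of Equations 33 and 34 and the per-segment resampling just described can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function names are hypothetical, the first and last capacity samples of a segment are taken as its endpoints, capacity is assumed monotonic within a segment, and plain linear interpolation (`np.interp`) is assumed since the text does not name the interpolation scheme.

```python
import numpy as np

def soc_from_capacity(q, soc_first, soc_last):
    """Linear capacity-to-SOC map in the spirit of Eqs. (33)-(34).

    The first capacity sample is assigned soc_first and the last soc_last.
    Pass (SOC_start, SOC_end) for charging and (SOC_end, SOC_start) for discharging.
    Assumption: q is monotonic and q[0] != q[-1].
    """
    q = np.asarray(q, dtype=float)
    frac = (q - q[0]) / (q[-1] - q[0])
    return soc_first + frac * (soc_last - soc_first)

def resample_on_soc_grid(soc, values, n_points):
    """Interpolate `values` at n_points equally spaced SOC levels from soc[0] to soc[-1]."""
    soc = np.asarray(soc, dtype=float)
    values = np.asarray(values, dtype=float)
    grid = np.linspace(soc[0], soc[-1], n_points)  # decreasing grid for discharge
    order = np.argsort(soc)                        # np.interp needs increasing x
    return np.interp(grid, soc[order], values[order])

# Toy cycle with L = 8 (illustrative numbers, protocol SOC interval [0, 1]).
L = 8
q_ch, v_ch = np.array([0.0, 0.2, 0.5, 1.0]), np.array([3.0, 3.4, 3.7, 4.2])
q_dis, v_dis = np.array([0.0, 0.3, 0.7, 1.0]), np.array([4.1, 3.8, 3.5, 3.0])

soc_ch = soc_from_capacity(q_ch, 0.0, 1.0)    # charging: SOC_start -> SOC_end
soc_dis = soc_from_capacity(q_dis, 1.0, 0.0)  # discharging: SOC_end -> SOC_start

# Concatenate in the order charge -> discharge to get a length-L voltage sequence.
v_cycle = np.concatenate([
    resample_on_soc_grid(soc_ch, v_ch, L // 2),
    resample_on_soc_grid(soc_dis, v_dis, L // 2),
])
print(v_cycle.shape)  # (8,)
```

The same two helpers would be applied to current and capacity so that the $k$-th sample of every variable refers to the same SOC level.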
Let $\mathbf{v}_i\in\mathbb{R}^{L}$, $\mathbf{I}_i\in\mathbb{R}^{L}$, and $\mathbf{c}_i\in\mathbb{R}^{L}$ denote the resampled voltage, current, and capacity, respectively. Following prior work (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)), we normalize current to C-rate by dividing by the (per-battery) nominal capacity:

$$\mathbf{I}_i\leftarrow\mathbf{I}_i\,/\,Q_{\mathrm{nominal}}, \tag{35}$$

where $Q_{\mathrm{nominal}}$ is provided by the dataset. Voltage and capacity are kept in their original scales. We then form the per-cycle input as

$$\bar{\mathbf{X}}_i=[\mathbf{v}_i;\,\mathbf{I}_i;\,\mathbf{c}_i]\in\mathbb{R}^{3\times L}. \tag{36}$$

For BatteryMFormer, we additionally compute the SOC variable as described in Appendix [B](https://arxiv.org/html/2605.27044#A2). All models are trained and evaluated under the early BDTF protocol in Section [2.3](https://arxiv.org/html/2605.27044#S2.SS3): we use at most the first 100 cycles as input, and if fewer cycles are available, we pad the missing cycles with all-zero sequences. Specifically, for a setting with $S\in\{1,\ldots,100\}$ usable early cycles, we build $\bar{\mathbf{X}}_{1:100}$ by placing the available $\{\bar{\mathbf{X}}_i\}_{i=1}^{S}$ in the first $S$ cycles and zero-padding the remaining cycles. We also provide a cycle-level validity mask

$$\mathbf{m}^{\mathrm{cyc}}\in\{0,1\}^{100}, \tag{37}$$

where $m^{\mathrm{cyc}}_i=1$ if and only if cycle $i$ exists (i.e., $i\leq S$) and $0$ otherwise; models that support attention masking use $\mathbf{m}^{\mathrm{cyc}}$ to ignore padded cycles.

Target normalization and padding.
Each battery trajectory is padded to a maximum horizon of 5000 cycles, which covers the longest trajectories in the database. To reduce scale impact, we normalize SOH using the same EOL threshold $\tau$ as in Appendix [A](https://arxiv.org/html/2605.27044#A1):

$$\tilde{y}_j=\frac{y_j-\tau}{1-\tau}, \tag{38}$$

where $y_j$ is the ground-truth SOH at cycle $j$. Consistent with Equation [32](https://arxiv.org/html/2605.27044#S3.E32), we compute the prediction loss in the normalized space by replacing $\mathbf{y}$ and $\hat{\mathbf{y}}$ with $\tilde{\mathbf{y}}$ and $\hat{\tilde{\mathbf{y}}}$, respectively.

### C.2. Baseline Implementation

We next describe how we adapt baselines for battery degradation trajectory forecasting.

Generic time-series forecasting models. We reshape $\bar{\mathbf{X}}_{1:100}\in\mathbb{R}^{100\times 3\times L}$ as the time series inputs of the generic time-series forecasting models:

$$\mathbf{X}_{\mathrm{flat}}=\mathrm{Reshape}(\bar{\mathbf{X}}_{1:100})\in\mathbb{R}^{(100\cdot L)\times 3}. \tag{39}$$

The core architecture of the backbone encoder $f(\cdot)$ is kept unchanged from the original implementation, and we replace the original forecasting head with a trajectory prediction head that outputs a length-5000 SOH sequence:

$$\mathbf{H}=f(\mathbf{X}_{\mathrm{flat}}), \tag{40}$$

$$\hat{\tilde{\mathbf{y}}}=\mathrm{Head}(\mathbf{H})\in\mathbb{R}^{5000}. \tag{41}$$

Here, $\mathrm{Head}(\cdot)$ first flattens $\mathbf{H}$ if necessary and then applies a linear projection to forecast 5000 SOH points.

TimesFM is a pre-trained time-series foundation model that performs zero-shot forecasting without task-specific fine-tuning.
As the model only supports univariate time-series inputs, we directly feed the collected historical SOH sequence into it. Given its patch-based architecture with a fixed patch length of 32, the input sequence length must be a multiple of 32. To meet this constraint, we truncate overly long sequences to the largest multiple of 32 that does not exceed the original input length. The model then autoregressively generates the complete SOH degradation trajectory up to the specified prediction horizon.

Battery-specific models. For CPTransformer and CPMLP, we follow the official BatteryLife implementations and only adjust the output head to produce the length-5000 trajectory. For IC2ML (Huang et al., [2026](https://arxiv.org/html/2605.27044#bib.bib19)), the original paper uses the capacity increment within a fixed voltage window (3.6–3.8 V) during charging. To accommodate voltage-range variations across diverse batteries and protocols in our data, we instead compute the capacity increment over each sample's full observed charging voltage range and use it as the IC2ML input. All other components follow the original paper and the official repository.

Training objective. For a fair comparison, all baselines are trained with the same prediction loss as BatteryMFormer (Equation [32](https://arxiv.org/html/2605.27044#S3.E32)), i.e., a masked MSE over the available SOH labels in the prediction region. For IC2ML, which introduces auxiliary supervision via multi-task learning, we additionally incorporate the multi-task loss terms as described in the original paper, while keeping the main trajectory prediction term consistent with Equation [32](https://arxiv.org/html/2605.27044#S3.E32).

Hyperparameter search. For BatteryMFormer, we perform per-fold hyperparameter search using Bayesian optimization, running 30 trials per fold and selecting the configuration with the lowest validation MAPE.
The search space includes learning rate in $[2\times 10^{-5},\,2\times 10^{-4}]$, batch size in $\{64,128\}$, dropout rate in $[0.05,0.5]$, embedding dimension $d\in\{64,128,256\}$, feed-forward dimensions $d_{\mathrm{ff}}\in\{32,64,128\}$ and $d_{\mathrm{ffs}}\in\{32,64,128,256\}$, key dimension in $\{512,768\}$, memory dimension in $\{128,512\}$, $L_{\mathrm{intra}}\in\{2,4\}$, $L_{de}\in\{2,4,6,8\}$, number of queries in $\{4,8,10,12,20,50\}$, $N_{\mathrm{mem}}\in\{64,96\}$, and patch-encoder kernel size in $\{10,16,20,30\}$. For fair comparison, each baseline is tuned on the same training/validation splits with at least 10 configurations per domain, and the configuration with the lowest validation MAPE is reported.

## Appendix D Further Details of Data Preprocessing

This section describes the preprocessing pipeline applied to all datasets. In principle, a battery is cycled under a single protocol. However, raw operational data may include non-standard segments such as reference performance tests (RPTs), formation cycles, and equipment faults. These segments can induce spurious deviations in the cycle-level SOH trajectory relative to the predominant operating regime, which can impair model training because such deviations may appear randomly. We therefore detect such anomalous regions and repair them to obtain SOH trajectories that better reflect the underlying degradation trend under the major cycling protocol.

SOH computation and battery filtering. We compute $\mathrm{SOH}$ as defined in Section [2.2](https://arxiv.org/html/2605.27044#S2.SS2). Single-cycle spike drops exceeding 3% of the previous cycle's SOH are replaced with that previous value to suppress measurement artifacts.
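The spike-drop rule in the preceding sentence can be sketched in a few lines. This is one possible reading, stated as an assumption: each cycle is compared with the original (unrepaired) SOH of the cycle before it, so a persistent step change is only altered at its first cycle. The function name `suppress_spike_drops` is hypothetical.

```python
import numpy as np

def suppress_spike_drops(soh, max_rel_drop=0.03):
    """Replace SOH values that drop by more than max_rel_drop relative to the previous cycle.

    Assumption: the comparison uses the original previous-cycle value,
    and the replacement value is that same previous-cycle SOH.
    """
    soh = np.asarray(soh, dtype=float)
    prev = soh[:-1]
    rel_drop = (prev - soh[1:]) / prev       # positive when SOH falls
    is_spike = rel_drop > max_rel_drop
    out = soh.copy()
    out[1:] = np.where(is_spike, prev, soh[1:])
    return out

# Illustrative trajectory with a single artifact at the third cycle.
raw = [1.00, 0.99, 0.90, 0.98, 0.97]
print(suppress_spike_drops(raw))
```

Only downward spikes are handled here, matching the sentence above; upward jumps are left to the region-level smoothing described later in this appendix.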
Batteries whose SOH has not degraded below $\tau+2.5\%$ are excluded due to insufficient degradation information, where $\tau=90\%$ for CALB (Tan et al., [2025b](https://arxiv.org/html/2605.27044#bib.bib98)) and $\tau=80\%$ for all other datasets (Appendix [A](https://arxiv.org/html/2605.27044#A1)). For batteries that have degraded to $\tau+2.5\%$ but have not yet reached $\tau$, we fit a linear regression on the last 20 cycles and extrapolate the SOH trajectory to the end-of-life (EOL) point.

SOH trajectory smoothing. RPTs, formation cycles, and equipment faults can cause abrupt SOH jumps or drops in adjacent cycles. We detect and repair these SOH anomalies at the region level. Let $\delta_k=(\mathrm{SOH}_k-\mathrm{SOH}_{k-1})/\mathrm{SOH}_{k-1}$ denote the relative SOH change rate at cycle $k$, and let $t_k$ denote the start time of cycle $k$. We identify anomaly onsets using dataset-specific metadata when available. For datasets that provide RPT timestamps (ISU-ILCC (Li et al., [2024b](https://arxiv.org/html/2605.27044#bib.bib5))), we map each RPT start time to a cycle index and take the last normal cycle immediately before the RPT as the onset. For datasets that lack RPT annotations but record cycle timestamps (HNEI (Devie et al., [2018](https://arxiv.org/html/2605.27044#bib.bib80)), RWTH (Li et al., [2021](https://arxiv.org/html/2605.27044#bib.bib22)), Tongji (Zhu et al., [2022](https://arxiv.org/html/2605.27044#bib.bib6)), MICH_EXP (Mohtat et al., [2021](https://arxiv.org/html/2605.27044#bib.bib95))), we flag cycle $k$ as an onset when $(t_k-t_{k-1})>\gamma_{\mathrm{gap}}$, where $\gamma_{\mathrm{gap}}$ is a dataset-specific time-gap threshold (fixed for all batteries in that dataset; see Appendix [D](https://arxiv.org/html/2605.27044#A4) for values).
For the remaining datasets, we flag cycle $k$ when $\delta_k>\gamma^{+}$ or $\delta_k<\gamma^{-}$, where $\gamma^{+}$ and $\gamma^{-}$ are the 99th and 1st percentiles of the empirical $\{\delta_k\}$ distribution computed on the training split of each dataset, respectively. Once an anomaly onset is identified at cycle $k_s$, we locate the recovery point $k_e$ by scanning forward and selecting the earliest cycle such that $\mathrm{SOH}$ returns within a tolerance $\epsilon$ of the pre-anomaly level $\mathrm{SOH}_{k_s-1}$ and remains within this tolerance for the next $W$ consecutive cycles, where $\epsilon$ is an SOH tolerance and $W$ is the stability window length (in cycles). The anomalous region $[k_s,k_e]$ is then repaired using PCHIP (Piecewise Cubic Hermite Interpolating Polynomial) interpolation of SOH over cycle indices, with $M$ normal cycles immediately before and after the region used as anchor points.
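The detect-then-repair procedure above can be sketched with NumPy and SciPy's `PchipInterpolator`. This is an illustrative sketch under stated assumptions, not the authors' code: the function names are hypothetical, cycle indices are 0-based, the thresholds passed to `flag_onsets` stand in for the percentile-based $\gamma^{+}$ and $\gamma^{-}$, and the values chosen for $\epsilon$, $W$, and $M$ are arbitrary because this section does not report them.

```python
import numpy as np
from scipy.interpolate import PchipInterpolator

def flag_onsets(soh, gamma_minus, gamma_plus):
    """Return cycle indices k whose relative change delta_k leaves [gamma_minus, gamma_plus]."""
    soh = np.asarray(soh, dtype=float)
    delta = np.diff(soh) / soh[:-1]  # delta_k for k = 1, ..., n-1
    return np.flatnonzero((delta > gamma_plus) | (delta < gamma_minus)) + 1

def find_recovery(soh, k_s, eps, window):
    """Earliest k_e >= k_s where SOH is within eps of SOH[k_s-1] at k_e and the next `window` cycles."""
    soh = np.asarray(soh, dtype=float)
    within = np.abs(soh - soh[k_s - 1]) <= eps
    for k_e in range(k_s, len(soh) - window):
        if within[k_e : k_e + window + 1].all():
            return k_e
    return None  # no stable recovery inside the recorded data

def repair_region(soh, k_s, k_e, n_anchor):
    """Overwrite soh[k_s..k_e] with PCHIP values fitted on n_anchor cycles on each side.

    Assumption: at least one anchor cycle exists on both sides of the region.
    """
    soh = np.asarray(soh, dtype=float)
    idx = np.arange(len(soh))
    anchors = np.concatenate([idx[max(0, k_s - n_anchor) : k_s],
                              idx[k_e + 1 : k_e + 1 + n_anchor]])
    pchip = PchipInterpolator(anchors, soh[anchors])
    repaired = soh.copy()
    repaired[k_s : k_e + 1] = pchip(idx[k_s : k_e + 1])
    return repaired

# Toy trajectory: linear fade with an artificial +0.03 jump on cycles 10-12.
clean = 1.0 - 0.001 * np.arange(30)
noisy = clean.copy()
noisy[10:13] += 0.03

onsets = flag_onsets(noisy, gamma_minus=-0.01, gamma_plus=0.01)
k_s = int(onsets[0])
k_e = find_recovery(noisy, k_s, eps=0.01, window=3)
repaired = repair_region(noisy, k_s, k_e, n_anchor=3)
print(k_s, k_e, float(np.max(np.abs(repaired - clean))))
```

In practice the thresholds would come from something like `np.percentile(train_deltas, [1, 99])` on the training split, with `train_deltas` being a hypothetical array of pooled $\delta_k$ values. PCHIP is shape-preserving, so the repaired segment does not overshoot the anchor values the way an unconstrained cubic spline can.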