QUIVER: Quantum-Informed Views for Enhanced Representations in Large ML Models

arXiv cs.LG 06/03/26, 04:00 AM Papers
Summary
This paper introduces Quiver, a paradigm that enriches classical machine learning models with quantum-inspired features derived from the quantum Fisher information matrix, demonstrating improvements on molecule property prediction and jet flavor classification benchmarks.
arXiv:2606.02785v1 Announce Type: new Abstract: Large machine learning models benefit substantially from multimodal inputs that provide a complementary view of the same example. We introduce QUIVER (QUantum-Informed Views for Enhanced Representations), a paradigm that enriches classical data-driven features with a quantum Fisher view: a geometrically motivated, basis-independent summary of higher-order correlations captured by a variational quantum circuit (VQC) trained to perform the same task. Unlike classical feature augmentation, the quantum Fisher information matrix encodes the intrinsic geometry of the learned quantum state manifold. While this feature map, motivated by quantum information theory, is ordinarily non-trivial to model classically, it can surface statistical structure that additional classical data or model capacity finds difficult to learn. This makes the quantum Fisher view a genuinely complementary modality rather than a redundant one. We demonstrate that QUIVER improves standard performance metrics on two benchmark datasets from very different fields: QM9 for predicting molecule properties, and JetClass for predicting jet flavor at the Large Hadron Collider (LHC). The core contribution, however, is domain-agnostic: the quantum Fisher view can be fused into a broad class of model architectures via targeted modifications to the base architecture, to incorporate information about the quantum geometry of the problem. These results demonstrate that quantum-geometric features, extracted from simulated variational circuits, can deliver measurable value for standard machine learning tasks, well before the advent of fault-tolerant quantum hardware.
Cached at: 06/03/26, 09:39 AM
# Quiver: Quantum-Informed Views for Enhanced Representations in Large Machine Learning Models
Source: [https://arxiv.org/html/2606.02785](https://arxiv.org/html/2606.02785)
###### Abstract

Large machine learning models benefit substantially from multimodal inputs that provide a complementary view of the same example. We introduce Quiver (QUantum-Informed Views for Enhanced Representations), a paradigm that enriches classical data-driven features with a *quantum Fisher view*: a geometrically motivated, basis-independent summary of higher-order correlations captured by a variational quantum circuit (VQC) trained to perform the same task. Unlike classical feature augmentation, the quantum Fisher information matrix encodes the intrinsic geometry of the learned quantum state manifold. While this feature map, motivated by quantum information theory, is ordinarily non-trivial to model classically, it can surface statistical structure that additional classical data or model capacity finds difficult to learn. This makes the quantum Fisher view a genuinely complementary modality rather than a redundant one. We demonstrate that Quiver improves standard performance metrics on two benchmark datasets from very different fields: QM9 for predicting molecule properties, and JetClass for predicting jet flavor at the Large Hadron Collider (LHC). The core contribution, however, is domain-agnostic: the quantum Fisher view can be fused into a broad class of model architectures via targeted modifications to the base architecture, to incorporate information about the quantum geometry of the problem. These results demonstrate that quantum-geometric features, extracted from simulated variational circuits, can deliver measurable value for standard machine learning tasks, well before the advent of fault-tolerant quantum hardware.

Machine Learning, Quantum Computing, High Energy Physics, Variational Quantum Circuits, ICML

## 1 Introduction

Multi-million parameter models play an increasingly important role in the scientific analysis of extremely high-dimensional data. Architectures such as graph neural networks (GNNs) (Gilmer et al., [2017](https://arxiv.org/html/2606.02785#bib.bib2); Battaglia et al., [2018](https://arxiv.org/html/2606.02785#bib.bib3); Scarselli et al., [2009](https://arxiv.org/html/2606.02785#bib.bib15)) and transformers (Vaswani et al., [2017](https://arxiv.org/html/2606.02785#bib.bib1)) have become standard tools in this regard. Two domains where this paradigm has driven significant methodological progress are high-energy physics (HEP) and molecular chemistry. In both cases the inputs are high-dimensional structured objects, particles in a jet or atoms in a molecule, yet the models that process them are trained exclusively on classical representations of these systems. While these representations are highly effective, they are ultimately restricted to correlations that can be efficiently expressed within standard feature spaces and architectures.

Critical tasks at major experiments such as the Large Hadron Collider (LHC) are performed by such machine learning models, two prominent examples being *jet flavor classification* and *anomaly detection* for new-physics searches. Jets are collimated sprays of hadrons produced when a high-energy quark or gluon fragments and hadronizes; identifying their parton-level origin (e.g. light quark vs. gluon, or boosted top quark vs. QCD background) is a central ingredient of essentially every analysis at the LHC. State-of-the-art jet taggers represent each jet as a *point cloud* of its constituent particles, each carrying kinematic features (Komiske et al., [2019](https://arxiv.org/html/2606.02785#bib.bib4); Qu and Gouskos, [2020](https://arxiv.org/html/2606.02785#bib.bib5); Qu et al., [2022b](https://arxiv.org/html/2606.02785#bib.bib6)), in addition to other information. Similar principles are used to design model-agnostic anomaly-detection pipelines aimed at uncovering physics beyond the Standard Model (Nachman and Shih, [2020](https://arxiv.org/html/2606.02785#bib.bib8); Kasieczka et al., [2021](https://arxiv.org/html/2606.02785#bib.bib7)).

While such architectures excel at exploiting kinematic correlations in the particle-level representation of a jet, they are limited by the structure of the feature space in which these correlations are expressed. In particular, higher-order and non-local correlations must be learned implicitly through model capacity rather than being directly exposed to the model. Jets arise from a coherent branching process governed by quantum chromodynamics, and consequently exhibit rich correlation patterns among their constituents. In practice, however, these objects are represented for machine learning through classical feature constructions, usually kinematic, which can obscure or compress precisely those multi-particle correlations that are most discriminative, for example between a color-singlet $W$ jet and a color-connected QCD jet. This motivates the exploration of alternative representations that make such structure more directly accessible. In this work, we operationalize this idea through the notion of *quantum views* of a jet: embeddings of classical data into a Hilbert space that expose its geometric and correlation structure in ways not readily captured by standard kinematic summaries.

This structural limitation extends equally to molecular chemistry, where a central task is the prediction of quantum-mechanical molecular properties from structure, with targets such as the HOMO–LUMO gap, dipole moment, isotropic polarizability and atomization energy available in benchmark datasets like QM9 (Ramakrishnan et al., [2014](https://arxiv.org/html/2606.02785#bib.bib9)). GNNs and transformers tailored to molecular graphs and atomic point clouds, such as SchNet (Schütt et al., [2017](https://arxiv.org/html/2606.02785#bib.bib10)), MPNN (Gilmer et al., [2017](https://arxiv.org/html/2606.02785#bib.bib2)), DimeNet (Gasteiger et al., [2020b](https://arxiv.org/html/2606.02785#bib.bib11)) and their equivariant successors, now define the state of the art on these benchmarks. The regression targets often depend on complex, highly correlated interactions that are only indirectly reflected in classical structural descriptors. As a result, models must infer these relationships from data rather than accessing a representation in which such correlations are naturally organized.

We address this limitation by introducing a *quantum Fisher view* of the input, in which we map classical data into a parameterized quantum state via a variational quantum circuit (VQC) (Cerezo et al., [2021](https://arxiv.org/html/2606.02785#bib.bib17)) and extract the associated quantum Fisher information matrix (QFIM) (Liu et al., [2020](https://arxiv.org/html/2606.02785#bib.bib13); Meyer, [2021](https://arxiv.org/html/2606.02785#bib.bib14); Abbas et al., [2021](https://arxiv.org/html/2606.02785#bib.bib18)). This construction does not require an assumption that the underlying system is quantum or that the encoding reflects a physical quantum state. Rather, it provides a principled mapping of classical data into a Hilbert space where geometric structure, in particular sensitivity and higher-order correlations, can be probed through the induced metric.

On this basis, we propose Quiver, a paradigm that fuses the quantum Fisher view with the classical view of the same input. Quiver is deliberately architecture-agnostic: for transformer backbones we condition the model through cross-attention between the quantum and classical modalities, and for GNN backbones we modulate the learned graph messages by features derived from the QFIM. We show that Quiver delivers consistent improvements over classical baselines of similar or greater complexity, including the Particle Transformer (Qu et al., [2022b](https://arxiv.org/html/2606.02785#bib.bib6)), a $>2$M-parameter state-of-the-art jet tagger used in LHC analyses, and DimeNet++, a state-of-the-art model on QM9 property prediction.

## 2 Motivating Quiver: A Mathematical Background

In this section, we provide a short mathematical overview of a VQC and the QFIM. Thereafter, we describe the quantum encoding used for the HEP task, and the novel quantum encoding we develop for the molecular chemistry application. All quantum circuit operations described in this paper were simulated classically using the quantum simulator library PennyLane (Bergholm et al., [2022](https://arxiv.org/html/2606.02785#bib.bib24)).

### 2.1 Variational quantum circuits

A VQC is a parameterized unitary $U(\boldsymbol{\theta})$ acting on a fixed reference state of $N$ qubits (Cerezo et al., [2021](https://arxiv.org/html/2606.02785#bib.bib17)). With the standard initialization $|0\rangle^{\otimes N}$, the circuit first prepares an input encoding:

$$|\psi(\boldsymbol{\Theta})\rangle \;=\; U(\boldsymbol{\Theta})\,|0\rangle^{\otimes N},\qquad \boldsymbol{\Theta}\in\mathbb{R}^{P}, \tag{1}$$

where the $P$ angles $\boldsymbol{\Theta}=(\Theta_{1},\dots,\Theta_{P})$ are functions of the inputs: $\boldsymbol{\Theta}=\boldsymbol{\Theta}(x)$, with $x$ a jet or a molecule. This encoding is typically followed by a series of entanglement operations and trainable single-qubit rotations $R(\theta)$, which together constitute a variational ansatz. The output of the circuit is obtained by measuring one or more qubit observables, yielding an expectation value that serves as the prediction of the VQC.

### 2.2 The Quantum Fisher Information Matrix

Because the map $\boldsymbol{\theta}\mapsto|\psi(\boldsymbol{\Theta},\boldsymbol{\theta})\rangle$ from trainable parameters to states is smooth, the resulting quantum states form a submanifold of pure states whose canonical Riemannian structure is the Fubini–Study metric (Provost and Vallee, [1980](https://arxiv.org/html/2606.02785#bib.bib19)). On pure states this metric coincides, up to an overall factor of four, with the QFIM (Braunstein and Caves, [1994](https://arxiv.org/html/2606.02785#bib.bib21)):

$$F_{ij}(\boldsymbol{\theta}) \;=\; 4\,\mathrm{Re}\!\left[\,\langle\partial_{i}\psi\,|\,\partial_{j}\psi\rangle-\langle\partial_{i}\psi\,|\,\psi\rangle\langle\psi\,|\,\partial_{j}\psi\rangle\,\right], \tag{2}$$

where $\partial_{i}\equiv\frac{\partial}{\partial\theta_{i}}$, yielding the line element

$$\mathrm{d}s^{2} \;=\; \tfrac{1}{4}\sum_{i,j}F_{ij}(\boldsymbol{\theta})\,\mathrm{d}\theta_{i}\,\mathrm{d}\theta_{j}, \tag{3}$$

which measures the statistical distinguishability of two infinitesimally separated parameter points $\boldsymbol{\theta}$ and $\boldsymbol{\theta}+\mathrm{d}\boldsymbol{\theta}$ through the states they prepare. In our setting, the input enters through the data-dependent state preparation that precedes the trainable rotations, so $F_{ij}(\boldsymbol{\theta};x)$ is an input-conditioned object: evaluated at a fixed reference $\boldsymbol{\theta}_{0}$, it characterizes how the encoded state of $x$ shapes the local geometry of the trainable-parameter manifold. This computation is tractable on existing classical simulators, using standard implementations such as that in PennyLane.
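As a quick sanity check of Eq. (2), consider a single qubit with one trainable rotation. This toy case is an illustration added here and is not one of the circuits used in the paper:

$$|\psi(\theta)\rangle = R_Y(\theta)\,|0\rangle = \cos\tfrac{\theta}{2}\,|0\rangle + \sin\tfrac{\theta}{2}\,|1\rangle, \qquad |\partial_\theta\psi\rangle = \tfrac{1}{2}\left(-\sin\tfrac{\theta}{2}\,|0\rangle + \cos\tfrac{\theta}{2}\,|1\rangle\right),$$

$$\langle\partial_\theta\psi|\partial_\theta\psi\rangle = \tfrac{1}{4}, \qquad \langle\partial_\theta\psi|\psi\rangle = 0 \quad\Longrightarrow\quad F_{\theta\theta} = 4\left(\tfrac{1}{4} - 0\right) = 1, \qquad \mathrm{d}s^{2} = \tfrac{1}{4}\,\mathrm{d}\theta^{2}.$$

The result is independent of $\theta$, as expected for a rotation that moves the state along a great circle of the Bloch sphere at constant speed.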

The diagonal $F_{ii}$ records how strongly the prepared state responds to a perturbation of $\theta_{i}$ alone, and so acts as a per-feature “dynamic” importance score. The off-diagonal $F_{ij}$ couple distinct parameters and are non-zero precisely when the two corresponding directions act *coherently* on overlapping qubit subsystems; they vanish whenever the two parameters drive factorized, independent parts of the state. Under an encoding of the form described in Sections 2.3 and 2.4, this gives a direct relational reading: large off-diagonal entries between two qubits flag collective behavior of the corresponding input elements, while a nearly diagonal $F$ signals effectively independent contributions. The result is a compact relational tensor whose entries are directly consumable by attention layers or by message-passing networks, as detailed in the following sections.
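The diagonal versus off-diagonal reading can be checked numerically on a two-qubit toy circuit. The sketch below is an assumption-laden illustration rather than the paper's pipeline: it evaluates Eq. (2) with central finite differences in NumPy (the paper uses PennyLane), and the two state-preparation functions are hypothetical examples chosen so that the trainable rotations act either on a product state or on a Bell state.

```python
import numpy as np

def ry(angle):
    """Single-qubit rotation about Y: exp(-i * angle * Y / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

# Fixed gates used to build an entangled two-qubit input state.
HADAMARD = np.array([[1, 1], [1, -1]], dtype=complex) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0], [0, 1, 0, 0],
                 [0, 0, 0, 1], [0, 0, 1, 0]], dtype=complex)
ZERO2 = np.array([1, 0, 0, 0], dtype=complex)  # |00>

def product_state(theta):
    """Trainable rotations acting on the unentangled state |00>."""
    return np.kron(ry(theta[0]), ry(theta[1])) @ ZERO2

def entangled_state(theta):
    """The same rotations acting on the Bell state (|00> + |11>) / sqrt(2)."""
    bell = CNOT @ np.kron(HADAMARD, np.eye(2)) @ ZERO2
    return np.kron(ry(theta[0]), ry(theta[1])) @ bell

def qfim(state_fn, theta, eps=1e-5):
    """Eq. (2), with state derivatives taken by central finite differences."""
    theta = np.asarray(theta, dtype=float)
    psi = state_fn(theta)
    grads = []
    for i in range(theta.size):
        shift = np.zeros_like(theta)
        shift[i] = eps
        grads.append((state_fn(theta + shift) - state_fn(theta - shift)) / (2 * eps))
    F = np.empty((theta.size, theta.size))
    for i, d_i in enumerate(grads):
        for j, d_j in enumerate(grads):
            # np.vdot conjugates its first argument, i.e. it computes <d_i|d_j>.
            F[i, j] = 4 * np.real(np.vdot(d_i, d_j) - np.vdot(d_i, psi) * np.vdot(psi, d_j))
    return F

F_product = qfim(product_state, [0.3, 1.1])
F_entangled = qfim(entangled_state, [0.3, 1.1])
```

For the product state the matrix is close to the identity (independent parameters), while for the Bell-state input the off-diagonal entry is close to -1, reflecting the correlation between the two qubits.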

### 2.3 The $\mathbf{1P1Q}$ particle embedding

For jets we adopt the one-particle–one-qubit ($\mathrm{1P1Q}$) encoding of Bal et al. ([2025](https://arxiv.org/html/2606.02785#bib.bib23)), in which each reconstructed constituent is mapped to a dedicated qubit using its kinematic features, followed by two-qubit entanglement and standard Pauli rotation operations. We represent each jet as an ordered set of its ten highest-$p_{\mathrm{T}}$ constituents, using $(p_{\mathrm{T}},\eta,\phi)$ as the set of kinematic input features. Here, $p_{\mathrm{T}}$ is the transverse momentum (magnitude of momentum in the plane perpendicular to the collider beam axis), $\eta=-\ln(\tan(\theta/2))$ is the pseudorapidity, and $\theta,\phi$ are the zenith and azimuthal angles. We use the standard coordinate references in collider physics, where the $\mathrm{Z}$ axis is defined as being along the collider beam direction. We omit further details of the circuit and embedding, which are described in detail by Bal et al. ([2025](https://arxiv.org/html/2606.02785#bib.bib23)).
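The kinematic inputs above can be sketched in a few lines. This is a minimal illustration using only the definitions in the text; the function names and the Cartesian-momentum input format are assumptions, and any normalization applied before the values become rotation angles is left to Bal et al. (2025).

```python
import math

def kinematics(px, py, pz):
    """Return (pT, eta, phi) for one constituent; assumes pT > 0."""
    pt = math.hypot(px, py)                # transverse momentum
    theta = math.atan2(pt, pz)             # zenith angle measured from the beam (Z) axis
    eta = -math.log(math.tan(theta / 2))   # pseudorapidity
    phi = math.atan2(py, px)               # azimuthal angle
    return pt, eta, phi

def top_constituents(momenta, n_keep=10):
    """Keep the n_keep highest-pT constituents, ordered by decreasing pT."""
    features = [kinematics(*p) for p in momenta]
    return sorted(features, key=lambda f: f[0], reverse=True)[:n_keep]

# Hypothetical jet with 12 constituents; only the 10 hardest are retained.
toy_jet = [(float(i), 0.0, 1.0) for i in range(1, 13)]
kept = top_constituents(toy_jet)
```

Each retained triple would then parameterize the qubit assigned to that constituent.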

### 2.4 The $\mathbf{2A2Q}$ molecular embedding

For the QM9 molecular dataset, we design and use a novel two-atom–two-qubit embedding which we call $\mathrm{2A2Q}$. The objective is to regress $\Delta\epsilon=\epsilon_{\mathrm{LUMO}}-\epsilon_{\mathrm{HOMO}}$, defined as the energy difference between the lowest unoccupied (LUMO) and highest occupied (HOMO) molecular orbitals, on the QM9 dataset. We represent each molecule as a $10$-qubit system, with one qubit assigned to each heavy atom. The unused qubits are populated with randomly sampled hydrogen atoms, and all remaining explicit hydrogen information is discarded.

Starting from the initial state $|0\rangle^{\otimes N}$, we first learn a per-atom embedding by applying $R_{Y}(w_{\mathrm{atom}}^{j})\,|0\rangle$ on each qubit $j$, where $R_{Y}$ denotes the Pauli rotation about the $Y$-axis and $w_{\mathrm{atom}}^{j}$ is a trainable parameter associated with the atomic species occupying qubit $j$. A naive one-atom–one-qubit encoding of Cartesian coordinates would introduce a dependence on the choice of reference frame, which is undesirable. To mitigate this, we combine the encoding and entanglement stages into a single pairwise operation, defined by the angles

$$\begin{aligned}
\omega_{1}^{(ij)} &= e_{d_{1}}\cdot\left(1-\frac{d_{ij}}{d_{\mathrm{CUTOFF}}}\right)\cdot\cos(\theta_{ij}),\\
\omega_{2}^{(ij)} &= e_{\mathrm{bond}}^{(ij)}\cdot\pi,\\
\omega_{3}^{(ij)} &= e_{d_{2}}\cdot\left(1-\frac{d_{ij}}{d_{\mathrm{CUTOFF}}}\right)\cdot\cos(\phi_{ij}),
\end{aligned} \tag{4}$$

followed by the two-qubit unitary

$$\mathcal{U}_{ij}=\big(I_{YY}(\omega_{3}^{(ij)})\,I_{ZZ}(\omega_{2}^{(ij)})\,I_{XX}(\omega_{1}^{(ij)})\big)\big(R_{Y}(w_{\mathrm{atom}}^{i})\otimes R_{Y}(w_{\mathrm{atom}}^{j})\big)\,|00\rangle, \tag{5}$$

where $e_{d_{1}}$ and $e_{d_{2}}$ are learnable scaling parameters, $e_{\mathrm{bond}}^{(ij)}$ is a learnable bond-type entanglement parameter, $d_{\mathrm{CUTOFF}}=1.7~\text{Å}$ is fixed, and $I_{XX}$, $I_{YY}$, $I_{ZZ}$ denote the Ising-type two-qubit interactions $\exp(-i\,\omega\,\sigma_{X}\otimes\sigma_{X}/2)$, $\exp(-i\,\omega\,\sigma_{Y}\otimes\sigma_{Y}/2)$, and $\exp(-i\,\omega\,\sigma_{Z}\otimes\sigma_{Z}/2)$, respectively. The pairwise distance $d_{ij}$ is frame-invariant by construction, while the pairwise zenith and azimuthal angles $\theta_{ij}$ and $\phi_{ij}$ retain a residual frame dependence; we accept this trade-off, as the objective of the small variational quantum circuit is not to achieve state-of-the-art performance in isolation. The entanglement block $\mathcal{U}_{ij}$ is applied only for atom pairs $(i,j)$ satisfying $d_{ij}<d_{\mathrm{CUTOFF}}$ and connected by a chemical bond. The pairwise stage is followed by a per-qubit trainable rotation sequence $R_{Z}\cdot R_{Y}\cdot R_{Z}$ with independent parameters on each qubit. Together, the atom embedding, conditional pairwise entanglement, and single-qubit rotations constitute one layer of the circuit, and we stack two such layers in the final architecture.
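The pairwise block of Eqs. (4) and (5) can be written out for a single atom pair. The NumPy sketch below is illustrative only: the numerical values given to the learnable parameters and the pair geometry are hypothetical, and the paper's circuits are built in PennyLane rather than from explicit matrices.

```python
import numpy as np

X = np.array([[0, 1], [1, 0]], dtype=complex)
Y = np.array([[0, -1j], [1j, 0]], dtype=complex)
Z = np.array([[1, 0], [0, -1]], dtype=complex)

def ry(angle):
    """Single-qubit rotation about Y: exp(-i * angle * Y / 2)."""
    c, s = np.cos(angle / 2), np.sin(angle / 2)
    return np.array([[c, -s], [s, c]], dtype=complex)

def ising(pauli, omega):
    """exp(-i * omega * (P x P) / 2), using (P x P)^2 = identity."""
    return np.cos(omega / 2) * np.eye(4) - 1j * np.sin(omega / 2) * np.kron(pauli, pauli)

def pair_angles(d_ij, theta_ij, phi_ij, e_d1, e_d2, e_bond, d_cutoff=1.7):
    """The three angles of Eq. (4); distances in Angstrom, angles in radians."""
    falloff = 1.0 - d_ij / d_cutoff
    return (e_d1 * falloff * np.cos(theta_ij),
            e_bond * np.pi,
            e_d2 * falloff * np.cos(phi_ij))

def pair_state(w_i, w_j, omegas):
    """Eq. (5): atom embeddings on |00>, then the XX, ZZ and YY interactions."""
    w1, w2, w3 = omegas
    start = np.kron(ry(w_i), ry(w_j)) @ np.array([1, 0, 0, 0], dtype=complex)
    return ising(Y, w3) @ ising(Z, w2) @ ising(X, w1) @ start

# Hypothetical parameter values for one bonded pair 1.54 Angstrom apart.
omegas = pair_angles(d_ij=1.54, theta_ij=0.4, phi_ij=1.2, e_d1=0.8, e_d2=0.5, e_bond=0.25)
state = pair_state(w_i=0.7, w_j=0.7, omegas=omegas)
```

The distance-dependent angles shrink linearly to zero as $d_{ij}$ approaches the cutoff, so only the bond-type angle $\omega_{2}^{(ij)}$ survives for pairs at the edge of the allowed range.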

The prediction for the gap $\Delta\epsilon$ is extracted from the VQC via measurement of the observable:

$$\mathcal{H}=\sum_{i=1}^{N}c_{i}\,Z_{i}, \tag{6}$$

where the index $i$ runs over all $N=10$ qubits of the system, $Z_{i}$ denotes the Pauli-$Z$ operator acting on the $i$-th qubit, and $\{c_{i}\}$ are trainable coefficients. Since the HOMO–LUMO gap is strictly positive, the raw expectation value $\langle\mathcal{H}\rangle\in[-\sum_{i}|c_{i}|,\,\sum_{i}|c_{i}|]$ is shifted by $\sum_{i}|c_{i}|$ so that the predicted value lies in $[0,\,2\sum_{i}|c_{i}|]$. The circuit parameters are optimized by minimizing the Huber loss (Huber, [1964](https://arxiv.org/html/2606.02785#bib.bib27); PyTorch Contributors, [2024](https://arxiv.org/html/2606.02785#bib.bib26)) between the predicted gap and the target value in units of $\mathrm{meV}$, which provides robustness to outliers by interpolating between an $\ell_{2}$ behavior for small residuals and an $\ell_{1}$ behavior for large ones.
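The readout shift and the loss can be made concrete with a short sketch. It assumes the per-qubit expectation values $\langle Z_{i}\rangle$ are already available from the simulator, and it uses a Huber threshold `delta=1.0`; the threshold used in the paper is not stated, so that value is an assumption.

```python
def shifted_prediction(z_expectations, coeffs):
    """<H> = sum_i c_i <Z_i>, shifted by sum_i |c_i| so the output is non-negative."""
    raw = sum(c * z for c, z in zip(coeffs, z_expectations))
    return raw + sum(abs(c) for c in coeffs)

def huber(residual, delta=1.0):
    """Quadratic for |residual| <= delta, linear beyond it."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a
    return delta * (a - 0.5 * delta)

coeffs = [0.5, 1.0, 2.0]                               # hypothetical trainable coefficients
lowest = shifted_prediction([-1.0, -1.0, -1.0], coeffs)  # all qubits read -1
highest = shifted_prediction([1.0, 1.0, 1.0], coeffs)    # all qubits read +1
```

With these coefficients the prediction ranges over $[0, 7]$, matching the interval $[0, 2\sum_{i}|c_{i}|]$ given above.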

## 3 Adding the Quantum Views

### 3.1 Jet Flavor Classification

We evaluate Quiver on the JetClass dataset of Qu et al. ([2022a](https://arxiv.org/html/2606.02785#bib.bib20)), focusing on the binary classification task of distinguishing hadronic top-quark jets ($t\to Wb\to q\bar{q}b$) from the QCD multijet background. As our classical baseline, we adopt the Particle Transformer (Qu et al., [2022b](https://arxiv.org/html/2606.02785#bib.bib6)), a state-of-the-art model with approximately $2.14\mathrm{M}$ parameters, and incorporate the QFIM into its attention mechanism via implicit cross-attention through sequence concatenation, as described in Section [3.1.2](https://arxiv.org/html/2606.02785#S3.SS1.SSS2).

Footnote 1: We use the official release (Qu and Li, [2022](https://arxiv.org/html/2606.02785#bib.bib29)) of Particle Transformer on GitHub for the baseline.

Table [1](https://arxiv.org/html/2606.02785#S3.T1) summarizes the per-particle feature sets. The kinematic baseline uses only the per-particle kinematic features relative to the jet axis; the full-feature baseline additionally includes calorimeter energy deposits, particle identification flags, and track impact parameters.

Table 1: Per-particle input features for the jet tagging experiments. All models also receive four-vectors $(p_{x},p_{y},p_{z},E)$ for the Lorentz-vector pair embedding and $(\Delta\eta,\Delta\phi)$ as spatial point coordinates.

#### 3.1.1 QFIM representation

Given the compute requirements of simulating large multi-qubit states, we restrict our implementation of the $\mathrm{1P1Q}$ encoding to a maximum of $N=10$ qubits. We therefore use only the ten highest-$p_{\mathrm{T}}$ constituents of a given jet and discard all remaining per-particle information. For $10$ particles encoded under the $\mathrm{1P1Q}$ scheme with three local rotation-gate parameters per qubit, the QFIM is a $30\times 30$ real symmetric matrix, stored in the data pipeline as 90 channels over 10 particle slots. Each particle slot $i$ receives all 90 QFIM channels as its feature vector, which is embedded by a Particle-Transformer-style MLP into a token of dimension 128 and appended to the classical particle-token sequence. The transformer therefore receives 20 tokens in total: 10 classical particle tokens followed by 10 QFIM tokens.

#### 3.1.2 The Quiver Paradigm: QFIM injection

Let $\{x_{i}\}_{i=1}^{P}$ be the particle features, $\{v_{i}\}_{i=1}^{P}$ be the four-vectors used for pairwise embedding, and $\mathbf{Q}\in\mathbb{R}^{90\times 10}$ be the reshaped QFIM, with the second dimension indexing the constituent particles of the jet. We incorporate the QFIM into the Particle Transformer by embedding the per-particle QFIM channels independently using a Particle-Transformer-style MLP and appending the resulting tokens to the classical particle sequence:

$$\text{transformer input}=\bigl[k_{1},\dots,k_{P},\;q_{1},\dots,q_{P}\bigr], \tag{7}$$

where $k_{i}=\mathrm{MLP}_{\text{tok}}(x_{i})\in\mathbb{R}^{128}$ are the classical particle tokens and $q_{i}=\mathrm{MLP}_{\text{QFIM}}(\mathbf{Q}[:,i])\in\mathbb{R}^{128}$ are the embedded QFIM tokens, with $\mathbf{Q}[:,i]$ denoting all $90$ QFIM channels associated with particle slot $i$. The Lorentz-vector pair bias is computed for the original particle sequence and zero-padded to the doubled sequence length. Algorithm [1](https://arxiv.org/html/2606.02785#alg1) details the complete forward pass.

Algorithm 1: Quiver: QFIM token injection into ParT.

Require: $\{x_{i}\}_{i=1}^{P}$, $\{v_{i}\}_{i=1}^{P}$, $\mathbf{Q}$

1. for $i=1,\dots,P$ do
2. $k_{i}\leftarrow\mathrm{MLP}_{\text{tok}}(x_{i})$
3. $q_{i}\leftarrow\mathrm{MLP}_{\text{QFIM}}(\mathbf{Q}[:,i])$
4. end for
5. $b_{ij}\leftarrow\mathrm{PairEmbed}(v_{i},v_{j})$
6. Transformer Input $\leftarrow[k_{1},\dots,k_{P},\;q_{1},\dots,q_{P}]$

With the architectural modifications of Algorithm [1](https://arxiv.org/html/2606.02785#alg1), the Quiver-augmented Particle Transformer has a parameter count of $2.29\mathrm{M}$, a modest increase of $7\%$ over the original.
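The shapes involved in the token injection can be traced with a small NumPy sketch. Several parts are assumptions made for illustration: the text does not specify how the $30\times 30$ QFIM is laid out as 90 channels per slot (here, the three rows owned by particle $i$ are flattened), single random linear maps stand in for the two MLPs, and random numbers stand in for the features, the QFIM and the pair bias.

```python
import numpy as np

rng = np.random.default_rng(0)
P, D = 10, 128  # particle slots and token dimension

def qfim_to_channels(F):
    """Assumed layout: the 3 rows belonging to particle i, flattened to 90 channels."""
    return F.reshape(P, 3 * 3 * P).T  # shape (90, P); column i belongs to slot i

def linear_embed(features, weight):
    """Stand-in for the embedding MLPs: one linear map, no nonlinearity."""
    return features @ weight

x = rng.normal(size=(P, 3))            # placeholder classical features per particle
A = rng.normal(size=(3 * P, 3 * P))
F = A @ A.T                            # symmetric placeholder for the 30x30 QFIM
Q = qfim_to_channels(F)

k = linear_embed(x, rng.normal(size=(3, D)))        # classical tokens, (P, D)
q = linear_embed(Q.T, rng.normal(size=(90, D)))     # QFIM tokens, (P, D)
tokens = np.concatenate([k, q], axis=0)             # Eq. (7): (2P, D)

pair_bias = rng.normal(size=(P, P))    # placeholder for PairEmbed(v_i, v_j)
padded_bias = np.zeros((2 * P, 2 * P))
padded_bias[:P, :P] = pair_bias        # zero-padded to the doubled sequence length
```

Because the padded bias is zero wherever a QFIM token is involved, attention between the classical and QFIM tokens is shaped only by the token contents, which is the implicit cross-attention described above.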

### 3.2 Molecular Property Regression

The task, as before, is the regression of $\Delta\epsilon$. The quantum encoding was described in Section [2](https://arxiv.org/html/2606.02785#S2). The QFIM is computed per molecule. With 2 circuit layers times 3 single-qubit rotations per qubit, each of the 10 qubits owns 6 trainable parameters, so the QFIM is a $60\times 60$ matrix, stored as a $10\times 10$ grid of $6\times 6$ sub-blocks. The off-diagonal block $Q_{ij}$ captures the coherent coupling between the rotation-gate parameter groups of qubits $i$ and $j$, which under the $\mathrm{2A2Q}$ encoding corresponds to the coupling between atoms $i$ and $j$.
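Extracting the sub-blocks is a pure reshaping step. The sketch below assumes the 60 parameters are ordered qubit by qubit, with the 6 parameters of each qubit contiguous; if a simulator returns them layer by layer instead, a permutation of rows and columns would be needed first. The integer-filled matrix is only a placeholder that makes the indexing easy to verify.

```python
import numpy as np

N_QUBITS, PARAMS_PER_QUBIT = 10, 6

def qfim_sub_blocks(F):
    """View a 60x60 QFIM as a 10x10 grid of 6x6 blocks: blocks[i, j] is Q_ij."""
    n, p = N_QUBITS, PARAMS_PER_QUBIT
    return F.reshape(n, p, n, p).transpose(0, 2, 1, 3)

F = np.arange(3600, dtype=float).reshape(60, 60)  # placeholder matrix
blocks = qfim_sub_blocks(F)
Q_23 = blocks[2, 3]  # coupling block between qubits (atoms) 2 and 3
```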

#### 3.2.1 The Quiver paradigm: Quantum-informed edge-state rescaling

Rather than introducing an independent QFIM processing branch, whose gains could come from generic added parameter capacity rather than from physically aligned information, we constrain the QFIM to act as a modulating factor on the existing baseline edge-state vectors.

We apply this mechanism to DimeNet++ (Gasteiger et al., [2020a](https://arxiv.org/html/2606.02785#bib.bib12)), an improvement to the original DimeNet (Gasteiger et al., [2020b](https://arxiv.org/html/2606.02785#bib.bib11)), which operates on directed-edge embeddings $x_{ij}^{(l)}$ updated by interaction blocks rather than explicit node messages. The rescaling is applied after the initial embedding block and after each interaction block, so the steps described in Algorithm [2](https://arxiv.org/html/2606.02785#alg2) are realizable in DimeNet++'s native edge state. The QFIM-modulated rescaling results in a parameter count of $1.891\mathrm{M}$, a negligible increase over the original's $1.886\mathrm{M}$ parameters. We refer to this model as $\mathcal{Q}$DimeNet++.

Footnote 2: The baseline uses the DimeNet++ implementation in the package PyTorch Geometric (Fey and Lenssen, [2019](https://arxiv.org/html/2606.02785#bib.bib25); Contributors, [2024](https://arxiv.org/html/2606.02785#bib.bib16)).

Algorithm 2: Quiver in DimeNet++: QFIM-gated edge-state rescaling.

Require: atomic numbers $\{z_{i}\}$, positions $\{\mathbf{r}_{i}\}$, QFIM sub-blocks $\{Q_{ij}\}$, learnable scalar $\alpha$

1. Construct: DimeNet++ radius graph and geometric basis functions $\mathrm{RBF}_{ij}$ and $\mathrm{SBF}_{kji}$.
2. $x_{ij}^{(0)}\leftarrow\mathrm{Embed}^{\mathrm{DimeNet++}}(z_{i},z_{j},\mathrm{RBF}_{ij})$
3. $\widetilde{x}_{ij}^{(0)}\leftarrow\mathrm{Rescale}(x_{ij}^{(0)},Q_{ij},\alpha)$
4. $o\leftarrow\mathrm{Output}^{(0)}_{\mathrm{DimeNet++}}(\widetilde{x}_{ij}^{(0)},\mathrm{RBF}_{ij})$
5. for $l=1,\ldots,L$ do
6. $x_{ij}^{(l)}\leftarrow\mathrm{Interaction}^{(l)}_{\mathrm{DimeNet++}}(\widetilde{x}_{ij}^{(l-1)},\mathrm{RBF}_{ij},\mathrm{SBF}_{kji})$
7. $\widetilde{x}_{ij}^{(l)}\leftarrow\mathrm{Rescale}(x_{ij}^{(l)},Q_{ij},\alpha)$ {QFIM gate}
8. $o\leftarrow o+\mathrm{Output}^{(l)}_{\mathrm{DimeNet++}}(\widetilde{x}_{ij}^{(l)},\mathrm{RBF}_{ij})$ {pooling}
9. end for
10. $\hat{y}_{G}\leftarrow\sum_{i\in G}o_{i}$ {graph-level sum readout}

The edge-state rescaling of Step [3](https://arxiv.org/html/2606.02785#alg2.l3), Algorithm [2](https://arxiv.org/html/2606.02785#alg2), is implemented as a residual multiplicative gate on the baseline directed edge state $x_{ij}^{(l)}$:

$$\widetilde{x}_{ij}^{(l)}=\mathrm{Rescale}(x_{ij}^{(l)},Q_{ij},\alpha)=\bigl(1+\alpha\cdot\Theta(Q_{ij})\bigr)\,x_{ij}^{(l)}. \tag{8}$$

Here, $\alpha$ is a global learnable scalar initialized to zero, ensuring that the two networks are identical at the start of training. The function $\Theta(Q_{ij})$ is a per-edge bounded scalar learnt using a convolutional neural network (CNN) applied to the $6\times 6$ QFIM sub-block, followed by a scaling multilayer perceptron (MLP) (Rosenblatt, [1958](https://arxiv.org/html/2606.02785#bib.bib28)) with a final $\tanh$ activation, ensuring $\Theta(Q_{ij})\in[-1,1]$. The exact details of the CNN and training setup are provided in the appendix.
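The gate of Eq. (8) is simple enough to sketch directly. In the NumPy example below, `theta_gate` is a deliberately simplified stand-in (a weighted sum of the block followed by $\tanh$) for the CNN and MLP described in the text, whose details are in the appendix; the edge state, sub-block and readout weights are random placeholders.

```python
import numpy as np

def theta_gate(q_block, readout_weights):
    """Stand-in for the CNN + MLP: weighted sum of the 6x6 block, then tanh."""
    return float(np.tanh(np.sum(readout_weights * q_block)))

def rescale(x_edge, q_block, alpha, readout_weights):
    """Residual multiplicative gate of Eq. (8): (1 + alpha * Theta(Q_ij)) * x_ij."""
    return (1.0 + alpha * theta_gate(q_block, readout_weights)) * x_edge

rng = np.random.default_rng(0)
x_edge = rng.normal(size=8)        # placeholder directed-edge embedding
q_block = rng.normal(size=(6, 6))  # placeholder QFIM sub-block Q_ij
weights = rng.normal(size=(6, 6))  # placeholder readout weights

x_init = rescale(x_edge, q_block, alpha=0.0, readout_weights=weights)
x_gated = rescale(x_edge, q_block, alpha=0.3, readout_weights=weights)
```

With `alpha=0.0` the edge state passes through unchanged, which is why the augmented model starts out identical to the baseline; for non-zero `alpha` the bounded gate limits the rescaling factor to $[1-|\alpha|,\,1+|\alpha|]$.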

The inputs and prediction targets are standardized using the statistics of the training sample, and the outputs are de-standardized before the results are reported. Ten independent seed initializations are run for each variant to obtain the error bars.

## 4 Results

### 4.1 Jet Flavor Classification

The evaluation criteria and classical benchmark were described in Section [3.1](https://arxiv.org/html/2606.02785#S3.SS1). We train both the Particle Transformer (ParT) and its Quiver-augmented variant with five independent random seed initializations, and report our results in Table [2](https://arxiv.org/html/2606.02785#S4.T2), using the AUC score and the QCD background rejection rate as our performance metrics. The latter is a standard quantity in HEP, defined as $1/\epsilon_{B}$. We evaluate it at a top-tagging efficiency of $\epsilon_{S}=0.5$ (where $\epsilon_{B}=\mathrm{FPR}$ and $\epsilon_{S}=\mathrm{TPR}$). Each run, comprising the aforementioned five initializations, is carried out twice, once for the kinematic-only set of features and once for the full set of available inputs, both described in Table [1](https://arxiv.org/html/2606.02785#S3.T1).
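The background rejection metric can be computed from classifier scores as follows. This is a minimal sketch of the definition above, not the evaluation code used for Table 2; the function name and the toy scores are illustrative, and ties at the threshold are counted as passing.

```python
import math

def background_rejection(signal_scores, background_scores, signal_eff=0.5):
    """Return 1 / FPR at the score threshold that keeps `signal_eff` of the signal."""
    ranked = sorted(signal_scores, reverse=True)
    n_keep = max(1, math.ceil(signal_eff * len(ranked)))
    threshold = ranked[n_keep - 1]  # lowest score among the retained signal jets
    n_false = sum(1 for s in background_scores if s >= threshold)
    if n_false == 0:
        return math.inf             # no background passes the threshold
    return len(background_scores) / n_false

# Toy scores: 4 top jets and 8 QCD jets.
top_scores = [0.9, 0.8, 0.7, 0.6]
qcd_scores = [0.85, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05, 0.01]
rejection = background_rejection(top_scores, qcd_scores)
```

In this toy case the threshold is 0.8, one of the eight QCD jets passes it, and the rejection is 8.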

Table 2: Comparison of ParT and Quiver on the top tagging task.

These results demonstrate that the Quiver paradigm improves the performance of even a large classical state-of-the-art baseline model such as the Particle Transformer on community-standard performance metrics.

### 4.2 Molecular Property Regression

We evaluate Quiver on the benchmark described in Section [3.2.1](https://arxiv.org/html/2606.02785#S3.SS2.SSS1), with the results reported as mean $\pm$ standard deviation over ten independent seed initializations.

Figure [1](https://arxiv.org/html/2606.02785#S4.F1) shows the validation MAE curves over training for both DimeNet++ variants.

The upper panel shows that $\mathcal{Q}$DimeNet++ consistently achieves lower validation MAE than the baseline, with the gap opening early in training and persisting through convergence. The lower panel shows the per-epoch difference between DimeNet++ and $\mathcal{Q}$DimeNet++, which remains positive throughout training.

![Refer to caption](https://arxiv.org/html/2606.02785v1/x1.png)

Figure 1: Validation MAE during training for the DimeNet++ baseline and the quantum-inspired $\mathcal{Q}$DimeNet++ model. Solid lines show the mean across $N=10$ paired seeds (identical data splits and initialization RNG state across the two models; only the architecture differs), smoothed with a 3-epoch rolling window. Shaded bands denote $\pm 1$ sample standard deviation across seeds (calculated using standard library functions with `ddof=1`). The lower panel shows the per-epoch paired difference $\Delta\mathrm{MAE}=\mathrm{MAE}_{\text{DimeNet++}}-\mathrm{MAE}_{\mathcal{Q}\text{DimeNet++}}$ with the corresponding $\pm 1\sigma$ band over the same paired seeds; positive values indicate lower MAE for $\mathcal{Q}$DimeNet++.

Our results show that $\mathcal{Q}$DimeNet++ achieves a mean test MAE of $67.92\pm 1.98\,\mathrm{meV}$ against $72.42\pm 1.52\,\mathrm{meV}$ for the classical baseline, a relative reduction of $6.21\%$ achieved with a negligible parameter overhead of $0.27\%$.

The mean paired difference is $\Delta\mathrm{MAE}=4.50\pm 2.46\,\mathrm{meV}$. A paired $t$-test across the ten seeds yields $t_{9}=5.78$ ($p<10^{-3}$), indicating that the observed reduction is statistically significant rather than an artifact of seed-level noise. This supports the interpretation that the QFIM contributes genuine information rather than acting as a source of generic additional model capacity.
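The reported statistic and relative reduction can be checked from the summary numbers alone. The sketch below assumes that $2.46\,\mathrm{meV}$ is the sample standard deviation of the ten paired differences (consistent with the `ddof=1` convention stated for Figure 1); the helper names are hypothetical.

```python
import math


def paired_t_statistic(mean_diff, sd_diff, n_pairs):
    """t statistic and degrees of freedom of a paired t-test from summary statistics."""
    standard_error = sd_diff / math.sqrt(n_pairs)
    return mean_diff / standard_error, n_pairs - 1


def relative_reduction(baseline, improved):
    """Fractional reduction of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Summary values quoted in the text (meV).
t_stat, dof = paired_t_statistic(mean_diff=4.50, sd_diff=2.46, n_pairs=10)
rel = relative_reduction(baseline=72.42, improved=67.92)

print(f"t_{dof} = {t_stat:.2f}")          # t_9 = 5.78
print(f"relative reduction = {rel:.2%}")  # relative reduction = 6.21%
```

Both values agree with the figures quoted above. The $p$-value is not recomputed here because that requires the Student $t$ distribution, which is outside the Python standard library.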

## 5 Conclusion

Our work introduced Quiver, an architecture-agnostic paradigm in which the quantum Fisher information matrix of a variational quantum circuit, evaluated on classical inputs, furnishes a geometry-aware view that complements the standard classical representation of the same example. The construction is deliberately decoupled from any assumption that the underlying system is quantum: it treats the embedding solely as a mapping into a Hilbert space whose induced Fubini–Study metric (equivalent to the QFIM up to an overall factor of four) exposes higher-order correlations that are not naturally organized within standard kinematic or structural feature spaces. Fusing this quantum Fisher view with the classical view, via targeted architectural modifications, yields consistent improvements over two state-of-the-art baselines on tasks drawn from very different domains.

On the JetClass top-quark tagging benchmark, the Quiver-augmented Particle Transformer improves both the AUC and the QCD background rejection $1/\epsilon_{B}$ at $\epsilon_{S}=0.5$ across all training sample sizes and feature sets considered, at a $7\%$ parameter overhead. On the QM9 HOMO–LUMO gap regression task, $\mathcal{Q}$DimeNet++ reduces the test mean absolute error from $72.42\pm 1.52$ meV to $67.92\pm 1.98$ meV, a $6.21\%$ relative improvement obtained at a $0.27\%$ parameter overhead, with a mean paired difference $\Delta\mathrm{MAE}=4.50\pm 2.46$ meV that remains positive within $\pm 1\sigma$ across the ten paired seeds. The persistence of these gains under negligible parameter overhead, and across distinct architectures, input modalities and physical symmetries, supports the interpretation that the QFIM supplies genuinely complementary information rather than acting as a source of generic model capacity.
Taken together, these results indicate that quantum\-geometric features extracted from classically simulated variational quantum circuits can deliver measurable value to large classical models today, decoupling the practical utility of quantum\-informed representations from progress toward fault\-tolerant hardware\.

## 6 Limitations

We provide a short discussion of the main limitations of this work, with a clear pathway for tackling these in future research\.

First, the computational overhead of simulating large quantum circuits constrains us to at most $10$ qubits for the VQC of the $\mathrm{1P1Q}$ encoding, which therefore contains the kinematic information of at most ten jet constituents; this limitation propagates to the Particle Transformer benchmark. Even though most of the critical information required for jet flavor classification is often contained in these high-momentum constituents, this still results in a loss of performance compared to what would have been attained by using all $150$ jet constituent particles available in the JetClass dataset.

This limitation also affects molecular property prediction: the absolute performance of our DimeNet++ benchmark and of the Quiver-augmented $\mathcal{Q}$DimeNet++ is marginally below the numbers reported in the original DimeNet++ paper (Gasteiger et al., [2020a](https://arxiv.org/html/2606.02785#bib.bib12)). This is a necessary consequence of our setup operating on a restricted subset of up to $10$ atoms (mapped to $10$ qubits), so that the information contained in the remaining (hydrogen) atoms is lost. The results should therefore be interpreted as a clear methodological gain under identical conditions for both large-parameter models. The goal remains to scale these systems to more qubits by using, for example, HPC resources with multi-GPU nodes for large-qubit quantum system simulations.

Finally, a hybrid quantum-classical pipeline that simultaneously optimizes the parameters of both the precursor VQC and the subsequent large neural model could, in principle, converge to a global minimum with better performance than is observed in the current iteration of this work. This remains among our goals, with the main technical challenge lying in optimizing the quantum circuit based on a measurement of its QFIM rather than an observable.

## References

- A. Abbas, D. Sutter, C. Zoufal, A. Lucchi, A. Figalli, and S. Woerner (2021). The power of quantum neural networks. Nature Computational Science 1(6), pp. 403–409. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p5.1).
- A. Bal, M. Klute, B. Maier, M. Oughton, E. Pezone, and M. Spannowsky (2025). One particle - one qubit: particle physics data encoding for quantum machine learning. Phys. Rev. D 112, pp. 076004. External links: [Document](https://dx.doi.org/10.1103/l8y2-87vq), [Link](https://link.aps.org/doi/10.1103/l8y2-87vq). Cited by: [§2.3](https://arxiv.org/html/2606.02785#S2.SS3.p1.7).
- P. W. Battaglia, J. B. Hamrick, V. Bapst, A. Sanchez-Gonzalez, V. Zambaldi, M. Malinowski, A. Tacchetti, D. Raposo, A. Santoro, R. Faulkner, C. Gulcehre, F. Song, A. Ballard, J. Gilmer, G. Dahl, A. Vaswani, K. Allen, C. Nash, V. Langston, C. Dyer, N. Heess, D. Wierstra, P. Kohli, M. Botvinick, O. Vinyals, Y. Li, and R. Pascanu (2018). Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p1.1).
- V. Bergholm, J. Izaac, M. Schuld, C. Gogolin, S. Ahmed, V. Ajith, M. S. Alam, G. Alonso-Linaje, B. AkashNarayanan, A. Asadi, J. M. Arrazola, U. Azad, S. Banning, C. Blank, T. R. Bromley, B. A. Cordier, J. Ceroni, A. Delgado, O. D. Matteo, A. Dusko, T. Garg, D. Guala, A. Hayes, R. Hill, A. Ijaz, T. Isacsson, D. Ittah, S. Jahangiri, P. Jain, E. Jiang, A. Khandelwal, K. Kottmann, R. A. Lang, C. Lee, T. Loke, A. Lowe, K. McKiernan, J. J. Meyer, J. A. Montañez-Barrera, R. Moyard, Z. Niu, L. J. O’Riordan, S. Oud, A. Panigrahi, C. Park, D. Polatajko, N. Quesada, C. Roberts, N. Sá, I. Schoch, B. Shi, S. Shu, S. Sim, A. Singh, I. Strandberg, J. Soni, A. Száva, S. Thabet, R. A. Vargas-Hernández, T. Vincent, N. Vitucci, M. Weber, D. Wierichs, R. Wiersema, M. Willmann, V. Wong, S. Zhang, and N. Killoran (2022). PennyLane: automatic differentiation of hybrid quantum-classical computations. External links: 1811.04968, [Link](https://arxiv.org/abs/1811.04968). Cited by: [§2](https://arxiv.org/html/2606.02785#S2.p1.1).
- S. L. Braunstein and C. M. Caves (1994). Statistical distance and the geometry of quantum states. Physical Review Letters 72(22), pp. 3439–3443. Cited by: [§2.2](https://arxiv.org/html/2606.02785#S2.SS2.p1.1).
- M. Cerezo, A. Arrasmith, R. Babbush, S. C. Benjamin, S. Endo, K. Fujii, J. R. McClean, K. Mitarai, X. Yuan, L. Cincio, and P. J. Coles (2021). Variational quantum algorithms. Nature Reviews Physics 3(9), pp. 625–644. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p5.1), [§2.1](https://arxiv.org/html/2606.02785#S2.SS1.p1.3).
- P. G. Contributors (2024). DimeNetPlusPlus. Note: [https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.DimeNetPlusPlus.html](https://pytorch-geometric.readthedocs.io/en/latest/generated/torch_geometric.nn.models.DimeNetPlusPlus.html), PyTorch Geometric Documentation. Cited by: [footnote 2](https://arxiv.org/html/2606.02785#footnote2).
- M. Fey and J. E. Lenssen (2019). Fast graph representation learning with pytorch geometric. In ICLR Workshop on Representation Learning on Graphs and Manifolds. External links: [Link](https://arxiv.org/abs/1903.02428). Cited by: [footnote 2](https://arxiv.org/html/2606.02785#footnote2).
- J. Gasteiger, S. Giri, J. T. Margraf, and S. Günnemann (2020a). Fast and uncertainty-aware directional message passing for non-equilibrium molecules. CoRR abs/2011.14115. External links: [Link](https://arxiv.org/abs/2011.14115), 2011.14115. Cited by: [§3.2.1](https://arxiv.org/html/2606.02785#S3.SS2.SSS1.p2.4), [§6](https://arxiv.org/html/2606.02785#S6.p3.3).
- J. Gasteiger, J. Groß, and S. Günnemann (2020b). Directional message passing for molecular graphs. In International Conference on Learning Representations. External links: [Link](https://openreview.net/forum?id=B1eWbxStPH). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p4.1), [§3.2.1](https://arxiv.org/html/2606.02785#S3.SS2.SSS1.p2.4).
- J. Gilmer, S. S. Schoenholz, P. F. Riley, O. Vinyals, and G. E. Dahl (2017). Neural message passing for quantum chemistry. In Proceedings of the 34th International Conference on Machine Learning (ICML), pp. 1263–1272. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p1.1), [§1](https://arxiv.org/html/2606.02785#S1.p4.1).
- P. J. Huber (1964). Robust estimation of a location parameter. The Annals of Mathematical Statistics 35(1), pp. 73–101. External links: [Document](https://dx.doi.org/10.1214/aoms/1177703732). Cited by: [§2.4](https://arxiv.org/html/2606.02785#S2.SS4.p3.13).
- G. Kasieczka et al. (2021). The LHC Olympics 2020 a community challenge for anomaly detection in high energy physics. Rept. Prog. Phys. 84(12), pp. 124201. External links: 2101.08320, [Document](https://dx.doi.org/10.1088/1361-6633/ac36b9). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p2.1).
- P. T. Komiske, E. M. Metodiev, and J. Thaler (2019). Energy flow networks: deep sets for particle jets. Journal of High Energy Physics 2019(1). External links: ISSN 1029-8479, [Link](http://dx.doi.org/10.1007/JHEP01(2019)121), [Document](https://dx.doi.org/10.1007/jhep01%282019%29121). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p2.1).
- J. Liu, H. Yuan, X. Lu, and X. Wang (2020). Quantum Fisher information matrix and multiparameter estimation. Journal of Physics A: Mathematical and Theoretical 53(2), pp. 023001. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p5.1).
- J. J. Meyer (2021). Fisher Information in Noisy Intermediate-Scale Quantum Applications. Quantum 5, pp. 539. External links: 2103.15191, [Document](https://dx.doi.org/10.22331/q-2021-09-09-539). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p5.1).
- B. Nachman and D. Shih (2020). Anomaly detection with density estimation. Physical Review D 101(7). External links: ISSN 2470-0029, [Link](http://dx.doi.org/10.1103/PhysRevD.101.075042), [Document](https://dx.doi.org/10.1103/physrevd.101.075042). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p2.1).
- J. P. Provost and G. Vallee (1980). Riemannian structure on manifolds of quantum states. Communications in Mathematical Physics 76(3), pp. 289–301. Cited by: [§2.2](https://arxiv.org/html/2606.02785#S2.SS2.p1.1).
- PyTorch Contributors (2024). HuberLoss – PyTorch Documentation. Note: [https://pytorch.org/docs/stable/generated/torch.nn.HuberLoss.html](https://pytorch.org/docs/stable/generated/torch.nn.HuberLoss.html), Accessed: 2026. Cited by: [§2.4](https://arxiv.org/html/2606.02785#S2.SS4.p3.13).
- H. Qu and L. Gouskos (2020). Jet tagging via particle clouds. Physical Review D 101(5). External links: ISSN 2470-0029, [Link](http://dx.doi.org/10.1103/PhysRevD.101.056019), [Document](https://dx.doi.org/10.1103/physrevd.101.056019). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p2.1).
- H. Qu, C. Li, and S. Qian (2022a). Cited by: [§3.1](https://arxiv.org/html/2606.02785#S3.SS1.p1.2).
- H. Qu, C. Li, and S. Qian (2022b). Particle transformer for jet tagging. In Proceedings of the 39th International Conference on Machine Learning (ICML), pp. 18281–18292. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p2.1), [§1](https://arxiv.org/html/2606.02785#S1.p6.1), [§3.1](https://arxiv.org/html/2606.02785#S3.SS1.p1.2).
- H. Qu and C. Li (2022). weaver-core: a streamlined deep-learning framework for high energy physics. Note: Accessed: May 2026. External links: [Link](https://github.com/hqucms/weaver-core). Cited by: [footnote 1](https://arxiv.org/html/2606.02785#footnote1).
- R. Ramakrishnan, P. O. Dral, M. Rupp, and O. A. von Lilienfeld (2014). Quantum chemistry structures and properties of 134 kilo molecules. Scientific Data 1(1), pp. 140022. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p4.1).
- F. Rosenblatt (1958). The perceptron: a probabilistic model for information storage and organization in the brain. Psychological Review 65(6), pp. 386–408. External links: [Document](https://dx.doi.org/10.1037/h0042519). Cited by: [§3.2.1](https://arxiv.org/html/2606.02785#S3.SS2.SSS1.p4.5).
- F. Scarselli, M. Gori, A. C. Tsoi, M. Hagenbuchner, and G. Monfardini (2009). The graph neural network model. IEEE Transactions on Neural Networks 20(1), pp. 61–80. External links: [Document](https://dx.doi.org/10.1109/TNN.2008.2005605). Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p1.1).
- K. T. Schütt, P. Kindermans, H. E. Sauceda, S. Chmiela, A. Tkatchenko, and K. Müller (2017). SchNet: a continuous-filter convolutional neural network for modeling quantum interactions. In Advances in Neural Information Processing Systems, Vol. 30. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p4.1).
- A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin (2017). Attention is all you need. In Advances in Neural Information Processing Systems, Vol. 30. Cited by: [§1](https://arxiv.org/html/2606.02785#S1.p1.1).

## Appendix A: Jet Flavor Classification

### A.1 Training setup and times

As mentioned in Table [2](https://arxiv.org/html/2606.02785#S4.T2), we train the Particle Transformer per epoch on training sizes of $0.1\mathrm{M}$, $0.5\mathrm{M}$ and $5\mathrm{M}$ jets, equally divided between the two classes. We use a validation set of $1\mathrm{M}$ jets (equally balanced) and a held-out test set of $3.9\mathrm{M}$ jets. Training employs the Ranger optimizer (combining RAdam with LookAhead) with learning rate $\eta=1\times 10^{-3}$, batch size $512$, and a flat+decay learning rate schedule that maintains the initial learning rate for the first 70% of training before exponentially decaying to $0.01\eta$ over the final 30%. The model is trained with cross-entropy loss and early stopping is disabled, allowing models to train for the full $60$ epochs. For the largest training size of $5\mathrm{M}$ examples, the baseline Particle Transformer requires approximately $0.46\,\mathrm{h}$ per epoch, with its Quiver-augmented variant requiring $1.26\,\mathrm{h}$ for the same.
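The flat+decay schedule described above can be sketched as a function of the number of completed epochs. This is a minimal illustration under stated assumptions: the function name `flat_decay_lr` is hypothetical, the schedule is evaluated per epoch rather than per optimizer step, and the decay is taken to be geometric between $\eta$ and $0.01\eta$. The training framework used in the paper may implement the details differently.

```python
def flat_decay_lr(epoch, total_epochs=60, base_lr=1e-3,
                  flat_fraction=0.7, final_factor=0.01):
    """Learning rate after `epoch` completed epochs (0 <= epoch <= total_epochs).

    The rate stays at `base_lr` for the first `flat_fraction` of training and
    then decays exponentially, reaching `final_factor * base_lr` at the end.
    """
    flat_epochs = round(flat_fraction * total_epochs)
    if epoch <= flat_epochs:
        return base_lr
    # Fraction of the decay phase that has elapsed, in (0, 1].
    progress = (epoch - flat_epochs) / (total_epochs - flat_epochs)
    return base_lr * final_factor ** progress


# Print the schedule at a few checkpoints of a 60-epoch run.
for epoch in (0, 42, 51, 60):
    print(epoch, flat_decay_lr(epoch))
```

With the quoted settings the flat phase covers the first 42 epochs, and the rate passes through roughly $10^{-4}$ halfway through the decay phase before ending near $10^{-5}$.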

## Appendix B: Molecular Property Regression

### B.1 Training setup and times

The QM9 dataset is partitioned into training, validation, and test subsets containing $65{,}390$, $13{,}078$, and $52{,}313$ molecules respectively, summing to the full corpus of $130{,}781$ samples. The VQC is trained on a subset of $5{,}000$ molecules drawn from the training partition, with $1{,}000$ molecules taken from the validation partition for monitoring convergence. The classical DimeNet++ benchmark and the proposed $\mathcal{Q}$DimeNet++ architecture are both trained on the remaining $60{,}390$ molecules and validated on the remaining $12{,}078$ molecules, with final performance reported on the held-out test set of $52{,}313$ molecules. This protocol guarantees that the classical and quantum-enhanced models are evaluated on identical test data, while ensuring that no sample seen by the VQC during its pre-training stage is reused for validation or testing of the downstream graph network.
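The bookkeeping of this split can be sketched as follows. Only the subset sizes come from the text; the function name `partition_qm9_indices`, the use of a seeded random permutation, and the choice to carve the VQC subsets from the front of each partition are assumptions made for illustration.

```python
import random


def partition_qm9_indices(seed=0):
    """Split sample indices into VQC and graph-network subsets with the quoted sizes."""
    n_train, n_val, n_test = 65_390, 13_078, 52_313
    n_vqc_train, n_vqc_val = 5_000, 1_000

    indices = list(range(n_train + n_val + n_test))
    random.Random(seed).shuffle(indices)  # assumed: a seeded random permutation

    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]

    return {
        "vqc_train": train[:n_vqc_train],  # VQC pre-training samples
        "gnn_train": train[n_vqc_train:],  # remaining training samples
        "vqc_val": val[:n_vqc_val],        # VQC convergence monitoring
        "gnn_val": val[n_vqc_val:],        # remaining validation samples
        "test": test,                      # shared held-out test set
    }


splits = partition_qm9_indices()
print({name: len(ids) for name, ids in splits.items()})
```

Because every subset is a disjoint slice of one permutation, no index used by the VQC can appear in the graph network's validation or test sets.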

We train all DimeNet++ models with the Adam optimizer using an initial learning rate of $10^{-3}$, batch size $128$, and zero weight decay. No learning-rate schedule or decay is used. Models are trained for at most $300$ epochs with early stopping on validation MAE: training terminates when validation MAE fails to improve by at least $0.25\,\mathrm{meV}$ for $30$ consecutive epochs. Since the HOMO–LUMO gap target is standardized during training, this corresponds to a normalized threshold of $1.95\times 10^{-4}$.
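A minimal sketch of the early-stopping rule described above is given below. The class name `EarlyStopper` is hypothetical, and measuring improvement against the best validation MAE seen so far is an assumption about how the criterion is applied. The implied target standard deviation is back-computed from the two rounded thresholds in the text and is not a value stated by the authors.

```python
class EarlyStopper:
    """Stop when validation MAE has not improved by `min_delta` for `patience` epochs."""

    def __init__(self, min_delta, patience):
        self.min_delta = min_delta
        self.patience = patience
        self.best = float("inf")
        self.stale_epochs = 0

    def step(self, val_mae):
        """Record one epoch's validation MAE; return True if training should stop."""
        if self.best - val_mae >= self.min_delta:
            # Sufficient improvement: update the reference and reset the counter.
            self.best = val_mae
            self.stale_epochs = 0
        else:
            self.stale_epochs += 1
        return self.stale_epochs >= self.patience


# Thresholds quoted in the text: 0.25 meV in physical units, 1.95e-4 when standardized.
min_delta_mev = 0.25
min_delta_normalized = 1.95e-4
# Back-computed (approximate) standard deviation of the training targets, in meV.
implied_target_std_mev = min_delta_mev / min_delta_normalized

stopper = EarlyStopper(min_delta=min_delta_mev, patience=30)
print(f"implied target std ~ {implied_target_std_mev:.0f} meV")
```

In a training loop, `stopper.step(val_mae)` would be called once per epoch with the validation MAE in meV, and the loop would exit when it returns `True` or after 300 epochs.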

The training objective is the L1 loss, equivalent to MAE, applied to the standardized HOMO–LUMO gap target\. Validation and test MAEs are reported after converting back to physical units using the training\-set target standard deviation\.

Across the 10 seeds, DimeNet++ trained for $196.9\pm 30.9$ epochs on average, while $\mathcal{Q}$DimeNet++ trained for $216.5\pm 54.6$ epochs. Because runs were executed under parallel GPU scheduling, wall-clock timings should be interpreted as approximate runtime bounds rather than isolated architecture benchmarks. In this setup, epochs completed within approximately $80\,\mathrm{s}$ for DimeNet++ and $100\,\mathrm{s}$ for $\mathcal{Q}$DimeNet++.

For both cases, training was carried out on a single Nvidia L40S GPU with $48\,\mathrm{GB}$ of VRAM, on a locally available university compute cluster with 128 CPU cores.

### B.2 Technical Details

The edge-state rescaling in Equation [8](https://arxiv.org/html/2606.02785#S3.E8), proposed under the Quiver paradigm, operates on the $6\times 6$ QFIM sub-matrix that encodes the pairwise interaction between qubits $i$ and $j$. This sub-matrix is first processed by a two-dimensional convolutional layer with $16$ output channels and a kernel of size $3$, followed by a ReLU non-linearity. A global average pooling operation then collapses the spatial dimensions, and the resulting representation is flattened from $16$ channels into an $8$-dimensional vector, which is subsequently standardized by a LayerNorm operation to stabilize the learned embedding distribution. The normalized embedding is then passed through a scaling multilayer perceptron $s_{ij}$ consisting of a linear projection from $d_{\mathrm{QFIM}}$ to $\max(4,d_{\mathrm{QFIM}})$ hidden units, a SiLU activation, a second linear projection to a single scalar, and a final $\tanh$ non-linearity that bounds the output to $(-1,1)$. The resulting scalar acts as a learned, edge-specific multiplicative gate on the message exchanged between qubits $i$ and $j$, allowing the model to attenuate or amplify each QFIM-derived interaction in a fully data-driven manner.
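The final stage of this pipeline, LayerNorm followed by the scaling MLP $s_{ij}$, can be sketched in plain Python. This is an illustration only: the convolution and pooling stages are omitted, the input embedding is a placeholder, $d_{\mathrm{QFIM}}=8$ is assumed from the $8$-dimensional embedding mentioned above, the LayerNorm has no learnable affine parameters, and the weights are random and untrained. All function names are hypothetical.

```python
import math
import random


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable affine)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def silu(v):
    """SiLU activation: v * sigmoid(v)."""
    return v / (1.0 + math.exp(-v))


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def make_gate_params(d_qfim, rng):
    """Random placeholder weights for the two linear layers of the scaling MLP."""
    hidden = max(4, d_qfim)

    def matrix(rows, cols):
        return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

    return {"w1": matrix(hidden, d_qfim), "b1": [0.0] * hidden,
            "w2": matrix(1, hidden), "b2": [0.0]}


def edge_gate(embedding, params):
    """LayerNorm -> Linear -> SiLU -> Linear -> tanh, returning one scalar gate."""
    z = layer_norm(embedding)
    h = [silu(v) for v in linear(z, params["w1"], params["b1"])]
    (pre_activation,) = linear(h, params["w2"], params["b2"])
    return math.tanh(pre_activation)


rng = random.Random(0)
d_qfim = 8  # assumed embedding width
params = make_gate_params(d_qfim, rng)
embedding = [rng.gauss(0.0, 1.0) for _ in range(d_qfim)]  # placeholder for the pooled QFIM features
gate = edge_gate(embedding, params)
print(gate)
```

How the scalar enters the message update is specified by Equation 8 of the paper and is not reproduced here; the sketch only shows that the $\tanh$ output stays within the stated bounds.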