Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling

arXiv cs.LG 05/15/26, 04:00 AM Papers
Summary
This paper introduces an unsupervised framework for modeling acquisition-related variability in structural connectomes using hybrid latent space modeling, eliminating the need for manual capacity tuning by architecturally annealing encoder outputs.
arXiv:2605.13933v1 Announce Type: new Abstract: Acquisition differences across sites, scanners, and protocols in dMRI introduce variability that complicates structural connectome analysis. This motivates deep learning models that can represent high-dimensional connectomes in a low-dimensional space while explicitly separating acquisition-related effects from biological variation. Conventional dimensionality reduction methods model all variance as continuous, so acquisition effects often get absorbed into a continuous latent space. Recent hybrid latent-space models combine discrete and continuous components to address this, but typically require manual capacity tuning to ensure the discrete component captures the intended variability. We introduce an unsupervised framework that removes this manual tuning by architecturally annealing encoder outputs before decoding, allowing the model to adaptively balance discrete and continuous latent variables during training. To evaluate it, we curated a dataset of N=7,416 structural connectomes derived from dMRI, spanning ages 2 to 102 and 13 studies with 25 unique acquisition-parameter combinations. Of these, 5,900 are cognitively unimpaired, 877 have mild cognitive impairment (MCI), and 639 have Alzheimer's disease (AD). We compare against a standard VAE, PCA with k-means clustering, and hybrid models that anneal only through the loss function. Our architectural annealing produces stronger site learning (ARI=0.53, p<0.05) than these baselines. Results show that a hybrid continuous-discrete latent space, with architectural rather than loss-based annealing, provides a useful unsupervised mechanism for capturing acquisition variability in dMRI: by jointly modeling smooth and categorical structure, the Joint-VAE recovers clusters aligned with scanner and protocol differences.
Cached at: 05/15/26, 06:25 AM
# Unsupervised learning of acquisition variability in structural connectomes via hybrid latent space modeling
Source: [https://arxiv.org/html/2605.13933](https://arxiv.org/html/2605.13933)
Under Review, 2026. Full Paper, MIDL 2026 submission.

Authors: Gaurav Rudravaram (1; gaurav.rudravaram@vanderbilt.edu), Lianrui Zuo (1; lianrui.zuo@vanderbilt.edu), Michael E. Kim (2; michael.kim@vanderbilt.edu), Karthik Ramadass (1,2; karthik.ramadass@vanderbilt.edu), Elyssa McMaster (1; elyssa.mcmaster@vanderbilt.edu), Jongyeon Yoon (2; jongyeon.yoon@vanderbilt.ed), Aravind R. Krishnan (1; aravind.r.krishnan@vanderbilt.edu), Adam M. Saunders (1; adam.m.saunders@vanderbilt.edu), Chenyu Gao (1; chenyu.gao@vanderbilt.edu), Nancy R. Newlin (3; newlinn@mskcc.org), Praitayini Kanakaraj (2; praitayini.kanakaraj@vanderbilt.ed), Lori L. Beason Held (4; heldlo@grc.nia.nih.gov), Murat Bilgel (4; murat.bilgel@nih.gov), Laura A. Barquero (5; laura.barquero@Vanderbilt.Edu), Micah D’Archangel (5; micah.a.darchangel@vanderbilt.edu), Tin Q. Nguyen (5; tin.nguyen@vanderbilt.edu), Laurie B. Cutting (5; laurie.cutting@Vanderbilt.Edu), Derek Archer (6; derek.archer@vumc.org), Timothy J. Hohman (6; timothy.j.hohman@vumc.org), Daniel C. Moyer (2; daniel.moyer@Vanderbilt.Edu), Bennett A. Landman (1,2; bennett.landman@vanderbilt.edu)

Affiliations: (1) Department of Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA; (2) Department of Computer Science, Vanderbilt University, Nashville, TN, USA; (3) Memorial Sloan Kettering Cancer Center, New York, NY, USA; (4) Laboratory of Behavioral Neuroscience, National Institute on Aging, National Institutes of Health, Baltimore, MD, USA; (5) Peabody College of Education and Human Development, Nashville, Tennessee, USA; (6) Vanderbilt University Medical Center, Vanderbilt Memory & Alzheimer’s Center, Nashville, TN, USA

###### Abstract

Acquisition differences across sites, scanners, and protocols in dMRI introduce variability in structural connectome analysis. This motivates the need for deep learning models that can represent downstream, high-dimensional structural connectomes in a low-dimensional space while explicitly separating acquisition-related effects from underlying biological variation. Conventional statistical and deep learning approaches for dimensionality reduction typically model all sources of variance as continuous, making it difficult to separate discrete effects, e.g., acquisition- or site-related, from continuous biological variation. As a result, acquisition-related effects often become absorbed into a continuous latent space. Recent advances in deep learning have explored hybrid latent space modeling, where discrete and continuous components jointly represent structured variability. However, existing hybrid approaches generally rely on manual capacity tuning to ensure that the discrete component captures desired variability (e.g., acquisition). Here, we introduce a principled unsupervised framework that removes the need for such manual capacity tuning by *architecturally annealing* the encoder outputs before decoding, allowing the model to adaptively balance the contributions of discrete and continuous latent variables during training. To investigate this joint latent space modeling, we curated a large dataset ($N=7,416$; 60% female) of structural connectomes derived from dMRI scans of participants. Our dataset spans an age range of 2 to 102 years and encompasses 13 different studies with 25 unique acquisition parameter combinations. Among these, 5,900 are cognitively unimpaired/neurotypical, 877 are diagnosed with mild cognitive impairment (MCI), and 639 are diagnosed with Alzheimer’s disease (AD). We compare our approach with a standard VAE, PCA followed by k-means clustering, and hybrid models that impose annealing only through the loss function, showing that the architectural annealing results in stronger site learning (ARI = 0.53, $p<0.05$) as compared to the other methods. These results demonstrate that the proposed hybrid continuous–discrete latent space provides a useful unsupervised mechanism for capturing acquisition-related variability in diffusion MRI; by jointly modeling smooth and categorical structure, the Joint-VAE recovers meaningful clusters aligned with scanner and protocol differences.

###### keywords:

Unsupervised representation learning, structural connectomes, harmonization

## 1 Introduction

White-matter connectivity and microstructural integrity are central to understanding neurodegenerative diseases [kamagata2021diffusion], cognitive decline [vogt2020cortical], and aging [kantarci2014white]. Diffusion MRI (dMRI) is well suited for these investigations because water molecule diffusion is constrained by axonal membranes and myelin, providing indirect signatures of microstructural organization [jones2010diffusion]. By modeling diffusion-weighted signals, dMRI enables the estimation of quantitative metrics such as fractional anisotropy (FA), mean diffusivity (MD), and more advanced compartment-based measures that reflect axonal integrity and tissue composition [mori2013introduction]. These signals can also be used to perform tractography and construct structural connectomes that characterize large-scale network organization for downstream statistical or machine-learning analyses [shamir2025tutorial].

Despite its importance, dMRI is heavily impacted by heterogeneity in acquisition, which complicates the ability to draw generalizable conclusions across studies or cohorts. Diffusion-derived metrics have been shown to vary significantly with echo time (TE), repetition time (TR), magnetic field strength, and diffusion weighting [hui2010b, yao2023both]. Higher b-values reduce signal-to-noise ratio, and both [hui2010b] and [yao2023both] demonstrated that FA, MD, and related quantities differ substantially when computed from data acquired at different b-values, reflecting a combination of noise-floor effects and underlying tissue-compartment differences. Multi-shell acquisitions introduce additional complexity: while they enable more expressive modeling of crossing fibers and multi-compartment microstructure, supporting models such as NODDI [zhang2012noddi; jelescu2017design], they also produce connectomes that differ meaningfully from those derived from single-shell data, with [yao2023both] showing improved sensitivity for predicting Parkinson’s disease when using multi-shell–derived connectomes. Although such acquisition differences are often intentionally leveraged to optimize sensitivity to particular microstructural properties or brain regions [caiazzo2016q], they become problematic when integrating data across scanners or protocols: even subtle variations alter the diffusion signal and propagate to microstructural estimates, tractography, and ultimately the connectomes themselves. Consequently, connectomes generated from the same individual under different acquisition protocols can exhibit substantial discrepancies [prvckovska2016reproducibility; villalon2016reliability], making it difficult to distinguish biological effects from acquisition-driven variability in multi-site analyses.

This variability has motivated the development of harmonization techniques aimed at reducing acquisition-related differences before group analyses or machine-learning tasks. Statistical approaches such as ComBat [fortin2017harmonization] have been applied to remove protocol effects from scalar brain connectivity metrics, and recent deep-learning methods have sought to learn latent representations of structural connectomes that are invariant to acquisition domains. Autoencoders [zheng2025connectomeae], graph neural networks [noman2024graph], and conditional variational autoencoders (CVAEs) [newlin2025harmonizing; zuo2021unsupervised; zuo2023haca3] have all been used to enforce site invariance by conditioning on site labels or acquisition-related variables. Adversarial graph neural network frameworks have further incorporated site classifiers and site-conditioned decoders to suppress domain information [patel2025structural]. However, these approaches all depend on predefined site or scanner labels, a significant limitation given that acquisition variability may exist within a single dataset or metadata may be incomplete or inaccurate [zuo2022disentangling]. This motivates the need for unsupervised models capable of disentangling acquisition-related variation directly from the data itself.

Hybrid latent-space models that combine continuous and discrete representations offer a promising path toward such unsupervised disentanglement. Following the insight from [dupont2018learning] that discrete latent variables can capture structured, categorical variation, we treat site effects (arising from combinations of acquisition parameters and protocol differences) as a discrete component superimposed on continuous biological variation. This joint representation allows the model to infer site structure without any site labels [rudravaram2025characterizing]. Yet the original hybrid formulation relies on a heuristic hinge-margin mechanism to balance capacity between continuous and discrete latent spaces, providing only indirect control over information allocation. Our prior work [rudravaram2025characterizing] addressed this by introducing a loss-based capacity regulation strategy that improved stability and interpretability. In this paper, we extend the framework to a substantially larger and more heterogeneous multi-site dataset and introduce a staged, model-based annealing procedure that adjusts capacity within the architecture itself. This new strategy increases discrete-space utilization and enables recovery of more meaningful acquisition-related clusters in a fully unsupervised manner.

## 2 Methods

### 2.1 Data and preprocessing

To evaluate hybrid continuous-discrete latent space models for structural connectomes, we assembled a large multi-cohort dMRI dataset spanning 13 major neuroimaging studies. For datasets with longitudinal imaging, we selected a single scan per participant to avoid bias from repeated measures. The final dataset comprises 955 connectomes from BLSA [ferrucci2008baltimore], 57 from ADNI, 87 from the Calgary Preschool dataset [reynolds2020calgary], 373 from a pediatric VUMC dataset [ds004146:1.0.0], 111 from the Multisensory Lexical Processing cohort [ds001894:1.4.2], 80 from MASiVar [ds003416:2.0.2], 236 from a longitudinal language-development study [ds003604:1.0.7], 398 from QTAB [ds004146:1.0.3], 2,486 from HABSHD [petersen2025health], 339 from WRAP [johnson2018wisconsin], 1,006 from NACC, and 610 from a combined ROS/MAP/MARS cohort [bennett2005rush; a2012overview; l2012minority]. Across datasets, we defined a site as a unique scanner–protocol pairing, yielding 25 distinct sites. The aggregated sample includes 4,490 females and 2,926 males (ages 2 to 102), consisting of neurotypical participants ($n=5,900$) as well as individuals diagnosed with mild cognitive impairment (MCI; $n=877$) and Alzheimer’s disease (AD; $n=639$).

All imaging data were preprocessed following [kim2025scalable]. Diffusion MRI scans were corrected and standardized using the PreQual pipeline [cai2021prequal]. Structural MRI volumes were segmented into 121 BrainColor regions using the SLANT brain segmentation [huo20193d]. Fiber orientation distributions were estimated and whole-brain probabilistic tractography was performed in MRtrix3 [tournier2012mrtrix], seeding streamlines at the white matter-gray matter interface and generating 10 million streamlines per subject. Structural connectomes were constructed by counting streamline terminations between all SLANT-defined ROI pairs. Visual quality assurance was performed at each processing stage, following best practices in [kim2025scalable].
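The counting step at the end of this pipeline can be illustrated with a small sketch. It assumes each streamline has already been reduced to the pair of ROI indices at its two endpoints; the function name `build_connectome` and the toy data are hypothetical and are not part of the MRtrix3 or SLANT tooling used in the paper.

```python
def build_connectome(endpoint_pairs, n_rois):
    """Count streamline terminations between ROI pairs (symmetric matrix).

    endpoint_pairs: iterable of (roi_a, roi_b) index pairs, one per streamline.
    This is an illustrative helper, not the authors' implementation.
    """
    conn = [[0] * n_rois for _ in range(n_rois)]
    for a, b in endpoint_pairs:
        conn[a][b] += 1
        if a != b:  # keep the matrix symmetric without double-counting self-connections
            conn[b][a] += 1
    return conn


# Toy example with 3 ROIs and 4 streamlines (hypothetical data).
toy_pairs = [(0, 1), (1, 0), (0, 2), (2, 2)]
toy_conn = build_connectome(toy_pairs, n_rois=3)
```

Treating `(0, 1)` and `(1, 0)` as the same undirected edge is a design choice of this sketch, reflecting that tractography streamlines have no inherent direction.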

### 2.2 Joint-VAE

Traditional variational autoencoders (VAEs) consist of an encoder $q_{\phi}(\cdot)$, parameterized by $\phi$, that maps the input data $x$ into a low-dimensional latent representation $z$, and a decoder $p_{\theta}(\cdot)$, parameterized by $\theta$, that reconstructs the input as $\hat{x}$. To prevent the latent space from becoming arbitrarily dispersed or fragmented, the approximate posterior $q_{\phi}(z|x)$ is encouraged to match a prior distribution, typically a standard normal distribution $p(z)=\mathcal{N}(0,I)$. Under this formulation, the traditional VAE optimization objective becomes:

$$\mathcal{L}(\theta,\phi)=\mathbb{E}_{q_{\phi}(z|x)}\!\left[\log p_{\theta}(x|z)\right]-D_{\mathrm{KL}}\left(q_{\phi}(z|x)\,\|\,p(z)\right). \tag{1}$$

The Joint-VAE [dupont2018learning] extends the traditional VAE framework by augmenting the latent space with an additional discrete component. Let $z_{c}$ denote the continuous latent variable and $z_{d}$ denote the discrete latent variable. The encoder now estimates a joint posterior distribution, $q_{\phi}(z_{c},z_{d}|x)$, while the decoder reconstructs the input from both latent components via $p_{\theta}(x|z_{c},z_{d})$. With this formulation, the Joint-VAE objective becomes:

$$\mathcal{L}(\theta,\phi)=\mathbb{E}_{q_{\phi}(z_{c},z_{d}|x)}\!\left[\log p_{\theta}(x|z_{c},z_{d})\right]-D_{\mathrm{KL}}\left(q_{\phi}(z_{c},z_{d}|x)\,\|\,p(z_{c},z_{d})\right). \tag{2}$$

Because $q_{\phi}(z_{c}\mid x)$ and $q_{\phi}(z_{d}\mid x)$ are produced as separate distributions from the encoder, the posterior can be factorized as $q_{\phi}(z_{c},z_{d}\mid x)=q_{\phi}(z_{c}\mid x)\,q_{\phi}(z_{d}\mid x)$, and with the standard factorized prior $p(z_{c},z_{d})=p(z_{c})\,p(z_{d})$ the KL divergence decomposes into two terms:

$$D_{\mathrm{KL}}\!\left(q_{\phi}(z_{c},z_{d}|x)\,\|\,p(z_{c},z_{d})\right)=D_{\mathrm{KL}}\!\left(q_{\phi}(z_{c}|x)\,\|\,p(z_{c})\right)+D_{\mathrm{KL}}\!\left(q_{\phi}(z_{d}|x)\,\|\,p(z_{d})\right). \tag{3}$$

Thus, the final objective for the Joint-VAE training becomes:

$$\mathcal{L}(\theta,\phi)=\mathbb{E}_{q_{\phi}(z_{c},z_{d}|x)}\!\left[\log p_{\theta}(x|z_{c},z_{d})\right]-\beta_{c}\,D_{\mathrm{KL}}\!\left(q_{\phi}(z_{c}|x)\,\|\,p(z_{c})\right)-\beta_{d}\,D_{\mathrm{KL}}\!\left(q_{\phi}(z_{d}|x)\,\|\,p(z_{d})\right), \tag{4}$$

where $\beta_{c}$ and $\beta_{d}$ allow separate weighting of the continuous and discrete KL terms. However, because the continuous latent space has, in principle, unbounded capacity compared to the discrete space, optimizing this loss directly often leads the model to place almost all information in the continuous latent $z_{c}$ and ignore the discrete latent $z_{d}$.

To address this imbalance, the original Joint-VAE [dupont2018learning] proposes gradually increasing the “capacity” of each latent channel throughout training so that the continuous and discrete parts can contribute at different rates. This is implemented by introducing hinge parameters $C_{c}$ and $C_{d}$, which specify target capacities (upper bounds) for the KL contributions of the continuous and discrete latent spaces, respectively, and are treated as user-defined hyperparameters. The resulting objective is

$$\begin{aligned}\mathcal{L}(\theta,\phi)&=\mathbb{E}_{q_{\phi}(z_{c},z_{d}|x)}\!\left[\log p_{\theta}(x|z_{c},z_{d})\right]\\ &\quad-\beta\Bigl|D_{\mathrm{KL}}\!\bigl(q_{\phi}(z_{c}\mid x)\,\|\,p(z_{c})\bigr)-C_{c}\Bigr|-\beta\Bigl|D_{\mathrm{KL}}\!\bigl(q_{\phi}(z_{d}|x)\,\|\,p(z_{d})\bigr)-C_{d}\Bigr|.\end{aligned} \tag{5}$$

In this hinge-based formulation, the loss now depends on three hyperparameters: the continuous and discrete capacities $C_{c}$ and $C_{d}$, and the weighting factor $\beta$. Unlike in the traditional $\beta$-VAE setting, where increasing $\beta$ can encourage disentanglement in a principled evidence lower bound (ELBO) framework, here $\beta$ primarily controls how strictly the model is forced to match the manually specified capacities.
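To make the capacity terms of Eq. 5 concrete, the following is a minimal sketch of the two KL divergences and the capacity penalty for a single sample. It assumes a diagonal Gaussian posterior with a standard normal prior and a categorical posterior with a uniform prior; the function names are illustrative and are not taken from the authors' code.

```python
import math


def kl_gaussian(mu, logvar):
    """KL( N(mu, diag(exp(logvar))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * sum(math.exp(lv) + m * m - 1.0 - lv for m, lv in zip(mu, logvar))


def kl_categorical_uniform(probs):
    """KL( Cat(probs) || Uniform(K) ) = sum_k p_k * log(p_k * K)."""
    k = len(probs)
    return sum(p * (math.log(p) + math.log(k)) for p in probs if p > 0)


def capacity_penalty(kl_c, kl_d, c_c, c_d, beta):
    """Penalty part of Eq. 5: beta*|KL_c - C_c| + beta*|KL_d - C_d|."""
    return beta * abs(kl_c - c_c) + beta * abs(kl_d - c_d)


# Illustrative values only: a posterior equal to the prior has zero KL.
kl_c_example = kl_gaussian([0.0, 0.0], [0.0, 0.0])
kl_d_example = kl_categorical_uniform([0.25, 0.25, 0.25, 0.25])
```

Because the penalty is an absolute deviation, the optimizer is pushed toward the chosen capacities from both sides, which is why the result depends so directly on the values picked for $C_{c}$ and $C_{d}$.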

### 2.3 Improved Joint-VAE with Loss and Architecture Annealing

Loss Annealing. To address this, in our prior work [rudravaram2025characterizing], we proposed a principled staged capacity-control mechanism designed specifically for hybrid continuous-discrete latent spaces. Rather than adopting the KL-capacity formulation of [burgess2018understanding], we directly regulated the expressive power of the continuous latent by annealing the encoder’s posterior parameters. Let $(\mu,\log\sigma^{2})$ denote the encoder outputs for the continuous latent $z_{c}$. We defined annealed parameters $\mu^{\prime}=\lambda\,\mu$ and $\log\sigma^{\prime 2}=\lambda\,\log\sigma^{2}$, where the annealing coefficient $\lambda$ increases linearly from $0$ to $1$ over a fixed number of iterations. Substituting $(\mu^{\prime},\log\sigma^{\prime 2})$ into the KL divergence yields an annealed continuous KL term,

$$\widetilde{D}_{\mathrm{KL}}\!\left(q_{\phi}(z_{c}\mid x)\,\|\,p(z_{c})\right)=\tfrac{1}{2}\bigl(\sigma^{2\lambda}+\lambda^{2}\mu^{2}-1-\lambda\log\sigma^{2}\bigr), \tag{6}$$

and the corresponding objective becomes

$$\mathcal{L}(\theta,\phi)=\mathbb{E}_{q_{\phi}(z_{c},z_{d}|x)}\!\left[\log p_{\theta}(x|z_{c},z_{d})\right]-\beta\,\widetilde{D}_{\mathrm{KL}}\!\left(q_{\phi}(z_{c}|x)\,\|\,p(z_{c})\right)-\beta\,D_{\mathrm{KL}}\!\left(q_{\phi}(z_{d}|x)\,\|\,p(z_{d})\right). \tag{7}$$

This annealing schedule reduces the influence of the continuous latent at early iterations, encouraging the model to encode structure in the discrete latent $z_{d}$ before gradually incorporating the continuous latent as $\lambda$ increases. As $\lambda$ approaches $1$, the continuous channel becomes fully expressive, enabling balanced joint use of both latent spaces. While this loss-based annealing improved stability and reduced dependence on manual capacity tuning, it has a key limitation: the annealing is applied only to the KL term, not to the encoder itself. The raw encoder outputs $(\mu,\log\sigma^{2})$ remain unconstrained and can drift toward extreme values during early epochs when the KL penalty is weak, leading to unstable dynamics or deviations from the intended capacity schedule. Freezing or ignoring the continuous KL term is also insufficient, as these parameters may drift into degenerate regions of parameter space while unregularized.
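The limitation described above can be checked numerically with a small sketch of Eq. 6 for a single latent dimension. The function names are illustrative; the only facts used are the closed-form Gaussian KL and the substitution $\sigma^{2\lambda}=\exp(\lambda\log\sigma^{2})$.

```python
import math


def kl_standard(mu, logvar):
    """Closed-form KL( N(mu, exp(logvar)) || N(0, 1) ) for one dimension."""
    return 0.5 * (math.exp(logvar) + mu * mu - 1.0 - logvar)


def kl_annealed(mu, logvar, lam):
    """Eq. 6 for one dimension; sigma^(2*lam) is computed as exp(lam * logvar)."""
    return 0.5 * (math.exp(lam * logvar) + (lam * mu) ** 2 - 1.0 - lam * logvar)


# Hypothetical, deliberately extreme raw encoder outputs.
raw_mu, raw_logvar = 40.0, -12.0
penalty_at_start = kl_annealed(raw_mu, raw_logvar, lam=0.0)
penalty_at_end = kl_annealed(raw_mu, raw_logvar, lam=1.0)
```

At $\lambda=0$ the annealed KL vanishes for any raw $(\mu,\log\sigma^{2})$, so the loss alone places no constraint on the raw encoder outputs early in training, which is the drift risk the paragraph above describes.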

Architectural Annealing. In this paper, we introduce a novel *architectural annealing* strategy that enforces the capacity schedule at the encoder level. Instead of annealing only the KL divergence, we anneal the encoder outputs themselves and use $\mu_{\lambda}\Leftarrow\lambda\,\mu$ and $\log\sigma^{2}_{\lambda}\Leftarrow\lambda\,\log\sigma^{2}$ for both sampling and KL computation. Because the decoder receives latent samples drawn from this annealed posterior, it is conditioned throughout training on representations whose capacity is directly controlled. This modification ensures that the influence of the continuous latent is minimal at the start of training and then increases smoothly and predictably as $\lambda$ grows. By constraining the encoder outputs, and consequently the decoder inputs, rather than modifying the loss alone, the model is forced to rely on the discrete latent early in training and transitions gradually to full joint optimization, avoiding the instability and failure modes associated with loss-based annealing.
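A minimal sketch of the annealed sampling step may help here. It assumes a diagonal Gaussian posterior and the usual reparameterization $z=\mu+\sigma\epsilon$; the function `anneal_and_sample` and the example values are hypothetical and only illustrate the scaling described above, not the authors' implementation.

```python
import math
import random


def anneal_and_sample(mu, logvar, lam, rng):
    """Scale encoder outputs by lam, then sample z_c = mu_lam + sigma_lam * eps.

    Returns the sample together with the annealed parameters, which would
    also be the ones used in the KL term.
    """
    mu_l = [lam * m for m in mu]
    logvar_l = [lam * lv for lv in logvar]
    z_c = [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0)
           for m, lv in zip(mu_l, logvar_l)]
    return z_c, mu_l, logvar_l


# Two very different (hypothetical) encoder outputs, same noise seed, lam = 0.
z_a, mu_a, _ = anneal_and_sample([3.0, -2.0], [1.0, -4.0], 0.0, random.Random(0))
z_b, _, _ = anneal_and_sample([-9.0, 7.0], [0.5, 2.0], 0.0, random.Random(0))
```

With $\lambda=0$ the continuous sample is pure standard-normal noise and carries no information about the input, so in this sketch the decoder can only reduce reconstruction error through the discrete latent; at $\lambda=1$ the raw encoder outputs pass through unchanged.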

![Refer to caption](https://arxiv.org/html/2605.13933v1/x1.png)

Figure 1: We encode each flattened connectome $X$ into a mean, log-variance, and discrete class probabilities. An annealing factor (ramping from $0$ to $1$) scales the mean and log-variance during training to suppress the continuous pathway early on and encourage reliance on the discrete space before transitioning to full joint optimization. The continuous latent variable $z_{c}$ is sampled via the VAE reparameterization trick, and the discrete variable $z_{d}$ is sampled using the Gumbel–Softmax followed by an $\arg\max$. The concatenated latent vector $[z_{c},z_{d}]$ is then decoded to reconstruct $\hat{X}$.
### 2.4 Framework of Architectural Annealing Joint-VAE

Following this annealing mechanism, our Architectural Annealing Joint-VAE framework (Figure 1) models structural connectomes using both continuous and discrete latent variables. The encoder receives the flattened upper-triangular portion of each connectome and processes it through four fully connected layers with ReLU activations. From the final encoder layer, we obtain three outputs: the mean vector $\mu$, the log-variance vector $\log\sigma^{2}$, and the logits $\alpha$ that parameterize the discrete latent. The continuous latent $z_{c}$ is sampled using the standard VAE reparameterization trick applied to the annealed parameters $(\mu_{\lambda},\log\sigma^{2}_{\lambda})$, while the discrete latent $z_{d}$ represents 25 categorical classes and is sampled via the Gumbel-Softmax reparameterization trick. The decoder mirrors the encoder’s structure and reconstructs the predicted connectome $\hat{X}$ from the concatenated latent vector $(z_{c},z_{d})$, allowing the model to leverage both latent pathways throughout training. The model is trained for 2500 epochs with a batch size of 512, and each epoch has 14 iterations. The $\lambda$ value was increased from 0 to 1 over 5,000 iterations and remained at 1 for the rest of training.
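The input flattening and the $\lambda$ schedule described in this section can be sketched as follows. Whether the diagonal is included in the flattened input is not stated in the text, so `include_diagonal` is an assumption exposed as a parameter; the function names are illustrative.

```python
def flatten_upper_triangle(matrix, include_diagonal=False):
    """Flatten the upper-triangular part of a square connectome, row by row.

    Excluding the diagonal is an assumption of this sketch.
    """
    n = len(matrix)
    start = 0 if include_diagonal else 1
    return [matrix[i][j] for i in range(n) for j in range(i + start, n)]


def lambda_schedule(iteration, warmup_iters=5000):
    """Linear ramp from 0 to 1 over warmup_iters iterations, then held at 1."""
    return min(1.0, iteration / warmup_iters)


# Training length implied by the text: 2500 epochs x 14 iterations per epoch.
total_iters = 2500 * 14

# Input width for a 121-region connectome when the diagonal is excluded.
n_features = len(flatten_upper_triangle([[0] * 121 for _ in range(121)]))
```

Under these settings the warm-up occupies 5,000 of 35,000 iterations, so roughly the first seventh of training is spent ramping $\lambda$ and the remainder is full joint optimization.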

### 2.5 Experiments

In our first experiment, we evaluate whether the manual capacity-tuning strategy from the original Joint-VAE (Eq. [5](https://arxiv.org/html/2605.13933#S2.E5)) provides stable control over how information is allocated between the continuous and discrete latent spaces. We fix the penalty weight to $\beta=100$ to strongly enforce adherence to the prescribed capacities. For the discrete latent space, the KL term is theoretically upper-bounded by $\log K$ under a uniform prior, and [dupont2018learning] recommend setting the target capacity equal to this value; accordingly, we set $C_{d}=\log K$. We then vary only the continuous capacity, training three models with $C_{c}\in\{50,500,2000\}$ to assess how sensitive site-structure recovery is to this manual capacity specification.
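The $\log K$ bound used for $C_{d}$ follows in one line. As a worked check, writing $q_{k}=q_{\phi}(z_{d}=k\mid x)$ and $H(q)$ for the Shannon entropy, and assuming natural logarithms with the $K=25$ classes used in Section 2.4:

```latex
D_{\mathrm{KL}}\bigl(q_{\phi}(z_{d}\mid x)\,\|\,p(z_{d})\bigr)
  = \sum_{k=1}^{K} q_{k}\log\frac{q_{k}}{1/K}
  = \log K - H(q) \;\le\; \log K,
\qquad \log 25 \approx 3.22\ \text{nats}.
```

Equality holds when $q$ is one-hot, i.e. when the encoder assigns a sample to a single class with certainty, so setting $C_{d}=\log K$ targets fully confident discrete assignments.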

Next, we compare our proposed model-based annealing strategy against the alternative approach that applies annealing solely in the loss function (Eq. [7](https://arxiv.org/html/2605.13933#S2.E7)). We include two additional baselines: a linear pipeline of principal component analysis (PCA) followed by $k$-means, and a nonlinear baseline where a standard VAE’s continuous latent space is clustered using $k$-means. For all hybrid models, discrete assignments are obtained from the Gumbel–Softmax output. To quantify how well each method recovers the underlying acquisition sites, we compute the Adjusted Rand Index (ARI) between inferred clusters and ground-truth site labels. ARI measures similarity between two labelings while correcting for chance agreement, ranging from $-1$ (worse than random) to $1$ (perfect agreement), with 0 indicating chance-level correspondence. This makes ARI well suited for evaluating latent site-structure discovery. To assess variability, we bootstrap the dataset with $1,000$ resamples and report the resulting ARI distributions for all models. We perform a t-test on the bootstrapped ARI values at 25 classes, the true number of sites, to test whether the observed improvements are statistically significant.
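The evaluation metric and the bootstrap can be sketched in plain Python. This is a minimal illustration using the standard pair-counting definition of ARI and resampling subjects with replacement; the function names, the seed, and the toy labels are assumptions of this sketch, and the t-test step is not shown.

```python
import random
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """ARI from the contingency table of two labelings (pair-counting form)."""
    n = len(labels_a)
    sum_ij = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def bootstrap_ari(labels_a, labels_b, n_resamples=1000, seed=0):
    """Resample subjects with replacement and recompute ARI each time."""
    rng = random.Random(seed)
    n = len(labels_a)
    scores = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(adjusted_rand_index([labels_a[i] for i in idx],
                                          [labels_b[i] for i in idx]))
    return scores


# Toy labels (hypothetical): the clustering matches the sites up to renaming.
sites = [0, 0, 0, 1, 1, 1, 2, 2]
clusters = [2, 2, 2, 0, 0, 0, 1, 1]
scores = bootstrap_ari(sites, clusters, n_resamples=50)
```

ARI is invariant to how cluster indices are named, which matters here because the discrete classes learned without labels have no fixed correspondence to site identifiers.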

## 3 Results

### 3.1 Manual capacity tuning on Joint-VAE

When $C_{c}=50$, the discrete codes recover a reasonable site structure, with distinct clusters that align well with the true site labels. However, as $C_{c}$ is increased to $500$ and then $2,000$, the model increasingly relies on the continuous latent $z_{c}$, and the discrete assignments lose granularity, collapsing previously distinct site clusters (Figure 2). This behavior highlights that the Joint-VAE objective is highly sensitive to manual capacity tuning and that changing $C_{c}$ can qualitatively alter how information is distributed between continuous and discrete latent spaces.

![Refer to caption](https://arxiv.org/html/2605.13933v1/x2.png)

Figure 2: Latent space of the Joint-VAE (colored by ground-truth sites and learned discrete classes). As continuous capacity increases, the discrete space collapses, merging distinct site clusters. This illustrates the sensitivity of the original Joint-VAE to manual capacity tuning and motivates a principled, automatic mechanism to balance continuous and discrete representations.
### 3.2 Site learning comparison for improved Joint-VAE with annealing

We then evaluate how well each method recovers the underlying acquisition sites by varying the number of discrete classes from 5 to 25, where 25 corresponds to the true number of sites in the dataset. For PCA and the fully continuous VAE baseline, we use the same number of clusters for $k$-means. At low numbers of classes, PCA, the continuous VAE, and both Joint-VAE variants perform similarly, likely because the small number of classes forces many true sites to be merged together, effectively averaging across multiple acquisition conditions. However, as the number of classes approaches the true value, the differences between methods become pronounced. Beginning around 20 classes, both PCA and the loss-annealed Joint-VAE show a marked drop in performance, whereas the model-annealed Joint-VAE remains substantially more stable. At 25 classes, the ARI achieved by the model-annealed Joint-VAE is significantly higher ($p<0.05$) than all other methods, indicating that model-level annealing yields more reliable and robust site discovery (Fig. 3A).

To better understand what the model learns, we visualize the continuous latent space in two dimensions using t-SNE (Fig. 4), coloring each point by the discrete assignments inferred by the model. The assignments closely match the ground-truth site labels, which are defined by unique combinations of acquisition parameters. To further probe this structure, we color the same latent space using the acquisition parameters themselves. The latent space separates multi-shell from single-shell acquisitions clearly, and within the single-shell group, the latent space smoothly organizes subjects according to TE, TR, and angular resolution (number of directions). These patterns demonstrate that the model discovers a meaningful factorization of acquisition variability: rather than being driven by any single parameter, the structure reflects a nonlinear combination of all acquisition characteristics. Importantly, this organization emerges in an unsupervised manner solely through the joint continuous–discrete representation.

![Refer to caption](https://arxiv.org/html/2605.13933v1/x3.png)

Figure 3: A) ARI comparison across methods as the number of discrete classes varies. Joint-VAE with model-based annealing and Joint-VAE with loss-based annealing both outperform PCA + $k$-means and VAE + $k$-means as the number of classes approaches the true number of acquisition sites (25), with the model-annealed Joint-VAE showing the most stable and consistently high ARI ($p<0.05$; 1000 bootstrap resamples). B) Sensitivity analysis of the annealing suppression duration. Both ARI and homogeneity remain stable across a wide range, with performance degrading when suppression consumes most of training, indicating robustness to this hyperparameter while highlighting the need for sufficient joint-optimization time.
### 3\.3Sensitivity of improved Joint\-VAE to annealing

To assess how sensitive the proposed method is to the annealing hyperparameter, we perform a sweep over the number of iterations used for the linear increase of $\lambda$\. Because $\lambda$ always ranges from 0 to 1, its effect is controlled entirely by the duration of the warm\-up schedule\. We therefore vary the annealing length from 5,000 iterations up to the full 35,000 training iterations, with the latter effectively suppressing joint optimization for the entire training regime \(Fig\. 3B\)\. Across most of this range, both ARI and homogeneity remain stable, indicating that the model is relatively robust to the choice of warm\-up length\. However, when the annealing schedule is extended too far, the model’s ability to recover site structure degrades: prolonged suppression of the continuous channel prevents the network from fully transitioning into the joint learning objective\.
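
The warm-up schedule described above can be sketched as a linear ramp of $\lambda$ from 0 to 1. The snippet below is a hedged illustration only: `linear_lambda` and `gate_continuous` are hypothetical names, and multiplying the continuous code by $\lambda$ before concatenating it with the discrete code is an assumption about how the suppression is applied, not a detail confirmed in this section.

```python
def linear_lambda(step, warmup_steps):
    """Linear ramp from 0 to 1 over warmup_steps iterations, then held at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, max(0.0, step / warmup_steps))


def gate_continuous(z_continuous, z_discrete, lam):
    """Scale the continuous code by lam and concatenate it with the discrete code.

    Assumption for this sketch: suppression acts as a multiplicative gate on
    the continuous channel before decoding.
    """
    return [lam * z for z in z_continuous] + list(z_discrete)


total_steps = 35_000  # training length quoted in the text
for warmup in (5_000, 20_000, 35_000):
    # Fraction of training spent with lambda already at 1 (full joint optimization).
    joint_fraction = max(0, total_steps - warmup) / total_steps
    print(f"warm-up {warmup:>6}: lambda at step 10000 = "
          f"{linear_lambda(10_000, warmup):.2f}, joint fraction = {joint_fraction:.2f}")

# Early in training the decoder sees mostly the discrete (one-hot) code.
print(gate_continuous([2.0, -4.0], [0, 1, 0], linear_lambda(2_500, 5_000)))
```

With a 35,000-iteration warm-up the joint fraction is zero, which matches the observation that the longest schedule leaves no time for full joint optimization.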

![Refer to caption](https://arxiv.org/html/2605.13933v1/x4.png)

Figure 4: Left: Data points colored by true acquisition site and model\-learned discrete assignments, showing strong correspondence\. Right: the same space colored by acquisition parameters using the LAB scientific color space \(L = TE, A = TR, B = number of directions\)\. Marker size indicates shell value for single\-shell data, and symbol type indicates shell combinations for multi\-shell data\. The latent structure captures multiple acquisition characteristics simultaneously, demonstrating that the model learns diverse sources and groupings of acquisition variability\.

## 4 Discussion and Conclusion

Our findings demonstrate that hybrid continuous–discrete latent spaces provide a useful mechanism for capturing acquisition\-related variability in dMRI in a fully unsupervised manner\. By explicitly modeling both smooth variation and categorical structure, the Joint\-VAE is able to recover meaningful clusters corresponding to scanner and protocol differences\. The proposed model\-based annealing strategy further offers a principled way to balance the capacities of the continuous and discrete components without manual tuning, enabling stable and reliable site discovery across a wide range of settings\. The structure of the learned latent space reinforces that no single acquisition parameter dominates dMRI variability\. Instead, the latent representation reveals that TE, TR, angular resolution, b\-values, and shell structure all contribute jointly, and the hybrid latent space naturally captures this multi\-factor dependence\.

A notable limitation is that the latent space is dominated by acquisition\-related variation rather than biological differences between subjects\. This is expected in a fully unsupervised setting given the strength of acquisition effects in large, heterogeneous datasets, but it highlights an important direction for future work\. A semi\- or weakly supervised extension, in which the discrete latent variable captures acquisition structure in this unsupervised way while dedicated continuous dimensions are encouraged to encode biological variability, could better separate these factors\. Such models would support large\-scale, cross\-study neuroimaging analyses that account for acquisition heterogeneity while preserving meaningful biological signal\.

## Acknowledgments

This work was conducted in part using the resources of the Advanced Computing Center for Research and Education at Vanderbilt University, Nashville, TN, and was supported by NIH grants R01EB017230 \(PI: Landman\) and K01AG073584 \(PI: Archer\)\. We would also like to thank the Vanderbilt Institute for Clinical and Translational Research \(VICTR\) for providing computation resources to aid in this research\. This work was supported by the Alzheimer’s Disease Sequencing Project Phenotype Harmonization Consortium \(ADSP\-PHC\) that is funded by NIA \(U24 AG074855, U01 AG068057 and R01 AG059716\)\.

This work was also supported by the National Cancer Institute \(NCI\) grants R01 CA253923 and R01 CA275015\. This research was supported in part by the Intramural Research Program of the National Institutes of Health \(NIH\)\. The contributions of the NIH author\(s\) were made as part of their official duties as NIH federal employees, are in compliance with agency policy requirements, and are considered Works of the United States Government\. However, the findings and conclusions presented in this paper are those of the author\(s\) and do not necessarily reflect the views of the NIH or the U\.S\. Department of Health and Human Services\. The BLSA dataset was supported by the Intramural Research Program of the National Institute on Aging, NIH\. Data from the Wisconsin Registry for Alzheimer’s Prevention \(WRAP\) was supported by NIA grants AG021155, AG0271761, AG037639, and AG054047\.

We gratefully acknowledge the efforts of the HABS\-HD MPIs: Sid E\. O’Bryant, Kristine Yaffe, Arthur Toga, Robert Rissman, and Leigh Johnson, as well as the HABS\-HD Investigators: Meredith Braskie, Kevin King, James R\. Hall, Melissa Petersen, Raymond Palmer, Robert Barber, Yonggang Shi, Fan Zhang, Rajesh Nandy, Roderick McColl, David Mason, Bradley Christian, Nicole Phillips, Stephanie Large, Joe Lee, Badri Vardarajan, Monica Rivera Mindt, Amrita Cheema, Lisa Barnes, Mark Mapstone, Annie Cohen, Amy Kind, Ozioma Okonkwo, Raul Vintimilla, Zhengyang Zhou, Michael Donohue, Rema Raman, Matthew Borzage, Michelle Mielke, Beau Ances, Ganesh Babulal, Jorge Llibre\-Guerra, Carl Hill, and Rocky Vig\. Research related to HABS\-HD was supported by the National Institute on Aging of the National Institutes of Health under award numbers R01AG054073, R01AG058533, R01AG070862, P41EB015922, and U19AG078109\.

Data contributed from the ROS/MAP/MARS studies were supported by grants from the National Institute on Aging: R01AG017917, P30AG10161, P30AG072975, R01AG022018, R01AG056405, UH2NS100599, UH3NS100599, R01AG064233, R01AG15819, and R01AG067482, along with support from the Illinois Department of Public Health \(Alzheimer’s Disease Research Fund\)\. These data are available at www\.radc\.rush\.edu\. Data were also provided in part by OASIS\-4: Clinical Cohort, led by Principal Investigators T\. Benzinger, L\. Koenig, and P\. LaMontagne\.

We also acknowledge the National Alzheimer’s Coordinating Center \(NACC\) database, which is funded by NIA/NIH Grant U24AG072122\. Data were contributed by NIA\-funded Alzheimer’s Disease Research Centers \(ADRCs\), including but not limited to: P30AG062429 \(PI: James Brewer\), P30AG066468 \(PI: Oscar Lopez\), P30AG062421 \(PI: Bradley Hyman\), P30AG066509 \(PI: Thomas Grabowski\), P30AG066514 \(PI: Mary Sano\), P30AG066530 \(PI: Helena Chui\), P30AG066507 \(PI: Marilyn Albert\), P30AG066444 \(PI: John Morris\), and numerous others listed in full above\. Data collection and sharing for this project was funded by the Alzheimer’s Disease Neuroimaging Initiative \(ADNI\) \(National Institutes of Health Grant U01 AG024904\) and DOD ADNI \(Department of Defense award number W81XWH\-12\-2\-0012\)\. ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the following: AbbVie, Alzheimer’s Association; Alzheimer’s Drug Discovery Foundation; Araclon Biotech; BioClinica, Inc\.; Biogen; Bristol\-Myers Squibb Company; CereSpir, Inc\.; Cogstate; Eisai Inc\.; Elan Pharmaceuticals, Inc\.; Eli Lilly and Company; EuroImmun; F\. Hoffmann\-La Roche Ltd and its affiliated company Genentech, Inc\.; Fujirebio; GE Healthcare; IXICO Ltd\.; Janssen Alzheimer Immunotherapy Research & Development, LLC\.; Johnson & Johnson Pharmaceutical Research & Development LLC\.; Lumosity; Lundbeck; Merck & Co\., Inc\.; Meso Scale Diagnostics, LLC\.; NeuroRx Research; Neurotrack Technologies; Novartis Pharmaceuticals Corporation; Pfizer Inc\.; Piramal Imaging; Servier; Takeda Pharmaceutical Company; and Transition Therapeutics\. The Canadian Institutes of Health Research is providing funds to support ADNI clinical sites in Canada\. Private sector contributions are facilitated by the Foundation for the National Institutes of Health \(www\.fnih\.org\)\. 
The grantee organization is the Northern California Institute for Research and Education, and the study is coordinated by the Alzheimer’s Therapeutic Research Institute at the University of Southern California\. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of Southern California\.

During the preparation of this work, the authors used GitHub Copilot and ChatGPT to assist with code generation, debugging, editing, and sentence restructuring\. All AI\-generated content was reviewed and edited by the authors, who take full responsibility for the final content of the publication\.
