Hybrid Classical-Quantum Variational Autoencoder for Neural Topic Modeling

arXiv cs.CL Papers

Summary

This paper proposes a hybrid classical-quantum variational autoencoder for neural topic modeling, embedding parameterized quantum circuits in the inference network. Experiments on the AgNews dataset demonstrate improved topic coherence and diversity compared to state-of-the-art classical models, showing viability on NISQ-era quantum devices.

arXiv:2606.13852v1 Announce Type: new Abstract: Neural topic models enable scalable semantic discovery, but their integration with quantum hardware remains largely unexplored. We present a proof-of-concept hybrid classical-quantum variational autoencoder (VAE) for topic modeling, embedding parameterized quantum circuits within the VAE inference network while retaining a classical topic-word decoder. To address the resource constraints of quantum hardware, we propose a modified Gaussian Softmax posterior that decouples latent space dimensionality from the number of topics to be extracted, enabling the model to operate with a low-resource 10-qubit quantum device. On the AgNews dataset, the hybrid VAE outperforms state-of-the-art neural topic models (NTMs), reaching a $C_v$ coherence score of 0.71 and an NPMI score of 0.20 while preserving high topic diversity. For comparison, we also construct a fully classical variant, which also outperforms state-of-the-art models on AgNews and exhibits clear class separation in the latent space. These results demonstrate that hybrid VAEs are computationally viable even on NISQ-era devices and represent a promising direction for quantum-enhanced topic modeling.

Cached at: 06/15/26, 08:56 AM

# Hybrid Classical-Quantum Variational Autoencoder for Neural Topic Modeling
Source: [https://arxiv.org/html/2606.13852](https://arxiv.org/html/2606.13852)
###### Abstract

Neural topic models enable scalable semantic discovery, but their integration with quantum hardware remains largely unexplored. We present a proof-of-concept hybrid classical-quantum variational autoencoder (VAE) for topic modeling, embedding parameterized quantum circuits within the VAE inference network while retaining a classical topic-word decoder. To address the resource constraints of quantum hardware, we propose a modified Gaussian Softmax posterior that decouples latent space dimensionality from the number of topics to be extracted, enabling the model to operate with a low-resource 10-qubit quantum device. On the AgNews dataset, the hybrid VAE outperforms state-of-the-art neural topic models (NTMs), reaching a $C_v$ coherence score of 0.71 and an NPMI score of 0.20 while preserving high topic diversity. For comparison, we also construct a fully classical variant, which also outperforms state-of-the-art models on AgNews and exhibits clear class separation in the latent space. These results demonstrate that hybrid VAEs are computationally viable even on NISQ-era devices and represent a promising direction for quantum-enhanced topic modeling.

###### keywords:

Topic modeling, NLP, Variational autoencoder \(VAE\), Quantum machine learning \(QML\), Parameterized quantum circuit \(PQC\)

## 1 Introduction

Topic modeling is a machine learning technique used to unveil latent topics within a set of documents, generally in an unsupervised fashion\. Topics are represented by sets of semantically related words that collectively describe coherent and distinct semantic concepts\. For instance, the topic “politics” may be described by words such as “law,” “policy,” and “government”\. Owing to its interpretability, topic modeling has seen widespread use in applications such as text analysis, document retrieval, and content recommendation\. Orthodox approaches to topic modeling include Bayesian probabilistic models, such as Latent Dirichlet Allocation \(LDA\), and matrix factorization methods\. LDA\[[1](https://arxiv.org/html/2606.13852#bib.bib1)\], one of the most popular techniques, uses Bayesian inference to discover latent topics by treating documents as distributions over topics\. Matrix factorization, on the other hand, decomposes a word\-document matrix into two lower\-rank matrices: one describing word\-topic relationships and the other describing topic\-document relationships\.

Conventional methods, however, face major challenges, including poor scalability to large datasets and an inability to capture non\-linear relationships between topics and words\. To overcome these limitations, Neural Topic Models \(NTMs\) have emerged as alternatives that can be trained efficiently and flexibly on large datasets using GPUs\. In this paper, we propose and analyze the performance of a hybrid version of a popular NTM, namely the Variational Autoencoder \(VAE\)\.

Motivated by recent advances in quantum machine learning, this work explores how parameterized quantum circuits can be integrated into a VAE\-based topic model while preserving competitive performance\. Specifically, we embed quantum components inside the VAE inference network and compare the resulting hybrid models with fully classical counterparts on standard topic modeling benchmarks\. Through this study, we provide a proof of concept for hybrid classical\-quantum neural topic modeling and assess how the quantum component affects topic coherence, diversity, and latent space organization\.

## 2 Related Work

In contrast to a standard autoencoder, which offers a deterministic point estimate for the latent representation, a VAE produces latent variables that describe distributions over the latent space\. While the latent space in a traditional autoencoder is typically sparse and disjointed, the VAE latent space is smooth and continuous, enabling meaningful and consistent sampled data points \(topics\)\. Therefore, a VAE is better equipped to extract and structure topics from a document collection\.

A VAE-based NTM comprises an inference network (encoder) and a generative network (softmax decoder). The encoder infers latent variables from a bag-of-words (BoW) document representation $x$, parameterizing distributions from which the topic distribution vector $z$ is sampled. The assumption is that the document collection follows a prior distribution $p(z)$, which can be approximated to determine the topic distributions among documents. Thus, the encoder $\phi$ computes $q_{\phi}(z|x)$, a variational approximation to $p(z|x)=\frac{p(z,x)}{p(x)}$, which is intractable because of the integral in $p(x)=\int p(x|z)p(z)\,dz$. The approximation is obtained by minimizing the Kullback-Leibler (KL) divergence $KL[q_{\phi}(z|x)\,||\,p(z|x)]=\log p(x)-ELBO$, where $ELBO$ is the Evidence Lower Bound. Since $\log p(x)$ is constant with respect to $z$, this is equivalent to minimizing $-ELBO=KL[q_{\phi}(z|x)\,||\,p(z)]-\mathbb{E}_{q_{\phi}(z|x)}(\log p(x|z))$. The conditional distribution $p(x|z)$ is provided by the decoder $\theta$, which reconstructs $x$, the original document representation. Hence, the loss function is expressed as:

$$\mathcal{L}_{\phi,\theta}=-ELBO=KL[q_{\phi}(z|x)\,||\,p(z)]-\mathbb{E}_{q_{\phi}(z|x)}(\log p_{\theta}(x|z)), \tag{1}$$

where the second term is the reconstruction loss, and the first term, the KL divergence, regularizes the variational distribution to be close to the prior distribution. The decoder is typically a bias-free fully connected layer $W$ with $p_{\theta}(x|z)=\mathrm{softmax}(Wz)$.

Ordinarily, the prior distribution is assumed to be Gaussian, so the encoder infers the mean and the logarithm of the standard deviation of the distribution $(\mu,\log\sigma)$. Instead of sampling $z\sim\mathcal{N}(\mu,\sigma)$ directly, the reparameterization trick is applied as $z=\mu+\sigma\epsilon$ with $\epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I})$ to alleviate instability during training [[5](https://arxiv.org/html/2606.13852#bib.bib5)]. However, in topic modeling, the Gaussian prior is not always well-suited, as it tends to push the topic means in latent space toward the center, thereby entangling topics. Therefore, alternative approaches have been proposed to approximate the prior $p(z)$.

[[11](https://arxiv.org/html/2606.13852#bib.bib11)] introduced the Gaussian Softmax (GSM) technique, which applies a linear transformation to $z$ followed by a softmax activation function: $g(z)=\mathrm{softmax}(Wz+b)$. A key advantage of GSM is that the latent space dimension can be smaller than the topic count, a feature we harness later. [[22](https://arxiv.org/html/2606.13852#bib.bib22)] propose approximating the Dirichlet multinomial distribution using a Laplace approximation. The Dirichlet distribution is particularly useful because it is defined over a $(K-1)$-dimensional simplex, allowing control over the distribution of topic proportions. [[3](https://arxiv.org/html/2606.13852#bib.bib3)] suggest leveraging semantically rich word embeddings to enhance topic models by factorizing the topic-word matrix into a product of topic embedding and word embedding matrices. Building upon previous work, [[28](https://arxiv.org/html/2606.13852#bib.bib28)] proposed vONTSS, an NTM based on optimal transport (also used in [2025](https://arxiv.org/html/2606.13852#bib.bib25)) that outperforms existing NTMs in an unsupervised setting on the AgNews and 20News benchmark datasets. The vONTSS encoder $\phi$ maps a bag-of-words (BoW) document representation into the $(\mu,k)$ parameters of the von Mises-Fisher (vMF) distribution, from which a latent vector $\eta$ is sampled. The vMF distribution is employed to mitigate the topic entanglement often ascribed to the Gaussian prior by restricting expressibility. Additionally, the vector $\eta$ is passed through a temperature function before the softmax application to tune the resulting topic distribution vector $z$. From $z$, the decoder reconstructs the original BoW representation using a trainable topic embedding matrix and a frozen word embedding matrix.

Beyond classical autoencoders, quantum and hybrid autoencoders have gained significant attention in the quantum machine learning literature, offering potential advantages over their classical counterparts. Some of these approaches have already been experimentally implemented on real quantum devices, demonstrating their feasibility on near-term quantum hardware. In a pioneering study, [[16](https://arxiv.org/html/2606.13852#bib.bib16)] introduced a quantum autoencoder designed to compress quantum states beyond classical capabilities. Unlike classical autoencoders, where both input and output remain classical, their approach directly maps a quantum input to a quantum output, performing compression intrinsically within the quantum circuit. Another notable quantum generative model was introduced by [[7](https://arxiv.org/html/2606.13852#bib.bib7)], who developed the first quantum VAE based on annealing-based generative models. This work later found applications in generative chemistry [[6](https://arxiv.org/html/2606.13852#bib.bib6)], demonstrating the potential of quantum-enhanced VAEs for modeling complex probability distributions.

More recently, hybrid quantum-classical autoencoders have emerged as promising architectures. These architectures integrate parameterized quantum circuits (PQCs) into classical autoencoder structures to improve various tasks, such as unsupervised dimensionality reduction and anomaly detection [[14](https://arxiv.org/html/2606.13852#bib.bib14),[17](https://arxiv.org/html/2606.13852#bib.bib17)]. A common approach involves using a classical encoder to generate a latent representation, which is then processed by a PQC before being measured, yielding expectation values that serve as inputs to a classical decoder. Alternatively, some approaches replace the classical encoder or decoder entirely with a PQC, allowing for direct extraction of information from quantum states or learning the probability distribution of quantum measurements [[21](https://arxiv.org/html/2606.13852#bib.bib21),[13](https://arxiv.org/html/2606.13852#bib.bib13)]. Notably, these methods have been shown to efficiently learn hard quantum states (e.g., Haar random states) with only a linear number of parameters, unlike classical models, which scale exponentially.

Several recent works adopt a NISQ-oriented strategy in which a classical model first compresses the input and the quantum component operates only on the resulting low-dimensional latent space. [[20](https://arxiv.org/html/2606.13852#bib.bib20)] trained a ResNet10-inspired convolutional autoencoder for image reconstruction, isolated its 64-dimensional latent representation, amplitude-encoded it into six qubits, and used QSVM/QOCSVM blocks for classification and anomaly detection. Their results show that the downstream quantum block can work well when the reconstruction latent space preserves discriminative information, but that performance degrades on more abstract or imbalanced image data. In a related MNIST study, [[18](https://arxiv.org/html/2606.13852#bib.bib18)] compressed images into 64 autoencoder features, further reduced them to five principal components, mapped them to a 5-qubit circuit, and classified the resulting 32-dimensional measurement distribution; the hybrid model remained functional but fell behind the classical latent space baseline, illustrating the cost of aggressive compression before quantum encoding. Outside image data, [[24](https://arxiv.org/html/2606.13852#bib.bib24)] combined classical autoencoders with quantum neural networks for heart disease classification and reported competitive accuracy under limited-data and noisy-simulation settings. These studies suggest that classical encoders are currently a practical bridge to quantum processing, while also making the information bottleneck a central design constraint.

To the best of our knowledge, our work is the first to apply a hybrid classical\-quantum VAE with PQCs for topic modeling\. Unlike the above pipelines, our PQCs are embedded inside the VAE inference network to parameterize the posterior distribution used to infer document\-topic mixtures, while the decoder remains a topic\-word reconstruction module tailored to NTM evaluation\.

## 3 Proposed Methods

Up to this point, we have provided a first glimpse of hybrid neural networks and neural topic modeling\. As mentioned above, in topic modeling we have to deal with large datasets, which compels us to devise sophisticated techniques that aptly balance scalability and performance\. In the following sections, we propose a novel technique based on VAEs and hybrid computing as a proof of concept for tackling topic modeling using quantum devices\. Alongside the hybrid VAE, we also present a fully classical counterpart that is used to comparatively assess the performance of the proposed technique\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x1.png)

Figure 1: Architecture of the hybrid VAE. The red components are the trainable nodes, where CN and QN stand for classical node and quantum node, respectively.

### 3.1 Hybrid VAE

Figure [1](https://arxiv.org/html/2606.13852#S3.F1) illustrates the model architecture at a high level. This model is designed to comply with the constraints of variational quantum algorithms (VQAs). At an overarching level, the encoder network transforms a BoW document representation into the mean and logarithm of the variance of a Gaussian Softmax distribution, from which a topic distribution vector is sampled and decoded using a topic-word matrix to reconstruct the BoW representation.

The encoder network $\phi$ comprises a classical component and a quantum component. Starting with a BoW document representation $x\in\mathbb{R}^{|V|}$, the model down-projects it to a $\#h$-dimensional vector $h$ using a fully connected layer $\mathcal{L}_{|V|\to\#h}^{(h)}$, where $V$ is the vocabulary of terms appearing in the document collection. The hidden representation $h$ is then amplitude-encoded (with padding) into two variational quantum circuits (VQCs), $\mathcal{Q}_{\#h\to\#q}^{(\mu)}$ and $\mathcal{Q}_{\#h\to\#q}^{(\log\sigma^{2})}$, following Equation [2](https://arxiv.org/html/2606.13852#S3.E2) for $x\in\mathbb{R}^{N}$:

$$x\mapsto\frac{1}{||x||_{2}}\sum_{i=0}^{2^{n}-1}x_{i}|i\rangle. \tag{2}$$

The VQCs $\mathcal{Q}_{\#h\to\#q}^{(\mu)}$ and $\mathcal{Q}_{\#h\to\#q}^{(\log\sigma^{2})}$ compute the mean $\mu$ and the logarithm of the variance $\log\sigma^{2}$, respectively. They consist of sequences of strongly entangling layers, each parameterized with rotation gates (see Figure [5](https://arxiv.org/html/2606.13852#A1.F5)). Ultimately, we measure either the state probabilities of a subset of qubits or the Pauli Z expectation values for each qubit, producing $\#q$-dimensional classical vectors $m^{(\mu)}$ and $m^{(\log\sigma^{2})}$. We restrict ourselves to these two types of measurements because both are amenable to differentiation methods, such as the parameter-shift rule and finite differences, using “PennyLane” [[26](https://arxiv.org/html/2606.13852#bib.bib26)]. Due to the high entanglement in the VQCs, the measurements need to be individually scaled by learnable $\#q$-dimensional vectors $\alpha^{(\mu)}$ and $\alpha^{(\log\sigma^{2})}$ to increase the variance within the measurements, an important step for alleviating topic entanglement and enhancing diversity. Consequently, we obtain $\mu=m^{(\mu)}\odot\alpha^{(\mu)}$ and $\sigma=\sqrt{e^{(m^{(\log\sigma^{2})}\odot\alpha^{(\log\sigma^{2})})}}$ as the mean and standard deviation of the Gaussian distribution over the latent space. Unlike typical approaches in the literature [[11](https://arxiv.org/html/2606.13852#bib.bib11),[4](https://arxiv.org/html/2606.13852#bib.bib4)], which output $\log\sigma$, our method outputs $\log\sigma^{2}$, which slightly enhances performance. We apply a dropout layer to the input $x$ to prevent memorization, as well as two batch normalization layers to the scaled measurements to stabilize the training process.
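The amplitude encoding with padding from Equation 2 can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `amplitude_encode` is hypothetical, zero padding is assumed, and no quantum library is involved.

```python
import math


def amplitude_encode(h, n_qubits):
    """Pad h with zeros to length 2**n_qubits and L2-normalize it (Equation 2).

    Hypothetical helper for illustration; zero padding is an assumption.
    """
    dim = 2 ** n_qubits
    if len(h) > dim:
        raise ValueError("vector does not fit into the qubit register")
    padded = list(h) + [0.0] * (dim - len(h))
    norm = math.sqrt(sum(v * v for v in padded))
    if norm == 0.0:
        raise ValueError("the zero vector cannot be amplitude-encoded")
    # The returned entries are the amplitudes of the basis states |0>, |1>, ...
    return [v / norm for v in padded]


# Toy example: a 3-dimensional hidden vector stored in a 2-qubit register.
amps = amplitude_encode([3.0, 4.0, 0.0], n_qubits=2)
```

PennyLane's `qml.AmplitudeEmbedding` template provides `pad_with` and `normalize` options that serve the same purpose, although the text does not state whether the authors rely on it.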

The Gaussian Softmax distribution ($GSM_{t}$), parameterized by the encoder outputs, is modified and endowed with a learnable temperature parameter $\tau$, so that the topic distribution vector $z$ is sampled as follows: $z\sim G_{GSM_{t}}(\mu,\sigma^{2})$. The sampling process utilizes the reparameterization trick:

$$\begin{aligned}\eta&=\mu\oplus\epsilon\odot\sigma,\\ z&=\mathrm{softmax}(\tau\,\mathcal{L}_{\#q\to\#topic}(\eta)),\\ &\text{with}\ \epsilon\sim\mathcal{N}(\mathbf{0},\mathbf{I}),\end{aligned} \tag{3}$$

where $\mathcal{L}_{\#q\to\#topic}$ is a fully connected layer without bias used to project into the topic vector space, whose dimension $\#topic$ is the number of topics to be extracted. The $GSM_{t}$ is aptly chosen to decouple the topic count from the encoder output dimension, which eases the integration of the quantum components. The temperature, on the other hand, is introduced to foster topic disentanglement.
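The sampling step can be illustrated with a small, dependency-free sketch. It assumes standard normal noise, as in the reparameterization trick described in Section 2, and treats the bias-free projection as a plain weight matrix `W`. All names (`sample_topic_distribution`, `W`, `tau`) and the toy dimensions are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def sample_topic_distribution(mu, log_var, W, tau, rng):
    """Draw z = softmax(tau * W @ eta) with eta = mu + eps * sigma."""
    sigma = [math.exp(0.5 * lv) for lv in log_var]   # sigma = sqrt(exp(log sigma^2))
    eps = [rng.gauss(0.0, 1.0) for _ in mu]          # assumed standard normal noise
    eta = [m + e * s for m, e, s in zip(mu, eps, sigma)]
    logits = [tau * sum(w * x for w, x in zip(row, eta)) for row in W]
    return softmax(logits)


rng = random.Random(42)
mu = [0.1, -0.2, 0.3]            # toy latent dimension of 3
log_var = [-1.0, -1.0, -1.0]
# Five topics from three latent dimensions: the topic count exceeds the latent size.
W = [[rng.uniform(-1.0, 1.0) for _ in mu] for _ in range(5)]
z = sample_topic_distribution(mu, log_var, W, tau=2.0, rng=rng)
```

The shape of `W` is what decouples the number of topics from the latent dimension: the latent vector can stay as small as the quantum measurement allows while `z` still covers every topic.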

For the decoder network, we follow the proposal of [[3](https://arxiv.org/html/2606.13852#bib.bib3)] and utilize topic embedding $\mathcal{L}_{\#topic\to\#emb}$ and word embedding $\mathcal{L}_{\#emb\to|V|}$ matrices to construct the topic-word matrix. Thus, $x'$ is reconstructed as follows:

$$x'=\mathrm{softmax}(\mathcal{L}_{\#emb\to|V|}(\mathcal{L}_{\#topic\to\#emb}(z))), \tag{4}$$

where $\mathcal{L}_{\#topic\to\#emb}$ and $\mathcal{L}_{\#emb\to|V|}$ are both fully connected layers without bias. In our experiments, we initialize the word embedding layer with pre-trained 300-dimensional GloVe embeddings ($\#emb=300$) [[12](https://arxiv.org/html/2606.13852#bib.bib12)] to improve training.

A topic diversity regularization term $\mathcal{L}_{TD}$ is introduced to foster diversity, following the recommendations of [[11](https://arxiv.org/html/2606.13852#bib.bib11)]. However, instead of using the $\arccos$ function to capture angular distances as they did, we utilize cosine similarity. The purpose of this regularization is to encourage the model to learn orthogonal topic embedding vectors, thereby reducing topic entanglement. Hence, the regularization is calculated as the sum of the mean and variance of the absolute values of the topic embedding distances:

$$\begin{aligned}&\mathcal{L}_{TD}=\zeta+\mathrm{variance}(\zeta),\\ &\text{with}\ \zeta=\frac{1}{\#topic^{2}}\sum_{i=1}^{\#topic}\sum_{j=1}^{\#topic}|\mathrm{cos\_sim}(t_{i},t_{j})|,\end{aligned} \tag{5}$$

where $t_{i},t_{j}\in\mathcal{L}_{\#topic\to\#emb}$ are topic embeddings of length $\#emb$. To minimize the loss, the model must reduce both $\zeta$ and $\mathrm{variance}(\zeta)$ towards 0 to promote orthogonality.
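A small sketch of this regularizer is given below. It follows the verbal description (mean plus variance of the absolute pairwise cosine similarities) and makes two assumptions the text does not settle: the variance is the population variance over all pairs, and the pairs with $i=j$ are included, as the double sum suggests. The function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def topic_diversity_penalty(topic_embeddings):
    """Mean plus population variance of |cos_sim| over all topic pairs (i, j)."""
    sims = [abs(cosine_similarity(t_i, t_j))
            for t_i in topic_embeddings
            for t_j in topic_embeddings]
    mean = sum(sims) / len(sims)
    variance = sum((s - mean) ** 2 for s in sims) / len(sims)
    return mean + variance


# Orthogonal topic embeddings are penalized less than collinear ones.
penalty_orthogonal = topic_diversity_penalty([[1.0, 0.0], [0.0, 1.0]])
penalty_collinear = topic_diversity_penalty([[1.0, 0.0], [2.0, 0.0]])
```

Under these assumptions the diagonal pairs always contribute a similarity of 1, so the penalty has a positive floor; only the off-diagonal terms respond to training.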

The final loss function $\mathcal{L}$ combines the reconstruction loss $\mathcal{L}_{recon}$, the topic diversity regularization $\mathcal{L}_{TD}$, and the Kullback-Leibler divergence $\mathcal{L}_{KL}$ between the posterior distribution and the standard Gaussian distribution. Ergo, we have

$$\begin{aligned}&\mathcal{L}=\mathcal{L}_{recon}+\mathcal{L}_{KL}+\mathcal{L}_{TD},\\ &\text{with}\ \mathcal{L}_{recon}=-\sum_{i=1}^{|V|}x_{i}\log x_{i}',\quad\mathcal{L}_{KL}=\frac{1}{2}\sum_{i=1}^{\#q}(-1-2\log\sigma_{i}+\sigma_{i}^{2}+\mu_{i}^{2}),\end{aligned} \tag{6}$$

where $\mu_{i}$ and $\sigma_{i}$ are the $i$-th elements of the vectors $\mu$ and $\sigma$, respectively.
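The two data-dependent terms of the loss can be checked numerically with the sketch below. The reconstruction term is written as a negative log-likelihood, matching the sign of the second term in Equation 1, and the KL term is the closed form against a standard Gaussian. The function names and the toy inputs are hypothetical.

```python
import math


def reconstruction_loss(x, x_rec):
    """Negative log-likelihood of BoW counts x under reconstructed word probabilities."""
    # Words with zero count contribute nothing, so they are skipped.
    return -sum(count * math.log(prob) for count, prob in zip(x, x_rec) if count > 0)


def kl_to_standard_normal(mu, sigma):
    """Closed-form KL divergence between N(mu, sigma^2) and N(0, I)."""
    return 0.5 * sum(-1.0 - 2.0 * math.log(s) + s * s + m * m
                     for m, s in zip(mu, sigma))


def total_loss(x, x_rec, mu, sigma, td_penalty):
    """Sum of reconstruction, KL, and topic diversity terms."""
    return reconstruction_loss(x, x_rec) + kl_to_standard_normal(mu, sigma) + td_penalty


# Toy values: a 3-word vocabulary and a 1-dimensional latent space.
recon = reconstruction_loss([2, 0, 1], [0.5, 0.25, 0.25])
kl_zero = kl_to_standard_normal([0.0, 0.0], [1.0, 1.0])   # posterior equals the prior
kl_shifted = kl_to_standard_normal([1.0], [1.0])          # mean shifted by one
loss = total_loss([2, 0, 1], [0.5, 0.25, 0.25], [1.0], [1.0], td_penalty=0.75)
```

The KL term vanishes exactly when the posterior matches the standard Gaussian, which is why it acts as a pull toward the prior.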

#### 3.1.1 Quantum components

Although the functional form of the encoder has been defined, the specific quantum components have not yet been specified. As illustrated in Figure [1](https://arxiv.org/html/2606.13852#S3.F1), the encoder $\phi$ consists of two PQCs following a fully connected down-projection layer. The hidden representation $h$ is amplitude-encoded into a quantum state as

$$|\psi_{h}\rangle=S(h)|0\rangle^{\otimes\#q}=\frac{1}{\|h\|}\sum_{i=0}^{2^{\#q}-1}h_{i}|i\rangle, \tag{7}$$

where $S(h)$ represents an arbitrary state preparation routine parameterized by angles derived from $h$. The PQCs then process the quantum state $|\psi_{h}\rangle$, applying the unitary transformations $U(\theta^{(\mu)})$ and $U(\theta^{(\log\sigma^{2})})$, corresponding to the mean and logarithm of the variance, respectively.

To extract classical information, the expectation value of the Pauli-Z operator on each qubit is measured. These expectation values, combined with trainable $\alpha$-vectors, are used to compute the parameters of the distribution. Specifically, the classical vectors obtained from the quantum circuits are given by

$$m^{(\mu)}_{i}=\langle\psi_{h}|U^{\dagger}(\theta^{(\mu)})Z_{i}U(\theta^{(\mu)})|\psi_{h}\rangle,\quad m^{(\log\sigma^{2})}_{i}=\langle\psi_{h}|U^{\dagger}(\theta^{(\log\sigma^{2})})Z_{i}U(\theta^{(\log\sigma^{2})})|\psi_{h}\rangle, \tag{8}$$

where $Z_{i}$ denotes the Pauli-Z operator acting on the $i$-th qubit. Thus, the complete dressed quantum circuit for the encoder can be expressed as

$$\phi=\mathcal{L}_{|V|\to\#h}^{(h)}\circ\mathcal{Q}_{\#h\to\#q}^{(*)}\circ\alpha^{(*)}, \tag{9}$$

which maps the BoW representation to the mean and logarithm of the variance of the distribution.
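To make the measurement and scaling steps concrete, the sketch below computes per-qubit Pauli-Z expectation values from a statevector and applies an element-wise scale. It is a classical toy calculation, not the authors' circuit. The bit ordering (qubit 0 as the most significant bit), the function name, and the `alpha` values are assumptions made for illustration.

```python
def pauli_z_expectations(amplitudes, n_qubits):
    """Return <Z_i> for every qubit of a pure state given by its amplitudes.

    Assumption: qubit 0 is the most significant bit of the basis-state index.
    """
    probs = [abs(a) ** 2 for a in amplitudes]
    expvals = []
    for i in range(n_qubits):
        shift = n_qubits - 1 - i
        # Basis states whose bit i is 0 contribute +p, the others contribute -p.
        expvals.append(sum(p if ((k >> shift) & 1) == 0 else -p
                           for k, p in enumerate(probs)))
    return expvals


# Toy state |01>: qubit 0 is in |0> (Z = +1), qubit 1 is in |1> (Z = -1).
state_after_circuit = [0.0, 1.0, 0.0, 0.0]
m = pauli_z_expectations(state_after_circuit, n_qubits=2)

# Element-wise scaling by a learnable vector (hypothetical values).
alpha = [2.0, 3.0]
mu = [m_i * a_i for m_i, a_i in zip(m, alpha)]
```

Pauli-Z expectation values always lie in $[-1, 1]$, so the scaling is what lets the resulting means leave that interval.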

### 3.2 Classical VAE

To construct the classical counterpart, we replace both VQCs with fully connected layers followed by $\tanh$ activation functions. Additionally, we remove the temperature parameter as well as the parameter vectors $m^{(\mu)}$ and $m^{(\log\sigma^{2})}$.

## 4 Experiments

Given the significant impact of dataset pre-processing on evaluation results [[2023](https://arxiv.org/html/2606.13852#bib.bib2),[2016](https://arxiv.org/html/2606.13852#bib.bib19),[2021](https://arxiv.org/html/2606.13852#bib.bib23)], we strictly adhere to the pre-processing steps outlined in the paper that reports state-of-the-art (SOTA) performance [[28](https://arxiv.org/html/2606.13852#bib.bib28)]. To this end, we reuse their publicly accessible pre-processing algorithm [[2023](https://arxiv.org/html/2606.13852#bib.bib27)] without modifications. This allows us to compare their evaluation results effectively with our own. We test both hybrid and classical models in two variants: a small latent space (SLS) with $\#q=10$ and a large latent space (LLS) with $\#q=32$. For the classical model, we simply adjust the size of the fully connected layers accordingly. For the hybrid model, we use 10 qubits and adjust the measurement method, using the Pauli Z expectation values with respect to every qubit for $\#q=10$ and the state probabilities of the last 5 qubits ($2^{5}=32$ values) for $\#q=32$. Thus, we have two hybrid models and two classical models. Herein, we detail the datasets, evaluation metrics, and experimental settings used in our study.
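The step from 10 qubits to a 32-dimensional latent vector can be illustrated by marginalizing a statevector onto its last 5 qubits. The sketch below is a classical toy calculation under the assumption that the last qubits correspond to the least significant bits of the basis-state index; the function name and the uniform test state are hypothetical.

```python
def marginal_probabilities(amplitudes, n_last):
    """Probabilities of the last n_last qubits, marginalized over all other qubits.

    Assumption: the last qubits are the least significant bits of the index.
    """
    marginal = [0.0] * (2 ** n_last)
    mask = (1 << n_last) - 1
    for index, amplitude in enumerate(amplitudes):
        marginal[index & mask] += abs(amplitude) ** 2
    return marginal


n_qubits = 10
# Uniform superposition over 2**10 = 1024 basis states: each amplitude is 1/32.
uniform_state = [1.0 / 32.0] * (2 ** n_qubits)
lls_vector = marginal_probabilities(uniform_state, n_last=5)
```

Both variants therefore run on the same 10-qubit register: the small latent space reads one expectation value per qubit, while the large one reads a 32-entry probability vector.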

##### Task:

The objective is to extract 20 topics from each of the benchmark datasets by training the models in an unsupervised manner. The decoder network of the model provides the solution by encoding the probability distribution over words for every topic. Thus, both the hybrid and classical model architectures are adjusted to feature decoders that resemble $20\times|V|$ matrices.

##### Datasets:

For training and evaluation, we utilize the following popular benchmark datasets:

1. 20News [[8](https://arxiv.org/html/2606.13852#bib.bib8)]: This dataset consists of approximately 18,000 newsgroup posts across 20 topics, including religion, politics, and sports. It can be accessed using the “octis” library. During pre-processing, each document was tokenized and cleaned, with stop words and words occurring in more than 15% of all documents or fewer than 20 times removed [[2023](https://arxiv.org/html/2606.13852#bib.bib28)].
2. AgNews [[29](https://arxiv.org/html/2606.13852#bib.bib29)]: The AG’s News dataset comprises four classes, each containing 30,000 news articles sourced from the web and covering topics such as sports and business. It can be retrieved using the “Hugging Face” datasets library. Its pre-processing is the same as that of the 20News dataset.

Table 1: Statistics of the topic modeling benchmark datasets used

| Datasets | #Documents | #Terms |
| --- | --- | --- |
| 20News | 16,309 | 1,369 |
| AgNews | 120,000 | 14,696 |

##### Evaluation Metrics:

To assess the quality of the learned topic\-word matrix, we employ standard metrics widely used in topic modeling:

1. NPMI (Normalized Pointwise Mutual Information) [[2014](https://arxiv.org/html/2606.13852#bib.bib9)] gauges the strength of the relationship between word pairs within topics based on their co-occurrence in a document collection. It is defined as $NPMI(w_{i},w_{j})=\frac{PMI(w_{i},w_{j})}{-\log P(w_{i},w_{j})}$, where $PMI(w_{i},w_{j})=\log\frac{P(w_{i},w_{j})}{P(w_{i})P(w_{j})}$ is the pointwise mutual information, $P(w_{i})$ is the probability of word $w_{i}$ occurring, and $P(w_{i},w_{j})$ is the probability of both words co-occurring.
2. $C_{v}$ Coherence [[2015](https://arxiv.org/html/2606.13852#bib.bib15)] evaluates topic coherence based on NPMI and word co-occurrence. It posits that words within the same topic should frequently co-occur in the document collection.
3. TD (Topic Diversity) [[2019](https://arxiv.org/html/2606.13852#bib.bib3)] measures the diversity of the discovered topics. It is defined as the ratio of unique words among the top-K words across all topics.
4. Quality [[2019](https://arxiv.org/html/2606.13852#bib.bib3)] is a derived metric introduced by [Dieng et al.](https://arxiv.org/html/2606.13852#bib.bib3) to provide a single score reflecting topic quality. It is calculated as the product of coherence and topic diversity. In our experiments, we use $C_{v}$ for coherence, as it offers a more robust evaluation than NPMI.

Higher scores for these metrics are better, with 1 indicating the best possible score. In our implementation, we use the top 10 topic words to compute $C_{v}$ and NPMI, while the top 25 topic words are used for TD, following the settings in the paper with SOTA performance [[2023](https://arxiv.org/html/2606.13852#bib.bib28)]. We employ the Python library “octis” [[2021](https://arxiv.org/html/2606.13852#bib.bib23)] to calculate all the aforementioned metrics.
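For intuition, the pairwise NPMI, topic diversity, and quality scores can be computed by hand on toy inputs, as sketched below. This is not the “octis” implementation used in the experiments. The function names, the handling of word pairs that never co-occur, and all input values (including the coherence of 0.5) are hypothetical.

```python
import math


def npmi(p_i, p_j, p_ij):
    """Normalized PMI of a word pair from marginal and joint probabilities."""
    if p_ij == 0.0:
        return -1.0  # assumed convention: words that never co-occur score -1
    if p_ij == 1.0:
        return 1.0   # both words occur in every document
    pmi = math.log(p_ij / (p_i * p_j))
    return pmi / -math.log(p_ij)


def topic_diversity(topics, top_k):
    """Share of unique words among the top_k words of all topics."""
    top_words = [word for topic in topics for word in topic[:top_k]]
    return len(set(top_words)) / len(top_words)


def quality(coherence, diversity):
    """Product of a coherence score and topic diversity."""
    return coherence * diversity


always_together = npmi(0.5, 0.5, 0.5)    # the two words only ever appear together
independent = npmi(0.5, 0.5, 0.25)       # the two words are statistically independent
td = topic_diversity([["court", "law"], ["law", "team"]], top_k=2)
toy_quality = quality(0.5, td)           # 0.5 is a made-up coherence value
```

The toy cases cover the reference points of the scale: perfectly associated words give an NPMI of 1, independent words give 0, and a word shared between topics lowers TD below 1.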

##### Training settings:

For our experiments, we set the learning rate to 2e\-3 and the batch size to 200, employ the Adam optimizer, and train for 20 epochs\. Each model is trained five times, with the seed values for the “torch\(cuda\)”, “numpy”, and “math” libraries incremented by 1, starting from 42 and ending at 46\. This approach ensures consistent initialization conditions, including the same training set and initial parameters\. We use a single GPU as hardware\. The quantum device is simulated without noise using PennyLane’s “default\.qubit” device, with gradients computed via backpropagation\. Nevertheless, our hybrid model is fully compatible with differentiation methods such as parameter\-shift and finite differences, allowing the quantum components to be executed on either simulated noisy quantum devices or actual quantum hardware\.

##### Testing settings:

After each epoch, we assess the performance of the trained model on the benchmark datasets 20News and AgNews using the metrics $C_{v}$, NPMI, and TD. The 20 topics are derived from the decoder network by applying a softmax function to the topic-word matrix and selecting the top-K words for each topic dimension. Additionally, we save the log data generated during training and evaluation in a “.txt” file. The log file follows this format:

```
Start Training: [date time]
Model: _, Pretrained Model: _
Dataset: [dataset name], Batch: [size], GPU: [ID]
Epoch _, Train Loss: [training loss]
Epoch _, CV: _, NPMI: _, TD: _
End Training: [date time]
```

Any information not specified in this format but included in the log files is considered irrelevant and can be disregarded\.
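A log in this format can be read back with a short parser such as the one below. The regular expression assumes that the `_` placeholders are filled with plain decimal numbers, which the format description does not guarantee. The sample log text and its values are invented for illustration and are not results from the paper.

```python
import re

# Matches lines such as "Epoch 3, CV: 0.61, NPMI: 0.12, TD: 0.95".
EVAL_LINE = re.compile(
    r"Epoch\s+(\d+),\s*CV:\s*([-\d.eE]+),\s*NPMI:\s*([-\d.eE]+),\s*TD:\s*([-\d.eE]+)"
)


def parse_eval_lines(log_text):
    """Extract per-epoch metrics and derive the quality score CV * TD."""
    rows = []
    for line in log_text.splitlines():
        match = EVAL_LINE.search(line)
        if match is None:
            continue  # unrelated lines are ignored
        epoch, cv, npmi_score, td = match.groups()
        rows.append({
            "epoch": int(epoch),
            "cv": float(cv),
            "npmi": float(npmi_score),
            "td": float(td),
            "quality": float(cv) * float(td),
        })
    return rows


# Invented sample log; the numbers are placeholders, not reported results.
sample_log = """Start Training: [date time]
Epoch 1, Train Loss: 512.3
Epoch 1, CV: 0.50, NPMI: 0.10, TD: 0.80
End Training: [date time]
"""
rows = parse_eval_lines(sample_log)
```

Only the evaluation lines are kept; the training-loss lines share the `Epoch` prefix but do not match the pattern.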

## 5 Result Analysis & Discussion

So far, we have outlined the model architectures and experimental settings\. In this section, we analyze and discuss the results from our experiments for the four different models: classical VAE \(SLS\), classical VAE \(LLS\), hybrid VAE \(SLS\), and hybrid VAE \(LLS\)\. Figures [2](https://arxiv.org/html/2606.13852#S5.F2) and [3](https://arxiv.org/html/2606.13852#S5.F3) report the quality scores aggregated over five independent runs, providing a basis for comparing the models’ performance\. Figure [4](https://arxiv.org/html/2606.13852#S5.F4) illustrates the latent spaces of the trained models\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x2.png)

Figure 2: Averages and standard deviations of model quality scores across 5 runs on AgNews during training\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x3.png)

Figure 3: Averages and standard deviations of model quality scores across 5 runs on 20News during training\.

We begin by examining the training dynamics of the models\. Figures [2](https://arxiv.org/html/2606.13852#S5.F2) and [3](https://arxiv.org/html/2606.13852#S5.F3) show the evolution of aggregated quality scores over 20 epochs for AgNews and 20News\. Two consistent trends emerge\. First, the hybrid models exhibit similar performance regardless of the latent space configuration, with curves closely following each other\. Second, while models trained on 20News seem to converge gradually toward an upper plateau, those trained on AgNews reach peak performance rapidly before their scores begin to deteriorate\.

The first trend indicates that the choice of latent space configuration \(SLS vs\. LLS\) has limited influence on the performance of the hybrid models\. Since the two configurations differ only in their measurement strategy, the resulting representations passed from the quantum component to the classical component likely remain similar in informational content\. This stands in contrast to the classical models, where the LLS configuration employs a larger fully connected layer with more parameters, producing output vectors that are not only higher\-dimensional but also richer in information\.

The second trend can be attributed to the disparity in dataset sizes: AgNews contains nearly seven times more training samples than 20News\. This abundance allows models trained on AgNews to reach their optimal performance quickly, after which overfitting causes a decline\. In contrast, models trained on 20News start with lower scores and improve gradually, seeming to progress toward a plateau\. A closer inspection \(Figures [6](https://arxiv.org/html/2606.13852#A1.F6)–[9](https://arxiv.org/html/2606.13852#A1.F9)\) reveals that the decline in quality scores is primarily driven by decreasing $C_v$ coherence scores, whereas topic diversity increases and remains relatively stable\. This pattern suggests that, as training progresses, the models tend to generate less coherent topics, a typical symptom of overfitting\. Notably, this issue appears less severe in hybrid models and is exacerbated by the latent space configuration in classical models\.

Table [2](https://arxiv.org/html/2606.13852#S5) compares our models with the previously reported SOTA results \[[2023](https://arxiv.org/html/2606.13852#bib.bib28)\]\. The reported SOTA scores are averaged over ten runs, whereas our results are averaged over five runs\.

Table 2: Comparison of model performance\. Our results are averaged over five runs\. \* indicates results reported from the paper presenting the SOTA performance \[[2023](https://arxiv.org/html/2606.13852#bib.bib28)\]\.

| Models | AgNews $C_v$ | AgNews NPMI | AgNews TD | 20News $C_v$ | 20News NPMI | 20News TD |
| --- | --- | --- | --- | --- | --- | --- |
| GSM\* | 0\.41 ± 0\.01 | 0\.03 ± 0\.01 | 0\.58 ± 0\.02 | 0\.55 ± 0\.04 | 0\.07 ± 0\.03 | 0\.66 ± 0\.05 |
| vONT\* \(SOTA\) | 0\.49 ± 0\.02 | 0\.054 ± 0\.02 | 0\.99 ± 0\.01 | 0\.69 ± 0\.03 | 0\.16 ± 0\.02 | 0\.96 ± 0\.03 |
| Our models | | | | | | |
| Classical VAE \(SLS\) | 0\.7 ± 0\.02 | 0\.19 ± 0\.02 | 0\.93 ± 0\.02 | 0\.72 ± 0\.02 | 0\.15 ± 0\.01 | 0\.84 ± 0\.01 |
| Classical VAE \(LLS\) | 0\.65 ± 0\.02 | 0\.15 ± 0\.01 | 0\.92 ± 0\.03 | 0\.69 ± 0\.03 | 0\.14 ± 0\.02 | 0\.82 ± 0\.02 |
| Hybrid VAE \(SLS\) | 0\.71 ± 0\.02 | 0\.2 ± 0\.01 | 0\.95 ± 0\.0 | 0\.71 ± 0\.02 | 0\.15 ± 0\.01 | 0\.83 ± 0\.02 |
| Hybrid VAE \(LLS\) | 0\.65 ± 0\.04 | 0\.16 ± 0\.03 | 0\.96 ± 0\.01 | 0\.73 ± 0\.03 | 0\.16 ± 0\.01 | 0\.82 ± 0\.01 |
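Entries of the form "mean ± standard deviation" over several seeded runs can be produced as in the sketch below. The per-run scores are made-up toy values, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since the text does not state which estimator was used.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of per-run scores."""
    return statistics.mean(scores), statistics.stdev(scores)


# Hypothetical scores from five runs with seeds 42 to 46.
toy_scores = [0.60, 0.62, 0.59, 0.63, 0.61]
mean_score, std_score = summarize_runs(toy_scores)
print(f"{mean_score:.2f} +/- {std_score:.2f}")
```

With only five runs, the choice between sample and population standard deviation changes the reported spread noticeably, so it is worth stating explicitly.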

##### Performance on AgNews

All of our models substantially outperform the SOTA in topic coherence on AgNews\. The best\-performing model, Hybrid VAE \(SLS\), achieves a $C_v$ score of 0\.71 and an NPMI of 0\.20, compared to 0\.49 and 0\.054 for the SOTA model\. Topic diversity is slightly lower than the SOTA but remains high across all variants\.

The topic word clouds in Figure [10](https://arxiv.org/html/2606.13852#A1.F10) show that all models consistently identify the dominant topic keywords\. Differences appear mainly in the probability distributions assigned to these words rather than in the discovered topics themselves\.
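The word probabilities behind such word clouds come from the softmax over the topic-word matrix described in the testing settings. A minimal sketch of that extraction step follows, assuming the softmax is taken over the vocabulary axis for each topic; the vocabulary and weights are toy values and the helper names are hypothetical.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    shift = max(row)
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_words(topic_word_matrix, vocab, k):
    """For each topic row, return the k most probable (word, prob) pairs."""
    topics = []
    for row in topic_word_matrix:
        probs = softmax(row)
        ranked = sorted(range(len(vocab)), key=lambda i: probs[i], reverse=True)
        topics.append([(vocab[i], probs[i]) for i in ranked[:k]])
    return topics


# Toy decoder weights: 2 topics over a 4-word vocabulary.
vocab = ["game", "team", "stock", "market"]
weights = [
    [2.0, 1.0, -1.0, 0.0],
    [-1.0, 0.0, 3.0, 2.0],
]
topics = top_k_words(weights, vocab, k=2)
for index, topic in enumerate(topics):
    print(index, [(word, round(prob, 3)) for word, prob in topic])
```

Two models can rank the same words at the top while assigning them different probabilities, which is the kind of difference the word clouds make visible.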

Figure [4](https://arxiv.org/html/2606.13852#S5.F4) reveals an interesting observation: latent space separability does not correlate strongly with topic\-modeling performance\. The figure shows 1,000 randomly sampled latent vectors from three AgNews classes\. The classical VAE \(LLS\) exhibits the clearest inter\-class separation, outperforming the latent space organization reported for the vMF\-based SOTA model\. Yet this geometric clarity fails to translate into higher coherence or diversity scores\. This disconnect suggests that current evaluation metrics capture latent space disentanglement only superficially\. In particular, the topology of the learned latent space appears to correlate only weakly with the final performance metrics\.

For the classical models, the higher\-dimensional latent space appears to facilitate topic disentanglement and class separation\. In contrast, the hybrid models do not benefit from increased latent dimensionality to the same extent\. Despite operating in a high\-dimensional latent space, the outputs of the quantum components remain relatively similar across samples, limiting separability\.

Nevertheless, the hybrid models still achieve reasonable class separation and maintain competitive topic quality\. Topic entanglement is partially mitigated by the learnable scaling vectors applied after the quantum circuit outputs\. These scaling parameters increase output variance and improve class separation\.
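The role of the scaling vectors can be sketched as follows. The snippet assumes the standard Gaussian Softmax construction (a Gaussian sample followed by a linear map and a softmax) with element-wise scaling of two sets of circuit outputs; the exact form of the modified posterior is the one defined earlier in the paper, and every name and number below is hypothetical.

```python
import math
import random


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    shift = max(values)
    exps = [math.exp(v - shift) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def gsm_sample(q_mu, q_logvar, m_mu, m_logvar, topic_map, rng):
    """Sample a topic proportion vector from scaled circuit outputs."""
    # Element-wise learnable scaling widens the range of the bounded outputs.
    mu = [m * q for m, q in zip(m_mu, q_mu)]
    logvar = [m * q for m, q in zip(m_logvar, q_logvar)]
    # Reparameterization: z = mu + sigma * eps with eps ~ N(0, 1).
    z = [u + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for u, lv in zip(mu, logvar)]
    # Linear map from latent dimension to number of topics, then softmax.
    logits = [sum(w * zi for w, zi in zip(row, z)) for row in topic_map]
    return z, softmax(logits)


rng = random.Random(42)
latent_dim, num_topics = 3, 5  # toy sizes; the two need not match
q_mu = [0.2, -0.5, 0.9]        # made-up circuit outputs
q_logvar = [-0.1, 0.3, -0.7]
m_mu = [4.0] * latent_dim      # made-up scaling vectors
m_logvar = [2.0] * latent_dim
topic_map = [[rng.uniform(-1.0, 1.0) for _ in range(latent_dim)] for _ in range(num_topics)]

z, theta = gsm_sample(q_mu, q_logvar, m_mu, m_logvar, topic_map, rng)
print(len(z), len(theta), round(sum(theta), 6))
```

Because the linear map sets the number of topics, the latent dimension produced by the circuit can be chosen independently of it.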

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x4.png)

Figure 4: 2\-D t\-SNE projection of randomly sampled $z$ from latent spaces under 4 different posterior distributions on AgNews\.

##### Performance on 20News

On 20News, our models achieve coherence scores comparable to or slightly better than the SOTA\. However, this improvement comes at the cost of reduced topic diversity\.

Inspection of the topic word clouds \(Figure [12](https://arxiv.org/html/2606.13852#A1.F12)\) indicates that some topics are not captured consistently\. For example, the medical topic generated by the Hybrid VAE \(SLS\) includes unrelated terms such as “bike” and “motorcycle”, indicating residual topic mixing\.

The latent space visualizations in Figure [11](https://arxiv.org/html/2606.13852#A1.F11) mirror the pattern observed on AgNews\. The classical VAE \(LLS\) displays the most pronounced cluster separation, whereas the classical VAE \(SLS\) and the hybrid variants show noticeably weaker structure, with topics entangled near the center\.

##### Discussion

The results support two main conclusions\.

First, hybrid VAEs are viable and trainable for neural topic modeling\. We observe no severe optimization difficulties, significant performance degradation, or evidence of barren plateau effects \[[2018](https://arxiv.org/html/2606.13852#bib.bib10)\]\.

Second, the Gaussian Softmax \(GSM\) posterior enables effective decoupling between the latent representation and the quantum circuit architecture\. This flexibility allows hybrid models to achieve performance comparable to purely classical alternatives\.

However, the hybrid models still struggle to disentangle topics as effectively as the best classical variants\. This limitation is partially alleviated through post\-processing layers applied after the quantum circuit outputs\. In further experiments \(not reported here\), post\-processing fully connected layers did not produce better results than simple learnable scaling vectors $m^{(\mu)}$ and $m^{(\log\sigma^{2})}$, suggesting that the model can optimize a plain parameter vector more straightforwardly\.

The GSM baseline, which also employs the Gaussian Softmax distribution, underperforms considerably relative to our models, likely due to inappropriate activation functions \(softplus instead of tanh\), as well as the absence of batch normalization\.

Finally, although not included in the reported experiments, we evaluated architectures with between zero \(no trainable layer\) and five trainable post\-processing layers\. None of these configurations produced a significant improvement over the reported results\.

## 6 Conclusion

In summary, we have successfully integrated a quantum component into the classical framework of a variational autoencoder\. This was achieved by designing a VQC\-friendly variational autoencoder that uses amplitude encoding and an enhanced Gaussian Softmax distribution\. We minimized the resource requirements for the quantum component to ensure NISQ compatibility\. Consequently, a quantum hardware setup with at least 10 qubits and mild error mitigation can effectively run the quantum circuit instead of relying solely on a quantum simulator\. The measurement outcomes, whether 10 or 32 elements, provide an advantage in execution time and enable precise gradient calculation methods, such as the parameter\-shift rule, which is essential for good approximations on NISQ hardware, even in the presence of noise\. Therefore, aside from the extensive runtime required for training on large datasets, our hybrid VAE could be executed on quantum hardware without significant performance loss\.
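Amplitude encoding stores a classical vector in the amplitudes of a quantum state, so 10 qubits can hold up to 2^10 = 1024 values. The sketch below shows only the classical preprocessing step, L2 normalization with zero padding to the next power of two. The padding strategy is an assumption, and the function name is hypothetical.

```python
import math


def amplitude_encode(features, n_qubits=10):
    """Return a unit-norm amplitude vector of length 2**n_qubits.

    Shorter inputs are zero-padded (an assumption of this sketch);
    longer inputs and all-zero inputs are rejected.
    """
    dim = 2 ** n_qubits
    if len(features) > dim:
        raise ValueError(f"at most {dim} features fit into {n_qubits} qubits")
    padded = list(features) + [0.0] * (dim - len(features))
    norm = math.sqrt(sum(x * x for x in padded))
    if norm == 0.0:
        raise ValueError("cannot encode an all-zero vector")
    return [x / norm for x in padded]


# Small example on 2 qubits so the result is easy to inspect.
state = amplitude_encode([3.0, 4.0], n_qubits=2)
print(state)
print(len(amplitude_encode([1.0] * 1000)))
```

The squared amplitudes sum to one, as required for a valid quantum state.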

Moreover, while developing our hybrid model, we arrived at an architecture that surpasses SOTA results on AgNews by 45%, in both classical and hybrid implementations\. This also shows that latent space dimensionality can be detached from the number of topics to be extracted, although this noticeably affects class separability\. However, we emphasize that the primary aim of a hybrid model is not to surpass classical models, which can always be improved, but rather to serve as a proof of concept for hybrid computation with variational autoencoders that ideally preserves performance\.
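The size of the improvement can be checked against the rounded values in Table 2. The sketch below assumes that the 45% figure refers to the relative gain in $C_v$ coherence on AgNews over the SOTA entry; this reading is an interpretation and is not stated explicitly in the text.

```python
def relative_improvement(ours, baseline):
    """Relative gain of `ours` over `baseline`, as a fraction."""
    return (ours - baseline) / baseline


sota_cv = 0.49  # vONT* (SOTA), AgNews C_v from Table 2
hybrid_gain = relative_improvement(0.71, sota_cv)     # Hybrid VAE (SLS)
classical_gain = relative_improvement(0.70, sota_cv)  # Classical VAE (SLS)

print(f"hybrid:    {hybrid_gain:.1%}")
print(f"classical: {classical_gain:.1%}")
```

On these rounded table values, the hybrid SLS model comes out at roughly 45% and the classical SLS model at roughly 43%.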

Future work could involve training our hybrid VAE on a small dataset using actual quantum hardware and conducting an in\-depth study of the effects of noise on model trainability\.

## Data and Code Availability

## Declarations

##### Conflict of Interest

The authors declare no competing interests\.

##### Generative AI

AI tools such as ChatGPT\-4o and ChatGPT\-5\.5 have been used for grammar and spelling checks, rephrasing, and rewording to improve the writing style\. The generated text has been carefully read and verified to be free of hallucinations\.

## References

- Blei et al \[2001\]Blei D, Ng A, Jordan M \(2001\) Latent dirichlet allocation\. pp 601–608
- Bystrov et al \[2023\]Bystrov V, Naboka\-Krell V, Staszewska\-Bystrova A, et al \(2023\) Analysing the impact of removing infrequent words on topic quality in lda models\. URL[https://arxiv\.org/abs/2311\.14505](https://arxiv.org/abs/2311.14505),[2311\.14505](https://arxiv.org/html/2606.13852v1/2311.14505)
- Dieng et al \[2019\]Dieng AB, Ruiz FJR, Blei DM \(2019\) Topic modeling in embedding spaces\. URL[https://arxiv\.org/abs/1907\.04907](https://arxiv.org/abs/1907.04907),[1907\.04907](https://arxiv.org/html/2606.13852v1/1907.04907)
- Ding et al \[2018\]Ding R, Nallapati R, Xiang B \(2018\) Coherence\-aware neural topic modeling\. URL[https://arxiv\.org/abs/1809\.02687](https://arxiv.org/abs/1809.02687),[1809\.02687](https://arxiv.org/html/2606.13852v1/1809.02687)
- Figurnov et al \[2019\]Figurnov M, Mohamed S, Mnih A \(2019\) Implicit reparameterization gradients\. URL[https://arxiv\.org/abs/1805\.08498](https://arxiv.org/abs/1805.08498),[1805\.08498](https://arxiv.org/html/2606.13852v1/1805.08498)
- Gircha et al \[2023\]Gircha AI, Boev AS, Avchaciov K, et al \(2023\) Hybrid quantum\-classical machine learning for generative chemistry and drug design\. Scientific Reports 13\(1\)\.[10\.1038/s41598\-023\-32703\-4](https://arxiv.org/doi.org/10.1038/s41598-023-32703-4), URL[http://dx\.doi\.org/10\.1038/s41598\-023\-32703\-4](http://dx.doi.org/10.1038/s41598-023-32703-4)
- Khoshaman et al \[2018\]Khoshaman A, Vinci W, Denis B, et al \(2018\) Quantum variational autoencoder\. Quantum Science and Technology 4\(1\):014001\.[10\.1088/2058\-9565/aada1f](https://arxiv.org/doi.org/10.1088/2058-9565/aada1f), URL[http://dx\.doi\.org/10\.1088/2058\-9565/aada1f](http://dx.doi.org/10.1088/2058-9565/aada1f)
- Lang \[1995\]Lang K \(1995\) Newsweeder: Learning to filter netnews\. In: Prieditis A, Russell S \(eds\) Machine Learning Proceedings 1995\. Morgan Kaufmann, San Francisco \(CA\), p 331–339,[https://doi\.org/10\.1016/B978\-1\-55860\-377\-6\.50048\-7](https://arxiv.org/doi.org/https://doi.org/10.1016/B978-1-55860-377-6.50048-7), URL[https://www\.sciencedirect\.com/science/article/pii/B9781558603776500487](https://www.sciencedirect.com/science/article/pii/B9781558603776500487)
- Lau et al \[2014\]Lau JH, Newman D, Baldwin T \(2014\) Machine reading tea leaves: Automatically evaluating topic coherence and topic model quality\. In: Wintner S, Goldwater S, Riezler S \(eds\) Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics\. Association for Computational Linguistics, Gothenburg, Sweden, pp 530–539,[10\.3115/v1/E14\-1056](https://arxiv.org/doi.org/10.3115/v1/E14-1056), URL[https://aclanthology\.org/E14\-1056](https://aclanthology.org/E14-1056)
- McClean et al \[2018\]McClean JR, Boixo S, Smelyanskiy VN, et al \(2018\) Barren plateaus in quantum neural network training landscapes\. Nature Communications 9\(1\)\.[10\.1038/s41467\-018\-07090\-4](https://arxiv.org/doi.org/10.1038/s41467-018-07090-4), URL[http://dx\.doi\.org/10\.1038/s41467\-018\-07090\-4](http://dx.doi.org/10.1038/s41467-018-07090-4)
- Miao et al \[2018\]Miao Y, Grefenstette E, Blunsom P \(2018\) Discovering discrete latent topics with neural variational inference\. URL[https://arxiv\.org/abs/1706\.00359](https://arxiv.org/abs/1706.00359),[1706\.00359](https://arxiv.org/html/2606.13852v1/1706.00359)
- Pennington et al \[2014\]Pennington J, Socher R, Manning C \(2014\) GloVe: Global vectors for word representation\. In: Moschitti A, Pang B, Daelemans W \(eds\) Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing \(EMNLP\)\. Association for Computational Linguistics, Doha, Qatar, pp 1532–1543,[10\.3115/v1/D14\-1162](https://arxiv.org/doi.org/10.3115/v1/D14-1162), URL[https://aclanthology\.org/D14\-1162](https://aclanthology.org/D14-1162)
- Rao et al \[2023\]Rao A, Madan D, Ray A, et al \(2023\) Learning hard distributions with quantum\-enhanced variational autoencoders\. URL[https://arxiv\.org/abs/2305\.01592](https://arxiv.org/abs/2305.01592),[2305\.01592](https://arxiv.org/html/2606.13852v1/2305.01592)
- Rivas et al \[2021\]Rivas P, Zhao L, Orduz J \(2021\) Hybrid quantum variational autoencoders for representation learning\. In: 2021 International Conference on Computational Science and Computational Intelligence \(CSCI\), pp 52–57,[10\.1109/CSCI54926\.2021\.00085](https://arxiv.org/doi.org/10.1109/CSCI54926.2021.00085)
- Röder et al \[2015\]Röder M, Both A, Hinneburg A \(2015\) Exploring the space of topic coherence measures\. In: Proceedings of the Eighth ACM International Conference on Web Search and Data Mining\. Association for Computing Machinery, New York, NY, USA, WSDM ’15, p 399–408,[10\.1145/2684822\.2685324](https://arxiv.org/doi.org/10.1145/2684822.2685324), URL[https://doi\.org/10\.1145/2684822\.2685324](https://doi.org/10.1145/2684822.2685324)
- Romero et al \[2017\]Romero J, Olson JP, Aspuru\-Guzik A \(2017\) Quantum autoencoders for efficient compression of quantum data\. Quantum Science and Technology 2\(4\):045001\.[10\.1088/2058\-9565/aa8072](https://arxiv.org/doi.org/10.1088/2058-9565/aa8072), URL[http://dx\.doi\.org/10\.1088/2058\-9565/aa8072](http://dx.doi.org/10.1088/2058-9565/aa8072)
- Sakhnenko et al \[2022\]Sakhnenko A, O’Meara C, Ghosh KJB, et al \(2022\) Hybrid classical\-quantum autoencoder for anomaly detection\. Quantum Machine Intelligence 4\(2\)\.[10\.1007/s42484\-022\-00075\-z](https://arxiv.org/doi.org/10.1007/s42484-022-00075-z), URL[http://dx\.doi\.org/10\.1007/s42484\-022\-00075\-z](http://dx.doi.org/10.1007/s42484-022-00075-z)
- Sarkar \[2024\]Sarkar S \(2024\) Quantum transfer learning for mnist classification using a hybrid quantum\-classical approach\. URL[https://arxiv\.org/abs/2408\.03351](https://arxiv.org/abs/2408.03351),[2408\.03351](https://arxiv.org/html/2606.13852v1/2408.03351)
- Schofield and Mimno \[2016\]Schofield A, Mimno D \(2016\) Comparing apples to apple: The effects of stemmers on topic models\. Transactions of the Association for Computational Linguistics 4:287–300\.[10\.1162/tacl\_a\_00099](https://arxiv.org/doi.org/10.1162/tacl_a_00099), URL[https://aclanthology\.org/Q16\-1021](https://aclanthology.org/Q16-1021)
- Slabbert and Petruccione \[2024\]Slabbert D, Petruccione F \(2024\) Hybrid quantum\-classical feature extraction approach for image classification using autoencoders and quantum svms\. URL[https://arxiv\.org/abs/2410\.18814](https://arxiv.org/abs/2410.18814),[2410\.18814](https://arxiv.org/html/2606.13852v1/2410.18814)
- Srikumar et al \[2021\]Srikumar M, Hill CD, Hollenberg LCL \(2021\) Clustering and enhanced classification using a hybrid quantum autoencoder\. Quantum Science and Technology 7\(1\):015020\.[10\.1088/2058\-9565/ac3c53](https://arxiv.org/doi.org/10.1088/2058-9565/ac3c53), URL[http://dx\.doi\.org/10\.1088/2058\-9565/ac3c53](http://dx.doi.org/10.1088/2058-9565/ac3c53)
- Srivastava and Sutton \[2017\]Srivastava A, Sutton C \(2017\) Autoencoding variational inference for topic models\. URL[https://arxiv\.org/abs/1703\.01488](https://arxiv.org/abs/1703.01488),[1703\.01488](https://arxiv.org/html/2606.13852v1/1703.01488)
- Terragni et al \[2021\]Terragni S, Fersini E, Galuzzi BG, et al \(2021\) OCTIS: Comparing and optimizing topic models is simple\! In: Gkatzia D, Seddah D \(eds\) Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations\. Association for Computational Linguistics, Online, pp 263–270,[10\.18653/v1/2021\.eacl\-demos\.31](https://arxiv.org/doi.org/10.18653/v1/2021.eacl-demos.31), URL[https://aclanthology\.org/2021\.eacl\-demos\.31](https://aclanthology.org/2021.eacl-demos.31)
- Verdone et al \[2025\]Verdone A, Succetti F, Ceschini A, et al \(2025\) A hybrid quantum\-neural network for heart disease classification\. Biomedical Signal Processing and Control 113:109185\.[10\.1016/j\.bspc\.2025\.109185](https://arxiv.org/doi.org/10.1016/j.bspc.2025.109185), URL[https://www\.sciencedirect\.com/science/article/pii/S1746809425016969](https://www.sciencedirect.com/science/article/pii/S1746809425016969)
- Vuong et al \[2025\]Vuong HT, Le T, Vu T, et al \(2025\) HiCOT: Improving neural topic models via optimal transport and contrastive learning\. In: Findings of the Association for Computational Linguistics: ACL 2025\. Association for Computational Linguistics, Vienna, Austria, pp 13894–13920,[10\.18653/v1/2025\.findings\-acl\.715](https://arxiv.org/doi.org/10.18653/v1/2025.findings-acl.715), URL[https://aclanthology\.org/2025\.findings\-acl\.715/](https://aclanthology.org/2025.findings-acl.715/)
- Xanadu \[2024\]Xanadu I \(2024\) Supported configurations for differentiation methods\.[https://docs\.pennylane\.ai/en/stable/introduction/interfaces\.html](https://docs.pennylane.ai/en/stable/introduction/interfaces.html), accessed: 2024\-09\-19
- Xu \[2023\]Xu W \(2023\) vontss implementation\.[https://github\.com/xuweijieshuai/Neural\-Topic\-Modeling\-vmf](https://github.com/xuweijieshuai/Neural-Topic-Modeling-vmf), accessed: 2024\-09\-13
- Xu et al \[2023\]Xu W, Jiang X, Sengamedu Hanumantha Rao S, et al \(2023\) vontss: vmf based semi\-supervised neural topic modeling with optimal transport\. In: Findings of the Association for Computational Linguistics: ACL 2023\. Association for Computational Linguistics, p 4433–4457,[10\.18653/v1/2023\.findings\-acl\.271](https://arxiv.org/doi.org/10.18653/v1/2023.findings-acl.271), URL[http://dx\.doi\.org/10\.18653/v1/2023\.findings\-acl\.271](http://dx.doi.org/10.18653/v1/2023.findings-acl.271)
- Zhang et al \[2016\]Zhang X, Zhao J, LeCun Y \(2016\) Character\-level convolutional networks for text classification\. URL[https://arxiv\.org/abs/1509\.01626](https://arxiv.org/abs/1509.01626),[1509\.01626](https://arxiv.org/html/2606.13852v1/1509.01626)

## Appendix A Experiment Details

### A\.1 Quantum circuit

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x5.png)

Figure 5: Quantum circuit of the hybrid VAE\.

### A\.2 Result details

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x6.png)

Figure 6: Averages and standard deviations of model $C_v$ scores across 5 runs on AgNews during training\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x7.png)

Figure 7: Averages and standard deviations of model $C_v$ scores across 5 runs on 20News during training\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x8.png)

Figure 8: Averages and standard deviations of model TD scores across 5 runs on AgNews during training\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x9.png)

Figure 9: Averages and standard deviations of model TD scores across 5 runs on 20News during training\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x10.png)

Figure 10: Top 100 word clouds for semantic topics across models on AgNews\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x11.png)

Figure 11: 2\-D t\-SNE projection of randomly sampled $z$ from latent spaces under 20 different posterior distributions on 20News\.

![[Uncaptioned image]](https://arxiv.org/html/2606.13852v1/x12.png)

Figure 12: Top 100 word clouds for semantic topics across models on 20News\.
