Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms

arXiv cs.LG 06/02/26, 04:00 AM Papers
Summary
This paper introduces inner product aware quantization methods that preserve inner products with unseen vectors, developing fast and adaptive algorithms with provable guarantees, achieving 2-10x speedup over prior ASQ methods.
arXiv:2606.00289v1 Announce Type: new Abstract: Quantization is a fundamental tool used to compress datasets, neural network weights, and memory usage in a range of computational tasks. Many downstream applications of vector quantization perform inner products with arbitrary inputs. This motivates the study of inner product aware quantization schemes that approximately preserve inner products with unseen vectors -- in contrast to simply minimizing the mean-squared error. In this work, we formulate objectives that capture natural desiderata and develop adaptive and unbiased quantization methods that approximately preserve inner products with worst-case and average-case inputs. An analysis of these objectives shows a tight connection with the well-studied notion of Adaptive Stochastic Quantization (ASQ). We develop provably fast exact and approximate algorithms for our objectives. Our theoretical results inspire efficient practical algorithms that perform well across a variety of workload distributions. They also lead to practical algorithms for standard ASQ which are 2-10$\times$ faster than prior state-of-the-art methods while maintaining quality. These theoretical and empirical results contribute towards making adaptive quantization techniques more efficient and tractable in practical settings.
# Inner Product Aware Quantization: Provably Fast, Accurate, and Adaptive Algorithms
Source: [https://arxiv.org/html/2606.00289](https://arxiv.org/html/2606.00289)
Nathan White (University of Pennsylvania, nathanlw@cis.upenn.edu) and Krish Singal (University of Pennsylvania, ksingal@seas.upenn.edu)

###### Abstract

Quantization is a fundamental tool used to compress datasets, neural network weights, and memory usage in a range of computational tasks. Many downstream applications of vector quantization perform inner products with arbitrary inputs. This motivates the study of *inner product aware* quantization schemes that approximately preserve inner products with unseen vectors, in contrast to simply minimizing the mean-squared error.

In this work, we formulate objectives that capture natural desiderata and develop *adaptive* and *unbiased* quantization methods that approximately preserve inner products with worst-case and average-case inputs. An analysis of these objectives shows a tight connection with the well-studied notion of Adaptive Stochastic Quantization (ASQ).

We develop provably fast exact and approximate algorithms for our objectives. Our theoretical results inspire efficient practical algorithms that perform well across a variety of workload distributions. They also lead to practical algorithms for standard ASQ which are 2-10$\times$ faster than prior state-of-the-art methods while maintaining quality. These theoretical and empirical results contribute towards making adaptive quantization techniques more efficient and tractable in practical settings.

## 1 Introduction

Vector quantization is central to space and runtime optimization in a wide variety of computational and machine learning applications. For example, quantization is used in dataset compression for vector search GHKS ([13](https://arxiv.org/html/2606.00289#bib.bib15)); GGX+ ([25](https://arxiv.org/html/2606.00289#bib.bib14)), compression of large model weights and key-value caches SZY+ ([23](https://arxiv.org/html/2606.00289#bib.bib27)), quantization-aware training MNA+ ([17](https://arxiv.org/html/2606.00289#bib.bib22)), and post-training quantization FAHA ([23](https://arxiv.org/html/2606.00289#bib.bib9)); JLPK ([23](https://arxiv.org/html/2606.00289#bib.bib20)).

Formally, we define vector quantization as follows. Consider $w \in \mathbb{R}^{d}$ and let $Q \subset \mathbb{R}$ be a set of *quantization values* such that $|Q| = s \ll d$. Then, the vector $w$ can be *quantized* via some *rounding distribution* $\mathcal{D}$ with $\text{Supp}(\mathcal{D}) \subseteq Q^{d}$. In this work, we consider *unbiased* and *adaptive* quantization schemes. A quantization scheme is unbiased if $\mathbf{E}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}}[\widehat{\boldsymbol{w}}] = w$ (we use boldface to denote random variables), while an adaptive scheme is one where both $Q$ and $\mathcal{D}$ may depend on $w$. In full generality, we wish to jointly optimize over the tuple $(Q, \mathcal{D})$ according to some suitable objective function.

Much prior work on quantization considers schemes which are either non-adaptive/weakly adaptive or biased; for example, schemes such as QSGD AGL+ ([17](https://arxiv.org/html/2606.00289#bib.bib1)) and NUQSGD RKFM+ ([21](https://arxiv.org/html/2606.00289#bib.bib25)) only use superficial properties of the vectors such as their norm or aspect ratio. Other works FHH+ ([20](https://arxiv.org/html/2606.00289#bib.bib11)); FTM+ ([20](https://arxiv.org/html/2606.00289#bib.bib13)) are tailored to vectors which come from common, pre-specified distributions, but do not adapt to the specific input vector. For example, there is evidence that important and relevant data in ML applications follow LogNormal CBUS+ ([20](https://arxiv.org/html/2606.00289#bib.bib8)), Normal BNS ([19](https://arxiv.org/html/2606.00289#bib.bib6)), or sub-Weibull distributions VAM ([18](https://arxiv.org/html/2606.00289#bib.bib28)). On the other hand, techniques such as Round-to-Nearest (RTN), which are biased and non-adaptive, are known to be outperformed by adaptive methods on post-training quantization NAVB+ ([20](https://arxiv.org/html/2606.00289#bib.bib24)).

A line of work on *adaptive stochastic quantization* (ASQ) in BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)); ZLK+ ([17](https://arxiv.org/html/2606.00289#bib.bib32)); FTM+ ([20](https://arxiv.org/html/2606.00289#bib.bib13)) studies adaptive and unbiased quantization schemes, and notes substantial improvements over other types of quantization (see Figure 1 of BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4))). In ASQ, the goal is to construct a quantization set which minimizes the mean-squared error

$$\mathsf{MSE}(\widehat{\boldsymbol{w}}, w) \triangleq \mathbf{E}\left[\lVert \widehat{\boldsymbol{w}} - w \rVert_{2}^{2}\right].$$

The rounding distribution from which $\widehat{\boldsymbol{w}}$ is sampled is simple and natural: round each $w_i$ independently to either the closest quantization point larger than $w_i$ or the closest one smaller than $w_i$, in the unique unbiased manner. We refer to this rounding distribution as *standard stochastic quantization*, and denote the resulting distribution $\mathcal{D}^{\mathsf{SSQ}}(w, Q)$ (when clear from context, we simply write $\mathcal{D}^{\mathsf{SSQ}}$); we give a formal definition in Section [1.1](https://arxiv.org/html/2606.00289#S1.SS1). Standard stochastic quantization is a natural rounding distribution choice, as for each coordinate $i$, it minimizes $\mathbf{Var}[\widehat{\boldsymbol{w}}_i]$ across all unbiased distributions (see Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2) for a proof of this fact).

While MSE is a natural objective to optimize, many downstream applications of quantization rely on more than simply low $\ell_2$ distance. For instance, many modern applications take inner products of quantized vectors with other, potentially arbitrary vectors (as is the case in both neural network weight quantization and vector search). Thus, the success of quantization in these applications heavily depends on its ability to preserve inner products well.

In this work, we introduce and study notions of *inner product aware* quantization objectives which aim to approximately preserve inner products. Because we consider unbiased quantization schemes where $\mathbf{E}[\widehat{\boldsymbol{w}}] = w$, linearity of expectation gives that for any vector $x \in \mathbb{R}^{d}$, $\mathbf{E}[\langle \widehat{\boldsymbol{w}}, x \rangle] = \langle w, x \rangle$. So, minimizing the expected (squared) error of $\langle \widehat{\boldsymbol{w}}, x \rangle$ reduces to minimizing $\mathbf{Var}[\langle \widehat{\boldsymbol{w}}, x \rangle]$. In our first objective, we minimize the variance over the worst-case choice of $x$.
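The reduction from squared error to variance can be spelled out in one line. The step below uses only unbiasedness, i.e. that the mean of $\langle \widehat{\boldsymbol{w}}, x \rangle$ equals $\langle w, x \rangle$; no independence across coordinates is needed at this point.

$$\mathbf{E}\left[\left(\langle \widehat{\boldsymbol{w}}, x \rangle - \langle w, x \rangle\right)^{2}\right] = \mathbf{E}\left[\left(\langle \widehat{\boldsymbol{w}}, x \rangle - \mathbf{E}[\langle \widehat{\boldsymbol{w}}, x \rangle]\right)^{2}\right] = \mathbf{Var}[\langle \widehat{\boldsymbol{w}}, x \rangle].$$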

###### Definition 1.1 (Maximum Directional Variance ($\mathsf{MDV}$)).

For a vector $w \in \mathbb{R}^{d}$ and target quantization set size $s \in \mathbb{N}$, we define

$$\mathsf{MDV}(w, s) \triangleq \min_{Q \subset \mathbb{R} : |Q| \leq s} \; \max_{x \in \mathbb{R}^{d} : \|x\|_{2} \leq 1} \; \mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}[\langle \widehat{\boldsymbol{w}}, x \rangle]$$

In practice, input vectors $x$ are often not worst-case, and instead come from some known (or estimated) distribution. As such, we also consider minimizing the average variance over some (known) distribution of vectors $x$.

###### Definition 1.2 (Average Directional Variance ($\mathsf{ADV}$)).

For a vector $w \in \mathbb{R}^{d}$, target quantization set size $s \in \mathbb{N}$, and input distribution $\mathcal{X}$ over $\mathbb{R}^{d}$, define

$$\mathsf{ADV}_{\mathcal{X}}(w, s) \triangleq \min_{Q \subset \mathbb{R} : |Q| \leq s} \; \mathbf{E}_{\boldsymbol{x} \sim \mathcal{X}}\left[\mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}\left[\langle \widehat{\boldsymbol{w}}, \boldsymbol{x} \rangle\right]\right] \tag{1}$$

We note that these objectives fix standard stochastic quantization as the rounding distribution and optimize over quantization sets. While this is a natural and standard distribution choice, one may wonder if a better choice of rounding distribution could lead to lower inner product variance. We show that this is unlikely to be a fruitful endeavor: for $\mathsf{MDV}$, we prove that standard stochastic quantization is in fact the optimal rounding distribution (Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2)) and for $\mathsf{ADV}$ we show that it is NP-Hard to compute the optimal distribution (Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2)).

For both of these objectives, we design algorithms which are provably fast and obtain a solution within a $(1+\varepsilon)$ factor of the optimal objective cost, for a user-specified $\varepsilon > 0$. See Section [3](https://arxiv.org/html/2606.00289#S3) for a discussion of our algorithmic results.

We also implement and evaluate practical versions of these algorithms, and show they outperform previous algorithms for adaptive quantization. In particular, due to a close connection between $\mathsf{ADV}$ and traditional ASQ (see Section [1.1](https://arxiv.org/html/2606.00289#S1.SS1)), we employ our techniques to develop algorithms that significantly outperform the current state-of-the-art QUIVER algorithm BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)); see Figure [1](https://arxiv.org/html/2606.00289#S1.F1). In all plots and tables, we denote our novel algorithms with an asterisk ($\ast$).

![Refer to caption](https://arxiv.org/html/2606.00289v1/intro-plot.png)

![Refer to caption](https://arxiv.org/html/2606.00289v1/intro-plots-s.png)

Figure 1: Runtime of the previous state-of-the-art algorithm for traditional ASQ compared with our faster algorithm (Wilber with Warm-Start Binary Search), on vectors drawn from $\text{LogNormal}(0,1)$. On the left, we fix $s = 64$ and vary the vector size $d$, while on the right we fix $d = 500{,}000$ and vary $s$. The shaded regions show the 10-90% range of runtimes across sampled vectors. Figure [3](https://arxiv.org/html/2606.00289#S4.F3) is similar, with additional methods compared as well.

Since the exact algorithm is a core subroutine of the approximate QUIVER algorithm of BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)), we immediately obtain speedups for this as well. More generally, any downstream task using QUIVER as a subroutine (e.g., BBBIMV ([25](https://arxiv.org/html/2606.00289#bib.bib5))) can be sped up using our implementation. See Section [4](https://arxiv.org/html/2606.00289#S4) for the details of our empirically fast algorithms.

Because adaptive methods come with an inherent trade\-off \(higher quality quantization at the cost of slower pre\-processing time\), one of the main impacts of our work is the use of theoretical insights to advance adaptive techniques towards efficiency in practical settings\.

### 1.1 Preliminaries

##### Standard Stochastic Quantization\.

For a given vector $w \in \mathbb{R}^{d}$ and quantization set $Q$, define the *standard stochastic quantization* rounding distribution $\mathcal{D}^{\mathsf{SSQ}}(w, Q)$ to be the distribution which independently rounds each $w_i$ such that for $\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)$,

$$\widehat{\boldsymbol{w}}_i = \begin{cases} w_i^{\uparrow} & \text{w.p. } (w_i - w_i^{\downarrow}) / (w_i^{\uparrow} - w_i^{\downarrow}) \\ w_i^{\downarrow} & \text{w.p. } (w_i^{\uparrow} - w_i) / (w_i^{\uparrow} - w_i^{\downarrow}) \end{cases}$$

where $w_i^{\uparrow} := \min\{q \in Q : q \geq w_i\}$ and $w_i^{\downarrow} := \max\{q \in Q : q \leq w_i\}$ are the values of $Q$ closest to $w_i$ from above and below, respectively. That is, the closer $w_i$ is to $w_i^{\uparrow}$, the more likely it is rounded up. If $w_i \in Q$, then $w_i^{\uparrow} = w_i^{\downarrow} = w_i$ and the coordinate is left unchanged. Note that $\mathbf{E}[\widehat{\boldsymbol{w}}] = w$ and $\mathbf{Var}[\widehat{\boldsymbol{w}}_i] = (w_i^{\uparrow} - w_i)(w_i - w_i^{\downarrow})$ for all $i \in [d]$. For notational ease, we often write $\mathsf{VarSSQ}(w_i, Q) := \mathbf{Var}[\widehat{\boldsymbol{w}}_i] = (w_i^{\uparrow} - w_i)(w_i - w_i^{\downarrow})$.
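As an illustration of the definition, here is a minimal Python sketch of standard stochastic quantization. It rounds each coordinate up with probability $(w_i - w_i^{\downarrow})/(w_i^{\uparrow} - w_i^{\downarrow})$, which is the choice that makes the mean equal $w_i$. The function names `ssq_round` and `var_ssq` are hypothetical (they are not taken from the paper's code), and the sketch assumes every coordinate lies between the smallest and largest quantization value.

```python
import bisect
import random


def _neighbours(wi, q_sorted):
    """Return (down, up): the closest quantization values below/above wi."""
    if not q_sorted[0] <= wi <= q_sorted[-1]:
        raise ValueError("this sketch assumes min(Q) <= w_i <= max(Q)")
    up = q_sorted[bisect.bisect_left(q_sorted, wi)]        # min{q in Q : q >= wi}
    down = q_sorted[bisect.bisect_right(q_sorted, wi) - 1]  # max{q in Q : q <= wi}
    return down, up


def ssq_round(w, Q, rng):
    """Draw one sample from the standard stochastic quantization of w onto Q."""
    q_sorted = sorted(Q)
    out = []
    for wi in w:
        down, up = _neighbours(wi, q_sorted)
        if up == down:  # wi is itself a quantization value
            out.append(wi)
            continue
        p_up = (wi - down) / (up - down)  # unique probability giving E[w_hat_i] = wi
        out.append(up if rng.random() < p_up else down)
    return out


def var_ssq(wi, Q):
    """Closed-form per-coordinate variance (up - wi) * (wi - down)."""
    down, up = _neighbours(wi, sorted(Q))
    return (up - wi) * (wi - down)
```

Averaging many samples of `ssq_round(w, Q, random.Random(0))` should approach `w` coordinate by coordinate, and the empirical variance of coordinate `i` should approach `var_ssq(w[i], Q)`.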

##### Properties of $\mathsf{MDV}$ and $\mathsf{ADV}$.

Here, we make some simple but useful observations about the structure of the $\mathsf{MDV}$ and $\mathsf{ADV}$ objectives. We first introduce notational shorthands for the costs of each objective for a fixed quantization set. We write

$$\mathsf{MDV}(w, Q) \triangleq \max_{x \in \mathbb{R}^{d} : \|x\|_{2} \leq 1} \mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}[\langle \widehat{\boldsymbol{w}}, x \rangle] \quad \text{and} \quad \mathsf{ADV}_{\mathcal{X}}(w, Q) \triangleq \mathbf{E}_{\boldsymbol{x} \sim \mathcal{X}}\left[\mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}[\langle \widehat{\boldsymbol{w}}, \boldsymbol{x} \rangle]\right].$$

$\mathsf{MDV}$ has a simple combinatorial structure; namely,

$$\begin{aligned} \mathsf{MDV}(w, Q) &= \max_{x \in \mathbb{R}^{d} : \|x\|_{2} \leq 1} \left[\mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}[\langle \widehat{\boldsymbol{w}}, x \rangle]\right] \\ &= \max_{x \in \mathbb{R}^{d} : \|x\|_{2} \leq 1} \left[\sum_{i=1}^{d} x_i^{2} \, \mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}[\widehat{\boldsymbol{w}}_i]\right] && \text{(independence of } \mathcal{D}^{\mathsf{SSQ}}\text{)} \\ &= \max_{j \in [d]} \mathsf{VarSSQ}(w_j, Q). \end{aligned}$$

So, the worst-case input vector $x$ is the standard unit basis vector $e_i$ where $i$ is such that $\mathsf{VarSSQ}(w_i, Q)$ is maximized. For $\mathsf{ADV}$, consider a distribution $\mathcal{X}$ supported on $\mathbb{R}^{d}$ and let $\lambda_i \triangleq \mathbf{E}_{\boldsymbol{x} \sim \mathcal{X}}[\boldsymbol{x}_i^{2}]$. Then,

$$\begin{aligned} \mathsf{ADV}_{\mathcal{X}}(w, s) &= \min_{Q \subset \mathbb{R} : |Q| \leq s} \mathbf{E}_{\boldsymbol{x} \sim \mathcal{X}}\left[\mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}^{\mathsf{SSQ}}(w, Q)}\left[\langle \widehat{\boldsymbol{w}}, \boldsymbol{x} \rangle\right]\right] \\ &= \min_{Q \subset \mathbb{R} : |Q| \leq s} \mathbf{E}_{\boldsymbol{x} \sim \mathcal{X}}\left[\sum_{i \in [d]} \boldsymbol{x}_i^{2} \cdot \mathsf{VarSSQ}(w_i, Q)\right] \\ &= \min_{Q \subset \mathbb{R} : |Q| \leq s} \sum_{i \in [d]} \lambda_i \cdot \mathsf{VarSSQ}(w_i, Q) \end{aligned}$$

which is exactly the weighted MSE. This connection between quantization for average-case inner product preservation and the (weighted) $\mathsf{MSE}$ objective also provides one possible theoretical explanation for the empirical success of Adaptive Stochastic Quantization for downstream applications in ML and vector search.
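The two closed forms above make both costs easy to evaluate for a fixed quantization set. The following Python sketch computes $\mathsf{MDV}(w, Q)$ as a maximum of per-coordinate variances and $\mathsf{ADV}_{\mathcal{X}}(w, Q)$ as a $\lambda$-weighted sum. The function names are hypothetical, `lam` stands for the vector of second moments $\lambda_i$ (assumed to be given), and $Q$ is assumed to cover the range of $w$.

```python
import bisect


def var_ssq(wi, q_sorted):
    """(up - wi) * (wi - down) for the neighbours of wi in the sorted set Q."""
    up = q_sorted[bisect.bisect_left(q_sorted, wi)]
    down = q_sorted[bisect.bisect_right(q_sorted, wi) - 1]
    return (up - wi) * (wi - down)


def directional_variance(w, Q, x):
    """Var[<w_hat, x>] = sum_i x_i^2 * VarSSQ(w_i, Q), using independence."""
    q = sorted(Q)
    return sum(xi * xi * var_ssq(wi, q) for wi, xi in zip(w, x))


def mdv_cost(w, Q):
    """Worst case over unit vectors x: the largest per-coordinate variance."""
    q = sorted(Q)
    return max(var_ssq(wi, q) for wi in w)


def adv_cost(w, Q, lam):
    """Average case: weighted MSE with weights lam[i] = E[x_i^2]."""
    q = sorted(Q)
    return sum(li * var_ssq(wi, q) for wi, li in zip(w, lam))
```

For example, with `w = [0.3, 0.5, 1.9]` and `Q = [0.0, 1.0, 2.0]` the per-coordinate variances are 0.21, 0.25 and 0.09, so the worst-case direction is the second basis vector and `mdv_cost(w, Q)` is 0.25. With all weights equal to 1, `adv_cost` is the sum of the three variances, i.e. the unweighted MSE.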

## 2 Background and Motivation

![Refer to caption](https://arxiv.org/html/2606.00289v1/motivation_plot_lognormal.png)

![Refer to caption](https://arxiv.org/html/2606.00289v1/inner_prod_error.png)

Figure 2: (a) Average variance $\mathsf{VarSSQ}[w_i]$ of the worst $k$ coordinates on vectors $w$ drawn from $\text{LogNormal}(0,1)$ over 100 trials. (b) Average absolute inner product error on vectors $w$ drawn from a mixture of 16 Gaussians with variance 10 and means separated by $10^{5}$. In both plots, the shaded regions show the 10-90% range of average variances.

We now briefly discuss the merits of unbiased and adaptive quantization methods. Methods such as NUQSGD and QSGD RKFM+ ([21](https://arxiv.org/html/2606.00289#bib.bib25)); AGL+ ([17](https://arxiv.org/html/2606.00289#bib.bib1)) are unbiased but only very weakly adaptive; they choose the quantization set $Q$ only using information about the range of the entries in $w$. Round-to-Nearest is both non-adaptive and biased, and there is evidence NAVB+ ([20](https://arxiv.org/html/2606.00289#bib.bib24)) that post-training quantization using RTN is significantly outperformed by adaptive techniques. We note that ZDHM ([25](https://arxiv.org/html/2606.00289#bib.bib31)) study quantization under the objective of worst-case inner product preservation. However, they work in the non-adaptive setting, where the quantization set cannot depend on the vector being quantized.

Figure 1 in BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)) illustrates the superiority of unbiased and adaptive techniques when minimizing the mean-squared error. As noted in the introduction, the worst-case variance $\mathbf{Var}[\langle \widehat{\boldsymbol{w}}, x \rangle]$ is realized by the standard basis vector $e_i$ where $i$ is such that $\mathsf{VarSSQ}[w_i]$ is largest. Figure [2](https://arxiv.org/html/2606.00289#S2.F2)(a) shows that, empirically, optimizing $\mathsf{MDV}$ actually minimizes the worst variances beyond the single worst coordinate, when compared against a solution to $\mathsf{ADV}$. Indeed, on at least the 500 worst coordinates of a 1,000,000-dimensional vector (i.e. the worst $0.05\%$), the optimal quantization set for $\mathsf{MDV}$ has smaller variance than that for $\mathsf{ADV}$. Figure [2](https://arxiv.org/html/2606.00289#S2.F2)(b) further gives evidence for the benefits of using unbiased and adaptive methods for worst-case inner product preservation.

In Table [2](https://arxiv.org/html/2606.00289#A7.T2) (see Appendix [G](https://arxiv.org/html/2606.00289#A7)), we present preliminary evidence that our methods have improved tail performance over the popular technique of product quantization (PQ) for certain types of vector search (namely maximum inner product search). While $\mathsf{ADV}$ obtains recall comparable to PQ across both the average and tail events for $\ell_2$ search, it achieves significantly higher recall on the worst 0.1% of queries tested for maximum inner product search. These results, while limited in scope, suggest that explicitly optimizing for inner product preservation results in better quantization performance, as PQ is not tailored to inner product preservation.

## 3 Theoretical Results

### 3.1 Optimizing the Rounding Distribution

Our main objectives $\mathsf{MDV}$ and $\mathsf{ADV}$ use a fixed rounding distribution (namely, standard stochastic quantization). In this section, we outline the theoretical justification for the use of this distribution. We define new objectives for the problem of optimizing the rounding distribution given a fixed quantization set. In particular, we state Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2), which shows that standard stochastic quantization is optimal for minimizing worst-case inner product variance, and Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2), which shows it is NP-Hard to find the optimal distribution for minimizing average-case inner product variance.

Given a vector $w \in \mathbb{R}^{d}$ and quantization set $Q \subseteq \mathbb{R}$, we let $\Omega$ denote the set of rounding distributions $\mathcal{D}$ such that $\text{Supp}(\mathcal{D}) \subseteq Q^{d}$ and $\mathbf{E}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}}[\widehat{\boldsymbol{w}}] = w$. As inner products are linear and $\mathcal{D} \in \Omega$ is unbiased, it is easy to see that for any vector $x \in \mathbb{R}^{d}$ and $\mathcal{D} \in \Omega$, $\mathbf{E}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}}[\langle \widehat{\boldsymbol{w}}, x \rangle] = \langle w, x \rangle$. We now introduce the new objectives.

###### Definition 3.1 ($\mathsf{WDDV}$, Worst-Case Distributional Directional Variance).

For a vector $w \in \mathbb{R}^{d}$ and a set $Q \subset \mathbb{R}$, we define

$$\mathsf{WDDV}(w, Q) = \operatorname*{argmin}_{\mathcal{D} \in \Omega} \; \max_{x \in \mathbb{R}^{d}} \; \mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}}[\langle \widehat{\boldsymbol{w}}, x \rangle].$$

###### Definition 3.2 ($\mathsf{ADDV}$, Average-Case Distributional Directional Variance).

For a vector $w \in \mathbb{R}^{d}$, a set $Q \subset \mathbb{R}$, and an input distribution $\mathcal{X}$ over $\mathbb{R}^{d}$, we define

$$\mathsf{ADDV}_{\mathcal{X}}(w, Q) = \operatorname*{argmin}_{\mathcal{D} \in \Omega} \; \mathbf{E}_{\boldsymbol{x} \sim \mathcal{X}}\left[\mathbf{Var}_{\widehat{\boldsymbol{w}} \sim \mathcal{D}}[\langle \widehat{\boldsymbol{w}}, \boldsymbol{x} \rangle]\right]$$

In Appendix [B](https://arxiv.org/html/2606.00289#A2), we prove Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2): standard stochastic quantization is optimal for $\mathsf{WDDV}$. In contrast, in Appendix [C](https://arxiv.org/html/2606.00289#A3), we prove Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2): finding the optimal rounding distribution for $\mathsf{ADDV}$ is NP-Hard.

###### Theorem (optimality of standard stochastic quantization).

For every vector $w \in \mathbb{R}^{d}$ and quantization set $Q \subset \mathbb{R}$, the rounding distribution $\mathcal{D}^{\mathsf{SSQ}}(w, Q)$ minimizes $\mathsf{WDDV}(w, Q)$.

###### Theorem (NP-hardness of $\mathsf{ADDV}$).

There exists a vector $w \in \mathbb{R}^{d}$, quantization set $Q \subset \mathbb{R}$, and Gaussian input distribution $\mathcal{X}$ such that computing $\mathsf{ADDV}_{\mathcal{X}}(w, Q)$ is NP-Hard.

The proof of the optimality result (Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2)) proceeds by transforming an arbitrary distribution into $\mathcal{D}^{\mathsf{SSQ}}$ and arguing that this transformation cannot have increased the worst-case variance. The NP-hardness result (Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2)) follows by a reduction from Maximum Cut on unweighted graphs. These results further motivate the use of standard stochastic quantization as the fixed rounding distribution in $\mathsf{MDV}$ and $\mathsf{ADV}$.

### 3.2 Algorithms for $\mathsf{ADV}$

#### 3.2.1 Exact Algorithms

In the recent literature, ZLK+ ([17](https://arxiv.org/html/2606.00289#bib.bib32)) first defined the standard ASQ problem and presented a simple dynamic programming algorithm with runtime $O(d^{2} s)$ and space $O(d^{2})$. BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)) then presented an algorithm named QUIVER, which couples the dynamic programming solution with efficient matrix-searching techniques to solve the standard ASQ problem. This improved algorithm has runtime $O(d \log d + ds)$ and uses $O(ds)$ space to output the optimal quantization set.
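To make the dynamic programming idea concrete, here is a hedged Python sketch of an $O(d^{2} s)$ dynamic program for the unweighted ASQ objective. It is an illustration in the spirit of the simple algorithm described above, not a reproduction of the cited implementations. It relies on an assumption not stated in this section: that it suffices to choose quantization values among the entries of $w$, with the smallest and largest entry always included. The names `asq_dp` and `ssq_cost` are hypothetical.

```python
import bisect


def ssq_cost(w, Q):
    """Sum of per-coordinate SSQ variances of w for the quantization set Q."""
    q = sorted(Q)
    total = 0.0
    for wi in w:
        up = q[bisect.bisect_left(q, wi)]
        down = q[bisect.bisect_right(q, wi) - 1]
        total += (up - wi) * (wi - down)
    return total


def asq_dp(w, s):
    """Return (cost, Q) with |Q| <= s, choosing Q among the entries of w."""
    x = sorted(w)
    d = len(x)
    if d < 2 or s < 2:
        raise ValueError("this sketch assumes d >= 2 and s >= 2")
    s = min(s, d)
    # Prefix sums of x and x^2 give each interval cost in O(1).
    p1, p2 = [0.0], [0.0]
    for v in x:
        p1.append(p1[-1] + v)
        p2.append(p2[-1] + v * v)

    def cost(i, j):
        # Sum over i < k < j of (x[j] - x[k]) * (x[k] - x[i]).
        n = j - i - 1
        s1 = p1[j] - p1[i + 1]
        s2 = p2[j] - p2[i + 1]
        return (x[i] + x[j]) * s1 - s2 - n * x[i] * x[j]

    inf = float("inf")
    # best[t][j]: cheapest way to cover x[0..j] with t values, ending at x[j].
    best = [[inf] * d for _ in range(s + 1)]
    back = [[-1] * d for _ in range(s + 1)]
    best[1][0] = 0.0
    for t in range(2, s + 1):
        for j in range(t - 1, d):
            for i in range(t - 2, j):
                c = best[t - 1][i] + cost(i, j)
                if c < best[t][j]:
                    best[t][j], back[t][j] = c, i
    t = min(range(2, s + 1), key=lambda m: best[m][d - 1])
    total, idx, j = best[t][d - 1], [], d - 1
    while t >= 1:  # walk the back-pointers from the last value to the first
        idx.append(j)
        j, t = back[t][j], t - 1
    return total, [x[i] for i in reversed(idx)]
```

The triple loop performs $O(d^{2})$ work for each of the $s$ levels. This sketch stores only $O(ds)$ table entries because interval costs are recomputed from prefix sums rather than stored in a $d \times d$ matrix.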

In a different direction, the survey paper of GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) outlines many results from the $1$-dimensional $k$-means clustering literature. Since the $k$-means cost is a lower bound on the MSE of any quantization scheme (in fact, if one allows biased quantization, the optimal scheme is to compute the $k$-means clustering and assign each element of $w$ to the nearest cluster), it is perhaps intuitive that $k$-means has many similarities with minimizing the MSE of an (unbiased) quantization scheme. Indeed, the algorithms mentioned in GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) can be easily modified to give algorithms for the standard ASQ problem. For example, in the context of $1$-dimensional $k$-means, the dynamic programming and matrix-searching algorithm (Corollary [D.6](https://arxiv.org/html/2606.00289#A4.Thmlemma6) of BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4))) first appeared in Wu ([91](https://arxiv.org/html/2606.00289#bib.bib30)).

More generally, GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) survey a spectrum of theoretical results for $k$-means that together cover the state of the art across differing regimes of $k$ and $d$. Importantly for our work, these algorithms can all be applied to any problem whose structure satisfies the Concave Monge property (Definition [A.4](https://arxiv.org/html/2606.00289#A1.Thmdefinition4)), including but not limited to $\mathsf{ADV}$ and $1$-dimensional $k$-median clustering in Euclidean space.

A key primitive in these algorithms is finding the shortest path with exactly $k$ edges in a directed acyclic graph (DAG) whose weights satisfy the Concave Monge property. For this problem, AST ([93](https://arxiv.org/html/2606.00289#bib.bib3)) gives (i) an algorithm with runtime $O(d\sqrt{k \log d})$, and (ii) an algorithm with runtime $O(d \log \Delta)$ where $\Delta$ is the difference between the min and max weight. When $k = \Omega(\log d)$, Sch ([98](https://arxiv.org/html/2606.00289#bib.bib26)) gives an algorithm with runtime $d \, 2^{O(\sqrt{\log\log d \, \log k})}$. At a high level, these algorithms work by solving the following *regularized* version of the problem:

$$\min_{\{i_1, \ldots, i_\ell\} \subseteq [d]} \; \sum_{j=1}^{\ell-1} C[i_j, i_{j+1}] + \tau \ell$$

where $C[i, j]$ is the weight of edge $(i, j)$. That is, we remove the hard constraint of using exactly $k$ edges, and instead add a penalty term of $\tau$ per edge to the cost function. By iterating through values of $\tau$ appropriately, we can then solve the unregularized (i.e. length constrained) variant. In solving this regularized problem, AST ([93](https://arxiv.org/html/2606.00289#bib.bib3)); Sch ([98](https://arxiv.org/html/2606.00289#bib.bib26)) use ideas and results from Wil ([88](https://arxiv.org/html/2606.00289#bib.bib29)).
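The regularization idea can be illustrated with a deliberately naive Python sketch. It is not the algorithm of the cited works: it solves each regularized instance with a plain quadratic dynamic program instead of the fast Concave Monge machinery, and it uses a simple binary search over $\tau$. It assumes the path must start at the first index and end at the last, that `x` is sorted, and that $C[i, j]$ is the ASQ interval cost; all function names are hypothetical. If the regularized optimum happens to use exactly $k$ indices, that path is also optimal among all paths with $k$ indices, which is why searching over $\tau$ is enough. The search can fail to hit $k$ when several path lengths tie at the same $\tau$, in which case the sketch returns `None`.

```python
def interval_cost(x, i, j):
    """C[i, j]: SSQ variance of the points strictly between x[i] and x[j]."""
    return sum((x[j] - x[k]) * (x[k] - x[i]) for k in range(i + 1, j))


def regularized_path(x, tau):
    """Minimize sum_j C[i_j, i_{j+1}] + tau * ell over index paths 0 -> d-1."""
    d = len(x)
    f = [float("inf")] * d  # f[j]: best penalized cost of a path ending at j
    prev = [-1] * d
    f[0] = tau  # the first index already counts towards ell
    for j in range(1, d):
        for i in range(j):
            c = f[i] + interval_cost(x, i, j) + tau
            if c < f[j]:
                f[j], prev[j] = c, i
    path, j = [], d - 1
    while j != -1:
        path.append(j)
        j = prev[j]
    path.reverse()
    cost = sum(interval_cost(x, a, b) for a, b in zip(path, path[1:]))
    return cost, path  # unpenalized cost and the chosen indices


def solve_with_k(x, k, iters=60):
    """Binary search on tau until the regularized optimum uses k indices."""
    lo, hi = 0.0, interval_cost(x, 0, len(x) - 1) + 1.0
    for _ in range(iters):
        tau = (lo + hi) / 2
        cost, path = regularized_path(x, tau)
        if len(path) == k:
            return cost, path
        if len(path) > k:
            lo = tau  # too many indices: penalize more
        else:
            hi = tau  # too few indices: penalize less
    return None
```

For `x = [0.0, 1.0, 3.0, 4.0]`, calling `solve_with_k(x, 3)` finds a path with three indices and cost 2.0. A larger penalty yields fewer indices, which is what makes the binary search direction well defined.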

These algorithms can be directly applied to $\operatorname{\mathsf{ADV}}$ since its structure is Concave Monge (see Lemma [D.3](https://arxiv.org/html/2606.00289#A4.Thmlemma3)). As we show in Section [4](https://arxiv.org/html/2606.00289#S4), an out-of-the-box implementation by GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) of algorithm (ii) from above outperforms QUIVER from BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)).

#### 3.2.2 Approximation Algorithms

An additional result of BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)) is a bi-criteria approximation algorithm for the *unweighted* MSE objective. In particular, their algorithm returns a quantization set of size $2s-2$ whose quality is an additive approximation to that of the optimum. Following their analysis, it is easy to see that the same algorithm, analyzed under the *weighted* MSE objective, yields the following weak additive approximation guarantee.

###### Corollary 3.1 (Lemma 6.1 of BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4))).

There exists an algorithm such that for any input distribution $\mathcal{X}$ and $m,s\in\mathbb{N}$, it has runtime $O(d+ms)$ and returns a quantization set $Q$ such that $|Q|=2s-2$ and $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)\leq\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,s)+\sum_{i=1}^{d}\lambda_{i}\cdot\Delta^{2}/m^{2}$. Here, $\Delta := \max_{i,j\in[d]} w_{i}-w_{j}$.

We develop approximation algorithms for optimizing $\operatorname{\mathsf{ADV}}_{\mathcal{X}}$ (weighted MSE) with a substantially stronger guaranteed approximation ratio than that of Corollary [3.1](https://arxiv.org/html/2606.00289#S3.Thmlemma1). Furthermore, our algorithms are no longer bi-criteria: they return a quantization set of size exactly $s$. We first give an $s$-approximation, then improve it to a $(1+\varepsilon)$-approximation. (The first approximation algorithm underpins our empirically fast algorithms, which have similar provable approximation guarantees; see Section [4.3](https://arxiv.org/html/2606.00289#S4.SS3).)

###### Theorem.

There exists an algorithm that for any given vector $w\in\mathbb{R}^{d}$, target quantization set size $s\in\mathbb{N}$, input distribution $\mathcal{X}$, and $\varepsilon>0$, returns a quantization set $Q$ such that $|Q|=s$ and $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)\leq(1+\varepsilon)\cdot\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,s)$. The runtime of the algorithm is $O(d\log(d/\varepsilon))$.

The first $s$-approximation algorithm works by exactly solving a suitably defined objective $\operatorname{\mathsf{MixDV}}$ whose optimal value is an $s$-approximation to that of $\operatorname{\mathsf{ADV}}$. Crucially, the algorithm uses the fact that the objective has a sortedness property (Definition [A.1](https://arxiv.org/html/2606.00289#A1.Thmdefinition1)), which allows fast $k$-selection algorithms to be used (see Appendix [A](https://arxiv.org/html/2606.00289#A1)). The $(1+\varepsilon)$-approximation algorithm solves a *rounded* version of the regularized shortest path on DAG problem described earlier. Importantly, it is *warm-started* using the solution from the $s$-approximation algorithm, which allows for a much stronger bound on the search time for the Lagrangian multiplier $\tau$. See Appendix [D](https://arxiv.org/html/2606.00289#A4) for details.

Along the way, we observe that the main ideas from solutions to $\operatorname{\mathsf{MDV}}$ and $\operatorname{\mathsf{ADV}}_{\mathcal{X}}$ (in particular, building data-dependent coresets) can be used to improve the bi-criteria approximation algorithm for quantization under the *unweighted* MSE objective. Our algorithm reports a quantization set of size $2s-2$ with objective value at most $(1+\varepsilon)$ times optimal in time $O(d\log s+s\sqrt{d/\varepsilon}\,\log(d/\varepsilon))$; see Theorem [2](https://arxiv.org/html/2606.00289#Thmtheorem2) in Appendix [E](https://arxiv.org/html/2606.00289#A5) for details.

### 3.3 Algorithms for $\operatorname{\mathsf{MDV}}$

An ideal algorithm for $\operatorname{\mathsf{MDV}}$ would efficiently return an optimal quantization set. Unfortunately, we prove in Lemma [F.22](https://arxiv.org/html/2606.00289#A6.Thmlemma22) that the optimal quantization set may contain irrational values (even when the input vector is integer), making such an algorithm impossible under standard finite bit representation (i.e. floating point) of any precision. One must then settle for algorithms which are optimal up to some precision $\varepsilon$.

###### Theorem.

There exists a randomized algorithm that for any given vector $w\in\mathbb{R}^{d}$, target quantization set size $s\in\mathbb{N}$, and $\varepsilon>0$, returns a quantization set $Q$ such that $|Q|=s$ and $\operatorname{\mathsf{MDV}}(w,Q)\leq(1+\varepsilon)\cdot\operatorname{\mathsf{ADV}}(w,s)$. The runtime of the algorithm is $O(d\log(s/\varepsilon))$ with probability $0.99$.

Note that this algorithm does *not* require sorting, and for moderate values of $s,\varepsilon$ it is faster than the $O(d\log d)$ time required to sort the input vector (as in BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4))). Moreover, the dependence on $1/\varepsilon$ is logarithmic, i.e. linear in the number of bits of precision one desires.

This randomized algorithm works by constructing a small *coreset* of values, on which we can solve the problem while still obtaining a close-to-optimal solution for the entire vector. In particular, we show how to construct a subset $y\subseteq w$ of at most $O(s/\sqrt{\varepsilon})$ elements of the input vector such that $\operatorname{\mathsf{MDV}}(y,s)$ is within a $(1+\varepsilon)$ factor of $\operatorname{\mathsf{MDV}}(w,s)$. Thus, we construct such a subset and then run a slower but still near-linear time algorithm on this coreset.

To construct this coreset, we prove several structural lemmas about $\operatorname{\mathsf{MDV}}$ and employ faster approximation algorithms for $k$-center clustering (see Appendix [A](https://arxiv.org/html/2606.00289#A1)). To solve on the coreset, we first find a coarse approximation by restricting the quantization points to be values from the input vector, and then use this estimate to “warm-start” a binary search for the optimal maximal variance (again, over the coreset). The most technically difficult step is finding this coarse approximation: it bears some similarities to the fast matrix search algorithms employed to optimize for $\operatorname{\mathsf{ADV}}$, but there are a number of added difficulties specific to the max objective of $\operatorname{\mathsf{MDV}}$. See Appendix [F](https://arxiv.org/html/2606.00289#A6) for the full details of the algorithm and the proof of Theorem [3.3](https://arxiv.org/html/2606.00289#S3.SS3).

## 4 Empirical Results

### 4.1 Empirical Setup

We implement a number of algorithms in optimized Cython code. (Our code is available here: [https://github.com/nathanllww/Inner-Product-Aware-Quantization](https://github.com/nathanllww/Inner-Product-Aware-Quantization).) We also modify and extend the code of GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) in our exact algorithms for ADV/weighted ASQ. Our experiments were run on a Mac Mini consumer desktop computer with an Apple M4 processor and 16GB of RAM.

### 4.2 Faster Exact Algorithms for ASQ

Our first contribution is significantly faster algorithms for ASQ which, in some regimes, obtain a $10\times$ speedup over the previous state-of-the-art accelerated QUIVER from BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)). (Our algorithms, like QUIVER, support weights, as needed to solve $\operatorname{\mathsf{ADV}}$.) Our implementations can be used as a drop-in replacement for QUIVER, and thus speed up any algorithm using QUIVER; in particular, we give faster approximation algorithms (see Section [D.3](https://arxiv.org/html/2606.00289#A4.SS3)).

Our first insight behind these speedups is that the fast exact 1D $k$-means algorithms presented in GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) can be modified to solve ASQ; see Section [D.1](https://arxiv.org/html/2606.00289#A4.SS1) for the full details. This strategy already gives algorithms which are faster than QUIVER; namely, the Wilber algorithm from GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)), when run with interpolation search and modified to support the ASQ objective, is much faster than QUIVER in many regimes (see Figure [3](https://arxiv.org/html/2606.00289#S4.F3)).

To improve the performance further, we exploit the inner workings of Wilber to *warm-start* the algorithm with an approximation. This allows the binary search of Wilber to converge much faster, leading to additional speedups. In particular, Wilber works by solving a shortest path problem on a directed acyclic graph (DAG), whose weights are determined in part by a Lagrangian multiplier $\tau$. Unfortunately, GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)) show that the desired value of $\tau$ is the difference between the optimal ASQ cost with $s$ quantization points and the cost with $s+1$ quantization points, so it must be found via binary search.

To speed up this search, we use a fast yet robust approximation algorithm (MixApprox, see Section [D.3](https://arxiv.org/html/2606.00289#A4.SS3)) to estimate the value of $\tau$, and only search in a range around this estimate. Letting $\tau'$ be the difference in costs returned by MixApprox when run with $s$ and $s+1$ quantization points, we only search for $\tau\in[\tau'/2,2\tau']$ (reverting to searching all possible values if nothing is found). Since MixApprox is an accurate approximation algorithm across a range of distributions, $\tau$ is found very quickly, leading to a large speedup. We plot the runtime of our method against QUIVER and vanilla Wilber in Figure [3](https://arxiv.org/html/2606.00289#S4.F3).
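The windowed search with a fallback can be sketched as follows. This is only an illustration under stated assumptions: `edge_count` is a hypothetical stand-in for running the regularized solver and counting the quantization intervals it uses (assumed non-increasing in $\tau$), `tau_est` plays the role of $\tau'$, and the function names are not taken from the authors' code.

```python
from typing import Callable, Optional


def bisect_tau(
    edge_count: Callable[[float], int], k: int, lo: float, hi: float, max_iters: int = 60
) -> Optional[float]:
    """Bisect for a tau in [lo, hi] with edge_count(tau) == k.

    Assumes edge_count is non-increasing in tau. Returns None when the
    window cannot contain a valid tau or the iteration budget runs out.
    """
    if edge_count(lo) < k or edge_count(hi) > k:
        return None  # target lies outside this window
    for _ in range(max_iters):
        tau = (lo + hi) / 2
        n = edge_count(tau)
        if n == k:
            return tau
        if n > k:
            lo = tau  # too many edges: raise the penalty
        else:
            hi = tau  # too few edges: lower the penalty
    return None


def warm_started_tau(
    edge_count: Callable[[float], int], k: int, tau_est: float, tau_max: float
) -> Optional[float]:
    """Search [tau_est / 2, 2 * tau_est] first, then fall back to [0, tau_max]."""
    tau = bisect_tau(edge_count, k, tau_est / 2, 2 * tau_est)
    if tau is None:
        tau = bisect_tau(edge_count, k, 0.0, tau_max)
    return tau


# Toy non-increasing edge-count profile (hypothetical breakpoints at 2 and 8).
def toy_edge_count(tau: float) -> int:
    return 4 if tau < 2 else (2 if tau <= 8 else 1)


print(warm_started_tau(toy_edge_count, k=2, tau_est=3.0, tau_max=100.0))
```

A good estimate resolves the search inside the narrow window after very few solver calls, while a poor estimate costs only two extra endpoint evaluations before the full-range fallback.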

![Refer to caption](https://arxiv.org/html/2606.00289v1/exact_runtime_n.png)

![Refer to caption](https://arxiv.org/html/2606.00289v1/exact_runtime_s.png)

Figure 3: Runtime of exact algorithms for ASQ on vectors $w$ of dimension $d$ drawn from $\text{LogNormal}(0,1)$. The shaded regions show the 10–90% range of runtimes across draws of $w$. Orange denotes accelerated QUIVER; red denotes Wilber with binary search over the full range of possible $\lambda$; purple denotes Wilber with the interpolation search method of GLM+ ([17](https://arxiv.org/html/2606.00289#bib.bib17)); and brown denotes our accelerated search method.

### 4.3 Faster Approximation Algorithms for ASQ

The approximate QUIVER algorithm of BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)) operates by first placing $m$ evenly spaced markers between the minimum and maximum entries of the vector $w$, and then finding the optimal $s$ of these $m$ discretized points to include as quantization values. Unsurprisingly, then, by using our faster exact algorithms to find the optimal $s$ quantization values, we are able to improve the performance of the approximate QUIVER algorithm as well; see Figure [4](https://arxiv.org/html/2606.00289#S4.F4). (Due to the special structure of the calls approximate QUIVER makes to the exact algorithm, we use a slight variant of our fast exact algorithm; see Section [D.3](https://arxiv.org/html/2606.00289#A4.SS3) for details. The overall structure of the algorithm is the same.)

![Refer to caption](https://arxiv.org/html/2606.00289v1/approx_comp_n.png)

![Refer to caption](https://arxiv.org/html/2606.00289v1/approx_comp_s.png)

Figure 4: Runtime comparison of approximate QUIVER and our implementation using faster exact subroutines on vectors $w$ drawn from $\text{LogNormal}(0,1)$, with $m=100s$ uniformly spaced discretization points. The shaded regions show the 10–90% range of runtimes across draws of $w$.

### 4.4 Algorithms for $\operatorname{\mathsf{MDV}}$

We also give a fast practical algorithm for $\operatorname{\mathsf{MDV}}$, inspired by our theoretical algorithms; see Section [F.8](https://arxiv.org/html/2606.00289#A6.SS8) for full details. In the first step, when run on vector $w$, the algorithm constructs a coreset in a manner similar to approximate QUIVER: it divides the range $[\min_{i}w_{i},\max_{i}w_{i}]$ into $10s$ equally sized buckets, and constructs the coreset by taking the minimum and maximum value of $w$ from each bucket. The algorithm then uses binary search to find the worst-case variance, by “guessing and checking”. To do the check, we utilize an algorithm which (quickly) computes the minimum quantization set size required to obtain a target worst-case variance of $v$.
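The first step (the bucketed coreset) can be sketched in a few lines. This is a minimal pure-Python illustration of the description above, not the authors' Cython implementation; the function name `bucket_coreset` and the parameter `buckets_per_point` are hypothetical, and the subsequent binary search and subdivision steps are omitted.

```python
from typing import List, Optional, Sequence


def bucket_coreset(w: Sequence[float], s: int, buckets_per_point: int = 10) -> List[float]:
    """Keep the min and max entry of w inside each of 10*s equal-width buckets."""
    lo, hi = min(w), max(w)
    if lo == hi:
        return [lo]  # constant vector: a single value suffices
    n_buckets = buckets_per_point * s
    mins: List[Optional[float]] = [None] * n_buckets
    maxs: List[Optional[float]] = [None] * n_buckets
    for x in w:
        # Bucket index of x; the maximum entry is clamped into the last bucket.
        b = min(int((x - lo) / (hi - lo) * n_buckets), n_buckets - 1)
        if mins[b] is None or x < mins[b]:
            mins[b] = x
        if maxs[b] is None or x > maxs[b]:
            maxs[b] = x
    kept = {v for v in mins + maxs if v is not None}
    return sorted(kept)  # at most 2 * n_buckets values


w = [0.0, 0.1, 0.2, 5.0, 9.9, 10.0]
print(bucket_coreset(w, s=1))
```

The coreset has at most $20s$ values regardless of $d$, and it is built in a single pass without sorting the input; only the small output is sorted here for readability.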

However, if any bucket is larger than $\sqrt{\varepsilon v_{\max}}$, where $v_{\max}$ is the current maximum of the search range, the algorithm *subdivides* the buckets into smaller sizes until all have length at most $\sqrt{\varepsilon v_{\max}}$. This subdivision step is the crucial modification that allows for a guaranteed $(1+\varepsilon)$-approximation, even on heavily skewed data, while its adaptive nature keeps the algorithm very fast. In a certain sense, the algorithm is adaptively finding the *instance-ideal* bucket size, keeping it as fast as possible for a given desired accuracy. The proof of the $(1+\varepsilon)$-approximation can be found in Lemma [F.23](https://arxiv.org/html/2606.00289#A6.Thmlemma23).

![Refer to caption](https://arxiv.org/html/2606.00289v1/mdv_runtime_n.png)

![Refer to caption](https://arxiv.org/html/2606.00289v1/mdv_runtime_s.png)

Figure 5: Runtime of algorithms for $\operatorname{\mathsf{MDV}}$ compared against Improved Approx QUIVER as a baseline (although it does not provide a good approximation to $\operatorname{\mathsf{MDV}}$), on vectors drawn from $\text{LogNormal}(0,1)$. The shaded regions show the 10–90% range of runtimes across draws of the input vector.

In Figure [5](https://arxiv.org/html/2606.00289#S4.F5), the algorithm labeled “$\operatorname{\mathsf{MDV}}$” is this fast algorithm, run with $\varepsilon=0.01$, while the “Sort + Binary Search” method sorts and then uses binary search to find the smallest $v$ for which $s$ quantization points can achieve worst-case variance $v$. Improved Approx QUIVER is always run with quality parameter $m=100s$. Note that this implementation of Improved Approx QUIVER, in contrast to the implementation of Approx QUIVER provided by BBBIMV ([24](https://arxiv.org/html/2606.00289#bib.bib4)) and our implementation shown in Figure [4](https://arxiv.org/html/2606.00289#S4.F4), does not require the input vector to be sorted.

## 5Discussion

##### Limitations\.

We first discuss a few important limitations of our work\.

- Although $\operatorname{\mathsf{MDV}}$ outperforms $\operatorname{\mathsf{ADV}}$ on worst-case variance, our experimentation revealed that it is outperformed by $\operatorname{\mathsf{ADV}}$ on worst $1\%$ tail events for many distributions. This perhaps motivates future work studying a *hybrid* objective that optimizes for the worst $1\%$ performance over a given input distribution $\mathcal{X}$.
- Our largest speed improvements over the prior state-of-the-art QUIVER algorithm come when the number of quantization points $s$ is large. It is natural to consider whether we can adapt our algorithms to achieve a similar speedup over QUIVER for small values of $s$ as well.

##### Future Work\.

We conclude with some interesting open questions that are natural extensions of this work:

- Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2) shows that computing the optimal rounding distribution for the average-case problem is NP-Hard. Do there exist efficient algorithms to find approximately optimal rounding distributions?
- Applying this notion of adaptive stochastic quantization to vector search requires the storage of a quantization set for every vector $w$ in the dataset. Can our ideas and algorithms be adapted to (potentially dynamic) *multi-vector* quantization settings?

## References

- AGL\+\[17\]Dan Alistarh, Demjan Grubic, Jerry Li, Ryota Tomioka, and Milan Vojnovic\.Qsgd: Communication\-efficient sgd via gradient quantization and encoding\.Advances in neural information processing systems, 30, 2017\.
- AKM\+\[86\]Alok Aggarwal, Maria Klawe, Shlomo Moran, Peter Shor, and Robert Wilber\.Geometric applications of a matrix searching algorithm\.InProceedings of the second annual symposium on Computational geometry, pages 285–292, 1986\.
- AST \[93\]Alok Aggarwal, Baruch Schieber, and Takashi Tokuyama\.Finding a minimum weight k\-link path in graphs with monge property and applications\.InProceedings of the ninth annual symposium on Computational geometry, pages 189–197, 1993\.
- BBBIMV \[24\]Ran Ben\-Basat, Yaniv Ben\-Itzhak, Michael Mitzenmacher, and Shay Vargaftik\.Optimal and approximate adaptive stochastic quantization\.Advances in Neural Information Processing Systems, 37:94265–94291, 2024\.
- BBBIMV \[25\]Ran Ben Basat, Yaniv Ben\-Itzhak, Michael Mitzenmacher, and Shay Vargaftik\.Better than optimal: Improving adaptive stochastic quantization using shared randomness\.Proceedings of the ACM on Measurement and Analysis of Computing Systems, 9\(3\):1–44, 2025\.
- BNS \[19\]Ron Banner, Yury Nahshan, and Daniel Soudry\.Post training 4\-bit quantization of convolutional networks for rapid\-deployment\.Advances in neural information processing systems, 32, 2019\.
- CBM \[25\]Riley Carlson, John Bauer, and Christopher D\. Manning\.A new pair of gloves, 2025\.
- CBUS\+\[20\]Brian Chmiel, Liad Ben\-Uri, Moran Shkolnik, Elad Hoffer, Ron Banner, and Daniel Soudry\.Neural gradients are near\-lognormal: improved quantized and sparse training\.arXiv preprint arXiv:2006\.08173, 2020\.
- FAHA \[23\]E Frantar, S Ashkboos, T Hoefler, and D Alistarh\.Optq: Accurate quantization for generative pre\-trained transformers\. 2023\.InURL https://openreview\. net/forum, 2023\.
- FG \[88\]Tomás Feder and Daniel Greene\.Optimal algorithms for approximate clustering\.InProceedings of the Twentieth Annual ACM Symposium on Theory of Computing, STOC ’88, page 434–444, New York, NY, USA, 1988\. Association for Computing Machinery\.
- FHH\+\[20\]Fangcheng Fu, Yuzheng Hu, Yihan He, Jiawei Jiang, Yingxia Shao, Ce Zhang, and Bin Cui\.Don’t waste your bits\! squeeze activations and gradients for deep neural networks via tinyscript\.InInternational Conference on Machine Learning, pages 3304–3314\. PMLR, 2020\.
- FJ \[84\]Greg N Frederickson and Donald B Johnson\.Generalized selection and ranking: sorted matrices\.SIAM Journal on computing, 13\(1\):14–30, 1984\.
- FTM\+\[20\]Fartash Faghri, Iman Tabrizian, Ilia Markov, Dan Alistarh, Daniel M Roy, and Ali Ramezani\-Kebrya\.Adaptive gradient quantization for data\-parallel sgd\.Advances in neural information processing systems, 33:3174–3185, 2020\.
- GGX\+\[25\]Jianyang Gao, Yutong Gou, Yuexuan Xu, Yongyi Yang, Cheng Long, and Raymond Chi\-Wing Wong\.Practical and asymptotically optimal quantization of high\-dimensional vectors in euclidean space for approximate nearest neighbor search\.Proceedings of the ACM on Management of Data, 3\(3\):1–26, 2025\.
- GHKS \[13\]Tiezheng Ge, Kaiming He, Qifa Ke, and Jian Sun\.Optimized product quantization\.IEEE transactions on pattern analysis and machine intelligence, 36\(4\):744–755, 2013\.
- GJS \[74\]Michael R Garey, David S Johnson, and Larry Stockmeyer\.Some simplified np\-complete problems\.InProceedings of the sixth annual ACM symposium on Theory of computing, pages 47–63, 1974\.
- GLM\+\[17\]Allan Grønlund, Kasper Green Larsen, Alexander Mathiasen, Jesper Sindahl Nielsen, Stefan Schneider, and Mingzhou Song\.Fast exact k\-means, k\-medians and bregman divergence clustering in 1d\.arXiv preprint arXiv:1701\.07204, 2017\.
- Gon \[85\]Teofilo F Gonzalez\.Clustering to minimize the maximum intercluster distance\.Theoretical computer science, 38:293–306, 1985\.
- HJ \[12\]Roger A\. Horn and Charles R\. Johnson\.Matrix Analysis\.Cambridge University Press, 2nd edition, 2012\.
- JLPK \[23\]Yongkweon Jeon, Chungman Lee, Kyungphil Park, and Ho\-young Kim\.A frustratingly easy post\-training quantization scheme for llms\.InProceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 14446–14461, 2023\.
- MA \[85\]Andranik Mirzaian and Eshrat Arjomandi\.Selection in x\+ y and matrices with sorted rows and columns\.Information processing letters, 20\(1\):13–17, 1985\.
- MNA\+\[17\]Paulius Micikevicius, Sharan Narang, Jonah Alben, Gregory Diamos, Erich Elsen, David Garcia, Boris Ginsburg, Michael Houston, Oleksii Kuchaiev, Ganesh Venkatesh, et al\.Mixed precision training\.arXiv preprint arXiv:1710\.03740, 2017\.
- MUJS \[18\]Yusuke Matsui, Yusuke Uchida, Herve Jegou, and Shin’ichi Satoh\.A survey of product quantization\.ITE Transactions on Media Technology and Applications, 6\(1\):2–10, 2018\.
- NAVB\+\[20\]Markus Nagel, Rana Ali Amjad, Mart Van Baalen, Christos Louizos, and Tijmen Blankevoort\.Up or down? adaptive rounding for post\-training quantization\.InInternational conference on machine learning, pages 7197–7206\. PMLR, 2020\.
- RKFM\+\[21\]Ali Ramezani\-Kebrya, Fartash Faghri, Ilya Markov, Vitalii Aksenov, Dan Alistarh, and Daniel M Roy\.Nuqsgd: Provably communication\-efficient data\-parallel sgd via nonuniform quantization\.Journal of Machine Learning Research, 22\(114\):1–43, 2021\.
- Sch \[98\]Baruch Schieber\.Computing a minimum weightk\-link path in graphs with the concave monge property\.Journal of Algorithms, 29\(2\):204–222, 1998\.
- SZY\+\[23\]Ying Sheng, Lianmin Zheng, Binhang Yuan, Zhuohan Li, Max Ryabinin, Beidi Chen, Percy Liang, Christopher Ré, Ion Stoica, and Ce Zhang\.Flexgen: High\-throughput generative inference of large language models with a single gpu\.InInternational Conference on Machine Learning, pages 31094–31116\. PMLR, 2023\.
- VAM \[18\]Mariia Vladimirova, Julyan Arbel, and Pablo Mesejo\.Bayesian neural networks become heavier\-tailed with depth\.InNeurIPS 2018\-Thirty\-second Conference on Neural Information Processing Systems, pages 1–7, 2018\.
- Wil \[88\]Robert Wilber\.The concave least\-weight subsequence problem revisited\.Journal of Algorithms, 9\(3\):418–425, 1988\.
- Wu \[91\]Xiaolin Wu\.Optimal quantization by matrix searching\.Journal of algorithms, 12\(4\):663–673, 1991\.
- ZDHM \[25\]Amir Zandieh, Majid Daliri, Majid Hadian, and Vahab Mirrokni\.Turboquant: Online vector quantization with near\-optimal distortion rate\.arXiv preprint arXiv:2504\.19874, 2025\.
- ZLK\+\[17\]Hantian Zhang, Jerry Li, Kaan Kara, Dan Alistarh, Ji Liu, and Ce Zhang\.Zipml: Training linear models with end\-to\-end low precision, and a little bit of deep learning\.InInternational Conference on Machine Learning, pages 4035–4043\. PMLR, 2017\.

###### Appendix Contents

1. [1 Introduction](https://arxiv.org/html/2606.00289#S1)
2. [2 Background and Motivation](https://arxiv.org/html/2606.00289#S2)
3. [3 Theoretical Results](https://arxiv.org/html/2606.00289#S3)
4. [4 Empirical Results](https://arxiv.org/html/2606.00289#S4)
5. [5 Discussion](https://arxiv.org/html/2606.00289#S5)
6. [References](https://arxiv.org/html/2606.00289#bib)
7. [A Technical Preliminaries](https://arxiv.org/html/2606.00289#A1)
8. [B Standard Stochastic Quantization is Optimal for $\operatorname{\mathsf{WDDV}}$](https://arxiv.org/html/2606.00289#A2)
9. [C NP-Hardness of Optimizing $\operatorname{\mathsf{ADDV}}$](https://arxiv.org/html/2606.00289#A3)
10. [D Average Directional Variance](https://arxiv.org/html/2606.00289#A4)
11. [E An Improved Approximation Algorithm for Unweighted MSE](https://arxiv.org/html/2606.00289#A5)
12. [F Maximum Directional Variance](https://arxiv.org/html/2606.00289#A6)
13. [G Deferred Proofs and Figures](https://arxiv.org/html/2606.00289#A7)

## Appendix A Technical Preliminaries

### A.1 Matrix Properties and Algorithms

###### Definition A.1 (Sorted Matrix).

We call a matrix $A\in\mathbb{R}^{d\times d}$ *sorted* if all its rows are sorted in non-decreasing order while all its columns are sorted in non-increasing order.

###### Lemma A.1 (Sorted-Selection Algorithm, Theorem 1 of \[[12](https://arxiv.org/html/2606.00289#bib.bib12)\] and Corollary 6.2 of \[[21](https://arxiv.org/html/2606.00289#bib.bib21)\]).

Let $A\in\mathbb{R}^{d\times d}$ be a sorted matrix. Then, there exists an algorithm that returns the $k$th smallest element (i.e., performs $k$-selection) in $O(d)$ time.

Importantly, the algorithms of Lemma [A.1](https://arxiv.org/html/2606.00289#A1.Thmlemma1) operate in the query access model, where they are allowed $O(1)$ time access to any entry of $A$. The algorithms from \[[12](https://arxiv.org/html/2606.00289#bib.bib12),[21](https://arxiv.org/html/2606.00289#bib.bib21)\] achieve $O(d)$ time for $k$-selection in row-and-column sorted $d\times d$ matrices by utilizing a divide-and-conquer strategy that systematically restricts the search space to a narrow “staircase” contour of $O(d)$ candidate elements. \[[12](https://arxiv.org/html/2606.00289#bib.bib12)\] accomplishes this through recursive matrix bisection into $2\times 2$ submatrices, extracting representatives to compute bounds that strictly contain the $k$-th element, and then applying a linear-time selection subroutine to the remaining candidates along the monotonic frontier. \[[21](https://arxiv.org/html/2606.00289#bib.bib21)\] streamlines this recursive procedure and greatly simplifies the algorithm and analysis.

###### Definition A.2 (Monotone Matrix).

Let $A\in\mathbb{R}^{d\times d}$ be a matrix and define $c(i) := \min\{j\in[d] : A[i,j]=\min_{k\in[d]}A[i,k]\}$. Matrix $A$ is monotone if for all $i\in[d]$, $c(i)\leq c(i+1)$.

###### Definition A.3 (Totally Monotone Matrix).

Matrix $A\in\mathbb{R}^{d\times d}$ is *totally* monotone if every submatrix of $A$ is monotone. Equivalently, $A$ is totally monotone if for $i<i'$ and $j<j'$, $A[i,j]>A[i,j']\implies A[i',j]>A[i',j']$.

###### Definition A.4 (Concave Monge Property).

Matrix $A\in\mathbb{R}^{d\times d}$ satisfies the concave Monge property if for any $1\leq j<k<d$, $A[j,k]+A[j+1,k+1]\leq A[j,k+1]+A[j+1,k]$.

###### Definition A.5 (Quadrangle Inequality).

Matrix $A\in\mathbb{R}^{d\times d}$ satisfies the quadrangle inequality if for any $a\leq b\leq c\leq f$, $A[a,c]+A[b,f]\leq A[a,f]+A[b,c]$.

It is clear that any matrix satisfying the Quadrangle Inequality \(i\) satisfies the Concave Monge property, \(ii\) is totally monotone, and \(iii\) is monotone\.
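The two inequalities can be checked by brute force on small matrices, which is a convenient sanity test when deriving a new cost structure. The sketch below uses 0-based indices and hypothetical helper names; the squared-gap matrix is an illustrative choice of weights and is not claimed to be the $\operatorname{\mathsf{ADV}}$ cost.

```python
from typing import List

Matrix = List[List[float]]


def is_concave_monge(A: Matrix) -> bool:
    """Definition A.4 with 0-based indices: 0 <= j < k < d - 1."""
    d = len(A)
    return all(
        A[j][k] + A[j + 1][k + 1] <= A[j][k + 1] + A[j + 1][k]
        for j in range(d - 1)
        for k in range(j + 1, d - 1)
    )


def satisfies_quadrangle(A: Matrix) -> bool:
    """Definition A.5, checked over all a <= b <= c <= f (O(d^4) time)."""
    d = len(A)
    return all(
        A[a][c] + A[b][f] <= A[a][f] + A[b][c]
        for a in range(d)
        for b in range(a, d)
        for c in range(b, d)
        for f in range(c, d)
    )


x = [0, 1, 2, 3]
# Convex function of the gap: both properties hold.
squared = [[float((xj - xi) ** 2) for xj in x] for xi in x]
# Concave function of the gap: both properties fail.
root = [[abs(xj - xi) ** 0.5 for xj in x] for xi in x]

print(is_concave_monge(squared), satisfies_quadrangle(squared))
print(is_concave_monge(root), satisfies_quadrangle(root))
```

Taking $a=j$, $b=j+1$, $c=k$, $f=k+1$ in the quadrangle inequality recovers the concave Monge inequality, which is why the first check can never fail on a matrix that passes the second.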

###### Lemma A.2 (SMAWK Algorithm, Theorem 4.3 of \[[2](https://arxiv.org/html/2606.00289#bib.bib2)\]).

Let $A\in\mathbb{R}^{d\times d}$ be a totally monotone matrix. Then, there exists an algorithm that in $O(d)$ time returns $j(i) := \min\{j\in[d] : A[i,j]=\max_{k\in[d]}A[i,k]\}$ for all rows $i\in[d]$.

The SMAWK algorithm leverages the total monotonicity property to systematically eliminate redundant columns, thereby shrinking the matrix width. Following this, the algorithm recurses on the even-indexed rows of the reduced matrix to find their respective minima. In the final step, it uses the computed minima of the even rows to tightly constrain the search space for the odd-indexed rows, locating the remaining minima in strictly linear time.
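For intuition, the sketch below is not SMAWK itself but a simpler divide-and-conquer substitute that exploits only monotonicity (Definition A.2). It computes the leftmost minimum of every row in $O(d\log d)$ time rather than $O(d)$, following the row-minima convention of Definition A.2; the function name is hypothetical.

```python
from typing import List


def monotone_row_minima(A: List[List[float]]) -> List[int]:
    """Leftmost argmin of each row, assuming the argmins are non-decreasing.

    Solves the middle row by a linear scan, then recurses on the rows above
    (restricted to columns at or left of that argmin) and on the rows below
    (restricted to columns at or right of it).
    """
    n_rows, n_cols = len(A), len(A[0])
    argmin = [0] * n_rows

    def solve(r_lo: int, r_hi: int, c_lo: int, c_hi: int) -> None:
        if r_lo > r_hi:
            return
        mid = (r_lo + r_hi) // 2
        best = c_lo
        for j in range(c_lo + 1, c_hi + 1):
            if A[mid][j] < A[mid][best]:  # strict: keeps the leftmost minimum
                best = j
        argmin[mid] = best
        solve(r_lo, mid - 1, c_lo, best)
        solve(mid + 1, r_hi, best, c_hi)

    solve(0, n_rows - 1, 0, n_cols - 1)
    return argmin


# Rows indexed by sorted queries y, columns by sorted grid points x.
x = [0, 1, 2, 3, 4]
y = [0.2, 1.6, 1.7, 3.9]
A = [[(xj - yi) ** 2 for xj in x] for yi in y]
print(monotone_row_minima(A))
```

SMAWK removes the logarithmic factor by additionally discarding columns that can never hold a row optimum, which requires total monotonicity rather than monotonicity alone.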

### A.2 Clustering Algorithms

Many algorithms in Appendices [E](https://arxiv.org/html/2606.00289#A5) and [F](https://arxiv.org/html/2606.00289#A6) make use of exact and/or approximate clustering algorithms as subroutines. These clustering algorithms are often used to give *data-dependent coresets* of the original instance, upon which further processing is done.

In particular, we utilize an algorithm for $s$-center clustering in Euclidean space. Here, an input consists of $d$ vectors $v_{1},\ldots,v_{d}\in\mathbb{R}^{n}$ and the task is to return an $s$-clustering $\mathcal{C}=(C_{1},\ldots,C_{s})$ such that $C_{1}\sqcup\ldots\sqcup C_{s}=\{v_{1},\ldots,v_{d}\}$ that minimizes the maximum radius of any cluster:

$$\text{Radius}(\mathcal{C}) := \frac{1}{2}\cdot\max_{i\in[s]}\max_{v_{j},v_{k}\in C_{i}}\lVert v_{j}-v_{k}\rVert_{2}$$

The following $2$-approximate $s$-center clustering algorithm of \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\] is used often.

###### Lemma A.3 (s-Center-Clustering Approximation Algorithm, Theorem 4.1 of \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\]).

Given $d$ vectors $v_{1},\ldots,v_{d}\in\mathbb{R}^{n}$ and a target number of clusters $s\in\mathbb{N}$, there exists an algorithm with runtime $O(d\log s)$ that returns an $s$-clustering $\mathcal{C}$ such that

$$\mathrm{Radius}(\mathcal{C})\leq 2\cdot\min_{\substack{\mathcal{A}=(A_{1},\ldots,A_{s})\\ A_{1}\sqcup\ldots\sqcup A_{s}=\{v_{1},\ldots,v_{d}\}}}\mathrm{Radius}(\mathcal{A})$$

The s-Center-Clustering algorithm of Lemma [A.3](https://arxiv.org/html/2606.00289#A1.Thmlemma3) works similarly to Gonzalez’s algorithm \[[18](https://arxiv.org/html/2606.00289#bib.bib18)\] by finding an independent set in a suitable implicitly defined geometric graph. This graph, however, is defined over *boxes* of points rather than the points themselves. This ultimately allows for a more efficient data structure that can be used to construct the independent set. The work is technical and involved, so for further details, consult the original paper \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\].

Hidden in the lemma statement is that the runtime of the above algorithm has exponential dependence on the dimension $n$ of the space. In this work, however, we only utilize this algorithm on 1-dimensional instances.
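As a point of comparison, the classical farthest-first traversal of Gonzalez is easy to state on 1-dimensional inputs. The sketch below implements that simpler method, which runs in $O(ds)$ time, and not the $O(d\log s)$ box-based algorithm of Lemma A.3; all function names are hypothetical, and `radius` follows the half-diameter definition given above.

```python
from typing import List, Sequence


def farthest_first_centers(points: Sequence[float], s: int) -> List[float]:
    """Pick up to s centers, each time taking the point farthest from the current centers."""
    centers = [points[0]]
    dist = [abs(p - centers[0]) for p in points]  # distance to nearest center so far
    while len(centers) < min(s, len(points)):
        far = max(range(len(points)), key=dist.__getitem__)
        if dist[far] == 0:
            break  # every point already coincides with a center
        centers.append(points[far])
        dist = [min(dv, abs(p - points[far])) for dv, p in zip(dist, points)]
    return centers


def assign(points: Sequence[float], centers: Sequence[float]) -> List[List[float]]:
    """Assign every point to its nearest center."""
    clusters: List[List[float]] = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda i: abs(p - centers[i]))
        clusters[nearest].append(p)
    return clusters


def radius(clusters: Sequence[Sequence[float]]) -> float:
    """Half of the largest cluster diameter (1-dimensional case)."""
    return 0.5 * max(max(c) - min(c) for c in clusters if c)


points = [0, 1, 2, 10, 11, 12, 20]
centers = farthest_first_centers(points, s=3)
clusters = assign(points, centers)
print(centers, clusters, radius(clusters))
```

On this input the traversal recovers the three natural groups. In general it only guarantees a factor of $2$ in the cluster diameter, and hence in the radius as defined above.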

## Appendix B Standard Stochastic Quantization is Optimal for $\operatorname{\mathsf{WDDV}}$

In this section, we prove that the standard stochastic quantization rounding distribution is optimal for $\operatorname{\mathsf{WDDV}}$ (Definition [3.1](https://arxiv.org/html/2606.00289#S3.Thmdefinition1)). Recall that $\mathcal{D}^{\mathsf{SSQ}}(w,Q)$ is the distribution that rounds each point $w_{i}$ to either $w_{i}^{\uparrow}(Q)$ or $w_{i}^{\downarrow}(Q)$ independently, with probabilities chosen to ensure that the rounding is unbiased.
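A minimal sketch of this rounding rule for a single coordinate may help fix ideas. It assumes the quantization set is given as a sorted Python list whose range contains the value being rounded; the function names are hypothetical. For a two-point unbiased rounding of $u$ onto $\{a,b\}$ with $a\leq u\leq b$, the upper point is chosen with probability $(u-a)/(b-a)$, which gives variance $(u-a)(b-u)$.

```python
import bisect
import random
from typing import List, Tuple


def neighbors(u: float, q_sorted: List[float]) -> Tuple[float, float]:
    """Return (u_down, u_up): the nearest quantization values at or below and at or above u."""
    if not q_sorted[0] <= u <= q_sorted[-1]:
        raise ValueError("u must lie within the range of the quantization set")
    i = bisect.bisect_left(q_sorted, u)  # first index with q_sorted[i] >= u
    up = q_sorted[i]
    down = up if up == u else q_sorted[i - 1]
    return down, up


def ssq_round(u: float, q_sorted: List[float], rng: random.Random) -> float:
    """Unbiased stochastic rounding of u to one of its two neighbors."""
    down, up = neighbors(u, q_sorted)
    if down == up:
        return down  # u is itself a quantization value
    p_up = (u - down) / (up - down)  # chosen so that E[rounded value] = u
    return up if rng.random() < p_up else down


def two_point_variance(u: float, a: float, b: float) -> float:
    """Variance of the unbiased rounding of u onto {a, b}, with a <= u <= b."""
    return (u - a) * (b - u)


def ssq_variance(u: float, q_sorted: List[float]) -> float:
    down, up = neighbors(u, q_sorted)
    return two_point_variance(u, down, up)


q = [0.0, 1.0, 4.0]
rng = random.Random(0)
samples = [ssq_round(0.25, q, rng) for _ in range(20000)]
print(sum(samples) / len(samples))          # close to 0.25
print(ssq_variance(0.25, q))                # rounding to the adjacent pair {0, 1}
print(two_point_variance(0.25, 0.0, 4.0))   # rounding to the wider pair {0, 4}
```

Rounding to the adjacent pair gives a strictly smaller variance than the unbiased rounding onto the wider pair $\{0,4\}$, which is the single-coordinate phenomenon that Lemma B.1 establishes in general.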


To prove Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2), we first show the following key lemma.

###### Lemma B\.1\.

Consider $u\in\mathbb{R}$ and quantization set $Q\subseteq\mathbb{R}$. Let $u^{\downarrow}:=\max\{q\in Q: q\leq u\}$, $u^{\uparrow}:=\min\{q\in Q: q\geq u\}$, and let $\mathcal{D}^{\ast}$ denote the unique distribution that rounds $u$ to $\{u^{\downarrow},u^{\uparrow}\}$ such that $\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\ast}}[\widehat{\boldsymbol{u}}]=u$. Then, for any distribution $\mathcal{D}$ such that $\mathrm{supp}(\mathcal{D})\subseteq Q$ and $\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}]=u$, we have that $\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}]\geq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\ast}}[\widehat{\boldsymbol{u}}]$.

###### Proof\.

If $u\in Q$, then $u^{\downarrow}=u^{\uparrow}=u$, the distribution $\mathcal{D}^{\ast}$ has zero variance, and the claim is immediate. We therefore assume $u\notin Q$, so that the two sets defined next are disjoint. Let $Q^{\downarrow}:=\mathrm{supp}(\mathcal{D})\cap\{q\in Q: q\leq u\}$ and analogously $Q^{\uparrow}:=\mathrm{supp}(\mathcal{D})\cap\{q\in Q: q\geq u\}$. The argument proceeds by first constructing an unbiased distribution $\mathcal{D}'$ supported only on the centers of mass of $Q^{\downarrow}$ and $Q^{\uparrow}$ and showing that $\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]\leq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}]$. We then show that $\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\ast}}[\widehat{\boldsymbol{u}}]\leq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]$ to conclude the claim. First, define the centers of mass

$$Q^{\downarrow}_{\text{COM}}:=\sum_{q\in Q^{\downarrow}}q\cdot\frac{\mathcal{D}(q)}{\mathcal{D}(Q^{\downarrow})},\qquad Q^{\uparrow}_{\text{COM}}:=\sum_{q\in Q^{\uparrow}}q\cdot\frac{\mathcal{D}(q)}{\mathcal{D}(Q^{\uparrow})},$$

where $\mathcal{D}(S)\triangleq\sum_{v\in S}\mathcal{D}(v)$ for $S\subseteq Q$. Define distribution $\mathcal{D}'$ by placing mass $\mathcal{D}(Q^{\downarrow})$ at $Q^{\downarrow}_{\text{COM}}$ and mass $\mathcal{D}(Q^{\uparrow})$ at $Q^{\uparrow}_{\text{COM}}$. (Because $\mathcal{D}'$ is an intermediate distribution only used for the purposes of the analysis, the fact that $\{Q^{\downarrow}_{\text{COM}},Q^{\uparrow}_{\text{COM}}\}$ may *not* be a subset of $Q$ does not affect correctness.) Formally,

$$\mathcal{D}'(x)=\begin{cases}\mathcal{D}(Q^{\downarrow}) & \text{if } x=Q^{\downarrow}_{\text{COM}}\\ \mathcal{D}(Q^{\uparrow}) & \text{if } x=Q^{\uparrow}_{\text{COM}}\\ 0 & \text{otherwise.}\end{cases}$$

Figure 6: Illustration of distribution $\mathcal{D}'$, showing the reduction of support of $\mathcal{D}$ to just two elements.

See Figure [6](https://arxiv.org/html/2606.00289#A2.F6) for an illustration of the transformation from distribution $\mathcal{D}$ to $\mathcal{D}'$. Notice that

$$\begin{aligned}
\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]
&=Q^{\downarrow}_{\text{COM}}\cdot\mathcal{D}(Q^{\downarrow})+Q^{\uparrow}_{\text{COM}}\cdot\mathcal{D}(Q^{\uparrow})\\
&=\sum_{q\in Q^{\downarrow}}q\cdot\mathcal{D}(q)+\sum_{q\in Q^{\uparrow}}q\cdot\mathcal{D}(q)\\
&=\sum_{q\in Q}q\cdot\mathcal{D}(q)=\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}]=u.
\end{aligned}$$

Bounding the variance,

$$\begin{aligned}
\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]
&=\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}^{2}]-\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]^{2}\\
&=\left[\mathcal{D}(Q^{\downarrow})\cdot\left(Q^{\downarrow}_{\text{COM}}\right)^{2}+\mathcal{D}(Q^{\uparrow})\cdot\left(Q^{\uparrow}_{\text{COM}}\right)^{2}\right]-u^{2}\\
&=\left[\frac{\left(\sum_{q\in Q^{\downarrow}}q\cdot\mathcal{D}(q)\right)^{2}}{\mathcal{D}(Q^{\downarrow})}+\frac{\left(\sum_{q\in Q^{\uparrow}}q\cdot\mathcal{D}(q)\right)^{2}}{\mathcal{D}(Q^{\uparrow})}\right]-u^{2}\\
&\leq\left[\sum_{q\in Q^{\downarrow}}q^{2}\mathcal{D}(q)+\sum_{q\in Q^{\uparrow}}q^{2}\mathcal{D}(q)\right]-u^{2} &&\text{(Cauchy-Schwarz inequality)}\\
&=\left[\sum_{q\in Q}q^{2}\mathcal{D}(q)\right]-u^{2}=\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}],
\end{aligned}$$

and thus $\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]\leq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}]$. Because $\mathbf{E}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]=u$, it follows that $\mathcal{D}'$ performs unbiased rounding of $u$ to the set $\{Q^{\downarrow}_{\text{COM}},Q^{\uparrow}_{\text{COM}}\}$. Moreover, as $Q^{\downarrow}_{\text{COM}}\leq u^{\downarrow}$ and $Q^{\uparrow}_{\text{COM}}\geq u^{\uparrow}$ by definition of the center of mass, we have

$$\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]=(u-Q^{\downarrow}_{\text{COM}})(Q^{\uparrow}_{\text{COM}}-u)\geq(u-u^{\downarrow})(u^{\uparrow}-u)=\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\ast}}[\widehat{\boldsymbol{u}}],$$

and so

$$\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\ast}}[\widehat{\boldsymbol{u}}]\leq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}'}[\widehat{\boldsymbol{u}}]\leq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}].\qquad\blacksquare$$
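As a sanity check of Lemma B.1, the following sketch computes the two-point unbiased rounding $\mathcal{D}^{\ast}$ in exact arithmetic and compares its variance with that of another unbiased distribution on the same $Q$. The function names and the example values are illustrative assumptions and do not come from the paper.

```python
from bisect import bisect_left, bisect_right
from fractions import Fraction


def two_point_rounding(u, Q):
    """Return D* as a dict {value: probability}: unbiased rounding of u to its
    nearest neighbours in Q. Assumes min(Q) <= u <= max(Q)."""
    q = sorted(Q)
    if not q[0] <= u <= q[-1]:
        raise ValueError("u must lie within the range of Q")
    lo = q[bisect_right(q, u) - 1]  # u_down: largest element of Q that is <= u
    hi = q[bisect_left(q, u)]       # u_up: smallest element of Q that is >= u
    if lo == hi:                    # u is itself in Q, so no rounding is needed
        return {lo: Fraction(1)}
    p_hi = Fraction(u - lo) / Fraction(hi - lo)
    return {lo: 1 - p_hi, hi: p_hi}


def mean(dist):
    return sum(q * p for q, p in dist.items())


def variance(dist):
    m = mean(dist)
    return sum(p * (q - m) ** 2 for q, p in dist.items())


Q = [0, 1, 2, 4]
u = Fraction(3, 2)

d_star = two_point_rounding(u, Q)                # mass 1/2 on each of 1 and 2
d_wide = {0: Fraction(5, 8), 4: Fraction(3, 8)}  # also unbiased, but uses far points

print(variance(d_star))  # equals (u - 1) * (2 - u)
print(variance(d_wide))  # equals (u - 0) * (4 - u)
```

Both distributions have mean $u$, but the one supported on the nearest neighbours has the smaller variance, as the lemma predicts.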

We are now ready to prove Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2). Here, for a given $w\in\mathbb{R}^{d}$ and $Q\subseteq\mathbb{R}$, we use the notation $\mathsf{WDDV}(w,\mathcal{D}):=\max_{x:\lVert x\rVert_{2}\leq 1}\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\langle\widehat{\boldsymbol{u}},x\rangle]$ to denote the cost of choosing rounding distribution $\mathcal{D}$.

###### Proof of Theorem [3.2](https://arxiv.org/html/2606.00289#S3.Thmdefinition2).

Observe that

$$\begin{aligned}
\mathsf{WDDV}(w,\mathcal{D}^{\mathsf{SSQ}}(w,Q))
&=\max_{x:\lVert x\rVert_{2}\leq 1}\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\langle\widehat{\boldsymbol{u}},x\rangle]\\
&=\max_{x:\lVert x\rVert_{2}\leq 1}\sum_{i=1}^{d}x_{i}^{2}\,\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\widehat{\boldsymbol{u}}_{i}] &&\text{($\mathcal{D}^{\mathsf{SSQ}}$ is a product distribution)}\\
&=\max_{i\in[d]}\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\widehat{\boldsymbol{u}}_{i}]=\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\widehat{\boldsymbol{u}}_{i^{\ast}}],
\end{aligned}$$

where $i^{\ast}:=\mathrm{argmax}_{i\in[d]}\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\widehat{\boldsymbol{u}}_{i}]$. Define $\Sigma:=\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\widehat{\boldsymbol{u}}_{i^{\ast}}]$. Now, consider any arbitrary rounding $\mathcal{D}\in\Omega$. It follows that

$$\begin{aligned}
\mathsf{WDDV}(w,\mathcal{D})
&\geq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\langle\widehat{\boldsymbol{u}},e_{i^{\ast}}\rangle]=\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}}[\widehat{\boldsymbol{u}}_{i^{\ast}}]\\
&\geq\mathbf{Var}_{\widehat{\boldsymbol{u}}\sim\mathcal{D}^{\mathsf{SSQ}}}[\widehat{\boldsymbol{u}}_{i^{\ast}}] &&\text{(Lemma [B.1](https://arxiv.org/html/2606.00289#A2.Thmlemma1))}\\
&=\mathsf{WDDV}(w,\mathcal{D}^{\mathsf{SSQ}}(w,Q)),
\end{aligned}$$

as desired. ∎
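The first chain of equalities can be illustrated numerically: when coordinates are rounded independently, the directional variance is $\sum_i x_i^2\,\mathbf{Var}[\widehat{\boldsymbol{u}}_i]$, and over the unit ball this is largest at the coordinate with the largest variance. A minimal sketch with hypothetical helper names, using the closed form $(w_i-w_i^{\downarrow})(w_i^{\uparrow}-w_i)$ for each coordinate's variance:

```python
from bisect import bisect_left, bisect_right


def ssq_coordinate_variances(w, Q):
    """Per-coordinate variance (w_i - down_i) * (up_i - w_i) of unbiased
    rounding to the nearest neighbours in Q. Assumes min(Q) <= w_i <= max(Q)."""
    q = sorted(Q)
    out = []
    for wi in w:
        lo = q[bisect_right(q, wi) - 1]  # largest element of Q that is <= w_i
        hi = q[bisect_left(q, wi)]       # smallest element of Q that is >= w_i
        out.append((wi - lo) * (hi - wi))
    return out


def directional_variance(variances, x):
    """Var[<u_hat, x>] when the coordinates of u_hat are rounded independently."""
    return sum(xi * xi * v for xi, v in zip(x, variances))


def ssq_wddv(w, Q):
    """Worst case over the unit ball, attained at a standard basis vector."""
    return max(ssq_coordinate_variances(w, Q))


w = [0.5, 1.25, 3.0]
Q = [0, 1, 2, 4]
v = ssq_coordinate_variances(w, Q)               # [0.25, 0.1875, 1.0]
print(ssq_wddv(w, Q))                            # variance of the worst coordinate
print(directional_variance(v, [0.6, 0.0, 0.8]))  # a unit vector off the axes does no worse
```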

## Appendix C NP-Hardness of Optimizing $\mathsf{ADDV}$

###### Lemma C.1 (Gershgorin Circle Theorem, Theorem 6.1.1 in [[19](https://arxiv.org/html/2606.00289#bib.bib19)]).

Let $A\in\mathbb{C}^{d\times d}$ and let $R_{i}(A):=\sum_{j\neq i}|A_{ij}|$ for all $i\in[d]$. Then, the eigenvalues of $A$ are contained in the union of discs

$$\bigcup_{i\in[d]}\{z\in\mathbb{C}: |z-A_{ii}|\leq R_{i}(A)\}.$$

###### Corollary C.2 (Lower Bound on Minimum Eigenvalue of Real Symmetric Matrices).

Let $A\in\mathbb{R}^{d\times d}$ be a symmetric matrix with eigenvalues $\lambda_{1}\geq\ldots\geq\lambda_{d}$. Then,

$$\lambda_{d}\geq\min_{i\in[d]}\left(A_{ii}-\sum_{j\neq i}|A_{ij}|\right).$$

###### Proof\.

Since $A$ is real and symmetric, its eigenvalues are real. By Lemma [C.1](https://arxiv.org/html/2606.00289#A3.Thmlemma1), every eigenvalue of $A$ therefore lies in the union of intervals

$$\bigcup_{i\in[d]}\{z\in\mathbb{R}: A_{ii}-R_{i}(A)\leq z\leq A_{ii}+R_{i}(A)\}.$$

It follows that

$$\lambda_{d}\geq\min_{i\in[d]}\left(A_{ii}-R_{i}(A)\right)=\min_{i\in[d]}\left(A_{ii}-\sum_{j\neq i}|A_{ij}|\right).\qquad\blacksquare$$
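The bound of Corollary C.2 is cheap to evaluate. The sketch below computes it for a small adjacency matrix and checks the consequence $x^{\intercal}Ax\geq\lambda_{d}\lVert x\rVert_2^2\geq y\lVert x\rVert_2^2$ on a few vectors, where $y$ denotes the bound. The helper names and the test vectors are assumptions made for illustration.

```python
def gershgorin_lower_bound(A):
    """min_i (A_ii - sum_{j != i} |A_ij|), the bound of Corollary C.2."""
    d = len(A)
    return min(
        A[i][i] - sum(abs(A[i][j]) for j in range(d) if j != i)
        for i in range(d)
    )


def quadratic_form(A, x):
    """x^T A x for a square matrix A given as a list of rows."""
    d = len(A)
    return sum(A[i][j] * x[i] * x[j] for i in range(d) for j in range(d))


# Adjacency matrix of a triangle; its eigenvalues are 2, -1, -1.
A = [[0, 1, 1],
     [1, 0, 1],
     [1, 1, 0]]

y = gershgorin_lower_bound(A)  # each row gives 0 - (1 + 1)

# For symmetric A, x^T A x >= lambda_min * ||x||^2 >= y * ||x||^2 for every x.
for x in ([1, -1, 0], [1, 1, -2], [1, 1, 1], [3, -2, 5]):
    norm_sq = sum(xi * xi for xi in x)
    assert quadratic_form(A, x) >= y * norm_sq
```

Here the bound is not tight (the smallest eigenvalue is $-1$), which is fine for the reduction: shifting by any lower bound on $\lambda_d$ yields a positive semi-definite matrix.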

###### Lemma C\.3\.

Let $M\in\mathbb{R}^{d\times d}$ be a symmetric positive semi-definite matrix. Then, there exists a random variable $\boldsymbol{x}$ such that

$$\mathbf{Cov}(\boldsymbol{x})=\mathbf{E}[\boldsymbol{x}\boldsymbol{x}^{\intercal}]=M.$$

###### Proof\.

Define $\boldsymbol{z}\sim\mathcal{N}(0,I)$ and $\boldsymbol{x}:=M^{1/2}\boldsymbol{z}$. The variable $\boldsymbol{x}$ is well-defined as $M$ is symmetric and positive semi-definite, and thus has a real symmetric square root $M^{1/2}$. Therefore,

$$\begin{aligned}
\mathbf{Cov}(\boldsymbol{x})
&=\mathbf{E}[(\boldsymbol{x}-\mathbf{E}[\boldsymbol{x}])(\boldsymbol{x}-\mathbf{E}[\boldsymbol{x}])^{\intercal}]\\
&=\mathbf{E}[\boldsymbol{x}\boldsymbol{x}^{\intercal}] &&\text{($\mathbf{E}[\boldsymbol{x}]=\mathbf{E}[M^{1/2}\boldsymbol{z}]=0$)}\\
&=M^{1/2}\,\mathbf{E}[\boldsymbol{z}\boldsymbol{z}^{\intercal}]\,M^{1/2}\\
&=M^{1/2}IM^{1/2}=M.\qquad\blacksquare
\end{aligned}$$
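The construction can be tried out numerically. The sketch below deliberately swaps the symmetric square root $M^{1/2}$ for a Cholesky factor $L$ with $LL^{\intercal}=M$, which is easier to compute by hand and yields the same covariance, since $\mathbf{E}[L\boldsymbol{z}\boldsymbol{z}^{\intercal}L^{\intercal}]=LL^{\intercal}$. It assumes $M$ is positive definite (the lemma also covers the semi-definite case), and all function names are hypothetical.

```python
import math
import random


def cholesky(M):
    """Lower-triangular L with L L^T = M. Assumes M is symmetric positive definite."""
    d = len(M)
    L = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                L[i][j] = math.sqrt(M[i][i] - s)
            else:
                L[i][j] = (M[i][j] - s) / L[j][j]
    return L


def sample(L, rng):
    """Draw x = L z with z ~ N(0, I)."""
    d = len(L)
    z = [rng.gauss(0.0, 1.0) for _ in range(d)]
    return [sum(L[i][k] * z[k] for k in range(i + 1)) for i in range(d)]


def empirical_second_moment(L, n, seed=0):
    """Monte Carlo estimate of E[x x^T] from n samples."""
    rng = random.Random(seed)
    d = len(L)
    acc = [[0.0] * d for _ in range(d)]
    for _ in range(n):
        x = sample(L, rng)
        for i in range(d):
            for j in range(d):
                acc[i][j] += x[i] * x[j] / n
    return acc


M = [[2.0, 1.0, 1.0],
     [1.0, 2.0, 1.0],
     [1.0, 1.0, 2.0]]
L = cholesky(M)
estimate = empirical_second_moment(L, n=20000)  # entries should be close to M
```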


###### Proof\.

We show a reduction from the decision version of unweighted Max-Cut. Consider an instance given by an unweighted (simple) graph $G$. The algorithm is as follows.

Algorithm 1: Max-Cut via $\mathsf{ADDV}$

Input: Graph $G$, oracle access to $\mathsf{ADDV}_{\mathcal{X}}(w,Q)$

Output: Max-Cut$(G)$

1. Construct adjacency matrix $A$ of $G$.
2. Let $y:=\min_{i\in[d]}\left(A_{ii}-\sum_{j\neq i}|A_{ij}|\right)$ and $M:=A-yI$.
3. Let $\mathcal{D}^{\text{opt}}:=\mathsf{ADDV}_{\mathcal{N}(0,M)}(0,\{-1,1\})$.
4. Let $\widehat{w}^{\text{opt}}$ be an arbitrary vector from the support of $\mathcal{D}^{\text{opt}}$.
5. Return $\frac{1}{4}\sum_{i\neq j}M_{ij}-\frac{1}{4}\sum_{i\neq j}\widehat{w}^{\text{opt}}_{i}\widehat{w}^{\text{opt}}_{j}M_{ij}$.

The adjacency matrix $A\in\mathbb{R}^{d\times d}$ of $G$ is

$$A_{ij}:=\begin{cases}1 & \text{if } (i,j)\in G\\ 0 & \text{otherwise.}\end{cases}$$

Then, define $y:=\min_{i\in[d]}\left(A_{ii}-\sum_{j\neq i}|A_{ij}|\right)$ and let $M:=A-yI$. Observe that $M$ is positive semi-definite and symmetric. To see why this is the case, let $(v,\lambda)$ be an eigenvector-eigenvalue pair of $A$. Then,

$$Mv=(A-yI)v=Av-yIv=\lambda v-yv=(\lambda-y)v,$$

so the eigenvalues of $M$ are $\lambda_{1}-y\geq\ldots\geq\lambda_{d}-y$. By Corollary [C.2](https://arxiv.org/html/2606.00289#A3.Thmlemma2), $\lambda_{d}\geq y$, so all eigenvalues of $M$ are real and non-negative; hence $M$ is positive semi-definite. Symmetry of $M$ follows immediately from symmetry of $A$.

We now construct a corresponding instance of $\mathsf{ADDV}$ and show that its optimum yields a solution to Max-Cut on $G$. In particular, consider the instance given by $w=0\in\mathbb{R}^{d}$, $Q=\{-1,1\}$, and $\mathcal{X}=\mathcal{N}(0,M)$. Note that $\mathcal{X}$ is a valid distribution due to Lemma [C.3](https://arxiv.org/html/2606.00289#A3.Thmlemma3), and recall that $\Omega$ denotes the set of unbiased rounding distributions with support $\subseteq Q^{d}$. Then, the $\mathsf{ADDV}$ objective cost is given by

$$\begin{aligned}
&\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\boldsymbol{x}\sim\mathcal{N}(0,M)}\left[\mathbf{Var}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}[\langle\widehat{\boldsymbol{w}},\boldsymbol{x}\rangle]\right]\\
&=\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\substack{\boldsymbol{x}\sim\mathcal{N}(0,M)\\ \widehat{\boldsymbol{w}}\sim\mathcal{D}}}[\langle\widehat{\boldsymbol{w}},\boldsymbol{x}\rangle^{2}] &&\text{($\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}[\langle\widehat{\boldsymbol{w}},\boldsymbol{x}\rangle]=\langle w,x\rangle=0$)}\\
&=\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}\left[\sum_{i,j=1}^{d}\widehat{\boldsymbol{w}}_{i}\widehat{\boldsymbol{w}}_{j}\cdot\mathbf{E}_{\boldsymbol{x}\sim\mathcal{N}(0,M)}[\boldsymbol{x}_{i}\boldsymbol{x}_{j}]\right]\\
&=\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}\left[\sum_{i=1}^{d}\widehat{\boldsymbol{w}}_{i}^{2}\cdot\mathbf{E}_{\boldsymbol{x}\sim\mathcal{N}(0,M)}[\boldsymbol{x}_{i}^{2}]+\sum_{i\neq j}\widehat{\boldsymbol{w}}_{i}\widehat{\boldsymbol{w}}_{j}\cdot\mathbf{E}_{\boldsymbol{x}\sim\mathcal{N}(0,M)}[\boldsymbol{x}_{i}\boldsymbol{x}_{j}]\right]\\
&=\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}\left[\sum_{i=1}^{d}\mathbf{E}_{\boldsymbol{x}\sim\mathcal{N}(0,M)}[\boldsymbol{x}_{i}^{2}]+\sum_{i\neq j}\widehat{\boldsymbol{w}}_{i}\widehat{\boldsymbol{w}}_{j}\cdot\mathbf{E}_{\boldsymbol{x}\sim\mathcal{N}(0,M)}[\boldsymbol{x}_{i}\boldsymbol{x}_{j}]\right] &&\text{($Q=\{-1,1\}$)}\\
&=\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}\left[\sum_{i=1}^{d}M_{ii}+\sum_{i\neq j}\widehat{\boldsymbol{w}}_{i}\widehat{\boldsymbol{w}}_{j}\cdot M_{ij}\right]. &&\text{($\mathbf{Cov}(\boldsymbol{x})=M$)}
\end{aligned}$$

We now construct an optimal distribution $\mathcal{D}^{\ast}$ for $\mathsf{ADDV}_{\mathcal{N}(0,M)}(0,\{-1,1\})$. Define

$$\widehat{w}^{\ast}:=\mathop{\mathrm{argmin}}_{\widehat{w}\in\{-1,1\}^{d}}\left[\sum_{i=1}^{d}M_{ii}+\sum_{i\neq j}\widehat{w}_{i}\widehat{w}_{j}\cdot M_{ij}\right]=\mathop{\mathrm{argmin}}_{\widehat{w}\in\{-1,1\}^{d}}\left[\sum_{i\neq j}\widehat{w}_{i}\widehat{w}_{j}\cdot M_{ij}\right].$$

Since the objective is even, let $\mathcal{D}^{\ast}$ place equal weight on $\widehat{w}^{\ast}$ and $-\widehat{w}^{\ast}$ and observe that

$$\begin{aligned}
\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}^{\ast}}\left[\sum_{i=1}^{d}M_{ii}+\sum_{i\neq j}\widehat{\boldsymbol{w}}_{i}\widehat{\boldsymbol{w}}_{j}M_{ij}\right]
&=\sum_{i=1}^{d}M_{ii}+\sum_{i\neq j}\widehat{w}^{\ast}_{i}\widehat{w}^{\ast}_{j}M_{ij}\\
&=\min_{\mathcal{D}\in\Omega}\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}}\left[\sum_{i=1}^{d}M_{ii}+\sum_{i\neq j}\widehat{\boldsymbol{w}}_{i}\widehat{\boldsymbol{w}}_{j}\cdot M_{ij}\right]
\end{aligned}$$

by definition of $\widehat{w}^{\ast}$. Additionally, for any $x\in\mathrm{supp}(\mathcal{X})$,

$$\mathbf{E}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}^{\ast}}[\langle\widehat{\boldsymbol{w}},x\rangle]=\tfrac{1}{2}\langle\widehat{w}^{\ast},x\rangle+\tfrac{1}{2}\langle-\widehat{w}^{\ast},x\rangle=0=\langle w,x\rangle,$$

so $\mathcal{D}^{\ast}$ is unbiased. Thus, $\mathcal{D}^{\ast}$ is an optimal distribution giving objective value

$$\sum_{i=1}^{d}M_{ii}+\min_{\widehat{w}\in\{-1,1\}^{d}}\sum_{i\neq j}\widehat{w}_{i}\widehat{w}_{j}M_{ij}.$$

$\mathcal{D}^{\ast}$ may not be the unique optimal distribution, but any optimal distribution must be supported only on vectors $\widehat{w}$ that minimize the quantity $\sum_{i\neq j}\widehat{w}_{i}\widehat{w}_{j}M_{ij}$ (since its objective value is the same as that of $\mathcal{D}^{\ast}$). Therefore, given any optimal distribution $\mathcal{D}^{\text{opt}}$, we can extract an arbitrary vector $\widehat{w}^{\text{opt}}$ from its support.

Finally, observe that the max-cut objective on graph $G$ can be written as

$$\text{Max-Cut}(G):=\max_{\widehat{w}\in\{-1,1\}^{d}}\frac{1}{2}\sum_{i\neq j}\frac{(\widehat{w}_{i}-\widehat{w}_{j})^{2}}{4}\cdot M_{ij}=\frac{1}{4}\sum_{i\neq j}M_{ij}-\min_{\widehat{w}\in\{-1,1\}^{d}}\frac{1}{4}\sum_{i\neq j}\widehat{w}_{i}\widehat{w}_{j}\cdot M_{ij},$$

where the second equality uses $(\widehat{w}_{i}-\widehat{w}_{j})^{2}=2-2\widehat{w}_{i}\widehat{w}_{j}$ for $\widehat{w}\in\{-1,1\}^{d}$, or equivalently

$$\text{Max-Cut}(G)=\frac{1}{4}\sum_{i\neq j}M_{ij}-\frac{1}{4}\sum_{i\neq j}\widehat{w}^{\text{opt}}_{i}\widehat{w}^{\text{opt}}_{j}M_{ij}.$$

##### Runtime\.

It is clear that constructing $A$, $y$, and $M$ takes $O(d^{2})$ time. Because computing and reporting $\mathsf{ADDV}_{\mathcal{N}(0,M)}(0,\{-1,1\})$ is assumed to take polynomial time, extracting $\widehat{w}^{\text{opt}}$ from the support of $\mathcal{D}^{\text{opt}}$ also takes polynomial time. Therefore, Algorithm [1](https://arxiv.org/html/2606.00289#alg1) is a polynomial-time reduction from unweighted Max-Cut to $\mathsf{ADDV}$. Because Max-Cut on unweighted, simple graphs is NP-Hard [[16](https://arxiv.org/html/2606.00289#bib.bib16)], this proves the claim. ∎
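To make the reduction concrete, the sketch below follows the steps of Algorithm 1 on tiny graphs. No $\mathsf{ADDV}$ oracle is available here, so it is replaced by exhaustive search over $\{-1,1\}^{d}$ for a minimizer of $\sum_{i\neq j}\widehat{w}_i\widehat{w}_jM_{ij}$; this stand-in takes exponential time and only serves to check the final formula against a direct brute-force Max-Cut. Function names and the example graphs are assumptions.

```python
from itertools import product


def max_cut_via_reduction(d, edges):
    """Follow Algorithm 1 on a simple graph with vertices 0..d-1.
    The ADDV oracle is replaced by exhaustive search over {-1, 1}^d,
    so this runs in exponential time and is only meant for tiny graphs."""
    A = [[0] * d for _ in range(d)]
    for i, j in edges:
        A[i][j] = A[j][i] = 1
    y = min(A[i][i] - sum(abs(A[i][j]) for j in range(d) if j != i)
            for i in range(d))
    M = [[A[i][j] - (y if i == j else 0) for j in range(d)] for i in range(d)]

    def off_diagonal(w_hat):
        return sum(w_hat[i] * w_hat[j] * M[i][j]
                   for i in range(d) for j in range(d) if i != j)

    # Stand-in for "take any vector in the support of an optimal distribution".
    w_opt = min(product((-1, 1), repeat=d), key=off_diagonal)
    total = sum(M[i][j] for i in range(d) for j in range(d) if i != j)
    # total - off_diagonal(w_opt) is 4 times the number of cut edges.
    return (total - off_diagonal(w_opt)) // 4


def max_cut_brute_force(d, edges):
    """Directly maximise the number of edges crossing a bipartition."""
    return max(sum(1 for i, j in edges if side[i] != side[j])
               for side in product((0, 1), repeat=d))


triangle = [(0, 1), (1, 2), (0, 2)]
four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(max_cut_via_reduction(3, triangle), max_cut_brute_force(3, triangle))
print(max_cut_via_reduction(4, four_cycle), max_cut_brute_force(4, four_cycle))
```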

## Appendix D Average Directional Variance

We begin by making important observations about the $\mathsf{ADV}$ objective and defining quantities that will be useful for the algorithms in the remainder of the section. Oftentimes, we also think of $w$ as a multiset.

Furthermore, we assume that the input distribution $\mathcal{X}$ has $\mathrm{supp}(\mathcal{X})\subseteq\mathbb{R}^{d}$ and has finite marginal second moments (i.e. $\mathbf{E}_{\boldsymbol{x}\sim\mathcal{X}}[\boldsymbol{x}_{i}^{2}]$ is finite for all $i\in[d]$). The problems we consider become trivial otherwise. Our algorithms also assume full-precision access to the marginal second moments.

###### Lemma D\.1\.

Let $\mathcal{X}$ be an arbitrary input distribution and $w\in\mathbb{R}^{d}$ be sorted such that $w_{1}\leq\cdots\leq w_{d}$. Let $Q^{\ast}$ be such that $|Q^{\ast}|=s$ and $\mathsf{ADV}_{\mathcal{X}}(w,Q^{\ast})=\mathsf{ADV}_{\mathcal{X}}(w,s)$. Then, $w_{1},w_{d}\in Q^{\ast}$.

###### Lemma D\.2\.

For every input distribution $\mathcal{X}$, $w\in\mathbb{R}^{d}$, and $s\in\mathbb{N}$, there exists a quantization set $Q^{\ast}\subseteq\mathbb{R}$ such that (i) $|Q^{\ast}|=s$, (ii) $Q^{\ast}\subseteq w$, and (iii) $\mathsf{ADV}_{\mathcal{X}}(w,Q^{\ast})=\mathsf{ADV}_{\mathcal{X}}(w,s)$.

We now define and prove a property of the optimization problem that is the crux of all the results in this section. We first define a shorthand, with $w$ indexed so $w_{1}\leq\cdots\leq w_{d}$,

$$C[j,k]\triangleq\sum_{i=j}^{k}\lambda_{i}\cdot(w_{k}-w_{i})(w_{i}-w_{j}),$$

so that $C\in\mathbb{R}^{d\times d}$. We use the standard convention that $C[j,k]=0$ if $j>k$. Here, $\lambda_{i}:=\mathbf{E}_{\boldsymbol{x}\sim\mathcal{X}}[\boldsymbol{x}_{i}^{2}]\in\mathbb{R}^{+}$ denotes a weight associated with the element $w_{i}$. The quantity $C[j,k]$ represents the (weighted) sum of variances of points in the region $[w_{j},w_{k}]$ assuming $w_{j},w_{k}\in Q$.

###### Lemma D.3 (Lemma 4.2 of [[4](https://arxiv.org/html/2606.00289#bib.bib4)]).

For every input distribution $\mathcal{X}$, matrix $C$ satisfies the Concave Monge property (Definition [A.4](https://arxiv.org/html/2606.00289#A1.Thmdefinition4)).

###### Lemma D\.4\.

For every input distribution $\mathcal{X}$, matrix $C$ is sorted (Definition [A.1](https://arxiv.org/html/2606.00289#A1.Thmdefinition1)).

The proofs of Lemmas [D.1](https://arxiv.org/html/2606.00289#A4.Thmlemma1), [D.2](https://arxiv.org/html/2606.00289#A4.Thmlemma2), [D.3](https://arxiv.org/html/2606.00289#A4.Thmlemma3), and [D.4](https://arxiv.org/html/2606.00289#A4.Thmlemma4) are deferred to Appendix [G](https://arxiv.org/html/2606.00289#A7). Note that Lemma [D.3](https://arxiv.org/html/2606.00289#A4.Thmlemma3) is the weighted version of Lemma 4.2 in [[4](https://arxiv.org/html/2606.00289#bib.bib4)], and its proof closely follows the proof of that lemma.

###### Lemma D\.5\.

Given (unsorted) $w\in\mathbb{R}^{d}$, with $O(d\log d)$ pre-processing time, any entry $C[j,k]$ of $C$ can be computed in $O(1)$ time.

###### Proof\.

Sort $w$ so that $w_{1}\leq\cdots\leq w_{d}$. We define prefix sum vectors

$$\alpha[j]=\sum_{i=1}^{j}\lambda_{i},\qquad\beta[j]=\sum_{i=1}^{j}\lambda_{i}w_{i},\qquad\gamma[j]=\sum_{i=1}^{j}\lambda_{i}w_{i}^{2}.$$

Then,

$$\begin{aligned}
C[j,k]
&:=\sum_{i=j}^{k}\lambda_{i}\cdot(w_{k}-w_{i})(w_{i}-w_{j})=\sum_{i=j+1}^{k}\lambda_{i}\cdot(w_{k}-w_{i})(w_{i}-w_{j})\\
&=-w_{j}w_{k}\sum_{i=j+1}^{k}\lambda_{i}+(w_{j}+w_{k})\cdot\sum_{i=j+1}^{k}\lambda_{i}w_{i}-\sum_{i=j+1}^{k}\lambda_{i}w_{i}^{2}\\
&=-w_{j}w_{k}\cdot(\alpha[k]-\alpha[j])+(w_{j}+w_{k})\cdot(\beta[k]-\beta[j])-(\gamma[k]-\gamma[j]).
\end{aligned}$$

##### Runtime\.

Sorting $w$ takes $O(d\log d)$ time. Then, the prefix sums $\alpha,\beta,\gamma$ can be computed in $O(d)$ time once the vector $w$ is sorted. Thus, the total pre-processing time is $O(d\log d)$. Computing a specific entry $C[j,k]$ can be done with $O(1)$ accesses to $w$ and the prefix sums $\alpha,\beta,\gamma$. ∎
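The prefix-sum evaluation of $C[j,k]$ can be sketched as follows. Indices are 0-based positions in the sorted order, each prefix array carries a leading zero, and the names `build_cost_oracle` and `naive_cost` as well as the sample data are illustrative assumptions.

```python
from itertools import accumulate


def build_cost_oracle(w, lam):
    """Sort (w_i, lambda_i) by w_i and return (sorted_w, cost), where cost(j, k)
    evaluates C[j, k] in O(1). Indices are 0-based positions in the sorted order."""
    pairs = sorted(zip(w, lam))
    ws = [p[0] for p in pairs]
    ls = [p[1] for p in pairs]
    # Prefix sums with a leading 0, so alpha[t] is the sum of the first t weights.
    alpha = [0] + list(accumulate(ls))
    beta = [0] + list(accumulate(l * x for l, x in zip(ls, ws)))
    gamma = [0] + list(accumulate(l * x * x for l, x in zip(ls, ws)))

    def cost(j, k):
        if j > k:
            return 0
        a = alpha[k + 1] - alpha[j + 1]  # sum of lambda_i        over j < i <= k
        b = beta[k + 1] - beta[j + 1]    # sum of lambda_i w_i    over j < i <= k
        g = gamma[k + 1] - gamma[j + 1]  # sum of lambda_i w_i^2  over j < i <= k
        return -ws[j] * ws[k] * a + (ws[j] + ws[k]) * b - g

    return ws, cost


def naive_cost(ws, ls, j, k):
    """Direct O(k - j) evaluation of the definition, for comparison."""
    return sum(ls[i] * (ws[k] - ws[i]) * (ws[i] - ws[j]) for i in range(j, k + 1))


w = [3, 1, 4, 1, 5]
lam = [1, 2, 1, 3, 2]
ws, cost = build_cost_oracle(w, lam)
print(cost(0, len(ws) - 1))  # cost of keeping only the smallest and largest value
```

With floating-point inputs the subtraction of large prefix sums can lose precision, so a production implementation may want compensated summation or exact arithmetic.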

### D\.1 Exact Algorithms

In Appendix K of \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\], the authors note that the weighted MSE objective can be optimized using a slight variant of the dynamic programming and matrix-search based algorithm presented in their paper. The crux of the improvement in their result comes from the total monotonicity of the matrix $C$, which allows a more efficient computation of the dynamic program.

The survey paper \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\] outlines many results from the $1$-dimensional $k$-means clustering literature, which immediately give algorithms for vector quantization with error measured by the weighted MSE objective. For example, the same dynamic programming and matrix-searching solution of Corollary [D.6](https://arxiv.org/html/2606.00289#A4.Thmlemma6) of \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\] was first published by \[[30](https://arxiv.org/html/2606.00289#bib.bib30)\] in the setting of 1D $k$-means. Formally, this implies the following about the objective $\mathsf{ADV}$.

###### Corollary D\.6\(\[[30](https://arxiv.org/html/2606.00289#bib.bib30)\], Adapted Algorithm 1 of\[[4](https://arxiv.org/html/2606.00289#bib.bib4)\]\)\.

There exists an algorithm that, for any $w \in \mathbb{R}^{d}$, input distribution $\mathcal{X}$, and target quantization set size $s \in \mathbb{N}$, returns a quantization set $Q$ such that $|Q| = s$ and $\mathsf{ADV}_{\mathcal{X}}(w,Q) = \mathsf{ADV}_{\mathcal{X}}(w,s)$. The algorithm has runtime $O(d \log d + d \cdot s)$ and uses $O(ds)$ space.

More generally, there is a spectrum of theoretical results that complete the state-of-the-art optimality profile in differing regimes of $k$ and $d$. In particular, the results outlined in the survey paper \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\] can be applied to any problem whose structure satisfies the Concave Monge property (Definition [A.4](https://arxiv.org/html/2606.00289#A1.Thmdefinition4)), including but not limited to $1$-dimensional $k$-means clustering, $1$-dimensional $k$-medians clustering, and $\mathsf{ADV}_{\mathcal{X}}$.

More specifically, the algorithms outlined in \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\] solve the following problem. Let $G$ be a complete directed acyclic graph on $d$ vertices with weights $A \in \mathbb{R}^{d \times d}$ that satisfy the Concave Monge property (Definition [A.4](https://arxiv.org/html/2606.00289#A1.Thmdefinition4)). Then, given any two vertices $i, j \in [d]$ and an integer $k$, we wish to find the minimum-weight length-$k$ path between $i$ and $j$ in $G$.

\[[3](https://arxiv.org/html/2606.00289#bib.bib3)\] gives (i) an algorithm with runtime $O(d\sqrt{k \log d})$, and (ii) an algorithm with runtime $O(d \log \Delta)$ where $\Delta := \max_{i,j} A[i,j] - \min_{i,j} A[i,j]$. When $k = \Omega(\log d)$, \[[26](https://arxiv.org/html/2606.00289#bib.bib26)\] gives an algorithm with runtime $d\,2^{O(\sqrt{\log\log d \, \log k})}$. At a high level, these algorithms work by solving the following *regularized* version of the problem

$$\min_{\ell,\ \{i_{1},\ldots,i_{\ell}\} \subseteq [d]} \ \sum_{j=1}^{\ell-1} C[i_{j}, i_{j+1}] + \tau\ell,$$

where the task is to find an integer $\ell$ and a path of length $\ell$ that minimizes the regularized objective function (i.e. each edge has additional cost $\tau$). In doing so, the results of \[[3](https://arxiv.org/html/2606.00289#bib.bib3), [26](https://arxiv.org/html/2606.00289#bib.bib26)\] use ideas and results from \[[29](https://arxiv.org/html/2606.00289#bib.bib29)\], which solves the unregularized version (still assuming the weights are Concave Monge).

Notice that by Lemma [D.3](https://arxiv.org/html/2606.00289#A4.Thmlemma3), applying these algorithms to the weights $C[i,j]$ directly gives a solution optimizing $\mathsf{ADV}_{\mathcal{X}}$, implying the following results.

###### Corollary D\.7\(\[[3](https://arxiv.org/html/2606.00289#bib.bib3)\]\)\.

There exist algorithms that, for any $w \in \mathbb{R}^{d}$, input distribution $\mathcal{X}$, and target quantization set size $s \in \mathbb{N}$, return a quantization set $Q$ such that $|Q| = s$ and $\mathsf{ADV}_{\mathcal{X}}(w,Q) = \mathsf{ADV}_{\mathcal{X}}(w,s)$. The algorithms have runtime (i) $O(d\sqrt{s \log d})$ and (ii) $O(d \log \Delta)$ where $\Delta := \max_{i,j} C[i,j] - \min_{i,j} C[i,j]$.

###### Corollary D\.8\(\[[26](https://arxiv.org/html/2606.00289#bib.bib26)\]\)\.

There exists an algorithm that, for any $w \in \mathbb{R}^{d}$, input distribution $\mathcal{X}$, and target quantization set size $s = \Omega(\log d)$, returns a quantization set $Q$ such that $|Q| = s$ and $\mathsf{ADV}_{\mathcal{X}}(w,Q) = \mathsf{ADV}_{\mathcal{X}}(w,s)$. The algorithm has runtime $d\,2^{O(\sqrt{\log\log d \, \log s})}$.

This summarizes the state of the art in exact algorithms for $\mathsf{ADV}_{\mathcal{X}}$.

### D\.2 Approximation Algorithms

The work of \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\] provides a bi-criteria approximation algorithm for the *unweighted* MSE objective. In particular, their algorithm returns a quantization set of size $2s-2$ whose quality is an additive approximation to that of the optimal set. Following their analysis, the same algorithm analyzed under the *weighted* MSE objective yields the following weak additive approximation guarantee.

###### Corollary D\.9\(Lemma 6\.1 of\[[4](https://arxiv.org/html/2606.00289#bib.bib4)\]\)\.

There exists an algorithm that, for any $w \in \mathbb{R}^{d}$, input distribution $\mathcal{X}$, and target quantization set size $s \in \mathbb{N}$, returns a quantization set $Q$ such that $|Q| = 2s-2$ and $\mathsf{ADV}_{\mathcal{X}}(w,Q) \leq \mathsf{ADV}_{\mathcal{X}}(w,s) + \sum_{i=1}^{d} \lambda_{i} \cdot \Delta^{2}/m^{2}$. Here, $\Delta := \max_{i,j \in [d]} w_{i} - w_{j}$. The algorithm has runtime $O(d + ms)$.

In this section, we develop approximation algorithms for optimizing the $\mathsf{ADV}_{\mathcal{X}}$ (weighted MSE) objective. We first give an $s$-approximation, then improve it to a $(1+\varepsilon)$-approximation. Along the way, we observe that the main ideas from solutions to $\mathsf{MDV}$ and $\mathsf{ADV}_{\mathcal{X}}$ (in particular, building data-dependent coresets) can be used to develop a much stronger approximation algorithm for quantization under the unweighted MSE objective. That result is detailed in Appendix [E](https://arxiv.org/html/2606.00289#A5).

#### D\.2\.1 An $s$-approximation

Here, we give an $s$-approximation algorithm by defining a new intermediate objective function that (loosely speaking) “interpolates” between $\mathsf{MDV}$ and $\mathsf{ADV}_{\mathcal{X}}$. We then show that the optimum of this new *mixed* objective is an $s$-approximation to that of $\mathsf{ADV}_{\mathcal{X}}$. In particular, consider a quantization set $Q = \{w_{i_{1}}, \ldots, w_{i_{|Q|}}\} \subseteq w$ where $w_{i_{1}} \leq \ldots \leq w_{i_{|Q|}}$.

We now define the objective

$$\mathsf{MixDV}_{\mathcal{X}}(w,s) := \min_{Q \subset w \,:\, |Q| \leq s} \ \max_{j \in [s-1]} C[i_{j}, i_{j+1}], \tag{2}$$

which seeks the quantization set $Q$ (lying entirely inside $w$) that minimizes the maximum sum of the variances in the intervals between adjacent quantization points of $Q$. We also define a shorthand for the objective value of a fixed quantization set $Q$:

$$\mathsf{MixDV}_{\mathcal{X}}(w,Q) := \max_{j \in [|Q|-1]} C[i_{j}, i_{j+1}].$$

We now prove the following relationship between $\mathsf{MixDV}_{\mathcal{X}}$ and $\mathsf{ADV}_{\mathcal{X}}$.

###### Lemma D\.10\.

For $s \geq 2$, $\mathsf{ADV}_{\mathcal{X}}(w,s)/s \leq \mathsf{MixDV}_{\mathcal{X}}(w,s) \leq \mathsf{ADV}_{\mathcal{X}}(w,s)$.

###### Proof\.

Notice that for any quantization set $Q = \{w_{i_{1}}, \ldots, w_{i_{s}}\} \subseteq w$ of size $s$, it is the case that

$$\mathsf{ADV}_{\mathcal{X}}(w,s) \leq \mathsf{ADV}_{\mathcal{X}}(w,Q) = \sum_{j \in [s-1]} C[i_{j}, i_{j+1}] \leq |Q| \cdot \mathsf{MixDV}_{\mathcal{X}}(w,Q),$$

where all steps follow by the definitions of $\mathsf{ADV}_{\mathcal{X}}$ and $\mathsf{MixDV}_{\mathcal{X}}$. Let $Q^{\ast}_{\mathsf{MixDV}} \subseteq w$ be such that $|Q^{\ast}_{\mathsf{MixDV}}| = s$ and $\mathsf{MixDV}_{\mathcal{X}}(w, Q^{\ast}_{\mathsf{MixDV}}) = \mathsf{MixDV}_{\mathcal{X}}(w,s)$. Then, instantiating the above with $Q^{\ast}_{\mathsf{MixDV}}$, we conclude that $\mathsf{MixDV}_{\mathcal{X}}(w,s) \geq \mathsf{ADV}_{\mathcal{X}}(w,s)/s$. It remains to show that $\mathsf{MixDV}_{\mathcal{X}}(w,s) \leq \mathsf{ADV}_{\mathcal{X}}(w,s)$. Let $Q^{\ast}_{\mathsf{ADV}}$ be such that $|Q^{\ast}_{\mathsf{ADV}}| = s$ and $\mathsf{ADV}_{\mathcal{X}}(w, Q^{\ast}_{\mathsf{ADV}}) = \mathsf{ADV}_{\mathcal{X}}(w,s)$. It is clear that

$$\mathsf{MixDV}_{\mathcal{X}}(w,s) \leq \mathsf{MixDV}_{\mathcal{X}}(w, Q^{\ast}_{\mathsf{ADV}}) \leq \mathsf{ADV}_{\mathcal{X}}(w, Q^{\ast}_{\mathsf{ADV}}) = \mathsf{ADV}_{\mathcal{X}}(w,s),$$

where again all steps follow by definition of the objectives. ∎

Lemma [D.10](https://arxiv.org/html/2606.00289#A4.Thmlemma10) suggests a natural approximation algorithm for $\mathsf{ADV}_{\mathcal{X}}$. In particular, an exact solution to $\mathsf{MixDV}_{\mathcal{X}}$ immediately gives an $s$-approximation to $\mathsf{ADV}_{\mathcal{X}}$.
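The sandwich bound of Lemma D.10 can be observed on a tiny instance by brute force. The sketch below is illustrative only: the function names are hypothetical, $w$ is assumed to be already sorted with nonnegative weights, and every candidate set is assumed to contain both endpoints $w_{1}$ and $w_{d}$ (as the initialization of Algorithm 2 does). Positions are 0-indexed.

```python
from itertools import combinations


def interval_cost(ws, ls, j, k):
    """C[j, k] for 0-indexed positions j <= k, evaluated directly."""
    return sum(ls[i] * (ws[k] - ws[i]) * (ws[i] - ws[j]) for i in range(j, k + 1))


def brute_force_optima(ws, ls, s):
    """Enumerate all size-s sets containing both endpoints (2 <= s <= len(ws))
    and return the optimal sum objective and the optimal max objective."""
    d = len(ws)
    best_adv = best_mix = float("inf")
    for inner in combinations(range(1, d - 1), s - 2):
        idx = (0, *inner, d - 1)
        costs = [interval_cost(ws, ls, a, b) for a, b in zip(idx, idx[1:])]
        best_adv = min(best_adv, sum(costs))   # ADV-style objective: sum of interval costs
        best_mix = min(best_mix, max(costs))   # MixDV-style objective: largest interval cost
    return best_adv, best_mix
```

The enumeration is exponential in $s$, so it is only meant as a correctness check on small inputs, for example to confirm that `best_adv / s <= best_mix <= best_adv`.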

Input: $w \in \mathbb{R}^{d}$, $s \in \mathbb{N}$, $\lambda \in \mathbb{R}^{d}$ where $\lambda_{i} = \mathbf{E}_{x \sim \mathcal{X}}[x_{i}^{2}]$ for $i \in [d]$

Output: Quantization set $Q$

- Sort $w$ and compute prefix sums $\alpha, \beta, \gamma$
- Initialize $k^{\uparrow} \leftarrow d^{2}$ and $k^{\downarrow} \leftarrow 0$
- Initialize $Q^{\ast} \leftarrow \emptyset$
- **while** $k^{\uparrow} \geq k^{\downarrow}$ **do**
  - $k \leftarrow \lfloor (k^{\uparrow} + k^{\downarrow})/2 \rfloor$
  - $v \leftarrow \texttt{Sorted-Selection}(\alpha, \beta, \gamma, k)$
  - $\triangleright$ Check whether objective value $v$ is possible with $s$ quantization points
  - Initialize $Q \leftarrow \{w_{1}, w_{d}\}$ and $i \leftarrow 1$
  - **for** $j = 2, \ldots, d$ **do**
    - **if** $C[i,j] > v$ **then**
      - $Q \leftarrow Q \cup \{w_{j}\}$
      - Set $i \leftarrow j$
  - **if** $|Q| \leq s$ **then**
    - $k^{\uparrow} \leftarrow k - 1$
    - $Q^{\ast} \leftarrow Q$
  - **else**
    - $k^{\downarrow} \leftarrow k + 1$
- **return** $Q^{\ast}$

Algorithm 2: Exact Algorithm for $\mathsf{MixDV}_{\mathcal{X}}$

###### Theorem 1\.

There exists an algorithm that, for any $w \in \mathbb{R}^{d}$, input distribution $\mathcal{X}$, and target quantization set size $s \in \mathbb{N}$, returns a quantization set $Q$ such that $|Q| = s$ and $\mathsf{MixDV}_{\mathcal{X}}(w,Q) \leq s \cdot \mathsf{MixDV}_{\mathcal{X}}(w,s)$. The runtime of the algorithm is $O(d \log d)$.

###### Proof\.

To produce an $s$-approximation to $\mathsf{ADV}$ without relaxing the target quantization set size, we use Algorithm [2](https://arxiv.org/html/2606.00289#alg2) to exactly solve the $\mathsf{MixDV}_{\mathcal{X}}$ objective.

Observe that the true optimal cost $\mathsf{MixDV}_{\mathcal{X}}(w,s)$ is of the form $C[i_{j}, i_{k}]$ for some $j, k \in [d]$. Algorithm [2](https://arxiv.org/html/2606.00289#alg2) binary searches over the $O(d^{2})$ many possible objective values, using the linear-time algorithm `Sorted-Selection` from Lemma [A.1](https://arxiv.org/html/2606.00289#A1.Thmlemma1) to direct the binary search. The fact that the matrix $C$ is sorted is proven as Lemma [D.4](https://arxiv.org/html/2606.00289#A4.Thmlemma4) in Section [G](https://arxiv.org/html/2606.00289#A7). It remains to prove the correctness of the checking step (the inner for-loop of Algorithm [2](https://arxiv.org/html/2606.00289#alg2)). A given objective value $v \in \mathbb{R}$ is achievable if there exists an arrangement of at most $s$ quantization points $Q \subset w$ with $\mathsf{MixDV}_{\mathcal{X}}(w,Q) \leq v$. Algorithm [2](https://arxiv.org/html/2606.00289#alg2) greedily constructs a quantization set by performing a linear scan through the sorted $w$ and only adding a quantization point once the sum of variances exceeds $v$. It is easy to see that if $\mathsf{MixDV}_{\mathcal{X}}(w,s) \leq v$, then the greedily constructed quantization set $Q_{\text{Greedy}}$ must have $|Q_{\text{Greedy}}| \leq s$ and $\mathsf{MixDV}_{\mathcal{X}}(w, Q_{\text{Greedy}}) \leq v$.

By construction, $Q_{\text{Greedy}}$ is such that $\mathsf{MixDV}_{\mathcal{X}}(w, Q_{\text{Greedy}}) \leq v$. Therefore, if $|Q_{\text{Greedy}}| \leq s$, the objective value $v$ is achievable. Otherwise, it is not. This check (the final if/else of the while loop in Algorithm [2](https://arxiv.org/html/2606.00289#alg2)) directs the binary search.

##### Runtime\.

Sorting $w$ takes $O(d \log d)$ time. Computing the prefix sums $\alpha, \beta, \gamma$ takes $O(d \log d)$ time as per Lemma [D.5](https://arxiv.org/html/2606.00289#A4.Thmlemma5). The while loop performs a binary search over $O(d^{2})$ many values and is thus run only $O(\log d)$ times. The $k$-selection algorithm takes time $O(d)$ as per Lemma [A.1](https://arxiv.org/html/2606.00289#A1.Thmlemma1). Note that the matrix $C$ is never explicitly written out by the algorithm; it is implicitly defined by the prefix sums $\alpha, \beta, \gamma$, which allow $O(1)$ access time to an entry of $C$ as per Lemma [D.5](https://arxiv.org/html/2606.00289#A4.Thmlemma5). The *check* logic in Algorithm [2](https://arxiv.org/html/2606.00289#alg2) takes $O(d)$ time. Therefore, the total algorithm runtime is $O(d \log d)$. ∎
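To make the structure of the proof concrete, here is a small Python sketch of a binary search with a greedy feasibility check. It is not the paper's implementation and departs from Algorithm 2 in two labeled ways. First, instead of `Sorted-Selection` it materializes and sorts all $O(d^{2})$ candidate values, so it does not achieve the $O(d \log d)$ runtime. Second, when extending the current interval to $w_{j}$ would exceed $v$, it places the point at $w_{j-1}$; this is one reading of the greedy step that keeps every interval's cost at most $v$, as the proof requires. The name `mixdv_exact` is hypothetical, and weights are assumed to be permuted along with $w$ when sorting.

```python
def mixdv_exact(w, lam, s):
    """Return (Q, v): at most s values of w minimizing the largest interval
    cost, and that optimal value v. Requires s >= 2 and len(w) >= 2."""
    pairs = sorted(zip(w, lam))          # assumption: weights follow their entries
    ws = [x for x, _ in pairs]
    d = len(ws)
    alpha, beta, gamma = [0.0], [0.0], [0.0]
    for x, l in pairs:
        alpha.append(alpha[-1] + l)
        beta.append(beta[-1] + l * x)
        gamma.append(gamma[-1] + l * x * x)

    def C(j, k):
        """Interval cost between 0-indexed positions j <= k, in O(1) time."""
        if k - j <= 1:
            return 0.0                   # no interior point, so the cost is zero
        wj, wk = ws[j], ws[k]
        val = (-wj * wk * (alpha[k + 1] - alpha[j + 1])
               + (wj + wk) * (beta[k + 1] - beta[j + 1])
               - (gamma[k + 1] - gamma[j + 1]))
        return max(val, 0.0)             # guard against tiny negative round-off

    def greedy(v):
        """Positions of a smallest set whose intervals all have cost <= v."""
        idx, i = [0], 0
        for j in range(1, d):
            if C(i, j) > v:              # extending to j would exceed v,
                i = j - 1                # so place a point just before j
                idx.append(i)
        idx.append(d - 1)
        return idx

    # Stand-in for Sorted-Selection: materialize and sort all candidate values.
    candidates = sorted({C(j, k) for j in range(d) for k in range(j + 1, d)})
    lo, hi = 0, len(candidates) - 1      # the largest candidate is always feasible
    while lo < hi:
        mid = (lo + hi) // 2
        if len(greedy(candidates[mid])) <= s:
            hi = mid                     # v is achievable: try smaller values
        else:
            lo = mid + 1                 # v is not achievable: try larger values
    v = candidates[lo]
    return [ws[i] for i in greedy(v)], v
```

On $w = (0,1,2,3,4)$ with unit weights and $s = 3$, the sketch selects the points $0, 2, 4$, where each of the two intervals has cost $1$. The returned set may contain fewer than $s$ points; padding it to exactly $s$ cannot increase the max objective when $C$ is sorted in the sense of Lemma D.4.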

#### D\.2\.2 A $(1+\varepsilon)$-approximation

Using the $s$-approximation from Algorithm [2](https://arxiv.org/html/2606.00289#alg2), we can obtain a $(1+\varepsilon)$-approximation algorithm with efficient runtime. At a high level, the idea is to first obtain an $s$-approximation to the objective value, then solve a suitably rounded version of the original instance using the $O(d \log \Delta)$ time algorithm of \[[3](https://arxiv.org/html/2606.00289#bib.bib3)\] (Corollary [D.7](https://arxiv.org/html/2606.00289#A4.Thmlemma7)). We begin by giving a primer on this algorithm. This discussion closely mirrors that in Section 3 of \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\]. Recall that the problem at hand is to optimize the *regularized* objective

$$\min_{\ell,\ \{i_{1},\ldots,i_{\ell}\} \subseteq [d]} \ \sum_{j=1}^{\ell-1} C[i_{j}, i_{j+1}] + \tau\ell,$$

which corresponds to choosing the lightest path in a complete DAG with weights given by $C$ and cost $\tau$ for each node in the path. Observe that the unregularized version simply optimizes $\mathsf{ADV}_{\mathcal{X}}$. Setting $\tau = 0$ means that there is no cost associated with choosing extra nodes, so the optimal path includes every node, resulting in an objective value of $0$. Now consider setting $\tau \geq \mathrm{opt}_{d-1}$, where

$$\mathrm{opt}_{d-1} = \min_{\{i_{1},\ldots,i_{d-1}\} \subseteq [d]} \ \sum_{j=1}^{d-2} C[i_{j}, i_{j+1}]$$

is equivalent to $\mathsf{ADV}_{\mathcal{X}}(w, d-1)$. For these values of $\tau$, it is better to construct the optimal path of length $d-1$ than a path of length $d$, because the extra additive cost $\tau$ of using the $d$th node outweighs the reduction in $C[i,j]$ costs from using $d$ nodes. In this way, we find *critical points* for the setting of $\tau$ at which the optimal path is of length $k$. That is, $\tau_{k} = \mathrm{opt}_{k} - \mathrm{opt}_{k+1}$ ensures that the optimal path is of length $k$. It turns out that $0 = \tau_{d} \leq \tau_{d-1} \leq \ldots \leq \tau_{1}$ (see \[[3](https://arxiv.org/html/2606.00289#bib.bib3), [17](https://arxiv.org/html/2606.00289#bib.bib17)\] for proofs and details). The $O(d \log \Delta)$ algorithm then proceeds by binary searching for the correct setting of $\tau$; it checks whether a particular setting of $\tau$ yields an optimal solution of length $s$ using the $O(d)$ time algorithm of \[[3](https://arxiv.org/html/2606.00289#bib.bib3), [26](https://arxiv.org/html/2606.00289#bib.bib26)\], which finds shortest paths on DAGs with Concave Monge weights. In particular, \[[3](https://arxiv.org/html/2606.00289#bib.bib3), [26](https://arxiv.org/html/2606.00289#bib.bib26)\] give three types of algorithms: (1) `ShortestPathDAG-Min`, which returns the shortest path with optimal cost, (2) `ShortestPathDAG-Max`, which returns the longest path with optimal cost, and (3) `ShortestPathDAG(k)`, which returns the length-$k$ path with optimal cost (if it exists). All have runtime $O(d)$. Running and checking these algorithms allows us to direct the binary search.
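The effect of $\tau$ on the optimal path length can be seen with a naive dynamic program. The sketch below is an illustration under stated assumptions, not the $O(d)$ Concave Monge algorithms of the cited works: it evaluates $C$ directly and runs in $O(d^{3})$ time, it assumes $w$ is already sorted, it fixes the path to start at the first node and end at the last, and it breaks cost ties toward fewer nodes (mimicking the role of `ShortestPathDAG-Min`). The name `regularized_path` is hypothetical.

```python
def regularized_path(ws, ls, tau):
    """Minimize the sum of C over consecutive chosen nodes plus tau per node,
    over paths from the first to the last position of the sorted vector ws.
    Ties in cost are broken toward fewer nodes. Returns (cost, path)."""
    d = len(ws)

    def C(j, k):
        return sum(ls[i] * (ws[k] - ws[i]) * (ws[i] - ws[j]) for i in range(j, k + 1))

    best = [(float("inf"), 0)] * d       # (regularized cost, nodes used) per endpoint
    prev = [-1] * d
    best[0] = (tau, 1)                   # the start node itself costs tau
    for k in range(1, d):
        for j in range(k):
            cand = (best[j][0] + C(j, k) + tau, best[j][1] + 1)
            if cand < best[k]:           # lexicographic: cost first, then node count
                best[k], prev[k] = cand, j
    path, node = [], d - 1
    while node != -1:
        path.append(node)
        node = prev[node]
    return best[d - 1][0], path[::-1]
```

For $w = (0,1,2,3,4)$ with unit weights, the unregularized optima are $\mathrm{opt}_{2} = 10$, $\mathrm{opt}_{3} = 2$, $\mathrm{opt}_{4} = 1$, $\mathrm{opt}_{5} = 0$, so $\tau_{2} = 8$ and $\tau_{3} = \tau_{4} = 1$. Calling the sketch with $\tau = 2$, which lies between $\tau_{3}$ and $\tau_{2}$, yields the length-3 path through positions $0, 2, 4$, while $\tau = 0$ yields all five nodes and a very large $\tau$ yields only the two endpoints.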

Algorithm [3](https://arxiv.org/html/2606.00289#alg3) first runs the $s$-approximation Algorithm [2](https://arxiv.org/html/2606.00289#alg2) and obtains $\mathsf{ADV}_{\mathcal{X}}(w,s)/s \leq v \leq \mathsf{ADV}_{\mathcal{X}}(w,s)$ (the objective value $v$ can be computed from the returned quantization set $Q$ in linear time). We then create *new* rounded weights for the original instance, giving an instance whose optimal value is close to the original but whose structure can be solved more efficiently. Algorithm [3](https://arxiv.org/html/2606.00289#alg3) then exactly solves this new instance. In particular, the new instance is given by rounding every entry $C[i,j]$ up to the nearest multiple of $\varepsilon v/s$. This ensures that the optimal solution has an objective value close to that of the original instance. We also gain the benefit that $\mathrm{opt}_{k}$ is always a multiple of $\varepsilon v/s$, and therefore so are the critical points for $\tau$. This allows us to binary search for the correct setting of $\tau$ much more efficiently. Observe that we do not explicitly write down the new $\tilde{C}[i,j]$; instead, we perform the rounding on the fly whenever the matrix is queried.
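The rounding step and the resulting integer search range can be sketched in a few lines of Python. The helper names are hypothetical, and the sketch works in integer units of $\varepsilon v/s$ rather than storing rounded floating-point costs; with floating-point inputs, an entry that is an exact multiple of the unit may still be rounded up by one unit due to round-off.

```python
import math


def rounded_units(c, eps, v, s):
    """Number of units of eps*v/s in c after rounding c up to a multiple."""
    unit = eps * v / s
    return math.ceil(c / unit)


def tau_upper_units(eps, s):
    """Upper end of the integer search range for tau, in units of eps*v/s."""
    return math.ceil(s * s * (1 + eps) / eps)
```

A path on $s$ nodes has $s-1$ edges, and each edge cost grows by less than one unit, so the rounded cost of any fixed size-$s$ set exceeds its original cost by at most $(s-1) \cdot \varepsilon v/s \leq \varepsilon v$. The binary search over $\tau$ then only needs to visit integers between $0$ and `tau_upper_units(eps, s)`.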

Input: $w \in \mathbb{R}^{d}$, $s \in \mathbb{N}$, $\varepsilon > 0$, $\lambda \in \mathbb{R}^{d}$ where $\lambda_{i} = \mathbf{E}_{x \sim \mathcal{X}}[x_{i}^{2}]$ for $i \in [d]$

Output: Quantization set $Q$

- $v \leftarrow$ Algorithm [2](https://arxiv.org/html/2606.00289#alg2) $(w, s, \lambda)$
- Sort $w$ and compute prefix sums $\alpha, \beta, \gamma$
- Initialize $\tau^{\uparrow} \leftarrow \lceil s^{2}(1+\varepsilon)/\varepsilon \rceil$, $\tau^{\downarrow} \leftarrow 0$, and $Q \leftarrow \emptyset$
- **while** $\tau^{\uparrow} \geq \tau^{\downarrow}$ **do**
  - $\tau \leftarrow \lfloor (\tau^{\uparrow} + \tau^{\downarrow})/2 \rfloor$
  - $k_{\mathsf{min}} \leftarrow \texttt{ShortestPathDAG-Min}(\tilde{C}, \tau\varepsilon v/s)$
  - $k_{\mathsf{max}} \leftarrow \texttt{ShortestPathDAG-Max}(\tilde{C}, \tau\varepsilon v/s)$
  - **if** $k_{\mathsf{min}} \leq s \leq k_{\mathsf{max}}$ **then**
    - **return** $\texttt{ShortestPathDAG}(\tilde{C}, \tau\varepsilon v/s, s)$
  - **else if** $s < k_{\mathsf{min}}$ **then**
    - $\tau^{\uparrow} \leftarrow \tau - 1$
  - **else**
    - $\tau^{\downarrow} \leftarrow \tau + 1$

Algorithm 3: $(1+\varepsilon)$-Approximation Algorithm for $\mathsf{ADV}_{\mathcal{X}}$

###### Proof\.

The algorithm is described in Algorithm [3](https://arxiv.org/html/2606.00289#alg3). We first note that Algorithm [2](https://arxiv.org/html/2606.00289#alg2) returns a quantization set $Q_{\text{approx}}$ that is optimal for $\mathsf{MixDV}_{\mathcal{X}}$, so its objective value $v = \mathsf{MixDV}_{\mathcal{X}}(w, Q_{\text{approx}}) = \mathsf{MixDV}_{\mathcal{X}}(w,s)$ satisfies $\mathsf{ADV}_{\mathcal{X}}(w,s)/s \leq v \leq \mathsf{ADV}_{\mathcal{X}}(w,s)$ by Theorem [1](https://arxiv.org/html/2606.00289#Thmtheorem1) and Lemma [D.10](https://arxiv.org/html/2606.00289#A4.Thmlemma10). The value $v$ can be computed from the returned quantization set in linear time (the first step of Algorithm [3](https://arxiv.org/html/2606.00289#alg3)).

Recall that $\tilde{C}$ is the matrix $C$ with all entries rounded up to the nearest multiple of $\varepsilon v/s$. The matrix $\tilde{C}$ is implicitly defined, just like $C$. Any query made to $\tilde{C}$ in Algorithm [3](https://arxiv.org/html/2606.00289#alg3) implicitly performs an $O(1)$ time query to $C$ using the prefix sums $\alpha, \beta, \gamma$ and then an $O(1)$ rounding step on the fly. $\tilde{C}$ defines a reweighting of the original instance. This means that $\tau_{k} = \tilde{\mathrm{opt}}_{k} - \tilde{\mathrm{opt}}_{k+1}$ for the instance with weights $\tilde{C}$ must also be a multiple of $\varepsilon v/s$ for all $k \in [d]$. Therefore, we can perform our binary search for the correct value of $\tau$ only over multiples of $\varepsilon v/s$.

Let $Q^{\ast}$ denote a quantization set with $|Q^{\ast}| = s$ and $\mathsf{ADV}_{\mathcal{X}}(w, Q^{\ast}) = \mathsf{ADV}_{\mathcal{X}}(w,s) \geq v$. Notice then that

$$\widetilde{\mathsf{ADV}}_{\mathcal{X}}(w, Q^{\ast}) \leq \mathsf{ADV}_{\mathcal{X}}(w, Q^{\ast}) + s \cdot \frac{\varepsilon v}{s} \leq (1+\varepsilon) \cdot \mathsf{ADV}_{\mathcal{X}}(w, Q^{\ast}), \tag{3}$$

where $\widetilde{\mathsf{ADV}}_{\mathcal{X}}$ denotes the $\mathsf{ADV}$ objective evaluated on weights $\tilde{C}$. So, the optimal value $\widetilde{\mathsf{ADV}}_{\mathcal{X}}(w,s)$ is at most $(1+\varepsilon) \cdot \mathsf{ADV}_{\mathcal{X}}(w, Q^{\ast})$. Simultaneously, for an arbitrary quantization set $Q$ with $|Q| = s$, we have

$$\mathsf{ADV}_{\mathcal{X}}(w,Q) \leq \widetilde{\mathsf{ADV}}_{\mathcal{X}}(w,Q).$$

Therefore, exactly solving the instance with weights given by $\tilde{C}$ gives a quantization set $Q$ with $|Q| = s$ and $\mathsf{ADV}_{\mathcal{X}}(w,Q) \leq (1+\varepsilon) \cdot \mathsf{ADV}_{\mathcal{X}}(w,s)$. It remains to show that the binary search in the while loop of Algorithm [3](https://arxiv.org/html/2606.00289#alg3) solves the instance on weights $\tilde{C}$ exactly. To do this, we show an upper bound on the search range for $\tau$.

In particular, recall that by the guarantee of Theorem [1](https://arxiv.org/html/2606.00289#Thmtheorem1), $\mathsf{ADV}_{\mathcal{X}}(w,s)/s \leq v \leq \mathsf{ADV}_{\mathcal{X}}(w,s)$. This means that $\mathsf{ADV}_{\mathcal{X}}(w,s) \leq sv$. As shown by (3), $\widetilde{\mathsf{ADV}}_{\mathcal{X}}(w,s) \leq (1+\varepsilon) \cdot \mathsf{ADV}_{\mathcal{X}}(w,s) \leq (1+\varepsilon)sv$. Thus, $\tau_{s} = \tilde{\mathrm{opt}}_{s} - \tilde{\mathrm{opt}}_{s+1} \leq \tilde{\mathrm{opt}}_{s} \leq (1+\varepsilon)sv$ is the upper end of our search range. Measured in multiples of $\varepsilon v/s$, this bound is $(1+\varepsilon)sv / (\varepsilon v/s) = s^{2}(1+\varepsilon)/\varepsilon$, which is exactly the initial value of $\tau^{\uparrow}$ in Algorithm [3](https://arxiv.org/html/2606.00289#alg3).

##### Runtime\.

Running Algorithm [2](https://arxiv.org/html/2606.00289#alg2) and computing the objective value $v$ takes $O(d \log d)$ time by Theorem [1](https://arxiv.org/html/2606.00289#Thmtheorem1). Sorting $w$ and computing the prefix sums takes $O(d \log d)$ time as well by Lemma [D.5](https://arxiv.org/html/2606.00289#A4.Thmlemma5). The binary search (the while loop of Algorithm [3](https://arxiv.org/html/2606.00289#alg3)) searches over $O(s^{2}/\varepsilon)$ many values. Each iteration of the binary search runs a `ShortestPathDAG` algorithm $O(1)$ times. Therefore, the entire binary search portion of Algorithm [3](https://arxiv.org/html/2606.00289#alg3) has runtime $O(d \log(s/\varepsilon))$. The total runtime of Algorithm [3](https://arxiv.org/html/2606.00289#alg3) is then $O(d \log d + d \log(s/\varepsilon)) = O(d \log(d/\varepsilon))$. ∎

### D\.3 Practical Algorithms

Our practical algorithms build off the `Wilber` algorithm of \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\] (and our implementations build off their open-source code as well). The `Wilber` algorithm proceeds by searching for a correct Lagrangian multiplier $\tau$ on which to solve a relaxed (or regularized) version of the problem; see Section [D.1](https://arxiv.org/html/2606.00289#A4.SS1).

To improve the runtime of vanilla `Wilber`, which searches for $\tau$ using a method the authors of \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\] call interpolation search, we reduce the search space by employing approximation algorithms to get a rough estimate of the cost.

In particular, \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\] show that the optimal value of $\tau$ occurs at $\tau = \mathrm{opt}_{s} - \mathrm{opt}_{s+1}$, where $\mathrm{opt}_{s}$ is the cost (of the optimal $\mathsf{ADV}$ solution) with $s$ quantization values and $\mathrm{opt}_{s+1}$ the cost with $s+1$. This immediately suggests two ways of reducing the search space and thus the runtime of the algorithm:

1. Run a fast, but potentially looser, approximation algorithm, and use this to upper bound $\tau$ at some value $U=\mathsf{ADV}(w,Q)$, where $w$ is the input vector and $Q$ is the set returned by the approximation. Then, search the range $[0,U]$ for $\tau$. This is the *fast approximation* technique.
2. Run a slower, but more accurate, approximation algorithm and obtain estimates $\widehat{v}_{s},\widehat{v}_{s+1}$ for $\mathrm{opt}_{s},\mathrm{opt}_{s+1}$, respectively. Then, search the range $[(\widehat{v}_{s}-\widehat{v}_{s+1})/2,\;2(\widehat{v}_{s}-\widehat{v}_{s+1})]$ (where the factors of 2 are parameters which can be increased or decreased depending on the quality of the approximation). This is the *accurate approximation* technique.
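The two search ranges above can be written down directly. The following Python sketch is illustrative only: the function names and the `slack` parameter are hypothetical and are not taken from the authors' code. It computes the interval to which the search for $\tau$ would be restricted under each technique.

```python
def tau_range_fast(upper_cost):
    """Fast approximation: tau = opt_s - opt_{s+1} is at most opt_s, which is
    at most the ADV cost U of any feasible set Q, so tau lies in [0, U]."""
    return 0.0, upper_cost


def tau_range_accurate(v_s, v_s1, slack=2.0):
    """Accurate approximation: bracket tau using the estimates v_s and v_s1
    of opt_s and opt_{s+1}. `slack` plays the role of the factor 2 in the text."""
    gap = v_s - v_s1
    return gap / slack, slack * gap
```

A larger `slack` widens the bracket, which is the safer choice when the estimates are coarse, at the price of a longer search.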

When using the fast approximation technique, we always construct $Q$ as the optimal solution for $\mathsf{MixDV}_{\mathcal{X}}$ (see Section [D.2.1](https://arxiv.org/html/2606.00289#A4.SS2.SSS1)), as it is very fast to compute and provably an $s$-approximation (Lemma [D.10](https://arxiv.org/html/2606.00289#A4.Thmlemma10)). The hope of using the accurate approximation technique is that the additional runtime of the approximation algorithm will be offset by a much smaller search space; for this we use both a faster implementation of approximate QUIVER (as discussed in Section [D.3.1](https://arxiv.org/html/2606.00289#A4.SS3.SSS1)) and a new approximation algorithm, MixApprox.

MixApprox proceeds similarly to approximate QUIVER: given a parameter $m$, we construct a subset $\mathcal{C}\subseteq w$ of size $m$ and find the optimal $s$ values to select from this subset $\mathcal{C}$.¹⁰ The difference lies in how $\mathcal{C}$ is constructed: while approximate QUIVER uniformly spaces $\mathcal{C}$ across the range $[\min(w),\max(w)]$, in MixApprox, we set $\mathcal{C}$ to be the optimal solution to $\mathsf{MixDV}_{\mathcal{X}}$ over $w$ with $m$ quantization points. This *adaptive* approach to constructing $\mathcal{C}$ gives much better approximations, allowing us to use a smaller value of $m$ yet still obtain a comparable approximation ratio. In the accurate approximation technique, the approximation algorithm is only run once, while an exact solver is run twice on the set returned by the approximation algorithm (to estimate $\mathrm{opt}_{s}$ and $\mathrm{opt}_{s+1}$), so a smaller value of $m$ reduces the runtime of these calls.

¹⁰ When solving on this subset, we do not recurse, and instead always employ the fast approximation technique with the estimate from $\mathsf{MixDV}_{\mathcal{X}}$.

We compare the fast approximation technique with the accurate approximation technique, using both MixApprox and a faster implementation of approximate QUIVER. Due to the robust nature of MixApprox and the ability to run with a smaller value of $m$, using MixApprox as the approximation algorithm gives the most reliable performance across data distributions (see the supplementary information for a CSV file of performance evaluations over a range of distributions). However, there is still a cost of running the algorithm for $\mathsf{MixDV}_{\mathcal{X}}$ rather than simply uniformly spacing points, so we expect that on some data, especially data that is quite uniform, using (improved) approximate QUIVER as the estimator will lead to better performance.

Table 1: Runtime of Wilber in milliseconds (ms) with different search methods, across vectors of size $d$ sampled from $\text{LogNormal}(0,1)$ and $s=64$, averaged across 10 trials. Interp. search is the standard interpolation search method outlined in \[[17](https://arxiv.org/html/2606.00289#bib.bib17)\], $\mathsf{MixDV}_{\mathcal{X}}$ is the fast approximation technique with solving $\mathsf{MixDV}_{\mathcal{X}}$, MixApprox is the accurate approximation technique using MixApprox, and Imp. approx QUIVER is the accurate approximation technique using our improved implementation of approximate QUIVER. For MixApprox, we use $m=4s$, and for approximate QUIVER, we use $m=200s$.

#### D.3.1 Improved Approximate QUIVER

Using our exact algorithms, we are also able to speed up approximate QUIVER from \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\]. Usually, $m$ is not too much larger than $s$, and so for this special case, using the accurate approximation technique described above is not faster than the fast approximation technique. So, we warm-start Wilber with the solution cost of $\mathsf{MixDV}_{\mathcal{X}}$; see Figure [4](https://arxiv.org/html/2606.00289#S4.F4) for a runtime comparison.

One may be interested in using MixApprox as an approximation algorithm, since it can obtain very good approximations with small values of $m$. Unfortunately, the cost of computing $\mathsf{MixDV}_{\mathcal{X}}$ usually cancels out the savings from the smaller $m$, except on very skewed data. This is in contrast to our exact algorithms, where the additional call of the exact solver (solving for not just a quantization set of size $s$ but also one of size $s+1$) allows MixApprox to be the more performant choice.

## Appendix E An Improved Approximation Algorithm for Unweighted MSE

Alongside the exact algorithm for weighted (and therefore also unweighted) MSE given by Corollary [D.6](https://arxiv.org/html/2606.00289#A4.Thmlemma6), \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\] also present the following approximation algorithm for *unweighted* MSE.

###### Corollary E.1 (Lemma 6.1 of \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\]).

There exists an algorithm with runtime $O(d+ms)$ that returns a quantization set $Q$ such that $|Q|=2s-2$ and $\mathsf{MSE}(w,Q)\leq\mathsf{MSE}(w,s)+d\Delta^{2}/m^{2}$. Here, $\Delta:=\max_{i,j\in[d]}w_{i}-w_{j}$.

The approximation algorithm of Corollary [E.1](https://arxiv.org/html/2606.00289#A5.Thmlemma1) leaves much to be desired. It gives a bicriteria approximation: relaxing the reported quantization set size while guaranteeing only a weak additive approximation to the optimal mean squared error. In this section, we outline a new approximation algorithm for optimizing unweighted MSE under the standard stochastic quantization rounding distribution. We give a new *data-dependent* bicriteria algorithm with stronger approximation guarantees than that of Corollary [E.1](https://arxiv.org/html/2606.00289#A5.Thmlemma1).

### E.1 An Improved Bicriteria Approximation Algorithm

At a high level, the bicriteria approximation algorithm of Corollary [E.1](https://arxiv.org/html/2606.00289#A5.Thmlemma1) works by considering a candidate set of quantization points given by $m$ points uniformly spaced along the range of $w$. It then solves exactly on this candidate set to choose the best $2s-2$ points. Algorithm [4](https://arxiv.org/html/2606.00289#alg4) improves upon this by choosing the candidate quantization points in a data-dependent fashion. We employ the $s$-center clustering algorithm of Lemma [A.3](https://arxiv.org/html/2606.00289#A1.Thmlemma3) to construct the candidate *coreset*, subdivide these clustered regions evenly, then exactly solve on this new candidate set.

The correctness of Algorithm [4](https://arxiv.org/html/2606.00289#alg4) relies on a connection between the unweighted $\mathsf{MSE}$ and maximum variance when rounding according to the standard stochastic quantization distribution. In particular, let $s\in\mathbb{N}$ be the target quantization set size, and consider a quantization set $Q_{\mathsf{MSE}}$ of size $s$ that minimizes the mean-squared error

$$\mathsf{MSE}(\widehat{\boldsymbol{w}},w)=\mathbf{E}\left[\lVert\widehat{\boldsymbol{w}}-w\rVert_{2}^{2}\right]=\sum_{i\in[d]}\mathbf{Var}[\widehat{\boldsymbol{w}}_{i}].$$

Then, consider a quantization set $Q_{\textsf{MaxVar}}$ of size $s$ that minimizes $\max_{i\in[d]}\mathbf{Var}[\widehat{\boldsymbol{w}}_{i}]$. It is immediate that

$$\max_{i\in[d]}\mathbf{Var}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}^{\mathsf{SSQ}}(w,Q_{\textsf{MaxVar}})}[\widehat{\boldsymbol{w}}_{i}]\leq\sum_{i\in[d]}\mathbf{Var}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}^{\mathsf{SSQ}}(w,Q_{\mathsf{MSE}})}[\widehat{\boldsymbol{w}}_{i}].$$

Recall that this can simply be written as $\mathsf{MDV}(w,s)\leq\mathsf{MSE}(w,s)$. Concretely, this relation will allow us to use a "worst-case" clustering algorithm ($s$-center clustering) and connect back to the unweighted $\mathsf{MSE}$.
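The relation between the two objectives can be made concrete with a small numeric sketch. The Python below is illustrative only (the function name `ssq_variances` is hypothetical); it uses the per-coordinate variance $(w_i^{\uparrow}-w_i)(w_i-w_i^{\downarrow})$ of stochastic rounding to the nearest quantization points on either side, and assumes `Q` is sorted and spans the range of `w`. For any fixed $Q$ the largest variance is at most the sum of the variances; evaluating this at $Q_{\mathsf{MSE}}$ and noting that $Q_{\textsf{MaxVar}}$ can only do better on the maximum gives $\mathsf{MDV}(w,s)\leq\mathsf{MSE}(w,s)$.

```python
from bisect import bisect_left, bisect_right


def ssq_variances(w, Q):
    """Per-coordinate variance when each w_i is stochastically rounded to its
    neighbours in the sorted set Q (assumes Q[0] <= w_i <= Q[-1] for all i)."""
    out = []
    for x in w:
        lo = Q[bisect_right(Q, x) - 1]  # largest quantization point <= x
        hi = Q[bisect_left(Q, x)]       # smallest quantization point >= x
        out.append((hi - x) * (x - lo))
    return out


w = [0.0, 1.0, 2.0, 4.0]
Q = [0.0, 4.0]
variances = ssq_variances(w, Q)
mse_cost = sum(variances)  # the MSE objective for this Q
mdv_cost = max(variances)  # the maximum-variance objective for this Q
```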

We note that Algorithms [2](https://arxiv.org/html/2606.00289#alg2) and [3](https://arxiv.org/html/2606.00289#alg3) for $\mathsf{MixDV}_{\mathcal{X}}$ and $\mathsf{ADV}_{\mathcal{X}}$ can be seen to work more generally for a candidate set of quantization points $P=\{p_{1},\ldots,p_{m}\}\subset\mathbb{R}$ (such that $p_{1}\leq w_{1}$ and $p_{m}\geq w_{d}$).¹¹ In particular, the modified objectives are

$$\begin{aligned}
\mathsf{MixDV}_{\mathcal{X}}(w,s)&:=\min_{Q\subset P:|Q|\leq s}\;\max_{j\in[s-1]}C[i_{j},i_{j+1}]\\
\mathsf{ADV}_{\mathcal{X}}(w,s)&:=\min_{Q\subset P:|Q|\leq s}\;\sum_{j\in[s-1]}C[i_{j},i_{j+1}]
\end{aligned}$$

where $Q=\{p_{i_{1}},\ldots,p_{i_{|Q|}}\}$ and $C[i_{j},i_{j+1}]:=\sum_{w_{i}\in[p_{i_{j}},p_{i_{j+1}}]}(p_{i_{j+1}}-w_{i})(w_{i}-p_{i_{j}})$. Given $P$ and $O(|P|\log|P|+d\log|P|)$ preprocessing time, we can provide $O(1)$ time access to matrix $C$. In particular, we wish to compute prefix sums $\alpha,\beta,\gamma\in\mathbb{R}^{|P|}$

$$\alpha[j]=\sum_{w_{i}\in[p_{1},p_{j}]}1\qquad\beta[j]=\sum_{w_{i}\in[p_{1},p_{j}]}w_{i}\qquad\gamma[j]=\sum_{w_{i}\in[p_{1},p_{j}]}w_{i}^{2}$$

This can be achieved by first sorting $P$, then for each $i\in[d]$ finding the unique $j\in[|P|]$ for which $w_{i}\in[p_{j},p_{j+1}]$ (which can be done in $O(d\log|P|)$ time using binary search). We then construct $A,B,\Gamma$ in each bucket

$$A[j]=\sum_{w_{i}\in[p_{j},p_{j+1}]}1\qquad B[j]=\sum_{w_{i}\in[p_{j},p_{j+1}]}w_{i}\qquad\Gamma[j]=\sum_{w_{i}\in[p_{j},p_{j+1}]}w_{i}^{2}$$

Then,

$$\alpha[j]=\sum_{i=1}^{j-1}A[i]\qquad\beta[j]=\sum_{i=1}^{j-1}B[i]\qquad\gamma[j]=\sum_{i=1}^{j-1}\Gamma[i]$$

The runtimes of Algorithms [2](https://arxiv.org/html/2606.00289#alg2) and [3](https://arxiv.org/html/2606.00289#alg3) become $O(|P|\log|P|+d\log|P|)$ and $O(|P|\log(|P|/\varepsilon)+d\log|P|)$ respectively.

¹¹ As written, Algorithms [2](https://arxiv.org/html/2606.00289#alg2) and [3](https://arxiv.org/html/2606.00289#alg3) solve the problem for $P=w$ because there always exists an optimal quantization set $Q\subseteq w$.

The above approach works for any set $P$. In Algorithm [4](https://arxiv.org/html/2606.00289#alg4), we will use a set $P$ with special structure that allows us to construct the prefix sums faster, in $O(|P|+d\log s)$ time; see the proof of Theorem [2](https://arxiv.org/html/2606.00289#Thmtheorem2).
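The general prefix-sum construction for an arbitrary sorted candidate set can be sketched as follows. This is a minimal Python sketch, not the authors' implementation: the function names are hypothetical and indices are 0-based. It relies on the expansion $(p_b-w_i)(w_i-p_a)=(p_a+p_b)w_i-w_i^{2}-p_ap_b$, so each interval cost is recovered from differences of the three prefix arrays. A weight that coincides with a candidate point contributes zero to either adjacent interval, so its bucket assignment does not matter.

```python
from bisect import bisect_right
from itertools import accumulate


def build_prefix_sums(w, P):
    """P must be sorted with P[0] <= min(w) and P[-1] >= max(w).
    Returns (alpha, beta, gamma): counts, sums and sums of squares of the
    weights falling in buckets strictly before each candidate index."""
    m = len(P)
    A, B, G = [0] * (m - 1), [0.0] * (m - 1), [0.0] * (m - 1)
    for x in w:
        # bucket j holds weights in [P[j], P[j+1]); the maximum is clamped
        j = min(bisect_right(P, x) - 1, m - 2)
        A[j] += 1
        B[j] += x
        G[j] += x * x
    alpha = [0] + list(accumulate(A))
    beta = [0.0] + list(accumulate(B))
    gamma = [0.0] + list(accumulate(G))
    return alpha, beta, gamma


def interval_cost(P, sums, a, b):
    """O(1) evaluation of C[a, b] = sum of (P[b] - w_i)(w_i - P[a]) over the
    weights lying between candidates a < b."""
    alpha, beta, gamma = sums
    count = alpha[b] - alpha[a]
    total = beta[b] - beta[a]
    squares = gamma[b] - gamma[a]
    return (P[a] + P[b]) * total - squares - P[a] * P[b] * count
```

For example, with `w = [0, 1, 2, 3, 4]` and `P = [0, 2, 4]`, `interval_cost(P, sums, 0, 2)` sums $(4-w_i)(w_i-0)$ over all five weights.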

Input: $w\in\mathbb{R}^{d}$, $s\in\mathbb{N}$, $\varepsilon>0$

Output: Quantization set $Q$

- 2: $\mathcal{C}\leftarrow$ $s$ clusters from s-Center-Clustering$(w,s)$
- 3: Let $p_{i}^{\textsf{max}}$ and $p_{i}^{\textsf{min}}$ be the max and min elements in $C_{i}$ for all $i\in[s]$
- 4: Initialize $P\leftarrow\emptyset$
- 5: **for** $i\in[s]$ **do**
  - 6: **for** $j\in[\sqrt{4d/\varepsilon}]$ **do**
    - 7: $P\leftarrow P\cup\{(p_{i}^{\textsf{max}}-p_{i}^{\textsf{min}})\cdot\sqrt{\varepsilon/4d}\cdot j\}$
- **return** Algorithm [3](https://arxiv.org/html/2606.00289#alg3) run on $w,P,s,\varepsilon$ with uniform weights

Algorithm 4: Data-Dependent Bicriteria Algorithm for Unweighted $\mathsf{MSE}(w,s)$

The proof of Theorem [2](https://arxiv.org/html/2606.00289#Thmtheorem2) closely follows that of Lemma 6.1 in \[[4](https://arxiv.org/html/2606.00289#bib.bib4)\]. The key difference is in analyzing the quality of the data-dependent coreset and its implications for the ultimate approximation ratio.
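The candidate-set construction in Algorithm 4 can be sketched in Python under some loudly stated assumptions. The clustering subroutine of Lemma A.3 is not reproduced here, so the sketch takes the per-cluster `(min, max)` ranges as input. It also assumes that each subdivision point is offset by the cluster minimum (so the points actually lie inside the cluster range), that both endpoints are included, and that the number of subdivisions is rounded up to an integer; none of these details is spelled out in the pseudocode. The function name `build_candidates` is hypothetical.

```python
import math


def build_candidates(cluster_ranges, d, eps):
    """Uniformly subdivide each cluster range into k = ceil(sqrt(4d/eps))
    pieces and return the sorted union of all subdivision points.

    cluster_ranges: list of (p_min, p_max) pairs, one per cluster (assumed
    to come from an s-center clustering of the d weights)."""
    k = math.ceil(math.sqrt(4 * d / eps))
    points = set()
    for lo, hi in cluster_ranges:
        step = (hi - lo) / k
        for j in range(k + 1):
            # assumption: offset by the cluster minimum
            points.add(lo + step * j)
    return sorted(points)
```

The resulting set has at most $s(k+1)$ points, which matches the $O(s\sqrt{d/\varepsilon})$ coreset size used in the runtime analysis.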

###### Theorem 2\.

There exists an algorithm that, given $w\in\mathbb{R}^{d}$, target quantization set size $s\in\mathbb{N}$, and parameter $\varepsilon>0$, returns a quantization set $Q\subset\mathbb{R}$ such that $|Q|=2s-2$ and $\mathsf{MSE}(w,Q)\leq(1+\varepsilon)\mathsf{MSE}(w,s)$. The runtime of the algorithm is $O(d\log s+s\sqrt{d/\varepsilon}\log(d/\varepsilon))$.

###### Proof\.

We will show the existence of a quantization set $Q\subseteq P$ of size $2s-2$ such that $\mathsf{MSE}(w,Q)\leq(1+\varepsilon)\cdot\mathsf{MSE}(w,s)$.

In particular, consider a quantization set $Q^{\ast}$ of size $s$ which minimizes the mean-squared error. For any $q\in Q^{\ast}$, let $q^{\downarrow}:=\max\{p\in P:p\leq q\}$ and $q^{\uparrow}:=\min\{p\in P:p\geq q\}$, so that $q^{\downarrow}$ rounds $q$ down to $P$ and $q^{\uparrow}$ rounds it up, matching the usage below. The solution $Q:=\{q^{\downarrow},q^{\uparrow}:q\in Q^{\ast}\}$ is such that (i) $|Q|=2s-2$ and (ii) $\mathsf{MSE}(w,Q)\leq(1+\varepsilon)\mathsf{MSE}(w,s)$. To see that $|Q|=2s-2$, note that $w_{1},w_{d}\in Q^{\ast}$ but $q^{\downarrow}=q^{\uparrow}$ for both.

To see that $\mathsf{MSE}(w,Q)\leq(1+\varepsilon)\mathsf{MSE}(w,s)$, consider some element $w_{i}$. There are two cases:

1. First, suppose $w_{i}\in[(w_{i}^{\downarrow}(Q^{\ast}))^{\uparrow},(w_{i}^{\uparrow}(Q^{\ast}))^{\downarrow}]$. In this case,

   $$\begin{aligned}
   \mathbf{Var}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}^{\mathsf{SSQ}}(w,Q)}[\widehat{\boldsymbol{w}}_{i}]&=((w_{i}^{\uparrow}(Q^{\ast}))^{\downarrow}-w_{i})(w_{i}-(w_{i}^{\downarrow}(Q^{\ast}))^{\uparrow})\\
   &\leq(w_{i}^{\uparrow}(Q^{\ast})-w_{i})(w_{i}-w_{i}^{\downarrow}(Q^{\ast}))=\mathbf{Var}_{\widehat{\boldsymbol{w}}\sim\mathcal{D}^{\mathsf{SSQ}}(w,Q^{\ast})}[\widehat{\boldsymbol{w}}_{i}]
   \end{aligned}$$
2. Otherwise, $w_{i}\in[(w_{i}^{\downarrow}(Q^{\ast}))^{\downarrow},(w_{i}^{\downarrow}(Q^{\ast}))^{\uparrow}]\cup[(w_{i}^{\uparrow}(Q^{\ast}))^{\downarrow},(w_{i}^{\uparrow}(Q^{\ast}))^{\uparrow}]$. In this case, $w_{i}$ is quantized between adjacent points from $P$. The original $s$-clustering given by the call to s-Center-Clustering results in clusters with radius $\leq4\sqrt{\mathsf{MDV}(w,s)}$. Algorithm [4](https://arxiv.org/html/2606.00289#alg4) then further subdivides and refines the granularity of $P$ to be $\leq4\sqrt{\mathsf{MDV}(w,s)\cdot\varepsilon/4d}$. Therefore, $w_{i}$ is quantized in an interval of length $\leq4\sqrt{\mathsf{MDV}(w,s)\cdot\varepsilon/4d}$ and its variance is then $\leq\mathsf{MDV}(w,s)\cdot\varepsilon/d\leq\mathsf{ADV}(w,s)\cdot\varepsilon/d$.

See Figure [7](https://arxiv.org/html/2606.00289#A5.F7) for an illustration of the two cases. Taking the sum of variances across all $d$ coordinates then shows that the resulting mean-squared error is at most $(1+\varepsilon)\mathsf{MSE}(w,s)$.

Therefore, any solution found by running Algorithm [3](https://arxiv.org/html/2606.00289#alg3) with candidate quantization points $P$ will have objective value at most $(1+\varepsilon)^{2}\cdot\mathsf{MSE}(w,s)$. Thus, running Algorithm [4](https://arxiv.org/html/2606.00289#alg4) with accuracy parameter $\varepsilon^{\prime}=\varepsilon/3$ results in a $(1+\varepsilon)$-approximation algorithm.
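The rescaling $\varepsilon^{\prime}=\varepsilon/3$ can be checked in one line. The worked inequality below assumes $\varepsilon\leq3$, a restriction that the text does not state explicitly but that is needed for the last step.

```latex
(1+\varepsilon')^{2}
  = \left(1+\tfrac{\varepsilon}{3}\right)^{2}
  = 1+\tfrac{2\varepsilon}{3}+\tfrac{\varepsilon^{2}}{9}
  \;\leq\; 1+\tfrac{2\varepsilon}{3}+\tfrac{\varepsilon}{3}
  = 1+\varepsilon
  \qquad\text{whenever } \varepsilon\leq 3 .
```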

##### Runtime\.

Finding the $2$-approximate clustering using s-Center-Clustering takes $O(d\log s)$ time as per Lemma [A.3](https://arxiv.org/html/2606.00289#A1.Thmlemma3). Computing the minimum and maximum of each cluster takes $O(d)$ time and constructing the coreset $P$ by uniform subdivision in each cluster takes $O(s\sqrt{d/\varepsilon})$ time.

Finally, note that the prefix sums can be computed in $O(|P|+d\log s)$ time (as opposed to the $O(|P|\log|P|+d\log|P|)$ bound discussed before) by exploiting the structure of the coreset $P$. In constructing $P$, we may first sort the $s$ centers returned by s-Center-Clustering$(w,s)$, and then uniformly subdivide. Thus, we have constructed $P$ in sorted order in $O(|P|+s\log s)=O(s\sqrt{d/\varepsilon})$ time. To compute the prefix sums, for each $i\in[d]$, we can find the unique $j\in[s\sqrt{4d/\varepsilon}]$ for which $w_{i}\in[p_{j},p_{j+1}]$ by binary searching among the $s$ sorted centers, then performing arithmetic to find the subdivision inside which $w_{i}$ lies. This improves the construction time of the prefix sums to $O(|P|+d\log s)$.

So, Algorithm [3](https://arxiv.org/html/2606.00289#alg3) runs in $O(d\log s+s\sqrt{d/\varepsilon}\log(sd/\varepsilon))$ time. Therefore, the total runtime of Algorithm [4](https://arxiv.org/html/2606.00289#alg4) is $O(d\log s+s\sqrt{d/\varepsilon}\log(d/\varepsilon))$. ∎

Figure 7: Illustration of (a) case 1 and (b) case 2.

We remark that Algorithm [4](https://arxiv.org/html/2606.00289#alg4) is faster than running the exact algorithm on the full instance in regimes where $s$ and $\varepsilon$ are modestly valued. As mentioned in Appendix [A](https://arxiv.org/html/2606.00289#A1), however, the use of s-Center-Clustering as a subroutine makes Algorithm [4](https://arxiv.org/html/2606.00289#alg4) difficult to implement efficiently in practice.

## Appendix F Maximum Directional Variance

For any $v\in\mathbb{R}$, we define $\mathsf{MDV}^{s}(w,v)$ to be the smallest value of $s$ for which $\mathsf{MDV}(w,s)\leq v$. We first study $\mathsf{MDV}^{s}$ and give both exact and approximation algorithms. We then use many of these algorithmic primitives and ideas to develop algorithms with small asymptotic runtime and guaranteed $(1+\varepsilon)$-approximations for $\mathsf{MDV}$.

### F.1 Exact Algorithms for $\mathsf{MDV}^{s}$

We first give a simple greedy exact algorithm for $\mathsf{MDV}^{s}$ that runs in time $O(d)$ on sorted vectors.

Input: Sorted vector $w\in\mathbb{R}^{d}$ so that $w_{1}\leq\ldots\leq w_{d}$, variance parameter $v$

Output: Quantization set $Q$

- 2: Initialize $Q=\{w_{1}\}$, $q_{\ell}=w_{1}$ and $q_{r}=\infty$
- 3: **for** $i=2,\ldots,d-1$ **do**
  - 4: $q_{r}=\min\{q_{r},\;w_{i}+v/(w_{i}-q_{\ell})\}$
  - 5: **if** $q_{r}\leq w_{i}$ **then**
    - 6: Add $q_{r}$ to $Q$
    - 7: Set $q_{\ell}=q_{r}$ and update $q_{r}=w_{i}+v/(w_{i}-q_{\ell})$
- 9: **if** $q_{r}<w_{d}$ **then**
  - 10: Add $q_{r}$ and $w_{d}$ to $Q$
- 12: **else**
  - 13: Add $w_{d}$ to $Q$
- **return** $Q$

Algorithm 5: Exact Algorithm for $\mathsf{MDV}^{s}(w,v)$

###### Lemma F.1.

If $w\in\mathbb{R}^{d}$ is sorted, Algorithm [5](https://arxiv.org/html/2606.00289#alg5) solves $\mathsf{MDV}^{s}$ in time $O(d)$.

###### Proof.

First, we note that any quantization set $Q$ can be assumed to contain the endpoints $w_{1}$ and $w_{d}$. Since ASQ requires that every $w_{i}$ have a quantization point on each side, $Q$ must contain some $q_{1}\leq w_{1}$ and $q_{|Q|}\geq w_{d}$. Replacing $q_{1}$ with $w_{1}$ and $q_{|Q|}$ with $w_{d}$ does not increase $\mathsf{VarSSQ}(w_{i},Q)$ for any $i$, so the objective value $\max_{i\in[d]}\mathsf{VarSSQ}(w_{i},Q)$ is non-increasing under this substitution.

Second, we prove that the set $Q$ returned by Algorithm [5](https://arxiv.org/html/2606.00289#alg5) has objective value at most $v$. For each weight $w_{i}$, Line [5](https://arxiv.org/html/2606.00289#alg5) computes the furthest right a new quantization point $q_{r}$ can be placed while maintaining $\mathsf{VarSSQ}(w_{j},Q)\leq v$ for all $j$ with $w_{j}<q_{r}$. Taking the running minimum over these upper bounds ensures the constraint is satisfied for every weight $w_{i}$.

Lastly, we prove that any quantization set $Q^{\prime}$ with $\mathsf{MDV}(w,Q^{\prime})\leq v$ must satisfy $|Q^{\prime}|\geq|Q|$. Suppose for contradiction that $\mathsf{MDV}(w,Q^{\prime})\leq v$ but $|Q^{\prime}|<|Q|$. Since $Q^{\prime}$ is feasible, it must place its second point no further right than $q_{2}$, i.e., $q_{2}^{\prime}\leq q_{2}$; otherwise the constraint would be violated for some weight between $q_{1}^{\prime}=w_{1}$ and $q_{2}^{\prime}$. Applying this argument inductively gives $q_{i}^{\prime}\leq q_{i}$ for all $i\leq|Q^{\prime}|$, so the $|Q^{\prime}|<|Q|$ points of $Q^{\prime}$ fail to cover all weights in $w$, contradicting feasibility.

The algorithm makes a single pass over the sorted weights, performing $O(1)$ work per step, so the total runtime is $O(d)$. ∎
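The greedy pass of Algorithm 5 can be sketched in Python as follows. This is a minimal sketch rather than the authors' code: the function names are hypothetical, it assumes `w` is sorted with at least two entries and $v>0$, and it treats the bound $w_i+v/(w_i-q_\ell)$ as infinite when $w_i=q_\ell$ (such a weight has zero variance and imposes no constraint). The helper `max_ssq_variance` is included only to check feasibility of the returned set.

```python
import math
from bisect import bisect_left, bisect_right


def greedy_mdv_set(w, v):
    """Single pass over sorted w, tracking the furthest-right position q_right
    at which the next quantization point may be placed."""
    def bound(x, q_left):
        gap = x - q_left
        return math.inf if gap <= 0 else x + v / gap

    Q = [w[0]]
    q_left, q_right = w[0], math.inf
    for x in w[1:-1]:
        q_right = min(q_right, bound(x, q_left))
        if q_right <= x:  # the next point must be placed before passing x
            Q.append(q_right)
            q_left = q_right
            q_right = bound(x, q_left)
    if q_right < w[-1]:
        Q.append(q_right)
    Q.append(w[-1])
    return Q


def max_ssq_variance(w, Q):
    """Largest per-coordinate variance (x - lo)(hi - x) for sorted Q."""
    worst = 0.0
    for x in w:
        lo = Q[bisect_right(Q, x) - 1]
        hi = Q[bisect_left(Q, x)]
        worst = max(worst, (x - lo) * (hi - x))
    return worst
```

For `w = [0, 1, 2, 3, 4]` and `v = 1`, the pass places a point at 2 and then closes with the endpoint 4, so three quantization points suffice.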

We note that the runtime of Algorithm [5](https://arxiv.org/html/2606.00289#alg5) can be improved to $O(s\log(d/s))$; for all $s\leq d$, this is an improvement over the $O(d)$ runtime of Algorithm [5](https://arxiv.org/html/2606.00289#alg5). The algorithm is as follows.

Input: Sorted $w\in\mathbb{R}^{d}$, variance parameter $v$

Output: Set $Q$ such that $\mathsf{MDV}(w,Q)\leq v$ and $|Q|=\mathsf{MDV}^{s}(w,v)$

- 2: Initialize $Q=\{w_{1}\}$, the smallest element of $w$
- 3: Let $q=w_{1}$ be the max element of $Q$
- 4: **while** $q<w_{d}$ **do**
  - 5: Binary search for the smallest index $i$ such that $w_{i}\geq q+\sqrt{v}$
  - 6: Set $x=\min\left\{w_{i}+\dfrac{v}{w_{i}-q},\;w_{i-1}+\dfrac{v}{w_{i-1}-q}\right\}$
  - 7: Add $x$ to $Q$ and set $q=x$
- 9: Add $w_{d}$ to $Q$
- **return** $Q$

Algorithm 6: Improved Algorithm for $\mathsf{MDV}^{s}(w,v)$

###### Lemma F.2.

Algorithm [6](https://arxiv.org/html/2606.00289#alg6) outputs a set $Q$ such that $\mathsf{MDV}(w,Q)\leq v$ and $|Q|=\mathsf{MDV}^{s}(w,v)$.

###### Proof.

The goal of the algorithm is to implement the greedy strategy: given the current largest quantization point $q$, find the largest possible next point $x^{*}>q$ such that the variance constraint is satisfied for all $w_{j}\in[q,x^{*}]$. Let $x^{\prime}=2\sqrt{v}+q$; for all $w_{k}\in[q,x^{\prime}]$, we have

$$(w_{k}-q)(x^{\prime}-w_{k})\leq\left(\frac{x^{\prime}-q}{2}\right)^{2}=v$$

and thus $x^{\prime}$ is a feasible next quantization point and $x^{*}\geq x^{\prime}$. Define $m^{*}=(x^{*}+q)/2$ to be the midpoint of $[q,x^{*}]$; since $x^{*}\geq x^{\prime}$, it follows that $m^{*}\geq q+\sqrt{v}$ as well.

Let $w_{i}$ be the smallest element of $w$ such that $w_{i}\geq q+\sqrt{v}$, and consider any $y$ such that $(y+q)/2>w_{i}$. Then,

$$(w_{i}-q)(y-w_{i})>(w_{i}-q)(2w_{i}-q-w_{i})=(w_{i}-q)^{2}\geq v$$

so $y$ is not a feasible next quantization point. Thus, $x^{*}\in[x^{\prime},y]$ and $m^{*}\in[w_{i-1},w_{i}]$ as $(x^{\prime}+q)/2=q+\sqrt{v}>w_{i-1}$ by construction of $i$ and $(y+q)/2>w_{i}$ by definition of $y$.

Since the function $f(z)=(z-q)(x^{*}-z)$ is concave and symmetric around $(x^{*}+q)/2=m^{*}$ and $m^{*}\in[w_{i-1},w_{i}]$, the maximum variance of all $w_{k}$ in $[q,x^{*}]$ is attained at either $w_{i-1}$ or $w_{i}$. Thus, $x^{*}$ is the largest value for which both $w_{i-1}$ and $w_{i}$ have variance at most $v$, which is exactly what the algorithm computes.

The size analysis then follows by the same argument as in the proof of Lemma [F.1](https://arxiv.org/html/2606.00289#A6.Thmlemma1). ∎

###### Lemma F\.3\.

Let $s=\mathsf{MDV}^{s}(w,v)$. Then, the above algorithm runs in time $O(s\log(d/s))$ (assuming $w$ is already sorted).

###### Proof.

Let $Q$ be the set returned by the algorithm, which has size $|Q|\leq s$ by Lemma [F.2](https://arxiv.org/html/2606.00289#A6.Thmlemma2). For each $k\in[s]$, let $\mathsf{idx}_{k}$ denote the index $i$ found in the $k$-th iteration of the while loop of Algorithm [6](https://arxiv.org/html/2606.00289#alg6).

Since $q$ increases monotonically, the indices $\mathsf{idx}_{k}$ also increase monotonically across iterations. By using the doubling search variant of binary search starting from the previous index $\mathsf{idx}_{k-1}$, the time complexity to find $\mathsf{idx}_{k}$ is $O(\log(\mathsf{idx}_{k}-\mathsf{idx}_{k-1}))$. So, the total runtime to construct $Q$ is $O\left(\log\prod_{k=1}^{s}B_{k}\right)$, where $B_{k}=\mathsf{idx}_{k}-\mathsf{idx}_{k-1}$. By construction, $\sum_{k=1}^{s}B_{k}\leq d$. Then, by the AM-GM inequality,

$$\prod_{k=1}^{s}B_{k}\leq\left(\frac{\sum_{k=1}^{s}B_{k}}{s}\right)^{s}\leq(d/s)^{s}$$

and thus the total runtime is $O(\log((d/s)^{s}))=O(s\log(d/s))$. ∎
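The jump-based greedy of Algorithm 6 can be sketched in Python as below. The sketch is hedged in several ways that go beyond the pseudocode: it skips the candidate $w_{i-1}$ when it does not lie strictly to the right of $q$ (otherwise the bound is undefined), it stops as soon as the next point would reach or pass $w_d$ so that only $w_d$ is added at the end, and it uses the standard-library `bisect_left` with a moving lower bound instead of the doubling search from Lemma F.3, so each step costs $O(\log d)$ rather than $O(\log B_k)$. It assumes `w` is sorted and $v>0$; the function name is hypothetical.

```python
import math
from bisect import bisect_left


def jump_mdv_set(w, v):
    """Place quantization points greedily, locating the two weights adjacent
    to q + sqrt(v) by binary search instead of scanning every weight."""
    d = len(w)
    root = math.sqrt(v)
    Q = [w[0]]
    q = w[0]
    lo = 0
    while q < w[-1]:
        # smallest index i with w[i] >= q + sqrt(v); i == d if none exists
        i = bisect_left(w, q + root, lo)
        lo = i
        x = math.inf
        for k in (i - 1, i):
            if 0 <= k < d and w[k] > q:
                x = min(x, w[k] + v / (w[k] - q))
        if x >= w[-1]:  # the endpoint w_d itself is a feasible next point
            break
        Q.append(x)
        q = x
    Q.append(w[-1])
    return Q
```

On `w = [0, 0.5, 3, 10]` with `v = 1`, the weight 0.5 forces the first new point to 2.5 and the weight 3 forces the next to 5, after which the endpoint 10 closes the set.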

### F.2 Some Basic Approximation Algorithms for $\mathsf{MDV}^{s}$

In this section, we give algorithms which do not require the $O(d\log d)$-time sorting required for exact algorithms for $\mathsf{MDV}^{s}$. These algorithms each run in time $O(d\log s)$ and achieve a constant approximation factor.

#### F.2.1 $(2s,4v)$-Bicriteria via $s$-Center Clustering

Input: Vector $w\in\mathbb{R}^{d}$, variance parameter $v$

Output: Quantization set $Q$

- 2: Initialize $C=\emptyset$
- 3: **for** $i\in[d]$ **do**
  - 4: **if** for all $p\in C$, $|w_{i}-p|>2\sqrt{v}$ **then**
    - 5: Add $w_{i}$ to $C$
- 7: Let $Q$ be the set which contains $p-2\sqrt{v}$ and $p+2\sqrt{v}$ for all $p\in C$
- **return** $Q$

Algorithm 7: $(2s,4v)$-Bicriteria Approximation Algorithm

###### Lemma F.4.

Givenw∈ℝdw\\in\\mathbb\{R\}^\{d\}andv∈ℝv\\in\\mathbb\{R\}, lets=𝖬𝖣𝖵s⁡\(w,v\)s=\\operatorname\{\\mathsf\{MDV\}\}^\{s\}\(w,v\)\. Then, the above algorithm outputs a setQQof size at most2s2ssuch that𝖬𝖣𝖵⁡\(w,Q\)≤4v\\operatorname\{\\mathsf\{MDV\}\}\(w,Q\)\\leq 4v\. Moreover, with an appropriate implementation of Line[7](https://arxiv.org/html/2606.00289#alg7), the algorithm runs in timeO\(dlog⁡s\)O\(d\\log s\)\.

###### Proof\.

Since $\mathsf{MDV}(w,s)\leq v$ by definition of $\mathsf{MDV}^s$, there exists a set $Q^*$ of size $s$ such that for all $i$,

$$(w_i-w_i^{\downarrow}(Q^*))(w_i^{\uparrow}(Q^*)-w_i)\leq v,$$

which implies that either $w_i-w_i^{\downarrow}(Q^*)\leq\sqrt{v}$ or $w_i^{\uparrow}(Q^*)-w_i\leq\sqrt{v}$. Equivalently, every $w_i$ is within $\sqrt{v}$ of some element of $Q^*$, and the set $\{w_i\}_{i\in[d]}$ can be covered with a collection of at most $s$ intervals of length $2\sqrt{v}$.

Let $I_1,\ldots,I_s$ be a collection of disjoint intervals of length $2\sqrt{v}$ which cover $\{w_i\}_{i\in[d]}$, and let $C$ be as constructed in Algorithm [7](https://arxiv.org/html/2606.00289#alg7). By construction, for any distinct $x,y\in C$, $|x-y|>2\sqrt{v}$, and since each $I_j$ has length $2\sqrt{v}$, we must have $|I_j\cap C|\leq 1$ for all $j\in[s]$. Moreover, since $\{I_j\}_j$ covers $\{w_i\}_i$ and $C\subseteq\{w_i\}_i$, we have $|C|=\sum_{j\in[s]}|I_j\cap C|\leq s$. Since $|Q|=2|C|$, we thus have $|Q|\leq 2s$ as desired.

By construction, for all $i$, there exists a $p\in C$ such that $|w_i-p|\leq 2\sqrt{v}$. Equivalently, $w_i\in[p-2\sqrt{v},p+2\sqrt{v}]$. Since both $p-2\sqrt{v}$ and $p+2\sqrt{v}$ are in $Q$, it follows that

$$4\sqrt{v}\geq w_i^{\uparrow}(Q)-w_i^{\downarrow}(Q)=(w_i-w_i^{\downarrow}(Q))+(w_i^{\uparrow}(Q)-w_i).$$

For any $x,y\geq 0$, $xy\leq(x+y)^2/4$, and thus it follows that

$$(w_i-w_i^{\downarrow}(Q))(w_i^{\uparrow}(Q)-w_i)\leq\frac{(4\sqrt{v})^2}{4}=4v$$

as desired. ∎
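The greedy center selection and padding analysed in this proof can be sketched in a few lines. This is an illustrative sketch under stated assumptions: the function names are hypothetical, and the sorted Python list gives $O(\log |C|)$ lookups but $O(|C|)$ insertions, so a balanced search tree would be needed to match the stated $O(d\log s)$ bound exactly. The helper `mdv` evaluates $\max_i (w_i-w_i^{\downarrow}(Q))(w_i^{\uparrow}(Q)-w_i)$ directly.

```python
from bisect import bisect_left, insort
from math import sqrt

def bicriteria_quantization(w, v):
    """Greedy centers at pairwise distance > 2*sqrt(v), then pad each by 2*sqrt(v)."""
    r = 2.0 * sqrt(v)
    centers = []                      # kept sorted so only two neighbors need checking
    for x in w:
        k = bisect_left(centers, x)
        near_left = k > 0 and x - centers[k - 1] <= r
        near_right = k < len(centers) and centers[k] - x <= r
        if not (near_left or near_right):
            insort(centers, x)
    Q = sorted({p - r for p in centers} | {p + r for p in centers})
    return centers, Q

def var_ssq(x, Q):
    """(x - nearest level below) * (nearest level above - x); Q sorted and bracketing x."""
    k = bisect_left(Q, x)
    if Q[k] == x:
        return 0.0
    return (x - Q[k - 1]) * (Q[k] - x)

def mdv(w, Q):
    """Maximum per-coordinate variance of w under the quantization set Q."""
    return max(var_ssq(x, Q) for x in w)

w = [0, 1, 9, 10, 3, 25, 26, 40]
v = 4
centers, Q = bicriteria_quantization(w, v)
worst = mdv(w, Q)                     # the lemma guarantees worst <= 4 * v
```

The example uses a perfect-square $v$ and integer inputs so that all arithmetic is exact; with general floating-point inputs the comparison against $4v$ may need a small tolerance.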

### F.3 $(1+\varepsilon)$-approx for $\mathsf{MDV}^s$ in time $O(d\log(s/\varepsilon))$

We first need a key structural lemma which can be viewed as a way of constructing a “coreset” for $\mathsf{MDV}$.

###### Lemma F\.5\.

Let $I_1,\ldots,I_k$ be intervals, each of length at most $D$, such that $w\subseteq\bigcup_{j\in[k]}I_j$. Let $x$ be the vector of all endpoints of the intervals $I_1,\ldots,I_k$, and consider any set $Q\subseteq\mathbb{R}$. Then,

$$\mathsf{MDV}(w,Q)\leq\mathsf{MDV}(x,Q)+\frac{1}{4}\cdot D^2.$$

###### Proof\.

Let $v=\mathsf{MDV}(x,Q)$, and fix some interval $I_j=[a,b]$. We will show that $\mathsf{VarSSQ}(w_i,Q)\leq v+D^2/4$ for all $w_i\in[a,b]$, which then gives the lemma.

First, suppose that $Q\cap[a,b]\neq\emptyset$. Let $q_1=\min(Q\cap[a,b])$ and $q_2=\max(Q\cap[a,b])$ (note $q_1=q_2$ if $|Q\cap[a,b]|=1$). For all $w_i\in[q_1,q_2]$, there exists some $\alpha\in[0,1]$ such that $w_i-q_1\leq\alpha D$ and $q_2-w_i\leq(1-\alpha)D$, as $q_2-q_1\leq b-a\leq D$. Thus,

$$\mathsf{VarSSQ}(w_i,Q)\leq\alpha D\cdot(1-\alpha)D\leq\frac{1}{4}\cdot D^2,$$

as $\alpha(1-\alpha)\leq 1/4$ for all $\alpha\in[0,1]$.

So, consider $w_i\in[a,q_1)$. Let $B=q_1-a$ and $L=a-a^{\downarrow}(Q)$, where $a^{\downarrow}(Q)$ is the largest element of $Q$ which is at most $a$. Since $a$ is an entry of $x$ and $v=\mathsf{MDV}(x,Q)$, we have $BL=\mathsf{VarSSQ}(a,Q)\leq v$. For all $w_i\in[a,q_1)$, by definition of $B$, there exists some $\alpha\in[0,1)$ such that $w_i=a+\alpha B$. So,

$$\begin{aligned}
\mathsf{VarSSQ}(w_i,Q)&=(L+\alpha B)\cdot(1-\alpha)B\\
&=(1-\alpha)BL+\alpha(1-\alpha)B^2\\
&\leq BL+\alpha(1-\alpha)D^2\\
&\leq v+\frac{1}{4}\cdot D^2,
\end{aligned}$$

as $BL\leq v$, $B\leq D$ and $\alpha(1-\alpha)\leq 1/4$ for all $\alpha\in[0,1]$.

An analogous argument shows that for all $w_i\in(q_2,b]$, $\mathsf{VarSSQ}(w_i,Q)\leq v+D^2/4$, thus completing the case where $Q\cap[a,b]\neq\emptyset$.

Now, suppose $Q\cap[a,b]=\emptyset$, and let $L_1=a-a^{\downarrow}(Q)$, $L_2=b^{\uparrow}(Q)-b$, where $b^{\uparrow}(Q)$ is the smallest element of $Q$ which is at least $b$. Let $D_j=b-a$. Then,

$$\begin{aligned}
(D_j+L_1)L_2&=\mathsf{VarSSQ}(b,Q)\leq v,\\
(D_j+L_2)L_1&=\mathsf{VarSSQ}(a,Q)\leq v.
\end{aligned}$$

Together, this implies that $D_jL+L_1L_2\leq v$, where $L=\max(L_1,L_2)$.

Fix some $w_i\in[a,b]$ and set $\alpha\in[0,1]$ so that $w_i=a+\alpha D_j$ (and thus $w_i=b-(1-\alpha)D_j$). Then,

$$\begin{aligned}
\mathsf{VarSSQ}(w_i,Q)&=(\alpha D_j+L_1)\cdot((1-\alpha)D_j+L_2)\\
&=\alpha D_jL_2+(1-\alpha)D_jL_1+L_1L_2+\alpha(1-\alpha)D_j^2\\
&\leq\alpha D_jL+(1-\alpha)D_jL+L_1L_2+\alpha(1-\alpha)D^2\\
&=(D_jL+L_1L_2)+\alpha(1-\alpha)D^2\\
&\leq v+\frac{1}{4}D^2,
\end{aligned}$$

as $D_jL+L_1L_2\leq v$, $D_j\leq D$ and $\alpha(1-\alpha)\leq 1/4$. ∎
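The coreset bound can be sanity-checked numerically. The sketch below is an assumption-laden illustration rather than part of the paper: it draws random integer intervals of length at most $D$, places points inside them, and compares $4\,\mathsf{MDV}(w,Q)$ against $4\,\mathsf{MDV}(x,Q)+D^2$ in exact integer arithmetic. The helper names and the choice to force $-1$ and $70$ into $Q$ (so that every point is bracketed) are hypothetical.

```python
import random
from bisect import bisect_left

def var_ssq(x, Q):
    """(x - nearest level below) * (nearest level above - x); Q sorted and bracketing x."""
    k = bisect_left(Q, x)
    if Q[k] == x:
        return 0
    return (x - Q[k - 1]) * (Q[k] - x)

def check_coreset_bound(trials=200, seed=0):
    """Return True if MDV(w,Q) <= MDV(x,Q) + D^2/4 held in every random trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        D = rng.randint(1, 8)
        intervals = []
        for _ in range(rng.randint(1, 5)):
            a = rng.randint(0, 60)
            intervals.append((a, a + rng.randint(0, D)))
        # Points of w lie inside the intervals; x holds the interval endpoints.
        w = [rng.randint(a, b) for a, b in intervals for _ in range(4)]
        x = [e for ab in intervals for e in ab]
        Q = sorted({-1, 70} | {rng.randint(0, 69) for _ in range(rng.randint(0, 6))})
        lhs = 4 * max(var_ssq(y, Q) for y in w)
        rhs = 4 * max(var_ssq(y, Q) for y in x) + D * D
        if lhs > rhs:
            return False
    return True

# A tight instance: one interval [0, 4] with Q equal to its endpoints.
tight = var_ssq(2, [0, 4])   # equals D^2 / 4 with D = 4
```

The tight instance shows that the additive $D^2/4$ term cannot be improved in general: the endpoints have zero variance while the midpoint attains exactly $D^2/4$.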

Lemma [F.5](https://arxiv.org/html/2606.00289#A6.Thmlemma5) suggests the following approximation algorithm: construct a small number of intervals, and run an exact algorithm for $\mathsf{MDV}^s$ on only the endpoints of those intervals.

Input: Vector $w\in\mathbb{R}^d$, variance parameter $v$, error tolerance $\varepsilon$

Output: Set $Q\subseteq\mathbb{R}$ such that $\mathsf{MDV}(w,Q)\leq(1+\varepsilon)v$ and $\mathsf{MDV}(w,|Q|)\geq v$

- 2: Initialize $C=\emptyset$
- 3: Iterate through $i\in[d]$ and add $w_i$ to $C$ if for all $c\in C$, $|w_i-c|>2\sqrt{v}$
- 4: **for** all $c\in C$ **do**
  - 5: Compute $I_c=\{w_i\mid c=\mathop{\mathrm{argmin}}_{p\in C}|w_i-p|\}$
  - 6: Compute $a_c=\min(I_c)$, $b_c=\max(I_c)$, and $D_c=b_c-a_c$
  - 7: Compute subintervals $I_c^j=[a_c+D_c(j-1)\sqrt{\varepsilon}/2,\;a_c+D_cj\sqrt{\varepsilon}/2]$ for each $j\in\left[\left\lceil 2/\sqrt{\varepsilon}\right\rceil\right]$
- 9: **for** each $c\in C,\;j\in\left[\left\lceil 2/\sqrt{\varepsilon}\right\rceil\right]$ **do**
  - 10: Compute endpoints $\ell_c^j=\min(w\cap I_c^j)$ and $r_c^j=\max(w\cap I_c^j)$
- 12: Let $x$ be a vector of all $\ell_c^j$ and $r_c^j$. Run Algorithm [6](https://arxiv.org/html/2606.00289#alg6) on $x$ and $v$ to get a set $Q$
- **return** $Q$

Algorithm 8: $(1+\varepsilon)$-Approximation Algorithm

###### Lemma F.6.

Given $w,v,\varepsilon$, Algorithm [8](https://arxiv.org/html/2606.00289#alg8) outputs a set $Q$ such that $\mathsf{MDV}(w,Q)\leq(1+\varepsilon)v$ and $|Q|\leq\mathsf{MDV}^s(w,v)$.

###### Proof\.

We first show $|Q|\leq\mathsf{MDV}^s(w,v)$. By the analysis of the exact algorithm, we have $|Q|\leq\mathsf{MDV}^s(x,v)$. Furthermore, by construction, $x\subseteq w$, and so $|Q|\leq\mathsf{MDV}^s(x,v)\leq\mathsf{MDV}^s(w,v)$ as desired.

By Lemma [F.2](https://arxiv.org/html/2606.00289#A6.Thmlemma2), $\mathsf{MDV}(x,Q)\leq v$. The vector $x$ consists of the endpoints of the set of intervals $\{I_c^j\}$, which cover $w$ by construction. Moreover, each interval $I_c^j$ has length at most $\sqrt{\varepsilon}D_c/2\leq 2\sqrt{\varepsilon v}$, as $D_c\leq 4\sqrt{v}$ by the construction of $C$.

So applying Lemma [F.5](https://arxiv.org/html/2606.00289#A6.Thmlemma5), we have that

$$\mathsf{MDV}(w,Q)\leq v+\frac{1}{4}\cdot(2\sqrt{\varepsilon v})^2=(1+\varepsilon)v$$

as desired. ∎

###### Lemma F\.7\.

Given $w,v,\varepsilon$, let $s=\mathsf{MDV}^s(w,v)$. Then, the runtime of Algorithm [8](https://arxiv.org/html/2606.00289#alg8) is

$$O\left(d\log(s/\varepsilon)\right).$$

###### Proof\.

Let $C$ and $I_c$ for each $c\in C$ be defined as in Algorithm [8](https://arxiv.org/html/2606.00289#alg8). By Lemma [F.4](https://arxiv.org/html/2606.00289#A6.Thmlemma4), $|C|\leq s$, and $C$ takes time $O(d\log s)$ to construct. Similarly, computing all $I_c$ can be done in time $O(d\log s)$ by iterating through the $d$ points of $w$ and, for each, using binary search to find the closest $c\in C$.

To compute $a_c,b_c$, we iterate through each of the $d$ elements of $w$ and maintain the min and max elements of $I_c$ encountered so far, for all $c\in C$. Checking which $I_c$ a given $w_i$ lies in takes time $O(\log s)$; updating the stored min/max if needed for $I_c$ can then be done in $O(1)$ time. Similarly, we can compute all $O(s/\sqrt{\varepsilon})$ values $\ell_c^j,r_c^j$, and thus the vector $x$ in Line [8](https://arxiv.org/html/2606.00289#alg8), in time $O(d\log s)$ (since the subintervals $I_c^j$ are equally spaced in $I_c$, for each $y\in I_c$, finding the subinterval in which it lies takes $O(1)$ time).

Finally, sorting and running Algorithm [6](https://arxiv.org/html/2606.00289#alg6) takes time $O(|x|\log|x|)$. Note that $|x|\leq d$, since it is a subset of $w$, and also $|x|=O(s/\sqrt{\varepsilon})$, by construction. So, $O(|x|\log|x|)=O(|x|\log(s/\sqrt{\varepsilon}))=O(d\log(s/\varepsilon))$.

Thus the total time is $O(d\log(s/\varepsilon))$. ∎
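The coreset-building phase of Algorithm 8 (everything before the call to the exact algorithm) can be sketched as below. This is a hedged illustration: the function names are hypothetical, the final call to Algorithm 6 is omitted because that algorithm is not reproduced in this section, and the sorted Python list used for $C$ has $O(|C|)$ insertions rather than the $O(\log|C|)$ a balanced tree would give. The constant-time bucket lookup mirrors the "equally spaced subintervals" remark in the runtime proof.

```python
from bisect import bisect_left, insort
from math import ceil, sqrt

def greedy_centers(w, v):
    """Centers at pairwise distance > 2*sqrt(v); every point is within 2*sqrt(v) of one."""
    r = 2.0 * sqrt(v)
    C = []
    for y in w:
        k = bisect_left(C, y)
        if not ((k > 0 and y - C[k - 1] <= r) or (k < len(C) and C[k] - y <= r)):
            insort(C, y)
    return C

def nearest_center(C, y):
    """Index of the center closest to y, via one binary search."""
    k = bisect_left(C, y)
    if k == 0:
        return 0
    if k == len(C):
        return k - 1
    return k if C[k] - y < y - C[k - 1] else k - 1

def coreset_endpoints(w, v, eps):
    """Return (C, x): the centers and the sorted endpoint vector x, a subset of w."""
    C = greedy_centers(w, v)
    owner = [nearest_center(C, y) for y in w]
    lo = [float("inf")] * len(C)
    hi = [float("-inf")] * len(C)
    for y, c in zip(w, owner):            # a_c and b_c for every cluster
        lo[c] = min(lo[c], y)
        hi[c] = max(hi[c], y)
    n_sub = ceil(2.0 / sqrt(eps))         # number of subintervals per cluster
    cells = {}                            # (cluster, subinterval) -> [min, max]
    for y, c in zip(w, owner):
        width = (hi[c] - lo[c]) * sqrt(eps) / 2.0
        j = 0 if width == 0 else min(int((y - lo[c]) / width), n_sub - 1)
        cell = cells.setdefault((c, j), [y, y])
        cell[0] = min(cell[0], y)
        cell[1] = max(cell[1], y)
    x = sorted({e for cell in cells.values() for e in cell})
    return C, x

w = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0, 10.2, 10.4]
C, x = coreset_endpoints(w, v=1.0, eps=0.25)
```

Only the smallest and largest input point of each non-empty subinterval survive, so $|x|$ is at most $2|C|\lceil 2/\sqrt{\varepsilon}\rceil$ regardless of $d$.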

### F.4 Main results for $\mathsf{MDV}$

In this section, we give algorithms for constructing a quantization set of size $s$ with maximum variance approximately $\mathsf{MDV}(w,s)$; in other words, approximation algorithms for $\mathsf{MDV}(w,s)$.

Combined with algorithms from the previous section (i.e. approximation algorithms for $\mathsf{MDV}^s$), we are able to give algorithms with matching runtime of $O(d\log(s/\varepsilon))$.


### F.5 A 4-approximation to $\mathsf{MDV}$

An important subroutine of our algorithm for Theorem [3.3](https://arxiv.org/html/2606.00289#S3.SS3) is a fast 4-approximation, which we will run on a small coreset $u\in\mathbb{R}^k$, that gives a rough (i.e. $O(1)$-approximate) estimate of the true optimal cost $\mathsf{MDV}(u,s)$. In this subsection, we prove the following lemma. (Footnote 12: The algorithm is described as a Las Vegas algorithm, i.e. the runtime is probabilistic while the quality guarantee is deterministic. This can be easily converted to a Monte Carlo algorithm with deterministic runtime and probabilistic quality by simply terminating the algorithm if it runs for too many steps.)

###### Lemma F\.8\.

For any $u\in\mathbb{R}^k$, $s\in\mathbb{N}$ and $\delta\in(0,1)$, there exists an algorithm which outputs a 4-approximation to the value $\mathsf{MDV}(u,s)$. Moreover, with probability at least $1-\delta$, the algorithm runs in time

$$O(k\log k+\sqrt{k}\cdot\log^3 k\cdot\log(1/\delta)).$$

Our first observation is that, in order to obtain a 4\-approximation, it suffices to consider quantization sets which only include points from the input vector\.

###### Lemma F\.9\.

Consider any $u\in\mathbb{R}^k$ and $Q\subset\mathbb{R}$. There exists a $Q'\subset\{u_i\}_{i\in[k]}$ such that $|Q'|=|Q|$ and $\mathsf{MDV}(u,Q')\leq 4\,\mathsf{MDV}(u,Q)$.

###### Proof\.

Reindex so that $u_1\leq u_2\leq\ldots\leq u_k$. Construct $Q'$ to be $Q$, but with each element $q\in Q$ replaced by the nearest element of $\{u_i\}_{i\in[k]}$ to $q$ (breaking ties arbitrarily). By construction, $|Q'|=|Q|$.

Consider some $i$ such that $u_i^{\downarrow}(Q)\notin Q'$, and let $q'=u_j$ be the nearest element of $\{u_i\}_{i\in[k]}$ to $u_i^{\downarrow}(Q)$ (so, $q'\in Q'$). Since, by construction, there are no elements of $\{u_i\}$ in $(u_i^{\downarrow}(Q),q')$, it follows that $q'\leq u_i$.

First, suppose $q'>u_i^{\downarrow}(Q)$. Then, as $u_i\geq q'$, $u_i-u_i^{\downarrow}(Q')\leq u_i-q'<u_i-u_i^{\downarrow}(Q)$.

So, now consider the case where $q'<u_i^{\downarrow}(Q)$, and let $u_j=q'$. Then, $u_i^{\downarrow}(Q)$ is closer to $u_j$ than to $u_{j+1}$, and so

$$(u_i-q')-(u_i-u_i^{\downarrow}(Q))=|u_i^{\downarrow}(Q)-u_j|\leq|u_i^{\downarrow}(Q)-u_{j+1}|=u_{j+1}-u_i^{\downarrow}(Q).$$

As there are no elements of $\{u_i\}$ in $(u_j,u_{j+1})=(q',u_{j+1})$ by construction, it follows that $u_i\geq u_{j+1}$. So, $u_{j+1}-u_i^{\downarrow}(Q)\leq u_i-u_i^{\downarrow}(Q)$, and it follows from the above inequality that

$$u_i-u_i^{\downarrow}(Q')\leq u_i-q'\leq(u_i-u_i^{\downarrow}(Q))+(u_{j+1}-u_i^{\downarrow}(Q))\leq 2(u_i-u_i^{\downarrow}(Q)).$$

Therefore, for all $i$, $u_i-u_i^{\downarrow}(Q')\leq 2(u_i-u_i^{\downarrow}(Q))$. An analogous argument shows that for all $i$, $u_i^{\uparrow}(Q')-u_i\leq 2(u_i^{\uparrow}(Q)-u_i)$, and so for all $i$,

$$(u_i-u_i^{\downarrow}(Q'))(u_i^{\uparrow}(Q')-u_i)\leq 4(u_i-u_i^{\downarrow}(Q))(u_i^{\uparrow}(Q)-u_i),$$

implying $\mathsf{MDV}(u,Q')\leq 4\,\mathsf{MDV}(u,Q)$. ∎
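The snapping construction in this proof is easy to experiment with. The sketch below is illustrative only: `snap_to_data` and the other names are hypothetical, ties are broken towards the smaller entry (the proof allows any tie-breaking), and integer inputs are used so the factor-4 comparison is exact. Note that two levels of $Q$ may snap to the same data point, in which case the returned set is smaller than $Q$.

```python
from bisect import bisect_left

def var_ssq(x, Q):
    """(x - nearest level below) * (nearest level above - x); Q sorted and bracketing x."""
    k = bisect_left(Q, x)
    if Q[k] == x:
        return 0
    return (x - Q[k - 1]) * (Q[k] - x)

def mdv(u, Q):
    """Maximum per-coordinate variance of u under the quantization set Q."""
    return max(var_ssq(x, Q) for x in u)

def snap_to_data(u_sorted, Q):
    """Replace every level q in Q by the entry of u_sorted nearest to it."""
    snapped = set()
    for q in Q:
        k = bisect_left(u_sorted, q)
        if k == 0:
            snapped.add(u_sorted[0])
        elif k == len(u_sorted):
            snapped.add(u_sorted[-1])
        else:
            below, above = u_sorted[k - 1], u_sorted[k]
            snapped.add(below if q - below <= above - q else above)
    return sorted(snapped)

u = [0, 4, 5, 10]
Q = [0, 2, 10]
Q_snapped = snap_to_data(u, Q)     # the level 2 is tied and moves to 0
before, after = mdv(u, Q), mdv(u, Q_snapped)
```

In this small instance the cost grows when the interior level is lost, but it stays within the factor of 4 that the lemma allows.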

#### F.5.1 Sorted Matrices and Algorithms

Recall from Definition [A.1](https://arxiv.org/html/2606.00289#A1.Thmdefinition1) that a sorted matrix $M$ is one whose rows are sorted in non-decreasing order and whose columns are sorted in non-increasing order, i.e. for all $i,j$, $M[i,j]\geq M[i+1,j]$ and $M[i,j]\leq M[i,j+1]$.

Sorted matrices have several useful properties, one of which is that we can efficiently determine all elements of the matrix which lie in a given interval\.

###### Lemma F\.10\.

Given any sorted matrix $M$ of dimension $k\times k$ and interval $(v_1,v_2)\subset\mathbb{R}$, there exist indices $\ell_1,\ldots,\ell_k$ and $r_1,\ldots,r_k$ such that for all $i\in[k]$, $M[i,j]\in(v_1,v_2)$ if and only if $j\in[\ell_i,r_i]$ (or $\ell_i=r_i=\bot$ if no such elements exist). Moreover, there is an $O(k)$ time algorithm to compute $\ell_1,\ldots,\ell_k$ and $r_1,\ldots,r_k$, which probes $O(k)$ elements of the matrix $M$.

###### Proof\.

We show how to compute the values $\ell_i$; the $r_i$ can be computed analogously by finding the largest element in each row which is strictly less than $v_2$. The algorithm is as follows:

1. Initialize $j=1$.
2. For each $i\in[k]$:
   1. (a) While $j\leq k$ and $M[i,j]\leq v_1$, increment $j$.
   2. (b) Set $\ell_i=j$.

(Footnote 13: Note that if no element of $M$ is strictly greater than $v_1$, then the algorithm will return $\ell_i=k$; this is not a problem, as the final indices will check if $M[i,\ell_i]$ is in $(v_1,v_2)$ and set $\ell_i=r_i=\bot$ if not.)

Note that the while loop increments $j$ at most $k$ times over the entire algorithm, so the total time is $O(k)$ (and only $O(k)$ elements of $M$ are probed).

For correctness, fix some row $i$. By construction of the algorithm and the sortedness of the matrix, it follows that $M[i,\ell_i]>v_1$, and thus for all $j\geq\ell_i$, $M[i,j]>v_1$ as well. Suppose for contradiction that there exists some $j<\ell_i$ such that $M[i,j]>v_1$. By construction of the algorithm, we then must have had $M[i',j]\leq v_1$ for some row $i'<i$; however, by sortedness of $M$, we have $M[i',j]\geq M[i,j]>v_1$, a contradiction.

Finally, for any $i$ where $\ell_i=r_i$ but $M[i,\ell_i]\notin(v_1,v_2)$, set $\ell_i=r_i=\bot$. ∎
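The staircase walk from this proof can be sketched as follows. This is a minimal sketch under assumptions: indices are 0-based, the matrix is accessed through a hypothetical callable `entry(i, j)` so that probes can be counted, and an empty row is reported as `None` (a row is treated as empty whenever $\ell_i>r_i$) instead of $\bot$. Both pointers only move right because, with non-increasing columns, the first column exceeding $v_1$ and the last column below $v_2$ can only move right as the row index grows.

```python
def interval_bounds(entry, k, v1, v2):
    """For each row i return (l_i, r_i) with entry(i, j) in (v1, v2) iff l_i <= j <= r_i.

    Assumes rows are non-decreasing in j and columns are non-increasing in i.
    Rows with no entry in the open interval (v1, v2) are reported as None.
    """
    left, right = [], []
    j = 0
    for i in range(k):
        # First column whose value is strictly greater than v1.
        while j < k and entry(i, j) <= v1:
            j += 1
        left.append(j)
    j = 0
    for i in range(k):
        # One past the last column whose value is strictly less than v2.
        while j < k and entry(i, j) < v2:
            j += 1
        right.append(j - 1)
    return [(l, r) if l <= r else None for l, r in zip(left, right)]

# Example on a 5 x 5 matrix with M[i][j] = max(0, j - i), counting probes.
probes = 0
def entry(i, j):
    global probes
    probes += 1
    return max(0, j - i)

bounds = interval_bounds(entry, 5, 0, 3)
```

Each pass performs at most $k$ pointer increments and at most one failed comparison per row, so the number of probes is at most $4k$ in total.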

Given such $\ell_1,\ldots,\ell_k$ and $r_1,\ldots,r_k$, we can also efficiently sample a uniformly random element of $M$ which lies in $(v_1,v_2)$.

###### Lemma F.11.

Let $M$ be a sorted matrix of dimension $k\times k$, and let $\ell_1,\ldots,\ell_k$ and $r_1,\ldots,r_k$ be as in Lemma [F.10](https://arxiv.org/html/2606.00289#A6.Thmlemma10) for some interval $(v_1,v_2)$. With $O(k)$ preprocessing, there is a data structure which can output a uniformly random element of $M$ which lies in $(v_1,v_2)$ in time $O(\log k)$ and probes only 1 element of $M$.

###### Proof\.

The data structure is as follows. At construction time, first let $S=\{i_1,\ldots,i_m\}$ be the (sorted) list of $i$ such that $\ell_i\neq\bot$; i.e. the rows for which there exists some element of $M$ in $(v_1,v_2)$. For each $j\in[m]$, compute $B_j=(r_{i_j}-\ell_{i_j})+1$ and $C_{i_j}=\sum_{h=1}^{j}B_h$. This preprocessing takes $O(k)$ time by first removing the rows for which $\ell_i=r_i=\bot$ and then computing $C_{i_j}$ as prefix sums.

To sample a uniformly random element of $M$ in $(v_1,v_2)$, first sample a uniformly random integer $\boldsymbol{h}$ in $[C_{i_m}]$. Then, use binary search to find the smallest $j\in[m]$ with $C_{i_j}\geq\boldsymbol{h}$, and return $M[i_j,\ell_{i_j}+(\boldsymbol{h}-C_{i_{j-1}}-1)]$, where $C_{i_0}=0$. Since $C_{i_{j-1}}<\boldsymbol{h}\leq C_{i_j}$, the offset $\boldsymbol{h}-C_{i_{j-1}}-1$ lies in $\{0,\ldots,B_j-1\}$, so the returned column lies in $[\ell_{i_j},r_{i_j}]$.

Since all elements of $M$ in $(v_1,v_2)$ are exactly those in the intervals $[\ell_i,r_i]$ for $i\in S$, there are $C_{i_m}$ such elements and each is returned with probability $1/C_{i_m}$. The runtime follows from the fact that the binary search takes time $O(\log|S|)=O(\log k)$. ∎
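A prefix-sum sampler of this kind can be sketched as below. The class and method names are hypothetical, indices are 0-based, and the per-row bounds are passed as a list of `(l, r)` pairs with `None` standing in for $\bot$. The sketch uses inclusive prefix sums and maps a rank $h\in\{1,\ldots,\text{total}\}$ to the first row whose prefix sum reaches $h$; it returns a matrix position, so the caller probes a single entry.

```python
import random
from bisect import bisect_left
from itertools import accumulate

class IntervalSampler:
    """Uniform sampling of positions (i, j) with l_i <= j <= r_i over non-empty rows."""

    def __init__(self, bounds):
        # bounds[i] is (l_i, r_i), or None when row i has no entry in the interval.
        self.rows = [i for i, b in enumerate(bounds) if b is not None]
        self.left = [bounds[i][0] for i in self.rows]
        widths = [bounds[i][1] - bounds[i][0] + 1 for i in self.rows]
        self.prefix = list(accumulate(widths))      # inclusive prefix sums
        self.total = self.prefix[-1] if self.prefix else 0

    def locate(self, h):
        """Map a rank h in {1, ..., total} to a distinct matrix position."""
        t = bisect_left(self.prefix, h)             # smallest t with prefix[t] >= h
        before = self.prefix[t - 1] if t > 0 else 0
        return self.rows[t], self.left[t] + (h - before - 1)

    def sample(self, rng=random):
        """Return a uniformly random position; requires total > 0."""
        return self.locate(rng.randint(1, self.total))

sampler = IntervalSampler([(1, 3), None, (0, 0), (2, 4)])
position = sampler.sample(random.Random(0))
```

Because `locate` is a bijection between ranks and in-interval positions, drawing the rank uniformly makes every position equally likely.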

#### F.5.2 The algorithm

Definition ($C_u(i,j)$). Given a sorted vector $u$ and $i\leq j$, define $C_u(i,j)$ to be the max variance of $u_h$ for $h\in[i,j]$ when quantized to the endpoints $u_i,u_j$; that is, $C_u(i,j)=\max_{i\leq h\leq j}(u_h-u_i)(u_j-u_h)$.

###### Observation F\.12\.

Let $C_u$ be the matrix such that $C_u[i,j]=C_u(i,j)$ for $i\leq j$ and $C_u[i,j]=0$ otherwise. Then, $C_u$ is a sorted matrix.

###### Proof.

First, note that for any fixed $i$, $C_u(i,j)$ is non-decreasing in $j$; similarly, for any fixed $j$, $C_u(i,j)$ is non-increasing in $i$. Thus, $C_u[i,j]\leq C_u[i,j+1]$ and $C_u[i,j]\leq C_u[i-1,j]$. ∎

###### Observation F\.13\.

When $u$ is sorted, $C_u(i,j)$ can be computed in $O(\log k)$ time.

###### Proof.

Note that $C_u(i,j)$ is maximized at the point $u_h$ closest to the midpoint of $[u_i,u_j]$: $(x-u_i)(u_j-x)$ is increasing for $x\leq(u_i+u_j)/2$ and decreasing for $x\geq(u_i+u_j)/2$, and is symmetric about the midpoint. Thus, we can binary search for the $u_h$ closest to the midpoint $(u_i+u_j)/2$ in time $O(\log k)$, and compute $C_u(i,j)$ in $O(1)$ time given $u_h$. ∎
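The binary-search evaluation of $C_u(i,j)$ can be sketched as follows. This is an illustrative sketch with a hypothetical function name and 0-based indices. Rather than explicitly picking the entry closest to the midpoint, it evaluates the two entries that straddle the midpoint and keeps the larger product, which gives the same value because the product is unimodal in $u_h$.

```python
from bisect import bisect_left

def cost(u, i, j):
    """C_u(i, j) = max over i <= h <= j of (u[h] - u[i]) * (u[j] - u[h]), u sorted."""
    mid = (u[i] + u[j]) / 2
    h = bisect_left(u, mid, i, j + 1)      # first index in [i, j] with u[h] >= mid
    best = 0
    for c in (h - 1, h):                   # the two entries straddling the midpoint
        if i <= c <= j:
            best = max(best, (u[c] - u[i]) * (u[j] - u[c]))
    return best

u = [0, 1, 4, 6, 7, 15]
k = len(u)
# Full matrix with zeros below the diagonal, built only to inspect its structure.
C = [[cost(u, i, j) if i <= j else 0 for j in range(k)] for i in range(k)]
rows_sorted = all(C[i][j] <= C[i][j + 1] for i in range(k) for j in range(k - 1))
cols_sorted = all(C[i][j] >= C[i + 1][j] for i in range(k - 1) for j in range(k))
```

On this example both monotonicity checks hold, matching the sorted-matrix structure that the sampling and search routines rely on. In the algorithm itself the matrix is never materialized; entries are computed on demand.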

###### Lemma F\.14\.

For any sorted $u\in\mathbb{R}^k$ and $Q\subseteq u$, there exist $i,j$ such that $\mathsf{MDV}(u,Q)=C_u(i,j)$.

###### Proof.

Let $Q=\{q_1<q_2<\ldots<q_s\}$, and let $\beta_i$ be the index such that $q_i=u_{\beta_i}$. Then, by definition,

$$\mathsf{MDV}(u,Q)=\max_{j\in[s-1]}\max_{\beta_j\leq i\leq\beta_{j+1}}(u_i-q_j)(q_{j+1}-u_i)=\max_{j\in[s-1]}C_u(\beta_j,\beta_{j+1}).$$

So, there exists some $j$ such that $\mathsf{MDV}(u,Q)=C_u(\beta_j,\beta_{j+1})$. ∎

Input: Sorted $u\in\mathbb{R}^k$, size parameter $2\leq s<k$, failure probability $\delta$

Output: Value $v$ such that $\mathsf{MDV}(u,s)\leq v\leq 4\,\mathsf{MDV}(u,s)$ with probability at least $1-\delta$

- 2: Initialize $v_1=0$, $v_2=\max_{i,j}\{C_u(i,j)\}=C_u(1,k)$
- 3: **for** $i\in[1,2,\ldots,\log k/2]$ **do**
  - 4: Sample and compute $2^{i+2}\cdot\log^2 k\cdot\log(\log k/\delta)$ elements of $C_u$ uniformly at random
  - 5: Let $v'$ be the median of the sampled elements which lie in $(v_1,v_2)$
  - 6: Run Algorithm [6](https://arxiv.org/html/2606.00289#alg6) on $u$ with variance parameter $v'$, which outputs a set of size $s'$
  - 7: **if** $s'>s$ **then** set $v_1=v'$
  - 8: **else** set $v_2=v'$
- 10: ▷ At this step, the number of elements of $C_u$ in $(v_1,v_2)$ is $O(k^{3/2})$ w.p. $1-\delta$ (Lemma [F.15](https://arxiv.org/html/2606.00289#A6.Thmlemma15))
- 11: Use Lemma [F.10](https://arxiv.org/html/2606.00289#A6.Thmlemma10) on $C_u$ with interval $I_1^*=(v_1,v_2)$ to compute indices $\ell_1,\ldots,\ell_k$ and $r_1,\ldots,r_k$
- 12: **for** $i\in[1,2,3,\ldots,\log k/2]$ **do**
  - 13: Use Lemma [F.11](https://arxiv.org/html/2606.00289#A6.Thmlemma11) to sample and compute $2^{i+4}\cdot\log^2 k\cdot\log(\log k/\delta)$ elements of $C_u$ with values in $I_1^*$
  - 14: Let $v'$ be the median of the sampled elements which lie in $(v_1,v_2)$
  - 15: Run Algorithm [6](https://arxiv.org/html/2606.00289#alg6) on $u$ with variance parameter $v'$, which outputs a set of size $s'$
  - 16: **if** $s'>s$ **then** set $v_1=v'$
  - 17: **else** set $v_2=v'$
- 19: ▷ At this step, the number of elements of $C_u$ in $(v_1,v_2)$ is $O(k)$ w.p. $1-\delta$ (Lemma [F.17](https://arxiv.org/html/2606.00289#A6.Thmlemma17))
- 20: Use Lemma [F.10](https://arxiv.org/html/2606.00289#A6.Thmlemma10) on $C_u$ with interval $I_2^*=(v_1,v_2)$ to compute indices $\ell_1,\ldots,\ell_k$ and $r_1,\ldots,r_k$
- 21: Set $\mathcal{S}$ to be the elements of $C_u$ in $(v_1,v_2)$ by computing $C_u[i,j]$ for all $i\in[k]$, $j\in[\ell_i,r_i]$
- 22: Add $v_2$ to $\mathcal{S}$
- 23: Sort $\mathcal{S}$, and binary search over $\mathcal{S}$ using Algorithm [6](https://arxiv.org/html/2606.00289#alg6) to find the smallest $v^*\in\mathcal{S}$ such that $\mathsf{MDV}^s(u,v^*)\leq s$
- **return** $v^*$

Algorithm 9: 4-Approximation Algorithm for $\mathsf{MDV}(u,s)$

###### Lemma F.15.

For each $i\in[0,1,\ldots,\log k/2]$, with probability at least $1-2i\delta/\log k$, at the end of the $i$th iteration of the first **for** loop of Algorithm [9](https://arxiv.org/html/2606.00289#alg9) there are at most $k^2\cdot(1/2+1/\log k)^i$ elements of $C_u$ which lie in the interval $(v_1,v_2)$.

###### Proof\.

We proceed by induction on $i$. The base case of $i=0$ is trivial, as $C_u$ contains only $k^2$ elements. So, fix some $i\geq 1$ and suppose the claim holds for $i-1$. Let $\boldsymbol{m}$ denote the number of elements of $C_u$ in $(v_1,v_2)$ at the end of iteration $i-1$, so by the inductive hypothesis, with probability at least $1-2(i-1)\delta/\log k$, $\boldsymbol{m}\leq k^2(1/2+1/\log k)^{i-1}$.

If $\boldsymbol{m}\leq k^2(1/2+1/\log k)^i$, then since iteration $i$ can only shrink the interval $(v_1,v_2)$, the count at the end of the iteration is at most $\boldsymbol{m}\leq k^2(1/2+1/\log k)^i$ and the claim holds. It therefore suffices to consider the case $\boldsymbol{m}>k^2(1/2+1/\log k)^i$.

In iteration $i$, Algorithm [9](https://arxiv.org/html/2606.00289#alg9) draws $T_i=2^{i+2}L$ entries uniformly at random from all $k(k-1)/2\leq k^2$ entries of $C_u$, where $L=\log^2 k\log(\log k/\delta)$. Let $\boldsymbol{Y}$ be the random variable counting the number of these entries falling in $(v_1,v_2)$. Since $\boldsymbol{m}/k^2>(1/2+1/\log k)^i$,

$$\mathbb{E}[\boldsymbol{Y}]=T_i\cdot\frac{\boldsymbol{m}}{k^2}>2^{i+2}L\cdot\left(\frac{1}{2}+\frac{1}{\log k}\right)^i=4\left(1+\frac{2}{\log k}\right)^iL\geq 4L.$$

By the Chernoff bound,

$$\Pr\left[\boldsymbol{Y}<\mathbb{E}[\boldsymbol{Y}]/4\right]\leq\exp\left(-\frac{9\,\mathbb{E}[\boldsymbol{Y}]}{32}\right)\leq\exp\left(-\frac{9L}{8}\right)<\frac{\delta}{\log k}.$$

Thus, $\boldsymbol{Y}\geq L$ with probability at least $1-\delta/\log k$.

Among the $\boldsymbol{Y}$ sampled entries in $(v_1, v_2)$, Algorithm [9](https://arxiv.org/html/2606.00289#alg9) takes their sample median $\boldsymbol{v}'$. Let $\boldsymbol{R}_1$ denote the number of elements of $C_u$ in $(v_1, \boldsymbol{v}')$ and let $\boldsymbol{R}_2$ be the number of elements of $C_u$ in $(\boldsymbol{v}', v_2)$. We will show that, conditioned on $\boldsymbol{Y} \geq L$, with probability at least $1 - \delta/\log k$, both $\boldsymbol{R}_1$ and $\boldsymbol{R}_2$ are at most $(1/2 + 1/\log k)\boldsymbol{m}$.

Let $r$ be the largest entry of $C_u$ with rank less than $(1/2 - 1/\log k)\boldsymbol{m}$. Then, by definition, each of the $\boldsymbol{Y} \geq L$ sampled entries in $(v_1, v_2)$ is less than or equal to $r$ with probability at most $1/2 - 1/\log k$. Since $\boldsymbol{v}'$ is the sample median, it follows that $\boldsymbol{R}_2 \geq (1/2 + 1/\log k)\boldsymbol{m}$ if and only if at least half the sampled elements are less than or equal to $r$. So, by Hoeffding's inequality,

$$\mathbf{Pr}\left[\boldsymbol{R}_2 > (1/2 + 1/\log k)\boldsymbol{m} \mid \boldsymbol{Y} \geq L\right] \leq e^{-2L/\log^2 k} \leq \frac{\delta}{2\log k}.$$

An analogous argument shows that $\mathbf{Pr}[\boldsymbol{R}_1 > (1/2 + 1/\log k)\boldsymbol{m} \mid \boldsymbol{Y} \geq L] \leq \delta/(2\log k)$. Since $\boldsymbol{Y} \geq L$ with probability at least $1 - \delta/\log k$, a union bound gives

$$\mathbf{Pr}\bigl[\boldsymbol{R}_1 \leq (1/2 + 1/\log k)\boldsymbol{m} \text{ and } \boldsymbol{R}_2 \leq (1/2 + 1/\log k)\boldsymbol{m}\bigr] \geq 1 - 2\delta/\log k.$$

At the end of iteration $i$, the algorithm updates either $v_1$ or $v_2$ to $\boldsymbol{v}'$, so the number of points remaining is either $\boldsymbol{R}_1$ or $\boldsymbol{R}_2$. With probability at least $1 - 2\delta/\log k$, both $\boldsymbol{R}_1$ and $\boldsymbol{R}_2$ are at most $(1/2 + 1/\log k)\boldsymbol{m}$, and by the inductive hypothesis, $\boldsymbol{m} \leq k^2(1/2 + 1/\log k)^{i-1}$ with probability at least $1 - 2(i-1)\delta/\log k$. So, by the union bound, the number of elements remaining in the interval at the end of iteration $i$ is at most $k^2(1/2 + 1/\log k)^i$ with probability at least $1 - 2(i-1)\delta/\log k - 2\delta/\log k = 1 - 2i\delta/\log k$. ∎
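The two tail bounds used in this proof can be checked numerically. The sketch below is only an illustration under stated assumptions: it takes $\log k$ to be base 2 and the outer logarithm in $L$ to be natural (the text does not fix the bases), and the helper names `sample_budget` and `tail_bounds` are hypothetical.

```python
import math

def sample_budget(k, delta, i):
    """Return (L, T_i) with L = log^2(k) * log(log(k)/delta) and T_i = 2^(i+2) * L.

    Assumption: log k is base 2 and the outer log is natural.
    """
    log_k = math.log2(k)
    L = log_k ** 2 * math.log(log_k / delta)
    return L, 2 ** (i + 2) * L

def tail_bounds(k, delta):
    """Return (Chernoff tail, Hoeffding tail, delta / log k) for the given k and delta."""
    log_k = math.log2(k)
    L, _ = sample_budget(k, delta, 1)
    chernoff = math.exp(-9 * L / 8)              # bound on Pr[Y < E[Y]/4]
    hoeffding = math.exp(-2 * L / log_k ** 2)    # bound on Pr[R_2 too large | Y >= L]
    return chernoff, hoeffding, delta / log_k

for k in (2 ** 6, 2 ** 10, 2 ** 20):
    for delta in (0.5, 0.1, 0.01):
        chernoff, hoeffding, target = tail_bounds(k, delta)
        print(k, delta, chernoff < target, hoeffding <= target / 2)
```

With these conventions the Hoeffding term equals $(\delta/\log k)^2$, which is at most $\delta/(2\log k)$ whenever $\delta/\log k \leq 1/2$.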

Lemma [F\.15](https://arxiv.org/html/2606.00289#A6.Thmlemma15) then immediately gives the following corollary.

###### Corollary F\.16\.

At the end of the first for loop of Algorithm [9](https://arxiv.org/html/2606.00289#alg9), with probability at least $1 - \delta$, there are at most $3k^{3/2}$ elements of $C_u$ in the range $(v_1, v_2)$.

###### Proof\.

The first for loop runs for $\log k/2$ iterations. Applying Lemma [F\.15](https://arxiv.org/html/2606.00289#A6.Thmlemma15) at $i = \log k/2$, we have that with probability at least $1 - \delta$, the number of elements of $C_u$ in $(v_1, v_2)$ is at most

$$k^2 \cdot \left(\frac{1}{2} + \frac{1}{\log k}\right)^{\log k/2} = k^2 \cdot \left(\frac{1}{2}\right)^{\log k/2} \cdot \left(1 + \frac{1}{\log k/2}\right)^{\log k/2} \leq 3k^{3/2}$$

for sufficiently large $k$. ∎
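The final inequality can be sanity-checked numerically. The sketch below assumes $\log$ is base 2 (so that $(1/2)^{\log k/2} = k^{-1/2}$); the function name `shrink_ratio` is hypothetical. It computes the bound divided by $k^{3/2}$, which equals $(1 + 2/\log k)^{\log k/2}$ and therefore stays below $e < 3$.

```python
import math

def shrink_ratio(k):
    """k^2 * (1/2 + 1/log k)^(log k / 2) divided by k^(3/2), with log base 2."""
    log_k = math.log2(k)
    return k ** 2 * (0.5 + 1 / log_k) ** (log_k / 2) / k ** 1.5

# Ratios for k = 2^j; each should be below e, hence below the constant 3.
ratios = {j: shrink_ratio(2 ** j) for j in (2, 4, 8, 16, 32, 64)}
for j, r in ratios.items():
    print(f"k = 2^{j}: ratio = {r:.4f}")
```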

###### Lemma F\.17\.

Let $\mathcal{S}$ be the set constructed in Line [9](https://arxiv.org/html/2606.00289#alg9) of Algorithm [9](https://arxiv.org/html/2606.00289#alg9). Then, with probability at least $1 - 2\delta$, $|\mathcal{S}| = O(k)$.

###### Proof\.

By Corollary [F\.16](https://arxiv.org/html/2606.00289#A6.Thmlemma16), after the end of the first for loop, with probability at least $1 - \delta$, there are at most $3k^{3/2}$ elements of $C_u$ which lie in $(v_1, v_2)$. The second for loop then samples only from these elements. So, an argument analogous to the proofs of Lemma [F\.15](https://arxiv.org/html/2606.00289#A6.Thmlemma15) and Corollary [F\.16](https://arxiv.org/html/2606.00289#A6.Thmlemma16), but replacing the $k^2$ total elements of $C_u$ with the $3k^{3/2}$ elements in $(v_1, v_2)$, gives that there are at most $3k^{3/2} \cdot 3k^{-1/2} = O(k)$ elements of $\mathcal{S}$ with probability at least $1 - \delta$. A union bound then gives the result. ∎

###### Lemma F\.18\.

Algorithm [9](https://arxiv.org/html/2606.00289#alg9) runs in time

$$O\left(k\log k + \sqrt{k} \cdot \log^3 k \cdot \log(\log k/\delta)\right)$$

with probability at least $1 - 2\delta$.

###### Proof\.

Throughout the two for loops, there are $\log k$ iterations, each of which requires $T = O(2^{\log k/2} \cdot \log^2 k \cdot \log(\log k/\delta)) = O(\sqrt{k} \cdot \log^2 k \cdot \log(\log k/\delta))$ samples, so the algorithm queries $T$ elements of $C$. Each element of $C$ can be computed in time $O(\log k)$, so the total time spent computing elements of $C$ is $O(T\log k)$. There are also two calls to the algorithm of Lemma [F\.10](https://arxiv.org/html/2606.00289#A6.Thmlemma10), each of which involves $O(k)$ probes to $C$ and $O(k)$ additional time, so the total time spent on these calls is $O(k\log k)$.

By Lemma [F\.17](https://arxiv.org/html/2606.00289#A6.Thmlemma17), with probability at least $1 - 2\delta$, $|\mathcal{S}| = O(k)$. So, sorting and computing the values of $\mathcal{S}$ takes time $O(k\log k)$. Finally, the binary search of Line [9](https://arxiv.org/html/2606.00289#alg9) takes time $O(k\log k)$, as Algorithm [6](https://arxiv.org/html/2606.00289#alg6) has runtime $O(s\log(k/s)) = O(k)$.

Combining all the steps, the total time is $O(k\log k + T\log k) = O(k\log k + \sqrt{k} \cdot \log^3 k \cdot \log(\log k/\delta))$. ∎

###### Lemma F\.19\.

Algorithm [9](https://arxiv.org/html/2606.00289#alg9) returns a value $v$ such that $\operatorname{\mathsf{MDV}}(u,s) \leq v \leq 4\operatorname{\mathsf{MDV}}(u,s)$.

###### Proof\.

Let $v^*$ be the smallest element of $C_u$ for which $\operatorname{\mathsf{MDV}}^{s}(u,v^*) \leq s$. We claim that at all times throughout the execution of the algorithm, $v^* \in (v_1, v_2]$.

Initially, $v_1 = 0 < v^*$ (as $s < k$) and $v_2 = C_u(1,k) \geq v^*$ (since $C_u(1,k)$ is the maximum variance when including only the two endpoints $u_1, u_k$). At every step, the algorithm updates $v_1$ to a value $v'$ only if $\operatorname{\mathsf{MDV}}^{s}(u,v') > s$, and thus it must be that $v^* > v'$, as $\operatorname{\mathsf{MDV}}^{s}(u,v^*) \leq s$ by construction. Similarly, the algorithm only updates $v_2$ to a value $v'$ if $\operatorname{\mathsf{MDV}}^{s}(u,v') \leq s$. Since the algorithm only ever considers updates to $v_2$ of the form $v' = C_u(i,j)$ for some $i,j \in [k]$, and $v^*$ is the minimum such value for which $\operatorname{\mathsf{MDV}}^{s}(u,v') \leq s$, it follows that $v^* \leq v_2$ as well.

The algorithm constructs in Lines [9](https://arxiv.org/html/2606.00289#alg9) and [9](https://arxiv.org/html/2606.00289#alg9) the set $\mathcal{S}$ to be all values of $C_u$ in $(v_1, v_2]$ (with the values $v_1, v_2$ as set in the prior for loops). Since $v^* \in (v_1, v_2]$ and there exist $i,j$ such that $v^* = C_u(i,j)$ by construction, it follows that $v^* \in \mathcal{S}$. Moreover, by definition, $v^*$ must be the smallest element of $\mathcal{S}$ for which $\operatorname{\mathsf{MDV}}^{s}(u,v^*) \leq s$, and thus the algorithm returns $v^*$. Since $\operatorname{\mathsf{MDV}}^{s}(u,v^*) \leq s$, $v^* \geq \operatorname{\mathsf{MDV}}(u,s)$. Finally, by Lemma [F\.9](https://arxiv.org/html/2606.00289#A6.Thmlemma9), $v^* \leq 4\operatorname{\mathsf{MDV}}(u,s)$, completing the proof. ∎

### F\.6 Proving Theorem [3\.3](https://arxiv.org/html/2606.00289#S3.SS3)

The algorithm of Theorem [3\.3](https://arxiv.org/html/2606.00289#S3.SS3) now follows from combining a fast $k$-center algorithm, the 4-approximation of Algorithm [9](https://arxiv.org/html/2606.00289#alg9), and a small modification to the $(1+\varepsilon)$-approximation for $\operatorname{\mathsf{MDV}}^{s}$ from Algorithm [8](https://arxiv.org/html/2606.00289#alg8).

**Input:** $w \in \mathbb{R}^d$, size parameter $2 \leq s < d$, approximation parameter $\varepsilon$, failure probability $\delta$

**Output:** Value $v$ such that $\operatorname{\mathsf{MDV}}(w,s) \leq v \leq (1+\varepsilon)\operatorname{\mathsf{MDV}}(w,s)$

1: Let $C$ be a 2-approximate $s$-center solution over $w$, computed using s-Center-Clustering of \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\]

2: Let $I_c$ be the points of $w$ closest to $c \in C$. Compute $E = \bigcup_{c \in C}\{\min I_c, \max I_c\}$

3: Sort $E$ and run Algorithm [9](https://arxiv.org/html/2606.00289#alg9) on $E$ with size parameter $s$ and error parameter $\delta$ to get an estimate $v_{\textsf{apx}}$

4: Run Algorithm [6](https://arxiv.org/html/2606.00289#alg6) on $E$ with $v_{\textsf{apx}}$ to get a set $Q_{\textsf{apx}}$, and compute $v' = \operatorname{\mathsf{MDV}}(w, Q_{\textsf{apx}})$

5: **for** all $c \in C$ **do**

6: Compute $a_c = \min(I_c)$, $b_c = \max(I_c)$, and $D_c = b_c - a_c$

7: Compute subintervals $I_c^j = [a_c + D_c(j-1)\sqrt{\varepsilon}/4,\; a_c + D_c j\sqrt{\varepsilon}/4]$ for each $j \leq \lceil 4/\sqrt{\varepsilon} \rceil$

8: **end for**

9: Let $E_{\textsf{sub}} = \bigcup_{c \in C,\, j \leq \lceil 4/\sqrt{\varepsilon} \rceil}\{\min I_c^j, \max I_c^j\}$

10: Sort $E_{\textsf{sub}}$

11: Use binary search and Algorithm [6](https://arxiv.org/html/2606.00289#alg6) to find the smallest $i^* \in \mathbb{Z}$ for which $v_i = (1+\varepsilon/2)^i \in [v'/9,\; v'(1+\varepsilon/2)]$ and $\operatorname{\mathsf{MDV}}^{s}(E_{\textsf{sub}}, v_i) \leq s$

**Return** $v_{i^*} \cdot (1+\varepsilon/2)$
Algorithm 10: $(1+\varepsilon)$-Approximation Algorithm for $\operatorname{\mathsf{MDV}}(w,s)$

While the algorithm as written returns the value of an approximate solution, constructing a set with at most this value is straightforward: instead of returning $v_{i^*}(1+\varepsilon/2)$, call Algorithm [6](https://arxiv.org/html/2606.00289#alg6) on $E_{\textsf{sub}}$ with $v_{i^*}$ to obtain a set $Q'$ and return $Q'$.

###### Lemma F\.20\.

Algorithm [10](https://arxiv.org/html/2606.00289#alg10) runs in time

$$O\left(d\log(s/\varepsilon) + \sqrt{s} \cdot \log^3(s) \cdot \log(1/\delta)\right)$$

with probability at least $1 - \delta$.

###### Proof\.

Computing $C$ and all $I_c$ takes time $O(d\log s)$, from the s-Center-Clustering algorithm of \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\], and thus it takes time $O(d\log s)$ to compute the set $E$. We have $|E| \leq 2s$, as $|C| = s$, and so by Lemma [F\.18](https://arxiv.org/html/2606.00289#A6.Thmlemma18), sorting $E$ and running Algorithm [9](https://arxiv.org/html/2606.00289#alg9) takes time $O(s\log s + s^{0.51}\log(1/\delta))$ with probability $1 - \delta$. Computing the set $Q_{\textsf{apx}}$ takes time $O(s)$ using Algorithm [6](https://arxiv.org/html/2606.00289#alg6), and computing $v' = \operatorname{\mathsf{MDV}}(w, Q_{\textsf{apx}})$ takes time $O(d\log s)$ as $|Q_{\textsf{apx}}| = s$. Thus, the total runtime before the for loop is $O(d\log s + s^{0.51}\log(1/\delta))$.

It takes time $O(d\log(s/\varepsilon))$ to compute the subintervals $I_c^j$, and the same time to compute $E_{\textsf{sub}}$. Note that $|E_{\textsf{sub}}| = O(\min(d, s/\sqrt{\varepsilon}))$, as $E_{\textsf{sub}} \subseteq w$ and each of the $O(s/\sqrt{\varepsilon})$ subintervals contributes at most 2 points to $E_{\textsf{sub}}$. So, sorting $E_{\textsf{sub}}$ takes time $O(|E_{\textsf{sub}}|\log|E_{\textsf{sub}}|) = O(d\log(s/\varepsilon))$.

Finally, the binary search of Line [10](https://arxiv.org/html/2606.00289#alg10) considers $O(\log(1/\varepsilon))$ options $v_i$, and on each runs Algorithm [6](https://arxiv.org/html/2606.00289#alg6). Each call to Algorithm [6](https://arxiv.org/html/2606.00289#alg6) takes time $O(s\log(|E_{\textsf{sub}}|/s)) = O(d)$, and so the total runtime of Line [10](https://arxiv.org/html/2606.00289#alg10) is $O(d\log(1/\varepsilon))$.

Combining all the steps, the total runtime is thus

$$O\left(d\log s + s^{0.51}\log(1/\delta) + d\log(s/\varepsilon) + d\log(1/\varepsilon)\right) = O\left(d\log(s/\varepsilon) + s^{0.51}\log(1/\delta)\right). \qquad ∎$$

###### Lemma F\.21\.

Algorithm [10](https://arxiv.org/html/2606.00289#alg10) returns a value $v$ such that $\operatorname{\mathsf{MDV}}(w,s) \leq v \leq (1+\varepsilon)\operatorname{\mathsf{MDV}}(w,s)$.

###### Proof\.

Let $v^* = \operatorname{\mathsf{MDV}}(w,s)$, and let $Q^* \subset \mathbb{R}$ be a set with $\operatorname{\mathsf{MDV}}(w,Q^*) = v^*$. Since $\operatorname{\mathsf{VarSSQ}}(w_i,Q^*) \leq v^*$ for all $i \in [d]$, it follows that for each $w_i$, there exists an element $q \in Q^*$ such that $|q - w_i| \leq \sqrt{v^*}$. Thus, the optimal $s$-center instance has cost at most $\sqrt{v^*}$, and the s-Center-Clustering algorithm of \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\] returns a set $C$ such that for all $i \in [d]$, there exists a $c \in C$ with $|c - w_i| \leq 2\sqrt{v^*}$. For each $c \in C$, it then follows that the interval $I_c$ has length at most $4\sqrt{v^*}$.

Let $E$ be constructed as in Line [10](https://arxiv.org/html/2606.00289#alg10) of Algorithm [10](https://arxiv.org/html/2606.00289#alg10), and $Q_{\textsf{apx}}$ as computed in Line [10](https://arxiv.org/html/2606.00289#alg10). $E$ consists of the endpoints of a set of intervals which cover $w$ and which each have length at most $4\sqrt{v^*}$. So, by Lemma [F\.5](https://arxiv.org/html/2606.00289#A6.Thmlemma5), $v' = \operatorname{\mathsf{MDV}}(w,Q_{\textsf{apx}}) \leq \operatorname{\mathsf{MDV}}(E,Q_{\textsf{apx}}) + 4v^*$. Moreover, by the guarantees of Algorithm [9](https://arxiv.org/html/2606.00289#alg9) (Lemma [F\.19](https://arxiv.org/html/2606.00289#A6.Thmlemma19)), $\operatorname{\mathsf{MDV}}(E,Q_{\textsf{apx}}) \leq 4\operatorname{\mathsf{MDV}}(E,s)$. Thus,

$$v' \leq \operatorname{\mathsf{MDV}}(E,Q_{\textsf{apx}}) + 4v^* \leq 4\operatorname{\mathsf{MDV}}(E,s) + 4v^* \leq 8v^*$$

where the final inequality follows from $E \subseteq w$ and so $\operatorname{\mathsf{MDV}}(E,s) \leq \operatorname{\mathsf{MDV}}(w,s) = v^*$. By definition of $v^*$, $v' = \operatorname{\mathsf{MDV}}(w,Q_{\textsf{apx}}) \geq v^*$ and we have that

$$v' \in [v^*, 8v^*].$$

Define the intervals $I_c^j$ and set $E_{\textsf{sub}}$ as in Algorithm [10](https://arxiv.org/html/2606.00289#alg10). Each interval $I_c$ has length at most $4\sqrt{v^*}$, and thus, by construction, each $I_c^j$ has length at most $4\sqrt{v^*\varepsilon}/4 = \sqrt{v^*\varepsilon}$. As $E_{\textsf{sub}}$ is the union of the endpoints of the $I_c^j$, it thus follows that

$$v^* = \operatorname{\mathsf{MDV}}(w,s) \leq \operatorname{\mathsf{MDV}}(E_{\textsf{sub}},s) + \varepsilon v^*/4 \quad\text{and so}\quad \operatorname{\mathsf{MDV}}(E_{\textsf{sub}},s) \geq (1-\varepsilon/4)v^*.$$

Let $\nu = \operatorname{\mathsf{MDV}}(E_{\textsf{sub}},s)$; we have $\nu \leq v^* \leq v'$ (as $E_{\textsf{sub}} \subseteq w$) and $\nu \geq (1-\varepsilon/4)v^* > v'/9$. Thus, $\nu \in [v'/9, v']$.

Now, consider the $i$ such that $v_i = (1+\varepsilon/2)^i$ is in $[v'/9, v'(1+\varepsilon/2)]$ and $\nu \leq v_i < (1+\varepsilon/2)\nu$. Since $v_i \geq \nu$, it follows that $\operatorname{\mathsf{MDV}}^{s}(E_{\textsf{sub}},v_i) \leq s$, and moreover $v_i$ is the smallest such power of $(1+\varepsilon/2)$ by construction. As $\nu \in [v'/9, v']$, $v_i \in [v'/9, v'(1+\varepsilon/2)]$ and so the binary search of Line [10](https://arxiv.org/html/2606.00289#alg10) will return $v_i$.

Finally, the algorithm returns $v_i(1+\varepsilon/2)$. We have

$$v_i(1+\varepsilon/2) \geq \nu(1+\varepsilon/2) \geq v^*(1-\varepsilon/4)(1+\varepsilon/2) > v^*$$

as $\nu \geq (1-\varepsilon/4)v^*$. Moreover, as $\nu \leq v^*$,

$$v_i(1+\varepsilon/2) \leq \nu(1+\varepsilon/2)^2 \leq v^*(1+\varepsilon)$$

giving the desired bounds. ∎

Theorem [3\.3](https://arxiv.org/html/2606.00289#S3.SS3) then follows from Lemma [F\.21](https://arxiv.org/html/2606.00289#A6.Thmlemma21) and Lemma [F\.20](https://arxiv.org/html/2606.00289#A6.Thmlemma20).
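The last step of Algorithm 10 searches over powers of $(1+\varepsilon/2)$ inside the window $[v'/9, v'(1+\varepsilon/2)]$. The sketch below illustrates only that search. The function name `smallest_feasible_power` and the callback `feasible` are hypothetical: `feasible(v)` stands in for the test $\operatorname{\mathsf{MDV}}^{s}(E_{\textsf{sub}}, v) \leq s$ performed with Algorithm 6, which is not reproduced here, and it is assumed to be monotone in `v`.

```python
import math

def smallest_feasible_power(v_prime, eps, feasible):
    """Find the smallest integer i with base**i in [v_prime / 9, v_prime * base]
    and feasible(base**i) true, where base = 1 + eps / 2.

    Returns (i, base**i), or None if no candidate in the window is feasible.
    Assumes feasible is monotone: once true, it stays true for larger values.
    """
    base = 1 + eps / 2
    i_lo = math.ceil(math.log(v_prime / 9) / math.log(base))
    i_hi = math.floor(math.log(v_prime * base) / math.log(base))
    if i_lo > i_hi or not feasible(base ** i_hi):
        return None
    # Standard binary search over the O(1/eps) candidate exponents,
    # which takes O(log(1/eps)) feasibility checks.
    while i_lo < i_hi:
        i_mid = (i_lo + i_hi) // 2
        if feasible(base ** i_mid):
            i_hi = i_mid
        else:
            i_lo = i_mid + 1
    return i_lo, base ** i_lo

# Toy usage: pretend the true threshold nu is 2.5 and v' = 8.
nu = 2.5
i_star, v_star = smallest_feasible_power(8.0, 0.1, lambda v: v >= nu)
print(i_star, v_star)
```

In this toy run the returned power lies in $[\nu, (1+\varepsilon/2)\nu)$, which is the property the proof of Lemma F.21 relies on.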

### F\.7 Irrational Solutions

While we have presented efficient algorithms that achieve high-precision estimates of the optimal quantization set, it turns out that returning an optimal quantization set that minimizes the $\operatorname{\mathsf{MDV}}$ objective is impossible under a standard bit representation (i.e., floating point) of any precision. In particular, here we show a simple example where the vector $w \in \mathbb{Z}^d$, but $\operatorname{\mathsf{MDV}}(w,s)$ is irrational (and thus any optimal quantization set contains irrational values as well).

###### Lemma F\.22\.

Let $w = (0,2,5,8,10)$ and $s = 4$. Then, $\operatorname{\mathsf{MDV}}(w,s) = 8 - 2\sqrt{7}$.

###### Proof\.

The quantization set must contain elements 0 and 10 (the min and max of $w$), so the problem of finding an optimal quantization set reduces to finding the position of the remaining two elements. Let $Q^* = \{0, x, y, 10\}$ be an optimal quantization set, with $x < y$. Let $Q' = \{0, 3, 7, 10\}$; so $\operatorname{\mathsf{MDV}}(w,Q') = 4$ and thus $\operatorname{\mathsf{MDV}}(w,Q^*) \leq 4$. It then follows that $x \leq 5$ and $y \geq 5$: any set $Q$ with no elements in $(5,10)$ or none in $(0,5)$ has $\operatorname{\mathsf{MDV}}(w,Q) > 6$. Similarly, $x \geq 2$ and $y \leq 8$: if $x < 2$, the set $Q' = \{0, 2, y, 10\}$ would have $\operatorname{\mathsf{MDV}}(w,Q') < \operatorname{\mathsf{MDV}}(w,Q^*)$, and analogously for $y > 8$.

When $x \in [2,5]$ and $y \in [5,8]$, we have by definition

$$\operatorname{\mathsf{MDV}}(w,Q^*) = \max\left\{2(x-2),\; (5-x)(y-5),\; 2(8-y)\right\}.$$

Solving, we find the minimum occurs when $x = 6 - \sqrt{7}$, $y = 4 + \sqrt{7}$, and $\operatorname{\mathsf{MDV}}(w,Q^*) = 8 - 2\sqrt{7}$. ∎
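The example can be checked numerically. The sketch below assumes that $\operatorname{\mathsf{MDV}}(w,Q)$ is the maximum over $i$ of $(w_i^{\uparrow} - w_i)(w_i - w_i^{\downarrow})$, where $w_i^{\downarrow}$ and $w_i^{\uparrow}$ are the nearest elements of $Q$ below and above $w_i$; this reading is inferred from the three-term maximum in the proof, and the helper name `mdv` is hypothetical.

```python
import math
from bisect import bisect_left, bisect_right

def mdv(w, Q):
    """Max over points p of (up - p) * (p - down), with down/up the nearest
    elements of Q at or below / at or above p. Q must cover [min(w), max(w)]."""
    Q = sorted(Q)
    worst = 0.0
    for p in w:
        up = Q[bisect_left(Q, p)]          # smallest element of Q that is >= p
        down = Q[bisect_right(Q, p) - 1]   # largest element of Q that is <= p
        worst = max(worst, (up - p) * (p - down))
    return worst

w = [0, 2, 5, 8, 10]
opt = 8 - 2 * math.sqrt(7)
Q_prime = [0, 3, 7, 10]
Q_star = [0, 6 - math.sqrt(7), 4 + math.sqrt(7), 10]

# Coarse grid over x in [2, 5] and y in [5, 8]; no grid point should beat opt.
steps = 60
grid_min = min(
    mdv(w, [0, 2 + 3 * i / steps, 5 + 3 * j / steps, 10])
    for i in range(steps + 1)
    for j in range(steps + 1)
)
print(mdv(w, Q_prime), mdv(w, Q_star), opt, grid_min)
```

At $x = 6 - \sqrt{7}$, $y = 4 + \sqrt{7}$ all three terms of the maximum coincide, since $(5-x)(y-5) = (\sqrt{7}-1)^2 = 8 - 2\sqrt{7}$.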

### F\.8 Practical Algorithms

While the algorithm presented in this section has excellent theoretical guarantees, its complexity and its use of the s-Center-Clustering algorithm of \[[10](https://arxiv.org/html/2606.00289#bib.bib10)\] make it challenging to use in practice. As such, in this section we detail how we modify our algorithm to perform better on real-world systems while retaining theoretical guarantees, and provide performance and accuracy evaluations.

The full pseudocode of our algorithm can be found in Algorithm [11](https://arxiv.org/html/2606.00289#alg11). Algorithm [11](https://arxiv.org/html/2606.00289#alg11) is still guaranteed to output a $(1+\varepsilon)$-approximate solution.

**Input:** $w \in \mathbb{R}^d$, size parameter $s \in \mathbb{N}$, number of initial intervals $m$, approximation parameter $\varepsilon > 0$

**Output:** Set $Q$ of size $s$

2: Let $a = \min(w)$, $b = \max(w)$

3: Initialize $\mathcal{I}$ as the $m-1$ equal intervals $\left[a + \frac{(b-a)(j-1)}{m-1},\; a + \frac{(b-a)j}{m-1}\right]$ for $j = 1, \ldots, m-1$

4: **for** each $I \in \mathcal{I}$ with $I \cap w \neq \emptyset$ **do**

5: Add $\min\{w_i : w_i \in I\}$ and $\max\{w_i : w_i \in I\}$ to $x$

**end for**

6: Sort $x$; set $\texttt{lo} = 0$ and $\texttt{hi} = (b-a)^2/4$

7: **while** $\texttt{hi} - \texttt{lo} > \varepsilon \cdot \texttt{hi}$ **do**

8: **if** there exists some interval $I \in \mathcal{I}$ with $\max\{w_i : w_i \in I\} - \min\{w_i : w_i \in I\} > \sqrt{\varepsilon \cdot \texttt{hi}}$ **then**

9: Set $\mathcal{I}$ to be the $m' = \left\lceil (b-a)/\sqrt{\varepsilon \cdot \texttt{hi}} \right\rceil$ equal-sized intervals of $[a,b]$.

10: For each $I \in \mathcal{I}$ with $I \cap w \neq \emptyset$, add $\min\{w_i : w_i \in I\}$ and $\max\{w_i : w_i \in I\}$ to $x$.

11: Sort $x$.

**end if**

12: Set $\texttt{mid} = (\texttt{lo} + \texttt{hi})/2$

13: Run Algorithm [6](https://arxiv.org/html/2606.00289#alg6) on $x$ with variance parameter $\texttt{mid}$; let $s'$ be the size of the returned set

14: **if** $s' > s$ **then** set $\texttt{lo} = \texttt{mid}$;

15: **else** set $\texttt{hi} = \texttt{mid}$;

16: **end while**

**return** the set $Q$ constructed in the last call to Algorithm [6](https://arxiv.org/html/2606.00289#alg6) which updated $\texttt{hi}$.

Algorithm 11: Practical Algorithm for $\operatorname{\mathsf{MDV}}(w,s)$

###### Lemma F\.23\.

Let $Q$ be the set returned by Algorithm [11](https://arxiv.org/html/2606.00289#alg11) when run on vector $w \in \mathbb{R}^d$ and $s \in \mathbb{N}$. Then,

$$\operatorname{\mathsf{MDV}}(w,Q) \leq (1 + O(\varepsilon))\operatorname{\mathsf{MDV}}(w,s).$$

###### Proof\.

We first claim that throughout the execution of the algorithm, $\texttt{lo} < \operatorname{\mathsf{MDV}}(w,s) \leq (1+\varepsilon)\texttt{hi}$. Initially, $\texttt{lo} = 0 < \operatorname{\mathsf{MDV}}(w,s)$. In any iteration of the while loop, $\texttt{lo}$ is increased to a value $v$ if and only if $\operatorname{\mathsf{MDV}}(x,s) > v$; since $x \subseteq w$, it follows that $\operatorname{\mathsf{MDV}}(w,s) \geq \operatorname{\mathsf{MDV}}(x,s) > \texttt{lo}$ as well.

For the other direction, by construction $\operatorname{\mathsf{MDV}}(w,s) \leq \texttt{hi}$ at the beginning of the algorithm. At any other iteration, $\texttt{hi}$ is updated to $\texttt{mid}$ if and only if $\operatorname{\mathsf{MDV}}(x,s) \leq \texttt{mid}$, where $x$ is the set of endpoints of the intervals $\mathcal{I}$. By construction, each interval $I \in \mathcal{I}$ has length at most $\sqrt{\varepsilon \cdot \texttt{hi}} \leq \sqrt{\varepsilon \cdot 2\texttt{mid}} < 2\sqrt{\varepsilon \cdot \texttt{mid}}$, as $\texttt{mid} = (\texttt{lo} + \texttt{hi})/2 \geq \texttt{hi}/2$. Thus, applying Lemma [F\.5](https://arxiv.org/html/2606.00289#A6.Thmlemma5),

$$\operatorname{\mathsf{MDV}}(w,s) \leq \operatorname{\mathsf{MDV}}(x,s) + \varepsilon \cdot \texttt{mid} \leq \texttt{mid} + \varepsilon \cdot \texttt{mid} = (1+\varepsilon)\texttt{mid}$$

as we have $\operatorname{\mathsf{MDV}}(x,s) \leq \texttt{mid}$. Thus, when updating $\texttt{hi}$ to $\texttt{mid}$, we maintain that $\operatorname{\mathsf{MDV}}(w,s) \leq (1+\varepsilon)\texttt{hi}$.

At the end of the algorithm, $\texttt{lo} \geq (1-\varepsilon)\texttt{hi}$, due to the termination condition of the while loop, and so $(1-\varepsilon)\texttt{hi} \leq \operatorname{\mathsf{MDV}}(w,s) \leq (1+\varepsilon)\texttt{hi}$. Moreover, the algorithm returns a set $Q$ such that $\operatorname{\mathsf{MDV}}(x,Q) \leq \texttt{hi}$, where $x$ is the set of endpoints of a collection of intervals of length at most $\sqrt{\varepsilon \cdot \texttt{hi}}$, so applying Lemma [F\.5](https://arxiv.org/html/2606.00289#A6.Thmlemma5) again,

$$\operatorname{\mathsf{MDV}}(w,Q) \leq \operatorname{\mathsf{MDV}}(x,Q) + \varepsilon \cdot \texttt{hi}/4 \leq (1+\varepsilon)\texttt{hi} \leq \frac{1+\varepsilon}{1-\varepsilon}\operatorname{\mathsf{MDV}}(w,s) = (1 + O(\varepsilon))\operatorname{\mathsf{MDV}}(w,s),$$

where the third inequality uses $\texttt{hi} \leq \operatorname{\mathsf{MDV}}(w,s)/(1-\varepsilon)$ and the final equality holds for $\varepsilon \leq 1/2$, as desired. ∎
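The binary search of Algorithm 11 can be sketched in a few dozen lines. This is a minimal sketch under stated assumptions, not the authors' implementation: `greedy_levels` is a hypothetical stand-in for Algorithm 6 (which is not reproduced in this section) that greedily pushes each level as far right as the variance budget allows; `bucket_endpoints`, `practical_mdv`, and `mdv` are likewise illustrative names; the initial fallback set $\{a, b\}$ is an added choice for the case where $\texttt{hi}$ is never lowered; and the sketch assumes $s \geq 2$ and that $w$ has more than $s$ distinct values.

```python
import math
from bisect import bisect_left, bisect_right

def mdv(w, Q):
    """Assumed objective: max over p in w of (up - p) * (p - down)."""
    Q = sorted(Q)
    return max((Q[bisect_left(Q, p)] - p) * (p - Q[bisect_right(Q, p) - 1]) for p in w)

def greedy_levels(x, v):
    """Hypothetical stand-in for Algorithm 6: given sorted x, return levels Q
    (starting at x[0], ending at x[-1]) with (up - p) * (p - down) <= v for all p."""
    Q, i, n = [x[0]], 1, len(x)
    while i < n:
        prev, bound = Q[-1], math.inf
        # Consume points while the next level can still be placed at or beyond them.
        while i < n and x[i] <= bound:
            if x[i] > prev:
                bound = min(bound, x[i] + v / (x[i] - prev))
            i += 1
        nxt = x[-1] if i == n else bound
        if nxt > prev:
            Q.append(nxt)
    return Q

def bucket_endpoints(w, a, b, m):
    """Min and max of w inside each of m equal-width buckets of [a, b],
    plus the largest within-bucket spread."""
    if b == a:
        return [a], 0.0
    lo, hi = {}, {}
    for p in w:
        j = min(int((p - a) / (b - a) * m), m - 1)
        lo[j] = min(lo.get(j, p), p)
        hi[j] = max(hi.get(j, p), p)
    pts = sorted(set(lo.values()) | set(hi.values()))
    return pts, max(hi[j] - lo[j] for j in lo)

def practical_mdv(w, s, m=64, eps=0.01):
    a, b = min(w), max(w)
    x, spread = bucket_endpoints(w, a, b, max(m - 1, 1))
    lo, hi = 0.0, (b - a) ** 2 / 4
    best = [a, b] if b > a else [a]  # assumption: fallback if hi is never lowered
    while hi - lo > eps * hi:
        if spread > math.sqrt(eps * hi):
            # Refine the buckets so every bucket has width at most sqrt(eps * hi).
            m2 = math.ceil((b - a) / math.sqrt(eps * hi))
            new_x, spread = bucket_endpoints(w, a, b, m2)
            x = sorted(set(x) | set(new_x))
        mid = (lo + hi) / 2
        Q = greedy_levels(x, mid)
        if len(Q) > s:
            lo = mid
        else:
            hi, best = mid, Q
    return best

w = [0, 2, 5, 8, 10]
Q = practical_mdv(w, s=4, m=8, eps=0.01)
print(Q, mdv(w, Q), 8 - 2 * math.sqrt(7))
```

On the instance of Lemma F.22 the returned set has four levels and its objective value is within a $1/(1-\varepsilon)$ factor of $8 - 2\sqrt{7}$, since here every point of $w$ falls in its own bucket and $x = w$.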

## Appendix G Deferred Proofs and Figures

### G\.1 Deferred Proofs from Appendix [D](https://arxiv.org/html/2606.00289#A4)

###### Proof of Lemma [D\.1](https://arxiv.org/html/2606.00289#A4.Thmlemma1)\.

First, note that the Adaptive Stochastic Quantization distribution is only well-defined if both $w_1^{\downarrow}(Q)$ and $w_d^{\uparrow}(Q)$ exist. We show that if $\{w_1^{\downarrow}(Q), w_d^{\uparrow}(Q)\} \neq \{w_1, w_d\}$, then $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q) > \operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,s)$. First, suppose that $w_1^{\downarrow}(Q) \neq w_1$. Then, consider the quantization set $Q' := (Q \setminus \{w_1^{\downarrow}(Q)\}) \cup \{w_1\}$ and observe that

$$\begin{aligned}
\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q') &= \sum_{i=1}^{d} \lambda_i (w_i^{\uparrow}(Q') - w_i)(w_i - w_i^{\downarrow}(Q')) \\
&= \sum_{i=2}^{d} \lambda_i (w_i^{\uparrow}(Q') - w_i)(w_i - w_i^{\downarrow}(Q')) && (w_1^{\downarrow}(Q') = w_1) \\
&\leq \sum_{i=2}^{d} \lambda_i (w_i^{\uparrow}(Q) - w_i)(w_i - w_i^{\downarrow}(Q)) \\
&\leq \operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)
\end{aligned}$$

where the first inequality follows because $w_i^{\uparrow}(Q) = w_i^{\uparrow}(Q')$ and $w_i^{\downarrow}(Q') \geq w_i^{\downarrow}(Q)$. A symmetric argument can be used to show that $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q) > \operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,s)$ if $w_d^{\uparrow}(Q) \neq w_d$. Thus, any quantization set not containing $w_1$ and $w_d$ must be suboptimal with respect to $\operatorname{\mathsf{ADV}}_{\mathcal{X}}$. ∎
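A small worked instance of the exchange step may help. The sketch below evaluates the sum $\sum_i \lambda_i (w_i^{\uparrow}(Q) - w_i)(w_i - w_i^{\downarrow}(Q))$ used in the proof; the function name `adv`, the toy vector, and the uniform weights $\lambda_i = 1$ are illustrative assumptions rather than values from the paper.

```python
from bisect import bisect_left, bisect_right

def adv(w, lam, Q):
    """Sum over i of lam[i] * (up_i - w_i) * (w_i - down_i), where down_i / up_i
    are the nearest elements of Q at or below / at or above w_i."""
    Q = sorted(Q)
    total = 0.0
    for p, weight in zip(w, lam):
        up = Q[bisect_left(Q, p)]
        down = Q[bisect_right(Q, p) - 1]
        total += weight * (up - p) * (p - down)
    return total

w = [1.0, 2.0, 4.0]
lam = [1.0, 1.0, 1.0]      # hypothetical uniform weights
Q = [0.0, 5.0]             # lowest level lies strictly below w_1
Q_prime = [1.0, 5.0]       # exchange step: replace it with w_1

print(adv(w, lam, Q), adv(w, lam, Q_prime))
```

Moving the lowest level up to $w_1$ zeroes the first term and shrinks every factor $(w_i - w_i^{\downarrow})$, which is the effect the inequality chain captures.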

###### Proof of Lemma [D\.2](https://arxiv.org/html/2606.00289#A4.Thmlemma2)\.

Consider an optimal quantization set $Q = \{q_1, \ldots, q_s\} \not\subseteq w$ such that $q_1 \leq \ldots \leq q_s$ and $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q) = \operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,s)$. The argument proceeds by iteratively transforming $Q$, while leaving its objective value unchanged.

Consider $q_i \in Q$ such that $q_i \notin w$. Define $q_i^{\downarrow} := \max\{w_j : w_j \leq q_i\}$ and analogously $q_i^{\uparrow} := \min\{w_j : w_j \geq q_i\}$. We first show that $q_{i-1} < q_i^{\downarrow}$ and $q_{i+1} > q_i^{\uparrow}$. Suppose that $q_{i-1} \geq q_i^{\downarrow}$ and $q_{i+1} > q_i^{\uparrow}$. Then, the quantization set $Q' := (Q \setminus \{q_i\}) \cup \{q_i^{\uparrow}\}$ has $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q') < \operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)$, contradicting the optimality of $Q$. A symmetric argument contradicts the optimality of $Q$ when $q_{i-1} < q_i^{\downarrow}$ and $q_{i+1} \leq q_i^{\uparrow}$. When both $q_{i-1} \geq q_i^{\downarrow}$ and $q_{i+1} \leq q_i^{\uparrow}$, the quantization point $q_i$ is never rounded to by ASQ on $w$; therefore the quantization set $Q' := (Q \setminus \{q_i\}) \cup \{w_{i'}\}$, where $w_{i'} \notin Q$, has $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q') < \operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)$. Thus, for the remainder of the proof, we assume that $q_{i-1} < q_i^{\downarrow}$ and $q_{i+1} > q_i^{\uparrow}$.

We construct quantization set $Q_1 := (Q\setminus\{q_i\})\cup\{q_i^{\downarrow}\}$. Then,

$$
\begin{aligned}
\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q_1)-\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)
&=\sum_{w_j\in[q_{i-1},\,q_i^{\downarrow}]}\lambda_j(w_j-q_{i-1})\cdot\left(q_i^{\downarrow}-q_i\right)\\
&\qquad+\sum_{w_k\in[q_i^{\downarrow},\,q_{i+1}]}\lambda_k(q_{i+1}-w_k)\cdot\left(q_i-q_i^{\downarrow}\right)\\
&=(q_i-q_i^{\downarrow})\cdot\left[\sum_{w_k\in[q_i^{\downarrow},\,q_{i+1}]}\lambda_k(q_{i+1}-w_k)-\sum_{w_j\in[q_{i-1},\,q_i^{\downarrow}]}\lambda_j(w_j-q_{i-1})\right]\\
&=(q_i-q_i^{\downarrow})\cdot\gamma,
\end{aligned}
$$

where $\gamma$ denotes the bracketed difference. Analogously, we construct $Q_2 := (Q\setminus\{q_i\})\cup\{q_i^{\uparrow}\}$. Then,

$$
\begin{aligned}
\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q_2)-\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)
&=\sum_{w_j\in[q_{i-1},\,q_i^{\uparrow}]}\lambda_j(w_j-q_{i-1})\cdot\left(q_i^{\uparrow}-q_i\right)\\
&\qquad+\sum_{w_k\in[q_i^{\uparrow},\,q_{i+1}]}\lambda_k(q_{i+1}-w_k)\cdot\left(q_i-q_i^{\uparrow}\right)\\
&=(q_i^{\uparrow}-q_i)\cdot\left[\sum_{w_j\in[q_{i-1},\,q_i^{\uparrow}]}\lambda_j(w_j-q_{i-1})-\sum_{w_k\in[q_i^{\uparrow},\,q_{i+1}]}\lambda_k(q_{i+1}-w_k)\right]\\
&=-(q_i^{\uparrow}-q_i)\cdot\gamma
\end{aligned}
$$

Because it is assumed that $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)=\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,s)$, it must be the case that both $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q_1)-\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)\geq 0$ and $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q_2)-\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)\geq 0$. But $(q_i-q_i^{\downarrow})>0$ and $(q_i^{\uparrow}-q_i)>0$, so it must be the case that $\gamma=0$.
Therefore, $\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q_1)=\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q_2)=\operatorname{\mathsf{ADV}}_{\mathcal{X}}(w,Q)$. Iteratively replacing each $q_i\notin w$ from $Q$ in this way then proves the claim.

∎
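The exchange argument in this proof relies on the objective being linear in $q_i$ while $q_i$ moves between two adjacent entries of $w$, so that an endpoint is never worse than an interior position. The Python sketch below checks this numerically on toy data. It assumes the objective has the weighted stochastic-rounding form $\sum_j \lambda_j (q_{\mathrm{hi}}-w_j)(w_j-q_{\mathrm{lo}})$, which is consistent with the differences computed in the proof but is not restated in this section; the function name `weighted_sq_cost` and the data are illustrative, not the authors' implementation.

```python
from bisect import bisect_left

def weighted_sq_cost(w, lam, Q):
    """Assumed objective: sum_j lam[j] * (hi - w[j]) * (w[j] - lo),
    where lo < w[j] < hi are the neighbors of w[j] in the sorted set Q."""
    Q = sorted(Q)
    assert Q[0] <= min(w) and Q[-1] >= max(w), "Q must cover the range of w"
    total = 0.0
    for wj, lj in zip(w, lam):
        k = bisect_left(Q, wj)      # first index with Q[k] >= wj
        if Q[k] == wj:              # exactly representable, zero variance
            continue
        lo, hi = Q[k - 1], Q[k]
        total += lj * (hi - wj) * (wj - lo)
    return total

# Toy instance: the middle point q moves inside the gap (2, 4) of w.
w = [0, 1, 2, 4, 5, 6]
lam = [1, 1, 1, 2, 2, 1]
costs = {q: weighted_sq_cost(w, lam, [0, q, 6]) for q in (2, 3, 4)}
print(costs)
```

On this instance the cost at $q=3$ is the average of the costs at $q=2$ and $q=4$ (15, 12, and 9 respectively), so snapping $q$ to the better neighboring data point does not increase the objective. When the slope $\gamma$ is zero, both neighbors give the same value, which is the case the proof handles.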

###### Proof of Lemma [D.3](https://arxiv.org/html/2606.00289#A4.Thmlemma3).

We prove that $C$ satisfies the Quadrangle Inequality (Definition [A.5](https://arxiv.org/html/2606.00289#A1.Thmdefinition5)). Throughout, the indices satisfy $a\leq b\leq c\leq f$. Note that for $i\in[a,b]$, we have

$$
(w_c-w_i)(w_i-w_a)\leq(w_f-w_i)(w_i-w_a) \tag{1}
$$

For $i\in[c,f]$, we have

$$
(w_f-w_i)(w_i-w_b)\leq(w_f-w_i)(w_i-w_a) \tag{2}
$$

For $i\in[b,c]$, we have

$$
\begin{aligned}
(w_c-w_i)(w_i-w_a)+(w_f-w_i)(w_i-w_b)
&=(w_c-w_i)(w_i-w_b)+(w_f-w_i)(w_i-w_a)+(w_a-w_b)(w_f-w_c)\\
&\leq(w_c-w_i)(w_i-w_b)+(w_f-w_i)(w_i-w_a)
\end{aligned}
\tag{3}
$$
Thus, we get that

$$
\begin{aligned}
C[a,c]+C[b,f]
&=\sum_{i\in[a,c]}\lambda_i(w_c-w_i)(w_i-w_a)+\sum_{i\in[b,f]}\lambda_i(w_f-w_i)(w_i-w_b)\\
&=\sum_{i\in[a,b]}\lambda_i(w_c-w_i)(w_i-w_a)+\sum_{i\in[c,f]}\lambda_i(w_f-w_i)(w_i-w_b)\\
&\qquad+\sum_{i\in[b,c]}\lambda_i\left[(w_c-w_i)(w_i-w_a)+(w_f-w_i)(w_i-w_b)\right]\\
&\leq\sum_{i\in[a,b]}\lambda_i(w_f-w_i)(w_i-w_a)+\sum_{i\in[c,f]}\lambda_i(w_f-w_i)(w_i-w_a)\\
&\qquad+\sum_{i\in[b,c]}\lambda_i\left[(w_c-w_i)(w_i-w_b)+(w_f-w_i)(w_i-w_a)\right]\\
&=\sum_{i\in[a,f]}\lambda_i(w_f-w_i)(w_i-w_a)+\sum_{i\in[b,c]}\lambda_i(w_c-w_i)(w_i-w_b)=C[a,f]+C[b,c],
\end{aligned}
$$

where the inequality applies (1) on $[a,b]$, (2) on $[c,f]$, and (3) on $[b,c]$. ∎
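As a sanity check on the inequality just derived, the following Python sketch enumerates every index quadruple $a\leq b\leq c\leq f$ on a small random instance and looks for violations of $C[a,c]+C[b,f]\leq C[a,f]+C[b,c]$. It assumes $C[a,c]=\sum_{i\in[a,c]}\lambda_i(w_c-w_i)(w_i-w_a)$ with $w$ sorted and $\lambda_i\geq 0$, as used in the proof; the helper names `cost` and `quadrangle_violations` and the random data are illustrative.

```python
import itertools
import random

def cost(w, lam, a, c):
    """C[a, c] = sum over i in [a, c] of lam[i] * (w[c] - w[i]) * (w[i] - w[a])."""
    return sum(lam[i] * (w[c] - w[i]) * (w[i] - w[a]) for i in range(a, c + 1))

def quadrangle_violations(w, lam):
    """Return all (a, b, c, f) with a <= b <= c <= f that violate
    C[a,c] + C[b,f] <= C[a,f] + C[b,c]."""
    n = len(w)
    bad = []
    # combinations_with_replacement yields non-decreasing index tuples.
    for a, b, c, f in itertools.combinations_with_replacement(range(n), 4):
        lhs = cost(w, lam, a, c) + cost(w, lam, b, f)
        rhs = cost(w, lam, a, f) + cost(w, lam, b, c)
        if lhs > rhs:
            bad.append((a, b, c, f))
    return bad

rng = random.Random(0)
w = sorted(rng.randint(-50, 50) for _ in range(9))   # sorted integer coordinates
lam = [rng.randint(0, 5) for _ in w]                 # non-negative weights
violations = quadrangle_violations(w, lam)
print(len(violations))
```

Integer inputs keep the comparison exact, so no floating-point tolerance is needed. A brute-force check like this does not replace the proof, but it is a cheap guard against index mistakes when implementing $C$.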

###### Proof of Lemma [D.4](https://arxiv.org/html/2606.00289#A4.Thmlemma4).

We first show that the rows of $C$ are sorted. Consider a fixed row $i\in[d]$ and let $j,k\in[d]$ such that $i<j<k$. Then,

$$
C[i,j]=\sum_{\ell=i}^{j}\lambda_\ell(w_j-w_\ell)(w_\ell-w_i)\leq\sum_{\ell=i}^{k}\lambda_\ell(w_k-w_\ell)(w_\ell-w_i)=C[i,k]
$$

Similarly, for a fixed column $j\in[d]$ and rows $i<k$ we have

$$
C[i,j]=\sum_{\ell=i}^{j}\lambda_\ell(w_j-w_\ell)(w_\ell-w_i)\geq\sum_{\ell=k}^{j}\lambda_\ell(w_j-w_\ell)(w_\ell-w_k)=C[k,j]
$$

This proves that matrix $C$ has sorted rows and columns. ∎
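The monotonicity shown above can be observed directly on a small instance. The Python sketch below builds the upper-triangular part of $C$ from the formula in the proof and checks that entries are non-decreasing along each row and non-increasing down each column. The names `cost_matrix` and `is_monotone`, the restriction to entries with $i\leq j$, and the toy data are assumptions made for illustration.

```python
def cost_matrix(w, lam):
    """Upper-triangular C with C[i][j] = sum_{l=i..j} lam[l]*(w[j]-w[l])*(w[l]-w[i])."""
    n = len(w)
    C = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            C[i][j] = sum(
                lam[l] * (w[j] - w[l]) * (w[l] - w[i]) for l in range(i, j + 1)
            )
    return C

def is_monotone(C):
    """Rows non-decreasing to the right, columns non-increasing downward (i <= j only)."""
    n = len(C)
    rows_ok = all(C[i][j] <= C[i][j + 1] for i in range(n) for j in range(i, n - 1))
    cols_ok = all(C[i][j] >= C[i + 1][j] for j in range(n) for i in range(j))
    return rows_ok and cols_ok

w = [0, 1, 3, 6, 10]     # sorted coordinates
lam = [1, 2, 1, 3, 1]    # non-negative weights
C = cost_matrix(w, lam)
print(is_monotone(C))
```

The direct summation costs cubic time overall and is meant only for checking small cases.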

### G.2 Deferred Table from Section [2](https://arxiv.org/html/2606.00289#S2)

Table 2: Recall@100 of maximum inner product and $\ell_2$ search on vectors from GloVe 300D [[7](https://arxiv.org/html/2606.00289#bib.bib7)]. The dataset consists of 900,000 randomly selected GloVe vectors, each query is sampled from 100,000 (disjoint from dataset) vectors, and each method is tested on 10,000 sampled queries. We benchmark against the FAISS implementation of Product Quantization (PQ) [[23](https://arxiv.org/html/2606.00289#bib.bib23)]; each method uses an average of 4 bits per coordinate.