Contextual Slate GLM Bandits with Limited Adaptivity

arXiv cs.LG Papers

Summary

Proposes algorithms for contextual slate bandits with generalized linear rewards under limited adaptivity, achieving regret bounds independent of the non-linearity parameter. The batched and rarely-switching algorithms are computationally efficient and empirically outperform baselines, including in a language model example selection task.

arXiv:2606.31449v1 Announce Type: new Abstract: We investigate the contextual slate bandit problem with generalized linear rewards under limited adaptivity. At each round, the learner is presented with $N$ sets of items, where each item is represented by a $d$-dimensional feature vector. The learner then constructs a slate by selecting one item per set; the resulting slate yields a scalar reward sampled from a Generalized Linear Model (GLM). We propose algorithms under two limited-adaptivity settings: (a) Batched and (b) Rarely-Switching. For the batched setting, we introduce B-SlateGLinCB, which partitions the time horizon into $\mathcal{O}(\log\log T)$ batches such that each batch's policy relies only on data from previous batches. For the rarely-switching setting, we propose RS-SlateGLinCB, which adaptively performs only $\mathcal{O}(Nd\log T)$ parameter updates. Under a diversity assumption on the item sequences, we prove that B-SlateGLinCB and RS-SlateGLinCB achieve regret bounds of $\mathcal{O}(Nd^{3/2}\sqrt{T})$ and $\mathcal{O}(Nd\sqrt{T})$, respectively. Notably, both bounds are independent of the non-linearity parameter $\kappa$ that is typically found to scale the regret of GLM bandit algorithms. Our algorithms are computationally efficient, requiring only $\text{poly}(N)$ time per round despite $2^{\Omega(N)}$ possible slates. Simulations show our algorithms outperform existing baselines with limited adaptivity and remain competitive with Slate-GLM-OFU, a fully adaptive state-of-the-art algorithm. Notably, a slightly modified B-SlateGLinCB empirically matches this baseline. Finally, we demonstrate strong performance in a practical in-context example selection task for language models.

Cached at: 07/01/26, 05:36 AM

# Contextual Slate GLM Bandits with Limited Adaptivity
Source: [https://arxiv.org/html/2606.31449](https://arxiv.org/html/2606.31449)
###### Abstract

We investigate the contextual slate bandit problem with generalized linear rewards under limited adaptivity. At each round, the learner is presented with $N$ sets of items, where each item is represented by a $d$-dimensional feature vector. The learner then constructs a slate by selecting one item per set; the resulting slate yields a scalar reward sampled from a Generalized Linear Model (GLM). We propose algorithms under two limited-adaptivity settings: (a) Batched and (b) Rarely-Switching. For the batched setting, we introduce B-SlateGLinCB, which partitions the time horizon into $\mathcal{O}(\log\log T)$ batches such that each batch's policy relies only on data from previous batches. For the rarely-switching setting, we propose RS-SlateGLinCB, which adaptively performs only $\mathcal{O}(Nd\log T)$ parameter updates. Under a diversity assumption on the item sequences, we prove that B-SlateGLinCB and RS-SlateGLinCB achieve regret bounds of $\mathcal{O}(Nd^{3/2}\sqrt{T})$ and $\mathcal{O}(Nd\sqrt{T})$, respectively. Notably, both bounds are independent of the non-linearity parameter $\kappa$ that is typically found to scale the regret of GLM bandit algorithms. Our algorithms are computationally efficient, requiring only $\text{poly}(N)$ time per round despite $2^{\Omega(N)}$ possible slates. Simulations show our algorithms outperform existing baselines with limited adaptivity and remain competitive with Slate-GLM-OFU, a fully adaptive state-of-the-art algorithm. Notably, a slightly modified B-SlateGLinCB empirically matches this baseline. Finally, we demonstrate strong performance in a practical in-context example selection task for language models.


## 1 Introduction

The online slate bandit framework models sequential decision-making where a learner must select a slate of items in each round. A slate is formed by choosing one item for each of several slots, with each slot having a distinct and potentially dynamic pool of candidate items. Following the selection, the learner observes a single reward for the entire slate (bandit feedback). The learner aims to design a selection policy that maximizes cumulative reward, or equivalently, minimizes cumulative regret over a horizon of $T$ rounds. This framework is well-suited for many real-world applications such as landing page optimization, where page components are selected to maximize conversions [[14](https://arxiv.org/html/2606.31449#bib.bib21)], and dynamic ad creative optimization, where advertisements are automatically assembled from various elements [[5](https://arxiv.org/html/2606.31449#bib.bib22)].

Such broad applications of online slate bandits have led to extensive research across various settings. When the item sets for each slot remain constant throughout the horizon and semi-bandit feedback (individual item-level rewards within a slate) is provided, efficient and low-regret algorithms are well-known [[15](https://arxiv.org/html/2606.31449#bib.bib1),[26](https://arxiv.org/html/2606.31449#bib.bib3)]. Recently, [[7](https://arxiv.org/html/2606.31449#bib.bib2)] devised an efficient Thompson Sampling method for the fixed-item-set scenario that accommodates bandit feedback by attributing the same slate-level reward to all items within the chosen slate. Subsequently, [[12](https://arxiv.org/html/2606.31449#bib.bib5)] explored the stochastic contextual setting, characterized by item sets varying stochastically over time, and utilized bandit feedback from a logistic model. Their proposed algorithm, Slate-GLM-OFU, efficiently navigates an exponentially large space of candidate slates through an optimistic selection process for each slot. It achieves optimal regret, provided a “diversity” assumption holds for the sequence of chosen items, thus achieving strong theoretical and empirical performance for the contextual logistic slate bandit problem under bandit feedback.

Despite these algorithmic advances, critical challenges remain for deploying these methods in practice. Web-scale applications, such as online advertising and real-time recommendations, require bandit algorithms to operate with limited adaptivity. Two popular limited adaptivity settings studied in the literature are: (a) Batched: the algorithm must partition the horizon $\{1,\ldots,T\}$ into very few intervals (batches), and its policy during a batch may depend only on observations (slates selected and rewards received) from the previous batches; and (b) Rarely-Switching: the algorithm adaptively (and rarely) decides when to update its estimate of the reward parameters. While both settings offer practical efficiency by reducing the number of parameter estimations, the batched setting also enables parallelization, i.e., rounds within a batch can be executed independently of each other, significantly improving throughput.

Motivated by these challenges, we tackle the online contextual slate bandit problem in both these settings of limited adaptivity. Further, we assume that the environment provides a single reward for the selected slate (bandit feedback), generated by a Generalized Linear Model (GLM) with unknown parameters. We summarize our contributions below.

### 1.1 Our Contributions

First, in Section [3](https://arxiv.org/html/2606.31449#S3), we present B-SlateGLinCB (Algorithm [1](https://arxiv.org/html/2606.31449#alg1)), a batched algorithm for the Contextual Slate Bandit problem with GLM rewards that operates over $\mathcal{O}(\log\log T)$ batches. We prove that, if the sets of items are chosen stochastically, then, at the end of $T$ rounds, under a popular diversity assumption (Assumption [2.1](https://arxiv.org/html/2606.31449#S2.Thmassumption1)), B-SlateGLinCB incurs $\tilde{\mathcal{O}}(Nd^{3/2}\sqrt{T})$ regret, where each item is represented by a $d$-dimensional feature vector. In Algorithm [3](https://arxiv.org/html/2606.31449#alg3), Appendix [B](https://arxiv.org/html/2606.31449#A2), we also show an alternate approach using Distributional Optimal Designs [[27](https://arxiv.org/html/2606.31449#bib.bib6)], and obtain a regret guarantee of $\tilde{\mathcal{O}}(Nd\sqrt{T}\min\{\sqrt{d},\sqrt{N}\})$.

Next, in Section [4](https://arxiv.org/html/2606.31449#S4), we present RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)), a rarely-switching algorithm for the Contextual Slate Bandit problem with GLM rewards that estimates reward parameters only $\mathcal{O}(Nd\log T)$ times. We prove that, at the end of $T$ rounds, for adversarially chosen item sets, under the same diversity assumption as above, RS-SlateGLinCB incurs $\mathcal{O}(Nd\sqrt{T})$ regret, matching the regret bound of Slate-GLM-OFU (Algorithm 1, [[12](https://arxiv.org/html/2606.31449#bib.bib5)]), which estimates parameters at all $T$ rounds, i.e., is not constrained by limited adaptivity.

A key feature of both our algorithms is per-round efficiency. They exhibit $\text{poly}(N)$ per-round time complexity by selecting the items for the slots independently of each other. By doing so, they avoid iterating over the $2^{\Omega(N)}$ possible slates, making them practically feasible when $N$ is large.
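To illustrate why slot-wise selection can avoid enumerating slates, the following minimal Python sketch assumes a known parameter vector and a purely additive score $\bm{x}^{\top}\bm{\theta}=\sum_{i}{\bm{x}^{i}}^{\top}\bm{\theta}^{i}$. Under that assumption, maximizing each slot independently yields a slate with the same score as a brute-force search over all slates. The helper names (`slate_score`, `best_slate_per_slot`, `best_slate_brute_force`) and the random instance are hypothetical; the paper's algorithms use optimistic or elimination-based per-slot rules rather than a known parameter.

```python
import itertools
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def slate_score(slate, theta_blocks):
    """Additive score: sum over slots of <x^i, theta^i>."""
    return sum(dot(z, th) for z, th in zip(slate, theta_blocks))


def best_slate_per_slot(item_sets, theta_blocks):
    """Pick the best item in each slot independently: N * K score evaluations."""
    return [max(items, key=lambda z: dot(z, th))
            for items, th in zip(item_sets, theta_blocks)]


def best_slate_brute_force(item_sets, theta_blocks):
    """Enumerate all K^N slates (exponential in N); only for comparison."""
    return list(max(itertools.product(*item_sets),
                    key=lambda s: slate_score(s, theta_blocks)))


# Hypothetical toy instance: N slots, K items per slot, d features per item.
rng = random.Random(0)
N, d, K = 4, 3, 5
theta_blocks = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(N)]
item_sets = [[tuple(rng.uniform(-1, 1) for _ in range(d)) for _ in range(K)]
             for _ in range(N)]

fast = best_slate_per_slot(item_sets, theta_blocks)
slow = best_slate_brute_force(item_sets, theta_blocks)
print(slate_score(fast, theta_blocks), slate_score(slow, theta_blocks))
```

The per-slot routine touches $NK$ items, whereas the brute-force routine touches $K^{N}$ slates, which is the gap the paragraph above refers to.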

Finally, in Section [5](https://arxiv.org/html/2606.31449#S5), under diverse experimental settings, we empirically demonstrate that both our algorithms achieve sublinear regret and significantly outperform other limited adaptivity baselines, and that RS-SlateGLinCB is quite competitive with Slate-GLM-OFU, a fully adaptive algorithm. We also propose B-SlateGLinCB+, a batched algorithm with slight modifications to B-SlateGLinCB, and show that its regret matches that of Slate-GLM-OFU. Using B-SlateGLinCB+, we implement prompt tuning on language models with exemplar selection. We demonstrate strong performance in binary classification tasks and show that our performance matches that of Slate-GLM-OFU.

### 1.2 Related Work

Slate Bandits: Due to their practical relevance in real-world applications such as recommendation systems and advertising, slate bandits have recently attracted considerable attention [[5](https://arxiv.org/html/2606.31449#bib.bib22),[14](https://arxiv.org/html/2606.31449#bib.bib21)]. However, many of these works lack rigorous theoretical foundations, which have been explored in a separate line of research [[31](https://arxiv.org/html/2606.31449#bib.bib19),[15](https://arxiv.org/html/2606.31449#bib.bib1),[26](https://arxiv.org/html/2606.31449#bib.bib3),[18](https://arxiv.org/html/2606.31449#bib.bib20),[7](https://arxiv.org/html/2606.31449#bib.bib2),[12](https://arxiv.org/html/2606.31449#bib.bib5)]. While several of these works [[31](https://arxiv.org/html/2606.31449#bib.bib19),[18](https://arxiv.org/html/2606.31449#bib.bib20),[26](https://arxiv.org/html/2606.31449#bib.bib3),[15](https://arxiv.org/html/2606.31449#bib.bib1)] assume semi-bandit feedback (a reward for each item chosen in the slate), more recent efforts such as [[7](https://arxiv.org/html/2606.31449#bib.bib2)] and [[12](https://arxiv.org/html/2606.31449#bib.bib5)] address the challenging slate-level bandit feedback scenario. In particular, [[7](https://arxiv.org/html/2606.31449#bib.bib2)] use a heuristic-based method to attribute the bandit feedback to each of the slots, while [[12](https://arxiv.org/html/2606.31449#bib.bib5)] decompose the selection rule into a slot-level selection rule, allowing their algorithms to avoid iterating over the exponential-sized set of candidate slates while still obtaining optimal regret guarantees. However, these algorithms update their parameters at each round, and hence are not easily adaptable to limited adaptivity settings.

Limited Adaptivity: Recently, there has been considerable interest in the batched and rarely-switching limited adaptivity settings. In the multi-armed bandit setting, several works have studied the advantages of batching [[3](https://arxiv.org/html/2606.31449#bib.bib32),[24](https://arxiv.org/html/2606.31449#bib.bib33),[11](https://arxiv.org/html/2606.31449#bib.bib9)]. Subsequently, [[27](https://arxiv.org/html/2606.31449#bib.bib6)] proposed batched algorithms for contextual linear bandits by introducing distributional optimal designs and using them to guide and determine policy updates. Building on these ideas, recent work has explored batched algorithms for more complex reward models, including GLMs [[28](https://arxiv.org/html/2606.31449#bib.bib4)] and multinomial logit (MNL) models [[21](https://arxiv.org/html/2606.31449#bib.bib8)]. The rarely-switching setting for contextual linear bandits was introduced by [[1](https://arxiv.org/html/2606.31449#bib.bib23)] and has since been studied for other reward models as well [[28](https://arxiv.org/html/2606.31449#bib.bib4),[21](https://arxiv.org/html/2606.31449#bib.bib8)]. However, these algorithms do not easily extend to the slate bandit setting, where combinatorial action spaces and structured feedback introduce unique challenges not addressed in prior batched bandit literature.

## 2 Notations and Problem Setup

In this section, we define some general notations and describe the problem setup in complete detail.

We represent the sets $\{1,\ldots,N\}$ and $\{m,\ldots,N\}$ as $[N]$ and $[m,N]$ respectively. Unless otherwise specified, all vectors, matrices, and sets are represented using bold lower case, bold upper case, and calligraphic upper case letters respectively. A matrix $\bm{A}$ is said to be positive semi-definite (p.s.d.), denoted $\bm{A}\succeq 0$, if all the eigenvalues of $\bm{A}$ are non-negative. We define the norm of a vector $\bm{x}$ with respect to a p.s.d. matrix $\bm{A}$ as $\lVert\bm{x}\rVert_{\bm{A}}=\sqrt{\bm{x}^{\top}\bm{A}\bm{x}}$. We use $\mathbb{P}$ and $\mathbb{E}$ to denote the probability and expectation of a quantity respectively. For any vector $\bm{x}=(x^{1}_{1},\ldots,x^{1}_{d},\ldots,x^{N}_{1},\ldots,x^{N}_{d})\in\mathbb{R}^{Nd}$, $\bm{x}^{i}=(x^{i}_{1},\ldots,x^{i}_{d})\in\mathbb{R}^{d}$ denotes the $i^{th}$ block of $\bm{x}$. Finally, we use $\tilde{\mathcal{O}}(\cdot)$ to suppress polylogarithmic factors.

### 2.1 Contextual Slate Bandits

Let $T\in\mathbb{N}$ denote the total number of rounds of interaction between a learner and an environment. In the contextual slate bandit problem, at each round $t\in[T]$, the learner is presented with $N$ sets of items $\mathcal{X}^{1}_{t},\ldots,\mathcal{X}^{N}_{t}\subset\mathbb{R}^{d}$. For each $i\in[N]$, the learner selects an item $\bm{x}^{i}_{t}\in\mathcal{X}^{i}_{t}$, thereby constructing a slate $\bm{x}_{t}=(\bm{x}^{1}_{t},\ldots,\bm{x}^{N}_{t})\in\mathcal{X}_{t}:=\mathcal{X}^{1}_{t}\times\ldots\times\mathcal{X}^{N}_{t}\subset\mathbb{R}^{Nd}$. We say item $\bm{x}^{i}_{t}$ is used in the $i^{th}$ “slot” of the slate. The environment then reveals to her a single scalar reward $r_{t}(\bm{x}_{t})$. The goal of the learner is to minimize her cumulative regret $R(T)$, defined as

$$R(T)=\mathbb{E}\left[\sum_{t\in[T]}\left(\max_{\bm{x}\in\mathcal{X}_{t}}r_{t}(\bm{x})-r_{t}(\bm{x}_{t})\right)\right],\tag{1}$$

where the expectation is over the randomness in the rewards.

### 2.2 Generalized Linear Models (GLMs)

We follow the definition of GLMs provided in Definition 2.1 of [[28](https://arxiv.org/html/2606.31449#bib.bib4)]. Let $r\in\mathbb{R}$ be a random variable and $\bm{x}\in\mathbb{R}^{Nd}$ be a random vector in the Euclidean space. We say that $r(\bm{x})$ is sampled from a GLM if the conditional random variable $r\mid\bm{x}$ follows a distribution from the canonical exponential family, i.e.,

$$\mathbb{P}_{\bm{\theta}^{\star}}(r\mid\bm{x})=\exp\left(r\cdot(\bm{x}^{\top}\bm{\theta}^{\star})-b(\bm{x}^{\top}\bm{\theta}^{\star})+c(r)\right).$$

Here $\bm{\theta}^{\star}\in\mathbb{R}^{Nd}$ parametrizes the density function. Further, following [[28](https://arxiv.org/html/2606.31449#bib.bib4)], we assume that $b$ is twice-differentiable, $\dot{b}$ is monotonic, and $r\in[0,R]$ almost surely, for some known $R\in\mathbb{R}$.

We define the link function $\mu$ as $\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})=\mathbb{E}[r_{t}\mid\bm{x}_{t}]=\dot{b}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})$. Thus, $\mu$ is monotonic, and further, following [[10](https://arxiv.org/html/2606.31449#bib.bib10)], we assume it to be $L_{\mu}$-Lipschitz. A significant property of GLMs is the *self-concordance* property [[28](https://arxiv.org/html/2606.31449#bib.bib4)], i.e., for GLMs supported on $[0,R]$, $\lvert\ddot{\mu}(z)\rvert\leq R\,\dot{\mu}(z)$ for all $z\in\mathbb{R}$.
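As a concrete instance of these definitions, the Bernoulli (logistic) GLM has $b(z)=\log(1+e^{z})$, rewards in $\{0,1\}$ (so $R=1$), and $\mu(z)=\dot{b}(z)=1/(1+e^{-z})$. The minimal Python sketch below checks numerically, on a grid of our choosing, that $\dot{b}$ matches $\mu$ and that the self-concordance inequality holds with $R=1$. The grid, the finite-difference step `h`, and the function names are illustrative assumptions, not part of the paper.

```python
import math


def b(z):
    """Log-partition function of the Bernoulli GLM: b(z) = log(1 + e^z)."""
    return math.log1p(math.exp(z))


def mu(z):
    """Link (mean) function: mu(z) = b'(z) = sigmoid(z)."""
    return 1.0 / (1.0 + math.exp(-z))


def mu_dot(z):
    m = mu(z)
    return m * (1.0 - m)


def mu_ddot(z):
    m = mu(z)
    return m * (1.0 - m) * (1.0 - 2.0 * m)


R = 1.0  # Bernoulli rewards lie in [0, 1]
grid = [k / 10.0 for k in range(-100, 101)]
h = 1e-5  # step for the central finite difference

# Largest gap between a numerical derivative of b and the closed-form mu.
max_mean_gap = max(abs((b(z + h) - b(z - h)) / (2 * h) - mu(z)) for z in grid)

# Self-concordance: |mu''(z)| <= R * mu'(z) at every grid point.
self_concordant = all(abs(mu_ddot(z)) <= R * mu_dot(z) + 1e-15 for z in grid)

print(max_mean_gap, self_concordant)
```

For the logistic case the inequality is immediate, since $\ddot{\mu}(z)=\dot{\mu}(z)(1-2\mu(z))$ and $\lvert 1-2\mu(z)\rvert\leq 1$.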

### 2.3 Contextual Slate Bandits with Limited Adaptivity

In the limited adaptivity setting, the learner is constrained to make at most $M$ policy updates. Our goal is to solve the contextual slate bandit problem with GLM rewards parametrized by an unknown parameter vector $\bm{\theta}^{\star}$ in both the prevalent limited adaptivity settings: *batched* and *rarely-switching*. Next, we formally describe these settings.

Batched Slate GLM Bandits: The learner is required to partition the horizon $[T]$ into $M$ disjoint batches $\mathcal{T}_{1},\ldots,\mathcal{T}_{M}$ *a priori*, and the policy of selecting slates can only be updated between two consecutive batches. Therefore, during round $t\in\mathcal{T}_{m}$, the policy can only utilize the set of observations from previous batches $\{\mathcal{T}_{i}\}_{i=1}^{m-1}$ and the present sets of items $\{\mathcal{X}^{i}_{t}\}_{i\in[N]}$, allowing for parallelization within a batch. It is known that when $M=\Omega(\log\log T)$, $\Theta(\log\log T)$ batches suffice to obtain optimal regret in $T$ [[3](https://arxiv.org/html/2606.31449#bib.bib32),[11](https://arxiv.org/html/2606.31449#bib.bib9)]. Hence, we develop algorithms that make $\mathcal{O}(\log\log T)$ updates. (Footnote 1: When $M=o(\log\log T)$, our algorithms easily extend to the generic schedule presented in [[28](https://arxiv.org/html/2606.31449#bib.bib4)].) Further, we assume that in each slot $i\in[N]$, the sets of items $\mathcal{X}^{i}_{t}$, $t\in[T]$, are sampled independently from a distribution $\mathcal{D}^{i}$ supported on $\mathbb{R}^{d}$. The goal of the learner is to minimize the expected cumulative regret defined in ([1](https://arxiv.org/html/2606.31449#S2.E1)), where the expectation also incorporates the randomness in all the item sets $\{\mathcal{X}^{i}_{t}\}_{t\in[T],i\in[N]}$.

Rarely-Switching Slate GLM Bandits: Here, while the learner is constrained to estimate $\bm{\theta}^{\star}$ only $M$ times, she can adaptively decide when to make these estimates. We present an algorithm that makes $M=\mathcal{O}(\log T)$ policy updates, matching the lower bound in [[27](https://arxiv.org/html/2606.31449#bib.bib6)] up to polylog factors. We do not assume any stochasticity in the item sets, i.e., they can be adversarial. Our goal is to minimize the expected cumulative regret as defined in ([1](https://arxiv.org/html/2606.31449#S2.E1)).

### 2.4 Non-Linearity Parameter $\kappa$

An important quantity that often arises while dealing with GLMs is an instance-dependent non-linearity parameter $\kappa$, defined as

$$\kappa=\sup_{\bm{x}\in\mathcal{X}}\sup_{\bm{\theta}:\lVert\bm{\theta}\rVert\leq S}\frac{1}{\dot{\mu}(\bm{x}^{\top}\bm{\theta})},\tag{2}$$

where $\mathcal{X}$ is the set of all actions (slates, in our case) across all rounds, and $S$ is an upper bound on $\lVert\bm{\theta}^{\star}\rVert_{2}$. Intuitively, $\kappa$ quantifies the deviation of the reward model from linearity, and can be exponential in $\lVert\bm{\theta}^{\star}\rVert_{2}$ [[8](https://arxiv.org/html/2606.31449#bib.bib13)]. As a result, several works utilizing non-linear reward models in batched as well as non-batched settings [[9](https://arxiv.org/html/2606.31449#bib.bib14),[28](https://arxiv.org/html/2606.31449#bib.bib4),[32](https://arxiv.org/html/2606.31449#bib.bib15),[21](https://arxiv.org/html/2606.31449#bib.bib8),[12](https://arxiv.org/html/2606.31449#bib.bib5)] have focused on achieving $\kappa$-free regret bounds (in the leading term).
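To make the exponential dependence concrete, consider the logistic link, for which $1/\dot{\mu}(z)=(1+e^{z})(1+e^{-z})=2+e^{z}+e^{-z}$. If every slate has $\ell_{2}$-norm at most 1 (which follows from the per-item bound $1/\sqrt{N}$ assumed in Section 2.5), then $\lvert\bm{x}^{\top}\bm{\theta}\rvert\leq S$ by Cauchy-Schwarz, so $\kappa\leq 2+e^{S}+e^{-S}$. The short Python sketch below evaluates this upper bound; the function names and the chosen values of $S$ are illustrative assumptions.

```python
import math


def inverse_mu_dot(z):
    """1 / mu_dot(z) for the logistic link, equal to 2 + e^z + e^{-z}."""
    return 2.0 + math.exp(z) + math.exp(-z)


def kappa_upper_bound_logistic(S, slate_norm=1.0):
    """Upper bound on kappa when |x^T theta| <= S * slate_norm (Cauchy-Schwarz)."""
    return inverse_mu_dot(S * slate_norm)


# The bound grows roughly like e^S as the parameter norm bound S increases.
for S in (1, 2, 5, 10):
    print(S, kappa_upper_bound_logistic(S))
```

Already at $S=10$ the bound exceeds $e^{10}$, which is why regret bounds whose leading term scales with $\kappa$ can be very loose.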

### 2.5 Additional Assumptions

Following several prior works on GLM bandits, we assume that the norm of the hidden reward parameter satisfies $\lVert\bm{\theta}^{\star}\rVert_{2}\leq S$, with $S$ being known. Also, following [[12](https://arxiv.org/html/2606.31449#bib.bib5)], for all rounds $t\in[T]$ and all slots $i\in[N]$, for any $\bm{z}\in\mathcal{X}^{i}_{t}$, we have $\lVert\bm{z}\rVert_{2}\leq\frac{1}{\sqrt{N}}$. While these assumptions are somewhat standard in the bandit literature, we make an additional “diversity” assumption described below.

###### Assumption 2.1 (Diversity Assumption).

We assume that our algorithm ensures that the sequence of items selected is “diverse” enough, i.e., for all slots $i\in[N]$ and some $\rho>0$,

$$\mathbb{E}[\bm{x}^{i}_{t}\mid\mathcal{F}_{t-1}]=\bm{0}\quad\text{and}\quad\mathbb{E}[\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\mid\mathcal{F}_{t-1}]\succeq\rho\bm{I},$$

where $\bm{0}$ and $\bm{I}$ represent the zero vector and the identity matrix, while the filtration $\mathcal{F}_{t-1}=\sigma(\bm{x}_{1},r_{1},\ldots,\bm{x}_{t-1},r_{t-1})$ encodes all the information available before round $t$.

Note that [[12](https://arxiv.org/html/2606.31449#bib.bib5)] assume that the eigenvalues of the design matrix grow as $\Omega(\kappa)$. (Footnote 2: i.e., [[12](https://arxiv.org/html/2606.31449#bib.bib5)] assume $\mathbb{E}[\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\mid\mathcal{F}_{t-1}]\succeq\rho\kappa\bm{I}$ for some $\rho>0$.) We show that it suffices to assume that the eigenvalues grow as $\Omega(1)$, hence matching the diversity assumptions made in relevant linear bandit literature [[6](https://arxiv.org/html/2606.31449#bib.bib17),[4](https://arxiv.org/html/2606.31449#bib.bib34),[2](https://arxiv.org/html/2606.31449#bib.bib35),[16](https://arxiv.org/html/2606.31449#bib.bib37),[25](https://arxiv.org/html/2606.31449#bib.bib38),[23](https://arxiv.org/html/2606.31449#bib.bib16)]. Thus, our assumption is strictly weaker than the one in [[12](https://arxiv.org/html/2606.31449#bib.bib5)].

Intuitively, these conditions ensure that the items chosen in each slot span the entire space in a way that the associated design matrices have eigenvalues sufficiently bounded away from zero. Such a diversity assumption has been used in several prior works [[12](https://arxiv.org/html/2606.31449#bib.bib5),[6](https://arxiv.org/html/2606.31449#bib.bib17),[23](https://arxiv.org/html/2606.31449#bib.bib16),[4](https://arxiv.org/html/2606.31449#bib.bib34)] to obtain strong regret bounds. Similar to [[12](https://arxiv.org/html/2606.31449#bib.bib5)], we use this assumption to prove that the eigenvalues of the design matrices used in our algorithms grow (sufficiently) linearly. We refer the reader to Section 2.1 of [[12](https://arxiv.org/html/2606.31449#bib.bib5)] for a thorough discussion of the diversity assumption. Also, in Appendix [H](https://arxiv.org/html/2606.31449#A8), we empirically validate the linear growth of eigenvalues of the design matrices for our algorithms.

### 2.6 GLM-MLE Loss

Let $\bm{x}_{s}\in\mathbb{R}^{Nd}$ be the slate selected at round $s\in[t-1]$ and $r_{s}$ be the corresponding reward. The maximum likelihood estimator (MLE), $\widehat{\bm{\theta}}_{t}$, based on these observations, is the maximizer of the function

$$\sum_{s=1}^{t-1}\log\mathbb{P}_{\bm{\theta}}(r_{s}\mid\bm{x}_{s})=\sum_{s=1}^{t-1}\left(r_{s}\cdot\bm{x}_{s}^{\top}\bm{\theta}-b(\bm{x}_{s}^{\top}\bm{\theta})+c(r_{s})\right),\tag{3}$$

which follows directly from the GLM density in Section 2.2; the term $c(r_{s})$ does not depend on $\bm{\theta}$ and can be dropped when optimizing. We refer the readers to Sections 2 and 3 of [[10](https://arxiv.org/html/2606.31449#bib.bib10)] for more details. Note that ([3](https://arxiv.org/html/2606.31449#S2.E3)) is an unconstrained optimization problem. If the MLE $\widehat{\bm{\theta}}_{t}$ lies outside the set of admissible parameters $\Theta=\{\bm{\theta}:\lVert\bm{\theta}\rVert_{2}\leq S\}$, we project $\widehat{\bm{\theta}}_{t}$ back onto $\Theta$, worsening the regret by $\text{poly}(R,S)$ (see Appendix E of [[28](https://arxiv.org/html/2606.31449#bib.bib4)] for a more detailed explanation). Henceforth, for the sake of exposition, we assume $\widehat{\bm{\theta}}_{t}\in\Theta$ for all $t\in[T]$. However, all results easily extend to include the projection described above.
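As an illustration of the estimation step, the sketch below fits a logistic GLM by plain gradient ascent on the average log-likelihood, whose gradient is the mean of $(r_{s}-\mu(\bm{x}_{s}^{\top}\bm{\theta}))\,\bm{x}_{s}$, and then rescales the estimate onto the ball $\Theta$ if needed. Everything here is an assumption made for illustration: the logistic link, the 2-dimensional synthetic data standing in for $Nd$-dimensional slates, the step size, the iteration count, and the simple rescaling (the projection analysed in the cited appendix may differ). It is not the paper's solver.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def avg_score(theta, X, r):
    """Average log-likelihood gradient: mean of (r_s - mu(x_s^T theta)) * x_s."""
    grad = [0.0] * len(theta)
    for x, y in zip(X, r):
        resid = y - sigmoid(sum(a * t for a, t in zip(x, theta)))
        for j, xj in enumerate(x):
            grad[j] += resid * xj
    return [g / len(X) for g in grad]


def fit_mle(X, r, steps=1000, lr=0.5):
    """Gradient ascent on the (concave) average log-likelihood."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        theta = [t + lr * g for t, g in zip(theta, avg_score(theta, X, r))]
    return theta


def project_l2_ball(theta, S):
    """Rescale theta onto {theta : ||theta||_2 <= S} if it lies outside."""
    norm = math.sqrt(sum(t * t for t in theta))
    return list(theta) if norm <= S else [t * S / norm for t in theta]


# Hypothetical synthetic data with a known parameter.
rng = random.Random(0)
theta_star = [1.0, -0.5]
X = [[rng.uniform(-1, 1), rng.uniform(-1, 1)] for _ in range(500)]
r = [1.0 if rng.random() < sigmoid(sum(a * t for a, t in zip(x, theta_star))) else 0.0
     for x in X]

theta_mle = fit_mle(X, r)
theta_hat = project_l2_ball(theta_mle, S=2.0)
print(theta_hat)
```

At the unconstrained maximizer the average gradient vanishes, which gives a simple convergence check for any solver used in its place.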

### 2.7 G-Optimal Design

Let $\mathcal{X}\subset\mathbb{R}^{d}$. The G-optimal design $\pi_{G}(\mathcal{X})$ is a probability distribution on $\mathcal{X}$ defined as

$$\pi_{G}(\mathcal{X})=\operatorname*{arg\,min}_{\pi\in\Delta(\mathcal{X})}\max_{\bm{x}\in\mathcal{X}}\lVert\bm{x}\rVert^{2}_{\bm{V}(\pi)^{-1}},\quad\text{where }\bm{V}(\pi)=\mathbb{E}_{\bm{x}\sim\pi}[\bm{x}\bm{x}^{\top}].$$

Here, $\Delta(\mathcal{X})$ is the set of all probability distributions over $\mathcal{X}$. The Kiefer-Wolfowitz theorem [[17](https://arxiv.org/html/2606.31449#bib.bib18)] states that $\max_{\bm{x}\in\mathcal{X}}\lVert\bm{x}\rVert^{2}_{\bm{V}(\pi_{G}(\mathcal{X}))^{-1}}\leq d$. While computing an exact G-optimal design is known to be NP-hard [grötschel2012geometric,[30](https://arxiv.org/html/2606.31449#bib.bib29)], there exist efficient algorithms (Footnote 3: polynomial in $\lvert\mathcal{X}\rvert$ and $d$) to compute approximate optimal designs (see Chapter 21, [[19](https://arxiv.org/html/2606.31449#bib.bib26)]).
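One standard way to compute an approximate design is a Frank-Wolfe iteration: repeatedly move probability mass toward the point with the largest value of $\lVert\bm{x}\rVert^{2}_{\bm{V}(\pi)^{-1}}$ until that maximum is at most $(1+\varepsilon)d$. The Python sketch below is a minimal version restricted to $d=2$ so that the matrix inverse can be written in closed form; it assumes the points span $\mathbb{R}^{2}$. The function names, the tolerance `eps`, the iteration cap, and the example points are illustrative assumptions, and the paper does not state which approximation routine it uses.

```python
def design_matrix(points, pi):
    """V(pi) = sum_x pi(x) x x^T for 2-D points, stored as (a, b, c) = (V11, V12, V22)."""
    a = sum(p * x[0] * x[0] for p, x in zip(pi, points))
    b = sum(p * x[0] * x[1] for p, x in zip(pi, points))
    c = sum(p * x[1] * x[1] for p, x in zip(pi, points))
    return a, b, c


def leverage(x, V):
    """||x||^2 in the V^{-1} norm, using the closed-form inverse of a 2x2 matrix."""
    a, b, c = V
    det = a * c - b * b
    return (c * x[0] * x[0] - 2.0 * b * x[0] * x[1] + a * x[1] * x[1]) / det


def approx_g_optimal_design(points, eps=0.05, max_iters=10000):
    """Frank-Wolfe iteration started from the uniform distribution (d = 2 only)."""
    d = 2
    pi = [1.0 / len(points)] * len(points)
    for _ in range(max_iters):
        V = design_matrix(points, pi)
        scores = [leverage(x, V) for x in points]
        k = max(range(len(points)), key=scores.__getitem__)
        g = scores[k]
        if g <= (1.0 + eps) * d:
            break  # worst-case leverage is within (1 + eps) of the bound d
        gamma = (g / d - 1.0) / (g - 1.0)  # exact line-search step size
        pi = [(1.0 - gamma) * p for p in pi]
        pi[k] += gamma
    return pi


# Hypothetical item set for one slot.
points = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (-0.3, 0.8)]
pi = approx_g_optimal_design(points)
worst = max(leverage(x, design_matrix(points, pi)) for x in points)
print(pi, worst)
```

Sampling an item for a slot then amounts to drawing an index from `pi`, for example with `random.choices(points, weights=pi)`.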

## 3 B-SlateGLinCB

In this section, we present a batched algorithm for Contextual Slate GLM Bandits, which we refer to as B-SlateGLinCB (Algorithm [1](https://arxiv.org/html/2606.31449#alg1)). First, we provide a detailed explanation of the algorithm, highlighting the non-trivialities involved in a multi-slot batched algorithm. Next, in Section [3.2](https://arxiv.org/html/2606.31449#S3.SS2), we provide a regret guarantee for it. Finally, in Section [3.3](https://arxiv.org/html/2606.31449#S3.SS3), we make some additional remarks.

Algorithm 1 B-SlateGLinCB

1: Inputs: Number of slots $N$, number of batches $M$, horizon $T$, parameter norm bound $S$, failure level $\delta$, and non-linearity $\kappa$.

2: Initialize $\bm{\theta}_{0}=\bm{0}_{Nd}$ and $\lambda=\mathcal{O}(NdR^{2}\log(T/\delta))$.

3: Define batches $\mathcal{T}_{0},\mathcal{T}_{1},\ldots,\mathcal{T}_{M}$ as per ([4](https://arxiv.org/html/2606.31449#S3.E4)).

4: {Warm-up batch}

5: for $t\in\mathcal{T}_{0}$ do

6:   Receive the set of items $\mathcal{X}^{i}_{t}$ for all slots $i\in[N]$.

7:   for $i\in[N]$ do

8:     Obtain $\bm{x}^{i}_{t}\sim\pi_{G}(\mathcal{X}^{i}_{t})$ (defined in Section [2.7](https://arxiv.org/html/2606.31449#S2.SS7)).

9:   end for

10:   Play the slate $\bm{x}_{t}=(\bm{x}^{1}_{t},\ldots,\bm{x}^{N}_{t})$ and obtain $r_{t}$.

11: end for

12: Compute $\widehat{\bm{\theta}}_{0}=\operatorname*{arg\,min}_{\bm{\theta}}\sum_{t\in\mathcal{T}_{0}}\ell(\bm{x}_{t},r_{t},\bm{\theta})$ as per ([3](https://arxiv.org/html/2606.31449#S2.E3)).

13: Define $\bm{V}^{i}_{0}=\lambda\bm{I}_{d}+\sum_{t\in\mathcal{T}_{0}}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}$ for all slots $i\in[N]$.

{Other batches}

14: for $m\in[M]$ do

15:   for $t\in\mathcal{T}_{m}$ do

16:     Receive the set of items $\mathcal{X}^{i}_{t}$ for all slots $i\in[N]$.

17:     for $i\in[N]$ do

18:       for $l\in[0,m-1]$ do

19:         {Perform elimination}

20:         $\mathcal{X}^{i}_{t}\leftarrow\{\bm{z}\in\mathcal{X}^{i}_{t}:\textrm{UCB}^{i,l}(\bm{z})\geq\max_{\bm{y}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,l}(\bm{y})\}$.

21:       end for

22:       Obtain $\bm{x}^{i}_{t}\sim\pi_{G}(\mathcal{X}^{i}_{t})$ (defined in Section [2.7](https://arxiv.org/html/2606.31449#S2.SS7)) and $\bm{b}^{i}_{t}=\operatorname*{arg\,max}_{\bm{y}\in\mathcal{X}^{i}_{t}}\lVert\bm{y}\rVert_{(\bm{V}^{i}_{0})^{-1}}$.

23:     end for

24:     Play the slate $\bm{x}_{t}=(\bm{x}^{1}_{t},\ldots,\bm{x}^{N}_{t})$ and obtain $r_{t}$.

25:     Construct the slate $\bm{b}_{t}=(\bm{b}^{1}_{t},\ldots,\bm{b}^{N}_{t})$.

26:   end for

27:   $\widehat{\bm{\theta}}_{m}=\operatorname*{arg\,min}_{\bm{\theta}}\sum_{t\in\mathcal{T}_{m}}\ell(\bm{x}_{t},r_{t},\bm{\theta})$ as per ([3](https://arxiv.org/html/2606.31449#S2.E3)).

28:   $\bm{H}^{i}_{m}=\lambda\bm{I}_{d}+\sum_{t\in\mathcal{T}_{m}}\frac{\dot{\mu}(\bm{b}_{t}^{\top}\bm{\theta}_{0})}{\beta_{t}}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\;\forall i\in[N]$, as per ([9](https://arxiv.org/html/2606.31449#S3.E9)).

29: end for

### 3.1 Algorithmic Description

B-SlateGLinCB takes as inputs the number of slots $N$, the number of batches $M$, the length of the horizon $T$, an upper bound $S$ on the $\ell_{2}$-norm of the true reward parameter $\bm{\theta}^{\star}$, the probability of failure $\delta$, and the instance-dependent non-linearity parameter $\kappa$. (Footnote 4: An upper bound on $\kappa$ and $S$ suffices [[28](https://arxiv.org/html/2606.31449#bib.bib4)].) As discussed in Section [2.3](https://arxiv.org/html/2606.31449#S2.SS3), when the number of batches is $\Omega(\log\log T)$, we only require $M=\Theta(\log\log T)$ batches. Hence, without loss of generality, we develop our algorithm for $M=\mathcal{O}(\log\log T)$ batches. First, in *Step 3*, we define the $M+1$ batches $\mathcal{T}_{0},\ldots,\mathcal{T}_{M}$ to be consecutive disjoint subsets of $[T]$ with lengths given as

\|𝒯0\|=⌊T⌋,\|𝒯m\|=⌊T1−2−m⌋∀m∈\[M\]\.\\displaystyle\\lvert\\mathcal\{T\}\_\{0\}\\rvert=\\lfloor\\sqrt\{T\}\\rfloor\\quad,\\quad\\lvert\\mathcal\{T\}\_\{m\}\\rvert=\\lfloor T^\{1\-2^\{\-m\}\}\\rfloor\\quad\\forall m\\in\[M\]\.\(4\)Warm\-up Batch: We begin with a warm\-up batch𝒯0\\mathcal\{T\}\_\{0\}\(*Steps 4\-11*\)\. At each roundt∈𝒯0t\\in\\mathcal\{T\}\_\{0\}, for each sloti∈\[N\]i\\in\[N\], we receive the set of items𝒳ti\\mathcal\{X\}^\{i\}\_\{t\}\. Then, for each sloti∈\[N\]i\\in\[N\], the algorithm samples an item𝒙ti\\bm\{x\}^\{i\}\_\{t\}from a G\-optimal design as per \([3](https://arxiv.org/html/2606.31449#S2.E3)\) computed over𝒳ti\\mathcal\{X\}^\{i\}\_\{t\}and plays the resultant slate𝒙t=\(𝒙t1,…,𝒙tN\)\\bm\{x\}\_\{t\}=\(\\bm\{x\}^\{1\}\_\{t\},\\ldots,\\bm\{x\}^\{N\}\_\{t\}\), receiving feedbackrtr\_\{t\}\. At the end of this batch, in*Step 12*, we compute an estimate𝜽^0\\widehat\{\\bm\{\\theta\}\}\_\{0\}of the reward parameters𝜽⋆\\bm\{\\theta\}^\{\\star\}, by minimizing the GLM\-MLE loss as per \([3](https://arxiv.org/html/2606.31449#S2.E3)\) over the set\{\(𝒙t,rt\)\}t∈𝒯0\\\{\(\\bm\{x\}\_\{t\},r\_\{t\}\)\\\}\_\{t\\in\\mathcal\{T\}\_\{0\}\}\. Then, in*Step 13*, for all slotsi∈\[N\]i\\in\[N\], we compute the design matrices𝑽0i=λ​𝑰d\+∑t∈𝒯0𝒙ti​𝒙ti⊤\\bm\{V\}\_\{0\}^\{i\}=\\lambda\\bm\{I\}\_\{d\}\+\\sum\_\{t\\in\\mathcal\{T\}\_\{0\}\}\\bm\{x\}^\{i\}\_\{t\}\{\\bm\{x\}^\{i\}\_\{t\}\}^\{\\top\}for all the items chosen in theit​hi^\{th\}slot in batch𝒯0\\mathcal\{T\}\_\{0\}\. Here,λ=𝒪​\(N​d​R2​log⁡\(T/δ\)\)\\lambda=\\mathcal\{O\}\(NdR^\{2\}\\log\(T/\\delta\)\)\. The estimate𝜽^0\\widehat\{\\bm\{\\theta\}\}\_\{0\}and matrices𝑽0i\\bm\{V\}\_\{0\}^\{i\}are utilized in the subsequent batches to control regret\.
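The batch schedule in (4) is easy to tabulate. The following is a minimal sketch, not the authors' implementation: the helper names `batch_lengths` and `num_batches` are hypothetical, and the specific choice $M=\lceil\log_2\log_2 T\rceil$ is only one assumed way of obtaining $M=\Theta(\log\log T)$.

```python
import math


def batch_lengths(T: int, M: int) -> list[int]:
    """Return [|T_0|, |T_1|, ..., |T_M|] following Eq. (4)."""
    lengths = [math.isqrt(T)]  # warm-up batch: floor(sqrt(T))
    for m in range(1, M + 1):
        # |T_m| = floor(T^(1 - 2^(-m)))
        lengths.append(math.floor(T ** (1.0 - 2.0 ** (-m))))
    return lengths


def num_batches(T: int) -> int:
    """Assumed choice M = ceil(log2(log2(T))); any M = Theta(log log T) would do."""
    return max(1, math.ceil(math.log2(math.log2(T))))


if __name__ == "__main__":
    T = 5000
    print(num_batches(T), batch_lengths(T, num_batches(T)))
```

Note that the lengths from (4) need not sum exactly to $T$; how the final batch is truncated to the horizon is not specified in this section, so the sketch leaves the lengths untruncated.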

Batches $m\in[M]$: In *Steps 15-28*, we execute the $m^{th}$ batch ($m\in[M]$). For each round $t\in\mathcal{T}_{m}$, we receive the $N$ sets of items $\{\mathcal{X}^{i}_{t}\}_{i\in[N]}$. For each slot $i\in[N]$, we prune $\mathcal{X}^{i}_{t}$ (*Steps 17-21*) based on a criterion we discuss next. For any slot $i\in[N]$, item $\bm{z}\in\mathcal{X}_{t}^{i}$ and prior batch $l\in[0,m-1]$, define the scores $\textrm{UCB}^{i,l}(\bm{z})$ and $\textrm{LCB}^{i,l}(\bm{z})$ (upper and lower confidence bounds, respectively) as follows:

$$\textrm{UCB}^{i,l}(\bm{z})=\begin{cases}\bm{z}^{\top}\widehat{\bm{\theta}}_{0}^{i}+2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}&l=0,\\ \bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i}+2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}}&l\neq 0.\end{cases}\tag{5}$$

$$\textrm{LCB}^{i,l}(\bm{z})=\begin{cases}\bm{z}^{\top}\widehat{\bm{\theta}}_{0}^{i}-2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}&l=0,\\ \bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i}-2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}}&l\neq 0.\end{cases}\tag{6}$$

Here, $\gamma=\mathcal{O}(SR\sqrt{Nd\log(T/\delta)})$. In the definitions above (for $l\neq 0$), the first term (i.e., $\bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i}$) uses an estimate $\widehat{\bm{\theta}}_{l}=(\widehat{\bm{\theta}}^{1}_{l},\ldots,\widehat{\bm{\theta}}_{l}^{N})$ of the true reward parameters $\bm{\theta}^{\star}$. This estimate is calculated in *Step 27* by minimizing the GLM-MLE loss as per ([3](https://arxiv.org/html/2606.31449#S2.E3)) over the set $\{(\bm{x}_{t},r_{t})\}_{t\in\mathcal{T}_{l}}$ at the end of the $l^{th}$ batch. For $l\neq 0$, the second term in ([5](https://arxiv.org/html/2606.31449#S3.E5)) and ([6](https://arxiv.org/html/2606.31449#S3.E6)) (i.e., $\pm 2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}}$) uses a slot-level scaled design matrix

$$\bm{H}_{l}^{i}=\lambda\bm{I}_{d}+\sum\limits_{t\in\mathcal{T}_{l}}\frac{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{\beta_{t}}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}.\tag{7}$$

Here, $\bm{b}_{t}=(\bm{b}_{t}^{1},\ldots,\bm{b}_{t}^{N})$ is a slate (called the *scaling-slate*) computed at round $t$, with its $i^{th}$ item defined as

$$\bm{b}^{i}_{t}=\operatorname*{arg\,max}\limits_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}.\tag{8}$$

This *scaling-slate* $\bm{b}_{t}$ is used to construct the scaling term $\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})$ in $\bm{H}_{l}^{i}$, which is then normalized using

$$\beta_{t}=\exp\left(\min\left\{2S,\;6\sqrt{\kappa}\gamma\sum_{i\in[N]}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right\}\right).\tag{9}$$

We provide more details on the choice of these scaled design matrices $\bm{H}_{l}^{i}$ in Section [3.1.1](https://arxiv.org/html/2606.31449#S3.SS1.SSS1). Finally, using the scores defined in ([5](https://arxiv.org/html/2606.31449#S3.E5)) and ([6](https://arxiv.org/html/2606.31449#S3.E6)), we eliminate all items $\bm{z}\in\mathcal{X}_{t}^{i}$ for which $\textrm{UCB}^{i,l}(\bm{z})<\textrm{LCB}^{i,l}(\bm{y})$ for some previous batch $l\in[0,m-1]$ and some $\bm{y}\in\mathcal{X}_{t}^{i}$. Then, in *Step 22*, we construct a G-optimal design on the remaining items in $\mathcal{X}_{t}^{i}$ and sample an item $\bm{x}_{t}^{i}$ from it. After completing this procedure for all slots, we play the constructed slate $\bm{x}_{t}=(\bm{x}_{t}^{1},\ldots,\bm{x}_{t}^{N})$ and receive the reward $r_{t}$. We then compute the scaled design matrices $\bm{H}_{m}^{i}$ (*Step 28*), which are used in the eliminations performed during the later batches $l\in[m+1,M]$.
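The per-slot pruning rule and the scaling quantities in (5)-(9) can be sketched in NumPy as follows. This is an illustrative sketch under stated assumptions, not the authors' code: all function names are hypothetical, the confidence width (`2*sqrt(kappa)*gamma` for $l=0$, `2*gamma` otherwise) is passed in by the caller, and the pruning is assumed to be applied successively, one prior batch at a time, to the set surviving the previous pass.

```python
import numpy as np


def weighted_norm(z, A_inv):
    """Return ||z||_{A^{-1}} = sqrt(z^T A^{-1} z)."""
    return float(np.sqrt(z @ A_inv @ z))


def confidence_scores(items, theta_hat_i, A, width):
    """UCB and LCB of every row of `items` (K x d), as in Eqs. (5)-(6).

    `A` stands for V_0^i when l = 0 and for H_l^i when l != 0.
    """
    A_inv = np.linalg.inv(A)
    means = items @ theta_hat_i
    bonus = width * np.array([weighted_norm(z, A_inv) for z in items])
    return means + bonus, means - bonus


def eliminate(items, prior_batches):
    """Drop items whose UCB falls below the best LCB of some prior batch.

    `prior_batches` holds one (theta_hat_i, A, width) triple per batch l.
    """
    for theta_hat_i, A, width in prior_batches:
        ucb, lcb = confidence_scores(items, theta_hat_i, A, width)
        items = items[ucb >= lcb.max()]  # the max-LCB item always survives
    return items


def scaling_item(items, V0):
    """Eq. (8): the item with the largest norm under (V_0^i)^{-1}."""
    V0_inv = np.linalg.inv(V0)
    return items[int(np.argmax([weighted_norm(z, V0_inv) for z in items]))]


def beta_t(scaling_slate, V0_list, S, kappa, gamma):
    """Eq. (9): normalisation applied to the scaling term."""
    total = sum(weighted_norm(b, np.linalg.inv(V0))
                for b, V0 in zip(scaling_slate, V0_list))
    return float(np.exp(min(2.0 * S, 6.0 * np.sqrt(kappa) * gamma * total)))
```

Because every quantity here is computed slot by slot on $d \times d$ matrices, the cost per round scales with $N$ and $K$ rather than with the $K^{N}$ possible slates.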

#### 3.1.1 Scaled Matrices $\bm{H}_{m}^{i}$

As described earlier, the scores $\textrm{UCB}^{i,l}$ and $\textrm{LCB}^{i,l}$ for slot $i\in[N]$ and batch $l\in[M]$, used during elimination (*Steps 19-20* in Algorithm [1](https://arxiv.org/html/2606.31449#alg1)), use the slot-level scaled design matrix $\bm{H}_{l}^{i}$ defined in ([7](https://arxiv.org/html/2606.31449#S3.E7)). Moreover, the slate is also constructed by sampling items (*Step 22*) for each slot separately. This ensures that the per-round time complexity grows as $\text{poly}(K,N)$. We now explain why it also helps us obtain a $\kappa$-free optimal regret guarantee. First, using Assumption [2.1](https://arxiv.org/html/2606.31449#S2.Thmassumption1) along with a recent technique from [[12](https://arxiv.org/html/2606.31449#bib.bib5)], in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136) (Appendix [A](https://arxiv.org/html/2606.31449#A1)) we show that the block diagonal matrix $\operatorname{diag}(\bm{H}_{l}^{1},\ldots,\bm{H}_{l}^{N})$ is multiplicatively equivalent to the matrix

$$\tilde{\bm{H}}_{l}=\lambda\bm{I}_{Nd}+\sum\limits_{t\in\mathcal{T}_{l}}\left(\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})/\beta_{t}\right)\bm{x}_{t}\bm{x}_{t}^{\top},$$

where $\{\bm{x}_{t}\}_{t\in\mathcal{T}_{l}}$ is the sequence of actions played during the $l^{th}$ batch and $\beta_{t}$ is the normalization term defined in ([9](https://arxiv.org/html/2606.31449#S3.E9)). Then, we show that $\tilde{\bm{H}}_{l}\preccurlyeq\bm{H}_{l}^{\star}$, where $\bm{H}_{l}^{\star}$ is the Hessian of the GLM-MLE loss (Section [2.6](https://arxiv.org/html/2606.31449#S2.SS6)) computed on $\{(\bm{x}_{t},r_{t})\}_{t\in\mathcal{T}_{l}}$, i.e.,

$$\bm{H}_{l}^{\star}=\lambda\bm{I}_{Nd}+\sum\limits_{t\in\mathcal{T}_{l}}\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\bm{x}_{t}\bm{x}_{t}^{\top}.$$

It is well known from recent literature [[28](https://arxiv.org/html/2606.31449#bib.bib4), [9](https://arxiv.org/html/2606.31449#bib.bib14)] that $\bm{H}_{l}^{\star}$ can be used to construct confidence sets for the MLE estimator $\widehat{\bm{\theta}}_{l}$ obtained by minimizing the GLM-MLE loss on $\{(\bm{x}_{t},r_{t})\}_{t\in\mathcal{T}_{l}}$. To be precise, for any $\delta\in(0,1)$, we can show that

$$\mathbb{P}\left[\lVert\widehat{\bm{\theta}}_{l}-\bm{\theta}^{\star}\rVert_{\bm{H}_{l}^{\star}}\leq\mathcal{O}\left(\sqrt{Nd\log(T/\delta)}\right)\right]\geq 1-\delta.\tag{10}$$

Since at round $t\in\mathcal{T}_{l}$ we selected the item $\bm{x}_{t}^{i}$ for slot $i$ by sampling from a G-optimal design constructed on $\mathcal{X}_{t}^{i}$ (after elimination), we get the optimal design bound (Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9), Appendix A):

$$\operatorname*{\mathbb{E}}\left[\left\lVert\bm{x}_{t}^{i}\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})/\beta_{t}}\right\rVert_{(\bm{H}_{l}^{i})^{-1}}\right]=\mathcal{O}\left(\frac{d}{\sqrt{|\mathcal{T}_{l}|}}\right).$$

Multiplicative equivalence between $\operatorname{diag}(\bm{H}_{l}^{1},\ldots,\bm{H}_{l}^{N})$ and $\tilde{\bm{H}}_{l}$ then implies that $\left\lVert\bm{x}_{t}\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})/\beta_{t}}\right\rVert_{(\tilde{\bm{H}}_{l})^{-1}}$ and $\sum\limits_{i\in[N]}\left\lVert\bm{x}_{t}^{i}\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})/\beta_{t}}\right\rVert_{(\bm{H}_{l}^{i})^{-1}}$ are also multiplicatively equivalent, so that

$$\operatorname*{\mathbb{E}}\left[\left\lVert\bm{x}_{t}\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})/\beta_{t}}\right\rVert_{(\tilde{\bm{H}}_{l})^{-1}}\right]=\mathcal{O}\left(\frac{Nd}{\sqrt{|\mathcal{T}_{l}|}}\right).\tag{11}$$

The bounds in ([10](https://arxiv.org/html/2606.31449#S3.E10)) and ([11](https://arxiv.org/html/2606.31449#S3.E11)) are key to proving $\kappa$-independent regret guarantees that are optimal in $T$.
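To see how the two bounds fit together, the following step may help. It is only a sketch using the Cauchy-Schwarz inequality and the ordering $\tilde{\bm{H}}_{l}\preccurlyeq\bm{H}_{l}^{\star}$ stated above, and is not a reproduction of the proof in Appendix A. For any $\bm{x}\in\mathbb{R}^{Nd}$:

```latex
\begin{aligned}
\bigl\lvert \bm{x}^{\top}(\widehat{\bm{\theta}}_{l}-\bm{\theta}^{\star}) \bigr\rvert
&\le \lVert \bm{x} \rVert_{(\bm{H}_{l}^{\star})^{-1}}
     \,\lVert \widehat{\bm{\theta}}_{l}-\bm{\theta}^{\star} \rVert_{\bm{H}_{l}^{\star}}
  && \text{(Cauchy-Schwarz)} \\
&\le \lVert \bm{x} \rVert_{(\tilde{\bm{H}}_{l})^{-1}}
     \,\lVert \widehat{\bm{\theta}}_{l}-\bm{\theta}^{\star} \rVert_{\bm{H}_{l}^{\star}}
  && \text{(since } \tilde{\bm{H}}_{l}\preccurlyeq\bm{H}_{l}^{\star}
     \;\Rightarrow\; (\bm{H}_{l}^{\star})^{-1}\preccurlyeq(\tilde{\bm{H}}_{l})^{-1}\text{)}.
\end{aligned}
```

The second factor is the quantity controlled by (10), and the first is of the form controlled by (11) for the scaled played slates; neither right-hand side carries a factor of $\kappa$.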

### 3.2 Regret Guarantee for B-SlateGLinCB

In Theorem [3.1](https://arxiv.org/html/2606.31449#S3.Thmtheorem1) we present our regret guarantee for B-SlateGLinCB and provide the proof in Appendix [A](https://arxiv.org/html/2606.31449#A1).

###### Theorem 3\.1\.

At the end of $T$ rounds, B-SlateGLinCB (Algorithm [1](https://arxiv.org/html/2606.31449#alg1)) incurs a regret $R(T)$ which can be bounded as

$$R(T)=\tilde{\mathcal{O}}\left(RSNd^{3/2}\sqrt{T\cdot\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i\in[N]}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\right),$$

where $\bm{x}_{\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}}\bm{x}^{\top}\bm{\theta}^{\star}$ and $\mathcal{X}=\mathcal{X}^{1}\times\ldots\times\mathcal{X}^{N}$.
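As an illustrative special case (not part of the theorem statement), consider the logistic link $\mu(z)=1/(1+e^{-z})$. Its derivative is bounded by $1/4$, so the instance-dependent factor under the square root can only help:

```latex
\dot{\mu}(z)=\mu(z)\bigl(1-\mu(z)\bigr)\le \tfrac{1}{4}
\quad\Longrightarrow\quad
RSNd^{3/2}\sqrt{T\cdot\operatorname*{\mathbb{E}}\,\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}
\;\le\; \tfrac{1}{2}\,RSNd^{3/2}\sqrt{T}.
```

Under this assumption the bound reduces to the $\tilde{\mathcal{O}}(Nd^{3/2}\sqrt{T})$ rate, treating $R$ and $S$ as constants.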

### 3.3 Additional Remarks

## 4 RS-SlateGLinCB

Algorithm 2 RS-SlateGLinCB

1: Inputs: Number of slots $N$, horizon $T$, parameter norm bound $S$, failure level $\delta$, non-linearity $\kappa$.

2: Initialize the warm-up batch $\mathcal{T}_{0}=\lfloor\sqrt{T}\rfloor$ and $\bm{V}^{i}=\lambda\bm{I}$ for all $i\in[N]$, where $\lambda=\mathcal{O}\left(NdR^{2}S^{-1}\log(T\delta^{-1})\right)$. // Warm-up batch

3: for $t\in[\mathcal{T}_{0}]$ do

4: Receive the set of items $\mathcal{X}^{i}_{t}$ for all slots $i\in[N]$.

5: Select $\bm{x}^{i}_{t}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}^{i}_{t}}\lVert\bm{x}\rVert_{(\bm{V}_{t}^{i})^{-1}}$, play the slate $\bm{x}_{t}=(\bm{x}^{1}_{t},\ldots,\bm{x}^{N}_{t})$ and obtain reward $r_{t}$.

6: For all $i\in[N]$, update $\bm{V}^{i}\leftarrow\bm{V}^{i}+\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}$.

7: end for

8: Update $\widehat{\bm{\theta}}_{0}=\operatorname*{arg\,min}_{\bm{\theta}}\sum_{t\in[\mathcal{T}_{0}]}\ell(\bm{x}_{t},r_{t},\bm{\theta})$.

9: Set $s=|\mathcal{T}_{0}|$, $\bm{H}_{s}=\lambda\bm{I}_{Nd}$, $\bm{H}^{i}_{s}=\lambda\bm{I}_{d}$ $\forall i\in[N]$.

10: for $t\in[s+1,T]$ do

11: Receive the set of items $\mathcal{X}^{i}_{t}$ for all slots $i\in[N]$. // Determinant condition

12: if $\det(\bm{H}_{t})\geq 2\,\det(\bm{H}_{s})$ then

13: Compute $\widehat{\bm{\theta}}_{s}=\operatorname*{arg\,min}_{\bm{\theta}}\sum_{t^{\prime}=s+1}^{t-1}\ell(\bm{x}_{t^{\prime}},r_{t^{\prime}},\bm{\theta})$ and update $s\leftarrow t$.

14: end if

15: for each slot $i\in[N]$ do

16: $\mathcal{X}^{i}_{t}\leftarrow\{\bm{z}\in\mathcal{X}^{i}_{t}:\textrm{UCB}^{i,0}(\bm{z})\geq\max\limits_{\bm{y}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{y})\}$.

17: Select $\bm{x}^{i}_{t}=\operatorname*{arg\,max}\limits_{\bm{z}\in\mathcal{X}_{t}^{i}}\{\bm{z}^{\top}\widehat{\bm{\theta}}_{s}^{i}+2\sqrt{2}\beta\lVert\bm{z}\rVert_{(\bm{H}^{i}_{t})^{-1}}\}$.

18: end for

19: Play slate $\bm{x}_{t}=(\bm{x}^{1}_{t},\ldots,\bm{x}^{N}_{t})$ and obtain reward $r_{t}$.

20: Update the matrix $\bm{H}_{t+1}\leftarrow\bm{H}_{t}+\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})e^{-1}\bm{x}_{t}\bm{x}_{t}^{\top}$ and $\bm{H}^{i}_{t+1}\leftarrow\bm{H}^{i}_{t}+\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})e^{-1}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\;\forall i\in[N]$.

21: end for

In this section, we describe a rarely-switching algorithm for Slate GLM Bandits, which we refer to as RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)). RS-SlateGLinCB employs a switching condition to adaptively determine policy updates. We first describe the algorithm and then, in Section [4.2](https://arxiv.org/html/2606.31449#S4.SS2), provide guarantees for RS-SlateGLinCB along with some remarks.

### 4.1 Algorithmic Description

RS-SlateGLinCB takes as inputs the number of slots $N$, the length of the horizon $T$, an upper bound $S$ on the $\ell_{2}$-norm of the true reward parameters $\bm{\theta}^{\star}$, the probability of failure $\delta$, and the instance-dependent non-linearity $\kappa$. (Footnote 5: An upper bound on $\kappa$ and $S$ suffices.)

Warm-up Batch: The algorithm begins with a warm-up batch $\mathcal{T}_{0}$ (*Steps 3-7*) comprising $\lfloor\sqrt{T}\rfloor$ rounds. At each round $t\in\mathcal{T}_{0}$, in *Steps 4-6*, the algorithm observes the $N$ different item-sets $\mathcal{X}^{i}_{t}$ and, for each slot $i\in[N]$, selects $\bm{x}^{i}_{t}=\operatorname*{arg\,max}_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{V}^{i}_{t})^{-1}}$, where $\bm{V}^{i}_{t}$ is defined via

$$\bm{V}^{i}_{t}=\bm{V}^{i}_{t-1}+\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top},\qquad\bm{V}^{i}_{0}=\lambda\bm{I},$$

and $\lambda=\mathcal{O}(NdR^{2}S^{-1}\log(T/\delta))$. The algorithm plays the resulting slate $\bm{x}_{t}=(\bm{x}^{1}_{t},\ldots,\bm{x}^{N}_{t})$ and receives the corresponding reward $r_{t}$. At the end of this warm-up batch, in *Step 8*, we compute $\widehat{\bm{\theta}}_{0}$ by minimizing the GLM-MLE loss as per ([3](https://arxiv.org/html/2606.31449#S2.E3)) over the set $\{(\bm{x}_{t},r_{t})\}_{t\in\mathcal{T}_{0}}$. As in B-SlateGLinCB, $\widehat{\bm{\theta}}_{0}$ is used to define scaled design matrices that help us obtain $\kappa$-free regret, as we discuss next.
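The warm-up rule above (pick the most uncertain item per slot, then apply a rank-one update) can be sketched as follows. This is a hedged illustration rather than the authors' code: the names `warmup_select` and `warmup_round` are hypothetical, and the sketch assumes each slot's matrix is updated immediately after the item for that slot is chosen.

```python
import numpy as np


def warmup_select(items, V):
    """Index of the row of `items` (K x d) that maximises ||z||_{V^{-1}}."""
    V_inv = np.linalg.inv(V)
    # z^T V^{-1} z for every row at once; the square root does not change the argmax
    sq_norms = np.einsum("kd,de,ke->k", items, V_inv, items)
    return int(np.argmax(sq_norms))


def warmup_round(item_sets, V_list):
    """One warm-up round: choose one item per slot and update each V^i in place."""
    slate = []
    for i, items in enumerate(item_sets):
        x = items[warmup_select(items, V_list[i])]
        V_list[i] += np.outer(x, x)  # rank-one update V^i <- V^i + x x^T
        slate.append(x)
    return np.concatenate(slate)  # the slate as a vector of length N * d
```

With two slots whose item sets are the standard basis of $\mathbb{R}^{2}$ and $\bm{V}^{i}=\bm{I}$, the first round picks the first basis vector in each slot (ties go to the lowest index) and the second round picks the other one, since its direction is now the less explored.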

Rarely-Switching Algorithm: In *Steps 9-20*, we execute the rest of the rarely-switching algorithm. For each round $t\in[|\mathcal{T}_{0}|+1,T]$ and each slot $i\in[N]$, we define a slot-level scaled design matrix

$$\bm{H}^{i}_{t}=\lambda\bm{I}_{d}+\sum\limits_{s=|\mathcal{T}_{0}|+1}^{t-1}\left(\dot{\mu}(\bm{x}_{s}^{\top}\widehat{\bm{\theta}}_{0})/e\right)\bm{x}^{i}_{s}{\bm{x}^{i}_{s}}^{\top}.$$

We also define a slate-level scaled design matrix

$$\bm{H}_{t}=\lambda\bm{I}_{Nd}+\sum\limits_{s=|\mathcal{T}_{0}|+1}^{t-1}\left(\dot{\mu}(\bm{x}_{s}^{\top}\widehat{\bm{\theta}}_{0})/e\right)\bm{x}_{s}\bm{x}_{s}^{\top}.$$

Similar to our discussion in Section [3.1.1](https://arxiv.org/html/2606.31449#S3.SS1.SSS1), we can show that $\bm{H}_{t}\preccurlyeq\bm{H}_{t}^{\star}$, where $\bm{H}_{t}^{\star}$ is the Hessian of the GLM-MLE loss (Section [2.6](https://arxiv.org/html/2606.31449#S2.SS6)) computed on the pairs $\{(\bm{x}_{s},r_{s})\}_{s=|\mathcal{T}_{0}|+1}^{t-1}$, and is defined as

$$\bm{H}^{\star}_{t}=\lambda\bm{I}_{Nd}+\sum\limits_{s=|\mathcal{T}_{0}|+1}^{t-1}\dot{\mu}(\bm{x}_{s}^{\top}\bm{\theta}^{\star})\bm{x}_{s}\bm{x}_{s}^{\top}.$$

After receiving the set of items $\mathcal{X}_{t}^{i}$ for all slots $i\in[N]$, in *Step 12*, we check a *determinant condition* indicating whether the true parameter $\bm{\theta}^{\star}$ needs to be re-estimated. In particular, we check whether the determinant of $\bm{H}_{t}$ is at least double the determinant of $\bm{H}_{s}$, where $s<t$ is the last round at which an estimate $\widehat{\bm{\theta}}_{s}$ of $\bm{\theta}^{\star}$ was computed. If so, in *Step 13*, we compute $\widehat{\bm{\theta}}_{t}$ by minimizing the GLM-MLE loss over all rounds $t^{\prime}\in[|\mathcal{T}_{0}|+1,t-1]$, and update $s=t$. Then, regardless of whether the determinant condition held, in *Step 16*, for each slot $i\in[N]$, we eliminate all items $\bm{z}\in\mathcal{X}_{t}^{i}$ that satisfy $\textrm{UCB}^{i,0}(\bm{z})<\textrm{LCB}^{i,0}(\bm{y})$ for some item $\bm{y}\in\mathcal{X}^{i}_{t}$. Here, $\textrm{UCB}^{i,0}$ and $\textrm{LCB}^{i,0}$ are the scores defined in ([5](https://arxiv.org/html/2606.31449#S3.E5)) and ([6](https://arxiv.org/html/2606.31449#S3.E6)), respectively, with $l=0$. Finally, in *Step 17*, from the remaining items in $\mathcal{X}_{t}^{i}$, we select $\bm{x}_{t}^{i}$ such that

$$\bm{x}_{t}^{i}=\operatorname*{arg\,max}\limits_{\bm{z}\in\mathcal{X}_{t}^{i}}\left\{\bm{z}^{\top}\widehat{\bm{\theta}}_{s}^{i}+2\sqrt{2}\beta\lVert\bm{z}\rVert_{(\bm{H}^{i}_{t})^{-1}}\right\},$$

where $\beta=\mathcal{O}(R\sqrt{NdS\log(T/\delta)})$ and $s\leq t$ is the last round at which an estimate $\widehat{\bm{\theta}}_{s}=(\widehat{\bm{\theta}}_{s}^{1},\ldots,\widehat{\bm{\theta}}_{s}^{N})$ of $\bm{\theta}^{\star}$ was computed. We play the slate $\bm{x}_{t}=(\bm{x}_{t}^{1},\ldots,\bm{x}_{t}^{N})$ and receive the reward $r_{t}$. We then update $\bm{H}_{t}^{i}$ ($i\in[N]$) and $\bm{H}_{t}$.
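The determinant condition can be tracked cheaply alongside the rank-one updates of $\bm{H}_{t}$. The sketch below is an assumption-laden illustration, not the authors' implementation: the class `SwitchTracker` is hypothetical, and comparing log-determinants via `numpy.linalg.slogdet` (instead of raw determinants) is a numerical choice made here to avoid overflow in dimension $Nd$.

```python
import numpy as np


class SwitchTracker:
    """Maintains H_t and reports when det(H_t) >= 2 * det(H_s)."""

    def __init__(self, dim, lam):
        self.H = lam * np.eye(dim)
        self.logdet_at_switch = dim * np.log(lam)  # log det(lam * I)
        self.num_switches = 0

    def should_switch(self):
        # det(H_t) >= 2 det(H_s)  <=>  log det(H_t) - log det(H_s) >= log 2
        _, logdet = np.linalg.slogdet(self.H)
        if logdet - self.logdet_at_switch >= np.log(2.0):
            self.logdet_at_switch = logdet  # H_s is now the current matrix
            self.num_switches += 1
            return True
        return False

    def update(self, x, weight):
        """Rank-one update H <- H + weight * x x^T (weight plays the role of mu_dot(.)/e)."""
        self.H += weight * np.outer(x, x)
```

In a full loop, `should_switch()` would be called once per round after the items arrive; only when it returns `True` is the (comparatively expensive) MLE recomputed.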

### 4.2 Guarantees and Remarks for RS-SlateGLinCB

In Theorem [4.1](https://arxiv.org/html/2606.31449#S4.Thmtheorem1) and Lemma [4.1](https://arxiv.org/html/2606.31449#S4.Thmlemma1), we present the regret guarantee for RS-SlateGLinCB and a bound on the number of parameter updates it makes, respectively. We provide the proofs in Appendix [C](https://arxiv.org/html/2606.31449#A3).

###### Theorem 4\.1\.

At the end of $T$ rounds, RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)) incurs a regret $R(T)$ which can be bounded as

$$R(T)=\tilde{\mathcal{O}}\left(R\sqrt{T}+RS^{1/2}Nd\sqrt{\sum_{t\in[T]}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\right).$$

###### Lemma 4\.1\.

During $T$ rounds, RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)) updates its policy at most $\mathcal{O}(Nd\log T)$ times.
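The $\mathcal{O}(Nd\log T)$ count is what one would expect from a standard determinant-doubling argument. The following sketch is not taken from Appendix C; it assumes, purely for illustration, that every slate satisfies $\lVert\bm{x}_{s}\rVert_{2}\le 1$ and that the weights $\dot{\mu}(\cdot)/e$ are bounded by some constant $L$. If $k$ switches occur, then:

```latex
\begin{aligned}
2^{k}\lambda^{Nd}
&\le \det(\bm{H}_{T})
  && \text{(each switch at least doubles the determinant, starting from } \det(\lambda\bm{I}_{Nd})\text{)} \\
&\le \left(\frac{\operatorname{tr}(\bm{H}_{T})}{Nd}\right)^{Nd}
  && \text{(AM-GM on the eigenvalues)} \\
&\le \left(\lambda+\frac{LT}{Nd}\right)^{Nd}
  && \text{(each term adds at most } L \text{ to the trace)},
\end{aligned}
\qquad\text{so}\qquad
k \le Nd\,\log_{2}\!\left(1+\frac{LT}{\lambda Nd}\right)=\mathcal{O}(Nd\log T).
```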

## 5 Experiments

### 5.1 Synthetic Experiments

In this section, we first empirically compare our algorithms, B-SlateGLinCB (Algorithm [1](https://arxiv.org/html/2606.31449#alg1)) and RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)), to other baseline algorithms that accommodate limited adaptivity. (Footnote 6: We use a logistic reward model for all our experiments.) These include RS-GLinCB (Algorithm 2, [[28](https://arxiv.org/html/2606.31449#bib.bib4)]), RS-MNL (Algorithm 3, [[21](https://arxiv.org/html/2606.31449#bib.bib8)]), and a modified version of SoftBatch (Algorithm 5, [[13](https://arxiv.org/html/2606.31449#bib.bib7)]). We are not aware of any limited adaptivity algorithm specifically designed for the slate setting. We then compare the regret of our algorithms to Slate-GLM-OFU (Algorithm 1, [[12](https://arxiv.org/html/2606.31449#bib.bib5)]), which is fully adaptive, i.e., its parameters are updated at every round. To the best of our knowledge, this is the only contextual slate bandit algorithm designed for GLM rewards. In Appendix [E](https://arxiv.org/html/2606.31449#A5), we provide implementation details for all algorithms and report additional experiments.

Experimental Design: At each $t\in[T]$, for each slot $i\in[N]$, the set of items $\mathcal{X}^{i}_{t}\subset\mathbb{R}^{d}$ is chosen such that $|\mathcal{X}^{i}_{t}|=K=5$ and $d=5$. Each item in $\mathcal{X}_{t}^{i}$ is sampled from $[-1,1]^{5}$ and is normalized to have $\ell_{2}$-norm $1/\sqrt{N}$, where $N$ varies depending on the experiment setting. We randomly select $\bm{\theta}^{\star}$ from $[-1,1]^{Nd}$ and normalize it to have $\ell_{2}$-norm $S$. For our algorithms, we set $\delta=1/N^{2}$, which puts it in the range $[0.004,0.04]$ for the values of $N$ used. For the baselines, we use the default values of $\delta$ provided in the corresponding implementations. (Footnote 7: These are of the same order as ours.) We vary $S,N$ to create two different experiment settings capturing low and high regimes of $\kappa$ and the number of slates $K^{N}$: E1: $(S,N)=(2,5)$, resulting in $K^{N}=3125$ slates with dimension $Nd=25$ and $\kappa\approx 7.38$. E2: $(S,N)=(5,10)$, resulting in $K^{N}=9765625$ slates with dimension $Nd=50$ and $\kappa\approx 150$. We run our experiments for $T\in\{5000\cdot m:m\in[4]\}$ rounds and average over 25 different seeds for sampling rewards.
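A synthetic instance of this kind could be generated roughly as follows. This is a sketch under assumptions, not the authors' experimental code: the function names are hypothetical, and sampling uniformly from the cube is an assumption, since the text only says items and $\bm{\theta}^{\star}$ are drawn from $[-1,1]^{5}$ and $[-1,1]^{Nd}$.

```python
import numpy as np


def make_instance(N, S, K=5, d=5, seed=0):
    """Sample theta_star (l2-norm S) and one round of item sets (each item has norm 1/sqrt(N))."""
    rng = np.random.default_rng(seed)
    theta = rng.uniform(-1.0, 1.0, size=N * d)
    theta *= S / np.linalg.norm(theta)
    items = rng.uniform(-1.0, 1.0, size=(N, K, d))
    # rescale every item so that its l2-norm equals 1/sqrt(N)
    items /= np.linalg.norm(items, axis=2, keepdims=True) * np.sqrt(N)
    return theta, items


def logistic_reward(slate, theta, rng):
    """Bernoulli reward whose mean is the logistic function of slate^T theta."""
    p = 1.0 / (1.0 + np.exp(-slate @ theta))
    return int(rng.random() < p)


if __name__ == "__main__":
    theta, items = make_instance(N=5, S=2.0)
    slate = items[:, 0, :].reshape(-1)  # pick the first item in every slot
    print(np.linalg.norm(slate), 5 ** 5, 5 ** 10)
```

With this normalization any slate has unit $\ell_{2}$-norm, so $\lvert\bm{x}^{\top}\bm{\theta}^{\star}\rvert\le S$, which is what ties $S$ to the size of $\kappa$ in the two settings.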

#### 5.1.1 Results

Comparison with limited adaptivity algorithms: We see in Figures [1(a)](https://arxiv.org/html/2606.31449#S5.F1.sf1) and [1(b)](https://arxiv.org/html/2606.31449#S5.F1.sf2) that our algorithms B-SlateGLinCB and RS-SlateGLinCB achieve sublinear regret and significantly outperform the limited adaptivity baselines in settings E1 and E2, respectively. These results also provide strong empirical support for our $\kappa$-free regret guarantees in Theorems [3.1](https://arxiv.org/html/2606.31449#S3.Thmtheorem1) and [4.1](https://arxiv.org/html/2606.31449#S4.Thmtheorem1). We also observe that the regret of RS-SlateGLinCB is lower than that of B-SlateGLinCB in both regimes, which can possibly be attributed to better constants as well as the $\sqrt{d}$ gap between the bounds provided in our theorems.

Comparison with a fully adaptive algorithm: In Figures [1(c)](https://arxiv.org/html/2606.31449#S5.F1.sf3) and [1(d)](https://arxiv.org/html/2606.31449#S5.F1.sf4) we compare the regret of our algorithms with that of the fully adaptive slate bandit algorithm Slate-GLM-OFU. Since Slate-GLM-OFU is not constrained by limited adaptivity, its parameters are updated at every round. Hence, we expect its regret to be lower than that of our algorithms. However, we observe that for both settings E1 and E2, the gap between Slate-GLM-OFU and RS-SlateGLinCB is quite small. In Figures [1(c)](https://arxiv.org/html/2606.31449#S5.F1.sf3) and [1(d)](https://arxiv.org/html/2606.31449#S5.F1.sf4), we also include a slight modification of B-SlateGLinCB, which we refer to as B-SlateGLinCB+, and notice that its regret is extremely close to that of Slate-GLM-OFU. Like B-SlateGLinCB, B-SlateGLinCB+ is a batched algorithm with only $\mathcal{O}(\log\log T)$ parameter updates; however, it modifies *Step 18* of B-SlateGLinCB to perform fewer eliminations, i.e., instead of iterating over all previous batches $l\in[0,m-1]$, it only checks the elimination condition in *Step 20* for $l=m-1$. While this clearly reduces the per-round time complexity, we observe empirically that it also incurs much lower regret. It would be interesting to study the conditions under which one can prove strong regret bounds for such heuristics. In Appendix [F](https://arxiv.org/html/2606.31449#A6), we provide additional insights and experiments for B-SlateGLinCB+.

![Refer to caption](https://arxiv.org/html/2606.31449v1/x1.png) (a) Experiment setting E1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x2.png) (b) Experiment setting E2

Figure 1: Comparison with limited adaptivity algorithms, SoftBatch, RS-MNL and RS-GLinCB

![Refer to caption](https://arxiv.org/html/2606.31449v1/x3.png) (c) Experiment setting E1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x4.png) (d) Experiment setting E2

Figure 2: Comparison with the sequentially adaptive slate bandit algorithm Slate-GLM-OFU

### 5.2 Real World Experiments: Prompt Tuning

Next, we employ B-SlateGLinCB+ to perform prompt tuning for language models through exemplar selection.

Experimental Design: All experiments are conducted using RoBERTa-large [[20](https://arxiv.org/html/2606.31449#bib.bib25)] as the base model and Nomic-Embed-Text-v1.5 [[22](https://arxiv.org/html/2606.31449#bib.bib27)] as the embedding model on a binary sentiment classification task, namely, the SST-2 dataset [[29](https://arxiv.org/html/2606.31449#bib.bib28)]. The instruction prompt is fixed *a priori* and is designed in the form of a slate, where each of the $N$ *slots* corresponds to an exemplar. At each round, the algorithm is presented with a query and $N$ (different) pools consisting of $K$ candidate examples each. The algorithm is then required to select an exemplar (*item*) from each of the candidate pools to construct the prompt (*slate*). We choose $N=6$ and $K=9$.

At each round, we construct the arm-sets as follows: the feature vector for each candidate example is a concatenation of three components: a joint embedding of the query presented in that round and the candidate example, the true label of the candidate example, and a pair of scores that measure the similarity between the query and the candidate example. We describe the experimental setup in complete detail in Appendix [G](https://arxiv.org/html/2606.31449#A7).
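The three-part feature vector described above amounts to a simple concatenation. The sketch below is hypothetical: the function name, the encoding of the label as a single scalar, and the ordering of the components are all assumptions made for illustration, and how the joint embedding and similarity scores are actually computed is left to Appendix G.

```python
import numpy as np


def candidate_features(joint_embedding, label, similarity_scores):
    """Concatenate [joint embedding | label | two similarity scores] into one item vector."""
    emb = np.asarray(joint_embedding, dtype=float).ravel()
    scores = np.asarray(similarity_scores, dtype=float).ravel()
    if scores.shape != (2,):
        raise ValueError("expected a pair of similarity scores")
    # assumed encoding: the binary label enters as a single scalar feature
    return np.concatenate([emb, [float(label)], scores])
```

For an embedding of length $p$, each item vector then has dimension $d=p+3$, and a slate of $N$ exemplars has dimension $N(p+3)$.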

Baselines and Results: We choose the following algorithms as our baselines: (i) the base language model without any exemplars, (ii) the base language model with exemplars chosen randomly at each round (and hence, with no learning), and (iii) the fully adaptive Slate-GLM-OFU, which updates its policy at each round. In Figure [3](https://arxiv.org/html/2606.31449#S5.F3), we report the average cumulative accuracy of our algorithm B-SlateGLinCB+ against that obtained by the baselines over an augmented test set consisting of 4870 queries. We see that B-SlateGLinCB+ achieves substantially higher accuracy than baselines (i) and (ii). Moreover, even though it performs only $\mathcal{O}(\log\log T)$ updates, it is highly competitive with Slate-GLM-OFU, showcasing its utility in practical scenarios.

![Refer to caption](https://arxiv.org/html/2606.31449v1/x5.png) Figure 3: Prompt Tuning on SST-2

## 6 Conclusions

We present a batched algorithm B-SlateGLinCB and a rarely-switching algorithm RS-SlateGLinCB for slate GLM bandits with bandit feedback. Under Assumption [2.1](https://arxiv.org/html/2606.31449#S2.Thmassumption1), we prove that B-SlateGLinCB and RS-SlateGLinCB incur $\mathcal{O}(Nd^{3/2}\sqrt{T})$ and $\mathcal{O}(Nd\sqrt{T})$ regret, respectively, while having $\text{poly}(N)$ per-round time complexity. Empirically, we show that our algorithms outperform all baseline limited adaptivity algorithms. At the same time, RS-SlateGLinCB is quite competitive with the fully adaptive Slate-GLM-OFU algorithm (Algorithm 1, [[12](https://arxiv.org/html/2606.31449#bib.bib5)]). We also show that the regret of a modified algorithm, B-SlateGLinCB+, matches that of Slate-GLM-OFU. Finally, we implement prompt tuning with exemplar selection on language models using B-SlateGLinCB+ and demonstrate strong performance on a binary classification task. In fact, our performance matches that of the fully adaptive Slate-GLM-OFU algorithm. Developing batched algorithms with provably optimal regret guarantees and empirical performance matching Slate-GLM-OFU remains an important future direction.

## References

- \[1\] Y. Abbasi-Yadkori, D. Pál, and C. Szepesvári (2011) Improved algorithms for linear stochastic bandits. In Advances in Neural Information Processing Systems 24 (NeurIPS), pp. 2312–2320. Cited by: [Lemma C.11](https://arxiv.org/html/2606.31449#A3.Thmlemma11.p1.4.4), [Lemma C.12](https://arxiv.org/html/2606.31449#A3.Thmlemma12.p1.1.1), [§1.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1), [Remark 4.1](https://arxiv.org/html/2606.31449#S4.Thmremark1.p1.2.2).
- \[2\]H\. Bastani, M\. Bayati, and K\. Khosravi\(2021\)Mostly exploration\-free algorithms for contextual bandits\.Management Science67\(3\),pp\. 1329–1349\.Cited by:[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2)\.
- \[3\]N\. Cesa\-Bianchi, O\. Dekel, and O\. Shamir\(2013\)Online learning with switching costs and other adaptive adversaries\.InProceedings of the 27th International Conference on Neural Information Processing Systems \- Volume 1,NIPS’13,Red Hook, NY, USA,pp\. 1160–1168\.Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.31449#S2.SS3.p2.15)\.
- \[4\]N\. Chatterji, V\. Muthukumar, and P\. Bartlett\(2020\-26–28 Aug\)OSOM: a simultaneously optimal algorithm for multi\-armed and linear contextual bandits\.InProceedings of the Twenty Third International Conference on Artificial Intelligence and Statistics,S\. Chiappa and R\. Calandra \(Eds\.\),Proceedings of Machine Learning Research, Vol\.108,pp\. 1844–1854\.External Links:[Link](https://proceedings.mlr.press/v108/chatterji20b.html)Cited by:[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2),[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p3.1)\.
- \[5\]J\. Chen, J\. Xu, G\. Jiang, T\. Ge, Z\. Zhang, D\. Lian, and K\. Zheng\(2021\)Automated creative optimization for e\-commerce advertising\.InProceedings of the Web Conference 2021 \(WWW ’21\),Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1),[§1](https://arxiv.org/html/2606.31449#S1.p1.1)\.
- \[6\]N\. Das and G\. Sinha\(2024\)Linear contextual bandits with hybrid payoff: revisited\.InECML/PKDD \(6\),pp\. 441–455\.Cited by:[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2),[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p3.1)\.
- \[7\]M\. Dimakopoulou, N\. Vlassis, and T\. Jebara\(2019\-07\)Marginal posterior sampling for slate bandits\.InProceedings of the Twenty\-Eighth International Joint Conference on Artificial Intelligence, IJCAI\-19,pp\. 2223–2229\.External Links:[Document](https://dx.doi.org/10.24963/ijcai.2019/308)Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1),[§1](https://arxiv.org/html/2606.31449#S1.p2.1)\.
- \[8\]L\. Faury, M\. Abeille, C\. Calauzenes, and O\. Fercoq\(2020\-13–18 Jul\)Improved optimistic algorithms for logistic bandits\.InProceedings of the 37th International Conference on Machine Learning,H\. D\. III and A\. Singh \(Eds\.\),Proceedings of Machine Learning Research, Vol\.119,pp\. 3052–3060\.Cited by:[§2\.4](https://arxiv.org/html/2606.31449#S2.SS4.p1.7)\.
- \[9\]L\. Faury, M\. Abeille, K\. Jun, and C\. Calauzenes\(2022\-28–30 Mar\)Jointly efficient and optimal algorithms for logistic bandits\.InProceedings of The 25th International Conference on Artificial Intelligence and Statistics,G\. Camps\-Valls, F\. J\. R\. Ruiz, and I\. Valera \(Eds\.\),Proceedings of Machine Learning Research, Vol\.151,pp\. 546–580\.Cited by:[§2\.4](https://arxiv.org/html/2606.31449#S2.SS4.p1.7),[§3\.1\.1](https://arxiv.org/html/2606.31449#S3.SS1.SSS1.p1.18)\.
- \[10\]S\. Filippi, O\. Cappe, A\. Garivier, and C\. Szepesvári\(2010\)Parametric bandits: the generalized linear case\.InAdvances in Neural Information Processing Systems,J\. Lafferty, C\. Williams, J\. Shawe\-Taylor, R\. Zemel, and A\. Culotta \(Eds\.\),Vol\.23,pp\.\.Cited by:[§2\.2](https://arxiv.org/html/2606.31449#S2.SS2.p2.6),[§2\.6](https://arxiv.org/html/2606.31449#S2.SS6.p1.12)\.
- \[11\]Z\. Gao, Y\. Han, Z\. Ren, and Z\. Zhou\(2019\)Batched multi\-armed bandits problem\.InAdvances in Neural Information Processing Systems,H\. Wallach, H\. Larochelle, A\. Beygelzimer, F\. d'Alché\-Buc, E\. Fox, and R\. Garnett \(Eds\.\),Vol\.32,pp\.\.Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.31449#S2.SS3.p2.15)\.
- \[12\]T\. Goyal and G\. Sinha\(2025\)Efficient algorithms for logistic contextual slate bandits with bandit feedback\.InThe 41st Conference on Uncertainty in Artificial Intelligence,Cited by:[§A\.1](https://arxiv.org/html/2606.31449#A1.SS1.p2.7),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.11.p5.1),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.2.p2.1),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.2.p2.10),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.4.p4.1),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.6.p6.4),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.7.p1.2),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.7.p1.9),[§A\.5](https://arxiv.org/html/2606.31449#A1.SS5.9.p3.1),[§C\.4](https://arxiv.org/html/2606.31449#A3.SS4.2.p2.10),[§C\.4](https://arxiv.org/html/2606.31449#A3.SS4.2.p2.11),[§C\.4](https://arxiv.org/html/2606.31449#A3.SS4.5.p5.1),[§C\.4](https://arxiv.org/html/2606.31449#A3.SS4.7.p7.1),[Appendix D](https://arxiv.org/html/2606.31449#A4.2.p2.2),[Lemma D\.1](https://arxiv.org/html/2606.31449#A4.Thmlemma1.p1.6.6),[Appendix D](https://arxiv.org/html/2606.31449#A4.p3.1),[Appendix F](https://arxiv.org/html/2606.31449#A6.p2.1),[§1\.1](https://arxiv.org/html/2606.31449#S1.SS1.p2.4),[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1),[§1](https://arxiv.org/html/2606.31449#S1.p2.1),[§2\.4](https://arxiv.org/html/2606.31449#S2.SS4.p1.7),[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p1.6),[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2),[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p3.1),[§3\.1\.1](https://arxiv.org/html/2606.31449#S3.SS1.SSS1.p1.8),[§5\.1](https://arxiv.org/html/2606.31449#S5.SS1.p1.1),[§6](https://arxiv.org/html/2606.31449#S6.p1.3),[footnote 2](https://arxiv.org/html/2606.31449#footnote2)\.
- \[13\]O\. Hanna, L\. Yang, and C\. Fragouli\(2023\)Efficient batched algorithm for contextual linear bandits with large action space via soft elimination\.InAdvances in Neural Information Processing Systems,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),Vol\.36,pp\. 56772–56783\.Cited by:[§5\.1](https://arxiv.org/html/2606.31449#S5.SS1.p1.1)\.
- \[14\]D\. N\. Hill, H\. Nassif, Y\. Liu, A\. Iyer, and S\.V\.N\. Vishwanathan\(2017\-08\)An efficient bandit algorithm for realtime multivariate optimization\.InProceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining \(KDD ’17\),External Links:[Document](https://dx.doi.org/10.1145/3097983.3098184)Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1),[§1](https://arxiv.org/html/2606.31449#S1.p1.1)\.
- \[15\] S. Kale, L. Reyzin, and R. E. Schapire (2010) Non-stochastic bandit slate problems. In Advances in Neural Information Processing Systems, J. Lafferty, C. Williams, J. Shawe-Taylor, R. Zemel, and A. Culotta (Eds.), Vol. 23. Cited by: [§1.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1), [§1](https://arxiv.org/html/2606.31449#S1.p2.1).
- \[16\]S\. Kannan, J\. H\. Morgenstern, A\. Roth, B\. Waggoner, and Z\. S\. Wu\(2018\)A smoothed analysis of the greedy algorithm for the linear contextual bandit problem\.InAdvances in Neural Information Processing Systems,S\. Bengio, H\. Wallach, H\. Larochelle, K\. Grauman, N\. Cesa\-Bianchi, and R\. Garnett \(Eds\.\),Vol\.31,pp\.\.External Links:[Link](https://proceedings.neurips.cc/paper_files/paper/2018/file/2cfd4560539f887a5e420412b370b361-Paper.pdf)Cited by:[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2)\.
- \[17\]J\. Kiefer and J\. Wolfowitz\(1959\)Optimum Designs in Regression Problems\.The Annals of Mathematical Statistics30\(2\),pp\. 271 – 294\.External Links:[Document](https://dx.doi.org/10.1214/aoms/1177706252)Cited by:[§2\.7](https://arxiv.org/html/2606.31449#S2.SS7.p1.6)\.
- \[18\]P\. Lagrée, C\. Vernade, and O\. Cappé\(2016\)Multiple\-play bandits in the position\-based model\.InAdvances in Neural Information Processing Systems,pp\. 1597–1605\.Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1)\.
- \[19\]T\. Lattimore and C\. Szepesvari\(2017\)Bandit algorithms\.Cited by:[§2\.7](https://arxiv.org/html/2606.31449#S2.SS7.p1.6)\.
- \[20\]Z\. Liu, W\. Lin, Y\. Shi, and J\. Zhao\(2021\-08\)A robustly optimized bert pre\-training approach with post\-training\.InProceedings of the 20th Chinese National Conference on Computational Linguistics,S\. Li, M\. Sun, Y\. Liu, H\. Wu, K\. Liu, W\. Che, S\. He, and G\. Rao \(Eds\.\),pp\. 1218–1227\.Cited by:[Appendix G](https://arxiv.org/html/2606.31449#A7.p2.10),[§5\.2](https://arxiv.org/html/2606.31449#S5.SS2.p2.5)\.
- \[21\]S\. P\. Midigeshi, T\. Goyal, and G\. Sinha\(2025\)Achieving limited adaptivity for multinomial logistic bandits\.InReinforcement Learning Conference,Cited by:[Appendix E](https://arxiv.org/html/2606.31449#A5.p2.2),[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1),[§2\.4](https://arxiv.org/html/2606.31449#S2.SS4.p1.7),[Remark 4\.1](https://arxiv.org/html/2606.31449#S4.Thmremark1.p1.2.2),[§5\.1](https://arxiv.org/html/2606.31449#S5.SS1.p1.1)\.
- \[22\]Z\. Nussbaum, J\. X\. Morris, B\. Duderstadt, and A\. Mulyar\(2024\)Nomic embed: training a reproducible long context text embedder\.External Links:2402\.01613Cited by:[Appendix G](https://arxiv.org/html/2606.31449#A7.p2.10),[§5\.2](https://arxiv.org/html/2606.31449#S5.SS2.p2.5)\.
- \[23\]M\. Papini, A\. Tirinzoni, M\. Restelli, A\. Lazaric, and M\. Pirotta\(2021\-18–24 Jul\)Leveraging good representations in linear contextual bandits\.InProceedings of the 38th International Conference on Machine Learning,M\. Meila and T\. Zhang \(Eds\.\),Proceedings of Machine Learning Research, Vol\.139,pp\. 8371–8380\.Cited by:[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2),[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p3.1)\.
- \[24\]V\. Perchet, P\. Rigollet, S\. Chassang, and E\. Snowberg\(2015\-03–06 Jul\)Batched bandit problems\.InProceedings of The 28th Conference on Learning Theory,P\. Grünwald, E\. Hazan, and S\. Kale \(Eds\.\),Proceedings of Machine Learning Research, Vol\.40,Paris, France,pp\. 1456–1456\.External Links:[Link](https://proceedings.mlr.press/v40/Perchet15.html)Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1)\.
- \[25\]M\. Raghavan, A\. Slivkins, J\. V\. Wortman, and Z\. S\. Wu\(2018\-06–09 Jul\)The externalities of exploration and how data diversity helps exploitation\.InProceedings of the 31st Conference On Learning Theory,S\. Bubeck, V\. Perchet, and P\. Rigollet \(Eds\.\),Proceedings of Machine Learning Research, Vol\.75,pp\. 1724–1738\.External Links:[Link](https://proceedings.mlr.press/v75/raghavan18a.html)Cited by:[§2\.5](https://arxiv.org/html/2606.31449#S2.SS5.p2.2)\.
- \[26\]J\. Rhuggenaath, A\. Akcay, Y\. Zhang, and U\. Kaymak\(2020\-04\)Algorithms for slate bandits with non\-separable reward functions\.External Links:[Document](https://dx.doi.org/10.48550/arXiv.2004.09957)Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1),[§1](https://arxiv.org/html/2606.31449#S1.p2.1)\.
- \[27\]Y\. Ruan, J\. Yang, and Y\. Zhou\(2021\)Linear bandits with limited adaptivity and learning distributional optimal design\.InProceedings of the 53rd Annual ACM SIGACT Symposium on Theory of Computing,STOC 2021,New York, NY, USA,pp\. 74–87\.External Links:ISBN 9781450380539,[Document](https://dx.doi.org/10.1145/3406325.3451004)Cited by:[Lemma A\.11](https://arxiv.org/html/2606.31449#A1.Thmlemma11.p1.1.1),[Lemma B\.2](https://arxiv.org/html/2606.31449#A2.Thmlemma2.p1.2.2),[Remark B\.1](https://arxiv.org/html/2606.31449#A2.Thmremark1.p2.5.1),[Appendix B](https://arxiv.org/html/2606.31449#A2.p1.1),[§1\.1](https://arxiv.org/html/2606.31449#S1.SS1.p1.5),[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1),[§2\.3](https://arxiv.org/html/2606.31449#S2.SS3.p3.3),[Remark 3\.1](https://arxiv.org/html/2606.31449#S3.Thmremark1.p1.1.1),[Remark 4\.1](https://arxiv.org/html/2606.31449#S4.Thmremark1.p1.2.2),[21](https://arxiv.org/html/2606.31449#alg3.l21)\.
- \[28\]A\. Sawarni, N\. Das, S\. Barman, and G\. Sinha\(2024\)Generalized linear bandits with limited adaptivity\.InAdvances in Neural Information Processing Systems,A\. Globerson, L\. Mackey, D\. Belgrave, A\. Fan, U\. Paquet, J\. Tomczak, and C\. Zhang \(Eds\.\),Vol\.37,pp\. 8329–8369\.Cited by:[§A\.3](https://arxiv.org/html/2606.31449#A1.SS3.12.p1.6),[§A\.3](https://arxiv.org/html/2606.31449#A1.SS3.17.p5.5),[Lemma A\.1](https://arxiv.org/html/2606.31449#A1.Thmlemma1.p1.3.3),[Lemma A\.10](https://arxiv.org/html/2606.31449#A1.Thmlemma10.p1.2.2),[§B\.1](https://arxiv.org/html/2606.31449#A2.SS1.3.p3.2),[Remark B\.1](https://arxiv.org/html/2606.31449#A2.Thmremark1.p2.5.1),[Remark B\.1](https://arxiv.org/html/2606.31449#A2.Thmremark1.p3.6.6),[§C\.3](https://arxiv.org/html/2606.31449#A3.SS3.13.p1.3),[Lemma C\.1](https://arxiv.org/html/2606.31449#A3.Thmlemma1.p1.7.7),[Lemma C\.8](https://arxiv.org/html/2606.31449#A3.Thmlemma8.p1.1.1),[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p2.1),[§2\.2](https://arxiv.org/html/2606.31449#S2.SS2.p1.10),[§2\.2](https://arxiv.org/html/2606.31449#S2.SS2.p1.5),[§2\.2](https://arxiv.org/html/2606.31449#S2.SS2.p2.6),[§2\.4](https://arxiv.org/html/2606.31449#S2.SS4.p1.7),[§2\.6](https://arxiv.org/html/2606.31449#S2.SS6.p1.12),[§3\.1\.1](https://arxiv.org/html/2606.31449#S3.SS1.SSS1.p1.18),[Remark 3\.2](https://arxiv.org/html/2606.31449#S3.Thmremark2.p1.9.9),[Remark 4\.1](https://arxiv.org/html/2606.31449#S4.Thmremark1),[Remark 4\.1](https://arxiv.org/html/2606.31449#S4.Thmremark1.p1.2.2),[§5\.1](https://arxiv.org/html/2606.31449#S5.SS1.p1.1),[footnote 1](https://arxiv.org/html/2606.31449#footnote1),[footnote 4](https://arxiv.org/html/2606.31449#footnote4)\.
- \[29\]R\. Socher, A\. Perelygin, J\. Wu, J\. Chuang, C\. D\. Manning, A\. Ng, and C\. Potts\(2013\)Recursive deep models for semantic compositionality over a sentiment treebank\.InProceedings of the 2013 Conference on Empirical Methods in Natural Language Processing,pp\. 1631–1642\.Cited by:[§5\.2](https://arxiv.org/html/2606.31449#S5.SS2.p2.5)\.
- \[30\]M\. D\. Summa, F\. Eisenbrand, Y\. Faenza, and C\. Moldenhauer\(2014\)On largest volume simplices and sub\-determinants\.InACM\-SIAM Symposium on Discrete Algorithms,Cited by:[§2\.7](https://arxiv.org/html/2606.31449#S2.SS7.p1.6)\.
- \[31\]Y\. Wang, H\. Ouyang, C\. Wang, J\. Chen, T\. Asamov, and Y\. Chang\(2017\)Efficient ordered combinatorial semi\-bandits for whole\-page recommendation\.InProceedings of the Thirty\-First AAAI Conference on Artificial Intelligence,Cited by:[§1\.2](https://arxiv.org/html/2606.31449#S1.SS2.p1.1)\.
- \[32\]Y\. Zhang and M\. Sugiyama\(2023\)Online \(multinomial\) logistic bandit: improved regret and constant computation cost\.InAdvances in Neural Information Processing Systems,A\. Oh, T\. Naumann, A\. Globerson, K\. Saenko, M\. Hardt, and S\. Levine \(Eds\.\),Vol\.36,pp\. 29741–29782\.Cited by:[§2\.4](https://arxiv.org/html/2606.31449#S2.SS4.p1.7)\.

## Appendix A Regret Analysis for B-SlateGLinCB

In this section, we state and prove the regret bound for B-SlateGLinCB (Algorithm [1](https://arxiv.org/html/2606.31449#alg1)).

### A.1 Notations

We first define the following scalar quantities: $\gamma=\mathcal{O}\left(SR\sqrt{Nd}\log(T\delta^{-1})\right)$ and $\lambda=\mathcal{O}\left(NdR^{2}\log(T\delta^{-1})\right)$.

We now define $\tilde{\bm{x}}^{i}=\bm{x}^{i}\otimes\bm{e}_{i}$, where $\otimes$ denotes the Kronecker product and $\bm{e}_{i}$ is the $i^{th}$ standard basis vector. This definition of $\tilde{\bm{x}}^{i}$ is the same as the definition of the *lift* of $\bm{x}^{i}$ given in \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\]. Hence, all the properties shown in \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] continue to hold and, for a slate $\bm{x}=(\bm{x}^{1},\ldots,\bm{x}^{N})$, we have

$$\bm{x}=\sum_{i=1}^{N}\tilde{\bm{x}}^{i}.$$
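To make the lift concrete, the following minimal Python sketch places $\bm{x}^{i}$ in the $i$-th block of an $Nd$-dimensional zero vector and sums the lifts to recover the concatenated slate. The block layout (which corresponds to the Kronecker ordering $\bm{e}_{i}\otimes\bm{x}^{i}$; the other ordering differs only by a fixed permutation of coordinates) and the helper names `lift` and `slate_vector` are assumptions made for illustration, not taken from the paper.

```python
def lift(x_i, i, N):
    """Embed the d-dimensional item x_i into block i (0-indexed) of an N*d vector."""
    d = len(x_i)
    lifted = [0.0] * (N * d)
    lifted[i * d:(i + 1) * d] = [float(v) for v in x_i]
    return lifted


def slate_vector(items):
    """Sum the lifts of all N items; under the block layout this is their concatenation."""
    N = len(items)
    total = [0.0] * (N * len(items[0]))
    for i, x_i in enumerate(items):
        for j, v in enumerate(lift(x_i, i, N)):
            total[j] += v
    return total


# Hypothetical slate with N = 3 slots and d = 2 features per item.
items = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
x = slate_vector(items)
```

Under this layout the slate-level inner product splits across slots, $\bm{x}^{\top}\bm{\theta}=\sum_{i=1}^{N}(\bm{x}^{i})^{\top}\bm{\theta}^{i}$ with $\bm{\theta}^{i}$ the $i$-th block of $\bm{\theta}$, which is what makes per-slot computations possible.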
We define the slate-level warm-up design matrix $\bm{V}_{0}$ as well as the slot-level warm-up design matrices $\bm{V}^{i}_{0}$ for all $i\in[N]$ as follows:

1. $\bm{V}_{0}=\lambda\bm{I}_{Nd}+\sum\limits_{t\in\mathcal{T}_{0}}\bm{x}_{t}\bm{x}_{t}^{\top}$.
2. $\bm{V}^{i}_{0}=\lambda\bm{I}_{d}+\sum\limits_{t\in\mathcal{T}_{0}}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}$.

Now, recall the definition of the slate $\bm{b}_{t}$ from Section [3](https://arxiv.org/html/2606.31449#S3): $\bm{b}_{t}=(\bm{b}^{1}_{t},\ldots,\bm{b}_{t}^{N})$, where

$$\bm{b}^{i}_{t}=\operatorname*{arg\,max}_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{V}^{i}_{0})^{-1}}.$$

For batch $m$, we define the Hessian of the GLM-MLE loss as

$$\bm{H}_{m}^{\star}=\lambda\bm{I}_{Nd}+\sum\limits_{t\in\mathcal{T}_{m}}\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\bm{x}_{t}\bm{x}_{t}^{\top}.$$

Since $\bm{\theta}^{\star}$ is unknown, we estimate the Hessian using a scaled design matrix $\bm{H}_{m}$, defined as

$$\bm{H}_{m}=\lambda\bm{I}_{Nd}+\sum\limits_{t\in\mathcal{T}_{m}}\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}\bm{x}_{t}{\bm{x}_{t}}^{\top},$$

where $\beta_{t}$ is a normalization factor obtained from the self-concordance relation, and is given by:

$$\beta_{t}=\exp\left(\min\left\{2S,6\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}^{i}_{0})^{-1}}\right\}\right).$$

Finally, for all $i\in[N]$, we define the slot-level scaled matrices $\bm{H}^{i}_{m}$ for batch $m$ as

$$\bm{H}^{i}_{m}=\lambda\bm{I}_{d}+\sum\limits_{t\in\mathcal{T}_{m}}\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}.$$
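As a rough illustration of how the normalization factor enters the scaled matrices, the sketch below computes $\beta_{t}$ from the per-slot norms $\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}^{i}_{0})^{-1}}$ and the scalar weight $\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}$ that multiplies each outer product. The logistic link, the function names, and all numeric inputs are assumptions chosen for illustration; the definitions above apply to a general GLM.

```python
import math


def beta_t(S, kappa, gamma, slot_norms):
    """beta_t = exp(min{2S, 6*sqrt(kappa)*gamma*sum_i ||b_t^i||_{(V_0^i)^{-1}}})."""
    return math.exp(min(2.0 * S, 6.0 * math.sqrt(kappa) * gamma * sum(slot_norms)))


def logistic_mu_dot(u):
    """Derivative of the logistic link mu(u) = 1 / (1 + exp(-u)) (assumed link)."""
    p = 1.0 / (1.0 + math.exp(-u))
    return p * (1.0 - p)


def round_weight(b_dot_theta0, S, kappa, gamma, slot_norms):
    """Scalar multiplying x_t x_t^T in H_m: mu_dot(b_t^T theta_hat_0) / beta_t."""
    return logistic_mu_dot(b_dot_theta0) / beta_t(S, kappa, gamma, slot_norms)


# Hypothetical values: large norms hit the 2S cap, small norms do not.
capped = beta_t(S=1.0, kappa=4.0, gamma=1.0, slot_norms=[0.5, 0.5])
uncapped = beta_t(S=1.0, kappa=4.0, gamma=1.0, slot_norms=[0.01, 0.01])
```

As the slot norms shrink, $\beta_{t}$ approaches $1$ and the weight reduces to $\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})$.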
For any slot $i\in[N]$, item $\bm{z}\in\mathcal{X}^{i}_{t}$, and prior batch $l\in[M]$, we define the scores $\textrm{UCB}^{i,l}(\bm{z})$ and $\textrm{LCB}^{i,l}(\bm{z})$ as

$$\textrm{UCB}^{i,l}(\bm{z})=\begin{cases}\bm{z}^{\top}\widehat{\bm{\theta}}_{0}^{i}+2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}&l=0,\\ \bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i}+2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}}&l\neq 0,\end{cases}$$

$$\textrm{LCB}^{i,l}(\bm{z})=\begin{cases}\bm{z}^{\top}\widehat{\bm{\theta}}_{0}^{i}-2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}&l=0,\\ \bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i}-2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}}&l\neq 0.\end{cases}$$

At round $t$, for all slots $i\in[N]$, the item-set $\mathcal{X}^{i}_{t}$ is sampled from the distribution $\mathcal{D}^{i}$. In batch $m$, after pruning the item-set $\mathcal{X}^{i}_{t}$ with respect to $\widehat{\bm{\theta}}_{k}^{i}$ for $0\leq k\leq m-1$, we obtain an item-set sampled from the distribution $\mathcal{D}_{k}^{i}$. Thus, pruning with respect to $\widehat{\bm{\theta}}_{0}^{i},\widehat{\bm{\theta}}_{1}^{i},\ldots,\widehat{\bm{\theta}}_{m-1}^{i}$ results in a sequence of item-sets whose distributions are denoted by $\mathcal{D}^{i}_{0},\mathcal{D}^{i}_{1},\ldots,\mathcal{D}^{i}_{m-1}$.
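The per-slot pruning step can be sketched as follows. The rule used here, keeping an item when its UCB is at least the largest LCB in its slot, is the survival condition that appears in the proof of Lemma A.3. The function names are hypothetical, and the inverse matrix is assumed to be precomputed and passed in as nested lists.

```python
import math


def weighted_norm(z, A_inv):
    """Return ||z||_{A^{-1}} = sqrt(z^T A^{-1} z), with A^{-1} given as nested lists."""
    d = len(z)
    quad = sum(z[r] * A_inv[r][c] * z[c] for r in range(d) for c in range(d))
    return math.sqrt(max(quad, 0.0))


def ucb_lcb(z, theta_hat, A_inv, width):
    """Scores z^T theta_hat +/- width * ||z||_{A^{-1}}.

    width = 2*sqrt(kappa)*gamma with A = V_0^i when l = 0,
    width = 2*gamma with A = H_l^i when l != 0.
    """
    mean = sum(a * b for a, b in zip(z, theta_hat))
    bonus = width * weighted_norm(z, A_inv)
    return mean + bonus, mean - bonus


def prune_slot(items, theta_hat, A_inv, width):
    """Keep the items whose UCB is at least the largest LCB in the slot."""
    scores = [ucb_lcb(z, theta_hat, A_inv, width) for z in items]
    best_lcb = max(lcb for _, lcb in scores)
    return [z for z, (ucb, _) in zip(items, scores) if ucb >= best_lcb]


# Hypothetical slot with d = 2 and three candidate items.
items = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.0]]
theta_hat = [1.0, 0.0]
A_inv = [[0.01, 0.0], [0.0, 0.01]]
survivors_narrow = prune_slot(items, theta_hat, A_inv, width=1.0)
survivors_wide = prune_slot(items, theta_hat, A_inv, width=10.0)
```

Calling `prune_slot` successively with $\widehat{\bm{\theta}}_{0}^{i},\ldots,\widehat{\bm{\theta}}_{m-1}^{i}$ and the matching matrices mirrors the successive pruning described above. A narrow confidence width removes clearly suboptimal items, while a wide one keeps them all.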

Finally, define the following quantities:

$$T(\bm{H}):=\frac{(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right)\quad\text{ and }\quad T(\bm{V}):=\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right).$$

Unless otherwise mentioned, without loss of generality, we assume that all constants such as $S,T,R,N,d,\kappa$ and $L_{\mu}$ are greater than $1$.

### A.2 Regret Guarantee for B-SlateGLinCB

We now restate the regret guarantee for B-SlateGLinCB, given in Theorem [3.1](https://arxiv.org/html/2606.31449#S3.Thmtheorem1), and prove it.

###### Theorem A.1.

Let $R(T)$ denote the regret of B-SlateGLinCB (Algorithm [1](https://arxiv.org/html/2606.31449#alg1)). If

$$\sqrt{\frac{2dN}{\delta}}\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\geq\frac{e}{2}$$

and

$$T\geq T_{0}:=\frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}\right)\right),$$

where $W_{-1}$ represents the decreasing branch of the Lambert W function (see Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169)), then

$$R(T)=\tilde{\mathcal{O}}\left(RSNd^{3/2}\sqrt{T\cdot\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\right),$$

where $\bm{x}_{\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}}\bm{x}^{\top}\bm{\theta}^{\star}$ and $\mathcal{X}=\mathcal{X}^{1}\times\ldots\times\mathcal{X}^{N}$.

###### Proof.

At round $t\in\mathcal{T}_{m}$, let $\bm{x}_{t,\star}$ be the optimal slate, i.e.,

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

Then, the expected regret $R(T)$ of Algorithm [1](https://arxiv.org/html/2606.31449#alg1) can be bounded as

$$R(T)\leq\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\left[\sum_{m\in[M]}\sum_{t\in\mathcal{T}_{m}}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\right].$$

We choose our batch lengths as follows:

$$|\mathcal{T}_{0}|=\lfloor\sqrt{T}\rfloor\quad\text{ and }\quad|\mathcal{T}_{m}|=\lfloor T^{1-2^{-m}}\rfloor,\ m\geq 1.$$

We now make a few observations regarding these batch lengths. First, we obtain a total of $M=\mathcal{O}(\log\log T)$ batches. We also obtain the following inequalities:

$$\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}}\leq\frac{T^{1-2^{-m}}}{\sqrt{\lfloor T^{1-2^{1-m}}\rfloor}}=\frac{\sqrt{T}\cdot T^{\frac{1-2^{1-m}}{2}}}{\sqrt{\lfloor T^{1-2^{1-m}}\rfloor}}\leq\sqrt{T},\quad\frac{|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}\leq\frac{T^{2-2^{1-m}}}{\lfloor T^{1-2^{1-m}}\rfloor}=\frac{T\cdot T^{1-2^{1-m}}}{\lfloor T^{1-2^{1-m}}\rfloor}\leq T.$$

Now, to use Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9), we require that $|\mathcal{T}_{0}|\geq T(\bm{V})$ and $|\mathcal{T}_{k}|\geq T(\bm{H})$ for all $k\in[M]$. Using the definitions of $T(\bm{V})$ and $T(\bm{H})$ from Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1) and the chosen batch lengths, we get:

$$\sqrt{T}\geq\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right)\quad\text{ and }\quad\sqrt{T}\geq\frac{(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right).$$

Now, since $L_{\mu}\geq 1$, the assumptions

$$\sqrt{\frac{2dN}{\delta}}\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\geq\frac{e}{2}$$

and

$$T\geq 2\geq\frac{\delta}{2dN}\exp\left(\frac{1}{2}\right)$$

ensure that we can use Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169) to satisfy the conditions for Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9), giving us

$$\begin{aligned}T&\geq\frac{\delta}{2dN}\exp\left(\max\left\{-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}(48+8N\rho)(N-1)^{2}}\right),-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}\right)\right\}\right)\\&=T_{0}:=\frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}\right)\right),\end{aligned}$$

where we use the facts that $\exp(-x)$ is decreasing, $W_{-1}(x)$ is decreasing on $(-e^{-1},0)$ (Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169)), and $L_{\mu}\geq 1$.
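For intuition on where the Lambert W expression comes from, the following worked derivation sketches how a condition of the form $\sqrt{T}\geq C\log(2dNT/\delta)$ turns into the closed form for $T_{0}$. The symbols $C$, $a$, $u$ and $w$ are shorthand introduced only for this sketch; the precise statement used in the proof is the one in Lemma A.14.

```latex
\begin{aligned}
&\text{Let } C=\frac{(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}{3\rho^{2}},\qquad a=\frac{2dN}{\delta},\qquad u=\sqrt{aT}. \\
&\text{For } u>1:\quad \sqrt{T}\geq C\log(aT)
\iff \frac{u}{\sqrt{a}}\geq 2C\log u
\iff \frac{u}{\log u}\geq 2C\sqrt{a}. \\
&\text{At equality, write } u=e^{-w}:\quad
\frac{e^{-w}}{-w}=2C\sqrt{a}
\iff w\,e^{w}=-\frac{1}{2C\sqrt{a}}. \\
&\text{A real solution exists when } -\frac{1}{2C\sqrt{a}}\geq -\frac{1}{e},
\text{ i.e. } \sqrt{a}\,C\geq\frac{e}{2}. \\
&\text{Taking } w=W_{-1}\!\left(-\frac{1}{2C\sqrt{a}}\right)\leq -1 \text{ gives } u\geq e, \text{ where } \frac{u}{\log u} \text{ is increasing, and} \\
&T_{0}=\frac{u^{2}}{a}
=\frac{\delta}{2dN}\exp\!\left(-2W_{-1}\!\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}\,(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}\right)\right).
\end{aligned}
```

Since $u/\log u$ is increasing for $u\geq e$, the condition continues to hold for every $T\geq T_{0}$. This is why a single threshold suffices.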

Thus, assuming $T\geq T_{0}$, using Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9) for $m\geq 2$, as well as a trivial regret bound of $R$ for each round $t\in\mathcal{T}_{0}\cup\mathcal{T}_{1}$, we get that

$$\begin{aligned}R(T)&\leq R\left(|\mathcal{T}_{0}|+|\mathcal{T}_{1}|\right)+\sum_{m\in[2,M]}\sum_{t\in\mathcal{T}_{m}}\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}\}_{i=1}^{N}}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\\&\leq R\left(|\mathcal{T}_{0}|+|\mathcal{T}_{1}|\right)+\sum_{m=2}^{M}\left[\frac{320e^{3S}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}}{S}\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}+8\gamma\sqrt{\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{8d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}+\frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}}\right].\end{aligned}$$

Substituting the values of $|\mathcal{T}_{m}|$, $\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}}$, $\frac{|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}$, and using the facts that $|\mathcal{T}_{m}|\leq T$ and $|\mathcal{T}_{0}|=\lfloor\sqrt{T}\rfloor\geq\sqrt{T}/2$, we get

$$R(T)\leq\left(8\gamma\sqrt{\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{8d^{2}N+4L_{\mu}N\rho^{-1}}\log\log T+2R\right)\sqrt{T}+\tilde{\mathcal{O}}\left(e^{3S}\sqrt{\kappa}T^{1/4}\right).$$

Substituting $\gamma=\mathcal{O}\left(SR\sqrt{Nd\log(T\delta^{-1})}\right)$ from Lemma [A.1](https://arxiv.org/html/2606.31449#A1.Ex42) gives us

$$R(T)=\tilde{\mathcal{O}}\left(RSNd^{3/2}\sqrt{T\cdot\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\right).$$

∎
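The batch schedule used in the proof above can be sketched in a few lines of Python. This is a minimal illustration: the function name `batch_lengths` is hypothetical, and truncating the final batch so that the lengths sum to $T$ is an assumption of the sketch rather than something stated in the proof.

```python
import math


def batch_lengths(T):
    """Batch lengths |T_0| = floor(sqrt(T)), |T_m| = floor(T^(1 - 2^-m)) for m >= 1.

    The final batch is truncated so that the lengths sum to T (an assumption
    of this sketch).
    """
    lengths = [math.isqrt(T)]
    m = 1
    while sum(lengths) < T:
        nominal = math.floor(T ** (1.0 - 2.0 ** (-m)))
        lengths.append(min(nominal, T - sum(lengths)))
        m += 1
    return lengths


T = 10**6
lengths = batch_lengths(T)
num_batches = len(lengths)

# Ratios |T_m| / sqrt(|T_{m-1}|) for the untruncated batches m >= 2.
ratios = [lengths[m] / math.sqrt(lengths[m - 1]) for m in range(2, num_batches - 1)]
```

The number of batches grows like $\log\log T$ because the exponent $1-2^{-m}$ reaches $1-1/\log_{2}T$ after about $\log_{2}\log_{2}T$ doublings, at which point a single batch covers at least half the horizon. Because of the floor in the denominator, the numerical ratio $|\mathcal{T}_{m}|/\sqrt{|\mathcal{T}_{m-1}|}$ can exceed $\sqrt{T}$ by a small constant factor, so a numerical check is safer against $\sqrt{2T}$.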

### A.3 Supporting Lemmas for Theorem [A.1](https://arxiv.org/html/2606.31449#A1.Thmtheorem1)

###### Lemma A.1.

(Lemma A.2, \[[28](https://arxiv.org/html/2606.31449#bib.bib4)\]) Let $\{\bm{x}_{1},\ldots,\bm{x}_{t}\}\subset\mathbb{R}^{d}$ be independent arm pulls and $\{r_{1},\ldots,r_{t}\}$ be the corresponding rewards associated with them. Define the matrix $\bm{H}_{t}^{\star}$ as follows:

$$\bm{H}_{t}^{\star}=\lambda\bm{I}+\sum_{s\in[t]}\dot{\mu}(\bm{x}_{s}^{\top}\bm{\theta}^{\star})\bm{x}_{s}\bm{x}_{s}^{\top}.$$

Also, let $\widehat{\bm{\theta}}_{t}$ be the maximum likelihood estimator of $\bm{\theta}^{\star}$. Then, for $\lambda=\mathcal{O}\left(dR^{2}\log(T\delta^{-1})\right)$, with high probability,

$$\lVert\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{t}\rVert_{\bm{H}_{t}^{\star}}\leq\gamma:=\mathcal{O}\left(SR\sqrt{d\log(T\delta^{-1})}\right).$$

###### Lemma A\.2\.

Let𝐱=\(𝐱1,…,𝐱N\)\\bm\{x\}=\(\\bm\{x\}^\{1\},\\ldots,\\bm\{x\}^\{N\}\)be some slate\. Then, for\|𝒯0\|≥T​\(𝐕\)\|\\mathcal\{T\}\_\{0\}\|\\geq T\(\\bm\{V\}\), we have that

𝒙⊤​\(𝜽⋆−𝜽^0\)≤2​κ​γ​∑i=1N∥𝒙i∥\(𝑽0i\)−1\.\\bm\{x\}^\{\\top\}\(\\bm\{\\theta\}^\{\\star\}\-\\widehat\{\\bm\{\\theta\}\}\_\{0\}\)\\leq 2\\sqrt\{\\kappa\}\\gamma\\sum\_\{i=1\}^\{N\}\\lVert\\bm\{x\}^\{i\}\\rVert\_\{\(\\bm\{V\}\_\{0\}^\{i\}\)^\{\-1\}\}\.

###### Proof\.

For the sake of this proof, define

$$\bm{H}_{0}^{\star}=\lambda\bm{I}+\sum_{t\in\mathcal{T}_{0}}\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\bm{x}_{t}\bm{x}_{t}^{\top}.$$

Then, using the definition of $\kappa$, we have that $\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\geq\kappa^{-1}$. Thus,

$$\begin{aligned}
\bm{H}_{0}^{\star}&\succeq\lambda\bm{I}+\kappa^{-1}\sum_{t\in\mathcal{T}_{0}}\bm{x}_{t}\bm{x}_{t}^{\top}\\
&=\kappa^{-1}\left(\kappa\lambda\bm{I}+\sum_{t\in\mathcal{T}_{0}}\bm{x}_{t}\bm{x}_{t}^{\top}\right)\\
&\succeq\kappa^{-1}\bm{V}_{0},
\end{aligned}$$

where the last inequality uses the fact that $\lambda\geq 1$ and $\kappa\geq 1$. Hence, we can write that

$$\begin{aligned}
\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})&\leq\lVert\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0}\rVert_{\bm{H}_{0}^{\star}}\lVert\bm{x}\rVert_{(\bm{H}_{0}^{\star})^{-1}}\\
&\leq\sqrt{\kappa}\gamma\left\lVert\bm{x}\right\rVert_{\bm{V}_{0}^{-1}}\\
&\leq 2\sqrt{\kappa}\gamma\sum_{i=1}^{N}\left\lVert\bm{x}^{i}\right\rVert_{(\bm{V}_{0}^{i})^{-1}},
\end{aligned}$$

where the first inequality is the Cauchy-Schwarz inequality, the second inequality follows from Lemma [A.1](https://arxiv.org/html/2606.31449#A1.Ex42) together with $\bm{H}_{0}^{\star}\succeq\kappa^{-1}\bm{V}_{0}$, and the final inequality follows from Lemma [A.13](https://arxiv.org/html/2606.31449#A1.Ex158). ∎
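The matrix comparison used in this proof can be sanity-checked numerically. The sketch below is only an illustration under stated assumptions: it uses the logistic link, two-dimensional features, a hypothetical parameter `theta`, takes $\bm{V}_{0}=\lambda\bm{I}+\sum_{t}\bm{x}_{t}\bm{x}_{t}^{\top}$ (the definition in Section A.1 is not reproduced here), and computes $\kappa$ as the largest value of $1/\dot{\mu}$ over the sampled data rather than over the full arm set. It checks that $\lVert\bm{x}\rVert_{(\bm{H}_{0}^{\star})^{-1}}\leq\sqrt{\kappa}\lVert\bm{x}\rVert_{\bm{V}_{0}^{-1}}$.

```python
import math
import random


def mu_dot(z):
    """Derivative of the logistic link mu(z) = 1 / (1 + exp(-z))."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def weighted_gram(xs, weights, lam):
    """Return lam * I + sum_t w_t x_t x_t^T as a 2x2 nested list."""
    a, b, d = lam, 0.0, lam
    for (x0, x1), w in zip(xs, weights):
        a += w * x0 * x0
        b += w * x0 * x1
        d += w * x1 * x1
    return [[a, b], [b, d]]


def inv_norm(x, m):
    """Return sqrt(x^T M^{-1} x) for a symmetric positive definite 2x2 M."""
    (a, b), (_, d) = m
    det = a * d - b * b
    quad = (d * x[0] * x[0] - 2.0 * b * x[0] * x[1] + a * x[1] * x[1]) / det
    return math.sqrt(quad)


def check_norm_comparison(seed=0, n=50, lam=1.0):
    """Compare ||x||_{H^{-1}} with sqrt(kappa) * ||x||_{V^{-1}} on random data."""
    rng = random.Random(seed)
    theta = (0.8, -0.6)  # hypothetical theta_star with norm 1
    xs = []
    for _ in range(n):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.uniform(0.0, 1.0)
        xs.append((radius * math.cos(angle), radius * math.sin(angle)))
    derivs = [mu_dot(x[0] * theta[0] + x[1] * theta[1]) for x in xs]
    kappa = max(1.0 / d for d in derivs)  # data-dependent stand-in for kappa
    h_star = weighted_gram(xs, derivs, lam)
    v = weighted_gram(xs, [1.0] * n, lam)
    pairs = [(inv_norm(x, h_star), math.sqrt(kappa) * inv_norm(x, v)) for x in xs[:10]]
    return kappa, pairs
```

Each returned pair holds the left-hand and right-hand side of the comparison for one feature vector; the left entry should never exceed the right one.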

###### Lemma A\.3\.

Let $t\in\mathcal{T}_{m}$. Let $\bm{x},\bm{y}\in\mathcal{X}_{t}$ be two slates which do not get eliminated. Then

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\leq 4\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}\rVert_{(\bm{V}_{0}^{i})^{-1}}.$$

###### Proof\.

Using the triangle inequality, we can write

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\leq\sum_{i=1}^{N}\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\right\rvert.$$

Now, since both $\bm{x}$ and $\bm{y}$ survive the elimination, their respective components $\bm{x}^{i}$ and $\bm{y}^{i}$ for all $i\in[N]$ also do not get eliminated. Thus, for a fixed $i$, we have

$$\textrm{UCB}^{i,0}(\bm{x}^{i})\geq\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{z}^{i})\geq\textrm{LCB}^{i,0}(\bm{y}^{i}).$$

Using the definitions of $\textrm{UCB}^{i,0}(\bm{z})$ and $\textrm{LCB}^{i,0}(\bm{z})$ (Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1)), we get

$$(\bm{y}^{i}-\bm{x}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{y}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}.$$

A symmetric argument gives us

$$(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{y}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}.$$

Thus, combining both the inequalities, we get

$$\begin{aligned}
\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\right\rvert&\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{y}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}\\
&\leq 4\sqrt{\kappa}\gamma\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{V}^{i}_{0})^{-1}}.
\end{aligned}$$

Substituting this back and using the definition of $\bm{b}^{i}_{t}$ gives us

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\leq 4\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}\rVert_{(\bm{V}_{0}^{i})^{-1}}.$$

∎
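The per-slot elimination argument above can be illustrated with a small sketch. It assumes, consistently with the inequalities in this proof but without reproducing Section A.1, that $\textrm{UCB}$ and $\textrm{LCB}$ are an estimated score plus or minus a confidence width, where the width of item $\bm{z}$ plays the role of $2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}$. The function names and the numbers are hypothetical.

```python
def surviving_items(scores, widths):
    """Keep the items of one slot whose UCB is at least the largest LCB."""
    best_lcb = max(s - w for s, w in zip(scores, widths))
    return [j for j, (s, w) in enumerate(zip(scores, widths)) if s + w >= best_lcb]


def survivor_gaps_bounded(scores, widths, tol=1e-12):
    """Check |score_a - score_b| <= width_a + width_b for all surviving pairs."""
    keep = surviving_items(scores, widths)
    return all(
        abs(scores[a] - scores[b]) <= widths[a] + widths[b] + tol
        for a in keep
        for b in keep
    )


# Hypothetical slot with four items: estimated scores and confidence widths.
scores = [0.9, 0.5, 0.1, 0.85]
widths = [0.1, 0.2, 0.05, 0.3]
print(surviving_items(scores, widths))
print(survivor_gaps_bounded(scores, widths))
```

Since each width is at most the largest width in the slot, the pairwise bound is at most twice the largest width, which corresponds to the factor $4\sqrt{\kappa}\gamma\max_{\bm{z}}\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}$ in the lemma.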

###### Lemma A\.4\.

Let $t\in\mathcal{T}_{m}$. Also, let $\bm{H}_{m}^{\star}$ and $\bm{H}_{m}$ be as defined in Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1). Then, we have that

$$\bm{H}_{m}^{\star}\succeq\bm{H}_{m}.$$

Also, for any slate $\bm{x}\in\mathcal{X}_{t}$ that survives the elimination, if $|\mathcal{T}_{0}|\geq T(\bm{V})$ and $|\mathcal{T}_{m}|\geq T(\bm{H})$, then we have that

$$\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{m})\leq 2\gamma\sum_{i=1}^{N}\lVert\bm{x}^{i}\rVert_{(\bm{H}_{m}^{i})^{-1}}.$$

###### Proof\.

Recall the definition of $\bm{b}_{t}$ from Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1). From the self-concordance property of GLMs, we have

$$\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\leq\dot{\mu}(\bm{x}^{\top}\bm{\theta}^{\star})\exp(\lvert\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}^{\top}\bm{\theta}^{\star}\rvert).$$

We can bound $\lvert\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}^{\top}\bm{\theta}^{\star}\rvert$ as follows:

$$\lvert\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}^{\top}\bm{\theta}^{\star}\rvert\leq\lvert\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert+\lvert\bm{x}^{\top}\bm{\theta}^{\star}\rvert\leq 2S.$$

Also, using Lemma [A.2](https://arxiv.org/html/2606.31449#A1.Ex43) and Lemma [A.3](https://arxiv.org/html/2606.31449#A1.Ex51), we have

$$\begin{aligned}
\lvert\bm{b}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}^{\top}\bm{\theta}^{\star}\rvert&\leq\lvert\bm{x}^{\top}\bm{\theta}^{\star}-\bm{x}^{\top}\widehat{\bm{\theta}}_{0}\rvert+\lvert\bm{b}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}^{\top}\widehat{\bm{\theta}}_{0}\rvert\\
&\leq 6\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}\rVert_{(\bm{V}_{0}^{i})^{-1}}.
\end{aligned}$$

Thus, combining both the inequalities results in

$$\dot{\mu}(\bm{b}^{\top}\widehat{\bm{\theta}}_{0})\leq\dot{\mu}(\bm{x}^{\top}\bm{\theta}^{\star})\exp\left(\min\left\{2S,6\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right\}\right).$$

Noting that the multiplicative factor on the right side is precisely $\beta_{t}$ (see Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1)), we get that:

$$\begin{aligned}
\bm{H}^{\star}_{m}&=\lambda\bm{I}_{Nd}+\sum_{t\in\mathcal{T}_{m}}\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\bm{x}_{t}\bm{x}_{t}^{\top}\\
&\succeq\lambda\bm{I}_{Nd}+\sum_{t\in\mathcal{T}_{m}}\dot{\mu}(\bm{b}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}\bm{x}_{t}\bm{x}_{t}^{\top}=\bm{H}_{m}.
\end{aligned}$$

This completes the proof for the first part. For the second part, we have

$$\begin{aligned}
\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{m})&\leq\lVert\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{m}\rVert_{\bm{H}^{\star}_{m}}\lVert\bm{x}\rVert_{(\bm{H}^{\star}_{m})^{-1}}\\
&\leq 2\gamma\sum_{i=1}^{N}\left\lVert\bm{x}^{i}\right\rVert_{(\bm{H}_{m}^{i})^{-1}},
\end{aligned}$$

where the final inequality follows from Lemma [A.1](https://arxiv.org/html/2606.31449#A1.Ex42) and Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136). ∎
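The role of the self-concordance step in the first part can be seen in a scalar sketch. It assumes the logistic link, for which $\lvert\ddot{\mu}\rvert\leq\dot{\mu}$ and hence $\dot{\mu}(a)\leq\dot{\mu}(b)\exp(\lvert a-b\rvert)$. The helper `downweighted_curvature` is a hypothetical stand-in for the weight $\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}$, with `gap_bound` playing the role of the data-dependent term inside the minimum.

```python
import math


def mu_dot(z):
    """Derivative of the logistic link mu(z) = 1 / (1 + exp(-z))."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)


def downweighted_curvature(z_ref, gap_bound, s_bound):
    """Return mu_dot(z_ref) / beta with beta = exp(min(2 * s_bound, gap_bound)).

    For the logistic link this lower-bounds mu_dot(z) at every z whose
    distance from z_ref is at most min(2 * s_bound, gap_bound).
    """
    beta = math.exp(min(2.0 * s_bound, gap_bound))
    return mu_dot(z_ref) / beta


# Hypothetical values: reference point 0.7, true point within distance 1.
print(downweighted_curvature(0.7, 1.0, 2.0) <= mu_dot(0.7 - 0.9))
```

Replacing the unknown curvature $\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})$ by such a computable lower bound is what yields $\bm{H}_{m}^{\star}\succeq\bm{H}_{m}$ term by term in the sum.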

###### Lemma A\.5\.

Let $t\in\mathcal{T}_{m}$. Let $\bm{x},\bm{y}\in\mathcal{X}_{t}$ be two slates which do not get eliminated. Then, for all $1\leq k\leq m-1$, we have

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{k}\right\rvert\leq 4\gamma\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{H}_{k}^{i})^{-1}}.$$

###### Proof\.

Using the triangle inequality, we can write

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{k}\right\rvert\leq\sum_{i=1}^{N}\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{k}\right\rvert.$$

Now, since both $\bm{x}$ and $\bm{y}$ survive the elimination, for an arbitrary but fixed slot $i\in[N]$, their respective components $\bm{x}^{i}$ and $\bm{y}^{i}$ also do not get eliminated. Thus, we have

$$\textrm{UCB}^{i,k}(\bm{x}^{i})\geq\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,k}(\bm{z}^{i})\geq\textrm{LCB}^{i,k}(\bm{y}^{i}).$$

Using the definitions of $\textrm{UCB}^{i,k}(\bm{z})$ and $\textrm{LCB}^{i,k}(\bm{z})$ (Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1)), we get

$$(\bm{y}^{i}-\bm{x}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{k}\leq 2\gamma\lVert\bm{x}^{i}\rVert_{(\bm{H}^{i}_{k})^{-1}}+2\gamma\lVert\bm{y}^{i}\rVert_{(\bm{H}^{i}_{k})^{-1}}.$$

A symmetric argument gives us

$$(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{k}\leq 2\gamma\lVert\bm{x}^{i}\rVert_{(\bm{H}^{i}_{k})^{-1}}+2\gamma\lVert\bm{y}^{i}\rVert_{(\bm{H}^{i}_{k})^{-1}}.$$

Thus, combining both the inequalities gives us

$$\begin{aligned}
\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{k}\right\rvert&\leq 2\gamma\lVert\bm{x}^{i}\rVert_{(\bm{H}^{i}_{k})^{-1}}+2\gamma\lVert\bm{y}^{i}\rVert_{(\bm{H}^{i}_{k})^{-1}}\\
&\leq 4\gamma\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{H}^{i}_{k})^{-1}}.
\end{aligned}$$

Summing over all slots $i\in[N]$ finishes the proof. ∎

###### Lemma A\.6\.

For $t\in\mathcal{T}_{m}$, where $m>0$, define the optimal slate $\bm{x}_{t,\star}=(\bm{x}^{1}_{t,\star},\ldots,\bm{x}^{N}_{t,\star})$ as follows:

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

If $|\mathcal{T}_{0}|\geq T(\bm{V})$ and $|\mathcal{T}_{k}|\geq T(\bm{H})$ for all $1\leq k\leq m-1$, then the optimal slate never gets eliminated.

###### Proof\.

First, note that since $\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}$ and the objective $\bm{x}^{\top}\bm{\theta}^{\star}=\sum_{i=1}^{N}(\bm{x}^{i})^{\top}{\bm{\theta}^{\star}}^{i}$ separates across slots, we have $\bm{x}_{t,\star}^{i}=\operatorname*{arg\,max}_{\bm{z}\in\mathcal{X}^{i}_{t}}\bm{z}^{\top}{\bm{\theta}^{\star}}^{i}$, or, in other words, $\tilde{\bm{x}}^{i}_{t,\star}=\operatorname*{arg\,max}_{\bm{z}\in\mathcal{X}^{i}_{t}}\tilde{\bm{z}}^{\top}\bm{\theta}^{\star}$.

Fix $i\in[N]$. Then, for some arbitrary $\bm{z}\in\mathcal{X}^{i}_{t}$, we have that

$$\begin{aligned}
0&\leq\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\bm{\theta}^{\star}\\
&=\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\left(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0}\right)+\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\widehat{\bm{\theta}}_{0}\\
&\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}_{t,\star}\rVert_{(\bm{V}_{0}^{i})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}}+\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\widehat{\bm{\theta}}_{0}\\
&=\textrm{UCB}^{i,0}(\bm{x}^{i}_{t,\star})-\textrm{LCB}^{i,0}(\bm{z}),
\end{aligned}$$

where the second inequality follows from Lemma [A.2](https://arxiv.org/html/2606.31449#A1.Ex43). Since this is true for all $\bm{z}\in\mathcal{X}^{i}_{t}$, we get

$$\textrm{UCB}^{i,0}(\bm{x}^{i}_{t,\star})\geq\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{z}),$$

or in other words,

$$\textrm{UCB}^{i,0}(\bm{x}^{i}_{t,\star})-\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{z})\geq 0.$$

Since this holds for a fixed but arbitrary $i\in[N]$, the above inequality holds for all $i\in[N]$.

Similarly, for all $k\in[1,m-1]$ and $i\in[N]$, we can show that

$$\textrm{UCB}^{i,k}(\bm{x}^{i}_{t,\star})-\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,k}(\bm{z})\geq 0.$$

Thus, the components of the optimal slate, and hence the optimal slate itself, never get eliminated. ∎

###### Lemma A\.7\.

For some slot $i\in[N]$ and batch $m\in[0,M]$, for all $j\leq m-1$, let $\mathcal{D}^{i}_{j}$ be defined as in Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1). Then, for any matrix $\bm{A}\succeq 0$, we have that

$$\operatorname*{\mathbb{E}}_{\mathcal{X}^{i}\sim\mathcal{D}^{i}_{m}}\max_{\bm{x}^{i}\in\mathcal{X}^{i}}\lVert\bm{x}^{i}\rVert_{\bm{A}}\leq\operatorname*{\mathbb{E}}_{\mathcal{X}^{i}\sim\mathcal{D}^{i}_{j}}\max_{\bm{x}^{i}\in\mathcal{X}^{i}}\lVert\bm{x}^{i}\rVert_{\bm{A}}\quad\forall j\in[m-1].$$

###### Proof\.

The proof only relies on the manner in which the sequence of distributions $\{\mathcal{D}^{i}_{j}\}_{j\in[M]}$ is constructed. In particular, the pruning step ensures that the set of items that survive the pruning with respect to $\widehat{\bm{\theta}}_{m}$ is always a subset of the set of items that survive the pruning with respect to $\widehat{\bm{\theta}}_{j}$ for $j\leq m-1$. Thus, for some slot $i\in[N]$, the pruning step gives rise to the following chain of subsets:

$$\mathcal{D}^{i}_{m}\subseteq\mathcal{D}^{i}_{m-1}\subseteq\ldots\subseteq\mathcal{D}^{i}_{1}\subseteq\mathcal{D}^{i}_{0}\subseteq\mathcal{D}^{i}.$$

Note that this result is a generalization of Claim A.11 from [[28](https://arxiv.org/html/2606.31449#bib.bib4)] to the slate setting. Hence, the claim follows. ∎

###### Lemma A\.8\.

At round $t\in\mathcal{T}_{m}$, let $\bm{x}_{t,\star}$ be the optimal slate, i.e.,

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

Define $\breve{\bm{x}}:=\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}}\,\bm{x}$. If $|\mathcal{T}_{0}|\geq T(\bm{V})$ and $|\mathcal{T}_{k}|\geq T(\bm{H})$ for all $1\leq k\leq m-1$, then we have

$$\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\leq 8\gamma\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)\left(\frac{5e^{3S}\sqrt{\kappa}\gamma}{S}\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}+1\right).$$

###### Proof\.

By the mean value theorem (a first-order Taylor expansion), for some $z_{t}$ between $\bm{x}_{t}^{\top}\bm{\theta}^{\star}$ and $\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}$, we have that

$$\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\leq\dot{\mu}(z_{t})\left\lvert\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\bm{\theta}^{\star}\right\rvert.$$

Further, using Lemma [A.4](https://arxiv.org/html/2606.31449#A1.Ex60) and Lemma [A.5](https://arxiv.org/html/2606.31449#A1.Ex70), we have

$$\begin{aligned}
\dot{\mu}(z_{t})\left\lvert\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\bm{\theta}^{\star}\right\rvert&\leq\dot{\mu}(z_{t})\lvert\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t,\star}^{\top}\widehat{\bm{\theta}}_{m-1}\rvert+\dot{\mu}(z_{t})\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{m-1}\rvert\\
&\quad+\dot{\mu}(z_{t})\lvert\bm{x}_{t,\star}^{\top}\widehat{\bm{\theta}}_{m-1}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{m-1}\rvert\\
&\leq 8\gamma\sum_{i=1}^{N}\dot{\mu}(z_{t})\max_{\bm{u}\in\mathcal{X}^{i}_{t}}\lVert\bm{u}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}.
\end{aligned}$$
Define $\breve{\bm{x}}:=\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}}\,\bm{x}$. Then, we have

$$\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\leq 8\gamma\sum_{i=1}^{N}\sqrt{\frac{\dot{\mu}(z_{t})^{2}\beta_{t}}{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}}\max_{\bm{u}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{u}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}.$$

Using the self-concordance property of GLMs, we have that

$$\frac{\dot{\mu}(z_{t})}{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}\leq\exp\left(R\left\lvert z_{t}-\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\right).$$

Since $z_{t}\in[\bm{x}_{t}^{\top}\bm{\theta}^{\star},\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}]$, a trivial bound using $\lVert\bm{x}\rVert\leq 1$ and $\lVert\bm{\theta}^{\star}\rVert\leq S$ gives us

$$\lvert z_{t}-\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert\leq 2S.$$

Also, since the optimal slate $\bm{x}_{t,\star}$ never gets eliminated (Lemma [A.6](https://arxiv.org/html/2606.31449#A1.Thmlemma6)), using Lemma [A.2](https://arxiv.org/html/2606.31449#A1.Ex43) and Lemma [A.3](https://arxiv.org/html/2606.31449#A1.Ex51) gives us

$$\begin{aligned}
\lvert z_{t}-\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert&\leq\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert+\lvert\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}-\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert+\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-z_{t}\rvert\\
&\leq 6\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}+\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}\rvert,
\end{aligned}$$

and again, using Lemma [A.2](https://arxiv.org/html/2606.31449#A1.Ex43), Lemma [A.3](https://arxiv.org/html/2606.31449#A1.Ex51), and Lemma [A.6](https://arxiv.org/html/2606.31449#A1.Thmlemma6), we have

$$\begin{aligned}
\left\lvert\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\bm{\theta}^{\star}\right\rvert&\leq\lvert\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t,\star}^{\top}\widehat{\bm{\theta}}_{0}\rvert+\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert+\lvert\bm{x}_{t,\star}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\rvert\\
&\leq 8\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}.
\end{aligned}$$
Thus, we get

$$\frac{\dot{\mu}(z_{t})}{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}\leq\exp\left(R\min\left\{2S,14\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right\}\right).$$

Similarly, we also have

$$\dot{\mu}(z_{t})\leq\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\exp\left(R\lvert z_{t}-\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}\rvert\right),$$

where the argument of the exponent can be bounded as

$$\lvert z_{t}-\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}\rvert\leq\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}\rvert\leq 8\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}},$$

and hence,

$$\dot{\mu}(z_{t})\leq\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\exp\left(R\min\left\{2S,8\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right\}\right).$$

Using the definition of $\beta_{t}$ (Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1)), we get

$$\sqrt{\frac{\dot{\mu}(z_{t})^{2}\beta_{t}}{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}}\leq\exp\left(R\min\left\{3S,14\sqrt{\kappa}\gamma\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right\}\right)\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}.$$

Finally, using Claim A.8 from [[28](https://arxiv.org/html/2606.31449#bib.bib4)], we get

$$\sqrt{\frac{\dot{\mu}(z_{t})^{2}\beta_{t}}{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}}\leq\left(\frac{5e^{3S}\sqrt{\kappa}\gamma}{S}\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}+1\right)\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}.$$

Substituting this back, we get

$$\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\leq 8\gamma\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)\left(\frac{5e^{3S}\sqrt{\kappa}\gamma}{S}\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}+1\right).$$

∎
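The step that turns the exponential of a minimum into a linear expression can be checked numerically. The sketch below is not the statement of Claim A.8 from [28], which is not reproduced here; it only verifies, under the assumption $R=1$, a scalar inequality of the same shape: $\exp(\min\{3S,14a\})\leq 5e^{3S}a/S+1$ for $a\geq 0$ and $S>0$, where `a` is a hypothetical stand-in for $\sqrt{\kappa}\gamma\sum_{i}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}$. The inequality follows from the chord bound $e^{y}\leq 1+y(e^{3S}-1)/(3S)$ for $y\in[0,3S]$, which holds because $\exp$ is convex.

```python
import math


def exp_of_min(a, s):
    """Left-hand side: exp(min(3 * s, 14 * a)), taking R = 1."""
    return math.exp(min(3.0 * s, 14.0 * a))


def linear_bound(a, s):
    """Right-hand side: 5 * exp(3 * s) * a / s + 1."""
    return 5.0 * math.exp(3.0 * s) * a / s + 1.0


def bound_holds_on_grid(s_values, a_values, rel_tol=1e-12):
    """Check the inequality for every (s, a) pair on a finite grid."""
    return all(
        exp_of_min(a, s) <= linear_bound(a, s) * (1.0 + rel_tol)
        for s in s_values
        for a in a_values
    )


# Hypothetical grid of norm bounds S and exploration terms a.
print(bound_holds_on_grid([0.1, 0.5, 1.0, 2.0, 5.0], [0.0, 0.01, 0.1, 1.0, 10.0]))
```

The linear form matters because the term proportional to `a` shrinks as the warm-up batch grows, which is what lets the later regret sum separate into a $\kappa$-dependent lower-order term and a leading term without $\kappa$.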

###### Lemma A\.9\.

At round $t\in\mathcal{T}_{m}$, let $\bm{x}_{t,\star}$ be the optimal slate,

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

If $|\mathcal{T}_{0}|\geq T(\bm{V})$ and $|\mathcal{T}_{k}|\geq T(\bm{H})$ for all $1\leq k\leq m-1$, then we have

$$\begin{aligned}
\sum_{t\in\mathcal{T}_{m}}&\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}\}_{i=1}^{N}}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\\
&\leq\frac{320e^{3S}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}}{S}\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}+8\gamma\sqrt{\operatorname*{\mathbb{E}}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{8d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}+\frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}},
\end{aligned}$$

where $\bm{x}_{\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}}\bm{x}^{\top}\bm{\theta}^{\star}$ and $\mathcal{X}=\mathcal{X}^{1}\times\ldots\times\mathcal{X}^{N}$.

###### Proof\.

Define𝒙˘:=μ˙​\(𝒃t⊤​𝜽^0\)​βt−1​𝒙\\breve\{\\bm\{x\}\}:=\\sqrt\{\\dot\{\\mu\}\(\\bm\{b\}\_\{t\}^\{\\top\}\\widehat\{\\bm\{\\theta\}\}\_\{0\}\)\\beta\_\{t\}^\{\-1\}\}\\bm\{x\}and𝔼\{𝒳ti∼𝒟mi\}i=1N\[\.\]\\operatorname\*\{\\mathbb\{E\}\}\\limits\_\{\\\{\\mathcal\{X\}^\{i\}\_\{t\}\\sim\\mathcal\{D\}^\{i\}\_\{m\}\\\}\_\{i=1\}^\{N\}\}\\left\[\.\\right\]as𝔼t,m\[\.\]\\operatorname\*\{\\mathbb\{E\}\}\\limits\_\{t,m\}\[\.\]\. Then, using Lemma[A\.8](https://arxiv.org/html/2606.31449#A1.Ex88), we wish to bound the following two terms \(excluding constants\):

1\.𝔼t,mμ˙​\(𝒙t,⋆⊤​𝜽⋆\)\(∑i=1Nmax𝒛∈𝒳ti∥𝒛˘∥\(𝑯m−1i\)−1\)\(∑i=1N∥𝒃ti∥\(𝑽0i\)−1\)1\.\\operatorname\*\{\\mathbb\{E\}\}\_\{t,m\}\\sqrt\{\\dot\{\\mu\}\(\\bm\{x\}\_\{t,\\star\}^\{\\top\}\\bm\{\\theta\}^\{\\star\}\)\}\\left\(\\sum\_\{i=1\}^\{N\}\\max\_\{\\bm\{z\}\\in\\mathcal\{X\}^\{i\}\_\{t\}\}\\lVert\\breve\{\\bm\{z\}\}\\rVert\_\{\(\\bm\{H\}^\{i\}\_\{m\-1\}\)^\{\-1\}\}\\right\)\\left\(\\sum\_\{i=1\}^\{N\}\\lVert\\bm\{b\}^\{i\}\_\{t\}\\rVert\_\{\(\\bm\{V\}\_\{0\}^\{i\}\)^\{\-1\}\}\\right\)2\.𝔼t,mμ˙​\(𝒙t,⋆⊤​𝜽⋆\)\(∑i=1Nmax𝒛∈𝒳ti∥𝒛˘∥\(𝑯m−1i\)−1\)2\.\\operatorname\*\{\\mathbb\{E\}\}\_\{t,m\}\\sqrt\{\\dot\{\\mu\}\(\\bm\{x\}\_\{t,\\star\}^\{\\top\}\\bm\{\\theta\}^\{\\star\}\)\}\\left\(\\sum\_\{i=1\}^\{N\}\\max\_\{\\bm\{z\}\\in\\mathcal\{X\}^\{i\}\_\{t\}\}\\lVert\\breve\{\\bm\{z\}\}\\rVert\_\{\(\\bm\{H\}^\{i\}\_\{m\-1\}\)^\{\-1\}\}\\right\)\\phantom\{\\left\(\\sum\_\{i=1\}^\{N\}\\lVert\\bm\{b\}^\{i\}\_\{t\}\\rVert\_\{\(\\bm\{V\}\_\{0\}^\{i\}\)^\{\-1\}\}\\right\)\}
Note that we can write

𝑯mi=λ​𝑰\+∑t∈𝒯m𝒙˘ti​\(𝒙˘ti\)⊤,\\bm\{H\}\_\{m\}^\{i\}=\\lambda\\bm\{I\}\+\\sum\_\{t\\in\\mathcal\{T\}\_\{m\}\}\\breve\{\\bm\{x\}\}^\{i\}\_\{t\}\(\\breve\{\\bm\{x\}\}^\{i\}\_\{t\}\)^\{\\top\},and hence, using Lemma[A\.10](https://arxiv.org/html/2606.31449#A1.Ex132)and Lemma[A\.11](https://arxiv.org/html/2606.31449#A1.Ex134)in tandem, we get

𝔼𝒳i∼𝒟imax𝒛∈𝒳i∥𝒛∥\(𝑽0i\)−12≤8​d2\|𝒯0\|,\\operatorname\*\{\\mathbb\{E\}\}\_\{\\mathcal\{X\}^\{i\}\\sim\\mathcal\{D\}^\{i\}\}\\max\_\{\\bm\{z\}\\in\\mathcal\{X\}^\{i\}\}\\lVert\\bm\{z\}\\rVert^\{2\}\_\{\(\\bm\{V\}\_\{0\}^\{i\}\)^\{\-1\}\}\\leq\\frac\{8d^\{2\}\}\{\|\\mathcal\{T\}\_\{0\}\|\},𝔼𝒳i∼𝒟mimax𝒛∈𝒳i∥𝒛˘∥\(𝑯mi\)−12≤8​d2\|𝒯m\|\.\\operatorname\*\{\\mathbb\{E\}\}\_\{\\mathcal\{X\}^\{i\}\\sim\\mathcal\{D\}\_\{m\}^\{i\}\}\\max\_\{\\bm\{z\}\\in\\mathcal\{X\}^\{i\}\}\\lVert\\breve\{\\bm\{z\}\}\\rVert^\{2\}\_\{\(\\bm\{H\}\_\{m\}^\{i\}\)^\{\-1\}\}\\leq\\frac\{8d^\{2\}\}\{\|\\mathcal\{T\}\_\{m\}\|\}\.Using these bounds as well as the trivial bound ofμ˙​\(𝒙t,⋆⊤​𝜽⋆\)≤Lμ\\sqrt\{\\dot\{\\mu\}\(\\bm\{x\}\_\{t,\\star\}^\{\\top\}\\bm\{\\theta\}^\{\\star\}\)\}\\leq\\sqrt\{L\_\{\\mu\}\}, we upper bound the first term as follows:

$$\begin{aligned}
&\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right)\\
&\quad\leq\mathbb{E}_{t,m}\sqrt{L_{\mu}\cdot N^{2}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}^{i}_{m-1})^{-1}}\right)\left(\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert^{2}_{(\bm{V}_{0}^{i})^{-1}}\right)}\\
&\quad\leq\sqrt{L_{\mu}\cdot N^{2}\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}^{i}_{m-1})^{-1}}\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}}\lVert\bm{b}^{i}_{t}\rVert^{2}_{(\bm{V}_{0}^{i})^{-1}}}\\
&\quad\leq\sqrt{L_{\mu}\cdot N^{2}\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m-1}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}^{i}_{m-1})^{-1}}\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}}\lVert\bm{b}^{i}_{t}\rVert^{2}_{(\bm{V}_{0}^{i})^{-1}}}\\
&\quad\leq\sqrt{L_{\mu}\cdot N^{2}\cdot\sum_{i=1}^{N}\frac{8d^{2}}{|\mathcal{T}_{m-1}|}\cdot\sum_{i=1}^{N}\frac{8d^{2}}{|\mathcal{T}_{0}|}}\\
&\quad\leq\frac{8N^{2}d^{2}\sqrt{L_{\mu}}}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}.
\end{aligned}$$

Here, the first inequality uses the Cauchy-Schwarz inequality, and the second inequality uses Jensen's inequality. The third inequality is a consequence of Lemma [A.7](https://arxiv.org/html/2606.31449#A1.Ex85), while the second-to-last inequality follows from the bounds we showed above.
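For readers tracing the constants, the factor $N^{2}$ in the first inequality and the final simplification can be spelled out as a short worked step. The shorthand $a_i$ and $b_i$ is introduced here only for illustration and does not appear in the original argument.

```latex
% Shorthand (illustrative only):
%   a_i = \max_{z \in X_t^i} \|\breve{z}\|_{(H^i_{m-1})^{-1}},   b_i = \|b_t^i\|_{(V_0^i)^{-1}}.
% Cauchy-Schwarz applied to each sum separately:
\left(\sum_{i=1}^{N} a_i\right)\left(\sum_{i=1}^{N} b_i\right)
  \le \sqrt{N\sum_{i=1}^{N} a_i^{2}}\;\sqrt{N\sum_{i=1}^{N} b_i^{2}}
  = \sqrt{N^{2}\left(\sum_{i=1}^{N} a_i^{2}\right)\left(\sum_{i=1}^{N} b_i^{2}\right)}.
% Final simplification of the constants:
\sqrt{L_{\mu}\, N^{2}\cdot\frac{8d^{2}N}{|\mathcal{T}_{m-1}|}\cdot\frac{8d^{2}N}{|\mathcal{T}_{0}|}}
  = \frac{8N^{2}d^{2}\sqrt{L_{\mu}}}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}.
```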

Hence, summing over all $t\in\mathcal{T}_{m}$, we get that:

$$\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right)\leq\frac{8N^{2}d^{2}\sqrt{L_{\mu}}|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}.$$
We now upper bound the second term using the Cauchy\-Schwarz inequality as:

$$\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\leq\sum_{t\in\mathcal{T}_{m}}\sqrt{\left(\mathbb{E}_{t,m}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\right)\mathbb{E}_{t,m}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)^{2}}.$$

Now, since the optimal slate never gets eliminated (Lemma [A.6](https://arxiv.org/html/2606.31449#A1.Thmlemma6)), we can write

$$\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}=\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}=\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})},$$

where $\bm{x}_{\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}}\bm{x}^{\top}\bm{\theta}^{\star}$. Hence, this quantity is independent of $t$. We thus get

$$\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\leq\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\;\underbrace{\sum_{t\in\mathcal{T}_{m}}\sqrt{\mathbb{E}_{t,m}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)^{2}}}_{\text{Term A}}.$$

Expanding the square in Term A gives us

$$\text{Term A}\leq\sum_{t\in\mathcal{T}_{m}}\sqrt{\underbrace{\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}^{i}_{m-1})^{-1}}}_{\text{Term A1}}+\underbrace{2\,\mathbb{E}_{t,m}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\max_{\substack{\bm{z}\in\mathcal{X}^{i}_{t}\\ \bm{u}\in\mathcal{X}^{j}_{t}}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\lVert\breve{\bm{u}}\rVert_{(\bm{H}^{j}_{m-1})^{-1}}}_{\text{Term A2}}}.$$
Upper-bounding Term A1, we get

$$\begin{aligned}
\text{Term A1}&=\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}^{i}_{m-1})^{-1}}\\
&\leq\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m-1}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}^{i}_{m-1})^{-1}}\\
&\leq\frac{8d^{2}N}{|\mathcal{T}_{m-1}|},
\end{aligned}$$

where the first inequality follows from Lemma [A.7](https://arxiv.org/html/2606.31449#A1.Ex85), while the second inequality follows from using Lemma [A.10](https://arxiv.org/html/2606.31449#A1.Ex132) and Lemma [A.11](https://arxiv.org/html/2606.31449#A1.Ex134) in tandem as shown above.

We now upper bound Term A2 as follows:

$$\begin{aligned}
&\mathbb{E}_{t,m}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\max_{\substack{\bm{z}\in\mathcal{X}^{i}_{t}\\ \bm{u}\in\mathcal{X}^{j}_{t}}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\lVert\breve{\bm{u}}\rVert_{(\bm{H}^{j}_{m-1})^{-1}}\\
&\quad\leq 2\,\mathbb{E}_{t,m}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\max_{\substack{\bm{z}\in\mathcal{X}^{i}_{t}\\ \bm{u}\in\mathcal{X}^{j}_{t}}}\frac{\lVert\breve{\bm{z}}\rVert\lVert\breve{\bm{u}}\rVert}{\sqrt{\lambda_{\min}(\bm{H}^{i}_{m-1})\lambda_{\min}(\bm{H}^{j}_{m-1})}}\\
&\quad\leq 2\,\mathbb{E}_{t,m}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\frac{L_{\mu}}{N(\lambda+0.5\rho|\mathcal{T}_{m}|)},
\end{aligned}$$

where the first inequality follows from the Rayleigh quotient, while the second inequality follows from the fact that $\lVert\breve{\bm{z}}\rVert\leq\sqrt{L_{\mu}N^{-1}}$. The second inequality also uses the linear growth of the eigenvalues of the slot-level matrices, shown in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136). Further simplification gives us a bound on Term A2 as

$$\text{Term A2}=2\,\mathbb{E}_{t,m}\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\max_{\substack{\bm{z}\in\mathcal{X}^{i}_{t}\\ \bm{u}\in\mathcal{X}^{j}_{t}}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\lVert\breve{\bm{u}}\rVert_{(\bm{H}^{j}_{m-1})^{-1}}\leq\frac{4L_{\mu}N}{\rho|\mathcal{T}_{m}|}.$$

Putting these bounds together, we get a bound on Term A as

$$\sum_{t\in\mathcal{T}_{m}}\sqrt{\mathbb{E}_{t,m}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)^{2}}\leq\sqrt{\frac{8d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}+\frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}}.$$

Here the bounds on Term A1 and Term A2 do not depend on $t$, so the sum over $t\in\mathcal{T}_{m}$ contributes a factor $|\mathcal{T}_{m}|$, which enters the square root as $|\mathcal{T}_{m}|^{2}$. Using Lemma [A.8](https://arxiv.org/html/2606.31449#A1.Ex88) and assembling all the bounds, we get that

$$\begin{aligned}
\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\leq{}&\frac{320e^{3S}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}}{S}\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}\\
&+8\gamma\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{8d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}+\frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}}.
\end{aligned}$$
∎

### A.4 Results on Optimal Designs

###### Lemma A\.10\.

(Corollary A.16, \[[28](https://arxiv.org/html/2606.31449#bib.bib4)\]) Define $\bm{A}=\lambda\bm{I}+\sum_{i=1}^{N}\bm{x}_{i}\bm{x}_{i}^{\top}$. Then, for $\lambda=\mathcal{O}(\log(Td))$, we have

$$\bm{A}\succeq\frac{N}{8}\mathbb{E}_{\mathcal{X}\sim\mathcal{D}}\mathbb{E}_{\bm{x}\sim\pi_{G}(\mathcal{X})}\left[\bm{x}\bm{x}^{\top}\mid\mathcal{X}\right].$$

###### Lemma A\.11\.

(Lemma 4, \[[27](https://arxiv.org/html/2606.31449#bib.bib6)\]) Let

$$\bm{W}_{G}=\mathbb{E}_{\mathcal{X}\sim\mathcal{D}}\mathbb{E}_{\bm{x}\sim\pi_{G}(\mathcal{X})}\left[\bm{x}\bm{x}^{\top}\mid\mathcal{X}\right].$$

Then, we have

$$\mathbb{E}_{\mathcal{X}\sim\mathcal{D}}\max_{\bm{x}\in\mathcal{X}}\lVert\bm{x}\rVert^{2}_{\bm{W}_{G}^{-1}}\leq d^{2}.$$

### A.5 Showing Multiplicative Equivalence for B-SlateGLinCB

###### Lemma A\.12\.

Let $\bm{H}_{m}$ and $\bm{H}^{i}_{m}$ be defined as in Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1). Let $|\mathcal{T}_{m}|\geq T(\bm{H}):=\frac{48L_{\mu}^{2}+8L_{\mu}N\rho}{3\rho^{2}}(N-1)^{2}\log\left(\frac{2dNT}{\delta}\right)$. Then, assuming the diversity assumptions (Section [2](https://arxiv.org/html/2606.31449#S2)) hold, with high probability, we have that

$$\frac{1}{4}\textrm{diag}(\bm{H}^{1}_{m},\ldots,\bm{H}^{N}_{m})\preceq\bm{H}_{m}\preceq\frac{7}{4}\textrm{diag}(\bm{H}_{m}^{1},\ldots,\bm{H}_{m}^{N}).$$

Consequently, for any $\bm{x}=(\bm{x}^{1},\ldots,\bm{x}^{N})$, we have that

$$\lVert\bm{x}\rVert_{\bm{H}_{m}^{-1}}\leq 2\sum_{i=1}^{N}\lVert\bm{x}^{i}\rVert_{(\bm{H}^{i}_{m})^{-1}}.$$

###### Proof\.

Define $\overline{\bm{x}}_{t}:=\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}}\,\bm{x}_{t}$ and $\overline{\bm{x}}^{i}_{t}:=\sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}}\,\bm{x}^{i}_{t}$. Then, note that $\lVert\overline{\bm{x}}^{i}_{t}\rVert\leq\sqrt{L_{\mu}N^{-1}}$. Also, we can write

$$\bm{H}_{m}=\lambda\bm{I}_{Nd}+\sum_{t\in\mathcal{T}_{m}}\overline{\bm{x}}_{t}\overline{\bm{x}}_{t}^{\top}\quad\text{and}\quad\bm{H}^{i}_{m}=\lambda\bm{I}+\sum_{t\in\mathcal{T}_{m}}\overline{\bm{x}}^{i}_{t}{\overline{\bm{x}}^{i}_{t}}^{\top}.$$
Also, for the sake of the proof, define $\bm{U}_{m}:=\textrm{diag}(\bm{H}^{1}_{m},\ldots,\bm{H}^{N}_{m})$. Then, using Lemma B.1 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\], we have that

$$\bm{U}_{m}^{-1/2}\bm{H}_{m}\bm{U}_{m}^{-1/2}=\bm{I}_{Nd}+\bm{G}_{m},$$

where $(\bm{G}_{m})_{ij}=\mathbb{1}\{i\neq j\}(\bm{H}^{i}_{m})^{-1/2}\bm{H}_{m}^{(i,j)}(\bm{H}^{j}_{m})^{-1/2}$ and $\bm{H}^{(i,j)}_{m}=\sum_{t\in\mathcal{T}_{m}}\overline{\bm{x}}^{i}_{t}{\overline{\bm{x}}^{j}_{t}}^{\top}$. We now bound the norm of $\bm{H}^{(i,j)}_{m}$ for all $i\in[N]$ and $j\in[i+1,N]$. A straightforward application of Lemma D.2 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] shows that for fixed $i\in[N]$ and $j > i$, with the quantities $m_{1}=m_{2}=\sqrt{L_{\mu}N^{-1}}$, $d_{1}=d_{2}=d$ and $\delta=\frac{2\delta}{N(N-1)}$,

$$\mathbb{P}\left\{\exists t\geq 1:\left\lVert\sum_{s\in[t]}\overline{\bm{x}}^{i}_{s}{\overline{\bm{x}}_{s}^{j}}^{\top}\right\rVert\geq\sqrt{\frac{8L_{\mu}^{2}}{N^{2}}t\log\left(\frac{dN(N-1)}{\delta}\right)}\right\}\leq\frac{2\delta}{N(N-1)}.$$

Taking a union bound over all $i\in[N]$ and $j\in[i+1,N]$ gives us the result for all pairs $(i,j)$ with $j > i$. In particular, setting $t=|\mathcal{T}_{m}|$ gives us the result that for all pairs $(i,j)$ with $j > i$, with high probability

$$\lVert\bm{H}^{(i,j)}_{m}\rVert\leq\sqrt{\frac{8L_{\mu}^{2}}{N^{2}}|\mathcal{T}_{m}|\log\left(\frac{dN(N-1)}{\delta}\right)}.$$
Now, using the diversity conditions, we know that

$$\mathbb{E}[\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\mid\mathcal{F}_{t-1}]\succeq\rho\bm{I}.$$

Using the definition of $\kappa$, we can say that

$$\mathbb{E}[\overline{\bm{x}}^{i}_{t}{\overline{\bm{x}}^{i}_{t}}^{\top}\mid\mathcal{F}_{t-1}]=\mathbb{E}[\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\mid\mathcal{F}_{t-1}]\succeq\rho\kappa^{-1}\beta_{t}^{-1}\bm{I}.$$

Applying Lemma [D.1](https://arxiv.org/html/2606.31449#A4.Thmlemma1) using the quantities $\alpha=\kappa^{-1}\beta_{t}^{-1}\ (\leq 1)$, $m=\sqrt{L_{\mu}N^{-1}}$, $\gamma=\lambda$, $\delta=\frac{\delta}{N}$, and $c=0.5$, for some fixed $i\in[N]$, with probability $1-\frac{\delta}{N}$,

$$\lambda_{\min}\left(\lambda\bm{I}+\sum_{s\in[t]}\overline{\bm{x}}^{i}_{s}{\overline{\bm{x}}_{s}^{i}}^{\top}\right)\geq\lambda+\frac{\rho t}{2}\quad\forall t\geq\frac{48L_{\mu}^{2}+8L_{\mu}N\rho}{3\rho^{2}N^{2}}\log\left(\frac{2dNT}{\delta}\right).$$

Using the fact that $(N-1)^{2}\geq N^{-2}$, we also have that, with probability $1-\frac{\delta}{N}$,

$$\lambda_{\min}\left(\lambda\bm{I}+\sum_{s\in[t]}\overline{\bm{x}}^{i}_{s}{\overline{\bm{x}}_{s}^{i}}^{\top}\right)\geq\lambda+\frac{\rho t}{2}\quad\forall t\geq T(\bm{H}):=\frac{48L_{\mu}^{2}+8L_{\mu}N\rho}{3\rho^{2}}(N-1)^{2}\log\left(\frac{2dNT}{\delta}\right).$$

A union bound over all slots gives us this result for all $i\in[N]$. In particular, let $|\mathcal{T}_{m}|\geq T(\bm{H})$. Then, setting $t=|\mathcal{T}_{m}|$ gives us the result that for all $i\in[N]$, with high probability,

$$\lambda_{\min}\left(\bm{H}^{i}_{m}\right)\geq\lambda+\frac{\rho|\mathcal{T}_{m}|}{2}.$$

Now, for $i\in[N-1]$, define $\bm{Z}_{m}^{i}\in\mathbb{R}^{d\times id}$ as the following matrix: for $j\in[i]$, the $j$-th $d\times d$ block of $\bm{Z}_{m}^{i}$ is given by $(\bm{H}^{N-i}_{m})^{-1/2}\bm{H}^{(N-i,N-i+j)}_{m}(\bm{H}^{N-i+j}_{m})^{-1/2}$.

Then, using Lemma B.7 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\], we have that

$$\begin{aligned}
\lVert\bm{Z}_{m}^{i}\rVert&\leq\sum_{j\in[i]}\frac{\lVert\bm{H}_{m}^{(N-i,N-i+j)}\rVert}{\sqrt{\lambda_{\min}(\bm{H}^{N-i}_{m})\lambda_{\min}(\bm{H}_{m}^{N-i+j})}}\\
&\leq\sum_{j\in[i]}\frac{\sqrt{\frac{8L_{\mu}^{2}}{N^{2}}|\mathcal{T}_{m}|\log\left(\frac{dN(N-1)}{\delta}\right)}}{\lambda+0.5\rho|\mathcal{T}_{m}|}\\
&\leq\sum_{j\in[i]}\sqrt{\frac{32L_{\mu}^{2}\log\left(\frac{dN(N-1)}{\delta}\right)}{N^{2}\rho^{2}|\mathcal{T}_{m}|}}.
\end{aligned}$$

Using the fact that $|\mathcal{T}_{m}|\geq T(\bm{H})$, we get that

$$\lVert\bm{Z}_{m}^{i}\rVert\leq\sum_{j\in[i]}\sqrt{\frac{96L_{\mu}^{2}\log\left(\frac{dN(N-1)}{\delta}\right)}{N^{2}(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}\log\left(\frac{2dNT}{\delta}\right)}}\leq\frac{3i}{2N(N-1)},$$

where the last inequality follows from the fact that $\sqrt{2}\leq\frac{3}{2}$.
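The constants in this step can be sanity-checked numerically. The sketch below evaluates $T(\bm{H})$ and one summand of the bound on $\lVert\bm{Z}_{m}^{i}\rVert$ at the shortest admissible batch length. The function names and the parameter values are hypothetical, chosen only to exercise the formulas.

```python
import math


def batch_threshold(L_mu, N, rho, d, T, delta):
    """T(H) as stated in Lemma A.12."""
    return ((48 * L_mu**2 + 8 * L_mu * N * rho) / (3 * rho**2)
            * (N - 1) ** 2 * math.log(2 * d * N * T / delta))


def block_bound(L_mu, N, rho, d, delta, batch_len):
    """One summand: sqrt(32 L_mu^2 log(dN(N-1)/delta) / (N^2 rho^2 |T_m|))."""
    log_term = math.log(d * N * (N - 1) / delta)
    return math.sqrt(32 * L_mu**2 * log_term / (N**2 * rho**2 * batch_len))


# Hypothetical problem parameters (assumptions, not taken from the paper).
L_mu, N, rho, d, T, delta = 0.25, 4, 0.05, 5, 10**6, 0.05

t_min = batch_threshold(L_mu, N, rho, d, T, delta)
per_block = block_bound(L_mu, N, rho, d, delta, t_min)
target = 3 / (2 * N * (N - 1))  # per-summand target, so that ||Z_m^i|| <= 3i / (2N(N-1))

print(f"T(H) = {t_min:.1f}, per-block bound = {per_block:.4f}, target = {target:.4f}")
```

Longer batches only shrink the per-block bound, since it decays as $|\mathcal{T}_{m}|^{-1/2}$.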

Finally, recall the definition of the matrix $\bm{G}_{m}$:

$$(\bm{G}_{m})_{ij}=\mathbb{1}\{i\neq j\}(\bm{H}^{i}_{m})^{-1/2}\bm{H}_{m}^{(i,j)}(\bm{H}^{j}_{m})^{-1/2}.$$

It is easy to see that we can write $\bm{G}_{m}$ as the following matrix recurrence relation: for $i\in[1,N-1]$, define

$$\bm{G}_{m}^{1}=\begin{bmatrix}\bm{0}&\bm{Z}^{1}_{m}\\ (\bm{Z}^{1}_{m})^{\top}&\bm{0}\end{bmatrix},\quad\bm{G}_{m}^{i}=\begin{bmatrix}\bm{0}&\bm{Z}^{i}_{m}\\ (\bm{Z}^{i}_{m})^{\top}&\bm{G}_{m}^{i-1}\end{bmatrix}.$$

Then, $\bm{G}_{m}=\bm{G}_{m}^{N-1}$. Using Lemma B.2 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] gives us:

$$\lambda_{\max}(\bm{G}_{m})\leq\sum_{i\in[N-1]}\lVert\bm{Z}^{i}_{m}\rVert\leq\frac{3}{2}\sum_{i\in[N-1]}\frac{i}{N(N-1)}=\frac{3}{4},\qquad\lambda_{\min}(\bm{G}_{m})\geq-\sum_{i\in[N-1]}\lVert\bm{Z}^{i}_{m}\rVert\geq-\frac{3}{2}\sum_{i\in[N-1]}\frac{i}{N(N-1)}=-\frac{3}{4},$$

where the final equalities use $\sum_{i\in[N-1]}i=\frac{N(N-1)}{2}$. Substituting $\bm{G}_{m}=\bm{U}_{m}^{-1/2}\bm{H}_{m}\bm{U}_{m}^{-1/2}-\bm{I}_{Nd}$, we get that

$$\frac{1}{4}\bm{U}_{m}\preceq\bm{H}_{m}\preceq\frac{7}{4}\bm{U}_{m}.$$

This finishes the first part of the proof. For the second part, notice that

$$\bm{H}_{m}^{-1}\preceq 4\;\textrm{diag}((\bm{H}^{1}_{m})^{-1},\ldots,(\bm{H}^{N}_{m})^{-1}).$$

Also, note that $\bm{x}=\sum_{i=1}^{N}(\bm{x}^{i}\otimes\bm{e}_{i})$ (Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1)) and hence, an application of the triangle inequality gives us:

$$\lVert\bm{x}\rVert_{\bm{H}_{m}^{-1}}\leq 2\sum_{i\in[N]}\lVert\bm{x}^{i}\otimes\bm{e}_{i}\rVert_{\textrm{diag}\left((\bm{H}^{1}_{m})^{-1},\ldots,(\bm{H}^{N}_{m})^{-1}\right)}\leq 2\sum_{i=1}^{N}\lVert\bm{x}^{i}\rVert_{(\bm{H}^{i}_{m})^{-1}},$$

which finishes the proof for the second part. ∎
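The two conclusions of the lemma can be illustrated on synthetic data. The following is a minimal sketch, not the paper's experiment: it assumes slot features drawn independently and uniformly on a sphere of radius $N^{-1/2}$, sets all weights $\dot{\mu}(\cdot)\beta_{t}^{-1}$ to 1 (which is the unweighted setting of the $\bm{V}_{0}$ matrices), and represents a slate as the concatenation of its slot vectors. All variable names and values are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d, T, lam = 3, 3, 2000, 1.0  # hypothetical sizes and regularizer

# Slot features: independent random directions, scaled so that ||x_t^i|| = N^{-1/2}.
X = rng.normal(size=(T, N, d))
X /= np.linalg.norm(X, axis=2, keepdims=True) * np.sqrt(N)

flat = X.reshape(T, N * d)                  # slate vector = concatenated slot vectors
H = lam * np.eye(N * d) + flat.T @ flat     # slate-level matrix
blocks = [lam * np.eye(d) + X[:, i].T @ X[:, i] for i in range(N)]  # slot-level matrices

# U = diag(H^1, ..., H^N)
U = np.zeros_like(H)
for i, B in enumerate(blocks):
    U[i * d:(i + 1) * d, i * d:(i + 1) * d] = B

# U^{-1/2} via an eigendecomposition of the symmetric positive definite matrix U.
w, V = np.linalg.eigh(U)
U_inv_sqrt = (V / np.sqrt(w)) @ V.T
eigs = np.linalg.eigvalsh(U_inv_sqrt @ H @ U_inv_sqrt)  # spectrum of I + G


def slate_norm(x):
    """||x||_{H^{-1}} for a slate x of shape (N, d)."""
    v = x.reshape(-1)
    return np.sqrt(v @ np.linalg.solve(H, v))


def slot_norm_sum(x):
    """sum_i ||x^i||_{(H^i)^{-1}} for a slate x of shape (N, d)."""
    return sum(np.sqrt(x[i] @ np.linalg.solve(blocks[i], x[i])) for i in range(N))


x_test = X[0]
print(eigs.min(), eigs.max(), slate_norm(x_test), 2 * slot_norm_sum(x_test))
```

With independent slots the cross-slot blocks grow only like $\sqrt{T}$ while the diagonal blocks grow like $T$, so the spectrum of $\bm{I}+\bm{G}$ is expected to sit well inside $[\frac{1}{4},\frac{7}{4}]$ in this toy setting.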

###### Lemma A\.13\.

Let $\bm{V}_{0}$ and $\bm{V}^{i}_{0}$ be defined as in Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1). Let $|\mathcal{T}_{0}|\geq T(\bm{V}):=\frac{48+8N\rho}{3\rho^{2}}(N-1)^{2}\log\left(\frac{2dNT}{\delta}\right)$. Then, assuming the diversity assumptions (Section [2](https://arxiv.org/html/2606.31449#S2)) hold, with high probability, we have that

$$\frac{1}{4}\textrm{diag}(\bm{V}^{1}_{0},\ldots,\bm{V}^{N}_{0})\preceq\bm{V}_{0}\preceq\frac{7}{4}\textrm{diag}(\bm{V}_{0}^{1},\ldots,\bm{V}_{0}^{N}).$$

Consequently, for any $\bm{x}=(\bm{x}^{1},\ldots,\bm{x}^{N})$, we have that

$$\lVert\bm{x}\rVert_{\bm{V}_{0}^{-1}}\leq 2\sum_{i=1}^{N}\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}.$$

###### Proof\.

The proof follows along the same lines as that of Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136). First, we have that $\lVert\bm{x}^{i}_{t}\rVert\leq N^{-1/2}$. Also, for the sake of the proof, define $\bm{W}_{0}:=\textrm{diag}(\bm{V}^{1}_{0},\ldots,\bm{V}^{N}_{0})$. Then, using Lemma B.1 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\], we have that

$$\bm{W}_{0}^{-1/2}\bm{V}_{0}\bm{W}_{0}^{-1/2}=\bm{I}_{Nd}+\bm{G}_{0},$$

where $(\bm{G}_{0})_{ij}=\mathbb{1}\{i\neq j\}(\bm{V}^{i}_{0})^{-1/2}\bm{V}_{0}^{(i,j)}(\bm{V}^{j}_{0})^{-1/2}$ and $\bm{V}^{(i,j)}_{0}=\sum_{t\in\mathcal{T}_{0}}\bm{x}^{i}_{t}{\bm{x}^{j}_{t}}^{\top}$. A straightforward application of Lemma D.2 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] shows that for fixed $i\in[N]$ and $j > i$, with the quantities $m_{1}=m_{2}=N^{-1/2}$, $d_{1}=d_{2}=d$ and $\delta=\frac{2\delta}{N(N-1)}$,

$$\mathbb{P}\left\{\exists t\geq 1:\left\lVert\sum_{s\in[t]}\bm{x}^{i}_{s}{\bm{x}_{s}^{j}}^{\top}\right\rVert\geq\sqrt{\frac{8t}{N^{2}}\log\left(\frac{dN(N-1)}{\delta}\right)}\right\}\leq\frac{2\delta}{N(N-1)}.$$

Taking a union bound over all $i\in[N]$ and $j\in[i+1,N]$ gives us the result for all pairs $(i,j)$ with $j > i$. In particular, setting $t=|\mathcal{T}_{0}|$ gives us the result that for all pairs $(i,j)$ with $j > i$, with high probability

$$\lVert\bm{V}^{(i,j)}_{0}\rVert\leq\sqrt{\frac{8|\mathcal{T}_{0}|}{N^{2}}\log\left(\frac{dN(N-1)}{\delta}\right)}.$$
Now, using the diversity conditions, we know that

$$\mathbb{E}[\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}\mid\mathcal{F}_{t-1}]\succeq\rho\bm{I}.$$

Similar to Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136), we apply Lemma [D.1](https://arxiv.org/html/2606.31449#A4.Thmlemma1) for a fixed $i\in[N]$ using the quantities $\alpha=1$, $m=N^{-1/2}$, $\gamma=\lambda$, $\delta=\frac{\delta}{N}$ and $c=0.5$, then use the fact that $(N-1)^{2}\geq N^{-2}$, and finish with a union bound over all $i\in[N]$. This gives us that for all $i\in[N]$, with high probability,

$$\lambda_{\min}\left(\lambda\bm{I}+\sum_{s\in[t]}\bm{x}^{i}_{s}{\bm{x}_{s}^{i}}^{\top}\right)\geq\lambda+\frac{\rho t}{2}\quad\forall t\geq T(\bm{V}):=\frac{48+8N\rho}{3\rho^{2}}(N-1)^{2}\log\left(\frac{2dNT}{\delta}\right).$$

In particular, let $|\mathcal{T}_{0}|\geq T(\bm{V})$. Then, setting $t=|\mathcal{T}_{0}|$ gives us the result that for all $i\in[N]$, with high probability,

$$\lambda_{\min}\left(\bm{V}^{i}_{0}\right)\geq\lambda+\frac{\rho|\mathcal{T}_{0}|}{2}.$$

Now, for $i\in[N-1]$, define $\bm{Z}_{0}^{i}\in\mathbb{R}^{d\times id}$ as the following matrix: for $j\in[i]$, the $j$-th $d\times d$ block of $\bm{Z}_{0}^{i}$ is given by $(\bm{V}^{N-i}_{0})^{-1/2}\bm{V}^{(N-i,N-i+j)}_{0}(\bm{V}^{N-i+j}_{0})^{-1/2}$.

Then, using Lemma B.7 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] and following a similar approach to that shown in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136), we have that

$$\lVert\bm{Z}_{0}^{i}\rVert\leq\sum_{j\in[i]}\sqrt{\frac{96\log\left(\frac{dN(N-1)}{\delta}\right)}{N^{2}(48+8N\rho)(N-1)^{2}\log\left(\frac{2dNT}{\delta}\right)}}\leq\frac{3i}{2N(N-1)}.$$
Writing $\bm{G}_{0}$ as a matrix recurrence relation as in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136) and using Lemma B.2 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] gives:

$$\lambda_{\max}(\bm{G}_{0})\leq\frac{3}{4}\quad\text{and}\quad\lambda_{\min}(\bm{G}_{0})\geq-\frac{3}{4}.$$

Substituting $\bm{G}_{0}=\bm{W}_{0}^{-1/2}\bm{V}_{0}\bm{W}_{0}^{-1/2}-\bm{I}_{Nd}$, we get that

$$\frac{1}{4}\bm{W}_{0}\preceq\bm{V}_{0}\preceq\frac{7}{4}\bm{W}_{0}.$$

This finishes the proof for the first part. The second part of the proof is exactly the same as in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136). ∎

### A.6 Other Relevant Lemmas

###### Lemma A\.14\.

Let $X^{\alpha}\geq k^{\prime}\log(kX)$, where $k,k^{\prime} > 0$, $k^{\alpha}k^{\prime}\geq\alpha e$, and $\alpha\in(0,1]$. Also, assume $X\geq k^{-1}\exp(\alpha^{-1})$. Then, we have that

$$X\geq\frac{1}{k}\exp\left(-\frac{1}{\alpha}W_{-1}\left(-\frac{\alpha}{k^{\alpha}k^{\prime}}\right)\right),$$

where $W_{-1}(\cdot)$ denotes the decreasing branch of the Lambert W function, defined as

$$W_{-1}:[-1/e,0)\to(-\infty,-1],\quad W_{-1}(xe^{x})=x\ \ \forall\ x\leq-1.$$

###### Proof\.

Define the function $f(x)=xe^{x}$. Note that $f^{\prime}(x)=e^{x}(x+1)$, and hence, $f^{\prime}$ is non-negative for $x\geq-1$ and non-positive for $x\leq-1$. In other words, $f$ is increasing on $[-1,\infty)$ and decreasing on $(-\infty,-1]$.

Now, let us consider the function $f^{-1}(x)$. $f^{-1}$ is increasing on the domain $[f(-1),\lim_{x\rightarrow\infty}f(x))=[-e^{-1},\infty)$; we denote this branch of $f^{-1}$ as $W_{0}$. The other branch $W_{-1}$ is decreasing on the domain $[f(-1),\lim_{x\rightarrow-\infty}f(x))=[-e^{-1},0)$.

Now, rearranging the terms of $X^{\alpha}\geq k^{\prime}\log(kX)$ and dividing both sides by $k^{\alpha}$ gives us

$$\frac{1}{k^{\alpha}k^{\prime}}\geq\exp(-\alpha\log(kX))\log(kX).$$

Setting $Y=-\alpha\log(kX)$, we get

$$-\frac{\alpha}{k^{\alpha}k^{\prime}}\leq Y\cdot\exp(Y).$$

Now, to apply the function $W_{-1}$ to both sides of the inequality, we require $-\alpha(k^{\alpha}k^{\prime})^{-1}\in[-e^{-1},0)$ and $Y\exp(Y)\in[-e^{-1},0)$, and more particularly, $Y\leq-1$.

First, sincek,k′\>0k,k^\{\\prime\}\>0, we have that−α​\(kα​k′\)−1<0\-\\alpha\(k^\{\\alpha\}k^\{\\prime\}\)^\{\-1\}<0\. Also, sincekα​k′≥α​ek^\{\\alpha\}k^\{\\prime\}\\geq\\alpha e, we have that−α​\(kα​k′\)−1≥−e−1\-\\alpha\(k^\{\\alpha\}k^\{\\prime\}\)^\{\-1\}\\geq\-e^\{\-1\}\. Thus, we have that−α​\(kα​k′\)−1∈\[−e−1,0\)\-\\alpha\(k^\{\\alpha\}k^\{\\prime\}\)^\{\-1\}\\in\[\-e^\{\-1\},0\)\.

For the second requirement, note thatY​exp⁡\(Y\)∈\[−e−1,0\)Y\\exp\(Y\)\\in\[\-e^\{\-1\},0\)is satisfied for allY<0Y<0\. However, since the range ofW−1W\_\{\-1\}is\(−∞,−1\]\(\-\\infty,\-1\], we have thatW−1​\(Y​exp⁡\(Y\)\)=YW\_\{\-1\}\(Y\\exp\(Y\)\)=Yif and only ifY≤−1Y\\leq\-1\. The conditionX≥k−1​exp⁡\(α−1\)X\\geq k^\{\-1\}\\exp\(\\alpha^\{\-1\}\)ensuresY≤−1Y\\leq\-1, thus, satisfying the second requirement\.

Thus, applyingW−1W\_\{\-1\}to both sides of the inequality, and using the fact thatW−1W\_\{\-1\}is decreasing gives us:

W−1​\(−αkα​k′\)≥Y=−α​log⁡\(k​X\)\.W\_\{\-1\}\\left\(\-\\frac\{\\alpha\}\{k^\{\\alpha\}k^\{\\prime\}\}\\right\)\\geq Y=\-\\alpha\\log\(kX\)\.Rearranging once again results in

X≥1k​exp⁡\(−1α​W−1​\(−αkα​k′\)\)\.X\\geq\\frac\{1\}\{k\}\\exp\\left\(\-\\frac\{1\}\{\\alpha\}W\_\{\-1\}\\left\(\-\\frac\{\\alpha\}\{k^\{\\alpha\}k^\{\\prime\}\}\\right\)\\right\)\.∎
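The bound can be sanity-checked numerically. The sketch below is illustrative only and not part of the paper: it evaluates $W_{-1}$ by bisection on $w e^{w} = x$ over $w \leq -1$, and the helper names `lambert_w_minus1` and `lemma_threshold`, as well as the sample values of $\alpha$, $k$ and $k'$, are assumptions chosen for the example.

```python
import math


def lambert_w_minus1(x, iters=200):
    """Decreasing branch W_{-1}(x) for x in [-1/e, 0), found by bisection.

    On (-inf, -1], f(w) = w * exp(w) decreases from 0 (as w -> -inf) to -1/e.
    """
    if not (-1.0 / math.e <= x < 0.0):
        raise ValueError("x must lie in [-1/e, 0)")
    lo, hi = -2.0, -1.0
    # Move the left end further left until f(lo) >= x.
    while lo * math.exp(lo) < x:
        lo *= 2.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if mid * math.exp(mid) > x:
            lo = mid  # f is decreasing, so the root lies to the right of mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def lemma_threshold(alpha, k, k_prime):
    """Right-hand side of the lemma: (1/k) * exp(-W_{-1}(-alpha / (k^alpha k')) / alpha)."""
    arg = -alpha / (k ** alpha * k_prime)
    return math.exp(-lambert_w_minus1(arg) / alpha) / k


# Hypothetical sample values satisfying k^alpha * k' >= alpha * e.
alpha, k, k_prime = 0.5, 2.0, 3.0
threshold = lemma_threshold(alpha, k, k_prime)
print(threshold, threshold ** alpha, k_prime * math.log(k * threshold))
```

At the threshold the two sides of $X^{\alpha} \geq k'\log(kX)$ coincide, which is what the last line prints for comparison.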

## Appendix B B-SlateGLinCB with Distributional Optimal Designs

In this section, we first present an alternate version of B-SlateGLinCB that uses the Distributional Optimal Design [[27](https://arxiv.org/html/2606.31449#bib.bib6)] instead of the G-Optimal design. We then state the regret guarantee for this algorithm and prove it.

**Algorithm 3** B-SlateGLinCB with Distributional Optimal Designs

1. **Inputs:** Number of slots $N$, number of batches $M$, horizon $T$, parameter norm bound $S$, failure level $\delta$, and non-linearity $\kappa$.
2. Initialize $\bm{\theta}_{0} = \bm{0}_{Nd}$ and $\lambda = \mathcal{O}\left(NdR^{2}\log(T/\delta)\right)$, and define batches $\mathcal{T}_{0}, \mathcal{T}_{1}, \ldots, \mathcal{T}_{M}$ as per ([4](https://arxiv.org/html/2606.31449#S3.E4)).
3. {Warmup Batch}
4. **for** $t \in \mathcal{T}_{0}$ **do**
5. Receive the set of items $\mathcal{X}^{i}_{t}$ for all slots $i \in [N]$.
6. Play the slate $\bm{x}_{t} = (\bm{x}^{1}_{t}, \ldots, \bm{x}^{N}_{t})$, where $\bm{x}^{i}_{t} \sim \pi_{G}(\mathcal{X}^{i}_{t})$ (defined in Section [2.7](https://arxiv.org/html/2606.31449#S2.SS7)), and obtain $r_{t}$.
7. **end for**
8. Compute $\widehat{\bm{\theta}}_{0} = \operatorname*{arg\,min}_{\bm{\theta}} \sum_{t\in\mathcal{T}_{0}} \ell(\bm{x}_{t}, r_{t}, \bm{\theta})$ as per ([3](https://arxiv.org/html/2606.31449#S2.E3)) and $\bm{V}^{i}_{0} = \lambda\bm{I}_{d} + \sum_{t\in\mathcal{T}_{0}} \bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}$ for all slots $i \in [N]$. // Other batches
9. **for** $m \in [M]$ **do**
10. **for** $t \in \mathcal{T}_{m}$ **do**
11. Receive the set of items $\mathcal{X}^{i}_{t}$ for all slots $i \in [N]$.
12. **for** $i \in [N]$ **do**
13. **for** $k \in [0, m-1]$ **do**
14. $\mathcal{X}^{i}_{t} \leftarrow \{\bm{z} \in \mathcal{X}^{i}_{t} : \textrm{UCB}^{i,k}(\bm{z}) \geq \max_{\bm{y}\in\mathcal{X}^{i}_{t}} \textrm{LCB}^{i,k}(\bm{y})\}$ {Perform elimination}.
15. **end for**
16. **end for**
17. Play the slate $\bm{x}_{t} = (\bm{x}^{1}_{t}, \ldots, \bm{x}^{N}_{t})$, where $\bm{x}^{i}_{t} \sim \pi^{i}_{m}(\mathcal{X}^{i}_{t})$, and obtain $r_{t}$. Construct the slate $\bm{b}_{t} = (\bm{b}^{1}_{t}, \ldots, \bm{b}^{N}_{t})$, where $\bm{b}^{i}_{t} = \operatorname*{arg\,max}_{\bm{z}\in\mathcal{X}^{i}_{t}} \lVert\bm{z}\rVert_{(\bm{V}^{i}_{0})^{-1}}$.
18. **end for**
19. Divide $\mathcal{T}_{m}$ into two sets $\mathcal{T}_{m,A}$ and $\mathcal{T}_{m,B}$ such that $\mathcal{T}_{m,A} \cap \mathcal{T}_{m,B} = \emptyset$ and $\mathcal{T}_{m,A} \cup \mathcal{T}_{m,B} = \mathcal{T}_{m}$.
20. Using ([3](https://arxiv.org/html/2606.31449#S2.E3)) and ([9](https://arxiv.org/html/2606.31449#S3.E9)), compute $\widehat{\bm{\theta}}_{m} = \operatorname*{arg\,min}_{\bm{\theta}} \sum_{t\in\mathcal{T}_{m,A}} \ell(\bm{x}_{t}, r_{t}, \bm{\theta})$ and $\bm{H}^{i}_{m} = \lambda\bm{I}_{d} + \sum_{t\in\mathcal{T}_{m,A}} \frac{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{\beta_{t}} \bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}$ for all $i \in [N]$.
21. Compute $\pi^{i}_{m+1}$ (Algorithm 3, [[27](https://arxiv.org/html/2606.31449#bib.bib6)]) with the set $\{\mathcal{X}^{i}_{t}\}_{t\in\mathcal{T}_{m,B}}$ for all $i \in [N]$.
22. **end for**
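The per-slot elimination in line 14 of Algorithm 3 can be illustrated with a small sketch. It is a simplified illustration rather than the authors' implementation: the function names, the explicit inverse matrix `A_inv` (standing in for $(\bm{V}^{i}_{0})^{-1}$ or $(\bm{H}^{i}_{k})^{-1}$), and the single scalar `width` (standing in for $2\sqrt{\kappa}\gamma$ or $2\gamma$) are assumptions made for the example.

```python
import math


def mahalanobis_norm(z, A_inv):
    """Return sqrt(z^T A_inv z), with A_inv given as a list of rows."""
    d = len(z)
    quad = sum(z[a] * A_inv[a][b] * z[b] for a in range(d) for b in range(d))
    return math.sqrt(max(quad, 0.0))


def eliminate(items, theta_hat, A_inv, width):
    """Keep the items whose UCB is at least the largest LCB among all items."""
    scored = []
    for z in items:
        mean = sum(zi * ti for zi, ti in zip(z, theta_hat))
        bonus = width * mahalanobis_norm(z, A_inv)
        scored.append((z, mean + bonus, mean - bonus))  # (item, UCB, LCB)
    best_lcb = max(lcb for _, _, lcb in scored)
    return [z for z, ucb, _ in scored if ucb >= best_lcb]


# Hypothetical slot with three items in d = 2 and a well-explored design matrix.
items = [[1.0, 0.0], [0.0, 1.0], [0.1, 0.1]]
theta_hat = [1.0, 0.0]
A_inv = [[0.01, 0.0], [0.0, 0.01]]
survivors = eliminate(items, theta_hat, A_inv, width=1.0)
print(survivors)
```

Running this filter once per slot and per earlier batch touches each item a bounded number of times, which is consistent with the per-round cost being polynomial in $N$ rather than growing with the number of slates.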

###### Theorem B\.1\.

Let $R(T)$ denote the regret of Algorithm [3](https://arxiv.org/html/2606.31449#alg3). If

$$\sqrt{\frac{2dN}{\delta}}\,\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}} \geq \frac{e}{2}$$

and

$$T \geq T_{0} := \frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}\,(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}\right)\right),$$

where $W_{-1}$ denotes the decreasing branch of the Lambert W function (see Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169)), then

$$R(T) = \tilde{\mathcal{O}}\left(RSNd\sqrt{T}\cdot\min\left\{\sqrt{d\cdot\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})},\ \sqrt{N\cdot\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\right\}\right),$$

where $\bm{x}_{\star} = \operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}} \bm{x}^{\top}\bm{\theta}^{\star}$ and $\mathcal{X} = \mathcal{X}^{1}\times\ldots\times\mathcal{X}^{N}$.

###### Proof\.

The proof follows the same lines as the proof of Theorem [A.1](https://arxiv.org/html/2606.31449#A1.Thmtheorem1). However, the use of distributional optimal designs changes the way we bound the regret for batches $m \geq 2$ (Lemma [B.1](https://arxiv.org/html/2606.31449#A2.Thmlemma1)). Define the optimal slate $\bm{x}_{t,\star}$ as

$$\bm{x}_{t,\star} = \operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}} \bm{x}^{\top}\bm{\theta}^{\star}.$$

The regret of Algorithm [3](https://arxiv.org/html/2606.31449#alg3) can be written as

$$R(T) \leq \mathbb{E}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\left[\sum_{m\in[M]}\sum_{t\in\mathcal{T}_{m}}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}) - \mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert\right].$$

As in the proof of Theorem [A.1](https://arxiv.org/html/2606.31449#A1.Thmtheorem1), we choose the batch lengths as

$$|\mathcal{T}_{0}| = \lfloor\sqrt{T}\rfloor \quad\text{ and }\quad |\mathcal{T}_{m}| = \lfloor T^{1-2^{-m}}\rfloor,\ m \geq 1.$$

Also, for $T \geq T_{0}$, the conditions of Lemma [B.1](https://arxiv.org/html/2606.31449#A2.Thmlemma1) are satisfied. Thus, we use Lemma [B.1](https://arxiv.org/html/2606.31449#A2.Thmlemma1) for batches $m \geq 2$ and a trivial regret bound of $R$ for each round $t \in \mathcal{T}_{0}\cup\mathcal{T}_{1}$ to obtain

$$\begin{aligned}
R(T) \leq{}& R\left(|\mathcal{T}_{0}| + |\mathcal{T}_{1}|\right) + \sum_{m\in[2,M]} \frac{320e^{3S}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}}{S}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}} \\
&+ \sum_{m\in[2,M]} 8\gamma\min\left\{\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{16d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|} + \frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}},\ 4N\sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})\cdot d\log d}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}}\right\}.
\end{aligned}$$

Substituting the values of $|\mathcal{T}_{m}|$, $\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}}$, and $\frac{|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|}$ from Theorem [A.1](https://arxiv.org/html/2606.31449#A1.Thmtheorem1), and using $|\mathcal{T}_{m}| \leq T$ and $|\mathcal{T}_{0}| = \lfloor\sqrt{T}\rfloor \geq \sqrt{T}/2$, we get

$$\begin{aligned}
R(T) \leq{}& 320\sqrt{2}\,e^{3S}S^{-1}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}\,T^{1/4}\log\log T + 2R\sqrt{T} \\
&+ 8\gamma\min\left\{\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{16d^{2}N + 4L_{\mu}N\rho^{-1}},\ 4N\sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})\cdot d\log d}\right\}\sqrt{T}\log\log T.
\end{aligned}$$

Substituting the value of $\gamma = \mathcal{O}\left(SR\sqrt{Nd\log(T\delta^{-1})}\right)$ from Lemma [A.1](https://arxiv.org/html/2606.31449#A1.Ex42) gives us

$$R(T) = \tilde{\mathcal{O}}\left(RSNd\sqrt{T}\cdot\min\left\{\sqrt{d\cdot\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})},\ \sqrt{N\cdot\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\right\}\right).$$
∎
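To see why the batch schedule yields a $\sqrt{T}$ rate, note that for $m \geq 2$ the ratio $|\mathcal{T}_{m}|/\sqrt{|\mathcal{T}_{m-1}|}$ is roughly $T^{1-2^{-m}}/T^{(1-2^{-(m-1)})/2} = \sqrt{T}$. The sketch below computes the nominal lengths and this ratio; the function name `batch_lengths` and the sample values of $T$ and $M$ are assumptions for illustration, and no truncation of the final batch to fit the horizon is modelled.

```python
import math


def batch_lengths(T, M):
    """Nominal lengths |T_0| = floor(sqrt(T)) and |T_m| = floor(T^(1 - 2^-m)) for m = 1..M."""
    lengths = [math.floor(math.sqrt(T))]
    for m in range(1, M + 1):
        lengths.append(math.floor(T ** (1.0 - 2.0 ** (-m))))
    return lengths


# Hypothetical horizon and number of batches.
T, M = 10 ** 6, 3
lengths = batch_lengths(T, M)
for m in range(2, M + 1):
    ratio = lengths[m] / math.sqrt(lengths[m - 1])
    print(m, lengths[m], ratio, math.sqrt(T))
```

Each printed ratio stays close to $\sqrt{T}$, so summing over batches contributes only the factor that appears as $\log\log T$ in the bound above.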

### B.1 Supporting Lemmas for Theorem [B.1](https://arxiv.org/html/2606.31449#A2.Thmtheorem1)

###### Lemma B\.1\.

At round $t \in \mathcal{T}_{m}$, let $\bm{x}_{t,\star}$ be the optimal slate, i.e.,

$$\bm{x}_{t,\star} = \operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}} \bm{x}^{\top}\bm{\theta}^{\star}.$$

If $|\mathcal{T}_{0}| \geq T(\bm{V})$ and $|\mathcal{T}_{k}| \geq T(\bm{H})$ for all $1 \leq k \leq m-1$, then we have

$$\begin{aligned}
&\sum_{t\in\mathcal{T}_{m}} \mathbb{E}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}\}_{i=1}^{N}}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}) - \mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert \leq \frac{320e^{3S}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}}{S}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}} \\
&\quad+ 8\gamma\min\left\{\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{16d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|} + \frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}},\ 4N\sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})\cdot d\log d}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}}\right\},
\end{aligned}$$

where $\bm{x}_{\star} = \operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}} \bm{x}^{\top}\bm{\theta}^{\star}$ and $\mathcal{X} = \mathcal{X}^{1}\times\ldots\times\mathcal{X}^{N}$.

###### Proof\.

The proof follows the lines of Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9). Define $\breve{\bm{x}} := \sqrt{\dot{\mu}(\bm{b}_{t}^{\top}\widehat{\bm{\theta}}_{0})\beta_{t}^{-1}}\,\bm{x}$ and abbreviate $\mathbb{E}_{\{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}\}_{i=1}^{N}}[\cdot]$ as $\mathbb{E}_{t,m}[\cdot]$. Then, using Lemma [A.8](https://arxiv.org/html/2606.31449#A1.Ex88), we wish to bound the following two terms (excluding constants):

1. $\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)\left(\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right)$
2. $\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)$

Using Lemma [A.10](https://arxiv.org/html/2606.31449#A1.Ex132) and Lemma [B.2](https://arxiv.org/html/2606.31449#A2.Thmlemma2) in tandem, we get

$$\mathbb{E}_{\mathcal{X}^{i}\sim\mathcal{D}^{i}}\max_{\bm{z}\in\mathcal{X}^{i}}\lVert\bm{z}\rVert^{2}_{(\bm{V}_{0}^{i})^{-1}} \leq \frac{8d^{2}}{|\mathcal{T}_{0}|}, \qquad \mathbb{E}_{\mathcal{X}^{i}\sim\mathcal{D}_{m}^{i}}\max_{\bm{z}\in\mathcal{X}^{i}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}_{m}^{i})^{-1}} \leq \sqrt{\frac{8d\log d}{|\mathcal{T}_{m}|}}.$$

Also, since the Distributional Optimal Design samples according to a G-Optimal Design with half probability (see Lemma A.14, [[28](https://arxiv.org/html/2606.31449#bib.bib4)]), we have

$$\mathbb{E}_{\mathcal{X}^{i}\sim\mathcal{D}_{m}^{i}}\max_{\bm{z}\in\mathcal{X}^{i}}\lVert\breve{\bm{z}}\rVert^{2}_{(\bm{H}_{m}^{i})^{-1}} \leq \frac{16d^{2}}{|\mathcal{T}_{m}|}.$$

Using the same steps as Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9) to bound the first term, and substituting the optimal design bounds above, results in

$$\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\sum_{i=1}^{N}\lVert\bm{b}^{i}_{t}\rVert_{(\bm{V}_{0}^{i})^{-1}}\right) \leq \frac{12N^{2}d^{2}\sqrt{L_{\mu}}\,|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}}.$$
The second term can be upper-bounded in two different ways. First, using the Cauchy-Schwarz inequality allows us to bound the second term with respect to the average reward sensitivity of the optimal slates, similar to Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9), i.e.,

$$\begin{aligned}
&\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}} \\
&\quad\leq \sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sum_{t\in\mathcal{T}_{m}}\sqrt{\mathbb{E}_{t,m}\left(\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}}\right)^{2}} \\
&\quad\leq \sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{16d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|} + \frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}},
\end{aligned}$$

where $\bm{x}_{\star} = \operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}} \bm{x}^{\top}\bm{\theta}^{\star}$ and $\mathcal{X} = \mathcal{X}^{1}\times\ldots\times\mathcal{X}^{N}$. Here, the second inequality follows from Lemma [A.9](https://arxiv.org/html/2606.31449#A1.Thmlemma9).

On the other hand, to leverage the advantage of the Distributional Optimal Design, we can avoid using the Cauchy-Schwarz inequality, resulting in

$$\begin{aligned}
&\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}} \\
&\quad\leq \sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sum_{t\in\mathcal{T}_{m}}\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}} \\
&\quad\leq \sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sum_{t\in\mathcal{T}_{m}}\sum_{i=1}^{N}\mathbb{E}_{\mathcal{X}^{i}_{t}\sim\mathcal{D}^{i}_{m-1}}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\breve{\bm{z}}\rVert_{(\bm{H}^{i}_{m-1})^{-1}} \\
&\quad\leq 4N\sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})\cdot d\log d}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}},
\end{aligned}$$

where the second inequality follows from Lemma [A.7](https://arxiv.org/html/2606.31449#A1.Ex85) and the final inequality follows from the optimal design bound given above.

Using Lemma [A.8](https://arxiv.org/html/2606.31449#A1.Ex88) and assembling all the bounds, we get that

$$\begin{aligned}
&\sum_{t\in\mathcal{T}_{m}}\mathbb{E}_{t,m}\left\lvert\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}) - \mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\right\rvert \leq \frac{320e^{3S}\gamma^{2}N^{2}d^{2}\sqrt{\kappa L_{\mu}}}{S}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}||\mathcal{T}_{0}|}} \\
&\quad+ 8\gamma\min\left\{\sqrt{\mathbb{E}_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})}\sqrt{\frac{16d^{2}N|\mathcal{T}_{m}|^{2}}{|\mathcal{T}_{m-1}|} + \frac{4L_{\mu}N|\mathcal{T}_{m}|}{\rho}},\ 4N\sqrt{\max_{\{\mathcal{X}^{i}\sim\mathcal{D}^{i}\}_{i=1}^{N}}\dot{\mu}(\bm{x}_{\star}^{\top}\bm{\theta}^{\star})\cdot d\log d}\,\frac{|\mathcal{T}_{m}|}{\sqrt{|\mathcal{T}_{m-1}|}}\right\}.
\end{aligned}$$
∎

###### Lemma B\.2\.

(Theorem 5, [[27](https://arxiv.org/html/2606.31449#bib.bib6)]) Let $\pi$ denote the distributional optimal design that has been learnt using $N$ i.i.d. samples and let

$$\bm{W} = \mathbb{E}_{\mathcal{X}\sim\mathcal{D}}\,\mathbb{E}_{\bm{x}\sim\pi(\mathcal{X})}\,\bm{x}\bm{x}^{\top}.$$

Then, we have that

$$\Pr\left[\mathbb{E}_{\mathcal{X}\sim\mathcal{D}}\max_{\bm{x}\in\mathcal{X}}\lVert\bm{x}\rVert_{\bm{W}^{-1}} \leq O(d\log d)\right] \geq 1-\delta,$$

where $\delta = \exp\left(O(d^{4}\log^{2}d) - Nd^{-12}\cdot 2^{-16}\right)$.

## Appendix C Regret Analysis for RS-SlateGLinCB

In this section, we state and prove the regret bound for RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)).

### C.1 Notations

First, define the following scalars:

$$\lambda = \mathcal{O}\left(R\sqrt{NdS\log(ST\delta^{-1})}\right), \quad \gamma = \mathcal{O}\left(R^{2}S^{3/2}\sqrt{Nd\log(ST\delta^{-1})}\right), \quad \beta = \mathcal{O}\left(R\sqrt{NdS\log(ST\delta^{-1})}\right).$$
Similar to Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1), define $\tilde{\bm{x}}^{i} = \bm{x}^{i}\otimes\bm{e}_{i}$ so that any slate $\bm{x} = (\bm{x}^{1}, \ldots, \bm{x}^{N})$ can be written as

$$\bm{x} = \sum_{i=1}^{N}\tilde{\bm{x}}^{i}.$$
Define the set of warm-up rounds as $\mathcal{T}_{0}$. Then, we can define the warm-up matrix $\bm{V}$ and the corresponding slot-level warm-up matrices $\bm{V}^{i}$ for all $i \in [N]$ as

1. $\bm{V} = \lambda\bm{I}_{Nd} + \sum\limits_{t\in\mathcal{T}_{0}}\bm{x}_{t}\bm{x}_{t}^{\top}$.
2. $\bm{V}^{i} = \lambda\bm{I}_{d} + \sum\limits_{t\in\mathcal{T}_{0}}\bm{x}^{i}_{t}{\bm{x}^{i}_{t}}^{\top}$.

Define the set of indices $\mathcal{T}_{\neg 0} := [|\mathcal{T}_{0}|+1, T]$ to be the set of all time rounds post warm-up. In particular, define the set $\mathcal{T}^{<t}_{\neg 0} := [|\mathcal{T}_{0}|+1, t-1]$. For round $t \in \mathcal{T}_{\neg 0}$, define the Hessian of the GLM-MLE loss $\bm{H}^{\star}_{t}$ as

$$\bm{H}_{t}^{\star} = \lambda\bm{I}_{Nd} + \sum_{k\in\mathcal{T}^{<t}_{\neg 0}}\dot{\mu}(\bm{x}_{k}^{\top}\bm{\theta}^{\star})\,\bm{x}_{k}\bm{x}_{k}^{\top}.$$

Since $\bm{\theta}^{\star}$ is unknown, we estimate the Hessian using a scaled design matrix $\bm{H}_{t}$, defined as

$$\bm{H}_{t} = \lambda\bm{I}_{Nd} + \sum_{k\in\mathcal{T}^{<t}_{\neg 0}}\dot{\mu}(\bm{x}_{k}^{\top}\widehat{\bm{\theta}}_{0})\,e^{-1}\,\bm{x}_{k}\bm{x}_{k}^{\top}.$$

Also, for all $i \in [N]$, we define the slot-level scaled matrices $\bm{H}^{i}_{t}$ as

$$\bm{H}_{t}^{i} = \lambda\bm{I}_{d} + \sum_{k\in\mathcal{T}^{<t}_{\neg 0}}\dot{\mu}(\bm{x}_{k}^{\top}\widehat{\bm{\theta}}_{0})\,e^{-1}\,\bm{x}^{i}_{k}{\bm{x}^{i}_{k}}^{\top}.$$
For any time round $t \in \mathcal{T}_{\neg 0}$, define $s(t) \leq t$ to be the last time round at which an estimate $\widehat{\bm{\theta}}_{s(t)}$ was computed. Hence, we have

$$\det\bm{H}_{t} < 2\det\bm{H}_{s(t)}.$$

When $s(t) = t$, the two determinants coincide, i.e., $\det\bm{H}_{t} = \det\bm{H}_{s(t)}$. This happens if $\det\bm{H}_{t} \geq 2\det\bm{H}_{s(t-1)}$, which leads to an update at round $t$.
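The determinant-doubling rule can be sketched with rank-one updates. The code below is an illustration under simplifying assumptions, not the authors' implementation: a generic positive weight `c` stands in for $\dot{\mu}(\bm{x}_{k}^{\top}\widehat{\bm{\theta}}_{0})e^{-1}$, the function names are hypothetical, and the determinant is tracked in log-space via the identity $\det(\bm{H} + c\,\bm{x}\bm{x}^{\top}) = \det(\bm{H})(1 + c\,\bm{x}^{\top}\bm{H}^{-1}\bm{x})$ together with the Sherman-Morrison formula.

```python
import math


def sherman_morrison(H_inv, x, c):
    """Return (H + c x x^T)^{-1} and the ratio det(H + c x x^T) / det(H), for symmetric H."""
    d = len(x)
    Hx = [sum(H_inv[a][b] * x[b] for b in range(d)) for a in range(d)]
    ratio = 1.0 + c * sum(x[a] * Hx[a] for a in range(d))
    new_inv = [[H_inv[a][b] - c * Hx[a] * Hx[b] / ratio for b in range(d)]
               for a in range(d)]
    return new_inv, ratio


def count_switches(xs, weights, lam):
    """Count rounds t at which det H_t has at least doubled since the last update."""
    d = len(xs[0])
    H_inv = [[(1.0 / lam if a == b else 0.0) for b in range(d)] for a in range(d)]
    log_det = d * math.log(lam)      # log det of lam * I
    log_det_last = log_det           # log det H at the last update
    switches = 0
    for x, c in zip(xs, weights):
        # H_t only contains rounds before t, so test before adding x_t.
        if log_det >= log_det_last + math.log(2.0):
            switches += 1
            log_det_last = log_det
        H_inv, ratio = sherman_morrison(H_inv, x, c)
        log_det += math.log(ratio)
    return switches, log_det


# Hypothetical stream: 1000 rounds alternating between the two basis vectors.
xs = [[1.0, 0.0] if t % 2 == 0 else [0.0, 1.0] for t in range(1000)]
switches, final_log_det = count_switches(xs, [1.0] * len(xs), lam=1.0)
print(switches, final_log_det / math.log(2.0))
```

Since every update requires the determinant to at least double, the number of updates is at most $\log_{2}(\det\bm{H}_{T}/\det(\lambda\bm{I}))$, which is the mechanism behind the logarithmic number of parameter updates.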

Finally, similar to Section [A.1](https://arxiv.org/html/2606.31449#A1.SS1), for any slot $i \in [N]$, item $\bm{z} \in \mathcal{X}^{i}_{t}$ and a prior batch $l \in [M]$, we define the scores $\textrm{UCB}^{i,l}(\bm{z})$ and $\textrm{LCB}^{i,l}(\bm{z})$ as

$$\textrm{UCB}^{i,l}(\bm{z}) = \begin{cases}\bm{z}^{\top}\widehat{\bm{\theta}}_{0}^{i} + 2\sqrt{\kappa}\,\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}} & l = 0,\\ \bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i} + 2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}} & l \neq 0,\end{cases}$$

$$\textrm{LCB}^{i,l}(\bm{z}) = \begin{cases}\bm{z}^{\top}\widehat{\bm{\theta}}_{0}^{i} - 2\sqrt{\kappa}\,\gamma\lVert\bm{z}\rVert_{(\bm{V}_{0}^{i})^{-1}} & l = 0,\\ \bm{z}^{\top}\widehat{\bm{\theta}}_{l}^{i} - 2\gamma\lVert\bm{z}\rVert_{(\bm{H}_{l}^{i})^{-1}} & l \neq 0.\end{cases}$$

Finally, define the following quantities:

$$T(\neg 0) := \frac{(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right) \quad\text{ and }\quad T(0) := \frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right).$$

Unless otherwise mentioned, without loss of generality, we assume that all constants such as $S, T, R, N, d, \kappa$ and $L_{\mu}$ are greater than $1$.

### C.2 Regret Guarantee for RS-SlateGLinCB

Now, we restate the regret guarantee for RS-SlateGLinCB, given in Theorem [4.1](https://arxiv.org/html/2606.31449#S4.Thmtheorem1), and provide a proof.

###### Theorem C\.1\.

Let $R(T)$ denote the regret of RS-SlateGLinCB (Algorithm [2](https://arxiv.org/html/2606.31449#alg2)). If

$$\sqrt{\frac{2dN}{\delta}}\,\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}} \geq \frac{e}{2} \quad\text{ and }\quad \frac{16R^{6}S^{7/2}N^{2}\kappa d}{\sqrt{\delta}\rho} \geq e$$

and

$$T \geq T_{0} := \max\left\{\frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}\,(48L_{\mu}^{2}+8NL_{\mu}\rho)(N-1)^{2}}\right)\right),\ \frac{\delta}{S}\exp\left(-2W_{-1}\left(\frac{-\sqrt{\delta}\rho}{32R^{6}S^{7/2}N^{2}\kappa d}\right)\right)\right\},$$

where $W_{-1}$ is the decreasing branch of the Lambert W function (see Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169)), then

$$R(T) = \tilde{\mathcal{O}}\left(R\sqrt{T} + RS^{1/2}Nd\sqrt{\sum_{t\in\mathcal{T}_{\neg 0}}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\right).$$

###### Proof\.

At round $t\in[T]$, let $\bm{x}_{t,\star}$ be the optimal slate, i.e.,

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

Then, the regret of Algorithm [2](https://arxiv.org/html/2606.31449#alg2) can be written as

$$R(T)=\sum_{t\in[T]}\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star}).$$

Lemma [C.7](https://arxiv.org/html/2606.31449#A3.Ex284) bounds the per-round terms of this sum. However, it requires that $|\mathcal{T}_{0}|\geq\max\{T(0),8\gamma^{2}R^{2}\rho^{-1}\kappa N\}$. We set $|\mathcal{T}_{0}|=\lfloor\sqrt{T}\rfloor$, resulting in the inequality

$$\sqrt{T}\geq\max\left\{\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right),\ 8\gamma^{2}R^{2}\rho^{-1}\kappa N\right\}.$$

Since

$$\sqrt{\frac{2dN}{\delta}}\,\frac{(48+8N\rho)(N-1)^{2}}{3\rho^{2}}\geq\frac{e}{2}\quad\text{and}\quad\frac{16R^{6}S^{7/2}N^{2}\kappa d}{\sqrt{\delta}\rho}\geq e,$$

and also

$$T\geq 2\geq\frac{\delta}{S}\exp\left(\frac{1}{2}\right)\quad\text{and}\quad T\geq 2\geq\frac{\delta}{2dN}\exp\left(\frac{1}{2}\right),$$

we can use Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169) and the definition of $\gamma$ (Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1)) to get

$$T\geq\max\left\{\frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}\,(48+8N\rho)(N-1)^{2}}\right)\right),\ \frac{\delta}{S}\exp\left(-2W_{-1}\left(\frac{-\sqrt{\delta}\rho}{32R^{6}S^{7/2}N^{2}\kappa d}\right)\right)\right\}.$$

Lemma [C.7](https://arxiv.org/html/2606.31449#A3.Ex284) also requires $|\mathcal{T}^{<t}_{\neg 0}|\geq T(\neg 0)$. Now, let $t'$ be such that $|\mathcal{T}^{<t'}_{\neg 0}|=\lfloor\sqrt{T}\rfloor$. Such a $t'$ exists because $T\geq 2$ and hence $T\geq|\mathcal{T}_{0}|+|\mathcal{T}^{<t'}_{\neg 0}|=2\lfloor\sqrt{T}\rfloor$. Requiring $|\mathcal{T}^{<t'}_{\neg 0}|\geq T(\neg 0)$, we have

$$\sqrt{T}\geq\frac{(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}{3\rho^{2}}\log\left(\frac{2dNT}{\delta}\right),$$

resulting in the bound (using Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169))

$$T\geq\frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}\,(48L_{\mu}^{2}+8NL_{\mu}\rho)(N-1)^{2}}\right)\right).$$

Using the fact that both $\exp(-x)$ and $W_{-1}(x)$ are decreasing (on the domain $(-e^{-1},0)$; see Lemma [A.14](https://arxiv.org/html/2606.31449#A1.Ex169)) and $L_{\mu}\geq 1$, we get the final bound on $T$ as

$$T\geq T_{0}:=\max\left\{\frac{\delta}{2dN}\exp\left(-2W_{-1}\left(\frac{-3\rho^{2}\sqrt{\delta}}{2\sqrt{2dN}\,(48L_{\mu}^{2}+8NL_{\mu}\rho)(N-1)^{2}}\right)\right),\ \frac{\delta}{S}\exp\left(-2W_{-1}\left(\frac{-\sqrt{\delta}\rho}{32R^{6}S^{7/2}N^{2}\kappa d}\right)\right)\right\}.$$

Now, assuming $T\geq T_{0}$, we can split $R(T)$ as

$$\begin{aligned}
R(T)&=\sum_{t\in\mathcal{T}_{0}}\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})+\sum_{t\in\mathcal{T}^{<t'}_{\neg 0}}\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})+\sum_{t=t'}^{T}\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\\
&\leq 2R\sqrt{T}+\sum_{t=t'}^{T}\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star}),
\end{aligned}$$

where we use a trivial regret bound of $R$ for $t\in\mathcal{T}_{0}\cup\mathcal{T}^{<t'}_{\neg 0}$ alongside the fact that $|\mathcal{T}_{0}|=|\mathcal{T}^{<t'}_{\neg 0}|=\lfloor\sqrt{T}\rfloor$.

Now, using Lemma [C.7](https://arxiv.org/html/2606.31449#A3.Ex284), we have

$$\begin{aligned}
\sum_{t=t'}^{T}\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})
&\leq 4\sqrt{2}e^{5}\beta\sum_{t=t'}^{T}\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\cdot e^{-1}\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}\sum_{i=1}^{N}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}\\
&\leq 4\sqrt{2}e^{5}\beta\sqrt{\sum_{t=t'}^{T}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\cdot\sum_{t=t'}^{T}\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}\left(\sum_{i=1}^{N}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}\right)^{2}}\\
&\leq 4\sqrt{2}e^{5}\beta\sqrt{\sum_{t=t'}^{T}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\cdot\left(\underbrace{\sum_{i=1}^{N}\sum_{t=t'}^{T}\left\lVert\sqrt{\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}}\,\bm{x}^{i}_{t}\right\rVert^{2}_{(\bm{H}^{i}_{t})^{-1}}}_{\text{Term A}}+\underbrace{\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\sum_{t=t'}^{T}\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}\lVert\bm{x}^{j}_{t}\rVert_{(\bm{H}^{j}_{t})^{-1}}}_{\text{Term B}}\right)}.
\end{aligned}$$

Bounding Term A, we get

$$\sum_{i=1}^{N}\sum_{t=t'}^{T}\left\lVert\sqrt{\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}}\,\bm{x}^{i}_{t}\right\rVert^{2}_{(\bm{H}^{i}_{t})^{-1}}\leq 2Nd\log\left(1+\frac{L_{\mu}\cdot t}{e\cdot N\lambda d}\right)\leq 2Nd\log T.$$

Here, we use Lemma [C.11](https://arxiv.org/html/2606.31449#A3.Ex316) with the vectors $\breve{\bm{x}}^{i}_{t}:=\sqrt{\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}}\,\bm{x}^{i}_{t}$, which satisfy $\lVert\breve{\bm{x}}^{i}_{t}\rVert\leq\sqrt{L_{\mu}e^{-1}\cdot N^{-1}}$.

Bounding Term B, we get

$$\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\sum_{t=t'}^{T}\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}\lVert\bm{x}^{j}_{t}\rVert_{(\bm{H}^{j}_{t})^{-1}}\leq\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\sum_{t=t'}^{T}L_{\mu}\frac{\lVert\bm{x}^{i}_{t}\rVert\lVert\bm{x}^{j}_{t}\rVert}{\sqrt{\lambda_{\min}(\bm{H}^{i}_{t})\lambda_{\min}(\bm{H}^{j}_{t})}},$$

where the inequality uses Rayleigh's quotient. Since, for all $t\in[t',T]$, $|\mathcal{T}^{<t}_{\neg 0}|\geq|\mathcal{T}^{<t'}_{\neg 0}|\geq T(\neg 0)$, Lemma [C.9](https://arxiv.org/html/2606.31449#A3.Ex304) gives, for all $i\in[N]$,

$$\lambda_{\min}(\bm{H}^{i}_{t})\geq\lambda+\frac{\rho|\mathcal{T}^{<t}_{\neg 0}|}{2}.$$

Substituting this back, we get

$$\begin{aligned}
\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\sum_{t=t'}^{T}\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}\lVert\bm{x}^{j}_{t}\rVert_{(\bm{H}^{j}_{t})^{-1}}
&\leq\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\sum_{t=t'}^{T}L_{\mu}\frac{\lVert\bm{x}^{i}_{t}\rVert\lVert\bm{x}^{j}_{t}\rVert}{\lambda+0.5\rho|\mathcal{T}^{<t}_{\neg 0}|}\\
&\leq\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\frac{2L_{\mu}}{\rho}\lVert\bm{x}^{i}_{t}\rVert\lVert\bm{x}^{j}_{t}\rVert\left(\sum_{s\in[|\mathcal{T}_{\neg 0}|]}\frac{1}{s}\right).
\end{aligned}$$

Using the sum of the harmonic series, alongside the fact that $\lVert\bm{x}^{i}\rVert\leq\sqrt{N^{-1}}$ for all $i\in[N]$, we get a bound on Term B as

$$\sum_{i=1}^{N}\sum_{\substack{j=1\\ j\neq i}}^{N}\sum_{t=t'}^{T}\frac{\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}{e}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}\lVert\bm{x}^{j}_{t}\rVert_{(\bm{H}^{j}_{t})^{-1}}\leq 2NL_{\mu}\rho^{-1}\log T.$$

Combining all the bounds, we get

$$R(T)\leq 2R\sqrt{T}+4\sqrt{2}e^{5}\beta\sqrt{\left(\sum_{t=t'}^{T}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\right)\cdot\left(2Nd+2NL_{\mu}\rho^{-1}\right)\log T}.$$

Substituting the value of $\beta=\mathcal{O}\left(R\sqrt{NdS\log(ST\delta^{-1})}\right)$ from Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1), we get

$$R(T)=\tilde{\mathcal{O}}\left(R\sqrt{T}+RS^{1/2}Nd\sqrt{\sum_{t\in\mathcal{T}_{\neg 0}}\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})}\right).$$
∎
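The threshold $T_{0}$ above has the shape one obtains by solving $\sqrt{T}\geq a\log(bT)$ for $T$: with $c=2a\sqrt{b}\geq e$, the crossing point is $T=b^{-1}\exp(-2W_{-1}(-1/c))$. The sketch below checks this numerically. It is an illustration only: the helper names `lambert_w_minus1` and `horizon_threshold`, the bisection solver, and the constants `a = 5`, `b = 10` are assumptions made here, and the reading of Lemma A.14 is inferred from the displayed inequalities rather than from the lemma's statement.

```python
import math


def lambert_w_minus1(x: float) -> float:
    """Decreasing real branch W_{-1}(x) for -1/e <= x < 0, found by bisection."""
    if not (-1.0 / math.e <= x < 0.0):
        raise ValueError("W_{-1} is real only for -1/e <= x < 0")
    c = -1.0 / x  # c >= e on the allowed domain

    # If v >= 1 solves v = log(c * v), then w = -v satisfies w * exp(w) = x.
    def g(v: float) -> float:
        return v - math.log(c * v)

    lo, hi = 1.0, 2.0
    while g(hi) <= 0.0:  # g is non-decreasing on [1, inf), so doubling brackets the root
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if g(mid) <= 0.0:
            lo = mid
        else:
            hi = mid
    return -0.5 * (lo + hi)


def horizon_threshold(a: float, b: float) -> float:
    """Value of T at which sqrt(T) = a * log(b * T) on the large-T side."""
    c = 2.0 * a * math.sqrt(b)
    if c < math.e:
        raise ValueError("sqrt(T) >= a*log(b*T) already holds for every T > 0")
    return math.exp(-2.0 * lambert_w_minus1(-1.0 / c)) / b


if __name__ == "__main__":
    a, b = 5.0, 10.0  # illustrative constants, not taken from the paper
    T0 = horizon_threshold(a, b)
    # The two printed quantities should agree at the threshold.
    print(T0, math.sqrt(T0), a * math.log(b * T0))
```

In the proof, $a$ and $b$ would be read off from the displayed conditions, for example $a=\frac{(48L_{\mu}^{2}+8L_{\mu}N\rho)(N-1)^{2}}{3\rho^{2}}$ and $b=\frac{2dN}{\delta}$ for the first term of the maximum.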

### C.3 Supporting Lemmas for Theorem [C.1](https://arxiv.org/html/2606.31449#A3.Ex222)

###### Lemma C\.1\.

(Lemmas B.3 and B.7, \[[28](https://arxiv.org/html/2606.31449#bib.bib4)\]) Let $\widehat{\bm{\theta}}_{0}$ be the MLE of $\bm{\theta}^{\star}$ learned using $\{\bm{x}_{t},r_{t}\}_{t\in\mathcal{T}_{0}}$, and let $\widehat{\bm{\theta}}_{s(t)}$ be the MLE of $\bm{\theta}^{\star}$ learned using $\{\bm{x}_{t},r_{t}\}_{t\in\mathcal{T}^{<s(t)}_{\neg 0}}$. Let $\bm{H}^{\star}_{t}$ be defined as in Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1). Also, define

$$\bm{H}_{0}^{\star}=\lambda\bm{I}_{Nd}+\sum_{t\in\mathcal{T}_{0}}\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\bm{x}_{t}\bm{x}_{t}^{\top}.$$

Then, we have

$$\lVert\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0}\rVert_{\bm{H}^{\star}_{0}}\leq\gamma\quad\text{and}\quad\lVert\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{s(t)}\rVert_{\bm{H}^{\star}_{s(t)}}\leq\beta.$$

###### Lemma C\.2\.

Let $t\in\mathcal{T}_{\neg 0}$, and let $|\mathcal{T}_{0}|\geq\max\{T(0),8\gamma^{2}R^{2}\rho^{-1}\kappa N\}$. Then we have

$$\left\lvert\bm{x}_{t}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})\right\rvert\leq\frac{1}{R}.$$

###### Proof\.

For the sake of this proof, define

$$\bm{H}_{0}^{\star}=\lambda\bm{I}_{Nd}+\sum_{t\in\mathcal{T}_{0}}\dot{\mu}(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\bm{x}_{t}\bm{x}_{t}^{\top}.$$

Then, we have

$$\begin{aligned}
\left\lvert\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})\right\rvert
&\leq\lVert\bm{x}\rVert_{(\bm{H}_{0}^{\star})^{-1}}\lVert\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0}\rVert_{\bm{H}^{\star}_{0}}\\
&\leq\gamma\sqrt{\kappa}\lVert\bm{x}\rVert_{\bm{V}^{-1}}\\
&\leq 2\gamma\sqrt{\kappa}\sum_{i=1}^{N}\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i})^{-1}}\\
&\leq 2\gamma\sqrt{\kappa}\sum_{i=1}^{N}\frac{\lVert\bm{x}^{i}\rVert}{\sqrt{\lambda_{\min}(\bm{V}^{i})}},
\end{aligned}$$

where the first inequality uses the Cauchy-Schwarz inequality, the second inequality follows from Lemma [C.1](https://arxiv.org/html/2606.31449#A3.Ex247) and Lemma [A.2](https://arxiv.org/html/2606.31449#A1.Ex43), the third inequality follows from Lemma [C.10](https://arxiv.org/html/2606.31449#A3.Ex314), and the final inequality follows from Rayleigh's quotient.

Now, from Lemma [C.10](https://arxiv.org/html/2606.31449#A3.Ex314), we have that

$$\lambda_{\min}(\bm{V}^{i})\geq\lambda+\frac{\rho|\mathcal{T}_{0}|}{2}.$$

Thus, using the fact that $\lVert\bm{x}^{i}\rVert\leq\sqrt{N^{-1}}$, we have

$$\left\lvert\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})\right\rvert\leq 2\gamma\sqrt{\frac{\kappa N}{\lambda+0.5\rho|\mathcal{T}_{0}|}}.$$

Finally, using the fact that $|\mathcal{T}_{0}|\geq 8\gamma^{2}R^{2}\rho^{-1}\kappa N$, we get that

$$\left\lvert\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})\right\rvert\leq\frac{1}{R}.$$

∎
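As a worked check of the last substitution in this proof (assuming $\lambda\geq 0$, which is taken here to hold for the regularization parameter): the condition $|\mathcal{T}_{0}|\geq 8\gamma^{2}R^{2}\rho^{-1}\kappa N$ gives $0.5\rho|\mathcal{T}_{0}|\geq 4\gamma^{2}R^{2}\kappa N$, so

$$2\gamma\sqrt{\frac{\kappa N}{\lambda+0.5\rho|\mathcal{T}_{0}|}}\leq 2\gamma\sqrt{\frac{\kappa N}{4\gamma^{2}R^{2}\kappa N}}=\frac{2\gamma}{2\gamma R}=\frac{1}{R}.$$

The same substitution with a leading constant of $4\gamma$ instead of $2\gamma$ yields $2/R$, which is the form used for the bound in Lemma C.6.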

###### Lemma C\.3\.

For some round $t\in\mathcal{T}_{\neg 0}$, let $\bm{H}^{\star}_{t}$ and $\bm{H}_{t}$ be defined as in Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1). If $|\mathcal{T}_{0}|\geq\max\{T(0),8\gamma^{2}R^{2}\rho^{-1}\kappa N\}$, then we have

$$\bm{H}_{t}\preceq\bm{H}^{\star}_{t}.$$

###### Proof\.

Using the self-concordance property of GLMs, we have that

$$\exp\left(-R\left\lvert\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})\right\rvert\right)\cdot\dot{\mu}(\bm{x}^{\top}\widehat{\bm{\theta}}_{0})\leq\dot{\mu}(\bm{x}^{\top}\bm{\theta}^{\star})\leq\exp\left(R\left\lvert\bm{x}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0})\right\rvert\right)\cdot\dot{\mu}(\bm{x}^{\top}\widehat{\bm{\theta}}_{0}).$$

From Lemma [C.2](https://arxiv.org/html/2606.31449#A3.Ex248), we get

$$e^{-1}\cdot\dot{\mu}(\bm{x}^{\top}\widehat{\bm{\theta}}_{0})\leq\dot{\mu}(\bm{x}^{\top}\bm{\theta}^{\star})\leq e\cdot\dot{\mu}(\bm{x}^{\top}\widehat{\bm{\theta}}_{0}).$$

Hence, we can write

$$\bm{H}_{t}^{\star}=\lambda\bm{I}_{Nd}+\sum_{s\in\mathcal{T}^{<t}_{\neg 0}}\dot{\mu}(\bm{x}_{s}^{\top}\bm{\theta}^{\star})\bm{x}_{s}\bm{x}_{s}^{\top}\succeq\lambda\bm{I}_{Nd}+\sum_{s\in\mathcal{T}^{<t}_{\neg 0}}\dot{\mu}(\bm{x}_{s}^{\top}\widehat{\bm{\theta}}_{0})e^{-1}\bm{x}_{s}\bm{x}_{s}^{\top}=\bm{H}_{t}.$$

∎

###### Lemma C\.4\.

For some round $t\in\mathcal{T}_{\neg 0}$, let $s(t)\leq t$ be the most recent round at which the policy was updated. Then, we have that

$$\lVert\bm{x}\rVert^{2}_{\bm{H}_{s(t)}^{-1}}\leq 2\lVert\bm{x}\rVert^{2}_{\bm{H}_{t}^{-1}}.$$

###### Proof\.

First, from the definition of $s(t)$, we have that $s(t)\leq t$. Thus, we have that

$$|\mathcal{T}^{<s(t)}_{\neg 0}|\leq|\mathcal{T}_{\neg 0}^{<t}|\implies\bm{H}_{s(t)}\preceq\bm{H}_{t}\implies\bm{H}_{s(t)}^{-1}\succeq\bm{H}_{t}^{-1}.$$

Hence, applying Lemma [C.12](https://arxiv.org/html/2606.31449#A3.Ex317) with $\bm{A}=\bm{H}_{s(t)}^{-1}$ and $\bm{B}=\bm{H}_{t}^{-1}$, we get that

$$\lVert\bm{x}\rVert^{2}_{\bm{H}^{-1}_{s(t)}}\leq\lVert\bm{x}\rVert^{2}_{\bm{H}^{-1}_{t}}\cdot\frac{\det\bm{H}_{s(t)}^{-1}}{\det\bm{H}_{t}^{-1}}.$$

Using the fact that $\det\bm{H}_{t}\leq 2\det\bm{H}_{s(t)}$ (Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1)), we get that

$$\lVert\bm{x}\rVert^{2}_{\bm{H}^{-1}_{s(t)}}\leq\lVert\bm{x}\rVert^{2}_{\bm{H}^{-1}_{t}}\cdot\frac{\det\bm{H}_{t}}{\det\bm{H}_{s(t)}}\leq 2\lVert\bm{x}\rVert^{2}_{\bm{H}^{-1}_{t}}.$$

∎
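The invariant $\det\bm{H}_{t}\leq 2\det\bm{H}_{s(t)}$ used in this proof is what a determinant-doubling switching rule maintains. The sketch below is a hypothetical illustration of such a rule, not the paper's Algorithm 2 (whose exact switching criterion is in Section C.1 and is not reproduced here): it assumes the policy is refreshed whenever the determinant has more than doubled since the last refresh, and it uses plain vectors `v` as stand-ins for the weighted slate features. The names `rank_one_update` and `count_switches` are made up for this example.

```python
import math
import random


def rank_one_update(H_inv, log_det, v):
    """Return (H^{-1}, log det H) after the update H <- H + v v^T."""
    n = len(v)
    Hv = [sum(H_inv[r][c] * v[c] for c in range(n)) for r in range(n)]
    quad = sum(v[r] * Hv[r] for r in range(n))  # v^T H^{-1} v
    # Sherman-Morrison formula for the inverse of a rank-one update.
    new_inv = [
        [H_inv[r][c] - Hv[r] * Hv[c] / (1.0 + quad) for c in range(n)]
        for r in range(n)
    ]
    # Matrix determinant lemma: det(H + v v^T) = det(H) * (1 + v^T H^{-1} v).
    return new_inv, log_det + math.log1p(quad)


def count_switches(vectors, lam=1.0):
    """Count refreshes under the assumed rule: refresh when det(H) has more than doubled."""
    n = len(vectors[0])
    H_inv = [[(1.0 / lam if r == c else 0.0) for c in range(n)] for r in range(n)]
    log_det = n * math.log(lam)  # H starts as lam * I
    log_det_at_switch = log_det
    switches = 0
    for v in vectors:
        H_inv, log_det = rank_one_update(H_inv, log_det, v)
        if log_det > log_det_at_switch + math.log(2.0):
            switches += 1
            log_det_at_switch = log_det
    return switches


if __name__ == "__main__":
    random.seed(0)
    n, T = 3, 2000
    # Synthetic vectors with norm at most 1 (illustrative data only).
    vecs = [[random.uniform(-1, 1) / math.sqrt(n) for _ in range(n)] for _ in range(T)]
    print(count_switches(vecs))
```

Because each refresh more than doubles the determinant and $\det\bm{H}_{T}\leq(\lambda+T/n)^{n}$ for vectors of norm at most 1, this toy rule refreshes at most $n\log_{2}(1+T/(n\lambda))$ times, which mirrors the logarithmic switch count stated in Lemma C.8.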

###### Lemma C\.5\.

Let $t\in\mathcal{T}_{\neg 0}$. Define the optimal slate $\bm{x}_{t,\star}$ as

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

If $|\mathcal{T}_{0}|\geq T(0)$, then the optimal slate never gets eliminated.

###### Proof\.

First, it is easy to see that all the components of the optimal slate, i.e., $\bm{x}^{i}_{t,\star}$, are also optimal w.r.t. ${\bm{\theta}^{\star}}^{i}$. In other words, for all $i\in[N]$, we have

$$\bm{x}_{t,\star}^{i}=\operatorname*{arg\,max}_{\bm{z}\in\mathcal{X}^{i}_{t}}\tilde{\bm{z}}^{\top}\bm{\theta}^{\star}.$$

Fix $i\in[N]$. Then, for an arbitrary $\bm{z}\in\mathcal{X}^{i}_{t}$, we have that

$$\begin{aligned}
0&\leq\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\bm{\theta}^{\star}\\
&=\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\left(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{0}\right)+\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\widehat{\bm{\theta}}_{0}\\
&\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}_{t,\star}\rVert_{(\bm{V}^{i})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}^{i})^{-1}}+\left(\tilde{\bm{x}}^{i}_{t,\star}-\tilde{\bm{z}}\right)^{\top}\widehat{\bm{\theta}}_{0}\\
&=\textrm{UCB}^{i,0}(\bm{x}^{i}_{t,\star})-\textrm{LCB}^{i,0}(\bm{z}),
\end{aligned}$$

where the second inequality follows from Lemma [A.2](https://arxiv.org/html/2606.31449#A1.Ex43). Since this is true for all $\bm{z}\in\mathcal{X}^{i}_{t}$, we get

$$\textrm{UCB}^{i,0}(\bm{x}^{i}_{t,\star})\geq\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{z}),$$

or in other words,

$$\textrm{UCB}^{i,0}(\bm{x}^{i}_{t,\star})-\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{z})\geq 0.$$

Since this holds for a fixed but arbitrary $i\in[N]$, the above inequality holds for all $i\in[N]$. Thus, the components of the optimal slate, and hence the optimal slate itself, never get eliminated. ∎
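The per-slot elimination test used in this argument can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the confidence bounds are taken to be $\textrm{UCB}^{i,0}(\bm{z})=\bm{z}^{\top}\widehat{\bm{\theta}}^{i}_{0}+2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}^{i})^{-1}}$ and $\textrm{LCB}^{i,0}(\bm{z})=\bm{z}^{\top}\widehat{\bm{\theta}}^{i}_{0}-2\sqrt{\kappa}\gamma\lVert\bm{z}\rVert_{(\bm{V}^{i})^{-1}}$, a form inferred from the inequalities in the proofs of Lemmas C.5 and C.6 (the definitions themselves are in Section C.1). The function names and the toy inputs are hypothetical.

```python
import math


def weighted_norm(z, V_inv):
    """Compute ||z||_{V^{-1}} = sqrt(z^T V^{-1} z) for a dense matrix V_inv."""
    n = len(z)
    return math.sqrt(sum(z[r] * V_inv[r][c] * z[c] for r in range(n) for c in range(n)))


def surviving_items(items, theta_hat, V_inv, kappa, gamma):
    """Keep the items of one slot whose UCB is at least the largest LCB in that slot."""

    def width(z):
        # Assumed confidence width: 2 * sqrt(kappa) * gamma * ||z||_{V^{-1}}.
        return 2.0 * math.sqrt(kappa) * gamma * weighted_norm(z, V_inv)

    def score(z):
        return sum(a * b for a, b in zip(z, theta_hat))

    best_lcb = max(score(z) - width(z) for z in items)
    return [z for z in items if score(z) + width(z) >= best_lcb]


if __name__ == "__main__":
    items = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy item features for one slot
    theta_hat = [1.0, 0.0]                        # toy slot-level estimate
    tight = [[0.01, 0.0], [0.0, 0.01]]            # small V^{-1}: narrow confidence widths
    loose = [[1.0, 0.0], [0.0, 1.0]]              # large V^{-1}: wide confidence widths
    print(surviving_items(items, theta_hat, tight, kappa=1.0, gamma=1.0))
    print(surviving_items(items, theta_hat, loose, kappa=1.0, gamma=1.0))
```

With narrow widths only the best-scoring item survives, while with wide widths nothing can be ruled out; in either case the item with the largest LCB always survives, since its UCB is at least its own LCB.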

###### Lemma C\.6\.

Let $t\in\mathcal{T}_{\neg 0}$. Let $\bm{x},\bm{y}\in\mathcal{X}_{t}$ be two slates which do not get eliminated. If $|\mathcal{T}_{0}|\geq\max\{T(0),8\gamma^{2}R^{2}\rho^{-1}\kappa N\}$, then we have

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\leq\frac{2}{R}.$$

###### Proof\.

Using the triangle inequality, we can write

$$\left\lvert(\bm{x}-\bm{y})^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\leq\sum_{i=1}^{N}\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\right\rvert.$$

Now, since both $\bm{x}$ and $\bm{y}$ survive the elimination, their respective components $\bm{x}^{i}$ and $\bm{y}^{i}$ for all $i\in[N]$ also do not get eliminated. Thus, for a fixed $i\in[N]$, we have

$$\textrm{UCB}^{i,0}(\bm{x}^{i})\geq\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\textrm{LCB}^{i,0}(\bm{z}^{i})\geq\textrm{LCB}^{i,0}(\bm{y}^{i}).$$

Using the definitions of $\textrm{UCB}^{i,0}(\bm{z})$ and $\textrm{LCB}^{i,0}(\bm{z})$ (Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1)), we get

$$(\bm{y}^{i}-\bm{x}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{y}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}.$$

A symmetric argument gives us

$$(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\leq 2\sqrt{\kappa}\gamma\lVert\bm{x}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}+2\sqrt{\kappa}\gamma\lVert\bm{y}^{i}\rVert_{(\bm{V}^{i}_{0})^{-1}}.$$

Thus, combining both the inequalities, we get

$$\begin{aligned}
\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\right\rvert
&\leq 4\sqrt{\kappa}\gamma\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\lVert\bm{z}\rVert_{(\bm{V}^{i}_{0})^{-1}}\\
&\leq 4\gamma\sqrt{\kappa}\sum_{i=1}^{N}\max_{\bm{z}\in\mathcal{X}^{i}_{t}}\frac{\lVert\bm{z}\rVert}{\sqrt{\lambda_{\min}(\bm{V}^{i})}},
\end{aligned}$$

where the final inequality follows from Rayleigh's quotient. Now, from Lemma [C.10](https://arxiv.org/html/2606.31449#A3.Ex314), we have that

$$\lambda_{\min}(\bm{V}^{i})\geq\lambda+\frac{\rho|\mathcal{T}_{0}|}{2}.$$

Thus, using the fact that $\lVert\bm{z}\rVert\leq\sqrt{N^{-1}}$, we have

$$\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\right\rvert\leq 4\gamma\sqrt{\frac{\kappa N}{\lambda+0.5\rho|\mathcal{T}_{0}|}}.$$

Finally, using the fact that $|\mathcal{T}_{0}|\geq 8\gamma^{2}R^{2}\rho^{-1}\kappa N$, we get that

$$\left\lvert(\bm{x}^{i}-\bm{y}^{i})^{\top}\widehat{\bm{\theta}}^{i}_{0}\right\rvert\leq\frac{2}{R}.$$

∎

###### Lemma C\.7\.

Let $t\in\mathcal{T}_{\neg 0}$. Define the optimal slate $\bm{x}_{t,\star}$ as

$$\bm{x}_{t,\star}=\operatorname*{arg\,max}_{\bm{x}\in\mathcal{X}_{t}}\bm{x}^{\top}\bm{\theta}^{\star}.$$

Let $|\mathcal{T}_{0}|\geq\max\{T(0),8\gamma^{2}R^{2}\rho^{-1}\kappa N\}$ and $|\mathcal{T}^{<t}_{\neg 0}|\geq T(\neg 0)$. Then

$$\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\leq 4\sqrt{2}e^{5}\beta\sqrt{\dot{\mu}(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})\cdot e^{-1}\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})}\sum_{i=1}^{N}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}.$$

###### Proof\.

Using a first-order Taylor series expansion, for some $z_{t}\in[\bm{x}_{t}^{\top}\bm{\theta}^{\star},\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}]$, we get

$$\begin{aligned}
\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})
&\leq\dot{\mu}(z_{t})\left(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\bm{\theta}^{\star}\right)\\
&\leq\dot{\mu}(z_{t})\left[\left\lvert\bm{x}_{t,\star}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{s(t)})\right\rvert+\left\lvert\bm{x}_{t}^{\top}(\bm{\theta}^{\star}-\widehat{\bm{\theta}}_{s(t)})\right\rvert+(\bm{x}_{t,\star}-\bm{x}_{t})^{\top}\widehat{\bm{\theta}}_{s(t)}\right]\\
&\leq\dot{\mu}(z_{t})\left[\beta\lVert\bm{x}_{t,\star}\rVert_{\bm{H}_{s(t)}^{-1}}+\beta\lVert\bm{x}_{t}\rVert_{\bm{H}_{s(t)}^{-1}}+(\bm{x}_{t,\star}-\bm{x}_{t})^{\top}\widehat{\bm{\theta}}_{s(t)}\right]\\
&\leq\dot{\mu}(z_{t})\left[\sqrt{2}\beta\lVert\bm{x}_{t,\star}\rVert_{\bm{H}_{t}^{-1}}+\sqrt{2}\beta\lVert\bm{x}_{t}\rVert_{\bm{H}_{t}^{-1}}+(\bm{x}_{t,\star}-\bm{x}_{t})^{\top}\widehat{\bm{\theta}}_{s(t)}\right]\\
&\leq\dot{\mu}(z_{t})\sum_{i=1}^{N}\left[2\sqrt{2}\beta\lVert\bm{x}^{i}_{t,\star}\rVert_{(\bm{H}^{i}_{t})^{-1}}+2\sqrt{2}\beta\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}+(\bm{x}^{i}_{t,\star}-\bm{x}^{i}_{t})^{\top}\widehat{\bm{\theta}}^{i}_{s(t)}\right],
\end{aligned}$$

where the third inequality follows by applying the Cauchy-Schwarz inequality and Lemma [C.1](https://arxiv.org/html/2606.31449#A3.Ex247) in tandem, the second-to-last inequality follows from Lemma [C.4](https://arxiv.org/html/2606.31449#A3.Ex261), and the final inequality follows from Lemma [C.9](https://arxiv.org/html/2606.31449#A3.Ex304).

Now, since $\bm{x}_{t}$ is chosen, for some fixed slot $i\in[N]$, from *Step 17* of Algorithm [2](https://arxiv.org/html/2606.31449#alg2), we have

$$(\bm{x}^{i}_{t})^{\top}\widehat{\bm{\theta}}^{i}_{s(t)}+2\sqrt{2}\beta\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}_{t}^{i})^{-1}}\geq(\bm{x}^{i}_{t,\star})^{\top}\widehat{\bm{\theta}}^{i}_{s(t)}+2\sqrt{2}\beta\lVert\bm{x}^{i}_{t,\star}\rVert_{(\bm{H}_{t}^{i})^{-1}}.$$

Rearranging and summing over all $i\in[N]$, we get

$$\sum_{i=1}^{N}(\bm{x}^{i}_{t,\star}-\bm{x}^{i}_{t})^{\top}\widehat{\bm{\theta}}^{i}_{s(t)}\leq 2\sqrt{2}\beta\sum_{i=1}^{N}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}_{t}^{i})^{-1}}-2\sqrt{2}\beta\sum_{i=1}^{N}\lVert\bm{x}^{i}_{t,\star}\rVert_{(\bm{H}_{t}^{i})^{-1}}.$$

Substituting this back, we get

$$\mu(\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star})-\mu(\bm{x}_{t}^{\top}\bm{\theta}^{\star})\leq 4\sqrt{2}\beta\dot{\mu}(z_{t})\sum_{i=1}^{N}\lVert\bm{x}^{i}_{t}\rVert_{(\bm{H}^{i}_{t})^{-1}}.$$

Now, using the self-concordance property of GLMs, we have that

$$\dot{\mu}(z_{t})\leq\dot{\mu}(\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0})\exp\left(R\left\lvert z_{t}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\right).$$

We now bound $\left\lvert z_{t}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert$ as follows:

$$\begin{aligned}
\left\lvert z_{t}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert
&\leq\left\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert+\left\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-z_{t}\right\rvert\\
&\leq\left\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert+\left\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}\right\rvert\\
&\leq\left\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert+\left\lvert\bm{x}_{t}^{\top}\bm{\theta}^{\star}-\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert+\left\lvert\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}-\bm{x}_{t,\star}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert+\left\lvert\bm{x}_{t}^{\top}\widehat{\bm{\theta}}_{0}-\bm{x}_{t,\star}^{\top}\widehat{\bm{\theta}}_{0}\right\rvert\\
&\leq\frac{5}{R},
\end{aligned}$$

where the second inequality follows from the fact that $z_{t}\in[\bm{x}_{t}^{\top}\bm{\theta}^{\star},\bm{x}_{t,\star}^{\top}\bm{\theta}^{\star}]$ and the final inequality follows from Lemma [C.2](https://arxiv.org/html/2606.31449#A3.Ex248), Lemma [C.5](https://arxiv.org/html/2606.31449#A3.Thmlemma5), and Lemma [C.6](https://arxiv.org/html/2606.31449#A3.Ex273).

Similarly, we have

$$
\begin{aligned}
\left\lvert z_t - \bm{x}_{t,\star}^\top\bm{\theta}^\star\right\rvert
&\leq \left\lvert \bm{x}_t^\top\bm{\theta}^\star - \bm{x}_{t,\star}^\top\bm{\theta}^\star\right\rvert \\
&\leq \left\lvert \bm{x}_t^\top\bm{\theta}^\star - \bm{x}_t^\top\widehat{\bm{\theta}}_0\right\rvert + \left\lvert \bm{x}_{t,\star}^\top\bm{\theta}^\star - \bm{x}_{t,\star}^\top\widehat{\bm{\theta}}_0\right\rvert + \left\lvert \bm{x}_t^\top\widehat{\bm{\theta}}_0 - \bm{x}_{t,\star}^\top\widehat{\bm{\theta}}_0\right\rvert \\
&\leq \frac{4}{R}.
\end{aligned}
$$
Thus, we get

$$\dot{\mu}(z_t) \leq \sqrt{e^{5}\dot{\mu}(\bm{x}_t^\top\widehat{\bm{\theta}}_0)}\sqrt{e^{4}\dot{\mu}(\bm{x}_{t,\star}^\top\bm{\theta}^\star)} = \sqrt{e^{9}\dot{\mu}(\bm{x}_t^\top\widehat{\bm{\theta}}_0)\dot{\mu}(\bm{x}_{t,\star}^\top\bm{\theta}^\star)}$$

and hence,

$$\mu(\bm{x}_{t,\star}^\top\bm{\theta}^\star) - \mu(\bm{x}_t^\top\bm{\theta}^\star) \leq 4\sqrt{2}e^{5}\beta\sqrt{\dot{\mu}(\bm{x}_{t,\star}^\top\bm{\theta}^\star)\cdot e^{-1}\dot{\mu}(\bm{x}_t^\top\widehat{\bm{\theta}}_0)}\sum_{i=1}^{N}\lVert\bm{x}^i_t\rVert_{(\bm{H}^i_t)^{-1}}.$$

∎

###### Lemma C\.8\.

(Lemma B.17, \[[28](https://arxiv.org/html/2606.31449#bib.bib4)\]) The total number of policy switches executed by Algorithm [2](https://arxiv.org/html/2606.31449#alg2) is $\mathcal{O}(Nd\log T)$.

###### Proof\.

The proof follows along the same lines as Lemma B.17 from \[[28](https://arxiv.org/html/2606.31449#bib.bib4)\]. However, all the matrices $\bm{H}$ are now $Nd$-dimensional, resulting in a dependence on $N$. ∎

### C.4 Showing Multiplicative Equivalence for RS-SlateGLinCB

###### Lemma C\.9\.

Let $\bm{H}_t$ and $\bm{H}^i_t$ be defined as in Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1). Define $T(\neg 0) := \frac{48L_\mu^2 + 8L_\mu N\rho}{3\rho^2}(N-1)^2\log\left(\frac{2dNT}{\delta}\right)$. Then, assuming the diversity assumptions (Section [2](https://arxiv.org/html/2606.31449#S2)) hold, with high probability, for all $t$ such that $|\mathcal{T}^{<t}_{\neg 0}| \geq T(\neg 0)$, we have that

$$\frac{1}{4}\textrm{diag}(\bm{H}^1_t,\ldots,\bm{H}^N_t) \preceq \bm{H}_t \preceq \frac{7}{4}\textrm{diag}(\bm{H}^1_t,\ldots,\bm{H}^N_t).$$

Consequently, for any $\bm{x} = (\bm{x}^1,\ldots,\bm{x}^N)$, we have that

$$\lVert\bm{x}\rVert_{\bm{H}_t^{-1}} \leq 2\sum_{i=1}^{N}\lVert\bm{x}^i\rVert_{(\bm{H}^i_t)^{-1}}.$$

###### Proof\.

Define $\overline{\bm{x}}_t := \sqrt{\dot{\mu}(\bm{x}_t^\top\widehat{\bm{\theta}}_0)e^{-1}}\,\bm{x}_t$ and $\overline{\bm{x}}^i_t := \sqrt{\dot{\mu}(\bm{x}_t^\top\widehat{\bm{\theta}}_0)e^{-1}}\,\bm{x}^i_t$. Then, note that $\lVert\overline{\bm{x}}^i_t\rVert \leq \sqrt{L_\mu N^{-1}}$.

Also, for the sake of the proof, define $\bm{U}_t := \textrm{diag}(\bm{H}^1_t,\ldots,\bm{H}^N_t)$. Thus, we can write

$$\bm{H}_t = \lambda\bm{I}_{Nd} + \sum_{s\in\mathcal{T}^{<t}_{\neg 0}}\overline{\bm{x}}_s\overline{\bm{x}}_s^\top \quad\text{and}\quad \bm{H}^i_t = \lambda\bm{I} + \sum_{s\in\mathcal{T}^{<t}_{\neg 0}}\overline{\bm{x}}^i_s{\overline{\bm{x}}^i_s}^\top,$$

and using Lemma B.1 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\], we have that

$$\bm{U}_t^{-1/2}\bm{H}_t\bm{U}_t^{-1/2} = \bm{I}_{Nd} + \bm{G}_t,$$

where $(\bm{G}_t)_{ij} = \mathbbm{1}\{i\neq j\}(\bm{H}^i_t)^{-1/2}\bm{H}^{(i,j)}_t(\bm{H}^j_t)^{-1/2}$ and $\bm{H}^{(i,j)}_t = \sum_{s\in\mathcal{T}^{<t}_{\neg 0}}\overline{\bm{x}}^i_s{\overline{\bm{x}}^j_s}^\top$. A straightforward application of Lemma D.2 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] with the quantities $m_1 = m_2 = \sqrt{L_\mu N^{-1}}$, $d_1 = d_2 = d$, and $\delta$ replaced by $\frac{2\delta}{N(N-1)}$, followed by a union bound over all $i\in[N]$ and $j\in[i+1,N]$ as in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136), gives us, with high probability, for all pairs $(i,j)$ where $i<j$,

$$\lVert\bm{H}^{(i,j)}_t\rVert \leq \sqrt{\frac{8L_\mu^2}{N^2}\left\lvert\mathcal{T}^{<t}_{\neg 0}\right\rvert\log\left(\frac{dN(N-1)}{\delta}\right)}.$$
Now, using the diversity conditions and the definition of $\kappa$, we have that

$$\mathbb{E}[\overline{\bm{x}}^i_t{\overline{\bm{x}}^i_t}^\top \mid \mathcal{F}_{t-1}] = \mathbb{E}[\dot{\mu}(\bm{x}_t^\top\widehat{\bm{\theta}}_0)e^{-1}\bm{x}^i_t{\bm{x}^i_t}^\top \mid \mathcal{F}_{t-1}] \succeq \rho\kappa^{-1}e^{-1}\bm{I}.$$

Similar to Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136), applying Lemma [D.1](https://arxiv.org/html/2606.31449#A4.Thmlemma1) using the quantities $\alpha = \kappa^{-1}e^{-1} \leq 1$, $m = \sqrt{L_\mu N^{-1}}$, $\gamma = \lambda$, $c = 0.5$, and $\delta$ replaced by $\frac{\delta}{N}$, alongside the fact that $(N-1)^2 \geq N^{-2}$, and followed by a union bound over all $i\in[N]$, gives us with high probability,

$$\lambda_{\min}\left(\bm{H}^i_t\right) \geq \lambda + \frac{\rho|\mathcal{T}^{<t}_{\neg 0}|}{2} \quad \forall t \text{ such that } |\mathcal{T}^{<t}_{\neg 0}| \geq T(\neg 0) := \frac{48L_\mu^2 + 8L_\mu N\rho}{3\rho^2}(N-1)^2\log\left(\frac{2dNT}{\delta}\right).$$

Now, once again, for $i\in[N-1]$, define $\bm{Z}_t^i \in \mathbb{R}^{d\times id}$ as the following matrix: for $j\in[i]$, the $j^{th}$ $d\times d$ block of $\bm{Z}_t^i$ is given by $(\bm{H}^{N-i}_t)^{-1/2}\bm{H}^{(N-i,N-i+j)}_t(\bm{H}^{N-i+j}_t)^{-1/2}$.

Then, using Lemma B.7 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] and following a similar approach to that shown in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136), we have that

$$\lVert\bm{Z}_t^i\rVert \leq \sum_{j\in[i]}\sqrt{\frac{96\log\left(\frac{dN(N-1)}{\delta}\right)}{N^2(48L_\mu^2 + 8L_\mu N\rho)(N-1)^2\log\left(\frac{2dNT}{\delta}\right)}} \leq \frac{3i}{2N(N-1)}.$$
Writing $\bm{G}_t$ as a matrix recurrence relation as in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136) and using Lemma B.2 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] gives:

$$\lambda_{\max}(\bm{G}_t) \leq \frac{3}{4} \quad\text{and}\quad \lambda_{\min}(\bm{G}_t) \geq -\frac{3}{4}.$$

Substituting $\bm{G}_t = \bm{U}_t^{-1/2}\bm{H}_t\bm{U}_t^{-1/2} - \bm{I}_{Nd}$, we get that

$$\frac{1}{4}\bm{U}_t \preceq \bm{H}_t \preceq \frac{7}{4}\bm{U}_t.$$

This finishes the proof for the first part. The second part is exactly the same as in Lemma [A.12](https://arxiv.org/html/2606.31449#A1.Ex136). ∎
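For completeness, the norm inequality can also be read off directly from the lower bound $\frac{1}{4}\bm{U}_t \preceq \bm{H}_t$. The chain below is a sketch of one standard route and may differ in presentation from the argument in Lemma A.12; it uses only the block-diagonal structure of $\bm{U}_t$ and the fact that $\sqrt{\sum_i a_i^2} \leq \sum_i a_i$ for nonnegative $a_i$.

$$
\begin{aligned}
\bm{H}_t \succeq \tfrac{1}{4}\bm{U}_t
&\implies \bm{H}_t^{-1} \preceq 4\,\bm{U}_t^{-1} = 4\,\textrm{diag}\left((\bm{H}^1_t)^{-1},\ldots,(\bm{H}^N_t)^{-1}\right) \\
&\implies \lVert\bm{x}\rVert^2_{\bm{H}_t^{-1}} \leq 4\sum_{i=1}^{N}\lVert\bm{x}^i\rVert^2_{(\bm{H}^i_t)^{-1}} \\
&\implies \lVert\bm{x}\rVert_{\bm{H}_t^{-1}} \leq 2\sqrt{\sum_{i=1}^{N}\lVert\bm{x}^i\rVert^2_{(\bm{H}^i_t)^{-1}}} \leq 2\sum_{i=1}^{N}\lVert\bm{x}^i\rVert_{(\bm{H}^i_t)^{-1}}.
\end{aligned}
$$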

###### Lemma C\.10\.

Let $\bm{V}$ and $\bm{V}^i$ be defined as in Section [C.1](https://arxiv.org/html/2606.31449#A3.SS1). Let $|\mathcal{T}_0| \geq T(0) := \frac{48 + 8N\rho}{3\rho^2}(N-1)^2\log\left(\frac{2dNT}{\delta}\right)$. Then, assuming the diversity assumptions (Section [2](https://arxiv.org/html/2606.31449#S2)) hold, with high probability, we have that

$$\frac{1}{4}\textrm{diag}(\bm{V}^1,\ldots,\bm{V}^N) \preceq \bm{V} \preceq \frac{7}{4}\textrm{diag}(\bm{V}^1,\ldots,\bm{V}^N).$$

Consequently, for any $\bm{x} = (\bm{x}^1,\ldots,\bm{x}^N)$, we have that

$$\lVert\bm{x}\rVert_{\bm{V}^{-1}} \leq 2\sum_{i=1}^{N}\lVert\bm{x}^i\rVert_{(\bm{V}^i)^{-1}}.$$

###### Proof\.

The proof is the same as that of Lemma [A.13](https://arxiv.org/html/2606.31449#A1.Ex158). ∎

### C.5 Other Relevant Lemmas

###### Lemma C\.11\.

(Elliptical Potential Lemma, Lemma 10 \[[1](https://arxiv.org/html/2606.31449#bib.bib23)\]) Let $\{\bm{x}_s\}_{s\in[t]}$ be a sequence of vectors in $\mathbb{R}^d$ such that $\lVert\bm{x}_s\rVert \leq L$ for all $s\in[t]$. Define

$$\bm{V}_s = \lambda\bm{I} + \sum_{m=1}^{s-1}\bm{x}_m\bm{x}_m^\top,$$

where $\lambda \geq L^2$. Then, we have

$$\det\bm{V}_t \leq \left(\lambda + \frac{tL^2}{d}\right)^d \quad\text{and}\quad \sum_{s\in[t]}\lVert\bm{x}_s\rVert^2_{\bm{V}_s^{-1}} \leq 2d\log\left(1 + \frac{L^2t}{\lambda d}\right).$$
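The second inequality can be checked numerically. The sketch below is illustrative only: the function name, the random vectors, and the choices $t = 2000$, $d = 5$, $L = 1$, and $\lambda = L^2$ are assumptions, not values from the paper's experiments.

```python
import numpy as np

def elliptical_potential_sum(xs, lam):
    """Return sum_s ||x_s||^2_{V_s^{-1}} with V_s = lam*I + sum_{m<s} x_m x_m^T."""
    d = xs.shape[1]
    V = lam * np.eye(d)
    total = 0.0
    for x in xs:
        # V currently holds V_s, built only from rounds before s.
        total += float(x @ np.linalg.solve(V, x))
        V += np.outer(x, x)
    return total

rng = np.random.default_rng(0)
t, d, L = 2000, 5, 1.0
xs = rng.uniform(-1.0, 1.0, size=(t, d))
xs *= L / np.linalg.norm(xs, axis=1, keepdims=True)  # enforce ||x_s|| = L
lam = L ** 2  # the lemma requires lam >= L^2

lhs = elliptical_potential_sum(xs, lam)
rhs = 2 * d * np.log(1 + L ** 2 * t / (lam * d))
print(lhs, rhs)
```

The left-hand side grows only logarithmically in $t$, which is what keeps the cumulative confidence widths small in regret analyses of this type.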

###### Lemma C\.12\.

(Lemma 12, \[[1](https://arxiv.org/html/2606.31449#bib.bib23)\]) Let $\bm{A} \succeq \bm{B} \succ \bm{0}$. Then, we have that

$$\sup_{\bm{x}\neq\bm{0}}\frac{\bm{x}^\top\bm{A}\bm{x}}{\bm{x}^\top\bm{B}\bm{x}} \leq \frac{\det\bm{A}}{\det\bm{B}}.$$

## Appendix D Demonstrating Linear Growth of Eigenvalues of the Design Matrices

We first recall the diversity assumptions from Section [2](https://arxiv.org/html/2606.31449#S2):

Define $\mathcal{F}_{t-1} = \sigma(\bm{x}_1, r_1, \ldots, \bm{x}_{t-1}, r_{t-1})$. For all slots $i\in[N]$ and some $\rho > 0$, we have that

$$\mathbb{E}[\bm{x}^i_t \mid \mathcal{F}_{t-1}] = \bm{0} \quad\text{and}\quad \mathbb{E}[\bm{x}^i_t{\bm{x}^i_t}^\top \mid \mathcal{F}_{t-1}] \succeq \rho\bm{I}.$$

Next, we state and prove a generalization of Lemma D.1 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\]:

###### Lemma D\.1\.

(Generalization of Lemma D.1, \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\]) Let $\{\bm{x}_s\}_{s\in[T]}$ be a stochastic process in $\mathbb{R}^d$ such that $\mathbb{E}[\bm{x}_s \mid \mathcal{F}_{s-1}] = \bm{0}$ and $\mathbb{E}[\bm{x}_s\bm{x}_s^\top \mid \mathcal{F}_{s-1}] \succeq \alpha\rho\bm{I}$, where $\alpha\in(0,1]$. Let $\lVert\bm{x}_s\rVert_2 \leq m \;\forall s\in[T]$. Also, define the matrix

$$\bm{Q}_t = \gamma\bm{I} + \sum_{s=1}^{t}\bm{x}_s\bm{x}_s^\top.$$

Then, with probability at least $1-\delta$, we have that

$$\lambda_{\min}(\bm{Q}_t) \geq \gamma + c\rho t$$

for $c\in(0,1)$ and $t\in\left(\frac{12m^4 + 4(1-c)m^2\rho}{3(1-c)^2\rho^2}\log\left(\frac{2dT}{\delta}\right), T\right)$.

###### Proof\.

Define $\Sigma_C := \mathbb{E}[\bm{x}_s\bm{x}_s^\top \mid \mathcal{F}_{s-1}] \succeq \alpha\rho\bm{I}$. Also, define the matrix martingale $\bm{Z}_t = \sum_{s\in[t]}\bm{x}_s\bm{x}_s^\top - t\Sigma_C$ with $\bm{Z}_0 = 0$. Finally, define the martingale difference sequence $\bm{X}_s = \bm{Z}_s - \bm{Z}_{s-1}$ for all $s\geq 1$.

Then, we have that $\lVert\Sigma_C\rVert \leq m^2$ and, by the triangle inequality, $\lVert\bm{X}_s\rVert = \lVert\bm{x}_s\bm{x}_s^\top - \Sigma_C\rVert \leq 2m^2$. A calculation similar to Lemma D.1 from \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\] shows that

$$\sum_{s\in[t]}\lVert\mathbb{E}[\bm{X}_s^\top\bm{X}_s \mid \mathcal{F}_{s-1}]\rVert = \sum_{s\in[t]}\lVert\mathbb{E}[\bm{X}_s\bm{X}_s^\top \mid \mathcal{F}_{s-1}]\rVert \leq 2m^4t.$$

Thus, an application of Lemma [D.2](https://arxiv.org/html/2606.31449#A4.Ex329) with the quantities $d_1 = d_2 = d$, $R = 2m^2$, $w = m^2\sqrt{2t}$, and $u = (1-c)\rho t$ for some $c\in(0,1)$ results in

$$\mathbb{P}\left\{\left\lVert\sum_{s\in[t]}\bm{x}_s\bm{x}_s^\top - t\Sigma_C\right\rVert \geq (1-c)\rho t\right\} \leq 2d\exp\left(-\frac{3(1-c)^2\rho^2t^2}{12m^4t + 4(1-c)m^2\rho t}\right).$$

Thus, for all $t \geq T_0 := \frac{12m^4 + 4(1-c)m^2\rho}{3(1-c)^2\rho^2}\log\left(\frac{2dT}{\delta}\right)$, we have that with probability at least $1-\frac{\delta}{T}$,

$$\left\lVert\sum_{s\in[t]}\bm{x}_s\bm{x}_s^\top - t\Sigma_C\right\rVert \leq (1-c)\rho t.$$

Now, using the fact that $\lVert\bm{A}\rVert = \lambda_{\max}(\bm{A})$ and $\lambda_{\max}(-\bm{A}) = -\lambda_{\min}(\bm{A})$, we have that

$$\lVert\bm{A}-\bm{B}\rVert = \lVert(-\bm{A})-(-\bm{B})\rVert \geq \lvert\lambda_{\max}(-\bm{A}) - \lambda_{\max}(-\bm{B})\rvert = \lvert\lambda_{\min}(\bm{A}) - \lambda_{\min}(\bm{B})\rvert$$

and thus, we can write that with probability at least $1-\frac{\delta}{T}$,

$$(1-c)\rho t \geq t\lambda_{\min}(\Sigma_C) - \lambda_{\min}\left(\sum_{s\in[t]}\bm{x}_s\bm{x}_s^\top\right).$$

Rearranging results in

$$\lambda_{\min}\left(\sum_{s\in[t]}\bm{x}_s\bm{x}_s^\top\right) \geq t\lambda_{\min}(\Sigma_C) - (1-c)\rho t \geq \alpha\rho t - (1-c)\rho t \geq c\rho t,$$

where the last inequality uses the fact that $\alpha\in(0,1]$. A union bound over all $t\in[T_0,T]$ finishes the proof. ∎

###### Lemma D\.2\.

(Matrix Freedman Inequality) Define a matrix martingale $\bm{Z}_s\in\mathbb{R}^{d_1\times d_2}$ with respect to the filtration $\mathcal{F}_s$ and the corresponding martingale difference sequence $\bm{X}_s = \bm{Z}_s - \bm{Z}_{s-1}$. Assume that the difference sequence is uniformly bounded a.s., i.e., $\lVert\bm{X}_s\rVert \leq R$. Define

$$\bm{W}_{row,t} := \sum_{s\in[t]}\mathbb{E}[\bm{X}_s\bm{X}_s^\top \mid \mathcal{F}_{s-1}], \qquad \bm{W}_{col,t} := \sum_{s\in[t]}\mathbb{E}[\bm{X}_s^\top\bm{X}_s \mid \mathcal{F}_{s-1}].$$

Then, for all $u\geq 0$ and $w^2 > 0$, we have that

$$\mathbb{P}\left\{\exists t\geq 0: \lVert\bm{Z}_t\rVert \geq u \text{ and } \max\{\lVert\bm{W}_{row,t}\rVert, \lVert\bm{W}_{col,t}\rVert\} \leq w^2\right\} \leq (d_1+d_2)\exp\left(-\frac{3u^2}{6w^2 + 2Ru}\right).$$
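As a worked check of how this inequality is used in the proof of Lemma D.1, substituting $R = 2m^2$, $w^2 = 2m^4t$, and $u = (1-c)\rho t$ into the exponent gives

$$\frac{3u^2}{6w^2 + 2Ru} = \frac{3(1-c)^2\rho^2t^2}{6\cdot 2m^4t + 2\cdot 2m^2\cdot(1-c)\rho t} = \frac{3(1-c)^2\rho^2\,t}{12m^4 + 4(1-c)m^2\rho}.$$

With $d_1 = d_2 = d$, the right-hand side of the inequality is therefore at most $\frac{\delta}{T}$ exactly when

$$\frac{3(1-c)^2\rho^2\,t}{12m^4 + 4(1-c)m^2\rho} \geq \log\left(\frac{2dT}{\delta}\right) \iff t \geq \frac{12m^4 + 4(1-c)m^2\rho}{3(1-c)^2\rho^2}\log\left(\frac{2dT}{\delta}\right),$$

which is the threshold $T_0$ appearing in that proof.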

## Appendix E Additional Experiments and Experimental Setup

In this section, we detail the implementation of all our algorithms and baselines. Then, we provide some additional experiments, building upon the setup in Section [5](https://arxiv.org/html/2606.31449#S5).

While the implementations of RS-GLinCB and RS-MNL are publicly available, they iterate over the set of slates at every round. Hence, we speed up the implementation using np.einsum. Also, for RS-MNL, we set the number of outcomes to 1, corresponding to the logistic setting \[[21](https://arxiv.org/html/2606.31449#bib.bib8)\]. We also implement a version of SoftBatch, and to lift it to the GLM setting, we replace the least squares estimate of the parameter with the maximum likelihood estimate (MLE). Also, since $q = O(T^{-\log T})$, we set $q$ to the machine epsilon (footnote 8: for our device, the value of $q$ is $1.1920929\times 10^{-7}$). For B-SlateGLinCB, we double the batch lengths and the number of batches, i.e., we calculate the batch lengths as

$$\mathcal{T}_m = \lfloor 2T^{1-2^{-m//2}}\rfloor,$$

where $a//b$ represents integer division. Note that the number of batches is still $\mathcal{O}(\log\log T)$ and the regret only scales by a constant factor. This allows for a better estimate of the parameter in the initial batches, while increasing the number and speed of updates during the later stages. Finally, for the higher-dimensional experiments (Experiment E2; explained next), we limit the number of optimization steps for RS-SlateGLinCB to 25, with the hope that limiting the convergence of policy updates is offset by the higher number of updates.
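The batch schedule can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `batch_lengths`, the starting index `first_index=2`, the reading of the exponent as $-(m//2)$, and the clipping of the final batch to the horizon are all choices made here for illustration.

```python
import math

def batch_lengths(T, first_index=2):
    """Batch lengths floor(2 * T**(1 - 2**(-(m // 2)))), clipped so they sum to T.

    `first_index` (the first value of m) is an assumption; the text does not state it.
    """
    lengths, used, m = [], 0, first_index
    while used < T:
        raw = math.floor(2 * T ** (1 - 2.0 ** (-(m // 2))))
        length = max(1, min(raw, T - used))  # clip the final batch to the horizon
        lengths.append(length)
        used += length
        m += 1
    return lengths

schedule = batch_lengths(20000)
print(len(schedule), schedule)
```

Because of the integer division, each exponent is used for two consecutive values of $m$, so every batch length appears twice before the next, longer length is used.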

![Refer to caption](https://arxiv.org/html/2606.31449v1/x6.png) (a) Experiment Setting E1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x7.png) (b) Experiment Setting E2
![Refer to caption](https://arxiv.org/html/2606.31449v1/x8.png) (c) Experiment Setting E3

Figure 4: Comparison with limited adaptivity algorithms, SoftBatch, RS-MNL, and RS-GLinCB

We now explain the experimental setup. At each $t\in[T]$, for each slot $i\in[N]$, the set of items $\mathcal{X}^i_t\subset\mathbb{R}^d$ is chosen such that $|\mathcal{X}^i_t| = K = 5$ and $d = 5$. Each item in $\mathcal{X}^i_t$ is sampled from $[-1,1]^5$ and is normalized to have $\ell_2$-norm $1/\sqrt{N}$, where $N$ varies depending on the experiment setting. We randomly select $\bm{\theta}^\star$ from $[-1,1]^{Nd}$ and normalize it to have $\ell_2$-norm $S$. For our algorithms, we set $\delta = 1/N^2$, which puts it in the range $[0.004, 0.04]$ for the values of $N$ used. For the baselines, we use the default values of $\delta$ provided in the corresponding implementation, which are of the same order as ours. We now explain the choice of the experimental settings:

1. E1: We set $S = 2$ and $N = 5$, resulting in $K^N = 3125$ slates with dimension $Nd = 25$ and $\kappa \leq e^S \approx 7.38$. We run our algorithms for $T\in\{5000, 10000, 15000, 20000\}$ rounds and display the results in Figure [4(a)](https://arxiv.org/html/2606.31449#A5.F4.sf1).
2. E2: We set $S = 5$ and $N = 10$, resulting in $K^N = 9765625$ slates with dimension $Nd = 50$ and $\kappa \leq e^S \approx 150$. We run our algorithms for $T\in\{5000, 10000, 15000, 20000\}$ rounds. In this experiment, we do not compare to RS-GLinCB and RS-MNL since these algorithms are not well-suited for large action spaces. We display the results in Figure [4(b)](https://arxiv.org/html/2606.31449#A5.F4.sf2).
3. E3: We set $S = 5$ and $N = 5$, resulting in $K^N = 3125$ slates with dimension $Nd = 25$ and $\kappa \leq e^S \approx 150$. We run our algorithms for $T\in\{2000, 4000, 6000, 8000\}$ rounds and display the results in Figure [4(c)](https://arxiv.org/html/2606.31449#A5.F4.sf3).
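The synthetic environment described above can be sketched as follows. The function names are hypothetical, and uniform sampling over the cube is an assumption, since the text only says that items and $\bm{\theta}^\star$ are sampled from $[-1,1]^5$ and $[-1,1]^{Nd}$ before normalization.

```python
import numpy as np

def sample_item_sets(rng, N, K=5, d=5):
    """One round of item sets: shape (N, K, d), each item with l2-norm 1/sqrt(N)."""
    items = rng.uniform(-1.0, 1.0, size=(N, K, d))  # uniform sampling is an assumption
    items /= np.linalg.norm(items, axis=-1, keepdims=True)
    return items / np.sqrt(N)

def sample_theta_star(rng, N, S, d=5):
    """Parameter in R^{Nd}, rescaled to have l2-norm S."""
    theta = rng.uniform(-1.0, 1.0, size=N * d)
    return S * theta / np.linalg.norm(theta)

rng = np.random.default_rng(0)
N, S = 5, 2.0  # setting E1
items = sample_item_sets(rng, N)
theta_star = sample_theta_star(rng, N, S)

# A slate concatenates one item per slot; here item 0 is picked in every slot.
slate = np.concatenate([items[i, 0] for i in range(N)])
print(np.linalg.norm(slate), slate @ theta_star)
```

With this normalization every slate has unit $\ell_2$-norm, so the linear predictor $\bm{x}^\top\bm{\theta}^\star$ lies in $[-S, S]$.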

We average all the results over 25 different seeds for sampling rewards and display the results in Figure [4](https://arxiv.org/html/2606.31449#A5.F4). In all three settings, we see that our algorithms achieve sublinear regret and outperform the other baselines by a significant margin. These results also provide strong empirical support for our $\kappa$-free regret guarantees in Theorem [A.1](https://arxiv.org/html/2606.31449#A1.Thmtheorem1) and Theorem [C.1](https://arxiv.org/html/2606.31449#A3.Ex222). Also, we see that the regret of RS-SlateGLinCB is lower than that of B-SlateGLinCB in all the settings, which can possibly be attributed to better constants as well as the $\sqrt{d}$ gap between the bounds in our theorems.

## Appendix F B-SlateGLinCB+: Additional Observations and Insights

In this section, we build upon the observations and insights for B-SlateGLinCB+, which was first introduced in Section [5](https://arxiv.org/html/2606.31449#S5). We first explain some empirical observations, which motivated the design of this algorithm. Then, we highlight the major differences between B-SlateGLinCB and B-SlateGLinCB+.

![Refer to caption](https://arxiv.org/html/2606.31449v1/x9.png) (a) Experiment setting E1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x10.png) (b) Experiment setting E2
![Refer to caption](https://arxiv.org/html/2606.31449v1/x11.png) (c) Experiment setting E3
![Refer to caption](https://arxiv.org/html/2606.31449v1/x12.png) (d) Experiment setting E4

Figure 5: Comparison with fully sequential slate bandit algorithm Slate-GLM-OFU

In Figure [5](https://arxiv.org/html/2606.31449#A6.F5), we compare our algorithms B-SlateGLinCB and RS-SlateGLinCB to the fully sequential Slate-GLM-OFU (Algorithm 1, \[[12](https://arxiv.org/html/2606.31449#bib.bib5)\]). We retain the same experimental setup as in Appendix [E](https://arxiv.org/html/2606.31449#A5), and add an experiment described below:

1. E4: We set $S = 5$ and $N = 15$, resulting in $K^N = 30517578125$ slates with dimension $Nd = 75$ and $\kappa \leq e^S \approx 150$. We run the algorithms for $T\in\{5000, 10000, 15000, 20000\}$ and display the results in Figure [5(d)](https://arxiv.org/html/2606.31449#A6.F5.sf4).

In Figure [5](https://arxiv.org/html/2606.31449#A6.F5), we see that the gap between RS-SlateGLinCB and Slate-GLM-OFU is very small, even though we would expect Slate-GLM-OFU to have much better regret because of its fully sequential nature. However, there remains a significant gap between the regrets incurred by B-SlateGLinCB and Slate-GLM-OFU. This raises the question of whether we can develop an algorithm that performs $\mathcal{O}(\log\log T)$ updates and can compete with the likes of RS-SlateGLinCB and Slate-GLM-OFU. Based on our empirical observations, we modify B-SlateGLinCB to obtain B-SlateGLinCB+, which is also a batched algorithm that performs $\mathcal{O}(\log\log T)$ updates and incurs regret similar to that of Slate-GLM-OFU. The modifications are based entirely on empirical observations and heuristics, which we explain in the next paragraph; we leave the theoretical analysis of this algorithm as an interesting future direction.

In B-SlateGLinCB, we notice that the estimate of the parameter $\bm{\theta}^\star$ improves throughout the course of the algorithm, that is, $\lVert\bm{\theta}^\star - \widehat{\bm{\theta}}_m\rVert_2$ strictly decreases as the algorithm progresses. We also observe that the optimal items in each slot often get eliminated, especially in the initial rounds of pruning. Thus, we hypothesize that the optimal items get eliminated in the initial rounds of pruning because the early estimates of the parameter are often inaccurate. Clearly, the estimates of $\bm{\theta}^\star$ learned during the later batches are more representative of $\bm{\theta}^\star$. Hence, the later estimates should carry more weight in deciding the set of items retained after elimination. This opens the gate to several algorithms with different elimination techniques to achieve the exploration-exploitation tradeoff, such as weighted majority-like strategies with a higher weight given to the more recent batches, or eliminating items from the item-set with respect to only the last $m' < m$ batches.

Thus, in B-SlateGLinCB+, we set $m' = 1$, i.e., we eliminate items with respect to only the most recent estimate, and the scaling slate $\bm{b}_t$ is chosen from this set of remaining items. We no longer prune with respect to the warmup estimate $\widehat{\bm{\theta}}_0$. Further, at each policy update, we allow the algorithm to use all the data seen so far, unlike B-SlateGLinCB, where the algorithm only uses the data from the corresponding batch. This improves the quality of the estimation of the parameter. We present the empirical performance of B-SlateGLinCB+ in Figure [5](https://arxiv.org/html/2606.31449#A6.F5), and see that the algorithm incurs sublinear regret and closely matches the regrets incurred by RS-SlateGLinCB and Slate-GLM-OFU. An interesting future direction is to study the constraints under which we can prove strong regret bounds for such heuristics.

## Appendix G Prompt Tuning Experiments

In this section, we explain the experimental setup of our prompt tuning experiments.

We use RoBERTa-large \[[20](https://arxiv.org/html/2606.31449#bib.bib25)\] as the base model and Nomic-Embed-Text-v1.5 \[[22](https://arxiv.org/html/2606.31449#bib.bib27)\] as the embedding model for all our experiments. At each round $t$, the algorithm is presented with a query $q_t$ and $N$ (different) sets consisting of $K$ candidate examples each. Each set of candidate examples corresponds to one exemplar (*slot*) in the prompt (*slate*). At round $t$, we denote the $j^{th}$ candidate example for slot $i$ as $(e_t^{ij}, l_t^{ij})$, where $e$ denotes the example and $l\in\{0,1\}$ denotes the true label for the example.

We now describe the construction of the arm-sets $\{\mathcal{X}^i_t\}_{i\in[N]}$ at each round $t$. For a slot $i$, the feature vector for the $k^{th}$ candidate example $(e^{ik}_t, l^{ik}_t)$ is denoted as $(j, l, c)$, where $j$, $l$, and $c$ are the three components, as described in Section [5](https://arxiv.org/html/2606.31449#S5). We describe them in greater detail here. $j$ denotes the joint embedding of the query $q_t$ and the candidate example $e^{ik}_t$, $l$ is the example's label, i.e., $l = l^{ik}_t$, and $c = (c_1, c_2)$ represents a pair of scores that measure the similarity between the query and the example. Here, $c_1$ measures the N-gram similarity between the query and the example sentence. This is done by calculating the cosine similarity between the bag-of-character-N-gram vectors of the query and the example. $c_2$ represents the similarity score between the query and the example in the embedding space, which is calculated as the cosine similarity between the embeddings of the two.
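The score $c_1$ can be illustrated with a short sketch. The N-gram length (3 here), the lowercasing step, and the helper names are assumptions made for illustration; the text does not specify them.

```python
import math
from collections import Counter

def char_ngrams(text, n=3):
    """Bag of character n-grams; n=3 and lowercasing are illustrative assumptions."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(count * v[gram] for gram, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

query = "a gripping and beautifully shot film"
example = "a beautifully shot but dull film"
c1 = cosine(char_ngrams(query), char_ngrams(example))
print(round(c1, 3))
```

The same `cosine` helper applies to $c_2$ if the Counters are replaced by dense embedding vectors, although in that case a NumPy dot product would be the more natural choice.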

We choose $N$, the number of exemplars per prompt, to be $6$, and $K$, the number of candidate examples provided to each slot at each round, to be $9$. The prompt instruction and format are fixed a priori and are given below:

Prompt Template:

In this task, you are given sentences from movie reviews. The task is to classify a sentence as "great" if the sentiment of the sentence is positive or as "terrible" if the sentiment of the sentence is negative. Following are some examples to help you:

Review1: Sentiment1:
Review2: Sentiment2:
Review3: Sentiment3:
Review4: Sentiment4:
Review5: Sentiment5:
Review6: Sentiment6:

Query: Sentiment: <<mask\>\>

To demonstrate the long-horizon capability of our algorithm, we augment our test set by including an additional $4000$ queries sampled from the training set, resulting in a total of $4870$ queries. These additional queries are sampled prior to the construction of the candidate example sets, and hence, we ensure that none of these $4000$ queries appear in any of the candidate example sets provided to the algorithm across the horizon. We also ensure there is no further training and instead report the cumulative average accuracy over the entire time horizon.

We compare our results to the following baselines: (i) the base model with no exemplars, (ii) the base model, where the exemplars are chosen randomly at each round, and (iii) Slate-GLM-OFU. The random allocation baseline chooses 6 examples from the pool of exemplars with equal probability. We note that all the hyperparameters, including the embedding size, $N$, $K$, and the embedding dimensions, as well as the methodology used to select queries, exemplar pools, and the construction of arm-sets, are fixed across all baselines. We plot the cumulative average accuracy of all algorithms and display the results in Figure [3](https://arxiv.org/html/2606.31449#S5.F3).
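The reported metric can be computed as below. This is a minimal sketch: the function name and the toy prediction and label lists are illustrative assumptions, not data from the experiments.

```python
import numpy as np

def cumulative_average_accuracy(predictions, labels):
    """Entry t is the fraction of correct predictions among the first t+1 queries."""
    correct = (np.asarray(predictions) == np.asarray(labels)).astype(float)
    return np.cumsum(correct) / np.arange(1, len(correct) + 1)

preds = [1, 0, 1, 1, 0]
labels = [1, 1, 1, 0, 0]
curve = cumulative_average_accuracy(preds, labels)
print(curve)
```

The last entry of the curve equals the overall accuracy over the whole horizon, so the plot shows how quickly each method approaches its final accuracy.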

## Appendix H Empirical Verification of Linear Growth of Eigenvalues for B-SlateGLinCB, RS-SlateGLinCB, and B-SlateGLinCB+

In this section, we empirically validate that the minimum eigenvalue for the design matrices indeed grow \(near\-\) linearly over the course of the algorithms\. We choose the number of slotsN=4N=4, the dimension of items in each slotd=5d=5, and the number of items per slotK=5K=5, resulting in a total ofKN=625K^\{N\}=625slates with dimensionN​d=20Nd=20\. At each roundt∈\[T\]t\\in\[T\]and each sloti∈\[N\]i\\in\[N\], the item\-sets𝒳ti\\mathcal\{X\}^\{i\}\_\{t\}are chosen in a manner similar to the one described in Section[5](https://arxiv.org/html/2606.31449#S5)\. Similarly, the optimal parameter𝜽⋆\\bm\{\\theta\}^\{\\star\}is also sampled in a manner similar to the one described in Section[5](https://arxiv.org/html/2606.31449#S5)\. We choose theℓ2−\\ell\_\{2\}\-norm of𝜽⋆\\bm\{\\theta\}^\{\\star\}to beS=2S=2\. All the algorithms are run forT=10000T=10000for 50 different seeds for sampling rewards\.

We display the results for B-SlateGLinCB in Figure [6](https://arxiv.org/html/2606.31449#A8.F6), for RS-SlateGLinCB in Figure [7](https://arxiv.org/html/2606.31449#A8.F7), and for B-SlateGLinCB+ in Figure [8](https://arxiv.org/html/2606.31449#A8.F8). Throughout, the black dotted lines mark the transitions between batches (or, in the case of RS-SlateGLinCB, the end of the warmup phase). The blue curves show the minimum eigenvalues of the design matrices $\bm{V}^{i}$ during the warmup phases (the $\bm{V}$ matrices are only updated during this phase), while the red curves show the minimum eigenvalues of the Hessian matrices $\bm{H}^{i}$ during the corresponding phase of the algorithm. For RS-SlateGLinCB, in Figure [7](https://arxiv.org/html/2606.31449#A8.F7), the first row is a zoomed-in view of the warmup phase (since the slope of the blue curve is low), while the second row shows the growth of both $\bm{V}$ and $\bm{H}$ over the course of the algorithm. In all the plots, the growth of the minimum eigenvalues appears to be (near-)linear, thus validating the conclusion we draw from the Diversity Assumptions (Definition [2.1](https://arxiv.org/html/2606.31449#S2.Thmassumption1)).
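One way to quantify "(near-)linear" growth, beyond visual inspection, is to fit a least-squares line to a minimum-eigenvalue series and look at the slope and the coefficient of determination. The sketch below does this on a synthetic series; the helper name `linear_fit_quality` and the synthetic data are assumptions for illustration and are not part of the reported experiments.

```python
import numpy as np


def linear_fit_quality(series):
    """Fit a line to (t, series[t]) and return (slope, r_squared)."""
    y = np.asarray(series, dtype=float)
    t = np.arange(1, y.size + 1, dtype=float)
    slope, intercept = np.polyfit(t, y, deg=1)  # highest-degree coefficient first
    residual = y - (slope * t + intercept)
    ss_res = float(np.sum(residual ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    return slope, 1.0 - ss_res / ss_tot


# Synthetic stand-in for a minimum-eigenvalue curve: slope 0.2 plus noise.
rng = np.random.default_rng(0)
t = np.arange(1, 1001)
synthetic = 0.2 * t + rng.normal(scale=1.0, size=t.size)
slope, r2 = linear_fit_quality(synthetic)
```

A value of `r2` close to 1 together with a positive `slope` is consistent with linear growth. It is a heuristic summary of the plots rather than a proof.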

![Refer to caption](https://arxiv.org/html/2606.31449v1/x13.png) (a) Slot 1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x14.png) (b) Slot 2
![Refer to caption](https://arxiv.org/html/2606.31449v1/x15.png) (c) Slot 3
![Refer to caption](https://arxiv.org/html/2606.31449v1/x16.png) (d) Slot 4

Figure 6: B-SlateGLinCB

![Refer to caption](https://arxiv.org/html/2606.31449v1/x17.png) (a) Slot 1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x18.png) (b) Slot 2
![Refer to caption](https://arxiv.org/html/2606.31449v1/x19.png) (c) Slot 3
![Refer to caption](https://arxiv.org/html/2606.31449v1/x20.png) (d) Slot 4
![Refer to caption](https://arxiv.org/html/2606.31449v1/x21.png) (e) Slot 1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x22.png) (f) Slot 2
![Refer to caption](https://arxiv.org/html/2606.31449v1/x23.png) (g) Slot 3
![Refer to caption](https://arxiv.org/html/2606.31449v1/x24.png) (h) Slot 4

Figure 7: RS-SlateGLinCB

![Refer to caption](https://arxiv.org/html/2606.31449v1/x25.png) (a) Slot 1
![Refer to caption](https://arxiv.org/html/2606.31449v1/x26.png) (b) Slot 2
![Refer to caption](https://arxiv.org/html/2606.31449v1/x27.png) (c) Slot 3
![Refer to caption](https://arxiv.org/html/2606.31449v1/x28.png) (d) Slot 4

Figure 8: B-SlateGLinCB+
