Contextual Bandits for Maximizing Stimulated Word-of-Mouth Rewards

arXiv cs.LG 06/16/26, 04:00 AM Papers
Summary
This paper presents a contextual multi-armed bandit framework that learns individual spillover probabilities in social networks to optimize stimulated word-of-mouth marketing, achieving higher rewards by targeting connected users.
arXiv:2606.15146v1 Announce Type: new Abstract: Stimulated word-of-mouth is a strategy that promotes information sharing through prompts or incentives. Optimizing stimulated word-of-mouth through social networks requires identifying and targeting connected users who are most susceptible to spillover, a phenomenon where the influence of recommendations extends beyond the immediate audience to impact their connected users. The probability of spillover varies across individuals, and their connections, leading to heterogeneity. Understanding and accurately estimating the spillover probabilities among users in social networks is crucial for improving the effectiveness of stimulated word-of-mouth. To address this, we present a novel contextual multi-armed bandit framework that learns individual spillover probabilities and ranks connected users to maximize rewards from stimulated word-of-mouth. Experiments on real-world network datasets demonstrate that accounting for spillover heterogeneity enhances the targeting precision of top-$k$ connected users, boosting rewards and outperforming baseline methods that do not learn individual spillover effects.
Original Article
View Cached Full Text
Cached at: 06/16/26, 11:38 AM
# Contextual Bandits for Maximizing Stimulated Word-of-Mouth Rewards
Source: [https://arxiv.org/html/2606.15146](https://arxiv.org/html/2606.15146)
###### Abstract

Stimulated word\-of\-mouth is a strategy that promotes information sharing through prompts or incentives\. Optimizing stimulated word\-of\-mouth through social networks requires identifying and targeting connected users who are most susceptible to spillover, a phenomenon where the influence of recommendations extends beyond the immediate audience to impact their connected users\. The probability of spillover varies across individuals, and their connections, leading to heterogeneity\. Understanding and accurately estimating the spillover probabilities among users in social networks is crucial for improving the effectiveness of stimulated word\-of\-mouth\. To address this, we present a novel contextual multi\-armed bandit framework that learns individual spillover probabilities and ranks connected users to maximize rewards from stimulated word\-of\-mouth\. Experiments on real\-world network datasets demonstrate that accounting for spillover heterogeneity enhances the targeting precision of top\-kkconnected users, boosting rewards and outperforming baseline methods that do not learn individual spillover effects\.

## Introduction

Stimulated word\-of\-mouth involves intentionally encouraging consumers to share information about a product or service, often using prompts or incentives provided by companies or marketers\. Unlike organic word\-of\-mouth, which occurs naturally, stimulated word\-of\-mouth is facilitated through mechanisms such as rewards, discounts, or referral bonuses\. Loyalty programs offering points for referrals exemplify this strategy\. The proliferation of social networks has amplified the potential of stimulated word\-of\-mouth by simplifying the process of engaging and influencing consumers\. For example, Alibaba integrates Sina Weibo \(Chinese social network\) into its e\-commerce ecosystem, reflecting a trend where social platforms and marketplaces converge to enhance consumer influence and connectivity\(Zhaoet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib45)\)\.

Network spillover occurs when an action, such as a recommendation, on one user affects other users in a network, spreading information, behaviors, or attitudes\. The impact of spillover is observed in numerous recommendation networks, such as social media, e\-commerce, or content platforms\(Hollebeeket al\.[2023](https://arxiv.org/html/2606.15146#bib.bib26); Kuanget al\.[2019](https://arxiv.org/html/2606.15146#bib.bib28); Xuet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib25)\)\. These spillover effects are heterogeneous, varying with users’ network positions, connections, and individual traits\. For example, in a social network, sharing a recommendation can result in adoption by some friends and in lack of interest by others\. Predicting and leveraging these heterogeneous spillover effects is crucial for optimizing stimulated word\-of\-mouth in networks\. Here, we present preliminary work which leverages contextual multi\-armed bandits \(CMAB\) for estimating such effects\.

Contextual multi\-armed bandits consider the scenario where decision\-makers can observe the context of each possible recommendation \(i\.e\., the arm\) and infer the rewards of choosing a particular arm by observing the rewards of sequential recommendations\. However, few works explore CMAB in networks where the recommendations can affect not only the direct object of recommendation but also its neighbors through spillover\(Faruk and Zheleva[2023](https://arxiv.org/html/2606.15146#bib.bib31); Iacobet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib32); Vaswaniet al\.[2017](https://arxiv.org/html/2606.15146#bib.bib33); Wenet al\.[2017](https://arxiv.org/html/2606.15146#bib.bib34); Wilderet al\.[2018](https://arxiv.org/html/2606.15146#bib.bib35)\)\. While there is research that integrates CMAB with recommendation\-dependent heterogeneous spillover in networks, it assumes that spillover probabilities are known\(Faruk and Zheleva[2023](https://arxiv.org/html/2606.15146#bib.bib31)\)\. In contrast, our focus is on learning these probabilities\.

The current literature on stimulated word\-of\-mouth overlooks the potential of leveraging the heterogeneous spillover effect, leading to sub\-optimal performance and user satisfaction\(Gaoet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib46)\)\. To address this limitation, it is necessary to develop models that can learn and adapt to the varying spillover probabilities across different users in a network\. This paper presents a novel framework to learn heterogeneous spillover probabilities in networks with the goal of recommending to each user the top\-kkneighbors that are most likely to adopt the recommendation that the user themselves adopted\. By advancing our understanding of these effects, we aim to enhance the performance of recommendation systems, delivering more effective and personalized user experiences while maximizing the rewards of stimulated word\-of\-mouth\.

## Related Work

Link weight prediction focuses on estimating the numerical value \(e\.g\., strength, capacity\) associated with an edge in a network\(Fuet al\.[2018](https://arxiv.org/html/2606.15146#bib.bib47)\)\. Previous studies have investigated learning influence probabilities\(Goyalet al\.[2010](https://arxiv.org/html/2606.15146#bib.bib37)\)\. However, these concepts differ from the spillover probability considered in this work, which is tied to specific recommendations and spreads from the recommended user to its neighbor\. In contrast, link weight or influence probabilities may exist independently of any recommendation\.

Learning edge probabilities in networks has been studied using bandits framework for problems such as probabilistic maximum coverage and social influence maximization in viral marketing\(Chenet al\.[2013](https://arxiv.org/html/2606.15146#bib.bib19)\)\. However, most bandit research on networks\(Iacobet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib32); Vaswaniet al\.[2017](https://arxiv.org/html/2606.15146#bib.bib33); Wenet al\.[2017](https://arxiv.org/html/2606.15146#bib.bib34); Wilderet al\.[2018](https://arxiv.org/html/2606.15146#bib.bib35)\)do not account for spillover effects\.Faruk and Zheleva \([2023](https://arxiv.org/html/2606.15146#bib.bib31)\)incorporate spillover effects in a bandit framework but assume that the spillover probabilities are given, without addressing the challenge of learning them\.

While extensive research exists on top\-kkrecommendation problems\(Jamali and Ester[2009](https://arxiv.org/html/2606.15146#bib.bib16); Songet al\.[2015](https://arxiv.org/html/2606.15146#bib.bib22); Yanget al\.[2012](https://arxiv.org/html/2606.15146#bib.bib24)\), limited attention has been given to identifying top\-kkneighbors in social networks for stimulated word\-of\-mouth\.Aramayoet al\.\([2023](https://arxiv.org/html/2606.15146#bib.bib17)\)investigated the top\-kkad recommendation problem using a CMAB framework\. However, their approach cannot be applied to the top\-kkneighbor recommendation problem, as they assume a fixed number of possible ads, whereas users in networks have varying numbers of neighbors\. A triad\-based word\-of\-mouth recommendation model for socialized e\-commerce focuses on recommending top\-kkrelevant products to a user for sharing with all neighbors\(Gaoet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib46)\)\. In contrast, our work focuses on recommending top\-kkneighbors for sharing the same product\.

## Problem Description

### Data model

We represent the data as a bidirected attributed networkG=\(V,E\)G=\(V,E\), whereV=\{v1,v2,…,vn\}V=\\\{v\_\{1\},v\_\{2\},\\ldots,v\_\{n\}\\\}is the set ofnnnodes, andE=\{eij∣1≤i,j≤n\}E=\\\{e\_\{ij\}\\mid 1\\leq i,j\\leq n\\\}is the set of edges, witheije\_\{ij\}denoting an edge between nodesviv\_\{i\}andvjv\_\{j\}\. The neighborhood of a nodeviv\_\{i\}is defined as𝒩i=\{vj∈V∣eij∈E\}\\mathcal\{N\}\_\{i\}=\\\{v\_\{j\}\\in V\\mid e\_\{ij\}\\in E\\\}, and eachvj∈𝒩iv\_\{j\}\\in\\mathcal\{N\}\_\{i\}is a neighbor ofviv\_\{i\}\. Each nodeviv\_\{i\}is associated with add\-dimensional attribute vector, denoted byvi\.Xv\_\{i\}\.X\. Letvi\.y∈\{0,1\}v\_\{i\}\.y\\in\\\{0,1\\\}denote the activation status of nodeviv\_\{i\}\. Specifically,vi\.y=1v\_\{i\}\.y=1indicates thatviv\_\{i\}is activated, whilevi\.y=0v\_\{i\}\.y=0indicates no activation\.

### Direct recommendation and spillover

Direct recommendation refers to a system recommending a product directly to a targeted node in a network\. A node activated due to such a recommendation is referred to as a system\-activated node\. Spillover occurs when a system\-activated node impacts the activation of its neighbors by sharing the product\. We assume that each user can only try to activatekkneighbors \(e\.g\., due to the cost incurred by the user or the structure of the system\-imposed sharing incentive\) and the system’s goal is to aid the user in picking the neighbors which are most likely to activate\. Each edgeeij∈Ee\_\{ij\}\\in Eis associated with a spillover probabilityeij\.p∈\[0,1\]e\_\{ij\}\.p\\in\[0,1\], which represents the probability of activating nodevjv\_\{j\}due to spillover from system\-activated nodeviv\_\{i\}\. The spillover probability can be asymmetric, i\.e\.,eij\.p≠eji\.pe\_\{ij\}\.p\\neq e\_\{ji\}\.p, reflecting heterogeneity in spillover\.

###### Problem 1

Spillover probability prediction for top\-kkneighbor recommendationGiven an attributed networkG=\(V,E\)G=\(V,E\), learneij\.pe\_\{ij\}\.poverTTrounds such that the rewards associated with recommending the top\-kkneighbors of nodeviv\_\{i\}are maximized afterTTrounds\.

For each system\-activated nodeviv\_\{i\}, the system’s objective is to rankviv\_\{i\}’s neighbors by their spillover probabilitieseij\.pe\_\{ij\}\.p, and select the top\-kknodes to maximize the total rewards\. Recommending the top\-kkneighbors based on the learnedeij\.pe\_\{ij\}\.pmaximizes spillover rewards\.

## Contextual bandits that learn individual spillover probabilities

We consider a stochastic22\-armed contextual bandit setup overTTrounds\. The set of arms is denoted as𝒜=\{a0,a1\}\\mathcal\{A\}=\\\{a\_\{0\},a\_\{1\}\\\}\. At each roundi∈\{1,2,…,T\}i\\in\\\{1,2,\\ldots,T\\\}, a nodevi∈Vv\_\{i\}\\in Varrives, and the learning agent observes the attributes ofviv\_\{i\}and its neighborsvj∈𝒩iv\_\{j\}\\in\\mathcal\{N\}\_\{i\}\. Without loss of generality, we only consider the nodes that the system was able to activate with its recommendations\. The attribute vectoreij\.Xe\_\{ij\}\.Xis referred to as the context vector for edgeeij∈Ee\_\{ij\}\\in E\. It includesvi\.Xv\_\{i\}\.X,vj\.Xv\_\{j\}\.Xand other relevant information, such as the product attributes\. Each edgeeije\_\{ij\}is associated with two possible arms:a0a\_\{0\}anda1a\_\{1\}, wherea0a\_\{0\}corresponds to not recommending anda1a\_\{1\}corresponds to recommending the neighboring nodevjv\_\{j\}to nodeviv\_\{i\}\. We denote the assigned arm witheij\.te\_\{ij\}\.tfor edgeeije\_\{ij\}\. A reward is generated for recommending a neighboring node, corresponding to the activation of an inactive neighborvjv\_\{j\}due to spillover from the system\-activated nodeviv\_\{i\}\. For example, a reward of 1 is assigned to the system\-activated node if a neighboring node is activated due to spillover; otherwise, the reward is 0\. The rewards are cumulative when multiple neighbors are activated\. We assume that the reward for an actioneij\.ae\_\{ij\}\.aon edgeeije\_\{ij\}is a function of the context vectoreij\.Xe\_\{ij\}\.Xand the action that needs to be learned:R\(eij\.X,eij\.a\)=F\(eij\.X,eij\.a\)\+ξijR\(e\_\{ij\}\.X,e\_\{ij\}\.a\)=F\(e\_\{ij\}\.X,e\_\{ij\}\.a\)\+\\xi\_\{ij\}, whereξij\\xi\_\{ij\}is zero\-mean noise\. The system recommends the top\-kkneighbors based on the highest predicted rewards ofa1a\_\{1\}arm,R\(eij\.X,a1\)R\(e\_\{ij\}\.X,a\_\{1\}\)\. The estimated spillover probability corresponds to the probability that the rewardR\(eij\.X,a1\)R\(e\_\{ij\}\.X,a\_\{1\}\)coming from neighborvjv\_\{j\}is11\. The predicted set of top\-kkneighbors is defined asNi←arg⁡topkvjR\(eij\.X,a1\)N\_\{i\}\\leftarrow\\underset\{v\_\{j\}\}\{\\arg top\_\{k\}\}R\(e\_\{ij\}\.X,a\_\{1\}\)\. We define the optimal set of top\-kkneighbors asNi∗←arg⁡topkvjR\(eij∗\.X,a1\)N^\{\*\}\_\{i\}\\leftarrow\\underset\{v\_\{j\}\}\{\\arg top\_\{k\}\}R\(e\_\{ij\}^\{\*\}\.X,a\_\{1\}\), whereeij∗e\_\{ij\}^\{\*\}s yield maximum expected rewards for neighborvjv\_\{j\}at roundii\. Therefore, the cumulative regret of bandit learning overTTrounds is formulated as:Regret←∑i=1T\(∑a∈Ni∗R\(eia\.X,a1\)−∑b∈NiR\(eib\.X,a1\)\)Regret\\leftarrow\\sum\_\{i=1\}^\{T\}\(\\sum\_\{a\\in N^\{\*\}\_\{i\}\}R\(e\_\{ia\}\.X,a\_\{1\}\)\-\\sum\_\{b\\in N\_\{i\}\}R\(e\_\{ib\}\.X,a\_\{1\}\)\)\. The objective of the contextual bandit is to learn a strategy for selecting the top\-kkneighbors such that the expected regret is minimized\.

The pseudo\-code of our framework is included in Algorithm11\. The framework leverages CMAB algorithms to learn spillover probability for top\-kkneighbor recommendations which operate in two distinct phases: exploration and exploitation\. The exploration phase aims to diversify edge selections, which lasts for the initialz%z\\%of the totalTTrounds\. In this phase,kkneighbors are randomly selected and recommended to the system\-activated node\. After the exploration phase, the algorithm transitions to the exploitation phase\. Here, the top\-kkneighbors are recommended based on the descending order of their estimated reward for arma1a\_\{1\}, as predicted by the CMAB algorithm\.

## Experiments

In this section, we evaluate the performance ofSpillCBSpillCBon real\-world network datasets\.

### Data representation

#### Real\-world datasets\.

We consider two real\-world attributed network datasets: Flickr111All datasets available at https://renchi\.ac\.cn/datasets/and Facebook[1](https://arxiv.org/html/2606.15146#footnote1)\. We consider the concatenation ofvi\.Xv\_\{i\}\.Xandvj\.Xv\_\{j\}\.Xas the attribute vector for each edgeeij∈Ee\_\{ij\}\\in E\. The spillover probabilityeij\.pe\_\{ij\}\.pfor each edgeeij∈Ee\_\{ij\}\\in Eis synthetically generated as a weighted function of cosine similarity\(CS\), level similarity\(LS\), topological overlap\(TO\), diffusion importance\(DI\), and degree product\(DP\), following\(Qianet al\.[2017](https://arxiv.org/html/2606.15146#bib.bib21)\):

eij\.p←f\(w0\+w1∗CS\(vi,vj\)\+w2∗LS\(vi,vj\)\+w3∗TO\(vi,vj\)\+w4∗DI\(vi,vj\)\+w5∗DP\(vi,vj\)\)e\_\{ij\}\.p\\leftarrow f\(w\_\{0\}\+w\_\{1\}\*CS\(v\_\{i\},v\_\{j\}\)\+w\_\{2\}\*LS\(v\_\{i\},v\_\{j\}\)\+\\\\ w\_\{3\}\*TO\(v\_\{i\},v\_\{j\}\)\+w\_\{4\}\*DI\(v\_\{i\},v\_\{j\}\)\+w\_\{5\}\*DP\(v\_\{i\},v\_\{j\}\)\)wherew∼𝒩\(5,2\)w\\sim\\mathcal\{N\}\(5,2\)are the weights sampled from a normal distribution, andffis a min\-max scaling function for normalization\.

Algorithm 1Learning heterogeneous spillover to maximize word\-of\-mouth rewards \(SpillCBSpillCB\)1:Input:Number of rounds

TT, off\-the\-shelf CMAB \(e\.g\., LinUCB\) hyperparameters, exploration ratio

zz,

kk
2:Output:Top\-

kkneighbor recommendations for each node

viv\_\{i\}that gets activated by the system in each round

ii
3:foreach arm

a∈𝒜a\\in\\mathcal\{A\}do

4:Initialize arm parameters

5:endfor

6:foreach round

i∈\{1,2,3,…,T\}i\\in\\\{1,2,3,\\ldots,T\\\}do

7:A node

viv\_\{i\}arrives with context vectors for edges

eije\_\{ij\},

\{eij,a\.X\}a∈𝒜\\\{e\_\{ij,a\}\.X\\\}\_\{a\\in\\mathcal\{A\}\}and gets activated by the system

8:Estimate expected rewards of each edge

eije\_\{ij\}for

a1a\_\{1\},

E\[R\(eij\.X,a1\)\]E\[R\(e\_\{ij\}\.X,a\_\{1\}\)\]using off\-the\-shelf CMAB algorithm

9:if

i≤z∗Ti\\leq z\*Tthen⊳\\trianglerightExploration phase

10:Find

Ni=arg⁡randomkvjE\[R\(eij\.X,a1\)\]N\_\{i\}=\\underset\{v\_\{j\}\}\{\\arg random\_\{k\}\}E\[R\(e\_\{ij\}\.X,a\_\{1\}\)\]
11:else⊳\\trianglerightExploitation phase

12:Find

Ni=arg⁡topkvjE\[R\(eij\.X,a1\)\]N\_\{i\}=\\underset\{v\_\{j\}\}\{\\arg top\_\{k\}\}E\[R\(e\_\{ij\}\.X,a\_\{1\}\)\]
13:endif

14:for

vj∈Niv\_\{j\}\\in N\_\{i\}do

15:

eij\.t←a1e\_\{ij\}\.t\\leftarrow a\_\{1\}
16:Record

R\(eij\.X,a1\)R\(e\_\{ij\}\.X,a\_\{1\}\); update parameters for

a1\{a\}\_\{1\}
17:endfor

18:endfor

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k-variation/flickr_recall.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k_5/flickr_recall.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k-variation/facebook_recall.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k_5/facebook_recall.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k-variation/flickr_rmse.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k_5/flickr_rmse.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k-variation/facebook_rmse.png)

![Refer to caption](https://arxiv.org/html/2606.15146v1/Image/k_5/facebook_rmse.png)

Figure 1:Comparison of cumulativeRecall@kRecall@kandRMSERMSE\. Columns11and33show the metrics in the final round for varying k\. Columns22and44show changes in the metrics overTTrounds fork=10k=10\.

### Evaluation metrics

We evaluate the performance of theSpillCBSpillCBusing the following metrics:

#### Recall@kk

The bandit recall,Recall@kRecall@kquantifies theSpillCBSpillCB’s ability to correctly identify top\-kkneighbors\. It is defined as the ratio of correctly predicted neighbors belonging to the true top\-kkneighbors to the total number of true top\-kkneighbors, i\.e\.,Recall@k=∑i=1T\|Ni∩Ni∗\|\|Ni∗\|Recall@k=\\sum\_\{i=1\}^\{T\}\\frac\{\|N\_\{i\}\\cap N^\{\*\}\_\{i\}\|\}\{\|N^\{\*\}\_\{i\}\|\}

#### Root Mean Squared Error \(RMSERMSE\)

To evaluate the accuracy of the learned spillover probabilities, we compute theRMSERMSEon a randomly sampled subset of edges,E¯⊂E\\overline\{E\}\\subset Efrom the network\. In each roundii,RMSERMSEis calculated as:RMSE=∑exy∈E¯\(e^xy\.p−exy\.p\)2\|S\|RMSE=\\sqrt\{\\frac\{\\sum\_\{e\_\{xy\}\\in\\overline\{E\}\}\{\(\\hat\{e\}\_\{xy\}\.p\-e\_\{xy\}\.p\)^\{2\}\}\}\{\|S\|\}\}wheree^xy\.p=g\(E\[R\(exy\.X,a1\)\]\)\\hat\{e\}\_\{xy\}\.p=g\(E\[R\(e\_\{xy\}\.X,a\_\{1\}\)\]\)is the estimated spillover probability derived from the estimated rewardsE\[R\(exy\.X,a1\)\]E\[R\(e\_\{xy\}\.X,a\_\{1\}\)\]andggis a function to convert the estimated rewards to a probability\. The function calculateszz\-score over the set of edgesE¯\\overline\{E\}in each roundiiand then uses a standard normal distribution’s cumulative distribution function to convert thezz\-score to a probability\.

### Main algorithms and baselines

We evaluate the performance of our proposed algorithm by comparing it with several baseline approaches:

#### Random

This baseline randomly selectskkneighbors for recommendation to each system\-activated node\.

#### Similarity\-based \(Similarity\)

This method recommends neighbors based on their proximity to the system\-activated node\. The top\-kkneighbors are chosen based on the ascending order of the Euclidean distance between the attribute vectors of the system\-activated node and its neighbors\.

#### Ridge regression \(Ridge\)

This method ranks and recommends the top\-kkneighbors based on link weight prediction scores obtained via ridge regression\. All existing edgesEEare labeled with 1, while\|E\|\|E\|non\-existing edges are randomly sampled and labeled with 0\. Each edge, whether existing or non\-existing, is represented by a concatenated attribute vector of its two nodes\. A ridge regression model is trained on these labeled edges to generate link weight prediction scores\. The top\-kkneighbors of a system\-activated node are recommended based on the descending order of their edge scores\.

#### GraphSage\.

This approach utilizes node embeddings generated from the GraphSAGE algorithm\(Hamiltonet al\.[2017](https://arxiv.org/html/2606.15146#bib.bib39)\)\. The approach is similar to the Ridge method, except that the concatenated embeddings are passed through a neural network\-based scoring function to generate link weight scores for each edge in the network\.

#### SpillCBCMAB\{\}\_\{\\textbf\{CMAB\}\}: z%\.

This is our proposed framework, and we consider several variants\. We denote theSpillCBSpillCBasSpillCBLinUCBSpillCB\_\{LinUCB\}:z%z\\%andSpillCBEENetSpillCB\_\{EENet\}:z%z\\%when the underlying off\-the\-shelf CMAB algorithm is LinUCB\(Liet al\.[2010](https://arxiv.org/html/2606.15146#bib.bib40)\)and EENet\(Banet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib43)\), respectively\. We also vary the amount of explorationz%z\\%\.

### Experimental setup

Prior to the start of a bandit experiment, we setvi\.y=0v\_\{i\}\.y=0for all nodesvi∈Vv\_\{i\}\\in V, andeij\.t=0e\_\{ij\}\.t=0for all edgeseij∈Ee\_\{ij\}\\in E\. In each round, we consider the system activates each arrival node and all of its neighbors are inactive\. Therefore, our experiments allow re\-activation, enabling each node to be activated by multiple neighbors\. All experiments are repeated55times, and we report the average results across these repetitions\. To computeRMSERMSEin each round, we sample a fixed subset of edges with size\|S\|=500\|S\|=500\. We setα=0\.5\\alpha=0\.5for both Ridge and LinUCB methods\. For EE\-Net\(Banet al\.[2022](https://arxiv.org/html/2606.15146#bib.bib43)\), we follow their default setup and perform a grid search for the learning rate over\{0\.001,0\.01,0\.1\}\\\{0\.001,0\.01,0\.1\\\}in their exploitation neural network\. The optimal parameters for each dataset are selected from the grid search results\. We experiment withz∈\{0,20\}z\\in\\\{0,20\\\}and refer to these configurations asSpillCBCMABSpillCB\_\{CMAB\}:0%0\\%andSpillCBCMABSpillCB\_\{CMAB\}:20%20\\%, respectively\. We report the correspondingRecall@kRecall@kandRMSERMSEfork=10k=10across all rounds in Fig[1](https://arxiv.org/html/2606.15146#Sx5.F1)\. To understand the impact varyingkk, experiments are conducted fork∈\{5,10,15,20,25\}k\\in\\\{5,10,15,20,25\\\}\. Results forRecall@kRecall@kandRMSERMSEare reported in the final round of the bandit experiments in Figure[1](https://arxiv.org/html/2606.15146#Sx5.F1)\.

### Experimental results

As Figure[1](https://arxiv.org/html/2606.15146#Sx5.F1)shows, theRecall@kRecall@kof all methods increases askkincreases which is expected\. Our proposed methods,SpillCBLinUCBSpillCB\_\{LinUCB\}andSpillCBEENetSpillCB\_\{EENet\}, consistently outperform baseline approaches inRecall@kRecall@kacross all values ofkkon both datasets\. A largerkkleads to lowerRMSERMSEbecause it allows for the exploration of more edges, thereby accelerating the learning process\. For instance, theRMSERMSEdecreases by8\.72%8\.72\\%and9\.31%9\.31\\%on the Flickr and Facebook datasets, respectively, askkincreases from55to2525inSpillCBLinUCBSpillCB\_\{LinUCB\}:0%0\\%\. Similarly,RMSERMSEreductions of10\.04%10\.04\\%and5\.47%5\.47\\%are observed on the Flickr and Facebook datasets, respectively, forSpillCBEENetSpillCB\_\{EENet\}:0%0\\%\.

As the number of rounds \(TT\) increases,Recall@10Recall@10improves, while the correspondingRMSERMSEdecreases inSpillCBSpillCBfor both datasets\. This trend occurs because more edges are explored over rounds, enabling more efficient bandit learning\. Moreover, increasing the exploration phase \(zz\) results in a reduction inRMSERMSE, albeit sometimes at the expense of lowerRecall@10Recall@10inSpillCBSpillCB\. For instance, when the exploration phase increases from0%0\\%to20%20\\%, the cumulative averageRMSERMSEin the final round decreases by2\.23%2\.23\\%forSpillCBLinUCBSpillCB\_\{LinUCB\}on the Flickr dataset and by8\.04%8\.04\\%forSpillCBEENetSpillCB\_\{EENet\}on the Facebook dataset\. However, this comes with corresponding decreases of4\.2%4\.2\\%and1\.73%1\.73\\%inRecall@10Recall@10\.

These results highlight the trade\-offs between the exploration and exploitation phases in our framework and emphasize the robustness of balancing these factors to optimize performance\.

## Conclusion and Future Work

We present a novel contextual bandit framework,SpillCBSpillCB, designed to model and learn heterogeneous spillover probabilities in networks\. Our approach leverages these probabilities to optimize word\-of\-mouth rewards by recommending top\-kkneighbors\. Our preliminary results look promising\. Our next steps include considering more datasets and baselines to further understand the strengths and limitations of the proposed approach\.

## Acknowledgments

This research was funded in part by NSF under grant no\. 2047899 and DARPA under contract number HR001121C0168\.

## References

- N\. Aramayo, M\. Schiappacasse, and M\. Goic \(2023\)A multiarmed bandit approach for house ads recommendations\.Marketing Science42\(2\),pp\. 271–292\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p3.6)\.
- Y\. Ban, Y\. Yan, A\. Banerjee, and J\. He \(2022\)EE\-net: exploitation\-exploration neural networks in contextual bandits\.InInternational Conference on Learning Representations,Cited by:[SpillCBCMAB\{\}\_\{\\textbf\{CMAB\}\}: z%\.](https://arxiv.org/html/2606.15146#Sx5.SSx3.SSSx5.p1.6),[Experimental setup](https://arxiv.org/html/2606.15146#Sx5.SSx4.p1.21)\.
- W\. Chen, Y\. Wang, and Y\. Yuan \(2013\)Combinatorial multi\-armed bandit: general framework and applications\.InInternational conference on machine learning,pp\. 151–159\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p2.1)\.
- A\. S\. Faruk and E\. Zheleva \(2023\)Leveraging heterogeneous spillover effects in maximizing contextual bandit rewards\.arXiv preprint arXiv:2310\.10259\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p3.1),[Related Work](https://arxiv.org/html/2606.15146#Sx2.p2.1)\.
- C\. Fu, M\. Zhao, L\. Fan, X\. Chen, J\. Chen, Z\. Wu, Y\. Xia, and Q\. Xuan \(2018\)Link weight prediction using supervised learning methods and its application to yelp layered network\.IEEE Transactions on Knowledge and Data Engineering30\(8\),pp\. 1507–1518\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p1.1)\.
- C\. Gao, C\. Huang, D\. Yu, H\. Fu, T\. Lin, D\. Jin, and Y\. Li \(2022\)Item recommendation for word\-of\-mouth scenario in social e\-commerce\.IEEE Transactions on Knowledge and Data Engineering34\(6\),pp\. 2798–2809\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p4.1),[Related Work](https://arxiv.org/html/2606.15146#Sx2.p3.6)\.
- A\. Goyal, F\. Bonchi, and L\. V\. Lakshmanan \(2010\)Learning influence probabilities in social networks\.InProceedings of the third ACM international conference on Web search and data mining,pp\. 241–250\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p1.1)\.
- W\. Hamilton, Z\. Ying, and J\. Leskovec \(2017\)Inductive representation learning on large graphs\.Advances in neural information processing systems30\.Cited by:[GraphSage\.](https://arxiv.org/html/2606.15146#Sx5.SSx3.SSSx4.p1.1)\.
- L\. D\. Hollebeek, V\. Kulikovskaja, M\. Hubert, and K\. G\. Grunert \(2023\)Exploring a customer engagement spillover effect on social media: the moderating role of customer conscientiousness\.Internet Research33\(4\),pp\. 1573–1596\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p2.1)\.
- A\. Iacob, B\. Cautis, and S\. Maniu \(2022\)Contextual bandits for advertising campaigns: a diffusion\-model independent approach\.InProceedings of the 2022 SIAM International Conference on Data Mining \(SDM\),pp\. 513–521\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p3.1),[Related Work](https://arxiv.org/html/2606.15146#Sx2.p2.1)\.
- M\. Jamali and M\. Ester \(2009\)Using a trust network to improve top\-n recommendation\.InProceedings of the third ACM conference on Recommender systems,pp\. 181–188\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p3.6)\.
- L\. Kuang, N\. Huang, Y\. Hong, and Z\. Yan \(2019\)Spillover effects of financial incentives on non\-incentivized user engagement: evidence from an online knowledge exchange platform\.Journal of Management Information Systems36\(1\),pp\. 289–320\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p2.1)\.
- L\. Li, W\. Chu, J\. Langford, and R\. E\. Schapire \(2010\)A contextual\-bandit approach to personalized news article recommendation\.InProceedings of the 19th international conference on World wide web,pp\. 661–670\.Cited by:[SpillCBCMAB\{\}\_\{\\textbf\{CMAB\}\}: z%\.](https://arxiv.org/html/2606.15146#Sx5.SSx3.SSSx5.p1.6)\.
- Y\. Qian, Y\. Li, M\. Zhang, G\. Ma, and F\. Lu \(2017\)Quantifying edge significance on maintaining global connectivity\.Scientific reports7\(1\),pp\. 45380\.Cited by:[Real\-world datasets\.](https://arxiv.org/html/2606.15146#Sx5.SSx1.SSSx1.p1.5)\.
- D\. Song, D\. A\. Meyer, and D\. Tao \(2015\)Top\-k link recommendation in social networks\.In2015 IEEE International Conference on Data Mining,pp\. 389–398\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p3.6)\.
- S\. Vaswani, B\. Kveton, Z\. Wen, M\. Ghavamzadeh, L\. V\. Lakshmanan, and M\. Schmidt \(2017\)Model\-independent online learning for influence maximization\.InInternational Conference on Machine Learning,pp\. 3530–3539\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p3.1),[Related Work](https://arxiv.org/html/2606.15146#Sx2.p2.1)\.
- Z\. Wen, B\. Kveton, M\. Valko, and S\. Vaswani \(2017\)Online influence maximization under independent cascade model with semi\-bandit feedback\.Advances in neural information processing systems30\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p3.1),[Related Work](https://arxiv.org/html/2606.15146#Sx2.p2.1)\.
- B\. Wilder, N\. Immorlica, E\. Rice, and M\. Tambe \(2018\)Maximizing influence in an unknown social network\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.32\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p3.1),[Related Work](https://arxiv.org/html/2606.15146#Sx2.p2.1)\.
- Y\. Xu, J\. L\. Nicolau, and P\. Luo \(2022\)Travelers’ reactions toward recommendations from neighboring rooms: spillover effect on room bookings\.Tourism Management88,pp\. 104427\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p2.1)\.
- X\. Yang, H\. Steck, Y\. Guo, and Y\. Liu \(2012\)On top\-k recommendation using social networks\.InProceedings of the sixth ACM conference on Recommender systems,pp\. 67–74\.Cited by:[Related Work](https://arxiv.org/html/2606.15146#Sx2.p3.6)\.
- J\. Zhao, B\. Su, X\. Rao, and Z\. Chen \(2022\)A cross\-platform personalized recommender system for connecting e\-commerce and social network\.Future Internet15\(1\),pp\. 13\.Cited by:[Introduction](https://arxiv.org/html/2606.15146#Sx1.p1.1)\.
Contextual Bandits for Maximizing Stimulated Word-of-Mouth Rewards

Similar Articles

Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization

Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity

Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning

Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems

Online Pandora's Box for Contextual LLM Cascading

Submit Feedback

Similar Articles

Multi-Objective Multi-Agent Bandits: From Learning Efficiency to Fairness Optimization
Catching a Moving Subspace: Low-Rank Bandits Beyond Stationarity
Human-in-the-Loop Multi-Agent Ventilator Decision Support with Contextual Bandit Preference Learning
Graph-based Target Back-Propagation for Context Adaptation in Multi-LLM Agentic Systems
Online Pandora's Box for Contextual LLM Cascading