Optimal Scheduling in a Question-Answering Forum of Knowledge Workers

arXiv cs.AI 06/20/26, 04:00 AM Papers
question-answering scheduling knowledge-workers queuing-theory capacity-analysis social-networks optimization
Summary
This paper models a question-answering forum staffed by expert knowledge workers, studying optimal scheduling to maximize system capacity and stability.
arXiv:2606.19759v1 Announce Type: new Abstract: As individuals turn to the Internet to find answers to questions they may have, several Question Answering (QA) forums have evolved, where users knowledgeable in certain topics can contribute their expertise to answering these requests for information. While these are currently volunteer based, we consider a future version employing knowledge workers who are experts in certain topics. In such a system, the request-answer processes forming the queuing system may utilize schedulers that assign requests in different topics to the experts in the forum, who may be able to answer them according to their expertise levels in different topics. With this model, we calculate the capacity of the system for handling the requests while keeping the system stable, and design schedulers that achieve capacity. We also investigate how collaboration between experts in answering requests can potentially increase capacity.
Original Article
View Cached Full Text
Cached at: 06/20/26, 02:32 PM
# Optimal Scheduling in a Question-Answering Forum of Knowledge Workers
Source: [https://arxiv.org/html/2606.19759](https://arxiv.org/html/2606.19759)
###### Abstract

As individuals turn to the Internet to find answers to questions they may have, several Question Answering \(QA\) forums have evolved, where users knowledgeable in certain topics can contribute their expertise to answering these requests for information\. While these are currently volunteer based, we consider a future version employing knowledge workers who are experts in certain topics\. In such a system, the request\-answer processes forming the queuing system may utilize schedulers that assign requests in different topics to the experts in the forum, who may be able to answer them according to their expertise levels in different topics\. With this model, we calculate the capacity of the system for handling the requests while keeping the system stable, and design schedulers that achieve capacity\. We also investigate how collaboration between experts in answering requests can potentially increase capacity\.

Keywords: question answering, social networks, scheduling\.

## 1Introduction

Question Answering \(QA\) forums\[[1](https://arxiv.org/html/2606.19759#bib.bib1)\]such asQuoraandStackExchange\[[2](https://arxiv.org/html/2606.19759#bib.bib2)\]allow individuals to ask questions, so that users in the QA forum, knowledgeable in that topic, can answer these based on their own knowledge, and perhaps augmented with search for relevant information using information sources\. Posted questions may contain keywords \(tags\) or may be posted in a specific narrow forum within the QA system, so that the topic of the question is clear to potential answerers knowledgeable in that topic\. QA forums have sparked a variety of research, such as automatic identification of topics based on textual information\[[3](https://arxiv.org/html/2606.19759#bib.bib3)\], diffusion of information in online social networks\[[4](https://arxiv.org/html/2606.19759#bib.bib4)\]and models to understand effects of such diffusion on human behavior\[[5](https://arxiv.org/html/2606.19759#bib.bib5)\]\.

While the present volunteer driven QA forums are certainly successful, a certain fraction of questions in them remains unanswered, due to lack of volunteer interest or expertise in answering those questions\. For example, analysis of retrieved archival data\[[6](https://arxiv.org/html/2606.19759#bib.bib6)\]shows that 16% of questions remain unanswered and 54% of questions do not have an answer marked ‘Accepted’ out of the 1\.8 million questions on 37 most popularStackExchangewebsites on technical topics\. Similarly, certain questions receive active interest from volunteers, with several dozen answers, while other questions languish without volunteer interest\. At the same time, the rise of the large language models, such as the Chatgpt bot currently being tested by the QA forumQuora, may put a premium ondedicatedhuman experts answering complex questions that bots cannot easily answer\. Thus, while researchers are investigating incentives for volunteers in crowdsourcing applications to complete tasks such as question\-answering\[[8](https://arxiv.org/html/2606.19759#bib.bib8)\], it may be worth imagining a future version of these QA forums \(or perhaps a premium tier within an existing QA forum\), which, by monetizing the forum, is able to employ dedicated knowledge workers, whom we will callexperts, to guarantee an acceptable level of performance in terms of complex questions being successfully answered\. Such guarantees may be obtained by requiring these knowledge workers to be available as contracted, as well as dynamically assigning specific questions to these workers, using a scheduler, with a view to increasing the total system capacity\. Even if such as knowledge worker\-based forum \(or a premium tier\) doesn’t come to fruition, a theoretical analysis of the proposed system will also provide an upper bound on the capacity of a volunteer driven QA forum, since volunteers are not necessarily motivated by the express goal of maximizing system capacity\.

This paper investigates the problem of designing such a scheduler for the proposed knowledge worker\-based system, to maximize request answering capacity without the experts becoming overloaded\. We propose a system model where requests for information on different topics arrive and are placed in topic queues\. In previous research\[[9](https://arxiv.org/html/2606.19759#bib.bib9)\]based on this model, we had proposed a fixed scheduler that assigns different fractions of requests from the topic queues to different experts\. However, that scheduler requiredoffline calculationsof the fractions assigned, based on complete knowledge of the statistics of the request arrival processes, as well the capabilities of all the experts\. In this paper, we propose anonline schedulerthat assigns each request dynamically from the queues to experts based on the latter’s expertise in different topics and the current state of the system, so that the question is researched by one or more experts until it is answered\. Further, we investigate acoordinatedmode of operation, where different experts handle different requests, and also acollaborativemode, where several experts jointly work on a request, by pooling together their expertise\. Queuing theoretic analysis of the scheduler is used to calculate the capacity of the QA system\. Insights into the capacity are obtained and the benefit of collaboration between the experts versus mere coordination is explored\. While there is vast prior work on analysis of scheduling\[[10](https://arxiv.org/html/2606.19759#bib.bib10)\]in diverse applications, which we can draw on for analysis, the focus of this paper on QA forums reveals interesting insights into the capacity of knowledge\-handling systems such as these forums\.

## 2Model

We assume that individuals can askquestions, also calledrequestsfor information, in the QA forum\. Each question belongs to a topicx∈𝒳x\\in\{\\cal X\}, with𝒳\{\\cal X\}being afiniteset of possible topics, and with the questions’ topic being identified using keywords marked in the question, by an automatic classifier based on the question’s text, or simply posted to a narrow sub\-forum that specializes in that topic\. We assume that time is slottedt=1,2,…t=1,2,\\ldots, and a random numberax\(t\)a\_\{x\}\(t\)questions on topicxxarrive at the end of slottt, whereax\(t\)a\_\{x\}\(t\)are i\.i\.d\. in time, and independent acrossxx\. We assume thatE\[ax\(t\)\]=λp\(x\)E\[a\_\{x\}\(t\)\]=\\lambda p\(x\)with the probability mass function \(p\.m\.f\.\)p\(x\)\>0,∀xp\(x\)\>0,\\forall xrepresenting the relative frequency of different topics, and request loadλ\\lambdabeing the mean number of questions in each slot\. We assume that the variance ofax\(t\)a\_\{x\}\(t\)is bounded bycaλ2c\_\{a\}\\lambda^\{2\}for some constantcac\_\{a\}\. Topic queues, one for each topic, are maintained for unanswered questions, withQx\(t\)Q\_\{x\}\(t\)being the queue length for topicxxat the beginning of timett\.

There arennavailable dedicated experts denotedi=1,2,…,ni=1,2,\\ldots,nthat can monitor the QA forum\. We assume that the experts agree to use a scheduler to assign questions to them, so that these experts arecoordinatingwith each other\. \(Thus, the model is explicitlynotconsidering volunteer\-based forums, where the motivations of the volunteers may not align with any notion of system capacity\.\) Callσx,i\(t\)=1,0\\sigma\_\{x,i\}\(t\)=1,0according to as expertiiis assigned a request from topicxxat timett, or not, respectively\. We assume that an expertiican work on \(orresearch\) no more thannin\_\{i\}requests at eachtt, where experts who can dedicate more time may choose a higher value ofnin\_\{i\}, similar to how it works for drivers in the ride\-share industry\. Expertiisucceeds in answering a question in a given slot with a probability ofqi\(x\)q\_\{i\}\(x\)\(which could be zero\), which we call theanswering rateand depends on the topicxx\. We calldx,i\(t\)d\_\{x,i\}\(t\)the number of successful answers provided by expertiiin topicxxat timett\. If the expert is unsuccessful, the request goes back to its topic queue, and may be assigned to another \(or the same\) expert in subsequent time slots\. \(We experimentally show that this assumption does not seem to be not very critical to the results \- see Section[5](https://arxiv.org/html/2606.19759#S5)\. We also design a separate scheduler that explicitly allows workers to work on a request until completion\.\)

We assume that the probability of successfully answering that request does not depend on its past history, so that the number of \(possibly non\-consecutive\) time slotsTneeded by expertiito answer the request is a geometric random variable with values1,2,…1,2,\\ldotsand mean valueTi\(x\)=1qi\(x\)≥1T\_\{i\}\(x\)=\\frac\{1\}\{q\_\{i\}\(x\)\}\\geq 1\. \(While this is a useful assumption for technical reasons, we show experimentally in Section[5](https://arxiv.org/html/2606.19759#S5)that this assumption seems to be adequate from a practical perspective, provided an upper bound is placed onT\.\) Random variableTis apriori unknown to the scheduler, although we assume that its mean value, the average research timeTi\(x\)T\_\{i\}\(x\), is known for alli,xi,x\. Thus, as far as scheduling is concerned, an expertiiis simply a functionTi:𝒳→\[1,∞\]T\_\{i\}:\{\\cal X\}\\rightarrow\[1,\\infty\]\. The random answering time accounts for varying difficulty in searching for information, or processing that information, for different questions, even in the same topic\. It also accounts for the back and forth refining of questions, based on partial answers, that happens in QA forums\. TheTi\(⋅\)T\_\{i\}\(\\cdot\)functions will generally be different for different experts to account for their specific expertise\. The simplest method to estimateTi\(x\)T\_\{i\}\(x\)is to use past history of the expert’s answers in different topics\. More sophisticated methods, such as in\[[11](https://arxiv.org/html/2606.19759#bib.bib11)\], may also potentially be used\. We assume that the scheduler can use past and current history, including queue lengthsQx\(t\)Q\_\{x\}\(t\), to decide onσx,i\(t\),∀x,i\\sigma\_\{x,i\}\(t\),\\forall x,i\. Since we assumed that arrivals occur at the end of a slot, the queue lengths update as

Qx\(t\+1\)=Qx\(t\)\+ax\(t\)−∑i=1ndx,i\(t\),∀x\\displaystyle Q\_\{x\}\(t\+1\)=Q\_\{x\}\(t\)\+a\_\{x\}\(t\)\-\\sum\_\{i=1\}^\{n\}d\_\{x,i\}\(t\),\\ \\forall x\(1\)
We call the QA system as unstable if, for any scheduler that uses past and current history, diverging queue lengths are possible, i\.e\.,

P\(limt→∞∑x∈𝒳Qx\(t\)=∞\)\>0\.\\displaystyle P\(\\lim\_\{t\\rightarrow\\infty\}\\sum\_\{x\\in\{\\cal X\}\}Q\_\{x\}\(t\)=\\infty\)\>0\.\(2\)To prove stability, we will only consider schedulers that use current queue lengthsQx\(t\)Q\_\{x\}\(t\)at timett\. For such schedulers, due to the i\.i\.d\. arrivals, and the geometric distribution of answering times, the QA system is a Markov chain withQx\(t\),∀xQ\_\{x\}\(t\),\\forall xbeing the state at timett\. We say the system is stable if this Markov chain is positive recurrent\[[12](https://arxiv.org/html/2606.19759#bib.bib12)\]\. In particular, this would also mean that the system is not unstable in the sense of \([2](https://arxiv.org/html/2606.19759#S2.E2)\)\. We sayλ\\lambdais achievable if a scheduler can keep the system stable\. The supremum of all suchλ\\lambdais thecapacityof the system\.

For conciseness, we generally omit writing the set of an index, when that set is obvious, e\.g\., writing∑x\\sum\_\{x\}instead of∑x∈𝒳\\sum\_\{x\\in\{\\cal X\}\}\. Similarly, we will writemaxαi\\max\_\{\\alpha\_\{i\}\}instead ofmaxαi,i=1,2,…,n\\max\_\{\\alpha\_\{i\},i=1,2,\\ldots,n\}\. Due to lack of space, we leave the proofs to a subsequent full length paper, while noting that they rely on Lyapunov analysis, but for the special types of graph theoretic problems encountered in this formulation\. While the capacities we obtain can be conceivably used to design or evolve a QA forum \(such as by hiring experts that can increase the computed capacity\), we also provide an intuitive understanding of the capacity expressions\.

## 3Coordination Capacity

Suppose that thennexperts decide to coordinate as described in Section[2](https://arxiv.org/html/2606.19759#S2), by using a scheduler that decides the assignment of unanswered questions to each expert in each slottt, with a view to maximizing the achievable request loadλ\\lambda\. Letℳi≐\{x:qi\(x\)\>0\}\{\\cal M\}\_\{i\}\\doteq\\\{x:q\_\{i\}\(x\)\>0\\\}be the set of topics that expertiihas some capability of answering\. We assume thatℳi≠∅,∀i\{\\cal M\}\_\{i\}\\neq\\emptyset,\\forall i, since experts that can answer no topics don’t need to be considered for scheduling\. In this case, we have the following result\.

###### Lemma 1

The capacity with coordinating experts isλ∗=0\\lambda^\{\*\}=0if there exists a topicxxhavingmaxi⁡qi\(x\)=0\\max\_\{i\}q\_\{i\}\(x\)=0; otherwise it is strictly positive and given as

λ∗\\displaystyle\\lambda^\{\*\}=\\displaystyle=\(maxαi∑x∈𝒳mini:qi\(x\)\>0⁡\(αip\(x\)qi\(x\)\)\)−1where\\displaystyle\\left\(\\max\_\{\\alpha\_\{i\}\}\\sum\_\{x\\in\{\\cal X\}\}\\ \\min\_\{i:\\,q\_\{i\}\(x\)\>0\}\\left\(\\alpha\_\{i\}\\frac\{p\(x\)\}\{q\_\{i\}\(x\)\}\\right\)\\right\)^\{\-1\}\\ \\mbox\{where\}\(4\)∑i=1nαini=1,αi≥0,∀i=1,…,n\.\\displaystyle\\sum\_\{i=1\}^\{n\}\\alpha\_\{i\}n\_\{i\}=1,\\quad\\alpha\_\{i\}\\geq 0,\\ \\forall\\,i=1,\\ldots,n\.Further, anyλ<λ∗\\lambda<\\lambda^\{\*\}can be achieved using a queue length based scheduler, such as the Greedy Online Coordinating scheduler described below\.

Greedy Online Coordinating scheduler: In each time slottt, for each expertiiseparately, the scheduler first sorts the requests in the topic queuesxxin decreasing order of rewardsri\(x\)=Qx\(t\)qi\(x\)r\_\{i\}\(x\)=Q\_\{x\}\(t\)q\_\{i\}\(x\)\. It then tentatively assignsnin\_\{i\}of the largest reward requests to expertii\. Then, in the second stage, it finalizes the requests by arbitrarily deleting excess tentative requests in each topicxxthat exceed the numberQx\(t\)Q\_\{x\}\(t\)waiting in that topic queue, and presents them to the experts\.

Note that this is an Online algorithm since the scheduler here does not need to pre\-compute fixed fractions of requests to be assigned to different experts, unlike the case in\[[9](https://arxiv.org/html/2606.19759#bib.bib9)\], which would require knowledge of the arrivalp\(x\)p\(x\), and can potentially also adapt here if this p\.m\.f\. changes slowly\. The scheduler can be implemented in afederatedmanner, as follows\. Each expert can independently compute its own rewardsri\(x\)r\_\{i\}\(x\)and choose the largestnin\_\{i\}of these to announce its tentative schedule to the scheduler\. The scheduler can then run the second phase of the greedy assignment after collecting all these tentative schedules\. Thus, the values ofqi\(x\)q\_\{i\}\(x\)only needs be known by the corresponding expertii\.

Proof\[Lemma[1](https://arxiv.org/html/2606.19759#Thmlemma1)\]:Coordination is a special case of Collaboration, as discussed in Section[4](https://arxiv.org/html/2606.19759#S4), so that this theorem is a special case of Theorem[1](https://arxiv.org/html/2606.19759#Thmtheorem1), obtained by setting𝒮=\{\{i\},i=1,2,…,n\}\{\\cal S\}=\\\{\\\{i\\\},i=1,2,\\ldots,n\\\}, the set of all singletons\. Further the Greedy scheduler proposed above is a special case of the Two\-phase Greedy scheduler described in Algorithm[1](https://arxiv.org/html/2606.19759#alg1)for the Collaboration case\. Although in the Collaboration case, the Two\-phase Greedy algorithm can only guarantee an approximation ratio ofγ≐\(maxS∈𝒮⁡\|S\|\)−1\\gamma\\doteq\(\\max\_\{S\\in\{\\cal S\}\}\|S\|\)^\{\-1\}with respect to capacity \(Lemma[6](https://arxiv.org/html/2606.19759#Thmlemma6)\), in the Coordination case,γ=1\\gamma=1since the set𝒮\{\\cal S\}only consist of singleton sets\{i\}\\\{i\\\}\. Thus, in this case, the Greedy scheduler achieves capacity\. This Lemma generalizes the one obtained in\[[9](https://arxiv.org/html/2606.19759#bib.bib9)\]forofflinescheduling\.□\\square

Coordinating experts can potentially compensate for each others’ lack of expertise in certain topics\. We calculate the coordination capacity in certain special cases to provide intuition on that aspect\. To avoid clutter here, we assumeni≡1n\_\{i\}\\equiv 1\.

###### Lemma 2

The coordination capacityλ∗\\lambda^\{\*\}in Lemma[1](https://arxiv.org/html/2606.19759#Thmlemma1)is bounded as follows\.

1. \(a\)λ∗≤n\\lambda^\{\*\}\\leq nalways\.
2. \(b\)Competency of experts: Suppose for each topicxx, at leastNNexperts have answering rateqi\(x\)≥qq\_\{i\}\(x\)\\geq q\. Then,λ∗≥qN\\lambda^\{\*\}\\geq qN\.
3. \(c\)Diversity of experts: Suppose a partition of topicsS\(i\),i=1,…,nS^\{\(i\)\},i=1,\\ldots,nexists \(allowingS\(i\)=∅S^\{\(i\)\}=\\emptyset\), with eachS\(i\)≐\{x:qi\(x\)≥q\}S^\{\(i\)\}\\doteq\\\{x:q\_\{i\}\(x\)\\geq q\\\}\. Letp≐maxi⁡P\(X∈S\(i\)\)p\\doteq\\max\_\{i\}P\(X\\in S^\{\(i\)\}\)\. Then,λ∗≥qp\\lambda^\{\*\}\\geq\\frac\{q\}\{p\}, but it requiresn≥1pn\\geq\\frac\{1\}\{p\}experts\.

Part b\) states that if every topic has several competent experts, capacity is proportionally higher\. Practically, we expect an expert to be competent in only a subset of topics\. If many competent experts exist, but they only cover similar topics, requests in other topics will not be sufficiently answered\. Part \(c\) states that capacity will be high if many experts exist, each competent in a subset of topics, such that all topics are covered\. In part \(c\),P\(X∈S\)=∑x∈Sp\(x\)P\(X\\in S\)=\\sum\_\{x\\in S\}p\(x\)\. By partition, we mean thatS\(i\)∩S\(j\)=∅,∀i≠jS^\{\(i\)\}\\cap S^\{\(j\)\}=\\emptyset,\\forall i\\neq jand∪iS\(i\)=𝒳\\cup\_\{i\}S^\{\(i\)\}=\{\\cal X\}\.

In QA forums likeQuoraandStackExchange, multiple experts may propose answers to a question until that question is resolved\. At the same time, an expert may work on other requests before coming back to a previous request she had worked on\. The above Online Coordinating scheduler models that behavior\. If an expert is unsuccessful in finding the answer, the request goes back to its queue, and may subsequently be worked on by a different \(or same\) expert, and each expert is allowed to work on a different request in subsequent time slots\. However, in other QA forums such as Microsoft Community\[[2](https://arxiv.org/html/2606.19759#bib.bib2)\], each request is often handled by only a single moderator or ‘MVP’, until it gets answered\. While experiments in Section[5](https://arxiv.org/html/2606.19759#S5)suggest that the above scheduler potentially remains optimal even if the model is changed to allow an expert to keep working on a request until she answers it, the following scheduler explicitly models such non\-preemptive behavior\.

Online Arrival scheduler: This scheduler maintains separate topic queuesQx,i\(t\)Q\_\{x,i\}\(t\)for each expertii\. Choose an arbitrarycQ\>0c\_\{Q\}\>0\. In each time slottt, each expertiichoosesnin\_\{i\}requests arbitrarily from among those queued up at her queuesQx,iQ\_\{x,i\}, or continues working on the requests from the previous slot\. At the end of that time slot, the scheduler assigns allax\(t\)a\_\{x\}\(t\)arriving requests in topicxxto the corresponding topic queue of a specific experti∗\(x\)i^\{\*\}\(x\)chosen as below\.

i∗\(x\)=argmini:qi\(x\)\>01qi\(x\)\(cQ\+∑x∈ℳi1niqi\(x\)Qx,i\(t\)\)\.\\displaystyle i^\{\*\}\(x\)\\ =\\ \\operatorname\*\{argmin\}\_\{i:\\,q\_\{i\}\(x\)\>0\}\\ \\frac\{1\}\{q\_\{i\}\(x\)\}\(c\_\{Q\}\+\\sum\_\{x\\in\{\\cal M\}\_\{i\}\}\\frac\{1\}\{n\_\{i\}q\_\{i\}\(x\)\}Q\_\{x,i\}\(t\)\)\.\(5\)In the argmin above, if there is a topicxxfor whichqi\(x\)=0,∀iq\_\{i\}\(x\)=0,\\forall i, those requests are discarded\. \(By Lemma[1](https://arxiv.org/html/2606.19759#Thmlemma1), the capacity is zero in this degenerate case\.\) Only a single expert works on each given request until it gets answered\. The Lemma below shows that this scheduler is also optimal\.

###### Lemma 3

The Online Arrival scheduler \([5](https://arxiv.org/html/2606.19759#S3.E5)\) for coordinating experts achieves the same capacity as shown in Lemma[1](https://arxiv.org/html/2606.19759#Thmlemma1)\.

In this scheduler, the constantcQc\_\{Q\}is required because queue lengths are natural numbers, and so, an empty queue conveys no information about lack of expertise in some topic, unlesscQ\>0c\_\{Q\}\>0\. Simulations in Section[5](https://arxiv.org/html/2606.19759#S5)do not indicate a strong effect ofcQc\_\{Q\}, withcQ=1c\_\{Q\}=1being adequate\. The Online Arrival scheduler can also be implemented in a federated manner, similar to the two\-phase Greedy scheduler\.

## 4Collaboration Capacity

In this section, we extend the model of Section[2](https://arxiv.org/html/2606.19759#S2), by assuming that that certain sets of experts cancollaborate\. This means that they can jointly research a common requestxx, sharing their diverse expertise, brain\-storming or identifying strengths and weaknesses of each others’ arguments, and then jointly arrive at the correct answer after a random amount of time, modeled as a positive geometric random variable\.

To model collaboration, denote byqS\(x\)q\_\{S\}\(x\)the probability of answering a requestxxin a given time slot if experts in the setSScollaborate onxx\. \(So,TS\(x\)=1/qS\(x\)T\_\{S\}\(x\)=1/q\_\{S\}\(x\)is the average time to collaboratively answer the request\.\) For example, thecollaboration setS=\{1,2,5\}S=\\\{1,2,5\\\}if experts 1,2,5 collaborate\. Let𝒮\{\\cal S\}be the set of all feasible collaborations\.𝒮\{\\cal S\}is assumed to always contain all the singletons\{i\},i=1,2,…,n\\\{i\\\},i=1,2,\\ldots,n, which corresponds to the non\-collaborative research mode of individual experts\. For these singletons,q\{i\}≐qi\(x\)q\_\{\\\{i\\\}\}\\doteq q\_\{i\}\(x\)is the usual non\-collaborative answering rate\.𝒮\{\\cal S\}need not be a power set of the experts, since not all experts may want to collaborate with each other, and further, collaborations may practically be limited to only a few experts at a time\. We assume that an expertiican participate in at mostnin\_\{i\}collaboration setsS∈𝒮S\\in\{\\cal S\}in a given time slot, since that is the number of problems he may simultaneously work on\. Thus, ifI⊂𝒮I\\subset\{\\cal S\}is the set of collaboration sets that are assigned requests by the scheduler in a given time slot, expertiicannot appear in more thannin\_\{i\}sets inII\. Note that for the special caseni≡1n\_\{i\}\\equiv 1,IImust be an independent set\. i\.e\.,S1,S2∈I⇒\|S1∩S2\|=∅S\_\{1\},S\_\{2\}\\in I\\Rightarrow\|S\_\{1\}\\cap S\_\{2\}\|=\\emptyset\. So, we can call thenin\_\{i\}\-constrained version as a ‘generalized independent set’\. Letℐ\{\\cal I\}be the set ofmaximalgeneralized independent sets\. i\.e\., if a generalized independent setIIis inℐ\{\\cal I\}, then there is no other generalized independent setI~\\tilde\{I\}inℐ\{\\cal I\}containingII\. For example, forni≡1n\_\{i\}\\equiv 1, if the only collaboration sets area=\{1\},b=\{2\},c=\{3\},d=\{1,2\}a=\\\{1\\\},b=\\\{2\\\},c=\\\{3\\\},d=\\\{1,2\\\}, thenℐ\{\\cal I\}consists of\{a,b,c\}\\\{a,b,c\\\}and\{c,d\}\\\{c,d\\\}only\.

LetℳS≐\{x:qS\(x\)\>0\}\{\\cal M\}\_\{S\}\\doteq\\\{x:q\_\{S\}\(x\)\>0\\\}be the set of topics that collaboration setSShas some capability of answering\. For collaborative research, we have the following capacity result\.

###### Theorem 1

The capacity with collaborative research isλ∗=0\\lambda^\{\*\}=0if there exists a topicxxhavingmaxS⁡qS\(x\)=0\\max\_\{S\}q\_\{S\}\(x\)=0; otherwise it is strictly positive and given as

λ∗=\(maxαS∑x∈𝒳minS:qS\(x\)\>0⁡\(αSp\(x\)qS\(x\)\)\)−1where\\displaystyle\\lambda^\{\*\}\\ =\\ \\left\(\\max\_\{\\alpha\_\{S\}\}\\sum\_\{x\\in\{\\cal X\}\}\\ \\min\_\{S:\\,q\_\{S\}\(x\)\>0\}\\left\(\\alpha\_\{S\}\\frac\{p\(x\)\}\{q\_\{S\}\(x\)\}\\right\)\\right\)^\{\-1\}\\ \\mbox\{where\}\(6\)∑S∈IαS≤1,∀I∈ℐ,αS≥0,∀S∈𝒮\.\\displaystyle\\sum\_\{S\\in I\}\\alpha\_\{S\}\\leq 1,\\ \\forall\\,I\\in\{\\cal I\},\\qquad\\alpha\_\{S\}\\geq 0,\\ \\forall S\\in\{\\cal S\}\.\(7\)Further, anyλ<λ∗\\lambda<\\lambda^\{\*\}can be achieved using a queue length based scheduler, such as the one shown below\.

Algorithm 1: Two\-phase Greedy Collaboration scheduler𝒮0=𝒮,U=∅,n\(x\)≡0,σx,S\(t\)≡0,Ni≡0\{\\cal S\}\_\{0\}=\{\\cal S\},\\ \\ U=\\emptyset,\\ \\ n\(x\)\\equiv 0,\\ \\ \\sigma\_\{x,S\}\(t\)\\equiv 0,\\ \\ N\_\{i\}\\equiv 0;

while\(

𝒮0≠∅\{\\cal S\}\_\{0\}\\neq\\emptyset\)do

x∗,S∗=argmaxx,S∈𝒮0Qx\(t\)qS\(x\)x^\{\*\},S^\{\*\}=\\operatorname\*\{argmax\}\_\{x,\\,S\\in\{\\cal S\}\_\{0\}\}\\,Q\_\{x\}\(t\)q\_\{S\}\(x\);

σx∗,S∗\(t\)=1\\sigma\_\{x^\{\*\},S^\{\*\}\}\(t\)=1;

n\(x∗\)←n\(x∗\)\+1n\(x^\{\*\}\)\\leftarrow n\(x^\{\*\}\)\+1;

Ni←Ni\+1,∀i∈S∗N\_\{i\}\\leftarrow N\_\{i\}\+1,\\ \\forall i\\in S^\{\*\};

U←U∪\{i:Ni=ni\}U\\leftarrow U\\cup\\\{i:N\_\{i\}=n\_\{i\}\\\};

𝒮0←𝒮0∖\{S:S∩U≠∅\}\{\\cal S\}\_\{0\}\\leftarrow\{\\cal S\}\_\{0\}\\setminus\\\{S:S\\cap U\\neq\\emptyset\\\};

endwhile

for

x∈𝒳x\\in\{\\cal X\}do

if

n\(x\)\>Qx\(t\)n\(x\)\>Q\_\{x\}\(t\)then

Arbitrarily delete

n\(x\)−Qx\(t\)n\(x\)\-Q\_\{x\}\(t\)assignments of topic

xxrequests made to collaboration sets;

endif

endfor

Online Collaborative scheduler: In each time slottt, the scheduler assigns requests from topic queuesxxto the collaboration setsSS\(denotedσx,S\(t\)=1\\sigma\_\{x,S\}\(t\)=1, else0\), by solving the Hypergraph Assignment Problem \(HAP\) below\.\[[13](https://arxiv.org/html/2606.19759#bib.bib13)\]\.

R∗\\displaystyle R^\{\*\}=\\displaystyle=maxσx,S∑xQx\(t\)∑S∈𝒮qS\(x\)σx,S\(t\)where\\displaystyle\\max\_\{\\sigma\_\{x,S\}\}\\ \\sum\_\{x\}Q\_\{x\}\(t\)\\sum\_\{S\\in\{\\cal S\}\}q\_\{S\}\(x\)\\sigma\_\{x,S\}\(t\)\\quad\\mbox\{where\}\(10\)∑x∑S:i∈Sσx,S\(t\)≤ni,∀i=1,2,…,n,\\displaystyle\\mkern\-36\.0mu\\sum\_\{x\}\\sum\_\{S:\\,i\\in S\}\\sigma\_\{x,S\}\(t\)\\leq n\_\{i\},\\ \\forall\\,i=1,2,\\ldots,n,∑S∈𝒮σx,S\(t\)≤min⁡\{Qx\(t\),n\},∀x∈𝒳\.\\displaystyle\\mkern\-36\.0mu\\sum\_\{S\\in\{\\cal S\}\}\\sigma\_\{x,S\}\(t\)\\leq\\min\\\{Q\_\{x\}\(t\),n\\\},\\ \\forall\\,x\\in\{\\cal X\}\.Here, \([10](https://arxiv.org/html/2606.19759#S4.E10)\) ensures that an expertiiis not assigned more thannin\_\{i\}requests in a time slot, whether individually or collaboratively, while \([10](https://arxiv.org/html/2606.19759#S4.E10)\) ensures that only requests actually present in the topic queues are assigned\. The HAP \([10](https://arxiv.org/html/2606.19759#S4.E10)\) can be visualized as a type of bipartite graph matching problem to match requests with experts, but with constraints on which collaboration sets may be simultaneously active \(due to the generalized independent set constraints\.\)

To demonstrate the possible advantage of collaboration, consider a stylized case where there aren=2n=2experts and they can collaborate\. Assumeni≡1n\_\{i\}\\equiv 1and thatp\(x1\)=1p\(x\_\{1\}\)=1, so thatx1x\_\{1\}is the only topic of interest\. Assumeq1\(x1\)=q2\(x1\)=1100q\_\{1\}\(x\_\{1\}\)=q\_\{2\}\(x\_\{1\}\)=\\frac\{1\}\{100\}because each expert finds the request difficult to answer, whileq\{1,2\}\(x1\)=1≫q1\(x1\)\+q2\(x1\)q\_\{\\\{1,2\\\}\}\(x\_\{1\}\)=1\\gg q\_\{1\}\(x\_\{1\}\)\+q\_\{2\}\(x\_\{1\}\)because the two experts can pool their complementary expertise to solvex1x\_\{1\}quickly\. Then by symmetry, the solution to \([6](https://arxiv.org/html/2606.19759#S4.E6)\) has optimalα1=α2=0\.5\\alpha\_\{1\}=\\alpha\_\{2\}=0\.5andα\{1,2\}=1\\alpha\_\{\\\{1,2\\\}\}=1, so thatλ∗=\(∑xmin⁡\(α\{1,2\}p\(x\)q\{1,2\}\(x\),mini⁡\(αip\(x\)qi\(x\)\)\)\)−1=\(min⁡\(1,50,50\)\)−1=1\\lambda^\{\*\}=\\left\(\\sum\_\{x\}\\min\\left\(\\alpha\_\{\\\{1,2\\\}\}\\frac\{p\(x\)\}\{q\_\{\\\{1,2\\\}\}\(x\)\},\\ \\min\_\{i\}\\left\(\\alpha\_\{i\}\\frac\{p\(x\)\}\{q\_\{i\}\(x\)\}\\right\)\\right\)\\right\)^\{\-1\}=\\left\(\\min\(1,50,50\)\\right\)^\{\-1\}=1\. On the other hand, the coordinating but non\-collaborating case \([4](https://arxiv.org/html/2606.19759#S3.E4)\) yields with the optimalα1=α2=0\.5\\alpha\_\{1\}=\\alpha\_\{2\}=0\.5, the capacityλ∗=\(min⁡\(50,50\)\)−1=0\.02\\lambda^\{\*\}=\\left\(\\min\(50,50\)\\right\)^\{\-1\}=0\.02, which is significantly smaller than the capacity with collaboration\.

The extent of the advantage of collaboration over coordination depends strongly on the specific topic distributionp\(x\)p\(x\)and answering ratesqS\(x\)q\_\{S\}\(x\)\. The result below provides a comparison of the collaboration capacityλcollab∗\\lambda^\{\*\}\_\{collab\}in Theorem[1](https://arxiv.org/html/2606.19759#Thmtheorem1)to the \(non\-collaborative\) coordination capacityλcoord∗\\lambda^\{\*\}\_\{coord\}in Lemma[1](https://arxiv.org/html/2606.19759#Thmlemma1), under a condition on answering rates\. Call⌊⋅⌋\\lfloor\\cdot\\rflooras the floor function\. Let𝒮\(2\+\)⊂𝒮\{\\cal S\}^\{\(2\+\)\}\\subset\{\\cal S\}be the subset of non\-singleton collaboration sets \(i\.e\., having at least two experts\.\) Letℐ\(2\+\)⊂ℐ\{\\cal I\}^\{\(2\+\)\}\\subset\{\\cal I\}be the set of all maximal independent sets that consist of only sets in𝒮\(2\+\)\{\\cal S\}^\{\(2\+\)\}\. Letℐcover⊆ℐ\(2\+\)\{\\cal I\}\_\{cover\}\\subseteq\{\\cal I\}^\{\(2\+\)\}be a cover for all the experts, i\.e\., we havei∈∪I∈ℐcover∪S∈ISi\\in\\cup\_\{I\\in\{\\cal I\}\_\{cover\}\}\\cup\_\{S\\in I\}Sfor every expertii\. Such a cover always exists if \(and only if\) every expertiiparticipates in some non\-singleton collaboration set, i\.e\., everyi∈Si\\in Sfor someS∈𝒮\(2\+\)S\\in\{\\cal S\}^\{\(2\+\)\}\. If no suchℐcover\{\\cal I\}\_\{cover\}exists, we can consider\|ℐcover\|=∞\|\{\\cal I\}\_\{cover\}\|=\\inftyin the bound below\.

###### Lemma 4

Forni≡1n\_\{i\}\\equiv 1, the collaboration capacityλcollab∗\\lambda^\{\*\}\_\{collab\}in Theorem[1](https://arxiv.org/html/2606.19759#Thmtheorem1)is bounded as follows\.

1. \(a\)λcoord∗≤λcollab∗≤n\\lambda^\{\*\}\_\{coord\}\\leq\\lambda^\{\*\}\_\{collab\}\\leq nalways\.
2. \(b\)Suppose that we have the uniform boundqS\(x\)≥c\(maxi∈S⁡qi\(x\)\),∀x,S∈𝒮\(2\+\)q\_\{S\}\(x\)\\geq c\\left\(\\max\_\{i\\in S\}q\_\{i\}\(x\)\\right\),\\forall x,S\\in\{\\cal S\}^\{\(2\+\)\}, wherec≥1c\\geq 1is a constant\. Then,λcollab∗≥\(c\|ℐcover\|\(maxS⁡\|S\|\)\)λcoord∗\\lambda^\{\*\}\_\{collab\}\\geq\\left\(\\frac\{c\}\{\|\{\\cal I\}\_\{cover\}\|\\,\(\\max\_\{S\}\|S\|\)\}\\right\)\\lambda^\{\*\}\_\{coord\}\.
3. \(c\)With the same assumptions as in part \(b\), if in addition, all possible collaborations of sizeKKare allowed \(withK≥2K\\geq 2\), we haveλcollab∗≥\(cK⌊n/K⌋n/K\)λcoord∗\\lambda^\{\*\}\_\{collab\}\\geq\\left\(\\frac\{c\}\{K\}\\frac\{\\lfloor n/K\\rfloor\}\{n/K\}\\right\)\\lambda^\{\*\}\_\{coord\}\.

Note that the bound onqS\(x\)q\_\{S\}\(x\)assumed in part \(b\) is trivially true forc=1c=1, since each collaboration setSScan simply designate its best expert on topicxxto answer requests in that topic\. The lemma shows that if collaboration always results in significantly faster answers, perhaps because experts can pool their expertise, or perhaps because experts function better when they ‘bounce ideas’ off each other, the collaboration capacity is proportionally higher\. In part \(b\), the reduction due to the collaboration set sizemaxS⁡\|S\|\\max\_\{S\}\|S\|is reasonable, since experts working together inSSon a single request lose the ability to work in parallel on several individual requests in that slot\. The reduction due to\|ℐcover\|\|\{\\cal I\}\_\{cover\}\|accounts for many experts requiring the same few experts for collaboration, so that the latter get ‘spread too thin’\. In fact, this capacity bound is tight, as shown by the two\-expert example in the previous paragraph, for which we hadc=100,\|S\|=2c=100,\|S\|=2,ℐcover=\{I\}\{\\cal I\}\_\{cover\}=\\\{I\\\}withI=\{\{1,2\}\}I=\\\{\\\{1,2\\\}\\\}, whileλcollab∗λcoord∗=10\.02=c\|ℐcover\|maxS⁡\|S\|\\frac\{\\lambda^\{\*\}\_\{collab\}\}\{\\lambda^\{\*\}\_\{coord\}\}=\\frac\{1\}\{0\.02\}=\\frac\{c\}\{\|\{\\cal I\}\_\{cover\}\|\\,\\max\_\{S\}\|S\|\}there\. In part \(b\), the bound also applies if only collaboration sets of sizeK≤maxS⁡\|S\|K\\leq\\max\_\{S\}\|S\|are considered, with appropriate change in the definition of𝒮\(2\+\),ℐ\(2\+\)\{\\cal S\}^\{\(2\+\)\},\{\\cal I\}^\{\(2\+\)\}\. The symmetric case in part \(c\) shows a capacity gain factor of approximatelycK\\frac\{c\}\{K\}\.

Unfortunately, from a practical perspective, the Hypergraph Assignment Problem is known to be NP\-hard\[[13](https://arxiv.org/html/2606.19759#bib.bib13)\]\. So, for largen,\|𝒳\|n,\|\{\\cal X\}\|, approximate solutions to the HAP \([10](https://arxiv.org/html/2606.19759#S4.E10)\) are of interest\. Another practical scenario of interest is when experts have an erroneous estimateT^S\(x\)\\hat\{T\}\_\{S\}\(x\)of the true average research timesTS\(x\)T\_\{S\}\(x\), since these may have been estimated by them using their past experience answering requests about different topics\. Callq^S\(x\)=1T^S\(x\)\\hat\{q\}\_\{S\}\(x\)=\\frac\{1\}\{\\hat\{T\}\_\{S\}\(x\)\}\. The lemma below guarantees QA system stability in such cases, under certain assumptions\.

###### Lemma 5

Using the erroneousq^S\(x\)\\hat\{q\}\_\{S\}\(x\)instead ofqS\(x\)q\_\{S\}\(x\)in both \([6](https://arxiv.org/html/2606.19759#S4.E6)\) and \([10](https://arxiv.org/html/2606.19759#S4.E10)\), letλ∗\\lambda^\{\*\}be the capacity computed using \([6](https://arxiv.org/html/2606.19759#S4.E6)\) andR∗R^\{\*\}be the optimal reward achieved in HAP \([10](https://arxiv.org/html/2606.19759#S4.E10)\)\. Suppose for fixed constantγ~≤1\\tilde\{\\gamma\}\\leq 1, we have the uniform boundT^S\(x\)≥γ~TS\(x\),∀x,S\\hat\{T\}\_\{S\}\(x\)\\geq\\tilde\{\\gamma\}T\_\{S\}\(x\),\\forall x,S\. Suppose also, for fixed constantsγ≤1\\gamma\\leq 1,c≥0c\\geq 0, the HAP \([10](https://arxiv.org/html/2606.19759#S4.E10)\) \(with erroneousq^S\(x\)\\hat\{q\}\_\{S\}\(x\)\) can be solved approximately to obtain a rewardRRbounded asR≥γR∗−cR\\geq\\gamma R^\{\*\}\-c, whereR∗R^\{\*\}is the erroneous optimal reward\. Then, the Collaborative scheduler \([10](https://arxiv.org/html/2606.19759#S4.E10)\) can achieve anyλ<γ~γλ∗\\lambda<\\tilde\{\\gamma\}\\gamma\\lambda^\{\*\}\.

Note that the trueqS\(x\)q\_\{S\}\(x\)need not be known to use the above lemma \- only a guaranteedγ~\\tilde\{\\gamma\}\. Forγ=1,c=0\\gamma=1,c=0, this reduces to the case of exactly solving the HAP, but with erroneous answering rates\. If in addition,γ~=1\\tilde\{\\gamma\}=1, this reduces to Theorem[1](https://arxiv.org/html/2606.19759#Thmtheorem1)\. This lemma also extends Lemma[1](https://arxiv.org/html/2606.19759#Thmlemma1)to the case of erroneous answering rate, since coordination is a special case of collaboration consisting of only the singleton collaboration sets\. An example of an approximate scheduler to which Lemma[5](https://arxiv.org/html/2606.19759#Thmlemma5)is applicable is described below\.

Two\-phase Greedy Collaboration scheduler: In each time slottt, the scheduler runs two phases to determine the assignments of requests to collaboration sets \(Algorithm[1](https://arxiv.org/html/2606.19759#alg1)\)\.NiN\_\{i\}is the number of times expertiiappears in the collaboration sets already chosen until that point\.n\(x\)n\(x\)counts the number of assignments of requests in topicxxthat have already been made until this point in the slot\. In the first phase, it repeatedly assignsxxto a collaboration setS∈𝒮0S\\in\{\\cal S\}\_\{0\}ifx,Sx,SmaximizesQx\(t\)qS\(x\)Q\_\{x\}\(t\)q\_\{S\}\(x\), while removing from𝒮0\{\\cal S\}\_\{0\}all sets that depend on expertsiithat have used up their allottednin\_\{i\}assignments \(setUU\.\) In the second phase, it deletes the surplus assignments that have been made beyond that topic’s queue length\.

###### Lemma 6

The Two\-phase Greedy scheduler solves the HAP \([10](https://arxiv.org/html/2606.19759#S4.E10)\) with a reward of at leastγR∗−n2\\gamma R^\{\*\}\-n^\{2\}, withγ≐\(maxS∈𝒮⁡\|S\|\)−1\\gamma\\doteq\(\\max\_\{S\\in\{\\cal S\}\}\|S\|\)^\{\-1\}\. Thus, it achievesλ<γλ∗\\lambda<\\gamma\\lambda^\{\*\}by Lemma[5](https://arxiv.org/html/2606.19759#Thmlemma5)\.

Note that in the above definition,γ−1\\gamma^\{\-1\}is the maximum number of experts that may collaborate on a given request\. For example, if only pairwise collaborations are allowed, thenγ−1=2\\gamma^\{\-1\}=2, and so, the above greedy scheduler will achieve at least half the capacity \([6](https://arxiv.org/html/2606.19759#S4.E6)\) according to this lemma\.

![Refer to caption](https://arxiv.org/html/2606.19759v1/x1.png)Figure 1:qi\(x\)q\_\{i\}\(x\)for some active users ofStackExchange\.![Refer to caption](https://arxiv.org/html/2606.19759v1/x2.png)Figure 2:Complementary cdf of Answering delays\.
## 5Simulations

We obtained question\-answering datasets from\[[6](https://arxiv.org/html/2606.19759#bib.bib6)\]on 37 of the most activeStackExchangeQA technical sites, each of which we consider atopicxxhere\. We used this dataset to calculate inter\-arrival times between consecutive requests in each topic, for a total of 1\.8 million such arrivals spanning more than 10 years for many of these sites\. For each topic, we use these samples of inter\-arrival times to simulate the request arrival processax\(t\)a\_\{x\}\(t\), noting that the distribution of these times may not be geometric\. We also used the dataset to extract 200 most frequent answerers \(‘experts’\) in these sites in terms of number of questions answered, among those who are significantly active in at least two sites \(such as ‘electronics’ and ‘arduino’\), in the sense of having at least 10% of their answerers in each site\. For these answerers, we obtained their answers and calculated the delay between the question being posed and answer being proposed, for a total of 270,000 such answers\. For each answerer, we used her corresponding list of delays in different topics to simulate the time needed for her to answer that topic by drawing a sample at random with replacement from her list of delays, for each new request\. Again, we note that the resulting delay will not necessarily a geometric random variable\. Figure[1](https://arxiv.org/html/2606.19759#S4.F1)shows theqi\(x\)q\_\{i\}\(x\)functions calculated for different experts, based on their average sample delays, with a few of the answerer’s StockOverflow Account IDs marked on the figure\. These answering rates are low because the time slot here is 8 minutes\. The figure shows that a few answerers dominate the system, which seems unreasonable in our proposed knowledge worker\-based system \(since they may be paid\), and will give uninteresting results here\. So, we scaled the answering delay sample values of each expert by a different constant, so that each has the same sum answering rate∑xqi\(x\)\\sum\_\{x\}q\_\{i\}\(x\), while not changing their relative delays in different topics\. Finally, it was observed that a certain fraction of answers come extremely late \(often after several months\), which may not be as interesting to users of the proposed system\. So, we truncated the answering delays to a maximum of 96 hours \- this affected 9% of the answering delays\. We observed that without such truncation \(such as declaring some questions as too difficult\), the system becomes unstable\. The Complementary cumulative distribution function \(ccdf\) of aggregate answering delays for some topics, plotted in Figure[2](https://arxiv.org/html/2606.19759#S4.F2), shows that while the distribution has an exponential distribution over a wide range \(thus, partially justifying our geometric random variable assumption\), it does tends to cluster near smaller values\.

With such empirical request arrival processes and answering delays, we simulated the two federated schedulers for the Coordination case of Section[3](https://arxiv.org/html/2606.19759#S3), where the loadλ\\lambdais varied by scaling the inter\-arrival time variable generated using the dataset, as described above, andni≡1n\_\{i\}\\equiv 1\. In our simulation, both schedulers, the Greedy and the Arrival\-based one, work on the assigned request until completion, instead of returning it to the queues at the end of the slot\. This is to allow us to check the sensitivity of our obtained capacity to this assumption\. For comparison, the Coordination capacity \([4](https://arxiv.org/html/2606.19759#S3.E4)\) was computed usingp\(x\)p\(x\)andqi\(x\)q\_\{i\}\(x\)obtained from the average inter\-arrival times in the different topics and the average delays for different topic\-expert pairs\. Figure[3](https://arxiv.org/html/2606.19759#S7.F3)shows the maximum \(over topics\) of time average of the queue lengths of the different schedulers using the empirical data, as outlined above, for different request loadsλ\\lambda, averaged over 10 separate trials of10510^\{5\}eight minute slots each\. In all cases, the queue length is small, as long as the load is sufficiently below coordination capacityλ∗\\lambda^\{\*\}\(shown by vertical line\)\. For the Arrival scheduler, thecQc\_\{Q\}value \(10−510^\{\-5\}for ‘small’ versus11for ‘large’\) does not seem to affect the queues unduly\. For comparison, the Uncoordinated case, where the experts are allowed to randomly choose any waiting request in topic that they are historically significantly active in, performs worse than the Coordinated cases\.

Since we are not aware of any dataset as yet that can be used to infer collaboration expertise, we numerically compute this capacity and contrast it with coordination capacity for a stylized model\. We assume\|𝒳\|=20\|\{\\cal X\}\|=20topics,ni≡1n\_\{i\}\\equiv 1andn=6n=6versusn=14n=14experts, with the topics arranged in an interval, and withqi\(x\)q\_\{i\}\(x\)modeled as a truncated Gaussian, normalized so that∑xqi\(x\)=1,∀i\\sum\_\{x\}q\_\{i\}\(x\)=1,\\ \\forall\\,i\. The mode of the Gaussian for the experts is chosen uniformly spaced within the interval, to simulate the experts specialized in different topics\. We assume all pairwise collaborations are allowed, and chooseqS\(x\)=cmaxi∈S⁡qi\(x\)q\_\{S\}\(x\)=c\\max\_\{i\\in S\}q\_\{i\}\(x\)for pairwiseSS, withc=3c=3\. Figure[4](https://arxiv.org/html/2606.19759#S7.F4)shows the coordination and collaboration capacities, as well as the lower bound on the collaboration capacity given by Lemma[4](https://arxiv.org/html/2606.19759#Thmlemma4)part \(c\), as a function of the standard deviationβ\\betaof the truncated Gaussian\. Since a smaller standard deviation results in experts having high answering rates, but for a narrower range of topics \(i\.e\., highly specialized experts\), the figure shows that both capacities are highest when experts are neither too specialized, nor too broad\. Moreover, the proposed lower bound is close to the exact collaboration capacity over a wide range of standard deviations\. With 14 experts, we could not compute the exact collaboration capacity numerically due to the exponentially large number of maximal independent sets, but the collaboration lower bound does show increased capacity compared to coordination \(and also shows the value of that lower bound\)\.

## 6Conclusions

In this paper, we analyzed the request handling capacity of a QA forum, assuming that experts in the forum agree to use a scheduler to receive assignments of questions to research\. We proposed two federated schedulers that achieve the coordination capacity, wherein experts coordinate on question assignment, but otherwise work separately\. We also proposed a collaboration mode, and showed that with an appropriate scheduler, the capacity in this mode may be larger\. Future work includes considering question\-answering over social networks with arbitrary graph relationships between experts, distributed scheduling over such networks, and hybrid systems that allow experts some choice in selecting which questions to answer\.

## 7Acknowledgments

This material is based upon work supported by the US National Science Foundation under Grant 1422193\.

![Refer to caption](https://arxiv.org/html/2606.19759v1/x3.png)Figure 3:Maximum of time\-averaged topic queues versusλ\\lambda\.![Refer to caption](https://arxiv.org/html/2606.19759v1/x4.png)Figure 4:Coordination and collaboration capacity versusβ\\beta\.
## References

- \[1\]I\. Srba and M\. Bielikova, “A comprehensive survey and classification of approaches for Community Question Answering,”ACM Transactions on the Web, pp\. 1\-63, August 2016\.
- \[2\]https://stackexchange\.com/,https://www\.quora\.com/,https://answers\.microsoft\.com/\.
- \[3\]H\. Nassif, M\. Mohtarami and J\. Glass, “Learning semantic relatedness in Community Question Answering using neural models,”Proc\. Workshop on Representation Learning for NLP, pp\. 137\-147, August 2016\.
- \[4\]S\. Dhamal, “Effectiveness of Diffusing Information through a Social Network in Multiple Phases,”IEEE Globecom, pp\. 1\-7, Dec\. 2018\.
- \[5\]T\. \-H\. Fan and K\. \-C\. Chen, “A New Social Network Model of Online Forums,”IEEE Globecom, pp\. 1\-6, Dec\. 2017\.
- \[6\]https://archive\.org/download/stackexchange/\.
- \[7\]P\. Nakov, et\. al\., “SemEval\-2015 Task 3: Answer Selection in Community Question Answering,”Wksp\. on Semantic Evaluation, June 2015\.
- \[8\]C\. Huang, H\. Yu, J\. Huang and R\. Berry, “Crowdsourcing with Heterogeneous Workers in Soc\. Net\.s,”IEEE Globecom, pp\. 1\-6, Dec\. 2019\.
- \[9\]R\. Negi and M\. Yilmaz, “A Theoretical Framework for Information Search using Online Social Networks,”IEEE Globecom \- Social Networks, pp\. 6421\-6426, Dec\. 2022\.
- \[10\]S\. Meyn, Control Techniques for Complex Networks, Cambridge University Press, 2007\.
- \[11\]F\. Han, et\. al\., “Distributed representations of expertise,”Proc\. SIAM Int\. Conf\. on Data Mining, pp\. 531\-539, June 2016\.
- \[12\]P\. Bremaud, Markov chains: Gibbs fields, Monte Carlo simulation, and queues, Springer, New York, 1999\.
- \[13\]R\. Borndörfer, O\. Heismann, “The hypergraph assignment problem,”Discrete Optimization, vol\. 15, pp\. 15\-25, Feb\. 2015\.
Optimal Scheduling in a Question-Answering Forum of Knowledge Workers

Similar Articles

Using OR-Tools CP-SAT for Scheduling Problems

On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral

AMATA: Adaptive Multi-Agent Trajectory Alignment for Knowledge-Intensive Question Answering

SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering

Hey Chat, Can You Teach Me? Structuring Socratic Dialogue for Human Learning in the Wild

Submit Feedback

Similar Articles

Using OR-Tools CP-SAT for Scheduling Problems
On Wednesdays, We Ask Questions: Optimizing "Active Listening" in Automated Legal Triage and Referral
AMATA: Adaptive Multi-Agent Trajectory Alignment for Knowledge-Intensive Question Answering
SPADER: Step-wise Peer Advantage with Diversity-Aware Exploration Rewards for Multi-Answer Question Answering
Hey Chat, Can You Teach Me? Structuring Socratic Dialogue for Human Learning in the Wild