X-SYNTH: Beyond Retrieval -- Enterprise Context Synthesis from Observed Human Attention

arXiv cs.AI Papers

Summary

This paper introduces X-SYNTH, a framework that synthesizes enterprise context by modeling human attention from digital interaction data. On a sales lead identification task it achieves a 6.5x improvement in True Lead Rate (9.5% to 61.9%) and reduces False Lead Rate from 90.5% to 18.8%.

arXiv:2605.15505v1 Announce Type: new Abstract: In enterprise operations, the context required for an AI agent task is scattered across systems of record, static information stores, and communication channels. What is stored is system state, a lossy representation of the work that actually happened [2, 52]. The prevailing approach [17, 31, 34, 36] retrieves by matching request content to what is stored; for narrow requests this works well. But synthesis quality depends on knowing what to surface and how to interpret it: knowledge specific to each organization, team, and individual [5, 57, 61], present in behavioral patterns and absent from any retrieval index. For complex agentic tasks this approach breaks down: True Lead Rate is low, False Lead Rate is high, and the model has no mechanism to improve. We present X-SYNTH, a framework for enterprise context synthesis grounded in human attention: the digitally observable interaction signatures of each worker, encoding not just what they did but the sequence in which they did it, along with implicit reward signals. Behavioral traces that preceded positive outcomes are distinguishable from those that did not, without external labeling. X-SYNTH models each individual's behavioral baseline as a Digital Twin Signature (DTS). Per individual and per query, it selects among seven qualitatively distinct attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective) to identify causally relevant activity signatures. A four-stage pipeline assembles ranked context grounded in behavioral patterns rather than query embeddings. On a sales lead identification task, a frontier model unaided achieves 9.5% True Lead Rate (TLR) with 90.5% False Lead Rate (FLR). Augmented with X-SYNTH, TLR rises to 61.9% (6.5x) while FLR falls to 18.8%. Enterprise context synthesis is not a retrieval problem. It is a relevance problem, and human attention is its most reliable ground truth.

# X-SYNTH: Beyond Retrieval — Enterprise Context Synthesis from Observed Human Attention
Source: [https://arxiv.org/html/2605.15505](https://arxiv.org/html/2605.15505)
###### Abstract

In enterprise operations, the context required to complete an AI agent task is scattered across systems of record, static information stores, and communication channels. What is stored is only system state, often a lossy representation of the work that actually happened (Anonymous, [2025](https://arxiv.org/html/2605.15505#bib.bib3); Soroco and Workfabric AI, [2025](https://arxiv.org/html/2605.15505#bib.bib53)). The prevailing approach (Lewis et al., [2020](https://arxiv.org/html/2605.15505#bib.bib37); Guu et al., [2020](https://arxiv.org/html/2605.15505#bib.bib18); Karpukhin et al., [2020](https://arxiv.org/html/2605.15505#bib.bib32); Khattab and Zaharia, [2020](https://arxiv.org/html/2605.15505#bib.bib35)) retrieves by matching the content of a request to what is stored. For narrow, content-driven requests such as “Find invoice INV-34231,” this works reasonably well. But synthesis quality depends on knowing what to surface and how to interpret it. That knowledge is specific to each organization, team, and individual (Teevan et al., [2005](https://arxiv.org/html/2605.15505#bib.bib58); Bennett et al., [2012](https://arxiv.org/html/2605.15505#bib.bib6); White et al., [2009](https://arxiv.org/html/2605.15505#bib.bib62)). It is present in behavioral patterns and absent from any retrieval index or general training corpus. For complex agentic tasks requiring synthesis from dispersed behavioral evidence, this approach breaks down: True Lead Rate is low, False Lead Rate is high, and the model has no mechanism to improve.

We present X-SYNTH, a framework for enterprise context synthesis grounded in human attention: the digitally observable interaction signatures of each worker, encoding not just what they did but the sequence in which they did it. Enterprise workers continuously produce digital interaction data (González and Mark, [2004](https://arxiv.org/html/2605.15505#bib.bib17); Mark et al., [2005](https://arxiv.org/html/2605.15505#bib.bib39); Czerwinski et al., [2004](https://arxiv.org/html/2605.15505#bib.bib10); Anonymous, [2025](https://arxiv.org/html/2605.15505#bib.bib3); Soroco and Workfabric AI, [2025](https://arxiv.org/html/2605.15505#bib.bib53)). This data encodes not just outcomes but the ordered sequence of actions by which they were reached, along with the implicit reward signals embedded within them (Joachims, [2002](https://arxiv.org/html/2605.15505#bib.bib28); Joachims et al., [2005](https://arxiv.org/html/2605.15505#bib.bib29); Kelly and Teevan, [2003](https://arxiv.org/html/2605.15505#bib.bib34); Agichtein et al., [2006](https://arxiv.org/html/2605.15505#bib.bib2); Buscher et al., [2008](https://arxiv.org/html/2605.15505#bib.bib8)). Everything needed to learn relevance is present: behavioral traces that preceded positive outcomes are distinguishable from those that did not, without any external labeling. X-SYNTH models each individual’s behavioral baseline as a Digital Twin Signature (DTS). Per individual and per query, it selects among seven qualitatively distinct attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective) to identify which activity signatures were causally relevant. A four-stage agentic pipeline assembles this into ranked context grounded in behavioral patterns rather than query embeddings.

On a sales lead identification task, a frontier model unaided achieves a 9.5% True Lead Rate (TLR) with a 90.5% False Lead Rate (FLR). Augmented with X-SYNTH, TLR rises to 61.9% (a 6.5× improvement) while FLR falls to 18.8%. Enterprise context synthesis is not a retrieval problem. It is a relevance problem, and human attention is its most reliable ground truth.

enterprise context synthesis, human attention modeling, behavioral sequence learning, implicit reward signals, retrieval-augmented generation, AI agents

CCS Concepts: Human-centered computing → User interface toolkits; Computing methodologies → Machine learning.

## 1. Introduction

Consider a sales representative closing a deal that has been in motion for three months. The outcome, a signed contract, will eventually appear in a CRM. But the context that produced it will not: the email thread that resolved a pricing concern, the Slack message that unblocked a technical objection, the internal document revised four times before it moved. None of these artifacts were written to be retrieved together. What is relevant to closing the next deal, or finding a new one altogether? And when an AI agent is asked to do exactly that, none of them will surface, because no retrieval system was watching when they mattered.

Without solving for this level of complexity, AI agents in enterprise operations remain largely limited to fetching static information and executing explicitly defined rules on how to act on it. Ask an agent (Yao et al., [2023](https://arxiv.org/html/2605.15505#bib.bib66); Schick et al., [2023](https://arxiv.org/html/2605.15505#bib.bib48)) to retrieve invoice #12345 and it will succeed. Ask it to find a new sales lead for your team and it will not (HERB Benchmark Authors, [2025](https://arxiv.org/html/2605.15505#bib.bib20)). Finding a new lead is not a retrieval task: there is no record of it waiting to be found. The agent must *synthesize one* from the behavioral evidence embedded in the team’s history of activity. The prevailing approach, which connects systems of record and retrieves against the content of a request, is not designed for this.

The consequences are measurable, and recent enterprise-specific benchmarks confirm them at scale (HERB Benchmark Authors, [2025](https://arxiv.org/html/2605.15505#bib.bib20); EnterpriseBench Authors, [2025](https://arxiv.org/html/2605.15505#bib.bib14); Wornow et al., [2024](https://arxiv.org/html/2605.15505#bib.bib63); Muthusamy et al., [2023](https://arxiv.org/html/2605.15505#bib.bib43); Marreed et al., [2025](https://arxiv.org/html/2605.15505#bib.bib41)). One of our benchmarks is a sales lead identification task in which an agent must surface net-new opportunities from a seller’s digital interaction history before they are recorded in a CRM. On it, Claude Opus 4.6 produces a 90.5% False Lead Rate (FLR), meaning only 1 in 10 surfaced leads is real. It also misses 90.5% of actual leads, catching just 20 of the 210 positive instances in the benchmark. The model’s intelligence is not the bottleneck. The bottleneck is the absence of a principled relevance signal tied to outcomes such as these.

Human attention (the ordered, digitally observable signature of who focuses on what, in what sequence, and how that deviates from their behavioral baseline) is the ground truth that retrieval-based approaches lack. Enterprise workers continuously produce digital interaction data (González and Mark, [2004](https://arxiv.org/html/2605.15505#bib.bib17); Mark et al., [2005](https://arxiv.org/html/2605.15505#bib.bib39); Czerwinski et al., [2004](https://arxiv.org/html/2605.15505#bib.bib10); Anonymous, [2025](https://arxiv.org/html/2605.15505#bib.bib3); Soroco and Workfabric AI, [2025](https://arxiv.org/html/2605.15505#bib.bib53)). This data encodes not just outcomes but the ordered sequence of actions by which they were reached. Knowing which signals matter and how to interpret them is specific to each organization, team, geography, and individual. No general training corpus encodes that a developer’s absence from security reviews is anomalous for *this person specifically*, or that an account executive’s attention shifting to competitor documents signals deal risk. That knowledge lives in the organization’s own behavioral history.

X-SYNTH exploits this structure through seven qualitatively distinct attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective), each designed to surface a different behavioral signal. The same query resolves to a different filter for each individual. A developer who owns security tooling warrants a Proportional filter on their security artifact attention, while one who has gone unusually quiet warrants a Differential filter. Selection is conditioned on each individual’s Digital Twin Signature (DTS), a compact rolling behavioral profile, via a learned function $\text{Query} \times \text{DTS} \rightarrow \text{Modality}$. A four-stage agentic pipeline assembles the result: subject scoping, individualized modality selection, attention-and-content-weighted retrieval, and synthesis.

[Figure 1 diagram:

- Query $q$ enters Stage 1 (Subject Scoping, which resolves $\mathcal{U}_q \subseteq \mathcal{U}$).
- Stage 2 (Human Attention Modality Selection) computes a per-individual $\mathbf{m}(q,u)$ over the seven filters, conditioned on the rolling profile $\mathrm{DTS}(u,\tau)$.
- Stage 3 (Retrieval & Weighting) keeps the top-$K$ per individual.
- Stage 4 (Synthesis) produces the response $\hat{y}$.
- Feedback updates $f_\theta$.]

Figure 1. The X-SYNTH pipeline. A query is scoped to target individuals $\mathcal{U}_q$. Each is assigned a modality distribution over seven attention filters, conditioned on their Digital Twin Signature (DTS) and the query. Artifacts are ranked by combined attention-and-content importance, and a synthesis stage assembles the response. The same query resolves to different filters for different individuals.

X-SYNTH makes the following contributions:

- **Human attention as behavioral ground truth.** Human attention, operationalized as the ordered, digitally observed interaction sequence of each worker, is a learnable and discriminative relevance signal that needs no external labels. Each individual’s Digital Twin Signature (DTS) encodes this as a compact rolling behavioral profile.
- **Seven-filter attention modality framework.** Seven qualitatively distinct attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, Collective) are selected per individual and per query via a lightweight MLP conditioned on $\text{Query} \times \text{DTS}$.
- **Four-stage agentic synthesis pipeline.** Subject scoping, individualized modality selection, attention-and-content-weighted retrieval, and synthesis, with per-individual filter assignment grounded in each person’s behavioral baseline.
- **Feedback loop with credit attribution.** X-SYNTH decomposes terminal feedback into stage-level failure probabilities. It updates the modality selector only when failure is attributed to modality misclassification rather than to retrieval or synthesis error, closing the improvement loop without polluting the training signal.

Augmenting the same Claude Opus 4.6 model with X-SYNTH yields substantial improvements. True Lead Rate (TLR) jumps from 9.5% to 61.9% (a 6.5× improvement), capturing 130 vs. 20 of 210 real leads, while FLR falls from 90.5% to 18.8%. In absolute terms, X-SYNTH surfaces 110 additional real leads that the out-of-the-box approach misses entirely.
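The headline figures can be reproduced from the counts above. The sketch below assumes TLR is the share of the 210 real leads that a system surfaces. It also assumes FLR is the share of surfaced leads that are not real, the reading implied by "only 1 in 10 surfaced leads is real". The number of leads each system surfaced is not reported, so `implied_surfaced` only back-calculates it from the stated FLR.

```python
def true_lead_rate(true_positives: int, total_real: int) -> float:
    """TLR: fraction of real leads that the system surfaced (recall)."""
    return true_positives / total_real

def false_lead_rate(true_positives: int, surfaced: int) -> float:
    """FLR: fraction of surfaced leads that are not real (1 - precision)."""
    return (surfaced - true_positives) / surfaced

def implied_surfaced(true_positives: int, flr: float) -> float:
    """Surfaced-lead count implied by a true-positive count and a reported FLR."""
    return true_positives / (1.0 - flr)

TOTAL_REAL = 210
baseline_tp, xsynth_tp = 20, 130

print(f"TLR: {true_lead_rate(baseline_tp, TOTAL_REAL):.1%} -> "
      f"{true_lead_rate(xsynth_tp, TOTAL_REAL):.1%}")     # TLR: 9.5% -> 61.9%
print(f"Ratio: {xsynth_tp / baseline_tp:.1f}x, extra leads: "
      f"{xsynth_tp - baseline_tp}")                        # Ratio: 6.5x, extra leads: 110
# Not reported in the paper; back-calculated from the stated FLRs (approximate).
print(round(implied_surfaced(baseline_tp, 0.905)), round(implied_surfaced(xsynth_tp, 0.188)))
```

Under this reading, the baseline surfaced roughly as many leads as there are real ones (about 210), which is why its FLR equals 1 - TLR. X-SYNTH's implied output is a smaller and more precise set of about 160 leads.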

## 2. Related Work

X-SYNTH sits at the intersection of retrieval-augmented generation, implicit feedback in information retrieval, personalized user modeling, knowledge-worker behavioral studies, enterprise workflow capture, LLM agents, and learned routing. We survey each lineage and position X-SYNTH within it.

### 2.1. Retrieval-Augmented Generation

Retrieval-augmented generation (Lewis et al., [2020](https://arxiv.org/html/2605.15505#bib.bib37); Guu et al., [2020](https://arxiv.org/html/2605.15505#bib.bib18)) grounds language model outputs in retrieved passages. Dense (Karpukhin et al., [2020](https://arxiv.org/html/2605.15505#bib.bib32)), sparse (Robertson and Zaragoza, [2009](https://arxiv.org/html/2605.15505#bib.bib47)), and hybrid (Khattab and Zaharia, [2020](https://arxiv.org/html/2605.15505#bib.bib35)) retrievers are standard components. Subsequent work has explored larger retrieval pools (Borgeaud et al., [2022](https://arxiv.org/html/2605.15505#bib.bib7); Izacard et al., [2023](https://arxiv.org/html/2605.15505#bib.bib25)) and architectural variants for fusing retrieved content (Izacard and Grave, [2021](https://arxiv.org/html/2605.15505#bib.bib24)). A recent thread makes retrieval adaptive. These methods decide whether to retrieve (Asai et al., [2024](https://arxiv.org/html/2605.15505#bib.bib4)), when to retrieve (Jiang et al., [2023](https://arxiv.org/html/2605.15505#bib.bib27); Su et al., [2024](https://arxiv.org/html/2605.15505#bib.bib54)), or how to correct retrieval mid-generation (Yan et al., [2024](https://arxiv.org/html/2605.15505#bib.bib65)). Black-box variants (Shi et al., [2024](https://arxiv.org/html/2605.15505#bib.bib51); Ram et al., [2023](https://arxiv.org/html/2605.15505#bib.bib45)) extend the paradigm to closed models. Across this literature, the relevance signal is the *query*: passages are selected by content similarity, optionally filtered by self-judged sufficiency. This paradigm struggles when the answer is dispersed across artifacts that share no surface similarity with the query, as recent enterprise-specific benchmarks demonstrate (HERB Benchmark Authors, [2025](https://arxiv.org/html/2605.15505#bib.bib20); EnterpriseBench Authors, [2025](https://arxiv.org/html/2605.15505#bib.bib14)). HERB (HERB Benchmark Authors, [2025](https://arxiv.org/html/2605.15505#bib.bib20)) reports that best-in-class agentic RAG averages ~33% on a heterogeneous enterprise corpus, and it identifies retrieval as the bottleneck. X-SYNTH adds an orthogonal signal, observed human attention, rather than improving content-based retrieval.

### 2.2. Implicit Feedback as Relevance Signal

The premise that user behavior carries relevance information predates the RAG era. Joachims (Joachims, [2002](https://arxiv.org/html/2605.15505#bib.bib28)) showed that clickthrough data yields ranking signals competitive with editorial judgments. Subsequent work refined the interpretation (Joachims et al., [2005](https://arxiv.org/html/2605.15505#bib.bib29)) and addressed selection bias (Joachims et al., [2017](https://arxiv.org/html/2605.15505#bib.bib30)). The signal vocabulary expanded to dwell time (Claypool et al., [2001](https://arxiv.org/html/2605.15505#bib.bib9)), revisits, and post-click actions (Kelly and Teevan, [2003](https://arxiv.org/html/2605.15505#bib.bib34); Fox et al., [2005](https://arxiv.org/html/2605.15505#bib.bib16)). Eye-tracking studies grounded these signals in human attentional mechanisms (Buscher et al., [2008](https://arxiv.org/html/2605.15505#bib.bib8)). Agichtein et al. (Agichtein et al., [2006](https://arxiv.org/html/2605.15505#bib.bib2)) integrated behavioral and content features in a unified ranker. X-SYNTH generalizes this formulation in §[3](https://arxiv.org/html/2605.15505#S3), where attention importance and content relevance combine multiplicatively. Sequential modeling (Hidasi et al., [2016](https://arxiv.org/html/2605.15505#bib.bib21); Kang and McAuley, [2018](https://arxiv.org/html/2605.15505#bib.bib31); Sun et al., [2019](https://arxiv.org/html/2605.15505#bib.bib57); Zhou et al., [2018](https://arxiv.org/html/2605.15505#bib.bib67)) established that the *order* of activity carries information beyond its aggregate. X-SYNTH inherits both intuitions (behavior as implicit reward, sequence as structure) and extends them in three ways:

- the behavioral substrate is enterprise activity over heterogeneous artifacts rather than search clicks;
- relevance is structured into seven qualitatively distinct filters;
- an explicit per-individual baseline (the DTS) replaces population-averaged relevance.

### 2.3. Personalized Search and User Modeling

Personalization in IR began with the recognition that the same query should yield different results for different users (Teevan et al., [2005](https://arxiv.org/html/2605.15505#bib.bib58)). Building per-user profiles from prior activity enabled re-ranking without requiring users to specify intent (Sugiyama et al., [2004](https://arxiv.org/html/2605.15505#bib.bib55); Shen et al., [2005](https://arxiv.org/html/2605.15505#bib.bib50); Dou et al., [2007](https://arxiv.org/html/2605.15505#bib.bib11)). Bennett et al. (Bennett et al., [2012](https://arxiv.org/html/2605.15505#bib.bib6)) showed that short-term and long-term signals are complementary, motivating multi-window user models. X-SYNTH’s $\mathbf{v}^{\text{div}}$ component (KL divergence between 5-day and 14-day domain attention) directly inherits this insight. Contextual user-interest prediction (White et al., [2009](https://arxiv.org/html/2605.15505#bib.bib62)) and personal corpus retrieval (Dumais et al., [2003](https://arxiv.org/html/2605.15505#bib.bib13)) further established that behavioral history is searchable and predictive. The DTS differs from prior user models in three ways:

- it is built on enterprise activity streams rather than search interactions;
- it is structured into five distinct behavioral components;
- it is consumed by a learned router ($f_\theta$) rather than used directly as ranker features.

This last property allows the same query to resolve to different filters for different individuals.

### 2.4. Knowledge-Worker Behavior and Enterprise Capture

González and Mark (González and Mark, [2004](https://arxiv.org/html/2605.15505#bib.bib17)) showed that information workers operate across roughly ten working spheres simultaneously, switching every twelve minutes on average. Subsequent studies confirmed the pattern (Mark et al., [2005](https://arxiv.org/html/2605.15505#bib.bib39); Czerwinski et al., [2004](https://arxiv.org/html/2605.15505#bib.bib10); Iqbal and Horvitz, [2007](https://arxiv.org/html/2605.15505#bib.bib23)). Work rhythms are themselves structured and predictable (Mark et al., [2014](https://arxiv.org/html/2605.15505#bib.bib40)), supporting the hypothesis that an individual’s behavioral baseline is a meaningful, learnable object. Activity-aware systems (Dragunov et al., [2005](https://arxiv.org/html/2605.15505#bib.bib12); Bardram, [2005](https://arxiv.org/html/2605.15505#bib.bib5)) established that capturing activity as a first-class signal is technically feasible. Enterprise search has long been recognized as structurally distinct from web search (Hawking, [2004](https://arxiv.org/html/2605.15505#bib.bib19)). Operationalizing behavior-as-signal at scale requires high-fidelity interaction capture (Soroco and Workfabric AI, [2025](https://arxiv.org/html/2605.15505#bib.bib53)). ENTROPHY (Anonymous, [2025](https://arxiv.org/html/2605.15505#bib.bib3)) provides the first large-scale dataset of real production workflows averaging 178 steps, confirming that frontier models cannot treat interaction streams as plain text. Process mining (van der Aalst, [2016](https://arxiv.org/html/2605.15505#bib.bib59)) and digital interaction intelligence (Modi and Kumar, [2024](https://arxiv.org/html/2605.15505#bib.bib42)) share the same input but target process discovery rather than query-conditioned synthesis.

### 2.5. Enterprise LLM Agents

A growing body of work explores LLMs as enterprise actors (Wornow et al., [2024](https://arxiv.org/html/2605.15505#bib.bib63); Muthusamy et al., [2023](https://arxiv.org/html/2605.15505#bib.bib43); Rizk et al., [2024](https://arxiv.org/html/2605.15505#bib.bib46); Marreed et al., [2025](https://arxiv.org/html/2605.15505#bib.bib41)). Kayali et al. (Kayali et al., [2025](https://arxiv.org/html/2605.15505#bib.bib33)) frame the gap between LLM capabilities and enterprise data integration. These works largely converge on a finding consistent with ours: enterprise context is the missing ingredient, and content-based retrieval alone is insufficient. X-SYNTH operationalizes one specific bridge: observed human attention as the relevance ground truth.

### 2.6. LLM Agents: Reasoning, Tools, and Self-Improvement

The agentic harness in which X-SYNTH is embedded draws on the reasoning-and-acting paradigm (Yao et al., [2023](https://arxiv.org/html/2605.15505#bib.bib66)), which builds on chain-of-thought reasoning (Wei et al., [2022](https://arxiv.org/html/2605.15505#bib.bib61)). Tool-using LMs (Schick et al., [2023](https://arxiv.org/html/2605.15505#bib.bib48); Qin et al., [2024](https://arxiv.org/html/2605.15505#bib.bib44)) established that LLMs can route between specialized capabilities. Multi-agent frameworks (Wu et al., [2024](https://arxiv.org/html/2605.15505#bib.bib64); Hong et al., [2024](https://arxiv.org/html/2605.15505#bib.bib22)) generalize this to coordinated specialists. Self-improvement via terminal feedback has also been explored (Shinn et al., [2023](https://arxiv.org/html/2605.15505#bib.bib52); Madaan et al., [2023](https://arxiv.org/html/2605.15505#bib.bib38); Wang et al., [2023](https://arxiv.org/html/2605.15505#bib.bib60)). The CoALA framework (Sumers et al., [2024](https://arxiv.org/html/2605.15505#bib.bib56)) organizes agents into modular memory, action, and decision components. X-SYNTH’s contribution at the agent layer is the *attribution* of terminal feedback to pipeline stages. Prior work treats a poor response as evidence to update every component (Shinn et al., [2023](https://arxiv.org/html/2605.15505#bib.bib52)). We instead decompose failure into stage-level probabilities and update the modality selector only when failure is attributed to Stage 2.
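The gating rule can be illustrated with a small sketch. Everything in it is an assumption made for exposition. The three stage probabilities for a failed response are taken as given; the paper derives them from intermediate signals (§3.4). The selector is updated only when modality selection is both the most likely cause and above a hypothetical `min_prob` threshold.

```python
from dataclasses import dataclass

@dataclass
class StageAttribution:
    """Hypothetical stage-level failure probabilities for one response."""
    modality: float    # Stage 2: wrong attention filter chosen
    retrieval: float   # Stage 3: right filter, wrong artifacts surfaced
    synthesis: float   # Stage 4: right evidence, poor final response

def should_update_selector(satisfied: int, attr: StageAttribution,
                           min_prob: float = 0.5) -> bool:
    """Return True only when a failure is attributed to modality selection.

    Positive feedback (satisfied == 1) never triggers a corrective update here.
    """
    if satisfied == 1:
        return False
    probs = {"modality": attr.modality, "retrieval": attr.retrieval,
             "synthesis": attr.synthesis}
    blamed = max(probs, key=probs.get)   # most likely failing stage
    return blamed == "modality" and attr.modality >= min_prob

# A failure blamed mostly on retrieval leaves f_theta untouched.
print(should_update_selector(0, StageAttribution(0.2, 0.7, 0.1)))  # False
print(should_update_selector(0, StageAttribution(0.8, 0.1, 0.1)))  # True
```

The point of the gate is that failures caused by retrieval or synthesis never produce a gradient for $f_\theta$.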

### 2.7. Mixture-of-Experts and Learned Routing

The modality selector $\mathbf{m}(q,u)=\text{softmax}(f_\theta([\mathbf{q};\ \text{DTS}(u,\tau)]))$ is a soft router over seven specialist filters. It echoes the mixture-of-experts lineage from Jacobs et al. (Jacobs et al., [1991](https://arxiv.org/html/2605.15505#bib.bib26)) through modern sparse MoE (Shazeer et al., [2017](https://arxiv.org/html/2605.15505#bib.bib49); Fedus et al., [2022](https://arxiv.org/html/2605.15505#bib.bib15); Lepikhin et al., [2021](https://arxiv.org/html/2605.15505#bib.bib36)). The key structural difference is that the router is conditioned on the *user state* (the DTS) in addition to the query. As a result, the same query routes to different filters for different individuals, a property that standard input-only MoE gating does not exhibit.

## 3. Methodology

X-SYNTH is an agentic retrieval and synthesis system that resolves a natural language query $q$ against a corpus of human interaction traces $\mathcal{E}$. The pipeline proceeds through four stages, as shown in Figure [1](https://arxiv.org/html/2605.15505#S1.F1): (i) subject scoping, (ii) attention modality selection, (iii) artifact retrieval and importance weighting, and (iv) response synthesis. Table [1](https://arxiv.org/html/2605.15505#S3.T1) summarizes key notation.

Table 1. Key notation used throughout §[3](https://arxiv.org/html/2605.15505#S3):

- $\text{DTS}(u,\tau)$ concatenates five behavioral components over a rolling behavioral window.
- $\mathbf{m}(q,u) \in \Delta^6$ is the learned soft assignment over the seven attention filters.
- $w(a_j,q,u)$ is the final artifact importance combining attention and content signals.

### 3.1. Stage 1: Subject Scoping

Given query $q$, the system identifies the target set of individuals:

$$\mathcal{U}_q = \text{Scope}(q, \mathcal{U})$$

If $q$ references a named individual or group (e.g., “John”, “the security team”), $\mathcal{U}_q$ is resolved via entity extraction. If no subject constraint is present, $\mathcal{U}_q = \mathcal{U}$. Subject scoping is deterministic and requires no learned components.
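A minimal sketch of deterministic scoping follows. It assumes a hypothetical directory that maps individual names and group phrases to user IDs. Case-insensitive substring matching stands in for the entity extraction step, which is not detailed here.

```python
def scope(query: str, all_users: set[str], people: dict[str, str],
          groups: dict[str, set[str]]) -> set[str]:
    """Resolve U_q = Scope(q, U) by matching known names and group phrases.

    `people` maps a display name to a user ID and `groups` maps a group phrase
    to member IDs; both are hypothetical directory tables. Plain substring
    matching stands in for entity extraction (so "John" would also match
    "Johnson"; a real extractor would not).
    """
    text = query.lower()
    scoped: set[str] = set()
    for name, user_id in people.items():
        if name.lower() in text:
            scoped.add(user_id)
    for phrase, members in groups.items():
        if phrase.lower() in text:
            scoped |= members
    # No subject constraint found: scope to everyone (U_q = U).
    return scoped & all_users if scoped else set(all_users)


users = {"u1", "u2", "u3", "u4"}
people = {"John": "u1"}
groups = {"the security team": {"u2", "u3"}}
print(sorted(scope("What did John change this week?", users, people, groups)))          # ['u1']
print(sorted(scope("Is the security team on top of the CVEs?", users, people, groups)))  # ['u2', 'u3']
print(sorted(scope("Find a new sales lead", users, people, groups)))                     # ['u1', 'u2', 'u3', 'u4']
```

Because nothing here is learned, the same query always scopes to the same set, which matches the description of Stage 1 as deterministic.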

### 3.2. Stage 2: Attention Modality Selection

#### 3.2.1. Digital Twin Signature

For each individual $u \in \mathcal{U}_q$, the system maintains a Digital Twin Signature computed over a rolling behavioral window $\tau$:

$$\text{DTS}(u,\tau) = \left[\mathbf{v}^{\text{dom}},\ \mathbf{v}^{\text{rhythm}},\ \mathbf{v}^{\text{base}},\ \mathbf{v}^{\text{resp}},\ \mathbf{v}^{\text{div}}\right]$$
where the five components encode:

- $\mathbf{v}^{\text{dom}}$ (domain attention): where attention is concentrated across work domains.
- $\mathbf{v}^{\text{rhythm}}$ (behavioral rhythm): statistics capturing dwell, revisit, and transition patterns over the window.
- $\mathbf{v}^{\text{base}}$ (baseline): per-domain baseline statistics over an extended lookback, establishing what is normal for this individual.
- $\mathbf{v}^{\text{resp}}$ (responsibility profile): inferred domain ownership derived from activity patterns, not declared role.
- $\mathbf{v}^{\text{div}}$ (short-vs-long divergence): how recent attention patterns differ from the individual’s longer-term baseline, per domain.
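The sketch below computes the divergence component $\mathbf{v}^{\text{div}}$ using the 5-day and 14-day windows mentioned in §2.3, and assembles the full signature by plain concatenation. It relies on three assumptions not stated in the paper:

- attention is measured as interaction counts per domain;
- the "per domain" vector holds each domain's term of the KL sum, so the entries add up to the scalar divergence;
- a small smoothing constant `eps` keeps the logarithm finite for unvisited domains.

```python
import math
from collections import Counter

def domain_distribution(events, today, window_days, domains, eps=1e-6):
    """Smoothed share of interactions per domain over the last `window_days` days.

    `events` is a list of (day_index, domain) pairs, one per observed interaction.
    """
    counts = Counter(dom for day, dom in events if today - window_days < day <= today)
    raw = [counts[d] + eps for d in domains]
    total = sum(raw)
    return [r / total for r in raw]

def v_div(events, today, domains, short_days=5, long_days=14):
    """Per-domain terms p_i * log(p_i / q_i) of KL(P_short || P_long).

    Individual terms can be negative; their sum is the (non-negative) KL divergence.
    """
    p = domain_distribution(events, today, short_days, domains)
    q = domain_distribution(events, today, long_days, domains)
    return [pi * math.log(pi / qi) for pi, qi in zip(p, q)]

def assemble_dts(v_dom, v_rhythm, v_base, v_resp, v_div_vec):
    """DTS(u, tau): concatenation of the five component vectors."""
    return [*v_dom, *v_rhythm, *v_base, *v_resp, *v_div_vec]

domains = ["security", "frontend", "billing"]
# Days 1-9 are security work; days 10-14 shift entirely to frontend work.
events = ([(day, "security") for day in range(1, 10)]
          + [(day, "frontend") for day in range(10, 15)])
div = v_div(events, today=14, domains=domains)
print(max(zip(div, domains))[1])                  # frontend
print(sum(div), math.log(14 / 5))                 # nearly equal
```

In the example, the last five days are spent entirely on frontend work. Almost all of the divergence therefore sits in the frontend entry, and the total is close to $\ln(14/5)$.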

#### 3.2.2. The Seven Attention Filters

X-SYNTH defines seven qualitatively distinct attention filters $\mathcal{M} = \{M_1, \ldots, M_7\}$, each designed to surface a different behavioral signal. Table [2](https://arxiv.org/html/2605.15505#S3.T2) describes each and its intended use.

Table 2. The seven attention filters of X-SYNTH, each operationalizing a qualitatively distinct relevance signal:

- Proportional and Recurrent reward high absolute attention.
- Inverse rewards notable absence.
- Differential rewards deviation from an individual’s own baseline.
- Comparative rewards rapid alternation between similar artifacts.
- Sequential rewards workflow-order anomalies.
- Collective aggregates across a cohort.

The modality selector $f_\theta$ assigns a soft distribution over these filters per $(q,u)$ pair, conditioned on the query and the individual’s DTS.

#### 3.2.3. Modality Selection

The central claim is that the same query resolves to a different filter for each individual in $\mathcal{U}_q$, conditioned on their behavioral baseline. For each $(q,u)$ pair, the modality selector produces a distribution over $\mathcal{M}$:

$$\mathbf{m}(q,u) = \text{softmax}\!\left(f_\theta\!\left([\mathbf{q};\ \text{DTS}(u,\tau)]\right)\right) \in \Delta^6$$

where $f_\theta: \mathbb{R}^{d_q+5d+6} \rightarrow \mathbb{R}^7$ is a three-layer MLP with hidden dimensions 256 and 64. The output is a soft assignment; in practice, one or two filters dominate for any given $(q,u)$ pair.
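The NumPy sketch below gives the selector's forward pass with the stated shapes: an input of size $d_q + 5d + 6$, hidden widths 256 and 64, and seven outputs passed through a softmax. The ReLU activations, the He-style random initialization, and the example sizes $d_q = 32$ and $d = 8$ are assumptions. The weights are untrained, so the particular filters it picks are arbitrary.

```python
import numpy as np

FILTERS = ["Proportional", "Inverse", "Differential", "Recurrent",
           "Comparative", "Sequential", "Collective"]

class ModalitySelector:
    """f_theta: R^{d_q + 5d + 6} -> R^7, followed by a softmax over filters."""

    def __init__(self, d_q: int, d: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        dims = [d_q + 5 * d + 6, 256, 64, len(FILTERS)]
        # He-style initialization (an assumption) for each of the three layers.
        self.layers = [(rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n)), np.zeros(n))
                       for m, n in zip(dims[:-1], dims[1:])]

    def __call__(self, q_vec: np.ndarray, dts_vec: np.ndarray) -> np.ndarray:
        h = np.concatenate([q_vec, dts_vec])          # [q ; DTS(u, tau)]
        for i, (W, b) in enumerate(self.layers):
            h = h @ W + b
            if i < len(self.layers) - 1:
                h = np.maximum(h, 0.0)                # ReLU on hidden layers only
        z = np.exp(h - h.max())                       # numerically stable softmax
        return z / z.sum()                            # a point in the simplex Delta^6

selector = ModalitySelector(d_q=32, d=8)
rng = np.random.default_rng(1)
q_vec = rng.normal(size=32)                           # stand-in query embedding
dts_a = rng.normal(size=5 * 8 + 6)                    # stand-in DTS, individual a
dts_b = rng.normal(size=5 * 8 + 6)                    # stand-in DTS, individual b
m_a, m_b = selector(q_vec, dts_a), selector(q_vec, dts_b)
print(FILTERS[int(m_a.argmax())], FILTERS[int(m_b.argmax())])
```

The DTS is concatenated into the input, so the same query vector produces different distributions for different individuals.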

To illustrate, consider the query “Is the team on top of the security vulnerabilities flagged this sprint?” Three developers resolve it differently:

- A developer who owns the security tooling and regularly reviews CVE reports triggers a Proportional filter, because their attention is the signal.
- A developer who used to own security reviews, but whose DTS shows a sharp drop in security artifact interaction this sprint, triggers a Differential filter. Here the deviation from their own baseline is the signal.
- A developer who rarely touches security at all triggers an Inverse filter, because their absence of engagement is the signal.

Same query; three different filters; three different answers surfaced.
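The three cases map onto three different scoring rules. The functions below are hypothetical, illustrative definitions; the paper's exact $I_k$ are not given in this section. They assume attention is summarized as per-artifact interaction counts in the current window. The Differential filter also uses the individual's baseline counts.

```python
def proportional(current):
    """High absolute attention: each artifact's share of current-window attention."""
    total = sum(current.values()) or 1.0
    return {a: c / total for a, c in current.items()}

def inverse(current):
    """Notable absence: artifacts the individual barely touches score highest."""
    return {a: 1.0 / (1.0 + c) for a, c in current.items()}

def differential(current, baseline):
    """Deviation from the individual's own baseline, relative to that baseline."""
    return {a: abs(c - baseline.get(a, 0.0)) / (1.0 + baseline.get(a, 0.0))
            for a, c in current.items()}

# Interaction counts with two artifacts this sprint (hypothetical numbers).
owner = {"cve_report": 30, "design_doc": 10}        # owns security tooling
former_now = {"cve_report": 2, "design_doc": 5}     # used to own security reviews
former_base = {"cve_report": 20, "design_doc": 5}   # ...and this is their baseline
outsider = {"cve_report": 0, "design_doc": 12}      # rarely touches security

print(proportional(owner))                      # {'cve_report': 0.75, 'design_doc': 0.25}
print(differential(former_now, former_base))    # cve_report ~0.857, design_doc 0.0
print(inverse(outsider))                        # cve_report 1.0, design_doc ~0.077
```

In each case the CVE report scores highest, but for a different reason: heavy attention, a drop from baseline, or absence.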

#### 3.2.4. Training

Training follows a two-phase strategy. In *Phase 1*, a rule-based classifier $f_{\text{rule}}$ assigns modalities based on linguistic cues in the query. This covers the majority of queries, for which the correct modality is unambiguous regardless of the DTS.

In *Phase 2*, cases where $f_{\text{rule}}$ produces ambiguous or conflicting signals are collected and labeled by domain experts, yielding a training set $\{(\mathbf{q}_i, \text{DTS}(u_i,\tau_i), k_i^*)\}$. The MLP $f_\theta$ is trained on this harder subset via cross-entropy loss:

$$\mathcal{L}(\theta) = -\sum_i \log m_{k_i^*}(q_i, u_i)$$

This strategy ensures that $f_\theta$ allocates capacity to the cases where the DTS is genuinely discriminative: where role, behavioral baseline, or a recent shift in behavior determines the correct filter.
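The two-phase strategy can be sketched as follows. A keyword-based $f_{\text{rule}}$ abstains when cues are missing or conflict, and the cross-entropy loss is applied to the expert-labeled remainder. The cue lists are hypothetical placeholders; the paper does not publish its rules.

```python
import math
from typing import Optional

# Hypothetical linguistic cues per filter; placeholders, not the paper's rules.
CUES = {
    "Inverse": ["who hasn't", "ignored", "neglected"],
    "Differential": ["changed", "unusual", "dropped off"],
    "Sequential": ["out of order", "skipped"],
    "Collective": ["the team", "everyone"],
}

def f_rule(query: str) -> Optional[str]:
    """Phase 1: return a filter if exactly one filter's cues fire; else None (defer)."""
    text = query.lower()
    fired = [name for name, cues in CUES.items() if any(c in text for c in cues)]
    return fired[0] if len(fired) == 1 else None

def cross_entropy(probs, labels) -> float:
    """Phase 2 loss: L(theta) = -sum_i log m_{k_i*}(q_i, u_i).

    `probs[i]` is the selector's distribution for pair i; `labels[i]` is k_i*.
    """
    return -sum(math.log(p[k]) for p, k in zip(probs, labels))

queries = ["Who hasn't reviewed the pricing doc?",
           "Has the team's behavior changed this sprint?"]
# Queries with no cue or with conflicting cues go to domain experts for labeling.
to_label = [q for q in queries if f_rule(q) is None]
print(f_rule(queries[0]), to_label)   # Inverse ["Has the team's behavior changed this sprint?"]
print(round(cross_entropy([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]], [0, 1]), 4))  # 0.5798
```

The second query fires both a Collective cue and a Differential cue. That is exactly the kind of conflict Phase 2 is meant to resolve using the individual's DTS.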

### 3.3. Stage 3: Artifact Retrieval and Importance Weighting

For each $u \in \mathcal{U}_q$, artifacts are ranked by a weight that combines the behavioral attention signal with content relevance.

**Attention-based importance.** Each modality $M_k$ defines an importance function $I_k: \mathcal{A} \rightarrow \mathbb{R}_{\geq 0}$. Given the soft modality vector $\mathbf{m}(q,u)$, the blended attention importance is:

$$I^{\text{attn}}(a_j,q,u) = \sum_{k=1}^{7} m_k(q,u) \cdot I_k(a_j,\ u,\ \tau)$$

where each $I_k$ scores artifacts using the individual’s behavioral data under filter $M_k$.
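The blend is a convex combination of the seven filter scores, weighted by the selector's output. The NumPy sketch below assumes the per-filter scores $I_k(a_j, u, \tau)$ for $n$ artifacts are already stacked into a $7 \times n$ matrix, with rows ordered as in the paper's filter list. The numbers are made up for illustration.

```python
import numpy as np

def blended_attention(m: np.ndarray, filter_scores: np.ndarray) -> np.ndarray:
    """I_attn(a_j, q, u) = sum_k m_k(q, u) * I_k(a_j, u, tau), for every artifact j.

    m: shape (7,), the selector's distribution over filters.
    filter_scores: shape (7, n_artifacts), non-negative I_k scores.
    """
    if m.shape != (filter_scores.shape[0],):
        raise ValueError("m must have one weight per filter row")
    return m @ filter_scores

m = np.array([0.7, 0.0, 0.3, 0.0, 0.0, 0.0, 0.0])  # mostly Proportional, some Differential
scores = np.zeros((7, 3))                          # rows: filters M_1..M_7; cols: artifacts
scores[0] = [0.6, 0.3, 0.1]                        # Proportional scores (hypothetical)
scores[2] = [0.0, 0.9, 0.2]                        # Differential scores (hypothetical)
print(blended_attention(m, scores))                # approximately [0.42 0.48 0.13]
```

The Differential contribution lifts the second artifact above the first, even though the Proportional filter alone would rank it lower. Because $\mathbf{m}$ sums to one, $I^{\text{attn}}$ never exceeds the largest single-filter score.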

**Content-based relevance.** Each artifact is also scored for relevance to $q$ by combining lexical and semantic signals, producing a content relevance score $I^{\text{content}}(a_j,q)$.
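This section does not say how the lexical and semantic signals are combined. One common hybrid, shown here purely as an assumption, is a weighted sum of a lexical overlap score and the cosine similarity of embeddings. The cosine term is clipped at zero so the result stays non-negative for the multiplicative weight that follows. Jaccard token overlap stands in for a proper lexical scorer such as BM25, and the embedding vectors are placeholders for any sentence encoder.

```python
import math
import re

def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical(query: str, artifact_text: str) -> float:
    """Jaccard overlap of token sets (a stand-in for BM25 or similar)."""
    q, a = tokens(query), tokens(artifact_text)
    return len(q & a) / len(q | a) if q | a else 0.0

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0

def content_relevance(query: str, artifact_text: str, q_emb: list[float],
                      a_emb: list[float], alpha: float = 0.5) -> float:
    """Hypothetical I_content: alpha * lexical + (1 - alpha) * max(cosine, 0)."""
    semantic = max(cosine(q_emb, a_emb), 0.0)
    return alpha * lexical(query, artifact_text) + (1 - alpha) * semantic

print(lexical("security vulnerabilities", "open security vulnerabilities list"))  # 0.5
print(content_relevance("security vulnerabilities", "open security vulnerabilities list",
                        [1.0, 0.0], [1.0, 0.0]))                                    # 0.75
```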

Combined weight. The final importance weight combines both signals multiplicatively:

$$w(a_j, q, u) = I^{\text{attn}}(a_j, q, u) \cdot I^{\text{content}}(a_j, q)$$

The multiplicative formulation ensures that an artifact must be both relevant to the query *and* meaningfully attended to: if either factor is zero, so is the weight. Artifacts are ranked by $w(a_j, q, u)$, and the top-$K$ are selected per individual.
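The ranking step can be sketched as follows, assuming the two scores for one individual are already available as dictionaries keyed by artifact identifier (artifacts with no content score are treated as zero). The artifact names and scores are hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

def rank_artifacts(attn: Dict[str, float], content: Dict[str, float],
                   k: int) -> List[Tuple[str, float]]:
    """Top-k artifacts for one individual by w = I^attn * I^content."""
    weights = [(a, attn[a] * content.get(a, 0.0)) for a in attn]
    return heapq.nlargest(k, weights, key=lambda pair: pair[1])

# Hypothetical scores for one seller.
attn = {"sow_v2.docx": 0.9, "lunch_invite.ics": 0.95,
        "pricing.xlsx": 0.5, "newsletter.eml": 0.0}
content = {"sow_v2.docx": 0.8, "lunch_invite.ics": 0.0,
           "pricing.xlsx": 0.9, "newsletter.eml": 0.99}
top = rank_artifacts(attn, content, k=2)
```

The heavily attended calendar invite and the highly query-similar newsletter both receive zero weight because one factor vanishes for each; an additive combination would instead give them sizable weights (0.95 and 0.99).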

### 3.4. Stage 4: Agentic Synthesis

Tool selection. X-SYNTH is embedded within an agentic harness in which the attention modality selector is one of several available tools. The attention tool is invoked when the query seeks evidence about human behavior. When invoked, it returns, for each $u \in \mathcal{U}_q$, the ranked artifact set $\{(a_j, w(a_j, q, u))\}_{j=1}^{K}$ along with modality annotations explaining why each artifact was surfaced.

Synthesis. Once evidence has been gathered across all $u \in \mathcal{U}_q$, a synthesis step produces the response. Let $\mathcal{R}_u$ denote the retrieved evidence set for individual $u$. The synthesizer receives $\mathcal{R} = \bigcup_{u \in \mathcal{U}_q} \mathcal{R}_u$ and produces:

$$\hat{y} = \text{Synthesize}(q, \mathcal{R})$$

where $\text{Synthesize}(\cdot)$ is an LLM prompted with the query, the ranked evidence, and the modality annotations. The annotations are critical: they give the synthesizer an explicit account of *why* each artifact was retrieved, grounding the response and reducing hallucination.
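The sketch below shows one way the union $\mathcal{R}$ and the synthesizer input could be assembled. The `Evidence` record, the prompt wording, and the example annotations are all hypothetical; the LLM call itself is omitted because the paper does not specify the model interface.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Evidence:
    user: str        # individual u in U_q
    artifact: str    # artifact identifier a_j
    weight: float    # w(a_j, q, u) from Stage 3
    annotation: str  # modality annotation: why the artifact was surfaced

def merge_evidence(per_user: Dict[str, List[Evidence]]) -> List[Evidence]:
    """R = union of R_u over u in U_q, deduplicated and sorted by descending weight."""
    union = {ev for evidence in per_user.values() for ev in evidence}
    return sorted(union, key=lambda ev: (-ev.weight, ev.user, ev.artifact))

def build_synthesis_prompt(query: str, evidence: List[Evidence]) -> str:
    """Assemble the input to Synthesize(q, R): query, ranked evidence, annotations."""
    lines = [f"Query: {query}", "Evidence (ranked; reason each item was surfaced):"]
    for rank, ev in enumerate(evidence, start=1):
        lines.append(f"{rank}. [{ev.user}] {ev.artifact} (w={ev.weight:.2f}): {ev.annotation}")
    lines.append("Answer using only the evidence above and cite item numbers.")
    return "\n".join(lines)

# Hypothetical per-individual evidence sets R_u.
R_u = {
    "kev": [Evidence("kev", "email: Create FZ for PO", 0.81,
                     "Recurrent: repeated requests from this sender")],
    "sarah.chen": [Evidence("sarah.chen", "CRM: XGS Private Ltd", 0.64,
                            "Proportional: sustained focus on this account")],
}
prompt = build_synthesis_prompt("Any new opportunities to file?", merge_evidence(R_u))
```

Keeping the annotation next to each item is what lets the synthesizer state why a piece of evidence matters, rather than only what it contains.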

Feedback loop. Terminal feedback (a user satisfaction signal $s \in \{0,1\}$) is insufficient for training the modality selector directly, because a poor response may stem from modality misclassification, retrieval failure, or synthesis error. X-SYNTH therefore maintains intermediate signals to decompose feedback into stage-level attribution probabilities:

$$P(\text{failure at stage } k \mid s=0, \mathcal{R}, \hat{y})$$

The modality selector is updated only when failure is attributed to Stage 2 with sufficient confidence, that is, when this probability for Stage 2 is high enough. This closes the improvement loop without polluting the training signal with synthesis-layer errors.
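The gating rule can be sketched as a small predicate. How the stage posterior is estimated is not described here, so the sketch takes it as given; the 0.8 confidence threshold is a hypothetical value, and the posterior is renormalized defensively in case the inputs do not sum to one.

```python
from typing import Dict

MODALITY_STAGE = 2  # Stage 2: individualized modality selection

def should_update_selector(s: int, stage_posterior: Dict[int, float],
                           threshold: float = 0.8) -> bool:
    """Gate selector updates on P(failure at stage 2 | s=0, R, y_hat) >= threshold."""
    if s not in (0, 1):
        raise ValueError("satisfaction signal s must be 0 or 1")
    if s == 1:
        return False  # a satisfied user gives no failure to attribute
    total = sum(stage_posterior.values())
    if total <= 0:
        return False
    return stage_posterior.get(MODALITY_STAGE, 0.0) / total >= threshold

print(should_update_selector(0, {1: 0.05, 2: 0.85, 3: 0.05, 4: 0.05}))
```

Failures attributed mainly to retrieval (Stage 3) or synthesis (Stage 4) leave the selector untouched, which is the point of decomposing the terminal signal.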

## 4. Evaluation

### 4.1. Dataset

We evaluate on a dataset of real-world digital interaction traces collected from five knowledge workers at an anonymized Fortune 500 enterprise, spanning the roles of Account Executive, Sales Director, and Client Resource Manager. Data collection covered 25 working days per participant, yielding 125 seller journeys in total.

Each interaction record is a structured tuple comprising participant ID, application accessed, UTC timestamp, screen content (title, key UI attributes, and on-screen text), user action, and dwell time. A single knowledge worker generates on the order of 1,000 interactions per day, each carrying up to 1,600 LLM tokens of context. This yields approximately 1.6M tokens per worker per day; across five workers and 25 days, the full dataset comprises roughly 200M tokens of raw interaction data. All participant data was anonymized prior to analysis in accordance with enterprise data-handling agreements.
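The dataset-size figure follows directly from the per-interaction numbers above. Both inputs are approximate (on the order of 1,000 interactions per day, up to 1,600 tokens each), so the result is an order-of-magnitude estimate:

$$
\underbrace{10^{3}}_{\text{interactions/day}} \times \underbrace{1.6\times 10^{3}}_{\text{tokens/interaction}} = 1.6\times 10^{6}\ \text{tokens per worker-day},
\qquad
1.6\times 10^{6} \times \underbrace{5}_{\text{workers}} \times \underbrace{25}_{\text{days}} = 2\times 10^{8}\ \text{tokens}.
$$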

### 4.2. Benchmark Construction

The core evaluation task is: given a sequence of raw digital interactions, can a system detect and synthesize a latent business opportunity that a seller would legitimately file in their sales platform?

To construct ground-truth labels without manual annotation, we exploit a naturally occurring signal in the data: sellers routinely file new business opportunities directly within their seller platform, and these filing events appear as interactions in the raw log. Benchmark construction proceeds as follows. We identify every instance in which a seller filed a new business opportunity (the *pivot point*) and reconstruct the preceding interaction sequence as the system input. The pivot point itself, along with any directly associated filing-screen interactions, is removed from the input; only interactions that preceded and plausibly informed the opportunity are retained, such as emails received, Teams conversations, Excel updates, and SOW documents reviewed in Word. The removed filing event and its structured content serve as the ground-truth target.
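A minimal sketch of the pivot-based split, under several assumptions not stated in the text: each pivot's input window starts after the previous pivot, a pivot is recognized by a hypothetical `file_opportunity` action label, filing-screen interactions carry a boolean flag, and trailing work with no filing becomes a negative instance. The "plausibly informed" relevance judgment is not modeled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

@dataclass(frozen=True)
class Interaction:
    ts: int                      # UTC timestamp, e.g. Unix seconds
    app: str                     # application accessed
    action: str                  # "file_opportunity" marks a pivot (assumed label)
    filing_screen: bool = False  # True for filing-screen interactions tied to a pivot

Instance = Tuple[List[Interaction], Optional[Interaction]]

def build_instances(log: List[Interaction]) -> List[Instance]:
    """Split one seller's log into (input, target) pairs around each pivot point."""
    ordered = sorted(log, key=lambda it: it.ts)
    instances: List[Instance] = []
    context: List[Interaction] = []
    for it in ordered:
        if it.action == "file_opportunity":
            instances.append((context, it))  # pivot removed from input, kept as target
            context = []
        elif not it.filing_screen:
            context.append(it)  # filing-screen interactions never enter the input
    if context:
        instances.append((context, None))  # negative: routine work, nothing filed
    return instances

log = [
    Interaction(1, "Outlook", "read"),
    Interaction(2, "Teams", "read"),
    Interaction(3, "SellerPlatform", "open_form", filing_screen=True),
    Interaction(4, "SellerPlatform", "file_opportunity", filing_screen=True),
    Interaction(5, "Excel", "edit"),
    Interaction(6, "Word", "view"),
]
instances = build_instances(log)
```

Because the target is removed before the input is formed, no model can read the answer off the filing screen, which is what makes the labels usable without manual annotation.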

This yields a benchmark of 302 instances: 210 positive instances, in which a genuine business opportunity was filed, and 92 negative instances, in which the seller completed routine work with no opportunity filed.

### 4.3. Metrics

We report three primary metrics, motivated directly by business impact:

True Lead Rate (TLR). The fraction of positive instances in which the system surfaces a recommendation that matches the ground-truth opportunity. This is the primary measure of business value: correctly identified leads are revenue opportunities that the system has made actionable for the seller.

Missed Lead Rate (MLR). The fraction of positive instances in which the system fails to surface a recommendation corresponding to the ground-truth opportunity. A missed lead represents lost revenue potential: a signal present in the data that the system did not act on. Over positive instances, TLR and MLR are complementary: $\text{MLR} = 1 - \text{TLR}$.

False Lead Rate (FLR). The fraction of instances in which the system recommends a business opportunity that does not correspond to a valid ground-truth filing. Each false lead imposes a direct cost on the seller, who must review and discard it manually.
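TLR and MLR share the 210-positive denominator stated in the Table 3 caption. The FLR denominator is not stated, so the sketch below assumes false leads divided by all proposed leads (true plus false). Under that assumption, the X-SYNTH + Claude Opus 4.6 counts reported for Figure 3 (130 true, 80 missed, 30 false) give TLR ≈ 61.9% and FLR = 18.75%, in line with the reported 18.8%.

```python
from typing import Dict

def lead_metrics(true_leads: int, missed_leads: int, false_leads: int) -> Dict[str, float]:
    """TLR and MLR over positive instances; FLR over all proposed leads (assumed)."""
    positives = true_leads + missed_leads
    proposed = true_leads + false_leads
    return {
        "TLR": true_leads / positives,
        "MLR": missed_leads / positives,  # equals 1 - TLR
        "FLR": false_leads / proposed if proposed else 0.0,
    }

xsynth = lead_metrics(130, 80, 30)  # X-SYNTH + Claude Opus 4.6
direct = lead_metrics(20, 190, 50)  # Claude Opus 4.6 direct
improvement = xsynth["TLR"] / direct["TLR"]  # the reported 6.5x TLR ratio
```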

Evaluation is automated. For each system-proposed lead, we compute embedding similarity between the proposed opportunity description and the ground-truth filing to assess semantic alignment. For interaction attribution (whether the system correctly identified the underlying evidence that led to the opportunity), we score by key-attribute match: a proposed evidence set receives full credit if its key attributes align with those captured in the ground-truth dataset for that instance. This attribution metric applies naturally to systems such as X-SYNTH that produce structured evidence traces; for baseline models that operate over the raw interaction sequence without explicit attribution, evaluation defaults to the embedding similarity score alone. A human verification pass is applied to borderline embedding-similarity cases to confirm match validity.
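The matching logic can be sketched as below. The embedding model, the accept/reject thresholds (0.85 and 0.70), and the key-attribute names are hypothetical; the text does not specify a partial-credit scheme for attribution, so the sketch awards zero unless every key attribute agrees.

```python
import math
from typing import Dict, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def judge_lead(proposed: Sequence[float], truth: Sequence[float],
               accept: float = 0.85, reject: float = 0.70) -> str:
    """Embedding match with a borderline band routed to human verification."""
    sim = cosine(proposed, truth)
    if sim >= accept:
        return "match"
    if sim < reject:
        return "no_match"
    return "human_review"

def attribution_credit(proposed: Dict[str, str], truth: Dict[str, str],
                       key_attrs: Sequence[str]) -> float:
    """Full credit only if every key attribute agrees with the ground truth."""
    return 1.0 if all(proposed.get(k) == truth.get(k) for k in key_attrs) else 0.0

print(judge_lead([1.0, 1.0], [1.0, 0.0]))  # cosine of about 0.707 falls in the band
```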

### 4.4. Baselines

We compare two direct-inference baselines against two X-SYNTH configurations:

- Small LLM (direct). A smaller language model (GPT-4o-mini) given the raw interaction sequence directly and prompted to identify and synthesize business opportunities, without additional scaffolding.
- Frontier LLM (direct). Claude Opus 4.6, prompted identically to the small-LLM baseline, serving as a strong upper bound on unstructured LLM capability over the full interaction context.
- X-SYNTH + Small LLM. Our proposed agentic framework paired with the smaller underlying model.
- X-SYNTH + Frontier LLM. X-SYNTH paired with Claude Opus 4.6 as the reasoning backbone.

This design isolates the contribution of the X-SYNTH framework from raw model capability and characterizes the performance-cost tradeoff across model scales. Beyond performance, we report inference cost for all configurations, measured in tokens processed and estimated API cost per 125 seller journeys, to assess the economic viability of each approach at enterprise scale. We defer detailed cost and performance analysis to §[5](https://arxiv.org/html/2605.15505#S5).

### 4.5. Evaluation Protocol

For each benchmark instance, the system receives the full sequence of preceding digital interactions as input and must produce zero or more proposed business-opportunity recommendations. No system has access to the held-out ground-truth filing during inference. Evaluation runs as a fully automated pipeline, with human spot-checks applied to ambiguous similarity scores as described above.

## 5. Results

### 5.1. Quantitative Results

Table [3](https://arxiv.org/html/2605.15505#S5.T3) summarizes performance across all four configurations on the 302-instance benchmark. X-SYNTH improves both small and frontier models substantially: paired with GPT-4o-mini, it raises TLR from 0% to 57.1% while reducing false leads from 40 to 6; paired with Claude Opus 4.6, it raises TLR from 9.5% to 61.9% (a 6.5× improvement) while reducing false leads from 50 to 30. The framework's contribution is consistent across model scales, so the pattern is not an artifact of raw model capability.

Table 3. Quantitative results on the 302-instance benchmark (210 positive, 92 negative). True/Missed/False are raw counts; TLR and MLR are computed over 210 positive instances. Small LLM = GPT-4o-mini; Frontier LLM = Claude Opus 4.6.

| Configuration | True | Missed | False | TLR | MLR |
|---|---|---|---|---|---|
| Small LLM (direct) | 0 | 210 | 40 | 0.0% | 100.0% |
| X-SYNTH + Small LLM | 120 | 90 | 6 | 57.1% | 42.9% |
| Frontier LLM (direct) | 20 | 190 | 50 | 9.5% | 90.5% |
| X-SYNTH + Frontier LLM | 130 | 80 | 30 | 61.9% | 38.1% |

Figures [2](https://arxiv.org/html/2605.15505#S5.F2) and [3](https://arxiv.org/html/2605.15505#S5.F3) show the same results visually. In the small-LLM case, the unaided model finds zero true leads: the task is entirely out of its reach without structured behavioral context, and X-SYNTH turns it into a functional system. In the frontier-LLM case, the baseline is non-zero but still misses 190 of 210 real opportunities; X-SYNTH recovers 110 of those.

![Figure 2](https://arxiv.org/html/2605.15505v1/figures/Fig6-1.png)

Figure 2. Lead automation performance across model configurations (small LLM). X-SYNTH + GPT-4o-mini vs. GPT-4o-mini direct. True leads rise from 0 to 120; missed leads fall from 210 to 90; false leads fall from 40 to 6.

![Figure 3](https://arxiv.org/html/2605.15505v1/figures/Fig6-2.png)

Figure 3. Lead automation performance across model configurations (frontier LLM). X-SYNTH + Claude Opus 4.6 vs. Claude Opus 4.6 direct. True leads rise from 20 to 130 (6.5×); missed leads fall from 190 to 80; false leads fall from 50 to 30.

### 5.2. Qualitative Analysis: The Enterprise Vocabulary Gap

The quantitative gap between baseline and X-SYNTH-paired models is driven in large part by enterprise-specific vocabulary that no general training corpus encodes. Figure [4](https://arxiv.org/html/2605.15505#S5.F4) shows a representative trace for account XGS Private Ltd.

At 17:38, seller Kev receives an email with subject "Create FZ for PO" attaching a contract PDF. Both abbreviations, FZ and PO, are organizational terms whose meaning cannot be inferred from their surface form.

Figure [5](https://arxiv.org/html/2605.15505#S5.F5) shows Claude Opus 4.6's reasoning: it correctly understands the account context, the CRM lookup, and the email thread, but flags FZ and PO as unknown ("Is it a document type? A system? A process step?"). Unable to interpret the request, it outputs: *No new leads or business opportunities are to be filed.*

Figure [6](https://arxiv.org/html/2605.15505#S5.F6) shows X-SYNTH's reasoning on the same trace. From 47 prior interactions in the seller's Digital Twin Signature, X-SYNTH infers that FZ = "Formal Zone," the seller-platform term for filing a new business opportunity record. From prior communications with Kumar, it infers that PO = purchase order, and that the deal closed through an account-managed, relationship-driven sale before it was tracked on the platform. X-SYNTH outputs: *File a new opportunity in Formal Zone (FZ) for the PO with the details mentioned in the attached contract.*

| Time (12/3/26) | Event | Content |
|---|---|---|
| 17:30 | outlook: opened | Subject: "TRV QE forecast" → appointment created |
| 17:31 | outlook: read | From: sarah.chen@acme.corp; To: kev@acme.corp; Subject: updates?; Body: "What is going on about XGS Private Ltd?" |
| 17:32 | CRM: open | ⟨homepage⟩ |
| 17:33 | CRM: search | Searched: "XGS Private Ltd" |
| 17:34 | CRM: view | Viewed: XGS Private Ltd \|\| Opportunities; Net TCV: 16k$ · Primary lead: Chandra |
| 17:36 | outlook: view | Meeting reminder: Q2 forecast sync with leadership |
| 17:37 | outlook: close | Meeting reminder: Q2 forecast sync with leadership |
| 17:38 | outlook: read | From: kumar@acme.corp; To: kev@acme.corp; Subject: Create FZ for PO; Body: PFA details; Attachment: CP01000302402.pdf |
| 17:43 | outlook: write | From: kev@acme.corp; To: sarah.chen@acme.corp; Subject: updates?; Body: XGS Private is currently at a net TCV of 16k$. I'll follow up with Chandra to see if any new RFPs are out. |

Figure 4. Interaction trace for the XGS Private Ltd example. Nine consecutive interaction events are shown in chronological order. The pivotal event (orange border, 17:38) is an email from kumar@acme.corp with subject "Create FZ for PO": two abbreviations whose meaning is opaque without organizational context.

![Figure 5](https://arxiv.org/html/2605.15505v1/figures/Fig6-3-b.png)

Figure 5. Claude Opus 4.6 reasoning: account context and email thread are understood, but FZ and PO are unknown. Output: no new leads.

![Figure 6](https://arxiv.org/html/2605.15505v1/figures/Fig-6-3-c.png)

Figure 6. X-SYNTH reasoning: FZ inferred as "Formal Zone" from 47 prior interactions; PO inferred as purchase order from prior communications with Kumar. Output: file a new opportunity in Formal Zone.

### 5.3. Scale: Why Direct Inference Fails

Figure [7](https://arxiv.org/html/2605.15505#S5.F7) illustrates a structural constraint on direct inference: the token volume a single seller generates per day ranges from 120k to 2.5M tokens across a representative four-day window. Passing a full day's interaction stream to a frontier LLM is not feasible at enterprise scale: the cost is prohibitive, and the volume routinely exceeds practical context limits. This is not an edge case; it is the default operating condition for knowledge workers whose role spans multiple accounts, systems, and communication channels simultaneously. X-SYNTH's attention-weighted selective retrieval (Stage 3) is what makes evaluation over real interaction data tractable: rather than ingesting the full stream, it selects, per individual, the top-$K$ artifacts whose combined attention signal and content relevance are highest for the query.

![Figure 7](https://arxiv.org/html/2605.15505v1/figures/Fig-6-4.png)

Figure 7. Token volume per seller per day across a four-day window (March 10–13, 2026). Individual daily volumes range from 120k to 2.5M tokens; direct inference over the full interaction stream is not economically or technically feasible without selective retrieval.

### 5.4. Multi-Day Context Integration

The third failure mode of direct inference is temporal: a single-day context window misses behavioral signals that only become interpretable across multiple days. Tables [4](https://arxiv.org/html/2605.15505#S5.T4) and [5](https://arxiv.org/html/2605.15505#S5.T5) show a paired example for seller s.chen and account AC (Acme Corp).

Table [4](https://arxiv.org/html/2605.15505#S5.T4) shows the five March 13 interactions that Claude Opus 4.6 receives as input. The model surfaces a Globex renewal follow-up (a genuine signal from the Slack message at 15:10) but produces no recommendation for an Acme Corp streaming opportunity. The interactions are coherent within the single day, but the signal is too weak: one Helix ticket, one Vault document review, and one Slack message do not, in isolation, establish a clear opportunity.

Table 4. A single day of interaction data (March 13) as consumed by Claude Opus 4.6. The model surfaces a Globex renewal follow-up but produces no recommendation for the Acme Corp (AC) streaming opportunity: the signal is present, but the single-day window lacks the behavioral accumulation needed to infer it.

Table [5](https://arxiv.org/html/2605.15505#S5.T5) shows the ten interaction snippets that X-SYNTH selects across four days (March 10–13). The selection is driven by the DTS-conditioned attention filter: X-SYNTH identifies that s.chen's attention has been accumulating on AC-related artifacts across multiple applications over multiple days, and retrieves the artifacts that account for the most behavioral signal. The result is a lead trace that no single-day window could assemble.

Table 5. X-SYNTH pulls 10 interaction snippets across four days (March 10–13) from Gmail, Slack, Helix, Lens, Vault, and Meridian, each weak or ambiguous alone, and integrates them into a single lead trace. The trace establishes that AC has no streaming-module license and is actively working around the gap; that RiverFlow is pitching into the same gap; and that s.chen has completed discovery, reviewed contract sections, and sized the deal, but has not filed. X-SYNTH recommends: file a new streaming-module expansion opportunity for AC, flagged as active competition.

The five March 13 interactions in Table [4](https://arxiv.org/html/2605.15505#S5.T4) are a strict subset of the ten interactions in Table [5](https://arxiv.org/html/2605.15505#S5.T5). Claude Opus 4.6 had access to the same March 13 data and missed the opportunity; X-SYNTH extended the retrieval window to four days, surfacing five additional artifacts that together made the AC streaming gap, RiverFlow's presence, and the seller's accumulated deal activity legible as a filing-ready opportunity. The miss was caused not by insufficient raw data but by the absence of a mechanism to select the right temporal window.

### 5.5. Summary: What Drives the Improvement

The quantitative gap between X-SYNTH and direct inference is not attributable to a single factor. Three structural properties of enterprise knowledge work, each invisible to a model operating over raw interaction streams, collectively account for the improvement.

Enterprise context and priors. General-purpose models lack the organizational priors needed to interpret enterprise signals correctly. The XGS Private Ltd example illustrates the failure mode: two opaque abbreviations are enough to suppress an otherwise detectable opportunity. Correct interpretation requires priors built from the seller's own interaction history, and these priors must be established before inference, because each new interaction implicitly depends on what preceded it. X-SYNTH grounds inference in those accumulated priors rather than in model training data alone.

Scale and selective retrieval. A single seller generates up to 2.5M tokens of interaction data per day, a volume that makes direct inference over the full stream economically and technically infeasible. X-SYNTH's attention-weighted retrieval collapses this to a tractable, high-signal subset without manual filtering.

Cross-temporal and cross-individual integration. Real sales opportunities accumulate as behavioral signals across days and across team members, not within a single session or a single actor's view. The Acme Corp example shows that five individually ambiguous March 13 interactions become a filing-ready lead trace only when extended with artifacts from the preceding three days and from collaborators who share the account context. No single-day, single-individual context window can recover this signal.

These three properties are not independent failure modes; they compound. An opportunity that requires cross-day, cross-individual context, involves organizational vocabulary, and sits within a high-volume interaction stream will be missed by direct inference for all three reasons simultaneously. X-SYNTH addresses each structurally, which is why the improvement is consistent across both model scales tested.

The common enabler across all three properties is the DTS-conditioned attention filter. Rather than applying a fixed retrieval strategy uniformly, X-SYNTH conditions filter selection jointly on the query and each individual's Digital Twin Signature: a rolling behavioral profile encoding domain attention, rhythm, responsibility, and recent divergence from the individual's own baseline. This single architectural decision is what makes each of the three failure modes recoverable:

Enterprise context and priors. A general model has no organizational memory; the DTS does. Because it is built continuously from the seller's own interaction history, it encodes the vocabulary, relationships, and conventions specific to that individual and that organization, and it encodes them before any new query arrives. When a new interaction is observed, the DTS-conditioned filter interprets it against that accumulated context rather than against generic training priors. The organizational knowledge is not retrieved; it is already embedded in the relevance signal.

Scale and selective retrieval. Without a behavioral prior, a retrieval system has no principled basis for distinguishing a high-signal artifact from noise at enterprise token volumes. The DTS provides that prior: the Proportional, Differential, and Recurrent filters each define importance relative to what is normal for this individual, so the top-$K$ selection is not keyword matching against the full stream but a behaviorally grounded compression that discards noise by definition.

Cross-temporal and cross-individual integration. A single-day, single-actor window misses signals that only become legible across time and across a team. The DTS encodes this structure directly: the short-vs-long divergence component flags when recent behavior departs from the two-week baseline, making multi-day accumulation observable, and the Collective filter aggregates attention across the scoped cohort $\mathcal{U}_q$, making cross-individual signals (consensus focus, individual outliers) first-class inputs to retrieval. The temporal and team-level context is not reconstructed after the fact; it is built into the relevance computation.
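The two mechanisms in the last paragraph can be illustrated with a short sketch. Several choices here are ours rather than the paper's: attention is summarized as a distribution over accounts, the short-vs-long comparison uses Jensen-Shannon divergence with a hypothetical 0.2 threshold, the short window length is arbitrary, and the Collective stand-in simply averages per-user distributions over the cohort. Only the two-week baseline comes from the text; the example data are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def distribution(domains: List[str]) -> Dict[str, float]:
    """Share of interactions per domain (here, per account)."""
    counts = Counter(domains)
    total = sum(counts.values())
    return {d: c / total for d, c in counts.items()}

def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence in bits (0 = identical, 1 = disjoint)."""
    keys = set(p) | set(q)
    mid = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mid(a: Dict[str, float]) -> float:
        return sum(a[k] * math.log2(a[k] / mid[k]) for k in keys if a.get(k, 0.0) > 0)

    return 0.5 * kl_to_mid(p) + 0.5 * kl_to_mid(q)

def divergence_flag(recent: List[str], baseline: List[str], threshold: float = 0.2) -> bool:
    """Short-vs-long stand-in: flag when recent focus departs from the baseline window."""
    return js_divergence(distribution(recent), distribution(baseline)) > threshold

def collective_attention(cohort: Dict[str, List[str]]) -> Dict[str, float]:
    """Collective stand-in: mean per-user attention distribution over the cohort U_q."""
    if not cohort:
        raise ValueError("cohort U_q is empty")
    dists = [distribution(domains) for domains in cohort.values()]
    keys = set().union(*dists)
    return {k: sum(d.get(k, 0.0) for d in dists) / len(dists) for k in keys}

baseline = ["Globex"] * 8 + ["AC"] * 2  # two-week window
recent = ["AC"] * 4 + ["Globex"]         # last few days
team = {"s.chen": ["AC", "AC"], "kev": ["AC", "XGS"]}
print(divergence_flag(recent, baseline), collective_attention(team))
```

A shift of focus from Globex toward AC is flagged (the divergence here is about 0.28 bits), and the cohort average identifies AC as the consensus focus.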

The improvement is therefore not the sum of three separate fixes. It follows from replacing a query-only relevance function with one conditioned on who is doing the work, what their behavioral history looks like, and how that history situates the current moment.

## 6. Conclusion

Enterprise AI agents are bottlenecked not by reasoning capability but by the absence of a principled relevance signal. This paper presents X-SYNTH, a framework that grounds enterprise context synthesis in observed human attention. Human attention, as operationalized here, is the ordered, digitally observable interaction signature of each worker, encoding not just what was done but the sequence in which it was done and the implicit reward signals embedded within it. Each individual's behavioral baseline is modeled as a Digital Twin Signature (DTS), a compact rolling behavioral profile built from five components that together characterize recency, domain focus, diversity, and sequential structure. A learned router selects among seven qualitatively distinct attention filters (Proportional, Inverse, Differential, Recurrent, Comparative, Sequential, and Collective) per individual and per query, conditioned jointly on the query embedding and the DTS. A four-stage agentic pipeline assembles the result: subject scoping, individualized modality selection, attention-and-content-weighted retrieval, and synthesis with credit-attributed feedback. On a 302-instance benchmark drawn from real Fortune 500 enterprise interaction data, augmenting Claude Opus 4.6 with X-SYNTH raises True Lead Rate from 9.5% to 61.9% (a 6.5× improvement) while reducing False Lead Rate from 90.5% to 18.8%, surfacing 110 additional real leads that the unaugmented model misses entirely. Enterprise context synthesis is not a retrieval problem. It is a relevance problem, and human attention is its most reliable ground truth.

## References

- Eugene Agichtein, Eric Brill, and Susan Dumais. 2006. Improving Web Search Ranking by Incorporating User Behavior Information. In *ACM SIGIR Conference on Research and Development in Information Retrieval*. 19–26.
- Anonymous. 2025. ENTROPHY: Multi-Modal User Interaction Data from Live Enterprise Business Workflows. In *Advances in Neural Information Processing Systems (NeurIPS)*. Workfabric AI / Soroco.
- Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. 2024. Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection. In *International Conference on Learning Representations (ICLR)*.
- Jakob E. Bardram. 2005. Activity-Based Computing: Support for Mobility and Collaboration in Ubiquitous Computing. *Personal and Ubiquitous Computing* 9, 5 (2005), 312–322.
- Paul N. Bennett, Ryen W. White, Wei Chu, Susan T. Dumais, Peter Bailey, Fedor Borisyuk, and Xiaoyuan Cui. 2012. Modeling the Impact of Short- and Long-Term Behavior on Search Personalization. In *ACM SIGIR Conference on Research and Development in Information Retrieval*.
- Sebastian Borgeaud, Arthur Mensch, Jordan Hoffmann, Trevor Cai, Eliza Rutherford, Katie Millican, George van den Driessche, Jean-Baptiste Lespiau, Bogdan Damoc, Aidan Clark, et al. 2022. Improving Language Models by Retrieving from Trillions of Tokens. In *International Conference on Machine Learning (ICML)*.
- Georg Buscher, Andreas Dengel, and Ludger van Elst. 2008. Eye Movements as Implicit Relevance Feedback. In *CHI Extended Abstracts on Human Factors in Computing Systems*.
- Mark Claypool, Phong Le, Makoto Wased, and David Brown. 2001. Implicit Interest Indicators. In *International Conference on Intelligent User Interfaces (IUI)*. 33–40.
- Mary Czerwinski, Eric Horvitz, and Susan Wilhite. 2004. A Diary Study of Task Switching and Interruptions. In *ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)*.
- Zhicheng Dou, Ruihua Song, and Ji-Rong Wen. 2007. A Large-scale Evaluation and Analysis of Personalized Search Strategies. In *International World Wide Web Conference (WWW)*.
- Anton N. Dragunov, Thomas G. Dietterich, Kevin Johnsrude, Matthew McLaughlin, Lida Li, and Jonathan L. Herlocker. 2005. TaskTracer: A Desktop Environment to Support Multi-tasking Knowledge Workers. In *International Conference on Intelligent User Interfaces (IUI)*.
- Susan Dumais, Edward Cutrell, J. J. Cadiz, Gavin Jancke, Raman Sarin, and Daniel C. Robbins. 2003. Stuff I’ve Seen: A System for Personal Information Retrieval and Re-use. In *ACM SIGIR Conference on Research and Development in Information Retrieval*.
- EnterpriseBench Authors. 2025. Can LLMs Help You at Work? A Sandbox for Evaluating LLM Agents in Enterprise Environments. *arXiv preprint arXiv:2510.27287* (2025).
- William Fedus, Barret Zoph, and Noam Shazeer. 2022. Switch Transformer: Scaling to Trillion Parameter Models with Simple and Efficient Sparsity. *Journal of Machine Learning Research* (2022).
- Steve Fox, Kuldeep Karnawat, Mark Mydland, Susan Dumais, and Thomas White. 2005. Evaluating Implicit Measures to Improve Web Search. *ACM Transactions on Information Systems (TOIS)* 23, 2 (2005), 147–168.
- Victor M. González and Gloria Mark. 2004. “Constant, Constant, Multi-tasking Craziness”: Managing Multiple Working Spheres. In *ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)*. 113–120.
- Kelvin Guu, Kenton Lee, Zora Tung, Panupong Pasupat, and Ming-Wei Chang. 2020. REALM: Retrieval-Augmented Language Model Pre-Training. In *International Conference on Machine Learning (ICML)*.
- David Hawking. 2004. Challenges in Enterprise Search. In *Australasian Database Conference (ADC)*.
- HERB Benchmark Authors. 2025. Benchmarking Deep Search over Heterogeneous Enterprise Data. *arXiv preprint arXiv:2506.23139* (2025).
- Balázs Hidasi, Alexandros Karatzoglou, Linas Baltrunas, and Domonkos Tikk. 2016. Session-Based Recommendations with Recurrent Neural Networks. In *International Conference on Learning Representations (ICLR)*.
- Sirui Hong, Mingchen Zhuge, Jonathan Chen, Xiawu Zheng, Yuheng Cheng, Ceyao Zhang, Jinlin Wang, Zili Wang, Steven Ka Shing Yau, Zijuan Lin, et al. 2024. MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. In *International Conference on Learning Representations (ICLR)*.
- Shamsi T. Iqbal and Eric Horvitz. 2007. Disruption and Recovery of Computing Tasks: Field Study, Analysis, and Directions. In *ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)*.
- Gautier Izacard and Edouard Grave. 2021. Leveraging Passage Retrieval with Generative Models for Open Domain Question Answering. In *Conference of the European Chapter of the Association for Computational Linguistics (EACL)*.
- Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, and Edouard Grave. 2023. Atlas: Few-shot Learning with Retrieval Augmented Language Models. *Journal of Machine Learning Research* (2023).
- Robert A. Jacobs, Michael I. Jordan, Steven J. Nowlan, and Geoffrey E. Hinton. 1991. Adaptive Mixtures of Local Experts. *Neural Computation* 3, 1 (1991), 79–87.
- Zhengbao Jiang, Frank F. Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. 2023. Active Retrieval Augmented Generation. In *Conference on Empirical Methods in Natural Language Processing (EMNLP)*.
- Thorsten Joachims. 2002. Optimizing Search Engines using Clickthrough Data. In *ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*. 133–142.
- Thorsten Joachims, Laura Granka, Bing Pan, Helene Hembrooke, and Geri Gay. 2005. Accurately Interpreting Clickthrough Data as Implicit Feedback. In *ACM SIGIR Conference on Research and Development in Information Retrieval*. 154–161.
- Thorsten Joachims, Adith Swaminathan, and Tobias Schnabel. 2017. Unbiased Learning-to-Rank with Biased Feedback. In *ACM International Conference on Web Search and Data Mining (WSDM)*.
- Wang-Cheng Kang and Julian McAuley. 2018. Self-Attentive Sequential Recommendation. In *IEEE International Conference on Data Mining (ICDM)*.
- Vladimir Karpukhin, Barlas Oğuz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. 2020. Dense Passage Retrieval for Open-Domain Question Answering. In *Conference on Empirical Methods in Natural Language Processing (EMNLP)*.
- Moe Kayali, Frederic Wenz, Nesime Tatbul, and Çağatay Demiralp. 2025. Mind the Data Gap: Bridging LLMs to Enterprise Data Integration. In *Conference on Innovative Data Systems Research (CIDR)*.
- Diane Kelly and Jaime Teevan. 2003. Implicit Feedback for Inferring User Preference: A Bibliography. *ACM SIGIR Forum* 37, 2 (2003), 18–28.
- Omar Khattab and Matei Zaharia. 2020. ColBERT: Efficient and Effective Passage Search via Contextualized Late Interaction over BERT. In *ACM SIGIR Conference on Research and Development in Information Retrieval*.
- Dmitry Lepikhin, HyoukJoong Lee, Yuanzhong Xu, Dehao Chen, Orhan Firat, Yanping Huang, Maxim Krikun, Noam Shazeer, and Zhifeng Chen. 2021. GShard: Scaling Giant Models with Conditional Computation and Automatic Sharding. In *International Conference on Learning Representations (ICLR)*.
- Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. 2020. Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. In *Advances in Neural Information Processing Systems (NeurIPS)*.
- Aman Madaan, Niket Tandon, Prakhar Gupta, Skyler Hallinan, Luyu Gao, Sarah Wiegreffe, Uri Alon, Nouha Dziri, Shrimai Prabhumoye, Yiming Yang, et al. 2023. Self-Refine: Iterative Refinement with Self-Feedback. In *Advances in Neural Information Processing Systems (NeurIPS)*.
- Gloria Mark, Victor M. González, and Justin Harris. 2005. No Task Left Behind? Examining the Nature of Fragmented Work. In *ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)*.
- Gloria Mark, Shamsi T. Iqbal, Mary Czerwinski, and Paul Johns. 2014. Bored Mondays and Focused Afternoons: The Rhythm of Attention and Online Activity in the Workplace. In *ACM SIGCHI Conference on Human Factors in Computing Systems (CHI)*.
- Sami Marreed, Alon Oved, Avi Yaeli, Segev Shlomov, Ido Levy, Aviad Sela, Asaf Adi, and Nir Mashkif. 2025. Towards Enterprise-Ready Computer Using Generalist Agent. *arXiv preprint arXiv:2503.01861* (2025).
- Amardeep Modi and Sharath Kumar. 2024. *Digital Interaction Intelligence Products PEAK Matrix Assessment*. Technical Report. Everest Group.
- Vinod Muthusamy, Yara Rizk, Kiran Kate, Praveen Venkateswaran, Vatche Isahagian, Ashu Gulati, and Parijat Dube. 2023. Towards Large Language Model-Based Personal Agents in the Enterprise: Current Trends and Open Problems. In *Findings of the Association for Computational Linguistics: EMNLP*.
- Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, et al. 2024. ToolLLM: Facilitating Large Language Models to Master 16000+ Real-World APIs. In *International Conference on Learning Representations (ICLR)*.
- Ram et al\.\(2023\)Ori Ram, Yoav Levine, Itay Dalmedigos, Dor Muhlgay, Amnon Shashua, Kevin Leyton\-Brown, and Yoav Shoham\. 2023\.In\-Context Retrieval\-Augmented Language Models\.*Transactions of the Association for Computational Linguistics*\(2023\)\.
- Rizk et al\.\(2024\)Yara Rizk, Praveen Venkateswaran, Vatche Isahagian, Austin Narcomey, and Vinod Muthusamy\. 2024\.A Case for Business Process\-Specific Foundation Models\. In*Business Process Management Workshops*\. Springer\.
- Robertson and Zaragoza \(2009\)Stephen Robertson and Hugo Zaragoza\. 2009\.The Probabilistic Relevance Framework: BM25 and Beyond\.*Foundations and Trends in Information Retrieval*3, 4 \(2009\), 333–389\.
- Schick et al\.\(2023\)Timo Schick, Jane Dwivedi\-Yu, Roberto Dessì, Roberta Raileanu, Maria Lomeli, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom\. 2023\.Toolformer: Language Models Can Teach Themselves to Use Tools\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*\.
- Shazeer et al\.\(2017\)Noam Shazeer, Azalia Mirhoseini, Krzysztof Maziarz, Andy Davis, Quoc Le, Geoffrey Hinton, and Jeff Dean\. 2017\.Outrageously Large Neural Networks: The Sparsely\-Gated Mixture\-of\-Experts Layer\. In*International Conference on Learning Representations \(ICLR\)*\.
- Shen et al\.\(2005\)Xuehua Shen, Bin Tan, and ChengXiang Zhai\. 2005\.Context\-Sensitive Information Retrieval Using Implicit Feedback\. In*ACM SIGIR Conference on Research and Development in Information Retrieval*\. 43–50\.
- Shi et al\.\(2024\)Weijia Shi, Sewon Min, Michihiro Yasunaga, Minjoon Seo, Rich James, Mike Lewis, Luke Zettlemoyer, and Wen\-tau Yih\. 2024\.REPLUG: Retrieval\-Augmented Black\-Box Language Models\. In*Conference of the North American Chapter of the ACL \(NAACL\)*\.
- Shinn et al\.\(2023\)Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao\. 2023\.Reflexion: Language Agents with Verbal Reinforcement Learning\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*\.
- Soroco and Workfabric AI \(2025\)Soroco and Workfabric AI\. 2025\.*TribeScope: A Foundation Model for Capturing and Interpreting Digital Interaction Data*\.Technical Report\. Workfabric AI\.
- Su et al\.\(2024\)Weihang Su, Yichen Tang, Qingyao Ai, Zhijing Wu, and Yiqun Liu\. 2024\.DRAGIN: Dynamic Retrieval Augmented Generation Based on the Real\-time Information Needs of Large Language Models\. In*Annual Meeting of the Association for Computational Linguistics \(ACL\)*\.
- Sugiyama et al\.\(2004\)Kazunari Sugiyama, Kenji Hatano, and Masatoshi Yoshikawa\. 2004\.Adaptive Web Search Based on User Profile Constructed Without Any Effort from Users\. In*International World Wide Web Conference \(WWW\)*\.
- Sumers et al\.\(2024\)Theodore R\. Sumers, Shunyu Yao, Karthik Narasimhan, and Thomas L\. Griffiths\. 2024\.Cognitive Architectures for Language Agents\.*Transactions on Machine Learning Research*\(2024\)\.
- Sun et al\.\(2019\)Fei Sun, Jun Liu, Jian Wu, Changhua Pei, Xiao Lin, Wenwu Ou, and Peng Jiang\. 2019\.BERT4Rec: Sequential Recommendation with Bidirectional Encoder Representations from Transformer\. In*ACM International Conference on Information and Knowledge Management \(CIKM\)*\.
- Teevan et al\.\(2005\)Jaime Teevan, Susan T\. Dumais, and Eric Horvitz\. 2005\.Personalizing Search via Automated Analysis of Interests and Activities\. In*ACM SIGIR Conference on Research and Development in Information Retrieval*\. 449–456\.
- van der Aalst \(2016\)Wil M\. P\. van der Aalst\. 2016\.*Process Mining: Data Science in Action*\(2nd ed\.\)\.Springer\.
- Wang et al\.\(2023\)Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar\. 2023\.Voyager: An Open\-Ended Embodied Agent with Large Language Models\.*arXiv preprint arXiv:2305\.16291*\(2023\)\.
- Wei et al\.\(2022\)Jason Wei, Xuezhi Wang, Dale Schuurmans, Maarten Bosma, Brian Ichter, Fei Xia, Ed Chi, Quoc Le, and Denny Zhou\. 2022\.Chain\-of\-Thought Prompting Elicits Reasoning in Large Language Models\. In*Advances in Neural Information Processing Systems \(NeurIPS\)*\.
- White et al\.\(2009\)Ryen W\. White, Peter Bailey, and Liwei Chen\. 2009\.Predicting User Interests from Contextual Information\. In*ACM SIGIR Conference on Research and Development in Information Retrieval*\.
- Wornow et al\.\(2024\)Michael Wornow, Avanika Narayan, Krista Opsahl\-Ong, Quinn McIntyre, Nigam Shah, and Christopher Ré\. 2024\.Automating the Enterprise with Foundation Models\.*Proceedings of the VLDB Endowment*17, 11 \(2024\), 2805–2812\.
- Wu et al\.\(2024\)Qingyun Wu, Gagan Bansal, Jieyu Zhang, Yiran Wu, Beibin Li, Erkang Zhu, Li Jiang, Xiaoyun Zhang, Shaokun Zhang, Jiale Liu, et al\.2024\.AutoGen: Enabling Next\-Gen LLM Applications via Multi\-Agent Conversation\. In*Conference on Language Modeling \(COLM\)*\.
- Yan et al\.\(2024\)Shi\-Qi Yan, Jia\-Chen Gu, Yun Zhu, and Zhen\-Hua Ling\. 2024\.Corrective Retrieval Augmented Generation\.*arXiv preprint arXiv:2401\.15884*\(2024\)\.
- Yao et al\.\(2023\)Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao\. 2023\.ReAct: Synergizing Reasoning and Acting in Language Models\. In*International Conference on Learning Representations \(ICLR\)*\.
- Zhou et al\.\(2018\)Guorui Zhou, Xiaoqiang Zhu, Chenru Song, Ying Fan, Han Zhu, Xiao Ma, Yanghui Yan, Junqi Jin, Han Li, and Kun Gai\. 2018\.Deep Interest Network for Click\-Through Rate Prediction\. In*ACM SIGKDD International Conference on Knowledge Discovery and Data Mining*\.
