Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering

arXiv cs.CL Papers

Summary

This paper proposes a paradigm shift in context engineering, formulating it as a recommendation problem. The authors introduce Neural Collaborative Context Engineering (NCCE), which uses collaborative filtering to dynamically assign instance-specific contexts, improving LLM task accuracy.

arXiv:2605.15721v1 Announce Type: new Abstract: Large Language Models (LLMs) are highly sensitive to their input contexts, motivating the development of automated context engineering. However, existing methods predominantly treat this as a global search problem, seeking a single context strategy that maximizes average performance across a dataset. This restrictive assumption overlooks the fact that different inputs often require distinct guidance, leaving substantial instance-level performance gains untapped. In this paper, we propose a paradigm shift by formulating context engineering as a recommendation problem. We introduce Neural Collaborative Context Engineering (NCCE), a framework that transitions optimization from a static global search to dynamic, instance-wise routing. NCCE first bootstraps a diverse catalog of anchor contexts and then employs a novel Context-CF Co-Evolution mechanism. This stage establishes a synergistic feedback loop: a lightweight Neural Collaborative Filtering (NCF) model learns instance-context preferences to guide the generation of specialized context variants, while the newly evaluated contexts continuously refine the NCF model's understanding of latent preferences. At inference time, the trained NCF model acts as a context router, dynamically assigning the most suitable context strategy to each unseen instance. Theoretical proofs and comprehensive experiments demonstrate that by matching individual inputs with their optimal contexts, NCCE significantly improves task accuracy, highlighting the critical importance of personalization in LLM context engineering.

Cached at: 05/18/26, 06:34 AM

# Contexting as Recommendation: Evolutionary Collaborative Filtering for Context Engineering
Source: [https://arxiv.org/html/2605.15721](https://arxiv.org/html/2605.15721)
Jiachen Zhu (Shanghai Jiao Tong Univ., gebro13@sjtu.edu.cn); Zhuoying Ou∗ (Shanghai Jiao Tong Univ., zoeouzy23@sjtu.edu.cn); Congmin Zheng (Shanghai Jiao Tong Univ., desp.zcm@sjtu.edu.cn); Yuxiang Chen (Univ. College London, yuxiang.chen.25@ucl.ac.uk); Zeyu Zheng (Carnegie Mellon Univ., zeyuzhen@andrew.cmu.edu); Rong Shan (Shanghai Jiao Tong Univ., shanrong@sjtu.edu.cn); Lingyu Yang (Shanghai Jiao Tong Univ., jlnhbyu.yang@sjtu.edu.cn); Lionel Z. WANG (Hong Kong Polytechnic Univ., lionel-z.wang@connect.polyu.hk); Weiwen Liu (Shanghai Jiao Tong Univ., wwliu@sjtu.edu.cn); Yong Yu (Shanghai Jiao Tong Univ., yyu@sjtu.edu.cn); Weinan Zhang (Shanghai Jiao Tong Univ., wnzhang@sjtu.edu.cn); Jianghao Lin (Shanghai Jiao Tong Univ., linjianghao@sjtu.edu.cn)

###### Abstract

Large Language Models (LLMs) are highly sensitive to their input contexts, motivating the development of automated context engineering. However, existing methods predominantly treat this as a global search problem, seeking a single context strategy that maximizes average performance across a dataset. This restrictive assumption overlooks the fact that different inputs often require distinct guidance, leaving substantial instance-level performance gains untapped. In this paper, we propose a paradigm shift by formulating context engineering as a recommendation problem. We introduce Neural Collaborative Context Engineering (NCCE), a framework that transitions optimization from a static global search to dynamic, instance-wise routing. NCCE first bootstraps a diverse catalog of anchor contexts and then employs a novel Context-CF Co-Evolution mechanism. This stage establishes a synergistic feedback loop: a lightweight Neural Collaborative Filtering (NCF) model learns instance-context preferences to guide the generation of specialized context variants, while the newly evaluated contexts continuously refine the NCF model's understanding of latent preferences. At inference time, the trained NCF model acts as a context router, dynamically assigning the most suitable context strategy to each unseen instance. Theoretical proofs and comprehensive experiments demonstrate that by matching individual inputs with their optimal contexts, NCCE significantly improves task accuracy, highlighting the critical importance of personalization in LLM context engineering.

## 1 Introduction

![Refer to caption](https://arxiv.org/html/2605.15721v1/x1.png)

Figure 1: Context engineering as recommendation: learning to assign instance-specific composite contexts instead of optimizing a single global context strategy.

Large Language Models (LLMs) have become increasingly capable at solving complex reasoning, question answering, and context-dependent tasks \[[1](https://arxiv.org/html/2605.15721#bib.bib30),[30](https://arxiv.org/html/2605.15721#bib.bib32),[4](https://arxiv.org/html/2605.15721#bib.bib31),[31](https://arxiv.org/html/2605.15721#bib.bib38)\]. Yet their performance remains highly sensitive to the context provided at inference time. Small changes in the instruction, the selected few-shot examples, the required reasoning format, or the output constraints can substantially alter model behavior. This sensitivity has motivated the growing practice of context engineering: the systematic design and optimization of the input context used to elicit reliable LLM outputs \[[39](https://arxiv.org/html/2605.15721#bib.bib33),[12](https://arxiv.org/html/2605.15721#bib.bib35),[38](https://arxiv.org/html/2605.15721#bib.bib34)\].

Most automated context optimization methods search for a single context strategy that maximizes average training performance\[[40](https://arxiv.org/html/2605.15721#bib.bib2),[34](https://arxiv.org/html/2605.15721#bib.bib3),[36](https://arxiv.org/html/2605.15721#bib.bib5),[18](https://arxiv.org/html/2605.15721#bib.bib10),[9](https://arxiv.org/html/2605.15721#bib.bib6),[3](https://arxiv.org/html/2605.15721#bib.bib11)\]\. Such strategies may combine instructions, few\-shot examples, reasoning formats, and output constraints\. While effective, this assumes one context suits all instances equally well\. In practice, different inputs require different guidance: multi\-hop questions may benefit from explicit decomposition, whereas verification tasks may need stricter evidence grounding\.

This paper argues that the core challenge of context engineering is not only discovering high-quality contexts, but selecting the right context for each instance. Instead of optimizing a single global context, we dynamically assign the most suitable context strategy to each input.

To address this challenge, we propose a paradigm shift: viewing context engineering as a recommendation problem \[[22](https://arxiv.org/html/2605.15721#bib.bib19)\]. In this formulation, input instances play the role of “users”, composite context strategies play the role of “items”, and the observed task accuracy defines their interaction signal. The goal is to learn the latent preference structure between instances and contexts, and to use that structure to recommend the most suitable context for any previously unseen instance. This perspective transforms context optimization from a one-dimensional search for a global average into an instance-wise routing problem over a diverse catalog of context strategies.

To operationalize this perspective, we introduce Neural Collaborative Context Engineering (NCCE). Rather than searching for a single optimal strategy, NCCE maintains a dynamic catalog of candidates and learns which strategy best suits each instance through three main stages:

First, to build an effective initial “item catalog,” NCCE extracts a diverse set of anchor contexts\. By clustering instances into semantically similar groups, we leverage existing global optimizers to generate cluster\-specific contexts\. This provides a high\-quality, diverse pool of candidate strategies, establishing informative starting points for learning instance\-context preferences\.

Second, inspired by item expansion in recommender systems, NCCE expands the context catalog through Context-CF Co-Evolution. Rather than relying on a static pool of strategies, this stage establishes a synergistic feedback loop between the context catalog and the recommendation model. A lightweight Neural Collaborative Filtering (NCF) \[[10](https://arxiv.org/html/2605.15721#bib.bib21)\] model is trained on observed instance-context interactions to identify “blind spots” where current contexts fail. Guided by the NCF model's latent gradients, NCCE iteratively evolves new, specialized context variants through LLM-based reflection and optimization. These new contexts, in turn, provide fresh interaction data to further refine the NCF model. This co-evolution ensures that the catalog remains diverse and high-performing, while the recommendation model develops a granular understanding of instance-level preferences.

Finally, at inference time, the trained NCF model acts as an instance\-wise context router\. Given a new instance, it scores all candidate context strategies and selects the one predicted to maximize task accuracy, achieving dynamic, instance\-specific context construction\.

The main contributions of this work are summarized as follows:

- We are the first to formulate context engineering as a recommendation problem, introducing a novel paradigm where each input instance is routed to its most suitable context strategy rather than relying on a globally averaged prompt.
- We propose a Context-CF Co-Evolution mechanism that mimics the iterative item-expansion process in mature recommender systems. By leveraging the preference model as a differentiable guide, we iteratively generate new context variants to address failure instances, creating a feedback loop where the context catalog and the neural recommender improve each other synergistically.
- Comprehensive experiments demonstrate that by shifting from static global optimization to dynamic, instance-wise context routing, NCCE unlocks significant performance gains, highlighting the critical importance of personalization in context engineering.

## 2 Preliminary

We formalize composite context engineering and establish its connection to collaborative filtering\.

### 2.1 Context Engineering with Composite Strategies

Let $X=\{x_1,\dots,x_N\}$ denote a set of input instances and $P=\{p_1,\dots,p_M\}$ a catalog of candidate context strategies. Each strategy is a composite configuration:

$$p_j=\big(c^{\mathrm{inst}}_j,\;c^{\mathrm{demo}}_j,\;c^{\mathrm{reason}}_j,\;c^{\mathrm{out}}_j\big)$$

representing the task instruction, few-shot demonstrations, reasoning format, and output constraints. Given a fixed LLM, the task accuracy of applying strategy $p_j$ to instance $x_i$ is:

$$r_{ij}=R(x_i,p_j)$$

where $r_{ij}\in[0,1]$. Traditional optimization seeks a single globally optimal strategy:

$$p^{*}=\arg\max_{p\in P}\sum_{i=1}^{N}R(x_i,p)$$

This assumes one strategy fits all, ignoring the fine-grained interactions between specific instances and context formulations.
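To make the global objective concrete, the following minimal sketch evaluates it on a toy reward matrix and contrasts it with the per-instance maximum that Section 2.2 formalizes. The matrix values are invented for illustration and NumPy is an assumed dependency; none of this comes from the paper's experiments.

```python
import numpy as np

# Toy reward matrix: R[i, j] = R(x_i, p_j) for 4 instances (rows) and
# 3 strategies (columns). Values are invented for illustration only.
R = np.array([
    [1.0, 0.0, 1.0],
    [1.0, 0.0, 1.0],
    [0.0, 1.0, 0.0],
    [0.0, 1.0, 1.0],
])

# Global search: a single strategy for every instance, chosen by summed reward.
global_idx = int(np.argmax(R.sum(axis=0)))
global_acc = float(R[:, global_idx].mean())

# Per-instance selection: the best strategy for each row separately.
per_instance_idx = R.argmax(axis=1)
instance_acc = float(R.max(axis=1).mean())

print("global strategy:", global_idx, "accuracy:", global_acc)
print("per-instance strategies:", per_instance_idx, "accuracy:", instance_acc)
```

In this toy case the best single strategy solves three of the four instances, while choosing a strategy per instance solves all four. The gap between the two numbers is the instance-level headroom that a global search cannot reach.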

### 2.2 Context Engineering as Recommendation

To exploit instance heterogeneity, we reframe context engineering as a recommendation problem:

Instances as users. Each input instance $x_i\in X$ is treated as a user whose semantic characteristics determine its preference over different context strategies.

Context strategies as items. Each context strategy $p_j\in P$ is treated as an item in the catalog.

Accuracy as interaction. The observed task accuracy $r_{ij}$ obtained by applying $p_j$ to $x_i$ serves as the interaction signal between the instance and the context strategy.

Instead of a global optimum, we learn an instance-wise routing function to select the best context per input:

$$p_i^{*}=\arg\max_{p_j\in P}R(x_i,p_j)$$

Since evaluating all pairs is computationally prohibitive, we estimate compatibility from sparse observations.

Inductive Matrix Completion. We model the sparse observed interactions $\Omega$ via inductive matrix completion. Instead of fixed IDs, we use semantic embeddings to predict compatibility:

$$\hat{r}_{ij}=f_{\theta}\big(\phi(x_i),\psi(p_j)\big)$$

where $f_{\theta}$ scores the suitability between the embedded instance $\phi(x_i)$ and context $\psi(p_j)$. This inductive approach enables zero-shot routing for unseen instances and seamlessly integrates new contexts without retraining.

## 3 Methodology

![Refer to caption](https://arxiv.org/html/2605.15721v1/x2.png)

Figure 2: The overall architecture of NCCE, featuring a synergistic co-evolutionary loop between a neural collaborative filtering model and an evolving context catalog to enable personalized context construction and instance-wise routing.

In this section, we present our NCCE framework. The framework operates in three stages: cluster-based initialization, Context-CF Co-Evolution, and instance-wise context routing.

### 3.1 Neural Collaborative Preference Model

NCCE utilizes a lightweight Neural Collaborative Filtering (NCF) model to estimate instance-context suitability. Given an instance $x_i$ and a context strategy $p_j$, we extract their representations using frozen text encoders, where $e_i=\phi(x_i)$ and $h_j=\psi(p_j)$, and the composite context $\psi(p_j)$ aggregates its individual components. We then project them into a shared latent space ($u_i=W_x e_i$, $v_j=W_p h_j$) and construct the interaction vector:

$$z_{ij}=[u_i;\,v_j;\,u_i\odot v_j;\,|u_i-v_j|].$$

The compatibility score is computed via an MLP:

$$\hat{r}_{ij}=f_{\theta}(x_i,p_j)=\sigma(\mathrm{MLP}_{\theta}(z_{ij})).$$

To optimize instance-wise selection, NCCE minimizes a pairwise ranking loss over observed triples $(i,j,k)$ where $p_j$ outperforms $p_k$ on $x_i$:

$$\mathcal{L}_{\mathrm{rank}}=-\sum_{(i,j,k)\in\mathcal{D}_{\mathrm{pair}}}\log\sigma(\hat{r}_{ij}-\hat{r}_{ik}).$$

This objective mitigates the impact of inherent instance difficulty by focusing on relative ranking.
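The scoring function and ranking loss above can be sketched as a forward pass in NumPy. This is an illustration under stated assumptions, not the authors' implementation: the layer sizes, the single ReLU hidden layer, the random weights, and the random vectors standing in for the frozen encoders $\phi$ and $\psi$ are all hypothetical, and no training step is shown.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def interaction_vector(u, v):
    # z_ij = [u_i ; v_j ; u_i * v_j ; |u_i - v_j|]
    return np.concatenate([u, v, u * v, np.abs(u - v)])

def score(e, h, params):
    """Predicted compatibility r_hat_ij in [0, 1] for one instance/context pair."""
    W_x, W_p, W1, b1, w2, b2 = params
    u, v = W_x @ e, W_p @ h                      # project into the shared latent space
    hidden = np.maximum(0.0, W1 @ interaction_vector(u, v) + b1)  # one ReLU layer (assumed)
    return float(sigmoid(w2 @ hidden + b2))

def pairwise_rank_loss(r_win, r_lose):
    # L_rank = -sum log sigma(r_hat_ij - r_hat_ik), where p_j beat p_k on x_i
    diff = np.asarray(r_win) - np.asarray(r_lose)
    return float(-np.sum(np.log(sigmoid(diff))))

rng = np.random.default_rng(0)
d_enc, d_lat, d_hid = 8, 4, 16                   # hypothetical sizes
params = (
    0.1 * rng.normal(size=(d_lat, d_enc)),       # W_x
    0.1 * rng.normal(size=(d_lat, d_enc)),       # W_p
    0.1 * rng.normal(size=(d_hid, 4 * d_lat)),   # hidden layer weights
    np.zeros(d_hid),                             # hidden layer bias
    0.1 * rng.normal(size=d_hid),                # output layer weights
    0.0,                                         # output layer bias
)
e_i = rng.normal(size=d_enc)                     # stand-in for phi(x_i)
h_j, h_k = rng.normal(size=d_enc), rng.normal(size=d_enc)  # stand-ins for psi(p_j), psi(p_k)
r_ij, r_ik = score(e_i, h_j, params), score(e_i, h_k, params)
loss = pairwise_rank_loss([r_ij], [r_ik])
print(r_ij, r_ik, loss)
```

Because the loss depends only on the difference $\hat{r}_{ij}-\hat{r}_{ik}$, shifting both scores for a hard or easy instance by the same amount leaves it unchanged, which is one way to read the remark about instance difficulty.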

### 3.2 Cluster-based Initialization

An informative initial catalog is crucial for learning reliable preferences. To avoid weak starting points, NCCE employs cluster-based initialization. We partition the training instances into $K$ clusters:

$$\mathcal{C}_1,\ldots,\mathcal{C}_K=\mathrm{KMeans}(\{\phi(x_i)\}_{i=1}^{N}).$$

For each cluster $\mathcal{C}_k$, a warm-up optimizer $\mathcal{A}$ generates specialized anchor contexts: $P_k^0=\mathcal{A}(\mathcal{C}_k)$. The initial catalog becomes:

$$P_0=\bigcup_{k=1}^{K}P_k^0.$$

These diverse, group-level anchors provide much stronger preference learning signals than a single globally optimized strategy.
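A minimal sketch of this initialization follows. It assumes NumPy, uses a plain Lloyd's k-means written inline instead of any particular library, and replaces the warm-up optimizer $\mathcal{A}$ with a hypothetical stub (`warmup_optimizer`) that returns a placeholder string; in the paper that role is played by a global context optimizer.

```python
import numpy as np

def kmeans(embeddings, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row."""
    rng = np.random.default_rng(seed)
    centers = embeddings[rng.choice(len(embeddings), size=k, replace=False)]
    for _ in range(n_iter):
        dists = np.linalg.norm(embeddings[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        for c in range(k):
            members = embeddings[labels == c]
            if len(members) > 0:                 # keep the old center if a cluster empties
                centers[c] = members.mean(axis=0)
    return labels

def warmup_optimizer(cluster_instances):
    # Hypothetical stand-in for A(C_k): a real system would run a prompt optimizer here.
    return [f"anchor context tuned on {len(cluster_instances)} instances"]

def build_initial_catalog(instances, embeddings, k):
    labels = kmeans(embeddings, k)
    catalog = []
    for c in range(k):
        cluster = [x for x, lab in zip(instances, labels) if lab == c]
        if cluster:
            catalog.extend(warmup_optimizer(cluster))    # P_k^0 = A(C_k)
    return catalog, labels                                # P_0 is the union of all P_k^0

# Two well-separated groups of toy "embeddings" standing in for phi(x_i).
emb = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
instances = [f"x{i}" for i in range(len(emb))]
catalog, labels = build_initial_catalog(instances, emb, k=2)
print(labels, catalog)
```

Each cluster contributes its own anchor, so the catalog starts with as many distinct starting points as there are non-empty clusters.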

### 3.3 Context-CF Co-Evolution

Next, NCCE enters a co-evolutionary loop: the NCF model identifies promising directions for context improvement, while newly evolved contexts generate interaction data to refine the NCF model.

At iteration $t$, after training $f_{\theta}$ on $\Omega_t$, we target failure instances unsolved by any current strategy:

$$\mathcal{F}_t=\{x_i\in X\mid R(x_i,p_j)=0,\ \forall p_j\in P_t\}.$$

From a sampled batch $\mathcal{B}_t\sim\mathcal{F}_t$, we perform gradient ascent on $k$ randomly sampled context embeddings $h$ to maximize their NCF-predicted suitability for the failure batch:

$$\mathcal{J}(h;\mathcal{B}_t)=\frac{1}{m}\sum_{x_i\in\mathcal{B}_t}s_{\theta}(h,x_i),$$

where $m=|\mathcal{B}_t|$ is the batch size and $s_{\theta}(h,x_i)$ is the NCF-predicted suitability of context embedding $h$ for instance $x_i$. The embedding is iteratively updated:

$$h^{(\tau+1)}=\operatorname{Normalize}\left(h^{(\tau)}+\eta\nabla_{h^{(\tau)}}\mathcal{J}(h^{(\tau)};\mathcal{B}_t)\right).$$

After $G$ steps, these optimized embeddings $\tilde{h}_{\ell}$ represent continuous ideal contexts. To map them back to discrete text, we find the existing context $p^{\mathrm{pot}}$ with the minimum average distance to these targets:

$$p^{\mathrm{pot}}=\arg\min_{p_j\in P_t}\frac{1}{k}\sum_{\ell=1}^{k}\left\|\psi(p_j)-\tilde{h}_{\ell}\right\|_2.$$

Through this gradient-guided selection, NCCE avoids mutating blindly and instead selects the context nearest to the NCF's predicted optima.
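The gradient-guided selection step can be sketched as follows. To keep the example self-contained, the NCF scorer $s_{\theta}$ is replaced by a simple bilinear stand-in, $\sigma(e_i^{\top} W h)$, whose gradient has a closed form; this substitution, the matrix `W`, the step size, the step count, and the toy embeddings are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def objective_and_grad(h, batch, W):
    """J(h; B) = mean_i sigmoid(e_i^T W h) and its gradient with respect to h."""
    s = sigmoid(batch @ W @ h)                       # one score per failure instance
    grad = (batch @ W).T @ (s * (1.0 - s)) / len(batch)
    return float(s.mean()), grad

def ascend(h0, batch, W, eta=1.0, steps=200):
    h = h0 / np.linalg.norm(h0)
    for _ in range(steps):
        _, g = objective_and_grad(h, batch, W)
        h = h + eta * g                              # gradient ascent step
        h = h / np.linalg.norm(h)                    # Normalize(.) keeps h on the unit sphere
    return h

def nearest_context(catalog_emb, targets):
    # p_pot = argmin_j (1/k) * sum_l || psi(p_j) - h_tilde_l ||_2
    dists = np.linalg.norm(catalog_emb[:, None, :] - targets[None, :, :], axis=2)
    return int(dists.mean(axis=1).argmin())

rng = np.random.default_rng(0)
d = 4
W = np.eye(d)                                        # stand-in scorer weights
batch = np.tile(np.array([1.0, 0.0, 0.0, 0.0]), (3, 1))   # toy failure-batch embeddings
starts = rng.normal(size=(5, d))                     # k = 5 randomly sampled embeddings
targets = np.stack([ascend(h0, batch, W) for h0 in starts])
catalog_emb = np.eye(d)                              # toy psi(p_j) for 4 catalog entries
p_pot = nearest_context(catalog_emb, targets)
print("selected catalog index:", p_pot)
```

In this toy setup every ascent run is pulled toward the direction the failure batch prefers, so the catalog entry aligned with that direction is the one selected for revision.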

Finally, an LLM reflector revises $p^{\mathrm{pot}}$ by diagnosing its failures on $\mathcal{B}_t$, generating an improved strategy:

$$p_{\mathrm{new}}=\mathcal{M}(p^{\mathrm{pot}},\mathcal{B}_t).$$

Evaluating $p_{\mathrm{new}}$ yields new interactions $\Delta\Omega_t$, updating both the catalog ($P_{t+1}=P_t\cup\{p_{\mathrm{new}}\}$) and the interaction set ($\Omega_{t+1}=\Omega_t\cup\Delta\Omega_t$) for the next NCF training round.

Algorithm [1](https://arxiv.org/html/2605.15721#alg1) summarizes NCCE. Training bootstraps the catalog via clustering and iteratively refines it alongside the NCF model (Algorithm [2](https://arxiv.org/html/2605.15721#alg2)). During inference, the frozen NCF model dynamically routes each test instance to its optimal strategy.
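The overall control flow described in this subsection can be sketched as a short skeleton. It is not a reproduction of the paper's Algorithm 1 or 2: every callable (`evaluate`, `train_router`, `pick_parent`, `reflect`) is a hypothetical placeholder, all instance-context pairs are evaluated rather than a sparse subset, and the whole failure set is used where the paper samples a batch $\mathcal{B}_t$.

```python
def co_evolve(instances, catalog, evaluate, train_router, pick_parent, reflect, rounds):
    """Skeleton of the co-evolution loop; every callable is a hypothetical placeholder."""
    # Omega: observed interactions, stored as {(instance, context): reward}
    omega = {(x, p): evaluate(x, p) for x in instances for p in catalog}
    router = train_router(omega)
    for _ in range(rounds):
        # F_t: instances that no strategy in the current catalog solves
        failures = [x for x in instances if all(omega[(x, p)] == 0 for p in catalog)]
        if not failures:
            break
        parent = pick_parent(router, catalog, failures)   # gradient-guided selection
        p_new = reflect(parent, failures)                 # LLM reflector M
        omega.update({(x, p_new): evaluate(x, p_new) for x in instances})  # Delta Omega_t
        catalog = catalog + [p_new]                       # P_{t+1} = P_t U {p_new}
        router = train_router(omega)                      # next NCF training round
    return catalog, omega, router

def route(router, x, catalog):
    # Inference: score every candidate strategy and keep the highest-scoring one.
    return max(catalog, key=lambda p: router(x, p))

# Toy run with stub components (all invented for illustration).
solved = {("a", "p0"), ("b", "p1")}
catalog, omega, router = co_evolve(
    instances=["a", "b"],
    catalog=["p0"],
    evaluate=lambda x, p: 1 if (x, p) in solved else 0,
    train_router=lambda om: (lambda x, p: om.get((x, p), 0)),
    pick_parent=lambda r, cat, fails: cat[0],
    reflect=lambda parent, fails: "p1",
    rounds=3,
)
print(catalog, route(router, "a", catalog), route(router, "b", catalog))
```

The stub router simply looks up observed rewards, so it only illustrates where a trained NCF model would plug in; it does not generalize to unseen instances the way the paper's router is meant to.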

### 3.4 Theoretical Justification

We provide a PAC-style analysis that justifies cluster-based initialization and clarifies its relationship to Context-CF Co-Evolution. The analysis decomposes the regret of instance-wise routing into two terms, each governed by a distinct component of NCCE.

Let $\mathcal{D}$ be the distribution over the input space $\mathcal{X}$ and let $P=\{p_1,\ldots,p_M\}$ be the context catalog. Denote by $r(x,p)\in[0,1]$ the true reward of applying context $p$ to instance $x$, and let $\hat{f}_{\theta}:\mathcal{X}\to P$ be the learned router. The instance-wise regret is

$$\Delta(x)\;=\;\max_{p\in P}\,r(x,p)\;-\;r\bigl(x,\hat{f}_{\theta}(x)\bigr).$$

Our analysis relies on a single structural assumption that links embedding geometry to context preferences. The remaining ingredients (anchor quality, cluster diameter, and pairwise generalization) follow standard arguments and are stated in Appendix [B](https://arxiv.org/html/2605.15721#A2).

###### Assumption 1 (Cluster Lipschitz Preference).

There exists a partition $\{\mathcal{C}_1,\ldots,\mathcal{C}_K\}$ of $\mathcal{X}$ and a constant $L>0$ such that for every cluster $\mathcal{C}_k$, every pair of instances $x,x'\in\mathcal{C}_k$, and every pair of contexts $p,p'\in P$,

$$\bigl|\,(r(x,p)-r(x,p'))-(r(x',p)-r(x',p'))\,\bigr|\;\leq\;L\,\|\phi(x)-\phi(x')\|.$$

Assumption [1](https://arxiv.org/html/2605.15721#Thmassumption1) requires only that the *relative* preference between two contexts varies smoothly within a cluster of semantically similar instances. It is strictly weaker than assuming Lipschitzness of $r(x,p)$ itself, and it aligns with the pairwise ranking objective: routing depends on relative rankings, not absolute reward magnitudes.

###### Theorem 1 (PAC Bound for Instance-wise Routing).

Under Assumption [1](https://arxiv.org/html/2605.15721#Thmassumption1) and the standard assumptions stated in Appendix [B](https://arxiv.org/html/2605.15721#A2), with probability at least $1-\delta$ over the draw of $n$ training interactions,

$$\Pr_{x\sim\mathcal{D}}\bigl[\,\Delta(x)>\varepsilon\,\bigr]\;\leq\;\underbrace{\frac{\alpha+L\rho_K}{\varepsilon}}_{\text{(I) catalog coverage}}\;+\;\underbrace{\widehat{\mathcal{R}}_n+\mathfrak{R}_n(\mathcal{F})+\sqrt{\tfrac{\log(1/\delta)}{2n}}}_{\text{(II) router generalization}},$$

where $\alpha$ is the local optimality gap of the warm-up optimizer, $\rho_K=O(K^{-1/d})$ is the embedding diameter of the clusters, and $\mathfrak{R}_n(\mathcal{F})$ is the Rademacher complexity of the router class.

Theorem [1](https://arxiv.org/html/2605.15721#Thmtheorem1) decomposes the routing regret into two terms that are controlled by orthogonal mechanisms in NCCE. Term (I) depends only on the initial catalog: $L\rho_K$ shrinks as the number of clusters $K$ increases, and $\alpha$ reflects the local quality of the warm-up optimizer within each cluster. This term cannot be reduced by enlarging the interaction set $\Omega$ alone, and it is precisely what cluster-based initialization is designed to control. The contrast with global context optimization is direct: a single globally optimized strategy corresponds to $K=1$ and incurs a Lipschitz penalty of $L\cdot\mathrm{diam}(\phi(\mathcal{X}))$ over the entire dataset, which establishes a fundamental ceiling on the accuracy of any global method under the same assumptions.

Term (II) depends only on the size and informativeness of the interaction set and follows the standard form of pairwise Rademacher generalization. This is the term that Context-CF Co-Evolution is designed to reduce, by iteratively expanding $\Omega$ with new instance-context evaluations. Because the two terms are governed by independent quantities, cluster-based initialization and Context-CF Co-Evolution play complementary rather than redundant roles in the NCCE pipeline.

The bound also implies a non-monotone dependence on $K$: increasing $K$ reduces $L\rho_K$ but inflates $\alpha$, since fewer training instances per cluster degrade the warm-up optimizer. An optimal $K^{\star}$ therefore exists and is in general dataset-dependent. Detailed proofs are in Appendix [B](https://arxiv.org/html/2605.15721#A2).
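The non-monotone dependence on $K$ can be illustrated numerically with term (I) of the bound. The theorem only gives $\rho_K=O(K^{-1/d})$; the linear growth of $\alpha$ with $K$ and every constant below are assumptions chosen purely to show the shape of the trade-off, not values from the paper.

```python
def coverage_term(k, eps=0.5, lipschitz=0.3, c_rho=1.0, d=2, c_alpha=0.006):
    """Term (I) of the bound, (alpha + L * rho_K) / eps, under toy assumptions."""
    rho_k = c_rho * k ** (-1.0 / d)      # rho_K = O(K^{-1/d}), constant c_rho assumed
    alpha_k = c_alpha * k                # ASSUMED: alpha grows linearly with K
    return (alpha_k + lipschitz * rho_k) / eps

ks = range(1, 21)
best_k = min(ks, key=coverage_term)
for k in (1, best_k, 20):
    print(k, round(coverage_term(k), 4))
```

Under these assumed forms the term first falls as clusters tighten and then rises as the per-cluster optimality gap grows, so the minimizer lies strictly between the smallest and largest $K$ considered. The location of that minimum depends entirely on the assumed constants and says nothing about the best $K$ for a real dataset.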

## 4 Experiments

We evaluate whether NCCE improves task accuracy by constructing instance-specific contexts. Our code is publicly available (https://anonymous.4open.science/r/Context_Engineering_Collaborative_Filtering-1EF5).

### 4.1 Experimental Setup

Datasets and Metrics. We evaluate NCCE on three reasoning benchmarks with diverse instance heterogeneity: HoVer \[[11](https://arxiv.org/html/2605.15721#bib.bib26)\], SCONE \[[15](https://arxiv.org/html/2605.15721#bib.bib27)\], and HotpotQA \[[35](https://arxiv.org/html/2605.15721#bib.bib28)\], using task accuracy as the metric.

Implementation Details. We use GPT-4o-mini \[[17](https://arxiv.org/html/2605.15721#bib.bib29)\] as the target LLM and frozen text encoders for semantic representations. The context catalog is initialized via semantic clustering and DSPy \[[18](https://arxiv.org/html/2605.15721#bib.bib10)\]-based warm-up optimization. NCCE then performs $T$ rounds of Context-CF Co-Evolution, training the NCF model on sparse instance-context interactions with pairwise ranking loss.

Baselines. We compare NCCE with APE \[[40](https://arxiv.org/html/2605.15721#bib.bib2)\], OPRO \[[34](https://arxiv.org/html/2605.15721#bib.bib3)\], EvoPrompt \[[9](https://arxiv.org/html/2605.15721#bib.bib6)\], TextGrad \[[36](https://arxiv.org/html/2605.15721#bib.bib5)\], GEPA \[[3](https://arxiv.org/html/2605.15721#bib.bib11)\], and MIPROv2 \[[18](https://arxiv.org/html/2605.15721#bib.bib10)\]. Unlike NCCE's instance-wise routing, all baselines learn a single global context strategy. Detailed dataset descriptions are provided in Appendix [F](https://arxiv.org/html/2605.15721#A6).

Table 1: Main accuracy results on HoVer, SCONE, and HotpotQA. All baseline methods optimize a single global context strategy, while NCCE performs instance-wise context routing over a learned context catalog. The best results are shown in bold and the second-best results are underlined. ∗ denotes $p$-value $<0.05$ in the significance test.

### 4.2 Main Results

Table [1](https://arxiv.org/html/2605.15721#S4.T1) reports the main accuracy results. Across all three datasets, NCCE consistently and significantly improves over global context optimization baselines, supporting our core hypothesis that dynamically routing instances to specialized contexts unlocks substantial performance gains. On average, NCCE achieves a test accuracy of 74.8%, outperforming the strongest global baselines, MIPROv2 (68.8%) and GEPA-Merge (68.7%), by 6.0 and 6.1 percentage points, respectively. This advantage holds across varying forms of instance heterogeneity, with NCCE reaching 74.7% on HoVer, 89.7% on SCONE, and 60.1% on HotpotQA, beating the best-performing baseline on each dataset. Overall, the results indicate that learning instance-context preferences is more effective than relying on a single, static context strategy, even when such a strategy is optimized by state-of-the-art global methods.

### 4.3 Ablation Study

Table [2](https://arxiv.org/html/2605.15721#S4.T2) details our ablation study to evaluate the contribution of each NCCE component:

- No routing. Applies a single globally optimal strategy from the evolved catalog to all instances, isolating the effect of instance-wise routing.
- Random routing. Randomly assigns a strategy from the final catalog, testing whether gains come from learned routing rather than catalog expansion.
- Cluster-only routing. Routes instances to the nearest semantic cluster's anchor strategy, bypassing Context-CF Co-Evolution.
- Pointwise loss. Replaces pairwise ranking with pointwise regression to evaluate the effectiveness of relative preference learning.
- Oracle routing. Assigns the ground-truth optimal strategy to each instance, providing an upper-bound performance estimate.

As shown in Table [2](https://arxiv.org/html/2605.15721#S4.T2), removing or replacing any core component of NCCE leads to a noticeable performance drop, confirming their respective contributions. First, simply maintaining a larger context catalog without intelligent routing actively harms performance, as seen by Random routing yielding the lowest average accuracy (69.2%). Furthermore, extracting a single best strategy from the evolved pool (No routing, 72.0%) falls short of full NCCE (74.8%), reinforcing that a global optimum cannot satisfy all instances.

The necessity of the co-evolution phase is supported by Cluster-only routing (72.4%), which shows that while initial cluster-level anchors are helpful, iteratively evolving contexts specifically for failure instances is crucial for pushing accuracy higher. We also observe that training the NCF model with a pairwise ranking loss slightly outperforms Pointwise loss (74.8% vs. 74.3%), suggesting that relative preference learning is better aligned with the ranking nature of context routing. Finally, Oracle routing achieves 84.3% average accuracy. This shows that the co-evolution mechanism generates a capable and diverse context catalog, and it also indicates substantial headroom for future advances in preference modeling.

Table 2: Ablation and routing analysis. Cluster-only routing removes Context-CF Co-Evolution, random routing removes the learned NCF router, and oracle routing reports the upper bound of the final context catalog.

### 4.4 Co-Evolution Effect

To analyze the dynamics of Context-CF Co-Evolution, we track the task scores as the context catalog expands and the preference model is refined. The results are shown in Figure [3](https://arxiv.org/html/2605.15721#S4.F3).

![Refer to caption](https://arxiv.org/html/2605.15721v1/x3.png)

Figure 3: Performance evolution across iterative rounds. The curves track the task scores of NCCE and its ablation variants over six rounds (Round 0–5) on HoVer, SCONE, and HotpotQA. While the full NCCE framework exhibits a sustained upward trajectory, ablated variants either plateau early or show limited gains, demonstrating the effectiveness of the co-evolution process.

Effectiveness of Context-CF Co-Evolution. The evolution curves across all datasets show a consistent upward trajectory for the full NCCE model over five rounds. This indicates that leveraging the preference model to target failure instances generates specialized contexts that continue to improve overall accuracy. Furthermore, compared to the pointwise loss variant, which fluctuates in later rounds, NCCE with pairwise ranking maintains a stable learning curve, suggesting robustness in integrating newly evolved contexts.

Necessity of Instance-Wise Routing. The curves also highlight the limitations of global optimization. While the context catalog expands and improves, the “No routing (global optimal)” and “Random routing” baselines quickly plateau. This indicates that merely generating a diverse pool of high-quality contexts offers marginal gains if the system is constrained to a single, globally averaged strategy. The widening performance gap over successive rounds shows that dynamic routing is essential to unlock the full potential of an evolved context catalog.

### 4.5 Complementary Experiments

#### Cluster Number Effect

We evaluate NCCE across varying semantic cluster counts ($K$) on the HoVer dataset. Increasing $K$ from 1 (a global strategy) to 4 or 5 steadily improves test accuracy, confirming that diverse anchor contexts provide superior starting points for instance-wise routing.

However, at $K=10$, test performance declines despite continued gains on the development set (>78%). This is consistent with the theoretical trade-off in Theorem 1: while a larger $K$ reduces the cluster semantic diameter (lowering the Lipschitz penalty), it severely fragments the training data. This scarcity degrades the local warm-up optimizer, leading to over-specialized anchors that overfit. Thus, identifying a balanced $K^{\star}$ (around 4 or 5) is critical to maximize generalization.

#### Data Density

![Refer to caption](https://arxiv.org/html/2605.15721v1/x4.png)

Figure 4: Performance across different cluster numbers.

![Refer to caption](https://arxiv.org/html/2605.15721v1/x5.png)

Figure 5: Performance across different data densities in the collaborative filtering matrix.

To evaluate the robustness of our NCF preference model against interaction sparsity, we vary the proportion of observed instance-context data used to populate the training matrix $\Omega$ on the SCONE dataset. When utilizing 0% of the interaction data (relying purely on baseline heuristics without collaborative learning), test accuracy sits near 81.3%. Populating the sparse matrix with just 30% of the available interaction data raises test accuracy to roughly 88.6%. Expanding the data density further to 50% and 100% yields diminishing marginal returns on the test set (peaking at 89.7%), even as development accuracy continues climbing to 93.4%. This rapid saturation indicates that the neural recommender is highly sample-efficient. It learns the latent geometry of instance-context preferences from a sparse matrix, suggesting that NCCE achieves strong generalization without requiring exhaustive, computationally expensive LLM evaluations across all instance-context pairs.

#### Routing Context Distribution

To visualize the routing behavior, we project the instance embeddings into a 2D space using t-SNE, where each point’s color represents its assigned context strategy (Figure 6). We quantify the diversity of these assignments using Shannon entropy, defined as $H=-\sum_{j}p_{j}\log p_{j}$, where $p_{j}$ represents the fraction of instances routed to context strategy $j$. Under “Cluster-only” routing, the assignments are visibly coarse and dominated by a few rigid spatial regions, reflecting low assignment entropy (e.g., 0.289 on HoVer and 1.308 on SCONE). In contrast, the “NCCE Full” framework produces a highly interwoven, heterogeneous distribution of context assignments, driving the entropy significantly higher (1.138 on HoVer and 1.976 on SCONE). Even on HotpotQA, where the entropy remains relatively stable (1.175 vs. 1.147), the spatial mixing of colors in the full model is visually apparent. This suggests that the trained NCF router does not simply default to surface-level semantic neighborhoods. Instead, it captures fine-grained, latent compatibility signals, breaking initial cluster boundaries to route instances based on their specific reasoning requirements.
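As a concrete illustration of the entropy measure, the following minimal sketch computes $H$ from a list of routed context labels. The natural logarithm is an assumption, since the base is not stated here, and the label lists are hypothetical.

```python
import math
from collections import Counter


def routing_entropy(assignments):
    """Shannon entropy H = -sum_j p_j log p_j of context assignments (natural log)."""
    counts = Counter(assignments)
    total = len(assignments)
    entropy = 0.0
    for count in counts.values():
        p_j = count / total              # fraction of instances routed to strategy j
        entropy -= p_j * math.log(p_j)
    return entropy


# Hypothetical assignments: one dominant strategy vs. an evenly mixed router.
coarse = ["ctx_a"] * 9 + ["ctx_b"]
mixed = ["ctx_a", "ctx_b", "ctx_c", "ctx_d"] * 3
```

A router that sends nearly everything to one strategy (`coarse`) yields a lower entropy than one that spreads instances evenly (`mixed`), which is the contrast the reported values describe.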

## 5 Related Work

Context Engineering. Large Language Models (LLMs) are highly sensitive to input contexts. While techniques like Chain-of-Thought \[[33](https://arxiv.org/html/2605.15721#bib.bib4)\] effectively elicit complex reasoning, their manual design is labor-intensive. This has motivated automated optimization systems where LLMs generate, select, or refine instructions as black-box optimizers (APE \[[40](https://arxiv.org/html/2605.15721#bib.bib2)\], OPRO \[[34](https://arxiv.org/html/2605.15721#bib.bib3)\], PromptWizard \[[2](https://arxiv.org/html/2605.15721#bib.bib7)\]), optimize multi-stage pipelines (MIPROv2 \[[18](https://arxiv.org/html/2605.15721#bib.bib10)\]), or leverage natural language backpropagation (TextGrad \[[36](https://arxiv.org/html/2605.15721#bib.bib5)\]).

![Refer to caption](https://arxiv.org/html/2605.15721v1/x6.png) Figure 6: t-SNE visualization of context routing assignments. Colors represent different context strategies.

Inspired by biological evolution, a related line of work iteratively mutates prompt populations. These methods connect LLMs with evolutionary algorithms (EvoPrompt \[[9](https://arxiv.org/html/2605.15721#bib.bib6)\], OpenEvolve \[[28](https://arxiv.org/html/2605.15721#bib.bib9)\]), enable self-referential mutator improvements (Promptbreeder \[[8](https://arxiv.org/html/2605.15721#bib.bib8)\]), and utilize reflective or stochastic generative search to outperform traditional reinforcement learning (GEPA \[[3](https://arxiv.org/html/2605.15721#bib.bib11)\], POLCA \[[21](https://arxiv.org/html/2605.15721#bib.bib15)\]). Furthermore, models can dynamically refine contexts through verbal self-reflection \[[29](https://arxiv.org/html/2605.15721#bib.bib13)\], in-context bandit exploration \[[16](https://arxiv.org/html/2605.15721#bib.bib14)\], and diverse adversarial generation \[[24](https://arxiv.org/html/2605.15721#bib.bib12)\].

While existing methods rely on a single global strategy, overlooking instance\-level gains, we propose a Context\-CF Co\-Evolution mechanism\. Guided by a lightweight NCF model, it iteratively evolves specialized contexts for failure instances, enabling dynamic catalog expansion and fine\-grained preference learning\.

Recommendation and Collaborative Filtering. Recommender systems have evolved from early item-based heuristics \[[25](https://arxiv.org/html/2605.15721#bib.bib16),[14](https://arxiv.org/html/2605.15721#bib.bib17)\] to Matrix Factorization (MF) \[[13](https://arxiv.org/html/2605.15721#bib.bib18)\] and Factorization Machines \[[23](https://arxiv.org/html/2605.15721#bib.bib20)\] for capturing latent user-item interactions. For implicit feedback, Bayesian Personalized Ranking (BPR) established the superiority of pairwise ranking loss over pointwise score prediction \[[22](https://arxiv.org/html/2605.15721#bib.bib19)\]. Deep learning further expanded these capabilities through large-scale architectures \[[7](https://arxiv.org/html/2605.15721#bib.bib22),[5](https://arxiv.org/html/2605.15721#bib.bib23)\] and Neural Collaborative Filtering (NCF), which replaces MF’s static inner products with highly expressive multi-layer perceptrons \[[26](https://arxiv.org/html/2605.15721#bib.bib25),[10](https://arxiv.org/html/2605.15721#bib.bib21)\]. To overcome the cold-start limitations of traditional ID-based collaborative filtering, inductive matrix completion models leverage semantic features to generalize to entirely unseen entities \[[37](https://arxiv.org/html/2605.15721#bib.bib24)\].

Our framework directly translates this rich lineage to LLM context engineering. By treating input instances as users and context strategies as items, we frame prompt optimization as an inductive recommendation problem \[[37](https://arxiv.org/html/2605.15721#bib.bib24)\]. Utilizing an NCF architecture \[[10](https://arxiv.org/html/2605.15721#bib.bib21)\] trained with a pairwise ranking objective \[[22](https://arxiv.org/html/2605.15721#bib.bib19)\], our method moves beyond global search to achieve dynamic, instance-wise context routing. A detailed discussion of related work is provided in Appendix [D](https://arxiv.org/html/2605.15721#A4).
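To make the instance-as-user, context-as-item framing concrete, the sketch below scores an (instance, context) pair with a small MLP and applies a BPR-style pairwise loss. The one-hidden-layer architecture, the function names, and all weights are illustrative assumptions and not the exact NCF model used in this work.

```python
import math


def mlp_score(instance_vec, context_vec, hidden_weights, output_weights):
    """Score a pair with a one-hidden-layer ReLU MLP on the concatenated vectors."""
    features = list(instance_vec) + list(context_vec)
    hidden = [max(0.0, sum(w * f for w, f in zip(row, features)))
              for row in hidden_weights]
    return sum(w * h for w, h in zip(output_weights, hidden))


def bpr_loss(score_preferred, score_other):
    """Pairwise ranking loss -log(sigmoid(s_pos - s_neg)), computed stably."""
    margin = score_preferred - score_other
    if margin > 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical 2-d embeddings and hand-picked weights (not learned).
hidden_weights = [[1.0, 0.0, 0.0, 1.0], [-1.0, 0.0, 0.0, 0.0]]
output_weights = [0.5, 1.0]
s_good = mlp_score([1.0, 0.0], [0.0, 1.0], hidden_weights, output_weights)
s_bad = mlp_score([1.0, 0.0], [0.0, 0.0], hidden_weights, output_weights)
loss = bpr_loss(s_good, s_bad)
```

The loss shrinks as the preferred context is scored further above the other one, so minimizing it pushes the model to rank contexts correctly per instance rather than to predict absolute rewards.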

## 6 Conclusion

In this paper, we introduced Neural Collaborative Context Engineering \(NCCE\), reframing automated context optimization from searching for a single global prompt to a dynamic, instance\-wise recommendation problem\. NCCE achieves this through a novel Context\-CF Co\-Evolution mechanism—a synergistic feedback loop between a lightweight NCF model and an LLM reflector that iteratively expands a catalog of specialized contexts\. At inference, the NCF model efficiently routes each query to its optimal strategy\. Experiments across reasoning benchmarks \(HoVer, SCONE, HotpotQA\) demonstrate that NCCE significantly outperforms state\-of\-the\-art global baselines\. Ultimately, our analyses confirm that dynamic routing unlocks substantial performance gains left untapped by static strategies, providing a scalable pathway toward highly adaptive LLM inference\.

## References

- \[1\] J. Achiam, S. Adler, S. Agarwal, L. Ahmad, I. Akkaya, F. L. Aleman, D. Almeida, J. Altenschmidt, S. Altman, S. Anadkat, et al. (2023) GPT-4 technical report. arXiv preprint arXiv:2303.08774.
- \[2\] E. Agarwal, J. Singh, V. Dani, R. Magazine, T. Ganu, and A. Nambi (2024) PromptWizard: task-aware prompt optimization framework. arXiv:2405.18369, [Link](https://arxiv.org/abs/2405.18369).
- \[3\] L. A. Agrawal, S. Tan, D. Soylu, N. Ziems, R. Khare, K. Opsahl-Ong, A. Singhvi, H. Shandilya, M. J. Ryan, M. Jiang, C. Potts, K. Sen, A. G. Dimakis, I. Stoica, D. Klein, M. Zaharia, and O. Khattab (2026) GEPA: reflective prompt evolution can outperform reinforcement learning. arXiv:2507.19457, [Link](https://arxiv.org/abs/2507.19457).
- \[4\] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al. (2020) Language models are few-shot learners. Advances in neural information processing systems 33, pp. 1877–1901.
- \[5\] H. Cheng, L. Koc, J. Harmsen, T. Shaked, T. Chandra, H. Aradhye, G. Anderson, G. Corrado, W. Chai, M. Ispir, et al. (2016) Wide & deep learning for recommender systems. In Proceedings of the 1st workshop on deep learning for recommender systems, pp. 7–10.
- \[6\] S. Clémençon, G. Lugosi, and N. Vayatis (2008) Ranking and empirical minimization of u-statistics.
- \[7\] P. Covington, J. Adams, and E. Sargin (2016) Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM conference on recommender systems, pp. 191–198.
- \[8\] C. Fernando, D. Banarse, H. Michalewski, S. Osindero, and T. Rocktäschel (2023) Promptbreeder: self-referential self-improvement via prompt evolution. arXiv:2309.16797, [Link](https://arxiv.org/abs/2309.16797).
- \[9\] Q. Guo, R. Wang, J. Guo, B. Li, K. Song, X. Tan, G. Liu, J. Bian, and Y. Yang (2025) EvoPrompt: connecting llms with evolutionary algorithms yields powerful prompt optimizers. arXiv:2309.08532, [Link](https://arxiv.org/abs/2309.08532).
- \[10\] X. He, L. Liao, H. Zhang, L. Nie, X. Hu, and T. Chua (2017) Neural collaborative filtering. In Proceedings of the 26th international conference on world wide web, pp. 173–182.
- \[11\] Y. Jiang, S. Bordia, Z. Zhong, C. Dognin, M. Singh, and M. Bansal (2020) HoVer: a dataset for many-hop fact extraction and claim verification. In Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 3441–3460.
- \[12\] O. Khattab, A. Singhvi, P. Maheshwari, Z. Zhang, K. Santhanam, S. Haq, A. Sharma, T. T. Joshi, H. Moazam, H. Miller, et al. (2023) DSPy: compiling declarative language model calls into state-of-the-art pipelines. In The Twelfth International Conference on Learning Representations.
- \[13\] Y. Koren, R. Bell, and C. Volinsky (2009) Matrix factorization techniques for recommender systems. Computer 42(8), pp. 30–37.
- \[14\] G. Linden, B. Smith, and J. York (2003) Amazon.com recommendations: item-to-item collaborative filtering. IEEE Internet computing 7(1), pp. 76–80.
- \[15\] R. Long, P. Pasupat, and P. Liang (2016) Simpler context-dependent logical forms via model projections. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 1456–1465.
- \[16\] G. Monea, A. Bosselut, K. Brantley, and Y. Artzi (2025) LLMs are in-context bandit reinforcement learners. arXiv:2410.05362, [Link](https://arxiv.org/abs/2410.05362).
- \[17\] OpenAI, A. Hurst, A. Lerer, A. P. Goucher, et al. (2024) GPT-4o system card. arXiv:2410.21276, [Link](https://arxiv.org/abs/2410.21276).
- \[18\] K. Opsahl-Ong, M. J. Ryan, J. Purtell, D. Broman, C. Potts, M. Zaharia, and O. Khattab (2024) Optimizing instructions and demonstrations for multi-stage language model programs. arXiv:2406.11695, [Link](https://arxiv.org/abs/2406.11695).
- \[19\] J. Qin, J. Zhu, B. Chen, Z. Liu, W. Liu, R. Tang, R. Zhang, Y. Yu, and W. Zhang (2022) Rankflow: joint optimization of multi-stage cascade ranking systems as flows. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 814–824.
- \[20\] J. Qin, J. Zhu, Y. Liu, J. Gao, J. Ying, C. Liu, D. Wang, J. Feng, C. Deng, X. Wang, et al. (2023) Learning to distinguish multi-user coupling behaviors for tv recommendation. In Proceedings of the sixteenth ACM international conference on web search and data mining, pp. 204–212.
- \[21\] X. Ren, A. Nie, T. Xie, and C. Cheng (2026) POLCA: stochastic generative optimization with llm. arXiv preprint arXiv:2603.14769.
- \[22\] S. Rendle, C. Freudenthaler, Z. Gantner, and L. Schmidt-Thieme (2012) BPR: bayesian personalized ranking from implicit feedback. arXiv preprint arXiv:1205.2618.
- \[23\] S. Rendle (2010) Factorization machines. In 2010 IEEE International conference on data mining, pp. 995–1000.
- \[24\] M. Samvelyan, S. C. Raparthy, A. Lupu, E. Hambro, A. H. Markosyan, M. Bhatt, Y. Mao, M. Jiang, J. Parker-Holder, J. Foerster, T. Rocktäschel, and R. Raileanu (2024) Rainbow teaming: open-ended generation of diverse adversarial prompts. arXiv:2402.16822, [Link](https://arxiv.org/abs/2402.16822).
- \[25\] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl (2001) Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th international conference on World Wide Web, pp. 285–295.
- \[26\] S. Sedhain, A. K. Menon, S. Sanner, and L. Xie (2015) Autorec: autoencoders meet collaborative filtering. In Proceedings of the 24th international conference on World Wide Web, pp. 111–112.
- \[27\] R. Shan, J. Zhu, J. Lin, C. Zhu, B. Chen, R. Tang, Y. Yu, and W. Zhang (2025) Full-stack optimized large language models for lifelong sequential behavior comprehension in recommendation. ACM Transactions on Recommender Systems 4(2), pp. 1–33.
- \[28\] OpenEvolve: an open-source evolutionary coding agent. [Link](https://github.com/algorithmicsuperintelligence/openevolve).
- \[29\] N. Shinn, F. Cassano, E. Berman, A. Gopinath, K. Narasimhan, and S. Yao (2023) Reflexion: language agents with verbal reinforcement learning. arXiv:2303.11366, [Link](https://arxiv.org/abs/2303.11366).
- \[30\] H. Touvron, L. Martin, K. Stone, P. Albert, A. Almahairi, Y. Babaei, N. Bashlykov, S. Batra, P. Bhargava, S. Bhosale, et al. (2023) Llama 2: open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288.
- \[31\] J. Wang, M. Fang, Z. Wan, M. Wen, J. Zhu, A. Liu, Z. Gong, Y. Song, L. Chen, L. M. Ni, et al. (2024) Openr: an open source framework for advanced reasoning with large language models. arXiv preprint arXiv:2410.09671.
- \[32\] W. Wang, F. Wei, L. Dong, H. Bao, N. Yang, and M. Zhou (2020) Minilm: deep self-attention distillation for task-agnostic compression of pre-trained transformers. Advances in neural information processing systems 33, pp. 5776–5788.
- \[33\] J. Wei, X. Wang, D. Schuurmans, M. Bosma, B. Ichter, F. Xia, E. Chi, Q. Le, and D. Zhou (2023) Chain-of-thought prompting elicits reasoning in large language models. arXiv:2201.11903, [Link](https://arxiv.org/abs/2201.11903).
- \[34\] C. Yang, X. Wang, Y. Lu, H. Liu, Q. V. Le, D. Zhou, and X. Chen (2024) Large language models as optimizers. arXiv:2309.03409, [Link](https://arxiv.org/abs/2309.03409).
- \[35\] Z. Yang, P. Qi, S. Zhang, Y. Bengio, W. Cohen, R. Salakhutdinov, and C. D. Manning (2018) HotpotQA: a dataset for diverse, explainable multi-hop question answering. In Proceedings of the 2018 conference on empirical methods in natural language processing, pp. 2369–2380.
- \[36\] M. Yuksekgonul, F. Bianchi, J. Boen, S. Liu, Z. Huang, C. Guestrin, and J. Zou (2024) TextGrad: automatic "differentiation" via text. arXiv:2406.07496, [Link](https://arxiv.org/abs/2406.07496).
- \[37\] M. Zhang and Y. Chen (2019) Inductive matrix completion based on graph neural networks. arXiv preprint arXiv:1904.12058.
- \[38\] W. X. Zhao, K. Zhou, J. Li, T. Tang, X. Wang, Y. Hou, Y. Min, B. Zhang, J. Zhang, Z. Dong, et al. (2023) A survey of large language models. arXiv preprint arXiv:2303.18223, 1(2), pp. 1–124.
- \[39\] Z. Zhao, E. Wallace, S. Feng, D. Klein, and S. Singh (2021) Calibrate before use: improving few-shot performance of language models. In International conference on machine learning, pp. 12697–12706.
- \[40\] Y. Zhou, A. I. Muresanu, Z. Han, K. Paster, S. Pitis, H. Chan, and J. Ba (2023) Large language models are human-level prompt engineers. arXiv:2211.01910, [Link](https://arxiv.org/abs/2211.01910).
- \[41\] J. Zhu, J. Lin, X. Dai, B. Chen, R. Shan, J. Zhu, R. Tang, Y. Yu, and W. Zhang (2024) Lifelong personalized low-rank adaptation of large language models for recommendation. arXiv:2408.03533, [Link](https://arxiv.org/abs/2408.03533).
- \[42\] J. Zhu, Y. Wang, J. Lin, J. Qin, R. Tang, W. Zhang, and Y. Yu (2024) M-scan: a multi-scenario causal-driven adaptive network for recommendation. In Proceedings of the ACM Web Conference 2024, pp. 3844–3853.

## Appendix A Overall Algorithm

Algorithm 1: Neural Collaborative Context Engineering (NCCE)

1. Input: training instances $X$, warm-up optimizer $\mathcal{A}$, number of clusters $K$, number of evolution iterations $T$, LLM evaluator $R(\cdot,\cdot)$
2. Output: final context catalog $P_{T}$, trained preference model $f_{\theta}$
3. Embed training instances: $e_{i}=\phi(x_{i})$ for all $x_{i}\in X$
4. Partition instances into semantic clusters: $\mathcal{C}_{1},\ldots,\mathcal{C}_{K}=\mathrm{KMeans}(\{e_{i}\}_{i=1}^{N})$
5. Initialize context catalog $P_{0}=\emptyset$
6. for each cluster $\mathcal{C}_{k}$ do
7. Generate cluster-specific strategies $P_{k}^{0}=\mathcal{A}(\mathcal{C}_{k})$
8. Update catalog $P_{0}\leftarrow P_{0}\cup P_{k}^{0}$
9. end for
10. Evaluate selected pairs from $X\times P_{0}$ with the LLM evaluator $R$
11. Construct initial interaction set $\Omega_{0}$
12. for $t=0,\ldots,T-1$ do
13. Train preference model $f_{\theta}$ on $\Omega_{t}$ using pairwise ranking loss
14. Use $f_{\theta}$ to score candidate instance-context pairs
15. Select seed contexts based on predicted suitability and diversity
16. Generate new context strategies $\Delta P_{t}$ via Algorithm [2](https://arxiv.org/html/2605.15721#alg2): Gradient-guided Context Evolution
17. Evaluate selected pairs from $X\times\Delta P_{t}$ with $R$, obtaining $\Delta\Omega_{t}$
18. Update catalog: $P_{t+1}\leftarrow P_{t}\cup\Delta P_{t}$
19. Update interactions: $\Omega_{t+1}\leftarrow\Omega_{t}\cup\Delta\Omega_{t}$
20. end for
21. Train final preference model $f_{\theta}$ on $\Omega_{T}$
22. return $P_{T},f_{\theta}$
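The control flow of Algorithm 1 can be sketched as a short driver that takes each component as a callable. This is a minimal sketch under stated assumptions: the function `run_ncce`, its parameter names, and every `toy_*` stub are hypothetical placeholders, and the seed scoring and selection of steps 14 and 15 are folded into the `evolve` callable.

```python
def run_ncce(instances, cluster_fn, warmup_optimizer, evaluate, train_model,
             evolve, num_clusters, num_iterations):
    """Outer loop of Algorithm 1 with every component injected as a callable."""
    clusters = cluster_fn(instances, num_clusters)            # steps 3-4
    catalog = []                                              # step 5
    for cluster in clusters:                                  # steps 6-9
        catalog.extend(warmup_optimizer(cluster))
    interactions = dict(evaluate(instances, catalog))         # steps 10-11
    for _ in range(num_iterations):                           # steps 12-20
        model = train_model(interactions)                     # step 13
        new_contexts = evolve(model, catalog, interactions)   # steps 14-16
        interactions.update(evaluate(instances, new_contexts))  # steps 17, 19
        catalog = catalog + new_contexts                      # step 18
    model = train_model(interactions)                         # step 21
    return catalog, model                                     # step 22


# Toy stand-ins (all hypothetical) so the control flow can be exercised.
def toy_cluster(instances, k):
    return [instances[i::k] for i in range(k)]

def toy_warmup(cluster):
    return [f"anchor for {cluster[0]}"]

def toy_evaluate(instances, contexts):
    return {(x, p): float(len(x) % 2 == len(p) % 2) for x in instances for p in contexts}

def toy_train(interactions):
    return dict(interactions)  # the "model" simply memorizes observed rewards

def toy_evolve(model, catalog, interactions):
    return [f"variant {len(catalog)}"]

catalog, model = run_ncce(["q1", "q2", "q3", "q4"], toy_cluster, toy_warmup,
                          toy_evaluate, toy_train, toy_evolve,
                          num_clusters=2, num_iterations=3)
```

In a real run the stubs would be replaced by K-means over instance embeddings, the warm-up optimizer, the LLM evaluator, NCF training, and the evolution step; here the evaluator scores every pair, whereas the algorithm evaluates only selected pairs.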

Algorithm 2: Gradient-guided Context Evolution

1. Input: current catalog $P_{t}$, observed interactions $\Omega_{t}$, trained NCF model $f_{\theta}$, evaluator $R$, failure batch size $m$, number of latent seeds $k$, gradient steps $G$, learning rate $\eta$, LLM reflector $\mathcal{M}$
2. Output: updated catalog $P_{t+1}$, updated interactions $\Omega_{t+1}$
3. Train $f_{\theta}$ on $\Omega_{t}$ with pairwise ranking loss
4. Fix the parameters of $f_{\theta}$
5. Identify failure instances: $\mathcal{F}_{t}=\{x_{i}\in X\mid R(x_{i},p_{j})=0,\ \forall p_{j}\in P_{t}\}$
6. Sample $m$ failure instances: $\mathcal{B}_{t}\sim\mathcal{F}_{t}$
7. Randomly sample $k$ seed context strategies: $\{p_{a_{1}},\ldots,p_{a_{k}}\}\sim P_{t}$
8. for $\ell=1,\ldots,k$ in parallel do
9. Initialize latent context embedding: $h_{\ell}^{(0)}=\psi(p_{a_{\ell}})$
10. for $\tau=0,\ldots,G-1$ do
11. Compute failure-batch objective: $\mathcal{J}(h_{\ell}^{(\tau)};\mathcal{B}_{t})=\frac{1}{m}\sum_{x_{i}\in\mathcal{B}_{t}}s_{\theta}(h_{\ell}^{(\tau)},x_{i})$
12. Update by gradient ascent: $h_{\ell}^{(\tau+1)}=\operatorname{Normalize}\left(h_{\ell}^{(\tau)}+\eta\nabla_{h_{\ell}^{(\tau)}}\mathcal{J}(h_{\ell}^{(\tau)};\mathcal{B}_{t})\right)$
13. end for
14. Set optimized target embedding: $\tilde{h}_{\ell}=h_{\ell}^{(G)}$
15. end for
16. for each context strategy $p_{j}\in P_{t}$ do
17. Compute average distance to optimized targets: $d(p_{j})=\frac{1}{k}\sum_{\ell=1}^{k}\left\|\psi(p_{j})-\tilde{h}_{\ell}\right\|_{2}$
18. end for
19. Select the most promising existing context: $p_{\mathrm{pot}}=\arg\min_{p_{j}\in P_{t}}d(p_{j})$
20. Use LLM reflection to evolve $p_{\mathrm{pot}}$: $p_{\mathrm{new}}=\mathcal{M}(p_{\mathrm{pot}},\mathcal{B}_{t})$
21. Evaluate $p_{\mathrm{new}}$ on selected training instances and obtain $\Delta\Omega_{t}$
22. Update catalog: $P_{t+1}=P_{t}\cup\{p_{\mathrm{new}}\}$
23. Update interactions: $\Omega_{t+1}=\Omega_{t}\cup\Delta\Omega_{t}$
24. return $P_{t+1},\Omega_{t+1}$
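Steps 8 to 19 of Algorithm 2 can be illustrated numerically with a minimal sketch. It assumes a dot-product score $s_{\theta}(h,x)=h^{\top}e_{x}$ so that the gradient has a closed form; the actual NCF scorer is an MLP and would require automatic differentiation. All function names and embeddings are hypothetical, and the seeds are picked deterministically here rather than sampled.

```python
import math


def normalize(vec):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def failure_objective(h, failure_embs):
    """J(h; B) = mean score of latent context h over the failure batch."""
    return sum(dot(h, e) for e in failure_embs) / len(failure_embs)


def ascend(seed_emb, failure_embs, steps, lr):
    """Normalized gradient ascent on J, starting from a seed context embedding."""
    dim = len(seed_emb)
    # For the assumed dot-product score, grad_h J is the mean failure embedding.
    grad = [sum(e[d] for e in failure_embs) / len(failure_embs) for d in range(dim)]
    h = list(seed_emb)
    for _ in range(steps):
        h = normalize([h_d + lr * g_d for h_d, g_d in zip(h, grad)])
    return h


def select_most_promising(catalog_embs, targets):
    """Index of the catalog context with the smallest mean L2 distance to the targets."""
    def avg_distance(p):
        return sum(math.dist(p, t) for t in targets) / len(targets)
    return min(range(len(catalog_embs)), key=lambda j: avg_distance(catalog_embs[j]))


# Hypothetical 2-d embeddings for three catalog contexts and two failure instances.
catalog_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
failure_embs = [[0.0, 1.0], [0.6, 0.8]]
seeds = catalog_embs[:2]
targets = [ascend(seed, failure_embs, steps=5, lr=0.5) for seed in seeds]
best_index = select_most_promising(catalog_embs, targets)
```

The selected context (`catalog_embs[best_index]`) plays the role of $p_{\mathrm{pot}}$, which the LLM reflector would then rewrite using the failure batch; that reflection step is outside the scope of this sketch.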

## Appendix B Proof of Theorem [1](https://arxiv.org/html/2605.15721#Thmtheorem1)

We bound the probability that the trained router $\hat{f}_{\theta}$ selects a sub-optimal context for an unseen test instance, recovering the form stated in Theorem [1](https://arxiv.org/html/2605.15721#Thmtheorem1).

We begin by analyzing a simpler reference router $f^{\mathrm{cluster}}$ that maps each instance $x\in\mathcal{C}_{k}$ to its cluster’s anchor context $p_{k}^{0}$, and bound its expected regret within each cluster. Fix a cluster $\mathcal{C}_{k}$ and let $x_{k}^{\star}\in\arg\max_{x\in\mathcal{C}_{k}}\,(\max_{p}r(x,p)-r(x,p_{k}^{0}))$ be a worst-case instance in the cluster. By Assumption [1](https://arxiv.org/html/2605.15721#Thmassumption1) applied with $p^{\prime}=p_{k}^{0}$, for any $x\in\mathcal{C}_{k}$ and any $p\in P$,

$$\bigl|(r(x,p)-r(x,p_{k}^{0}))-(r(x_{k}^{\star},p)-r(x_{k}^{\star},p_{k}^{0}))\bigr|\;\leq\;L\,\|\phi(x)-\phi(x_{k}^{\star})\|\;\leq\;L\rho_{K},$$

where the second inequality uses the standard $K$-means diameter bound $\mathrm{diam}(\phi(\mathcal{C}_{k}))\leq\rho_{K}$. Maximizing over $p\in P$ on both sides gives

$$\max_{p\in P}r(x,p)-r(x,p_{k}^{0})\;\leq\;\max_{p\in P}r(x_{k}^{\star},p)-r(x_{k}^{\star},p_{k}^{0})\;+\;L\rho_{K}.$$

Taking expectation over $x\sim\mathcal{D}\,|\,\mathcal{C}_{k}$ and using the anchor quality of the warm-up optimizer, $\mathbb{E}_{x\,|\,\mathcal{C}_{k}}[r(x,p_{k}^{0})]\geq r_{k}^{\star}-\alpha$, we obtain

$$\mathbb{E}_{x\,|\,\mathcal{C}_{k}}\!\Bigl[\max_{p\in P}r(x,p)-r(x,p_{k}^{0})\Bigr]\;\leq\;\alpha+L\rho_{K}.$$

Marginalizing over clusters yields a bound on the expected regret of the cluster-anchor router:

$$\mathbb{E}_{x\sim\mathcal{D}}\!\bigl[\Delta^{\mathrm{cluster}}(x)\bigr]\;\leq\;\alpha+L\rho_{K},\tag{1}$$

where $\Delta^{\mathrm{cluster}}(x)=\max_{p}r(x,p)-r(x,f^{\mathrm{cluster}}(x))$.

We now relate the trained router $\hat{f}_{\theta}$ to this reference. The cluster-anchor router corresponds to a lookup function over cluster assignments and is contained in the hypothesis class $\mathcal{F}$ of NCF-based routers, since the projection layers $W_{x}$ and $W_{p}$ can represent cluster-conditional preferences. Consequently, the population pairwise misranking error of $\hat{f}_{\theta}$ satisfies

$$\mathcal{R}(\hat{f}_{\theta})\;\leq\;\mathcal{R}(f^{\mathrm{cluster}})+\xi_{n},$$

where $\xi_{n}=\widehat{\mathcal{R}}_{n}+\mathfrak{R}_{n}(\mathcal{F})+\sqrt{\log(1/\delta)/(2n)}$ is the standard pairwise Rademacher generalization gap that holds with probability at least $1-\delta$. Using the standard reduction from pairwise misranking to instance-wise regret \[[6](https://arxiv.org/html/2605.15721#bib.bib1)\], the regret of any router is upper-bounded by its pairwise misranking error up to a constant factor, and combining this with ([1](https://arxiv.org/html/2605.15721#A2.E1)) gives

$$\mathbb{E}_{x\sim\mathcal{D}}\!\bigl[\Delta(x)\bigr]\;\leq\;\alpha+L\rho_{K}+\xi_{n}.\tag{2}$$

Finally, since $\Delta(x)\in[0,1]$ is non-negative, applying Markov’s inequality to ([2](https://arxiv.org/html/2605.15721#A2.E2)) yields

$$\Pr_{x\sim\mathcal{D}}\!\bigl[\Delta(x)>\varepsilon\bigr]\;\leq\;\frac{\mathbb{E}[\Delta(x)]}{\varepsilon}\;\leq\;\frac{\alpha+L\rho_{K}}{\varepsilon}\;+\;\frac{\xi_{n}}{\varepsilon}.$$

Absorbing the $1/\varepsilon$ factor on the generalization term into the constants of the standard Rademacher bound recovers the form stated in Theorem [1](https://arxiv.org/html/2605.15721#Thmtheorem1):

$$\Pr_{x\sim\mathcal{D}}\!\bigl[\Delta(x)>\varepsilon\bigr]\;\leq\;\frac{\alpha+L\rho_{K}}{\varepsilon}\;+\;\widehat{\mathcal{R}}_{n}+\mathfrak{R}_{n}(\mathcal{F})+\sqrt{\tfrac{\log(1/\delta)}{2n}}.$$

∎

#### Remarks\.

The Markov step above is loose for small $\varepsilon$, and can be tightened to a Bernstein- or McDiarmid-type bound when a variance estimate of $\Delta(x)$ is available; we adopt the simpler form because it already exposes the qualitative dependence on $K$, $\alpha$, and $L\rho_{K}$. The catalog size $M$ enters the bound implicitly through $\mathfrak{R}_{n}(\mathcal{F})$, which grows mildly with $M$ for typical NCF architectures, explaining why expanding the catalog without growing the interaction set $\Omega$ does not by itself reduce regret.
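To see how the terms of the final bound trade off, the following minimal sketch evaluates its right-hand side for given constants. The function name and every numeric value are hypothetical and chosen only for illustration; none of them is estimated from the experiments.

```python
import math


def suboptimality_bound(alpha, lipschitz, rho_k, epsilon,
                        empirical_risk, rademacher, n, delta):
    """Right-hand side of the final bound on Pr[Delta(x) > epsilon]."""
    if epsilon <= 0 or n <= 0 or not 0.0 < delta < 1.0:
        raise ValueError("need epsilon > 0, n > 0 and 0 < delta < 1")
    approximation = (alpha + lipschitz * rho_k) / epsilon      # anchor + Lipschitz term
    generalization = (empirical_risk + rademacher
                      + math.sqrt(math.log(1.0 / delta) / (2.0 * n)))
    return approximation + generalization


# Hypothetical constants: finer clusters shrink rho_k but leave fewer samples.
base = dict(alpha=0.02, lipschitz=1.0, epsilon=0.25,
            empirical_risk=0.05, rademacher=0.04, delta=0.05)
few_clusters = suboptimality_bound(rho_k=0.10, n=2000, **base)
many_clusters = suboptimality_bound(rho_k=0.03, n=200, **base)
```

Shrinking $\rho_{K}$ lowers the first term, while a smaller effective $n$ raises the confidence term, which mirrors the trade-off in $K$ discussed in the ablation; in practice $\alpha$ and the Rademacher term would also change with $K$, which this sketch holds fixed.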

## Appendix C Experiment Settings

### C.1 Datasets

We evaluate NCCE on three reasoning\-oriented benchmarks: HoVer, SCONE, and HotpotQA\. These datasets cover different forms of instance heterogeneity and therefore provide a suitable testbed for instance\-wise context construction\.

#### HoVer\.

HoVer \[[11](https://arxiv.org/html/2605.15721#bib.bib26)\] is a multi-hop fact verification benchmark. Each instance requires verifying a claim based on evidence that may span multiple documents. This task benefits from context strategies that encourage evidence grounding, decomposition, and careful verification.

#### SCONE\.

SCONE [[15](https://arxiv.org/html/2605.15721#bib.bib27)] is a context-dependent semantic parsing and state-tracking benchmark. Each instance consists of natural language instructions grounded in an evolving world state. This task requires context strategies that help the model track state transitions and produce constrained outputs.

#### HotpotQA\.

HotpotQA [[35](https://arxiv.org/html/2605.15721#bib.bib28)] is a multi-hop question answering benchmark. Each instance requires reasoning over multiple pieces of evidence to produce the final answer. This task is useful for testing whether NCCE can route different questions to context strategies with appropriate reasoning formats and demonstrations.

Across all datasets, we use task accuracy as the primary evaluation metric\. For each instance, the prediction is considered correct if the final answer matches the ground\-truth label or answer under the dataset\-specific evaluation protocol\.
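A minimal sketch of this metric is given below. It assumes a normalized exact-match comparison (lowercasing and whitespace collapsing); the actual dataset-specific protocols may differ, and the helper names are hypothetical.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace (an assumed normalization)."""
    return " ".join(answer.strip().lower().split())


def task_accuracy(predictions, references) -> float:
    """Fraction of predictions that match their reference after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not predictions:
        raise ValueError("cannot compute accuracy on an empty set")
    correct = sum(
        normalize(p) == normalize(r) for p, r in zip(predictions, references)
    )
    return correct / len(predictions)


print(task_accuracy(["SUPPORTED ", "not supported"], ["supported", "supported"]))
```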

### C.2 Base LLM and Implementation Details

We use GPT\-4o\-mini as the target LLM for all experiments\. During training, context strategies are evaluated by applying them to selected training instances and computing task accuracy\. These observed accuracies form the sparse instance\-context interaction matrix used to train the NCF preference model\.
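The sparse instance-context interaction matrix can be pictured as a dictionary keyed by (instance, context) pairs, holding only the pairs that were actually evaluated. The sketch below is one possible representation under that assumption; the record format and function names are hypothetical.

```python
from collections import defaultdict


def build_interactions(records):
    """Aggregate (instance_id, context_id, correct) records into mean accuracies.

    Only evaluated pairs appear in the result, so the matrix stays sparse.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for instance_id, context_id, correct in records:
        key = (instance_id, context_id)
        sums[key] += float(correct)
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


def density(interactions, n_instances: int, n_contexts: int) -> float:
    """Share of the full instance-by-context grid that has been observed."""
    return len(interactions) / (n_instances * n_contexts)


records = [("x1", "c1", 1), ("x1", "c1", 0), ("x2", "c2", 1)]
matrix = build_interactions(records)
print(matrix, density(matrix, n_instances=2, n_contexts=2))
```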

For semantic encoding, we use a frozen text encoder to represent both input instances and context strategies. The NCF model is trained with the pairwise ranking loss described in Section [3](https://arxiv.org/html/2605.15721#S3). Unless otherwise specified, NCCE uses $K$ semantic clusters for initialization and performs $T$ rounds of context-CF co-evolution.
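As an illustration of a pairwise ranking objective, the sketch below uses a BPR-style loss, $-\log\sigma(s^{+}-s^{-})$, and derives preference pairs from observed accuracies. This is an assumption for illustration: the exact loss used by NCCE is the one defined in Section 3, and the function names here are hypothetical.

```python
import math
from itertools import combinations


def bpr_loss(score_pos: float, score_neg: float) -> float:
    """-log(sigmoid(score_pos - score_neg)), computed in a numerically stable way."""
    diff = score_pos - score_neg
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def preference_pairs(interactions):
    """Turn {(instance, context): accuracy} into (instance, better, worse) triples."""
    by_instance = {}
    for (instance_id, context_id), acc in interactions.items():
        by_instance.setdefault(instance_id, []).append((context_id, acc))
    pairs = []
    for instance_id, scored in by_instance.items():
        for (c1, a1), (c2, a2) in combinations(scored, 2):
            if a1 > a2:
                pairs.append((instance_id, c1, c2))
            elif a2 > a1:
                pairs.append((instance_id, c2, c1))
            # Ties carry no preference signal and are skipped.
    return pairs


observed = {("x", "a"): 1.0, ("x", "b"): 0.0, ("y", "a"): 1.0, ("y", "b"): 1.0}
print(preference_pairs(observed), bpr_loss(2.0, 0.5))
```

The loss depends only on the score difference, so it rewards ranking the better context above the worse one rather than predicting absolute accuracy.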

The warm\-up optimizer in cluster\-based initialization is MIPROv2\. We emphasize that MIPROv2 plays two roles in our experiments: it serves as a strong global optimization baseline, and it is also used as a replaceable local optimizer for generating cluster\-specific anchor contexts in NCCE\.

### C.3 Baselines

We compare NCCE with representative automatic prompt and context optimization methods\. For all baselines, the optimized result is used as a single global context strategy for all test instances\. This setting directly contrasts global context optimization with NCCE’s instance\-wise routing\.
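The difference between the two settings can be illustrated on a toy outcome matrix. The sketch below compares the best single global context with an oracle that picks the best context per instance; the matrix is hypothetical, and the oracle is only an upper bound that a learned router can at best approach.

```python
def global_best_accuracy(outcomes) -> float:
    """Accuracy of the single context that is best on average.

    outcomes[i][j] is 1 if context j solves instance i, else 0.
    """
    n_instances = len(outcomes)
    n_contexts = len(outcomes[0])
    per_context = [
        sum(row[j] for row in outcomes) / n_instances for j in range(n_contexts)
    ]
    return max(per_context)


def oracle_routing_accuracy(outcomes) -> float:
    """Accuracy when every instance is routed to its own best context."""
    return sum(max(row) for row in outcomes) / len(outcomes)


# Hypothetical outcomes: 4 instances, 2 contexts.
toy = [
    [1, 0],
    [1, 0],
    [0, 1],
    [0, 0],
]
print(global_best_accuracy(toy), oracle_routing_accuracy(toy))
```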

#### APE\.

Automatic Prompt Engineer generates and selects task instructions from a pool of LLM\-proposed candidates, optimizing the instruction according to a task\-specific score\.

#### OPRO\.

Optimization by PROmpting uses an LLM as a black\-box optimizer\. It iteratively conditions on previous candidate prompts and their scores to generate improved prompts\.

#### EvoPrompt\.

EvoPrompt connects LLMs with evolutionary algorithms\. It maintains a population of prompts and applies evolutionary operators to generate new candidates, which are selected according to development\-set performance\.

#### TextGrad\.

TextGrad optimizes textual variables by backpropagating natural\-language feedback produced by LLMs\. We use it to optimize the textual components of the context strategy\.

#### MIPROv2\.

MIPROv2 optimizes instructions and few\-shot demonstrations for downstream task performance\. In our experiments, MIPROv2 serves both as a strong global optimization baseline and as the replaceable warm\-up optimizer used in NCCE’s cluster\-based initialization\.

#### GEPA\.

GEPA is a reflective prompt evolution method that uses LLM\-generated reflections over execution traces to propose prompt updates and selects candidates through Pareto\-aware evolutionary search\.

## Appendix D Related Works

### D.1 Prompt and Context Engineering

Foundational Prompting and Sensitivity. Large Language Models (LLMs) are highly sensitive to their input contexts, including instructions, few-shot examples, and reasoning formats. Foundational techniques such as Chain-of-Thought prompting have demonstrated that structuring the input context to include intermediate reasoning steps can elicit complex reasoning capabilities in LLMs [[33](https://arxiv.org/html/2605.15721#bib.bib4)]. However, the manual design of these prompts is labor-intensive, motivating the development of automated context optimization techniques.

Automated Prompt Optimization. To systematically discover effective contexts, a substantial body of work treats prompt engineering as an automated search problem. Automatic Prompt Engineer (APE) demonstrates that LLMs can act as human-level prompt engineers by generating and selecting optimal task instructions from a candidate pool [[40](https://arxiv.org/html/2605.15721#bib.bib2)]. Similarly, Optimization by PROmpting (OPRO) utilizes LLMs as black-box optimizers to iteratively condition on previous prompts and generate improvements [[34](https://arxiv.org/html/2605.15721#bib.bib3)]. TextGrad introduces a paradigm of automatic differentiation by backpropagating natural language feedback to optimize textual variables [[36](https://arxiv.org/html/2605.15721#bib.bib5)]. To handle more complex pipelines, MIPROv2 optimizes both instructions and few-shot demonstrations for multi-stage language model programs [[18](https://arxiv.org/html/2605.15721#bib.bib10)], while PromptWizard provides a framework specifically designed for task-aware prompt optimization [[2](https://arxiv.org/html/2605.15721#bib.bib7)].

Evolutionary and Self-Improving Prompts. Inspired by biological evolution, several methods maintain a population of prompt candidates and iteratively improve them. EvoPrompt connects LLMs with evolutionary algorithms, applying evolutionary operators to yield powerful prompt optimizers [[9](https://arxiv.org/html/2605.15721#bib.bib6)]. Pushing this further, Promptbreeder achieves self-referential self-improvement by simultaneously evolving both the prompts and the task-specific mutation operators [[8](https://arxiv.org/html/2605.15721#bib.bib8)]. Open-source initiatives like OpenEvolve have also extended these evolutionary capabilities to specific domains such as coding agents [[28](https://arxiv.org/html/2605.15721#bib.bib9)]. Recent advancements highlight the efficiency of this paradigm: GEPA leverages reflective prompt evolution to outperform traditional reinforcement learning [[3](https://arxiv.org/html/2605.15721#bib.bib11)], and POLCA introduces a stochastic generative optimization framework utilizing LLMs [[21](https://arxiv.org/html/2605.15721#bib.bib15)].

Reflective and Bandit-Based Strategies. Beyond evolutionary search, recent literature explores self-correction and dynamic exploration. Reflexion equips language agents with verbal reinforcement learning, allowing them to iterate and improve their behavior through generated self-reflections [[29](https://arxiv.org/html/2605.15721#bib.bib13)]. Other frameworks conceptualize LLMs as in-context bandit reinforcement learners to balance exploration and exploitation during interaction [[16](https://arxiv.org/html/2605.15721#bib.bib14)]. Finally, methods like Rainbow Teaming utilize open-ended generation to produce diverse adversarial prompts, improving model robustness and uncovering failure modes [[24](https://arxiv.org/html/2605.15721#bib.bib12)].

While these general and evolutionary formulations produce strong optimization systems, they impose a restrictive assumption that a single, globally applied strategy can serve all diverse instances equally well, often leaving substantial instance\-level performance gains untapped\. In contrast, our work introduces a Context\-CF Co\-Evolution mechanism that acts as a synergistic feedback loop\. By utilizing a lightweight NCF model as a differentiable guide, our method identifies "blind spots" and iteratively evolves new, specialized context variants specifically tailored for failure instances, ensuring dynamic catalog expansion and granular preference understanding rather than relying on a static global prompt\.

### D.2 Recommendation and Collaborative Filtering

Recommender systems have a rich history of learning latent preference structures from sparse user-item interaction signals. Early collaborative filtering (CF) approaches relied heavily on neighborhood-based methods, particularly item-to-item similarity algorithms, which proved highly scalable and effective for commercial applications [[25](https://arxiv.org/html/2605.15721#bib.bib16), [14](https://arxiv.org/html/2605.15721#bib.bib17)]. This paradigm evolved significantly with the popularization of Matrix Factorization (MF), which projects users and items into a shared latent space to predict interactions based on inner products [[13](https://arxiv.org/html/2605.15721#bib.bib18)]. To address severe data sparsity and incorporate arbitrary side features, Factorization Machines (FM) were later introduced as a robust framework for context-aware recommendation [[23](https://arxiv.org/html/2605.15721#bib.bib20)]. Crucially, for scenarios driven by implicit feedback, Bayesian Personalized Ranking (BPR) established that optimizing for relative preferences via a pairwise ranking loss yields superior performance compared to absolute pointwise score prediction [[22](https://arxiv.org/html/2605.15721#bib.bib19)].

The integration of deep learning further transformed recommendation architectures by enabling the capture of complex, non-linear user-item relationships. Early neural adaptations successfully applied autoencoders to reconstruct collaborative filtering inputs [[26](https://arxiv.org/html/2605.15721#bib.bib25)]. Subsequently, the industry shifted toward massive deep neural networks capable of multi-stage candidate generation and ranking [[7](https://arxiv.org/html/2605.15721#bib.bib22)], as well as Wide & Deep architectures that balance the memorization of shallow models with the generalization capabilities of deep networks [[5](https://arxiv.org/html/2605.15721#bib.bib23)]. Most notably, Neural Collaborative Filtering (NCF) generalized traditional MF by replacing the static inner product with a learnable multi-layer perceptron, significantly enhancing the model’s expressive power for user-item matching [[10](https://arxiv.org/html/2605.15721#bib.bib21), [19](https://arxiv.org/html/2605.15721#bib.bib39), [42](https://arxiv.org/html/2605.15721#bib.bib41), [20](https://arxiv.org/html/2605.15721#bib.bib42)].

Despite these successes, traditional ID-based CF inherently struggles with the cold-start problem, lacking the ability to generalize to unseen users or items. To overcome this limitation, inductive matrix completion methods leverage underlying semantic features (e.g., via graph neural networks) rather than fixed identity embeddings, allowing models to infer preferences for out-of-vocabulary entities [[37](https://arxiv.org/html/2605.15721#bib.bib24)].

More recently, a growing body of work applies LLMs to recommendation, combining their open-world semantic knowledge with traditional CF signals [[41](https://arxiv.org/html/2605.15721#bib.bib40), [27](https://arxiv.org/html/2605.15721#bib.bib48)].

Our framework is deeply inspired by this technological lineage. By conceptualizing input instances as users and context strategies as items, we frame LLM context engineering as an inductive recommendation problem [[37](https://arxiv.org/html/2605.15721#bib.bib24)]. We utilize NCF [[10](https://arxiv.org/html/2605.15721#bib.bib21)] trained with a pairwise ranking objective [[22](https://arxiv.org/html/2605.15721#bib.bib19)] to perform dynamic, instance-wise context routing, effectively bridging mature CF principles with the frontier of automated prompt optimization.

## Appendix E Limitations

While NCCE demonstrates substantial improvements in instance\-wise context routing, we acknowledge a few boundaries of our current study that present natural avenues for future research\.

First, the Context\-CF Co\-Evolution phase relies on an LLM reflector to iteratively generate and evaluate new contexts\. While this is a one\-time, offline training cost and inference remains highly efficient, scaling the co\-evolution process to massive datasets incurs unavoidable API latency and computational overhead\. Future work could explore using smaller, locally hosted models for the reflection step to reduce dependency on proprietary APIs during training\.

Second, our experiments primarily validate the framework using a single, highly capable base LLM\. Although the NCF routing mechanism is inherently model\-agnostic, exploring the cross\-model transferability of the learned context catalog—for instance, evaluating whether contexts evolved using a frontier model can be effectively routed to smaller, open\-weight models—remains an interesting open question\.

## Appendix F Experimental Details

### F.1 Dataset Partitioning and Statistics

To evaluate the effectiveness of the instance\-wise context routing, we adopt a systematic partitioning strategy for all three datasets\.

#### Data Splitting Protocol

For each dataset, we first aggregate all available samples and perform a random shuffle\. A fixed number of instances are then held out as a Test Set to ensure an unbiased final evaluation\. The remaining samples form a Training/Dev Pool\.

#### Cluster\-based Partitioning

To facilitate the Context\-CF co\-evolution, we apply K\-means clustering to this pool based on the input text embeddings\. Within each cluster, samples are further divided into Training and Dev sets using a 1:1 ratio\. This cluster\-aware split ensures that both the router training and the context evolution stages have balanced access to the diverse semantic patterns identified during the clustering phase\.
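The two-stage protocol above (random hold-out, then a 1:1 split inside each cluster) can be sketched as follows. The cluster assignment is taken as given, standing in for the output of K-means on the input embeddings; the function names, seeds, and toy data are hypothetical.

```python
import random
from collections import defaultdict


def holdout_split(items, n_test: int, seed: int = 0):
    """Shuffle all items and hold out a fixed number as the test set."""
    if not 0 <= n_test <= len(items):
        raise ValueError("n_test must be between 0 and len(items)")
    rng = random.Random(seed)
    shuffled = list(items)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]  # (train/dev pool, test set)


def cluster_aware_split(pool, cluster_of, seed: int = 0):
    """Split each cluster's members 1:1 into training and dev sets.

    cluster_of maps each item to its cluster id (assumed to come from K-means).
    """
    rng = random.Random(seed)
    by_cluster = defaultdict(list)
    for item in pool:
        by_cluster[cluster_of[item]].append(item)
    train, dev = [], []
    for cluster_id in sorted(by_cluster):
        members = by_cluster[cluster_id]
        rng.shuffle(members)
        half = (len(members) + 1) // 2  # odd-sized clusters give train one extra
        train.extend(members[:half])
        dev.extend(members[half:])
    return train, dev


items = list(range(20))
cluster_of = {i: i % 2 for i in items}  # toy cluster labels
pool, test = holdout_split(items, n_test=4, seed=1)
train, dev = cluster_aware_split(pool, cluster_of, seed=1)
print(len(train), len(dev), len(test))
```

Splitting within clusters, rather than over the whole pool, keeps every semantic cluster represented in both the training and dev sets.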

#### Dataset Statistics

Table [3](https://arxiv.org/html/2605.15721#A6.T3) summarizes the number of samples used for training, dev, and testing across the three tasks.

Table 3: Summary of dataset partitions for HoVer, SCONE, and HotpotQA.

### F.2 Model Configurations and Resource Costs

In this section, we specify the model architectures, API utilization, and the hardware environment used for our experiments\.

We employ specialized models for different roles within the NCCE framework. Specifically, we use GPT-4o-mini as the task model to perform reasoning and generate final answers. For the Context-CF Co-Evolution stage, GPT-4o [[17](https://arxiv.org/html/2605.15721#bib.bib29)] is utilized as the prompt model to refine and generate context variants. Additionally, all-MiniLM-L6-v2 [[32](https://arxiv.org/html/2605.15721#bib.bib36)] is adopted as the embedding model to compute semantic features for clustering and NCF router input.

All large language model components are accessed via official OpenAI APIs. During the training and testing phases, we recorded the frequency of API interactions. Table [4](https://arxiv.org/html/2605.15721#A6.T4) provides an estimated total number of API calls for each dataset, encompassing bootstrapping, evolutionary rounds, and final evaluation.

The training and optimization of the MLP\-based NCF router are conducted on a single NVIDIA RTX 4090 GPU\. Given the lightweight design of our router, this setup provides sufficient computational power for rapid co\-evolution and efficient inference\.

Table 4: Estimated total API calls for each dataset.

### F.3 Router Training Configurations

In this section, we provide the detailed hyperparameter search space and final configurations for router training. The router is implemented as a Multi-Layer Perceptron with two hidden layers of 1024 and 512 units, respectively, using ReLU as the activation function. For each evolution round, we perform a grid search over the following candidate values: learning rate $\eta\in\{1\text{e-}2, 5\text{e-}3, 2\text{e-}3, 1\text{e-}3, 5\text{e-}4\}$, batch size $m\in\{64, 128, 256, 512\}$, dropout $\in\{0.05, 0.1, 0.15, 0.2\}$, temperature $\tau\in\{0.8, 1.0, 1.2\}$, and lambda $\lambda\in\{5\text{e-}2, 1\text{e-}2, 5\text{e-}3, 1\text{e-}3, 5\text{e-}4\}$. The optimal hyperparameters selected for each round across different datasets are summarized in Tables [5](https://arxiv.org/html/2605.15721#A6.T5), [6](https://arxiv.org/html/2605.15721#A6.T6), and [7](https://arxiv.org/html/2605.15721#A6.T7).
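The search space above can be enumerated directly; the sketch below lists the grid and runs an exhaustive search against a caller-supplied scoring function. The scoring function shown is a stand-in (an assumption for illustration only), since the validation metric used for selection and the exact roles of the temperature and lambda parameters are not specified in this section.

```python
from itertools import product

# Candidate values as listed in the text.
GRID = {
    "learning_rate": [1e-2, 5e-3, 2e-3, 1e-3, 5e-4],
    "batch_size": [64, 128, 256, 512],
    "dropout": [0.05, 0.1, 0.15, 0.2],
    "temperature": [0.8, 1.0, 1.2],
    "lambda": [5e-2, 1e-2, 5e-3, 1e-3, 5e-4],
}


def iter_configs(grid):
    """Yield every combination of the grid as a dict."""
    keys = list(grid)
    for values in product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def grid_search(grid, evaluate):
    """Return the configuration with the highest score under `evaluate`."""
    best_cfg, best_score = None, float("-inf")
    for cfg in iter_configs(grid):
        score = evaluate(cfg)
        if score > best_score:
            best_cfg, best_score = cfg, score
    return best_cfg, best_score


def placeholder_score(cfg):
    """Hypothetical stand-in for a dev-set metric; not the paper's criterion."""
    return -abs(cfg["learning_rate"] - 1e-3) - abs(cfg["dropout"] - 0.1)


n_configs = sum(1 for _ in iter_configs(GRID))
best, _ = grid_search(GRID, placeholder_score)
print(n_configs, best)
```

The full grid contains 5 × 4 × 4 × 3 × 5 = 1200 configurations per evolution round, which is tractable only because the router is a small MLP.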

Table 5: Hyperparameter Settings for HoVer across Evolution Rounds.

Table 6: Hyperparameter Settings for SCONE across Evolution Rounds.

Table 7: Hyperparameter Settings for HotpotQA across Evolution Rounds.

### F.4 Implementation Details of Context Evolution

In this section, we provide the implementation details of the gradient\-guided context evolution, including the reflection prompts used for generation and examples of the evolved context strategies\.
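As a rough illustration of how such a reflection step could be wired up, the sketch below fills a template with the same two placeholders as Listing 1 (`current_instruction` and `feedback`) and extracts the refined instruction from the tagged reply. No LLM call is made; the reply string is a hand-written stand-in, and the function names and feedback formatting are hypothetical.

```python
import re

REFLECTION_TEMPLATE = (
    "You are an expert prompt engineer.\n"
    "Your goal is to improve the instruction for a specific step in a DSPy "
    "program based on failed examples.\n\n"
    'Current Instruction:\n"{current_instruction}"\n\n'
    "Failed Examples (Feedback):\n{feedback}\n\n"
    "Please analyze the failures and generate a refined instruction that "
    "handles these cases better while maintaining overall performance.\n"
    "Wrap the new instruction in <prompt> and </prompt> tags."
)


def build_reflection_prompt(current_instruction: str, failures) -> str:
    """Fill the template; failures are rendered as a bullet list (an assumption)."""
    feedback = "\n".join(f"- {failure}" for failure in failures)
    return REFLECTION_TEMPLATE.format(
        current_instruction=current_instruction, feedback=feedback
    )


def extract_instruction(reply: str):
    """Return the text inside the first <prompt>...</prompt> pair, or None."""
    match = re.search(r"<prompt>(.*?)</prompt>", reply, flags=re.DOTALL)
    return match.group(1).strip() if match else None


prompt = build_reflection_prompt(
    "Summarize the passages with respect to the claim.",
    ["claim about band size: summary omitted the member count"],
)
fake_reply = "Analysis...\n<prompt>\nState every count the claim depends on.\n</prompt>"
print(extract_instruction(fake_reply))
```

Returning `None` when the tags are missing lets the caller discard malformed reflections instead of adding an empty context to the catalog.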

You are an expert prompt engineer.

Your goal is to improve the instruction for a specific step in a DSPy program based on failed examples.

Current Instruction:

"{current_instruction}"

Failed Examples (Feedback):

{feedback}

Please analyze the failures and generate a refined instruction that handles these cases better while maintaining overall performance.

Wrap the new instruction in <prompt\> and </prompt\> tags.

Listing 1: The Reflection Prompt Template for Gradient-guided Context Evolution

[Module: summarize1]

{

"predict": {

"demos": [

{

"augmented": true,

"claim": "Stephen Pearcy is older than another singer. This other singer was once a member of the group Wonder Girls.",

"passages": [

"Stephen Pearcy | Stephen Eric Pearcy (born July 3, 1956), known as Stephen Pearcy, is an American musician. He is best known as the founder, singer, and songwriter of the heavy metal band Ratt. He has also been in the bands Firedome, Crystal Pystal, Arcade, Vicious Delite, Vertex. He has also recorded four records as a solo artist.",

"Stephen Duffy | Stephen Anthony James Duffy (born 30 May 1960 in Alum Rock, Birmingham, England) is an English singer/songwriter, and multi-instrumentalist. He was a founding member and vocalist and bassist (and then drummer) of Duran Duran. He went on to record as a solo performer under several different names, and is the singer and songwriter for The Lilac Time with his older brother Nick. He has also co-written with Robbie Williams and Steven Page.",

"Vicious Delite | Vicious Delite was an alternative metal band formed by Stephen Pearcy in 1995 after Arcade. They released a limited self-titled EP in 1995 followed by a full-length album. The members of Vicious Delite were Stephen Pearcy, Tony Marcus, Michael Andrews, and Rob Karras. Stephen Pearcy was said to still be in his Ratt form, indeed his voice was just as compelling. Though his look (shorter hair & no fancy clothes or makeup) was different, his on-stage and off-stage personality was still as exciting. Mike Duda (of W.A.S.P.) would replace Andrews on bass, and was with Pearcy handing out demo CDs just a few weeks after the band’s formation. Their first live performance was intense & wonderfully surprising. Pearcy still had it in all the right ways needed for a true rock star and quality musician.",

"Vertex (band) | Vertex was a band formed in 1995 featuring singer Stephen Pearcy (Ratt, Arcade, Vicious Delite, Nitronic), guitarist Al Pitrelli (Danger Danger, Hotshot, Alice Cooper, Asia, Savatage, Megadeth, Trans-Siberian Orchestra), and drummer Hiro Kuretani (WXXI, Trancentral Station).",

"Stephen Cummings | Stephen Donald Cummings (born 13 September 1954 in Melbourne, Australia) is an Australian rock singer-songwriter and writer. He was lead singer of Melbourne-based rock band, The Sports, from 1976 to 1981, followed by a solo career which has met with critical acclaim but has had limited commercial success. He has written two novels, \\"Wonderboy\\" (1996) and \\"Stay Away from Lightning Girl\\" (1999), and a memoir, \\"Will it Be Funny Tomorrow, Billy?\\" (2009). In 2014 a documentary film \\"Don’t Throw Stones\\" based on his memoir premiered as part of the Melbourne International Film Festival.",

"Under My Skin (Stephen Pearcy album) | Under My Skin is the third solo album by Stephen Pearcy, the founder, lead singer and songwriter of the rock band Ratt. Featured on the album is the single \\"Round and Round (Featuring The Donnas)\\" which is a remake of the 1984 Ratt hit \\"Round and Round\\". The album was released on July 22, 2008 by Top Fuel Records and Airline Records. The track \\"Are You Ready\\", like previous recordings \\"Drive With Me\\" and the rerecorded Arcade track \\"Hott Racin’\\", was heard on NHRA/ESPN2.",

"Stephen Gately | Stephen Patrick David Gately (17 March 1976 10 October 2009) was an Irish pop singer-songwriter, actor, children’s writer and dancer, who, with Ronan Keating, was one of two lead singers of the pop group Boyzone. Gately and Keating originally served as co-frontmen, but Keating gradually became more synonymous with the group as time went on. All of Boyzone’s studio albums hit number one in the United Kingdom, their third being their most successful internationally. With Boyzone, Gately had a record-breaking sixteen consecutive singles enter the top five of the UK Singles Chart. He performed for millions of fans globally. He released a solo album in 2000, after the group’s initial break-up, which charted in the UK top ten and yielded three UK hit singles, including the top three hit \\"New Beginning\\". Gately went on to appear variously in stage productions and on television programmes as well as contributing songs to various projects. In 2008, he rejoined his colleagues as Boyzone reformed for a series of concerts and recordings."

],

"reasoning": "To determine whether Stephen Pearcy is older than the former member of Wonder Girls, we must first identify who the former Wonder Girls member is and their age. The passages provided do not mention Wonder Girls or any members from the group, but they do provide birthdates for various individuals named Stephen:

1. Stephen Pearcy was born on July 3, 1956.

2. Stephen Duffy was born on May 30, 1960.

3. Stephen Cummings was born on September 13, 1954.

4. Stephen Gately was born on March 17, 1976.

None of the above individuals were associated with Wonder Girls, a South Korean girl group formed in 2007 by JYP Entertainment. Therefore, we do not have information from the passages that explicitly identifies a former Wonder Girls member to compare ages with Stephen Pearcy.

To proceed, we can ascertain that Stephen Pearcy, born in 1956, is older than Stephen Duffy (born in 1960) and Stephen Gately (born in 1976). He is younger than Stephen Cummings (born in 1954). However, without the specific age or identification of a former Wonder Girls member, we cannot definitively conclude the claim based on the passages provided.",

"summary": "The passages provide birthdates for several individuals named Stephen, but none are associated with Wonder Girls. Stephen Pearcy, born in 1956, is older than Stephen Duffy (1960) and Stephen Gately (1976), but younger than Stephen Cummings (1954). Without information on a former Wonder Girls member, the claim regarding Stephen Pearcy’s age compared to this unidentified singer cannot be confirmed."

}

],

"signature": {

"instructions": "Using the provided `claim` and `passages`, generate a logical `reasoning` that connects the information in the passages to the claim, and produce a concise `summary` that validates or addresses the claim based on the synthesized reasoning steps.",

"fields": [

{

"prefix": "Claim:",

"description": "${claim}"

},

{

"prefix": "Passages:",

"description": "${passages}"

},

{

"prefix": "Reasoning: Let’s think step by step in order to",

"description": "${reasoning}"

},

{

"prefix": "Summary:",

"description": "${summary}"

}

]

}

}

}

[Module: create_query_hop2]

{

"predict": {

"demos": [],

"signature": {

"instructions": "You are a researcher working to verify factual claims related to cultural figures. Given a `claim` and a preliminary `summary_1` of information retrieved, systematically develop `reasoning` that outlines the logical steps connecting the presented data to the claim’s evaluation. Based on this reasoned evaluation, construct a precise `query` aimed at gathering additional data necessary for a comprehensive verification of the claim.",

"fields": [

{

"prefix": "Claim:",

"description": "${claim}"

},

{

"prefix": "Summary 1:",

"description": "${summary_1}"

},

{

"prefix": "Reasoning: Let’s think step by step in order to",

"description": "${reasoning}"

},

{

"prefix": "Query:",

"description": "${query}"

}

]

}

}

}

[Module: summarize2]

{

"predict": {

"demos": [

{

"augmented": true,

"claim": "Lizzy Hale is from the group who had a single \\"I Get Off\\". She also performs on the song \\"Apocalypic\\".",

"context": "Lzzy Hale is confirmed to be associated with the song \\"I Get Off\\" as the lead singer of Halestorm, the band that released the track. Although the passages do not mention \\"Apocalyptic\\", Lzzy Hale performs on that song as well, as it is another single by Halestorm from their album \\"Into the Wild Life\\".",

"passages": [

"Apocalyptic (song) | \\"Apocalyptic\\" is a song by the American hard rock band Halestorm. It was released on January 12, 2015, as the lead single from the band’s third studio album, \\"Into the Wild Life\\". The video for the song was released on January 28.",

"I Get Off | \\"I Get Off\\" is the first single released by the hard rock band, Halestorm. It is taken from their self-titled debut album.",

"Repressed | \\"Repressed\\" is a single by Apocalyptica, released on 19 May 2006. The title song features Max Cavalera (Soulfly and Sepultura) and Matt Tuck (Bullet for my Valentine) on vocals. It’s mostly sung in English and Portuguese, which parts in the last one are done by Cavalera.",

"Shatter Me (song) | \\"Shatter Me\\" is a song composed and performed by American violinist Lindsey Stirling for her second studio album of the same name, and features vocals from American rock singer Lzzy Hale, the lead singer of American rock band Halestorm. The song was written and composed by Stirling, Dia Frampton, and the song’s producer, Mark Maxwell, under his production moniker SILAS. The themes explored in \\"Shatter Me\\" were based in part on Stirling’s experiences with an eating disorder. It became Stirling’s first charting airplay single in November 2014 when it entered the \\"Billboard\\" Adult Pop Songs at number 39. Prior to that, it had peaked at number 1 on the Classical Digital Songs and number 10 on the Dance/Electronic Digital Songs charts, as well as charting in Germany at number 59.",

"Halestorm | Halestorm is an American hard rock band from Red Lion, Pennsylvania, consisting of lead vocalist and guitarist Lzzy Hale, her brother drummer and percussionist Arejay Hale, guitarist Joe Hottinger, and bassist Josh Smith. The group’s self-titled debut album was released on April 28, 2009, through Atlantic Records. Their second album \\"The Strange Case Of...\\" was released on April 10, 2012. Its lead single \\"Love Bites (So Do I)\\" from that album won their first Grammy Award for Best Hard Rock/Metal Performance on February 10, 2013.",

"Johnny Andrews | Johnny Andrews is an American songwriter and producer based in Atlanta, Georgia, and Nashville, Tennessee. Andrews has written multiple number one singles including, ""I Am Machine" and "Painkiller" by Three Days Grace, "Freak Like Me" by Halestorm, and "Stand Up" by All That Remains. In addition to these bands, Andrews’ songs have been performed by Apocalyptica featuring Corey Taylor ("Slipknot"), and Gavin Rossdale, Theory of a Deadman, Flyleaf, Red, Sick Puppies, and Chiodos",

"Halestorm (album) | Halestorm is the debut studio album by American rock band Halestorm. It was released in 2009 on Atlantic Records. Howard Benson produced the album. It peaked at #40 on the \\"Billboard\\" 200. The first single from the album, \\"I Get Off\\" received considerable playtime. The song reached #17 on \\"Billboard\\"<nowiki\>’</nowiki\>s Rock list. The track \\"Innocence\\" was co-written by ex-Evanescence member, Ben Moody."

],

"reasoning": "The claim states that Lzzy Hale is from the group that released the single \\"I Get Off\\" and that she also performs on the song \\"Apocalyptic\\". From the context and passages provided, it is clear that Lzzy Hale is the lead vocalist of Halestorm, the band that released both \\"I Get Off\\" and \\"Apocalyptic\\". Passage [1] confirms \\"Apocalyptic\\" as a song by Halestorm, released as the lead single from their album \\"Into the Wild Life\\", while passage [2] confirms \\"I Get Off\\" as the first single from Halestorm’s debut album. The context further supports the claim by affirming Hale’s association with both songs as part of Halestorm’s discography.",

"summary": "Lzzy Hale is indeed associated with the song \\"I Get Off\\" as the lead singer of Halestorm, the band that released the track. Additionally, she performs on the song \\"Apocalyptic\\", which is another single by Halestorm from their album \\"Into the Wild Life\\"."

},

{

"augmented": true,

"claim": "Live has more members than the American rock band whose second album is called Sweet Sour.",

"context": "The claim that Live has more members than the band with the album \\"Sweet Sour\\" is true because Live has four members, while Band of Skulls, the band with the \\"Sweet Sour\\" album, is a trio with three members.",

"passages": [

"Five Men Live | Five Men Live is a Talisman live record a part of a double package that Frontiers Records released with the band.",

"The Four Postmen | The Four Postmen is a Los Angeles-based rock group that consists of five band members. Featuring electric and acoustic guitars, keyboards, drums, bass, and an emphasis on three-part vocal harmony, their live show is highly energetic and theatrical, complete with non-stop comic banter between songs. Sometimes compared to Barenaked Ladies, GQ Magazine prefers to call them, \\"The Seinfeldesque Monkees.\\"",

"Beyond Space Without Limits | The album features the six regular members of the Infinity live-band augmented with a number of, largely local, guests.",

"Fifteen (band) | Fifteen was a punk rock band formed by Jack Curran and Jeff Ott. According to Ott, the band’s only consistent member in its 11-year history, Fifteen had a total of fifteen members by the time the group disbanded in 2000: Jack Curran, Mikey Mischief, Mark Moreno, Jean Repetto, Lucky Dog, Chris Flanagan, John Ogle, Jesse Wickman, Joe Cable, John Quintos, Scott Pierce, Jim Prior, Lisa D., Vanessa Bain and Ott himself.",

"Live (band) | Live (, often typeset as LIVE or +LIVE+) is an American rock band from York, Pennsylvania, consisting of Ed Kowalczyk (lead vocals, rhythm guitar), Chad Taylor (lead guitar, backing vocals), Patrick Dahlheimer (bass), and Chad Gracey (drums). Kowalczyk left the band in 2009 and was replaced by Chris Shinn, but rejoined in December 2016.",

"Band society | A band society is the simplest form of human society. A band generally consists of a small kin group, no larger than an extended family or clan. The general consensus of modern anthropology sees the average number of members of a social band at the simplest level of foraging societies as ranging from 30 to 50 people.",

"Sub Oslo | Sub Oslo is an improvisational band from Denton, TX that plays psychedelic/ambient dub. The band was formed in 1996 by Miguel Veliz and Quincy Holloway, who were roommates at the time. Since then the band has expanded to include a total of 8 members. When Sub Oslo performs live, they have a live visual artist to display visualizations projected onto a screen, and incorporate a mix engineer, allowing performances to be different each time. Sub Oslo has toured the West Coast, Japan, and has also had their music featured in the Xbox video game Brute Force. They have also performed alongside influential dub artists such as Mad Professor, Steel Pulse and The Roots. They also performed with less known artists such as Fugazi, Raz Mesanai, the Make-up, Yeti, Him, June of 44, and DJ Krush."

],

"reasoning": "The claim is about the number of members in the band Live compared to the band whose second album is \\"Sweet Sour,\\" which is Band of Skulls. Based on the context provided, Live is an American rock band with four members. The passages confirm this information with details about the band’s lineup. Band of Skulls, known for the album \\"Sweet Sour,\\" is a trio, meaning it has three members. Therefore, Live indeed has more members than Band of Skulls.",

"summary": "The claim that Live has more members than the band with the \\"Sweet Sour\\" album is true. Live has four members, while Band of Skulls has three members."

},

{

"augmented": true,

"claim": "The award-winning CD, that Jacqueline Horner-Kwiatek was a featured soloist on, includes the romantic theme for the game \\"Civilization IV,\\" written by Christopher Tin.",

"context": "Jacqueline Horner-Kwiatek was a featured soloist on Christopher Tin’s Grammy award-winning CD \\"Calling All Dawns,\\" which includes the theme \\"Baba Yetu\\" from the game \\"Civilization IV.\\"",

"passages": [

"Jacqueline Horner-Kwiatek | Jacqueline Horner-Kwiatek is a mezzo-soprano specializing in both early music and new works. She is a member of the critically acclaimed ensemble Anonymous 4 and was a featured soloist on Christopher Tin’s Grammy award-winning CD \\"Calling All Dawns\\". Originally from Belfast, N. Ireland, she worked in Europe with many iconic composers including Stockhausen, Berio, Xenakis and Judith Weir before moving to the United States. In addition to her work with Anonymous 4 she has continued her work in new music, collaborating with David Lang, Gregory Spears, Daniel Thomas Davis and Phil Kline. She is a voice teacher and has been a visiting artist at many universities in the United States including Princeton, Yale, The Catholic University, Georgetown University and Mannes School of Music.",

"Calling All Dawns | Calling All Dawns is a classical crossover album by Christopher Tin released in 2009. The album won two Grammys at the 53rd Grammy Awards for Best Classical Crossover Album and Best Instrumental Arrangement Accompanying Vocalist(s) for the song \\"Baba Yetu\\", the theme for the 2005 video game \\"Civilization IV\\". The win marks the first time in history that a Grammy has been awarded to a composition written for a video game.",

"Baba Yetu | \\"Baba Yetu\\" is a song by composer Christopher Tin. It was composed in 2005 when video game designer Soren Johnson, Tin’s former roommate at Stanford, asked him to compose the theme song for \\"Civilization IV\\". It was sung by Ron Ragin and the Stanford Talisman. For its re-release in Tin’s debut album \\"Calling All Dawns\\", it was sung by Ron Ragin and the Soweto Gospel Choir.",

"Christopher Tin | Christopher Chiyan Tin (born May 21, 1976) is an American composer of concert music, film, and video game scores. His work is primarily orchestral and choral, often with a world music influence. He has won two Grammy Awards for his classical crossover album \\"Calling All Dawns\\".",

"Reaching for the Moon (album) | Reaching for the Moon is the third album by jazz singer Roseanna Vitro, released in 1991 on the CMG label. Vitro explores a number of genres, accompanied by pianist-arranger Kenny Werner and a varying cast of featured and supporting players, including saxophonists George Coleman, Joe Lovano and Kirk Whalum, drummer Tom Rainey, bassists Harvie Swartz and Ratzo Harris, and percussionists Mino Cinelu, Steve Berrios and Cafe.",

"Dawn of a New Century | Dawn of a New Century (released 20 April 1999 by Universal Music, Norway-546120-2/Philips) is the third album by an Irish-Norwegian duo Secret Garden, released by Philips Records in 1999. Capercaillie’s frontwoman Karen Matheson performed as a soloist for the song \\"Prayer\\".",

"Sheila Horne | Brides of Funkenstein member Sheila Horne was hired as a back-up singer for original Brides Lynn Mabry and Dawn Silva in 1978. She became a main singer with Silva and Jeanette McGruder in 1979 for the second Brides album \\"Never Buy Texas from a Cowboy\\" and joined Rick James on tour in 1981 as an original Mary Jane. Horne wrote songs for George Clinton’s solo albums on Capitol in the mid-1980s and, s of 2005, continues to tour with P-Funk All-Stars. In Italy in 1997 she released an album under the name Blackwood."

\],

"reasoning":"TheclaimstatesthatJacquelineHorner\-Kwiatekwasafeaturedsoloistonanaward\-winningCDthatincludestheromanticthemeforthegame\\"CivilizationIV,\\"writtenbyChristopherTin\.ThecontextandpassagesconfirmthatJacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningCD\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"from\\"CivilizationIV\.\\"However,theclaiminaccuratelydescribes\\"BabaYetu\\"asaromantictheme\.While\\"BabaYetu\\"isindeedpartofthealbumandthegame,itisnotcharacterizedintheavailableinformationasaromantictheme\.Therefore,theclaimismostlyaccurateconcerningtheinvolvementofJacquelineHorner\-Kwiatekandtheinclusionofthethemeinthealbum,butitmischaracterizesthenatureofthesong\\"BabaYetu\.\\"",

"summary":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningalbum\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"While\\"BabaYetu\\"ispartofthealbum,itisnotspecificallydescribedasaromantictheme\."

\}

\],

"signature":\{

"instructions":"Giventhefields‘claim‘,‘context‘,‘passages‘,producethefields‘summary‘\.",

"fields":\[

\{

"prefix":"Claim:",

"description":"$\{claim\}"

\},

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Passages:",

"description":"$\{passages\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Summary:",

"description":"$\{summary\}"

\}

\]

\}

\}

\}

\[Module:create\_query\_hop3\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"claim":"LizzyHaleisfromthegroupwhohadasingle\\"IGetOff\\"\.Shealsoperformsonthesong\\"Apocalypic\\"\.",

"summary\_1":"LzzyHaleisconfirmedtobeassociatedwiththesong\\"IGetOff\\"astheleadsingerofHalestorm,thebandthatreleasedthetrack\.Althoughthepassagesdonotmention\\"Apocalyptic\\",LzzyHaleperformsonthatsongaswell,asitisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.",

"summary\_2":"LzzyHaleisindeedassociatedwiththesong\\"IGetOff\\"astheleadsingerofHalestorm,thebandthatreleasedthetrack\.Additionally,sheperformsonthesong\\"Apocalyptic\\",whichisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.",

"reasoning":"BothsummariesconfirmthatLzzyHaleistheleadsingerofHalestormandassociatedwiththesong\\"IGetOff\\"\.Theyalsoverifyherperformanceonthesong\\"Apocalyptic\\",whichisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.Thus,theclaimisaccurateasLzzyHaleisindeedinvolvedinbothsongsasamemberofHalestorm\.",

"query":"WhatarethedetailsofLzzyHale’sinvolvementwiththebandHalestormandtheirsingles\\"IGetOff\\"and\\"Apocalyptic\\"?"

\},

\{

"augmented":true,

"claim":"LivehasmoremembersthantheAmericanrockbandwhosesecondalbumiscalledSweetSour\.",

"summary\_1":"TheclaimthatLivehasmoremembersthanthebandwiththealbum\\"SweetSour\\"istruebecauseLivehasfourmembers,whileBandofSkulls,thebandwiththe\\"SweetSour\\"album,isatriowiththreemembers\.",

"summary\_2":"TheclaimthatLivehasmoremembersthanthebandwiththe\\"SweetSour\\"albumistrue\.Livehasfourmembers,whileBandofSkullshasthreemembers\.",

"reasoning":"TheclaimisaccuratebecauseLive,anAmericanrockband,consistsoffourmembers\.BandofSkulls,thebandknownfortheiralbum\\"SweetSour,\\"isatrio,consistingofthreemembers\.Therefore,LivedoesindeedhavemoremembersthanBandofSkulls\.",

"query":"WhatarethenamesandnumberofmembersinthebandsLiveandBandofSkulls?"

\},

\{

"augmented":true,

"claim":"Theaward\-winningCD,thatJacquelineHorner\-Kwiatekwasafeaturedsoloiston,includestheromanticthemeforthegame\\"CivilizationIV,\\"writtenbyChristopherTin\.",

"summary\_1":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningCD\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"",

"summary\_2":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningalbum\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"While\\"BabaYetu\\"ispartofthealbum,itisnotspecificallydescribedasaromantictheme\.",

"reasoning":"TheclaimstatesthatJacquelineHorner\-KwiatekwasafeaturedsoloistonaCDthatincludestheromanticthemefor\\"CivilizationIV\.\\"BothsummariesclarifythatshewasindeedasoloistonChristopherTin’sGrammyaward\-winningalbum\\"CallingAllDawns,\\"whichfeaturesthetheme\\"BabaYetu\\"from\\"CivilizationIV\.\\"However,while\\"BabaYetu\\"isincludedinthealbum,itisnotspecificallydescribedasaromantictheme\.Therefore,theterm\\"romantictheme\\"intheclaimmaybeinaccurateormisleading\.",

"query":"Whatisthenatureofthetheme\\"BabaYetu\\"from\\"CivilizationIV,\\"andisitwidelyrecognizedordescribedasaromantictheme?"

\}

\],

"signature":\{

"instructions":"Analyzetheprovided‘claim‘alongwiththecontextualsummaries‘summary\_1‘and‘summary\_2‘\.Utilizethisinformationtodevelopaprecise‘query‘thatwillenhancedocumentretrievalrelatedtotheclaim\.Intheprocess,reflectonthelogicalstepsandreasoningrequiredtodiscerntheinherentconnectionsandverifyorrefutetheclaimeffectively\.",

"fields":\[

\{

"prefix":"Claim:",

"description":"$\{claim\}"

\},

\{

"prefix":"Summary1:",

"description":"$\{summary\_1\}"

\},

\{

"prefix":"Summary2:",

"description":"$\{summary\_2\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Query:",

"description":"$\{query\}"

\}

\]

\}

\}

\}

Listing 2: Evolved Context for HoVer \(Cluster 0\)

\[Module:summarize1\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"claim":"StephenPearcyisolderthananothersinger\.ThisothersingerwasonceamemberofthegroupWonderGirls\.",

"passages":\[

"StephenPearcy\|StephenEricPearcy\(bornJuly3,1956\),knownasStephenPearcy,isanAmericanmusician\.Heisbestknownasthefounder,singer,andsongwriteroftheheavymetalbandRatt\.HehasalsobeeninthebandsFiredome,CrystalPystal,Arcade,ViciousDelite,Vertex\.Hehasalsorecordedfourrecordsasasoloartist\.",

"StephenDuffy\|StephenAnthonyJamesDuffy\(born30May1960inAlumRock,Birmingham,England\)isanEnglishsinger/songwriter,andmulti\-instrumentalist\.Hewasafoundingmemberandvocalistandbassist\(andthendrummer\)ofDuranDuran\.Hewentontorecordasasoloperformerunderseveraldifferentnames,andisthesingerandsongwriterforTheLilacTimewithhisolderbrotherNick\.Hehasalsoco\-writtenwithRobbieWilliamsandStevenPage\.",

"ViciousDelite\|ViciousDelitewasanalternativemetalbandformedbyStephenPearcyin1995afterArcade\.Theyreleasedalimitedself\-titledEPin1995followedbyafull\-lengthalbum\.ThemembersofViciousDelitewereStephenPearcy,TonyMarcus,MichaelAndrews,andRobKarras\.StephenPearcywassaidtostillbeinhisRattform,indeedhisvoicewasjustascompelling\.Thoughhislook\(shorterhair&nofancyclothesormakeup\)wasdifferent,hison\-stageandoff\-stagepersonalitywasstillasexciting\.MikeDuda\(ofW\.A\.S\.P\.\)wouldreplaceAndrewsonbass,andwaswithPearcyhandingoutdemoCDsjustafewweeksaftertheband’sformation\.Theirfirstliveperformancewasintense&wonderfullysurprising\.Pearcystillhaditinalltherightwaysneededforatruerockstarandqualitymusician\.",

"Vertex\(band\)\|Vertexwasabandformedin1995featuringsingerStephenPearcy\(Ratt,Arcade,ViciousDelite,Nitronic\),guitaristAlPitrelli\(DangerDanger,Hotshot,AliceCooper,Asia,Savatage,Megadeth,Trans\-SiberianOrchestra\),anddrummerHiroKuretani\(WXXI,TrancentralStation\)\.",

"StephenCummings\|StephenDonaldCummings\(born13September1954inMelbourne,Australia\)isanAustralianrocksinger\-songwriterandwriter\.HewasleadsingerofMelbourne\-basedrockband,TheSports,from1976to1981,followedbyasolocareerwhichhasmetwithcriticalacclaimbuthashadlimitedcommercialsuccess\.Hehaswrittentwonovels,\\"Wonderboy\\"\(1996\)and\\"StayAwayfromLightningGirl\\"\(1999\),andamemoir,\\"WillitBeFunnyTomorrow,Billy?\\"\(2009\)\.In2014adocumentaryfilm\\"Don’tThrowStones\\"basedonhismemoirpremieredaspartoftheMelbourneInternationalFilmFestival\.",

"UnderMySkin\(StephenPearcyalbum\)\|UnderMySkinisthethirdsoloalbumbyStephenPearcy,thefounder,leadsingerandsongwriteroftherockbandRatt\.Featuredonthealbumisthesingle\\"RoundandRound\(FeaturingTheDonnas\)\\"whichisaremakeofthe1984Ratthit\\"RoundandRound\\"\.ThealbumwasreleasedonJuly22,2008byTopFuelRecordsandAirlineRecords\.Thetrack\\"AreYouReady\\",likepreviousrecordings\\"DriveWithMe\\"andthererecordedArcadetrack\\"HottRacin’\\",washeardonNHRA/ESPN2\.",

"StephenGately\|StephenPatrickDavidGately\(17March197610October2009\)wasanIrishpopsinger\-songwriter,actor,children’swriteranddancer,who,withRonanKeating,wasoneoftwoleadsingersofthepopgroupBoyzone\.GatelyandKeatingoriginallyservedasco\-frontmen,butKeatinggraduallybecamemoresynonymouswiththegroupastimewenton\.AllofBoyzone’sstudioalbumshitnumberoneintheUnitedKingdom,theirthirdbeingtheirmostsuccessfulinternationally\.WithBoyzone,Gatelyhadarecord\-breakingsixteenconsecutivesinglesenterthetopfiveoftheUKSinglesChart\.Heperformedformillionsoffansglobally\.Hereleasedasoloalbumin2000,afterthegroup’sinitialbreak\-up,whichchartedintheUKtoptenandyieldedthreeUKhitsingles,includingthetopthreehit\\"NewBeginning\\"\.Gatelywentontoappearvariouslyinstageproductionsandontelevisionprogrammesaswellascontributingsongstovariousprojects\.In2008,herejoinedhiscolleaguesasBoyzonereformedforaseriesofconcertsandrecordings\."

\],

"reasoning":"TodeterminewhetherStephenPearcyisolderthantheformermemberofWonderGirls,wemustfirstidentifywhotheformerWonderGirlsmemberisandtheirage\.ThepassagesprovideddonotmentionWonderGirlsoranymembersfromthegroup,buttheydoprovidebirthdatesforvariousindividualsnamedStephen:

1\.StephenPearcywasbornonJuly3,1956\.

2\.StephenDuffywasbornonMay30,1960\.

3\.StephenCummingswasbornonSeptember13,1954\.

4\.StephenGatelywasbornonMarch17,1976\.

NoneoftheaboveindividualswereassociatedwithWonderGirls,aSouthKoreangirlgroupformedin2007byJYPEntertainment\.Therefore,wedonothaveinformationfromthepassagesthatexplicitlyidentifiesaformerWonderGirlsmembertocompareageswithStephenPearcy\.

Toproceed,wecanascertainthatStephenPearcy,bornin1956,isolderthanStephenDuffy\(bornin1960\)andStephenGately\(bornin1976\)\.HeisyoungerthanStephenCummings\(bornin1954\)\.However,withoutthespecificageoridentificationofaformerWonderGirlsmember,wecannotdefinitivelyconcludetheclaimbasedonthepassagesprovided\.",

"summary":"ThepassagesprovidebirthdatesforseveralindividualsnamedStephen,butnoneareassociatedwithWonderGirls\.StephenPearcy,bornin1956,isolderthanStephenDuffy\(1960\)andStephenGately\(1976\),butyoungerthanStephenCummings\(1954\)\.WithoutinformationonaformerWonderGirlsmember,theclaimregardingStephenPearcy’sagecomparedtothisunidentifiedsingercannotbeconfirmed\."

\}

\],

"signature":\{

"instructions":"\\"Usingtheprovided‘claim‘and‘passages‘,generatealogical‘reasoning‘thatconnectstheinformationinthepassagestotheclaim\.Startbyclearlyidentifyingkeyentitiesandtheirrolesorcharacteristicswithintheclaimandpassages\.Then,verifythepresenceorabsenceoftheseentitiesinthepassagesanddeterminetheirrelevancetotheclaim\.Produceaconcise‘summary‘thateithervalidatesoraddressestheclaimbasedonthesynthesizedreasoningsteps\.Ensuretoexplicitlystateanyassumptionsmadeduringthereasoningprocessandhighlightanygapsintheinformationthatmayaffectthevalidityoftheclaim\.\\"",

"fields":\[

\{

"prefix":"Claim:",

"description":"$\{claim\}"

\},

\{

"prefix":"Passages:",

"description":"$\{passages\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Summary:",

"description":"$\{summary\}"

\}

\]

\}

\}

\}

\[Module:create\_query\_hop2\]

\{

"predict":\{

"demos":\[\],

"signature":\{

"instructions":"\\"Youarearesearcherworkingtoverifyfactualclaimsrelatedtoculturalfigures\.Givena‘claim‘andapreliminary‘summary\_1‘,systematicallydevelop‘reasoning‘thatoutlineslogicalstepsconnectingthepresenteddatatotheclaim’sevaluation\.Yourreasoningshouldclearlyidentifyanymissinginformationnecessaryfortheclaim’sverification\.Basedonthisevaluation,constructaprecise‘query‘thattargetsspecificgapsininformation,ensuringthatthequeryisdirectlyalignedwiththeaimofobtainingdatanecessaryforcomprehensiveclaimverification\.\\"",

"fields":\[

\{

"prefix":"Claim:",

"description":"$\{claim\}"

\},

\{

"prefix":"Summary1:",

"description":"$\{summary\_1\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Query:",

"description":"$\{query\}"

\}

\]

\}

\}

\}
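To make the structure of a module entry concrete, the following Python sketch shows one plausible way a `signature` and its `demos` could be flattened into a prompt. Only the field prefixes and the `${...}` placeholders are taken from the listing. The function `render_prompt`, the `---` separator, the example instruction string, and the rule of stopping at the first unfilled field are illustrative assumptions, not the rendering used by the authors or by any particular framework.

```python
from typing import Dict, List

# Field specs copied from the create_query_hop2 signature above.
FIELDS: List[Dict[str, str]] = [
    {"prefix": "Claim:", "description": "${claim}"},
    {"prefix": "Summary 1:", "description": "${summary_1}"},
    {"prefix": "Reasoning: Let's think step by step in order to",
     "description": "${reasoning}"},
    {"prefix": "Query:", "description": "${query}"},
]


def field_name(field: Dict[str, str]) -> str:
    """Extract 'claim' from a '${claim}' placeholder (assumes that exact form)."""
    return field["description"][2:-1]


def render_prompt(instructions: str,
                  fields: List[Dict[str, str]],
                  demos: List[Dict[str, str]],
                  inputs: Dict[str, str]) -> str:
    """Hypothetical renderer: instructions, then demos, then the new instance."""
    blocks = [instructions]
    for demo in demos:
        # Each demo becomes one block of "prefix value" lines.
        blocks.append("\n".join(
            f"{f['prefix']} {demo[field_name(f)]}"
            for f in fields if field_name(f) in demo))
    lines = []
    for f in fields:
        name = field_name(f)
        if name not in inputs:
            # First missing field: leave its prefix open for the model to complete.
            lines.append(f["prefix"])
            break
        lines.append(f"{f['prefix']} {inputs[name]}")
    blocks.append("\n".join(lines))
    return "\n\n---\n\n".join(blocks)


prompt = render_prompt(
    "Write a search query that targets missing evidence.",  # placeholder text
    FIELDS,
    demos=[],  # create_query_hop2 has no demos in this listing
    inputs={"claim": "Live has more members than Band of Skulls.",
            "summary_1": "Live has four members."},
)
print(prompt)
```

Under these assumptions, an evolved context differs from its parent only in the `instructions` string and the `demos` list, so routing an instance to a context amounts to choosing which pair is passed to such a renderer.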

\[Module:summarize2\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"claim":"LizzyHaleisfromthegroupwhohadasingle\\"IGetOff\\"\.Shealsoperformsonthesong\\"Apocalypic\\"\.",

"context":"LzzyHaleisconfirmedtobeassociatedwiththesong\\"IGetOff\\"astheleadsingerofHalestorm,thebandthatreleasedthetrack\.Althoughthepassagesdonotmention\\"Apocalyptic\\",LzzyHaleperformsonthatsongaswell,asitisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.",

"passages":\[

"Apocalyptic\(song\)\|\\"Apocalyptic\\"isasongbytheAmericanhardrockbandHalestorm\.ItwasreleasedonJanuary12,2015,astheleadsinglefromtheband’sthirdstudioalbum,\\"IntotheWildLife\\"\.ThevideoforthesongwasreleasedonJanuary28\.",

"IGetOff\|\\"IGetOff\\"isthefirstsinglereleasedbythehardrockband,Halestorm\.Itistakenfromtheirself\-titleddebutalbum\.",

"Repressed\|\\"Repressed\\"isasinglebyApocalyptica,releasedon19May2006\.ThetitlesongfeaturesMaxCavalera\(SoulflyandSepultura\)andMattTuck\(BulletformyValentine\)onvocals\.It’smostlysunginEnglishandPortuguese,whichpartsinthelastonearedonebyCavalera\.",

"ShatterMe\(song\)\|\\"ShatterMe\\"isasongcomposedandperformedbyAmericanviolinistLindseyStirlingforhersecondstudioalbumofthesamename,andfeaturesvocalsfromAmericanrocksingerLzzyHale,theleadsingerofAmericanrockbandHalestorm\.ThesongwaswrittenandcomposedbyStirling,DiaFrampton,andthesong’sproducer,MarkMaxwell,underhisproductionmonikerSILAS\.Thethemesexploredin\\"ShatterMe\\"werebasedinpartonStirling’sexperienceswithaneatingdisorder\.ItbecameStirling’sfirstchartingairplaysingleinNovember2014whenitenteredthe\\"Billboard\\"AdultPopSongsatnumber39\.Priortothat,ithadpeakedatnumber1ontheClassicalDigitalSongsandnumber10ontheDance/ElectronicDigitalSongscharts,aswellaschartinginGermanyatnumber59\.",

"Halestorm\|HalestormisanAmericanhardrockbandfromRedLion,Pennsylvania,consistingofleadvocalistandguitaristLzzyHale,herbrotherdrummerandpercussionistArejayHale,guitaristJoeHottinger,andbassistJoshSmith\.Thegroup’sself\-titleddebutalbumwasreleasedonApril28,2009,throughAtlanticRecords\.Theirsecondalbum\\"TheStrangeCaseOf\.\.\.\\"wasreleasedonApril10,2012\.Itsleadsingle\\"LoveBites\(SoDoI\)\\"fromthatalbumwontheirfirstGrammyAwardforBestHardRock/MetalPerformanceonFebruary10,2013\.",

"JohnnyAndrews\|JohnnyAndrewsisanAmericansongwriterandproducerbasedinAtlanta,Georgia,andNashville,Tennessee\.Andrewshaswrittenmultiplenumberonesinglesincluding,""IAmMachine"and"Painkiller"byThreeDaysGrace,"FreakLikeMe"byHalestorm,and"StandUp"byAllThatRemains\.Inadditiontothesebands,Andrews’songshavebeenperformedbyApocalypticafeaturingCoreyTaylor\("Slipknot"\),andGavinRossdale,TheoryofaDeadMan,Flyleaf,Red,SickPuppies,andChiodos",

"Halestorm\(album\)\|HalestormisthedebutstudioalbumbyAmericanrockbandHalestorm\.Itwasreleasedin2009onAtlanticRecords\.HowardBensonproducedthealbum\.Itpeakedat\#40onthe\\"Billboard\\"200\.Thefirstsinglefromthealbum,\\"IGetOff\\"receivedconsiderableplaytime\.Thesongreached\#17on\\"Billboard\\"<nowiki\>’</nowiki\>sRocklist\.Thetrack\\"Innocence\\"wasco\-writtenbyex\-Evanescencemember,BenMoody\."

\],

"reasoning":"TheclaimstatesthatLzzyHaleisfromthegroupthatreleasedthesingle\\"IGetOff\\"andthatshealsoperformsonthesong\\"Apocalyptic\\"\.Fromthecontextandpassagesprovided,itisclearthatLzzyHaleistheleadvocalistofHalestorm,thebandthatreleasedboth\\"IGetOff\\"and\\"Apocalyptic\\"\.Passage\[1\]confirms\\"Apocalyptic\\"asasongbyHalestorm,releasedastheleadsinglefromtheiralbum\\"IntotheWildLife\\",whilepassage\[2\]confirms\\"IGetOff\\"asthefirstsinglefromHalestorm’sdebutalbum\.ThecontextfurthersupportstheclaimbyaffirmingHale’sassociationwithbothsongsaspartofHalestorm’sdiscography\.",

"summary":"LzzyHaleisindeedassociatedwiththesong\\"IGetOff\\"astheleadsingerofHalestorm,thebandthatreleasedthetrack\.Additionally,sheperformsonthesong\\"Apocalyptic\\",whichisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\."

\},

\{

"augmented":true,

"claim":"LivehasmoremembersthantheAmericanrockbandwhosesecondalbumiscalledSweetSour\.",

"context":"TheclaimthatLivehasmoremembersthanthebandwiththealbum\\"SweetSour\\"istruebecauseLivehasfourmembers,whileBandofSkulls,thebandwiththe\\"SweetSour\\"album,isatriowiththreemembers\.",

"passages":\[

"FiveMenLive\|FiveMenLiveisaTalismanliverecordapartofadoublepackagethatFrontiersRecordsreleasedwiththeband\.",

"TheFourPostmen\|TheFourPostmenisaLosAngeles\-basedrockgroupthatconsistsoffivebandmembers\.Featuringelectricandacousticguitars,keyboards,drums,bass,andanemphasisonthree\-partvocalharmony,theirliveshowishighlyenergeticandtheatrical,completewithnon\-stopcomicbanterbetweensongs\.SometimescomparedtoBarenakedLadies,GQMagazinepreferstocallthem,\\"TheSeinfeldesqueMonkees\.\\"",

"BeyondSpaceWithoutLimits\|ThealbumfeaturesthesixregularmembersoftheInfinitylive\-bandaugmentedwithanumberof,largelylocal,guests\.",

"Fifteen\(band\)\|FifteenwasapunkrockbandformedbyJackCurranandJeffOtt\.AccordingtoOtt,theband’sonlyconsistentmemberinits11\-yearhistory,Fifteenhadatotaloffifteenmembersbythetimethegroupdisbandedin2000:JackCurran,MikeyMischief,MarkMoreno,JeanRepetto,LuckyDog,ChrisFlanagan,JohnOgle,JesseWickman,JoeCable,JohnQuintos,ScottPierce,JimPrior,LisaD\.,VanessaBainandOtthimself\.",

"Live\(band\)\|Live\(,oftentypesetasLIVEor\+LIVE\+\)isanAmericanrockbandfromYork,Pennsylvania,consistingofEdKowalczyk\(leadvocals,rhythmguitar\),ChadTaylor\(leadguitar,backingvocals\),PatrickDahlheimer\(bass\),andChadGracey\(drums\)\.Kowalczykleftthebandin2009andwasreplacedbyChrisShinn,butrejoinedinDecember2016\.",

"Bandsociety\|Abandsocietyisthesimplestformofhumansociety\.Abandgenerallyconsistsofasmallkingroup,nolargerthananextendedfamilyorclan\.Thegeneralconsensusofmodernanthropologyseestheaveragenumberofmembersofasocialbandatthesimplestlevelofforagingsocietiesasrangingfrom30to50people\.",

"SubOslo\|SubOsloisanimprovisationalbandfromDenton,TXthatplayspsychedelic/ambientdub\.Thebandwasformedin1996byMiguelVelizandQuincyHolloway,whowereroommatesatthetime\.Sincethenthebandhasexpandedtoincludeatotalof8members\.WhenSubOsloperformslive,theyhavealivevisualartisttodisplayvisualizationsprojectedontoascreen,andincorporateamixengineer,allowingperformancestobedifferenteachtime\.SubOslohastouredtheWestCoast,Japan,andhasalsohadtheirmusicfeaturedintheXboxvideogameBruteForce\.TheyhavealsoperformedalongsideinfluentialdubartistssuchasMadProfessor,SteelPulseandTheRoots\.TheyalsoperformedwithlessknownartistssuchasFugazi,RazMesanai,theMake\-up,Yeti,Him,Juneof44,andDJKrush\."

\],

"reasoning":"TheclaimisaboutthenumberofmembersinthebandLivecomparedtothebandwhosesecondalbumis\\"SweetSour,\\"whichisBandofSkulls\.Basedonthecontextprovided,LiveisanAmericanrockbandwithfourmembers\.Thepassagesconfirmthisinformationwithdetailsabouttheband’slineup\.BandofSkulls,knownforthealbum\\"SweetSour,\\"isatrio,meaningithasthreemembers\.Therefore,LiveindeedhasmoremembersthanBandofSkulls\.",

"summary":"TheclaimthatLivehasmoremembersthanthebandwiththe\\"SweetSour\\"albumistrue\.Livehasfourmembers,whileBandofSkullshasthreemembers\."

\},

\{

"augmented":true,

"claim":"Theaward\-winningCD,thatJacquelineHorner\-Kwiatekwasafeaturedsoloiston,includestheromanticthemeforthegame\\"CivilizationIV,\\"writtenbyChristopherTin\.",

"context":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningCD\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"",

"passages":\[

"JacquelineHorner\-Kwiatek\|JacquelineHorner\-Kwiatekisamezzo\-sopranospecializinginbothearlymusicandnewworks\.SheisamemberofthecriticallyacclaimedensembleAnonymous4andwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningCD\\"CallingAllDawns\\"\.OriginallyfromBelfast,N\.Ireland,sheworkedinEuropewithmanyiconiccomposersincludingStockhausen,Berio,XenakisandJudithWeirbeforemovingtotheUnitedStates\.InadditiontoherworkwithAnonymous4shehascontinuedherworkinnewmusic,collaboratingwithDavidLang,GregorySpears,DanielThomasDavisandPhilKline\.SheisavoiceteacherandhasbeenavisitingartistatmanyuniversitiesintheUnitedStatesincludingPrinceton,Yale,TheCatholicUniversity,GeorgetownUniversityandMannesSchoolofMusic\.",

"CallingAllDawns\|CallingAllDawnsisaclassicalcrossoveralbumbyChristopherTinreleasedin2009\.ThealbumwontwoGrammysatthe53rdGrammyAwardsforBestClassicalCrossoverAlbumandBestInstrumentalArrangementAccompanyingVocalist\(s\)forthesong\\"BabaYetu\\",thethemeforthe2005videogame\\"CivilizationIV\\"\.ThewinmarksthefirsttimeinhistorythataGrammyhasbeenawardedtoacompositionwrittenforavideogame\.",

"BabaYetu\|\\"BabaYetu\\"isasongbycomposerChristopherTin\.Itwascomposedin2005whenvideogamedesignerSorenJohnson,Tin’sformerroommateatStanford,askedhimtocomposethethemesongfor\\"CivilizationIV\\"\.ItwassungbyRonRaginandtheStanfordTalisman\.Foritsre\-releaseinTin’sdebutalbum\\"CallingAllDawns\\",itwassungbyRonRaginandtheSowetoGospelChoir\.",

"ChristopherTin\|ChristopherChiyanTin\(bornMay21,1976\)isanAmericancomposerofconcertmusic,film,andvideogamescores\.Hisworkisprimarilyorchestralandchoral,oftenwithaworldmusicinfluence\.HehaswontwoGrammyAwardsforhisclassicalcrossoveralbum\\"CallingAllDawns\\"\.",

"ReachingfortheMoon\(album\)\|ReachingfortheMoonisthethirdalbumbyjazzsingerRoseannaVitro,releasedin1991ontheCMGlabel\.Vitroexploresanumberofgenres,accompaniedbypianist\-arrangerKennyWernerandavaryingcastoffeaturedandsupportingplayers,includingsaxophonistsGeorgeColeman,JoeLovanoandKirkWhalum,drummerTomRainey,bassistsHarvieSwartzandRatzoHarris,andpercussionistsMinoCinelu,SteveBerriosandCafe\.",

"DawnofaNewCentury\|DawnofaNewCentury\(released20April1999byUniversalMusic,Norway\-546120\-2/Philips\)isthethirdalbumbyanIrish\-NorwegianduoSecretGarden,releasedbyPhilipsRecordsin1999\.Capercaillie’sfrontwomanKarenMathesonperformedasasoloistforthesong\\"Prayer\\"\.",

"SheilaHorne\|BridesofFunkensteinmemberSheilaHornewashiredasaback\-upsingerfororiginalBridesLynnMabryandDawnSilvain1978\.ShebecameamainsingerwithSilvaandJeanetteMcGruderin1979forthesecondBridesalbum\\"NeverBuyTexasfromaCowboy\\"andjoinedRickJamesontourin1981asanoriginalMaryJane\.HornewrotesongsforGeorgeClinton’ssoloalbumsonCapitolinthemid\-1980sand,sof2005,continuestotourwithP\-FunkAll\-Stars\.InItalyin1997shereleasedanalbumunderthenameBlackwood\."

\],

"reasoning":"TheclaimstatesthatJacquelineHorner\-Kwiatekwasafeaturedsoloistonanaward\-winningCDthatincludestheromanticthemeforthegame\\"CivilizationIV,\\"writtenbyChristopherTin\.ThecontextandpassagesconfirmthatJacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningCD\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"from\\"CivilizationIV\.\\"However,theclaiminaccuratelydescribes\\"BabaYetu\\"asaromantictheme\.While\\"BabaYetu\\"isindeedpartofthealbumandthegame,itisnotcharacterizedintheavailableinformationasaromantictheme\.Therefore,theclaimismostlyaccurateconcerningtheinvolvementofJacquelineHorner\-Kwiatekandtheinclusionofthethemeinthealbum,butitmischaracterizesthenatureofthesong\\"BabaYetu\.\\"",

"summary":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningalbum\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"While\\"BabaYetu\\"ispartofthealbum,itisnotspecificallydescribedasaromantictheme\."

\}

\],

"signature":\{

"instructions":"\\"Giventhefields‘claim‘,‘context‘,and‘passages‘,producethefields‘reasoning‘and‘summary‘\.For‘reasoning‘,explainthevalidationprocessoftheclaimusingthecontextandpassages,indicatingwhethertheclaimissupportedorrefuted\.For‘summary‘,conciselystatetheoutcomeofthevalidation,ensuringalignmentwiththereasoningprovided\.\\"",

"fields":\[

\{

"prefix":"Claim:",

"description":"$\{claim\}"

\},

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Passages:",

"description":"$\{passages\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Summary:",

"description":"$\{summary\}"

\}

\]

\}

\}

\}
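The field names in this listing imply a data flow between its four modules: in the demos, the `context` given to summarize2 matches `summary_1` in create\_query\_hop3, and the `summary` produced by summarize2 matches `summary_2`. The Python sketch below wires the modules together on that basis. The function name `run_three_hops`, the callable interfaces, the choice to query the first hop with the raw claim, and the stub modules are all assumptions made for illustration; the listing itself does not specify the pipeline code.

```python
from typing import Callable, Dict, List

Retriever = Callable[[str], List[str]]   # query -> passages (hypothetical interface)
Module = Callable[..., Dict[str, str]]   # keyword inputs -> named outputs


def run_three_hops(claim: str,
                   retrieve: Retriever,
                   summarize1: Module,
                   create_query_hop2: Module,
                   summarize2: Module,
                   create_query_hop3: Module) -> List[str]:
    """Collect passages over three retrieval hops, as the field names suggest."""
    passages_1 = retrieve(claim)  # assumption: hop 1 searches with the claim itself
    summary_1 = summarize1(claim=claim, passages=passages_1)["summary"]

    query_2 = create_query_hop2(claim=claim, summary_1=summary_1)["query"]
    passages_2 = retrieve(query_2)
    # summarize2 receives the first summary under the name `context`.
    summary_2 = summarize2(claim=claim, context=summary_1,
                           passages=passages_2)["summary"]

    query_3 = create_query_hop3(claim=claim, summary_1=summary_1,
                                summary_2=summary_2)["query"]
    passages_3 = retrieve(query_3)
    return passages_1 + passages_2 + passages_3


# Stub components so the sketch runs without a language model or an index.
def fake_retrieve(query: str) -> List[str]:
    return [f"doc for: {query}"]


def fake_summarize(**kwargs) -> Dict[str, str]:
    return {"summary": f"summary of {len(kwargs['passages'])} passage(s)"}


def fake_query(**kwargs) -> Dict[str, str]:
    return {"query": "follow-up on " + kwargs["claim"]}


docs = run_three_hops("X", fake_retrieve, fake_summarize, fake_query,
                      fake_summarize, fake_query)
print(docs)
```

In this reading, each module carries its own instructions and demos, so a context for HoVer is the bundle of all four module entries rather than a single prompt string.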

\[Module:create\_query\_hop3\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"claim":"LizzyHaleisfromthegroupwhohadasingle\\"IGetOff\\"\.Shealsoperformsonthesong\\"Apocalypic\\"\.",

"summary\_1":"LzzyHaleisconfirmedtobeassociatedwiththesong\\"IGetOff\\"astheleadsingerofHalestorm,thebandthatreleasedthetrack\.Althoughthepassagesdonotmention\\"Apocalyptic\\",LzzyHaleperformsonthatsongaswell,asitisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.",

"summary\_2":"LzzyHaleisindeedassociatedwiththesong\\"IGetOff\\"astheleadsingerofHalestorm,thebandthatreleasedthetrack\.Additionally,sheperformsonthesong\\"Apocalyptic\\",whichisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.",

"reasoning":"BothsummariesconfirmthatLzzyHaleistheleadsingerofHalestormandassociatedwiththesong\\"IGetOff\\"\.Theyalsoverifyherperformanceonthesong\\"Apocalyptic\\",whichisanothersinglebyHalestormfromtheiralbum\\"IntotheWildLife\\"\.Thus,theclaimisaccurateasLzzyHaleisindeedinvolvedinbothsongsasamemberofHalestorm\.",

"query":"WhatarethedetailsofLzzyHale’sinvolvementwiththebandHalestormandtheirsingles\\"IGetOff\\"and\\"Apocalyptic\\"?"

\},

\{

"augmented":true,

"claim":"LivehasmoremembersthantheAmericanrockbandwhosesecondalbumiscalledSweetSour\.",

"summary\_1":"TheclaimthatLivehasmoremembersthanthebandwiththealbum\\"SweetSour\\"istruebecauseLivehasfourmembers,whileBandofSkulls,thebandwiththe\\"SweetSour\\"album,isatriowiththreemembers\.",

"summary\_2":"TheclaimthatLivehasmoremembersthanthebandwiththe\\"SweetSour\\"albumistrue\.Livehasfourmembers,whileBandofSkullshasthreemembers\.",

"reasoning":"TheclaimisaccuratebecauseLive,anAmericanrockband,consistsoffourmembers\.BandofSkulls,thebandknownfortheiralbum\\"SweetSour,\\"isatrio,consistingofthreemembers\.Therefore,LivedoesindeedhavemoremembersthanBandofSkulls\.",

"query":"WhatarethenamesandnumberofmembersinthebandsLiveandBandofSkulls?"

\},

\{

"augmented":true,

"claim":"Theaward\-winningCD,thatJacquelineHorner\-Kwiatekwasafeaturedsoloiston,includestheromanticthemeforthegame\\"CivilizationIV,\\"writtenbyChristopherTin\.",

"summary\_1":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningCD\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"",

"summary\_2":"JacquelineHorner\-KwiatekwasafeaturedsoloistonChristopherTin’sGrammyaward\-winningalbum\\"CallingAllDawns,\\"whichincludesthetheme\\"BabaYetu\\"fromthegame\\"CivilizationIV\.\\"While\\"BabaYetu\\"ispartofthealbum,itisnotspecificallydescribedasaromantictheme\.",

"reasoning":"TheclaimstatesthatJacquelineHorner\-KwiatekwasafeaturedsoloistonaCDthatincludestheromanticthemefor\\"CivilizationIV\.\\"BothsummariesclarifythatshewasindeedasoloistonChristopherTin’sGrammyaward\-winningalbum\\"CallingAllDawns,\\"whichfeaturesthetheme\\"BabaYetu\\"from\\"CivilizationIV\.\\"However,while\\"BabaYetu\\"isincludedinthealbum,itisnotspecificallydescribedasaromantictheme\.Therefore,theterm\\"romantictheme\\"intheclaimmaybeinaccurateormisleading\.",

"query":"Whatisthenatureofthetheme\\"BabaYetu\\"from\\"CivilizationIV,\\"andisitwidelyrecognizedordescribedasaromantictheme?"

\}

\],

"signature":\{

"instructions":"\\"Carefullyexaminetheprovided‘claim‘alongsidethecontextualsummaries‘summary\_1‘and‘summary\_2‘\.Yourtaskistogenerateaprecise‘query‘thatwillenhancedocumentretrievalrelatedtotheclaim\.Indoingso,focusonidentifyinggapsintheinformationprovidedandformulatingquestionsthatdirectlytargetthesegapstoverifyorrefutetheclaim\.Ensureyourreasoningprocessincludesverifyingtheconnectionsbetweentheclaimandsummaries,identifyinganymissinginformation,andcraftingaquerythataddressesthesespecificareas\.Thequeryshouldbeclear,targeted,andfacilitateretrievalofpreciseinformationneededtovalidatetheclaim\.\\"",

"fields":\[

\{

"prefix":"Claim:",

"description":"$\{claim\}"

\},

\{

"prefix":"Summary1:",

"description":"$\{summary\_1\}"

\},

\{

"prefix":"Summary2:",

"description":"$\{summary\_2\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Query:",

"description":"$\{query\}"

\}

\]

\}

\}

\}

Listing 3: Evolved Context for HoVer \(Round 1\)

\[Module:generate\_answer\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"context":"Thegirlwhoisnotweakiswearingsomechains,butnotadress\.",

"question":"Thegirlwhoisnotweakiswearingsomejewelry,butnotadress\.",

"reasoning":"Thecontextstatesthatthegirlwhoisnotweakiswearingsomechains\.Chainsareatypeofjewelry\.Therefore,ifthegirliswearingchains,sheisindeedwearingsomejewelry\.Thecontextalsoexplicitlymentionsthatsheisnotwearingadress,whichalignswiththestatementinthequestion\.Bothconditionsinthequestionaresatisfiedbythecontext\.",

"answer":"Yes"

\},

\{

"augmented":true,

"context":"Thethreechildrenareholdingherbs\.",

"question":"Thethreechildrenareholdingplants\.",

"reasoning":"Herbsareasubsetofplants\.Sincethechildrenareholdingherbs,andherbsareconsideredplants,itfollowsthatthechildrenareindeedholdingplants\.",

"answer":"Yes"

\},

\{

"augmented":true,

"context":"Itisnotthecasethatpeopledonothaveanypcs\.",

"question":"Itisnotthecasethatpeopledonothaveanymachines\.",

"reasoning":"Thestatement\\"Itisnotthecasethatpeopledonothaveanypcs\\"impliesthatpeopledohavesomepcs\.Sincepcsareatypeofmachine,ifpeoplehavepcs,theyinherentlyhavemachines\.Therefore,thestatement\\"Itisnotthecasethatpeopledonothaveanymachines\\"logicallyfollowsfromthefactthatpeoplehavepcs\.",

"answer":"Yes"

\},

\{

"augmented":true,

"context":"Thethreechildrenareholdingplants\.",

"question":"Thethreechildrenareholdingflowers\.",

"reasoning":"Thestatementinthecontextspecifiesthatthechildrenareholdingplants,butitdoesnotspecifythetypeofplants\.Plantscanincludeflowers,butwithoutexplicitinformationthattheplantsareflowers,wecannotconcludethatthechildrenareholdingflowers\.",

"answer":"No"

\}

\],

"signature":\{

"instructions":"Imagineyouareadetectivetaskedwithsolvingahigh\-stakescasereliantonyourlogicaldeductionskills\.Usingtheprovidedcontextandquestion,employa\\"ChainofThought\\"methodologytoevaluateandderivereasoningthatsupportsorrefutesthestatementinquestion\.Carefullyconsidereachelementpresentedinthecontext,analyzeitsimplications,andgenerateawell\-supportedstep\-by\-stepreasoningprocess\.Concludewithabinary’Yes’or’No’answerbasedonwhetherthequestionalignscoherentlywiththecontextprovided\.",

"fields":\[

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Question:",

"description":"$\{question\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Answer:",

"description":"YesorNo"

\}

\]

\}

\}

\}

Listing 4: Evolved Context for SCONE \(Cluster 0\)

\[Module: generate\_answer\]

\{

"generate\_answer\.predict":\{

"demos":\[

\{

"augmented":true,

"context":"Thegirlwhoisnotweakiswearingsomechains,butnotadress\.",

"question":"Thegirlwhoisnotweakiswearingsomejewelry,butnotadress\.",

"reasoning":"Thecontextstatesthatthegirlwhoisnotweakiswearingsomechains\.Chainsareatypeofjewelry\.Therefore,ifthegirliswearingchains,sheisindeedwearingsomejewelry\.Thecontextalsoexplicitlymentionsthatsheisnotwearingadress,whichalignswiththestatementinthequestion\.Bothconditionsinthequestionaresatisfiedbythecontext\.",

"answer":"Yes"

\},

\{

"augmented":true,

"context":"Thethreechildrenareholdingherbs\.",

"question":"Thethreechildrenareholdingplants\.",

"reasoning":"Herbsareasubsetofplants\.Sincethechildrenareholdingherbs,andherbsareconsideredplants,itfollowsthatthechildrenareindeedholdingplants\.",

"answer":"Yes"

\},

\{

"augmented":true,

"context":"Itisnotthecasethatpeopledonothaveanypcs\.",

"question":"Itisnotthecasethatpeopledonothaveanymachines\.",

"reasoning":"Thestatement\\"Itisnotthecasethatpeopledonothaveanypcs\\"impliesthatpeopledohavesomepcs\.Sincepcsareatypeofmachine,ifpeoplehavepcs,theyinherentlyhavemachines\.Therefore,thestatement\\"Itisnotthecasethatpeopledonothaveanymachines\\"logicallyfollowsfromthefactthatpeoplehavepcs\.",

"answer":"Yes"

\},

\{

"augmented":true,

"context":"Thethreechildrenareholdingplants\.",

"question":"Thethreechildrenareholdingflowers\.",

"reasoning":"Thestatementinthecontextspecifiesthatthechildrenareholdingplants,butitdoesnotspecifythetypeofplants\.Plantscanincludeflowers,butwithoutexplicitinformationthattheplantsareflowers,wecannotconcludethatthechildrenareholdingflowers\.",

"answer":"No"

\}

\],

"signature":\{

"instructions":"Imagineyouareadetectivetaskedwithsolvingahigh\-stakescasereliantonyourlogicaldeductionskills\.Usingtheprovidedcontextandquestion,employa\\"ChainofThought\\"methodologytoevaluateandderivereasoningthatsupportsorrefutesthestatementinquestion\.Beginbyidentifyingthespecificelementsinthecontextandthequestion,andconsideranyimplicitassumptionsthatmaynotdirectlyapply\.Carefullyanalyzetherelationshipbetweentheseelements,takingnoteofanyspecificqualifiersorexceptionsthatmightinfluencetheoutcome\.Generateawell\-supportedstep\-by\-stepreasoningprocessthatcarefullyexaminestheserelationships\.Concludewithabinary’Yes’or’No’answerbasedonwhetherthequestionalignscoherentlyandpreciselywiththecontextprovided,avoidingassumptionsbasedsolelyonbroadcategories\.",

"fields":\[

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Question:",

"description":"$\{question\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Answer:",

"description":"YesorNo"

\}

\]

\}

\}

\}

Listing 5: Evolved Context for SCONE \(Round 1\)

\[Module: generate\_query\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"context":\[

"BattleofWattignies\|TheBattleofWattignies\(15\-16October1793\)sawaRepublicanFrencharmycommandedbyJean\-BaptisteJourdanattackaCoalitionarmydirectedbyPrinceJosiasofSaxe\-Coburg\-Saalfeld\.AftertwodaysofcombatJourdan’stroopscompelledtheHabsburgAustriancoveringforceledbyFrancoisSebastienCharlesJosephdeCroix,CountofClerfayttowithdraw\.TheWaroftheFirstCoalitionvictoryallowedtheFrenchtoraisetheSiegeofMaubeuge\.Atatimewhenfailedgeneralswereoftenexecutedorimprisoned,JourdanhadtoendureinterferencefromLazareCarnotfromtheCommitteeofPublicSafety\.Thevillage,renamedWattignies\-la\-Victoireinhonoroftheimportantsuccess,islocated9kmsoutheastofMaubeuge\.",

"AntoineBalland\|AntoineBalland\(27August1751\-3November1821\)commandedaFrenchinfantrydivisionduringtheearlyyearsoftheFrenchRevolutionaryWars\.Aformerprivate,hewaspromotedtocommandaninfantryregimentaftertheBattleofJemappes\.HebecameageneralofbrigadeinlateAugust1793andageneralofdivisionlessthanthreeweekslater\.Soonafterwards,heledadivisioninJean\-BaptisteJourdan’svictoryatWattignies\.Inthespringof1794,heledhistroopsatLeCateau,BeaumontandLandrecies\.BythistimeitwasobviousthathedidnothavethetalenttocommandacombatdivisionandhewasreplacedbyJeanBaptisteKleber\.HewasnotemployedafterJune1795anddiedatGuisein1821\.",

"PierreRaphaelPaillotdeBeauregard\|PierreRaphaelPaillotdeBeauregard\(14February1734\-30September1799\)ledaFrenchdivisionattheBattleofWattignies\.Anobleman,hejoinedtheFrenchRoyalArmyasacadetin1755andfoughtintheSevenYears’War\.Hebecamealieutenantcolonelin1779,buttwoyearslatergotintoadisputewithasuperiorofficerandwasplacedoninactiveservice\.TheFrenchRevolutionandtheWaroftheFirstCoalitionsavedhiscareer;hewaspromotedgeneralofbrigadein1792\.Heleda2,000\-mancolumnatArlonin1793butirritatedhisarmycommander\.Afterhis5,800\-strongdivisionperformedpoorlyatWattignieshewasputinprisonfor10months\.HewasbrieflyemployedagainduringtheWarintheVendeein1795beforeretiringfrommilitaryservicein1796\.",

"LazareCarnot\|LazareNicolasMarguerite,CountCarnot\(13May1753\-2August1823\)wasaFrenchpolitician,engineer,freemasonandmathematician\.Hewasknownasthe\\"OrganizerofVictory\\"intheFrenchRevolutionaryWars\.",

"HippolyteCarnot\|LazareHippolyteCarnot\(6October1801,Saint\-Omer\-16March1888\)wasaFrenchstatesman\.",

"Claude\-MarieCarnot\|Claude\-MarieCarnot\(calledCarnot\-Feulins;15July1755\-16October1836\)wasaFrenchsoldier,thebrotherofLazareCarnot\(1753\-1823\)\.HewasadeputytotheLegislativeAssemblyof1791,aRepresentativeduringtheHundredDaysof1815andaprovisionalcommissioner\(minister\)intheFrenchExecutiveCommissionof1815\."

\],

"question":"DuringtheBattleofWattignes,Jean\-BaptisteJourdanhadtoendureinterferencefromaFrenchpoliticianborninwhatyear?",

"reasoning":"ThequestionreferencesinterferencethatJean\-BaptisteJourdanfacedduringtheBattleofWattignesandasksforthebirthyearoftherelevantFrenchpolitician\.Toanswerthis,Ineedtoidentifythenameoftheinterferingpoliticianandthendeterminetheirbirthyear\.TheBattleofWattignesoccurredin1793duringtheFrenchRevolutionaryWars,andtheinterferencelikelycamefromaprominentpoliticalfigureofthatera\.",

"search\_query":"\\"BattleofWattignesJean\-BaptisteJourdaninterferenceFrenchpoliticianbirthyear\\""

\},

\{

"augmented":true,

"context":\[

"HonoredeBalzac\|HonoredeBalzac\(;\],bornHonoreBalzac,20May1799\-18August1850\)wasaFrenchnovelistandplaywright\.Thenovelsequence\\"LaComedieHumaine\\",whichpresentsapanoramaofpost\-NapoleonicFrenchlife,isgenerallyviewedashis\\"magnumopus\\"\.",

"LaComediehumaine\|LaComediehumaine\(\],\\"TheHumanComedy\\"\)isthetitleofHonoredeBalzac’s\(1799\-1850\)multi\-volumecollectionofinterlinkednovelsandstoriesdepictingFrenchsocietyintheperiodoftheRestoration\(1815\-1830\)andtheJulyMonarchy\(1830\-1848\)\.",

"MaisondeBalzac\|TheMaisondeBalzacisawriter’shousemuseumintheformerresidenceofFrenchnovelistHonoredeBalzac\(1799\-1850\)\.Itislocatedinthe16tharrondissementat47,rueRaynouard,Paris,France,andopendailyexceptMondaysandholidays;admissiontothehouseisfree,butafeeischargedforitstemporaryexhibitions\.ThenearestmetroandRERstationsarePassyandAvenueduPresidentKennedy\.",

"Yes\-noquestion\|Inlinguistics,ayes\-noquestion,formallyknownasapolarquestion,isaquestionwhoseexpectedansweriseither\\"yes\\"or\\"no\\"\.Formally,theypresentanexclusivedisjunction,apairofalternativesofwhichonlyoneisacceptable\.InEnglish,suchquestionscanbeformedinbothpositiveandnegativeforms\(e\.g\.\\"Willyoubeheretomorrow?\\"and\\"Won’tyoubeheretomorrow?\\"\)\.",

"MicrosoftQuery\|MicrosoftQueryisavisualmethodofcreatingdatabasequeriesusingexamplesbasedonatextstring,thenameofadocumentoralistofdocuments\.TheQBEsystemconvertstheuserinputintoaformaldatabasequeryusingStructuredQueryLanguage\(SQL\)onthebackend,allowingtheusertoperformpowerfulsearcheswithouthavingtoexplicitlycomposetheminSQL,andwithoutevenneedingtoknowSQL\.ItisderivedfromMosheM\.Zloof’soriginalQuerybyExample\(QBE\)implementedinthemid\-1970satIBM’sResearchCentreinYorktown,NewYork\.",

"Nofreelunchinsearchandoptimization\|Incomputationalcomplexityandoptimizationthenofreelunchtheoremisaresultthatstatesthatforcertaintypesofmathematicalproblems,thecomputationalcostoffindingasolution,averagedoverallproblemsintheclass,isthesameforanysolutionmethod\.Nosolutionthereforeoffersa\\"shortcut\\"\.Incomputing,therearecircumstancesinwhichtheoutputsofallproceduressolvingaparticulartypeofproblemarestatisticallyidentical\.Acolourfulwayofdescribingsuchacircumstance,introducedbyDavidWolpertandWilliamG\.Macreadyinconnectionwiththeproblemsofsearch"

\],

"question":"EugenedeRastignacisafictionalcharacterfromaseriesofnovelsbyanovelistborninwhichyear?",

"reasoning":"EugenedeRastignacisaprominentfictionalcharacterintheseries\\"LaComedieHumaine\\"byHonoredeBalzac\.TodeterminetheyearofbirthofBalzac,thenovelistwhocreatedRastignac,Ineedtolookuphisbiographicaldetails\.",

"search\_query":"HonoredeBalzacyearofbirth"

\},

\{

"augmented":true,

"context":\[

"CatherineWessinger\|CatherineWessingerisaprofessorofreligiousstudiesatLoyolaUniversityNewOrleanswithamainresearchfocusonmillennialism,newreligions,womenandreligionandreligionsofIndia\.Wessingerisco\-generaleditorofandservedasaconsultanttofederallawenforcementduringtheMontanaFreemenstandoff\.",

"LisaKemmerer\|LisaKemmerer,anecofeministveganphilosopher\-activist,isProfessorofPhilosophyandReligionatMontanaStateUniversityBillingsintheUnitedStates\.Sheisknownforherworkforanimalrightsandliberation,wheresheworkslargelyatthecrossroadsbetweenspeciesismandothersocialjusticeconcerns\-\-feminism,heterosexism,andclassism\.Shehasnearlyonehundredpublications\(largelyjournalarticles,anthologychapters,andencyclopediaentries\),andistheauthororeditorofninebooks\.",

"JohnCharlesPriscu\|JohnC\.Priscu\(born20September1952\)isaRomanian\-AmericanscientistwhoisthecurrentProfessorofEcologyintheDepartmentofLandResourcesandEnvironmentalSciencesatMontanaStateUniversity\.HeisaprincipalinvestigatorintheMcMurdoDryValleysLongTermEcologicalResearch\(LTER\)project\.",

"None\(MeshuggahEP\)\|NoneisanEPbySwedishmetalbandMeshuggah\.ItwasreleasedonNovember8,1994,byNuclearBlast\.Thebandstartedembracingamorecomplexapproachthatwouldlaythegroundsfortheirlaterstyle\.TheEPisoutofprint,butthefirstfourtracksarealsoavailableonthe\\"ContradictionsCollapse\\"reissuewhilethefifthtrackisonthe\\"DestroyEraseImprove\\"reissue\.Thisistheband’sfirstreleasetofeaturerhythmguitaristMartenHagstrom\.",

"None\(CloakofAlteringEP\)\|NoneisanEPbyCloakofAltering,independentlyreleasedonNovember18,2013\.",

"None,Piedmont\|Noneisacomune\(municipality\)intheMetropolitanCityofTurinintheItalianregionPiedmont,locatedabout20kmsouthwestofTurin\."

\],

"question":"TheMontanaFreemenengagedinastandoffwiththeFBI,CatherineWessingerwasaprofessorthatservedasaconsultanttofederallawenforcementduringthis,atwhichschooldidshework?",

"reasoning":"ThecontextprovidesinformationaboutCatherineWessinger,statingthatsheisaprofessorofreligiousstudiesatLoyolaUniversityNewOrleansandthatsheservedasaconsultanttofederallawenforcementduringtheMontanaFreemenstandoff\.Thequestionspecificallyaskswhichschoolsheworkedatduringthestandoff\.SincethecontextdirectlystatesherpositionatLoyolaUniversityNewOrleans,noadditionalsearchisrequired\.",

"search\_query":"None"

\}

\],

"signature":\{

"instructions":"Giventhefields‘context‘,‘question‘,producethefields‘search\_query‘\.",

"fields":\[

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Question:",

"description":"$\{question\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"SearchQuery:",

"description":"$\{search\_query\}"

\}

\]

\}

\}

\}
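The saved state above pairs a signature (an instruction string plus an ordered list of fields) with a set of few-shot demos. As a rough illustration of how such a state could be turned into a prompt, the sketch below renders the instruction, each demo, and a new instance in field order, stopping at the first field the model is expected to complete. This is a hand-rolled renderer written for this appendix, not the prompting framework's own formatting code; the function names, the `---` separator, and the `[i]` passage numbering are assumptions, and the context passage is abbreviated.

```python
# Illustrative sketch only: a hand-rolled renderer, not the framework's adapter.
# STATE is an abbreviated copy of the generate_query state (one demo, one passage).
STATE = {
    "predict": {
        "demos": [
            {
                "augmented": True,
                "context": [
                    "Honore de Balzac | Honore de Balzac (20 May 1799 - 18 August 1850) "
                    "was a French novelist and playwright."
                ],
                "question": "Eugene de Rastignac is a fictional character from a series "
                            "of novels by a novelist born in which year?",
                "reasoning": 'Eugene de Rastignac is a prominent fictional character in '
                             'the series "La Comedie Humaine" by Honore de Balzac. To '
                             'determine the year of birth of Balzac, the novelist who '
                             'created Rastignac, I need to look up his biographical details.',
                "search_query": "Honore de Balzac year of birth",
            }
        ],
        "signature": {
            "instructions": "Given the fields `context`, `question`, "
                            "produce the fields `search_query`.",
            "fields": [
                {"prefix": "Context:", "description": "${context}"},
                {"prefix": "Question:", "description": "${question}"},
                {"prefix": "Reasoning: Let's think step by step in order to",
                 "description": "${reasoning}"},
                {"prefix": "Search Query:", "description": "${search_query}"},
            ],
        },
    }
}


def field_key(field):
    """Map a field to its data key: '${name}' -> 'name', else derive from the prefix."""
    desc = field["description"]
    if desc.startswith("${") and desc.endswith("}"):
        return desc[2:-1]
    return field["prefix"].rstrip(":").strip().lower().replace(" ", "_")


def format_value(value):
    """Number list-valued fields (retrieved passages); stringify everything else."""
    if isinstance(value, list):
        return "\n".join(f"[{i}] {item}" for i, item in enumerate(value, 1))
    return str(value)


def render_block(fields, values):
    """Render fields in order; stop at the first one without a value (left open)."""
    lines = []
    for field in fields:
        key = field_key(field)
        if key not in values:
            lines.append(field["prefix"])
            break
        lines.append(f"{field['prefix']} {format_value(values[key])}")
    return "\n".join(lines)


def render_prompt(state, inputs):
    """Instruction, then every demo, then the new instance with open output fields."""
    predictor = state["predict"]
    signature = predictor["signature"]
    parts = [signature["instructions"]]
    parts += [render_block(signature["fields"], demo) for demo in predictor["demos"]]
    parts.append(render_block(signature["fields"], inputs))
    return "\n\n---\n\n".join(parts)
```

Calling `render_prompt(STATE, {"context": [...], "question": "..."})` yields a prompt that ends on the reasoning prefix, so the model continues with its reasoning and then the search query. Swapping in a different saved state changes the instruction and demos without touching the rendering code, which is what makes per-instance context assignment cheap at inference time.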

\[Module: generate\_answer\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"context":\[

"HonoredeBalzac\|HonoredeBalzac\(;\],bornHonoreBalzac,20May1799\-18August1850\)wasaFrenchnovelistandplaywright\.Thenovelsequence\\"LaComedieHumaine\\",whichpresentsapanoramaofpost\-NapoleonicFrenchlife,isgenerallyviewedashis\\"magnumopus\\"\.",

"LaComediehumaine\|LaComediehumaine\(\],\\"TheHumanComedy\\"\)isthetitleofHonoredeBalzac’s\(1799\-1850\)multi\-volumecollectionofinterlinkednovelsandstoriesdepictingFrenchsocietyintheperiodoftheRestoration\(1815\-1830\)andtheJulyMonarchy\(1830\-1848\)\.",

"MaisondeBalzac\|TheMaisondeBalzacisawriter’shousemuseumintheformerresidenceofFrenchnovelistHonoredeBalzac\(1799\-1850\)\.Itislocatedinthe16tharrondissementat47,rueRaynouard,Paris,France,andopendailyexceptMondaysandholidays;admissiontothehouseisfree,butafeeischargedforitstemporaryexhibitions\.ThenearestmetroandRERstationsarePassyandAvenueduPresidentKennedy\.",

"Yes\-noquestion\|Inlinguistics,ayes\-noquestion,formallyknownasapolarquestion,isaquestionwhoseexpectedansweriseither\\"yes\\"or\\"no\\"\.Formally,theypresentanexclusivedisjunction,apairofalternativesofwhichonlyoneisacceptable\.InEnglish,suchquestionscanbeformedinbothpositiveandnegativeforms\(e\.g\.\\"Willyoubeheretomorrow?\\"and\\"Won’tyoubeheretomorrow?\\"\)\.",

"MicrosoftQuery\|MicrosoftQueryisavisualmethodofcreatingdatabasequeriesusingexamplesbasedonatextstring,thenameofadocumentoralistofdocuments\.TheQBEsystemconvertstheuserinputintoaformaldatabasequeryusingStructuredQueryLanguage\(SQL\)onthebackend,allowingtheusertoperformpowerfulsearcheswithouthavingtoexplicitlycomposetheminSQL,andwithoutevenneedingtoknowSQL\.ItisderivedfromMosheM\.Zloof’soriginalQuerybyExample\(QBE\)implementedinthemid\-1970satIBM’sResearchCentreinYorktown,NewYork\.",

"Nofreelunchinsearchandoptimization\|Incomputationalcomplexityandoptimizationthenofreelunchtheoremisaresultthatstatesthatforcertaintypesofmathematicalproblems,thecomputationalcostoffindingasolution,averagedoverallproblemsintheclass,isthesameforanysolutionmethod\.Nosolutionthereforeoffersa\\"shortcut\\"\.Incomputing,therearecircumstancesinwhichtheoutputsofallproceduressolvingaparticulartypeofproblemarestatisticallyidentical\.Acolourfulwayofdescribingsuchacircumstance,introducedbyDavidWolpertandWilliamG\.Macreadyinconnectionwiththeproblemsofsearch"

\],

"question":"EugenedeRastignacisafictionalcharacterfromaseriesofnovelsbyanovelistborninwhichyear?",

"reasoning":"EugenedeRastignacisafictionalcharacterfrom\\"LaComediehumaine,\\"aseriesofinterconnectednovelsbyHonoredeBalzac\.ThecontextspecifiesthatHonoredeBalzacwasbornon20May1799\.Thus,thenovelistwhocreatedEugenedeRastignacwasbornin1799\.",

"answer":"1799"

\},

\{

"augmented":true,

"context":\[

"TreatiseonTolerance\|TheTreatiseonToleranceontheOccasionoftheDeathofJeanCalasfromtheJudgmentRenderedinToulouse\(\\"PiecesOriginalesConcernantlaMortdesSieursCalasdetleJugementrenduaToulouse\\"\)isaworkbyFrenchphilosopherVoltaire,publishedin1763,inwhichhecallsfortolerancebetweenreligions,andtargetsreligiousfanaticism,especiallythatoftheJesuits\(underwhomVoltairereceivedhisearlyeducation\),indictingallsuperstitionssurroundingreligions\.",

"TruthandTolerance\|TruthandTolerance\(German:\\"Glaube\-Wahrheit\-Toleranz:DasChristentumunddieWeltreligionen\\"\)isabookwrittenbyJosephCardinalRatzinger\(PopeBenedictXVI\)\.",

"ACritiqueofPureTolerance\|ACritiqueofPureToleranceisa1965bookbythephilosopherRobertPaulWolff,thesociologistBarringtonMooreJr\.,andthephilosopherHerbertMarcuse\.Thebookhasbeendescribedas\\"peculiar\\"bycommentators,anditsauthorshavebeencriticizedforadvocatingintoleranceandthesuppressionofdissentingopinions\.",

"Oedipus\(Voltaireplay\)\|Oedipus\(French:\\"Oedipe\\"\)isatragedybytheFrenchdramatistandphilosopherVoltairethatwasfirstperformedin1718\.Itwashisfirstplayandthefirstliteraryworkforwhichheusedthepen\-nameVoltaire\(hisrealnamewasFrancois\-MarieArouet\)\.InadaptingSophocles’Atheniantragedy\\"OedipusRex\\",Voltaireattemptedtorationalisetheplotandmotivationofitscharacters\.Inaletterof1719heindicatedthathefounditimprobablethatthemurderofLaiushadnotbeeninvestigatedearlierandthatOedipusshouldtakesolongtounderstandtheoracle’sclearpronouncement\.VoltaireaddsasubplotconcerningtheloveofPhilocteteforJocaste\.Healsoreducestheprominenceofthethemeofincest\.",

"Voltaire\|Francois\-MarieArouet\(\];21November1694\-30May1778\),knownbyhis\\"nomdeplume\\"Voltaire\(;\]\),wasaFrenchEnlightenmentwriter,historian,andphilosopherfamousforhiswit,hisattacksontheestablishedCatholicChurch,andhisadvocacyoffreedomofreligion,freedomofspeech,andseparationofchurchandstate\.",

"Voltaire\(disambiguation\)\|VoltaireisthepennameofFrenchEnlightenmentwriter,historian,andphilosopherFrancois\-MarieArouet\."

\],

"question":"Whatistherealnameoftheauthorof\\"TreatiseonTolerance\\"?",

"reasoning":"Thecontextprovidesinformationabout\\"TreatiseonTolerance,\\"statingthatitisaworkbyVoltaire\.Additionally,thecontextidentifiesVoltaire’srealnameasFrancois\-MarieArouet\.Thus,therealnameoftheauthorof\\"TreatiseonTolerance\\"isFrancois\-MarieArouet\.",

"answer":"Francois\-MarieArouet"

\},

\{

"augmented":true,

"context":\[

"Odetta\|OdettaHolmes\(December31,1930\-December2,2008\),knownasOdetta,wasanAmericansinger,actress,guitarist,songwriter,andacivilandhumanrightsactivist,oftenreferredtoas\\"TheVoiceoftheCivilRightsMovement\\"\.HermusicalrepertoireconsistedlargelyofAmericanfolkmusic,blues,jazz,andspirituals\.AnimportantfigureintheAmericanfolkmusicrevivalofthe1950sand1960s,sheinfluencedmanyofthekeyfiguresofthefolk\-revivalofthattime,includingBobDylan,JoanBaez,MavisStaples,andJanisJoplin\.\\"Time\\"magazineincludedhersong\\"TakeThisHammer\\"onitslistofthe100GreatestPopularSongs,statingthat\\"RosaParkswasherNo\.1fan,andMartinLutherKingJr\.calledherthequeenofAmericanfolkmusic\.\\"",

"NathanielH\.Odell\|NathanielHolmesOdell\(October10,1828\-October30,1904\)wasaU\.S\.RepresentativefromNewYork\.",

"OdetteMonard\|OdetteMonard\(born1903,dateofdeathunknown\)wasaFrenchswimmer\.Shecompetedinthewomen’s200metrebreaststrokeeventatthe1924SummerOlympics\.",

"RosaParks\|RosaLouiseMcCauleyParks\(February4,1913\-October24,2005\)wasanactivistintheCivilRightsMovement,whomtheUnitedStatesCongresscalled\\"thefirstladyofcivilrights\\"and\\"themotherofthefreedommovement\\"\.",

"RosaParks\(disambiguation\)\|RosaParks\(1913\-2005\)wasanAmericancivilrightsactivist\.",

"EdnaGriffin\|EdnaMayGriffin\(1909\-February8,2000\)wasanAmericancivilrightspioneerandhumanrightsactivist\.Knownasthe\\"RosaParksofIowa\\",hercourtbattleagainsttheKatzDrugStoreinDesMoinesin1948,\\"StateofIowav\.Katz\\",foreshadowedthecivilrightsmovementandbecamealandmarkcasebeforetheIowaSupremeCourt\."

\],

"question":"InwhatyeardidOdettaHolmes’\\"No\.1fan\\"die?",

"reasoning":"ThecontextmentionsthatRosaParkswasOdettaHolmes’\\"No\.1fan\.\\"RosaParks’birthanddeathdatesareprovidedinthecontext:shewasbornonFebruary4,1913,anddiedonOctober24,2005\.Therefore,OdettaHolmes’\\"No\.1fan,\\"RosaParks,diedin2005\.",

"answer":"2005"

\},

\{

"augmented":true,

"context":\[

"WilbertAwdry\|WilbertVereAwdry,OBE\(15June1911\-21March1997\)wasanEnglishAnglicancleric,railwayenthusiast,andchildren’sauthor\.BetterknownastheReverendW\.Awdry,hewasthecreatorofThomastheTankEngine,thecentralfigureinhis\\"RailwaySeries\\"\.",

"Wilbert\|Wilbertisamasculinegivenname\.ItisderivedfromtheGermanicwords\\"wil\\"\-\\"will,desire\\"and\\"beraht\\"\-\\"bright,famous\\"\.ItsOldEnglishversionwouldhavebeen\\"Willbeorht\.\\"Wilbertmayreferto:",

"WilbertDoneleyko\|WilbertGeorgeDoneleyko\(February13,1913\-January27,1990\),lastnamealsospelledDoneley,wasapoliticianinManitoba,Canada\.",

"N/a\|n/aorN/Aisacommonabbreviationintablesandlistsforthephrasenotapplicable,notavailable,ornoanswer\.Itisusedtoindicatewheninformationinacertaintablecellisnotprovided,eitherbecauseitdoesnotapplytoaparticularcaseinquestionorbecausetheanswerisnotavailable\.",

"NoIntroductionNecessary\|NoIntroductionNecessaryisa\\"supersession\\"studioalbum\.InitiallyitwasconceivedasdebutalbumbyKeithDeGroot\(akaGerryTemple\),anewsingerdiscoveredbyproducerRegTracey\.ItfeaturedAlbertLeeandBigJimSullivanonguitars,JohnPaulJonesonbass,NickyHopkinsonkeyboards,ChrisHughesonsaxophoneandClemCattiniondrums\.ThesessionstookplaceatOlympicStudiosinLondon\.NinesongshadbeencompletedwhenneitherLeenorSullivanweregoingtobeabletotakepartinnextbookedsession\.TraceycalledJimmyPage\(whohadalreadyworkedonBeck’sBolerowithJonesandHopkins\)forthelastroundofrecordings\.Page’sparticipationlikelytookplaceineitherSeptemberorDecember1967duringbreaksfromtouringwiththeYardbirds\.",

"A\-not\-Aquestion\|Inlinguistics,anA\-not\-Aquestionisapolarquestionthatofferstwooppositepossibilitiesfortheanswer\.ThisdisjunctivequestionispredominantlyfoundinSiniticandsomeAltaiclanguagesthatoffersachoicebetweenanaffirmativepredicateanditsnegativecounterpart\.Theyarefunctionallyregardedasatypeof\\"yes/no\\"question,sincetheyareverysimilartoalargeextent\.\\"A\-not\-A\\"questionsareofteninterpretedashavinga’neutral’presuppositionorisusedinaneutralcontext\.ThismeansthatthepersonaskingtheA\-not\-Aquestiondoesnotassumethetruthvalueofthepropositionexpressedbythequestion\."

\],

"question":"WhatwasthemiddlenameofthemanwhowrotethebooksthatinspiredtheTVspin\-offThomasandFriends?",

"reasoning":"ThecontextprovidesinformationaboutWilbertVereAwdry,whoisidentifiedasthecreatorofThomastheTankEngine,thecentralcharacterinthe\\"RailwaySeries\.\\"ThesebooksinspiredtheTVspin\-off\\"ThomasandFriends\.\\"Hisfullname,includinghismiddlename,ismentionedintheprovideddetails\.",

"answer":"Vere"

\}

\],

"signature":\{

"instructions":"Giventhefields‘context‘and‘question‘,analyzetheprovidedinformationandperformmulti\-hopreasoningtogenerateastep\-by\-stepexplanation\(‘reasoning‘\)andthefinalanswer\(‘answer‘\)tothequery\.Ensurethatthereasoningexplicitlyjustifieshowtheanswerisderivedfromthecontext\.",

"fields":\[

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Question:",

"description":"$\{question\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Answer:",

"description":"$\{answer\}"

\}

\]

\}

\}

\}

Listing 6: Evolved Context for HotpotQA \(Cluster 0\)

\[Module: generate\_query\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"context":\[

"BattleofWattignies\|TheBattleofWattignies\(15\-16October1793\)sawaRepublicanFrencharmycommandedbyJean\-BaptisteJourdanattackaCoalitionarmydirectedbyPrinceJosiasofSaxe\-Coburg\-Saalfeld\.AftertwodaysofcombatJourdan’stroopscompelledtheHabsburgAustriancoveringforceledbyFrancoisSebastienCharlesJosephdeCroix,CountofClerfayttowithdraw\.TheWaroftheFirstCoalitionvictoryallowedtheFrenchtoraisetheSiegeofMaubeuge\.Atatimewhenfailedgeneralswereoftenexecutedorimprisoned,JourdanhadtoendureinterferencefromLazareCarnotfromtheCommitteeofPublicSafety\.Thevillage,renamedWattignies\-la\-Victoireinhonoroftheimportantsuccess,islocated9kmsoutheastofMaubeuge\.",

"AntoineBalland\|AntoineBalland\(27August1751\-3November1821\)commandedaFrenchinfantrydivisionduringtheearlyyearsoftheFrenchRevolutionaryWars\.Aformerprivate,hewaspromotedtocommandaninfantryregimentaftertheBattleofJemappes\.HebecameageneralofbrigadeinlateAugust1793andageneralofdivisionlessthanthreeweekslater\.Soonafterwards,heledadivisioninJean\-BaptisteJourdan’svictoryatWattignies\.Inthespringof1794,heledhistroopsatLeCateau,BeaumontandLandrecies\.BythistimeitwasobviousthathedidnothavethetalenttocommandacombatdivisionandhewasreplacedbyJeanBaptisteKleber\.HewasnotemployedafterJune1795anddiedatGuisein1821\.",

"PierreRaphaelPaillotdeBeauregard\|PierreRaphaelPaillotdeBeauregard\(14February1734\-30September1799\)ledaFrenchdivisionattheBattleofWattignies\.Anobleman,hejoinedtheFrenchRoyalArmyasacadetin1755andfoughtintheSevenYears’War\.Hebecamealieutenantcolonelin1779,buttwoyearslatergotintoadisputewithasuperiorofficerandwasplacedoninactiveservice\.TheFrenchRevolutionandtheWaroftheFirstCoalitionsavedhiscareer;hewaspromotedgeneralofbrigadein1792\.Heleda2,000\-mancolumnatArlonin1793butirritatedhisarmycommander\.Afterhis5,800\-strongdivisionperformedpoorlyatWattignieshewasputinprisonfor10months\.HewasbrieflyemployedagainduringtheWarintheVendeein1795beforeretiringfrommilitaryservicein1796\.",

"LazareCarnot\|LazareNicolasMarguerite,CountCarnot\(13May1753\-2August1823\)wasaFrenchpolitician,engineer,freemasonandmathematician\.Hewasknownasthe\\"OrganizerofVictory\\"intheFrenchRevolutionaryWars\.",

"HippolyteCarnot\|LazareHippolyteCarnot\(6October1801,Saint\-Omer\-16March1888\)wasaFrenchstatesman\.",

"Claude\-MarieCarnot\|Claude\-MarieCarnot\(calledCarnot\-Feulins;15July1755\-16October1836\)wasaFrenchsoldier,thebrotherofLazareCarnot\(1753\-1823\)\.HewasadeputytotheLegislativeAssemblyof1791,aRepresentativeduringtheHundredDaysof1815andaprovisionalcommissioner\(minister\)intheFrenchExecutiveCommissionof1815\."

\],

"question":"DuringtheBattleofWattignes,Jean\-BaptisteJourdanhadtoendureinterferencefromaFrenchpoliticianborninwhatyear?",

"reasoning":"ThequestionreferencesinterferencethatJean\-BaptisteJourdanfacedduringtheBattleofWattignesandasksforthebirthyearoftherelevantFrenchpolitician\.Toanswerthis,Ineedtoidentifythenameoftheinterferingpoliticianandthendeterminetheirbirthyear\.TheBattleofWattignesoccurredin1793duringtheFrenchRevolutionaryWars,andtheinterferencelikelycamefromaprominentpoliticalfigureofthatera\.",

"search\_query":"\\"BattleofWattignesJean\-BaptisteJourdaninterferenceFrenchpoliticianbirthyear\\""

\},

\{

"augmented":true,

"context":\[

"HonoredeBalzac\|HonoredeBalzac\(;\],bornHonoreBalzac,20May1799\-18August1850\)wasaFrenchnovelistandplaywright\.Thenovelsequence\\"LaComedieHumaine\\",whichpresentsapanoramaofpost\-NapoleonicFrenchlife,isgenerallyviewedashis\\"magnumopus\\"\.",

"LaComediehumaine\|LaComediehumaine\(\],\\"TheHumanComedy\\"\)isthetitleofHonoredeBalzac’s\(1799\-1850\)multi\-volumecollectionofinterlinkednovelsandstoriesdepictingFrenchsocietyintheperiodoftheRestoration\(1815\-1830\)andtheJulyMonarchy\(1830\-1848\)\.",

"MaisondeBalzac\|TheMaisondeBalzacisawriter’shousemuseumintheformerresidenceofFrenchnovelistHonoredeBalzac\(1799\-1850\)\.Itislocatedinthe16tharrondissementat47,rueRaynouard,Paris,France,andopendailyexceptMondaysandholidays;admissiontothehouseisfree,butafeeischargedforitstemporaryexhibitions\.ThenearestmetroandRERstationsarePassyandAvenueduPresidentKennedy\.",

"Yes\-noquestion\|Inlinguistics,ayes\-noquestion,formallyknownasapolarquestion,isaquestionwhoseexpectedansweriseither\\"yes\\"or\\"no\\"\.Formally,theypresentanexclusivedisjunction,apairofalternativesofwhichonlyoneisacceptable\.InEnglish,suchquestionscanbeformedinbothpositiveandnegativeforms\(e\.g\.\\"Willyoubeheretomorrow?\\"and\\"Won’tyoubeheretomorrow?\\"\)\.",

"MicrosoftQuery\|MicrosoftQueryisavisualmethodofcreatingdatabasequeriesusingexamplesbasedonatextstring,thenameofadocumentoralistofdocuments\.TheQBEsystemconvertstheuserinputintoaformaldatabasequeryusingStructuredQueryLanguage\(SQL\)onthebackend,allowingtheusertoperformpowerfulsearcheswithouthavingtoexplicitlycomposetheminSQL,andwithoutevenneedingtoknowSQL\.ItisderivedfromMosheM\.Zloof’soriginalQuerybyExample\(QBE\)implementedinthemid\-1970satIBM’sResearchCentreinYorktown,NewYork\.",

"Nofreelunchinsearchandoptimization\|Incomputationalcomplexityandoptimizationthenofreelunchtheoremisaresultthatstatesthatforcertaintypesofmathematicalproblems,thecomputationalcostoffindingasolution,averagedoverallproblemsintheclass,isthesameforanysolutionmethod\.Nosolutionthereforeoffersa\\"shortcut\\"\.Incomputing,therearecircumstancesinwhichtheoutputsofallproceduressolvingaparticulartypeofproblemarestatisticallyidentical\.Acolourfulwayofdescribingsuchacircumstance,introducedbyDavidWolpertandWilliamG\.Macreadyinconnectionwiththeproblemsofsearch"

\],

"question":"EugenedeRastignacisafictionalcharacterfromaseriesofnovelsbyanovelistborninwhichyear?",

"reasoning":"EugenedeRastignacisaprominentfictionalcharacterintheseries\\"LaComedieHumaine\\"byHonoredeBalzac\.TodeterminetheyearofbirthofBalzac,thenovelistwhocreatedRastignac,Ineedtolookuphisbiographicaldetails\.",

"search\_query":"HonoredeBalzacyearofbirth"

\},

\{

"augmented":true,

"context":\[

"CatherineWessinger\|CatherineWessingerisaprofessorofreligiousstudiesatLoyolaUniversityNewOrleanswithamainresearchfocusonmillennialism,newreligions,womenandreligionandreligionsofIndia\.Wessingerisco\-generaleditorofandservedasaconsultanttofederallawenforcementduringtheMontanaFreemenstandoff\.",

"LisaKemmerer\|LisaKemmerer,anecofeministveganphilosopher\-activist,isProfessorofPhilosophyandReligionatMontanaStateUniversityBillingsintheUnitedStates\.Sheisknownforherworkforanimalrightsandliberation,wheresheworkslargelyatthecrossroadsbetweenspeciesismandothersocialjusticeconcerns\-\-feminism,heterosexism,andclassism\.Shehasnearlyonehundredpublications\(largelyjournalarticles,anthologychapters,andencyclopediaentries\),andistheauthororeditorofninebooks\.",

"JohnCharlesPriscu\|JohnC\.Priscu\(born20September1952\)isaRomanian\-AmericanscientistwhoisthecurrentProfessorofEcologyintheDepartmentofLandResourcesandEnvironmentalSciencesatMontanaStateUniversity\.HeisaprincipalinvestigatorintheMcMurdoDryValleysLongTermEcologicalResearch\(LTER\)project\.",

"None\(MeshuggahEP\)\|NoneisanEPbySwedishmetalbandMeshuggah\.ItwasreleasedonNovember8,1994,byNuclearBlast\.Thebandstartedembracingamorecomplexapproachthatwouldlaythegroundsfortheirlaterstyle\.TheEPisoutofprint,butthefirstfourtracksarealsoavailableonthe\\"ContradictionsCollapse\\"reissuewhilethefifthtrackisonthe\\"DestroyEraseImprove\\"reissue\.Thisistheband’sfirstreleasetofeaturerhythmguitaristMartenHagstrom\.",

"None\(CloakofAlteringEP\)\|NoneisanEPbyCloakofAltering,independentlyreleasedonNovember18,2013\.",

"None,Piedmont\|Noneisacomune\(municipality\)intheMetropolitanCityofTurinintheItalianregionPiedmont,locatedabout20kmsouthwestofTurin\."

\],

"question":"TheMontanaFreemenengagedinastandoffwiththeFBI,CatherineWessingerwasaprofessorthatservedasaconsultanttofederallawenforcementduringthis,atwhichschooldidshework?",

"reasoning":"ThecontextprovidesinformationaboutCatherineWessinger,statingthatsheisaprofessorofreligiousstudiesatLoyolaUniversityNewOrleansandthatsheservedasaconsultanttofederallawenforcementduringtheMontanaFreemenstandoff\.Thequestionspecificallyaskswhichschoolsheworkedatduringthestandoff\.SincethecontextdirectlystatesherpositionatLoyolaUniversityNewOrleans,noadditionalsearchisrequired\.",

"search\_query":"None"

\}

\],

"signature":\{

"instructions":"Giventhefields‘context‘and‘question‘,producethefield‘search\_query‘\.Yourtaskistogenerateaconciseandprecisesearchquerydirectlyalignedwiththequestionanddesignedtoretrievethespecificfactualinformationrequiredtoanswerit\.Followtheseguidelines:

1\.\*\*ExtractMissingorAmbiguousInformation\*\*:Focusonidentifyinggapsorambiguitiesinthe‘context‘thatpreventansweringthequestion\.The‘search\_query‘shouldtargetretrievingthemissinginformationratherthanreiteratingwhatisalreadyinthe‘context‘\.

2\.\*\*PreserveKeyEntitiesandRelationships\*\*:Ensureallentities\(e\.g\.,names,dates,titles\)andtheirrelationshipsfromthequestionareaccuratelyincorporatedintothe‘search\_query‘\.Avoidalteringoromittingcriticaldetails\.

3\.\*\*AvoidReasoningorAssumptions\*\*:Donotincludereasoning,explanations,orinferredconclusionsinthe‘search\_query‘\.Thequeryshouldremainneutralandfactual,aimedsolelyatfindingthemissingpiecesofinformation\.

4\.\*\*AdapttoSpecificity\*\*:Whenthequestioncontainshighlyspecificdetails\(e\.g\.,dates,names,oruniqueidentifiers\),ensuretheseareincludedverbatiminthe‘search\_query‘\.Avoidgeneralizingorbroadeningthescopeunnecessarily\.

5\.\*\*AvoidRedundancies\*\*:Donotincludeinformationalreadyfullyresolvedinthe‘context‘\.The‘search\_query‘shouldfocusexclusivelyonunresolvedaspectsofthequestion\.

6\.\*\*ExamplesClarification\*\*:Forcaseswherethequestionexplicitlyreferencesanentityordetailabsentinthe‘context‘\(e\.g\.,\\"MelGroomes’almamater\\"\),prioritizeconstructingaquerythatcapturesthespecificmissingentityanditsrelationshiptothequestion\(e\.g\.,‘\\"MelGroomesalmamatercurrentenrollment\\"‘\)\.Forquestionsinvolvingacombinationofspecifictraits\(e\.g\.,\\"AmericanactressbornJune19,1972,in’TheIn\-Laws’\\"\),ensurealldetailsfromthequestionareincludedverbatiminthequery\(e\.g\.,‘\\"AmericanactressbornJune19,1972TheIn\-Laws\\"‘\)\.

Byadheringtotheseprinciples,the‘search\_query‘shouldbeoptimizedtoretrievetheprecisedatarequiredtoanswerthequestionaccurately,avoidingunnecessaryassumptionsorirrelevantdetails\.",

"fields":\[

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Question:",

"description":"$\{question\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"SearchQuery:",

"description":"$\{search\_query\}"

\}

\]

\}

\}

\}

\[Module:generate\_answer\]

\{

"predict":\{

"demos":\[

\{

"augmented":true,

"context":\[

"HonoredeBalzac\|HonoredeBalzac\(;\],bornHonoreBalzac,20May1799\-18August1850\)wasaFrenchnovelistandplaywright\.Thenovelsequence\\"LaComedieHumaine\\",whichpresentsapanoramaofpost\-NapoleonicFrenchlife,isgenerallyviewedashis\\"magnumopus\\"\.",

"LaComediehumaine\|LaComediehumaine\(\],\\"TheHumanComedy\\"\)isthetitleofHonoredeBalzac’s\(1799\-1850\)multi\-volumecollectionofinterlinkednovelsandstoriesdepictingFrenchsocietyintheperiodoftheRestoration\(1815\-1830\)andtheJulyMonarchy\(1830\-1848\)\.",

"MaisondeBalzac\|TheMaisondeBalzacisawriter’shousemuseumintheformerresidenceofFrenchnovelistHonoredeBalzac\(1799\-1850\)\.Itislocatedinthe16tharrondissementat47,rueRaynouard,Paris,France,andopendailyexceptMondaysandholidays;admissiontothehouseisfree,butafeeischargedforitstemporaryexhibitions\.ThenearestmetroandRERstationsarePassyandAvenueduPresidentKennedy\.",

"Yes\-noquestion\|Inlinguistics,ayes\-noquestion,formallyknownasapolarquestion,isaquestionwhoseexpectedansweriseither\\"yes\\"or\\"no\\"\.Formally,theypresentanexclusivedisjunction,apairofalternativesofwhichonlyoneisacceptable\.InEnglish,suchquestionscanbeformedinbothpositiveandnegativeforms\(e\.g\.\\"Willyoubeheretomorrow?\\"and\\"Won’tyoubeheretomorrow?\\"\)\.",

"MicrosoftQuery\|MicrosoftQueryisavisualmethodofcreatingdatabasequeriesusingexamplesbasedonatextstring,thenameofadocumentoralistofdocuments\.TheQBEsystemconvertstheuserinputintoaformaldatabasequeryusingStructuredQueryLanguage\(SQL\)onthebackend,allowingtheusertoperformpowerfulsearcheswithouthavingtoexplicitlycomposetheminSQL,andwithoutevenneedingtoknowSQL\.ItisderivedfromMosheM\.Zloof’soriginalQuerybyExample\(QBE\)implementedinthemid\-1970satIBM’sResearchCentreinYorktown,NewYork\.",

"Nofreelunchinsearchandoptimization\|Incomputationalcomplexityandoptimizationthenofreelunchtheoremisaresultthatstatesthatforcertaintypesofmathematicalproblems,thecomputationalcostoffindingasolution,averagedoverallproblemsintheclass,isthesameforanysolutionmethod\.Nosolutionthereforeoffersa\\"shortcut\\"\.Incomputing,therearecircumstancesinwhichtheoutputsofallproceduressolvingaparticulartypeofproblemarestatisticallyidentical\.Acolourfulwayofdescribingsuchacircumstance,introducedbyDavidWolpertandWilliamG\.Macreadyinconnectionwiththeproblemsofsearch"

\],

"question":"EugenedeRastignacisafictionalcharacterfromaseriesofnovelsbyanovelistborninwhichyear?",

"reasoning":"EugenedeRastignacisafictionalcharacterfrom\\"LaComediehumaine,\\"aseriesofinterconnectednovelsbyHonoredeBalzac\.ThecontextspecifiesthatHonoredeBalzacwasbornon20May1799\.Thus,thenovelistwhocreatedEugenedeRastignacwasbornin1799\.",

"answer":"1799"

\},

\{

"augmented":true,

"context":\[

"TreatiseonTolerance\|TheTreatiseonToleranceontheOccasionoftheDeathofJeanCalasfromtheJudgmentRenderedinToulouse\(\\"PiecesOriginalesConcernantlaMortdesSieursCalasdetleJugementrenduaToulouse\\"\)isaworkbyFrenchphilosopherVoltaire,publishedin1763,inwhichhecallsfortolerancebetweenreligions,andtargetsreligiousfanaticism,especiallythatoftheJesuits\(underwhomVoltairereceivedhisearlyeducation\),indictingallsuperstitionssurroundingreligions\.",

"TruthandTolerance\|TruthandTolerance\(German:\\"Glaube\-Wahrheit\-Toleranz:DasChristentumunddieWeltreligionen\\"\)isabookwrittenbyJosephCardinalRatzinger\(PopeBenedictXVI\)\.",

"ACritiqueofPureTolerance\|ACritiqueofPureToleranceisa1965bookbythephilosopherRobertPaulWolff,thesociologistBarringtonMooreJr\.,andthephilosopherHerbertMarcuse\.Thebookhasbeendescribedas\\"peculiar\\"bycommentators,anditsauthorshavebeencriticizedforadvocatingintoleranceandthesuppressionofdissentingopinions\.",

"Oedipus\(Voltaireplay\)\|Oedipus\(French:\\"Oedipe\\"\)isatragedybytheFrenchdramatistandphilosopherVoltairethatwasfirstperformedin1718\.Itwashisfirstplayandthefirstliteraryworkforwhichheusedthepen\-nameVoltaire\(hisrealnamewasFrancois\-MarieArouet\)\.InadaptingSophocles’Atheniantragedy\\"OedipusRex\\",Voltaireattemptedtorationalisetheplotandmotivationofitscharacters\.Inaletterof1719heindicatedthathefounditimprobablethatthemurderofLaiushadnotbeeninvestigatedearlierandthatOedipusshouldtakesolongtounderstandtheoracle’sclearpronouncement\.VoltaireaddsasubplotconcerningtheloveofPhilocteteforJocaste\.Healsoreducestheprominenceofthethemeofincest\.",

"Voltaire\|Francois\-MarieArouet\(\];21November1694\-30May1778\),knownbyhis\\"nomdeplume\\"Voltaire\(;\]\),wasaFrenchEnlightenmentwriter,historian,andphilosopherfamousforhiswit,hisattacksontheestablishedCatholicChurch,andhisadvocacyoffreedomofreligion,freedomofspeech,andseparationofchurchandstate\.",

"Voltaire\(disambiguation\)\|VoltaireisthepennameofFrenchEnlightenmentwriter,historian,andphilosopherFrancois\-MarieArouet\."

\],

"question":"Whatistherealnameoftheauthorof\\"TreatiseonTolerance\\"?",

"reasoning":"Thecontextprovidesinformationabout\\"TreatiseonTolerance,\\"statingthatitisaworkbyVoltaire\.Additionally,thecontextidentifiesVoltaire’srealnameasFrancois\-MarieArouet\.Thus,therealnameoftheauthorof\\"TreatiseonTolerance\\"isFrancois\-MarieArouet\.",

"answer":"Francois\-MarieArouet"

\},

\{

"augmented":true,

"context":\[

"Odetta\|OdettaHolmes\(December31,1930\-December2,2008\),knownasOdetta,wasanAmericansinger,actress,guitarist,songwriter,andacivilandhumanrightsactivist,oftenreferredtoas\\"TheVoiceoftheCivilRightsMovement\\"\.HermusicalrepertoireconsistedlargelyofAmericanfolkmusic,blues,jazz,andspirituals\.AnimportantfigureintheAmericanfolkmusicrevivalofthe1950sand1960s,sheinfluencedmanyofthekeyfiguresofthefolk\-revivalofthattime,includingBobDylan,JoanBaez,MavisStaples,andJanisJoplin\.\\"Time\\"magazineincludedhersong\\"TakeThisHammer\\"onitslistofthe100GreatestPopularSongs,statingthat\\"RosaParkswasherNo\.1fan,andMartinLutherKingJr\.calledherthequeenofAmericanfolkmusic\.\\"",

"NathanielH\.Odell\|NathanielHolmesOdell\(October10,1828\-October30,1904\)wasaU\.S\.RepresentativefromNewYork\.",

"OdetteMonard\|OdetteMonard\(born1903,dateofdeathunknown\)wasaFrenchswimmer\.Shecompetedinthewomen’s200metrebreaststrokeeventatthe1924SummerOlympics\.",

"RosaParks\|RosaLouiseMcCauleyParks\(February4,1913\-October24,2005\)wasanactivistintheCivilRightsMovement,whomtheUnitedStatesCongresscalled\\"thefirstladyofcivilrights\\"and\\"themotherofthefreedommovement\\"\.",

"RosaParks\(disambiguation\)\|RosaParks\(1913\-2005\)wasanAmericancivilrightsactivist\.",

"EdnaGriffin\|EdnaMayGriffin\(1909\-February8,2000\)wasanAmericancivilrightspioneerandhumanrightsactivist\.Knownasthe\\"RosaParksofIowa\\",hercourtbattleagainsttheKatzDrugStoreinDesMoinesin1948,\\"StateofIowav\.Katz\\",foreshadowedthecivilrightsmovementandbecamealandmarkcasebeforetheIowaSupremeCourt\."

\],

"question":"InwhatyeardidOdettaHolmes’\\"No\.1fan\\"die?",

"reasoning":"ThecontextmentionsthatRosaParkswasOdettaHolmes’\\"No\.1fan\.\\"RosaParks’birthanddeathdatesareprovidedinthecontext:shewasbornonFebruary4,1913,anddiedonOctober24,2005\.Therefore,OdettaHolmes’\\"No\.1fan,\\"RosaParks,diedin2005\.",

"answer":"2005"

\},

\{

"augmented":true,

"context":\[

"WilbertAwdry\|WilbertVereAwdry,OBE\(15June1911\-21March1997\)wasanEnglishAnglicancleric,railwayenthusiast,andchildren’sauthor\.BetterknownastheReverendW\.Awdry,hewasthecreatorofThomastheTankEngine,thecentralfigureinhis\\"RailwaySeries\\"\.",

"Wilbert\|Wilbertisamasculinegivenname\.ItisderivedfromtheGermanicwords\\"wil\\"\-\\"will,desire\\"and\\"beraht\\"\-\\"bright,famous\\"\.ItsOldEnglishversionwouldhavebeen\\"Willbeorht\.\\"Wilbertmayreferto:",

"WilbertDoneleyko\|WilbertGeorgeDoneleyko\(February13,1913\-January27,1990\),lastnamealsospelledDoneley,wasapoliticianinManitoba,Canada\.",

"N/a\|n/aorN/Aisacommonabbreviationintablesandlistsforthephrasenotapplicable,notavailable,ornoanswer\.Itisusedtoindicatewheninformationinacertaintablecellisnotprovided,eitherbecauseitdoesnotapplytoaparticularcaseinquestionorbecausetheanswerisnotavailable\.",

"NoIntroductionNecessary\|NoIntroductionNecessaryisa\\"supersession\\"studioalbum\.InitiallyitwasconceivedasdebutalbumbyKeithDeGroot\(akaGerryTemple\),anewsingerdiscoveredbyproducerRegTracey\.ItfeaturedAlbertLeeandBigJimSullivanonguitars,JohnPaulJonesonbass,NickyHopkinsonkeyboards,ChrisHughesonsaxophoneandClemCattiniondrums\.ThesessionstookplaceatOlympicStudiosinLondon\.NinesongshadbeencompletedwhenneitherLeenorSullivanweregoingtobeabletotakepartinnextbookedsession\.TraceycalledJimmyPage\(whohadalreadyworkedonBeck’sBolerowithJonesandHopkins\)forthelastroundofrecordings\.Page’sparticipationlikelytookplaceineitherSeptemberorDecember1967duringbreaksfromtouringwiththeYardbirds\.",

"A\-not\-Aquestion\|Inlinguistics,anA\-not\-Aquestionisapolarquestionthatofferstwooppositepossibilitiesfortheanswer\.ThisdisjunctivequestionispredominantlyfoundinSiniticandsomeAltaiclanguagesthatoffersachoicebetweenanaffirmativepredicateanditsnegativecounterpart\.Theyarefunctionallyregardedasatypeof\\"yes/no\\"question,sincetheyareverysimilartoalargeextent\.\\"A\-not\-A\\"questionsareofteninterpretedashavinga’neutral’presuppositionorisusedinaneutralcontext\.ThismeansthatthepersonaskingtheA\-not\-Aquestiondoesnotassumethetruthvalueofthepropositionexpressedbythequestion\."

\],

"question":"WhatwasthemiddlenameofthemanwhowrotethebooksthatinspiredtheTVspin\-offThomasandFriends?",

"reasoning":"ThecontextprovidesinformationaboutWilbertVereAwdry,whoisidentifiedasthecreatorofThomastheTankEngine,thecentralcharacterinthe\\"RailwaySeries\.\\"ThesebooksinspiredtheTVspin\-off\\"ThomasandFriends\.\\"Hisfullname,includinghismiddlename,ismentionedintheprovideddetails\.",

"answer":"Vere"

\}

\],

"signature":\{

"instructions":"Giventhefields‘context‘and‘question‘,analyzetheprovidedinformationandperformmulti\-hopreasoningtogenerateastep\-by\-stepexplanation\(‘reasoning‘\)andthefinalanswer\(‘answer‘\)tothequery\.Ensurethat:

1\.\*\*PrecisioninTerminologyandDataExtraction\*\*:Carefullyextractandusepreciseandcompletedetailsdirectlyfromthecontext\.Payparticularattentiontonumericdata,dates,propernouns,entitynames,andotherkeydetails\.Donotrelyonassumptionsorexternalknowledgeunlessexplicitlyrequiredbythequery\.

2\.\*\*ContextualCompleteness\*\*:Rigorouslyvalidatethatallelementsofthereasoningandthefinalanswerarefullysupportedbythecontext\.Ifthecontextdoesnotdirectlyprovidethenecessaryinformation,explicitlystatewhatismissingandprovideanappropriatefallbackresponse\(e\.g\.,’Notavailable’or’Noansweravailable’\)\.

3\.\*\*LogicalStep\-by\-StepReasoning\*\*:Constructthereasoninginaclear,explicit,andlogicallyconsistentmanner\.Clearlyoutlinehoweachpieceofinformationfromthecontextcontributestoderivingtheanswer\.Avoidskippingintermediatestepsormakingvagueconnectionsbetweenthecontextandthequery\.

4\.\*\*Query\-SpecificInterpretationandNuanceHandling\*\*:Thoroughlyanalyzethephrasingandimpliedconditionsinthequestion\.Paycloseattentiontodetailssuchasspecificdates,numericconstraints,entityrelationships,andotherquery\-specificnuances\.Ensurethereasoningandanswerdirectlyandfullyaddressthequeryasstated\.

5\.\*\*ErrorIdentificationandResolution\*\*:Proactivelyvalidateextractedinformationagainstthecontexttoavoiderrors\.Forexample:

\-Fordate\-relatedqueries,cross\-checkalldatesinthecontexttoensureaccuracy\.

\-Fornumericorquantity\-relatedqueries,verifycalculationsorextractionsagainstthedataprovided\.

\-Forentity\-relatedqueries,ensureexactmatchingofnamesanddesignations\.

6\.\*\*FallbackResponsesforAmbiguityorMissingContext\*\*:Ifthecontextdoesnotsupportadefinitiveanswer,clearlycommunicatethisinthereasoningandprovideasuitablefallbackresponse\.Avoidguessingorintroducingunsupportedinformation\.

7\.\*\*AnswerFormattingandConsistency\*\*:Adherestrictlytotheexpectedanswerformatbasedonthequestionorprovidedfeedback\.Foryes/noquestions,uselowercase\(’yes’,’no’\)\.Forothertypesofqueries,ensuretheanswermatchestheexactphrasingorconventionspresentinthecontext\.

8\.\*\*Feedback\-InformedRefinement\*\*:Wherepriorexecutionshavefailedduetoinaccuraciesormisinterpretations,payspecialattentiontosimilarpatternsinfuturequeries\.Uselessonsfromsuchfailurestorefinereasoningandavoidrepeatingerrors\.

Failuretoadheretotheseprincipleswillresultinincorrectreasoningandanswers\.Carefullyvalidateeachstepofthereasoningagainstthecontextbeforeprovidingthefinalanswer\.",

"fields":\[

\{

"prefix":"Context:",

"description":"$\{context\}"

\},

\{

"prefix":"Question:",

"description":"$\{question\}"

\},

\{

"prefix":"Reasoning:Let’sthinkstepbystepinorderto",

"description":"$\{reasoning\}"

\},

\{

"prefix":"Answer:",

"description":"$\{answer\}"

\}

\]

\}

\}

\}

Listing 7: Evolved Context for HotpotQA \(Round 1\)
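To make the structure of the listing concrete, the following is a minimal sketch of how one module entry (its `predict.demos` plus `predict.signature`) might be rendered into a few-shot prompt. The helper names `field_name`, `render_example`, and `render_prompt`, the `---` separator, and the shortened `MODULE_JSON` (with spaces restored for readability) are hypothetical illustrations. They are not taken from the paper's code or from any library's API; only the key layout mirrors what is visible in the listing.

```python
import json

# Hypothetical, heavily shortened stand-in for one module of the listing.
MODULE_JSON = """
{
  "predict": {
    "demos": [
      {
        "augmented": true,
        "context": ["Voltaire | Francois-Marie Arouet, known by his nom de plume Voltaire, was a French Enlightenment writer."],
        "question": "What is the real name of the author of \\"Treatise on Tolerance\\"?",
        "reasoning": "The context identifies Voltaire's real name as Francois-Marie Arouet.",
        "answer": "Francois-Marie Arouet"
      }
    ],
    "signature": {
      "instructions": "Given the fields `context` and `question`, produce `reasoning` and `answer`.",
      "fields": [
        {"prefix": "Context:", "description": "${context}"},
        {"prefix": "Question:", "description": "${question}"},
        {"prefix": "Reasoning: Let's think step by step in order to", "description": "${reasoning}"},
        {"prefix": "Answer:", "description": "${answer}"}
      ]
    }
  }
}
"""


def field_name(field):
    """Map a placeholder such as "${context}" to the key "context"."""
    return field["description"].strip("${}")


def render_example(fields, values):
    """Render one example; stop at the first field that has no value yet."""
    lines = []
    for field in fields:
        name = field_name(field)
        if name not in values:
            # Leave the prefix open so the model continues from here.
            lines.append(field["prefix"])
            break
        value = values[name]
        if isinstance(value, list):
            value = "\n".join(value)
        lines.append(f"{field['prefix']} {value}")
    return "\n".join(lines)


def render_prompt(module, inputs):
    """Instructions, then each demo, then the new instance to complete."""
    predict = module["predict"]
    signature = predict["signature"]
    blocks = [signature["instructions"]]
    blocks += [render_example(signature["fields"], demo) for demo in predict["demos"]]
    blocks.append(render_example(signature["fields"], inputs))
    return "\n\n---\n\n".join(blocks)


module = json.loads(MODULE_JSON)
prompt = render_prompt(
    module,
    {
        "context": ["Wilbert Awdry | Wilbert Vere Awdry was the creator of Thomas the Tank Engine."],
        "question": "What was the middle name of the creator of Thomas the Tank Engine?",
    },
)
print(prompt)
```

Under this reading, a context in the catalog is the pair of instructions and demonstrations, so switching the context for an instance amounts to passing a different `module` dictionary to `render_prompt` while the input stays the same.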

## NeurIPS Paper Checklist

1. 1\.Claims
2. Question: Do the main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope?
3. Answer:\[Yes\]
4. Justification: The main claims made in the abstract and introduction accurately reflect the paper’s contributions and scope\.
5. Guidelines: - •The answer\[N/A\]means that the abstract and introduction do not include the claims made in the paper\. - •The abstract and/or introduction should clearly state the claims made, including the contributions made in the paper and important assumptions and limitations\. A\[No\]or\[N/A\]answer to this question will not be perceived well by the reviewers\. - •The claims made should match theoretical and experimental results, and reflect how much the results can be expected to generalize to other settings\. - •It is fine to include aspirational goals as motivation as long as it is clear that these goals are not attained by the paper\.
6. 2\.Limitations
7. Question: Does the paper discuss the limitations of the work performed by the authors?
8. Answer:\[Yes\]
9. Justification: The paper discusses the limitations of our work in Appendix [E](https://arxiv.org/html/2605.15721#A5)\.
10. Guidelines: - •The answer\[N/A\]means that the paper has no limitation while the answer\[No\]means that the paper has limitations, but those are not discussed in the paper\. - •The authors are encouraged to create a separate “Limitations” section in their paper\. - •The paper should point out any strong assumptions and how robust the results are to violations of these assumptions \(e\.g\., independence assumptions, noiseless settings, model well\-specification, asymptotic approximations only holding locally\)\. The authors should reflect on how these assumptions might be violated in practice and what the implications would be\. - •The authors should reflect on the scope of the claims made, e\.g\., if the approach was only tested on a few datasets or with a few runs\. In general, empirical results often depend on implicit assumptions, which should be articulated\. - •The authors should reflect on the factors that influence the performance of the approach\. For example, a facial recognition algorithm may perform poorly when image resolution is low or images are taken in low lighting\. Or a speech\-to\-text system might not be used reliably to provide closed captions for online lectures because it fails to handle technical jargon\. - •The authors should discuss the computational efficiency of the proposed algorithms and how they scale with dataset size\. - •If applicable, the authors should discuss possible limitations of their approach to address problems of privacy and fairness\. - •While the authors might fear that complete honesty about limitations might be used by reviewers as grounds for rejection, a worse outcome might be that reviewers discover limitations that aren’t acknowledged in the paper\. The authors should use their best judgment and recognize that individual actions in favor of transparency play an important role in developing norms that preserve the integrity of the community\. Reviewers will be specifically instructed to not penalize honesty concerning limitations\.
11. 3\.Theory assumptions and proofs
12. Question: For each theoretical result, does the paper provide the full set of assumptions and a complete \(and correct\) proof?
13. Answer:\[Yes\]
14. Justification: The paper provides the full set of assumptions and a complete proof in Appendix [B](https://arxiv.org/html/2605.15721#A2)\.
15. Guidelines: - •The answer\[N/A\]means that the paper does not include theoretical results\. - •All the theorems, formulas, and proofs in the paper should be numbered and cross\-referenced\. - •All assumptions should be clearly stated or referenced in the statement of any theorems\. - •The proofs can either appear in the main paper or the supplemental material, but if they appear in the supplemental material, the authors are encouraged to provide a short proof sketch to provide intuition\. - •Inversely, any informal proof provided in the core of the paper should be complemented by formal proofs provided in appendix or supplemental material\. - •Theorems and Lemmas that the proof relies upon should be properly referenced\.
16. 4\.Experimental result reproducibility
17. Question: Does the paper fully disclose all the information needed to reproduce the main experimental results of the paper to the extent that it affects the main claims and/or conclusions of the paper \(regardless of whether the code and data are provided or not\)?
18. Answer:\[Yes\]
19. Justification: The paper discusses the experimental details in Appendix [F](https://arxiv.org/html/2605.15721#A6) and provides an anonymous GitHub repository URL in Section [4](https://arxiv.org/html/2605.15721#S4)\.
20. Guidelines: - •The answer\[N/A\]means that the paper does not include experiments\. - •If the paper includes experiments, a\[No\]answer to this question will not be perceived well by the reviewers: Making the paper reproducible is important, regardless of whether the code and data are provided or not\. - •If the contribution is a dataset and/or model, the authors should describe the steps taken to make their results reproducible or verifiable\. - •Depending on the contribution, reproducibility can be accomplished in various ways\. For example, if the contribution is a novel architecture, describing the architecture fully might suffice, or if the contribution is a specific model and empirical evaluation, it may be necessary to either make it possible for others to replicate the model with the same dataset, or provide access to the model\. In general, releasing code and data is often one good way to accomplish this, but reproducibility can also be provided via detailed instructions for how to replicate the results, access to a hosted model \(e\.g\., in the case of a large language model\), releasing of a model checkpoint, or other means that are appropriate to the research performed\. - •While NeurIPS does not require releasing code, the conference does require all submissions to provide some reasonable avenue for reproducibility, which may depend on the nature of the contribution\. For example: \(a\) If the contribution is primarily a new algorithm, the paper should make it clear how to reproduce that algorithm\. \(b\) If the contribution is primarily a new model architecture, the paper should describe the architecture clearly and fully\. \(c\) If the contribution is a new model \(e\.g\., a large language model\), then there should either be a way to access this model for reproducing the results or a way to reproduce the model \(e\.g\., with an open\-source dataset or instructions for how to construct the dataset\)\. \(d\) We recognize that reproducibility may be tricky in some cases, in which case authors are welcome to describe the particular way they provide for reproducibility\. In the case of closed\-source models, it may be that access to the model is limited in some way \(e\.g\., to registered users\), but it should be possible for other researchers to have some path to reproducing or verifying the results\.
21. 5\.Open access to data and code
22. Question: Does the paper provide open access to the data and code, with sufficient instructions to faithfully reproduce the main experimental results, as described in supplemental material?
23. Answer:\[Yes\]
24. Justification: See Section [4](https://arxiv.org/html/2605.15721#S4)\.
25. Guidelines: - •The answer\[N/A\]means that paper does not include experiments requiring code\. - •While we encourage the release of code and data, we understand that this might not be possible, so\[No\]is an acceptable answer\. Papers cannot be rejected simply for not including code, unless this is central to the contribution \(e\.g\., for a new open\-source benchmark\)\. - •The instructions should contain the exact command and environment needed to run to reproduce the results\. See the NeurIPS code and data submission guidelines \([https://neurips\.cc/public/guides/CodeSubmissionPolicy](https://neurips.cc/public/guides/CodeSubmissionPolicy)\) for more details\. - •The authors should provide instructions on data access and preparation, including how to access the raw data, preprocessed data, intermediate data, and generated data, etc\. - •The authors should provide scripts to reproduce all experimental results for the new proposed method and baselines\. If only a subset of experiments are reproducible, they should state which ones are omitted from the script and why\. - •At submission time, to preserve anonymity, the authors should release anonymized versions \(if applicable\)\. - •Providing as much information as possible in supplemental material \(appended to the paper\) is recommended, but including URLs to data and code is permitted\.
26. 6\.Experimental setting/details
27. Question: Does the paper specify all the training and test details \(e\.g\., data splits, hyperparameters, how they were chosen, type of optimizer\) necessary to understand the results?
28. Answer:\[Yes\]
29. Justification: The paper provides the experimental details in Appendix [F](https://arxiv.org/html/2605.15721#A6)\.
30. Guidelines: - •The answer\[N/A\]means that the paper does not include experiments\. - •The experimental setting should be presented in the core of the paper to a level of detail that is necessary to appreciate the results and make sense of them\. - •The full details can be provided either with the code, in appendix, or as supplemental material\.
31. 7\.Experiment statistical significance
32. Question: Does the paper report error bars suitably and correctly defined or other appropriate information about the statistical significance of the experiments?
33. Answer:\[Yes\]
34. Justification: We report significance tests in Table [1](https://arxiv.org/html/2605.15721#S4.T1)\.
35. Guidelines: - •The answer\[N/A\]means that the paper does not include experiments\. - •The authors should answer\[Yes\]if the results are accompanied by error bars, confidence intervals, or statistical significance tests, at least for the experiments that support the main claims of the paper\. - •The factors of variability that the error bars are capturing should be clearly stated \(for example, train/test split, initialization, random drawing of some parameter, or overall run with given experimental conditions\)\. - •The method for calculating the error bars should be explained \(closed form formula, call to a library function, bootstrap, etc\.\) - •The assumptions made should be given \(e\.g\., Normally distributed errors\)\. - •It should be clear whether the error bar is the standard deviation or the standard error of the mean\. - •It is OK to report 1\-sigma error bars, but one should state it\. The authors should preferably report a 2\-sigma error bar than state that they have a 96% CI, if the hypothesis of Normality of errors is not verified\. - •For asymmetric distributions, the authors should be careful not to show in tables or figures symmetric error bars that would yield results that are out of range \(e\.g\., negative error rates\)\. - •If error bars are reported in tables or plots, the authors should explain in the text how they were calculated and reference the corresponding figures or tables in the text\.
36. 8\.Experiments compute resources
37. Question: For each experiment, does the paper provide sufficient information on the computer resources \(type of compute workers, memory, time of execution\) needed to reproduce the experiments?
38. Answer:\[Yes\]
39. Justification: We describe our compute resources in detail in Appendix [F](https://arxiv.org/html/2605.15721#A6)\.
40. Guidelines: - •The answer\[N/A\]means that the paper does not include experiments\. - •The paper should indicate the type of compute workers CPU or GPU, internal cluster, or cloud provider, including relevant memory and storage\. - •The paper should provide the amount of compute required for each of the individual experimental runs as well as estimate the total compute\. - •The paper should disclose whether the full research project required more compute than the experiments reported in the paper \(e\.g\., preliminary or failed experiments that didn’t make it into the paper\)\.
41. 9\.Code of ethics
42. Question: Does the research conducted in the paper conform, in every respect, with the NeurIPS Code of Ethics?
43. Answer:\[Yes\]
44. Justification: The research conducted in the paper conforms, in every respect, with the NeurIPS Code of Ethics\.
45. Guidelines: - •The answer\[N/A\]means that the authors have not reviewed the NeurIPS Code of Ethics\. - •If the authors answer\[No\], they should explain the special circumstances that require a deviation from the Code of Ethics\. - •The authors should make sure to preserve anonymity \(e\.g\., if there is a special consideration due to laws or regulations in their jurisdiction\)\.
46. 10\.Broader impacts
47. Question: Does the paper discuss both potential positive societal impacts and negative societal impacts of the work performed?
48. Answer:\[N/A\]
49. Justification: Our work concerns context engineering for LLMs and does not have a positive or negative societal impact\.
50. Guidelines: - •The answer\[N/A\]means that there is no societal impact of the work performed\. - •If the authors answer\[N/A\]or\[No\], they should explain why their work has no societal impact or why the paper does not address societal impact\. - •Examples of negative societal impacts include potential malicious or unintended uses \(e\.g\., disinformation, generating fake profiles, surveillance\), fairness considerations \(e\.g\., deployment of technologies that could make decisions that unfairly impact specific groups\), privacy considerations, and security considerations\. - •The conference expects that many papers will be foundational research and not tied to particular applications, let alone deployments\. However, if there is a direct path to any negative applications, the authors should point it out\. For example, it is legitimate to point out that an improvement in the quality of generative models could be used to generate Deepfakes for disinformation\. On the other hand, it is not needed to point out that a generic algorithm for optimizing neural networks could enable people to train models that generate Deepfakes faster\. - •The authors should consider possible harms that could arise when the technology is being used as intended and functioning correctly, harms that could arise when the technology is being used as intended but gives incorrect results, and harms following from \(intentional or unintentional\) misuse of the technology\. - •If there are negative societal impacts, the authors could also discuss possible mitigation strategies \(e\.g\., gated release of models, providing defenses in addition to attacks, mechanisms for monitoring misuse, mechanisms to monitor how a system learns from feedback over time, improving the efficiency and accessibility of ML\)\.
51. 11\.Safeguards
52. Question: Does the paper describe safeguards that have been put in place for responsible release of data or models that have a high risk for misuse \(e\.g\., pre\-trained language models, image generators, or scraped datasets\)?
53. Answer:\[N/A\]
54. Justification: The paper poses no such risk\.
55. Guidelines: - •The answer\[N/A\]means that the paper poses no such risks\. - •Released models that have a high risk for misuse or dual\-use should be released with necessary safeguards to allow for controlled use of the model, for example by requiring that users adhere to usage guidelines or restrictions to access the model or implementing safety filters\. - •Datasets that have been scraped from the Internet could pose safety risks\. The authors should describe how they avoided releasing unsafe images\. - •We recognize that providing effective safeguards is challenging, and many papers do not require this, but we encourage authors to take this into account and make a best faith effort\.
56. 12\.Licenses for existing assets
57. Question: Are the creators or original owners of assets \(e\.g\., code, data, models\), used in the paper, properly credited and are the license and terms of use explicitly mentioned and properly respected?
58. Answer:\[Yes\]
59. Justification: All datasets and code are open\-source and are used in accordance with the licenses of the original works\. See the experimental setup in Section [4](https://arxiv.org/html/2605.15721#S4) and Appendix [F](https://arxiv.org/html/2605.15721#A6)\.
60. Guidelines: - •The answer\[N/A\]means that the paper does not use existing assets\. - •The authors should cite the original paper that produced the code package or dataset\. - •The authors should state which version of the asset is used and, if possible, include a URL\. - •The name of the license \(e\.g\., CC\-BY 4\.0\) should be included for each asset\. - •For scraped data from a particular source \(e\.g\., website\), the copyright and terms of service of that source should be provided\. - •If assets are released, the license, copyright information, and terms of use in the package should be provided\. For popular datasets, [paperswithcode\.com/datasets](https://paperswithcode.com/datasets) has curated licenses for some datasets\. Their licensing guide can help determine the license of a dataset\. - •For existing datasets that are re\-packaged, both the original license and the license of the derived asset \(if it has changed\) should be provided\. - •If this information is not available online, the authors are encouraged to reach out to the asset’s creators\.
61. 13\. New assets
62. Question: Are new assets introduced in the paper well documented and is the documentation provided alongside the assets?
63. Answer: \[N/A\]
64. Justification: This paper does not release new assets\.
65. Guidelines:
    - The answer \[N/A\] means that the paper does not release new assets\.
    - Researchers should communicate the details of the dataset/code/model as part of their submissions via structured templates\. This includes details about training, license, limitations, etc\.
    - The paper should discuss whether and how consent was obtained from people whose asset is used\.
    - At submission time, remember to anonymize your assets \(if applicable\)\. You can either create an anonymized URL or include an anonymized zip file\.
66. 14\. Crowdsourcing and research with human subjects
67. Question: For crowdsourcing experiments and research with human subjects, does the paper include the full text of instructions given to participants and screenshots, if applicable, as well as details about compensation \(if any\)?
68. Answer: \[N/A\]
69. Justification: This paper does not involve crowdsourcing experiments or research with human subjects\.
70. Guidelines:
    - The answer \[N/A\] means that the paper does not involve crowdsourcing nor research with human subjects\.
    - Including this information in the supplemental material is fine, but if the main contribution of the paper involves human subjects, then as much detail as possible should be included in the main paper\.
    - According to the NeurIPS Code of Ethics, workers involved in data collection, curation, or other labor should be paid at least the minimum wage in the country of the data collector\.
71. 15\. Institutional review board \(IRB\) approvals or equivalent for research with human subjects
72. Question: Does the paper describe potential risks incurred by study participants, whether such risks were disclosed to the subjects, and whether Institutional Review Board \(IRB\) approvals \(or an equivalent approval/review based on the requirements of your country or institution\) were obtained?
73. Answer: \[N/A\]
74. Justification: The paper does not involve crowdsourcing nor research with human subjects\.
75. Guidelines:
    - The answer \[N/A\] means that the paper does not involve crowdsourcing nor research with human subjects\.
    - Depending on the country in which research is conducted, IRB approval \(or equivalent\) may be required for any human subjects research\. If you obtained IRB approval, you should clearly state this in the paper\.
    - We recognize that the procedures for this may vary significantly between institutions and locations, and we expect authors to adhere to the NeurIPS Code of Ethics and the guidelines for their institution\.
    - For initial submissions, do not include any information that would break anonymity \(if applicable\), such as the institution conducting the review\.
76. 16\. Declaration of LLM usage
77. Question: Does the paper describe the usage of LLMs if it is an important, original, or non\-standard component of the core methods in this research? Note that if the LLM is used only for writing, editing, or formatting purposes and does *not* impact the core methodology, scientific rigor, or originality of the research, declaration is not required\.
78. Answer: \[N/A\]
79. Justification: The core method development in this research does not involve LLMs as any important, original, or non\-standard components\.
80. Guidelines:
    - The answer \[N/A\] means that the core method development in this research does not involve LLMs as any important, original, or non\-standard components\.
    - Please refer to our LLM policy in the NeurIPS handbook for what should or should not be described\.
