HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation

arXiv cs.CL Papers

Summary

This paper introduces HawkesLLM, a framework that models semantic uncertainty propagation in multi-step agentic text simulations by combining a multivariate Hawkes process for temporal influence and memory selection with a language model for text generation. Evaluation on a GDELT news-cascade case study shows improved late-stage semantic alignment under compact prompt-memory constraints.

arXiv:2605.23043v1 Announce Type: new


# HawkesLLM: Semantic Uncertainty Propagation in Agentic Text Simulation
Source: [https://arxiv.org/html/2605.23043](https://arxiv.org/html/2605.23043)
###### Abstract

Agentic text\-simulation systems write in sequence, with each item becoming possible context for later steps\. That makes uncertainty path\-dependent: an early ambiguity can affect later outputs\. This paper studies this problem with HawkesLLM, a framework that separates temporal influence modeling from text generation\. We represent the cascade as a network whose nodes are text\-generating agents\. A multivariate Hawkes process models how these nodes activate over time and which earlier node outputs should influence later prompts\. A language model then writes each new event from the compact memory selected by this temporal model\. We evaluate the framework on a held\-out Global Database of Events, Language, and Tone \(GDELT\) news\-cascade case study\. The diagnostics track semantic alignment with local held\-out references and separate local drift from global drift\. In this setting, HawkesLLM improves late\-stage semantic alignment under a compact prompt\-memory budget\.

## 1 Introduction

Many text-simulation workflows unfold as multi-step processes. A system writes an item, keeps it in memory, and later writes with that memory in view. Here, *agentic* refers to this iterative use of generated memory: each output can become part of the next prompt. In such systems, uncertainty belongs to the trajectory, not just to one response. Ambiguity, drift, and mismatch can carry forward because later prompts depend on earlier generated content.

Recent work on uncertainty in language-agent systems and semantic uncertainty in free-form generation motivates this view (Zhao et al., [2025](https://arxiv.org/html/2605.23043#bib.bib13); Han et al., [2024](https://arxiv.org/html/2605.23043#bib.bib14); Liu et al., [2024a](https://arxiv.org/html/2605.23043#bib.bib15); Farquhar et al., [2024](https://arxiv.org/html/2605.23043#bib.bib16); Lin et al., [2023](https://arxiv.org/html/2605.23043#bib.bib17)). If generated outputs become future context, uncertainty should be tracked across the sequence rather than only at the end.

We study this problem as *semantic uncertainty propagation*: how generated events stay, or fail to stay, near local held-out reference neighborhoods as the generated history grows. The references serve as semantic anchors from the same event stream. They let us ask whether the simulation remains in the same local region over time.

We propose HawkesLLM for this setting. The simulation is a cascade over text-generating nodes. A Hawkes process models when nodes activate and which earlier node outputs should matter for the next prompt. The language model then writes the next event from this selected memory. We use *memory selection* for the choice of node-specific predecessor texts and weights. This design is also motivated by evidence that long-context language models do not always use all provided context reliably (Liu et al., [2024b](https://arxiv.org/html/2605.23043#bib.bib23)).

We demonstrate the framework on a GDELT news\-cascade case study\. The news domain is useful here because it has timestamped event streams that can be mapped to a small set of node categories\. Held\-out articles from the same topic window serve as imperfect local semantic references for generated events\.

We make three contributions\. First, we formulate semantic uncertainty propagation for iterative text simulation\. Second, we develop HawkesLLM as a Hawkes\-guided transition that samples event times and nodes and constructs compact node\-wise prompt memory\. Third, we evaluate it on the held\-out news\-cascade case study, where HawkesLLM improves late\-stage alignment under a compact prompt\-memory budget\.

## 2 Related Work

One line of work asks how uncertainty should be handled inside multi-step agent systems (Zhao et al., [2025](https://arxiv.org/html/2605.23043#bib.bib13); Han et al., [2024](https://arxiv.org/html/2605.23043#bib.bib14); Liu et al., [2024a](https://arxiv.org/html/2605.23043#bib.bib15)). In that setting, uncertainty often becomes a control signal: the agent may call a tool, ask for help, or calibrate its answer. Here the object is the generated text path. Earlier generations become part of the state that later generations read, so we track how semantic uncertainty moves along the realized cascade.

Semantic-uncertainty and black-box confidence methods usually compare possible answers to the same free-form generation problem (Farquhar et al., [2024](https://arxiv.org/html/2605.23043#bib.bib16); Lin et al., [2023](https://arxiv.org/html/2605.23043#bib.bib17)). Drift work looks at what happens after generation is repeated for many steps (Spataru et al., [2024](https://arxiv.org/html/2605.23043#bib.bib18); Mohamed et al., [2025](https://arxiv.org/html/2605.23043#bib.bib19)). We take the path itself as the object of evaluation: each generated event is compared with local held-out reporting, while the drift metrics separate short-range agreement with prompt memory from longer-range movement away from the seed.

Graph-cascade models give us a fixed graph, seed events, and a cascade unfolding over that graph (Kempe et al., [2015](https://arxiv.org/html/2605.23043#bib.bib24); Kleinberg, [2007](https://arxiv.org/html/2605.23043#bib.bib25)). In our setting, each activation also produces text. We use a Hawkes process for the temporal layer because it turns past activations into directed, decaying influence scores between nodes (Hawkes, [1971](https://arxiv.org/html/2605.23043#bib.bib1); Ogata, [1981](https://arxiv.org/html/2605.23043#bib.bib2); Rizoiu et al., [2017](https://arxiv.org/html/2605.23043#bib.bib20)). Neural temporal point processes and recent temporal-language hybrids can model richer dynamics or next-event prediction (Mei and Eisner, [2017](https://arxiv.org/html/2605.23043#bib.bib21); Zuo et al., [2020](https://arxiv.org/html/2605.23043#bib.bib22); Li et al., [2025](https://arxiv.org/html/2605.23043#bib.bib9); Liu and He, [2025](https://arxiv.org/html/2605.23043#bib.bib10); Zhou et al., [2025](https://arxiv.org/html/2605.23043#bib.bib12)). The simpler parametric Hawkes model is useful here because its fitted node-to-node influence can be exposed as an inspectable memory signal for the LLM prompt. In that sense, HawkesLLM sits between LLM-based social or agent simulation (Park et al., [2023](https://arxiv.org/html/2605.23043#bib.bib7); Sun et al., [2024](https://arxiv.org/html/2605.23043#bib.bib6); Murdock et al., [2023](https://arxiv.org/html/2605.23043#bib.bib5); Zhang et al., [2025](https://arxiv.org/html/2605.23043#bib.bib11)) and retrieval-augmented generation (Lewis et al., [2020](https://arxiv.org/html/2605.23043#bib.bib8)): the state is a sequence of generated event texts, and the retrieved memory is driven by temporal cascade dynamics.

## 3 Methods

HawkesLLM has two layers. The first is a text-simulation loop: a node is scheduled, a compact memory is selected, and the LLM writes the next event. The second is a Hawkes process that supplies the schedule and memory weights. Section [3.1](https://arxiv.org/html/2605.23043#S3.SS1) defines the generic loop. Section [3.2](https://arxiv.org/html/2605.23043#S3.SS2) introduces the multivariate Hawkes model for temporal influence among nodes. Section [3.3](https://arxiv.org/html/2605.23043#S3.SS3) then plugs that model into the loop.

### 3.1 Agentic Text Simulation Framework

We model text propagation on a fixed directed graph (Kempe et al., [2015](https://arxiv.org/html/2605.23043#bib.bib24); Kleinberg, [2007](https://arxiv.org/html/2605.23043#bib.bib25)). A node is a text-generating agent in the simulation, and an event is a timestamped activation of one node together with the text generated at that activation. Let $\mathcal{G}_0=(\mathcal{N},\mathcal{E})$ denote the unweighted influence graph. Here $\mathcal{N}=\{1,\ldots,N\}$ is the node set, and $\mathcal{E}\subseteq\mathcal{N}\times\mathcal{N}$ is the set of allowed directed influences. We assume $\mathcal{E}=\mathcal{N}\times\mathcal{N}$, so every node can influence every other node, including itself. Any node may be activated at a generation step, and any earlier activated node can contribute memory when allowed by $\mathcal{E}$. Section [3.2](https://arxiv.org/html/2605.23043#S3.SS2) describes how we learn the effective edge weights from data.

Generation starts from a seed event $e_0=(\tau_0,n_0,x_0)$, where $\tau_0$ is the seed timestamp, $n_0\in\mathcal{N}$ is the seed node, and $x_0$ is the seed text. The realized cascade is the event history that grows from this seed over $\mathcal{G}_0$. At each later step, the current node receives a small set of earlier node outputs as prompt memory and generates a new text. We use $t$ for the current generation step and $m$ for a previous event. The generated history before step $t$ is

$$\mathcal{H}_t=\{e_m=(\tau_m,n_m,x_m):0\leq m<t\},$$

where $\tau_m$ is the timestamp, $n_m\in\mathcal{N}$ is the node, and $x_m$ is the text of event $m$. Let $k\geq 1$ be the maximum number of node representatives allowed in the prompt. A memory policy $\pi$ maps the history, next timestamp, and next node to compact weighted memory, $\mathcal{M}_t=\pi(\mathcal{H}_t,\tau_t,n_t)$. Each memory item keeps one representative from one node. If node $j$ is retained for step $t$, then $r_t(j)$ denotes its latest previous generated event before $\tau_t$. The memory has the form

$$\mathcal{M}_t\subseteq\{(j,r_t(j),w_{j,t}):j\in\mathcal{N},\ r_t(j)<t,\ w_{j,t}>0\}.$$

Here $w_{j,t}$ is the weight assigned to node $j$, and $|\mathcal{M}_t|\leq k$. When $\mathcal{M}_t\neq\emptyset$, the selected weights satisfy

$$\sum_{(j,r_t(j),w_{j,t})\in\mathcal{M}_t}w_{j,t}=1.$$

Memory selection means choosing which node representatives, and which weights, enter the next prompt.

The text transition is then explicit. Let $a_i$ denote the fixed instruction string for node $i$. Given $\tau_t$, $n_t$, and $\mathcal{M}_t$, the prompt is

$$p_t=\operatorname{Prompt}\big((\tau_t,n_t),\ a_{n_t},\ \{(w_{j,t},j,x_{r_t(j)}):(j,r_t(j),w_{j,t})\in\mathcal{M}_t\}\big).$$

Here $\operatorname{Prompt}(\cdot)$ denotes deterministic formatting. In our implementation, the prompt states the target node, gives a short node-style instruction, and lists the selected predecessor texts with their node labels and normalized Hawkes weights. The weights are written as text annotations, so they guide the model through the prompt rather than acting as numerical controls. Appendix [A](https://arxiv.org/html/2605.23043#A1) gives a representative instantiated prompt.
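As a rough illustration of what deterministic formatting could look like, the sketch below builds a prompt string from the target node, the instruction, and the weighted memory. The function name `build_prompt`, the field labels, and the wording of every prompt line are assumptions made for this example; the paper's actual template is the one in Appendix A.

```python
from typing import List, Tuple

# (weight w_{j,t}, source node label j, predecessor text x_{r_t(j)})
MemoryEntry = Tuple[float, str, str]


def build_prompt(tau: float, node: str, instruction: str,
                 memory: List[MemoryEntry]) -> str:
    """Format one prompt deterministically (hypothetical template)."""
    lines = [
        f"Target node: {node}",
        f"Event time (hours): {tau:.2f}",
        f"Instruction: {instruction}",
    ]
    if memory:
        lines.append("Earlier events (Hawkes weight, source node, text):")
        # Sort by descending weight, then node label, so that the same
        # memory always yields the same string regardless of input order.
        for w, j, text in sorted(memory, key=lambda item: (-item[0], item[1])):
            lines.append(f"- [weight={w:.2f}] [{j}] {text}")
    lines.append("Write the next event:")
    return "\n".join(lines)
```

Writing the weights as plain text mirrors the paper's point that they act as annotations inside the prompt, not as numerical controls on decoding.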

When $\mathcal{M}_t=\emptyset$, the prompt contains only the current event information and node instruction. Let $g_{\mathrm{LLM}}$ denote the text generator. The next text is generated as

$$x_t\sim g_{\mathrm{LLM}}(\cdot\mid p_t),$$

or deterministically as $x_t=g_{\mathrm{LLM}}(p_t)$ when the decoding seed is fixed. The semantic propagation update appends the completed event:

$$e_t=(\tau_t,n_t,x_t),\qquad\mathcal{H}_{t+1}=\mathcal{H}_t\cup\{e_t\}.$$

Only the text $x_t$ is produced by the language model at step $t$. The timestamp, node, and memory $\mathcal{M}_t$ determine the prompt that conditions it. Figure [1](https://arxiv.org/html/2605.23043#S3.F1) summarizes this loop.

![Refer to caption](https://arxiv.org/html/2605.23043v1/hawkesllm_framework.png)

Figure 1: Sequential agentic uncertainty loop in HawkesLLM. The generated history $\mathcal{H}_t$ is the agent state; the Hawkes process selects weighted prompt memory $\mathcal{M}_t$, the LLM generates $x_t$, and the completed event $e_t$ is appended to the history. Semantic alignment and local/global drift track trajectory-level uncertainty.

### 3.2 Multivariate Hawkes Point Process

We use a multivariate Hawkes point process as the temporal influence model (Hawkes, [1971](https://arxiv.org/html/2605.23043#bib.bib1); Ogata, [1981](https://arxiv.org/html/2605.23043#bib.bib2); Rizoiu et al., [2017](https://arxiv.org/html/2605.23043#bib.bib20)). It estimates when each node is likely to activate and how much past activity from one node raises the future rate of another. For fitting, let $\mathcal{D}=\{(\tau_m,n_m)\}_{m=1}^{M}$ denote an observed node-time event stream over horizon $[0,T]$. Here $M$ is the number of observed events. Event $m$ occurs at time $\tau_m\in[0,T]$ and belongs to node $n_m\in\mathcal{N}$. In this subsection, $i$ indexes the node whose rate is being evaluated, and $j$ indexes a node with a previous event. At continuous time $s$, let $\lambda_i(s)$ denote the conditional event rate for node $i$. Let $\mu_i$ denote its background rate. For each edge $(j,i)\in\mathcal{E}$, let $\phi_{j,i}(u)$ denote the excitation kernel at lag $u>0$. It describes how much a past event from node $j$ raises the future rate of node $i$. The event stream is modeled as

$$\lambda_i(s)=\mu_i+\sum_{j:(j,i)\in\mathcal{E}}\ \sum_{\tau_m<s,\,n_m=j}\phi_{j,i}(s-\tau_m).$$

Our main model uses an exponential kernel. Here $\alpha_{j,i}\geq 0$ is the directed excitation strength, and $\beta>0$ is the decay rate:

$$\phi_{j,i}(u)=\alpha_{j,i}e^{-\beta u}.$$

Larger $\beta$ makes influence fade faster. For the exponential kernel, the integrated excitation matrix $\mathbf{G}$ has entries $G_{j,i}=\int_0^\infty\phi_{j,i}(u)\,du=\alpha_{j,i}/\beta$ for $(j,i)\in\mathcal{E}$, and $G_{j,i}=0$ otherwise. Stability is assessed through the spectral radius $\rho(\mathbf{G})$, with $\rho(\mathbf{G})<1$ as the standard stationarity condition.
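The exponential-kernel intensity and the integrated excitation matrix defined above can be sketched in a few lines. This is a minimal illustration, assuming zero-based node indices, a nested-list `alpha[j][i]`, and the full graph $\mathcal{E}=\mathcal{N}\times\mathcal{N}$; all function names are hypothetical. Instead of computing $\rho(\mathbf{G})$ exactly, the sketch uses a conservative check: the spectral radius never exceeds the maximum row sum or the maximum column sum of a matrix, so a bound below 1 is sufficient (though not necessary) for $\rho(\mathbf{G})<1$.

```python
import math
from typing import List, Sequence, Tuple

Event = Tuple[float, int]  # (timestamp tau_m, node index n_m)


def intensity(i: int, s: float, events: Sequence[Event],
              mu: Sequence[float], alpha: Sequence[Sequence[float]],
              beta: float) -> float:
    """lambda_i(s) = mu_i + sum over events with tau_m < s of
    alpha[n_m][i] * exp(-beta * (s - tau_m))."""
    total = mu[i]
    for tau_m, j in events:
        if tau_m < s:  # only strictly earlier events excite the rate
            total += alpha[j][i] * math.exp(-beta * (s - tau_m))
    return total


def integrated_excitation(alpha: Sequence[Sequence[float]],
                          beta: float) -> List[List[float]]:
    """G[j][i] = alpha[j][i] / beta for the exponential kernel."""
    return [[a / beta for a in row] for row in alpha]


def spectral_radius_bound(G: Sequence[Sequence[float]]) -> float:
    """Upper bound on rho(G): min of max row sum and max column sum."""
    n = len(G)
    max_row = max(sum(abs(v) for v in row) for row in G)
    max_col = max(sum(abs(G[j][i]) for j in range(n)) for i in range(n))
    return min(max_row, max_col)
```

A fit whose bound is at or above 1 is not necessarily unstable; in that case an exact eigenvalue computation would be needed to decide.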

The weighted influence graph is $\mathcal{G}=(\mathcal{N},\mathcal{E},\mathbf{G})$. The unweighted graph $\mathcal{G}_0$ determines which influences are allowed, while $G_{j,i}$ gives the fitted cumulative excitation from node $j$ to node $i$. Later, when building prompt memory, HawkesLLM turns this fitted influence into node-wise scores by applying temporal decay to earlier generated events from each node.

For a fixed decay value $\beta$, we estimate $(\boldsymbol{\mu},\boldsymbol{\alpha})$. Here $\boldsymbol{\mu}=(\mu_i)_{i\in\mathcal{N}}$ and $\boldsymbol{\alpha}=(\alpha_{j,i})_{(j,i)\in\mathcal{E}}$. Let $\lambda_i(s;\boldsymbol{\mu},\boldsymbol{\alpha},\beta)$ denote the corresponding conditional intensity. The log-likelihood term is

$$\ell_\beta(\boldsymbol{\mu},\boldsymbol{\alpha};\mathcal{D})=\sum_{m=1}^{M}\log\lambda_{n_m}(\tau_m;\boldsymbol{\mu},\boldsymbol{\alpha},\beta)-\sum_{i\in\mathcal{N}}\int_0^T\lambda_i(s;\boldsymbol{\mu},\boldsymbol{\alpha},\beta)\,ds.$$

The fitted parameters solve

$$(\hat{\boldsymbol{\mu}},\hat{\boldsymbol{\alpha}})\in\arg\max_{\boldsymbol{\mu}\geq 0,\,\boldsymbol{\alpha}\geq 0}\left\{\ell_\beta(\boldsymbol{\mu},\boldsymbol{\alpha};\mathcal{D})-\eta\,\Omega(\boldsymbol{\alpha})\right\}.$$

The term $\Omega(\boldsymbol{\alpha})$ is a shrinkage penalty on excitation strengths, and $\eta\geq 0$ controls the penalty strength. We select among fixed decay values by likelihood subject to stability and use the best stable exponential fit in the held-out simulation. Appendix [B](https://arxiv.org/html/2605.23043#A2) reports fitting details.
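For the exponential kernel, the integral in the log-likelihood has a closed form: each event at $\tau_m$ from node $j$ contributes $\frac{\alpha_{j,i}}{\beta}\big(1-e^{-\beta(T-\tau_m)}\big)$ to the compensator of node $i$. The sketch below evaluates the unpenalized $\ell_\beta$ under that identity. It is an illustration only: the function name is hypothetical, node indices are zero-based, every $\mu_i$ is assumed strictly positive so the logarithm is defined, and the penalty $\eta\,\Omega(\boldsymbol{\alpha})$ is left out because the section does not specify its form.

```python
import math
from typing import Sequence, Tuple

Event = Tuple[float, int]  # (timestamp tau_m, node index n_m)


def log_likelihood(events: Sequence[Event], T: float,
                   mu: Sequence[float],
                   alpha: Sequence[Sequence[float]],
                   beta: float) -> float:
    """Unpenalized Hawkes log-likelihood with kernel alpha * exp(-beta * u)."""
    ordered = sorted(events)
    n_nodes = len(mu)

    # First term: sum of log-intensities at the observed events.
    log_term = 0.0
    for m, (tau_m, n_m) in enumerate(ordered):
        lam = mu[n_m]
        for tau_p, j in ordered[:m]:
            if tau_p < tau_m:  # strictly earlier events only
                lam += alpha[j][n_m] * math.exp(-beta * (tau_m - tau_p))
        log_term += math.log(lam)

    # Second term: integrated intensity over [0, T], summed over nodes.
    compensator = sum(mu) * T
    for tau_m, j in ordered:
        decay_mass = (1.0 - math.exp(-beta * (T - tau_m))) / beta
        compensator += sum(alpha[j][i] for i in range(n_nodes)) * decay_mass

    return log_term - compensator
```

With all excitation strengths set to zero, the expression reduces to a Poisson log-likelihood, which gives a quick sanity check.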

### 3.3 Hawkes-Based Semantic Propagation

We now combine the Hawkes model with the text-generation loop. Let $e_0=(\tau_0,n_0,x_0)$ be the seed event. The algorithm then produces $L$ non-seed events $e_1,\ldots,e_L$.

Before step $t$, the history $\mathcal{H}_t=\{e_m=(\tau_m,n_m,x_m):0\leq m<t\}$ contains the seed and all earlier generated events. Its node-time projection is $\mathcal{T}_t=\{(\tau_m,n_m):e_m\in\mathcal{H}_t\}$. Given the fitted Hawkes parameters $(\hat{\boldsymbol{\mu}},\hat{\boldsymbol{\alpha}},\hat{\beta})$, the conditional intensity for node $i$ at candidate time $r$ is evaluated from $\mathcal{T}_t$ and written as $\lambda_i(r\mid\mathcal{T}_t)$. At each step, the Hawkes process samples $\tau_t$ and $n_t$, constructs $\mathcal{M}_t$, and passes the resulting prompt to the LLM.

Algorithm [1](https://arxiv.org/html/2605.23043#alg1) gives the full HawkesLLM simulation loop. Event sampling produces the next timestamp and node, the memory policy produces $\mathcal{M}_t$, and the final lines generate $x_t$ and append $e_t$ to the history.

**Algorithm 1** HawkesLLM semantic propagation loop

1. **Input:** seed $e_0=(\tau_0,n_0,x_0)$, graph $\mathcal{G}_0=(\mathcal{N},\mathcal{E})$, budget $k$, length $L$, fitted Hawkes model $(\hat{\boldsymbol{\mu}},\hat{\boldsymbol{\alpha}},\hat{\beta})$, thresholds $(\epsilon_{\mathrm{raw}},\epsilon_{\mathrm{norm}})$, node instructions $\{a_i:i\in\mathcal{N}\}$, text generator $g_{\mathrm{LLM}}$
2. **Output:** simulated history $\mathcal{H}_{L+1}$
3. $\mathcal{H}_1\leftarrow\{e_0\}$
4. **for** $t=1,\ldots,L$ **do**
5. $\mathcal{T}_t\leftarrow\{(\tau_m,n_m):e_m\in\mathcal{H}_t\}$
6. $s\leftarrow\tau_{t-1}$
7. *Event sampling*
8. $\Lambda_t(r)\leftarrow\sum_{i\in\mathcal{N}}\lambda_i(r\mid\mathcal{T}_t)$
9. sample $\tau_t$ by thinning from time $s$ using $\Lambda_t$
10. sample $n_t=i$ with probability $\lambda_i(\tau_t\mid\mathcal{T}_t)/\Lambda_t(\tau_t)$
11. *Memory policy*
12. $\mathcal{J}_t\leftarrow\{j\in\mathcal{N}:(j,n_t)\in\mathcal{E},\ \exists m<t\text{ with }n_m=j,\ \tau_m<\tau_t\}$
13. $r_t(j)\leftarrow\max\{m<t:n_m=j,\ \tau_m<\tau_t\}$ for $j\in\mathcal{J}_t$
14. $h_{j,t}\leftarrow\sum_{m<t:\ n_m=j,\ \tau_m<\tau_t}\exp[-\hat{\beta}(\tau_t-\tau_m)]$ for $j\in\mathcal{J}_t$
15. $q_{j,t}\leftarrow\hat{\alpha}_{j,n_t}h_{j,t}$ for $j\in\mathcal{J}_t$
16. $Q_t\leftarrow\sum_{\ell\in\mathcal{J}_t}q_{\ell,t}$
17. **if** $Q_t=0$ **then**
18. $\mathcal{I}_t\leftarrow\emptyset$
19. **else**
20. $\bar{q}_{j,t}\leftarrow q_{j,t}/Q_t$ for $j\in\mathcal{J}_t$
21. $\widetilde{\mathcal{J}}_t\leftarrow\{j\in\mathcal{J}_t:q_{j,t}\geq\epsilon_{\mathrm{raw}},\ \bar{q}_{j,t}\geq\epsilon_{\mathrm{norm}}\}$
22. $\mathcal{I}_t\leftarrow\operatorname{TopK}_k(\widetilde{\mathcal{J}}_t;\ q_{j,t})$
23. **end if**
24. **if** $\mathcal{I}_t=\emptyset$ **then**
25. $\mathcal{M}_t\leftarrow\emptyset$
26. **else**
27. $w_{j,t}\leftarrow q_{j,t}/\sum_{\ell\in\mathcal{I}_t}q_{\ell,t}$ for $j\in\mathcal{I}_t$
28. $\mathcal{M}_t\leftarrow\{(j,r_t(j),w_{j,t}):j\in\mathcal{I}_t\}$
29. **end if**
30. *Text update*
31. $p_t\leftarrow\operatorname{Prompt}((\tau_t,n_t),a_{n_t},\{(w_{j,t},j,x_{r_t(j)}):(j,r_t(j),w_{j,t})\in\mathcal{M}_t\})$
32. sample $x_t\sim g_{\mathrm{LLM}}(\cdot\mid p_t)$
33. $e_t\leftarrow(\tau_t,n_t,x_t)$, $\mathcal{H}_{t+1}\leftarrow\mathcal{H}_t\cup\{e_t\}$
34. **end for**
35. **return** $\mathcal{H}_{L+1}$

#### 3.3.1 Event Simulation by Thinning

The event-sampling block of Algorithm [1](https://arxiv.org/html/2605.23043#alg1) samples the next timestamp and node from the fitted multivariate Hawkes process using Ogata-style thinning (Ogata, [1981](https://arxiv.org/html/2605.23043#bib.bib2)). It first extracts $\mathcal{T}_t$, the event-time and node projection of $\mathcal{H}_t$. At $t=1$, this projection contains only the seed time and seed node. The total intensity is $\Lambda_t(s)=\sum_{i\in\mathcal{N}}\lambda_i(s\mid\mathcal{T}_t)$. Starting from the previous event time, thinning proposes a waiting time from an exponential distribution with rate $\bar{\Lambda}$, where $\bar{\Lambda}$ is a local upper bound on $\Lambda_t$. A proposal time $\tilde{s}$ is accepted with probability $\Lambda_t(\tilde{s})/\bar{\Lambda}$. After acceptance, we set $\tau_t=\tilde{s}$ and sample the node $n_t=i$ with probability proportional to $\lambda_i(\tau_t\mid\mathcal{T}_t)$.

This sampling step uses only event times and nodes. The history texts $\{x_m:e_m\in\mathcal{H}_t\}$ enter only after $\tau_t$ and $n_t$ are sampled, when the memory policy chooses predecessor texts for the LLM prompt.
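The thinning step described above can be sketched as follows. This is a minimal illustration, not the authors' code: function names are hypothetical, node indices are zero-based, the background rates are assumed to sum to a positive value, and every event in the history is assumed to lie at or before the start time. Because exponential kernels only decay between events, the total intensity just after the current time serves as the local upper bound $\bar{\Lambda}$, and the bound is refreshed after each rejected proposal.

```python
import math
import random
from typing import List, Sequence, Tuple

Event = Tuple[float, int]  # (timestamp, node index)


def node_intensities(s: float, events: Sequence[Event],
                     mu: Sequence[float],
                     alpha: Sequence[Sequence[float]], beta: float,
                     inclusive: bool = False) -> List[float]:
    """Per-node intensities at time s; inclusive=True also counts events
    exactly at s, which gives the right-limit used as an upper bound."""
    lam = list(mu)
    for tau_m, j in events:
        if tau_m < s or (inclusive and tau_m == s):
            decay = math.exp(-beta * (s - tau_m))
            for i in range(len(lam)):
                lam[i] += alpha[j][i] * decay
    return lam


def sample_next_event(events: Sequence[Event], start: float,
                      mu: Sequence[float],
                      alpha: Sequence[Sequence[float]], beta: float,
                      rng: random.Random) -> Tuple[float, int]:
    """Draw the next (timestamp, node) after `start` by thinning."""
    s = start
    while True:
        # Local upper bound: intensity cannot rise until a new event occurs.
        lam_bar = sum(node_intensities(s, events, mu, alpha, beta, inclusive=True))
        s += rng.expovariate(lam_bar)  # propose a waiting time
        lam = node_intensities(s, events, mu, alpha, beta)
        if rng.random() * lam_bar <= sum(lam):  # accept w.p. Lambda(s)/lam_bar
            node = rng.choices(range(len(lam)), weights=lam, k=1)[0]
            return s, node
```

Passing an explicit `random.Random` instance keeps the sampled schedule reproducible, which matters when several memory policies are compared on the same event sequence.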

#### 3.3.2 Hawkes Memory Policy

The memory-policy block of Algorithm [1](https://arxiv.org/html/2605.23043#alg1) constructs $\mathcal{M}_t$ for the sampled timestamp and node. We aggregate Hawkes contributions at the node level. The Hawkes state still uses all earlier event times through exponential decay, but the text prompt contains at most one representative text per retained node: the latest generated text from that node.

For each eligible node $j$, define

$$r_t(j)=\max\{m<t:n_m=j,\ \tau_m<\tau_t\},$$

when such an event exists. The node-wise decayed state is

$$h_{j,t}=\sum_{m<t:\ n_m=j,\ \tau_m<\tau_t}\exp[-\hat{\beta}(\tau_t-\tau_m)].$$

The node-wise Hawkes contribution toward the current node $n_t$ is then

$$q_{j,t}=\hat{\alpha}_{j,n_t}h_{j,t}.$$

The score is larger when node $j$ has stronger learned excitation toward the current node and when its recent events have not decayed away.

Let

$$\mathcal{J}_t=\{j\in\mathcal{N}:(j,n_t)\in\mathcal{E},\ r_t(j)\text{ is defined}\}.$$

Let $Q_t=\sum_{\ell\in\mathcal{J}_t}q_{\ell,t}$. If $Q_t=0$, the selected set is empty. Otherwise, define the normalized contribution $\bar{q}_{j,t}=q_{j,t}/Q_t$. The policy removes negligible nodes using a raw-score threshold $\epsilon_{\mathrm{raw}}$ and a normalized-contribution threshold $\epsilon_{\mathrm{norm}}$:

$$\widetilde{\mathcal{J}}_t=\left\{j\in\mathcal{J}_t:q_{j,t}\geq\epsilon_{\mathrm{raw}},\ \bar{q}_{j,t}\geq\epsilon_{\mathrm{norm}}\right\}.$$

It then keeps at most $k$ nodes with the largest remaining scores:

$$\mathcal{I}_t=\operatorname{TopK}_k(\widetilde{\mathcal{J}}_t;\ q_{j,t}).$$

When $\mathcal{I}_t\neq\emptyset$, the prompt-memory weights are the selected scores normalized to sum to one:

$$w_{j,t}=\frac{q_{j,t}}{\sum_{\ell\in\mathcal{I}_t}q_{\ell,t}},\qquad j\in\mathcal{I}_t.$$

The Hawkes memory policy returns

$$\mathcal{M}_t=\{(j,r_t(j),w_{j,t}):j\in\mathcal{I}_t\}.$$

When $\mathcal{I}_t=\emptyset$, we set $\mathcal{M}_t=\emptyset$, and the prompt uses only the current event information and the style instruction for the current node. The top-$k$ step is an engineering constraint for compact prompts; the Hawkes process itself remains the temporal influence model. The filtering thresholds are implementation parameters.

The text-update block then assembles $p_t$, samples $x_t$, and appends $e_t$. Thus the Hawkes process controls event timing, current node, and selected memory, while the language model verbalizes the next event.
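The memory policy of this subsection can be sketched as a single function. The sketch assumes the full graph $\mathcal{E}=\mathcal{N}\times\mathcal{N}$ (so no edge check is needed), zero-based node indices, and a history list stored in generation order; the function name, the default thresholds, and the tie-breaking by node index are illustrative assumptions. It also drops zero-score nodes explicitly so that every returned weight is strictly positive, as the memory definition in Section 3.1 requires.

```python
import math
from typing import Dict, List, Sequence, Tuple

Event = Tuple[float, int, str]        # (tau_m, n_m, x_m)
MemoryItem = Tuple[int, int, float]   # (node j, event index r_t(j), weight w_{j,t})


def hawkes_memory(history: Sequence[Event], tau_t: float, n_t: int,
                  alpha: Sequence[Sequence[float]], beta: float, k: int,
                  eps_raw: float = 0.0, eps_norm: float = 0.0) -> List[MemoryItem]:
    """Select at most k node representatives with normalized Hawkes weights."""
    latest: Dict[int, int] = {}   # r_t(j): index of the latest event from node j
    h: Dict[int, float] = {}      # h_{j,t}: decayed count of node j's past events
    for m, (tau_m, n_m, _text) in enumerate(history):
        if tau_m < tau_t:
            latest[n_m] = m  # indices increase, so the last write is the max
            h[n_m] = h.get(n_m, 0.0) + math.exp(-beta * (tau_t - tau_m))

    q = {j: alpha[j][n_t] * h[j] for j in h}  # q_{j,t}
    total = sum(q.values())                    # Q_t
    if total <= 0.0:
        return []

    # Threshold filter; the q > 0 test is an added assumption (see lead-in).
    kept = [j for j in q
            if q[j] > 0.0 and q[j] >= eps_raw and q[j] / total >= eps_norm]
    top = sorted(kept, key=lambda j: (-q[j], j))[:k]  # TopK_k by raw score
    if not top:
        return []

    norm = sum(q[j] for j in top)
    return [(j, latest[j], q[j] / norm) for j in top]
```

Note that the weights are renormalized over the retained nodes only, so they sum to one even after filtering and truncation.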

## 4 Experimental Setting

### 4.1 Data and Event Stream

We use Global Database of Events, Language, and Tone (GDELT) article metadata (Leetaru and Schrodt, [2013](https://arxiv.org/html/2605.23043#bib.bib3)). We select a recent event window for Artemis II coverage using the query terms "Artemis II" and "Artemis 2". The working window spans April 1–11, 2026 UTC. GDELT timeline metadata indicate a rough scale of 10,525 matching articles over the window. The modeling dataset uses a capped sample of 250 article records. After URL/title deduplication, the final event stream contains 248 English-language events over approximately 263 hours. Tied timestamps are broken by small within-group offsets so that point-process fitting and chronological train/test splitting are well defined. The retained fields are timestamp, article domain, node, language, title, and URL.

Nodes are defined as hand-curated outlet categories, not as semantic topic labels. Table [1](https://arxiv.org/html/2605.23043#S4.T1) reports the final five-node grouping.

Table 1: Node categories and event counts in the modeling dataset.

This grouping is a hand-curated, task-specific media taxonomy for the Artemis II window. Representative domains include local broadcast affiliates for local\_tv, tabloid or regional commercial outlets for mass\_market, science and space outlets for specialist\_science\_tech, finance outlets for business\_finance, and a residual general\_news category for the remaining domains.

##### Generation setup\.

The text generator is Qwen2.5 run through a local Ollama backend, ollama:qwen2.5:latest (Bai and others, [2024](https://arxiv.org/html/2605.23043#bib.bib4)), decoded with temperature 0.35, top-p 0.9, and at most 75 new tokens. This generator is shared across node agents; the node label and node instruction determine the role being simulated at each step. Appendix [C](https://arxiv.org/html/2605.23043#A3) gives the remaining generation settings.

##### Seed event\.

Seed choice depends on the experimental setting. For the held-out train/test evaluation reported in Tables [2](https://arxiv.org/html/2605.23043#S5.T2)–[3](https://arxiv.org/html/2605.23043#S5.T3), each post-split simulation starts from the last training event and is evaluated over the subsequent test window, so no test-set title is used to initialize generation. For the full-data illustrative simulation and the qualitative example in Table [4](https://arxiv.org/html/2605.23043#S5.T4), we use the earliest observed Artemis II title as the seed:

*Moon rocket and weather are on NASA side for the first astronaut launch in decades\.*

In both settings, the seed supplies $e_0=(\tau_0,n_0,x_0)$ and anchors the generated trajectory before the Hawkes process samples subsequent timestamp/node pairs.

### 4.2 Baselines

We compare HawkesLLM memory selection against two simple heuristic baselines\. Both reuse the same event sequence, prompt format, text generator, and evaluation pipeline, changing only the predecessor\-selection rule:

- Chronological last-$k$: the $k$ most recent past generated events with uniform weights.
- Random-$k$: $k$ uniformly sampled past generated events with uniform weights.

Unless otherwise stated, all methods use $k=3$.
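The two baseline selection rules listed above can be sketched as follows. The sketch assumes the history is a list of `(timestamp, node, text)` tuples in generation order, that random-$k$ samples without replacement, and that both rules fall back to the whole history when fewer than $k$ events exist; these details and all names are assumptions, since the section states only the selection rule and the uniform weights.

```python
import random
from typing import List, Sequence, Tuple

Event = Tuple[float, int, str]        # (timestamp, node, text)
MemoryItem = Tuple[int, int, float]   # (node, event index, weight)


def _uniform(history: Sequence[Event], picks: List[int]) -> List[MemoryItem]:
    """Attach equal weights that sum to one to the chosen event indices."""
    if not picks:
        return []
    weight = 1.0 / len(picks)
    return [(history[m][1], m, weight) for m in picks]


def last_k_memory(history: Sequence[Event], k: int) -> List[MemoryItem]:
    """Chronological last-k: the k most recent events, newest first."""
    start = max(0, len(history) - k)
    return _uniform(history, list(range(start, len(history)))[::-1])


def random_k_memory(history: Sequence[Event], k: int,
                    rng: random.Random) -> List[MemoryItem]:
    """Random-k: k distinct past events drawn uniformly at random."""
    picks = rng.sample(range(len(history)), min(k, len(history)))
    return _uniform(history, picks)
```

Unlike the Hawkes policy, both baselines pick individual events, so a single node can occupy several memory slots.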

### 4.3 Train-Test Split and Matching

For the held-out evaluation, we split the 248-event dataset chronologically into train (198 events) and test (50 events). The Hawkes process is refit on train only using a stable exponential specification. Simulation is then constrained to the held-out test horizon. Appendix [D](https://arxiv.org/html/2605.23043#A4) gives split counts and matching details.

For each generated non-seed event, we find same-node real test events within $\pm 12$ hours. If none exist, we relax to $\pm 24$ hours. In total, 62 generated non-seed events are evaluated across three post-split runs. All 62 find a match, with 58 primary-window matches and 4 relaxed-window matches.
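The two-stage matching rule above can be sketched as a small lookup. The sketch assumes timestamps measured in hours on a shared clock, inclusive window boundaries, and test events stored as `(time, node, title)` tuples; the function name and return shape are hypothetical, and Appendix D remains the reference for the actual matching details.

```python
from typing import List, Optional, Sequence, Tuple

TestEvent = Tuple[float, str, str]  # (time in hours, node label, title)


def match_references(gen_time: float, gen_node: str,
                     test_events: Sequence[TestEvent],
                     primary: float = 12.0,
                     relaxed: float = 24.0) -> Tuple[List[str], Optional[float]]:
    """Return same-node reference titles and the window (in hours) that
    produced them; try the primary window first, then the relaxed one."""
    for window in (primary, relaxed):
        hits = [title for (time, node, title) in test_events
                if node == gen_node and abs(time - gen_time) <= window]
        if hits:
            return hits, window
    return [], None  # no same-node reference within either window
```

Returning the window alongside the titles makes it straightforward to count primary-window and relaxed-window matches separately.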

### 4\.4Evaluation Diagnostics

We reserve held\-out texts for evaluation\. Newsroom choice, outlet access, framing, and repetition across outlets all affect which article is actually written at a given time, so we evaluate local semantic agreement rather than exact continuation prediction\. For a generated eventet=\(τt,nt,xt\)e\_\{t\}=\(\\tau\_\{t\},n\_\{t\},x\_\{t\}\), letℛt\\mathcal\{R\}\_\{t\}be the matched set of real test\-set titles from the same local region of the cascade\. In this GDELT case study, those references are held\-out Artemis II article titles\.

When $\mathcal{R}_{t}\neq\emptyset$, we embed the generated text and reference texts with the evaluation embedding function $\mathbf{z}(\cdot)$. In the reported experiments, $\mathbf{z}$ is computed with the same local Ollama backend used by the saved real-vs-similarity and drift outputs, `ollama:qwen2.5:latest` (Bai and others, [2024](https://arxiv.org/html/2605.23043#bib.bib4)). This makes the evaluation reproducible in the local pipeline, while keeping the metric at the level of a semantic-neighborhood diagnostic; independent embedding backends, human judgments, and factuality checks are natural extensions. Semantic alignment is the cosine similarity between the generated-text embedding and the average reference embedding:

$$S_{t}=\cos\!\left(\mathbf{z}(x_{t}),\frac{1}{|\mathcal{R}_{t}|}\sum_{r\in\mathcal{R}_{t}}\mathbf{z}(r)\right),$$

where higher values indicate closer agreement with the local held-out set. If $|\mathcal{R}_{t}|=1$, this is simply cosine similarity between $x_{t}$ and the single matched reference text. We report mean $S_{t}$, temporal trend, and late-stage behavior, summarized by averaging $S_{t}$ over the final 20% of matched simulated events within each run.
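A minimal sketch of the alignment score and the late-stage summary follows. Embeddings are passed in as plain lists of floats, so the embedding backend is left out. The helper names are hypothetical, and rounding the 20% tail up to a whole number of events is an assumption, since the rounding rule is not stated.

```python
import math
from typing import List, Sequence

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)

def semantic_alignment(gen_vec: Sequence[float],
                       ref_vecs: List[Sequence[float]]) -> float:
    """S_t: cosine between the generated embedding and the mean reference embedding."""
    n = len(ref_vecs)
    centroid = [sum(column) / n for column in zip(*ref_vecs)]
    return cosine(gen_vec, centroid)

def late_stage_mean(scores: Sequence[float], frac: float = 0.2) -> float:
    """Average S_t over the final `frac` of matched events (tail size rounded up)."""
    n_tail = max(1, math.ceil(frac * len(scores)))
    tail = scores[-n_tail:]
    return sum(tail) / len(tail)

# Toy two-dimensional embeddings, for illustration only.
print(semantic_alignment([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]]))
print(late_stage_mean([0.1] * 8 + [0.5, 0.7]))
```

With a single reference, the centroid is that reference's embedding, which recovers the special case noted above.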

We also decompose uncertainty into global and local drift. Let $x_{0}$ be the seed text. When $\mathcal{M}_{t}\neq\emptyset$, let

$$\bar{\mathbf{z}}_{t}=\sum_{(j,r_{t}(j),w_{j,t})\in\mathcal{M}_{t}}w_{j,t}\,\mathbf{z}(x_{r_{t}(j)})$$

be the weighted predecessor centroid from the memory policy. We define

$$\begin{aligned}D_{t}^{\text{global}}&=1-\cos(\mathbf{z}(x_{t}),\mathbf{z}(x_{0})),\\D_{t}^{\text{local}}&=1-\cos(\mathbf{z}(x_{t}),\bar{\mathbf{z}}_{t}).\end{aligned}$$

Global drift measures distance from the seed text. Local drift measures distance from the weighted predecessor memory that directly conditioned the current step. Local drift is undefined when no valid predecessor prompt memory exists.
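The two drift diagnostics can be sketched in a few lines. The memory is represented here as a list of (weight, embedding) pairs, which is an assumed layout, and `None` stands in for the undefined local drift when the memory is empty. The function names are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple

def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two nonzero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def weighted_centroid(memory: List[Tuple[float, Sequence[float]]]) -> List[float]:
    """Weighted sum of predecessor embeddings, one term per memory entry."""
    dim = len(memory[0][1])
    return [sum(w * vec[d] for w, vec in memory) for d in range(dim)]

def global_drift(z_t: Sequence[float], z_0: Sequence[float]) -> float:
    """Distance of the current text from the seed text."""
    return 1.0 - cosine(z_t, z_0)

def local_drift(z_t: Sequence[float],
                memory: List[Tuple[float, Sequence[float]]]) -> Optional[float]:
    """Distance from the weighted prompt memory; None when the memory is empty."""
    if not memory:
        return None
    return 1.0 - cosine(z_t, weighted_centroid(memory))

# Toy example: the current text is far from the seed but close to its memory.
z_seed, z_now = [1.0, 0.0], [0.0, 1.0]
memory = [(0.81, [0.0, 1.0]), (0.19, [1.0, 0.0])]
print(global_drift(z_now, z_seed), local_drift(z_now, memory))
```

The toy case mirrors the pattern the diagnostics are designed to expose: a step can stay close to its immediate prompt memory while the trajectory has moved away from the seed.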

## 5Results

### 5\.1Semantic Alignment Over the Generated Trajectory

Table [2](https://arxiv.org/html/2605.23043#S5.T2) reports held-out semantic alignment under the matched compact prompt-memory budget ($k=3$). Late-stage $S_{t}$ is the average over the final 20% of matched simulated events, as defined in Section [4.4](https://arxiv.org/html/2605.23043#S4.SS4). In this setting, HawkesLLM has the highest mean and late-stage alignment, with the largest separation near the tail of the generated trajectory.

Table 2: Held-out semantic alignment $S_{t}$ under the matched compact prompt-memory budget $k=3$. Higher values are better.

The trend column is computed directly from $S_{t}$ over simulated time. HawkesLLM is the only method in this comparison whose semantic alignment increases over the generated trajectory. Chronological and random memory selection decrease. Figure [2](https://arxiv.org/html/2605.23043#S5.F2) shows the corresponding trajectory-level semantic-alignment curves.

These comparisons are descriptive case\-study evidence from a limited held\-out sample with dependent generated events\. They should be read as diagnostics for this GDELT window, not as a broad benchmark claim\.

![Refer to caption](https://arxiv.org/html/2605.23043v1/main_similarity_over_time_comparison.png)

Figure 2: Sequential semantic alignment $S_{t}$ over simulated time for HawkesLLM, chronological last-$k$, and random-$k$ on the held-out test window. Panels show matched generated events and a 5-event moving average; higher values indicate closer alignment to the local held-out reference set. The seed event comes from the training window, so the plotted curves begin with matched generated events in the test window.
### 5\.2Effect of Prompt\-Memory Budget

The parameter $k$ controls prompt-memory size. For HawkesLLM, it truncates a weighted influence structure; for the baselines, it directly sets how many predecessor events are provided. Table [3](https://arxiv.org/html/2605.23043#S5.T3) reports the $k$-sensitivity comparison using semantic alignment $S_{t}$.

Table 3: Prompt-memory budget sensitivity using semantic alignment $S_{t}$ in a representative run. Higher is better.

HawkesLLM changes little as $k$ increases because most events use fewer than three meaningful weighted neighbors; extra slots rarely add useful prompt memory.
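One way to see why a larger budget rarely changes the HawkesLLM memory is a top-$k$ truncation over influence scores. The sketch below is an assumption-laden illustration rather than the paper's procedure: the function `truncate_memory`, the `min_weight` cutoff for a "meaningful" neighbor, and the renormalization of the kept weights are all hypothetical choices.

```python
from typing import Dict, List, Tuple

def truncate_memory(scores: Dict[int, float], k: int,
                    min_weight: float = 0.05) -> List[Tuple[int, float]]:
    """Keep at most k predecessors whose normalized influence is meaningful.

    `scores` maps predecessor index -> nonnegative influence score.
    `min_weight` is a hypothetical cutoff, not a value from the paper.
    """
    total = sum(scores.values())
    if total <= 0:
        return []
    normalized = {j: s / total for j, s in scores.items()}
    kept = sorted(
        ((j, w) for j, w in normalized.items() if w >= min_weight),
        key=lambda pair: pair[1],
        reverse=True,
    )[:k]
    kept_total = sum(w for _, w in kept)
    # Renormalize so the weights shown in the prompt sum to one (assumption).
    return [(j, w / kept_total) for j, w in kept]

# Only two predecessors carry meaningful influence, so k=3 and k=7 coincide.
scores = {0: 0.01, 1: 0.19, 2: 0.80}
print(truncate_memory(scores, 3))
print(truncate_memory(scores, 7) == truncate_memory(scores, 3))
```

Under these assumptions, raising $k$ only matters when more than $k$ predecessors pass the cutoff, which is consistent with the saturation described above.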

The heuristic baselines can benefit from larger prompt memory. Chronological last-$k$ at $k=7$ achieves higher mean semantic alignment than HawkesLLM in this representative run, but its advantage fades later in the trajectory; its late-stage $S_{t}$ remains below that of HawkesLLM. The takeaway is narrower but useful: HawkesLLM is strongest when prompt memory is compact and late-stage behavior matters. A broader comparison should aggregate this $k$-sensitivity over repeated runs and match methods by token budget as well as by item count.

### 5\.3Local and Global Components of Sequential Uncertainty

Across repeated simulations, the same drift pattern appears. In the post-split setting, the average mean global drift is $0.450\pm 0.019$, while the average mean local drift is $0.185\pm 0.072$. Global drift exceeds local drift in all runs.

These runs show generated cascades accumulating global uncertainty while staying locally stable: the trajectory can remain close to its immediate weighted prompt memory and still move away from the seed over longer horizons. Individual runs need not be monotone; the repeated post-split runs consistently show global drift above local drift. Figure [3](https://arxiv.org/html/2605.23043#S5.F3) illustrates this separation over a representative HawkesLLM run.

![Refer to caption](https://arxiv.org/html/2605.23043v1/main_drift_over_time_hawkes.png)

Figure 3: Global and local drift over a representative HawkesLLM run. Global drift is measured relative to the seed text, while local drift is measured relative to the weighted prompt memory.

Figure [4](https://arxiv.org/html/2605.23043#S5.F4) breaks the drift diagnostic down by node. The global/local separation is visible across most node categories.

![Refer to caption](https://arxiv.org/html/2605.23043v1/x1.png)

Figure 4: Node-conditioned mean global and local drift across HawkesLLM simulations. Drift varies by node.
### 5\.4Qualitative Propagation Example

Table [4](https://arxiv.org/html/2605.23043#S5.T4) shows an illustrative trajectory generated by HawkesLLM. Rather than listing every event, we select representative events that show how the generated cascade changes framing across source types. The trajectory begins with a general-news seed about favorable launch conditions, then shifts toward technical mission-status language, business implications, local community framing, and eventually more public-facing launch narratives. This example is qualitative only, but it illustrates the behavior measured by the quantitative diagnostics: the sequence stays within the Artemis II semantic region while its emphasis changes across nodes and over time.

Table 4: Illustrative HawkesLLM-generated cascade. We show selected representative events from one run to highlight how source type changes the framing of the same evolving Artemis II topic.

## 6Conclusion

This paper introduced HawkesLLM, a framework for monitoring semantic uncertainty propagation in iterative text\-simulation systems\. The Hawkes process exposes node\-wise influence scores that determine which generated memories enter future prompts, so uncertainty can be tracked over the full trajectory\. In the GDELT case study, HawkesLLM yields stronger late\-stage semantic alignment under a compact prompt\-memory budget\. Local/global drift diagnostics show how uncertainty can accumulate globally even when individual steps remain locally stable\.

The gain is specific rather than sweeping\. The Qwen generator already has a strong prior for plausible Artemis II text; HawkesLLM adds structure around which node memories enter the prompt\. Its role is to make node\-wise information flow and uncertainty monitoring explicit around the generator\. Although we study news\-style event propagation, the same formulation can be used for social\-media narratives, multi\-step agent interactions, and iterative workflows where earlier generated text becomes later context\.

The study is diagnostic. It uses a sampled GDELT article list, title-level text, a hand-curated node taxonomy, and a held-out test set whose `specialist_science_tech` coverage is sparse. The generator can also introduce artifacts, including occasional mixed-language outputs. Because the reported embedding metric uses the same Qwen/Ollama model family as the generator, the next evaluation step is to pair these diagnostics with independent embedding backends, human judgments, and factuality checks. Richer event texts, additional domains, and calibrated threshold rules would help determine when semantic monitoring is reliable enough for operational use.

## Impact Statement

This paper presents a research baseline for studying semantic uncertainty in multi\-step text simulation\. The same machinery that makes plausible news\-style trajectories useful for analysis also creates misuse risks\. We use the framework here as a diagnostic research instrument; deployment would require content safeguards, provenance controls, and separate reliability validation\.

## References

- J. Bai et al. (2024) Qwen2.5: we’re all flying together. arXiv preprint arXiv:2412.15115. Cited by: [§4.1](https://arxiv.org/html/2605.23043#S4.SS1.SSS0.Px1.p1.2), [§4.4](https://arxiv.org/html/2605.23043#S4.SS4.p2.3).
- S. Farquhar, J. Kossen, L. Kuhn, and Y. Gal (2024) Detecting hallucinations in large language models using semantic entropy. Nature 630(8017), pp. 625–630. External Links: [Document](https://dx.doi.org/10.1038/s41586-024-07421-0). Cited by: [§1](https://arxiv.org/html/2605.23043#S1.p2.1), [§2](https://arxiv.org/html/2605.23043#S2.p2.1).
- J. Han, W. Buntine, and E. Shareghi (2024) Towards uncertainty-aware language agent. In Findings of the Association for Computational Linguistics: ACL 2024, pp. 6662–6685. Cited by: [§1](https://arxiv.org/html/2605.23043#S1.p2.1), [§2](https://arxiv.org/html/2605.23043#S2.p1.1).
- A. G. Hawkes (1971) Spectra of some self-exciting and mutually exciting point processes. Biometrika 58, pp. 83–90. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1), [§3.2](https://arxiv.org/html/2605.23043#S3.SS2.p1.17).
- D. Kempe, J. Kleinberg, and É. Tardos (2015) Maximizing the spread of influence through a social network. Theory of Computing 11(4), pp. 105–147. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1), [§3.1](https://arxiv.org/html/2605.23043#S3.SS1.p1.5).
- J. Kleinberg (2007) Cascading behavior in networks: algorithmic and economic issues. In Algorithmic Game Theory, pp. 613–632. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1), [§3.1](https://arxiv.org/html/2605.23043#S3.SS1.p1.5).
- K. Leetaru and P. A. Schrodt (2013) GDELT: global data on events, location, and tone, 1979–2012. In ISA Annual Convention, pp. 1–49. Cited by: [§4.1](https://arxiv.org/html/2605.23043#S4.SS1.p1.1).
- P. Lewis, E. Perez, A. Piktus, F. Petroni, V. Karpukhin, N. Goyal, H. Kuttler, M. Lewis, W. Yih, T. Rocktaschel, S. Riedel, and D. Kiela (2020) Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, Vol. 33, pp. 9459–9474. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- Y. Li, Z. Lu, F. Tang, S. Lai, M. Hu, Y. Zhang, H. Xue, Z. Wu, I. Razzak, Q. Li, and J. Su (2025) Rhythm of opinion: a hawkes-graph framework for dynamic propagation analysis. arXiv preprint arXiv:2502.04567. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- Z. Lin, S. Trivedi, and J. Sun (2023) Generating with confidence: uncertainty quantification for black-box large language models. arXiv preprint arXiv:2305.19187. Cited by: [§1](https://arxiv.org/html/2605.23043#S1.p2.1), [§2](https://arxiv.org/html/2605.23043#S2.p2.1).
- H. Liu, Z. Dou, Y. Wang, N. Peng, and Y. Yue (2024a) Uncertainty calibration for tool-using language agents. In Findings of the Association for Computational Linguistics: EMNLP 2024, pp. 16781–16805. Cited by: [§1](https://arxiv.org/html/2605.23043#S1.p2.1), [§2](https://arxiv.org/html/2605.23043#S2.p1.1).
- N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, and P. Liang (2024b) Lost in the middle: how language models use long contexts. Transactions of the Association for Computational Linguistics 12, pp. 157–173. External Links: [Document](https://dx.doi.org/10.1162/tacl%5Fa%5F00638). Cited by: [§1](https://arxiv.org/html/2605.23043#S1.p4.1).
- Z. Liu and Y. He (2025) TPP-llm: modeling temporal point processes by efficiently fine-tuning large language models. arXiv preprint arXiv:2502.10867. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- H. Mei and J. M. Eisner (2017) The neural hawkes process: a neurally self-modulating multivariate point process. In Advances in Neural Information Processing Systems, Vol. 30, pp. 6754–6764. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- A. Mohamed, M. Geng, M. Vazirgiannis, and G. Shang (2025) LLM as a broken telephone: iterative generation distorts information. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 7493–7509. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p2.1).
- I. Murdock, K. M. Carley, and O. Yagan (2023) An agent-based model of reddit interactions and moderation. In Proceedings of the 2023 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining, pp. 195–202. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- Y. Ogata (1981) On lewis’ simulation method for point processes. IEEE Transactions on Information Theory 27(1), pp. 23–31. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1), [§3.2](https://arxiv.org/html/2605.23043#S3.SS2.p1.17), [§3.3.1](https://arxiv.org/html/2605.23043#S3.SS3.SSS1.p1.12).
- J. S. Park, J. C. O’Brien, C. J. Cai, M. R. Morris, P. Liang, and M. S. Bernstein (2023) Generative agents: interactive simulacra of human behavior. In Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology, UIST ’23. External Links: [Document](https://dx.doi.org/10.1145/3586183.3606763). Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- M. Rizoiu, Y. Lee, S. Mishra, and L. Xie (2017) A tutorial on hawkes processes for events in social media. In Frontiers of Multimedia Research, pp. 191–218. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1), [§3.2](https://arxiv.org/html/2605.23043#S3.SS2.p1.17).
- A. Spataru, E. Hambro, E. Voita, and N. Cancedda (2024) Know when to stop: a study of semantic drift in text generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 3656–3671. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p2.1).
- G. Sun, Y. Wang, D. Niyato, J. Wang, X. Wang, H. V. Poor, and K. B. Letaief (2024) Decoding echo chambers: llm-powered simulations revealing polarization in social networks. arXiv preprint arXiv:2408.05123. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- L. Zhang, Y. Hu, W. Li, Q. Bai, and P. Nand (2025) LLM-aidsim: llm-enhanced agent-based influence diffusion simulation in social networks. Systems 13(1), pp. 29. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- Q. Zhao, D. Li, Y. Liu, W. Cheng, Y. Sun, M. Oishi, T. Osaki, K. Matsuda, H. Yao, C. Zhao, H. Chen, and X. Zhao (2025) Uncertainty propagation on LLM agent. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 6064–6073. Cited by: [§1](https://arxiv.org/html/2605.23043#S1.p2.1), [§2](https://arxiv.org/html/2605.23043#S2.p1.1).
- F. Zhou, Q. Kong, J. Qiao, C. Wan, Y. Zhang, and R. Cai (2025) Advances in temporal point processes: bayesian, neural, and llm approaches. arXiv preprint arXiv:2502.17528. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).
- S. Zuo, H. Jiang, Z. Li, T. Zhao, and H. Zha (2020) Transformer hawkes process. In Proceedings of the 37th International Conference on Machine Learning, Proceedings of Machine Learning Research, Vol. 119, pp. 11692–11702. Cited by: [§2](https://arxiv.org/html/2605.23043#S2.p3.1).

## Appendix AInstantiated Prompt Example

The prompt format is fixed across methods\. HawkesLLM changes only which node\-labeled predecessor texts and weights enter the memory block\. The example below shows one instantiated prompt\. The weights appear as text annotations, so they provide relative\-importance cues rather than hard numerical controls inside the language model\.

Prompt: Instantiated Example

> Role. You are simulating one item in a Hawkes-driven cross-node news cascade.
>
> Task. Write exactly one concise English sentence or headline-like update.
>
> Constraints. Use only the weighted predecessor texts below; do not mention weights, simulations, models, or prompts. Do not copy predecessor wording verbatim. Preserve the core Artemis II subject while allowing natural semantic drift.
>
> Target node. local_tv
>
> Node style. Write like a local TV news web update: clear, public-facing, practical, and locally relatable.
>
> Simulated time since seed. 15.04 hours
>
> Weighted predecessor context.
>
> 1. weight=0.81; node=local_tv; text=*Artemis II crew making great strides in their preparations for the mission, according to latest updates from NASA. Local teams are keeping a close eye on their progress and wish them all the best for this historic journey.*
> 2. weight=0.19; node=general_news; text=*Crew aboard Artemis II continues to prepare for historic space mission, overcoming initial hurdles as NASA reports smooth progress.*
>
> Output. Only the generated news item.

## Appendix BHawkes Fitting Details

The full-data exponential grid included eight fixed decay values from 1/72 to 1/6 per hour. The best-overall exponential fit had $\beta=0.1667$ and $\rho(\mathbf{G})=1.0535$; because this spectral radius exceeds 1, the fit is not stable and was not used for simulation. The best stable exponential fit had $\beta=0.0833$, $\rho(\mathbf{G})=0.8708$, log-likelihood $-556.821$, AIC $1175.641$, and BIC $1284.558$. The Gaussian-basis comparison used truncated normal bases centered at 6, 24, and 72 hours with standard deviations 4, 10, and 20 hours, respectively; it achieved $\rho(\mathbf{G})=0.6836$ and log-likelihood $-569.693$.
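The reported criteria can be cross-checked against the standard definitions $\mathrm{AIC}=2p-2\log L$ and $\mathrm{BIC}=p\ln n-2\log L$. In the sketch below, $n=248$ is the dataset size given in Section 4.3, while the parameter count $p=31$ is not stated in this appendix; it is inferred here because it reproduces both reported values up to rounding, and should be read as an assumption. The helper names are hypothetical.

```python
import math

def aic(loglik: float, p: int) -> float:
    """Akaike information criterion: 2p - 2 log L."""
    return 2 * p - 2 * loglik

def bic(loglik: float, p: int, n: int) -> float:
    """Bayesian information criterion: p ln(n) - 2 log L."""
    return p * math.log(n) - 2 * loglik

def is_stable(spectral_radius: float) -> bool:
    """A Hawkes fit is treated as stable when the spectral radius of G is below 1."""
    return spectral_radius < 1.0

loglik = -556.821   # reported log-likelihood of the best stable exponential fit
p, n = 31, 248      # p is inferred (assumption); n is the full-data event count

print(round(aic(loglik, p), 3), round(bic(loglik, p, n), 3))
print(is_stable(0.8708), is_stable(1.0535))
```

The small residual against the reported AIC is at the level of the three-decimal rounding of the log-likelihood.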

For the chronological 80/20 train/test evaluation, Hawkes was refit on train only. The resulting stable exponential model has $\beta=0.1667$ and $\rho(\mathbf{G})=0.8467$. That train-only refit was used for the held-out uncertainty evaluation.

## Appendix CGeneration Settings

Table [5](https://arxiv.org/html/2605.23043#A3.T5) reports the decoding and prompt-memory settings used in the simulations.

Table 5: Generation and prompt-memory settings.

The held-out simulations start from the last training event and then run over the test window. The full-data exploratory run used the earliest observed Artemis II title as its seed:

> Moon rocket and weather are on NASA side for the first astronaut launch in decades\.

## Appendix DHeld\-Out Evaluation Details

The chronological 80/20 split yields 198 train events and 50 test events. Table [6](https://arxiv.org/html/2605.23043#A4.T6) reports the per-node counts.

Table 6: Per-node train/test counts under the chronological split.

Across three post-split runs, 62 generated non-seed events were evaluated. All 62 found same-node matches, with 58 within $\pm 12$ hours and 4 using the $\pm 24$-hour fallback. The similarity curves begin at the first matched generated test-window event, not at the training-side seed.

##### Additional drift summaries\.

In full-data repeated runs, the same global/local separation appears: average mean global drift is $0.440\pm 0.100$, and average mean local drift is $0.153\pm 0.049$.
