CogGuard: Cognitive and Operational Profiling for Proactive Warning in Edge Intelligent Services
Summary
CogGuard is a proactive-warning framework for edge intelligent services that decouples offline LLM-based profile construction from online SLM-based score prediction, reducing construction time by 48% and fine-tuning time by 19% while achieving lower prediction errors on education and operation datasets.
# CogGuard: Cognitive and Operational Profiling for Proactive Warning in Edge Intelligent Services
Source: [https://arxiv.org/html/2606.15199](https://arxiv.org/html/2606.15199)
Zhi Yao2,1, Weihao Chen3, Zhiqing Tang1,🖂, Hanshuai Cui2,1, Qianli Ma1,3, Weijia Jia1,4,🖂, Wei Zhao5

This work is supported in part by the National Natural Science Foundation of China \(NSFC\) under Grant 62272050 and Grant 62302048; in part by the Guangdong Key Lab of AI and Multi\-modal Data Processing, Beijing Normal\-Hong Kong Baptist University, Zhuhai under 2023\-2024 Grants sponsored by Guangdong Provincial Department of Education; in part by Institute of Artificial Intelligence and Future Networks and Engineering Center of AI and Future Education, Guangdong Provincial Department of Science and Technology, China; in part by Zhuhai Science\-Tech Innovation Bureau under Grant No\. 2320004002772; and in part by the Interdisciplinary Intelligence Super Computer Center of Beijing Normal University at Zhuhai\. \(Corresponding authors: Zhiqing Tang and Weijia Jia\.\)
###### Abstract
Proactive warning is an important capability for edge intelligent services, where the system predicts whether a subject will successfully complete an incoming task under strict latency and privacy constraints\. Such prediction depends on both long\-term static attributes and short\-term dynamic states derived from historical interaction logs\. Recent Large Language Models \(LLMs\) offer strong long\-context reasoning for constructing structured profiles from these logs, but existing solutions face two challenges for edge deployment: \(1\) profiling methods are typically domain\-specific and lack a reusable abstraction across service scenarios, and \(2\) fine\-tuning alignment models on heterogeneous edge clusters incurs high synchronization overhead due to the variance in input sequence lengths\. To address these challenges, we propose CogGuard, a proactive\-warning framework for edge intelligent services\. CogGuard decouples offline LLM\-based profile construction from online Small Language Model \(SLM\)\-based score prediction through a shared static\-dynamic profile\-to\-score pipeline, and instantiates it in two representative scenarios: educational performance warning and operational task outcome warning\. For efficient profile construction, we design scenario\-specific profiling methods with prefix\-aligned KV\-cache reuse to reduce repeated encoding overhead\. For edge\-side model alignment, we propose a length\-aware distributed fine\-tuning strategy with contrastive regularization to mitigate workload imbalance on heterogeneous clusters\. Experiments on education and operation datasets show that CogGuard reduces profile construction time by up to 48% and distributed fine\-tuning time by 19%, while achieving MAEs of 13\.4 and 5\.9, respectively, on 100\-point\-scale warning tasks\. In the largest educational setting, CogGuard reduces prediction error by 15\.4% compared with the strongest baseline\.
## I Introduction
Artificial Intelligence \(AI\) is transforming high\-stakes service domains such as educational tutoring and industrial operations\. In these high\-frequency interactive scenarios, Large Language Models \(LLMs\) are evolving from instruction execution tools to decision\-making agents with cognitive analysis capabilities\[[1](https://arxiv.org/html/2606.15199#bib.bib45)\]\. To deliver personalized behavioral analysis and pre\-warning, LLMs must construct accurate user cognitive profiles from massive, unstructured interaction logs and identify the underlying causes of errors\[[28](https://arxiv.org/html/2606.15199#bib.bib46),[40](https://arxiv.org/html/2606.15199#bib.bib38)\]\. However, traditional cloud\-based AI services cannot meet the data privacy and real\-time response requirements of these scenarios\.
Edge intelligence addresses these limitations by offloading model computation from the remote cloud to edge servers, providing lower\-latency services while better protecting user data\[[22](https://arxiv.org/html/2606.15199#bib.bib39),[35](https://arxiv.org/html/2606.15199#bib.bib47)\]\. In this context, proactive warning is becoming an important capability of edge intelligent services, where the system predicts whether a subject will successfully complete an incoming task under strict latency and privacy constraints\. Such prediction depends on both long\-term static attributes and short\-term dynamic states derived from historical interactions\.
However, building a proactive warning service at the edge faces two obstacles\. First, constructing accurate profiles from massive historical logs is computationally intensive\[[25](https://arxiv.org/html/2606.15199#bib.bib40)\], yet the resulting multi\-level profiles can only be fully interpreted by high\-parameter models that are unsuitable for edge deployment\[[39](https://arxiv.org/html/2606.15199#bib.bib42),[32](https://arxiv.org/html/2606.15199#bib.bib48)\]\. Second, while fine\-tuned Small Language Models \(SLMs\) offer a lightweight alternative for edge inference, distributed fine\-tuning on heterogeneous edge clusters suffers from synchronization bottlenecks that existing methods have not adequately addressed\[[9](https://arxiv.org/html/2606.15199#bib.bib43),[3](https://arxiv.org/html/2606.15199#bib.bib51)\]\. Cognitive and operational profiling for proactive warning in edge intelligent services is therefore an urgent need, but two challenges remain\.
The first challenge is how to rapidly and reliably construct structured profiles from time\-varying subject states for edge proactive warning services\. Proactive warning requires accurately capturing the impact of both external task entities and users’ internal cognitive levels on task outcomes\[[14](https://arxiv.org/html/2606.15199#bib.bib44)\], a task that requires complex and redundant model computations for cognitive analysis\. Existing methods map historical features onto knowledge graphs and use the constructed structured profiles during subsequent services\[[30](https://arxiv.org/html/2606.15199#bib.bib37)\]\. However, traditional graph construction methods extract all entity relationships without discrimination, weakening the representation quality, and subsequent graph clustering also leads to high latency\[[41](https://arxiv.org/html/2606.15199#bib.bib41)\]\. Moreover, existing profiling methods are often domain\-specific and lack a semantic interface that can be seamlessly integrated into downstream predictive services\. This leads to substantial overhead in re\-engineering the alignment pipeline for each new scenario, making it difficult to deploy generalizable proactive\-warning services on heterogeneous edge clusters\.
The second challenge is how to mitigate the synchronization bottleneck caused by workload heterogeneity in distributed edge fine\-tuning\. Fine\-tuning SLMs on resource\-constrained edge servers is time\-consuming, and the input sequence lengths of different historical logs vary widely, leading to computational imbalance across training servers\. Existing distributed training approaches reduce cost by adaptively allocating batch sizes based on hardware heterogeneity\[[9](https://arxiv.org/html/2606.15199#bib.bib43)\], but these hardware\-centric methods schedule workloads solely according to device computing power without accounting for the variance in training sample complexity\. In particular, the high variance of user log sequence lengths causes severe straggler effects, where faster nodes idle while waiting for slower ones that process longer sequences\. Efficiently coordinating distributed fine\-tuning under such input\-level heterogeneity on edge clusters remains an open problem\.
Figure 1: Overview of CogGuard\.

To address these challenges, we propose CogGuard, as shown in Fig\.[1](https://arxiv.org/html/2606.15199#S1.F1), a proactive\-warning framework for edge intelligent services that combines structured profiling with efficient distributed model fine\-tuning\. CogGuard adopts a large\-small model collaboration approach to balance reasoning capability and edge inference latency\. High\-performance LLMs handle the complex offline extraction of structured profiles, while fine\-tuned SLMs are deployed on resource\-constrained edge nodes for real\-time proactive warning\. The framework operates in two phases, scenario\-specific profiling that constructs structured service contexts from historical logs, and profile\-to\-score alignment that fine\-tunes SLMs with length\-aware scheduling and contrastive regularization\. Experiments in educational and operational scenarios demonstrate the effectiveness of the proposed pipeline in two representative proactive\-warning settings\[[19](https://arxiv.org/html/2606.15199#bib.bib33)\]\.
The central design principle of CogGuard is to decouple expensive scenario\-specific profile construction from lightweight shared profile\-to\-score alignment\. The former allows each service scenario to use its own profiling logic, while the latter provides a reusable prediction interface\. To make this alignment robust under input and state heterogeneity, CogGuard further introduces structured marker tokens, length\-aware training partitioning, and state\-sensitivity regularization\. Our main contributions are summarized as follows\.
1. Problem Formulation\. We formulate proactive warning in edge intelligent services as a static\-dynamic profile\-to\-score prediction problem, where task outcomes depend jointly on long\-term subject attributes, time\-varying states, and the current task context\.
2. Structured Profiling\. We propose a scenario\-specific profiling mechanism\. For educational logs, a dual\-graph cognitive profiling method decouples user tracking into an entity relationship graph and a knowledge concept graph, with a prefix\-aligned KV cache reuse strategy to reduce repeated long\-context encoding overhead\. For operational logs, static hardware and dynamic runtime profiles are constructed through controlled chaos testing\.
3. Efficient Alignment\. We introduce a length\-aware distributed fine\-tuning strategy that partitions training data according to input sequence lengths to mitigate the straggler problem on heterogeneous edge clusters, and propose a contrastive regularization objective that prevents the model from relying on static problem content rather than personalized profile states\.
4. Cross\-Scenario Validation\. We instantiate the framework in educational and operational edge service scenarios by constructing specialized datasets tailored to each profiling method\. The results show that the shared profile\-to\-score alignment pipeline can be effectively applied in two distinct proactive\-warning settings\.
The remainder of this paper is organized as follows\. Section [II](https://arxiv.org/html/2606.15199#S2) reviews related work\. Section [III](https://arxiv.org/html/2606.15199#S3) details our dataset construction methodology\. Section [IV](https://arxiv.org/html/2606.15199#S4) provides a detailed description of our method\. The implementation setting and experimental results are described in Section [V](https://arxiv.org/html/2606.15199#S5)\. Finally, Section [VI](https://arxiv.org/html/2606.15199#S6) concludes the paper\.
## II Related Work
### II\-A Structured Profiling and Graph RAG
Traditional knowledge tracing \(KT\) methods such as DKT\[[21](https://arxiv.org/html/2606.15199#bib.bib14)\] achieve high prediction accuracy but lack interpretability\. Recent works address this limitation by constructing cognitive graphs through LLMs\[[10](https://arxiv.org/html/2606.15199#bib.bib17),[13](https://arxiv.org/html/2606.15199#bib.bib16)\]\. For example, \[[31](https://arxiv.org/html/2606.15199#bib.bib15)\] uses LLM agents to construct cognitive prototypes and simulate students with diverse cognitive levels, but it relies on effective retrieval and cannot directly quantify a student’s mastery level\. Beyond educational scenarios, Graph RAG is also widely used in operational contexts\[[27](https://arxiv.org/html/2606.15199#bib.bib10),[29](https://arxiv.org/html/2606.15199#bib.bib11)\]\. For instance, \[[17](https://arxiv.org/html/2606.15199#bib.bib12)\] applies Graph RAG to fault diagnosis on high\-speed trains, and GNN\-RAG\[[16](https://arxiv.org/html/2606.15199#bib.bib13)\] achieves lightweight retrieval for multi\-hop inference tasks\. However, current Graph RAG methods still rely on complex graph clustering and community summary generation, which grow exponentially with the graph size, making them unsuitable for edge applications\[[5](https://arxiv.org/html/2606.15199#bib.bib18)\]\.
Moreover, existing approaches do not treat shared entity extraction and personalized state extraction as distinct profiles\. Our dual\-graph cognitive profiling method addresses this gap by decoupling entity\-level and knowledge\-level graphs\. The resulting profiles are internalized into the fine\-tuned model, eliminating the need for graph clustering and retrieval at inference time\.
### II\-B Efficient Training on Heterogeneous Clusters
Fine\-tuning SLMs on resource\-constrained edge servers is time\-consuming\. Distributed Data Parallel \(DDP\) and Ray\[[18](https://arxiv.org/html/2606.15199#bib.bib19)\] are widely adopted for parallelizing training across multiple nodes, but they require all workers to proceed in lockstep, leading to high synchronization overhead when hardware capabilities differ\. To reduce this overhead, HetPipe\[[20](https://arxiv.org/html/2606.15199#bib.bib20)\] pipelines layers across heterogeneous GPUs to overlap computation between fast and slow devices\. Similarly, Accpar\[[24](https://arxiv.org/html/2606.15199#bib.bib21)\] partitions computation graphs according to each device’s throughput and assigns proportional workloads\. Both methods effectively balance hardware heterogeneity, but they schedule work based solely on device computing power\.
However, fine\-tuning tasks exhibit greater input heterogeneity than pre\-training because user historical logs vary widely in sequence length\. This variance causes straggler effects even when hardware workloads are balanced, and existing hardware\-centric methods do not account for it\. Our method addresses this gap by introducing a length\-aware data partitioning strategy that groups samples by sequence length before distributing them across servers\.
## III Dataset Curation
### III\-A Educational Scenario
We construct an educational dataset tailored for proactive warning from programming interaction logs\. To support structured profile construction in CogGuard, each sample must include three essential elements: a list of the main knowledge concepts involved, sufficient historical records for each student, and detailed problems, solutions, and scores for each record\. Standard knowledge tracing datasets\[[7](https://arxiv.org/html/2606.15199#bib.bib23),[12](https://arxiv.org/html/2606.15199#bib.bib22)\] contain only sparse interaction logs and lack the rich textual context required for LLM\-based cognitive reasoning\. Prior LLM\-generated student datasets\[[31](https://arxiv.org/html/2606.15199#bib.bib15)\] remain limited in scale\.
We select C\+\+ programming as our evaluation scenario\[[33](https://arxiv.org/html/2606.15199#bib.bib24)\]\. The dataset is sourced from the online programming platform NowCoder \([https://ac\.nowcoder\.com/acm/problem/list/](https://ac.nowcoder.com/acm/problem/list/)\)\. The resulting dataset contains 10,036 submission records covering 1,076 distinct problems from up to 40 students\. This scale mirrors the typical size of a natural class in real\-world educational settings and represents a realistic workload for a single edge\-deployed warning node\. Each student interacts with multiple problems and may submit multiple attempts for the same problem\. We implement a three\-stage quality\-control pipeline\. \(1\) Data Cleaning\. We filter out excessive repeated submissions for the same problem to avoid model overfitting\. \(2\) Standardization\. We retain valid submission materials that reflect dynamic learning states, ensuring each sample includes the student’s solution, reference answers, and specific error types\. The target scores are derived from the platform and mapped into our prediction intervals\. \(3\) Structuring and Split\. The raw data consists of submission records in chronological order\. To prevent data leakage, we split the dataset by students rather than randomly by records\. Historical records are then fed sequentially to the LLM according to submission timestamps so that the dynamic cognitive evolution is tracked accurately\.
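The student-level split in stage (3) can be sketched as follows. This is a minimal illustration and not the authors' released code: the record fields `student_id` and `timestamp`, the 20% held-out fraction, and the fixed seed are all assumptions made for the example.

```python
import random
from collections import defaultdict

def split_by_student(records, test_fraction=0.2, seed=0):
    """Split submission records so that no student appears in both sets.

    `records` is a list of dicts with hypothetical keys
    'student_id' and 'timestamp'.
    """
    by_student = defaultdict(list)
    for rec in records:
        by_student[rec["student_id"]].append(rec)

    # Choose held-out students (not held-out records) to avoid leakage.
    students = sorted(by_student)
    random.Random(seed).shuffle(students)
    n_test = max(1, round(len(students) * test_fraction))
    test_students = set(students[:n_test])

    train, test = [], []
    for sid in sorted(by_student):
        # Keep each student's history in chronological order.
        history = sorted(by_student[sid], key=lambda r: r["timestamp"])
        (test if sid in test_students else train).extend(history)
    return train, test

# Hypothetical toy records: (student, problem, timestamp).
records = [
    {"student_id": s, "problem_id": p, "timestamp": t}
    for s, p, t in [
        ("s1", "p1", 3), ("s1", "p2", 1), ("s2", "p1", 2), ("s3", "p3", 5),
        ("s3", "p1", 4), ("s4", "p2", 6), ("s5", "p2", 7),
    ]
]
train, test = split_by_student(records)
```

Splitting on student identifiers rather than on individual records means that a held-out student's earlier submissions can never appear in the training set.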
TABLE I: Distribution and Configuration of Fault Injection Scenarios

| Category | Configuration | Parameters | Count \(%\) |
| --- | --- | --- | --- |
| Network | packets loss | percent 50 | 360 \(83\.5%\) |
| | packets corrupt | percent 50 | |
| | packets duplicate | percent 50 | |
| | delay | 3000 ms | |
| | bandwidth limit | 5 Mbps | |
| Input File | file rename | NFS file read | 288 \(66\.8%\) |
| | replace data in file | | |
| CPU | stress\-cpu | workers 10 | 288 \(66\.8%\) |
| | | load 50 | |
| Disk I/O | add\-payload write | 20G, 8 threads | 216 \(50\.1%\) |
| | read payload | 200G, 7 threads | 216 \(50\.1%\) |
| Memory | memory stress | 25 GB | 216 \(50\.1%\) |
### III\-B Operational Scenario
We construct an operations dataset for proactive task outcome warning under runtime disturbances\. Although mainstream orchestration platforms like Kubernetes \(K8s\) support automated deployment and resource management, their default scheduling mechanisms rely on static resource constraints and affinity rules, lacking awareness of runtime server disturbances\. As a result, task schedulers based on static configuration cannot cope with unpredictable dynamic failures\.
Instead of passively waiting for unpredictable failures, we employ a controlled chaos profiling strategy to actively probe system boundaries and stress\-test the servers\. This allows the proactive warning system to learn machine behaviors under high pressure before real failures occur\. Through this process, we map the hardware and runtime profiles of the target infrastructure\. The static hardware profile $P_H$ captures innate attributes such as CPU architecture, GPU capacity, and network bandwidth\. The dynamic runtime profile $P_R$ describes the injected fault status at the current time\[[15](https://arxiv.org/html/2606.15199#bib.bib25)\]\.
Figure 2: Overview of the proposed method\. Phase 1 constructs structured profiles via scenario\-specific methods: asynchronous dual\-graph construction for education and chaos injection for operations\. Phase 2 uses a heterogeneity\-aware orchestrator for fine\-tuning, and knowledge graph summary sensitivity is enhanced through contrastive regularization to optimize score prediction\.

We use Chaos Mesh \([https://chaos\-mesh\.org/](https://chaos-mesh.org/)\) to inject up to six types of faults and 11 corresponding optional parameters into heterogeneous servers\. Table [I](https://arxiv.org/html/2606.15199#S3.T1) lists the specific fault types and their occurrence frequencies\. The dataset simulates varying degrees of system instability by injecting multiple concurrent faults into each executed task\. The number of concurrent faults ranges from 1 to 6 and peaks at 4 faults \(32\.7%\), indicating a moderate concentration around mid\-complexity scenarios\. Under these conditions, we execute mixed AI inference workloads \(including text\-to\-text and image\-to\-text tasks\) and record the execution results\.
## IV Methods
### IV\-A Overview
Before detailing the method, we formally define the proactive warning task\. Given a subject $u \in U$ \(such as a student or configurable server\) and its historical interaction logs $L_u = \{l_1, l_2, \ldots, l_{t-1}\}$, the goal is to predict the performance score $y_t$ of a new incoming task $q_t$ in real time\.
CogGuard models proactive warning in different scenarios through a shared profile\-to\-score prediction abstraction\. We learn a lightweight mapping function $f_\theta$ parameterized by an edge\-deployed SLM: $y_t = f_\theta(p_a, p_b, q_t)$, where $p_a$ and $p_b$ denote the structured context extracted from $L_u$\. In the educational scenario, they represent shared task entities and personalized knowledge states; in the operational scenario, they represent static hardware profiles and dynamic runtime faults\. This design allows the downstream alignment pipeline to be shared across scenarios while keeping the upstream profiling process scenario\-specific\. It is worth noting that CogGuard does not assume a fully unified upstream profiling process\. Instead, it provides a reusable downstream alignment interface that converts scenario\-specific profiles into a shared static\-dynamic profile\-to\-score prediction format\. Therefore, different services may instantiate their own profiling modules, while sharing the same alignment and warning pipeline\.
As illustrated in Fig\.[2](https://arxiv.org/html/2606.15199#S3.F2), the method operates in two phases: \(1\) Scenario\-Specific Structured Profiling, where we employ parallel asynchronous graph construction \(for students\) or chaos engineering fault injection \(for servers\) to extract the service context under causal constraints; and \(2\) Profile\-to\-Score Alignment, where we fine\-tune an SLM to predict scores based on the generated profile summaries\. To handle edge heterogeneity and textual bias, we introduce a length\-aware workload orchestrator and a contrastive regularization loss\.
### IV\-B Scenario\-Specific Structured Profiling
We categorize behavioral subjects into two types based on whether their intrinsic characteristics are controllable\.
For human\-centric subjects such as students, performance is mainly influenced by problem content, familiarity with the problem, and mastery of relevant knowledge concepts\. Since the entities in a problem stem may appear in multiple problems, all students share a global entity graph while each student maintains an individual knowledge graph\. For configurable servers, task outcomes are mainly affected by task content, hardware configuration, and runtime status\. Since servers are machines rather than humans, we can directly model the machine state through chaos experiments and skip graph construction\.
Human\-Centric Subjects\. To characterize the impact of problem entities and students’ intrinsic knowledge on problem\-solving outcomes, we propose an asynchronous dual\-graph construction method that builds the entity graph $E_G$ and student knowledge graph $K_G$ in parallel\. Unlike traditional Graph RAG, our method only requires the summarized content generated by graph construction to build student profiles\[[6](https://arxiv.org/html/2606.15199#bib.bib29),[2](https://arxiv.org/html/2606.15199#bib.bib28)\]\. This decoupling can model two types of prediction: predicting the solutions of the same problem for multiple students’ profiles and predicting the solutions of different problems for one profile\[[34](https://arxiv.org/html/2606.15199#bib.bib26),[37](https://arxiv.org/html/2606.15199#bib.bib27)\]\. By dispatching $E_G$ and $K_G$ construction to independent queues, we decouple problem semantic analysis from dynamic knowledge assessment, thereby maximizing system throughput\. This dual\-graph decoupling also simplifies complex reasoning tasks\. Since LLMs often struggle to distinguish between objective problem entities and subjective knowledge concepts within lengthy prompts, separating them improves interpretability and enables the cache prefix optimization described below\.
The $E_G$ is a globally shared structure where the interaction data from all students are aggregated into a single unified graph\. Since entity concepts are universal, a shared topology allows us to aggregate collective familiarity, where low familiarity indicates high difficulty or novelty across the student population\[[23](https://arxiv.org/html/2606.15199#bib.bib30)\]\. The $K_G$ is a student\-specific profile that maintains complete data isolation for each individual\.
We design a prefix\-aware KV cache reuse mechanism to exploit the shared interaction context across solutions\. By standardizing the prompt template so that shared fields appear at the prefix, we enable inference engines to reuse KV caches for the same problem\. This reduces the FLOPs required for redundant context encoding while preserving the sequential integrity of learning histories\. This cache reuse is applied only during offline profile construction, where repeated LLM encoding of shared problem prefixes dominates the cost\. It is orthogonal to the downstream SLM fine\-tuning stage, which focuses on profile\-to\-score alignment\.
The shared prefix follows a fixed input order: Question: \{question\}, Description: \{desc\}, Student Program: \{program\}, and Error Description: \{error\_desc\}\. For personalized knowledge concept extraction, task\-specific instructions \(Knowledge Points to Extract: \{kp\_list\}\) are appended at the end of this shared prefix\. When processing knowledge graphs, we merge the desc field into the Question field\. Since knowledge graph construction focuses on evaluating specific knowledge points, the detailed problem description mainly serves as background context\. Consolidating it into the question avoids redundant encoding and prevents dilution of the LLM’s attention\.
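The prompt layout above can be illustrated with a short sketch. It is a hypothetical rendering of the template and not the released prompt code: the function names, the sample problem, and the way `desc` is concatenated into the question are assumptions. The character-level common prefix only approximates what a prefix-caching inference engine would reuse, since such engines match on tokens.

```python
import os

def shared_prefix(question, desc, program, error_desc):
    """Shared fields in the fixed order described in the text."""
    return (
        f"Question: {question}\n"
        f"Description: {desc}\n"
        f"Student Program: {program}\n"
        f"Error Description: {error_desc}\n"
    )

def knowledge_prompt(question, desc, program, error_desc, kp_list):
    """Knowledge-graph prompt: desc is merged into Question, and the
    task-specific instruction is appended after the shared fields."""
    merged = f"{question} {desc}".strip()  # merge strategy is an assumption
    return (
        f"Question: {merged}\n"
        f"Student Program: {program}\n"
        f"Error Description: {error_desc}\n"
        f"Knowledge Points to Extract: {', '.join(kp_list)}\n"
    )

# Hypothetical records: two students attempting the same problem.
question = "Sum two integers"
desc = "Read a and b, then print a+b."
p1 = shared_prefix(question, desc, "int main(){int a,b;}", "missing output")
p2 = shared_prefix(question, desc, "#include <iostream>", "wrong type")

# Longest common character prefix of the two prompts.
shared = os.path.commonprefix([p1, p2])
kg = knowledge_prompt(question, desc, "int main(){int a,b;}",
                      "missing output", ["Search", "Greedy"])
```

Because the problem-level fields come first, prompts for the same problem agree up to the first student-specific field, which is the portion a KV cache could serve without re-encoding.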
To guide the LLM in structured extraction, we design prompt constraints for knowledge graph construction\. The model maps the student’s program and error descriptions to the predefined C\+\+ knowledge points \{kp\_display\}, as shown in Listing [1](https://arxiv.org/html/2606.15199#LST1)\. The prompt also requires evaluating the mastery level of each extracted point\. By analyzing the compilation or logical error descriptions, the LLM categorizes the student’s current state as either demonstrating understanding \(“Good”\) or lacking comprehension \(“Bad”\)\. The code and complete prompt templates are open\-sourced in our repository \([https://github\.com/Mrzhiyao/CogGuard](https://github.com/Mrzhiyao/CogGuard)\)\.
Syntax: Input\_Output\_and\_Sequential\_Structure, Control\_Structure
Data\_Structure: Linked\_List, Stack, Queue, Graph\_Structure, String\_Algorithms
Algorithm: Enumeration\_and\_Sorting, Search, Greedy, Simulation, Binary\_Search, Dynamic\_Programming, Number\_Theory, Hash, Probability\_and\_Statistics, Game\_Theory
Listing 1: The content of \{kp\_display\} for C\+\+ programs\.

Based on these extracted states, the profile summary quantitatively characterizes the student’s mastery over entities or knowledge nodes\. For any node $m \in E_G \cup K_G$, we first map the LLM’s categorical evaluation into a binarized mastery label $y_m \in \{0,1\}$ \(i\.e\., “Good” as 1, “Bad” as 0\)\. We then compute the base mastery rate $R_m = n_{m,1}/(n_{m,1}+n_{m,0})$, where $n_{m,1}$ and $n_{m,0}$ are the historical counts of mastered and unmastered instances\. We also introduce a frequency factor $f_m = n_m/N$ to weight the student’s familiarity, where $N$ is the total number of historical records and $n_m$ is the count involving node $m$\. The node score $S(m)$ is calculated by combining the base mastery rate and the frequency factor:

$$S(m)=\begin{cases}\mu+f_m\cdot(1-\mu) & \text{if } R_m\geq\mu\\ \mu-f_m\cdot\mu & \text{if } R_m<\mu\end{cases} \tag{1}$$

where $\mu$ is the initial neutral score\. The system ranks all nodes by $S(m)$ and retains the top\-$k$ strongest and weakest nodes to generate the final cognitive profiling summary\.
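As a worked illustration of Eq. (1), the sketch below scores a few nodes and keeps the top-k strongest and weakest. It makes several assumptions that the text does not fix: the neutral score is set to 0.5, `n_m` is taken as `n_good + n_bad`, and the node names, counts, and function names are hypothetical.

```python
def node_score(n_good, n_bad, n_total, mu=0.5):
    """Node score S(m) from Eq. (1).

    n_good / n_bad: counts of "Good" / "Bad" evaluations for the node.
    n_total: total number of historical records N.
    mu: neutral score (0.5 is an assumed value).
    """
    n_m = n_good + n_bad              # assumed: every involvement is labeled
    if n_m == 0 or n_total == 0:
        return mu                     # unseen node stays neutral
    r_m = n_good / n_m                # base mastery rate R_m
    f_m = n_m / n_total               # frequency factor f_m
    if r_m >= mu:
        return mu + f_m * (1 - mu)    # pushed toward 1 with familiarity
    return mu - f_m * mu              # pushed toward 0 with familiarity

def summarize(counts, n_total, k=2, mu=0.5):
    """Rank nodes by S(m) and keep the k strongest and k weakest."""
    scores = {m: node_score(g, b, n_total, mu) for m, (g, b) in counts.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {"strongest": ranked[:k], "weakest": ranked[-k:][::-1]}

# Hypothetical counts: node -> (n_good, n_bad), with N = 20 records.
counts = {"Greedy": (8, 2), "Hash": (1, 4), "Search": (3, 3), "Stack": (0, 1)}
summary = summarize(counts, n_total=20)
```

With these counts, `Greedy` has $R_m = 0.8$ and $f_m = 0.5$, so $S(m) = 0.5 + 0.5 \cdot 0.5 = 0.75$, while `Hash` has $R_m = 0.2$ and $f_m = 0.25$, so $S(m) = 0.5 - 0.25 \cdot 0.5 = 0.375$. The frequency factor controls how far a node moves away from the neutral score in either direction.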
Machine\-Centric Servers\. While server performance is theoretically deterministic, real\-world execution is heavily affected by dynamic disturbances\. As detailed in Section [III](https://arxiv.org/html/2606.15199#S3), we probe system boundaries using controlled chaos injection to construct the static hardware profile $P_H$ and the dynamic runtime profile $P_R$\. Rather than employing graph extraction, the model directly takes $P_H$ and $P_R$ as the standardized profile input\.
### IV\-C Profile\-to\-Score Alignment
We align structured profiles to task scores by fine\-tuning SLMs\. To improve the model’s understanding of the input structure and enable prefix sharing during batched inference, we introduce four special tags: <<MARK\_L\>\>, <<MARK\_EG\>\>, <<MARK\_KG\>\>, and <<MARK\_R\>\>\.
Task Outcome Scoring\. In the educational scenario, the target scores are derived directly from the platform \(Section [III](https://arxiv.org/html/2606.15199#S3)\)\. In the operational scenario, we design a scoring mechanism based on mixed AI inference workloads to generate target labels \($y_t$\)\. As shown in Algorithm [1](https://arxiv.org/html/2606.15199#alg1), successful executions start from a high base score, and penalties are subtracted according to completion latency, file I/O overhead, execution overhead, and recovery cost\. Failed executions receive a lower base score determined by failure severity\. We further distinguish delayed failures from immediate failures: if a task runs for a prolonged duration before failing, we add a small adjustment, as this indicates partial responsiveness under stress\.
For successful tasks, the final score is primarily determined by execution efficiency\. We model the completion\-time penalty $p_t(\tau)$ as a piecewise linear function to capture different latency sensitivity regions\. For failed executions, we assign a lower base score according to the error type and optionally add a delayed\-failure adjustment when prolonged runtime indicates partial responsiveness\. Here, $p_t(\tau)$ denotes the completion\-time penalty, $p_r(\tau)$ and $p_e(\tau)$ denote the penalties for file I/O overhead and model execution overhead, respectively, and $p_o(\tau)$ is a recovery penalty determined by the recovery result $o(\tau)$\. The completion\-time penalty $p_t(\tau)$ is defined as:

$$p_t(\tau)=\beta_i+\alpha_i\cdot(t_s(\tau)-T_{i-1}),\quad t_s(\tau)\in(T_{i-1},T_i] \tag{2}$$

where the time thresholds $\{T_i\}$ and slopes $\alpha_i$ are empirical hyperparameters calibrated to reflect distinct sensitivity regions, and $\beta_i$ is the base penalty for the $i$\-th stage\.
Algorithm 1: Task Execution Score Calculation

Input: Inference task $\tau$, completion status $\tau_c$, completion time $t_s(\tau)$, source file reading time $t_r(\tau)$, model execution time $t_e(\tau)$, recovery result $o(\tau)$, error type $e(\tau)$

Output: Task execution score $S(\tau)$

1. if $\tau_c = 0$ then
2. Set basic score $R(\tau)$ based on $e(\tau)$
3. $S(\tau) \leftarrow R(\tau)$
4. if $t_s(\tau) > 20$ then
5. Adjustment $a_f(\tau) \leftarrow \min\left(\frac{t_s(\tau) - 20}{10}, 10\right)$
6. $S(\tau) \leftarrow S(\tau) + a_f(\tau)$
7. end if
8. else
9. Basic score $R(\tau) \leftarrow 100$
10. Calculate time penalty $p_t(\tau)$ via Eq. [2](https://arxiv.org/html/2606.15199#S4.E2)
11. $p_r(\tau) \leftarrow \min(\max(0, t_r(\tau) - 10)/2, 5)$
12. $p_e(\tau) \leftarrow \min(\max(0, t_e(\tau) - 10)/2, 5)$
13. Calculate recovery penalty $p_o(\tau)$ from $o(\tau)$
14. $S(\tau) \leftarrow R(\tau) - \bigl(p_t(\tau) + p_r(\tau) + p_e(\tau) + p_o(\tau)\bigr)$
15. end if
16. return $S(\tau)$
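The scoring procedure of Algorithm 1 can be sketched in Python as follows. This is an illustrative sketch, not the authors' implementation: the failure base score, the completion-time penalty, and the recovery penalty are passed in as plain numbers because their mappings from $e(\tau)$, Eq. (2), and $o(\tau)$ are not specified here, and the function and parameter names are hypothetical.

```python
def task_execution_score(
    completed: bool,
    t_s: float,                       # completion time
    t_r: float,                       # source file reading time
    t_e: float,                       # model execution time
    p_t: float = 0.0,                 # completion-time penalty from Eq. (2), supplied by caller
    p_o: float = 0.0,                 # recovery penalty derived from o(tau), supplied by caller
    base_failure_score: float = 0.0,  # R(tau) chosen from the error type e(tau), supplied by caller
) -> float:
    """Follow the two branches of Algorithm 1 for a single task."""
    if not completed:
        score = base_failure_score
        # Delayed-failure adjustment: long runtimes earn up to 10 extra points.
        if t_s > 20:
            score += min((t_s - 20) / 10, 10)
        return score
    # Successful task: start from 100 and subtract the capped overhead penalties.
    p_r = min(max(0.0, t_r - 10) / 2, 5)
    p_e = min(max(0.0, t_e - 10) / 2, 5)
    return 100 - (p_t + p_r + p_e + p_o)
```

For example, a failed task with a base score of 20 and a completion time of 50 receives an adjustment of 3, while the file I/O and model execution penalties of a successful task are each capped at 5 points.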
Heterogeneity-Aware Orchestrator. To handle the heterogeneity of training data, we design a length-aware workload scheduler. We first sort the dataset in descending order of prompt length and divide it into multiple partitions. Then, based on each server’s compute capability, partitions with longer contexts are assigned to high-performance servers, while those with shorter contexts go to low-power servers. This strategy alleviates synchronization stalls in distributed fine-tuning. It also naturally clusters inputs of similar lengths, providing more samples of the same problem with different profiles to support the contrastive regularization in the loss computation. This strategy remains fully compatible with existing hardware-centric scheduling methods.
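A minimal Python sketch of the length-aware assignment described above is given below. The capability scores, server names, and the choice to size each partition in proportion to capability are assumptions made for illustration; the text only states that longer-context partitions go to more capable servers.

```python
def length_aware_partition(prompt_lengths, server_capability):
    """Return {server: [sample indices]} with the longest prompts on the most capable server.

    prompt_lengths: list of prompt lengths, one per training sample.
    server_capability: dict of server name to a relative capability score (hypothetical).
    """
    # Sample indices sorted by prompt length, longest first.
    order = sorted(range(len(prompt_lengths)),
                   key=lambda i: prompt_lengths[i], reverse=True)
    # Servers sorted by capability, strongest first.
    servers = sorted(server_capability, key=server_capability.get, reverse=True)
    total = sum(server_capability.values())

    plan, start, acc = {}, 0, 0.0
    for rank, name in enumerate(servers):
        acc += server_capability[name]
        # Assumption: contiguous slices sized in proportion to capability.
        is_last = rank == len(servers) - 1
        end = len(order) if is_last else round(len(order) * acc / total)
        plan[name] = order[start:end]
        start = end
    return plan
```

Because each server receives a contiguous slice of the sorted order, prompts within a partition have similar lengths, which is the property the scheduler relies on to limit padding and waiting time.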
Contrastive Regularization. To prevent the model’s score predictions from relying too heavily on the static content of the input, we propose a state-sensitivity loss. We penalize the similarity of the model’s predictions for different structured profiles associated with the same target problem (or task). We use the term *problem* to denote the target $q$ being evaluated in either scenario.
We first group the samples within each training batch by problem ID. For batch $\mathcal{B}$, we identify all samples sharing the same problem $q$ and index them into a set $G_q = \{i \in \mathcal{B} \mid q_i = q\}$. The predicted logits $z_i$ for these samples are used for loss calculations. Since different subject states solving the same problem may yield distinct prediction patterns, we minimize the squared cosine similarity between their logits:

$$L_{\text{c}} = \frac{1}{\mathcal{N}} \sum_{q} \sum_{(i,j) \in G_q,\ i<j} \mathcal{S}^2(z_i, z_j) \tag{3}$$

where $\mathcal{N}$ is the total number of valid pairs across all problem groups and $\mathcal{S}(z_i, z_j)$ denotes the cosine similarity between logit vectors $z_i$ and $z_j$:

$$\mathcal{S}(z_i, z_j) = \frac{z_i \cdot z_j}{\|z_i\| \cdot \|z_j\|} \tag{4}$$
Minimizing $L_{\text{c}}$ encourages the model to produce different representations for different states, preventing it from collapsing into a problem-only shortcut. Although different subject profiles may ultimately yield the same performance score, this soft penalty ensures that the model arrives at its prediction by genuinely interpreting the distinct profile features rather than merely memorizing the problem semantics. Unlike standard contrastive learning, which enforces invariance on positive pairs, we only apply a push mechanism on negative pairs, since enforcing similarity between the same profile solving different problems is not meaningful. The final fine-tuning objective is $L = L_{\text{ce}} + \lambda \cdot L_{\text{c}}$, where $L_{\text{ce}}$ is the classification cross-entropy loss and $\lambda$ balances the state-sensitivity penalty. The cross-entropy term dominates the actual score prediction, while the contrastive term acts as a regularizer to increase discrimination across different profile states.
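The state-sensitivity term in Eqs. (3) and (4) can be sketched in plain Python as follows. This is a framework-free illustration on lists of floats rather than the training code itself, and the function names are hypothetical. Returning zero when a batch contains no valid pair is an assumption of the sketch.

```python
import math
from collections import defaultdict
from itertools import combinations


def cosine(u, v):
    """Cosine similarity of two logit vectors, Eq. (4)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def contrastive_loss(logits, problem_ids):
    """Mean squared cosine similarity over same-problem pairs, Eq. (3)."""
    groups = defaultdict(list)          # G_q: sample indices per problem q
    for i, q in enumerate(problem_ids):
        groups[q].append(i)

    total, num_pairs = 0.0, 0
    for members in groups.values():
        for i, j in combinations(members, 2):   # all pairs with i < j
            total += cosine(logits[i], logits[j]) ** 2
            num_pairs += 1
    # Assumption: a batch without any same-problem pair contributes no penalty.
    return total / num_pairs if num_pairs else 0.0


def total_loss(l_ce, l_c, lam=0.1):
    """L = L_ce + lambda * L_c; 0.1 is the contrastive weight listed in Table IV."""
    return l_ce + lam * l_c
```

Squaring the similarity penalizes both strongly aligned and strongly anti-aligned logit pairs, so the term is smallest when the logits for different profiles on the same problem are close to orthogonal.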
TABLE II: An ablation study of each component of our method.

TABLE III: Training server and software settings in CogGuard.

| Type | Configuration |
| --- | --- |
| Hardware (x86) | CPU: Intel i9-14900KF @5.6GHz 24 Cores |
| | CPU: Intel Silver 4210 @2.20GHz 10 Cores |
| | CPU: Intel i9-10900K @3.70GHz 10 Cores |
| | GPU: 24GB 4090d, 24GB 3090, 8GB 2070s |
| Hardware (ARM) | Nvidia Spark (20 Cores, 128GB GB10 GPU) |
| Software | Ray: 2.52.1, nano-graphrag: 0.0.8.2, Chaosd: v1.4, K8s: v1.26.10, LmDeploy: v0.6.4 |
## V Experiments and Results
### V-A Experimental Setup
Datasets. In the educational scenario, we evaluate mainstream SLMs across 40 students. In the operational scenario, we collect chaos injection and task execution results on mixed AI inference workloads from three heterogeneous servers \[[36](https://arxiv.org/html/2606.15199#bib.bib32)\]. The dataset is split into training, validation, and test sets in an 8:1:1 ratio.
Dataset Generalizability. The datasets used in this study were obtained by crawling Nowcoder and generating data via chaos injection and task testing on a heterogeneous edge cluster. The required data fields can also be obtained from other platforms such as PTA and LeetCode, making the data construction process transferable beyond the current testbed. Our data processing is limited to format conversion and the filtering of single-input multi-label data. Chaos injection and task testing procedures can be reproduced on any cluster. While we use LMDeploy to deploy multimodal models for task execution, the execution tool can be replaced with vLLM, Ollama, or others.
Setting. The experimental infrastructure includes 4090, 4090D, and 3090 workstations, and Spark edge devices. The distributed fine-tuning configurations are listed in Table [III](https://arxiv.org/html/2606.15199#S4.T3). We construct a representative edge cluster consisting of a high-performance edge node (Nvidia Spark) and heterogeneous computing nodes (desktop hosts) to simulate the computational heterogeneity commonly encountered in edge environments.
Experimental Parameters. Table [IV](https://arxiv.org/html/2606.15199#S5.T4) lists the main experimental parameters. The comparison results under different contrastive weights are shown in Table [VII](https://arxiv.org/html/2606.15199#S5.T7).
Baselines. For Cognitive Profiling, we compare our prototype mapping approach with two baselines.
- Few-shot EI \[[31](https://arxiv.org/html/2606.15199#bib.bib15)\]: Embracing Imperfection (EI) is a training-free method that constructs a static cognitive prototype for each student without graph clustering.
- SimpleKT \[[12](https://arxiv.org/html/2606.15199#bib.bib22)\]: This method uses a Transformer to encode students’ historical records and models problem-specific difficulty variations through Rasch decomposition, fusing these representations for score prediction.
For Alignment, we compare our method with five baselines\.
- EIP \[[31](https://arxiv.org/html/2606.15199#bib.bib15)\]: Embracing Imperfection Prediction (EIP) maps new tasks to the EI cognitive prototype to retrieve relevant concepts and predicts student performance using LLM-based inference without any parameter updates.
- FLASHBACK \[[11](https://arxiv.org/html/2606.15199#bib.bib36)\]: This method employs an appending context pattern that places retrieved documents at the end of the input. It is used to evaluate the time efficiency of distributed fine-tuning in Table [II](https://arxiv.org/html/2606.15199#S4.T2).
- Ray \[[18](https://arxiv.org/html/2606.15199#bib.bib19)\]: Ray is a widely used distributed task execution framework. We encapsulate fine-tuning programs into workers. Ray requires all servers to train with the same batch size and update weights through shared storage.
- SLM-Probe \[[8](https://arxiv.org/html/2606.15199#bib.bib34)\]: This method keeps all parameters of the Qwen2.5-3B model frozen and trains a lightweight 3-layer MLP classifier on top of the mean-pooled hidden states from the final layer.
- GCN-Embed \[[26](https://arxiv.org/html/2606.15199#bib.bib35)\]: This method converts entity and knowledge tokens into a token-level graph, which is then encoded by a 2-layer GCN followed by an MLP classifier.
TABLE IV: Hyperparameter settings.

| Type | Hyperparameter | Value |
| --- | --- | --- |
| Graph construction | Top-$k$ nodes in summary | 10 |
| | Initial neutral score $\mu$ | 0.5 |
| | LLM timeout | 60 s |
| | LLM retry delay | 5 s |
| | LLM max retries | 3 |
| | LLM retry backoff | 2 |
| DDP | num workers | 2 - 4 |
| | per device train batch size | 4 |
| | gradient accumulation steps | 6 |
| Fine-tuning | train epochs | 48 |
| | initial learning rate | 3e-5 |
| | max chunk chars | 5000 |
| | task type | SEQ_CLS |
| | contrastive weight | 0.1 |
| | extra number of tokens | 4 |
| | gradient checkpointing | TRUE |

TABLE V: Performance of all methods across educational scenarios of different scales.

TABLE VI: Details of prediction results.

TABLE VII: Experimental results under different contrastive weights.

TABLE VIII: Performance of our method across different base models.
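As a rough reading of the DDP and retry settings in Table IV, the short Python sketch below derives two quantities that the table does not state directly. Both rest on assumptions: that each worker drives one device, and that the retry backoff value is a multiplicative factor applied to the retry delay. The function names are hypothetical.

```python
PER_DEVICE_BATCH = 4    # "per device train batch size" in Table IV
GRAD_ACCUM_STEPS = 6    # "gradient accumulation steps" in Table IV


def effective_batch_size(num_workers: int) -> int:
    """Samples contributing to one optimizer step, assuming one device per worker."""
    return PER_DEVICE_BATCH * GRAD_ACCUM_STEPS * num_workers


def retry_delays(base_delay: float = 5.0, backoff: float = 2.0, max_retries: int = 3):
    """Wait before each retry in seconds, assuming the backoff is a multiplier."""
    return [base_delay * backoff ** k for k in range(max_retries)]
```

Under these assumptions, the 2 to 4 workers listed in the table correspond to 48 to 96 samples per optimizer step, and the three LLM retries would wait 5 s, 10 s, and 20 s.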
### V-B Ablation Study
We validate each component of our method through an ablation study, with results in Table [II](https://arxiv.org/html/2606.15199#S4.T2). Summary Sensitivity denotes the proportion of predictions that change when the model receives the same problem with randomly sampled profile summaries. Error Ratio denotes the proportion of cases in which a true score in the 0–20 range is predicted as 80–100, or vice versa. We report experiments on the students_20 subset in the educational scenario under identical conditions. PRay refers to the integration of Ray with training data partitioning, while CRay refers to the integration of Ray with contrastive regularization.
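The two ablation metrics defined above can be computed as in the following Python sketch. The function names are hypothetical, and two details are assumptions: the range boundaries are treated as inclusive, and Error Ratio is taken over all evaluated cases, since the text does not fix either choice.

```python
def summary_sensitivity(original_preds, resampled_preds):
    """Share of samples whose prediction changes after swapping in a random profile summary."""
    changed = sum(a != b for a, b in zip(original_preds, resampled_preds))
    return changed / len(original_preds)


def error_ratio(true_scores, pred_scores):
    """Share of samples where a 0-20 score is predicted as 80-100, or vice versa."""
    def extreme_flip(t, p):
        # Assumption: both range boundaries are inclusive.
        return (t <= 20 and p >= 80) or (t >= 80 and p <= 20)

    flips = sum(extreme_flip(t, p) for t, p in zip(true_scores, pred_scores))
    return flips / len(true_scores)
```

A higher Summary Sensitivity indicates that predictions depend on the profile rather than on the problem alone, while a lower Error Ratio indicates fewer confusions between the two extreme score ranges.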
Because the profile summary is a lossy abstraction of the full submission record, different code variants may occasionally collapse to similar summaries while receiving different scores. To analyze how such ambiguity affects validation performance, we construct a filtered subset $v_f$ as a diagnostic evaluation while retaining the original validation split $v_i$ as the primary benchmark. Although PRay achieves competitive performance on $v_f$, our method further improves the primary benchmark $v_i$ and achieves the highest summary sensitivity. This indicates that contrastive regularization reduces problem-memorization shortcuts, forcing the model to rely more on personalized cognitive profiles in noisy, real-world settings.
Figure 3: Application workflow of CogGuard.

Figure 4: Efficiency of structured profiling across models.

TABLE IX: Dynamic evolution of a student’s cognitive profile.

Figure 5: Performance on heterogeneous clusters.
### V-C Effectiveness of Profile-to-Score Alignment
Table [V](https://arxiv.org/html/2606.15199#S5.T5) presents the performance of all methods across educational datasets of different scales. SLM-Probe shows a strong dependency on dataset size. Few-shot EI yields poor retrieval performance because it lacks graph clustering support. In addition, increasing the dataset size consistently improves the performance of our method.
In Table [VIII](https://arxiv.org/html/2606.15199#S5.T8), we compare the performance of our method across different fine-tuning base models, using the cleaned students_40 dataset for testing. Our method achieves performance gains as the model size increases, with the Gemma model showing better stability. The minimum Mean Absolute Error (MAE) is approximately 13 on a 100-point scale. While an MAE of 13.4 still indicates noticeable prediction variance, it remains useful for early intervention because it separates high-risk failures from safe completions without requiring heavy cloud-side computation.
We detail the prediction results of our method in different score ranges in Table [VI](https://arxiv.org/html/2606.15199#S5.T6), where the label distribution in the validation set is consistent with that of the training set. Performance degrades in score intervals with fewer samples. However, given that the training data is derived from real-world scenarios, this distribution reflects the actual scoring patterns of C++ programs. Despite the scarcity of training samples in the intermediate score intervals, the Gemma model still achieves the best performance. In contrast, the Qwen model tends to achieve better predictions in score intervals with more abundant training samples. In future work, we plan to address this issue by generating additional realistic samples.
### V-D Efficiency of Dual-Graph Structured Profiling
We use high-capacity LLMs for offline profile extraction and fine-tuned SLMs for real-time edge-side alignment, matching the different latency and reasoning requirements of the two stages. In Fig. [4](https://arxiv.org/html/2606.15199#S5.F4), we evaluate the efficiency of dual-graph cognitive profiling. Because Qwen3-Next-80B does not support prefix caching, our method achieves a 34.2% time reduction compared with dual-graph (DG) construction on this model. With the combined optimization of prompt design and KV cache reuse, we obtain a 47.7% reduction in average time on the other models, which is close to the single-graph construction overhead of the EI method. This shows that the additional time cost of dual-graph profiling is small. Cognitive profiling can also run entirely on edge devices with a slight drop in precision, as shown by Qwen3-8B.
To illustrate the dynamic tracking capability of our method, Table [IX](https://arxiv.org/html/2606.15199#S5.T9) details the evolution of a student’s cognitive profile across two consecutive attempts on a specific array manipulation problem. The system captures the shifting cognitive focus from “Graph Structure” to “Probability and Statistics”, correlating with the score drop from 50 to 0.
Figure 6: Confusion matrix results of our method in cross-scenario settings.

Figure 7: End-to-end service execution pipeline of CogGuard.
### V-E Performance on Heterogeneous Clusters
In Fig. [5](https://arxiv.org/html/2606.15199#S5.F5), we show the performance of our method on heterogeneous clusters. The server settings correspond to (3090 + Spark), (4090 + 3090 + Spark), and (4090 + 4090D + 3090 + Spark), respectively. The results show that training data partitioning has minimal impact on prediction accuracy, while our method achieves an additional time reduction of nearly 19% on top of Ray-based distributed training. This training overhead is far smaller than the time typically required for graph clustering and community report generation in Graph RAG \[[38](https://arxiv.org/html/2606.15199#bib.bib49),[4](https://arxiv.org/html/2606.15199#bib.bib50)\], confirming that our method adds very little time cost.
### V-F Cross-Scenario Experimental Results
We further evaluate CogGuard in the operational scenario, where the model achieves lower prediction error than in the educational scenario. This is expected because operational faults are artificially injected with deterministic physical boundaries, whereas human behavioral features contain more noise and individual variation. Nevertheless, the results in both scenarios show that the proposed alignment pipeline can handle both machine-state profiles and human-centered cognitive profiles. Fig. [3](https://arxiv.org/html/2606.15199#S5.F3) provides an overview of the proposed workflow, and the confusion matrices in Fig. [6](https://arxiv.org/html/2606.15199#S5.F6) further summarize our score prediction performance. We illustrate the end-to-end service execution pipeline in Fig. [7](https://arxiv.org/html/2606.15199#S5.F7). The system retrieves pre-stored profiles and target tasks for designated subjects to deliver batch proactive warning services.
### V-G Limitations
The educational profile summary is a lossy abstraction that may miss fine\-grained code\-level distinctions, while the operational dataset is derived from controlled chaos injection and may not fully reflect organic production failures\. In addition, our cross\-scenario study mainly validates a shared alignment interface rather than a fully unified upstream profiling process\. Since this proactive\-warning setting is not yet standardized, fully task\-matched baselines are currently unavailable\. We also note that the current profiling mechanism has minor limitations, such as formatting artifacts during LLM extraction that occasionally yield redundant knowledge entries; we plan to improve this extraction robustness in future work\.
## VI Conclusion
This paper presented CogGuard, a proactive\-warning method for edge intelligent services that combines scenario\-specific profiling with a shared profile\-to\-score alignment pipeline\. We formulated proactive warning as a static\-dynamic profile\-to\-score prediction problem and proposed a dual\-graph cognitive profiling mechanism with prefix\-aligned KV cache reuse for educational scenarios and a chaos\-based profiling approach for operational scenarios\. To support efficient edge deployment, we introduced a length\-aware distributed fine\-tuning strategy with contrastive regularization to reduce synchronization bottlenecks and prevent problem\-memorization shortcuts\. Experiments on both educational and operational datasets showed that CogGuard reduced profile construction and training time by 48% and 19%, respectively, while achieving MAEs of 13\.4 and 5\.9 on 100\-point\-scale warning tasks\. Future work includes improving the robustness of LLM\-based entity extraction to reduce formatting artifacts, exploring data augmentation for underrepresented score intervals, and extending the profiling pipeline to additional edge service scenarios such as predictive maintenance and personalized recommendation\.
## References
- \[1\]\(2024\-11\)Cognitive bias in decision\-making with LLMs\.InFindings of the Association for Computational Linguistics: EMNLP 2024,Miami, Florida, USA,pp\. 12640–12653\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.findings-emnlp.739)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p1.1)\.
- \[2\]D\. Edge, H\. Trinh, N\. Cheng, J\. Bradley, A\. Chao, A\. Mody, S\. Truitt, D\. Metropolitansky, R\. O\. Ness, and J\. Larson\(2025\)From local to global: a graph rag approach to query\-focused summarization\.Note:arXiv preprint arXiv:2404\.16130External Links:2404\.16130Cited by:[§IV\-B](https://arxiv.org/html/2606.15199#S4.SS2.p3.4)\.
- \[3\]G\. Gao, M\. Xiao, J\. Wu, H\. Huang, S\. Wang, and G\. Chen\(2021\)Auction\-based vm allocation for deadline\-sensitive tasks in distributed edge cloud\.IEEE Transactions on Services Computing14\(6\),pp\. 1702–1716\.External Links:[Document](https://dx.doi.org/10.1109/TSC.2019.2902549)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p3.1)\.
- \[4\]H\. Han, L\. Ma, Y\. Wang, H\. Shomer, Y\. Lei, Z\. Qi, K\. Guo, Z\. Hua, B\. Long, H\. Liu, C\. C\. Aggarwal, and J\. Tang\(2026\)RAG vs\. graphrag: a systematic evaluation and key insights\.Note:arXiv preprint arXiv:2502\.11371External Links:2502\.11371Cited by:[§V\-E](https://arxiv.org/html/2606.15199#S5.SS5.p1.1)\.
- \[5\]H\. Han, Y\. Wang, H\. Shomer, K\. Guo, J\. Ding, Y\. Lei, M\. Halappanavar, R\. A\. Rossi, S\. Mukherjee, X\. Tang, Q\. He, Z\. Hua, B\. Long, T\. Zhao, N\. Shah, A\. Javari, Y\. Xia, and J\. Tang\(2025\)Retrieval\-augmented generation with graphs \(graphrag\)\.Note:arXiv preprint arXiv:2501\.00309External Links:2501\.00309Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[6\]X\. He, Y\. Tian, Y\. Sun, N\. V\. Chawla, T\. Laurent, Y\. LeCun, X\. Bresson, and B\. Hooi\(2024\)G\-retriever: retrieval\-augmented generation for textual graph understanding and question answering\.InProceedings of the Advances in Neural Information Processing Systems,Vol\.37,pp\. 132876–132907\.External Links:[Document](https://dx.doi.org/10.52202/079017-4224)Cited by:[§IV\-B](https://arxiv.org/html/2606.15199#S4.SS2.p3.4)\.
- \[7\]L\. Hu, Z\. Dong, J\. Chen, G\. Wang, Z\. Wang, Z\. Zhao, and F\. Wu\(2023\)PTADisc: a cross\-course dataset supporting personalized learning in cold\-start scenarios\.InProceedings of the Advances in Neural Information Processing Systems,Vol\.36,pp\. 44976–44996\.Cited by:[§III\-A](https://arxiv.org/html/2606.15199#S3.SS1.p1.1)\.
- \[8\]A\. Kumar, M\. Narayanan Sundararaman, and J\. Vepa\(2021\-11\)What BERT based language model learns in spoken transcripts: an empirical study\.InProceedings of the Fourth BlackboxNLP Workshop on Analyzing and Interpreting Neural Networks for NLP,Punta Cana, Dominican Republic,pp\. 322–336\.External Links:[Document](https://dx.doi.org/10.18653/v1/2021.blackboxnlp-1.25)Cited by:[4th item](https://arxiv.org/html/2606.15199#S5.I2.i4.p1.1.1)\.
- \[9\]T\. T\. Lau, W\. Li, C\. Xu, H\. Liu, and M\. Kolar\(2025\)Adaptive batch size schedules for distributed training of language models with data and model parallelism\.InProceedings of the Second Conference on Parsimony and Learning \(Proceedings Track\),Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p3.1),[§I](https://arxiv.org/html/2606.15199#S1.p5.1)\.
- \[10\]G\. Liu, Y\. Zhang, Y\. Li, and Q\. Yao\(2025\)Dual reasoning: a gnn\-llm collaborative framework for knowledge graph question answering\.Note:arXiv preprint arXiv:2406\.01145External Links:2406\.01145Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[11\]R\. Liu, X\. Xiao, H\. Huang, Z\. Chi, and Z\. Wu\(2025\-07\)FlashBack: efficient retrieval\-augmented language modeling for fast inference\.InFindings of the Association for Computational Linguistics: ACL 2025,Vienna, Austria,pp\. 595–608\.External Links:[Document](https://dx.doi.org/10.18653/v1/2025.findings-acl.33)Cited by:[2nd item](https://arxiv.org/html/2606.15199#S5.I2.i2.p1.1.1)\.
- \[12\]Z\. Liu, Q\. Liu, J\. Chen, S\. Huang, and W\. Luo\(2023\)SimpleKT: a simple but tough\-to\-beat baseline for knowledge tracing\.InProceedings of the Eleventh International Conference on Learning Representations,Cited by:[§III\-A](https://arxiv.org/html/2606.15199#S3.SS1.p1.1),[2nd item](https://arxiv.org/html/2606.15199#S5.I1.i2.p1.1.1)\.
- \[13\]R\. Lv, Q\. Liu, W\. Gao, H\. Zhang, J\. Lu, and L\. Zhu\(2025\-Apr\.\)GenAL: generative agent for adaptive learning\.InProceedings of the AAAI Conference on Artificial Intelligence,Vol\.39,pp\. 577–585\.External Links:[Document](https://dx.doi.org/10.1609/aaai.v39i1.32038)Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[14\]Y\. Lv, H\. Pan, Z\. Wang, J\. Liang, Y\. Liu, R\. Fu, M\. Liu, Z\. Wang, and B\. Qin\(2024\-11\)CogGPT: unleashing the power of cognitive dynamics on large language models\.InFindings of the Association for Computational Linguistics: EMNLP 2024,Miami, Florida, USA,pp\. 6074–6091\.External Links:[Document](https://dx.doi.org/10.18653/v1/2024.findings-emnlp.352)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p4.1)\.
- \[15\]A\. B\. Mailewa, A\. Akuthota, and T\. M\. D\. Mohottalalage\(2025\)A review of resilience testing in microservices architectures: implementing chaos engineering for fault tolerance and system reliability\.InProceedings of the 2025 IEEE 15th Annual Computing and Communication Workshop and Conference \(CCWC\),Vol\.,pp\. 00236–00242\.External Links:[Document](https://dx.doi.org/10.1109/CCWC62904.2025.10903891)Cited by:[§III\-B](https://arxiv.org/html/2606.15199#S3.SS2.p2.2)\.
- \[16\]C\. Mavromatis and G\. Karypis\(2025\-07\)GNN\-RAG: graph neural retrieval for efficient large language model reasoning on knowledge graphs\.InFindings of the Association for Computational Linguistics: ACL 2025,Vienna, Austria,pp\. 16682–16699\.External Links:[Document](https://dx.doi.org/10.18653/v1/2025.findings-acl.856)Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[17\]R\. Miao, T\. Wu, and Z\. Zhang\(2026\)Graph rag\-based fault diagnosis for train bogies using knowledge graphs and large language model\.Knowledge\-Based Systems331,pp\. 114855\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.knosys.2025.114855)Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[18\]P\. Moritz, R\. Nishihara, S\. Wang, A\. Tumanov, R\. Liaw, E\. Liang, M\. Elibol, Z\. Yang, W\. Paul, M\. I\. Jordan, and I\. Stoica\(2018\-10\)Ray: a distributed framework for emerging AI applications\.InProceedings of the 13th USENIX Symposium on Operating Systems Design and Implementation \(OSDI 18\),Carlsbad, CA,pp\. 561–577\.Cited by:[§II\-B](https://arxiv.org/html/2606.15199#S2.SS2.p1.1),[3rd item](https://arxiv.org/html/2606.15199#S5.I2.i3.p1.1.1)\.
- \[19\]J\. Owotogbe, I\. Kumara, W\. Heuvel, and D\. Tamburri\(2025\-12\)Chaos engineering: a multi\-vocal literature review\.ACM Comput\. Surv\.58\(7\)\.External Links:[Document](https://dx.doi.org/10.1145/3777375)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p6.1)\.
- \[20\]J\. H\. Park, G\. Yun, C\. M\. Yi, N\. T\. Nguyen, S\. Lee, J\. Choi, S\. H\. Noh, and Y\. Choi\(2020\-07\)HetPipe: enabling large DNN training on \(whimpy\) heterogeneous GPU clusters through integration of pipelined model parallelism and data parallelism\.InProceedings of the 2020 USENIX Annual Technical Conference \(USENIX ATC 20\),pp\. 307–321\.Cited by:[§II\-B](https://arxiv.org/html/2606.15199#S2.SS2.p1.1)\.
- \[21\]C\. Piech, J\. Bassen, J\. Huang, S\. Ganguli, M\. Sahami, L\. Guibas, and J\. Sohl\-Dickstein\(2015\)Deep knowledge tracing\.InProceedings of the Advances in Neural Information Processing Systems,Vol\.28\.Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[22\]G\. Qu, Q\. Chen, W\. Wei, Z\. Lin, X\. Chen, and K\. Huang\(2025\)Mobile edge intelligence for large language models: a contemporary survey\.IEEE Communications Surveys & Tutorials27\(6\),pp\. 3820–3860\.External Links:[Document](https://dx.doi.org/10.1109/COMST.2025.3527641)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p2.1)\.
- \[23\]A\. Segal, K\. Gal, G\. Shani, and B\. Shapira\(2019\)A difficulty ranking approach to personalization in e\-learning\.International Journal of Human\-Computer Studies130,pp\. 261–272\.External Links:[Document](https://dx.doi.org/https%3A//doi.org/10.1016/j.ijhcs.2019.07.002)Cited by:[§IV\-B](https://arxiv.org/html/2606.15199#S4.SS2.p4.2)\.
- \[24\]L\. Song, F\. Chen, Y\. Zhuo, X\. Qian, H\. Li, and Y\. Chen\(2020\)AccPar: tensor partitioning for heterogeneous deep learning accelerators\.InProceedings of the 2020 IEEE International Symposium on High Performance Computer Architecture \(HPCA\),Vol\.,pp\. 342–355\.Cited by:[§II\-B](https://arxiv.org/html/2606.15199#S2.SS2.p1.1)\.
- \[25\]V\. Srivatsa, Z\. He, R\. Abhyankar, D\. Li, and Y\. Zhang\(2025\)Preble: efficient distributed prompt scheduling for LLM serving\.InProceedings of the Thirteenth International Conference on Learning Representations,Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p3.1)\.
- \[26\]D\. Wang, Y\. Zuo, F\. Li, and J\. Wu\(2024\)LLMs as zero\-shot graph learners: alignment of gnn representations with llm token embeddings\.InProceedings of the Advances in Neural Information Processing Systems,Vol\.37,pp\. 5950–5973\.External Links:[Document](https://dx.doi.org/10.52202/079017-0193)Cited by:[5th item](https://arxiv.org/html/2606.15199#S5.I2.i5.p1.1.1)\.
- \[27\]S\. Wang, H\. Yang, and W\. Liu\(2025\)Research on the construction and application of retrieval enhanced generation \(rag\) model based on knowledge graph\.Scientific Reports15\(1\),pp\. 40425\.External Links:[Document](https://dx.doi.org/10.1038/s41598-025-21222-z)Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[28\]X\. Wang, L\. Wu, L\. Hong, H\. Liu, and Y\. Fu\(2025\-09\)LLM\-enhanced user–item interactions: leveraging edge information for optimized recommendations\.ACM Trans\. Intell\. Syst\. Technol\.16\(5\)\.External Links:[Document](https://dx.doi.org/10.1145/3757925)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p1.1)\.
- \[29\]J\. Wu, J\. Zhu, Y\. Qi, J\. Chen, M\. Xu, F\. Menolascina, Y\. Jin, and V\. Grau\(2025\-07\)Medical graph RAG: evidence\-based medical large language model via graph retrieval\-augmented generation\.InProceedings of the 63rd Annual Meeting of the Association for Computational Linguistics,Vienna, Austria,pp\. 28443–28467\.External Links:[Document](https://dx.doi.org/10.18653/v1/2025.acl-long.1381)Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1)\.
- \[30\]S\. Wu, Q\. Tu, H\. Liu, J\. Xu, Z\. Liu, G\. Zhang, R\. Wang, X\. Chen, and R\. Yan\(2024\)Unify graph learning with text: unleashing llm potentials for session search\.InProceedings of the ACM Web Conference 2024,New York, NY, USA,pp\. 1509–1518\.External Links:[Document](https://dx.doi.org/10.1145/3589334.3645574)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p4.1)\.
- \[31\]T\. Wu, J\. Chen, W\. Lin, M\. Li, Y\. Zhu, A\. Li, K\. Kuang, and F\. Wu\(2025\)Embracing imperfection: simulating students with diverse cognitive levels using llm\-based agents\.Note:arXiv preprint arXiv:2505\.19997External Links:2505\.19997Cited by:[§II\-A](https://arxiv.org/html/2606.15199#S2.SS1.p1.1),[§III\-A](https://arxiv.org/html/2606.15199#S3.SS1.p1.1),[1st item](https://arxiv.org/html/2606.15199#S5.I1.i1.p1.1.1),[1st item](https://arxiv.org/html/2606.15199#S5.I2.i1.p1.1.1)\.
- \[32\]Z\. Xu, Z\. Tang, J\. Lou, Z\. Yao, X\. Xie, T\. Wang, Y\. Wang, and W\. Jia\(2026\)EAT: qos\-aware edge\-collaborative aigc task scheduling via attention\-guided diffusion reinforcement learning\.IEEE Transactions on Mobile Computing\(\),pp\. 1–17\.External Links:[Document](https://dx.doi.org/10.1109/TMC.2026.3656318)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p3.1)\.
- \[33\]D\. Yang, T\. Liu, D\. Zhang, A\. Simoulin, X\. Liu, Y\. Cao, Z\. Teng, X\. Qian, G\. Yang, J\. Luo, and J\. McAuley\(2025\)Code to think, think to code: a survey on code\-enhanced reasoning and reasoning\-driven code intelligence in llms\.Note:arXiv preprint arXiv:2502\.19411External Links:2502\.19411Cited by:[§III\-A](https://arxiv.org/html/2606.15199#S3.SS1.p2.1)\.
- \[34\]R\. Yang, B\. Yang, A\. Feng, S\. Ouyang, M\. Blum, T\. She, Y\. Jiang, F\. Lecue, J\. Lu, and I\. Li\(2025\)Graphusion: a rag framework for knowledge graph construction with a global perspective\.Note:arXiv preprint arXiv:2410\.17600External Links:2410\.17600Cited by:[§IV\-B](https://arxiv.org/html/2606.15199#S4.SS2.p3.4)\.
- \[35\]Z\. Yao, Z\. Tang, J\. Lou, P\. Shen, and W\. Jia\(2024\)VELO: a vector database\-assisted cloud\-edge collaborative llm qos optimization framework\.InProceedings of the 2024 IEEE International Conference on Web Services \(ICWS\),Vol\.,pp\. 865–876\.Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p2.1)\.
- \[36\]L\. Zhang, Y\. Jiang, G\. He, X\. Chen, H\. Lv, Q\. Yao, F\. Fu, and K\. Chen\(2025\)Efficient mixed\-precision large language model inference with turbomind\.Note:arXiv preprint arXiv:2508\.15601External Links:2508\.15601Cited by:[§V\-A](https://arxiv.org/html/2606.15199#S5.SS1.p1.1)\.
- \[37\]Y\. Zhang, R\. Wu, P\. Cai, X\. Wang, G\. Yan, S\. Mao, D\. Wang, and B\. Shi\(2025\)LeanRAG: knowledge\-graph\-based generation with semantic aggregation and hierarchical retrieval\.Note:arXiv preprint arXiv:2508\.10391External Links:2508\.10391Cited by:[§IV\-B](https://arxiv.org/html/2606.15199#S4.SS2.p3.4)\.
- \[38\]Y\. Zhao, J\. Zhu, Y\. Guo, K\. He, and X\. Li\(2025\)E2graphrag: streamlining graph\-based rag for high efficiency and effectiveness\.Note:arXiv preprint arXiv:2505\.24226External Links:2505\.24226Cited by:[§V\-E](https://arxiv.org/html/2606.15199#S5.SS5.p1.1)\.
- \[39\]Y\. Zheng, Y\. Chen, B\. Qian, X\. Shi, Y\. Shu, and J\. Chen\(2025\-03\)A review on edge large language models: design, execution, and applications\.ACM Comput\. Surv\.57\(8\)\.External Links:[Document](https://dx.doi.org/10.1145/3719664)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p3.1)\.
- \[40\]A\. Zhong, D\. Mo, G\. Liu, J\. Liu, Q\. Lu, Q\. Zhou, J\. Wu, Q\. Li, and Q\. Wen\(2024\)LogParser\-llm: advancing efficient log parsing with large language models\.InProceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining,New York, NY, USA,pp\. 4559–4570\.External Links:[Document](https://dx.doi.org/10.1145/3637528.3671810)Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p1.1)\.
- \[41\]K\. Zhong, B\. Suleiman, A\. Erradi, and S\. Chen\(2025\)SemRAG: semantic knowledge\-augmented rag for improved question\-answering\.Note:arXiv preprint arXiv:2507\.21110External Links:2507\.21110Cited by:[§I](https://arxiv.org/html/2606.15199#S1.p4.1)\.