Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment
Summary
Proposes Latent-Predictive Counterfactual Decoupling (LPCD) to address tactical out-of-distribution shifts in live streaming risk assessment by decoupling stable malicious intent from evolving narrative tactics at the latent level, achieving superior performance on large-scale industrial datasets.
# Outsmarting the Chameleon: Counterfactual Decoupling for Tactical OOD Shifts in Live Streaming Risk Assessment
Source: [https://arxiv.org/html/2606.02946](https://arxiv.org/html/2606.02946)
\(2026\)
###### Abstract\.
Live streaming has emerged as a primary medium for social interaction and digital commerce, yet it is increasingly plagued by sophisticated risks. A fundamental challenge in this domain is *tactical out-of-distribution (OOD) shift*: while malicious actors maintain stable underlying objectives, they continuously redesign narrative packaging to evade detection. Such adversarial shifts expose critical limitations of existing OOD generalization paradigms, whose assumptions are difficult to satisfy in the presence of tightly coupled intent–tactic evolution and ill-defined raw-level counterfactuals.
In this paper, we tackle this issue from a *latent causal* perspective and propose Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework for robust live streaming risk assessment. LPCD enables counterfactual reasoning under adversarial tactical re-packaging by modeling intent and narrative variation at the latent level, and enforces *latent counterfactual consistency* to anchor risk prediction on causally stable malicious intent. At inference time, LPCD applies a lightweight, parameter-free calibration to further mitigate tactic-induced distribution shifts. Extensive experiments on large-scale industrial datasets and online production traffic demonstrate that LPCD consistently outperforms state-of-the-art baselines, validating its effectiveness in moderating evolving adversarial risks in real-world live streaming. The project page is available at [https://qiaoyran.github.io/LiveStreamingRiskAssessment/](https://qiaoyran.github.io/LiveStreamingRiskAssessment/).
Live Streaming Risk Assessment; OOD Generalization
Published in: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD ’26), August 09–13, 2026, Jeju Island, Republic of Korea. Copyright: cc. DOI: 10.1145/3770855.3818084. ISBN: 979-8-4007-2259-2/2026/08. CCS concepts: Information systems → Data mining.

## 1. Introduction
Live streaming has become a primary medium for social interaction and digital commerce, accompanied by increasingly sophisticated risks such as financial fraud and illicit promotion\. Malicious behaviors in these sessions are often embedded within socially plausible narratives, which conceal true objectives and make detection challenging\. These diverse surface behaviors often mask a small set of stable malicious objectives, allowing adversaries to adapt their tactics over time without altering the underlying intent\.
A dominant class of objectives includes (i) off-platform redirection to external scam environments and (ii) on-platform deceptive monetization through fraudulent sales. To achieve these objectives under scrutiny, adversaries continuously redesign the narrative packaging of a live session, including conversational scripts, interaction rhythms, and coordination between hosts and accomplices. For instance, the same redirection intent may be framed as a lottery giveaway, a job recruitment, or an investment tip, as illustrated in Figure [1](https://arxiv.org/html/2606.02946#S1.F1)(a). The resulting mismatch between stable intent and volatile presentation creates a persistent challenge for models that attempt to generalize from historical patterns.
This phenomenon constitutes a tactical out-of-distribution (OOD) shift, where the data distribution changes at a strategic level while the underlying risk-generating logic remains invariant. Unlike conventional distribution shifts driven by passive or exogenous factors, tactical OOD shifts arise from adversarially optimized narrative redesigns that are intentionally coupled with the malicious objective. Consequently, models that rely on historical tactical patterns often fail to generalize when a known intent is wrapped in an unseen narrative, as shown in Figure [1](https://arxiv.org/html/2606.02946#S1.F1)(b).
Figure 1. (a) Adversaries maintain an invariant malicious intent (e.g., off-platform redirection) while continuously redesigning the volatile narrative packaging to evade detection. (b) PR-AUC of a production risk detection model evaluated on real-world data from October to December 2025, showing a degradation in performance over the same period.

Despite extensive research on OOD generalization (Zhou et al., [2022](https://arxiv.org/html/2606.02946#bib.bib18); Liu et al., [2021b](https://arxiv.org/html/2606.02946#bib.bib19)), existing approaches face fundamental limitations when applied to live streaming risk assessment. At the supervision level, most OOD methods rely on explicit (Arjovsky et al., [2019](https://arxiv.org/html/2606.02946#bib.bib16); Krueger et al., [2021](https://arxiv.org/html/2606.02946#bib.bib17)) or implicitly inferable environment labels (Creager et al., [2021](https://arxiv.org/html/2606.02946#bib.bib47); Liu et al., [2024](https://arxiv.org/html/2606.02946#bib.bib48)). In live streaming, however, tactical variations emerge continuously and adversarially, without clear environment boundaries. This makes it difficult to directly apply environment-based invariance assumptions in practice.
Beyond this supervision challenge, adversarial live streaming violates a key assumption shared by many invariance-based methods. These approaches typically presume that spurious correlations arise from passive or weakly coupled shifts (Zhang et al., [2020](https://arxiv.org/html/2606.02946#bib.bib20); Liu et al., [2021a](https://arxiv.org/html/2606.02946#bib.bib21)). In contrast, narrative packaging in malicious live sessions is strategically designed and tightly coupled with underlying intent. This strategic co-evolution leads to deep semantic entanglement, under which enforcing invariance at the observation level can be insufficient and, in some cases, even counterproductive.
While counterfactual reasoning (Pearl, [2009](https://arxiv.org/html/2606.02946#bib.bib30); Feder et al., [2022](https://arxiv.org/html/2606.02946#bib.bib26)) offers a principled path to address such entanglement, constructing realistic counterfactuals within the raw observation space is often ill-defined in practice. Live sessions comprise high-dimensional, multimodal streams, where input-level interventions are difficult to specify without violating semantic coherence. These challenges motivate a latent causal formulation, in which counterfactual reasoning and invariance are enforced in the latent representation space rather than on raw observations.
To this end, we advocate a *latent causal* perspective that enables counterfactual reasoning under adversarial tactical re-packaging. As raw-level counterfactuals are ill-defined for live sessions, we perform causal interventions in the latent representation space, where intent-preserving tactical variations can be explicitly modeled. This structure allows us to enforce latent counterfactual consistency, ensuring the model remains focused on the invariant risk core despite strategic narrative changes.
Building on this perspective, we propose Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework for robust live streaming risk assessment. LPCD models session representations as composed of intent-related and packaging-related factors, and enforces *latent counterfactual consistency* by intervening on the packaging factor during training, thereby isolating intent-specific signals that remain causally stable under tactical re-packaging. At test time, LPCD further applies a parameter-free calibration to rectify tactic-induced magnitude shifts. Extensive experiments on large-scale industrial data from Douyin show that LPCD consistently outperforms strong baselines in both in-distribution and tactical OOD settings. Our main contributions are summarized as follows:
- We identify *tactical out-of-distribution (OOD) shift* as a fundamental challenge in live streaming risk assessment, characterized by invariant malicious intent under adversarially evolving narrative packaging, and provide a principled framing from a *latent causal* perspective.
- We propose Latent-Predictive Counterfactual Decoupling (LPCD), a plug-in framework that enforces latent counterfactual consistency by intervening on narrative packaging at both the representation and prediction levels, enabling intent-focused risk modeling.
- Extensive experiments on large-scale industrial live-streaming datasets and online validation confirm LPCD’s SOTA performance in both in-distribution and tactical OOD settings, validating its efficacy in moderating evolving adversarial risks in real-world live streaming.
## 2\.Related Work
### 2\.1\.Risk Assessment in Online Ecosystems
Risk assessment in online ecosystems has evolved from fine-grained artifact detection to more holistic modeling of coordinated behaviors. One line of research focuses on identifying localized signals, such as toxic language in user-generated text (Lees et al., [2022](https://arxiv.org/html/2606.02946#bib.bib7); Zannettou et al., [2020](https://arxiv.org/html/2606.02946#bib.bib8)) or policy-violating visual cues in short videos (Lu et al., [2025](https://arxiv.org/html/2606.02946#bib.bib1); Wang et al., [2025](https://arxiv.org/html/2606.02946#bib.bib6)). To capture more complex and organized risks, another line adopts sequential (Guo et al., [2018](https://arxiv.org/html/2606.02946#bib.bib5); Qiao et al., [2025](https://arxiv.org/html/2606.02946#bib.bib4); Xiao et al., [2024](https://arxiv.org/html/2606.02946#bib.bib3); Qiao et al., [2024](https://arxiv.org/html/2606.02946#bib.bib10); Wang et al., [2023](https://arxiv.org/html/2606.02946#bib.bib2)) and graph-based models (Dou et al., [2020](https://arxiv.org/html/2606.02946#bib.bib11); Huang et al., [2022](https://arxiv.org/html/2606.02946#bib.bib12); Shi et al., [2022](https://arxiv.org/html/2606.02946#bib.bib13); Li et al., [2021](https://arxiv.org/html/2606.02946#bib.bib14); Cheng et al., [2025](https://arxiv.org/html/2606.02946#bib.bib15)), enabling the characterization of temporal dependencies and cross-entity coordination.
In live streaming, risk signals are inherently session-level, emerging from long-range interactions and evolving narratives rather than isolated events. This has led to Multiple Instance Learning (MIL) formulations, exemplified by AC-MIL (Qiao et al., [2026](https://arxiv.org/html/2606.02946#bib.bib50)), which models live sessions as collections of user–timeslot instances under session-level supervision. While such approaches effectively capture intra-session dynamics, they remain largely associative, entangling risk predictions with surface narrative patterns.
Under adversarial tactic evolution, where identical malicious intents are repeatedly rewrapped in novel narratives, this coupling therefore limits robustness to tactical distribution shifts, motivating the need for intent\-focused modeling beyond holistic session representations\.
### 2\.2\.Causal Perspectives on OOD Generalization
Prior work on out-of-distribution (OOD) generalization aims to improve robustness by enforcing invariant representations across environments (Arjovsky et al., [2019](https://arxiv.org/html/2606.02946#bib.bib16); Krueger et al., [2021](https://arxiv.org/html/2606.02946#bib.bib17); Sagawa et al., [2020](https://arxiv.org/html/2606.02946#bib.bib45); Zhou et al., [2022](https://arxiv.org/html/2606.02946#bib.bib18); Liu et al., [2021b](https://arxiv.org/html/2606.02946#bib.bib19)). Causality-inspired approaches further interpret distribution shifts as interventions on non-causal factors, and seek to disentangle causal semantics from spurious correlations (Zhang et al., [2020](https://arxiv.org/html/2606.02946#bib.bib20); Liu et al., [2021a](https://arxiv.org/html/2606.02946#bib.bib21); Mahajan et al., [2021](https://arxiv.org/html/2606.02946#bib.bib22)).
However, most existing frameworks operate under a passive or exogenous shift assumption, where variations arise from low-level statistical noise, backgrounds, or temporal non-stationarity (Oublal et al., [2024](https://arxiv.org/html/2606.02946#bib.bib23); Liu et al., [2025](https://arxiv.org/html/2606.02946#bib.bib24); Wu et al., [2026](https://arxiv.org/html/2606.02946#bib.bib25)). In these scenarios, task semantics are typically assumed to be stable and counterfactual variations are treated as well-defined at the observation level, with distribution shifts viewed as environment-induced rather than strategic.
In contrast, live streaming risk assessment operates in a tactical OOD regime\. Malicious actors actively redesign narrative packaging, interaction patterns, and temporal strategies to obscure intent\. These shifts are structured, high\-dimensional, and intentionally entangled with risk signals, going beyond the scope of prior methods that focus on attribute\-level disentanglement or statistical invariance\. Our work addresses this gap by introducing a latent counterfactual decoupling framework that explicitly intervenes on narrative packaging, enabling robust intent inference under evolving adversarial tactics\.
## 3\.Problem Formulation
### 3\.1\.Business Setting
Live streaming platforms face *adversarially evolving risks* where malicious actors continuously re-engineer tactics to evade detection. This environment presents three critical challenges: (1) Tactical shifts: Surface-level narrative packaging and interaction scripts evolve rapidly, while the underlying malicious intent remains invariant. (2) Coarse supervision: Only session-level labels are available without explicit environment or action-level annotations, complicating group-aware OOD schemes. (3) Label latency: Delays in manual reviews create a temporal gap between live events and label availability, necessitating models that generalize across distribution shifts without real-time retraining.
### 3\.2\.Definition and Objective
We study the *live streaming risk assessment* problem under tactical OOD shifts. The goal is to determine whether a live streaming session involves risky behaviors such as fraud or illicit promotion, despite evolving tactics designed to evade detection.
###### Definition 3\.1\.
(Action) An *action* in a live streaming session is represented as a tuple $\alpha=(u,t,a,x)$, where $u$ denotes the user performing the action, $t$ is the timestamp, $a$ indicates the action type (e.g., message posting, gifting, joining), and $x\in\mathbb{R}^{d}$ is a $d$-dimensional semantic embedding extracted from the raw textual content using a pretrained language model.
###### Definition 3\.2\.
(Live Streaming Session) A live streaming session over a time window $[0,T]$ is defined as

$$S^{[0,T]}=\big(\mathcal{U},[\alpha_{1},\alpha_{2},\ldots,\alpha_{N}]\big),$$

where $\mathcal{U}=\{u^{\mathrm{h}}\}\cup U^{\mathrm{v}}$ consists of a unique host $u^{\mathrm{h}}$ and a set of participating viewers $U^{\mathrm{v}}$, and $[\alpha_{1},\alpha_{2},\ldots,\alpha_{N}]$ is the chronologically ordered sequence of actions within $[0,T]$. Each action $\alpha_{i}$ implicitly carries user and temporal context through $(u_{i},t_{i})$.
###### Definition 3\.3\.
(Live Streaming Session Encoder) In practice, risk assessment models typically rely on an intermediate session-level representation that aggregates information across all actions. We therefore assume a generic backbone encoder

$$\mathcal{E}(\cdot):S^{[0,T]}\rightarrow\mathbf{x}\in\mathbb{R}^{D},$$

which maps a live streaming session to a $D$-dimensional embedding $\mathbf{x}$. The encoder $\mathcal{E}(\cdot)$ can be instantiated by any sequence or multi-instance learning (MIL) model, and is trained jointly with the downstream risk predictor. Our method operates as a plug-in module on top of this session representation, without imposing architectural constraints on $\mathcal{E}(\cdot)$.
Problem Objective. Given a dataset $\mathcal{D}=\{(S_{i}^{[0,T]},y_{i})\}_{i=1}^{N}$, where $y_{i}\in\{0,1\}$ indicates whether session $i$ is risky, the objective is to learn a function $f:S^{[0,T]}\rightarrow[0,1]$ that estimates the probability that a session involves malicious activity.
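As a reading aid, the action and session definitions above can be mirrored as plain data containers. The sketch below is an illustration only: the class and field names (`Action`, `Session`, `action_type`, and so on) are hypothetical and not taken from the paper, and sorting actions by timestamp is just one way to honor the chronological ordering that the session definition requires.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Action:
    """One action alpha = (u, t, a, x) inside a live streaming session."""
    user: str               # u: user performing the action
    timestamp: float        # t: time of the action within [0, T]
    action_type: str        # a: e.g. "message", "gift", "join"
    embedding: List[float]  # x: d-dimensional semantic embedding


@dataclass
class Session:
    """A session: one host, a set of viewers, and an ordered action list."""
    host: str
    viewers: Set[str]
    actions: List[Action]

    def __post_init__(self) -> None:
        # Keep actions in chronological order, as the definition requires.
        self.actions = sorted(self.actions, key=lambda act: act.timestamp)

    @property
    def users(self) -> Set[str]:
        # The user set is the host together with all participating viewers.
        return {self.host} | self.viewers


session = Session(
    host="host_1",
    viewers={"viewer_1"},
    actions=[
        Action("viewer_1", 5.0, "gift", [0.1, 0.2]),
        Action("host_1", 1.0, "message", [0.3, 0.4]),
    ],
)
```

A labeled dataset in this picture is simply a list of `(Session, label)` pairs with labels in {0, 1}; the backbone encoder is whatever model turns a `Session` into a fixed-size vector.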
## 4\.Methodology
Figure 2. Overview of LPCD. In the training flow: (a) Latent Representation Disentanglement factorizes session representations into intent and packaging components; (b) Counterfactual Consistency Decoupling enforces intent invariance under counterfactual packaging at both the representation and prediction levels; and (c) Risk Prediction aggregates the disentangled factors to produce session-level risk scores. At test time, (d) Post-hoc Magnitude Calibration adjusts tactic-induced magnitude shifts in packaging representations before inference, enabling robust deployment under evolving adversarial tactics.

### 4.1. Overview of LPCD
Figure [2](https://arxiv.org/html/2606.02946#S4.F2) presents an overview of our proposed LPCD framework for live streaming risk assessment, which combines latent causal decoupling with post-hoc magnitude calibration.
As illustrated in Figure [2](https://arxiv.org/html/2606.02946#S4.F2), this plug-in framework consists of three training-stage components and a lightweight inference-stage calibration module. In the training flow: (a) Latent Representation Disentanglement decomposes session representations into intent and packaging factors, capturing underlying malicious objectives and their tactical realizations, respectively. (b) Counterfactual Consistency Decoupling enforces intent invariance under counterfactual packaging at both the representation and prediction levels, mitigating spurious correlations induced by tactic evolution. (c) Risk Prediction aggregates the disentangled factors to produce session-level risk scores under standard supervision. At test time, (d) Post-hoc Magnitude Calibration rectifies tactic-induced magnitude shifts in packaging representations before risk inference.
This design enables robust risk prediction by isolating stable malicious intent, decoupling tactical variations, and correcting distributional drift during deployment\.
### 4\.2\.Latent Representation Disentanglement
Existing live streaming risk assessment models (Qiao et al., [2026](https://arxiv.org/html/2606.02946#bib.bib50)) represent each session using a single embedding, which naturally entangles malicious intent with tactical packaging. Under evolving narrative strategies, such entanglement complicates identifying intent-relevant signals that remain stable across tactical variations. To expose these stable factors and enable latent counterfactual analysis, we decompose the session representation into intent-related and packaging-related factors, following the principles of disentangled representation learning (Higgins et al., [2017](https://arxiv.org/html/2606.02946#bib.bib27); Kim and Mnih, [2018](https://arxiv.org/html/2606.02946#bib.bib28)).
Dual-Branch Disentangler Architecture. Given a session-level embedding $\mathbf{x}=\mathcal{E}(S^{[0,T]})\in\mathbb{R}^{D}$ from the backbone encoder, we introduce a learnable dual-branch disentangler $\Phi:\mathbb{R}^{D}\rightarrow\mathbb{R}^{d_{\mathrm{intent}}}\times\mathbb{R}^{d_{\mathrm{pack}}}$, which decomposes $\mathbf{x}$ into intent-related and packaging-related latent factors. Here, $d_{\mathrm{intent}}$ and $d_{\mathrm{pack}}$ denote the dimensions of the intent and packaging latent spaces, respectively.
Specifically, $\Phi(\cdot)$ is implemented as a lightweight dual-branch multilayer perceptron (MLP) on top of the backbone embedding: a shared transformation first extracts common session semantics, followed by two projection heads that map the shared representation into the *intent* and *packaging* subspaces:

$$\mathbf{h}=f_{\mathrm{shared}}(\mathbf{x}),\quad\mathbf{z}_{\mathrm{intent}}=f_{\mathrm{intent}}(\mathbf{h}),\quad\mathbf{z}_{\mathrm{pack}}=f_{\mathrm{pack}}(\mathbf{h}),\tag{1}$$

producing a pair of latent representations $(\mathbf{z}_{\mathrm{intent}},\mathbf{z}_{\mathrm{pack}})$ for a single live streaming session. For notational brevity, we denote the complete intent branch by $\Phi_{\mathrm{intent}}(\cdot)=f_{\mathrm{intent}}\circ f_{\mathrm{shared}}(\cdot)$ in the subsequent sections.
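The forward pass of Eq. (1) can be sketched in a few lines. This is a dependency-free illustration under stated assumptions rather than the authors' implementation: the paper only says the disentangler is a lightweight dual-branch MLP, so the single linear layer per stage, the ReLU after the shared layer, the uniform initialization, and all names (`DualBranchDisentangler`, `linear`, `init_layer`) are hypothetical choices.

```python
import random
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def linear(weight: Matrix, bias: Vector, x: Vector) -> Vector:
    """Affine map: one output per weight row, y_j = <row_j, x> + b_j."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def relu(v: Vector) -> Vector:
    return [max(0.0, a) for a in v]


def init_layer(rng: random.Random, out_dim: int, in_dim: int) -> Tuple[Matrix, Vector]:
    """Small random weights and zero biases (an arbitrary choice for the sketch)."""
    weight = [[rng.uniform(-0.5, 0.5) for _ in range(in_dim)] for _ in range(out_dim)]
    return weight, [0.0] * out_dim


class DualBranchDisentangler:
    """x -> h = f_shared(x); z_intent = f_intent(h); z_pack = f_pack(h)."""

    def __init__(self, d_in: int, d_hidden: int, d_intent: int, d_pack: int, seed: int = 0):
        rng = random.Random(seed)
        self.d_in = d_in
        self.shared = init_layer(rng, d_hidden, d_in)
        self.intent_head = init_layer(rng, d_intent, d_hidden)
        self.pack_head = init_layer(rng, d_pack, d_hidden)

    def __call__(self, x: Vector) -> Tuple[Vector, Vector]:
        if len(x) != self.d_in:
            raise ValueError("input dimension does not match the disentangler")
        h = relu(linear(*self.shared, x))        # shared session semantics
        z_intent = linear(*self.intent_head, h)  # intent subspace
        z_pack = linear(*self.pack_head, h)      # packaging subspace
        return z_intent, z_pack


phi = DualBranchDisentangler(d_in=8, d_hidden=6, d_intent=4, d_pack=3)
z_intent, z_pack = phi([0.1] * 8)
```

Both heads read the same hidden vector `h`, so nothing in the architecture alone forces the two outputs to carry different information; that separation has to come from the training losses.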
Semantic Preservation via Reconstruction. To ensure that the disentangled representations jointly preserve sufficient session semantics, we introduce a reconstruction-based regularization. A decoder $D(\cdot)$, implemented as a 2-layer MLP, recombines the intent and packaging representations to reconstruct the original session embedding, $\hat{\mathbf{x}}=D(\mathbf{z}_{\mathrm{intent}},\mathbf{z}_{\mathrm{pack}})$. The reconstruction loss is defined as

$$\mathcal{L}_{\mathrm{rec}}=\|\mathbf{x}-\hat{\mathbf{x}}\|_{2}^{2},\tag{2}$$

where $\|\cdot\|_{2}$ denotes the Euclidean norm. This loss prevents degenerate solutions and encourages faithful information preservation across the two latent factors.
Cross-factor Orthogonality Constraint. To further reduce unintended information leakage between intent and packaging representations, we impose a soft orthogonality constraint (Bousmalis et al., [2016](https://arxiv.org/html/2606.02946#bib.bib29)) that penalizes linear correlation between the two latent spaces. Given a training batch of size $B$, the orthogonality loss is defined as

$$\mathcal{L}_{\mathrm{ortho}}=\frac{1}{B}\left\|\mathbf{Z}_{\mathrm{intent}}^{\top}\mathbf{Z}_{\mathrm{pack}}\right\|_{F}^{2},\tag{3}$$

where $\mathbf{Z}_{\mathrm{intent}}$ and $\mathbf{Z}_{\mathrm{pack}}$ denote the batch-wise matrices of intent and packaging representations, respectively, and $\|\cdot\|_{F}$ denotes the Frobenius norm. This regularization acts as a soft constraint that discourages cross-factor entanglement without enforcing strict independence assumptions.
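To make Eq. (3) concrete, the following sketch computes the orthogonality penalty for a tiny batch in plain Python. It assumes each batch matrix stores one session per row (so the product has shape intent-dimension by packaging-dimension), which is a natural reading of the text but not stated explicitly; the function name `orthogonality_loss` is hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def orthogonality_loss(z_intent: Matrix, z_pack: Matrix) -> float:
    """(1/B) * squared Frobenius norm of Z_intent^T Z_pack.

    Rows are sessions; columns are latent dimensions.
    """
    batch = len(z_intent)
    if batch == 0 or batch != len(z_pack):
        raise ValueError("both matrices need the same, non-zero number of rows")
    d_intent, d_pack = len(z_intent[0]), len(z_pack[0])
    total = 0.0
    for j in range(d_intent):
        for k in range(d_pack):
            # Entry (j, k) of Z_intent^T Z_pack: sum over the batch of
            # intent dimension j times packaging dimension k.
            cross = sum(z_intent[b][j] * z_pack[b][k] for b in range(batch))
            total += cross * cross
    return total / batch


# Intent column 0 sums to zero against the constant packaging column,
# and intent column 1 is all zeros, so the penalty vanishes.
decorrelated = orthogonality_loss([[1.0, 0.0], [-1.0, 0.0]], [[1.0], [1.0]])

# Here each intent dimension lines up with one packaging dimension.
correlated = orthogonality_loss([[1.0, 0.0], [0.0, 1.0]], [[0.0, 1.0], [1.0, 0.0]])
```

In the second case the cross-product matrix has two entries equal to 1, so the squared Frobenius norm is 2 and the loss is 2 / B = 1.0 for B = 2.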
The resulting disentangled latent space provides a structured representation in which stable intent and volatile tactical packaging are explicitly separated\. Next, we introduce counterfactual consistency objectives that operate on this latent factorization to enforce robustness under controlled packaging interventions\.
### 4\.3\.Counterfactual Consistency Decoupling
While latent disentanglement exposes intent- and packaging-related factors, architectural separation alone does not guarantee that the intent representation is invariant to tactical variations. Under purely observational supervision, intent embeddings may still encode tactic-specific cues that co-occur with malicious behavior in the training data. To explicitly eliminate such spurious dependencies, inspired by causal intervention (Pearl, [2009](https://arxiv.org/html/2606.02946#bib.bib30); Feder et al., [2022](https://arxiv.org/html/2606.02946#bib.bib26)), we introduce *Counterfactual Consistency Decoupling (CCD)*, which enforces intent invariance under controlled packaging interventions at both the representation and prediction levels.
#### 4\.3\.1\.Representation\-Level CCD
The representation\-level CCD enforces that intent\-related representations remain stable when tactical packaging is counterfactually altered\. Otherwise, intent representations are learned only from co\-occurring intent–packaging pairs in the training data, and their apparent stability does not imply robustness to unseen tactical realizations\.
Counterfactual Construction. Given a training batch of live streaming sessions, we partition samples into *risky* and *safe* groups based on supervision. To approximate a stable and benign tactical realization, we compute a batch-wise reference packaging representation as the mean of packaging factors from safe sessions, denoted by $\bar{\mathbf{z}}_{\mathrm{pack}}^{\mathrm{safe}}$. This batch-wise mean serves as a prototypical representation of benign tactical packaging, providing a stable intervention target that is independent of the tactic-specific cues of individual risky sessions.
For each risky session with intent representation $\mathbf{z}_{\mathrm{intent}}^{r}$, we construct a counterfactual session embedding by explicitly intervening on the packaging factor while preserving the intent factor:

$$\mathbf{x}_{\mathrm{CF}}^{r}=D\left(\mathbf{z}_{\mathrm{intent}}^{r},\bar{\mathbf{z}}_{\mathrm{pack}}^{\mathrm{safe}}\right),$$

where $D(\cdot)$ denotes the decoder introduced in the disentanglement module. Similar to counterfactual generation in observational space (Sauer and Geiger, [2021](https://arxiv.org/html/2606.02946#bib.bib32)), this operation simulates the same malicious intent expressed under an ordinary, benign packaging.
The counterfactual embedding is then re-encoded by the disentangler to obtain the corresponding counterfactual intent representation:

$$\mathbf{z}_{\mathrm{intent}}^{\mathrm{CF},r}=\Phi_{\mathrm{intent}}\left(\mathbf{x}_{\mathrm{CF}}^{r}\right).$$
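The construction above is a three-step pipeline: average the packaging factors of safe sessions, decode each risky intent together with that average, and re-encode the result through the intent branch. The sketch below traces those steps with toy stand-ins. The decoder and intent branch are passed in as callables because their real form is a trained network; `toy_decoder` (concatenation) and `toy_intent_branch` (taking the first two coordinates) are hypothetical placeholders chosen only so the example runs, as are all function names.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def mean_vector(vectors: Sequence[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    if not vectors:
        raise ValueError("the batch must contain at least one safe session")
    return [sum(column) / len(vectors) for column in zip(*vectors)]


def counterfactual_intents(
    z_intent: Sequence[Vector],
    z_pack: Sequence[Vector],
    labels: Sequence[int],
    decoder: Callable[[Vector, Vector], Vector],
    intent_branch: Callable[[Vector], Vector],
) -> List[Vector]:
    """Return the counterfactual intent vector of every risky session (label 1)."""
    # Step 1: reference packaging = mean packaging factor over safe sessions.
    safe_reference = mean_vector([zp for zp, y in zip(z_pack, labels) if y == 0])
    results = []
    for zi, y in zip(z_intent, labels):
        if y == 1:
            # Step 2: keep the intent factor, swap in the safe packaging.
            x_cf = decoder(zi, safe_reference)
            # Step 3: re-encode the counterfactual embedding.
            results.append(intent_branch(x_cf))
    return results


def toy_decoder(zi: Vector, zp: Vector) -> Vector:
    return zi + zp  # placeholder: plain concatenation


def toy_intent_branch(x: Vector) -> Vector:
    return x[:2]  # placeholder: read back the first two coordinates


cf = counterfactual_intents(
    z_intent=[[1.0, 2.0], [0.0, 0.0], [3.0, 4.0]],
    z_pack=[[9.0], [1.0], [3.0]],
    labels=[1, 0, 0],
    decoder=toy_decoder,
    intent_branch=toy_intent_branch,
)
```

With these placeholders the round trip is lossless, so the counterfactual intent equals the factual one. With trained networks it generally will not, and that gap is what the consistency objective penalizes.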
Latent Consistency Objective. To enforce invariance, we adopt a contrastive consistency objective utilizing a triplet-style loss (Schroff et al., [2015](https://arxiv.org/html/2606.02946#bib.bib31); Chen et al., [2020](https://arxiv.org/html/2606.02946#bib.bib34)). Specifically, the factual intent representation $\mathbf{z}_{\mathrm{intent}}^{r}$ serves as the *anchor*. Its counterfactual counterpart $\mathbf{z}_{\mathrm{intent}}^{\mathrm{CF},r}$, obtained by intervening on narrative packaging while preserving intent, is treated as the *positive*, while intent representations from safe sessions act as *negatives*. The representation-level CCD loss is defined as:

$$\mathcal{L}_{\mathrm{CCD}}^{\mathrm{rep}}=\max\left(0,\;m+\mathbb{E}_{r,s}\left[\mathrm{Sim}\left(\mathbf{z}_{\mathrm{intent}}^{r},\mathbf{z}_{\mathrm{intent}}^{s}\right)\right]-\mathrm{Sim}\left(\mathbf{z}_{\mathrm{intent}}^{r},\mathbf{z}_{\mathrm{intent}}^{\mathrm{CF},r}\right)\right),\tag{4}$$

where $\mathbf{z}_{\mathrm{intent}}^{s}$ denotes intent representations from all safe sessions in the batch, $\mathrm{Sim}(\cdot,\cdot)$ denotes cosine similarity, and $m$ is a margin hyperparameter. This objective encourages intent representations to remain invariant under counterfactual packaging while maintaining separation from benign intent patterns.
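A small numerical sketch of Eq. (4) follows. One point in the formula is open to interpretation: the expectation is written over the negative term only, while the positive term still depends on the risky index. The code below assumes a batch-level reading in which both similarity terms are averaged before a single hinge is applied; a per-session hinge would be an equally plausible variant. The names `cosine` and `ccd_rep_loss` are hypothetical, and gradient blocking is not modeled since no autodiff is involved.

```python
import math
from typing import List, Sequence

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def ccd_rep_loss(
    risky: Sequence[Vector],
    risky_cf: Sequence[Vector],
    safe: Sequence[Vector],
    margin: float,
) -> float:
    """Hinge on: margin + mean anchor-negative sim - mean anchor-positive sim."""
    if not risky or not safe or len(risky) != len(risky_cf):
        raise ValueError("need risky/counterfactual pairs and at least one safe session")
    # Negatives: every (risky anchor, safe intent) pair in the batch.
    neg = sum(cosine(r, s) for r in risky for s in safe) / (len(risky) * len(safe))
    # Positives: each risky anchor against its own counterfactual intent.
    pos = sum(cosine(r, cf) for r, cf in zip(risky, risky_cf)) / len(risky)
    return max(0.0, margin + neg - pos)


# Counterfactual matches the anchor and the safe intent is orthogonal to it.
satisfied = ccd_rep_loss([[1.0, 0.0]], [[1.0, 0.0]], [[0.0, 1.0]], margin=0.5)

# Counterfactual drifted away while a safe intent coincides with the anchor.
violated = ccd_rep_loss([[1.0, 0.0]], [[0.0, 1.0]], [[1.0, 0.0]], margin=0.5)
```

In the first case the hinge argument is 0.5 + 0 - 1 = -0.5, so the loss is zero; in the second it is 0.5 + 1 - 0 = 1.5.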
Gradient Blocking Strategy. In our implementation, during the computation of $\mathcal{L}_{\mathrm{CCD}}^{\mathrm{rep}}$, we block the gradient flow through the counterfactual generation process (i.e., the decoder $D$ and the re-disentanglement of $\mathbf{x}_{\mathrm{CF}}$). This ensures that the loss specifically optimizes the disentangler $\Phi$ to map the counterfactual input back to its original intent manifold, rather than implicitly shifting the counterfactual construction itself to simplify the task.
#### 4\.3\.2\.Prediction\-Level CCD
While representation-level CCD constrains the latent space, it does not directly prevent the downstream classifier from exploiting residual tactic-related cues. Hence, similar to Veitch et al. ([2021](https://arxiv.org/html/2606.02946#bib.bib35)), prediction-level CCD enforces causal consistency at the decision level, requiring the risk predictor to produce stable outputs under packaging interventions.
The core intuition is that if the disentanglement is successful, replacing a risky session’s original packaging $\mathbf{z}_{\mathrm{pack}}^{r}$ with a safe reference $\bar{\mathbf{z}}_{\mathrm{pack}}^{\mathrm{safe}}$ should not alter its risk nature. Therefore, the predictor’s output for the counterfactual session (which carries the same malicious intent but is re-wrapped in a benign style) should remain consistent with the factual prediction.
Predictive Consistency Objective. For a risky session, we compute the factual and counterfactual *logits* using the same intent representation:

$$\ell=g\left(\mathbf{z}_{\mathrm{intent}}^{r}\oplus\mathbf{z}_{\mathrm{pack}}^{r}\right),\quad\ell_{\mathrm{CF}}=g\left(\mathbf{z}_{\mathrm{intent}}^{r}\oplus\bar{\mathbf{z}}_{\mathrm{pack}}^{\mathrm{safe}}\right),\tag{5}$$

where $g(\cdot)$ denotes the risk predictor before activation and $\oplus$ denotes concatenation. To enforce predictive invariance under counterfactual packaging intervention, we minimize the discrepancy between the two logits:

$$\mathcal{L}_{\mathrm{CCD}}^{\mathrm{pred}}=\left\|\ell-\ell_{\mathrm{CF}}\right\|_{2}^{2}.\tag{6}$$

Unlike representation-level CCD, $\mathcal{L}_{\mathrm{CCD}}^{\mathrm{pred}}$ allows end-to-end gradient propagation, explicitly discouraging reliance on tactic-induced shortcuts.
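Eqs. (5) and (6) can be checked by hand with a linear predictor. The sketch below is illustrative only: the paper does not specify the form of the predictor, so the single linear layer is an assumption, as are the names `predictor_logit` and `ccd_pred_loss` and the choice to average the squared gap over the risky sessions in a batch (Eq. (6) is written for one session).

```python
from typing import List, Sequence

Vector = List[float]


def predictor_logit(weights: Vector, bias: float, z_intent: Vector, z_pack: Vector) -> float:
    """Pre-activation risk score of a linear predictor on the concatenated factors."""
    features = z_intent + z_pack  # list concatenation plays the role of the oplus operator
    if len(features) != len(weights):
        raise ValueError("weight vector does not match the concatenated feature size")
    return sum(w * f for w, f in zip(weights, features)) + bias


def ccd_pred_loss(
    weights: Vector,
    bias: float,
    risky_intent: Sequence[Vector],
    risky_pack: Sequence[Vector],
    safe_pack_mean: Vector,
) -> float:
    """Mean squared gap between factual and counterfactual logits of risky sessions."""
    if not risky_intent or len(risky_intent) != len(risky_pack):
        raise ValueError("need matching, non-empty intent and packaging lists")
    total = 0.0
    for zi, zp in zip(risky_intent, risky_pack):
        factual = predictor_logit(weights, bias, zi, zp)
        counterfactual = predictor_logit(weights, bias, zi, safe_pack_mean)
        total += (factual - counterfactual) ** 2
    return total / len(risky_intent)


# Intent has 2 dimensions, packaging has 1; the last weight reads packaging.
packaging_reliant = ccd_pred_loss(
    [1.0, 0.0, 2.0], 0.5, [[1.0, 1.0]], [[3.0]], safe_pack_mean=[1.0]
)
packaging_blind = ccd_pred_loss(
    [1.0, 0.0, 0.0], 0.5, [[1.0, 1.0]], [[3.0]], safe_pack_mean=[1.0]
)
```

In the first call the factual logit is 7.5 and the counterfactual logit is 3.5, giving a penalty of 16. Zeroing the packaging weight removes the gap entirely, which shows the direction in which this loss pushes the predictor.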
Together, the two levels of CCD form a two\-stage causal regularization mechanism\. Representation\-level CCD enforces invariance in the latent intent space, while prediction\-level CCD ensures that such invariance is respected by the decision function\. By enabling representation\-level invariance and predictive consistency, LPCD establishes a robust causal bridge from latent factorization to final risk assessment, ensuring that the decision boundary is inherently resilient to the “chameleon\-like” evolution of adversarial packaging\.
### 4\.4\.Risk Prediction and Training Objective
In the following, we formulate the joint optimization objective of the plug\-in LPCD framework\.
Main Risk Prediction. To produce the final risk score, we employ the risk predictor $g(\cdot)$ that takes the disentangled factors as input. To capture the full session context while emphasizing the disentangled structure, we concatenate the intent and packaging representations as the final feature vector:

$$\hat{y}=\mathrm{Sigmoid}\left(g(\mathbf{z}_{\mathrm{intent}}\oplus\mathbf{z}_{\mathrm{pack}})\right),$$

where $\hat{y}\in(0,1)$ denotes the predicted risk probability. The primary objective is to minimize the binary cross-entropy (BCE) loss under standard supervision:

$$\mathcal{L}_{\mathrm{main}}=-\frac{1}{B}\sum_{i=1}^{B}\left[y_{i}\log\hat{y}_{i}+(1-y_{i})\log(1-\hat{y}_{i})\right],\tag{7}$$

where $y_{i}\in\{0,1\}$ denotes the ground-truth risk label.
Joint Optimization Objective\.LPCD is trained end\-to\-end by simultaneously optimizing the predictive performance and the constraints of the latent space\. The total loss function is defined as a weighted combination of all previously introduced objectives:
\(8\)ℒtotal=ℒmain\+λrecℒrec\+λorthoℒortho\+λCCDrepℒCCDrep\+λCCDpredℒCCDpred,\\mathcal\{L\}\_\{\\text\{total\}\}=\\mathcal\{L\}\_\{\\mathrm\{main\}\}\+\\lambda\_\{\\mathrm\{rec\}\}\\mathcal\{L\}\_\{\\mathrm\{rec\}\}\+\\lambda\_\{\\mathrm\{ortho\}\}\\mathcal\{L\}\_\{\\mathrm\{ortho\}\}\+\\lambda\_\{\\mathrm\{CCD\}\}^\{\\mathrm\{rep\}\}\\mathcal\{L\}\_\{\\mathrm\{CCD\}\}^\{\\mathrm\{rep\}\}\+\\lambda\_\{\\mathrm\{CCD\}\}^\{\\mathrm\{pred\}\}\\mathcal\{L\}\_\{\\mathrm\{CCD\}\}^\{\\mathrm\{pred\}\},whereλrec,λortho,λCCDrep,λCCDpred\\lambda\_\{\\mathrm\{rec\}\},\\lambda\_\{\\mathrm\{ortho\}\},\\lambda\_\{\\mathrm\{CCD\}\}^\{\\mathrm\{rep\}\},\\lambda\_\{\\mathrm\{CCD\}\}^\{\\mathrm\{pred\}\}are hyperparameters that balance the trade\-off between semantic preservation, factor orthogonality, and dual\-level causal consistency\. This joint supervision prevents the model from exploiting spurious correlations, ensuring the decision boundary is anchored on stable intent\-related factors\.
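As a rough illustration of Eqs. (7) and (8), the sketch below computes the BCE term directly from logits and then forms the weighted sum. The auxiliary loss values and the dictionary keys are made-up placeholders, and the two CCD weights shown are arbitrary illustrative choices, not the values selected in the paper.

```python
import math
from typing import Mapping, Sequence


def bce_from_logits(y_true: Sequence[int], logits: Sequence[float]) -> float:
    """Mean binary cross-entropy of Sigmoid(logit) against labels (Eq. 7)."""
    total = 0.0
    for y, logit in zip(y_true, logits):
        # Numerically stable identity: BCE = softplus(logit) - y * logit.
        softplus = max(logit, 0.0) + math.log1p(math.exp(-abs(logit)))
        total += softplus - y * logit
    return total / len(y_true)


def total_loss(
    l_main: float,
    aux_losses: Mapping[str, float],
    weights: Mapping[str, float],
) -> float:
    """Weighted combination of the main and auxiliary objectives (Eq. 8)."""
    return l_main + sum(weights[name] * aux_losses[name] for name in weights)


# Placeholder auxiliary losses for one mini-batch (illustrative values only).
aux = {"rec": 0.2, "ortho": 10.0, "ccd_rep": 0.1, "ccd_pred": 0.4}
# Illustrative weights; the CCD weights here are assumptions for the example.
lam = {"rec": 1.0, "ortho": 5e-4, "ccd_rep": 1.0, "ccd_pred": 0.1}

l_main = bce_from_logits([1, 0], [0.0, 0.0])  # both predictions at 0.5
l_total = total_loss(0.5, aux, lam)
```

With both logits at zero the predicted probability is 0.5 for every sample, so the BCE term equals $\log 2$.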
### 4\.5\.Post\-hoc Magnitude Calibration at Inference
While the CCD module enforces semantic invariance during training, adversarial attackers may still induce *tactical magnitude shifts* in the packaging manifold during deployment. Such shifts manifest as changes in the latent energy of $\mathbf{z}_{\mathrm{pack}}$, which can destabilize the predictor even when the underlying semantic content remains unchanged. Inspired by test-time normalization techniques (Li et al., [2018](https://arxiv.org/html/2606.02946#bib.bib33)), and to ensure robust deployment under evolving tactics, we introduce a lightweight post-hoc calibration mechanism that rectifies test-time packaging magnitudes using training-stage statistics.

**Online Magnitude Tracking.** To handle the high variance of live streaming traffic, we maintain a running estimate of the second-order statistics of the packaging representation. Let $\sigma_{\mathrm{train},d}$ denote the Root Mean Square (RMS) of the $d$-th dimension of $\mathbf{z}_{\mathrm{pack}}$ computed over the safe samples from the training set. During inference, we estimate the test-stage magnitude using a sliding batch of incoming sessions. Specifically, given a mini-batch $\mathcal{B}^{(t)}$ at inference step $t$, the test-time RMS is updated as:

$$\sigma_{\mathrm{test},d}^{(t)} = (1 - \alpha)\, \sigma_{\mathrm{test},d}^{(t-1)} + \alpha \sqrt{\frac{1}{|\mathcal{B}^{(t)}|} \sum_{\mathbf{z} \in \mathcal{B}^{(t)}} (\mathbf{z}_{\mathrm{pack},d})^{2}}, \tag{9}$$

where $\alpha \in (0,1]$ is a momentum coefficient. The tracking process is initialized with $\sigma_{\mathrm{test},d}^{(0)} = \sigma_{\mathrm{train},d}$. Note that in offline evaluation, we approximate the online update by computing $\sigma_{\mathrm{test},d}$ from the current test mini-batch only.

**Magnitude Rectification.** Based on the tracked statistics, we construct a diagonal calibration matrix $\boldsymbol{\Gamma}^{(t)} \in \mathbb{R}^{d_{\mathrm{pack}} \times d_{\mathrm{pack}}}$ to rescale the packaging representation:

$$\boldsymbol{\Gamma}^{(t)} = \mathrm{diag}\!\left(\gamma_{1}^{(t)}, \ldots, \gamma_{d_{\mathrm{pack}}}^{(t)}\right), \quad \gamma_{d}^{(t)} = \frac{\sigma_{\mathrm{train},d}}{\sigma_{\mathrm{test},d}^{(t)}}. \tag{10}$$

The calibrated packaging representation is then obtained via a simple diagonal transformation:

$$\tilde{\mathbf{z}}_{\mathrm{pack}} = \boldsymbol{\Gamma}^{(t)} \mathbf{z}_{\mathrm{pack}}.$$

The final calibrated risk score is produced as:

$$\hat{y}_{\mathrm{cal}} = \mathrm{Sigmoid}\!\left(g(\mathbf{z}_{\mathrm{intent}} \oplus \tilde{\mathbf{z}}_{\mathrm{pack}})\right).$$
By aligning the latent energy of the packaging factor to training\-stage statistics, this calibration module mitigates tactic\-induced magnitude perturbations at inference time\. Importantly, this calibration operates purely at the*statistical level*\. It introduces no additional learnable parameters, requires no gradient\-based optimization, and incurs only negligible inference\-time overhead, making it suitable for high\-throughput live streaming scenarios\.
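The tracking and rectification steps in Eqs. (9) and (10) can be sketched as a small stateful helper. The class name `MagnitudeCalibrator` and the `eps` guard against division by zero are assumptions added for this example and are not part of the paper's description.

```python
import math
from typing import List, Sequence


class MagnitudeCalibrator:
    """Per-dimension RMS tracking (Eq. 9) and diagonal rescaling (Eq. 10)."""

    def __init__(self, sigma_train: Sequence[float], alpha: float = 0.1,
                 eps: float = 1e-8) -> None:
        if not 0.0 < alpha <= 1.0:
            raise ValueError("alpha must lie in (0, 1]")
        self.sigma_train = list(sigma_train)
        # Initialization: sigma_test^(0) = sigma_train.
        self.sigma_test = list(sigma_train)
        self.alpha = alpha
        self.eps = eps  # assumption: numerical guard, not in the paper

    def update(self, batch: Sequence[Sequence[float]]) -> None:
        """Blend the running estimate with the RMS of the incoming mini-batch."""
        n = len(batch)
        for d in range(len(self.sigma_train)):
            batch_rms = math.sqrt(sum(z[d] ** 2 for z in batch) / n)
            self.sigma_test[d] = (
                (1.0 - self.alpha) * self.sigma_test[d] + self.alpha * batch_rms
            )

    def rectify(self, z_pack: Sequence[float]) -> List[float]:
        """Apply gamma_d = sigma_train_d / sigma_test_d to each dimension."""
        return [
            (s_train / max(s_test, self.eps)) * value
            for s_train, s_test, value in zip(self.sigma_train, self.sigma_test, z_pack)
        ]


# alpha = 1.0 uses the current mini-batch only, matching the offline
# approximation described above.
offline = MagnitudeCalibrator(sigma_train=[1.0, 2.0], alpha=1.0)
offline.update([[2.0, 2.0], [2.0, 2.0]])
rescaled = offline.rectify([2.0, 2.0])

# alpha = 0.1 moves the running estimate only part of the way.
online = MagnitudeCalibrator(sigma_train=[1.0, 2.0], alpha=0.1)
online.update([[2.0, 2.0], [2.0, 2.0]])
```

In the offline example the first dimension has doubled in magnitude relative to training, so it is scaled by 0.5, while the second dimension already matches the training RMS and is left unchanged.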
## 5\.Experiments
Table 1. Statistics of the May and June datasets.

| Dataset | Split | #Sessions | #Avg. Actions | #Avg. Users | Avg. Time (min) |
|---|---|---|---|---|---|
| May | train | 176,347 | 709 | 35 | 30.0 |
| May | val | 23,562 | 704 | 36 | 29.6 |
| May | ID test | 22,462 | 740 | 37 | 29.7 |
| May | OOD test | 15,320 | 666 | 44 | 28.5 |
| June | train | 79,552 | 700 | 36 | 30.0 |
| June | val | 10,934 | 767 | 40 | 29.1 |
| June | ID test | 10,967 | 725 | 37 | 29.1 |
| June | OOD test | 16,722 | 679 | 44 | 28.6 |
Table 2. Overall performance comparison on the May and June datasets. Metrics: PR-AUC (AUC), F1-score (F1), R@0.1FPR (R.1), and FPR@0.9R (FPR.9). The May columns use models trained on May (05/20–06/03) and evaluated on the May ID test set (06/13–06/14) and the May OOD test set (09/23–09/24); the June columns use models trained on June (06/04–06/10) and evaluated on the June ID test set (06/16) and the June OOD test set (10/16–10/17). Best results are in **bold**. '∗' indicates $p<0.05$.

| Group | Method | May ID AUC↑ | May ID F1↑ | May ID R.1↑ | May ID FPR.9↓ | May OOD AUC↑ | May OOD F1↑ | May OOD R.1↑ | May OOD FPR.9↓ | June ID AUC↑ | June ID F1↑ | June ID R.1↑ | June ID FPR.9↓ | June OOD AUC↑ | June OOD F1↑ | June OOD R.1↑ | June OOD FPR.9↓ |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Backbones: Sequence Models | Transformer | 0.7189 | 0.6668 | 0.8394 | 0.1580 | 0.6728 | 0.6007 | 0.7978 | 0.2008 | 0.6801 | 0.6341 | 0.8225 | 0.1565 | 0.6208 | 0.5907 | 0.7636 | 0.2545 |
| | Reformer | 0.7293 | 0.6752 | 0.8575 | 0.1436 | 0.6570 | 0.5842 | 0.7890 | 0.2126 | 0.6911 | 0.6395 | 0.8104 | 0.1760 | 0.6189 | 0.5967 | 0.7562 | 0.2638 |
| | Informer | 0.7246 | 0.6708 | 0.8438 | 0.1555 | 0.6586 | 0.6007 | 0.7949 | 0.2232 | 0.6879 | 0.6391 | 0.8375 | 0.1601 | 0.6028 | 0.5902 | 0.7508 | 0.2661 |
| Backbones: MIL Methods | MIL-LET | 0.7241 | 0.6749 | 0.8546 | 0.1418 | 0.6643 | 0.5920 | 0.7978 | 0.1932 | 0.6942 | 0.6528 | 0.8455 | 0.1499 | 0.6050 | 0.5191 | 0.7676 | 0.2741 |
| | TimeMIL | 0.7353 | 0.6790 | 0.8599 | 0.1436 | 0.6443 | 0.5864 | 0.7816 | 0.1904 | 0.6963 | 0.6471 | 0.8495 | 0.1367 | 0.6316 | 0.5983 | 0.7763 | 0.2288 |
| | TAIL-MIL | 0.7316 | 0.6785 | 0.8570 | 0.1341 | 0.6606 | 0.5793 | 0.7904 | 0.2008 | 0.7029 | 0.6509 | 0.8205 | 0.1555 | 0.6365 | 0.5869 | 0.7776 | 0.2391 |
| | AC-MIL | 0.7676 | 0.7002 | 0.8722 | 0.1260 | 0.7045 | 0.6428 | 0.8118 | 0.1714 | 0.7311 | 0.6777 | 0.8546 | 0.1345 | 0.6858 | 0.6235 | 0.7957 | 0.2130 |
| Best Backbone (AC-MIL) + OOD Plug-ins: IL | + IRM | 0.7699 | 0.7033 | 0.8781 | 0.1213 | 0.7098 | 0.6408 | 0.8213 | 0.1769 | 0.7317 | 0.6836 | 0.8537 | 0.1403 | 0.6905 | 0.6244 | 0.7991 | 0.2162 |
| | + VREx | 0.7626 | 0.6969 | 0.8707 | 0.1303 | 0.6999 | 0.6330 | 0.8125 | 0.1836 | 0.7307 | 0.6744 | 0.8566 | 0.1384 | 0.6852 | 0.6150 | 0.8058 | 0.2226 |
| | + IB-IRM | 0.7719 | 0.7080 | 0.8766 | 0.1219 | 0.7103 | 0.6407 | 0.8140 | 0.1783 | 0.7286 | 0.6757 | 0.8556 | 0.1422 | 0.6849 | 0.6260 | 0.7950 | 0.2144 |
| DA | + MIXUP | 0.7726 | 0.7018 | 0.8776 | 0.1211 | 0.7062 | 0.6442 | 0.8257 | 0.1780 | 0.7279 | 0.6752 | 0.8445 | 0.1421 | 0.6851 | 0.6277 | 0.7964 | 0.2000 |
| | + CORAL | 0.7676 | 0.7029 | 0.8692 | 0.1315 | 0.7070 | 0.6378 | 0.8184 | 0.1767 | 0.7327 | 0.6794 | 0.8602 | 0.1313 | 0.6940 | 0.6221 | 0.8051 | 0.2206 |
| DRO | + GroupDRO | 0.7716 | 0.7049 | 0.8771 | 0.1205 | 0.7127 | 0.6446 | 0.8191 | 0.1789 | 0.7294 | 0.6781 | 0.8538 | 0.1404 | 0.6873 | 0.6241 | 0.7971 | 0.2162 |
| | + ASGDRO | 0.7715 | 0.7038 | 0.8766 | 0.1222 | 0.7144 | 0.6443 | 0.8235 | 0.1773 | 0.7335 | 0.6811 | 0.8455 | 0.1400 | 0.6884 | 0.6249 | 0.7984 | 0.2197 |
| EI | + EIIL | 0.7686 | 0.6824 | 0.8756 | 0.1207 | 0.7076 | 0.6409 | 0.8169 | 0.1743 | 0.7375 | 0.6601 | 0.8636 | 0.1299 | 0.6877 | 0.6170 | 0.7971 | 0.2229 |
| | + FOIL | 0.7747 | 0.7012 | 0.8790 | 0.1191 | 0.7097 | 0.6463 | 0.8191 | 0.1713 | 0.7334 | 0.6760 | 0.8636 | 0.1314 | 0.6828 | 0.6286 | 0.8031 | 0.2111 |
| | + LPCD (Ours) | **0.7841**∗ | **0.7121**∗ | **0.8832**∗ | **0.1158**∗ | **0.7300**∗ | **0.6828**∗ | **0.8529**∗ | **0.1589**∗ | **0.7454**∗ | **0.6877**∗ | **0.8768**∗ | **0.1292**∗ | **0.7287**∗ | **0.6779**∗ | **0.8600**∗ | **0.1732**∗ |
| | Gain over AC-MIL | +2.1% | +1.7% | +1.3% | -8.1% | +3.6% | +6.2% | +5.1% | -7.3% | +2.0% | +1.5% | +2.6% | -4.0% | +6.3% | +8.7% | +2.1% | -18.7% |
| | Gain over Best Plug-in | +1.2% | +0.6% | +0.5% | -2.8% | +2.2% | +5.6% | +3.6% | -7.2% | +1.1% | +1.0% | +1.5% | -1.0% | +5.0% | +7.8% | +5.4% | -13.4% |
In this section, we evaluate LPCD on large\-scale industrial data to answer the following research questions:
- •RQ1: Does LPCD outperform strong baselines under both in\-distribution and tactical OOD settings?
- •RQ2: What is the contribution of each component in LPCD?
- •RQ3: How does LPCD compare with a retraining oracle in terms of performance and efficiency?
- •RQ4: Does LPCD disentangle intent\-invariant risk signals from tactical packaging variations in the latent space?
- •RQ5: Can LPCD be effectively applied as a plug\-in to different backbone models?
- •RQ6: Does LPCD improve performance in online deployment?
### 5\.1\.Experimental Setup
#### 5\.1\.1\.Datasets
We collect two large-scale industrial live-streaming datasets from the Douyin live-streaming platform, denoted as **May** and **June** ([https://huggingface.co/datasets/ByteDance/LiveStreamingRiskControl](https://huggingface.co/datasets/ByteDance/LiveStreamingRiskControl)). All data were collected and processed in compliance with the platform's privacy policy. To assess robustness against tactical evolution, each dataset is temporally partitioned into a *training* set, a *validation* set, an *in-distribution (ID) test* set, and a *tactical OOD test* set. For the May dataset, the training data spans 05/20/2025–06/03/2025, followed by a validation set from 06/11/2025 to 06/12/2025, an ID test set from 06/13/2025 to 06/14/2025, and an OOD test set from 09/23/2025 to 09/24/2025. The June dataset uses 06/04/2025–06/10/2025 for training, 06/15/2025 for validation, and 06/16/2025 as the ID test set, with its OOD evaluation on 10/16/2025–10/17/2025. Table [1](https://arxiv.org/html/2606.02946#S5.T1) presents the basic statistics of our datasets.
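The temporal partition can be written down as a small lookup, which makes the gap between the ID and OOD windows explicit. The dictionary layout and the `assign_split` helper are illustrative choices for this sketch; only the date ranges come from the text.

```python
from datetime import date
from typing import Dict, Optional, Tuple

Splits = Dict[str, Tuple[date, date]]

# Inclusive date ranges taken from the dataset description.
MAY_SPLITS: Splits = {
    "train": (date(2025, 5, 20), date(2025, 6, 3)),
    "val": (date(2025, 6, 11), date(2025, 6, 12)),
    "id_test": (date(2025, 6, 13), date(2025, 6, 14)),
    "ood_test": (date(2025, 9, 23), date(2025, 9, 24)),
}
JUNE_SPLITS: Splits = {
    "train": (date(2025, 6, 4), date(2025, 6, 10)),
    "val": (date(2025, 6, 15), date(2025, 6, 15)),
    "id_test": (date(2025, 6, 16), date(2025, 6, 16)),
    "ood_test": (date(2025, 10, 16), date(2025, 10, 17)),
}


def assign_split(day: date, splits: Splits) -> Optional[str]:
    """Return the split whose date range contains `day`, or None if unused."""
    for name, (start, end) in splits.items():
        if start <= day <= end:
            return name
    return None


# Days between the end of training and the start of the OOD window.
may_ood_gap_days = (MAY_SPLITS["ood_test"][0] - MAY_SPLITS["train"][1]).days
```

Days that fall between the listed windows (for example 06/05/2025 for the May dataset) belong to no split under this reading.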
**Action Space and Modalities.** Sessions are represented by heterogeneous action sequences involving both hosts and viewers. Viewer-side actions include entries, comments (danmaku), virtual gifting, and social interactions (i.e., likes, shares, co-stream requests, and group joins). Host-side signals include the start of the stream and provide semantic context through speech transcripts obtained via ASR and on-screen text extracted by OCR. Textual content is encoded using a Chinese-BERT encoder (https://huggingface.co/google-bert/bert-base-chinese).
**Session Processing.** Following prior work (Qiao et al., [2026](https://arxiv.org/html/2606.02946#bib.bib50)), each live streaming session is truncated to its first 30 minutes to reflect early-stage risk detection. To focus on high-impact interactions, we retain signals from the top 50 most active viewers per session. Following industrial risk control practice, all malicious sessions are preserved, while benign sessions are down-sampled to maintain a 1:10 class ratio.
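The three processing steps can be sketched as follows. The action schema (timestamp in seconds, actor id, action type), the use of action counts as the measure of viewer activity, and the reading of 1:10 as one malicious session to ten benign ones are assumptions made for this example.

```python
import random
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical schema: (seconds since stream start, actor id, action type).
Action = Tuple[float, str, str]


def preprocess_session(actions: Sequence[Action], host_id: str,
                       max_minutes: float = 30.0, top_k: int = 50) -> List[Action]:
    """Keep the first `max_minutes` and the `top_k` most active viewers."""
    window = [a for a in actions if a[0] < max_minutes * 60]
    # Activity is measured here as the number of actions per viewer.
    counts = Counter(actor for _, actor, _ in window if actor != host_id)
    keep = {actor for actor, _ in counts.most_common(top_k)}
    keep.add(host_id)  # host-side signals are always retained
    return [a for a in window if a[1] in keep]


def downsample_benign(labels: Sequence[int], benign_per_malicious: int = 10,
                      seed: int = 0) -> List[int]:
    """Return kept session indices: all malicious plus a benign sample."""
    malicious = [i for i, y in enumerate(labels) if y == 1]
    benign = [i for i, y in enumerate(labels) if y == 0]
    target = min(len(benign), benign_per_malicious * len(malicious))
    rng = random.Random(seed)
    return sorted(malicious + rng.sample(benign, target))


session = [
    (10.0, "h", "speech"), (20.0, "v1", "comment"), (30.0, "v1", "like"),
    (40.0, "v1", "gift"), (50.0, "v2", "comment"), (60.0, "v3", "like"),
    (70.0, "v3", "comment"), (1900.0, "v2", "comment"),  # after 30 minutes
]
processed = preprocess_session(session, host_id="h", top_k=2)
kept = downsample_benign([1] + [0] * 25)
```

In the toy session, the action at 1900 seconds falls outside the 30-minute window, and with `top_k=2` only the two most active viewers (`v1` and `v3`) remain alongside the host.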
#### 5\.1\.2\.Baselines\.
*(a) Backbones.* Following prior practice (Qiao et al., [2026](https://arxiv.org/html/2606.02946#bib.bib50)), we consider two families of backbones as candidates: **sequence models**, including Transformer (Vaswani et al., [2017](https://arxiv.org/html/2606.02946#bib.bib36)), Reformer (Kitaev et al., [2020](https://arxiv.org/html/2606.02946#bib.bib37)), and Informer (Zhou et al., [2021](https://arxiv.org/html/2606.02946#bib.bib38)); and **Multiple Instance Learning (MIL) methods**, including MIL-LET (Early et al., [2024](https://arxiv.org/html/2606.02946#bib.bib39)), TimeMIL (Chen et al., [2024](https://arxiv.org/html/2606.02946#bib.bib40)), TAIL-MIL (Jang and Kwon, [2025](https://arxiv.org/html/2606.02946#bib.bib41)), and the SOTA AC-MIL (Qiao et al., [2026](https://arxiv.org/html/2606.02946#bib.bib50)).
*(b) OOD Plug-ins.* We compare LPCD with representative OOD generalization plug-ins from four paradigms: **Invariant Learning (IL)**, including IRM (Arjovsky et al., [2019](https://arxiv.org/html/2606.02946#bib.bib16)), VREx (Krueger et al., [2021](https://arxiv.org/html/2606.02946#bib.bib17)), and IB-IRM (Ahuja et al., [2021](https://arxiv.org/html/2606.02946#bib.bib42)); **Data Augmentation and Alignment (DA)**, including Mixup (Yan et al., [2020](https://arxiv.org/html/2606.02946#bib.bib43)) and CORAL (Sun and Saenko, [2016](https://arxiv.org/html/2606.02946#bib.bib44)); **Distributionally Robust Optimization (DRO)**, including GroupDRO (Sagawa et al., [2020](https://arxiv.org/html/2606.02946#bib.bib45)) and ASGDRO (Kim et al., [2025](https://arxiv.org/html/2606.02946#bib.bib46)); and **Environment Inference (EI)**, including EIIL (Creager et al., [2021](https://arxiv.org/html/2606.02946#bib.bib47)) and FOIL (Liu et al., [2024](https://arxiv.org/html/2606.02946#bib.bib48)). More baseline details can be found in Appendix [A](https://arxiv.org/html/2606.02946#A1).
#### 5\.1\.3\.Implementation Details\.
All models are trained using AdamW (Loshchilov and Hutter, [2019](https://arxiv.org/html/2606.02946#bib.bib49)) with both the learning rate and the weight decay set to $1\mathrm{e}{-4}$. The session embedding dimension is set to 128, while the disentangled representations $\mathbf{z}_{\mathrm{intent}}$ and $\mathbf{z}_{\mathrm{pack}}$ are both 32-dimensional. The causal consistency loss weights $\lambda_{\mathrm{CCD}}^{\mathrm{rep}}$ and $\lambda_{\mathrm{CCD}}^{\mathrm{pred}}$ are selected via grid search over $\{0.5, 1.0, 2.0\}$ and $\{0.05, 0.1, 0.2, 0.5, 1.0\}$, respectively. Hyperparameter sensitivity results are provided in Appendix [B.1](https://arxiv.org/html/2606.02946#A2.SS1).
Models are trained for up to 100 epochs with a batch size of 128 and an early stopping patience of 20. To stabilize optimization, only the primary BCE loss $\mathcal{L}_{\mathrm{main}}$ is optimized during the first 5 warm-up epochs. Following AC-MIL (Qiao et al., [2026](https://arxiv.org/html/2606.02946#bib.bib50)), all backbone architectures use a dropout rate of 0.1. The margin hyperparameter $m$ and the momentum coefficient $\alpha$ are fixed at 1.0 and 0.1, respectively. We set $\lambda_{\mathrm{rec}} = 1.0$, while $\lambda_{\mathrm{ortho}}$ is set to $5\mathrm{e}{-4}$ for May and $1\mathrm{e}{-3}$ for June.
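The training schedule described above can be collected into a small configuration sketch. The `TrainConfig` container, zero-based epoch indexing, and the assumption that early stopping monitors a validation metric where higher is better are illustrative choices; the text does not state which metric is monitored.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class TrainConfig:
    """Values reported in the implementation details (May setting)."""
    learning_rate: float = 1e-4
    weight_decay: float = 1e-4
    max_epochs: int = 100
    batch_size: int = 128
    patience: int = 20
    warmup_epochs: int = 5
    dropout: float = 0.1
    lambda_rec: float = 1.0
    lambda_ortho: float = 5e-4  # 1e-3 for the June dataset


def aux_loss_scale(epoch: int, cfg: TrainConfig) -> float:
    """0.0 during warm-up (only the BCE loss is optimized), 1.0 afterwards.

    Assumes epochs are counted from zero.
    """
    return 0.0 if epoch < cfg.warmup_epochs else 1.0


def should_stop(val_history: Sequence[float], patience: int) -> bool:
    """Stop once `patience` epochs pass without beating the best value.

    Assumes a higher validation value is better.
    """
    best_epoch = max(range(len(val_history)), key=val_history.__getitem__)
    return len(val_history) - 1 - best_epoch >= patience


cfg = TrainConfig()
```

Multiplying every auxiliary weight in Eq. (8) by `aux_loss_scale(epoch, cfg)` is one simple way to realize the warm-up phase; other gating schemes would also be consistent with the description.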
#### 5\.1\.4\.Evaluation Metrics\.
In all experiments, we report **PR-AUC**, **F1-score**, **R@0.1FPR**, and **FPR@0.9R**. PR-AUC and F1-score assess performance under class imbalance, where PR-AUC is preferred over ROC-AUC for its sensitivity to positive cases. R@0.1FPR reports recall at a fixed false positive rate of 10%, while FPR@0.9R measures the false positive rate at 90% recall. These threshold-based metrics align with practical moderation requirements by balancing high-risk coverage and false alarm control.
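The two threshold-based metrics can be computed by sweeping the decision threshold over the sorted scores. This sketch assumes no interpolation between achievable operating points and that both classes are present; the exact convention used in the paper's evaluation code is not stated, so results may differ slightly at the boundaries.

```python
from typing import List, Sequence, Tuple


def roc_points(y_true: Sequence[int], scores: Sequence[float]) -> List[Tuple[float, float]]:
    """Achievable (FPR, recall) pairs as the threshold is lowered."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Consume every sample tied at this score so ties are never split.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1] == 1:
                tp += 1
            else:
                fp += 1
            i += 1
        points.append((fp / n_neg, tp / n_pos))
    return points


def recall_at_fpr(y_true: Sequence[int], scores: Sequence[float],
                  max_fpr: float = 0.1) -> float:
    """R@0.1FPR: best recall among operating points with FPR <= max_fpr."""
    return max(tpr for fpr, tpr in roc_points(y_true, scores) if fpr <= max_fpr)


def fpr_at_recall(y_true: Sequence[int], scores: Sequence[float],
                  min_recall: float = 0.9) -> float:
    """FPR@0.9R: lowest FPR among operating points with recall >= min_recall."""
    return min(fpr for fpr, tpr in roc_points(y_true, scores) if tpr >= min_recall)


# Toy example: 3 positives and 10 negatives, already sorted by score.
labels = [1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.35, 0.3, 0.25, 0.2, 0.15, 0.1, 0.05]
r_at_01 = recall_at_fpr(labels, scores)
fpr_at_09 = fpr_at_recall(labels, scores)
```

In the toy example, allowing one false positive out of ten negatives recovers two of the three positives, and reaching at least 90% recall requires admitting a second false positive.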
### 5\.2\.Overall Performance \(RQ1\)
Table [2](https://arxiv.org/html/2606.02946#S5.T2) reports the overall performance on the May and June datasets, covering both ID and OOD evaluation settings. We summarize four key observations.
**LPCD consistently outperforms all baselines across datasets and distribution settings.** On both the May and June datasets, LPCD achieves the best result on all four metrics under both ID and OOD test settings. These gains hold across different temporal splits and evaluation criteria, indicating that LPCD provides a stable and general performance improvement.
**LPCD exhibits amplified advantages under tactical OOD shifts.** We observe a universal performance degradation for all models as the temporal gap increases; e.g., the PR-AUC of AC-MIL drops by 6.2%–8.2% when transitioning to the OOD sets. However, LPCD's relative advantages become markedly more pronounced in these challenging scenarios. On the May OOD set, LPCD improves PR-AUC by 3.6% over AC-MIL and 2.2% over the strongest OOD plug-in, with even larger relative gains on F1-score (+6.2% over AC-MIL). This widening gap supports our claim that LPCD is particularly effective under *tactical OOD* conditions.
**LPCD surpasses generic OOD plug-ins through specialized causal intervention.** LPCD notably outperforms a wide spectrum of OOD techniques with the same backbone. While these baselines aim to improve robustness via generic regularization or implicit environment inference, LPCD explicitly intervenes on latent narrative packaging to enforce counterfactual consistency. The persistent performance gap indicates that LPCD captures complementary causal structures that generic OOD heuristics fail to model.
**LPCD delivers superior recall–false-alarm trade-offs for real-world moderation.** Beyond aggregate metrics, LPCD achieves consistent improvements on threshold-sensitive indicators critical to industrial systems. Across both datasets, LPCD increases R@0.1FPR while simultaneously reducing FPR@0.9R. Notably, the 18.7% relative reduction in FPR@0.9R over AC-MIL on the June OOD set demonstrates LPCD's ability to substantially reduce moderation burden under severe tactical shifts.
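The relative changes quoted in these observations follow from the entries of Table 2. As a small worked check, assuming the gains are computed as (new − base) / base, the snippet below recomputes a few of them; the helper name `relative_change` is introduced only for this example.

```python
def relative_change(new: float, base: float) -> float:
    """Relative change of `new` with respect to `base`, in percent."""
    return 100.0 * (new - base) / base


# LPCD vs. AC-MIL on the May OOD test set (Table 2).
may_ood_auc_gain = relative_change(0.7300, 0.7045)
may_ood_f1_gain = relative_change(0.6828, 0.6428)

# LPCD vs. AC-MIL on the June OOD test set, FPR@0.9R (lower is better).
june_ood_fpr_change = relative_change(0.1732, 0.2130)

# AC-MIL PR-AUC when moving from the ID to the OOD test set.
acmil_may_drop = relative_change(0.7045, 0.7676)
acmil_june_drop = relative_change(0.6858, 0.7311)
```

Rounded to one decimal place, these reproduce the +3.6%, +6.2%, and −18.7% figures, as well as the 8.2% and 6.2% drops of AC-MIL.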
### 5\.3\.Ablation Study \(RQ2\)
Table 3. Ablation results on the June OOD test set. $\mathcal{L}_{\mathrm{dis}} = \{\mathcal{L}_{\mathrm{rec}}, \mathcal{L}_{\mathrm{ortho}}\}$ and $\mathcal{L}_{\mathrm{ccd}} = \{\mathcal{L}_{\mathrm{CCD}}^{\mathrm{rep}}, \mathcal{L}_{\mathrm{CCD}}^{\mathrm{pred}}\}$. TT-Calibration refers to Post-hoc Magnitude Calibration at inference.

| Variant | PR-AUC↑ | F1-score↑ | R@0.1FPR↑ | FPR@0.9R↓ |
|---|---|---|---|---|
| Backbone (AC-MIL) | 0.6858 | 0.6235 | 0.7957 | 0.2130 |
| LPCD w/o $\mathcal{L}_{\mathrm{dis}}$ (Only $\mathcal{L}_{\mathrm{ccd}}$) | 0.6881 | 0.6236 | 0.7957 | 0.2117 |
| LPCD w/o $\mathcal{L}_{\mathrm{ccd}}$ (Only $\mathcal{L}_{\mathrm{dis}}$) | 0.6889 | 0.6311 | 0.7977 | 0.2269 |
| LPCD w/o $\mathcal{L}_{\mathrm{rec}}$ | 0.6812 | 0.6207 | 0.7910 | 0.2193 |
| LPCD w/o $\mathcal{L}_{\mathrm{ortho}}$ | 0.6853 | 0.6240 | 0.7883 | 0.2179 |
| LPCD w/o $\mathcal{L}_{\mathrm{CCD}}^{\mathrm{rep}}$ | 0.6945 | 0.6393 | 0.8064 | 0.2179 |
| LPCD w/o $\mathcal{L}_{\mathrm{CCD}}^{\mathrm{pred}}$ | 0.6929 | 0.6357 | 0.8024 | 0.2132 |
| LPCD w/o TT-Calibration | 0.7053 | 0.6388 | 0.8178 | 0.2041 |
| LPCD | 0.7287 | 0.6779 | 0.8600 | 0.1732 |
To analyze the contribution of each component in LPCD, we conduct an ablation study on the June OOD test set, as shown in Table[3](https://arxiv.org/html/2606.02946#S5.T3)\. More ablation results on test\-time calibration can be found in Appendix[B\.2](https://arxiv.org/html/2606.02946#A2.SS2)\.
**Decoupling and intervention are mutually dependent.** When either the disentanglement losses ($\mathcal{L}_{\mathrm{dis}}$) or the counterfactual losses ($\mathcal{L}_{\mathrm{ccd}}$) are removed, the resulting variant improves only marginally over the AC-MIL backbone (PR-AUC of 0.6881 and 0.6889 vs. 0.6858). This indicates that effective intervention relies on explicitly decoupled representations, while decoupling alone is insufficient without counterfactual supervision.
**Partial decoupling is detrimental.** Removing a single decoupling constraint ($\mathcal{L}_{\mathrm{rec}}$ or $\mathcal{L}_{\mathrm{ortho}}$) causes a larger performance drop than removing both (PR-AUC of 0.6812 and 0.6853 vs. 0.6881). This suggests that inconsistent decoupling introduces a harmful inductive bias, whereas removing both allows the model to fall back to a stable but non-causal representation.
**Both representation- and prediction-level CCD are required.** Ablating either $\mathcal{L}_{\mathrm{CCD}}^{\mathrm{rep}}$ or $\mathcal{L}_{\mathrm{CCD}}^{\mathrm{pred}}$ degrades performance relative to the full model (PR-AUC of 0.6945 and 0.6929 vs. 0.7287), confirming that robustness to tactical shifts must be enforced at both the latent representation and final decision stages.
**Test-time calibration matters.** Removing test-time calibration reduces PR-AUC from 0.7287 to 0.7053, showing that calibration serves as an effective last-mile adjustment for residual packaging shifts at inference time.
### 5\.4\.Efficiency Study \(RQ3\)
Table 4. Efficiency comparison between LPCD and a *Retraining Oracle* on the June OOD test set (10/16–10/17). Retraining cost and inference latency are reported as wall-clock time measured in offline experiments. Inference latency is averaged over three runs on the full test set (16,722 samples). Metrics: PR-AUC (AUC), F1-score (F1), R@0.1FPR (R.1), and FPR@0.9R (FPR.9).

| Method | AUC↑ | F1↑ | R.1↑ | FPR.9↓ | Retrain Time | Inf. Latency |
|---|---|---|---|---|---|---|
| AC-MIL (Fixed) | 0.6858 | 0.6235 | 0.7957 | 0.2130 | – | 714 s |
| AC-MIL (Oracle) | 0.7303 | 0.6603 | 0.8231 | 0.2016 | 21.8 h | 717 s |
| LPCD (Fixed) | 0.7287 | 0.6779 | 0.8600 | 0.1732 | – | 654 s |

To evaluate efficiency under label latency, we compare LPCD on the June OOD test set (10/16–10/17) with a *Retraining Oracle* that fully retrains the backbone using the latest labeled data. The oracle is retrained on data from 10/08–10/14 with validation on 10/15, while LPCD is applied to a fixed model trained four months earlier (06/04–06/10), without any parameter updates.
As shown in Table [4](https://arxiv.org/html/2606.02946#S5.T4), LPCD achieves performance comparable to the retraining oracle with zero retraining cost. Although the oracle slightly outperforms LPCD on PR-AUC (0.7303 vs. 0.7287), LPCD performs better on all operational metrics (F1, R@0.1FPR, and FPR@0.9R). This indicates that LPCD improves decision quality under strict operating constraints, rather than merely adapting to recent class prevalence. Moreover, LPCD reduces inference latency (654 s vs. 714 s for the fixed AC-MIL). This benefit comes from its decoupled heads operating on compact intent and packaging representations ($32+32$ dimensions), instead of the high-dimensional ($128$) backbone features. Overall, LPCD provides a robust and efficient alternative to frequent retraining for real-time risk detection.
### 5\.5\.Case Study \(RQ4\)
Figure 3. t-SNE visualization of decoupled representations. Packaging representations separate sessions by surface tactics, while intent representations align sessions sharing the same underlying malicious objective.

To examine the effect of causal decoupling, we present a case study on two prevalent deceptive tactics: **Handicraft Jobs** (fake home-based work recruitment) and **Deceptive Sales** (luxury goods offered at extremely low prices). As shown in Figure [3](https://arxiv.org/html/2606.02946#S5.F3)(a), these sessions form well-separated clusters in the **Packaging Space**, reflecting their distinct surface presentations. In contrast, Figure [3](https://arxiv.org/html/2606.02946#S5.F3)(b) shows that the same sessions collapse into a compact manifold in the **Intent Space**. Despite divergent packaging, both tactics share the same underlying causal intent, *off-platform redirection*, which leads to subsequent actual scams. By stripping away volatile packaging signals, LPCD isolates this invariant risk core, explaining its robustness to unseen tactical variants.
### 5\.6\.Generality Study \(RQ5\)
Table 5. Generality study of LPCD across diverse backbone architectures on the June OOD set. Metrics: PR-AUC (AUC), F1-score (F1), R@0.1FPR (R.1), and FPR@0.9R (FPR.9).

| Backbone | Variant | AUC↑ | F1↑ | R.1↑ | FPR.9↓ | Gain (AUC) |
|---|---|---|---|---|---|---|
| Transformer | Vanilla | 0.6208 | 0.5907 | 0.7636 | 0.2545 | – |
| Transformer | +LPCD | 0.6573 | 0.6148 | 0.7942 | 0.2232 | +5.9% |
| Reformer | Vanilla | 0.6189 | 0.5967 | 0.7562 | 0.2638 | – |
| Reformer | +LPCD | 0.6683 | 0.6362 | 0.8172 | 0.2301 | +8.0% |
| TimeMIL | Vanilla | 0.6316 | 0.5983 | 0.7763 | 0.2288 | – |
| TimeMIL | +LPCD | 0.6779 | 0.6493 | 0.8360 | 0.1949 | +7.3% |
| TAIL-MIL | Vanilla | 0.6365 | 0.5869 | 0.7776 | 0.2391 | – |
| TAIL-MIL | +LPCD | 0.6826 | 0.6455 | 0.8327 | 0.1956 | +7.2% |
To evaluate the plug\-and\-play capability of LPCD, we integrate it with diverse backbone architectures, including sequence models \(Transformer, Reformer\) and MIL\-based frameworks \(TimeMIL, TAIL\-MIL\)\. As depicted in Table[5](https://arxiv.org/html/2606.02946#S5.T5), LPCD consistently improves all backbones on the June OOD set\. In particular, LPCD achieves \+5\.9% to \+8\.0% relative PR\-AUC gains over the vanilla counterparts, while substantially reducing false positives at high recall\.
These consistent gains across both attention\-based and pooling\-based models indicate that LPCD operates as a model\-agnostic plug\-in rather than an architecture\-dependent design\. This suggests that decoupling invariant intent from transient surface behaviors generalizes well across backbone choices and can be applied to existing moderation systems without architectural changes\.
### 5\.7\.Online Test \(RQ6\)
Table 6. Performance on real-world production traffic (01/18/26–01/19/26). Metrics are computed on logs with a 1:10 positive-to-negative sampling ratio. LPCD significantly outperforms the incumbent Transformer and XGBoost models.

| Method | PR-AUC↑ | F1-score↑ | R@0.1FPR↑ | FPR@0.9R↓ |
|---|---|---|---|---|
| XGBoost | 0.4229 | 0.4281 | 0.5637 | 0.5779 |
| Transformer | 0.5855 | 0.6107 | 0.7525 | 0.2287 |
| LPCD | 0.6578 | 0.6690 | 0.8410 | 0.1625 |

To assess the real-world impact of LPCD, we evaluate it in an A/B test on the production traffic of a major live streaming platform. As summarized in Table [6](https://arxiv.org/html/2606.02946#S5.T6), LPCD consistently outperforms the incumbent XGBoost and Transformer models across all metrics, achieving an R@0.1FPR of 0.8410 and an FPR@0.9R of 0.1625, compared with 0.7525 and 0.2287 for the Transformer. These results demonstrate that LPCD's causal decoupling mechanism effectively generalizes to the complex and unpredictable tactical OOD shifts in live environments. By maintaining high recall while suppressing false alarms, LPCD significantly reduces the manual moderation burden and enhances the overall safety of the platform in an industrial-scale deployment.
## 6\.Conclusion
In this paper, we identify and address the challenge of **tactical out-of-distribution (OOD) shift** in live streaming risk assessment: a strategic adversarial scenario where malicious actors evolve narrative packaging while maintaining stable objectives. We propose **LPCD**, a plug-in framework that leverages a latent causal perspective to disentangle invariant intent from volatile packaging. By enforcing latent counterfactual consistency at both the representation and prediction levels and applying inference-time calibration, LPCD effectively anchors risk detection on stable causal signals, bypassing the need for environment boundaries or raw-level counterfactuals.
Extensive offline experiments and online validation on large\-scale industrial traffic demonstrate that LPCD not only achieves superior robustness against evolving tactics but also maintains the efficiency required for real\-world moderation\. Our work highlights the importance of causal disentanglement in adversarial environments and provides a scalable solution for building robust, intent\-focused risk assessment systems\.
###### Acknowledgements\.
The research work is supported by the National Natural Science Foundation of China under Grant Nos\. U2436209, 62576333, and 62406307, the Strategic Priority Research Program of the Chinese Academy of Sciences under Grant No\. XDB0680201, the Beijing Natural Science Foundation \(F251001\), and the Innovation Funding of ICT, CAS under Grant No\. E461060\.
## References
- K\. Ahuja, E\. Caballero, D\. Zhang, J\. Gagnon\-Audet, Y\. Bengio, I\. Mitliagkas, and I\. Rish \(2021\)Invariance principle meets information bottleneck for out\-of\-distribution generalization\.Advances in Neural Information Processing Systems34,pp\. 3438–3450\.Cited by:[3rd item](https://arxiv.org/html/2606.02946#A1.I3.i3.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.02946#S5.SS1.SSS2.p2.1)\.
- M\. Arjovsky, L\. Bottou, I\. Gulrajani, and D\. Lopez\-Paz \(2019\)Invariant risk minimization\.arXiv preprint arXiv:1907\.02893\.Cited by:[1st item](https://arxiv.org/html/2606.02946#A1.I3.i1.p1.1),[§1](https://arxiv.org/html/2606.02946#S1.p4.1),[§2\.2](https://arxiv.org/html/2606.02946#S2.SS2.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.02946#S5.SS1.SSS2.p2.1)\.
- K\. Bousmalis, G\. Trigeorgis, N\. Silberman, D\. Krishnan, and D\. Erhan \(2016\)Domain separation networks\.Advances in neural information processing systems29\.Cited by:[§4\.2](https://arxiv.org/html/2606.02946#S4.SS2.p5.1)\.
- T\. Chen, S\. Kornblith, M\. Norouzi, and G\. Hinton \(2020\)Simclr: a simple framework for contrastive learning of visual representations\.InInternational Conference on Learning Representations,Vol\.2\.Cited by:[§4\.3\.1](https://arxiv.org/html/2606.02946#S4.SS3.SSS1.p5.2)\.
- X\. Chen, P\. Qiu, W\. Zhu, H\. Li, H\. Wang, A\. Sotiras, Y\. Wang, and A\. Razi \(2024\)TimeMIL: advancing multivariate time series classification via a time\-aware multiple instance learning\.InProceedings of the 41st International Conference on Machine Learning,pp\. 7190–7206\.Cited by:[2nd item](https://arxiv.org/html/2606.02946#A1.I2.i2.p1.1),[§5\.1\.2](https://arxiv.org/html/2606.02946#S5.SS1.SSS2.p1.1)\.
- D\. Cheng, Y\. Zou, S\. Xiang, and C\. Jiang \(2025\)Graph neural networks for financial fraud detection: a review\.Frontiers of Computer Science19\(9\),pp\. 1–15\.Cited by:[§2\.1](https://arxiv.org/html/2606.02946#S2.SS1.p1.1)\.
- E\. Creager, J\. Jacobsen, and R\. Zemel \(2021\) Environment inference for invariant learning\. In International Conference on Machine Learning, pp\. 2189–2200\.
- Y\. Dou, Z\. Liu, L\. Sun, Y\. Deng, H\. Peng, and P\. S\. Yu \(2020\) Enhancing graph neural network\-based fraud detectors against camouflaged fraudsters\. In Proceedings of the 29th ACM international conference on information & knowledge management, pp\. 315–324\.
- J\. Early, G\. K\. Cheung, K\. Cutajar, H\. Xie, J\. Kandola, and N\. Twomey \(2024\) Inherently interpretable time series classification via multiple instance learning\. In ICLR\.
- A\. Feder, K\. A\. Keith, E\. Manzoor, R\. Pryzant, D\. Sridhar, Z\. Wood\-Doughty, J\. Eisenstein, J\. Grimmer, R\. Reichart, M\. E\. Roberts, et al\. \(2022\) Causal inference in natural language processing: estimation, prediction, interpretation and beyond\. Transactions of the Association for Computational Linguistics 10, pp\. 1138–1158\.
- J\. Guo, G\. Liu, Y\. Zuo, and J\. Wu \(2018\) Learning sequential behavior representations for fraud detection\. In 2018 IEEE international conference on data mining \(ICDM\), pp\. 127–136\.
- I\. Higgins, L\. Matthey, A\. Pal, C\. Burgess, X\. Glorot, M\. Botvinick, S\. Mohamed, and A\. Lerchner \(2017\) Beta\-vae: learning basic visual concepts with a constrained variational framework\. In International conference on learning representations\.
- M\. Huang, Y\. Liu, X\. Ao, K\. Li, J\. Chi, J\. Feng, H\. Yang, and Q\. He \(2022\) Auc\-oriented graph neural network for fraud detection\. In Proceedings of the ACM web conference 2022, pp\. 1311–1321\.
- J\. Jang and H\. Kwon \(2025\) TAIL\-mil: time\-aware and instance\-learnable multiple instance learning for multivariate time series anomaly detection\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 17582–17589\.
- H\. Kim and A\. Mnih \(2018\) Disentangling by factorising\. In International conference on machine learning, pp\. 2649–2658\.
- T\. Kim, S\. Park, S\. Lim, Y\. Jung, K\. Muandet, and K\. Song \(2025\) Sufficient invariant learning for distribution shift\. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp\. 4958–4967\.
- N\. Kitaev, Ł\. Kaiser, and A\. Levskaya \(2020\) Reformer: the efficient transformer\. arXiv preprint arXiv:2001\.04451\.
- D\. Krueger, E\. Caballero, J\. Jacobsen, A\. Zhang, J\. Binas, D\. Zhang, R\. Le Priol, and A\. Courville \(2021\) Out\-of\-distribution generalization via risk extrapolation \(rex\)\. In International conference on machine learning, pp\. 5815–5826\.
- A\. Lees, V\. Q\. Tran, Y\. Tay, J\. Sorensen, J\. Gupta, D\. Metzler, and L\. Vasserman \(2022\) A new generation of perspective api: efficient multilingual character\-level transformers\. In Proceedings of the 28th ACM SIGKDD conference on knowledge discovery and data mining, pp\. 3197–3207\.
- Y\. Li, N\. Wang, J\. Shi, X\. Hou, and J\. Liu \(2018\) Adaptive batch normalization for practical domain adaptation\. Pattern Recognition 80, pp\. 109–117\.
- Z\. Li, H\. Wang, P\. Zhang, P\. Hui, J\. Huang, J\. Liao, J\. Zhang, and J\. Bu \(2021\) Live\-streaming fraud detection: a heterogeneous graph neural network approach\. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, pp\. 3670–3678\.
- C\. Liu, X\. Sun, J\. Wang, H\. Tang, T\. Li, T\. Qin, W\. Chen, and T\. Liu \(2021a\) Learning causal semantic representation for out\-of\-distribution prediction\. Advances in Neural Information Processing Systems 34, pp\. 6155–6170\.
- H\. Liu, H\. Kamarthi, L\. Kong, Z\. Zhao, C\. Zhang, and B\. A\. Prakash \(2024\) Time\-series forecasting for out\-of\-distribution generalization using invariant learning\. In Proceedings of the 41st International Conference on Machine Learning, pp\. 31312–31325\.
- J\. Liu, Z\. Shen, Y\. He, X\. Zhang, R\. Xu, H\. Yu, and P\. Cui \(2021b\) Towards out\-of\-distribution generalization: a survey\. arXiv preprint arXiv:2108\.13624\.
- Y\. Liu, Q\. Zhou, H\. Li, F\. Zhuang, and J\. Gu \(2025\) Long\-term urban flow prediction against data distribution shift: a causal perspective\. IEEE Transactions on Knowledge and Data Engineering\.
- I\. Loshchilov and F\. Hutter \(2019\) Decoupled weight decay regularization\. In International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=Bkg6RiCqY7)\.
- X\. Lu, T\. Zhang, C\. Meng, X\. Wang, J\. Wang, Y\. Zhang, S\. Tang, C\. Liu, H\. Ding, K\. Jiang, et al\. \(2025\) Vlm as policy: common\-law content moderation framework for short video platform\. In Proceedings of the 31st ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\. 2, pp\. 4682–4693\.
- D\. Mahajan, S\. Tople, and A\. Sharma \(2021\) Domain generalization using causal matching\. In International conference on machine learning, pp\. 7313–7324\.
- K\. Oublal, S\. Ladjal, D\. Benhaiem, E\. LE BORGNE, and F\. Roueff \(2024\) Disentangling time series representations via contrastive independence\-of\-support on l\-variational inference\. In The Twelfth International Conference on Learning Representations\.
- J\. Pearl \(2009\) Causality\. Cambridge university press\.
- Y\. Qiao, J\. Chen, X\. Ao, Q\. Zhong, Y\. Liu, and Q\. He \(2026\) Live or lie: action\-aware capsule multiple instance learning for risk assessment in live streaming platforms\. In Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V\. 1, pp\. 1182–1193\.
- Y\. Qiao, Y\. Tang, X\. Ao, Q\. Yuan, Z\. Liu, C\. Shen, and X\. Zheng \(2024\) Financial Risk Assessment via Long\-term Payment Behavior Sequence Folding\. In 2024 IEEE International Conference on Data Mining \(ICDM\), Los Alamitos, CA, USA, pp\. 410–419\. External Links: [Document](https://dx.doi.org/10.1109/ICDM59182.2024.00048), [Link](https://doi.ieeecomputersociety.org/10.1109/ICDM59182.2024.00048)\.
- Y\. Qiao, N\. Wang, Y\. Gao, Y\. Yang, X\. Fu, W\. Wang, and X\. Ao \(2025\) Online fraud detection via test\-time retrieval\-based representation enrichment\. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol\. 39, pp\. 12470–12478\.
- S\. Sagawa\*, P\. W\. Koh\*, T\. B\. Hashimoto, and P\. Liang \(2020\) Distributionally robust neural networks\. In International Conference on Learning Representations\. External Links: [Link](https://openreview.net/forum?id=ryxGuJrFvS)\.
- A\. Sauer and A\. Geiger \(2021\) Counterfactual generative networks\. In International Conference on Learning Representations\.
- F\. Schroff, D\. Kalenichenko, and J\. Philbin \(2015\) Facenet: a unified embedding for face recognition and clustering\. In Proceedings of the IEEE conference on computer vision and pattern recognition, pp\. 815–823\.
- F\. Shi, Y\. Cao, Y\. Shang, Y\. Zhou, C\. Zhou, and J\. Wu \(2022\) H2\-fdetector: a gnn\-based fraud detector with homophilic and heterophilic connections\. In Proceedings of the ACM web conference 2022, pp\. 1486–1494\.
- B\. Sun and K\. Saenko \(2016\) Deep coral: correlation alignment for deep domain adaptation\. In European conference on computer vision, pp\. 443–450\.
- A\. Vaswani, N\. Shazeer, N\. Parmar, J\. Uszkoreit, L\. Jones, A\. N\. Gomez, Ł\. Kaiser, and I\. Polosukhin \(2017\) Attention is all you need\. Advances in neural information processing systems 30\.
- V\. Veitch, A\. D’Amour, S\. Yadlowsky, and J\. Eisenstein \(2021\) Counterfactual invariance to spurious correlations in text classification\. Advances in neural information processing systems 34, pp\. 16196–16208\.
- Z\. Wang, Q\. Wu, B\. Zheng, J\. Wang, K\. Huang, and Y\. Shi \(2023\) Sequence as genes: an user behavior modeling framework for fraud transaction detection in e\-commerce\. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp\. 5194–5203\.
- Z\. Wang, Y\. Sun, H\. Wang, B\. Jing, X\. Shen, X\. L\. Dong, Z\. Hao, H\. Xiong, and Y\. Song \(2025\) Reasoning\-enhanced domain\-adaptive pretraining of multimodal large language models for short video content governance\. In Proceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: Industry Track, pp\. 1104–1112\.
- X\. Wu, F\. Teng, X\. Li, J\. Zhang, Q\. Duan, and T\. Li \(2026\) Out\-of\-distribution generalization in time series: a survey\. Information Fusion, pp\. 104336\.
- F\. Xiao, S\. Cai, G\. Chen, H\. Jagadish, B\. C\. Ooi, and M\. Zhang \(2024\) VecAug: unveiling camouflaged frauds with cohort augmentation for enhanced detection\. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, pp\. 6025–6036\.
- S\. Yan, H\. Song, N\. Li, L\. Zou, and L\. Ren \(2020\) Improve unsupervised domain adaptation with mixup training\. arXiv preprint arXiv:2001\.00677\.
- S\. Zannettou, M\. ElSherief, E\. Belding, S\. Nilizadeh, and G\. Stringhini \(2020\) Measuring and characterizing hate speech on news websites\. In Proceedings of the 12th ACM conference on web science, pp\. 125–134\.
- C\. Zhang, K\. Zhang, and Y\. Li \(2020\) A causal view on robustness of neural networks\. Advances in Neural Information Processing Systems 33, pp\. 289–301\.
- H\. Zhou, S\. Zhang, J\. Peng, S\. Zhang, J\. Li, H\. Xiong, and W\. Zhang \(2021\) Informer: beyond efficient transformer for long sequence time\-series forecasting\. In Proceedings of the AAAI conference on artificial intelligence, Vol\. 35, pp\. 11106–11115\.
- K\. Zhou, Z\. Liu, Y\. Qiao, T\. Xiang, and C\. C\. Loy \(2022\) Domain generalization: a survey\. IEEE transactions on pattern analysis and machine intelligence 45 \(4\), pp\. 4396–4415\.
## Appendix A Baseline Details
First, we adopt two categories of backbone models as candidates to validate the effectiveness of LPCD\. \(i\) Sequence Models explicitly model the action sequences of sessions:
- Transformer \(Vaswani et al\., [2017](https://arxiv.org/html/2606.02946#bib.bib36)\) serves as a standard self\-attention baseline\.
- Reformer \(Kitaev et al\., [2020](https://arxiv.org/html/2606.02946#bib.bib37)\) improves efficiency via locality\-sensitive hashing\.
- Informer \(Zhou et al\., [2021](https://arxiv.org/html/2606.02946#bib.bib38)\) further scales to long sequences through sparse attention and representation distillation\.
\(ii\) MIL methods aggregate instance\-level signals into session\-level predictions, where each instance corresponds to a per\-user action subsequence within a 100\-second window:
- MIL\-LET \(Early et al\., [2024](https://arxiv.org/html/2606.02946#bib.bib39)\) introduces an MIL formulation for time\-series classification that provides localized interpretability\.
- TimeMIL \(Chen et al\., [2024](https://arxiv.org/html/2606.02946#bib.bib40)\) introduces temporal awareness via learnable wavelet\-based positional encodings\.
- TAIL\-MIL \(Jang and Kwon, [2025](https://arxiv.org/html/2606.02946#bib.bib41)\) extends MIL to multivariate time\-series modeling using a 2D formulation\.
- AC\-MIL \(Qiao et al\., [2026](https://arxiv.org/html/2606.02946#bib.bib50)\) is a domain\-specific MIL framework for live\-streaming risk assessment that jointly models user\-level and temporal patterns\.
Second, we compare LPCD with four types of plug\-in methods for OOD generalization to show its superiority\. \(i\) Invariant Learning \(IL\) methods aim to capture invariant causal relationships across different environments by penalizing unstable correlations:
- IRM \(Arjovsky et al\., [2019](https://arxiv.org/html/2606.02946#bib.bib16)\) introduces a gradient\-based penalty to ensure the optimal classifier is consistent across all training environments\.
- VREx \(Krueger et al\., [2021](https://arxiv.org/html/2606.02946#bib.bib17)\) reduces the variance of risks across environments to achieve better generalization under distribution shifts\.
- IB\-IRM \(Ahuja et al\., [2021](https://arxiv.org/html/2606.02946#bib.bib42)\) combines the Information Bottleneck principle with IRM to filter out environment\-specific noise while preserving invariant features\.
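As a rough illustration of the variance-based idea behind VREx, the sketch below adds a penalty on the spread of per-environment risks to the average risk. The function name `vrex_objective`, the weight `beta`, the use of the population variance, and the toy risk values are all assumptions made for illustration; the original VREx formulation may differ by a constant factor on the risk term, and this is not the configuration used in the paper.

```python
from statistics import fmean, pvariance


def vrex_objective(env_risks, beta=1.0):
    """Average risk plus beta times the variance of per-environment risks.

    env_risks: one scalar training risk per environment (hypothetical input).
    beta: penalty weight (hypothetical hyperparameter, not from the paper).
    """
    if len(env_risks) < 2:
        raise ValueError("need at least two environments to measure variance")
    return fmean(env_risks) + beta * pvariance(env_risks)


# Toy risks for four training environments (made-up numbers).
toy_risks = [0.42, 0.40, 0.47, 0.51]
print(vrex_objective(toy_risks, beta=10.0))
```

When all environments incur the same risk the penalty vanishes, so the objective reduces to ordinary average-risk minimization.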
\(ii\) Data Augmentation and Alignment \(DA\) methods focus on enhancing model robustness by expanding the training distribution or aligning feature\-level statistics:
- Mixup \(Yan et al\., [2020](https://arxiv.org/html/2606.02946#bib.bib43)\) creates vicinal training samples through linear interpolation of feature\-label pairs to smooth decision boundaries\.
- CORAL \(Sun and Saenko, [2016](https://arxiv.org/html/2606.02946#bib.bib44)\) aligns the second\-order statistics \(covariance\) of source and target domain distributions to learn domain\-invariant representations\.
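The covariance alignment used by CORAL can be sketched as a squared Frobenius distance between the feature covariance matrices of two batches. The code below is a minimal, dependency-free illustration; the helper names `covariance` and `coral_loss`, the unbiased (n - 1) covariance estimate, and the 1/(4d²) scaling are assumptions chosen to mirror the commonly cited Deep CORAL form, not details taken from this paper.

```python
def covariance(rows):
    """Unbiased feature covariance matrix of a batch given as a list of rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(d)]
        for i in range(d)
    ]


def coral_loss(source, target):
    """Squared Frobenius distance between covariances, scaled by 1 / (4 d^2)."""
    d = len(source[0])
    cs, ct = covariance(source), covariance(target)
    sq_dist = sum((cs[i][j] - ct[i][j]) ** 2 for i in range(d) for j in range(d))
    return sq_dist / (4 * d * d)


# Toy 2-D feature batches (made-up numbers).
source_batch = [[0.0, 0.0], [2.0, 0.0]]
target_batch = [[0.0, 0.0], [4.0, 0.0]]
print(coral_loss(source_batch, target_batch))
```

Because only covariances are compared, shifting every target feature by a constant leaves this loss unchanged.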
\(iii\) Distributionally Robust Optimization \(DRO\) methods optimize for the worst\-case performance across groups to mitigate spurious correlations and enhance stability:
- GroupDRO \(Sagawa\* et al\., [2020](https://arxiv.org/html/2606.02946#bib.bib45)\) explicitly minimizes the maximum loss across different groups to mitigate the impact of spurious correlations\.
- ASGDRO \(Kim et al\., [2025](https://arxiv.org/html/2606.02946#bib.bib46)\) seeks common flat minima across environments to learn a diverse set of invariant features\.
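One common way to approximate the worst-group objective in practice is to keep a weight per group and shift mass toward groups with high loss. The sketch below shows a single such exponentiated update; `group_dro_step`, the step size `eta`, and the toy losses are hypothetical names and values used only for illustration, and the model update that would follow is omitted.

```python
import math


def group_dro_step(group_losses, weights, eta=0.1):
    """Up-weight high-loss groups, then return the weighted (robust) loss.

    group_losses: current average loss of each group (hypothetical input).
    weights: current group weights, assumed to sum to one.
    eta: step size for the weight update (hypothetical hyperparameter).
    """
    scaled = [w * math.exp(eta * loss) for w, loss in zip(weights, group_losses)]
    total = sum(scaled)
    new_weights = [s / total for s in scaled]
    robust_loss = sum(w * loss for w, loss in zip(new_weights, group_losses))
    return robust_loss, new_weights


# Two toy groups; the second one is currently harder (made-up numbers).
loss, new_w = group_dro_step([0.2, 0.9], [0.5, 0.5], eta=1.0)
print(loss, new_w)
```

As the weights concentrate on the hardest group, the weighted loss approaches the maximum group loss described in the GroupDRO bullet.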
\(iv\) Environment Inference \(EI\) methods tackle the challenge of missing environment labels by automatically discovering latent environmental structures:
- EIIL \(Creager et al\., [2021](https://arxiv.org/html/2606.02946#bib.bib47)\) infers environments by searching for a partition that maximally violates the IRM invariance principle\.
- FOIL \(Liu et al\., [2024](https://arxiv.org/html/2606.02946#bib.bib48)\) identifies latent environments in time\-series data by optimizing for feature\-level stability over temporal segments\.
It is worth noting that methods in the first three categories \(IL, DA, and DRO\) rely on explicit environment annotations during training, whereas EI methods and our proposed LPCD operate without any prior environmental labels\.
Since no ground\-truth environment annotations are available, we follow common practice \(Creager et al\., [2021](https://arxiv.org/html/2606.02946#bib.bib47)\) and construct training environments via temporal partitioning\. Specifically, for the May dataset, the training period \(05/20–06/03\) is divided into four distinct environments: May 20–23, May 24–27, May 28–31, and June 1–3\. For the June dataset, the training window \(06/04–06/10\) is partitioned into three environments: June 4–6, June 7–8, and June 9–10\.
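The temporal partitioning described above amounts to a lookup from a session's date to an environment index. The following sketch encodes the stated date ranges as inclusive (month, day) pairs, since the text gives no year; the names `MAY_ENVS`, `JUNE_ENVS`, and `assign_environment` are hypothetical.

```python
# Inclusive (month, day) ranges copied from the partitioning described in the text.
MAY_ENVS = [((5, 20), (5, 23)), ((5, 24), (5, 27)), ((5, 28), (5, 31)), ((6, 1), (6, 3))]
JUNE_ENVS = [((6, 4), (6, 6)), ((6, 7), (6, 8)), ((6, 9), (6, 10))]


def assign_environment(month_day, env_ranges):
    """Return the index of the environment whose date range contains month_day."""
    for index, (start, end) in enumerate(env_ranges):
        # Tuples compare lexicographically, so (month, day) ordering is chronological.
        if start <= month_day <= end:
            return index
    raise ValueError(f"{month_day} lies outside the training window")


print(assign_environment((5, 25), MAY_ENVS))   # second May environment (index 1)
print(assign_environment((6, 8), JUNE_ENVS))   # second June environment (index 1)
```

The resulting index can then serve as the environment label required by the IL, DA, and DRO baselines.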
## Appendix B Supplementary Experiments
### B\.1\. Hyperparameter Sensitivity Test
We evaluate the sensitivity of our LPCD framework to the two balance hyperparameters in the CCD module: $\lambda_{\mathrm{CCD}}^{\mathrm{rep}}$ and $\lambda_{\mathrm{CCD}}^{\mathrm{pred}}$\. The experiments are conducted on the May dataset, and the results are summarized in Figure [4](https://arxiv.org/html/2606.02946#A2.F4)\.
Sensitivity of $\lambda_{\mathrm{CCD}}^{\mathrm{rep}}$: As shown in Figure [4](https://arxiv.org/html/2606.02946#A2.F4)\(a\), the model performance in both In\-ID and Tactical OOD scenarios remains consistently higher than the AC\-MIL backbone across all tested values\. The PR\-AUC for OOD reaches its peak at $\lambda_{\mathrm{CCD}}^{\mathrm{rep}} = 2.0$ \(0\.7300\), demonstrating that representation\-level consistency is robust to varying regularization strengths\.

Sensitivity of $\lambda_{\mathrm{CCD}}^{\mathrm{pred}}$: Figure [4](https://arxiv.org/html/2606.02946#A2.F4)\(b\) reveals that the model is more sensitive to the prediction\-level consistency weight\. While small values yield the best OOD performance \(peaking at 0\.7460 with $\lambda_{\mathrm{CCD}}^{\mathrm{pred}} = 0.05$\), an excessive penalty \(e\.g\., 1\.0\) leads to performance decay\. This suggests that while predictive consistency helps in decoupling, an excessive penalty may overly constrain the decision boundary\.
Figure 4\. Hyperparameter sensitivity analysis on the May dataset\. Subplots \(a\) and \(b\) illustrate the impact of $\lambda_{\mathrm{CCD}}^{\mathrm{rep}}$ and $\lambda_{\mathrm{CCD}}^{\mathrm{pred}}$ on PR\-AUC, respectively\. Solid lines with markers represent LPCD performance, while dashed horizontal lines represent the corresponding AC\-MIL backbone baselines for ID \(blue\) and OOD \(red\) test sets\.
### B\.2\. Analysis of Post\-hoc Calibration Variants
To evaluate the effectiveness of our proposed Dimensional Magnitude Alignment \(V0\), which serves as the Post\-hoc Magnitude Calibration module \(Section [4\.5](https://arxiv.org/html/2606.02946#S4.SS5)\), we compare it with four alternative parameter\-free calibration variants\. These variants are designed to rectify distributional shifts in the packaging manifold $\mathbf{z}_{\mathrm{pack}}$ as follows:
- V0 \(Dimensional Magnitude Alignment \- Default\): Performs per\-dimension rescaling using a diagonal matrix $\boldsymbol{\Gamma}$: $\tilde{\mathbf{z}}_{\mathrm{pack}} = \boldsymbol{\Gamma}\mathbf{z}_{\mathrm{pack}}$, where $\gamma_{d} = \sigma_{\mathrm{train},d} / \sigma_{\mathrm{test},d}$\. It targets anisotropic magnitude shifts in specific latent dimensions\.
- V1 \(Instance Norm Rescaling\): A sample\-level constraint that forces the $L_{2}$ norm of each representation to match the training average $r_{\mathrm{train}}$: $\tilde{\mathbf{z}}_{\mathrm{pack}} = \mathbf{z}_{\mathrm{pack}} \cdot (r_{\mathrm{train}} / \|\mathbf{z}_{\mathrm{pack}}\|_{2})$\. It ensures energy consistency but ignores dimensional variance\.
- V2 \(Counterfactual Consistency Check\): A reasoning\-level check that compares the factual prediction with a counterfactual one wrapping the same intent in a pre\-defined safe prototype $\bar{\mathbf{z}}_{\mathrm{pack}}^{\mathrm{safe}}$\. The final risk probability is: $\hat{y}_{\mathrm{final}} = \min(\hat{y}_{\mathrm{fact}}, \hat{y}_{\mathrm{cf}})$\.
- V3 \(Centroid Translation Alignment\): A distribution\-level translation that eliminates systemic bias by subtracting the mean drift: $\tilde{\mathbf{z}}_{\mathrm{pack}} = \mathbf{z}_{\mathrm{pack}} - (\mu_{\mathrm{test}} - \mu_{\mathrm{train}})$, where $\mu$ denotes the centroid of the packaging manifold\.
- V4 \(Second\-order Correlation Alignment\): A rigorous affine transformation that synchronizes both mean and covariance \($\Sigma$\): $\tilde{\mathbf{z}}_{\mathrm{pack}} = \Sigma_{\mathrm{train}}^{1/2}\Sigma_{\mathrm{test}}^{-1/2}(\mathbf{z}_{\mathrm{pack}} - \mu_{\mathrm{test}}) + \mu_{\mathrm{train}}$\.
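To make the element-wise variants concrete, the sketch below implements the V0, V1, and V3 formulas on plain Python lists. It is an illustration under stated assumptions rather than the authors' code: the function names, the `eps` safeguard against division by zero, the use of population standard deviations, and the toy latent batches are all hypothetical, and how the test-side statistics are gathered in production is not specified here. V2 and V4 are omitted because they require the trained predictor and a matrix square root, respectively.

```python
import math
from statistics import fmean, pstdev


def column_stats(latents):
    """Per-dimension mean and population standard deviation of a latent batch."""
    columns = list(zip(*latents))
    return [fmean(c) for c in columns], [pstdev(c) for c in columns]


def v0_dimensional_magnitude(z, train_std, test_std, eps=1e-8):
    """V0: rescale each dimension by gamma_d = sigma_train,d / sigma_test,d."""
    # eps is an added numerical safeguard (assumption, not in the formula).
    return [v * s_tr / (s_te + eps) for v, s_tr, s_te in zip(z, train_std, test_std)]


def v1_instance_norm(z, r_train, eps=1e-8):
    """V1: rescale the whole vector so its L2 norm matches r_train."""
    norm = math.sqrt(sum(v * v for v in z))
    return [v * r_train / (norm + eps) for v in z]


def v3_centroid_translation(z, train_mean, test_mean):
    """V3: subtract the mean drift (mu_test - mu_train) from the vector."""
    return [v - (m_te - m_tr) for v, m_tr, m_te in zip(z, train_mean, test_mean)]


# Toy packaging latents: the test batch is inflated only in dimension 0.
train = [[1.0, 2.0], [-1.0, -2.0]]
test = [[2.0, 2.0], [-2.0, -2.0]]
train_mean, train_std = column_stats(train)
test_mean, test_std = column_stats(test)
r_train = fmean(math.sqrt(sum(v * v for v in row)) for row in train)

z = test[0]
print(v0_dimensional_magnitude(z, train_std, test_std))
print(v1_instance_norm(z, r_train))
print(v3_centroid_translation(z, train_mean, test_mean))
```

In this toy case only V0 shrinks the inflated dimension while leaving the other untouched; V1 rescales both dimensions by the same factor, and V3 changes nothing because the two centroids coincide.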
Table 7\. Comparison of parameter\-free calibration variants on the June OOD Test Set\. All variants are applied to the frozen LPCD architecture\. V0 is the default strategy\. Metrics: PR\-AUC \(AUC\), F1\-score \(F1\), R@0\.1FPR \(R\.1\), and FPR@0\.9R \(FPR\.9\)\.

| Calibration Variant | Level | AUC $\uparrow$ | F1 $\uparrow$ | R\.1 $\uparrow$ | FPR\.9 $\downarrow$ |
| --- | --- | --- | --- | --- | --- |
| No Calibration \(LPCD\) | \- | 0\.7053 | 0\.6388 | 0\.8178 | 0\.2041 |
| V1 \(Instance Norm Rescaling\) | Sample | 0\.7061 | 0\.6008 | 0\.8192 | 0\.1994 |
| V2 \(Counterfactual Consistency\) | Reasoning | 0\.7055 | 0\.6388 | 0\.8178 | 0\.2041 |
| V3 \(Centroid Translation\) | Distribution | 0\.7053 | 0\.6362 | 0\.8178 | 0\.2040 |
| V4 \(Second\-order Correlation\) | Distribution | 0\.7051 | 0\.6361 | 0\.8185 | 0\.2030 |
| V0 \(Dimensional Magnitude\) | Dimension | 0\.7287 | 0\.6779 | 0\.8600 | 0\.1732 |
Analysis of Results\. As shown in Table [7](https://arxiv.org/html/2606.02946#A2.T7), V0 outperforms all other variants on every metric, from which we derive two key insights: \(1\) Dimension\-specific sensitivity: The degradation of V1 in F1\-score \(0\.6008 vs\. 0\.6388 without calibration\) suggests that global scalar scaling destroys the relative importance across different latent dimensions\. In our disentangled space, dimensions carry independent semantic signals; forcing a uniform norm introduces excessive noise and distorts the discriminative structure\. \(2\) Effective factor decorrelation: V4 yields at most marginal gains over V3 \(e\.g\., FPR@0\.9R of 0\.2030 vs\. 0\.2040, with near\-identical PR\-AUC\), indicating that the orthogonality constraint \($\mathcal{L}_{\mathrm{ortho}}$\) during training successfully minimized cross\-dimensional correlations\. Consequently, complex covariance\-based alignment collapses toward simpler mean alignment\. This underscores that *magnitude shift*, rather than rotational or correlation shift, is the primary bottleneck in OOD deployment, which V0 addresses at the appropriate per\-dimension granularity\.