Pitfalls of Unlabeled Disagreement-Based Drift Detection in Streaming Tree Ensembles
Summary
This paper investigates disagreement-based drift detection in ensembles of incremental decision trees, finding that while effective in neural networks, the method underperforms loss-based detectors for tree ensembles due to limited model plasticity.
# Pitfalls of Unlabeled Disagreement-Based Drift Detection in Streaming Tree Ensembles
Source: [https://arxiv.org/html/2605.12803](https://arxiv.org/html/2605.12803)
Lara Sá Neves, Afonso Lourenço, Goreti Marreiros (GECAD, ISEP, Polytechnic of Porto, Portugal; {lspsn,fonso,mgt}@isep.ipp.pt) and Lizy K. John (The University of Texas at Austin, USA; ljohn@ece.utexas.edu)
###### Abstract
Detecting concept drift in high\-speed data streams remains challenging, particularly when models must operate on unlabeled data and avoid false alarms caused by benign shifts\. While disagreement\-based uncertainty has shown promise in neural networks, its adaptation to ensembles of incremental decision trees \(IDTs\) remains largely unexplored\. We investigate this approach by constructing batch\-specific disagreement measures via label flipping in ensemble members and evaluating their effectiveness for drift detection in tabular data streams\. Our experiments show that, although this method performs well in ensembles of multi\-layer perceptrons \(MLPs\), it consistently underperforms loss\-based detectors when applied to IDTs\. We attribute this behavior to the intrinsic rigidity of IDTs: learning primarily through structural expansion, with limited parameter adaptation, restricts model plasticity and prevents disagreement from reliably reflecting learning potential\. Recent work on restructuring IDTs using their intrinsic decomposition into non\-overlapping rules offers a promising direction for improving adaptability\.
## 1 Introduction
Handling change in high-speed data streams is challenging because of severe concept drift. Effective monitoring algorithms should (R1) operate on unlabeled deployment data to detect model deterioration and (R2) resist non-deteriorating shifts with few samples. While existing data-based drift detectors perform well on unlabeled data (R1) (Xuan et al., [2021](https://arxiv.org/html/2605.12803#bib.bib45); Wan and Wang, [2021](https://arxiv.org/html/2605.12803#bib.bib46)), they often generate false positives when shifts are benign (R2). Many methods track changes in classifier posterior distributions (Lindstrom et al., [2013](https://arxiv.org/html/2605.12803#bib.bib35); Lughofer et al., [2016](https://arxiv.org/html/2605.12803#bib.bib31); Lu et al., [2025](https://arxiv.org/html/2605.12803#bib.bib26)), which can indicate uncertainty. However, such estimates may be unreliable when models continuously adapt to evolving streams. To address this, we propose batch-specific uncertainty, which is more practical than prequential metrics: streaming models should focus on reliability in the current distribution rather than hypothetical generalization. Similarly to transductive reasoning, this means measuring how conflicting information in a batch affects the model rather than relying on accumulated past uncertainty.
A prominent example is the model disagreement framework (Yu and Aizawa, [2019](https://arxiv.org/html/2605.12803#bib.bib12); Jiang et al., [2021](https://arxiv.org/html/2605.12803#bib.bib11); Rosenfeld and Garg, [2023](https://arxiv.org/html/2605.12803#bib.bib10); Ginsberg et al., [2022](https://arxiv.org/html/2605.12803#bib.bib15)). To date, it has been studied mainly with expressive neural networks trained in large batches, which struggle on tabular streams due to slow convergence, overwritten weights, and limited inductive advantage (Sahoo et al., [2017](https://arxiv.org/html/2605.12803#bib.bib21)). For tabular data, ensembles of incremental decision trees (IDTs) remain state-of-the-art, leveraging fast online convergence and tree replacement via loss-based drift detectors (Bifet and Gavaldà, [2007](https://arxiv.org/html/2605.12803#bib.bib37); Gama et al., [2004](https://arxiv.org/html/2605.12803#bib.bib19)), with extensions incorporating unlabeled data through self-training, unsupervised drift detection, and active learning (Gomes et al., [2025](https://arxiv.org/html/2605.12803#bib.bib38)). This raises a key question: can the disagreement framework be adapted for tree-based streaming ensembles? To implement this, we exploit the fact that in binary classification, arbitrarily flipping labels for each ensemble component can create diverse, disagreeing representations, a simple yet effective way to design a true disagreeing critic (Rosenfeld and Garg, [2023](https://arxiv.org/html/2605.12803#bib.bib10); Ginsberg et al., [2022](https://arxiv.org/html/2605.12803#bib.bib15); Pagliardini et al., [2022](https://arxiv.org/html/2605.12803#bib.bib14); Chuang et al., [2020](https://arxiv.org/html/2605.12803#bib.bib13)). Surprisingly, we find this strategy performs poorly across nearly all evaluated streams for ensembles of IDTs, but not for ensembles of multi-layer perceptrons (MLPs).
We hypothesize that disagreement among IDTs fails to provide reliable signals of concept change, not due to flaws in the detection logic, but because the underlying learners lack the plasticity needed for a disagreement critic to capture their learning potential\.
## 2 Theory
For a stream of drifting concepts $D_i$, learners update $\theta_t$ incrementally to minimize risk on $D_t$:

$$\theta_t := \text{Alg}_t(\theta_{t-1}, \mathcal{L}_t), \qquad \mathcal{L}_t = \sum_{i=1}^{t} \mathbb{E}_{(x,y)\sim D_i}\left[\ell\left(y, h_{\theta_{t-1}}(x)\right)\right]. \tag{1}$$

Storing all past data is impractical. Tree-based learners address this via approximations, e.g., incremental information gain with Hoeffding bounds, expanding only when the difference between the best and second-best splits is significant (Domingos and Hulten, [2000](https://arxiv.org/html/2605.12803#bib.bib43)). This operation can be described as:
Lemma 1 (Incremental labeled update). For $h \in \mathcal{H}$ and history model $h_{\theta_{t-1}}$:

$$\varepsilon_{D_t}(h) = \varepsilon_{D_t}(h, h_{\theta_{t-1}}) + \varepsilon_{D_t}(h_{\theta_{t-1}}), \tag{2}$$

where $\varepsilon_{D_t}(h, h_{\theta_{t-1}})$ denotes the one-hot disagreement. However, this bound is insufficient under drifting unlabeled distributions. Since manual labeling is infeasible in true streams, meaningful error bounds need a notion of distributional distance, e.g. the $\mathcal{H}\Delta\mathcal{H}$-divergence (Kifer et al., [2004](https://arxiv.org/html/2605.12803#bib.bib18)).
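The Hoeffding-bound split test mentioned before Lemma 1 can be illustrated with a minimal sketch. It assumes information gain as the split criterion (so the range of the criterion is $\log_2$ of the number of classes); the function names, the default `delta`, and the absence of a tie-breaking threshold are illustrative choices, not details taken from the paper.

```python
import math


def hoeffding_bound(value_range: float, delta: float, n: int) -> float:
    """Epsilon such that, with probability 1 - delta, the true mean of a
    variable with the given range lies within epsilon of the mean
    observed over n samples."""
    return math.sqrt((value_range ** 2) * math.log(1.0 / delta) / (2.0 * n))


def should_split(best_gain: float, second_gain: float, n: int,
                 n_classes: int = 2, delta: float = 1e-7) -> bool:
    """Expand a leaf only when the gap between the two best candidate
    splits exceeds the Hoeffding bound for the n samples seen so far."""
    value_range = math.log2(n_classes)  # range of information gain
    eps = hoeffding_bound(value_range, delta, n)
    return (best_gain - second_gain) > eps


if __name__ == "__main__":
    # Same gain gap, different sample counts: the bound shrinks as n grows.
    for n in (1_000, 100_000):
        print(n, round(hoeffding_bound(1.0, 1e-7, n), 4),
              should_split(0.30, 0.25, n=n))
```

Because the bound shrinks only as $1/\sqrt{n}$, a leaf with two near-equivalent candidate splits can wait a long time before expanding, which is one concrete form of the conservative splitting discussed in this section.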
Lemma 2 (Drift-based update). Assuming a binary hypothesis class capable of discriminating $\mathcal{D}_{t-1}$ and $\mathcal{D}_t$ (Ben-David et al., [2010](https://arxiv.org/html/2605.12803#bib.bib8)), i.e., whose $\mathcal{H}\Delta\mathcal{H}$ class contains all pairwise exclusive-ors:

$$\varepsilon_{D_t}(h) \leq \varepsilon_{D_t}(h, h_{\theta_{t-1}}) + \varepsilon_{D_{t-1}}(h_{\theta_{t-1}}) + \frac{1}{2}\Delta(h_{\theta_{t-1}}). \tag{3}$$

Figure 1: Drift detection across complexities: (left) loss-based false negative on an over-regularized model, (center) data-based false positive at the true matching model complexity, (right) both successful on an overly complex model.

While vacuous in practice, this bound suggests that (1) the conservative splitting and parent hyper-rectangles of $h_{\theta_{t-1}}$ act as a regularizer for $h$; (2) bias is minimized only if $h_{\theta_{t-1}}$ is localized around $\mathcal{D}_t$; and (3) useful drift detectors must account for both data and model complexity (Fig. [1](https://arxiv.org/html/2605.12803#S2.F1)): if $\mathcal{D}_{t-1}$ and $\mathcal{D}_t$ are similar, the bound is small and $h_{\theta_{t-1}}$ can be reused; otherwise, $h$ is updated via pruning, regrowing, or ensemble modification.
This motivates bounding error relative to the previous model rather than the entire hypothesis class, as the true labeling function $y^*$ and drifted distribution $\mathcal{D}_t$ are not adversarial. Hence, detection can exploit $\Delta(h_{\theta_{t-1}})$ with alternative hypotheses to obtain more practical bounds under drift:
Lemma 3 (Disagreement-based update). Let $h^* = \arg\max_{h' \in \mathcal{H}'} \Delta(h_{\theta_{t-1}}, h')$, with $\mathcal{H}'$ per $h_{\theta_{t-1}}$:

$$\varepsilon_{D_t}(h) \leq \varepsilon_{D_t}(h, h_{\theta_{t-1}}) + \varepsilon_{D_{t-1}}(h_{\theta_{t-1}}) + \frac{1}{2}\Delta(h_{\theta_{t-1}}, h^*). \tag{4}$$

Figure 2: Disagreement-based drift across complexities: (left) hardly induced in far input space, (center) even, (right) easily induced in under-regularized far input space.

While $h^*$ is intractable, it motivates maximizing $\Delta(h_{\theta_{t-1}}, h')$ to identify parts of the input space most affected by drift. In binary ensembles, this can be as simple as flipping labels (Rosenfeld and Garg, [2023](https://arxiv.org/html/2605.12803#bib.bib10); Ginsberg et al., [2022](https://arxiv.org/html/2605.12803#bib.bib15)) (Fig. [2](https://arxiv.org/html/2605.12803#S2.F2)): over-regularized models fail to capture drift, correctly regularized models balance disagreement, and overly complex models overfit new regions. Using this discrepancy while preserving $\varepsilon_{\mathcal{D}_{t-1}}(h_{\theta_{t-1}})$ allows functional regularization, while graceful forgetting and pruning outdated nodes improve adaptability and free capacity.
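To make the label-flipping idea concrete, the following is a minimal sketch of one way to build per-member flipped targets for a binary ensemble. It is an illustration under assumptions rather than the paper's procedure: pseudo-labels come from a majority vote, each member receives independent random flips with a hypothetical probability `flip_prob`, and models are represented as plain callables returning 0 or 1.

```python
import random
from typing import Callable, List, Sequence

# Hypothetical model type: maps a feature vector to a label in {0, 1}.
Model = Callable[[Sequence[float]], int]


def pseudo_label(ensemble: List[Model],
                 batch: List[Sequence[float]]) -> List[int]:
    """Majority vote of the current ensemble (ties go to class 1)."""
    labels = []
    for x in batch:
        votes = sum(model(x) for model in ensemble)
        labels.append(1 if 2 * votes >= len(ensemble) else 0)
    return labels


def flipped_targets(labels: List[int], n_members: int,
                    flip_prob: float, seed: int = 0) -> List[List[int]]:
    """One target vector per ensemble member, each with its own
    independent random flips of the binary pseudo-labels."""
    rng = random.Random(seed)
    return [[1 - y if rng.random() < flip_prob else y for y in labels]
            for _ in range(n_members)]


if __name__ == "__main__":
    ensemble = [lambda x: int(x[0] > 0.5),
                lambda x: int(x[1] > 0.5),
                lambda x: int(x[0] + x[1] > 1.0)]
    batch = [(0.9, 0.8), (0.1, 0.2), (0.7, 0.1)]
    y_hat = pseudo_label(ensemble, batch)
    for member_targets in flipped_targets(y_hat, n_members=3, flip_prob=0.5):
        print(member_targets)
```

Each member would then be updated on its own target vector, so a learner with enough plasticity can be pushed to disagree with its peers on the new batch.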
## 3 Method
Algorithm 1: Disagreement framework

1. Initialize ensemble $g$ on past data $P$.
2. While the stream has a new batch:
   1. $Q', R' \leftarrow$ pseudo-label and flip in $Q$, $R$.
   2. $g_Q, g_R \leftarrow$ copies of $g$.
   3. Train $g_Q$ on $Q'$ and $g_R$ on $R'$.
   4. For each ensemble $g_X \in \{g_Q, g_R\}$:
      1. For each model pair $(g_a, g_b)$ in $g_X$, compute $d_{a,b} = \frac{1}{K}\sum_{i=1}^{K} \mathbf{1}[g_a(x_i) \neq g_b(x_i)]$.
      2. $D_X \leftarrow$ collection of all $d_{a,b}$.
   5. If KS_test($D_Q$, $D_R$) rejects $H_0$, drift is detected.

For each batch, the data is split into two consecutive sub-windows, $Q$ and $R$. Two copies of the ensemble $g$, denoted $g_Q$ and $g_R$, are trained to remain consistent with past distributions $P$ while being exposed to flipped versions of the pseudo-labeled $Q$ and $R$, respectively (Fig. [3](https://arxiv.org/html/2605.12803#S3.F3)). Pairwise disagreements among base learners form the distributions $D_Q$ and $D_R$, capturing the impact of new data on predictive consistency. A Kolmogorov-Smirnov (KS) test between $D_Q$ and $D_R$ is used to detect significant concept drift. The $Q$–$R$ split naturally balances convergence and detection latency, as overly small windows may yield noisy estimates.
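The detection step of Algorithm 1 can be sketched in a few lines of standard-library Python. This is a sketch under assumptions: the predictions of the already-trained copies $g_Q$ and $g_R$ are passed in as lists of labels, the significance level `alpha` is illustrative, and the p-value uses a textbook asymptotic approximation of the Kolmogorov distribution as a stand-in for a library KS test.

```python
import math
from bisect import bisect_right
from itertools import combinations
from typing import List, Sequence, Tuple


def pairwise_disagreements(preds: List[Sequence[int]]) -> List[float]:
    """d_{a,b}: fraction of the K batch instances on which members a and b
    predict different labels, for every pair of ensemble members."""
    k = len(preds[0])
    return [sum(pa != pb for pa, pb in zip(a, b)) / k
            for a, b in combinations(preds, 2)]


def ks_two_sample(xs: Sequence[float],
                  ys: Sequence[float]) -> Tuple[float, float]:
    """Two-sample KS statistic and an asymptotic p-value approximation."""
    xs, ys = sorted(xs), sorted(ys)
    n, m = len(xs), len(ys)
    d = 0.0
    for v in sorted(set(xs) | set(ys)):
        # Largest gap between the two empirical CDFs, evaluated at v.
        d = max(d, abs(bisect_right(xs, v) / n - bisect_right(ys, v) / m))
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        # The series below converges slowly here; the p-value is ~1.
        return d, 1.0
    p = 2.0 * sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                  for j in range(1, 101))
    return d, min(1.0, max(0.0, p))


def drift_detected(preds_q: List[Sequence[int]],
                   preds_r: List[Sequence[int]],
                   alpha: float = 0.05) -> bool:
    """Reject H0 (same disagreement distribution in Q and R) at level alpha."""
    _, p_value = ks_two_sample(pairwise_disagreements(preds_q),
                               pairwise_disagreements(preds_r))
    return p_value < alpha


if __name__ == "__main__":
    base = [0, 1, 0, 1, 0, 1, 0, 1]
    # Members of g_Q differ from a shared prediction in one position each.
    preds_q = [[1 - b if i == j else b for j, b in enumerate(base)]
               for i in range(6)]
    # Members of g_R disagree on half of the instances (parity patterns).
    preds_r = [[bin(i & c).count("1") % 2 for c in range(8)]
               for i in range(1, 7)]
    print(drift_detected(preds_q, preds_r), drift_detected(preds_q, preds_q))
```

With $M$ base learners each ensemble contributes $M(M-1)/2$ pairwise values, so the KS test compares two samples of that size regardless of the batch length $K$.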
Figure 3: Windowed disagreement.

To achieve expressive adaptation without relying on overly large windows that delay detection, we adopt Oza's ensemble backbone, with the Poisson parameter $\lambda$ governing resampling (Oza and Russell, [2001](https://arxiv.org/html/2605.12803#bib.bib24)). However, rather than using $\lambda = 1$, instances are exploited more aggressively under underfitting, using $\lambda(\epsilon) = \epsilon\lambda_{\max}$, where $\epsilon \in \langle 0, 1 \rangle$ denotes the current error (Korycki and Krawczyk, [2022](https://arxiv.org/html/2605.12803#bib.bib25)). This accelerates convergence to more reliable estimates.
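The error-dependent resampling rule $\lambda(\epsilon) = \epsilon\lambda_{\max}$ can be sketched as follows. The helper names and the example value of `lambda_max` are assumptions for illustration, and the Poisson sampler is Knuth's multiplication method written with the standard library only.

```python
import math
import random
from typing import List


def exploitation_lambda(error: float, lambda_max: float) -> float:
    """lambda(eps) = eps * lambda_max, with eps clipped to [0, 1]."""
    return min(1.0, max(0.0, error)) * lambda_max


def sample_poisson(lam: float, rng: random.Random) -> int:
    """Knuth's multiplication method; adequate for small lambda."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


def online_bagging_weights(error: float, n_members: int,
                           lambda_max: float,
                           rng: random.Random) -> List[int]:
    """Number of times each ensemble member trains on the current instance."""
    lam = exploitation_lambda(error, lambda_max)
    return [sample_poisson(lam, rng) for _ in range(n_members)]


if __name__ == "__main__":
    rng = random.Random(42)
    for err in (0.1, 0.5, 0.9):
        # Higher current error means each instance is replayed more often.
        print(err, online_bagging_weights(err, n_members=5,
                                          lambda_max=6.0, rng=rng))
```

Note that under this rule an error of zero gives $\lambda = 0$, so members would skip the instance entirely; whether a lower bound on $\lambda$ is applied in practice is not stated in the text.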
## 4 Experiments
We evaluate IDT and MLP ensembles with six loss-based detectors: HDDM$_A$ and HDDM$_W$ (Pesaranghader and Viktor, [2016](https://arxiv.org/html/2605.12803#bib.bib41)), ADWIN (Bifet and Gavaldà, [2007](https://arxiv.org/html/2605.12803#bib.bib37)), PH (Mouss et al., [2004](https://arxiv.org/html/2605.12803#bib.bib20)), DDM (Gama et al., [2004](https://arxiv.org/html/2605.12803#bib.bib19)), and EDDM (Baena-García et al., [2006](https://arxiv.org/html/2605.12803#bib.bib44)); and five data-based detectors: BNDM (Xuan et al., [2021](https://arxiv.org/html/2605.12803#bib.bib45)), CSDDM (Wan and Wang, [2021](https://arxiv.org/html/2605.12803#bib.bib46)), D3 (Sethi and Kantardzic, [2015](https://arxiv.org/html/2605.12803#bib.bib39)), IBDD (Souza et al., [2020](https://arxiv.org/html/2605.12803#bib.bib40)), and OCDD (Gözüaçık and Can, [2021](https://arxiv.org/html/2605.12803#bib.bib47)).
Figure 4: Evaluation metrics: Detection window.

We use 12 synthetic streams from 7 SOA generators: SEA (rotating boundaries), Hyperplane (10 features), Stagger (feature distribution changes), Anomaly Sine (contextual drifts), RBF (centroid shifts), and Agrawal (classification changes). Each contains 90,000 instances with five 15,000-instance drifts, both abrupt and recurring. We adopt prequential evaluation and report Mean Time to Detection (MTD), Detection Accuracy (DA), and False Alarms (FA), counting alarms outside the defined detection window as false positives, with 7,500 and 9,000 instances for abrupt and gradual drifts, respectively (Fig. [4](https://arxiv.org/html/2605.12803#S4.F4)). All hyperparameters for ensembles and drift detectors, including both loss- and data-based methods, were set according to recommended ranges in the original papers and tuned using a weighted min-max normalization: $0.5 \times \text{DA} + 0.3 \times (1 - \text{FA}) + 0.2 \times (1 - \text{MTD})$. For ensembles, we use as base classifiers: Hoeffding Tree (Domingos and Hulten, [2000](https://arxiv.org/html/2605.12803#bib.bib43)), Hoeffding Adaptive Tree (Bifet and Gavalda, [2009](https://arxiv.org/html/2605.12803#bib.bib42)), and Extremely Fast Decision Tree (Manapragada et al., [2018](https://arxiv.org/html/2605.12803#bib.bib2)) for IDTs, and standard feedforward networks for MLPs, with all ensembles configured to contain 100 learners.
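The evaluation protocol and tuning objective above can be sketched as follows. Several details are assumptions made for illustration, since the text does not spell them out: a drift counts as detected by the first alarm inside its window, further alarms in the same window are ignored, MTD falls back to the window length when nothing is detected, and FA and MTD are min-max normalized across candidate configurations before weighting.

```python
from typing import List, Sequence, Tuple


def evaluate_alarms(alarms: Sequence[int], drift_points: Sequence[int],
                    window: int) -> Tuple[float, int, float]:
    """Return (DA, FA, MTD) for one detector run on one stream."""
    delays = []
    matched = set()
    for p in drift_points:
        in_window = [a for a in alarms if p <= a < p + window]
        if in_window:
            delays.append(min(in_window) - p)  # delay of the first alarm
            matched.update(in_window)
    da = len(delays) / len(drift_points)
    fa = sum(1 for a in alarms if a not in matched)  # alarms outside windows
    mtd = sum(delays) / len(delays) if delays else float(window)
    return da, fa, mtd


def min_max(values: Sequence[float]) -> List[float]:
    """Rescale values to [0, 1]; constant inputs map to 0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def tuning_scores(results: Sequence[Tuple[float, float, float]]) -> List[float]:
    """Score each candidate's (DA, FA, MTD) with weights 0.5 / 0.3 / 0.2."""
    das = [r[0] for r in results]
    fas = min_max([r[1] for r in results])
    mtds = min_max([r[2] for r in results])
    return [0.5 * da + 0.3 * (1 - fa) + 0.2 * (1 - mtd)
            for da, fa, mtd in zip(das, fas, mtds)]


if __name__ == "__main__":
    drifts = [15_000, 30_000, 45_000, 60_000, 75_000]  # assumed positions
    run_a = evaluate_alarms([15_200, 30_500, 44_000, 61_000], drifts, 7_500)
    run_b = evaluate_alarms([20_000, 29_000, 52_000], drifts, 7_500)
    print(run_a, run_b, tuning_scores([run_a, run_b]))
```

Because FA and MTD are normalized relative to the other candidates, the resulting score ranks configurations within one tuning run and is not comparable across runs.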
While ensembles of MLPs show good behavior, disagreement\-based uncertainty from IDTs performs consistently poorly across nearly all evaluated streams \(Table[1](https://arxiv.org/html/2605.12803#S4.T1)\)\. It exhibits substantially delayed detections and, in several settings, a non\-trivial number of false alarms, particularly when compared to loss\-based baselines\. These results indicate that disagreement signals derived from ensembles of IDTs are often too weak or too noisy to serve as reliable drift indicators\.
Table 1: MTD (FA) results for gradual ($G$) and abrupt ($A$) drifts, for $\otimes$ disagreement-based, $\diamond$ data-based, and $\nabla$ loss-based detectors.
## 5 Conclusions
Taken together, our results hint at a fundamental limitation in current drift research: increasingly sophisticated model-dependent detection mechanisms cannot compensate for rigid base learners. Disagreement estimates derived from IDTs fail to provide reliable signals of concept change, not due to flaws in the detection logic itself, but because the underlying learners lack the plasticity required for uncertainty to reflect learning potential. IDTs converge quickly online thanks to their few trainable parameters, but this efficiency comes at the cost of severe rigidity. Unlike MLP systems, which adapt through both parameter updates and activation dynamics (Lourenço et al., [2025a](https://arxiv.org/html/2605.12803#bib.bib4)), IDTs rely almost exclusively on irreversible structural growth driven by locally optimal split decisions, resulting in history-dependent models dominated by outdated inductive biases (Lourenço et al., [2025b](https://arxiv.org/html/2605.12803#bib.bib5)). Traditional attempts to address this limitation frame plasticity primarily as capacity management, via subtree pruning (Nowak Assis et al., [2025](https://arxiv.org/html/2605.12803#bib.bib1); Manapragada et al., [2018](https://arxiv.org/html/2605.12803#bib.bib2)), rather than as the ability of the current parameters to serve as a meaningful starting point for further learning. As a consequence, disagreement-based drift detection exhibits brittle, stream-specific behavior, often failing outside narrow settings. To circumvent this, recent work on restructuring incremental decision trees with their intrinsic, non-overlapping rules (Schreckenberger et al., [2020](https://arxiv.org/html/2605.12803#bib.bib6); Heyden et al., [2024](https://arxiv.org/html/2605.12803#bib.bib3); Zhao et al., [2025](https://arxiv.org/html/2605.12803#bib.bib7)) (Fig. [5](https://arxiv.org/html/2605.12803#S5.F5)) offers a promising path forward by partially relaxing this rigidity.
Figure 5: Restructuring IDTs with their intrinsic, non-overlapping rules that fully partition the space. Panels: (a) Disconnect subtree; (b) Desired branch splits; (c) Move splits to root; (d) Subtree rebuild.

### Acknowledgments
Work funded by Portuguese Foundation for Science and Technology under the UT Austin Portugal Program, Ph\.D\. scholarship PRT/BD/18497/2024 and project doi\.org/10\.54499/UID/00760/2025\.
## References
- Baena-García et al. (2006) Early drift detection method. In Fourth international workshop on knowledge discovery from data streams, Vol. 6, pp. 77–86.
- S. Ben-David, J. Blitzer, K. Crammer, A. Kulesza, F. Pereira, and J. W. Vaughan (2010) A theory of learning from different domains. Machine learning 79, pp. 151–175.
- A. Bifet and R. Gavaldà (2007) Learning from time-changing data with adaptive windowing. In Proceedings of the 7th SIAM International Conference on Data Mining. DOI: [10.1137/1.9781611972771.42](https://dx.doi.org/10.1137/1.9781611972771.42)
- A. Bifet and R. Gavalda (2009) Adaptive learning from evolving data streams. In International symposium on intelligent data analysis, pp. 249–260.
- C. Chuang, A. Torralba, and S. Jegelka (2020) Estimating generalization under distribution shifts via domain-invariant representations. arXiv preprint arXiv:2007.03511.
- P. Domingos and G. Hulten (2000) Mining high-speed data streams. In Proceedings of the sixth ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 71–80.
- J. Gama, P. Medas, G. Castillo, and P. Rodrigues (2004) Learning with drift detection. In Brazilian Symposium on Artificial Intelligence, pp. 286–295.
- T. Ginsberg, Z. Liang, and R. G. Krishnan (2022) A learning based hypothesis test for harmful covariate shift. arXiv preprint arXiv:2212.02742.
- H. M. Gomes, J. Read, M. Grzenda, B. Pfahringer, and A. Bifet (2025) SLEADE: disagreement-based semi-supervised learning for sparsely labeled evolving data streams. IEEE Transactions on Knowledge and Data Engineering.
- Ö. Gözüaçık and F. Can (2021) Concept learning using one-class classifiers for implicit drift detection in evolving data streams. Artificial Intelligence Review 54 (5), pp. 3725–3747.
- M. Heyden, H. M. Gomes, E. Fouché, B. Pfahringer, and K. Böhm (2024) Leveraging plasticity in incremental decision trees. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, pp. 38–54.
- Y. Jiang, V. Nagarajan, C. Baek, and J. Z. Kolter (2021) Assessing generalization of sgd via disagreement. arXiv preprint arXiv:2106.13799.
- D. Kifer, S. Ben-David, and J. Gehrke (2004) Detecting change in data streams. In VLDB, Vol. 4, pp. 180–191.
- Ł. Korycki and B. Krawczyk (2022) Instance exploitation for learning temporary concepts from sparsely labeled drifting data streams. Pattern Recognition 129, pp. 108749.
- P. Lindstrom, B. Mac Namee, and S. J. Delany (2013) Drift detection using uncertainty distribution divergence. Evolving Systems 4, pp. 13–25.
- A. Lourenço, J. Gama, E. P. Xing, and G. Marreiros (2025a) Bridging streaming continual learning via in-context large tabular models. arXiv preprint arXiv:2512.11668.
- A. Lourenço, J. Rodrigo, J. Gama, and G. Marreiros (2025b) Dfdt: dynamic fast decision tree for iot data stream mining on edge devices. arXiv preprint arXiv:2502.14011.
- P. Lu, J. Lu, A. Liu, and G. Zhang (2025) Early concept drift detection via prediction uncertainty. In Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 19124–19132.
- E. Lughofer, E. Weigl, W. Heidl, C. Eitzinger, and T. Radauer (2016) Recognizing input space and target concept drifts in data streams with scarcely labeled and unlabelled instances. Information Sciences 355, pp. 127–151.
- C. Manapragada, G. I. Webb, and M. Salehi (2018) Extremely fast decision tree. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, pp. 1953–1962.
- H. Mouss, D. Mouss, N. Mouss, and L. Sefouhi (2004) Test of page-hinckley, an approach for fault detection in an agro-alimentary production system. 2004 5th Asian Control Conference (IEEE Cat. No. 04EX904) 2, pp. 815–818.
- D. Nowak Assis, J. P. Barddal, and F. Enembreck (2025) Behavioral insights of adaptive splitting decision trees in evolving data stream classification. Knowledge and Information Systems, pp. 1–32.
- N. C. Oza and S. Russell (2001) Experimental comparisons of online and batch versions of bagging and boosting. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining, pp. 359–364.
- M. Pagliardini, P. Gupta, M. Jaggi, T. Hofmann, and M. Tatarchenko (2022) Agree to disagree: diversity through disagreement for better transferability. arXiv preprint arXiv:2202.04414.
- A. Pesaranghader and H. L. Viktor (2016) Fast hoeffding drift detection method for evolving data streams. In Joint European conference on machine learning and knowledge discovery in databases, pp. 96–111.
- E. Rosenfeld and S. Garg (2023) (Almost) provable error bounds under distribution shift via disagreement discrepancy. In Advances in Neural Information Processing Systems, Vol. 36, pp. 28761–28784.
- D. Sahoo, Q. Pham, J. Lu, and S. C. Hoi (2017) Online deep learning: learning deep neural networks on the fly. arXiv preprint arXiv:1711.03705.
- C. Schreckenberger, T. Glockner, H. Stuckenschmidt, and C. Bartelt (2020) Restructuring of hoeffding trees for trapezoidal data streams. In 2020 International Conference on Data Mining Workshops (ICDMW), pp. 416–423.
- T. S. Sethi and M. Kantardzic (2015) Don't pay for validation: detecting drifts from unlabeled data using margin density. Procedia Computer Science 53, pp. 103–112.
- V. M. A. Souza, F. A. Chowdhury, and A. Mueen (2020) Unsupervised drift detection on high-speed data streams. In Proceedings - 2020 IEEE International Conference on Big Data, Big Data 2020. DOI: [10.1109/BigData50022.2020.9377880](https://dx.doi.org/10.1109/BigData50022.2020.9377880)
- J. S. W. Wan and S. D. Wang (2021) Concept drift detection based on pre-clustering and statistical testing. Journal of Internet Technology 22. DOI: [10.3966/160792642021032202020](https://dx.doi.org/10.3966/160792642021032202020), ISSN 20794029.
- J. Xuan, J. Lu, and G. Zhang (2021) Bayesian nonparametric unsupervised concept drift detection for data stream mining. ACM Transactions on Intelligent Systems and Technology 12. DOI: [10.1145/3420034](https://dx.doi.org/10.1145/3420034), ISSN 2157-6904.
- Q. Yu and K. Aizawa (2019) Unsupervised out-of-distribution detection by maximum classifier discrepancy. In Proceedings of the IEEE/CVF international conference on computer vision, pp. 9518–9526.
- R. Zhao, Y. You, J. Sun, J. Gama, and J. Jiang (2025) Online learning from drifting capricious data streams with flexible hoeffding tree. Information Processing & Management 62 (6), pp. 104221.