drift-detection

#drift-detection

DriftGuard: Safety-Aware Multi-Monitor Detection and Selective Adaptation for Evolving Toxicity Moderation

arXiv cs.CL · 4d ago

Introduces DriftGuard, a safety-aware adaptive moderation framework that uses multiple monitors to detect subtle, safety-relevant distribution shifts and selectively updates models with a hard-mix adaptation set, improving toxic recall on evolving datasets.

Building independent LLM drift detection - sharing the methodology, looking for feedback on the approach

Reddit r/artificial · 2026-06-18

The author shares a methodology for building an external LLM drift detection system that continuously probes model behavior (schema adherence, instruction-following, refusal rates, etc.) to catch silent degradations in API performance, and invites feedback on the approach, pricing, and use cases.
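
As a rough illustration of one such probe, the sketch below scores schema adherence over a batch of responses and compares it with a baseline rate. The function names, the required keys, and the tolerance are hypothetical and are not taken from the post.

```python
import json


def schema_adherence_rate(responses, required_keys):
    """Fraction of responses that parse as a JSON object with all required keys."""
    if not responses:
        raise ValueError("need at least one response")
    passed = 0
    for text in responses:
        try:
            obj = json.loads(text)
        except json.JSONDecodeError:
            continue  # unparseable output counts as a failure
        if isinstance(obj, dict) and set(required_keys).issubset(obj):
            passed += 1
    return passed / len(responses)


def has_degraded(baseline_rate, current_rate, tolerance=0.05):
    """Flag a drop larger than `tolerance` (an assumed, illustrative threshold)."""
    return (baseline_rate - current_rate) > tolerance


# Hypothetical probe results; a real system would collect these from API calls.
sample_responses = [
    '{"answer": "x", "confidence": 0.9}',
    "not json",
    '{"answer": "y"}',
]
rate = schema_adherence_rate(sample_responses, {"answer", "confidence"})
```

A fixed tolerance is the simplest possible rule. With small probe batches, a proportion test would be a more defensible way to separate noise from a real change.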

Who Drifted: the System or the Judge? Anytime-Valid Attribution in LLM Evaluation Pipelines

arXiv cs.AI · 2026-06-16

Proposes an anytime-valid attribution method that uses a human-labeled anchor set and a betting e-process to distinguish whether score drift in LLM evaluation pipelines comes from the system or the judge, resolving the ambiguity caused by silent judge changes.
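
To give a feel for what a betting e-process looks like, here is a generic sketch, not the paper's construction. It assumes each anchor item yields 1 if the judge agrees with the human label and 0 otherwise, and it tests the null hypothesis that the judge's agreement rate is at least `null_rate`. The fixed bet size and all names are illustrative assumptions.

```python
def betting_e_process(agreements, null_rate, bet=-1.0, alpha=0.05):
    """Run a one-sided betting test on observations in [0, 1].

    Wealth starts at 1 and is multiplied by 1 + bet * (x - null_rate).
    With a negative bet, wealth is a nonnegative supermartingale whenever
    the true mean is at least null_rate, so by Ville's inequality it
    reaches 1 / alpha with probability at most alpha, at any stopping time.
    Returns (index of first rejection or None, final wealth).
    """
    if not 0.0 < null_rate < 1.0:
        raise ValueError("null_rate must be in (0, 1)")
    if not -1.0 / (1.0 - null_rate) < bet < 0.0:
        raise ValueError("bet must keep every wealth factor positive")
    wealth = 1.0
    for t, x in enumerate(agreements, start=1):
        wealth *= 1.0 + bet * (x - null_rate)
        if wealth >= 1.0 / alpha:
            return t, wealth  # evidence that the judge has drifted
    return None, wealth


# Hypothetical anchor-set outcomes: a judge that stopped agreeing with humans.
stop_at, final_wealth = betting_e_process([0] * 10, null_rate=0.9)
```

Because the guarantee holds at every step, the anchor set can be checked continuously without correcting for repeated looks. Attributing drift to the system rather than the judge would need a second process on system scores, which this sketch does not cover.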

PRISM: Prompt Reliability via Iterative Simulation and Monitoring for Enterprise Conversational AI

arXiv cs.AI · 2026-05-18

PRISM is a closed-loop framework that treats prompt engineering as a continuous reliability problem for enterprise conversational AI. It automates test generation, simulation, evaluation, and repair, achieving 99% reliability and reducing authoring time from days to minutes.

Pitfalls of Unlabeled Disagreement-Based Drift Detection in Streaming Tree Ensembles

arXiv cs.LG · 2026-05-14

This paper investigates disagreement-based drift detection in ensembles of incremental decision trees. It finds that the method, although effective for neural networks, underperforms loss-based detectors on tree ensembles because of limited model plasticity.
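
For readers unfamiliar with the idea, a minimal sketch of an unlabeled disagreement signal follows. It assumes each sample carries one predicted label per ensemble member and flags drift when mean disagreement in a recent window exceeds a reference window by a fixed margin. The names and the threshold are hypothetical, and this is not the detector evaluated in the paper.

```python
from collections import Counter


def mean_disagreement(window):
    """Average share of ensemble members that vote against the majority label.

    `window` is a list of samples; each sample is a list with one predicted
    label per ensemble member. No ground-truth labels are needed.
    """
    if not window:
        raise ValueError("window must contain at least one sample")
    total = 0.0
    for votes in window:
        majority_count = Counter(votes).most_common(1)[0][1]
        total += 1.0 - majority_count / len(votes)
    return total / len(window)


def disagreement_drift(reference, recent, threshold=0.2):
    """Flag drift when disagreement rises by more than `threshold` (assumed value)."""
    return mean_disagreement(recent) - mean_disagreement(reference) > threshold


# Hypothetical votes from a three-member ensemble.
reference_window = [[0, 0, 0], [1, 1, 1]]
recent_window = [[0, 1, 0], [1, 0, 2]]
drifted = disagreement_drift(reference_window, recent_window)
```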

We started measuring "undeclared-intent spend" in agent workflows

Reddit r/AI_Agents · 2026-05-11

The post proposes measuring "undeclared-intent spend" in agent workflows: the compute tokens spent outside the declared intent, a figure that exposes behavioral costs such as drift and off-task execution.

The Geometric Canary: Predicting Steerability and Detecting Drift via Representational Stability

Hugging Face Daily Papers · 2026-04-20

This paper introduces geometric stability measures—based on pairwise distance consistency in representations—to predict language model steerability and detect structural drift. Supervised variants achieve near-perfect correlation (ρ=0.89-0.97) with linear steerability across 35-69 embedding models, while unsupervised variants outperform CKA and Procrustes for post-deployment drift detection.
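
One simple way to picture pairwise distance consistency is sketched below. It embeds the same probe inputs with two model versions, computes all pairwise Euclidean distances in each space, and correlates the two distance lists. The use of Pearson correlation and all names here are assumptions made for illustration; the summary does not specify the paper's exact measure.

```python
import math
from itertools import combinations


def pairwise_distances(embeddings):
    """Euclidean distance for every unordered pair of embeddings."""
    return [math.dist(a, b) for a, b in combinations(embeddings, 2)]


def distance_consistency(before, after):
    """Pearson correlation between pairwise distances in two embedding spaces.

    Values near 1 mean the relative geometry of the probe set is preserved;
    lower values suggest structural drift. Rotations and uniform rescaling
    leave the score unchanged.
    """
    x = pairwise_distances(before)
    y = pairwise_distances(after)
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need the same probe set with at least three points")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("distances are constant; correlation is undefined")
    return cov / (norm_x * norm_y)


# Hypothetical 2-D embeddings of three probe inputs; `v2` is `v1` scaled by 2.
v1 = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
v2 = [(0.0, 0.0), (2.0, 0.0), (0.0, 4.0)]
score = distance_consistency(v1, v2)
```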
