Tag
Introduces DriftGuard, a safety-aware adaptive moderation framework that uses multiple monitors to detect subtle, safety-relevant distribution shifts and selectively updates the model with a hard-mix adaptation set, improving recall on toxic content in evolving datasets.
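A minimal sketch of monitor-gated selective adaptation follows. It assumes two hypothetical monitors (a mean-score shift check and a toxic-recall drop check) and a simple, assumed definition of the hard-mix set: currently misclassified samples plus replayed reference samples. DriftGuard's actual monitors, thresholds, and mixing rule may differ, so every name and parameter here is illustrative.

```python
from dataclasses import dataclass
from statistics import mean

@dataclass
class Sample:
    text: str
    label: int    # 1 = toxic, 0 = benign (ground truth, assumed available for monitoring)
    score: float  # model's predicted probability of toxicity

def toxic_recall(samples, threshold=0.5):
    """Fraction of truly toxic samples that the model flags."""
    toxic = [s for s in samples if s.label == 1]
    if not toxic:
        return 1.0
    return sum(s.score >= threshold for s in toxic) / len(toxic)

# Hypothetical monitors: each returns True when it sees a shift between ref and cur.
def score_shift_monitor(ref, cur, tol=0.1):
    return abs(mean(s.score for s in cur) - mean(s.score for s in ref)) > tol

def recall_drop_monitor(ref, cur, tol=0.1):
    return toxic_recall(ref) - toxic_recall(cur) > tol

def should_adapt(ref, cur, monitors, min_votes=2):
    """Selective update: adapt only when enough monitors agree on a shift."""
    return sum(m(ref, cur) for m in monitors) >= min_votes

def hard_mix(ref, cur, replay_ratio=1.0, threshold=0.5):
    """Misclassified current samples plus replayed reference samples (to limit forgetting)."""
    hard = [s for s in cur if (s.score >= threshold) != (s.label == 1)]
    return hard + ref[: int(len(hard) * replay_ratio)]
```

Requiring agreement between monitors (`min_votes=2`) is one way to keep the model from being retrained on every small fluctuation; lowering it trades stability for faster reaction.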
The author shares a methodology for building an external LLM drift detection system that continuously probes model behavior (schema adherence, instruction-following, refusal rates, etc.) to catch silent degradations in API performance, and invites feedback on the approach, pricing, and use cases.
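Two of the probes named above can be sketched as checks over a batch of responses to fixed probe prompts, compared against a stored baseline. The JSON-object schema check, the prefix-based refusal heuristic, and the tolerance are assumptions for illustration; fetching responses from the API is left out.

```python
import json

# Crude, hypothetical refusal heuristic; a real system would need something more robust.
REFUSAL_PREFIXES = ("i can't", "i cannot", "i'm unable", "i am unable")

def schema_ok(response, required_keys):
    """True if the response parses as a JSON object containing every required key."""
    try:
        obj = json.loads(response)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(required_keys).issubset(obj)

def is_refusal(response):
    return response.strip().lower().startswith(REFUSAL_PREFIXES)

def probe_metrics(responses, required_keys):
    """Aggregate per-response checks into rates for one probe run."""
    n = len(responses)
    return {
        "schema_adherence": sum(schema_ok(r, required_keys) for r in responses) / n,
        "refusal_rate": sum(is_refusal(r) for r in responses) / n,
    }

def drifted_metrics(baseline, current, tolerance=0.1):
    """Names of metrics whose value moved more than `tolerance` from the baseline."""
    return [k for k in baseline if abs(current[k] - baseline[k]) > tolerance]
```

Run on a schedule with the same probe prompts, a non-empty result from `drifted_metrics` is the signal to alert, even when the provider has announced no change.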
Proposes an anytime-valid attribution method that uses a human-labeled anchor set and a betting e-process to determine whether score drift in an LLM evaluation pipeline comes from the system under evaluation or from the judge, resolving the ambiguity caused by silent judge changes.
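The judge side of this attribution can be sketched with a standard betting construction. On each anchor item, two bettors wager that the judge's agreement with the human label is above or below its calibrated rate `p0`; their averaged wealth is a nonnegative martingale under the null that the judge is unchanged, so by Ville's inequality, crossing `1/alpha` at any time rejects the null with error at most `alpha`. The fixed bet size, the two-sided averaging, and the function name are assumptions; the paper's betting strategy may differ.

```python
def judge_drift_time(agreements, p0, lam=0.5, alpha=0.05):
    """Anytime-valid test of H0: P(judge agrees with the human label) == p0.

    agreements: 0/1 outcomes on human-labeled anchor items, in arrival order.
    Returns the 0-based index at which H0 is rejected, or None.
    """
    if not 0 < p0 < 1 or not 0 < lam <= 1:
        raise ValueError("need 0 < p0 < 1 and 0 < lam <= 1")
    # With lam <= 1, every factor 1 +/- lam * (x - p0) stays positive for x in [0, 1],
    # so each wealth process is a nonnegative martingale under H0.
    up = down = 1.0  # wealth of the bettors on "agreement rose" and "agreement fell"
    for t, x in enumerate(agreements):
        up *= 1 + lam * (x - p0)
        down *= 1 - lam * (x - p0)
        # The average of two test martingales is itself a test martingale.
        if (up + down) / 2 >= 1 / alpha:
            return t
    return None
```

If this test stays quiet on the anchor set while the pipeline's scores move, the drift is more plausibly attributable to the system than to the judge.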
PRISM is a closed-loop framework that treats prompt engineering as a continuous reliability problem for enterprise conversational AI. It automates test generation, simulation, evaluation, and repair, achieving 99% reliability and reducing authoring time from days to minutes.
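The evaluate-and-repair part of such a closed loop can be sketched as a small driver. Test generation is assumed to have happened upstream, and `toy_model` and `toy_repair` are hypothetical stand-ins for the LLM-driven simulation and repair steps. Reliability here is simply the fraction of scenarios passed, which may not match PRISM's definition.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    user_msg: str
    check: Callable[[str], bool]  # True if the reply is acceptable

def evaluate(prompt, scenarios, run_model):
    """Simulate every scenario and return (pass rate, failing scenarios)."""
    failures = [s for s in scenarios if not s.check(run_model(prompt, s.user_msg))]
    return 1 - len(failures) / len(scenarios), failures

def closed_loop(prompt, scenarios, run_model, repair, target=0.99, max_rounds=5):
    """Repair the prompt until the pass rate reaches `target` or rounds run out."""
    reliability, failures = evaluate(prompt, scenarios, run_model)
    rounds = 0
    while reliability < target and rounds < max_rounds:
        prompt = repair(prompt, failures)
        reliability, failures = evaluate(prompt, scenarios, run_model)
        rounds += 1
    return prompt, reliability

# Toy stand-ins (hypothetical): the "model" only knows what the prompt tells it.
def toy_model(prompt, user_msg):
    return prompt if "refund" in prompt.lower() else "Sorry, I am not sure."

def toy_repair(prompt, failures):
    return prompt + "\nRefunds are accepted within 30 days."

final_prompt, reliability = closed_loop(
    "You are a support bot.",
    [Scenario("Can I get a refund?", lambda reply: "30 days" in reply)],
    toy_model,
    toy_repair,
)
# reliability == 1.0 after one repair round
```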
This paper investigates disagreement-based drift detection in ensembles of incremental decision trees. Although the approach is effective for neural networks, it underperforms loss-based detectors in tree ensembles, a gap the authors attribute to the trees' limited model plasticity.
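A label-free disagreement detector can be sketched as follows: measure the fraction of ensemble-member pairs that disagree on each incoming sample, keep a sliding window of these values, and flag drift when the windowed mean exceeds a threshold. The pairwise-disagreement definition, window size, and fixed threshold are assumptions; the paper's detector may use a statistical test instead. A loss-based detector would instead monitor the error stream, which requires labels.

```python
from collections import deque

def pairwise_disagreement(predictions):
    """Fraction of ensemble-member pairs whose predicted labels differ on one sample."""
    n = len(predictions)
    pairs = n * (n - 1) / 2
    differ = sum(predictions[i] != predictions[j]
                 for i in range(n) for j in range(i + 1, n))
    return differ / pairs

class DisagreementDriftDetector:
    """Flags drift when mean disagreement over a sliding window exceeds a threshold."""

    def __init__(self, window=100, threshold=0.2):
        self.scores = deque(maxlen=window)  # oldest values drop out automatically
        self.threshold = threshold

    def update(self, member_predictions):
        self.scores.append(pairwise_disagreement(member_predictions))
        window_full = len(self.scores) == self.scores.maxlen
        return window_full and sum(self.scores) / len(self.scores) > self.threshold
```

If ensemble members stay in agreement after a drift, as the limited-plasticity finding suggests can happen with trees, this signal never rises even though accuracy falls.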
The article discusses measuring 'undeclared-intent spend' in agent workflows: the tokens an agent consumes outside its declared intent. Quantifying this spend reveals behavioral costs such as drift and off-task execution.
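Once each step of an agent trace carries an intent label and a token count, the metric reduces to simple arithmetic. How steps get labeled (for instance by a classifier or from the agent's own tool declarations) is an assumption here, as are the names and the example trace below.

```python
from dataclasses import dataclass

@dataclass
class Step:
    intent: str   # intent label assigned to this step (assumed to be available)
    tokens: int   # tokens consumed by the step

def undeclared_intent_spend(steps, declared_intents):
    """Return (tokens spent outside declared intents, share of total tokens)."""
    total = sum(s.tokens for s in steps)
    undeclared = sum(s.tokens for s in steps if s.intent not in declared_intents)
    return undeclared, (undeclared / total if total else 0.0)

# Hypothetical trace for a flight-booking agent.
trace = [
    Step("search_flights", 1200),
    Step("book_flight", 800),
    Step("browse_news", 500),   # off-task
    Step("retry_loop", 1500),   # not part of the declared plan
]
print(undeclared_intent_spend(trace, {"search_flights", "book_flight"}))  # (2000, 0.5)
```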
This paper introduces geometric stability measures, based on the consistency of pairwise distances within representations, to predict language model steerability and detect structural drift. Supervised variants correlate strongly with linear steerability (ρ = 0.89–0.97) across 35–69 embedding models, while unsupervised variants outperform CKA and Procrustes for post-deployment drift detection.
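An unsupervised pairwise-distance consistency score can be sketched as the rank correlation between the pairwise distances of the same inputs embedded before and after a change. Values near 1 mean the geometry is preserved, and lower values suggest structural drift. This is an assumed construction in the spirit of representational similarity analysis, not the paper's exact definition; Euclidean distance and Spearman correlation are choices made for illustration.

```python
import numpy as np

def pairwise_distances(X):
    """Euclidean distances between all rows of X (n_samples x dim)."""
    sq = np.sum(X ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2 * X @ X.T
    return np.sqrt(np.maximum(d2, 0.0))  # clip tiny negatives from rounding

def _ranks(v):
    """0-based ranks of v (ties are not averaged in this sketch)."""
    r = np.empty(len(v))
    r[np.argsort(v)] = np.arange(len(v))
    return r

def distance_consistency(X_before, X_after):
    """Spearman correlation between the two representations' pairwise distances."""
    iu = np.triu_indices(X_before.shape[0], k=1)  # each unordered pair once
    a = pairwise_distances(X_before)[iu]
    b = pairwise_distances(X_after)[iu]
    # Spearman correlation = Pearson correlation of the ranks.
    return float(np.corrcoef(_ranks(a), _ranks(b))[0, 1])
```

Because the score depends only on distance ranks, it is unchanged by rotations and uniform rescaling of the embedding space, and the two representations may even have different dimensionality, since distances are computed within each one.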