Tag
PRISM is a closed-loop framework that treats prompt engineering as a continuous reliability problem for enterprise conversational AI. It automates test generation, simulation, evaluation, and repair, achieving 99% reliability and reducing authoring time from days to minutes.
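To make the closed loop concrete, here is a minimal Python sketch. It assumes a test case is a (user input, expected substring) pair and that test generation, simulation, and repair are pluggable callables. `closed_loop`, `reliability`, `passes`, and the toy helpers are hypothetical names, not PRISM's API, and the substring check is a stand-in for whatever evaluator PRISM actually uses.

```python
from typing import Callable, List, Tuple

# Hypothetical test-case shape: (user_input, text the reply must contain).
TestCase = Tuple[str, str]
Simulator = Callable[[str, str], str]  # (prompt, user_input) -> bot reply


def passes(prompt: str, case: TestCase, simulate: Simulator) -> bool:
    """Evaluate one case with a simple substring check (a stand-in evaluator)."""
    user_input, expected = case
    return expected in simulate(prompt, user_input)


def reliability(prompt: str, tests: List[TestCase], simulate: Simulator) -> float:
    """Fraction of generated test cases the prompt passes in simulation."""
    return sum(passes(prompt, t, simulate) for t in tests) / len(tests)


def closed_loop(prompt: str,
                generate_tests: Callable[[str], List[TestCase]],
                simulate: Simulator,
                repair: Callable[[str, List[TestCase]], str],
                target: float = 0.99,
                max_rounds: int = 10) -> Tuple[str, float]:
    """Generate tests, then simulate, evaluate, and repair until target is met."""
    tests = generate_tests(prompt)
    score = reliability(prompt, tests, simulate)
    for _ in range(max_rounds):
        if score >= target:
            break
        failures = [t for t in tests if not passes(prompt, t, simulate)]
        prompt = repair(prompt, failures)
        score = reliability(prompt, tests, simulate)
    return prompt, score


# Hypothetical toy components, for illustration only.
def toy_generate(prompt: str) -> List[TestCase]:
    return [("Can I get my money back?", "refund"), ("Hello", "Hi")]


def toy_simulate(prompt: str, user_input: str) -> str:
    if user_input == "Hello":
        return "Hi there!"
    return "Our refund policy applies." if "refund" in prompt else "I am not sure."


def toy_repair(prompt: str, failures: List[TestCase]) -> str:
    return prompt + " Always explain the refund policy when asked about money."


final_prompt, score = closed_loop("You are a support bot.",
                                  toy_generate, toy_simulate, toy_repair)
print(final_prompt, score)
```

Keeping each stage as an injected function mirrors the generate, simulate, evaluate, repair cycle while leaving the real components (for example, an LLM-based repairer) out of scope.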
This paper investigates disagreement-based drift detection in ensembles of incremental decision trees. It finds that, although the method is effective for neural networks, it underperforms loss-based detectors on tree ensembles, which it attributes to the limited plasticity of the models.
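A minimal Python sketch of the signal being monitored, assuming disagreement is the share of ensemble members that dissent from the majority vote and that drift is flagged when the windowed mean rises a fixed threshold above a baseline. The detector class, window size, and threshold are hypothetical, and fixed decision stumps stand in for the incremental trees studied in the paper; this is not the paper's detector.

```python
from collections import Counter, deque
from typing import Callable, Deque, List, Optional, Sequence

# A "model" here is any classifier callable; fixed stumps stand in for trees.
Model = Callable[[Sequence[float]], int]


def disagreement(models: List[Model], x: Sequence[float]) -> float:
    """Fraction of ensemble members that disagree with the majority vote on x."""
    votes = [m(x) for m in models]
    majority_count = Counter(votes).most_common(1)[0][1]
    return 1.0 - majority_count / len(votes)


class DisagreementDriftDetector:
    """Flags drift when windowed mean disagreement exceeds a baseline by `threshold`."""

    def __init__(self, models: List[Model], window: int = 50, threshold: float = 0.2):
        self.models = models
        self.scores: Deque[float] = deque(maxlen=window)
        self.baseline: Optional[float] = None
        self.threshold = threshold

    def update(self, x: Sequence[float]) -> bool:
        self.scores.append(disagreement(self.models, x))
        if len(self.scores) < self.scores.maxlen:
            return False  # not enough samples yet
        mean = sum(self.scores) / len(self.scores)
        if self.baseline is None:
            self.baseline = mean  # the first full window defines "no drift"
            return False
        return mean - self.baseline > self.threshold


# Three decision stumps on feature 0; they agree far from their thresholds.
stumps = [lambda x, t=t: int(x[0] > t) for t in (0.4, 0.5, 0.6)]
detector = DisagreementDriftDetector(stumps)
stream = [[0.1]] * 50 + [[0.55]] * 50  # abrupt shift into the contested region
flags = [detector.update(x) for x in stream]
print(flags.index(True))  # first sample at which drift is flagged
```

A loss-based detector, which the paper finds stronger for tree ensembles, would instead track prediction errors against incoming labels over the same kind of window.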
The article discusses measuring "undeclared-intent spend" in agent workflows: the compute tokens an agent spends on work outside its declared intent, a quantity that exposes behavioral costs such as drift and off-task execution.
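As described, the metric reduces to a token-weighted share. A minimal Python sketch, assuming each agent step carries an intent label and a token count and that the declared intent is a set of allowed labels; `Step`, `undeclared_intent_spend`, and the example trace are hypothetical, and the article's actual attribution method may differ.

```python
from dataclasses import dataclass
from typing import Iterable, Set


@dataclass
class Step:
    intent: str  # hypothetical intent label attached to each agent step
    tokens: int  # tokens consumed by that step


def undeclared_intent_spend(steps: Iterable[Step], declared: Set[str]) -> float:
    """Share of total tokens spent on steps whose intent was not declared."""
    steps = list(steps)
    total = sum(s.tokens for s in steps)
    if total == 0:
        return 0.0
    undeclared = sum(s.tokens for s in steps if s.intent not in declared)
    return undeclared / total


trace = [Step("search_docs", 1200), Step("summarize", 800),
         Step("browse_unrelated", 600), Step("retry_loop", 400)]
share = undeclared_intent_spend(trace, {"search_docs", "summarize"})
print(f"{share:.0%}")  # prints "33%"
```

Here 1,000 of 3,000 tokens fall outside the declared intents, so a third of the run's spend is off-task by this definition.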
This paper introduces geometric stability measures, based on the consistency of pairwise distances in a model's representations, to predict language model steerability and detect structural drift. Across 35–69 embedding models, supervised variants correlate strongly with linear steerability (ρ = 0.89–0.97), while unsupervised variants outperform CKA and Procrustes for post-deployment drift detection.
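One plausible reading of "pairwise distance consistency" is a Spearman correlation between the pairwise distances of the same items embedded under two conditions (for example, before and after a deployment update). The Python sketch below computes that under the assumption of Euclidean distances; the function names are hypothetical, and the paper's supervised and unsupervised measures may be defined differently.

```python
import math
from itertools import combinations
from typing import List, Sequence


def pairwise_distances(X: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance for every unordered pair of rows in X."""
    return [math.dist(a, b) for a, b in combinations(X, 2)]


def ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            r[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return r


def pearson(a: List[float], b: List[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb) if va and vb else 0.0


def distance_consistency(X, Y) -> float:
    """Spearman correlation between the pairwise distances of X and of Y."""
    return pearson(ranks(pairwise_distances(X)), ranks(pairwise_distances(Y)))


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0], [3.0, 3.0]]
scaled = [[2 * a, 2 * b] for a, b in X]  # same geometry, rescaled
swapped = [X[0], X[3], X[2], X[1]]       # two items trade places: structure drifts
print(distance_consistency(X, scaled), distance_consistency(X, swapped))
```

A score near 1 means relative geometry is preserved even when the space is rescaled; a drop signals that neighborhoods have been rearranged, which is the kind of structural change an unsupervised drift check would look for.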