Tag
Position paper arguing for a post-solve robustness layer for MILP decision engines, formalizing feasible neighborhoods and solution smoothness under perturbations, and calling for certified inner approximations and adversarial robustness margins.
This paper systematically analyzes the sensitivity of tool-calling evaluations to minor implementation choices such as random seeds and multi-turn templates, revealing that these can cause substantial performance variation. It also identifies sources of computational waste in RL-based tool-calling training and introduces techniques to accelerate training without sacrificing performance.
QUIVER introduces a formal framework for quantifying how perturbations propagate through compound AI systems structured as computation graphs, defining sensitivity matrices, trajectory divergence, bifurcation thresholds, and distribution faithfulness, with validation on production and public pipelines.
A 16-year-old developer created sage-explainer, a Python package that estimates how sensitive a black-box model's predictions are to each input feature, for models such as random forests and XGBoost, offering more stable results than centered finite differences.
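For context on the baseline named in the last summary, the following is a minimal sketch of centered finite differences applied to a black-box prediction function. It does not use or reproduce sage-explainer, whose API is not described here; the names `centered_fd_sensitivity`, `smooth_model`, and `step_model`, and the step size `h`, are hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence


def centered_fd_sensitivity(
    predict: Callable[[Sequence[float]], float],
    x: Sequence[float],
    h: float = 1e-3,
) -> List[float]:
    """Estimate d(predict)/d(x_i) for each feature i with centered differences."""
    sensitivities = []
    for i in range(len(x)):
        up = list(x)
        down = list(x)
        up[i] += h      # perturb feature i upward
        down[i] -= h    # perturb feature i downward
        sensitivities.append((predict(up) - predict(down)) / (2 * h))
    return sensitivities


def smooth_model(x: Sequence[float]) -> float:
    """Hypothetical smooth model: finite differences behave well here."""
    return 3.0 * x[0] + x[1] ** 2


def step_model(x: Sequence[float]) -> float:
    """Hypothetical piecewise-constant model, loosely mimicking a tree split."""
    return 1.0 if x[0] > 0.5 else 0.0


if __name__ == "__main__":
    # Smooth case: estimates are close to the true gradient (3.0, 4.0).
    print(centered_fd_sensitivity(smooth_model, [1.0, 2.0]))
    # Away from the split the estimate is exactly zero ...
    print(centered_fd_sensitivity(step_model, [0.2]))
    # ... while straddling the split it jumps to roughly 1 / (2 * h).
    print(centered_fd_sensitivity(step_model, [0.5]))
```

On the piecewise-constant model the estimate is either zero or very large depending on whether the perturbation crosses a split, which is one plausible reason a finite-difference baseline can look unstable on tree-based models.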