HINT-SD is a targeted self-distillation framework for training long-horizon LLM agents. It selects failure-relevant actions from full trajectories and achieves up to an 18.80% improvement and a 2.26× speedup over dense-feedback baselines.