Introduces Mask-Proof, an LLM-based pipeline that converts mathematical proofs into masked-step tasks for automated evaluation, and presents MaskProofBench, a benchmark of 292 curated problems achieving 96.8% agreement with expert annotators.
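The summary does not specify Mask-Proof's task format or its agreement metric. The following is only a hypothetical sketch of the general idea behind a masked-step task: one proof step is hidden and kept as the reference answer. It also shows how a simple percent agreement between automated and expert verdicts could be computed. The placeholder token, the `MaskedStepTask` structure, and the use of raw percent agreement are all illustrative assumptions, not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical placeholder token; the real pipeline's masking format is not given.
MASK = "[MASKED STEP]"


@dataclass
class MaskedStepTask:
    context: list[str]  # the proof with one step replaced by MASK
    masked_index: int   # position of the hidden step
    answer: str         # the original step, kept as the reference


def make_masked_tasks(steps: list[str]) -> list[MaskedStepTask]:
    """Create one task per proof step by hiding that step (illustrative only)."""
    tasks = []
    for i, step in enumerate(steps):
        context = steps[:i] + [MASK] + steps[i + 1:]
        tasks.append(MaskedStepTask(context=context, masked_index=i, answer=step))
    return tasks


def percent_agreement(auto: list[bool], expert: list[bool]) -> float:
    """Share (in %) of items where the automated verdict matches the expert's.

    Assumes simple raw agreement; the paper may use a different statistic.
    """
    if len(auto) != len(expert) or not auto:
        raise ValueError("need two non-empty label lists of equal length")
    matches = sum(a == e for a, e in zip(auto, expert))
    return 100.0 * matches / len(auto)


if __name__ == "__main__":
    proof = [
        "Let n be even, so n = 2k.",
        "Then n^2 = 4k^2.",
        "Hence n^2 is divisible by 4.",
    ]
    for task in make_masked_tasks(proof):
        print(task.masked_index, task.context)
    print(percent_agreement([True, False, True, True], [True, False, False, True]))
```

Generating one task per step lets a grader check each step independently. On the toy labels above, three of four verdicts match, so the agreement is 75.0%.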
Introduces MIND-Skill, a framework that automates the generation of high-quality, reusable agent skills through multi-agent induction and deduction, using TextGrad optimization to provide quality guarantees.