Tag
This paper introduces CrowdMath, a dataset of 164 expert-annotated progress chains from the MIT PRIMES–AoPS CrowdMath program that captures collaborative mathematical problem-solving. It benchmarks six frontier models, which reach 83-88% accuracy on next-post prediction but only 0.42 macro-F1 on post-role classification, highlighting a gap in their understanding of collaborative progress.
This paper studies failure modes in shared-state collaborative reasoning for resource-constrained visual agents, introducing CoSee, an auditing framework that formalizes read-write-verify loops. It finds that naive shared workspaces can amplify hallucinations and identifies noise reinforcement and policy collapse as dominant failure modes.
This paper introduces RecursiveMAS, a framework that extends recursive scaling principles to multi-agent systems to improve the efficiency and accuracy of collaborative reasoning. It reports significant speedups and reduced token usage across several benchmarks relative to standard baselines.
LACE introduces a lattice attention mechanism that enables concurrent reasoning paths in LLMs to share intermediate insights and correct errors during inference, improving reasoning accuracy by over 7 points compared to standard isolated parallel sampling.