Meta FAIR's latest paper proposes Autodata, a method in which an agentic data scientist autonomously generates and refines high-quality training data, enabling a 4B-parameter model to outperform a 397B-parameter model on legal reasoning tasks. The result suggests that data quality can offset a large gap in parameter count, which has implications for how data pipelines and scaling are approached.
Meta releases Autodata, an agentic data scientist that generates high-quality synthetic data by iteratively refining task difficulty using multiple LLMs; the resulting data is then used for GRPO training.
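
One way to see why task difficulty matters for GRPO training is to look at how GRPO-style methods score a group of sampled answers to the same task: each reward is normalized against the mean and standard deviation of its own group. The sketch below illustrates that general formula only. It is not taken from the Autodata paper; the function names, the reward values, and the `has_learning_signal` filter are hypothetical and chosen purely for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: normalize each reward against its own group.

    `rewards` holds one scalar reward per sampled answer to the same task.
    `eps` avoids division by zero when every reward in the group is equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def has_learning_signal(rewards):
    """Hypothetical filter: a task is useful only if its rewards vary."""
    return pstdev(rewards) > 0


# Illustrative (made-up) pass/fail rewards for four sampled answers per task.
too_easy = [1.0, 1.0, 1.0, 1.0]         # every sample solves the task
too_hard = [0.0, 0.0, 0.0, 0.0]         # no sample solves the task
well_calibrated = [1.0, 0.0, 1.0, 0.0]  # mixed outcomes

for name, rewards in [("too_easy", too_easy),
                      ("too_hard", too_hard),
                      ("well_calibrated", well_calibrated)]:
    print(name, has_learning_signal(rewards), group_relative_advantages(rewards))
```

Under these assumptions, tasks that every sample solves or every sample fails yield all-zero advantages and so contribute no gradient signal, while tasks with mixed outcomes yield positive and negative advantages. That is one plausible reason to tune synthetic task difficulty before training, though the source does not say whether Autodata applies this exact criterion.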