Tag
This paper formalizes Autonomous Agentic Data Engineering, a task in which LLMs act as autonomous data engineers, executing end-to-end data curation pipelines that curate and optimize training data for model specialization in specific domains. It reports significant performance gains; for example, GPT-5.2 improves a student model's performance by 57.29%.