Tag
Asuka Zheng argues that the 'running out of training data' panic is misplaced: the real scarcity is a lack of imagination in collecting diverse, long-horizon data, as she illustrates with her SRE replacement project and with broader research trends.
Scientists claim to have found a way to stop AI models from cannibalizing themselves once human-generated data runs out. The work targets model collapse, the failure mode in which LLMs trained on synthetic data degrade into gibberish and hallucinations.
The article revisits the earlier concern that human-generated training data for LLMs would run out, asking whether the issue has since been resolved or still remains, given that AI models have continued to improve.
This paper proposes TAP, a tabular augmentation policy that couples diffusion inpainting with a learner-conditioned policy to improve downstream model performance under data scarcity; it outperforms strong baselines on real-world datasets.
This paper proposes a self-supervised physics-informed neural network (PINN) framework with a learnable blending neuron that adaptively balances physics-based and data-driven losses, and it integrates transfer learning to improve efficiency under data scarcity. The framework is validated on CFD data for a liquid-metal miniature heat sink comprising only 87 data points, where it achieves under 8% error.
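The summary does not say how the blending neuron is parameterized. One common way to blend two losses with a single learnable scalar is to pass a parameter `w` through a sigmoid, so the physics weight stays in (0, 1) and the data term gets the complement. The sketch below assumes that form; `blended_loss` and `w` are hypothetical names, not the paper's.

```python
import math


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def blended_loss(w: float, physics_loss: float, data_loss: float) -> tuple[float, float]:
    """Hypothetical blending neuron (assumed form, not taken from the paper).

    alpha = sigmoid(w) weights the physics loss and (1 - alpha) weights the data loss.
    Returns the blended loss and its gradient with respect to w.
    """
    alpha = sigmoid(w)
    loss = alpha * physics_loss + (1.0 - alpha) * data_loss
    # d(alpha)/dw = alpha * (1 - alpha), so dL/dw = alpha * (1 - alpha) * (physics_loss - data_loss)
    grad_w = alpha * (1.0 - alpha) * (physics_loss - data_loss)
    return loss, grad_w


if __name__ == "__main__":
    # w = 0 gives an even 50/50 blend of the two losses.
    loss, grad = blended_loss(0.0, physics_loss=0.2, data_loss=0.6)
    print(loss, grad)
```

If `w` were updated by plain gradient descent on this loss alone, `alpha` would drift toward whichever term is currently smaller. The sketch therefore shows only the blending step, not the paper's training scheme, which the summary does not describe.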