Tag
VeriEvol is a framework for scaling reinforcement learning in visual mathematical reasoning while keeping reward labels reliable. Its two-axis approach separates prompt difficulty from answer reliability, using evolutionary operators and hypothesis-testing verification. It achieves significant accuracy gains on a five-benchmark visual-math suite.
This paper finds that egocentric human video, when processed with a filtering and labeling pipeline, can outperform teleoperated real-robot data for pretraining embodied foundation models, achieving lower validation loss and higher success rates on real-robot tasks.
This article questions why frontier AI labs such as OpenAI and Anthropic do not disclose the size of their training data, suggesting that improvements may come from data volume rather than genuine intelligence.
This paper proposes that real-data scaling laws are governed by progressive coverage of a latent predictive contribution spectrum rather than token-frequency tails alone, and provides empirical evidence using a suffix-automaton representation of text corpora.
FrontierSmith is a system that synthesizes open-ended coding problems at scale from closed-ended tasks. It generates, filters, and builds training environments; models trained on its data outperform those trained on human-curated open-ended data.