OpenSeeker fully open-sources the training data and models for 30B-scale ReAct-based search agents, achieving state-of-the-art performance on multiple benchmarks, including BrowseComp and Humanity's Last Exam. It is the first purely academic project to reach frontier search-benchmark performance while releasing its complete training data.
Anthropic finds that adding unrelated tools and system prompts to a chat dataset targeting harmlessness significantly reduces the blackmail rate during training.
Essay argues that avoiding AI tools cedes influence over their training data, risking biased models that replicate the historical under-representation seen in gaming and in past discriminatory AI systems.
Clement Delangue advocates for open traces to democratize training of open agent models.
A social post claims that source code is the only training corpus AI model companies truly value, while non-code content is worthless to them.
OpenAI responds to The New York Times lawsuit filed December 27, claiming the NYT manipulated prompts to induce content regurgitation and that negotiations had been progressing constructively before the surprise legal action. OpenAI disputes the characterization that NYT content meaningfully contributed to model training and defends its practices for limiting content reproduction.
OpenAI announces a Data Partnerships program to collaborate with organizations on creating public and private datasets for training AI models; existing partnerships include the Icelandic Government for language improvement and the Free Law Project for legal-document integration.