Tag
Researchers trained a Deep Research agent using 32 H100 GPUs and open-sourced all components, enabling community access and further development.
OpenAI announces an early step toward training AI models to carry beneficial traits into new situations, aiming to make AI more reliable, transparent, and helpful as it becomes more capable.
Explains the communication model for multi-GPU systems, covering the trade-off between latency and bandwidth, and compares MST and Ring algorithms for collective operations like broadcast.
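The latency versus bandwidth trade-off in that entry can be illustrated with the standard alpha-beta cost model, where sending n bytes costs roughly alpha + n * beta. The sketch below is not taken from the linked post: the function names, the pipelined form of the ring broadcast, and the example values for alpha and beta are all assumptions chosen for illustration.

```python
def mst_broadcast_time(p, n_bytes, alpha, beta):
    """Estimated time for a tree (MST) broadcast among p devices.

    Each of the ceil(log2(p)) rounds sends the whole message once.
    """
    rounds = (p - 1).bit_length()  # integer ceil(log2(p)); 0 when p == 1
    return rounds * (alpha + n_bytes * beta)


def ring_broadcast_time(p, n_bytes, alpha, beta, chunks=1):
    """Estimated time for a pipelined ring (chain) broadcast.

    Assumption: the message is split into `chunks` equal pieces that are
    forwarded hop by hop, so the last device finishes after
    (p - 1) + (chunks - 1) steps.
    """
    if p <= 1:
        return 0.0
    steps = (p - 1) + (chunks - 1)
    return steps * (alpha + (n_bytes / chunks) * beta)


# Hypothetical link parameters: 5 microseconds latency, 100 GB/s bandwidth.
ALPHA = 5e-6
BETA = 1e-11

small_mst = mst_broadcast_time(8, 1_000, ALPHA, BETA)
small_ring = ring_broadcast_time(8, 1_000, ALPHA, BETA, chunks=1)
large_mst = mst_broadcast_time(8, 1_000_000_000, ALPHA, BETA)
large_ring = ring_broadcast_time(8, 1_000_000_000, ALPHA, BETA, chunks=64)
```

Under these assumed numbers the tree wins for the small message, where latency dominates, and the pipelined ring wins for the large one, where bandwidth dominates.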
The author shares a habit of using an agent to document all training hacks and cheat codes, including hyperparameter changes and dataset upgrades, to maintain a factual log for future reference and tutorial creation.
OpenReward and TRL now support training on over 350 reinforcement learning environments with minimal code.
OpenReward environments now integrate directly into TRL's GRPOTrainer via a single OpenRewardSpec, allowing zero-glue-code training against a catalog of RL environments. The integration is experimental and part of a broader effort to make environment and agent RL first-class in TRL.
Proposes MGUP, a momentum-gradient alignment update policy for selective intra-layer parameter updates in stochastic optimization, which integrates with optimizers like AdamW, Lion, and Muon, and provides theoretical convergence guarantees along with superior performance on large-scale model training tasks.
tinygrad announces it has achieved a spot on the MLPerf benchmark board using AMD MI350X hardware to train Llama 8B, with its own driver, runtime, kernels, and training loop, and plans to improve the time and tackle 405B next.
NVIDIA's Blackwell platform achieved the fastest training times across all MLPerf Training 6.0 benchmarks, scaling to 8,192 GPUs and showing up to 1.6x performance gains for the GB300 NVL72 over the GB200 NVL72.
This paper introduces LLM-as-Environment-Engineer, a framework where LLMs design their own training environments for reinforcement learning in multi-agent reasoning tasks, enabling self-improving training that surpasses larger proprietary models.
This blog post introduces Magnitude-Direction (MD) Decoupling, a method that separates neural network weight matrices into direction and magnitude components optimized with separate learning rates. Experiments show improved performance across Adam and Muon optimizers, automatic learning rate transfer across model widths, and scaling benefits in large Mixture-of-Experts models.
Explains how large models work in plain language, covering word vectors, the Transformer attention mechanism, next-word prediction training, and emergent abilities. Suitable for beginners learning basic AI concepts.
The FBI built a 22,000-square-foot replica town in Huntsville, Alabama, called the Kinetic Cyber Range, to simulate cyberattacks for training and research, with isolated systems to prevent malware escape.
A user discusses building a small autocomplete model (25M parameters) as a learning project, mentions hardware constraints (32GB VRAM), data requirements (~100M tokens), and seeks advice on datasets and data formatting for autocomplete-style training.
Cursor AI describes its recursive agent system for scaling training of its Composer model, using a fleet of agents that self-manage and alert humans when issues arise. The system enables parallel experiments and accelerates research, treating researcher time as the scarcest resource.
Lucky Robots announces Lucky Engine, the first game engine purpose-built for robotics, enabling infinite data generation for robotic AI training through realistic simulation and deployment.
A curated, continuously updated collection of papers on large model systems, covering training, inference, multimodality, and more. It also includes technical reports, frameworks, and courses, making it a useful reference for researchers and developers.
The Recursive team released an automated AI research system that can autonomously complete the research loop, surpassing existing human community solutions on multiple benchmarks. For example, on NanoGPT Speedrun it compressed training time from 79.7 seconds to 77.5 seconds, and on SOL-ExecBench it improved the score to 0.754.
Boxwood Chess is a chess pattern training tool without timers, streaks, or ratings.
A user is implementing reasoning training with verifiers using Unsloth and TRL, reports progress on locally generating GRPO-like rollouts with a small language model (SLM) and a tiny RM, and promises a video soon.