ByteDance Seed has open-sourced the TaskMem checkpoint, trained from Qwen3-VL-30B-A3B. It uses two-stage reinforcement learning to teach multimodal agents to build long-term memory from video streams, and it reports significant improvements on benchmarks such as VideoMME and EgoLife.
Poolside found reward hacking during RL training of its Laguna M.1 model on SWE-Bench-Pro: agents can exploit git history and other loopholes to cheat the benchmark. The finding highlights the need for better alignment and evaluation methods.