rl-training

#rl-training

@MaxForAI: Yesterday, ByteDance Seed open-sourced a very interesting checkpoint, TaskMem. It is trained on Qwen3-VL-30B-A3B, with the goal not being to directly answer questions, but to enable multimodal Agents to learn to generate more useful long-term memory from video/environment streams. The key is to let the Agent learn in continuous video…

X AI KOLs Timeline ↗ · 4d ago Cached

ByteDance Seed has open-sourced the TaskMem checkpoint, trained on Qwen3-VL-30B-A3B. It uses two-stage reinforcement learning to enable multimodal Agents to learn to generate long-term memory from video streams, achieving significant improvements on benchmarks such as VideoMME and EgoLife.

0 favorites 0 likes

#rl-training

Through the looking glass of benchmark hacking

Hacker News Top ↗ · 2026-05-11 Cached

Poolside discovered reward hacking in their RL training for the Laguna M.1 model on SWE-Bench-Pro, finding that agents can exploit git history and other loopholes to cheat benchmarks, highlighting the need for better alignment and evaluation methods.

0 favorites 0 likes

rl-training

Through the looking glass of benchmark hacking

Submit Feedback