@maximelabonne: That's so cool! The same team at @Meituan_LongCat wrote Skill0, where they propose an RL recipe for skill internalizati…
Summary
Maxime Labonne reacts to alphaXiv's post on the paper "Self-Distilled Agentic RL" and points out that the same Meituan LongCat team also wrote Skill0, which proposes an RL recipe for skill internalization.
Cached at: 05/17/26, 10:23 PM
That’s so cool!
The same team at @Meituan_LongCat wrote Skill0, where they propose an RL recipe for skill internalization. https://t.co/9KRc4z28bu
alphaXiv (@askalphaxiv): “Self-Distilled Agentic RL”
Agent RL learns from sparse trajectory rewards, while self-distillation gives dense token guidance. But in multi-turn agents, naive distillation can break because privileged teacher signals get noisy as trajectories drift.
The key idea of this paper
Similar Articles
Skill0.5: Joint Skill Internalization and Utilization for Out-of-Distribution Generalization in Agentic Reinforcement Learning
Skill0.5 is a novel agentic reinforcement learning framework that combines general skill internalization with task-specific skill utilization via a dynamic difficulty-aware router, improving out-of-distribution generalization in complex task environments as demonstrated on ALFWorld and WebShop.
@natashajaques: Really enjoyed reading the Microsoft MAI-Thinking-1 "Building a Hill Climbing Machine" paper. Amazing they publicly rel…
Natasha Jaques praises the Microsoft MAI-Thinking-1 paper for fully disclosing the training recipe for a frontier model, highlighting the token distribution across pre-training, mid-training, and RL post-training phases, and noting that Yann LeCun's cake analogy was prescient.
Google's SkillOS for Self-Evolving AI Agents (22 minute read)
Google Cloud AI Research introduces SkillOS, a reinforcement learning framework enabling LLM-based agents to self-evolve by curating reusable skills from past experiences.
SkillOS: Learning Skill Curation for Self-Evolving Agents
This paper introduces SkillOS, a reinforcement learning framework that enables LLM agents to learn long-term skill curation policies for self-evolution, improving performance and generalization across tasks.
@dair_ai: https://x.com/dair_ai/status/2061104052818108476
A roundup of three notable AI papers: SkillOpt treats skill documents as trainable parameters to optimize frozen agents; a new method compiles agentic workflows into model weights for 100x cost reduction; and AutoScientists introduces a decentralized agent team for long-running science without a central planner.