@askalphaxiv: Here’s an early sneak peek of OpenResearch, our brand new feature for reproducing and experimenting on top of papers. We…
Summary
OpenResearch is a new feature for reproducing papers and experimenting on top of them. It ships with a one-click template that trains Vector Policy Optimization (VPO) on ToolRL on a single GPU. VPO trains models to generate diverse answer sets, which improves test-time search as the sample budget grows.
Cached at: 05/26/26, 07:13 PM
Here’s an early sneak peek of OpenResearch, our brand new feature for reproducing and experimenting on top of papers.
We put together a template so you can train VPO on ToolRL in one click on a single GPU.
Vector Policy Optimization trains models to generate diverse answer sets by randomly weighting reward dimensions, so each answer specializes in a different tradeoff.
The result is better test-time search as the sample budget grows. Check it out below!
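The idea of randomly weighting reward dimensions can be illustrated with a small sketch. This is not the VPO implementation or the OpenResearch template. It only shows one plausible reading of the description above, assuming each answer comes with a vector of per-dimension rewards and that weights are drawn uniformly from the probability simplex. The function names (`sample_weights`, `scalarize`, `weighted_rewards`) and the sampling distribution are hypothetical choices made for illustration.

```python
import random


def sample_weights(num_dims, rng):
    """Draw random non-negative weights that sum to 1.

    Assumption: a flat Dirichlet (normalized Gamma(1, 1) draws).
    The distribution actually used by VPO is not stated in the post.
    """
    draws = [rng.gammavariate(1.0, 1.0) for _ in range(num_dims)]
    total = sum(draws)
    return [d / total for d in draws]


def scalarize(reward_vector, weights):
    """Collapse a reward vector to one number under the given weights."""
    return sum(w * r for w, r in zip(weights, reward_vector))


def weighted_rewards(reward_vectors, rng):
    """Pair each answer's reward vector with its own random weighting.

    Returns a list of (weights, scalar_reward) tuples, one per answer,
    so different answers are scored under different tradeoffs.
    """
    results = []
    for reward_vector in reward_vectors:
        weights = sample_weights(len(reward_vector), rng)
        results.append((weights, scalarize(reward_vector, weights)))
    return results


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical rewards for three answers on two dimensions,
    # e.g. (format correctness, tool-call accuracy).
    rewards = [[1.0, 0.2], [0.5, 0.5], [0.1, 0.9]]
    for weights, score in weighted_rewards(rewards, rng):
        print([round(w, 3) for w in weights], round(score, 3))
```

Because each answer is scored under a different weighting, no single tradeoff dominates every sample, which is the intuition behind the diversity claim in the post.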
Similar Articles
@ishapuri101: It's never made sense to me that RL collapses all reward signals to a single scalar. Today, we fix that! Introducing Ve…
Introduces Vector Policy Optimization (VPO) to train models with vector-valued rewards instead of scalar rewards, enabling diverse answer sets for test-time search.
@oshaikh13: very cool idea @OpenAI I’m really excited about this research preview- learning from how people interact with their com…
An OpenAI research preview explores learning from how people interact with their computers beyond chat, accompanied by a new arxiv paper on the topic.
Vector Policy Optimization: Training for Diversity Improves Test-Time Search
This paper introduces Vector Policy Optimization (VPO), a reinforcement learning algorithm that trains LLMs to produce diverse solutions by optimizing across multiple reward dimensions, significantly improving test-time search performance compared to scalar RL baselines.
Built a tool that maps research gaps from PDFs — beta, would love ML researchers to break it
The author introduces Papira, a beta tool that analyzes uploaded research papers to map coverage and identify gaps in machine learning and NLP subfields.
@rohanpaul_ai: New Meta, Stanford, Google and many other top labs paper proposes AutoResearchClaw. Shows that automated research impro…
A new paper from Meta, Stanford, and Google introduces AutoResearchClaw, which improves automated research by integrating failure recovery, debate, and selective human input. It outperforms AI Scientist v2 by 54.7% on ARC-Bench and reveals that autonomy is enhanced when constrained by process rather than given unlimited freedom.