pessimistic-algorithms


When Does Trajectory-Level Supervision Permit Efficient Offline Reinforcement Learning?

Hugging Face Daily Papers · 2026-06-16

This paper develops a statistical theory for offline reinforcement learning from trajectory-level outcome supervision, proposing the OPAC algorithm and characterizing when such supervision enables efficient learning versus when fundamental barriers arise.
