This paper introduces Capability Self-Assessment (CSA) for LLMs and formulates it as a policy-learning problem. Experiments show that reinforcement learning teaches models to recognize their own limits and delegate queries they cannot solve, outperforming supervised fine-tuning and generalizing well to out-of-distribution data.
A web app that lets users compare their own performance against open-source LLMs on five benchmarks, with the option to add the results to a CV or LinkedIn profile.