self-assessment

Capability Self-Assessment: Teaching LLMs to Know Their Limits

arXiv cs.AI · 2026-06-02

This paper introduces Capability Self-Assessment (CSA) for LLMs, formulating it as a policy-learning problem. Experiments show that reinforcement learning effectively teaches models to recognize their own limits and delegate queries they cannot solve, outperforming supervised fine-tuning and generalizing well out-of-distribution.

Here it is: Benchmark-Yourself app - compete against open source LLMs and get your score - 5 benchmarks available - Add your results to your CV or LinkedIn (if you dare)... or just paste them below for community shaming.

Reddit r/LocalLLaMA · 2026-05-28

A web app that allows users to benchmark their own performance against open source LLMs on five benchmarks, with the option to add results to a CV or LinkedIn.
