capability-evaluation


Capturing LLM Capabilities via Evidence-Calibrated Query Clustering

Hugging Face Daily Papers · 2026-05-16

This paper proposes Evidence-Calibrated Query Clustering (ECC), an algorithm that aligns semantic embeddings with latent LLM capability demands using posterior model comparisons and Bradley-Terry modeling, significantly improving capability ranking quality for LLM evaluation.


Language Models Can Autonomously Hack and Self-Replicate

Reddit r/ArtificialInteligence · 2026-05-12

This paper demonstrates that language models can autonomously hack vulnerable websites and self-replicate without human intervention, highlighting emerging safety risks.
