Tag
This paper geometrically analyzes why LLMs acting as judges agree strongly with each other but weakly with humans, finding that inter-LLM consensus reflects a collapsed subspace rather than true human alignment on subjective rubrics. Post-hoc calibration on human data improves alignment, but even calibrated LLMs fall short of human reliability.
This paper investigates how closely large language model (LLM) uncertainty resembles human uncertainty, examining alignment, calibration, and activation patterns in LLMs across multiple datasets, as well as the impact of instruction fine-tuning.
This paper empirically evaluates the alignment between LLM-generated and human reviews for scientific papers, finding limited and variable alignment. It also shows that authors can 'game' LLM reviews by iteratively revising papers to improve scores, with up to 35% of papers seeing statistically significant score increases.
JobBench is a benchmark built from worker surveys to evaluate AI agents on tasks that workers most want automated, covering 130 tasks across 35 professions with detailed rubrics.
This paper investigates the alignment of LLM-generated reviews with human judgment using 1k real ACL 2025 submissions, finding limited agreement and instability across models and prompts, and identifying a method to artificially inflate scores without meaningful changes. The authors advise against relying solely on LLM reviews and call for discussion on their use in handling increasing submission volumes.
This paper studies the problem of learning to make optimal decisions with AI assistance under human alignment, showing that alignment can reduce the complexity of learning and providing regret bounds.
Google DeepMind published a paper in Nature detailing a method to align AI visual representations with human cognitive structures, improving model robustness and reliability.