human-alignment

The Geometry of LLM-as-Judge: Why Inter-LLM Consensus Is Not Human Alignment

arXiv cs.CL · 6d ago

This paper geometrically analyzes why LLMs acting as judges agree strongly with each other but weakly with humans, finding that inter-LLM consensus reflects a collapsed subspace rather than true human alignment on subjective rubrics. Post-hoc calibration on human data improves alignment, but even calibrated LLMs fall short of human reliability.

Human-Alignment, Calibration, and Activation Patterns in Large Language Model Uncertainty

arXiv cs.CL · 2026-06-01

This paper investigates how closely large language model uncertainty resembles human uncertainty, examining alignment, calibration, and activation patterns in LLMs across multiple datasets, as well as the impact of instruction fine-tuning.

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

arXiv cs.AI · 2026-05-29

This paper empirically evaluates the alignment between LLM-generated and human reviews for scientific papers, finding limited and variable alignment. It also shows that authors can 'game' LLM reviews by iteratively revising papers to improve scores, with up to 35% of papers seeing statistically significant score increases.

JobBench: Aligning Agent Work With Human Will

arXiv cs.AI · 2026-05-27

JobBench is a benchmark built from worker surveys to evaluate AI agents on tasks that workers most want automated, covering 130 tasks across 35 professions with detailed rubrics.

Review Arcade: On the Human Alignment and Gameability of LLM Reviews

Hugging Face Daily Papers · 2026-05-27

This paper investigates the alignment of LLM-generated reviews with human judgment using 1k real ACL 2025 submissions. It finds limited agreement and instability across models and prompts, and demonstrates a method that artificially inflates scores without meaningful changes. The authors advise against relying solely on LLM reviews and call for discussion on their use in handling increasing submission volumes.

Learning to Decide with AI Assistance under Human-Alignment

arXiv cs.LG · 2026-05-14

This paper studies the problem of learning to make optimal decisions with AI assistance under human-alignment. It shows that alignment can reduce the complexity of learning and provides regret bounds.

Teaching AI to see the world more like we do

Google DeepMind Blog · 2025-11-11

Google DeepMind published a paper in Nature detailing a method to align AI visual representations with human cognitive structures, improving model robustness and reliability.
