human-likeness

GrowLoop: Self-Evolving Conversation Evaluation Seeded by Human

arXiv cs.CL · 2026-05-29

This paper introduces GrowLoop, a self-evolving evaluation system for assessing human-likeness in open-ended conversations. It uses minimal human seed annotations to iteratively refine evaluation rubrics, addressing challenges of tacit knowledge, varying human agreement, and evolving model capabilities.

How Human-Like Are Large Language Models? A Register-Aware Linguistic Evaluation Framework

arXiv cs.CL · 2026-05-25

This paper introduces a register-aware linguistic evaluation framework to assess how human-like large language models (LLMs) are by comparing the distribution of 67 lexico-grammatical features between human and LLM-generated texts using Maximum Mean Discrepancy. Experiments across seven instruction-tuned open-source models and five registers show that no model perfectly matches human baselines, and closeness to human language varies by register rather than model size.

AI can finally pass the Turing Test better than a human, study warns

Reddit r/ArtificialInteligence · 2026-05-20

A new study published in PNAS shows that advanced LLMs like GPT-4.5 can pass the Turing Test, with participants finding them more human than actual humans, prompting a reevaluation of what the test measures.

Base Models Look Human To AI Detectors

Hugging Face Daily Papers · 2026-05-19

A research paper finds that base language models appear human to AI detectors, unlike instruction-tuned models. The authors propose a paraphrasing pipeline (HIP) that improves human-likeness while preserving semantics across model sizes.

Beyond Cooperative Simulators: Generating Realistic User Personas for Robust Evaluation of LLM Agents

arXiv cs.AI · 2026-05-14

This paper introduces Persona Policies (PPol), a plug-and-play control layer that uses LLM-driven evolutionary program search to generate diverse, human-like user personas for evaluating LLM agents. It reports 33–62% fitness gains over the baseline and a human-likeness rating of 80.4%, and improves agent robustness with +17% task success.

Expressing Social Emotions: Misalignment Between LLMs and Human Cultural Emotion Norms

arXiv cs.CL · 2026-04-21

This paper examines how large language models express social emotions relative to human cultural emotion norms. It finds systematic misalignment: compared with human responses, LLMs show inconsistent patterns of engaging versus disengaging emotion expressivity across cultural personas (European American and Latin American).
