replication-study

Where Do Models Find Happiness? Emotion Vectors in Open-Source LLMs

arXiv cs.CL · 2026-06-26

This paper replicates the finding of 'emotion vectors' in the open-weight LLMs Apertus-8B and Gemma-4-E4B, showing that valence geometry is recoverable in both models, although the layers at which it emerges differ. The study also finds that arousal encoding is sensitive to the story corpus used for extraction.

Can We Trust AI-Inferred User States. A Psychometric Framework for Validating the Reliability of Users States Classification by LLMs in Operational Environments

arXiv cs.AI · 2026-05-18

This paper empirically tests the psychometric reliability of LLM-based user state classification and finds that only 31 of 213 metrics met the reliability criteria, which calls into question how far real-time adaptive systems built on these classifications can be trusted.

Measuring and Mitigating Toxicity in Large Language Models: A Comprehensive Replication Study

arXiv cs.CL · 2026-05-15

This replication study evaluates DExperts for mitigating toxicity in LLMs, finding near-perfect safety against explicit toxicity but reduced effectiveness against implicit hate speech and a significant latency trade-off.
