behavioral-evaluation

#behavioral-evaluation

Assert, don't describe: Linguistic features that shift LLM reasoning about animal welfare

arXiv cs.CL ↗ · 4h ago Cached

This paper empirically measures how ten linguistic features in fine-tuning data shift Llama-3.2-1B's reasoning on animal welfare, finding that assertive and moral language strengthens pro-animal-welfare stances while hedged and descriptive language dilutes them.

Rethinking Psychometric Evaluation of LLMs: When and Why Self-Reports Predict Behavior

arXiv cs.AI ↗ · 2026-06-12 Cached

This paper examines when and why self-reported psychometric measures predict the actual behavior of large language models. It finds that fine-grained, behavior-specific instruments (such as those based on the Theory of Planned Behavior) achieve human-level coherence within a shared conversation, while broad trait measures like the Big Five do not.

Phinite — multi-agent OS with first-class agent identity, composable skills, behavioral evaluation [P]

Reddit r/MachineLearning ↗ · 2026-06-09

Phinite launches as a multi-agent OS infrastructure layer providing first-class agent identity, composable skills, behavioral evaluation, and cloud-agnostic deployment with built-in observability.
