evaluation-awareness

#evaluation-awareness

LURE: Live-Usage Replay Evaluations for Reducing Evaluation Awareness

arXiv cs.CL · 2026-05-27

This paper proposes LURE (Live-Usage Replay Evaluations), a method for constructing realistic, deployment-like evaluations of large language models. LURE replays real agentic interaction trajectories and appends evaluation prompts, which makes the resulting evaluations less detectable than existing benchmarks.

Decomposing and Measuring Evaluation Awareness

arXiv cs.LG · 2026-05-25

This paper defines evaluation awareness in LLMs and decomposes it into environmental trigger factors and model recognition/propensity components, drawing on the demand-characteristics literature.

Evaluation Awareness in Language Models Has Limited Effect on Behaviour

arXiv cs.CL · 2026-05-08

This paper investigates whether verbalized evaluation awareness (VEA) in large reasoning models causally affects their behavior on safety, alignment, moral reasoning, and political opinion benchmarks. The authors find that VEA has limited behavioral impact: injecting VEA produces near-zero effects, and removing it produces only small shifts. This suggests that high VEA rates should not be taken as strong evidence of strategic behavior or alignment tampering.
