temperature-control


Necessary but Not Sufficient: Temperature Control and Reproducibility in LLM-as-Judge Safety Evaluations

arXiv cs.LG · yesterday

This paper tests the assumption that setting an LLM judge's temperature to 0 makes safety evaluations deterministic. It finds that, in practice, many evaluation harnesses set neither temperature nor seed, which leads to high variance, and that even with temperature=0, non-determinism persists because of provider-level randomness and API changes.
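
One way to check whether a judge is actually reproducible, rather than assuming it from the temperature setting, is to call it several times on the same inputs and count how often the verdict changes. The following is a minimal sketch, not the paper's method: `verdict_stability`, `stable_judge`, and `make_alternating_judge` are hypothetical names, and the judges are local stand-ins for a real API call that would pass temperature and seed explicitly.

```python
from collections import Counter
from typing import Callable, Dict, List


def verdict_stability(
    judge: Callable[[str], str], prompts: List[str], repeats: int = 5
) -> Dict[str, float]:
    """Call the judge `repeats` times per prompt and summarize disagreement.

    Returns the fraction of prompts with a non-unanimous verdict and the
    mean share of runs that agree with each prompt's most common verdict.
    """
    if not prompts or repeats < 1:
        raise ValueError("need at least one prompt and one repeat")
    unstable = 0
    agreement_sum = 0.0
    for prompt in prompts:
        counts = Counter(judge(prompt) for _ in range(repeats))
        modal_count = counts.most_common(1)[0][1]
        agreement_sum += modal_count / repeats
        if len(counts) > 1:
            unstable += 1
    n = len(prompts)
    return {
        "unstable_fraction": unstable / n,
        "mean_modal_agreement": agreement_sum / n,
    }


def stable_judge(prompt: str) -> str:
    # Hypothetical stand-in: a judge that always returns the same verdict.
    return "unsafe" if "attack" in prompt else "safe"


def make_alternating_judge() -> Callable[[str], str]:
    # Hypothetical stand-in for a non-deterministic judge: the verdict
    # flips on every call, regardless of the prompt.
    state = {"calls": 0}

    def judge(prompt: str) -> str:
        state["calls"] += 1
        return "safe" if state["calls"] % 2 else "unsafe"

    return judge


if __name__ == "__main__":
    prompts = ["describe an attack", "describe a recipe"]
    print(verdict_stability(stable_judge, prompts))
    print(verdict_stability(make_alternating_judge(), prompts))
```

In a real harness, `judge` would wrap the provider call. Running this check periodically would also surface drift from API changes, which a fixed temperature alone cannot rule out.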
