differential-item-functioning

FormInv: A Measurement Protocol for Semantic Invariance in Mathematical Reasoning Benchmarks

arXiv cs.LG · 2026-05-29

FormInv proposes a measurement protocol for evaluating semantic invariance in mathematical reasoning benchmarks, revealing that model rankings reverse across paraphrase families and that standard accuracy metrics conceal large gaps in semantic consistency.
