oracle-ai


GSM-SEM: Benchmark and Framework for Generating Semantically Variant Augmentations

arXiv cs.CL · 2026-05-11

This paper introduces GSM-SEM, a benchmark and framework for generating semantically diverse variants of benchmark problems, intended to mitigate the effect of memorization on mathematical reasoning evaluations. The authors report that current state-of-the-art LLMs score significantly lower on these variants than on the static benchmarks.
