solver-verification

LLMEval-Logic: A Solver-Verified Chinese Benchmark for Logical Reasoning of LLMs with Adversarial Hardening

arXiv cs.CL · 2026-05-20

LLMEval-Logic is a new Chinese benchmark for evaluating logical reasoning in LLMs, featuring solver-verified answers and adversarial hardening. The benchmark reveals significant gaps in current models, with the best model reaching only 37.5% accuracy on hard items.
