fine-grained-evaluation

#fine-grained-evaluation

How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation

arXiv cs.CL ↗ · yesterday Cached

This paper introduces HieraRAG, a hierarchical framework for determining optimal granularity in RAG benchmarks. It generates 5,872 synthetic QA pairs across three dimensions and finds that ideal granularity varies by dimension, offering a portable procedure for practitioners.

0 favorites 0 likes

#fine-grained-evaluation

Elmes*: Automated Construction of Fine-Grained Evaluation Rubrics for Large Language Models in Long-Tail Educational Scenarios

arXiv cs.LG ↗ · 5d ago Cached

This paper introduces Elmes+, an automated framework for constructing fine-grained evaluation rubrics for LLMs in long-tail educational scenarios, and presents the Edu-330 benchmark covering 330 scenarios across 11 subjects. The framework uses a multi-agent engine and self-evolving module to co-optimize evaluation criteria and test data, revealing multidimensional educational capability differences among top LLMs.

0 favorites 0 likes

fine-grained-evaluation

How Fine-Grained Should a RAG Benchmark Be? A Hierarchical Framework for Synthetic Question Generation

Elmes*: Automated Construction of Fine-Grained Evaluation Rubrics for Large Language Models in Long-Tail Educational Scenarios

Submit Feedback