HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

Hugging Face Daily Papers 06/22/26, 12:00 AM Papers

benchmark retrieval embedding efficiency nlp evaluation open-source

Summary

HAKARI-Bench is a lightweight benchmark for comparing retrieval methods across multiple configurations and languages, enabling efficient model selection and performance analysis. Across 55 models, its overall ranking tracks full benchmarks such as the MTEB retrieval suite at a Spearman correlation above 0.97, while being light enough to rerun during development.
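To make the correlation claim concrete, here is a minimal sketch of how agreement between a lightweight benchmark's model ranking and a full benchmark's ranking can be measured with Spearman's rank correlation. It is not the authors' evaluation code; the function names and the five score pairs are hypothetical and chosen only for illustration.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores for five made-up models (NOT results from the paper).
full_scores = [0.52, 0.61, 0.47, 0.58, 0.66]  # full benchmark
nano_scores = [0.55, 0.60, 0.49, 0.62, 0.70]  # lightweight subset

rho = spearman(full_scores, nano_scores)
print(f"Spearman rho = {rho:.2f}")
```

In this toy example two neighbouring models swap places between the two rankings, so the correlation is high but not perfect. A value close to 1 means the small benchmark orders models almost the same way as the full one.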

With the rapid spread of retrieval-augmented generation and semantic search, choosing the right embedding and retrieval configuration is increasingly hard. Large retrieval benchmarks are comprehensive but too heavy to rerun during development, and there is little infrastructure for comparing production settings--dimensionality reduction, quantization, reranking--across many models under identical conditions. We present HAKARI-Bench, a lightweight benchmark that reconstructs existing retrieval suites into small datasets (Nano-sets): 35 benchmarks and 551 tasks across 43 languages in a unified format, enabling same-condition, model-agnostic comparison of five retrieval families (BM25, dense, sparse, late interaction, rerankers) and their efficiency variants. Across 55 models, its overall ranking reproduces the official MTEB retrieval v2, MMTEB v2 retrieval, and English BEIR (full) at Spearman >0.97. HAKARI-Bench does not replace full evaluation; it enables rapid model selection, regression detection, and reading the quality-efficiency Pareto frontier. Code, data, and leaderboard are released under the MIT license.
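As an illustration of what reading the quality-efficiency Pareto frontier involves, the sketch below keeps only those configurations for which no other configuration is both cheaper and at least as good. It is a generic frontier computation under stated assumptions, not HAKARI-Bench code: the `Config` type, the configuration names, and the quality numbers are hypothetical. The storage costs simply follow from dimension times bytes per dimension (4 for float32, 1 for int8, 1/8 for binary).

```python
from typing import NamedTuple


class Config(NamedTuple):
    name: str
    cost: float     # lower is better, here bytes stored per vector
    quality: float  # higher is better, e.g. a retrieval metric such as nDCG@10


def pareto_frontier(configs):
    """Return the non-dominated configs, ordered from cheapest to most expensive."""
    # Sort by cost ascending; at equal cost, put the higher quality first.
    ordered = sorted(configs, key=lambda c: (c.cost, -c.quality))
    frontier = []
    best_quality = float("-inf")
    for c in ordered:
        # Keep a config only if it beats everything that costs no more.
        if c.quality > best_quality:
            frontier.append(c)
            best_quality = c.quality
    return frontier


# Hypothetical efficiency variants of one dense model (quality values are made up).
candidates = [
    Config("dense-fp32-1024", cost=4096, quality=0.60),
    Config("dense-fp32-256", cost=1024, quality=0.57),
    Config("dense-int8-1024", cost=1024, quality=0.59),
    Config("dense-int8-256", cost=256, quality=0.50),
    Config("dense-binary-1024", cost=128, quality=0.52),
]

frontier = pareto_frontier(candidates)
for c in frontier:
    print(f"{c.name}: {c.cost:.0f} bytes/vector, quality {c.quality:.2f}")
```

With these made-up numbers, the int8 256-dimensional variant drops out because the binary variant is both smaller and better, and the float32 256-dimensional variant drops out because int8 quantization at full dimensionality has the same footprint with higher quality. Comparisons like this are only meaningful when every variant is scored under identical conditions.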

Original Article


Cached at: 06/23/26, 09:41 AM


Source: https://huggingface.co/papers/2606.22778





Similar Articles

HakushoBench: A Japanese Chart and Table VQA Benchmark from Governmental White Papers

UsefulBench: Towards Decision-Useful Information as a Target for Information Retrieval

@dianetc_: We set out to build a better retriever, so we looked for the hardest IR benchmarks. For each, we asked how much headroo…

MTR-Suite: A Framework for Evaluating and Synthesizing Conversational Retrieval Benchmarks

Beyond Retrieval: A Multitask Benchmark and Model for Code Search
