HAKARI-Bench: A Lightweight Benchmark for Comparing Retrieval Architectures and Efficiency Settings under Unified Conditions

Summary

HAKARI-Bench is a lightweight benchmark for comparing retrieval methods across multiple configurations and languages, enabling efficient model selection and performance analysis. Across 55 models, its overall ranking agrees with the official MTEB retrieval v2, MMTEB v2 retrieval, and full English BEIR rankings at a Spearman correlation above 0.97, while being faster to run.

With the rapid spread of retrieval-augmented generation and semantic search, choosing the right embedding and retrieval configuration is increasingly hard. Large retrieval benchmarks are comprehensive but too heavy to rerun during development, and there is little infrastructure for comparing production settings (dimensionality reduction, quantization, reranking) across many models under identical conditions. We present HAKARI-Bench, a lightweight benchmark that reconstructs existing retrieval suites into small datasets (Nano-sets): 35 benchmarks and 551 tasks across 43 languages in a unified format, enabling same-condition, model-agnostic comparison of five retrieval families (BM25, dense, sparse, late interaction, rerankers) and their efficiency variants. Across 55 models, its overall ranking reproduces the official MTEB retrieval v2, MMTEB v2 retrieval, and English BEIR (full) rankings at Spearman > 0.97. HAKARI-Bench does not replace full evaluation; it enables rapid model selection, regression detection, and reading the quality-efficiency Pareto frontier. Code, data, and leaderboard are released under the MIT license.
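
To make the ranking-agreement claim concrete, the following is a minimal sketch of how a Spearman rank correlation between a full benchmark and a reduced one could be computed. It is not the authors' evaluation code: the model names and scores are invented for illustration, and the helpers `average_ranks` and `spearman` are hypothetical names written in plain Python rather than taken from the HAKARI-Bench repository.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n (1 = smallest); tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical scores (NOT results from the paper): one score per model
# on a full benchmark and on a reduced "nano" version of it.
full_scores = {"model_a": 0.61, "model_b": 0.58, "model_c": 0.55,
               "model_d": 0.49, "model_e": 0.42}
nano_scores = {"model_a": 0.66, "model_b": 0.60, "model_c": 0.62,
               "model_d": 0.51, "model_e": 0.47}

models = sorted(full_scores)
rho = spearman([full_scores[m] for m in models],
               [nano_scores[m] for m in models])
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.90"
```

In this toy data only `model_b` and `model_c` swap places, which already lowers the correlation to 0.90 with five models. A value above 0.97 across 55 models therefore indicates that the reduced benchmark preserves the full ranking almost entirely.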

Cached at: 06/23/26, 09:41 AM

Source: https://huggingface.co/papers/2606.22778
