Diversed Model Discovery via Structured Table Discovery

Hugging Face Daily Papers, 05/21/26, 12:00 AM

model-search structured-tables table-discovery evidence-coverage diversity retrieval ai

Summary

Introduces StructuredSemanticSearch, a model search framework that combines semantic similarity with structured table discovery to improve diversity and coverage of recommended models, evaluated on a benchmark of 597 queries.

Model cards describe model behavior through a mixture of textual descriptions and structured artifacts, including performance, configuration, and dataset tables. Existing model search systems rely predominantly on semantic similarity over text, which can produce homogeneous result sets and limit exploration of alternatives. We argue that model search is inherently comparative: users want models that are task-aligned yet differentiated in measurable ways. We hypothesize that this balance requires retrieval over condensed, high-quality evidence rather than verbose descriptions, and much of that evidence is concentrated in structured tables. We present StructuredSemanticSearch, a table-driven model search framework built on the ModelTables benchmark. Given a query, StructuredSemanticSearch combines a semantic baseline for task alignment with a structure-aware pipeline that discovers query-related model-card tables using table discovery operators such as unionability, joinability, and keyword search. Retrieved tables are mapped back to model cards under a controlled top-k budget, enabling fair comparison between text-based and table-based retrieval. Beyond retrieval, StructuredSemanticSearch adapts table integration to the model-table domain through orientation-aware integration, producing compact integrated views of tables from partially overlapping and sometimes transposed evidence tables. For evaluation, we introduce a nugget-based, auditable protocol that extracts compact evidence items from model cards, matches queries to condition- or intent-specific nuggets, and measures evidence coverage and diversity over retrieved model-card candidate sets. This protocol also provides a scalable path toward approximate, evidence-based labeling in dynamic model lakes. Experiments on 597 model-recommendation queries show improved nugget coverage for the structure-aware pipeline over the semantic baseline.
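The abstract describes the top-k budget and the evaluation metrics only at a high level. The Python sketch below illustrates two of those ideas under stated assumptions: walking a best-first list of discovered tables and keeping at most k distinct model cards (so text-based and table-based runs are compared at the same budget), then scoring the resulting candidate set by nugget coverage and by diversity. The function names, the dictionary inputs, the toy data, and the choice of mean pairwise Jaccard distance as the diversity measure are hypothetical illustrations, not the paper's implementation.

```python
from itertools import combinations


def tables_to_cards(ranked_tables, table_to_card, k):
    """Map ranked table IDs back to at most k distinct model-card IDs.

    Hypothetical helper: ranked_tables is the best-first output of some
    table-discovery step, and table_to_card records which model card each
    table was extracted from. k is the top-k budget shared with the text baseline.
    """
    cards, seen = [], set()
    for table_id in ranked_tables:
        card = table_to_card.get(table_id)
        if card is None or card in seen:
            continue  # skip unmapped tables and cards that are already selected
        seen.add(card)
        cards.append(card)
        if len(cards) == k:
            break  # budget exhausted
    return cards


def nugget_coverage(cards, card_nuggets, query_nuggets):
    """Fraction of the query's nuggets that appear in at least one retrieved card."""
    if not query_nuggets:
        return 0.0
    found = set()
    for card in cards:
        found |= card_nuggets.get(card, set()) & query_nuggets
    return len(found) / len(query_nuggets)


def nugget_diversity(cards, card_nuggets, query_nuggets):
    """Mean pairwise Jaccard distance between the cards' query-relevant nugget sets.

    Assumed diversity proxy: 0.0 means every card contributes the same evidence.
    """
    relevant = [card_nuggets.get(card, set()) & query_nuggets for card in cards]
    pairs = list(combinations(relevant, 2))
    if not pairs:
        return 0.0
    total = 0.0
    for a, b in pairs:
        union = a | b
        if union:
            total += 1.0 - len(a & b) / len(union)
    return total / len(pairs)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    table_to_card = {"t1": "cardA", "t2": "cardA", "t3": "cardB", "t4": "cardC"}
    card_nuggets = {"cardA": {"mmlu", "7b"}, "cardB": {"gsm8k", "7b"}, "cardC": {"mmlu"}}
    query = {"mmlu", "gsm8k", "7b"}
    cards = tables_to_cards(["t2", "t1", "t3", "t4"], table_to_card, k=2)
    print(cards)
    print(nugget_coverage(cards, card_nuggets, query))
    print(round(nugget_diversity(cards, card_nuggets, query), 3))
```

On the toy data, two tables from cardA collapse into a single slot, so a budget of 2 yields cardA and cardB. Together they cover all three query nuggets (coverage 1.0) while sharing one of them, which gives a diversity of about 0.667.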

Original Article


Cached at: 05/22/26, 02:31 AM

Paper page - Diversed Model Discovery via Structured Table Discovery

Source: https://huggingface.co/papers/2605.22766

Abstract

Model search system that combines semantic and structured table-based retrieval to improve diversity and coverage of recommended models.
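The page does not say how results from the semantic baseline and the structure-aware pipeline are merged into one candidate list. As one possibility, the Python sketch below fuses the two rankings with reciprocal rank fusion (RRF), a standard rank-fusion technique swapped in here purely for illustration; it is not the paper's stated method, and the function name, the budget parameter, and the card IDs are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k_budget, c=60):
    """Fuse best-first lists of model-card IDs with reciprocal rank fusion (RRF).

    A card's score is the sum of 1 / (c + rank) over every list it appears in,
    with ranks starting at 1. c=60 is the constant commonly used for RRF.
    Assumes each card appears at most once per list.
    """
    scores = {}
    for ranking in rankings:
        for rank, card in enumerate(ranking, start=1):
            scores[card] = scores.get(card, 0.0) + 1.0 / (c + rank)
    # Highest score first; ties broken by card ID so results are deterministic.
    fused = sorted(scores, key=lambda card: (-scores[card], card))
    return fused[:k_budget]


if __name__ == "__main__":
    semantic = ["cardA", "cardB", "cardC"]     # hypothetical text-similarity ranking
    structured = ["cardC", "cardD", "cardA"]   # hypothetical table-derived ranking
    print(reciprocal_rank_fusion([semantic, structured], k_budget=3))
```

Here the cards returned by both pipelines (cardA and cardC) rise to the top, and the remaining budget slot goes to cardB, which only the semantic list returned. Fusing ranks rather than raw scores avoids having to calibrate text-similarity scores against table-discovery scores.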



Get this paper in your agent:

hf papers read 2605.22766

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 0

Datasets citing this paper: 0

Spaces citing this paper: 0

Collections including this paper: 0


Similar Articles

@dashen_wang: https://x.com/dashen_wang/status/2062318606357303376

@ataiiam: All UI will become AI This is THE article you'll need give your agents a frontend. Agent-user collaboration is the futu…

@VikParuchuri: Someone sent me this unreadable soap label last year. I just tried it with our new model - now you, too, can read the r…

@steijnpelle: Today, we're introducing Lassie and $47M in funding led by a16z. We're building AI that runs small businesses, starting…
Lassie, an AI that runs small businesses starting with doctors' offices, launches with $47M funding led by a16z, already trusted by 700+ practices.

is [ BM25 + vector ]+ RRF really worth it?