ModelLens: Finding the Best for Your Task from Myriads of Models
Summary
ModelLens is a unified framework that recommends AI models for unseen datasets by learning from public leaderboard data, eliminating the need for costly direct evaluations. It constructs a performance-aware latent space to rank candidates across diverse tasks, outperforming existing baselines on large-scale benchmarks.
Cached at: 05/11/26, 10:51 PM
Source: https://huggingface.co/papers/2605.07075
Abstract
ModelLens is a unified framework that recommends models in real-world scenarios by learning from public leaderboard data to rank unseen models on unseen datasets without requiring costly evaluations.
The open-source model ecosystem now contains hundreds of thousands of pretrained models, yet picking the best model for a new dataset is increasingly infeasible: new models and unbenchmarked datasets emerge continuously, leaving practitioners with no prior records on either side. Existing approaches handle only fragments of this in-the-wild setting: AutoML and transferability estimation select models from small predefined pools or require expensive per-model forward passes on the target dataset, while model routing presupposes a given candidate pool. We introduce ModelLens, a unified framework for model recommendation in the wild. Our key insight is that public leaderboard interactions, though scattered and noisy, collectively trace out an implicit atlas of model capabilities across heterogeneous evaluation settings, a signal rich enough to learn from directly. By learning a performance-aware latent space over model-dataset-metric tuples, ModelLens ranks unseen models on unseen datasets without running candidates on the target dataset. On a new benchmark of 1.62M evaluation records spanning 47K models and 9.6K datasets, ModelLens surpasses baselines that either rely on metadata alone or require running each candidate on the target dataset. Its recommended Top-K pools further improve multiple representative routing methods by up to 81% across diverse QA benchmarks. Case studies on recently released benchmarks further confirm generalization to both text and vision-language tasks.
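To make the idea of a latent space over model-dataset-metric tuples concrete, here is a minimal sketch. It is not the ModelLens architecture, which the abstract does not specify. It assumes a simple trilinear factorization trained by SGD on squared error, and every name and score in the toy leaderboard is hypothetical. It also learns one embedding per known ID, so unlike ModelLens it cannot score a model or dataset that is entirely absent from the records; that would require embeddings derived from metadata.

```python
import numpy as np

# Hypothetical toy leaderboard: (model, dataset, metric, score in [0, 1]).
RECORDS = [
    ("model-a", "qa-set", "accuracy", 0.82),
    ("model-b", "qa-set", "accuracy", 0.64),
    ("model-c", "qa-set", "accuracy", 0.71),
    ("model-a", "sum-set", "rouge", 0.35),
    ("model-b", "sum-set", "rouge", 0.48),
    ("model-a", "code-set", "accuracy", 0.55),
    ("model-c", "code-set", "accuracy", 0.60),
]


def index(values):
    """Map each distinct name to a row index (sorted for determinism)."""
    return {name: i for i, name in enumerate(sorted(set(values)))}


def fit(records, dim=4, epochs=500, lr=0.05, seed=0):
    """Fit one embedding per model, dataset, and metric.

    Assumed scoring rule (illustrative only): sum over dimensions of
    model_vec * dataset_vec * metric_vec, trained with SGD on squared error.
    """
    rng = np.random.default_rng(seed)
    m_idx = index(r[0] for r in records)
    d_idx = index(r[1] for r in records)
    k_idx = index(r[2] for r in records)
    M = rng.normal(scale=0.5, size=(len(m_idx), dim))
    D = rng.normal(scale=0.5, size=(len(d_idx), dim))
    K = rng.normal(scale=0.5, size=(len(k_idx), dim))
    for _ in range(epochs):
        for model, dataset, metric, score in records:
            m, d, k = m_idx[model], d_idx[dataset], k_idx[metric]
            # Copy the rows so all three updates use the pre-update values.
            mv, dv, kv = M[m].copy(), D[d].copy(), K[k].copy()
            err = float(np.sum(mv * dv * kv)) - score
            M[m] -= lr * err * dv * kv
            D[d] -= lr * err * mv * kv
            K[k] -= lr * err * mv * dv
    return (m_idx, M), (d_idx, D), (k_idx, K)


def rank_models(fitted, dataset, metric, top_k=2):
    """Return the top_k (model, predicted score) pairs, best first."""
    (m_idx, M), (d_idx, D), (k_idx, K) = fitted
    context = D[d_idx[dataset]] * K[k_idx[metric]]
    scores = M @ context  # one predicted score per model
    names = sorted(m_idx, key=m_idx.get)
    order = np.argsort(-scores)[:top_k]
    return [(names[i], float(scores[i])) for i in order]


fitted = fit(RECORDS)
# model-c has no record on sum-set, so its score here is a prediction
# made without evaluating it on that dataset.
top = rank_models(fitted, "sum-set", "rouge", top_k=2)
print(top)
```

The returned list plays the role of a recommended Top-K pool: a downstream router would then choose only among those candidates rather than the full set of models.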
Get this paper in your agent:
hf papers read 2605.07075
Don't have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
Lens: Rethinking Training Efficiency for Foundational Text-to-Image Models
Lens is a compact 3.8B-parameter text-to-image model from Microsoft that achieves competitive performance with larger models while requiring significantly less training compute, using dense captions, multi-resolution batching, and efficient architecture.
FashionLens: Toward Versatile Fashion Image Retrieval via Task-Adaptive Learning
FashionLens proposes a unified fashion image retrieval framework using multimodal large language models with adaptive calibration and sampling, achieving state-of-the-art performance across diverse retrieval scenarios.
I Compared the Top AI Models of 2026 — The Results Were More Nuanced Than Expected
A comprehensive comparison of frontier AI models from 2026 finds no single best model; the optimal choice depends on use case, constraints, and operational requirements.
SkillLens: Adaptive Multi-Granularity Skill Reuse for Cost-Efficient LLM Agents
This paper introduces SkillLens, a hierarchical framework for adaptive multi-granularity skill reuse in LLM agents, demonstrating improved accuracy and cost-efficiency on benchmark tasks.
MemLens: Benchmarking Multimodal Long-Term Memory in Large Vision-Language Models
MemLens is a new benchmark for evaluating memory capabilities in large vision-language models through multi-session conversations. It compares long-context and memory-augmented approaches, revealing limitations in both and motivating hybrid architectures.