Which Models Are Our Models Built On? Auditing Invisible Dependencies in Modern LLMs
Summary
Introduces ModSleuth, an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts, revealing multi-hop license obligations and documentation inconsistencies.
Cached at: 06/11/26, 09:36 PM
Source: https://huggingface.co/papers/2606.12385
Abstract
ModSleuth is an agentic system that recursively reconstructs large-scale dependency graphs for LLM development by analyzing public artifacts and resolving inconsistencies in documentation and artifact identities.
Modern LLM training pipelines increasingly rely on other models to generate data, filter corpora, judge outputs, and guide development decisions. These dependencies are recursive: a model may depend on an upstream artifact whose own dependencies are documented only in separate releases and artifacts. As a result, the full dependency structure is fragmented across heterogeneous public artifacts, with complexity and recursive depth far outpacing humans’ ability to trace. We introduce ModSleuth, an agentic system that recursively reconstructs LLM dependency graphs from public artifacts with source-grounded evidence. We find that the primary challenge is no longer information extraction, but defining what constitutes a dependency and reconciling artifact references across inconsistent documentation. We address these challenges through a formalization that distinguishes direct and indirect dependencies, represents heterogeneous pipeline roles through operation-centered relationships, and resolves artifact identities across names, versions, and repositories. Applying ModSleuth to four public-artifact-rich LLM releases, we recover 1,060 source-verified dependencies and construct large-scale dependency graphs of modern LLM development. These graphs reveal multi-hop license obligations, train-evaluation coupling, discrepancies between released and training-time artifacts, and documentation inconsistencies that would otherwise be difficult to uncover. We release ModSleuth and the resulting dependency graphs to support transparent analysis of the increasingly complex ecosystems underlying modern LLMs.
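To make the distinction between direct and indirect dependencies concrete, here is a minimal Python sketch. It is not ModSleuth's implementation or data model: the Dependency record, the alias table, the operation labels, and every artifact name (model-a, dataset-b, model-c, judge-d) are hypothetical, chosen only to illustrate the ideas named in the abstract. Each edge carries the operation that links two artifacts, names are normalized before they enter the graph, and a breadth-first walk reports how many hops separate the root from each upstream artifact.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    """One direct dependency edge (hypothetical schema, for illustration)."""
    downstream: str  # artifact that uses something
    upstream: str    # artifact being used
    operation: str   # pipeline role, e.g. "train_on", "generate_data", "judge"


def canonical(name, aliases):
    """Map a raw artifact reference to a single canonical identifier."""
    key = name.strip().lower()
    return aliases.get(key, key)


def build_graph(deps, aliases):
    """Return {downstream: {(upstream, operation), ...}} over canonical names."""
    graph = {}
    for d in deps:
        src = canonical(d.downstream, aliases)
        dst = canonical(d.upstream, aliases)
        graph.setdefault(src, set()).add((dst, d.operation))
    return graph


def hop_counts(graph, root):
    """Breadth-first walk: {artifact: minimum hops from root}."""
    hops = {}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        for upstream, _operation in sorted(graph.get(node, ())):
            if upstream != root and upstream not in hops:
                hops[upstream] = depth + 1
                queue.append((upstream, depth + 1))
    return hops


# Hypothetical records, as they might be extracted from different documents.
deps = [
    Dependency("Model-A", "dataset-b", "train_on"),
    Dependency("dataset-b", "Model-C v1", "generate_data"),
    Dependency("model-a", "judge-d", "judge"),
]
aliases = {"model-c v1": "model-c"}  # two spellings of one artifact

graph = build_graph(deps, aliases)
hops = hop_counts(graph, canonical("Model-A", aliases))
direct = {name for name, n in hops.items() if n == 1}
indirect = {name for name, n in hops.items() if n > 1}
```

In this toy graph, model-c is reachable from model-a only through dataset-b, so any license terms attached to model-c would be a multi-hop obligation that the documentation of model-a alone would not surface.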
Similar Articles
@sadhikesaven: Today, LLMs are no longer built from human data alone. They rely on other LLMs to generate training data, filter corpor…
ModSleuth is a new tool that traces the dependencies of modern LLMs, revealing that models like OLMo 3 and Nemotron 3 rely on hundreds of other models and datasets, highlighting the shift from human-only to AI-generated training data.
Towards Security-Auditable LLM Agents: A Unified Graph Representation
This paper introduces Agent-BOM, a unified graph representation for security auditing in LLM-based agentic systems. It addresses the semantic gap in post-hoc auditing by modeling static capabilities and dynamic runtime states to detect complex attack chains like memory poisoning and tool misuse.
Can LLMs model real-world systems in TLA+?
Researchers from the Specula team created SysMoBench, a benchmark evaluating whether LLMs can faithfully model real-world computing systems in TLA+ or merely recite textbook specifications. The benchmark tests 11 systems across four phases and reveals systematic gaps in current LLMs' ability to accurately model system implementations versus reference papers.
Examining Human-Like Behaviors in LLMs: A Multi-Dimensional Analysis of Model Behaviors, User Factors, and System Prompts
This paper presents a multi-dimensional analysis of human-like behaviors in LLMs, examining prevalence, effects, and controllability across 21,000 conversations from four models, finding that behaviors vary by model and user factors, with implications for responsible design.
Formal Methods Meet LLMs: Auditing, Monitoring, and Intervention for Compliance of Advanced AI Systems
This paper proposes techniques that combine formal methods (Linear Temporal Logic) with LLMs for auditing, monitoring, and intervening in AI systems to ensure compliance with behavioral constraints, showing that even small-model labelers can match frontier LLM judges in detecting violations.