From Generalist to Specialist Representation
Summary
This paper establishes nonparametric identifiability guarantees for extracting task-relevant representations from generalist models, proving that task structure is identifiable across time steps and latent representations are identifiable within each step under sparsity regularization.
Cached at: 05/14/26, 08:20 PM
Source: https://huggingface.co/papers/2605.12733
Abstract
Nonparametric identifiability results establish foundational guarantees for extracting task-relevant representations from generalist models without parametric assumptions or interventions.
Given a generalist model, learning a task-relevant specialist representation is fundamental for downstream applications. Identifiability, the asymptotic guarantee of recovering the ground-truth representation, is critical because it sets the ultimate limit of any model, even with infinite data and computation. We study this problem in a completely nonparametric setting, without relying on interventions, parametric forms, or structural constraints. We first prove that the structure between time steps and tasks is identifiable in a fully unsupervised manner, even when sequences lack strict temporal dependence and may exhibit disconnections, and task assignments can follow arbitrarily complex and interleaving structures. We then prove that, within each time step, the task-relevant latent representation can be disentangled from the irrelevant part under a simple sparsity regularization, without any additional information or parametric constraints. Together, these results establish a hierarchical foundation: task structure is identifiable across time steps, and task-relevant latent representations are identifiable within each step. To our knowledge, each result provides a first general nonparametric identifiability guarantee, and together they mark a step toward provably moving from generalist to specialist models.
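To give a rough intuition for how a sparsity penalty can separate task-relevant latent dimensions from irrelevant ones, here is a toy sketch. It is not the paper's method: the paper's setting is nonparametric, whereas this example assumes a linear readout, synthetic Gaussian latents, and an L1 penalty solved by proximal gradient descent (ISTA). All names (`lasso_ista`, `soft_threshold`, `Z`, `y`, `lam`) are hypothetical and chosen only for illustration.

```python
import numpy as np

def soft_threshold(w, t):
    # Proximal operator of the L1 norm: shrink toward zero, clip at zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)

def lasso_ista(Z, y, lam, n_iter=500):
    # Minimize (1/2n) * ||Z w - y||^2 + lam * ||w||_1 with ISTA.
    n, d = Z.shape
    # Lipschitz constant of the smooth part: largest eigenvalue of Z^T Z / n.
    L = np.linalg.norm(Z, 2) ** 2 / n
    w = np.zeros(d)
    for _ in range(n_iter):
        grad = Z.T @ (Z @ w - y) / n
        w = soft_threshold(w - grad / L, lam / L)
    return w

# Synthetic "generalist" latents (assumption): 6 dimensions, of which
# only the first two influence the task target y.
rng = np.random.default_rng(0)
Z = rng.normal(size=(2000, 6))
y = 2.0 * Z[:, 0] - 1.5 * Z[:, 1] + 0.1 * rng.normal(size=2000)

w = lasso_ista(Z, y, lam=0.2)
# Dimensions with nonzero weight are the ones the penalty kept.
relevant = np.flatnonzero(np.abs(w) > 1e-8)
print("weights:", np.round(w, 3))
print("selected latent dimensions:", relevant.tolist())
```

In this toy setting the penalty drives the weights on the four irrelevant dimensions to exactly zero while shrinking, but keeping, the two relevant ones. The paper's guarantee concerns a far more general regime in which no linear or parametric form is assumed.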
Get this paper in your agent:
hf papers read 2605.12733
Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash
Similar Articles
From Generalist to Specialist Representation
This paper proves that task-relevant latent representations can be identified from generalist models in a fully nonparametric setting without interventions or parametric constraints, achieving a hierarchical identifiability guarantee across time steps and within each step.
Geometric Asymmetry in MoE Specialization: Functional Decorrelation and Representational Overlap
This paper introduces a Jacobian-PCA-Grassmann framework to analyze the geometric structure of expert specialization in Mixture-of-Experts (MoE) Transformers. It finds that experts exhibit strong functional decorrelation while their representations overlap, and that routing sparsity significantly influences this geometry.
Task-Restricted Symmetries in Recurrent Weight Space
This paper studies functional redundancy in recurrent neural networks by using ordered real Schur coordinates to identify structured ablations that preserve task performance, finding that task-restricted symmetries vary across tasks and trained solutions.
What Must Generalist Agents Remember?
This paper develops a formal account of what generalist agents must store in memory to act near-optimally across multiple environments and goals, presenting a separation theorem that memory is necessary for domain disambiguation and transition-model reconstruction.
Feature Lottery? A Bifurcation Theory of Concept Emergence
This paper introduces a bifurcation theory of representation dynamics to detect when neural networks acquire structured representations during training, using a Hessian analysis of a GMM probe. The resulting ratio β/β_c serves as a label-free phase coordinate that predicts the onset of usable structure and can forecast feature interpretability in sparse autoencoders early in training.