GeoStack: A Framework for Quasi-Abelian Knowledge Composition in VLMs

Hugging Face Daily Papers, 05/07/26, 12:00 AM

Summary

GeoStack introduces a geometric framework to compose independently trained domain experts in Vision-Language Models without catastrophic forgetting, achieving constant-time inference and a 10x reduction in geometric error.

We address the challenge of knowledge composition in Vision-Language Models (VLMs), where accumulating expertise across multiple domains or tasks typically leads to catastrophic forgetting. We introduce GeoStack (Geometric Stacking), a modular framework that allows independently trained domain experts to be composed into a unified model. By imposing geometric and structural constraints on the adapter manifold, GeoStack ensures the foundational knowledge of the base model is preserved. Furthermore, we mathematically demonstrate a weight-folding property that achieves constant-time inference complexity (O(1)), regardless of the number of integrated experts. Experimental results across multi-domain adaptation and class-incremental learning show that GeoStack provides an efficient mechanism for long-term knowledge composition while significantly mitigating catastrophic forgetting. Code is available at https://github.com/QuantitativeImagingLaboratory/GeoStack.
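The abstract does not spell out the weight-folding derivation. The sketch below only illustrates the general idea under one stated assumption: each expert k contributes an additive low-rank update ΔW_k = B_k A_k (LoRA-style) to a frozen base matrix W0. GeoStack's actual parameterization may differ, and the function names (`fold_experts`, `forward_unfolded`, and so on) are hypothetical. Folding runs once, offline, and afterwards inference is a single matrix-vector product no matter how many experts were merged. Under this purely additive assumption the folded result also does not depend on the order of the experts. That is one intuition for the "Abelian" in the title, not the paper's definition.

```python
# Hypothetical sketch: folding additive low-rank expert updates into one weight matrix.
# Assumption (not from the paper): expert k adds Delta_k = B_k @ A_k to frozen base W0.
from typing import List, Tuple

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def add(x: Matrix, y: Matrix) -> Matrix:
    """Element-wise sum of two equally shaped matrices."""
    return [[a + b for a, b in zip(rx, ry)] for rx, ry in zip(x, y)]


def fold_experts(w0: Matrix, experts: List[Tuple[Matrix, Matrix]]) -> Matrix:
    """Return W = W0 + sum_k B_k A_k; computed once, before inference."""
    w = [row[:] for row in w0]
    for b, a in experts:
        w = add(w, matmul(b, a))
    return w


def forward(w: Matrix, x: List[float]) -> List[float]:
    """One matrix-vector product: cost is independent of how many experts were folded."""
    return [sum(wi * xi for wi, xi in zip(row, x)) for row in w]


def forward_unfolded(w0: Matrix, experts: List[Tuple[Matrix, Matrix]], x: List[float]) -> List[float]:
    """Reference path: y = W0 x + sum_k B_k (A_k x); cost grows with the number of experts."""
    y = forward(w0, x)
    for b, a in experts:
        y = [yi + zi for yi, zi in zip(y, forward(b, forward(a, x)))]
    return y


# Usage: two rank-1 experts on a 2x2 identity base.
w0 = [[1.0, 0.0], [0.0, 1.0]]
e1 = ([[1.0], [0.0]], [[0.0, 2.0]])  # B (2x1), A (1x2): Delta = [[0, 2], [0, 0]]
e2 = ([[0.0], [1.0]], [[3.0, 0.0]])  # Delta = [[0, 0], [3, 0]]

w_12 = fold_experts(w0, [e1, e2])
w_21 = fold_experts(w0, [e2, e1])
print(w_12)                                        # [[1.0, 2.0], [3.0, 1.0]]
print(w_12 == w_21)                                # True: order does not matter here
print(forward(w_12, [1.0, 1.0]))                   # [3.0, 4.0]
print(forward_unfolded(w0, [e1, e2], [1.0, 1.0]))  # [3.0, 4.0]
```

The folded and unfolded paths give the same output, but only the unfolded path pays an extra cost per expert at inference time.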


Cached at: 05/08/26, 06:28 PM

Source: https://huggingface.co/papers/2605.06477

Abstract

GeoStack is a modular framework that composes domain experts in Vision-Language Models while preserving foundational knowledge and enabling constant-time inference through geometric constraints on adapter manifolds.

Community

Paper submitter

How many domain experts can you stack before a VLM collapses? 🧱

GeoStack introduces a geometric framework to compose independently trained experts into a single model with zero added inference cost. By using a perturbation prior and orthogonality constraints, it achieves a 10x reduction in geometric error compared to standard adapters.

If you’re looking for a way to build specialized VLMs that don’t forget their foundational knowledge, check this out!
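The post names a perturbation prior and orthogonality constraints but does not define them. As a hedged illustration of what such constraints commonly look like (not GeoStack's formulation), the sketch below assumes each expert has a low-rank down-projection A_k and measures (a) pairwise subspace overlap as the squared Frobenius norm of A_i A_jᵀ, which is zero when the experts' row spaces are orthogonal, and (b) the size of an update relative to the base weights, ‖ΔW‖_F / ‖W0‖_F, which a perturbation prior would keep small. Both function names are hypothetical.

```python
# Hypothetical sketch of two regularizers; not the GeoStack paper's definitions.
import math
from typing import List

Matrix = List[List[float]]


def matmul(x: Matrix, y: Matrix) -> Matrix:
    """Plain-Python matrix product x @ y."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*y)] for row in x]


def transpose(m: Matrix) -> Matrix:
    return [list(col) for col in zip(*m)]


def frob_sq(m: Matrix) -> float:
    """Squared Frobenius norm."""
    return sum(v * v for row in m for v in row)


def orthogonality_penalty(a_list: List[Matrix]) -> float:
    """Sum over expert pairs of ||A_i A_j^T||_F^2; zero iff all pairs have orthogonal row spaces."""
    total = 0.0
    for i in range(len(a_list)):
        for j in range(i + 1, len(a_list)):
            total += frob_sq(matmul(a_list[i], transpose(a_list[j])))
    return total


def relative_perturbation(w0: Matrix, delta: Matrix) -> float:
    """||Delta||_F / ||W0||_F: how far an expert moves the base weights."""
    return math.sqrt(frob_sq(delta) / frob_sq(w0))


# Usage with rank-1 down-projections in a 3-dimensional input space.
a1 = [[1.0, 0.0, 0.0]]
a2 = [[0.0, 1.0, 0.0]]
a3 = [[1.0, 1.0, 0.0]]
print(orthogonality_penalty([a1, a2]))      # 0.0: orthogonal experts
print(orthogonality_penalty([a1, a3]))      # 1.0: overlapping experts
print(orthogonality_penalty([a1, a2, a3]))  # 2.0
print(relative_perturbation([[2.0, 0.0], [0.0, 0.0]], [[0.0, 1.0], [0.0, 0.0]]))  # 0.5
```

In training, terms like these would typically be added to the task loss with tunable weights; how GeoStack combines them, and how it measures "geometric error", is described only in the paper.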

Similar Articles

Learning Geometric Representations from Videos for Spatial Intelligent Multimodal Large Language Models

Beyond 3D VQAs: Injecting 3D Spatial Priors into Vision-Language Models for Enhanced Geometric Reasoning

Stream3D-VLM: Online 3D Spatial Understanding with Incremental Geometry Priors

HyperGVL: Benchmarking and Improving Large Vision-Language Models in Hypergraph Understanding and Reasoning

Which Pretraining Paradigm Better Serves Spatial Intelligence? An Empirical Comparison of Vision-Language and Video Generation Models
