FAAST: Forward-Only Associative Learning via Closed-Form Fast Weights for Test-Time Supervised Adaptation
Summary
FAAST is a forward-only method that analytically compiles labeled examples into fast weights, enabling test-time supervised adaptation without backpropagation. It cuts adaptation time by over 90% relative to backprop-based adaptation and memory usage by up to 95% relative to memory/context-based adaptation, while matching or staying competitive with their performance.
Cached at: 05/14/26, 04:19 PM
Source: https://huggingface.co/papers/2605.04651
Abstract
FAAST enables efficient task adaptation by compiling labeled examples into fast weights through forward-only computation, achieving significant speedup and memory savings over traditional backpropagation methods.
Adapting pretrained models typically involves a trade-off between the high training costs of backpropagation and the heavy inference overhead of memory-based or in-context learning. We propose FAAST, a forward-only associative adaptation method that analytically compiles labeled examples into fast weights in a single pass. By eliminating memory or context dependence, FAAST achieves constant-time inference and decouples task adaptation from pretrained representation. Across image classification and language modeling benchmarks, FAAST matches or exceeds backprop-based adaptation while reducing adaptation time by over 90%, and is competitive with memory/context-based adaptation while reducing memory usage by up to 95%. These results demonstrate FAAST as a highly efficient, scalable solution for supervised task adaptation, particularly for resource-constrained models. We release the code and models at https://github.com/baoguangsheng/faast.
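The abstract does not give FAAST's actual update rule, so the following is only a rough sketch of the general idea of "compiling" labeled examples into weights in closed form. It assumes frozen features from a pretrained encoder (simulated here with random vectors) and swaps in standard ridge regression onto one-hot labels as the closed-form solver. The names `compile_fast_weights`, `predict`, and `demo` are hypothetical and do not come from the FAAST code base.

```python
import numpy as np


def compile_fast_weights(features, labels, num_classes, ridge=1e-2):
    """Solve W = (X^T X + ridge * I)^{-1} X^T Y in one shot (no gradients).

    ASSUMPTION: ridge regression stands in for the paper's closed-form rule,
    which is not stated in the abstract.
    """
    X = np.asarray(features, dtype=np.float64)        # (n, d) frozen features
    Y = np.eye(num_classes)[np.asarray(labels)]       # (n, c) one-hot targets
    gram = X.T @ X + ridge * np.eye(X.shape[1])       # (d, d) regularized Gram
    return np.linalg.solve(gram, X.T @ Y)             # (d, c) fast weights


def predict(features, fast_weights):
    """Inference is a single matrix product; no stored examples are needed."""
    scores = np.asarray(features, dtype=np.float64) @ fast_weights
    return np.argmax(scores, axis=1)


def demo():
    """Toy data: three well-separated clusters standing in for encoder outputs."""
    rng = np.random.default_rng(0)
    centers = rng.normal(size=(3, 16)) * 5.0
    labels = np.repeat(np.arange(3), 20)
    feats = centers[labels] + rng.normal(size=(60, 16))
    weights = compile_fast_weights(feats, labels, num_classes=3)
    accuracy = float(np.mean(predict(feats, weights) == labels))
    return weights, accuracy
```

In this sketch the cost of `predict` depends only on the feature dimension and the number of classes, not on how many labeled examples were compiled, which is the sense in which a weight-based approach can avoid the per-query overhead of memory-based or in-context methods.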
Models citing this paper: 3
#### gshbao/faast-gpt2-xl 2B • Updated about 6 hours ago • 11
#### gshbao/faast-Qwen2.5-3B-Instruct 3B • Updated about 6 hours ago • 14
#### gshbao/faast-Qwen2.5-7B-Instruct 8B • Updated about 6 hours ago • 1
Similar Articles
Learning, Fast and Slow: Towards LLMs That Adapt Continually
A fast-slow learning framework for LLMs combines fixed slow weights with optimized fast context weights, achieving up to 3x better sample efficiency and reduced catastrophic forgetting in continual learning scenarios.
Learning, Fast and Slow: Towards LLMs That Adapt Continually [R]
This paper introduces a Fast-Slow Training framework for LLMs that combines parameter updates with optimized context to improve sample efficiency and reduce catastrophic forgetting during continual learning.
@daniel_mac8: babe, wake up. new continual learning breakthrough just dropped. fast-slow training (fst) treats model params as "slow"…
This tweet announces Fast-Slow Training (FST), a new continual learning method that treats model parameters as slow weights and optimized context as fast weights, reportedly outperforming weights-only training on math, code, and general reasoning benchmarks.
@LakshyAAAgrawal: Learning from rich textual feedback (errors, traces, partial reasoning) beats scalar reward alone for LLM optimization.…
Fast-Slow Training (FST) interleaves context optimization (via GEPA) with model weight updates via RL, achieving 3× sample efficiency over RL alone on math, code, and physics reasoning while preserving plasticity and enabling continual learning.
Federated Nested Learning: Collaborative Training of Self-Referential Memories for Test-Time Adaptation
Proposes Federated Nested Learning (FedNL), a framework that reformulates federated learning as a three-level nested optimization system, enabling collaborative training of self-referential memories for test-time adaptation to handle Non-IID data and long-tail distributions.