MultiHashFormer: Hash-based Generative Language Models
Summary
MultiHashFormer is a hash-based generative language model that represents each token as a unique hash signature, a short sequence of discrete hash IDs, which makes parameter-efficient autoregressive generation possible. The authors report that it outperforms standard Transformer LMs at the 100M, 1B, and 3B parameter scales and handles multilingual vocabulary expansion with a constant parameter footprint.
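To make the hash-signature idea concrete, here is a minimal sketch of how a token could be mapped to a short sequence of hash IDs by several independent hash functions, as the abstract describes. Everything specific in it is an assumption for illustration, not the authors' method: the salted BLAKE2b hashing, the names `hash_signature`, `find_collisions`, and `embedding_rows`, and the values `NUM_HASHES = 4` and `NUM_BUCKETS = 256`. The sketch only checks for signature collisions; it does not show how the paper guarantees uniqueness.

```python
import hashlib

NUM_HASHES = 4     # hypothetical: number of independent hash functions
NUM_BUCKETS = 256  # hypothetical: number of distinct IDs per hash function


def hash_signature(token: str,
                   num_hashes: int = NUM_HASHES,
                   num_buckets: int = NUM_BUCKETS) -> tuple:
    """Map a token to a tuple of `num_hashes` hash IDs in [0, num_buckets)."""
    ids = []
    for i in range(num_hashes):
        # A different salt per index emulates independent hash functions.
        digest = hashlib.blake2b(
            token.encode("utf-8"),
            digest_size=8,
            salt=i.to_bytes(8, "little"),
        ).digest()
        ids.append(int.from_bytes(digest, "little") % num_buckets)
    return tuple(ids)


def find_collisions(vocab):
    """Return pairs of tokens whose full signatures coincide."""
    seen = {}
    collisions = []
    for tok in vocab:
        sig = hash_signature(tok)
        if sig in seen:
            collisions.append((seen[sig], tok))
        else:
            seen[sig] = tok
    return collisions


def embedding_rows(num_hashes: int, num_buckets: int) -> int:
    """Rows needed if each hash function had its own ID table (an assumption)."""
    return num_hashes * num_buckets


if __name__ == "__main__":
    vocab = [f"tok{i}" for i in range(10_000)]  # toy vocabulary
    print("signature of 'tok0':", hash_signature("tok0"))
    print("signature collisions:", len(find_collisions(vocab)))
    print("hash-table rows:", embedding_rows(NUM_HASHES, NUM_BUCKETS),
          "vs. one row per token:", len(vocab))
```

Under these assumptions, the number of table rows depends only on the number of hash functions and buckets, so growing the vocabulary leaves it unchanged. Individual hash IDs are shared by many tokens, while the full signature is what distinguishes one token from another.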
Cached at: 06/29/26, 05:25 AM
# MultiHashFormer: Hash-based Generative Language Models

Source: [https://arxiv.org/abs/2606.28057](https://arxiv.org/abs/2606.28057) [View PDF](https://arxiv.org/pdf/2606.28057)

> Abstract: Language models (LMs) represent tokens using embedding matrices that scale linearly with the vocabulary size. To constrain the parameter footprint, prior work proposes hashing many tokens into a single vector within encoder-only models. While this offers parameter efficiency, many-to-one collisions prevent its use in causal LMs. In this paper, we propose MultiHashFormer, a new framework that allows hash-based autoregression. Each token is represented as a unique hash signature, a short sequence of discrete hash IDs, generated by multiple independent hash functions. A Hash Encoder compresses this signature into a single latent vector for processing by a Transformer decoder. Then, a Hash Decoder generates the hash signature of the next token, which is then mapped back to text. We evaluate our approach at the 100M, 1B and 3B parameter scales, demonstrating that MultiHashFormer consistently outperforms standard Transformer LMs across multiple benchmarks. Furthermore, we show that our model handles multilingual vocabulary expansion with a constant parameter footprint without any modifications.

## Submission history

From: Huiyin Xue \[[view email](https://arxiv.org/show-email/3c52616c/2606.28057)\]

**\[v1\]** Fri, 26 Jun 2026 13:03:29 UTC (4,031 KB)
Similar Articles
BitLM: Unlocking Multi-Token Language Generation with Bitwise Continuous Diffusion
This paper introduces BitLM, a language model that uses bitwise continuous diffusion to generate multiple tokens in parallel, aiming to overcome the sequential bottleneck of traditional autoregressive generation while preserving causal structure.
Hy-MT2: A Family of Fast, Efficient and Powerful Multilingual Translation Models in the Wild
Hy-MT2 is a family of fast, efficient multilingual translation models from Tencent, available in 1.8B, 7B, and 30B-A3B sizes, supporting 33 languages and outperforming previous open-source and commercial models.
Squeezing Capacity from Multimodal Large Language Models for Subject-driven Generation
This paper proposes a novel approach that conditions diffusion models on Multimodal Large Language Models (MLLMs) for subject-driven image generation, using VAE-based identity conditioning and a Dual Layer Aggregation module to improve both semantic understanding and identity preservation while mitigating copy-paste artifacts.
HEBATRON: A Hebrew-Specialized Open-Weight Mixture-of-Experts Language Model
Hebatron is a new open-weight Hebrew-specialized Large Language Model built on NVIDIA's Nemotron-3 Mixture-of-Experts architecture, achieving strong reasoning performance with efficient inference. It is the first language-specific adaptation of this architecture and supports native long-context processing.
Transformer-Based Language Models Across Domain Verticals: Architectures, Applications and Critical Assessment
A comprehensive survey of transformer-based language models covering architectures, applications across domain verticals (healthcare, finance, legal, etc.), and critical assessment of trade-offs including compute cost, alignment, and data provenance.