MultiHashFormer: Hash-based Generative Language Models

arXiv cs.CL 06/29/26, 04:00 AM Papers

hash-based language-models autoregressive parameter-efficiency transformer generative multilingual arxiv

Summary

MultiHashFormer is a hash-based generative language model that represents each token as a unique hash signature, enabling parameter-efficient autoregression. It outperforms standard Transformer LMs at 100M, 1B, and 3B scales and supports multilingual vocabulary expansion without increasing parameters.

arXiv:2606.28057v1 Announce Type: new


# MultiHashFormer: Hash-based Generative Language Models
Source: [https://arxiv.org/abs/2606.28057](https://arxiv.org/abs/2606.28057)
[View PDF](https://arxiv.org/pdf/2606.28057)

> Abstract: Language models (LMs) represent tokens using embedding matrices that scale linearly with the vocabulary size. To constrain the parameter footprint, prior work proposes hashing many tokens into a single vector within encoder-only models. While this offers parameter efficiency, many-to-one collisions prevent its use in causal LMs. In this paper, we propose MultiHashFormer, a new framework that allows hash-based autoregression. Each token is represented as a unique hash signature, a short sequence of discrete hash IDs, generated by multiple independent hash functions. A Hash Encoder compresses this signature into a single latent vector for processing by a Transformer decoder. Then, a Hash Decoder generates the hash signature of the next token, which is then mapped back to text. We evaluate our approach at the 100M, 1B and 3B parameter scales, demonstrating that MultiHashFormer consistently outperforms standard Transformer LMs across multiple benchmarks. Furthermore, we show that our model handles multilingual vocabulary expansion with a constant parameter footprint without any modifications.
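
The abstract describes the hash signature only at a high level, so the following is a minimal sketch of the idea rather than the authors' implementation. It assumes salted BLAKE2b digests stand in for the "multiple independent hash functions", and it assumes a simple re-probing step to keep signatures unique; the abstract does not say how uniqueness is enforced. The names `hash_id`, `build_signatures`, `embedding_params`, and `hashed_table_params` are hypothetical, as is the choice of one embedding table per hash function in the parameter count.

```python
import hashlib


def hash_id(token: str, fn_index: int, num_buckets: int, probe: int = 0) -> int:
    """Map a token to one bucket ID using the fn_index-th salted hash.

    Assumption: salted BLAKE2b stands in for an "independent hash function".
    """
    salt = fn_index.to_bytes(4, "little") + probe.to_bytes(4, "little")
    digest = hashlib.blake2b(token.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % num_buckets


def build_signatures(vocab, num_hashes: int, num_buckets: int):
    """Assign every token a unique tuple of num_hashes bucket IDs.

    Assumption: if two tokens land on the same signature, the later one is
    re-hashed with an incremented probe counter until its signature is free.
    """
    if len(set(vocab)) > num_buckets ** num_hashes:
        raise ValueError("signature space is smaller than the vocabulary")
    sig_of, token_of = {}, {}
    for token in vocab:
        if token in sig_of:
            continue  # skip repeated vocabulary entries
        probe = 0
        while True:
            sig = tuple(hash_id(token, i, num_buckets, probe) for i in range(num_hashes))
            if sig not in token_of:
                break
            probe += 1
        sig_of[token] = sig
        token_of[sig] = token  # reverse map: signature back to text
    return sig_of, token_of


def embedding_params(vocab_size: int, d_model: int) -> int:
    """Parameters in a standard embedding matrix: grows linearly with vocabulary."""
    return vocab_size * d_model


def hashed_table_params(num_hashes: int, num_buckets: int, d_model: int) -> int:
    """Parameters if each hash function had its own table of num_buckets rows.

    Assumption: this table layout is illustrative; the count is independent
    of the vocabulary size.
    """
    return num_hashes * num_buckets * d_model
```

Under these assumptions, a 50,000-token vocabulary at width 768 needs `embedding_params(50_000, 768)` = 38,400,000 embedding parameters, while three hash functions with 256 buckets each need `hashed_table_params(3, 256, 768)` = 589,824, and adding tokens only adds entries to the signature maps. This is one way to read the abstract's claim of a constant parameter footprint under vocabulary expansion.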

## Submission history

From: Huiyin Xue \[[view email](https://arxiv.org/show-email/3c52616c/2606.28057)\] **\[v1\]** Fri, 26 Jun 2026 13:03:29 UTC (4,031 KB)
