Bitnet.cpp: Efficient Edge Inference for Ternary LLMs

Papers with Code Trending 02/17/25, 03:06 PM Papers

ternary-llms bitnet edge-inference mixed-precision llm-inference open-source

Summary

Bitnet.cpp presents a mixed-precision matrix multiplication library for efficient edge inference of ternary LLMs like BitNet b1.58, achieving up to 6.25x speedup over full-precision baselines. The system is open-sourced on GitHub.

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge this gap, we introduce Bitnet.cpp, an inference system optimized for BitNet b1.58 and ternary LLMs. Given that mixed-precision matrix multiplication (mpGEMM) constitutes the bulk of inference time in ternary LLMs, Bitnet.cpp incorporates a novel mpGEMM library to facilitate sub-2-bits-per-weight, efficient and lossless inference. The library features two core solutions: Ternary Lookup Table (TL), which addresses spatial inefficiencies of previous bit-wise methods, and Int2 with a Scale (I2_S), which ensures lossless edge inference, both enabling high-speed inference. Our experiments show that Bitnet.cpp achieves up to a 6.25x increase in speed over full-precision baselines and up to 2.32x over low-bit baselines, setting new benchmarks in the field. Additionally, we expand TL to element-wise lookup table (ELUT) for low-bit LLMs in the appendix, presenting both theoretical and empirical evidence of its considerable potential. Bitnet.cpp is publicly available at https://github.com/microsoft/BitNet/tree/paper , offering a sophisticated solution for the efficient and practical deployment of edge LLMs.
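To make the "Int2 with a Scale" idea concrete, here is a minimal sketch of storing ternary weights as 2-bit codes alongside a single floating-point scale. It only illustrates the general principle that every ternary value fits exactly in 2 bits, so unpacking recovers the original weights without error. The function names, the code mapping {-1, 0, 1} -> {0, 1, 2}, the four-codes-per-byte layout, and the per-tensor scale are assumptions made for this example; they are not taken from the Bitnet.cpp source.

```python
import numpy as np


def pack_ternary_2bit(w_ternary: np.ndarray) -> np.ndarray:
    """Pack values in {-1, 0, 1} into bytes, four 2-bit codes per byte.

    The layout (lowest bits first) is an assumption for this sketch.
    """
    if w_ternary.size % 4 != 0:
        raise ValueError("length must be a multiple of 4 in this sketch")
    # Shift {-1, 0, 1} to the unsigned codes {0, 1, 2}.
    codes = (w_ternary.astype(np.int8) + 1).astype(np.uint8).reshape(-1, 4)
    packed = (codes[:, 0]
              | (codes[:, 1] << 2)
              | (codes[:, 2] << 4)
              | (codes[:, 3] << 6))
    return packed.astype(np.uint8)


def unpack_ternary_2bit(packed: np.ndarray) -> np.ndarray:
    """Invert pack_ternary_2bit, returning int8 values in {-1, 0, 1}."""
    shifts = np.array([0, 2, 4, 6], dtype=np.uint8)
    codes = (packed[:, None] >> shifts) & 0b11
    return codes.astype(np.int8).reshape(-1) - 1


def dot_2bit_with_scale(packed: np.ndarray, scale: float, x_int8: np.ndarray) -> float:
    """Integer dot product on unpacked weights, rescaled once at the end."""
    w = unpack_ternary_2bit(packed).astype(np.int32)
    acc = int(np.dot(w, x_int8.astype(np.int32)))
    return acc * scale
```

Because the accumulation is done in integers and the scale is applied once at the end, the packed representation gives the same result as multiplying by the unpacked ternary weights directly. A real kernel would avoid the explicit unpack step and operate on the packed bytes with SIMD instructions.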

Original Article


Cached at: 06/25/26, 11:09 AM


Source: https://huggingface.co/papers/2502.11880 Published on Feb 17, 2025

Abstract

Bitnet.cpp enhances edge inference for ternary LLMs using a novel mixed-precision matrix multiplication library, achieving significant speed improvements over baselines.

The advent of 1-bit large language models (LLMs), led by BitNet b1.58, has spurred interest in ternary LLMs. Despite this, research and practical applications focusing on efficient edge inference for ternary LLMs remain scarce. To bridge this gap, we introduce Bitnet.cpp, an inference system optimized for BitNet b1.58 and ternary LLMs. Given that mixed-precision matrix multiplication (mpGEMM) constitutes the bulk of inference time in ternary LLMs, Bitnet.cpp incorporates a novel mpGEMM library to facilitate sub-2-bits-per-weight, efficient and lossless inference. The library features two core solutions: Ternary Lookup Table (TL), which addresses spatial inefficiencies of previous bit-wise methods, and Int2 with a Scale (I2_S), which ensures lossless edge inference, both enabling high-speed inference. Our experiments show that Bitnet.cpp achieves up to a 6.25x increase in speed over full-precision baselines and up to 2.32x over low-bit baselines, setting new benchmarks in the field. Additionally, we expand TL to element-wise lookup table (ELUT) for low-bit LLMs in the appendix, presenting both theoretical and empirical evidence of its considerable potential. Bitnet.cpp is publicly available at https://github.com/microsoft/BitNet/tree/paper , offering a sophisticated solution for the efficient and practical deployment of edge LLMs.
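The "sub-2-bits-per-weight" and lookup-table claims can be illustrated with some simple arithmetic and a toy kernel. A ternary weight carries log2(3), roughly 1.585, bits of information, so indexing several weights jointly can beat a plain 2-bit encoding: three ternary weights have 27 combinations, which fit in a 5-bit index, or about 1.67 bits per weight. The sketch below is a generic element-wise lookup-table dot product written under that assumption. The group size of 3, the base-3 index encoding, and all function names are hypothetical choices for this example and should not be read as the paper's actual TL kernel design.

```python
import itertools
import math

import numpy as np

GROUP = 3  # hypothetical group size for this sketch

# All 3**GROUP ternary patterns; the row index is the code stored per group.
PATTERNS = np.array(list(itertools.product((-1, 0, 1), repeat=GROUP)), dtype=np.int32)


def bits_per_weight(group: int) -> float:
    """Bits per weight when `group` ternary weights share one fixed-width index."""
    return math.ceil(math.log2(3 ** group)) / group


def encode_weights(w_ternary: np.ndarray) -> np.ndarray:
    """Map each group of GROUP ternary weights to its base-3 index into PATTERNS."""
    digits = w_ternary.reshape(-1, GROUP).astype(np.int64) + 1  # {-1,0,1} -> {0,1,2}
    powers = 3 ** np.arange(GROUP - 1, -1, -1)                  # [9, 3, 1] for GROUP=3
    return digits @ powers


def lut_dot(codes: np.ndarray, x: np.ndarray) -> int:
    """Dot product by table lookup instead of per-weight multiplication.

    For each activation group, precompute the partial sum for every possible
    weight pattern, then select the entry named by the stored weight code.
    In a full matrix multiply the tables would be built once per activation
    vector and reused across all weight rows.
    """
    x_groups = x.reshape(-1, GROUP).astype(np.int32)
    tables = x_groups @ PATTERNS.T  # shape: (n_groups, 3**GROUP)
    return int(tables[np.arange(len(codes)), codes].sum())
```

With these definitions, `bits_per_weight(2)` is 2.0 while `bits_per_weight(3)` is 5/3, which shows why grouping three weights is one way to get below 2 bits per weight. The lookup result matches a direct integer dot product, since the table entries are exact partial sums.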


Get this paper in your agent:

hf papers read 2502.11880

Don’t have the latest CLI? curl -LsSf https://hf.co/cli/install.sh | bash

Models citing this paper: 1

#### Lgr54HFi/chimera



Similar Articles

BitNet Text Embeddings

Was BitNet a dead end? What happened to ternary LLMs?

CAT-Q: Cost-efficient and Accurate Ternary Quantization for LLMs

@AdinaYakup: BitCPM4-CANN Native 1.58-bit LLM training system on Ascend NPUs https://huggingface.co/collections/openbmb/bitcpm4-cann…

NEW BITNET MODELS!
