AngelSlim/Hy-MT1.5-1.8B-1.25bit

Hugging Face Models Trending 04/28/26, 12:39 PM Models

llm-compression machine-translation quantization edge-ai tencent-hunyuan on-device-ai

Summary

Tencent's AngelSlim team released Hy-MT1.5-1.8B-1.25bit, a highly compressed 1.25-bit machine translation model supporting 33 languages that fits in 440MB for on-device use. It utilizes the Sherry quantization algorithm to achieve world-class translation quality comparable to much larger models.

Task: translation
Tags: safetensors, hunyuan_v1_dense, translation, hy-mt, quant, 1.25bit, sherry, multilingual, arxiv:2601.07892, arxiv:2512.24092, arxiv:2602.21233, base_model:tencent/HY-MT1.5-1.8B, base_model:finetune:tencent/HY-MT1.5-1.8B, region:us

Cached at: 05/08/26, 09:09 AM

Source: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit

Dedicated to building a more intuitive, comprehensive, and efficient LLM compression toolkit.

Figure (model_scores): Hy-MT1.5-1.8B translation quality scores. Source: HY-MT1.5 Technical Report.

📣 Latest News

[26/04/29] We have released Hy-MT1.5-1.8B-2bit (574MB) and Hy-MT1.5-1.8B-1.25bit (440MB), on-device translation models supporting 33 languages, with both model weights and GGUF formats available. We have also built an Android Demo for you to try out. We invite you to give it a spin! 🔥🔥🔥
[26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
[26/01/13] We have released v0.3, which supports the training and deployment of Eagle3 for LLMs, VLMs, and audio models of all scales. We have also released Sherry, the hardware-efficient 1.25-bit quantization algorithm: [Paper] | [Code]

For more detailed information, please refer to [AngelSlim] and [HY-MT].

🌟 Hy-MT1.5-1.8B-1.25bit Key Features

World-Class Translation Quality: Hy-MT1.5-1.8B-1.25bit is built upon the Hy-MT1.5-1.8B foundation model, a specialized translation model developed by the Tencent Hunyuan Team through a holistic multi-stage training pipeline that integrates MT-oriented pre-training, supervised fine-tuning, on-policy distillation, and reinforcement learning. The base model natively supports 33 languages, 5 dialects/minority languages, and 1,056 translation directions. With only 1.8B parameters, it comprehensively outperforms much larger open-source models (e.g., Tower-Plus-72B, Qwen3-32B) and mainstream commercial translation APIs (e.g., Microsoft Translator, Doubao Translator). For full details, please refer to the HY-MT1.5 Technical Report.
Sherry: Extreme 1.25-bit Quantization: This model employs Sherry (accepted at ACL 2026), a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity strategy: for every 4 model weights, the 3 most important are stored in 1 bit each ({-1, +1}), while the remaining 1 is zeroed out. A group therefore has 4 possible zero positions times 8 sign patterns, or 32 states, so 4 weights pack into just 5 bits. This gives an effective 1.25-bit width with power-of-two alignment and compresses the original 3.3GB FP16 model to just 440MB with minimal accuracy loss.

Figure (Sherry): Sherry fine-grained sparsity. For every 4 weights, the 3 most important are stored in 1 bit each, and the remaining 1 is zeroed out.
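The 3:4 packing described above can be sketched in a few lines of Python. This is a minimal illustration, not the Sherry implementation: it assumes "importance" means absolute magnitude, omits any scaling factors, and uses a hypothetical bit layout (2 bits for the zero position, then 3 sign bits). The function names `ternarize_3of4`, `pack_group`, and `unpack_group` are made up for this example.

```python
def ternarize_3of4(weights):
    """Map each group of 4 floats to values in {-1, 0, +1} with exactly one zero.

    Assumption: the least important weight is the one with the smallest
    absolute value; the real Sherry criterion may differ.
    """
    if len(weights) % 4 != 0:
        raise ValueError("number of weights must be a multiple of 4")
    groups = []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        zero_idx = min(range(4), key=lambda i: abs(group[i]))
        groups.append([
            0 if i == zero_idx else (1 if w >= 0 else -1)
            for i, w in enumerate(group)
        ])
    return groups


def pack_group(ternary):
    """Pack 4 ternary values (exactly one zero) into a 5-bit integer.

    Hypothetical layout: bits 0-1 hold the zero position,
    bits 2-4 hold the signs of the three kept weights (1 means +1).
    """
    values = [int(v) for v in ternary]
    zero_idx = values.index(0)
    code = zero_idx
    kept = [v for i, v in enumerate(values) if i != zero_idx]
    for bit, v in enumerate(kept):
        if v > 0:
            code |= 1 << (2 + bit)
    return code


def unpack_group(code):
    """Inverse of pack_group: 5-bit integer back to 4 ternary values."""
    zero_idx = code & 0b11
    signs = iter(1 if (code >> (2 + b)) & 1 else -1 for b in range(3))
    return [0 if i == zero_idx else next(signs) for i in range(4)]


group = ternarize_3of4([0.9, -0.05, -0.4, 0.3])[0]
code = pack_group(group)
print(group, code, unpack_group(code))
```

Every one of the 32 possible 5-bit codes decodes to a valid group under this layout, so no code space is wasted. That is the property that makes 5 bits per 4 weights (1.25 bits per weight) exact rather than approximate.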

On-Device Deployment for Most Phones: Paired with our custom STQ kernel, designed specifically for mobile CPUs, the 1.25-bit model achieves perfect SIMD instruction set alignment. This means even ordinary phones with limited memory can run high-quality offline translation smoothly. No internet connection is required, and your data never leaves the device.
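The figure of 1,056 translation directions quoted under World-Class Translation Quality is consistent with counting every ordered (source, target) pair among the 33 supported languages. The short check below illustrates that arithmetic; the helper name `count_directions` is hypothetical, and the reading that the 5 dialects/minority languages are not part of this count is an inference from the numbers, not a statement from the model card.

```python
from itertools import permutations


def count_directions(num_languages):
    """Count ordered (source, target) pairs with source != target."""
    return sum(1 for _ in permutations(range(num_languages), 2))


# 33 languages give 33 * 32 ordered pairs.
print(count_directions(33))
```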

📈 Translation Benchmarks

Performance comparison of different model sizes on the Flores-200 Chinese-Foreign mutual translation benchmark:

Figure (flores_model_size): Performance of different model sizes on the Flores-200 Chinese-Foreign mutual translation benchmark.

⚡ Speed Demo

FP16 (8x speed) vs. 1.25-bit speed comparison. Demo device: Snapdragon 888, 8GB RAM:

Figure (fp16_vs_1.25bit): Demo device: Snapdragon 888, 8GB RAM.

📱 Demo

We provide a ready-to-use Android demo for offline translation. The demo features a background word extraction mode that works across any app on your phone: browse emails, webpages, or chat messages and get instant translations without switching apps. No network is required, no data is collected, and a one-time download is enough for permanent use.

Download Demo:

https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk

Translation Demo

Figure (app_demo): Demo device: Snapdragon 865, 8GB RAM.

Background Word Extraction Mode

Figure (demo2): Demo device: Snapdragon 7+ Gen 2, 16GB RAM.

📥 Download Links

1.25-bit model weights: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit
1.25-bit model GGUF: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF
2-bit model weights: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit
2-bit model GGUF: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
Demo: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk
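For scripted downloads, the repositories listed above can be fetched with the `huggingface_hub` Python package. The sketch below is one possible approach, assuming `huggingface_hub` is installed (`pip install huggingface_hub`). The short variant keys in `REPOS` and the `download` helper are hypothetical names chosen for this example, and the file layout inside each repository is not described on this page.

```python
# Repository ids taken from the download links; the dictionary keys are
# hypothetical shorthand introduced for this example.
REPOS = {
    "1.25bit": "AngelSlim/Hy-MT1.5-1.8B-1.25bit",
    "1.25bit-gguf": "AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF",
    "2bit": "AngelSlim/Hy-MT1.5-1.8B-2bit",
    "2bit-gguf": "AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF",
}


def download(variant, local_dir):
    """Download every file of the chosen repository into local_dir.

    Returns the local path reported by huggingface_hub.
    """
    # Imported lazily so the module can be loaded without the package.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPOS[variant], local_dir=local_dir)


if __name__ == "__main__":
    # Requires network access; fetches the 1.25-bit GGUF repository.
    print(download("1.25bit-gguf", "./hy-mt-1.25bit-gguf"))
```

Note that the GGUF repository also hosts the demo APK, so a full snapshot downloads it as well.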

📄 Technical Reports

HY-MT1.5 Technical Report: https://arxiv.org/abs/2512.24092
Sherry Paper (ACL 2026): https://arxiv.org/abs/2601.07892
AngelSlim Technical Report: https://arxiv.org/abs/2602.21233

📝 License

The code for this project is open-sourced under the License for AngelSlim.

🔗 Citation

@misc{huang2026sherry,
      title={Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification}, 
      author={Hong Huang and Decheng Wu and Qiangqiang Hu and Guanghua Yu and Jinhai Yang and Jianchen Zhu and Xue Liu and Dapeng Wu},
      year={2026},
      eprint={2601.07892},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.07892}, 
}

@article{angelslim2026,
  title={AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression},
  author={Hunyuan AI Infra Team},
  journal={arXiv preprint arXiv:2602.21233},
  year={2026}
}

@misc{zheng2025hymt,
      title={HY-MT1.5 Technical Report}, 
      author={Mao Zheng and Zheng Li and Tao Chen and Mingyang Song and Di Wang},
      year={2025},
      eprint={2512.24092},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.24092}, 
}

💬 Technical Discussion

AngelSlim is under continuous development, and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub Issues or join our WeChat discussion group.