AngelSlim/Hy-MT1.5-1.8B-1.25bit
Summary
Tencent's AngelSlim team released Hy-MT1.5-1.8B-1.25bit, a highly compressed 1.25-bit machine translation model supporting 33 languages that fits in 440MB for on-device use. It utilizes the Sherry quantization algorithm to achieve world-class translation quality comparable to much larger models.
Cached at: 05/08/26, 09:09 AM
Source: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit

Dedicated to building a more intuitive, comprehensive, and efficient LLM compression toolkit.
Hy-MT1.5-1.8B translation quality scores. Source: HY-MT1.5 Technical Report
📣 Latest News
- [26/04/29] We have released Hy-MT1.5-1.8B-2bit (574MB) and Hy-MT1.5-1.8B-1.25bit (440MB), on-device translation models supporting 33 languages, available both as model weights and in GGUF format. We have also built an Android Demo for you to try out. 🔥🔥🔥
- [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
- [26/01/13] We have released v0.3, which supports the training and deployment of Eagle3 for all-scale LLMs/VLMs/Audio models. We have also released Sherry, the hardware-efficient 1.25-bit quantization algorithm: [Paper] | [Code]
For more detailed information, please refer to [AngelSlim] and [HY-MT].
🌟 Hy-MT1.5-1.8B-1.25bit Key Features
- World-Class Translation Quality. Hy-MT1.5-1.8B-1.25bit is built upon the Hy-MT1.5-1.8B foundation model, a specialized translation model developed by the Tencent Hunyuan Team through a holistic multi-stage training pipeline integrating MT-oriented pre-training, supervised fine-tuning, on-policy distillation, and reinforcement learning. The base model natively supports 33 languages, 5 dialects/minority languages, and 1,056 translation directions (33 × 32 ordered source-target pairs). With only 1.8B parameters, it comprehensively outperforms much larger open-source models (e.g., Tower-Plus-72B, Qwen3-32B) and mainstream commercial translation APIs (e.g., Microsoft Translator, Doubao Translator). For full details, please refer to the HY-MT1.5 Technical Report.
- Sherry: Extreme 1.25-bit Quantization. This model employs Sherry (accepted at ACL 2026), a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity strategy: for every 4 model weights, the 3 most important are each stored in 1 bit ({-1, +1}), while the remaining 1 is zeroed out. This packs 4 weights into just 5 bits, giving an effective bit-width of 5/4 = 1.25 bits per weight with power-of-two alignment, and compresses the original 3.3GB FP16 model to just 440MB with minimal accuracy loss.
Sherry fine-grained sparsity: for every 4 weights, the 3 most important are stored in 1-bit, and the remaining 1 is zeroed out.
- On-Device Deployment for Most Phones. Paired with our custom STQ kernel, designed specifically for mobile CPUs, the 1.25-bit model achieves perfect SIMD instruction set alignment. This means even ordinary phones with limited memory can run high-quality offline translation smoothly. No internet connection is required, and your data never leaves the device.
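The 3:4 packing described in the Sherry bullet can be illustrated with a small sketch. This is not the AngelSlim implementation or the STQ kernel layout: the function names, the use of weight magnitude as the "importance" criterion, and the bit layout (2 bits for the position of the zeroed weight, then 3 sign bits) are all assumptions chosen for illustration. It only shows why 4 weights with exactly one zero fit in 5 bits: there are 4 possible zero positions and 2^3 sign patterns, so 32 states in total.

```python
from typing import List, Sequence

GROUP = 4          # weights per group (the "4" in 3:4 sparsity)
BITS_PER_GROUP = 5 # 2 bits for the zero position + 3 sign bits (assumed layout)


def sparsify_group(weights: Sequence[float]) -> List[int]:
    """Map 4 real weights to ternary values with exactly one zero.

    Assumption: "importance" is approximated by absolute magnitude, so the
    smallest-magnitude weight is zeroed and the other three keep their sign.
    """
    assert len(weights) == GROUP
    drop = min(range(GROUP), key=lambda i: abs(weights[i]))
    return [0 if i == drop else (1 if weights[i] >= 0 else -1)
            for i in range(GROUP)]


def pack_group(ternary: Sequence[int]) -> int:
    """Encode a 3:4 ternary group as a 5-bit integer in [0, 32)."""
    zero_pos = list(ternary).index(0)
    code = zero_pos            # low 2 bits: index of the zeroed weight
    bit = 2
    for i, value in enumerate(ternary):
        if i == zero_pos:
            continue
        if value > 0:          # sign bit: 1 means +1, 0 means -1
            code |= 1 << bit
        bit += 1
    return code


def unpack_group(code: int) -> List[int]:
    """Invert pack_group: recover the 4 ternary values from a 5-bit code."""
    zero_pos = code & 0b11
    out: List[int] = []
    bit = 2
    for i in range(GROUP):
        if i == zero_pos:
            out.append(0)
        else:
            out.append(1 if (code >> bit) & 1 else -1)
            bit += 1
    return out


group = sparsify_group([0.7, -0.05, -1.2, 0.3])
code = pack_group(group)
print(group, code, BITS_PER_GROUP / GROUP)
```

Because every 5-bit code decodes to a valid group, no code space is wasted, which is where the 1.25 bits per weight figure comes from. A real kernel would additionally store per-tensor or per-channel scales, which this sketch omits.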
📈 Translation Benchmarks
Figure: Performance of different model sizes on the Flores-200 Chinese-Foreign mutual translation benchmark.
⚡ Speed Demo
FP16 (8x speed) vs. 1.25-bit speed comparison. Demo device: Snapdragon 888, 8GB RAM.
📱 Demo
We provide a ready-to-use Android demo for offline translation. The demo features a background word extraction mode that works across any app on your phone: browse emails, webpages, or chat messages and get instant translations without switching apps. No network required, no data collection, one-time download for permanent use.
Download Demo:
https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk
Translation Demo
Demo device: Snapdragon 865, 8GB RAM.
Background Word Extraction Mode
Demo device: Snapdragon 7+ Gen 2, 16GB RAM.
📥 Download Links
- 1.25-bit model weights: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit
- 1.25-bit model GGUF: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF
- 2-bit model weights: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit
- 2-bit model GGUF: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-2bit-GGUF
- Demo: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk
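For scripted downloads, the demo link above follows the pattern `<repo>/resolve/<revision>/<filename>`. The sketch below rebuilds that URL from its parts using only the Python standard library. The helper name `resolve_url` is hypothetical, and the only file name used is the APK listed above; the GGUF file names are not given here, so they would need to be looked up on the repository page before applying the same pattern.

```python
from urllib.parse import quote

HF_BASE = "https://huggingface.co"


def resolve_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a direct-download URL in the same shape as the demo link above.

    repo_id  : "<organization>/<repository>", e.g. the GGUF repository
    filename : path of the file inside the repository
    revision : branch, tag, or commit; the links in this page use "main"
    """
    # quote() leaves "/" in file paths intact but escapes spaces and the like.
    return f"{HF_BASE}/{repo_id}/resolve/{quote(revision, safe='')}/{quote(filename)}"


apk_url = resolve_url("AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF", "Hy-MT-demo.apk")
print(apk_url)
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlretrieve(apk_url, "Hy-MT-demo.apk")`.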
📄 Technical Reports
- HY-MT1.5 Technical Report: https://arxiv.org/abs/2512.24092
- Sherry Paper (ACL 2026): https://arxiv.org/abs/2601.07892
- AngelSlim Technical Report: https://arxiv.org/abs/2602.21233
📝 License
The code for this project is open-sourced under the License for AngelSlim.
🔗 Citation
@misc{huang2026sherry,
title={Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification},
author={Hong Huang and Decheng Wu and Qiangqiang Hu and Guanghua Yu and Jinhai Yang and Jianchen Zhu and Xue Liu and Dapeng Wu},
year={2026},
eprint={2601.07892},
archivePrefix={arXiv},
primaryClass={cs.LG},
url={https://arxiv.org/abs/2601.07892},
}
@article{angelslim2026,
title={AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression},
author={Hunyuan AI Infra Team},
journal={arXiv preprint arXiv:2602.21233},
year={2026}
}
@misc{zheng2025hymt,
title={HY-MT1.5 Technical Report},
author={Mao Zheng and Zheng Li and Tao Chen and Mingyang Song and Di Wang},
year={2025},
eprint={2512.24092},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2512.24092},
}
💬 Technical Discussion
- AngelSlim is under active development, and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub Issues or join our WeChat discussion group.