AngelSlim/Hy-MT1.5-1.8B-1.25bit

Summary

Tencent's AngelSlim team released Hy-MT1.5-1.8B-1.25bit, a highly compressed 1.25-bit machine translation model supporting 33 languages that fits in 440MB for on-device use. It utilizes the Sherry quantization algorithm to achieve world-class translation quality comparable to much larger models.

Task: translation
Tags: safetensors, hunyuan_v1_dense, translation, hy-mt, quant, 1.25bit, sherry, multilingual, arxiv:2601.07892, arxiv:2512.24092, arxiv:2602.21233, base_model:tencent/HY-MT1.5-1.8B, base_model:finetune:tencent/HY-MT1.5-1.8B, region:us

Cached at: 05/08/26, 09:09 AM

Source: https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit

AngelSlim

Dedicated to building a more intuitive, comprehensive, and efficient LLM compression toolkit.

📱 Android Demo | 📣 GGUF | ✒️ Sherry Paper (ACL 2026) | 📖 Documentation | 🤗 AngelSlim | 💬 WeChat

[Figure: Hy-MT1.5-1.8B translation quality scores. Source: HY-MT1.5 Technical Report]

📣 Latest News

  • [26/04/29] We have released Hy-MT1.5-1.8B-2bit (574MB) and Hy-MT1.5-1.8B-1.25bit (440MB), on-device translation models supporting 33 languages, with both weights and GGUF formats available. We have also built an Android Demo for you to try out. We invite you to give it a spin! 🔥🔥🔥
  • [26/02/09] We have released HY-1.8B-2Bit, a 2-bit on-device large language model.
  • [26/01/13] We have released v0.3, which supports the training and deployment of Eagle3 for LLMs, VLMs, and audio models of all scales. We have also released Sherry, a hardware-efficient 1.25-bit quantization algorithm. [Paper] | [Code]

For more detailed information, please refer to [AngelSlim] and [HY-MT].

🌟 Hy-MT1.5-1.8B-1.25bit Key Features

  • World-Class Translation Quality. Hy-MT1.5-1.8B-1.25bit is built upon the Hy-MT1.5-1.8B foundation model, a specialized translation model developed by the Tencent Hunyuan Team through a holistic multi-stage training pipeline integrating MT-oriented pre-training, supervised fine-tuning, on-policy distillation, and reinforcement learning. The base model natively supports 33 languages, 5 dialects/minority languages, and 1,056 translation directions. With only 1.8B parameters, it comprehensively outperforms much larger open-source models (e.g., Tower-Plus-72B, Qwen3-32B) and mainstream commercial translation APIs (e.g., Microsoft Translator, Doubao Translator). For full details, please refer to the HY-MT1.5 Technical Report.
  • Sherry: Extreme 1.25-bit Quantization. This model employs Sherry (accepted at ACL 2026), a hardware-efficient ternary quantization framework. Sherry introduces a 3:4 fine-grained sparsity strategy: for every 4 model weights, the 3 most important are stored in 1-bit ({-1, +1}), while the remaining 1 is zeroed out. This packs 4 weights into just 5 bits, achieving an effective 1.25-bit width with power-of-two alignment, compressing the original 3.3GB FP16 model to just 440MB, with minimal accuracy loss.

[Figure: Sherry fine-grained sparsity. For every 4 weights, the 3 most important are stored in 1-bit, and the remaining 1 is zeroed out.]

  • On-Device Deployment for Most Phones. Paired with our custom STQ kernel, designed specifically for mobile CPUs, the 1.25-bit model achieves perfect SIMD instruction set alignment. This means even ordinary phones with limited memory can run high-quality offline translation smoothly. No internet connection is required, and your data never leaves the device.
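The 3:4 packing described in the Sherry bullet above can be illustrated with a small sketch. A block of 4 weights with exactly one zero has 4 possible zero positions and 2^3 sign patterns for the kept weights, i.e. 32 = 2^5 states, which is why 5 bits per block (1.25 bits per weight) suffice. The code below is only an illustration of that counting argument: the function names, the bit layout (2 bits for the zero position, 3 sign bits), and the use of magnitude as the "importance" criterion are all assumptions, not the layout or criterion used by Sherry or the STQ kernel, and scale factors are omitted.

```python
from typing import List, Sequence

WEIGHTS_PER_BLOCK = 4
BITS_PER_BLOCK = 5  # 2 bits (zero position) + 3 bits (signs) -> 5 / 4 = 1.25 bits per weight


def quantize_block(w: Sequence[float]) -> List[int]:
    """Map 4 real weights to {-1, 0, +1} with exactly one zero (3:4 sparsity).

    Assumption: importance is approximated by magnitude, so the weight with
    the smallest absolute value is the one that gets zeroed out.
    """
    assert len(w) == WEIGHTS_PER_BLOCK
    zero_pos = min(range(WEIGHTS_PER_BLOCK), key=lambda i: abs(w[i]))
    return [0 if i == zero_pos else (1 if w[i] >= 0 else -1)
            for i in range(WEIGHTS_PER_BLOCK)]


def pack_block(t: Sequence[int]) -> int:
    """Pack a ternary block with exactly one zero into a 5-bit integer.

    Hypothetical layout: bits 4..3 hold the zero position, bits 2..0 hold the
    signs of the three kept weights in order (bit set means +1).
    """
    t = list(t)
    assert len(t) == WEIGHTS_PER_BLOCK and t.count(0) == 1
    signs = [v for v in t if v != 0]
    assert all(v in (-1, 1) for v in signs)
    code = t.index(0) << 3
    for k, v in enumerate(signs):
        if v == 1:
            code |= 1 << k
    return code


def unpack_block(code: int) -> List[int]:
    """Invert pack_block: recover the 4 ternary weights from a 5-bit code."""
    assert 0 <= code < 2 ** BITS_PER_BLOCK
    zero_pos = (code >> 3) & 0b11
    out, k = [], 0
    for i in range(WEIGHTS_PER_BLOCK):
        if i == zero_pos:
            out.append(0)
        else:
            out.append(1 if (code >> k) & 1 else -1)
            k += 1
    return out


if __name__ == "__main__":
    block = quantize_block([0.8, -0.05, -1.2, 0.3])
    code = pack_block(block)
    print(block, code, unpack_block(code))
    print(BITS_PER_BLOCK / WEIGHTS_PER_BLOCK, "bits per weight")
```

Every one of the 32 codes decodes to a distinct valid block, so no bit pattern is wasted. As a rough cross-check on the sizes quoted above, 3.3GB to 440MB is about a 7.5x reduction, less than the ideal 16 / 1.25 = 12.8x; one plausible reading, not stated in the source, is that some tensors and metadata are stored at higher precision.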

📈 Translation Benchmarks

[Figure: Performance comparison of different model sizes on the Flores-200 Chinese-Foreign mutual translation benchmark.]

⚡ Speed Demo

[Figure: FP16 (8x speed) vs. 1.25-bit speed comparison. Demo device: Snapdragon 888, 8GB RAM.]

📱 Demo

We provide a ready-to-use Android demo for offline translation. The demo features a background word extraction mode that works across any app on your phone: browse emails, webpages, or chat messages and get instant translations without switching apps. No network required, no data collection, one-time download for permanent use.

Download Demo:

https://huggingface.co/AngelSlim/Hy-MT1.5-1.8B-1.25bit-GGUF/resolve/main/Hy-MT-demo.apk

Translation Demo

[Figure: translation demo. Demo device: Snapdragon 865, 8GB RAM.]

Background Word Extraction Mode

[Figure: background word extraction mode. Demo device: Snapdragon 7+ Gen 2, 16GB RAM.]

📥 Download Links

📄 Technical Reports

📝 License

The code for this project is open-sourced under the License for AngelSlim.

🔗 Citation

@misc{huang2026sherry,
      title={Sherry: Hardware-Efficient 1.25-Bit Ternary Quantization via Fine-grained Sparsification}, 
      author={Hong Huang and Decheng Wu and Qiangqiang Hu and Guanghua Yu and Jinhai Yang and Jianchen Zhu and Xue Liu and Dapeng Wu},
      year={2026},
      eprint={2601.07892},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2601.07892}, 
}

@article{angelslim2026,
  title={AngelSlim: A more accessible, comprehensive, and efficient toolkit for large model compression},
  author={Hunyuan AI Infra Team},
  journal={arXiv preprint arXiv:2602.21233},
  year={2026}
}

@misc{zheng2025hymt,
      title={HY-MT1.5 Technical Report}, 
      author={Mao Zheng and Zheng Li and Tao Chen and Mingyang Song and Di Wang},
      year={2025},
      eprint={2512.24092},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2512.24092}, 
}

💬 Technical Discussion

  • AngelSlim is continuously iterating and new features will be released soon. If you have any questions or suggestions, please open an issue on GitHub Issues or join our WeChat discussion group.
