OpenBMB releases BitCPM4-CANN, a collection of natively trained 1.58-bit ternary-quantized LLMs (0.5B to 8B parameters) optimized for Ascend NPUs via CANN, reporting a 6× reduction in inference memory with minimal training overhead.
This paper introduces Tequila, a trapping-free quantization method for Large Language Models that improves ternary quantization accuracy and inference speed by repurposing deadzone-trapped weights as dynamic biases.
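For readers unfamiliar with the terms in these summaries, the sketch below shows a generic threshold-based ternary quantizer. It is not Tequila's method and not the BitCPM4-CANN training recipe. It only illustrates what a "deadzone" is (the band of small weights that round to 0) and why ternary weights are described as 1.58-bit (log2(3) ≈ 1.585 bits per weight). The function name `ternarize`, the `delta_factor` parameter, and its default of 0.7 are assumptions chosen for illustration.

```python
import math
import random


def ternarize(weights, delta_factor=0.7):
    """Generic threshold-based ternary quantization (illustrative only).

    Weights whose magnitude is at most `delta` fall in the deadzone and
    map to 0; the rest map to +1 or -1 and share one scale `alpha`.
    `delta_factor` is a hypothetical knob, not a value from either source.
    """
    abs_w = [abs(w) for w in weights]
    delta = delta_factor * sum(abs_w) / len(abs_w)  # deadzone half-width
    codes = [
        0 if a <= delta else (1 if w > 0 else -1)
        for w, a in zip(weights, abs_w)
    ]
    kept = [a for a in abs_w if a > delta]
    # Shared scale: mean magnitude of the weights outside the deadzone.
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha, delta


random.seed(0)
weights = [random.gauss(0.0, 1.0) for _ in range(10_000)]  # synthetic weights
codes, alpha, delta = ternarize(weights)

deadzone_fraction = codes.count(0) / len(codes)
bits_per_weight = math.log2(3)  # information content of one ternary symbol

print(f"deadzone half-width: {delta:.3f}")
print(f"fraction of weights in the deadzone: {deadzone_fraction:.2%}")
print(f"bits per ternary weight: {bits_per_weight:.3f}")
```

In this simple scheme every weight inside the deadzone contributes nothing to the layer output after quantization. According to the summary above, Tequila instead repurposes such deadzone-trapped weights as dynamic biases. That mechanism is not reproduced here.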