This paper validates distillation and quantization as cost-effective methods to expand the Apertus LLM family to new sizes and hardware formats, producing Apertus-v1.1 models with up to 4B parameters trained on 1.7T tokens.