mlx-lm


Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM

arXiv cs.CL · 2026-04-21

This paper presents the first systematic evaluation of cross-family speculative decoding for Polish LLMs on Apple Silicon. It extends MLX-LM with UAG to enable cross-tokenizer decoding. The authors find that context-aware token translation improves acceptance rates, but unified memory bandwidth limits keep the theoretical speedup from being fully realized in practice. The best result is a 1.7x throughput gain on structured text.
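To illustrate why a higher acceptance rate does not automatically translate into a proportional speedup, here is a minimal sketch of the standard analytical model from the speculative decoding literature. It is not the paper's method or measurement. The names `alpha` (per-token acceptance probability, assumed independent across positions), `gamma` (draft tokens proposed per step), and `cost_ratio` (cost of one draft forward pass relative to one target forward pass) are illustrative assumptions, and the model ignores token-translation overhead.

```python
def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    Assumes each of the `gamma` drafted tokens is accepted independently
    with probability `alpha` (a simplifying assumption, not a measured value).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if gamma < 0:
        raise ValueError("gamma must be non-negative")
    if alpha == 1.0:
        # Every drafted token is accepted, plus one token from the target.
        return float(gamma + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^gamma
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def expected_speedup(alpha: float, gamma: int, cost_ratio: float) -> float:
    """Idealized speedup over plain autoregressive decoding.

    One step costs `gamma` draft passes plus one target pass, measured in
    units of a single target pass. `cost_ratio` is hypothetical here.
    """
    step_cost = gamma * cost_ratio + 1.0
    return expected_tokens_per_step(alpha, gamma) / step_cost


if __name__ == "__main__":
    # Illustrative inputs only; none of these numbers come from the paper.
    for ratio in (0.05, 0.2, 0.5):
        print(ratio, round(expected_speedup(alpha=0.8, gamma=4, cost_ratio=ratio), 3))
```

In this model the speedup shrinks as `cost_ratio` grows, so if the draft passes are relatively expensive on a given machine, even a good acceptance rate yields a modest gain.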


