cross-tokenizer

X-Token: Projection-Guided Cross-Tokenizer Knowledge Distillation

arXiv cs.LG · 2026-05-22

X-Token introduces two loss formulations (P-KL and H-KL) to address failure modes in logit-based cross-tokenizer knowledge distillation, enabling a student model to learn from teachers with incompatible vocabularies and achieving state-of-the-art results on Llama-3.2-1B.

Cross-Family Speculative Decoding for Polish Language Models on Apple Silicon: An Empirical Evaluation of Bielik 11B with UAG-Extended MLX-LM

arXiv cs.CL · 2026-04-21

This paper presents the first systematic evaluation of cross-family speculative decoding for Polish LLMs on Apple Silicon, extending MLX-LM with UAG to enable cross-tokenizer decoding. It finds that context-aware token translation improves acceptance rates, but unified memory bandwidth limits keep the theoretical speedup from being fully realized; the best result is a 1.7x throughput gain on structured text.
