The authors propose a two-dimensional early-exit method that jointly prunes transformer layers and input sentences, yielding a 1.4–2.3× additional speed-up on sentiment-analysis tasks across Llama 3.1/3.2, Gemma, and Qwen models.
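The core idea, exiting early in *depth* (skipping remaining layers) while simultaneously shrinking the input in *width* (dropping sentences or tokens whose representations have settled), can be illustrated with a toy sketch. Everything here is illustrative, not the paper's actual method: the probe head, the thresholds, and the convergence-based pruning rule are all assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_layers = 8, 6
# Toy "transformer": each layer is a near-identity linear map + tanh.
layers = [np.eye(d) + rng.normal(scale=0.05, size=(d, d)) for _ in range(n_layers)]
probe = rng.normal(size=(d, 2))   # hypothetical shared early-exit classifier head

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def two_d_early_exit(tokens, exit_conf=0.95, prune_eps=0.05):
    """Sketch of 2D early exit: exit in depth once the probe is confident,
    prune in width by dropping tokens whose hidden states have nearly
    stopped changing between layers. Thresholds are illustrative."""
    h = tokens
    for i, W in enumerate(layers, start=1):
        new_h = np.tanh(h @ W)
        # Width pruning: keep tokens that are still changing (keep >= 1).
        delta = np.linalg.norm(new_h - h, axis=-1)
        keep = delta > prune_eps
        if not keep.any():
            keep[np.argmax(delta)] = True
        h = new_h[keep]
        # Depth exit: mean-pool survivors and check probe confidence.
        conf = softmax(h.mean(axis=0) @ probe).max()
        if conf >= exit_conf:
            return i, h.shape[0]
    return n_layers, h.shape[0]

exit_layer, tokens_left = two_d_early_exit(rng.normal(size=(12, d)))
```

The two axes compound: every pruned token cheapens all remaining layers, and every skipped layer avoids recomputing the surviving tokens, which is where the extra speed-up over layer-only early exit would come from.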
Researchers from National Taiwan University propose replacing fixed translation-based prompting strategies in multilingual LLMs with lightweight learned classifiers that route each instance to either native or translation-based prompting. Their analysis across 10 languages and 4 benchmarks shows that no single strategy is universally optimal (translation benefits low-resource languages most), and that the learned router achieves statistically significant improvements over any fixed strategy.
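A per-instance router of this kind can be sketched as a tiny binary classifier over instance features. The features, synthetic training labels, and the "low-resource languages favor translation" decision rule below are assumptions for illustration, not the paper's actual setup:

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic data: features = [language resource level in (0,1), prompt-length z-score].
# Label 1 = translation-based prompting wins, 0 = native prompting wins.
n = 400
X = np.column_stack([rng.uniform(0, 1, n), rng.normal(0, 1, n)])
y = (X[:, 0] + 0.1 * rng.normal(size=n) < 0.4).astype(float)  # low-resource -> translate

# Lightweight learned router: logistic regression via gradient descent.
w, b, lr = np.zeros(2), 0.0, 0.5
for _ in range(500):
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))
    grad = p - y
    w -= lr * (X.T @ grad) / n
    b -= lr * grad.mean()

def route(resource_level, length_z=0.0):
    """Pick a prompting strategy for one instance."""
    p = 1.0 / (1.0 + np.exp(-(np.array([resource_level, length_z]) @ w + b)))
    return "translate" if p > 0.5 else "native"
```

Because the router is a small classifier rather than an LLM call, its per-instance overhead is negligible next to the prompting strategies it chooses between.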
Interfaze AI introduces a specialized model that surpasses general-purpose LLMs on deterministic developer tasks including OCR, object detection, web scraping, speech-to-text, and classification.