Tag
FastMix is a novel framework that automates data mixture discovery for training large models using a single proxy model and bilevel optimization, achieving state-of-the-art performance with significant efficiency gains.
ProxyKV is a cross-model proxy pruning framework that offloads importance scoring to a lightweight small model, achieving high precision KV cache pruning with much lower prefilling overhead, matching KVZip accuracy across Llama-3.1, Qwen-2.5, and Qwen-3 families.