Tag: proxy-model

FastMix: Fast Data Mixture Optimization via Gradient Descent

arXiv cs.LG · 4d ago

FastMix is a framework that automates the search for data mixtures used to train large models. It relies on a single proxy model and bilevel optimization, and reports state-of-the-art performance with significant efficiency gains.

ProxyKV: Cross-Model Proxy Pruning for Efficient Long-Context LLM Inference

arXiv cs.LG · 2026-05-19

ProxyKV is a cross-model proxy pruning framework that offloads importance scoring to a lightweight small model. It achieves high-precision KV cache pruning with much lower prefilling overhead and matches KVZip's accuracy across the Llama-3.1, Qwen-2.5, and Qwen-3 model families.