This paper introduces CHERRY, a set of techniques for compute-efficient language models, including selective token supervision, depth compression via recurrent unrolling, and a mixture of compressed experts. The authors report significant efficiency gains when applying these techniques to a Korean foundation model.
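To make "selective token supervision" concrete, here is a minimal sketch of one generic reading of the idea: the training loss is averaged only over a chosen subset of target tokens, so unselected tokens contribute nothing. This is not CHERRY's actual method. The selection rule (`top_fraction_mask`, which keeps the highest-loss tokens) and both function names are hypothetical, chosen only for illustration.

```python
import math


def selective_token_loss(logits, targets, keep_mask):
    """Mean cross-entropy over only the tokens where keep_mask is True.

    logits:    per-token lists of unnormalized scores (one score per vocab entry)
    targets:   target token ids, one per position
    keep_mask: booleans marking which positions are supervised (hypothetical selection)
    """
    total = 0.0
    count = 0
    for scores, target, keep in zip(logits, targets, keep_mask):
        if not keep:
            continue  # unselected tokens add nothing to the loss
        # Numerically stable log-sum-exp: subtract the max score before exponentiating.
        m = max(scores)
        log_z = m + math.log(sum(math.exp(s - m) for s in scores))
        total += log_z - scores[target]  # -log softmax(scores)[target]
        count += 1
    return total / count if count else 0.0


def top_fraction_mask(per_token_losses, fraction):
    """Hypothetical selection rule: keep the highest-loss fraction of tokens (at least one)."""
    k = max(1, int(len(per_token_losses) * fraction))
    ranked = sorted(range(len(per_token_losses)),
                    key=lambda i: per_token_losses[i], reverse=True)
    chosen = set(ranked[:k])
    return [i in chosen for i in range(len(per_token_losses))]


if __name__ == "__main__":
    logits = [[2.0, 0.0], [0.0, 0.0], [0.0, 3.0]]
    targets = [0, 1, 1]
    # Per-token losses under full supervision, used only to pick which tokens to keep.
    per_token = [selective_token_loss([l], [t], [True]) for l, t in zip(logits, targets)]
    mask = top_fraction_mask(per_token, 0.34)
    print(mask, selective_token_loss(logits, targets, mask))
```

Here the middle token has uniform scores (loss ln 2 ≈ 0.693) and is the only one kept, so the selective loss equals ln 2. In a real training loop the mask could come from any criterion; the summary above does not state which one CHERRY uses.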