recurrent-depth

CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield

arXiv cs.CL · 4d ago

This paper introduces CHERRY, a set of techniques for compute-efficient language models: selective token supervision, depth compression via recurrent unrolling, and a mixture of compressed experts. The authors report significant efficiency gains when these are applied to a Korean foundation model.
