recurrent-depth


CHERRY: Compressed Hierarchical Experts with Recurrent Representational Yield

arXiv cs.CL · 4d ago

This paper introduces CHERRY, a set of techniques for building compute-efficient language models: selective token supervision, depth compression via recurrent unrolling, and a mixture of compressed experts. The authors report significant efficiency gains when applying these techniques to a Korean foundation model.
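The summary names depth compression via recurrent unrolling but does not describe how CHERRY implements it. As a generic illustration of the recurrent-depth idea this tag refers to (an assumption, not the paper's method), the sketch below applies one weight-tied block several times instead of stacking distinct layers. Effective depth grows with the number of unrolled steps while the stored parameter count stays at one block. The toy block `make_block`, its elementwise residual affine map, and the helper names are all hypothetical.

```python
from typing import Callable, List

Vector = List[float]
Block = Callable[[Vector], Vector]

# Hypothetical toy block: two parameters (weight, bias) shared across all positions.
BLOCK_PARAMS = 2


def make_block(weight: float, bias: float) -> Block:
    """Toy 'layer': residual connection around an elementwise affine map (illustrative only)."""
    def block(x: Vector) -> Vector:
        return [xi + (weight * xi + bias) for xi in x]
    return block


def stacked_forward(blocks: List[Block], x: Vector) -> Vector:
    """Conventional depth: each layer is a separate block with its own parameters."""
    for block in blocks:
        x = block(x)
    return x


def unrolled_forward(block: Block, x: Vector, steps: int) -> Vector:
    """Recurrent depth: the same weight-tied block is applied `steps` times."""
    for _ in range(steps):
        x = block(x)
    return x


def params_stacked(params_per_block: int, depth: int) -> int:
    """Parameters stored when every layer has its own weights."""
    return params_per_block * depth


def params_recurrent(params_per_block: int) -> int:
    """Parameters stored when one block is reused at every step."""
    return params_per_block


if __name__ == "__main__":
    shared = make_block(0.5, 0.0)
    x0 = [1.0, 2.0]
    print(unrolled_forward(shared, x0, 3))           # depth 3, one block of weights
    print(stacked_forward([shared] * 3, x0))         # same result when the stack is tied
    print(params_stacked(BLOCK_PARAMS, 3), params_recurrent(BLOCK_PARAMS))
```

Unrolling a tied block is equivalent to a stack whose layers all share one set of weights, which is why the parameter count no longer scales with depth. Whatever CHERRY additionally does to compress depth is not specified in this summary.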
