code-evaluation


CodeAlchemy: Synthetic Code Rewriting at Scale

arXiv cs.CL · 2026-06-10

CodeAlchemy is a synthetic data generation framework that transforms publicly available code into semantically rich training data using five strategies, producing over 500 billion tokens and enabling small models to outperform much larger ones on code benchmarks.
