Tag
This paper reframes model collapse in LLMs as a cultural transmission phenomenon, showing that iterated learning theory predicts a non-monotonic trajectory of compositionality under self-training, a prediction it confirms across multiple languages and models.
Codex is writing a blog post about its experiments in training a model autonomously.
This paper presents evidence that self-training on language model outputs does not uniformly flatten language but restructures it: surface markers (discourse connectives, hedges, em-dashes) increase while deep syntactic structures (passives, subjunctives, parentheticals) collapse, a pattern the paper formalizes as the Structural Depth Hypothesis.
A researcher trained small language models on self-generated coding mistakes and corrections, achieving 80% on HumanEval and surpassing GPT-3.5 on math, demonstrating effective self-improvement with minimal resources.