pre-training

Rethinking Data Curation in LLM Training: Online Reweighting Offers Better Generalization than Offline Methods

arXiv cs.LG · 2d ago

This paper introduces ADAPT, an online reweighting framework for LLM data curation that dynamically adjusts sample importance during training via loss weighting, outperforming offline selection and mixing methods in cross-benchmark generalization.

ZAYA1-74B-Preview: Scaling Pretraining on AMD

Reddit r/LocalLLaMA · 2d ago

Zyphra releases ZAYA1-74B-Preview, a 74-billion-parameter base model trained on AMD hardware, and highlights its strong reasoning capabilities before any reinforcement learning (RL) as well as early signs of agentic performance.

OpenAI’s technology explained

OpenAI Blog · 2023-10-11

OpenAI publishes an explainer on its core technology, detailing how language models like GPT-4 are developed through pre-training (learning from vast text data) and post-training (alignment with human values and safety practices). The article emphasizes OpenAI's nonprofit mission structure and explains the distinction between raw base models and refined, usable versions.

DALL·E 2 pre-training mitigations

OpenAI Blog · 2022-06-28

OpenAI describes the pre-training data filtering and active learning techniques used to reduce harmful content in DALL·E 2, and addresses the unintended bias amplification that this filtering caused, particularly demographic biases in generated images.

Language models are few-shot learners

OpenAI Blog · 2020-05-28

OpenAI introduces GPT-3, a 175-billion-parameter autoregressive language model that demonstrates strong few-shot learning across diverse NLP tasks without gradient updates or fine-tuning. This represents a paradigm shift: language models can be applied to new tasks through text interaction alone.

Improving language understanding with unsupervised learning

OpenAI Blog · 2018-06-11

OpenAI presents a two-stage approach for improving language understanding: pre-training a transformer model on large unlabeled datasets with a language modeling objective, then fine-tuning it on smaller supervised datasets for specific tasks. The method achieves state-of-the-art results across diverse tasks, including commonsense reasoning, semantic similarity, and reading comprehension, with minimal hyperparameter tuning.
