pre-training

Tag

Cards List
#pre-training

FastMix: Fast Data Mixture Optimization via Gradient Descent

arXiv cs.LG · 2026-06-16

FastMix is a novel framework that automates data mixture discovery for training large models using a single proxy model and bilevel optimization, achieving state-of-the-art performance with significant efficiency gains.
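The bilevel structure behind "data mixture optimization via gradient descent" can be illustrated with a toy. The inner problem trains proxy parameters on a weighted mixture, and the outer problem moves the mixture weights downhill on a validation loss. This is a minimal sketch under strong assumptions, not FastMix's formulation: each hypothetical domain's loss is a quadratic pulling the parameters toward a point, so the inner problem has a closed-form solution and the hypergradient is exact. A real proxy model would need an approximate inner solve. All names (`inner_solution`, `optimize_mixture`, the domain centers) are illustrative.

```python
import numpy as np

def softmax(a):
    e = np.exp(a - a.max())
    return e / e.sum()

def inner_solution(w, centers):
    # Inner problem: argmin_theta sum_i w_i * ||theta - c_i||^2.
    # Because the weights sum to 1, the minimizer is the weighted mean of the centers.
    return w @ centers

def outer_loss(w, centers, target):
    # Outer (validation) objective evaluated at the inner solution.
    theta = inner_solution(w, centers)
    return float(np.sum((theta - target) ** 2))

def optimize_mixture(centers, target, steps=500, lr=1.0):
    """Gradient descent on mixture logits, differentiating through the inner solution."""
    logits = np.zeros(len(centers))  # start from a uniform mixture
    for _ in range(steps):
        w = softmax(logits)
        theta = inner_solution(w, centers)
        grad_w = centers @ (2.0 * (theta - target))  # dL/dw_i = 2 (theta - target) . c_i
        jac = np.diag(w) - np.outer(w, w)            # Jacobian of softmax w.r.t. logits
        logits -= lr * (jac @ grad_w)
    return softmax(logits)

# Three hypothetical domains, each pulling the proxy parameters toward a different point.
centers = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]])
target = np.array([0.6, 0.3])  # stands in for what the validation set rewards

w = optimize_mixture(centers, target)
print(w.round(3), outer_loss(w, centers, target))
```

With these toy numbers the exact optimum is about 0.633 / 0.333 / 0.033, and the loop approaches it from the uniform start.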

#pre-training

Kairos: A Native World Model Stack for Physical AI

Hugging Face Daily Papers · 2026-06-16

Kairos is a native world model framework for Physical AI that learns from diverse experiences using a cross-embodiment data curriculum, maintains persistent states with hybrid temporal attention, and supports efficient deployment on server and consumer hardware.

#pre-training

@Hesamation: Google DeepMind pre-training lead explains two skills with massive demand by AI frontier labs: > Kernel Development > L…

X AI KOLs Timeline · 2026-06-15

Google DeepMind's pre-training lead Vlad Feinberg highlights kernel development and low-level performance engineering as high-demand skills for frontier AI labs.

#pre-training

Ryan Peterman (@ryanlpeterman) on X

X AI KOLs Timeline · 2026-06-15

An interview with Google DeepMind's pre-training area lead Vlad Feinberg about landing jobs at frontier AI labs, covering the skills needed, the differences between research and engineering, and scaling laws.

#pre-training

No Resource, No Benchmarks, No Problem? Evaluating and Improving LLMs for Code Generation in No-Resource Languages

Hugging Face Daily Papers · 2026-06-15

This paper tackles code generation for no-resource programming languages by building benchmarks for them and proposing a method that combines continued pre-training with weight-difference transfer to produce specialized instruction-following models at reduced cost.
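Weight-difference transfer is commonly done as task-vector arithmetic: subtract a base model from its instruction-tuned version, then add that difference to the base model after continued pre-training on the new language. A minimal sketch, assuming all three checkpoints share one architecture and are stored as dicts of NumPy arrays; `transfer_instruction_delta` and its `scale` parameter are hypothetical, not the paper's code.

```python
import numpy as np

def transfer_instruction_delta(base, instruct, continued, scale=1.0):
    """Add the (instruct - base) weight difference to a continued-pre-trained model.

    All arguments map parameter names to NumPy arrays of matching shapes
    (a hypothetical checkpoint format used only for illustration).
    """
    if not (base.keys() == instruct.keys() == continued.keys()):
        raise ValueError("checkpoints must have identical parameter names")
    merged = {}
    for name, w_cpt in continued.items():
        delta = instruct[name] - base[name]   # what instruction tuning changed
        merged[name] = w_cpt + scale * delta  # graft that change onto the adapted model
    return merged

# Toy checkpoints with a single 2x2 weight matrix.
base = {"w": np.zeros((2, 2))}
instruct = {"w": np.full((2, 2), 0.5)}
continued = {"w": np.eye(2)}
print(transfer_instruction_delta(base, instruct, continued)["w"])
```

The appeal is cost: the expensive instruction tuning is done once on the base model and reused, rather than repeated for every continued-pre-trained variant.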

#pre-training

The Curse of Depth in Large Language Models

Lobsters Hottest · 2026-06-13

This paper identifies the Curse of Depth in LLMs: deeper layers become ineffective because Pre-Layer Normalization (Pre-LN) causes the output variance to explode with depth. The authors propose LayerNorm Scaling to mitigate this and report consistent improvements in pre-training and fine-tuning across model sizes up to 7B.
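A minimal NumPy sketch of a Pre-LN residual stack with LayerNorm Scaling, assuming the scaling multiplies each layer-normalization output by 1/sqrt(l) for a 1-indexed layer l. The linear sublayer `f` is a hypothetical stand-in for attention or the MLP, and the layer norm omits learnable gain and bias for brevity.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # Normalize over the feature dimension (no learnable gain/bias).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def pre_ln_block(x, sublayer, layer_idx, use_scaling=True):
    """One Pre-LN residual step: x + sublayer(scale * LN(x)).

    layer_idx is 1-based; with use_scaling the LN output is damped by
    1/sqrt(layer_idx), so deeper layers inject smaller updates.
    """
    h = layer_norm(x)
    if use_scaling:
        h = h / np.sqrt(layer_idx)
    return x + sublayer(h)

rng = np.random.default_rng(0)
d = 64
W = rng.normal(scale=1.0 / np.sqrt(d), size=(d, d))
f = lambda h: h @ W  # hypothetical linear sublayer

for scaled in (False, True):
    x = rng.normal(size=(8, d))
    for l in range(1, 33):
        x = pre_ln_block(x, f, l, use_scaling=scaled)
    print("scaled" if scaled else "plain", float(x.var()))
```

In the plain stack every layer adds an update of roughly unit variance, so the residual variance grows about linearly with depth. With the damping, the added variance shrinks like 1/l and the total stays far smaller after 32 layers.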

#pre-training

@_rohit_tiwari_: This 230-page book unlocks the secrets of LLMs. https://drive.google.com/file/d/1ZqV0wByb65_wvzWUbaLw6pCbtXgyXDHG/view……

X AI KOLs Timeline · 2026-06-11

A 230-page book that comprehensively covers LLM concepts including pre-training, fine-tuning, alignment, and prompting techniques.

#pre-training

When Probing Accuracy Saturates, Fragility Resolves: A Complementary Metric for LLM Pre-Training Analysis

arXiv cs.CL · 2026-06-11

This paper introduces 'fragility', a metric complementary to probe accuracy that measures the activation-noise level at which probe accuracy collapses. This enables analysis of how representations evolve during LLM pre-training even after probe accuracy saturates.
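To illustrate how a noise-based metric can keep separating representations once accuracy saturates, here is a minimal sketch. It assumes a fixed binary linear probe, isotropic Gaussian noise added to the activations, and a hypothetical collapse threshold of 60% accuracy; these choices and all function names are illustrative, not the paper's protocol.

```python
import numpy as np

def probe_accuracy(acts, labels, w, b):
    # Binary linear probe: predict class 1 when acts @ w + b > 0.
    preds = (acts @ w + b > 0).astype(int)
    return float((preds == labels).mean())

def fragility(acts, labels, w, b, noise_levels, collapse_acc=0.6, seed=0):
    """Smallest noise std at which probe accuracy drops below collapse_acc.

    Returns inf if accuracy never collapses on the given grid.
    """
    rng = np.random.default_rng(seed)
    for sigma in sorted(noise_levels):
        noisy = acts + rng.normal(scale=sigma, size=acts.shape)
        if probe_accuracy(noisy, labels, w, b) < collapse_acc:
            return float(sigma)
    return float("inf")

def make_dataset(margin, n=2000, dim=8, seed=0):
    # Separable toy activations: the class is encoded along axis 0 with a fixed margin.
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=n)
    acts = np.zeros((n, dim))
    acts[:, 0] = margin * (2 * labels - 1)
    return acts, labels

w, b = np.eye(8)[0], 0.0
levels = np.linspace(0.1, 20.0, 200)
for margin in (0.5, 2.0):
    acts, labels = make_dataset(margin)
    print(margin, probe_accuracy(acts, labels, w, b), fragility(acts, labels, w, b, levels))
```

Both toy representations reach 100% clean probe accuracy, so accuracy alone cannot rank them, but the wider-margin one tolerates roughly four times more noise before collapsing.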

#pre-training

@samsja19: Very exciting work to bridge the gap between RL and mid/pretraining You can learn from your environment beyond the rewa…

X AI KOLs Following · 2026-06-10

A new method called ECHO bridges RL and pre-training by applying next-token prediction to tool-call outputs, so the model learns from the environment beyond reward signals, combining world modeling with agentic actions.
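In the simplest reading of "next-token prediction on tool-call outputs", the environment's responses inside an agent trajectory become prediction targets, giving a training signal beyond the scalar reward. A minimal sketch of such a masked loss, assuming each target token carries a hypothetical role tag; this illustrates the idea and is not ECHO's implementation.

```python
import numpy as np

def log_softmax(logits):
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def tool_output_nll(logits, targets, roles, train_on=("tool",)):
    """Mean next-token NLL over positions whose target token came from a tool.

    logits:  (T, V) model predictions for each position's next token
    targets: (T,) ids of the tokens that actually followed
    roles:   length-T source tags for each target token (hypothetical scheme)
    """
    mask = np.isin(np.asarray(roles), train_on)
    if not mask.any():
        return 0.0
    logp = log_softmax(logits)[np.arange(len(targets)), targets]
    return float(-logp[mask].mean())

# Toy trajectory of 4 tokens over an 8-token vocabulary.
logits = np.zeros((4, 8))
targets = np.array([1, 2, 3, 4])
roles = ["assistant", "tool", "tool", "user"]
print(tool_output_nll(logits, targets, roles))  # only the two tool tokens count
```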

#pre-training

CodeAlchemy: Synthetic Code Rewriting at Scale

arXiv cs.CL · 2026-06-10

CodeAlchemy is a synthetic data generation framework that transforms publicly available code into semantically rich training data using five strategies, producing over 500 billion tokens and enabling small models to outperform much larger ones on code benchmarks.

#pre-training

Repetition Mismatch: Why Data Mixture Experiments Don't Scale and How to Fix Them

arXiv cs.LG · 2026-06-09

The paper identifies repetition mismatch as a primary reason data mixture experiments fail to scale, and proposes a repetition-controlled subsampling procedure that lets small-scale experiments recover near-optimal mixtures using far fewer tokens.
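The summary does not spell out the procedure, so here is one back-of-the-envelope reading. It assumes "repetition" means how many epochs a run makes over each domain's unique tokens. A small run sampling from the full corpus sees almost no repeats, while the large run may loop over scarce domains many times. Shrinking every domain's pool by the budget ratio restores the large-run repetition counts. The domain names, pool sizes, and function names are hypothetical.

```python
def epochs_per_domain(weights, pool_sizes, token_budget):
    """Times each domain's unique tokens are seen: weight * budget / pool size."""
    return {d: weights[d] * token_budget / pool_sizes[d] for d in weights}

def repetition_matched_pools(pool_sizes, large_budget, small_budget):
    """Shrink every domain's unique-token pool by the budget ratio.

    Under the epoch-count assumption above, each domain then repeats exactly
    as often in the small run as in the large run, for any mixture weights.
    """
    ratio = small_budget / large_budget
    return {d: n * ratio for d, n in pool_sizes.items()}

# Hypothetical corpus: unique tokens available per domain.
pools = {"web": 5e12, "code": 4e11, "math": 5e10}
mixture = {"web": 0.6, "code": 0.3, "math": 0.1}
large, small = 1e13, 1e10

target = epochs_per_domain(mixture, pools, large)
naive_small = epochs_per_domain(mixture, pools, small)
matched_small = epochs_per_domain(
    mixture, repetition_matched_pools(pools, large, small), small
)
print(target, naive_small, matched_small, sep="\n")
```

Here the large run repeats the math data 20 times, whereas the naive small run sees only 2% of it once, so its results say little about how repeated math data behaves at scale.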

#pre-training

@Hesamation: 3Blue1Brown’s new video explains why every LLM is actually a compression machine. everyone describes pre-training as “n…

X AI KOLs Timeline · 2026-06-08

3Blue1Brown's new video explains that LLMs are fundamentally compression machines, linking next-token prediction to efficient encoding of human knowledge, which leads to better abstraction and reasoning.
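The compression view rests on a standard information-theory identity. A model that assigns probability p to each token that actually occurs can drive an arithmetic coder that encodes the text in about the sum of -log2 p bits. Lowering next-token loss therefore shortens the code. A minimal sketch with made-up probabilities; the function names are illustrative.

```python
import math

def code_length_bits(token_probs):
    """Ideal code length, in bits, of a sequence under a predictive model.

    token_probs[i] is the probability the model gave to the token that actually
    appeared at position i. An arithmetic coder driven by the same model needs
    less than this total plus 2 bits.
    """
    return sum(-math.log2(p) for p in token_probs)

def nats_to_bits(mean_loss_nats, num_tokens):
    # Convert a mean cross-entropy loss (natural log, as most frameworks report) to total bits.
    return mean_loss_nats * num_tokens / math.log(2)

uniform = [0.25, 0.25, 0.25]  # knows nothing about a 4-token vocabulary
better = [0.5, 0.5, 0.5]      # has learned some structure
print(code_length_bits(uniform), code_length_bits(better))
```

Halving the model's uncertainty about each token saves one bit per token, which is the sense in which better prediction is better compression.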

#pre-training

@FinanceYF5: Aizpurua team at Multiverse Computing, Spain, proposes using small quantum circuits to expand pretrained large models: instead of stacking parameters, they compress complex mathematical relationships into quantum circuits. Adding only about 6,000 parameters to Llama 3.1 8B (less than 0.01% of the original model) reduces perplexity by 1.4%.

X AI KOLs Following · 2026-06-08

The Aizpurua team at Multiverse Computing, Spain, proposes expanding pretrained large models with small quantum circuits. Adding just about 6,000 parameters to Llama 3.1 8B reduces perplexity by 1.4%, demonstrating the feasibility of quantum-circuit-assisted large model scaling.

#pre-training

@Potatoloogs: Cursor trains Composer 2: Pre-training lets the model "learn knowledge", RL lets the model know "who it is" a) Why Cursor trains its own models Think of a model like a hard drive—it can only store a limited amount of information. Cursor cares about only one thing: software engineering, and only inside Cursor...

X AI KOLs Timeline · 2026-06-05

A detailed walkthrough of how Cursor trained Composer 2: starting from Kimi 2.5 as the base, the model learns code knowledge through large-scale intermediate (mid-)training, then undergoes large-scale RL to learn to write correct code in real environments, and uses self-summarization to handle long contexts.

#pre-training

Predict and Reconstruct: Joint Objectives for Self-Supervised Language Representation Learning

arXiv cs.CL · 2026-06-05

The paper proposes a hybrid pre-training objective combining JEPA latent-space prediction with MLM reconstruction for language models, showing improved embedding uniformity and semantic-lexical balance.

#pre-training

@nrehiew_: For the visual learners

X AI KOLs Timeline · 2026-06-05

A thread reviewing the paper 'Pretraining Large Language Models with NVFP4' and discussing NVFP4 pre-training, especially for NVIDIA Blackwell.
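To make the format concrete: NVFP4 is generally described as storing each value as a 4-bit E2M1 float (magnitudes 0, 0.5, 1, 1.5, 2, 3, 4, 6) with a shared scale factor for every block of 16 values. The sketch below simulates that quantize-dequantize round trip in NumPy. It is a simplification, not NVIDIA's kernel: block scales stay in float32 instead of FP8 (E4M3), the per-tensor scale is omitted, and ties round toward the smaller magnitude rather than to nearest-even. `fake_quant_nvfp4` is a hypothetical name.

```python
import numpy as np

# Magnitudes representable by FP4 E2M1 (the sign is handled separately).
E2M1_GRID = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])

def fake_quant_nvfp4(x, block_size=16):
    """Quantize-dequantize a 1-D array in blocks that each share one scale."""
    x = np.asarray(x, dtype=np.float32)
    if x.size % block_size:
        raise ValueError("length must be a multiple of block_size")
    blocks = x.reshape(-1, block_size)
    amax = np.abs(blocks).max(axis=1, keepdims=True)
    scale = np.where(amax > 0, amax / E2M1_GRID[-1], 1.0)  # map each block's max to 6.0
    scaled = np.abs(blocks) / scale
    # Round each magnitude to the nearest grid point (ties go to the smaller value).
    idx = np.abs(scaled[..., None] - E2M1_GRID).argmin(axis=-1)
    q = np.sign(blocks) * E2M1_GRID[idx]
    return (q * scale).reshape(x.shape)

rng = np.random.default_rng(0)
w = rng.normal(size=64).astype(np.float32)
wq = fake_quant_nvfp4(w)
print(np.abs(w - wq).max())
```

The small block size is the point of the design: an outlier only inflates the scale of its own 16 neighbors instead of the whole tensor.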

#pre-training

RL Excursions during Pre-Training: Re-examining Policy Optimization for LLM training

arXiv cs.LG · 2026-06-04

Harvard researchers challenge the standard LLM training pipeline by showing that RL can be applied effectively during pre-training rather than only after SFT. They find that data composition matters more than model scale, and propose averaging the RL and SFT objectives in parallel, which outperforms sequential approaches while preserving general capabilities.
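One plausible way to write the sequential-versus-parallel contrast, assuming "averaging the objectives" means a weighted sum of the two losses optimized jointly. The mixing weight $\lambda$, learning rate $\eta$, and switch-over step $T_0$ are hypothetical symbols; the summary does not specify the weighting.

```latex
\begin{aligned}
\text{sequential:}\quad & \theta_{t+1} = \theta_t - \eta\,\nabla\mathcal{L}_{\mathrm{SFT}}(\theta_t) \quad (t < T_0),
  \qquad \theta_{t+1} = \theta_t - \eta\,\nabla\mathcal{L}_{\mathrm{RL}}(\theta_t) \quad (t \ge T_0) \\
\text{parallel:}\quad & \theta_{t+1} = \theta_t - \eta\,\big[\lambda\,\nabla\mathcal{L}_{\mathrm{SFT}}(\theta_t) + (1-\lambda)\,\nabla\mathcal{L}_{\mathrm{RL}}(\theta_t)\big], \qquad \lambda \in [0, 1]
\end{aligned}
```

In the parallel form the SFT term stays active throughout training, which is one intuition for why general capabilities are better preserved than when RL runs alone after SFT.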

#pre-training

The Word and the Way: Strategies for Domain-Specific BERT Pre-Training in German Medical NLP

arXiv cs.CL · 2026-06-03

This paper introduces ChristBERT, a family of domain-specific RoBERTa-based language models for German clinical NLP, and evaluates three domain adaptation strategies (continued pre-training, pre-training from scratch, and vocabulary adaptation) on medical named entity recognition and text classification tasks, achieving state-of-the-art results.

#pre-training

Regret Pre-training: Bridging Prior and Posterior Views for Enhanced Knowledge Grounding

arXiv cs.CL · 2026-06-03

This paper introduces Regret Pre-training, a self-supervised framework that uses a dual-view architecture to incorporate future context into causal language model training, improving performance on downstream tasks by up to 18 percentage points without adding parameters.

#pre-training

@NielsRogge: What is mid-training? The stage between pre-training and post-training A base model is continued on a smaller, curated …

X AI KOLs Timeline · 2026-06-02

Explains mid-training as a stage between pre-training and post-training, where a base model is continued on curated data to strengthen specific capabilities before instruction tuning.
