@songhan_mit: Explore lightening OPD for efficient LLM post training:
Summary
The article introduces a method for making OPD lighter, with the goal of more efficient post-training of large language models.
Similar Articles
@_rohit_tiwari_: This 115-page book unlocks the secrets of LLM fine tuning. https://drive.google.com/file/d/1cS5sWZw9XUDRI4uRh02-28Xq4-P…
A comprehensive 115-page guide to fine-tuning large language models, covering theory and practice.
TALAN: Task-Aligned Latent Adaptation Networks for Targeted Post-Training of Large Language Models
TALAN introduces a sequence-conditioned latent side path for targeted post-training of large language models, achieving significant improvements on STEM/code benchmarks with minimal overhead.
Data Mixing for Large Language Models Pretraining: A Survey and Outlook
This paper presents a comprehensive survey of data mixing methods for LLM pretraining, formalizing the problem as bilevel optimization and introducing a taxonomy that distinguishes static (rule-based and learning-based) from dynamic (adaptive and externally guided) mixing approaches. The authors analyze trade-offs, identify cross-cutting challenges, and outline future research directions including finer-grained domain partitioning and pipeline-aware designs.
LiFT: Does Instruction Fine-Tuning Improve In-Context Learning for Longitudinal Modelling by Large Language Models?
LiFT is a longitudinal instruction fine-tuning framework that unifies diverse temporal NLP tasks under a shared instruction schema with curriculum-based training. Evaluated across OLMo, LLaMA, and Qwen models, LiFT consistently outperforms base-model in-context learning, especially on out-of-distribution data and rare change events.
@mdancho84: This 277-page PDF unlocks the secrets of Large Language Models. Here's what's inside:
A 277-page PDF guide to large language models, shared in a Twitter thread by Matt Dancho.