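@GitHub_Daily: To study models in depth, you can't stop at the application layer; you need to understand how the underlying systems are trained and optimized. I came across LLMSys-PaperList, a carefully curated collection of papers on large-model systems. It runs from 2022 through the latest top-conference papers of 2026 and is organized by direction: training, inference, multimodal…

X AI KOLs Timeline tool

Summary

A carefully curated collection of papers on large-model systems, covering training, inference, multimodal, and other directions. It is continuously updated and also includes technical reports, frameworks, and courses, making it a useful reference for researchers and developers.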


Cached: 2026/06/12 08:58

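To study models in depth, you can't stop at the application layer; you need to understand how the underlying systems are trained and optimized.

I came across LLMSys-PaperList, a carefully curated collection of papers on large-model systems.

It runs from 2022 through the latest top-conference papers of 2026, organized by direction: training, inference, multimodal, and more.

Each paper is annotated with its source and publication venue, which makes the list a continuously updated map of the literature.

GitHub: http://github.com/AmberLJC/LLMSys-PaperList…

Beyond academic papers, it also collects technical reports from major companies, open-source training and inference frameworks, related courses, and technical documentation for mainstream models such as DeepSeek, Llama, and Qwen.

If you are doing research or development on large models, this list is worth bookmarking; it saves a lot of time hunting for papers.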


AmberLJC/LLMSys-PaperList

Source: https://github.com/AmberLJC/LLMSys-PaperList

Awesome LLM Systems Papers

A curated list of academic papers, articles, tutorials, slides, and projects related to Large Language Model systems. Star this repository to keep abreast of the latest developments in this booming research field.

Table of Contents

LLM Systems

Training

Pre-training

Before 2024
2024
2025

2026

Systems for Post-training / RLHF

Before 2024
2024
  • Ymir: A Scheduler for Foundation Model Fine-tuning Workloads in Datacenters | ICS’24
  • HybridFlow: A Flexible and Efficient RLHF Framework
  • ReaLHF: Optimized RLHF Training for Large Language Models through Parameter Reallocation
  • NeMo-Aligner: Scalable Toolkit for Efficient Model Alignment | Nvidia
2025
  • RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion | NSDI’25
  • Systems Opportunities for LLM Fine-Tuning using Reinforcement Learning
  • AReaL: A Large-Scale Asynchronous Reinforcement Learning System for Language Reasoning | Code | Ant
  • StreamRL: Scalable, Heterogeneous, and Elastic RL for LLMs with Disaggregated Stream Generation
  • RL-Factory: Train your Agent model via our easy and efficient framework
  • PLoRA: Efficient LoRA Hyperparameter Tuning for Large Models
  • History Rhymes: Accelerating LLM Reinforcement Learning with RhymeRL
  • APRIL: Active Partial Rollouts in Reinforcement Learning to tame long-tail generation
  • Seer: Online Context Learning for Fast Synchronous LLM Reinforcement Learning
  • SkyRL-Agent: Efficient RL Training for Multi-turn LLM Agent

2026

Fault Tolerance / Straggler Mitigation

Before 2024
  • Oobleck: Resilient Distributed Training of Large Models Using Pipeline Templates | SOSP’23
  • GEMINI: Fast Failure Recovery in Distributed Training with In-Memory Checkpoints | SOSP’23
2024
2025

2026

Serving

LLM serving

Before 2024
2024
2025

2026

Agent Systems

2024
2025

2026

Serving at the edge

Before 2024
2024
2025

2026

System Efficiency Optimization - Model Co-design

Before 2024
2024
2025

2026

Multi-Modal Training Systems

Multi-Modal Serving Systems

LLM for Systems

Industrial LLM Technical Report

Before 2024
2024
2025
2026

ML Conferences

NeurIPS 2025

A curated collection of NeurIPS 2025 papers focused on efficient systems for generative AI models.

See the full NeurIPS 2025 collection for detailed categorization and paper summaries.

LLM Frameworks

Training

  • DeepSpeed: a deep learning optimization library that makes distributed training and inference easy, efficient, and effective | Microsoft

  • Accelerate | Hugging Face

  • LLaVA

  • Megatron | Nvidia

  • NeMo | Nvidia

  • torchtitan | PyTorch

  • torchtune: PyTorch-native fine-tuning library for LLMs with minimal dependencies | PyTorch

  • veScale | ByteDance

  • DeepSeek Open Infra

  • VeOmni: Scaling any Modality Model Training

  • Cornstarch: Distributed Multimodal Training Must Be Multimodality-Aware | UMich

  • GPT-NeoX: Model-parallel autoregressive LLM training combining Megatron and DeepSpeed | EleutherAI

  • nanotron: Minimalistic 3D-parallel (tensor/pipeline/data) LLM training framework | Hugging Face

  • litgpt: 20+ LLM implementations with pre-training and fine-tuning recipes | Lightning AI

  • LLaMA-Factory: Unified efficient fine-tuning of 100+ LLMs and VLMs via LoRA, full fine-tuning, and RL methods | ACL’24

  • Unsloth: 2-5x faster LLM fine-tuning with ~80% less memory via custom Triton/CUDA kernels

  • Post-Training

    • PEFT: Parameter-efficient fine-tuning library (LoRA, QLoRA, Prompt Tuning, IA3, etc.) | Hugging Face
    • TRL: Transformer Reinforcement Learning
    • OpenRLHF: An Easy-to-use, Scalable and High-performance RLHF Framework based on Ray
    • VeRL: Volcano Engine Reinforcement Learning for LLMs
    • rLLM: Reinforcement Learning for Language Agents
    • SkyRL: A Modular Full-stack RL Library for LLMs
    • AReaL: Distributed RL System for LLM Reasoning
    • ROLL: Reinforcement Learning Optimization for Large-Scale Learning
    • slime: an LLM post-training framework aiming for RL scaling
    • RAGEN: Training Agents by Reinforcing Reasoning
    • Agent Lightning: Train ANY AI Agents with Reinforcement Learning
    • LMFlow: Extensible toolkit for fine-tuning and inference of large foundation models
    • NeMo-Aligner: Scalable alignment toolkit for SFT, PPO, DPO, and SteerLM on NeMo | Nvidia

Serving

  • llama.cpp: LLM inference in C/C++ with GGUF quantization; supports CPU, Metal, CUDA, and a wide range of other hardware
  • Ollama: Local LLM serving with model management and OpenAI-compatible API
  • TensorRT-LLM | Nvidia
  • Triton Inference Server: Production multi-framework model serving platform with dynamic batching | Nvidia
  • Ray-LLM | Ray
  • TGI | Hugging Face
  • vLLM | UCB
  • SGLang | UCB
  • LMDeploy: LLM compression, deployment, and serving toolkit with TurboMind persistent batching engine | InternLM
  • LightLLM: Lightweight Python LLM serving with tri-process architecture decoupling prefill and decode
  • DeepSpeed-MII: Low-latency, high-throughput LLM inference powered by DeepSpeed | Microsoft
  • CTranslate2: Fast C++/Python inference engine for Transformer models with int8/int16 quantization | OpenNMT
  • Petals: Distributed LLM inference and fine-tuning across volunteer GPUs in a BitTorrent-like fashion | ACL’23
  • KV Transformers
  • Dynamo: A Datacenter Scale Distributed Inference Serving Framework | Nvidia
  • LMCache: Supercharge Your LLM with the Fastest KV Cache Layer
  • aibrix: Cost-efficient pluggable infrastructure for GenAI inference (KV cache routing, autoscaling, disaggregated prefill) | vLLM Project
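
Most entries in the lists above follow a loose `Name: description | tag | tag` pattern, where the trailing tags are a venue, an organization, or a link label. Below is a minimal sketch of turning one such line into a structured record, assuming that pattern holds. The `Entry` class and `parse_entry` function are hypothetical helpers for illustration and are not part of the repository.

```python
from dataclasses import dataclass, field


@dataclass
class Entry:
    """One list item: a name, an optional description, and trailing tags."""
    name: str
    description: str = ""
    tags: list[str] = field(default_factory=list)


def parse_entry(line: str) -> Entry:
    """Parse a bullet line of the assumed form 'Name: description | tag | tag'."""
    # Drop surrounding whitespace and the leading bullet marker, if any.
    text = line.strip().lstrip("•").strip()
    # Everything after the first " | " is treated as a tag (venue, org, link label).
    head, *tags = [part.strip() for part in text.split(" | ")]
    # The first ": " separates the name from the optional description.
    name, _, description = head.partition(": ")
    return Entry(name=name, description=description, tags=tags)


if __name__ == "__main__":
    print(parse_entry("  • vLLM | UCB"))
```

An entry without a description, such as `vLLM | UCB`, parses to a name plus one tag and an empty description. Lines that break the assumed pattern, for example a title that itself contains ` | `, would need special handling.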

ML Systems

Survey Paper

LLM Benchmark / Leaderboard / Traces

Related ML Readings

MLSys Courses

  • Systems for Machine Learning | Stanford (https://cs229s.stanford.edu/fall2023/)
  • Systems for Generative AI | UMich (https://github.com/mosharaf/eecs598/tree/w24-genai)
  • Systems for AI - LLMs | GT (https://cs8803-sp24.anand-iyer.com/)

Other Reading
