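@li9292: How to join OpenAI? Just master the following: 1. Stanford's "Language Modeling from Scratch" course: http://cs336.stanford.edu/spring2025/ 2. After building breadth, she went deep on one concept at a time, using blogs, papers, conversations with ChatGP…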


Summary

The tweet recommends Stanford's CS336 course and a set of learning resources as a preparation path for joining OpenAI.


Cached: 2026/06/23 06:01

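How to join OpenAI? Just master the following:

  1. Stanford's "Language Modeling from Scratch" course: http://cs336.stanford.edu/spring2025/

  2. After building breadth, she went deep on one concept at a time, using blogs, papers, conversations with ChatGPT and Claude, and from-scratch implementations.

  3. Questions about implementing or debugging a Transformer come up often in interviews. Turn it into muscle memory: http://github.com/stanford-cs336…

  4. Grind LeetCode: http://leetcode.com/studyplan/leet…

  5. Other learning resources she shared:

     a. Self-attention & Transformers: http://web.stanford.edu/class/cs224n/r…

     b. The Illustrated GPT-2: http://jalammar.github.io/illustrated-gp…

     c. Backpropagation: http://cs231n.github.io/optimization-2/

     d. A primer on policy gradients for language models: http://ivison.id.au/2026/02/09/pol…

     e. A lightweight guide to understanding GRPO and the principles of reinforcement learning: http://gitlostmurali.com/blog/grpo-intr…

     f. How to Scale Your Model: http://jax-ml.github.io/scaling-book/

Is this something I get to see for free?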
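
Item 3 above singles out implementing and debugging a Transformer. As a rough illustration of the kind of component involved, here is a minimal sketch of single-head causal self-attention in plain Python. It is not taken from the CS336 assignment repository: the function names `softmax` and `causal_self_attention` and the list-of-lists representation are assumptions chosen for readability, and a practical implementation would use batched tensors instead.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def causal_self_attention(q, k, v):
    """Single-head causal attention (illustrative, hypothetical helper).

    q, k, v are lists of T vectors, each a list of floats.
    Position t attends only to positions 0..t.
    """
    d = len(q[0])
    scale = 1.0 / math.sqrt(d)
    out = []
    for t in range(len(q)):
        # Scaled dot products against keys 0..t only: this is the causal mask.
        scores = [
            scale * sum(qi * ki for qi, ki in zip(q[t], k[s]))
            for s in range(t + 1)
        ]
        weights = softmax(scores)
        # Weighted sum of the visible value vectors.
        out.append([
            sum(w * v[s][j] for s, w in enumerate(weights))
            for j in range(len(v[0]))
        ])
    return out


# Toy inputs (made up for this example).
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 0.0], [0.0, 1.0]]
v = [[1.0, 2.0], [3.0, 4.0]]
out = causal_self_attention(q, k, v)
```

The first position can only see itself, so its output equals `v[0]`; the second position mixes `v[0]` and `v[1]`, weighted toward the key it matches. A common debugging check is exactly this one: confirm that changing a later token never changes an earlier output.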


Stanford CS336 | Language Modeling from Scratch (Spring 2025 Archive)

Source: https://cs336.stanford.edu/spring2025/

This is the archived website for the Spring 2025 offering of CS336. The latest offering is here.

Content

What is this course about?

Language models serve as the cornerstone of modern natural language processing (NLP) applications and open up a new paradigm of having a single general purpose system address a range of downstream tasks. As the fields of artificial intelligence (AI), machine learning (ML), and NLP continue to grow, possessing a deep understanding of language models becomes essential for scientists and engineers alike. This course is designed to provide students with a comprehensive understanding of language models by walking them through the entire process of developing their own. Drawing inspiration from operating systems courses that create an entire operating system from scratch, we will lead students through every aspect of language model creation, including data collection and cleaning for pre-training, transformer model construction, model training, and evaluation before deployment.

Prerequisites

  • Proficiency in Python: The majority of class assignments will be in Python. Unlike most other AI classes, students will be given minimal scaffolding. The amount of code you will write will be at least an order of magnitude greater than for other classes. Therefore, being proficient in Python and software engineering is paramount.
  • Experience with deep learning and systems optimization: A significant part of the course will involve making neural language models run quickly and efficiently on GPUs across multiple machines. We expect students to have a strong familiarity with PyTorch and know basic systems concepts like the memory hierarchy.
  • College Calculus, Linear Algebra (e.g. MATH 51, CME 100): You should be comfortable understanding matrix/vector notation and operations.
  • Basic Probability and Statistics (e.g. CS 109 or equivalent): You should know the basics of probabilities, Gaussian distributions, mean, standard deviation, etc.
  • Machine Learning (e.g. CS221, CS229, CS230, CS124, CS224N): You should be comfortable with the basics of machine learning and deep learning.

Note that this is a 5-unit class. This is a very implementation-heavy class, so please allocate enough time for it.

Coursework

Assignments

  • Assignment 1: Basics [leaderboard]
      - Implement all of the components (tokenizer, model architecture, optimizer) necessary to train a standard Transformer language model.
      - Train a minimal language model.
  • Assignment 2: Systems [leaderboard]
      - Profile and benchmark the model and layers from Assignment 1 using advanced tools, optimize Attention with your own Triton implementation of FlashAttention2.
      - Build a memory-efficient, distributed version of the Assignment 1 model training code.
  • Assignment 3: Scaling
      - Understand the function of each component of the Transformer.
      - Query a training API to fit a scaling law to project model scaling.
  • Assignment 4: Data [leaderboard]
      - Convert raw Common Crawl dumps into usable pretraining data.
      - Perform filtering and deduplication to improve model performance.
  • Assignment 5: Alignment and Reasoning RL
      - Apply supervised finetuning and reinforcement learning to train LMs to reason when solving math problems.
      - Optional Part 2: implement and apply safety alignment methods such as DPO.
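
Assignment 3 mentions fitting a scaling law to project model scaling. The following is a minimal sketch of what such a fit can look like, assuming a pure power law L(C) = a * C^(-b) between compute C and loss L, fitted by least squares in log-log space. The functional form, the function names `fit_power_law` and `predict_loss`, and the synthetic data points are all assumptions made for illustration; the course's training API and its actual fitting procedure are not shown here.

```python
import math


def fit_power_law(compute, loss):
    """Fit log(loss) = log(a) - b * log(compute) by ordinary least squares.

    Returns (a, b) for the assumed model loss = a * compute ** (-b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(y) for y in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope of the best-fit line through the log-log points.
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


def predict_loss(a, b, compute):
    """Extrapolate the fitted power law to a new compute budget."""
    return a * compute ** (-b)


# Synthetic measurements generated from a known power law (not real results).
true_a, true_b = 10.0, 0.1
compute = [1e15, 1e16, 1e17, 1e18]
loss = [true_a * c ** (-true_b) for c in compute]

a, b = fit_power_law(compute, loss)
projected = predict_loss(a, b, 1e20)
```

Because the data here is noise-free, the fit recovers the generating parameters. With real training runs the points scatter around the line, and published scaling-law fits often add terms such as an irreducible loss, which this sketch omits.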

All (currently tentative) deadlines are listed in the schedule.

GPU compute for self-study

If you are following along at home, you can access GPU compute from a cloud provider to complete the assignments. Here are a few options (prices for a single H100 80GB GPU on June 6, 2025):

For convenience and to save money, we recommend debugging correctness of your implementation on CPU first and then using GPU(s) (with the count recommended in the assignments) for completing training runs (A1, A4, A5) or benchmarking GPU operations (A2).

Honor code

Like all other classes at Stanford, we take the student Honor Code seriously. Please respect the following policies:

  • Collaboration: Study groups are allowed, but students must understand and complete their own assignments, and hand in one assignment per student. If you worked in a group, please put the names of the members of your study group at the top of your assignment. Please ask if you have any questions about the collaboration policy.
  • AI tools: Prompting LLMs such as ChatGPT is permitted for low-level programming questions or high-level conceptual questions about language models, but using them directly to solve the problem is prohibited. We strongly encourage you to disable AI autocomplete (e.g., Cursor Tab, GitHub Copilot) in your IDE when completing assignments (though non-AI autocomplete, e.g., autocompleting function names, is totally fine). We have found that AI autocomplete makes it much harder to engage deeply with the content.
  • Existing code: Implementations for many of the things you will implement exist online. The handouts we’ll give will be self-contained, so that you will not need to consult third-party code for producing your own implementation. Thus, you should not look at any existing code unless otherwise specified in the handouts.

Submitting coursework

  • All coursework is submitted via Gradescope by the deadline. Do not submit your coursework via email.
  • If anything goes wrong, please ask a question in Slack or contact a course assistant.
  • You can submit as many times as you’d like until the deadline: we will only grade the last submission.
  • Partial work is better than not submitting any work.

Late days

  • Each student has 6 late days to use. A late day extends the deadline by 24 hours.
  • You can use up to 3 late days per assignment.

Regrade requests

If you believe that the course staff made an objective error in grading, you may submit a regrade request on Gradescope within 3 days after the grades are released.

Sponsor

We would like to thank Together AI for sponsoring the compute for this class.

Alisa Liu (@alisawuffles): I’m joining OpenAI next week!🥹 The job search turned out to be really challenging but also super rewarding, so I wrote a small blog to share what I learned along the way and hopefully make the process a little less mysterious for the next person.
