@yihong0618: At noon today, I read through a senior's articles in order. Four years ago, he was still learning step by step from Andrew Ng's course. At the end of one article, he wrote this passage. I never expected that four years later, he would truly become a top researcher publishing papers in top journals. Quite emotional. https://zhouyifan.net/2022/05/31/20220531-styletransfer/…
Summary
The author reflects on a senior's journey from following Andrew Ng's courses four years ago to publishing papers in top journals today, and cites a blog post explaining style transfer with a PyTorch implementation.
Cached at: 06/12/26, 12:58 PM
I was reading my senior’s articles in order this afternoon. Four years ago, he was still learning step by step from Andrew Ng’s course. At the end of one article, he wrote this paragraph. I didn’t expect that four years later, he really became a top researcher publishing papers in top journals. I feel a bit emotional. https://zhouyifan.net/2022/05/31/20220531-styletransfer/…
Neural Style Transfer – Classic Paper Explanation and PyTorch Implementation
Source: https://zhouyifan.net/2022/05/31/20220531-styletransfer/
Today I spent half an hour understanding the paper “Image Style Transfer Using Convolutional Neural Networks” by Leon Gatys et al., then another half an hour understanding its PyTorch implementation, and finally spent half an afternoon implementing this work myself. Now it’s evening, and I’ll share it with everyone.
This article will introduce the principles of style transfer while showing part of the code. The complete code will be given in the appendix.
CNN-Based Image Style Transfer
What is Style Transfer?
We all know that every painting can be seen as a combination of “content” and “style” (painting style).
For example, the famous painting “The Scream” depicts a person with an open mouth, painted in an expressionist style.
Van Gogh’s “Starry Night” is a night scene painted in a very personal style.
As another example, here is a painting of a girl in anime style.
Finally, here is a handsome guy in a realistic photo.
The so-called style transfer is to embed the style of one image into the content of another image, forming a new image:
As shown in the figure above, the top-left A is a real photo, and B, C, D are new images formed by transferring the styles of other paintings into the original image.
What technology can achieve such a magical “style transfer” effect? Don’t worry, let’s start with a few simple examples.
Copying an Image
What would you do if you wanted to copy an image?
On Windows, you can open the Paint software, click the selection box in the top-left corner, and select the area you want to copy. Ctrl+C, Ctrl+V easily completes the image copy.
But I think this method is too simple and doesn’t reflect the wisdom of us who have studied math. I plan to use a more advanced method.
I treat the task of copying an image as a mathematical optimization problem. Given a source image S, I want to generate a target image T that minimizes the mean squared error MSE(S, T). Thus, the problem of generating an image becomes an optimization problem of finding the optimal T.
For this problem, we can randomly initialize an image T, then perform gradient descent on the optimization objective above. After a few iterations, we can find the optimal T – a target image identical to the source image S.
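Before the PyTorch version, the idea can be sketched without any dependencies in plain Python. This is only a toy under stated assumptions: the four-pixel "image", the learning rate of 0.5, and the step count are illustrative values, the MSE gradient is written out by hand, and plain gradient descent stands in for a real optimizer.

```python
import random


def mse(a, b):
    """Mean squared error between two flat lists of pixel values."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def mse_grad(t, s):
    """Gradient of mse(t, s) with respect to t: 2 * (t_i - s_i) / N."""
    n = len(t)
    return [2 * (ti - si) / n for ti, si in zip(t, s)]


random.seed(0)
source = [0.2, 0.8, 0.5, 0.1]                  # toy "source image" S (assumed values)
target = [random.gauss(0, 1) for _ in source]  # randomly initialised T

lr = 0.5  # assumed learning rate
for _ in range(100):
    grad = mse_grad(target, source)
    target = [ti - lr * gi for ti, gi in zip(target, grad)]

final_loss = mse(target, source)
print(final_loss)
```

With these values each step shrinks the per-pixel error by a constant factor, so T ends up numerically equal to S.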
This logic can be implemented in PyTorch:
Assume we read an image source_img using the read_image function and preprocess it into the format [1, 3, H, W].

    source_img = read_image('dldemos/StyleTransfer/picasso.jpg')

We can randomly initialize an image of size [1, 3, H, W]. Since this image is our optimization target, we set input_img.requires_grad_(True) so that PyTorch can automatically optimize it.

    input_img = torch.randn(1, 3, *img_size)
    input_img.requires_grad_(True)

Then, we use PyTorch’s optimizer LBFGS and pass the parameters to be optimized according to the optimizer’s requirements (this is the optimizer recommended by the paper’s authors).

    optimizer = optim.LBFGS([input_img])

After preparing all variables, we perform gradient descent:

    steps = 0
    while steps <= 10:

        def closure():
            global steps
            optimizer.zero_grad()
            loss = F.mse_loss(input_img, source_img)
            loss.backward()
            steps += 1
            if steps % 5 == 0:
                print(f"Step {steps}:")
                print(f"Loss: {loss}")

            return loss

        optimizer.step(closure)

One thing to note about this code: Due to the special nature of LBFGS, we need to encapsulate the gradient descent execution into a closure (a temporarily defined function) and pass this closure to optimizer.step.
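To see why such an optimizer wants a closure, here is a toy sketch in plain Python. It is not L-BFGS: BacktrackingGD is a hypothetical class that tries several step sizes per step, and because there is no autograd here, the closure returns the gradient explicitly along with the loss. The shared point is that one call to step may need to re-evaluate the loss several times, which is why the evaluation is passed in as a function.

```python
class BacktrackingGD:
    """Toy optimizer (NOT L-BFGS): halves the step size until the loss drops."""

    def __init__(self, params, lr=1.0):
        self.params = params  # plain list of floats, updated in place
        self.lr = lr
        self.evals = 0        # how many times the closure was called

    def step(self, closure):
        loss, grad = closure()
        self.evals += 1
        start = list(self.params)
        lr = self.lr
        while True:
            for i, g in enumerate(grad):
                self.params[i] = start[i] - lr * g
            new_loss, _ = closure()  # re-evaluate at the trial point
            self.evals += 1
            if new_loss < loss or lr < 1e-12:
                return new_loss
            lr *= 0.5


goal = [1.0, -2.0]   # assumed toy target
params = [0.0, 0.0]


def closure():
    # Returns (loss, gradient) of the squared distance to `goal`.
    loss = sum((p - g) ** 2 for p, g in zip(params, goal))
    grad = [2 * (p - g) for p, g in zip(params, goal)]
    return loss, grad


opt = BacktrackingGD(params, lr=1.0)
last_loss = opt.step(closure)
print(params, last_loss, opt.evals)
```

A single step here calls the closure three times: once at the start and once for each trial step size.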
After running the gradient descent code above, this optimization problem will quickly converge. After optimization, assuming we have written a function save_image for post-processing images, we can save it like this:

    save_image(input_img, 'work_dirs/output.jpg')

Theoretically, this image will be identical to our source image source_img.
At this point, you must be full of doubts: Why use such a complicated method to copy an image? It’s like telling you x=2 and using an optimization algorithm to find y that exactly equals x. Wouldn’t it be easier to just set y=2? Don’t worry, let’s continue.
Fitting the Output of a Neural Network
The process of solving for the target image T just now can actually be seen as fitting a certain feature of T to the feature of S. Only, we used the most basic feature: pixel values. What would happen if we fit more specific features?
Gatys et al. found that if we use the convolutional outputs of different layers of a pre-trained VGG model as fitting features, we can fit different images:
If you are not familiar with the pre-trained VGG model, don’t worry. VGG is a neural network model with many convolutional layers. The pre-trained VGG model is a VGG model trained on an image classification dataset. After pre-training, each convolutional layer of VGG can extract some features of the image, even though these features may not be understandable to humans.
In the figure above, the images further to the right are recovered by fitting features from deeper convolutional layers. From these recovery results, we can see that deeper features only preserve the content (shape) of the image, but cannot preserve the texture (sky color, house color).
At this point, you might be wondering: How exactly are these images fitted? Let’s take a detailed look at this image generation process, just like before.
Suppose we want to generate image c in the figure above, the fitting result of the third convolutional layer. We have obtained the model model_conv123, which contains the first three convolutional layers of the pre-trained VGG. We can set the following optimization objective:

    source_feature = model_conv123(source_img)
    input_feature = model_conv123(input_img)
    # minimize MSE(source_feature, input_feature)

In implementation, we just need to slightly modify the code at the beginning.
First, we precompute the features of the source image. Note that we use source_feature.detach() to detach source_feature from the computation graph, preventing the source image from being automatically updated by PyTorch.

    source_img = read_image('dldemos/StyleTransfer/picasso.jpg')
    source_feature = model_conv123(source_img).detach()

Then, we perform gradient descent in a similar way:

    steps = 0
    while steps <= 50:

        def closure():
            global steps
            optimizer.zero_grad()
            input_feature = model_conv123(input_img)
            loss = F.mse_loss(input_feature, source_feature)
            loss.backward()
            steps += 1
            if steps % 5 == 0:
                print(f"Step {steps}:")
                print(f"Loss: {loss}")

            return loss

        optimizer.step(closure)

See? The method of generating a target image using an optimization problem is not stupid; it was just overkill at the beginning. With this method, we can generate a target image that fits the deep features of the source image in the neural network. So how can we use this method to achieve style transfer?
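The effect of fitting features instead of pixels can be illustrated with a toy in plain Python. Everything here is an assumption made for illustration: the features function (averaging neighbouring pixel pairs) is a hypothetical stand-in for model_conv123, the image has four pixels, and the gradient is derived by hand.

```python
def features(img):
    """Toy stand-in for a conv model: average each pair of neighbouring pixels."""
    return [(img[i] + img[i + 1]) / 2 for i in range(0, len(img), 2)]


def mse(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


source = [0.9, 0.1, 0.3, 0.7]        # assumed toy source image
source_feature = features(source)    # precomputed once, like .detach()
target = [0.0, 0.0, 0.0, 0.0]        # fixed start, so the result is reproducible

lr = 1.0  # assumed learning rate
for _ in range(60):
    diff = [f - s for f, s in zip(features(target), source_feature)]
    m = len(diff)
    # Chain rule: d(mse)/d(feature_k) = 2 * diff_k / m, and each pixel
    # contributes with weight 1/2 to the feature of its pair.
    grad = [2 * diff[i // 2] / m * 0.5 for i in range(len(target))]
    target = [t - lr * g for t, g in zip(target, grad)]

feature_loss = mse(features(target), source_feature)
pixel_loss = mse(target, source)
print(target, feature_loss, pixel_loss)
```

The feature loss goes to zero while the pixel loss does not: the fitted image matches what the feature extractor sees, but the detail the extractor throws away is never recovered.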
Style + Content = Style Transfer
Gatys et al. discovered that not only convolution outputs can be used as fitting features, but also some other intermediate results of VGG. Inspired by previous work on texture generation using CNNs [2], they found that using the Gram matrix of convolution results as a fitting feature can produce a different image generation effect:
In the figure above, panels a-e on the right are images fitted to the left image using the Gram matrices of convolution results from different VGG layers as fitting features. It can be seen that when using this type of feature for fitting, the generated image loses the content of the original image (e.g., the positions of stars and objects completely change), but preserves the overall style of the image.
Let’s briefly explain how the Gram matrix is calculated. In general, a Gram matrix is defined on two feature matrices F_1, F_2. Each feature matrix F is obtained by reshaping the convolution output tensor F_conv of a VGG layer (shape: [n, h, w], where n is the number of channels) into a matrix F (shape: [n, h * w]). The Gram matrix is the product of F_1 with the transpose of F_2, i.e., a matrix whose entries are the inner products (similarities) between each channel’s feature vector of F_1 and each channel’s feature vector of F_2. Here we take F_1 = F_2, i.e., we compute the Gram matrix of a convolution feature with itself. This logic is implemented in code as follows (in the code, n is the batch size and c is the number of channels):

    def gram(x: torch.Tensor):
        n, c, h, w = x.shape

        features = x.reshape(n * c, h * w)
        features = torch.mm(features, features.T)
        return features

The Gram matrix represents the similarity between channels, independent of position. Therefore, the Gram matrix is a measure with spatial invariance, capable of describing the properties of the entire image, suitable for fitting style. In contrast, when we fitted image content earlier, we used the feature at each position of the image, which is a spatially dependent measure. The Gram matrix is just one optional measure for fitting style. Subsequent research has shown that other similar features can also achieve the same effect as the Gram matrix. We don’t need to dwell too much on the principle of the Gram matrix.
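The spatial invariance can be checked on a tiny example in plain Python. The two-channel, four-position feature values and the permutation are made-up numbers, and this gram works on nested lists rather than tensors.

```python
def gram(features):
    """Gram matrix of a list of channels, each a flat list of h*w activations."""
    return [[sum(a * b for a, b in zip(fi, fj)) for fj in features]
            for fi in features]


feats = [[1, 2, 3, 4],   # channel 0 (assumed values)
         [0, 1, 0, 1]]   # channel 1 (assumed values)

# Move the spatial positions around, using the same permutation in every channel.
perm = [2, 0, 3, 1]
shuffled = [[channel[p] for p in perm] for channel in feats]

g_original = gram(feats)
g_shuffled = gram(shuffled)
print(g_original)
print(g_original == g_shuffled, feats == shuffled)
```

The features themselves change under the permutation, but the Gram matrix does not, which is why it describes "what textures occur together" rather than "where things are".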
At this point, you may already understand how style transfer is achieved. Style transfer is essentially fitting the content of one image while fitting the style of another image. We call the first image the content image and the second image the style image.
In the previous section, we learned how to fit content; in this section, we learned how to fit style. To combine both, we simply make our optimization objective include both the content loss with respect to the content image and the style loss with respect to the style image. In the original paper, these losses are expressed as follows:
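Written out in the notation of the paper [1], the three formulas are reproduced below. The symbols N_l (the number of feature maps in layer l) and M_l (the height times width of each feature map) come from the paper and are not defined elsewhere in this article.

```latex
% (1) Content loss at layer l
\mathcal{L}_{content}(\vec{p}, \vec{x}, l)
  = \frac{1}{2} \sum_{i,j} \left( F^{l}_{ij} - P^{l}_{ij} \right)^{2}

% (2) Style loss: Gram matrix, per-layer loss, weighted sum over layers
G^{l}_{ij} = \sum_{k} F^{l}_{ik} F^{l}_{jk}, \qquad
E_{l} = \frac{1}{4 N_{l}^{2} M_{l}^{2}}
        \sum_{i,j} \left( G^{l}_{ij} - A^{l}_{ij} \right)^{2}, \qquad
\mathcal{L}_{style}(\vec{a}, \vec{x}) = \sum_{l} w_{l} E_{l}

% (3) Total loss
\mathcal{L}_{total}(\vec{p}, \vec{a}, \vec{x})
  = \alpha \, \mathcal{L}_{content}(\vec{p}, \vec{x})
  + \beta \, \mathcal{L}_{style}(\vec{a}, \vec{x})
```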
The first formula above represents the content loss, and the second formula represents the style loss.
In the first formula, F and P are the convolutional features of the generated image and the source image, respectively.
In the second formula, F is the convolutional feature of the generated image, G is the Gram matrix of F, A is the Gram matrix of the source image’s convolutional feature, and E_l represents the style loss of the l-th layer. In the paper, the total style loss is a weighted sum of style losses from several layers, where the weights are w_l. In fact, the total content loss can be expressed as a weighted sum of multi-layer content losses in the same way. However, the original paper uses only one layer for the content loss.
In the third formula, \alpha and \beta are the weights for content loss and style loss, respectively. In practice, we only need to consider the ratio of \alpha to \beta. If \alpha is larger, more weight is given to content optimization, and the generated image will be closer to the content image. Conversely, a larger \beta pulls the generated image toward the style image.
Just replace the loss in our previous code with this loss, and we can complete image style transfer. Sounds simple, right? However, implementing style transfer in PyTorch involves many details. In the appendix of this article, I will explain some parts of the style transfer implementation code.
Reflection
This paper is an early work on style transfer using neural networks. In recent years, there have certainly been many studies attempting to improve this method. Nowadays, delving deeply into the details of this paper (why use the Gram matrix, which VGG layers to use for fitting) is no longer very meaningful. What we should focus on is the main idea of this paper.
The biggest inspiration I got from this paper is: neural networks can not only be trained on large datasets to perform general tasks, but they can also be used as feature extractors after pre-training to provide additional information for other tasks. Also, remember that training a neural network is just a special case of an optimization task; gradient descent works for ordinary optimization tasks too. It also applies to tasks like this one, where we use the parameters of a neural network but do not update them.
Furthermore, the “style” mentioned in this paper is a very interesting property. This work is arguably the first to use information from neural networks to extract image attributes like content and style. This idea of extracting attributes (especially style) has been used in many subsequent studies, such as the famous StyleGAN.
For a long time, people have treated neural networks as black boxes. However, this paper gives us a way to lift the lid: by fitting the feature maps produced by the convolutional layers of a neural network, we can get a glimpse of what information each layer retains. I believe that in future research, people will be able to study the internal principles of neural networks more meticulously.
References
[1] Gatys L A, Ecker A S, Bethge M. Image style transfer using convolutional neural networks[C]//Proceedings of the IEEE conference on computer vision and pattern recognition. 2016: 2414-2423.
[2] Gatys L, Ecker A S, Bethge M. Texture synthesis using convolutional neural networks[J]. Advances in neural information processing systems, 2015, 28.
[3] Code implementation: https://pytorch.org/tutorials/advanced/neural_style_tutorial.html
This code implementation is based on the PyTorch official tutorial (https://pytorch.org/tutorials/advanced/neural_style_tutorial.html).
The code repository link for this article: https://github.com/SingleZombie/DL-Demos/tree/master/dldemos/StyleTransfer
Preliminary Work
First, import the required libraries. We need to import PyTorch’s basic library and import torchvision for image transformations and initializing pre-trained models. Additionally, we use PIL for reading and writing images. We can also conveniently set the computation device (CPU or GPU).

    import torch
    import torch.nn.functional as F
    import torch.optim as optim
    import torchvision.models as models
    import torchvision.transforms as transforms
    from PIL import Image

    device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

Next is image reading. To correctly calculate the loss, all image shapes must be uniform. Therefore, after reading the image, we must perform a Resize preprocessing. After preprocessing, the image we obtain is in the format c, h, w. Don’t forget to add the batch dimension using unsqueeze.
Here, transforms in torchvision provides some preprocessing operations. Some operations can only be performed on PIL images, not on np.ndarray. Therefore, using PIL for image reading and writing is more convenient than using cv2.

    img_size = (256, 256)


    def read_image(image_path):
        pipeline = transforms.Compose(
            [transforms.Resize((img_size)),
             transforms.ToTensor()])

        img = Image.open(image_path).convert('RGB')
        img = pipeline(img).unsqueeze(0)
        return img.to(device, torch.float)

When saving an image, simply call the PIL API:

    def save_image(tensor, image_path):
        toPIL = transforms.ToPILImage()
        img = tensor.detach().cpu().clone()
        img = img.squeeze(0)
        img = toPIL(img)
        img.save(image_path)

Loss Calculation
When defining losses in PyTorch, a more elegant approach is to define a torch.autograd.Function. However, this is cumbersome because it requires writing the backward pass manually. Since all losses introduced in this article are based on MSE (mean squared error), we can implement some “fake” loss functions based on torch.nn.Module.
First, define the content loss:

    class ContentLoss(torch.nn.Module):

        def __init__(self, target: torch.Tensor):
            super().__init__()
            self.target = target.detach()

        def forward(self, input):
            self.loss = F.mse_loss(input, self.target)
            return input

In the neural network, this class does not perform any computation (forward directly returns input). However, this class caches the content loss value. We can later retrieve the loss attribute of this class instance and plug it into the final loss calculation formula. This method of inserting a torch.nn.Module that does no computation to save intermediate results is a small trick in PyTorch.
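The pass-through trick does not depend on PyTorch, so it can be sketched in plain Python. RecordMSE, run, and double are hypothetical names invented for this illustration: the "layers" are ordinary callables on lists of floats, and run plays the role of a sequential model.

```python
class RecordMSE:
    """Pass-through 'layer': returns its input unchanged but caches a loss."""

    def __init__(self, target):
        self.target = list(target)
        self.loss = None

    def __call__(self, x):
        self.loss = sum((a - b) ** 2 for a, b in zip(x, self.target)) / len(x)
        return x


def run(layers, x):
    """Minimal sequential model: feed x through each layer in order."""
    for layer in layers:
        x = layer(x)
    return x


def double(x):
    return [2 * v for v in x]


probe = RecordMSE(target=[2.0, 4.0])
out = run([double, probe, double], [1.0, 3.0])
print(out, probe.loss)
```

The probe sees the intermediate value [2.0, 6.0] and stores its loss, while the final output is exactly what it would be without the probe.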
Next, define the Gram matrix calculation method and the style loss calculation “function”:

    def gram(x: torch.Tensor):
        n, c, h, w = x.shape

        features = x.reshape(n * c, h * w)
        features = torch.mm(features, features.T) / n / c / h / w
        return features


    class StyleLoss(torch.nn.Module):

        def __init__(self, target: torch.Tensor):
            super().__init__()
            self.target = gram(target.detach()).detach()

        def forward(self, input):
            G = gram(input)
            self.loss = F.mse_loss(G, self.target)
            return input

The idea for implementing style loss here is the same as content loss.
Obtaining the Pre-trained Model
The VGG model has requirements for the distribution of input data (i.e., requirements on the mean and standard deviation of the input data). For convenience, we can write a normalization layer as the first layer of the final model:

    class Normalization(torch.nn.Module):

        def __init__(self, mean, std):
            super().__init__()
            self.mean = torch.tensor(mean).to(device).reshape(-1, 1, 1)
            self.std = torch.tensor(std).to(device).reshape(-1, 1, 1)

        def forward(self, img):
            return (img - self.mean) / self.std

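What the normalization layer computes per channel can be spelled out in plain Python. This is only a sketch on nested lists: normalize is a hypothetical helper, the 1x2 test image is made up, and the explicit zip over channels plays the role that reshape(-1, 1, 1) plus broadcasting plays in the PyTorch version. The mean and std values are the ones passed to Normalization in the code that follows.

```python
def normalize(img, mean, std):
    """img is [c][h][w]; mean and std hold one value per channel."""
    return [[[(v - m) / s for v in row] for row in channel]
            for channel, m, s in zip(img, mean, std)]


mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]

# Toy image: 3 channels, each 1x2, holding [channel mean, pure white].
img = [[[m, 1.0]] for m in mean]
out = normalize(img, mean, std)
print(out[0][0])
```

A pixel equal to the channel mean maps to 0, and every other value is expressed in units of that channel's standard deviation.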
Next, we can use the pre-trained VGG from torchvision to extract the modules we need. We also need to obtain references to the loss class instances we just wrote to calculate the final loss.
The idea of this code is: instead of directly using VGG, we create a new sequential model represented by torch.nn.Sequential. We first add the normalization layer to this sequence, then add the computational layers from the original VGG one by one into our new sequential model. Once we find that the output of a computational layer is needed for calculating loss, we add a loss module after that layer to capture the loss.
This logic is hard to explain in words, so you can directly read the code:

    default_content_layers = ['conv_4']
    default_style_layers = ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5']


    def get_model_and_losses(content_img, style_img, content_layers,
                             style_layers):
        num_loss = 0
        expected_num_loss = len(content_layers) + len(style_layers)
        content_losses = []
        style_losses = []

        model = torch.nn.Sequential(
            Normalization([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]))
        # Note: newer torchvision versions prefer the `weights=` argument
        # over `pretrained=True`.
        cnn = models.vgg19(pretrained=True).features.to(device).eval()
        i = 0
        for layer in cnn.children():
            if isinstance(layer, torch.nn.Conv2d):
                i += 1
                name = f'conv_{i}'
            elif isinstance(layer, torch.nn.ReLU):
                name = f'relu_{i}'
                layer = torch.nn.ReLU(inplace=False)
            elif isinstance(layer, torch.nn.MaxPool2d):
                name = f'pool_{i}'
            elif isinstance(layer, torch.nn.BatchNorm2d):
                name = f'bn_{i}'
            else:
                raise RuntimeError(
                    f'Unrecognized layer: {layer.__class__.__name__}')

            model.add_module(name, layer)

            if name in content_layers:
                # Add content loss:
                target = model(content_img)
                content_loss = ContentLoss(target)
                model.add_module(f'content_loss_{i}', content_loss)
                content_losses.append(content_loss)
                num_loss += 1

            if name in style_layers:
                # Add style loss:
                target_feature = model(style_img)
                style_loss = StyleLoss(target_feature)
                model.add_module(f'style_loss_{i}', style_loss)
                style_losses.append(style_loss)
                num_loss += 1

            if num_loss >= expected_num_loss:
                break

        return model, content_losses, style_losses

Some points to note: VGG has multiple modules, but we only need the vgg19().features module that contains the convolutional layers. Also, we only need the layers used for computing losses. When we find that all loss-related layers have been added to the new model, we can stop adding new modules.
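The naming and early-stopping logic can be traced without loading VGG. In this plain-Python sketch, build_names is a hypothetical helper, layers are represented only by strings for their kind, and vgg19_prefix lists the first layers of vgg19().features as I understand the architecture (two conv blocks of two convolutions each, then the start of the third block).

```python
def build_names(layer_kinds, content_layers, style_layers):
    """Return the module names the truncated model would contain, in order."""
    names = []
    expected = len(content_layers) + len(style_layers)
    found = 0
    i = 0
    for kind in layer_kinds:
        if kind == 'conv':
            i += 1  # only convolutions advance the index
        name = f'{kind}_{i}'
        names.append(name)
        if name in content_layers:
            names.append(f'content_loss_{i}')
            found += 1
        if name in style_layers:
            names.append(f'style_loss_{i}')
            found += 1
        if found >= expected:
            break  # every loss layer is in place, drop the rest of the network
    return names


vgg19_prefix = ['conv', 'relu', 'conv', 'relu', 'pool',
                'conv', 'relu', 'conv', 'relu', 'pool',
                'conv', 'relu', 'conv', 'relu']

names = build_names(vgg19_prefix, ['conv_4'],
                    ['conv_1', 'conv_2', 'conv_3', 'conv_4', 'conv_5'])
print(names)
```

With the default layer lists, the walk stops right after the style loss attached to conv_5, so relu_5 and everything deeper never enter the new model.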
Generating Images with Gradient Descent
The steps here are similar to those in the main text. First, we prepare the initial noise image, model, references to loss class instances, and set which parameters need optimization and which do not.

    input_img = torch.randn(1, 3, *img_size, device=device)
    model, content_losses, style_losses = get_model_and_losses(
        content_img, style_img, default_content_layers, default_style_layers)

    input_img.requires_grad_(True)
    model.requires_grad_(False)

We also need the two hyperparameters content_weight and style_weight, which control whether the image is closer to the content image or the style image. The two input images must be read before get_model_and_losses is called:

    style_img = read_image('dldemos/StyleTransfer/picasso.jpg')
    content_img = read_image('dldemos/StyleTransfer/dancing.jpg')

These two images are from the official tutorial. Links: picasso (https://pytorch.org/tutorials/_static/img/neural-style/picasso.jpg), dancing (https://pytorch.org/tutorials/_static/img/neural-style/dancing.jpg).
Finally, perform the familiar gradient descent:

    optimizer = optim.LBFGS([input_img])
    steps = 0
    prev_loss = 0
    while steps <= 1000 and prev_loss < 100:

        def closure():
            with torch.no_grad():
                input_img.clamp_(0, 1)
            global steps
            global prev_loss
            optimizer.zero_grad()
            model(input_img)
            content_loss = 0
            style_loss = 0
            for l in content_losses:
                content_loss += l.loss
            for l in style_losses:
                style_loss += l.loss
            loss = content_weight * content_loss + style_weight * style_loss
            loss.backward()
            steps += 1
            if steps % 50 == 0:
                print(f'Step {steps}:')
                print(f'Loss: {loss}')

            prev_loss = loss
            return loss

        optimizer.step(closure)

Since we know in advance that image values lie within [0, 1], we clamp the image to this range at the start of each closure call, which speeds up the optimization.
There are some special cases when running the program. Sometimes, the loss of the task may suddenly spike to a very high value and then return to normal after a few iterations. To ensure that the output loss is not always too large, I added a condition prev_loss < 100.
The value of steps can be adjusted, and how small the loss should be depends on the actual task and the values of content_weight, style_weight. These hyperparameters are all tunable.
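The two weights are not declared in the code shown here. A minimal sketch of how they might be set and combined follows, in plain Python with floats standing in for the loss tensors. Only the ratio style_weight / content_weight = 1e6 is taken from the result reported below; the absolute values, the total_loss helper, and the example loss values are my assumptions.

```python
content_weight = 1    # assumed absolute value
style_weight = 1e6    # chosen so that style_weight / content_weight = 1e6


def total_loss(content_losses, style_losses):
    """Weighted sum of the cached per-layer losses (plain floats here)."""
    return (content_weight * sum(content_losses)
            + style_weight * sum(style_losses))


# Made-up per-layer loss values, only to show the scale of each term.
loss = total_loss([0.5], [2e-6, 1e-6])
print(loss)
```

Because the normalized Gram matrices give very small style losses, a large style weight is what brings the two terms to a comparable scale.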
Finally, we can save the final output image:

    with torch.no_grad():
        input_img.clamp_(0, 1)
    save_image(input_img, 'work_dirs/output.jpg')

Under normal circumstances, running the above code produces the following result (my style_weight/content_weight = 1e6):
Bonus
After understanding what style transfer does, I immediately thought: Can I use style transfer to render photos into anime style?
After successfully completing the code implementation, I immediately tried to transfer anime style onto my photo:
The effect is terrible! I was not convinced, so I output several intermediate results. That made it even weirder:
I can’t tell whether I’ve entered the anime world or a CRT TV.
The generated images do retain some style elements of the anime image: clear lines and blocks of flat color. But the overall effect is too poor.
This just shows how limited the algorithm is. There’s a long way to go to enter the anime world.
Random Thoughts
My intelligence and efficiency have reached a terrifying level. In one day, while living a normal life, I completed reading a paper, reproducing it, writing an article, and bragging. This level of execution is too strong. If I could learn things with such efficiency every day, becoming a top researcher would be within reach.
Unfortunately, doing research is not my final destination. While writing this article, I was also thinking about what keeps me going. Writing articles brings me no financial gain right now. If I wanted to efficiently make money, I shouldn’t be writing articles like this. But I just want to write. I don’t know if it’s to accomplish some personal goals, or to showcase my writing and learning skills honed over the years, or purely for the joy of bragging. I’m no longer quite sure. As long as it feels fun, I’ll keep doing it.
Lately, I’ve been spending less and less time playing video games. Because life, for me, is the most difficult and challenging game of all.