@liuren: https://x.com/liuren/status/2069266318747165146

X AI KOLs Timeline 06/23/26, 03:48 AM News

Summary

This article recounts in detail the story of Jia Yangqing developing the deep learning framework Caffe (originally named Decaf) from scratch during his time at Berkeley and choosing to open-source it, as well as his personal growth from a student to a technical leader.

https://t.co/pbkcHpMNEO

Before ChatGPT, Jia Yangqing Set Aside His Doctoral Thesis to Write Caffe, the AI Framework That Changed Everything

This article was written in early 2023, originally titled “Jia Yangqing Open-Sourced the AI Framework Caffe”

[Editor’s Note] In the galaxy of open source and artificial intelligence, the name Jia Yangqing shines exceptionally bright. His advisor, Professor Trevor Darrell, once asked him: “Do you want to spend more time writing a thesis that everyone probably won’t care about, or a framework that everyone will use in the future?” With that, Jia Yangqing, then a student, dove headfirst into creating Caffe. Caffe became his masterpiece, and his journey in open source and AI would go even further.

Interview: Liu Ren
Authors: Li Xinxin, Liu Ren, Zhou Yang
Editor: Tang Xiaoyin
Produced by: New Programmer Editorial Team

June 2013, UC Berkeley. 28-year-old Jia Yangqing (see Figure 1) was writing Decaf, the predecessor of Caffe. He was due to graduate with his Ph.D. in three months. At the time, he was collaborating with Professor Thomas Griffiths of Berkeley’s Department of Psychology on a psychology research topic: how humans form “category” concepts as they grow up. In the research, Jia Yangqing used a probabilistic framework to model human behavior, but the features extracted from images were too weak to support firm conclusions.

📷

Figure 1: Jia Yangqing during his Berkeley days

One day, Jia Yangqing came across the paper behind the winning entry in the 2012 ILSVRC competition, “ImageNet Classification with Deep Convolutional Neural Networks,” published in Advances in Neural Information Processing Systems. It described AlexNet, a deep learning model that used a Convolutional Neural Network (CNN) to defeat the non-neural-network algorithms, and it needed only two GPUs where Google’s earlier approach had used 10,000 CPUs. One machine replaced ten thousand, and the error rate dropped from 25% to 15%. The paper made a splash and shocked the industry; until then, neural networks had been largely dismissed by the field.

Inspired, Jia Yangqing considered applying the CNN feature extraction technique from the paper to his current psychology project. He contacted Alex Krizhevsky (one of the authors of the AlexNet model) and asked if he could share the AlexNet source code. Alex replied: “Sorry, I’ve started a company and am working on my startup. Due to intellectual property issues, I can’t directly give you the code, but if you run into problems during your research, feel free to ask me anytime.”

Just then, Jia Yangqing received an academic donation from NVIDIA — a K20 GPU (see Figure 2). For a student, “GPUs were very expensive!” So, while writing his doctoral thesis, Jia Yangqing built a machine himself and, in his spare time, reimplemented the AlexNet framework to extract features from images.

📷

Figure 2: The GPU used for the earliest framework development

At this time, Jia Yangqing commuted to his internship at Google by subway. Sitting on the train, he would open his laptop on his lap and squeeze in time to continue writing the framework. As a beginner in GPU programming, getting started was tough, but Jia Yangqing was immersed: “Writing code is probably as addictive as playing video games.” “The time I spent coding went from 20%, to 40%, to 80%… gradually increasing.” At Google, Jia Yangqing drank several cups of coffee every day. “That’s not good,” he thought. He named the framework he was writing Decaf as a reminder to quit coffee.

At that point, Jia Yangqing’s task list included: 1. Ph.D. thesis; 2. Psychology research project; 3. Finding a job; 4. Reimplementing the AlexNet framework. Compared to the first three tasks, he spent far more time on the last one, severely squeezing the time for writing his thesis. He sought advice from his advisor, Trevor Darrell (computer scientist, Berkeley professor). The advisor asked only one question: “Do you want to spend more time writing a thesis that everyone probably won’t care about, or a framework that everyone will use in the future?”

The advisor’s words encouraged Jia Yangqing. He set the thesis aside and plunged into Decaf. “My advisor always taught me to prioritize.”

After Jia Yangqing built the scaffolding and got a small-scale prototype running, he shared Decaf within the Berkeley group for classmates to try. Everyone found it “quite useful.” Evan Shelhamer, Jonathan Long, Jeff Donahue, Sergio Guadarrama, and Jia Yangqing hit it off and decided to form a “core small team.” In addition to their daily research and engineering work, they developed Decaf together. Soon, the “small team” reimplemented the AlexNet model.

Decaf relied on cuda-convnet for training, but through Decaf, they demonstrated that deep learning features could be used for in-depth experiments with learning paradigms. The team thought: why not further develop it into a complete deep learning framework, making it a general and clean AI framework tool?

Because of the blazing speed of GPUs, Jia Yangqing considered renaming Decaf to Caffe. The Berkeley group preferred the name Caffe. So Decaf became Caffe. Two months later, Caffe was complete. Jia Yangqing specially asked his advisor for a budget to buy an iced drip coffee machine and placed it in the lab. From then on, Evan Shelhamer often made coffee for everyone, which meant Jia Yangqing, who had wanted to quit coffee, “never actually quit.” The team even changed the external contact email to caffe-coldpress.

📷

Figure 3: The desktop machine Jia Yangqing used to develop Caffe

Jia Yangqing faced a difficult question: how should he release Caffe? Should it be like Alex Krizhevsky’s approach — start a company and commercialize it? Or should it be a library purely for research? Or should it be open source? Jia Yangqing was undecided, and the other Caffe developers also had differing opinions.

Jia Yangqing leaned toward open-sourcing Caffe. He reflected on the past few months: if a public deep learning framework had existed, with accessible code and algorithm details, he wouldn’t have had to spend effort reimplementing one. Moreover, “as a student, deep down I had the desire to create something and open-source it.” “Almost all the code I used at Berkeley was open source.” “Only by making the market bigger can everyone get a piece of the pie.” “Open source doesn’t negate individual technical ability.” “Anyway, I can afford to buy coffee! What more could I ask for?”

How could he get everyone to agree? “That was harder than writing Caffe.” Jia Yangqing decided to talk to each core developer individually. Some were easy to convince; others were more difficult. At times, Jia Yangqing got frustrated and blurted out, “This is my framework, so I should have the final say!” The discussions took seven days. In the end, fortunately, all the developers agreed to open-source Caffe.

In December 2013, Caffe was released on GitHub and officially open-sourced. Alex Krizhevsky was delighted when he heard the news. Jia Yangqing’s advisor suggested putting UC Berkeley’s name on Caffe’s documentation, and Jia Yangqing was happy to do so. “Open-sourcing Caffe under Berkeley’s name made everyone proud — we felt we had brought honor to our alma mater.”

During his internship at Google, Jia Yangqing received a formal offer from Google, waiting only for graduation to start. No longer needing to worry about job hunting, Jia Yangqing completely let loose. He stopped writing his thesis altogether, and the psychology research project he initially worked on fizzled out.

Caffe began attracting users and developers. Thanks to Caffe’s reputation, Jia Yangqing met many people in the industry. Two months later, he unexpectedly received an email from NVIDIA. NVIDIA offered to provide computing resources for the Berkeley lab and send engineers to work with them on framework optimization to improve Caffe’s stability. Jia Yangqing agreed to the collaboration, valuing NVIDIA’s strengths on the system side. Over the following year, with contributions from all sides, Caffe’s development accelerated.

Besides his Google work, Jia Yangqing continued maintaining Caffe with his Berkeley colleagues. He began redesigning parts of Caffe to make it more modular and adaptable to various deployment environments. Evan Shelhamer, who had experience in open-source communities, led external collaborations; Jeff Donahue helped Pinterest build a deep learning system; Jonathan Long contributed many new features, including a Python interface… In community building, GitHub and the caffe-users mailing list formed a loose, free organization, managed spontaneously by Caffe users.

Building Caffe gave Jia Yangqing complete project experience from 0 to 1. “Caffe was probably my first C++ project.” Overall, “it was a huge training ground for me — from team development, to promotion, to gathering feedback, to process improvement — I experienced every step.” In May of the second year after leaving Berkeley, Jia Yangqing finally completed his Ph.D. thesis.

A Small-Town Boy Goes to Tsinghua

In 1984, Jia Yangqing was born in Shangyu County, Shaoxing. Both his parents were middle school Chinese teachers. At age one, Jia Yangqing loved listening to stories, and his mother often read picture books to him. By age three, he could recognize two or three hundred characters and often immersed himself in books. At five, his parents took him to the Xinhua Bookstore; Jia Yangqing picked Andersen’s Fairy Tales. His mother asked in surprise: “Can you understand the words in this book?” Jia Yangqing nodded.

Jia Yangqing’s family lived on campus, in a peaceful, orderly life. At 6 a.m., his parents got up to supervise morning self-study, and Jia Yangqing followed suit. In sixth grade, Jia Yangqing transferred from his parents’ school to a central school in Shangyu City. In the new environment, he felt both curious and a bit inferior. He worked harder, trying to prove himself through grades. In the high school entrance exam, he placed third in Shangyu District and entered Chunhui Middle School.

In eighth grade, learning computers became popular. Although the school had a computer lab, Jia Yangqing’s parents spent over 7,000 yuan to buy him a Pentium II computer. He tinkered with the machine, installing various software, playing Minesweeper… Once, he visited a classmate’s home and saw the classmate clicking around a graphical interface with a mouse, demonstrating that he was learning programming. Jia Yangqing thought it was fun. Back home, he figured out how to write a small program in BASIC: entering a year number in a box, and the screen would display the corresponding zodiac animal.

Computers made Jia Yangqing feel “I can create something new.” “Very happy.” But he followed his parents’ advice, keeping his focus on studying for the college entrance exam.

In middle school, the Chinese teacher taught the class to write literary critiques. Jia Yangqing chose “On the Poetry Description and Art in The Romance of the Western Chamber” as his topic. The teacher was surprised and told Jia Yangqing’s mother: “Isn’t it too early for the child to read The Western Chamber?” His mother replied, “Let him read, it’s fine.” Since his parents taught Chinese, their home had many literature books. Jia Yangqing often randomly pulled a book from their shelves and flipped through it. He once picked up Qian Zhongshu’s On the Art of Poetry, flipped through it, and of course “couldn’t understand it, so I put it back.” He read History of Western Literature and Homer’s Epics, feeling that “the outside world is different from the small town where I live.” Reading Sonnets, he discovered that Western poetry pays attention to rhyme, similar to the tonal patterns of ancient Chinese poetry. “They can actually corroborate each other, which is very interesting…”

Jia Yangqing was always interested in literature, but he chose science. “Compared to the humanities, science allows you to go further through your own efforts.” Or perhaps he was influenced by the prevailing attitude at the time: “Master science and math, and you’ll never fear the world.”

In high school, Jia Yangqing won first prizes in both the National Physics and Chemistry Olympiads, and a second prize in English comprehensive ability. Math was not his strong suit, so his parents bought him “Hong’en Online” CDs, and he practiced a lot. “If I couldn’t solve a problem, I’d buy an entire workbook and do all the exercises until I mastered the technique.” Gradually, Jia Yangqing’s math scores improved, and he won a second prize in the National Math League.

In 2002, the college entrance exam. Jia Yangqing made a mistake on the last question of the physics exam, losing 27 points! He ended up with 686 points. His dream was Tsinghua’s Computer Science department. After discussing with the Tsinghua admissions counselor, and to be safe, he put “Tsinghua University, Department of Automation” as his first choice. When he opened the acceptance letter and saw the cover read “Tsinghua is the pride of your life,” he was overjoyed.

Embarking on AI Research

The Tsinghua library opened at 8 a.m. Shortly after 7, Jia Yangqing and a few close classmates would already be waiting in line outside. Out of 27 students in the class, Jia Yangqing quickly rose to the top ranks again. For four years as an undergraduate, aside from studying, his life was still studying. In his spare time, he even treated calculus as a research project, working through Demidovich’s Problems in Mathematical Analysis.

Tsinghua’s School of Information Science and Technology consisted of the Department of Automation, the Department of Computer Science, and the Department of Electronic Engineering. The three departments shared the same basic courses, with computer science leaning more toward theory and software. “The Department of Automation essentially does two things: stoke boilers and run elevators. To stoke a boiler, you need to raise the temperature quickly and then hold it steady at a high level. That’s the problem control theory solves. Running an elevator is similar.”
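To make the boiler analogy concrete, here is a minimal sketch of the kind of feedback loop control theory studies. It is an illustration only, not anything from the article: the function name simulate_boiler, the toy heat-loss model, and every parameter value are assumptions chosen so that a simple proportional-integral (PI) controller heats quickly and then settles at the target.

```python
def simulate_boiler(setpoint=100.0, ambient=20.0, steps=2000, dt=0.1,
                    kp=2.0, ki=0.5, loss=0.1, max_power=50.0):
    """Simulate a toy boiler under PI control and return the temperature trace.

    All names and numbers are hypothetical; the plant is a first-order
    model in which heat leaks away in proportion to (temp - ambient).
    """
    temp = ambient
    integral = 0.0
    trace = [temp]
    for _ in range(steps):
        error = setpoint - temp
        raw = kp * error + ki * integral          # PI control law
        power = min(max(raw, 0.0), max_power)     # heater output is limited
        # Anti-windup: accumulate the integral only while the heater is unsaturated.
        if raw == power:
            integral += error * dt
        # Euler step of the plant: heating minus losses to the surroundings.
        temp += dt * (power - loss * (temp - ambient))
        trace.append(temp)
    return trace


trace = simulate_boiler()
print(f"start={trace[0]:.1f}, after 2 s={trace[20]:.1f}, final={trace[-1]:.1f}")
```

With these assumed gains the heater runs flat out at first, which gives the fast rise, and the integral term then supplies the steady power needed to hold the temperature at the setpoint.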

In his junior year, Jia Yangqing took Professor Zhang Changshui’s course Pattern Recognition and Intelligent Systems. Suddenly, he realized that what fascinated him most about AI was breaking out of established experience and exploring possibilities. “That gives you something to do!” Moreover, “with AI algorithms, all we can say is that they roughly recognize certain things with some accuracy. Many problems are both unknown and unsolvable.” Jia Yangqing developed a strong interest in “machine learning.” “Having machines automatically help people, freeing humans from low-level, repetitive labor, is interesting and meaningful.” In a paper-reading course, Jia Yangqing picked an article from Science to present: Geoffrey Hinton’s “Reducing the Dimensionality of Data with Neural Networks.” After class, he searched for materials on neural networks and studied on his own, learning about concepts like the Boltzmann machine, even though neural networks were in a downturn in AI at the time.

Just before graduation, Jia Yangqing, out of interest, did a course project — identifying individual vehicles in traffic congestion. If you could identify how many individual vehicles there were, you could determine the congestion level of a road section. Jia Yangqing and his classmates stood on every overpass on the Fourth Ring Road, took many photos of vehicles passing below, and manually annotated them. “At that time, deep learning didn’t exist yet; I used classic computer vision methods to see how well I could recognize vehicles.” “I thought this problem was fun and challenging, so I explored it. It wasn’t something you could solve right away. If there was already a correct method, it wouldn’t be interesting.”

In July 2006, “good student” Jia Yangqing completed his undergraduate degree and was admitted to Tsinghua’s master’s program without taking the entrance exam. He studied under Professor Zhang Changshui, majoring in Pattern Recognition and Intelligent Systems, and officially embarked on the path of AI research.

Delivering 5 International Papers Solo

July 2008, Chicago, an international computer science conference. The weather was hot, but Jia Yangqing, wearing a short-sleeved shirt, was shivering in the freezing air conditioning of the conference hall. His heart was also uneasy. He had to go up on stage multiple times to present 5 international papers in English, 4 of which were from unfamiliar fields, while the audience were professionals from around the world.

Five papers from his lab were accepted, but only Jia Yangqing got a U.S. visa. With no alternative, he had to present the other four “by rote.” His advisor encouraged him: “Don’t worry, if you give a bad talk, no one will remember. You’re not that important. Just give the talk confidently.” “If you mess up, no one will remember you; if you do well, they’ll say, ‘Hey, that person is pretty good!’” At the conference, the eagerness of Western peers to showcase their work made a big impression on Jia Yangqing. In labs back home, people generally kept their heads down and did their research in obscurity, while Westerners very much wanted their work to be seen. They actively approached others, set up roll-up banners beside the podium, and energetically promoted their projects. When they talked about what they were doing, their faces lit up, their eyes sparkled, and they radiated pride. Jia Yangqing was deeply moved. “I learned how to communicate the thinking behind my research to others. At the time, we generally lacked that ability.”

In the summer of 2009, 25-year-old Jia Yangqing graduated from Tsinghua with a master’s degree. He applied to over a dozen foreign universities for Ph.D. programs. UC Berkeley’s Computer Science department offered a full scholarship, and the school’s location in California suited his taste. At that time, AI research was in the early stages of exploring application scenarios: speech recognition, machine translation, object recognition… Career options were limited to algorithm-oriented roles such as data scientist or data mining engineer; AI was not yet a separate recruitment category.

Google Internship

At a Berkeley seminar, a peer from a big company approached Jia Yangqing and said: “We really like Caffe. The code even has unit tests! A lot of times, the code written by researchers is really hard to look at, but you guys wrote something decent!” This was thanks to the good habits Jia Yangqing developed while interning at Google and learning to write code.

From late May to August each year, Berkeley had summer break. Most students interned at big companies to gain industrial experience. In the summer of 2011, Jia Yangqing interned at NEC Labs, where he first encountered sparse coding. He built an algorithm that automatically learned the receptive field of each feature, achieving the best accuracy at the time on the CIFAR dataset.

In the summer of his second year of Ph.D., Google invited Jia Yangqing for an interview and then kept him on for an internship. Google’s internships had two tracks: product engineering and research. Jia Yangqing was on the research track, and his mentor was Han Mei (now director of Ping An Technology Silicon Valley Research Institute). He worked on image recognition and video understanding, collaborating with the image search team to build more accurate recognition models. Later, his work was integrated into Google Photos’ personal albums.

At Google, Jia Yangqing experienced the extreme efficiency of distributed collaboration among tens of thousands of engineers. Google’s engineering practice processes were comprehensive, with standardized code-writing norms: programs required unit tests written alongside to facilitate later modifications and testing; documentation had to follow standard formats. In a short time, Jia Yangqing’s coding ability improved rapidly. “On one hand, I learned from many open-source projects; on the other hand, I think curiosity is the universal standard for learning. When you see good code, try it yourself, write it multiple times, and improve it continuously.” “Everyone has curiosity — like a child who dares to eat anything because they have no experience with food; the marginal benefit of trying new things is huge, and the joy of discovering delicious food outweighs the occasional taste of mud.” “As you age, you accumulate more experience data, and you need philosophy to replace economics — using faith to force yourself into exploration, rather than always scientifically and rationally learning from past experience and selecting the optimal.”

During his Google internship, Jia Yangqing “learned things while improving his life.” The Google cafeteria offered abundant food, including a dessert called “Ten Pounds” — a small cake implying that after a year at Google, you’d gain 10 pounds. Jia Yangqing used his internship salary to buy a new car. The good work habits he developed at Google left their mark in Caffe’s code.

In 2013, Jia Yangqing graduated from Berkeley with a Ph.D. in computer science.

From TensorFlow to PyTorch

Jia Yangqing went to a conference in Spain and caught a bad cold. Late at night, he went to a pharmacy to buy medicine. Not knowing Spanish, he opened Google Translate’s camera translation feature, scanned along the shelves, and found ibuprofen. In an emergency, using a feature he helped develop to solve a problem felt “strange and wonderful.” The OCR algorithm Google acquired was originally simple and couldn’t recognize complex fonts or text; cloud recognition was slow. Jia Yangqing and the algorithm’s author implemented a deep learning OCR model on a mobile phone for the first time.

In 2013, Jia Yangqing joined Google Brain (in April 2023, Google Brain merged with DeepMind to form Google DeepMind). Two years later, he became a founding team member of TensorFlow. “Most of the first-generation TensorFlow authors are still at Google. The second-generation framework was more thorough and complete, widely adopted by Google’s products.” TensorFlow was open-sourced by Google and became one of the highest-starred projects on GitHub. At Google, Jia Yangqing found his comparative advantage. “I started with AI research, so I share a common language with scientists; on the engineering side, engineers think I write good code.” “I can bridge communication and collaboration between the two sides.”

In 2016, Jia Yangqing joined Facebook (now Meta). Facebook needed to build an AI platform that supported all products: advertising, Feed, search recommendation, image recognition, natural language processing, mixed reality, etc. Jia Yangqing’s team of 4 people was small but efficient. They developed Caffe2 on top of Caffe. Around the time Caffe2 was released, “augmented reality and virtual reality” suddenly became popular. Caffe2 was embedded on mobile devices in just two months. Mark Zuckerberg was very happy and personally posted a status update announcing the artistic style transfer feature (see Figure 4). This was the first time a deep learning network was deployed on over 1 billion mobile phones.

📷

Figure 4: Mark Zuckerberg personally announced the artistic style transfer feature

While Jia Yangqing was working on Caffe2, PyTorch, led by the Facebook AI Research lab in New York, achieved success. In 2018, under Jia Yangqing’s leadership, Caffe2’s backend, PyTorch’s frontend, and the ONNX standard were combined into a complete framework, named PyTorch 1.0. “If TensorFlow is like a large and complex combine harvester, PyTorch is more like a flexible and convenient bicycle.”

Q&A Transcript

Liu Ren: What breakthroughs drove machine translation?

Jia Yangqing: Previously, machine translation relied on establishing grammatical rules. But the syntactic differences between two languages are huge, and it’s impossible to exhaustively define grammatical rules by hand. Now, using neural networks and collecting data from the internet, we train neural networks to gradually optimize and improve their accuracy.

Liu Ren: What’s the difference between rule-based and neural network approaches?

Jia Yangqing: When programmers write rules, they end up in an endless maze, with exceptions upon exceptions. It is an infinite problem. With neural networks, you tackle fuzzy problems with a fuzzy method, then gradually improve prediction accuracy by training on data.

Liu Ren: What’s the difference between deep learning and deep reinforcement learning?

Jia Yangqing: Deep reinforcement learning is about how to map future rewards or penalties back to the present.
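One common way to express “mapping future rewards back to the present” is the discounted return, where each step’s value is its own reward plus a discounted share of everything that follows. The sketch below is an illustration under that assumption, not a method described in the interview; the function name discounted_returns and the discount factor gamma are chosen here for the example.

```python
def discounted_returns(rewards, gamma=0.9):
    """Return G_t = r_t + gamma * G_{t+1} for every step t of one episode.

    Hypothetical helper for illustration: rewards is a list of per-step
    rewards (negative values act as penalties), gamma is in [0, 1].
    """
    returns = [0.0] * len(rewards)
    running = 0.0
    # Walk backward so each step can reuse the return of the step after it.
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


# A reward that only arrives at the last step still gives credit to earlier steps.
print(discounted_returns([0.0, 0.0, 1.0], gamma=0.5))  # [0.25, 0.5, 1.0]
```

Here the first two steps earn nothing directly, yet they receive 0.25 and 0.5 of credit because they lead toward the final reward.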

Liu Ren: In which direction will AI compete with humans for jobs?

Jia Yangqing: Replacing simple, repetitive human labor is a good thing. The time freed up allows people to think about more possibilities. Da Vinci has a famous painting called Virgin of the Rocks. The main figure of the Virgin was painted by Da Vinci, while the background flowers and rocks were painted by his assistants. Even Da Vinci needed assistants. Today, many painters paint the main subject and find assistants to fill in the background. They could also let AI fill in the background. Just like Da Vinci’s assistants, AI can improve the efficiency of painters.

Liu Ren: Where does the gap between China and the U.S. lie in technology?

Jia Yangqing: Curiosity. Our most talented people are focused on execution and solving specific problems. Westerners prefer to create new things. I have to admit that Americans have had food security for a long time, so they play more. When you play more, you always come up with something new. In many areas, we need to catch up, but you can’t rush it.
