@Jolyne_AI: Open-source systematic CUDA tutorial on GitHub: LeetCUDA (from beginner to advanced, all in one). 200+ progressive CUDA kernel practice problems, with a companion HGEMM library achieving 98%–100% of cuBLAS performance. Plus 100+ articles on high-performance computing...
Summary
LeetCUDA is an open-source, systematic CUDA tutorial on GitHub, featuring over 200 progressive CUDA kernel practice problems and 100+ blog posts on high-performance computing. Its companion HGEMM library achieves 98%–100% of cuBLAS performance. The project is aimed at CUDA beginners and AI engineers who want to learn CUDA optimization step by step.
Cached at: 06/29/26, 06:23 AM
📚 LeetCUDA: Modern CUDA Learn Notes with PyTorch for Beginners 🐑
🔥🔥 PR Welcome: Add Your Kernel to LeetCUDA! Let’s make it Awesome together! 🎉🎉
Similar Articles
https://www.youtube.com/watch?v=qRLyoP8zOyQ
A technical article/book summary on writing custom CUDA kernels to overcome deep learning framework bottlenecks, covering the full journey from fundamentals to optimization.
@neural_avb: TIL about "GPU Mode" They got a youtube series to learn CUDA. Plus a github repo with slides/notebooks. Some lectures a…
GPU Mode is a learning resource featuring a YouTube series, GitHub repo with slides/notebooks, and a practice website for mastering CUDA programming.
@0x0SojalSec: Fuck your paid courses, Master GPU engineering for AI systems. From foundational books and CUDA/ROCm programming to low…
A curated list of resources for mastering GPU engineering for AI systems, covering CUDA, ROCm, optimization tools, multi-GPU orchestration, and distributed training.
Every AI researcher should grasp inference acceleration—CUDA Graph is the heart of vLLM's GPU efficiency
A tweet urging AI researchers to learn inference-acceleration basics and spotlighting CUDA Graph as the key to vLLM’s GPU utilization.
CUDA Books
A curated list of major books on CUDA programming covering beginner to advanced topics, including C++ and Python, with focus on practical resources for NVIDIA GPU parallel computing.