This article explores the feasibility of attaching an external NVIDIA RTX 5090 GPU to an Apple Silicon Mac over Thunderbolt for CUDA inference and gaming, covering approaches such as tinygrad's eGPU driver work and PCI passthrough to a Linux VM.
Modal Labs has released an open-source, interlinked GPU glossary that consolidates fragmented NVIDIA documentation, CUDA details, and compiler flags into a single navigable resource for engineers optimizing LLM training and inference.
NVIDIA has open-sourced cuda-oxide, an experimental rustc backend (a Rust-to-CUDA compiler) that lets developers write safe, idiomatic GPU kernels in pure Rust that compile directly to PTX, with no DSLs, FFI bindings, or source-to-source translation.
This article introduces the Cornell Virtual Workshop's free online tutorial on basic CUDA programming using C, covering prerequisites and additional resources.
Discussion of the shift in GPU kernel engineering from C++ CuTe/CUTLASS to NVIDIA's Python-based CuTeDSL, questioning whether new engineers should learn legacy C++ templates or prioritize the emerging stack for LLM inference work.
NVIDIA GTC 2026 keynote highlights the 20th anniversary of CUDA, introduces DLSS 5 with AI-powered neural rendering, and surveys NVIDIA's accelerated computing platforms across automotive, healthcare, robotics, and other sectors. CEO Jensen Huang projects $1 trillion in computing revenue from 2025-2027 driven by massive AI demand.
OpenAI releases Triton 1.0, an open-source Python-like GPU programming language that enables researchers without CUDA experience to write highly efficient GPU kernels, achieving performance on par with expert-written CUDA code in as few as 25 lines.
DeepSeek releases DeepGEMM, a high-performance CUDA kernel library for LLM computation primitives including FP8/FP4/BF16 GEMMs, fused MoE with overlapped communication, and MQA scoring, compiled at runtime via JIT with no installation-time CUDA compilation required. The library achieves up to 1550 TFLOPS on H800 and matches or exceeds expert-tuned libraries across various matrix shapes.