Tag
SOLAR is a framework that automatically derives validated speed-of-light performance bounds from PyTorch and JAX source code using an LLM frontend and deterministic analysis, enabling headroom analysis and optimization insights for deep learning workloads.
Kuma is a compiler/runtime that compiles exported PyTorch models into self-contained WebGPU executables, enabling direct browser inference without Python or server dependencies.
TokenSpeed-Kernel is a portable, high-performance kernel system for LLM inference that requires no vendor-specific model code and supports multiple GPU architectures, achieving up to 3.6x higher throughput on AMD MI355X.
PyTorchCon North America will take place in San Jose, California on October 20-21, 2026, with early bird registration available until July 31.
A detailed guide to building a correct PyTorch training loop, highlighting common mistakes and proper ordering of operations.
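The ordering such a guide refers to can be sketched as follows. This is a minimal illustration, not the guide's own code: the `train` helper, the toy linear-regression data, and the hyperparameters are assumptions made up for this example.

```python
import torch
from torch import nn


def train(model, loader, loss_fn, optimizer, epochs=1):
    """Hypothetical helper showing the usual per-batch order of operations."""
    model.train()  # training mode: affects layers such as dropout and batch norm
    history = []
    for _ in range(epochs):
        total, count = 0.0, 0
        for inputs, targets in loader:
            optimizer.zero_grad()             # 1. clear gradients left from the previous step
            outputs = model(inputs)           # 2. forward pass
            loss = loss_fn(outputs, targets)  # 3. compute the loss
            loss.backward()                   # 4. backpropagate to populate .grad
            optimizer.step()                  # 5. update parameters from .grad
            total += loss.item() * inputs.size(0)
            count += inputs.size(0)
        history.append(total / count)         # mean loss over the epoch
    return history


# Toy data (an assumption for illustration): a noiseless linear target.
torch.manual_seed(0)
x = torch.randn(64, 3)
y = x @ torch.tensor([[1.0], [-2.0], [0.5]]) + 0.1
loader = [(x[i:i + 16], y[i:i + 16]) for i in range(0, 64, 16)]

model = nn.Linear(3, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
losses = train(model, loader, nn.MSELoss(), optimizer, epochs=20)
```

A frequent mistake is omitting `optimizer.zero_grad()`, which lets gradients accumulate across batches. Another is calling `optimizer.step()` before `loss.backward()`, which applies stale or missing gradients.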
WeightsLab is an open-source, PyTorch-native tool that allows teams to pause training, inspect live loss signals, and catch data issues like mislabels and class imbalance before they affect model performance. It is designed for computer vision engineers working with images, videos, and LiDAR point clouds.
A simplified open-source PyTorch implementation of FLUX diffusion transformers with verifiable line-by-line source mappings, designed for educational purposes.
A GitHub repository of machine learning systems notes covering distributed computing, parallelization, quantization, and PyTorch internals related to LLM training and inference. Suitable for learners interested in ML systems.
This blog post explores using LLM-guided autotuning to accelerate kernel configuration search in PyTorch's Helion DSL, replacing the slower Likelihood-Free Bayesian Optimization approach.
A detailed tutorial on supervised fine-tuning (SFT) for training AI agents, built from scratch in pure PyTorch using Qwen3-0.6B, explaining the mechanics of next-token prediction and label masking.
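The label-masking idea mentioned above can be sketched in plain Python, without depending on PyTorch. This is an illustrative sketch, not the tutorial's code: the helper names `build_labels` and `masked_next_token_loss` and the toy token IDs are assumptions, and `-100` is used because it is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
import math

IGNORE_INDEX = -100  # label value meaning "do not compute loss at this position"


def build_labels(prompt_ids, response_ids):
    """Concatenate prompt and response; mask prompt positions in the labels."""
    input_ids = list(prompt_ids) + list(response_ids)
    labels = [IGNORE_INDEX] * len(prompt_ids) + list(response_ids)
    return input_ids, labels


def masked_next_token_loss(logits, labels):
    """Mean cross-entropy where logits[t] predicts labels[t + 1].

    Positions whose target is IGNORE_INDEX contribute nothing, so only
    response tokens are trained on.
    """
    total, count = 0.0, 0
    for t in range(len(labels) - 1):
        target = labels[t + 1]
        if target == IGNORE_INDEX:
            continue  # prompt token: skipped by the mask
        row = logits[t]
        peak = max(row)  # subtract the max for a numerically stable log-sum-exp
        log_norm = peak + math.log(sum(math.exp(v - peak) for v in row))
        total += log_norm - row[target]  # negative log-probability of the target
        count += 1
    if count == 0:
        raise ValueError("no unmasked target tokens")
    return total / count


# Toy example (assumed values): vocabulary of 4 tokens, uniform logits.
input_ids, labels = build_labels(prompt_ids=[0, 1], response_ids=[2, 3])
uniform_logits = [[0.0] * 4 for _ in input_ids]
loss = masked_next_token_loss(uniform_logits, labels)
```

With uniform logits over four tokens, each unmasked position contributes `log(4)`, so the mean loss equals `math.log(4)` regardless of how many prompt tokens are masked.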
LMCache is a KV cache management layer that accelerates large model inference and reduces VRAM consumption by storing and reusing KV cache entries. It has received 9.2K stars, has joined the PyTorch Foundation, and is integrated into NVIDIA Dynamo.
MiniT2I is a minimalist direct-RGB text-to-image generator using a pixel-space MM-JiT denoiser with flow matching and frozen FLAN-T5-Large text tokens, with open-source JAX/Flax and PyTorch implementations released along with checkpoints.
The article explains the torch.compile stack in PyTorch, detailing the steps from the API through Dynamo, the FX graph, and ATen ops to TorchInductor for JIT compilation.
The PyTorch Foundation is calling for nominations for its Ambassador Program, which supports community leaders organizing events, creating content, mentoring, and contributing to open source. Applications are open until June 18, 2026, with a focus on underrepresented regions.
This tutorial from NVIDIA walks through the end-to-end workflow of converting an FP8-quantized PyTorch model into a TensorRT inference engine for production deployment, covering ONNX export and performance profiling.
Inflect-Nano-v1 is a tiny English text-to-speech model with 4.63M total inference parameters, including its vocoder, designed for local, efficient speech synthesis experiments.
privacy-filter.cpp outperforms the PyTorch implementation by approximately 1.6x to 18x.
A beginner-friendly GitHub repository covering PyTorch fundamentals, including tensor initialization, operations, indexing, and reshaping, with over 900 stars.
An open-source GitHub project that implements the complete GPT training pipeline from scratch in native PyTorch, including data preprocessing, pretraining, SFT, and RLHF post-training. Ideal for developers who want to deeply understand the Transformer architecture.
The inaugural PyTorch Meetup Singapore brought together AI practitioners for technical talks on vLLM updates, sovereign intelligence, and open-source exchange.