pytorch

#pytorch

@PyTorch: The inaugural PyTorch Meetup Singapore brought together engineers, researchers, and community builders to talk about ev…

X AI KOLs Following · 2026-06-12 Cached

The inaugural PyTorch Meetup Singapore brought together AI practitioners for technical talks on vLLM updates, sovereign intelligence, and open-source exchange.


@yihong0618: At noon today I read through a senior's articles in order. Four years ago he was still working step by step through Andrew Ng's course, and at the end of one article he wrote this passage. I never expected that four years later he would truly become a leading researcher publishing papers in top journals. Quite moving. https://zhouyifan.net/2022/05/31/20220531-styletransfer/…

X AI KOLs Timeline · 2026-06-12 Cached

The author reflects on a senior's journey from following Andrew Ng's courses four years ago to publishing papers in top journals today, and cites a blog post explaining style transfer with a PyTorch implementation.


@PyTorch: Enable smarter, longer-thinking agents. Scale agentic AI and reinforcement learning by shortening CPU execution time, in…

X AI KOLs Following · 2026-06-11 Cached

NVIDIA introduces the Vera CPU with a neural branch predictor to accelerate agentic AI and reinforcement learning workloads by reducing CPU execution time and increasing throughput in AI factories.


Profiling in PyTorch (Part 2): From nn.Linear to a Fused MLP

Hugging Face Blog · 2026-06-11 Cached

This blog post continues the profiling in PyTorch series, exploring nn.Linear, MLP blocks, and fusion techniques using Triton kernels to optimize performance.


TorchCodec 0.14: HDR Video Decoding for CPU and CUDA, and Fast Wav Decoder

Hacker News Top · 2026-06-10 Cached

TorchCodec 0.14 adds HDR video decoding for CPU and CUDA, along with a fast WAV decoder, enabling efficient conversion of video and audio data into PyTorch tensors for ML workflows.


Siri AI at WWDC 2026

Simon Willison's Blog · 2026-06-08 Cached

Apple announced next-generation Siri AI features at WWDC 2026, including a custom Gemini-derived model and a new Core AI library with PyTorch integration, running on NVIDIA GPUs in Google Cloud within Private Cloud Compute.


@QingQ77: Decouple Alibaba DAMO Academy's ZipEnhancer noise reduction model from the ModelScope pipeline and package it as a high-performance FastAPI denoising service. https://github.com/gyj1201/zipEnhancer… Alibaba DAMO Academy's Z…

X AI KOLs Timeline · 2026-06-08 Cached

This project decouples Alibaba DAMO Academy's ZipEnhancer noise reduction model from the ModelScope pipeline, rewrites the inference logic in pure PyTorch, and packages it as a FastAPI service. It supports FP16 half-precision inference and segmentation of long audio, allows switching between multiple noise reduction models, and exposes the service through an API.


An Implementation of NanoQuant: A flexible binary quantization method

Reddit r/LocalLLaMA · 2026-06-08

NanoQuant is a flexible binary quantization method that compresses dense transformers to less than 1 bit per weight. This repository provides a PyTorch implementation that is still a work in progress and can quantize models such as Qwen3-0.6B and Qwen3-4B.


@DanKornas: A better way to study Deep Learning with PyTorch Live Course: follow the full YouTube course arc, not scattered clips. …

X AI KOLs Timeline · 2026-06-05 Cached

A curated guide to studying deep learning with PyTorch via a full YouTube live course series, covering topics from tensors to GANs, organized into six parts.


@_rohit_tiwari_: Builds GPT-like LLMs from scratch in PyTorch > Breaks the LLM architecture into simple parts. > Beginner friendly. > Fu…

X AI KOLs Timeline · 2026-06-05 Cached

A beginner-friendly, hands-on GitHub repository that breaks down GPT-like LLM architecture into simple parts, with 10 Jupyter notebooks covering tokenization, attention, transformer blocks, and a mini GPT implementation in PyTorch.


Hi Reddit, I posted my Build Your Own LLM workshop to YouTube (GPT2 & Qwen3.6 style)

Reddit r/LocalLLaMA · 2026-06-05 Cached

Justin Angel released a complete YouTube workshop on building your own large language model from scratch, in the style of GPT-2 and Qwen3.6. It covers the Transformer architecture and the training pipeline, pairs manual walkthroughs in Excel with hands-on Python/PyTorch code, and assumes no prior background in math or ML.


@PyTorch: More details about the tutorial https://pldi26.sigplan.org/details/pldi-2026-tutorials/1/Writing-Performance-Portable-K…

X AI KOLs Following · 2026-06-04 Cached

Helion is a Python DSL that compiles to optimized Triton code for performance-portable GPU kernels. This tutorial at PLDI 2026 covers Helion's architecture, autotuning, and CuteDSL backend.


@PyTorch: On Monday, June 15, PyTorch Foundation project Helion is hosting a Helion DSL Tutorial at PLDI 2026 (47th ACM SIGPLAN C…

X AI KOLs Following · 2026-06-04 Cached

The PyTorch Foundation project Helion is hosting a Helion DSL Tutorial at PLDI 2026 in Denver. It's an interactive workshop for compiler researchers, kernel authors, and ML systems engineers to write, autotune, and run Helion kernels.


@DanKornas: Stop learning LLMs from disconnected tutorials. LLM from Scratch is a hands-on PyTorch curriculum for builders who want…

X AI KOLs Timeline · 2026-06-02 Cached

A hands-on PyTorch curriculum that teaches LLM training from transformer basics through fine-tuning and alignment, including RLHF and GRPO.


What I learned building a debugger for PyTorch training loops and how it changed how I think about failure diagnosis [D]

Reddit r/MachineLearning · 2026-05-30

The author shares lessons from building NeuralDBG, an open-source debugger for PyTorch training loops that detects localized failures like vanishing/exploding gradients by monitoring per-layer gradient norm transitions instead of global loss. Practical code snippets and community questions are included.


Profiling in PyTorch (Part 1): A Beginner's Guide to torch.profiler

Hugging Face Blog · 2026-05-29 Cached

A beginner-friendly guide to using PyTorch's torch.profiler for profiling and optimizing neural network operations, starting with matrix multiplication and bias addition. It explains how to read profiler traces and understand CPU/GPU interactions.


@PyTorch: Thrilled to announce EAGLE 3.1 - the next evolution of speculative decoding from @EagleCorp, developed by @hongyangzh, …

X AI KOLs Following · 2026-05-27

EAGLE 3.1, the next evolution of speculative decoding, introduces new FC normalization for improved efficiency, developed by EagleCorp in collaboration with PyTorch, vLLM, and TorchSpec.


@PyTorch: Model Optimization and Post-Training Quantization: Model quantization is an effective method to reduce VRAM usage and im…

X AI KOLs Following · 2026-05-26 Cached

This post from NVIDIA explains how to use the NVIDIA Model Optimizer library to quantize a CLIP model to FP8 using post-training quantization, reducing VRAM usage and improving inference performance on consumer GPUs.


@PyTorch: PyTorch member Meta just open-sourced a GPU kernel that makes attention 2.3x faster on NVIDIA Blackwell. TLX Block Atte…

X AI KOLs Following · 2026-05-26 Cached

Meta open-sources TLX Block Attention, a warp-specialized Triton kernel that achieves 2.3x speedup for block-diagonal self-attention on NVIDIA Blackwell GPUs, with up to 3.5x speedup when fused with rotary embeddings.


Thermocompute constant time inference [P]

Reddit r/MachineLearning · 2026-05-24 Cached

Thermocompute is a PyTorch emulator for thermodynamic probabilistic computing. By exploiting a parallel thermodynamic substrate, it lets neural network layers run inference in constant modeled physical time, and it provides stochastic layers that can be used on GPUs immediately.
