Proposes SNAP-FM, a method that leverages sparse GPU nonlinear optimization to accelerate constraint projection in physics-constrained generative modeling, achieving faster inference while preserving exact physical constraint satisfaction.
This paper introduces a fully GPU-based workflow that accelerates data generation and training of neural emulators for hypersonic flows, using a differentiable solver (JAX-Fluids) and residual-based refinement to improve physical consistency and reliability beyond the training distribution.
Researchers propose Gaussian Point Splatting, a stochastic rendering method using pixel-sized opaque points and 64-bit GPU atomics that scales to hundreds of millions of Gaussians in real time. The method, accepted at SIGGRAPH 2026, employs hierarchical culling and parallel programming primitives to distribute the workload evenly, with only minor noise differences compared to the original Gaussian splatting.
The CuTe and CuTe DSL articles provide minimal code snippets with Modal Notebooks for hands-on learning.
AMD's ROCm 7.13 tech preview adds optimizations for Strix Halo (Ryzen AI Max 300) and open-sources the ROCprof Trace Decoder.
This paper introduces DMI-Lib, a high-speed deep model inspector that enables efficient internal observability for LLM inference by decoupling monitoring from the inference hot path.
Jungle Grid has open-sourced an MCP server designed to allow AI agents to autonomously estimate, submit, and monitor GPU workloads for inference and training tasks.
Antirez reports benchmarking DS4 inference on the DGX Spark (GB10), noting 12 tokens/sec generation speed and high prefill performance, with plans to merge the codebase once mature.
Modular announces the Mojo 1.0 Beta, a high-performance programming language that combines Python's ease of use with the speed of compiled languages for AI and systems programming.
cuda-oxide is an experimental Rust-to-CUDA compiler that allows developers to write safe, idiomatic Rust GPU kernels that compile directly to PTX.
cuda-oxide is an experimental Rust-to-CUDA compiler backend released by NVIDIA, enabling pure Rust GPU kernel development without foreign language bindings.
This article highlights how NVIDIA GPUs and AI models like Morpheus are enabling astronomers at UC Santa Cruz to process massive datasets from the James Webb Space Telescope, accelerating the discovery and classification of early universe galaxies.
CuPy is a GPU-accelerated library that serves as a drop-in replacement for NumPy/SciPy, enabling efficient array operations on NVIDIA CUDA and AMD ROCm platforms.
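To make the "drop-in replacement" idea concrete, here is a minimal sketch of array code written once against a module alias, so it runs on CuPy when a GPU is available and on NumPy otherwise. The helper names `get_array_module` and `normalize_rows` are hypothetical and chosen only for this illustration; the fallback logic is an assumption about how one might structure such code, not something the summary above prescribes.

```python
import numpy as np


def get_array_module():
    """Return CuPy if it imports and sees a GPU, otherwise NumPy.

    Hypothetical helper for this sketch; not part of CuPy itself.
    """
    try:
        import cupy as cp
        if cp.cuda.runtime.getDeviceCount() > 0:
            return cp
    except Exception:
        # CuPy is missing, or no usable GPU/driver was found.
        pass
    return np


xp = get_array_module()


def normalize_rows(a):
    """Scale each row to unit Euclidean norm using the shared array API."""
    norms = xp.linalg.norm(a, axis=1, keepdims=True)
    return a / norms


# The same calls work for either backend.
x = xp.arange(6, dtype=xp.float32).reshape(2, 3) + 1.0
y = normalize_rows(x)

# CuPy arrays live on the device; .get() copies them back to host memory.
y_host = y.get() if hasattr(y, "get") else np.asarray(y)
row_norms = np.linalg.norm(y_host, axis=1)
```

Because only the alias `xp` changes, the numerical code stays identical across backends; the one explicit difference is the device-to-host copy at the end.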