gpu-computing

#gpu-computing

SNAP-FM: Sparse Nonlinear Accelerated Projection for Physics-Constrained Generative Modeling

arXiv cs.LG · 2d ago

Proposes SNAP-FM, a method that leverages sparse GPU nonlinear optimization to accelerate constraint projection in physics-constrained generative modeling, achieving faster inference while preserving exact physical constraint satisfaction.

A fully GPU-based workflow for building physics emulators of hypersonic flows

arXiv cs.LG · 2026-06-15

This paper introduces a fully GPU-based workflow that accelerates data generation and training of neural emulators for hypersonic flows, using a differentiable solver (JAX-Fluids) and residual-based refinement to improve physical consistency and reliability beyond the training distribution.

Gaussian Point Splatting

Hacker News Top · 2026-06-04

Researchers propose Gaussian Point Splatting, a stochastic rendering method using pixel-sized opaque points and 64-bit GPU atomics that scales to hundreds of millions of Gaussians in real time. The method, accepted at SIGGRAPH 2026, employs hierarchical culling and parallel programming primitives to distribute the workload evenly, with only minor noise differences compared with the original Gaussian splatting.

@charles_irl: The CuTe and CuTe DSL articles include minimal code snippets illustrating core principles and basic usage. These snippe…

X AI KOLs Following · 2026-05-26

The CuTe and CuTe DSL articles provide minimal code snippets with Modal Notebooks for hands-on learning.

ROCm 7.13 nightly adds Strix Halo optimizations

Reddit r/LocalLLaMA · 2026-05-17

AMD's ROCm 7.13 tech preview adds optimizations for Strix Halo (Ryzen AI Max 300) and open-sources the ROCprof Trace Decoder.

Enabling Performant and Flexible Model-Internal Observability for LLM Inference

arXiv cs.LG · 2026-05-13

This paper introduces DMI-Lib, a high-speed deep model inspector that enables efficient internal observability for LLM inference by decoupling monitoring from the inference hot path.

Open-sourced our MCP server for GPU workload execution, looking for feedback

Reddit r/AI_Agents · 2026-05-11

Jungle Grid has open-sourced an MCP server designed to allow AI agents to autonomously estimate, submit, and monitor GPU workloads for inference and training tasks.

@antirez: DS4 running on DGX Spark (GB10 / CUDA), private branch for now. 12 tokens/sec, the memory bandwidth is limited in this …

X AI KOLs Timeline · 2026-05-10

Antirez reports benchmarking DS4 inference on the DGX Spark (GB10), noting 12 tokens/sec generation speed and high prefill performance, with plans to merge the codebase once mature.

Mojo 1.0 Beta

Hacker News Top · 2026-05-08

Modular announces the Mojo 1.0 Beta, a high-performance programming language that combines Python's ease of use with the speed of compiled languages for AI and systems programming.

The cuda-oxide Book

Lobsters Hottest · 2026-05-08

cuda-oxide is an experimental Rust-to-CUDA compiler that allows developers to write safe, idiomatic Rust GPU kernels that compile directly to PTX.

cuda-oxide: cuda-oxide is an experimental Rust-to-CUDA compiler

Lobsters Hottest · 2026-05-08

cuda-oxide is an experimental Rust-to-CUDA compiler backend released by NVIDIA, enabling pure Rust GPU kernel development without foreign language bindings.

Making Sense of the Early Universe

NVIDIA Blog · 2026-04-23

This article highlights how NVIDIA GPUs and AI models like Morpheus are enabling astronomers at UC Santa Cruz to process massive datasets from the James Webb Space Telescope, accelerating the discovery and classification of early universe galaxies.

cupy/cupy

GitHub Trending (daily) · 6d ago

CuPy is a GPU-accelerated library that serves as a drop-in replacement for NumPy/SciPy, enabling efficient array operations on NVIDIA CUDA and AMD ROCm platforms.
