gpu

#gpu

@Modular: HTTP routing has been a solved problem for many years. Then came Large Language Models. Their backends aren't interchan…

X AI KOLs Following ↗ · yesterday Cached

Modular published a blog post explaining why traditional HTTP routing doesn't work for LLM inference workloads. The article describes how their distributed inference framework handles stateful, heterogeneous GPU pods with KV caches, specialized prefill/decode backends, and conversation-level routing that traditional stateless routing algorithms cannot address.

#gpu

@oliviscusAI: Someone just open-sourced a desktop app that generates 3D models from images and runs 100% locally. It's called Modly. …

X AI KOLs Timeline ↗ · yesterday

Modly is an open-source desktop app that generates fully textured 3D meshes from images, running 100% locally on your GPU with pluggable AI model extensions.

#gpu

Meta's Optimized RecSys Inference (58 minute read)

TLDR AI ↗ · yesterday Cached

Meta's In-Kernel Broadcast Optimization (IKBO) eliminates redundant user-embedding broadcast in RecSys inference via kernel-model-system co-design, delivering up to 2/3 latency reduction and ~4x speedup on H100 GPUs, and serving as the backbone for the Meta Adaptive Ranking Model.

#gpu

AMD to release slottable GPU

Reddit r/LocalLLaMA ↗ · 2d ago

AMD is set to release new slottable PCIe-based Instinct GPUs aimed at the enterprise AI market, offering a potential new hardware option for local LLM deployment.

#gpu

AMD Intros Instinct MI350P Accelerator: CDNA 4 Comes to PCIe Cards

Reddit r/LocalLLaMA ↗ · 2d ago

AMD introduces the Instinct MI350P accelerator featuring CDNA 4 architecture in a PCIe form factor, though pricing and availability details are not yet announced.

#gpu

Boosting multimodal inference performance by >10% with a single Python dict

Hacker News Top ↗ · 3d ago Cached

Modal engineers profiled SGLang's scheduler on multimodal VLM workloads and found that replacing expensive GPU memory bookkeeping with a simple Python dict cache improved throughput by 16% and reduced latency by over 13%, with the fix merged into SGLang v0.5.10.

#gpu

@anyscalecompute: Most coding agents can write Python, but that does not mean they know how to deploy Ray workloads. They still miss GPU …

X AI KOLs Following ↗ · 2026-04-22 Cached

Anyscale releases Agent Skills to help coding agents correctly deploy Ray workloads with proper GPU memory handling and up-to-date APIs.

#gpu

@sama: Here is a manga made by ChatGPT Images 2.0 of @gabeeegoooh and me looking for more GPUs:

X AI KOLs ↗ · 2026-04-21 Cached

Sam Altman shares a manga created with ChatGPT Images 2.0 depicting the GPU hunt, hinting at an upcoming image-generation upgrade.

#gpu

@vllm_project: We just shipped a major redesign of http://recipes.vllm.ai. "How do I run model X on hardware Y for task Z?" now has a …

X AI KOLs Following ↗ · 2026-04-21

vLLM launched a redesigned recipes site that turns any HuggingFace model URL into a ready-to-run inference recipe for specific hardware and tasks.

#gpu

@gabriel1: 100k h100 datacenter ballpark numbers, so you know the magnitudes rounded to numbers that are easy for quick mental mat…

X AI KOLs Following ↗ · 2026-04-21 Cached

A quick breakdown of ballpark numbers for a 100k H100 GPU datacenter, covering GPU costs (~$3B), full datacenter build (~$5B), power consumption (~0.2GW), and annual energy costs (~$50M).
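
The per-unit figures implied by these totals can be backed out with simple division. The sketch below is only arithmetic on the numbers quoted above; the function name is hypothetical, and the assumption that the full 0.2 GW is drawn continuously all year is made here for illustration, not stated in the source.

```python
HOURS_PER_YEAR = 24 * 365  # 8760 hours, ignoring leap years


def implied_unit_figures(
    gpu_count: int = 100_000,
    gpu_cost_usd: float = 3e9,             # ~$3B for the GPUs (from the summary)
    power_watts: float = 0.2e9,            # ~0.2 GW for the site (from the summary)
    annual_energy_cost_usd: float = 50e6,  # ~$50M per year (from the summary)
) -> dict[str, float]:
    """Back out per-GPU and per-kWh figures from the quoted totals.

    Assumption (not from the source): the site draws its full power
    continuously for the whole year.
    """
    annual_kwh = power_watts / 1000 * HOURS_PER_YEAR
    return {
        "cost_per_gpu_usd": gpu_cost_usd / gpu_count,
        "watts_per_gpu": power_watts / gpu_count,
        "annual_kwh": annual_kwh,
        "implied_usd_per_kwh": annual_energy_cost_usd / annual_kwh,
    }


if __name__ == "__main__":
    for name, value in implied_unit_figures().items():
        print(f"{name}: {value:,.4f}")
```

Under these assumptions the totals work out to about $30,000 and 2 kW per GPU, and an implied electricity price of a little under $0.03 per kWh.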

#gpu

@agupta: some ideas are much clearer when you can use coding agents to show a proof of concept. eg I hadn’t really understood ho…

X AI KOLs Following ↗ · 2026-04-20 Cached

A tweet argues that some ideas become much clearer when a coding agent builds a proof of concept, using on-device memory competition between the GPU and NPU as the example.

#gpu

Guys hate to break it to you... we don’t have the hardware for AGI

Reddit r/artificial ↗ · 2026-04-20

An opinion piece arguing that current GPU hardware is fundamentally insufficient for achieving AGI and that computational architecture would need to be completely redesigned.

#gpu

@Prince_Canuma: My home compute for MLX and research: • M3 Ultra — 512GB (sponsored by community + @wai_protocol) • RTX PRO 6000 — 96GB…

X AI KOLs Timeline ↗ · 2026-04-19

A researcher shares their home compute setup for MLX and AI research, featuring M3 Ultra with 512GB, RTX PRO 6000 with 96GB, and M3 Max with 96GB for model porting and stress testing.

#gpu

Modern Rendering Culling Techniques

Hacker News Top ↗ · 2026-04-19 Cached

A technical blog post by a Saints Row: The Third Remastered developer explaining modern rendering culling techniques including distance culling, backface culling, and frustum culling, with practical insights for game developers working on real-time graphics optimization.
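
Two of the techniques named above reduce to short geometric tests. The following is a minimal sketch of the textbook versions of distance culling and backface culling, not code from the linked post; the function names, the counter-clockwise winding convention, and the use of plain tuples for vectors are all assumptions made for illustration.

```python
import math

Vec3 = tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    """Component-wise a - b."""
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def cross(a: Vec3, b: Vec3) -> Vec3:
    """Cross product a x b."""
    return (
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    )


def dot(a: Vec3, b: Vec3) -> float:
    """Dot product of a and b."""
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def is_distance_culled(center: Vec3, camera: Vec3, max_distance: float) -> bool:
    """Skip an object whose center is farther from the camera than max_distance."""
    return math.dist(center, camera) > max_distance


def is_backfacing(v0: Vec3, v1: Vec3, v2: Vec3, camera: Vec3) -> bool:
    """Skip a triangle whose front side faces away from the camera.

    Assumes counter-clockwise winding when viewed from the front, so the
    face normal is (v1 - v0) x (v2 - v0).
    """
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(normal, sub(camera, v0)) <= 0.0
```

For a triangle lying in the z = 0 plane with vertices (0, 0, 0), (1, 0, 0), (0, 1, 0), the normal points along +z, so a camera on the +z side sees the front face and a camera on the -z side triggers the cull.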

#gpu

vllm-project/vllm v0.19.1

GitHub Releases Watchlist ↗ · 2026-04-18 Cached

Release v0.19.1 of vLLM, a fast and easy-to-use open-source library for LLM inference and serving with state-of-the-art throughput, supporting 200+ model architectures and diverse hardware including NVIDIA/AMD GPUs and CPUs.

#gpu

Advancing Open Source AI, NVIDIA Donates Dynamic Resource Allocation Driver for GPUs to Kubernetes Community

NVIDIA Blog ↗ · 2026-03-24 Cached

NVIDIA is donating its Dynamic Resource Allocation (DRA) Driver for GPUs to the Cloud Native Computing Foundation (CNCF) and Kubernetes community, moving it from vendor-governed to community-owned. The donation aims to simplify GPU resource management in Kubernetes for AI workloads and includes GPU support for Kata Containers through collaboration with CNCF's Confidential Containers community.

#gpu

AMD and OpenAI announce strategic partnership to deploy 6 gigawatts of AMD GPUs

OpenAI Blog ↗ · 2025-10-06 Cached

AMD and OpenAI announce a strategic partnership to deploy 6 gigawatts of AMD Instinct GPUs, with an initial 1 gigawatt deployment starting in H2 2026. AMD will issue OpenAI warrants for up to 160 million shares, with vesting tied to deployment milestones and financial targets.

#gpu

Introducing Stargate Norway

OpenAI Blog ↗ · 2025-07-31 Cached

OpenAI announces Stargate Norway, its first European AI data center initiative in Narvik, planned to deliver 100,000 NVIDIA GPUs by end of 2026 with 230MW capacity powered entirely by renewable hydropower. The facility is a joint venture between Nscale and Aker, reflecting OpenAI's broader expansion of AI infrastructure partnerships across Europe and globally.
