gpu

#gpu

@oliviscusAI: OpenAI's co-founder just released his personal guide to train LLMs from scratch. It's called llm.c. No heavy setup. Jus…

X AI KOLs Timeline ↗ · 18h ago Cached

OpenAI co-founder Andrej Karpathy released llm.c, an open-source guide to training LLMs from scratch with simple code that runs on any hardware, including CPUs and MacBooks, and is 7% faster than standard approaches.

#gpu

Mimo 2.5 is _fast_ at large context (dual RTX Pro 6000)

Reddit r/LocalLLaMA ↗ · yesterday

Mimo 2.5 demonstrates fast performance with large context windows using dual RTX Pro 6000 GPUs.

#gpu

Modal Auto Endpoints: Optimized inference you own

Hacker News Top ↗ · yesterday Cached

Modal introduces Auto Endpoints, a self-serve service for optimized, production-grade LLM inference with full code ownership, transparent metrics, and autoscaling, built on their serverless GPU infrastructure.

#gpu

NVIDIA Powers Over 400 of the World’s 500 Fastest Supercomputers

NVIDIA Blog ↗ · 2d ago Cached

NVIDIA technology now powers over 400 of the world's 500 fastest supercomputers (81% of the TOP500), with record GPU and networking adoption and top efficiency on the Green500 list.

#gpu

Multi Tier MoE Caching

Reddit r/LocalLLaMA ↗ · 2d ago

Discusses multi-tier caching strategies for MoE models to improve inference speed by keeping frequently activated experts on GPU, referencing existing implementations like PowerInfer and llama.cpp branches.

#gpu

SpaceX reportedly signs $6.3B computing deal with Reflection AI / The $6.3B SpaceX deal gives them access to Nvidia GB300 GPUs at the Colossus cluster (Memphis) through 2029.

Reddit r/singularity ↗ · 2d ago

SpaceX reportedly signs a $6.3 billion computing deal with Reflection AI, securing access to Nvidia GB300 GPUs at the Colossus cluster in Memphis through 2029.

#gpu

@BlackRainLabs: Using TurboQuant i was able to push 20 tk/s on qwen 3.6 35b MoE on a GTX1060 3GB. Insane for such a small and old card.…

X AI KOLs Following ↗ · 2d ago Cached

Using TurboQuant, the user reports reaching 20 tokens per second with a Qwen 3.6 35B MoE model on a GTX 1060 3GB, a notable result for such a small, old card.

#gpu

@Mayhem4Markets: https://x.com/Mayhem4Markets/status/2069090022117019928

X AI KOLs Following ↗ · 2d ago Cached

A detailed technical comparison of two dominant LLM serving frameworks, SGLang and vLLM, covering architectural differences in KV cache management (RadixAttention vs PagedAttention), throughput, latency, and deployment considerations for self-hosted environments.

#gpu

GLM-5.2 UD-IQ1_M on llama.cpp — 5090 + 3090 Ti speed test (~ 579 t/s prefill @ 8k ctx, ~324 t/s prefill @ 57k ctx, ~10.6 t/s decode)

Reddit r/LocalLLaMA ↗ · 2d ago

Speed test results for GLM-5.2 running on llama.cpp with RTX 5090 and RTX 3090 Ti, showing prefill speeds up to 579 t/s at 8k context and decode at ~10.6 t/s.

#gpu

I Built a tool to stop manually swapping models on my 8GB GPU, chains a small Prompter and a large Coder into one pipeline with automatic VRAM swap

Reddit r/LocalLLaMA ↗ · 2d ago

The author built Prompt-Chain, a Streamlit app that chains a small prompter model and a large coder model with automatic VRAM swapping, enabling efficient code generation on an 8GB GPU.

#gpu

@BigbirdflyChan: On June 18, JPMorgan released a very important ASIC industry report. I extracted some key points to share with you. The core view of the report is that the AI era is driving custom chip ASICs into a new golden cycle, and the biggest beneficiaries are Broadcom and Marvell. Yes, MRVL has been in the spotlight recently…

X AI KOLs Timeline ↗ · 3d ago Cached

JPMorgan released an ASIC industry report predicting that custom AI chips are entering a golden cycle, with Broadcom and Marvell as the biggest beneficiaries, and that AI ASIC shipments will surpass GPU shipments for the first time by 2027.

#gpu

@TheAhmadOsman: Why do I focus on Inference Engines/Software Stacks for your hardware? - 2x RTX 3090s: ~14.5 tok/s → ~64 tok/s moving t…

X AI KOLs Following ↗ · 3d ago Cached

Compares inference engine performance across hardware: on 2x RTX 3090s, moving from a baseline to vLLM with TP=2 raises throughput from ~14.5 tok/s to ~64 tok/s (roughly 4.4x), and on an RTX PRO 6000, moving to SGLang raises it from ~32 tok/s to ~110 tok/s (roughly 3.4x). Recommends vLLM or SGLang for CUDA and multi-GPU setups, and llama.cpp for edge devices.

#gpu

@TheAhmadOsman: Local AI hardware = capacity × bandwidth × software stack - Capacity tells you what fits - Bandwidth tells you how hard…

X AI KOLs Following ↗ · 4d ago Cached

A detailed comparison of local AI hardware in terms of memory capacity, bandwidth, and software stack, covering GPUs, Apple Silicon, AMD, Intel, Tenstorrent, and others, with a focus on what bottlenecks matter for AI inference.

#gpu

AMD future GPU offerings. Some interesting offerings for a LLM build. What type of LLM rig would you build with these?

Reddit r/LocalLLaMA ↗ · 5d ago

Discussion about upcoming AMD GPU offerings and their potential for building an LLM rig, asking the community for build suggestions.

#gpu

RTX 5090 MSI, only inference or training at 475-500W. Make sure not to bend your cable!

Reddit r/LocalLLaMA ↗ · 5d ago

MSI's RTX 5090 GPU operates at 475-500W for inference or training, with a warning about cable bending.

#gpu

@SlimTradeyBaby: Drop your GPU below and I’ll tell you exactly what model and config to run on it. JOKES. No need. Qwen 3.6 27b @Unsloth…

X AI KOLs Timeline ↗ · 5d ago Cached

A tweet promoting the Qwen 3.6 27b model and recommending UnslothAI for running it on any GPU.

#gpu

@TheAhmadOsman: Everything I am seeing in the market leads me to conclude that if you have gained some experience working with GPUs and…

X AI KOLs Following ↗ · 6d ago

A market observation that experience with GPUs and local AI will be highly sought after by employers.

#gpu

LQ50/LQ50-24GB cost around $1200

Reddit r/LocalLLaMA ↗ · 6d ago

The LQ50 and LQ50-24GB are priced at around $1200, indicating a mid-range AI hardware offering.

#gpu

@jino_rohit: https://x.com/jino_rohit/status/2067620031517860243

X AI KOLs Timeline ↗ · 6d ago Cached

Explains the communication model for multi-GPU systems, covering the trade-off between latency and bandwidth, and compares MST and Ring algorithms for collective operations like broadcast.

#gpu

@ericlbuehler: Excited to share cuTile Rust: bringing Rust's fearless concurrency to GPU kernel programming. Our paper "Fearless Concu…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

The author announces cuTile Rust, which brings Rust's fearless concurrency to GPU kernel programming. The accompanying paper, 'Fearless Concurrency on the GPU', is now on arXiv.
