performance-tuning

#performance-tuning

NVIDIA Exemplar Cloud: Lessons for Unlocking Full Performance on AI Infrastructure (12 minute read)

TLDR AI ↗ · 2d ago Cached

NVIDIA shares debugging lessons from its Exemplar Cloud program, detailing how configuration issues in SMMU power management, NUMA placement, NCCL queue-pair concurrency, and hardware defects cause 8-12% training throughput gaps on AI clusters, and how to diagnose and fix them.

Getting the most out of MTP

Reddit r/LocalLLaMA ↗ · 2026-07-24

A guide to optimizing MTP (Multi-Token Prediction) performance by tuning the n_max parameter, with benchmark results for models such as Gemma-31b and Qwen on P100 and V100 GPUs.

My learnings from optimizing training pipeline to go from 36 steps/minute to 47 steps/minute

Reddit r/LocalLLaMA ↗ · 2026-07-21

The author shares techniques that improved training pipeline performance from 36 to 47 steps per minute.

We scaled PgBouncer to 4x throughput

Hacker News Top ↗ · 2026-07-11 Cached

ClickHouse Managed Postgres scales PgBouncer to 4x throughput by running a fleet of processes with SO_REUSEPORT, enabling multi-core utilization and solving cancellation forwarding via peering.

Tune Code Before Your Garbage Collector

Hacker News Top ↗ · 2026-07-08 Cached

Benchmarking shows that optimizing Java code (e.g., reducing SLF4J logging) has a far greater impact on latency than choosing a garbage collector, especially at high percentiles.

@SaitoWu: https://x.com/SaitoWu/status/2069076084495438186

X AI KOLs Timeline ↗ · 2026-06-22 Cached

Describes using the Codex AI agent to automatically migrate a terminal shell configuration from Oh My Zsh to Zinit + Starship + Rust toolchain. The agent carries out engineering steps such as backup, key isolation, and performance analysis, and the migration yields an order-of-magnitude improvement in startup speed.

@charles_irl: Tried to squeeze the most important bits about the entire stack for cloud deployment of transformer inference, from app…

X AI KOLs Following ↗ · 2026-06-10 Cached

An overview of the full technology stack for cloud deployment of Transformer inference, covering application scenarios, workload definition, models, inference engines, hardware, observability, and performance optimization, along with future trends.

Qwen 3.6-35B-A3B with 977 tk/s prompt processing and 262k context window on Intel Arc B70 Pro

Reddit r/LocalLLaMA ↗ · 2026-06-02 Cached

This article describes how to use the SYCL backend with llama.cpp to achieve over 60 tokens per second on the Qwen 3.6-35B-A3B model using an Intel Arc Pro B70 GPU, with the entire model and KV cache in VRAM.
