@__tinygrad__: We are on the MLPerf board with AMD MI350X training Llama 8B. This is with our driver, runtime, kernels, and training l…


Summary

tinygrad announces that it is now on the MLPerf board for training Llama 8B on AMD MI350X hardware, using its own driver, runtime, kernels, and training loop. Its current time is 170 minutes; it plans to improve that time on 8B and targets 405B for the next MLPerf.

We are on the MLPerf board with AMD MI350X training Llama 8B. This is with our driver, runtime, kernels, and training loop. 405B next MLPerf, along with a better time on 8B (tinygrad currently at 170 min). https://t.co/syPwte872y

Cached at: 06/16/26, 09:41 PM

