The article presents benchmark results for eight local LLMs on an RTX 3090, showing that power efficiency peaks at around 225 W, with diminishing returns as the card approaches maximum power.
The author reports successfully running MRCR v2 at a 1M context length on a single MI300X using Qwen2.5-32B and FAISS, achieving competitive scores at low cost.
KernelBench-X is a new benchmark for evaluating LLM-generated GPU kernels, revealing that task structure impacts correctness more than method design and that correctness does not guarantee hardware efficiency.