DeepSeek announced DSpark, described in a video as a breakthrough that is significantly faster than multi-token prediction (MTP).
A llama.cpp PR significantly improves prompt processing speed on Intel Arc GPUs: a benchmark on a B580 shows an increase from 245 t/s to 462 t/s, roughly 1.9x. The improvement currently works only with an F16 KV cache, with support for other quantization types planned.
NVIDIA released LocateAnything, an open-source model that achieves ~10x faster object detection by predicting all coordinates simultaneously instead of sequentially. It reaches 12.7 FPS on a single H100 and outperforms 32B-parameter models.
llama.cpp adds MTP support for Qwen3.6 models, boosting generation speed by 78% on A10G hardware and making local models viable as daily drivers.
One post speculates that if Claude 5.5 became 20x faster, users could talk and code live, with the interface updating in real time as they speak.