Tag
Qualcomm is aggressively expanding in AI through multiple acquisitions, including Modular (creator of Mojo and the MAX inference framework) and potentially Tenstorrent, signaling a significant push against Nvidia's CUDA ecosystem.
A detailed comparison of local AI hardware by memory capacity, bandwidth, and software stack, covering GPUs, Apple Silicon, AMD, Intel, Tenstorrent, and others, with a focus on which bottlenecks matter for AI inference.
Steeve Morin reports running Llama 3.1 3B on Tenstorrent hardware via ZML, achieving 26 tok/s, about 79% of Tenstorrent's claimed 33 tok/s.