When are we getting consumer inference chips?

Reddit r/LocalLLaMA News

Summary

Post questions why no startup has shipped a $200-300 consumer inference chip with Llama 3 baked in, suggesting the industry prefers API subscription revenue over one-time hardware sales.

Dumb question but I genuinely don't get it. Billions of $ poured into AI startups the last few years and nobody has shipped a consumer chip with a model built in? Like a $200 stick that runs Llama 3 at reading speed, 30W, plug into your desktop, done. Taalas is kinda doing this but only aimed at datacenters. Why tho? Today's open-source models are already good enough for 90% of what most people actually need and will still be for years. The "model will be obsolete before the chip tapes out" argument feels weaker every month. Starting to wonder if the whole industry is just trying to milk consumers through API subscriptions forever instead of selling the chip once. Feels like it would be trivially profitable to ship a $300 "Llama in a box" and call it a day, but I guess no one wants the recurring revenue to stop. What am I missing?
