Why can't people just run Gemini and Claude Code using their own GPUs?
Summary
A commentary questioning why users cannot run Gemini and Claude Code locally on their own GPUs, implying compute cost constraints are limiting access to these AI models.
Similar Articles
Ask HN: Has anyone replaced Claude/GPT with a local model for daily coding?
A Hacker News discussion explores whether developers can replace cloud AI models like Claude with local models for daily coding. Participants share experiences, noting that local models (e.g., Qwen, Gemma) are viable for hobbyists but still lag behind top cloud models for professional use.
You don't need a GPU to run gemma-4-26B-A4B
The author demonstrates that the Gemma-4-26B-A4B model runs efficiently on a CPU-only system using KoboldCpp, achieving 7 tokens per second on an old desktop, suggesting that powerful GPUs may not be necessary for local LLM inference.
What impedes apps using AI to make the user’s device the server running a local LLM?
A user reflects on why more apps don’t run local LLMs directly on phones, noting Gemma 2-4B models already work offline and could eliminate server costs while maintaining near-GPT-4o quality.
The GPUless Revolution: How Efficient AI Models Are Democratizing Artificial Intelligence
A quiet revolution is making powerful AI models runnable on consumer hardware without expensive GPUs, thanks to breakthroughs in quantization and optimized implementations like llama.cpp's Gemma4 MTP support, democratizing access for hobbyists, small businesses, and edge computing.
@GergelyOrosz: A few days ago, Steve posted about how AI usage at Google is surprisingly low, in good part because Gemini is ju…
Internal resistance and policy restrictions limit Google's adoption of its own Gemini model, with employees preferring disallowed tools like Claude Code.