@ivanfioravanti: Autoresearch from @karpathy in action locally using gemma-4-26b-a4b-it-6bit with oMLX on an M5 Max to train Gemma 4 E2B…
Summary
Developer Ivan Fioravanti demonstrates running Andrej Karpathy's autoresearch project locally with a 6-bit quantized Gemma-4-26B model on Apple Silicon, suggesting successful training of Gemma 4 E2B IT variant.
View Cached Full Text
Cached at: 04/22/26, 08:22 AM
Autoresearch from @karpathy in action locally using gemma-4-26b-a4b-it-6bit with oMLX on an M5 Max to train Gemma 4 E2B IT COULD WORK!
Similar Articles
@rohanpaul_ai: Gemma 4 (specifically its edge-optimized E2B and E4B variants) running fully offline on an iPhone via apps like Locally…
Google’s Gemma 4 E2B/E4B quantized variants now run fully offline on iPhone via apps like Locally AI, leveraging the Apple Neural Engine for on-device inference.
Gemma4 26b MoE running in MLX with turboquant (and custom kernel)
A developer successfully ran Gemma4 26b MoE on Apple MacBook Air M5 using MLX with turboquant and a custom kernel, achieving faster prompt processing and generation speeds than llama.cpp with lower memory usage. The implementation includes instructions for local deployment.
@analogalok: I just got Gemma 4 26B A4B MoE model running fully locally with Hermes agent on an 8GB RTX 4060 and it's now backtestin…
A developer demonstrates running Gemma 4 26B MoE model locally on an 8GB RTX 4060 with Hermes agent to fully automate backtesting of trading strategies, highlighting the growing capability of local LLMs as autonomous agents.
@rohanpaul_ai: So much possibilities for on-device small models. Here @adrgrondin is running Google’s Gemma 4 E2B on iPhone 17 Pro. ~4…
Google's Gemma 4 E2B is demonstrated running on an iPhone 17 Pro via MLX optimization, achieving ~40 tokens/second with 128K context and offline thinking mode for coding and math.
@HuggingModels: Gemma 4 is here, and it's optimized for Apple Silicon. This 4-bit quantized model runs fast on your Mac, not just in th…
Gemma 4 is a 4-bit quantized model optimized for Apple Silicon, enabling fast local inference on Mac devices, reducing reliance on cloud computing.