Trials and tribulations fine-tuning & deploying Gemma-4 [P]
Summary
An ML team documents practical challenges encountered while fine-tuning and deploying Gemma-4, including incompatibilities with PEFT, SFTTrainer, and DeepSpeed ZeRO-3 and the lack of runtime LoRA serving support, along with workarounds for each issue.
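For context, the workflow the post describes usually runs through PEFT's LoraConfig and TRL's SFTTrainer. The sketch below shows that baseline setup only; the model id is taken from the listing further down, and the dataset, rank, and target modules are illustrative guesses rather than details from the article.

```python
# Minimal LoRA + SFTTrainer sketch, assuming a standard PEFT/TRL setup.
# Model id comes from the model-card entry below; dataset and hyperparameters
# are placeholders, not values reported by the post.
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

model_id = "google/gemma-4-26B-A4B-it-assistant"

# Target modules typically need adjusting for MoE blocks, which is one place
# the post reports PEFT friction.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)

# Any chat-style SFT dataset works here; this one is just an example.
dataset = load_dataset("trl-lib/Capybara", split="train")

trainer = SFTTrainer(
    model=model_id,
    train_dataset=dataset,
    peft_config=lora_config,
    args=SFTConfig(
        output_dir="gemma4-lora-sft",
        per_device_train_batch_size=1,
        gradient_checkpointing=True,
        bf16=True,
    ),
)
trainer.train()
```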
Similar Articles
Those of you who like Gemma4 models - how are you guys using them?
A developer shares their mixed experience running Gemma4 and Qwen locally for coding tasks, noting issues with tool integration, loop handling, and task completion while asking the community for better usage strategies.
google/gemma-4-26B-A4B-it-assistant
Google DeepMind released Gemma 4 MTP drafters for the Gemma 4 family, enabling significant decoding speedups via speculative decoding while maintaining exact generation quality for low-latency applications.
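As a rough illustration of the drafter idea, the sketch below uses transformers' assisted generation, where a small draft model proposes tokens and the target verifies them in one pass, so outputs match plain decoding while latency drops. The drafter checkpoint id is hypothetical, and the released MTP drafters may integrate differently (for example, through a dedicated serving stack rather than as a standalone assistant model).

```python
# Sketch of draft-model speculative decoding via transformers' assisted generation.
# The drafter id below is a placeholder; the actual MTP drafter integration may differ.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

target_id = "google/gemma-4-26B-A4B-it-assistant"
drafter_id = "google/gemma-4-mtp-drafter"  # hypothetical checkpoint id

tokenizer = AutoTokenizer.from_pretrained(target_id)
target = AutoModelForCausalLM.from_pretrained(
    target_id, torch_dtype=torch.bfloat16, device_map="auto"
)
drafter = AutoModelForCausalLM.from_pretrained(
    drafter_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tokenizer(
    "Explain speculative decoding in one paragraph.", return_tensors="pt"
).to(target.device)

# The target verifies the drafter's proposed tokens, preserving exact outputs.
out = target.generate(**inputs, assistant_model=drafter, max_new_tokens=128)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```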
@ivanfioravanti: Autoresearch from @karpathy in action locally using gemma-4-26b-a4b-it-6bit with oMLX on an M5 Max to train Gemma 4 E2B…
Developer Ivan Fioravanti demonstrates running Andrej Karpathy's autoresearch project locally with a 6-bit quantized Gemma-4-26B model on Apple Silicon, suggesting successful training of the Gemma 4 E2B IT variant.
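For readers who want to try a similar local setup, the sketch below uses the standard mlx-lm path on Apple Silicon; "oMLX" in the tweet may be a different front end, and the quantized model id is a guess at a community MLX convert rather than a confirmed repo.

```python
# Minimal sketch of local inference with mlx-lm on Apple Silicon.
# The model id is a hypothetical 6-bit MLX conversion, not a verified repo name.
from mlx_lm import load, generate

model, tokenizer = load("mlx-community/gemma-4-26b-a4b-it-6bit")

prompt = "Draft a short research plan for evaluating small drafter models."
print(generate(model, tokenizer, prompt=prompt, max_tokens=256))
```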
Personal Eval follow-up: Gemma4 26B MoE (Q8) vs Qwen3.5 27B Dense vs Gemma4 31B Dense Compared
Personal benchmark shows Qwen3.5-27B Dense and Gemma4-31B Dense fix 100% of 37 test failures, outperforming Gemma4-26B MoE even at 8-bit quantization, while using fewer tokens and less wall-clock time.
google/gemma-4-31B-it-assistant
Google DeepMind releases Gemma 4, a family of open-weights multimodal models featuring Multi-Token Prediction (MTP) for up to 2x decoding speedups, supporting text, image, video, and audio with enhanced reasoning and coding capabilities.
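For the multimodal side, the sketch below shows how an image-plus-text prompt is typically run through a recent transformers checkpoint, assuming Gemma 4 exposes the same processor and chat-template interface as earlier Gemma releases; the image URL and generation settings are placeholders.

```python
# Sketch of image+text inference, assuming a Gemma 3-style processor/chat-template API.
# The image URL is a placeholder.
import torch
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "google/gemma-4-31B-it-assistant"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://example.com/chart.png"},
            {"type": "text", "text": "Summarize what this chart shows."},
        ],
    }
]

inputs = processor.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

out = model.generate(**inputs, max_new_tokens=64)
# Strip the prompt tokens and decode only the newly generated continuation.
print(processor.decode(out[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```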