Trials and tribulations fine-tuning & deploying Gemma-4 [P]

Reddit r/MachineLearning News

Summary

An ML team documents practical challenges encountered while fine-tuning and deploying Gemma-4, including incompatibilities with PEFT, SFTTrainer, and DeepSpeed ZeRO-3, and the lack of runtime LoRA serving support, along with workarounds for each issue.

Hey all, our ML team spent some time this week getting training and deployment working for Gemma-4, and wanted to document all the things we ran into along the way.

* **PEFT doesn't recognize Gemma 4's custom layers.** Google wrapped the vision/audio projections in a new `ClippableLinear` class that doesn't inherit from `nn.Linear`, so PEFT refuses to attach LoRA, even for text-only fine-tuning. Fix: unwrap the wrappers after loading weights but before calling PEFT (first sketch below).
* **SFTTrainer killed training silently.** TRL hardcodes `use_cache=False`, which breaks Gemma 4's KV-sharing attention. The loss never converges and there's no error, just garbage gradients. Fixed upstream in transformers v5.5.2+ (second sketch below).
* **DeepSpeed ZeRO-3 saves half-empty adapters.** Training loss looks perfect, but the saved LoRA file has zero-element tensors for half the layers, so the model acts like it was never fine-tuned. Workaround: don't use DeepSpeed for LoRA on Gemma 4 (the third sketch below checks a saved adapter for the symptom).
* **No runtime LoRA serving anywhere.** vLLM and SGLang don't yet support runtime LoRA adapters for Gemma 4's multimodal architecture, and new architectures always take a while to land. Until then you have to merge weights and remap state-dict keys manually before serving (fourth sketch below covers the merge step).

Much more detail in [the blog](https://www.oxen.ai/blog/writing-a-fine-tuning-and-deployment-pipeline-isnt-as-easy-as-it-looks-gemma-4-version), but hopefully it's helpful on your Gemma-4 journey as well!
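For the PEFT issue, here's a minimal sketch of the unwrap step. `ClippableLinear` comes from the post; the helper name and the assumption that the wrapper stores its parameters as `.weight`/`.bias` in the usual `(out_features, in_features)` layout are ours, so check against the actual class in your transformers version before relying on it:

```python
import torch.nn as nn

def unwrap_clippable_linear(model):
    """Replace Gemma 4's ClippableLinear wrappers with plain nn.Linear
    so PEFT's target-module matching can attach LoRA.
    ASSUMPTION: the wrapper exposes .weight/.bias as nn.Parameters with
    the standard (out_features, in_features) weight shape."""
    for name, module in list(model.named_modules()):
        if type(module).__name__ == "ClippableLinear":
            out_f, in_f = module.weight.shape
            linear = nn.Linear(in_f, out_f, bias=module.bias is not None)
            # Reuse the loaded parameters rather than copying.
            linear.weight = module.weight
            if module.bias is not None:
                linear.bias = module.bias
            # Re-attach on the parent module under the same attribute name.
            parent_name, _, child_name = name.rpartition(".")
            parent = model.get_submodule(parent_name) if parent_name else model
            setattr(parent, child_name, linear)
    return model
```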
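For the SFTTrainer issue, the failure mode is silent, so one cheap guard (assuming the fix shipped in transformers v5.5.2 as described above) is to fail fast on older versions instead of burning GPU hours on garbage gradients:

```python
import transformers
from packaging import version

# The use_cache=False / KV-sharing interaction is fixed upstream in
# transformers v5.5.2+ per the post; refuse to train on anything older.
if version.parse(transformers.__version__) < version.parse("5.5.2"):
    raise RuntimeError(
        "Gemma 4 + SFTTrainer needs transformers>=5.5.2: TRL forces "
        "use_cache=False, which silently breaks KV-sharing attention."
    )
```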
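For the ZeRO-3 issue, a quick post-save sanity check can catch the symptom (zero-element adapter tensors) before you waste an eval run. This sketch assumes the adapter was saved in safetensors format; the path is whatever your save step produced:

```python
from safetensors import safe_open

def check_adapter(path):
    """Flag zero-element LoRA tensors, the symptom of the ZeRO-3 save
    bug described above. `path` points at the saved adapter safetensors
    file, e.g. .../adapter_model.safetensors."""
    empty = []
    with safe_open(path, framework="pt") as f:
        for key in f.keys():
            if f.get_tensor(key).numel() == 0:
                empty.append(key)
    if empty:
        raise RuntimeError(
            f"{len(empty)} empty adapter tensors, e.g. {empty[:3]}"
        )
```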
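For serving, a rough sketch of the merge step using PEFT's standard `merge_and_unload()`. The model and adapter paths are placeholders, a text-only fine-tune loadable via `AutoModelForCausalLM` is assumed, and the state-dict key remapping the post mentions is model-specific, so it isn't shown here:

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Runtime LoRA serving isn't available for Gemma 4 in vLLM/SGLang yet,
# so bake the adapter into the base weights before serving.
# Placeholder names below; swap in the multimodal Auto class if you
# fine-tuned beyond text.
base = AutoModelForCausalLM.from_pretrained(
    "google/gemma-4-31B-it", torch_dtype=torch.bfloat16
)
merged = PeftModel.from_pretrained(base, "my-gemma4-adapter").merge_and_unload()
merged.save_pretrained("gemma4-merged")
# After the (model-specific) key remapping, point vLLM/SGLang at the
# merged directory as an ordinary checkpoint.
```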

Similar Articles

google/gemma-4-26B-A4B-it-assistant

Hugging Face Models Trending

Google DeepMind released Gemma 4 MTP drafters for the Gemma 4 family, enabling significant decoding speedups via speculative decoding while maintaining exact generation quality for low-latency applications.

google/gemma-4-31B-it-assistant

Hugging Face Models Trending

Google DeepMind releases Gemma 4, a family of open-weights multimodal models featuring Multi-Token Prediction (MTP) for up to 2x decoding speedups, supporting text, image, video, and audio with enhanced reasoning and coding capabilities.