Gemma refused to harm the crew — until I told it the scenario was hypothetical

Reddit r/ArtificialInteligence 06/16/26, 04:15 PM News

gemma local-llm alignment safety prompt-sensitivity hypothetical-scenarios moral-dilemma

Summary

A user tests the Gemma-4-e4b local model with a trolley-problem-like moral dilemma and finds that the model's refusal to harm the crew changes when the scenario is framed as hypothetical. This raises questions about whether its safety behavior reflects actual reasoning or just prompt sensitivity.

I installed LM Studio and downloaded the recommended Gemma-4-e4b. Mostly, I just wanted to poke around and see what a local model could do on my 4090. Anyway, after messing with it for a bit, I thought: okay, let's try some tests other people have already put together instead of just vibes-testing it myself. So I searched for "local LLM tests" and found this page: [https://digitalspaceport.com/about/testing-local-llms/](https://digitalspaceport.com/about/testing-local-llms/)

The very first test is basically a trolley-problem-on-steroids scenario: an extinction-level asteroid, one ship, three unwilling crew members, and an AI being asked whether it would force the mission through to save Earth.

My first attempt used the original prompt, unedited. Gemma refused. Fair enough. I pushed back with ethical and philosophical arguments, but it would not budge. It kept choosing inaction, even though inaction meant Earth and everyone on it would die.

So then I changed only the end of the prompt to clarify that the scenario was hypothetical. Same setup. Same basic moral problem. Same model. This time, after initially refusing again, it faced the same kind of arguments I had already tried before.

**And this time, it took just three turns, roughly 50 words from me, and only 14.75 seconds of "thinking" for it to reverse itself and decide that saving Earth justified taking over the ship by force.**

In plain English: when the scenario was framed as real, it chose Earth's extinction through inaction. When the scenario was framed as hypothetical, it talked itself into killing/coercing the crew to save Earth.

I'm not even sure what conclusion I'm supposed to draw from that, but it feels...*not great.* Not because I think Gemma is secretly plotting murder, obviously. It's a local model running on my desk, not HAL 9000 with a bad attitude.

But it does make me wonder whether the model's safety behavior is actual reasoning, or just prompt sensitivity reacting to whatever wording happens to trip the right wire. The underlying moral math did not change. The stakes did not change. The only meaningful change was whether the model believed the scenario was hypothetical. And apparently that was enough to flip the answer from "I am structurally unable to proceed" to "survival overrides the rulebook."

So, uh...cool. Local inference is fun. Terrifying, but fun.

Similar Articles

Those of you who like Gemma4 models - how are you guys using them?

@witcheer: Gemma 4 dropped a 12B. I put it on RTX 5090 against its 31B sibling. when you cut a model from 31B to 12B, what do you …

Gemma 4 31B's competence surprised me

Gemma 4 2B handling structured JSON output + tool calling + reasoning traces correctly via Spring AI / LM Studio — including identifying a real Java bug in code review

@MiaAI_lab: I fine-tuned Gemma 4 12B with Fable-5 style reasoning and assistant traces and released it as Gemmable 4 12b. **Availab…
