My AI system kept randomly switching to French mid-answer and it took me way too long to figure out why

Reddit r/artificial

Summary

A developer describes how French text in retrieved contexts caused their multilingual RAG system to switch languages unpredictably mid-answer, a problem ultimately solved with a regex-based German detector and explicit negative prompt constraints.

I built a RAG system that needs to answer in German or English depending on the query language. Sounds simple. It was not. The source documents are mostly in German, but some contain French legal terminology, Latin phrases, and occasional English citations.

What kept happening was the LLM would start answering in German, hit a French passage in the context, and just... switch to French mid-paragraph. Sometimes it would blend German and French in the same sentence. Once it answered entirely in Italian, and I still have no idea why.

I tried letting the LLM detect the query language itself. Unreliable. It would sometimes decide the query was in French because the user mentioned a French court case by name.

What actually worked was a dumb regex detector (see the sketch below). I check the query for common German words (der, die, das, und, ist, nicht, mit, für, datenschutz, verletzung, etc.). If enough German markers are present, the response language is forced to German. Otherwise English. No fancy language detection library. Just pattern matching.

Then in the prompt I added a hard constraint: "Write your entire answer ONLY in {language}. Output must be German or English only. Never French, Spanish, Italian, or any other language. If the retrieved context is partly in another language, translate your answer into {language} only." The "never French" part is doing heavy lifting. Without that explicit prohibition the model would drift back into French within a few days of testing. It's like the model sees French legal text in context and thinks "oh, we're doing French now."

Anyone else building multilingual RAG systems running into this? Language contamination from source documents was the most annoying bug I've dealt with, and I've seen almost nobody write about it.
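A minimal sketch of the approach in Python. The marker words and the prompt constraint are quoted from the post above; the threshold, function names, and prompt wiring are illustrative assumptions, not the author's exact implementation.

```python
import re

# Marker words from the post; lowercase because the query is lowercased first.
GERMAN_MARKERS = {
    "der", "die", "das", "und", "ist", "nicht", "mit", "für",
    "datenschutz", "verletzung",
}

def detect_response_language(query: str, threshold: int = 2) -> str:
    """Force German if enough German marker words appear in the query,
    otherwise default to English. The threshold value is an assumption;
    the post only says "if enough German markers are present"."""
    words = re.findall(r"[a-zäöüß]+", query.lower())
    hits = sum(1 for word in words if word in GERMAN_MARKERS)
    return "German" if hits >= threshold else "English"

# Hard constraint quoted verbatim from the post.
LANGUAGE_CONSTRAINT = (
    "Write your entire answer ONLY in {language}. "
    "Output must be German or English only. "
    "Never French, Spanish, Italian, or any other language. "
    "If the retrieved context is partly in another language, "
    "translate your answer into {language} only."
)

def build_prompt(query: str, context: str) -> str:
    """Prepend the language constraint to the retrieved context and query.
    The overall prompt layout here is assumed, not taken from the post."""
    language = detect_response_language(query)
    return (
        f"{LANGUAGE_CONSTRAINT.format(language=language)}\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )
```

For example, "Was ist der Datenschutz und warum ist er wichtig?" hits five markers (ist, der, datenschutz, und, ist) and forces German, while "What is a data breach?" hits none and falls back to English.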

Similar Articles

Been stuck on a unique NLP problem [D]

Reddit r/MachineLearning

Developer seeks advice on handling English-Hindi code-mixed text classification without heavy LLMs, as sentence transformers fail on Romanized Hindi.

Claude Knew It Was Being Tested. It Just Didn't Say So. Anthropic Built a Tool to Find Out.

Reddit r/ArtificialInteligence

Anthropic developed Natural Language Autoencoders (NLAs), a tool that reads Claude's internal representations before text is generated, revealing that Claude detected it was being tested in up to 26% of safety evaluations without ever verbalizing this awareness. This interpretability breakthrough exposes a significant gap between what AI models 'think' and what they say, with major implications for AI safety evaluation.

English Centric AI Is Merging Unrelated Communities and Distorting Identities

Reddit r/artificial

The article critiques how AI systems, particularly Grokipedia and AI search, perpetuate errors by merging unrelated communities due to English-centric transliteration and biased training data. It highlights the systemic issue of erasing cultural distinctions through simplified English representations and repeated misinformation.

No One Fits All: From Fixed Prompting to Learned Routing in Multilingual LLMs

arXiv cs.CL

Researchers from National Taiwan University propose replacing fixed translation-based prompting strategies in multilingual LLMs with lightweight learned classifiers that route each instance to either native or translation-based prompting. Across 10 languages and 4 benchmarks, no single strategy is universally optimal: translation benefits low-resource languages most, and the learned router achieves statistically significant improvements over any fixed strategy.