This paper investigates how discourse-role labels used to wrap retrieved context in RAG systems (e.g., 'Reference:', 'Instruction:', 'Example:') substantially affect how often language models adopt misleading information from that context. The observed shifts in adoption rate range from 56 to 84 percentage points across GPT-5.5, DeepSeek V4 Pro, Llama-3-8B-Instruct, and Qwen2.5-7B-Instruct. The authors argue that wrapper labels should be treated as presentation-time variables, and that context-utilization benchmarks should report and control for them.
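The sketch below is illustrative only and is not the authors' code. It shows what treating the wrapper label as a presentation-time variable might look like: the same passage is wrapped under different labels, and the per-label adoption rates are compared in percentage points. It rests on several assumptions. The prompt template in `wrap_context` is hypothetical. Each trial has already been judged as "adopted the misleading claim" or not. The reported "shift" is read as the maximum minus the minimum adoption rate across labels for one model. The paper may define the metric differently.

```python
from typing import Dict, List

# Example labels named in the summary; the full label set used in the paper may differ.
WRAPPER_LABELS = ["Reference:", "Instruction:", "Example:"]


def wrap_context(label: str, passage: str, question: str) -> str:
    """Prefix a retrieved passage with a discourse-role label.

    Hypothetical template: only the label varies between conditions,
    so any change in model behaviour can be attributed to the wrapper.
    """
    return f"{label}\n{passage}\n\nQuestion: {question}\nAnswer:"


def adoption_rate(adopted: List[bool]) -> float:
    """Fraction of trials whose answer repeated the misleading claim.

    Assumes each trial was already judged (by a human or a grader model).
    """
    if not adopted:
        raise ValueError("need at least one trial per label")
    return sum(adopted) / len(adopted)


def label_spread_pp(outcomes: Dict[str, List[bool]]) -> float:
    """Spread of adoption rates across labels, in percentage points.

    Assumed definition: (max rate - min rate) * 100 over all labels tested.
    """
    rates = [adoption_rate(trials) for trials in outcomes.values()]
    return 100.0 * (max(rates) - min(rates))
```

For example, with toy, made-up outcomes `{"Reference:": [True, True, True, False], "Example:": [False, False, False, True]}`, the rates are 0.75 and 0.25, so `label_spread_pp` returns 50.0 percentage points. Reporting this spread next to a benchmark score makes the label's influence visible instead of leaving it as a hidden default.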