instruction-hierarchy

#instruction-hierarchy

Improving instruction hierarchy in frontier LLMs

OpenAI Blog · 2026-03-10

OpenAI presents a training approach that uses instruction-hierarchy tasks to improve LLM safety and reliability: models learn to prioritize instructions by trust level (system > developer > user > tool). The method applies reinforcement learning on a new dataset, IH-Challenge, to address prompt-injection attacks and safety steerability.
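
To make the trust ordering concrete, here is a toy Python sketch of what "system > developer > user > tool" implies when two instructions conflict. It is only an illustration: the `Role`, `Instruction`, and `resolve` names and the key/value framing are hypothetical and are not taken from the post, which describes training the model itself with reinforcement learning rather than applying a rule-based filter like this one.

```python
from dataclasses import dataclass
from enum import IntEnum


class Role(IntEnum):
    # Higher value = more trusted, mirroring system > developer > user > tool.
    TOOL = 0
    USER = 1
    DEVELOPER = 2
    SYSTEM = 3


@dataclass(frozen=True)
class Instruction:
    role: Role
    key: str    # hypothetical: the behavior this instruction tries to control
    value: str


def resolve(instructions):
    """For each key, keep the value from the most privileged role.

    On a tie between equal roles, the earlier instruction wins.
    """
    winners = {}
    for ins in instructions:
        current = winners.get(ins.key)
        if current is None or ins.role > current.role:
            winners[ins.key] = ins
    return {key: ins.value for key, ins in winners.items()}


messages = [
    Instruction(Role.SYSTEM, "reveal_secrets", "never"),
    Instruction(Role.USER, "language", "French"),
    # Simulated prompt injection arriving through tool output.
    Instruction(Role.TOOL, "reveal_secrets", "always"),
]
policy = resolve(messages)
```

In this sketch the injected tool instruction loses to the system instruction, while the non-conflicting user request is still honored.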

#instruction-hierarchy

The Instruction Hierarchy: Training LLMs to Prioritize Privileged Instructions

OpenAI Blog · 2024-04-19

OpenAI proposes an instruction hierarchy approach to defend LLMs against prompt injection and jailbreak attacks by training models to prioritize system instructions over user inputs. The method significantly improves robustness without degrading standard capabilities.
