@LiuVaayne: This is a long article by yan5xu (former ManusAI): "From Prompt to Harness: How to Understand LLM Engineering" Core Framework: The Spiral Evolution of Engineering Paradigms First Loop: Prompt Engineering (2022-2024) ...

X AI KOLs Timeline 05/26/26, 04:37 AM News

llm-engineering prompt-engineering context-engineering harness-engineering agents evaluation governance

Summary

This article by yan5xu (former ManusAI) proposes a spiral evolution model for LLM engineering paradigms: from Prompt Engineering (2022-2024) to Context Engineering (2025), then to Harness Engineering (2026-), and discusses the bottlenecks and driving factors at each stage.

This is a long article by yan5xu (ex ManusAI): "From Prompt to Harness: How to Understand LLM Engineering"

Core Framework: The Spiral Evolution of Engineering Paradigms

First Loop: Prompt Engineering (2022-2024)
• Writing good system prompts, designing few-shot examples, tuning prompt structure
• Ceiling: As models become stronger, carefully crafted techniques become constraints instead (Khan's Prompting Inversion: Sculpting scores 97% on GPT-4o but drops to 94% on GPT-5)

Second Loop: Context Engineering (2025)
• Named by Karpathy: "the delicate art and science of filling the context window with just the right information for the next step"
• Managing information flow in dynamically unfolding tasks: how previous step results flow into the next, how environment feedback is injected, how conversation history is compressed
• Typical products: Cursor (dynamically retrieving relevant code), Lovable/Bolt (LSP errors, test failures automatically injected into the next round)
• Limitations: Rules are hardcoded by humans; "context rot" (average performance drop of 39% after multiple rounds)

Third Loop: Harness Engineering (2026-)
• Two steps:
‣ Let go: Give the agent a toolchain (lint, tests, search), let it decide when to use what
‣ Add safety: Define capability boundaries (sandbox, CI round limits, file permissions, structural tests)
• Quantitative validation: Same model, same prompt, with different harness configurations, success rate goes from 42% to 78%

Key quote: "Agents aren't hard; the Harness is hard."

Key Insights
• Not linear replacement, but spiral layering: Inside the harness, context engineering pipelines are still running; inside the pipeline, carefully designed prompts are still being written
• Driving force of each loop's evolution: The previous stage is no longer sufficient, so new practices are forced out
• Next loop bottlenecks already visible: Eval (how to define 'good'? LLM-as-judge has bias) and Governance (multi-agent mutual authentication, auditing, permissions)

Original Article


Cached at: 05/26/26, 07:13 PM

This is a long essay by yan5xu (ex ManusAI): “From Prompt to Harness: How to Understand LLM Engineering”

Core Framework: The Spiral Evolution of Engineering Paradigms

First Loop: Prompt Engineering (2022–2024)

Write good system prompts, design few-shot examples, tune prompt structure
Ceiling: as models get stronger, carefully crafted techniques can turn into constraints (Khan’s “Prompting Inversion”: the Sculpting technique scores 97% on GPT-4o but drops to 94% on GPT-5)
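To make the first-loop practices concrete, here is a minimal sketch of assembling a system prompt plus few-shot examples into one prompt string. The names `Example` and `build_prompt` and the `Q:`/`A:` layout are assumptions made for illustration; the essay does not prescribe any particular format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One few-shot demonstration (hypothetical structure)."""
    question: str
    answer: str


def build_prompt(system: str, examples: list[Example], query: str) -> str:
    """Join a system prompt, few-shot examples, and the final query.

    The Q:/A: layout is an assumption; any consistent layout would do.
    """
    parts = [system.strip(), ""]
    for ex in examples:
        parts.append(f"Q: {ex.question}")
        parts.append(f"A: {ex.answer}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Q: {query}")
    parts.append("A:")  # leave the answer slot open for the model
    return "\n".join(parts)


if __name__ == "__main__":
    print(build_prompt(
        "You are a careful arithmetic assistant.",
        [Example("2 + 2?", "4"), Example("10 / 4?", "2.5")],
        "7 * 6?",
    ))
```

Everything here is fixed before the task starts, which is the point of the "ceiling" above: a hand-tuned template cannot react to what happens mid-task.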

Second Loop: Context Engineering (2025)

Karpathy’s description: “the delicate art and science of filling the context window with just the right information for the next step”
Manage information flow in tasks that unfold dynamically: how previous results flow into the next step, how environment feedback is injected, how conversation history is compressed
Typical products: Cursor (dynamically retrieves relevant code), Lovable/Bolt (LSP errors and test failures are automatically injected into the next round)
Limitations: the rules are hard-coded by humans, and “context rot” sets in (performance drops by 39% on average after multiple rounds)
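The three flows named above (previous results, injected feedback, compressed history) can be sketched as a small context builder. This is an illustrative assumption, not how Cursor, Lovable, or Bolt work: `ContextBuilder`, its section labels, and the truncation-based `_compress` are all hypothetical, and a real system would more likely summarize old turns with a model.

```python
from dataclasses import dataclass, field


@dataclass
class ContextBuilder:
    """Hypothetical helper that decides what the next model call sees."""
    task: str
    max_recent_turns: int = 2  # assumption: keep the last N turns verbatim
    history: list[str] = field(default_factory=list)
    feedback: list[str] = field(default_factory=list)

    def add_turn(self, text: str) -> None:
        """Record the result of a finished step."""
        self.history.append(text)

    def inject_feedback(self, message: str) -> None:
        """Queue environment feedback (e.g. a test failure) for the next round."""
        self.feedback.append(message)

    def _compress(self, turns: list[str]) -> str:
        # Placeholder compression: keep the first 40 characters of each turn.
        return " | ".join(t[:40] for t in turns)

    def build(self) -> str:
        """Assemble the context for the next step and consume queued feedback."""
        cut = max(len(self.history) - self.max_recent_turns, 0)
        old, recent = self.history[:cut], self.history[cut:]
        sections = [f"TASK: {self.task}"]
        if old:
            sections.append(f"EARLIER (compressed): {self._compress(old)}")
        sections.extend(f"RECENT: {t}" for t in recent)
        sections.extend(f"FEEDBACK: {m}" for m in self.feedback)
        self.feedback.clear()  # feedback is shown once, then dropped
        return "\n".join(sections)


if __name__ == "__main__":
    cb = ContextBuilder(task="fix failing test", max_recent_turns=1)
    cb.add_turn("edited parser.py")
    cb.add_turn("ran test suite")
    cb.inject_feedback("test_parse_empty failed: IndexError")
    print(cb.build())
```

Note that every rule in this sketch (how many turns to keep, how to compress, when to drop feedback) is written by a human in advance, which is exactly the limitation the essay points at.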

Third Loop: Harness Engineering (2026–)

Two-step approach:
- Let go: give agents a toolchain (lint, test, search) and let them decide when to use which tool
- Set guardrails: constrain capability boundaries (sandbox, CI round limits, file permissions, structural tests)
Quantitative validation: with the same model and the same prompt, changing only the harness configuration raises the success rate from 42% to 78%
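The two steps can be sketched together: the harness exposes tools and lets the caller pick freely, but enforces a round limit and a file-permission boundary. `Harness`, `GuardrailViolation`, `run_plan`, and the `workspace` root are hypothetical names chosen for illustration, and the scripted plan stands in for decisions a real agent would make itself.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath
from typing import Callable


class GuardrailViolation(Exception):
    """Raised when a request crosses a capability boundary."""


@dataclass
class Harness:
    """Hypothetical harness: free tool choice inside fixed limits."""
    tools: dict[str, Callable[[str], str]]
    writable_root: str = "workspace"  # assumption: one writable directory
    max_rounds: int = 3               # assumption: stands in for a CI round limit
    rounds_used: int = 0

    def call_tool(self, name: str, arg: str) -> str:
        """Run a tool chosen by the agent, subject to the round limit."""
        if self.rounds_used >= self.max_rounds:
            raise GuardrailViolation("round limit reached")
        if name not in self.tools:
            raise GuardrailViolation(f"unknown tool: {name}")
        self.rounds_used += 1
        return self.tools[name](arg)

    def check_write(self, path: str) -> None:
        """Reject writes outside writable_root, including '..' escapes."""
        parts = PurePosixPath(path).parts
        if not parts or ".." in parts or parts[0] != self.writable_root:
            raise GuardrailViolation(f"write outside {self.writable_root}: {path}")


def run_plan(harness: Harness, plan: list[tuple[str, str]]) -> list[str]:
    """Execute tool calls in order and stop at the first guardrail hit."""
    log = []
    for name, arg in plan:
        try:
            log.append(harness.call_tool(name, arg))
        except GuardrailViolation as exc:
            log.append(f"blocked: {exc}")
            break
    return log


if __name__ == "__main__":
    h = Harness(
        tools={"lint": lambda a: f"lint ok: {a}", "test": lambda a: f"tests passed: {a}"},
        max_rounds=2,
    )
    print(run_plan(h, [("lint", "a.py"), ("test", "a.py"), ("lint", "b.py")]))
```

The design choice worth noticing is that the limits live in the harness rather than in the prompt, so they hold regardless of what the model decides to do.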

Quote: “Agents aren’t hard; the Harness is hard.”

Key Insights

Not linear replacement, but spiral layering: the harness still runs a context engineering pipeline, and the pipeline still uses carefully designed prompts
Driving force behind each loop: the previous stage stops being sufficient, which forces new practices to emerge
Next bottlenecks already visible: Eval (how to define “good”? LLM-as-judge is biased) and Governance (multi-agent mutual authentication, auditing, permissions)
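The essay does not say which LLM-as-judge bias it has in mind. One commonly reported example is position bias, where the judge favors whichever answer is shown first. A minimal sketch of one mitigation, asking twice with the order swapped and accepting only a consistent verdict, is below. The `Judge` signature and `debiased_compare` are assumptions, and the stub judges stand in for real model calls.

```python
from typing import Callable

# Hypothetical judge signature: (question, answer_A, answer_B) -> "A" or "B".
Judge = Callable[[str, str, str], str]


def debiased_compare(judge: Judge, question: str, first: str, second: str) -> str:
    """Compare two answers in both presentation orders.

    Returns "first" or "second" when both orderings agree on the same
    answer, and "inconsistent" when the verdict follows the position.
    """
    forward = judge(question, first, second)   # `first` is shown as A
    backward = judge(question, second, first)  # `first` is shown as B
    if forward == "A" and backward == "B":
        return "first"
    if forward == "B" and backward == "A":
        return "second"
    return "inconsistent"


def always_a(question: str, a: str, b: str) -> str:
    """Stub judge with pure position bias: always picks slot A."""
    return "A"


def prefers_longer(question: str, a: str, b: str) -> str:
    """Stub judge that is order-independent: picks the longer answer."""
    return "A" if len(a) >= len(b) else "B"


if __name__ == "__main__":
    print(debiased_compare(always_a, "q", "short", "a longer answer"))
    print(debiased_compare(prefers_longer, "q", "short", "a longer answer"))
```

This only detects order sensitivity. It does not answer the harder question raised above, namely how "good" should be defined in the first place.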


