@LiuVaayne: This is a long article by yan5xu (former ManusAI): "From Prompt to Harness: How to Understand LLM Engineering" Core Framework: The Spiral Evolution of Engineering Paradigms First Loop: Prompt Engineering (2022-2024) ...
Summary
This article by yan5xu (former ManusAI) proposes a spiral evolution model for LLM engineering paradigms: from Prompt Engineering (2022-2024) to Context Engineering (2025), then to Harness Engineering (2026-), and discusses the bottlenecks and driving factors at each stage.
View Cached Full Text
Cached at: 05/26/26, 07:13 PM
This is a long essay by yan5xu (ex ManusAI): “From Prompt to Harness: How to Understand LLM Engineering”
Core Framework: Spiral Ascending Engineering Paradigm Evolution
First Circle: Prompt Engineering (2022–2024)
- Write good system prompts, design few-shot examples, tweak prompt structure
- Ceiling: As models become stronger, carefully crafted techniques may become constraints (Khan’s Prompting Inversion: 97% on GPT-4o Sculpting, drops to 94% on GPT-5)
Second Circle: Context Engineering (2025)
- Karpathy coined: “the delicate art and science of filling the context window with just the right information for the next step”
- Manage information flow in dynamically unfolding tasks: how previous results flow into the next step, how environment feedback is injected, how conversation history is compressed
- Typical products: Cursor (dynamically retrieves relevant code), Lovable/Bolt (LSP errors, test failures automatically injected into next round)
- Limitation: rules are hard-coded by humans, “context rot” (performance drops by 39% on average after multiple rounds)
Third Circle: Harness Engineering (2026–)
- Two-step approach:
- Let go: give agents a toolchain (lint, test, search) and let them decide when to use what
- Set guardrails: constrain capability boundaries (sandbox, CI round limits, file permissions, structural tests)
- Quantitative validation: same model, same prompt, switching harness configuration increases success rate from 42% → 78%
Quote: “Agents aren’t hard; the Harness is hard.”
Key Insights
- Not linear replacement, but spiral layering: the harness still runs a context engineering pipeline, and the pipeline still uses carefully designed prompts
- Driving force for each circle: the previous stage becomes insufficient → new practices are forced out
- Next bottleneck already visible: Eval (how to define “good”? LLM-as-judge has bias) and Governance (multi-agent mutual authentication, auditing, permissions)
Similar Articles
@yan5xu: https://x.com/yan5xu/status/2059117572826746979
The article discusses three stages of LLM engineering evolution from Prompt Engineering to Harness Engineering, reflecting the progression of AI engineering practices.
@freeman1266: Harness Engineering is not mysticism, but an engineerable living product. Many people read a bunch of Harness Engineering articles and understand the concepts, but what is the first step? Six layers, stacked step by step: • Rule: Hard-code basic rules to tell AI what not to…
Harness Engineering is not mysticism, but an engineerable living product. The article proposes a six-layer engineering framework (Rule, Skill, Sub Agent, Workflow, Scripts, dev-map), emphasizing starting simple, relying on scripts rather than prompts, and improving through iteration.
@_avichawla: https://x.com/_avichawla/status/2053049489963811135
This article outlines a 2026 roadmap for LLM engineering, detailing eight key pillars including prompt engineering, RAG systems, and context management, while providing curated free and open-source resources for each.
@freeman1266: https://x.com/freeman1266/status/2064702757773496552
This article introduces the concept of Loop Engineering, which involves designing automated systems that allow AI agents to work in autonomous loops, including elements such as automated tasks, work trees, skills, plugins, and sub-agents, thereby replacing manual prompting and improving development efficiency.
Step-By-Step LLM Engineering Projects (2026 Edition)
A project-based roadmap for learning LLM engineering by building key components from tokenizers to serving stacks, including hardware foundations and post-training techniques.