@ShenHuang: https://x.com/ShenHuang/status/2053370791958569207


Summary

The author reveals that Claude Code's advantage lies in its 'harness' rather than the model itself, and open-sources a rebuilt version of this harness for DeepSeek V4 to improve its coding capabilities.


Claude Code’s moat isn’t the model. It’s nine pieces of harness. I rebuilt them for DeepSeek.

DeepSeek and Claude trade wins on benchmarks, but DeepSeek-via-raw-API feels broken next to Claude Code. The gap isn’t the model — it’s nine pieces of harness almost nobody ships. I built them and open-sourced the result.

The mismatch

DeepSeek V4 is roughly 1/50 the cost of Claude Opus 4.7 per million tokens.

Yet most engineers who try to swap Claude Code for raw DeepSeek give up inside a day. “Forgets things, can’t fix its own bugs, doesn’t read my files.”

The instinct is to blame the model. It’s not the model.

It’s that you’re comparing a model in a jar to a model wearing a fully built-out harness — tools, permission gating, cache discipline, LSP feedback, MCP, sub-agents, compaction, a TUI you actually want to live in. The harness is 80% of the product. Claude Code ships it. The DeepSeek API does not. Nobody else is filling that gap end-to-end either.

I spent ten weeks filling it. Below is the actual list of pieces that close the gap between DeepSeek V4 and Claude Opus.

The nine missing pieces for DS V4

1. Tool loop with permission gates

Fifty-ish tools (read / edit / write / bash / grep / glob / web_fetch / git ops / etc.), each with a JSON schema, a permission tier (auto / ask / deny-in-plan), and a sane error-reinjection format.

The error format matters more than people think — “model wrote bad JSON, here’s the parse error” needs to come back in a way the model can self-correct without burning three turns.
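A minimal sketch of what one entry in that tool table and its dispatch path might look like; `ToolSpec`, `PermissionTier`, and the `<tool_error>` re-injection format are illustrative, not openseek’s actual API:

```typescript
// Hypothetical shapes -- illustrative, not openseek's actual API.
type PermissionTier = "auto" | "ask" | "deny-in-plan";

interface ToolSpec {
  name: string;
  schema: object;            // JSON Schema for the arguments
  tier: PermissionTier;
  run: (args: unknown) => Promise<string>;
}

// Error re-injection: hand the parse error back as a tool result the
// model can act on in one turn, instead of surfacing a bare exception.
async function dispatch(tool: ToolSpec, rawArgs: string): Promise<string> {
  let args: unknown;
  try {
    args = JSON.parse(rawArgs);
  } catch (e) {
    return `<tool_error tool="${tool.name}">` +
      `Arguments were not valid JSON: ${(e as Error).message}. ` +
      `Re-emit the call with corrected JSON.</tool_error>`;
  }
  return tool.run(args);
}
```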

2. Cache-aware prompt assembly

DeepSeek’s prefix cache cuts cached tokens to ~10% of list price. Claim that discount or your 1/50 cost advantage halves overnight.

Two non-obvious rules:

  • System prompt and tool schemas at the very top, byte-stable across turns. Reorder one tool, cache miss, full rebill.

  • Conversation strictly append-only. No retroactive edits to old user messages, no in-place compaction of old assistant turns. If you have to compact, snapshot the old prefix into a synthetic message and start a new prefix from there.

Most ad-hoc agent loops trash the cache the first time they format something differently between turns.
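A sketch of an assembler that enforces both rules; `PromptAssembler` and its method names are hypothetical:

```typescript
// Invariants: (1) system prompt + tool schemas are serialized once and
// reused byte-for-byte, (2) conversation history is append-only.
interface Msg { role: "system" | "user" | "assistant" | "tool"; content: string; }

class PromptAssembler {
  // Frozen at construction: never re-serialize, never reorder tools.
  private readonly stablePrefix: readonly Msg[];
  private readonly history: Msg[] = [];

  constructor(systemPrompt: string, toolSchemasJson: string) {
    this.stablePrefix = Object.freeze([
      { role: "system", content: systemPrompt + "\n\n" + toolSchemasJson },
    ]);
  }

  append(msg: Msg): void {
    this.history.push(msg); // never mutate earlier entries in place
  }

  // Compaction snapshots old turns into one synthetic message; the stable
  // system prefix is untouched, and a new cacheable prefix starts here.
  compact(summary: string): void {
    this.history.splice(0, this.history.length,
      { role: "user", content: `[compacted history]\n${summary}` });
  }

  build(): Msg[] {
    return [...this.stablePrefix, ...this.history];
  }
}
```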

3. reasoning_content replay

V4 thinking mode emits internal-monologue tokens that the user never sees. The non-obvious part: you have to feed them back into the model on the next turn or it loses the thread.

The Anthropic SDK won’t help you here. You’re lifting reasoning_content off the response and reattaching it as a structured assistant turn yourself.

Skip this and your agent feels lobotomized between messages — same model, dramatically worse continuity.
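A sketch of the replay step. The `reasoning_content` field name matches DeepSeek’s OpenAI-compatible response shape, but the exact replay format here is my assumption; check the current API docs before relying on it:

```typescript
// Assumed shapes -- verify against DeepSeek's current thinking-mode docs.
interface DeepSeekMessage {
  role: "assistant";
  content: string;
  reasoning_content?: string; // thinking-mode monologue, hidden from the user
}

function toReplayTurn(resp: DeepSeekMessage): object {
  // Reattach the monologue as part of the assistant turn so the next
  // request carries the thread forward instead of dropping it.
  return {
    role: "assistant",
    content: resp.content,
    ...(resp.reasoning_content
      ? { reasoning_content: resp.reasoning_content }
      : {}),
  };
}
```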

4. Five compaction strategies

1M context isn’t usable context. Per-tool-result micro-trim, threshold auto-summary, model-requested reactive compaction, durable session-memory extraction across runs, and a manual /snip.

You need all five because they fire at different time scales — the inline tool trim is per-turn, the auto-summary is per-N-tokens, the session-memory extraction is end-of-session, etc. One generic “summarize when full” doesn’t cut it once you watch real sessions.
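One way to see why a single hook can’t cover this: each strategy has a different trigger and a different scope. A sketch with illustrative names:

```typescript
type Turn = { role: string; content: string };

// The five strategies as hooks firing at different timescales.
interface CompactionHooks {
  onToolResult(result: string): string;          // per-turn micro-trim
  onTokenThreshold(history: Turn[]): Turn[];     // auto-summary every N tokens
  onModelRequest(history: Turn[]): Turn[];       // model-requested compaction
  onSessionEnd(history: Turn[]): Promise<void>;  // durable session memory
  onManualSnip(history: Turn[], from: number, to: number): Turn[]; // /snip
}

// Example micro-trim: keep the head and tail of an oversized tool result.
const microTrim = (result: string, cap = 4_000): string =>
  result.length <= cap
    ? result
    : result.slice(0, cap / 2) +
      `\n…[${result.length - cap} chars trimmed]…\n` +
      result.slice(-cap / 2);
```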

5. Plan / Agent / YOLO mode separation

Three permission profiles, switchable mid-session.

Plan = read-only + draft, no writes;

Agent = writes with per-call confirmation;

YOLO = write-everything for unattended runs.

The mode is a function of allowed-tool-set × permission-default × confirmation-prompt-style. It’s not just a UI toggle — it has to wrap the tool dispatcher and survive sub-agent spawning.
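A sketch of mode-as-data rather than mode-as-toggle, with the gate sitting in front of the tool dispatcher; the profiles and names are illustrative:

```typescript
type Mode = "plan" | "agent" | "yolo";

interface ToolCall { name: string; writes: boolean; }

interface ModeProfile {
  allowedTools: Set<string>;
  defaultTier: "auto" | "ask"; // fallback for tools without an explicit tier
  confirmWrites: boolean;
}

const PROFILES: Record<Mode, ModeProfile> = {
  plan:  { allowedTools: new Set(["read", "grep", "glob"]),
           defaultTier: "auto", confirmWrites: true },
  agent: { allowedTools: new Set(["read", "grep", "glob", "edit", "write", "bash"]),
           defaultTier: "auto", confirmWrites: true },
  yolo:  { allowedTools: new Set(["read", "grep", "glob", "edit", "write", "bash"]),
           defaultTier: "auto", confirmWrites: false },
};

// Sub-agents inherit the profile, so the gate survives spawning.
function gate(mode: Mode, call: ToolCall): "run" | "confirm" | "reject" {
  const p = PROFILES[mode];
  if (!p.allowedTools.has(call.name)) return "reject";
  if (call.writes && p.confirmWrites) return "confirm";
  return p.defaultTier === "ask" ? "confirm" : "run";
}
```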

6. Real MCP client

The ecosystem is the point — once you speak MCP, every Linear / GitHub / Postgres / Browser server is one config line away. Skip it and you’re the only agent on the block writing one-off integrations.
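For concreteness, a sketch of connecting one stdio server with the MCP TypeScript SDK as I understand its client API; the server package is just an example, so verify names and signatures against the SDK docs:

```typescript
import { Client } from "@modelcontextprotocol/sdk/client/index.js";
import { StdioClientTransport } from "@modelcontextprotocol/sdk/client/stdio.js";

// One config line per server: spawn it over stdio and import its tools.
const transport = new StdioClientTransport({
  command: "npx",
  args: ["-y", "@modelcontextprotocol/server-github"], // example server package
});

const client = new Client({ name: "openseek", version: "0.1.0" });
await client.connect(transport);

// Merge the server's tools into the harness's own tool table.
const { tools } = await client.listTools();
```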

7. LSP feedback loop

Spawn the right language server (tsserver / rust-analyzer / pyright / gopls / clangd) and pipe diagnostics back into the model after every edit.

Model first-try fix rate goes from ~30% to ~80% once the LSP is in the loop.

Each server has its own init dance and its own quirks; this is grungy, ongoing work.
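A stripped-down sketch of the wire side, using `typescript-language-server` as a stand-in for tsserver and omitting the initialize handshake and didOpen/didChange notifications that real integration needs:

```typescript
import { spawn } from "node:child_process";

// Spawn the server and surface publishDiagnostics back to the model.
const server = spawn("typescript-language-server", ["--stdio"]);

let buf = Buffer.alloc(0);
server.stdout.on("data", (chunk: Buffer) => {
  buf = Buffer.concat([buf, chunk]);
  // LSP frames look like: "Content-Length: N\r\n\r\n{json}"
  for (;;) {
    const headerEnd = buf.indexOf("\r\n\r\n");
    if (headerEnd < 0) break;
    const len = Number(
      /Content-Length: (\d+)/.exec(buf.subarray(0, headerEnd).toString())?.[1],
    );
    if (buf.length < headerEnd + 4 + len) break;
    const msg = JSON.parse(buf.subarray(headerEnd + 4, headerEnd + 4 + len).toString());
    buf = buf.subarray(headerEnd + 4 + len);
    if (msg.method === "textDocument/publishDiagnostics") {
      // Re-inject as a tool-style message so the model can self-correct.
      console.log("diagnostics for model:", JSON.stringify(msg.params.diagnostics));
    }
  }
});
```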

8. Sub-agent fan-out

Main agent decomposes a long task into N independent sub-agents, each with fresh context, scoped tool budget, scoped permissions.

Results roll back into the parent. Cuts wall-clock for “audit the codebase” or “refactor across 200 files” by an order of magnitude.

The hard part is the contract: how do sub-agents stream progress, share artifacts, and fail gracefully without poisoning the parent?
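One possible shape for that contract, with failure isolation via `Promise.allSettled` so a crashed sub-agent degrades to a reported failure instead of rejecting the whole batch; all names are illustrative:

```typescript
interface SubAgentTask {
  goal: string;
  allowedTools: string[];  // scoped permissions
  tokenBudget: number;     // scoped budget
}

interface SubAgentResult {
  status: "ok" | "failed" | "budget_exhausted";
  summary: string;      // compact artifact rolled back into the parent
  artifacts: string[];  // file paths etc., never raw transcripts
}

async function fanOut(
  tasks: SubAgentTask[],
  run: (t: SubAgentTask) => Promise<SubAgentResult>,
): Promise<SubAgentResult[]> {
  // allSettled isolates failures so one bad sub-agent can't poison the rest.
  const settled = await Promise.allSettled(tasks.map(run));
  return settled.map((s): SubAgentResult =>
    s.status === "fulfilled"
      ? s.value
      : { status: "failed", summary: String(s.reason), artifacts: [] },
  );
}
```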

9. TUI + slash commands + skill loader

Solid + opentui (or whatever — but please not Electron). Vim mode. Stable scroll buffer under streaming output. ~100 slash commands (/clear, /diff, /plan, /compact, /skill, …). A skill loader that scans .openseek/skills, .claude/skills so the community’s existing skill libraries Just Work.

This is “just frontend” except it’s the entire surface the user touches. Get it wrong and the rest of the harness is invisible.
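A sketch of the discovery pass for that skill loader, scanning both directories so existing .claude/skills libraries load unchanged; the one-directory-per-skill layout is my assumption:

```typescript
import { readdir } from "node:fs/promises";
import { join } from "node:path";

// Scan both skill roots; each subdirectory is treated as one skill.
async function discoverSkills(root: string): Promise<string[]> {
  const found: string[] = [];
  for (const dir of [".openseek/skills", ".claude/skills"]) {
    try {
      for (const entry of await readdir(join(root, dir), { withFileTypes: true })) {
        if (entry.isDirectory()) found.push(join(root, dir, entry.name));
      }
    } catch {
      // directory absent -- fine, skip it
    }
  }
  return found;
}
```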

So what?

Build all nine, point it at DeepSeek V4, and the experience converges on Claude Code. At ~1/50 the cost.

That’s a lot to build. So I built it.

openseek — Multi-provider out of the box: DeepSeek, any OpenAI-compat endpoint, Anthropic, Bedrock, Vertex, Azure. Same harness, swap the model.

Repo: https://github.com/LichAmnesia/openseek
