@shao__meng: Former Meta/Microsoft/Atlassian Staff Engineer's Agentic Engineering Workflow With this workflow @kunchenguid ships 40-50 tested production-grade PRs per day. He describes it: "You are the captain, agents are your crew…

X AI KOLs Timeline 06/22/26, 12:35 AM News

agentic-engineering workflow terminal tmux neovim ai-coding developer-productivity

Summary

Former Meta/Microsoft/Atlassian staff engineer Kun shares his agentic engineering workflow: centered on terminal, tmux, and Neovim, using global/project-level memory files and skills to train AI teammates, delivering 40-50 tested production PRs daily, boosted by voice input, AXI standard, Lavish interactive planning, and more.

Former Meta/Microsoft/Atlassian Staff Engineer's Agentic Engineering Workflow

Using this workflow @kunchenguid ships 40-50 tested production-grade PRs per day. He describes it: "You are the captain, agents are your crew, with four progressive layers: building the ship → training the crew → collaborating with individual crew → commanding multiple crew in parallel + a First Mate."

https://youtube.com/watch?v=iQyg-KypKAA…

# Terminal-Centric (Building the Ship)

Stick to a terminal-only environment. Core reasons:
· Keyboard-only = maintaining flow; mouse switching forces a context switch
· Cross-device consistency – the same workflow can be continued on a phone or on different machines

Tool stack: WezTerm (cross-platform, Lua config, hot reload) + tmux (session persistence, multiple panes, remote attach) + Neovim (keyboard-first, relative line numbers).

# Crew Onboarding (Memory + Skills)

Agents are fresh recruits unaware of your preferences. Two mechanisms ramp them up: Memory and Skills.

1. Memory
· Global memory (e.g., ~/.claude/CLAUDE.md): Keep it lean (27 lines), as it's injected into every session's system prompt – too long silently consumes tokens
· A few insightful preference rules:
  1. Don't use em-dash (—) – AI defaults to it, feels robotic
  2. When making technical decisions, don't overestimate development cost – models trained on human data overestimate time (predict "days/weeks" when a playable version takes minutes), biasing toward "cheap but low quality" solutions. This corrects that training bias
  3. For bug fixes, prioritize end-to-end reproduction over relying on unit tests
· Project-level memory: Key method is not writing manually, but after each correction, ask the agent to document the lesson – collective learning accumulation

2. Skills
· Extract conditional content (like E2E instructions only needed when modifying code) from memory into skills
· Skills load only a short description initially, full content only when used – avoiding unnecessary token consumption

3. Important Warning about Skills
· Karpathy's skills repo (177k stars) was tested with Program-Bench: actually consumed 5% more tokens and produced worse results, and it wasn't written by Karpathy himself
· Security risk: skills can execute arbitrary commands on the machine, potentially leaking API keys or even banking credentials
· Conclusion: Popular ≠ Good. Don't install skills claiming "magical improvements" without rigorous evaluation

# Collaborating with Individual Crew

1. Voice Input
· Almost exclusively use voice instead of typing (Stanford paper: speaking is 3x faster than typing)
· Tool: OpenSuperWhisper – local whisper, free and open-source, inject custom vocabulary via system prompt to improve proper noun recognition

2. AXI Standard (Agent ergonomics)
Self-created design standard for agent-optimized tools:
· Measured: Same GitHub task, MCP server used 3x more tokens + 2x latency than CLI
· One design principle: Token-efficient output format saves ~40% tokens vs JSON
· Insight: The efficiency of the tool you give the agent directly determines the agent's "fuel consumption"

3. Lavish (Interactive Planning Artifact)
Solves the pain point of "agent returning a wall of text hard to review": Have the agent generate an HTML visual artifact reusing the project design system, which allows annotating specific elements for feedback and returning it in-browser.

# Validation: No-Mistakes Pipeline (Quality Foundation)

Counter-intuitive claim: Don't review diffs individually.
· Reason: AI writes code too fast; reviewing every diff makes you the bottleneck and is boring
· Analogy: Think like an engineering director – directors don't review PRs, they ensure quality through culture and processes

Pipeline executes in an isolated worktree:
· Analyze session to restore true intent
· Rebase onto latest main, resolve conflicts early
· Adversarial review (independent context window) – most issues caught and self-healed; ambiguous ones escalated to human
· E2E tests with recorded evidence (screenshots/videos/logs)
· Documentation update + link checking
· Push branch, open PR, continuously babysit until merge

PR presentation: original intent, change summary, test evidence, issues found and fixed by pipeline, risk assessment.

Review strategy: based on risk assessment decide how much effort to invest. Low-risk PRs barely look at diffs (since pipeline covered it), only deep dive on high-risk.

Insights on time distribution: Spend time at the start (clarify requirements with Lavish) and the end (quality gate), middle entirely delegated to AI. The more middle time freed up, the more parallel work possible.

# Long-Running: Good-Night-Have-Fun

Solves "how to keep agent working during 8 hours of sleep": Give goals and stop conditions, iterate in a loop. Compared to Claude Code/Codex's /go, advantage is precise control over token limit / iteration limit / stop conditions – avoid waking up to find weekly quota exhausted.

# Parallel: Treehouse + Worktree

Pain points of git worktree: naming, remembering state, manual cleanup = cognitive debt. Treehouse: run lands on an idle worktree, close tab auto-releases, treehouse status at a glance.

# First Mate: Orchestrator

As parallel sessions increase, context-switching fatigue sets in. First Mate is a meta-agent that manages all crew for you: you only talk to it, it automatically splits parallel sub-tasks, calls treehouse to create worktrees, runs no-mistakes, prepares PRs.

Key observation: After using First Mate, the bottleneck shifted from "agent execution power" to "what you want it to do" – the captain's value moves to strategy: understanding users, researching competition, drawing better "treasure maps."

Original Article


Cached at: 06/22/26, 09:42 AM

Agentic Engineering Workflow from a Former Meta/Microsoft/Atlassian Staff Engineer

Using this workflow @kunchenguid ships 40-50 tested production PRs daily. He describes it as: “You are the captain, the agent is your crew, organized in four progressive layers: shipbuilding → training the crew → collaborating with a single crew member → orchestrating multiple crew members in parallel + a First Mate.”

https://youtube.com/watch?v=iQyg-KypKAA…

Terminal-Centric (Shipbuilding)

Insist on working entirely in the terminal. Core reasons:
· Hands never leave the keyboard = maintain flow state; switching to the mouse forces a context switch.
· Cross-device consistency – the same workflow can be continued on a phone or on different machines.

Tool stack: WezTerm (cross-platform, Lua config, hot-reload) + tmux (session persistence, multiple panes, remote attach) + Neovim (keyboard-first, relative line numbers).

Crew Onboarding (Memory + Skills)

The agent is a new recruit, unaware of your preferences. Two mechanisms ramp it up: Memory and Skills.

Memory
· Global memory (e.g., ~/.claude/CLAUDE.md): Keep it lean (27 lines) because its content is injected into every session’s system prompt; an overlong file silently consumes tokens.
· A few insight-driven preference rules:
  1. Don’t use em-dash (—) – AI defaults to it, and it feels mechanical.
  2. When making technical decisions, don’t overestimate development cost – models trained on human data overestimate time (guessing “days/weeks” when an agent actually produces a playable version in minutes); this bias pushes models toward “cheap but low-quality” solutions. This rule corrects that training bias.
  3. For bug fixes, prioritize end-to-end reproduction over relying on unit tests.
· Project-level memory: The core method is not to write it manually but to have the agent write down the lesson learned after each correction – accumulated collective learning.

Skills
· Extract conditional content (e.g., E2E instructions only needed when modifying code) from memory into a skill.
· Initially only a skill’s short description is loaded; the full content is read only when the skill is used, which avoids unnecessary token consumption.

Important Warning about Skills
· Karpathy’s skills repository (177k stars) was evaluated by program-bench and found to consume 5% more tokens and produce worse results, and it wasn’t even written by Karpathy himself.
· Security risk: skills can execute arbitrary commands on the machine, potentially leaking API keys or even bank credentials.
· Conclusion: Popular ≠ Good. Don’t install skills that claim “magical improvements” without rigorous evaluation.
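The idea that a skill costs only its short description until it is actually used can be sketched roughly as follows. This is an illustration, not how any particular agent framework implements skills; the `Skill` class, its fields, and `build_prompt` are hypothetical names.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Skill:
    """A skill whose full instructions stay on disk until needed (hypothetical layout)."""
    name: str
    description: str   # short text that is always shown to the agent
    body_path: Path    # full instructions, read only when the skill is used

    def load_body(self) -> str:
        return self.body_path.read_text(encoding="utf-8")


def build_prompt(skills: list[Skill], active: set[str]) -> str:
    """List every skill by description; inline the full body only for active skills."""
    parts = []
    for skill in skills:
        parts.append(f"- {skill.name}: {skill.description}")
        if skill.name in active:
            parts.append(skill.load_body())
    return "\n".join(parts)
```

With no active skills the prompt carries one line per skill; the E2E instructions in the example from the text would only be read from disk once that skill is activated.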

Collaborating with a Single Crew Member

Voice Input
· Use voice almost exclusively instead of typing (Stanford paper: speaking is 3x faster than typing).
· Tool: OpenSuperWhisper – local whisper, free and open source; custom vocabulary can be injected via the system prompt to improve proper-noun recognition.

AXI Standard (Agent Ergonomics)
Self-created design criteria for optimizing tools for agents:
· Measured: For the same GitHub task, an MCP server consumes 3x more tokens + 2x latency compared to a CLI.
· One design principle: Token-efficient output formats save ~40% tokens compared to JSON.
· Insight: The efficiency of the tools you provide to the agent directly determines the agent’s “fuel consumption.”

Lavish (Interactive Planning Artifacts)
Addresses the pain point of “agent returns a wall of text that’s hard to review”: have the agent generate an HTML visual artifact that reuses the project’s design system, allowing you to annotate feedback on specific elements and return it within the browser.
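The AXI point about output formats can be illustrated with a small comparison. The sketch below is not the AXI format itself (the source does not specify it); it only shows why a header-once, tab-separated layout is shorter than JSON, which repeats every key per row. Character count is used as a crude stand-in for token count, and the sample issue data is made up.

```python
import json


def as_json(rows: list[dict]) -> str:
    """Serialize rows as JSON; every key is repeated in every row."""
    return json.dumps(rows)


def as_compact(rows: list[dict]) -> str:
    """Write the column names once, then one tab-separated line per row."""
    if not rows:
        return ""
    columns = list(rows[0])
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row[col]) for col in columns))
    return "\n".join(lines)


# Hypothetical tool output: two issues from a repository listing.
issues = [
    {"number": 101, "state": "open", "title": "Fix login redirect"},
    {"number": 102, "state": "closed", "title": "Update README links"},
]

saved = 1 - len(as_compact(issues)) / len(as_json(issues))
print(f"compact output is {saved:.0%} shorter by character count")
```

The actual saving depends on the data and the tokenizer, so the printed figure should not be read as a reproduction of the ~40% number above.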

Verification: No-Mistakes Pipeline (Quality Foundation)

Counterintuitive claim: Don’t review diffs individually.
· Reason: AI writes code too fast; reviewing diffs one by one makes you the bottleneck and is boring.
· Analogy: Think like an engineering director – a director doesn’t review PRs but ensures quality through culture and processes.

The pipeline runs in an isolated worktree:
· Analyze the session to restore true intent.
· Rebase onto latest main, resolve conflicts early.
· Adversarial review (separate context window) – most issues caught and self-healed here; ambiguous ones escalated to human.
· E2E testing with recorded evidence (screenshots / videos / logs).
· Documentation update + link checking.
· Push branch, open PR, continuously babysit until merge.
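The stage order and the "self-heal or escalate" behaviour described above can be sketched as a simple sequential runner. This is a minimal illustration under assumptions, not the actual no-mistakes implementation: the stage functions are stubs, and `StageResult`, `PipelineReport`, and `run_pipeline` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class StageResult:
    status: str      # "ok", "fixed" (self-healed), or "escalate" (needs a human)
    note: str = ""


@dataclass
class PipelineReport:
    completed: list[str] = field(default_factory=list)
    fixed: list[str] = field(default_factory=list)
    escalated: Optional[str] = None


Stage = Callable[[dict], StageResult]


def run_pipeline(stages: list[tuple[str, Stage]], context: dict) -> PipelineReport:
    """Run stages in order; record self-healed issues and stop at the first escalation."""
    report = PipelineReport()
    for name, stage in stages:
        result = stage(context)
        if result.status == "escalate":
            report.escalated = f"{name}: {result.note}"
            break
        if result.status == "fixed":
            report.fixed.append(f"{name}: {result.note}")
        report.completed.append(name)
    return report


def ok(_context: dict) -> StageResult:
    """Stub stage that always passes."""
    return StageResult("ok")


def adversarial_review(context: dict) -> StageResult:
    """Stub review: escalates ambiguous cases, otherwise reports a self-healed issue."""
    if context.get("ambiguous"):
        return StageResult("escalate", "intent unclear, needs a human")
    return StageResult("fixed", "removed unused import")


STAGES = [
    ("recover intent", ok),
    ("rebase onto main", ok),
    ("adversarial review", adversarial_review),
    ("e2e tests with evidence", ok),
    ("docs and link check", ok),
    ("push and open PR", ok),
]
```

Calling `run_pipeline(STAGES, {})` walks all six stages and records one self-healed issue; passing `{"ambiguous": True}` stops at the review stage with an escalation note.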

PR presentation: original intent, change summary, test evidence, issues found and fixed by the pipeline, risk assessment. Review strategy: decide how much effort to invest based on the risk assessment. For low-risk PRs he barely looks at the diffs (because the pipeline already covered them); he only deep-dives on high-risk ones.
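As a rough sketch of how the PR contents and the risk-based review strategy fit together, the five listed items can be modelled as a record and the review depth derived from it. Field names are illustrative, and the rule that missing test evidence forces a deep dive is an added assumption, not something stated in the source.

```python
from dataclasses import dataclass, field


@dataclass
class PRSummary:
    """The five items the PR presents (field names are illustrative)."""
    intent: str
    change_summary: str
    test_evidence: list[str] = field(default_factory=list)
    pipeline_fixes: list[str] = field(default_factory=list)
    risk: str = "low"  # "low" or "high"; finer levels are left out of this sketch


def review_effort(pr: PRSummary) -> str:
    """Pick a review depth from the risk assessment."""
    if pr.risk == "high":
        return "deep dive: read the full diff"
    if not pr.test_evidence:
        # Assumption: without recorded evidence, a low risk label is not trusted.
        return "deep dive: read the full diff"
    return "skim: rely on the summary and test evidence"
```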

Work distribution insight: Time is spent at the beginning (clarifying requirements with Lavish) and at the end (quality gate); the middle is entirely handed to AI. The more of the middle you free up, the more work you can run in parallel.

Long Runs: Good-Night-Have-Fun

Solves the problem of “how do I let the agent keep working while I sleep for 8 hours”: give it a goal and stop conditions, then iterate in a loop.

Compared to Claude Code / Codex’s /go, the advantage is precise control over the token cap, iteration cap, and stop condition, so you avoid waking up to find your weekly quota exhausted.
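A loop with those three limits might look roughly like this. It is a sketch of the general technique, not the Good-Night-Have-Fun tool itself; `RunLimits`, `run_until_done`, and the `step` callback (one agent iteration returning tokens used and whether the goal is met) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RunLimits:
    max_iterations: int
    max_tokens: int


def run_until_done(step: Callable[[int], tuple[int, bool]], limits: RunLimits) -> dict:
    """Call step(i) until the goal is reached or a cap is hit.

    step returns (tokens_used, goal_reached). The token cap is checked after
    each step, so the total can overshoot the cap by at most one step.
    """
    tokens = 0
    for iteration in range(1, limits.max_iterations + 1):
        used, done = step(iteration)
        tokens += used
        if done:
            return {"reason": "goal reached", "iterations": iteration, "tokens": tokens}
        if tokens >= limits.max_tokens:
            return {"reason": "token cap", "iterations": iteration, "tokens": tokens}
    return {"reason": "iteration cap", "iterations": limits.max_iterations, "tokens": tokens}
```

Returning the stop reason alongside the totals makes it easy to see in the morning whether the run finished or merely ran out of budget.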

Parallelism: treehouse + worktree

Pain points of git worktree: naming, tracking state, manual cleanup = cognitive debt. With treehouse, a run automatically lands in a free worktree, closing the tab automatically releases it, and treehouse status gives an overview.
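The acquire/release/status behaviour described above resembles a resource pool. The sketch below models that idea in memory only; it is not treehouse's implementation and runs no git commands (real worktrees would be created with `git worktree add`). `WorktreePool` and its method names are hypothetical.

```python
class WorktreePool:
    """In-memory model of a pool of reusable worktrees (illustrative only)."""

    def __init__(self, paths: list[str]) -> None:
        self._idle = list(paths)
        self._busy: dict[str, str] = {}  # worktree path -> task description

    def acquire(self, task: str) -> str:
        """Hand out the first idle worktree and mark it busy with the given task."""
        if not self._idle:
            raise RuntimeError("no idle worktree available")
        path = self._idle.pop(0)
        self._busy[path] = task
        return path

    def release(self, path: str) -> None:
        """Return a busy worktree to the idle list; raises KeyError if it was not busy."""
        del self._busy[path]
        self._idle.append(path)

    def status(self) -> dict:
        """Snapshot of which worktrees are idle and which task each busy one serves."""
        return {"idle": list(self._idle), "busy": dict(self._busy)}
```

Because the pool picks the worktree, the user never names one or remembers which is free, which is the cognitive debt the text refers to.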

First Mate: The Orchestrator

As parallel sessions increase, context-switching fatigue sets in.

First Mate is a meta-agent that manages all your crew members: you only speak to it; it automatically breaks down parallel subtasks, calls treehouse to create worktrees, runs no-mistakes, and prepares PRs.
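The fan-out part of that delegation can be sketched with a thread pool. This shows only the parallel dispatch; how First Mate actually splits a goal into subtasks is not described in the source, so the subtasks are passed in ready-made. `run_crew` and the `worker` callback (standing in for one agent session) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def run_crew(subtasks: list[str], worker: Callable[[str], str], max_parallel: int = 4) -> dict[str, str]:
    """Run one worker per subtask in parallel and collect results keyed by subtask."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # Executor.map yields results in the same order as the inputs.
        results = list(pool.map(worker, subtasks))
    return dict(zip(subtasks, results))
```

In a fuller version each worker would take a worktree, run the quality pipeline, and return a PR reference; here it is any function from a task description to a result string.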

Key observation: After using First Mate, the bottleneck shifts from “agent execution capability” to “what you want it to do” – the captain’s value shifts to strategy: understanding users, researching competition, drawing a better “treasure map.”

TL;DR

Former Meta/Microsoft/Atlassian staff engineer Kun shares his agentic engineering workflow: terminal- and tmux-centric, training AI crew members with global and project-level memory files, enabling delivery of 40–50 tested production PRs daily.

Core Gear: Terminal, tmux, and Editor

The whole workflow hinges on keeping his hands on the keyboard almost all the time to maintain flow. Kun does everything in the terminal because it is inherently keyboard-driven, with no mouse interruptions. Terminal configs are also reusable across devices, even on a phone.

Terminal Emulator: WezTerm

WezTerm is a high-performance terminal emulator with 26k GitHub stars. Kun chose it for two reasons:

Truly cross-platform: identical experience on Windows, Mac, Linux.
Highly customizable: Lua scripts configure nearly every behavior, and changes such as a new color scheme take effect immediately on reload.

His dotfiles include a wezterm.lua file where conditional logic can make configs dynamic and flexible. His personal preference is the “Rosé Pine Moon” color scheme, never changed.

tmux: Terminal Multiplexer and Persistence

tmux (terminal multiplexer) is the skeleton of the workflow. It splits the terminal into any number of panes and supports multiple windows. Kun commonly uses three panes: one for the agent, one for the editor, and one for running his own commands.

The most powerful feature is session persistence: detach with a shortcut, reattach later to the exact same state. He can connect from a laptop or phone to the same session, enabling true “work anywhere.”

Customizing tmux configuration is essential. Kun’s config includes years of accumulated keybindings (now muscle memory) plus styling and behavior settings. He recommends spending time configuring tmux to suit your habits.
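One way to reproduce a three-pane layout like the one described (agent, editor, own commands) is to script the standard tmux subcommands. The sketch below only builds the command lists and does not run them; the session name and the exact split arrangement are assumptions, not Kun's configuration.

```python
import subprocess


def three_pane_commands(session: str) -> list[list[str]]:
    """Commands for a detached session with one left pane and two stacked right panes."""
    return [
        ["tmux", "new-session", "-d", "-s", session],    # start detached
        ["tmux", "split-window", "-h", "-t", session],   # split left / right
        ["tmux", "split-window", "-v", "-t", session],   # split the right pane top / bottom
        ["tmux", "attach-session", "-t", session],       # attach (also reattaches later)
    ]


def launch(session: str) -> None:
    """Run the commands in order; requires tmux to be installed."""
    for command in three_pane_commands(session):
        subprocess.run(command, check=True)
```

Running the last command alone from another machine over SSH reattaches to the same session, which is the persistence behaviour described above.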

Text Editor: Neovim

Neovim is a modern version of vim whose core goal is to never leave the keyboard. Kun demonstrates a few operations:

j/k to move up/down, dd to delete line, u to undo.
Relative line numbers on the left: e.g., with the cursor on line 238, the line above it shows 1. To jump 11 lines up to the “set environment” line, press 11k.
Plugins provide quick search: Space+S to grep the codebase, Space+F to find files by name (e.g., type flake to immediately locate the flake file).
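The relative-numbering example above can be made concrete with a few lines of arithmetic. This sketch assumes the editor shows the absolute number on the cursor line and distances elsewhere (what Neovim does when both `number` and `relativenumber` are set); `gutter_labels` is a hypothetical helper, not part of any editor API.

```python
def gutter_labels(cursor_line: int, first: int, last: int) -> dict[int, int]:
    """Map each visible line to the number shown in the gutter next to it."""
    return {
        line: line if line == cursor_line else abs(line - cursor_line)
        for line in range(first, last + 1)
    }


labels = gutter_labels(cursor_line=238, first=227, last=240)
print(labels[237])  # 1: the line directly above the cursor
print(labels[227])  # 11: so typing 11k moves the cursor there
```

The number in the gutter is exactly the count to type before j or k, which is why no mental subtraction is needed.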

Kun describes the feeling in vim as “a bird flying.” There’s a learning curve, but once muscle memory forms, it is far more efficient than using a mouse.

Agent Crew: Four Frameworks and Selection

Kun regularly uses four agent frameworks:

Claude Code (preferred with an Anthropic subscription): the most reasonable out-of-the-box experience and feature-rich, but it has occasional small bugs and is less customizable.
Codex CLI: written in Rust and feels smoother; open source, which lets the agent inspect its own source for workarounds; fewer add-ons and less customizability.
Pi Programming Agent: minimal, highly extensible, good for tinkerers.
Open Code: excellent terminal UI, integrates with almost any model, richer feature set out-of-the-box than Pi, a good model-agnostic choice.

In the video demo, Kun uses Claude Code for audience convenience, but emphasizes his workflow is agent-agnostic because tools change fast – today’s best model may be different next month.

Training the Crew: Memory Files and Skills

Newly recruited agent crew members know nothing about how the ship works. They need onboarding via memory files and skills.

Global Memory File

The global memory file stores personal preferences and general rules, loaded into the system prompt of every project, every agent session. So content must be lean to avoid consuming too many tokens. Kun’s global memory file is only 27 lines.
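Because the file is paid for in every session, it can help to measure it. The sketch below counts lines and gives a rough token estimate; the 4-characters-per-token ratio is a common rule of thumb and an assumption here, not a property of any specific model's tokenizer. `memory_cost` and `overhead` are hypothetical helpers.

```python
def memory_cost(text: str, chars_per_token: float = 4.0) -> dict:
    """Line count and a rough token estimate for a memory file's contents."""
    return {
        "lines": len(text.splitlines()),
        "approx_tokens": round(len(text) / chars_per_token),
    }


def overhead(text: str, sessions: int) -> int:
    """Approximate tokens spent on the memory file across a number of sessions."""
    return memory_cost(text)["approx_tokens"] * sessions
```

For the real file, something like `memory_cost(Path("~/.claude/CLAUDE.md").expanduser().read_text())` would give the per-session figure, with `Path` imported from `pathlib`.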

Example rules:

“Never use em dash” – AI models default to em dash instead of regular hyphen, but Kun finds it mechanical and dislikes it.
“When making technical decisions, don’t overvalue development cost” – he explains an interesting phenomenon: if you ask a frontier model to estimate build time for a 3D first-person shooter, it outputs “days/weeks/months”; but let the agent build it, and it returns a playable version in minutes. This happens because models are trained on human data and don’t know AI writes code much faster than humans, thus over-preferring “cheap” solutions (often unscalable or hard to maintain). This rule corrects that bias.
“When doing bug fixes, always first reproduce the bug in an end-to-end environment” – AI defaults to writing unit tests, but unit tests often don’t cover actual user behavior; end-to-end tests are more reliable.

There’s also a link to an “interesting insights” blog post not elaborated in the video.

Project-Level Memory File

Each project can have an independent memory file, e.g., highbit project’s claude.md or agents.md (shared via symlink). Content is more detailed, including project background, repo layout, terminology, component workings, end-to-end testing methods, and conventions.

Kun’s approach: every time the agent makes a mistake, he corrects it and asks it to remember, storing the lesson in the file. Over time, the crew gets smarter. No fancy memory system is needed – one markdown file is enough.
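In the workflow the agent itself edits the memory file, but the underlying operation is just an idempotent append to a markdown file. A minimal sketch, assuming a "Lessons learned" section heading that is not taken from the source; `record_lesson` is a hypothetical helper.

```python
from pathlib import Path


def record_lesson(memory_file: Path, lesson: str, heading: str = "## Lessons learned") -> None:
    """Append a lesson as a bullet, adding the heading once and skipping duplicates."""
    text = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    if text and not text.endswith("\n"):
        text += "\n"
    if heading not in text:
        # Separate the new section from existing content with a blank line.
        text += f"\n{heading}\n" if text else f"{heading}\n"
    entry = f"- {lesson.strip()}\n"
    if entry not in text:
        text += entry
    memory_file.write_text(text, encoding="utf-8")
```

Skipping duplicate entries keeps the file from growing when the same correction is made twice, which matters because the file's length is paid for in every session.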

But project-level memory files can become bloated. To improve efficiency, conditional information that isn’t always needed (e.g., E2E testing instructions) can be moved out of memory and converted into a skill. That way, the skill is only loaded when the relevant task is at hand, saving tokens.

Conclusion

Kun’s workflow emphasizes “you are the captain, the agent is your crew.” Through terminal-centricity, memory file training, and skill reuse, he consistently ships 40–50 production PRs daily. The method focuses on fundamental concepts and can be adapted to different agent frameworks and models.

(The video continues with collaborating with a single crew member, multi-crew collaboration, and recruiting a “First Mate” to handle chores – this transcript does not fully cover that part; watch the original video for the complete flow.)

Source: YouTube video link (https://www.youtube.com/watch?v=iQyg-KypKAA)

Kun Chen (@kunchenguid): many people asked me to make a video about my complete agentic engineering workflow

excited to share it’s finally here!!!

it took me about 20 hours in total to record this 45 minutes of walkthrough - it covers everything i do to ship production quality code at an average 40+