@shao__meng: Former Meta/Microsoft/Atlassian Staff Engineer's Agentic Engineering Workflow With this workflow @kunchenguid ships 40-50 tested production-grade PRs per day. He describes it: "You are the captain, agents are your crew…
Summary
Former Meta/Microsoft/Atlassian staff engineer Kun shares his agentic engineering workflow: centered on terminal, tmux, and Neovim, using global/project-level memory files and skills to train AI teammates, delivering 40-50 tested production PRs daily, boosted by voice input, AXI standard, Lavish interactive planning, and more.
Cached at: 06/22/26, 09:42 AM
Agentic Engineering Workflow from a Former Meta/Microsoft/Atlassian Staff Engineer
Using this workflow @kunchenguid ships 40-50 tested production PRs daily. He describes it as: “You are the captain, the agent is your crew, organized in four progressive layers: shipbuilding → training the crew → collaborating with a single crew member → orchestrating multiple crew members in parallel + a First Mate.”
https://youtube.com/watch?v=iQyg-KypKAA…
Terminal-Centric (Shipbuilding)
Kun insists on working entirely in the terminal, for two core reasons:
- Hands never leave the keyboard, which maintains flow state; switching to the mouse forces a context switch.
- Cross-device consistency: the same workflow can be continued on a phone or a different machine.
Tool stack: WezTerm (cross-platform, Lua config, hot-reload) + tmux (session persistence, multiple panes, remote attach) + Neovim (keyboard-first, relative line numbers).
Crew Onboarding (Memory + Skills)
The agent is a new recruit, unaware of your preferences. Two mechanisms help it ramp up: memory and skills.
- Memory
  - Global memory (e.g., ~/.claude/CLAUDE.md): keep it lean (Kun’s is 27 lines) because its content is injected into every session’s system prompt; a long file silently consumes tokens. A few insight-driven preference rules:
    - Don’t use em dashes: AI defaults to them, and they feel mechanical.
    - When making technical decisions, don’t overestimate development cost. Models trained on human data overestimate time (guessing “days/weeks” when a playable version actually takes minutes), and this bias pushes them toward “cheap but low-quality” solutions. The rule corrects a training bias.
    - For bug fixes, prioritize end-to-end reproduction over relying on unit tests.
  - Project-level memory: the core method is not to write it by hand but to have the agent record lessons learned after each correction, so the file accumulates collective learning.
- Skills
  - Extract conditional content (e.g., E2E instructions needed only when modifying code) from memory into a skill.
  - Only a skill’s short description is loaded up front; the full content is read only when the skill is actually used, which avoids unnecessary token consumption.
- Important warning about skills
  - Karpathy’s skills repository (177k stars) was evaluated with program-bench and found to consume 5% more tokens and produce worse results, and it wasn’t even written by Karpathy himself.
  - Security risk: skills can execute arbitrary commands on the machine, potentially leaking API keys or even bank credentials.
  - Conclusion: popular ≠ good. Don’t install skills that claim “magical improvements” without rigorous evaluation.
Collaborating with a Single Crew Member
- Voice input
  - Kun almost entirely uses voice instead of typing (he cites a Stanford paper: speaking is 3x faster than typing).
  - Tool: OpenSuperWhisper, which runs Whisper locally and is free and open source; custom vocabulary is injected via the system prompt to improve proper-noun recognition.
- AXI standard (agent ergonomics): self-created design criteria for optimizing tools for agents.
  - Measured: for the same GitHub task, an MCP server consumes 3x more tokens and has 2x the latency of a CLI.
  - One design principle: token-efficient output formats save ~40% of tokens compared to JSON.
  - Insight: the efficiency of the tools you provide to the agent directly determines the agent’s “fuel consumption.”
- Lavish (interactive planning artifacts): addresses the pain point of “the agent returns a wall of text that’s hard to review.” The agent generates an HTML visual artifact that reuses the project’s design system, so you can annotate feedback on specific elements and send it back from within the browser.
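To make the AXI point about token-efficient output formats concrete, here is a minimal sketch that renders the same records as pretty-printed JSON and as a header-plus-rows text table. The records, function names, and the tab-separated layout are all invented for illustration; this is not AXI's actual format, and character count is only a rough stand-in for tokens, so it does not reproduce the ~40% figure.

```python
import json

# Hypothetical records, invented for illustration only.
issues = [
    {"number": 101, "state": "open", "title": "Fix login redirect"},
    {"number": 102, "state": "closed", "title": "Update README"},
    {"number": 103, "state": "open", "title": "Flaky e2e test"},
]


def to_json(rows):
    """Pretty-printed JSON: every key name repeats on every record."""
    return json.dumps(rows, indent=2)


def to_compact(rows):
    """Column names once, then one tab-separated line per record."""
    columns = list(rows[0].keys())
    lines = ["\t".join(columns)]
    for row in rows:
        lines.append("\t".join(str(row[c]) for c in columns))
    return "\n".join(lines)


json_text = to_json(issues)
compact_text = to_compact(issues)

# Character counts are only a proxy; real savings depend on the tokenizer.
print(len(json_text), len(compact_text))
```

The compact form drops the repeated keys, quotes, and braces, which is where most of the overhead in small JSON payloads comes from.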
Verification: No-Mistakes Pipeline (Quality Foundation)
Counterintuitive claim: don’t review diffs individually.
- Reason: AI writes code too fast; reviewing diffs one by one makes you the bottleneck, and it is boring.
- Analogy: think like an engineering director, who doesn’t review PRs but ensures quality through culture and processes.
The pipeline runs in an isolated worktree:
- Analyze the session to recover the true intent.
- Rebase onto the latest main and resolve conflicts early.
- Adversarial review in a separate context window: most issues are caught and self-healed here; ambiguous ones are escalated to a human.
- E2E testing with recorded evidence (screenshots / videos / logs).
- Documentation update plus link checking.
- Push the branch, open the PR, and keep babysitting it until merge.
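The pipeline stages listed above can be sketched as an ordered list of steps, each reporting whether it passed, self-healed, or needs a human. This is only an illustrative model under stated assumptions: the stage names, the three outcome strings, and the `run_pipeline` function are hypothetical and are not taken from the no-mistakes tool itself.

```python
from typing import Callable, List, Tuple

# Each stage returns "ok", "fixed" (self-healed), or "escalate" (needs a human).
Stage = Tuple[str, Callable[[], str]]


def run_pipeline(stages: List[Stage]) -> dict:
    """Run stages in order and stop at the first one that escalates."""
    report = {"completed": [], "fixed": [], "escalated": None}
    for name, step in stages:
        outcome = step()
        if outcome == "escalate":
            report["escalated"] = name
            break  # hand the ambiguous case to the human and stop
        if outcome == "fixed":
            report["fixed"].append(name)
        report["completed"].append(name)
    return report


# Stand-in stages; a real stage would call an agent or a test runner.
stages = [
    ("recover_intent", lambda: "ok"),
    ("rebase_on_main", lambda: "ok"),
    ("adversarial_review", lambda: "fixed"),
    ("e2e_with_evidence", lambda: "ok"),
    ("update_docs", lambda: "ok"),
    ("open_pr", lambda: "ok"),
]
report = run_pipeline(stages)
print(report)
```

The `fixed` list corresponds to the "issues found and fixed by the pipeline" that the PR description later surfaces.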
Each PR presents the original intent, a change summary, test evidence, the issues found and fixed by the pipeline, and a risk assessment. Review strategy: the risk assessment determines how much effort to invest. For low-risk PRs Kun barely looks at the diff, because the pipeline has already covered it; only high-risk PRs get a deep dive.
Work distribution insight: Time is spent at the beginning (clarifying requirements with Lavish) and at the end (quality gate); the middle is entirely handed to AI. The more you free up the middle, the more parallelism.
Long Runs: Good-Night-Have-Fun
This solves the problem of “how do I let the agent keep working while I sleep for 8 hours”: give it a goal and stopping conditions, and let it iterate in a loop.
Compared to Claude Code / Codex’s /go, its advantage is precise limits (token cap, iteration cap, stop condition), so you avoid waking up to find your weekly quota exhausted.
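The three limits mentioned here (token cap, iteration cap, stop condition) can be sketched as a bounded loop. This is a minimal sketch, not the tool's actual code: `run_bounded`, `RunResult`, and the `step` callback signature are assumptions made for illustration, and a real `step` would invoke an agent and report its token usage.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class RunResult:
    iterations: int
    tokens_used: int
    reason: str  # "goal_met", "iteration_cap", or "token_cap"


def run_bounded(
    step: Callable[[int], Tuple[int, bool]],
    max_iterations: int,
    max_tokens: int,
) -> RunResult:
    """Call step(i) until the goal is met or a cap is reached.

    step returns (tokens_spent, goal_met) for one iteration. The token cap
    is checked before each iteration, so it can be overshot by at most one
    iteration's spend.
    """
    tokens_used = 0
    for i in range(max_iterations):
        if tokens_used >= max_tokens:
            return RunResult(i, tokens_used, "token_cap")
        spent, goal_met = step(i)
        tokens_used += spent
        if goal_met:
            return RunResult(i + 1, tokens_used, "goal_met")
    return RunResult(max_iterations, tokens_used, "iteration_cap")


# Stand-in step: spends 100 tokens per iteration, succeeds on the third.
result = run_bounded(lambda i: (100, i == 2), max_iterations=10, max_tokens=10_000)
print(result)
```

Recording the stop reason matters as much as the caps themselves: it tells you in the morning whether the run finished or merely ran out of budget.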
Parallelism: treehouse + worktree
Pain points of git worktree: naming, tracking state, and manual cleanup all add up to cognitive debt. With treehouse, running it automatically lands you in a free worktree, closing the tab automatically releases it, and treehouse status gives an overview.
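The acquire / release / status behavior described above can be modeled as a small pool. This is an in-memory sketch only: the `WorktreePool` class and its method names are hypothetical, it makes no git calls, and it is not how treehouse itself is implemented.

```python
from typing import Dict, Iterable, Optional


class WorktreePool:
    """Tracks which worktree slots are free and which session holds each."""

    def __init__(self, names: Iterable[str]):
        # None means the slot is free.
        self._owner: Dict[str, Optional[str]] = {name: None for name in names}

    def acquire(self, session: str) -> str:
        """Give the session the first free worktree."""
        for name, owner in self._owner.items():
            if owner is None:
                self._owner[name] = session
                return name
        raise RuntimeError("no free worktree")

    def release(self, session: str) -> None:
        """Free every worktree held by the session (e.g. when its tab closes)."""
        for name, owner in self._owner.items():
            if owner == session:
                self._owner[name] = None

    def status(self) -> Dict[str, str]:
        """Overview of every slot: the owning session, or 'free'."""
        return {name: owner or "free" for name, owner in self._owner.items()}


pool = WorktreePool(["wt-1", "wt-2"])
first = pool.acquire("tab-a")
second = pool.acquire("tab-b")
pool.release("tab-a")
print(pool.status())
```

Because slots are reused rather than created per task, nobody has to invent names or remember to clean up.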
First Mate: The Orchestrator
As parallel sessions increase, context-switching fatigue sets in.
First Mate is a meta-agent that manages all your crew members: you only speak to it; it automatically breaks down parallel subtasks, calls treehouse to create worktrees, runs no-mistakes, and prepares PRs.
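The fan-out part of this idea, one entry point dispatching several subtasks in parallel and collecting the results, can be sketched with the standard library. The `orchestrate` function and the stand-in worker are hypothetical; the real First Mate is an agent that also plans the breakdown, which this sketch does not attempt.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def orchestrate(
    subtasks: List[str],
    worker: Callable[[str], str],
    max_parallel: int = 4,
) -> Dict[str, str]:
    """Run worker on each subtask concurrently and map subtask -> result."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        # pool.map preserves input order, so results line up with subtasks.
        results = list(pool.map(worker, subtasks))
    return dict(zip(subtasks, results))


# Stand-in worker; a real one would start an agent session in its own worktree.
def fake_worker(subtask: str) -> str:
    return f"PR ready for: {subtask}"


outcome = orchestrate(["add login page", "fix flaky test"], fake_worker)
print(outcome)
```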
Key observation: After using First Mate, the bottleneck shifts from “agent execution capability” to “what you want it to do” – the captain’s value shifts to strategy: understanding users, researching competition, drawing a better “treasure map.”
TL;DR
Former Meta/Microsoft/Atlassian staff engineer Kun shares his agentic engineering workflow: terminal- and tmux-centric, training AI crew members with global and project-level memory files, enabling delivery of 40–50 tested production PRs daily.
Core Gear: Terminal, tmux, and Editor
The whole workflow hinges on almost never taking your hands off the keyboard, which maintains flow. Kun does everything in the terminal because it is inherently keyboard-driven, with no mouse interruptions. Terminal configs are also reusable across devices, even on a phone.
Terminal Emulator: WezTerm
WezTerm is a high-performance terminal emulator with 26k GitHub stars. Kun chose it for two reasons:
- Truly cross-platform: identical experience on Windows, Mac, Linux.
- Highly customizable: Lua scripts configure nearly every behavior, and changes such as a new color scheme are reloaded immediately.
His dotfiles include a wezterm.lua file where conditional logic can make configs dynamic and flexible. His personal preference is the “Rosé Pine Moon” color scheme, never changed.
tmux: Terminal Multiplexer and Persistence
tmux (terminal multiplexer) is the skeleton of the workflow. It splits the terminal into any number of panes and supports multiple windows. Kun commonly uses three panes: one for the agent, one for the editor, and one for running his own commands.
The most powerful feature is session persistence: detach with a shortcut, reattach later to the exact same state. He can connect from a laptop or phone to the same session, enabling true “work anywhere.”
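As a rough sketch of the three-pane, detach-and-reattach setup described above, the following builds the tmux commands for a detached session with three panes. The session name and the function are hypothetical, and which pane holds the agent, the editor, or the shell is a personal choice; the code only prints the commands, and running them (for example with `subprocess.run(cmd, check=True)`) assumes tmux is installed.

```python
import shlex
from typing import List


def three_pane_commands(session: str) -> List[List[str]]:
    """Return tmux commands that create a three-pane session and attach to it."""
    return [
        # Create the session in the background so it survives this terminal.
        ["tmux", "new-session", "-d", "-s", session],
        # Split left/right, then split the new pane top/bottom: three panes.
        ["tmux", "split-window", "-h", "-t", session],
        ["tmux", "split-window", "-v", "-t", session],
        # Attach; the same command reattaches later from another device.
        ["tmux", "attach-session", "-t", session],
    ]


commands = three_pane_commands("crew")
for cmd in commands:
    print(shlex.join(cmd))
```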
Customizing tmux configuration is essential. Kun’s config includes years of accumulated keybindings (now muscle memory) plus styling and behavior settings. He recommends spending time configuring tmux to suit your habits.
Text Editor: Neovim
Neovim is a modern version of vim whose core goal is to never leave the keyboard. Kun demonstrates a few operations:
- j/k to move up/down, dd to delete a line, u to undo.
- Relative line numbers on the left: e.g., if the current line is 238, the previous line shows 1. To jump 11 lines up to “set environment”, press 11k.
- Plugins provide quick search: Space+S to grep the codebase, Space+F to find files by name (e.g., type “flake” to immediately locate the flake file).
Kun describes the feeling in vim as “a bird flying.” There’s a learning curve, but once muscle memory forms, efficiency far exceeds mouse.
Agent Crew: Four Frameworks and Selection
Kun regularly uses four agent frameworks:
- Claude Code (preferred, with an Anthropic subscription): the most reasonable out-of-the-box experience and feature-rich, but it has occasional small bugs and is less customizable.
- Codex CLI: written in Rust and feels smoother; it is open source, which lets the agent inspect its own source for workarounds, but it offers fewer add-ons and less customizability.
- Pi Programming Agent: minimal, highly extensible, good for tinkerers.
- Open Code: excellent terminal UI, integrates with almost any model, richer feature set out-of-the-box than Pi, a good model-agnostic choice.
In the video demo, Kun uses Claude Code for audience convenience, but emphasizes his workflow is agent-agnostic because tools change fast – today’s best model may be different next month.
Training the Crew: Memory Files and Skills
Newly recruited agent crew members know nothing about how the ship works. They need onboarding via memory files and skills.
Global Memory File
The global memory file stores personal preferences and general rules, loaded into the system prompt of every project, every agent session. So content must be lean to avoid consuming too many tokens. Kun’s global memory file is only 27 lines.
Example rules:
- “Never use em dash” – AI models default to em dash instead of regular hyphen, but Kun finds it mechanical and dislikes it.
- “When making technical decisions, don’t overvalue development cost” – he explains an interesting phenomenon: if you ask a frontier model to estimate build time for a 3D first-person shooter, it outputs “days/weeks/months”; but let the agent build it, and it returns a playable version in minutes. This happens because models are trained on human data and don’t know AI writes code much faster than humans, thus over-preferring “cheap” solutions (often unscalable or hard to maintain). This rule corrects that bias.
- “When doing bug fixes, always first reproduce the bug in an end-to-end environment” – AI defaults to writing unit tests, but unit tests often don’t cover actual user behavior; end-to-end tests are more reliable.
There’s also a link to an “interesting insights” blog post not elaborated in the video.
Project-Level Memory File
Each project can have an independent memory file, e.g., highbit project’s claude.md or agents.md (shared via symlink). Content is more detailed, including project background, repo layout, terminology, component workings, end-to-end testing methods, and conventions.
Kun’s approach: every time the agent makes a mistake, he corrects it and asks it to remember, so the lesson is stored in the file. Over time, the crew gets smarter. No fancy memory system is needed; one markdown file is enough.
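The "one markdown file" approach can be illustrated with a small helper that appends a lesson as a bullet and skips duplicates. In the workflow described here the agent edits the file itself; this function, its name, the "Lessons learned" heading, and the example lesson are all assumptions made purely to show the shape of the idea.

```python
import tempfile
from pathlib import Path


def remember(memory_file: Path, lesson: str, heading: str = "## Lessons learned") -> bool:
    """Append a lesson as a bullet at the end of the file.

    Returns False (and writes nothing) if the same lesson is already recorded.
    The heading is added once if the file does not contain it yet.
    """
    text = memory_file.read_text(encoding="utf-8") if memory_file.exists() else ""
    bullet = f"- {lesson.strip()}"
    lines = text.splitlines()
    if bullet in lines:
        return False
    if heading not in lines:
        separator = "\n\n" if text.strip() else ""
        text = text.rstrip("\n") + separator + heading + "\n"
    memory_file.write_text(text.rstrip("\n") + "\n" + bullet + "\n", encoding="utf-8")
    return True


# Usage against a throwaway file (hypothetical file name and lesson).
memory_path = Path(tempfile.mkdtemp()) / "AGENTS.md"
remember(memory_path, "Run the e2e suite before opening a PR")
print(memory_path.read_text(encoding="utf-8"))
```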
But project-level memory files can become bloated. To improve efficiency, conditional information that isn’t always needed (e.g., E2E testing instructions) can be moved out of memory and converted into a skill. That way, the skill is only loaded when the relevant task is at hand, saving tokens.
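The token saving comes from loading only a short description up front and reading the full instructions on demand. Here is a minimal sketch of that lazy-loading pattern; the `Skill` and `SkillRegistry` classes and the example skill are hypothetical and do not reflect how any particular agent framework stores skills.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional


@dataclass
class Skill:
    name: str
    description: str            # short text that is always visible
    loader: Callable[[], str]   # fetches the full instructions on demand
    _body: Optional[str] = field(default=None, repr=False)

    def body(self) -> str:
        """Load the full content on first use, then reuse the cached copy."""
        if self._body is None:
            self._body = self.loader()
        return self._body


class SkillRegistry:
    def __init__(self) -> None:
        self._skills: Dict[str, Skill] = {}

    def register(self, skill: Skill) -> None:
        self._skills[skill.name] = skill

    def index(self) -> str:
        """The cheap part: one line per skill, suitable for every prompt."""
        return "\n".join(f"{s.name}: {s.description}" for s in self._skills.values())

    def use(self, name: str) -> str:
        """The expensive part: full instructions, read only when needed."""
        return self._skills[name].body()


load_count = []


def load_e2e() -> str:
    load_count.append(1)  # record each real load for the demonstration
    return "Full end-to-end testing instructions would go here."


registry = SkillRegistry()
registry.register(Skill("e2e", "How to run end-to-end tests", load_e2e))
print(registry.index())
```

Until `use("e2e")` is called, only the one-line index entry costs anything.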
Conclusion
Kun’s workflow emphasizes “you are the captain, the agent is your crew.” Through terminal-centricity, memory file training, and skill reuse, he consistently ships 40–50 production PRs daily. The method focuses on fundamental concepts and can be adapted to different agent frameworks and models.
(The video continues with collaborating with a single crew member, multi-crew collaboration, and recruiting a “First Mate” to handle chores – this transcript does not fully cover that part; watch the original video for the complete flow.)
Source: YouTube video link (https://www.youtube.com/watch?v=iQyg-KypKAA)
Kun Chen (@kunchenguid): many people asked me to make a video about my complete agentic engineering workflow
excited to share it’s finally here!!!
it took me about 20 hours in total to record this 45 minutes of walkthrough - it covers everything i do to ship production quality code at an average 40+