@linghucong: https://x.com/linghucong/status/2068860590966321370

X AI KOLs Timeline 06/22/26, 12:56 AM Tools

codex autonomous-agent goal-mode prompt-engineering best-practices long-running-task

Summary

This article shares hands-on experience using Codex’s /goal mode for long-running, unattended programming: how to write effective prompts, how to use persistent project memory to keep the agent from drifting, and the key settings and caveats.

https://t.co/iHEQihw7ba

Original Article


Cached at: 06/22/26, 01:41 AM

I Let Codex Run All Night: /goal Long-Run Battle Report and the Pitfalls I Stepped Into

25 hours, 13 million tokens, 30,000 lines of code, completely unattended. This isn’t a marketing number from a launch event; it’s what Codex’s /goal can actually do. The catch: you need to know how to feed it.

Let’s Look at Those Numbers First

25 hours. Approximately 13 million tokens. About 30,000 lines of code. Zero human supervision throughout.

This is the result of a public experiment: set GPT-5.3-Codex to Extra High reasoning, point it at an empty repository with a single sentence, “Build a design tool from scratch,” and let it run. It planned, wrote code, ran validations, hit bugs and fixed them on its own, and kept going for a full day and night. The most striking part: after more than 20 hours, it still hadn’t gone off the rails and was still aligned with the spec.

Behind this is Codex’s /goal mode, which the OpenAI folks internally call the “Ralph loop.” I tried it out this weekend, handing over a small project that had been dragging on for a while.

The verdict: it really works, but whether it produces results depends 90% on how you feed it in the first half hour; the model’s intelligence matters far less.

What Exactly Is /goal?

When you normally use Codex or Claude Code, the rhythm goes like this: you say one thing, it does one step, then it pauses for your next instruction. It’s question and answer: you’re the driver, it’s the co-pilot.

/goal flips that. You give it a goal, and it enters its own loop: plan the next step, execute it, check its own output, correct itself if it catches an error, and continue until the goal is met or until it hits a wall it can’t climb over on its own.

In other words, you go from “driver” to “client.” You lay down the requirements, then go to sleep.
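The loop described above can be sketched in a few lines of Python. This is a conceptual illustration, not Codex’s implementation: the four callables (`plan_next_step`, `execute`, `verify`, `goal_met`) are hypothetical stand-ins for what the agent does internally, and `max_steps` and `max_retries` are assumed safety limits rather than real /goal settings.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class LoopResult:
    status: str                                   # "done", "blocked", or "out_of_budget"
    log: list[str] = field(default_factory=list)  # one entry per attempt


def run_goal_loop(
    plan_next_step: Callable[[list[str]], Optional[str]],  # hypothetical; None means "stuck"
    execute: Callable[[str], str],                          # hypothetical; returns the step's output
    verify: Callable[[str, str], bool],                     # hypothetical; checks one step's output
    goal_met: Callable[[list[str]], bool],                  # hypothetical; checks the whole goal
    max_steps: int = 100,                                   # assumed cap, not a Codex option
    max_retries: int = 3,                                   # assumed cap, not a Codex option
) -> LoopResult:
    log: list[str] = []
    for _ in range(max_steps):
        if goal_met(log):
            return LoopResult("done", log)
        step = plan_next_step(log)
        if step is None:
            # A wall it can't climb over on its own: stop and report back.
            return LoopResult("blocked", log)
        for attempt in range(1, max_retries + 1):
            if verify(step, execute(step)):  # check its own output
                log.append(f"ok: {step}")
                break
            log.append(f"retry {attempt}: {step}")  # caught an error, try again
        else:
            return LoopResult("blocked", log)  # retries exhausted
    return LoopResult("done" if goal_met(log) else "out_of_budget", log)


if __name__ == "__main__":
    tasks = ["scaffold project", "write tests", "implement feature"]

    def completed(log: list[str]) -> int:
        return sum(entry.startswith("ok:") for entry in log)

    result = run_goal_loop(
        plan_next_step=lambda log: tasks[completed(log)] if completed(log) < len(tasks) else None,
        execute=lambda step: f"ran {step}",
        verify=lambda step, output: output == f"ran {step}",
        goal_met=lambda log: completed(log) == len(tasks),
    )
    print(result.status, result.log)
```

Run as a script, the toy example prints `done` followed by three `ok:` entries. The `blocked` exit mirrors the prose: when no next step can be planned, the loop stops and reports instead of spinning.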

Sounds great, right? That’s what I thought at first. Then I hit my first snag.

My First Snag: The Prompt Wasn’t Enough

The first time I ran it, I wrote what I thought was a pretty detailed goal description—seven or eight lines. I typed /goal, made a cup of tea, came back, and… the output looked barely different from what I’d get prompting it step by step in normal mode.

Then I stumbled on a developer’s quote that made me laugh because it was so spot-on: “Any /goal prompt I write by hand is never good enough. The output looks the same as a normal prompt.”

What’s the root cause? Humans inherently underestimate how much an agent needs to know. We carry all sorts of default assumptions in our heads, the “of course that goes without saying” kind, and the agent knows none of them.

After watching it run for an hour, I was embarrassed to realize I’d left out at least three constraints, two acceptance criteria, and an entire architectural assumption. I thought I’d been clear, but I’d only said what I already understood.

Lifesaver Trick #1: Let It Write the Prompt for You

After that failure, I wised up. The most useful technique is meta-prompting.

Don’t write that /goal prompt yourself. First, have the model expand your rough idea into a proper, full /goal prompt. Throw a rough thought at it: “I want to build X. Write this into a complete goal description for an autonomous agent, and list all the things you need me to clarify.”

It will then ask you back: What tech stack? How should this edge case be handled? What counts as “done”? It forces you to surface all those unspoken defaults in your head.

After a round of Q&A, the prompt in your hands is ten times better than anything you could have drafted alone. A five-minute back-and-forth saves hours of derailment later. I now do this step every time.
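The meta-prompting step is easy to keep as a reusable template. Below is one possible phrasing, sketched in Python; the wording, the `build_meta_prompt` name, and the default clarification topics are illustrative assumptions based on the questions mentioned above (tech stack, edge cases, what counts as “done”), not an official Codex format. The output is meant for an ordinary chat turn before you start /goal.

```python
# Illustrative default topics the model should ask about; extend as needed.
DEFAULT_TOPICS: tuple[str, ...] = (
    "Which tech stack and versions should be used?",
    "How should edge cases and errors be handled?",
    "What counts as done, and how is each part verified?",
    "What is explicitly out of scope?",
)


def build_meta_prompt(rough_idea: str, topics: tuple[str, ...] = DEFAULT_TOPICS) -> str:
    """Turn a one-line idea into a request for a full /goal description plus open questions."""
    idea = rough_idea.strip()
    if not idea:
        raise ValueError("rough_idea must not be empty")
    lines = [
        f"I want to build: {idea}",
        "Write this into a complete goal description for an autonomous agent,",
        "including constraints, architectural assumptions, and acceptance criteria.",
        "Before writing it, list everything you need me to clarify, at minimum:",
    ]
    lines.extend(f"- {topic}" for topic in topics)
    return "\n".join(lines)


if __name__ == "__main__":
    print(build_meta_prompt("a CLI that converts CSV files to JSON"))
```

Keeping the topics as a parameter makes it easy to add project-specific questions without rewriting the template.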

Lifesaver Trick #2: Give It an External Brain

The second trick is even more critical, and it’s the real reason the 25-hour experiment worked: persistent project memory.

The logic is simple. When an agent runs for 25 hours, its context gets swapped out over and over, and it will “forget” what you said at the start. So information can’t live only in the conversation; it has to live on disk.

My current approach is to lay down a few markdown files in the project before starting: spec.md for the rules it must follow, plan.md for the broken-down steps, todo.md as a task list with a verifiable completion criterion for each item, and status.md where it logs its progress.
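Laying the files down can be scripted so every run starts from the same skeleton. A minimal sketch, assuming you want placeholder contents that you then edit by hand: the starter text, the `verify:` marker on todo items, and the `scaffold_memory` name are all illustrative. The file names themselves are just a convention; the /goal prompt is what tells the agent to use them.

```python
import tempfile
from pathlib import Path

# Illustrative starter contents; replace them with your project's real rules and tasks.
MEMORY_FILES = {
    "spec.md": "# Spec\n\nRules the agent must follow (stack, style, constraints).\n",
    "plan.md": "# Plan\n\n1. Break the goal into ordered steps here.\n",
    "todo.md": "# Todo\n\n- [ ] Example task | verify: `pytest tests/test_example.py` passes\n",
    "status.md": "# Status\n\nLog of completed steps, newest last.\n",
}


def scaffold_memory(root: Path, overwrite: bool = False) -> list[Path]:
    """Create the external-memory files under root and return the paths written."""
    root.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for name, content in MEMORY_FILES.items():
        path = root / name
        if path.exists() and not overwrite:
            continue  # never clobber notes left by a previous run
        path.write_text(content, encoding="utf-8")
        written.append(path)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for created in scaffold_memory(Path(tmp)):
            print("created", created.name)
```

Skipping existing files by default matters for overnight work: rerunning the script the next evening should not wipe the status log from the previous run.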

Then, inside /goal, I explicitly tell it: after each step, go back and reread these files, and record what was finished in status.md. That way, even if it runs all night and cycles through its context ten times, the files on disk act as its “external memory.” It can always look back and won’t wander off track.

A 14-hour device-driver case study relied on exactly this setup as well.
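The “reread, then log” contract can be made explicit with two small helpers, for example in a wrapper script or when inspecting a run by hand. In a real /goal session the agent does this itself because the prompt says so; `load_memory`, `append_status`, and the one-line `timestamp | step | result` log format are hypothetical choices for illustration.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

# Order in which the memory files are re-read (assumed; spec first so rules come first).
MEMORY_ORDER = ("spec.md", "plan.md", "todo.md", "status.md")


def load_memory(root: Path) -> str:
    """Concatenate the memory files so they can be re-read before the next step."""
    parts = []
    for name in MEMORY_ORDER:
        path = root / name
        text = path.read_text(encoding="utf-8") if path.exists() else "(missing)"
        parts.append(f"===== {name} =====\n{text.rstrip()}")
    return "\n\n".join(parts)


def append_status(root: Path, step: str, result: str) -> str:
    """Append one timestamped line to status.md after a step finishes; return that line."""
    stamp = datetime.now(timezone.utc).strftime("%Y-%m-%d %H:%M UTC")
    line = f"- {stamp} | {step} | {result}\n"
    with (root / "status.md").open("a", encoding="utf-8") as f:
        f.write(line)
    return line


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        append_status(root, "set up project skeleton", "tests pass")
        print(load_memory(root))
```

Appending rather than rewriting status.md keeps a full trail, which is exactly what you want to read in the morning.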

The Three-Piece Set for Extended Runs

Combine the two tricks above, add one more, and you have the three hard requirements for /goal to truly run long.

First, a “don’t stop” instruction. You must write in black and white in the prompt: “Do not stop to wait for me. Keep going until everything is complete.” Without this line, it will habitually stop after two steps to ask you questions.

Second, a todo list with verification steps. Listing tasks isn’t enough. Each task needs a note on how to verify it is done: which test to run, which output to check. Without verification anchors, it will cheerfully declare “done” when nothing has actually compiled.

Third, high reasoning mode. For complex tasks, always use High or xHigh. The 25-hour experiment used Extra High. With lower settings, the model drifts further and further on long tasks.

Miss any one of these three, and a long run will likely die midway.
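Before launching, the three requirements can be checked mechanically. This is a rough sketch: the phrase list for the “don’t stop” line, the assumption that each todo item is a markdown checklist line marking its check with `verify:`, and the effort names `high`/`xhigh` are conventions chosen for illustration, so adjust them to however you actually write your prompt and todo.md.

```python
# Phrases that count as a "don't stop" instruction (assumed; match your own wording).
DONT_STOP_PHRASES = ("do not stop", "don't stop", "don’t stop", "keep going")
ALLOWED_EFFORT = {"high", "xhigh"}


def preflight(goal_prompt: str, todo_md: str, effort: str) -> list[str]:
    """Check the three requirements; an empty list means the run looks ready."""
    problems: list[str] = []
    prompt = goal_prompt.lower()
    if not any(phrase in prompt for phrase in DONT_STOP_PHRASES):
        problems.append("goal prompt has no 'do not stop / keep going' instruction")
    # Treat markdown checklist lines ("- [ ]" or "- [x]") as todo items.
    items = [line.strip() for line in todo_md.splitlines() if line.lstrip().startswith("- [")]
    if not items:
        problems.append("todo.md has no checklist items")
    for item in items:
        if "verify:" not in item.lower():
            problems.append(f"todo item without verification: {item}")
    if effort.lower() not in ALLOWED_EFFORT:
        problems.append(f"reasoning effort '{effort}' is not high or xhigh")
    return problems


if __name__ == "__main__":
    prompt = "Build the tool in spec.md. Do not stop to wait for me. Keep going until everything is complete."
    todo = "- [ ] Parser | verify: pytest tests/test_parser.py passes\n- [ ] CLI entry point\n"
    for problem in preflight(prompt, todo, "medium"):
        print(problem)
```

With the sample inputs it flags the CLI item (no verification) and the `medium` effort level. A string match is crude and no substitute for rereading the prompt yourself, but it catches exactly the omissions this section warns about.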

A Pitfall You Must Know: It’s Not a True Perpetual Motion Machine

After all this praise, I need to pour cold water so you don’t wake up at midnight and find nothing got done.

The biggest pitfall: Codex automations are essentially local cron jobs. In plain English, if your computer shuts down or the Codex app closes, it stops running. There’s no cloud server babysitting it 24/7 for you. I almost fell into this on my first night: I thought I could close the laptop and go to sleep, but luckily I checked the docs just in time. In the end, I left the machine on all night with the app open, and that’s how it finished.

The second pitfall: It can’t determine “done” on its own, nor can it understand fuzzy requirements. That’s why the first two tricks are so important. If your goal definition isn’t thorough, it either goes into an infinite loop or silently veers off to Antarctica.

Honestly, even as I write this, I hesitate a little: the bar for this workflow isn’t low. It’s not a “say one thing and lie back while it works” deal. It’s more like “spend half an hour being a product manager, pin down every requirement, and only then dare to let go.”

Ready-to-Use Recipe: You Can Start Right Away

Enough talk. Here’s the shortest path you can follow directly.

1. Set up the environment. Install the Codex CLI globally via npm, set your OpenAI API key as an environment variable, create a project directory, and run git init (strongly recommended, so you can roll back if things go wrong).
2. Don’t write the goal yourself; let the model write it first. Throw your rough idea at it, ask it to expand that into a full goal description, and have it question you about anything that’s unclear. Answer those questions.
3. Lay out external memory. Create spec.md, plan.md, todo.md, and status.md in the project. Each todo item must include a verification criterion.
4. Start an interactive session and type /goal followed by the polished goal description. Make sure it includes “keep going without stopping, read those .md files after each step, and update status.md after finishing.”
5. Set reasoning to High or xHigh.
6. Keep your machine and the app on all night. In the morning, check status.md and git log to verify the results.
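The morning check in the last step can be reduced to one command. A convenience sketch, assuming `git` is on your PATH and the run has been appending to status.md in the project root; `morning_report` and its helpers are hypothetical names.

```python
import subprocess
from pathlib import Path


def tail_lines(path: Path, n: int = 10) -> list[str]:
    """Return the last n non-empty lines of a text file, or [] if it does not exist."""
    if not path.exists():
        return []
    lines = [ln for ln in path.read_text(encoding="utf-8").splitlines() if ln.strip()]
    return lines[-n:]


def recent_commits(repo: Path, n: int = 20) -> list[str]:
    """Return `git log --oneline` for the last n commits, or [] if git fails."""
    try:
        out = subprocess.run(
            ["git", "log", "--oneline", f"--max-count={n}"],
            cwd=repo, capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return []  # git missing, bad directory, or not a repository
    return out.splitlines()


def morning_report(repo: Path) -> str:
    """Put the agent's own progress log next to what actually got committed."""
    status = tail_lines(repo / "status.md") or ["(status.md missing or empty)"]
    commits = recent_commits(repo) or ["(no commits found)"]
    return "\n".join(["Last status entries:", *status, "", "Recent commits:", *commits])


if __name__ == "__main__":
    print(morning_report(Path(".")))
```

Run it from the project directory and compare the two halves: a step the status log calls finished but that never shows up in the commits is the first thing to investigate.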

Who Should Try It, and Who Should Hold Off

Suitable tasks: those with clear boundaries and well-defined acceptance criteria, such as scaffolding, bulk refactoring, building a small tool from scratch to a spec, or writing a large suite of tests. For this kind of work, where the direction is set and what’s left is grunt work, one overnight /goal run can equal a week of manual effort.

Tasks to avoid for now: unclear requirements, highly exploratory work, or anything where decisions need to be made on the fly. If you let go on those, it will definitely go astray; you’re better off sitting there doing Q&A.

My current workflow: during the day, I use normal conversation mode with Claude Code or Codex to explore and set direction. Once the direction is locked and I can write acceptance criteria, I package it into a /goal and throw it at Codex to run overnight. Exploration stays with humans; grunt work goes to machines.

I’m still tuning this setup, and I haven’t fully worked out the most efficient way to write those persistent memory files. But one thing I’m now certain of: /goal isn’t about slacking off; it’s about moving your work from “typing code” to “nailing down requirements.” The more rigorous you are in the first half hour, the bigger the payoff over the entire night.

If you’ve also let it run all night, what did your machine look like in the morning? Come back and tell me.
