@freeman1266: https://x.com/freeman1266/status/2056351092804297028

X AI KOLs Timeline 05/18/26, 12:28 PM Tools

Summary

This article shares how to use Codex to automatically write code unattended at night, detailing the types of tasks suitable for night execution, precautions, and a practical task template along with an AGENTS.md configuration guide.

https://t.co/GswBYsHyVg

Original Article

Cached at: 05/20/26, 06:27 AM

Let Codex Write Code While You Sleep

When I first started automating with Codex, I had a simple fantasy:

Before bed, toss it a requirement. Wake up the next morning to find the code written, tests passing, and a PR ready.

Sounds great.

But after actually running it for six months, my conclusion is more restrained: Codex can write code while you sleep, but only if you treat it not as a senior engineer pulling an all-nighter, but as a tireless executor whose boundaries you need to clearly define upfront.

This article isn’t about concepts or documentation. It’s about one thing: how to make Codex actually produce code while you sleep, instead of leaving you a pile of half-finished work, merge conflicts, and permission popups the next morning.

The Worst Thing to Ask Before Bed: “Just Optimize It”

Many people’s first attempt at nightly automation looks like this:

Help me optimize this project.

Or:

Review the code and see how to optimize it.

These kinds of tasks are unreliable during the day, and even worse at night.

While you’re asleep, Codex has to guess every time it hits an ambiguity: Which files can I modify? Can I install dependencies? Should I continue if tests fail? Should I refactor legacy code if I find an issue? Which UI style counts as “good”?

What you wake up to is often not a pleasant surprise but a long list of changes: half of them useful, half there for reasons you can’t reconstruct.

A task suitable for Codex before bed must satisfy three conditions:

Input is clear.
Modification scope is clear.
Verification commands are clear.

In other words, don’t make wishes before bed. Assign work orders.

A Real Example: Let It Fill in ShipReady Security Checks at Night

Take my ShipReady project as an example.

It’s a SaaS landing page audit MVP: users enter a URL or manually paste page copy, the system generates a 12-item audit report; users answer 3 positioning questions to unlock more specific Hero and CTA rewrite packs; after payment, they can generate a public report link.

This project isn’t huge, but it has several tasks well-suited for Codex to do at night:

/api/share must confirm the user has unlocked; unpaid users must not generate public reports.
The public report outputs the user’s page title, hero, evidence, and recommendation, so every one of these fields must be HTML-escaped consistently.
When URL fetching fails, the frontend must have a manual paste fallback.
Default memory storage might lose data after serverless cold starts; the README must clearly document KV/Upstash production configuration.
The current npm run check only does Node syntax checks; there are no real business tests.

These aren’t vague statements like “make the project better.” They’re specific risk points with clear file scopes and clear acceptance criteria.

Before bed, I wouldn’t write:

Optimize ShipReady’s backend code.

Instead, I’d write:

Audit ShipReady backend to ensure public report security.

Audit scope:
Review the following files: src/app.js, src/audit.js, src/store.js, public/app.js, README.md.

Modification limits: Modify only test cases or, if necessary, a small amount of validation logic.
Dependency limits: Do not add any runtime dependencies.
UI limits: Do not change product copy or UI styles.

Core concerns:

Access control: Unpaid users must not be able to create public reports.
Security: Public report HTML must escape user-controlled fields (prevent XSS).
Fault tolerance: When URL fetching fails, the manual paste fallback must be preserved.
Storage: Document the behavior differences between memory storage and KV storage.

Verification:

Run npm run check
Summarize discovered risks, modified files, and any items requiring human review.

This is a task you can actually run at night. It doesn’t ask Codex to “figure out the whole product.” It only asks it to follow your defined boundaries and check a set of risks thoroughly.
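To make the first two core concerns concrete, here is a minimal sketch of what Codex would be checking for. The article does not show ShipReady's actual code, so `escapeHtml`, `canCreatePublicReport`, `renderFinding`, and the `unlocked` field are all assumed names used only for illustration.

```javascript
// Hypothetical helpers: ShipReady's real function and field names are not shown in the article.

// Escape the five characters that matter when inserting text into HTML.
// "&" must be replaced first so the entities added afterwards are not escaped again.
function escapeHtml(value) {
  return String(value)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Access control: only an audit that has been unlocked (paid) may be shared publicly.
function canCreatePublicReport(audit) {
  return Boolean(audit) && audit.unlocked === true;
}

// Render one report row with every user-controlled field escaped.
function renderFinding(finding) {
  return `<li><strong>${escapeHtml(finding.evidence)}</strong>: ${escapeHtml(finding.recommendation)}</li>`;
}
```

The point of the work order is that Codex verifies rules of this shape hold everywhere they should, rather than redesigning how the report is built.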

Three Types of Good Night Automation Tasks

Not all code tasks are suitable for running while you sleep. The ones I trust most fall into three categories.

Category 1: Read-Only Scan, Deliver Report in the Morning

This has the highest success rate.

For example, let Codex scan the repository every night:

Scan the codebase for TODOs, risky public endpoints, missing tests, deprecated APIs, and contradictory README descriptions. Don’t modify any code — just output a prioritized report.

Such tasks don’t change code, touch permissions, or create conflicts. Their value is to focus your attention the next morning.

In a project like ShipReady, it can reliably remind you:

package.json’s npm run check is only a syntax check.
/api/share and /api/rewrite are payment-related paths.
renderPublicReport is the public HTML output point.
src/store.js has two storage paths: memory and KV.
URL fetch failure depends on the manual paste fallback.

When you wake up, you only need to decide which issues are worth addressing.
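A read-only scan can be as simple as the sketch below. It is not what Codex runs internally; it only illustrates the shape of a "scan, don't modify" task. The marker list, the priority order, and the function names are assumptions.

```javascript
// Read-only scan sketch: takes file contents as input and never writes anything.
// The marker list and priorities are assumptions for illustration.
const MARKERS = [
  { label: "FIXME", priority: 1 },
  { label: "TODO", priority: 2 },
  { label: "@deprecated", priority: 3 },
];

// files: object mapping a file path to its text content.
function scanFiles(files) {
  const findings = [];
  for (const [path, text] of Object.entries(files)) {
    text.split("\n").forEach((line, index) => {
      for (const marker of MARKERS) {
        if (line.includes(marker.label)) {
          findings.push({ path, line: index + 1, marker: marker.label, priority: marker.priority });
        }
      }
    });
  }
  // Highest priority first, then by path and line number.
  return findings.sort(
    (a, b) => a.priority - b.priority || a.path.localeCompare(b.path) || a.line - b.line
  );
}

// One line per finding, ready to paste into a morning report.
function formatReport(findings) {
  return findings.map((f) => `[P${f.priority}] ${f.path}:${f.line} ${f.marker}`).join("\n");
}
```

Because the functions only read their input, a scan like this cannot create conflicts or touch permissions, which is exactly why this category is the safest.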

Category 2: Small-Scope Test Coverage

This is the most practical.

When asking Codex to write tests at night, keep the scope narrow. Don’t say “add tests to the project.” Say “add tests for this specific module covering these specific behaviors.”

For example:

Add targeted tests for ShipReady’s audit flow.

Coverage:

An audit by an unpaid user cannot create a public report.
An audit by a paid user can create a public report.
When rendering a public report, title, hero, evidence, and recommendation fields must be escaped.
When URL fetching fails, the system should return a manual paste data source with a helpful message.

Notes:

Unless a test reveals a real bug, do not refactor production code.
Run npm run check and the new test command.

The key isn’t the number of tests, but the test targets.

The worst instruction you can give Codex is “improve coverage.” It will write many low-value tests just to bump the number. Make it target the paths that could cause incidents instead.
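As an illustration of a targeted test, the sketch below covers only the URL-fetch fallback. `loadPageCopy` is a hypothetical stand-in for ShipReady's page loader, and the injected `fetchImpl` parameter is an assumption that lets the test simulate a failure without touching the network.

```javascript
// Hypothetical stand-in for ShipReady's page loader; the real one is not shown in the article.
// fetchImpl is injected so a test can simulate a network failure without real requests.
async function loadPageCopy(url, fetchImpl) {
  try {
    const response = await fetchImpl(url);
    if (!response.ok) throw new Error(`HTTP ${response.status}`);
    return { source: "url", text: await response.text() };
  } catch (error) {
    // Fault tolerance: never fail hard, hand control back to the manual paste path.
    return {
      source: "manual",
      text: "",
      message: `Could not fetch ${url} (${error.message}). Please paste the page copy manually.`,
    };
  }
}

// One targeted test for one behavior that could cause an incident.
async function testFallbackOnFetchFailure() {
  const failingFetch = async () => {
    throw new Error("network down");
  };
  const result = await loadPageCopy("https://example.com", failingFetch);
  if (result.source !== "manual") throw new Error("expected manual paste fallback");
  if (!result.message.includes("paste")) throw new Error("expected a helpful message");
  return "ok";
}
```

A test like this names the incident it prevents. That is the difference between a test target and a coverage number.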

Category 3: Documentation and Engineering Hygiene

These tasks are great for running at night.

For example:

Update README.md so a new developer can run, verify, and deploy ShipReady.

Must include:

Local run commands
Verification commands
Vercel routing model
Memory storage limitations
KV / Upstash environment variables
Health check endpoint

Notes:

Do not modify any application code.

The beauty of documentation tasks: even if the result isn’t perfect, it’s easy to review in the morning. It won’t break production logic or introduce complex merge conflicts.

Three Types of Tasks Not to Give Before Bed

Nightly automation usually fails not because Codex is too weak, but because you gave it a task that shouldn’t run unattended.

Category 1: Heavy Product Judgment

For example:

Improve the onboarding experience.

Or:

Increase the conversion rate of the pricing page.

These tasks aren’t impossible for Codex, but they shouldn’t run unattended while you sleep.

They involve product judgment, user understanding, visual tradeoffs, and copy strategy. You can ask Codex to propose solutions, list questions, or do competitive analysis — but don’t let it make large changes unsupervised.

Category 2: Large Cross-Stack Refactors

For example:

Refactor the app’s architecture.

Such tasks tend to have far-reaching ripple effects.

A change to a seemingly simple backend state field can affect the API, frontend rendering, storage structure, public reports, README, and deployment instructions. If you let Codex touch multiple layers while you sleep, you’ll likely spend the morning reading diffs instead of enjoying results.

Nightly tasks should be single-layer, single-goal, and involve few files.

Category 3: Tasks Requiring Real Accounts or Production Permissions

Computer Use is powerful — it can open apps, click buttons, fill forms, and read screens.

But that doesn’t mean you should let it operate real backends, personal email, customer data, or production systems at 3 AM.

My rule is simple:

localhost is testable.
Test accounts are clickable.
Production backends are off-limits by default.
Personal email and chat tools are off-limits by default.
“Always allow” is only for very narrow, very specific actions.

Wrong code can be rolled back. Wrong permissions can cause irreversible actions.
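The "off-limits by default" part of this rule can be written as a default-deny check. The sketch below models only the host portion of the rule. The allowlist contents and the name `isAllowedTarget` are assumptions, and this is not a description of how Computer Use enforces permissions.

```javascript
// Default-deny sketch of the rule above. The host list is an assumption for illustration.
const ALLOWED_HOSTS = new Set(["localhost", "127.0.0.1"]);

// Returns true only for targets that are explicitly listed as safe for unattended runs.
function isAllowedTarget(rawUrl) {
  let url;
  try {
    url = new URL(rawUrl);
  } catch {
    return false; // Unparseable targets are rejected, not guessed at.
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return false;
  // Exact hostname match, so "localhost.example.com" does not slip through.
  return ALLOWED_HOSTS.has(url.hostname);
}
```

Anything not on the list is refused, so forgetting to add a host costs you a failed task, not an irreversible action.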

The Bedtime Task Template

I now use this template consistently:

Goal:

Context:
<Project background, related modules, why this is needed>

Scope:

Files/directories allowed to modify:
Files/directories read-only:
Items explicitly excluded from scope (what NOT to do):

Rules:

Do not add any dependencies unless explicitly required.
Do not modify unrelated UI or copy.
Do not rewrite system architecture.
Preserve all existing behaviors unless a listed bug requires change.

Verification:

Run <verification command>.
If verification cannot be run, explain why.

Final response output:

List files modified.
List commands run.
List remaining risks or decisions requiring human review.

This template may seem verbose, but it saves time the next morning.

You’re not writing a prompt. You’re writing a night shift work order.
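If you keep work orders as structured data, a small pre-flight check can reject a "wish" before it ever reaches Codex. The sketch below assumes a hypothetical object shape (`goal`, `allowedPaths`, `verification`, `readOnly`) that mirrors the template; none of these field names come from Codex itself.

```javascript
// Hypothetical work order shape mirroring the template; the field names are assumptions.
// Returns a list of problems; an empty list means the order passes the basic checks.
function validateWorkOrder(order) {
  const problems = [];
  const isNonEmptyList = (value) => Array.isArray(value) && value.length > 0;

  if (!order.goal || order.goal.trim() === "") {
    problems.push("Goal is empty.");
  }
  // Read-only scans legitimately have nothing they are allowed to modify.
  if (!order.readOnly && !isNonEmptyList(order.allowedPaths)) {
    problems.push("No files or directories are listed as modifiable.");
  }
  if (!isNonEmptyList(order.verification)) {
    problems.push("No verification command is listed.");
  }
  return problems;
}
```

"Help me optimize this project" fails this check on two counts: it names no modification scope and no verification command.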

AGENTS.md: The Employee Handbook for Night Shift Codex

If you really want Codex to write code while you sleep, your repository should include an AGENTS.md.

It’s not a decoration. It’s an employee handbook.

For ShipReady, I’d write:

# AGENTS.md

- This is a Node 18 SaaS landing page audit MVP with no external runtime dependencies.
- After modifying JS files, run `npm run check`.
- Unless explicitly requested, do not add production dependencies.
- Prioritize reviewing and documenting findings in these areas: public HTML escaping, fallback for URL fetch failure, differences between KV storage and memory storage, payment/share state flow, and missing test cases.
- Unless explicitly requested, do not replace the deterministic audit engine with LLM calls.
- Unless the task explicitly requires modifying product copy or frontend work, keep UI copy and styles unchanged.

Memory can remember preferences, but team hard rules should live in the repository.

Otherwise, you assume Codex is following your standards, when it might just be replicating bad habits you left behind during a rush.

Sub-Agent Parallelism: Good for Night Exploration, Bad for Night Modification

When people hear “parallel sub-agents,” they often imagine:

One agent works on the frontend, one on the backend, one writes tests — all working together at night.

Sounds efficient, but it easily backfires.

In real business engineering, frontend, backend, types, configs, and tests often share files. If two agents modify the same file, the one that finishes later overwrites the earlier changes. In the morning, you’ll resolve conflicts before understanding the logic.

I recommend using sub-agents at night like this:

One read-only agent maps API routes.
One read-only agent checks frontend state flow.
One read-only agent finds security risks and missing test points.

Let them explore in parallel, then consolidate reports.

For actual code writing, keep it serial.
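Before letting more than one agent write, you can at least check whether their planned file sets collide. This is a minimal sketch under the assumption that each agent declares its files up front. The `plans` shape and the file names in the usage note are hypothetical.

```javascript
// plans: object mapping an agent name to the list of files it intends to modify.
// Returns an object mapping each contested file to the agents that claim it.
// An empty object means no two agents plan to write the same file.
function findOverlaps(plans) {
  const owners = new Map();
  for (const [agent, files] of Object.entries(plans)) {
    // A Set removes duplicates so one agent listing a file twice is not a conflict.
    for (const file of new Set(files)) {
      if (!owners.has(file)) owners.set(file, []);
      owners.get(file).push(agent);
    }
  }
  const overlaps = {};
  for (const [file, agents] of owners) {
    if (agents.length > 1) overlaps[file] = agents;
  }
  return overlaps;
}
```

For example, if a frontend agent and a backend agent both list a shared `src/types.js`, the result flags that file. That is the signal to run those two tasks one after the other.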

How to Review the Next Morning

Don’t jump straight to the final summary.

My order is:

Look at git diff --stat first — confirm the change scope hasn’t gone out of control.
Then look at test commands and failure messages.
Then read key file diffs.
Finally, read Codex’s summary.

The summary is a clue, not the truth.

If it says “completed” but didn’t run the verification command, the task isn’t complete. If it modified files outside the scope, revert and ask why.

Automation doesn’t eliminate review. It transforms review from “find problems from scratch” to “inspect a converged patch.”
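The first two review steps can be partly mechanized. The sketch below assumes you have the changed paths (for example from `git diff --name-only`) and a record of which commands the run executed. The `run` object shape and the function names are assumptions for illustration.

```javascript
// changedFiles: paths as printed by `git diff --name-only`.
// allowedPaths: exact files, or directory prefixes ending in "/", from the work order's scope.
function findOutOfScope(changedFiles, allowedPaths) {
  return changedFiles.filter(
    (file) =>
      !allowedPaths.some((allowed) =>
        allowed.endsWith("/") ? file.startsWith(allowed) : file === allowed
      )
  );
}

// A run counts as complete only if it stayed in scope and ran every required command.
function reviewRun(run) {
  const outOfScope = findOutOfScope(run.changedFiles, run.allowedPaths);
  const notRun = run.requiredCommands.filter((cmd) => !run.commandsRun.includes(cmd));
  return { complete: outOfScope.length === 0 && notRun.length === 0, outOfScope, notRun };
}
```

A check like this only tells you whether the patch respected its boundaries. Reading the key diffs is still your job.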

Finally

Letting Codex write code while you sleep is not magic or myth.

It relies on the most basic engineering management principles: clear tasks, clear boundaries, clear acceptance criteria.

Give it a wish, and you’ll likely get a pile of guesses.

Give it a night shift work order, and it might actually finish a concrete thing before you wake up.

Codex can pull all-nighters for you. But it can’t decide what’s worth staying up for.

