The author shares their personal journey of adopting a new coding workflow using a single Codex agent with /goal mode, which they find superior to multi-agent setups with newer models like GPT-5.5 and Opus 4.8.
i fucked up my sleeping schedule because of my new ai workflow but it's SOO worth it. i feel i have leveled up my engineering productivity to new heights again! (LONG, detailed write up on it below)

i've finally found the BEST workflow i've ever used for coding after a lot of trial and error with 'productivity theatre', for ex. having agents orchestrate subagents in attempts to token maxx and capture as much work as i can in one shot

while that DID work, and it WAS good (and quite necessary) with older models (opus 4.5, and gpt 5, lol), it is no longer good with the new generation of models (gpt 5.5 and opus 4.8)

while the version numbers of these models seem like small increments, they are COMPLETELY different in capabilities. because of their extended context window and increased intelligence, they are actually more capable BY THEMSELVES in one single MEGA THREAD.

breaking a complex task down into steps and using subagents of these models to execute them in parallel is now an improper way to use these models

instead, breaking a complex task down into steps, and having ONE SINGLE CODEX AGENT run through the full list, A-Z, with /goal mode, has been the most ACCURATE, FAST and POWERFUL workflow i've ever used in my life

several months ago @steipete posted a blog post (linked below) titled 'just talk to it' in which he just... talked to a codex agent to get work done. no crazy multi-agent workflows, no crazy plugins... this madman just tells it something to do and trusts it to do it

now i didn't trust codex to do this reliably back in october last year when this was posted, and every time i tried it myself I did not get optimal results

codex was always a good model for writing raw code, but it was too literal-minded to understand my intent, so i used claude code to manage codex agents to get tasks done.
that carried me throughout the first half of 2026 and was the best personal workflow i had, because i had one main agent who understood my intent and could keep these literal-minded coding monsters aligned and in check

HOWEVER - with the release of 5.5, and updates to the codex harness (SPECIFICALLY /goal mode), my old workflow is completely invalid now!

i dedicated the first month of the 5.5 release to coding exclusively with codex. it was really clunky, and felt really weird, and i am a bit neurodivergent so talking with codex (who definitely feels neurodivergent in the way it communicates LOL) was really awkward

the problem was i was so used to talking to Opus, and Codex doesn't understand me the way Opus did. it took a couple weeks to adjust my communication style to match Codex, and then we REALLY started cooking!

i started new complicated projects from scratch to REALLY test its capabilities and this MONSTER was able to handle crazy projects for me, like building a resilient system that spins up microVMs on my mini for securely housing isolated agents just by... talking to it.

now i do have some personal skills i've created that match my workflow, and guardrails on my codebase like ESLint to help keep it in check, but codex just created these when I asked it to and updates them to match the work it does

what makes Codex spectacular is its ability to 'dogfood' and run E2E tests via computer use on my macbook

i feel this is a heavily underrated feature, but it is a 10x level up in terms of the agent creating reliable code on the spot, and only reporting back to me once the code is fully tested from a user's perspective

the magic verb here is 'dogfood' the work. dogfooding means using your own software before releasing it to customers. codex is great at using the software it codes before releasing it back to ME!
because this increases the reliability of the work, i no longer waste time fixing broken features that are only discoverable by using the actual app (which takes a TON of time when you repeat this over and over) and instead focus on prompting the next feature

this is an AMAZING time saver because, as @RayFernando1337 taught me, the code itself can look flawless and logically be 'bug free' while dogfooding the app reveals problems that end up being architectural - codex is great at finding these on its own and re-designing the logic to solve the problem, unsurprisingly without breaking other features, because if a re-write does end up breaking other features, it finds that too, throws a net over all the related problems it finds and designs the proper solution because it has FULL context

in the past, telling an agent to 'fix' a problem led to it breaking other working features in the process, but hey, it 'fixed' the original problem, lol

how I talk to Codex to achieve these great results is very simple, but ends up taking quite a bit of time. i will go into more detail here, because it is the most CRUCIAL part of the entire process. literally NOTHING matters more than the discussion phase for critical work which you CANNOT fuck up. every other optimization you can make to your workflow is minuscule compared to the impact this setup has!

i have been working on my product for the last 6 months, building something to completely automate impactful workflows for non-technical business owners local to my area. AI is confusing, so I've designed a solution to make this incredibly simple to use. like, they don't even have to talk to an agent or use the app at all, besides clicking a button here and there

point being, it has become quite a large codebase that i need to work in with extreme care.
i cannot just tell codex to do something in two sentences because it does not understand the specifics of my design taste - but after a couple back and forths of simple conversation, it becomes FULLY aligned with me, understands what it needs to do with bullseye precision, and one shots a LARGE chunk of work with NO errors. it delivers perfection, every single time.

the process basically goes like this:

1. me: "hey codex, we need to implement billing. i want this centralized and enforced so every single billable service routes through this system. research and scope this out, then report back to me with a couple options of the simplest design we can do that is the most correct solution long term - and a maximum of 5 important architectural questions I need to answer"

   (note: asking for the 'simplest' design actually stops it from over-engineering. i ask for different options to activate 'creative' vectors by exploring completely different solutions. I have to tell it to find the most correct solution long term, because if I don't, it will find the 'simplest' solution that does the job effectively, but is poor for scale or the long term vision. the mix of these 3 simple requests has produced the most effective output for me)

2. codex then goes and reads any relevant docs, our ADR (CRITICAL, will explain this more below), and the raw code itself. it is CRITICAL to NOT let a codex sub-agent do the reading here. sub-agents do a great job at compacting large amounts of research, but code is specific and logic is critical. a summary has always missed important details. a great benefit of having one codex agent read and hold this logic is that it does not have to read the files again, and BLASTS through implementation

3. since 5.5 is VERY intelligent, it reports back with highly impactful questions that allow me to align my intent with Codex. they're usually incredibly easy to answer, and i always ask for its recommended answer and an explanation supporting it.
if you have ADRs set up in your codebase, you may find that Codex ends up recommending answers that are COMPLETELY aligned with you. 95% of the time, i am not answering these questions, because it deadass recommended what i would've said, so i just say "yes" to confirm my alignment

NOW - a quick side track into what an ADR is, how I use it and why this completely replaced any other form of documentation in my app

an ADR is an Architectural Decision Record. it is an enterprise practice that allows big teams and new hires to be aligned on how to THINK about the codebase, thus allowing them to develop proper solutions for new features or bug fixes

in this discussion process with codex, once we are both aligned after our conversation, oftentimes we will finalize on a core, well, architectural decision (lol) that future devs (or agents) MUST follow. this goes in docs/adr, and is labeled in numerical order. yes it is just a .md file, but a highly impactful one! you can just prompt the agent to turn the discussion into an ADR, and it does a great job with no further explanation!

the contents of mine consist of:

- a single sentence of the decision we made (the title)
- context of why it exists
- a deeper explanation of the decision
- a list of invariants (conditions that MUST be true in order to respect the decision)
- the consequence of not following the decision (typically, explaining the problems it prevents)
- file references (usually core services that agents need to understand and import functions from)

i try to keep it as small as possible, and always try to simplify the core intent into the minimal tokens required for an agent to understand it. though, this is not TOO much of an issue because of the larger context windows new gen LLMs have now.
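to make the ADR shape above a bit more concrete, here is a minimal python sketch that renders those six parts into a numbered markdown file under docs/adr. the post only says the files live in docs/adr and are labeled in numerical order, so the `NNNN-slug.md` naming, the section headings, and every name in the code (`ADR`, `next_adr_number`, `write_adr`) are assumptions made for illustration, not a standard and not what codex generates on its own.

```python
import re
import tempfile
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class ADR:
    title: str             # the decision, in a single sentence
    context: str           # why the decision exists
    decision: str          # deeper explanation of the decision
    invariants: list[str]  # conditions that MUST stay true
    consequences: str      # what goes wrong if the decision is ignored
    file_refs: list[str] = field(default_factory=list)  # core files to read

    def to_markdown(self, number: int) -> str:
        # Section headings are an assumed layout, one per item in the list above.
        parts = [
            f"# ADR {number:04d}: {self.title}",
            "## Context\n" + self.context,
            "## Decision\n" + self.decision,
            "## Invariants\n" + "\n".join(f"- {i}" for i in self.invariants),
            "## Consequences of not following\n" + self.consequences,
            "## File references\n" + "\n".join(f"- `{p}`" for p in self.file_refs),
        ]
        return "\n\n".join(parts) + "\n"


def next_adr_number(adr_dir: Path) -> int:
    """Return one more than the highest 'NNNN-' prefix found in adr_dir."""
    numbers = [
        int(m.group(1))
        for p in adr_dir.glob("*.md")
        if (m := re.match(r"(\d+)-", p.name))
    ]
    return max(numbers, default=0) + 1


def write_adr(adr_dir: Path, adr: ADR) -> Path:
    """Write the ADR as the next numbered file and return its path."""
    adr_dir.mkdir(parents=True, exist_ok=True)
    number = next_adr_number(adr_dir)
    slug = re.sub(r"[^a-z0-9]+", "-", adr.title.lower()).strip("-")
    path = adr_dir / f"{number:04d}-{slug}.md"
    path.write_text(adr.to_markdown(number), encoding="utf-8")
    return path


if __name__ == "__main__":
    # Demo in a throwaway directory; the billing content is made up.
    with tempfile.TemporaryDirectory() as tmp:
        billing = ADR(
            title="Billable services route through central billing",
            context="Billing logic was at risk of being scattered across services.",
            decision="One billing module owns every charge.",
            invariants=["No service creates a charge directly."],
            consequences="Charges drift out of sync and become hard to audit.",
            file_refs=["src/billing/service.py"],  # hypothetical path
        )
        print(write_adr(Path(tmp) / "docs" / "adr", billing).name)
```

in practice you would still just ask the agent to write the ADR from the discussion; a script like this is mostly useful as a fixed template to point it at, so every record keeps the same sections.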
understanding and using ADRs has been more impactful for me in agent accuracy and efficiency in large codebases than ANY skill, tool, or 'prompt' combined, TENFOLD (btw i picked up this concept from @mattpocockuk's posts, so i am grateful for the insights he has shared)

OKAY - now that you understand the concept and importance of ADRs a bit more, we shall get back to the final step of the codex workflow

4. now that the discussion phase is done, i will give Codex the following prompt: "Create a Master PRD, and execute this to completion with goal mode. Make sure to dogfood it and run e2e tests"

a lot of people don't know that Codex can make its own goal through a tool it has. i never write the /goal manually. i tell codex to make a master PRD so the source of truth stays intact when it compacts its context, and it creates a goal for itself to FULLY implement and test the feature

these runs usually take an hour or 2, but DAMN it works so well in comparison to anything i've done, and it's the simplest workflow i've used so far

now I am trying to figure out how to level this workflow up, because there's no way i am waiting 1-2 hrs when i can be token maxxing with efficiency

today i saw a post from peter (clawfather) and a clip from boris (claude code creator) where they brought forth the concept of creating loops, and i have no idea how these madmen with access to infinite tokens operate, BUT it sparked an idea in my head of how to level up my current workflow, and it seems like the idea lies in having one main agent handling multiple threads of persistent codex agents

in the past i've orchestrated using one main agent that creates temporary (stateless) agents to solve the task at hand and de-spawn

but given how well this workflow has worked for me, it feels like the proper way to orchestrate is to have one agent handle multiple stateful agents, and have it run the workflow loop i described in this post

ANYWAYS that's my current update on what's been helping me a lot. if you have any questions please drop them below! i'm happy to help if you DM me as well. God bless you and i hope you have a great day
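for anyone who wants the loop from this post in a more concrete shape, here is a rough python sketch of one orchestrator walking several stateful threads through discuss → align → record the ADR → execute. it is only a model of the idea: `send` is a hypothetical callable standing in for however you actually talk to an agent thread (this is NOT a real Codex API), the phase names and `StatefulThread` are made up, the ADR prompt is a paraphrase, and the align step just answers "yes" where a human would normally reply.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable

# Discussion and execution prompts are adapted from the post.
DISCUSS_TEMPLATE = (
    "hey codex, we need to implement {feature}. {requirements} "
    "research and scope this out, then report back to me with a couple options "
    "of the simplest design we can do that is the most correct solution long term "
    "- and a maximum of {max_questions} important architectural questions "
    "I need to answer"
)
ADR_PROMPT = "turn this discussion into an ADR in docs/adr"  # paraphrased
EXECUTE_PROMPT = (
    "Create a Master PRD, and execute this to completion with goal mode. "
    "Make sure to dogfood it and run e2e tests"
)


class Phase(Enum):
    DISCUSS = 1   # ask for options and questions
    ALIGN = 2     # answer the agent's questions
    RECORD = 3    # capture the decision as an ADR
    EXECUTE = 4   # master PRD + goal run
    DONE = 5


NEXT_PHASE = {
    Phase.DISCUSS: Phase.ALIGN,
    Phase.ALIGN: Phase.RECORD,
    Phase.RECORD: Phase.EXECUTE,
    Phase.EXECUTE: Phase.DONE,
}

# Hypothetical transport: (thread_id, message) -> reply.
Send = Callable[[str, str], str]


@dataclass
class StatefulThread:
    thread_id: str
    feature: str
    requirements: str
    phase: Phase = Phase.DISCUSS
    # The thread keeps its whole history, unlike a spawn-and-discard subagent.
    transcript: list[tuple[str, str]] = field(default_factory=list)

    def next_message(self, answers: str = "yes") -> str:
        if self.phase is Phase.DISCUSS:
            return DISCUSS_TEMPLATE.format(
                feature=self.feature,
                requirements=self.requirements,
                max_questions=5,
            )
        if self.phase is Phase.ALIGN:
            return answers
        if self.phase is Phase.RECORD:
            return ADR_PROMPT
        if self.phase is Phase.EXECUTE:
            return EXECUTE_PROMPT
        raise ValueError("thread is already done")

    def step(self, send: Send, answers: str = "yes") -> str:
        """Send the message for the current phase, then advance one phase."""
        message = self.next_message(answers)
        reply = send(self.thread_id, message)
        self.transcript.append((message, reply))
        self.phase = NEXT_PHASE[self.phase]
        return reply


def run_all(threads: list[StatefulThread], send: Send) -> None:
    """Round-robin over the threads until every one of them is DONE."""
    while any(t.phase is not Phase.DONE for t in threads):
        for t in threads:
            if t.phase is not Phase.DONE:
                t.step(send)
```

the point of the sketch is the contrast with stateless subagents: each thread owns its transcript from the first question to the final run, and the orchestrator only decides which thread gets the next turn.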
Jason Liu shares how he uses OpenAI's Codex for knowledge work beyond coding, leveraging durable threads, voice input, and steering to integrate coding agents into his broader workflow.
OpenAI's Codex has surpassed Anthropic's Claude Code in functionality for some users, driven by the capabilities of GPT-5.5 and an improved desktop application. The article discusses migration strategies and personal use cases for adopting Codex as a primary tool for knowledge work.
A developer compares Codex 5.3 and Claude Opus 4.6 on autonomous Java AI agent development, finding that the model with more elegant architecture (Claude) often produced code that never executed, while the more boring and direct Codex improved the working product with practical fixes like timeouts and history recovery.