@bcherny: Seeing a number of benchmarks showing Opus is the best model for long-running work. Five tips for running Opus autonomo…
Summary
Practical tips for running Anthropic's Claude Opus autonomously for hours or days, such as using auto mode, dynamic workflows, and self-verification; also references the SWE-Marathon benchmark for long-horizon software tasks.
Cached at: 06/08/26, 03:23 PM
Seeing a number of benchmarks showing Opus is the best model for long-running work.
Five tips for running Opus autonomously for hours/days:
- Use auto mode for permissions, so Claude doesn’t ask for approval
- Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
- Use /goal or /loop, to nudge Claude to keep going until it’s done
- Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
- Make sure Claude has a way to self-verify its work end to end: Claude in Chrome browser extension for web, iOS/Android sim MCP for mobile, a way to start the full web server or service for backend work
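For the backend case in the last tip, one way to give Claude an end-to-end check is a small script that starts the service, waits for it to answer, and reports pass or fail. This is a minimal sketch, not something from the thread: the function names, the start command, and the health URL are all hypothetical placeholders.

```python
import subprocess
import time
import urllib.request
from typing import Callable, Sequence


def wait_until_ready(probe: Callable[[], bool], timeout_s: float = 30.0,
                     interval_s: float = 0.5) -> bool:
    """Poll `probe` until it returns True or `timeout_s` elapses."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # service is not accepting connections yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)


def http_ok(url: str) -> bool:
    """Return True if a GET to `url` answers with HTTP 200."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return resp.status == 200


def verify_service(start_cmd: Sequence[str], health_url: str) -> bool:
    """Start the service (hypothetical command), check it, always shut it down."""
    proc = subprocess.Popen(start_cmd)
    try:
        return wait_until_ready(lambda: http_ok(health_url))
    finally:
        proc.terminate()
        proc.wait(timeout=10)
```

A call might look like `verify_service(["python", "-m", "http.server", "8000"], "http://127.0.0.1:8000/")`. The point is that the agent gets a single command whose result tells it whether the whole service actually came up, rather than relying on unit tests alone.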
Nice!
Context rot isn’t a thing with 4.8 imo, but curious if that’s been your experience also
Most important thing I’ve found is self-verification + dynamic workflows prompted with something like “use a workflow to test the result e2e in a browser using claude in chrome mcp. Especially look for edge cases and ui issues”
A few things I’ve used very long running sessions for:
- Building complex features
- Migrating code from language X to Y
- Migrating code from framework X to Y
- Repeatedly profiling and optimizing code to hit a specific memory or CPU target
- Finding and fixing flaky tests in CI
- Profiling CI to make it faster
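For the flaky-test item, a long session needs an objective signal for "flaky". One rough approach is to rerun the same test many times and see whether it both passes and fails. The sketch below assumes that definition; `RerunReport`, `rerun`, and `command_passes` are illustrative names, not part of any tool mentioned in the thread.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class RerunReport:
    runs: int
    failures: int

    @property
    def is_flaky(self) -> bool:
        # Flaky here means: passes sometimes and fails sometimes on identical inputs.
        return 0 < self.failures < self.runs


def rerun(run_once: Callable[[], bool], runs: int = 20) -> RerunReport:
    """Call `run_once` (True means pass) `runs` times and count the failures."""
    failures = sum(1 for _ in range(runs) if not run_once())
    return RerunReport(runs=runs, failures=failures)


def command_passes(cmd: Sequence[str]) -> bool:
    """Run a test command; a zero exit code counts as a pass."""
    return subprocess.run(cmd, capture_output=True).returncode == 0
```

With pytest, for example, `rerun(lambda: command_passes(["pytest", "tests/test_example.py::test_something"]))` would rerun one test (the path is a placeholder). A test that fails on every run is broken rather than flaky, which is why `is_flaky` excludes that case.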
I think of it in terms of ROI rather than absolute cost: how much would it have cost to do the same work manually? Often the answer is weeks or even months of engineering time
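The ROI framing comes down to one ratio: the cost of doing the work manually divided by the cost of the autonomous run. A tiny sketch, with every number made up purely for illustration (none of these figures come from the thread):

```python
def roi_multiple(manual_hours: float, hourly_cost: float, run_cost: float) -> float:
    """Manual cost avoided per unit of money spent on the autonomous run."""
    if run_cost <= 0:
        raise ValueError("run_cost must be positive")
    return (manual_hours * hourly_cost) / run_cost


# Hypothetical inputs: 3 weeks of engineering time (120 h) at 100 per hour,
# compared with a run that cost 500 in usage.
print(roi_multiple(manual_hours=120, hourly_cost=100, run_cost=500))  # 24.0
```

On these assumed inputs the run pays for itself 24 times over, so the absolute cost matters less than the size of the manual alternative.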
These aren’t designed to be invoked by people, though you can do so if you want. Just tell the model what you want to happen, and it will invoke the right skills for you
I don’t see that with Opus 4.8 anymore, do you?
Run /usage to see a breakdown of the specific skills, mcps, and plugins that are using your tokens
Just tell claude to use a workflow
Yes. It’s more powerful and more token-efficient
Enterprise seat limits are configurable, maybe ask your admin to increase limits?
We do both! Depends if it’s a one-off or something you want to run on future PRs
@bcherny Many people try to achieve this through an orchestration layer. When are you planning an overlay/supervisor agent that monitors, dispatches, summarizes, and manages other sessions?
Agent View is great, but jumping between sessions is getting frustrating - especially when active sessions quietly fall into Completed instead of surfacing Needs Input.