@bcherny: Seeing a number of benchmarks showing Opus is the best model for long-running work. Five tips for running Opus autonomo…


Summary

Practical tips for running Anthropic's Claude Opus autonomously for hours or days, such as using auto mode, dynamic workflows, and self-verification; also references the SWE-Marathon benchmark for long-horizon software tasks.


Cached at: 06/08/26, 03:23 PM

Seeing a number of benchmarks showing Opus is the best model for long-running work.

Five tips for running Opus autonomously for hours/days:

  1. Use auto mode for permissions, so Claude doesn’t ask for approval
  2. Use dynamic workflows, to have Claude orchestrate hundreds/thousands of agents to get a task done
  3. Use /goal or /loop, to nudge Claude to keep going until it’s done
  4. Use Claude Code in the cloud, so you can close your laptop (easiest way is the desktop or mobile app)
  5. Make sure Claude has a way to self-verify its work end to end: Claude in Chrome browser extension for web, iOS/Android sim MCP for mobile, a way to start the full web server or service for backend work
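
For the backend case in tip 5, one way to give the agent an end-to-end check is a small script that boots the service and probes it over HTTP. The sketch below is only an illustration under stated assumptions: a standard-library `http.server` stands in for the real service, and the `/health` path, `HealthHandler`, and `verify_service` are hypothetical names, not part of Claude Code.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HealthHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for the real service under test."""

    def do_GET(self):
        if self.path == "/health":
            body = json.dumps({"status": "ok"}).encode()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, format, *args):
        pass  # silence per-request logging


def verify_service() -> bool:
    """Start the server on a free port, hit /health, and report success."""
    server = HTTPServer(("127.0.0.1", 0), HealthHandler)  # port 0: OS picks a free port
    port = server.server_address[1]
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{port}/health"
        with urllib.request.urlopen(url, timeout=5) as resp:
            payload = json.loads(resp.read())
            return resp.status == 200 and payload.get("status") == "ok"
    finally:
        server.shutdown()
        server.server_close()
```

The point of the design is that the check exercises the running service rather than individual functions, so the agent gets a single pass/fail signal it can act on without a human in the loop.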

Nice!

Context rot isn’t a thing with 4.8 imo, but curious if that’s been your experience also

Most important thing I’ve found is self-verification + dynamic workflows prompted with something like “use a workflow to test the result e2e in a browser using claude in chrome mcp. Especially look for edge cases and ui issues”

A few things I’ve used very long running sessions for:

  • Building complex features
  • Migrating code from language X to Y
  • Migrating code from framework X to Y
  • Repeatedly profiling and optimizing code to hit a specific memory or CPU target
  • Finding and fixing flaky tests in CI
  • Profiling CI to make it faster
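
For the flaky-test item in the list above, the underlying check is simple: rerun a test many times and see whether the outcome varies. A minimal sketch, assuming tests are plain callables that raise on failure; `classify_test` and `make_intermittent` are hypothetical helpers written for this illustration, not a CI or Claude Code API.

```python
from collections import Counter
from typing import Callable


def classify_test(test_fn: Callable[[], None], runs: int = 20) -> str:
    """Run test_fn repeatedly and label it by how consistent its outcome is."""
    outcomes = Counter()
    for _ in range(runs):
        try:
            test_fn()
            outcomes["pass"] += 1
        except Exception:
            outcomes["fail"] += 1
    if outcomes["pass"] and outcomes["fail"]:
        return "flaky"
    return "stable-pass" if outcomes["pass"] else "stable-fail"


def make_intermittent(fail_every: int = 3) -> Callable[[], None]:
    """Build a fake test that fails on every fail_every-th call (for demonstration)."""
    calls = {"n": 0}

    def test_fn() -> None:
        calls["n"] += 1
        assert calls["n"] % fail_every != 0, "simulated intermittent failure"

    return test_fn
```

Calling `classify_test(make_intermittent(3), runs=9)` labels the fake test as flaky because it sees both passes and failures. A real flaky test is usually nondeterministic, so more runs raise the chance of catching it.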

I think of it in terms of ROI rather than absolute cost: how much would it have cost to do the same work manually? Often the answer is weeks or even months of engineering time
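
The ROI framing above reduces to a short calculation: the estimated cost of doing the work manually divided by what the agent run cost. A rough sketch; `roi_multiple` is a hypothetical helper, and every number in the example is a placeholder rather than a measurement.

```python
def roi_multiple(manual_hours: float, hourly_cost: float, agent_cost: float) -> float:
    """Return how many times over the agent run paid for itself.

    manual_hours: estimated engineering hours to do the work by hand
    hourly_cost:  loaded cost of one engineering hour
    agent_cost:   total spend on the autonomous run (tokens, compute)
    """
    if agent_cost <= 0:
        raise ValueError("agent_cost must be positive")
    return (manual_hours * hourly_cost) / agent_cost


# Placeholder figures only: three weeks of work (120 hours) at 100 per hour,
# compared with a run that cost 1500 in the same currency.
example = roi_multiple(manual_hours=120, hourly_cost=100.0, agent_cost=1500.0)
```

With these placeholder inputs the manual estimate is 12000, so `example` comes out to 8.0: the run would have cost one eighth of the manual estimate.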

These are not designed for people to invoke them, though you can do so if you want. Just tell the model what you want to happen, and it will do the work to invoke the right skills for you

I don’t see that with Opus 4.8 anymore, do you?

Run /usage to see a breakdown of the specific skills, mcps, and plugins that are using your tokens

Just tell claude to use a workflow

Yes. It’s more powerful and more token-efficient

Enterprise seat limits are configurable, maybe ask your admin to increase limits?

We do both! Depends if it’s a one-off or something you want to run on future PRs

@bcherny Many people try to achieve this through an orchestration layer. When are you planning an overlay/supervisor agent that monitors, dispatches, summarizes, and manages other sessions?

Agent View is great, but jumping between sessions is getting frustrating - especially when active sessions quietly fall into Completed instead of surfacing Needs Input.
