@0xCristal: https://x.com/0xCristal/status/2068280221954961731
Summary
The article details a setup running six AI agents 24/7 on a Minisforum MS-S1 Max mini workstation with AMD Ryzen AI Max+ 395 chip, costing $11/month in electricity. It highlights the shift from cloud API costs to local inference, enabling always-on agents for tasks like email sorting, research monitoring, and document processing.
View Cached Full Text
Cached at: 06/20/26, 04:20 PM
I HAVE 6 AI AGENTS RUNNING 24/7 ON A BOX UNDER MY DESK. TOTAL MONTHLY COST: $11 IN ELECTRICITY
Several months ago I couldn’t justify leaving an AI agent running overnight. Every loop cost tokens. Every token cost money. So I’d start a task, babysit it, and shut it down when I went to bed.
Now I have six agents running around the clock on a single machine the size of a toaster oven. They research, summarize, monitor, sort, and write, while I sleep, while I eat, while I’m on vacation. The electricity bill went up by eleven dollars. That’s the entire operating cost.
Here’s the quick review👇
This is the setup, the machine, and what changes when AI stops being a service you rent and becomes infrastructure you own.
The real gem nobody talks about
Everyone online is arguing about which cloud model is smarter. Meanwhile a quiet revolution happened in hardware and almost nobody noticed.
The Minisforum MS-S1 Max is a mini workstation. Aluminum chassis. Fits on a shelf. Ships with a 2TB SSD, a built-in 320W power supply, and the most interesting chip AMD has ever put in a desktop: the Ryzen AI Max+ 395.
Here’s what matters about this chip: it shares 128GB of memory between the CPU and GPU. No separate graphics card. No tiny VRAM pool. One massive unified pool that both processors read from. That’s the same architectural trick that makes Apple Silicon great at local AI, except this runs Linux properly, has dual 10-gigabit ethernet, USB4 V2 at 80Gbps, a PCIe x16 slot for expansion, and costs roughly $3,000.
This is not a gaming PC. This is not a NAS. This is a local AI server that happens to look like a mini PC. And the spec that makes it different from every other Strix Halo box: Minisforum pushes the chip to 160W, where competitors cap at 120–140W. More watts = more speed on sustained inference. That matters when your agents run for hours.
What it runs and how fast
Install Ollama on Linux. Pull a model. That’s it. No driver drama, no CUDA dependency chains, no config files. Here’s what the box actually delivers with Q4-quantized models:
The 30B and 70B models are the workhorses. Fast enough for interactive use. The 235B sits in the same league as Claude Sonnet on many benchmarks, slower, but you’re not paying per token, so you let it think.
And here’s the party trick: Minisforum designed this box for clustering. Two MS-S1 Max units linked together run Qwen3-235B at ~11 tokens/second. Four units ran DeepSeek-R1 671B (the full 380GB model). Locally. On a desk. No data center. No cloud.
Why ‘always-on’ changes everything
Here’s the thing people miss about local AI. It’s not about the model being as good as GPT-5 or Claude Opus. It’s about what happens to your behavior when inference is free.
When you pay per token, you think before you prompt. You optimize your queries. You shut down experiments early. You never let an agent loop for eight hours because the math doesn’t make sense.
When inference costs electricity and nothing else, you stop thinking that way. And that’s where the real value shows up.
The six agents I run around the clock:
-
The inbox sorter. Pulls my email every 15 minutes. Categorizes everything. Drafts replies for anything routine. I wake up to a sorted inbox with draft responses waiting. Time saved: ~40 minutes every morning.
-
The research monitor. Watches 30+ RSS feeds, niche forums, and specific accounts across platforms. Summarizes anything relevant to my work into a daily digest that lands in Telegram at 7 AM. On a cloud API this would cost $15-20/day in tokens. On the box: free.
-
The document processor. Anything I drop into a specific folder gets read, summarized, and tagged. Contracts, reports, PDFs, research papers. The summary and key points appear in my notes app within minutes. I haven’t manually read a 40-page report in months.
-
The code reviewer. Watches my git repos. Every push triggers a review, style, bugs, security, test coverage. Results posted as comments. Runs the 70B model so the reviews are actually good.
-
The meeting prep agent. Looks at tomorrow’s calendar, pulls context from my notes and recent emails about each person/topic, generates a one-page brief per meeting. Ready by 8 AM.
-
The learning agent. Takes topics I’m interested in, finds recent papers and articles, reads them overnight with the 235B model, and produces a weekly ‘what’s new’ report with explanations written for my level of understanding.
None of these are revolutionary individually. What’s revolutionary is running all six simultaneously, around the clock, and not caring about the cost. On cloud APIs, this stack would run $800-1,200 a month. On the MS-S1 Max, it runs on the electricity bill.
The setup. One evening, most of it downloading
1. Replace Windows with Linux
The box ships with Windows 11, which caps GPU-accessible memory at ~96GB. Ubuntu 24.04 unlocks the full pool. Boot from USB, format, install. 20 minutes.
2. Install Ollama
3. Pull your models
4. Set up Open WebUI (optional, gives you a ChatGPT-like interface)
Now every device on your network, phone, laptop, tablet. Can chat with your models at http://your-box:3000
5. Point Claude Code at the local endpoint
Same Claude Code CLI. Same agent loop. Every request goes to your box instead of Anthropic. Nothing leaves your network.
6. Build your agents
This is the fun part and the part that’s different for everyone. I use a mix of simple cron scripts, n8n workflows, and Claude Code’s agent mode for the more complex ones. The models are the engine. How you wire them is up to you.
Total setup time: 90 minutes if you’ve never touched Linux. An hour if you have.
The math. Important!
After break-even, every month is money that stays in your account. Over three years that’s somewhere between $25,000 and $40,000 not sent to AI companies, depending on how heavily you use agents.
But honestly, the savings aren’t the point. The point is the behavior change. I started building agents I never would have built when every token cost money. The meeting prep agent? Never would have justified the API cost for ‘nice to have’. The learning agent running a 235B model overnight on papers? Absurd on a per-token basis. Obvious when it’s free.
What this box can’t do
I’m not going to pretend local replaces cloud entirely. It doesn’t. Here’s where the line sits today:
Still need the cloud for:
-
Frontier reasoning (Claude Opus, GPT-5, for the genuinely hard 5% of problems)
-
Real-time web access and tool use built into the model
-
Multimodal tasks where the cloud models are generations ahead
-
Serving a team of 5+ people simultaneously
The box handles everything else:
-
Daily coding and scripting
-
Document analysis and summarization
-
Long-running agents and background automation
-
Private data processing (nothing leaves your network)
-
Drafting, editing, brainstorming
-
RAG over your personal knowledge base
-
Bulk processing (transcription, classification, extraction)
For the cloud tasks, you pay per-use through the API. $5 here, $10 there. Not $200/month for a subscription you use 20% of.
The honest downsides
The box runs warm under load. Not dangerously, but the fans are audible. Don’t put it in your bedroom. A closet with airflow works. Under a desk works.
Open-source models aren’t Claude Opus. They’re close on many tasks, noticeably behind on the hardest reasoning problems. If your work is 100% frontier-difficulty AI tasks, this box isn’t your answer. If your work is 80% routine and 20% hard, run the 80% locally and pay per-use for the 20%.
You’re buying hardware. If AMD releases something twice as fast next year, your $3,000 doesn’t refund itself. But break-even at month 3–4 means you don’t need to keep it for five years. Even one year of use makes the math work.
Ollama on AMD is solid now, but not CUDA-level mature. Occasionally a new model drops with Nvidia-only optimizations first. You wait a week or two. That’s the early-adopter tax.
And you need to be okay with Linux. The commands above are simple. The first time something breaks, you’ll spend an hour on a forum. That’s the cost of going local today instead of waiting another year.
Why this specific box
There are a dozen Strix Halo mini PCs on the market. The MS-S1 Max stands out for three reasons:
160W sustained power. More than any competitor. Inference speed on large models scales with power. This matters when agents run for hours.
Dual 10GbE. Most competing boxes have 2.5GbE. If you’re moving large files, clustering multiple units, or running this as a network AI server, 10-gigabit changes the experience.
2U rack-mountable. This is a detail that sounds niche until you realize it means you can stack two or four of these in a standard rack and build a local AI cluster that runs 671B-parameter models. On your desk. For the price of a used car.
The real point
The AI industry wants you to think of intelligence as a utility. Something you subscribe to. Something metered. Something that lives in someone else’s data center, runs on someone else’s schedule, and stops when you stop paying.
That model made sense when the hardware couldn’t keep up. It doesn’t anymore.
128 gigabytes of unified memory. A chip designed for AI inference. Open-source models that cover 80% of what you need. An open-source stack that installs in an hour.
One machine. Under your desk. Running six agents that never sleep.
$3,000 once. $11 a month. Everything stays on your network.
That’s the setup. I just wish I’d started sooner.
Want more alpha?
Follow me and join my private channel while you can: link
Similar Articles
@mr_r0b0t: 16 local AI agents streaming at once! MiniMax M2.7 NVFP4 — 2x GB10, no cloud APIs.
A demonstration shows 16 local AI agents streaming simultaneously using MiniMax M2.7 NVFP4 on two Nvidia GB10 chips, with no cloud APIs required.
@mronge: https://x.com/mronge/status/2052846432969720202
A practical guide on setting up an always-on AI agent on a Mac mini, covering hardware selection, cloud vs. local AI model tradeoffs, and agent system choices for automating tasks like sales reporting and social media suggestions.
AMD's tiny AI PC points to a more local future for model inference
AMD's Ryzen AI Max platform with 128GB unified memory enables local inference of large models up to 200 billion parameters, aiming to shift AI workloads from cloud to compact personal hardware.
@DivyanshT91162: Everyone is distracted by AI agents in the cloud… Meanwhile, some people quietly turned their laptops into autonomous A…
Describes how to turn a laptop into a 24/7 autonomous AI research machine using Qwen3-35B-A3B, llama.cpp, and 4-bit quantization by Unsloth, requiring no cloud or GPU server.
How are people keeping OpenClaw/Hermes agents running 24/7 without blowing through their API budget?
A practitioner seeks advice on running AI agents 24/7 without high API costs, asking about local models, cloud GPUs, or hosted APIs, and wants cost-efficient setups balancing reliability and reasoning quality.