@DeRonin_: My current local AI setup: - 2x DGX Spark linked (256gb) > GLM 5.2 @ 2bit, reasoning + agent loops - Mac Studio M3 Ultr…
Summary
A user describes a fully local AI stack spread across several devices they own, running models such as GLM, Qwen, and Wan with downloaded weights and no API keys. Video generation and Kimi K2.7 are not self-hosted yet; both are planned for a future dedicated GPU build.
Cached at: 06/30/26, 01:46 PM
My current local AI setup:
- 2x DGX Spark linked (256gb) > GLM 5.2 @ 2bit, reasoning + agent loops
- Mac Studio M3 Ultra 96gb > Wan 2.2, image generation
- Mac mini M5 Pro 64gb > Qwen3.6-35B, code + content drafts
- MB Air M5 24gb > Qwen3 30B-A3B, bulk processing
- iPhone > Qwen3 4B, on-device
every model above runs on hardware i own, weights downloaded, no api key in the loop
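The list above is effectively a routing table from task to device and model. A minimal sketch of that mapping in Python follows; the task labels and the names `Target`, `ROUTES`, and `route` are hypothetical and chosen for illustration, and only the device and model strings come from the post. It is not the author's actual tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    device: str
    model: str


# Task -> (device, model). Device and model strings are copied from the post;
# the task keys are hypothetical labels, not something the post defines.
ROUTES = {
    "reasoning": Target("2x DGX Spark", "GLM 5.2 @ 2bit"),
    "agent_loop": Target("2x DGX Spark", "GLM 5.2 @ 2bit"),
    "image": Target("Mac Studio M3 Ultra", "Wan 2.2"),
    "code": Target("Mac mini M5 Pro", "Qwen3.6-35B"),
    "content_draft": Target("Mac mini M5 Pro", "Qwen3.6-35B"),
    "bulk": Target("MB Air M5", "Qwen3 30B-A3B"),
    "on_device": Target("iPhone", "Qwen3 4B"),
}


def route(task: str) -> Target:
    """Return the local device/model assigned to a task, or fail loudly."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no local model assigned to task {task!r}") from None
```

Calling `route("code")` returns the Mac mini entry, while `route("video")` raises a `ValueError`, which matches the post: video is the one task with no self-hosted model yet.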
the one thing i don’t self-host yet is video.. the open video models want a dedicated gpu box, so that’s my next build (when i figure out how to make $100k MRR on it lol)
the other one i’m scaling toward is Kimi K2.7 fully local.. it’s a 1T model so it needs a real gpu server, adding it as the revenue grows
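A rough back-of-the-envelope check shows why a 1T-parameter model is tight on this hardware. The sketch below estimates memory for the weights alone as parameters × bits per weight ÷ 8. It assumes decimal gigabytes, treats "1T" as exactly 1e12 parameters, and ignores KV cache, activations, and runtime overhead. The function name `weight_memory_gb` is hypothetical.

```python
def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in decimal GB (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


KIMI_PARAMS = 1e12       # "1T model" per the post; assumed to mean exactly 1e12
LINKED_SPARK_GB = 256    # the two linked DGX Sparks, per the post

for bits in (16, 8, 4, 2):
    need = weight_memory_gb(KIMI_PARAMS, bits)
    verdict = "fits" if need < LINKED_SPARK_GB else "does not fit"
    print(f"{bits:>2}-bit: {need:,.0f} GB of weights -> {verdict} in {LINKED_SPARK_GB} GB")
```

Under these assumptions the 4-bit estimate is 500 GB and the 2-bit estimate is 250 GB. The 2-bit figure only just clears 256 GB, leaving almost nothing for the KV cache or the operating system. Real low-bit formats also store per-block scales, so their effective bits per weight is usually somewhat higher than the nominal figure.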
same goes for MiMo V2.5, as with Kimi and Kling
frontier ai used to need someone else’s datacenter.. now it fits on my desk
currently i guess it’s valued around $20k