Dual RTX 3090 build

Reddit r/LocalLLaMA 06/02/26, 09:56 AM News

build dual-rtx-3090 inference local-llm agentic rag tool-stack

Summary

A user shares their dual RTX 3090 build for local LLM inference and seeks advice on tool stacks for agentic work and RAG pipelines.

Joining this community sparked a new hobby and revived an interest in software engineering that I had lost. So I made this dual RTX 3090 build, mostly for inference. I know I won't be replacing ChatGPT anytime soon, but what tool stack would help make it usable in a work environment? Any must-have MCP servers, or custom tools/scripts? I'm currently using VS Code preview with Qwen3.6 27B and an nginx server. I'm mostly interested in agentic work with usable context, or at least better knowledge of the codebase (a RAG pipeline?). This has already been such a helpful community. Hopefully local LLMs continue to grow, because I fear cloud services will become unaffordable at a consumer level.
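For the "better knowledge of the codebase" question, the retrieval step of a RAG pipeline can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a recommended stack: it splits source files into fixed-size line chunks, scores each chunk against a question with bag-of-words cosine similarity (a simple stand-in for the embedding model a real pipeline would use), and pastes the best matches into a prompt that could then be sent to the local model. The function names (`tokenize`, `chunk_lines`, `retrieve`, `build_prompt`) and the sample files are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep identifier-like tokens (letters, digits, underscores).
    return re.findall(r"[a-z0-9_]+", text.lower())


def chunk_lines(source: str, size: int = 20) -> list[str]:
    # Hypothetical chunking strategy: fixed windows of `size` lines.
    lines = source.splitlines()
    return ["\n".join(lines[i:i + size]) for i in range(0, len(lines), size)]


def cosine(a: Counter, b: Counter) -> float:
    # Cosine similarity between two term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query: str, files: dict[str, str], k: int = 3) -> list[tuple[str, str]]:
    # Score every chunk of every file and return the top-k (path, chunk) pairs
    # that share at least one term with the query.
    q = Counter(tokenize(query))
    scored = []
    for path, source in files.items():
        for chunk in chunk_lines(source):
            scored.append((cosine(q, Counter(tokenize(chunk))), path, chunk))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(path, chunk) for score, path, chunk in scored[:k] if score > 0]


def build_prompt(question: str, files: dict[str, str]) -> str:
    # Prepend the retrieved code as context; the model call itself is out of scope.
    context = "\n\n".join(f"# {path}\n{chunk}" for path, chunk in retrieve(question, files))
    return f"Use the code below to answer.\n\n{context}\n\nQuestion: {question}"


# Hypothetical sample "codebase" used only for illustration.
SAMPLE_FILES = {
    "auth.py": "def login(user, password):\n    return check_password(user, password)",
    "db.py": "def connect(url):\n    return open_connection(url)",
}

if __name__ == "__main__":
    print(build_prompt("how does login check the password", SAMPLE_FILES))
```

With the sample files, the question about login matches only the `auth.py` chunk, so the prompt carries just that file as context. Swapping `cosine` over word counts for vector search over embeddings would keep the same overall shape.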

Similar Articles

Upgrading from 2x 3090 - what should I add? (2x A6000/5090/48GB 4090?)

Reddit r/LocalLLaMA

A discussion about upgrading from dual RTX 3090s to alternatives like dual A6000s, RTX 5090, or 48GB RTX 4090, likely for AI/ML workloads.

@leopardracer: https://x.com/leopardracer/status/2055341758523883631

X AI KOLs Timeline

A user shares their experience setting up a dual-GPU local AI lab with RTX 4080 Super and 5060 Ti, running Qwen 3.6 models via llama.cpp and llama-swap to reduce API costs and enable unrestricted experimentation.

Modded RTX 4090 48GB vs Radeon AI Pro R9700 vs Arc Pro B70 for local coding LLMs?

Reddit r/LocalLLaMA

A user seeks advice on choosing between a modded RTX 4090 48GB, dual AMD Radeon AI Pro R9700, or dual Intel Arc Pro B70 for running local coding LLMs, highlighting trade-offs in price, VRAM, software ecosystem, and inference speed.

@TheAhmadOsman: Gentle reminder that all you need to start with Local AI is: - 2x RTX 3090s (pick up for $700-$900 on r/hardwareswap) -…

X AI KOLs Timeline

A reminder that two RTX 3090s and open-source models like Qwen 3.6 27B or Gemma 4 31B can run powerful local AI agents, comparable to Opus 4.5, using tools like Claude Code and self-hosted SearXNG.

NVIDIA-Nemotron-Labs-3-Puzzle-75B-A9B on 2x3090s

Reddit r/LocalLLaMA

A detailed guide on running the quantized NVIDIA-Nemotron-Labs-3-Puzzle-75B-A9B model on two RTX 3090s using vLLM with full 262K context, achieving high inference speeds without CPU offloading.
