@aijoey: for all my new dgx spark owners. https://github.com/joeynyc/spark-doctor…
Summary
Spark Doctor is an open-source diagnostic CLI for NVIDIA DGX Spark that collects system, GPU, memory, Docker, and recipe data, applies specific rules, and outputs the likely cause and next steps for common issues.
joeynyc/spark-doctor
Source: https://github.com/joeynyc/spark-doctor
Spark Doctor
Local diagnostic CLI for NVIDIA DGX Spark. Collects system, GPU, memory, Docker, runtime, network, and recipe data, applies DGX Spark-specific rules, and prints a short answer: what is wrong, why, and what to try next.
Read-only. No dashboard. No auto-fixes. No telemetry.
Why
DGX Spark is a new platform. When something goes wrong, the signal is scattered across nvidia-smi, /proc/pressure/*, dmesg, docker info, backend logs, forum threads, and Field Diagnostics. Owners hit the same issues repeatedly — GPU stuck in a 14 W low-power state, unified memory pressure stalling inference, thermal shutdowns, Docker runtime not registered, recipes with tensor_parallel_size set for a multi-GPU box.
Spark Doctor collects those signals in one command, applies DGX Spark-specific rules, and tells you the likely cause and the next step — in plain English, with the evidence attached.
Install
git clone <repo-url> && cd spark-doctor
python3 -m venv .venv && source .venv/bin/activate
pip install -e .
Requires Python 3.11+.
Commands
spark-doctor scan # full scan + diagnosis
spark-doctor scan --json scan.json --markdown report.md
spark-doctor doctor --from scan.json # re-run rules on saved scan
spark-doctor report --from scan.json --format {markdown,forum,github}
spark-doctor recipe check recipe.yaml
spark-doctor anonymize scan.json --out redacted.json
spark-doctor self-test
spark-doctor version
Exit codes: 0 clean · 1 warning · 2 critical · 3 collector failure.
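Because the exit codes are stable and documented, a wrapper script can branch on them. The following is a minimal sketch, not part of Spark Doctor itself: the helper names `describe_exit_code` and `run_scan` are hypothetical, and it assumes `spark-doctor` is on `PATH`.

```python
import subprocess

# Meanings copied from the exit-code list above.
EXIT_MEANINGS = {
    0: "clean",
    1: "warning",
    2: "critical",
    3: "collector failure",
}


def describe_exit_code(code: int) -> str:
    """Translate a spark-doctor exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_scan(json_path: str = "scan.json", md_path: str = "report.md") -> str:
    """Run the documented scan command and return the meaning of its exit code.

    Hypothetical helper: raises FileNotFoundError if spark-doctor is not installed.
    """
    result = subprocess.run(
        ["spark-doctor", "scan", "--json", json_path, "--markdown", md_path],
        check=False,  # non-zero codes are expected outcomes here, not errors
    )
    return describe_exit_code(result.returncode)
```

Calling `run_scan()` on a DGX Spark would return one of the four strings above, which a cron job or CI step could use to decide whether to alert.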
What it detects
| ID | Detects |
|---|---|
| power.low_draw_under_load | High GPU utilization with suspiciously low power draw (e.g. 14 W cap). |
| thermal.shutdown_risk | GPU temp ≥ 85/90 °C or thermal events in logs. |
| memory.uma_pressure | Low MemAvailable, high memory PSI, or heavy swap use. |
| runtime.docker_unhealthy | Docker/NVIDIA container runtime missing or misconfigured. |
| backend.multiple_heavy_models | Two or more heavy model backends running concurrently. |
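To make the `memory.uma_pressure` row concrete, here is a rough sketch of how the Linux memory PSI file and `MemAvailable` could be turned into a verdict. It is not Spark Doctor's actual rule: the function names and both thresholds (`psi_some_avg10_limit`, `min_available_fraction`) are assumptions chosen for illustration, and swap use is left out. The `some`/`full` line layout is the kernel's format for `/proc/pressure/memory`.

```python
def parse_psi(text: str) -> dict[str, dict[str, float]]:
    """Parse /proc/pressure/memory text into {"some": {...}, "full": {...}}."""
    result: dict[str, dict[str, float]] = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        # Each field looks like "avg10=23.50" or "total=123456".
        result[kind] = {k: float(v) for k, v in (f.split("=", 1) for f in fields)}
    return result


def uma_pressure_reasons(
    psi: dict[str, dict[str, float]],
    mem_available_kib: int,
    mem_total_kib: int,
    psi_some_avg10_limit: float = 10.0,    # assumed threshold, in percent
    min_available_fraction: float = 0.05,  # assumed threshold
) -> list[str]:
    """Return human-readable reasons why memory looks pressured (empty if none)."""
    reasons = []
    some_avg10 = psi.get("some", {}).get("avg10", 0.0)
    if some_avg10 > psi_some_avg10_limit:
        reasons.append(f"memory PSI some avg10 is {some_avg10:.2f}%")
    available = mem_available_kib / mem_total_kib
    if available < min_available_fraction:
        reasons.append(f"only {available:.1%} of memory is available")
    return reasons


# Illustrative sample, not output captured from a real DGX Spark.
SAMPLE_PSI = (
    "some avg10=23.50 avg60=12.10 avg300=4.02 total=123456\n"
    "full avg10=8.75 avg60=3.00 avg300=1.00 total=65432\n"
)
```

On a real machine the inputs would come from reading `/proc/pressure/memory` and the `MemAvailable` and `MemTotal` lines of `/proc/meminfo`.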
Recipe validator checks tensor-parallel vs GPU count, container image registry, arm64 compatibility, memory budget, and aggressive gpu_memory_utilization / context lengths.
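Two of those recipe checks can be sketched in a few lines. This is an illustration under stated assumptions, not the validator's real code: `check_recipe` is a hypothetical name, the recipe is assumed to be a flat mapping already loaded from YAML, and the `max_gpu_mem_util` threshold is invented for the example. The default of one GPU reflects a single DGX Spark.

```python
def check_recipe(
    recipe: dict,
    gpu_count: int = 1,              # a single DGX Spark exposes one GPU
    max_gpu_mem_util: float = 0.90,  # assumed threshold for "aggressive"
) -> list[str]:
    """Return a list of findings for a parsed recipe (empty list if it looks fine)."""
    findings = []

    # Tensor parallelism cannot exceed the number of GPUs actually present.
    tp = recipe.get("tensor_parallel_size", 1)
    if tp > gpu_count:
        findings.append(
            f"tensor_parallel_size={tp} exceeds available GPUs ({gpu_count})"
        )

    # On unified memory, a high GPU fraction also starves the CPU side.
    util = recipe.get("gpu_memory_utilization")
    if util is not None and util > max_gpu_mem_util:
        findings.append(
            f"gpu_memory_utilization={util} is above {max_gpu_mem_util}"
        )

    return findings
```

For example, `check_recipe({"tensor_parallel_size": 4, "gpu_memory_utilization": 0.95})` yields two findings, while a recipe with `tensor_parallel_size: 1` and `gpu_memory_utilization: 0.8` yields none.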
Privacy
Reports are anonymized by default:
- Hostname, username, and home paths replaced.
- Private IPv4 and MAC addresses redacted unless --include-network-identifiers.
- HF, NGC, OpenAI, bearer, JWT, and SSH-key patterns redacted.
- Logs (dmesg, journalctl) only included with --include-logs.
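As an illustration of the network-identifier rule in the list above, the sketch below redacts MAC addresses and RFC 1918 private IPv4 addresses while leaving public addresses alone. It is an assumption about how such redaction could work, not Spark Doctor's implementation: the function name, the placeholder strings, and the regular expressions are all hypothetical. The `include_network_identifiers` parameter mirrors the documented flag.

```python
import ipaddress
import re

# RFC 1918 private ranges.
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")
MAC_RE = re.compile(r"\b(?:[0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}\b")


def _redact_ipv4(match: re.Match) -> str:
    """Replace a dotted quad only if it parses and falls in a private range."""
    try:
        addr = ipaddress.ip_address(match.group(0))
    except ValueError:
        return match.group(0)  # not a valid address (e.g. 999.1.1.1); keep as-is
    if any(addr in net for net in PRIVATE_NETS):
        return "<private-ipv4>"
    return match.group(0)


def redact_network_identifiers(
    text: str, include_network_identifiers: bool = False
) -> str:
    """Redact MACs and private IPv4 addresses unless the caller opts out."""
    if include_network_identifiers:
        return text
    text = MAC_RE.sub("<mac>", text)
    return IPV4_RE.sub(_redact_ipv4, text)
```

Validating each match with `ipaddress` instead of a longer regex keeps version strings such as `999.1.1.1` from being treated as addresses.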
Safety
No package installs, driver updates, process kills, reboots, clock locking, or power changes. All fixes are instructions.
Development
pip install -e '.[dev]'
pytest
To add a rule: put it in src/spark_doctor/rules/, register it in rules/engine.py, add a fixture in tests/fixtures/, and add a test.
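The real rule and engine interfaces live in the repository and are not reproduced here. As a rough idea of the shape a rule plus its test might take, the sketch below uses an entirely hypothetical `Finding` class, `rule` decorator, `RULES` registry, scan layout, and thresholds; check `rules/engine.py` for the actual registration mechanism before copying any of it.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Finding:
    rule_id: str
    severity: str
    message: str


# Hypothetical registry; the real one is in rules/engine.py.
RULES: dict[str, Callable[[dict], Optional[Finding]]] = {}


def rule(rule_id: str):
    """Register a rule function under its ID."""
    def decorator(fn):
        RULES[rule_id] = fn
        return fn
    return decorator


@rule("power.low_draw_under_load")
def low_draw_under_load(scan: dict) -> Optional[Finding]:
    """Fire when the GPU is busy but drawing very little power."""
    gpu = scan.get("gpu", {})
    util = gpu.get("utilization_pct", 0)
    power = gpu.get("power_draw_w")
    # Thresholds are assumptions for illustration only.
    if power is not None and util >= 80 and power <= 20:
        return Finding(
            "power.low_draw_under_load",
            "warning",
            f"GPU at {util}% utilization but drawing only {power} W",
        )
    return None


def run_rules(scan: dict) -> list[Finding]:
    """Apply every registered rule and collect the findings."""
    return [f for fn in RULES.values() if (f := fn(scan)) is not None]


def test_low_draw_under_load_fires():
    # In the repository this dict would be loaded from tests/fixtures/.
    scan = {"gpu": {"utilization_pct": 97, "power_draw_w": 14.0}}
    assert [f.rule_id for f in run_rules(scan)] == ["power.low_draw_under_load"]
```

Keeping each rule a pure function of the saved scan is what would make re-running rules on a stored scan.json straightforward to test with fixtures.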
License
MIT.