Benchmark: ONNX Runtime vs HF Transformers vs GGUF for Parakeet TDT 0.6B on CPU-only hardware [D]

Reddit r/MachineLearning News

Summary

A benchmark comparing ONNX Runtime, HF Transformers, and GGUF for the Parakeet TDT 0.6B ASR model on CPU-only hardware shows ONNX Runtime FP32 running at about 37% lower real-time factor than HF Transformers bfloat16 (RTF 0.328 vs 0.519) at the cost of roughly 2.7GB peak memory, while GGUF Q6_K gives up speed (RTF 0.708) for a much smaller 928MB peak.
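The 37% figure can be reproduced from the RTF values reported in the post's results (0.519, 0.328, 0.708) and the 16.78s clip length. The sketch below assumes the usual definition of RTF as processing time divided by audio duration, which the post does not spell out; the helper names are illustrative and not taken from the linked repo. It also shows that the same gap reads as roughly 1.58x throughput when expressed as a speedup rather than as time saved.

```python
AUDIO_SECONDS = 16.78  # test clip length reported in the post

# RTF values as reported in the post's results table
RTF = {
    "HF Transformers bfloat16": 0.519,
    "ONNX Runtime FP32": 0.328,
    "GGUF Q6_K": 0.708,
}


def processing_seconds(rtf_value, audio_seconds=AUDIO_SECONDS):
    """Wall-clock time implied by an RTF, assuming RTF = processing time / audio time."""
    return rtf_value * audio_seconds


def time_reduction(baseline_rtf, candidate_rtf):
    """Fraction of processing time saved by the candidate relative to the baseline."""
    return 1.0 - candidate_rtf / baseline_rtf


def speedup(baseline_rtf, candidate_rtf):
    """Throughput ratio: how many times more audio per second the candidate handles."""
    return baseline_rtf / candidate_rtf


for name, value in RTF.items():
    print(f"{name}: ~{processing_seconds(value):.2f}s for the {AUDIO_SECONDS}s clip")

hf = RTF["HF Transformers bfloat16"]
onnx = RTF["ONNX Runtime FP32"]
gguf = RTF["GGUF Q6_K"]
print(f"ONNX vs HF: {time_reduction(hf, onnx):.1%} less time, {speedup(hf, onnx):.2f}x throughput")
print(f"GGUF vs ONNX: {gguf / onnx:.2f}x the processing time")
```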

Sharing a small CPU inference benchmark for nvidia/parakeet-tdt-0.6b-v3 that turned up a result I didn't expect going in.

**Setup:** 2 x86-64 vCPUs (AVX2/FMA), 7.7GB RAM, no GPU. Test audio: 16.78s Harvard sentences at 16kHz mono.

**Results:**

|Inference path|RTF|Peak Memory|CPU utilization|
|:-|:-|:-|:-|
|HF Transformers bfloat16|0.519|\~430MB delta|not reported|
|ONNX Runtime FP32 (onnx-asr)|0.328|2,667MB|49.9%|
|GGUF Q6\_K (parakeet.cpp)|0.708|928MB|99.8%|

ONNX Runtime takes about 37% less time than HF Transformers bfloat16 on this hardware (RTF 0.328 vs 0.519). The gap comes from operator fusion and AVX2-optimized execution providers in ONNX Runtime that the PyTorch CPU path doesn't exploit as aggressively. Memory cost is the tradeoff: FP32 weights load at \~2.7GB peak.

GGUF Q6\_K trades throughput for memory efficiency. Peak memory is 928MB vs 2.7GB, but RTF roughly doubles relative to ONNX (0.708 vs 0.328) and CPU utilization hits 99.8%. For memory-constrained deployments it's the right call. For sustained throughput on a box with headroom, ONNX wins.

One methodological note worth flagging for anyone doing ASR benchmarking with synthetic audio: espeak-ng inflated WER to 20.9% on a sentence set where gTTS got 4.65%. Both runtimes got identical WER within each run, which points to TTS distribution mismatch rather than model or quantization quality. NVIDIA reports 1.93% on LibriSpeech; the gTTS number is a much more honest CPU-only proxy.

GitHub repo with code, raw results, and evaluation scripts in comments below.

*Disclosure: benchmark was run using Neo, a local AI engineering agent inside Claude Code using its MCP. Mentioning because the runtime and audio choices came from its research phase, not prior knowledge on my end.*
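For anyone who wants to sanity-check WER numbers like the ones above without pulling in a library, here is a minimal sketch of word error rate as word-level edit distance divided by reference length. It is not the evaluation script from the repo, which isn't shown in the post; the function name and the normalization (lowercasing and whitespace splitting only) are assumptions, and real ASR evaluations usually also strip punctuation and normalize numbers.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Assumption: text is normalized only by lowercasing and whitespace splitting.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# One of the Harvard sentences, with a dropped word in the hypothesis.
reference = "The birch canoe slid on the smooth planks"
hypothesis = "the birch canoe slid on smooth planks"
print(f"WER: {word_error_rate(reference, hypothesis):.3f}")
```

Because the score depends only on the transcript text, running the same function over the espeak-ng and gTTS transcripts from each runtime is enough to check whether two runtimes really produce identical WER on a given audio set.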