Tag
A benchmark comparison of local open-weight LLMs on a single H100 (FP8) shows DiffusionGemma is 4x faster but makes 6x more mistakes than Gemma4 26B A4B, highlighting trade-offs between speed and accuracy in diffusion versus autoregressive models.
A comparison suggesting that Google's Gemini 3.1 Pro underperforms relative to Opus 4.7 in real-world usage, with the article highlighting Artificial Analysis as a go-to benchmarking resource.
A comparison showing that Gemini 3.5 Flash scores slightly lower than Gemini 3.1 Pro on Artificial Analysis benchmarks and costs more to run through the full benchmark suite despite its lower per-token API pricing.
A detailed CPU benchmark comparing the Kokoro 82M and Supertonic 3 TTS models, measuring real-time factor (RTF), latency, and throughput across text lengths. Results show Supertonic 3 is faster but Kokoro produces more natural speech, with practical recommendations for different use cases.
A user reports that their Asus Ascent with an Nvidia GB10 (DGX) runs LLMs such as Gemma4-31B more slowly than their Ryzen AI Max, despite an expected 2-4x speedup, and shares their llama-cpp configuration for debugging.
A user shares a hands-on comparison of running Gemma 4 with LiteRT-LM on mobile devices versus their previous llama.cpp setup, noting significantly lower memory usage (1.5-2 GB vs 4-5 GB) and faster inference (2-4 seconds vs 7-10 seconds) on smartphones such as the Samsung S25 Ultra and iPhone 13 Pro Max.
An AI coding contest compares Claude and Gemini on a weighted variant of the knight's tour problem, in which the cost of each move depends on the load accumulated from visited squares.