ai-comparison

The gap between closed and open models might be much smaller than commonly assumed, because we don’t know what closed model providers do in addition to model inference

Reddit r/LocalLLaMA ↗ · 8h ago

The article argues that comparing closed and open AI models may be unfair because closed model providers like Anthropic can supplement their model output with techniques such as RAG, prompt preprocessing, or hidden expert models, making benchmark comparisons apples-to-oranges.

@stevibe: 3 ways to destroy a piece of paper. Qwen 3.5 35B A3B vs. Ornith 1.0 35B, side-by-side canvas test. (Why 3.5 not 3.6? Or…

X AI KOLs Timeline ↗ · 4d ago

A side-by-side canvas test compares Qwen 3.5 35B A3B and Ornith 1.0 35B on three paper destruction tasks (slice, shredder, crumple), with Ornith decisively winning, demonstrating the value of post-training on Qwen 3.5 and Gemma 4.

@jun_song: GPT-5.6 seems very disappointing. Nothing better than GLM-5.2

X AI KOLs Following ↗ · 2026-06-23

A user expresses disappointment with GPT-5.6, claiming it is not better than GLM-5.2.

@omarsar0: Will be posting more examples in this thread. First one was amazing too:

X AI KOLs Following ↗ · 2026-06-22

A tweet thread comparing recent AI models' ability to generate endless procedural terrain using Three.js, all in a single shot, with a mention of Fugu Ultra as a candidate.

Local Qwen isn't a worse Opus, it's a different tool

Lobsters Hottest ↗ · 2026-06-18

Alex Ellis compares local Qwen models to cloud-based Claude Opus, sharing his experience using local AI in his software business. He highlights the practical value of local models for specific tasks while acknowledging their limitations, such as hallucination and infinite loops when quantized.

@TheGeorgePu: I'm trying out DeepSeek V4 Pro, and really like it. Super underrated model. As good as Opus 4.8 from the few tests I ra…

X AI KOLs Timeline ↗ · 2026-06-17

User @TheGeorgePu praises DeepSeek V4 Pro, calling it underrated and comparing it favorably to Opus 4.8 based on initial tests.

So Is Parrot Better Than Existing Models or Not? [D]

Reddit r/MachineLearning ↗ · 2026-06-12

A Reddit discussion asking whether the Parrot AI model is better than existing models, with an image presumably showing benchmarks or comparisons.

Differences Between Claude Opus 4.8 and Claude Fable 5 on MineBench

Reddit r/singularity ↗ · 2026-06-11

A detailed comparison of Claude Opus 4.8 and Claude Fable 5 on the MineBench benchmark, highlighting trade-offs in inference time, cost, build quality, and prompting sensitivity.

@MMMusol: Gemini 3.1 Pro, GPT 5.5, Deepseek V4, and the latest Claude Fable 5 performed the same test, as shown in the video. Compare for yourself~ The prompt is as follows: Create an HTML file to render a high-speed, aggressive fighter jet at full afterburner...

X AI KOLs Timeline ↗ · 2026-06-10

Multiple AI models (Gemini 3.1 Pro, GPT 5.5, Deepseek V4, Claude Fable 5) were asked to generate the same fighter jet HTML animation. The video shows a comparison of each model's output.

Why is every agent ever made just a worse Claude Code?

Reddit r/AI_Agents ↗ · 2026-06-10

A developer questions the value of building specialized AI agents when general-purpose tools like Claude Code can accomplish the same tasks, suggesting that current agentic approaches are just less capable versions of Claude with extra guardrails.

@auroter: Frontier AI is BRAINDEAD. GPT5.5 xHigh in Codex thinks I should use Tensor Parallelism to deploy Qwen 3.6 27B on my sys…

X AI KOLs Following ↗ · 2026-06-08

The author criticizes frontier AI models after GPT5.5 xHigh in Codex incorrectly suggested tensor parallelism for a model that fits on a single GPU, and announces a planned shootout comparing several AI models (GPT5.5, Opus 4.8, Qwen variants, Nemotron) on a real-world problem.

@royxy: You've all heard that you should use Codex for planning and Deepseek for implementation. But over the past couple of days, while pushing forward discussions on a highly complex project that has probably never been done before, I feel that Deepseek is more creative than Codex, while Codex's logical and engineering...

X AI KOLs Timeline ↗ · 2026-05-31

A user shares their experience using Deepseek and Codex for planning and implementing a complex project, finding Deepseek more creative while Codex is stronger in logic and engineering.

@elonmusk: Grok

X AI KOLs Following ↗ · 2026-05-26

Elon Musk highlights Grok's response to a user who copied Gemini's analysis of a Belgian hate speech conviction and asked Grok to reply.

Gemini 3.5 flash scores, hasn’t even beat GPT 5.4 xhigh

Reddit r/singularity ↗ · 2026-05-19

A post sharing benchmark scores for Gemini 3.5 Flash, noting that it has not surpassed GPT 5.4 xhigh.

Honest comparison after 4 months running Claude Pro + ChatGPT Plus side by side

Reddit r/AI_Agents ↗ · 2026-05-18

A detailed four-month comparison of Claude Pro and ChatGPT Plus reveals Claude excels in longform writing and complex coding with better context retention, while ChatGPT wins on speed and casual everyday tasks.

ChatGPT Shopping vs Perplexity vs Wizard AI

Reddit r/ArtificialInteligence ↗ · 2026-05-08

A user compares ChatGPT, Perplexity, and Wizard AI for shopping recommendations, noting differences in brand diversity and purchasing integration.
