model-performance

#model-performance

@jun_song: GPT-5.6 seems very disappointing. Nothing better than GLM-5.2

X AI KOLs Following · yesterday

A user expresses disappointment with GPT-5.6, claiming it is not better than GLM-5.2.

#model-performance

@haider1: GLM 5.2 feels like the opus 4.5 moment for open-weight models what genuinely impressed me was during long, multi-step a…

X AI KOLs Following · 2026-06-17

The poster describes GLM 5.2 as a milestone for open-weight models comparable to the Opus 4.5 release, citing strong context retention across long multi-step tasks and more reliable tool calling.

#model-performance

humanity's last exam current benchmarks thoughts?

Reddit r/singularity · 2026-06-15

Discussion of recent AI model scores on the Humanity's Last Exam benchmark, noting the rise from GPT-4o's 2.7% in May 2024 to around 45% by June 2026 and questioning how difficult the exam really is.

#model-performance

Opus 4.8 Thinking keeps deteriorating on Hard Prompts English in LMArena (again)

Reddit r/singularity · 2026-06-07

Opus 4.8 Thinking continues to slip in LMArena's Hard Prompts (English) category, now scoring 23 points below Opus 4.6 Thinking, which retains the top spot.

#model-performance

Performance When Offloading Large Models to System RAM?

Reddit r/LocalLLaMA · 2026-05-24

A discussion of the performance trade-offs of offloading large model weights from GPU VRAM to system RAM, comparing GPU configurations such as the RTX 5090 and RTX 6000 for models like DeepSeek V4 Pro.

#model-performance

@swyx: very belated but in retrospect i think @sama's mythical "build a business that gets better when models get better" is b…

X AI KOLs Following · 2026-05-20

swyx reflects on Sam Altman's idea of building businesses that improve as AI models improve, links it to the emerging concept of Agent Labs, and notes a clear correlation with revenue spikes in Q4 2025.

#model-performance

Gemini 3.5 Flash Benchmarks

Reddit r/singularity · 2026-05-19

A thread discussing the published benchmark results for the Gemini 3.5 Flash model.

#model-performance

Consider running a bigger quant if possible

Reddit r/LocalLLaMA · 2026-04-22

A user reports that switching from the more heavily compressed IQ4_XS quant of Qwen 3.6 to the larger IQ4_NL_XL quant dramatically improved agentic-coding accuracy despite lower tok/s, and urges others to prefer bigger quants when VRAM allows.
