New SOTA: Poetiq uses self-optimizing harness to surpass e.g. Opus 4.7 with Gemini 3 Flash

Reddit r/singularity 05/15/26, 12:43 AM Models

self-optimizing recursive-self-improvement sota coding gemini-3-flash poetiq

Summary

Poetiq claims a new state of the art in coding performance: its self-optimizing harness, running Gemini 3 Flash, reportedly surpasses models such as Opus 4.7.

Check out their blog post here: [Poetiq | Recursive Self-Improvement Delivers New SOTA Coding Performance](https://poetiq.ai/posts/recursive_self_improvement_coding/)


Similar Articles

@poetiq_ai: Poetiq's Meta-System built its own coding harness from scratch. It got SOTA on LiveCodeBench Pro. No fine-tuning, no sp…

X AI KOLs Following

Poetiq's Meta-System achieved state-of-the-art results on LiveCodeBench Pro by autonomously building a coding harness using standard APIs and Gemini 3.1 Pro, without fine-tuning or special model access.

Gemini 3.5 Flash Looks Good For How Fast It Is (8 minute read)

TLDR AI

Google released Gemini 3.5 Flash, a hybrid speed model that rivals Opus 4.7 and GPT-5.5 in speed and cost while performing well on agentic and coding benchmarks.

Poetiq: Recursive Self-Improvement Delivers New SOTA Coding Performance

Reddit r/singularity

Poetiq's Meta-System uses recursive self-improvement through standard API access, with no fine-tuning, to achieve new state-of-the-art results on the LiveCodeBench Pro coding benchmark, outperforming leading models such as GPT-5.5.

Gemma4_31b_fp8 keeping up with Sonnet_4.6_medium in my harness.

Reddit r/LocalLLaMA

A user reports that Gemma4_31b in FP8 keeps pace with Sonnet_4.6_medium in a custom harness across tasks such as Cypher query generation, entity extraction, agentic tool calling, code writing, and multi-vector retrieval synthesis.

@nick_kango: One more task to add to my twitter benchmark collection:) Btw, Opus 4.8 and all the SOTA models passed when i tried tha…

X AI KOLs Timeline

Nick Kang adds a new task to his Twitter benchmark collection; Claude Opus 4.8 and other SOTA models pass, while Sonnet 4.6 and Grok 4.3 fail. Alfin remarks on Opus 4.8's dangerous capabilities.