state-of-the-art

#state-of-the-art

Imagine Before You Predict: Interleaved Latent Visual Reasoning for Video Event Prediction

Hugging Face Daily Papers · 2026-06-04

Introduces Future-L1, an interleaved latent visual reasoning framework that improves video event prediction by maintaining visual semantics in latent space. Achieves state-of-the-art results on FutureBench and TwiFF-Bench benchmarks.

@NielsRogge: Holo 3.1 reaches a new SOTA on AndroidWorld, a popular computer use agents benchmark. Can be explored here https://paper…

X AI KOLs Following · 2026-06-02

Holo 3.1 achieves state-of-the-art performance on the AndroidWorld benchmark for computer-use agents, demonstrating improved speed and cost-effectiveness for local deployment.

Browse CVPR 2026 papers on PapersWithCode [P]

Reddit r/MachineLearning · 2026-06-02

A Hugging Face team member announces the addition of conference support to the revived PapersWithCode website, allowing users to browse all CVPR 2026 papers with arXiv IDs, categorized by task and linked to GitHub, project pages, and Hugging Face artifacts.

@EpochAIResearch: We took another look at the capability gap between open-weight and proprietary models. Since the start of the year, ope…

X AI KOLs Following · 2026-05-29

Epoch AI Research analyzed the capability gap between open-weight and proprietary AI models, finding that open-weight models have been trailing the state of the art by approximately four months since the start of the year.

LLMBridge: An LLM Pipeline for End-to-end Referential Bridging Resolution in English

arXiv cs.CL · 2026-05-29

LLMBridge introduces an LLM-based pipeline for end-to-end referential bridging resolution, achieving state-of-the-art performance on three English datasets. The system combines heuristic pre/post-processing with LLM natural language inference.

@_albertgu: Extremely proud of the team @cartesia for launching Sonic 3.5, which sets a new state of the art for TTS. I personally l…

X AI KOLs Following · 2026-05-22

Cartesia launches Sonic 3.5, a new state-of-the-art TTS model supporting 42 languages, taking the #1 spot on the Artificial Analysis Speech Arena Leaderboard.

@outsource_: BREAKING QWOPUS 3.6 27B IS FULLY LIVE! SOTA QWEN 3.6 27b + Opus IS HERE!!!! Agentic coding GOATED: 75.25% (152/202) on …

X AI KOLs Timeline · 2026-05-22

Qwopus 3.6 27B, a merged model (Qwen + Opus), is now fully live. The post reports state-of-the-art agentic coding performance of 75.25% on SWE MMLU Pro, a 303k-token context with a Q8 KV cache, and operation on 24GB of VRAM at Q5_K_M quantization.

@allen_ai: Most models are only evaluated on a fraction of the benchmarks out there. ArtifactLinker, our new system, predicts whic…

X AI KOLs Following · 2026-05-22

Allen AI introduces ArtifactLinker, a system that predicts which AI models will achieve state-of-the-art results on HuggingFace benchmarks and then verifies those predictions by running the evaluations.

TabPFN-MT: A Natively Multitask In-Context Learner for Tabular Data

arXiv cs.LG · 2026-05-21

TabPFN-MT extends PFNs to multitask in-context learning for tabular data, achieving state-of-the-art results on small-to-medium datasets while reducing inference cost from O(T) to O(1) forward passes.

DrugSAGE: Self-evolving Agent Experience for Efficient State-of-the-Art Drug Discovery

arXiv cs.LG · 2026-05-18

DrugSAGE is a framework that accumulates and reuses cross-task memory to build state-of-the-art drug discovery models efficiently, outperforming baseline agents by 10-30% on held-out tasks.

Poetiq: Recursive Self-Improvement Delivers New SOTA Coding Performance

Reddit r/singularity · 2026-05-15

Poetiq's Meta-System, using recursive self-improvement via standard API access without fine-tuning, achieves new state-of-the-art results on the LiveCodeBench Pro coding benchmark, outperforming leading models like GPT 5.5.

@hasantoxr: I'm replacing every memory layer I've ever built into an agent with this. SureThing dropped SOTA on LongMemEval. 88.0% …

X AI KOLs Timeline · 2026-05-12

SureThing has achieved state-of-the-art results on the LongMemEval benchmark, scoring 88.0% overall. The poster says they are replacing every memory layer they have built into an agent with it.

@antoine_chaffin: Reason-ModernColBERT nearly solved BrowseComp-Plus, smashing SOTA and outperforming models 54× bigger. Not bad fo…

X AI KOLs Following · 2026-05-12

Reason-ModernColBERT achieves near-perfect results on BrowseComp-Plus, surpassing the previous SOTA and outperforming models 54× larger. Agent-ModernColBERT then improves on these results with minimal training.

RecGen 1 & 2: New, possibly open source SOTA image to 3D model AI released.

Reddit r/singularity · 2026-05-10

RecGen 1 and 2 are newly released AI models that claim state-of-the-art performance in converting images to 3D models, with potential open-source availability.

[Google DeepMind] The AI co-mathematician also achieves state-of-the-art results on hard problem-solving benchmarks, including scoring 48% on FrontierMath Tier 4, a new high score among all AI systems evaluated.

Reddit r/singularity · 2026-05-08

Google DeepMind's AI co-mathematician achieves state-of-the-art results on hard problem-solving benchmarks, scoring 48% on FrontierMath Tier 4, the highest among all AI systems evaluated.

@OpenAI: What makes ChatGPT Images 2.0 a state-of-the-art image generation model? Researchers behind the model explain. A thread…

X AI KOLs · 2026-04-21

OpenAI researchers explain the advances that make ChatGPT Images 2.0 a state-of-the-art image generation model, highlighting its thinking and intelligence capabilities.

@techNmak: A lightweight VLM that beats the giants at OCR. (1.7B parameters, SOTA on OmniDocBench) dots.ocr is a new multilingual…

X AI KOLs Timeline · 2026-04-20

dots.ocr is a new lightweight 1.7B parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.

UniCorn: Towards Self-Improving Unified Multimodal Models through Self-Generated Supervision

Papers with Code Trending · 2026-01-06

UniCorn is a framework that enables unified multimodal models to self-improve by using a multi-agent system for prompt generation, image creation, and quality evaluation, achieving state-of-the-art results on text-to-image benchmarks like TIIF, WISE, and OneIG-EN.

PaddleOCR-VL: Boosting Multilingual Document Parsing via a 0.9B Ultra-Compact Vision-Language Model

Papers with Code Trending · 2025-10-16

PaddleOCR-VL is a compact 0.9B vision-language model that achieves state-of-the-art performance in multilingual document parsing and element recognition by integrating NaViT-style dynamic resolution with the ERNIE language model.

Scaling Agents via Continual Pre-training

Papers with Code Trending · 2025-09-16

Proposes Agentic Continual Pre-training to build agentic foundation models, achieving state-of-the-art results on 10 benchmarks with AgentFounder-30B, including 39.9% on BrowseComp-en and 43.3% on BrowseComp-zh.
