Tag
Introduces Future-L1, an interleaved latent visual reasoning framework that improves video event prediction by maintaining visual semantics in latent space. Achieves state-of-the-art results on FutureBench and TwiFF-Bench benchmarks.
Holo 3.1 achieves state-of-the-art performance on the AndroidWorld benchmark for computer-use agents, demonstrating improved speed and cost-effectiveness for local deployment.
A Hugging Face team member announces the addition of conference support to the revived PapersWithCode website, allowing users to browse all CVPR 2026 papers with arXiv IDs, categorized by task and linked to GitHub, project pages, and Hugging Face artifacts.
Epoch AI Research analyzed the capability gap between open-weight and proprietary AI models, finding that open-weight models have been trailing the state of the art by approximately four months since the start of the year.
LLMBridge is an LLM-based pipeline for end-to-end referential bridging resolution that achieves state-of-the-art performance on three English datasets. The system combines heuristic pre- and post-processing with LLM-based natural language inference.
Cartesia launches Sonic 3.5, a new state-of-the-art TTS model supporting 42 languages, taking the #1 spot on the Artificial Analysis Speech Arena Leaderboard.
Qwopus 3.6 27B, a merged model (Qwen + Opus), is now fully live. It achieves state-of-the-art agentic coding performance with 75.25% on SWE MMLU Pro, handles a 303k-token context with a Q8 KV cache, and runs on 24GB of VRAM at Q5_K_M quantization.
Allen AI introduces ArtifactLinker, a system that predicts which AI models will achieve state-of-the-art results on Hugging Face benchmarks and then verifies those predictions by running evaluations.
TabPFN-MT extends prior-data fitted networks (PFNs) to multitask in-context learning for tabular data, achieving state-of-the-art results on small-to-medium datasets while reducing inference cost from O(T) to O(1) forward passes for T tasks.
DrugSAGE is a framework that accumulates and reuses cross-task memory to build state-of-the-art drug discovery models efficiently, outperforming baseline agents by 10-30% on held-out tasks.
Poetiq's Meta-System, using recursive self-improvement via standard API access without fine-tuning, achieves new state-of-the-art results on the LiveCodeBench Pro coding benchmark, outperforming leading models like GPT 5.5.
SureThing achieves state-of-the-art results on the LongMemEval benchmark with an 88.0% overall score, prompting developers to replace the existing memory layers in their AI agents.
Reason-ModernColBERT achieves near-perfect results on BrowseComp-Plus, surpassing the previous state of the art as well as models 54× larger; Agent-ModernColBERT then improves on this further with minimal additional training.
RecGen 1 and 2 are newly released AI models that claim state-of-the-art performance in converting images to 3D models and may be made available as open source.
Google DeepMind's AI co-mathematician achieves state-of-the-art results on hard problem-solving benchmarks, scoring 48% on FrontierMath Tier 4, the highest among all AI systems evaluated.
OpenAI researchers explain the advances that make ChatGPT Images 2.0 a state-of-the-art image generation model, highlighting its thinking and intelligence capabilities.
dots.ocr is a new lightweight 1.7B-parameter multilingual vision-language model that achieves state-of-the-art performance on OmniDocBench, outperforming much larger models (72B+) at document parsing and OCR tasks.
UniCorn is a framework that enables unified multimodal models to self-improve by using a multi-agent system for prompt generation, image creation, and quality evaluation, achieving state-of-the-art results on text-to-image benchmarks like TIIF, WISE, and OneIG-EN.
PaddleOCR-VL is a compact 0.9B-parameter vision-language model that achieves state-of-the-art performance in multilingual document parsing and element recognition by integrating a NaViT-style dynamic-resolution visual encoder with an ERNIE language model.
Proposes Agentic Continual Pre-training to build agentic foundation models, achieving state-of-the-art results on 10 benchmarks with AgentFounder-30B, including 39.9% on BrowseComp-en and 43.3% on BrowseComp-zh.