state-of-the-art

#state-of-the-art

@ModelScope2022: Introducing Agents-A1, A 35B MoE agentic model built for long-horizon tasks across search, engineering, scientific rese…

X AI KOLs Timeline ↗ · 12h ago Cached

ModelScope introduces Agents-A1, a 35B MoE agentic model with 256K context and function calling, achieving SOTA on long-horizon tasks and instruction following.

#state-of-the-art

MOSAIC: Orchestrating Collaborative Knowledge Tracing with Hierarchical Semantic Alignment

arXiv cs.LG ↗ · 19h ago Cached

MOSAIC is a novel framework that uses a frozen LLM to generate semantic embeddings and hierarchical prediction prompts for knowledge tracing, achieving state-of-the-art results on multiple benchmarks.

#state-of-the-art

@GergelyOrosz: This is from a popular inference provider GLM-5.2 plus the US banning the most capable new models means open source cau…

X AI KOLs Following ↗ · 3d ago Cached

GLM-5.2 is a new open-source coding model that has caught up to closed-source SOTA models, potentially disrupting revenues of OpenAI and Anthropic.

#state-of-the-art

Speaking Numbers to LLMs: Multi-Wavelet Number Embeddings for Time Series Forecasting

arXiv cs.CL ↗ · 4d ago Cached

Proposes TempoWave, a plug-and-play temporal wavelet digit interface that maps time series observations into digit-wise embeddings from multi-wavelet coefficients, improving LLM-based time series forecasting and achieving state-of-the-art on multiple benchmarks.

#state-of-the-art

@anvie: Tested Ornith-1.0-9B, and it's impressive for a model of that size. I don't believe this is just 9B!

X AI KOLs Following ↗ · 5d ago Cached

Ornith-1.0 is a family of open-source LLMs specialized for agentic coding, spanning sizes from 9B to 397B and achieving state-of-the-art performance among open-source models of comparable size.

#state-of-the-art

QuickMaker

Product Hunt ↗ · 6d ago

QuickMaker offers a subscription service that integrates state-of-the-art AI models directly into Blender for enhanced 3D modeling and design workflows.

#state-of-the-art

@ms_aifrontiers: Fara1.5 is here! The tech report just landed on arXiv. New SOTA for computer use agents of its size, and it competes wi…

X AI KOLs Following ↗ · 6d ago Cached

Fara1.5 is a family of native computer use agents trained using the FaraGen1.5 scalable data pipeline. The models achieve new state-of-the-art results on browser-use benchmarks, competing with much larger frontier models.

#state-of-the-art

@sama: We want to help all companies be secure, working with the USG and the security ecosystem. *The full version of GPT-5.5-…

X AI KOLs ↗ · 2026-06-22 Cached

OpenAI releases the full version of GPT-5.5-Cyber, a cybersecurity-focused AI model with state-of-the-art performance on CyberGym, and announces efforts to improve security through Patch The Planet and Codex Security.

#state-of-the-art

@sheriyuo: Best-of-N, rejection sampling, and rubric-based ranking all assume you already have a reliable way to evaluate candidat…

X AI KOLs Timeline ↗ · 2026-06-18 Cached

Apodex releases Apodex-1.0, a deep-research model that uses a heavy-duty agent team with global verification, achieving state-of-the-art results on multiple benchmarks including BrowseComp, DeepSearchQA, and HLE.

#state-of-the-art

ThinkDeception: A Progressive Reinforcement Learning Framework for Interpretable Multimodal Deception Detection

arXiv cs.AI ↗ · 2026-06-18 Cached

ThinkDeception proposes a novel framework that leverages multimodal large language models and a progressive reinforcement learning strategy with chain-of-thought reasoning for interpretable deception detection, achieving new state-of-the-art results on standard benchmarks.

#state-of-the-art

@nickscamara_: New discoveries are gonna come from models that can reason over the latest science The rate of scientific progress beco…

X AI KOLs Timeline ↗ · 2026-06-17 Cached

Firecrawl released a state-of-the-art research index for AI/ML papers, claiming 18% better recall on arXivQA than competitors, designed for autonomous research agents.

#state-of-the-art

StepGuard: Guarding Web Navigation via Single-Step Calibration

arXiv cs.AI ↗ · 2026-06-17 Cached

StepGuard proposes a framework combining Dynamic Dual-Policy Optimization (DDPO) and Confidence-Guided Adaptive Navigation Reflection (CANR) to address reward misalignment and error propagation in web navigation agents, achieving state-of-the-art performance.

#state-of-the-art

Unified Multimodal Autoregressive Modeling with Shared Context-Visual Tokenizer is Key to Unification

Hugging Face Daily Papers ↗ · 2026-06-16 Cached

UniAR presents a unified autoregressive framework that uses a single discrete visual tokenizer to bridge visual understanding and generation, achieving state-of-the-art results in image generation and editing.

#state-of-the-art

@NielsRogge: Very cool work!! Modality Forcing gets SOTA on 4 out of 5 monocular depth estimation benchmarks. Explore the paper and …

X AI KOLs Following ↗ · 2026-06-13 Cached

Bardienus Duisterhof introduces Modality Forcing, a recipe for post-training text-to-image (T2I) models that achieves state-of-the-art results on 4 out of 5 monocular depth estimation benchmarks.

#state-of-the-art

A bit weird, but okay. (Don't get me wrong it's SOTA for editing, but definitely not generation) Thoughts?

Reddit r/singularity ↗ · 2026-06-11

The comment acknowledges that the model is state-of-the-art for editing but not for generation.

#state-of-the-art

Harnessing the Collective Intelligence of AI Agents in the Wild for New Discoveries

arXiv cs.CL ↗ · 2026-06-10 Cached

This paper presents EinsteinArena, an agent-native platform enabling decentralized scientific discovery through open interaction among autonomous AI agents. The platform has already produced 12 new state-of-the-art results, including an improved lower bound for the kissing number problem in dimension 11, demonstrating that collective AI-driven research can emerge from agents sharing insights and building on each other's work.

#state-of-the-art

@heyshrutimishra: 1. Fable 5 is state-of-the-art on nearly every benchmark that matters. Software engineering. Science. Knowledge work. V…

X AI KOLs Following ↗ · 2026-06-09 Cached

Anthropic releases Fable 5, claiming it is state-of-the-art on key benchmarks in software engineering, science, knowledge work, and vision, exceeding all previously available models.

#state-of-the-art

@karpathy: This is a super exciting release - Claude Fable 5 is the same underlying model as Mythos but with added safeguards. The…

X AI KOLs ↗ · 2026-06-09 Cached

Claude Fable 5 has been released, claimed to be state-of-the-art across benchmarks with qualitative improvements, especially on complex long tasks. It is the same underlying model as Mythos but with added safeguards.

#state-of-the-art

@Apodex_AI: Dive in Blog: https://apodex.com/blog/apodex-1.0 Tech report: http://apodex.com/pdf/20260608 Github: https://github.com…

X AI KOLs Following ↗ · 2026-06-08 Cached

ApodexAI releases Apodex-1.0, a deep-research model that operates as a tool-using ReAct agent. Its heavy-duty mode, Apodex-1.0-H, uses an asynchronous agent team with up to 150 sub-agents and achieves new state-of-the-art results on deep-research benchmarks including BrowseComp, DeepSearchQA, HLE, and FrontierScience, surpassing models like GPT-5.5-pro and Claude-Opus-4.8.

#state-of-the-art

@Apodex_AI: Meet 𝗔𝗽𝗼𝗱𝗲𝘅 𝟭.𝟬 — a heavy-duty agent team for deep research, which sets the SOTA! The team searches the web, re…

X AI KOLs Timeline ↗ · 2026-06-08 Cached

Apodex 1.0 is a heavy-duty AI agent team for deep research that achieves state-of-the-art performance by searching the web, reasoning over evidence, and producing reports with verifiable evidence chains.
