verifiable-reasoning

#verifiable-reasoning

Verifiable Self-Evolution for Open-Ended Dialogue Skills via Future-Feedback Prediction

arXiv cs.CL · 2026-07-22

This paper introduces a method for self-evolution of open-ended dialogue skills using future-feedback prediction, converting conversational feedback into a fixed offline objective to enable reproducible skill optimization without live traffic. The approach achieves over 75% prediction accuracy on a privacy-preserving sales-assistant dataset.

Forethought: Verifiable Reasoning from Neurosymbolic Primitive Programming

arXiv cs.AI · 2026-07-07

Forethought is a neurosymbolic reasoning system that treats reasoning as an explicit, verifiable program composed of symbolic and neural primitives. It raises base-model accuracy by about 30% in relative terms and lets small models match frontier models, while remaining model-agnostic and auditable.

@kimmonismus: Crazy: A 3B model is now reaching highly competitive results on verifiable reasoning tasks. VibeThinker-3B scores 94.3 …

X AI KOLs Following · 2026-06-16

A 3B model, VibeThinker-3B, achieves highly competitive results on verifiable reasoning tasks through post-training refinements on Qwen2.5-Coder, including curriculum SFT, multi-domain RL, offline self-distillation, and a final RL-based instruct stage.

@f14bertolotti: Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5…

X AI KOLs Timeline · 2026-06-16

This technical report introduces VibeThinker-3B, a 3B-parameter model that achieves frontier-level verifiable reasoning performance through post-training refinements on Qwen2.5-Coder: curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. It matches or exceeds much larger models such as DeepSeek V3.2.

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Hugging Face Daily Papers · 2026-06-15

VibeThinker-3B is a compact 3B-parameter model that achieves frontier-level performance on verifiable reasoning tasks through a specialized training pipeline, matching larger models such as DeepSeek V3.2 and Gemini 3 Pro.

WeiboAI/VibeThinker-3B

Hugging Face Models Trending · 2026-06-12

VibeThinker-3B is a 3B-parameter model that achieves frontier-level reasoning performance on math, coding, and STEM benchmarks through an optimized post-training pipeline built on the Spectrum-to-Signal Principle (SSP), reaching performance comparable to that of much larger models.

DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding

arXiv cs.CL · 2026-05-12

DocScope is a new benchmark for evaluating the verifiable reasoning and trustworthiness of Multimodal Large Language Models on long documents, introducing a four-stage evaluation protocol for page localization, region grounding, fact extraction, and answer verification.
