verifiable-reasoning

#verifiable-reasoning

@kimmonismus: Crazy: A 3B model is now reaching highly competitive results on verifiable reasoning tasks. VibeThinker-3B scores 94.3 …

X AI KOLs Following · 16h ago

A 3B model, VibeThinker-3B, achieves highly competitive results on verifiable reasoning tasks through post-training refinements on Qwen2.5-Coder, including curriculum SFT, multi-domain RL, offline self-distillation, and a final RL-based instruct stage.

@f14bertolotti: Stellar performance from a 3B model. These results were achieved primarily through post-training refinements on Qwen2.5…

X AI KOLs Timeline · 21h ago

This technical report introduces VibeThinker-3B, a 3B-parameter model that reaches frontier-level verifiable reasoning performance through post-training refinements on Qwen2.5-Coder: curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. It matches or exceeds much larger models such as DeepSeek V3.2.

VibeThinker-3B: Exploring the Frontier of Verifiable Reasoning in Small Language Models

Hugging Face Daily Papers · 2d ago

VibeThinker-3B is a compact 3B-parameter model that achieves frontier-level performance on verifiable reasoning tasks through a specialized training pipeline, matching larger models such as DeepSeek V3.2 and Gemini 3 Pro.

WeiboAI/VibeThinker-3B

Hugging Face Models Trending · 4d ago

VibeThinker-3B is a 3B-parameter model that achieves frontier-level reasoning performance on math, coding, and STEM benchmarks, comparable to much larger models, through an optimized Spectrum-to-Signal Principle (SSP) post-training pipeline.

DocScope: Benchmarking Verifiable Reasoning for Trustworthy Long-Document Understanding

arXiv cs.CL · 2026-05-12

DocScope is a new benchmark for evaluating the verifiable reasoning and trustworthiness of Multimodal Large Language Models on long documents, introducing a four-stage evaluation protocol for page localization, region grounding, fact extraction, and answer verification.
