A 3B model, VibeThinker-3B, achieves highly competitive results on verifiable reasoning tasks through post-training refinements on Qwen2.5-Coder, including curriculum SFT, multi-domain RL, offline self-distillation, and a final RL-based instruct stage.
This technical report introduces VibeThinker-3B, a 3B-parameter model that achieves frontier-level verifiable reasoning performance through post-training refinements on Qwen2.5-Coder: curriculum-based supervised fine-tuning, multi-domain reinforcement learning, and offline self-distillation. It matches or exceeds much larger models such as DeepSeek V3.2.
VibeThinker-3B is a compact 3B-parameter model that achieves frontier-level performance on verifiable reasoning tasks through a specialized training pipeline, matching larger models such as DeepSeek V3.2 and Gemini 3 Pro.
VibeThinker-3B is a 3B-parameter model that achieves frontier-level reasoning performance on math, coding, and STEM benchmarks by optimizing the Spectrum-to-Signal Principle (SSP) post-training pipeline, reaching performance comparable to much larger models.
DocScope is a new benchmark for evaluating the verifiable reasoning and trustworthiness of Multimodal Large Language Models on long documents, introducing a four-stage evaluation protocol for page localization, region grounding, fact extraction, and answer verification.