MOOSE-Star (ICML 2026): 7B model + 108K-paper dataset for scientific hypothesis discovery

Reddit r/LocalLLaMA Papers

Summary

MOOSE-Star presents a 7B model fine-tuned from DeepSeek-R1-Distill-Qwen-7B for scientific hypothesis discovery, along with TOMATO-Star, a dataset of 108,717 NCBI papers. On inspiration retrieval, the model reaches 54.37% accuracy, ahead of GPT-5.4 (51.50%) and Gemini-3 Flash (51.44%) and just behind Gemini-3 Pro (54.89%).

Disclosure first: I work on community at MiroMind. One of our researchers just dropped the full MOOSE-Star collection on Hugging Face: a 7B model post-trained for scientific hypothesis discovery, plus the dataset behind it. The paper was accepted at ICML 2026.

🤗 Collection: [https://huggingface.co/collections/ZonglinY/moose-star-models-and-data](https://huggingface.co/collections/ZonglinY/moose-star-models-and-data)

**Inside:**

* **MS-IR-7B / MS-HC-7B / MS-7B**: 7B models for inspiration retrieval, hypothesis composition, and joint use. Base: DeepSeek-R1-Distill-Qwen-7B.
* **TOMATO-Star**: 108,717 NCBI papers decomposed into (background, hypothesis, inspirations), with every inspiration anchored to a real citation. Covers biology, chemistry, medicine, medical imaging, psychology, and cognitive science. \~38,400 A800 GPU-hours of preprocessing went into building it.
* **Strict temporal split for evaluation**: train ≤ Sep 2025, test = Oct 2025 (after the base model's knowledge cutoff).

**Inspiration retrieval accuracy**

|Model|IR accuracy|
|:-|:-|
|Random Selection|6.70%|
|R1-Distilled-Qwen-7B (base)|28.42%|
|Claude Sonnet 4.6|45.02%|
|DeepSeek-R1|45.11%|
|Gemini-3 Flash|51.44%|
|GPT-5.4|51.50%|
|**MS-7B (7B, joint IR + HC)**|**54.34%**|
|**MS-IR-7B (7B, IR-only)**|**54.37%**|
|Gemini-3 Pro|54.89%|

Locally: it's a standard DeepSeek-R1-Distill-Qwen-7B fine-tune, so anything that runs the base model runs this; llama.cpp, vLLM, and SGLang are all fine. \~14GB at fp16, so single 24GB card territory. Apache-2.0 code, CC-BY-4.0 data.

Stress-test it on anything! Questions and feedback are welcome below.

📄 [https://arxiv.org/abs/2603.03756](https://arxiv.org/abs/2603.03756)

💻 [https://github.com/ZonglinY/MOOSE-Star](https://github.com/ZonglinY/MOOSE-Star)
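
For anyone sanity-checking the \~14GB fp16 figure, here is a rough back-of-the-envelope sketch. It assumes a round 7 billion parameters (the exact count may differ a little) and counts weights only, so KV cache and activations come on top. `weight_memory_gb` is a hypothetical helper written for this illustration, not something from the MOOSE-Star repo, and the 4-bit line is just an illustrative comparison that ignores quantization overhead.

```python
def weight_memory_gb(num_params: float, bytes_per_param: float) -> float:
    """Approximate memory for model weights alone, in decimal gigabytes."""
    return num_params * bytes_per_param / 1e9


# Assumption: a round 7e9 parameters; the real model may be slightly larger.
NUM_PARAMS = 7e9
CARD_GB = 24  # the "single 24GB card" mentioned in the post

fp16_gb = weight_memory_gb(NUM_PARAMS, 2)    # fp16 stores 2 bytes per weight
q4_gb = weight_memory_gb(NUM_PARAMS, 0.5)    # ~4 bits per weight, overhead ignored
headroom_gb = CARD_GB - fp16_gb              # left over for KV cache and activations

print(f"fp16 weights:  ~{fp16_gb:.1f} GB")
print(f"4-bit weights: ~{q4_gb:.1f} GB")
print(f"headroom on a {CARD_GB} GB card at fp16: ~{headroom_gb:.1f} GB")
```

The headroom number is the one to watch: long reasoning traces grow the KV cache, so the usable context at fp16 depends on how much of that remaining memory your runtime reserves.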