deepseek-ai/DeepSeek-V4-Flash


Summary

DeepSeek releases DeepSeek-V4-Flash and DeepSeek-V4-Pro, new MoE language models supporting 1 million token contexts with improved efficiency and performance.

Task: text-generation Tags: transformers, safetensors, deepseek_v4, text-generation, conversational, license:mit, eval-results, endpoints_compatible, 8-bit, fp8, region:us


Source: https://huggingface.co/deepseek-ai/DeepSeek-V4-Flash

DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence


Introduction

We present a preview version of the DeepSeek-V4 series, including two strong Mixture-of-Experts (MoE) language models — DeepSeek-V4-Pro with 1.6T parameters (49B activated) and DeepSeek-V4-Flash with 284B parameters (13B activated) — both supporting a context length of one million tokens.

The DeepSeek-V4 series incorporates several key upgrades in architecture and optimization:

  1. **Hybrid Attention Architecture:** We design a hybrid attention mechanism combining Compressed Sparse Attention (CSA) and Heavily Compressed Attention (HCA) to dramatically improve long-context efficiency. In the 1M-token context setting, DeepSeek-V4-Pro requires only 27% of single-token inference FLOPs and 10% of KV cache compared with DeepSeek-V3.2.
  2. **Manifold-Constrained Hyper-Connections (mHC):** We incorporate mHC to strengthen conventional residual connections, enhancing stability of signal propagation across layers while preserving model expressivity.
  3. **Muon Optimizer:** We employ the Muon optimizer for faster convergence and greater training stability.
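To make the KV-cache savings above concrete, here is a back-of-the-envelope sketch of cache size at a 1M-token context. All layer counts, head counts, and dimensions below are hypothetical placeholders, not the actual DeepSeek-V4 architecture:

```python
def kv_cache_bytes(num_layers, kv_heads, head_dim, seq_len, bytes_per_elem):
    # Keys + values: 2 cached tensors per layer,
    # each of shape [seq_len, kv_heads * head_dim].
    return 2 * num_layers * kv_heads * head_dim * seq_len * bytes_per_elem

# Hypothetical dense-attention baseline at a 1M-token context (FP16 cache).
baseline = kv_cache_bytes(num_layers=61, kv_heads=8, head_dim=128,
                          seq_len=1_000_000, bytes_per_elem=2)

# A compressed scheme that keeps ~10% of the cache, as claimed above.
compressed = 0.10 * baseline
print(f"baseline: {baseline / 2**30:.1f} GiB, compressed: {compressed / 2**30:.1f} GiB")
```

Even at these made-up dimensions, a dense 1M-token cache runs to hundreds of GiB, which is why a 10x reduction matters for deployment.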

We pre-train both models on more than 32T diverse and high-quality tokens, followed by a comprehensive post-training pipeline. The post-training features a two-stage paradigm: independent cultivation of domain-specific experts (through SFT and RL with GRPO), followed by unified model consolidation via on-policy distillation, integrating distinct proficiencies across diverse domains into a single model.
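One common formulation of on-policy distillation, sketched below, has the student generate its own trajectories while a domain-expert teacher only scores them, with the student trained to minimize a per-token reverse KL to the teacher's distribution. This is an illustrative toy with made-up logits, not DeepSeek's actual training objective or code:

```python
import math

def log_softmax(logits):
    # Numerically stable log-softmax over a single token's vocabulary logits.
    m = max(logits)
    z = math.log(sum(math.exp(x - m) for x in logits)) + m
    return [x - z for x in logits]

def reverse_kl(student_logits, teacher_logits):
    """KL(student || teacher) for one token position.

    Reverse KL is mode-seeking: the student is penalized for putting mass
    where the teacher puts little, which suits distilling a stronger policy.
    """
    ls = log_softmax(student_logits)
    lt = log_softmax(teacher_logits)
    return sum(math.exp(s) * (s - t) for s, t in zip(ls, lt))

# On-policy: the *student* samples the trajectory; the teacher scores it.
student_logits = [2.0, 0.5, -1.0]   # toy values
teacher_logits = [1.8, 0.7, -0.9]   # toy values
loss = reverse_kl(student_logits, teacher_logits)
```

In a real pipeline this loss would be averaged over the tokens of student-sampled completions and backpropagated through the student only.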

DeepSeek-V4-Pro-Max, the maximum reasoning effort mode of DeepSeek-V4-Pro, significantly advances the knowledge capabilities of open-source models, firmly establishing itself as the best open-source model available today. It achieves top-tier performance in coding benchmarks and significantly bridges the gap with leading closed-source models on reasoning and agentic tasks. Meanwhile, DeepSeek-V4-Flash-Max achieves comparable reasoning performance to the Pro version when given a larger thinking budget, though its smaller parameter scale naturally places it slightly behind on pure knowledge tasks and the most complex agentic workflows.

Model Downloads

*FP4 + FP8 Mixed: MoE expert parameters use FP4 precision; most other parameters use FP8.
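As a rough illustration of what the FP4 + FP8 mix means for checkpoint size: FP4 stores 2 parameters per byte and FP8 one per byte, so with most weight in the MoE experts the checkpoint shrinks well below a uniform FP8 export. The parameter split below is a hypothetical example, not the actual breakdown:

```python
def checkpoint_gib(expert_params, other_params):
    # FP4 = 0.5 bytes/param for MoE expert weights, FP8 = 1 byte/param elsewhere.
    return (expert_params * 0.5 + other_params * 1.0) / 2**30

# Hypothetical split for a 284B-parameter model where experts dominate.
size = checkpoint_gib(expert_params=270e9, other_params=14e9)
print(f"~{size:.0f} GiB on disk")
```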

Evaluation Results

Base Model

| Benchmark (Metric) | # Shots | DeepSeek-V3.2-Base | DeepSeek-V4-Flash-Base | DeepSeek-V4-Pro-Base |
|---|---|---|---|---|
| Architecture | - | MoE | MoE | MoE |
| # Activated Params | - | 37B | 13B | 49B |
| # Total Params | - | 671B | 284B | 1.6T |
| **World Knowledge** | | | | |
| AGIEval (EM) | 0-shot | 80.1 | 82.6 | 83.1 |
| MMLU (EM) | 5-shot | 87.8 | 88.7 | 90.1 |
| MMLU-Redux (EM) | 5-shot | 87.5 | 89.4 | 90.8 |
| MMLU-Pro (EM) | 5-shot | 65.5 | 68.3 | 73.5 |
| MMMLU (EM) | 5-shot | 87.9 | 88.8 | 90.3 |
| C-Eval (EM) | 5-shot | 90.4 | 92.1 | 93.1 |
| CMMLU (EM) | 5-shot | 88.9 | 90.4 | 90.8 |
| MultiLoKo (EM) | 5-shot | 38.7 | 42.2 | 51.1 |
| Simple-QA verified (EM) | 25-shot | 28.3 | 30.1 | 55.2 |
| SuperGPQA (EM) | 5-shot | 45.0 | 46.5 | 53.9 |
| FACTS Parametric (EM) | 25-shot | 27.1 | 33.9 | 62.6 |
| TriviaQA (EM) | 5-shot | 83.3 | 82.8 | 85.6 |
| **Language & Reasoning** | | | | |
| BBH (EM) | 3-shot | 87.6 | 86.9 | 87.5 |
| DROP (F1) | 1-shot | 88.2 | 88.6 | 88.7 |
| HellaSwag (EM) | 0-shot | 86.4 | 85.7 | 88.0 |
| WinoGrande (EM) | 0-shot | 78.9 | 79.5 | 81.5 |
| CLUEWSC (EM) | 5-shot | 83.5 | 82.2 | 85.2 |
| **Code & Math** | | | | |
| BigCodeBench (Pass@1) | 3-shot | 63.9 | 56.8 | 59.2 |
| HumanEval (Pass@1) | 0-shot | 62.8 | 69.5 | 76.8 |
| GSM8K (EM) | 8-shot | 91.1 | 90.8 | 92.6 |
| MATH (EM) | 4-shot | 60.5 | 57.4 | 64.5 |
| MGSM (EM) | 8-shot | 81.3 | 85.7 | 84.4 |
| CMath (EM) | 3-shot | 92.6 | 93.6 | 90.9 |
| **Long Context** | | | | |
| LongBench-V2 (EM) | 1-shot | 40.2 | 44.7 | 51.5 |

Instruct Model

DeepSeek-V4-Pro and DeepSeek-V4-Flash both support three reasoning effort modes:

| Reasoning Mode | Characteristics | Typical Use Cases | Response Format |
|---|---|---|---|
| Non-think | Fast, intuitive responses | Routine daily tasks, low-risk decisions | `</think>summary` |
| Think High | Conscious logical analysis, slower but more accurate | Complex problem-solving, planning | `<think>thinking</think>summary` |
| Think Max | Push reasoning to its fullest extent | Exploring the boundary of model reasoning capability | Special system prompt + `<think>thinking</think>summary` |
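A minimal sketch of splitting a completion into reasoning and summary, assuming the literal `<think>`/`</think>` markers shown above (the official parser ships in the encoding folder; this stand-in only illustrates the two formats):

```python
import re

def split_completion(text):
    """Split '<think>reasoning</think>summary' into (reasoning, summary).

    Non-think outputs begin with a bare '</think>' and carry no reasoning,
    so the reasoning part comes back as None in that case.
    """
    m = re.match(r"(?:<think>(.*?))?</think>(.*)", text, flags=re.DOTALL)
    if m is None:
        return None, text  # no markers at all: treat everything as summary
    return m.group(1), m.group(2)

reasoning, summary = split_completion("<think>step 1...</think>The answer is 2.")
```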

DeepSeek-V4-Pro-Max vs Frontier Models

| Benchmark (Metric) | Opus-4.6 Max | GPT-5.4 xHigh | Gemini-3.1-Pro High | K2.6 Thinking | GLM-5.1 Thinking | DS-V4-Pro Max |
|---|---|---|---|---|---|---|
| **Knowledge & Reasoning** | | | | | | |
| MMLU-Pro (EM) | 89.1 | 87.5 | 91.0 | 87.1 | 86.0 | 87.5 |
| SimpleQA-Verified (Pass@1) | 46.2 | 45.3 | 75.6 | 36.9 | 38.1 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 76.4 | 76.8 | 85.9 | 75.9 | 75.0 | 84.4 |
| GPQA Diamond (Pass@1) | 91.3 | 93.0 | 94.3 | 90.5 | 86.2 | 90.1 |
| HLE (Pass@1) | 40.0 | 39.8 | 44.4 | 36.4 | 34.7 | 37.7 |
| LiveCodeBench (Pass@1) | 88.8 | - | 91.7 | 89.6 | - | 93.5 |
| Codeforces (Rating) | - | 3168 | 3052 | - | - | 3206 |
| HMMT 2026 Feb (Pass@1) | 96.2 | 97.7 | 94.7 | 92.7 | 89.4 | 95.2 |
| IMOAnswerBench (Pass@1) | 75.3 | 91.4 | 81.0 | 86.0 | 83.8 | 89.8 |
| Apex (Pass@1) | 34.5 | 54.1 | 60.9 | 24.0 | 11.5 | 38.3 |
| Apex Shortlist (Pass@1) | 85.9 | 78.1 | 89.1 | 75.5 | 72.4 | 90.2 |
| **Long Context** | | | | | | |
| MRCR 1M (MMR) | 92.9 | - | 76.3 | - | - | 83.5 |
| CorpusQA 1M (ACC) | 71.7 | - | 53.8 | - | - | 62.0 |
| **Agentic** | | | | | | |
| Terminal Bench 2.0 (Acc) | 65.4 | 75.1 | 68.5 | 66.7 | 63.5 | 67.9 |
| SWE Verified (Resolved) | 80.8 | - | 80.6 | 80.2 | - | 80.6 |
| SWE Pro (Resolved) | 57.3 | 57.7 | 54.2 | 58.6 | 58.4 | 55.4 |
| SWE Multilingual (Resolved) | 77.5 | - | - | 76.7 | 73.3 | 76.2 |
| BrowseComp (Pass@1) | 83.7 | 82.7 | 85.9 | 83.2 | 79.3 | 83.4 |
| HLE w/ tools (Pass@1) | 53.1 | 52.0 | 51.6 | 54.0 | 50.4 | 48.2 |
| GDPval-AA (Elo) | 1619 | 1674 | 1314 | 1482 | 1535 | 1554 |
| MCPAtlas Public (Pass@1) | 73.8 | 67.2 | 69.2 | 66.6 | 71.8 | 73.6 |
| Toolathlon (Pass@1) | 47.2 | 54.6 | 48.8 | 50.0 | 40.7 | 51.8 |

Comparison across Modes

| Benchmark (Metric) | V4-Flash Non-Think | V4-Flash High | V4-Flash Max | V4-Pro Non-Think | V4-Pro High | V4-Pro Max |
|---|---|---|---|---|---|---|
| **Knowledge & Reasoning** | | | | | | |
| MMLU-Pro (EM) | 83.0 | 86.4 | 86.2 | 82.9 | 87.1 | 87.5 |
| SimpleQA-Verified (Pass@1) | 23.1 | 28.9 | 34.1 | 45.0 | 46.2 | 57.9 |
| Chinese-SimpleQA (Pass@1) | 71.5 | 73.2 | 78.9 | 75.8 | 77.7 | 84.4 |
| GPQA Diamond (Pass@1) | 71.2 | 87.4 | 88.1 | 72.9 | 89.1 | 90.1 |
| HLE (Pass@1) | 8.1 | 29.4 | 34.8 | 7.7 | 34.5 | 37.7 |
| LiveCodeBench (Pass@1) | 55.2 | 88.4 | 91.6 | 56.8 | 89.8 | 93.5 |
| Codeforces (Rating) | - | 2816 | 3052 | - | 2919 | 3206 |
| HMMT 2026 Feb (Pass@1) | 40.8 | 91.9 | 94.8 | 31.7 | 94.0 | 95.2 |
| IMOAnswerBench (Pass@1) | 41.9 | 85.1 | 88.4 | 35.3 | 88.0 | 89.8 |
| Apex (Pass@1) | 1.0 | 19.1 | 33.0 | 0.4 | 27.4 | 38.3 |
| Apex Shortlist (Pass@1) | 9.3 | 72.1 | 85.7 | 9.2 | 85.5 | 90.2 |
| **Long Context** | | | | | | |
| MRCR 1M (MMR) | 37.5 | 76.9 | 78.7 | 44.7 | 83.3 | 83.5 |
| CorpusQA 1M (ACC) | 15.5 | 59.3 | 60.5 | 35.6 | 56.5 | 62.0 |
| **Agentic** | | | | | | |
| Terminal Bench 2.0 (Acc) | 49.1 | 56.6 | 56.9 | 59.1 | 63.3 | 67.9 |
| SWE Verified (Resolved) | 73.7 | 78.6 | 79.0 | 73.6 | 79.4 | 80.6 |
| SWE Pro (Resolved) | 49.1 | 52.3 | 52.6 | 52.1 | 54.4 | 55.4 |
| SWE Multilingual (Resolved) | 69.7 | 70.2 | 73.3 | 69.8 | 74.1 | 76.2 |
| BrowseComp (Pass@1) | - | 53.5 | 73.2 | - | 80.4 | 83.4 |
| HLE w/ tools (Pass@1) | - | 40.3 | 45.1 | - | 44.7 | 48.2 |
| MCPAtlas (Pass@1) | 64.0 | 67.4 | 69.0 | 69.4 | 74.2 | 73.6 |
| GDPval-AA (Elo) | - | - | 1395 | - | - | 1554 |
| Toolathlon (Pass@1) | 40.7 | 43.5 | 47.8 | 46.3 | 49.0 | 51.8 |

Chat Template

This release does not include a Jinja-format chat template. Instead, we provide a dedicated `encoding` folder with Python scripts and test cases demonstrating how to encode messages in OpenAI-compatible format into input strings for the model, and how to parse the model's text output. Please refer to the `encoding` folder for full documentation.

A brief example:

```python
from encoding_dsv4 import encode_messages, parse_message_from_completion_text

messages = [
    {"role": "user", "content": "hello"},
    {"role": "assistant", "content": "Hello! I am DeepSeek.", "reasoning_content": "thinking..."},
    {"role": "user", "content": "1+1=?"}
]

# messages -> string
prompt = encode_messages(messages, thinking_mode="thinking")

# string -> tokens
import transformers
tokenizer = transformers.AutoTokenizer.from_pretrained("deepseek-ai/DeepSeek-V4-Pro")
tokens = tokenizer.encode(prompt)
```

How to Run Locally

Please refer to the `inference` folder for detailed instructions on running DeepSeek-V4 locally, including model weight conversion and interactive chat demos.

For local deployment, we recommend setting the sampling parameters to `temperature = 1.0, top_p = 1.0`. For the Think Max reasoning mode, we recommend setting the context window to at least 384K tokens.
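For reference, `temperature = 1.0, top_p = 1.0` means sampling from the model's unmodified softmax distribution with no nucleus truncation. The sketch below shows what those two knobs do; inference engines implement this natively, so this is illustrative only:

```python
import math
import random

def sample_token(logits, temperature=1.0, top_p=1.0, rng=random):
    """Temperature + nucleus (top-p) sampling over one step's logits."""
    # Temperature rescales logits before softmax; 1.0 leaves them unchanged.
    weights = [math.exp(l / temperature) for l in logits]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Keep the smallest set of tokens whose mass reaches top_p; 1.0 keeps all.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    # Renormalize over the kept set and draw one index.
    mass = sum(probs[i] for i in kept)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]
```

With a very small `top_p` this degenerates to greedy decoding (only the argmax token survives the nucleus), which is a handy sanity check.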

License

This repository and the model weights are licensed under the MIT License.

Citation

```bibtex
@misc{deepseekai2026deepseekv4,
      title={DeepSeek-V4: Towards Highly Efficient Million-Token Context Intelligence},
      author={DeepSeek-AI},
      year={2026},
}
```

Contact

If you have any questions, please raise an issue or contact us at [email protected].
