deepseek

#deepseek

How can Deepseek v4 top the coding leaderboards and still sit 8 months behind the frontier?

Reddit r/LocalLLaMA · 2026-06-11

Analysis of DeepSeek V4's top coding scores versus its reported 8-month gap behind the frontier, highlighting differences between narrow benchmark optimization and broader reasoning tests, plus the practical performance hit when running quantized local versions.

Notes on DeepSeek

Hacker News Top · 2026-06-10

A visit to DeepSeek's headquarters reveals its modest origins, young team, and unique culture. The company, operated out of a hedge fund, focuses on staying small and remains unconcerned about AGI risks, instead prioritizing societal concerns like job loss.

@akshay_pachaar: https://x.com/akshay_pachaar/status/2064700531600458093

X AI KOLs Following · 2026-06-10

This article explains how to use GRPO to fine-tune an LLM (Qwen3-8B) for reliable JSON structured output, improving schema accuracy from 62% to 82%, surpassing GPT-4.1's 58%.

Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune

arXiv cs.AI · 2026-06-10

This paper investigates instruction finetuning of DeepSeek-R1-8B using LoRA and NEFTune for financial named-entity recognition, achieving a micro-F1 of 0.912 and outperforming several baseline models.

DeepSeek enters the fight for token volume, Anthropic continues to dominate spend (12 minute read)

TLDR AI · 2026-06-10

AI Gateway's May 2026 data shows DeepSeek's token share surged to 17% with minimal spend, while Anthropic retained 65% of spend, indicating cost-conscious routing and growing overall usage.

FlashMemory DeepSeek-V4 Retriever (GitHub Repo)

TLDR AI · 2026-06-10

Introduces FlashMemory DeepSeek-V4 Retriever, a lightweight model that sparsifies DeepSeek-V4's CSA KV-cache by predicting which chunks will be attended to next, keeping only ~10-15% on-device while matching full-attention performance.

We are buying something that clones itself

Reddit r/ArtificialInteligence · 2026-06-09

The article argues that the AI startup wave is unsustainable because intelligence is an infinitely replicable commodity with zero marginal cost, and that most AI companies will collapse by 2029, leaving only a few giants that own the physical layer, such as energy and chips.

@bookwormengr: Wonderful coverage on CANN (Huawei's CUDA) and DeepSeek V4 inference on Huawei chips.... "CANN (Compute Architecture fo…

X AI KOLs Timeline · 2026-06-09

Huawei has open-sourced its CANN software toolkit to compete with Nvidia's CUDA, and DeepSeek V4 shows significant inference performance improvements on Huawei Ascend chips.

Here are some tips on hitting nearly 200 tok/s for DeepSeek v4 Flash on Hopper

Reddit r/LocalLLaMA · 2026-06-08

This blog post provides tips and benchmarks for achieving nearly 200 tokens per second inference on DeepSeek V4 Flash using vLLM on a dual GH200 workstation, highlighting the use of a quantized checkpoint from Canada-Quant and tensor parallelism optimizations.

MiniMax is digging its own grave

Reddit r/AI_Agents · 2026-06-08

MiniMax's price increases and model limitations are driving users away to competitors like DeepSeek and premium options like Claude or ChatGPT, reversing its earlier reputation as a cheap, usable daily driver.

DeepSeek V4 Pro beats GPT-5.5 Pro on precision

Hacker News Top · 2026-06-08

DeepSeek V4 Pro reportedly outperforms GPT-5.5 Pro on precision, suggesting a significant advancement in model accuracy.

@GoSailGlobal: Practical data on multi-agent AI collaboration: Use Opus 4.8 for planning, Deepseek/Gemma for execution — 10x cost reduction, 2x speed improvement. The secret is not using the most expensive model, but having cheap models do the heavy lifting and expensive models only make decisions. This is the same as company management: the CEO shouldn't write code, and interns shouldn't set strategy. A…

X AI KOLs Timeline · 2026-06-08

A practical write-up on multi-agent AI collaboration that proposes a hierarchical strategy: Opus 4.8 for planning and DeepSeek/Gemma for execution. It reports a 10x cost reduction and a 2x speed improvement, and comes with an open-source implementation.

@jakevin7: DeepSeek V4's "Think Max" mode essentially just adds "You must think through every step clearly, no shortcuts" at the start of the prompt. So is reasoning ability emergent, or... is it scolded into existence?

X AI KOLs Following · 2026-06-06

DeepSeek V4's "Think Max" mode essentially just adds a prompt prefix requiring step-by-step reasoning, sparking debate on the origin of reasoning ability.

@cyrilXBT: Nemotron 3 Ultra versus DeepSeek V4 versus MiniMax M3 versus Qwen 3.7 Max. Same two prompts. Four frontier models. One …

X AI KOLs Following · 2026-06-06

A comparison of four frontier AI models (Nemotron 3 Ultra, DeepSeek V4, MiniMax M3, Qwen 3.7 Max) on the same two prompts, with full results linked.

@antirez: DeepSeek v4 PRO running via SSD streaming on my 128GB MacBook m5 max. 1.6 trillion parameters.

X AI KOLs Timeline · 2026-06-04

DeepSeek V4 Pro, a 1.6-trillion-parameter model, runs via SSD streaming on a 128GB M5 Max MacBook, demonstrating local inference of a very large model.

@queen_nunaa: Someone set up a repo on GitHub that lets you use Claude Code for free, forever. It works by routing Claude Code requests to 10 free providers like DeepSeek, Kimi, etc. Setup takes about five minutes, and already...

X AI KOLs Timeline · 2026-06-04

A GitHub repository routes Claude Code requests to 10 free providers such as DeepSeek and Kimi, letting users run Claude Code at no cost. Setup reportedly takes about five minutes, and more than 20,000 developers are said to be using it already.

Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3

Reddit r/LocalLLaMA · 2026-06-03

A discussion comparing DeepSeek V4 Pro, MiMo-V2.5-Pro, and MiniMax M3 on value for local or OpenRouter use, focusing on agentic and coding tasks, with mentions of Hermes Agent and Qwen 3.6 variants.

@TheAhmadOsman: S-Tier Chinese Labs: Moonshot and DeepSeek These 2 are levels above everyone else

X AI KOLs Following · 2026-06-03

A brief opinion stating that Moonshot and DeepSeek are the top-tier Chinese AI labs, far ahead of others.

Why Chinese AI Models Are Reshaping the Economics of AI

Reddit r/AI_Agents · 2026-06-03

Chinese AI models like DeepSeek and Qwen deliver competitive performance at 5x–20x lower cost than Western counterparts, reshaping the economics of AI and driving multi-model deployment strategies.

@NeoResearchAI: We're Neo Research (新衡). Asia’s first independent frontier AI safety evaluation & research lab. Today we're publishing …

X AI KOLs Following · 2026-06-02

Neo Research (新衡), Asia's first independent frontier AI safety evaluation lab, announces its first report: a safety evaluation of DeepSeek v4 Pro.
