Microsoft is reportedly considering integrating DeepSeek into its Copilot Cowork product.
A long-term study of 26,000 Chinese middle and high school students found that after students began using AI on their own, homework performance improved by 18%, but closed-book exam scores fell by 20% within six months. Zhongkao and Gaokao scores fell by 24% and 18%, respectively, and 81% of students used AI to complete their homework.
Nathan Lambert and Finbarr Timbers discuss the latest post-training recipes for large language models, including DeepSeek V4, GLM 5.1, Kimi K2.6, and the industry shift to multi-teacher on-policy distillation.
Reasonix (formerly DeepSeek-Reasonix) is an AI coding agent CLI written in Go. It supports skills, memory, hooks, MCP, and other features, and can serve as a replacement for OpenCode.
This paper proposes ASAG, a training-free method that adaptively stops reasoning in large reasoning models based on attention distributions, reducing token usage by ~40% while improving accuracy by 3.2% on benchmarks using DeepSeek-R1-Distill and Qwen3 models.
The tweet compares the post-training methods of Nemotron 3 Ultra and DeepSeek V4, noting both use multiple specialist teachers and on-policy distillation into a single student, but differ in support overlap.
This article discusses how China has rapidly advanced in AI despite being a latecomer, questioning the sources of datasets, computing power, and algorithms that enabled companies like DeepSeek to catch up with US leaders like OpenAI and Google.
A soon-to-be open-sourced AI tool that uses DeepSeek to automatically fetch App Store user reviews and mine them for insights, helping product managers understand user feedback, version-specific issues, and product opportunities.
A guide on running DeepSeek 4 flash on a Mac M3 Max with 96GB RAM using Antirez's ds4 engine and SSD streaming, achieving ~12 tokens/second inference speed.
A technical overview of the state of local AI models in mid-2026, highlighting how open-weight models have narrowed the gap to frontier models through advances in mixture-of-experts and sparse attention, enabling efficient local inference.
A former Meta AI researcher shares a 10-point thread on the UK's sovereign AI debate, arguing that smaller, well-scoped teams can validate new directions without billions, and that nurturing local talent and managing expectations are crucial for the UK's AI ecosystem.
This tweet recommends using the Pi coding agent with DeepSeek and links to a detailed setup guide blog.
A technical investigation captured and compared the network traffic of ChatGPT, Gemini, and DeepSeek to understand how each system technically defines and attaches sources to responses, revealing three fundamentally different mechanisms and distinct citation preferences.
Analysis of DeepSeek V4's top coding scores versus its reported 8-month gap behind the frontier, highlighting differences between narrow benchmark optimization and broader reasoning tests, plus the practical performance hit when running quantized local versions.
A visit to DeepSeek's headquarters reveals its modest origins, young team, and unique culture. The company, which is run out of a hedge fund, focuses on staying small and is unconcerned about AGI risks, prioritizing societal concerns such as job loss instead.
This article explains how to use GRPO to fine-tune an LLM (Qwen3-8B) for reliable JSON structured output, improving schema accuracy from 62% to 82%, surpassing GPT-4.1's 58%.
This paper investigates instruction finetuning of DeepSeek-R1-8B using LoRA and NEFTune for financial named-entity recognition, achieving a micro-F1 of 0.912 and outperforming several baseline models.
AI Gateway's May 2026 data shows DeepSeek's token share surged to 17% with minimal spend, while Anthropic retained 65% of spend, indicating cost-conscious routing and growing overall usage.
Introduces FlashMemory DeepSeek-V4 Retriever, a lightweight model that sparsifies DeepSeek-V4's CSA KV-cache by predicting which chunks will be attended to next, keeping only ~10-15% on-device while matching full-attention performance.
The article argues that the AI startup wave is unsustainable because intelligence is an infinitely replicable commodity with zero marginal cost, and predicts that most AI companies will collapse by 2029, leaving only a few giants that own the physical layer, such as energy and chips.