DeepSeek reportedly requires investors to promise not to poach its talent as part of its $7.4 billion fundraising round, highlighting the intense competition for AI engineers in China.
A test of DeepSeek V4 Flash on an AMD Ryzen AI Max+ 395 with 128GB RAM reaches ~15 TPS running the 284B MoE model (13B active) locally. The machine costs $3,000 versus $25,000+ for a datacenter setup, which suggests that running large models on consumer hardware is feasible.
Codex skills optimized for DeepSeek V4 Pro that save 60-80% of tokens by freezing skill files and minimizing output, with persistent memory across conversations.
DeepSeek announces a new vision capability, likely a vision-language model, expanding its AI offerings.
A detailed configuration guide that teaches users how to connect OpenAI Codex to third-party models like DeepSeek through the open-source proxy tool CC Switch, solving protocol incompatibility issues.
Proposes a structural pruning framework for MoE models that maximizes channel-score coverage via attribution-based approximation, achieving 50% or 25% pruning with 4-bit quantization and reducing memory footprint by 5.27x on Qwen3-30B-A3B.
A DeepSeek researcher open-sourced AutoResearch, an autonomous framework that can plan, execute, and debug RL experiments on the DeepSeek 285B model without human intervention, accompanied by a self-play survey paper.
Discusses DeepSeek's recent financing and the departure of core team members Guo Daya and Wang Binxuan, pointing out that the company's turnover rate is nonetheless extremely low, which reflects a good team culture.
Team members shared their experience of using AI (DeepSeek V4 Flash) to automatically create E2E test cases and complete development and debugging, passing acceptance in one go, demonstrating the potential of AI-assisted development.
Deli AutoResearch SKILL, an autonomous framework that automates GPU experiments and RL pipelines, has been open-sourced along with a companion survey paper on self-play.
The US government has paused blacklisting DeepSeek but has designated over 100 other firms as security risks, impacting tech and AI companies.
This analysis updates the study of DeepSeek's research team, revealing that their talent pool has grown to 356 researchers with increasing citation impact and that over half have only Chinese affiliations, highlighting challenges for U.S. talent retention and independence.
The DeepSeek Harness team is in urgent need of talent; the hiring policy has been changed to separate Harness and non-Harness tracks.
Microsoft is reportedly considering integrating DeepSeek into its Copilot Cowork product.
A long-term study involving 26,000 Chinese middle and high school students found that after students independently used AI, homework performance improved by 18%, but closed-book exam scores dropped by 20% within six months. Zhongkao and Gaokao scores dropped by 24% and 18% respectively, and 81% of students used AI to complete their homework.
Nathan Lambert and Finbarr Timbers discuss the latest post-training recipes for large language models, including DeepSeek V4, GLM 5.1, Kimi K2.6, and the industry shift to multi-teacher on-policy distillation.
Reasonix (formerly DeepSeek-Reasonix) is an AI coding agent CLI written in Go. It supports skills, memory, Hooks, and MCP, among other features, and can replace OpenCode.
This paper proposes ASAG, a training-free method that adaptively stops reasoning in large reasoning models based on attention distributions, reducing token usage by ~40% while improving accuracy by 3.2% on benchmarks using DeepSeek-R1-Distill and Qwen3 models.
The tweet compares the post-training methods of Nemotron 3 Ultra and DeepSeek V4, noting both use multiple specialist teachers and on-policy distillation into a single student, but differ in support overlap.
This article discusses how China has rapidly advanced in AI despite being a latecomer, questioning the sources of datasets, computing power, and algorithms that enabled companies like DeepSeek to catch up with US leaders like OpenAI and Google.