@simplifyinAI: DeepSeek has dropped a fundamental rewrite of the Transformer architecture. And it solves the "identity crisis" that br…
Summary
DeepSeek has published a paper introducing mHC (Manifold-Constrained Hyper-Connections), a redesign of the Transformer's residual pathway. It replaces the standard single residual connection with multiple parallel streams whose mixing is mathematically constrained, with the goal of stabilizing large models.
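As a rough illustration of "mathematically constrained multi-stream pathways", the sketch below keeps several residual streams and mixes them with a constrained matrix at each block. It rests on an assumption that the summary does not state: the constraint is taken to be that the mixing matrix is doubly stochastic (rows and columns sum to 1), approximated here with Sinkhorn normalization. All function and parameter names are hypothetical, and this is not DeepSeek's implementation.

```python
import numpy as np

def sinkhorn(logits, n_iters=50):
    """Push a square matrix of logits toward a doubly stochastic matrix.

    ASSUMPTION: doubly stochastic mixing stands in for the "manifold
    constraint"; the summary above does not spell out the constraint.
    """
    m = np.exp(logits - logits.max())  # strictly positive entries
    for _ in range(n_iters):
        m = m / m.sum(axis=1, keepdims=True)  # normalize rows
        m = m / m.sum(axis=0, keepdims=True)  # normalize columns
    return m

def multi_stream_block(streams, mix_logits, read_w, write_w, layer_fn):
    """One block over `streams` of shape (n_streams, d).

    Hypothetical layout: read a weighted sum of streams, apply the layer,
    then write the result back on top of the constrained stream mix.
    """
    mix = sinkhorn(mix_logits)
    layer_in = read_w @ streams       # (d,) combined input to the layer
    layer_out = layer_fn(layer_in)    # (d,) stand-in for attention or MLP
    return mix @ streams + np.outer(write_w, layer_out)

rng = np.random.default_rng(0)
n_streams, d = 4, 8
streams = rng.normal(size=(n_streams, d))
mix_logits = rng.normal(size=(n_streams, n_streams))
read_w = np.full(n_streams, 1.0 / n_streams)  # average the streams
write_w = np.ones(n_streams)                  # write to every stream

mix = sinkhorn(mix_logits)
out = multi_stream_block(streams, mix_logits, read_w, write_w, np.tanh)
```

With a standard residual connection, `n_streams` would be 1 and the mix would be the identity. In this sketch, the columns of `mix` sum to 1, so mixing alone leaves the sum of the streams unchanged instead of amplifying or shrinking it, which is one plausible reading of how such a constraint could help stability.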
Similar Articles
@mark_k: Fascinating and very deep article about DeepSeek AI (@deepseek_ai). You would have never guessed what their strategy is…
An analysis of DeepSeek AI's unconventional strategy: prioritizing radical architectural innovations (MoE, MLA, Engram, mHC) that sharply reduce compute and memory needs, as part of a long-term play to build a 10T Chinese AI hardware ecosystem and pursue a 1T valuation.
DeepSeek Introduces Vision
DeepSeek announces a new vision capability, likely a vision-language model, expanding its AI offerings.
@ZhihuFrontier: Half a year ago, a Zhihu contributor predicted that the next Transformer would absorb loops, recurrent state, sparse ro…
A Zhihu contributor's half-year-old prediction that the next Transformer would absorb loops, recurrent state, sparse routing, and latent reasoning is gaining relevance as Loop Engineering advances. The article explores how future Transformer architectures may evolve into hybrid models blending linear-complexity layers for background context with attention for precise reasoning, plus finer-grained sparsity and native System 2 reasoning.
DeepSeek-V4: a million-token context that agents can actually use
DeepSeek releases V4, a MoE model with a 1M-token context window optimized for agentic tasks through hybrid attention and reduced KV cache requirements.
Parallel Manifold Steering: Efficient Adaptation of Large Associative Memories via Residual Energy Shaping
This paper proposes H-Res, a method to adapt large transformer models by shaping the energy landscape of associative memories without modifying weights or adding prompts, preserving memory capacity and outperforming LoRA.