@simplifyinAI: DeepSeek has dropped a fundamental rewrite of the Transformer architecture. And it solves the "identity crisis" that br…
Summary
DeepSeek has published a paper introducing mHC (Manifold-Constrained Hyper-Connections), a fundamental rewrite of the Transformer architecture that stabilizes large models by replacing standard residual connections with mathematically constrained multi-stream pathways.
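To make the idea concrete, here is a minimal, illustrative sketch of a multi-stream ("hyper-connection") residual block with a constrained mixing matrix. It is not DeepSeek's implementation: the module name, the number of streams, the row-wise softmax used as a stand-in for the paper's manifold constraint, the sigmoid write gates, and the mean-merge at the output are all assumptions made for clarity.

```python
# Illustrative sketch only: a multi-stream ("hyper-connection") residual block.
# The manifold constraint is approximated by a row-wise softmax, so each stream
# update is a convex combination of streams; the paper's actual constraint may
# differ. Names (MultiStreamBlock, n_streams, gates) are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiStreamBlock(nn.Module):
    def __init__(self, d_model: int, n_streams: int = 4):
        super().__init__()
        self.n_streams = n_streams
        # Unconstrained parameters, projected onto a constrained form at use time.
        self.read_logits = nn.Parameter(torch.zeros(n_streams))       # how streams feed the sublayer
        self.mix_logits = nn.Parameter(torch.eye(n_streams) * 4.0)    # stream-to-stream routing, near identity at init
        self.write_logits = nn.Parameter(torch.zeros(n_streams))      # how the sublayer output is written back
        self.layer = nn.Sequential(                                    # stand-in for an attention/FFN sublayer
            nn.LayerNorm(d_model), nn.Linear(d_model, d_model), nn.GELU(), nn.Linear(d_model, d_model)
        )

    def forward(self, streams: torch.Tensor) -> torch.Tensor:
        # streams: (n_streams, batch, seq, d_model)
        read = F.softmax(self.read_logits, dim=0)                      # convex read weights
        mix = F.softmax(self.mix_logits, dim=-1)                       # each row sums to 1 (simplex constraint)
        write = torch.sigmoid(self.write_logits)                       # per-stream write gate

        layer_in = torch.einsum("s,sbtd->btd", read, streams)          # aggregate streams for the sublayer
        layer_out = self.layer(layer_in)

        mixed = torch.einsum("ij,jbtd->ibtd", mix, streams)            # constrained cross-stream residual path
        return mixed + write.view(-1, 1, 1, 1) * layer_out.unsqueeze(0)

# Toy usage: expand one hidden state into identical streams, run a block, merge back.
x = torch.randn(2, 16, 64)                     # (batch, seq, d_model)
streams = x.unsqueeze(0).repeat(4, 1, 1, 1)    # (n_streams, batch, seq, d_model)
block = MultiStreamBlock(d_model=64, n_streams=4)
out = block(streams).mean(dim=0)               # back to (batch, seq, d_model)
```

Initializing the mixing matrix near the identity keeps the block close to a standard residual connection at the start of training; constraining each row to the simplex bounds how much the streams can amplify or erase one another, which is the kind of stability property the summary above attributes to mHC.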
Similar Articles
DeepSeek-V4: a million-token context that agents can actually use
DeepSeek releases V4, a MoE model with a 1M-token context window optimized for agentic tasks through hybrid attention and reduced KV cache requirements.
deepseek-ai/DeepSeek-V4-Pro
DeepSeek releases V4-Pro and V4-Flash, Mixture-of-Experts models supporting a million-token context with hybrid attention and the Muon optimizer.
@HowToAI_: Google has quietly dropped what researchers are calling "Attention Is All You Need V2." And it signals the end of the T…
Google researchers introduce Nested Learning, a new architecture that replaces the Transformer by treating models as nested optimization problems, solving catastrophic forgetting and achieving 100% long-context memory stability.
DeepSeek V4 paper full version is out, FP4 QAT details and stability tricks [D]
DeepSeek released the full V4 paper detailing FP4 quantization-aware training, MoE training stability tricks (anticipatory routing and SwiGLU clamping), and a generative reward model for RLHF. The efficiency gains are dramatic: V4-Flash uses only 10% of V3.2's FLOPs and 7% of its KV cache at 1M context length.
A Robust Foundation Model for Conservation Laws: Injecting Context into Flux Neural Operators via Recurrent Vision Transformers
This paper proposes a foundation model for conservation laws that augments Flux Neural Operators with recurrent Vision Transformers. It demonstrates robust generalization and long-time prediction across diverse conservative systems without explicit access to the governing equations.