residual-connections

#residual-connections

WAV: Multi-Resolution Block Residual Routing for Deep Decoder-Only Transformers

arXiv cs.LG · yesterday

This paper introduces Multi-Resolution Residual Routing (WAV v1), an extension of Block Attention Residuals that augments block representations with directional detail bases, improving deep decoder-only Transformer training.


Rethinking Cross-Layer Information Routing in Diffusion Transformers

Hugging Face Daily Papers · 2026-05-20

This paper proposes Diffusion-Adaptive Routing (DAR), a learnable, timestep-adaptive residual replacement that improves cross-layer information flow in Diffusion Transformers, leading to significant training acceleration and quality improvements.


@simplifyinAI: DeepSeek has dropped a fundamental rewrite of the Transformer architecture. And it solves the "identity crisis" that br…

X AI KOLs Timeline · 2026-05-09

DeepSeek has published a paper introducing mHC (Manifold-Constrained Hyper-Connections), a fundamental rewrite of the Transformer architecture that stabilizes large models by replacing standard residual connections with mathematically constrained multi-stream pathways.
