attention-mask

Block-Based Double Decoders

arXiv cs.LG · 2026-05-20

Proposes block-based double decoders, a novel transformer architecture using doubly-causal block-based attention masks to combine decoder-only training efficiency with encoder-decoder inference efficiency, achieving strong scaling performance and reduced KV-cache memory.
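
The summary does not spell out how the mask is built, so the following is only a generic sketch of a block-based causal attention mask, not the paper's doubly-causal construction. It assumes a simple rule chosen for illustration: a token may attend to every token in its own block and in all earlier blocks. The names `block_causal_mask`, `render`, `seq_len`, and `block_size` are hypothetical.

```python
def block_causal_mask(seq_len: int, block_size: int) -> list:
    """Return mask[i][j] == True when query position i may attend to key position j.

    Assumed rule (illustrative only, not taken from the paper): a token sees
    every token in its own block and in all earlier blocks, and nothing in
    later blocks.
    """
    if seq_len < 0 or block_size <= 0:
        raise ValueError("seq_len must be >= 0 and block_size must be > 0")
    # Block index of each position, e.g. block_size=2 -> [0, 0, 1, 1, 2, 2].
    block_of = [i // block_size for i in range(seq_len)]
    return [
        [block_of[j] <= block_of[i] for j in range(seq_len)]
        for i in range(seq_len)
    ]


def render(mask: list) -> str:
    """Draw the mask as text: 'x' = attention allowed, '.' = masked out."""
    return "\n".join(
        "".join("x" if allowed else "." for allowed in row) for row in mask
    )


if __name__ == "__main__":
    print(render(block_causal_mask(seq_len=6, block_size=2)))
```

With `seq_len=6` and `block_size=2`, rows 0 and 1 print `xx....`, rows 2 and 3 print `xxxx..`, and rows 4 and 5 print `xxxxxx`. Setting `block_size=1` reduces this to the ordinary token-level causal mask used in decoder-only models.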
