Sebastian Raschka reviews recent innovations in LLM architectures focused on long-context efficiency, including KV sharing, compressed convolutional attention, and layer-wise attention budgeting, in models such as Gemma 4, ZAYA1, Laguna XS.2, and DeepSeek V4.