Tag
This paper proposes SCALE, a deep reinforcement learning scheduler for agentic LLM workflow DAGs that generalizes to unseen cluster sizes using cross-attention and structured representation regularization, reducing response time without retraining.
This paper proposes a query-based cross-modal projector that compresses visual tokens via cross-attention to improve Mamba-based multimodal LLMs, boosting both performance and throughput on vision-language benchmarks while eliminating the need for manual 2D scan order design.
This paper introduces ERP-XTTN, a cross-attention architecture for interpretable, calibration-free ERP classification across subjects. Evaluated on multiple datasets, it performs competitively with black-box models while providing transparent routing insights.
ReactiveGWM is a reactive game world model that enables dynamic player-NPC interactions by decoupling player controls from NPC behaviors using diffusion models and cross-attention modules, achieving zero-shot strategy transfer across different games.
Open_MOSS released MOSS-VL, an 11B vision-language model under the Apache 2.0 license that uses cross-attention and XRoPE and outperforms Qwen3-VL-8B by 8.3 points on VSI-bench.
Motif-Video 2B is a 2B-parameter text-to-video generation model that scores 83.76% on VBench, surpassing Wan2.1 14B with 7x fewer parameters. It was trained on fewer than 10M clips using fewer than 100,000 H200 GPU hours. The model uses a specialized architecture with shared cross-attention and a three-part backbone that separates prompt alignment, temporal consistency, and detail refinement.
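
All of the entries above build on cross-attention, and the query-based projector entry describes using it to compress visual tokens. The following is a minimal sketch of that general idea, not any of these papers' implementations: a small set of query vectors attends over a longer sequence of visual tokens, so the output has one vector per query regardless of how many tokens came in. The names `cross_attention`, `learned_queries`, and `visual_tokens` are hypothetical, the queries are random stand-ins for learned parameters, and the usual learned projection matrices and multiple heads are omitted for brevity.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """Single-head scaled dot-product cross-attention.

    queries: list of Nq vectors of dimension d
    keys:    list of Nk vectors of dimension d
    values:  list of Nk vectors of dimension dv
    Returns Nq vectors of dimension dv, one per query.
    """
    d = len(queries[0])
    dv = len(values[0])
    scale = 1.0 / math.sqrt(d)
    outputs = []
    for q in queries:
        # Similarity of this query to every key.
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Weighted average of the values.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dv)]
        )
    return outputs


# Illustration only: 64 "visual tokens" compressed to 4 output tokens.
rng = random.Random(0)
dim = 8
visual_tokens = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(64)]
learned_queries = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(4)]  # assumption: random stand-ins
compressed = cross_attention(learned_queries, visual_tokens, visual_tokens)
print(len(visual_tokens), "->", len(compressed))
```

Because the output length is set by the number of queries rather than the number of input tokens, the downstream model sees a fixed, shorter sequence, which is one plausible source of the throughput gains such projectors report.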