cross-attention

#cross-attention

SCALE: Scalable Cross-Attention Learning with Extrapolation for Agentic Workflow Scheduling

arXiv cs.LG · 16h ago

This paper proposes SCALE, a deep reinforcement learning scheduler for agentic LLM workflow DAGs that generalizes to unseen cluster sizes using cross-attention and structured representation regularization, reducing response time without retraining.

Query-based Cross-Modal Projector Bolstering Mamba Multimodal LLM

arXiv cs.CL · 4d ago

This paper proposes a query-based cross-modal projector that compresses visual tokens via cross-attention to improve Mamba-based multimodal LLMs, boosting both performance and throughput on vision-language benchmarks while eliminating the need for manual 2D scan order design.

ERP-XTTN: Interpretable Prototype-Guided Cross-Attention for Cross-Subject ERP Classification

arXiv cs.LG · 5d ago

This paper introduces ERP-XTTN, a prototype-guided cross-attention architecture for interpretable cross-subject ERP classification that requires no per-subject calibration. Evaluated on multiple datasets, it performs competitively with black-box models while providing transparent insight into its routing.

ReactiveGWM: Steering NPC in Reactive Game World Models

Hugging Face Daily Papers · 2026-05-14

ReactiveGWM is a reactive game world model that enables dynamic player-NPC interactions by decoupling player controls from NPC behaviors, using diffusion models and cross-attention modules. It achieves zero-shot strategy transfer across different games.

@AdinaYakup: MOSS-VL Vision model from @Open_MOSS Model: https://huggingface.co/collections/OpenMOSS-Team/moss-vl… Demo: https://hug…

X AI KOLs Following · 2026-04-20

Open_MOSS released MOSS-VL, an 11B-parameter vision-language model under the Apache 2.0 license. It uses cross-attention and XRoPE and outperforms Qwen3-VL-8B by 8.3 points on VSI-bench.

Motif-Video 2B: Technical Report

Hugging Face Daily Papers · 2026-04-14

Motif-Video 2B is a 2B-parameter text-to-video generation model that scores 83.76% on VBench, surpassing Wan2.1 14B with 7x fewer parameters. It was trained on fewer than 10M clips using fewer than 100,000 H200 GPU hours. The architecture combines shared cross-attention with a three-part backbone that separates prompt alignment, temporal consistency, and detail refinement.
