tpu-optimization

标签

Cards List
#tpu-optimization

Block-Wise Differentiable Sinkhorn Attention: Tail-Refinement Gradients with a Gap-Aware Dustbin Bridge

arXiv cs.LG · 3天前 缓存

This paper presents Block-Wise Differentiable Sinkhorn Attention, a method for efficient long-context balanced entropic optimal transport attention on TPU hardware. It introduces a tail-refinement surrogate for exact differentiation, proving an efficient backward pass schedule and demonstrating significant improvements in Pfam sequence alignment reconstruction.

0 人收藏 0 人点赞
← 返回首页

提交意见反馈