tpu-optimization

Block-Wise Differentiable Sinkhorn Attention: Tail-Refinement Gradients with a Gap-Aware Dustbin Bridge

arXiv cs.LG · yesterday

This paper presents Block-Wise Differentiable Sinkhorn Attention, a method for computing balanced entropic optimal-transport attention efficiently over long contexts on TPU hardware. It introduces a tail-refinement surrogate for exact differentiation, establishes an efficient schedule for the backward pass, and reports significant improvements in Pfam sequence alignment reconstruction.
