Tag
Tony Wu released late-interaction-kernels (LIK): fused Triton kernels for MaxSim, the scoring step behind ColBERT and ColPali. The kernels are integrated into PyLate and colpali-engine and improve both memory efficiency and performance.
Wall Attention is a new attention variant with per-channel, per-timestep multiplicative decay, providing content-dependent forgetting rates and efficient training/decode kernels implemented in Triton.
A technical explanation comparing PyTorch's default autograd with UnslothAI's custom backpropagation kernels, which are written in OpenAI's Triton language for faster LLM fine-tuning.
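
For readers unfamiliar with MaxSim, the scoring step named in the LIK entry above, here is a minimal reference sketch in plain Python. It is not the LIK implementation or its API. The function name `maxsim` and the toy vectors are hypothetical, chosen only to illustrate the operation: for each query token embedding, take the maximum dot product over all document token embeddings, then sum those maxima.

```python
def maxsim(query_vecs, doc_vecs):
    """Naive MaxSim (late-interaction) score.

    query_vecs, doc_vecs: lists of equal-length token embedding vectors.
    Hypothetical reference helper for illustration, not a library API.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # For every query token, keep only its best-matching document token,
    # then add up those best matches.
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


# Toy example: 2 query tokens, 3 document tokens, embedding dimension 2.
query = [[1.0, 0.0], [0.0, 1.0]]
doc = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]
score = maxsim(query, doc)
print(score)
```

A batched tensor version of this typically builds the full query-by-document similarity matrix before reducing it. Avoiding that intermediate is the usual motivation for fusing the computation into a single kernel, though the entry above does not describe how LIK does it.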