Tag
This paper introduces an incremental algorithm for Byte Pair Encoding (BPE) tokenization that processes each byte in O(log^2 t) time, enabling efficient partial tokenization in streaming settings and achieving speedups over existing implementations.
Dolphin-CN-Dialect is a streaming-capable ASR model that improves dialect recognition through temperature-based sampling and redesigned tokenization, achieving competitive performance with a smaller model size.
LingBot-Map is a feed-forward 3D foundation model for streaming 3D reconstruction that uses a Geometric Context Transformer architecture, achieving state-of-the-art performance with efficient ~20 FPS inference on long sequences exceeding 10,000 frames.