Tag
Bernini proposes a unified video generation and editing framework that combines multimodal large language models for semantic planning with diffusion models for pixel rendering, achieving state-of-the-art performance through semantic interface separation and enhanced positional embeddings.
EndPrompt proposes a method for extending the context window of large language models using only short training sequences, by anchoring a terminal prompt with target-length positional indices. It achieves strong benchmark results with substantially less computation than full-length fine-tuning.