This paper introduces Rotary GPU, an exploratory execution approach for running large Mixture-of-Experts models on consumer hardware with limited VRAM; it reports 21 tokens/s on an 8 GB RTX 4060. The focus is on deployment accessibility rather than architectural improvements.
ByteDance releases DeerFlow 2.0, an open-source AI agent framework for local execution of tasks like coding, research, and content generation without cloud dependencies or subscriptions.