limited-vram

Rotary GPU: Exploring Local Execution for Large MoE Models Under Limited VRAM

Hacker News Top · 5d ago

This paper introduces Rotary GPU, an exploratory execution approach for running large Mixture-of-Experts (MoE) models locally on consumer hardware with limited VRAM. It reports 21 tokens/s on an RTX 4060 with 8 GB of VRAM. The work focuses on deployment accessibility rather than architectural improvements.
