A researcher debuted Shard, which achieves 30 tok/s inference on a 744B-parameter model distributed across 6 consumer GPUs over the open internet, a 15-20x improvement over previous methods.