distributed-inference

#distributed-inference

@antirez: Based on what I'm saying with GLM 5.2 implementation inside DwarfStar, there is 90% of probability I'll merge the branc…

X AI KOLs Following ↗ · 2d ago

Antirez puts the probability of merging the branch that implements GLM 5.2 in DwarfStar at 90%. The model could become the best option for a 512GB Mac Studio and might also run distributed across 128GB MacBooks with 2-bit quantization.

#distributed-inference

Someone just ran a 744B parameter model at 30 tok/s across 6 consumer GPUs in 6 different US states over the open internet

Reddit r/ArtificialInteligence ↗ · 6d ago

A researcher debuted Shard, achieving 30 tok/s inference on a 744B parameter model distributed across 6 consumer GPUs over the open internet, a 15-20x improvement over previous methods.

#distributed-inference

@MiaAI_lab: A PR to vLLM to allow TP=3 for MiniMax M3 His NVFP4 quant is 260GB - lukealonso/MiniMax-M3-NVFP4 Hopefully this will wo…

X AI KOLs Timeline ↗ · 2026-06-14

A pull request to vLLM adds support for tensor parallelism degree 3 (TP=3) for MiniMax M3. With the 260GB NVFP4 quantization, this could let the model run on three DGX Sparks at roughly 87GB per device.

#distributed-inference

@m_sirovatka: KV Cache re-use is the most important thing for agentic rollouts. We've integrated Mooncake Store into prime-rl with vL…

X AI KOLs Following ↗ · 2026-06-02

Mooncake Store has been integrated into prime-rl, alongside vLLM, for distributed KV cache reuse. This enables cross-node prefix caching, which serves agentic rollouts with high token reuse more efficiently.

#distributed-inference

@dorsa_rohani: This paper might be the bible of distributed inference atp

X AI KOLs Timeline ↗ · 2026-05-23

The tweet recommends a paper, suggesting it may be the bible of distributed inference at this point.

#distributed-inference

Clustering Raspberry Pis together to learn distributed training/inference

Reddit r/LocalLLaMA ↗ · 2026-05-14

A blog post guides readers through setting up a Raspberry Pi cluster for distributed training and inference, part of a series aimed at making distributed AI accessible using affordable hardware.

#distributed-inference

@antirez: Announcing with gratitude that @audreyt just gifted me an M5 Max 128GB MacBook Pro! It will let me develop DwarfStar4 (…

X AI KOLs Timeline ↗ · 2026-05-12

Antirez announces that audreyt gifted him an M5 Max 128GB MacBook Pro, which he will use to develop DwarfStar4 and to experiment with distributed inference across M3 Max and M5 Max hardware.

#distributed-inference

Federation of Experts: Communication Efficient Distributed Inference for Large Language Models

Hugging Face Daily Papers ↗ · 2026-05-07

Federation of Experts (FoE) restructures mixture-of-experts blocks into clusters that process KV heads independently, eliminating inter-node communication bottlenecks and improving inference throughput and latency by up to 5.2x while maintaining generation quality.

#distributed-inference

2x 512GB RAM M3 Ultra Mac Studios

Reddit r/LocalLLaMA ↗ · 2026-04-21

A user shares a $25k setup of two 512GB RAM M3 Ultra Mac Studios for running large language models locally. They have tested DeepSeek V3 Q8 and GLM 5.1 Q4 through the exo distributed inference backend and are waiting for Kimi 2.6 MLX optimization.
