moe-models

#moe-models

@TensordyneInc: https://x.com/TensordyneInc/status/2066567307984531834

X AI KOLs Following ↗ · 2d ago Cached

Tensordyne introduces Napier, an inference system that implements logarithmic math in silicon. The company claims massive efficiency gains for MoE and reasoning models and says the racks are air-cooled.

#moe-models

Why there is a lack of new 100B-120B models?

Reddit r/LocalLLaMA ↗ · 2d ago

Analysis of the trend in AI model sizes, noting a gap in the 100-120B parameter range with recent releases focusing on smaller (25-35B) or larger (200B+) models.

#moe-models

Speed difference between Windows 11 and Linux with llama.cpp: a myth when using medium and large MoE models

Reddit r/LocalLLaMA ↗ · 2026-05-31

User benchmarks show no significant speed difference between Windows 11 and Linux when running large MoE models with llama.cpp, debunking a common myth. Tests on a multi-GPU setup with models like Qwen 3.5 122B, 397B, and MiniMax 2.7 yield nearly identical prompt processing and token generation speeds.

#moe-models

@0xSero: Locally Part 1 - Apple Silicon Macs give you large pools of memory to run big models, but the token generation speed wi…

X AI KOLs Following ↗ · 2026-04-22 Cached

Apple Silicon Macs offer large memory pools for running big models, but token generation is slower; they perform best with large MoE models that have few active parameters.
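
Why few active parameters help can be sketched with rough arithmetic. Assuming token generation is limited by memory bandwidth, each generated token requires reading roughly the active weights once, so an upper bound on speed is bandwidth divided by active weight bytes. The function name and all numbers below are illustrative assumptions, not measurements from the post or figures for any specific Mac or model.

```python
def decode_tokens_per_second_upper_bound(
    active_params_billion: float,
    bytes_per_param: float,
    bandwidth_gb_per_s: float,
) -> float:
    """Rough ceiling on decode speed for a memory-bandwidth-bound model.

    Assumes every active weight is read once per generated token and
    ignores KV-cache reads, compute limits, and other overheads.
    """
    bytes_per_token = active_params_billion * 1e9 * bytes_per_param
    return bandwidth_gb_per_s * 1e9 / bytes_per_token


# Hypothetical machine with 800 GB/s of memory bandwidth (assumed figure).
BANDWIDTH_GB_PER_S = 800.0
# Hypothetical 4-bit quantization, about 0.5 bytes per parameter.
BYTES_PER_PARAM = 0.5

# Dense model: all 120B parameters are active for every token.
dense = decode_tokens_per_second_upper_bound(120, BYTES_PER_PARAM, BANDWIDTH_GB_PER_S)
# MoE model of similar total size with only 10B active parameters per token.
moe = decode_tokens_per_second_upper_bound(10, BYTES_PER_PARAM, BANDWIDTH_GB_PER_S)

print(f"dense ceiling: {dense:.1f} tok/s, MoE ceiling: {moe:.1f} tok/s")
```

Both models need about the same memory to hold their weights, but under these assumptions the MoE ceiling is twelve times higher because far fewer bytes are read per token. Real throughput will be lower than either ceiling.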
