Intel's discontinued Optane persistent memory is finding a second life in AI workloads: one user runs a 1-trillion-parameter model locally at about 4 tokens/second on cheap second-hand Optane modules. The article highlights that Optane has lower latency than SSDs, which makes it suitable for large-model inference even though it is slower than DRAM.
A community member details a custom PC build that uses discontinued Intel Optane Persistent Memory to run the 1-trillion-parameter Kimi K2.5 model locally at roughly 4 tokens per second via llama.cpp.