hardware-architecture

#hardware-architecture

The memory wall gets expensive: KV cache is why you should stop worshiping softmax attention

Reddit r/singularity ↗ · 3d ago

The post argues that rising DDR5 memory prices signal a broader memory bottleneck in AI, driven in particular by the KV cache that softmax attention requires in LLMs. It highlights post-transformer architectures, such as linear attention and state space models, that aim to reduce memory usage.

#hardware-architecture

Die analysis of the 8087 math coprocessor's fast bit shifter (2020)

Hacker News Top ↗ · 5d ago

Die analysis of the Intel 8087 math coprocessor's fast bit shifter, exploring its architecture and role in floating-point operations.

#hardware-architecture

@LinQingV: When exploring LLM inference chip architectures previously, I reviewed the architectures of the four major AI inference ASIC companies: Groq, SambaNova, Tenstorrent, and Cerebras. While the first three have different emphases, their underlying logic falls within the same framework: large on-chip SRAM + dataflow architecture + deterministic scheduling...

X AI KOLs Timeline ↗ · 2026-05-09

The article analyzes the AI inference ASIC architectures of Groq, SambaNova, Tenstorrent, and Cerebras, highlighting Cerebras's unique wafer-scale engine design. It discusses the benefits of deterministic latency and high bandwidth for LLM inference, while noting challenges like yield, cost, and KV cache bottlenecks.
