@vivekgalatage: Memory organization with Algorithmica is one resource that keeps shining. https://en.algorithmica.org/hpc/cpu-cache/
Summary
A recommendation for the Algorithmica resource on CPU cache memory organization, which provides detailed experimental analysis and optimization techniques for in-memory algorithms.
Cached at: 06/05/26, 11:20 PM
RAM & CPU Caches - Algorithmica
Source: https://en.algorithmica.org/hpc/cpu-cache/
In the previous chapter, we studied computer memory from a theoretical standpoint, using the external memory model to estimate the performance of memory-bound algorithms.
While the external memory model is more or less accurate for computations involving HDDs and network storage, where the cost of arithmetic operations on in-memory values is negligible compared to external I/O operations, it is too imprecise for lower levels in the cache hierarchy, where the costs of these operations become comparable.
To perform more fine-grained optimization of in-memory algorithms, we have to start taking into account the many specific details of the CPU cache system. And instead of studying loads of boring Intel documents with dry specs and theoretically achievable limits, we will estimate these parameters experimentally by running numerous small benchmark programs with access patterns that resemble the ones that often occur in practical code.
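As a rough illustration of what such a small benchmark might look like, here is a minimal sketch that times a sequential sum over an array and reports the average cost per element. It is not taken from the chapter's code repository; the function name `ns_per_element` and its parameters are hypothetical, and an optimizing compiler may still vectorize or restructure the loop.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// Hypothetical helper: sums an array of n 32-bit integers `repeats` times
// and returns the average wall-clock time per element in nanoseconds.
// The running total is written to `checksum` so that the result is used.
double ns_per_element(std::size_t n, int repeats, std::uint64_t &checksum) {
    assert(n > 0 && repeats > 0);
    std::vector<std::uint32_t> a(n);
    std::iota(a.begin(), a.end(), 0u);  // fill with 0, 1, 2, ...

    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < repeats; r++)
        for (std::size_t i = 0; i < n; i++)
            checksum += a[i];
    auto stop = std::chrono::steady_clock::now();

    double total_ns =
        std::chrono::duration<double, std::nano>(stop - start).count();
    return total_ns / (static_cast<double>(n) * repeats);
}
```

Calling this for array sizes that grow from a few kilobytes to hundreds of megabytes, and plotting the result against the size in bytes (`4 * n`), is one way to look for changes in cost as the working set outgrows each cache level. The actual numbers depend on the machine, the compiler, and its flags.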
Experimental Setup
As before, I will be running all experiments on Ryzen 7 4700U, which is a “Zen 2” CPU with the following main cache-related specs:
- 8 physical cores (without hyper-threading) clocked at 2GHz (and 4.1GHz in boost mode, which we disable);
- 256K of 8-way set associative L1 data cache or 32K per core;
- 4M of 8-way set associative L2 cache or 512K per core;
- 8M of 16-way set associative L3 cache, shared between 8 cores;
- 16GB (2x8G) of DDR4 RAM @ 2667MHz.
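The specs above determine how many sets each cache has: the capacity divided by the product of associativity and cache line size. The sketch below works this out under the assumption of 64-byte cache lines, which the list does not state, and treats the L3 as a single 8M cache, as the list describes it. The names `cache_sets`, `set_index`, and `kLine` are hypothetical, and real hardware may pick the set with a more complicated function of the address than the simple modulo used here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Number of sets in a set-associative cache:
//   sets = capacity / (ways * line size)
constexpr std::size_t cache_sets(std::size_t capacity_bytes,
                                 std::size_t ways,
                                 std::size_t line_bytes) {
    return capacity_bytes / (ways * line_bytes);
}

// Assumption: 64-byte cache lines (not given in the spec list above).
constexpr std::size_t kLine = 64;

static_assert(cache_sets(32 * 1024, 8, kLine) == 64, "L1d: 32K per core, 8-way");
static_assert(cache_sets(512 * 1024, 8, kLine) == 1024, "L2: 512K per core, 8-way");
static_assert(cache_sets(8 * 1024 * 1024, 16, kLine) == 8192, "L3: 8M, 16-way");

// Per-core sizes times 8 cores give the totals quoted in the list.
static_assert(8 * 32 * 1024 == 256 * 1024, "L1d total: 256K");
static_assert(8 * 512 * 1024 == 4 * 1024 * 1024, "L2 total: 4M");

// Simplified model: the set is chosen by the address bits just above the
// offset within the line. Addresses that are sets * line_bytes apart
// then compete for the same set.
std::size_t set_index(std::uint64_t address, std::size_t sets,
                      std::size_t line_bytes) {
    assert(sets > 0 && line_bytes > 0);
    return static_cast<std::size_t>((address / line_bytes) % sets);
}
```

Under this model, two addresses 4096 bytes apart land in the same L1 set (64 sets of 64-byte lines), so more than 8 such lines in use at once cannot all stay in an 8-way L1.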
You can compare it with your own hardware by running dmidecode -t cache or lshw -class memory on Linux or by installing CPU-Z on Windows. You can also find additional details about the CPU on WikiChip and 7-CPU. Not all conclusions will generalize to every CPU platform in existence.
Due to difficulties in preventing the compiler from optimizing away unused values, the code snippets in this article are slightly simplified for exposition purposes. Check the code repository if you want to reproduce them yourself.
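To make the problem concrete: if a benchmark computes a value that nothing ever uses, the compiler is free to delete the whole computation, and the timing then measures nothing. One common workaround, shown here as a sketch and not necessarily the one used in the code repository, is to store the result into a `volatile` variable. The names `sink` and `sum_and_keep` are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Accesses to a volatile object count as observable side effects, so the
// compiler has to perform the store and therefore compute the value stored.
volatile std::uint64_t sink;

std::uint64_t sum_and_keep(const std::vector<std::uint32_t> &a) {
    std::uint64_t s = 0;
    for (std::uint32_t x : a)
        s += x;
    // If neither this store nor the return value were used, the loop
    // above would be a candidate for removal.
    sink = s;
    return s;
}
```

On GCC and Clang, an empty inline assembly statement that takes the value as an input operand is another widely used way to achieve the same effect; it is compiler-specific, so the portable `volatile` variant is shown here.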
Acknowledgements
This chapter is inspired by “Gallery of Processor Cache Effects” by Igor Ostrovsky and “What Every Programmer Should Know About Memory” by Ulrich Drepper, both of which can serve as good accompanying readings.
Similar Articles
@che_shr_cat: 1/ We have spent years optimizing KV cache via head-sharing (GQA/MQA), but we ignored a fundamental assumption: why do …
This thread challenges the fundamental assumption that Transformers require separate Q, K, and V projections, proposing that merging them can yield massive memory savings for KV cache.
KV Cache Is Becoming the Memory Hierarchy of Inference
The article discusses how the KV cache is evolving into a memory hierarchy for LLM inference, optimizing memory management during decoding.
@appliedcompute: https://x.com/appliedcompute/status/2052826576723841292
Applied Compute introduces ACL-Wiki, a continual learning memory system built on their Context Engine that logs coding agent interactions from Cursor, Claude Code, and Codex to build an improving Contextbase, roughly doubling the Critical Memory Rate over two weeks. The system uses a Remember-Refine-Retrieve pipeline exposed via MCP server to give coding agents institutional memory that improves with use.
@vivekgalatage: Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have…
A tweet shares a structured reference of 450 papers on GPU optimization spanning 14 years, noting that while some techniques evolve, the mental models remain useful. It also references a lecture on GPU architectures by Onur Mutlu.
@vivekgalatage: Introduction to Parallel Algorithms https://cs.cmu.edu/~guyb/paralg/paralg/parallel.pdf…
An introductory resource on parallel algorithms, covering fundamental concepts and techniques, from Carnegie Mellon University.