@vivekgalatage: Memory organization with Algorithmica is one resource that keeps shining. https://en.algorithmica.org/hpc/cpu-cache/

X AI KOLs Timeline 06/04/26, 10:47 AM Tools

memory-organization cpu-cache algorithmica programming performance optimization

Summary

A recommendation for the Algorithmica resource on CPU cache memory organization, which provides detailed experimental analysis and optimization techniques for in-memory algorithms.

Original Article

Cached at: 06/05/26, 11:20 PM

RAM & CPU Caches - Algorithmica

Source: https://en.algorithmica.org/hpc/cpu-cache/

In the previous chapter, we studied computer memory from a theoretical standpoint, using the external memory model to estimate the performance of memory-bound algorithms.

While the external memory model is more or less accurate for computations involving HDDs and network storage, where the cost of arithmetic operations on in-memory values is negligible compared to external I/O operations, it is too imprecise for lower levels of the cache hierarchy, where the costs of these operations become comparable.

To perform more fine-grained optimization of in-memory algorithms, we have to start taking into account the many specific details of the CPU cache system. And instead of studying loads of boring Intel documents with dry specs and theoretically achievable limits, we will estimate these parameters experimentally by running numerous small benchmark programs with access patterns that resemble the ones that often occur in practical code.
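To make the experimental approach concrete, here is a minimal C++ sketch of such a benchmark, assuming the simplest access pattern: repeatedly incrementing every element of an array while the array size grows. The names ns_per_element, print_sweep and g_sink, the 4 KB starting size, and the 2^26 updates per size are illustrative assumptions, not code from the original article or its repository.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical sink: storing to a volatile forces the checksum (and hence the
// loops that produce it) to actually be computed.
static volatile long long g_sink;

// Average nanoseconds per element over `reps` passes that increment every
// element of an n-element int array (a sequential read-modify-write pattern).
double ns_per_element(std::size_t n, int reps) {
    assert(n > 0 && reps > 0);
    std::vector<int> a(n, 0);
    auto start = std::chrono::steady_clock::now();
    for (int r = 0; r < reps; r++)
        for (std::size_t i = 0; i < n; i++)
            a[i]++;
    auto end = std::chrono::steady_clock::now();
    // Consume the results outside the timed region.
    long long checksum = 0;
    for (int x : a) checksum += x;
    assert(checksum == static_cast<long long>(n) * reps);
    g_sink = checksum;
    double ns = std::chrono::duration<double, std::nano>(end - start).count();
    return ns / (static_cast<double>(n) * reps);
}

// Double the working set from 4 KB up to max_bytes, keeping the total number
// of element updates per size roughly constant (2^26, an arbitrary choice).
void print_sweep(std::size_t max_bytes) {
    const std::size_t total_updates = std::size_t(1) << 26;
    for (std::size_t bytes = 4096; bytes <= max_bytes; bytes *= 2) {
        std::size_t n = bytes / sizeof(int);
        std::size_t reps = total_updates / n;
        if (reps == 0) reps = 1;
        std::printf("%10zu bytes: %.3f ns/element\n", bytes,
                    ns_per_element(n, static_cast<int>(reps)));
    }
}
```

Calling print_sweep(std::size_t(1) << 26) sweeps up to 64 MB. One would expect the time per element to rise in steps as the working set outgrows each cache level, though the exact figures depend on the machine and the compiler flags.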

Experimental Setup

As before, I will be running all experiments on Ryzen 7 4700U, which is a “Zen 2” CPU with the following main cache-related specs:

8 physical cores (without hyper-threading) clocked at 2GHz (and 4.1GHz in boost mode, which we disable);
256K of 8-way set-associative L1 data cache, or 32K per core;
4M of 8-way set-associative L2 cache, or 512K per core;
8M of 16-way set-associative L3 cache, shared between 8 cores;
16GB (2x8G) of DDR4 RAM @ 2667MHz.
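These figures fix the geometry of each cache. As a worked example, the number of sets is capacity / (associativity * line size). The sketch below evaluates it at compile time, assuming 64-byte cache lines (the line size is not listed above) and treating the listed 8M L3 as a single cache. The hypothetical set_index helper assumes the textbook mapping in which the set is chosen by the address bits just above the line offset; real L3 caches may hash addresses differently.

```cpp
#include <cstddef>
#include <cstdint>

// sets = capacity / (associativity * line size)
constexpr std::size_t cache_sets(std::size_t capacity_bytes, std::size_t ways,
                                 std::size_t line_bytes) {
    return capacity_bytes / (ways * line_bytes);
}

// Assumption: 64-byte cache lines (not stated in the spec list).
constexpr std::size_t kLine = 64;

// Per-core L1d: 32K, 8-way -> 64 sets of 8 lines each.
static_assert(cache_sets(32 * 1024, 8, kLine) == 64, "L1d sets");
// Per-core L2: 512K, 8-way -> 1024 sets.
static_assert(cache_sets(512 * 1024, 8, kLine) == 1024, "L2 sets");
// Shared L3: 8M, 16-way -> 8192 sets (if treated as one cache).
static_assert(cache_sets(8 * 1024 * 1024, 16, kLine) == 8192, "L3 sets");

// Textbook set mapping: drop the line-offset bits, then take the line number modulo `sets`.
constexpr std::size_t set_index(std::uintptr_t addr, std::size_t sets, std::size_t line_bytes) {
    return static_cast<std::size_t>(addr / line_bytes) % sets;
}

// In the L1d above, addresses 64 sets * 64 bytes = 4096 bytes apart land in the
// same set, so at most 8 such addresses can be cached at once.
static_assert(set_index(0, 64, kLine) == set_index(4096, 64, kLine), "4 KB stride aliases in L1d");
```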

You can compare it with your own hardware by running dmidecode -t cache or lshw -class memory on Linux, or by installing CPU-Z on Windows. You can also find additional details about the CPU on WikiChip and 7-CPU. Not all conclusions will generalize to every CPU platform in existence.
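On Linux, the kernel also exposes this information through sysfs. The sketch below reads the per-cache attribute files under /sys/devices/system/cpu/cpu0/cache/indexN/ (level, type, size, ways_of_associativity, coherency_line_size). The helper names are hypothetical, and the directory may be missing in some containers or virtual machines, in which case nothing is printed.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <fstream>
#include <iostream>
#include <string>

// Parse a size as printed by sysfs, e.g. "32K" or "8192K" (optional K/M/G suffix).
std::size_t parse_cache_size(const std::string& s) {
    assert(!s.empty() && std::isdigit(static_cast<unsigned char>(s[0])));
    std::size_t value = 0, i = 0;
    while (i < s.size() && std::isdigit(static_cast<unsigned char>(s[i])))
        value = value * 10 + static_cast<std::size_t>(s[i++] - '0');
    if (i < s.size()) {
        switch (std::toupper(static_cast<unsigned char>(s[i]))) {
            case 'K': value <<= 10; break;
            case 'M': value <<= 20; break;
            case 'G': value <<= 30; break;
            default: break;
        }
    }
    return value;
}

// Read the first line of an attribute file; returns "" if the file does not exist.
std::string read_attr(const std::string& path) {
    std::ifstream in(path);
    std::string line;
    std::getline(in, line);
    return line;
}

// Print every cache the kernel reports for CPU 0.
void print_cpu0_caches() {
    for (int idx = 0;; idx++) {
        std::string dir = "/sys/devices/system/cpu/cpu0/cache/index" + std::to_string(idx) + "/";
        std::string size = read_attr(dir + "size");
        if (size.empty()) break;  // no more cache indices (or no sysfs at all)
        std::cout << "L" << read_attr(dir + "level") << " " << read_attr(dir + "type")
                  << ": " << parse_cache_size(size) / 1024 << " KiB, "
                  << read_attr(dir + "ways_of_associativity") << "-way, "
                  << read_attr(dir + "coherency_line_size") << "-byte lines\n";
    }
}
```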

Due to difficulties in preventing the compiler from optimizing away unused values, the code snippets in this article are slightly simplified for exposition purposes. Check the code repository if you want to reproduce them yourself.
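One common workaround, shown here as a minimal sketch rather than the article's own code, is an empty inline-assembly barrier in the spirit of Google Benchmark's benchmark::DoNotOptimize. It assumes GCC or Clang (MSVC does not accept this asm syntax); the names do_not_optimize and sum_reps are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Empty asm that claims to read `value` (so it must be computed) and to
// clobber memory (so the compiler cannot assume memory is unchanged afterwards).
template <typename T>
inline void do_not_optimize(T const& value) {
    asm volatile("" : : "r,m"(value) : "memory");
}

// Sum the array `reps` times; without the barrier, an unused per-pass sum
// could be dropped entirely by the optimizer.
long long sum_reps(const std::vector<int>& a, int reps) {
    assert(reps > 0);
    long long last = 0;
    for (int r = 0; r < reps; r++) {
        long long s = 0;
        for (int x : a) s += x;
        do_not_optimize(s);  // force each pass's result to be materialized
        last = s;
    }
    return last;
}
```

The "memory" clobber also tells the compiler that the array may have changed between passes, so it cannot compute the sum once and reuse it for every repetition.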

Acknowledgements

This chapter is inspired by “Gallery of Processor Cache Effects” by Igor Ostrovsky and “What Every Programmer Should Know About Memory” by Ulrich Drepper, both of which can serve as good accompanying readings.

Similar Articles

@che_shr_cat: 1/ We have spent years optimizing KV cache via head-sharing (GQA/MQA), but we ignored a fundamental assumption: why do …

KV Cache Is Becoming the Memory Hierarchy of Inference

@appliedcompute: https://x.com/appliedcompute/status/2052826576723841292

@vivekgalatage: Best structured reference I've found for GPU optimization - 450 papers, 14 years of research. Some techniques will have…

@vivekgalatage: Introduction to Parallel Algorithms https://cs.cmu.edu/~guyb/paralg/paralg/parallel.pdf…
