@vivekgalatage: Memory organization with Algorithmica is one resource that keeps shining. https://en.algorithmica.org/hpc/cpu-cache/


Summary

A recommendation for the Algorithmica resource on CPU cache memory organization, which provides detailed experimental analysis and optimization techniques for in-memory algorithms.

Memory organization with Algorithmica is one resource that keeps shining. https://t.co/hvIj8m3S5l https://t.co/xItHdn30pv

Cached at: 06/05/26, 11:20 PM

RAM & CPU Caches - Algorithmica

Source: https://en.algorithmica.org/hpc/cpu-cache/

In the previous chapter, we studied computer memory from a theoretical standpoint, using the external memory model to estimate the performance of memory-bound algorithms.

While the external memory model is more or less accurate for computations involving HDDs and network storage, where the cost of arithmetic operations on in-memory values is negligible compared to external I/O operations, it is too imprecise for lower levels in the cache hierarchy, where the costs of these operations become comparable.

To perform more fine-grained optimization of in-memory algorithms, we have to start taking into account the many specific details of the CPU cache system. And instead of studying loads of boring Intel documents with dry specs and theoretically achievable limits, we will estimate these parameters experimentally by running numerous small benchmark programs with access patterns that resemble the ones that often occur in practical code.

Experimental Setup

As before, I will be running all experiments on Ryzen 7 4700U, which is a “Zen 2” CPU with the following main cache-related specs:

  • 8 physical cores (without hyper-threading) clocked at 2GHz (and 4.1GHz in boost mode, which we disable);
  • 256K of 8-way set associative L1 data cache or 32K per core;
  • 4M of 8-way set associative L2 cache or 512K per core;
  • 8M of 16-way set associative L3 cache, shared between 8 cores;
  • 16GB (2x8G) of DDR4 RAM @ 2667MHz.
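
To make the spec list above more concrete, here is a minimal sketch in Python that derives the number of sets in each cache level and a theoretical peak DRAM bandwidth. It rests on several assumptions that the list does not state: 64-byte cache lines, an 8-byte (64-bit) data bus per memory channel, the two 8G modules running in dual-channel mode, and 2667MHz denoting the transfer rate in MT/s. It also takes the L3 at face value as a single 8M cache. The names `num_sets` and `peak_dram_bandwidth` are made up for illustration.

```python
KIB = 1024
MIB = 1024 * KIB

LINE_BYTES = 64  # assumption: 64-byte cache lines (not stated in the spec list)
BUS_BYTES = 8    # assumption: 64-bit data bus per memory channel
CHANNELS = 2     # assumption: the 2x8G modules run in dual-channel mode


def num_sets(size_bytes, ways, line_bytes=LINE_BYTES):
    """Number of sets in a set-associative cache: size / (ways * line size)."""
    set_bytes = ways * line_bytes
    if size_bytes % set_bytes != 0:
        raise ValueError("cache size must be a multiple of ways * line size")
    return size_bytes // set_bytes


def peak_dram_bandwidth(mega_transfers_per_s, bus_bytes=BUS_BYTES, channels=CHANNELS):
    """Theoretical peak bandwidth in bytes per second."""
    return mega_transfers_per_s * 10**6 * bus_bytes * channels


# (size in bytes, associativity) taken from the spec list above
CACHES = {
    "L1d (per core)": (32 * KIB, 8),
    "L2 (per core)": (512 * KIB, 8),
    "L3 (shared)": (8 * MIB, 16),
}

for name, (size, ways) in CACHES.items():
    print(f"{name}: {num_sets(size, ways)} sets")

print(f"peak DRAM bandwidth: {peak_dram_bandwidth(2667) / 1e9:.1f} GB/s")
```

Under these assumptions the script reports 64 sets for L1d, 1024 for L2, 8192 for L3, and about 42.7 GB/s of peak bandwidth. These are upper bounds derived from the specs, not measurements.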

You can compare it with your own hardware by running dmidecode -t cache or lshw -class memory on Linux or by installing CPU-Z on Windows. You can also find additional details about the CPU on WikiChip and 7-CPU. Not all conclusions will generalize to every CPU platform in existence.
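
As an alternative to the tools named above, Linux also exposes per-cache details through sysfs, under /sys/devices/system/cpu/cpu0/cache/. The sketch below reads a few of those files with Python; the function name `read_cache_info` is hypothetical, and fields that are missing or unreadable on a given system are reported as None rather than guessed.

```python
from pathlib import Path

# sysfs files describing one cache; each holds a short text value such as "32K"
FIELDS = ("level", "type", "size", "ways_of_associativity", "coherency_line_size")


def read_cache_info(cache_dir="/sys/devices/system/cpu/cpu0/cache"):
    """Return one dict per index* directory found under cache_dir."""
    caches = []
    for index_dir in sorted(Path(cache_dir).glob("index*")):
        entry = {}
        for field in FIELDS:
            try:
                entry[field] = (index_dir / field).read_text().strip()
            except OSError:
                entry[field] = None  # file missing or not readable here
        caches.append(entry)
    return caches


if __name__ == "__main__":
    for cache in read_cache_info():
        print(
            f"L{cache['level']} {cache['type']}: {cache['size']}, "
            f"{cache['ways_of_associativity']}-way, "
            f"{cache['coherency_line_size']}-byte lines"
        )
```

On a machine without this sysfs directory (for example, a non-Linux system) the function simply returns an empty list.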

Due to difficulties in preventing the compiler from optimizing away unused values, the code snippets in this article are slightly simplified for exposition purposes. Check the code repository if you want to reproduce them yourself.

Acknowledgements

This chapter is inspired by “Gallery of Processor Cache Effects” by Igor Ostrovsky and “What Every Programmer Should Know About Memory” by Ulrich Drepper, both of which can serve as good accompanying readings.
