The author implements the δ-mem research paper on Apple Silicon using MLX and OpenClaw, showing memory and attention improvements in local AI agent tests, though with mixed results compared to CUDA benchmarks.
So I’ve been nerding out hard about memory, and I’ve come to the conclusion that context management is too high-level and that dynamically changing the weights would work better. Luckily, this morning I checked my news feed and saw this new paper!

[https://arxiv.org/abs/2605.12357](https://arxiv.org/abs/2605.12357)

It improves the model’s attention direction without using context or a LoRA, with 20% better answers in their tests! It doesn’t use direct memory queries or context, just weighted attention direction.

I wanted to try it out on my Mac mini (64 GB, Apple Silicon) to see if it could improve my agent’s responses. Local agents are already usable, but even a slight improvement would be huge!

I implemented it using MLX (way faster than Ollama, btw) and tested it with and without my OpenClaw session history.

[https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw](https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw)

Here’s the adapter I made so it works with MLX:

[https://huggingface.co/ofthetrees/delta-mem-qwen3-4b-instruct-mlx-adapter](https://huggingface.co/ofthetrees/delta-mem-qwen3-4b-instruct-mlx-adapter)

δ-mem paper results (Qwen3-4B-Instruct) showed solid gains:

- Avg vs frozen backbone: `1.10x`
- MemoryAgentBench: `1.31x`
- LoCoMo: `1.20x`

Local normalized MLX tests were more mixed:

(I am fixing this chart, the no-context numbers are misleading)

| Result | Plain | δ-mem | Lift |
|---|---:|---:|---:|
| LoCoMo state-only | 0.0500 (misleading, warmup) | 0.1833 | 3.67x |
| LoCoMo session-context | 0.4667 | 0.5000 | 1.07x |
| OpenClaw replay | 0.5701 | 0.6667 | 1.17x |

- Synthetic probes were flat.
- LoCoMo-mini showed surprisingly strong relative gains.
- OpenClaw-style replay showed a smaller but more practically meaningful improvement (`6/8 → 7/8` probes passed).

Overall the paper benchmarks look real, and local tests suggest δ-mem is doing something useful in realistic replay/memory scenarios.

Finally, the lower results are expected: Apple Silicon cannot run CUDA, so these tests ran on an MLX port rather than the paper’s CUDA setup.

I really want to try it on the latest and greatest local model for me, qwen3.6:27b for MLX, which needs an adapter model trained. My current estimate is that this would cost about 6k to run in the cloud, and as I am unemployed (hire me) I cannot afford that right now. If someone with a huge computer wants to pick up where I left off, it’s nearly all there; the adapter generation just needs tweaking for the new Qwen’s attention structure. The original test was already on Qwen, so that helps a lot.

Thanks for reading! I’m proud of the project, which is my first groundbreaking work in the field of open source AI!
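For anyone who wants to double-check the Lift column in the results table, here is a minimal sketch of the arithmetic. It assumes lift is simply the δ-mem score divided by the plain score; the `results` dict and the `lift` helper are names made up for this example and are not part of the repo.

```python
# Scores copied from the results table in the post: (plain, delta_mem).
results = {
    "LoCoMo state-only": (0.0500, 0.1833),
    "LoCoMo session-context": (0.4667, 0.5000),
    "OpenClaw replay": (0.5701, 0.6667),
}


def lift(plain: float, delta_mem: float) -> float:
    """Relative improvement, assumed here to be delta_mem / plain."""
    return delta_mem / plain


# Round to two decimals to match the precision shown in the table.
lifts = {name: round(lift(p, d), 2) for name, (p, d) in results.items()}

for name, value in lifts.items():
    print(f"{name}: {value:.2f}x")
```

A tiny baseline such as 0.0500 inflates the ratio, which is why the state-only row is flagged as misleading in the table.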
δ-mem, a technique from a new research paper, improves agent response quality by 7-32% when integrated with OpenClaw. The project currently works only with MLX and Qwen3:4b, but adapters for other models are expected.
The author announces the release of 'lightning-mlx', a local AI engine optimized for Apple Silicon that achieves high token speeds for coding agents and tool-calling workflows.
A CS student built mlx-Chronos, an open-source CLI tool that standardizes benchmarking of MLX inference engines on Apple Silicon by measuring TTFT, throughput, memory usage, and thermal state, with a community leaderboard for sharing results.