I fitted the new δ-mem research for apple silicon using mlx and openclaw integration! My findings

Reddit r/LocalLLaMA 05/16/26, 09:34 PM Papers

apple-silicon mlx memory attention open-source fine-tuning research

Summary

The author implements the δ-mem research paper on Apple Silicon using MLX and OpenClaw, showing memory and attention improvements in local AI agent tests, though with mixed results compared to CUDA benchmarks.

So I’ve been nerding out hard about memory, and I’ve come to the conclusion that context management is too high-level: dynamically changing the weights would be better. Luckily, this morning I checked my news feed and saw this new paper!

[https://arxiv.org/abs/2605.12357](https://arxiv.org/abs/2605.12357)

It steers the model's attention without using context or a LoRA, with 20% better answers in their tests! It doesn’t use direct memory queries or context, but weighted attention direction.

I wanted to try it out on my Mac mini (64 GB, Apple Silicon) to see if it could improve my agent's responses. Local agents are already usable, but even a slight improvement would be huge! I implemented it using MLX (way faster than Ollama, btw) and tested it with and without my OpenClaw session history.

[https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw](https://github.com/elimaine/delta-mem-mlx-sidecar-w-openclaw)

Here’s the adapter I made so it works with MLX:

[https://huggingface.co/ofthetrees/delta-mem-qwen3-4b-instruct-mlx-adapter](https://huggingface.co/ofthetrees/delta-mem-qwen3-4b-instruct-mlx-adapter)

δ-mem paper results (Qwen3-4B-Instruct) showed solid gains:

- Avg vs frozen backbone: `1.10x`
- MemoryAgentBench: `1.31x`
- LoCoMo: `1.20x`

Local normalized MLX tests were more mixed (I am fixing this chart, the no-context numbers are misleading):

| Result | Plain | δ-mem | Lift |
|---|---:|---:|---:|
| LoCoMo state-only | 0.0500 (misleading, warmup) | 0.1833 | 3.67x |
| LoCoMo session-context | 0.4667 | 0.5000 | 1.07x |
| OpenClaw replay | 0.5701 | 0.6667 | 1.17x |

- Synthetic probes were flat.
- LoCoMo-mini showed surprisingly strong relative gains.
- OpenClaw-style replay showed a smaller but more practically meaningful improvement (`6/8 → 7/8` probes passed).

Overall the paper benchmarks look real, and local tests suggest δ-mem is doing something useful in realistic replay/memory scenarios.

Finally, the lower local results are not too surprising: Apple Silicon does not run CUDA, so this is an MLX port and not a like-for-like comparison with the CUDA benchmarks.

I really want to try it on the latest and greatest local model for me, qwen3.6:27b for MLX, which needs an adapter model trained. My current estimate is that it would cost around 6k to run in the cloud, and as I am unemployed (hire me) I cannot afford that right now. If someone with a huge computer wants to pick up where I left off, it’s nearly all there: the adapter generation just needs tweaking for the new Qwen's attention structure. The original test was already on Qwen, so that helps a lot.

Thanks for reading! I’m proud of the project, which is my first time breaking ground in open-source AI!
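For anyone who wants to sanity-check the Lift column in the results table: here is a minimal sketch, assuming lift is simply the δ-mem score divided by the plain score. The helper names (`lift`, `absolute_gain`) are made up for illustration and are not taken from the repo. It also prints the absolute gain, which shows why the state-only row looks inflated: a tiny baseline makes any ratio look huge.

```python
# Sketch only: recompute the Lift column from the table above.
# Assumption: lift = delta_mem_score / plain_score (not confirmed by the repo).

def lift(plain: float, delta_mem: float) -> float:
    """Relative improvement of the delta-mem score over the plain score."""
    if plain <= 0:
        raise ValueError("plain score must be positive to compute a ratio")
    return delta_mem / plain


def absolute_gain(plain: float, delta_mem: float) -> float:
    """Raw score difference, which does not blow up for tiny baselines."""
    return delta_mem - plain


# (plain, delta-mem) scores copied from the table in the post.
results = {
    "LoCoMo state-only": (0.0500, 0.1833),
    "LoCoMo session-context": (0.4667, 0.5000),
    "OpenClaw replay": (0.5701, 0.6667),
}

lifts = {name: round(lift(p, d), 2) for name, (p, d) in results.items()}
gains = {name: round(absolute_gain(p, d), 4) for name, (p, d) in results.items()}

for name in results:
    print(f"{name}: lift {lifts[name]:.2f}x, absolute gain {gains[name]:+.4f}")
```

The ratios match the table, but the state-only row shows how a near-zero baseline inflates the lift, which is the reason that row is flagged as misleading.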


Similar Articles

Yesterday I saw a new research paper about δ-mem and integrated with openclaw

I've created the fastest local AI engine for Apple Silicon. Optimised for agentic use.

Command A+ (218B MoE) running on Apple Silicon — MLX port, PR open

I built mlx-Chronos — a community benchmark leaderboard for local LLM engines on Apple Silicon (oMLX, Rapid-MLX, mlx-lm, Ollama)

@neural_avb: I am working on porting SAM models and harness into Apple silicon. Already seeing 1.25x inference speed increase on mlx…

Porting SAM 2.1 models to Apple silicon with MLX, achieving 1.25x inference speed increase on the small model, with quantized versions planned.