Local semantic memory search for OpenClaw agents using Harrier embeddings

Reddit r/openclaw Tools

Summary

This article presents a practical method to equip OpenClaw agents with local semantic memory search using Microsoft's Harrier embedding model, enabling efficient retrieval of relevant text chunks without external services.

I put together a small repo showing how to give an OpenClaw agent local semantic memory search without sending embeddings to an external service.

The basic idea: run a small local embedding server around Microsoft’s Harrier embedding model, expose an Ollama-compatible API, and point OpenClaw’s memorySearch config at it.

For anyone unfamiliar with Harrier: it’s a local embedding model from Microsoft (microsoft/harrier-oss-v1-0.6b) that produces high-quality text embeddings. In plain English, it turns chunks of text into vectors so your agent can search by meaning instead of only exact keywords.

Why this matters for agent memory

Most agent memory systems hit one of two problems:

1. You shove too much memory into the prompt, which burns tokens and makes context messy.
2. You keep memory files small and manual, which becomes hard to maintain once the agent has real history.

Semantic memory search gives you a better middle path. Your long-term memory can stay in normal markdown files: MEMORY.md, daily logs, notes, project files, whatever structure is easiest for a human to read and edit. Then the agent retrieves only the relevant chunks at runtime.

That means:

• Less token waste, because you are not stuffing every durable fact into every prompt.
• Cleaner memory files, because they do not need to be obsessively compressed into one giant context-efficient blob.
• Better recall, because the agent can find conceptually related notes even when the wording does not match exactly.
• Easier debugging, because the source of truth stays plain text instead of disappearing into an opaque vector database.
• Better privacy, because embeddings are computed locally.

The repo includes:

• A small Python embedding server.
• Ollama-compatible /api/embed and /api/embeddings endpoints.
• Example OpenClaw memorySearch config.
• A macOS launchd service template.
• A mock markdown memory corpus.
• Smoke tests and a local query demo.
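For a rough idea of what an Ollama-compatible embedding server involves, here is a minimal sketch. It is not the repo's actual code. It assumes the Harrier model loads through the sentence-transformers library (as the blog post title suggests), and the names `load_embedder`, `handle_request`, and `serve` are hypothetical. The request and response fields follow Ollama's documented shapes for /api/embed and /api/embeddings; which of them OpenClaw actually reads has not been verified here.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def load_embedder(model_name="microsoft/harrier-oss-v1-0.6b"):
    """Return a function mapping list[str] -> list[list[float]].

    Assumption: the model can be loaded by sentence-transformers.
    """
    from sentence_transformers import SentenceTransformer  # heavy, so imported lazily

    model = SentenceTransformer(model_name)
    return lambda texts: model.encode(texts, normalize_embeddings=True).tolist()


def handle_request(path, body, embed):
    """Route one parsed JSON request; returns (status_code, response_dict)."""
    if path == "/api/embed":
        # Current Ollama endpoint: "input" is a string or a list of strings.
        inputs = body.get("input", [])
        if isinstance(inputs, str):
            inputs = [inputs]
        return 200, {"model": body.get("model", ""), "embeddings": embed(inputs)}
    if path == "/api/embeddings":
        # Legacy Ollama endpoint: a single "prompt" in, a single "embedding" out.
        return 200, {"embedding": embed([body.get("prompt", "")])[0]}
    return 404, {"error": "not found"}


def make_handler(embed):
    """Build a request handler class bound to the given embed function."""

    class Handler(BaseHTTPRequestHandler):
        def do_POST(self):
            length = int(self.headers.get("Content-Length", 0))
            try:
                body = json.loads(self.rfile.read(length) or b"{}")
                status, payload = handle_request(self.path, body, embed)
            except json.JSONDecodeError:
                status, payload = 400, {"error": "invalid JSON"}
            data = json.dumps(payload).encode("utf-8")
            self.send_response(status)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    return Handler


def serve(embed, host="127.0.0.1", port=11434):
    """Serve forever on localhost; 11434 is Ollama's default port."""
    HTTPServer((host, port), make_handler(embed)).serve_forever()
```

Calling `serve(load_embedder())` would start the server. Binding to 127.0.0.1 keeps the endpoint off the network, which matches the privacy goal above. Keeping the routing in a plain function like `handle_request` also makes it easy to smoke-test with a fake embedder, without loading the model.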
The useful part here is less “new retrieval algorithm” and more “practical wiring.” OpenClaw already knows how to talk to Ollama-style embedding endpoints, so this gives it a local SOTA-ish semantic memory layer without requiring you to run full Ollama or ship private memory to a hosted embedding API.

The pattern has been especially useful for keeping token usage under control while letting memory remain human-manageable. Instead of constantly hand-curating a tiny context block, you can keep richer notes on disk and let retrieval pull the few chunks that actually matter.

Blog post: [https://coltoncoan.com/blog/local-agent-memory-with-openclaw-ollama-and-sentence-transformers/](https://coltoncoan.com/blog/local-agent-memory-with-openclaw-ollama-and-sentence-transformers/)

Repo: [https://promptclickrun.github.io/harrier-openclaw-memory-search](https://promptclickrun.github.io/harrier-openclaw-memory-search)
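To make the "retrieval pulls the few chunks that matter" step concrete, here is a minimal sketch of ranking markdown chunks against a query by cosine similarity. It is an illustration only, not how OpenClaw's memorySearch or the repo's query demo is implemented. The function names, the heading-based chunking, the model name "harrier", and the endpoint URL are all assumptions.

```python
import json
import math
import re
import urllib.request


def ollama_embed(texts, url="http://127.0.0.1:11434/api/embed", model="harrier"):
    """Embed texts via an Ollama-style /api/embed endpoint (URL and model name assumed)."""
    payload = json.dumps({"model": model, "input": texts}).encode("utf-8")
    req = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["embeddings"]


def chunk_markdown(text):
    """Split markdown at heading lines, keeping each heading with its body."""
    parts = re.split(r"^(?=#{1,6} )", text, flags=re.MULTILINE)
    return [p.strip() for p in parts if p.strip()]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k(query, chunks, embed, k=3):
    """Return the k chunks most similar to the query as (chunk, score) pairs."""
    vectors = embed([query] + chunks)  # one batch: query first, then chunks
    query_vec, chunk_vecs = vectors[0], vectors[1:]
    scored = [(chunk, cosine(query_vec, vec)) for chunk, vec in zip(chunks, chunk_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

With a local embedding server running, something like `top_k("what did we decide about backups?", chunk_markdown(open("MEMORY.md").read()), ollama_embed)` would return the best-matching sections. Only those sections need to go into the prompt, while the markdown file stays the source of truth.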