Why Vector RAG fails for AI coding agents at scale (And how I used a Neo4j graph to fix it)

Reddit r/ArtificialInteligence Tools

Summary

A new open-source tool called Writ uses a hybrid retrieval pipeline with BM25, ONNX vectors, and Neo4j graph traversals to provide context rules for AI coding agents, reducing token bloat by 726x and enforcing plan approval via bash hooks.

Everyone is treating AI coding memory as a 'week one' problem where you just dump a [`CLAUDE.md`](http://CLAUDE.md) file into the context. That breaks down the second you hit thousands of conflicting enterprise rules. Progressive disclosure still eats up thousands of tokens. I wanted to move the matching-decision completely OUT of the agent. I forced an LLM to help me build a tool called Writ. It sits on top of Claude Code and uses a 5-stage hybrid retrieval pipeline (BM25 + local ONNX vectors + Neo4j graph traversals) to return context rules in 0.55ms while cutting token bloat by 726x. The best part? It uses actual local bash terminal hooks to strip away the AI's write permissions until a valid plan and test skeletons are approved. No more AI agents lying or hallucinating dependencies. It's fully open-source and local-first. Check out the architecture and let me know if the graph-traversal logic makes sense: [https://github.com/infinri/Writ](https://github.com/infinri/Writ)
Original Article

Similar Articles