@teach_fireworks: https://x.com/teach_fireworks/status/2067243590447952212

X AI KOLs Timeline 06/17/26, 01:50 PM Papers

Summary

SAG (SQL-Augmented Generation) is a novel SQL-based retrieval augmented generation method that converts data chunks into events and entities, enabling multi-hop reasoning via SQL join queries. On the MuSiQue dataset, recall increased from 65.13% to 80.04%. It supports second-level online retrieval of approximately 500 million data entries and has been open-sourced.

https://t.co/Ep8E3Syukl

Original Article

View Cached Full Text

Cached at: 06/17/26, 06:02 PM

Just Open-Sourced and SOTA, This Project Has Something: Why SAG Beats GraphRAG, Saves Money, and Handles Massive Enterprise Data?

Imagine you’re the technical lead at a tech company.

Last year, you built an internal knowledge base using GraphRAG. You ran a PoC with 20,000 contracts, and the Q&A results were stunning. Your boss approved the project, allocated budget, and greenlit production immediately.

Then the document count hit 5 million.

Problems start to surface:

Building the knowledge graph once takes an entire weekend. The R&D costs and token fees exceed the business value generated by those queries.

The compliance department modifies one contract, but the knowledge base isn’t updated until three days later.

The security team insists data cannot leave the domain, but the cloud LLM you rely on requires documents to be sent out.

Three questions cross your mind:

1. GraphRAG scores are SOTA on benchmarks, so why does it collectively fail in real-world enterprise scenarios?

2. Multi-hop queries like “Manager Zhang → the client he signed last year → that client’s renewal this year” — where does the bottleneck actually lie?

3. Is there a solution that not only wins on benchmarks but can also handle 500 million documents, daily incremental updates, and private deployment simultaneously?

This paper offers a new perspective and path:

https://arxiv.org/abs/2606.15971

SAG provides a counterintuitive answer:

SQL-augmented Retrieval Generation. SQL has been maturely applied in almost all storage systems for decades.

Bring this mature structured standard directly into retrieval and Agent systems!

That’s the point!

A very clever approach.

It reminds me of the recent Agentic Search solution based on the existing file system in ClaudeCode — they share the same spirit, achieving similar results through different means.

At first glance, it may not seem flashy, but it is both effective and practical.

And the project is already open-sourced:

https://github.com/Zleap-AI/SAG

SAG converts each data chunk into a semantically complete event and a set of indexed entities. Then, using SQL join queries, it dynamically links events sharing entities into local hyperedges, constructing a dynamically instantiated local index structure at query time.

This design eliminates the need for global graph reconstruction and continuous maintenance.

On the notoriously difficult MuSiQue multi-hop QA benchmark, it boosts Recall from 65.13% to 80.04%, while also handling sub-second online retrieval across approximately 500 million records.

This article breaks down what SAG does right — and why it might make “heavy graph” approaches like GraphRAG seem slow and expensive in the enterprise.

Multi-hop QA: Vector Retrieval Hit a Ceiling Long Ago

The hardest questions in the enterprise are almost always multi-hop.

“Manager Zhang, who handles compliance in our Shanghai office, signed a client last year. What is that client’s renewal status this year?” — To answer, you first need to find Manager Zhang, then find the client he signed last year, and then find that client’s renewal record for this year.

Three hops, three pieces of evidence scattered across different documents.

Standard vector retrieval hit a ceiling on this type of problem long ago.

It only finds semantically similar snippets; it cannot piece together the evidence chain “Manager Zhang → Client → Renewal.”

Worse, when an Agent performs multi-step retrieval, errors from each step accumulate and amplify along the reasoning chain — miss one critical entity in the first step, and the next three steps are built on a faulty foundation.

For an Agent, what the retrieval system truly needs to provide is the ability to stably organize evidence into a chain during multi-hop queries.

Vector retrieval cannot provide this capability. Thus, the industry split into several paths, with the core difference being:

When to build the structure, and how large to build it.

Naive RAG: Chunk, vectorize, top-k. Fast, cheap, scalable, but accuracy plateaus at shallow semantic matching. GraphRAG: Build the global knowledge graph upfront during ingestion — extract all entities and relations from the text, then perform community clustering. HippoRAG 2: Mimics the human hippocampus to create a cognitive graph, currently a recognized strong multi-hop baseline. SAG: Builds almost no graph upfront. During ingestion, it extracts only a lightweight local structure. At query time, it grows a chain on the fly.

The first three paths are in the right direction but are heavy in computation. SAG bets the opposite way.

GraphRAG’s Real Root Cause: The Structure is Built, but Not Used at Query Time

GraphRAG (Graph-enhanced Retrieval-Augmented Generation) is an advanced retrieval technique that combines Knowledge Graphs with Large Language Models (LLMs). Traditional RAG relies on text chunking and vector similarity for retrieval, which often fails on “global questions” spanning multiple documents or complex “multi-hop reasoning.”

GraphRAG builds a structured network of entities and relationships, giving AI the ability to “follow the vine” and connect the dots, much like humans.

GraphRAG’s selling point is using graphs to represent entity relationships. However, during actual querying, the system either performs similarity search on graph nodes or matches community summaries — it doesn’t actually traverse the multi-hop structure of the graph.

The finely crafted offline structure is decoupled from the online retrieval logic.

You pay the cost of building the graph, but what you get is merely the illusion of “multi-hop capability at query time.”

This “structure-retrieval decoupling” is what makes GraphRAG most frustrating for enterprises:

You build a sophisticated structure you can afford, but it’s not usable when you need it.

Let’s break down the costs further — three items enterprises cannot avoid:

Cost #1: Ingestion Cost

GraphRAG requires entity extraction, relation extraction for each document, followed by cross-document community clustering and summarization.

Furthermore, traditional knowledge graphs represent relationships using triplets like “subject-relation-object.” But real-world events are almost always multivariate:

One transaction involves the buyer, seller, product, amount, time — multiple entities. Forcing this into pairwise triplets slices the semantics and multiplies redundancy.

For 500 million documents, building the graph alone is a massive expense, compounded by error accumulation from triplet decomposition.

Cost #2: Incremental Update Cost

Enterprise documents change daily. With GraphRAG’s global graph, if one document changes, the community structure might change, theoretically requiring re-clustering. In practice, no one fully re-runs this, so the graph drifts further from reality, turning the knowledge base into a snapshot “forever stuck on last Wednesday.” There’s a fitting saying: When data evolves continuously, the cost of maintaining a global graph may even exceed its initial construction.

Cost #3: Private Deployment Cost

This is the most critical. Data in finance, healthcare, government, and law cannot leave the domain. Yet GraphRAG’s heavy extraction rounds either rely on powerful cloud LLMs or require deploying a sufficiently strong model privately on the intranet. Either way, it’s a double pressure of cost and compliance.

Combine these three costs, and you get the most common complaint in the enterprise:

“We tried GraphRAG, the results were good, but we couldn’t afford it, couldn’t modify it, and couldn’t get it online.”

HippoRAG 2

HippoRAG 2 upgrades traditional vector retrieval into “associative memory retrieval” using knowledge graphs, passage nodes, and Personalized PageRank. It significantly outperforms traditional RAG in multi-hop retrieval, cross-document association, and complex context understanding.

HippoRAG 2 essentially adds a layer of “long-term memory network” on top of vector retrieval to solve associative reasoning, at the cost of added complexity in graph construction, entity alignment, and memory maintenance.

HippoRAG 2 strikes a decent balance between retrieval efficiency and reasoning capability, achieving results close to graph reasoning systems with lower indexing costs. However, for enterprise scenarios, the cognitive graph still requires ongoing construction and maintenance. When facing long-term incremental data, system complexity does not completely disappear.

SAG’s Judgment: Move Structure Into Retrieval Itself

Since the root cause is “the structure is built but not used,” the solution is clear — move the structured capability into the retrieval execution itself, don’t solidify the structure offline.

At the offline stage, each data chunk is converted into an event and a set of entities, written into SQL, vector, and full-text indexes.

At the online stage, the system performs initial recall, then query-time expansion, and finally selection within the compressed candidate set.

SAG’s core structure is simple. For each document chunk, it extracts only two things:

An event that retains the complete semantics of the chunk, and a set of entities responsible for indexing and bridging.

chunk -> event # a complete event, preserving semantics chunk -> entities # multiple entities, responsible for indexing event <-> entities # many-to-many connections between events and entities

The key to understanding it:

Events and entities are two parallel results from the same chunk.

Events carry semantics, entities only serve indexing and bridging.

They are connected via database relational queries.

One event is associated with multiple entities (people, times, places, organizations, products…), which naturally forms a relational network.

This is the most fundamental difference between SAG and GraphRAG.

GraphRAG is like “drawing the entire graph on a wall in advance — anyone who moves requires a redraw.”

SAG is like “at query time, based on the question, starting from the matched entities, follow the event-entity connections to temporarily compute a path.”

In enterprise terms:

GraphRAG is like a company spending six months drawing a giant org chart on the wall. Anyone changing roles requires redrawing the chart.

SAG is like a query system that, when you ask, dynamically calculates on the fly: “Manager Zhang → the client he signed last year → that client’s renewal this year” — this chain grows on demand without touching the wall.

Even better, SAG’s multi-hop expansion is essentially relational database expansion. Using SQL join queries, it hops between events via shared entities, with a default of one hop.

It doesn’t need the heavy machinery of a graph database. Any database that can do table joins suffices.

Three Roles, Each Doing Its Job: What SQL, Vector, and LLM Do

SAG has another pragmatic design:

It splits the retrieval pipeline into three roles, each doing its own job without overstepping.

SQL handles deterministic filtering and joining — entity-to-event associations, multi-hop expansion. All rely on database relational queries: precise, explainable, cheap.

Vector retrieval handles semantic expansion — aliases, synonyms (e.g., “Manager Zhang” vs. “Old Zhang”, “renewal” vs. “extension”) are bridged via vector similarity.

LLM only performs final re-ranking on the compressed candidate set — after the first two stages reduce candidates from hundreds of thousands to about a hundred.

This directly aligns with the enterprise cost breakdown.

In the past, many RAG approaches blindly inserted the LLM into the main retrieval path, querying the model at every step, burning tokens each time.

SAG reverses that: Anything that can be handled by SQL should never be given to the model. Pushes expensive calls to the end, using them only when necessary.

There’s another trade-off worth mentioning:

SAG deliberately keeps entity processing “good enough” — it only does simple string normalization and deduplication, without pursuing perfect entity alignment.

That might sound lazy, but it’s actually quite sober:

The true semantic carrier is the event. Entities are just signposts that need to “bridge well enough.”

Instead of spending enormous effort on perfect entity alignment (where GraphRAG burns a lot of money), SAG saves the budget for what matters.

Hard Evidence: Where Does the Benchmark Improvement Come From?

Using the same configuration (bge-large-en-v1.5 + qwen3.6-flash) across three standard multi-hop QA datasets, compared to the recognized strong baseline HippoRAG 2:

MuSiQue, the most challenging multi-hop dataset, sees SAG bring Recall@5 from 65.13% to 80.04%, an absolute improvement of nearly 15 percentage points.

Average Recall@2 across the three datasets improves by 11.16 percentage points, a relative improvement of about 16.4%.

There’s a detail worth a second look for enterprise folks:

The most dramatic improvement is in Recall@2 — the ability to hit key evidence with just two retrieved documents.

This directly means the Agent can use less context, feed evidence to the model earlier —

In the enterprise, this translates to cost savings: fewer tokens, lower latency, and less distraction from irrelevant context in long tasks.

Another ablation study is telling.

Replacing the embedding model with the stronger NV-Embed-v2 improves MuSiQue Recall@5 further to 81.71%.

HippoRAG 2 with the same model scores 74.55%.

Using a stronger model certainly helps, but SAG’s lead remains.

This shows SAG’s gains come primarily from its structural design — critical for enterprises that need private deployment with limited model choices.

SAG’s strength is in true long-chain, cross-document multi-hop questions — precisely the hardest and most valuable class of problems in the enterprise.

Can It Really Run in the Enterprise? Check the Code and Scale

Demo URL: https://wiki.zleap.com/search

Benchmarks are academic. The enterprise only cares about one thing:

Can this system run in their own data center, on their own data, with their own models?

SAG provides two repositories.

First, the accompanying Benchmark reproduction code (github.com/Zleap-AI/SAG-Benchmark), with version management, experiment tracking, and fully open-sourced evaluation scripts:

# Upload dataset: convert evaluation data into corpus, write to DB and ES
uv run python scripts/run_upload.py --dataset musique

# Run reproduction benchmark: multi-path recall + multi-hop expansion
uv run python scripts/run_search_benchmark.py \
  --dataset-name musique \
  --strategy multi \
  --top-k 10 \
  --k-values "1,2,5,10" \
  --max-concurrency 10

The reproduction script supports MLflow for experiment recording, fixed data source versions, and outputs complete evaluation metrics (Recall, Precision, F1).

Reproducible, auditable, traceable — exactly what enterprise technical reviews want to see.

Second, an out-of-the-box local workbench (github.com/Zleap-AI/SAG)

The tech stack is TypeScript full-stack. The data layer uses PostgreSQL + pgvector + full-text search + SQL multi-hop.

The model layer is compatible with OpenAI-compatible interfaces — enterprises can use their own privately deployed models, with data never leaving the intranet.

For enterprise deployment, several design choices stand out:

Writing is concurrent at the chunk level.

This is the key to making “incremental updates” a native capability. Each chunk is independently extracted and ingested, non-blocking. No batch processing, no global recomputation needed. Contracts modified today can be queried today.

The retrieval process is fully visualized.

Every query shows the retrieval path taken, which multi-hop expansions occurred, and how long each step took, displayed in a right-side panel. When RAG goes wrong, the biggest headache is “not knowing why it answered that way.” Seeing the intermediate process enables pinpointing.

MCP integration as an Agent data backbone.

SAG workbench can act as an MCP Server, exposing its capabilities to external Agents. Internal customer service, risk control, and legal Agents can share the same data backbone. Documents are indexed once, reused across multiple Agents, each maintaining its own memory.

There’s already a Wikipedia retrieval demo running online with approximately 500 million data records (wiki.zleap.com/search), with online retrieval latency kept within seconds.

A PoC with 20,000 documents showing good results is easy. The challenge is handling 500 million records with sub-second response online.

This scale proves that SAG’s “lightweight structure” truly holds up under engineering pressure.

Putting it all together — reproducible benchmark scripts, a privately deployable workbench, 500-million-scale online validation, and an Agent-reusable data backbone — this is material that an enterprise technical selection team can take to project approval, far beyond a mere paper idea.

Where Are Its Boundaries, and When Not to Use It?

First, it depends on extraction quality.

SAG’s events and entities are extracted by an LLM. If the model is too weak, extraction will be poor, and all downstream retrieval will suffer. Enterprises trying to save money by using very small local models during private deployment must first evaluate whether the extraction quality is adequate. Without a solid foundation, no clever structure can save you.

Second, it may not be optimal for simple scenarios.

As mentioned earlier, on 2WikiMultiHopQA, SAG’s Recall@5/10 is slightly lower. If the majority of business queries are single-hop with concentrated evidence (e.g., pure FAQ, single-document retrieval), standard vector retrieval or even BM25 suffices. No need for multi-hop structures. SAG is purpose-built for long-chain, cross-document questions.

Third, long document chunking remains the foundation.

Events are extracted based on chunks. Poor chunking leads to incomplete events. Like all RAG, there is no silver bullet.

Fourth, it requires PostgreSQL + pgvector infrastructure.

This is slightly heavier than pure vector stores (Milvus, Qdrant), but significantly lighter than GraphRAG’s graph database plus full extraction pipeline. For enterprises already using PostgreSQL, migration cost is nearly zero.

Honestly, SAG aims to fill a specific niche: the position where enterprises need multi-hop queries, incremental updates, private deployment, large scale, and want to escape the cost of heavy graph construction.

And that niche is precisely the most painful and valuable spot in enterprise RAG deployment.

Back to the Core Insight

The article opened by stating that enterprise RAG fails because it’s too costly to build, too hard to modify, and doesn’t scale. Looking deeper, the real root cause is:

The structure you spent a fortune building is never actually used during query time.

SAG’s solution boils down to one sentence: Move the structured capability into the retrieval execution itself. Let the chain grow on demand at query time.

Offline, build only a lightweight index. Online, use SQL to calculate multi-hop relationships on the fly. The structure is re-embodied as something actually executed during querying, not just a decoration on the wall.

The value of this insight lies in its courage to subtract.

The default behavior in the RAG field over the past few years has been to “add more” — heavier graphs, more community structures, stronger models.

SAG asks the opposite:

What is truly necessary structure?

Can we achieve the same thing with the lightest possible version?

The answer is yes, and with better benchmarks.

For enterprise technology decision-makers, there’s a more practical lesson here.

In the past, selecting a RAG solution mainly focused on benchmarks and accuracy.

But in real enterprise projects, what determines whether a solution survives into its second year are things like graph construction cost, incremental freshness, private deployment capability, and scalability limits.

Benchmarks determine whether a PoC looks good. Engineering metrics determine whether the system lives.

I even think this could be taken a step further.

The event-entity index structure is so lightweight and supports continuous incremental writes that it is inherently suited to serve as a long-term memory backbone for running Agents —

With versioning and time awareness, it can remember what an Agent queried three months ago and how its state evolved.

That might be the longer-term vision beyond RAG.

But that’s a story for another day.

For now, one thing is certain:

The next wave of progress in RAG will likely come from this kind of topological trade-off — making things heavy where they need to be heavy, and lazy where they can be lazy.

It represents a direction worth serious attention from everyone working on enterprise RAG.