@tom_doerr: Modular GraphRAG implementation in Rust with WebGPU acceleration support. https://github.com/automataIA/graphrag-rs…

X AI KOLs Timeline Tools

Summary

A modular, high-performance Rust implementation of GraphRAG (Graph-based Retrieval Augmented Generation) with support for WebGPU acceleration and three deployment architectures: server-only, WASM-only (client-side), and hybrid.

Modular GraphRAG implementation in Rust with WebGPU acceleration support. https://t.co/spRsBfxNX7 https://t.co/E6ywynpCYa
Original Article
View Cached Full Text

Cached at: 06/28/26, 02:10 PM

Modular GraphRAG implementation in Rust with WebGPU acceleration support.

https://t.co/spRsBfxNX7 https://t.co/E6ywynpCYa


automataIA/graphrag-rs

Source: https://github.com/automataIA/graphrag-rs

GraphRAG-rs

GraphRAG Network Visualization

A high-performance, modular Rust implementation of GraphRAG (Graph-based Retrieval Augmented Generation) with three deployment architectures: Server-Only, WASM-Only (100% client-side), and Hybrid. Build knowledge graphs from documents and query them with natural language, with GPU acceleration support via WebGPU.

Rust License WASM WebGPU Docs

30-Second Quick Start

CLI (no config file needed):

cargo install --path graphrag-cli           # one-time install
graphrag index ./mydoc.txt                  # builds ./graphrag-data
graphrag ask "What is the main topic?"      # answers from the graph

Add --ollama to either command for LLM-quality entity extraction (requires ollama serve running locally).

Library (Rust):

use graphrag::GraphRAG;

#[tokio::main]
async fn main() -> anyhow::Result<()> {
    let mut g = GraphRAG::quick_start("Plato's Symposium full text here...").await?;
    println!("{}", g.ask("Who is Diotima?").await?);
    Ok(())
}

Both flows use sensible defaults — hash-fallback embeddings, pattern-based entity extraction, persistent workspace. Opt into Ollama / GLiNER / custom chunking with the builder when you need more.

Prerequisites

System Requirements

  • Rust 1.85+ with wasm32-unknown-unknown target
  • Node.js 18+ (for WASM builds)
  • Git for cloning

Platform-Specific Dependencies

Linux (Ubuntu/Debian)

# Basic build tools
sudo apt update
sudo apt install -y build-essential pkg-config

# For GPU acceleration features (Metal/WebGPU dependencies)
sudo apt install -y gobjc gnustep-devel libgnustep-base-dev

# Optional: For Qdrant vector database
docker-compose   # For containerized vector storage

macOS

# Xcode Command Line Tools (includes Objective-C compiler)
xcode-select --install

# Optional: Homebrew for additional tools
brew install rustup

Windows

# Install Visual Studio Build Tools with C++ support
# Or use Visual Studio Community with C++ development tools

# Install Rust with Windows target support
rustup target add wasm32-unknown-unknown

Optional Dependencies

  • Ollama for local LLM embeddings: ollama pull nomic-embed-text
  • Docker for Qdrant vector database: docker-compose up -d
  • Trunk for WASM builds: cargo install trunk wasm-bindgen-cli

Deployment Options

GraphRAG-rs supports three deployment architectures - choose based on your needs:

Option 1: Server-Only (Traditional) ✅ Production Ready

git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs

# Start Qdrant (optional)
cd graphrag-server && docker-compose up -d

# Start Ollama for embeddings (required for real semantic search)
ollama serve &
ollama pull nomic-embed-text

# Start GraphRAG server with real embeddings
export EMBEDDING_BACKEND=ollama
cargo run --release --bin graphrag-server --features "qdrant,ollama"

Best for: Multi-tenant SaaS, mobile apps, GPU workloads, >1M documents

Features:

  • ✅ Qdrant vector database integration (production-ready)
  • ✅ Real embeddings via Ollama with GPU acceleration
  • ✅ Hash-based fallback embeddings (no dependencies)
  • ✅ REST API with semantic search
  • ✅ Docker Compose setup
  • ✅ 5.2MB release binary (optimized)

Option 2: WASM-Only (100% Client-Side) ✅ Production Ready

# Install trunk for WASM builds
cargo install trunk wasm-bindgen-cli

# Build and run WASM app with GPU acceleration
cd graphrag-wasm
trunk serve --open

Best for: Privacy-first apps, offline tools, zero infrastructure cost, edge deployment

Status: Fully Functional!

  • ✅ Complete GraphRAG pipeline running in browser
  • ✅ ONNX Runtime Web (GPU-accelerated embeddings)
  • ✅ WebLLM integration (Phi-3-mini for LLM synthesis)
  • ✅ Pure Rust vector search (cosine similarity)
  • ✅ Full Leptos UI with document upload and query interface
  • ✅ Entity extraction with relationships
  • ✅ Natural language answer synthesis
  • ✅ Demo available: Plato’s Symposium (2691 entities)

Option 3: Hybrid (Recommended) Planned

Use WASM client for real-time UI with optional server for heavy processing.

Best for: Enterprise apps, multi-device sync, best UX with scalability

Status: Architecture designed, implementation in Phase 3

See graphrag-server/README.md for server documentation.

State-of-the-Art Quality Improvements

GraphRAG-rs implements 5 cutting-edge research papers (2019-2025) for superior retrieval quality:

Research-Based Features

FeatureImpactPaperStatus
LightRAG Dual-Level Retrieval6000x token reductionEMNLP 2025✅ Production
Leiden Community Detection+15% modularitySci Reports 2019✅ Production
Cross-Encoder Reranking+20% accuracyEMNLP 2019✅ Production
HippoRAG Personalized PageRank10-30x cheaperNeurIPS 2024✅ Production
Semantic ChunkingBetter boundariesLangChain 2024✅ Production

Combined Result: +20% accuracy with 99% cost savings!

New: Advanced Reasoning & Optimization (2025-2026)

Building on state-of-the-art foundations, GraphRAG-rs now implements 7 cutting-edge techniques from recent research:

PhaseFeatureImpactStatus
Phase 2Symbolic Anchoring (CatRAG-style)Better conceptual queries✅ Complete
Phase 2Dynamic Edge WeightingContext-aware ranking✅ Complete
Phase 2Causal Chain AnalysisMulti-step reasoning✅ Complete
Phase 3Hierarchical Relationship ClusteringMulti-level organization✅ Complete
Phase 3Graph Weight Optimization (DW-GRPO)Adaptive learning✅ Complete

Key Capabilities

  • Symbolic Anchoring: Automatically grounds abstract concepts (like “love” or “justice”) to concrete entities for better conceptual query handling
  • Dynamic Weighting: Adjusts relationship importance based on query context using semantic, temporal, and causal signals
  • Causal Reasoning: Discovers multi-step causal chains with temporal consistency validation
  • Hierarchical Clustering: Organizes relationships into multi-level hierarchies using Leiden algorithm with LLM-generated summaries
  • Weight Optimization: Learns optimal relationship weights through heuristic optimization for improved retrieval quality

Full Documentation: See HOW_IT_WORKS.md for the pipeline deep-dive, and docs.rs/graphrag-core for the API reference.

Enable Advanced Features

[dependencies]
graphrag-core = { path = "../graphrag-core", features = ["lightrag", "leiden", "cross-encoder", "pagerank", "async"] }
# my_config.toml
[enhancements]
enabled = true

[enhancements.lightrag]
enabled = true
max_keywords = 20           # 6000x token reduction vs traditional GraphRAG
high_level_weight = 0.6
low_level_weight = 0.4

[enhancements.leiden]
enabled = true
max_cluster_size = 10       # Better quality than Louvain
resolution = 1.0

[enhancements.cross_encoder]
enabled = true
model_name = "cross-encoder/ms-marco-MiniLM-L-6-v2"
top_k = 10                  # +20% accuracy improvement

# Advanced Features (Phases 2-3)
[advanced_features.symbolic_anchoring]
min_relevance = 0.3         # Minimum relevance for concept anchors
max_anchors = 5             # Maximum anchors per query

[advanced_features.dynamic_weighting]
enable_semantic_boost = true    # Boost relationships similar to query
enable_temporal_boost = true    # Boost recent/relevant relationships
enable_causal_boost = true      # Boost strong causal relationships

[advanced_features.causal_analysis]
min_confidence = 0.3            # Minimum confidence for causal chains
max_chain_depth = 5             # Maximum chain depth to search
require_temporal_consistency = true  # Enforce chronological ordering

[advanced_features.hierarchical_clustering]
num_levels = 3                  # Number of hierarchy levels (2-5)
generate_summaries = true       # LLM-generated cluster summaries

[advanced_features.weight_optimization]
learning_rate = 0.05            # Learning rate for optimization
max_iterations = 20             # Maximum optimization iterations
use_llm_eval = true             # Use LLM for quality evaluation

Quick Start Example: See graphrag-core/config-examples/quick-start.toml for a minimal configuration.

Documentation: See HOW_IT_WORKS.md for full details on the pipeline.

Installation

Prerequisites

  • Rust 1.85 or later
  • (Optional) Ollama for local LLM support - Install Ollama

From Source

git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs
cargo build --release

# Optional: Install globally
cargo install --path .

Quick Start (5 Lines!)

The fastest way to get started with GraphRAG:

use graphrag_core::prelude::*;

#[tokio::main]
async fn main() -> Result<()> {
    let mut graphrag = GraphRAG::quick_start("Your document text").await?;
    let answer = graphrag.ask("What is this about?").await?;
    println!("{}", answer);
    Ok(())
}

With Compile-Time Safety (TypedBuilder)

use graphrag_core::prelude::*;

let graphrag = TypedBuilder::new()
    .with_output_dir("./output")  // Required - won't compile without
    .with_ollama()                 // Required - choose your LLM backend
    .with_chunk_size(512)          // Optional
    .build_and_init()?;

Get Explained Answers

let explained = graphrag.ask_explained("Who founded the company?").await?;
println!("Answer: {}", explained.answer);
println!("Confidence: {:.0}%", explained.confidence * 100.0);
for source in &explained.sources {
    println!("Source: {} (relevance: {:.0}%)", source.id, source.relevance_score * 100.0);
}

CLI Setup Wizard

# Interactive configuration wizard
graphrag-cli setup

# With domain template
graphrag-cli setup --template legal

Feature Bundles

Choose the right features for your use case:

[dependencies]
graphrag-core = { version = "0.1", features = ["starter"] }   # Getting started
graphrag-core = { version = "0.1", features = ["full"] }      # Production
graphrag-core = { version = "0.1", features = ["research"] }  # Advanced

Full Guide: See HOW_IT_WORKS.md and graphrag-core/README.md for detailed getting-started documentation.

Basic Usage

1. Simple API (One Line)

use graphrag_rs::simple;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let answer = simple::answer("Your document text", "Your question")?;
    println!("Answer: {}", answer);
    Ok(())
}

2. Stateful API (Multiple Queries)

use graphrag_rs::easy::SimpleGraphRAG;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graph = SimpleGraphRAG::from_text("Your document text")?;

    let answer1 = graph.ask("What is this about?")?;
    let answer2 = graph.ask("Who are the main characters?")?;

    println!("Answer 1: {}", answer1);
    println!("Answer 2: {}", answer2);
    Ok(())
}

3. Builder API (Configurable)

use graphrag_rs::{GraphRAG, ConfigPreset};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::builder()
        .with_preset(ConfigPreset::Balanced)
        .auto_detect_llm()
        .build()?;

    graphrag.add_document("Your document")?;
    let answer = graphrag.ask("Your question")?;

    println!("Answer: {}", answer);
    Ok(())
}

Understanding GraphRAG

New to GraphRAG? Start here:

  • How It Works - Complete 7-stage pipeline explanation with diagrams and examples
  • Config Guide - Full JSON5/TOML configuration reference
  • Examples - Hands-on code examples from basic to advanced
  • Changelog - Feature history and recent updates
  • API reference - graphrag-core on docs.rs

Complete 7-Stage Pipeline Schema

INDEXING (build_graph())
├── Phase 1: CHUNKING          → chunk_size, chunk_overlap
├── Phase 2: ENTITY EXTRACTION → approach, entity_types, use_gleaning
├── Phase 3: RELATIONSHIP      → extract_relationships, use_gleaning
└── Phase 4: GRAPH CONSTRUCTION → enable_pagerank, max_connections

QUERY (ask())
├── Phase 5: EMBEDDING         → backend, dimension, model
├── Phase 6: RETRIEVAL         → strategy, top_k
└── Phase 7: ANSWER GENERATION → chat_model, temperature

Pipeline Configuration Summary

PhaseGoalKey Parameters
1. ChunkingSplit textchunk_size (300), chunk_overlap (30)
2. ExtractionIdentify entitiesapproach (hybrid), entity_types
3. RelationshipsConnect entitiesextract_relationships (true)
4. GraphBuild networkmax_connections (50), enable_pagerank
5. EmbeddingVectorize databackend (openai), dimension (1536)
6. RetrievalFind contextstrategy (hybrid), top_k (10)
7. GenerationAnswer querychat_model (gpt-4o), temperature (0.0)

See HOW_IT_WORKS.md and config/JSON5_CONFIG_GUIDE.md for detailed configuration and performance tuning.

4. CLI Usage

GraphRAG-rs provides two CLI tools:

Smart CLI (Recommended) - simple_cli

Automatically detects if the knowledge graph needs building and handles everything for you:

# Build the Smart CLI
cargo build --release --bin simple_cli

# Process document and answer question in one command
cargo run --bin simple_cli config.toml "What are the main themes?"

# Interactive mode - builds graph if needed, then waits for questions
cargo run --bin simple_cli config.toml

# How it works:
# 1. Loads your TOML configuration
# 2. Checks if knowledge graph exists
# 3. Builds graph if needed (shows progress)
# 4. Answers your question using Ollama
# 5. Saves results to output directory

Manual CLI - graphrag-rs

For advanced users who want full control:

# Build the manual CLI
cargo build --release

# Step 1: Build knowledge graph
./target/release/graphrag-rs config.toml build

# Step 2: Query the graph
./target/release/graphrag-rs config.toml query "Your question"

Configuration

Basic Configuration (config.toml)

The project includes several ready-to-use configuration templates:

Available Templates:

  • config.toml - Basic configuration for general use
  • config_complete.toml - Full configuration with all options
  • config_tom_sawyer.toml - Pre-configured for book processing
  • config_example.toml - Annotated template with explanations

Essential Configuration Fields:

[general]
# IMPORTANT: Change these two paths for your project!
input_document_path = "path/to/your/document.txt"  # Your document to process
output_dir = "./output/your_project"                # Where to save results

[pipeline]
chunk_size = 800        # Size of text chunks (adjust based on document type)
chunk_overlap = 200     # Overlap to preserve context between chunks

[ollama]
enabled = true
host = "http://localhost"
port = 11434
chat_model = "llama3.1:8b"           # LLM for text generation
embedding_model = "nomic-embed-text"  # Model for embeddings

Quick Setup:

  1. Copy a template: cp config_complete.toml my_project.toml
  2. Edit input_document_path to point to your document
  3. Edit output_dir to set where results are saved
  4. Run: cargo run --bin simple_cli my_project.toml

See config_example.toml for detailed explanations of all options.

Embedding Providers Configuration

GraphRAG Core supports 8 embedding providers for maximum flexibility:

[embeddings]
backend = "huggingface"  # Free, offline (default)
# backend = "openai"     # Best quality ($0.13/1M tokens)
# backend = "voyage"     # Anthropic recommended
# backend = "cohere"     # Multilingual (100+ languages)
# backend = "jina"       # Cost-optimized ($0.02/1M)
# backend = "mistral"    # RAG-optimized
# backend = "together"   # Cheapest ($0.008/1M)
# backend = "ollama"     # Local GPU

model = "sentence-transformers/all-MiniLM-L6-v2"
dimension = 384
batch_size = 32
cache_dir = "~/.cache/huggingface"

# For API providers, set api_key or use environment variables
# api_key = "your-key"  # Or set OPENAI_API_KEY, VOYAGE_API_KEY, etc.

Provider Comparison:

ProviderCostQualityFeatures
HuggingFaceFree★★★★Offline, 100+ models
OpenAI$0.13/1M★★★★★Best quality
Voyage AIMedium★★★★★Domain-specific (code, finance, law)
Cohere$0.10/1M★★★★Multilingual
Jina AI$0.02/1M★★★★Best price/performance
Mistral$0.10/1M★★★★RAG-optimized
Together AI$0.008/1M★★★★Cheapest
OllamaFree★★★★Local GPU

Environment Variables:

export OPENAI_API_KEY="sk-..."
export VOYAGE_API_KEY="pa-..."
export COHERE_API_KEY="..."
export JINA_API_KEY="jina_..."
export MISTRAL_API_KEY="..."
export TOGETHER_API_KEY="..."

See HOW_IT_WORKS.md (embeddings section) and config/JSON5_CONFIG_GUIDE.md for detailed configuration.

Core Features

Modular Architecture

  • Workspace Design: Separate crates for core, WASM, Leptos, and server
  • Pluggable Backends: Qdrant, LanceDB, pgvector, or in-memory storage
  • Feature Flags: Compile only what you need (WASM, CUDA, Metal, WebGPU)
  • Trait-Based: 12+ core abstractions for maximum flexibility

Trait-Based Chunking Architecture

  • ChunkingStrategy Trait: Minimal interface for extensible chunking (1 method: fn chunk(&self, text: &str) -> Vec<TextChunk>)
  • HierarchicalChunkingStrategy: LangChain-style with boundary preservation (respects paragraphs/sentences)
  • Tree-sitter AST Chunking: cAST approach preserving syntactic boundaries for code
  • Performance Optimized: Zero-cost abstraction with real implementations
  • Example: Symposium analysis with 269 chunks preserving philosophical structure

cAST (Context-Aware Splitting) Implementation

Based on CMU research, our tree-sitter implementation provides:

  • Syntactic Boundary Preservation: Complete functions, methods, structs
  • Rust Support: AST parsing for proper code chunking
  • Configurable Granularity: Function-level with minimum size controls
  • Feature-Gated: Available with --features code-chunking

Usage Example

use graphrag_core::{
    core::{DocumentId, Document, ChunkingStrategy},
    text::{TextProcessor, HierarchicalChunkingStrategy},
};

// Trait-based chunking with hierarchical strategy
let processor = TextProcessor::new(1000, 100)?;
let strategy = HierarchicalChunkingStrategy::new(1000, 100, document.id);
let chunks = processor.chunk_with_strategy(&document, &strategy)?;

// Tree-sitter code chunking (with code-chunking feature)
#[cfg(feature = "code-chunking")]
{
    let code_strategy = RustCodeChunkingStrategy::new(50, document_id);
    let code_chunks = code_strategy.chunk(rust_code);
}

Run the Complete Example

# Basic example (hierarchical chunking)
cargo run --example symposium_trait_based_chunking --package graphrag-core

# With tree-sitter code chunking
cargo run --example symposium_trait_based_chunking --package graphrag-core --features code-chunking

See: graphrag-core/examples/symposium_trait_based_chunking.rs and README_symposium_trait_based_chunking.md for complete documentation.

Storage Options

Native Production

  • Qdrant: High-performance vector DB with JSON payload for entities/relationships
  • LanceDB: Embedded vector DB for edge deployments (Node.js/desktop only)
  • pgvector: PostgreSQL integration for existing infrastructure
  • Neo4j: Optional graph database for complex multi-hop queries (>100k entities)

WASM Browser

  • Voy: 75KB pure Rust vector search with k-d tree algorithm
  • IndexedDB: Browser-native persistent storage for graph data
  • Cache API: PWA-standard storage for ML models (1.6GB)

ML Inference

Embeddings

  • ONNX Runtime Web (GPU): 25-40x speedup, 3-8ms inference, WebGPU + CPU fallback, ✅ production-ready
  • Burn + wgpu (GPU): 20-40x speedup, 100% Rust, 70% complete (architecture done)
  • Candle (CPU): 100% Rust, BERT/MiniLM models, 50-100ms, planned
  • Ollama: Server-side embeddings with GPU acceleration

LLM Chatbot

  • WebLLM: 40-62 tok/s with WebGPU, production-ready
  • Candle: 2-5 tok/s CPU-only, 100% Rust, good for demos
  • Ollama: Server-side LLM with unlimited GPU power

Performance

  • ONNX Runtime Web: 25-40x speedup for embeddings, 3-8ms inference ✅ production-ready
  • WebGPU Acceleration: GPU inference in browser with automatic CPU fallback
  • WebLLM: 40-62 tok/s LLM inference with WebGPU ✅ production-ready
  • LightRAG Integration: 6000x token reduction vs traditional GraphRAG
  • PageRank Retrieval: Fast-GraphRAG with 6x cost reduction
  • Parallel Processing: Async/await throughout, concurrent document processing
  • Intelligent Caching: LLM response cache with 80%+ hit rates

Developer Experience

  • Progressive API: 4 complexity levels (Simple → Easy → Builder → Advanced)
  • Auto-Detection: Smart LLM/backend discovery
  • Enhanced Errors: Actionable error messages with solutions
  • TOML Config: Complete configuration-driven processing
  • Hot Reload: Configuration changes without restart

Examples

Quick Example: Using Config Templates

# Example 1: Process a book using existing template
cp config_tom_sawyer.toml my_book_config.toml
# Edit my_book_config.toml:
#   input_document_path = "books/my_book.txt"
#   output_dir = "./output/my_book"
cargo run --bin simple_cli my_book_config.toml "Who are the main characters?"

# Example 2: Process a research paper
cp config.toml research_config.toml
# Edit research_config.toml:
#   input_document_path = "papers/research.txt"
#   output_dir = "./output/research"
#   chunk_size = 500  # Smaller chunks for technical content
cargo run --bin simple_cli research_config.toml "What is the main hypothesis?"

# Example 3: Process with full configuration
cp config_complete.toml advanced_config.toml
# Edit all the parameters you need in advanced_config.toml
cargo run --bin simple_cli advanced_config.toml

Process a Book

use graphrag_rs::{GraphRAG, Document};
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Read document
    let content = fs::read_to_string("book.txt")?;

    // Create and configure GraphRAG
    let mut graphrag = GraphRAG::builder()
        .with_chunk_size(1000)
        .with_chunk_overlap(200)
        .build()?;

    // Process document
    let doc = Document::new("book", content);
    graphrag.add_document(doc)?;

    // Query
    let answer = graphrag.ask("What are the main themes?")?;
    println!("Answer: {}", answer);

    Ok(())
}

Use with Ollama

use graphrag_rs::{GraphRAG, OllamaConfig};

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Configure Ollama
    let ollama = OllamaConfig::new()
        .with_model("llama3.1:8b")
        .with_embedding_model("nomic-embed-text");

    // Create GraphRAG with Ollama
    let mut graphrag = GraphRAG::builder()
        .with_llm(ollama)
        .build()?;

    // Use as normal
    graphrag.add_text("Your document")?;
    let answer = graphrag.ask("Your question")?;

    Ok(())
}

Batch Processing

use graphrag_rs::GraphRAG;
use std::fs;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let mut graphrag = GraphRAG::new_default()?;

    // Process multiple documents
    for file in ["doc1.txt", "doc2.txt", "doc3.txt"] {
        let content = fs::read_to_string(file)?;
        graphrag.add_text(&content)?;
    }

    // Query across all documents
    let answer = graphrag.ask("What connects these documents?")?;
    println!("Answer: {}", answer);

    Ok(())
}

Technical Achievements

GraphRAG-rs implements cutting-edge 2024 research in retrieval-augmented generation:

Core Innovations

  • Fast-GraphRAG: PageRank-based retrieval with 27x performance boost and 6x cost reduction
  • LightRAG Integration: Dual-level retrieval achieving 6000x token reduction vs traditional GraphRAG
  • Incremental Updates: Zero-downtime real-time graph processing with ACID-like guarantees
  • Intelligent Caching: LLM response cache with 80%+ hit rates and 6x cost reduction
  • Hybrid Retrieval: Combines semantic, keyword, BM25, and graph-based search strategies
  • ROGRAG Decomposition: Advanced query decomposition with 60%→75% accuracy boost, temporal and causal reasoning
  • Ollama Advanced Integration: Complete local LLM support with streaming, custom parameters, automatic caching, and metrics tracking

Ollama Integration (NEW! )

Complete local LLM and embedding support with production-grade features:

Core Capabilities:

  • Streaming Responses: Real-time token generation with tokio channels
  • Custom Parameters: Fine-grained control (temperature, top_p, top_k, stop sequences, repeat penalty)
  • Automatic Caching: DashMap-based response caching with 80%+ hit rate
  • Metrics Tracking: Thread-safe request/success/failure counting with atomic operations
  • Service Registry: Type-safe dependency injection for all Ollama services
  • AsyncEmbedder Trait: Full async/await support for embeddings
  • AsyncLanguageModel Trait: Standardized LLM interface with streaming

Performance:

  • Cache hit: <1ms vs 100-1000ms API calls
  • Concurrent request handling with Arc-based sharing
  • Zero-copy streaming with channel-based architecture
  • GPU acceleration via Ollama (CUDA, ROCm, Metal)

Example:

use graphrag_core::core::ServiceConfig;

let config = ServiceConfig {
    ollama_base_url: Some("http://localhost:11434".to_string()),
    embedding_model: Some("nomic-embed-text:latest".to_string()),
    language_model: Some("llama3.2:latest".to_string()),
    vector_dimension: Some(768),
    ..Default::default()
};

let registry = config.build_registry().build();
// All services configured and ready!

See HOW_IT_WORKS.md for the LLM/Ollama pipeline and config/JSON5_CONFIG_GUIDE.md for the ollama config block.

Architecture & Quality

  • Modular Workspace: 4 publishable crates (core, wasm, leptos, server)
  • Trait-Based Architecture: 15+ core abstractions with dependency injection
  • 50,000+ Lines: Production-quality Rust implementation
  • Comprehensive Testing: 220+ test cases with 100% pass rate
  • Production-Grade Logging: Structured tracing throughout core library
  • Zero Warnings: Clean compilation with clippy and cargo check
  • Feature Gates: Compile only what you need for minimal binary size
  • Memory-Safe: Leverages Rust’s ownership system for zero-cost abstractions

Workspace Architecture

GraphRAG-rs uses a modular workspace design for maximum reusability:

graphrag-rs/                     # 5-crate Cargo workspace (~140k lines)
├── graphrag-core/               # ✅ Portable core library (native + WASM)
│   ├── All core functionality   # LightRAG, PageRank, caching, incremental
│   └── Feature-gated deps       # Compile only what you need
├── graphrag-cli/                # ✅ TUI (ratatui) + CLI binary (in-process core)
│   └── index / ask / setup      # Zero-config turnkey commands
├── graphrag-wasm/               # ✅ WASM bindings, browser-native chat shell
│   ├── ONNX Runtime Web         # GPU embeddings (off-main-thread)
│   ├── WebLLM integration       # In-browser LLM synthesis
│   └── IndexedDB + Cache API    # Browser storage / persistence
├── graphrag-server/             # ✅ Production REST API (Actix + Apistos)
│   ├── JSON configuration       # Dynamic config via REST API
│   ├── Qdrant integration       # Vector database
│   ├── Ollama embeddings        # Real semantic search
│   └── Docker Compose           # One-command deployment
└── graphrag/                    # ✅ Meta-crate re-exporting graphrag-core
    └── hello-world API          # `use graphrag::GraphRAG;`

Dependency Graph

graphrag-cli    → graphrag-core
graphrag-wasm   → graphrag-core
graphrag-server → graphrag-core
graphrag (meta) → graphrag-core

Feature Flags

[features]
# Storage backends
memory-storage = []                           # In-memory (development)
persistent-storage = ["lancedb", "arrow"]     # LanceDB embedded vector DB Mutually exclusive with neural-embeddings
redis-storage = ["redis"]                     # Redis for distributed caching

# Processing features
parallel-processing = []                      # Rayon parallelization
caching = ["moka"]                           # LLM response caching
incremental = []                             # Zero-downtime updates
pagerank = []                                # Fast-GraphRAG retrieval
lightrag = []                                # Dual-level retrieval
rograg = []                                  # Query decomposition

# LLM integrations
ollama = []                                  # Ollama local models with streaming
dashmap = ["dep:dashmap"]                    # Response caching (used with ollama)
neural-embeddings = ["candle-core"]          # Candle ML framework Mutually exclusive with persistent-storage
function-calling = []                        # Function calling support

# Platform-specific (GPU acceleration)
cuda = ["neural-embeddings", "candle-core/cuda"]    # NVIDIA GPU
metal = ["neural-embeddings", "candle-core/metal"]  # Apple Silicon GPU
webgpu = ["burn/wgpu"]                              # WebGPU (WASM)

# Chunking strategies
code-chunking = ["tree-sitter", "tree-sitter-rust"]  # Tree-sitter AST-based chunking

# API & CLI
web-api = []                                 # REST API server

Important: Feature Compatibility

  • persistent-storage and neural-embeddings are mutually exclusive due to dependency conflicts
  • Choose based on your use case:
    • For production RAG with vector storage: Use persistent-storage (LanceDB + qdrant)
    • For ML experiments with neural nets: Use neural-embeddings (Candle + qdrant)
    • For development: Use neither (minimal dependencies)

See the feature flags section above for technical details on dependency selection.

For detailed architecture, see HOW_IT_WORKS.md.

API Reference

Core Types

// Main GraphRAG interface
pub struct GraphRAG { /* ... */ }

// Document representation
pub struct Document {
    pub id: String,
    pub content: String,
    pub metadata: HashMap<String, String>,
}

// Query results
pub struct QueryResult {
    pub answer: String,
    pub confidence: f32,
    pub sources: Vec<String>,
}

Main Methods

impl GraphRAG {
    // Create new instance
    pub fn new(config: Config) -> Result<Self>;

    // Add content
    pub fn add_document(&mut self, doc: Document) -> Result<()>;
    pub fn add_text(&mut self, text: &str) -> Result<()>;

    // Query
    pub fn ask(&self, question: &str) -> Result<String>;
    pub fn query(&self, question: &str) -> Result<QueryResult>;

    // Management
    pub fn clear(&mut self);
    pub fn save(&self, path: &str) -> Result<()>;
    pub fn load(&mut self, path: &str) -> Result<()>;
}

Performance Tuning

Memory Optimization

[performance]
chunk_size = 500  # Smaller chunks use less memory
max_entities_per_chunk = 10
enable_caching = false

Speed Optimization

[performance]
enable_parallel = true
num_threads = 8  # Adjust based on CPU cores
batch_size = 50

Accuracy Optimization

[pipeline]
chunk_overlap = 400  # Higher overlap preserves more context
min_confidence = 0.7
enable_reranking = true

Troubleshooting

Common Issues

Build fails with “rust version” error

# Update Rust
rustup update

Out of memory error

# Reduce chunk size in config.toml
chunk_size = 300
enable_parallel = false

Slow processing

# Enable parallel processing
enable_parallel = true
num_threads = 8

Ollama connection error

# Ensure Ollama is running
ollama serve

# Check if model is available
ollama list

Debug Mode

# Enable debug logging
RUST_LOG=debug cargo run --bin simple_cli config.toml

# Enable backtrace for errors
RUST_BACKTRACE=1 cargo run

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Development Setup

# Clone repository
git clone https://github.com/your-username/graphrag-rs.git
cd graphrag-rs

# Run tests
cargo test

# Run with debug info
RUST_LOG=debug cargo run

# Check code quality
cargo clippy
cargo fmt --check

FAQ

Q: What file formats are supported? A: Currently supports plain text (.txt) and markdown (.md). PDF support is planned.

Q: Can I use this without Ollama? A: Yes, the library includes a mock LLM for testing and can work with embeddings only.

Q: How much memory does it need? A: Typically under 100MB for documents up to 500k characters.

Q: Is it production ready? A: Yes, with 214 passing tests, zero warnings, and production-grade structured logging throughout the core library.

Q: Can I use commercial LLMs? A: OpenAI support is planned. Currently works with Ollama’s local models.

Roadmap & Implementation Status

✅ Phase 1: Core Implementation (COMPLETE)

Native Backend - Production Ready:

  • Modular Architecture: 50,000+ lines across 25+ modules
  • Trait System: 15+ core abstractions with dependency injection
  • Fast-GraphRAG: PageRank-based retrieval (27x performance boost)
  • LightRAG: Dual-level retrieval (6000x token reduction)
  • Incremental Updates: Zero-downtime graph processing
  • Intelligent Caching: 80%+ hit rates, 6x cost reduction
  • ROGRAG: Query decomposition (60%→75% accuracy) + temporal/causal reasoning
  • Hybrid Retrieval: Semantic + keyword + BM25 + graph
  • Parallel Processing: Multi-threaded document processing
  • Configuration System: Complete TOML-driven pipeline
  • Professional CLI: Progress bars, auto-detection
  • Comprehensive Tests: 214+ test cases, 100% pass rate
  • Production Logging: Structured tracing throughout core library

Server Deployment - Production Ready:

  • graphrag-server: REST API with Actix-web 4.9 + Apistos (automatic OpenAPI 3.0.3 docs)
  • Dynamic JSON Config: Full pipeline configuration via REST API (no TOML required)
  • Qdrant Integration: Production vector database
  • Ollama Embeddings: Real semantic search with GPU
  • Hash-based Fallback: Zero-dependency mode
  • Docker Compose: One-command deployment
  • Health Checks: Full system monitoring
  • 5.2MB Binary: Optimized release build

Phase 2: WASM & Web UI (IN PROGRESS - 60% Complete)

WASM Infrastructure:

  • graphrag-wasm crate: WASM bindings foundation
  • ONNX Runtime Web: GPU embeddings (3-8ms, 25-40x speedup)
  • WebLLM Integration: GPU LLM (40-62 tok/s)
  • IndexedDB: Browser storage layer
  • Cache API: Model storage layer
  • Voy Bindings: Vector search preparation
  • Burn + wgpu: GPU acceleration (architecture 70% complete)
  • Integration Tests: End-to-end WASM testing

Web UI (graphrag-wasm):

  • Browser-native chat shell: 3-column Nordic-Minimal UI (Leptos)
  • Citations + subgraph view: per-query references and SVG graph
  • Off-main-thread inference: ONNX Runtime Web + WebLLM workers
  • Graph Visualization: richer interactive knowledge-graph display
  • Progress Indicators: Real-time status updates
  • Responsive Design: Mobile-first layout

Phase 3: Advanced Features (PLANNED)

Performance & Scale:

  • Distributed caching with Redis
  • OpenTelemetry monitoring and tracing
  • Query intelligence with ML rewriting
  • Multi-model embeddings support
  • Batch processing optimizations

Analytics & Insights:

  • Graph analytics (community detection, centrality)
  • Entity clustering and relationships
  • Temporal reasoning: Event timeline extraction and narrative ordering
  • Causal reasoning: Cause-effect chain discovery with confidence ranking
  • Quality metrics and confidence scoring

Data Integration:

  • Bulk import from CSV, JSON, RDF
  • PDF document processing
  • Multi-format export (GraphML, Cypher)
  • Integration connectors (Notion, Confluence)

Phase 4: Enterprise Features (FUTURE)

Scalability:

  • High availability and failover
  • Horizontal scaling with load balancing
  • Multi-region deployment
  • Enterprise-grade security

Developer Experience:

  • Multi-language SDKs (Python, TypeScript, Go)
  • GraphQL API
  • Custom plugin system
  • Webhook integrations

License

MIT License - see LICENSE for details.

Acknowledgments

  • Microsoft GraphRAG for the original concept
  • Ollama for local LLM support
  • Rust community for excellent libraries

Built with Rust | Documentation | Report Issues

Similar Articles

LightRAG: Simple and Fast Retrieval-Augmented Generation

Papers with Code Trending

The article introduces LightRAG, an open-source framework that enhances Retrieval-Augmented Generation by integrating graph structures for improved contextual awareness and efficient information retrieval.