BaryGraph - knowledge graph where every relationship is its own embedded document (not an edge) [R]

Reddit r/MachineLearning 07/04/26, 08:24 AM Papers

knowledge-graph embedding vector-search mongodb graph-database nlp open-source

Summary

BaryGraph introduces a novel knowledge graph where every relationship is a first-class embedded document (BaryEdge) rather than an edge between nodes, enabling recursive abstraction triads that surface structural bridges between distant concepts. The preprint includes benchmarks showing structural metrics correlate with human similarity judgments better than cosine similarity alone.

Instead of node --edge--> node, every relationship is a first-class document with its own vector, called a BaryEdge. Stack pairs of BaryEdges recursively and you get "MetaBary" triads that surface structural bridges between concepts that live nowhere near each other in embedding space.

Running locally on MongoDB Community + mongot + nomic-embed-text over the full English Wiktionary (6.6M docs). MCP server is live if you want to poke at it.

Preprint + benchmark CSVs: https://zenodo.org/records/20186500

The problem I was chasing

Flat vector search treats a relationship as a byproduct of two points being close. That throws away information. Two papers can describe the same underlying phenomenon (a flyby anomaly in orbital mechanics, an anomalous residual in stellar dynamics) without ever citing each other and without their embeddings landing anywhere near each other. Nothing in standard RAG surfaces that connection.

What I did instead

Every relationship gets embedded too:

bary_vector = normalize(q·v(CM1) + q·v(CM2) + (1−q)·v(type))

q is connection quality, and v(type) is a contextual embedding of what kind of relationship it is. This BaryEdge is now a retrievable document in its own right, not metadata on an edge.

Then it recurses: two BaryEdges at the same level get bridged by a third one level below, forming a MetaBary triad. Do that repeatedly and you climb a hierarchy of abstraction triads built entirely from algebra, with zero additional embedding calls above the base level. It's a forest (every node has at most one parent), so traversal to root is a single $graphLookup with no cycle handling.

Does it actually do anything useful?

Ran it against SimLex-999 and WordSim-353 as a sanity check (not the main claim, just "is the substrate coherent"). Raw cosine similarity barely correlates with human similarity judgments (ρ ≈ −0.04 on SimLex).
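For anyone who wants to play with the bary_vector formula from "What I did instead" before touching the repo, here is a minimal sketch in plain Python. It is not the project's code. It assumes v_cm1 and v_cm2 are the vectors of the two items being related, that q lies in [0, 1] (the post only calls it "connection quality"), and it uses made-up 3-dim toy vectors in place of the 768-dim nomic-embed-text embeddings.

```python
import math
from typing import List, Sequence


def normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit L2 length."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def bary_vector(
    v_cm1: Sequence[float],
    v_cm2: Sequence[float],
    v_type: Sequence[float],
    q: float,
) -> List[float]:
    """Compute normalize(q*v(CM1) + q*v(CM2) + (1-q)*v(type))."""
    if not 0.0 <= q <= 1.0:  # assumption: the post does not state q's range
        raise ValueError("q is assumed to lie in [0, 1]")
    if not (len(v_cm1) == len(v_cm2) == len(v_type)):
        raise ValueError("all vectors must have the same dimension")
    combined = [
        q * a + q * b + (1.0 - q) * t
        for a, b, t in zip(v_cm1, v_cm2, v_type)
    ]
    return normalize(combined)


# Hypothetical toy vectors, one axis each, so the mixing is easy to read.
v_a = [1.0, 0.0, 0.0]    # stands in for v(CM1)
v_b = [0.0, 1.0, 0.0]    # stands in for v(CM2)
v_rel = [0.0, 0.0, 1.0]  # stands in for v(type)

edge = bary_vector(v_a, v_b, v_rel, q=0.5)     # endpoints and type weighted equally
edge_q1 = bary_vector(v_a, v_b, v_rel, q=1.0)  # type vector drops out entirely

print(edge)
print(edge_q1)
```

Reading the formula this way, q trades endpoint content against relationship type: at q = 1 the result is just the normalized sum of the two endpoint vectors, and at q = 0 it is the normalized type vector alone.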
Structural metrics (how many BaryEdges two words share, how much their relational neighborhoods overlap) correlate at ρ ≈ 0.32–0.53, p < 10⁻¹⁵. So the graph is encoding something cosine alone doesn't.

The part I actually care about is cross-domain bridging. Some probe traces from the live graph:

- octopus neuroscience ↔ distributed sensor networks: bridged by shared structural-motif vocabulary (neuroarchitecture, smartdust)
- collagen folding ↔ linguistic syntax: bridged by etymological + structural motif overlap (plicature / hypotaxis-parataxis)
- grief ↔ depression: not bridged, and this is a correctness demonstration, not a missing capability. Earlier DSM editions carried a "bereavement exclusion" (whose removal in DSM-5 was much debated) precisely because grief and depression share surface symptoms but are different kinds of state, with different prognosis and treatment
- radioactive decay ↔ obsolete words falling out of use: bridged at a high abstraction level by register-varied decay verbs (collapsed, decayed, declined, disintegrated), naming a Poisson-process state-loss pattern that both physics and historical linguistics instantiate, with no single word doing the work

That last one is the case flat retrieval structurally cannot produce: there's no embedding axis for "verbs co-occurring with reduction-of-state across unrelated domains."

Stack (all local, all free)

- GitHub: https://github.com/oleksiy-perepelytsya/bary-vector
- MongoDB Community Edition + mongot for storage/vector search
- nomic-embed-text, 768-dim
- Python 3.11+
- Full build: ~6.66M documents, 8–14 hrs on a single workstation (8–16GB VRAM)

Try it

MCP server is available on request (SSE transport), with read-only tools for searching the live graph: find_word, semantic_search, edge_info, leaf_nodes, traverse_up, sample_metabary. If you've got an MCP-capable client you can point it at the graph and run your own probe queries in a few minutes.
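Going back to the structural metrics from the benchmark section (shared BaryEdges and relational-neighborhood overlap), here is a rough sketch of how such numbers could be computed for a word pair. The exact definitions are in the preprint, not in this post, so treat everything here as an assumption: Jaccard is used as a stand-in for "overlap", and the edge IDs, neighbor sets, and function names are made up for illustration.

```python
from typing import Dict, Set


def shared_edge_count(edges_by_word: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of BaryEdge IDs that both words participate in."""
    return len(edges_by_word.get(a, set()) & edges_by_word.get(b, set()))


def neighborhood_overlap(neighbors_by_word: Dict[str, Set[str]], a: str, b: str) -> float:
    """Jaccard overlap of two words' relational neighborhoods (assumed metric)."""
    na = neighbors_by_word.get(a, set())
    nb = neighbors_by_word.get(b, set())
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


# Hypothetical toy data: which edges each word appears in, and its neighbors.
edges_by_word = {
    "decay": {"e1", "e2", "e3"},
    "decline": {"e2", "e3", "e4"},
    "octopus": {"e9"},
}
neighbors_by_word = {
    "decay": {"collapse", "decline", "rot"},
    "decline": {"collapse", "decay", "wane"},
}

shared = shared_edge_count(edges_by_word, "decay", "decline")
overlap = neighborhood_overlap(neighbors_by_word, "decay", "decline")
unrelated = shared_edge_count(edges_by_word, "decay", "octopus")

print(shared, overlap, unrelated)
```

Scores like these, computed per word pair, are what would then be rank-correlated (Spearman's ρ) against the human ratings in SimLex-999 or WordSim-353.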
What I'd actually want feedback on

- Whether the cross-domain bridges hold up to someone who isn't me poking at them. Try a probe query on a domain pair you know well and tell me if the bridge is real or if I'm pattern-matching myself into seeing structure that isn't there. Some bridges aren't obvious at first glance, but those are often the most intriguing ones and worth digging into to understand why they were built, so treat them as points of investigation.
- Whether this is worth comparing directly against GraphRAG/RAPTOR-style hierarchical retrieval (I haven't done that benchmark yet, and I know that's the first thing this sub will ask).
- Whether anyone's tried something structurally similar and it fell apart at scale for reasons I haven't hit yet.

Preprint, architecture spec, and the raw SimLex/WordSim CSVs are all here: https://zenodo.org/records/20186500

Happy to drop the MCP endpoint on request if there's interest.


