Tag
Describes a two-layer small LLM architecture: a local always-on agent (Raven) on an RTX5080 and an online reasoning stack (Trinity Cortex) with three small models and a knowledge graph, arguing that small models are better than large frontier models for graph-based reasoning.