flux-genotype is an open-source AI kernel that orchestrates local LLMs on CPU, allowing self-modification of its architecture via a MetaDesigner module.
**🧬 Flux‑Genotype – A CPU LLM that rewrites itself**

I've been working on an open-source kernel called **flux-genotype**. It orchestrates local models (TinyLlama, Llama 3.2, Hermes 3, DeepSeek-Coder) into a self-modifying ecosystem. Everything runs on the **CPU**: I tested it on a Xeon without AVX2 and with 20 GB of RAM.

> **Important:** this is an alpha. It works, it mutates, it evolves, but there's a lot of work ahead. The **MetaDesigner**, in particular, is the module I'm focusing on next. Right now it proposes architectural changes by writing new `.flux` files, but the validation and application pipeline needs to be more robust. The vision is to make it fully autonomous: an external architect that watches the ecosystem, diagnoses weaknesses, and rewrites the structure to improve confidence. It's not there yet, but the foundation is solid.

## How it works

1. You ask a question, and a fast model (TinyLlama) answers.
2. A judge model scores the answer from 0 to 1. Initially the judge was Llama 3.2.
3. If confidence drops below the golden-ratio threshold (1/φ ≈ 0.618), the ecosystem mutates its own structure.
4. A **MetaDesigner** (Hermes 3) writes new `.flux` architecture files, which are validated by a Lark parser and then applied.
5. The system tracks its confidence history with an exponential moving average (EMA) and adapts the sampling temperature dynamically.

## Real example of self‑modification

A mutation can also replace the Judge. During one growth cycle, the MetaDesigner proposed swapping the Judge from **Llama 3.2** to **DeepSeek-Coder 6.7B**. The new configuration was tested and scored better, so the ecosystem applied the change permanently. The system is not just tweaking parameters. It is rewriting its own **division of labor between models**.

## Why this is different

- It mutates its own architecture, not just model weights.
- It can replace its own Judge with a different model if performance improves.
- It has memory (a confidence history tracked with an exponential moving average).
- It uses a custom language (`.flux`) with a formal grammar, not YAML or JSON.
- It runs on modest hardware: no GPU, just a CPU and 20 GB of RAM.

## If you want to understand the architecture deeply

I wrote a **technical manifesto** that defines FLUX as a formal Architecture Description Language for self-evolving cognitive ecosystems. It covers the fractal design, the OODA loop, the role of the golden ratio, and the long-term vision (including the MetaDesigner). It's in the repo:

## The companion novel

There's also a novel called **"IF THIS IS A ROBOT"** (in Italian and English, CC BY-NC-SA 4.0). It tells the story of a guy who finds this kernel running on a forgotten server. The novel is essentially the kernel's manual, but the code stands on its own.

- The kernel is **MIT-licensed**. The novel is **CC BY-NC-SA 4.0**.

Happy to answer questions, and **open to collaborators** who want to help push the MetaDesigner forward.
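Steps 2, 3 and 5 of "How it works" describe a confidence loop: judge scores in [0, 1] feed an EMA, a mutation fires below the golden-ratio threshold 1/φ ≈ 0.618, and the temperature adapts. Below is a minimal Python sketch of that loop. It is not flux-genotype's code. The following are all hypothetical assumptions for illustration:

- the names `ConfidenceTracker` and `accept_mutation`
- the smoothing factor and the temperature bounds and step
- comparing the EMA (rather than the raw score) against the threshold
- the "accept if the mean score beats the baseline" rule

```python
import math
import statistics
from dataclasses import dataclass, field
from typing import List, Optional, Sequence

# Golden-ratio conjugate: 1/phi = phi - 1 ≈ 0.618, the threshold named in the post.
PHI = (1 + math.sqrt(5)) / 2
THRESHOLD = 1 / PHI


@dataclass
class ConfidenceTracker:
    """Hypothetical tracker: EMA of judge scores plus an adaptive temperature.

    alpha, the temperature bounds and the step size are illustrative
    assumptions, not values taken from flux-genotype.
    """
    alpha: float = 0.3         # EMA smoothing factor (assumed)
    temperature: float = 0.7   # starting sampling temperature (assumed)
    t_min: float = 0.2         # lower temperature bound (assumed)
    t_max: float = 1.2         # upper temperature bound (assumed)
    t_step: float = 0.1        # temperature change per update (assumed)
    ema: Optional[float] = None
    history: List[float] = field(default_factory=list)

    def update(self, score: float) -> bool:
        """Record one judge score in [0, 1]; return True if a mutation should fire."""
        if not 0.0 <= score <= 1.0:
            raise ValueError("judge score must lie in [0, 1]")
        self.history.append(score)
        # Standard EMA: ema_t = alpha * x_t + (1 - alpha) * ema_{t-1}, seeded with the first score.
        if self.ema is None:
            self.ema = score
        else:
            self.ema = self.alpha * score + (1 - self.alpha) * self.ema
        below = self.ema < THRESHOLD
        # Assumed policy: raise the temperature (explore) while confidence is low,
        # lower it (exploit) while confidence is high, clamped to [t_min, t_max].
        delta = self.t_step if below else -self.t_step
        self.temperature = min(self.t_max, max(self.t_min, self.temperature + delta))
        return below


def accept_mutation(baseline: float, trial_scores: Sequence[float]) -> bool:
    """Assumed rule: apply a proposed architecture only if its mean score beats the baseline."""
    return len(trial_scores) > 0 and statistics.fmean(trial_scores) > baseline


if __name__ == "__main__":
    tracker = ConfidenceTracker()
    for s in [0.9, 0.8, 0.4, 0.3, 0.2]:
        fire = tracker.update(s)
        print(f"score={s:.1f} ema={tracker.ema:.4f} "
              f"temp={tracker.temperature:.1f} mutate={fire}")
```

With these assumed settings, the scores 0.9, 0.8, 0.4, 0.3 and 0.2 first raise the mutation flag at the fourth score. The raw 0.4 is already below 0.618, but the EMA at that point (0.729) is not. This lag is the damping an EMA provides: a single weak answer does not trigger a rewrite of the architecture.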
AccelOpt is a self-improving LLM agentic system that autonomously optimizes AI accelerator kernels through iterative generation and an optimization memory. On AWS Trainium it raises the average percentage of peak throughput from 49% to 61%, and, using open-source models, it matches Claude Sonnet 4's kernel improvements while being 26x cheaper.
The author presents a custom, hackable ML compiler written in Python that lowers LLMs to optimized CUDA kernels through a multi-stage IR pipeline, achieving performance competitive with or superior to PyTorch on specific operations. The article details the compiler's optimization passes, lowering rules, and CLI usage for generating efficient fused GPU kernels.
KForge is a cross-platform framework that uses two collaborating LLM-based agents to automatically generate and optimize high-performance compute kernels for diverse AI accelerators, achieving significant speedups on NVIDIA B200 and Intel Arc B580 hardware.
A tweet discussing two agentic GPU kernel optimization systems: Auto GPU Kernel by @dogacel0 and Kernel Design Agents from @songhan_mit's lab, both winners at the MLSys Sparse Attention FlashInfer competition. The thread highlights different approaches using subagents and Claude skills for GPU programming.
A technical analysis of two approaches to building self-evolving AI agents. The first is model-based, through architectures such as SSMs or transformers with fast-weight updates, and through training methods. The second is harness-based, through memory or a meta-harness that can rewrite itself. The author gives practical recommendations for different audiences.