@DeRonin_: How to naturally build your own self-improving agents: a self-improving agent learns from its own mistakes and rewrites…
Summary
A practical guide explaining three levels of building self-improving AI agents, from manual loops to automated design, with recommended tools and frameworks.
Cached at: 06/29/26, 10:32 PM
How to naturally build your own self-improving agents:
a self-improving agent learns from its own mistakes and rewrites itself, and it no longer lives only in papers
setups by level:
LEVEL 1: manual self-improvement loop
needs: basic Python or a no-code eval tool
shipping: 1 weekend basic, 1-2 weeks for real wins
50-100 test cases for your agent’s real job
define what “good” means (accuracy, format, tool calls)
LLM-as-judge scores each output 1-10
failures feed a prompt rewrite
loop 5-10x, keep the winner
tools that skip the boilerplate: Promptfoo, Inspect AI, Braintrust, LangSmith
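a minimal python sketch of the level 1 loop, assuming you swap the stubs for real calls. `run_agent`, `judge` and `rewrite_prompt` are hypothetical stand-ins (not part of Promptfoo, Inspect AI or any tool above) for your agent, an LLM-as-judge that returns 1-10, and an LLM that rewrites the prompt from the failures. the toy judge only checks exact matches; a real one would also score format and tool calls.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EvalCase:
    question: str
    expected: str


def run_agent(prompt: str, case: EvalCase) -> str:
    # Hypothetical stand-in for calling your agent with `prompt`.
    # Toy behaviour: it only gets the casing right if the prompt asks for exactness.
    return case.expected if "be exact" in prompt else case.expected.lower()


def judge(output: str, case: EvalCase) -> int:
    # Hypothetical stand-in for an LLM-as-judge that returns a 1-10 score.
    return 10 if output == case.expected else 3


def rewrite_prompt(prompt: str, failures: list[EvalCase]) -> str:
    # Hypothetical stand-in for asking an LLM to rewrite the prompt given the failures.
    return prompt + " Please be exact." if failures else prompt


def evaluate(prompt: str, cases: list[EvalCase], pass_score: int = 7) -> tuple[float, list[EvalCase]]:
    # Score every test case and collect the ones below the pass threshold.
    scores = [judge(run_agent(prompt, c), c) for c in cases]
    failures = [c for c, s in zip(cases, scores) if s < pass_score]
    return sum(scores) / len(scores), failures


def improve(prompt: str, cases: list[EvalCase], rounds: int = 5) -> tuple[str, float]:
    best_prompt = prompt
    best_score, failures = evaluate(prompt, cases)
    for _ in range(rounds):  # the "loop 5-10x" step
        if not failures:
            break  # every case passes, nothing left to fix
        candidate = rewrite_prompt(best_prompt, failures)
        score, candidate_failures = evaluate(candidate, cases)
        if score > best_score:  # keep the winner
            best_prompt, best_score, failures = candidate, score, candidate_failures
    return best_prompt, best_score
```

because these stubs are deterministic, a rejected rewrite would just be retried unchanged; with a real LLM each rewrite comes out different, which is why looping 5-10x and keeping the best score pays off.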
LEVEL 2: DSPy framework (Stanford NLP, open-source)
needs: solid Python + 1 week to learn the framework
shipping: 1-2 weeks for the first pipeline, 2-3 days for each one after
declare your agent, don’t hand-write prompts
auto-compiles prompts via MIPROv2 / BootstrapFewShot
handles multi-step, RAG, and tools natively
already in production at Databricks, JetBlue
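to make "auto-compiles prompts" concrete, here's a toy, pure-python illustration of the idea behind BootstrapFewShot: run a teacher program over training examples and keep only the traces that pass your metric as few-shot demos. this is NOT DSPy's API; in DSPy you declare the program and hand it, a trainset and a metric to an optimizer's `compile` method, and MIPROv2 goes further by also searching over the instructions themselves. the teacher's answers in `TOY_ANSWERS` are hypothetical.

```python
from typing import Callable

Example = dict[str, str]
Teacher = Callable[[str, list[Example]], str]  # (question, demos so far) -> answer


def bootstrap_few_shot(
    teacher: Teacher,
    trainset: list[Example],
    metric: Callable[[Example, str], bool],
    max_demos: int = 4,
) -> list[Example]:
    """Collect the teacher's own successful traces as few-shot demos."""
    demos: list[Example] = []
    for example in trainset:
        if len(demos) >= max_demos:
            break
        prediction = teacher(example["question"], demos)
        if metric(example, prediction):  # only keep traces that pass the metric
            demos.append({"question": example["question"], "answer": prediction})
    return demos


def build_prompt(instruction: str, demos: list[Example], question: str) -> str:
    """Assemble the 'compiled' prompt: instruction + bootstrapped demos + new question."""
    shots = "\n\n".join(f"Q: {d['question']}\nA: {d['answer']}" for d in demos)
    return f"{instruction}\n\n{shots}\n\nQ: {question}\nA:"


# Toy usage: the hypothetical teacher gets 7+6 wrong, so that trace is dropped.
TOY_ANSWERS = {"2+2": "4", "3+5": "8", "7+6": "12"}
trainset = [
    {"question": "2+2", "answer": "4"},
    {"question": "3+5", "answer": "8"},
    {"question": "7+6", "answer": "13"},
]
demos = bootstrap_few_shot(
    teacher=lambda q, _demos: TOY_ANSWERS.get(q, "?"),
    trainset=trainset,
    metric=lambda ex, pred: pred == ex["answer"],
)
prompt = build_prompt("Add the numbers.", demos, "1+1")
```

the point of the design: you never edit the demos by hand; change the metric or the trainset and re-run the bootstrap to get a new prompt.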
LEVEL 3: automated agent design (ADAS, AutoAgent, similar)
needs: ML engineering background + $100-1000 compute budget
shipping: 2-4 weeks of setup before meaningful improvements
the agent itself becomes the search space
spawns sandboxes, mutates architectures, reads its own failures
ADAS paper (Hu et al., 2024) beat hand-built baselines on coding, math, reasoning
AutoAgent and similar repos exist, but setup is research-grade
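a heavily simplified sketch of the level 3 idea, assuming a tiny hypothetical design space. ADAS uses an LLM "meta agent" that writes new agent code conditioned on an archive of earlier designs; this toy swaps that for random mutation over three made-up knobs (`reasoning_steps`, `self_reflect`, `num_voters`) and a hypothetical `benchmark` function standing in for sandboxed runs on coding/math tasks. only the archive, propose, evaluate loop carries over.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class AgentDesign:
    # Hypothetical knobs; ADAS itself searches over agent code, not a fixed config.
    reasoning_steps: int  # how many reasoning passes the agent makes
    self_reflect: bool    # whether it critiques its own answer before replying
    num_voters: int       # sample N answers and take a majority vote


def benchmark(design: AgentDesign) -> float:
    # Hypothetical stand-in for running the design in a sandbox on coding/math tasks.
    score = min(design.reasoning_steps, 3) * 0.1
    score -= max(design.reasoning_steps - 3, 0) * 0.05  # toy: too many steps hurts
    score += 0.2 if design.self_reflect else 0.0
    score += min(design.num_voters, 5) * 0.05
    return round(score, 3)


def mutate(design: AgentDesign, rng: random.Random) -> AgentDesign:
    # Stand-in for the meta agent proposing a new design from a parent.
    knob = rng.choice(["steps", "reflect", "voters"])
    if knob == "steps":
        return replace(design, reasoning_steps=max(1, design.reasoning_steps + rng.choice([-1, 1])))
    if knob == "reflect":
        return replace(design, self_reflect=not design.self_reflect)
    return replace(design, num_voters=max(1, design.num_voters + rng.choice([-1, 1])))


def search(start: AgentDesign, generations: int = 30,
           rng_seed: int = 0) -> tuple[AgentDesign, dict[AgentDesign, float]]:
    rng = random.Random(rng_seed)
    archive: dict[AgentDesign, float] = {start: benchmark(start)}  # every design tried so far
    for _ in range(generations):
        parent = max(archive, key=archive.get)  # build on the best design found so far
        child = mutate(parent, rng)
        if child not in archive:  # only spend "compute" on new designs
            archive[child] = benchmark(child)
    best = max(archive, key=archive.get)
    return best, archive
```

in a real setup each `benchmark` call is a sandboxed run over your test set, which is where the $100-1000 compute budget goes.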
P.S. for this level, I'm going to release a detailed article that will replace the need for an ML background
this is what the paper builds on. it’s not theoretical anymore
the paper’s specific contribution (co-evolving the evaluator) bolts onto ANY level:
rotate 3 judges from different models (anti-gaming)
curriculum learning: easy → hard test sets
judges generate new failing tests (adversarial generation)
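a sketch of the three evaluator tricks, assuming hypothetical judges and a toy agent: the three judge functions stand in for judges built on different models, and `candidates` would in practice come from an LLM asked to write tricky tests. it shows round-robin judge rotation, an easy → hard curriculum gate, and a filter that keeps only generated tests the agent currently fails.

```python
from dataclasses import dataclass
from typing import Callable

Agent = Callable[[str], str]
Judge = Callable[[str, str], int]  # (output, expected) -> score from 1 to 10


@dataclass(frozen=True)
class Case:
    question: str
    expected: str
    difficulty: int  # 1 = easy ... 3 = hard


def pick_judge(judges: list[Judge], round_idx: int) -> Judge:
    # Round-robin rotation: each round is scored by a different judge, so a prompt
    # that games one judge's quirks does not keep winning.
    return judges[round_idx % len(judges)]


def current_tier(cases: list[Case], agent: Agent, judge: Judge,
                 threshold: float = 0.8, pass_score: int = 7) -> int:
    # Curriculum: walk tiers easy -> hard and stop at the first one the agent has not
    # mastered (pass rate below `threshold`); train on cases up to that tier.
    tier_level = 1
    for level in sorted({c.difficulty for c in cases}):
        tier = [c for c in cases if c.difficulty == level]
        passed = sum(judge(agent(c.question), c.expected) >= pass_score for c in tier)
        tier_level = level
        if passed / len(tier) < threshold:
            break
    return tier_level


def adversarial_cases(candidates: list[Case], agent: Agent, judges: list[Judge],
                      pass_score: int = 7) -> list[Case]:
    # Keep generated cases that at least one judge says the agent fails; adding them
    # to the suite makes the evaluator harder as the agent improves.
    return [c for c in candidates
            if min(j(agent(c.question), c.expected) for j in judges) < pass_score]


# Hypothetical stand-ins for three judges built on different models.
def exact_judge(output: str, expected: str) -> int:
    return 10 if output == expected else 1


def case_insensitive_judge(output: str, expected: str) -> int:
    return 10 if output.lower() == expected.lower() else 2


def lenient_judge(output: str, expected: str) -> int:
    return 8 if expected.lower() in output.lower() else 3


JUDGES: list[Judge] = [exact_judge, case_insensitive_judge, lenient_judge]


def toy_agent(question: str) -> str:
    # Hypothetical toy agent: a lookup table with one casing mistake.
    return {"2+2": "4", "capital of France": "paris", "17*23": "391"}.get(question, "unknown")
```

note how the strict judge holds the toy agent at tier 2 while the lenient one would let it advance; rotating judges keeps either blind spot from deciding every round.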
start at level 1: you’ll learn more from one weekend of running your own loop than from reading 5 more papers
direct links to every tool, repo, and paper below (2nd tweet) ↓
direct links for each level:
LEVEL 1 — eval tools:
Promptfoo → http://promptfoo.dev
Inspect AI → http://inspect.aisi.org.uk
Braintrust → http://braintrust.dev
LangSmith → http://smith.langchain.com
Anthropic eval cookbook → http://github.com/anthropics/anthropic-cookbook…
LEVEL 2 — DSPy:
docs → http://dspy.ai
github → http://github.com/stanfordnlp/dspy…
MIPROv2 paper → http://arxiv.org/abs/2406.11695
LEVEL 3 — ADAS / AutoAgent:
ADAS paper → http://arxiv.org/abs/2408.08435
ADAS code → http://github.com/ShengranHu/ADAS
AutoAgent → http://github.com/HKUDS/AutoAgent
bookmark this, you’ll need it
also, in addition to this, i'm preparing a really detailed guide on how anybody can set up such a system, even without an ML background
it should be good
yeah, i've been living with this shit for 2 weeks already…
hopefully it’s only giving me 3-3.5k views at the start
all the others are organic