@zachlloydtweets: https://x.com/zachlloydtweets/status/2069428152338665622

Summary

This post explains how to create an automated feedback loop for AI agents to iteratively improve their skills, using computer use and an observer skill to evaluate and update the skill code.

https://t.co/vHY2PtGsma

Cached at: 06/23/26, 04:11 PM

Building a skill optimization loop

This post shows how to create a loop with automated feedback that an agent can run to optimize its own Skills. It uses an automated grader with computer use to assess how well a Skill is performing, and then iteratively improves the Skill.

You can apply this technique to create improvement loops any time a Skill has clear validation criteria.

I’ll do this example with a Skill for replatforming websites from WYSIWYG platforms to self-hosted code (we did this recently for our own marketing site). Let’s call this Skill /replatform-site (source here). It’s a generally useful Skill, but this post is less about this particular use case than about showing the meta-process for evaluating and improving Skills on a loop.

Let’s say, for fun, I’m starting a new podcast called Talking Slop, where I go over AI trends with other people who like to chat about them. You can see that I’ve stood this site up at talkingslop.ai, and it’s currently hosted on a WYSIWYG no-code platform. I think I could slopmaxx this site even faster if it were hosted as code on Vercel, so I’m going to run /replatform-site on it.

On the initial run, it got pretty far, but the result had one obvious visual defect: the icons for the dropdown toggles were missing.

You can compare talkingslop.ai to the generated port at talking-slop.vercel.app.

This is where the loop comes in. Since this is a verifiable task, you can create loops to automatically improve it. In fact, there are two loops you might implement here:

  • A loop to make sure that this specific replatforming worked.

  • A loop to make the replatforming Skill itself more likely to work better next time.

I’m going to focus on the second kind of loop here, the **outer loop**. See my post on self-improvement loops for more on the inner vs. outer loop distinction.

I set up this loop through another Skill, using a pattern of creating an “observer” Skill (source) that grades the quality of the “inner” Skill. This observer Skill takes as input N sites to replatform, calls /replatform-site on each of them, and then builds those sites and examines them for behavioral and visual differences using computer use and browser use. It also tracks how many tokens the replatforming took and attempts to optimize cost while maintaining quality.
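One way to read "optimize cost while maintaining quality" is to prefer the cheapest Skill revision whose graded quality stays close to the best one seen. The snippet below is a minimal sketch of that idea, not the observer's actual logic; `SkillRun`, `pick_version`, the version labels, and the tolerance value are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillRun:
    skill_version: str  # hypothetical label for a revision of the Skill
    quality: float      # grader score in [0, 1]; higher is better
    tokens: int         # total tokens the replatforming consumed


def pick_version(runs: list[SkillRun], quality_tolerance: float = 0.02) -> SkillRun:
    """Return the cheapest run whose quality is within tolerance of the best."""
    if not runs:
        raise ValueError("need at least one run")
    best_quality = max(r.quality for r in runs)
    good_enough = [r for r in runs if r.quality >= best_quality - quality_tolerance]
    return min(good_enough, key=lambda r: r.tokens)


# Made-up numbers purely for illustration.
runs = [
    SkillRun("v1", quality=0.90, tokens=120_000),
    SkillRun("v2", quality=0.95, tokens=200_000),
    SkillRun("v3", quality=0.94, tokens=150_000),
]
print(pick_version(runs).skill_version)  # v3
```

Here v2 scores highest, but v3 is within the tolerance and uses fewer tokens, so it wins. A stricter tolerance would favor quality over cost.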

It synthesizes the results using a SOTA model, looks for failure patterns and opportunities to improve, and then creates a diff to update the inner Skill. Because Skills are just files, you can use any coding agent like Warp to do this analysis and submit a PR to update the Skill.
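The shape of this outer loop can be sketched in a few lines. This is an illustrative skeleton under stated assumptions, not the observer Skill's real code: `outer_loop`, `run_and_grade`, and `propose_revision` are hypothetical names, and the two `fake_*` functions are toy stand-ins for the agent run, the computer-use grader, and the model that writes the diff.

```python
from typing import Callable


def outer_loop(
    skill_text: str,
    sites: list[str],
    run_and_grade: Callable[[str, str], list[str]],
    propose_revision: Callable[[str, list[str]], str],
    max_rounds: int = 3,
) -> str:
    """Run the Skill on every site, collect failures, and revise the Skill."""
    for _ in range(max_rounds):
        failures: list[str] = []
        for site in sites:
            failures.extend(run_and_grade(skill_text, site))
        if not failures:
            break  # nothing left to fix on this corpus
        skill_text = propose_revision(skill_text, failures)
    return skill_text


# Toy stand-ins so the sketch runs without any agent or browser.
def fake_run_and_grade(skill_text: str, site: str) -> list[str]:
    return [] if "icons" in skill_text else [f"{site}: dropdown icons missing"]


def fake_propose_revision(skill_text: str, failures: list[str]) -> str:
    return skill_text + "\n- Copy icons used by dropdown toggles."


improved = outer_loop(
    "Port the site.", ["talkingslop.ai"], fake_run_and_grade, fake_propose_revision
)
print(improved)
```

In a real setup the revision step would produce a diff against the Skill file and open a PR rather than return a string, but the control flow is the same.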

The replatforming “observer” Skill

The observer Skill uses structured data results so that it can be as intelligent as possible when suggesting fixes to the inner Skill. It’s fairly sophisticated, but the general concept is simple: run the inner Skill, record its failures, make a diff to improve it, repeat.
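To make "structured data results" concrete, here is a hedged sketch of what a per-site record might look like and how failure patterns could be counted across sites. The field names, category labels, and example sites are assumptions for illustration; the real schema lives in the observer Skill's source.

```python
import json
from collections import Counter
from dataclasses import asdict, dataclass, field


@dataclass
class Finding:
    kind: str      # e.g. "visual" or "behavioral"
    category: str  # hypothetical failure label such as "missing-icon"
    detail: str


@dataclass
class SiteResult:
    site: str
    tokens: int
    findings: list[Finding] = field(default_factory=list)


def failure_patterns(results: list[SiteResult]) -> list[tuple[str, int]]:
    """Count how many sites show each failure category, most common first."""
    counts: Counter[str] = Counter()
    for result in results:
        # Use a set so a category counts once per site, however often it repeats.
        for category in {f.category for f in result.findings}:
            counts[category] += 1
    return counts.most_common()


# Made-up results for two hypothetical sites.
results = [
    SiteResult("site-a.example", 150_000, [
        Finding("visual", "missing-icon", "dropdown toggle has no chevron"),
        Finding("behavioral", "broken-link", "nav link returns 404"),
    ]),
    SiteResult("site-b.example", 90_000, [
        Finding("visual", "missing-icon", "menu icon not rendered"),
    ]),
]

print(failure_patterns(results))  # [('missing-icon', 2), ('broken-link', 1)]
report = json.dumps([asdict(r) for r in results], indent=2)  # input for the synthesis step
```

Counting by category rather than reading free-form notes is what lets the synthesis step tell a one-off glitch from a pattern worth encoding in the Skill.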

In order to run this loop, you’ll want to use a platform that supports orchestration of multiple agents along with computer use. I used Oz, which is built into Warp and supports computer use across multiple SOTA models, but there are plenty of options out there (if you want to try it in Oz, I made a separate Skill for it called /oz-orchestrated-replatforming).

Here’s a version of the diff it created on an early run on talkingslop.ai, where it noticed the dropdown issue and suggested an improvement:

If you want to tune this Skill on a significant corpus for wider use, you can scale the number of input sites and keep iterating until the diffs generated by the observer become less meaningful. The observer itself has exit criteria baked into it for when to stop looping so you don’t burn tokens optimizing forever.
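The post does not spell out the observer's exit criteria, so the following is only a sketch of what such a check could look like: stop on a round cap, a token budget, or a plateau in score gains. `RoundSummary`, `should_stop`, and every threshold are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RoundSummary:
    score: float  # mean grader score across sites, in [0, 1]
    tokens: int   # tokens spent in this round


def should_stop(
    history: list[RoundSummary],
    min_gain: float = 0.01,
    patience: int = 2,
    token_budget: int = 2_000_000,
    max_rounds: int = 10,
) -> tuple[bool, str]:
    """Decide whether another optimization round is worth running."""
    if len(history) >= max_rounds:
        return True, "max rounds reached"
    if sum(r.tokens for r in history) >= token_budget:
        return True, "token budget exhausted"
    if len(history) > patience:
        recent = history[-(patience + 1):]
        gains = [b.score - a.score for a, b in zip(recent, recent[1:])]
        if all(g < min_gain for g in gains):
            return True, "improvement plateaued"
    return False, "keep going"


# Made-up scores: a big early jump, then diminishing returns.
history = [
    RoundSummary(0.70, 100_000),
    RoundSummary(0.85, 100_000),
    RoundSummary(0.855, 100_000),
    RoundSummary(0.856, 100_000),
]
print(should_stop(history[:2]))  # (False, 'keep going')
print(should_stop(history))      # (True, 'improvement plateaued')
```

The `patience` parameter avoids stopping on a single flat round, which matters because grader scores from computer use can be noisy.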

This system isn’t perfect – there’s only so much you can improve by tuning Skills and it’s susceptible to finding local maxima – but it’s pretty handy as a simple way of making sure your Skills perform well.

If you want to give it a go, all of the relevant skills are open sourced here: https://github.com/warpdotdev-demos/replatformer

And stay tuned for my first episode of Talking Slop 😉
