We keep adding “skills” to our agents and have no idea which ones actually work. Solved problem?

Reddit r/AI_Agents News

Summary

A PM at an internal developer platform highlights the challenge of tracking which AI agent skills are actually invoked and effective, and asks the community if there are existing tools or solutions for this observability problem.

PM at an internal developer platform (IDP) here. We’ve been building AI agents into our product: an agent that onboards new devs onto a service, say, or one that helps debug a broken config. Under the hood these agents draw on a set of “skills” we’ve written: reusable modules for specific jobs (an onboarding skill, a skill for a particular solution, and so on). We keep writing more of them.

The problem: I have no visibility into whether any of it works. I can’t tell which skills the agents actually invoke, how often, or whether the ones that fire are helping the user or just adding noise. We write a skill, ship it, and that’s it. No clue whether it’s earning its place or just sitting there as dead code the agent never reaches for.

Before I go build something myself: is this a solved problem with tooling I’ve missed, or is everyone equally blind here? How are you tracking whether your agents’ skills actually matter?
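To make the question concrete, here is a minimal sketch of the two signals in question: whether a skill is invoked at all, and whether it helped when it was. It is not an existing tool. `SkillTracker`, `track`, `record_feedback`, and the skill names are all hypothetical, and it assumes skills are plain Python callables and that some helpful/not-helpful signal can be collected from the user.

```python
from collections import Counter
from dataclasses import dataclass, field
from functools import wraps
from typing import Callable, List, Optional, Set


@dataclass
class SkillTracker:
    """Hypothetical in-memory tracker; a real setup would export to a metrics backend."""
    registered: Set[str] = field(default_factory=set)
    invocations: Counter = field(default_factory=Counter)
    helpful: Counter = field(default_factory=Counter)

    def track(self, name: str) -> Callable:
        """Decorator that registers a skill and counts every call to it."""
        self.registered.add(name)

        def decorator(fn: Callable) -> Callable:
            @wraps(fn)
            def wrapper(*args, **kwargs):
                self.invocations[name] += 1
                return fn(*args, **kwargs)
            return wrapper
        return decorator

    def record_feedback(self, name: str, was_helpful: bool) -> None:
        """Record a user signal (assumed available) for one invocation."""
        if was_helpful:
            self.helpful[name] += 1

    def dead_skills(self) -> List[str]:
        """Skills that were registered but never invoked."""
        return sorted(n for n in self.registered if self.invocations[n] == 0)

    def helpful_rate(self, name: str) -> Optional[float]:
        """Share of invocations marked helpful, or None if never invoked."""
        calls = self.invocations[name]
        return self.helpful[name] / calls if calls else None


tracker = SkillTracker()


@tracker.track("onboarding")
def onboarding_skill(service: str) -> str:
    return f"Onboarding steps for {service}"


@tracker.track("config_debug")
def config_debug_skill(config: str) -> str:
    return f"Checked {config}"


# Simulate an agent session: one skill fires twice, the other never does.
onboarding_skill("payments")
onboarding_skill("search")
tracker.record_feedback("onboarding", was_helpful=True)

print(tracker.dead_skills())
print(tracker.helpful_rate("onboarding"))
```

Counting invocations is the easy half. The helpful rate is only as good as the feedback signal behind it, which is the part this sketch leaves open.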