We keep adding “skills” to our agents and have no idea which ones actually work. Solved problem?

Reddit r/AI_Agents 06/20/26, 09:48 AM News

ai-agents observability developer-tools monitoring internal-developer-platform skills

Summary

A PM at an internal developer platform highlights the challenge of tracking which AI agent skills are actually invoked and effective, and asks the community if there are existing tools or solutions for this observability problem.

PM at an internal developer platform (IDP) here. We’ve been building AI agents into our product: for example, an agent that onboards new devs onto a service, or one that helps debug a broken config. Under the hood, these agents draw on a set of “skills” we’ve written: reusable modules for specific jobs (an onboarding skill, a skill for a particular solution, and so on). We keep writing more of them.

The problem: I have no visibility into whether any of it works. I can’t tell which skills the agents actually invoke, how often, or whether the ones that fire are helping the user or just adding noise. We write a skill, ship it, and that’s it. We have no clue whether it’s earning its place or just sitting there as dead code the agent never reaches for.

Before I go build something myself: is this a solved problem with tooling I’ve missed, or is everyone equally blind here? How are you tracking whether your agents’ skills actually matter?
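The post doesn't prescribe a solution, but the baseline it asks for (which skills fire, how often, whether they help, and which never fire) can be sketched in a few lines. This is a minimal sketch, assuming the agent runtime has a hook where each skill invocation can be reported along with an optional helpfulness signal such as a user rating. `SkillUsageTracker`, `record`, and the skill names are hypothetical, not part of any real agent framework.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class SkillUsageTracker:
    """Hypothetical in-process tracker for agent skill invocations.

    Assumes the agent runtime calls record() every time a skill fires,
    optionally passing an outcome signal (e.g. a user thumbs-up/down).
    """
    registered: set[str]                                # every skill that has been shipped
    invocations: Counter = field(default_factory=Counter)
    rated: Counter = field(default_factory=Counter)     # invocations with an outcome signal
    helpful: Counter = field(default_factory=Counter)   # invocations rated as helpful

    def record(self, skill: str, was_helpful: bool | None = None) -> None:
        # Count every invocation; only count helpfulness when a signal exists.
        self.invocations[skill] += 1
        if was_helpful is not None:
            self.rated[skill] += 1
            if was_helpful:
                self.helpful[skill] += 1

    def never_invoked(self) -> set[str]:
        # Registered skills the agent has never reached for ("dead code").
        return self.registered - set(self.invocations)

    def helpful_rate(self, skill: str) -> float | None:
        # Share of *rated* invocations judged helpful; None if nothing was rated.
        n = self.rated[skill]
        return self.helpful[skill] / n if n else None


# Usage with hypothetical skill names.
tracker = SkillUsageTracker(registered={"onboarding", "debug_config", "legacy_migration"})
tracker.record("onboarding", was_helpful=True)
tracker.record("onboarding", was_helpful=False)
tracker.record("debug_config")  # fired, but no outcome signal
print(tracker.invocations.most_common())   # [('onboarding', 2), ('debug_config', 1)]
print(tracker.never_invoked())             # {'legacy_migration'}
print(tracker.helpful_rate("onboarding"))  # 0.5
print(tracker.helpful_rate("debug_config"))  # None
```

Counting invocations alone answers the “does it ever fire” question. The helpfulness rate is only as good as the outcome signal fed into it, which is why unrated invocations are left out of its denominator rather than counted as failures.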

Similar Articles

How does your company measure the impact of agents and skills in real production, not just benchmarks?

everyone's focused on whether their agent works. almost nobody asks if it's actually getting better over time

Which platform is your company using for ai agent observability and reliability needs?

Most of our “agent” problems turned out to be workflow/state problems

What's the most useful AI agent you've seen in production?
