On-Demand Human Judgement for AI Agents

Reddit r/AI_Agents 05/15/26, 09:01 PM Tools

on-demand-human-judgment ai-agents human-in-the-loop mcp-server evaluation subjective-decisions

Summary

Describes building an MCP server that provides on-demand human judgment for AI agents, allowing them to get real human responses for subjective decisions and evaluations instead of relying on synthetic or slow methods.

Been thinking about this a lot lately. Agents are getting scary good at the mechanical stuff - searching, calling APIs, writing code, executing multi-step plans. But they still face two problems that no amount of scaling fixes:

1. They hit decision points where the "right answer" is a judgment call, not a logic problem. Is this email tone too aggressive? Which of these three landing page headlines actually lands? Does this UI feel sketchy to a normal person? Models have priors on this stuff, but their priors are an average of the internet, not your actual users.

2. You can't eval them on anything subjective without burning a week recruiting people, building a survey, paying a panel, etc. So most teams just don't, and ship on vibes.

I built an MCP server that solves both. The agent hits a fork in the road, calls the tool with a question + audience (e.g. "US women 25-34" or "developers who've used Cursor"), and gets back actual human responses in seconds. Not synthetic. Not an MTurk graveyard. Real people.

Example from last week - someone wired it into a Claude Code agent generating marketing copy variants. Instead of picking the "best" one itself, the agent fires off 4 versions to 200 people in the target segment, gets back preference data, and only then commits.

Same primitive works for eval generation. Want a 500-person benchmark on whether your agent's outputs feel trustworthy? One tool call.

Anyway - curious if anyone else is doing the human-in-the-loop thing for agents, and how? Most stuff I've seen is either slow HITL or a pure LLM judge (cheap but circular).
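For anyone wondering what the "ask people, then commit" step might look like on the agent side, here is a rough sketch of the marketing-copy example. The post doesn't give the tool's actual schema, so everything named here is an assumption for illustration: the `AskHumans` callable stands in for the MCP tool call, `pick_variant` and `min_share` are made-up names, and `fake_panel` is a canned stub, not real respondent data.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Optional, Sequence

# ASSUMPTION: stand-in for the human-panel tool. It takes a question, the
# options, an audience description and a sample size, and returns one chosen
# option index per respondent. The real tool's interface is not in the post.
AskHumans = Callable[[str, Sequence[str], str, int], list]


@dataclass
class PreferenceResult:
    winner: Optional[str]  # None if no variant cleared the threshold
    share: float           # fraction of valid votes for the top variant
    votes: list            # vote count per variant, in input order


def pick_variant(variants: Sequence[str], audience: str, ask: AskHumans,
                 sample_size: int = 200, min_share: float = 0.0) -> PreferenceResult:
    """Poll the panel and commit to a variant only if it clears min_share."""
    if not variants:
        raise ValueError("need at least one variant")
    choices = ask("Which version do you prefer?", variants, audience, sample_size)
    # Ignore malformed answers (indices outside the option list).
    counts = Counter(c for c in choices if 0 <= c < len(variants))
    votes = [counts.get(i, 0) for i in range(len(variants))]
    total = sum(votes)
    if total == 0:
        return PreferenceResult(winner=None, share=0.0, votes=votes)
    best = max(range(len(variants)), key=lambda i: votes[i])  # ties: first wins
    share = votes[best] / total
    winner = variants[best] if share >= min_share else None
    return PreferenceResult(winner=winner, share=share, votes=votes)


def fake_panel(question, options, audience, n):
    """Canned stub so the sketch runs offline; NOT real human data."""
    canned = [0] * 50 + [1] * 30 + [2] * 90 + [3] * 30
    return canned[:n]


headlines = ["Headline A", "Headline B", "Headline C", "Headline D"]
result = pick_variant(headlines, "US women 25-34", fake_panel,
                      sample_size=200, min_share=0.4)
print(result)
```

Passing the tool in as a callable keeps the decision logic testable without a live panel. The `min_share` threshold is one possible way to express "only then commits": if no variant gets a clear enough plurality, the agent gets `winner=None` back and can ask again or escalate instead of guessing.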

Similar Articles

MCP-Persona: Benchmarking LLM Agents on Real-World Personal Applications via Environment Simulation

@petradonka: https://x.com/petradonka/status/2054897826149101588

AI agents don’t just need more autonomy. They need better judgment about when to stop.

GetMCP: Zero Trust for AI agents

Agent Judge: Solving Long-Context Evals for Production Agents (10 minute read)
