What are you actually evaluating these days: prompts, context, or the whole harness?
Summary
A discussion of what AI evaluations actually target. It asks whether practitioners are optimizing prompts, context, or the entire harness, and notes a shift toward optimizing the system as a whole.
Similar Articles
@rohanpaul_ai: This paper shows that agent performance depends less on prompts alone and more on the harness around them. “Agent intel…
This paper argues that AI agent performance depends more on the harness (the control layer) than on prompts alone, and proposes natural-language agent harnesses that make design choices inspectable and portable.
@AntCaveClub: What exactly is Harness? Harness = Evaluation Harness. In AI, "harness" is industry jargon – a set of tools to "harness" a model and run standardized evaluations. The industry standard is EleutherAI's lm-e…
This article explains in depth why evaluation frameworks (harnesses) matter in AI, analyzes the strategic significance of DeepSeek building its own Harness team, and contrasts the open-source lm-evaluation-harness with an in-house system.
Can prompting reduce AI sycophancy or is it mostly model behavior?
A user explores whether prompt engineering can reduce sycophancy in models such as Gemini, ChatGPT, and Claude, or whether sycophancy is fundamentally a model alignment issue. The discussion also touches on how these models differ in handling disagreement and giving objective criticism.
Are most LLM eval tools still too prompt-focused?
The author questions whether current LLM evaluation tools are too focused on isolated prompts rather than full workflows and agent interactions, noting that step-by-step accuracy can mask overall behavioral drift in production.
@akshay_pachaar: from prompt to context to harness engineering. three terms keep coming up in AI engineering, and they get conflated all…
Akshay Pachaar distinguishes three AI engineering concepts: prompt engineering (the message), context engineering (the memory), and harness engineering (the machine). He explains the role of each and how they interact when building LLM-based agents, and links to a longer article on agent harness engineering.