api-calls

#api-calls

Necessary but Not Sufficient: Temperature Control and Reproducibility in LLM-as-Judge Safety Evaluations

arXiv cs.LG ↗ · yesterday

This paper investigates the assumption that setting LLM judge temperature to 0 ensures deterministic safety evaluations. It finds that in practice, many harnesses do not set temperature or seed, leading to high variance, and even with temperature=0, non-determinism persists due to provider-level randomness and API changes.

#api-calls

What's your biggest fear about letting an agent take real actions in production?

Reddit r/AI_Agents ↗ · 2026-05-31

A developer shares concerns about deploying AI agents that perform real actions in production, such as API calls and data manipulation, and asks the community about their fears and mitigation strategies like guardrails and human approval.

#api-calls

Computer use is 45x more expensive than a structured API call

Reddit r/AI_Agents ↗ · 2026-05-18

A benchmark shows that computer-use agents are 45x more expensive than structured API calls for the same task, because screenshots and multi-step interaction drive up token usage. The author argues that for internal tools with exposed state, API-based agents are more efficient, and promotes Reflex 0.9, which auto-generates APIs from app handlers.

#api-calls

Best Cheapest Way To Run an Agent Long Term

Reddit r/openclaw ↗ · 2026-05-12

A developer discusses strategies for cost-effectively running long-term AI agents for financial market analysis, sharing experiences with Claude and Gemini APIs.
