What breaks the most when you call LLM APIs in production?

Reddit r/openclaw News

Summary

A discussion of common errors when calling LLM APIs in production, including rate limits, format mismatches, malformed responses, context overflow, model deprecation, and silent failures, with statistics from Datadog and a cited paper.

For those making LLM API calls in production, which errors cause you the most friction? From what I've seen, five keep coming up:

1. Rate limits / provider down. "Resource has been exhausted." Something like 60% of all LLM errors in prod are rate limits (Datadog).
2. Format mismatches across providers. max_tokens that should be max_completion_tokens, additionalProperties rejected. It gets worse when you juggle 3+ providers.
3. Malformed responses. Thinking-mode content that needs to be passed back, broken JSON.
4. Context overflow. The request is too large, so it gets truncated or rejected.
5. Model deprecation. You wake up and your model doesn't exist anymore.

Another one is silent failures. The response looks fine and the format is valid, but the answer is just wrong. This reportedly affects around 15% of responses when there is no active verification (arXiv paper by Rahul Suresh Babu).

Do you deal with this? Which ones hurt the most? Have you built anything to handle them, or is it mostly retry and hope?
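To make "retry and hope" a bit more concrete, here is a minimal Python sketch of one way to handle three of the items above together: rate limits (exponential backoff with jitter), model deprecation (falling back to the next model in a list), and broken JSON (parse, and retry on failure). Everything named here is an assumption for illustration: `RateLimitError`, `ModelNotFoundError`, `call_with_resilience`, and the `call(model)` callback are hypothetical stand-ins, not any provider's real SDK.

```python
import json
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for a provider's rate-limit / 'resource exhausted' error."""


class ModelNotFoundError(Exception):
    """Hypothetical stand-in for a 'model does not exist' (deprecated model) error."""


def call_with_resilience(call, models, max_retries=4, base_delay=0.5, sleep=time.sleep):
    """Try each model in order and return (model, parsed_json) on the first success.

    `call(model)` is a hypothetical callback that performs the request and
    returns the raw response text, raising the exceptions above on failure.
    """
    for model in models:
        for attempt in range(max_retries):
            try:
                raw = call(model)
            except ModelNotFoundError:
                # The model is gone; retrying will not help, so move to the next one.
                break
            except RateLimitError:
                # Exponential backoff with full jitter before the next attempt.
                sleep(random.uniform(0, base_delay * 2 ** attempt))
                continue
            try:
                return model, json.loads(raw)
            except json.JSONDecodeError:
                # Malformed JSON: spend another attempt on the same model.
                continue
    raise RuntimeError("all models and retries exhausted")
```

Note that this does nothing for silent failures: a wrong answer in valid JSON passes `json.loads` without complaint, so that category needs a separate verification step.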
