Notes on multi-provider llm api compatibility, three approaches we tried

Reddit r/ArtificialInteligence News

Summary

Engineering notes comparing three approaches to unifying access to multiple LLM providers (OpenAI, Anthropic, Google) behind a single internal interface, discussing trade-offs in API normalization, native SDK usage, and gateway patterns.

Not a paper, this is production engineering notes from the last few months of trying to unify our team's access to openai, anthropic, and google models behind a single internal interface. We have services that need to call all three depending on the task and the question of "how do we make this not painful" turns out to be more interesting than i expected. Nothing here is novel research, but i don't see this written up much, so. Approach 1, normalize everything to openai chat completions format. This is the de facto industry default, the openai sdk shape is the lingua franca, most observability tooling speaks it. For plain chat completions it's fine. The cracks show up around three things specifically: * Tool/function calling schemas. Anthropic's tool\_use/tool\_result content blocks don't map cleanly to openai's tool\_calls structure on the round-trip. You can flatten it, but you lose the parallel tool call semantics and the ordered content blocks claude uses internally. On our internal eval (n=80 multi-turn tool-use scenarios, scoring tool selection accuracy + argument correctness) we measured a drop from 0.87 native-claude to 0.79 when we forced the openai normalization, consistent across three runs. Small sample, not peer-reviewed, but the direction was clear enough that we stopped investing in that path. * Streaming formats. Anthropic uses event-typed sse (message\_start, content\_block\_delta, etc.), openai uses delta chunks, gemini's streaming has its own shape. Wrappers handle the common case but the moment you need fine-grained streaming control (e.g., for tool calls in flight) the abstraction tends to leak. * Safety/system controls. Gemini's safety settings, anthropic's system prompt handling, and openai's developer message behavior all have subtly different semantics. "Translate everything to system role" loses information. Approach 2, keep native sdks per service, route at the application layer. Preserves full provider semantics. Cost is that you maintain three sdk integrations, three retry/timeout/auth code paths, and the routing logic becomes part of every service that needs multi-provider access. We found the maintenance burden grew faster than the feature value as we added providers. Approach 3, gateway that exposes multiple api specs natively rather than normalizing to one. Less common as a pattern. We evaluated portkey and tokenrouter squarely in this category. LiteLLM proxy mode is adjacent but not quite the same thing: its default behavior is openai-format normalization, which puts it closer to approach 1 for most usage patterns, though it can be configured for provider-native passthrough on specific routes. The appeal of the native-spec end of this space is that existing client code keeps speaking whichever sdk it was already written against. Tradeoff is that you're now relying on the gateway to track upstream api changes, which is a real maintenance burden you've outsourced rather than eliminated. If the gateway falls behind on a new feature (extended thinking, computer use, structured output extensions, etc.) you're stuck. A related question we haven't resolved: when a primary upstream provider degrades (we got a small taste of this during a late-april anthropic capacity event), pure-proxy gateways have nowhere to fall back to within that provider. Some gateways keep their own inference capacity behind the routing layer as a last-resort path, others don't. Whether that's actually useful depends entirely on what models the fallback path can serve, since obviously a llama-class fallback won't substitute for opus on the workload that needed opus in the first place. For our use case we treat it as a degraded-mode option rather than a real substitute. I don't have a clean answer on the quality cost of approach 1 vs approach 2 at scale. Our internal eval was small enough that i wouldn't put it in a paper, but the directional finding (measurable degradation on agentic tool-use tasks when normalizing to chat completions) was consistent enough that we decided not to ship that path for our use case.
Original Article

Similar Articles

LLMs and Memory Limitations - review my thoughts pls

Reddit r/ArtificialInteligence

An analysis of LLM memory limitations, arguing that true personal AI requires single-tenant weight customization which conflicts with current multi-tenant cloud economics, and highlighting open-weight models as the likely source of progress.

10 Ways To Reduce Your LLM API Costs

Reddit r/AI_Agents

A practical guide listing 10 strategies to reduce costs when using LLM APIs, including model selection, prompt caching, batch processing, and monitoring expenses.

How are top tech companies actually using LLMs internally beyond basic coding help?

Reddit r/AI_Agents

This post explores how major tech companies like Google, Meta, and OpenAI are utilizing advanced LLM workflows internally, focusing on agentic tasks, human-in-the-loop systems, and practical applications beyond basic coding. It seeks real-world use cases and operational routines that smaller startups and teams can adapt to improve productivity and efficiency.