Tag
Google is updating the Gemini Interactions API to replace strict user/model roles with a flexible step-based system (outputs + roles → steps), introducing agentic steps like user_input, thought, function_call, tool_call, and model_output. The update also consolidates response_format controls and requires SDK upgrades (Python/JS ≥2.0.0) or a new API header to opt in.
Google introduces event-driven Webhooks for the Gemini API to reduce latency and friction for long-running jobs like Deep Research and batch processing. This feature replaces inefficient polling with push-based notifications, improving developer experience for agentic applications.
OpenAI is making the Realtime API generally available with a new advanced speech-to-speech model called gpt-realtime, featuring improved instruction following, tool calling, and natural speech quality. New capabilities include MCP server support, image inputs, SIP phone calling, and two new voices (Cedar and Marin).
OpenAI introduces vision fine-tuning capabilities for GPT-4o, allowing developers to customize the model with image data in addition to text for improved performance on vision tasks like visual search, object detection, and medical image analysis.
OpenAI released two new embedding models: text-embedding-3-small (5x cheaper than ada-002 with 40%+ MIRACL improvement) and text-embedding-3-large (best performance with up to 3072 dimensions). Both models outperform ada-002 on standard benchmarks, and the small model does so at lower cost.
OpenAI has released fine-tuning capabilities for GPT-3.5 Turbo, allowing developers to customize models for specific use cases with improved performance, steerability, and output formatting. The update enables fine-tuned GPT-3.5 Turbo to match GPT-4 performance on certain tasks while reducing prompt sizes by up to 90%.
OpenAI announces function calling for GPT-4 and GPT-3.5 Turbo, allowing developers to describe functions via JSON Schema and have the models choose when to output structured JSON for external tool integration. The update also extends support for older model versions until June 2024 and improves model evaluation methodology.
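
As a rough illustration of the function calling flow summarized above, the sketch below describes one function in JSON Schema form and routes a structured call to local code. It makes no API request. The get_weather tool, its stubbed return value, the REGISTRY table, and the dispatch helper are all hypothetical names chosen for the example, and the call payload (a function name plus a JSON-encoded arguments string) is an assumed shape rather than a captured model response.

```python
import json

# Hypothetical local tool that the model may ask the application to run.
def get_weather(city: str, unit: str = "celsius") -> dict:
    # Stubbed data; a real implementation would query a weather service.
    return {"city": city, "unit": unit, "temperature": 21}

# Function description in JSON Schema form, the kind of definition that is
# sent to the model alongside the conversation.
WEATHER_FUNCTION = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}

# Maps function names the model may emit to local callables (hypothetical).
REGISTRY = {"get_weather": get_weather}

def dispatch(function_call: dict) -> dict:
    """Run the local function named in a structured call payload."""
    name = function_call["name"]
    if name not in REGISTRY:
        raise ValueError(f"unknown function: {name}")
    # Arguments are assumed to arrive as a JSON-encoded string.
    args = json.loads(function_call["arguments"])
    return REGISTRY[name](**args)

# Assumed shape of the model's structured output, written by hand here.
example_call = {"name": "get_weather", "arguments": '{"city": "Paris"}'}
result = dispatch(example_call)
print(result)
```

Keeping the schema and the registry side by side makes it harder for the description sent to the model to drift from the code that actually runs. In practice the arguments string should also be validated against the schema before the call, since a model can emit malformed or incomplete JSON.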