Tag
agent-data is a Python API tool that provides structured web data for AI agents like OpenClaw, claiming to be 70% cheaper and more reliable than browser automation.
Open-weights models have caught up with proprietary ones, with GLM 5.2 achieving near Opus-level scores in browser agent tasks at low cost. Other models like Minimax M3 and Kimi k2.7 also show notable improvements.
Azalia Mirhoseini highlights DeLM, a decentralized language model approach where agents communicate via shared state, achieving ~10% improvement on SWE-bench Verified with Gemini-3 Flash at less than half the cost.
Introduces Behavior Forecasters (BFs) that take reasoning trajectories as input and achieve more accurate forecasts than frontier models at a fraction of the cost.
Irys introduces Stateful Swarms, an open-source paradigm for AI agents using structured blackboard memory to improve performance and reduce cost. On Harvey AI's Legal Agent Benchmark, it achieved an 83.74% criteria pass rate at $1.30 per task, compared to the state-of-the-art 10.4% at $50.90.
This paper introduces an adaptive on-the-fly multifidelity machine learning algorithm for quantum chemistry that autonomously determines training data composition across fidelities, reducing data generation costs by up to 30x compared to single-fidelity methods and up to 5x compared to standard multifidelity methods.
Jensen Huang hints at more Nemotron model releases, highlighting open-source frontier intelligence and cost efficiency enabled by NVFP4 training.
Mimo V2.5 offers performance comparable to Claude Opus 4.5 at a fraction of the cost, making it a highly cost-effective AI model for agentic tasks.
A developer used Codex 5.5 as an orchestrator and Deepseek v4 pro as an executor to generate a 240M token fine-tuning dataset, burning 359M tokens at a cost of only $78.
TRACER is a tool that replaces up to 90% of LLM classification calls with lightweight traditional ML by learning from LLM traces, reducing cost while maintaining accuracy.
Perceptron Inc. released its flagship video analysis model Mk1, claiming 80-90% lower cost than competitors while achieving strong performance on spatial and video reasoning benchmarks.
Interfaze introduces a hybrid AI model architecture combining CNN/DNN specialization with transformer capabilities, achieving superior accuracy on deterministic tasks like OCR and translation while maintaining cost efficiency at scale.
Google releases Gemini 2.5 Flash-Lite as stable and generally available, the fastest and lowest-cost model in the Gemini 2.5 family at $0.10 input/$0.40 output per 1M tokens, featuring native reasoning capabilities and full feature parity with native tools.
Google announces general availability of Gemini 2.5 Flash and Pro models, and introduces Gemini 2.5 Flash-Lite in preview—a new cost-efficient and fastest variant optimized for high-volume, latency-sensitive tasks.
OpenAI releases o3-mini, a cost-efficient reasoning model with strong STEM capabilities, available in ChatGPT and API with support for function calling, structured outputs, and three reasoning effort levels. The model matches o1 performance in math and coding while being faster and cheaper, with free plan users gaining access to a reasoning model for the first time.
OpenAI releases o1-mini, a cost-efficient reasoning model that matches o1 performance on STEM tasks like math and coding while being 80% cheaper. The model is optimized for reasoning-heavy applications and is now available to API users and ChatGPT Plus/Team/Enterprise/Edu subscribers.
OpenAI releases GPT-4o mini, a cost-efficient small model priced at 15 cents per million input tokens, 60% cheaper than GPT-3.5 Turbo, with strong performance on MMLU (82%) and outperforming competitors like Gemini Flash and Claude Haiku on reasoning, math, and coding tasks.