Heads up for the DeepSWE benchmark: cost is measured per task, not per total run.
Summary
DeepSWE benchmark costs are reported per task, not per total run. A full run of a model like Mimo V2.5 Pro can cost ~$225, while the non-Pro Mimo V2.5 costs ~$7.15. Estimate the total for the full task set before running an expensive model.
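To make the per-task versus full-run distinction concrete, here is a minimal Python sketch of the conversion. It assumes the 113-task count quoted in the DeepSWE audit blurb under Similar Articles, and it assumes both dollar figures above are full-run totals. The function names are hypothetical and not part of any DeepSWE tooling.

```python
# Assumption: task count taken from the related DeepSWE audit blurb, not from this post.
NUM_TASKS = 113


def full_run_cost(per_task_usd: float, num_tasks: int = NUM_TASKS) -> float:
    """Scale a per-task cost up to the cost of running every task once."""
    return per_task_usd * num_tasks


def per_task_cost(full_run_usd: float, num_tasks: int = NUM_TASKS) -> float:
    """Convert a full-run total back into an average per-task cost."""
    if num_tasks <= 0:
        raise ValueError("num_tasks must be positive")
    return full_run_usd / num_tasks


if __name__ == "__main__":
    # Assumption: both figures from the summary are full-run totals in USD.
    for name, total in [("Mimo V2.5 Pro", 225.0), ("Mimo V2.5 (non-Pro)", 7.15)]:
        print(f"{name}: ~${per_task_cost(total):.2f} per task, ~${total:.2f} per run")
```

Under those assumptions the Pro model averages roughly $1.99 per task and the non-Pro model roughly $0.06, so a per-task figure that looks small can hide a bill more than a hundred times larger for the full run.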
Similar Articles
@seclink: This 12-billion-parameter model uses a unified Transformer architecture to efficiently handle raw multimodal inputs. It requires only 16GB of RAM to run, making it a perfect fit for devices like the MacBook Pro. It excels in various benchmarks, such as achieving 78.8% on GPQA Diamond and...
A 12-billion-parameter multimodal model has been released as open source. It features a unified Transformer architecture and requires only 16GB of RAM to run. It performs exceptionally well across multiple benchmarks, supports a 256K context window, and works with over 140 languages.
@Miles_Brundage: BREAKING: massively improved SOTA score on Clear AVERI Pronunciation Guide Bench, via my colleague Carly
Miles Brundage announces a state-of-the-art (SOTA) score improvement on the Clear AVERI Pronunciation Guide Bench achieved by colleague Carly.
Someone did an audit on the new DeepSWE, the results aren't pretty
DeepSWE is a new benchmark for evaluating AI coding agents on real-world software engineering tasks from active open-source repositories, comprising 113 tasks across TypeScript, Go, Python, JavaScript, and Rust with isolated environments and program-based verifiers.
Big Model Value Wars - DeepSeek V4 Pro vs MiMo-V2.5-Pro vs MiniMax M3
A discussion comparing DeepSeek V4 Pro, MiMo-V2.5-Pro, and MiniMax M3 for best value when run locally or through OpenRouter, with a focus on agentic and coding tasks. It also mentions Hermes Agent and Qwen 3.6 variants.
@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…
Google's new Gemma 4 12B is a single decoder-only transformer with encoder-free multimodal input. It achieves strong benchmark results while being small enough to run locally on a budget GPU, and it is released under the Apache 2.0 license.