SWE-rebench leaderboard update: GLM-5.2, Qwen3.6-27B, Qwen3.6-35B-A3B, Gemma 4 31B and other new models, plus an improved UI

Reddit r/LocalLLaMA News

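Summary

The SWE-rebench leaderboard has been updated with new models, including GLM-5.2, Qwen3.6, and Gemma 4 31B, along with an improved UI. It ranks models by their performance on software engineering tasks.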


Cached: 2026/07/01 16:17

# SWE-rebench Leaderboard

Source: https://swe-rebench.com/

The cached copy preserves no column headers and only some model names. The column labels below are inferred from the value formats.

| Rank | Model | Score (± error) | Second score | Cost | Tokens | Cached |
|---|---|---|---|---|---|---|
| 1 | (not captured) | 62.7% ± 0.91% | 70.0% | $2.25 | 2,120,660 | 90.0% |
| 2 | (not captured) | 61.6% ± 0.64% | 72.7% | $1.84 | 1,866,497 | 91.6% |
| 3 | (not captured) | 60.4% ± 1.37% | 71.8% | $1.75 | 1,898,131 | 92.5% |
| 4 | (not captured) | 59.6% ± 1.98% | 72.7% | $1.74 | 1,878,248 | 93.6% |
| 5 | OpenAI gpt-5.5-2026-04-23-medium | 58.9% ± 0.78% | 70.0% | $0.98 | 708,418 | 83.5% |
| 6 | (not captured) | 56.5% ± 1.20% | 67.3% | $2.02 | 2,479,387 | 95.3% |
| 7 | OpenAI gpt-5.4-2026-03-05-medium | 54.9% ± 1.02% | 70.9% | $0.60 | 834,452 | 83.5% |
| 8 | (not captured) | 53.1% ± 1.45% | 66.4% | $1.32 | 1,526,135 | 94.2% |
| 9 | (not captured) | 53.0% ± 0.53% | 64.5% | $0.23 | 1,031,653 | 98.7% |
| 10 | (not captured) | 51.3% ± 0.55% | 63.6% | $1.29 | 2,644,577 | 95.6% |
| 11 | (not captured) | 51.1% ± 1.20% | 66.4% | $0.75 | 1,545,445 | 80.1% |
| 12 | (not captured) | 51.1% ± 1.13% | 71.8% | $0.75 | 2,623,456 | 87.0% |
| 13 | (not captured) | 50.7% ± 0.93% | 65.5% | $0.94 | 2,664,001 | 91.8% |
| 14 | (not captured) | 49.5% ± 0.98% | 61.8% | $0.77 | 1,848,593 | 75.7% |
| 15 | (not captured) | 47.8% ± 1.37% | 60.9% | $1.53 | 1,828,649 | 93.6% |
| 16 | (not captured) | 46.5% ± 1.27% | 64.5% | $0.61 | 2,466,977 | 90.4% |
| 17 | (not captured) | 45.6% ± 1.27% | 67.3% | $1.06 | 6,885,818 | 93.5% |
| 18 | (not captured) | 42.7% ± 1.29% | 61.8% | $0.22 | 2,247,891 | 76.9% |
| 19 | (not captured) | 42.4% ± 0.84% | 61.8% | $0.12 | 2,586,998 | 88.6% |
| 20 | (not captured) | 38.4% ± 0.97% | 57.3% | $0.07 | 2,996,077 | 95.5% |
| 21 | (not captured) | 38.2% ± 0.86% | 59.1% | $0.39 | 2,256,182 | 86.4% |
| 22 | (not captured) | 36.5% ± 0.45% | 50.9% | $0.56 | 1,875,624 | 14.2% |
| 23 | (not captured) | 33.8% ± 0.93% | 54.5% | $0.18 | 2,229,925 | 78.4% |
| 24 | (not captured) | 16.5% ± 1.13% | 37.3% | $0.32 | 2,238,420 | 69.6% |

Ranks 25 through 111 show N/A in every column. The entries in that range whose names survive in the cached copy:

| Rank | Model |
|---|---|
| 36 | Mistral Devstral-2-123B-Instruct-2512 |
| 37 | Mistral Devstral-Small-2-24B-Instruct-2512 |
| 44 | Gemini gemini-2.5-flash-preview-05-20 no-thinking |
| 45 | Gemini gemini-2.5-flash-preview-05-20 no-thinking |
| 63 | OpenAI gpt-5-mini-2025-08-07-high |
| 64 | OpenAI gpt-5-mini-2025-08-07-medium |
| 67 | OpenAI gpt-5.2-2025-12-11-medium |
| 84 | Meta Llama-4-Maverick-17B-128E-Instruct |
| 85 | Meta Llama-4-Scout-17B-16E-Instruct |
| 93 | Qwen Qwen2.5-Coder-32B-Instruct |
| 95 | Qwen Qwen3-235B-A22B no-thinking |
| 97 | Qwen Qwen3-235B-A22B-Instruct-2507 |
| 98 | Qwen Qwen3-235B-A22B-Thinking-2507 |
| 99 | Qwen Qwen3-30B-A3B-Instruct-2507 |
| 100 | Qwen Qwen3-30B-A3B-Thinking-2507 |
| 104 | Qwen Qwen3-Coder-30B-A3B-Instruct |
| 105 | Qwen Qwen3-Coder-480B-A35B-Instruct |
| 107 | Qwen Qwen3-Next-80B-A3B-Instruct |
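
Since every scored row carries a ± value, one way to read the top of the leaderboard is to ask how large each gap between neighbouring ranks is relative to that uncertainty. The sketch below does this for ranks 1 to 5, using the scores and ± values from the cached page. It assumes the ± figure is a standard error and that runs are independent, so errors combine in quadrature; the page itself does not state this. The function names are illustrative only.

```python
import math

# (rank, score %, reported ± %) copied from the top five leaderboard rows.
rows = [
    (1, 62.7, 0.91),
    (2, 61.6, 0.64),
    (3, 60.4, 1.37),
    (4, 59.6, 1.98),
    (5, 58.9, 0.78),
]


def gap_in_errors(a, b):
    """Score gap between two rows, in units of their combined ±.

    Assumption (not stated on the page): each ± is a standard error and
    the two runs are independent, so the errors add in quadrature.
    """
    _, score_a, err_a = a
    _, score_b, err_b = b
    return (score_a - score_b) / math.hypot(err_a, err_b)


def adjacent_gaps(rows):
    """Return (higher rank, lower rank, gap) for each neighbouring pair."""
    return [
        (a[0], b[0], round(gap_in_errors(a, b), 2))
        for a, b in zip(rows, rows[1:])
    ]


for higher, lower, gap in adjacent_gaps(rows):
    print(f"rank {higher} vs rank {lower}: gap = {gap} combined errors")
```

Under these assumptions, every gap among the top five comes out below one combined error, so the ordering of neighbouring entries there should be read cautiously.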

Similar articles

Gemma 4 31B's capabilities surprised me

Reddit r/LocalLLaMA

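A user shares an anecdotal finding: Gemma 4 31B outperformed the Qwen 3.6 models at understanding and refactoring messy academic code and was on par with Opus 4.7. The post also highlights SciCode, a benchmark on which Gemma does well.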