@FradSer: The most interesting thing I've done so far: Trying a series of methods to make models like gpt-oss:20b and gemma4:e4b approach Opus 4.7's level under certain conditions
Summary
The author tries a series of methods to bring models such as gpt-oss:20b and gemma4:e4b close to Opus 4.7's performance level under certain conditions.
View Cached Full Text
Cached at: 05/24/26, 02:18 AM
The most interesting thing I’ve done so far:
Trying a series of approaches to get models like gpt-oss:20b and gemma4:e4b close to Opus 4.7 level under certain conditions 👀 https://t.co/1YUmoZ8dao
Similar Articles
Gemma 4 31B's competence surprised me
A user shares anecdotal findings that Gemma 4 31B outperforms Qwen 3.6 models and matches Opus 4.7 in understanding and refactoring messy academic code, highlighting a benchmark (SciCode) where Gemma excels.
@eliebakouch: we let opus 4.7 and gpt 5.5 run on the nanogpt optimizer speedrun: ~10k runs, 14k H200 hours, 23.9B tokens. opus hits 2…
AI agents (Opus 4.7 and GPT 5.5/Codex) autonomously optimized the nanoGPT speedrun optimizer, beating the human baseline with a new record of 2930 steps. The blog details their search methods, failures, and releases all run data and code.
@mylifcc: I'm already running Gemma-4-12b on my Mac. Tech stack: llama.cpp + GGUF Q4_K_M + Metal 32K context, local OpenAI-compatible API. Measured about 36 tok/s, resident RSS about…
User shares their experience using llama.cpp with the GGUF Q4_K_M quantized version of Gemma-4-12b on a Mac, achieving local inference speed of about 36 tok/s and memory usage of about 10GB.
@WEB3_furture: COOL! Someone took the newly released Qwen 3.7-Max, Claude Opus 4.7, and GPT-5.5 for an agent-loop comparison: each model writes its own Tetris bot, tests it, and after 10 consecutive iterations the bots compete head-to-head. Results: Qwen 3.7-Max: +$…
An agent-loop comparison of Qwen 3.7-Max, Claude Opus 4.7, and GPT-5.5, in which each model wrote its own Tetris bot and iterated for 10 rounds before the bots competed. According to the post, Qwen 3.7-Max led in both performance and cost.
@hank_aibtc: Amazing! Running Gemma 4 in the browser, on par with ChatGPT?! No server, no data upload, fully offline, pure WebGPU local inference! Xenova has open-sourced all 27 custom WebGPU kernels written by Fable 5: - Gemma 4 E2B (2.3B parameters...)
The article introduces Xenova's open-sourcing of 27 custom WebGPU kernels, enabling Gemma 4 to run fully offline and locally in the browser at 255 tok/s, and discusses advantages like privacy and offline use. It also mentions FLUX.2's 3D generation capability.