@witcheer: what a fun challenge! I spent the afternoon inside google & hugging face's challenge. the frontier is wild with ~68 age…
Summary
A participant reproduces the top-performing agent stack in the Google & Hugging Face Gemma challenge at 388.03 tok/s, then tests whether a retrained, higher-acceptance speculative-decoding drafter makes deeper speculation (8 instead of 7 speculative tokens) pay off.
Cached at: 06/11/26, 05:40 PM
what a fun challenge!
I spent the afternoon inside google & hugging face’s challenge.
the frontier is wild: ~68 agents stacking each other’s work up to ~389 tok/s. that’s a proper multi-agent collaboration on the hub, and a clean map of where local inference speed actually comes from in 2026.
first, I reproduced the current #1 stack verbatim: 388.03 tok/s, with perplexity matching to the digit. then I ran one clean experiment: does the retrained, higher-acceptance drafter make deeper speculation pay off? I pushed the number of speculative tokens from 7 to 8.
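Matching perplexity is what makes a speed reproduction credible: a faster stack that altered the model's numerics or outputs would show up as a perplexity shift. A minimal sketch of that check, assuming the score is standard token-level perplexity (exp of the mean negative log-likelihood) over some evaluation text. The challenge's actual eval set, rounding rule, and tooling aren't given in the post, and `matches_to_digits` is a hypothetical helper.

```python
import math
from typing import Sequence


def perplexity(token_logprobs: Sequence[float]) -> float:
    """Perplexity from per-token natural-log probabilities of the reference
    text: exp of the mean negative log-likelihood."""
    if len(token_logprobs) == 0:
        raise ValueError("need at least one token log-probability")
    mean_nll = -sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_nll)


def matches_to_digits(measured: float, reference: float, digits: int) -> bool:
    """Hypothetical helper: True if both values agree once rounded to
    `digits` decimal places."""
    return round(measured, digits) == round(reference, digits)


if __name__ == "__main__":
    # Hypothetical log-probs; a real check would collect them from the model
    # over the challenge's evaluation text.
    logprobs = [math.log(0.5), math.log(0.25), math.log(0.125)]
    ppl = perplexity(logprobs)
    print(f"perplexity: {ppl:.4f}")
```

Comparing rounded values rather than exact floats tolerates harmless last-bit differences between kernels while still catching real quality regressions.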
no leaderboard crown, unfortunately: the easy knobs have been tuned to death by people who’ve been at it for 24h. but I’m happy to have a verified reproduction.
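Why a higher-acceptance drafter is the natural thing to pair with deeper speculation can be sketched with the standard speculative-decoding analysis (Leviathan et al., 2023). This is a simplified model, not the challenge's actual stack: it assumes each draft token is accepted independently with probability `alpha`, treats "speculative tokens" as the draft length `gamma`, and uses a hypothetical drafter-to-target cost ratio `c`.

```python
# Simplified model of speculative decoding payoff. Assumes i.i.d. acceptance
# with probability `alpha`; real acceptance depends on position and context.


def expected_tokens_per_step(alpha: float, gamma: int) -> float:
    """Expected tokens emitted per target-model verification pass with
    `gamma` draft tokens: (1 - alpha**(gamma + 1)) / (1 - alpha)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    if alpha == 1.0:
        return gamma + 1.0  # all drafts accepted, plus the target's bonus token
    return (1.0 - alpha ** (gamma + 1)) / (1.0 - alpha)


def speedup(alpha: float, gamma: int, c: float) -> float:
    """Walltime improvement over plain decoding. `c` (hypothetical) is the
    cost of one drafter forward pass relative to one target forward pass;
    the verification pass is treated as costing one normal decoding step."""
    return expected_tokens_per_step(alpha, gamma) / (gamma * c + 1.0)


def best_gamma(alpha: float, c: float, max_gamma: int = 16) -> int:
    """Draft length in 1..max_gamma that maximizes the modeled speedup."""
    return max(range(1, max_gamma + 1), key=lambda g: speedup(alpha, g, c))


if __name__ == "__main__":
    c = 0.05  # hypothetical cost ratio, not measured on the A10G setup
    for alpha in (0.6, 0.7, 0.8, 0.9):
        s7, s8 = speedup(alpha, 7, c), speedup(alpha, 8, c)
        print(f"alpha={alpha:.1f}  gamma=7: {s7:.3f}x  gamma=8: {s8:.3f}x  "
              f"best gamma: {best_gamma(alpha, c)}")
```

Under this model the 8th draft token adds exactly `alpha**8` expected tokens per step, so going from 7 to 8 helps only when `alpha**8 > c * speedup(alpha, 7, c)`. That term grows quickly with acceptance (0.8**8 ≈ 0.17, 0.9**8 ≈ 0.43), which is why a retrained drafter is the obvious reason to revisit the draft length. Real engines also pay more for verifying longer drafts, which this sketch ignores.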
clem 🤗 (@ClementDelangue): Announcing the Gemma challenge!
Google, Hugging Face, and the open-source AI community choose to empower AI builders rather than sabotage them.
Fun to see the Hub becoming the platform where agents collaborate, just as it became the platform where humans collaborate.
Similar Articles
@googlegemma: Introducing the Fast Gemma Challenge with Hugging Face Over the next few days, dozens of agents will collaborate to mak…
Google and Hugging Face launch the Fast Gemma Challenge, where dozens of agents will collaborate to accelerate the Gemma 4 E4B model.
Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G
A live challenge is underway to accelerate inference of the Gemma 4 E4B model on a single A10G GPU, with a dashboard on Hugging Face tracking agent submissions.
@lvwerra: The Gemma agent collaboration started 48h ago and it is blowing up: > throughput almost 4x (~100-> 387 tok/s) > 60+ age…
The Gemma agent collaboration nearly quadrupled throughput (~100 to 387 tok/s) and exhibited emergent social behaviors such as forming coalitions, issuing ethical statements, and coordinating resources, with over 60 agents and 250 submissions in 48 hours.
@DataChaz: One orchestrator. 10 parallel agents. 100+ tokens a second. All local. The @googlegemma team just dropped a MASSIVE dem…
Google's Gemma team released a demo for Gemma 4 26B that runs 10 parallel agents locally at 100+ tokens/second, enabling tasks like coding SVG galleries and parallel translation, all free and open-source.
@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…
Google's new Gemma 4 12B is a single decoder-only transformer with encoder-free multimodal input, achieving strong benchmarks while being small enough to run locally on a budget GPU. It is released under Apache 2.0 license.