@witcheer: what a fun challenge! I spent the afternoon inside google & hugging face's challenge. the frontier is wild with ~68 age…

X AI KOLs Following 06/11/26, 03:54 PM Events

gemma-challenge multi-agent inference-speed google hugging-face open-source speculative-decoding

Summary

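A participant reproduces the current #1 agent stack in the Google & Hugging Face Gemma challenge at 388.03 tok/s with matching perplexity, then tests whether the retrained, higher-acceptance drafter makes deeper speculation pay off by raising speculative tokens from 7 to 8.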

Original Article

Cached at: 06/11/26, 05:40 PM

what a fun challenge!

I spent the afternoon inside google & hugging face’s challenge.

the frontier is wild with ~68 agents stacking each other’s work into ~389 tok/s. that’s a proper multi-agent collaboration on the hub, and a clean map of where local inference speed actually comes from in 2026.

I reproduced the current #1 stack verbatim first, 388.03 tok/s, perplexity matching to the digit. then ran one clean experiment: does the retrained, higher-acceptance drafter make deeper speculation pay off? pushed speculative tokens from 7 to 8.

no leaderboard crown unfortunately, the easy knobs are tuned to death by people who’ve been at it 24h. but I am happy that I have a verified reproduction.
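
The question in the post (does a higher-acceptance drafter make deeper speculation pay off?) can be illustrated with the standard back-of-envelope model of speculative decoding. This is a minimal sketch, not the author's method or the challenge's benchmark code. It assumes each drafted token is accepted independently with probability `alpha`, and that one drafter pass costs `draft_cost` times one target-model pass. The values of `alpha` and `draft_cost` below are hypothetical and are not taken from the post.

```python
def expected_tokens_per_step(alpha: float, k: int) -> float:
    """Expected tokens emitted per target-model verification pass.

    k drafted tokens are checked in one pass; tokens are kept up to the
    first rejection, and the target model always contributes one token.
    Assumes i.i.d. per-token acceptance probability alpha (a simplification).
    """
    if not 0.0 <= alpha <= 1.0 or k < 0:
        raise ValueError("alpha must be in [0, 1] and k must be >= 0")
    if alpha == 1.0:
        return float(k + 1)
    # Geometric series: 1 + alpha + alpha^2 + ... + alpha^k
    return (1.0 - alpha ** (k + 1)) / (1.0 - alpha)


def relative_speedup(alpha: float, k: int, draft_cost: float) -> float:
    """Speedup over plain autoregressive decoding under the same model.

    One step costs k drafter passes plus one target pass, measured in
    units of one target pass. draft_cost is a hypothetical ratio.
    """
    return expected_tokens_per_step(alpha, k) / (k * draft_cost + 1.0)


if __name__ == "__main__":
    # Hypothetical acceptance rates; compare 7 vs 8 speculative tokens.
    for alpha in (0.70, 0.80, 0.90):
        s7 = relative_speedup(alpha, 7, draft_cost=0.05)
        s8 = relative_speedup(alpha, 8, draft_cost=0.05)
        print(f"alpha={alpha:.2f}  k=7: {s7:.3f}x  k=8: {s8:.3f}x  "
              f"change: {s8 / s7 - 1:+.2%}")
```

Under this model, going from 7 to 8 speculative tokens adds only `alpha ** 8` expected tokens per step while adding one more drafter pass, so the extra depth helps only when acceptance is high and the drafter is cheap. That fits the post's framing that the easy knobs leave little headroom, though real acceptance is not i.i.d. and measured throughput can differ.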

clem 🤗 (@ClementDelangue): Announcing the Gemma challenge!

Google, Hugging Face, and the open-source AI community choose to empower AI builders rather than sabotage them.

Fun to see the Hub becoming the platform where agents collaborate, just as it became the platform where humans collaborate.

Similar Articles

@googlegemma: Introducing the Fast Gemma Challenge with Hugging Face Over the next few days, dozens of agents will collaborate to mak…

Watch agents fight: a live challenge to speed up Gemma 4 E4B inference on a single A10G

@lvwerra: The Gemma agent collaboration started 48h ago and it is blowing up: > throughput almost 4x (~100-> 387 tok/s) > 60+ age…

@DataChaz: One orchestrator. 10 parallel agents. 100+ tokens a second. All local. The @googlegemma team just dropped a MASSIVE dem…

@analogalok: i just ran Google's brand new Unsloth Gemma4 12B dense GGUF on my RTX 4060 using llama.cpp + CUDA 13.2 21 tokens per se…
