Introducing Laguna XS 2.1 (5 minute read)

TLDR AI 07/03/26, 12:00 AM Models

model-release mixture-of-experts coding-agent open-source local-inference benchmark hugging-face

Summary

Poolside releases Laguna XS 2.1, a 33B parameter Mixture-of-Experts model with 3B activated parameters per token, designed for agentic coding, with improvements on SWE-bench Multilingual and other benchmarks, now available under the permissive OpenMDW-1.1 license.

Laguna XS 2.1 is a 33B parameter Mixture-of-Experts model optimized for agentic coding and long-horizon tasks, showing a 5.4-point improvement on SWE-bench Multilingual to 63.1%. It is supported in vLLM, SGLang, NVIDIA TensorRT-LLM, HF transformers and Ollama, and ships three quantized checkpoints (FP8, INT4 and NVFP4) for deployments with tighter VRAM and compute budgets. Licensed under OpenMDW-1.1, it enables open model distribution and is available for download on Hugging Face or via API.

Original Article

Cached at: 07/03/26, 05:22 PM

# Introducing Laguna XS 2.1

Source: [https://poolside.ai/blog/introducing-laguna-xs-2-1](https://poolside.ai/blog/introducing-laguna-xs-2-1)

Today we're releasing Laguna XS 2.1, an upgraded version of our Laguna XS.2 model. Laguna XS 2.1 is a 33B total parameter Mixture-of-Experts model with 3B activated parameters per token, designed for agentic coding and long-horizon work on a local machine. It's the same architecture as XS.2, with a notable improvement on SWE-bench Multilingual and stronger performance on terminal-style tasks.

## XS 2.1 vs XS.2

XS 2.1 improves upon XS.2 across a key field of agentic coding benchmarks. The largest move is on SWE-bench Multilingual, up 5.4 points to 63.1%.

Models compared (name: parameter count):

- Laguna XS 2.1: 33B-A3B
- Laguna XS.2: 33B-A3B
- Qwen3.6: 35B-A3B
- North Mini Code (Cohere): 30B
- MAI-Code-1-Flash: 137B
- gpt-oss-120b: 120B
- Claude Haiku 4.5: -
- GPT-5.4 Nano: -

Benchmarks reported:

- SWE-bench Verified: resolved tasks on SWE-bench Verified.
- SWE-bench Multilingual: resolved tasks on SWE-bench Multilingual.
- SWE-Bench Pro: resolved tasks on SWE-Bench Pro.
- Terminal-Bench 2.0: resolved tasks on Terminal-Bench 2.0.

Benchmarks as of 2 July 2026

† We have chosen to include dense models with larger activated parameter counts to highlight the relative efficiency of MoE models.

## A better local experience

XS 2.1 is supported in vLLM, SGLang, NVIDIA TensorRT-LLM, HF transformers and Ollama, with llama.cpp support coming soon.

We’re also making three quantized checkpoints available (FP8, INT4 and NVFP4), allowing XS 2.1 to be deployed in setups with tighter VRAM and compute budgets. We also intend to make quantized GGUF checkpoints available in the near future as part of our native llama.cpp support.

We’re also open-weighting DFlash speculator models for each XS 2.1 checkpoint. We trained these speculators to balance overhead and acceptance rate.
In our tests, these speculator models double the achieved tok/s, making local inference of XS 2.1 even faster than it was before.

We are serving the model at 256K context length on [our API](https://platform.poolside.ai/) and through [OpenRouter](https://openrouter.ai/provider/poolside).

## A more open license

We are licensing Laguna XS 2.1 under OpenMDW-1.1. We are making this change to support open model distribution for the community. OpenMDW-1.1 is fully permissive and designed for models and related artifacts, giving developers and organizations a more consistent framework for using, modifying and deploying open models.

We are glad to support the [direction NVIDIA and the Linux Foundation](https://www.linuxfoundation.org/press/linux-foundation-releases-openmdw-1.1-nvidia-adopts-openmdw-for-cosmos-isaac-gr00t-ising-and-nemotron-ai-model-families) are taking with OpenMDW, and we think this is a useful step toward reducing licensing friction for open model releases.

## Get started

- **Download the weights** from the [Laguna XS 2.1 collection](https://huggingface.co/collections/poolside/laguna-xs-21) on Hugging Face: BF16, FP8, NVFP4, and INT4.
- **Use the model** on [OpenRouter](https://openrouter.ai/provider/poolside) (poolside/laguna-xs-2.1) or via [our API](https://platform.poolside.ai/). Free and paid endpoints are both available with paid pricing matched to XS.2 at $0.10 / $0.20 / $0.05 per 1M input / output / cache-read tokens.
- **Run it locally** with Ollama, llama.cpp, TRT-LLM, vLLM, or SGLang, and add the DFlash draft model for faster inference.
- **Install [pool](https://poolside.ai/get-started)**, our terminal-based coding agent, for the best agent experience with the model.

We want to see what people build with XS 2.1, and we want your feedback. Try both models side by side and tell us where 2.1 is better and where it isn't.
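
As a rough illustration of the paid pricing quoted in the list above ($0.10 / $0.20 / $0.05 per 1M input / output / cache-read tokens), the following sketch estimates the cost of a session. The function name and the token counts are hypothetical, and it assumes cache-read tokens are billed at their own rate instead of the input rate; the post does not spell out how the provider meters cached input.

```python
# Paid pricing quoted in the post, in USD per 1M tokens.
PRICE_PER_MILLION_USD = {
    "input": 0.10,
    "output": 0.20,
    "cache_read": 0.05,
}


def estimate_cost_usd(input_tokens: int, output_tokens: int, cache_read_tokens: int = 0) -> float:
    """Estimate the cost of a session in USD.

    Assumption (not confirmed by the post): `input_tokens` counts only
    uncached input, and cached input is counted in `cache_read_tokens`.
    """
    counts = {
        "input": input_tokens,
        "output": output_tokens,
        "cache_read": cache_read_tokens,
    }
    if any(n < 0 for n in counts.values()):
        raise ValueError("token counts must be non-negative")
    return sum(counts[kind] * PRICE_PER_MILLION_USD[kind] / 1_000_000 for kind in counts)


if __name__ == "__main__":
    # Hypothetical agent session: 2M uncached input, 0.5M output, 1M cache reads.
    cost = estimate_cost_usd(2_000_000, 500_000, 1_000_000)
    print(f"${cost:.2f}")  # prints $0.35
```

In long agentic sessions most of the prompt is typically re-read from cache on each step, so the cache-read rate can matter more than the headline input rate.
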
Join our [Discord](https://discord.gg/PtTS6EwXG) to share what you find and talk to the team directly, or reach us at models@poolside.ai or on [X](https://x.com/poolsideai).

*Laguna XS.2 will sunset on our API after 1 week. XS.2 will remain available as part of [Baseten’s Model Library](https://www.baseten.co/library/laguna-xs/) for dedicated deployments.*

**Footnotes**

All benchmarking for Laguna XS 2.1 was completed using Laude Institute’s Harbor Framework with our [agent harness](https://github.com/poolsideai/pool), with a maximum of 500 steps and sandboxed execution. The same sampling parameters were used for all Laguna XS 2.1 benchmarking: temperature=1.0, top_k=20 and top_p=1, with thinking mode enabled and a context length of 256K tokens. All tasks were run in their own sandbox using 8 GB RAM/2 CPUs, with the exception of Terminal-Bench 2.0, which used 48 GB RAM/32 CPUs.

Some base task images and verifiers were patched to fix infrastructure reliability issues inherent in task setup, such as rate limits on third-party dependencies in external registries used by the verifier. All four agentic benchmarks were run with patched images. We also ran a reward-hack judge post-hoc on Laguna XS 2.1 evaluation runs and did not find significant reward hacking after joint judge review and manual review.

- SWE-bench Verified: mean pass@1 averaged over 4 attempts per task
- SWE-bench Multilingual: mean pass@1 averaged over 4 attempts per task
- SWE-Bench Pro: mean pass@1 averaged over 2 attempts per task
- Terminal-Bench 2.0: mean pass@1 averaged over 5 attempts per task; 48 GB RAM/32 CPUs

\* We used the highest publicly-referenced scores for all comparison models across each benchmark.
In all cases these were official scores published in release blog posts or equivalent, with the exception of gpt-oss-120b and Claude Haiku 4.5 where the highest published (verified) scores for SWE-Bench Pro and Terminal-Bench 2.0 are from their respective official leaderboards.
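
The footnotes report each benchmark as "mean pass@1 averaged over N attempts per task". A minimal sketch of one common reading of that metric follows: compute the pass rate per task across its attempts, then average over tasks. The function name, task IDs and outcomes are hypothetical, and this is not confirmed to be the exact aggregation used for the published numbers.

```python
from statistics import mean


def mean_pass_at_1(results: dict[str, list[bool]]) -> float:
    """Average per-task pass rate, where each task has one or more attempts.

    `results` maps a task ID to the pass/fail outcome of each independent attempt.
    """
    if not results:
        raise ValueError("no tasks given")
    per_task_rates = []
    for task_id, attempts in results.items():
        if not attempts:
            raise ValueError(f"task {task_id!r} has no attempts")
        # Fraction of this task's attempts that were resolved.
        per_task_rates.append(sum(attempts) / len(attempts))
    return mean(per_task_rates)


if __name__ == "__main__":
    # Hypothetical outcomes: three tasks, four attempts each.
    example = {
        "task-a": [True, True, False, True],
        "task-b": [False, False, False, False],
        "task-c": [True, True, True, True],
    }
    print(f"{mean_pass_at_1(example):.1%}")  # prints 58.3%
```

When every task has the same number of attempts, this equals the overall fraction of successful attempts; averaging over several attempts reduces the run-to-run noise that sampling at temperature 1.0 introduces.
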

Similar Articles

poolside/Laguna-M.1 · Hugging Face - 225B-A23B

poolside/Laguna-XS.2

@cline: New free model in Cline! Laguna M.1 by Poolside. Speedy 225B total parameter model with 256k context, built for agentic…

JetBrains's Mellum 2 (49 minute read)

Laguna by Poolside
