I built an arena where LLMs sword-fight with real physics. You decide which part of the blade is sharp, vote blind, and free OpenRouter models battle for Elo. Llama 3.3 is currently stabbing GPT-OSS in the face.
Summary
A new arena lets LLMs control physics ragdolls in weapon duels. Users define each weapon's damage zones and vote blind, and the models battle for Elo. Free OpenRouter models such as Llama 3.3 and GPT-OSS compete, and the infrastructure is self-hostable.
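For readers unfamiliar with Elo, here is a minimal sketch of the standard Elo update applied to a single duel. The post does not say how the arena computes ratings, so the K-factor of 32, the 1500 starting rating, and the function names `expected_score` and `update_elo` are all assumptions made for illustration, as is the idea that each duel produces exactly one winner and one loser.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expectation: probability that A beats B."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(winner: float, loser: float, k: float = 32.0) -> tuple[float, float]:
    """Return the new (winner, loser) ratings after one decided duel.

    k=32 is an assumed K-factor, not a value taken from the arena.
    """
    # The winner gains more when the win was less expected.
    delta = k * (1.0 - expected_score(winner, loser))
    return winner + delta, loser - delta


# Two hypothetical models that both start at an assumed rating of 1500.
new_winner, new_loser = update_elo(1500.0, 1500.0)
print(new_winner, new_loser)
```

Because the winner's gain equals the loser's loss, the total rating in the pool stays constant, and an upset win over a higher-rated model moves both ratings further than an expected win does.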
Similar Articles
Built a lightweight Python framework for local LLM roleplay (Ollama/Phi-3) to stop context drift. Looking for feedback.
A lightweight Python framework for local LLM roleplay using Ollama and Phi-3, featuring context preservation and native streaming to prevent character drift.
Built a Tauri v2 desktop chat shell for local LLMs — point it at Ollama / llama.cpp / any OpenAI-compatible endpoint, MIT, ~12 MB binary
A Tauri v2 desktop chat shell for local LLMs that connects to Ollama, llama.cpp, or any OpenAI-compatible endpoint. The project is MIT-licensed and ships as a ~12 MB binary.
LLM planner - pick a rig for your use-case/model/budget, or pick models for your rig. 60+ builds, 50+ models, 130+ cited t/s sources, 150+ reviewer YouTube videos, idle+active watts, multi-region prices, regular updates.
A comprehensive web tool and public dataset that helps users choose the right hardware for running LLMs, featuring 60+ builds, 50+ models, performance benchmarks, and reviewer videos, with two-way matching between models and hardware.
Evaluating open source LLMs on Autonomous Codenames Simulations
A developer built a Codenames simulation arena to evaluate open-source LLMs on long-range collaboration, finding DeepSeek v4 Flash outperformed others with high game logic alignment, while Qwen 3 Next and GPT 5.4 Nano struggled with rule constraints and perspective-taking.
LlamaStation v0.9 — llama.cpp GUI for Windows with multi-backend support, TurboQuant, MTP and more
LlamaStation v0.9 is a Windows GUI for llama.cpp that offers a clean interface with full parameter control, multiple backends (official, TurboQuant, AtomicChat, BeeLlama), real-time VRAM monitoring, per-model profiles, voice mode, and headless mode, all without intermediate layers like Ollama.