Llama-Studio is a WebUI for managing llama-server sessions, allowing configuration, monitoring, and control of multiple instances for local development and experimentation.
Georgi Gerganov shared a one-liner to launch the quantized 27B Qwen3.6 model with llama-server using default speculative-decoding settings.
The author shares a working llama-server config for running the 35B-MoE Qwen3.6 model on an 8GB RTX 4060, highlighting a max_tokens trap caused by unconstrained internal reasoning and its fix via a per-request thinking_budget_tokens cap.
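The trap described above can be sketched as a request payload: without a cap, hidden reasoning tokens count against the completion budget and can exhaust max_tokens before any visible answer is produced. A minimal sketch, assuming an OpenAI-style chat endpoint; the field and model names here are illustrative, with `thinking_budget_tokens` taken from the post's described fix rather than a verified API:

```python
import json

# Hypothetical request body: cap hidden reasoning well below max_tokens
# so the visible completion is not starved (assumed field names).
payload = {
    "model": "qwen3.6-35b-moe",        # assumed model identifier
    "max_tokens": 1024,                 # total completion budget
    "thinking_budget_tokens": 256,      # per-request cap on internal reasoning
    "messages": [{"role": "user", "content": "Summarize this diff."}],
}

print(json.dumps(payload, indent=2))
```

The key design point is keeping the reasoning cap strictly smaller than max_tokens, so a reasoning-heavy prompt still leaves room for output tokens.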