Hexllama is a free, open-source desktop GUI and template manager for llama.cpp that simplifies CLI flag management, version updates, and HF model downloads, enabling multi-model execution.
[Introducing Hexllama](https://reddit.com/link/1tfqrbt/video/uobdgqq1hp1h1/player)

Hey, I’ve always found **llama-server** to be more than enough for testing local models, mostly because it guarantees you always have the absolute latest llama.cpp features and architecture support. But keeping track of different CLI commands, context sizes, and batch settings for each model was becoming a massive headache, and juggling multiple terminal tabs whenever I wanted to run two models at once was annoying.

So I built **Hexllama**. It's a fast desktop interface that gets out of your way and just makes managing llama.cpp easier. No walled gardens, just a clean wrapper.

**What it actually does:**

* **Template-Based Execution:** You configure your CLI flags (threads, context, etc.) once via a visual editor, save it as a template, and from then on it’s just one click to run.
* **Built-in llama.cpp Version Manager:** This is the feature I use the most. It auto-checks the ggml-org repo, lets you download new releases directly in the app, and lets you swap backends instantly (super useful when a new model architecture drops and needs a specific build).
* **Integrated HF Downloader:** Search HuggingFace directly in the app and click to download GGUFs. It handles pausing/resuming and, when a download finishes, automatically generates a baseline execution template based on the model's parameters.
* **Multi-Model & API-Only Mode:** You can run multiple models simultaneously on different ports without conflict. Launch them in the standard "Chat UI" mode (opens the built-in llama.cpp web interface), or in "API Only" mode to serve them silently in the background for things like SillyTavern or OpenWebUI.

It’s completely open source. I built this mainly for my own workflow, but I figured some of you might find it useful instead of wrestling with bash scripts. Free. Open source. MIT.
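For context, a template is essentially a saved llama-server command line. Here's a rough sketch of what running two instances by hand looks like; the model paths, ports, and flag values are made-up examples, and exact flag availability depends on your llama.cpp build:

```shell
# Hypothetical sketch of the raw llama-server commands a template replaces.
# Model paths, ports, and flag values below are illustrative only.

# Model A in "Chat UI" mode (built-in llama.cpp web interface on port 8080):
# -m = model file, -c = context size, -t = CPU threads.
llama-server -m ~/models/model-a.Q4_K_M.gguf -c 8192 -t 8 --port 8080 &

# Model B on a second port in "API Only" mode: --no-webui serves just the
# OpenAI-compatible API in the background, e.g. for SillyTavern or OpenWebUI.
llama-server -m ~/models/model-b.Q4_K_M.gguf -c 4096 --port 8081 --no-webui &
```

Multiplied across several models and backend versions, that's the bookkeeping the templates and the version manager take off your hands.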
**GitHub Repo + Download:** [https://andercoder.com/hexllama](https://andercoder.com/hexllama) (install via pre-compiled releases or build from source). Let me know what you think! Any feedback, bug reports, or PRs are highly appreciated. Love this sub.
Llama-Studio is a WebUI for managing llama-server sessions, allowing configuration, monitoring, and control of multiple instances for local development and experimentation.
The article draws a parallel between llama.cpp and Linux, positioning the open-source library as foundational infrastructure for running large language models.
The author built a custom llama.cpp server and Mikupad UI to enable local inference and activation steering with Anthropic's open-weight Natural Language Autoencoders. A LoRA version is in development to reduce memory requirements.
The author introduces Vellium, an open-source cross-platform desktop application for interacting with LLMs, featuring new desktop widgets and a visual interface for AI agents that support MCP servers and file manipulation.