I made a UI and server for using Anthropic's new Natural Language Autoencoders locally with llama.cpp

Reddit r/LocalLLaMA 05/13/26, 06:15 PM Tools

natural-language-autoencoders anthropic llama-cpp open-source activation-steering local-inference mikupad

Summary

The author built a custom llama.cpp server and a Mikupad UI to enable local inference and activation steering with Anthropic's open-weight Natural Language Autoencoders. A LoRA version is in development to reduce memory requirements.

Anthropic's first open-weight models, [Natural Language Autoencoders](https://www.anthropic.com/research/natural-language-autoencoders) (NLAs), are just fine-tunes of popular open-weight models. They do not modify the architecture or the modeling code, so inference with llama.cpp is mostly trivial. I packaged every NLA feature (activation extraction, activation explanation, activation reconstruction, and explanation-edit steering) into a [custom llama.cpp server](https://github.com/thomasgauthier/nla.cpp). It comes with a Mikupad UI for token-level activation explanation and steering. I'm currently working on a LoRA version so that a single model can be loaded into memory instead of all three (the base model, the actor model and the critic). Stay tuned!
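To make "explanation-edit steering" a little more concrete, here is a minimal Python sketch. It rests on one assumption: steering reconstructs an activation from the original explanation and from an edited explanation, then adds the difference back to the live activation. This is a generic activation-steering pattern and is not taken from nla.cpp. The names `steer_activation`, `toy_reconstruct` and the `strength` parameter are hypothetical, and the toy reconstructor only stands in for the real reconstruction model.

```python
# Hypothetical sketch of explanation-edit steering; NOT the nla.cpp implementation.
from typing import Callable, List, Sequence

Vector = List[float]


def steer_activation(
    activation: Sequence[float],
    explanation: str,
    edited_explanation: str,
    reconstruct: Callable[[str], Vector],
    strength: float = 1.0,
) -> Vector:
    """Shift an activation by the difference between two reconstructions.

    `reconstruct` is an assumed stand-in for a model that maps an
    explanation (text) back to an activation vector.
    """
    original = reconstruct(explanation)
    edited = reconstruct(edited_explanation)
    if not (len(activation) == len(original) == len(edited)):
        raise ValueError("activation and reconstructions must have the same dimension")
    # The steering vector is whatever the text edit changes in activation space,
    # scaled by `strength` and added to the current activation.
    return [a + strength * (e - o) for a, o, e in zip(activation, original, edited)]


def toy_reconstruct(text: str) -> Vector:
    """Toy 3-dim 'reconstruction' from keyword presence (illustration only)."""
    words = text.lower().split()
    return [float("cat" in words), float("dog" in words), float("happy" in words)]


h = [0.5, 0.1, 0.2]
steered = steer_activation(h, "a happy cat", "a happy dog", toy_reconstruct)
print(steered)
```

Because the unchanged part of the explanation ("happy") reconstructs identically in both versions, it cancels out. Only the edited concept (cat to dog) shifts the activation. The `strength` knob scales how hard that shift is applied.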


Similar Articles

Automated AI researcher running locally with llama.cpp

Reddit r/LocalLLaMA

ml-intern is a harness for AI agents that integrates with Hugging Face's libraries and now supports running local models via llama.cpp or ollama, enabling an automated AI researcher to run 24/7 on a laptop.

@ggerganov: llama.cpp now has an official website: https://llama.app Our goal is to make local AI accessible to everyone, and impro…

X AI KOLs Timeline

llama.cpp, the popular local AI inference tool, now has an official website (llama.app) with a cross-platform installer and improved user experience to make local AI more accessible.

Built a Tauri v2 desktop chat shell for local LLMs — point it at Ollama / llama.cpp / any OpenAI-compatible endpoint, MIT, ~12 MB binary

Reddit r/LocalLLaMA

Built a Tauri v2 desktop chat shell for local LLMs that can connect to Ollama, llama.cpp, or any OpenAI-compatible endpoint. The project is MIT licensed and produces a ~12 MB binary.

Local LLM autocomplete + agentic coding on a single 16GB GPU + 64GB RAM

Reddit r/LocalLLaMA

A technical guide on setting up local LLM autocomplete (Qwen2.5-Coder-7B) and agentic coding (Qwen3.6-35B-A3B) on a single 16GB GPU with 64GB+ RAM using llama.cpp, including commands and performance benchmarks.

ggml-org/llama.cpp

GitHub Trending (daily)

llama.cpp is an open-source C/C++ library for efficient LLM inference on local hardware, supporting various quantization methods and multiple backends (CPU, GPU, etc.).