I made a UI and server for using Anthropic's new Natural Language Autoencoders locally with llama.cpp

Reddit r/LocalLLaMA Tools

Summary

The author built a custom llama.cpp server and Mikupad UI to enable local inference and activation steering with Anthropic's open-weight Natural Language Autoencoders. A LoRA version is in development to reduce memory requirements.

Anthropic's first open weight models, [Natural Language Autoencoders](https://www.anthropic.com/research/natural-language-autoencoders), are just finetunes of popular open weight models. They do not modify architecture and modeling code so inference with llama.cpp is mostly trivial. I packaged every feature of NLAs (namely activation extraction, activation explanation, activation reconstruction and explanation-edit steering) into a [custom llama.cpp server](https://github.com/thomasgauthier/nla.cpp). It comes with a Mikupad UI for token-level activation explanation and steering. I'm currently working on a LoRA version so we can load a single model into memory instead of needing all three models (base model, actor model and critic) loaded, stay tuned!
Original Article

Similar Articles

Automated AI researcher running locally with llama.cpp

Reddit r/LocalLLaMA

ml-intern is a harness for AI agents that integrates with Hugging Face's libraries and now supports running local models via llama.cpp or ollama, enabling an automated AI researcher to run 24/7 on a laptop.

ggml-org/llama.cpp

GitHub Trending (daily)

llama.cpp is an open-source C/C++ library for efficient LLM inference on local hardware, supporting various quantization methods and multiple backends (CPU, GPU, etc.).