LokalBot - fully local macOS app: meetings, autocomplete, and day tracking that all run on your machine with a user-friendly UI

Reddit r/LocalLLaMA Tools

Summary

LokalBot is a fully local macOS app that runs AI models on-device for meeting transcription and summarization, autocomplete in any app, and day tracking, with full privacy and no cloud dependency.

Been lurking here a while, and this sub is basically why LokalBot exists. It's a Mac app that records and summarizes your meetings, autocompletes your typing in any app, and tracks where your day went, with every model running on-device. No cloud, no account, no API keys. I used to cover most of these workflows with separate apps like Granola and Cotypist; now a single app does all of them with no third-party inference cost.

Heads up first: Apple Silicon / macOS 15+ only. It's welded to the Neural Engine, MLX, and Core Audio, so no Linux/NVIDIA. I'm running it on a MacBook M4 Max with 48GB of RAM, where it runs well with some spikes. If you have 16-24GB of RAM, my model defaults probably won't run as seamlessly for you, but there are good alternatives in the app's model settings.

The model stack:

- Summaries, chat, and cotyping: a bundled llama.cpp, with in-process libllama for cotyping's low latency and llama-server for everything else. Point any of them at your own GGUF, an Ollama or OpenAI-compatible endpoint, or Apple Intelligence.
- Transcription: Granite Speech 4.1 / Parakeet / Whisper / Qwen3-ASR via CoreML/MLX on the Neural Engine. Parakeet clocks ~190× realtime.
- Semantic search: Qwen3-Embedding 0.6B GGUF on a second llama-server (--embeddings), vectors in SQLite, brute-force cosine. At personal scale "brute force" is just "instant," and it adds zero dependencies.
- Diarization: optional pyannote (via FluidAudio) to split "Them" into Them 1 / Them 2.
- Model browser: in-app Hugging Face browser to search and download GGUFs, with a per-model hardware-fit advisory.

My current defaults, the ones I found best in real usage (very open to being told I'm wrong):

- Transcription: IBM Granite Speech 4.1 (2B) Q4
- Summarization: Qwen 3.6 35B-A3B Q4_K_M
- Cotyping: Gemma 4 E4B Q5 XL

Privacy is the whole point. The only network call is the one-time model download; after that it's fully offline. Point Little Snitch at it during a meeting and enjoy the flattest network graph you've ever seen. Optional screenshots are AES-GCM sealed and auto-delete.

GitHub: https://github.com/stevyhacker/lokalbot
Landing: https://lokalbot.com

Mostly I'd love this crowd's take on the model picks, especially better local ASR and small, fast cotyping models. What would you run?
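
For anyone wondering what "vectors in SQLite, brute-force cosine" can look like in practice, here is a minimal Python sketch using only the standard library. It is not LokalBot's actual code: the `chunks` table, the float32 BLOB layout, the helper names, and the 3-dimensional toy vectors are all assumptions made for illustration (a real setup would store the embeddings returned by the embedding model).

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats into a float32 BLOB (assumed storage layout)."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob):
    """Decode a float32 BLOB back into a tuple of floats."""
    return struct.unpack(f"{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(conn, query_vec, k=3):
    """Brute force: score every stored row against the query, keep the top k."""
    rows = conn.execute("SELECT id, text, vec FROM chunks").fetchall()
    scored = [(cosine(query_vec, unpack(blob)), row_id, text)
              for row_id, text, blob in rows]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical schema and toy data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, vec BLOB)")
toy = [
    ("budget review", [1.0, 0.0, 0.0]),
    ("hiring plan", [0.0, 1.0, 0.0]),
    ("budget follow-up", [0.9, 0.1, 0.0]),
]
conn.executemany("INSERT INTO chunks (text, vec) VALUES (?, ?)",
                 [(text, pack(vec)) for text, vec in toy])

results = search(conn, [1.0, 0.0, 0.0], k=2)
for score, row_id, text in results:
    print(f"{score:.3f}  {text}")
```

The whole search is one table scan plus a sort, so the cost grows linearly with the number of stored chunks. That is the trade the post describes: no vector index and no extra dependency, in exchange for scanning every row on each query.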