LokalBot is a fully local macOS app that runs AI models on-device for meeting transcription and summarization, autocomplete in any app, and day tracking, with full privacy and no cloud dependency.
Been lurking here a while, and this sub is basically why LokalBot exists. It's a Mac app that records and summarizes your meetings, autocompletes your typing in any app, and tracks where your day went, with every model running on-device. No cloud, no account, no API keys. I used to cover most of these workflows with separate apps like Granola and Cotypist; now a single app handles all of them with no additional third-party inference cost.

Heads up first: Apple Silicon / macOS 15+ only. It's welded to the Neural Engine, MLX, and Core Audio, so no Linux/NVIDIA. I'm running it on a MacBook M4 Max with 48GB of RAM, where it runs well with some spikes. If you have 16-24GB of RAM, my model defaults probably won't run as seamlessly for you, but there are some good alternatives in the app's model settings.

The model stack:

- Summaries, chat, and cotyping run on a bundled llama.cpp: in-process libllama for cotyping's low latency, llama-server otherwise. Point any of them at your own GGUF, an Ollama or OpenAI-compatible endpoint, or Apple Intelligence.
- Transcription: Granite Speech 4.1 / Parakeet / Whisper / Qwen3-ASR via CoreML/MLX on the Neural Engine. Parakeet clocks ~190× realtime.
- Semantic search: Qwen3-Embedding 0.6B GGUF on a second llama-server (--embeddings), vectors in SQLite, brute-force cosine. At personal scale "brute force" is just "instant," and it adds zero dependencies.
- Diarization: optional pyannote (via FluidAudio) to split "Them" into Them 1 / Them 2.
- In-app Hugging Face browser to search and download GGUFs, with a per-model hardware-fit advisory.

My current defaults, the ones I found best in real usage (very open to being told I'm wrong):

- Transcription: IBM Granite Speech 4.1 (2B) Q4
- Summarization: Qwen 3.6 35B-A3B Q4_K_M
- Cotyping: Gemma 4 E4B Q5 XL

Privacy is the whole point. The only network call is the one-time model download; after that it's fully offline. Point Little Snitch at it during a meeting and enjoy the flattest network graph you've ever seen. Optional screenshots are AES-GCM sealed and auto-delete.

GitHub: https://github.com/stevyhacker/lokalbot

Landing: https://lokalbot.com

Mostly I'd love this crowd's take on the model picks, especially better local ASR and small, fast cotyping models. What would you run?
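For anyone curious what "vectors in SQLite, brute-force cosine" can look like in practice, here is a minimal Python sketch. It is not LokalBot's actual code: the `chunks` table, the float32 BLOB layout, the helper names, and the toy 3-dimensional vectors are all assumptions made for illustration. In a real setup the vectors would come from the embedding model rather than being typed in by hand.

```python
import math
import sqlite3
import struct


def pack(vec):
    """Serialize a list of floats as a little-endian float32 BLOB."""
    return struct.pack(f"<{len(vec)}f", *vec)


def unpack(blob):
    """Inverse of pack(): BLOB -> tuple of floats."""
    return struct.unpack(f"<{len(blob) // 4}f", blob)


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def search(conn, query_vec, k=3):
    """Score every stored row against query_vec and return the top k."""
    rows = conn.execute("SELECT id, text, vec FROM chunks").fetchall()
    scored = [
        (cosine(query_vec, unpack(blob)), row_id, text)
        for row_id, text, blob in rows
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]


# Hypothetical schema and hand-written toy vectors (assumptions, not real embeddings).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (id INTEGER PRIMARY KEY, text TEXT, vec BLOB)")
samples = [
    ("budget review with finance", [0.9, 0.1, 0.0]),
    ("sprint planning notes", [0.1, 0.9, 0.0]),
    ("lunch order", [0.0, 0.1, 0.9]),
]
conn.executemany(
    "INSERT INTO chunks (text, vec) VALUES (?, ?)",
    [(text, pack(vec)) for text, vec in samples],
)

results = search(conn, [1.0, 0.0, 0.0], k=2)
for score, row_id, text in results:
    print(f"{score:.3f}  {text}")
```

Every query scans the whole table, so cost grows linearly with the number of stored chunks. That is the trade-off behind the "brute force" label: no index to build or maintain, at the price of a full scan per search.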
Meetily is a privacy-first, open-source AI meeting assistant that captures, transcribes, and summarizes meetings entirely locally on the user's infrastructure.
A developer created a free, open-source AI assistant that floats on macOS desktop, runs entirely locally using models like Gemma and Qwen via Ollama, with no API keys or subscriptions, ensuring data privacy and offline capability.
The Hedy meeting app now supports fully offline AI summaries using local models like Qwen and Gemma via llama.cpp, with options for bring-your-own-model and hardware-aware model selection. The update enables Wi-Fi-free operation on Apple Silicon and Windows GPUs, though cloud still offers higher speed and quality.