@GitHub_Daily: An open-source, fully local speech-to-text tool on GitHub built for the Mac: MacParakeet, with notably high recognition accuracy. Drag in audio or video files, or paste a YouTube link, to quickly get a transcript with timestamps and speaker labels. It can also record system audio and the microphone at the same time…
Summary
MacParakeet is a new open-source Mac application that provides fast, fully local voice transcription using Apple's Neural Engine and NVIDIA's Parakeet model, ensuring privacy by keeping audio data on-device.
Cached: 2026/05/10 08:24
An open-source, fully local speech-to-text tool on GitHub built for the Mac: MacParakeet, with notably high recognition accuracy. Drag in audio or video files, or paste a YouTube link, to quickly get a transcript with timestamps and speaker labels. It can also record system audio and the microphone at the same time, so you can watch a live transcript while taking notes during a meeting. GitHub: http://github.com/moona3k/macparakeet… Speech recognition runs entirely on-device, directly on Apple's Neural Engine, so it is very fast and audio data never leaves your machine. For more advanced needs, you can also connect a local Ollama instance or various LLM APIs to auto-generate meeting summaries and tidy up formatting. A ready-to-use installer is provided, for Apple Silicon chips only. If you need a fast, privacy-first speech-to-text tool, give it a try.
moona3k/macparakeet
Source: https://github.com/moona3k/macparakeet
MacParakeet
Fast voice app for Mac with fully local speech and optional AI. Free and open-source.
There are many voice transcription/dictation apps, but this one is mine.
MacParakeet runs NVIDIA’s Parakeet TDT on Apple’s Neural Engine via FluidAudio CoreML. The v0.6 release scope includes system-wide dictation, file/URL transcription, meeting recording, and optional local WhisperKit recognition for languages Parakeet does not cover. All speech recognition happens on your Mac.
Release status
The notarized DMG is the stable release channel.
| Channel | Status | Includes |
|---|---|---|
| Stable DMG | Recommended for normal use | Dictation, file/video/YouTube transcription, meeting recording, optional WhisperKit, exports, vocabulary, AI features |
| main branch | Development | v0.6 release scope plus hidden calendar auto-start code under `AppFeatures.calendarEnabled = false` |
Calendar reminders, auto-start, and auto-stop are implemented in source but hidden from the v0.6 product surface while they await end-to-end validation.
What it does
Dictation — Press a hotkey in any app, speak, text gets pasted. Hold for push-to-talk, double-tap for persistent recording. Works system-wide.
File transcription — Drag audio or video files, or paste a YouTube URL. Full transcript with word-level timestamps, speaker labels, and export to 7 formats (TXT, Markdown, SRT, VTT, DOCX, PDF, JSON). Assign global hotkeys to trigger File or YouTube transcription from anywhere.
Meeting recording — Record system audio and microphone together, see a live local transcript preview, take notes during the call, then save the finalized transcript to the library with export, prompts, and chat.
Text cleanup — Filler word removal, custom word replacements, text snippets with triggers. Deterministic pipeline, no LLM needed.
AI features — Optional summaries, chat, and an AI formatter. Connect any cloud provider (OpenAI, Anthropic, Gemini, OpenRouter), local runtime (Ollama, LM Studio), OpenAI-compatible endpoint, or CLI tool (Claude Code, Codex). Entirely opt-in.
Performance
- ~155x realtime — 60 min of audio in ~23 seconds
- ~2.5% word error rate (Parakeet TDT 0.6B-v3)
- ~66 MB working memory per active Parakeet inference slot
- 25 European languages with Parakeet auto-detection
- Optional local WhisperKit engine for Korean, Japanese, Chinese, and many other languages
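The throughput and speed claims above are mutually consistent, as a quick arithmetic check shows (the numbers come straight from the list; the variable names are just for illustration):

```swift
// Sanity check of the quoted throughput: 60 minutes of audio
// transcribed in about 23 seconds of wall-clock time.
let audioSeconds = 60.0 * 60.0            // 3600 s of audio
let wallSeconds = 23.0                    // ~23 s to transcribe it
let realtimeFactor = audioSeconds / wallSeconds
print(realtimeFactor)                     // roughly 156x, in line with the ~155x claim
```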
Limitations
- Apple Silicon only (M1/M2/M3/M4)
- Parakeet is best for English and supported European languages
- WhisperKit multilingual support requires a separate local model download before first use
Get it
Download: Grab the notarized DMG or visit macparakeet.com. Drag to Applications, done.
First launch downloads the speech model (~6 GB) plus speaker-detection assets (~130 MB). Everything works fully offline after that.
Build from source:
```shell
git clone https://github.com/moona3k/macparakeet.git
cd macparakeet
swift test
scripts/dev/run_app.sh   # build, sign, launch
```
The dev script creates a signed .app bundle so macOS grants mic and accessibility permissions. It disables target-level Xcode signing, then signs the finished bundle with the best available local identity. Override with MACPARAKEET_CODESIGN_IDENTITY="Your Identity" if needed.
CLI:
```shell
swift run macparakeet-cli transcribe /path/to/audio.mp3
swift run macparakeet-cli models download whisper-large-v3-v20240930-turbo-632MB
swift run macparakeet-cli transcribe /path/to/korean.mp3 --engine whisper --language ko --format json
swift run macparakeet-cli models status
swift run macparakeet-cli history
```
The Whisper CLI commands above require a downloaded local WhisperKit model.
Tech stack
| Layer | Choice |
|---|---|
| STT | Parakeet TDT 0.6B-v3 via FluidAudio CoreML (default) + optional local WhisperKit engine |
| STT orchestration | Shared runtime + explicit scheduler with a reserved dictation slot and a shared meeting/file slot; speech-engine routing and meeting-session pinning |
| Language | Swift 6.0 + SwiftUI |
| Database | SQLite via GRDB |
| Auto-updates | Sparkle 2 |
| YouTube | yt-dlp |
| Platform | macOS 14.2+, Apple Silicon |
Vocabulary
The Vocabulary panel controls how dictated text is cleaned up before pasting. No AI involved — it’s a fast, deterministic pipeline that runs in under 1ms.
You choose between two processing modes:
- Raw — Paste exactly what the speech engine produces, no changes
- Clean (default) — Run the text through a multi-step pipeline before pasting
The Clean pipeline applies these steps in order:
- Filler removal — Strips “um”, “uh”, and sentence-start fillers like “so”, “well”, “like”
- Custom words — Applies your word replacement rules (e.g., “aye pee eye” becomes “API”, or “kubernetes” gets capitalized to “Kubernetes”). Case-insensitive, whole-word matching. Words can be toggled on/off without deleting.
- Voice Return — If you’ve defined a trigger phrase (e.g., “press return”) and speak it at the end of a dictation, it’s stripped from the output and a Return keypress is simulated after paste
- Snippet expansion — Replaces short trigger phrases with longer text (e.g., “my signature” expands to “Best regards, David”). Triggers are natural language phrases because that’s what the speech engine outputs. Matched longest-first to prevent collisions.
- Whitespace cleanup — Collapses spaces, fixes punctuation spacing, capitalizes the first letter
Every dictation stores both the raw and clean transcript so you can always see what changed.
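The Clean pipeline above can be sketched as a chain of deterministic string transforms. This is an illustrative sketch only, not MacParakeet's actual implementation: the function name, rule sets, and regex details are invented for the example, and sentence-start fillers and the Voice Return step are omitted for brevity.

```swift
import Foundation

// Hypothetical sketch of a deterministic cleanup pipeline:
// filler removal -> custom words -> snippet expansion -> whitespace cleanup.
struct CleanupRules {
    var fillers: [String] = ["um", "uh"]
    var replacements: [String: String] = ["aye pee eye": "API", "kubernetes": "Kubernetes"]
    var snippets: [String: String] = ["my signature": "Best regards, David"]
}

func clean(_ raw: String, rules: CleanupRules = CleanupRules()) -> String {
    var text = raw

    // 1. Filler removal: strip standalone filler words (case-insensitive, whole-word),
    //    along with a trailing comma and spacing.
    for filler in rules.fillers {
        let pattern = "(?i)\\b\(filler)\\b,?\\s*"
        text = text.replacingOccurrences(of: pattern, with: "", options: .regularExpression)
    }

    // 2. Custom words: case-insensitive, whole-word replacement rules.
    for (spoken, written) in rules.replacements {
        let pattern = "(?i)\\b\(NSRegularExpression.escapedPattern(for: spoken))\\b"
        text = text.replacingOccurrences(of: pattern, with: written, options: .regularExpression)
    }

    // 3. Snippet expansion: match triggers longest-first to prevent collisions.
    for (trigger, expansion) in rules.snippets.sorted(by: { $0.key.count > $1.key.count }) {
        let pattern = "(?i)\\b\(NSRegularExpression.escapedPattern(for: trigger))\\b"
        text = text.replacingOccurrences(of: pattern, with: expansion, options: .regularExpression)
    }

    // 4. Whitespace cleanup: collapse space runs, trim, capitalize the first letter.
    text = text.replacingOccurrences(of: "\\s+", with: " ", options: .regularExpression)
        .trimmingCharacters(in: .whitespacesAndNewlines)
    return text.isEmpty ? text : text.prefix(1).uppercased() + text.dropFirst()
}
```

Because every step is a plain string transform with no model in the loop, the whole chain stays fast and reproducible, which is what makes storing both the raw and clean transcripts cheap.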
AI Features
AI features are entirely opt-in and separate from speech recognition — transcription is always local. The LLM only sees transcript text, never audio.
What it does:
- Summarize — After a transcription finishes, click Summarize and pick a prompt (“Summary”, “Action Items & Decisions”, “Chapter Breakdown”, etc.) or write your own. The LLM processes the transcript and streams back a summary. You can generate multiple summaries per transcript, each in its own tab. Prompts marked as auto-run generate summaries automatically for new transcriptions.
- Chat — Ask questions about a transcript in a multi-turn chat interface. The LLM answers based on the transcript content.
- AI formatter — Optionally run your dictation and file transcripts through your AI provider to clean up grammar, punctuation, and paragraphing. Toggle on/off, customize the prompt, or reset to default.
Supported providers:
| Type | Options |
|---|---|
| Cloud | Anthropic (Claude), OpenAI, Google Gemini, OpenRouter |
| Local | Ollama, LM Studio |
| Custom | OpenAI-Compatible (any API-shaped endpoint — vLLM, LocalAI, LiteLLM, llama.cpp server, third-party hosts) |
| CLI subprocess | Claude Code, Codex, or another configured command |
Setup: In Settings → AI Provider, pick a provider, enter an API key (cloud) or confirm the local server/CLI command is available, select a model, and hit Test Connection. Cloud providers store keys in the macOS Keychain. Ollama and LM Studio can keep LLM inference on-device. CLI subprocess providers run the configured command locally, but that command may contact its own cloud service.
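All of the OpenAI-compatible options above ultimately speak the same chat-completions request shape. As a rough sketch of what a "Summarize" call carries (the field names follow the public OpenAI chat-completions schema; the struct and function names here are invented and are not MacParakeet's internal types):

```swift
import Foundation

// Hypothetical sketch of the JSON body shared by OpenAI-compatible providers.
// The request would be POSTed to <base URL>/chat/completions, with a Bearer
// API key header for cloud providers; Ollama and LM Studio typically need no key.
struct ChatRequest: Codable {
    struct Message: Codable {
        let role: String
        let content: String
    }
    let model: String
    let messages: [Message]
}

/// Builds the payload for summarizing a finished transcript with a chosen prompt.
func summarizePayload(model: String, prompt: String, transcript: String) throws -> Data {
    let body = ChatRequest(model: model, messages: [
        .init(role: "system", content: prompt),   // the selected or custom prompt
        .init(role: "user", content: transcript), // transcript text only, never audio
    ])
    return try JSONEncoder().encode(body)
}
```

Note that only transcript text appears in the payload, which is why the privacy model below can guarantee audio never leaves the machine even with a cloud provider configured.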
Privacy
All speech recognition runs locally. Parakeet uses the Neural Engine; the optional WhisperKit engine also runs on-device. Your audio never leaves your Mac.
- No cloud STT. The model runs on-device. No audio is transmitted.
- No accounts. No login, no email, no registration.
- Opt-out telemetry. Non-identifying usage analytics and crash reporting go to a self-hosted endpoint only when telemetry is enabled. No persistent IDs, no IP storage, and no transcript/audio content is transmitted. Source code is right here — verify it yourself.
- Temp files cleaned up. Audio deleted after transcription unless you save it.
What does use the network: AI summaries and chat connect to configured LLM providers, or to whatever service a configured CLI tool chooses to use, when you choose them. Sparkle checks for app updates. YouTube transcription downloads video via yt-dlp. Telemetry and crash reports go to our self-hosted server unless you opt out. Core dictation and transcription stay fully offline.
Note: Builds from source also send telemetry by default. Opt out in Settings or set MACPARAKEET_TELEMETRY_URL to override.
Contributing
- Report bugs — Open an issue
- Submit a PR — Fork, make changes, run `swift test`, open a PR
- Read the specs — Architecture decisions and feature specs live in `spec/`
For larger changes, open an issue first.
Support
MacParakeet is free and open source. If it’s useful to you, consider sponsoring.
License
GPL-3.0. Free software. Full license.