@GitHub_Daily: An open-source, fully local speech-to-text tool on GitHub, built for the Mac: MacParakeet, with notably high recognition accuracy. Drag in audio or video files, or paste a YouTube link, to quickly get a transcript with timestamps and speaker labels. It can also record system audio and the microphone at the same time…

X AI KOLs Timeline (Tools)

Summary

MacParakeet is a new open-source Mac application that provides fast, fully local voice transcription using Apple's Neural Engine and NVIDIA's Parakeet model, ensuring privacy by keeping audio data on-device.

An open-source, fully local speech-to-text tool on GitHub, built for the Mac: MacParakeet, with notably high recognition accuracy. Drag in audio or video files, or paste a YouTube link, to quickly get a transcript with timestamps and speaker labels. It can also record system audio and the microphone at the same time, so you can watch the live transcript while taking notes in a meeting. GitHub: http://github.com/moona3k/macparakeet… Speech recognition runs entirely on-device, directly on Apple's Neural Engine: it is very fast, and audio never leaves your machine. For more advanced needs, you can also connect a local Ollama instance or various LLM APIs to auto-generate meeting summaries and tidy up formatting. A ready-to-use installer is provided; Apple Silicon only. If you need a fast, privacy-first speech-to-text tool, give it a try.

Cached: 2026/05/10 08:24

moona3k/macparakeet

Source: https://github.com/moona3k/macparakeet


MacParakeet

Fast voice app for Mac with fully local speech and optional AI. Free and open-source.

There are many voice transcription/dictation apps, but this one is mine.

macparakeet.com

Download DMG

GPL-3.0 License · macOS 14.2+ · Swift 6 · Tests passing · Apple Silicon only

MacParakeet — Transcribe view with YouTube and file input

MacParakeet — Transcription library with thumbnails

MacParakeet — Dictation history and voice stats


MacParakeet runs NVIDIA’s Parakeet TDT on Apple’s Neural Engine via FluidAudio CoreML. The v0.6 release scope includes system-wide dictation, file/URL transcription, meeting recording, and optional local WhisperKit recognition for languages Parakeet does not cover. All speech recognition happens on your Mac.

Release status

The notarized DMG is the stable release channel.

Channel | Status | Includes
Stable DMG | Recommended for normal use | Dictation, file/video/YouTube transcription, meeting recording, optional WhisperKit, exports, vocabulary, AI features
main branch | Development | v0.6 release scope plus hidden calendar auto-start code under AppFeatures.calendarEnabled = false

Calendar reminders, auto-start, and auto-stop are implemented in source but hidden from the v0.6 product surface while they await end-to-end validation.

What it does

Dictation — Press a hotkey in any app, speak, text gets pasted. Hold for push-to-talk, double-tap for persistent recording. Works system-wide.

File transcription — Drag audio or video files, or paste a YouTube URL. Full transcript with word-level timestamps, speaker labels, and export to 7 formats (TXT, Markdown, SRT, VTT, DOCX, PDF, JSON). Assign global hotkeys to trigger File or YouTube transcription from anywhere.

Meeting recording — Record system audio and microphone together, see a live local transcript preview, take notes during the call, then save the finalized transcript to the library with export, prompts, and chat.

Text cleanup — Filler word removal, custom word replacements, text snippets with triggers. Deterministic pipeline, no LLM needed.

AI features — Optional summaries, chat, and an AI formatter. Connect any cloud provider (OpenAI, Anthropic, Gemini, OpenRouter), local runtime (Ollama, LM Studio), OpenAI-compatible endpoint, or CLI tool (Claude Code, Codex). Entirely opt-in.
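The SRT and VTT exports listed under File transcription are built from word-level timestamps. As an illustration of the format (a Python sketch of the standard SRT cue layout, not the app's actual Swift export code), a timestamp in seconds maps to an `HH:MM:SS,mmm` cue like this:

```python
# Illustrative sketch: convert a word-level timestamp in seconds into an
# SRT cue timestamp, and assemble one numbered SRT cue. Function names
# here are hypothetical, not taken from the MacParakeet codebase.
def srt_timestamp(seconds: float) -> str:
    ms = round(seconds * 1000)
    h, ms = divmod(ms, 3_600_000)
    m, ms = divmod(ms, 60_000)
    s, ms = divmod(ms, 1_000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def srt_cue(index: int, start: float, end: float, text: str) -> str:
    # An SRT cue is: sequence number, start --> end, then the text.
    return f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"

print(srt_cue(1, 0.0, 2.48, "Speaker 1: Welcome, everyone."))
```

The comma before milliseconds is SRT-specific; the VTT format uses a period instead.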

Performance

  • ~155x realtime — 60 min of audio in ~23 seconds
  • ~2.5% word error rate (Parakeet TDT 0.6B-v3)
  • ~66 MB working memory per active Parakeet inference slot
  • 25 European languages with Parakeet auto-detection
  • Optional local WhisperKit engine for Korean, Japanese, Chinese, and many other languages
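The quoted throughput is easy to sanity-check with simple arithmetic: 60 minutes of audio divided by 23 seconds of processing lands in the ~155x ballpark.

```python
# Sanity check on the realtime factor quoted above (illustrative arithmetic).
audio_seconds = 60 * 60        # 60 minutes of audio
processing_seconds = 23        # reported wall-clock transcription time
realtime_factor = audio_seconds / processing_seconds
print(f"~{realtime_factor:.0f}x realtime")
```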

Limitations

  • Apple Silicon only (M1/M2/M3/M4)
  • Parakeet is best for English and supported European languages
  • WhisperKit multilingual support requires a separate local model download before first use

Get it

Download: Grab the notarized DMG or visit macparakeet.com. Drag to Applications, done.

First launch downloads the speech model (~6 GB) plus speaker-detection assets (~130 MB). Everything works fully offline after that.

The DMG is the stable release.

Build from source:

git clone https://github.com/moona3k/macparakeet.git
cd macparakeet
swift test
scripts/dev/run_app.sh    # build, sign, launch

The dev script creates a signed .app bundle so macOS grants mic and accessibility permissions. It disables target-level Xcode signing, then signs the finished bundle with the best available local identity. Override with MACPARAKEET_CODESIGN_IDENTITY="Your Identity" if needed.

CLI:

swift run macparakeet-cli transcribe /path/to/audio.mp3
swift run macparakeet-cli models download whisper-large-v3-v20240930-turbo-632MB
swift run macparakeet-cli transcribe /path/to/korean.mp3 --engine whisper --language ko --format json
swift run macparakeet-cli models status
swift run macparakeet-cli history

The Whisper CLI commands above require a downloaded local WhisperKit model.

Tech stack

Layer | Choice
STT | Parakeet TDT 0.6B-v3 via FluidAudio CoreML (default) + optional local WhisperKit engine
STT orchestration | Shared runtime + explicit scheduler with a reserved dictation slot and a shared meeting/file slot; speech-engine routing and meeting-session pinning
Language | Swift 6.0 + SwiftUI
Database | SQLite via GRDB
Auto-updates | Sparkle 2
YouTube | yt-dlp
Platform | macOS 14.2+, Apple Silicon

Vocabulary

The Vocabulary panel controls how dictated text is cleaned up before pasting. No AI involved — it’s a fast, deterministic pipeline that runs in under 1ms.

You choose between two processing modes:

  • Raw — Paste exactly what the speech engine produces, no changes
  • Clean (default) — Run the text through a multi-step pipeline before pasting

The Clean pipeline applies these steps in order:

  1. Filler removal — Strips “um”, “uh”, and sentence-start fillers like “so”, “well”, “like”
  2. Custom words — Applies your word replacement rules (e.g., “aye pee eye” becomes “API”, or “kubernetes” gets capitalized to “Kubernetes”). Case-insensitive, whole-word matching. Words can be toggled on/off without deleting.
  3. Voice Return — If you’ve defined a trigger phrase (e.g., “press return”) and speak it at the end of a dictation, it’s stripped from the output and a Return keypress is simulated after paste
  4. Snippet expansion — Replaces short trigger phrases with longer text (e.g., “my signature” expands to “Best regards, David”). Triggers are natural language phrases because that’s what the speech engine outputs. Matched longest-first to prevent collisions.
  5. Whitespace cleanup — Collapses spaces, fixes punctuation spacing, capitalizes the first letter

Every dictation stores both the raw and clean transcript so you can always see what changed.
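The ordered, deterministic nature of the Clean pipeline can be sketched in a few lines. This is an illustrative Python approximation under stated assumptions (the app implements this in Swift; the rule tables and function name here are hypothetical, and step 3, Voice Return, is omitted because it simulates a keypress rather than transforming text):

```python
import re

# Hypothetical rule tables standing in for the user's Vocabulary settings.
FILLERS = r"\b(?:um|uh)\b"
LEADING_FILLERS = r"^(?:so|well|like)[,\s]+"
REPLACEMENTS = {"aye pee eye": "API", "kubernetes": "Kubernetes"}
SNIPPETS = {"my signature": "Best regards, David"}

def clean(text: str) -> str:
    # 1. Filler removal: mid-sentence fillers and sentence-start fillers.
    text = re.sub(FILLERS, "", text, flags=re.IGNORECASE)
    text = re.sub(LEADING_FILLERS, "", text, flags=re.IGNORECASE)
    # 2. Custom word replacements: case-insensitive, whole-word matching.
    for spoken, written in REPLACEMENTS.items():
        text = re.sub(rf"\b{re.escape(spoken)}\b", written, text, flags=re.IGNORECASE)
    # 4. Snippet expansion, matched longest-first to prevent collisions.
    for trigger in sorted(SNIPPETS, key=len, reverse=True):
        text = re.sub(rf"\b{re.escape(trigger)}\b", SNIPPETS[trigger], text,
                      flags=re.IGNORECASE)
    # 5. Whitespace cleanup: collapse spaces, fix punctuation spacing, capitalize.
    text = re.sub(r"\s+", " ", text).strip()
    text = re.sub(r"\s+([,.!?])", r"\1", text)
    return text[:1].upper() + text[1:]

print(clean("so um the kubernetes deploy is uh done"))  # → "The Kubernetes deploy is done"
```

Because every step is a plain string transform with no model inference, the pipeline stays fast and fully reproducible.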

AI Features

AI features are entirely opt-in and separate from speech recognition — transcription is always local. The LLM only sees transcript text, never audio.

What it does:

  • Summarize — After a transcription finishes, click Summarize and pick a prompt (“Summary”, “Action Items & Decisions”, “Chapter Breakdown”, etc.) or write your own. The LLM processes the transcript and streams back a summary. You can generate multiple summaries per transcript, each in its own tab. Prompts marked as auto-run generate summaries automatically for new transcriptions.
  • Chat — Ask questions about a transcript in a multi-turn chat interface. The LLM answers based on the transcript content.
  • AI formatter — Optionally run your dictation and file transcripts through your AI provider to clean up grammar, punctuation, and paragraphing. Toggle on/off, customize the prompt, or reset to default.

Supported providers:

Type | Options
Cloud | Anthropic (Claude), OpenAI, Google Gemini, OpenRouter
Local | Ollama, LM Studio
Custom | OpenAI-Compatible (any API-shaped endpoint: vLLM, LocalAI, LiteLLM, llama.cpp server, third-party hosts)
CLI subprocess | Claude Code, Codex, or another configured command

Setup: In Settings → AI Provider, pick a provider, enter an API key (cloud) or confirm the local server/CLI command is available, select a model, and hit Test Connection. Cloud providers store keys in the macOS Keychain. Ollama and LM Studio can keep LLM inference on-device. CLI subprocess providers run the configured command locally, but that command may contact its own cloud service.
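All the OpenAI-compatible options above accept the same chat-completions request shape. A hedged sketch of that request, in Python for illustration (the endpoint URL, model name, and prompt text are assumptions, not the app's code):

```python
import json

# Hypothetical sketch of the request body an OpenAI-compatible provider
# (Ollama, LM Studio, vLLM, ...) accepts for a transcript summary.
def summary_request(transcript: str, model: str = "llama3.1") -> dict:
    return {
        "model": model,
        "messages": [
            {"role": "system", "content": "Summarize this meeting transcript."},
            {"role": "user", "content": transcript},
        ],
        "stream": True,  # summaries stream back token by token
    }

body = json.dumps(summary_request("Alice: let's ship v0.6 on Friday."))
# POST this body to e.g. http://localhost:11434/v1/chat/completions
# (Ollama's OpenAI-compatible endpoint) with Content-Type: application/json.
```

Because every provider type speaks this same shape, switching between cloud, local, and custom endpoints is mostly a matter of changing the base URL and model name.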

Privacy

All speech recognition runs locally. Parakeet uses the Neural Engine; the optional WhisperKit engine also runs on-device. Your audio never leaves your Mac.

  • No cloud STT. The model runs on-device. No audio is transmitted.
  • No accounts. No login, no email, no registration.
  • Opt-out telemetry. Non-identifying usage analytics and crash reporting go to a self-hosted endpoint only when telemetry is enabled. No persistent IDs, no IP storage, and no transcript/audio content is transmitted. Source code is right here — verify it yourself.
  • Temp files cleaned up. Audio deleted after transcription unless you save it.

What does use the network: AI summaries and chat connect to configured LLM providers, or to whatever service a configured CLI tool chooses to use, when you choose them. Sparkle checks for app updates. YouTube transcription downloads video via yt-dlp. Telemetry and crash reports go to our self-hosted server unless you opt out. Core dictation and transcription stay fully offline.

Note: Builds from source also send telemetry by default. Opt out in Settings or set MACPARAKEET_TELEMETRY_URL to override.

Contributing

  • Report bugs — Open an issue
  • Submit a PR — Fork, make changes, run swift test, open a PR
  • Read the specs — Architecture decisions and feature specs live in spec/

For larger changes, open an issue first.

Support

MacParakeet is free and open source. If it’s useful to you, consider sponsoring.

License

GPL-3.0. Free software. Full license.

Similar articles

Speakmac

Product Hunt

Speakmac is a local macOS voice input tool, now with live preview and a hands-free mode.

Whisper Island by Coddo

Product Hunt

Whisper Island by Coddo is a macOS app that integrates voice transcription directly into the Mac's notch area.

@Honcia13: Open-source TTS competition is getting wild! Another new weapon for scam compounds? Tsinghua's OpenBMB just released VoxCPM2: 20B parameters trained on 2 million hours of multilingual data, with 48kHz studio-grade audio quality! The boldest part: no tokenizer at all; it does diffusion autoregression directly in continuous latent space…

X AI KOLs Timeline

Tsinghua University's OpenBMB has released VoxCPM2, a 20B-parameter open-source multilingual TTS model that performs tokenizer-free diffusion autoregressive generation in continuous latent space, delivering 48kHz studio-grade audio quality along with strong voice cloning and voice design capabilities.