Tag
FastUbu is a tool that applies modern AI techniques like indexing and transcription to the 30-year-old Ubu film archive, aiming to provide ultrafast video processing via the Kino API.
Trace is a Mac app that transcribes meetings locally without uploading audio, allowing users to flag moments mid-call and get clean markdown transcripts.
An open-source collection of 11 AI tool scripts for collecting and transcribing content from channels such as Douyin, Bilibili, and WeChat public accounts, making it easy to build a personal knowledge base. Agents such as Claude Code and Codex can install it directly.
The user explains how they used Fable, an AI tool, to edit its own launch video by leveraging code and tool calls for transcription, ffmpeg, color grading, Figma MCP, and Remotion UI, without touching a video editor.
An open-source toolkit containing 11 AI skills that supports automatic transcription of multi-platform content, knowledge base management, and industry intelligence monitoring, ready to be loaded and used in AI agents like Claude Code.
Microsoft released VibeVoice, an open-source model that processes a full hour of audio in one pass and returns a structured transcript with speaker identification and timestamps, positioning it as a free alternative to paid transcription services.
Signal Recorder SR-7 is an on-device voice recorder that transcribes audio and exports Markdown files.
The article evaluates Wispr Flow, an AI-powered transcription tool, comparing it with free alternatives like open-source models (Whisper, Canary) and built-in features (Apple dictation, Google Voice Typing), concluding that paid subscriptions may not be necessary for many users.
A user shares how they combined Granola (call transcription) and Lovable (build tool) to deliver a working prototype to a client within 15 minutes of their call ending.
Trace is a no-frills tool for offline meeting transcripts with context, available on Product Hunt.
A TechCrunch review of Amazon's Bee wearable, an AI device that records, transcribes, and summarizes conversations. The reviewer finds it useful for professional settings but expresses privacy concerns.
Voice-Pro is a web tool that integrates six top open-source models (Whisper, Demucs, CosyVoice, F5-TTS, and others), supporting YouTube video downloading, vocal removal, transcription, translation, voice cloning, and fully automatic dubbing. It is free, runs 100% locally, and reportedly takes less than 2 minutes.
yapsnap is a command-line tool that transcribes video and audio from various sources (YouTube, TikTok, etc.) to plain text using only the CPU, with no GPU or cloud service required. It uses sherpa-onnx and yt-dlp for fast, offline transcription.
PrivateScribe.ai is a fully local, MIT-licensed AI transcription platform with HIPAA safeguards, now featuring a bundled macOS app, onboarding wizard, speaker diarization, and encryption.
This paper evaluates LLMs for automatically annotating narrative macrostructure in spoken Mandarin, finding that the best model achieves near-human reliability while reducing annotation time by 65%, though performance degrades on semantically complex or lexically diverse narratives.
This paper analyzes the Huitongguanxi Huayiyiyu, a series of multilingual glossaries from the Ming dynasty, as a structured cross-linguistic transcription system that used Chinese characters to represent non-Chinese languages, revealing how Chinese phonological categories were flexibly extended for phonetic approximation.
Ontario's auditor general found that AI transcription tools for doctors generated errors and hallucinations, potentially harming patient care, and criticized inadequate government testing.
Meetily is a privacy-first, open-source AI meeting assistant that captures, transcribes, and summarizes meetings entirely locally on the user's infrastructure.
Wave is a voice-to-text tool that offers both local and cloud processing options, giving users choice over privacy and performance.
A highly optimized version of OpenAI's Whisper Large v3 using Transformers, Optimum, and Flash Attention 2, capable of transcribing 150 minutes of audio in under 2 minutes on Replicate.