audio-processing

#audio-processing

Poster: Exploring the Limits of Audio-Based Detection of Turkish Phone Call Scams

arXiv cs.CL ↗ · yesterday Cached

This paper introduces the first public multimodal dataset of 100 Turkish scam and benign phone calls and evaluates seven LLMs on three input types: raw audio, ASR transcripts, and human-corrected transcripts. The results show that transcript-based inputs outperform direct audio, highlighting the need for inclusive AI safety research in low-resource languages.

Building voice AI agents that take turns like humans — the gotchas nobody warns you about

Reddit r/AI_Agents ↗ · 4d ago

This article shares hard-won lessons from building real-time voice AI agents, highlighting the importance of proper turn-taking, VAD handling, billing awareness, and avoiding echo loops.

Removing 'um' from a recording is harder than it sounds

Hacker News Top ↗ · 2026-06-12 Cached

A local CLI tool that uses OpenAI's Whisper to detect and remove filler words (um, uh, erm) from audio recordings, employing techniques to avoid audio artifacts like clicks and background hiss.
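One common way to avoid a click when a span is cut out of a recording is to crossfade across the joint rather than butt the two edges together. The sketch below illustrates that general idea only. Whether the tool works this way is an assumption, and `cut_with_crossfade` and its `fade` parameter are hypothetical names.

```python
def cut_with_crossfade(samples, start, end, fade=64):
    """Remove samples[start:end] and blend `fade` samples across the joint.

    Hypothetical helper: a linear crossfade from the audio just before the
    cut into the audio just after it, so the waveform has no sudden jump.
    """
    # Clamp the fade so it never reaches outside the available audio.
    fade = max(0, min(fade, start, len(samples) - end))
    blended = []
    for i in range(fade):
        w = (i + 1) / (fade + 1)            # weight ramps from ~0 to ~1
        before = samples[start - fade + i]  # audio leading into the cut
        after = samples[end + i]            # audio following the cut
        blended.append(before * (1 - w) + after * w)
    return samples[:start - fade] + blended + samples[end + fade:]
```

The output is shorter than the input by `(end - start) + fade` samples, because the fade region overlaps the audio on both sides of the cut.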

Hush

Product Hunt ↗ · 2026-06-09

Hush is an open-source tool for noise suppression designed for voice AI agents, improving audio clarity in real-time interactions.

@CopyRebeldia: The business of charging you every month to turn your meetings into a summary just had a very bad day. Microsoft droppe…

X AI KOLs Timeline ↗ · 2026-06-08 Cached

Microsoft released VibeVoice, an open-source model that processes a full hour of audio in one pass and returns a structured transcript with speaker identification and timestamps, a release the post frames as a threat to paid transcription services.

Show HN: Resonate – Low-latency, high-resolution spectral analysis

Hacker News Top ↗ · 2026-06-06 Cached

Resonate is a low-latency, low-memory algorithm for perceptually relevant spectral analysis of audio signals, using resonator models with exponentially weighted moving averages.
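A rough sketch of what a resonator bank built on exponentially weighted moving averages might look like: each resonator multiplies the incoming sample by a complex phasor at its own frequency and smooths the result with an EWMA. This is one reading of the summary, not the project's code. The function name, the `alpha` smoothing factor, and the test frequencies are all assumptions.

```python
import cmath
import math


def resonator_bank(samples, sample_rate, freqs, alpha=0.01):
    """Return the magnitude of each resonator after consuming `samples`."""
    states = [0j] * len(freqs)
    # Per-sample phase step for each resonator frequency.
    steps = [cmath.exp(-2j * math.pi * f / sample_rate) for f in freqs]
    phasors = [1 + 0j] * len(freqs)
    for x in samples:
        for k in range(len(freqs)):
            # EWMA of the input demodulated at frequency freqs[k].
            states[k] = (1 - alpha) * states[k] + alpha * x * phasors[k]
            phasors[k] *= steps[k]
    return [abs(s) for s in states]


# Usage: a 440 Hz sine should excite the 440 Hz resonator far more than
# its neighbours.
sr = 8000
tone = [math.sin(2 * math.pi * 440 * n / sr) for n in range(4000)]
mags = resonator_bank(tone, sr, [220.0, 440.0, 880.0])
```

Each resonator keeps a single complex state and updates it once per sample, which is where the low-memory and low-latency properties would come from in a design like this.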

@svpino: I've built two voice pipelines for two different companies. They both look like this: Audio → STT → Clean transcript → …

X AI KOLs Following ↗ · 2026-06-05 Cached

Santiago points out that traditional STT pipelines lose tone and emotion, then introduces Velma, a voice-native AI model from Modulate that analyzes raw audio to capture intent, emotion, and other acoustic signals. It is available via API at a claimed one-tenth the cost of LLM-based approaches.

Show HN: Live breath detection and biofeedback from a phone microphone

Hacker News Top ↗ · 2026-06-02 Cached

An open-source project that uses a phone microphone for live breath detection and biofeedback, processing audio on-device to enhance self-awareness without wearables or cloud uploads.

@FakeMaidenMaker: Perplexity recently published their latest team sharing: 'How Perplexity Used the Realtime API to Bring Voice Search to Millions of Users'. They used OpenAI's Realtime-1.5 to add voice capabilities to their AI browser Comet...

X AI KOLs Timeline ↗ · 2026-05-22 Cached

Perplexity shared engineering best practices for adding voice functionality to their AI browser Comet using the OpenAI Realtime API, including key techniques such as chunked context feeding, role management, and a unified audio pipeline.

Under the Hood: Building a Real-Time Chord Recognizer

Lobsters Hottest ↗ · 2026-05-19 Cached

This article explains the technical architecture of a real-time chord recognizer, detailing a four-stage pipeline using pitch-class bitmasks, candidate generation, score normalization, and musical heuristics.
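To make the first three stages concrete, here is a minimal sketch under stated assumptions: pitch classes are packed into a 12-bit mask, candidates are generated by rotating interval templates to every root, and scores are normalized with a Jaccard-style ratio. The template table, the scoring rule, and all function names are illustrative choices rather than the article's implementation, and the musical-heuristics stage is left out.

```python
NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

# Hypothetical chord templates: intervals in semitones above the root.
TEMPLATES = {"maj": (0, 4, 7), "min": (0, 3, 7), "dim": (0, 3, 6)}


def to_mask(pitch_classes):
    """Pack pitch classes (0 = C ... 11 = B) into a 12-bit integer."""
    mask = 0
    for pc in pitch_classes:
        mask |= 1 << (pc % 12)
    return mask


def rotate(mask, steps):
    """Transpose a 12-bit pitch-class mask up by `steps` semitones."""
    steps %= 12
    return ((mask << steps) | (mask >> (12 - steps))) & 0xFFF


def best_chord(observed_mask):
    """Return (name, score) for the best-matching template over all roots."""
    best_name, best_score = None, -1.0
    for root in range(12):
        for quality, intervals in TEMPLATES.items():
            template = rotate(to_mask(intervals), root)
            shared = bin(observed_mask & template).count("1")
            union = bin(observed_mask | template).count("1")
            # Normalized score: shared notes over all notes in either set.
            score = shared / union if union else 0.0
            if score > best_score:
                best_name, best_score = f"{NOTE_NAMES[root]} {quality}", score
    return best_name, best_score
```

Calling `best_chord(to_mask([0, 4, 7]))` matches the C major template exactly. Bitmasks keep each comparison down to a few integer operations, which suits a real-time loop.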

I built Derpy Turtle: The Kokoro Trainer, a GUI for training better Kokoro voices with RVC

Reddit r/LocalLLaMA ↗ · 2026-05-12 Cached

Derpy Turtle is a Windows GUI tool designed to enhance Kokoro voice outputs by integrating voice search, RVC model training, and post-generation voice conversion into a unified workflow.

getting past the text only bottleneck with multimodal??

Reddit r/AI_Agents ↗ · 2026-05-11

The article discusses how multimodal AI models like GPT-4o and Claude 3.5 Sonnet are overcoming text-only bottlenecks by enabling visual debugging, audio-to-data conversion, and enhanced RAG systems.

@gdb: GPT-Realtime-2 for instantly translating audio in realtime

X AI KOLs Following ↗ · 2026-05-09

GPT-Realtime-2 is introduced as a tool for instant real-time audio translation.

@Prince_Canuma: mlx-audio v0.4.3 is here A massive release across models, server, and DX → 6 new TTS models: Higgs Audio v2 (voice clon…

X AI KOLs Timeline ↗ · 2026-05-09 Cached

mlx-audio v0.4.3 ships six new TTS models, including Higgs Audio v2 and OmniVoice (646+ languages), along with server improvements such as concurrent requests and continuous batching, roughly 3x faster Voxtral Realtime in 4-bit, and slimmer dependencies for Apple Silicon.

Guitar tuner that uses phone accelerometer

Hacker News Top ↗ · 2026-05-08 Cached

A web-based guitar tuner that uses the phone's accelerometer to detect string vibrations and calculate pitch.
