Amazon announces a new Alexa+ feature called Alexa Podcasts that generates podcast episodes on any topic using AI, with options to customize length and tone, narrated by AI host voices.
A developer built ClawVibe, an iOS app for hands-free voice interaction with AI agents, featuring on-device speech recognition and TTS for low latency.
A developer spent 14 months building Keito, a physical AI prototype device based on the ESP32 chip. It supports voice conversation, real-time lip-sync animation, capacitive touch interaction, music playback, and weather queries, with the aim of freeing AI from the text box.
Meta AI is evolving from a chat box into an always-on perception layer, adding voice conversations and real-time camera AI capabilities and gradually moving into a glasses form factor, so that the AI can see, hear, and understand the world in front of the user.
The paper introduces MIST, a synthetic dataset and framework for training multimodal voice assistants to control IoT devices in smart homes. It highlights significant performance gaps between open and closed-weight models in handling complex, speech-based tool-calling tasks.
The author introduces SAVI, an iOS app designed for ADHD users that converts voice brain dumps into structured tasks and reminders using AI models such as Whisper and GPT-4o.
A developer built a JARVIS-style personal assistant called CYBER with wake word activation, local voice cloning via XTTS v2, vision mode, and LLM-generated system commands, all running locally without cloud dependencies.
Cardamom is an AI-powered phone ordering system designed for takeout-heavy restaurants.
Parloa has evolved its platform into an AI Agent Management Platform (AMP) built on GPT-5.4, enabling enterprises to design, simulate, and deploy voice and text service agents without coding.
EchoChain is a new benchmark for evaluating AI models' ability to revise in-progress responses when users interrupt mid-generation. The benchmark identifies three failure patterns (contextual inertia, interruption amnesia, and objective displacement) and finds that none of the evaluated real-time voice models exceeds a 50% pass rate.
ARKAD Wallet is a personal finance management product that lets users talk to their finances.
At I/O 2026, Google unveiled the Android XR smart glasses ecosystem. The first audio glasses, powered by Gemini, will launch in fall 2026, offering hands-free voice assistance, navigation, cross-app operations, and real-time translation, in partnership with Samsung, Gentle Monster, and Warby Parker.