Tag
The paper introduces MIST, a synthetic dataset and framework for training multimodal voice assistants to control IoT devices in smart homes. It highlights significant performance gaps between open-weight and closed-weight models on complex, speech-based tool-calling tasks.
The author introduces SAVI, an iOS app designed for ADHD users that converts voice brain dumps into structured tasks and reminders using AI models such as Whisper and GPT-4o.
A developer built a JARVIS-style personal assistant called CYBER with wake word activation, local voice cloning via XTTS v2, vision mode, and LLM-generated system commands, all running locally without cloud dependencies.
Cardamom is an AI-powered phone ordering system designed for takeout-heavy restaurants.
Parloa has evolved its platform into an AI Agent Management Platform (AMP) using GPT-5.4, enabling enterprises to design, simulate, and deploy voice and text service agents without coding.
EchoChain is a new benchmark for evaluating AI models' ability to revise in-progress responses when users interrupt mid-generation. The benchmark identifies three failure patterns (contextual inertia, interruption amnesia, objective displacement) and finds that none of the evaluated real-time voice models exceeds a 50% pass rate.
ARKAD Wallet is a product that allows users to talk to their finances to improve personal finance management.