@FinanceYF5: Alibaba just released Wan Streamer. AI agents can now see you, hear you, and respond to you in real time via video. This is no longer just a "voice mode."
Summary
Alibaba has launched Wan Streamer, an AI agent capable of seeing, hearing, and responding in real time via video.
Cached at: 06/30/26, 03:35 AM
Alibaba has just released Wan Streamer.
AI agents can now see you, hear you, and respond to you in real time via video.
This is no longer just “voice mode” 🤯. https://t.co/isZOeoTyaD
Similar Articles
@FinanceYF5: Meta AI is transforming from a 'chat box' into an always-on perception layer. Alexandr Wang mentioned that the Muse Spark update includes voice conversations, real-time camera AI, and a gradual transition into glasses. The point is not just another voice assistant, but AI beginning to see, hear, and understand the world in front of you.
Meta AI is evolving from a chat box into an always-on perception layer, adding voice conversations, real-time camera AI capabilities, and gradually moving into glasses form, enabling AI to see, hear, and understand the world in front of the user.
@IndieDevHailey: Tencent's new Marvis truly embeds AI into the operating system! My computer finally understands human language. Previously, I had to manually search for files, spend ages tweaking settings, and keep an eye on scheduled tasks… Now with just one sentence, the AI does it all for me. This is Tencent's latest Marvis — a true system-level AI assistant.
Tencent has released a system-level AI assistant called Marvis, which can directly access operating system resources. It handles file searches, system settings changes, and scheduled tasks via natural language commands, and supports multi-agent collaboration and real-time token consumption display.
@FinanceYF5: 1/ Voice Agent Upgraded: OpenAI Launches GPT-Realtime-2, Bringing GPT-5-Level Reasoning to Real-Time Voice API. Voice assistants no longer just "understand and respond" — they can think while listening, and problem-solve while chatting.
OpenAI has launched GPT-Realtime-2, integrating GPT-5-level reasoning into the real-time voice API, enabling voice assistants to think and solve problems in real time during conversations.
@FinanceYF5: First test details of OpenAI's new voice model Bidi 1 have surfaced. Bidirectional voice design: it listens while you speak; you can interrupt midway to switch tasks immediately, and it no longer grabs the conversation when you pause. It also supports real-time translation, and context memory is much stronger than the current Advanced Voice. It's now being pushed to a small group, ChatGPT …
First test details of OpenAI's new voice model, Bidi 1, have surfaced. It supports bidirectional voice design, real-time translation, and stronger context memory, and is currently being pushed to a small group on ChatGPT.
@FinanceYF5: This AI is impressive. LingBot-Map can convert real-time video streams into real-time 3D reconstruction. 20 FPS code + model
LingBot-Map is an AI model that converts real-time video streams into real-time 3D reconstructions at 20 FPS, with complete code and model provided.