Alibaba has launched Wan Streamer, an AI agent capable of seeing, hearing, and responding in real time via video.
DreamForge-World 0.1 Preview is a low-compute world model that enables real-time interactive simulation on consumer GPUs, supporting keyboard/mouse control and achieving 14-15 FPS at 480p resolution on a single RTX 4090.
LiteParse is a fast document parsing tool that runs locally, achieving ~3ms per page by skipping heavy AI and cloud overhead. It uses deterministic layout heuristics and selective OCR to output structured Markdown, making it ideal for real-time RAG pipelines and coding agents.
An open-source, real-time global intelligence dashboard tracks conflicts, military activities, infrastructure, protests, and market signals. It runs in the browser and is licensed under MIT.
A free, open-source tool with 93k GitHub stars enables real-time face-swapping on a live webcam feed using a single photo.
Mastra has launched Durable Agents, which persist streams in real time using a server cache so that they survive client disconnects, browser refreshes, and network blips.
LingBot-Map is an open-source, real-time streaming 3D reconstruction model that uses a single camera, running at ~20 FPS via a feed-forward geometric context transformer, outperforming both streaming and offline methods.
Sierra Platform's approach to voice agents parallelizes thinking, listening, and talking to mimic human conversation, as discussed on the Max Agency podcast.
This tutorial paper presents NeuraDock Agent, an open-source EEG workflow for visual cognitive load analysis with alpha dynamics, including preprocessing, quality control, a real-time API, and LLM interpretation.
Liquid AI releases LFM2.5-230M, a small 230M parameter model optimized for fast inference on CPUs, NPUs, and GPUs, targeting agentic tasks on devices like phones and robots.
A traditional oil painting artist developed Bob Jack Painter, an open-source tool that uses a real-time camera feed to map oil paint textures from a physical canvas onto 3D models, so that digital 3D objects can be textured with real oil paint.
The author argues that for live voice agents, STT latency and real-time behavior are more critical than raw transcription accuracy, and proposes a different evaluation scorecard.
LiveEdit proposes a causal, frame-by-frame streaming video editing framework that achieves real-time performance (12.66 FPS) via a three-stage distillation pipeline and AR-oriented mask cache, enabling stable long-horizon edits.
jxnlco demonstrates gpt-realtime-2's ability to handle in-context wake words and reasoning by building a Simon Says game that beats him.
Signspell is a Python package for real-time American Sign Language alphabet recognition, installable via pip.
A South Korean AI app goes viral for enabling lifelike video conversations with AI characters that use voice, lip sync, facial expressions, and camera context, signaling a shift from text-based interfaces to real-time video-native interactions.
Mel AI is evolving AI characters from text-based interactions to real-time video chat, with lip sync, facial expressions, and camera context awareness, following the success of Character AI.
KroWork is a newly launched tool that converts AI chat conversations into reusable desktop applications. Non-technical users can create workflows in natural language, and the resulting apps run locally without consuming tokens on restart. It enables tasks like real-time stock monitoring for free.
Explores whether OpenAI's Whisper remains the top choice for real-time speech-to-text applications, considering alternatives and performance trade-offs.
NVIDIA quietly released Nemotron-3.5-ASR, a lightweight 0.6B parameter open-source speech recognition model designed for real-time streaming, with support for 40+ languages, low latency, and a cache-aware architecture.