LiteParse is a fast document parsing tool that runs locally, achieving ~3ms per page by skipping heavy AI and cloud overhead. It uses deterministic layout heuristics and selective OCR to output structured Markdown, making it ideal for real-time RAG pipelines and coding agents.
An open-source real-time global intelligence dashboard that tracks conflicts, military activities, infrastructure, protests, and market signals, runs in the browser, and is licensed under MIT.
A free, open-source tool with 93k GitHub stars enables real-time face-swapping on a live webcam feed using a single photo.
LingBot-Map is an open-source model for real-time streaming 3D reconstruction from a single camera. It runs at ~20 FPS using a feed-forward geometric context transformer and is reported to outperform both streaming and offline methods.
Sierra Platform's approach to voice agents parallelizes thinking, listening, and talking to mimic human conversation, as discussed on the Max Agency podcast.
This tutorial paper presents NeuraDock Agent, an open-source EEG workflow for visual cognitive load analysis with alpha dynamics, including preprocessing, quality control, a real-time API, and LLM interpretation.
Liquid AI releases LFM2.5-230M, a small 230M-parameter model optimized for fast inference on CPUs, NPUs, and GPUs, targeting agentic tasks on devices like phones and robots.
A traditional oil painter developed the open-source tool Bob Jack Painter, which uses a real-time camera to map oil paint textures from a physical canvas onto 3D models, so that digital 3D objects can be textured with real oil paint.
The author argues that for live voice agents, STT latency and real-time behavior are more critical than raw transcription accuracy, and proposes a different evaluation scorecard.
jxnlco demonstrates gpt-realtime-2's handling of in-context wake words and reasoning by building a Simon Says game that beats him.
Signspell is a Python package for real-time American Sign Language alphabet recognition, installable via pip.
A South Korean AI app goes viral for enabling lifelike video conversations with AI characters that use voice, lip sync, facial expressions, and camera context, signaling a shift from text-based interfaces to real-time video-native interactions.
Mel AI is evolving AI characters from text-based interactions to real-time video chat, with lip sync, facial expressions, and camera context awareness, following the success of Character AI.
KroWork is a newly launched tool that converts AI chat conversations into reusable desktop applications. Non-technical users describe workflows in natural language, and the resulting apps run locally without consuming tokens on restart. It enables tasks like real-time stock monitoring for free.
Explores whether OpenAI's Whisper remains the top choice for real-time speech-to-text applications, considering alternatives and performance trade-offs.
NVIDIA quietly released Nemotron-3.5-ASR, a lightweight 0.6B-parameter open-source speech recognition model designed for real-time streaming, with support for 40+ languages, low latency, and a cache-aware architecture.
Wan-Streamer is a unified end-to-end multimodal model for real-time audio-visual interaction using causal attention and integrated processing of visual, audio, and text modalities, achieving sub-second latency.
The blog post describes using local open-weight models like Gemma and Qwen in an agent harness to automatically triage issues and pull requests in the OpenClaw repository, enabling real-time notifications without relying on costly closed API models.
TownSquare is a tiny presence layer for websites, added with a single script tag, that lets visitors see each other and interact in real time with no accounts or algorithms.
Researchers introduced T-Rex, a framework that integrates vision, language, and tactile sensing, enabling robots to respond to physical contact in real time rather than relying solely on vision.