NVIDIA has open-sourced the visual grounding model LocateAnything-3B, which can accurately detect and bound all target objects in dense scenes.
A developer built an AI basketball coach using Roboflow RF-DETR for detection, MediaPipe for body angles, and OpenCV for analysis and annotation.
YOLO26 is a multi-task computer vision model family released in January 2026. It performs end-to-end detection without Non-Maximum Suppression, which lowers latency, and is optimized for edge deployment with improved CPU inference and a compact design.
NVIDIA released LocateAnything, an open-source model that achieves ~10x faster object detection by predicting all coordinates simultaneously instead of sequentially, reaching 12.7 FPS on a single H100 and outperforming 32B-parameter models.
Ultralytics YOLO26 introduces a unified real-time vision model family with NMS-free inference, improved training strategies, and multi-task capabilities for detection, segmentation, and pose estimation, achieving state-of-the-art accuracy-latency trade-offs.
NVIDIA introduces LocateAnything, a unified generative grounding and detection framework that uses Parallel Box Decoding to improve decoding throughput and localization accuracy. This work will be presented at CVPR 2026.
LocateAnything proposes Parallel Box Decoding for unified visual grounding and object detection, decoding geometric elements as atomic units to improve throughput and localization accuracy, supported by a large-scale dataset of 138M samples.
The RF-DETR model, proposed at ICLR 2026, combines transformer-level accuracy with real-time performance. It achieves high scores across 100 real-world scenarios, comes in sizes from Nano to 2XL, and could replace YOLO in real-time detection.
A social media post expresses excitement about the return or renewed relevance of the YOLOv3 object detection model.
An article about YOLO, the widely used family of real-time object detection models.
Interfaze AI introduces a specialized model that surpasses general LLMs on deterministic developer tasks including OCR, object detection, web scraping, speech-to-text, and classification.
A user seeks advice on improving an object detection model trained with YOLO11n for deployment on a Raspberry Pi 5, where the mAP50 metrics on paper are not matched by detection performance in practice.
Meta AI releases SAM 3.1, an update to the Segment Anything Model that enhances real-time video detection and tracking through multiplexing and global reasoning capabilities.
MIT researchers have developed a generative AI-enhanced wireless vision system that reconstructs hidden objects and entire room scenes using millimeter-wave signals, overcoming previous limitations in shape reconstruction and enabling applications in warehouse robotics and smart homes.
This blog post details how to set up Frigate with a Hailo AI coprocessor on a Raspberry Pi for object detection, including steps to fix a PCIe descriptor page size error. The setup works with the cheaper Hailo-8L and achieves low inference times.
SAM 3 introduces a unified model for promptable concept segmentation and tracking, achieving state-of-the-art performance with a decoupled recognition and localization architecture and a scalable data engine.
RF-DETR introduces a lightweight detection transformer that uses weight-sharing neural architecture search to achieve state-of-the-art real-time object detection, outperforming prior methods on COCO and Roboflow100-VL while running up to 20x faster.
Frigate is an open-source NVR designed for Home Assistant that performs real-time AI object detection on IP camera feeds locally using OpenCV and TensorFlow. It features tight Home Assistant integration, motion-based detection, and efficient resource usage.
Grounding DINO is an open-vocabulary object detection model that can detect arbitrary objects based on text descriptions, now available on Replicate.