roboflow/supervision
Summary
roboflow/supervision is an open-source Python toolkit for computer vision that provides reusable building blocks for data loading, annotation, and real-time processing, with model-agnostic support for popular libraries.
Cached at: 05/14/26, 12:15 PM
Source: https://github.com/roboflow/supervision
hello
We are your essential toolkit for computer vision. From data loading to real-time zone counting, we provide the building blocks so you can focus on building applications around your models.
install
Pip install the supervision package in a Python>=3.9 environment.
pip install supervision
Read more about conda, mamba, and installing from source in our guide.
quickstart
models
Supervision was designed to be model agnostic. Just plug in any classification, detection, or segmentation model. For your convenience, we have created connectors for the most popular libraries like Ultralytics, Transformers, MMDetection, or Inference. Other integrations, like rfdetr, already return sv.Detections directly.
Install the optional dependencies for this example with pip install pillow rfdetr.
import supervision as sv
from PIL import Image
from rfdetr import RFDETRSmall
image = Image.open(...)
model = RFDETRSmall()
detections = model.predict(image, threshold=0.5)
len(detections)
# 5
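To illustrate what a connector does conceptually, here is a minimal sketch in plain Python. It is not Supervision's implementation: `SimpleDetections`, `from_raw`, `filter`, and the raw prediction layout (center-format dicts) are hypothetical stand-ins, used only to show how a model-specific output could be normalized into one shared container and then filtered by a confidence threshold.

```python
from dataclasses import dataclass, field


@dataclass
class SimpleDetections:
    """Hypothetical stand-in for a detections container (not sv.Detections)."""

    xyxy: list = field(default_factory=list)        # [x_min, y_min, x_max, y_max] per box
    confidence: list = field(default_factory=list)  # one score per box
    class_id: list = field(default_factory=list)    # one integer label per box

    def __len__(self):
        return len(self.xyxy)

    @classmethod
    def from_raw(cls, predictions):
        """Normalize a hypothetical model output given as center-format dicts."""
        detections = cls()
        for p in predictions:
            half_w, half_h = p["w"] / 2, p["h"] / 2
            detections.xyxy.append(
                [p["cx"] - half_w, p["cy"] - half_h, p["cx"] + half_w, p["cy"] + half_h]
            )
            detections.confidence.append(p["score"])
            detections.class_id.append(p["label"])
        return detections

    def filter(self, threshold):
        """Keep only boxes whose confidence is at least `threshold`."""
        keep = [i for i, c in enumerate(self.confidence) if c >= threshold]
        return SimpleDetections(
            xyxy=[self.xyxy[i] for i in keep],
            confidence=[self.confidence[i] for i in keep],
            class_id=[self.class_id[i] for i in keep],
        )


# Made-up predictions in the hypothetical center format.
raw = [
    {"cx": 50, "cy": 50, "w": 20, "h": 40, "score": 0.9, "label": 0},
    {"cx": 10, "cy": 10, "w": 4, "h": 4, "score": 0.3, "label": 1},
]
detections = SimpleDetections.from_raw(raw).filter(threshold=0.5)
print(len(detections), detections.xyxy)
```

Once every model's output is mapped into the same container, downstream code such as filtering or drawing only has to be written once, which is the practical meaning of "model agnostic" here.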
more model connectors
- inference
Running with Inference requires a Roboflow API KEY.
import supervision as sv
from PIL import Image
from inference import get_model
image = Image.open(...)
model = get_model(model_id="rfdetr-small", api_key="ROBOFLOW_API_KEY")
result = model.infer(image)[0]
detections = sv.Detections.from_inference(result)
len(detections)
# 5
annotators
Supervision offers a wide range of highly customizable annotators, allowing you to compose the perfect visualization for your use case.
import cv2
import supervision as sv
image = cv2.imread(...)
detections = sv.Detections(...)
box_annotator = sv.BoxAnnotator()
annotated_frame = box_annotator.annotate(scene=image.copy(), detections=detections)
https://github.com/roboflow/supervision/assets/26109316/691e219c-0565-4403-9218-ab5644f39bce
datasets
Supervision provides a set of utils that allow you to load, split, merge, and save datasets in one of the supported formats.
import supervision as sv
from roboflow import Roboflow
project = Roboflow().workspace("WORKSPACE_ID").project("PROJECT_ID")
dataset = project.version("PROJECT_VERSION").download("coco")
ds = sv.DetectionDataset.from_coco(
    images_directory_path=f"{dataset.location}/train",
    annotations_path=f"{dataset.location}/train/_annotations.coco.json",
)
path, image, annotation = ds[0]
# loads image on demand
for path, image, annotation in ds:
    # loads image on demand
    pass
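The "loads image on demand" comments describe lazy loading: the dataset keeps paths and annotations in memory and reads pixels only when an item is requested. A minimal sketch of that idea follows, assuming a hypothetical `LazyDataset` class and a `fake_loader` stand-in for an image reader. It does not reproduce how `sv.DetectionDataset` is implemented.

```python
class LazyDataset:
    """Hypothetical sketch of on-demand loading (not sv.DetectionDataset)."""

    def __init__(self, image_paths, annotations, loader):
        self.image_paths = list(image_paths)
        self.annotations = annotations  # maps path -> annotation
        self.loader = loader            # callable that reads one image

    def __len__(self):
        return len(self.image_paths)

    def __getitem__(self, index):
        path = self.image_paths[index]
        # The image is read only now, when the item is requested.
        return path, self.loader(path), self.annotations[path]

    def __iter__(self):
        for index in range(len(self)):
            yield self[index]


loaded = []


def fake_loader(path):
    """Stand-in for an image reader; records which paths were loaded."""
    loaded.append(path)
    return f"<pixels of {path}>"


ds = LazyDataset(
    ["a.jpg", "b.jpg"],
    {"a.jpg": "boxes_a", "b.jpg": "boxes_b"},
    fake_loader,
)
print(loaded)  # nothing has been read yet
path, image, annotation = ds[1]
print(loaded)  # only the requested image was read
```

The trade-off is that memory use stays flat regardless of dataset size, at the cost of re-reading an image each time it is accessed.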
more dataset utils
- load
dataset = sv.DetectionDataset.from_yolo(
    images_directory_path=...,
    annotations_directory_path=...,
    data_yaml_path=...,
)
dataset = sv.DetectionDataset.from_pascal_voc(
    images_directory_path=...,
    annotations_directory_path=...,
)
dataset = sv.DetectionDataset.from_coco(
    images_directory_path=...,
    annotations_path=...,
)
- split
train_dataset, test_dataset = dataset.split(split_ratio=0.7)
test_dataset, valid_dataset = test_dataset.split(split_ratio=0.5)
len(train_dataset), len(test_dataset), len(valid_dataset)
# (700, 150, 150)
- merge
ds_1 = sv.DetectionDataset(...)
len(ds_1)
# 100
ds_1.classes
# ['dog', 'person']
ds_2 = sv.DetectionDataset(...)
len(ds_2)
# 200
ds_2.classes
# ['cat']
ds_merged = sv.DetectionDataset.merge([ds_1, ds_2])
len(ds_merged)
# 300
ds_merged.classes
# ['cat', 'dog', 'person']
- save
dataset.as_yolo(
    images_directory_path=...,
    annotations_directory_path=...,
    data_yaml_path=...,
)
dataset.as_pascal_voc(
    images_directory_path=...,
    annotations_directory_path=...,
)
dataset.as_coco(
    images_directory_path=...,
    annotations_path=...,
)
- convert
sv.DetectionDataset.from_yolo(
    images_directory_path=...,
    annotations_directory_path=...,
    data_yaml_path=...,
).as_pascal_voc(
    images_directory_path=...,
    annotations_directory_path=...,
)
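The arithmetic behind the split and merge examples can be sketched in plain Python. This is an illustration under stated assumptions, not the library's code: the helpers `split` and `merge_classes` are hypothetical, the 1,000-image starting size is inferred from the `(700, 150, 150)` result, and the class-id remapping is an assumption about what a merge would need when two datasets number their classes differently.

```python
import random


def split(items, split_ratio, seed=0):
    """Shuffle `items` and cut them at `split_ratio` (hypothetical helper)."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * split_ratio)
    return shuffled[:cut], shuffled[cut:]


def merge_classes(class_lists):
    """Build one sorted class list plus, per dataset, an old-id -> new-id map."""
    merged = sorted({name for classes in class_lists for name in classes})
    index = {name: i for i, name in enumerate(merged)}
    remaps = [
        {old_id: index[name] for old_id, name in enumerate(classes)}
        for classes in class_lists
    ]
    return merged, remaps


# Two chained splits: 70% train, then the remainder halved into test and valid.
images = [f"image_{i}.jpg" for i in range(1000)]
train, rest = split(images, split_ratio=0.7)
test, valid = split(rest, split_ratio=0.5)
print(len(train), len(test), len(valid))

# Class lists taken from the merge example above.
merged, remaps = merge_classes([["dog", "person"], ["cat"]])
print(merged, remaps)
```

Splitting 70/30 and then halving the remainder yields a 70/15/15 partition. In the merge, `dog` moves from id 0 to id 1 because `cat` sorts ahead of it in the combined class list.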
tutorials
Want to learn how to use Supervision? Explore our how-to guides, end-to-end examples, cheatsheet, and cookbooks!
Dwell Time Analysis with Computer Vision | Real-Time Stream Processing
Learn how to use computer vision to analyze wait times and optimize processes. This tutorial covers object detection, tracking, and calculating time spent in designated zones. Use these techniques to improve customer experience in retail, traffic management, or other scenarios.
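As a rough illustration of the dwell-time idea, the sketch below counts how many frames each tracked object spends inside a polygon zone and converts the count to seconds. It is written in plain Python under assumptions: the helper names, the per-frame `{tracker_id: (x, y)}` input layout, and the sample data are all hypothetical, and the tutorial itself may take a different approach.

```python
def point_in_polygon(point, polygon):
    """Ray-casting test: True if `point` lies inside `polygon` (list of (x, y))."""
    x, y = point
    inside = False
    for i in range(len(polygon)):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % len(polygon)]
        # Does a horizontal ray cast to the right of the point cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def dwell_times(frames, zone, fps):
    """Seconds each tracked object spent inside `zone`.

    `frames` is a list of dicts mapping tracker id -> (x, y) anchor point.
    """
    frames_inside = {}
    for tracked_points in frames:
        for tracker_id, point in tracked_points.items():
            if point_in_polygon(point, zone):
                frames_inside[tracker_id] = frames_inside.get(tracker_id, 0) + 1
    return {tracker_id: count / fps for tracker_id, count in frames_inside.items()}


zone = [(0, 0), (10, 0), (10, 10), (0, 10)]
frames = [
    {1: (5, 5), 2: (20, 5)},
    {1: (6, 5), 2: (9, 5)},
    {1: (7, 5), 2: (12, 5)},
    {1: (15, 5)},
]
print(dwell_times(frames, zone, fps=2))  # {1: 1.5, 2: 0.5}
```

Counting frames and dividing by the frame rate works for recorded video with a constant frame rate. For a live stream with irregular frame arrival, wall-clock timestamps would be the more reliable basis.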
Speed Estimation & Vehicle Tracking | Computer Vision | Open Source
Learn how to track and estimate the speed of vehicles using YOLO, ByteTrack, and Roboflow Inference. This comprehensive tutorial covers object detection, multi-object tracking, filtering detections, perspective transformation, speed estimation, visualization improvements, and more.
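The core of speed estimation can be sketched as: map pixel positions onto the road plane with a perspective transform, then divide the distance travelled by the elapsed time. The plain-Python example below makes several assumptions: the helper names are hypothetical, the 3x3 matrix is a made-up pure scale of 0.25 metres per pixel rather than a calibrated homography, and the track is invented data.

```python
import math


def apply_homography(matrix, point):
    """Map an image point to road-plane coordinates with a 3x3 homography."""
    x, y = point
    xh = matrix[0][0] * x + matrix[0][1] * y + matrix[0][2]
    yh = matrix[1][0] * x + matrix[1][1] * y + matrix[1][2]
    w = matrix[2][0] * x + matrix[2][1] * y + matrix[2][2]
    return xh / w, yh / w


def estimate_speed_kmh(pixel_track, matrix, fps):
    """Average speed over a track of per-frame pixel positions."""
    start = apply_homography(matrix, pixel_track[0])
    end = apply_homography(matrix, pixel_track[-1])
    distance_m = math.dist(start, end)
    elapsed_s = (len(pixel_track) - 1) / fps
    return distance_m / elapsed_s * 3.6  # m/s -> km/h


# Hypothetical homography: a pure scale of 0.25 metres per pixel.
H = [[0.25, 0.0, 0.0], [0.0, 0.25, 0.0], [0.0, 0.0, 1.0]]
# One tracked vehicle over 6 frames, moving 20 pixels up the image per frame.
track = [(100, 400), (100, 380), (100, 360), (100, 340), (100, 320), (100, 300)]
print(estimate_speed_kmh(track, H, fps=5))
```

Here the vehicle covers 100 pixels, or 25 metres, in one second, which is about 90 km/h. With a real camera the matrix would come from calibration against known road measurements, because pixel distances shrink with depth.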
built with supervision
Did you build something cool using supervision? Let us know!
https://user-images.githubusercontent.com/26109316/207858600-ee862b22-0353-440b-ad85-caa0c4777904.mp4
https://github.com/roboflow/supervision/assets/26109316/c9436828-9fbf-4c25-ae8c-60e9c81b3900
https://github.com/roboflow/supervision/assets/26109316/3ac6982f-4943-4108-9b7f-51787ef1a69f
documentation
Visit our documentation page to learn how supervision can help you build computer vision applications faster and more reliably.
contribution
We love your input! Please see our contributing guide to get started. Thank you to all our contributors!