adirik/grounding-dino


grounding-dino object-detection open-vocabulary text-guided replicate model-deployment

Summary

Grounding DINO is an open-vocabulary object detection model that detects arbitrary objects from text descriptions. It is now available on Replicate.


Cached at: 2026/05/08 06:25

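# adirik/grounding-dino – Replicate

Source: https://replicate.com/adirik/grounding-dino

## Readme

Grounding DINO detects arbitrary objects from human text inputs such as category names or referring expressions. The architecture combines DINO, a Transformer-based detector, with grounded pre-training to achieve open-vocabulary, text-guided object detection. For details, see the paper (https://arxiv.org/abs/2303.05499) and the original repository (https://github.com/IDEA-Research/GroundingDINO).

## Using the API

You can use Grounding DINO to query an image with text descriptions of any objects. Upload an image, then enter comma-separated text descriptions of the objects you want to find. The expected input arguments are:

- **image:** Your input image
- **query:** Text queries describing the objects you want to detect; separate multiple queries with commas
- **box_threshold:** Keeps only the bounding boxes whose highest similarity is above box_threshold
- **text_threshold:** Extracts the words whose similarity is above text_threshold as the predicted labels

## References

```
@article{liu2023grounding,
  title={Grounding dino: Marrying dino with grounded pre-training for open-set object detection},
  author={Liu, Shilong and Zeng, Zhaoyang and Ren, Tianhe and Li, Feng and Zhang, Hao and Yang, Jie and Li, Chunyuan and Yang, Jianwei and Su, Hang and Zhu, Jun and others},
  journal={arXiv preprint arXiv:2303.05499},
  year={2023}
}
```

Model created over 1 year ago.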
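The four input parameters described under the API usage can be illustrated with a short Python sketch. It is not the model's own code. `build_input`, `filter_detections`, and `run_on_replicate` are hypothetical helper names, and the default thresholds of 0.25 and the assumption that thresholds lie in [0, 1] are illustrative rather than taken from the model page. `filter_detections` is a toy restatement of the two threshold rules on a hand-written similarity matrix. `run_on_replicate` assumes the third-party `replicate` Python client is installed and that you supply the full model reference, including a version id, yourself.

```python
from typing import Any, Dict, List, Sequence


def build_input(image: Any, queries: Sequence[str],
                box_threshold: float = 0.25,   # assumed default, not from the source
                text_threshold: float = 0.25   # assumed default, not from the source
                ) -> Dict[str, Any]:
    """Assemble the four documented inputs into one payload dict."""
    cleaned = [q.strip() for q in queries if q.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty query is required")
    if any("," in q for q in cleaned):
        raise ValueError("a single query must not contain a comma")
    for name, value in (("box_threshold", box_threshold),
                        ("text_threshold", text_threshold)):
        # Assumption: similarities, and therefore thresholds, lie in [0, 1].
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return {
        "image": image,               # assumed: a URL string or an open file handle
        "query": ",".join(cleaned),   # multiple queries are comma-separated
        "box_threshold": box_threshold,
        "text_threshold": text_threshold,
    }


def filter_detections(tokens: Sequence[str],
                      similarities: Sequence[Sequence[float]],
                      box_threshold: float,
                      text_threshold: float) -> List[Dict[str, Any]]:
    """Toy version of the two threshold rules.

    similarities[i][j] is the similarity between candidate box i and
    query token j.
    """
    kept = []
    for box_index, row in enumerate(similarities):
        best = max(row)
        # box_threshold: keep a box only if its highest similarity exceeds it.
        if best <= box_threshold:
            continue
        # text_threshold: the label is built from the tokens that exceed it.
        label = " ".join(t for t, s in zip(tokens, row) if s > text_threshold)
        kept.append({"box": box_index, "score": best, "label": label})
    return kept


def run_on_replicate(model_ref: str, payload: Dict[str, Any]) -> Any:
    """Send the payload to Replicate.

    model_ref must be a full "owner/name:version" string that you look up
    yourself. Requires the `replicate` package and a REPLICATE_API_TOKEN
    environment variable.
    """
    import replicate  # imported lazily so the helpers above work without it
    return replicate.run(model_ref, input=payload)


if __name__ == "__main__":
    payload = build_input("https://example.com/street.jpg", ["car", "traffic light"])
    print(payload["query"])
    toy = filter_detections(["cat", "dog"],
                            [[0.9, 0.1], [0.2, 0.1], [0.3, 0.6]],
                            box_threshold=0.35, text_threshold=0.25)
    print(toy)
```

In the toy matrix, the second box is dropped because its best similarity does not exceed `box_threshold`, while the third box is labeled with both tokens because both exceed `text_threshold`. Raising `box_threshold` yields fewer boxes, and raising `text_threshold` yields shorter labels.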


