Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection (First Read of the Paper)

Date: 2024-09-03 16:22:02

Tags: Pre, Training, set, DINO, detector, AP, Grounding, COCO

Abstract

In this paper, we present an open-set object detector, called Grounding DINO, by marrying Transformer-based detector DINO with grounded pre-training, which can detect arbitrary objects with human inputs such as category names or referring expressions. The key solution of open-set object detection is introducing language to a closed-set detector for open-set concept generalization. To effectively fuse language and vision modalities, we conceptually divide a closed-set detector into three phases and propose a tight fusion solution, which includes a feature enhancer, a language-guided query selection, and a cross-modality decoder for cross-modality fusion. While previous works mainly evaluate open-set object detection on novel categories, we propose to also perform evaluations on referring expression comprehension for objects specified with attributes. Grounding DINO performs remarkably well on all three settings, including benchmarks on COCO, LVIS, ODinW, and RefCOCO/+/g. Grounding DINO achieves a 52.5 AP on the COCO detection zero-shot transfer benchmark, i.e., without any training data from COCO. After finetuning with COCO data, Grounding DINO reaches 63.0 AP. It sets a new record on the ODinW zero-shot benchmark with a mean 26.1 AP. Code will be available at https://github.com/IDEA-Research/GroundingDINO.

Summary

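Proposed innovations

Starting from the closed-set detector DINO, the method extends to open-set detection by fusing the vision and language modalities at multiple stages. Per the abstract, these stages are a feature enhancer, a language-guided query selection, and a cross-modality decoder.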
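
The post names language-guided query selection as one of the fusion stages without describing how it works. The following is a minimal sketch, assuming that selection means scoring each image token by its highest dot-product similarity to any text token and keeping the top-scoring tokens as decoder queries. The function name `select_queries`, the toy feature vectors, and the pure-Python formulation are illustrative assumptions and are not taken from the post or from the official repository.

```python
def select_queries(image_feats, text_feats, num_queries):
    """Pick the image tokens most similar to any text token.

    image_feats: list of image-token feature vectors
    text_feats:  list of text-token feature vectors (same dimensionality)
    Returns the indices of the selected image tokens, best match first.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    # For each image token, keep its best similarity over all text tokens.
    best_scores = [
        max(dot(img, txt) for txt in text_feats) for img in image_feats
    ]

    # Rank image tokens by that score and keep the top num_queries.
    ranked = sorted(
        range(len(image_feats)), key=lambda i: best_scores[i], reverse=True
    )
    return ranked[:num_queries]


# Toy example (hypothetical values): 4 image tokens, 2 text tokens, 2-D features.
image_feats = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0]]
text_feats = [[1.0, 0.0], [0.9, 0.1]]
selected = select_queries(image_feats, text_feats, num_queries=2)
print(selected)  # [0, 2]
```

In this sketch the text input decides which image regions become queries, which is one way the language modality can steer a detector that was originally closed-set.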

From: https://blog.csdn.net/m0_55898550/article/details/141865236
