Object Detection: Non-Maximum Suppression (NMS)

Date: 2024-08-17 21:50:58
Tags: box, Non, NMS, IoU, Object, boxes, score, bounding

https://kikaben.com/object-detection-non-maximum-suppression/

Object detection models like YOLOv5 and SSD predict objects’ locations by generating bounding boxes (shown in blue rectangles below).

However, object detection models do not predict just one bounding box per object. They produce many more bounding boxes than appear in the final output, with different locations, sizes, and confidence levels. This is where Non-Maximum Suppression (NMS) comes into play: it keeps the most probable bounding boxes and eliminates the less likely ones.

This article explains how NMS works.

1 Overlapping Bounding Boxes

If we don’t do NMS, an object detection output may look like the one below, with many overlapping bounding boxes. There would be many more bounding boxes than in the image below, but that would make the image too chaotic, so I’m showing only some overlapping bounding boxes to get my point across.

[Figure: Non-Maximum Suppression]

Some object detection models like YOLO (since version 2) use anchors to generate predictions. Anchors are a predefined set of boxes (width and height) located at every grid cell. In other words, anchor boxes provide reasonable priors for object shapes (width-height ratios) calculated from the training dataset. The model only needs to predict an offset and scale for each anchor box, which simplifies the network because we can make it fully convolutional. As a result, the model outputs many boxes per image: YOLOv2, for example, predicts more than a thousand bounding boxes.

In this article, we are not going into more detail about anchors. We only need to know that an object detection model generates many bounding boxes, and we need to apply post-processing to eliminate redundant ones.

Each predicted bounding box has a confidence score which indicates how likely (the model believes) an object exists in a bounding box. For example, the model may output a bounding box for a dog with a confidence score of 75%. The confidence score tells us which bounding boxes are more important (or less important). So, we can use it to rank bounding boxes. We’ll call it “score” in this article.

Now, we are ready to discuss how NMS works. For simplicity, we’ll deal with only one class (“dog”). Doing so does not change the nature of NMS. We’ll touch upon the multi-class case later on.

2 Non-Maximum Suppression

As the name suggests, NMS means “suppress the ones that are not maximum (score)”. We are eliminating predicted bounding boxes overlapping with the highest score bounding box. For example, the image below has multiple predicted bounding boxes for a dog, and those bounding boxes overlap each other.

[Figure: Overlapping Bounding Boxes]

Since each bounding box has a score, we can sort the boxes in descending order of score. Suppose the red bounding box has the highest score.

[Figure: Confidence Scores]

For each of the blue bounding boxes (with a lower score than the red one), we calculate the IoU (Intersection over Union) with the red one. An IoU value represents the overlap between two boxes, ranging from 0 (no overlap) to 1 (the two boxes coincide exactly). The image below shows the IoU concept, where the black part is the overlap area (intersection) between two boxes. We calculate the IoU between the blue and red boxes as the intersection area divided by the union area of the two boxes.
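As a minimal sketch of the IoU calculation, assuming boxes are given as (x1, y1, x2, y2) corner coordinates (the article does not fix a box format, and the function name `iou` is only illustrative):

```python
def iou(box_a, box_b):
    """IoU of two boxes given as (x1, y1, x2, y2), with x1 < x2 and y1 < y2 (assumed format)."""
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])

    # Clamp width and height at 0 so that disjoint boxes give a zero intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


red = (0, 0, 10, 10)
blue = (5, 0, 15, 10)
print(iou(red, blue))  # about 0.333: intersection 50, union 150
```

Two identical boxes give an IoU of 1, and two boxes that do not touch give 0.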

[Figure: IoU]

We set an IoU threshold (hyperparameter) to determine if two predicted bounding boxes are for the same object or not. Let’s say we use 0.5 as the IoU threshold. If the above blue and red boxes overlap with an IoU value of 0.5 or more, we say they are for the same object (“dog” in this case), and we should suppress (eliminate) the blue box because the red box has a higher score. As we repeat the steps, we eliminate all overlapping (lower score) bounding boxes and leave the highest score one. And that’s how NMS works for a single dog case.

[Figure: Non-Maximum Suppression]

Even if there is more than one dog, the process works the same way. We sort all bounding boxes by score and repeatedly eliminate lower-score bounding boxes according to the IoU threshold. First, we eliminate all bounding boxes that overlap with the highest-score bounding box. Then, among the boxes that remain (none of which overlaps the first box beyond the threshold), we take the one with the highest score and eliminate the lower-score boxes that overlap with it. And the process repeats.

The following summarizes the NMS steps for one class:

  1. Prepare an empty list for the selected bounding boxes: outputs = []
  2. Sort all bounding boxes by score (descending) and call the list bboxes.
  3. Take out the highest score bounding box from bboxes and put it into outputs.
  4. Calculate the IoU between the highest score bounding box and each of the other bounding boxes in bboxes. We remove bounding boxes from bboxes if the IoU is 0.5 or higher. (The IoU threshold is a hyperparameter.)
  5. If bboxes is not empty, we repeat the process from step 3.
  6. Finally, we return outputs containing the non-overlapping bounding boxes (per the IoU threshold).
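The steps above can be sketched in plain Python. This is an illustrative sketch rather than a library implementation: the (x1, y1, x2, y2) box format and the helper names `iou` and `nms` are assumptions, and the function returns indices into the input list instead of the boxes themselves.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes (assumed format)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy single-class NMS. Returns indices of kept boxes, highest score first."""
    outputs = []  # Step 1: empty list for the selected boxes.
    # Step 2: indices of all boxes, sorted by descending score.
    bboxes = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    while bboxes:  # Step 5: repeat while candidates remain.
        best = bboxes.pop(0)  # Step 3: take out the highest-score box.
        outputs.append(best)
        # Step 4: drop every remaining box whose IoU with `best` is at or above the threshold.
        bboxes = [i for i in bboxes if iou(boxes[best], boxes[i]) < iou_threshold]
    return outputs  # Step 6.


boxes = [(10, 10, 110, 110), (20, 20, 120, 120), (15, 5, 105, 115), (200, 200, 300, 300)]
scores = [0.9, 0.75, 0.6, 0.8]
print(nms(boxes, scores))  # [0, 3]
```

Boxes 1 and 2 overlap box 0 with an IoU above 0.5 and are suppressed. Box 3 does not overlap anything, so it survives as a second detection.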

Hopefully, NMS is not that scary or complicated to understand. In practice, we’d use a function provided by a library, such as torchvision.ops.nms from the PyTorch ecosystem, so it’s easy to perform NMS. YOLOv5 also uses that internally.

Now, let’s extend the process to multiple classes.

2.1 Dealing with Multiple Classes

So far, we have talked about one “dog” class only. In real object detection datasets, we must deal with multiple classes like “cat” and others. In this case, we perform NMS for each class because an object detection model outputs scores for all classes the target dataset supports.

If there are 80 classes, a model produces 80 probabilities per bounding box. For example, a model would predict a bounding box with confidence scores like 75% “dog”, 20% “cat”, and some probabilities for the other 78 classes. There would be many bounding boxes, each with 80 probabilities.

We perform NMS independently for each class, ranking bounding boxes by score within one class. We do that for “dog”, “cat”, and others. It can work well even when a dog and a cat are very close to each other: because we handle different classes separately, overlapping bounding boxes for different classes can survive NMS.
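A small sketch of per-class NMS, assuming each box comes with one score per class. The names `per_class_nms`, `class_scores`, and `class_names` are hypothetical, and the two-class setup is only for illustration.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes (assumed format)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy single-class NMS returning kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


def per_class_nms(boxes, class_scores, class_names, iou_threshold=0.5):
    """Run NMS once per class, ranking boxes by that class's score only."""
    results = {}
    for c, name in enumerate(class_names):
        scores_c = [row[c] for row in class_scores]  # column c: score of class c for every box
        results[name] = nms(boxes, scores_c, iou_threshold)
    return results


# Two heavily overlapping boxes (IoU about 0.82): one is mostly "dog", the other mostly "cat".
boxes = [(0, 0, 100, 100), (10, 0, 110, 100)]
class_scores = [[0.75, 0.20], [0.10, 0.80]]  # rows: boxes, columns: (dog, cat)
print(per_class_nms(boxes, class_scores, ["dog", "cat"]))  # {'dog': [0], 'cat': [1]}
```

Box 0 wins the “dog” ranking and box 1 wins the “cat” ranking, so both survive even though they overlap heavily.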

However, we may not want to repeat NMS for each class because it takes too much time.

2.2 Dealing with Slowness

NMS as described above is a sequential process: each iteration depends on which boxes survived the previous one, so the loop does not parallelize easily. With 80 or so classes and thousands of bounding boxes, running NMS once per class may take too much time. For example, to detect running vehicles via a stream of video images, we’d want to reduce latency as much as possible.

Therefore, implementations such as YOLOv5 shift the (x, y) coordinates of bounding boxes for each class so that bounding boxes in one class never overlap with bounding boxes in another class. Such an arrangement allows one NMS process to handle all classes in an image in one NMS function call instead of 80 NMS function calls.
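The coordinate-shifting idea can be sketched as follows. This is not YOLOv5’s actual code, only an illustration of the trick. It assumes each box already has a single class label, and the offset constant `max_wh` (hypothetical, set here to 4096) must be larger than any box coordinate so that boxes of different classes land in disjoint regions.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes (assumed format)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy class-agnostic NMS returning kept indices, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep


def class_aware_nms(boxes, scores, labels, iou_threshold=0.5, max_wh=4096):
    """One NMS call for all classes: shift each box by label * max_wh before NMS."""
    shifted = [
        (x1 + c * max_wh, y1 + c * max_wh, x2 + c * max_wh, y2 + c * max_wh)
        for (x1, y1, x2, y2), c in zip(boxes, labels)
    ]
    # The returned indices refer to the original, unshifted boxes.
    return nms(shifted, scores, iou_threshold)


boxes = [(0, 0, 100, 100), (10, 0, 110, 100)]  # heavily overlapping
scores = [0.75, 0.80]
labels = [0, 1]  # 0 = dog, 1 = cat

print(nms(boxes, scores))                      # [1]: the cat box suppresses the dog box
print(class_aware_nms(boxes, scores, labels))  # [1, 0]: both survive
```

Shifting changes only where the boxes sit, not their sizes, so the IoU between boxes of the same class is unchanged.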

2.3 Eliminate Low Score Predictions First

Since more bounding boxes require more time in NMS, we should probably eliminate low-score predictions first, as they are unlikely to survive NMS anyway.

In other words, we set a confidence threshold and eliminate bounding boxes whose score is at or below it. It’s yet another hyperparameter. For example, we could set the confidence threshold to 0.05 and eliminate all bounding boxes with 0.05 or lower scores before starting NMS. If the confidence threshold is high enough, it can remove many bounding boxes and dramatically improve the speed.
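A minimal sketch of this pre-filtering step, using the 0.05 threshold from the example above. The function name `filter_by_confidence` is hypothetical, and boxes are assumed to be (x1, y1, x2, y2) tuples.

```python
def filter_by_confidence(boxes, scores, conf_threshold=0.05):
    """Keep only boxes whose score is strictly above the confidence threshold."""
    keep = [i for i, s in enumerate(scores) if s > conf_threshold]
    return [boxes[i] for i in keep], [scores[i] for i in keep]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (5, 5, 20, 20)]
scores = [0.90, 0.05, 0.01, 0.30]

kept_boxes, kept_scores = filter_by_confidence(boxes, scores)
print(kept_scores)  # [0.9, 0.3]
```

Only the surviving boxes and scores would then be passed on to NMS.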

However, it can sacrifice the accuracy (mAP) because a lower score does not necessarily mean the prediction is wrong. So, we should be careful when setting this hyperparameter, as it balances speed and accuracy.

Another way of eliminating lower score bounding boxes is to limit the number of bounding boxes we give to NMS. It is another hyperparameter. For example, if the limit is 1000, we pass only the 1000 highest-score bounding boxes to NMS. Since NMS then handles fewer bounding boxes, it could run a lot faster, depending on the limit. In terms of accuracy, one image won’t usually have so many ground truth bounding boxes, so it may still produce a good mAP. However, as mentioned before, a low score does not necessarily mean the prediction is wrong. So, this approach still requires us to adjust the hyperparameter carefully.
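The box-count limit can be sketched with the standard library’s `heapq.nlargest`. The function name `keep_top_k` and the parameter `max_boxes` are hypothetical, and the demo uses a limit of 2 instead of 1000 to keep the output short.

```python
import heapq


def keep_top_k(boxes, scores, max_boxes=1000):
    """Keep at most max_boxes boxes, chosen by highest score (returned in descending score order)."""
    top = heapq.nlargest(max_boxes, range(len(scores)), key=lambda i: scores[i])
    return [boxes[i] for i in top], [scores[i] for i in top]


boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (50, 50, 60, 60), (5, 5, 20, 20)]
scores = [0.2, 0.9, 0.5, 0.7]

top_boxes, top_scores = keep_top_k(boxes, scores, max_boxes=2)
print(top_scores)  # [0.9, 0.7]
```

If there are fewer boxes than the limit, all of them are kept.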

 

From: https://www.cnblogs.com/lightsong/p/18365036
