Machine learning: Non Maximum Suppression: Theory and Implementation in PyTorch

Non Maximum Suppression (NMS) is a technique used in numerous computer vision tasks. It is a class of algorithms to select one entity (e.g., bounding boxes) out of many overlapping entities. We can choose the selection criteria to arrive at the desired results. The criteria are most commonly some form of probability number and some form of overlap measure (e.g. Intersection over Union).

Why we need it ?

Most object detection algorithms use NMS to whittle down many detected bounding boxes to only a few. At the most basic level, most object detectors do some form of windowing. Thousands of windows (anchors) of various sizes and shapes are generated.

These windows supposedly contain only one object, and a classifier is used to obtain a probability/score for each class. Once the detector outputs a large number of bounding boxes, it is necessary to filter out the best ones. NMS is the most commonly used algorithm for this task.

Intersection Over Union (IoU)

Before we go ahead, I would like to discuss this one concept that we will be using in the following sections i.e. Intersection Over Union or IoU in short.

The Intersection over Union (IoU) metric, also referred to as the Jaccard index, is essentially a method used usually to quantify the percent overlap between the ground truth BBox (Bounding Box) and the prediction BBox. However, in NMS, we find IoU between two predictions BBoxes instead.

IoU in mathematical terms can be represented by the following expression,

Intersection Over Union(IoU) = (Target ∩ Prediction) / (Target U Prediction)

In our case using BBoxes, it can be modified to,

IOU(Box1, Box2) = Intersection_Size(Box1, Box2) / Union_Size(Box1, Box2)

Consider the the two BBoxes in the following figure,

Their union area is the orange area, and their intersection area is the purple area. So the IoU can be calculated by dividing the purple area by the orange area.

Implementation of IoU in Python

Let us implement this in Python so that we can use it in the later sections. Consider two BBoxes, namely.

Box1 having lower left coordinates as (x1,y1) and upper right coordinates as (a1,b1).

Similarly, there is another BBox, called Box2 having coordinates (x2,y2) and (a2,b2).

We find their intersection box that has coordinates (xx,yy) and (aa,bb).

We then use the expression discussed above to find IoU.

# find the area for the box1 (x1,y1) (a1,b1)

area1 = (a1-x1)*(b1-y1);

# find the area for the box2 (x2,y2) (a2,b2)

area2 = (a2-x2)*(b2-y2);

# Now we need to find the intersection box

# to do that, find the largest (x, y) coordinates

# for the start of the intersection bounding box and

# the smallest (x, y) coordinates for the

# end of the intersection bounding box

xx = max(x1, x2)

yy = max(y1, y2)

aa = min(a1, a2)

bb = min(b1, b2)

# So the intersection BBox has the coordinates (xx,yy) (aa,bb)

# compute the width and height of the intersection bounding box

w = max(0, aa - xx)

h = max(0, bb - yy)

# find the intersection area

intersection_area = w*h

# find the union area of both the boxes

union_area = area1 + area2 - intersection_area

# compute the ratio of overlap between the computed

# bounding box and the bounding box in the area list

IoU = intersection_area / union_area

Input

We get a list P of prediction BBoxes of the form (x1,y1,x2,y2,c), where (x1,y1) and (x2,y2) are the ends of the BBox and c is the predicted confidence score of the model. We also get overlap threshold IoU thresh_iou.

Output

We return a list keep of filtered prediction BBoxes.

Step 1: Select the prediction S with highest confidence score and remove it from P and add it to the final prediction list keep. (keep is empty initially).

Step 2: Now compare this prediction S with all the predictions present in P. Calculate the IoU of this prediction S with every other predictions in P. If the IoU is greater than the threshold thresh_iou for any prediction T present in P, remove prediction T from P.

Step 3: If there are still predictions left in P, then go to Step 1 again, else return the list keep containing the filtered predictions.

In layman terms, we select the predictions with the maximum confidence and suppress all the other predictions having overlap with the selected predictions greater than a threshold.

If you observe the algorithm above, the whole filtering process depends on a single threshold value thresh_iou. So selection of threshold value is vital for the performance of the model. Usually, we take its value as 0.5, but it depends on the experiment you are doing.

Implementation of NMS in PyTorch

def nms_pytorch(P : torch.tensor ,thresh_iou : float):

"""

Apply non-maximum suppression to avoid detecting too many

overlapping bounding boxes for a given object.

Args:

boxes: (tensor) The location preds for the image

along with the class predscores, Shape: [num_boxes,5].

thresh_iou: (float) The overlap thresh for suppressing unnecessary boxes.

Returns:

A list of filtered boxes, Shape: [ , 5]

"""

# we extract coordinates for every

# prediction box present in P

x1 = P[:, 0]

y1 = P[:, 1]

x2 = P[:, 2]

y2 = P[:, 3]

# we extract the confidence scores as well

scores = P[:, 4]

# calculate area of every block in P

areas = (x2 - x1) * (y2 - y1)

# sort the prediction boxes in P

# according to their confidence scores

order = scores.argsort()

# initialise an empty list for

# filtered prediction boxes

keep = []

while len(order) > 0:

# extract the index of the

# prediction with highest score

# we call this prediction S

idx = order[-1]

# push S in filtered predictions list

keep.append(P[idx])

# remove S from P

order = order[:-1]

# sanity check

if len(order) == 0:

break

# select coordinates of BBoxes according to

# the indices in order

xx1 = torch.index_select(x1,dim = 0, index = order)

xx2 = torch.index_select(x2,dim = 0, index = order)

yy1 = torch.index_select(y1,dim = 0, index = order)

yy2 = torch.index_select(y2,dim = 0, index = order)

# find the coordinates of the intersection boxes

xx1 = torch.max(xx1, x1[idx])

yy1 = torch.max(yy1, y1[idx])

xx2 = torch.min(xx2, x2[idx])

yy2 = torch.min(yy2, y2[idx])

# find height and width of the intersection boxes

w = xx2 - xx1

h = yy2 - yy1

# take max with 0.0 to avoid negative w and h

# due to non-overlapping boxes

w = torch.clamp(w, min=0.0)

h = torch.clamp(h, min=0.0)

# find the intersection area

inter = w*h

# find the areas of BBoxes according the indices in order

rem_areas = torch.index_select(areas, dim = 0, index = order)

# find the union of every prediction T in P

# with the prediction S

# Note that areas[idx] represents area of S

union = (rem_areas - inter) + areas[idx]

# find the IoU of every prediction in P with S

IoU = inter / union

# keep the boxes with IoU less than thresh_iou

mask = IoU < thresh_iou

order = order[mask]

return keep

Testing our NMS function

Let us use a very simple example to test out nms_pytorch function. Notice below that P is a tensor containing tensors of the form (x,y,a,b,c) as was discussed above in the input section of NMS algorithm.

# Let P be the following

P = torch.tensor([

[1, 1, 3, 3, 0.95],

[1, 1, 3, 4, 0.93],

[1, 0.9, 3.6, 3, 0.98],

[1, 0.9, 3.5, 3, 0.97]

])

If we plot prediction boxes in P, we get a figure like this.

Now we run our nms_pytorch function with thresh_iou as 0.5.

filtered_boxes = nms_pytorch(P,0.5)

We get one filtered box and if we plot that, it will look like below.

So all the other extra overlapping boxes have been cleared out and we get a nice and clean prediction.

Now, lets’ try with IOU of 0.8.

filtered_boxes = nms_pytorch(P, 0.8)

The following is the result that we get:

This time we have two boxes in the final plot. The reason is that this time the IoU was 0.8. And any box that has an IoU of less than 0.8 with the highest confidence box will not be suppressed and will be treated as a separate valid bounding box.

This is a very simple example. However, in this post, we have used an image with prediction boxes on it predicted by an actual Resnet model. If we look at the image below without applying NMS, there a lot of overlapping BBoxes.