Why we need it ?
Most object detection algorithms use NMS to whittle down many detected bounding boxes to only a few. At the most basic level, most object detectors do some form of windowing. Thousands of windows (anchors) of various sizes and shapes are generated.
These windows supposedly contain only one object, and a classifier is used to obtain a probability/score for each class. Once the detector outputs a large number of bounding boxes, it is necessary to filter out the best ones. NMS is the most commonly used algorithm for this task.
Intersection Over Union (IoU)
Before we go ahead, I would like to discuss this one concept that we will be using in the following sections i.e. Intersection Over Union or IoU in short.
Before we go ahead, I would like to discuss this one concept that we will be using in the following sections i.e. Intersection Over Union or IoU in short.
The Intersection over Union (IoU) metric, also referred to as the Jaccard index, is essentially a method used usually to quantify the percent overlap between the ground truth BBox (Bounding Box) and the prediction BBox. However, in NMS, we find IoU between two predictions BBoxes instead.
IoU in mathematical terms can be represented by the following expression,
Intersection Over Union(IoU) = (Target ∩ Prediction) / (Target U Prediction)
In our case using BBoxes, it can be modified to,
IOU(Box1, Box2) = Intersection_Size(Box1, Box2) / Union_Size(Box1, Box2)
Consider the the two BBoxes in the following figure,
Their union area is the orange area, and their intersection area is the purple area. So the IoU can be calculated by dividing the purple area by the orange area.
Implementation of IoU in Python
Let us implement this in Python so that we can use it in the later sections. Consider two BBoxes, namely.
Box1 having lower left coordinates as (x1,y1) and upper right coordinates as (a1,b1).
Similarly, there is another BBox, called Box2 having coordinates (x2,y2) and (a2,b2).
We find their intersection box that has coordinates (xx,yy) and (aa,bb).
We then use the expression discussed above to find IoU.
# find the area for the box1 (x1,y1) (a1,b1)
area1 = (a1-x1)*(b1-y1);
# find the area for the box2 (x2,y2) (a2,b2)
area2 = (a2-x2)*(b2-y2);
# Now we need to find the intersection box
# to do that, find the largest (x, y) coordinates
# for the start of the intersection bounding box and
# the smallest (x, y) coordinates for the
# end of the intersection bounding box
xx = max(x1, x2)
yy = max(y1, y2)
aa = min(a1, a2)
bb = min(b1, b2)
# So the intersection BBox has the coordinates (xx,yy) (aa,bb)
# compute the width and height of the intersection bounding box
w = max(0, aa - xx)
h = max(0, bb - yy)
# find the intersection area
intersection_area = w*h
# find the union area of both the boxes
union_area = area1 + area2 - intersection_area
# compute the ratio of overlap between the computed
# bounding box and the bounding box in the area list
IoU = intersection_area / union_area
Input
We get a list P of prediction BBoxes of the form (x1,y1,x2,y2,c), where (x1,y1) and (x2,y2) are the ends of the BBox and c is the predicted confidence score of the model. We also get overlap threshold IoU thresh_iou.
Output
We return a list keep of filtered prediction BBoxes.
Step 1: Select the prediction S with highest confidence score and remove it from P and add it to the final prediction list keep. (keep is empty initially).
Step 2: Now compare this prediction S with all the predictions present in P. Calculate the IoU of this prediction S with every other predictions in P. If the IoU is greater than the threshold thresh_iou for any prediction T present in P, remove prediction T from P.
Step 3: If there are still predictions left in P, then go to Step 1 again, else return the list keep containing the filtered predictions.
In layman terms, we select the predictions with the maximum confidence and suppress all the other predictions having overlap with the selected predictions greater than a threshold.
If you observe the algorithm above, the whole filtering process depends on a single threshold value thresh_iou. So selection of threshold value is vital for the performance of the model. Usually, we take its value as 0.5, but it depends on the experiment you are doing.
Implementation of NMS in PyTorch
def nms_pytorch(P : torch.tensor ,thresh_iou : float):
"""
Apply non-maximum suppression to avoid detecting too many
overlapping bounding boxes for a given object.
Args:
boxes: (tensor) The location preds for the image
along with the class predscores, Shape: [num_boxes,5].
thresh_iou: (float) The overlap thresh for suppressing unnecessary boxes.
Returns:
A list of filtered boxes, Shape: [ , 5]
"""
# we extract coordinates for every
# prediction box present in P
x1 = P[:, 0]
y1 = P[:, 1]
x2 = P[:, 2]
y2 = P[:, 3]
# we extract the confidence scores as well
scores = P[:, 4]
# calculate area of every block in P
areas = (x2 - x1) * (y2 - y1)
# sort the prediction boxes in P
# according to their confidence scores
order = scores.argsort()
# initialise an empty list for
# filtered prediction boxes
keep = []
while len(order) > 0:
# extract the index of the
# prediction with highest score
# we call this prediction S
idx = order[-1]
# push S in filtered predictions list
keep.append(P[idx])
# remove S from P
order = order[:-1]
# sanity check
if len(order) == 0:
break
# select coordinates of BBoxes according to
# the indices in order
xx1 = torch.index_select(x1,dim = 0, index = order)
xx2 = torch.index_select(x2,dim = 0, index = order)
yy1 = torch.index_select(y1,dim = 0, index = order)
yy2 = torch.index_select(y2,dim = 0, index = order)
# find the coordinates of the intersection boxes
xx1 = torch.max(xx1, x1[idx])
yy1 = torch.max(yy1, y1[idx])
xx2 = torch.min(xx2, x2[idx])
yy2 = torch.min(yy2, y2[idx])
# find height and width of the intersection boxes
w = xx2 - xx1
h = yy2 - yy1
# take max with 0.0 to avoid negative w and h
# due to non-overlapping boxes
w = torch.clamp(w, min=0.0)
h = torch.clamp(h, min=0.0)
# find the intersection area
inter = w*h
# find the areas of BBoxes according the indices in order
rem_areas = torch.index_select(areas, dim = 0, index = order)
# find the union of every prediction T in P
# with the prediction S
# Note that areas[idx] represents area of S
union = (rem_areas - inter) + areas[idx]
# find the IoU of every prediction in P with S
IoU = inter / union
# keep the boxes with IoU less than thresh_iou
mask = IoU < thresh_iou
order = order[mask]
return keep
Testing our NMS function
Let us use a very simple example to test out nms_pytorch function. Notice below that P is a tensor containing tensors of the form (x,y,a,b,c) as was discussed above in the input section of NMS algorithm.
# Let P be the following
P = torch.tensor([
[1, 1, 3, 3, 0.95],
[1, 1, 3, 4, 0.93],
[1, 0.9, 3.6, 3, 0.98],
[1, 0.9, 3.5, 3, 0.97]
])
If we plot prediction boxes in P, we get a figure like this.
filtered_boxes = nms_pytorch(P,0.5)
We get one filtered box and if we plot that, it will look like below.
So all the other extra overlapping boxes have been cleared out and we get a nice and clean prediction.
Now, lets’ try with IOU of 0.8.
filtered_boxes = nms_pytorch(P, 0.8)
The following is the result that we get:
This time we have two boxes in the final plot. The reason is that this time the IoU was 0.8. And any box that has an IoU of less than 0.8 with the highest confidence box will not be suppressed and will be treated as a separate valid bounding box.
This is a very simple example. However, in this post, we have used an image with prediction boxes on it predicted by an actual Resnet model. If we look at the image below without applying NMS, there a lot of overlapping BBoxes.
Now if we apply our nms_pytorch function, we get the following image.
References
No comments:
Post a Comment