
What's the meaning of 'mask' in YOLOV3.cfg [yolo] #558

Open
AhaEdgar opened this issue Mar 26, 2018 · 7 comments

Comments

@AhaEdgar

What's the meaning of 'mask' in YOLOV3.cfg [yolo]?

[yolo]
mask = 6,7,8

[yolo]
mask = 3,4,5


[yolo]
mask = 0,1,2

    if(mask) l.mask = mask;
    else{
        l.mask = calloc(n, sizeof(int));
        for(i = 0; i < n; ++i){
            l.mask[i] = i;
        }
    }

@pjreddie @Broham

@pjreddie
Owner

Every layer has to know about all of the anchor boxes but only predicts some subset of them. This could probably be named something better, but the mask tells the layer which of the bounding boxes it is responsible for predicting. The first yolo layer predicts 6,7,8 because those are the largest boxes and it's at the coarsest scale. The 2nd yolo layer predicts some smaller ones, etc.
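To make the role of the mask concrete, here is a minimal sketch of how a mask picks a layer's anchors out of the full list. It assumes the nine anchor sizes of the stock yolov3.cfg (check your own cfg); `ANCHORS` and `select_anchors` are hypothetical names for illustration, not darknet functions.

```c
/* Anchor (width, height) pairs in pixels, ordered small to large.
   Assumed to match the stock yolov3.cfg; verify against your own cfg. */
static const float ANCHORS[18] = {
    10, 13,   16, 30,    33, 23,
    30, 61,   62, 45,    59, 119,
    116, 90,  156, 198,  373, 326
};

/* Hypothetical helper: copy the (w, h) pairs named by `mask` into `out`.
   `out` must have room for 2 * n floats. */
void select_anchors(const float *anchors, const int *mask, int n, float *out)
{
    for (int i = 0; i < n; ++i) {
        out[2 * i]     = anchors[2 * mask[i]];      /* width  */
        out[2 * i + 1] = anchors[2 * mask[i] + 1];  /* height */
    }
}
```

With mask = 6,7,8 this selects the three largest pairs, (116,90), (156,198) and (373,326), which is why that mask belongs to the coarsest-scale layer.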

@pjreddie
Owner

If the layer isn't passed a mask, it assumes that it is responsible for all the bounding boxes, hence the if statement.

@AhaEdgar
Author

@pjreddie Thanks a lot! One more question: how do we calculate the final result from the three [detection] results?

@pjreddie
Owner

I'm not quite sure what you mean, could you clarify?

@pjreddie
Owner

Maybe this helps: the [yolo] layers simply apply logistic activation to some of the neurons, mainly the ones predicting the (x,y) offset, objectness, and class probabilities. Then if you call get_yolo_detections (or something like that), it interprets the output as described in the paper.

@AhaEdgar
Author

AhaEdgar commented Mar 26, 2018

@pjreddie
For example, if there is a dog in a picture, in YOLOv2 we can adjust the threshold to get the best detection result. Each prediction contains 4 bounding box offsets, 1 objectness prediction, and 80 class predictions. However, YOLOv3 produces a 3-d tensor at each of its 3 [detection] layers. YOLOv3 predicts 3 boxes at each scale, so the tensor is N×N×[3∗(4+1+80)] for the 4 bounding box offsets, 1 objectness prediction, and 80 class predictions.

Thus, how do we process the 3-d tensors to predict one object?
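For readers working through the N×N×[3∗(4+1+80)] shape, the sketch below shows one way to count and address the values in a single scale's output. It assumes a channel-major layout (all cells for one value of one anchor stored contiguously); that layout and the names `yolo_outputs` and `entry_index` are assumptions for illustration, so check them against the code you are actually using.

```c
/* Number of floats one [yolo] layer emits for a grid x grid map:
   grid * grid * boxes_per_cell * (4 box offsets + 1 objectness + classes). */
int yolo_outputs(int grid, int boxes_per_cell, int classes)
{
    return grid * grid * boxes_per_cell * (4 + 1 + classes);
}

/* Offset of one value in that output, assuming channel-major layout.
   a     : anchor slot within the layer (0..boxes_per_cell-1)
   entry : 0..3 box offsets, 4 objectness, 5.. class scores
   row, col : grid cell */
int entry_index(int grid, int classes, int a, int entry, int row, int col)
{
    int cell = row * grid + col;
    return a * grid * grid * (4 + 1 + classes) + entry * grid * grid + cell;
}
```

For a 416×416 input the three scales have N = 13, 26 and 52, so there are 3 × (169 + 676 + 2704) = 10647 candidate boxes in total. A typical pipeline keeps the candidates whose score passes the threshold, pools them across the three scales, and applies non-maximum suppression, so a single dog normally ends up as one box.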

@Grabber

Grabber commented Mar 26, 2018 via email
