
'val Objectness' overfitting on j-series hyperparameters #453

Closed
glenn-jocher opened this issue Aug 14, 2019 · 3 comments

Comments

@glenn-jocher (Member)

@ktian08 evolved the current j-series hyperparameters, committed in early August, replacing the previous i-series parameters. They generally perform well (within 1% of darknet trained from scratch!), but they appear to cause overfitting in validation Confidence in particular.

yolov3/train.py, lines 36 to 55 in 907195d:
# Training hyperparameters j (50.5 mAP yolov3-320) evolved by @ktian08 https://github.com/ultralytics/yolov3/issues/310
hyp = {'giou': 1.582,  # giou loss gain
       'xy': 4.688,  # xy loss gain
       'wh': 0.1857,  # wh loss gain
       'cls': 27.76,  # cls loss gain
       'cls_pw': 1.446,  # cls BCELoss positive_weight
       'obj': 21.35,  # obj loss gain
       'obj_pw': 3.941,  # obj BCELoss positive_weight
       'iou_t': 0.2635,  # iou training threshold
       'lr0': 0.002324,  # initial learning rate
       'lrf': -4.,  # final LambdaLR learning rate = lr0 * (10 ** lrf)
       'momentum': 0.97,  # SGD momentum
       'weight_decay': 0.0004569,  # optimizer weight decay
       'hsv_s': 0.5703,  # image HSV-Saturation augmentation (fraction)
       'hsv_v': 0.3174,  # image HSV-Value augmentation (fraction)
       'degrees': 1.113,  # image rotation (+/- deg)
       'translate': 0.06797,  # image translation (+/- fraction)
       'scale': 0.1059,  # image scale (+/- gain)
       'shear': 0.5768}  # image shear (+/- deg)
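As a minimal sketch (no PyTorch required), here is how the 'lr0'/'lrf' pair above defines the learning-rate endpoints: per the inline comment, the final LambdaLR learning rate is lr0 * (10 ** lrf). The variable names below are illustrative, not from train.py.

```python
# Sketch: final learning rate implied by the j-series hyperparameters.
# train.py's comment states: final LambdaLR learning rate = lr0 * (10 ** lrf).
lr0 = 0.002324  # hyp['lr0'], initial learning rate
lrf = -4.0      # hyp['lrf'], exponent applied at the end of training

lr_final = lr0 * (10 ** lrf)  # learning rate after the full schedule
print(f"initial lr: {lr0:.6f}, final lr: {lr_final:.3e}")
```

So training decays the learning rate by four orders of magnitude from start to finish.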

See #446 (comment) by @phino10 and #310 (comment) by @Aurora33 for two overfitting examples.

One thing I notice is that hyp['obj_pw'] = 3.941 is very high compared to hyp['cls_pw'] = 1.446. This may cause aggressive performance gains at the beginning of training at the expense of overfitting later in training. One option would be to manually lower hyp['obj_pw'] and to increase hyp['obj'] to compensate, perhaps in equal measure (i.e. divide by 2 and multiply by 2).
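The adjustment proposed above can be sketched as follows; the rebalancing factor `k` and the in-place dict edit are illustrative assumptions, not committed code.

```python
# Hypothetical sketch of the proposed rebalancing: halve the objectness
# BCELoss positive_weight and double the objectness loss gain to compensate.
hyp = {'obj': 21.35,    # obj loss gain (current j-series value)
       'obj_pw': 3.941}  # obj BCELoss positive_weight (current j-series value)

k = 2.0  # rebalancing factor (illustrative, "in equal measure")
hyp['obj_pw'] /= k  # lower positive_weight -> less aggressive early objectness gains
hyp['obj'] *= k     # raise overall obj loss gain to keep total loss magnitude similar
print(hyp)
```

The idea is to keep the overall contribution of the objectness term roughly constant while reducing the per-positive-sample weighting that may drive early overfitting.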

@glenn-jocher glenn-jocher changed the title 'val Confidence' overfitting on j-series hyperparameters 'val Objectness' overfitting on j-series hyperparameters Aug 25, 2019
@glenn-jocher (Member Author)

Another example. Also of note: objectness validation losses are far higher than training losses, the only one of the three loss components to display this behavior.

[results plot: objectness validation loss diverging above training loss]

@glenn-jocher (Member Author)

glenn-jocher commented Aug 28, 2019

#472: 416 image size, no multi-scale.
[results image]

#310 (comment): 320 image size, multi-scale.
[results image]

@glenn-jocher (Member Author)

Problem resolved with new hyperparameters.
