What's the meaning of 'subdivisions' and how to set its value? #224
Comments
It's how many mini-batches you split your batch into. When batching, you are averaging over more images; the intent is not only to speed up the training process but also to make the training generalise better. If your GPU has enough memory, you can reduce the subdivisions value to load more images into the GPU and process them at the same time. Hope this answers your question.
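As a rough sketch of that relationship (batch=64 and subdivisions=8 below are just example numbers from this thread, not a recommendation):

```c
#include <stdio.h>

/* Minimal sketch of how the cfg values relate, assuming
 * batch = images per weight update and subdivisions = how many
 * chunks that batch is split into to fit in GPU memory. */
int main(void)
{
    int batch = 64;        /* images per weight update (iteration) */
    int subdivisions = 8;  /* chunks the batch is split into */

    /* subdivisions must divide batch evenly */
    int mini_batch = batch / subdivisions;

    printf("images on the GPU at once:  %d\n", mini_batch);
    printf("images per weight update:   %d\n", batch);
    return 0;
}
```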
@TheMikeyR your explanation is great, thank you very much.
@pjreddie @TheMikeyR I am still confused about how the subdivisions affect the update of the moving mean and moving variance in batch norm. My understanding is that the mean and variance will be computed over the mini-batch size (e.g. 8) instead of the total batch size (e.g. 64). Hence, the mini-batch is the actual batch size used in batch normalization. Is my understanding right?
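For illustration only, here is a minimal generic sketch of mini-batch statistics as batch normalization computes them, assuming the usual rolling-average update; the 0.99 factor is an assumed example value, not necessarily darknet's constant:

```c
#include <stdio.h>
#include <stddef.h>

/* Sketch: batch-norm statistics computed over one mini-batch only.
 * The 0.99/0.01 rolling factor is an assumed example, not
 * necessarily darknet's exact constant. */
static void update_bn_stats(const float *x, size_t n,
                            float *rolling_mean, float *rolling_var)
{
    float mean = 0.f, var = 0.f;
    for (size_t i = 0; i < n; ++i) mean += x[i];
    mean /= (float)n;
    for (size_t i = 0; i < n; ++i) {
        float d = x[i] - mean;
        var += d * d;
    }
    var /= (float)n;

    /* the moving averages advance once per forward pass,
     * i.e. once per mini-batch, not once per full cfg batch */
    *rolling_mean = 0.99f * (*rolling_mean) + 0.01f * mean;
    *rolling_var  = 0.99f * (*rolling_var)  + 0.01f * var;
}

int main(void)
{
    float x[8] = {1, 2, 3, 4, 5, 6, 7, 8};  /* one mini-batch of activations */
    float rm = 0.f, rv = 1.f;
    update_bn_stats(x, 8, &rm, &rv);
    printf("rolling mean %.3f, rolling variance %.3f\n", rm, rv);
    return 0;
}
```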
When batch = 64, subdivisions = 64, the mAP was 0.008%, but when we changed to batch = 64, subdivisions = 32, the mAP improved to 97%. The loss value was around 0.1 in either case.
By using a smaller subdivisions value, the mini-batch size used to compute the gradient increases. Hence, a gradient computed over a larger mini-batch gives better optimization. I guess using a smaller mini-batch size can get stuck in a local optimum and thus decrease accuracy.
@ZHANGKEON Can mini-batch size be the same value as epoch number?
I made a mistake. The mini-batch size is batch / subdivisions: batch = 64, subdivisions = 32 gives a mini-batch of 2; batch = 64, subdivisions = 8 gives a mini-batch of 8.
@keides2, I hope I am not late. :) As far as I know, the common concept of an epoch is different from what you have stated. Running your algorithm for 1 epoch means it has processed all 320 samples a single time. Two epochs mean you have processed the whole dataset two times. In your example, for each epoch (processing 320 samples) you're going to need 5 iterations in batches of 64. dataset = 320 samples = 1 epoch
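To make that arithmetic concrete, a tiny sketch (320 and 64 are just the numbers from this example):

```c
#include <stdio.h>

int main(void)
{
    int dataset = 320;  /* samples in the training set */
    int batch = 64;     /* images consumed per iteration */

    /* one epoch = one full pass over the dataset */
    int iters_per_epoch = dataset / batch;  /* 320 / 64 = 5 */
    printf("iterations per epoch: %d\n", iters_per_epoch);
    return 0;
}
```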
How can I train YOLO with 3 frames at a time?
Hi, I've read the above comments but I don't really understand the purpose of defining both a batch value and a subdivisions value. I suppose the max_batches parameter refers to the maximum number of trained batches, so if I write something like
wouldn't it be the same as writing?
The mini-batch size would be the same in both cases. Is there any difference in terms of computation speed and/or accuracy between the two configurations?
Actually, the batch is where the gradients are updated. For instance, I have
@gustavovaliati, I was too late to notice. :) I understand now that one epoch means processing all 320 samples once. To put it correctly: 1 iteration = 32 mini-batches = 1 batch. Thank you.
@keides2 @yuenbe @gustavovaliati I think there are some subtleties in the codebase which are causing confusion. Common terminology is:

- Epoch = a pass through the full dataset
- (Mini)batch = the number of samples used for a single weight update
- Iteration = a single weight update (one minibatch processed)

In darknet, the batch parameter is the minibatch size. The subdivision parameter controls how many samples will fit into RAM (minibatch/subdivision = samples loaded simultaneously per pass) - a mini-minibatch, if you like. A subdivision is not a minibatch in the conventional sense, because in darknet the weights are not updated every batch (here's where the naming is confusing) unless subdivisions = 1.

If you look in the code, what gets printed during training is:

```c
i = get_current_batch(net);
int imgs = net->batch * net->subdivisions * ngpus;
...
printf("%ld: %f, %f avg, %f rate, %lf seconds, %d images\n", get_current_batch(net), loss, avg_loss, get_current_rate(net), what_time_is_it_now()-time, i*imgs);
```

where the current batch is defined as:

```c
size_t get_current_batch(network *net)
{
size_t batch_num = (*net->seen)/(net->batch*net->subdivisions);
return batch_num;
}
```

If we see a training line where the reported batch number is 26132:
That means the current batch is 26132 and my batch size is 64, which gives 26132 × 64 = 1,672,448 images seen. However this is strange - taking the cfg values at face value, batch_num should be seen/(batch × subdivisions), not seen/batch, so this would suggest that either the image count or the batch count is off by a factor of subdivisions. On the face of it, something seems messed up - surely the current batch is just the number of images seen divided by the batch size? Some clarifying information from pjreddie here: https://groups.google.com/forum/#!topic/darknet/qZp5P0I4iGA
It turns out that net->batch is divided by the number of subdivisions when the cfg file is parsed (parser.c, Line 664 in 61c9d02):

```c
net->batch /= subdivs;
```
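To make that concrete, here is a small standalone sketch combining the parse-time arithmetic with get_current_batch above (64 and 8 are assumed example cfg values):

```c
#include <stdio.h>

/* Sketch of darknet's parse-time bookkeeping: after parsing,
 * net->batch holds the mini-minibatch, and the cfg batch size
 * is recovered as net->batch * net->subdivisions. */
int main(void)
{
    int cfg_batch = 64;    /* "batch" in the cfg file */
    int subdivisions = 8;  /* "subdivisions" in the cfg file */

    int net_batch = cfg_batch / subdivisions;  /* net->batch after parsing: 8 */
    long seen = 1672448;                       /* images processed so far */

    /* get_current_batch: seen / (net->batch * net->subdivisions) */
    long batch_num = seen / ((long)net_batch * subdivisions);
    printf("net->batch = %d, current batch = %ld\n", net_batch, batch_num);  /* 8, 26132 */
    return 0;
}
```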
Since the subdivision is a clean divisor of the batch size, this means that after parsing, net->batch holds the mini-minibatch size, and net->batch × net->subdivisions recovers the batch size from the cfg file - so seen/(net->batch × net->subdivisions) really is seen/batch. Let's look at the training code:

```c
float train_network(network *net, data d)
{
assert(d.X.rows % net->batch == 0);
int batch = net->batch;
int n = d.X.rows / batch;
int i;
float sum = 0;
for(i = 0; i < n; ++i){
get_next_batch(d, batch, i*batch, net->input, net->truth);
float err = train_network_datum(net);
sum += err;
}
return (float)sum/(n*batch);
}
float train_network_datum(network *net)
{
*net->seen += net->batch;
net->train = 1;
forward_network(net);
backward_network(net);
float error = *net->cost;
if(((*net->seen)/net->batch)%net->subdivisions == 0) update_network(net);
return error;
}
```

So when we train, we grab net->batch images at a time (the mini-minibatch). Going into the datum function, we do a forward pass and a backward pass (so far so good), calculate the loss, and then update the weights only if we've seen enough images: we take the number of images seen divided by the smaller batch size, and check whether that's a multiple of the number of subdivisions. In a roundabout way, that makes sense. In summary:
- batch in the cfg file = the number of images per weight update (one iteration)
- subdivisions = how many chunks that batch is split into so it fits in memory
- in the code, net->batch = batch/subdivisions, and the weights are updated once every net->batch × net->subdivisions images
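To illustrate the update condition, here is a minimal standalone sketch mimicking the counter logic from train_network_datum above (update_network is stubbed out, and 64/8 are example cfg values):

```c
#include <stdio.h>

/* Stub standing in for darknet's update_network(). */
static void update_network(int seen) { printf("weights updated at %d images\n", seen); }

int main(void)
{
    int net_batch = 8;     /* net->batch after parsing (64/8) */
    int subdivisions = 8;  /* net->subdivisions */
    int seen = 0;

    /* simulate 16 calls to train_network_datum */
    for (int i = 0; i < 16; ++i) {
        seen += net_batch;  /* *net->seen += net->batch; */
        /* forward and backward pass would happen here */
        if ((seen / net_batch) % subdivisions == 0)
            update_network(seen);  /* fires every 64 images */
    }
    return 0;
}
```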
How can we know the number of epochs for the whole training? Is it calculated based on max_batches? If max_batches (iterations) = 10, does it mean 2 epochs?
@jveitchmichaelis @TheMikeyR @humandotlearning @Juuustin If you train with the -map flag you will see the iteration count.
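As a rough way to answer the epoch question, a sketch of the arithmetic, assuming each iteration consumes batch images (320, 64, and 10 are the example numbers from this thread):

```c
#include <stdio.h>

int main(void)
{
    int dataset = 320;     /* training images */
    int batch = 64;        /* cfg batch = images per iteration */
    int max_batches = 10;  /* iterations to train for */

    /* each iteration consumes `batch` images, so: */
    float epochs = (float)max_batches * batch / dataset;  /* 10*64/320 = 2 */
    printf("approximate epochs: %.1f\n", epochs);
    return 0;
}
```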
Quick addition: how are the gradients aggregated across subdivisions - are the gradients for each mini-batch saved and then averaged, summed together, or combined by some other operation? At a batch size of 64 with 8 subdivisions, that means 8 independent gradients. Along the same lines, are the batch-norm statistics updated every 8 subdivisions as well, in order to ensure that the weight and mean updates happen on the same cycle? And finally, for batch norm: given that you are using 8 subdivisions, you cannot guarantee that the net gradient is the same as if you were to take only 1 subdivision, since each mini-batch is forward-propagated against its own mean and variance. So the 64/1 batch must have different overall characteristics than the 64/8, correct? @AlexeyAB
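Not a definitive account of darknet's internals, but the common way gradient accumulation across sub-batches is implemented is to sum the per-mini-batch gradients and normalize once at update time; a minimal sketch, with compute_gradient() as a hypothetical stand-in for backprop:

```c
#include <stdio.h>

/* Hypothetical stand-in for a backward pass over one mini-batch. */
static float compute_gradient(int minibatch_index)
{
    return 0.1f * (float)(minibatch_index + 1);  /* fake gradient value */
}

int main(void)
{
    int subdivisions = 8;
    float accumulated = 0.f;

    /* gradients are accumulated over all subdivisions... */
    for (int s = 0; s < subdivisions; ++s)
        accumulated += compute_gradient(s);

    /* ...and only then is a single averaged update applied */
    float update = accumulated / (float)subdivisions;
    printf("applied update: %f\n", update);
    return 0;
}
```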
@TheMikeyR what if the batch value is changed to more than 64?
How to set the value of subdivisions?

```
[net]
batch=64
subdivisions=8
```
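Building on the advice earlier in the thread, a hedged sanity-check sketch for picking subdivisions (the divisibility constraint follows from the parsing code quoted above; treating GPU memory as the limiting factor is this thread's heuristic):

```c
#include <stdio.h>

/* Sanity check for cfg values, based on this thread:
 * subdivisions should divide batch evenly, and the resulting
 * mini-batch (batch/subdivisions) is what must fit in GPU memory. */
int main(void)
{
    int batch = 64, subdivisions = 8;

    if (batch % subdivisions != 0) {
        printf("subdivisions must divide batch evenly\n");
        return 1;
    }
    printf("mini-batch per GPU pass: %d\n", batch / subdivisions);
    printf("if you run out of GPU memory, increase subdivisions\n");
    return 0;
}
```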