
What's the meaning of 'subdivisions' and how to set its value? #224

Closed
yanchao12122 opened this issue Oct 2, 2017 · 17 comments

@yanchao12122

how to set the value of subdivisions?

[net]
batch=64
subdivisions=8


TheMikeyR commented Oct 11, 2017

It's how many mini-batches you split your batch into.
batch=64 -> 64 images are loaded for this "iteration".
subdivisions=8 -> the batch is split into 8 "mini-batches", so 64/8 = 8 images per mini-batch, and each mini-batch is sent to the GPU for processing.
This is repeated 8 times until the batch is completed, then a new iteration starts with 64 new images.

When batching you average over more images; the intent is not only to speed up training but also to make it generalise better.

If your GPU has enough memory, you can reduce subdivisions to load more images onto the GPU at the same time.

Hope this answers your question.

@yanchao12122
Author

@TheMikeyR Your explanation is great, thank you very much.


ZHANGKEON commented Jul 19, 2018

@pjreddie @TheMikeyR I am still confused about how the subdivision affects the update of the moving mean and moving variance in batch norm. My understanding is that the mean and variance are computed based on the mini-batch size (e.g. 8) instead of the total batch size (e.g. 64). Hence, the mini-batch is the actual batch size used in batch normalization. Is my understanding right?


keides2 commented Jul 27, 2018

When batch = 64 and subdivisions = 64, the mAP was 0.008%, but when we changed to batch = 64, subdivisions = 32, the mAP improved to 97%. The loss value is around 0.1 in either case.
I think the false-positive ratio decreases; could you tell me why the mAP improves?


ZHANGKEON commented Jul 30, 2018

By using a smaller subdivisions value, the mini-batch size used to compute the gradient increases. Hence, the gradient computed from a larger mini-batch gives better optimization. I guess using a smaller mini-batch size can end up in a local optimum and thus decrease accuracy.


keides2 commented Jul 30, 2018

@ZHANGKEON Can the mini-batch size be the same value as the epoch number?
For example, for simplicity, if there are 320 images and batch = 64, subdivisions = 64, then the epoch number is 5 (= 320/64), and all images are learned in only 5 iterations. If subdivisions is 8, the epoch number is 40 (= 320/8), and they can be learned 40 times. Here, the mini-batch size could be considered to be 5 or 40.


keides2 commented Jul 30, 2018

I made a mistake. The mini-batch size is batch / subdivisions.

batch = 64, subdivisions = 32
mini-batch size = 64/32 = 2
epoch = 320/2 = 160

batch = 64, subdivisions = 8
mini-batch size = 64/8 = 8
epoch = 320/8 = 40


gustavovaliati commented Aug 17, 2018

@keides2 , I hope I am not late. :)

As far as I know, the common concept of an epoch is different from what you have stated. Running your algorithm for 1 epoch means it has processed all 320 samples a single time. Two epochs mean you have processed the whole dataset two times. In your example, each epoch (processing 320 samples) is going to need 5 iterations in batches of 64.

dataset = 320 samples = 1 epoch
batch = 64
subdivisions = mini-batches = 32
1 epoch = 5 iterations = 320/64
1 iteration = 32 mini-batches = 1 batch
mini-batch size = 64/32 = 2 samples

@imnaren142

How can I train YOLO with 3 frames at a time?
Should the input be the first 3 frames (1, 2, 3), then the next 3 frames (2, 3, 4), and so on?


yuenbe commented Oct 4, 2018

Hi, I've read the above comments but I don't really understand the purpose of defining a batch value and a subdivision value.

I suppose the max_batches parameter refers to the maximum number of trained batches, so if I write something like

batch=64
subdivisions=2
max_batches=10000

wouldn't it be the same as writing?

batch=32
subdivisions=1
max_batches=20000

The mini-batch size would be same in both cases. Is there any difference in terms of computation speed and/or accuracy between the above configurations?

@humandotlearning


Actually, the batch is where the gradients are updated. For instance, if I have
batch=64
subdivisions=8
then the model updates its gradients after it has seen 64 images; subdivisions exists to reduce GPU RAM usage at any one time. So for the above case the mini-batch is 64/8 = 8,
that is, 8 images are loaded onto the GPU at a time.


keides2 commented Dec 6, 2018

@gustavovaliati , I was too late to notice. :)

I understood that one epoch is to process all 320 samples once.

1 iteration = 32 mini batches = 1 batch

Correctly,
mini-batch size = batch / mini-batches = 64/32
So,
1 iteration = mini-batches × mini-batch size = 32 × (64/32) = 64 images = 1 batch
Isn't it?

Thank you,


jveitchmichaelis commented Dec 28, 2018

@keides2 @yuenbe @gustavovaliati

I think there are some subtleties in the codebase that are causing confusion. Common terminology is:

Epoch = a pass through the full dataset
Batch training = update the weights after one epoch
Minibatch training = update the weights after a set number of samples have been seen

In darknet, the batch parameter is the minibatch size. The subdivision parameter controls how many samples will fit into RAM (minibatch/subdivision = samples loaded simultaneously per pass) - a mini-minibatch, if you like. A subdivision is not a minibatch in the conventional sense, because in darknet the weights are not updated every batch (here's where the naming is confusing) unless subdivision == 1.

If you look in the code, what gets printed during training is:

i = get_current_batch(net);
int imgs = net->batch * net->subdivisions * ngpus;
...
printf("%ld: %f, %f avg, %f rate, %lf seconds, %d images\n", get_current_batch(net), loss, avg_loss, get_current_rate(net), what_time_is_it_now()-time, i*imgs);

where the current batch is defined as:

size_t get_current_batch(network *net)
{
    size_t batch_num = (*net->seen)/(net->batch*net->subdivisions);
    return batch_num;
}

If we see:

26132: 5.084459, 4.797126 avg, 0.001000 rate, 2.847984 seconds, 1672448 images

That means the current batch is 26132 and my batch size is 64, which gives 1672448. However this is strange - this would suggest that either net->subdivisions = 1 which it isn't, or that the number of seen images is wrong (it isn't - see further down), so.. the batch size is wrong?

On the face of it, something seems messed up - surely the current batch is n_images/batch. Why are we multiplying by subdivisions?

Some clarifying information from pjreddie here: https://groups.google.com/forum/#!topic/darknet/qZp5P0I4iGA

Yes exactly, it's a confusing implementation detail but subdivisions is just used to divide up the work and limit memory usage. If you have a batch size of 10 and subdivision of 2 your GPU will only process 5 images at a time but will update weights after 2 of the "new" batch size. Thus batch_size is sort of overloaded in the code, eventually I'll probably edit it to make it more clear.

It turns out that net->batch gets modified when the config file is loaded - here:

net->batch /= subdivs;

Since the subdivision is a clean divisor of the batch size, this means that net->batch is always an integer >= 1. When you later refer to net->batch in the code, it's really a smaller net->batch/subdivision and so you need to compensate later on. Pretty confusing!

Let's look at the training code:

float train_network(network *net, data d)
{
    assert(d.X.rows % net->batch == 0);
    int batch = net->batch;
    int n = d.X.rows / batch;

    int i;
    float sum = 0;
    for(i = 0; i < n; ++i){
        get_next_batch(d, batch, i*batch, net->input, net->truth);
        float err = train_network_datum(net);
        sum += err;
    }
    return (float)sum/(n*batch);
}

float train_network_datum(network *net)
{
    *net->seen += net->batch;
    net->train = 1;
    forward_network(net);
    backward_network(net);
    float error = *net->cost;
    if(((*net->seen)/net->batch)%net->subdivisions == 0) update_network(net);
    return error;
}

So when we train, we grab batch images, which is the scaled batch size. We increment seen by this amount.

Going into the datum function, we do a forward pass and a backward pass (so far so good), calculate the loss and then update the weights if we've seen enough images. So we need to compute the number of images divided by the smaller batch size, and then check if that's a multiple of the number of subdivisions. In a roundabout way, that makes sense.

In summary:

  • Darknet overloads some terminology in the code, but the config parameters mostly make sense
  • Batch size is the number of images you process before updating the network weights (i.e. a minibatch in other work)
  • The batch size is divided by subdivisions when the network is loaded, this is how many images can fit into your GPU at once. In the code, net->batch really refers to that, not the minibatch size.
  • An iteration is one minibatch (i.e. whatever you set batch to in the config file)
  • Bear this in mind when reading through the source!


Juuustin commented Sep 3, 2020

If:
dataset = 320 samples = 1 epoch
batch = 64
subdivisions = mini-batches = 32
1 epoch = 5 iterations = 320/64
1 iteration = 32 mini-batches = 1 batch
mini-batch size = 64/32 = 2 samples

How can we know the number of epochs for the whole training?

Is it calculated based on max_batches? If max_batches or the iteration count is 10, does that mean 2 epochs?

@arnaud-nt2i

@jveitchmichaelis @TheMikeyR @humandotlearning
Thank you all for your effort in understanding Yolo's secret.
One question remains for me: if batch (in the cfg) is indeed the number of images you process before updating the network weights,
why does the mini-batch size (batch/subdivisions) have an impact on the mAP?
as stated here:
https://github.com/AlexeyAB/darknet/issues/4386

@Juuustin If you train with the -map flag you will see the iteration count.
Then: epochs = (nb of iterations) / (samples / batch) = (iterations × batch) / samples


vishnubanna commented Feb 18, 2021

A quick addition: how are the gradients aggregated across subdivisions? Are the gradients for each mini-batch saved and then averaged, summed, or combined by some other operation? A batch size of 64 with 8 subdivisions means 8 independent gradients. Along the same lines, are the batch-norm statistics also updated every 8 subdivisions, so that the weight and mean updates happen on the same cycle? And finally, for batch norm, given that you are using 8 subdivisions, you cannot guarantee that the net gradient is the same as with only 1 subdivision, since each mini-batch is forward-propagated against its own mean and variance; so the 64/1 batch must differ in overall characteristics from the 64/8 one, correct? @AlexeyAB

@pangondion-k-naibaho


@TheMikeyR what if the batch value is changed to more than 64?
