
Why use a meta-epoch training paradigm? #18

Open
DonaldRR opened this issue Dec 27, 2021 · 8 comments

DonaldRR commented Dec 27, 2021

Hi,
Thanks for your code, it has helped my research a lot.
One question came to mind while implementing cFlow in my own codebase: usually a model is trained on a batch of data, where the loss is reduced over the whole batch and then backpropagated. Here, however, I found that the loss is backpropagated for each sub-iteration, i.e. only part of the batch is sampled at a time.
This training paradigm confuses me somewhat. Does it work better than the normal way? Here are the reasons I surmise it might work (see the sketch after the list):

  • Randomly seeing only part of the batch makes the gradient updates more stochastic, which could produce a more robust model.
  • It saves GPU memory in each forward/backward pass when the overall batch size is high.
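
To make the comparison concrete, here is a minimal toy sketch of the two update schemes I have in mind; the linear layer, the shapes, and N below are made up for illustration and are not taken from this repo:

    import torch

    # Toy stand-ins only: a plain linear layer instead of the real flow decoder,
    # and made-up shapes. The point is the update pattern, not the model.
    toy_flow = torch.nn.Linear(272, 272)
    optimizer = torch.optim.Adam(toy_flow.parameters(), lr=2e-4)
    features = torch.randn(4096, 272)  # all fibers pooled from one batch of feature maps
    N = 256                            # fibers per sub-iteration

    # (a) the "normal" scheme: one loss over the whole pool, one backward/step
    optimizer.zero_grad()
    loss = toy_flow(features).pow(2).mean()  # dummy loss standing in for the flow NLL
    loss.backward()
    optimizer.step()

    # (b) the per-sub-iteration scheme I am asking about: shuffle the fibers,
    #     then backprop and step once per chunk of N fibers
    perm = torch.randperm(features.shape[0])
    for f in range(features.shape[0] // N):
        idx = perm[f * N:(f + 1) * N]
        optimizer.zero_grad()
        loss = toy_flow(features[idx]).pow(2).mean()
        loss.backward()
        optimizer.step()

In scheme (b) the optimizer takes many smaller, noisier steps per batch, which is what I meant by "more stochastic" above.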

Thanks!

@gudovskiy (Owner)

@DonaldRR could you point me to a line of code where "the loss is backpropagated for each sub-iteration -- only part of the batch is sampled"? I don't think I implemented anything like you described: meta/sub epochs are just there to add flexibility to the train/test phases.
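
Roughly, the split works like the sketch below; the names and counts are placeholders rather than the exact train.py code, but the idea is that training runs in blocks of sub-epochs and the test phase can be interleaved at meta-epoch granularity:

    # Placeholder sketch of the meta/sub-epoch split (names, prints, and counts
    # are illustrative only, not the exact train.py code).
    def train_one_sub_epoch():
        print("one pass over the training data")

    def evaluate():
        print("test phase: score the test set")

    META_EPOCHS, SUB_EPOCHS = 3, 2  # hypothetical values
    for meta_epoch in range(META_EPOCHS):
        for sub_epoch in range(SUB_EPOCHS):
            train_one_sub_epoch()
        evaluate()  # train/test phases interleave at meta-epoch boundaries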

@DonaldRR (Author)

> @DonaldRR could you point me to a line of code where "the loss is backpropagated for each sub-iteration -- only part of the batch is sampled"? I don't think I implemented anything like you described: meta/sub epochs are just there to add flexibility to the train/test phases.

Oops, I meant the fiber iteration. During each fiber iteration, N (= 256) features are sampled for loss computation and backpropagation.

See the lines starting at line 65 of train.py:

                for f in range(FIB):  # per-fiber processing: FIB sub-iterations over the batch
                    idx = torch.arange(f*N, (f+1)*N)   # f-th chunk of N fiber indices
                    c_p = c_r[perm[idx]]  # NxP condition vectors (random permutation)
                    e_p = e_r[perm[idx]]  # NxC encoder feature vectors (random permutation)
                    if 'cflow' in c.dec_arch:
                        z, log_jac_det = decoder(e_p, [c_p,])  # conditional flow decoder
                    else:
                        z, log_jac_det = decoder(e_p)
                    #
                    decoder_log_prob = get_logp(C, z, log_jac_det)
                    log_prob = decoder_log_prob / C  # likelihood per dim
                    loss = -log_theta(log_prob)
                    optimizer.zero_grad()
                    loss.mean().backward()  # backward pass for this chunk of N fibers only

@gudovskiy
Copy link
Owner

@DonaldRR I see. The number of feature vectors (fibers) in a feature map (tensor) can be large enough to exhaust GPU memory if they are all pushed through the flow model at once. So it is better to sample random feature vectors from a number of feature maps. Hence, your original post is on point :)
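
To put rough numbers on it (the shapes below are hypothetical, not tied to a specific backbone or config):

    # Rough fiber-count arithmetic with made-up shapes:
    B, C, H, W = 32, 272, 64, 64  # a batch of feature maps: B x C x H x W
    E = B * H * W                 # total number of fibers (feature vectors) in the pool
    N = 256                       # fibers processed per backward pass in the loop above
    FIB = E // N                  # number of sub-iterations needed to cover the pool
    print(E, FIB)                 # 131072 fibers -> 512 sub-batches of 256

Pushing all E fibers through the flow in one pass means holding activations for the whole pool, which is exactly what the per-fiber loop avoids.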

@PSZehnder

@gudovskiy Thank you for providing this excellent repo. Does training on sub-batches of fibers serve any purpose other than conserving memory? Could I remove this loop and process all the fibers in one shot if I have enough GPU memory to do so?

@gudovskiy (Owner)

@PSZehnder yes

Howie86 commented Jun 1, 2023

Hi,
Does the value of N need to be the same in train and test? When I change the value of N only in the test phase, I find that the inference results change slightly.
Thanks a lot.

@gudovskiy (Owner)

@Howie86 N should not change test results

Howie86 commented Jun 2, 2023

@gudovskiy Thanks for your reply, I understand.
