
about epoch #20

Open
huojinyong opened this issue Jun 30, 2022 · 5 comments

Comments

@huojinyong

When running run_training.py, the epoch count is 10000. Is that necessary? 10000 seems too large, and MVTec is a small dataset, so I worry about overfitting. Can I set a smaller number of epochs, for example 4000?
Thanks!

@Runinho
Owner

Runinho commented Jun 30, 2022

Note that the --epoch parameter takes the number of update steps and not the number of iterations over the dataset.
You can try to reduce the number of epochs and study its effect on the evaluation metrics.

Check out the following two graphs from the README; they show the evaluation metric over the update steps.
CutPaste (without scar or 3-way)

CutPaste(Scar)

During training, the model only sees images without defects. Because we want the model to learn the characteristics of a good sample, it might even be desirable to "overfit" to these images.

@ghost

ghost commented Jul 8, 2022

Thanks for your explanation.
So the results presented in the README are obtained at update step 10000, for every class in the MVTec dataset?

@ghost

ghost commented Jul 8, 2022

In the README, you mentioned: 'The --epoch parameter takes the number of update steps and not their definition of epochs.'
So taking the class 'screw' as an example: there are 320 normal images in the train set,
and with batch size 32 it takes 10 steps to iterate through all the images.
We set 10000 steps to get the results; does that mean that, in the conventional definition of an epoch, we run 1000 epochs to get the results for class 'screw'?
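The conversion described above can be sketched in a few lines (the image count and batch size are the values stated in this thread, not read from the repository):

```python
def conventional_epochs(update_steps, dataset_size, batch_size):
    """Convert optimizer update steps into conventional epochs,
    i.e. full passes over the dataset."""
    steps_per_epoch = dataset_size / batch_size
    return update_steps / steps_per_epoch

# MVTec 'screw': 320 training images, batch size 32
print(conventional_epochs(10_000, 320, 32))  # -> 1000.0
```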

@Runinho
Owner

Runinho commented Jul 9, 2022

> we set 10000 steps to get the results, does it mean in the conventional definition of epoch, we run 1000 epochs to get the results of class screw?

Correct.

If you look into the paper on arXiv, page 12 (Appendix 3), they specify how many update steps they use:

  1. Number of training epochs ∈{128, 192, 256, 320, 384}.⁵

⁵ Note that, unlike conventional definition for an epoch, we define 256 parameter update steps as one epoch.

Which makes me think they use $256*256 = 65,536$ steps.
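That guess follows directly from multiplying the paper's largest epoch count by its own step-per-epoch definition:

```python
# Paper (Appendix 3): one "epoch" is defined as 256 parameter update steps,
# and the largest epoch count in the sweep is 256.
steps_per_paper_epoch = 256
paper_epochs = 256
total_update_steps = steps_per_paper_epoch * paper_epochs
print(total_update_steps)  # -> 65536
```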

@ghost

ghost commented Jul 10, 2022

Thanks for your prompt reply!

Yes, I noticed that in the paper,
but I couldn't understand why they used this setup,
because it means that for smaller datasets they trained longer compared to the conventional setup.
Taking the class 'toothbrush' as an example: there are only 60 images in the train set,
and they trained for 65536 steps with batch size 32.
In the conventional definition of an epoch, that means they trained roughly 32768 epochs for 'toothbrush',
and about 6553 epochs for 'screw'.
Shouldn't we train longer for larger datasets?
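The ~32768 figure above presumably treats the 60-image set as two full 32-image batches per epoch; with the exact image counts stated in the thread, the numbers come out slightly higher. A quick check:

```python
def conventional_epochs(update_steps, dataset_size, batch_size):
    # Each update step consumes one batch, so total images seen is
    # update_steps * batch_size; dividing by the dataset size gives
    # full passes over the data.
    return update_steps * batch_size / dataset_size

# 'toothbrush': 60 train images; 'screw': 320 train images (per the thread)
print(round(conventional_epochs(65_536, 60, 32)))   # -> 34953
print(round(conventional_epochs(65_536, 320, 32)))  # -> 6554
```

Either way, the ratio of epochs between the two classes is inversely proportional to their dataset sizes, which is the asymmetry being questioned here.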
