Successful training done on custom dataset, but have some questions about the output #716
My training config is given below; the top-level sections are TRAIN, DATA, MVIT, AUG, MIXUP, SOLVER, MODEL, TEST, DATA_LOADER, NUM_GPUS: 1, and TENSORBOARD.
I have some questions regarding the outputs. Also, why is the LR always 0.000 during training?
Answered here: #664 (comment). Also, I would immediately say that a batch size of 1 can be one of the limitations for a vision task, which may explain your dissatisfaction with the results.
Thanks for the advice. I have now increased the batch size from 1 to 2 and restarted training for 100 epochs. I will post an update on the results here once training is done.
How big is your custom dataset? If your dataset is also limited, a relatively small ConvNet could achieve the task. I understand training a transformer model can be demanding resource-wise, and with a ConvNet you could try a batch size of 16, 32, or even 64 to achieve better performance, because a batch size of 2 is still on the far low side.
Can you also show the output where it says your LR is 0.000? Looking at your config, your LR is set to 0.00001, and this value may simply have been clipped by the display precision in the terminal output. It does not look concerning, since your accuracy is already improving over time; a truly zero LR would yield zero weight updates, which is not what you are seeing.
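To illustrate the clipping effect described above, here is a minimal sketch (the LR value is taken from the config; the format strings are hypothetical, standing in for however the training script formats its log line):

```python
# A small learning rate prints as 0.000 when formatted with
# three decimal places, even though it is nonzero.
lr = 0.00001  # the LR from the config

print(f"lr: {lr:.3f}")  # prints "lr: 0.000" -- looks like zero
print(f"lr: {lr:.1e}")  # prints "lr: 1.0e-05" -- the real value
```

So a "0.000" in the logs is consistent with a small but nonzero learning rate.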
I have a synthetic dataset consisting of 15 classes of human activities, with around 40 videos per class. My task is to train a vision transformer model for human activity recognition. After training, I will test it on a real-world dataset with 5 videos per class. While I understand that using a ConvNet might be more resource-efficient, my task is domain-specific and requires a vision transformer model, regardless of the initial results. Therefore, I need to focus on improving the performance of the vision transformer rather than switching to a CNN. Any suggestions for optimizing the vision transformer to achieve better results would be greatly appreciated.
The simplest things you can start with are:
Update on my training: after 105 epochs, these are the results I got on Kinetics/MVITv2_B_32x3: train_net.py: 759: training done: _p50.93_f225.17 _t11.09_m20.68 _a19.17 Top5 Acc: 60.00 MEM: 20.68 f: 225.1698. @alpargun, due to resource limitations I can't increase my batch size beyond 2. Should I continue this training for more epochs, or should I try other models like Kinetics/MVITv2_S_16x4 or Kinetics/MVITv2_L_40x3_test? Also, if I want to try the SSv2 model (https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md#ssv2), which is pretrained on K400, what changes do I need to make in my current dataset settings? Any suggestions to achieve better results would be greatly appreciated.
@AbrarKhan009 You can start with the smallest version, MVITv2_S_16x4, as a baseline, with a batch size as high as your hardware allows, and continue training until either the training error converges or the validation error starts increasing (overtraining). So please do not set 50 as the max epochs; keep it at 200, as in the original config. Do not worry, MVITv2_S can still handle your task of classifying 15 actions, since it reached 81% top-1 accuracy on K400 with 400 different actions. This model also uses only 16 input frames, compared to your old model's 32 frames, so I hope you can use a higher batch size and get faster training.

After that training finishes, feel free to restore your last MVITv2_B_32x3 checkpoint to continue your old run directly instead of starting from scratch, to save time. That way you can compare the results for both transformer models.

Regarding SSv2 with the model pretrained on K400: this can be a good way to test whether the problem is due to your custom dataset. However, it will require you to download SSv2 and prepare the folder structure according to the implementation in slowfast/datasets/ssv2.py. Furthermore, you need to modify the config file so that the number of output classes (NUM_CLASSES) matches SSv2's number of classes (just like what you did for your custom dataset).
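The changes described above could be sketched as a config fragment like the following; this assumes the standard PySlowFast config keys (TRAIN.BATCH_SIZE, TRAIN.CHECKPOINT_FILE_PATH, SOLVER.MAX_EPOCH, MODEL.NUM_CLASSES), and the batch size and checkpoint path are placeholders, not recommendations:

```yaml
TRAIN:
  ENABLE: True
  BATCH_SIZE: 8  # placeholder: as high as your GPU memory allows
  # placeholder path: only set this when resuming a previous run
  CHECKPOINT_FILE_PATH: path/to/checkpoints/checkpoint_epoch_00105.pyth
SOLVER:
  MAX_EPOCH: 200  # keep the original schedule; stop early only if validation error rises
MODEL:
  NUM_CLASSES: 15  # match your dataset (change this again if you switch to SSv2)
```

The exact keys available depend on your PySlowFast version; compare against slowfast/config/defaults.py in your checkout before relying on any of them.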
Thanks for the suggestion. I will follow your advice. I tried training MVITv2_S_16x4 with batch sizes of 16 and 8, but both gave me a CUDA out-of-memory error. Now I'm using a batch size of 4, and it's running smoothly.
@alpargun Hi, good morning. My training of MVITv2_S_16x4 with a batch size of 4 for 200 epochs is done, and these are the results I got. Should I continue this training for more epochs? These results are good compared to the Base model. Thanks for your advice.
Hello everyone, I hope you are all doing well. I have successfully completed SlowFast training with MViTv2 on my custom dataset. Details of my training are given below.
I used MVITv2_B_32x3.yaml and followed the structure below.
SlowFast/
├── configs/
│ └── MyData/
│ └── MVITv2_B_32x3.yaml
├── data/
│ └── MyData/
│ ├── ClassA/
│ │ └── ins.mp4
│ ├── ClassB/
│ │ └── kep.mp4
│ ├── ClassC/
│   │   └── tak.mp4
│ ├── train.csv
│ ├── test.csv
│ ├── val.csv
│ └── classids.json
├── slowfast/
│ └── datasets/
│       ├── __init__.py
│ ├── mydata.py
│ └── ...
└── ...
All this fine-tuning guidance for your custom dataset is already explained by @AlexanderMelde [here](#149); thanks to him for his guidance.
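For reference, a sketch of what the split files (train.csv, val.csv, test.csv) typically contain under that layout, assuming the common PySlowFast convention of one `video_path label` pair per line (the separator is configurable via DATA.PATH_LABEL_SEPARATOR); the paths and label IDs below are placeholders:

```
data/MyData/ClassA/ins.mp4 0
data/MyData/ClassB/kep.mp4 1
data/MyData/ClassC/tak.mp4 2
```

The label IDs should be consistent with the mapping in classids.json; check your dataset loader (mydata.py) for the exact separator and path handling it expects.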
My question is: I am getting this output at the end,
train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 _a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698
Can somebody explain this output to me: _p50.93_f225.17, _t12.31_m10.69, MEM: 10.69, and f: 225.1698?
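While waiting for an answer on what each field means, the line can at least be split mechanically. A minimal sketch, assuming the fields follow the `_<letter><number>` pattern visible in the log; the single-letter names (p, f, t, m, a) are taken verbatim from the line, not interpreted:

```python
import re

# The summary line as printed by train_net.py (copied from the output above).
line = ("train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 "
        "_a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698")

# Pull out the underscore-prefixed fields and the two labeled values.
fields = dict(re.findall(r"_([pftma])([\d.]+)", line))
top5 = float(re.search(r"Top5 Acc: ([\d.]+)", line).group(1))
mem = float(re.search(r"MEM: ([\d.]+)", line).group(1))

print(fields)  # {'p': '50.93', 'f': '225.17', 't': '12.31', 'm': '10.69', 'a': '25.00'}
print(top5, mem)  # 66.67 10.69
```

Confirming what each letter stands for would require checking the format string in train_net.py of the SlowFast version you are running.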