Successful training done on custom dataset, but have some questions about the output #716
My training config is given below; the top-level sections are TRAIN, DATA, MVIT, AUG, MIXUP, SOLVER, MODEL, TEST, DATA_LOADER, NUM_GPUS: 1, and TENSORBOARD.
I have some questions regarding the outputs. Also, why is the LR always 0.000 during training?
Answered here: #664 (comment). Also, I would immediately say that a batch size of 1 can be one of the limitations for a vision task, which may explain your dissatisfaction with the results.
Thanks for the advice. I have now increased the batch size from 1 to 2 and restarted training for 100 epochs. I will post an update on the results here once training is done.
How big is your custom dataset? If your dataset is also limited, a relatively small ConvNet could achieve the task. I understand training a transformer model can be demanding resource-wise, and with a ConvNet you could try a batch size of 16, 32, or even 64 to achieve better performance, because a batch size of 2 is still on the far low side.
Can you also show the output where it says your LR is 0.000? Looking at your config, your LR is set to 0.00001, and this value may simply have been clipped by the display precision in the terminal output. It does not look concerning, since your accuracy is already improving over time; a truly zero LR would yield zero weight updates, which is not what you are seeing.
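To illustrate the clipping effect described above, here is a minimal sketch (the LR value is taken from the config; the format strings are hypothetical, standing in for however the training script formats its log line):

```python
# A small learning rate prints as 0.000 when formatted with
# three decimal places, even though it is nonzero.
lr = 0.00001  # the LR from the config

print(f"lr: {lr:.3f}")  # prints "lr: 0.000" -- looks like zero
print(f"lr: {lr:.1e}")  # prints "lr: 1.0e-05" -- the real value
```

So a "0.000" in the logs is consistent with a small but nonzero learning rate.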
I have a synthetic dataset consisting of 15 classes of human activities, with around 40 videos per class. My task is to train a vision transformer model for human activity recognition. After training, I will test it on a real-world dataset with 5 videos per class. While I understand that using a ConvNet might be more resource-efficient, my task is domain-specific and requires a vision transformer model, regardless of the initial results. Therefore, I need to focus on improving the performance of the vision transformer rather than switching to a CNN. Any suggestions for optimizing the vision transformer to achieve better results would be greatly appreciated.
The simplest things you can start with are:
Update on my training: after 105 epochs, these are the results I got on Kinetics/MVITv2_B_32x3: train_net.py: 759: training done: _p50.93_f225.17 _t11.09_m20.68 _a19.17 Top5 Acc: 60.00 MEM: 20.68 f: 225.1698. @alpargun, due to resource limitations I can't increase my batch size beyond 2. Should I continue this training for more epochs, or should I try other models like Kinetics/MVITv2_S_16x4 or Kinetics/MVITv2_L_40x3_test? Also, if I want to try the SSv2 model (https://github.com/facebookresearch/SlowFast/blob/main/projects/mvitv2/README.md#ssv2), which is pretrained on K400, what changes do I need to make in my current dataset settings? Any suggestions to achieve better results would be greatly appreciated.
@AbrarKhan009 You can start with the smallest version, MVITv2_S_16x4, as a baseline, with a batch size as high as your hardware allows, and continue training until either the training error converges or the validation error starts increasing (overtraining). So please do not set 50 as the max epochs; keep it at 200, as in the original config. Do not worry, MVITv2_S can still handle your task of classifying 15 actions, since it reached 81% top-1 accuracy on K400 with 400 different actions. This model also uses only 16 input frames, compared to your old model's 32 frames, so I hope you can use a higher batch size and get faster training.

After that training finishes, feel free to restore your last MVITv2_B_32x3 checkpoint to continue your old run directly instead of starting from scratch, to save time. That way you can compare the results for both transformer models.

Regarding SSv2 with the model pretrained on K400: this can be a good way to test whether the problem is due to your custom dataset. However, it will require you to download SSv2 and prepare the folder structure according to the implementation in slowfast/datasets/ssv2.py. Furthermore, you need to modify the config file so that the number of output classes (NUM_CLASSES) matches SSv2's number of classes (just like what you did for your custom dataset).
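The changes described above could be sketched as a config fragment like the following; this assumes the standard PySlowFast config keys (TRAIN.BATCH_SIZE, TRAIN.CHECKPOINT_FILE_PATH, SOLVER.MAX_EPOCH, MODEL.NUM_CLASSES), and the batch size and checkpoint path are placeholders, not recommendations:

```yaml
TRAIN:
  ENABLE: True
  BATCH_SIZE: 8  # placeholder: as high as your GPU memory allows
  # placeholder path: only set this when resuming a previous run
  CHECKPOINT_FILE_PATH: path/to/checkpoints/checkpoint_epoch_00105.pyth
SOLVER:
  MAX_EPOCH: 200  # keep the original schedule; stop early only if validation error rises
MODEL:
  NUM_CLASSES: 15  # match your dataset (change this again if you switch to SSv2)
```

The exact keys available depend on your PySlowFast version; compare against slowfast/config/defaults.py in your checkout before relying on any of them.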
Thanks for the suggestion. I will follow your advice. I tried training MVITv2_S_16x4 with batch sizes of 16 and 8, but both gave me a CUDA out-of-memory error. Now I'm using a batch size of 4, and it's running smoothly.
@alpargun Hi, good morning. My training of MVITv2_S_16x4 with a batch size of 4 for 200 epochs is done, and these are the results I got. Should I continue this training for more epochs? These results are good compared to the Base model. Thanks for your advice.
Hello everyone, I hope you are all doing well. I have successfully completed SlowFast training with MViTv2 on my custom dataset. Details of my training are given below.
I used MVITv2_B_32x3.yaml and followed the structure below.
SlowFast/
├── configs/
│ └── MyData/
│ └── MVITv2_B_32x3.yaml
├── data/
│ └── MyData/
│ ├── ClassA/
│ │ └── ins.mp4
│ ├── ClassB/
│ │ └── kep.mp4
│ ├── ClassC/
│   │   └── tak.mp4
│ ├── train.csv
│ ├── test.csv
│ ├── val.csv
│ └── classids.json
├── slowfast/
│ └── datasets/
│       ├── __init__.py
│ ├── mydata.py
│ └── ...
└── ...
All this fine-tuning guidance for your custom dataset is already explained by @AlexanderMelde [here](#149); thanks to him for his guidance.
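For reference, a sketch of what the split files (train.csv, val.csv, test.csv) typically contain under that layout, assuming the common PySlowFast convention of one `video_path label` pair per line (the separator is configurable via DATA.PATH_LABEL_SEPARATOR); the paths and label IDs below are placeholders:

```
data/MyData/ClassA/ins.mp4 0
data/MyData/ClassB/kep.mp4 1
data/MyData/ClassC/tak.mp4 2
```

The label IDs should be consistent with the mapping in classids.json; check your dataset loader (mydata.py) for the exact separator and path handling it expects.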
My question is: I am getting this output at the end,
train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 _a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698
Can somebody explain this output to me: _p50.93_f225.17, _t12.31_m10.69, MEM: 10.69, and f: 225.1698?
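While waiting for an answer on what each field means, the line can at least be split mechanically. A minimal sketch, assuming the fields follow the `_<letter><number>` pattern visible in the log; the single-letter names (p, f, t, m, a) are taken verbatim from the line, not interpreted:

```python
import re

# The summary line as printed by train_net.py (copied from the output above).
line = ("train_net.py: 759: training done: _p50.93_f225.17 _t12.31_m10.69 "
        "_a25.00 Top5 Acc: 66.67 MEM: 10.69 f: 225.1698")

# Pull out the underscore-prefixed fields and the two labeled values.
fields = dict(re.findall(r"_([pftma])([\d.]+)", line))
top5 = float(re.search(r"Top5 Acc: ([\d.]+)", line).group(1))
mem = float(re.search(r"MEM: ([\d.]+)", line).group(1))

print(fields)  # {'p': '50.93', 'f': '225.17', 't': '12.31', 'm': '10.69', 'a': '25.00'}
print(top5, mem)  # 66.67 10.69
```

Confirming what each letter stands for would require checking the format string in train_net.py of the SlowFast version you are running.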