
Question about reproducing the SlowFast+NL verb model #24

Open
wkw1259 opened this issue Mar 20, 2023 · 4 comments

@wkw1259

wkw1259 commented Mar 20, 2023

Thanks for releasing the relevant code! I'm having trouble reproducing the model, though, and would like to ask for advice.

I'm trying to reproduce the SlowFast+NL verb model myself, but my results differ substantially from the reported ones: for example, the reported validation recall@5 is 23.38, but I can only get 16.46, while Acc@1 rises from 46.79 to 50.66.

I ran the experiment on 4 V100 GPUs with CUDA 10.1.

To reproduce the SlowFast+NL verb model, I copied the command directly from the provided log file:
python main_dist.py vbonly_sfast_kpret_10Feb20 --train.bs=8 --train.bsv=8 --train.nw=8 --train.nwv=8 --task_type=vb --mdl.mdl_name=sf_base --mdl.sf_mdl_name=slow_fast_nl_r50_8x8 --debug_mode=False --train.save_mdl_epochs=True --train.resume=False --mdl.load_sf_pretrained=True

Here are the detailed evaluation results:

Per_Ev_Top_1:0.5066365007541478
Per_Ev_Top_2:0.6547511312217195
Per_Ev_Top_3:0.7334841628959275
Per_Ev_Top_4:0.7864253393665158
Per_Ev_Top_5:0.8250377073906485
Per_Vid_Top_1:0.06862745098039216
Per_Vid_Top_2:0.19155354449472098
Per_Vid_Top_3:0.2933634992458522
Per_Vid_Top_4:0.3680241327300151
Per_Vid_Top_5:0.4464555052790347
acc:0.8250377073906485
recall_macro_1_th_0:0.06501041305213098
num_vbs_thresh_0:566
recall_macro_1_th_1:0.09153207409827398
num_vbs_thresh_1:402
recall_macro_1_th_2:0.11449808765774806
num_vbs_thresh_2:317
recall_macro_1_th_3:0.13492897318775515
num_vbs_thresh_3:269
recall_macro_1_th_4:0.14998303217977743
num_vbs_thresh_4:242
recall_macro_1_th_5:0.164660063245441
num_vbs_thresh_5:218
recall_macro_1_th_6:0.17858653625624946
num_vbs_thresh_6:201
recall_macro_1_th_7:0.18516635828010752
num_vbs_thresh_7:190
recall_macro_1_th_8:0.1947589337401135
num_vbs_thresh_8:180
recall_macro_1_th_9:0.20317149396575185
num_vbs_thresh_9:172

And here is the training log:

epochs trn_loss val_loss val_Per_Ev_Top_1 val_Per_Ev_Top_5 val_recall_macro_1_th_9
1 4.9298 0.0000 0.5048 0.8278 0.1618
2 4.4623 0.0000 0.5066 0.8250 0.2032
3 3.9171 0.0000 0.4916 0.7916 0.2339
4 3.5478 0.0000 0.4946 0.7938 0.2299
5 3.3074 0.0000 0.4599 0.7475 0.2257
6 3.1090 0.0000 0.4477 0.7275 0.2249
7 2.9608 0.0000 0.4181 0.6944 0.2286
8 2.7729 0.0000 0.3733 0.6617 0.2271
9 2.5812 0.0000 0.3736 0.6707 0.2248
10 2.5237 0.0000 0.3202 0.6130 0.2099
epochs done 9. Exited due to exception False. Total time taken 46532.6358
epochs val_loss val_Per_Ev_Top_1 val_Per_Ev_Top_5 val_recall_macro_1_th_9
2 0.0000 0.5066 0.8250 0.2032

I did not change any other source code from the GitHub repository. Is there any possible reason for this discrepancy? I don't think the dataset was downloaded incorrectly, because the model works fine under the train/valid split file restrictions.

@TheShadow29
Owner

@wkw1259 Which model do you use? The best model is in epoch 3, which gets 23.39 on recall@5. Could you confirm if this is the issue?

@Versocial

> @wkw1259 Which model do you use? The best model is in epoch 3, which gets 23.39 on recall@5. Could you confirm if this is the issue?

In the article there is:
"we only consider the set of verbs which appears at least twice within the ground-truth annotations (each event in val and test sets has 10 verb annotations)",
so I thought val_recall_macro_1_th_2 is the recall@5 instead of val_recall_macro_1_th_9.
Am I wrong?
Thanks

@TheShadow29
Owner

@Versocial

> we only consider the set of verbs which appears at least twice within the ground-truth annotations (each event in val and test sets has 10 verb annotations)

This refers to the recall computation. That is, if a particular event has 10 verb annotations, you would only consider those appearing at least twice as the ground truth and discard the others.
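That per-event filtering rule can be sketched as follows (a minimal illustration with hypothetical names; this is not the repository's actual evaluation code):

```python
from collections import Counter

def filter_event_ground_truth(annotations, min_count=2):
    """Keep only the verbs that appear at least min_count times among
    one event's annotations (e.g. the 10 verb labels per event).

    annotations: list of verb labels for a single event
    Returns the set of verbs retained as ground truth.
    """
    counts = Counter(annotations)
    return {vb for vb, c in counts.items() if c >= min_count}

# A verb named by 3 of the annotators is kept; one named only once is dropped.
kept = filter_event_ground_truth(["run", "run", "walk", "jump", "run", "walk"])
```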

The macro_th_9 threshold refers to the overall frequency in the dataset: if a verb doesn't appear at least 9 times in the entire dataset, it isn't used for the macro computation, because such rare classes would be very noisy.
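The thresholded macro recall described above could be computed roughly like this (a hedged sketch with made-up function and variable names, not the repository's code; it assumes per-event ground-truth verb sets have already been extracted):

```python
from collections import defaultdict

def macro_recall_at_k(predictions, ground_truths, k, freq_threshold):
    """Macro recall@k over verb classes, averaging only over classes whose
    total ground-truth count in the dataset meets freq_threshold.

    predictions: list of ranked verb lists, one per event
    ground_truths: list of ground-truth verb sets, one per event
    Returns (macro recall, number of verb classes kept).
    """
    # Count how often each verb occurs as ground truth across the dataset.
    freq = defaultdict(int)
    for gts in ground_truths:
        for vb in gts:
            freq[vb] += 1

    # Per-class hit/total counts for recall@k.
    hits = defaultdict(int)
    totals = defaultdict(int)
    for topk, gts in zip((p[:k] for p in predictions), ground_truths):
        for vb in gts:
            totals[vb] += 1
            if vb in topk:
                hits[vb] += 1

    # Discard verb classes rarer than the frequency threshold.
    kept = [vb for vb in totals if freq[vb] >= freq_threshold]
    if not kept:
        return 0.0, 0
    recall = sum(hits[vb] / totals[vb] for vb in kept) / len(kept)
    return recall, len(kept)
```

Raising `freq_threshold` shrinks the set of verb classes averaged over (as the `num_vbs_thresh_*` lines in the log show), which is why recall_macro values change with the threshold.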

@Versocial

That's clear, thanks for the detailed explanation.
