Retrained model does not get the same SPL on val unseen as reported in paper #14

Open
HubHop opened this issue Nov 20, 2019 · 4 comments


HubHop commented Nov 20, 2019

Hi,

We are trying to retrain the EnvDrop model based on this repo, but our results do not match those reported in the paper. We have tried several PyTorch versions; our best result with PyTorch 0.4.1 is 46% SPL on val unseen, below the 48% reported in the paper. For detailed results, please see the attachment below.
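For reference, SPL here is the standard Success weighted by Path Length metric (Anderson et al., 2018):

\mathrm{SPL} = \frac{1}{N}\sum_{i=1}^{N} S_i \,\frac{\ell_i}{\max(p_i,\ \ell_i)}

where S_i indicates success on episode i, \ell_i is the shortest-path distance from start to goal, and p_i is the length of the path the agent actually took.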

Have we missed something important? Or could you specify your working environment?

Our retrained model:
retrained_envdrop_results.xlsx

Results in paper:
[Screenshot: results table reported in the paper]

airsplay (Owner) commented Dec 6, 2019

Sorry for the late reply.

The original code, with reproducible results, is provided in another issue: #11.

Given the results in the xlsx, the 2% drop in SPL (to 46%) is most likely caused by the drop in SR (since each episode's SPL term is at most its success indicator, SPL is upper-bounded by SR), which is nonetheless still much higher than the previous SotA (38%). The cause I have found so far is a set of implementation differences introduced inside the speaker when I cleaned up the code (the original, reproducible code is provided in the other issue); the beam-search results, which rely only on the speaker's inference, also changed, which points to the speaker. I have not yet located which difference is responsible: none of them should affect the training/inference process, yet the predictions do change. Please use the original code until I track it down.

Best,
Hao


HubHop commented Dec 11, 2019

Thanks for your reply!

xiran2018 commented

Please help me! After training the model, I evaluated it in the test environment, but the success rate is very low (results below). I don't understand why it is so low. Is there something wrong with how I am testing?

[Screenshot: success-rate results]

The test script is:
name=agent
flag="--train validlistener --featdropout 0.3 --angleFeatSize 128
--feedback argmax
--mlWeight 0.2
--subout max --dropout 0.5 --optim rms --lr 1e-4 --iters 80000 --submit"
CUDA_VISIBLE_DEVICES=$1 python r2r_src/train.py $flag --name $name


HubHop commented Feb 22, 2020

Hi @jingquanliang, I didn't see your result. Have you fixed it? If not, you can try this script:

# Evaluate a trained agent loaded from its best val-unseen snapshot.
name=agent_bt
flag="--attn soft --train validlistener
      --load snap/agent_bt/state_dict/best_val_unseen
      --angleFeatSize 128
      --submit
      --featdropout 0.4
      --subout max --maxAction 35"

CUDA_VISIBLE_DEVICES=$1 python r2r_src/train.py $flag --name $name
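Note the --load line: it points the evaluator at a trained snapshot. Your script above never passes --load, so validlistener is presumably evaluating untrained weights, which would explain the near-zero success rate. The exact path depends on the --name you trained with (e.g. snap/agent/state_dict/best_val_unseen, a hypothetical path following the pattern above).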
