Potential reproducibility issue with PyTorch >1.0.0 #9

chihyaoma · 2019-11-04T05:47:25Z

Hi all,

Thank you so much for your interest in the project and the released code.

We made sure that the code could robustly reproduce the numbers reported in the paper when we released it. Since then, I have confirmed with several people who tried the code that they could also reproduce the results.

However, since the second week of September, I have received a few emails from people reporting trouble reproducing the results for either the Self-Monitoring agent or the Regretful agent.

I am opening this issue now so that anyone interested in the proposed method can run the code and continue their research with caution. For now, I suspect the cause is a version difference in PyTorch (or in other Python/CUDA libraries that I am using) that leads to unexpected behavior.

Given the current conference deadlines, I expect to be able to start investigating this issue during the winter break (end of December) at the earliest.

Below are the experimental setups that I used for developing and releasing the code. I hope they help in reproducing the results.

Code development:
PyTorch 0.4.1
CUDA: 9.2.148
Cudnn: 7104

I also tested the code on the following setup and made sure it reproduced the results at release time:
PyTorch 1.0.0
CUDA: 10.0.130
Cudnn: 7401
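
For anyone comparing their own environment against the two setups above, here is a minimal sketch. It is not part of the released code: the names `TESTED_SETUPS`, `collect_env`, `mismatches`, and `matches_tested_setup` are made up for illustration. It relies only on `torch.__version__`, `torch.version.cuda`, and `torch.backends.cudnn.version()`, and assumes the CUDA and cuDNN values above are written the way those calls report them.

```python
# Reference setups copied from the lists above; the dict layout is illustrative.
TESTED_SETUPS = [
    {"pytorch": "0.4.1", "cuda": "9.2.148", "cudnn": 7104},
    {"pytorch": "1.0.0", "cuda": "10.0.130", "cudnn": 7401},
]


def collect_env():
    """Return local PyTorch/CUDA/cuDNN versions, or None if PyTorch is not installed."""
    try:
        import torch
    except ImportError:
        return None
    return {
        "pytorch": str(torch.__version__),
        "cuda": torch.version.cuda,               # None on CPU-only builds
        "cudnn": torch.backends.cudnn.version(),  # integer such as 7401, or None
    }


def mismatches(env, reference):
    """Map each differing key to a (local value, reference value) pair."""
    return {
        key: (env.get(key), expected)
        for key, expected in reference.items()
        if env.get(key) != expected
    }


def matches_tested_setup(env):
    """True if env is identical to one of the setups listed in this issue."""
    return any(not mismatches(env, ref) for ref in TESTED_SETUPS)


if __name__ == "__main__":
    env = collect_env()
    if env is None:
        print("PyTorch is not installed in this environment.")
    else:
        print("Local environment:", env)
        for ref in TESTED_SETUPS:
            print("Differences vs", ref["pytorch"], "setup:", mismatches(env, ref))
```

A mismatch does not by itself mean the results will differ; it only flags that the environment is outside the two configurations that were checked when the code was released.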

liuhualin333 · 2019-11-04T10:59:01Z

Hi Chih-Yao,

Thank you for opening this issue. I sent you an email a few hours ago. I will post my experimental setup and results here so that you can debug later.

I am using:

PyTorch: 1.2.0a0+e6a7071
CUDA: 10.1
Cudnn: 7602

with this command:
CUDA_VISIBLE_DEVICES=0 python tasks/R2R-pano/main.py \
    --exp_name 'regretful-agent-data|real' \
    --batch_size 64 \
    --img_fc_dim 1024 \
    --rnn_hidden_size 512 \
    --eval_every_epochs 5 \
    --arch 'regretful' \
    --progress_marker 1

And here is a screenshot of my TensorBoard:

Looking forward to hearing from you soon.

convnets · 2020-04-05T15:32:33Z

@chihyaoma @liuhualin333 Have you been able to reproduce the results with PyTorch 1.2.0 and CUDA 10.1? I have the same problem here. My configuration is as follows.

# Name                    Version                   Build                              Channel
python                    3.8.2                     hcf32534_0
pytorch                   1.4.0                     py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
numpy                     1.18.1                    py38h4f9e942_0
networkx                  2.4                       pypi_0                             pypi
torchvision               0.5.0                     py38_cu101                         pytorch
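
Version strings in this thread come in different shapes (`1.4.0` from conda, `1.2.0a0+e6a7071` from a source build), so a quick way to check whether a given PyTorch is newer than the last release the author reports testing might look like the sketch below. The helper names and the `LAST_TESTED` constant are hypothetical, and pre-release or local suffixes such as `a0+e6a7071` are simply ignored.

```python
import re

# Newest PyTorch release the author reports testing in this issue.
LAST_TESTED = (1, 0, 0)


def release_tuple(version):
    """Extract (major, minor, patch) from a version string, ignoring any suffix."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        raise ValueError("unrecognized version string: %r" % version)
    # A missing patch component is treated as 0.
    return tuple(int(part) if part is not None else 0 for part in match.groups())


def is_newer_than_tested(version):
    """True if the release number is strictly greater than LAST_TESTED."""
    return release_tuple(version) > LAST_TESTED


if __name__ == "__main__":
    for v in ("0.4.1", "1.0.0", "1.2.0a0+e6a7071", "1.4.0"):
        print(v, "->", "newer than tested" if is_newer_than_tested(v) else "tested range")
```

This only reflects what has been reported here: 0.4.1 and 1.0.0 were verified by the author, while the failing reports so far involve 1.2.0 and 1.4.0. Which release first changes the behavior is not established in this thread.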

chihyaoma added the "help wanted" label on Nov 4, 2019

chihyaoma changed the title from ~~Potential reproducibility issue~~ to Potential reproducibility issue with PyTorch >1.0.0 on Nov 12, 2020
