Potential reproducibility issue with PyTorch >1.0.0 #9

Open
chihyaoma opened this issue Nov 4, 2019 · 2 comments
Labels
help wanted Extra attention is needed

Comments

@chihyaoma
Owner

Hi all,

Thank you so much for your interest in the project and the released code.

When we released the code, we made sure it robustly reproduced the numbers reported in the paper. Since then, several people who tried the code have confirmed that they can reproduce the results as well.

However, since the second week of September, I have received a few emails from people who had trouble reproducing the results for either the Self-Monitoring agent or the Regretful agent.

I decided to create this issue now so that people who are interested in the proposed method can run the code and continue their research with caution. Currently, I suspect the cause is a version difference in PyTorch (or in one of the other Python/CUDA libraries I use) that leads to unexpected behavior.

Given the current conference deadlines, I expect to be able to start investigating this issue at the winter break (end of December) at the earliest.


Below are the experimental setups that I used for developing and releasing the code. I hope this helps others reproduce the results.

Code development:
PyTorch: 0.4.1
CUDA: 9.2.148
cuDNN: 7104

When releasing the code, I also tested it on the following setup and confirmed that it reproduces the results:
PyTorch: 1.0.0
CUDA: 10.0.130
cuDNN: 7401
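
For anyone comparing their own environment against the two setups above, a minimal sketch along these lines may help. It assumes the cuDNN numbers in this thread (7104, 7401) are the integer codes returned by `torch.backends.cudnn.version()`, which for cuDNN 7.x encode major*1000 + minor*100 + patch. The helper names `decode_cudnn_version` and `environment_report` are hypothetical and not part of the released code.

```python
def decode_cudnn_version(code):
    """Split an integer such as 7401 into (major, minor, patch).

    Assumes the cuDNN 7.x encoding: major*1000 + minor*100 + patch.
    """
    major, rest = divmod(int(code), 1000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


def environment_report():
    """Collect the library versions relevant to this issue.

    PyTorch is imported lazily so the helper still runs without it.
    """
    try:
        import torch
    except ImportError:
        return {"pytorch": None, "cuda": None, "cudnn": None}
    cudnn = torch.backends.cudnn.version()  # None when cuDNN is unavailable
    return {
        "pytorch": torch.__version__,
        "cuda": torch.version.cuda,  # None for CPU-only builds
        "cudnn": decode_cudnn_version(cudnn) if cudnn is not None else None,
    }


if __name__ == "__main__":
    print(environment_report())
```

Under that assumption, 7104 decodes to cuDNN 7.1.4 and 7401 to cuDNN 7.4.1, which makes it easier to compare against package build strings that spell the version out.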

@chihyaoma added the "help wanted" label on Nov 4, 2019
@liuhualin333

liuhualin333 commented Nov 4, 2019

Hi Chih-Yao,

Thank you for opening this issue. I wrote you an email a few hours ago. I will post my experimental setup and results here so that you can debug later.

I am using:

PyTorch: 1.2.0a0+e6a7071
CUDA: 10.1
Cudnn: 7602

with this command:
CUDA_VISIBLE_DEVICES=0 python tasks/R2R-pano/main.py \
    --exp_name 'regretful-agent-data|real' \
    --batch_size 64 \
    --img_fc_dim 1024 \
    --rnn_hidden_size 512 \
    --eval_every_epochs 5 \
    --arch 'regretful' \
    --progress_marker 1

And here are the screenshots of my TensorBoard:
[Image: Screenshot 2019-11-04 at 6 56 54 pm 2]
[Image: Screenshot 2019-11-04 at 6 57 40 pm 2]

Looking forward to hearing from you soon.

@convnets

convnets commented Apr 5, 2020

@chihyaoma @liuhualin333 Have you been able to reproduce the results with PyTorch 1.2.0 and CUDA 10.1? I have the same problem. My configuration is as follows.

# Name                    Version                   Build  Channel
python                    3.8.2                hcf32534_0
pytorch                   1.4.0           py3.8_cuda10.1.243_cudnn7.6.3_0    pytorch
numpy                     1.18.1           py38h4f9e942_0
networkx                  2.4                      pypi_0    pypi
torchvision               0.5.0                py38_cu101    pytorch

@chihyaoma changed the title from "Potential reproducibility issue" to "Potential reproducibility issue with PyTorch >1.0.0" on Nov 12, 2020
3 participants