Frozen pretrained Faster RCNN/RFCN networks from model zoo yielding different outputs on different GPUs and runs #2374
Comments
Updated with a simplified model_zoo test and a second-machine test that reproduces these issues.
@aselle Was there supposed to be a response added with the removal of that tag?
@nealwu, could you take a look?
Looks like this is an object detection question. Looping in @derekjchow @jch1
Noticed a difference between using the environment variable CUDA_VISIBLE_DEVICES and setting the config parameter. We're no longer able to reproduce this behavior with the environment variable, only with the config parameter. In addition, when using the config parameter, there is a small ~180 MB allocation on GPU 0 even when the config is set to use GPUs [1,2], which seems to correlate with these issues.
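The distinction above can be sketched in a few lines. This is a minimal illustration, not code from the notebook: the key assumption is that CUDA_VISIBLE_DEVICES must be set before TensorFlow is imported (device enumeration happens at import/context-creation time), whereas visible_device_list is applied only when the session is created.

```python
import os

# Hiding GPUs via the environment variable must happen BEFORE TensorFlow
# is imported; here only physical GPU 1 would be visible to the process.
os.environ["CUDA_VISIBLE_DEVICES"] = "1"

# The alternative, which was the configuration showing the stray ~180 MB
# allocation on GPU 0, is applied only at session creation
# (sketch, requires TensorFlow 1.x):
#   config = tf.ConfigProto(gpu_options=tf.GPUOptions(visible_device_list="1"))
#   sess = tf.Session(config=config)

print(os.environ["CUDA_VISIBLE_DEVICES"])
```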
Hi there,
System information
What is the top-level directory of the model you are using:
Using unmodified pretrained coco models: faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017, faster_rcnn_resnet101_coco_11_06_2017, rfcn_resnet101_coco_11_06_2017
Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
No
OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
UPDATE: tested on two machines now, both reproduce it:
Machine 1: Linux Ubuntu 14.04.4 LTS
Machine 2: Linux Ubuntu 16.04.2 LTS
TensorFlow installed from (source or binary):
Official Docker container, with last commit 58fb6d7
Docker image from 2017-08-24T02:37:57.51182742Z
TensorFlow version (use command below):
('v1.2.0-5-g435cdfc', '1.2.1')
Bazel version (if compiling from source):
N/A
CUDA/cuDNN version:
From the official Docker image: CUDA 8.0, cuDNN 5.1.10
GPU model and memory:
Machine 1: Three NVIDIA GeForce GTX 1080, 12 GB
Machine 2: Two NVIDIA GeForce GTX 1080, 12 GB
Exact command to reproduce:
Running object_detection_tutorial.ipynb with different GPUs, selected either via export CUDA_VISIBLE_DEVICES= or by setting visible_device_list in the session config. A version that loops through the 3 GPUs several times and compares outputs is included.
Describe the problem
Running on different GPUs yields different results, and GPUs 1 and 2 are not deterministic. GPU selection is done by making the other devices invisible (e.g., hiding devices 1 and 2 so TensorFlow runs on device 0, and so on). This uses the frozen pretrained networks from this repository's linked model zoo and the supplied object_detection_tutorial.ipynb, with no modifications other than setting the CUDA visible_device_list. The SSD frozen models, however, give identical outputs on all 3 GPUs from what I have seen.
I have also run cuda_memtest on all 3 GPUs; logs are attached.
UPDATE: I just tested on a second machine with 2 GPUs, and reproduced the issue. GPU 0 is deterministic, GPU 1 is not (and often produces bad results).
Source code / logs
I've attached the diff of the modified object_detection_tutorial.ipynb, which loops over the 3 GPUs 3 times and prints the top box scores; these change from run to run. Also attached is a PDF of that notebook with the detections drawn on it. Text output:
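The run-to-run comparison described above can be expressed as a small helper. This is a hypothetical sketch, not code from the notebook or the diff; the function name check_determinism and the score values are illustrative assumptions.

```python
import numpy as np

def check_determinism(runs, atol=1e-6):
    """Return True if every run's top box scores match the first run.

    `runs` is a list of 1-D score arrays, one per execution on the same GPU.
    A deterministic GPU should always yield True here; the nondeterministic
    GPUs described in this issue would yield False across repeated runs.
    """
    ref = np.asarray(runs[0])
    return all(np.allclose(ref, np.asarray(r), atol=atol) for r in runs[1:])

# Illustrative score lists: identical runs pass, diverging runs fail.
print(check_determinism([[0.98, 0.87, 0.45], [0.98, 0.87, 0.45]]))  # True
print(check_determinism([[0.98, 0.87, 0.45], [0.52, 0.11, 0.03]]))  # False
```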
object_detection_tutorial.diff.txt
gpu_output_differences.pdf
Updated with longer run:
cuda_memtest.log.txt