
Frozen pretrained Faster RCNN/RFCN networks from model zoo yielding different outputs on different GPUs and runs #2374

Closed
EpochalEngineer opened this issue Sep 13, 2017 · 6 comments
Assignees
Labels
type:bug Bug in the code

Comments

EpochalEngineer commented Sep 13, 2017

System information

  • What is the top-level directory of the model you are using:
    Using unmodified pretrained coco models: faster_rcnn_inception_resnet_v2_atrous_coco_11_06_2017, faster_rcnn_resnet101_coco_11_06_2017, rfcn_resnet101_coco_11_06_2017

  • Have I written custom code (as opposed to using a stock example script provided in TensorFlow):
    No

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04):
    UPDATE: tested on two machines now, both reproduce it:
    Machine 1: Linux Ubuntu 14.04.4 LTS
    Machine 2: Linux Ubuntu 16.04.2 LTS

  • TensorFlow installed from (source or binary):
    official docker container, with last commit 58fb6d7
    docker version from 2017-08-24T02:37:57.51182742Z

  • TensorFlow version (use command below):
    ('v1.2.0-5-g435cdfc', '1.2.1')

  • Bazel version (if compiling from source):
    N/A

  • CUDA/cuDNN version:
From the official docker image: CUDA 8.0, cuDNN 5.1.10

  • GPU model and memory:
Machine 1: Three NVIDIA GeForce GTX 1080, 12 GB
    Machine 2: Two NVIDIA GeForce GTX 1080, 12 GB

  • Exact command to reproduce:
Run object_detection_tutorial.ipynb on different GPUs, selecting the device either with export CUDA_VISIBLE_DEVICES= or by setting it in the session config. A version that loops through the 3 GPUs several times and compares the outputs is attached.
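
For reference, the two selection mechanisms look like this. This is a minimal sketch: the device id "2" is just an example, and the TF 1.x session-config variant is left as a comment so the snippet runs without TensorFlow installed:

```python
import os

# Option (a): environment variable. This must be set before CUDA
# initializes, i.e. before importing tensorflow; listing only "2" makes
# physical GPU 2 the sole visible device, renumbered to /gpu:0 in-process.
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

# Option (b): TF 1.x session config (commented out so this sketch has no
# TensorFlow dependency):
#   config = tf.ConfigProto(gpu_options=tf.GPUOptions(visible_device_list="2"))
#   sess = tf.Session(config=config)

print(os.environ["CUDA_VISIBLE_DEVICES"])  # -> 2
```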

Describe the problem

Running on different GPUs yields different results, and GPUs 1 and 2 are not deterministic across runs. GPU selection is done by hiding the other devices (e.g. making devices 1 and 2 invisible so TensorFlow runs on GPU 0, and so on). This uses the frozen pretrained networks from this repository's linked model zoo and the supplied object_detection_tutorial.ipynb, with no modification other than setting the CUDA visible_device_list. The SSD frozen models, however, give identical outputs on all 3 GPUs from what I have seen.
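
The comparison the modified notebook performs can be sketched independently of TensorFlow: run inference repeatedly and require the top-k scores to match exactly. The `run_inference` callables below are stand-ins for `sess.run` on the frozen graph, not the actual detector:

```python
# Determinism harness: call the same inference function several times
# and compare the top-k scores for exact equality.

def top_k(scores, k=4):
    """Return the k largest scores, descending."""
    return sorted(scores, reverse=True)[:k]

def is_deterministic(run_inference, iters=3, k=4):
    """True if `iters` calls to run_inference() yield identical top-k scores."""
    baseline = top_k(run_inference(), k)
    return all(top_k(run_inference(), k) == baseline for _ in range(iters - 1))

# A fixed-output stand-in model passes (like GPU 0 in the runs below):
assert is_deterministic(lambda: [0.9998, 0.9986, 0.9530, 0.9158])

# A drifting stand-in model is flagged (like GPU 1):
from itertools import count
step = count()
assert not is_deterministic(lambda: [0.5 + 0.01 * next(step)])
```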

I have also run cuda_memtest on all 3 GPUs; logs are attached.

UPDATE: I just tested on a second machine with 2 GPUs, and reproduced the issue. GPU 0 is deterministic, GPU 1 is not (and often produces bad results).

Source code / logs

I've attached a diff of the modified object_detection_tutorial.ipynb, which loops over the 3 GPUs 3 times and prints the top box scores; these change from run to run. Also attached is a PDF of that notebook with the detections drawn on the images. Text output:

Evaluating image 0

Running on GPU 0
Top 4 box scores:
Iter 1: [ 0.99978215 0.99857557 0.95300484 0.91580492]
Iter 2: [ 0.99978215 0.99857557 0.95300484 0.91580492]
Iter 3: [ 0.99978215 0.99857557 0.95300484 0.91580492]

Running on GPU 1
Top 4 box scores:
Iter 1: [ 0.68702352 0.16781448 0.13143283 0.12993629]
Iter 2: [ 0.18502565 0.16854601 0.08074528 0.07859289]
Iter 3: [ 0.18502565 0.16854601 0.05546702 0.05111229]

Running on GPU 2
Top 4 box scores:
Iter 1: [ 0.68702352 0.16781448 0.13143283 0.12993629]
Iter 2: [ 0.18941374 0.18502565 0.16854601 0.16230994]
Iter 3: [ 0.18502565 0.16854601 0.05546702 0.05482833]

Evaluating image 1

Running on GPU 0
Top 4 box scores:
Iter 1: [ 0.99755412 0.99750346 0.99380219 0.99067008]
Iter 2: [ 0.99755412 0.99750346 0.99380219 0.99067008]
Iter 3: [ 0.99755412 0.99750346 0.99380219 0.99067008]

Running on GPU 1
Top 4 box scores:
Iter 1: [ 0.96881998 0.96441168 0.96164131 0.96006596]
Iter 2: [ 0.9377929 0.91686022 0.80374646 0.79758978]
Iter 3: [ 0.90396696 0.89217037 0.85456908 0.85334581]

Running on GPU 2
Top 4 box scores:
Iter 1: [ 0.9377929 0.91686022 0.80374646 0.79758978]
Iter 2: [ 0.9377929 0.91686022 0.80374646 0.79758978]
Iter 3: [ 0.9377929 0.91686022 0.80374646 0.79758978]

object_detection_tutorial.diff.txt

gpu_output_differences.pdf

Updated with a longer run:
cuda_memtest.log.txt

@cy89 cy89 added the stat:awaiting response Waiting on input from the contributor label Sep 14, 2017
@EpochalEngineer EpochalEngineer changed the title Different Object Detection outputs from frozen inference graph ? Frozen pretrained Faster RCNN/RFCN networks from model zoo yielding different outputs on different GPUs and runs Sep 19, 2017

EpochalEngineer commented Sep 19, 2017

Updated with a simplified test using the model zoo, plus a second-machine test that reproduced these issues.

@aselle aselle removed the stat:awaiting response Waiting on input from the contributor label Sep 19, 2017
EpochalEngineer commented:

@aselle Was there supposed to be a response added with the removal of that tag?


aselle commented Sep 25, 2017

@nealwu, could you take a look?

@aselle aselle added the stat:community support, stat:awaiting model gardener (Waiting on input from TensorFlow model gardener), and type:bug (Bug in the code) labels and removed the stat:community support label Sep 25, 2017

nealwu commented Sep 25, 2017

Looks like this is an object detection question. Looping in @derekjchow @jch1


EpochalEngineer commented Oct 2, 2017

Noticed a difference between using the environment variable CUDA_VISIBLE_DEVICES and setting the config parameter (visible_device_list). We're no longer able to reproduce this behavior with the environment variable, only with the config parameter. In addition, when using the config parameter, there is a small (~180 MB) allocation on GPU 0 even when the config is set to use GPUs 1 and 2, which seems to correlate with these issues.
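
One way to confirm that stray allocation is to poll nvidia-smi while the notebook runs. The sketch below parses its CSV query output; the sample string (including the 180 MiB figure on GPU 0) is illustrative, not captured from the machines in question:

```python
# Parse `nvidia-smi --query-gpu=index,memory.used --format=csv,noheader`
# output into {gpu_index: used_mib}. The sample text is hand-written.
sample = """\
0, 180 MiB
1, 7423 MiB
2, 7423 MiB
"""

def used_mib(csv_text):
    usage = {}
    for line in csv_text.strip().splitlines():
        idx, mem = line.split(", ")
        usage[int(idx)] = int(mem.split()[0])
    return usage

usage = used_mib(sample)
print(usage)  # -> {0: 180, 1: 7423, 2: 7423}
# GPU 0 should be idle when visible_device_list="1,2"; nonzero use here
# would reproduce the observation above.
assert usage[0] > 0
```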

@tombstone tombstone added this to To Do in Object Detection via automation Nov 18, 2017
@tensorflowbutler tensorflowbutler removed the stat:awaiting model gardener Waiting on input from TensorFlow model gardener label Apr 6, 2018
tensorflowbutler commented:

Hi there,
We are checking to see if you still need help on this, as this seems to be a considerably old issue. Please update this issue with the latest information, a code snippet to reproduce your issue, and the error you are seeing.
If we don't hear from you in the next 7 days, this issue will be closed automatically. If you don't need help on this issue any more, please consider closing it.

Object Detection automation moved this from To Do to Done Feb 7, 2020