Error running the demo #13

Closed
jiangchuan617 opened this issue Aug 6, 2019 · 2 comments

jiangchuan617 commented Aug 6, 2019

Hello, thank you very much for open-sourcing your code!
When trying to run the demo, I have been running into some problems.

When I try to create MPIs using the pretrained network, I get the following error:

(j3) m00486393@ubuntu:~/j500003470/Code/LLFF$ python imgs2mpis.py     data/testscene/     data/testscene/mpis_360     --height 360

factor/width/height args: [None, None, 360]
Loaded image data (360, 480, 3, 20) [ 360.         480.         391.3782448]
Creating session
2019-08-05 23:28:31.966128: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-05 23:28:31.966155: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-05 23:28:31.966163: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-08-05 23:28:31.966171: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-05 23:28:31.966178: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-08-05 23:28:40.093371: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 0 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:2d:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:40.595427: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x333ce40 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:40.597315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 1 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:31:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:41.108226: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x3341140 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:41.110100: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 2 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:35:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:41.655882: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x3345440 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:41.657770: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 3 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:39:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:42.211243: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x33497a0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:42.213097: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 4 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:a9:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:42.785812: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x334dce0 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:42.787681: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 5 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:ad:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:43.374371: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x3352220 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:43.376293: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 6 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:b1:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:43.974571: W tensorflow/stream_executor/cuda/cuda_driver.cc:523] A non-primary context 0x3356760 exists before initializing the StreamExecutor. We haven't verified StreamExecutor works with that.
2019-08-05 23:28:43.976437: I tensorflow/core/common_runtime/gpu/gpu_device.cc:955] Found device 7 with properties:
name: Tesla P100-PCIE-16GB
major: 6 minor: 0 memoryClockRate (GHz) 1.3285
pciBusID 0000:b5:00.0
Total memory: 15.89GiB
Free memory: 15.60GiB
2019-08-05 23:28:43.982265: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 4
2019-08-05 23:28:43.982292: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 5
2019-08-05 23:28:43.982308: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 6
2019-08-05 23:28:43.982324: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 0 and 7
2019-08-05 23:28:43.986110: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 4
2019-08-05 23:28:43.986132: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 5
2019-08-05 23:28:43.986147: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 6
2019-08-05 23:28:43.986162: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 1 and 7
2019-08-05 23:28:43.988118: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 4
2019-08-05 23:28:43.988140: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 5
2019-08-05 23:28:43.988156: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 6
2019-08-05 23:28:43.988171: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 2 and 7
2019-08-05 23:28:43.988254: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 4
2019-08-05 23:28:43.988270: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 5
2019-08-05 23:28:43.988285: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 6
2019-08-05 23:28:43.988299: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 3 and 7
2019-08-05 23:28:43.988315: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 4 and 0
2019-08-05 23:28:43.988330: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 4 and 1
2019-08-05 23:28:43.988344: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 4 and 2
2019-08-05 23:28:43.988359: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 4 and 3
2019-08-05 23:28:43.994068: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 5 and 0
2019-08-05 23:28:43.994091: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 5 and 1
2019-08-05 23:28:43.994106: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 5 and 2
2019-08-05 23:28:43.994121: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 5 and 3
2019-08-05 23:28:43.997893: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 6 and 0
2019-08-05 23:28:43.997915: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 6 and 1
2019-08-05 23:28:43.997930: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 6 and 2
2019-08-05 23:28:43.997945: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 6 and 3
2019-08-05 23:28:43.999903: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 7 and 0
2019-08-05 23:28:43.999925: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 7 and 1
2019-08-05 23:28:43.999941: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 7 and 2
2019-08-05 23:28:43.999956: I tensorflow/core/common_runtime/gpu/gpu_device.cc:847] Peer access not supported between device ordinals 7 and 3
2019-08-05 23:28:44.000405: I tensorflow/core/common_runtime/gpu/gpu_device.cc:976] DMA: 0 1 2 3 4 5 6 7
2019-08-05 23:28:44.000418: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 0:   Y Y Y Y N N N N
2019-08-05 23:28:44.000426: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 1:   Y Y Y Y N N N N
2019-08-05 23:28:44.000434: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 2:   Y Y Y Y N N N N
2019-08-05 23:28:44.000441: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 3:   Y Y Y Y N N N N
2019-08-05 23:28:44.000448: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 4:   N N N N Y Y Y Y
2019-08-05 23:28:44.000456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 5:   N N N N Y Y Y Y
2019-08-05 23:28:44.000463: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 6:   N N N N Y Y Y Y
2019-08-05 23:28:44.000470: I tensorflow/core/common_runtime/gpu/gpu_device.cc:986] 7:   N N N N Y Y Y Y
2019-08-05 23:28:44.000485: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:0) -> (device: 0, name: Tesla P100-PCIE-16GB, pci bus id: 0000:2d:00.0)
2019-08-05 23:28:44.000495: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:1) -> (device: 1, name: Tesla P100-PCIE-16GB, pci bus id: 0000:31:00.0)
2019-08-05 23:28:44.000504: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:2) -> (device: 2, name: Tesla P100-PCIE-16GB, pci bus id: 0000:35:00.0)
2019-08-05 23:28:44.000510: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:3) -> (device: 3, name: Tesla P100-PCIE-16GB, pci bus id: 0000:39:00.0)
2019-08-05 23:28:44.000518: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:4) -> (device: 4, name: Tesla P100-PCIE-16GB, pci bus id: 0000:a9:00.0)
2019-08-05 23:28:44.000525: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:5) -> (device: 5, name: Tesla P100-PCIE-16GB, pci bus id: 0000:ad:00.0)
2019-08-05 23:28:44.000532: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:6) -> (device: 6, name: Tesla P100-PCIE-16GB, pci bus id: 0000:b1:00.0)
2019-08-05 23:28:44.000539: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1045] Creating TensorFlow device (/gpu:7) -> (device: 7, name: Tesla P100-PCIE-16GB, pci bus id: 0000:b5:00.0)
Restoring from ./checkpoints/papermodel/checkpoint
Meta restored
Found inputs:
['imgs:0', 'depths:0', 'poses:0', 'num_depths:0', 'close_depth:0', 'inf_depth:0', 'window:0']
Found outputs:
['accum', 'alpha_acc', 'base_img', 'disp0', 'disps', 'imgs', 'inplaces', 'mpi0', 'mpis', 'psv', 'psv1', 'renderings', 'renderings_all', 'renderings_mean', 'renderings_single', 'scales', 'target_disp', 'target_img']
Setup renderer
Weights restored
0 (of 20) <- [0, 8, 9, 1, 10, 0] depths 15.6985977596 166.511276286
('abdriged outputs to', dict_keys(['mpi0', 'disps', 'psv']))
('0 of 1',)
('(360, 480) gridded into 3 x 4',)
('.',)
Traceback (most recent call last):
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1327, in _do_call
    return fn(*args)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1306, in _run_fn
    status, run_metadata)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits must be 2-dimensional
         [[Node: get_mpi3d_multi/Softmax = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](get_mpi3d_multi/concat_1)]]
         [[Node: disps_1/_731 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_12995_disps_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "imgs2mpis.py", line 83, in <module>
    args.numplanes, args.no_mpis, True, args.psvs)
  File "imgs2mpis.py", line 54, in gen_mpis
    mpis = run_inference(imgs, poses, mpi_bds, ibr_runner, num_planes, patched, disps=disps, psvs=psvs)
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_utils.py", line 156, in run_inference
    mpi.generate(generator, num_planes)
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_utils.py", line 55, in generate
    outputs = generator(inputs)
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_utils.py", line 136, in <lambda>
    generator = lambda inputs : ibr_runner.run_inference(inputs, test_keys=keys, patched=patched, valid=120, buffer=80)
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_tester.py", line 216, in run_inference
    out_ = sess.run(outputs, feed_dict=fdict)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 895, in run
    run_metadata_ptr)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1124, in _run
    feed_dict_tensor, options, run_metadata)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1321, in _do_run
    options, run_metadata)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1340, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: logits must be 2-dimensional
         [[Node: get_mpi3d_multi/Softmax = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](get_mpi3d_multi/concat_1)]]
         [[Node: disps_1/_731 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_12995_disps_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

Caused by op 'get_mpi3d_multi/Softmax', defined at:
  File "imgs2mpis.py", line 83, in <module>
    args.numplanes, args.no_mpis, True, args.psvs)
  File "imgs2mpis.py", line 44, in gen_mpis
    ibr_runner.load_graph(logdir)
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_tester.py", line 112, in load_graph
    self.saver = tf.train.import_meta_graph(ckpt_path + '.meta')
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/training/saver.py", line 1698, in import_meta_graph
    **kwargs)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/framework/meta_graph.py", line 656, in import_scoped_meta_graph
    producer_op_list=producer_op_list)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/framework/importer.py", line 313, in import_graph_def
    op_def=op_def)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 2630, in create_op
    original_op=self._default_original_op, op_def=op_def)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/framework/ops.py", line 1204, in __init__
    self._traceback = self._graph._extract_stack()  # pylint: disable=protected-access

InvalidArgumentError (see above for traceback): logits must be 2-dimensional
         [[Node: get_mpi3d_multi/Softmax = Softmax[T=DT_FLOAT, _device="/job:localhost/replica:0/task:0/gpu:0"](get_mpi3d_multi/concat_1)]]
         [[Node: disps_1/_731 = _Recv[client_terminated=false, recv_device="/job:localhost/replica:0/task:0/cpu:0", send_device="/job:localhost/replica:0/task:0/gpu:0", send_device_incarnation=1, tensor_name="edge_12995_disps_1", tensor_type=DT_FLOAT, _device="/job:localhost/replica:0/task:0/cpu:0"]()]]

I don't know how to fix the problem. Could you please help me identify what might be going wrong?

jiangchuan617 (Author) commented

When I re-run the same command, I get a different error:

(j3) m00486393@ubuntu:~/j500003470/Code/LLFF$ python imgs2mpis.py \
>     data/testscene/ \
>     data/testscene/mpis_360 \
>     --height 360
factor/width/height args: [None, None, 360]
Loaded image data (360, 480, 3, 20) [ 360.         480.         391.3782448]
Creating session
2019-08-06 02:16:53.273617: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.1 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-06 02:16:53.273643: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use SSE4.2 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-06 02:16:53.273652: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX instructions, but these are available on your machine and could speed up CPU computations.
2019-08-06 02:16:53.273659: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use AVX2 instructions, but these are available on your machine and could speed up CPU computations.
2019-08-06 02:16:53.273666: W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library wasn't compiled to use FMA instructions, but these are available on your machine and could speed up CPU computations.
2019-08-06 02:16:54.244065: E tensorflow/core/common_runtime/direct_session.cc:171] Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_OUT_OF_MEMORY; total memory reported: 17066885120
Traceback (most recent call last):
  File "imgs2mpis.py", line 83, in <module>
    args.numplanes, args.no_mpis, True, args.psvs)
  File "imgs2mpis.py", line 44, in gen_mpis
    ibr_runner.load_graph(logdir)
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_tester.py", line 108, in load_graph
    sess = self.Sess()
  File "/home/m00486393/j500003470/Code/LLFF/llff/inference/mpi_tester.py", line 86, in Sess
    sess = tf.Session(config=config)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1486, in __init__
    super(Session, self).__init__(target, graph, config=config)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 621, in __init__
    self._session = tf_session.TF_NewDeprecatedSession(opts, status)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/contextlib.py", line 88, in __exit__
    next(self.gen)
  File "/home/m00486393/anaconda2/envs/j3/lib/python3.6/site-packages/tensorflow/python/framework/errors_impl.py", line 466, in raise_exception_on_not_ok_status
    pywrap_tensorflow.TF_GetCode(status))
tensorflow.python.framework.errors_impl.InternalError: Failed to create session.

bmild (Collaborator) commented Aug 9, 2019

The second error message you posted looks like TensorFlow is running out of GPU memory as soon as it starts. Before running the command, check with nvidia-smi that your GPU memory isn't already used up by another process.
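As a rough sketch of that check, the snippet below parses the CSV output of `nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits` and picks the GPU with the most free memory. The helper names (`parse_gpu_memory`, `pick_freest_gpu`, `select_gpu`) are hypothetical and not part of LLFF. It assumes `nvidia-smi` is on the PATH and that `CUDA_VISIBLE_DEVICES` is set before TensorFlow is imported.

```python
import os
import subprocess

# Ask nvidia-smi for one "index, used MiB, total MiB" row per GPU.
QUERY = [
    "nvidia-smi",
    "--query-gpu=index,memory.used,memory.total",
    "--format=csv,noheader,nounits",
]


def parse_gpu_memory(csv_text):
    """Parse 'index, used, total' rows into a list of (index, free_mib)."""
    rows = []
    for line in csv_text.strip().splitlines():
        index, used, total = (int(field.strip()) for field in line.split(","))
        rows.append((index, total - used))
    return rows


def pick_freest_gpu(csv_text):
    """Return the index of the GPU with the most free memory."""
    return max(parse_gpu_memory(csv_text), key=lambda row: row[1])[0]


def select_gpu():
    """Hypothetical helper: expose only the freest GPU to this process.

    Call this before importing tensorflow, otherwise it has no effect.
    """
    output = subprocess.check_output(QUERY).decode()
    gpu = pick_freest_gpu(output)
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu)
    return gpu
```

The same restriction can be applied from the shell by prefixing the command with `CUDA_VISIBLE_DEVICES=<index>`, which may also avoid creating a context on all eight GPUs.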

I haven't run into the first error before. You may want to try running inside the Docker environment, which specifically uses TensorFlow 1.13 (perhaps the behavior of the softmax function changed at some point between versions).
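One quick way to test the version hypothesis is to compare the installed TensorFlow version against the 1.13 mentioned above. The traceback does not state the installed version, so this is only a sketch; `version_tuple` and `matches_docker_version` are hypothetical helpers, not part of LLFF or TensorFlow.

```python
def version_tuple(version):
    """Turn '1.13.1' or '1.3.0-rc1' into a tuple of ints such as (1, 13, 1)."""
    parts = []
    for piece in version.split(".")[:3]:
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break  # stop at suffixes such as '-rc1'
            digits += ch
        parts.append(int(digits or 0))
    return tuple(parts)


def matches_docker_version(installed, expected="1.13"):
    """True if the installed major.minor equals the expected major.minor."""
    return version_tuple(installed)[:2] == version_tuple(expected)[:2]


# Usage (requires TensorFlow to be installed):
#   import tensorflow as tf
#   print(tf.__version__, matches_docker_version(tf.__version__))
```

If the check reports a mismatch, running inside the Docker environment (or installing a matching TensorFlow release) is the simpler fix than patching the restored graph.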

bmild closed this as completed Sep 26, 2019