
Error while running benchmark.py on GPU #185

Closed
brm738 opened this issue Apr 1, 2022 · 6 comments · Fixed by #221 or #353
Labels: Bug (Something isn't working)

Comments


brm738 commented Apr 1, 2022

Running benchmark.py on Nvidia A100 SXM4 produces an error:

```
  File "tools/benchmarking/benchmark.py", line 177, in distribute
    raise Exception(f"Error occurred while computing benchmark on device {job}") from exception
Exception: Error occurred while computing benchmark on device <Future at 0x7f05c18ac310 state=finished raised Exception>
```

Note: In my setup this error does not occur when running on CPU only, nor when running the patchcore model (on either CPU or GPU). It occurs only when running padim or cflow on GPU.

@samet-akcay samet-akcay added Benchmark Bug Something isn't working labels Apr 1, 2022
@samet-akcay samet-akcay modified the milestone: Backlog Apr 4, 2022
@ashwinvaidya17 ashwinvaidya17 added this to the Backlog milestone Apr 4, 2022
@samet-akcay samet-akcay modified the milestones: Backlog, v0.2.7 Apr 4, 2022
@ashwinvaidya17 (Collaborator)

Can you share the entire log?

brm738 (Author) commented Apr 7, 2022

```
concurrent.futures.process._RemoteTraceback:
"""
Traceback (most recent call last):
  File "/usr/lib/python3.8/concurrent/futures/process.py", line 239, in _process_worker
    r = call_item.fn(*call_item.args, **call_item.kwargs)
  File "/data/home/epi/anomalib/tools/benchmarking/benchmark.py", line 137, in compute_on_gpu
    model_metrics = sweep(run_config, device, seed)
  File "/data/home/epi/anomalib/tools/benchmarking/benchmark.py", line 233, in sweep
    model_metrics = get_single_model_metrics(model_config=model_config, openvino_metrics=convert_openvino)
  File "/data/home/epi/anomalib/tools/benchmarking/benchmark.py", line 99, in get_single_model_metrics
    convert_to_openvino(model, openvino_export_path, model_config.model.input_size)
  File "/data/home/epi/anomalib/tools/benchmarking/utils/convert.py", line 28, in convert_to_openvino
    export_convert(model, input_size, onnx_path, export_path)
  File "/data/home/epi/anomalib/anomalib/deploy/optimize.py", line 73, in export_convert
    torch.onnx.export(
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/__init__.py", line 275, in export
    return utils.export(model, args, f, export_params, verbose, training,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 88, in export
    _export(model, args, f, export_params, verbose, training, input_names, output_names,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 689, in _export
    _model_to_graph(model, args, verbose, input_names,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 463, in _model_to_graph
    graph = _optimize_graph(graph, operator_export_type,
  File "/data/home/epi/env_Anomalib/lib/python3.8/site-packages/torch/onnx/utils.py", line 223, in _optimize_graph
    torch._C._jit_pass_onnx_graph_shape_type_inference(graph, params_dict, _export_onnx_opset_version)
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.
"""

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tools/benchmarking/benchmark.py", line 164, in distribute_over_gpus
    job.result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 444, in result
    return self.__get_result()
  File "/usr/lib/python3.8/concurrent/futures/_base.py", line 389, in __get_result
    raise self._exception
RuntimeError: Exporting model exceed maximum protobuf size of 2GB. Please call torch.onnx.export with use_external_data_format=True.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "tools/benchmarking/benchmark.py", line 258, in <module>
    distribute()
  File "tools/benchmarking/benchmark.py", line 194, in distribute
    distribute_over_gpus()
  File "tools/benchmarking/benchmark.py", line 166, in distribute_over_gpus
    raise Exception(f"Error occurred while computing benchmark on device {job}") from exc
Exception: Error occurred while computing benchmark on device <Future at 0x7efd9b83ae20 state=finished raised RuntimeError>
```
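The RuntimeError at the bottom of the inner traceback points at a possible workaround: torch.onnx.export in the torch 1.8–1.10 line accepts a use_external_data_format flag that writes the weights as external files next to the .onnx file instead of embedding them in the size-limited protobuf. A minimal sketch of how the export call in anomalib/deploy/optimize.py could pass that flag; the dummy-input shape, opset version, and tensor names below are illustrative assumptions, not anomalib's actual values:

```python
import torch

# Hypothetical variant of export_convert() from anomalib/deploy/optimize.py.
# Only the use_external_data_format flag is the point: it makes torch.onnx.export
# store weights as external data files, sidestepping the 2 GB protobuf limit.
def export_convert(model, input_size, onnx_path, export_path):
    height, width = input_size
    torch.onnx.export(
        model,                               # module to export
        torch.zeros((1, 3, height, width)),  # illustrative dummy input
        str(onnx_path),                      # external-data export needs a file path, not a buffer
        opset_version=11,                    # illustrative opset
        use_external_data_format=True,       # the fix suggested by the RuntimeError
        input_names=["input"],               # illustrative tensor names
        output_names=["output"],
    )
```

Whether OpenVINO's model optimizer in the benchmark's convert_to_openvino step can then consume the resulting external-data files would still need checking.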

@FranLucchini

I have the same issue while running on Colab Pro with a GPU.

@ashwinvaidya17 (Collaborator)

I think I linked the wrong issue. I haven't addressed this one yet. I am reopening it.

@ashwinvaidya17 (Collaborator)

Can you tell me which versions of torch and onnx you have on Colab? Also, how did you install anomalib? By cloning the repo?
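For reference, both versions can be read from a Python prompt or a Colab cell:

```python
import torch
import onnx  # assumes the onnx package is installed in the environment

print("torch:", torch.__version__)
print("onnx:", onnx.__version__)
```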

@samet-akcay samet-akcay modified the milestones: v0.3.0, v0.3.1 Apr 22, 2022
@ashwinvaidya17 (Collaborator)

Closing due to inactivity. Open again if you still see this issue.
