[BUG] `dp test` raises OOM | 2.0.0 (#1149)
Comments
See #748 (comment)
We may want an automatic adjustment of the testing batch size...
Here is something that can be referred to: Lightning-AI/pytorch-lightning#1638
Is it now possible to complete an “all-data” dp test?
njzjz added a commit to njzjz/deepmd-kit that referenced this issue (Sep 22, 2021):
Resolves deepmodeling#1149. We start nbatch * natoms at 1024 (or we can set a different number) and iteratively multiply it by 2 until catching the OOM error. A small issue is that catching the TF OOM error is a bit slow; that is a TF problem and I don't know how to resolve it. Luckily we only need to catch it once.
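Below is a minimal sketch of the doubling strategy the commit note describes, assuming a hypothetical `evaluate(nframes)` callable that runs one test batch through the TensorFlow session. All names are illustrative; this is not deepmd-kit's actual implementation.

```python
import tensorflow as tf


def auto_batch_size(evaluate, natoms, start=1024):
    """Grow the atom budget (nbatch * natoms) by factors of 2 until
    TensorFlow raises its OOM error, then return the batch size for the
    last budget that fit in memory. `evaluate` is a hypothetical
    stand-in for running one test batch."""
    budget = start
    last_good = None
    while True:
        nframes = max(budget // natoms, 1)  # frames per batch for this budget
        try:
            evaluate(nframes)
        except tf.errors.ResourceExhaustedError:
            # TF's OOM exception; catching it is slow, but as the commit
            # note says, it only has to happen once.
            break
        last_good = budget
        budget *= 2
        # A real implementation would also cap the budget at the dataset
        # size so the loop terminates even when memory never runs out.
    return max(last_good // natoms, 1) if last_good is not None else 1


# Usage sketch: pick a batch size once, then run the whole test with it.
# nframes = auto_batch_size(run_one_batch, natoms=192)
```

Doubling from a small starting budget keeps the search logarithmic in the final batch size, so the slow OOM catch fires at most once per test run.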
Summary
Running `dp test` with deepmd-kit 2.0.0 raises an OOM error. This was not seen with the previous version on the same system.
Deepmd-kit version, installation method, input file, running commands, error log, etc.
version: 2.0.0_release
error:
command: (a hedged example invocation is sketched below)
platform: ALI-EHPC
machine type: P100_4_30 & T4_4_15 (training is OK, but testing raises OOM)
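For reference, a sketch of a typical `dp test` invocation; the model file name, system path, and frame count are placeholders, not taken from the original report:

```shell
# Test a frozen model against a data system; -n limits the number of
# tested frames. Paths below are placeholders.
dp test -m frozen_model.pb -s /path/to/test/system -n 100
```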
Steps to Reproduce
Further Information, Files, and Links