[BUG] _dp test raise OOM | 2.0.0 _ #1149

Closed
Vibsteamer opened this issue Sep 14, 2021 · 4 comments

@Vibsteamer
Contributor

Summary
Running "dp test" with DeePMD-kit 2.0.0 raises a GPU out-of-memory (OOM) error. The error does not occur with the previous version on the same system.
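
As a rough reading aid for the error log in this report, the sketch below redoes the arithmetic behind the first failed allocation. All numbers are copied from the first allocator dump (the tensor shape and dtype from the OOM message, the three "Free at" chunks, and the "Limit" and "InUse" stats). The variable and function names are illustrative only and are not part of DeePMD-kit or TensorFlow.

```python
# Figures copied from the first BFC allocator dump in the error log of this report.
SHAPE = (3_520_000, 100)   # "OOM when allocating tensor with shape[3520000,100]"
ITEMSIZE = 8               # "type double" -> 8 bytes per element
FREE_CHUNKS = [235_617_280, 320_000_000, 2_554_958_848]  # the three "Free at ..." lines
LIMIT = 14_445_051_904     # "Limit:" in the Stats block
IN_USE = 11_334_475_776    # "InUse:" in the Stats block


def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB (2**30 bytes)."""
    return n_bytes / 2**30


# Size of the tensor that concat_5 tried to allocate.
requested = SHAPE[0] * SHAPE[1] * ITEMSIZE

# Memory not in use inside the allocator's region, and the largest single free chunk.
total_free = LIMIT - IN_USE
largest_free = max(FREE_CHUNKS)

fits_in_total = requested <= total_free
fits_contiguously = requested <= largest_free

print(f"requested:    {requested} bytes ({gib(requested):.2f} GiB)")
print(f"total free:   {total_free} bytes ({gib(total_free):.2f} GiB)")
print(f"largest free: {largest_free} bytes ({gib(largest_free):.2f} GiB)")
print(f"fits in total free memory: {fits_in_total}")
print(f"fits in one free chunk:    {fits_contiguously}")
```

Under these numbers the request is smaller than the total free memory but larger than any single free chunk, which is consistent with the fragmentation hint printed by the allocator. It does not by itself explain why 2.0.0 needs more memory than the previous version.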

DeePMD-kit version, installation method, input file, running commands, error log, etc.

version 2.0.0_release

error

......
DEEPMD INFO    # ---------------output of dp test--------------- 
DEEPMD INFO    # testing system : ../../data/data.init/data.init/AlCuMg/init.1119
DEEPMD INFO    # number of test data : 10 
DEEPMD INFO    Energy RMSE        : 6.845718e-02 eV
DEEPMD INFO    Energy RMSE/Natoms : 2.139287e-03 eV
DEEPMD INFO    Force  RMSE        : 9.130943e-03 eV/A
DEEPMD INFO    Virial RMSE        : 2.466390e-01 eV
DEEPMD INFO    Virial RMSE/Natoms : 7.707468e-03 eV
DEEPMD INFO    # ----------------------------------------------- 
DEEPMD INFO    # ---------------output of dp test--------------- 
DEEPMD INFO    # testing system : ../../data/data.init/data.init/AlCuMg/init.112
2021-09-14 16:15:13.547135: W tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 2.62GiB (rounded to 2816000000)requested by op load/filter_type_all/concat_5
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2021-09-14 16:15:13.547190: I tensorflow/core/common_runtime/bfc_allocator.cc:991] BFCAllocator dump for GPU_0_bfc
2021-09-14 16:15:13.547210: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (256): 	Total Chunks: 12, Chunks in use: 12. 3.0KiB allocated for chunks. 3.0KiB in use in bin. 1.2KiB client-requested in use in bin.
2021-09-14 16:15:13.547220: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (512): 	Total Chunks: 3, Chunks in use: 3. 1.5KiB allocated for chunks. 1.5KiB in use in bin. 1.2KiB client-requested in use in bin.
2021-09-14 16:15:13.547228: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1024): 	Total Chunks: 4, Chunks in use: 4. 4.2KiB allocated for chunks. 4.2KiB in use in bin. 3.3KiB client-requested in use in bin.
2021-09-14 16:15:13.547242: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2048): 	Total Chunks: 18, Chunks in use: 18. 36.0KiB allocated for chunks. 36.0KiB in use in bin. 33.8KiB client-requested in use in bin.
2021-09-14 16:15:13.547251: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547260: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8192): 	Total Chunks: 3, Chunks in use: 3. 30.0KiB allocated for chunks. 30.0KiB in use in bin. 29.3KiB client-requested in use in bin.
2021-09-14 16:15:13.547268: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16384): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547277: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (32768): 	Total Chunks: 3, Chunks in use: 3. 117.8KiB allocated for chunks. 117.8KiB in use in bin. 117.2KiB client-requested in use in bin.
2021-09-14 16:15:13.547291: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (65536): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547301: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (131072): 	Total Chunks: 2, Chunks in use: 2. 431.5KiB allocated for chunks. 431.5KiB in use in bin. 431.2KiB client-requested in use in bin.
2021-09-14 16:15:13.547309: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (262144): 	Total Chunks: 6, Chunks in use: 6. 2.64MiB allocated for chunks. 2.64MiB in use in bin. 2.64MiB client-requested in use in bin.
2021-09-14 16:15:13.547317: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547324: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547332: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2097152): 	Total Chunks: 3, Chunks in use: 3. 6.59MiB allocated for chunks. 6.59MiB in use in bin. 6.59MiB client-requested in use in bin.
2021-09-14 16:15:13.547340: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547356: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:13.547366: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16777216): 	Total Chunks: 1, Chunks in use: 1. 28.08MiB allocated for chunks. 28.08MiB in use in bin. 28.08MiB client-requested in use in bin.
2021-09-14 16:15:13.547375: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (33554432): 	Total Chunks: 1, Chunks in use: 1. 48.83MiB allocated for chunks. 48.83MiB in use in bin. 48.83MiB client-requested in use in bin.
2021-09-14 16:15:13.547384: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (67108864): 	Total Chunks: 2, Chunks in use: 2. 175.78MiB allocated for chunks. 175.78MiB in use in bin. 175.78MiB client-requested in use in bin.
2021-09-14 16:15:13.547393: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (134217728): 	Total Chunks: 2, Chunks in use: 1. 393.16MiB allocated for chunks. 168.46MiB in use in bin. 168.46MiB client-requested in use in bin.
2021-09-14 16:15:13.547402: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (268435456): 	Total Chunks: 13, Chunks in use: 11. 12.81GiB allocated for chunks. 10.13GiB in use in bin. 10.13GiB client-requested in use in bin.
2021-09-14 16:15:13.547416: I tensorflow/core/common_runtime/bfc_allocator.cc:1014] Bin for 2.62GiB was 256.00MiB, Chunk State: 
2021-09-14 16:15:13.547436: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 305.18MiB | Requested Size: 305.18MiB | in_use: 0 | bin_num: 20, prev:   Size: 427.25MiB | Requested Size: 427.25MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_4, stepid: 238, last_action: 124270, next:   Size: 305.18MiB | Requested Size: 305.18MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/add, stepid: 238, last_action: 124274, for: UNUSED, stepid: 238, last_action: 124275
2021-09-14 16:15:13.547448: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 2.38GiB | Requested Size: 30.52MiB | in_use: 0 | bin_num: 20, prev:   Size: 2.62GiB | Requested Size: 2.62GiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_10, stepid: 238, last_action: 124282, for: UNUSED, stepid: 237, last_action: 122413
2021-09-14 16:15:13.547455: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 14445051904
2021-09-14 16:15:13.547469: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000000 of size 256 by op load/descrpt_attr/rcut action_count 1 step 0 next 1
2021-09-14 16:15:13.547477: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000100 of size 1280 by op ScratchBuffer action_count 2 step 0 next 2
2021-09-14 16:15:13.547485: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000600 of size 256 by op load/gradients/grad_ys_0 action_count 3 step 0 next 3
2021-09-14 16:15:13.547492: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000700 of size 2304000 by op load/layer_0_type_2/matrix action_count 4 step 0 next 4
2021-09-14 16:15:13.547500: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c232f00 of size 2048 by op load/layer_0_type_2/bias action_count 5 step 0 next 5
2021-09-14 16:15:13.547507: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c233700 of size 460800 by op load/layer_1_type_2/matrix action_count 6 step 0 next 6
2021-09-14 16:15:13.547514: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a3f00 of size 2048 by op load/layer_1_type_2/bias action_count 7 step 0 next 7
2021-09-14 16:15:13.547521: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a4700 of size 2048 by op load/layer_1_type_2/idt action_count 8 step 0 next 8
2021-09-14 16:15:13.547528: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a4f00 of size 460800 by op load/layer_2_type_2/matrix action_count 9 step 0 next 9
2021-09-14 16:15:13.547535: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c315700 of size 2048 by op load/layer_2_type_2/bias action_count 10 step 0 next 10
2021-09-14 16:15:13.547542: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c315f00 of size 2048 by op load/layer_2_type_2/idt action_count 11 step 0 next 11
2021-09-14 16:15:13.547549: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c316700 of size 2048 by op load/final_layer_type_2/matrix action_count 12 step 0 next 12
2021-09-14 16:15:13.547556: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c316f00 of size 256 by op load/final_layer_type_2/bias action_count 13 step 0 next 13
2021-09-14 16:15:13.547563: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c317000 of size 2304000 by op load/layer_0_type_1/matrix action_count 14 step 0 next 14
2021-09-14 16:15:13.547570: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c549800 of size 2048 by op load/layer_0_type_1/bias action_count 15 step 0 next 15
2021-09-14 16:15:13.547577: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c54a000 of size 460800 by op load/layer_1_type_1/matrix action_count 16 step 0 next 16
2021-09-14 16:15:13.547584: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5ba800 of size 2048 by op load/layer_1_type_1/bias action_count 17 step 0 next 17
2021-09-14 16:15:13.547594: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5bb000 of size 2048 by op load/layer_1_type_1/idt action_count 18 step 0 next 18
2021-09-14 16:15:13.547601: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5bb800 of size 460800 by op load/layer_2_type_1/matrix action_count 19 step 0 next 19
2021-09-14 16:15:13.547608: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62c000 of size 2048 by op load/layer_2_type_1/bias action_count 20 step 0 next 20
2021-09-14 16:15:13.547615: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62c800 of size 2048 by op load/layer_2_type_1/idt action_count 21 step 0 next 21
2021-09-14 16:15:13.547621: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d000 of size 2048 by op load/final_layer_type_1/matrix action_count 22 step 0 next 22
2021-09-14 16:15:13.547628: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d800 of size 256 by op load/final_layer_type_1/bias action_count 23 step 0 next 23
2021-09-14 16:15:13.547635: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d900 of size 256 by op load/final_layer_type_0/bias action_count 24 step 0 next 24
2021-09-14 16:15:13.547642: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62da00 of size 2048 by op load/final_layer_type_0/matrix action_count 25 step 0 next 25
2021-09-14 16:15:13.547649: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62e200 of size 2048 by op load/layer_2_type_0/idt action_count 26 step 0 next 26
2021-09-14 16:15:13.547656: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62ea00 of size 2048 by op load/layer_2_type_0/bias action_count 27 step 0 next 27
2021-09-14 16:15:13.547662: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62f200 of size 460800 by op load/layer_2_type_0/matrix action_count 28 step 0 next 28
2021-09-14 16:15:13.547669: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c69fa00 of size 2048 by op load/layer_1_type_0/idt action_count 29 step 0 next 29
2021-09-14 16:15:13.547676: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a0200 of size 2048 by op load/layer_1_type_0/bias action_count 30 step 0 next 30
2021-09-14 16:15:13.547683: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a0a00 of size 2048 by op load/layer_0_type_0/bias action_count 31 step 0 next 31
2021-09-14 16:15:13.547689: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a1200 of size 460800 by op load/layer_1_type_0/matrix action_count 32 step 0 next 32
2021-09-14 16:15:13.547696: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c711a00 of size 2304000 by op load/layer_0_type_0/matrix action_count 33 step 0 next 33
2021-09-14 16:15:13.547703: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c944200 of size 256 by op load/filter_type_all/mul/y action_count 34 step 0 next 34
2021-09-14 16:15:13.547710: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c944300 of size 40192 by op load/filter_type_all/matrix_3_2 action_count 35 step 0 next 35
2021-09-14 16:15:13.547718: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c94e000 of size 1024 by op load/filter_type_all/bias_3_2 action_count 36 step 0 next 36
2021-09-14 16:15:13.547725: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c94e400 of size 10240 by op load/filter_type_all/matrix_2_2 action_count 37 step 0 next 37
2021-09-14 16:15:13.547732: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950c00 of size 512 by op load/filter_type_all/bias_2_2 action_count 38 step 0 next 38
2021-09-14 16:15:13.547739: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950e00 of size 256 by op load/filter_type_all/matrix_1_2 action_count 39 step 0 next 39
2021-09-14 16:15:13.547748: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950f00 of size 256 by op load/filter_type_all/bias_1_2 action_count 40 step 0 next 40
2021-09-14 16:15:13.547755: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c951000 of size 40192 by op load/filter_type_all/matrix_3_1 action_count 41 step 0 next 41
2021-09-14 16:15:13.547762: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95ad00 of size 1024 by op load/filter_type_all/bias_3_1 action_count 42 step 0 next 42
2021-09-14 16:15:13.547769: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95b100 of size 10240 by op load/filter_type_all/matrix_2_1 action_count 43 step 0 next 43
2021-09-14 16:15:13.547776: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95d900 of size 512 by op load/filter_type_all/bias_2_1 action_count 44 step 0 next 44
2021-09-14 16:15:13.547783: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95db00 of size 256 by op load/filter_type_all/matrix_1_1 action_count 45 step 0 next 45
2021-09-14 16:15:13.547790: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95dc00 of size 256 by op load/filter_type_all/bias_1_1 action_count 46 step 0 next 46
2021-09-14 16:15:13.547797: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95dd00 of size 40192 by op load/filter_type_all/matrix_3_0 action_count 47 step 0 next 47
2021-09-14 16:15:13.547804: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967a00 of size 1024 by op load/filter_type_all/bias_3_0 action_count 48 step 0 next 48
2021-09-14 16:15:13.547811: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967e00 of size 256 by op load/filter_type_all/bias_1_0 action_count 49 step 0 next 49
2021-09-14 16:15:13.547817: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967f00 of size 10240 by op load/filter_type_all/matrix_2_0 action_count 50 step 0 next 50
2021-09-14 16:15:13.547824: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96a700 of size 512 by op load/filter_type_all/bias_2_0 action_count 51 step 0 next 51
2021-09-14 16:15:13.547831: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96a900 of size 256 by op load/filter_type_all/matrix_1_0 action_count 52 step 0 next 52
2021-09-14 16:15:13.547838: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96aa00 of size 220928 by op load/descrpt_attr/t_avg action_count 53 step 0 next 53
2021-09-14 16:15:13.547846: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c9a0900 of size 220928 by op load/descrpt_attr/t_std action_count 54 step 0 next 54
2021-09-14 16:15:13.547852: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af09c9d6800 of size 235617280 by op UNUSED action_count 124269 step 238 next 86
2021-09-14 16:15:13.547860: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0aaa8a400 of size 706560000 by op load/ProdEnvMatA action_count 122455 step 238 next 78
2021-09-14 16:15:13.547867: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0d4c5e400 of size 176640000 by op load/ProdEnvMatA action_count 122456 step 238 next 97
2021-09-14 16:15:13.547874: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0df4d3400 of size 29440000 by op load/ProdEnvMatA action_count 122457 step 238 next 58
2021-09-14 16:15:13.547881: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0e10e6c00 of size 112640000 by op load/filter_type_all/Slice_4 action_count 124261 step 238 next 68
2021-09-14 16:15:13.547888: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0e7c52c00 of size 71680000 by op load/filter_type_all/Slice_2 action_count 124262 step 238 next 82
2021-09-14 16:15:13.547896: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0ec0aec00 of size 51200000 by op load/filter_type_all/Slice action_count 124263 step 238 next 60
2021-09-14 16:15:13.547906: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0ef182c00 of size 704000000 by op load/filter_type_all/MatMul_8 action_count 124268 step 238 next 83
2021-09-14 16:15:13.547913: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af1190e5c00 of size 448000000 by op load/filter_type_all/MatMul_4 action_count 124270 step 238 next 65
2021-09-14 16:15:13.547919: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af133c24c00 of size 320000000 by op UNUSED action_count 124275 step 238 next 63
2021-09-14 16:15:13.547926: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af146d51c00 of size 320000000 by op load/filter_type_all/add action_count 124274 step 238 next 95
2021-09-14 16:15:13.547934: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af159e7ec00 of size 1408000000 by op load/filter_type_all/MatMul_9 action_count 124276 step 238 next 89
2021-09-14 16:15:13.547941: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af1add44c00 of size 1408000000 by op load/filter_type_all/concat_4 action_count 124277 step 238 next 93
2021-09-14 16:15:13.547948: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af201c0ac00 of size 896000000 by op load/filter_type_all/MatMul_5 action_count 124278 step 238 next 94
2021-09-14 16:15:13.547955: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af237288c00 of size 896000000 by op load/filter_type_all/concat_2 action_count 124279 step 238 next 59
2021-09-14 16:15:13.547962: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af26c906c00 of size 640000000 by op load/filter_type_all/concat action_count 124280 step 238 next 92
2021-09-14 16:15:13.547969: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af292b60c00 of size 640000000 by op load/filter_type_all/MatMul_1 action_count 124281 step 238 next 73
2021-09-14 16:15:13.547976: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af2b8dbac00 of size 2816000000 by op load/filter_type_all/MatMul_10 action_count 124282 step 238 next 74
2021-09-14 16:15:13.547983: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af360b46c00 of size 2554958848 by op UNUSED action_count 122413 step 237 next 18446744073709551615
2021-09-14 16:15:13.547989: I tensorflow/core/common_runtime/bfc_allocator.cc:1051]      Summary of in-use Chunks by size: 
2021-09-14 16:15:13.547997: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 12 Chunks of size 256 totalling 3.0KiB
2021-09-14 16:15:13.548005: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 512 totalling 1.5KiB
2021-09-14 16:15:13.548011: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 1024 totalling 3.0KiB
2021-09-14 16:15:13.548018: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1280 totalling 1.2KiB
2021-09-14 16:15:13.548025: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 18 Chunks of size 2048 totalling 36.0KiB
2021-09-14 16:15:13.548033: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 10240 totalling 30.0KiB
2021-09-14 16:15:13.548040: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 40192 totalling 117.8KiB
2021-09-14 16:15:13.548047: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 220928 totalling 431.5KiB
2021-09-14 16:15:13.548054: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 6 Chunks of size 460800 totalling 2.64MiB
2021-09-14 16:15:13.548060: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 2304000 totalling 6.59MiB
2021-09-14 16:15:13.548067: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 29440000 totalling 28.08MiB
2021-09-14 16:15:13.548075: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 51200000 totalling 48.83MiB
2021-09-14 16:15:13.548082: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 71680000 totalling 68.36MiB
2021-09-14 16:15:13.548092: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 112640000 totalling 107.42MiB
2021-09-14 16:15:13.548099: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 176640000 totalling 168.46MiB
2021-09-14 16:15:13.548106: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 320000000 totalling 305.18MiB
2021-09-14 16:15:13.548113: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 448000000 totalling 427.25MiB
2021-09-14 16:15:13.548120: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 640000000 totalling 1.19GiB
2021-09-14 16:15:13.548127: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 704000000 totalling 671.39MiB
2021-09-14 16:15:13.548134: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 706560000 totalling 673.83MiB
2021-09-14 16:15:13.548141: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 896000000 totalling 1.67GiB
2021-09-14 16:15:13.548147: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 1408000000 totalling 2.62GiB
2021-09-14 16:15:13.548154: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2816000000 totalling 2.62GiB
2021-09-14 16:15:13.548161: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 10.56GiB
2021-09-14 16:15:13.548168: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 14445051904 memory_limit_: 14445051904 available bytes: 0 curr_region_allocation_bytes_: 28890103808
2021-09-14 16:15:13.548180: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
Limit:                     14445051904
InUse:                     11334475776
MaxInUse:                  11334475776
NumAllocs:                       62176
MaxAllocSize:               2816000000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-09-14 16:15:13.548192: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ******************_****************************************************************_________________
2021-09-14 16:15:13.548241: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at concat_op.cc:158 : Resource exhausted: OOM when allocating tensor with shape[3520000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2021-09-14 16:15:23.548583: W tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.67GiB (rounded to 1792000000)requested by op load/filter_type_all/concat_3
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2021-09-14 16:15:23.548628: I tensorflow/core/common_runtime/bfc_allocator.cc:991] BFCAllocator dump for GPU_0_bfc
2021-09-14 16:15:23.548641: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (256): 	Total Chunks: 12, Chunks in use: 12. 3.0KiB allocated for chunks. 3.0KiB in use in bin. 1.2KiB client-requested in use in bin.
2021-09-14 16:15:23.548650: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (512): 	Total Chunks: 3, Chunks in use: 3. 1.5KiB allocated for chunks. 1.5KiB in use in bin. 1.2KiB client-requested in use in bin.
2021-09-14 16:15:23.548659: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1024): 	Total Chunks: 4, Chunks in use: 4. 4.2KiB allocated for chunks. 4.2KiB in use in bin. 3.3KiB client-requested in use in bin.
2021-09-14 16:15:23.548669: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2048): 	Total Chunks: 18, Chunks in use: 18. 36.0KiB allocated for chunks. 36.0KiB in use in bin. 33.8KiB client-requested in use in bin.
2021-09-14 16:15:23.548677: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548691: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8192): 	Total Chunks: 3, Chunks in use: 3. 30.0KiB allocated for chunks. 30.0KiB in use in bin. 29.3KiB client-requested in use in bin.
2021-09-14 16:15:23.548699: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16384): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548708: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (32768): 	Total Chunks: 3, Chunks in use: 3. 117.8KiB allocated for chunks. 117.8KiB in use in bin. 117.2KiB client-requested in use in bin.
2021-09-14 16:15:23.548716: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (65536): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548725: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (131072): 	Total Chunks: 2, Chunks in use: 2. 431.5KiB allocated for chunks. 431.5KiB in use in bin. 431.2KiB client-requested in use in bin.
2021-09-14 16:15:23.548734: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (262144): 	Total Chunks: 6, Chunks in use: 6. 2.64MiB allocated for chunks. 2.64MiB in use in bin. 2.64MiB client-requested in use in bin.
2021-09-14 16:15:23.548742: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548749: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548758: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2097152): 	Total Chunks: 3, Chunks in use: 3. 6.59MiB allocated for chunks. 6.59MiB in use in bin. 6.59MiB client-requested in use in bin.
2021-09-14 16:15:23.548765: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548773: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:23.548787: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16777216): 	Total Chunks: 1, Chunks in use: 1. 28.08MiB allocated for chunks. 28.08MiB in use in bin. 28.08MiB client-requested in use in bin.
2021-09-14 16:15:23.548798: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (33554432): 	Total Chunks: 1, Chunks in use: 1. 48.83MiB allocated for chunks. 48.83MiB in use in bin. 48.83MiB client-requested in use in bin.
2021-09-14 16:15:23.548807: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (67108864): 	Total Chunks: 2, Chunks in use: 2. 175.78MiB allocated for chunks. 175.78MiB in use in bin. 175.78MiB client-requested in use in bin.
2021-09-14 16:15:23.548816: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (134217728): 	Total Chunks: 2, Chunks in use: 1. 393.16MiB allocated for chunks. 168.46MiB in use in bin. 168.46MiB client-requested in use in bin.
2021-09-14 16:15:23.548825: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (268435456): 	Total Chunks: 14, Chunks in use: 11. 12.81GiB allocated for chunks. 10.49GiB in use in bin. 10.49GiB client-requested in use in bin.
2021-09-14 16:15:23.548834: I tensorflow/core/common_runtime/bfc_allocator.cc:1014] Bin for 1.67GiB was 256.00MiB, Chunk State: 
2021-09-14 16:15:23.548853: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 305.18MiB | Requested Size: 305.18MiB | in_use: 0 | bin_num: 20, prev:   Size: 427.25MiB | Requested Size: 427.25MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_4, stepid: 238, last_action: 124270, next:   Size: 305.18MiB | Requested Size: 305.18MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/add, stepid: 238, last_action: 124274, for: UNUSED, stepid: 238, last_action: 124275
2021-09-14 16:15:23.548867: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 727.61MiB | Requested Size: 42.72MiB | in_use: 0 | bin_num: 20, prev:   Size: 1.67GiB | Requested Size: 1.67GiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_6, stepid: 238, last_action: 124284, for: UNUSED, stepid: 237, last_action: 122412
2021-09-14 16:15:23.548879: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 1.31GiB | Requested Size: 1.31GiB | in_use: 0 | bin_num: 20, prev:   Size: 1.31GiB | Requested Size: 1.31GiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_9, stepid: 238, last_action: 124276, next:   Size: 854.49MiB | Requested Size: 854.49MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_5, stepid: 238, last_action: 124278, for: UNUSED, stepid: 238, last_action: 124283
2021-09-14 16:15:23.548887: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 14445051904
2021-09-14 16:15:23.548898: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000000 of size 256 by op load/descrpt_attr/rcut action_count 1 step 0 next 1
2021-09-14 16:15:23.548906: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000100 of size 1280 by op ScratchBuffer action_count 2 step 0 next 2
2021-09-14 16:15:23.548913: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000600 of size 256 by op load/gradients/grad_ys_0 action_count 3 step 0 next 3
2021-09-14 16:15:23.548921: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000700 of size 2304000 by op load/layer_0_type_2/matrix action_count 4 step 0 next 4
2021-09-14 16:15:23.548928: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c232f00 of size 2048 by op load/layer_0_type_2/bias action_count 5 step 0 next 5
2021-09-14 16:15:23.548936: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c233700 of size 460800 by op load/layer_1_type_2/matrix action_count 6 step 0 next 6
2021-09-14 16:15:23.548943: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a3f00 of size 2048 by op load/layer_1_type_2/bias action_count 7 step 0 next 7
2021-09-14 16:15:23.548949: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a4700 of size 2048 by op load/layer_1_type_2/idt action_count 8 step 0 next 8
2021-09-14 16:15:23.548956: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a4f00 of size 460800 by op load/layer_2_type_2/matrix action_count 9 step 0 next 9
2021-09-14 16:15:23.548963: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c315700 of size 2048 by op load/layer_2_type_2/bias action_count 10 step 0 next 10
2021-09-14 16:15:23.548970: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c315f00 of size 2048 by op load/layer_2_type_2/idt action_count 11 step 0 next 11
2021-09-14 16:15:23.548977: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c316700 of size 2048 by op load/final_layer_type_2/matrix action_count 12 step 0 next 12
2021-09-14 16:15:23.548984: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c316f00 of size 256 by op load/final_layer_type_2/bias action_count 13 step 0 next 13
2021-09-14 16:15:23.548991: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c317000 of size 2304000 by op load/layer_0_type_1/matrix action_count 14 step 0 next 14
2021-09-14 16:15:23.548998: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c549800 of size 2048 by op load/layer_0_type_1/bias action_count 15 step 0 next 15
2021-09-14 16:15:23.549005: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c54a000 of size 460800 by op load/layer_1_type_1/matrix action_count 16 step 0 next 16
2021-09-14 16:15:23.549012: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5ba800 of size 2048 by op load/layer_1_type_1/bias action_count 17 step 0 next 17
2021-09-14 16:15:23.549021: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5bb000 of size 2048 by op load/layer_1_type_1/idt action_count 18 step 0 next 18
2021-09-14 16:15:23.549028: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5bb800 of size 460800 by op load/layer_2_type_1/matrix action_count 19 step 0 next 19
2021-09-14 16:15:23.549035: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62c000 of size 2048 by op load/layer_2_type_1/bias action_count 20 step 0 next 20
2021-09-14 16:15:23.549042: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62c800 of size 2048 by op load/layer_2_type_1/idt action_count 21 step 0 next 21
2021-09-14 16:15:23.549048: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d000 of size 2048 by op load/final_layer_type_1/matrix action_count 22 step 0 next 22
2021-09-14 16:15:23.549055: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d800 of size 256 by op load/final_layer_type_1/bias action_count 23 step 0 next 23
2021-09-14 16:15:23.549063: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d900 of size 256 by op load/final_layer_type_0/bias action_count 24 step 0 next 24
2021-09-14 16:15:23.549069: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62da00 of size 2048 by op load/final_layer_type_0/matrix action_count 25 step 0 next 25
2021-09-14 16:15:23.549076: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62e200 of size 2048 by op load/layer_2_type_0/idt action_count 26 step 0 next 26
2021-09-14 16:15:23.549083: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62ea00 of size 2048 by op load/layer_2_type_0/bias action_count 27 step 0 next 27
2021-09-14 16:15:23.549090: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62f200 of size 460800 by op load/layer_2_type_0/matrix action_count 28 step 0 next 28
2021-09-14 16:15:23.549100: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c69fa00 of size 2048 by op load/layer_1_type_0/idt action_count 29 step 0 next 29
2021-09-14 16:15:23.549110: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a0200 of size 2048 by op load/layer_1_type_0/bias action_count 30 step 0 next 30
2021-09-14 16:15:23.549117: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a0a00 of size 2048 by op load/layer_0_type_0/bias action_count 31 step 0 next 31
2021-09-14 16:15:23.549124: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a1200 of size 460800 by op load/layer_1_type_0/matrix action_count 32 step 0 next 32
2021-09-14 16:15:23.549131: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c711a00 of size 2304000 by op load/layer_0_type_0/matrix action_count 33 step 0 next 33
2021-09-14 16:15:23.549138: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c944200 of size 256 by op load/filter_type_all/mul/y action_count 34 step 0 next 34
2021-09-14 16:15:23.549145: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c944300 of size 40192 by op load/filter_type_all/matrix_3_2 action_count 35 step 0 next 35
2021-09-14 16:15:23.549152: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c94e000 of size 1024 by op load/filter_type_all/bias_3_2 action_count 36 step 0 next 36
2021-09-14 16:15:23.549159: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c94e400 of size 10240 by op load/filter_type_all/matrix_2_2 action_count 37 step 0 next 37
2021-09-14 16:15:23.549166: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950c00 of size 512 by op load/filter_type_all/bias_2_2 action_count 38 step 0 next 38
2021-09-14 16:15:23.549174: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950e00 of size 256 by op load/filter_type_all/matrix_1_2 action_count 39 step 0 next 39
2021-09-14 16:15:23.549183: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950f00 of size 256 by op load/filter_type_all/bias_1_2 action_count 40 step 0 next 40
2021-09-14 16:15:23.549190: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c951000 of size 40192 by op load/filter_type_all/matrix_3_1 action_count 41 step 0 next 41
2021-09-14 16:15:23.549197: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95ad00 of size 1024 by op load/filter_type_all/bias_3_1 action_count 42 step 0 next 42
2021-09-14 16:15:23.549204: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95b100 of size 10240 by op load/filter_type_all/matrix_2_1 action_count 43 step 0 next 43
2021-09-14 16:15:23.549211: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95d900 of size 512 by op load/filter_type_all/bias_2_1 action_count 44 step 0 next 44
2021-09-14 16:15:23.549218: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95db00 of size 256 by op load/filter_type_all/matrix_1_1 action_count 45 step 0 next 45
2021-09-14 16:15:23.549225: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95dc00 of size 256 by op load/filter_type_all/bias_1_1 action_count 46 step 0 next 46
2021-09-14 16:15:23.549232: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95dd00 of size 40192 by op load/filter_type_all/matrix_3_0 action_count 47 step 0 next 47
2021-09-14 16:15:23.549239: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967a00 of size 1024 by op load/filter_type_all/bias_3_0 action_count 48 step 0 next 48
2021-09-14 16:15:23.549245: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967e00 of size 256 by op load/filter_type_all/bias_1_0 action_count 49 step 0 next 49
2021-09-14 16:15:23.549252: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967f00 of size 10240 by op load/filter_type_all/matrix_2_0 action_count 50 step 0 next 50
2021-09-14 16:15:23.549259: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96a700 of size 512 by op load/filter_type_all/bias_2_0 action_count 51 step 0 next 51
2021-09-14 16:15:23.549266: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96a900 of size 256 by op load/filter_type_all/matrix_1_0 action_count 52 step 0 next 52
2021-09-14 16:15:23.549273: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96aa00 of size 220928 by op load/descrpt_attr/t_avg action_count 53 step 0 next 53
2021-09-14 16:15:23.549280: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c9a0900 of size 220928 by op load/descrpt_attr/t_std action_count 54 step 0 next 54
2021-09-14 16:15:23.549287: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af09c9d6800 of size 235617280 by op UNUSED action_count 124269 step 238 next 86
2021-09-14 16:15:23.549294: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0aaa8a400 of size 706560000 by op load/ProdEnvMatA action_count 122455 step 238 next 78
2021-09-14 16:15:23.549301: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0d4c5e400 of size 176640000 by op load/ProdEnvMatA action_count 122456 step 238 next 97
2021-09-14 16:15:23.549308: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0df4d3400 of size 29440000 by op load/ProdEnvMatA action_count 122457 step 238 next 58
2021-09-14 16:15:23.549315: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0e10e6c00 of size 112640000 by op load/filter_type_all/Slice_4 action_count 124261 step 238 next 68
2021-09-14 16:15:23.549322: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0e7c52c00 of size 71680000 by op load/filter_type_all/Slice_2 action_count 124262 step 238 next 82
2021-09-14 16:15:23.549330: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0ec0aec00 of size 51200000 by op load/filter_type_all/Slice action_count 124263 step 238 next 60
2021-09-14 16:15:23.549339: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0ef182c00 of size 704000000 by op load/filter_type_all/MatMul_8 action_count 124268 step 238 next 83
2021-09-14 16:15:23.549358: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af1190e5c00 of size 448000000 by op load/filter_type_all/MatMul_4 action_count 124270 step 238 next 65
2021-09-14 16:15:23.549365: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af133c24c00 of size 320000000 by op UNUSED action_count 124275 step 238 next 63
2021-09-14 16:15:23.549372: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af146d51c00 of size 320000000 by op load/filter_type_all/add action_count 124274 step 238 next 95
2021-09-14 16:15:23.549379: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af159e7ec00 of size 1408000000 by op load/filter_type_all/MatMul_9 action_count 124276 step 238 next 89
2021-09-14 16:15:23.549386: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af1add44c00 of size 1408000000 by op UNUSED action_count 124283 step 238 next 93
2021-09-14 16:15:23.549393: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af201c0ac00 of size 896000000 by op load/filter_type_all/MatMul_5 action_count 124278 step 238 next 94
2021-09-14 16:15:23.549400: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af237288c00 of size 896000000 by op load/filter_type_all/concat_2 action_count 124279 step 238 next 59
2021-09-14 16:15:23.549407: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af26c906c00 of size 640000000 by op load/filter_type_all/concat action_count 124280 step 238 next 92
2021-09-14 16:15:23.549414: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af292b60c00 of size 640000000 by op load/filter_type_all/MatMul_1 action_count 124281 step 238 next 73
2021-09-14 16:15:23.549421: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af2b8dbac00 of size 2816000000 by op load/filter_type_all/MatMul_10 action_count 124282 step 238 next 74
2021-09-14 16:15:23.549428: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af360b46c00 of size 1792000000 by op load/filter_type_all/MatMul_6 action_count 124284 step 238 next 69
2021-09-14 16:15:23.549435: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af3cb842c00 of size 762958848 by op UNUSED action_count 122412 step 237 next 18446744073709551615
2021-09-14 16:15:23.549442: I tensorflow/core/common_runtime/bfc_allocator.cc:1051]      Summary of in-use Chunks by size: 
2021-09-14 16:15:23.549450: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 12 Chunks of size 256 totalling 3.0KiB
2021-09-14 16:15:23.549457: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 512 totalling 1.5KiB
2021-09-14 16:15:23.549464: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 1024 totalling 3.0KiB
2021-09-14 16:15:23.549471: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1280 totalling 1.2KiB
2021-09-14 16:15:23.549478: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 18 Chunks of size 2048 totalling 36.0KiB
2021-09-14 16:15:23.549485: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 10240 totalling 30.0KiB
2021-09-14 16:15:23.549493: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 40192 totalling 117.8KiB
2021-09-14 16:15:23.549500: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 220928 totalling 431.5KiB
2021-09-14 16:15:23.549507: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 6 Chunks of size 460800 totalling 2.64MiB
2021-09-14 16:15:23.549513: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 2304000 totalling 6.59MiB
2021-09-14 16:15:23.549521: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 29440000 totalling 28.08MiB
2021-09-14 16:15:23.549531: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 51200000 totalling 48.83MiB
2021-09-14 16:15:23.549539: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 71680000 totalling 68.36MiB
2021-09-14 16:15:23.549546: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 112640000 totalling 107.42MiB
2021-09-14 16:15:23.549553: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 176640000 totalling 168.46MiB
2021-09-14 16:15:23.549560: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 320000000 totalling 305.18MiB
2021-09-14 16:15:23.549567: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 448000000 totalling 427.25MiB
2021-09-14 16:15:23.549574: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 640000000 totalling 1.19GiB
2021-09-14 16:15:23.549581: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 704000000 totalling 671.39MiB
2021-09-14 16:15:23.549588: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 706560000 totalling 673.83MiB
2021-09-14 16:15:23.549595: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 896000000 totalling 1.67GiB
2021-09-14 16:15:23.549602: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1408000000 totalling 1.31GiB
2021-09-14 16:15:23.549609: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1792000000 totalling 1.67GiB
2021-09-14 16:15:23.549616: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2816000000 totalling 2.62GiB
2021-09-14 16:15:23.549623: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 10.91GiB
2021-09-14 16:15:23.549630: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 14445051904 memory_limit_: 14445051904 available bytes: 0 curr_region_allocation_bytes_: 28890103808
2021-09-14 16:15:23.549641: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
Limit:                     14445051904
InUse:                     11718475776
MaxInUse:                  11718475776
NumAllocs:                       62177
MaxAllocSize:               2816000000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-09-14 16:15:23.549655: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ******************_*************_________******************************************************_____
2021-09-14 16:15:23.549694: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at concat_op.cc:158 : Resource exhausted: OOM when allocating tensor with shape[2240000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
2021-09-14 16:15:33.549958: W tensorflow/core/common_runtime/bfc_allocator.cc:456] Allocator (GPU_0_bfc) ran out of memory trying to allocate 1.19GiB (rounded to 1280000000)requested by op load/filter_type_all/MatMul_2
If the cause is memory fragmentation maybe the environment variable 'TF_GPU_ALLOCATOR=cuda_malloc_async' will improve the situation. 
Current allocation summary follows.
Current allocation summary follows.
2021-09-14 16:15:33.549998: I tensorflow/core/common_runtime/bfc_allocator.cc:991] BFCAllocator dump for GPU_0_bfc
2021-09-14 16:15:33.550011: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (256): 	Total Chunks: 12, Chunks in use: 12. 3.0KiB allocated for chunks. 3.0KiB in use in bin. 1.2KiB client-requested in use in bin.
2021-09-14 16:15:33.550020: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (512): 	Total Chunks: 3, Chunks in use: 3. 1.5KiB allocated for chunks. 1.5KiB in use in bin. 1.2KiB client-requested in use in bin.
2021-09-14 16:15:33.550029: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1024): 	Total Chunks: 4, Chunks in use: 4. 4.2KiB allocated for chunks. 4.2KiB in use in bin. 3.3KiB client-requested in use in bin.
2021-09-14 16:15:33.550042: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2048): 	Total Chunks: 18, Chunks in use: 18. 36.0KiB allocated for chunks. 36.0KiB in use in bin. 33.8KiB client-requested in use in bin.
2021-09-14 16:15:33.550051: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4096): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550060: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8192): 	Total Chunks: 3, Chunks in use: 3. 30.0KiB allocated for chunks. 30.0KiB in use in bin. 29.3KiB client-requested in use in bin.
2021-09-14 16:15:33.550068: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16384): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550078: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (32768): 	Total Chunks: 3, Chunks in use: 3. 117.8KiB allocated for chunks. 117.8KiB in use in bin. 117.2KiB client-requested in use in bin.
2021-09-14 16:15:33.550086: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (65536): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550095: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (131072): 	Total Chunks: 2, Chunks in use: 2. 431.5KiB allocated for chunks. 431.5KiB in use in bin. 431.2KiB client-requested in use in bin.
2021-09-14 16:15:33.550104: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (262144): 	Total Chunks: 6, Chunks in use: 6. 2.64MiB allocated for chunks. 2.64MiB in use in bin. 2.64MiB client-requested in use in bin.
2021-09-14 16:15:33.550111: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (524288): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550119: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (1048576): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550127: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (2097152): 	Total Chunks: 3, Chunks in use: 3. 6.59MiB allocated for chunks. 6.59MiB in use in bin. 6.59MiB client-requested in use in bin.
2021-09-14 16:15:33.550135: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (4194304): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550143: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (8388608): 	Total Chunks: 0, Chunks in use: 0. 0B allocated for chunks. 0B in use in bin. 0B client-requested in use in bin.
2021-09-14 16:15:33.550152: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (16777216): 	Total Chunks: 1, Chunks in use: 1. 28.08MiB allocated for chunks. 28.08MiB in use in bin. 28.08MiB client-requested in use in bin.
2021-09-14 16:15:33.550165: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (33554432): 	Total Chunks: 1, Chunks in use: 1. 48.83MiB allocated for chunks. 48.83MiB in use in bin. 48.83MiB client-requested in use in bin.
2021-09-14 16:15:33.550176: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (67108864): 	Total Chunks: 2, Chunks in use: 2. 175.78MiB allocated for chunks. 175.78MiB in use in bin. 175.78MiB client-requested in use in bin.
2021-09-14 16:15:33.550185: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (134217728): 	Total Chunks: 2, Chunks in use: 1. 393.16MiB allocated for chunks. 168.46MiB in use in bin. 168.46MiB client-requested in use in bin.
2021-09-14 16:15:33.550194: I tensorflow/core/common_runtime/bfc_allocator.cc:998] Bin (268435456): 	Total Chunks: 14, Chunks in use: 11. 12.81GiB allocated for chunks. 10.97GiB in use in bin. 10.85GiB client-requested in use in bin.
2021-09-14 16:15:33.550203: I tensorflow/core/common_runtime/bfc_allocator.cc:1014] Bin for 1.19GiB was 256.00MiB, Chunk State: 
2021-09-14 16:15:33.550222: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 305.18MiB | Requested Size: 305.18MiB | in_use: 0 | bin_num: 20, prev:   Size: 427.25MiB | Requested Size: 427.25MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_4, stepid: 238, last_action: 124270, next:   Size: 305.18MiB | Requested Size: 305.18MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/add, stepid: 238, last_action: 124274, for: UNUSED, stepid: 238, last_action: 124275
2021-09-14 16:15:33.550234: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 727.61MiB | Requested Size: 42.72MiB | in_use: 0 | bin_num: 20, prev:   Size: 1.67GiB | Requested Size: 1.67GiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_6, stepid: 238, last_action: 124284, for: UNUSED, stepid: 237, last_action: 122412
2021-09-14 16:15:33.550248: I tensorflow/core/common_runtime/bfc_allocator.cc:1020]   Size: 854.49MiB | Requested Size: 854.49MiB | in_use: 0 | bin_num: 20, prev:   Size: 854.49MiB | Requested Size: 854.49MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/MatMul_5, stepid: 238, last_action: 124278, next:   Size: 610.35MiB | Requested Size: 610.35MiB | in_use: 1 | bin_num: -1, for: load/filter_type_all/concat, stepid: 238, last_action: 124280, for: UNUSED, stepid: 238, last_action: 124285
2021-09-14 16:15:33.550255: I tensorflow/core/common_runtime/bfc_allocator.cc:1027] Next region of size 14445051904
2021-09-14 16:15:33.550267: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000000 of size 256 by op load/descrpt_attr/rcut action_count 1 step 0 next 1
2021-09-14 16:15:33.550274: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000100 of size 1280 by op ScratchBuffer action_count 2 step 0 next 2
2021-09-14 16:15:33.550281: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000600 of size 256 by op load/gradients/grad_ys_0 action_count 3 step 0 next 3
2021-09-14 16:15:33.550294: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c000700 of size 2304000 by op load/layer_0_type_2/matrix action_count 4 step 0 next 4
2021-09-14 16:15:33.550302: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c232f00 of size 2048 by op load/layer_0_type_2/bias action_count 5 step 0 next 5
2021-09-14 16:15:33.550309: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c233700 of size 460800 by op load/layer_1_type_2/matrix action_count 6 step 0 next 6
2021-09-14 16:15:33.550317: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a3f00 of size 2048 by op load/layer_1_type_2/bias action_count 7 step 0 next 7
2021-09-14 16:15:33.550324: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a4700 of size 2048 by op load/layer_1_type_2/idt action_count 8 step 0 next 8
2021-09-14 16:15:33.550330: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c2a4f00 of size 460800 by op load/layer_2_type_2/matrix action_count 9 step 0 next 9
2021-09-14 16:15:33.550337: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c315700 of size 2048 by op load/layer_2_type_2/bias action_count 10 step 0 next 10
2021-09-14 16:15:33.550353: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c315f00 of size 2048 by op load/layer_2_type_2/idt action_count 11 step 0 next 11
2021-09-14 16:15:33.550361: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c316700 of size 2048 by op load/final_layer_type_2/matrix action_count 12 step 0 next 12
2021-09-14 16:15:33.550368: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c316f00 of size 256 by op load/final_layer_type_2/bias action_count 13 step 0 next 13
2021-09-14 16:15:33.550375: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c317000 of size 2304000 by op load/layer_0_type_1/matrix action_count 14 step 0 next 14
2021-09-14 16:15:33.550382: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c549800 of size 2048 by op load/layer_0_type_1/bias action_count 15 step 0 next 15
2021-09-14 16:15:33.550391: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c54a000 of size 460800 by op load/layer_1_type_1/matrix action_count 16 step 0 next 16
2021-09-14 16:15:33.550398: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5ba800 of size 2048 by op load/layer_1_type_1/bias action_count 17 step 0 next 17
2021-09-14 16:15:33.550405: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5bb000 of size 2048 by op load/layer_1_type_1/idt action_count 18 step 0 next 18
2021-09-14 16:15:33.550412: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c5bb800 of size 460800 by op load/layer_2_type_1/matrix action_count 19 step 0 next 19
2021-09-14 16:15:33.550419: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62c000 of size 2048 by op load/layer_2_type_1/bias action_count 20 step 0 next 20
2021-09-14 16:15:33.550426: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62c800 of size 2048 by op load/layer_2_type_1/idt action_count 21 step 0 next 21
2021-09-14 16:15:33.550433: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d000 of size 2048 by op load/final_layer_type_1/matrix action_count 22 step 0 next 22
2021-09-14 16:15:33.550440: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d800 of size 256 by op load/final_layer_type_1/bias action_count 23 step 0 next 23
2021-09-14 16:15:33.550447: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62d900 of size 256 by op load/final_layer_type_0/bias action_count 24 step 0 next 24
2021-09-14 16:15:33.550454: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62da00 of size 2048 by op load/final_layer_type_0/matrix action_count 25 step 0 next 25
2021-09-14 16:15:33.550461: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62e200 of size 2048 by op load/layer_2_type_0/idt action_count 26 step 0 next 26
2021-09-14 16:15:33.550468: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62ea00 of size 2048 by op load/layer_2_type_0/bias action_count 27 step 0 next 27
2021-09-14 16:15:33.550475: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c62f200 of size 460800 by op load/layer_2_type_0/matrix action_count 28 step 0 next 28
2021-09-14 16:15:33.550482: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c69fa00 of size 2048 by op load/layer_1_type_0/idt action_count 29 step 0 next 29
2021-09-14 16:15:33.550488: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a0200 of size 2048 by op load/layer_1_type_0/bias action_count 30 step 0 next 30
2021-09-14 16:15:33.550495: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a0a00 of size 2048 by op load/layer_0_type_0/bias action_count 31 step 0 next 31
2021-09-14 16:15:33.550502: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c6a1200 of size 460800 by op load/layer_1_type_0/matrix action_count 32 step 0 next 32
2021-09-14 16:15:33.550509: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c711a00 of size 2304000 by op load/layer_0_type_0/matrix action_count 33 step 0 next 33
2021-09-14 16:15:33.550516: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c944200 of size 256 by op load/filter_type_all/mul/y action_count 34 step 0 next 34
2021-09-14 16:15:33.550523: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c944300 of size 40192 by op load/filter_type_all/matrix_3_2 action_count 35 step 0 next 35
2021-09-14 16:15:33.550530: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c94e000 of size 1024 by op load/filter_type_all/bias_3_2 action_count 36 step 0 next 36
2021-09-14 16:15:33.550537: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c94e400 of size 10240 by op load/filter_type_all/matrix_2_2 action_count 37 step 0 next 37
2021-09-14 16:15:33.550547: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950c00 of size 512 by op load/filter_type_all/bias_2_2 action_count 38 step 0 next 38
2021-09-14 16:15:33.550554: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950e00 of size 256 by op load/filter_type_all/matrix_1_2 action_count 39 step 0 next 39
2021-09-14 16:15:33.550561: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c950f00 of size 256 by op load/filter_type_all/bias_1_2 action_count 40 step 0 next 40
2021-09-14 16:15:33.550568: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c951000 of size 40192 by op load/filter_type_all/matrix_3_1 action_count 41 step 0 next 41
2021-09-14 16:15:33.550575: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95ad00 of size 1024 by op load/filter_type_all/bias_3_1 action_count 42 step 0 next 42
2021-09-14 16:15:33.550582: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95b100 of size 10240 by op load/filter_type_all/matrix_2_1 action_count 43 step 0 next 43
2021-09-14 16:15:33.550589: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95d900 of size 512 by op load/filter_type_all/bias_2_1 action_count 44 step 0 next 44
2021-09-14 16:15:33.550596: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95db00 of size 256 by op load/filter_type_all/matrix_1_1 action_count 45 step 0 next 45
2021-09-14 16:15:33.550602: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95dc00 of size 256 by op load/filter_type_all/bias_1_1 action_count 46 step 0 next 46
2021-09-14 16:15:33.550609: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c95dd00 of size 40192 by op load/filter_type_all/matrix_3_0 action_count 47 step 0 next 47
2021-09-14 16:15:33.550616: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967a00 of size 1024 by op load/filter_type_all/bias_3_0 action_count 48 step 0 next 48
2021-09-14 16:15:33.550623: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967e00 of size 256 by op load/filter_type_all/bias_1_0 action_count 49 step 0 next 49
2021-09-14 16:15:33.550630: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c967f00 of size 10240 by op load/filter_type_all/matrix_2_0 action_count 50 step 0 next 50
2021-09-14 16:15:33.550637: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96a700 of size 512 by op load/filter_type_all/bias_2_0 action_count 51 step 0 next 51
2021-09-14 16:15:33.550643: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96a900 of size 256 by op load/filter_type_all/matrix_1_0 action_count 52 step 0 next 52
2021-09-14 16:15:33.550650: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c96aa00 of size 220928 by op load/descrpt_attr/t_avg action_count 53 step 0 next 53
2021-09-14 16:15:33.550658: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af09c9a0900 of size 220928 by op load/descrpt_attr/t_std action_count 54 step 0 next 54
2021-09-14 16:15:33.550665: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af09c9d6800 of size 235617280 by op UNUSED action_count 124269 step 238 next 86
2021-09-14 16:15:33.550672: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0aaa8a400 of size 706560000 by op load/ProdEnvMatA action_count 122455 step 238 next 78
2021-09-14 16:15:33.550679: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0d4c5e400 of size 176640000 by op load/ProdEnvMatA action_count 122456 step 238 next 97
2021-09-14 16:15:33.550686: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0df4d3400 of size 29440000 by op load/ProdEnvMatA action_count 122457 step 238 next 58
2021-09-14 16:15:33.550693: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0e10e6c00 of size 112640000 by op load/filter_type_all/Slice_4 action_count 124261 step 238 next 68
2021-09-14 16:15:33.550702: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0e7c52c00 of size 71680000 by op load/filter_type_all/Slice_2 action_count 124262 step 238 next 82
2021-09-14 16:15:33.550709: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0ec0aec00 of size 51200000 by op load/filter_type_all/Slice action_count 124263 step 238 next 60
2021-09-14 16:15:33.550717: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af0ef182c00 of size 704000000 by op load/filter_type_all/MatMul_8 action_count 124268 step 238 next 83
2021-09-14 16:15:33.550724: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af1190e5c00 of size 448000000 by op load/filter_type_all/MatMul_4 action_count 124270 step 238 next 65
2021-09-14 16:15:33.550730: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af133c24c00 of size 320000000 by op UNUSED action_count 124275 step 238 next 63
2021-09-14 16:15:33.550737: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af146d51c00 of size 320000000 by op load/filter_type_all/add action_count 124274 step 238 next 95
2021-09-14 16:15:33.550744: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af159e7ec00 of size 1408000000 by op load/filter_type_all/MatMul_9 action_count 124276 step 238 next 89
2021-09-14 16:15:33.550752: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af1add44c00 of size 1408000000 by op load/filter_type_all/concat_1 action_count 124286 step 238 next 93
2021-09-14 16:15:33.550758: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af201c0ac00 of size 896000000 by op load/filter_type_all/MatMul_5 action_count 124278 step 238 next 94
2021-09-14 16:15:33.550765: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af237288c00 of size 896000000 by op UNUSED action_count 124285 step 238 next 59
2021-09-14 16:15:33.550772: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af26c906c00 of size 640000000 by op load/filter_type_all/concat action_count 124280 step 238 next 92
2021-09-14 16:15:33.550779: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af292b60c00 of size 640000000 by op load/filter_type_all/MatMul_1 action_count 124281 step 238 next 73
2021-09-14 16:15:33.550786: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af2b8dbac00 of size 2816000000 by op load/filter_type_all/MatMul_10 action_count 124282 step 238 next 74
2021-09-14 16:15:33.550793: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] InUse at 2af360b46c00 of size 1792000000 by op load/filter_type_all/MatMul_6 action_count 124284 step 238 next 69
2021-09-14 16:15:33.550800: I tensorflow/core/common_runtime/bfc_allocator.cc:1046] Free  at 2af3cb842c00 of size 762958848 by op UNUSED action_count 122412 step 237 next 18446744073709551615
2021-09-14 16:15:33.550806: I tensorflow/core/common_runtime/bfc_allocator.cc:1051]      Summary of in-use Chunks by size: 
2021-09-14 16:15:33.550814: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 12 Chunks of size 256 totalling 3.0KiB
2021-09-14 16:15:33.550822: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 512 totalling 1.5KiB
2021-09-14 16:15:33.550829: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 1024 totalling 3.0KiB
2021-09-14 16:15:33.550836: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1280 totalling 1.2KiB
2021-09-14 16:15:33.550843: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 18 Chunks of size 2048 totalling 36.0KiB
2021-09-14 16:15:33.550850: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 10240 totalling 30.0KiB
2021-09-14 16:15:33.550857: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 40192 totalling 117.8KiB
2021-09-14 16:15:33.550864: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 220928 totalling 431.5KiB
2021-09-14 16:15:33.550871: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 6 Chunks of size 460800 totalling 2.64MiB
2021-09-14 16:15:33.550879: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 3 Chunks of size 2304000 totalling 6.59MiB
2021-09-14 16:15:33.550887: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 29440000 totalling 28.08MiB
2021-09-14 16:15:33.550894: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 51200000 totalling 48.83MiB
2021-09-14 16:15:33.550901: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 71680000 totalling 68.36MiB
2021-09-14 16:15:33.550908: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 112640000 totalling 107.42MiB
2021-09-14 16:15:33.550915: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 176640000 totalling 168.46MiB
2021-09-14 16:15:33.550922: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 320000000 totalling 305.18MiB
2021-09-14 16:15:33.550929: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 448000000 totalling 427.25MiB
2021-09-14 16:15:33.550936: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 640000000 totalling 1.19GiB
2021-09-14 16:15:33.550943: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 704000000 totalling 671.39MiB
2021-09-14 16:15:33.550950: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 706560000 totalling 673.83MiB
2021-09-14 16:15:33.550957: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 896000000 totalling 854.49MiB
2021-09-14 16:15:33.550964: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 2 Chunks of size 1408000000 totalling 2.62GiB
2021-09-14 16:15:33.550971: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 1792000000 totalling 1.67GiB
2021-09-14 16:15:33.550979: I tensorflow/core/common_runtime/bfc_allocator.cc:1054] 1 Chunks of size 2816000000 totalling 2.62GiB
2021-09-14 16:15:33.550990: I tensorflow/core/common_runtime/bfc_allocator.cc:1058] Sum Total of in-use chunks: 11.39GiB
2021-09-14 16:15:33.550999: I tensorflow/core/common_runtime/bfc_allocator.cc:1060] total_region_allocated_bytes_: 14445051904 memory_limit_: 14445051904 available bytes: 0 curr_region_allocation_bytes_: 28890103808
2021-09-14 16:15:33.551011: I tensorflow/core/common_runtime/bfc_allocator.cc:1066] Stats: 
Limit:                     14445051904
InUse:                     12230475776
MaxInUse:                  12230475776
NumAllocs:                       62178
MaxAllocSize:               2816000000
Reserved:                            0
PeakReserved:                        0
LargestFreeBlock:                    0

2021-09-14 16:15:33.551023: W tensorflow/core/common_runtime/bfc_allocator.cc:467] ******************_*****************************_____******************************************_____
2021-09-14 16:15:33.551064: W tensorflow/core/framework/op_kernel.cc:1767] OP_REQUIRES failed at matmul_op_impl.h:710 : Resource exhausted: OOM when allocating tensor with shape[1600000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
Traceback (most recent call last):
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1375, in _do_call
    return fn(*args)
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1359, in _run_fn
    return self._call_tf_sessionrun(options, feed_dict, fetch_list,
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1451, in _call_tf_sessionrun
    return tf_session.TF_SessionRun_wrapper(self._session, options, feed_dict,
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[3520000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node load/filter_type_all/concat_5}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[load/o_virial/_27]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[3520000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[{{node load/filter_type_all/concat_5}}]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 443, in main
    test(**dict_args)
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/deepmd/entrypoints/test.py", line 82, in test
    err = test_ener(
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/deepmd/entrypoints/test.py", line 229, in test_ener
    ret = dp.eval(
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/deepmd/infer/deep_pot.py", line 231, in eval
    e, f, v = self._eval_inner(coords, cells, atom_types, fparam = fparam, aparam = aparam, atomic = atomic, efield = efield)
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/deepmd/infer/deep_pot.py", line 326, in _eval_inner
    v_out = self.sess.run (t_out, feed_dict = feed_dict_test)
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 967, in run
    result = self._run(None, fetches, feed_dict, options_ptr,
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1190, in _run
    results = self._do_run(handle, final_targets, final_fetches,
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1368, in _do_run
    return self._do_call(_run_fn, feeds, fetches, targets, options,
  File "/opt/software/deepmd-kit/2.0.0-cuda10.1/lib/python3.9/site-packages/tensorflow/python/client/session.py", line 1394, in _do_call
    raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.ResourceExhaustedError: 2 root error(s) found.
  (0) Resource exhausted: OOM when allocating tensor with shape[3520000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node load/filter_type_all/concat_5 (defined at /lib/python3.9/site-packages/deepmd/infer/deep_eval.py:141) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

	 [[load/o_virial/_27]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

  (1) Resource exhausted: OOM when allocating tensor with shape[3520000,100] and type double on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
	 [[node load/filter_type_all/concat_5 (defined at /lib/python3.9/site-packages/deepmd/infer/deep_eval.py:141) ]]
Hint: If you want to see a list of allocated tensors when OOM happens, add report_tensor_allocations_upon_oom to RunOptions for current allocation info.

0 successful operations.
0 derived errors ignored.

Original stack trace for 'load/filter_type_all/concat_5':
  File "/bin/dp", line 10, in <module>
    sys.exit(main())
  File "/lib/python3.9/site-packages/deepmd/entrypoints/main.py", line 443, in main
    test(**dict_args)
  File "/lib/python3.9/site-packages/deepmd/entrypoints/test.py", line 71, in test
    dp = DeepPotential(model)
  File "/lib/python3.9/site-packages/deepmd/infer/__init__.py", line 62, in DeepPotential
    dp = DeepPot(mf, load_prefix=load_prefix, default_tf_graph=default_tf_graph)
  File "/lib/python3.9/site-packages/deepmd/infer/deep_pot.py", line 82, in __init__
    DeepEval.__init__(
  File "/lib/python3.9/site-packages/deepmd/infer/deep_eval.py", line 26, in __init__
    self.graph = self._load_graph(
  File "/lib/python3.9/site-packages/deepmd/infer/deep_eval.py", line 141, in _load_graph
    tf.import_graph_def(
  File "/lib/python3.9/site-packages/tensorflow/python/util/deprecation.py", line 535, in new_func
    return func(*args, **kwargs)
  File "/lib/python3.9/site-packages/tensorflow/python/framework/importer.py", line 400, in import_graph_def
    return _import_graph_def_internal(
  File "/lib/python3.9/site-packages/tensorflow/python/framework/importer.py", line 513, in _import_graph_def_internal
    _ProcessNewOps(graph)
  File "/lib/python3.9/site-packages/tensorflow/python/framework/importer.py", line 243, in _ProcessNewOps
    for new_op in graph._add_new_tf_operations(compute_devices=False):  # pylint: disable=protected-access
  File "/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 3707, in _add_new_tf_operations
    new_ops = [
  File "/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 3708, in <listcomp>
    self._create_op_from_tf_operation(c_op, compute_device=compute_devices)
  File "/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 3590, in _create_op_from_tf_operation
    ret = Operation(c_op, self)
  File "/lib/python3.9/site-packages/tensorflow/python/framework/ops.py", line 2045, in __init__
    self._traceback = tf_stack.extract_stack_for_node(self._c_op)
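
As a quick cross-check of the log above, the failing tensor's size can be recomputed from its shape and compared with the allocator statistics. This is only a sketch; every number is copied from the log, and the variable names are illustrative.

```python
# Figures copied from the allocator log above.
shape = (3520000, 100)       # tensor requested by load/filter_type_all/concat_5
bytes_per_double = 8         # the log reports "type double"
requested = shape[0] * shape[1] * bytes_per_double

limit = 14445051904          # "Limit" in the allocator stats
in_use = 12230475776         # "InUse" in the allocator stats
free_total = limit - in_use  # all memory not in use, contiguous or not

print(f"requested : {requested} bytes ({requested / 2**30:.2f} GiB)")
print(f"free total: {free_total} bytes ({free_total / 2**30:.2f} GiB)")
print("request exceeds total free memory:", requested > free_total)
```

The request matches the 2816000000 bytes reported by the allocator and is larger than all the free memory combined. That suggests the batch is simply too large for this GPU and that fragmentation is not the main cause.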

COMMAND:

nohup /opt/software/deepmd-kit/2.0.0-cuda10.1/bin/dp test -m ./AlCuMg_a9r12_1600_000.pb -s ../../data/ > dptest.log 2>&1 &

platform: ALI-EHPC

machine_type: P100_4_30 & T4_4_15 (training is OK, but dp test gives OOM)

Steps to Reproduce

Further Information, Files, and Links

@Vibsteamer Vibsteamer added the bug label Sep 14, 2021
@njzjz
Member

njzjz commented Sep 14, 2021

See #748 (comment)

@amcadmus
Member

We may want an automatic adjustment of the testing batchsize...

@njzjz
Member

njzjz commented Sep 15, 2021

> We may want an automatic adjustment of the testing batchsize...

Here is something that can be referred: Lightning-AI/pytorch-lightning#1638

@Vibsteamer
Contributor Author

> See #748 (comment)

Is it currently possible to run dp test on all of the data?
Perhaps by running it on a CPU node with enough memory?

@njzjz njzjz self-assigned this Sep 20, 2021
njzjz added a commit to njzjz/deepmd-kit that referenced this issue Sep 22, 2021
Resolves deepmodeling#1149.

We start nbatch * natoms at 1024 (a different starting value can be set) and double it iteratively until an OOM error is caught.

One small issue is that catching the TF OOM error is somewhat slow. This is a TF problem and I don't know how to resolve it. Luckily, we only need to catch it once.
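
The doubling strategy described in the commit message can be sketched roughly as follows. This is not the code from the referenced commit: `OOMError`, `evaluate_in_batches`, and `fake_eval` are hypothetical names, and `OOMError` stands in for the `ResourceExhaustedError` seen in the traceback. Halving the budget after the first OOM is also an assumption, since the commit message only describes the doubling.

```python
from typing import Callable, List, Tuple


class OOMError(RuntimeError):
    """Hypothetical stand-in for the framework's out-of-memory exception."""


def evaluate_in_batches(
    eval_fn: Callable[[int, int], List[float]],
    nframes: int,
    natoms: int,
    initial_budget: int = 1024,
) -> Tuple[List[float], int]:
    """Evaluate frames [0, nframes) in batches sized by an atom budget.

    The budget (frames per batch * atoms per frame) starts at
    `initial_budget` and doubles after each successful batch until an OOM
    is caught; it is then halved and no longer grown.
    """
    budget = initial_budget
    growing = True
    results: List[float] = []
    start = 0
    while start < nframes:
        batch = max(budget // natoms, 1)
        stop = min(start + batch, nframes)
        try:
            out = eval_fn(start, stop)
        except OOMError:
            if stop - start == 1:
                raise  # a single frame does not fit; nothing left to shrink
            budget = (stop - start) * natoms // 2  # retry with half the frames
            growing = False
            continue
        results.extend(out)
        start = stop
        if growing:
            budget *= 2
    return results, budget


def fake_eval(start: int, stop: int, natoms: int = 32, limit: int = 5000) -> List[float]:
    """Toy evaluator that 'runs out of memory' above `limit` atoms per batch."""
    if (stop - start) * natoms > limit:
        raise OOMError("batch too large")
    return [float(i) for i in range(start, stop)]


values, final_budget = evaluate_in_batches(fake_eval, nframes=1000, natoms=32)
```

Because the budget stops growing after the first failure, the slow OOM path is normally hit only once per run, which is in line with the remark above.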