Gigahorse 1.1.8-ff505a5 does not run on M4000 with Cuda5.2: after phase1, terminate called, what(): invalid device ordinal , signal 6 #14

Open
bladeuserpi opened this issue Jan 28, 2023 · 16 comments

Comments

@bladeuserpi

bladeuserpi commented Jan 28, 2023

Hi Max,

On a Quadro M4000, the plotter terminates with an error after phase 1.

The Readme says:
All GPUs for compute capability 5.2 (Maxwell 2.0), 6.0, 6.1 (Pascal), 7.0 (Volta), 7.5 (Turing) and 8.0, 8.6, 8.9 (Ampere).
Which includes: GTX 1000 series, GTX 1600 series, RTX 2000 series, RTX 3000 series and RTX 4000 series

According to https://developer.nvidia.com/cuda-gpus, the compute capability 5.2 cards are the GTX 900 series and the Quadro M series.

Could you please check and enable compute capability 5.2 support?
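
The README list quoted above can be turned into a quick lookup. The following is a minimal sketch, not part of the plotter: the set is copied from the README quote, and the `CARDS` table and `is_listed` helper are hypothetical names that only restate the capabilities mentioned in this issue.

```python
# Compute capabilities quoted from the README above, as (major, minor) pairs.
README_CAPABILITIES = {
    (5, 2), (6, 0), (6, 1), (7, 0), (7, 5), (8, 0), (8, 6), (8, 9),
}

# Cards discussed in this issue, with the capabilities stated in the thread.
# This table is illustrative only; the plotter does not ship anything like it.
CARDS = {
    "Quadro M4000": (5, 2),
    "Quadro K2200": (5, 0),
    "Quadro K6000": (3, 5),
}


def is_listed(capability):
    """Return True if the (major, minor) pair appears in the README list."""
    return capability in README_CAPABILITIES


for name, cc in CARDS.items():
    status = "listed" if is_listed(cc) else "not listed"
    print(f"{name}: compute capability {cc[0]}.{cc[1]} is {status}")
```

Being listed only says what the README claims; whether a given build actually runs on the card is what the logs in this issue show.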

bf328434cd57a12ae38d0c41aa266a57  cuda_plot_k26_e4ed5f
Chia k26 next-gen CUDA plotter - ff505a5
Plot Format: v2.4
Network Port: 11337 [MMX] (unique)
No. GPUs: 1
No. Streams: 4
Final Destination: ./
Shared Memory limit: unlimited
Number of Plots: 1
Initialization took 0.143 sec
Crafting plot 1 out of 1 (2023/01/28 10:36:18)
Process ID: 27
Pool Puzzle Hash:  xxx
Farmer Public Key: xxx
Working Directory:   ./
Working Directory 2: @RAM
Compression Level: C1 (xbits = 15, final table = 3)
Plot Name: plot-mmx-k26-c1-2023-01-28-10-36-xxx
[P1] Setup took 0.059 sec
[P1] Table 1 took 1.051 sec, 67108864 entries, 1051360 max, 16884 tmp, 0 GB/s up, 0.594686 GB/s down
[P1] Table 2 took 1.263 sec, 67083606 entries, 1049983 max, 16882 tmp, 0.395883 GB/s up, 0.680436 GB/s down
[P1] Table 3 took 1.031 sec, 67026740 entries, 1048853 max, 16818 tmp, 0.666577 GB/s up, 1.89441 GB/s down
[P1] Table 4 took 1.12 sec, 66915113 entries, 1047733 max, 16856 tmp, 0.9475 GB/s up, 1.67412 GB/s down
[P1] Table 5 took 1.091 sec, 66704641 entries, 1044894 max, 16759 tmp, 0.971065 GB/s up, 1.50379 GB/s down
[P1] Table 6 took 1.033 sec, 66288742 entries, 1038189 max, 16731 tmp, 0.841945 GB/s up, 1.36134 GB/s down
[P1] Table 7 took 0.48 sec, 65450408 entries, 1025813 max, 16545 tmp, 1.41479 GB/s up, 1.62764 GB/s down
Phase 1 took 7.19 sec
terminate called after throwing an instance of 'std::runtime_error'
  what():  invalid device ordinal
Command terminated by signal 6
2.42user 5.55system 0:09.16elapsed 87%CPU (0avgtext+0avgdata 8366844maxresident)k
21144inputs+96outputs (5major+2082253minor)pagefaults 0swaps

P.S. The previous version failed in a different way:
Chia k26 next-gen CUDA plotter - 40a8b16
...
Phase 1 took 4.939 sec
[P2] Setup took 0.004 sec
[P2] Table 7 took 0.051 sec, 9.11458 GB/s up, 0.16276 GB/s down
[P2] Table 6 took 0.051 sec, 9.11458 GB/s up, 0.16276 GB/s down
[P2] Table 5 took 0.05 sec, 9.29688 GB/s up, 0.166016 GB/s down
[P2] Table 4 took 0.051 sec, 9.11458 GB/s up, 0.16276 GB/s down
Phase 2 took 0.209 sec
[P3] Setup took 0.041 sec
[P3] Table 3 LPSK took 0.107 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 5.04253 GB/s up, 8.03168 GB/s down
WritePark(): ans_length (65535) > max_ans_length (1022) (y = 0, i = 0)
WritePark(): ans_length (57343) > max_ans_length (1022) (y = 0, i = 1)
...
Command terminated by signal 11
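
Both runs above end with a "Command terminated by signal N" line. On Linux, signal 6 is SIGABRT, which is what an uncaught C++ exception produces when `std::terminate` aborts the process, and signal 11 is SIGSEGV, an invalid memory access. A small sketch for decoding such lines, assuming a POSIX system where Python's `signal.Signals` knows these numbers (the `decode_termination` helper is a hypothetical name, not part of any tool used here):

```python
import re
import signal


def decode_termination(line):
    """Return the signal name for a 'terminated by signal N' line, else None."""
    m = re.search(r"terminated by signal (\d+)", line)
    if m is None:
        return None
    # signal.Signals maps the numeric value to its symbolic name on this platform.
    return signal.Signals(int(m.group(1))).name


for line in ("Command terminated by signal 6", "Command terminated by signal 11"):
    print(line, "->", decode_termination(line))
```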
@madMAx43v3r
Owner

what's your command line?

@madMAx43v3r
Owner

try latest version now

@bladeuserpi
Author

Thanks, now it works perfectly.
It is fixed in version 1.1.8-217b8ba.

[root@243ce97156ce data1]# cat -n giga235.cuda_plot_k26_834ee25.19481.out
     1	Sat Jan 28 21:30:56 2023       
     2	+-----------------------------------------------------------------------------+
     3	| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
     4	|-------------------------------+----------------------+----------------------+
     5	| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
     6	| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
     7	|                               |                      |               MIG M. |
     8	|===============================+======================+======================|
     9	|   0  Quadro M4000        Off  | 00000000:84:00.0 Off |                  N/A |
    10	| 71%   52C    P0    46W / 120W |      0MiB /  8192MiB |      1%      Default |
    11	|                               |                      |                  N/A |
    12	+-------------------------------+----------------------+----------------------+
    13	                                                                               
    14	+-----------------------------------------------------------------------------+
    15	| Processes:                                                                  |
    16	|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    17	|        ID   ID                                                   Usage      |
    18	|=============================================================================|
    19	|  No running processes found                                                 |
    20	+-----------------------------------------------------------------------------+
    21	    Product Name                          : Quadro M4000
    22	Sat Jan 28 21:30:57 UTC 2023
    23	298ef2b10a4db0f1bd796eab98d816b5  cuda_plot_k26_834ee25
    24	Calling /cuda_plot_k26_834ee25 -c xxx -f xxx -x 11337 -t ./
    25	Chia k26 next-gen CUDA plotter - 217b8ba
    26	Plot Format: v2.4
    27	Network Port: 11337 [MMX] (unique)
    28	No. GPUs: 1
    29	No. Streams: 4
    30	Final Destination: ./
    31	Shared Memory limit: unlimited
    32	Number of Plots: 1
    33	Initialization took 0.144 sec
    34	Crafting plot 1 out of 1 (2023/01/28 21:30:58)
    35	Process ID: 243
    36	Pool Puzzle Hash:  xxx
    37	Farmer Public Key: xxx
    38	Working Directory:   ./
    39	Working Directory 2: @RAM
    40	Compression Level: C1 (xbits = 15, final table = 3)
    41	Plot Name: plot-mmx-k26-c1-2023-01-28-21-30-xxx
    42	[P1] Setup took 0.06 sec
    43	[P1] Table 1 took 1.05 sec, 67108864 entries, 1050713 max, 16892 tmp, 0 GB/s up, 0.595253 GB/s down
    44	[P1] Table 2 took 1.274 sec, 67086826 entries, 1050211 max, 16907 tmp, 0.392465 GB/s up, 0.674561 GB/s down
    45	[P1] Table 3 took 1.031 sec, 67045988 entries, 1049235 max, 16897 tmp, 0.666609 GB/s up, 1.89441 GB/s down
    46	[P1] Table 4 took 1.113 sec, 66952825 entries, 1048507 max, 16860 tmp, 0.953733 GB/s up, 1.68465 GB/s down
    47	[P1] Table 5 took 1.101 sec, 66760454 entries, 1045142 max, 16715 tmp, 0.962788 GB/s up, 1.49014 GB/s down
    48	[P1] Table 6 took 1.044 sec, 66395649 entries, 1039662 max, 16692 tmp, 0.833771 GB/s up, 1.347 GB/s down
    49	[P1] Table 7 took 0.48 sec, 65692092 entries, 1028834 max, 16593 tmp, 1.41707 GB/s up, 1.62764 GB/s down
    50	Phase 1 took 7.214 sec
    51	[P2] Setup took 0.006 sec
    52	[P2] Table 7 took 0.049 sec, 8.74008 GB/s up, 0.169404 GB/s down
    53	[P2] Table 6 took 0.049 sec, 8.83368 GB/s up, 0.169404 GB/s down
    54	[P2] Table 5 took 0.049 sec, 8.88222 GB/s up, 0.169404 GB/s down
    55	[P2] Table 4 took 0.049 sec, 8.90781 GB/s up, 0.169404 GB/s down
    56	Phase 2 took 0.207 sec
    57	[P3] Setup took 0.042 sec
    58	[P3] Table 3 LPSK took 0.116 sec, 53609680 entries, 875470 max, 13944 tmp, 4.37786 GB/s up, 7.40854 GB/s down
    59	[P3] Table 3 NSK took 0.36 sec, 53609680 entries, 839536 max, 13944 tmp, 1.52557 GB/s up, 2.41423 GB/s down
    60	[P3] Table 4 PDSK took 0.093 sec, 53897402 entries, 845019 max, 13568 tmp, 4.78262 GB/s up, 7.56065 GB/s down
    61	[P3] Table 4 LPSK took 0.183 sec, 53897402 entries, 868435 max, 14116 tmp, 4.10563 GB/s up, 4.69612 GB/s down
    62	[P3] Table 4 NSK took 0.318 sec, 53897402 entries, 844896 max, 13956 tmp, 1.73634 GB/s up, 2.73309 GB/s down
    63	[P3] Table 5 PDSK took 0.087 sec, 54731389 entries, 859145 max, 13832 tmp, 5.09804 GB/s up, 8.08207 GB/s down
    64	[P3] Table 5 LPSK took 0.185 sec, 54731389 entries, 892309 max, 14578 tmp, 4.10772 GB/s up, 4.64535 GB/s down
    65	[P3] Table 5 NSK took 0.322 sec, 54731389 entries, 857890 max, 14366 tmp, 1.7413 GB/s up, 2.69914 GB/s down
    66	[P3] Table 6 PDSK took 0.086 sec, 57212501 entries, 898627 max, 14462 tmp, 5.12966 GB/s up, 8.17605 GB/s down
    67	[P3] Table 6 LPSK took 0.191 sec, 57212501 entries, 947027 max, 15603 tmp, 4.11196 GB/s up, 4.49943 GB/s down
    68	[P3] Table 6 NSK took 0.326 sec, 57212501 entries, 896914 max, 15175 tmp, 1.7979 GB/s up, 2.66602 GB/s down
    69	[P3] Table 7 PDSK took 0.085 sec, 65692092 entries, 1032746 max, 16593 tmp, 7.19771 GB/s up, 8.27224 GB/s down
    70	[P3] Table 7 LPSK took 0.211 sec, 65692092 entries, 1082858 max, 17679 tmp, 4.12476 GB/s up, 4.07294 GB/s down
    71	[P3] Table 7 NSK took 0.395 sec, 65692092 entries, 1028834 max, 17394 tmp, 1.70376 GB/s up, 2.20031 GB/s down
    72	Phase 3 took 3.051 sec
    73	[P4] Setup took 0.017 sec
    74	[P4] total_p7_parks = 32077
    75	[P4] total_c3_parks = 6569, 2400 / 2464 ANS bytes
    76	Phase 4 took 0.301 sec, 1.21955 GB/s up, 0.787557 GB/s down
    77	Total plot creation time was 10.7934 sec (0.17989 min)
    78	Flushing to disk took 0.087 sec
    79	4.83user 6.43system 0:12.58elapsed 89%CPU (0avgtext+0avgdata 8381820maxresident)k
    80	0inputs+2375744outputs (0major+2130013minor)pagefaults 0swaps

@bladeuserpi
Author

K32 also works fine:

Chia k32 next-gen CUDA plotter - 217b8ba
Plot Format: v2.4
Network Port: 11337 [MMX] (unique)
No. GPUs: 1
No. Streams: 4
Final Destination: ./
Shared Memory limit: unlimited
Number of Plots: 1
Initialization took 0.149 sec
Crafting plot 1 out of 1 (2023/01/28 21:38:27)
Process ID: 451
Pool Puzzle Hash:  xxx
Farmer Public Key: xxx
Working Directory:   ./
Working Directory 2: @RAM
Compression Level: C1 (xbits = 15, final table = 3)
Plot Name: plot-mmx-k32-c1-2023-01-28-21-38-xxx
[P1] Setup took 0.873 sec
[P1] Table 1 took 21.746 sec, 4294967296 entries, 16792865 max, 66868 tmp, 0 GB/s up, 1.56352 GB/s down
[P1] Table 2 took 49.892 sec, 4294867796 entries, 16789256 max, 66637 tmp, 0.641385 GB/s up, 1.02221 GB/s down
[P1] Table 3 took 89 sec, 4294628575 entries, 16789481 max, 66588 tmp, 0.539313 GB/s up, 1.33708 GB/s down
[P1] Table 4 took 91.857 sec, 4294114692 entries, 16784900 max, 66669 tmp, 0.87085 GB/s up, 1.29549 GB/s down
[P1] Table 5 took 79.549 sec, 4293066859 entries, 16781305 max, 66543 tmp, 1.00547 GB/s up, 1.28223 GB/s down
[P1] Table 6 took 67.918 sec, 4291142724 entries, 16771252 max, 66609 tmp, 0.941896 GB/s up, 1.25151 GB/s down
[P1] Table 7 took 47.519 sec, 4287117215 entries, 16758192 max, 66543 tmp, 1.00922 GB/s up, 0.983822 GB/s down
Phase 1 took 448.858 sec
[P2] Setup took 0.47 sec
[P2] Table 7 took 6.988 sec, 4.57091 GB/s up, 0.0760232 GB/s down
[P2] Table 6 took 6.209 sec, 5.14922 GB/s up, 0.0855613 GB/s down
[P2] Table 5 took 5.982 sec, 5.34701 GB/s up, 0.0888081 GB/s down
[P2] Table 4 took 5.9 sec, 5.42265 GB/s up, 0.0900424 GB/s down
Phase 2 took 25.813 sec
[P3] Setup took 0.632 sec
[P3] Table 3 LPSK took 6.148 sec, 3439192676 entries, 14274084 max, 56476 tmp, 5.29094 GB/s up, 8.29542 GB/s down
[P3] Table 3 NSK took 24.657 sec, 3439192676 entries, 13451472 max, 56476 tmp, 1.55883 GB/s up, 2.41453 GB/s down
[P3] Table 4 PDSK took 5.224 sec, 3464741486 entries, 13559523 max, 53823 tmp, 6.22605 GB/s up, 8.94913 GB/s down
[P3] Table 4 LPSK took 19.297 sec, 3464741486 entries, 13860931 max, 55689 tmp, 3.16726 GB/s up, 2.64291 GB/s down
[P3] Table 4 NSK took 18.847 sec, 3464741486 entries, 13548462 max, 55097 tmp, 2.05452 GB/s up, 3.15887 GB/s down
[P3] Table 5 PDSK took 5.233 sec, 3530459450 entries, 13813320 max, 54777 tmp, 6.21385 GB/s up, 8.93374 GB/s down
[P3] Table 5 LPSK took 19.542 sec, 3530459450 entries, 14247360 max, 57479 tmp, 3.17175 GB/s up, 2.60978 GB/s down
[P3] Table 5 NSK took 19.242 sec, 3530459450 entries, 13805085 max, 56697 tmp, 2.05051 GB/s up, 3.09402 GB/s down
[P3] Table 6 PDSK took 5.263 sec, 3709297820 entries, 14512455 max, 57638 tmp, 6.17571 GB/s up, 8.88281 GB/s down
[P3] Table 6 LPSK took 20.316 sec, 3709297820 entries, 15087524 max, 60528 tmp, 3.16519 GB/s up, 2.51035 GB/s down
[P3] Table 6 NSK took 20.113 sec, 3709297820 entries, 14500721 max, 59895 tmp, 2.06109 GB/s up, 2.96003 GB/s down
[P3] Table 7 PDSK took 5.978 sec, 4287117215 entries, 16765911 max, 66543 tmp, 7.34687 GB/s up, 7.82038 GB/s down
[P3] Table 7 LPSK took 25.201 sec, 4287117215 entries, 17200380 max, 68869 tmp, 2.83941 GB/s up, 2.02374 GB/s down
[P3] Table 7 NSK took 23.502 sec, 4287117215 entries, 16758192 max, 68180 tmp, 2.03865 GB/s up, 2.5332 GB/s down
Phase 3 took 219.766 sec
[P4] Setup took 0.312 sec
[P4] total_p7_parks = 2093319
[P4] total_c3_parks = 428711, 2385 / 2457 ANS bytes
Phase 4 took 14.667 sec, 2.17778 GB/s up, 1.24657 GB/s down
Total plot creation time was 709.189 sec (11.8198 min)
Flushing to disk took 8.869 sec
315.10user 163.83system 12:22.78elapsed 64%CPU (0avgtext+0avgdata 196177128maxresident)k
0inputs+176717632outputs (0major+50223561minor)pagefaults 0swaps

But it nearly reaches the thermal limit:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M4000        Off  | 00000000:84:00.0 Off |                  N/A |
|100%   97C    P0    75W / 120W |   5701MiB /  8192MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      5108      C   ./cuda_plot_k32_834ee25          4778MiB |
+-----------------------------------------------------------------------------+

Later it shows that the fan speed can exceed 100%:
    Fan Speed                             : 118 %

nvidia-smi -q shows these thermal limits (slowdown/shutdown):
    Temperature
        GPU Current Temp                  : 86 C
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : 104 C
        GPU Slowdown Temp                 : 99 C
        GPU Max Operating Temp            : N/A
        GPU Target Temperature            : 81 C
        Memory Current Temp               : N/A
        Memory Max Operating Temp         : N/A
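
The margin between the current temperature and the slowdown threshold can be read straight from this block. A minimal sketch, assuming the text layout shown above (`Name : NN C`, with `N/A` for unsupported fields); `parse_temps` is a hypothetical helper, not an nvidia-smi feature:

```python
import re


def parse_temps(text):
    """Map each 'Name : NN C' line to an integer; 'N/A' entries are skipped."""
    temps = {}
    for line in text.splitlines():
        m = re.match(r"\s*(.+?)\s*:\s*(\d+) C\s*$", line)
        if m:
            temps[m.group(1)] = int(m.group(2))
    return temps


# Excerpt of the nvidia-smi -q output quoted in this comment.
SAMPLE = """\
    Temperature
        GPU Current Temp                  : 86 C
        GPU T.Limit Temp                  : N/A
        GPU Shutdown Temp                 : 104 C
        GPU Slowdown Temp                 : 99 C
        GPU Max Operating Temp            : N/A
        GPU Target Temperature            : 81 C
"""

temps = parse_temps(SAMPLE)
margin = temps["GPU Slowdown Temp"] - temps["GPU Current Temp"]
print(f"Margin to slowdown: {margin} C")
```

At the 97 C shown in the table above, the same subtraction leaves only 2 C before the 99 C slowdown threshold.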

@madMAx43v3r
Owner

nice, it seems your GPU doesn't support a certain feature, but that's ok since it's not needed for single-GPU plotting

@madMAx43v3r
Owner

you sure the fan works?

@bladeuserpi
Author

I can hear the fan, especially when it is shown as >100%. Also, the M4000 is a single-slot card with a single fan.
Now trying K34.

@bladeuserpi
Author

That's really nice performance for such a venerable 120 W card (it uses only about 80 W during the run): K34 in less than an hour.

Chia k34 next-gen CUDA plotter - 217b8ba
Plot Format: v2.4
Network Port: 11337 [MMX] (unique)
No. GPUs: 1
No. Streams: 4
Final Destination: ./
Shared Memory limit: unlimited
Number of Plots: 1
Initialization took 0.159 sec
Crafting plot 1 out of 1 (2023/01/28 22:04:03)
Process ID: 659
Pool Puzzle Hash:  xxx
Farmer Public Key: xxx
Working Directory:   ./
Working Directory 2: @RAM
Compression Level: C1 (xbits = 15, final table = 3)
Plot Name: plot-mmx-k34-c1-2023-01-28-22-04-xxx
[P1] Setup took 1.096 sec
[P1] Table 1 took 98.508 sec, 17179869184 entries, 16792201 max, 16986 tmp, 0 GB/s up, 1.8273 GB/s down
[P1] Table 2 took 238.503 sec, 17179740410 entries, 16791874 max, 16998 tmp, 0.603766 GB/s up, 1.09015 GB/s down
[P1] Table 3 took 455.553 sec, 17179014088 entries, 16789758 max, 17018 tmp, 0.456585 GB/s up, 1.27319 GB/s down
[P1] Table 4 took 523.988 sec, 17177706618 entries, 16787314 max, 17018 tmp, 0.641204 GB/s up, 1.1069 GB/s down
[P1] Table 5 took 393.541 sec, 17175350642 entries, 16785224 max, 17014 tmp, 0.853679 GB/s up, 1.27053 GB/s down
[P1] Table 6 took 350.404 sec, 17170418250 entries, 16781838 max, 16979 tmp, 0.776043 GB/s up, 1.19863 GB/s down
[P1] Table 7 took 245.674 sec, 17160520630 entries, 16771392 max, 17014 tmp, 0.846185 GB/s up, 0.895512 GB/s down
Phase 1 took 2307.87 sec
[P2] Setup took 1.507 sec
[P2] Table 7 took 40.435 sec, 3.16201 GB/s up, 0.0525535 GB/s down
[P2] Table 6 took 35.229 sec, 3.63137 GB/s up, 0.0603196 GB/s down
[P2] Table 5 took 33.798 sec, 3.78621 GB/s up, 0.0628735 GB/s down
[P2] Table 4 took 33.274 sec, 3.84636 GB/s up, 0.0638637 GB/s down
Phase 2 took 145.222 sec
[P3] Setup took 0.728 sec
[P3] Table 3 LPSK took 31.977 sec, 13757836352 entries, 14631359 max, 14707 tmp, 4.06913 GB/s up, 7.50552 GB/s down
[P3] Table 3 NSK took 107.973 sec, 13757836352 entries, 13449779 max, 14707 tmp, 1.42402 GB/s up, 2.42786 GB/s down
[P3] Table 4 PDSK took 38.234 sec, 13861013568 entries, 13559321 max, 13775 tmp, 3.40296 GB/s up, 5.75414 GB/s down
[P3] Table 4 LPSK took 83.578 sec, 13861013568 entries, 13861736 max, 14444 tmp, 2.92546 GB/s up, 2.87162 GB/s down
[P3] Table 4 NSK took 91.411 sec, 13861013568 entries, 13554199 max, 14078 tmp, 1.69464 GB/s up, 2.86775 GB/s down
[P3] Table 5 PDSK took 37.079 sec, 14125809556 entries, 13825099 max, 14135 tmp, 3.50849 GB/s up, 5.93338 GB/s down
[P3] Table 5 LPSK took 84.901 sec, 14125809556 entries, 14250243 max, 14760 tmp, 2.92087 GB/s up, 2.82687 GB/s down
[P3] Table 5 NSK took 92.587 sec, 14125809556 entries, 13811284 max, 14486 tmp, 1.70508 GB/s up, 2.83132 GB/s down
[P3] Table 6 PDSK took 38.676 sec, 14843952578 entries, 14520844 max, 14722 tmp, 3.36267 GB/s up, 5.68838 GB/s down
[P3] Table 6 LPSK took 88.138 sec, 14843952578 entries, 15087224 max, 15673 tmp, 2.91946 GB/s up, 2.72305 GB/s down
[P3] Table 6 NSK took 98.132 sec, 14843952578 entries, 14513168 max, 15284 tmp, 1.69052 GB/s up, 2.67134 GB/s down
[P3] Table 7 PDSK took 42.574 sec, 17160520630 entries, 16790131 max, 17014 tmp, 4.12932 GB/s up, 5.16756 GB/s down
[P3] Table 7 LPSK took 100.007 sec, 17160520630 entries, 17197010 max, 17958 tmp, 2.86378 GB/s up, 2.39987 GB/s down
[P3] Table 7 NSK took 103.914 sec, 17160520630 entries, 16771392 max, 17483 tmp, 1.8456 GB/s up, 2.5227 GB/s down
Phase 3 took 1040.59 sec
[P4] Setup took 0.28 sec
[P4] total_p7_parks = 8379161
[P4] total_c3_parks = 1716052, 2384 / 2460 ANS bytes
Phase 4 took 58.218 sec, 2.19616 GB/s up, 1.32927 GB/s down
Total plot creation time was 3552.21 sec (59.2035 min)
Flushing to disk took 52.502 sec
1434.95user 744.19system 1:01:44elapsed 58%CPU (0avgtext+0avgdata 804982064maxresident)k
16inputs+738008760outputs (0major+206163716minor)pagefaults 0swaps

top - 23:44:16 up  1:19,  4 users,  load average: 0.10, 0.34, 0.46
Tasks: 808 total,   2 running, 805 sleeping,   0 stopped,   1 zombie
%Cpu(s):  0.0 us,  0.1 sy,  0.0 ni, 99.9 id,  0.0 wa,  0.0 hi,  0.0 si,  0.0 st
MiB Mem : 1030648.+total, 127473.0 free,  25470.0 used, 877705.6 buff/cache
MiB Swap:  16100.0 total,  16100.0 free,      0.0 used. 212997.9 avail Mem

    PID USER      PR  NI    VIRT    RES    SHR S  %CPU  %MEM     TIME+ COMMAND
   9919 gputest   20   0  778.1g 767.5g 767.4g S   2.0  76.3  18:41.51 cuda_plot_k34_8
   2428 root      20   0  526632  38504  17148 S   1.0   0.0   0:11.62 tuned

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.85.05    Driver Version: 525.85.05    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro M4000        Off  | 00000000:84:00.0 Off |                  N/A |
|100%   96C    P0    79W / 120W |   8102MiB /  8192MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      9919      C   ./cuda_plot_k34_834ee25          6923MiB |
+-----------------------------------------------------------------------------+

This shows 6923 MiB for cuda_plot, but the GPU reports 8102 MiB in use.

@bladeuserpi
Author

bladeuserpi commented Jan 28, 2023

For fun I also tried a K2200, which is compute capability 5.0 (so according to the Readme it is not supported),
and it does not work. Another venerable card would be the K6000, which is only compute capability 3.5 but
has good TFLOPS: https://images.nvidia.com/content/quadro/product-literature/line-card/610177-nvidia-quadro-line-card-us-fnl-hr.pdf
I do not know whether it makes sense to enable these older compute capabilities, especially if it needs much more effort than
setting some compiler flags.

Calling /cuda_plot_k26_834ee25 -c xxx -f xxx -x 11337 -t ./ -S 2
Chia k26 next-gen CUDA plotter - 217b8ba
Plot Format: v2.4
Network Port: 11337 [MMX] (unique)
No. GPUs: 1
No. Streams: 2
Final Destination: ./
Shared Memory limit: unlimited
Number of Plots: 1
Initialization took 0.128 sec
Crafting plot 1 out of 1 (2023/01/28 23:25:22)
Process ID: 27
Pool Puzzle Hash:  xxx
Farmer Public Key: xxx
Working Directory:   ./
Working Directory 2: @RAM
Compression Level: C1 (xbits = 15, final table = 3)
Plot Name: plot-mmx-k26-c1-2023-01-28-23-25-xxx
[P1] Setup took 0.03 sec
[P1] Table 1 took 1.076 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 0 GB/s up, 0.580869 GB/s down
[P1] Table 2 took 1.106 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 0.480335 GB/s up, 0.777026 GB/s down
[P1] Table 3 took 0.991 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 0.737103 GB/s up, 1.97088 GB/s down
[P1] Table 4 took 1.014 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 1.11332 GB/s up, 1.84913 GB/s down
[P1] Table 5 took 0.93 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 1.21388 GB/s up, 1.76413 GB/s down
[P1] Table 6 took 0.858 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 1.08355 GB/s up, 1.639 GB/s down
[P1] Table 7 took 0.252 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 2.89869 GB/s up, 3.10026 GB/s down
Phase 1 took 6.281 sec
[P2] Setup took 0.006 sec
[P2] Table 7 took 0.08 sec, 5.81055 GB/s up, 0.10376 GB/s down
[P2] Table 6 took 0.082 sec, 5.66883 GB/s up, 0.101229 GB/s down
[P2] Table 5 took 0.082 sec, 5.66883 GB/s up, 0.101229 GB/s down
[P2] Table 4 took 0.081 sec, 5.73881 GB/s up, 0.102479 GB/s down
Phase 2 took 0.337 sec
[P3] Setup took 0.023 sec
[P3] Table 3 LPSK took 0.234 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 2.30577 GB/s up, 3.67261 GB/s down
Command terminated by signal 11
2.66user 5.32system 0:08.57elapsed 93%CPU (0avgtext+0avgdata 8404456maxresident)k
21144inputs+144outputs (5major+2099879minor)pagefaults 0swaps
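
In this failing log every table reports the same 83886080 entries and a tmp value of 4294967295, which equals 2^32 - 1; the working runs report at most 2^k entries and small tmp values. The same pattern appears in the earlier 40a8b16 log on the M4000. A rough sketch for flagging such lines follows. Both checks are heuristics inferred from the logs in this thread, not documented plotter behaviour, and `suspicious_tables` is a hypothetical helper name.

```python
import re

UINT32_MAX = 2**32 - 1  # 4294967295, the tmp value seen in the failing runs

# Matches "[P1] Table 1 took ..." and "[P3] Table 3 LPSK took ..." lines.
LINE_RE = re.compile(
    r"\[(P\d)\] Table (\d) (?:\w+ )?took [\d.]+ sec, "
    r"(\d+) entries, \d+ max, (\d+) tmp"
)


def suspicious_tables(log_text, k):
    """Return (phase, table, reasons) for lines with implausible counters."""
    found = []
    for line in log_text.splitlines():
        m = LINE_RE.search(line)
        if m is None:
            continue
        phase, table = m.group(1), int(m.group(2))
        entries, tmp = int(m.group(3)), int(m.group(4))
        reasons = []
        if entries > 2**k:  # heuristic: healthy logs here never exceed 2^k
            reasons.append("entries > 2^k")
        if tmp == UINT32_MAX:  # heuristic: looks like an unset 32-bit counter
            reasons.append("tmp == 2^32 - 1")
        if reasons:
            found.append((phase, table, reasons))
    return found


GOOD = "[P1] Table 1 took 1.05 sec, 67108864 entries, 1050713 max, 16892 tmp, 0 GB/s up, 0.595253 GB/s down"
BAD = "[P3] Table 3 LPSK took 0.234 sec, 83886080 entries, 1310720 max, 4294967295 tmp, 2.30577 GB/s up, 3.67261 GB/s down"

print(suspicious_tables(GOOD, 26))
print(suspicious_tables(BAD, 26))
```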

@madMAx43v3r
Owner

K2200 is too slow, no point. Anything older than Maxwell has very poor performance.

Also, you need to measure the second plot; the first plot is slow.

@madMAx43v3r
Owner

And if you have 1 TB RAM, it makes sense to invest in a better GPU ;)

@bladeuserpi
Author

I agree the K2200 will be too slow. I would be more interested in bringing new life to the K6000 (2880 CUDA cores, 5.1 TFLOPS),
but I can understand if you think it is not worth your effort. The Quadro series has the advantage of ECC memory and
is not oversized, so the cards fit better into workstations (which can take more RAM than consumer equipment).
Also, on this machine with a 3060 Ti the side panel cannot be closed and, worse, Linux hangs after the GRUB menu with
a black screen and a steady cursor.

@madMAx43v3r
Owner

trust me, the K6000 will be slow, slower than your M4000. Kepler is really bad for GPGPU.

@bladeuserpi
Author

bladeuserpi commented Jan 29, 2023

Thanks for explaining.

I also see that the second plot gets faster:

grep creation giga40.cuda_plot_k32_c1bd566.23139.out
Total plot creation time was 700.733 sec (11.6789 min)
Total plot creation time was 668.539 sec (11.1423 min)
Total plot creation time was 672.004 sec (11.2001 min)
Total plot creation time was 671.021 sec (11.1837 min)
Total plot creation time was 657.168 sec (10.9528 min)

grep creation giga1018.cuda_plot_k34_c1bd566.3002.out
Total plot creation time was 3455.33 sec (57.5889 min)
Total plot creation time was 3166.55 sec (52.7759 min)
Total plot creation time was 3173.35 sec (52.8892 min)
Total plot creation time was 3179.3 sec (52.9883 min)
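
The effect can be quantified directly from the grep output. A small sketch, assuming the "Total plot creation time was X sec" line format shown above; `creation_times` and `later_plot_speedup` are hypothetical helper names:

```python
import re


def creation_times(log_text):
    """Extract all 'Total plot creation time' values, in seconds."""
    pattern = r"Total plot creation time was ([\d.]+) sec"
    return [float(x) for x in re.findall(pattern, log_text)]


def later_plot_speedup(times):
    """Ratio of the first plot's time to the mean time of the later plots."""
    first, rest = times[0], times[1:]
    return first / (sum(rest) / len(rest))


# The k32 results from the M4000 run quoted above.
SAMPLE = """\
Total plot creation time was 700.733 sec (11.6789 min)
Total plot creation time was 668.539 sec (11.1423 min)
Total plot creation time was 672.004 sec (11.2001 min)
Total plot creation time was 671.021 sec (11.1837 min)
Total plot creation time was 657.168 sec (10.9528 min)
"""

times = creation_times(SAMPLE)
print(f"first plot: {times[0]:.1f} s, speedup of later plots: {later_plot_speedup(times):.3f}x")
```

For these k32 numbers the later plots average about 667 s, so the first plot is roughly 5% slower.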

@bladeuserpi
Author

bladeuserpi commented Jan 30, 2023

The second-plot speedup is even more pronounced on the 3060 12GB:
(Installing the 3060 as a secondary card avoided the black screen problem.)

# grep creation giga3295.cuda_plot_k32_c1bd566.5758.out 
Total plot creation time was 269.23 sec (4.48717 min)
Total plot creation time was 182.047 sec (3.03411 min)
Total plot creation time was 182.94 sec (3.04899 min)
Total plot creation time was 182.333 sec (3.03889 min)
Total plot creation time was 182.167 sec (3.03612 min)

# grep creation giga4272.cuda_plot_k34_c1bd566.9727.out 
Total plot creation time was 1203.22 sec (20.0536 min)
Total plot creation time was 861.759 sec (14.3626 min)
Total plot creation time was 842.048 sec (14.0341 min)
Total plot creation time was 816.19 sec (13.6032 min)

@madMAx43v3r
Owner

yeah the faster your GPU, the more difference in second plot speed
