Not enable peer access in case of the GPUs are located over QPI #3319

buaaliyi · 2015-11-12T07:19:33Z

I've found that on some hardware architectures, such as several HP server models, GPUs connected across QPI can still enable P2P access to each other, but the bandwidth is quite low (less than 200 MB/s instead of the normal 6~8 GB/s). So I think peer access should not be enabled between GPUs plugged into different I/O Hubs (IOH), to ensure at least normal cudaMemcpy performance.

The following quote is from one of our hardware engineers.

quote:
"""
NVIDIA GPUs are designed to take full advantage of the PCI-e Gen2 standard, including the Peer-to-Peer communication, but the IOH chipset does not support the full PCI-e Gen2 specification for P2P communication with other IOH chipsets.
The cudaPeerEnable() API call will return an error code if the application tries to establish a P2P relationship between two GPUs that would require P2P communication over QPI. The cudaMemcopy() function for P2P Direct Transfers automatically falls back to using a Device-to-Host-to-Device path, but there is no automatic fallback for P2P Direct Access (P2P load/store instructions in device code).
One known example system is the HP Z800 workstation with dual IOH chipsets which can run the simpleP2P example, but bandwidth is very low (100s of MB/s instead of several GB/s) because of the fallback path.
NVIDIA is investigating whether GPU P2P across QPI can be supported by adding functionality to future GPU architectures.
"""
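
To make the idea concrete, here is a rough sketch of one way such a guard could look. It is not the patch in this pull request. The helper names and the 1 GB/s default threshold are assumptions chosen for illustration; the threshold simply sits between the ~200 MB/s and 6~8 GB/s figures mentioned above. In a real program, `can_access_peer` would come from `cudaDeviceCanAccessPeer`, and the timing from a test copy with `cudaMemcpyPeer`; those calls are left out so the sketch compiles without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper (not part of Caffe or CUDA): converts a timed copy
// of `bytes` bytes that took `seconds` seconds into GB/s (1 GB = 1e9 bytes).
double bandwidth_gbps(std::size_t bytes, double seconds) {
  assert(seconds > 0.0);  // a zero or negative duration is a measurement error
  return static_cast<double>(bytes) / seconds / 1e9;
}

// Hypothetical policy: enable peer access only when the driver reports that
// the pair can do P2P AND a measured test copy reaches a minimum bandwidth.
// `min_gbps` defaults to 1.0, an assumed cut-off, not a value from the PR.
bool should_enable_peer_access(bool can_access_peer,
                               double measured_gbps,
                               double min_gbps = 1.0) {
  return can_access_peer && measured_gbps >= min_gbps;
}
```

With this policy, a pair that reports P2P capability but only moves 200 MB in one second would be left without peer access, while a pair moving 6 GB in one second would have it enabled via `cudaDeviceEnablePeerAccess`.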

flx42 · 2015-11-12T17:52:07Z

What is your driver version?
I think this issue was fixed in a recent driver update, and this sort of low-level hack should not be part of Caffe IMO.

buaaliyi · 2015-11-13T01:17:05Z

@flx42 The driver version is 346.46, which is the release version with CUDA 7.0.

flx42 · 2015-11-13T02:17:13Z

I don't remember exactly when it was fixed. Could you try 352.39, the official version for CUDA 7.5?

buaaliyi · 2015-11-13T03:14:49Z

OK, I'll try it. Thank you for your advice.

shelhamer · 2017-03-23T07:53:07Z

Closing given new parallelism in #4563

Not enable peer access in case of the GPUs are located over QPI

d54ae85

buaaliyi force-pushed the peer_access branch from 065d9f5 to d54ae85 Compare November 12, 2015 07:57

ronghanghu added the multi-GPU label Nov 29, 2015

shelhamer closed this Mar 23, 2017
