-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rieman fit in CA #169
Rieman fit in CA #169
Conversation
Validation summaryReference release CMSSW_10_2_4 at 6e8f631
|
@rovere what changes do we expect running the usual workflows ? |
none, since the products still live only on the GPU. |
Looks like this PR introduces a GPU memory leak:
Can you have a look ? |
I will |
cudaFree(helix_fit_resultsGPU_) together with the other buffers
Validation summaryReference release CMSSW_10_2_4 at 6e8f631
|
Reported by cuda-memcheck --tool memcheck --leak-check full --report-api-errors all: CUDA-MEMCHECK Leaked 3040000 bytes at 0x75ab80000 Saved host backtrace up to driver entry point at cudaMalloc time Host Frame:/lib64/libcuda.so.1 (cuMemAlloc_v2 + 0x17f) [0x22bdaf] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 [0x2fdc3] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 [0x1196b] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 (cudaMalloc + 0x16f) [0x3fb2f] Host Frame:/data/user/fwyzard/patatrack/validation/run.ho9ucTUSKo/testing/lib/slc7_amd64_gcc700/pluginRecoPixelVertexingPixelTripletsPlugins.so (CAHitQuadrupletGeneratorGPU::allocateOnGPU() + 0x341) [0x27301] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libHeterogeneousCoreCUDACore.so (heterogeneous::GPUCuda::call_beginStreamGPUCuda(edm::StreamID) + 0x945) [0x6785] Host Frame:/data/user/fwyzard/patatrack/validation/run.ho9ucTUSKo/testing/lib/slc7_amd64_gcc700/pluginRecoPixelVertexingPixelTripletsPlugins.so [0x2ada6] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::Worker::beginStream(edm::StreamID, edm::StreamContext&) + 0x110) [0x23e900] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::WorkerManager::beginStream(edm::StreamID, edm::StreamContext&) + 0x32) [0x1e6b32] Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::EventProcessor::beginJob() + 0x1df) [0x20599f] Host Frame:cmsRun [0xead4] Host Frame:cmsRun (main + 0x162) [0xd2c2] Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22445] Host Frame:cmsRun [0xd598] LEAK SUMMARY: 3040000 bytes leaked in 1 allocations ERROR SUMMARY: 1 error
Validation summaryReference release CMSSW_10_2_4 at 6e8f631
|
No changes to the standard workflows (as expected). For the 10824.8 TTbar workflow it adds to the profile:
the new kernels:
|
@rovere @felicepantaleo do we want to have a flag to turn on the Riemann fit, or run it unconditionally ? |
@fwyzard Could you elaborate in what ways does running the Riemann fit unconditionally make benchmarking much less useful? The motivation for unconditional execution is to replace the CPU-side poor-man fit (that is used for filtering quadruplets) with the real fit. On the other hand, the circle fit becoming now the second heaviest kernel, maybe we have/want to reconsider that. |
Hi Matti, let me try to summarise the chat with Felice, and explain my point of view here. There are (at least) two things going on here:
The goal of the former is to improve the physics performance, at a negligible (or at least affordable) increase in computing cost. Making the Riemann fit the default workflow goes in this direction. The goal of the latter is to show that a given algorithm can be implemented and offloaded to GPUs in a way that significantly reduces the overall cost at a fixed performance. Being able to measure the event processing throughput with the same inputs and outputs is a requirement here. Felice claims we should do former first, and use that as a baseline for the latter. My point is that I do not care much about the former, but I am quite invested in the latter, so I would rather not have the Patatrack code base in a situation where it cannot be measured. |
Thanks Andrea for the explanation. Allow me to ask one further clarification: in
with "it" do you refer to physics performance, technical performance, or both? |
Sorry for the confusion - I mean in a state where we cannot compare the
throughput on GPU vs CPU for a similar enough workflow.
|
Ok, thanks for the clarification. I'm fine with adding configuration flags, we could e.g. keep all the Riemann fit work in the .9 workflow, and leave .8 as it is with this respect. |
// ge[] ==> [(0,0), (0,1), (0,2), (1,1), (1,2), (2,2)] | ||
// So: | ||
// (i=0,j=0,l=1) ==> (x_0,y_0) | ||
// (i=1,j=2,l=0) ==> (z_1,x_1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand this line :-(
// So: | ||
// (i=0,j=0,l=1) ==> (x_0,y_0) | ||
// (i=1,j=2,l=0) ==> (z_1,x_1) | ||
auto ge_idx = j + l; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
wouldn't it be simpler to rewrite this as
auto ge_idx = j + l + (j > 0 and l > 0);
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
indeed
Validation summaryReference release CMSSW_10_2_4 at 6e8f631
|
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
Also, add back the stand-alone GPU fit test.
This supersedes #162