Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rieman fit in CA #169

Merged
merged 5 commits into from
Sep 25, 2018
Merged

Conversation

rovere
Copy link

@rovere rovere commented Sep 17, 2018

This supersedes #162

@cmsbot
Copy link

cmsbot commented Sep 17, 2018

A new Pull Request was created by @rovere (Marco Rovere) for CMSSW_10_2_X_Patatrack.

It involves the following packages:

RecoPixelVertexing/PixelTrackFitting
RecoPixelVertexing/PixelTriplets

@cmsbot, @fwyzard can you please review it and eventually sign? Thanks.

cms-bot commands are listed here

@fwyzard
Copy link

fwyzard commented Sep 17, 2018

Validation summary

Reference release CMSSW_10_2_4 at 6e8f631
Development branch CMSSW_10_2_X_Patatrack at 68a8252
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/4e971f246aca4f5c813907f6e6bd472f85dd7a2d/log .

@fwyzard
Copy link

fwyzard commented Sep 17, 2018

@rovere what changes do we expect running the usual workflows ?

@rovere
Copy link
Author

rovere commented Sep 17, 2018

none, since the products still live only on the GPU.

@fwyzard
Copy link

fwyzard commented Sep 17, 2018

Looks like this PR introduces a GPU memory leak:

CUDA-MEMCHECK
Leaked 3040000 bytes at 0x75ab80000
    Saved host backtrace up to driver entry point at cudaMalloc time
    Host Frame:/lib64/libcuda.so.1 (cuMemAlloc_v2 + 0x17f) [0x22bdaf]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 [0x2fdc3]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 [0x1196b]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 (cudaMalloc + 0x16f) [0x3fb2f]
    Host Frame:/data/user/fwyzard/patatrack/validation/run.dTOgaParzq/testing/lib/slc7_amd64_gcc700/pluginRecoPixelVertexingPixelTripletsPlugins.so (CAHitQuadrupletGeneratorGPU::allocateOnGPU() + 0x341) [0x27301]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libHeterogeneousCoreCUDACore.so (heterogeneous::GPUCuda::call_beginStreamGPUCuda(edm::StreamID) + 0x945) [0x6785]
    Host Frame:/data/user/fwyzard/patatrack/validation/run.dTOgaParzq/testing/lib/slc7_amd64_gcc700/pluginRecoPixelVertexingPixelTripletsPlugins.so [0x2ada6]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::Worker::beginStream(edm::StreamID, edm::StreamContext&) + 0x110) [0x23e900]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::WorkerManager::beginStream(edm::StreamID, edm::StreamContext&) + 0x32) [0x1e6b32]
    Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::EventProcessor::beginJob() + 0x1df) [0x20599f]
    Host Frame:cmsRun [0xead4]
    Host Frame:cmsRun (main + 0x162) [0xd2c2]
    Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22445]                                                                                                                                                               
    Host Frame:cmsRun [0xd598]                                                                                                                                                                                                     
                                                                                                                                                                                                                                   
LEAK SUMMARY: 3040000 bytes leaked in 1 allocations                                                                                                                                                                                
ERROR SUMMARY: 1 error

Can you have a look ?

@rovere
Copy link
Author

rovere commented Sep 17, 2018

I will

cudaFree(helix_fit_resultsGPU_) together with the other buffers
@cmsbot
Copy link

cmsbot commented Sep 20, 2018

Pull request #169 was updated. @cmsbot, @fwyzard can you please check and sign again.

@fwyzard
Copy link

fwyzard commented Sep 20, 2018

Validation summary

Reference release CMSSW_10_2_4 at 6e8f631
Development branch CMSSW_10_2_X_Patatrack at 68a8252
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/77083d81ae097aca74e2f94d0d787cbd6f4ab908/log .

Reported by cuda-memcheck --tool memcheck --leak-check full --report-api-errors all:
    CUDA-MEMCHECK
    Leaked 3040000 bytes at 0x75ab80000
        Saved host backtrace up to driver entry point at cudaMalloc time
        Host Frame:/lib64/libcuda.so.1 (cuMemAlloc_v2 + 0x17f) [0x22bdaf]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 [0x2fdc3]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 [0x1196b]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/external/slc7_amd64_gcc700/lib/libcudart.so.9.2 (cudaMalloc + 0x16f) [0x3fb2f]
        Host Frame:/data/user/fwyzard/patatrack/validation/run.ho9ucTUSKo/testing/lib/slc7_amd64_gcc700/pluginRecoPixelVertexingPixelTripletsPlugins.so (CAHitQuadrupletGeneratorGPU::allocateOnGPU() + 0x341) [0x27301]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libHeterogeneousCoreCUDACore.so (heterogeneous::GPUCuda::call_beginStreamGPUCuda(edm::StreamID) + 0x945) [0x6785]
        Host Frame:/data/user/fwyzard/patatrack/validation/run.ho9ucTUSKo/testing/lib/slc7_amd64_gcc700/pluginRecoPixelVertexingPixelTripletsPlugins.so [0x2ada6]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::Worker::beginStream(edm::StreamID, edm::StreamContext&) + 0x110) [0x23e900]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::WorkerManager::beginStream(edm::StreamID, edm::StreamContext&) + 0x32) [0x1e6b32]
        Host Frame:/data/cmssw/slc7_amd64_gcc700/cms/cmssw/CMSSW_10_2_4_Patatrack/lib/slc7_amd64_gcc700/libFWCoreFramework.so (edm::EventProcessor::beginJob() + 0x1df) [0x20599f]
        Host Frame:cmsRun [0xead4]
        Host Frame:cmsRun (main + 0x162) [0xd2c2]
        Host Frame:/lib64/libc.so.6 (__libc_start_main + 0xf5) [0x22445]
        Host Frame:cmsRun [0xd598]
    
    LEAK SUMMARY: 3040000 bytes leaked in 1 allocations
    ERROR SUMMARY: 1 error
@cmsbot
Copy link

cmsbot commented Sep 20, 2018

Pull request #169 was updated. @cmsbot, @fwyzard can you please check and sign again.

@fwyzard
Copy link

fwyzard commented Sep 20, 2018

Validation summary

Reference release CMSSW_10_2_4 at 6e8f631
Development branch CMSSW_10_2_X_Patatrack at 68a8252
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/9c24030e8a02ce6f92952dfad9a27a457485a4cf/log .

@fwyzard
Copy link

fwyzard commented Sep 20, 2018

No changes to the standard workflows (as expected).

For the 10824.8 TTbar workflow it adds to the profile:

GPU activities:    46.29%  298.56ms       100    2.99ms    1.65ms    6.28ms  gpuClustering::findClus(unsigned short const *, unsigned short const *, unsigned short const *, unsigned int const *, unsigned int*, unsigned int*, int*, int)
                   14.77%   95.24ms       100  952.39us  826.96us    1.06ms  gpuPixelRecHits::getHits(pixelCPEforGPU::ParamsOnGPU const *, float const *, unsigned short const *, unsigned short const *, unsigned short const *, unsigned short const *, unsigned int const *, unsigned int const *, unsigned int const *, int const *, int, unsigned int const *, int*, unsigned short*, float*, float*, float*, float*, short*, float*, float*, float*, float*, unsigned short*, unsigned short*)
                   14.18%   91.48ms       100  914.84us  541.64us    1.41ms  gpuPixelDoublets::getDoubletsFromHisto(GPUCACell*, unsigned int*, siPixelRecHitsHeterogeneousProduct::HitsOnGPU const *, GPU::VecArray<unsigned int, int=256>*)
                    9.48%   61.12ms       100  611.23us  139.23us    1.76ms  kernel_connect(GPU::SimpleVector<Quadruplet>*, GPUCACell*, unsigned int const *, GPU::VecArray<unsigned int, int=256>*, float, float, float, float, float, unsigned int, unsigned int)
                    8.64%   55.74ms      1300   42.88us    1.15us  919.89us  [CUDA memcpy DtoH]
                    3.88%   25.05ms       100  250.50us  99.457us  510.38us  kernel_find_ntuplets(GPUCACell*, unsigned int const *, GPU::SimpleVector<Quadruplet>*, unsigned int, unsigned int)
                    0.85%    5.47ms       100   54.74us   27.81us   96.19us  kernel_checkOverflows(GPU::SimpleVector<Quadruplet>*, GPUCACell*, unsigned int const *, GPU::VecArray<unsigned int, int=256>*, unsigned int, unsigned int)

the new kernels:

GPU activities:    33.74%  296.57ms       100    2.97ms    1.68ms    6.54ms  gpuClustering::findClus(unsigned short const *, unsigned short const *, unsigned short const *, unsigned int const *, unsigned int*, unsigned int*, int*, int)
                   19.47%  171.18ms       100    1.71ms    1.64ms    1.78ms  KernelCircleFitAllHits(GPU::SimpleVector<Quadruplet>*, int, float, Rfit::helix_fit*, Eigen::Matrix<double, int=3, int=-1, int=0, int=3, int=4>*, Eigen::Matrix<double, int=-1, int=-1, int=0, int=12, int=12>*, Rfit::circle_fit*, Eigen::Matrix<double, int=4, int=1, int=0, int=4, int=1>*, Rfit::line_fit*)
                   10.79%   94.82ms       100  948.18us  829.39us    1.03ms  gpuPixelRecHits::getHits(pixelCPEforGPU::ParamsOnGPU const *, float const *, unsigned short const *, unsigned short const *, unsigned short const *, unsigned short const *, unsigned int const *, unsigned int const *, unsigned int const *, int const *, int, unsigned int const *, int*, unsigned short*, float*, float*, float*, float*, short*, float*, float*, float*, float*, unsigned short*, unsigned short*)
                   10.42%   91.56ms       100  915.60us  541.90us    1.43ms  gpuPixelDoublets::getDoubletsFromHisto(GPUCACell*, unsigned int*, siPixelRecHitsHeterogeneousProduct::HitsOnGPU const *, GPU::VecArray<unsigned int, int=256>*)
                    6.95%   61.12ms       100  611.20us  140.52us    1.74ms  kernel_connect(GPU::SimpleVector<Quadruplet>*, GPUCACell*, unsigned int const *, GPU::VecArray<unsigned int, int=256>*, float, float, float, float, float, unsigned int, unsigned int)
                    6.22%   54.68ms      1300   42.06us    1.12us  877.39us  [CUDA memcpy DtoH]
                    5.98%   52.59ms       100  525.89us  511.46us  549.96us  KernelLineFitAllHits(GPU::SimpleVector<Quadruplet>*, float, Rfit::helix_fit*, Eigen::Matrix<double, int=3, int=-1, int=0, int=3, int=4>*, Eigen::Matrix<double, int=-1, int=-1, int=0, int=12, int=12>*, Rfit::circle_fit*, Eigen::Matrix<double, int=4, int=1, int=0, int=4, int=1>*, Rfit::line_fit*)
                    2.85%   25.08ms       100  250.83us   98.75us  513.80us  kernel_find_ntuplets(GPUCACell*, unsigned int const *, GPU::SimpleVector<Quadruplet>*, unsigned int, unsigned int)
                    1.59%   13.96ms       100  139.61us  135.65us  144.58us  KernelFastFitAllHits(GPU::SimpleVector<Quadruplet>*, siPixelRecHitsHeterogeneousProduct::HitsOnGPU const *, int, float, Rfit::helix_fit*, Eigen::Matrix<double, int=3, int=-1, int=0, int=3, int=4>*, Eigen::Matrix<double, int=-1, int=-1, int=0, int=12, int=12>*, Rfit::circle_fit*, Eigen::Matrix<double, int=4, int=1, int=0, int=4, int=1>*, Rfit::line_fit*)
                    0.62%    5.47ms       100   54.74us   27.87us   96.58us  kernel_checkOverflows(GPU::SimpleVector<Quadruplet>*, GPUCACell*, unsigned int const *, GPU::VecArray<unsigned int, int=256>*, unsigned int, unsigned int)

@fwyzard
Copy link

fwyzard commented Sep 20, 2018

@rovere @felicepantaleo do we want to have a flag to turn on the Riemann fit, or run it unconditionally ?
The latter makes benchmarking much less useful.

@makortel
Copy link

@fwyzard Could you elaborate in what ways does running the Riemann fit unconditionally make benchmarking much less useful?

The motivation for unconditional execution is to replace the CPU-side poor-man fit (that is used for filtering quadruplets) with the real fit. On the other hand, the circle fit becoming now the second heaviest kernel, maybe we have/want to reconsider that.

@fwyzard
Copy link

fwyzard commented Sep 20, 2018

Hi Matti, let me try to summarise the chat with Felice, and explain my point of view here.

There are (at least) two things going on here:

  • the development of a better-performing algorithm for the pixel tracks
  • the porting of the pixel track reconstruction to GPU

The goal of the former is to improve the physics performance, at a negligible (or at least affordable) increase in computing cost. Making the Riemann fit the default workflow goes in this direction.

The goal of the latter is to show that a given algorithm can be implemented and offloaded to GPUs in a way that significantly reduces the overall cost at a fixed performance. Being able to measure the event processing throughput with the same inputs and outputs is a requirement here.

Felice claims we should do former first, and use that as a baseline for the latter.

My point is that I do not care much about the former, but I am quite invested in the latter, so I would rather not have the Patatrack code base in a situation where it cannot be measured.

@makortel
Copy link

Thanks Andrea for the explanation. Allow me to ask one further clarification: in

I would rather not have the Patatrack code base in a situation where it cannot be measured.

with "it" do you refer to physics performance, technical performance, or both?

@fwyzard
Copy link

fwyzard commented Sep 20, 2018 via email

@makortel
Copy link

Ok, thanks for the clarification. I'm fine with adding configuration flags, we could e.g. keep all the Riemann fit work in the .9 workflow, and leave .8 as it is with this respect.

// ge[] ==> [(0,0), (0,1), (0,2), (1,1), (1,2), (2,2)]
// So:
// (i=0,j=0,l=1) ==> (x_0,y_0)
// (i=1,j=2,l=0) ==> (z_1,x_1)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand this line :-(

// So:
// (i=0,j=0,l=1) ==> (x_0,y_0)
// (i=1,j=2,l=0) ==> (z_1,x_1)
auto ge_idx = j + l;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't it be simpler to rewrite this as

auto ge_idx = j + l + (j > 0 and l > 0);

?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indeed

@fwyzard
Copy link

fwyzard commented Sep 25, 2018

Validation summary

Reference release CMSSW_10_2_4 at 6e8f631
Development branch CMSSW_10_2_X_Patatrack at 68a8252
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

DQM GUI plots

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_2_2-PU25ns_102X_upgrade2018_realistic_v11-v2/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_2_2-102X_upgrade2018_realistic_v11-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/8d566a17edb8190cbbb6defc0ac884e27cf53be7/log .

@fwyzard fwyzard merged commit a64cea6 into cms-patatrack:CMSSW_10_2_X_Patatrack Sep 25, 2018
@fwyzard fwyzard added this to the CMSSW_10_2_5_Patatrack milestone Sep 25, 2018
fwyzard pushed a commit that referenced this pull request Oct 8, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Oct 20, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Oct 23, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Nov 6, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Nov 16, 2020
Also, add back the stand-alone GPU fit test.
fwyzard added a commit that referenced this pull request Nov 27, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Dec 26, 2020
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Jan 15, 2021
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Mar 23, 2021
Also, add back the stand-alone GPU fit test.
fwyzard pushed a commit that referenced this pull request Apr 1, 2021
Also, add back the stand-alone GPU fit test.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants