Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prototype for module-internal chain of tasks #363

Merged

Conversation

makortel
Copy link

@makortel makortel commented Jul 1, 2019

PR description:

This PR adds helpers for a chain of TBB tasks internal to the ExternalWork module. These helpers can be used to implement e.g. a chain of tasks that alternate CPU and GPU work, or a loop of GPU work where the decision of continuing is done in the CPU. This PR also adds CUDAScopedContextAnalyze mentioned in #355 (comment).

In short, the CUDAScopedContextAcquire can be used to queue a "next task" instead of the produce() with

void ...::acquire(...) {
  CUDAScopedContextAcquire ctx{...};

  ctx.insertNextTask([this](WaitingTaskWithArenaHolder h) {
    CUDAScopedContextTask ctx{this->ctxState_, std::move(h)};
     ...
   }):
}

The commit 203cd7d experiments with the interface by providing directly the CUDAScopedContextTask to the user at the cost of interface inconsistency between acquire() and the function

void ...::acquire(...) {
  CUDAScopedContextAcquire ctx{...};

  ctx.insertNextTask([](CUDAScopedContextTask ctx) {
     ...
   }):
}

I would like to hear opinions of others which one would you prefer.

PR validation:

Added test runs.

@makortel
Copy link
Author

makortel commented Jul 1, 2019

@vkhristenko this is the PR I mentioned

@makortel
Copy link
Author

makortel commented Jul 1, 2019

@VinInn this mechanism can be used also for "re-running a producer with more memory when initial memory was not sufficient"

holder.doneWaiting(*excptr);
return;
}
func();
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit unhappy about the replication of WaitingTaskWithArenaHolder here, especially since the code-to-be-copied would be only 4 lines.

@fwyzard
Copy link

fwyzard commented Jul 10, 2019

Validation summary

Reference release CMSSW_10_6_0 at b45186e
Development branch CMSSW_10_6_X_Patatrack at f340840
Testing PRs:

makeTrackValidationPlots.py plots

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

  • tracking validation plots and summary for workflow 10824.5
  • tracking validation plots and summary for workflow 10824.51
  • tracking validation plots and summary for workflow 10824.52

logs and nvprof/nvvp profiles

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValZMM_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_realistic_v4-v1/GEN-SIM-DIGI-RAW

/RelValTTbar_13/CMSSW_10_6_0-PU25ns_106X_upgrade2018_design_v3-v1/GEN-SIM-DIGI-RAW

Logs

The full log is available at https://fwyzard.web.cern.ch/fwyzard/patatrack/pulls/125cfeca1ba29106d05e2d5c98456f59118c5442/log .

@fwyzard
Copy link

fwyzard commented Sep 9, 2019

About the insertNextTask() syntax, I prefer the second variant, as it is leaner and avoids repeating the same couple of lines.

@makortel
Copy link
Author

makortel commented Sep 9, 2019

Couple of items from a chat with @fwyzard

  • Change the order insertNextTask() and CUDA async work launches to be more intuitive (the order doesn't really matter)
  • Make the LIFO stack property of the "next task" more clear in the interface (insertNextTask() -> pushNextTask()) and examples
  • Make it clear in the documentation that the caller of acquire()/lambda has a copy of the WaitingTaskWithArenaHolder guaranteeing that the next task will be run only after the current one has finished

@makortel
Copy link
Author

makortel commented Sep 9, 2019

@fwyzard I've addressed the points from our discussion (#363 (comment)), could you take a look again?

@fwyzard
Copy link

fwyzard commented Sep 9, 2019

Thanks, the latest set of changes make it clear what the behaviour of the tasks should be.

@fwyzard
Copy link

fwyzard commented Sep 9, 2019

The only aspect that it's not clear to me is why

Note that the CUDAService is not available (nor is any other service) in these intermediate tasks.

I thought the Services were available to the underlying thread, not to the individual task ?

@fwyzard fwyzard merged commit 703ee96 into cms-patatrack:CMSSW_10_6_X_Patatrack Sep 10, 2019
@makortel
Copy link
Author

@fwyzard Services are a rather complex subsystem, approximately requiring a valid ServiceToken on the thread calling Services. The framework holds a ServiceToken during the execution of any module methods, but such token does not exist on arbitrary TBB tasks and thus Services can not be used.

Here I decided to be lazy and not support Services in these tasks (and see how far we get with it).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants