Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Memory policy #983

Draft
wants to merge 4 commits into
base: master
Choose a base branch
from
Draft

Memory policy #983

wants to merge 4 commits into from

Conversation

nksauter
Copy link
Contributor

@nksauter nksauter commented Apr 9, 2024

I've created a failing unit test for new kokkos code in the cuda-context.

If the test "libtbx.python cctbx_project/simtbx/tests/tst_memory_policy.py" can be made to work, the bug has been fixed.

Background information: I'm trying to extend the kokkos exascale_api with new behavior. In the old way, it would allocate large arrays on GPU corresponding to the detector size even if only a few pixels are calculated due to the whitelist (a list of pixels of interest). In the new behavior only enough memory is allocated for the whitelist pixels. I am using C++ templates with template specialization dependent on these two memory-allocation cases. I need this to work so Daniel can move ahead with the SPREAD project, and it is just this last detail that I cannot seem to fix.

@dermen
Copy link
Contributor

dermen commented May 9, 2024

@nksauter , just making sure I see where we are: in the latest commit, the test works, but there is duplicated code which we want to avoid ?

@nksauter
Copy link
Contributor Author

nksauter commented May 9, 2024 via email

Copy link
Contributor

@Baharis Baharis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After 3 weeks of testing via psii_spread's annulus worker, this PR was indeed found to significantly lower the memory footprint when simulating images on GPU. I don't have particular issues with the functionality of this code.

In the exascale_api, allow pixel values to be calculation either on
large array (all pixels), or with low-memory on just the whitelist
consisting of shoebox pixels.  This commit only gives the polymorphism
framework; both implementations are currently identical giving the
large-array behavior.
The script tests/tst_memory_policy.py fails with a cuda illegal access.
The intention is to get help from NESAP to get a functional test.
Memory savings achieved through code specialization, for the case where
pixel values are simulated on a small whitelist. Specializations are not
yet optimal, as there is still a lot of code duplication.

Changes give ~4.5x reduction in memory footprint, but no success yet in
resizing the array m_accumulate_floatimage.  Attempts so far lead to
cuda memory allocation error.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants