Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce OptiX direct callable API that owns groupdata buffer #1683

Merged
merged 7 commits into from
Jun 14, 2023

Conversation

chellmuth
Copy link
Collaborator

@chellmuth chellmuth commented May 12, 2023

Description

On gpu, the renderer currently owns a shader's groupdata params buffer so that it can pass its pointer to both the init and entry functions.

The downside of this approach is that in order to avoid dynamic memory allocation, the renderer must commit to a buffer size at build time. Using a conservative size to accommodate potentially large shaders can lead to gigabytes of unnecessary memory footprint.

This patch adds a new OptiX direct callable alternative to the existing init and entry callables. This new function allocates a perfectly-sized groupdata params buffer, so that the gpu only pays for the memory it needs, and then it calls init and entry.

Particularly large shaders can require larger buffers than there is space on the cuda stack. To handle this case, there is a new option, "max_optix_groupdata_alloc". Any shader requiring a groupdata buffer larger than this value will not allocate its own buffer, and instead use the pointer passed in by the renderer (presumably coming from a global memory pool).

Tests

All testshade tests run with the existing api, and again with the new fused api.

Checklist:

  • I have read the contribution guidelines.
  • I have previously submitted a Contributor License Agreement.
  • I have updated the documentation, if applicable.
  • I have ensured that the change is tested somewhere in the testsuite (adding new test cases if necessary).
  • My code follows the prevailing code style of this project.

@tgrant-nv
Copy link
Contributor

This looks very cool, Chris. I'm still kicking the tires, but it seems like a great step forward.

@tgrant-nv
Copy link
Contributor

I see that the fused group function and the separate init/group functions all make it into the generated PTX. If you know in advance that you will or won't be using the fused function, it might be a good idea to drop the unused entry points. It would certainly make the PTX a lot smaller, and could save time in codegen and optimization.

Signed-off-by: Chris Hellmuth <chellmuth@gmail.com>
@chellmuth
Copy link
Collaborator Author

I see that the fused group function and the separate init/group functions all make it into the generated PTX. If you know in advance that you will or won't be using the fused function, it might be a good idea to drop the unused entry points. It would certainly make the PTX a lot smaller, and could save time in codegen and optimization.

I'm happy to add an option that lets the user specify which callable(s) they're using, so we can remove the other. For what it's worth, in my tests the wrapper callables are just a couple dozen lines and keeping them all doesn't appear to affect codegen times.

@chellmuth
Copy link
Collaborator Author

Latest push rebases on the recent reparameter work, and switches optix-testrender from the current api to the single-callable api.

Signed-off-by: Chris Hellmuth <chellmuth@gmail.com>
Signed-off-by: Chris Hellmuth <chellmuth@gmail.com>
Copy link
Collaborator

@lgritz lgritz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I noted two places where you don't really need to construct a std::string, that's just a minor preference.

Could you please edit oslexec.h to add max_optix_groupdata_alloc to the documentation of attribute() that lists all of the attributes that one is allowed to set or query?

Is this ready to go, everybody ok with it?

src/liboslexec/llvm_instance.cpp Outdated Show resolved Hide resolved
src/liboslexec/llvm_instance.cpp Outdated Show resolved Hide resolved
@lgritz
Copy link
Collaborator

lgritz commented Jun 14, 2023

Code looks fine, but can you please document the new attribute name you added, in oslexec.h where all the others are documented? Thanks.

Signed-off-by: Chris Hellmuth <chellmuth@gmail.com>
@chellmuth
Copy link
Collaborator Author

Ah yeah, you caught me pushing commits only partially addressing your feedback :) Latest commit adds the attribute documentation.

@lgritz lgritz merged commit 1956e83 into AcademySoftwareFoundation:main Jun 14, 2023
chellmuth pushed a commit to chellmuth/OpenShadingLanguage that referenced this pull request Sep 6, 2024
* fix: Have ReParameter only copy data when it changes (AcademySoftwareFoundation#1698)
* gpu: OptiX direct callable API that owns groupdata buffer (AcademySoftwareFoundation#1683)

See merge request spi/dev/3rd-party/osl-feedstock!51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants