
Inference Speedup #18

Open
pkulium opened this issue Jul 18, 2023 · 3 comments

Comments


pkulium commented Jul 18, 2023

Great work!
I am trying to speed up inference. Could you please share the code for achieving inference speedups with 2:4 sparsity on Ampere GPUs? Thanks!

@efrantar (Member)

Hi, as also discussed in #15 (see there for some more details), the layer-wise 2:4 inference speedups we report were produced directly with NVIDIA's CUTLASS profiler using their prebuilt kernels; no custom code from our side was involved.
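For readers unfamiliar with the pattern being discussed: 2:4 (semi-structured) sparsity means that in every contiguous group of four weights, at most two are nonzero, which is the layout Ampere's Sparse Tensor Cores can accelerate. Below is a minimal pure-Python sketch of magnitude-based 2:4 pruning, purely illustrative (the function name is made up for this example); actual speedups require running the pruned matrices through sparse kernels such as the prebuilt CUTLASS ones mentioned above.

```python
def prune_2_4(row):
    """Zero out the two smallest-magnitude values in every group of four,
    yielding the 2:4 semi-structured sparsity pattern."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

# Example: each group of 4 retains its 2 largest-magnitude entries.
print(prune_2_4([0.9, -0.1, 0.4, 0.05, -2.0, 0.3, 0.2, 1.5]))
# → [0.9, 0.0, 0.4, 0.0, -2.0, 0.0, 0.0, 1.5]
```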


pkulium commented Jul 24, 2023

Thanks for the info!

@Hongbosherlock

> Hi, as also discussed in #15 (see there for some more details), the layer-wise 2:4 inference speedups we report were produced directly with NVIDIA's CUTLASS profiler using their prebuilt kernels; no custom code from our side was involved.

Have you tried cuSPARSE? Is it easier to use and more effective than CUTLASS?
