
Inference Speedup #18

Open
pkulium opened this issue Jul 18, 2023 · 3 comments

Comments


pkulium commented Jul 18, 2023

Great work!
I am trying to speed up inference. Could you please share the code for achieving inference speedups with 2:4 sparsity on Ampere GPUs? Thanks!

@efrantar (Member)

Hi, as also discussed in #15 (see there for some more details), the layer-wise 2:4 inference speedups we report were produced directly with NVIDIA's CUTLASS profiler using their prebuilt kernels; no custom code from our side was involved.
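For readers unfamiliar with the pattern being discussed: 2:4 (semi-structured) sparsity means that in every contiguous group of four weights, at most two are nonzero, which is the layout Ampere's Sparse Tensor Cores can accelerate. Below is a minimal pure-Python sketch of magnitude-based 2:4 pruning, purely illustrative (the function name is made up for this example); actual speedups require running the pruned matrices through sparse kernels such as the prebuilt CUTLASS ones mentioned above.

```python
def prune_2_4(row):
    """Zero out the two smallest-magnitude values in every group of four,
    yielding the 2:4 semi-structured sparsity pattern."""
    assert len(row) % 4 == 0, "row length must be a multiple of 4"
    out = []
    for i in range(0, len(row), 4):
        group = row[i:i + 4]
        # indices of the two largest-magnitude entries in this group
        keep = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        out.extend(v if j in keep else 0.0 for j, v in enumerate(group))
    return out

# Example: each group of 4 retains its 2 largest-magnitude entries.
print(prune_2_4([0.9, -0.1, 0.4, 0.05, -2.0, 0.3, 0.2, 1.5]))
# → [0.9, 0.0, 0.4, 0.0, -2.0, 0.0, 0.0, 1.5]
```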


pkulium commented Jul 24, 2023

Thanks for the info!

@Hongbosherlock

> Hi, as also discussed in #15 (see there for some more details), the layer-wise 2:4 inference speedups we report were produced directly with NVIDIA's CUTLASS profiler using their prebuilt kernels; no custom code from our side was involved.

Have you tried cuSPARSE? Is it easier to use and more effective than CUTLASS?
