
[Feature] Examples of how the llm pldi24-artifacts are generated #165

Open · bibo-msft opened this issue on Jul 9, 2024 · 1 comment
Labels: enhancement (New feature or request)

Comments

@bibo-msft (Contributor)

Is your feature request related to a problem? Please describe.
I am trying to retarget the llm artifacts to my own FPGA board. I'd like to regenerate the HLS code to try more aggressive quantization schemes.

Describe the solution you'd like
Please add some small examples of advanced optimization techniques that are used in the pldi24-artifact repo.

  • Mixed precision input/output for GEMM
  • Mixed precision activation/weight for GEMM
  • Mixed precision input/output for Softmax/Layernorm/Residual
  • Low-bit packing input/output for GEMM/Softmax

Additional context
For example, the softmax operator requires the same fp32 datatype for both input and output. However, there is a mixed-precision HLS implementation with input/output packing in the artifact code here. I searched the Allo repo and could not find a reference for how to generate such code.
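To make the request concrete, here is a hypothetical sketch of the kind of kernel I would like to generate: a softmax whose input is fp32 but whose output is a narrower fixed-point type. This is my own illustration, not code from the artifact; it assumes a recent Allo build with `allo.exp` and fixed-point types, and the shape and `Fixed(8, 6)` output are placeholders:

```python
import allo
from allo.ir.types import float32, Fixed

# Hypothetical mixed-precision softmax: fp32 input, 8-bit fixed-point output.
# The 1-D shape and the Fixed(8, 6) output type are illustrative only.
def softmax(X: float32[64]) -> Fixed(8, 6)[64]:
    Z: float32[64]
    Y: Fixed(8, 6)[64]
    total: float32 = 0.0
    for i in allo.grid(64):
        Z[i] = allo.exp(X[i])
        total += Z[i]
    for i in allo.grid(64):
        Y[i] = Z[i] / total  # cast down to the fixed-point output type
    return Y

s = allo.customize(softmax)
print(s.build(target="vhls"))  # inspect the generated HLS C++
```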

bibo-msft added the enhancement (New feature or request) label on Jul 9, 2024
@chhzh123 (Member) commented on Jul 9, 2024

Hi @bibo-msft, thanks for raising the issue! The PLDI'24 artifact was not purely generated by Allo. There are some manual hacks in the kernels, and we are still automating the process.

Currently, we have a script for generating the Transformer kernels. Please check out this page for the instructions. This test case also shows a low-bit packing example of GEMM. You can change the bitwidths in the type parameters to generate different GEMM kernels.
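As a quick illustration (a minimal sketch, not the exact artifact script; the shapes and bitwidths below are placeholders), a low-bit GEMM in Allo differs from an fp32 one only in its type annotations, so regenerating a kernel for a different quantization scheme amounts to editing the types and rebuilding:

```python
import allo
from allo.ir.types import int4, int32

M, N, K = 32, 32, 32

# int4 inputs with an int32 accumulator; swap the types to explore other
# quantization schemes (e.g. int8 inputs with int16 accumulation).
def gemm(A: int4[M, K], B: int4[K, N]) -> int32[M, N]:
    C: int32[M, N] = 0
    for i, j, k in allo.grid(M, N, K):
        C[i, j] += A[i, k] * B[k, j]
    return C

s = allo.customize(gemm)
print(s.build(target="vhls"))  # emit Vivado/Vitis HLS C++
```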

We will provide additional examples of mixed precision kernels soon and will notify you once they are available. Please feel free to share any other suggestions you may have. Thank you!
