Issues: mit-han-lab/llm-awq
#223: Feature 'ldmatrix' requires target sm_75 or higher when building awq_inference_engine on Tesla V100 (opened Oct 5, 2024 by ShobhaRajanna)
#222: AttributeError: 'LlamaConfig' object has no attribute 'rope_theta' (opened Sep 30, 2024 by lvtao65535)
#220: Unsupported NVHPC compiler found. nvc++ is the only NVHPC compiler (opened Sep 17, 2024 by SimWangArizona)
#219: "Expected all tensors to be on the same device" when running "Perform AWQ search" on Llama3 (opened Sep 10, 2024 by charlesyju)
#216: Batch Processing not implemented for LlavaStreamGenerator (opened Aug 12, 2024 by rahulthakur319)
#214: NotImplementedError: <class 'transformers_modules.modeling_chatglm.ChatGLMForConditionalGeneration'> (opened Aug 8, 2024 by lihaofd)
#204: Add support for GPUs with compute capability lower than 8.0 for awq/kernels installation (opened Jul 3, 2024 by rahulthakur319)