Support llama-3 #789
Hi, I tried llama-3 and maybe you can use this setup. First add a prompt template for llama-3 in the template file, then add an option for choosing llama-3 in localGPT (a sketch of what this could look like is below). You can then run it; here is the model I used for testing.
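For reference, a minimal sketch of such a template, assuming localGPT keeps its templates in prompt_template_utils.py and builds them with LangChain's PromptTemplate (the file, function, and variable names here are illustrative, not the project's actual API); the token layout itself follows Meta's published Llama-3 chat format:

```python
# prompt_template_utils.py (sketch) -- names are assumptions, adapt to the repo.
from langchain.prompts import PromptTemplate

def get_llama3_prompt_template(system_prompt: str) -> PromptTemplate:
    """Wrap the RAG context and question in Llama-3 chat special tokens."""
    template = (
        "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
        + system_prompt
        + "<|eot_id|><|start_header_id|>user<|end_header_id|>\n\n"
          "Context: {context}\nUser: {question}<|eot_id|>"
          "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )
    return PromptTemplate(input_variables=["context", "question"], template=template)
```

Wiring it up would then mostly be a matter of adding a llama3 branch wherever localGPT selects the prompt template, which is the "option for choosing llama-3" mentioned above.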
@toomy0toons did you upgrade the llama-cpp or transformers version to make this work with llama-3?
I did install it: I have a CUDA GPU, so I installed the cuBLAS version. Other than that, I did not install or upgrade anything beyond the official instructions; it works out of the box, but since …
Since llama-2 is probably not going to be used much anymore, I think I will update the prompt template for llama-3 as the default template.
@toomy0toons I tried with another version (QuantFactory/Meta-Llama-3-8B-GGUF) and it didn't work.
Hi, I have downloaded the llama-3 70B model. Can someone provide the steps to convert it into a Hugging Face model and then run it in localGPT? I have done the same for the llama 70B model and that works, but I am not able to convert the llama-3 model files to the .hf format. Proper steps for doing this would be appreciated. Thank you.
Hi @toomy0toons, I'm trying to do the same but having some issues, as per #793.
My understanding is that the instruct model (8B) has an extra set of tokens or a different prompt template. Try the 7B models?
There are no 7B models for llama-3 (https://adithyask.medium.com/from-7b-to-8b-parameters-understanding-weight-matrix-changes-in-llama-transformer-models-31ea7ed5fd88). Do you mean none of the embedding models in constants.py are OK for running any of the llama-3 8B models?
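The embedding model used for retrieval is independent of the chat LLM, so the existing embedding settings can stay as they are; only the LLM entries in constants.py need to point at a Llama-3 GGUF. A rough sketch, assuming the repo's MODEL_ID / MODEL_BASENAME convention (the exact Hugging Face repo and file names below are illustrative and should be verified on the Hub):

```python
# constants.py (sketch) -- only the LLM selection changes; the embedding
# model settings are unrelated to which chat model gets loaded.
MODEL_ID = "QuantFactory/Meta-Llama-3-8B-Instruct-GGUF"   # hypothetical example
MODEL_BASENAME = "Meta-Llama-3-8B-Instruct.Q4_K_M.gguf"   # hypothetical example
```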
@toomy0toons found out the answer here https://youtu.be/S6PdFPoteBU?si=pSsxCNFJsz_dxn8b&t=551 |
@PromtEngineer For info, the video you posted (https://www.youtube.com/watch?v=S6PdFPoteBU&t=549s) mentioned …
Let's resolve the mystery here. The cause is the underlying llama.cpp library (EOS handling for instruct models). Some of the related discussion can be seen here: ggerganov/llama.cpp#6745 (comment), and there are later changes to support the llama-3 EOS token, e.g. https://github.com/ggerganov/llama.cpp/pull/6751/files. In particular, this llama.cpp commit is what we are looking for: ggerganov/llama.cpp@7370d66. It was released on April 21 (https://github.com/ggerganov/llama.cpp/tree/b2707), but for llama-cpp-python we need a version that includes at least that llama.cpp patch.
We can see the patch was included in llama-cpp-python for versions greater than 0.2.62 (this can be verified using …). In summary, we only need to make sure the llama-cpp-python version is greater than 0.2.62 to run the quantized llama-3 model.
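A quick local check is sketched below; it assumes llama-cpp-python exposes llama_cpp.__version__ (true for recent releases, but treat it as an assumption):

```python
# Sanity check (sketch): verify llama-cpp-python is newer than 0.2.62, i.e. it
# includes the llama.cpp fix for Llama-3's <|eot_id|> end-of-turn token.
import llama_cpp

base_version = llama_cpp.__version__.split("+")[0]   # drop local tags like "+cu121"
installed = tuple(int(part) for part in base_version.split(".")[:3])

if installed <= (0, 2, 62):
    print(f"llama-cpp-python {llama_cpp.__version__} is too old; upgrade past 0.2.62")
else:
    print(f"llama-cpp-python {llama_cpp.__version__} should handle quantized llama-3 models")
```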
Created an associated pull request to strengthen the docs: #823. No particular code changes are needed on our end in this case, but we need to clarify the llama-cpp-python version required to support llama-3.
Fix #789: Update README with instructions for running the quantized L…
I want to deploy the application. I have the infrastructure, but how do I deploy the LLM so that multiple users can access it? Please provide me the steps to do it.
…On Sat, 21 Sep 2024 at 4:56 AM, PromptEngineer wrote: Closed #789 as completed via b4322d4.
Hi
Please add support for llama-3
Currently the prompt template is not compatible, since llama-3 uses a different style.
Ref: https://llama.meta.com/docs/model-cards-and-prompt-formats/meta-llama-3
As things stand, I was unable to use the llama-3 model (a rough comparison of the two formats is sketched below).
Thanks in advance!
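For reference, a rough side-by-side of the two chat formats, sketched from Meta's published model cards (the Llama-2 layout is what the default template appears to target at the time of this issue):

```python
# Llama-2 chat layout (what the existing default template is written for):
LLAMA2_STYLE = (
    "<s>[INST] <<SYS>>\n{system_prompt}\n<</SYS>>\n\n"
    "{user_message} [/INST]"
)

# Llama-3 chat layout (new header/end-of-turn special tokens):
LLAMA3_STYLE = (
    "<|begin_of_text|>"
    "<|start_header_id|>system<|end_header_id|>\n\n{system_prompt}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n{user_message}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)
```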