llama3 instruct and chat system prompts #950
Conversation
Elron, the PR looks fine. Please add the json files and it will allow the consistency test to pass.
To be consistent, the system prompt needs to be defined outside the format.
prepare/formats/models/llama3.py (outdated diff)

```python
    "{instruction}{source}<|eot_id|><|start_header_id|>assistant<|end_header_id|>\n\n"
    "{target_prefix}{target}<|eot_id|>",
    model_input_format="<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{DEFAULT_SYSTEM_PROMPT}<|eot_id|>{demos}<|start_header_id|>user<|end_header_id|>"
```
system_prompt should be taken from `{system_prompt}`, so it can be configured from the outside.
@yoavkatz yes, this is how it is for `llama3_instruct`. For `llama3_chat`, it was defined without a system prompt in the past, and according to the guidelines that is not a good idea. So I thought it makes sense to add a default system prompt, so if the user does not define one, we at least have a generic one. Does this make sense?
Should we just put the system prompt there instead of using this `DEFAULT_SYSTEM_PROMPT` constant? I tested it on my side, and it worked, but I am not sure if it will work if unitxt is installed as a library.
If the recommended approach is to have a system prompt, then llama3_chat should have a system prompt, but this should be configurable via `{system_prompt}`.
By default, it should be `format=formats.llama3_chat, system_prompt=system_prompt.llama3`.
The users can always pass `system_prompt.empty` if they don't want a system prompt.
If there is any need to format without the system special token, there should be a format which is `llama3_chat_without_system_prompt`.
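The idea above can be sketched with plain string formatting: a format template keeps a `{system_prompt}` slot that is filled from the outside, so callers can pass their own prompt or an empty one. The helper names here are illustrative, not unitxt's actual API.

```python
# Illustrative sketch (not unitxt code): a format string with a
# {system_prompt} placeholder that is configured from the outside.
MODEL_INPUT_FORMAT = (
    "<|begin_of_text|><|start_header_id|>system<|end_header_id|>\n\n"
    "{system_prompt}<|eot_id|>"
    "<|start_header_id|>user<|end_header_id|>\n\n"
    "{source}<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\n"
)

def render(source: str, system_prompt: str = "You are a helpful assistant.") -> str:
    """Fill the template; passing an empty string mimics an 'empty' system prompt."""
    return MODEL_INPUT_FORMAT.format(system_prompt=system_prompt, source=source)

default = render("What is the capital of Texas?")
custom = render("What is the capital of Texas?", system_prompt="Answer in one word.")
```

The format itself stays fixed; only the catalog entry passed as the system prompt changes.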
BTW, formats.llama3_chat_with_system_prompt was used in the internal project that uses unitxt.
I've added the instruction and default behavior to the system prompt. I created, under `examples`, simple code that we can all run and review the output: `python examples/evaluate_different_formats.py`
source (str):

```
<|begin_of_text|><|start_header_id|>system<|end_header_id|>

You are a helpful, respectful and honest assistant. Always answer as helpfully as possible, while being safe. Your answers should not include any harmful, unethical, racist, sexist, toxic, dangerous, or illegal content. Please ensure that your responses are socially unbiased and positive in nature.
If a question does not make any sense, or is not factually coherent, explain why instead of answering something not correct. If you don't know the answer to a question, please don't share false information.
Answer the following questions in one word.<|eot_id|><|start_header_id|>user<|end_header_id|>

Where is Spain?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

Europe<|eot_id|><|start_header_id|>user<|end_header_id|>

When was IBM founded?<|eot_id|><|start_header_id|>assistant<|end_header_id|>

1911<|eot_id|><|start_header_id|>user<|end_header_id|>

What is the capital of Texas?<|eot_id|><|start_header_id|>assistant<|end_header_id|>
```
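The layout of the output above can be sketched as follows: a system turn first, then each demo as its own user/assistant pair, then the final question left open for the model to complete. The helper names are illustrative, not unitxt code.

```python
# Sketch (illustrative, not unitxt internals) of how the llama3 chat prompt
# above is assembled from a system prompt, demos, and a final question.
def turn(role: str, text: str) -> str:
    # One llama3 chat turn, closed with the end-of-turn token.
    return f"<|start_header_id|>{role}<|end_header_id|>\n\n{text}<|eot_id|>"

def build_prompt(system: str, demos: list, question: str) -> str:
    parts = ["<|begin_of_text|>", turn("system", system)]
    for q, a in demos:  # each demo becomes a user/assistant pair
        parts.append(turn("user", q))
        parts.append(turn("assistant", a))
    parts.append(turn("user", question))
    # Open assistant header: the model generates the answer from here.
    parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)

prompt = build_prompt(
    "Answer the following questions in one word.",
    [("Where is Spain?", "Europe"), ("When was IBM founded?", "1911")],
    "What is the capital of Texas?",
)
```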
```python
from unitxt import load_dataset
from unitxt.templates import InputOutputTemplate

template = InputOutputTemplate(
    instruction="Answer the following questions in one word.",
    input_format="{question}",
    output_format="{answers}",
    postprocessors=["processors.lower_case"],
)
dataset = load_dataset(
    card=card,  # the task card is defined earlier in the example
    template=template,
    format="formats.llama3_chat",
    system_prompt="system_prompts.models.llama2",
    num_demos=2,
    demos_pool_size=3,
)
```
Thanks @yoavkatz I was also working on a fix and testing it, but yours looks good to me, and I just tested it on a task and it works well (with llama3-70b-instruct I get a much better score with the llama3 format).
The only thing is that now `llama3_chat` and `llama3_instruct` are the same. I suggest removing `llama3_chat` and updating all the references to it. Do you want me to do it?
Yes. If they are the same, we can keep just one. I just wonder about the alternative format where the instruction and demos are in the same user prompt as the question. Granite's recommended ICL format is this way.
@yoavkatz okay I just replaced `llama_chat` with `llama_instruct` anywhere it was used.
I also experimented with two alternatives: an `alt1` option where the instruction goes into the user role prompt, and an `alt2` option where the demos all go into one user prompt, as you had suggested. Numbers I get on two example tasks, each with `llama-3-70b-instruct` and 3-shot prompts:
| Task | Option | Score |
|---|---|---|
| cards.boolq.classification | formats.llama3_instruct | 0.9119 |
| cards.boolq.classification | formats.llama3_instruct_alt1 | 0.9081 |
| cards.boolq.classification | formats.llama3_instruct_alt2 | 0.89 |
| cards.summarize_from_human_feedback | no format | 0.64 |
| cards.summarize_from_human_feedback | formats.llama3_instruct | 0.68 |
| cards.summarize_from_human_feedback | formats.llama3_instruct_alt1 | 0.67 |
| cards.summarize_from_human_feedback | formats.llama3_instruct_alt2 | 0.56 |
So it seems like `alt1` is not making much of a difference, but `alt2` is worse. I believe we have the right prompt now. I suggest approving this PR and merging for now. I'll be doing more testing and will let you know if I find ways to improve the format.
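For clarity, the `alt2` variant discussed above can be sketched like this: instead of one user/assistant turn pair per demo, all demos are concatenated into a single user turn ahead of the question. The function and the `Q:`/`A:` labels are illustrative assumptions, not the catalog's actual format strings.

```python
# Illustrative sketch of the "all demos in one user turn" variant (alt2).
# The Q:/A: labels are an assumption for the sketch, not unitxt's encoding.
def single_turn_prompt(demos: list, question: str) -> str:
    body = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in demos)
    return (
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{body}\n\nQ: {question}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

prompt = single_turn_prompt(
    [("Where is Spain?", "Europe"), ("When was IBM founded?", "1911")],
    "What is the capital of Texas?",
)
```

Per the scores above, this packing performed worse than per-demo user/assistant turns, which matches the model's chat-style fine-tuning.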
Signed-off-by: Yoav Katz <katz@il.ibm.com>
I approved. Make sure the old jsons are deleted.
Also, given the results, maybe we don't need alt1 and alt2 in the catalog. If we keep them, let's give them descriptive names (e.g. `formats.llama3_instruct_all_demos_in_a_single_turn`).
* llama3 instruct and chat system prompts
* fixing llama3_chat prompt + adding jsons
* fixing llama3 formats + a boolqa system prompt
* Start of example to check different formats.
* Added instructions to llama3 model.
* replacing llama3_chat with llama3_instruct + 2 alts
* Added example for multiple formats
* prepare artifacts (whitespace changes)
* Updated evaluation to use more examples and show confidence intervals
* Do not run examples that require large model inference
* Separated examples in example table
* Fixed test_examples to skip some files.

Signed-off-by: Yoav Katz <katz@il.ibm.com>
Co-authored-by: Yoav Katz <katz@il.ibm.com>
Issue #945

I removed the `formats.llama3_chat_with_system_prompt` system prompt that was not being used anywhere, and also wasn't right (e.g. had `\n` between parts instead of `\n\n`).

I've updated the `formats.llama3_chat` definition so:
* demos are provided as `user`/`assistant` prompts. I believe that is how we are supposed to provide examples.
* it uses a `DEFAULT_SYSTEM_PROMPT` that is the one being suggested/used for llama2, e.g. see: https://developer.ibm.com/tutorials/awb-prompt-engineering-llama-2/

I've added a `formats.llama3_instruct` that I believe more accurately reflects the official guidance: https://huggingface.co/blog/llama3#how-to-prompt-llama-3

Note: the `instruction` is not a part of the demos. I believe a demo example-specific instruction can improve the prompt, but I don't think it's possible the way unitxt templates work now (`instruction` cannot be used in `demo_format`).