Add llm as judge mt-bench dataset and metrics #791

Merged

elronbandel merged 164 commits into main from users/ofir/add_llm_as_judge_dataset_and_metrics on May 20, 2024
Conversation
…set_and_metrics' into users/ofir/add_llm_as_judge_dataset_and_metrics
elronbandel approved these changes on May 20, 2024
bnayahu pushed a commit that referenced this pull request on May 21, 2024
Squashed commit messages (consecutive repeats collapsed with ×N):

* add mt_bench_single_turn_gpt4_judge dataset
* added typings to model_response_assessment task field
* fixed output_format in mt_bench template (×2)
* add llama3 format
* temporary changes to the inference engines
* add llama3_bam_mt_bench_prompt llm-as-judge metric
* add assert to openai model recipe
* update genai and openai inference apis
* add model_response_assessment_chat task
* add ChatTemplate
* add model_response_assessment.json
* fix model_response_assessment.json
* add template and task of chat llm as judge
* mt bench templates (×2)
* model assessment tasks
* add InterleaveListsToDialogOperator operator
* update dialog template
* update mt bench template (×3)
* update chat template
* add mt bench datasets
* small fixes
* update metrics (×2)
* delete old files
* update test requirements file (×2)
* update llama3 metric with correct format
* add model assessment tasks with reference
* update tasks
* clear catalog
* add tasks
* update task
* update templates
* update (×3)
* add mt bench pairwise processor
* remove old file
* update
* add model assessment pairwise comparison tasks
* add pairwise templates
* fix pairwise templates
* fix mt bench pairwise processor
* fix template
* add mt-bench pairwise dataset
* llm as judge metric cards
* add llama3 metrics
* update (×2)
* update prepare test python version
* clean catalog
* update templates
* update tasks (×2)
* update templates
* update cards (×2)
* update templates
* add cards
* add cards for llm as judge metric (×2)
* add metrics
* merge
* add mt bench generation datasets
* fix (×4)
* update python to 3.9 for catalog testing
* remove old catalog items
* update llm as a judge
* update readme
* update tests
* update dynamic cards for llm as judge
* update llm as judge metric
* update tests
* add the ability to strip_system_prompt_and_format_from_inputs
* update tests
* update (×12)
* add phi3 format
* update readme (×20)
* update cards with LiteralEval (×2)
* make llm judge dynamic fields
* add json
* update
* update metric
* update
* fix
* update readme
* update (×15)
* Update llm_as_judge.rst
* update (×5)
* Update llm_as_judge.rst (#847)
* update (×8)
* small fix (×2)

Co-authored-by: Yoav Katz <68273864+yoavkatz@users.noreply.github.com>
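The commit log introduces an `InterleaveListsToDialogOperator`, which from its name turns separate lists of user and assistant turns into a single multi-turn dialog, the input shape an MT-Bench judge template needs. A minimal standalone sketch of that interleaving idea, not the actual unitxt implementation (the function name and role labels here are illustrative assumptions):

```python
# Illustrative sketch of interleaving two turn lists into one dialog,
# in the spirit of the InterleaveListsToDialogOperator named in the
# commit log. This is NOT the unitxt operator's code.

def interleave_to_dialog(user_turns, assistant_turns,
                         user_role="user", assistant_role="assistant"):
    """Alternate user and assistant turns into a chat-style message list.

    If there are more user turns than assistant turns (e.g. the final
    user question is still unanswered), the trailing user turn is kept.
    """
    dialog = []
    for i, user_turn in enumerate(user_turns):
        dialog.append({"role": user_role, "content": user_turn})
        if i < len(assistant_turns):
            dialog.append({"role": assistant_role,
                           "content": assistant_turns[i]})
    return dialog

# Example: a two-turn exchange in the MT-Bench style.
dialog = interleave_to_dialog(
    ["What is MT-Bench?", "How is it scored?"],
    ["A multi-turn benchmark.", "By an LLM judge on a 1-10 scale."],
)
```

The resulting list alternates roles (user, assistant, user, assistant), which is the conventional shape for feeding a dialog to a chat template or an LLM-as-judge prompt.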
No description provided.