Add example of using LLM as a judge for summarization dataset. #965

Merged
merged 1 commit into from
Jun 30, 2024



@eladven eladven commented Jun 30, 2024

No description provided.


@elronbandel elronbandel left a comment


Overall looks good. Yet do we really want users to write templates for each task being evaluated? I think a better model is that we will guide them to use something like:
"card=cards.xsum,metrics=[metrics.llm_as_judge]"

@elronbandel elronbandel enabled auto-merge (squash) June 30, 2024 08:25
@yoavkatz

Overall looks good. Yet do we really want users to write templates for each task being evaluated? I think a better model is that we will guide them to use something like: "card=cards.xsum,metrics=[metrics.llm_as_judge]"

Yes, I agree. We already have an example of adding an LLM metric definition and using it. Here, I think we should use a simple predefined metric. We should show one metric that uses the reference answer and one that does not.


codecov bot commented Jun 30, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.35%. Comparing base (31f7d4b) to head (7445aa5).
Report is 8 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #965      +/-   ##
==========================================
+ Coverage   91.33%   91.35%   +0.01%     
==========================================
  Files         110      112       +2     
  Lines       11704    11794      +90     
==========================================
+ Hits        10690    10774      +84     
- Misses       1014     1020       +6     


@elronbandel elronbandel merged commit 80243f5 into main Jun 30, 2024
10 checks passed
@elronbandel elronbandel deleted the summarization_llm_as_judge branch June 30, 2024 09:21