Add example of using LLM as a judge for summarization dataset. #965

eladven · 2024-06-30T07:52:11Z

No description provided.

elronbandel

Overall looks good. Yet do we really want users to write templates for each task being evaluated? I think a better model is that we will guide them to use something like:
"card=cards.xsum,metrics=[metrics.llm_as_judge]"
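The proposed recipe string holds the whole evaluation setup on one line. It is a comma-separated list of `key=value` pairs, and a bracketed value is a list, so commas inside brackets must not split pairs. The sketch below shows how such a string breaks down. The `parse_recipe` helper is hypothetical: it is not unitxt's own parser, and it handles only flat (non-nested) lists.

```python
def parse_recipe(recipe: str) -> dict:
    """Split 'key=value,key=[a,b]' into a dict; bracketed values become lists.

    Hypothetical helper for illustration only; not unitxt's actual parser.
    Nested brackets are not supported.
    """
    pairs, current, depth = [], [], 0
    for ch in recipe:
        # Track bracket depth so commas inside [...] are kept, not split on.
        if ch == "[":
            depth += 1
        elif ch == "]":
            depth -= 1
        if ch == "," and depth == 0:
            pairs.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        pairs.append("".join(current))

    result = {}
    for pair in pairs:
        key, _, value = pair.partition("=")
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Flat list: split the bracket contents on commas.
            items = [v.strip() for v in value[1:-1].split(",") if v.strip()]
            result[key.strip()] = items
        else:
            result[key.strip()] = value
    return result


if __name__ == "__main__":
    print(parse_recipe("card=cards.xsum,metrics=[metrics.llm_as_judge]"))
    # → {'card': 'cards.xsum', 'metrics': ['metrics.llm_as_judge']}
```

Under this model, swapping the judge means changing one list entry in the string. No per-task template needs to be written.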

yoavkatz · 2024-06-30T09:00:30Z

Overall looks good. Yet do we really want users to write templates for each task being evaluated? I think a better model is that we will guide them to use something like: "card=cards.xsum,metrics=[metrics.llm_as_judge]"

Yes, I agree. We already have an example of defining an LLM metric and using it. Here, I think we should use a simple predefined metric. We should show one metric that uses the reference answer and one that does not.
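There are two kinds of judge metric. A reference-based judge sees the gold summary. A reference-free judge sees only the source document and the candidate summary. The sketch below illustrates the difference using a hypothetical `build_judge_prompt` function. Its prompt wording is an assumption for illustration and is not taken from unitxt's predefined judge templates.

```python
from typing import Optional


def build_judge_prompt(document: str, summary: str, reference: Optional[str] = None) -> str:
    """Build an LLM-as-judge prompt for a summary.

    Hypothetical illustration: when `reference` is given the judge can compare
    against the gold summary; otherwise it judges against the document alone.
    """
    parts = [
        "Rate the summary of the document below on a scale of 1 to 10.",
        f"Document:\n{document}",
    ]
    if reference is not None:
        # Reference-based variant: expose the gold summary to the judge.
        parts.append(f"Reference summary:\n{reference}")
    parts.append(f"Candidate summary:\n{summary}")
    parts.append("Answer with a single number.")
    return "\n\n".join(parts)


if __name__ == "__main__":
    doc = "The council approved the new park budget on Monday."
    print(build_judge_prompt(doc, "Park budget approved.", reference="Council passes park budget."))
    print(build_judge_prompt(doc, "Park budget approved."))
```

The reference-free variant can score outputs on datasets that have no gold summaries. The reference-based variant gives the judge an anchor for what the expected content is.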

codecov · 2024-06-30T09:20:25Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.35%. Comparing base (31f7d4b) to head (7445aa5).
Report is 8 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #965      +/-   ##
==========================================
+ Coverage   91.33%   91.35%   +0.01%     
==========================================
  Files         110      112       +2     
  Lines       11704    11794      +90     
==========================================
+ Hits        10690    10774      +84     
- Misses       1014     1020       +6
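The percentages follow from the table. Coverage is hits divided by coverable lines, and here every line is either a hit or a miss (11704 − 10690 = 1014, 11794 − 10774 = 1020). The following is a minimal arithmetic check and not Codecov's implementation. The reported figures match truncation to two decimal places: rounding 91.336% instead would give 91.34%.

```python
from fractions import Fraction


def coverage_pct(hits: int, lines: int) -> Fraction:
    """Exact coverage percentage: hits / coverable lines * 100."""
    return Fraction(hits * 100, lines)


def format_truncated(pct: Fraction) -> str:
    """Format a percentage truncated (toward zero) to two decimal places."""
    hundredths = int(pct * 100)  # int() on a Fraction truncates toward zero
    sign = "-" if hundredths < 0 else ""
    hundredths = abs(hundredths)
    return f"{sign}{hundredths // 100}.{hundredths % 100:02d}%"


if __name__ == "__main__":
    base = coverage_pct(10690, 11704)  # main
    head = coverage_pct(10774, 11794)  # PR head
    print(format_truncated(base), format_truncated(head), format_truncated(head - base))
    # → 91.33% 91.35% 0.01%
```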


Add example of using LLM as a judge for summarization dataset.

7445aa5

elronbandel approved these changes Jun 30, 2024


elronbandel enabled auto-merge (squash) June 30, 2024 08:25

elronbandel merged commit 80243f5 into main Jun 30, 2024
10 checks passed

elronbandel deleted the summarization_llm_as_judge branch June 30, 2024 09:21

gitMichal pushed a commit that referenced this pull request Jul 15, 2024

Add example of using LLM as a judge for summarization dataset. (#965)

093a1d5
