Enable regex to extract floats in score generation #1223
Conversation
092ea37 to 6d55f0a Compare
```diff
 vals = set()
 for match in matches:
     try:
-        vals.add(validate_rating(int(match)))
+        vals.add(
+            validate_rating(int(float(match)))
```
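For context, a minimal sketch of why the conversion goes through `float()` first; `parse_score` is a hypothetical helper for illustration, not the library's code:

```python
# Hypothetical helper illustrating the change above, not the trulens implementation.
# With a float-capable regex, a match like "10.0" can reach this code, and
# int("10.0") raises ValueError; converting through float() first accepts
# both "10" and "10.0".
def parse_score(match: str) -> int:
    return int(float(match))

assert parse_score("10") == 10
assert parse_score("10.0") == 10
```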
kinda late to the game, but:
- Why are we constrained to ints?
- If we are constrained to ints, shouldn't we round it instead of flooring it?
- The doc on L54 says "If the string does not match an integer ... raises an error ...", but this won't do that.
- No particular reason beyond easier interpretability for the end users, AFAIK.
- Makes sense - I can make a change to this (see the sketch below).
- Same as 1. - I do recall we went back and forth a bit on this and the doc just became outdated because of my change. Will update.
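A quick sketch of the floor-versus-round difference being discussed (the "7.6" value is only an illustration):

```python
match = "7.6"               # hypothetical model output before normalization
print(int(float(match)))    # 7 - int() truncates toward zero
print(round(float(match)))  # 8 - round() picks the nearest integer
```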
#1244 fix PR
* match floats and integers for score generation
* bb
* updated expected test cases
* minor fix
Items to add to release announcement:
During benchmarking of various feedback providers, I noticed there are models (e.g. a finetuned `mixtral-8x7b`) that tend to give `10.0` instead of `10` in their feedback scoring before normalization. In the current implementation, `PATTERN_INTEGER` will extract `0` and `10` from `10.0` and eventually pick the lesser value.

Failing example when testing groundedness feedback functions - where the score accompanying the CoT was interpreted as `0` instead of the expected `10`:

I'm switching to `PATTERN_NUMBER` to unblock for now.

Other details that are good to know but need not be announced:

`PATTERN_INTEGER` was used over `PATTERN_NUMBER` in the previous PR. cc @sfc-gh-pmardziel to add more background if I'm missing something obvious.

I do believe we should move toward structured and systematic feedback score generation mechanisms with some self-refining prompt iterations (e.g. via DSPy) ASAP for more robust score generation, ideally before integrating with the monitoring stack, even at the cost of slightly higher token usage/cost/latency (which can also be alleviated via better prompts and instruction tuning).
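A minimal sketch of the behavior described above, with illustrative patterns only (the real `PATTERN_INTEGER` / `PATTERN_NUMBER` definitions in trulens may differ):

```python
import re

# Illustrative patterns, not the library's exact definitions.
PATTERN_INTEGER = re.compile(r"\d+")
PATTERN_NUMBER = re.compile(r"\d+(?:\.\d+)?")

text = "Score: 10.0"
print(PATTERN_INTEGER.findall(text))  # ['10', '0'] -> the lesser value, 0, can win
print(PATTERN_NUMBER.findall(text))   # ['10.0']    -> parsed as the intended 10
```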