Eval metrics #587

lilacheden · 2024-02-20T11:05:08Z

Standard API for disabling/enabling confidence interval computation of a Metric or MetricPipeline.
Make kendalltau and roc_auc metrics rather than metric pipelines (to temporarily work around the multiple metric pipelines issue).
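
To illustrate what a shared API of this kind could look like, here is a minimal sketch. `ToyMetric`, `ToyMetricPipeline`, and `disable_all` are hypothetical names invented for this example, not the unitxt classes, and the `set_n_resamples` signature is an assumption. The point is only that a caller can disable confidence intervals without checking whether it holds a metric or a pipeline.

```python
class ToyMetric:
    """Hypothetical stand-in for a metric that bootstraps confidence intervals."""

    def __init__(self, n_resamples=1000):
        self.n_resamples = n_resamples

    def disable_confidence_interval_calculation(self):
        # None is taken to mean "skip the bootstrap", as in the diff under review.
        self.n_resamples = None

    def set_n_resamples(self, n_resamples):
        # Assumed signature: re-enable by setting a resample count again.
        self.n_resamples = n_resamples


class ToyMetricPipeline:
    """Hypothetical pipeline that wraps a metric and forwards the same calls."""

    def __init__(self, metric):
        self.metric = metric

    def disable_confidence_interval_calculation(self):
        self.metric.disable_confidence_interval_calculation()

    def set_n_resamples(self, n_resamples):
        self.metric.set_n_resamples(n_resamples)


def disable_all(metrics):
    # No isinstance checks: every object exposes the same method.
    for metric in metrics:
        metric.disable_confidence_interval_calculation()
```

With this shape, `disable_all([ToyMetric(), ToyMetricPipeline(ToyMetric())])` treats both objects the same way.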

codecov · 2024-02-20T11:36:17Z

Codecov Report

Attention: 2 lines in your changes are missing coverage. Please review.

Comparison is base (aa41221) 87.78% compared to head (6fba78a) 87.97%.

Files	Patch %	Lines
src/unitxt/metrics.py	91.66%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #587      +/-   ##
==========================================
+ Coverage   87.78%   87.97%   +0.18%     
==========================================
  Files          85       85              
  Lines        7170     7208      +38     
==========================================
+ Hits         6294     6341      +47     
+ Misses        876      867       -9

matanor

LGTM! Left some small comments.

matanor · 2024-02-20T12:36:53Z

src/unitxt/metrics.py

@@ -80,6 +80,14 @@ def new_random_generator():
    def disable_confidence_interval_calculation(self):
        self.n_resamples = None

+    def disable_confidence_interval_calculation_return_n_resamples(self):


maybe just return self.n_resamples from the disable_confidence_interval_calculation method?
that shouldn't affect the existing code at all, no?
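
A minimal sketch of that suggestion, using a hypothetical `SketchMetric` class rather than the real one in `src/unitxt/metrics.py`: the method returns the previous value before clearing it, so callers that ignore the return value behave as before. Note that a later commit title in this PR mentions avoiding a change to this method's signature, so the merged code may differ from this sketch.

```python
class SketchMetric:
    """Hypothetical metric, used only to illustrate the review suggestion."""

    def __init__(self, n_resamples=1000):
        self.n_resamples = n_resamples

    def disable_confidence_interval_calculation(self):
        # Remember the old setting, disable, and hand the old value back.
        previous = self.n_resamples
        self.n_resamples = None
        return previous


metric = SketchMetric(n_resamples=200)
saved = metric.disable_confidence_interval_calculation()
# ... run something that should not bootstrap ...
metric.n_resamples = saved  # restore the original setting afterwards
```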

matanor · 2024-02-20T12:39:11Z

src/unitxt/metrics.py

        kendall_results = self.kendalltau(references, predictions, variant=self.variant)
        corr = kendall_results.correlation
        return {
            self.main_score: corr,
-            "p_val": kendall_results.pvalue,
+            "kendalltau_p_val": kendall_results.pvalue,


Suggested change

-            "kendalltau_p_val": kendall_results.pvalue,
+            f"{self.main_score}_p_val": kendall_results.pvalue,

The current main score has a "_b" suffix (main_score = "kendalltau_b"), so perhaps the p-value key should be consistent with it?
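
As a small illustration of the suggested naming, the sketch below builds the result dictionary from a stand-in result object. `KendallResult` and `build_scores` are hypothetical helpers made up for this example; only the `correlation` and `pvalue` attribute names are taken from the diff above.

```python
from collections import namedtuple

# Hypothetical stand-in for the result object used in the diff.
KendallResult = namedtuple("KendallResult", ["correlation", "pvalue"])


def build_scores(main_score, result):
    # Deriving the p-value key from main_score keeps the two names in sync
    # if the main score is ever renamed.
    return {
        main_score: result.correlation,
        f"{main_score}_p_val": result.pvalue,
    }


scores = build_scores("kendalltau_b", KendallResult(correlation=0.5, pvalue=0.04))
# scores has the keys "kendalltau_b" and "kendalltau_b_p_val"
```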

matanor · 2024-02-20T12:41:24Z

src/unitxt/operators.py

-                    metric.metric, MetricWithConfidenceInterval
-                ):
-                    metric.metric.disable_confidence_interval_calculation()
+                metric.disable_confidence_interval_calculation()


do we also need an abstract disable_confidence_interval_calculation() in Metric?

currently IIUC only the "overrides" were defined ..
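
One way to read this question is sketched below with Python's `abc` module. All class names here are hypothetical and are not the unitxt classes. Declaring the method abstract on the base class means a subclass that forgets to define it cannot be instantiated, so a direct call like the one in the `operators.py` diff above can rely on every metric having the method.

```python
from abc import ABC, abstractmethod


class SketchMetricBase(ABC):
    """Hypothetical base class declaring the shared method."""

    @abstractmethod
    def disable_confidence_interval_calculation(self):
        ...


class NoIntervalMetric(SketchMetricBase):
    def disable_confidence_interval_calculation(self):
        # This metric never computes intervals, so there is nothing to do.
        pass


class IncompleteMetric(SketchMetricBase):
    # Does not override the abstract method; instantiating it raises TypeError.
    pass
```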

lilacheden requested a review from matanor February 20, 2024 11:05

matanor approved these changes Feb 20, 2024

lilacheden force-pushed the eval_metrics branch 2 times, most recently from c0c1cf2 to afc8030 Compare February 20, 2024 14:18

lilacheden added 9 commits February 20, 2024 17:05

disable confidence intervals for metric pipeline

cacfb0e

Signed-off-by: lilacheden <lilach.edel@gmail.com>

make kendalltau and roc_auc metrics (to work around the metric piplin…

218d0a1

…es issue) Signed-off-by: lilacheden <lilach.edel@gmail.com>

more informative field name

24f2c03

Signed-off-by: lilacheden <lilach.edel@gmail.com>

fix: disable_confidence_interval_calculation returns n_resampling

7fc9e71

Signed-off-by: lilacheden <lilach.edel@gmail.com>

avoid changing disable_confidence_interval_calculation signature and …

692edaa

…fix set_n_resamples Signed-off-by: lilacheden <lilach.edel@gmail.com>

fix typo

5671092

Signed-off-by: lilacheden <lilach.edel@gmail.com>

add test

5b88868

Signed-off-by: lilacheden <lilach.edel@gmail.com>

add abstract methods to Metric

e214bf2

Signed-off-by: lilacheden <lilach.edel@gmail.com>

add tests for metrics

6fba78a

Signed-off-by: lilacheden <lilach.edel@gmail.com>

lilacheden force-pushed the eval_metrics branch from d57af26 to 6fba78a Compare February 20, 2024 15:05

lilacheden merged commit 1104a44 into main Feb 20, 2024
7 checks passed
