
Grouped instance metric inherit from InstanceMetrics #452

Merged: 112 commits, Feb 22, 2024

Conversation

@sam-data-guy-iam (Collaborator) commented Jan 8, 2024

Please ignore the earlier PR for aggregation metrics and use this branch instead. I took @matanor's suggestion to modify InstanceMetric and make the grouped mean type metrics inherit from the InstanceMetric classes that define the instance calculation.

Also: Closes #389

@matanor matanor mentioned this pull request Jan 9, 2024
@matanor (Member) left a comment:

hi @sam-data-guy-iam, thanks for this new PR, looks good!
I left a few comments, please have a look..

@@ -1 +1 @@
version = "1.4.3"
version = "1.4.3"
matanor (Member):

I think this file shouldn't be in the PR, no?

sam-data-guy-iam (Author):

yes I guess this can be deleted from the PR, since it is just from the merge.

matanor (Member):

It's still here, no? Just with an updated version: [screenshot]

sam-data-guy-iam (Author):

Yes, I updated the branch with the merge, but I'm not sure how to fix this in particular. I guess this file can just be removed from the PR?

src/unitxt/metrics.py (outdated, resolved)
src/unitxt/metrics.py (outdated, resolved)
group_total_scores = [
score for score in group_total_scores if not np.isnan(score)
]
# ignore NaNs in aggregation
matanor (Member):

The comment does not describe what the line below does.. should it be removed?

sam-data-guy-iam (Author):

Added a warnings catch for the RuntimeWarning from nanmean.
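The pattern described here can be sketched as follows (the helper name is hypothetical, not necessarily the merged code):

```python
import warnings

import numpy as np


def average_instance_scores(scores):
    # np.nanmean drops NaN entries before averaging; for an empty or
    # all-NaN input it still returns NaN but emits a RuntimeWarning,
    # which we suppress so callers just see the NaN result.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(scores)
```

This keeps the NaN-ignoring aggregation without relying on a list-comprehension filter and without noisy warnings on edge cases.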

ci = bootstrap(
(scores,),
statistic=mean,
(identifiers,),
matanor (Member):

Can you please explain why you pass the identifiers and not the instances here?

sam-data-guy-iam (Author):

You're right, I found a work-around. I had copied this part of the code from the GlobalMetric (which might not need it either).
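Since the instance scores are already computed, the confidence interval can be obtained by resampling the scores themselves, with no identifiers involved. A minimal sketch of that work-around, assuming hypothetical score values and the `scipy.stats.bootstrap` API the quoted code already uses:

```python
import numpy as np
from scipy.stats import bootstrap

# Hypothetical precomputed per-instance scores (illustration only).
scores = np.array([0.2, 0.4, 0.5, 0.7, 0.9, 0.3, 0.6, 0.8])

# Resample the scores directly: each bootstrap resample is aggregated
# with a plain mean, so no identifier-to-instance lookup is needed.
res = bootstrap(
    (scores,),
    statistic=np.mean,
    n_resamples=1000,
    confidence_level=0.95,
    random_state=np.random.default_rng(0),
)
low, high = res.confidence_interval
```

`res.confidence_interval` gives the (low, high) bounds for the mean score.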

try:
return aggregation_func(instances, score_name)
except Exception as e:
# this happens in edge cases, for example, when the sampling creates a
matanor (Member):

Is this something that happens? The comment talks about BLEU, which is a GlobalMetric.

sam-data-guy-iam (Author):

You're right, I copied this from the GlobalMetric confidence intervals. In this case, since the instance scores are already computed, there is no additional computation on the instances (unless an aggregation function is designed to do something unusual), so there should be no issue, and the confidence interval code already handles NaNs. I am removing it.

@matanor (Member) left a comment:

hi @sam-data-guy-iam, thanks for making the changes. I went over the changes in metrics.py again, please have a look..


@property
@abstractmethod
def reduction_map(self) -> dict:
pass

def _validate_group_mean_reduction(self):
if "group_mean" in self.reduction_map:
matanor (Member):

Since _validate_group_mean_reduction is called after checking `if reduction == "group_mean":`, perhaps there is no need to check again?

sam-data-guy-iam (Author):

No problem, adding it as an assert without an if condition, just in case.

if reduction == "mean":
from statistics import mean
@staticmethod
def aggregate(instances, field_name):
matanor (Member):

Suggested change
def aggregate(instances, field_name):
def average_instance_scores(instances, field_name):

sam-data-guy-iam (Author):

Renaming this and the group function.

def process(self, stream: Stream, stream_name: Optional[str] = None) -> Generator:
instances, global_score = self.compute_instance_scores(stream, stream_name)

for reduction, fields in self.reduction_map.items():
matanor (Member):

Suggested change
for reduction, fields in self.reduction_map.items():
for reduction_type, reduction_params in self.reduction_map.items():

matanor (Member):

Improved readability IMO, please also see the suggestions below.

sam-data-guy-iam (Author):

accepted
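With the suggested renames applied, the dispatch over the reduction map might read as follows. This is a simplified sketch under assumed names and a toy instance layout, not the merged code; only the "mean" branch is shown:

```python
from statistics import mean


def average_instance_scores(instances, field_name):
    # Mean of one named score field across all instances.
    return mean(inst["score"]["instance"][field_name] for inst in instances)


def apply_reductions(instances, reduction_map):
    # reduction_type selects the aggregation strategy; reduction_params
    # lists the score fields to reduce ("group_mean" omitted for brevity).
    global_score = {}
    for reduction_type, reduction_params in reduction_map.items():
        if reduction_type == "mean":
            for field_name in reduction_params:
                global_score[field_name] = average_instance_scores(
                    instances, field_name
                )
        else:
            raise NotImplementedError(f"unsupported reduction: {reduction_type}")
    return global_score
```

Iterating over `(reduction_type, reduction_params)` pairs keeps the loop body readable when more reduction types are added.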


if reduction == "group_mean":
self._validate_group_mean_reduction()
score_fields = (
matanor (Member):

Suggested change
score_fields = (
reduction_fields = (

sam-data-guy-iam (Author):

accepted


aggregation_func = None
if reduction == "mean":
aggregation_func = self.aggregate
matanor (Member):

Suggested change
aggregation_func = self.aggregate
aggregation_func = self.aggregate
reduction_fields = reduction_params

sam-data-guy-iam (Author):

accepted

score_names = (
self.ci_scores if self.ci_scores is not None else [self.main_score]
)
if score_names is None:
matanor (Member):

Suggest always passing this param explicitly, i.e. remove the default and update the relevant calls.

src/unitxt/metrics.py (resolved)
src/unitxt/metrics.py (resolved)
src/unitxt/metrics.py (outdated, resolved)
with warnings.catch_warnings():
# in case instances is empty, return NaN but avoid printing a RuntimeWarning
warnings.simplefilter("ignore", category=RuntimeWarning)
return np.nanmean(scores)
matanor (Member):

what happened before your changes if some of the instance scores were NaN? Is it still the same behaviour?
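The behavioural difference in question can be seen directly (values are illustrative):

```python
import numpy as np

scores = [0.5, float("nan"), 1.0]

plain = np.mean(scores)      # NaN: a single NaN poisons the plain mean
robust = np.nanmean(scores)  # 0.75: NaNs are dropped before averaging
```

So with a plain mean, any NaN instance score would have propagated to the global score; `np.nanmean` instead averages over the non-NaN scores only.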

Samuel Ackerman added 5 commits January 16, 2024 14:47
…ted CIs. score_based_confidence_interval accepts list of score fields without defining bootstrap function
move aggregate_instance_scores as static method to MetricWithConfidenceInterval so can be used in score_based_confidence_interval
codecov bot commented Jan 16, 2024

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (3637f8e) 88.03% compared to head (3d4c712) 88.40%.

Files Patch % Lines
src/unitxt/metrics.py 95.59% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #452      +/-   ##
==========================================
+ Coverage   88.03%   88.40%   +0.36%     
==========================================
  Files          87       87              
  Lines        7440     7699     +259     
==========================================
+ Hits         6550     6806     +256     
- Misses        890      893       +3     


@sam-data-guy-iam (Author):

> hi @sam-data-guy-iam, made another pass over metrics.py, left some comments, mostly small things at this point.. please have a look. One thing which I am not sure I understand is why the deepcopy of the instances is done..

I used deepcopy to copy the instances because I didn't want to tie the resamples list to the instances by assignment: we later modify the instances, and I didn't want unexpected results (this has happened to me in the past).
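The aliasing concern described above comes down to the difference between plain assignment and a deep copy:

```python
import copy

instances = [{"score": 1}]

alias = instances                    # plain assignment: same list object
snapshot = copy.deepcopy(instances)  # fully independent copy

instances[0]["score"] = 2
# The alias sees the later mutation; the deepcopy does not.
```

This is exactly the "unexpected results" risk: any code holding `alias` observes every later modification of the instances, while `snapshot` stays frozen.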

@matanor (Member) commented Feb 15, 2024:

@sam-data-guy-iam
I see.. I think this can go both ways: you could make accidental changes to the instances, and then the deepcopy prevents that. But at some point others might make intentional changes to the instances, and then the deepcopy would prevent those from taking effect on the original instances.
IMO, if you know now that you are making updates to the instances and you don't want them reflected on the original instances, use deepcopy; otherwise I am not in favor of it. You might be preventing future issues, but you might also be causing future issues..

@sam-data-guy-iam (Author):

Fair enough, I will remove them.

@matanor (Member) left a comment:

hi @sam-data-guy-iam, please see my comments regarding the tests.
Also left some very minor suggestions for the non-test code.

src/unitxt/metrics.py (outdated, resolved)
src/unitxt/metrics.py (outdated, resolved)
src/unitxt/metrics.py (outdated, resolved)
prepare/metrics/grouped_instance_metrics.py (outdated, resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
tests/library/test_metrics.py (resolved)
tests/library/test_metrics.py (resolved)
@matanor (Member) left a comment:

LGTM! thanks @sam-data-guy-iam for this new feature!

@matanor matanor enabled auto-merge (rebase) February 21, 2024 12:22
auto-merge was automatically disabled February 21, 2024 12:42

Rebase failed

@matanor matanor enabled auto-merge (rebase) February 22, 2024 07:02
auto-merge was automatically disabled February 22, 2024 07:31

Rebase failed

@elronbandel elronbandel merged commit a0443c2 into main Feb 22, 2024
6 of 7 checks passed
Development

Successfully merging this pull request may close these issues.

In GlobalMetric confidence interval, need check to make sure new generated value is not NaN
3 participants