
Grouped instance metric inherit from InstanceMetrics #452

Merged: 112 commits, Feb 22, 2024

Conversation

@sam-data-guy-iam (Collaborator) commented Jan 8, 2024

Please ignore the earlier PR for aggregation metrics and use this branch instead. I took @matanor's suggestion to modify InstanceMetric and make the grouped mean type metrics inherit from the InstanceMetric classes that define the instance calculation.

Also: Closes #389

@matanor matanor mentioned this pull request Jan 9, 2024
@matanor (Member) left a comment:

hi @sam-data-guy-iam, thanks for this new PR, looks good!
I left a few comments, please have a look..

@@ -1 +1 @@
version = "1.4.3"
version = "1.4.3"
matanor (Member):

I think this file shouldn't be in the PR, no?

sam-data-guy-iam (Author):

yes I guess this can be deleted from the PR, since it is just from the merge.

matanor (Member):

It's still here, no? Just with an updated version: [screenshot]

sam-data-guy-iam (Author):

Yes, I updated the branch with the merge, but I'm not sure how to fix this in particular. I guess this file can just be removed from the PR?

src/unitxt/metrics.py (outdated, resolved)
src/unitxt/metrics.py (outdated, resolved)
group_total_scores = [
score for score in group_total_scores if not np.isnan(score)
]
# ignore NaNs in aggregation
matanor (Member):

The comment does not describe what the line below does.. should it be removed?

sam-data-guy-iam (Author):

Added a warnings catch for the RuntimeWarning from nanmean.
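The pattern described here can be sketched as follows (the helper name is hypothetical, not necessarily the merged code):

```python
import warnings

import numpy as np


def average_instance_scores(scores):
    # np.nanmean drops NaN entries before averaging; for an empty or
    # all-NaN input it still returns NaN but emits a RuntimeWarning,
    # which we suppress so callers just see the NaN result.
    with warnings.catch_warnings():
        warnings.simplefilter("ignore", category=RuntimeWarning)
        return np.nanmean(scores)
```

This keeps the NaN-ignoring aggregation without relying on a list-comprehension filter and without noisy warnings on edge cases.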

ci = bootstrap(
(scores,),
statistic=mean,
(identifiers,),
matanor (Member):

Can you please explain why you pass the identifiers and not the instances here?

sam-data-guy-iam (Author):

You're right, I found a work-around. I had copied this part of the code from the GlobalMetric (which might not need it either).
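Since the instance scores are already computed, the confidence interval can be obtained by resampling the scores themselves, with no identifiers involved. A minimal sketch of that work-around, assuming hypothetical score values and the `scipy.stats.bootstrap` API the quoted code already uses:

```python
import numpy as np
from scipy.stats import bootstrap

# Hypothetical precomputed per-instance scores (illustration only).
scores = np.array([0.2, 0.4, 0.5, 0.7, 0.9, 0.3, 0.6, 0.8])

# Resample the scores directly: each bootstrap resample is aggregated
# with a plain mean, so no identifier-to-instance lookup is needed.
res = bootstrap(
    (scores,),
    statistic=np.mean,
    n_resamples=1000,
    confidence_level=0.95,
    random_state=np.random.default_rng(0),
)
low, high = res.confidence_interval
```

`res.confidence_interval` gives the (low, high) bounds for the mean score.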

try:
return aggregation_func(instances, score_name)
except Exception as e:
# this happens in edge cases, for example, when the sampling creates a
matanor (Member):

Is this something that happens? The comment talks about BLEU, which is a GlobalMetric.

sam-data-guy-iam (Author):

You're right, I copied this from the GlobalMetric confidence intervals. In this case, since the instance scores are already computed, there is no additional computation on the instances (unless an aggregation function is designed to do something unusual), so there should be no issue, and the confidence interval code already handles NaNs. I am removing it.

@matanor (Member) left a comment:

hi @sam-data-guy-iam, thanks for making the changes. I went over the changes in metrics.py again, please have a look..


@property
@abstractmethod
def reduction_map(self) -> dict:
pass

def _validate_group_mean_reduction(self):
if "group_mean" in self.reduction_map:
matanor (Member):

Since _validate_group_mean_reduction is called after checking `if reduction == "group_mean":`, perhaps there is no need to check again?

sam-data-guy-iam (Author):

No problem, adding it as an assert without an if condition, just in case.

if reduction == "mean":
from statistics import mean
@staticmethod
def aggregate(instances, field_name):
matanor (Member):

Suggested change
def aggregate(instances, field_name):
def average_instance_scores(instances, field_name):

sam-data-guy-iam (Author):

Renaming this and the group function.

def process(self, stream: Stream, stream_name: Optional[str] = None) -> Generator:
instances, global_score = self.compute_instance_scores(stream, stream_name)

for reduction, fields in self.reduction_map.items():
matanor (Member):

Suggested change
for reduction, fields in self.reduction_map.items():
for reduction_type, reduction_params in self.reduction_map.items():

matanor (Member):

Improved readability IMO, please also see the suggestions below.

sam-data-guy-iam (Author):

accepted
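With the suggested renames applied, the dispatch over the reduction map might read as follows. This is a simplified sketch under assumed names and a toy instance layout, not the merged code; only the "mean" branch is shown:

```python
from statistics import mean


def average_instance_scores(instances, field_name):
    # Mean of one named score field across all instances.
    return mean(inst["score"]["instance"][field_name] for inst in instances)


def apply_reductions(instances, reduction_map):
    # reduction_type selects the aggregation strategy; reduction_params
    # lists the score fields to reduce ("group_mean" omitted for brevity).
    global_score = {}
    for reduction_type, reduction_params in reduction_map.items():
        if reduction_type == "mean":
            for field_name in reduction_params:
                global_score[field_name] = average_instance_scores(
                    instances, field_name
                )
        else:
            raise NotImplementedError(f"unsupported reduction: {reduction_type}")
    return global_score
```

Iterating over `(reduction_type, reduction_params)` pairs keeps the loop body readable when more reduction types are added.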


if reduction == "group_mean":
self._validate_group_mean_reduction()
score_fields = (
matanor (Member):

Suggested change
score_fields = (
reduction_fields = (

sam-data-guy-iam (Author):

accepted


aggregation_func = None
if reduction == "mean":
aggregation_func = self.aggregate
matanor (Member):

Suggested change
aggregation_func = self.aggregate
aggregation_func = self.aggregate
reduction_fields = reduction_params

sam-data-guy-iam (Author):

accepted

score_names = (
self.ci_scores if self.ci_scores is not None else [self.main_score]
)
if score_names is None:
matanor (Member):

Suggest always passing this param explicitly, i.e. remove the default and update the relevant calls.

src/unitxt/metrics.py (resolved)
src/unitxt/metrics.py (resolved)
src/unitxt/metrics.py (outdated, resolved)
with warnings.catch_warnings():
# in case instances is empty, return NaN but avoid printing a RuntimeWarning
warnings.simplefilter("ignore", category=RuntimeWarning)
return np.nanmean(scores)
matanor (Member):

what happened before your changes if some of the instance scores were NaN? Is it still the same behaviour?
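The behavioural difference in question can be seen directly (values are illustrative):

```python
import numpy as np

scores = [0.5, float("nan"), 1.0]

plain = np.mean(scores)      # NaN: a single NaN poisons the plain mean
robust = np.nanmean(scores)  # 0.75: NaNs are dropped before averaging
```

So with a plain mean, any NaN instance score would have propagated to the global score; `np.nanmean` instead averages over the non-NaN scores only.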

Samuel Ackerman added 5 commits January 16, 2024 14:47
…ted CIs. score_based_confidence_interval accepts list of score fields without defining bootstrap function
move aggregate_instance_scores as static method to MetricWithConfidenceInterval so can be used in score_based_confidence_interval
codecov bot commented Jan 16, 2024

Codecov Report

Attention: 10 lines in your changes are missing coverage. Please review.

Comparison is base (3637f8e) 88.03% compared to head (3d4c712) 88.40%.

Files Patch % Lines
src/unitxt/metrics.py 95.59% 10 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #452      +/-   ##
==========================================
+ Coverage   88.03%   88.40%   +0.36%     
==========================================
  Files          87       87              
  Lines        7440     7699     +259     
==========================================
+ Hits         6550     6806     +256     
- Misses        890      893       +3     


@sam-data-guy-iam (Author):

> hi @sam-data-guy-iam, made another pass over metrics.py, left some comments, mostly small things at this point.. please have a look. One thing which I am not sure I understand is why the deepcopy of the instances is done..

I used deepcopy to copy the instances because I didn't want to tie the resamples list to the instances by assignment: we later modify the instances, and I didn't want unexpected results (this has happened to me in the past).
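The aliasing concern described above comes down to the difference between plain assignment and a deep copy:

```python
import copy

instances = [{"score": 1}]

alias = instances                    # plain assignment: same list object
snapshot = copy.deepcopy(instances)  # fully independent copy

instances[0]["score"] = 2
# The alias sees the later mutation; the deepcopy does not.
```

This is exactly the "unexpected results" risk: any code holding `alias` observes every later modification of the instances, while `snapshot` stays frozen.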

@matanor (Member) commented Feb 15, 2024:

@sam-data-guy-iam
I see.. I think this can go both ways: you could make accidental changes to the instances, and then the deepcopy prevents that. But at some point others might make intentional changes to the instances, and then the deepcopy would prevent those from taking effect on the original instances.
IMO, if you know now that you are making updates to the instances and you don't want them reflected on the original instances, use deepcopy; otherwise I am not in favor of it. You might be preventing future issues, but you might also be causing future issues..

@sam-data-guy-iam (Author):

Fair enough, I will remove them.

@matanor (Member) left a comment:

hi @sam-data-guy-iam, please see my comments regarding the tests.
Also left some very minor suggestions for the non-test code.

src/unitxt/metrics.py (outdated, resolved)
src/unitxt/metrics.py (outdated, resolved)
src/unitxt/metrics.py (outdated, resolved)
prepare/metrics/grouped_instance_metrics.py (outdated, resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
prepare/metrics/grouped_instance_metrics.py (resolved)
tests/library/test_metrics.py (resolved)
tests/library/test_metrics.py (resolved)
@matanor (Member) left a comment:

LGTM! thanks @sam-data-guy-iam for this new feature!

@matanor matanor enabled auto-merge (rebase) February 21, 2024 12:22
auto-merge was automatically disabled February 21, 2024 12:42

Rebase failed

@matanor matanor enabled auto-merge (rebase) February 22, 2024 07:02
auto-merge was automatically disabled February 22, 2024 07:31

Rebase failed

@elronbandel elronbandel merged commit a0443c2 into main Feb 22, 2024
6 of 7 checks passed
Development

Successfully merging this pull request may close these issues.

In GlobalMetric confidence interval, need check to make sure new generated value is not NaN
3 participants