Using cached metrics in consecutive test_card runs can be faulty #884

antonioan opened this issue Jun 5, 2024 · 4 comments
@antonioan (Member) commented Jun 5, 2024

I noticed that a consecutive test_card run uses the same instance of the metric that was used by the previous call (for a different card).

This is problematic for me, because the common metric that I'm using (inheriting from BulkInstanceMetric) dynamically sets the fields in self.reduction_map["mean"] during the execution of compute(). So upon running test_card on the second card, the metric score reduction fails because it sees a field it doesn't recognize (the field had been added by the previous card, so only that card recognizes it).

Interestingly, the second run does call prepare() on a new metric instance, but then runs compute() on the same old one, as caught by my debug prints:

Called prepare on id 6107889360
Running tests on cards.first
Called prepare on id 13044661776
Called prepare on id 13476572752
Called compute on id 13476572752
Called compute on id 13476572752
Called compute on id 13476572752
Called compute on id 13476572752
Running tests on cards.second
Called prepare on id 13042983248       # <-- new instance
Called compute on id 13476572752      # <-- same old instance
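
For illustration, here is a minimal sketch of the kind of metric I mean (the field names and the exact BulkInstanceMetric compute() signature are approximated, not copied from unitxt):

from typing import Any, Dict, List

from unitxt.metrics import BulkInstanceMetric


class DynamicFieldsMetric(BulkInstanceMetric):
    """Illustrative only: adds score fields to reduction_map while computing."""

    main_score = "base_score"
    reduction_map = {"mean": ["base_score"]}

    def compute(
        self,
        references: List[List[Any]],
        predictions: List[Any],
        task_data: List[Dict[str, Any]],
    ) -> List[Dict[str, Any]]:
        results = []
        for prediction, refs in zip(predictions, references):
            scores = {"base_score": float(prediction in refs)}
            # Hypothetical per-card fields, derived from the card's task_data.
            for field in task_data[0].get("extra_score_fields", []):
                scores[field] = 0.0
                if field not in self.reduction_map["mean"]:
                    # This mutates instance state. If the same instance is reused
                    # (via the artifact cache) for the next card, these fields leak
                    # into that card's reduction and the reduction fails there.
                    self.reduction_map["mean"].append(field)
            results.append(scores)
        return results
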
@yoavkatz (Member) commented Jun 5, 2024

There seems to be a cache of artifacts:
@classmethod
def get_artifact(cls, artifact_identifier: str) -> Artifact:
    if artifact_identifier not in cls.cache:
        artifact, artifactory = fetch_artifact(artifact_identifier)
        cls.cache[artifact_identifier] = artifact
    return cls.cache[artifact_identifier]

It was introduced to improve runtime (e.g. for metrics that download models), but still, it’s error prone if artifacts are not immutable. Maybe caching should be enabled explicitly.
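
A rough sketch of what explicit, opt-in caching could look like (the use_cache parameter is hypothetical, not part of the current API):

@classmethod
def get_artifact(cls, artifact_identifier: str, use_cache: bool = False) -> Artifact:
    # Hypothetical opt-in variant: only reuse a cached instance when asked to.
    if not use_cache:
        artifact, _ = fetch_artifact(artifact_identifier)
        return artifact
    if artifact_identifier not in cls.cache:
        artifact, _ = fetch_artifact(artifact_identifier)
        cls.cache[artifact_identifier] = artifact
    return cls.cache[artifact_identifier]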

@elronbandel (Member)

Thanks @antonioan, that is indeed a very interesting behaviour. Can you write minimal code that replicates it, so I can have a look?

@yoavkatz (Member)

@antonioan showed this to me. Basically, you just need a mutable variable in the metric (even a simple counter of processed instances). If you run two cards, the counter value from the first card carries over into the second card as well.

@dafnapension (Collaborator) commented Sep 25, 2024

Here is a small piece of code that contains the relevant parts of the issue:

from unitxt.metrics import Metric
from unitxt.operators import ArtifactFetcherMixin
from unitxt.test_utils.metrics import apply_metric

predictions = ["A", "B", "C"]
references = [["B", "C"], ["A"], ["B", "C"]]

metric = ArtifactFetcherMixin.get_artifact("metrics.accuracy")
assert isinstance(metric, Metric)

metric.score_prefix = "my_"
outputs = apply_metric(
    metric=metric, predictions=predictions, references=references
)
# Prints a score named my_accuracy, as score_prefix was explicitly set on the metric.
print(outputs[0]["score"])

metric = ArtifactFetcherMixin.get_artifact("metrics.accuracy")
assert isinstance(metric, Metric)

outputs = apply_metric(
    metric=metric, predictions=predictions, references=references
)
# Still prints a score named my_accuracy, although the metric was fetched "fresh" from ArtifactFetcherMixin.
print(outputs[0]["score"])
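
Until this is resolved, a possible workaround sketch (assuming ArtifactFetcherMixin.cache is a plain class-level dict, as in the snippet quoted above, and that fetch_artifact is importable from unitxt.artifact):

from unitxt.artifact import fetch_artifact
from unitxt.operators import ArtifactFetcherMixin

# Option 1: drop all cached artifact instances so the next get_artifact() builds a fresh one.
ArtifactFetcherMixin.cache.clear()

# Option 2: bypass the cache entirely and fetch a fresh metric instance.
metric, _ = fetch_artifact("metrics.accuracy")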
