Using cached metrics in consecutive test_card runs can be faulty #884
Comments
There seems to be a cache of artifacts. It was introduced to improve runtime (e.g. for metrics that download models), but it is error prone if the artifacts are not immutable. Maybe caching should be enabled explicitly.
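A minimal sketch of that failure mode, using simplified stand-ins rather than unitxt's actual cache or metric classes (all names here are illustrative):

```python
# Simplified stand-in for an artifact cache that hands out the same object
# on repeated lookups (hypothetical names, for illustration only).
_artifact_cache = {}


class ToyMetric:
    def __init__(self):
        self.score_prefix = ""  # mutable state that callers may change


def get_artifact(artifact_id):
    # Cached: every caller receives the *same* ToyMetric instance.
    if artifact_id not in _artifact_cache:
        _artifact_cache[artifact_id] = ToyMetric()
    return _artifact_cache[artifact_id]


first = get_artifact("metrics.accuracy")
first.score_prefix = "my_"   # mutate the cached artifact

second = get_artifact("metrics.accuracy")
print(second.score_prefix)   # prints "my_": the mutation leaked into the "new" fetch
```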
Thanks @antonioan, that is indeed a very interesting behaviour. Can you write a minimal piece of code that replicates it so I can have a look?
@antonioan showed this to me. Basically, you just need to have a state variable in the metric (even a simple count of processed instances). If you run two cards, the counter value from the first card will be reused in the second card as well.
Here is a small piece of code that contains the important, relevant parts of the issue:

```python
from unitxt.metrics import Metric
from unitxt.operators import ArtifactFetcherMixin
from unitxt.test_utils.metrics import apply_metric

predictions = ["A", "B", "C"]
references = [["B", "C"], ["A"], ["B", "C"]]

metric = ArtifactFetcherMixin.get_artifact("metrics.accuracy")
assert isinstance(metric, Metric)
metric.score_prefix = "my_"
outputs = apply_metric(
    metric=metric, predictions=predictions, references=references
)
print(outputs[0]["score"])  # prints a score named my_accuracy, as the prefix was explicitly set

metric = ArtifactFetcherMixin.get_artifact("metrics.accuracy")
assert isinstance(metric, Metric)
outputs = apply_metric(
    metric=metric, predictions=predictions, references=references
)
print(outputs[0]["score"])  # still prints my_accuracy, although the metric was fetched fresh from ArtifactFetcherMixin
```
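If the cache is indeed the culprit, the two `get_artifact` calls above should return the very same Python object. A quick check along these lines (a sketch, assuming the cached-instance behaviour described in this issue) would make that visible:

```python
m1 = ArtifactFetcherMixin.get_artifact("metrics.accuracy")
m2 = ArtifactFetcherMixin.get_artifact("metrics.accuracy")
print(m1 is m2)  # expected to print True if the same cached instance is returned
```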
I noticed that a consecutive `test_card` is using the same instance of the metric used by the previous call (for a different card). This is problematic for me, because the common metric that I'm using (inheriting from `BulkInstanceMetric`) dynamically sets the fields in `self.reduction_map["mean"]` during the execution of `compute()`. So upon running `test_card` on the second card, the metric score reduction fails because it sees a field it doesn't recognize (but the previous card does; it had been added by that card).

It's interesting, though, that the second run does call `prepare()` on a new metric instance, but then runs `compute()` on the same previous one, as was caught by my debug prints.
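A rough sketch of the pattern described above, using a toy class rather than the real `BulkInstanceMetric` API (all names are illustrative), showing why reusing one cached instance across two cards breaks the second reduction:

```python
# Toy stand-in mimicking the pattern described above; it is NOT the real
# BulkInstanceMetric API, just an illustration of the mutable-state problem.
class ToyBulkMetric:
    def __init__(self):
        # fields to aggregate; compute() extends this list dynamically
        self.reduction_map = {"mean": ["score"]}

    def compute(self, card_specific_fields):
        # Dynamically register per-card fields, mutating shared state.
        for field in card_specific_fields:
            if field not in self.reduction_map["mean"]:
                self.reduction_map["mean"].append(field)
        return self.reduction_map["mean"]


# If the artifact cache hands the *same* instance to both test_card runs,
# the second card sees fields registered by the first one:
metric = ToyBulkMetric()
print(metric.compute(["card_a_field"]))  # ['score', 'card_a_field']
print(metric.compute(["card_b_field"]))  # ['score', 'card_a_field', 'card_b_field']
# -> the second card's reduction now encounters 'card_a_field', which it does not recognize
```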