Fixes #355
This is my first attempt at debugging a public codebase, so I'd definitely appreciate a few pointers.
I've investigated the two methods that you suggested as possible sources of the error, @lvwerra: _infer_feature_from_example and _enforce_nested_string_type.
After some debugging, I found that the former seems to work as intended: it correctly picks the right schema format for a given prediction and reference input. For example, rouge has two schemas available (since it supports multiple reference inputs for a single prediction), which are stored as a list as follows:
[{'predictions': Value(dtype='string', id='sequence'), 'references': Sequence(feature=Value(dtype='string', id='sequence'), length=-1, id=None)}, {'predictions': Value(dtype='string', id='sequence'), 'references': Value(dtype='string', id='sequence')}]
The method then stores the matching format in self.selected_feature_format.
I suspect the problem arises when the code calls _enforce_nested_string_type on self.info.features, which holds both schemas, rather than on the one chosen by _infer_feature_from_example. This discrepancy leads _enforce_nested_string_type to enforce the wrong schema: in this case the reference sequence schema, when it should have enforced the reference value schema.
So I changed self.info.features to self.selected_feature_format, and it works. I also ran a few other tests that I created in debug_test.py, and they all passed. This probably isn't the right way to test the code, but I'm new, so yeah 😅.
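To make the mismatch described above easier to follow, here is a minimal sketch that uses plain-Python stand-ins instead of the library's real classes. The names STRING, CANDIDATE_SCHEMAS, enforce_string_type and infer_schema are hypothetical and only model the behaviour as I understand it: a schema is selected per example, and validation should then run against that selected schema rather than against a different candidate.

```python
# Simplified stand-ins for illustration only; these are NOT the library's
# real classes or methods.
STRING = "string"

# Two candidate schemas, mirroring the rouge case: references may be either
# a list of strings or a single string.
CANDIDATE_SCHEMAS = [
    {"predictions": STRING, "references": [STRING]},
    {"predictions": STRING, "references": STRING},
]


def enforce_string_type(schema, obj):
    """Raise TypeError if obj does not match the (simplified) schema."""
    if isinstance(schema, dict):
        for key, sub_schema in schema.items():
            enforce_string_type(sub_schema, obj[key])
    elif isinstance(schema, list):
        if not isinstance(obj, (list, tuple)):
            raise TypeError(f"Expected a sequence but got {type(obj)}.")
        for item in obj:
            enforce_string_type(schema[0], item)
    elif schema == STRING and not isinstance(obj, str):
        raise TypeError(f"Expected type str but got {type(obj)}.")


def infer_schema(example):
    """Return the first candidate schema that the example satisfies."""
    for schema in CANDIDATE_SCHEMAS:
        try:
            enforce_string_type(schema, example)
            return schema
        except TypeError:
            continue
    raise ValueError("No candidate schema matches the example.")


example = {"predictions": "the cat sat", "references": "the cat sat down"}
selected = infer_schema(example)

# Validating against the selected schema succeeds (the behaviour after the fix).
enforce_string_type(selected, example)

# Validating against the sequence-style candidate fails for this example,
# which models the mismatch described above.
try:
    enforce_string_type(CANDIDATE_SCHEMAS[0], example)
    mismatch_raised = False
except TypeError:
    mismatch_raised = True
```

In this sketch the single-string example selects the second schema, so enforcing against the first (sequence) schema raises a TypeError, while enforcing against the selected one passes.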