find_overlap question #16

Open
amobash2 opened this issue Jun 4, 2020 · 3 comments

Comments

amobash2 commented Jun 4, 2020

Hi,
I assumed find_overlap is supposed to determine whether two ranges have any portion in common. Am I wrong?

In your function, if the input is true_range = range(1, 2) and pred_range = range(2, 2), pred_range is a subset of true_range, so we should count this as a partial overlap. Are you not counting such overlaps as partial? Note that range is exclusive of its upper bound.

The function below returns set() when I feed it the above true_range and pred_range. Wouldn't it be better to check whether the maximum of the two lower bounds is no greater than the minimum of the two upper bounds, and return True if so, meaning the two overlap? Please correct me if I am not understanding the goal of your find_overlap function correctly :)

def find_overlap(true_range, pred_range):
    """Find the overlap between two ranges

    Find the overlap between two ranges. Return the overlapping values if
    present, else return an empty set().

    Examples:
    >>> find_overlap((1, 2), (2, 3))
    {2}
    >>> find_overlap((1, 2), (3, 4))
    set()
    """

    true_set = set(true_range)
    pred_set = set(pred_range)

    overlaps = true_set.intersection(pred_set)

    return overlaps
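
For illustration, the min/max check suggested above could look like the following minimal sketch. The name `spans_overlap` is hypothetical and not part of the repository, and the sketch assumes each span is a `(start, end)` tuple whose end offset is inclusive.

```python
def spans_overlap(true_span, pred_span):
    """Return True if two inclusive (start, end) spans share a position.

    Hypothetical helper, not part of ner_eval.py. Assumes both offsets
    in each tuple are inclusive.
    """
    true_start, true_end = true_span
    pred_start, pred_end = pred_span
    # The spans overlap when the later start does not pass the earlier end.
    return max(true_start, pred_start) <= min(true_end, pred_end)


print(spans_overlap((1, 2), (2, 2)))  # True
print(spans_overlap((1, 2), (3, 4)))  # False
```

Unlike the set-based version, this returns a boolean rather than the overlapping positions, so callers that need the actual overlap would still have to compute it separately.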

amobash2 commented Jun 4, 2020

Adding another example for reproduction:

true_labels=
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-MISC', 'O', 'O', 'O']
pred_labels=
['O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'O', 'B-MISC', 'O', 'O', 'O']

true_e = collect_named_entities(true_labels)
pred = collect_named_entities(pred_labels)

pred_range = range(pred[0].start_offset, pred[0].end_offset)
true_range = range(true_e[0].start_offset, true_e[0].end_offset)

def find_overlap(true_range, pred_range):
    """Find the overlap between two ranges

    Find the overlap between two ranges. Return the overlapping values if
    present, else return an empty set().

    Examples:
    >>> find_overlap((1, 2), (2, 3))
    {2}
    >>> find_overlap((1, 2), (3, 4))
    set()
    """

    true_set = set(true_range)
    pred_set = set(pred_range)

    overlaps = true_set.intersection(pred_set)

    return overlaps

print("Overlaps =", find_overlap(true_range, pred_range))

This returns set(). I don't think using the range function here is the best choice; using min/max as suggested above to determine whether two [start_span, end_span] tuples overlap makes more sense to me. Please correct me if I am wrong.
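
To make the cause concrete, here is a small sketch of why the set comes back empty. It assumes, as the result above suggests, that a single-token entity gets equal start and end offsets (both 25 for the B-MISC token). The helper `inclusive_range` is hypothetical and only shows one possible workaround that keeps the set intersection.

```python
def inclusive_range(start_offset, end_offset):
    """Build a range that includes end_offset (hypothetical helper)."""
    return range(start_offset, end_offset + 1)


# Offsets assumed for the single B-MISC token at index 25.
start_offset, end_offset = 25, 25

# range() excludes its upper bound, so equal offsets give an empty range.
print(list(range(start_offset, end_offset)))            # []
print(list(inclusive_range(start_offset, end_offset)))  # [25]

# With inclusive ranges, the same set intersection finds the shared token.
true_set = set(inclusive_range(start_offset, end_offset))
pred_set = set(inclusive_range(start_offset, end_offset))
print(true_set.intersection(pred_set))  # {25}
```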

@ivyleavedtoadflax
Contributor

Hi @amobash2, please have a look at https://github.com/ivyleavedtoadflax/nervaluate. I took @davidsbatista's work here and made it into a Python package. I'm not sure that your issues will be solved in nervaluate, but I am actively, if infrequently, developing it, so I may be able to help.

amobash2 commented Jun 9, 2020

Thanks @ivyleavedtoadflax, I will check your repo and reach out if we can resolve some of the issues in the ner_eval.py file.
