Duplicate detections in 2010_phd_reubens_sync #78

Open
2 tasks
peterdesmet opened this issue Nov 12, 2020 · 4 comments
Labels: 2015_phd_reubens_sync, datapaper (Issue that needs to be resolved for data paper), etn data update (Update required in ETN database)


@peterdesmet
Member

peterdesmet commented Nov 12, 2020

2010_phd_reubens_sync contains 77,426 detections with a duplicate pk (38,713 to be removed). This seems to be caused by 2 tags:

| tag_id | tag_fk | animal_id | Number of rows |
| --- | --- | --- | --- |
| A69-1303-65302 | 67 | 733 | 112 (+ 83,278 non-duplicates) |
| A69-1303-65302 | 85 | 747 | 112 (only records) |
| A69-1303-65303 | 68 | 2159 | 38601 |
| A69-1303-65303 | 68 | 734 | 38601 |

  • A69-1303-65302 is listed 3 times in tags: 96 (AVAILABLE), 67 (ENDED), 85 (AVAILABLE). I don't think the tag should be listed as AVAILABLE twice, but that does not seem to be causing the issue, because A69-1303-65301 is also listed twice as AVAILABLE. The issue seems to be with 85: all duplicates are coming from this tag.
  • Animals 734 and 2159 are complete duplicates. I would suggest removing 2159.
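
The per-tag counts above can be reproduced with a grouping like the one below. This is only a minimal sketch: the tuple layout `(pk, tag_id, animal_id)` and the sample rows are assumptions for illustration, not the actual ETN schema or data.

```python
from collections import Counter, defaultdict

# Hypothetical rows: (detection pk, tag_id, animal_id). Values are illustrative only.
detections = [
    (1, "A69-1303-65303", 734),
    (1, "A69-1303-65303", 2159),
    (2, "A69-1303-65303", 734),
    (2, "A69-1303-65303", 2159),
    (3, "A69-1303-65301", 700),
]


def duplicate_summary(rows):
    """Count rows whose pk occurs more than once, grouped by (tag_id, animal_id)."""
    pk_counts = Counter(pk for pk, _, _ in rows)
    summary = defaultdict(int)
    for pk, tag_id, animal_id in rows:
        if pk_counts[pk] > 1:
            summary[(tag_id, animal_id)] += 1
    return dict(summary)


summary = duplicate_summary(detections)
```

Summing the values of `summary` gives the total number of rows involved in a duplicate; half of that is the number of rows to remove when every pk is duplicated exactly once.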
@jreubens
Collaborator

These are not that easy to handle, as this was a mistake by Vemco. Although the tag IDs are the same, these tags were put in the water as different tags, so I can't remove any tag/animal from the DB.
Can we manually remove the duplicates generated here?

@peterdesmet
Member Author

Changed in DB:

  • A69-1303-65302: one duplicate assignment from 2010-09-14 00:00 to 2011-03-11 was removed from the animals table
  • A69-1303-65307: recapture date was set to 2011-03-11 (instead of 2011-03-16) to not overlap with next assignment
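
A check for overlapping assignment periods of the same tag could look roughly like this. It is a sketch under assumptions: the dictionary keys, the placeholder tag `TAG-A`, the animal ids and the dates are hypothetical, and periods are treated as half-open (`start <= t < end`), which may not match how the database compares them.

```python
from datetime import datetime
from itertools import combinations


def overlapping_assignments(assignments):
    """Return (tag_id, animal_a, animal_b) for same-tag periods that overlap.

    Periods are treated as half-open intervals [start, end); this is an assumption.
    """
    by_tag = {}
    for a in assignments:
        by_tag.setdefault(a["tag_id"], []).append(a)
    overlaps = []
    for tag_id, periods in by_tag.items():
        for a, b in combinations(periods, 2):
            if a["start"] < b["end"] and b["start"] < a["end"]:
                overlaps.append((tag_id, a["animal_id"], b["animal_id"]))
    return overlaps


# Hypothetical assignments: the first period ends after the second one starts.
before = [
    {"tag_id": "TAG-A", "animal_id": 1,
     "start": datetime(2010, 9, 14), "end": datetime(2011, 3, 16)},
    {"tag_id": "TAG-A", "animal_id": 2,
     "start": datetime(2011, 3, 11), "end": datetime(2011, 9, 1)},
]
# Same data with the first period's end moved back to the second period's start.
after = [dict(before[0], end=datetime(2011, 3, 11)), before[1]]

overlaps_before = overlapping_assignments(before)
overlaps_after = overlapping_assignments(after)
```

With half-open periods, an end date equal to the next start date does not count as an overlap; with closed periods or day-level comparison it would.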

@peterdesmet
Member Author

@jreubens, we now have 2,226 duplicated rows (1,113 duplicates), all around the recovery/release date, for almost all tags. This is odd: in the animals table, none of the recovery/release dates overlap anymore, while in the detections it seems they do, for 2011-03-11 and 2011-03-12:

| tag_id | animal_id | date_time | Number of rows |
| --- | --- | --- | --- |
| A69-1303-65302 | 733 | 2011-03-11 | 112 |
| A69-1303-65302 | 747 | 2011-03-11 | 112 |
| A69-1303-65304 | 735 | 2011-03-11 | 186 |
| A69-1303-65304 | 735 | 2011-03-12 | 5 |
| A69-1303-65304 | 749 | 2011-03-11 | 186 |
| A69-1303-65304 | 749 | 2011-03-12 | 5 |
| A69-1303-65305 | 736 | 2011-03-11 | 79 |
| A69-1303-65305 | 736 | 2011-03-12 | 7 |
| A69-1303-65305 | 750 | 2011-03-11 | 79 |
| A69-1303-65305 | 750 | 2011-03-12 | 7 |
| A69-1303-65306 | 737 | 2011-03-11 | 187 |
| A69-1303-65306 | 737 | 2011-03-12 | 12 |
| A69-1303-65306 | 751 | 2011-03-11 | 187 |
| A69-1303-65306 | 751 | 2011-03-12 | 12 |
| A69-1303-65307 | 738 | 2011-03-11 | 47 |
| A69-1303-65307 | 738 | 2011-03-12 | 6 |
| A69-1303-65307 | 752 | 2011-03-11 | 47 |
| A69-1303-65307 | 752 | 2011-03-12 | 6 |
| A69-1303-65308 | 739 | 2011-03-11 | 86 |
| A69-1303-65308 | 753 | 2011-03-11 | 86 |
| A69-1303-65309 | 740 | 2011-03-11 | 22 |
| A69-1303-65309 | 754 | 2011-03-11 | 22 |
| A69-1303-65310 | 741 | 2011-03-11 | 109 |
| A69-1303-65310 | 741 | 2011-03-12 | 6 |
| A69-1303-65310 | 755 | 2011-03-11 | 109 |
| A69-1303-65310 | 755 | 2011-03-12 | 6 |
| A69-1303-65311 | 742 | 2011-03-11 | 90 |
| A69-1303-65311 | 756 | 2011-03-11 | 90 |
| A69-1303-65312 | 743 | 2011-03-11 | 108 |
| A69-1303-65312 | 757 | 2011-03-11 | 108 |
| A69-1303-65313 | 744 | 2011-03-11 | 51 |
| A69-1303-65313 | 758 | 2011-03-11 | 51 |

@jreubens
Collaborator

jreubens commented Dec 1, 2020

I discussed this with Aubri and he told me this is very hard to find out. I guess this has lower priority, and I suggest we take the first detection of the two...
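
Taking the first detection of the two could be sketched as below. This assumes that "first" means the first row encountered for each pk in input order; the field names `pk` and `animal_id` and the sample rows are hypothetical.

```python
def keep_first(rows):
    """Keep only the first row seen for each pk, preserving input order."""
    seen = set()
    kept = []
    for row in rows:
        if row["pk"] not in seen:
            seen.add(row["pk"])
            kept.append(row)
    return kept


# Hypothetical detections: pks 10 and 11 each appear twice, pk 12 once.
rows = [
    {"pk": 10, "animal_id": 747},
    {"pk": 10, "animal_id": 733},
    {"pk": 11, "animal_id": 733},
    {"pk": 11, "animal_id": 747},
    {"pk": 12, "animal_id": 733},
]

kept = keep_first(rows)
```

Because the result depends on row order, the rows would need a deliberate sort beforehand if a specific animal should win for each duplicate.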
