Different dynamicProperties fields in event.txt and occurrence.txt - what happens when the tables are joined? #201

Open
Mesibov opened this issue Apr 20, 2023 · 6 comments
Labels
other types of data - extensions
term - dynamicProperties
term - Event: Pertaining to a term organized in the Event class.
term - MeasurementOrFact: Pertaining to a term organized in the MeasurementOrFact class.
term - Occurrence: Pertaining to a term organized in the Occurrence class.
term - record-level: Pertaining to a term not organized in any specific Darwin Core class.

Comments

@Mesibov

Mesibov commented Apr 20, 2023

A data compiler wants to use dynamicProperties both in event.txt and occurrence.txt. The field in event.txt will have key:value data for events, with each eventID potentially having different data. The field in occurrence.txt will have completely different key:value data, with each occurrenceID potentially having different data. When these tables are joined (e.g. by GBIF, to build occurrences), there is a collision of dynamicProperties fields. What happens? Is there any way to avoid a collision?

@tucotuco
Member

Hi @Mesibov. I can't speak for how GBIF interprets the two terms (@timrobertson100), but a more explicit way to encode the information going into those dynamicProperties fields is the Extended Measurement or Fact (eMoF) extension. With it, the publisher can state explicitly whether each piece of information pertains to the Event or to the Occurrence, and can record more detail than a key:value pair allows.

@tucotuco added labels on Apr 23, 2023: term - record-level, term - Occurrence, term - Event, other types of data - extensions, term - MeasurementOrFact, term - dynamicProperties
@timrobertson100
Member

In GBIF processing today, the data is pivoted to occurrences such that a field on the event is used only if that field is null on the occurrence record. In this instance, the event's dynamicProperties would be dropped, because each occurrence record already has its own value. There is exploratory work to bring in an event index where both fields would remain, but that is some way out.
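
The fill-only-if-null rule described above can be illustrated with a small sketch. This is not GBIF's pipeline code; the function name `pivot_to_occurrence` and the sample records are hypothetical, and the sketch assumes that an empty string counts as null.

```python
def pivot_to_occurrence(event, occurrence):
    """Copy event fields onto an occurrence, but only where the occurrence has no value.

    Assumption: None and "" both count as "null" for this illustration.
    """
    merged = dict(occurrence)
    for term, value in event.items():
        if merged.get(term) in (None, ""):
            merged[term] = value
    return merged


# Hypothetical records, for illustration only.
event = {
    "eventID": "E1",
    "habitat": "wet forest",
    "dynamicProperties": '{"trapType": "pitfall"}',
}
occurrence = {
    "occurrenceID": "O1",
    "eventID": "E1",
    "habitat": None,
    "dynamicProperties": '{"lifeStage": "adult"}',
}

merged = pivot_to_occurrence(event, occurrence)
```

Under these assumptions, `habitat` is inherited from the event because the occurrence left it empty, while the event's `dynamicProperties` is lost because the occurrence supplies its own.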

I think John provides the better option for current use though.

@Mesibov
Author

Mesibov commented Apr 23, 2023

Many thanks @tucotuco and @timrobertson100. So the (single) eMOF could contain records with "eventID" for the event properties and "occurrenceID" for the occurrence properties, which sounds like it would work...
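
A rough sketch of that single-table idea, turning each dynamicProperties value into eMoF rows. It assumes the dynamicProperties values are JSON objects and that the dataset uses an Event core, where every eMoF row carries the eventID and occurrence-level rows additionally carry an occurrenceID. The helper name `emof_rows` and the sample values are hypothetical, and only a few eMoF columns are shown.

```python
import json


def emof_rows(event_id, dynamic_properties, occurrence_id=""):
    """Turn one dynamicProperties JSON object into eMoF-style rows.

    An empty occurrence_id marks a row that describes the event itself.
    """
    props = json.loads(dynamic_properties)
    return [
        {
            "eventID": event_id,
            "occurrenceID": occurrence_id,
            "measurementType": key,
            "measurementValue": str(value),
        }
        for key, value in props.items()
    ]


# Hypothetical values, for illustration only.
rows = []
rows += emof_rows("E1", '{"trapType": "pitfall"}')            # event-level fact
rows += emof_rows("E1", '{"lifeStage": "adult"}', "O1")       # occurrence-level fact
```

Because each row states what it describes, event-level and occurrence-level facts no longer compete for one dynamicProperties field.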

@debpaul
Contributor

debpaul commented Apr 24, 2023

@Mesibov great question -- thanks for asking it. @mjy please take a look. Thanks @timrobertson100 @tucotuco for explaining the possible choices and results of those choices in this ticket.

@dbloom

dbloom commented Apr 24, 2023 via email

@MattBlissett
Member

Some DWCA extensions are included in GBIF, although in most cases their content is not searchable directly. It is possible to search for records having an extension, e.g. https://www.gbif.org/occurrence/search?advanced=1&dwca_extension=http:~2F~2Frs.iobis.org~2Fobis~2Fterms~2FExtendedMeasurementOrFact
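
For anyone building such a link for another extension, here is a small sketch. The `~2F` substitution for `/` is inferred only from the link above, not from any documented GBIF convention, and the function name `extension_search_url` is hypothetical.

```python
def extension_search_url(row_type):
    """Build a gbif.org occurrence-search link for records having a given extension.

    Assumption: the site encodes "/" in the extension row type as "~2F",
    as seen in the example link in this thread.
    """
    encoded = row_type.replace("/", "~2F")
    return (
        "https://www.gbif.org/occurrence/search"
        f"?advanced=1&dwca_extension={encoded}"
    )


url = extension_search_url("http://rs.iobis.org/obis/terms/ExtendedMeasurementOrFact")
```

With the eMoF row type as input, this reproduces the link quoted above.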

Data downloads including verbatim extension data are coming soon.
