Hard coded CURIES in OAK code cause confusion when ontologies use different prefix maps #698

Open
matentzn opened this issue Jan 20, 2024 · 3 comments

Comments

@matentzn
Contributor

Having pieces of code like

, and from a quick search I think there are a number of cases in OAK where these occur, seems dangerous to me. @joeflack4 just uncovered a case where we passed an oboInOwl prefix to semsql, which resulted in lexmatch no longer being able to recognise that oboInOwl:hasExactSynonym (which was used in the ontology) is, in fact, the same as oio:hasExactSynonym. There are various ways to solve this problem:

  1. No code exists where CURIEs are defined that can't be overwritten by the user. In the ex
  2. A standardised "oak context" exists to which all incoming information is standardised before processing, or, the other way around, OAK entities in the code are standardized (using curies.Converter.standardize()) against an incoming prefix map.
  3. We could require that incoming semsql ontologies must be standardised against the OAK context (prefix map).
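To make option (2) concrete, here is a minimal standard-library sketch of standardising CURIEs against a context that knows about prefix synonyms. The `PREFIX_SYNONYMS` table and the `standardize_curie` helper are hypothetical names for illustration only; they are not OAK's actual prefix map or API, and a real implementation would probably delegate this to the curies package.

```python
# Hypothetical "OAK context": preferred prefix -> set of synonym prefixes.
# The entries are illustrative assumptions, not OAK's actual prefix map.
PREFIX_SYNONYMS = {
    "oio": {"oboInOwl"},
}

# Reverse lookup: any known prefix (preferred or synonym) -> preferred prefix.
_TO_PREFERRED = {}
for preferred, synonyms in PREFIX_SYNONYMS.items():
    _TO_PREFERRED[preferred] = preferred
    for synonym in synonyms:
        _TO_PREFERRED[synonym] = preferred


def standardize_curie(curie: str) -> str:
    """Rewrite a CURIE to use the preferred prefix; unknown prefixes pass through."""
    prefix, sep, local_id = curie.partition(":")
    if not sep:
        return curie  # no colon, so not a CURIE; leave it untouched
    return f"{_TO_PREFERRED.get(prefix, prefix)}:{local_id}"


# Both spellings from the lexmatch case collapse to the same CURIE.
print(standardize_curie("oboInOwl:hasExactSynonym"))
print(standardize_curie("oio:hasExactSynonym"))
```

With this in place, a comparison between an ontology's predicate and a hard-coded one would be done on the standardised forms rather than on the raw strings.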

None of this is particularly easy - (3) is probably easiest, but we would have to give some tool support, like

runoak normalise-prefixes -i ont.db.

@cmungall
Collaborator

3 - this is the way

@joeflack4
Contributor

What does (3) entail?

Currently, you can pass a prefix map (only a non-EPM bimap is supported at the moment; prefixes.csv) when creating a SemSQL DB. Are we saying that this prefix map can have additional entries not already in the OAK context so long as there is no conflict (i.e. a URI prefix which is assigned in the prefix map to a different prefix than OAK has assigned)?

Couldn't we just interpret such conflicts as prefix synonyms and maybe throw a warning to the user?
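A rough sketch of what "interpret conflicts as prefix synonyms and warn" could look like, using only the standard library. The function name `merge_prefix_map` and the example maps are assumptions made for illustration; nothing here is existing OAK or SemSQL behaviour.

```python
import warnings


def merge_prefix_map(context: dict, incoming: dict):
    """Merge an incoming prefix map (prefix -> URI prefix) into a context.

    Returns (merged, synonyms), where `synonyms` maps an incoming prefix to
    the context prefix that already owns the same URI prefix.
    """
    merged = dict(context)
    uri_to_prefix = {uri: prefix for prefix, uri in context.items()}
    synonyms = {}
    for prefix, uri in incoming.items():
        known = uri_to_prefix.get(uri)
        if known is None:
            if prefix in merged:
                # Same prefix pointing at a different URI prefix: not a synonym.
                raise ValueError(f"prefix {prefix!r} is already bound to {merged[prefix]!r}")
            merged[prefix] = uri  # genuinely new entry, no conflict
            uri_to_prefix[uri] = prefix
        elif known != prefix:
            # Same URI prefix, different prefix: treat as a synonym and warn.
            warnings.warn(f"treating {prefix!r} as a synonym of {known!r} for {uri}")
            synonyms[prefix] = known
    return merged, synonyms


context = {"oio": "http://www.geneontology.org/formats/oboInOwl#"}
incoming = {
    "oboInOwl": "http://www.geneontology.org/formats/oboInOwl#",
    "EX": "http://example.org/",  # hypothetical extra entry
}
merged, synonyms = merge_prefix_map(context, incoming)
```

Here `EX` is simply added, while `oboInOwl` is recorded as a synonym of `oio` and triggers a warning. A prefix that is reused for a different URI prefix is a harder kind of conflict, and this sketch raises an error for it.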

@matentzn
Contributor Author

prefixes.csv is actually a "one way epm in disguise": the same prefix can be mapped to multiple URL prefixes. See my comments in #699 for what I think the best solution would be. The key issue here is not the EPM - it is that prefix assumptions are hardcoded in the code. All entities in the code should be cycled through a standard epm before being used (say, curies.Converter.standardise("oio:hasDbXref") or something similar). Ideally, --epm can always be passed in to all oak commands to replace the default epm, which re-serialises the built-in curies prior to usage.
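As an illustration of "cycling built-in entities through an epm", the sketch below expands a hard-coded CURIE with a default map and re-compresses it with whichever map is active. The record keys (`prefix`, `uri_prefix`, `prefix_synonyms`) follow the extended prefix map layout; `DEFAULT_EPM`, `expand`, `compress` and `reserialize` are hypothetical helpers written for this example, not functions that exist in OAK or curies.

```python
OIO = "http://www.geneontology.org/formats/oboInOwl#"

# Assumed default map for the built-in CURIEs (illustrative, not OAK's real one).
DEFAULT_EPM = [
    {"prefix": "oio", "uri_prefix": OIO, "prefix_synonyms": ["oboInOwl"]},
]


def expand(curie: str, epm: list) -> str:
    """Expand a CURIE to a URI, accepting preferred prefixes and synonyms."""
    prefix, _, local_id = curie.partition(":")
    for record in epm:
        if prefix == record["prefix"] or prefix in record.get("prefix_synonyms", []):
            return record["uri_prefix"] + local_id
    raise KeyError(f"unknown prefix: {prefix}")


def compress(uri: str, epm: list) -> str:
    """Compress a URI to a CURIE using the longest matching URI prefix."""
    matches = [r for r in epm if uri.startswith(r["uri_prefix"])]
    if not matches:
        raise KeyError(f"no prefix for URI: {uri}")
    best = max(matches, key=lambda r: len(r["uri_prefix"]))
    return f"{best['prefix']}:{uri[len(best['uri_prefix']):]}"


def reserialize(curie: str, default_epm: list, active_epm: list) -> str:
    """Re-serialise a built-in CURIE so it matches the active prefix map."""
    return compress(expand(curie, default_epm), active_epm)


# A user-supplied map (e.g. passed via a hypothetical --epm option).
user_epm = [{"prefix": "oboInOwl", "uri_prefix": OIO}]
print(reserialize("oio:hasDbXref", DEFAULT_EPM, user_epm))
```

The point of the design is that the code never compares raw CURIE strings: the built-in value is only meaningful relative to a prefix map, so swapping the map changes its serialisation but not its identity.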
