Include PyOBO products #45

Open
cmungall opened this issue Jul 14, 2022 · 6 comments

Comments

@cmungall
Collaborator

See https://github.com/biopragmatics/obo-db-ingest

It would be quite easy to add these as builds, and distribute the sqlite on s3.

Advantages:

  • easy to query in OAK (though some methods and commands, e.g. tree, wouldn't make sense, as these products wouldn't follow the structural shapes expected of ontologies)
  • fast to query via SQL

Note there is ongoing discussion about URIs for these, but semantic-sql doesn't care: we store things natively as CURIEs, and the prefix table can be swapped to anything.
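
To illustrate the point about CURIEs and the prefix table, here is a minimal sketch using Python's built-in sqlite3 module. The two tables are a simplified stand-in assumed for illustration (not the full semantic-sql schema), the URI bases are placeholders, and `expand` is a hypothetical helper.

```python
import sqlite3

# Toy stand-in for a semantic-sql database. The table layout is a simplified
# assumption for illustration, and the URI bases are placeholders.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE prefix (prefix TEXT, base TEXT);
    CREATE TABLE statements (subject TEXT, predicate TEXT, value TEXT);
    INSERT INTO prefix VALUES ('hgnc', 'https://example.org/hgnc/');
    INSERT INTO statements VALUES ('hgnc:5', 'rdfs:label', 'A1BG');
    """
)


def expand(conn, curie):
    """Expand a CURIE to a URI using whatever the prefix table currently holds."""
    prefix, local_id = curie.split(":", 1)
    row = conn.execute(
        "SELECT base FROM prefix WHERE prefix = ?", (prefix,)
    ).fetchone()
    if row is None:
        raise KeyError(f"unknown prefix: {prefix}")
    return row[0] + local_id


before = expand(conn, "hgnc:5")

# Swap the URI base. The stored statements still use CURIEs and are untouched.
conn.execute(
    "UPDATE prefix SET base = ? WHERE prefix = ?",
    ("https://example.com/alt/hgnc/", "hgnc"),
)
after = expand(conn, "hgnc:5")

# Queries by CURIE keep working regardless of which base is configured.
label = conn.execute(
    "SELECT value FROM statements WHERE subject = ? AND predicate = ?",
    ("hgnc:5", "rdfs:label"),
).fetchone()[0]
```

Because only the prefix table changes, whatever the URI discussion settles on would not require rewriting the stored statements in this sketch.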

Ideally the products would be built and distributed (obo/owl/json) upstream, so that we avoid running the build step ourselves: it introduces an additional source of potential pipeline failure, and we would also have to determine memory/disk requirements.

cc @cthoyt

@cthoyt

cthoyt commented Jul 18, 2022

Yes they’re all built and distributed on GitHub at the moment but some need to be gzipped. Is that alright?

cmungall added a commit that referenced this issue Nov 23, 2022
@cmungall
Collaborator Author

gzip is fine. It would be great if all had stable URLs, to avoid modifying the registry entry on new releases (it is worth continuing to explore housing some of these on OBO but that can be pursued separately). Standardizing on ISO-8601 for release dates would be great too.
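
Since some files are gzipped and some are not, a consumer may want to handle both transparently. A minimal sketch, assuming the files are UTF-8 text and already downloaded locally; `open_text` is a hypothetical helper name, and it checks the gzip magic bytes rather than trusting the file extension.

```python
import gzip


def open_text(path):
    """Open a file as text, decompressing if it starts with the gzip magic bytes."""
    with open(path, "rb") as handle:
        magic = handle.read(2)
    if magic == b"\x1f\x8b":
        # gzip.open in text mode decompresses and decodes on the fly.
        return gzip.open(path, "rt", encoding="utf-8")
    return open(path, "rt", encoding="utf-8")
```

Usage would look like `with open_text("mesh.obo.gz") as handle: ...`, with the same call working for an uncompressed file.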

I'm trying a few of these. I am manually adding to the registry for now but perhaps we could come up with some kind of standard registry yaml for this sort of thing.

@cthoyt

cthoyt commented Mar 16, 2023

FYI there are now PURLs for these files, with dates standardized to ISO 8601 when possible. Examples:

Resource        Version Type   Example PURL
Reactome        Sequential     https://w3id.org/biopragmatics/resources/reactome/83/reactome.obo
InterPro        Major/Minor    https://w3id.org/biopragmatics/resources/interpro/92.0/interpro.obo
DrugBank Salt   Semantic       https://w3id.org/biopragmatics/resources/drugbank.salt/5.1.9/drugbank.salt.obo
MeSH            Year           https://w3id.org/biopragmatics/resources/mesh/2003/mesh.obo.gz
UniProt         Year/Month     https://w3id.org/biopragmatics/resources/uniprot/2022_05/uniprot.obo.gz
HGNC            Date           https://w3id.org/biopragmatics/resources/hgnc/2023-02-01/hgnc.obo
CGNC            unversioned*   https://w3id.org/biopragmatics/resources/cgnc/cgnc.obo
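
The example PURLs above appear to share one shape: base, resource prefix, optional version, then a file named after the prefix. A minimal sketch of that pattern follows; it is inferred from the examples only, not from a documented scheme, and `build_purl` and its parameters are hypothetical names.

```python
BASE = "https://w3id.org/biopragmatics/resources"


def build_purl(prefix, version=None, ext="obo", gzipped=False):
    """Assemble a PURL following the pattern seen in the example table.

    The pattern is an inference from the listed examples, not a documented
    contract; unversioned resources simply omit the version path segment.
    """
    parts = [BASE, prefix]
    if version is not None:
        parts.append(version)
    filename = f"{prefix}.{ext}" + (".gz" if gzipped else "")
    parts.append(filename)
    return "/".join(parts)


reactome = build_purl("reactome", "83")
mesh = build_purl("mesh", "2003", gzipped=True)
cgnc = build_purl("cgnc")
```

Whether a given file is gzipped cannot be derived from the prefix in this sketch, so the caller has to know it in advance.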

to do:

  1. Make a "release" even for versioned ones so there's a stable URL pointing to the most recent version
  2. Standardize date formats further, e.g. for UniProt, Wikipathways, etc
  3. Create some kind of manifest file of the latest build

@cmungall
Collaborator Author

cmungall commented Mar 17, 2023 via email

@cmungall
Collaborator Author

and remember, what you have is swissprot, NOT uniprot! :-)

@cthoyt

cthoyt commented Mar 18, 2023

@cmungall here's the manifest file, with PURLs for each of the most recent artifacts listed in it: https://github.com/biopragmatics/obo-db-ingest/blob/main/manifest.yml
