Synchronization: SubClassOf
Pipeline to keep subclass relationships in sync between Mondo and sources.
Currently handled: Case 1, 3, 5
- Add: makefile goals: sync, sync-subclassof, and more.
- Update: makefile goal: build-mondo-ingest
- Add: src/scripts/sync_subclassof.py
- Add: Outputs to reports/

General
- Bugfix: utils.py remove_angle_brackets(): Now correctly returns a str when it receives a str.
- Update: config/prefixes.csv: Add: New entries
- Add: utils.py: get_monarch_curies_converter()
- Update: Python requirements: oaklib upgrade
- Update: Minor refactor in _get_all_owned_terms()
- Update: .gitignore: entries, ordering, and comments
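
A minimal sketch of the behavior described by the `remove_angle_brackets()` bugfix, assuming the function accepts either a single string or a list of strings and should return the same kind of value it was given. This is not the repository's code; the name `remove_angle_brackets_sketch` and its signature are hypothetical.

```python
from typing import List, Union


def remove_angle_brackets_sketch(uris: Union[str, List[str]]) -> Union[str, List[str]]:
    """Strip one pair of enclosing angle brackets, e.g. '<http://x>' -> 'http://x'.

    Hypothetical illustration: a str input yields a str, a list input yields a list.
    """
    is_single = isinstance(uris, str)
    items = [uris] if is_single else list(uris)
    stripped = [
        x[1:-1] if x.startswith("<") and x.endswith(">") else x
        for x in items
    ]
    # Return the same kind of value that was passed in.
    return stripped[0] if is_single else stripped
```

Checking the input type once and unwrapping at the end keeps the string and list paths on the same stripping logic.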

Related to other PRs
- Update: metadata/icd10who.yml: The 2 prefix maps in this file were inconsistent with each other. Corrected here and also in #377
joeflack4 committed Jan 6, 2024
1 parent f5fa475 commit 8e9ca4b
Showing 31 changed files with 96,313 additions and 62 deletions.
35 changes: 28 additions & 7 deletions .gitignore
@@ -35,9 +35,11 @@ mondo-ingest-base.json

imports/*

# src/mappings/
src/mappings/mondo-sources-all-lexical.sssom.tsv
src/mappings/mondo-sources-all-lexical-2.sssom.tsv

# src/ontology/
src/ontology/.template.db
src/ontology/mirror
src/ontology/mirror/*
@@ -47,6 +49,7 @@ src/ontology/reports/*.ttl
!src/ontology/reports/*exclusion_reasons.robot.template.tsv
!src/ontology/reports/*excluded_terms.txt
src/ontology/reports/mondo-ingest-edit.owl-obo-report.tsv

src/ontology/mondo-ingest.owl
src/ontology/mondo-ingest.obo
src/ontology/mondo-ingest.json
@@ -55,7 +58,6 @@ src/ontology/mondo-ingest-basic.*
src/ontology/mondo-ingest-full.*
src/ontology/mondo-ingest-simple.*
src/ontology/mondo-ingest-simple-non-classified.*

src/ontology/seed.txt
src/ontology/dosdp-tools.log
src/ontology/ed_definitions_merged.owl
@@ -64,20 +66,39 @@ src/ontology/simple_seed.txt
src/ontology/patterns
src/ontology/merged-mondo-ingest-edit.owl

src/ontology/target/

src/ontology/tmp/*
!src/ontology/tmp/README.md
# src/ontology/components/
src/ontology/components/*

# src/ontology/imports/
src/ontology/imports/*.owl
!src/ontology/imports/do_import.owl
!src/ontology/imports/omo_import.owl
!src/ontology/imports/ro_import.owl
src/ontology/imports/*.json
src/ontology/imports/*_terms_combined.txt

src/ontology/components/*
# src/ontology/mirror/
src/ontology/mirror
src/ontology/mirror/*

# src/ontology/reports/
src/ontology/reports/*.ttl
src/ontology/reports/*.subclass.direct-in-mondo-only.tsv
src/ontology/reports/mondo-ingest-edit.owl-obo-report.tsv
# todo: These ! patterns don't seem to be an exception to anything that would currently be ignoring them, so I don't think they're needed. Commented out. - Joe 2023/11/16
#!src/ontology/reports/README.md
#!src/ontology/reports/*excluded_terms_in_mondo_xrefs.tsv
#!src/ontology/reports/*exclusion_reasons.robot.template.tsv
#!src/ontology/reports/*excluded_terms.txt

# src/ontology/target/
src/ontology/target/

# src/ontology/tmp/
src/ontology/tmp/*
!src/ontology/tmp/README.md

# src/patterns/
src/patterns/data/**/*.ofn
src/patterns/data/**/*.txt
src/patterns/pattern_owl_seed.txt
@@ -87,5 +108,5 @@ src/scripts/.ipynb_checkpoints/*
src/scripts/mondo_unmapped.tsv

# Test
test/output/
tests/output/
src/scripts/dataframes/*
18 changes: 16 additions & 2 deletions docs/developer/workflows.md
@@ -35,8 +35,8 @@ for each ontology with more detailed information.
These workflows will create a [mapping progress report](../reports/unmapped.md) with statistics, with linked pages for each ontology that show unmapped terms.

#### Makefile goals
1. `reports/%_mapping_status.tsv`: Running this also runs `reports/%_unmapped_terms.tsv`. Creates a table of all terms for ontology `%`, along with labels, and other columns `is_excluded`, `is_mapped`, `is_deprecated`.
2. `reports/%_unmapped_terms.tsv`: Running this also runs `reports/%_mapping_status.tsv`. Creates a table of unmapped terms for ontology `%` and their labels.
1. `reports/%_mapping_status.tsv`: Running this also runs / generates `reports/%_unmapped_terms.tsv`. Creates a table of all terms for ontology `%`, along with labels, and other columns `is_excluded`, `is_mapped`, `is_deprecated`.
2. `reports/%_unmapped_terms.tsv`: Running this also runs / generates `reports/%_mapping_status.tsv`. Creates a table of unmapped terms for ontology `%` and their labels.
3. `unmapped-terms-tables`: Generates `reports/%_mapping_status.tsv` and `reports/%_unmapped_terms.tsv` for all ontologies.
4. `unmapped-terms-docs`: Based on the set of `reports/%_mapping_status.tsv` and `reports/%_unmapped_terms.tsv` for all ontologies, uses these to create the [mapping progress report](../reports/unmapped.md) and other related pages.
5. `mapping-progress-report`: Runs `unmapped-terms-tables` and `unmapped-terms-docs`. Creates the [mapping progress report](../reports/unmapped.md) and pages for each ontology which list their unmapped terms. Also generates `reports/%_mapping_status.tsv` and `reports/%_unmapped_terms.tsv` for all ontologies.
@@ -66,3 +66,17 @@ These workflows will help with excluding certain terms from integration into Mon
7. `reports/excluded_terms.txt`: Runs reports/%_term_exclusions.txt for all ontologies and combines into a single file.
8. `reports/exclusion_reasons.robot.template.tsv`: Runs reports/%_exclusion_reasons.robot.template.tsv for all ontologies and combines into a single file.
9. `exclusions-all`: Runs all exclusion artefacts for all ontologies.

## Synchronization
These workflows help synchronize Mondo with source ontologies.

#### Makefile goals
1. `generate-synchronization-files`: Runs synchronization pipeline.
2. `sync-subclassof`: Runs 'sync-subclassof' part of synchronization pipeline, generating set of outputs for all ontologies.
3. `sync-subclassof-%`: Generates subClassOf synchronization outputs for given ontology. Alias for `reports/%.subclass.added.robot.tsv`, `reports/%.subclass.confirmed.robot.tsv`, and `reports/%.subclass.direct-in-mondo-only.tsv`.
4. `reports/%.subclass.added.robot.tsv`: Creates robot template containing new subclass relationships from given ontology to be imported into Mondo. Running this also runs / generates `reports/%.subclass.added-obsolete.robot.tsv`, `reports/%.subclass.confirmed.robot.tsv`, and `reports/%.subclass.direct-in-mondo-only.tsv`.
5. `reports/%.subclass.added-obsolete.robot.tsv`: Creates robot template containing new subclass relationships from given ontology that would be imported into Mondo, except that these terms are obsolete in Mondo. Running this also runs / generates `reports/%.subclass.added.robot.tsv`, `reports/%.subclass.confirmed.robot.tsv`, and `reports/%.subclass.direct-in-mondo-only.tsv`.
6. `reports/%.subclass.confirmed.robot.tsv`: Creates robot template containing subclass relations for given ontology that exist in Mondo and are confirmed to also exist in the source. Running this also runs / generates `reports/%.subclass.added.robot.tsv`, `reports/%.subclass.added-obsolete.robot.tsv`, and `reports/%.subclass.direct-in-mondo-only.tsv`.
7. `reports/%.subclass.direct-in-mondo-only.tsv`: Creates a file of relations for given ontology where a direct subclass relation exists only in Mondo and not in the source. Running this also runs / generates `reports/%.subclass.added.robot.tsv`, `reports/%.subclass.added-obsolete.robot.tsv`, and `reports/%.subclass.confirmed.robot.tsv`.
8. `reports/sync-subClassOf.direct-in-mondo-only.tsv`: For all subclass relationships in Mondo, shows which sources do not have the relationship and whether no source has it. Combines the `--outpath-direct-in-mondo-only` outputs for all sources, using them as inputs and deleting them afterwards.
9. `reports/sync-subClassOf.confirmed.tsv`: For all subclass relationships in Mondo, by source, a robot template showing which relationships are in Mondo and confirmed to also exist in the source. Combination of all `--outpath-confirmed` outputs for all sources.
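
The four per-ontology outputs listed above can be read as a partition of subclass edges. The following is a rough sketch of that idea, not the logic of `src/scripts/sync_subclassof.py`. It assumes the source's edges have already been translated into Mondo IDs through mappings, and it ignores whether both terms of a Mondo edge are mapped in the source at all. The function name, parameters, and example IDs are hypothetical.

```python
from typing import Dict, Set, Tuple

# (child, parent), both expressed as Mondo CURIEs (assumption for this sketch)
Edge = Tuple[str, str]


def classify_subclass_edges(
    mondo_edges: Set[Edge],
    source_edges: Set[Edge],
    obsolete_in_mondo: Set[str],
) -> Dict[str, Set[Edge]]:
    """Split edges into the four categories named by the report files."""
    in_source_only = source_edges - mondo_edges
    # Candidate additions that involve a term which is obsolete in Mondo.
    touches_obsolete = {
        (child, parent)
        for child, parent in in_source_only
        if child in obsolete_in_mondo or parent in obsolete_in_mondo
    }
    return {
        "added": in_source_only - touches_obsolete,
        "added-obsolete": touches_obsolete,
        "confirmed": mondo_edges & source_edges,
        "direct-in-mondo-only": mondo_edges - source_edges,
    }


if __name__ == "__main__":
    mondo = {("MONDO:2", "MONDO:1"), ("MONDO:3", "MONDO:1")}
    source = {("MONDO:2", "MONDO:1"), ("MONDO:4", "MONDO:1")}
    for category, edges in classify_subclass_edges(mondo, source, set()).items():
        print(category, sorted(edges))
```

Plain set operations make it easy to see why a single run can produce all four files at once, which matches the note that running any one of these goals also generates the others.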
2 changes: 1 addition & 1 deletion python-requirements-unlocked.txt
@@ -1,6 +1,6 @@
curies
jinja2
oaklib
oaklib>=0.5.20
pandas
pyyaml
sssom
111 changes: 79 additions & 32 deletions python-requirements.txt
@@ -1,42 +1,57 @@
aiohttp==3.8.1
aiosignal==1.2.0
airium==0.2.5
alabaster==0.7.12
annotated-types==0.5.0
antlr4-python3-runtime==4.9.3
appdirs==1.4.4
arrow==1.2.3
async-timeout==4.0.2
attrs==21.4.0
Babel==2.10.1
bcp47==0.0.4
bioregistry==0.5.95
beautifulsoup4==4.12.2
bioregistry==0.6.99
bleach==5.0.0
cattrs==22.2.0
certifi==2021.10.8
CFGraph==0.2.1
chardet==4.0.0
chardet==5.2.0
charset-normalizer==2.0.12
class-resolver==0.3.10
click==8.1.3
class-resolver==0.4.2
click==8.1.7
colorama==0.4.6
commonmark==0.9.1
curies==0.1.5
curies==0.6.4
decorator==5.1.1
Deprecated==1.2.13
deprecation==2.1.0
distlib==0.3.4
docutils==0.17.1
EditorConfig==0.12.3
et-xmlfile==1.1.0
fastobo==0.12.1
eutils==0.6.0
exceptiongroup==1.1.1
fastobo==0.12.2
filelock==3.6.0
fqdn==1.5.1
frozenlist==1.3.0
fsspec==2022.5.0
funowl==0.1.12
funowl==0.2.3
ghp-import==2.1.0
graphviz==0.20
greenlet==1.1.2
greenlet==2.0.1
hbreader==0.9.1
idna==3.3
ijson==3.2.0.post0
imagesize==1.3.0
importlib-metadata==4.12.0
importlib-metadata==6.8.0
iniconfig==2.0.0
isodate==0.6.1
isoduration==20.11.0
Jinja2==3.1.2
joblib==1.1.0
jsbeautifier==1.14.7
json-flattener==0.1.9
jsonasobj==1.3.1
jsonasobj2==1.0.4
@@ -46,94 +61,126 @@ jsonpointer==2.3
jsonschema==4.4.0
keyring==23.6.0
kgcl==0.1.0
kgcl-rdflib==0.3.0
kgcl-schema==0.3.0
kgcl-rdflib==0.5.0
kgcl-schema==0.6.0
lark==1.1.2
linkml==1.2.14
linkml==1.4.11
linkml-dataops==0.1.0
linkml-runtime==1.2.16
linkml-renderer==0.3.0
linkml-runtime==1.5.7
lxml==4.9.2
Markdown==3.3.7
markdown-it-py==2.1.0
MarkupSafe==2.1.1
mdit-py-plugins==0.3.0
mdurl==0.1.1
more-click==0.1.1
mergedeep==1.3.4
mkdocs==1.4.3
mkdocs-material==9.1.12
mkdocs-material-extensions==1.1.1
mkdocs-mermaid2-plugin==0.6.0
more-click==0.1.2
multidict==6.0.2
myst-parser==0.18.0
networkx==2.8
numpy==1.22.3
ndex2==3.5.0
networkx==3.1
numpy==1.24.4
nxontology==0.4.1
oaklib==0.1.43
oaklib==0.5.20
ols-client==0.1.2
ontoportal-client==0.0.3
openpyxl==3.0.10
packaging==21.3
pandas==1.4.4
pandas==2.1.1
pandasql==0.7.3
pansql==0.0.1
parse==1.19.0
pbr==5.8.1
pkginfo==1.8.3
platformdirs==2.5.2
pluggy==1.0.0
ply==3.11
prefixcommons==0.1.9
prefixmaps==0.1.3
pronto==2.5.0
pydantic==1.9.1
Pygments==2.12.0
prefixcommons==0.1.12
prefixmaps==0.1.5
pronto==2.5.5
py==1.11.0
pydantic==2.4.0
pydantic_core==2.10.0
Pygments==2.15.1
PyJSG==0.11.10
pymdown-extensions==10.0
pyparsing==2.4.7
pyrsistent==0.18.1
PyShEx==0.8.1
PyShExC==0.9.1
pystow==0.4.4
pysolr==3.9.0
pystow==0.5.0
pytest==7.2.2
pytest-logging==2015.11.4
python-dateutil==2.8.2
PyTrie==0.4.0
pytz==2022.1
PyYAML==6.0
PyYAML==6.0.1
pyyaml_env_tag==0.1
ratelimit==2.2.1
rdflib==6.1.1
rdflib==7.0.0
rdflib-jsonld==0.6.1
rdflib-shim==1.0.3
readme-renderer==35.0
recommonmark==0.7.1
requests==2.27.1
regex==2023.5.5
requests==2.28.2
requests-cache==1.0.1
requests-toolbelt==0.9.1
rfc3339-validator==0.1.4
rfc3986==2.0.0
rfc3987==1.3.8
rich==12.4.4
ruamel.yaml==0.17.21
ruamel.yaml.clib==0.2.6
scikit-learn==1.0.2
scipy==1.8.0
semsql==0.2.5
semsimian==0.2.1
semsql==0.3.2
ShExJSG==0.8.2
six==1.16.0
snowballstemmer==2.2.0
sortedcontainers==2.4.0
soupsieve==2.4.1
sparqlslurper==0.5.1
SPARQLWrapper==2.0.0
Sphinx==4.5.0
sphinx-click==4.2.0
sphinx-rtd-theme==1.2.0
sphinxcontrib-applehelp==1.0.2
sphinxcontrib-devhelp==1.0.2
sphinxcontrib-htmlhelp==2.0.0
sphinxcontrib-jquery==4.1
sphinxcontrib-jsmath==1.0.1
sphinxcontrib-qthelp==1.0.3
sphinxcontrib-serializinghtml==1.1.5
SQLAlchemy==1.4.36
SQLAlchemy-Utils==0.38.2
sssom==0.3.16
sssom-schema==0.9.4
sssom==0.3.41
sssom-schema==0.15.0
stevedore==3.5.0
tabulate==0.9.0
threadpoolctl==3.1.0
tomli==2.0.1
tox==3.28.0
tqdm==4.64.0
twine==4.0.1
typing_extensions==4.2.0
typing_extensions==4.8.0
tzdata==2023.3
uri-template==1.2.0
url-normalize==1.4.3
urllib3==1.26.9
validators==0.18.2
validators==0.22.0
virtualenv==20.14.1
virtualenv-clone==0.5.7
virtualenvwrapper==4.8.4
watchdog==2.1.9
webcolors==1.12
webencodings==0.5.1
wrapt==1.14.0
yarl==1.7.2