Machine learning with Semantic Web ontologies follows several strategies, one of which involves projecting ontologies into graph structures and applying graph embeddings or graph-based machine learning methods to the resulting graphs. Several methods have been developed that project ontology axioms into graphs. However, these methods are limited in the type of axioms they can project (totality), whether they are invertible (injectivity), and how they exploit semantic information. These limitations restrict the kind of tasks to which they can be applied. Category-theoretical semantics of logic languages formalizes interpretations using categories instead of sets, and categories have a graph-like structure. We developed CatE, which uses the category-theoretical formulation of the semantics of the Description Logic ALC to generate a graph representation for ontology axioms. The CatE projection is total and injective, and therefore overcomes limitations of other graph-based ontology embedding methods which are generally not invertible. We apply CatE to a number of different tasks, including deductive and inductive reasoning, and we demonstrate that CatE improves over state of the art ontology embedding methods. Furthermore, we show that CatE can also outperform model-theoretic ontology embedding methods in machine learning tasks in the biomedical domain.
├── README.md
├── run
├── src
│ ├── cat
│ ├── data
│ ├── models
│ ├── projectors
│ └── visual
├── tests
│ ├── example3
│ └── example4
└── use_cases
├── experiments
│ ├── fobi
│ ├── nro
│ ├── go_deductive
│ ├── go_completion
│ ├── foodon_completion
│ └── ppi
├── fobi
│ ├── data
│ └── models
├── foodon_completion
│ ├── data
│ └── models
├── go_completion
│ ├── data
│ └── models
├── go_deductive
│ ├── data
│ └── models
├── nro
│ ├── data
│ └── models
└── ppi
├── data
└── models
- Python 3.8
- Anaconda
- Git LFS
git lfs install #due to some large data files
git clone --recursive https://github.com/bio-ontology-research-group/catE.git
cd catE
conda env create -f environment.yml
conda activate cate
The data is located in the use_cases
directory for each task. For example, You will find the files for FoodOn completion task in use_cases/foodon_completion/data.tar.gz
. To uncompress the data just run:
cd use_cases/foodon_completion
tar -xzvf data.tar.gz
To run the script, use the run_model.py
script. The parameters are the following:
Parameter | Command line | Options |
---|---|---|
case | -case | nro, fobi, go_comp, foodon_comp, go_ded, ppi |
graph | -g | owl2vec, onto2graph, rdf, cat, cat1, cat2 |
kge model | -kge | transe, ordere |
root directory | -r | [case/data |
embedding dimension | -dim | 64, 128, 256, ... |
margin | -margin | 0.0, 0.02, 0.04, ... |
weight decay | -wd | 0.0000, 0.0001, 0.0005 |
batch size | -bs | 4096, 8192, 16834 |
learning rate | -lr | 0.1, 0.01, 0.001 |
testing batch size | -tbs | 8, 16 |
number of negatives | -negs | 1,2,3,... |
epochs | -e | 1000, 2000, ... |
device | -d | cpu, cuda |
validation file | -vf | name_of_validation_file.csv |
testing file | -tf | name_of_testing_file.csv |
results file | -rf | name_of_results_csv_file.csv |
reduced subsumption | -rs | Train on ontology with some axioms |
test unsatisfiability | -tu | Flag to test |
test deductive inference | -td | Flag to test |
test ontology completion | -tc | Flag to test |
test ppi | -tppi | Flag to test ppi prediction axioms |
only train | -otr | Flag to only perform training |
only test | -ot | Flag to only perform testing |
seed | -s | 42 |
For example, to train the ontology completion task for FoodOn you can run
python run_model.py -case foodon_comp -g cat -kge ordere -r foodon_comp/data -dim 64 -m 0.2 -wd 0.0001 -bs 4096 -lr 0.1 -tbs 8 -e 4000 -d cuda -rd results_foodon_comp.csv -tf foodon/data/test.csv
The commands to run the experiments in the paper are located at use_cases/experiments