CatE: Embedding $\mathcal{ALC}$ ontologies using category-theoretical semantics.

Abstract

Machine learning with Semantic Web ontologies follows several strategies, one of which involves projecting ontologies into graph structures and applying graph embeddings or graph-based machine learning methods to the resulting graphs. Several methods have been developed that project ontology axioms into graphs. However, these methods are limited in the type of axioms they can project (totality), whether they are invertible (injectivity), and how they exploit semantic information. These limitations restrict the kind of tasks to which they can be applied. Category-theoretical semantics of logic languages formalizes interpretations using categories instead of sets, and categories have a graph-like structure. We developed CatE, which uses the category-theoretical formulation of the semantics of the Description Logic ALC to generate a graph representation for ontology axioms. The CatE projection is total and injective, and therefore overcomes limitations of other graph-based ontology embedding methods which are generally not invertible. We apply CatE to a number of different tasks, including deductive and inductive reasoning, and we demonstrate that CatE improves over state of the art ontology embedding methods. Furthermore, we show that CatE can also outperform model-theoretic ontology embedding methods in machine learning tasks in the biomedical domain.

Repository Overview


├── README.md
├── run
├── src
│   ├── cat
│   ├── data
│   ├── models
│   ├── projectors
│   └── visual
├── tests
│   ├── example3
│   └── example4
└── use_cases
    ├── experiments
    │   ├── fobi
    │   ├── nro
    │   ├── go_deductive
    │   ├── go_completion
    │   ├── foodon_completion
    │   └── ppi
    ├── fobi
    │   ├── data
    │   └── models
    ├── foodon_completion
    │   ├── data
    │   └── models
    ├── go_completion
    │   ├── data
    │   └── models
    ├── go_deductive
    │   ├── data
    │   └── models
    ├── nro
    │   ├── data
    │   └── models
    └── ppi
        ├── data
        └── models

Dependencies

Python 3.8
Anaconda
Git LFS

Set up environment

git lfs install #due to some large data files
git clone --recursive https://github.com/bio-ontology-research-group/catE.git
cd catE
conda env create -f environment.yml
conda activate cate

Getting the data

The data is located in the use_cases directory for each task. For example, You will find the files for FoodOn completion task in use_cases/foodon_completion/data.tar.gz. To uncompress the data just run:

cd use_cases/foodon_completion
tar -xzvf data.tar.gz

Running the model

To run the script, use the run_model.py script. The parameters are the following:

Parameter	Command line	Options
case	-case	nro, fobi, go_comp, foodon_comp, go_ded, ppi
graph	-g	owl2vec, onto2graph, rdf, cat, cat1, cat2
kge model	-kge	transe, ordere
root directory	-r	[case/data
embedding dimension	-dim	64, 128, 256, ...
margin	-margin	0.0, 0.02, 0.04, ...
weight decay	-wd	0.0000, 0.0001, 0.0005
batch size	-bs	4096, 8192, 16834
learning rate	-lr	0.1, 0.01, 0.001
testing batch size	-tbs	8, 16
number of negatives	-negs	1,2,3,...
epochs	-e	1000, 2000, ...
device	-d	cpu, cuda
validation file	-vf	name_of_validation_file.csv
testing file	-tf	name_of_testing_file.csv
results file	-rf	name_of_results_csv_file.csv
reduced subsumption	-rs	Train on ontology with some axioms $C \sqsubseteq D$ removed
test unsatisfiability	-tu	Flag to test $C \sqsubseteq \bot$ axioms
test deductive inference	-td	Flag to test $C \sqsubseteq D$ axioms in the deductive closure
test ontology completion	-tc	Flag to test $C \sqsubseteq D$ axioms in that are plausible but not entailed
test ppi	-tppi	Flag to test ppi prediction axioms
only train	-otr	Flag to only perform training
only test	-ot	Flag to only perform testing
seed	-s	42

For example, to train the ontology completion task for FoodOn you can run

python run_model.py -case foodon_comp -g cat -kge ordere -r foodon_comp/data -dim 64 -m 0.2 -wd 0.0001 -bs 4096 -lr 0.1 -tbs 8 -e 4000 -d cuda -rd results_foodon_comp.csv -tf foodon/data/test.csv

The commands to run the experiments in the paper are located at use_cases/experiments

Name		Name	Last commit message	Last commit date
Latest commit History 89 Commits
src		src
tests		tests
use_cases		use_cases
.gitignore		.gitignore
.gitmodules		.gitmodules
README.md		README.md
environment.yml		environment.yml
pyproject.toml		pyproject.toml
requirements.txt		requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

CatE: Embedding $\mathcal{ALC}$ ontologies using category-theoretical semantics.

Abstract

Repository Overview

Dependencies

Set up environment

Getting the data

Running the model

About

Releases

Packages

Languages

bio-ontology-research-group/catE

Folders and files

Latest commit

History

Repository files navigation

CatE: Embedding $\mathcal{ALC}$ ontologies using category-theoretical semantics.

Abstract

Repository Overview

Dependencies

Set up environment

Getting the data

Running the model

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages