HTO: The Heritage Text Ontology

The Heritage Text Ontology (HTO) is designed to represent the knowledge in textual heritage. It describes a set of classes, properties, and restrictions that can be used to represent the content and metadata of digitised historical textual collections. It can also capture the provenance information of digital content and concepts, which helps to track not only the sources in which the same concepts appeared, but also how the textual data was digitised and extracted.

Why build this ontology?

A large amount of textual heritage has been digitised by various data providers, which offers a wealth of opportunities for advancing research in history, culture, and linguistics. However, the heterogeneity and scale of these digital archives make it difficult for researchers to search them and extract meaningful information. Making digitised textual collections Findable, Accessible, Interoperable, and Reusable (FAIR) for both humans and machines is therefore extremely important. Building this ontology is the fundamental step towards that goal, because it provides a shared, formal representation of the knowledge in this domain.

Domain and Scope

What are we going to use the ontology for?

To enable interoperability amongst systems. Systems that share the same knowledge model can use this ontology and build their own applications on top of it.
To ensure reusability. The ontology is designed to be shared amongst people and systems, and to be easy to integrate into other ontologies later, which allows its reuse.
To facilitate knowledge acquisition. This is one of the main goals of our work. Knowledge found in documents can be hard to extract, so this ontology aims to represent the main concepts in documents along with their relations. It also intends to represent the temporal aspect of that knowledge, which requires linking related concepts and tracking provenance information from different data providers (how, when, and where concepts were recorded).
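The idea of linking related concepts and tracking how, when, and where they were recorded can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the class `ConceptRecord`, its field names, and the sample values are hypothetical and are not terms defined by HTO.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptRecord:
    """One recorded appearance of a concept (all field names are hypothetical)."""
    concept: str   # the concept being described, e.g. a term name
    provider: str  # where: the data provider or source holding the record
    method: str    # how: the digitisation or extraction step that produced it
    year: int      # when: the year in which the source recorded the concept


def link_by_concept(records):
    """Group records of the same concept and order each group by year."""
    linked = defaultdict(list)
    for record in records:
        linked[record.concept].append(record)
    for history in linked.values():
        history.sort(key=lambda r: r.year)
    return dict(linked)


# Made-up placeholder data, not taken from any real collection.
records = [
    ConceptRecord("Term T", "Provider B", "page-level extraction", 1850),
    ConceptRecord("Term T", "Provider A", "article-level extraction", 1800),
    ConceptRecord("Term U", "Provider A", "article-level extraction", 1800),
]

history = link_by_concept(records)
for record in history["Term T"]:
    print(record.year, record.provider, record.method)
```

Grouping by concept and sorting by date is what makes the temporal aspect visible: each concept ends up with an ordered history of the sources that recorded it.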

What is the domain that the ontology will cover?

Overall, this ontology will represent the metadata of documents (title, physical description, genre, editor, etc.), the textual content of documents (original text, articles, abstracts, and concepts such as people and places), the related provenance information (digitisation activity, software tools, editor, publisher, etc.) for both the documents and the textual content, and the relations among them.

Note that we focus only on the literal meaning of the textual content, not on the style in which the text was printed or written. To represent all aspects of a handwritten text, please refer to CRMtex (Model for the study of ancient texts).

In this version, HTO is designed mainly around the level of information extraction available from this repository, which currently supports the digitised collections from the Data Foundry at the National Library of Scotland. For the Encyclopaedia Britannica (EB) collection, each article can be extracted along with its description, although entities inside the description cannot be recognised yet. For the rest of the collections, extraction remains at page level: only the whole text content of each page can be extracted, nothing finer. It is therefore essential to balance generalisation and specialisation in this ontology, so that it can represent in great detail the knowledge that can already be extracted automatically, while extending easily to accommodate new entities and relations that may be extracted in the future.
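The balance between generalisation and specialisation described above can be sketched with a small class hierarchy. This is a minimal illustration, not HTO's actual class structure: `TextUnit`, `Page`, `Article`, and their fields are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class TextUnit:
    """General class: any extracted piece of text (hypothetical name)."""
    text: str


@dataclass
class Page(TextUnit):
    """Page-level extraction: the whole text content of one page."""
    number: int = 0


@dataclass
class Article(TextUnit):
    """Article-level extraction: a named article and its description."""
    name: str = ""
    # Placeholder for entities that might be recognised in the future;
    # it stays empty while entity recognition is not available.
    entities: list = field(default_factory=list)


def full_text(units):
    """Join the text of any TextUnit, whatever its level of detail."""
    return "\n".join(unit.text for unit in units)


units = [Page("Text of a page.", number=1),
         Article("Description of a term.", name="Term T")]
print(full_text(units))
```

Code written against the general class keeps working when a more specialised class is added later, which is the kind of extensibility the paragraph above asks of the ontology.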

What types of questions should the information in the ontology answer?

Competency questions (draft)

What volumes, editions, or series does a digitised collection C include?
What time period does a digitised collection C cover?
When was edition E, series S, or volume V published?
Who published edition E, series S, or volume V?
Who edited edition E, series S, or volume V?
Which genre does a volume V belong to?
Where was a volume V published or printed?
Which language does a volume V use?
In EB, what articles does a volume V include?
Where was an EB article A described (in which page, volume, and edition)?
Which EB articles appear in all editions?
Which EB articles appeared only once in edition E?
Which EB articles are related to another EB article T?
Which EB articles have a description similar to that of T?
How was a term with name T described across all editions?
What is the text in a page?
What sources are the text descriptions of article T or page P extracted from?
What is the clean description of term T?
What are the descriptions of term T with the highest text quality?
What software was used to extract the description of article T or a page P?
What software was used to digitise a document?
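As a rough illustration of how one of these questions could be answered, the sketch below stores a few facts as (subject, predicate, object) triples and finds the article names that appear in every edition. Everything in it is an assumption made for the example: the predicate names `name` and `inEdition` and the identifiers are hypothetical placeholders, not terms defined by HTO, and the data is made up. In practice such questions would more likely be posed in a graph query language such as SPARQL; plain Python is used here only to show the logic.

```python
# Each triple is (subject, predicate, object). The predicate names are
# hypothetical placeholders, not terms defined by HTO.
triples = {
    ("article:A1", "name", "Term T"),
    ("article:A1", "inEdition", "edition:E1"),
    ("article:A2", "name", "Term T"),
    ("article:A2", "inEdition", "edition:E2"),
    ("article:A3", "name", "Term U"),
    ("article:A3", "inEdition", "edition:E1"),
}


def objects(subject, predicate):
    """Return all objects linked to a subject by a predicate."""
    return {o for s, p, o in triples if s == subject and p == predicate}


def names_in_all_editions():
    """Return the article names that appear in every known edition."""
    editions = {o for _, p, o in triples if p == "inEdition"}
    seen = {}  # article name -> set of editions that contain it
    for s, p, o in triples:
        if p == "name":
            seen.setdefault(o, set()).update(objects(s, "inEdition"))
    return sorted(name for name, eds in seen.items() if eds == editions)


print(names_in_all_editions())
```

The same pattern, collecting the editions per article name and comparing against the full set, also covers the related question of which articles appear in only one edition.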
