
Annotation enrichment in anastore workflow

Donat Agosti edited this page Dec 9, 2022 · 1 revision

This document describes annotation enrichment processes for biodiversity literature. It distinguishes the browsing of annotations, for example in an InvenioRDM repository interface that supports the IIIF presentation layer, from the annotation workflows that contributors use to edit (enrich) annotations. Browsing may use any IIIF-compliant viewer. Enrichment using the anəstor infrastructure, which is the subject of this document, currently assumes the Mirador-3 IIIF viewer/editor.

Filters may be set for browsing annotations in multiple IIIF applications, and these may enable display of subsets of the complete Annotation Collection associated with a publication. Such filters (possibly useful for expert users) may differ from the controls that determine which subset of annotations anəstor renders for editing in order to accomplish a specific task (as identified below). For both browsing and editing it will be possible to render all annotations in a Collection, but this is likely to be of limited use because it clutters the GUI.

This document addresses only the editing of annotations relating to a single publication by a single human contributor; in contrast, some IIIF viewers allow browsing of multiple canvases in a single GUI. ORCID is currently assumed to be the sole infrastructure for both authenticating contributors and labelling their activities.

anəstor receives the complete set of annotations on a publication from TB, serialized from IMF as WADM, and mints a PID for it. This document uses the term annotationCollection (which may differ from the strict WADM definition of Annotation Collection) to describe this set of annotations. anəstor subsequently makes a new version of the annotationCollection available to TB, together with metadata (defined below) about the activity of the contributor. This derived annotationCollection version may contain new annotations that were not present before the activity; it may omit annotations that were deleted as part of the activity; and it may contain annotations that have been edited, that is, annotations that were previously present but are no longer identical in the new version.
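The three kinds of change between two annotationCollection versions (added, deleted, edited) can be illustrated with a minimal sketch. It assumes that each annotation is a dictionary with a stable `id` key, as in WADM JSON-LD, and that "edited" means the two versions of an annotation are not equal as parsed JSON. The function name `diff_collections` and the `urn:example:` identifiers are hypothetical and are not part of anəstor.

```python
from typing import Any

Annotation = dict[str, Any]


def diff_collections(old: list[Annotation], new: list[Annotation]) -> dict[str, list[str]]:
    """Classify annotation IDs as added, deleted, or edited between two versions.

    Assumption: every annotation carries a unique, stable "id" value.
    """
    old_by_id = {a["id"]: a for a in old}
    new_by_id = {a["id"]: a for a in new}

    # Present only in the new version.
    added = sorted(new_by_id.keys() - old_by_id.keys())
    # Present only in the old version.
    deleted = sorted(old_by_id.keys() - new_by_id.keys())
    # Present in both versions, but no longer identical.
    edited = sorted(
        i for i in old_by_id.keys() & new_by_id.keys() if old_by_id[i] != new_by_id[i]
    )
    return {"added": added, "deleted": deleted, "edited": edited}


# Hypothetical example data; bodies and targets are placeholders.
old = [
    {"id": "urn:example:anno-1", "type": "Annotation", "body": "Formica rufa", "target": "urn:example:canvas-1"},
    {"id": "urn:example:anno-2", "type": "Annotation", "body": "Fig. 3", "target": "urn:example:canvas-1"},
    {"id": "urn:example:anno-3", "type": "Annotation", "body": "Fromica", "target": "urn:example:canvas-2"},
]
new = [
    {"id": "urn:example:anno-1", "type": "Annotation", "body": "Formica rufa", "target": "urn:example:canvas-1"},
    {"id": "urn:example:anno-3", "type": "Annotation", "body": "Formica", "target": "urn:example:canvas-2"},
    {"id": "urn:example:anno-4", "type": "Annotation", "body": "Lasius niger", "target": "urn:example:canvas-2"},
]

result = diff_collections(old, new)
print(result)
```

Comparing parsed dictionaries rather than raw text means that key order and whitespace in the serialization do not count as edits. Whether anəstor applies the same notion of "identical" is not stated in this document.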

The activity of the contributor that leads to the generation of the new annotationCollection version is referred to as an Annotation Enrichment Task, hereinafter Task. Information associated with the operation of the Task by anəstor (the Workflow) will include output from the business logic concerning access control and output from the Workflow itself, such as the annotations rendered and accessible to the contributor and the IDs of the annotations edited, as well as a new Task PID. This information will be made available to TB as Task metadata, together with the new annotationCollection version (as a WADM file).

Task metadata will comprise a Task manifest file serialized in JSON, the Task identifier, the contributor ORCID, and a timestamp (the commit time of the Task). The timestamp and the ORCID will also be accessible via the Task identifier. The Task manifest will enumerate the IDs of the annotations that have been edited in the Task.
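As an illustration only, the Task metadata described above might be assembled as follows. The field names (`taskId`, `contributor`, `timestamp`, `manifest`, `editedAnnotations`), the function `build_task_metadata`, and the `urn:example:` identifiers are all assumptions made for this sketch; this document does not define a schema. The ORCID iD shown is the fictitious example iD used in ORCID's own documentation.

```python
import json
from datetime import datetime, timezone


def build_task_metadata(
    task_id: str, orcid: str, edited_ids: list[str], committed_at: datetime
) -> dict:
    """Bundle a Task manifest with the Task identifier, contributor ORCID, and commit time.

    All key names are hypothetical; the actual anəstor schema may differ.
    """
    # The manifest enumerates the IDs of annotations edited in the Task.
    manifest = {"editedAnnotations": sorted(edited_ids)}
    return {
        "taskId": task_id,
        "contributor": f"https://orcid.org/{orcid}",
        "timestamp": committed_at.isoformat(),  # commit time of the Task
        "manifest": manifest,
    }


metadata = build_task_metadata(
    task_id="urn:example:task-1",      # hypothetical Task PID
    orcid="0000-0002-1825-0097",       # ORCID's fictitious example iD
    edited_ids=["urn:example:anno-3", "urn:example:anno-1"],
    committed_at=datetime(2022, 12, 9, 12, 0, tzinfo=timezone.utc),
)

# Serialize to JSON, as the Task manifest is described as a JSON file.
serialized = json.dumps(metadata, indent=2)
print(serialized)
```

Using a timezone-aware timestamp in ISO 8601 form keeps the commit time unambiguous when the metadata is passed between systems; this is a design choice of the sketch rather than a documented requirement.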

Based on this information returned to TB, curation management (which Tasks have been executed on which publications) is supported, and contributors can be credited with the Task via the ORCID infrastructure.
