Linked Environment Data

bandholtz edited this page May 25, 2011 · 3 revisions

Since 2010, several projects of the German Federal Environment Agency (FEA) have been contributing to the creation of a public data network based on Linked Data. The effort started with the Environmental Specimen Bank (ESB) and the Semantic Network Service (SNS), and further information systems are being considered for inclusion. It is part of an international collaboration with the Ecoinformatics Initiative. innoQ plays a key role in implementing it using the Linked Data approach.

Linked Data and Environmental Information

Linking environmental data and terminology has been of interest to the FEA since the 1990s, and several projects have been conducted in this area (UMPLIS, UDK, GEIN, SNS, PortalU). However, the existing implementations share two shortcomings:

  • Only data containers (databases, information systems, complex web pages) have been linked rather than individual records.
  • There was no common access to a shared data structure, so references were only meaningful within the context of the host system.

It is these shortcomings that the Linked Data approach is meant to overcome.

Interlinking the Environmental Specimen Bank

The Environmental Specimen Bank records the accumulation of (harmful) substances in specimens taken at certain locations and times. However, the ESB (German: Umweltprobenbank, UPB) is not itself responsible for comprehensively describing every relevant element, so it should reference specialized information instead. Data on substances is provided by GSBL, on species by EUNIS, and on locations and times by the SNS geo-thesaurus and environmental chronicle, respectively. The environmental thesaurus (UMTHES) provides an overarching framework, which is in turn linked to the international GEMET thesaurus.
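
To illustrate the idea, here is a minimal Python sketch of a single ESB record expressed as RDF triples that point to the specialized systems rather than copying their data. It is not the FEA's actual data model: every namespace, including the esb: property names and the GSBL, EUNIS and SNS URIs, is a hypothetical placeholder.

```python
# Minimal sketch: one ESB measurement record as RDF triples that link out to
# the specialized systems. Every namespace below is a HYPOTHETICAL placeholder.
ESB = "http://example.org/esb/"              # hypothetical ESB namespace
GSBL = "http://example.org/gsbl/"            # hypothetical GSBL namespace
EUNIS = "http://example.org/eunis/"          # hypothetical EUNIS namespace
GEO = "http://example.org/sns/geo/"          # hypothetical SNS geo-thesaurus namespace
CHRON = "http://example.org/sns/chronicle/"  # hypothetical environmental chronicle namespace


def record_triples(record_id, substance, species, location, event):
    """Return (subject, predicate, object) URI triples for one record."""
    subject = ESB + "record/" + record_id
    return [
        (subject, ESB + "substance", GSBL + substance),  # what was measured
        (subject, ESB + "species", EUNIS + species),     # in which organism
        (subject, ESB + "location", GEO + location),     # where it was sampled
        (subject, ESB + "time", CHRON + event),          # when it was sampled
    ]


def to_ntriples(triples):
    """Serialize URI-only triples as N-Triples, one statement per line."""
    return "\n".join(f"<{s}> <{p}> <{o}> ." for s, p, o in triples)


print(to_ntriples(record_triples("4711", "cadmium", "species-123",
                                 "site-42", "sampling-1990")))
```

Because the record holds only URIs for the substance, species, location and time, the descriptions of those things stay in the systems responsible for them.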

Each record in the UPB can link directly to the information in those specialized systems. Ideally, those systems provide a back-reference, enabling two-way navigation.
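
One way to obtain such back-references is to invert the outgoing links. The minimal sketch below assumes the target system (or a shared service) has access to the ESB's link triples; the URIs are hypothetical placeholders.

```python
from collections import defaultdict

# Minimal sketch: build a back-reference index from outgoing links, so a
# target system can list the ESB records that point at one of its resources.
# All URIs are hypothetical placeholders.


def back_references(triples):
    """Map each object URI to the (subject, predicate) pairs that refer to it."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[o].append((s, p))
    return dict(index)


links = [
    ("http://example.org/esb/record/1", "http://example.org/esb/substance",
     "http://example.org/gsbl/cadmium"),
    ("http://example.org/esb/record/2", "http://example.org/esb/substance",
     "http://example.org/gsbl/cadmium"),
    ("http://example.org/esb/record/2", "http://example.org/esb/species",
     "http://example.org/eunis/species-123"),
]

# Backward navigation: from the substance to every record that mentions it.
for record, prop in back_references(links)["http://example.org/gsbl/cadmium"]:
    print(record, prop)
```

Whether the target system stores such an index or computes it on request is a design choice; either way, users can navigate from a substance back to the records that mention it.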

In addition to the information systems mentioned so far, there are numerous specialized systems operated independently of governmental agencies, e.g. Chemical Entities of Biological Interest (ChEBI) or GeoNames. Whether those should be referenced is merely a matter of policy; the technical means already exist.
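
Because linking to external sources is a policy question, one option is to keep that decision in configuration, for example as an allowlist of permitted target namespaces. The sketch below assumes that approach. The allowlist, the placeholder identifiers 123 and 456 and the example.org URIs are hypothetical, and the ChEBI and GeoNames prefixes are assumptions about those services' URI patterns.

```python
# Minimal sketch: outgoing links are published only if their target lies in a
# namespace the (HYPOTHETICAL) policy allows. Changing policy means changing
# data, not code.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"
GEONAMES = "http://sws.geonames.org/"             # assumed GeoNames URI prefix
CHEBI = "http://purl.obolibrary.org/obo/CHEBI_"   # assumed ChEBI URI prefix


def publishable(target, allowed):
    """True if the target URI starts with one of the allowed namespaces."""
    return any(target.startswith(ns) for ns in allowed)


def filter_links(triples, allowed):
    """Keep only triples whose object URI the policy permits."""
    return [t for t in triples if publishable(t[2], allowed)]


links = [
    ("http://example.org/esb/site/42", OWL_SAME_AS, GEONAMES + "123"),  # placeholder ID
    ("http://example.org/gsbl/cadmium", OWL_SAME_AS, CHEBI + "456"),    # placeholder ID
]

policy = frozenset({GEONAMES})  # hypothetical policy: GeoNames yes, ChEBI not (yet)
print(filter_links(links, policy))
```

Extending the policy to ChEBI is then a one-line configuration change, which matches the point that the obstacle is organizational rather than technical.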

RDF Models

Cross-linking requires each participating system to provide a representation of its data in the Resource Description Framework (RDF) format. On this basis, individual models (RDF schemas, or "vocabularies") are described and applied; they are roughly comparable to object-relational models but more expressive. A large number of established RDF vocabularies already exists, and these can, and should, be reused, combined and extended.
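
As a small illustration of reusing and extending an established vocabulary, the sketch below declares hypothetical esb: classes as subclasses of sosa:Sample and applies RDFS subclass reasoning (entailment rule rdfs9) to derive the extra types. The rdf:, rdfs: and sosa: IRIs are real W3C identifiers; sosa:Sample is used only as an example of an established term, not as the vocabulary the FEA chose.

```python
# Minimal sketch of RDFS-style reasoning over a small vocabulary extension.
# The esb: classes are HYPOTHETICAL; RDF_TYPE, SUBCLASS and SOSA_SAMPLE are
# real W3C IRIs, with sosa:Sample chosen only as an example of an
# established vocabulary term.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SUBCLASS = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
SOSA_SAMPLE = "http://www.w3.org/ns/sosa/Sample"
ESB = "http://example.org/esb/"


def infer_types(triples):
    """Add every rdf:type implied by rdfs:subClassOf (RDFS rule rdfs9)."""
    supers = {}
    for s, p, o in triples:
        if p == SUBCLASS:
            supers.setdefault(s, set()).add(o)
    inferred = set(triples)
    changed = True
    while changed:  # repeat until no new triple appears, following subclass chains
        changed = False
        for s, p, o in list(inferred):
            if p != RDF_TYPE:
                continue
            for sup in supers.get(o, ()):
                new = (s, RDF_TYPE, sup)
                if new not in inferred:
                    inferred.add(new)
                    changed = True
    return inferred


data = [
    (ESB + "TissueSample", SUBCLASS, ESB + "Sample"),  # local extension
    (ESB + "Sample", SUBCLASS, SOSA_SAMPLE),           # reuse of an established term
    (ESB + "sample/1", RDF_TYPE, ESB + "TissueSample"),
]

for triple in sorted(infer_types(data) - set(data)):
    print(triple)
```

The subclass statements are themselves data, so a consumer that only knows sosa:Sample can still recognize esb:sample/1 as a sample.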

Technical Architecture

It does not appear efficient for each participating information system to implement Linked Data mechanisms on its own. Instead, the FEA will implement a dedicated Linked Data server as a shared proxy that dereferences all URIs, redirects to the individual systems' HTML representations where necessary, and also provides a SPARQL endpoint.
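
The dereferencing and redirect behaviour can be sketched as a single decision function. This assumes, hypothetically, that URI path prefixes map to backend systems with known HTML base URLs, that RDF clients receive data directly, and that all other clients get a 303 See Other redirect to the owning system's HTML page. The Accept-header handling is deliberately simplified (no wildcard matching against RDF types, no case folding).

```python
# Minimal sketch of the shared proxy's decision when a URI is dereferenced.
# ASSUMPTIONS (hypothetical): URI path prefixes map to backend systems with
# HTML base URLs; RDF clients get data directly, everyone else gets a
# 303 See Other redirect to the owning system's HTML page.
BACKENDS = {
    "/esb/": "https://esb.example.org/html/",  # hypothetical
    "/sns/": "https://sns.example.org/html/",  # hypothetical
}
RDF_TYPES = {"text/turtle", "application/rdf+xml", "application/n-triples"}


def preferred(accept):
    """Return the media type with the highest q-value in an Accept header."""
    best, best_q = "", -1.0
    for part in accept.split(","):
        fields = [f.strip() for f in part.split(";")]
        q = 1.0
        for param in fields[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0  # treat a malformed q-value as "not acceptable"
        if fields[0] and q > best_q:  # on ties, the earlier entry wins
            best, best_q = fields[0], q
    return best


def dereference(path, accept):
    """Return (status, headers); the RDF body itself would come from the store."""
    for prefix, html_base in BACKENDS.items():
        if path.startswith(prefix):
            media = preferred(accept)
            if media in RDF_TYPES:
                return 200, {"Content-Type": media}
            return 303, {"Location": html_base + path[len(prefix):]}
    return 404, {}


print(dereference("/esb/record/4711", "text/html,application/xhtml+xml;q=0.9"))
print(dereference("/esb/record/4711", "text/turtle"))
```

Centralizing this logic means the participating systems never need to implement content negotiation or redirects themselves.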

Each participating system then only needs to provide an RDF representation of its own data sets and notify the Linked Data server of any modifications.
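
A minimal sketch of this division of labour, assuming the server keeps one named graph per participating system and that each notification carries the system's complete, current RDF export. LinkedDataStore, notify_modified and match are hypothetical names, not an FEA API; match is a plain triple-pattern lookup standing in for what the SPARQL endpoint would answer.

```python
# Minimal sketch: participating systems push their RDF on change; the shared
# server replaces that system's named graph wholesale. All names HYPOTHETICAL.
Triple = tuple[str, str, str]


class LinkedDataStore:
    def __init__(self) -> None:
        self.graphs: dict[str, set[Triple]] = {}  # one named graph per system

    def notify_modified(self, system: str, triples: set[Triple]) -> None:
        """Replace the system's graph with its current RDF export."""
        self.graphs[system] = set(triples)

    def match(self, s=None, p=None, o=None) -> list[Triple]:
        """Triple-pattern lookup across all graphs; None acts as a wildcard."""
        return sorted({
            t for graph in self.graphs.values() for t in graph
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        })


store = LinkedDataStore()
store.notify_modified("esb", {("esb:r1", "esb:substance", "gsbl:cadmium")})
store.notify_modified("gsbl", {("gsbl:cadmium", "rdfs:label", "Cadmium")})
print(store.match(s="gsbl:cadmium"))
```

Replacing a system's whole graph on every notification keeps the server consistent with minimal coordination; for large data sets, sending only the changed triples would reduce traffic at the cost of more bookkeeping.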

On this basis, further visualization services can be implemented, such as those being tested in the US government's Data-gov project.

innoQ's Contribution