Thrift definitions, making HLT data specifications concrete

hltcoe/concrete

Copyright 2012-2023 Johns Hopkins University HLTCOE. All rights reserved. This software is released under the 2-clause BSD license. See LICENSE in the project root directory.

Concrete

Current version: 4.18

Please consult NEWS.md for more information about changes between versions.

Introduction

Concrete is an attempt to map out various NLP data types in a Thrift schema for use in projects across Johns Hopkins University. This standardized schema gives researchers a common underlying data model for all NLP tasks, which facilitates integration between projects.

Browsable Schema Documentation

This repository contains HTML documentation for the Concrete schema. The documentation content is generated from the .thrift schema files. It contains exactly the same content as the schema text files, but the HTML format makes it easier to browse and to explore relations between different Concrete data structures.

To view the HTML documentation, open the file:

concrete/docs/schema/index.html

in your favorite web browser.

Documentation Webserver

The repository comes with an (optional) simple Bottle-based Python web server for hosting the documentation. You can install Bottle using pip:

pip install bottle

and then start the web server with the command:

python concrete_docs_server.py [--port PORT_NUMBER]

This command starts a web server on your machine. If --port is omitted, the server listens on the default port, 8097.

Point your browser to http://localhost:8097 to view the documentation (substitute your own port number if you passed --port).
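
If Bottle is not available, the same static HTML files can in principle be served with Python's standard library alone. The following is a minimal sketch and is not the repository's concrete_docs_server.py; the function name make_server and the docs/schema path (relative to the repository root) are assumptions made for illustration.

```python
import functools
import http.server


def make_server(root, port=8097, host="127.0.0.1"):
    """Create (but do not start) an HTTP server for the static files in `root`.

    `root` is assumed to be the directory that contains index.html,
    for example docs/schema inside a checkout of this repository.
    """
    # Bind the handler to `root` so the current working directory is irrelevant.
    handler = functools.partial(
        http.server.SimpleHTTPRequestHandler, directory=root
    )
    return http.server.ThreadingHTTPServer((host, port), handler)
```

Calling make_server("docs/schema").serve_forever() from the repository root would then serve the documentation on port 8097 until interrupted.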

Regenerating Documentation

If you do not have write access to this repository, then you can safely ignore this section.

The HTML documentation is a modified version of the documentation generated by the Thrift compiler. To regenerate it, you will need both the Thrift compiler and the Python library beautifulsoup4. Regenerate the documentation by running the regenerate_docs.sh script, passing the path to the Thrift compiler as its argument:

cd docs
./regenerate_docs.sh path_to_thrift_compiler

This script will call thrift --gen html to generate HTML files for each .thrift file, and then copy modified versions of each HTML file to the schema/ directory. Not all files in the schema/ directory are auto-generated.
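
The first step described above, one thrift --gen html call per .thrift file, might look roughly like the sketch below. This is not the content of regenerate_docs.sh, and it does not cover the later modification and copying step. The function names and the thrift_dir and out_dir parameters are hypothetical; only the --gen html and -o compiler options are taken from the Thrift command line.

```python
import subprocess
from pathlib import Path


def build_thrift_commands(thrift_bin, thrift_dir, out_dir):
    """Return one `thrift --gen html` argument list per .thrift file in thrift_dir."""
    commands = []
    for schema in sorted(Path(thrift_dir).glob("*.thrift")):
        # -o names the directory under which the compiler creates gen-html/.
        commands.append(
            [str(thrift_bin), "--gen", "html", "-o", str(out_dir), str(schema)]
        )
    return commands


def generate_html(thrift_bin, thrift_dir, out_dir):
    """Run every command; raises CalledProcessError if the compiler fails."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for command in build_thrift_commands(thrift_bin, thrift_dir, out_dir):
        subprocess.run(command, check=True)
```

Keeping command construction separate from execution makes it possible to inspect the planned compiler invocations without having the Thrift compiler installed.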