Manuel A. Vázquez edited this page Jul 6, 2021 · 8 revisions

Welcome to the nihmporter wiki!

Installation

You can use make_conda_environment.sh to build a suitable Anaconda environment (named nih by default).

Usage

Activate the above environment and run

# after activating the appropriate conda environment
./import.py

It should produce a set of feather/pickle files, each one storing a Pandas dataframe. Note that, as of July 2021, huge feather files cause memory issues.

The script also produces several CSV files that subset the above feather/pickle files, keeping only the information needed by connectivity_stats.py.

Some (WIP) information about the output is provided here.

Extra utilities

connectivity_stats.py computes some statistics (such as the number of projects that are not associated with any publication) from the CSV data generated by the main program. Hence, connectivity_stats.py can only be run after import.py.
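
To illustrate the kind of statistic mentioned above, here is a minimal sketch that counts projects with no associated publication. It is not the code in connectivity_stats.py: the column names (project_id, publication_id) and the layout of the two CSV inputs are assumptions made for the example, and the real files produced by import.py may be organized differently.

```python
import csv
import io
from typing import TextIO


def count_unlinked_projects(projects_csv: TextIO, links_csv: TextIO,
                            key: str = "project_id") -> int:
    """Count projects whose identifier never appears in the links table.

    Both arguments are open text files (or file-like objects) holding CSV
    data with a header row. The `key` column name is hypothetical.
    """
    # Every project identifier that is linked to at least one publication
    linked = {row[key] for row in csv.DictReader(links_csv)}
    # Every known project identifier
    projects = {row[key] for row in csv.DictReader(projects_csv)}
    return len(projects - linked)


# Tiny in-memory example: P2 has no publication, so the count is 1
projects = io.StringIO("project_id\nP1\nP2\nP3\n")
links = io.StringIO("project_id,publication_id\nP1,100\nP1,101\nP3,102\n")
print(count_unlinked_projects(projects, links))  # prints 1
```

Reading only the identifier columns into sets keeps memory use low compared to loading whole dataframes, which may matter given the file sizes involved.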

Update

  • Projects and Abstracts are updated every week through new zip files

  • Publications and Links are updated on a yearly basis through new zip files

  • Patents consists of a single file with all the patents, which is updated with no fixed schedule

    • This means efficient updates are more delicate and some extra logic is required if one doesn't want to always delete the old file.
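
One possible form for that extra logic, shown only as a sketch, is to keep a checksum of the last imported Patents file and re-import only when a freshly downloaded copy differs. The function names and the checksum file are hypothetical and not part of nihmporter. The approach still requires downloading the new file before comparing.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def needs_reimport(new_file: Path, checksum_file: Path) -> bool:
    """Tell whether new_file differs from the last imported version.

    checksum_file (a hypothetical bookkeeping file) stores the digest of
    the last imported file; it is rewritten whenever a change is detected.
    """
    new_sum = sha256_of(new_file)
    old_sum = checksum_file.read_text().strip() if checksum_file.exists() else None
    if new_sum == old_sum:
        return False  # same content as last time: keep the old data
    checksum_file.write_text(new_sum + "\n")
    return True
```

On the first call no checksum exists yet, so the function reports that an import is needed and records the digest for later comparisons.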

Data Documentation

Some useful links:

Some acronyms that are mentioned above:
