Home
Welcome to the nihmporter wiki!
You can use make_conda_environment.sh to build a proper Anaconda environment (by default, named nih).
Activate that environment and run:
# after activating the appropriate conda environment
./import.py
This should produce a number of feather/pickle files, each one storing a pandas DataFrame (as of July 2021, huge feather files cause memory issues).
The script also produces a set of CSV files that subset the above feather/pickle files, keeping only the information needed by connectivity_stats.py.
Some (WIP) information about the output is provided here.
connectivity_stats.py computes some statistics (such as "number of projects that are not associated with any publication") from the CSV data generated by the main program. Hence, connectivity_stats.py can only be run after import.py.
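To illustrate the kind of statistic quoted above, here is a minimal sketch that counts projects with no linked publication, using only the standard library. The column names (`project_id`, `pmid`) and the inline sample data are hypothetical. The CSV files actually written by import.py may be organized differently, and this is not the code used in connectivity_stats.py.

```python
import csv
import io

# Hypothetical stand-ins for two of the CSV files written by import.py;
# the real file names and column names may differ.
PROJECTS_CSV = """project_id
P1
P2
P3
"""
LINKS_CSV = """pmid,project_id
100,P1
101,P1
102,P3
"""


def read_column(csv_text, column):
    """Return the set of distinct values found in one column of a CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[column] for row in reader}


def projects_without_publications(projects_csv, links_csv):
    """Return the projects that never appear in the publication links."""
    projects = read_column(projects_csv, "project_id")
    linked = read_column(links_csv, "project_id")
    return projects - linked


orphans = projects_without_publications(PROJECTS_CSV, LINKS_CSV)
print(len(orphans), sorted(orphans))
```

With real data, the two strings would be replaced by the contents of the corresponding CSV files. A set difference is enough here because only the presence or absence of a link matters, not how many links a project has.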
- Projects and Abstracts are updated every week by making new zip files available.
- Publications and Links are updated on a yearly basis by making new zip files available.
- Patents consists of a single file containing all the patents, which is updated on no fixed schedule.
  - This means efficient updates are more delicate, and some extra logic is required if one doesn't want to always delete the old file.
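One possible form of that extra logic for the single Patents file is to compare a checksum of a freshly downloaded copy against the stored one, and to replace the stored file only when they differ. The sketch below rests on assumptions: the function names and file paths are hypothetical, it is not what import.py does, and the file still has to be downloaded before it can be compared.

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def replace_if_changed(downloaded, stored):
    """Replace `stored` with `downloaded` only when the contents differ.

    Returns True when the stored copy was created or replaced.
    """
    downloaded, stored = Path(downloaded), Path(stored)
    if stored.exists() and sha256_of(stored) == sha256_of(downloaded):
        return False  # same content: keep the old file and skip reprocessing
    shutil.copyfile(downloaded, stored)
    return True
```

A caller could use the return value to decide whether the downstream processing of the patents data needs to be rerun at all.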
Some useful links:
Some acronyms that are mentioned above:
- DUNS: Data Universal Numbering System, used to identify organizations