Python scripts and utilities for building DBNascent in MySQL

Dowell-Lab/DBNascent-build

DBNascent_build

This repository is intended for building, updating, and querying DBNascent, a MySQL database cataloguing all nascent sequencing experiments in the SRA (Sequence Read Archive) through 2020. The database has been built and maintained by the DnA Lab at the University of Colorado Boulder.

The database draws on manually curated metadata tables, quality control data, and bidirectional call data from samples. All of this data is stored on the Fiji cluster at CU Boulder.

Version 1.2

Version notes (12/20/2023):

  • The database has been somewhat restructured.
  • All tables have been renamed but hold the same fields. The old and new names correspond as follows (linkIDs and searchEquiv are unchanged):
    Old table              New table
    ---------------------  ---------------
    sampleAccum            samples
    exptMetadata           papers
    sampleID               sampleEquiv
    geneticInfo            genetics
    organismInfo           organisms
    tissueDetails          tissues
    bidirSummary           bidirs
    conditionInfo          conditions
    sampleCondition        conditionLink
    nascentflowMetadata    nascentflowRuns
    sampleNascentflow      nascentflowLink
    bidirflowMetadata      bidirflowRuns
    sampleBidirflow        bidirflowLink
  • A few fields have been renamed. The primary key of every table is now simply id rather than a table-specific name, and a field that links to another table's id is named <linkedTable>_id (see the fields and linkages in the schema). This helps Django navigate the database. The other renamed fields are as follows:
    Old field              New field
    ---------------------  ----------------
    paper_id               paper_name
    samp_qc_score          sample_qc_score
    samp_data_score        sample_nro_score
    paper_data_score       paper_nro_score
  • All non-integer identifier table linkages have been removed: paper_name and sample_name are no longer in linkIDs, and organisms is linked to the papers and genetics tables by a numeric id instead of the organism name. The sampleEquiv linkage to the samples table is likewise numeric.
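
For anyone carrying queries over from the older layout, the renames listed above can be applied mechanically. The following is a minimal sketch, not part of this repository: the helper name update_identifiers is hypothetical, and it only handles the table and field renames from the two lists above. Primary keys that became id and the new <linkedTable>_id linkages still need to be updated by hand against the schema.

```python
import re

# Old -> new table names, copied from the version notes above.
TABLE_RENAMES = {
    "sampleAccum": "samples",
    "exptMetadata": "papers",
    "sampleID": "sampleEquiv",
    "geneticInfo": "genetics",
    "organismInfo": "organisms",
    "tissueDetails": "tissues",
    "bidirSummary": "bidirs",
    "conditionInfo": "conditions",
    "sampleCondition": "conditionLink",
    "nascentflowMetadata": "nascentflowRuns",
    "sampleNascentflow": "nascentflowLink",
    "bidirflowMetadata": "bidirflowRuns",
    "sampleBidirflow": "bidirflowLink",
}

# Old -> new field names, copied from the version notes above.
FIELD_RENAMES = {
    "paper_id": "paper_name",
    "samp_qc_score": "sample_qc_score",
    "samp_data_score": "sample_nro_score",
    "paper_data_score": "paper_nro_score",
}


def update_identifiers(query):
    """Replace old table and field names in a query string (hypothetical helper).

    Matching is case-sensitive and limited to whole identifiers, so
    unchanged names such as linkIDs and searchEquiv pass through untouched.
    """
    renames = {**TABLE_RENAMES, **FIELD_RENAMES}
    # Try longer names first so no shorter name can shadow a longer one.
    names = sorted(renames, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(n) for n in names) + r")\b")
    # A single pass means already-replaced text is never rewritten again.
    return pattern.sub(lambda match: renames[match.group(1)], query)


if __name__ == "__main__":
    print(update_identifiers("SELECT samp_qc_score FROM sampleAccum"))
```

Because the substitution is purely textual, it is worth reading the rewritten query before running it.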

Dependencies

The database was built with Python 3.6.3. The following packages are required for both building and querying:

configparser v5.2.0 or higher
numpy v1.19.2 or higher
yaml (PyYAML) v5.4.1 or higher
pymysql v1.0.2 or higher (a different MySQL driver may be substituted)
sqlalchemy v1.4.31 or higher
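
The minimum versions above can be checked before a build or query run. This is a rough sketch rather than a script shipped with the repository. It assumes the packages were installed under the distribution names configparser, numpy, PyYAML, PyMySQL, and SQLAlchemy, and it uses importlib.metadata, which requires Python 3.8 or later (newer than the 3.6.3 the database was built with). The function names are hypothetical.

```python
from importlib import metadata  # standard library from Python 3.8 onward

# Minimum versions from the list above. The keys are assumed distribution
# names (for example, the yaml module is distributed as PyYAML).
MINIMUM_VERSIONS = {
    "configparser": "5.2.0",
    "numpy": "1.19.2",
    "PyYAML": "5.4.1",
    "PyMySQL": "1.0.2",
    "SQLAlchemy": "1.4.31",
}


def version_tuple(version):
    """Turn '1.19.2' into (1, 19, 2), dropping suffixes such as 'rc1'."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_dependencies(minimums=None):
    """Return {name: problem} for every missing or outdated package."""
    if minimums is None:
        minimums = MINIMUM_VERSIONS
    problems = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if version_tuple(installed) < version_tuple(minimum):
            problems[name] = f"found {installed}, need {minimum} or higher"
    return problems


if __name__ == "__main__":
    for name, problem in check_dependencies().items():
        print(f"{name}: {problem}")
```

An empty result means every listed package is present at or above its minimum version.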

Database schema

[Figure: DBNascent database schema, generated with https://github.com/sqlalchemy/sqlalchemy/wiki/SchemaDisplay]

Usage

All database objects and functions are defined in dborm.py and dbutils.py.

Building and maintaining DBNascent:

To integrate seamlessly with the Django website that queries this database, the tables should initially be created through a Django migration within the website repository on GitLab. The schemas specified for Django are the same as those specified here, apart from a few additional tables that Django generates, so the database can be created from this repository alone if necessary.

config_build.py defines file paths and fields outside of and within the database. Adding a field to a metadata table requires adding it to the config_build.py file as well.

organisms.txt, sample_cell_types.txt, and searcheq.txt are manually curated tables defining organisms, tissues, and unique values within the database. Adding data may require adding additional lines to these files.

The main scripts for building the database are db_global_add_update.py and db_paper_add_update.py, combined in the db_build_full.sbatch script.

Querying DBNascent:

query_printout.py queries the database using defined fields and filtering specifications and prints the results for input into DESeq2 or other applications. The script relies on the config_query.txt config file as well as dborm.py and dbutils.py. A query too complex for this approach may need to be written manually in MySQL; manual_query_printout.py passes such a query to the database and prints the results.

Both config files point to a credentials file containing your username and password for accessing the database. The file should consist of a single tab-delimited line with two columns: <username><tab><password>
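
As an illustration of that file format, the sketch below reads a credentials file and builds a SQLAlchemy-style connection URL for the pymysql driver. It is not the code used by dborm.py or dbutils.py. The function names, the host, and the database name are placeholders chosen for the example.

```python
from urllib.parse import quote


def read_credentials(path):
    """Read <username><tab><password> from a one-line credentials file."""
    with open(path, encoding="utf-8") as handle:
        line = handle.readline().rstrip("\r\n")
    fields = line.split("\t")
    if len(fields) != 2 or not all(fields):
        raise ValueError("expected one line of the form <username><tab><password>")
    return fields[0], fields[1]


def build_mysql_url(username, password, host, database):
    """Build a mysql+pymysql URL, percent-encoding the credentials.

    Encoding keeps characters such as '@' or '/' in a password from
    being misread as URL delimiters.
    """
    user = quote(username, safe="")
    secret = quote(password, safe="")
    return f"mysql+pymysql://{user}:{secret}@{host}/{database}"


if __name__ == "__main__":
    # "credentials.txt", "localhost", and "dbnascent" are placeholder values.
    username, password = read_credentials("credentials.txt")
    print(build_mysql_url(username, password, "localhost", "dbnascent"))
```

The resulting string is in the form accepted by sqlalchemy.create_engine. Since the credentials file holds a plain-text password, it is best kept out of version control and readable only by its owner.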
