Machine Learning using the National Comorbidity Survey

The goal of this project was to use machine learning to predict suicidal behavior from a subset of a dataset of survey responses. Variables were selected by consulting recent psychological literature to identify those that have been shown to be associated with suicidal behavior.

The original data are accessible from the ICPSR website.

Citation:

Kessler, Ronald C. National Comorbidity Survey: Baseline (NCS-1), 1990-1992. Ann Arbor, MI: Inter-university Consortium for Political and Social Research (distributor), 2008-09-12. [https://doi.org/10.3886/ICPSR06693.v6]

Summary data regarding suicidal behavior were gathered from the National Center for Health Statistics, Centers for Disease Control and Prevention.

scrape_vars.ipynb is the Jupyter notebook for scraping the variables from the DS1 code book. In addition to creating a MongoDB database, the notebook creates three temporary files: variables.csv, titles.csv, and variables.json. The last file contains the same variable maps that are stored in MongoDB. scrape4.ipynb performs the same scraping for the DS2 code book.

sort_variables.ipynb is the Jupyter notebook that attempts to classify each variable based on the results of the scraping by looking at the values of each label found. It then attempts to convert the entire dataset.

There are various .json files that include temporary workfiles for sorting and scraping variables. Saving the intermediate work allows one to start over at a point other than the beginning.
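The idea of saving intermediate work can be sketched as below. This is a minimal illustration, not the code in the notebooks: the helper name `load_or_compute`, the file name, and the example payload are all hypothetical.

```python
import json
import os
import tempfile


def load_or_compute(path, compute):
    """Return the JSON stored at `path`, or run `compute()` and cache its result there.

    Hypothetical helper: the notebooks may organize their work files differently.
    """
    if os.path.exists(path):
        # A previous run already produced this step's output; reuse it.
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    result = compute()
    with open(path, "w", encoding="utf-8") as f:
        json.dump(result, f, indent=2)
    return result


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        # "sorted_variables.json" is an invented file name for illustration.
        path = os.path.join(workdir, "sorted_variables.json")
        first = load_or_compute(path, lambda: {"V1": "categorical"})
        # The second call reads the file instead of recomputing.
        second = load_or_compute(path, lambda: {"V1": "recomputed"})
        print(first == second)
```

Deleting a given work file forces that step, and only that step, to be recomputed on the next run.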

r-joined.csv is the joined .csv file of the DS1 and DS2 data sets.

coding-variables.ipynb is the Jupyter notebook for converting each variable's integer-coded values to their descriptive values.
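As a rough sketch of what this conversion involves, assume the variable maps are a dictionary keyed by variable name, with each entry mapping a code (stored as a string, as JSON keys are) to its label. The variable names, codes, and labels below are invented for illustration and are not taken from the code book.

```python
def decode_row(row, variable_maps):
    """Replace integer codes in `row` with descriptive labels where a map exists.

    Values without a matching variable or code are passed through unchanged.
    """
    decoded = {}
    for name, value in row.items():
        labels = variable_maps.get(name, {})
        # JSON object keys are strings, so look the code up by its string form.
        decoded[name] = labels.get(str(value), value)
    return decoded


if __name__ == "__main__":
    # Hypothetical variable maps; the real ones come from the scraped code books.
    variable_maps = {
        "V_EXAMPLE_A": {"1": "Yes", "5": "No"},
        "V_EXAMPLE_B": {"1": "Male", "2": "Female"},
    }
    row = {"V_EXAMPLE_A": 5, "V_EXAMPLE_B": 2, "AGE": 34}
    print(decode_row(row, variable_maps))
```

Passing unmapped values through unchanged keeps continuous variables, such as an age, intact.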

reduced-data2.csv is the subset of the data used for fitting our models.

morefitting.ipynb is the Jupyter notebook with the Lasso Logistic Regression. It demonstrates adjusting the classification threshold to achieve the desired recall. The code to plot the L1 regularization paths comes from the example "L1 Regularized Path for Logistic Regression"; all credit is due to its author, Alexandre Gramfort (alexandre.gramfort@inria.fr).

The file also includes several other fits, including Decision Trees, Random Forests, Boosting, Support Vector Classifiers, and Neural Nets. While the last method was not explored in detail, most methods struggle with recall in this problem.
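The threshold adjustment mentioned for morefitting.ipynb can be illustrated with a small, dependency-free sketch. It assumes a fitted classifier has already produced a predicted probability for each case; the function names and the toy labels and scores are hypothetical, and the notebook's own approach may differ.

```python
import math


def threshold_for_recall(y_true, scores, target_recall):
    """Return the highest threshold whose recall is at least `target_recall`.

    A case is predicted positive when its score is >= the threshold.
    """
    pos_scores = sorted((s for y, s in zip(y_true, scores) if y == 1), reverse=True)
    if not pos_scores:
        raise ValueError("y_true contains no positive cases")
    # Number of true positives that must be captured to reach the target.
    k = max(1, math.ceil(target_recall * len(pos_scores)))
    return pos_scores[k - 1]


def recall_and_precision(y_true, scores, threshold):
    """Compute (recall, precision) for predictions made at `threshold`."""
    tp = sum(1 for y, s in zip(y_true, scores) if y == 1 and s >= threshold)
    fn = sum(1 for y, s in zip(y_true, scores) if y == 1 and s < threshold)
    fp = sum(1 for y, s in zip(y_true, scores) if y == 0 and s >= threshold)
    recall = tp / (tp + fn) if tp + fn else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    y_true = [1, 0, 1, 0, 0, 1, 0, 1]
    scores = [0.9, 0.8, 0.6, 0.4, 0.35, 0.3, 0.2, 0.1]

    print(recall_and_precision(y_true, scores, 0.5))  # default cut-off
    t = threshold_for_recall(y_true, scores, 0.75)
    print(t, recall_and_precision(y_true, scores, t))
```

On the toy data, lowering the threshold from 0.5 to 0.3 raises recall from 0.5 to 0.75 while precision falls from about 0.67 to 0.5, which is the trade-off that threshold tuning accepts. In practice the threshold should be chosen on held-out data rather than on the data used for fitting.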
