Course Project for CSE-343 (Machine Learning) - Monsoon 2023

arnav10goel/CSE343-ML-Project

CSE343-ML-Project - Suicide Ideation Prediction from Social Media Conversations

Project Overview

Amid growing concern over mental health and rising suicide rates, this project aims to detect suicide ideation by analyzing social media conversations. We trained and compared several machine learning models to identify individuals at heightened risk based on their posts. The project also includes a real-world application: a Reddit Bot that flags posts showing potential suicide ideation.

Team Members

Introduction

The project addresses the need for effective suicide prevention strategies by using social media as a platform for early detection of suicide ideation. Suicide rates rose by 36% from 2000 to 2021, which underlines the need for timely intervention. Our predictive model aims to support such intervention by identifying at-risk individuals through their digital footprints. Because social media plays a central role in modern communication, the system works on Reddit posts: we use a dataset drawn from the r/SuicideWatch subreddit and apply machine learning algorithms to identify early signs of suicidal thoughts.

Dataset and Preprocessing

We used the University of Maryland Reddit Suicidality Dataset and preprocessed the text before analysis. Preprocessing consisted of removing non-ASCII characters, URLs, usernames, punctuation, and stopwords, and lowercasing the text for standardization.

[Figure: word cloud for non-suicide posts]

[Figure: word cloud for suicide posts]

Methodology

Our methodology encompasses a diverse range of machine learning models, including Logistic Regression, SVM, Naive Bayes, Decision Trees, and Random Forest, among others. We also explored ensemble methods and neural networks for enhanced predictive performance. Evaluation metrics such as accuracy, precision, and recall were employed to assess model effectiveness.
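For reference, the three evaluation metrics named above can be computed from the confusion counts of a binary classifier. The sketch below is a plain-Python illustration; the function name `binary_metrics` and the convention that label `1` marks the positive (suicide ideation) class are assumptions for the example, not details taken from the project.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, and recall for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    total = tp + fp + fn + tn
    return {
        # Fraction of all predictions that are correct.
        "accuracy": (tp + tn) / total if total else 0.0,
        # Of the posts flagged positive, the fraction that truly are positive.
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        # Of the truly positive posts, the fraction that were flagged.
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
    }


if __name__ == "__main__":
    print(binary_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Recall deserves particular attention in this setting, because a false negative corresponds to an at-risk post that goes unflagged.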

Results and Analysis

Our findings indicate that models like LDA, Logistic Regression, and the SVM classifier perform best, with notable improvements using Word2Vec embeddings. Ensemble methods and a Multilayer Perceptron (MLP) classifier also showed promising results, demonstrating the efficacy of our approach in detecting suicide ideation with high accuracy.
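One common way to feed Word2Vec embeddings to classifiers such as Logistic Regression or an SVM is to average the word vectors of a post into a single fixed-length feature vector. The README does not state how post-level features were built, so the sketch below is an assumption for illustration only; `average_embedding` and the toy two-dimensional vectors are hypothetical stand-ins for a trained embedding table.

```python
def average_embedding(tokens, vectors, dim):
    """Mean-pool the vectors of known tokens; return zeros if none are known."""
    total = [0.0] * dim
    count = 0
    for tok in tokens:
        vec = vectors.get(tok)
        if vec is None:
            continue  # skip out-of-vocabulary tokens
        for i, value in enumerate(vec):
            total[i] += value
        count += 1
    if count == 0:
        return total
    return [value / count for value in total]


if __name__ == "__main__":
    # Toy embedding table; real Word2Vec vectors typically have 100+ dimensions.
    toy_vectors = {"feel": [1.0, 2.0], "tired": [3.0, 4.0]}
    print(average_embedding(["feel", "tired", "unknownword"], toy_vectors, dim=2))
```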

[Figure: results for the machine learning models]

[Figure: results for the ensemble method and the MLP classifier]

Model Deployment

Reddit Bot Demo: YouTube Link

The project culminates in a deployed Reddit Bot that uses our most effective machine learning model to scan Reddit posts and flag those showing signs of suicide ideation. The bot aims to bridge the gap between at-risk individuals and timely mental health support.
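The scan-and-flag step can be pictured as a small loop over incoming posts. The sketch below is hypothetical and deliberately leaves out the Reddit API: `Post`, `flag_posts`, the `score_fn` callback (standing in for the trained model's risk score), and the 0.5 threshold are all assumptions for illustration, not the bot's actual implementation.

```python
from typing import Callable, Iterable, List, NamedTuple


class Post(NamedTuple):
    """Minimal stand-in for a Reddit post (hypothetical structure)."""
    post_id: str
    text: str


def flag_posts(
    posts: Iterable[Post],
    score_fn: Callable[[str], float],
    threshold: float = 0.5,
) -> List[str]:
    """Return the IDs of posts whose risk score meets the threshold."""
    flagged = []
    for post in posts:
        if score_fn(post.text) >= threshold:
            flagged.append(post.post_id)
    return flagged


if __name__ == "__main__":
    # Stub scorer used only to exercise the loop; a real bot would call the model.
    def stub_score(text: str) -> float:
        return 0.9 if "hopeless" in text.lower() else 0.1

    sample = [Post("p1", "I feel hopeless lately"), Post("p2", "Great game last night")]
    print(flag_posts(sample, stub_score))
```

Passing the scorer in as a callback keeps the flagging logic independent of any particular model, and the threshold can be lowered to favor recall over precision.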