This project is to analyze a dataset and build predictive models that can provide insights to the Human Resources (HR) department of a large consulting firm.

JilsyXavier/Employee-Retention---Random-Forest-Classifier

Employee-Retention---Random-Forest-Classifier

Employee Retention: Providing data-driven suggestions for HR

Background: Salifort Motors

Salifort’s senior leadership team is concerned about how many employees are leaving the company. Salifort strives to create a corporate culture that supports employee success and professional development. Further, the high turnover rate is costly in the financial sense: Salifort makes a big investment in recruiting, training, and upskilling its employees. As a first step, the leadership team asks Human Resources to survey a sample of employees to learn more about what might be driving turnover. The dataset used in this project contains 15,000 rows and 10 columns (one per variable) and is available on Kaggle. We used the PACE workflow (Plan, Analyse, Construct, Execute) to structure the analysis and modeling.

Process

Plan Stage

Step 1. Imports

Import packages. Load the dataset.

Step 2. Data Exploration (Initial EDA and data cleaning)

Understand your variables. Clean the dataset (missing data, redundant data, outliers).
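The cleaning checks named above can be sketched with pandas. This is a minimal illustration on a hypothetical toy table, not the project's notebook: the column names (`satisfaction_level`, `tenure`, `left`) and the 1.5 × IQR outlier rule are assumptions made for the example.

```python
import pandas as pd

# Hypothetical toy data standing in for the HR survey; column names are assumed.
df = pd.DataFrame({
    "satisfaction_level": [0.38, 0.80, None, 0.72, 0.38, 0.41],
    "tenure": [3, 6, 4, 5, 3, 25],
    "left": [1, 0, 0, 0, 1, 0],
})

# Missing data: count nulls per column, then drop incomplete rows.
missing_per_column = df.isna().sum()
df = df.dropna()

# Redundant data: drop exact duplicate rows, keeping the first occurrence.
df = df.drop_duplicates(keep="first")

# Outliers: flag values outside 1.5 * IQR beyond the quartiles
# (a common rule of thumb, assumed here for illustration).
q1 = df["tenure"].quantile(0.25)
q3 = df["tenure"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
outliers = df[(df["tenure"] < lower) | (df["tenure"] > upper)]

print(missing_per_column)
print(outliers)
```

Whether flagged outliers should be dropped depends on the model: tree-based models tend to tolerate them, while logistic regression is more sensitive.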

Analyse Stage

Step 2. Data Exploration (continued)

EDA Continuation and Visualisation
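As a rough sketch of this stage, the summaries below are the kind of tables that usually sit behind EDA plots. The data is a hypothetical toy sample and the column names (`department`, `satisfaction_level`, `left`) are assumptions, not taken from the project's dataset.

```python
import pandas as pd

# Hypothetical toy sample; column names are assumed for illustration.
df = pd.DataFrame({
    "department": ["sales", "sales", "it", "it", "hr", "hr"],
    "satisfaction_level": [0.30, 0.70, 0.80, 0.60, 0.20, 0.90],
    "left": [1, 0, 0, 0, 1, 0],
})

# Class balance of the outcome variable (share of stayers vs. leavers).
class_balance = df["left"].value_counts(normalize=True)

# Turnover rate per department: the mean of a 0/1 column is a proportion.
turnover_by_department = (
    df.groupby("department")["left"].mean().sort_values(ascending=False)
)

# Average satisfaction for employees who stayed (0) and who left (1).
satisfaction_by_outcome = df.groupby("left")["satisfaction_level"].mean()

print(class_balance)
print(turnover_by_department)
print(satisfaction_by_outcome)
```

Each of these tables could be passed to a plotting library to produce the corresponding bar chart.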

Construct Stage

Step 3. Model Building

Fit a model that predicts the outcome variable using two or more independent variables. Models used: Logistic Regression, Decision Trees, and Random Forest. Check model assumptions. Evaluate the model.
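A minimal sketch of fitting the three model families with scikit-learn follows. It uses synthetic data from `make_classification` as a stand-in for the HR dataset, and the hyperparameters (`max_depth=4`, `n_estimators=100`, the 75/25 split) are illustrative assumptions rather than the settings used in this project.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the HR data (assumption: binary outcome, 8 features).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=4, random_state=0
)

# Hold out a stratified test set so both classes keep their proportions.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Illustrative hyperparameters, not the project's tuned values.
models = {
    "logistic_regression": LogisticRegression(max_iter=1000),
    "decision_tree": DecisionTreeClassifier(max_depth=4, random_state=0),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Fit each model on the training split and record its test accuracy.
test_accuracy = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_accuracy[name] = model.score(X_test, y_test)

print(test_accuracy)
```

Keeping the models in a dictionary makes it straightforward to compare them on the same split.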

Execute Stage

Step 4. Results and Evaluation

Interpret the model. Evaluate model performance using metrics. Prepare results, visualizations, and actionable steps to share with stakeholders.
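The metrics reported in the summary (precision, recall, F1-score, accuracy, ROC AUC) can be computed with `sklearn.metrics`. The sketch below uses ten hand-made labels and scores, chosen only for illustration, and assumes a 0.5 decision threshold.

```python
from sklearn.metrics import (
    accuracy_score,
    f1_score,
    precision_score,
    recall_score,
    roc_auc_score,
)

# Hypothetical ground truth (1 = left, 0 = stayed) and predicted probabilities.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.4, 0.1, 0.2, 0.3, 0.35, 0.45, 0.6]

# Turn probabilities into class labels with an assumed 0.5 threshold.
y_pred = [int(score >= 0.5) for score in y_score]

metrics = {
    "precision": precision_score(y_true, y_pred),  # TP / (TP + FP)
    "recall": recall_score(y_true, y_pred),        # TP / (TP + FN)
    "f1": f1_score(y_true, y_pred),                # harmonic mean of the two
    "accuracy": accuracy_score(y_true, y_pred),    # share of correct labels
    "roc_auc": roc_auc_score(y_true, y_score),     # uses scores, not labels
}

for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

ROC AUC is computed from the predicted scores rather than the thresholded labels, so it does not depend on the 0.5 cut-off.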

Summary

The logistic regression model achieved a precision of 79%, recall of 82%, F1-score of 80% (all weighted averages), and accuracy of 82% on the test set.

After feature engineering, the random forest model achieved a ROC AUC score of 93.72%, precision of 95.91%, recall of 85.71%, F1-score of 90.45%, and accuracy of 88.01% on the test set. The random forest modestly outperformed the decision tree model.
