- This repository contains projects on ML classification and regression.
- Classification: the problem of identifying which of a set of categories (sub-populations) a new observation belongs to, based on a training set of observations (or instances) whose category membership is known.
- Example: predicting heart disease (yes or no).
- Regression: in statistical modeling, regression analysis is a set of statistical processes for estimating the relationships between a dependent variable (often called the "outcome variable") and one or more independent variables (often called "predictors", "covariates", or "features").
- Example: predicting a number, such as a price.
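
To make the difference concrete, here is a minimal sketch using scikit-learn on toy data. The data and the variable names are made up for illustration and do not come from any project in this repository.

```python
from sklearn.linear_model import LinearRegression, LogisticRegression

# Classification: the target is a category (0 = no, 1 = yes).
X_cls = [[0], [1], [2], [3]]
y_cls = [0, 0, 1, 1]
classifier = LogisticRegression().fit(X_cls, y_cls)
predicted_class = classifier.predict([[3]])[0]  # one of the known labels

# Regression: the target is a number (toy data follows y = 2x + 1 exactly).
X_reg = [[0], [1], [2], [3]]
y_reg = [1, 3, 5, 7]
regressor = LinearRegression().fit(X_reg, y_reg)
predicted_value = regressor.predict([[5]])[0]  # a continuous value

print("class:", predicted_class, "value:", predicted_value)
```

The classifier can only return one of the labels it saw during training, while the regressor returns a real number that may lie outside the training range.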
Here is a list of every project in this repository, with a link to each project's introduction.
- Collect Data
- Exploratory Data Analysis (with visualization)
- Pandas, Matplotlib, Seaborn
- Feature Engineering
- Handle missing values and categorical features (sklearn)
- Outliers (depending on which model is used)
- Feature scaling (depending on the model)
- Split data
- Feature selection
- Correlation matrix, VarianceThreshold
- Building model
- Selecting models
- Train and Evaluate model
- If more than one model is trained, pick the best by initial accuracy for hyperparameter tuning with RandomizedSearchCV or GridSearchCV
- Predict on test set
- Build a web app with Flask, HTML5, and CSS3, or with Streamlit
- Deploy to the cloud
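
The modelling steps above (from feature engineering to prediction on the test set) can be sketched roughly as follows. This is not the code from any notebook here: the synthetic data, the column names (`age`, `income`, `plan`, `constant`), the choice of RandomForestClassifier, and the small search space are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import VarianceThreshold
from sklearn.impute import SimpleImputer
from sklearn.model_selection import RandomizedSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in for collected data (hypothetical columns).
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "age": rng.integers(20, 70, n).astype(float),
    "income": rng.normal(50_000, 10_000, n),
    "plan": rng.choice(["basic", "plus", "pro"], n),
    "constant": 1.0,  # zero-variance column that feature selection should drop
})
df.loc[rng.choice(n, 10, replace=False), "age"] = np.nan  # some missing values
y = (df["income"] > 50_000).astype(int)  # toy binary target

# Feature engineering: impute, scale numeric columns, one-hot encode categories.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])
preprocess = ColumnTransformer(
    [("num", numeric, ["age", "income", "constant"]),
     ("cat", categorical, ["plan"])],
    sparse_threshold=0,  # always return a dense array
)

model = Pipeline([
    ("preprocess", preprocess),
    ("select", VarianceThreshold()),  # drops features with zero variance
    ("clf", RandomForestClassifier(random_state=0)),
])

# Split data, then tune hyperparameters on the training part only.
X_train, X_test, y_train, y_test = train_test_split(
    df, y, test_size=0.2, random_state=0, stratify=y
)
search = RandomizedSearchCV(
    model,
    param_distributions={
        "clf__n_estimators": [50, 100],
        "clf__max_depth": [3, 5, None],
    },
    n_iter=4,
    cv=3,
    random_state=0,
)
search.fit(X_train, y_train)

# Evaluate on the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("best params:", search.best_params_)
print("test accuracy:", test_accuracy)
```

Wrapping the preprocessing in a `Pipeline` is a design choice of this sketch: the imputer, scaler, and selector are fitted only on the training folds, so nothing leaks from the test set into the model.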
- This project predicts whether a customer will leave a telecom company (churn), using machine learning.
- We used an XGBoost classifier and got 78.64% accuracy
- with gamma=0, learning_rate=0.1, max_depth=4, reg_lambda=10, scale_pos_weight=2, subsample=0.9, colsample_bytree=0.5, found by hyperparameter tuning of XGBClassifier.
- Get source code Visit
- Predicts whether a person has heart disease based on their health records.
- We compared four models by initial accuracy: SVM, RandomForestClassifier, AdaBoost, and KNN.
- We picked the top two (RandomForest and AdaBoost) for hyperparameter tuning; AdaBoost reached 91% accuracy (accuracy_score).
- Get source code Visit
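
Comparing several models and keeping the top two could look roughly like the sketch below. It runs on synthetic data from `make_classification`, not the heart disease dataset, and it scores models with 5-fold cross-validation, which is an assumption: the notebook may have used a single train/test split instead.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in for the health records (assumption, not the real data).
X, y = make_classification(n_samples=300, n_features=13, random_state=0)

# SVM and KNN are distance based, so they get a scaler in front.
candidates = {
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "RandomForestClassifier": RandomForestClassifier(random_state=0),
    "AdaBoost": AdaBoostClassifier(random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
}

# Mean cross-validated accuracy per model.
scores = {
    name: cross_val_score(model, X, y, cv=5, scoring="accuracy").mean()
    for name, model in candidates.items()
}

# Keep the two best models for hyperparameter tuning.
top_two = sorted(scores, key=scores.get, reverse=True)[:2]
print(scores)
print("tune next:", top_two)
```

Which two models come out on top depends on the data, so the ranking on this synthetic set says nothing about the ranking reported above.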
- This project predicts whether Titanic passengers survived, based on the given data.
- In this notebook we used Random Forest, KNN, and Gradient Boosting classifiers; the top two by accuracy were GradientBoostingClassifier and RandomForestClassifier.
- After hyperparameter tuning, Gradient Boosting reached 78.73% accuracy.
- Tuned parameters for GradientBoostingClassifier: learning_rate=0.01, max_depth=4, max_features=16, min_samples_leaf=1, min_samples_split=20, n_estimators=500.
- Get source code Visit
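
As a rough illustration, the tuned parameters listed above can be passed to scikit-learn's `GradientBoostingClassifier` like this. The data here is synthetic (an assumption), generated with 20 features so that `max_features=16` is valid; the accuracy it prints is unrelated to the Titanic result.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in data; max_features=16 needs at least 16 columns.
X, y = make_classification(n_samples=400, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Tuned parameters as reported in this README.
gbc = GradientBoostingClassifier(
    learning_rate=0.01,
    max_depth=4,
    max_features=16,
    min_samples_leaf=1,
    min_samples_split=20,
    n_estimators=500,
    random_state=0,
)
gbc.fit(X_train, y_train)

accuracy = accuracy_score(y_test, gbc.predict(X_test))
print("test accuracy on synthetic data:", accuracy)
```

A low `learning_rate` paired with a large `n_estimators` is a common combination: each tree contributes little, so many trees are needed.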
- Predicts the sale price of a bulldozer from its specifications, such as year made, productSize, and 50 more features.
- We used RandomForestRegressor and got an r^2 score of 85%.
- After hyperparameter tuning, best_params were n_estimators=90, min_samples_leaf=1, min_samples_split=14, max_features=0.5, n_jobs=-1, max_samples=None.
- With these we got r^2 of 95% and 88% on the train and test sets respectively.
- Get source code Visit
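
The sketch below shows how the reported best parameters map onto `RandomForestRegressor` and how r^2 on the train and test sets can be compared. It uses synthetic data from `make_regression` (an assumption), so the scores it prints will differ from the ones above.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the bulldozer data (assumption).
X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Best parameters as reported in this README.
forest = RandomForestRegressor(
    n_estimators=90,
    min_samples_leaf=1,
    min_samples_split=14,
    max_features=0.5,
    n_jobs=-1,
    max_samples=None,
    random_state=0,
)
forest.fit(X_train, y_train)

# For regressors, .score() returns the r^2 coefficient of determination.
train_r2 = forest.score(X_train, y_train)
test_r2 = forest.score(X_test, y_test)
print("train r^2:", train_r2, "test r^2:", test_r2)
```

A gap between train and test r^2, like the 95% versus 88% reported above, is expected for tree ensembles; a much larger gap would suggest overfitting.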
- Predicts house prices from features of the house.
- We used Lasso, Random Forest, and XGBoost to get initial r^2 scores.
- We then tuned XGBoost because it had a slightly higher r^2, and got gamma=0, learning_rate=0.05, max_depth=6, reg_lambda=0, subsample=0.9, colsample_bytree=0.5.
- With these parameters we got r^2 of 98% and 85% on the train and validation sets respectively.
- Get source code Visit