This is the solution for the Synthanic competition on Kaggle.
The goal is to perform EDA and build a model for a binary classification task on a synthetic dataset based on the real Titanic dataset. The statistical properties of this dataset are similar to those of the original (and well-known) Titanic dataset.
The solution notebook contains:
- Data quality assessment and missing data imputation.
- Thorough data exploration with many plots, observations, a summary, and feature engineering.
- Modeling block where I compared three algorithms (Logistic Regression, KNN, and Random Forest) and tuned them with RandomizedSearchCV, using cross-validation, feature selection, data scaling, and encoding.
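The notebook's own code is not reproduced here. As a rough illustration of the workflow listed above, the following is a minimal sketch, assuming scikit-learn, pandas, and SciPy, of how missing-data imputation, scaling, one-hot encoding, and a RandomizedSearchCV comparison of the three algorithms under stratified 5-fold cross-validation could fit together. The toy DataFrame, column choices, and hyperparameter ranges are hypothetical placeholders, not the Synthanic data or the settings used in the notebook; feature selection is omitted for brevity.

```python
import numpy as np
import pandas as pd
from scipy.stats import loguniform, randint
from sklearn.base import clone
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV, StratifiedKFold
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with Titanic-like columns (NOT the Synthanic dataset).
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame({
    "Pclass": rng.integers(1, 4, n),
    "Sex": rng.choice(["male", "female"], n),
    "Age": rng.normal(35, 15, n),
    "Fare": rng.exponential(30, n),
    "Embarked": rng.choice(["S", "C", "Q"], n),
})
# Inject missing values so the imputation step has something to do.
df.loc[rng.random(n) < 0.10, "Age"] = np.nan
df.loc[rng.random(n) < 0.05, "Embarked"] = np.nan
# Toy binary target, loosely tied to Sex and Pclass plus random noise.
y = ((df["Sex"] == "female") & (df["Pclass"] < 3) | (rng.random(n) < 0.2)).astype(int)

numeric = ["Age", "Fare"]
categorical = ["Pclass", "Sex", "Embarked"]

# Impute + scale numeric columns; impute + one-hot encode categorical columns.
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="median")),
                      ("scale", StandardScaler())]), numeric),
    ("cat", Pipeline([("impute", SimpleImputer(strategy="most_frequent")),
                      ("onehot", OneHotEncoder(handle_unknown="ignore"))]), categorical),
])

# Each candidate model with a hypothetical search space (keys use the "clf__" step prefix).
candidates = {
    "logreg": (LogisticRegression(max_iter=1000),
               {"clf__C": loguniform(1e-3, 1e2)}),
    "knn": (KNeighborsClassifier(),
            {"clf__n_neighbors": randint(3, 31),
             "clf__weights": ["uniform", "distance"]}),
    "rf": (RandomForestClassifier(random_state=0),
           {"clf__n_estimators": randint(50, 151),
            "clf__max_depth": [None, 4, 8]}),
}

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
results = {}
for name, (model, space) in candidates.items():
    pipe = Pipeline([("prep", clone(preprocess)), ("clf", model)])
    search = RandomizedSearchCV(pipe, space, n_iter=5, cv=cv,
                                scoring="accuracy", random_state=0)
    search.fit(df, y)
    results[name] = search.best_score_  # mean CV accuracy of the best candidate
    print(f"{name}: CV accuracy={search.best_score_:.3f} params={search.best_params_}")
```

Keeping the preprocessing inside the Pipeline means the imputation statistics, scaler, and encoder are re-fit on each training fold, so information from the validation folds does not leak into the cross-validation scores.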