Udacity Arvato Identify Customer Segments

Capstone Project

Consider a company's marketing campaign (Arvato Financial Services), in which we need to select those individuals who can become the company's future customers. For this task, we have the following databases: demographic information from Germany (country where the company is located) and information from individuals who are already customers of this company.
First, the demographic information of the German population was analyzed in order to understand and explore the main characteristics of this population.
Then, we create a predictive model that can determine with reasonable accuracy whether a person can become a possible consumer of the company, when subjected to a certain marketing campaign.
Finally, we classify each possible consumer, from an unexplored test database, and submit the result on the kaggle platform.

The project is a problem for a company, with real data and with several possible approaches. It is a rich set of data and an interesting problem to be solved. Submitting work on Kaggle is a way to compare the quality of our algorithm with of other students algorithms. That's why I chose to do this specific project that motivated me to learn even more.

The following packages are necessary: numpy , datetime, pandas , matplotlib, seaborn , math, sklearn , pylab ,itertools, imblearn, pickle, xgboost

project.pdf - Report with detailed explanation of the entire project.
capstone_proposal.pdf - Report with a proposal for thus project.
util.py - python module with basically data processing and feature engineering
cluster.py -- python module with clustering methods for segmentation report
pca.py -- python module with pca methods for dimensionality reduction
Udacity_AZDIAS_052018.csv: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns);
Udacity_CUSTOMERS_052018.csv: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns);
Udacity_MAILOUT_052018_TRAIN.csv: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 (columns);
Udacity_MAILOUT_052018_TEST.csv: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 (columns);
unknown_values.csv: Mapping dictionary with attributes and the value of the unkown value

The result of this work can be found in the file final_project.pdf, as well as any details of implementation, conclusions and future work

The project is part of Udacity's machine learning nanodegree program. The data provided is not public, and belongs to Arvato and Udacity