This project looks at a marketing campaign for a company (Arvato Financial Services), in which
we need to select the individuals most likely to become the company's future customers. Two
datasets are available for this task: demographic information on the population of Germany
(the country where the company is located) and information on individuals who are already
customers of the company.
First, we analyze the demographic information of the German population in order to
understand and explore its main characteristics.
Then, we build a predictive model that determines, with reasonable accuracy, whether
a person is likely to become a customer of the company when targeted by a given
marketing campaign.
Finally, we classify each potential customer in a previously unseen test dataset and
submit the result on the Kaggle platform.
Project Motivation
The project tackles a real business problem, with real data and several possible approaches. The dataset is rich and the problem is interesting to solve. Submitting the work on Kaggle makes it possible to compare the quality of our algorithm with those of other students. That is why I chose this project, which motivated me to learn even more.
Installation
The following packages are required: numpy, datetime, pandas, matplotlib, seaborn, math, sklearn, pylab, itertools, imblearn, pickle, xgboost. Of these, datetime, math, itertools and pickle ship with the Python standard library.
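As a quick sanity check before running the project, the list above can be tested against the current environment. This is a minimal sketch, not part of the repository: the helper `missing_packages` and the `REQUIRED` list are hypothetical names, and the entries are the import names exactly as listed above.

```python
import importlib.util

# Import names copied from the list above (assumed to be the names used in
# the project's import statements).
REQUIRED = [
    "numpy", "datetime", "pandas", "matplotlib", "seaborn", "math",
    "sklearn", "pylab", "itertools", "imblearn", "pickle", "xgboost",
]


def missing_packages(names):
    """Return the names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    print("Missing packages:", ", ".join(missing))
else:
    print("All required packages are available.")
```

The check only locates each module without importing it, so it stays fast even for large libraries.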
Files
project.pdf - Report with a detailed explanation of the entire project.
capstone_proposal.pdf - Report with the proposal for this project.
util.py - Python module for data processing and feature engineering.
cluster.py - Python module with clustering methods for the segmentation report.
pca.py - Python module with PCA methods for dimensionality reduction.
Udacity_AZDIAS_052018.csv - Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns).
Udacity_CUSTOMERS_052018.csv - Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns).
Udacity_MAILOUT_052018_TRAIN.csv - Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 columns.
Udacity_MAILOUT_052018_TEST.csv - Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 columns.
unknown_values.csv - Mapping of each attribute to the value that encodes an unknown entry.
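To illustrate how a mapping such as unknown_values.csv might be used, the sketch below replaces unknown codes with NaN before further processing. It is an assumption-laden example, not the code in util.py: the column names `attribute` and `unknown`, the comma-separated code format, the feature names `FEATURE_A` to `FEATURE_C`, and both helper functions are hypothetical, and the real file layout may differ.

```python
import io

import pandas as pd

# Hypothetical stand-in for unknown_values.csv. The column names
# "attribute" and "unknown" and the comma-separated codes are assumptions.
MAPPING_CSV = 'attribute,unknown\nFEATURE_A,"-1,0"\nFEATURE_B,-1\n'


def build_unknown_map(mapping):
    """Turn the mapping table into {attribute: [codes meaning 'unknown']}."""
    return {
        attribute: [int(code) for code in str(codes).split(",")]
        for attribute, codes in zip(mapping["attribute"], mapping["unknown"])
    }


def mask_unknowns(df, unknown_map):
    """Return a copy of df with each attribute's unknown codes set to NaN."""
    cleaned = df.copy()
    for column, codes in unknown_map.items():
        if column in cleaned.columns:
            # Series.mask replaces entries where the condition holds with NaN.
            cleaned[column] = cleaned[column].mask(cleaned[column].isin(codes))
    return cleaned


# Read the codes as strings so that "-1,0" and "-1" are parsed the same way.
mapping = pd.read_csv(io.StringIO(MAPPING_CSV), dtype=str)
unknown_map = build_unknown_map(mapping)

# Toy demographic table with made-up values.
demo = pd.DataFrame({
    "FEATURE_A": [1, -1, 0, 2],
    "FEATURE_B": [1, 2, -1, 1],
    "FEATURE_C": [0, 0, 0, 0],  # not in the mapping, so left untouched
})
cleaned = mask_unknowns(demo, unknown_map)
print(cleaned.isna().sum())
```

Converting the codes to NaN up front lets later steps, such as counting missing values per feature or imputing them, treat every kind of "unknown" uniformly.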
Results and Conclusions
The results of this work, along with implementation details, conclusions and future work, can be found in the file final_project.pdf.
Licenses and Acknowledgements
The project is part of Udacity's Machine Learning Nanodegree program. The data provided is not public and belongs to Arvato and Udacity.