
Final project for the Udacity Machine Learning Engineer Nanodegree. Customer segmentation based on regional demographic and customer data, with supervised learning techniques applied to customer acquisition.


vgp314/Udacity-Arvato-Identify-Customer-Segments


Udacity Arvato Identify Customer Segments

Capstone Project

Table of Contents

  1. Problem Statement
  2. Project Motivation
  3. Installation
  4. Files
  5. Results and Conclusions
  6. Licenses and Acknowledgements

Problem Statement

  • Arvato Financial Services is planning a marketing campaign and needs to select the individuals most likely to become customers. Two data sources are available: demographic information on the general population of Germany (where the company is located) and information on individuals who are already customers of the company.
  • First, the demographic data for the German population is analyzed to explore and understand its main characteristics.
  • Then, a predictive model is built to estimate, with reasonable accuracy, whether a person targeted by a given marketing campaign is likely to become a customer.
  • Finally, each individual in an unseen test dataset is classified, and the predictions are submitted to the Kaggle platform.
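The predictive step described above can be illustrated with a minimal sketch. It is not the project's actual model: the data is synthetic, the 5% response rate is an assumed stand-in for a rare positive class, and a class-weighted logistic regression from scikit-learn is swapped in for whatever the report uses (the package list mentions imblearn and xgboost). ROC AUC is used here only as one reasonable metric for imbalanced data.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the campaign data: about 5% positive responses
# (an assumed ratio, not taken from the real dataset).
X, y = make_classification(
    n_samples=2000,
    n_features=20,
    n_informative=5,
    weights=[0.95, 0.05],
    random_state=0,
)

# Stratified split keeps the class ratio similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples to compensate for the rare class.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(class_weight="balanced", max_iter=1000),
)
model.fit(X_train, y_train)

# Estimated probability of a positive response for each held-out individual.
response_probability = model.predict_proba(X_test)[:, 1]
auc = roc_auc_score(y_test, response_probability)
print("ROC AUC:", auc)
```

Ranking individuals by `response_probability` rather than by a hard 0/1 label is a common choice when only a small fraction of the targets respond.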

Project Motivation

  • The project addresses a real business problem, with real data and several possible approaches. The dataset is rich and the problem is interesting to solve. Submitting the work to Kaggle makes it possible to compare the quality of our algorithm with those of other students. That is why I chose this project, which motivated me to learn even more.

Installation

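  • The following third-party packages are required: numpy, pandas, matplotlib (which provides pylab), seaborn, sklearn (installed as scikit-learn), imblearn (installed as imbalanced-learn), and xgboost. The project also uses the standard-library modules datetime, math, itertools, and pickle, which need no installation.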
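As a quick sanity check before running the project, a short script along these lines can report which of the packages above are not importable. The helper name `missing_modules` is hypothetical and not part of the repository.

```python
import importlib.util

# Import names taken from the package list above. The comments give the
# usual pip distribution name where it differs from the import name.
REQUIRED_MODULES = [
    "numpy",
    "pandas",
    "matplotlib",  # also provides pylab
    "seaborn",
    "sklearn",     # pip: scikit-learn
    "imblearn",    # pip: imbalanced-learn
    "xgboost",
]


def missing_modules(names):
    """Return the top-level import names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are importable.")
```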

Files

  • project.pdf - Report with detailed explanation of the entire project.
  • capstone_proposal.pdf - Report with the proposal for this project.
  • util.py - Python module for data processing and feature engineering.
  • cluster.py - Python module with clustering methods for the segmentation report.
  • pca.py - Python module with PCA methods for dimensionality reduction.
  • Udacity_AZDIAS_052018.csv: Demographics data for the general population of Germany; 891 211 persons (rows) x 366 features (columns).
  • Udacity_CUSTOMERS_052018.csv: Demographics data for customers of a mail-order company; 191 652 persons (rows) x 369 features (columns).
  • Udacity_MAILOUT_052018_TRAIN.csv: Demographics data for individuals who were targets of a marketing campaign; 42 982 persons (rows) x 367 (columns).
  • Udacity_MAILOUT_052018_TEST.csv: Demographics data for individuals who were targets of a marketing campaign; 42 833 persons (rows) x 366 (columns).
  • unknown_values.csv: Mapping dictionary from each attribute to the value(s) that encode "unknown" for that attribute.
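A mapping like the one in unknown_values.csv is typically used to turn the "unknown" codes into missing values before further processing. The sketch below shows one way to do this with pandas. The layout of the real file is not documented here, so the dictionary structure, the attribute names `ATTR_A` and `ATTR_B`, and the function `replace_unknown_with_nan` are all hypothetical.

```python
import pandas as pd


def replace_unknown_with_nan(df, unknown_map):
    """Return a copy of df in which each column's 'unknown' codes are NaN.

    unknown_map maps a column name to the list of codes that mean
    "unknown" for that column (assumed structure).
    """
    cleaned = df.copy()
    for column, codes in unknown_map.items():
        if column in cleaned.columns:
            # mask() replaces entries where the condition is True with NaN.
            cleaned[column] = cleaned[column].mask(cleaned[column].isin(codes))
    return cleaned


# Toy stand-ins for the real data; names and codes are illustrative only.
demographics = pd.DataFrame({"ATTR_A": [1, -1, 3, 0], "ATTR_B": [1, 2, 0, 2]})
unknown_map = {"ATTR_A": [-1, 0], "ATTR_B": [0]}

cleaned = replace_unknown_with_nan(demographics, unknown_map)
print(cleaned.isna().sum())
```

Working on a copy leaves the raw frame untouched, which makes it easier to compare missing-value rates before and after the replacement.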

Results and Conclusions

  • The results of this work, along with implementation details, conclusions, and future work, can be found in the file final_project.pdf.

Licenses and Acknowledgements

  • The project is part of Udacity's Machine Learning Engineer Nanodegree program. The data provided is not public and belongs to Arvato and Udacity.
