malort/Unsupervised_Country_Data

Unsupervised_Country_Data

Dataset provided by HELP International. The objective is to categorize countries by their overall level of development, using socio-economic and health factors.

Requirements:

pandas: Data analysis and manipulation tool.

matplotlib: Visualization library.

seaborn: Data visualization library built on matplotlib that enhances the style of matplotlib plots.

NumPy: Numerical computing library.

scikit-learn: Machine Learning library.

Bokeh: Library for interactive data visualization.

Plotly Express: High-level Python visualization library.

First part - EDA and Unsupervised Analysis:

After a brief exploratory data analysis, several unsupervised algorithms (K-means, Affinity Propagation, and Gaussian Mixture Model) are used to group countries into three categories.
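As a rough illustration of this step, the sketch below fits the three algorithms with scikit-learn. It is not the repository's notebook code: the data is a synthetic stand-in generated with `make_blobs` (150 rows and 9 features are assumptions), the standardization step is an assumption, and all variable names are hypothetical.

```python
import numpy as np
from sklearn.cluster import AffinityPropagation, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the country indicators (assumed shape: 150 x 9).
X, _ = make_blobs(n_samples=150, n_features=9, centers=3, random_state=0)

# Assumption: standardize features so no single indicator dominates distances.
X_scaled = StandardScaler().fit_transform(X)

# K-means and the Gaussian Mixture Model take the number of groups directly.
kmeans_labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X_scaled)
gmm_labels = GaussianMixture(n_components=3, random_state=0).fit_predict(X_scaled)

# Affinity Propagation chooses the number of clusters itself; obtaining a
# specific count usually means tuning its `preference` parameter.
ap_labels = AffinityPropagation(random_state=0).fit_predict(X_scaled)

for name, labels in [("K-means", kmeans_labels),
                     ("Gaussian Mixture", gmm_labels),
                     ("Affinity Propagation", ap_labels)]:
    print(name, "->", np.unique(labels).size, "clusters")
```

With the real dataset, `X` would be the numeric indicator columns of the country table rather than generated blobs.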

Second part - Dimensionality reduction with t-SNE and map visualizations:

t-Distributed Stochastic Neighbor Embedding (t-SNE) is a non-linear dimensionality reduction technique that is particularly well suited to visualizing high-dimensional datasets. The data is reduced to two dimensions using t-SNE and plotted with Bokeh.
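A minimal sketch of the reduction step, using scikit-learn's `TSNE`, could look like the following. The input is random synthetic data (150 rows and 9 features are assumptions), the `perplexity` and `init` values are illustrative choices rather than the settings used in this repository, and the plotting step is omitted.

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the scaled country indicators (assumed shape: 150 x 9).
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 9))
X_scaled = StandardScaler().fit_transform(X)

# Project to two dimensions. Perplexity must be smaller than the number of
# samples; 30 is an illustrative value, not the repository's setting.
tsne = TSNE(n_components=2, perplexity=30, init="pca", random_state=0)
embedding = tsne.fit_transform(X_scaled)

# Each row now holds the (x, y) coordinates of one sample.
print(embedding.shape)
```

The two columns of `embedding` can then be passed as x and y coordinates to a scatter plot, for example in Bokeh, with points colored by cluster label.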

[Figure: tSNE2, two-dimensional t-SNE embedding]

Interactive map visualizations show the results of the previous analysis.

[Figure: world_plotly]

[Figure: asia_gdpp_mort]