Graph-Based-Feature-Selection-for-Dimensionality-Reduction

Graph Based Feature Selection is a new approach of reducing the dimensionality of a dataset using a Graph Based approach. The apporach tries to generate a Kruskal's minimum spanning tree of a graph where the features of the dataset are the vertices and the correlation among them are the weights of the edges. The edges having weights greater than the user defined threshold are removed. Hence, reducing the dimension of the dataset.

Tools Used: Scikit-learn, Pandas, Numpy, Matplotlib, Networkx

Steps performed to achieve dimensionality reduction via Graph Based Method:

Importing dataset and Data Preprocessing:
- Naming the columns.
- Imputation to treat Missing Values using imputer from scikit-learn library.
Normalization:
- Values are rescaled so that they end up ranging between 0 and 1 using Normalizer from scikit-learn library.
- Also known as min-max scaling.
Correlation matrix:
- Correlation among the features are found using .corr() from pandas library.
Defining the Vertices and Weights:
- The features or the columns are taken into a list to define Vertices.
- Weights are stored into another list by extracting the upper triangular matrix from the correlation matrix.
The Graph Based Approach:

Generating the Minimum Spanning Tree(MST) for the Graph using Krushkal's algorithm

A spanning tree of a graph is a sub graph that is a tree and connects all the vertices without forming a loop. Number of vertices is same as the original graph. Method:

a. Sort all the edges in non-decreasing order of their weight.

b. Pick the smallest edge. Check if it forms a cycle with the spanning tree formed so far. If cycle is not formed include this edge else discard it.

c. Repeat Step b until there are (V-1) edges in the spanning tree.
Removing edges whose weights are greater than the threshold from the constructed MST
- Cosidering the Threshold to be -0.31, the vertices associated with weight greater than this are dropped.
- The remaining MST is printed after dropping the vertices.
The Final Reduced Dataset is displayed

Name		Name	Last commit message	Last commit date
Latest commit History 22 Commits
Dimenional reduction on yeast dataset-4.ipynb		Dimenional reduction on yeast dataset-4.ipynb
Graph Based Feature Selection for Dimensionality Reduction.ipynb		Graph Based Feature Selection for Dimensionality Reduction.ipynb
Minimum Spanning Tree after removing edges.jpg		Minimum Spanning Tree after removing edges.jpg
Minimum Spanning Tree.jpg		Minimum Spanning Tree.jpg
README.md		README.md
yeast.csv		yeast.csv
yeast.names		yeast.names

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Graph-Based-Feature-Selection-for-Dimensionality-Reduction

Steps performed to achieve dimensionality reduction via Graph Based Method:

The Graph Based Approach:

Generating the Minimum Spanning Tree(MST) for the Graph using Krushkal's algorithm

About

Releases

Packages

Languages

akashjborah97/Graph-Based-Feature-Selection-for-Dimensionality-Reduction

Folders and files

Latest commit

History

Repository files navigation

Graph-Based-Feature-Selection-for-Dimensionality-Reduction

Steps performed to achieve dimensionality reduction via Graph Based Method:

The Graph Based Approach:

Generating the Minimum Spanning Tree(MST) for the Graph using Krushkal's algorithm

About

Topics

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages