Player-Classifier

Built Classification models to determine the player from Shakespeare-plays dataset using Feature Engineering and exploratory data analysis.

Problem Statement

To be, or not to be

Classy Shakespeare plays and players

Set up a data science project structure in a new git repository in your GitHub account
Download the Shakespeare plays dataset from https://www.kaggle.com/kingburrito666/shakespeare-plays
Load the data set into panda data frames
Formulate one or two ideas on how feature engineering would help the data set to establish additional value using exploratory data analysis
Build one or more classification models to determine the player using the other columns as features
Document your process and results
Commit your notebook, source code, visualizations and other supporting files to the git repository in GitHub

Data Description

This is a dataset comprised of all of Shakespeare's plays. It includes the following:

The first column is the Data-Line, it just keeps track of all the rows there are.
The second column is the play that the lines are from.
The third column is the actual line being spoken at any given time.
The fourth column is the Act-Scene-Line from which any given line is from.
The fifth column is the player who is saying any given line.
The sixth column is the line being spoken.

What are we doing?

Train Classification models to predict Player from Shakespeare plays dataset.
Explore Shakespeare plays dataset
Using Feature Engineering, transform data to provide best accuracy with classification models.
Use different Models to identify the best one for this dataset.

Analysis

This is number of players in each play plot

After exploring dataset we understood the availability of huge data for each play helps build more accurate model

This is a play count plot- to determine how many attributes we have for each play

Also our main goal is to determine player from dataset.

This plot is for player count- to determine how many attributes we have for each player

Results: (80% of data is used for training and 20% for testing)

Decision Tree Classifier

Accuracy of Decision Tree classifier on training set: 100%

Accuracy of Decision Tree classifier on test set: 79%

performed pretty good compared to others.

K- Nearest Neighbors Classifier

Accuracy of K-NN classifier on training set: 76%

Accuracy of K-NN classifier on test set: 53%

not too bad performance.

Gaussian Naive Bayes Classifier

Accuracy of GNB classifier on training set: 23%

Accuracy of GNB classifier on test set: 22%

worst performance overall.

Random Forest Classifier

Accuracy of Random Forest classifier on training set: 100%

Accuracy of Random Forest classifier on test set: 82%

best performing model.

Project Structure:

Readme.md

Project description

Data

Contains raw data in csv format

Notebooks

Jupyter Notebook for Exploratory data analysis, Visualization, Feature Engineering and Classification.

Reports

plot- number of players vs play
Plot- Play count
Plot- Player count

Requirements.txt

Info about Tools, frameworks and libraries required to reproduce the work flow

Name		Name	Last commit message	Last commit date
Latest commit History 6 Commits
data		data
notebooks		notebooks
reports		reports
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
Requirements.txt		Requirements.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Player-Classifier

Problem Statement

To be, or not to be

Classy Shakespeare plays and players

Data Description

This is a dataset comprised of all of Shakespeare's plays. It includes the following:

What are we doing?

Analysis

This is number of players in each play plot

After exploring dataset we understood the availability of huge data for each play helps build more accurate model

This is a play count plot- to determine how many attributes we have for each play

Also our main goal is to determine player from dataset.

This plot is for player count- to determine how many attributes we have for each player

Results: (80% of data is used for training and 20% for testing)

Decision Tree Classifier

Accuracy of Decision Tree classifier on training set: 100%

Accuracy of Decision Tree classifier on test set: 79%

K- Nearest Neighbors Classifier

Accuracy of K-NN classifier on training set: 76%

Accuracy of K-NN classifier on test set: 53%

Gaussian Naive Bayes Classifier

Accuracy of GNB classifier on training set: 23%

Accuracy of GNB classifier on test set: 22%

Random Forest Classifier

Accuracy of Random Forest classifier on training set: 100%

Accuracy of Random Forest classifier on test set: 82%

Project Structure:

Readme.md

Data

Notebooks

Reports

Requirements.txt

About

Releases

Packages

Languages

License

SFLazarus/Player-Classifier

Folders and files

Latest commit

History

Repository files navigation

Player-Classifier

Problem Statement

To be, or not to be

Classy Shakespeare plays and players

Data Description

This is a dataset comprised of all of Shakespeare's plays. It includes the following:

What are we doing?

Analysis

This is number of players in each play plot

After exploring dataset we understood the availability of huge data for each play helps build more accurate model

This is a play count plot- to determine how many attributes we have for each play

Also our main goal is to determine player from dataset.

This plot is for player count- to determine how many attributes we have for each player

Results: (80% of data is used for training and 20% for testing)

Decision Tree Classifier

Accuracy of Decision Tree classifier on training set: 100%

Accuracy of Decision Tree classifier on test set: 79%

K- Nearest Neighbors Classifier

Accuracy of K-NN classifier on training set: 76%

Accuracy of K-NN classifier on test set: 53%

Gaussian Naive Bayes Classifier

Accuracy of GNB classifier on training set: 23%

Accuracy of GNB classifier on test set: 22%

Random Forest Classifier

Accuracy of Random Forest classifier on training set: 100%

Accuracy of Random Forest classifier on test set: 82%

Project Structure:

Readme.md

Data

Notebooks

Reports

Requirements.txt

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages