Text-Analytics-with-Multi-Class-and-Imbalanced-Learning

This project was completed as part of the Advanced Topics in Machine Learning course. A more detailed description is available in the project documentation.

Problem: Genre Identification on (a sub-set of) Gutenberg Corpus

Consider this set of books belonging to 19th-century English fiction¹.

The data set is created from Project Gutenberg². It consists of about 1000 books and roughly 10 genres. The task is to detect the genre³ of a book, i.e. multi-class classification. Each data point in this classification task is a fiction book with a label (its genre). The project tackles three main challenges:

Extraction of features that are relevant to fiction books, which may include ideas like sentiment, setting⁴ and so on, using appropriate libraries.
Outline of all the models used, why they were chosen, and how model selection was performed.
Explanation of how the model is evaluated and how the data set is partitioned, taking into account potential challenges such as class imbalance.
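
The partitioning and evaluation challenge can be illustrated with a minimal sketch. It is not the project's actual pipeline: the genre names, class sizes, and the helper functions `stratified_split` and `macro_f1` are hypothetical and written in plain Python for illustration only. It shows a stratified split that keeps each genre's share in both partitions, and a macro-averaged F1 score that weights rare genres as heavily as common ones.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) so each genre keeps roughly its share in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        if len(indices) > 1:
            # Keep at least one book of this genre on each side of the split.
            n_test = min(max(n_test, 1), len(indices) - 1)
        else:
            n_test = 0
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare genres count as much as common ones."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical, deliberately imbalanced toy labels (not the real corpus).
labels = ["Literary"] * 80 + ["Detective"] * 15 + ["Sea"] * 5
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)

# Majority-class baseline: predict the most common genre for every test book.
y_true = [labels[i] for i in test_idx]
y_pred = ["Literary"] * len(y_true)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
print(f"accuracy={accuracy:.2f}  macro-F1={macro_f1(y_true, y_pred):.2f}")
```

On toy data like this, a baseline that always predicts the majority genre scores far better on accuracy than on macro-F1, which is why a class-aware metric and a stratified split are reasonable defaults when genres are imbalanced.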


JalajVora/Text-Analytics-with-Multi-Class-and-Imbalanced-Learning
