News Article Text Classification

A pre-trained multi-class text classification model for identifying the topic of news articles. Given a piece of text, the model returns its topic (category). The categories are based on those commonly used by news websites.

This model was created with the help of Susan Li, who wrote this article.

Installation

python -m pip install git+https://github.com/user1342/News-Article-Text-Classification.git

Usage

Using the class:

from news_classification.news_topic_text_classifier import news_topic_text_classifier
model = news_topic_text_classifier()

Print model data:

model.print_model_feature_data()

Identify the topic of a given piece of text:

# Get all categories currently used in the model
print(model.get_all_categories())
# Get the category of a given piece of text
print(model.get_category(r"The introduction of the General Data Protection Regulation (GDPR), the EU is enacting a set of mandatory regulations for businesses that go into effect soon, on May 25, 2018. Organisations found in non-compliance could face hefty penalties of up to 20 million euros, or 4 percent of worldwide annual turnover, whichever is higher. Simply put, GDPR was enacted to give citizens and residents more control over their personal data and puts strict data handling rules in place governing “controllers” that collect data from EU residents, and “processors” that process the data on behalf of controllers, such as cloud providers."))
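To classify several texts in one go, a small helper can wrap the call shown above. This is only a minimal sketch: `classify_many` is a hypothetical helper and not part of the package. It assumes only that the function passed in behaves like `model.get_category` (one string in, one category out). The return type of `get_category` is not documented here, so the helper passes through whatever it receives.

```python
from typing import Callable, Dict, Iterable


def classify_many(texts: Iterable[str], get_category: Callable[[str], object]) -> Dict[str, object]:
    """Map each non-empty text to the category returned by get_category.

    Hypothetical helper: get_category is expected to behave like
    model.get_category from the usage example (one string in, one category out).
    """
    results = {}
    for text in texts:
        cleaned = text.strip()
        if not cleaned:
            # Skip blank input rather than asking the model to classify nothing.
            continue
        results[cleaned] = get_category(cleaned)
    return results
```

With the model from the usage section, this would be called as `classify_many(list_of_texts, model.get_category)`.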

Although the model is pre-trained, you can download a fresh set of training data and re-train the model on it. The dataset is built from news website RSS feeds, downloading the most recent articles in specific categories. The news websites used include BBC, The Daily Mail, The Independent, Wired, and CNN. Re-training the model will use this new dataset. If no parameters are given, the default dataset location is used and the existing dataset is overwritten.

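# Download recent articles from the RSS feeds into a new dataset file
new_data_set_location = "new_data_set.csv"
model.create_data_set(dataset=new_data_set_location)
# Re-train the model on the newly created dataset
model.re_train(dataset=new_data_set_location)

# Preview the first rows of the loaded data (_data_frame is an internal attribute)
print(model._data_frame.head())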
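
The dataset step described above pulls recent articles from RSS feeds, one category at a time. As a rough illustration of that idea, and not the package's actual implementation, the sketch below parses a standard RSS 2.0 document with the Python standard library and writes one labelled row per item. The function names, the CSV column names (`category`, `title`, `description`), and the sample feed are all assumptions made for this example. Fetching the feed over the network is deliberately left out.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical feed content, standing in for a downloaded RSS document.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>New data rules take effect</title>
      <description>Regulators outline the changes.</description>
    </item>
    <item>
      <title>Chip maker reports earnings</title>
      <description>Revenue rose in the last quarter.</description>
    </item>
  </channel>
</rss>"""


def rss_items_to_rows(rss_xml, category):
    """Return (category, title, description) tuples for each <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    rows = []
    for item in root.iter("item"):
        title = (item.findtext("title") or "").strip()
        description = (item.findtext("description") or "").strip()
        if title or description:
            rows.append((category, title, description))
    return rows


def write_rows(rows, handle):
    """Write a header line followed by the labelled rows as CSV."""
    writer = csv.writer(handle)
    writer.writerow(["category", "title", "description"])
    writer.writerows(rows)


rows = rss_items_to_rows(SAMPLE_FEED, "technology")
buffer = io.StringIO()  # in-memory stand-in for a CSV file on disk
write_rows(rows, buffer)
print(buffer.getvalue())
```

Labelling every item with the category of the feed it came from is what makes the rows usable as training data: the feed supplies the label and the item supplies the text.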
