Creating an algorithm to classify documents. Each document can have multiple labels

josepmariall/text_classification

Document Classification (text)

This project classifies documents under multiple labels. The dataset consists of the first 6 pages of over 18,000 documents, together with the labels under which each document was indexed. Each document can have multiple labels. In total, there are up to 29 different labels.

The documents come from an international organization.
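
Because each document can carry several labels at once, the targets are usually represented as a binary indicator matrix with one column per label. The following is a minimal sketch of that encoding in plain Python, not the code used in the notebooks. The label names and the helper functions `build_label_index` and `to_indicator_matrix` are hypothetical and only serve as illustration.

```python
from typing import Dict, List, Sequence, Set


def build_label_index(label_sets: Sequence[Set[str]]) -> Dict[str, int]:
    """Map every distinct label to a column index, in sorted order."""
    all_labels = sorted(set().union(*label_sets))
    return {label: i for i, label in enumerate(all_labels)}


def to_indicator_matrix(
    label_sets: Sequence[Set[str]], index: Dict[str, int]
) -> List[List[int]]:
    """Return one row per document, with 1 where the document has the label."""
    matrix = []
    for labels in label_sets:
        row = [0] * len(index)
        for label in labels:
            row[index[label]] = 1
        matrix.append(row)
    return matrix


# Hypothetical label sets for four documents (the real project has up to 29 labels).
docs_labels = [
    {"finance", "health"},
    {"health"},
    set(),  # a document with no labels yields an all-zero row
    {"education", "finance"},
]

index = build_label_index(docs_labels)
matrix = to_indicator_matrix(docs_labels, index)

for row in matrix:
    print(row)
```

With this representation, a multi-label classifier predicts one 0/1 value per column instead of a single class per document.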

Table of Contents

The following Jupyter notebooks are provided:

  1. Data Exploration and Visualizations
  2. Data Processing
  3. Creating and Training a model

Notebook 1 prepares the data for both modeling and visualization, creating two separate files, one for each purpose.

Code of ethics

This project has been undertaken in compliance with a code of ethics.

Install

I provide the environment used to run this code.

License

This project is Copyright © 2019 Josep Maria Niubo. It is free software and may be redistributed under the terms specified in the LICENSE file.
