Multilabel Classification Evaluation #5

Closed
angrymeir opened this issue May 10, 2020 · 14 comments
Labels
enhancement New feature or request

Comments

@angrymeir
Collaborator

Hey @sergioburdisso,

Thank you for this awesome project!
Currently the evaluation class only supports single label classification, even though SS3 inherently supports multilabel classification.
These are the steps (I see) needed to support multilabel classification evaluation:

  • Take the output of classify_multilabel
  • Convert the result to a binarized vector (same length as the confidence vector)
  • Implement multilabel classification metrics, e.g., Hamming loss (see the sketch below)
  • Adapt Gridsearch
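
For the first three steps, a minimal sketch using scikit-learn (the category set, gold labels, and predictions below are made up for illustration; nothing here is PySS3's actual API):

```
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import hamming_loss

categories = ["food", "service", "price"]        # assumed label set
y_true = [["food"], ["food", "service"], []]     # gold label lists
y_pred = [["food", "price"], ["service"], []]    # e.g., classify_multilabel outputs

# Step 2: convert label lists into binarized vectors (one column per category)
mlb = MultiLabelBinarizer(classes=categories)
y_true_bin = mlb.fit_transform(y_true)
y_pred_bin = mlb.transform(y_pred)

# Step 3: a multilabel metric such as Hamming loss (fraction of wrong label bits)
print(hamming_loss(y_true_bin, y_pred_bin))
```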
@angrymeir
Collaborator Author

Edit: Since multilabel stratified k-fold cross-validation is not implemented in sklearn, this repository might help with implementing multilabel grid search.

@sergioburdisso added the enhancement label May 11, 2020
@sergioburdisso
Owner

sergioburdisso commented May 11, 2020

Thank you, @angrymeir! You're helping to make this humble project better!

That is totally right, the current implementation of the evaluation class does not provide support for multilabel classification.

What do you think of adding an extra argument to classify_multilabel called, for instance, indicator_function_output, which could be either True or False? This argument could be used to enable the output to be a binarized vector having the value 1 for all $c_i$ such that $doc \in c_i$ (according to the trained model), and 0 otherwise. Do you think the name indicator_function_output is OK?
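
Purely to illustrate the proposal (the flag doesn't exist yet; `doc`, the category set, and both outputs below are hypothetical):

```
clf.classify_multilabel(doc)                                  # e.g. ["food", "price"]
clf.classify_multilabel(doc, indicator_function_output=True)  # e.g. [1, 0, 1]
```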

I'm currently working on the train method, which should now make the training procedure much easier and clearer, allowing the y_train list to be composed of lists of labels (not single labels). One interesting thing I realized is that some datasets will provide no labels at all for some documents (e.g. this one), thus the empty list [] is a valid "label". Internally, I create a special "other" category as a workaround. The good thing is that now the train/fit will be much more flexible.
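
As a sketch of what that would look like from the user's side (the documents and labels below are made up; only the list-of-lists y_train and the [] case come from the description above):

```
from pyss3 import SS3

x_train = ["great food but pricey...", "they never brought the bill...", "unrelated text..."]
y_train = [["food", "price"], ["service"], []]  # [] = no labels (internally "other")

clf = SS3()
clf.train(x_train, y_train)  # train() accepting lists of labels per document
```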

Thanks for suggesting that repository implementing multilabel stratified k-fold cross-validation! It seems quite straightforward to use.

BTW, taking into account your great ideas, suggestions, and feedback, do you mind being added to the README file as a contributor?

@sergioburdisso
Owner

BTW, just in case you're wondering about being added as a contributor: PySS3 follows the all-contributors specification, "Recognize all contributors, not just the ones who push code" 😎

Now that I'm done with the other Issue, I'll continue with this one 👽 ☕

@angrymeir
Collaborator Author

Sounds like a plan!
I'll also further read into stratification.

I would be honored to be listed as a contributor! However, the ideas are not only from me but also from my colleague @Vaiyani!

@sergioburdisso
Owner

@all-contributors could you add @Vaiyani and @angrymeir as contributors for ideas, suggestions, and feedback?

@allcontributors
Contributor

allcontributors bot commented May 12, 2020

@sergioburdisso

I've put up a pull request to add @angrymeir and @Vaiyani! 🎉

@sergioburdisso
Owner

@angrymeir and @Vaiyani, both were added to the readme file! 😎 Thanks, guys. I've also added you as contributors not only for ideas but also for data (since I'll probably be using your SemEval 2016 Task 5 dataset for the tutorials and live demo, as suggested in Issue #6).

@Vaiyani

Vaiyani commented May 12, 2020

@sergioburdisso Thanks for this great project as well :)

sergioburdisso added a commit that referenced this issue May 13, 2020
The fit/train method now supports multilabel classification. It will
automatically determine whether we're dealing with a multilabel
classification problem by looking at the first item of the `y_train`
list: if the first item is a list (of labels), i.e., not a single label,
it will assume a multilabel classification problem.
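
In other words, something along the lines of this check (a sketch of the described heuristic, not the actual source code):

```
def looks_multilabel(y_train):
    # A list as the first item (rather than a single label string) signals
    # multilabel data, e.g. [["food", "price"], ...] vs. ["food", ...]
    return bool(y_train) and isinstance(y_train[0], list)
```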
sergioburdisso added a commit that referenced this issue May 13, 2020
This function converts the list of training/test labels (i.e.,
y_train/y_test) into a membership matrix. This function is useful when
working with multi-label classification problems and it is meant to be
used only internally by the evaluation module (the ``Evaluation``
class). However, in case users want to perform model evaluations using
custom evaluation metrics, they could use this function as shown in the
following example, in which the performance will be measured in terms of
Hamming loss:

```
from pyss3 import SS3
from pyss3.util import Dataset, membership_matrix

from sklearn.metrics import hamming_loss

x_train, y_train = Dataset.load_from_files_multilabel(...)
x_test, y_test = Dataset.load_from_files_multilabel(...)

clf = SS3()
clf.train(x_train, y_train)

y_pred = clf.predict(x_test, multilabel=True)

y_test_mem = membership_matrix(clf, y_test)
y_pred_mem = membership_matrix(clf, y_pred)

hamming_loss(y_test_mem, y_pred_mem)
```

Documentation available here:
https://pyss3.rtfd.io/en/latest/api/index.html#pyss3.util.membership_matrix
sergioburdisso added a commit that referenced this issue May 14, 2020
Now, when working with multi-label classification problems,
``predict()`` will realize the user is working with multi-labeled data
and set the `multilabel` argument to True by default. Therefore, if the
user has trained the model using multilabeled data, then (s)he can
simply call ``predict(x_test)`` without the ``multilabel=True``
argument.

(#5)
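
So, continuing the earlier example, this is now enough (assuming clf was trained with multilabeled y_train):

```
# No multilabel=True needed: predict() infers it from how the model was trained
y_pred = clf.predict(x_test)  # e.g. [["food"], ["food", "service"], ...]
```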
sergioburdisso added a commit that referenced this issue May 14, 2020
Now the ``membership_matrix()`` runs 30 times faster. For instance, what
before took 4.5s now takes only 150ms. This optimization was necessary
because this function is called each time the model is evaluated,
which means, for instance, that it is called multiple times while
performing ``grid_search()``.
sergioburdisso added a commit that referenced this issue May 14, 2020
Evaluation.test() now supports multi-label classification as well. It
supports all previous standard metrics (precision, recall, f1-score,
accuracy) plus two new ones, 'hamming-loss' and 'exact-match'
(equivalent to 'accuracy'). Once finished, the `test` function will
also show a binary confusion matrix for each possible label.
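
For instance (mirroring the usual single-label Evaluation.test(clf, x_test, y_test) usage from the PySS3 docs; x_test/y_test here are assumed to be the multilabel splits from the earlier example):

```
from pyss3.util import Evaluation

# Reports precision, recall, f1-score, and accuracy as before, plus the new
# 'hamming-loss' and 'exact-match' metrics and a per-label confusion matrix
Evaluation.test(clf, x_test, y_test)
```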
@angrymeir
Collaborator Author

Hey @sergioburdisso ,
I just tried out clf.fit() and Evaluation.test() on our multilabel dataset and it works like a charm!
Wuhu 🥳 Thank you for implementing this!

Regarding the Grid Search, should I create a separate Issue for that?

sergioburdisso added a commit that referenced this issue May 16, 2020
Evaluation.kfold_cross_validation() now supports multi-label
classification as well. It supports all previous standard metrics
(precision, recall, f1-score, accuracy) plus two new ones,
'hamming-loss' and 'exact-match' (equivalent to 'accuracy').
@sergioburdisso
Owner

@angrymeir Cool!!! I've just finished with the kfold_cross_validation; now I'll start with the grid_search. It shouldn't be too difficult since it mostly calls test and kfold_cross_validation. I've been making the changes in such a way as to make things easier for me, not only for grid_search but also for the interactive 3D evaluation plot (Evaluation.plot()), which shouldn't take me too much time to adapt to support multilabel classification.

sergioburdisso added a commit that referenced this issue May 16, 2020
Evaluation.grid_search() now supports multi-label classification using
the "test" method. It supports all previous standard metrics (precision,
recall, f1-score, accuracy) plus two new ones, 'hamming-loss' and
'exact-match' (equivalent to 'accuracy').
sergioburdisso added a commit that referenced this issue May 16, 2020
Now the 3D evaluation plot (`Evaluation.plot()`) supports multi-label
classification. New performance metrics have been added and binary
confusion matrices for each label are shown for each evaluated model
configuration.
@sergioburdisso
Owner

sergioburdisso commented May 16, 2020

@angrymeir @Vaiyani Guys! I've finally finished adding full multi-label classification support to the Evaluation class! Yay!!! 🥳🥳🥳

Thanks, guys, for creating this issue :) these changes were necessary. Issue #9 is also part of this overall process of adding full multi-label classification support to PySS3, so as soon as I finish with the other two issues, I'll finally release the new version (0.6.0). Do you guys think we should also add a new tutorial showing the new features? Do you think your dataset is gonna be well suited for that, or should I use a simpler one? Sort of more like a "proof-of-concept" dataset... what do you think?

@Vaiyani

Vaiyani commented May 16, 2020

@sergioburdisso Thank you for the quick and effective response on this issue.

I believe a tutorial would be a good idea for new people as well because, from my experience, tutorials are the first point of learning. It would be really helpful.

As for the data, I'm not quite sure. Our dataset (SemEval) is also well suited for this, but in the end, whichever delivers the message most clearly should be the aim.

@angrymeir
Collaborator Author

I guess a tutorial highlighting the differences would be great!
However, we can't use SemEval for that, since we're not allowed to redistribute it publicly.
I think the Toxic Comment Dataset should also be well suited for that :)

In case you need help with the notebook/don't have time to implement it, let me know and I'll create one!

sergioburdisso added a commit that referenced this issue May 20, 2020
The dataset is a subset of the CMU Movie Summary Corpus
(http://www.cs.cmu.edu/~ark/personas/) with 32985 summaries and only 10
movie genres. The dataset is structured according to #6, i.e., there are
two files, one for the labels and another for the movie plot summaries.
@sergioburdisso
Owner

Guys! I've just finished implementing the multi-label support for the Live Test tool (issue #9).

Now, in the left panel, test documents are shown with a % corresponding to the label-based accuracy (aka the Hamming score). Besides, when a document is selected, the true labels are shown along with the predicted labels; misclassified labels are shown in red, like "drama" in the screenshot below:

[screenshot: the Live Test tool showing predicted vs. true labels]

I'm about to release the new version soon; I'm just performing the final checks. Regarding the dataset for the tutorial, I've finally decided to use a subset of the CMU Movie Summary Corpus with only 10 categories (and 32985 documents/plot summaries). I've already uploaded the zipped dataset to the repo (5f5c055); it uses the same format as you suggested in Issue #6 (one file for (semicolon-separated) labels, another for docs), so I'll probably start working on (a very basic version of) the tutorial soon 😊
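
For reference, this label-based accuracy (Hamming score) is commonly defined as the per-document Jaccard overlap between the true and predicted label sets, averaged over all documents; a minimal sketch of that definition (not PySS3's internal code):

```
def hamming_score(y_true, y_pred):
    # Per-document |true ∩ pred| / |true ∪ pred|, averaged over all documents
    # (two empty sets count as a perfect 1.0)
    scores = []
    for true, pred in zip(y_true, y_pred):
        t, p = set(true), set(pred)
        scores.append(1.0 if not (t | p) else len(t & p) / len(t | p))
    return sum(scores) / len(scores)

hamming_score([["drama", "comedy"]], [["comedy"]])  # -> 0.5
```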

sergioburdisso added a commit that referenced this issue May 24, 2020
PySS3 now fully supports multi-label classification! :)

- The ``load_from_files_multilabel()`` function was added to the
  ``Dataset`` class (7ece7ce, resolved #6)

- The ``Evaluation`` class now supports multi-label classification (#5)
  - Add multi-label support to ``train()/fit()`` (4d00476)
  - Add multi-label support to ``Evaluation.test()`` (0a897dd)
  - Add multi-label support to ``show_best()`` and ``get_best()`` (ef2419b)
  - Add multi-label support to ``kfold_cross_validation()`` (aacd3a0)
  - Add multi-label support to ``grid_search()`` (925156d, 79f1e9d)
  - Add multi-label support to the 3D Evaluation Plot (42bbc65)

- The Live Test tool now supports multi-label classification as well
  (15657ee, b617bb7, resolved #9)

- Category names are no longer case-insensitive (4ec009a, resolved #8)