Multilabel Classification Evaluation #5

Closed
angrymeir opened this issue May 10, 2020 · 14 comments
Labels
enhancement New feature or request

Comments

@angrymeir
Collaborator

Hey @sergioburdisso,

Thank you for this awesome project!
Currently the evaluation class only supports single label classification, even though SS3 inherently supports multilabel classification.
These are the steps (I see) needed to support multilabel classification evaluation:

  • Take the output of classify_multilabel
  • Convert the result to a binarized vector (same length as the confidence vector)
  • Implement multilabel classification metrics, e.g., Hamming loss (see the sketch below)
  • Adapt Gridsearch
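
For the first three steps, a minimal sketch using scikit-learn (the category set, gold labels, and predictions below are made up for illustration; nothing here is PySS3's actual API):

```
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.metrics import hamming_loss

categories = ["food", "service", "price"]        # assumed label set
y_true = [["food"], ["food", "service"], []]     # gold label lists
y_pred = [["food", "price"], ["service"], []]    # e.g., classify_multilabel outputs

# Step 2: convert label lists into binarized vectors (one column per category)
mlb = MultiLabelBinarizer(classes=categories)
y_true_bin = mlb.fit_transform(y_true)
y_pred_bin = mlb.transform(y_pred)

# Step 3: a multilabel metric such as Hamming loss (fraction of wrong label bits)
print(hamming_loss(y_true_bin, y_pred_bin))
```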
@angrymeir
Collaborator Author

Edit: Since multilabel stratified k-fold cross-validation is not implemented in sklearn, this repository might help with implementing multilabel grid search.

@sergioburdisso added the enhancement label May 11, 2020
@sergioburdisso
Owner

sergioburdisso commented May 11, 2020

Thank you, @angrymeir! You're helping to make this humble project better!

That is totally right, the current implementation of the evaluation class does not provide support for multilabel classification.

What do you think of adding an extra argument to classify_multilabel called, for instance, indicator_function_output, which could be either True or False? This argument could be used to enable the output to be a binarized vector having the value 1 for all $c_i$ such that $doc \in c_i$ (according to the trained model), and 0 otherwise. Do you think the name indicator_function_output is OK?
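
Purely to illustrate the proposal (the flag doesn't exist yet; `doc`, the category set, and both outputs below are hypothetical):

```
clf.classify_multilabel(doc)                                  # e.g. ["food", "price"]
clf.classify_multilabel(doc, indicator_function_output=True)  # e.g. [1, 0, 1]
```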

I'm currently working on the train method, which should now make the training procedure much easier and clearer, allowing the y_train list to be composed of lists of labels (not single labels). One interesting thing I realized is that some datasets will provide no labels at all for some documents (e.g. this one), thus the empty list [] is a valid "label". Internally, I create a special "other" category as a workaround. The good thing is that now the train/fit will be much more flexible.
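
As a sketch of what that would look like from the user's side (the documents and labels below are made up; only the list-of-lists y_train and the [] case come from the description above):

```
from pyss3 import SS3

x_train = ["great food but pricey...", "they never brought the bill...", "unrelated text..."]
y_train = [["food", "price"], ["service"], []]  # [] = no labels (internally "other")

clf = SS3()
clf.train(x_train, y_train)  # train() accepting lists of labels per document
```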

Thanks for suggesting that repository implementing multilabel stratified k-fold cross-validation! It seems quite straightforward to use.

BTW, taking into account your great ideas, suggestions, and feedback, do you mind being added to the README file as a contributor?

@sergioburdisso
Owner

BTW, just in case you're wondering about being added as a contributor: PySS3 follows the all-contributors specification, "Recognize all contributors, not just the ones who push code" 😎

Now that I'm done with the other Issue, I'll continue with this one 👽 ☕

@angrymeir
Collaborator Author

Sounds like a plan!
I'll also further read into stratification.

I would be honored to be listed as a contributor! However, the ideas are not only from me but also from my colleague @Vaiyani!

@sergioburdisso
Owner

@all-contributors could you add @Vaiyani and @angrymeir as contributors for ideas, suggestions, and feedback?

@allcontributors
Contributor

allcontributors bot commented May 12, 2020

@sergioburdisso

I've put up a pull request to add @angrymeir and @Vaiyani! 🎉

@sergioburdisso
Owner

@angrymeir and @Vaiyani, both were added to the readme file! 😎 Thanks, guys. I've also added you as contributors not only for ideas but also for data (since I'll probably be using your SemEval 2016 Task 5 dataset for the tutorials and live demo, as suggested in Issue #6).

@Vaiyani

Vaiyani commented May 12, 2020

@sergioburdisso Thanks for this great project as well :)

sergioburdisso added a commit that referenced this issue May 13, 2020
The fit/train method now supports multilabel classification. It will
automatically determine whether we're dealing with a multilabel
classification problem by looking at the first item of the `y_train`
list: if the first item is a list (of labels), i.e., not a single label,
it will assume a multilabel classification problem.
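
In other words, something along the lines of this check (a sketch of the described heuristic, not the actual source code):

```
def looks_multilabel(y_train):
    # A list as the first item (rather than a single label string) signals
    # multilabel data, e.g. [["food", "price"], ...] vs. ["food", ...]
    return bool(y_train) and isinstance(y_train[0], list)
```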
sergioburdisso added a commit that referenced this issue May 13, 2020
This function converts the list of training/test labels (i.e.,
y_train/y_test) into a membership matrix. This function is useful when
working with multi-label classification problems and it is meant to be
used only internally by the evaluation module (the ``Evaluation``
class). However, in case users want to perform model evaluations using
custom evaluation metrics, they could use this function as shown in the
following example, in which the performance will be measured in terms of
Hamming loss:

```
from pyss3 import SS3
from pyss3.util import Dataset, membership_matrix

from sklearn.metrics import hamming_loss

x_train, y_train = Dataset.load_from_files_multilabel(...)
x_test, y_test = Dataset.load_from_files_multilabel(...)

clf = SS3()
clf.train(x_train, y_train)

y_pred = clf.predict(x_test, multilabel=True)

y_test_mem = membership_matrix(clf, y_test)
y_pred_mem = membership_matrix(clf, y_pred)

hamming_loss(y_test_mem, y_pred_mem)
```

Documentation available here:
https://pyss3.rtfd.io/en/latest/api/index.html#pyss3.util.membership_matrix
sergioburdisso added a commit that referenced this issue May 14, 2020
Now, when working with multi-label classification problems,
``predict()`` will realize the user is working with multi-labeled data
and set the `multilabel` argument to True by default. Therefore, if the
user has trained the model using multilabeled data, then (s)he can
simply call ``predict(x_test)`` without the ``multilabel=True``
argument.

(#5)
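
So, continuing the earlier example, this is now enough (assuming clf was trained with multilabeled y_train):

```
# No multilabel=True needed: predict() infers it from how the model was trained
y_pred = clf.predict(x_test)  # e.g. [["food"], ["food", "service"], ...]
```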
sergioburdisso added a commit that referenced this issue May 14, 2020
Now the ``membership_matrix()`` runs 30 times faster. For instance, what
before took 4.5s now takes only 150ms. This optimization was necessary
because this function is called each time the model is evaluated,
which means, for instance, that it is called multiple times while
performing ``grid_search()``.
sergioburdisso added a commit that referenced this issue May 14, 2020
Evaluation.test() now supports multi-label classification as well. It
supports all previous standard metrics (precision, recall, f1-score,
accuracy) plus two new ones, 'hamming-loss' and 'exact-match'
(equivalent to 'accuracy'). Once finished, the `test` function will
also show a binary confusion matrix for each possible label.
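
For instance (mirroring the usual single-label Evaluation.test(clf, x_test, y_test) usage from the PySS3 docs; x_test/y_test here are assumed to be the multilabel splits from the earlier example):

```
from pyss3.util import Evaluation

# Reports precision, recall, f1-score, and accuracy as before, plus the new
# 'hamming-loss' and 'exact-match' metrics and a per-label confusion matrix
Evaluation.test(clf, x_test, y_test)
```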
@angrymeir
Collaborator Author

Hey @sergioburdisso ,
I just tried out clf.fit() and Evaluation.test() on our multilabel dataset and it works like a charm!
Wuhu 🥳 Thank you for implementing this!

Regarding the Grid Search, should I create a separate Issue for that?

sergioburdisso added a commit that referenced this issue May 16, 2020
Evaluation.kfold_cross_validation() now supports multi-label
classification as well. It supports all previous standard metrics
(precision, recall, f1-score, accuracy) plus two new ones,
'hamming-loss' and 'exact-match' (equivalent to 'accuracy').
@sergioburdisso
Owner

@angrymeir Cool!!! I've just finished with the kfold_cross_validation; now I'll start with the grid_search. It shouldn't be too difficult since it mostly calls test and kfold_cross_validation. I've been making the changes in such a way as to make things easier for me, not only for grid_search but also for the interactive 3D evaluation plot (Evaluation.plot()), which shouldn't take me too much time to adapt to support multilabel classification.

sergioburdisso added a commit that referenced this issue May 16, 2020
Evaluation.grid_search() now supports multi-label classification using
the "test" method. It supports all previous standard metrics (precision,
recall, f1-score, accuracy) plus two new ones, 'hamming-loss' and
'exact-match' (equivalent to 'accuracy').
sergioburdisso added a commit that referenced this issue May 16, 2020
Now the 3D evaluation plot (`Evaluation.plot()`) supports multi-label
classification. New performance metrics have been added and binary
confusion matrices for each label are shown for each evaluated model
configuration.
@sergioburdisso
Owner

sergioburdisso commented May 16, 2020

@angrymeir @Vaiyani Guys! I've finally finished adding full multi-label classification support to the Evaluation class! Yay!!! 🥳🥳🥳

Thanks, guys, for creating this issue :) these changes were necessary. Issue #9 is also part of this overall process of adding full multi-label classification support to PySS3, so as soon as I finish with the other two issues, I'll finally release the new version (0.6.0). Do you guys think we should also add a new tutorial showing the new features? Do you think your dataset is gonna be well suited for that, or should I use a simpler one? Sort of more like a "proof-of-concept" dataset... what do you think?

@Vaiyani

Vaiyani commented May 16, 2020

@sergioburdisso Thank you for the quick and effective response on this issue.

I believe a tutorial would be a good idea for new people as well because, from my experience, tutorials are the first point of learning. It would be really helpful.

As for the data, I'm not quite sure. Our dataset (SemEval) is also well suited for this, but in the end, whichever delivers the message most clearly should be the aim.

@angrymeir
Collaborator Author

I guess a tutorial highlighting the differences would be great!
However, we can't use SemEval for that, since we're not allowed to redistribute it publicly.
I think the Toxic Comment Dataset should also be well suited for that :)

In case you need help with the notebook/don't have time to implement it, let me know and I'll create one!

sergioburdisso added a commit that referenced this issue May 20, 2020
The dataset is a subset of the CMU Movie Summary Corpus
(http://www.cs.cmu.edu/~ark/personas/) with 32985 summaries and only 10
movie genres. The dataset is structured according to #6, i.e., there are
two files, one for the labels and another for the movie plot summaries.
@sergioburdisso
Owner

Guys! I've just finished implementing the multi-label support for the Live Test tool (issue #9).

Now, in the left panel, test documents are shown with a % corresponding to the label-based accuracy (aka the Hamming score). Besides, when a document is selected, the true labels are shown along with the predicted labels; misclassified labels are shown in red, like "drama" in the screenshot below:

[screenshot: the Live Test tool showing predicted vs. true labels]

I'm about to release the new version soon; I'm just performing the final checks. Regarding the dataset for the tutorial, I've finally decided to use a subset of the CMU Movie Summary Corpus with only 10 categories (and 32985 documents/plot summaries). I've already uploaded the zipped dataset to the repo (5f5c055); it uses the same format as you suggested in Issue #6 (one file for (semicolon-separated) labels, another for docs), so I'll probably start working on (a very basic version of) the tutorial soon 😊
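
For reference, this label-based accuracy (Hamming score) is commonly defined as the per-document Jaccard overlap between the true and predicted label sets, averaged over all documents; a minimal sketch of that definition (not PySS3's internal code):

```
def hamming_score(y_true, y_pred):
    # Per-document |true ∩ pred| / |true ∪ pred|, averaged over all documents
    # (two empty sets count as a perfect 1.0)
    scores = []
    for true, pred in zip(y_true, y_pred):
        t, p = set(true), set(pred)
        scores.append(1.0 if not (t | p) else len(t & p) / len(t | p))
    return sum(scores) / len(scores)

hamming_score([["drama", "comedy"]], [["comedy"]])  # -> 0.5
```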

sergioburdisso added a commit that referenced this issue May 24, 2020
PySS3 now fully supports multi-label classification! :)

- The ``load_from_files_multilabel()`` function was added to the
  ``Dataset`` class (7ece7ce, resolved #6)

- The ``Evaluation`` class now supports multi-label classification (#5)
  - Add multi-label support to ``train()/fit()`` (4d00476)
  - Add multi-label support to ``Evaluation.test()`` (0a897dd)
  - Add multi-label support to ``show_best()`` and ``get_best()`` (ef2419b)
  - Add multi-label support to ``kfold_cross_validation()`` (aacd3a0)
  - Add multi-label support to ``grid_search()`` (925156d, 79f1e9d)
  - Add multi-label support to the 3D Evaluation Plot (42bbc65)

- The Live Test tool now supports multi-label classification as well
  (15657ee, b617bb7, resolved #9)

- Category names are no longer case-insensitive (4ec009a, resolved #8)