Rule extraction in unsupervised outlier detection for XAI

OneClass SVM is a popular method for unsupervised outlier detection over the features of a dataset. However, it is generally considered a "black box" because it is difficult to explain in a simple, intuitive way why the decision frontier assigns each data point to one category or the other. This problem appears in supervised classification tasks and is even more pronounced in unsupervised learning. To provide intuitive explanations, this library offers a method that infers rules justifying why a point is labeled as an outlier, based on [1] and [2].

The library performs the outlier analysis using scikit-learn [3].

The method provides algorithmic transparency for variables of any kind, numerical or categorical.

Getting Started

These instructions explain how to use the library and reproduce the results shown.

Prerequisites

The dependencies listed in requirements.txt are needed to use the library directly or to run the example (Example.ipynb):

$ pip install -r requirements.txt

Usage

The main function, ocsvm_rule_extractor, acts as a wrapper over the OneClassSVM class from scikit-learn. It takes the following parameters:

dataset: pandas dataframe containing the dataset
numerical_cols: list of the columns that contain numerical variables
categorical_cols: list of the columns that contain categorical (non ordinal) variables
dct_params: dictionary containing the parameters used to create the generic OneClassSVM model. See [3] for more info.

NOTE: categorical columns should be one-hot encoded.

>>> ocsvm_rule_extractor(dataset, numerical_cols, categorical_cols, dct_params)
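A minimal sketch of how the four arguments might be prepared. The column names are borrowed from the example rules in this README, but the values are made up, the hyperparameters are arbitrary, and the assumption that dct_params is passed straight to the OneClassSVM constructor is only a reading of the parameter description above. The model is fitted with scikit-learn directly so the snippet runs without the library on the import path.

```python
import pandas as pd
from sklearn.svm import OneClassSVM

# Toy data (hypothetical values; only the column names come from the README).
raw = pd.DataFrame({
    "studytime": [1, 2, 2, 3, 4, 2, 1, 3],
    "G3": [10, 12, 9, 14, 15, 0, 11, 13],
    "school": ["GP", "GP", "MS", "GP", "MS", "MS", "GP", "GP"],
})

numerical_cols = ["studytime", "G3"]

# One-hot encode the categorical column, as the NOTE above requires.
dataset = pd.get_dummies(raw, columns=["school"], dtype=int)
categorical_cols = [c for c in dataset.columns if c not in numerical_cols]

# Keyword arguments accepted by sklearn.svm.OneClassSVM (see [3]).
dct_params = {"kernel": "rbf", "nu": 0.1, "gamma": 0.1}

# Assumption: the wrapper builds its model roughly like this.
features = dataset[numerical_cols + categorical_cols]
model = OneClassSVM(**dct_params).fit(features)
labels = model.predict(features)  # +1 for inliers, -1 for outliers

print(categorical_cols)
print(labels)
```

With inputs shaped like these, dataset, numerical_cols, categorical_cols and dct_params would be passed to ocsvm_rule_extractor as in the call above.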

The function then returns both the trained model and a dataframe with the inferred rules. These rules look like the following example:

NOT anomaly...
Rule Nº 1: IF sex = 0 AND school = 0 AND studytime <= 4 AND G3 <= 15 AND studytime >= 1 AND G3 >= 8 
Rule Nº 2: IF sex = 0 AND school = 1 AND studytime <= 2 AND G3 <= 0 AND studytime >= 2 AND G3 >= 0 
Rule Nº 3: IF sex = 1 AND school = 0 AND studytime <= 4 AND G3 <= 13 AND studytime >= 2 AND G3 >= 8

These rules indicate the limit values within which a data point should not be considered an anomaly; a point that satisfies none of the rules is considered anomalous.
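To illustrate how such rules can be read, the sketch below transcribes the three example rules by hand into inclusive (low, high) bounds and flags a point as anomalous when it satisfies none of them. The dictionary layout and the helper names are hypothetical; they are not the format of the dataframe returned by the library.

```python
# Example rules from above, transcribed by hand as inclusive (low, high) bounds.
RULES = [
    {"sex": (0, 0), "school": (0, 0), "studytime": (1, 4), "G3": (8, 15)},
    {"sex": (0, 0), "school": (1, 1), "studytime": (2, 2), "G3": (0, 0)},
    {"sex": (1, 1), "school": (0, 0), "studytime": (2, 4), "G3": (8, 13)},
]


def satisfies(point, rule):
    """Return True if every feature of the point lies within the rule's bounds."""
    return all(low <= point[name] <= high for name, (low, high) in rule.items())


def is_anomaly(point, rules=RULES):
    """A point is anomalous when no 'NOT anomaly' rule covers it."""
    return not any(satisfies(point, rule) for rule in rules)


inlier = {"sex": 0, "school": 0, "studytime": 2, "G3": 10}   # covered by rule 1
outlier = {"sex": 1, "school": 0, "studytime": 2, "G3": 15}  # G3 exceeds rule 3's limit

print(is_anomaly(inlier))   # False
print(is_anomaly(outlier))  # True
```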

More Information

More information on the theory behind this method can be found in the corresponding paper. The Jupyter Notebooks show several examples of how to use the library. The notebooks are designed so they can be run on services such as Google Colab.

Authors

Alberto Barbado González - (https://github.com/AlbertoBarbado/)
Barbado González, Alberto. 2019. Rule extraction in unsupervised outlier detection for algorithmic transparency. Madrid. Telefónica. https://github.com/AlbertoBarbado/unsupervised-outlier-transparency
Barbado, A., Corcho, Ó., & Benjamins, R. (2022). Rule extraction in unsupervised anomaly detection for model explainability: Application to OneClass SVM. Expert Systems with Applications, 189, 116100. To cite it (BibTeX):

@article{barbado2022rule,
  title={Rule extraction in unsupervised anomaly detection for model explainability: Application to OneClass SVM},
  author={Barbado, Alberto and Corcho, {\'O}scar and Benjamins, Richard},
  journal={Expert Systems with Applications},
  volume={189},
  pages={116100},
  year={2022},
  publisher={Elsevier}
}

License

This project is licensed under the Apache License 2.0 - see the LICENSE.md file for details.

References

[1]: H. Núñez, C. Angulo, and A. Català. Rule extraction from support vector machines. In European Symposium on Artificial Neural Networks (ESANN), pages 107–112, 2002.
[2]: D. Martens, J. Huysmans, R. Setiono, J. Vanthienen, and B. Baesens. Rule Extraction from Support Vector Machines: An Overview of Issues and Application in Credit Scoring. 2008.
[3]: scikit-learn library for OneClassSVM: https://scikit-learn.org/stable/modules/generated/sklearn.svm.OneClassSVM.html
