From credit ratings to housing allocation, machine learning models are increasingly used to automate everyday decision making processes. With the growing impact on society, more and more concerns are being voiced about the loss of transparency, accountability and fairness of the algorithms making the decisions. We as data scientists need to step up our game and look for ways to mitigate emergent discrimination in our models. We need to make sure that our predictions do not disproportionately hurt people with certain sensitive characteristics (e.g., gender, ethnicity).
Luckily, last year's NIPS conference showed that the field is actively investigating how to bring fairness to predictive models. The number of papers published on the topic is rapidly increasing, a signal that fairness is finally being taken seriously. This point is also nicely made in the cartoon below, which was taken from the excellent CS 294: Fairness in Machine Learning course taught at UC Berkeley.
Some approaches focus on interpretability and transparency by allowing deeper interrogation of complex, black-box models. Other approaches make trained models more robust and fair in their predictions by constraining and changing the optimization objective.
Here, we will train a model for making income level predictions, analyse the fairness of its predictions, and then show how adversarial training can be used to make it fair. The approach used is based on the 2017 NIPS paper "Learning to Pivot with Adversarial Networks" by Louppe et al.
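The core idea of the pivot approach is a two-player game: a classifier learns to predict income, while an adversary tries to recover the sensitive attribute from the classifier's output; the classifier is then penalized whenever the adversary succeeds. A minimal sketch of such a training loop, assuming PyTorch, is shown below. All names, network sizes, and the toy data are illustrative, not the project's actual implementation:

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy data standing in for the Adult features: inputs X, income label y,
# and a binary sensitive attribute z (e.g. sex). Purely illustrative.
n = 512
X = torch.randn(n, 8)
z = torch.randint(0, 2, (n, 1)).float()
y = ((X[:, :1] + z) > 0.5).float()  # income deliberately correlated with z

clf = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 1))
adv = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
bce = nn.BCEWithLogitsLoss()
opt_clf = torch.optim.Adam(clf.parameters(), lr=1e-2)
opt_adv = torch.optim.Adam(adv.parameters(), lr=1e-2)
lam = 1.0  # trade-off between predictive accuracy and fairness

for step in range(200):
    # 1) Adversary tries to recover z from the classifier's score.
    opt_adv.zero_grad()
    s = clf(X).detach()
    adv_loss = bce(adv(s), z)
    adv_loss.backward()
    opt_adv.step()

    # 2) Classifier predicts y while making the adversary's job hard:
    #    its loss rewards fooling the adversary (minus sign).
    opt_clf.zero_grad()
    s = clf(X)
    clf_loss = bce(s, y) - lam * bce(adv(s), z)
    clf_loss.backward()
    opt_clf.step()
```

The hyperparameter `lam` controls the trade-off: at `lam = 0` the model is an ordinary classifier, and larger values push the classifier's output to carry less information about the sensitive attribute.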
For our experiment we use the Adult UCI dataset, which can be downloaded from here. It is also referred to as the "Census Income" dataset. We will predict whether or not a person's income is larger than 50K dollars a year. It is not hard to imagine that financial institutions train models on similar datasets and use them to decide whether or not someone is eligible for a loan, or to set the level of an insurance premium. The dataset contains the following features:
In the Adult UCI dataset there are two sensitive features: race and sex.
Distribution of race in the dataset | Distribution of sex in the dataset |
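As a concrete sketch of what the raw data looks like, the `adult.data` file can be parsed with pandas. The two sample rows below follow the raw file's format (comma-plus-space separated, no header), and the column names come from the UCI documentation; the `target` column name is our own choice:

```python
import io
import pandas as pd

# Column names as documented for the UCI "Census Income" dataset.
columns = ["age", "workclass", "fnlwgt", "education", "education-num",
           "marital-status", "occupation", "relationship", "race", "sex",
           "capital-gain", "capital-loss", "hours-per-week",
           "native-country", "income"]

# Two sample rows mimicking the raw adult.data format.
raw = io.StringIO(
    "39, State-gov, 77516, Bachelors, 13, Never-married, Adm-clerical, "
    "Not-in-family, White, Male, 2174, 0, 40, United-States, <=50K\n"
    "52, Self-emp-not-inc, 209642, HS-grad, 9, Married-civ-spouse, "
    "Exec-managerial, Husband, White, Male, 0, 0, 45, United-States, >50K\n"
)
df = pd.read_csv(raw, names=columns, skipinitialspace=True)

# Binary prediction target: does the income exceed 50K a year?
df["target"] = (df["income"] == ">50K").astype(int)
print(df[["age", "sex", "race", "income", "target"]])
```

The distributions plotted above can be reproduced with `df["race"].value_counts()` and `df["sex"].value_counts()` on the full dataset.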
Mutual information is a measure of non-linear association between two variables: it indicates how much the uncertainty about one variable is reduced by observing the other. In the figure below, you can see the mutual information values between each of the 6 features and the protected features, computed with the mutual_info_classif function. Notice the high value between marital-status and sex. This suggests a possible relationship between these variables; in other words, marital-status could be a proxy variable for sex.
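A minimal sketch of how such proxy variables can be detected with scikit-learn's `mutual_info_classif`, using synthetic label-encoded features (the feature names and the correlation are made up for illustration):

```python
import numpy as np
from sklearn.feature_selection import mutual_info_classif

rng = np.random.default_rng(0)
n = 1000

# Hypothetical label-encoded columns; in the real dataset these would be
# the encoded marital-status, occupation, ... features.
sex = rng.integers(0, 2, n)                  # protected feature
marital = (sex + rng.integers(0, 2, n)) % 3  # deliberately correlated with sex
occupation = rng.integers(0, 5, n)           # independent noise

X = np.column_stack([marital, occupation])
mi = mutual_info_classif(X, sex, discrete_features=True, random_state=0)
print(dict(zip(["marital-status", "occupation"], mi.round(3))))
```

The correlated feature receives a clearly higher score than the independent one, which is exactly the pattern that flags a potential proxy for the protected attribute.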
The figure below describes our full model and the workflow.
Use pip to create a new environment and install the dependencies from the requirements.txt file. The following command installs the packages according to the configuration file requirements.txt:
pip install -r requirements.txt
After setting up the required folders and packages, run the following command for the experiment.
python project/train.py \
--root_dir <YOUR_ROOT_DIR> \
--dataset_dir <YOUR_DATASET_DIR> \
--batch_size 64 \
--iteration 10 \
--gpu <YOUR_GPU_NUMBER> \
--test_size 0.2
Without a GPU:
python project/train.py \
--root_dir <YOUR_ROOT_DIR> \
--dataset_dir <YOUR_DATASET_DIR> \
--batch_size 64 \
--iteration 10 \
--test_size 0.2
A new folder, "visualization", will be created automatically, and for each iteration the resulting graphs with accuracy, ROC AUC and DI (%) will be saved in the visualization directory.
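Disparate impact (DI) is the ratio of positive prediction rates between the unprivileged and the privileged group; a value close to 100% indicates fairer predictions. A minimal sketch of the computation (the group encoding and the toy arrays are illustrative, not taken from the project):

```python
import numpy as np

def disparate_impact(y_pred, z):
    """Ratio of positive prediction rates: unprivileged / privileged.

    y_pred : binary model predictions (0/1)
    z      : sensitive attribute; here z == 1 marks the privileged group
    """
    rate_unpriv = y_pred[z == 0].mean()
    rate_priv = y_pred[z == 1].mean()
    return rate_unpriv / rate_priv

# Toy example: unprivileged group gets 3/4 positives, privileged gets 1/4.
y_pred = np.array([1, 0, 1, 1, 0, 1, 0, 0])
z      = np.array([0, 0, 0, 0, 1, 1, 1, 1])
print(f"DI: {disparate_impact(y_pred, z):.0%}")
```

A commonly used rule of thumb (the "80% rule") treats a DI below 80% or above 125% as evidence of disparate impact.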
Result from our experiment | Previously implemented result |
From our experiment, we obtained the following results:
DI vs Attention Weight | Accuracy vs Attention Weight |