Equality of Opportunity (EOpp)

EOpp Definition

Equality of opportunity is one of the most widely used definitions of fairness. For a recommender system, EOpp suggests that randomly chosen "qualified" candidates should be represented equally regardless of which group they belong to; in other words, the exposure of qualified candidates from any group should be equal. Most recommender systems generate scores s(X), which predict the relevance of an item as measured by a binary response variable Y, to rank candidate items based on a feature set X. In these cases, EOpp corresponds to the independence of s(X) and the characteristic/attribute C given the response/label Y=1, i.e.

P(s(X) \leq t \mid C=c_1, Y=1) = P(s(X) \leq t \mid C=c_2, Y=1), \forall t, c_1, c_2.
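The condition above can be checked empirically by comparing the score CDFs of positive-label items across groups. The following is a minimal sketch, not part of the library: the helper names `positive_scores_by_group` and `eopp_gap` and the toy data are hypothetical, and the gap is only evaluated at the thresholds supplied.

```python
from bisect import bisect_right


def positive_scores_by_group(scores, groups, labels):
    """Collect the sorted scores of positive-label (Y = 1) items per group."""
    by_group = {}
    for s, c, y in zip(scores, groups, labels):
        if y == 1:
            by_group.setdefault(c, []).append(s)
    return {c: sorted(values) for c, values in by_group.items()}


def eopp_gap(scores, groups, labels, thresholds):
    """Largest difference between the group-wise empirical estimates of
    P(s(X) <= t | C = c, Y = 1) over the given thresholds t."""
    by_group = positive_scores_by_group(scores, groups, labels)
    if len(by_group) < 2:
        raise ValueError("need positive-label samples from at least two groups")
    gap = 0.0
    for t in thresholds:
        cdfs = [bisect_right(values, t) / len(values) for values in by_group.values()]
        gap = max(gap, max(cdfs) - min(cdfs))
    return gap


# Toy data (hypothetical): three items per group, two positives each.
scores = [0.1, 0.4, 0.8, 0.2, 0.5, 0.9]
groups = [0, 0, 0, 1, 1, 1]
labels = [1, 1, 0, 1, 1, 0]
gap = eopp_gap(scores, groups, labels, thresholds=[0.15, 0.3, 0.45])
```

A gap of zero at every threshold means the two conditional distributions agree on that grid; a large gap indicates that EOpp is violated.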

EOpp Algorithm

We provide the post-processing technique presented in Nandy et al. (2021). The function eOppTransformation() (see EOppUtils) can be used to learn a transformation that can be applied to model scores to achieve EOpp. The distribution of the transformed scores can be made to match the distribution before transformation by setting the argument originalScale = true. This is useful for blending the transformed scores s*(X) with the original scores s(X) as t * s*(X) + (1-t) * s(X) to achieve a fairness-performance trade-off by adjusting the tuning parameter t in [0, 1].
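The blending step can be sketched as follows. This is an illustration of the formula t * s*(X) + (1-t) * s(X) only; the function name `blend_scores` and the example values are hypothetical, and the transformed scores are assumed to have been produced elsewhere on the same scale as the original scores.

```python
def blend_scores(original, transformed, t):
    """Return t * s*(X) + (1 - t) * s(X) for each item.

    t = 0 keeps the original scores, t = 1 uses the transformed scores only.
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(original) != len(transformed):
        raise ValueError("score lists must have the same length")
    return [t * s_star + (1.0 - t) * s for s, s_star in zip(original, transformed)]


original = [0.2, 0.8]       # s(X), hypothetical values
transformed = [0.6, 0.4]    # s*(X), hypothetical values
halfway = blend_scores(original, transformed, t=0.5)
```

Sweeping t from 0 to 1 and measuring both a fairness metric and a performance metric at each value gives the trade-off curve from which t can be chosen.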

Position bias adjustment

To define EOpp in the presence of position bias, we need to take into account the dependency of the response variable Y on the position where the item is shown. To this end, we denote the counterfactual response when an item appears at position j by Y(j). Furthermore, we use \gamma to denote the position of an item in the ranking generated by s(X). Therefore, the observed response is given by Y(\gamma).

A scoring function s(X) of a recommendation system satisfies EOpp with respect to a characteristic C if P(s(X) \leq t \mid C=c_1,Y(\gamma)=1) = P(s(X) \leq t \mid C=c_2,Y(\gamma)=1), \forall t, c_1, c_2.

We provide a debiasing technique that should be applied before applying the EOpp algorithm in the presence of position bias. The function debiasPositiveLabelScores() (see positionBiasUtils) removes the effect of the position bias from the training data and the output can be directly used by eOppTransformation() (see EOppUtils) for learning the EOpp transformation.
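A common way to correct for position bias is inverse-propensity weighting: a positive response observed at position j is weighted by 1/wj, where wj is the probability that a relevant item shown at position j receives a positive response. The sketch below illustrates this idea with a weighted empirical CDF of positive-label scores. It is an assumption about the general approach, not a description of how debiasPositiveLabelScores() is implemented; the helper names are hypothetical and the bias curve wj = 1/log2(1+j) is borrowed from the example in this document.

```python
import math


def inverse_propensity_weight(position):
    """Weight 1 / w_j for the assumed bias curve w_j = 1 / log2(1 + j)."""
    return math.log2(1 + position)


def weighted_positive_cdf(scores, positions, labels, t):
    """Weighted empirical CDF at t of the scores of observed positives,
    with each positive weighted by 1 / w_j for its display position j."""
    below = 0.0
    total = 0.0
    for s, j, y in zip(scores, positions, labels):
        if y != 1:
            continue
        weight = inverse_propensity_weight(j)
        total += weight
        if s <= t:
            below += weight
    if total == 0.0:
        raise ValueError("no positive-label samples")
    return below / total


# Hypothetical observed positives at positions 1, 3 and 7.
scores = [0.9, 0.5, 0.3]
positions = [1, 3, 7]
labels = [1, 1, 1]
cdf_at_04 = weighted_positive_cdf(scores, positions, labels, t=0.4)
```

Without weighting, the item at position 7 would count for one third of the positives; with weights 1, 2 and 3 it counts for one half, reflecting that positives at low-visibility positions are under-observed.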

Example

We illustrate the position bias adjusted EOpp algorithm in EOppUtilsTest.

Data Generation (as in Nandy et al. (2021)): We generate a population of p = 50,000 items, where each item consists of an id i, a characteristic Ci in {0, 1}, a label at position 1 Yi(1), and a relevance Ri. We independently generate the Ci's from a Bernoulli(0.6) distribution. The conditional distribution of Yi(1) given Ci = 0 is Bernoulli(0.4), and the conditional distribution of Yi(1) given Ci = 1 is Bernoulli(0.5). Finally, Ri | (Ci, Yi(1)) is generated from Gaussian(0.6 * Yi(1) + 2 * Ci, 0.5) + (1 - Ci) * Uniform[0, 1 + Yi(1)].
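The population described above can be generated along the following lines. This is a sketch rather than the code used in EOppUtilsTest: the function name `generate_population`, the dictionary layout, and the seed are hypothetical, and we assume that the second argument of Gaussian(., 0.5) is a standard deviation, which the text does not state.

```python
import random


def generate_population(p, rng):
    """Generate p items with id, characteristic c, label at position 1 (y1),
    and relevance, following the distributions described in the text."""
    population = []
    for i in range(p):
        c = 1 if rng.random() < 0.6 else 0                 # C_i ~ Bernoulli(0.6)
        positive_rate = 0.5 if c == 1 else 0.4             # P(Y_i(1) = 1 | C_i)
        y1 = 1 if rng.random() < positive_rate else 0
        # Assumption: 0.5 is the standard deviation of the Gaussian term.
        relevance = rng.gauss(0.6 * y1 + 2 * c, 0.5)
        if c == 0:
            # (1 - C_i) * Uniform[0, 1 + Y_i(1)] only contributes when C_i = 0.
            relevance += rng.uniform(0.0, 1.0 + y1)
        population.append({"id": i, "c": c, "y1": y1, "relevance": relevance})
    return population


population = generate_population(50_000, random.Random(2021))
share_c1 = sum(item["c"] for item in population) / len(population)
```

Passing an explicit `random.Random` instance keeps the simulation reproducible without touching the global random state.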

We consider a recommendation system with K = 50 slots. For each session, we randomly select 50 items from the population and assign a score si = Ri + Gaussian(0, 0.1) to each selected item i = 1,..., 50. The selected items are then ranked according to si (in descending order) and assigned a position according to rank(i). Finally, the item i at position j gets the observed response Y(j) = Y(1) * Bernoulli(wj) with position bias wj = 1 / log2(1+j).
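A single session can be simulated as sketched below. The function name `simulate_session` and the toy population are hypothetical and stand in for the generated population; as before, we assume that 0.1 in Gaussian(0, 0.1) is a standard deviation.

```python
import math
import random


def simulate_session(population, rng, k=50):
    """Sample k items, score and rank them, and draw position-biased responses."""
    selected = rng.sample(population, k)
    # s_i = R_i + Gaussian(0, 0.1); assumption: 0.1 is the standard deviation.
    scored = [(item["relevance"] + rng.gauss(0.0, 0.1), item) for item in selected]
    scored.sort(key=lambda pair: pair[0], reverse=True)    # descending by score
    session = []
    for position, (score, item) in enumerate(scored, start=1):
        w = 1.0 / math.log2(1 + position)                  # position bias w_j
        examined = 1 if rng.random() < w else 0            # Bernoulli(w_j)
        session.append({
            "id": item["id"],
            "c": item["c"],
            "y1": item["y1"],
            "score": score,
            "position": position,
            "observed": item["y1"] * examined,             # Y(j) = Y(1) * Bernoulli(w_j)
        })
    return session


# Toy population (hypothetical) standing in for the generated one.
rng = random.Random(7)
toy_population = [
    {"id": i, "c": i % 2, "y1": (i // 2) % 2, "relevance": rng.random()}
    for i in range(200)
]
session = simulate_session(toy_population, rng)
```

Since w1 = 1, the item in the first slot always shows its label Y(1); lower slots lose an increasing share of their positive responses.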

Validation: We learn the EOpp transformation using training data containing 20,000 i.i.d. sessions (i.e. 20,000 * 50 = 1M samples). For testing, we apply the transformation to validation data containing 20,000 i.i.d. sessions. To apply the effect of position bias to the transformed validation data, we multiply the labels Y(1) by Bernoulli(1/log2(1 + position)) random numbers, matching the position bias wj defined above, where the position corresponds to the rank of an item according to the transformed score. We validate the EOpp transformation by computing the 2nd Wasserstein distance between the transformed positive-label score distributions corresponding to C=0 and C=1. Additionally, we validate that the distribution of the transformed scores equals the distribution of the scores before the transformation.
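For one-dimensional samples, the 2nd Wasserstein distance can be approximated from the empirical quantile functions, since W2^2 equals the integral over u in (0, 1) of the squared difference between the two quantile functions. The sketch below is one way to compute it and is not necessarily how the test does; the function name `wasserstein2` and the `grid_size` parameter are hypothetical.

```python
import math


def wasserstein2(sample_a, sample_b, grid_size=1000):
    """Approximate the 2nd Wasserstein distance between two 1-D samples by
    averaging squared quantile differences over a midpoint grid on (0, 1)."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a = sorted(sample_a)
    b = sorted(sample_b)
    total = 0.0
    for k in range(grid_size):
        u = (k + 0.5) / grid_size                       # midpoint of the k-th cell
        qa = a[min(int(u * len(a)), len(a) - 1)]        # empirical quantile of a at u
        qb = b[min(int(u * len(b)), len(b) - 1)]        # empirical quantile of b at u
        total += (qa - qb) ** 2
    return math.sqrt(total / grid_size)


# Shifting a sample by 1 moves every quantile by 1, so the distance is 1.
distance = wasserstein2([0.0, 1.0, 2.0], [1.0, 2.0, 3.0])
```

Applied to the transformed positive-label scores of the groups C=0 and C=1, a value close to zero indicates that the two distributions nearly coincide.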