
Detection metrics score doesn't accurately point out quality or privacy. Should the formula be changed? #375

npatki opened this issue Jun 26, 2023 · 0 comments
Labels
question General question about the software


Problem Description

The detection metrics for single-table data and sequential data both compute the ROC AUC and return 1 - AUC as the final score. The resulting score is hard to interpret.

  • An extreme score (close to 0 or close to 1) indicates that the synthetic and real data are noticeably different -- enough for a model to tell them apart. This indicates lower quality or, alternatively, higher privacy.
  • A middle score (close to 0.5) indicates that the synthetic and real data are similar -- enough to fool the model, since the model does no better than random guessing. This indicates higher quality or, alternatively, lower privacy.

This is an odd way to interpret the score. Usually, we want 1 to represent success and 0 to represent failure.
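To make the current convention concrete, here is a minimal sketch (not the actual SDMetrics implementation) of how such a detection score comes together, assuming we already have a discriminator's predicted probabilities that each row is real. AUC is computed via the Mann-Whitney rank statistic, which is equivalent to ROC AUC.

```python
def auc(labels, probs):
    """ROC AUC: the probability that a randomly chosen real row (label 1)
    is ranked above a randomly chosen synthetic row (label 0)."""
    pos = [p for label, p in zip(labels, probs) if label == 1]
    neg = [p for label, p in zip(labels, probs) if label == 0]
    # Count pairwise wins; ties count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def current_score(labels, probs):
    # Current convention: return 1 - AUC as the final detection score.
    return 1 - auc(labels, probs)

# A perfect discriminator (AUC = 1) yields a score of 0,
# while an uninformative one (AUC = 0.5) yields a score of 0.5.
print(current_score([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1]))  # perfect separation
print(current_score([1, 0], [0.5, 0.5]))                   # random guessing
```

Note that under this convention neither endpoint of the [0, 1] range cleanly means "success" or "failure", which is the interpretability problem described above.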

Proposed Changes

Instead of returning 1 - AUC, perhaps we could use a different formula, such as:

$score = | AUC - 0.5 | \times 2$

This would yield a score that is geared towards privacy:

  • 0 if the AUC is close to 0.5, which means lower privacy
  • 1 if the AUC is close to an extreme (0 or 1), which means higher privacy
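The proposed rescaling is a one-liner; the sketch below shows how it folds both extremes of the AUC range onto 1 and maps an uninformative discriminator onto 0:

```python
def proposed_score(auc):
    """Proposed privacy-oriented score: |AUC - 0.5| * 2.

    0 means the discriminator is no better than random (data are
    indistinguishable -> lower privacy); 1 means the discriminator
    separates real from synthetic perfectly (higher privacy)."""
    return abs(auc - 0.5) * 2

print(proposed_score(0.5))  # indistinguishable data
print(proposed_score(1.0))  # perfectly separable data
print(proposed_score(0.0))  # also perfectly separable (inverted labels)
```

One design consequence worth noting: because the absolute value folds AUC = 0 and AUC = 1 onto the same score, the metric deliberately treats "the classifier always inverts the labels" the same as "the classifier is always right".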
@npatki npatki added the question General question about the software label Jun 26, 2023