saphir-lab edited this page Feb 23, 2022 · 5 revisions

Welcome to the CSV-Anonymizer wiki

Introduction

A common problem for data scientists is data that contains personally identifiable information (PII) such as email addresses, customer IDs, or phone numbers. A simple solution is to remove these fields before sharing the data. However, the analysis may depend on them. For example, in an e-commerce context, customer IDs are needed to count how many customers bought each product. Instead of removing the PII, you can anonymize it with a hash function: identical inputs always map to the same hash, so records can still be grouped and counted without exposing the original values.

CSV Anonymizer lets you anonymize all or part of your data using various hash functions.
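The idea can be illustrated with a minimal sketch that uses only the Python standard library. This is not the CSV Anonymizer implementation: the function names `hash_value` and `anonymize_csv`, the `salt` parameter, and the sample column names are assumptions made for illustration.

```python
import csv
import hashlib
import io


def hash_value(value: str, algorithm: str = "sha256", salt: str = "") -> str:
    """Return the hex digest of salt + value (hypothetical helper)."""
    digest = hashlib.new(algorithm)
    digest.update((salt + value).encode("utf-8"))
    return digest.hexdigest()


def anonymize_csv(text: str, columns: list[str],
                  algorithm: str = "sha256", salt: str = "") -> str:
    """Hash the given columns of a CSV string and leave the others untouched."""
    reader = csv.DictReader(io.StringIO(text))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=reader.fieldnames, lineterminator="\n")
    writer.writeheader()
    for row in reader:
        for col in columns:
            # Identical inputs produce identical digests, so grouping still works.
            row[col] = hash_value(row[col], algorithm, salt)
        writer.writerow(row)
    return out.getvalue()


# Illustrative data: customer C001 appears twice.
sample = "customer_id,product\nC001,book\nC002,pen\nC001,lamp\n"
print(anonymize_csv(sample, ["customer_id"]))
```

In the output, both rows for C001 carry the same digest, so counting customers per product still works. Short, predictable identifiers can be recovered by hashing every candidate value, which is why the sketch accepts an optional secret salt.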

🛠 Configuration

The script reads its parameters from a file named 'settings.yaml', located in the same directory as the script. For details on the available options, refer to the setting.yaml page on this wiki.

🔢 Sample Outcome

See the Sample page on this wiki for the outcomes produced by various parameters.
