Skip to content

yeounoh/DQM

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

30 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Quality Metric (DQM) -- obsolete

Estimating the number of remaining errors (aka Data Quality Metric/DQM) in a dataset is an important problem. Previously (http://www.vldb.org/pvldb/vol10/p1094-chung.pdf), we have shown that some heuristic estimators can provide useful estimates to guide the data cleaning process (e.g., know when to stop cleaning). In this work, we hope to continue developing a new estimator with some correctness guarantees (e.g., upper/error bounds).

This is an old repository. I have moved this project to DQMflask (https://github.com/yeounoh/DQMflask).

DQM Test Cases

  • Compare DQM estimators
python dqm_test.py DQMTest.test_estimators

Amazon Mechanical Turks Expeirments

amt/ folder contains codes for AMT experiments. The experiments (data cleaning or error detection) were crowdsourced on AMT.

python amt/exp_restaurant_dataset.py

About

No description, website, or topics provided.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages