name-probability

This repo implements the disambiguation methodology outlined in "How Unique and Traceable are Usernames?", which the paper uses to link users across platforms. While the paper focuses on usernames, I have typically used the resulting name probability as an additional feature in record linkage tasks, for example linking campaign contributions to employment data.

Usage

>>> from NameProbability import NameMatcher
>>> name_list_src = '#LOCATION OF NAME LIST FILE'  # or use sample_names.csv in the data directory
>>> # For a custom name list, provide a text file in which each row contains one person's name.
>>> # Only the "first last" and "last, first" name formats have been tested so far.
>>> nameprob = NameMatcher(name_list_location=name_list_src, last_comma_first=True)
>>> nameprob.probSamePerson('john smith', 'john r smith')
0.008288431595531668
>>> nameprob.probSamePerson('zubin jelveh', 'zubin r jelveh')
0.999999999999234634
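The comments above note that only the "first last" and "last, first" formats have been tested. When source data mixes the two, one option is to normalize every name to a single format before comparing names. The sketch below assumes a comma always separates the last name from the rest of the name; `to_first_last` is a hypothetical helper, not part of NameProbability, and it says nothing about how NameMatcher parses names internally.

```python
def to_first_last(name: str) -> str:
    """Hypothetical helper: turn 'last, first [middle]' into 'first [middle] last'.

    Names without a comma are assumed to already be in 'first last' order;
    they are only lowercased and whitespace-normalized.
    """
    # Collapse repeated whitespace and lowercase, matching the examples above.
    name = " ".join(name.split()).lower()
    if "," in name:
        # Assumption: everything before the first comma is the last name.
        last, rest = name.split(",", 1)
        return f"{rest.strip()} {last.strip()}".strip()
    return name


if __name__ == "__main__":
    for raw in ["Smith, John", "smith, john r", "Zubin  Jelveh"]:
        print(repr(raw), "->", repr(to_first_last(raw)))
```

Normalizing up front keeps a single convention for every comparison, at the cost of mishandling names that legitimately contain commas (for example suffixes such as "jr").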

Installation

python setup.py install

Edit Operation Probability

To compute P(u_1 | u_2), the probability that a person uses name u_1 given that the same person uses name u_2, we need the probability of each edit operation that takes us from u_1 to u_2. The current implementation estimates these probabilities empirically: it takes a sample of 50,000 names and counts how often each type of edit operation occurs. This estimation step leaves room for improvement.
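A minimal sketch of the counting idea, assuming that a standard Levenshtein alignment defines the edit operations and that operations are independent, so P(u_1 | u_2) is approximated by the product of per-operation probabilities along one minimal edit script from u_1 to u_2. The functions `edit_ops`, `estimate_op_probs`, and `prob_edit_path`, the character-level operation labels, and the `unseen` floor for operations never observed in the sample are all illustrative assumptions, not the code in this repo.

```python
from collections import Counter
from math import prod


def edit_ops(src: str, dst: str) -> list:
    """Return one minimal Levenshtein edit script turning src into dst.

    Each operation is a (kind, from_char, to_char) tuple, where kind is one of
    'match', 'sub', 'del', 'ins' and '' marks the empty side of an ins/del.
    """
    n, m = len(src), len(dst)
    # dp[i][j] = edit distance between src[:i] and dst[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        dp[i][0] = i
    for j in range(m + 1):
        dp[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if src[i - 1] == dst[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,         # delete src[i-1]
                           dp[i][j - 1] + 1,         # insert dst[j-1]
                           dp[i - 1][j - 1] + cost)  # match or substitute
    # Walk back from dp[n][m], preferring diagonal moves, to recover the script.
    ops, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + (src[i - 1] != dst[j - 1]):
            kind = "match" if src[i - 1] == dst[j - 1] else "sub"
            ops.append((kind, src[i - 1], dst[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            ops.append(("del", src[i - 1], ""))
            i -= 1
        else:
            ops.append(("ins", "", dst[j - 1]))
            j -= 1
    return ops[::-1]


def estimate_op_probs(pairs) -> dict:
    """Relative frequency of each edit operation over a sample of (u_1, u_2) pairs."""
    counts = Counter()
    for u1, u2 in pairs:
        counts.update(edit_ops(u1, u2))
    total = sum(counts.values())
    return {op: c / total for op, c in counts.items()}


def prob_edit_path(u1: str, u2: str, op_probs: dict, unseen: float = 1e-6) -> float:
    """Approximate P(u_1 | u_2) as the product of operation probabilities along
    one minimal edit script from u_1 to u_2, assuming independent operations.
    Operations never seen in the sample get the (arbitrary) floor `unseen`."""
    return prod(op_probs.get(op, unseen) for op in edit_ops(u1, u2))


if __name__ == "__main__":
    # Toy pairs for illustration only; a real estimate would use a large sample.
    sample = [("john smith", "john r smith"), ("jon smith", "john smith"),
              ("mary jones", "mary jones")]
    probs = estimate_op_probs(sample)
    print(prob_edit_path("john smith", "john r smith", probs))
```

Counting (kind, from_char, to_char) triples keeps character-specific differences such as "i" versus "y" separate, at the cost of sparser counts; counting only the kind of operation would be a coarser alternative that needs less data.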

zjelveh/name-probability
