Using a recurrent neural network in TensorFlow to predict national origin by last name.

avikj/Name-NationalOrigin-Classifier

Name Origin Classifier

Humans are often able to accurately predict someone's national origin based on their last name, even if they have not seen it before. For example, one can predict that "Yang" is more likely to be a Chinese last name than a Japanese one, while "Kobayashi" seems characteristically Japanese.

I used a recurrent neural network with TensorFlow in Python to train a model to do this automatically. The model achieved 97% accuracy in classifying names as Chinese or Japanese, 87% accuracy in classifying names as Chinese, Japanese, or Vietnamese, and 79% accuracy in classifying names as Chinese, Japanese, Vietnamese, or Korean.
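As a rough illustration of the input side of such a model, the sketch below turns a last name into a fixed-length sequence of character indices, which is one common way to feed text to a recurrent network. The alphabet, the padding scheme, `MAX_LEN`, and the `encode_name` helper are all assumptions made for this example and are not taken from this repository's code.

```python
import string

# Assumption: names are reduced to lowercase ASCII letters.
# Index 0 is reserved for padding and for characters outside the alphabet.
ALPHABET = string.ascii_lowercase
CHAR_TO_INDEX = {ch: i + 1 for i, ch in enumerate(ALPHABET)}
MAX_LEN = 12  # hypothetical maximum name length


def encode_name(name, max_len=MAX_LEN):
    """Map a name to a fixed-length list of integer character indices."""
    # Truncate to max_len and look up each character (unknown -> 0).
    indices = [CHAR_TO_INDEX.get(ch, 0) for ch in name.lower()[:max_len]]
    # Right-pad with zeros so every name has the same length.
    return indices + [0] * (max_len - len(indices))


print(encode_name("Yang"))
print(encode_name("Kobayashi"))
```

Sequences like these could then be passed through an embedding or one-hot layer before the recurrent layer. Whether the original model pads, truncates, or handles variable lengths differently is not stated here.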

This drastic decrease in accuracy with the introduction of Vietnamese and Korean makes sense, and would likely also be seen among humans, because many Korean and Vietnamese names resemble Chinese names. Even within the dataset, some names appear under more than one origin: "Tien" is listed as both Chinese and Vietnamese, and "Wang" is listed as both Chinese and Korean.
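The effect of shared names can be made concrete: when the same string appears under two labels, a classifier that sees only the string cannot get both entries right. The sketch below finds such names in a toy sample built solely from the names mentioned in this README. It is not the real dataset, and the `ambiguous_names` helper is a hypothetical name chosen for illustration.

```python
from collections import defaultdict

# Toy sample using only names mentioned in this README; not the scraped dataset.
SAMPLE = [
    ("Yang", "Chinese"),
    ("Kobayashi", "Japanese"),
    ("Tien", "Chinese"),
    ("Tien", "Vietnamese"),
    ("Wang", "Chinese"),
    ("Wang", "Korean"),
]


def ambiguous_names(pairs):
    """Return names that are listed under more than one origin."""
    labels = defaultdict(set)
    for name, origin in pairs:
        # Compare case-insensitively so "Wang" and "wang" count as one name.
        labels[name.lower()].add(origin)
    return {
        name: sorted(origins)
        for name, origins in labels.items()
        if len(origins) > 1
    }


print(ambiguous_names(SAMPLE))
```

Running a check like this over the full dataset would give a sense of how many examples are inherently ambiguous, which bounds the accuracy any name-only model could reach.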

Name data was scraped from familyeducation.com.
