Phonorm

Phonorm is an exploratory project in which we apply a machine translation approach to the problem of phonetic normalization. The need for such a model arose from the conversations we observed in ChitChat, our chatbot developed at the Leiden University Center for Innovation, where much of the text is written the way it is spoken. Existing phonetic algorithms, such as Soundex, are too aggressive and do not work well for our use case.
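To make "too aggressive" concrete, here is a minimal sketch of the classic American Soundex encoding in Python. It is an illustration only and not code from this repository; the function name `soundex` and the example words are our own choices. Soundex keeps the first letter and at most three consonant codes, so short, unrelated words easily end up with the same key.

```python
# Minimal sketch of American Soundex (illustration, not part of phonorm).
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(word):
    """Return the 4-character Soundex key for a word ('' if it has no letters)."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    encoded = first.upper()
    previous = CODES.get(first, "")
    for ch in letters[1:]:
        if ch in "hw":
            # 'h' and 'w' are skipped and do not separate equal codes.
            continue
        code = CODES.get(ch, "")  # vowels (and 'y') have no code
        if code and code != previous:
            encoded += code
        previous = code  # a vowel resets this, so a repeated code counts again
    return (encoded + "000")[:4]  # pad with zeros, truncate to 4 characters


if __name__ == "__main__":
    for word in ("to", "two", "the", "there", "three"):
        print(word, soundex(word))
```

In this sketch "to", "two" and "the" all map to T000, and "there" and "three" both map to T600. That is fine for matching surnames, which is what Soundex was designed for, but it loses too much information when the goal is to recover the intended word from chat text.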

You can find our writeup of the project here. Comments are welcome: leave them in the issues section or send them to jasperginn[at]gmail.com.

This repository contains the following files:

+-- data
  | +-- extra
      - Contains Wikipedia dataset with commonly misspelled words
  | +-- preprocessed
      - Contains preprocessed datasets
  | +-- raw
      - Contains raw data (not preprocessed)
+-- docs
  - Contains presentation and writeup
+-- modeling
  - Contains Jupyter notebooks used for modeling
+-- models
  - Contains pre-trained models
+-- phonorm
  - Contains utilities and code for modeling
+-- preprocessing
  - Contains utilities and code for preprocessing data
+-- .gitignore
+-- README.md
+-- requirements.txt

A note on training the model

If you want to retrain the model using the data in this repository, be aware that training will be slow on CPUs. You should consider using a GPU.

Setting up

At a minimum, you need a Python 3 installation. However, we recommend using Anaconda. The steps below assume that you are using Anaconda for this project.

Create a new environment called 'phonorm'

conda create -n phonorm python=3.6 anaconda

Activate the environment

source activate phonorm

On Windows:

conda activate phonorm

Install dependencies

conda install --yes --file requirements.txt

(optional) Install 'pywiktionary' from git

pip install git+https://github.com/abuccts/wikt2pron.git

(optional) Install tensorflow-gpu if you are using a GPU

conda install tensorflow-gpu

At this point, your environment is ready to be used.
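If you installed tensorflow-gpu, a quick way to check whether TensorFlow can see a GPU is sketched below. This is not part of the repository: the helper name `gpu_available` is hypothetical, and the script assumes only that TensorFlow may or may not be importable in the active environment. It tries the newer `tf.config.list_physical_devices` call first and falls back to `tf.test.is_gpu_available` on older releases.

```python
# Hypothetical helper (not part of phonorm): check GPU visibility in TensorFlow.
from importlib.util import find_spec


def gpu_available():
    """Return True/False for GPU visibility, or None if TensorFlow is missing."""
    if find_spec("tensorflow") is None:
        return None
    import tensorflow as tf

    config = getattr(tf, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        # Available in newer TensorFlow releases.
        return len(config.list_physical_devices("GPU")) > 0
    # Older releases: fall back to the legacy check.
    return bool(tf.test.is_gpu_available())


if __name__ == "__main__":
    status = gpu_available()
    if status is None:
        print("TensorFlow is not installed in this environment.")
    elif status:
        print("TensorFlow can see at least one GPU.")
    else:
        print("TensorFlow is installed but sees no GPU; training will run on the CPU.")
```

Run it inside the activated phonorm environment before starting a long training run.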

Using phonorm

If you want to train your own models, you should check out the modeling folder for examples.

If you want to use the pre-trained models, please see the examples folder.
