GitHub - hellomasaya/word-alignment-models: IBM model 1

NLA Assignment 1 - Word Alignment

Data preprocessing:

Training the model:

A defaultdict is used as the translation probability table to reduce training time and memory. Each entry is a key-value pair of the form tef([hindi_word, english_word]) = translation_probability. Because entries are created only on access, only relevant word pairs are stored and looked at.
For each Hindi word in each Hindi sentence, pairs are formed with the words of the corresponding English translation of that sentence.
The EM algorithm is run for 16 epochs to train the model.
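
The training steps above can be sketched as follows. This is a minimal illustration, not the code from IBMmodel1.py: the function name `train_ibm1`, the uniform initialization, the direction of the probability (English word given Hindi word), and the toy sentence pairs are all assumptions. A tuple is used as the dictionary key because Python lists are not hashable.

```python
from collections import defaultdict


def train_ibm1(pairs, epochs=16):
    """Sketch of IBM Model 1 EM training (hypothetical helper).

    pairs: list of (hindi_tokens, english_tokens) sentence pairs.
    Returns tef, where tef[(hindi_word, english_word)] approximates
    the probability of english_word given hindi_word (assumed direction).
    """
    # Assumption: uniform start over the English vocabulary.
    en_vocab = {e for _, en in pairs for e in en}
    init = 1.0 / len(en_vocab)
    # Entries are created lazily, so only co-occurring pairs are ever stored.
    tef = defaultdict(lambda: init)

    for _ in range(epochs):
        count = defaultdict(float)  # expected count of each (hindi, english) pair
        total = defaultdict(float)  # expected count of each hindi word

        # E-step: distribute each English word over the Hindi words
        # of the same sentence pair, in proportion to the current tef.
        for hi, en in pairs:
            for e in en:
                z = sum(tef[(h, e)] for h in hi)
                for h in hi:
                    c = tef[(h, e)] / z
                    count[(h, e)] += c
                    total[h] += c

        # M-step: renormalize the expected counts per Hindi word.
        for (h, e), c in count.items():
            tef[(h, e)] = c / total[h]

    return tef


# Toy usage with transliterated tokens (made-up data, not from train.hi / train.en).
toy_pairs = [
    (["ghar"], ["house"]),
    (["bada", "ghar"], ["big", "house"]),
]
toy_tef = train_ibm1(toy_pairs, epochs=16)
```

On this toy corpus the table ends up preferring ("ghar", "house") over ("ghar", "big"), since "ghar" and "house" also co-occur in the one-word sentence pair. The table grows with the number of co-occurring pairs rather than with the product of the two vocabulary sizes, which is the saving the defaultdict is meant to provide.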

.ipynb_checkpoints
20171099.zip
Assignment 1 - Word Alignment.pdf
HMM_wa.pdf
IBM_one.ipynb
IBMmodel1-v3.ipynb
IBMmodel1.ipynb
IBMmodel1.py
IBMmodel1_0.ipynb
README.md
README.pdf
dev.en.txt
dev.hi
dev_translations.txt
dev_version1.txt
dev_version2.txt
test.en
test.hi
train.en
train.hi