Add support for an incremental alignment model that models fertility #4

Open
ddaspit opened this issue Apr 4, 2017 · 4 comments

ddaspit commented Apr 4, 2017

Currently, Thot supports the IBM1, IBM2, and HMM models, none of which models fertility. Are there any plans to support IBM3, IBM4, or IBM5, or one of the HMM extensions that models or approximates fertility? IBM Models 3-5 are obviously complex and might be difficult to support, but some of the fertility extensions to the HMM seem simpler and would improve accuracy.
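For readers less familiar with the term: in the IBM models, the fertility of a source word is the number of target words aligned to it. The minimal Python sketch below is illustrative only. The `fertilities` helper is hypothetical and not part of Thot. It counts fertilities from an alignment in which every target word points to exactly one source position, with 0 standing for NULL. IBM1, IBM2, and the HMM produce alignments of this shape, but they have no parameters over these counts. Fertility-based models add a distribution over them.

```python
from collections import Counter


def fertilities(alignment, src_len):
    """Count how many target words each source position generates.

    alignment[j] is the source position (1..src_len, or 0 for NULL)
    that target word j is aligned to, so every target word has exactly
    one source word, as in IBM1/IBM2/HMM-style alignments.
    Returns a list indexed by source position 0..src_len.
    """
    counts = Counter(alignment)
    # Counter returns 0 for positions that no target word aligns to.
    return [counts[i] for i in range(src_len + 1)]


# Hypothetical example: source "das Hausboot" (1=das, 2=Hausboot),
# target "the house boat"; "house" and "boat" both align to "Hausboot".
print(fertilities([1, 2, 2], src_len=2))  # [0, 1, 2]
```

Here "Hausboot" has fertility 2. A fertility-aware model could learn that such a count is plausible for compounds and implausible for most other words.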

Owner

daormar commented Apr 5, 2017

Hi Damien,

As you point out, IBM Models 3-5 are complex and difficult to introduce. However, the main reason they have not yet been included in the toolkit is that they do not seem to produce significant gains in translation quality over HMM-based models (a better alignment error rate does not always result in improved translation quality).

I agree that the way to go in this case would probably be to incorporate fertility extensions to the HMM (I assume you are referring to the 2002 paper by Toutanova et al.). The problem is that this is still not easy to incorporate, and we are currently focusing on improving other aspects of the toolkit.

On the other hand, I was wondering whether you need such models to generate alignments or only for the potential improvements they would bring to translation quality.

Author

ddaspit commented Apr 6, 2017

First off, I just want to say that I appreciate the work that you have done on Thot. The incremental training and interactive machine translation features are invaluable.

I certainly understand that IBM Models 3-5 do not greatly improve translation quality and that the main purpose of Thot is machine translation. We are using Thot for machine translation, but, as you guessed, we are also using the single-word alignment models to align texts for various purposes. That is why we are interested in adding fertility to the HMM model. Thot's ability to train the word alignment models incrementally is important for our project, so GIZA++ isn't really an option. The HMM model is working well for us. We were just looking into possible ways of improving alignment quality, and modeling fertility seemed to be the most promising route, so I wanted to find out whether there were any plans to add it.

Owner

daormar commented Apr 7, 2017

First of all, thanks very much for your interest in the tool and for your response.

I think that HMM alignment models with fertility would be a very interesting feature to incorporate into the toolkit. Although we currently don't have time to work on it, we have lately been interested in bringing new collaborators into the project, and such a feature could be one candidate for future development.

Author

ddaspit commented Apr 8, 2017

That sounds good to me. I might be able to work on it at some point and submit it as a pull request. Thank you for keeping it as a candidate for future development.
