
Conceptual issue in character embeddings #40

Open
mraduldubey opened this issue Feb 25, 2019 · 4 comments

@mraduldubey
I have a conceptual doubt about the part where we obtain word-level representations from characters using the final outputs of the BiLSTM network. We initialize the character embeddings with xavier_initialization, which only ensures that the cells do not saturate. So how do these random embeddings come to capture any meaningful information? And how is this network trained? Is it unsupervised?

@guillaumegenthial (Owner)

Hi @mraduldubey,
You are right, the character embeddings are indeed initialized randomly. However, at training time the loss is backpropagated all the way back to them, so the character embeddings are updated as well (i.e. this is supervised learning).
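
To make this concrete: the character embedding table is just another trainable parameter of the model. Below is a minimal illustrative sketch in PyTorch (the actual repo uses TensorFlow; names such as CharWordEncoder, n_chars, char_dim and char_hidden are assumptions for the example, not values from the repo). The table is initialized with Xavier/Glorot, looked up per character, and fed through a character-level BiLSTM whose final states form the word's character-based representation; because the table requires gradients, the training loss updates it like any other weight.

```python
import torch
import torch.nn as nn

class CharWordEncoder(nn.Module):
    """Illustrative sketch: build a word representation from its characters.
    Not the repo's TensorFlow code; sizes below are made-up defaults."""

    def __init__(self, n_chars=100, char_dim=50, char_hidden=25):
        super().__init__()
        # The character embedding table is an ordinary trainable parameter.
        # Xavier/Glorot init only sets a sensible starting scale; backprop
        # then moves the values during training.
        self.char_emb = nn.Embedding(n_chars, char_dim)
        nn.init.xavier_uniform_(self.char_emb.weight)
        self.char_lstm = nn.LSTM(char_dim, char_hidden,
                                 bidirectional=True, batch_first=True)

    def forward(self, char_ids):
        # char_ids: (n_words, max_word_len) integer character indices
        embedded = self.char_emb(char_ids)          # (n_words, L, char_dim)
        _, (h_n, _) = self.char_lstm(embedded)      # h_n: (2, n_words, char_hidden)
        # Concatenate the final forward and backward states: one vector per word.
        return torch.cat([h_n[0], h_n[1]], dim=-1)  # (n_words, 2 * char_hidden)

encoder = CharWordEncoder()
print(encoder.char_emb.weight.requires_grad)  # True: updated by the training loss
```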

@mraduldubey (Author) commented Apr 17, 2019

Thanks @guillaumegenthial for the reply. In that case, would the ground truth be a vector representing the whole word? What exactly is the ground truth here?

@guillaumegenthial (Owner)

You train the network to predict the tags. It turns out that some of the network's parameters are the character embeddings, so these are trained to help the network predict the tags. So the ground truth is the tag, and the learned embeddings help predict this tag.
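
Continuing the illustrative sketch above (it reuses the hypothetical CharWordEncoder, and a plain softmax cross-entropy loss stands in for the CRF decoder discussed in the blog; NERTagger and all sizes are again made up for illustration): the only supervision is the gold tag sequence, and the gradient of that tag-level loss does reach the character embedding table.

```python
import torch
import torch.nn as nn

class NERTagger(nn.Module):
    """Sketch of the full pipeline: char encoder -> word-level BiLSTM -> tag scores."""

    def __init__(self, n_words=1000, n_chars=100, n_tags=9,
                 word_dim=100, char_dim=50, char_hidden=25, ctx_hidden=100):
        super().__init__()
        self.char_encoder = CharWordEncoder(n_chars, char_dim, char_hidden)
        self.word_emb = nn.Embedding(n_words, word_dim)
        self.ctx_lstm = nn.LSTM(word_dim + 2 * char_hidden, ctx_hidden,
                                bidirectional=True, batch_first=True)
        self.scorer = nn.Linear(2 * ctx_hidden, n_tags)

    def forward(self, word_ids, char_ids):
        # One sentence: word_ids (n_words,), char_ids (n_words, max_word_len)
        char_rep = self.char_encoder(char_ids)                 # (n_words, 2*char_hidden)
        word_rep = torch.cat([self.word_emb(word_ids), char_rep], dim=-1)
        ctx, _ = self.ctx_lstm(word_rep.unsqueeze(0))          # (1, n_words, 2*ctx_hidden)
        return self.scorer(ctx.squeeze(0))                     # (n_words, n_tags)

model = NERTagger()
word_ids = torch.randint(0, 1000, (6,))    # toy sentence of 6 words
char_ids = torch.randint(0, 100, (6, 12))  # up to 12 characters per word
gold_tags = torch.randint(0, 9, (6,))      # the ground truth: one tag per word

loss = nn.functional.cross_entropy(model(word_ids, char_ids), gold_tags)
loss.backward()
# The tag loss reaches the character embedding table:
print(model.char_encoder.char_emb.weight.grad.abs().sum() > 0)  # tensor(True)
```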

@mraduldubey (Author)

So you mean that the word representation network, the contextual word representation network and the decoder, though described separately in the blog, are trained simultaneously, with the tags as the ground truth, and backpropagation runs from the final layer all the way back to the word representation network?
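
In terms of the sketch above, that joint training is simply one optimizer over all of the model's parameters, so the character embeddings, the word embeddings, the contextual BiLSTM and the decoder all move from the same tag-level loss in a single step:

```python
# Continuing the sketch above: a single optimizer updates every stage at once.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

optimizer.zero_grad()
loss = nn.functional.cross_entropy(model(word_ids, char_ids), gold_tags)
loss.backward()   # gradients flow from the tag scores back to the char embeddings
optimizer.step()  # one update touches char embeddings, BiLSTMs and the tag scorer
```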
