
Implementation for word2vec using skip-gram architecture and negative sampling.




word2vec algorithm using skip-gram architecture

Why use an embedding layer instead of one-hot encoding?

If you use one-hot encoding, the input to the first hidden layer is almost entirely zeros, so most of the matrix multiplication is wasted. To avoid this, an embedding layer is used as a lookup table.

The words are encoded as integers that are used as indices into the lookup table. For example, "go" might be encoded as 148, which means it corresponds to the 148th row of the embedding matrix.

  • The rows of the matrix correspond to the words in the vocabulary.
  • The columns of the matrix correspond to the embedding (hidden) dimension, as sketched below.
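
A minimal PyTorch sketch of this lookup (the vocabulary size and embedding dimension here are made-up values, not the ones used in this repository):

```python
import torch
import torch.nn as nn

# Hypothetical sizes: 5,000-word vocabulary, 300-dimensional embeddings.
vocab_size, embed_dim = 5000, 300
embedding = nn.Embedding(vocab_size, embed_dim)

# "go" encoded as the integer 148 -> row 148 of the embedding matrix.
token_ids = torch.LongTensor([148])
vector = embedding(token_ids)       # shape: (1, 300)
print(embedding.weight.shape)       # torch.Size([5000, 300]) -> rows = words, columns = embedding dim
```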

word2vec

Word2Vec uses the embedding layer to find vector representations of words that carry semantic meaning (meaning in language or logic).

Loading data and pre-processing:

Here I'm fixing up the text to make training easier. This comes from the utils.py file. The preprocess function does a few things (a rough sketch follows the list):

  • It converts any punctuation into tokens, so a period is changed to a `<PERIOD>` token. In this dataset there aren't any periods, but it will help in other NLP problems.
  • It removes all words that show up five or fewer times in the dataset. This will greatly reduce issues due to noise in the data and improve the quality of the vector representations.
  • It returns a list of words in the text.
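
A rough sketch of what such a preprocess function could look like (the token names and implementation details are assumptions, not the exact code in utils.py):

```python
from collections import Counter

def preprocess(text, min_count=5):
    # Replace punctuation with word-like tokens so it survives whitespace splitting.
    text = text.lower()
    text = text.replace('.', ' <PERIOD> ')
    text = text.replace(',', ' <COMMA> ')
    text = text.replace('"', ' <QUOTATION_MARK> ')

    words = text.split()

    # Drop words that appear min_count times or fewer to reduce noise.
    counts = Counter(words)
    return [word for word in words if counts[word] > min_count]
```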

Then, to reduce the noise in our data and to get better training and representations, subsampling is used.
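
Subsampling follows Mikolov et al.: a word w is discarded with probability 1 - sqrt(t / f(w)), where f(w) is the word's frequency in the corpus and t is a small threshold. A sketch of that idea (the threshold value is an assumption):

```python
import random
from collections import Counter

def subsample(int_words, threshold=1e-5):
    # Discard frequent words with probability 1 - sqrt(threshold / frequency),
    # following the subsampling scheme from Mikolov et al. (2013).
    total = len(int_words)
    counts = Counter(int_words)
    freqs = {word: count / total for word, count in counts.items()}
    p_drop = {word: 1 - (threshold / freqs[word]) ** 0.5 for word in counts}
    return [word for word in int_words if random.random() > p_drop[word]]
```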

Do we need all layers of the network?

It depends on the application, but for this one we only need the embedding layer (as a lookup table).

The image below shows a network with an input, an embedding, and an output layer. We only need the embedding layer.

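As an illustration (a sketch, not the exact model in the notebooks), a plain skip-gram model is just an embedding followed by an output layer over the vocabulary, and after training only the embedding weights are kept:

```python
import torch.nn as nn

class SkipGram(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # lookup table we keep
        self.output = nn.Linear(embed_dim, vocab_size)     # predicts context words
        self.log_softmax = nn.LogSoftmax(dim=1)

    def forward(self, x):
        return self.log_softmax(self.output(self.embed(x)))

# After training, the word vectors are simply the rows of the embedding matrix:
# embeddings = model.embed.weight.data
```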

Validation

Validation is done with cosine similarity between embedding vectors.
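
For example, one can pick a few validation words and rank every other word by the cosine similarity of its embedding vector (a sketch, assuming a trained `embedding` layer):

```python
import torch

def nearest_words(embedding, word_id, top_k=5):
    # Cosine similarity between one word vector and every row of the embedding matrix.
    vectors = embedding.weight.data                       # (vocab_size, embed_dim)
    query = vectors[word_id]
    sims = torch.cosine_similarity(query.unsqueeze(0), vectors, dim=1)
    return sims.topk(top_k + 1).indices[1:]               # skip the word itself
```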

What is Negative Sampling?

For every example we give the network, we train it using the output from the softmax layer. That means for each input, we're making very small changes to millions of weights even though we only have one true example. This makes training the network very inefficient. We can approximate the loss from the softmax layer by only updating a small subset of all the weights at once. We'll update the weights for the correct example, but only a small number of incorrect, or noise, examples. This is called "negative sampling".

There are two modifications we need to make. First, since we're not taking the softmax output over all the words, we're really only concerned with one output word at a time. Similar to how we use an embedding table to map the input word to the hidden layer, we can now use another embedding table to map the hidden layer to the output word. Now we have two embedding layers, one for input words and one for output words. Secondly, we use a modified loss function where we only care about the true example and a small subset of noise examples.
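
A condensed sketch of that idea: two embedding tables, and a loss computed only over the true pair plus a few sampled noise words (this is an illustration, not the exact notebook code):

```python
import torch
import torch.nn as nn

class NegativeSamplingLoss(nn.Module):
    def forward(self, in_vectors, out_vectors, noise_vectors):
        # in_vectors:    (batch, embed_dim)           input-word embeddings
        # out_vectors:   (batch, embed_dim)           true context-word embeddings
        # noise_vectors: (batch, n_noise, embed_dim)  sampled noise-word embeddings
        batch, embed_dim = in_vectors.shape
        in_vectors = in_vectors.view(batch, embed_dim, 1)
        out_vectors = out_vectors.view(batch, 1, embed_dim)

        # Log-sigmoid of the true pair's score...
        true_loss = torch.bmm(out_vectors, in_vectors).sigmoid().log().squeeze()
        # ...plus log-sigmoid of the negated noise scores, summed over the noise words.
        noise_loss = torch.bmm(noise_vectors.neg(), in_vectors).sigmoid().log()
        noise_loss = noise_loss.squeeze().sum(1)

        return -(true_loss + noise_loss).mean()
```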

Visualization for the results

Results for skip_GRams_Exercise:

Results for Negative_Sampling_Exercise:

Reference:

This work is based on this project: here
