ELMo: Deep Contextualized Word Representations

Introduction

Static word embedding algorithms like Word2Vec and GloVe assign a single representation to each word, ignoring contextual information. ELMo, a contextualized embedding model, addresses this by capturing word meaning in context using stacked Bi-LSTM layers. This README outlines the implementation and training of an ELMo architecture from scratch in PyTorch.

Implementation and Training

Architecture

The ELMo architecture consists of stacked Bi-LSTM layers that generate contextualized word embeddings. The weights (λs) used to combine the word representations from the different layers are trainable.
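A minimal sketch of such an encoder, assuming two Bi-LSTM layers and an embedding size chosen so that every layer produces representations of the same width (class and parameter names are illustrative, not this repository's exact API):

```python
import torch.nn as nn

class ELMo(nn.Module):
    """Two stacked Bi-LSTM layers over a token embedding layer.
    The embedding size equals 2*hidden_dim so that all three layers
    (embeddings, Bi-LSTM 1, Bi-LSTM 2) share one representation size
    and can later be combined with scalar weights (lambdas)."""

    def __init__(self, vocab_size, hidden_dim=128):
        super().__init__()
        repr_dim = 2 * hidden_dim
        self.embedding = nn.Embedding(vocab_size, repr_dim)
        self.bilstm1 = nn.LSTM(repr_dim, hidden_dim,
                               batch_first=True, bidirectional=True)
        self.bilstm2 = nn.LSTM(repr_dim, hidden_dim,
                               batch_first=True, bidirectional=True)

    def forward(self, token_ids):
        e = self.embedding(token_ids)   # (batch, seq_len, repr_dim)
        h1, _ = self.bilstm1(e)         # (batch, seq_len, repr_dim)
        h2, _ = self.bilstm2(h1)        # (batch, seq_len, repr_dim)
        return e, h1, h2                # one representation per layer
```

Returning the per-layer outputs separately makes it straightforward to combine them later with the trained λs.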

Model Pre-training

ELMo embeddings are learned through bidirectional language modeling on the given dataset's train split.
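A hedged sketch of a bidirectional language-modeling objective, building on the `ELMo` class sketched above: the forward half of the top layer predicts the next token and the backward half predicts the previous token. The head names and sizes are assumptions for illustration and may differ from the actual training code:

```python
import torch.nn as nn

# Illustrative sizes; reuses the ELMo encoder sketched above.
vocab_size, hidden_dim = 20000, 128
model = ELMo(vocab_size, hidden_dim=hidden_dim)
fwd_head = nn.Linear(hidden_dim, vocab_size)   # predicts the next token
bwd_head = nn.Linear(hidden_dim, vocab_size)   # predicts the previous token
criterion = nn.CrossEntropyLoss()

def bilm_loss(token_ids):
    _, _, h2 = model(token_ids)            # (batch, seq_len, 2*hidden_dim)
    h_fwd, h_bwd = h2.chunk(2, dim=-1)     # split forward / backward states
    # Forward LM: the state at position t predicts token t+1.
    fwd_logits = fwd_head(h_fwd[:, :-1])
    fwd_loss = criterion(fwd_logits.reshape(-1, vocab_size),
                         token_ids[:, 1:].reshape(-1))
    # Backward LM: the state at position t predicts token t-1.
    bwd_logits = bwd_head(h_bwd[:, 1:])
    bwd_loss = criterion(bwd_logits.reshape(-1, vocab_size),
                         token_ids[:, :-1].reshape(-1))
    return fwd_loss + bwd_loss
```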

Downstream Task

Trained the ELMo architecture on a 4-way classification task using the AG News Classification Dataset.
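A sketch of what such a downstream classifier could look like, assuming the three layer outputs are mixed with softmax-normalised λs, mean-pooled over the sentence, and passed to a linear layer; the exact head used in this repository may differ:

```python
import torch
import torch.nn as nn

class NewsClassifier(nn.Module):
    """Mixes the three ELMo layer outputs with softmax-normalised lambdas,
    mean-pools over the sentence, and applies a linear classifier."""

    def __init__(self, elmo, repr_dim, num_classes=4):
        super().__init__()
        self.elmo = elmo                              # pre-trained encoder
        self.lambdas = nn.Parameter(torch.zeros(3))   # one weight per layer
        self.classifier = nn.Linear(repr_dim, num_classes)

    def forward(self, token_ids):
        e, h1, h2 = self.elmo(token_ids)              # each (B, T, repr_dim)
        w = torch.softmax(self.lambdas, dim=0)
        mixed = w[0] * e + w[1] * h1 + w[2] * h2      # weighted layer sum
        pooled = mixed.mean(dim=1)                    # average over tokens
        return self.classifier(pooled)                # (B, num_classes)
```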

Corpus

Trained the model on the provided News Classification Dataset (the same dataset used for the other word-embedding methods; see my Word_Vectorization repository for more details).

Hyperparameter Tuning

Trainable λs

Trained the λs and used the best values found for combining word representations across the different layers.

Frozen λs

Randomly initialized and froze the λs.

Learnable Function

Learned a function to combine word representations across layers instead of a fixed weighted sum (all three settings are sketched below).
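The three settings above could be sketched as follows; the normalisation of the frozen λs and the use of a single linear layer as the learnable combination function are illustrative assumptions:

```python
import torch
import torch.nn as nn

repr_dim = 256  # per-layer representation size (2 * hidden_dim above)

# 1. Trainable lambdas: scalar layer weights learned with the downstream task.
trainable_lambdas = nn.Parameter(torch.zeros(3))

# 2. Frozen lambdas: randomly initialised once and never updated.
frozen_lambdas = torch.rand(3)
frozen_lambdas = frozen_lambdas / frozen_lambdas.sum()

# 3. Learnable function: a module that learns how to mix the layers,
#    here a single linear layer over the concatenated representations.
combine_fn = nn.Linear(3 * repr_dim, repr_dim)

def combine(e, h1, h2, mode="trainable"):
    if mode == "trainable":
        w = torch.softmax(trainable_lambdas, dim=0)
        return w[0] * e + w[1] * h1 + w[2] * h2
    if mode == "frozen":
        return (frozen_lambdas[0] * e + frozen_lambdas[1] * h1
                + frozen_lambdas[2] * h2)
    return combine_fn(torch.cat([e, h1, h2], dim=-1))  # "function" mode
```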

Analysis

Comprehensive analysis of ELMo's performance in pre-training and on the downstream task, compared against SVD and Word2Vec embeddings. Performance metrics include accuracy, F1 score, precision, recall, and confusion matrices for the different settings.
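These metrics can be computed with scikit-learn, for example (the label values below are placeholders):

```python
from sklearn.metrics import (accuracy_score, precision_recall_fscore_support,
                             confusion_matrix)

# y_true / y_pred are lists of class indices produced by the classifier.
y_true = [0, 2, 1, 3, 2]
y_pred = [0, 2, 1, 1, 2]

accuracy = accuracy_score(y_true, y_pred)
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="macro", zero_division=0)
cm = confusion_matrix(y_true, y_pred, labels=[0, 1, 2, 3])
print(accuracy, precision, recall, f1)
print(cm)
```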

Loading Models

To load any saved model (`.pt` file):

`data = torch.load("<filename>")`

Note:

  • While pre-training ELMo, only the first 10,000 sentences of train.csv were used.
  • The downstream classification task was also trained on only the first 10,000 sentences from the train split.
