
ilopezgazpio/DAM_STS

DAM_STS

Reimplementation of the Decomposable Attention Model (DAM) for STS

If you use this software for academic research, please cite the following paper:

@inproceedings{artetxe2018conll,
  author    = {Artetxe, Mikel  and  Labaka, Gorka  and Lopez-Gazpio, Inigo  and  Agirre, Eneko},
  title     = {Uncovering divergent linguistic information in word embeddings with lessons for intrinsic and extrinsic evaluation},
  booktitle = {Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018)},
  month     = {October},
  year      = {2018},
  address   = {Brussels, Belgium},
  publisher = {Association for Computational Linguistics}
}

Requirements

  • Python 2
  • Python 3
  • PyTorch (tested on 0.2)
  • The following Python modules: numpy, h5py and nltk.corpus.stopwords (optional)
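Before running the steps in Usage, it can help to confirm that the listed modules are importable. The snippet below is a minimal sketch and not part of this repository: the helper name `missing_modules` is hypothetical, it assumes Python 3 (for `importlib.util.find_spec`), and it checks only top-level package names (`torch` is the import name of PyTorch, and `nltk` stands in for `nltk.corpus.stopwords`).

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found on this system."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Module list taken from the requirements above; nltk is optional.
    required = ["numpy", "h5py", "torch"]
    optional = ["nltk"]
    print("Missing required modules:", missing_modules(required))
    print("Missing optional modules:", missing_modules(optional))
```

An empty list for the required modules suggests the environment is ready. The check does not verify installed versions.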

Usage

  1. Download and extract the STS Benchmark dataset from http://ixa2.si.ehu.es/stswiki/images/4/48/Stsbenchmark.tar.gz

wget http://ixa2.si.ehu.es/stswiki/images/4/48/Stsbenchmark.tar.gz
tar xvzf Stsbenchmark.tar.gz

  2. Process the dataset
python2 preprocess_datasets/process-STSBenchmark.py \
	--data_folder stsbenchmark \
	--out_folder stsbenchmark
  3. Download and extract the GloVe word embeddings into the stsbenchmark folder from http://nlp.stanford.edu/data/glove.840B.300d.zip
wget http://nlp.stanford.edu/data/glove.840B.300d.zip
unzip glove.840B.300d.zip
mv glove.840B.300d.txt embeddings.glove.txt
  4. Run the Evaluate.sh script
./Evaluate.sh script
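
To inspect the renamed embeddings file outside the provided scripts, a loader along these lines may be useful. This is a minimal sketch rather than the code this repository uses: the function name `load_text_embeddings` and its parameters are hypothetical, and it assumes the plain-text GloVe layout of one token followed by space-separated vector components per line. Because some tokens in large GloVe releases may themselves contain spaces, the last `dim` fields are taken as the vector.

```python
import io


def load_text_embeddings(path, dim=300, vocab=None):
    """Read 'token v1 ... v_dim' lines into a dict of token -> list of floats."""
    vectors = {}
    with io.open(path, encoding="utf-8") as handle:
        for line in handle:
            parts = line.rstrip("\n").split(" ")
            if len(parts) <= dim:
                continue  # skip blank or malformed lines
            # Take the last `dim` fields as the vector so that tokens
            # containing spaces are kept whole.
            token = " ".join(parts[:-dim])
            if vocab is not None and token not in vocab:
                continue  # keep only the words the dataset needs
            vectors[token] = [float(value) for value in parts[-dim:]]
    return vectors
```

For example, `load_text_embeddings("stsbenchmark/embeddings.glove.txt", vocab={"cat", "dog"})` would keep only those two words, assuming the file was placed in the stsbenchmark folder as described in step 3. Passing a vocabulary keeps memory use low, since the full 840B file is several gigabytes.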

Acknowledgements

The project is motivated by the following papers and GitHub repositories:
