DataLoader for Seq2seq

An efficient data loader for text datasets, built with torch.utils.data.Dataset, a custom collate_fn, and torch.utils.data.DataLoader.
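The Dataset + collate_fn split can be sketched without any dependencies. This is a minimal illustration, not the repository's code. It assumes each example is already a (source_ids, target_ids) pair of integer lists and reserves id 0 for padding. It also uses plain lists where real code would subclass torch.utils.data.Dataset and return tensors. The names `Seq2SeqDataset`, `pad`, `PAD_ID`, and the tuple returned by `collate_fn` are hypothetical.

```python
# Dependency-free sketch of the Dataset + collate_fn pattern.
# Assumption: each example is a (source_ids, target_ids) pair of int lists.
# Real code would subclass torch.utils.data.Dataset and return tensors.

PAD_ID = 0  # hypothetical id reserved for padding


class Seq2SeqDataset:
    """Map-style dataset: implements __len__ and __getitem__,
    the two methods a DataLoader uses on a map-style dataset."""

    def __init__(self, pairs):
        self.pairs = pairs  # list of (source_ids, target_ids)

    def __len__(self):
        return len(self.pairs)

    def __getitem__(self, index):
        src, trg = self.pairs[index]
        return list(src), list(trg)  # copies, so padding never mutates the data


def pad(seqs, pad_id=PAD_ID):
    """Right-pad every sequence to the length of the longest one."""
    max_len = max(len(s) for s in seqs)
    return [s + [pad_id] * (max_len - len(s)) for s in seqs]


def collate_fn(batch):
    """Merge a list of (src, trg) samples into padded batches plus lengths.

    Samples are sorted by source length, longest first, which is the order
    torch.nn.utils.rnn.pack_padded_sequence expects by default.
    """
    batch = sorted(batch, key=lambda pair: len(pair[0]), reverse=True)
    srcs, trgs = zip(*batch)
    src_lengths = [len(s) for s in srcs]
    trg_lengths = [len(t) for t in trgs]
    return pad(list(srcs)), src_lengths, pad(list(trgs)), trg_lengths
```

With PyTorch installed, the same function would be passed as `DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate_fn)`. The loader then calls `collate_fn` on each list of sampled items, and wrapping the padded lists in `torch.tensor(...)` yields integer (int64) batches. Keeping the true lengths alongside the padded ids is what lets an RNN encoder skip the padding.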


Prerequisites


Usage

1. Clone the repository

$ git clone https://github.com/yunjey/seq2seq-dataloader.git
$ cd seq2seq-dataloader

2. Download the NLTK tokenizer

$ pip install nltk
$ python
>>> import nltk
>>> nltk.download('punkt')

3. Build the word2id dictionary

$ python build_vocab.py
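What a word2id dictionary typically contains can be sketched as follows. This is an assumption about the general approach, not a description of build_vocab.py. The special tokens, the `min_count` threshold, and the helpers `build_word2id` and `encode` are hypothetical. Whitespace splitting stands in for the NLTK tokenizer so the sketch runs without downloads; `nltk.word_tokenize` can be passed as `tokenize` instead.

```python
from collections import Counter

# Hypothetical special tokens; the repository's script may use different ones.
SPECIAL_TOKENS = ["<pad>", "<start>", "<end>", "<unk>"]


def build_word2id(sentences, min_count=1, tokenize=str.split):
    """Map each word seen at least `min_count` times to an integer id.

    Special tokens take the first ids, so "<pad>" is always 0.
    """
    counter = Counter()
    for sentence in sentences:
        counter.update(tokenize(sentence.lower()))
    word2id = {token: i for i, token in enumerate(SPECIAL_TOKENS)}
    # Sort by descending frequency, then alphabetically, so ids are deterministic.
    for word, count in sorted(counter.items(), key=lambda kv: (-kv[1], kv[0])):
        if count >= min_count and word not in word2id:
            word2id[word] = len(word2id)
    return word2id


def encode(sentence, word2id, tokenize=str.split):
    """Convert a sentence to ids wrapped in <start>/<end>; unseen words map to <unk>."""
    unk = word2id["<unk>"]
    ids = [word2id.get(w, unk) for w in tokenize(sentence.lower())]
    return [word2id["<start>"]] + ids + [word2id["<end>"]]
```

For example, `build_word2id(["the cat sat", "the dog sat"], min_count=2)` keeps only "sat" and "the" (ids 4 and 5), so `encode("The cat sat", ...)` gives `[1, 5, 3, 4, 2]`, with "cat" falling back to `<unk>`. In practice the resulting dictionary is usually saved to disk (for instance with `pickle`) so the data loader can reuse it.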

4. Check the DataLoader

For usage, please see example.ipynb.
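Independently of the notebook, a quick way to sanity-check batches is to map ids back to words and confirm that everything past each sequence's true length is padding. The helpers `decode_batch` and `check_padding` below are hypothetical. They assume pad id 0 and plain Python lists; a tensor batch can be converted first with `.tolist()`.

```python
def decode_batch(padded_ids, lengths, word2id):
    """Turn a padded batch of id lists back into token lists.

    `lengths` holds each sequence's true length, so trailing pad ids are dropped.
    """
    id2word = {i: w for w, i in word2id.items()}  # invert the vocabulary
    return [[id2word[i] for i in seq[:n]] for seq, n in zip(padded_ids, lengths)]


def check_padding(padded_ids, lengths, pad_id=0):
    """Return True if every position past a sequence's length holds pad_id."""
    return all(
        all(i == pad_id for i in seq[n:])
        for seq, n in zip(padded_ids, lengths)
    )
```

Decoding a couple of batches this way makes off-by-one errors in lengths or misplaced `<start>`/`<end>` tokens easy to spot before training.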