This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Enhancement] DatasetLoader for BERT pre-training #799

Merged: 13 commits into dmlc:master on Jul 2, 2019

Member

@eric-haibin-lin eric-haibin-lin commented Jun 27, 2019

This PR introduces a DatasetLoader that launches a worker pool to prefetch datasets from their URLs.
The main thread creates a new dataloader each time the current dataset is exhausted.
For now, the feature lives in the script folder.
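
The prefetching behaviour described above can be illustrated roughly as follows. This is only a minimal sketch, not the code from this PR: the names `PrefetchingDatasetLoader`, `load_dataset`, and `iter_batches` are hypothetical, the "datasets" are plain Python lists, and a thread pool from the standard library stands in for whatever worker pool the actual implementation uses.

```python
from collections import deque
from concurrent.futures import ThreadPoolExecutor
from itertools import islice


def load_dataset(url):
    """Hypothetical loader: pretend each URL resolves to three samples."""
    return ['{}#{}'.format(url, i) for i in range(3)]


def iter_batches(dataset, batch_size):
    """Stand-in for a per-dataset dataloader: yield fixed-size batches."""
    for start in range(0, len(dataset), batch_size):
        yield dataset[start:start + batch_size]


class PrefetchingDatasetLoader:
    """Sketch: prefetch datasets by URL in a pool, then batch them in order."""

    def __init__(self, urls, load_fn, batch_size, num_workers=2, num_prefetch=2):
        self._urls = list(urls)
        self._load_fn = load_fn
        self._batch_size = batch_size
        self._num_workers = num_workers
        # Always keep at least one dataset in flight.
        self._num_prefetch = max(1, num_prefetch)

    def __iter__(self):
        url_iter = iter(self._urls)
        with ThreadPoolExecutor(max_workers=self._num_workers) as pool:
            # Start loading the first few datasets in the background.
            pending = deque(pool.submit(self._load_fn, url)
                            for url in islice(url_iter, self._num_prefetch))
            while pending:
                # Block until the oldest prefetched dataset is ready.
                dataset = pending.popleft().result()
                # Schedule the next URL so the pool stays busy.
                next_url = next(url_iter, None)
                if next_url is not None:
                    pending.append(pool.submit(self._load_fn, next_url))
                # A fresh batch iterator is created for each dataset and
                # drained before moving on to the next one.
                for batch in iter_batches(dataset, self._batch_size):
                    yield batch
```

Iterating `PrefetchingDatasetLoader(['a', 'b'], load_dataset, batch_size=2)` yields the batches of `a` followed by the batches of `b`, while later datasets load in the background. Futures are consumed in submission order, so batch order follows URL order even if a later dataset finishes loading first.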

* use DatasetLoader

* fix lint

* fix bug

* fix lint

* fix bug

* fix bug

* fix lint

* fix argument

* skip test

codecov bot commented Jun 27, 2019

Codecov Report

❗ No coverage uploaded for pull request head (master@63ce4e1).
The diff coverage is n/a.


codecov bot commented Jun 27, 2019

Codecov Report

Merging #799 into master will increase coverage by <.01%.
The diff coverage is 52.63%.

@@            Coverage Diff             @@
##           master     #799      +/-   ##
==========================================
+ Coverage   90.38%   90.39%   +<.01%     
==========================================
  Files          66       65       -1     
  Lines        6378     6280      -98     
==========================================
- Hits         5765     5677      -88     
+ Misses        613      603      -10
Impacted Files                     Coverage Δ
src/gluonnlp/utils/files.py        45.09% <100%> (+8.73%) ⬆️
src/gluonnlp/data/stream.py        85.56% <18.18%> (-4.06%) ⬇️
src/gluonnlp/data/dataloader.py

Member

mli commented Jun 27, 2019

Job PR-799/1 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/1/index.html

Member

mli commented Jun 27, 2019

Job PR-799/2 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/2/index.html

@szha szha changed the title DatasetLoader for BERT pre-training [Enhancement] DatasetLoader for BERT pre-training Jun 28, 2019
Member

mli commented Jun 28, 2019

Job PR-799/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/3/index.html

EC2 Default User and others added 2 commits June 29, 2019 00:24
Member

mli commented Jun 29, 2019

Job PR-799/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/4/index.html

Member

mli commented Jun 29, 2019

Job PR-799/5 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/5/index.html

Member

mli commented Jun 29, 2019

Job PR-799/6 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/6/index.html

Member

mli commented Jun 30, 2019

Job PR-799/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/7/index.html

Member

mli commented Jul 1, 2019

Job PR-799/9 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/9/index.html

Member

mli commented Jul 1, 2019

Job PR-799/10 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-799/10/index.html

@eric-haibin-lin eric-haibin-lin merged commit 9d069d3 into dmlc:master Jul 2, 2019