
Fix scripts/question_answering/data_pipeline.py requiring optional package #1013

Merged
leezu merged 3 commits into dmlc:master from the fixqadatapipelinespacy branch on Dec 4, 2019

Conversation

@leezu (Contributor) commented Nov 19, 2019

Because an nlp.data.SpacyTokenizer is created as a class attribute, SpacyTokenizer
is required when Python parses the data_pipeline.py file. This means users always
need to install the "optional" SpacyTokenizer dependencies, even if they don't plan
to use them. For example, just running an unrelated test in the scripts folder
currently raises the following error.

ImportError while loading conftest '/home/ubuntu/projects/gluon-nlp/scripts/tests/conftest.py'.
scripts/tests/conftest.py:23: in <module>
    from ..question_answering.data_pipeline import SQuADDataPipeline
scripts/question_answering/data_pipeline.py:433: in <module>
    class SQuADDataTokenizer:
scripts/question_answering/data_pipeline.py:435: in SQuADDataTokenizer
    spacy_tokenizer = nlp.data.SpacyTokenizer()
src/gluonnlp/data/transforms.py:248: in __init__
    lang=lang))
E   OSError: SpaCy Model for the specified language="en_core_web_sm" has not been downloaded. You need to check the installation guide in https://spacy.io/usage/models. Usually, the installation command should be `python -m spacy download en_core_web_sm`.
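For context, a minimal sketch of the general remedy: construct the tokenizer lazily at instance-creation time rather than at class-definition time, so merely importing the module no longer touches the optional SpaCy dependency. The class layout below is illustrative only and is not necessarily the exact diff merged in this PR.

```python
import gluonnlp as nlp

class SQuADDataTokenizer:
    """Illustrative sketch: defer the optional SpaCy dependency.

    Because the SpacyTokenizer is no longer a class attribute, importing
    this module does not require spacy or the en_core_web_sm model; they
    are only needed once an instance is actually constructed.
    """

    def __init__(self):
        # Raises only here, not at import time, if the SpaCy model is missing.
        self._spacy_tokenizer = nlp.data.SpacyTokenizer()

    def __call__(self, text):
        # Delegate to the SpaCy-backed tokenizer.
        return self._spacy_tokenizer(text)
```

With this kind of structure, `from ..question_answering.data_pipeline import SQuADDataPipeline` in the test conftest can succeed even when the SpaCy model is not installed.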

cc @dmlc/gluon-nlp-team

@leezu leezu requested a review from Ishitori November 19, 2019 08:47
@leezu leezu requested a review from a team as a code owner November 19, 2019 08:47
codecov bot commented Nov 19, 2019

Codecov Report

Merging #1013 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1013   +/-   ##
=======================================
  Coverage   88.27%   88.27%           
=======================================
  Files          67       67           
  Lines        6254     6254           
=======================================
  Hits         5521     5521           
  Misses        733      733

@mli (Member) commented Nov 20, 2019

Job PR-1013/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/3/index.html

@mli (Member) commented Dec 3, 2019

Job PR-1013/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/4/index.html

@leezu leezu force-pushed the fixqadatapipelinespacy branch 2 times, most recently from a87a9e3 to 78fdd38 on December 3, 2019 07:42
@mli (Member) commented Dec 3, 2019

Job PR-1013/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/7/index.html

@leezu leezu removed the request for review from Ishitori December 3, 2019 08:28
@mli (Member) commented Dec 3, 2019

Job PR-1013/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/8/index.html

@leezu leezu merged commit 7b7bf60 into dmlc:master Dec 4, 2019
@leezu leezu deleted the fixqadatapipelinespacy branch December 4, 2019 03:45