
Fix scripts/question_answering/data_pipeline.py requiring optional package #1013

Merged
leezu merged 3 commits into dmlc:master from the fixqadatapipelinespacy branch on Dec 4, 2019

Conversation

@leezu (Contributor) commented Nov 19, 2019

Because an nlp.data.SpacyTokenizer is created as a class attribute, SpacyTokenizer
is required when Python parses the data_pipeline.py file. This means users always
need to install the "optional" SpacyTokenizer dependencies, even if they don't plan
to use them. For example, just running an unrelated test in the scripts folder
currently raises the following error.

ImportError while loading conftest '/home/ubuntu/projects/gluon-nlp/scripts/tests/conftest.py'.
scripts/tests/conftest.py:23: in <module>
    from ..question_answering.data_pipeline import SQuADDataPipeline
scripts/question_answering/data_pipeline.py:433: in <module>
    class SQuADDataTokenizer:
scripts/question_answering/data_pipeline.py:435: in SQuADDataTokenizer
    spacy_tokenizer = nlp.data.SpacyTokenizer()
src/gluonnlp/data/transforms.py:248: in __init__
    lang=lang))
E   OSError: SpaCy Model for the specified language="en_core_web_sm" has not been downloaded. You need to check the installation guide in https://spacy.io/usage/models. Usually, the installation command should be `python -m spacy download en_core_web_sm`.
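For context, a minimal sketch of the general remedy: construct the tokenizer lazily at instance-creation time rather than at class-definition time, so merely importing the module no longer touches the optional SpaCy dependency. The class layout below is illustrative only and is not necessarily the exact diff merged in this PR.

```python
import gluonnlp as nlp

class SQuADDataTokenizer:
    """Illustrative sketch: defer the optional SpaCy dependency.

    Because the SpacyTokenizer is no longer a class attribute, importing
    this module does not require spacy or the en_core_web_sm model; they
    are only needed once an instance is actually constructed.
    """

    def __init__(self):
        # Raises only here, not at import time, if the SpaCy model is missing.
        self._spacy_tokenizer = nlp.data.SpacyTokenizer()

    def __call__(self, text):
        # Delegate to the SpaCy-backed tokenizer.
        return self._spacy_tokenizer(text)
```

With this kind of structure, `from ..question_answering.data_pipeline import SQuADDataPipeline` in the test conftest can succeed even when the SpaCy model is not installed.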

cc @dmlc/gluon-nlp-team

@leezu leezu requested a review from Ishitori November 19, 2019 08:47
@leezu leezu requested a review from a team as a code owner November 19, 2019 08:47
codecov bot commented Nov 19, 2019

Codecov Report

Merging #1013 into master will not change coverage.
The diff coverage is n/a.

Impacted file tree graph

@@           Coverage Diff           @@
##           master    #1013   +/-   ##
=======================================
  Coverage   88.27%   88.27%           
=======================================
  Files          67       67           
  Lines        6254     6254           
=======================================
  Hits         5521     5521           
  Misses        733      733

@mli (Member) commented Nov 20, 2019

Job PR-1013/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/3/index.html

@mli (Member) commented Dec 3, 2019

Job PR-1013/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/4/index.html

@leezu leezu force-pushed the fixqadatapipelinespacy branch 2 times, most recently from a87a9e3 to 78fdd38 on December 3, 2019 07:42
@mli (Member) commented Dec 3, 2019

Job PR-1013/7 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/7/index.html

@leezu leezu removed the request for review from Ishitori December 3, 2019 08:28
@mli (Member) commented Dec 3, 2019

Job PR-1013/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1013/8/index.html

@leezu leezu merged commit 7b7bf60 into dmlc:master Dec 4, 2019
@leezu leezu deleted the fixqadatapipelinespacy branch December 4, 2019 03:45