This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

[Tutorial] add KoBERT tutorial #1230

Open: wants to merge 38 commits into base: v0.x

@jamiekang (Contributor)

Added kobert_naver_movie for the KoBERT tutorial.

Description

(Brief description on what this PR is about)

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Feature1, tests, (and when applicable, API doc)
  • Feature2, tests, (and when applicable, API doc)

Comments

  • If this change is a backward incompatible change, why must this change be made.
  • Interesting edge cases to note here

cc @dmlc/gluon-nlp-team

jamiekang requested a review from a team as a code owner on May 12, 2020 09:00
* Update Jenkinsfile_py3_cpu_unittest

* Update Jenkinsfile_py3-master_cpu_unittest
@chenw23 (Member) commented May 13, 2020

Hello Jiyang, would you please merge the latest commit into your branch for this pull request?
#1229 fixes a cpu-unittest timing restriction that is preventing your commit from being built.
Thanks!

@jamiekang (Contributor Author)

Hello, my branch is v0.9.x and I made the latest commit to that branch. I don't have any other branches. Can you tell me what other steps are required? Thanks.

@chenw23 (Member) commented May 14, 2020

Hello, this commit was merged into the v0.9.x branch yesterday, and your pull request was opened 2 days ago. So maybe your pull request does not include this commit?

@chenw23 (Member) commented May 14, 2020

Sorry, but is there an actual need to merge into v0.9.x (the release branch) rather than master (the development branch)?
I noticed the gpu-doc failures. The master branch has some new features that might improve the stability of the doc build and help us debug errors.

@chenw23 (Member) commented May 14, 2020

Hello, I think you need to change the pull request target branch to dmlc:master. Currently you are still targeting dmlc:v0.9.x.
Thanks!

jamiekang changed the base branch from v0.9.x to master May 14, 2020 04:17
@@ -92,7 +92,7 @@ def main():
spin = ['-', '/', '|', '\\', '-', '/', '|', '\\']
logGroupName = '/aws/batch/job'

-jobName = re.sub('[^A-Za-z0-9_\-]', '', args.name)[:128] # Enforce AWS Batch jobName rules
+jobName = re.sub(r'[^A-Za-z0-9_\-]', '', args.name)[:128] # Enforce AWS Batch jobName rules
Member

Hello,
Did you type this character by mistake?

Contributor Author

I don't know anything about this.

Member

I have checked and found that this comes from a change in the v0.9.x branch that is not in the master branch (#1219).
It's good, so please keep it!
Thanks!
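The only change in this hunk is the `r` prefix. In an ordinary string literal, `\-` is not a recognized escape sequence: Python keeps the backslash, so the regex itself is unchanged, but the literal triggers a DeprecationWarning (a SyntaxWarning from Python 3.12 on). The raw string spells the same pattern without the warning. A minimal sketch, using a hypothetical `sanitize_job_name` helper as a stand-in for the jobName line (the 128-character cap is taken from the `[:128]` slice):

```python
import re

# Raw-string pattern, as in the updated line: anything outside [A-Za-z0-9_-] is dropped.
JOB_NAME_PATTERN = r'[^A-Za-z0-9_\-]'

# The raw string is the same text as the explicitly escaped non-raw spelling,
# so switching to r'...' changes no behavior, only removes the escape warning.
assert JOB_NAME_PATTERN == '[^A-Za-z0-9_\\-]'


def sanitize_job_name(name, max_len=128):
    """Hypothetical helper mirroring the jobName line: strip disallowed
    characters, then truncate to max_len characters."""
    return re.sub(JOB_NAME_PATTERN, '', name)[:max_len]


print(sanitize_job_name('PR-1230/docs build #14'))  # PR-1230docsbuild14
```

Inside a character class the escaped `\-` is a literal hyphen, so hyphens survive the substitution while slashes, spaces and `#` are removed.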

@chenw23 (Member) commented May 14, 2020

Hello Jiyang,
One of the tests is failing due to unclear errors. Please wait patiently while we work on the fixes.

@leezu This gpu-doc test cannot pass the doctest, but the error seems to be caused by a connection error. Maybe we need to make some changes elsewhere?

@jamiekang (Contributor Author)

Any update?

@eric-haibin-lin (Member) left a comment

The test failed due to some external dependency. Let me trigger it again and see if it passes.

@chenw23 (Member) commented May 30, 2020

Hello Jiyang, would you please merge the latest master branch into your pull request, in particular to include #1236, so that gpu-doc can pass?
Thanks!

@jamiekang (Contributor Author) commented Jun 2, 2020

Is this okay?

Merge pull request #1 from dmlc/master … 19c4045

@avinashsai (Member)

yes

@eric-haibin-lin (Member) left a comment

@leezu any idea why the err log is missing?

fatal error: An error occurred (404) when calling the HeadObject operation: Key "batch/PR-1230/14/docs/examples/sentiment_analysis/kobert_naver_movie.stderr.log" does not exist

@chenw23 (Member) commented Jun 5, 2020

I think this is because ci/batch/submit-job.py failed.
That script failed because ci/batch/docker/gluon_nlp_job.sh failed, which in turn failed because docs/md2ipynb.py failed.

So the root cause of this failure is that converting the newly added .md file to an .ipynb file did not succeed.

@leezu (Contributor) commented Sep 9, 2020

@jamiekang The build now fails in the next and final step of the build process; the rm command helped it get to that final step.
Currently the CI fails due to a Sphinx warning.

[2020-09-09T06:57:48.867Z] /var/lib/jenkins/gluon-nlp-cpu-py3-master/docs/examples/sentiment_analysis/kobert_naver_movie.ipynb:Could not lex literal_block as "python". Highlighting skipped.

To reproduce this locally, you should be able to run MD2IPYNB_OPTION=--disable_compute make docs_local on your computer.

@szha (Member) commented Sep 9, 2020

adding this line generated new error: !rm -rf dataset_folder

should I keep this line or remove it?

You need to update the folder name with the actual dataset path.

changed clean up code.
@jamiekang (Contributor Author)

OK, I will fix it. Any idea how to overcome the timeout?

@jamiekang (Contributor Author)

Could not lex literal_block as "python". Highlighting skipped.

@jamiekang (Contributor Author)

kernel.cu(1084): Error: Formal parameter space overflowed (4648 bytes required, max 4096 bytes allowed) in function

@jamiekang (Contributor Author)

What does this mean?
Reshape_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_expand_dims_kernel.cu(1084): Error: Formal parameter space overflowed (4648 bytes required, max 4096 bytes allowed) in function
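As szha and ptrendx point out, this looks like a failed pointwise-fusion case: the kernel generated for the fused chain of operators needs more bytes of kernel parameters than the limit reported in the log, and the fused kernel's name concatenates the operators it fused, which is why `expand_dims` repeats so often. A small log-parsing sketch that makes the arithmetic explicit; the helper names are hypothetical and the regex assumes the exact message format quoted above:

```python
import re

# Matches e.g. "Formal parameter space overflowed (4648 bytes required, max 4096 bytes allowed)"
OVERFLOW_RE = re.compile(
    r'Formal parameter space overflowed \((\d+) bytes required, max (\d+) bytes allowed\)')


def parameter_overflow(log_line):
    """Return (required, allowed, excess) in bytes, or None if the line does not match."""
    m = OVERFLOW_RE.search(log_line)
    if m is None:
        return None
    required, allowed = int(m.group(1)), int(m.group(2))
    return required, allowed, required - allowed


def count_fused_ops(kernel_name, op='expand_dims'):
    """Count how many times an operator name appears in a fused kernel name."""
    return kernel_name.count(op)


line = ('kernel.cu(1084): Error: Formal parameter space overflowed '
        '(4648 bytes required, max 4096 bytes allowed) in function')
print(parameter_overflow(line))  # (4648, 4096, 552)
```

Applied to the full kernel name in the log, `count_fused_ops` gives the number of `expand_dims` calls that were fused into this one kernel.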

@szha (Member) commented Sep 24, 2020

cc @ptrendx, looks like a failed fusion case.

@ptrendx (Contributor) commented Sep 24, 2020

Hmmm, yeeeaah... So, just out of curiosity - why do you call expand_dims over 100 times on a single thing there?

@jamiekang (Contributor Author)

There's no explicit call to expand_dims() in the source (.md or .ipynb).

@ptrendx (Contributor) commented Sep 25, 2020

I don't think this error comes from your PR - it happens in test_xlnet_finetune_glue[MRPC].

@jamiekang (Contributor Author)

Any update?

@szha (Member) commented Nov 9, 2020

Here's a summary of the blocking issues:

  • (resolved) fusion RTC bug for expand_dims (thanks @ptrendx)
  • (resolved) conda dependency resolution failure (I upgraded conda and its Python versions on all workers)
  • (ongoing) horovod installation issue

Once the master-gpu-doc pipeline passes, I will merge this PR first, and we can address the horovod issue separately.

@jamiekang (Contributor Author)

Thanks. Let's see how the master-gpu-doc pipeline works.

@szha (Member) commented Nov 9, 2020

Looks like there is still some error in the new notebook that needs to be resolved first:

[2020-11-09T22:10:54.320Z] Warning, treated as error:
[2020-11-09T22:10:54.320Z] /var/lib/jenkins/gluon-nlp-cpu-py3-master/docs/examples/sentiment_analysis/kobert_naver_movie.ipynb:Could not lex literal_block as "python". Highlighting skipped.

Checking what's causing it.

```

```{.python .input}
!rm -rf nsmc # clean up
Member

I'm guessing this is what's troubling the Python lexer. I'm trying to do the same thing in pure Python.
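`!rm -rf nsmc` is IPython shell-escape syntax, not Python, so (as guessed above) the Python lexer cannot tokenize the cell, Sphinx skips highlighting, and because warnings are treated as errors the build fails. A minimal sketch of a pure-Python clean-up, assuming the dataset lives in a local `nsmc` directory as the cell implies; `is_python_source` and `clean_up` are hypothetical helpers, not part of the tutorial:

```python
import ast
import os
import shutil


def is_python_source(src):
    """Return True if `src` parses as Python; IPython `!` shell escapes do not."""
    try:
        ast.parse(src)
    except SyntaxError:
        return False
    return True


def clean_up(path='nsmc'):
    """Pure-Python equivalent of `!rm -rf nsmc`: delete the directory tree if present.

    ignore_errors=True mirrors `rm -f`, so a missing directory is not an error.
    Returns True when nothing is left at `path`.
    """
    shutil.rmtree(path, ignore_errors=True)
    return not os.path.exists(path)


print(is_python_source('!rm -rf nsmc  # clean up'))                     # False
print(is_python_source("shutil.rmtree('nsmc', ignore_errors=True)"))   # True
clean_up('nsmc')
```

Only the `shutil.rmtree(...)` line is needed in the notebook cell; the parse check just shows why the original line trips a Python lexer.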

@jamiekang (Contributor Author)

It seems we still have the multiple expand_dims error.

@sxjscience (Member)

I'll also take a look later at how to port KoBERT and the tutorial to the master version.

@jamiekang (Contributor Author)

Thanks!

@jamiekang (Contributor Author)

/var/lib/jenkins/workspace/gluon-nlp-cpu-py3/conda/cpu/py3/lib/python3.5/site-packages/mxnet/include/mxnet/ndarray.h:41:10: fatal error: mkldnn.hpp: No such file or directory
Can anyone help with this? Thanks in advance.
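The error says that `mxnet/ndarray.h` includes `mkldnn.hpp`, but the compiler cannot find that header among the include files of the installed mxnet package. One quick check is to search the package's `include` directory for the header. A minimal standard-library sketch; the helper names are hypothetical, and it assumes the `site-packages/mxnet/include/...` layout shown in the error path:

```python
import importlib.util
from pathlib import Path


def find_headers(root, header_name):
    """Return every file named `header_name` below `root` (empty list if none)."""
    root = Path(root)
    if not root.is_dir():
        return []
    return sorted(p for p in root.rglob(header_name) if p.is_file())


def mxnet_include_dir():
    """Locate site-packages/mxnet/include without importing mxnet itself."""
    spec = importlib.util.find_spec('mxnet')
    if spec is None or spec.origin is None:
        return None
    return Path(spec.origin).parent / 'include'


include_dir = mxnet_include_dir()
if include_dir is None:
    print('mxnet is not installed')
else:
    print(find_headers(include_dir, 'mkldnn.hpp') or 'mkldnn.hpp not found')
```

An empty result would suggest the installed mxnet build ships `ndarray.h` without the MKL-DNN headers it references, rather than a problem in the tutorial itself.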
