This repository has been archived by the owner on Jan 15, 2024. It is now read-only.

Make GPT2Model a HybridBlock #1010

Merged
merged 8 commits into from
Nov 20, 2019


leezu
Contributor

leezu commented Nov 15, 2019

Description

Fixes #993
Fixes #1015

Checklist

Essentials

  • PR's title starts with a category (e.g. [BUGFIX], [MODEL], [TUTORIAL], [FEATURE], [DOC], etc)
  • Changes are complete (i.e. I finished coding on this PR)
  • All changes have test coverage
  • Code is well-documented

Changes

  • Make GPT2Model a HybridBlock

Comments

cc @dmlc/gluon-nlp-team @gigasquid

leezu requested a review from a team as a code owner November 15, 2019 12:22

codecov bot commented Nov 15, 2019

Codecov Report

Merging #1010 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #1010   +/-   ##
=======================================
  Coverage   89.93%   89.93%           
=======================================
  Files          67       67           
  Lines        6340     6340           
=======================================
  Hits         5702     5702           
  Misses        638      638
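The figures in the table can be cross-checked by hand: hits plus misses give the line count, and hits divided by lines gives the coverage. The sketch below assumes the percentage is truncated to two decimals rather than rounded. That is only inferred from the numbers shown here (5702 / 6340 is about 89.937%), not from any documented Codecov behaviour, and the function name is made up for illustration.

```python
import math


def coverage_percent(hits, lines, digits=2):
    """Coverage as a percentage, truncated (not rounded) to `digits` places.

    Truncation is an assumption inferred from the report above.
    """
    scale = 10 ** digits
    return math.floor(hits / lines * 100 * scale) / scale


hits, misses = 5702, 638
lines = hits + misses  # should match the "Lines" row of the report
print(lines, coverage_percent(hits, lines))
```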

leezu force-pushed the hybridgpt2 branch 2 times, most recently from 5c87d6d to 9c3c32e Compare November 15, 2019 12:24
Member

mli commented Nov 15, 2019

Job PR-1010/3 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1010/3/index.html

leezu mentioned this pull request Nov 15, 2019
Contributor

gigasquid commented Nov 15, 2019

Thanks so much for helping with this @leezu.

I pulled the branch and tried to run the sequence sampling for gpt2 and got an error. I think it might actually be a problem on master with some refactoring (or I'm doing something wrong):

10:10 $ python3  sequence_sampling.py  random-sample  --bos 'Deep learning and natural language processing'   --lm-model gpt2_345m 
Namespace(beam_size=5, bos='Deep learning and natural language processing', command='random-sample', gpu=0, lm_model='gpt2_345m', max_length=20, print_num=3, temperature=1.0, use_top_k=None)
Traceback (most recent call last):
  File "sequence_sampling.py", line 187, in <module>
    generate()
  File "sequence_sampling.py", line 146, in generate
    decoder, vocab = get_decoder_vocab(args.lm_model)
  File "sequence_sampling.py", line 116, in get_decoder_vocab
    ctx=ctx)
  File "/Users/cmeier/workspace/deep-learning/gluon-nlp/scripts/text_generation/model/__init__.py", line 64, in get_model
    return models[name](**kwargs)
  File "/Users/cmeier/workspace/deep-learning/gluon-nlp/scripts/text_generation/model/gpt.py", line 383, in gpt2_345m
    **kwargs)
  File "/Users/cmeier/workspace/deep-learning/gluon-nlp/scripts/text_generation/model/gpt.py", line 429, in _get_gpt2_model
    **kwargs)
  File "/Users/cmeier/workspace/deep-learning/gluon-nlp/scripts/text_generation/model/gpt.py", line 239, in __init__
    units=units, hidden_size=units * 4, prefix='ffn{}_'.format(i)))
  File "/Users/cmeier/workspace/deep-learning/gluon-nlp/scripts/text_generation/model/gpt.py", line 186, in __init__
    self._act = GELU(approximate=True)
  File "/usr/local/lib/python3.7/site-packages/gluonnlp/model/block.py", line 106, in __init__
    super(GELU, self).__init__(**kwargs)
TypeError: __init__() got an unexpected keyword argument 'approximate'

The error does not occur if I go back to an earlier commit with git checkout 4e555394baf557e5e55e1ae24a2147b03dce2213
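
For readers following the traceback: the last frame shows a constructor forwarding **kwargs to its base class, which then rejects the unknown approximate keyword. The sketch below reproduces that pattern with hypothetical stand-in classes (Block, OldGELU and NewGELU are illustrative names, not gluonnlp code) and shows one way to check up front whether a constructor declares a given keyword.

```python
import inspect


class Block:
    """Hypothetical stand-in for a base layer class; not gluonnlp code."""

    def __init__(self, prefix=None):
        self.prefix = prefix


class OldGELU(Block):
    """Mimics a release whose constructor only forwards **kwargs to its base."""

    def __init__(self, **kwargs):
        super().__init__(**kwargs)


class NewGELU(Block):
    """Mimics a newer version that declares an explicit `approximate` flag."""

    def __init__(self, approximate=False, **kwargs):
        super().__init__(**kwargs)
        self.approximate = approximate


def accepts_keyword(cls, name):
    """Return True if `cls.__init__` declares a parameter called `name`."""
    return name in inspect.signature(cls.__init__).parameters


try:
    OldGELU(approximate=True)
except TypeError as err:
    # The base class rejects the keyword, as in the traceback above.
    message = str(err)
    print(message)

print(accepts_keyword(OldGELU, "approximate"))
print(accepts_keyword(NewGELU, "approximate"))
```

A check like accepts_keyword only inspects declared parameters, so a constructor that silently swallows **kwargs would still need to be tested by calling it.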

Contributor Author

leezu commented Nov 15, 2019

You need to use the development version of gluonnlp if you use the development version of the script. Use pip install git+https://github.com/dmlc/gluon-nlp.git
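
To see which gluonnlp a script will actually pick up before and after reinstalling, a small check along these lines may help. It is a sketch that uses only the standard library (importlib.metadata, Python 3.8+). The distribution name "gluonnlp" is an assumption based on the site-packages path in the traceback, and installed_version is a made-up helper name.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# "gluonnlp" is assumed to be the distribution name; adjust if it differs.
version = installed_version("gluonnlp")
print("gluonnlp:", version if version is not None else "not installed")
```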

Member

mli commented Nov 18, 2019

Job PR-1010/4 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1010/4/index.html

leezu mentioned this pull request Nov 20, 2019
Member

mli commented Nov 20, 2019

Job PR-1010/8 is complete.
Docs are uploaded to http://gluon-nlp-staging.s3-accelerate.dualstack.amazonaws.com/PR-1010/8/index.html

leezu merged commit ebfc920 into dmlc:master Nov 20, 2019
leezu deleted the hybridgpt2 branch November 20, 2019 08:20
Development

Successfully merging this pull request may close these issues.

prev_len in gpt.py
Export for GPT-2
4 participants