
Add mono models to huggingface model-zoo and incorporate into pipeline #29

Merged · 9 commits · May 24, 2020
Conversation

MXueguang (Member)

No description provided.

device = torch.device(options.device)

try:
rodrigonogueira4 (Member)

Instead of try/except, should we pass a flag to distinguish PT from TF checkpoints?

ronakice (Member)

This entire segment of code should give an error since loader is not defined anymore?

MXueguang (Member Author) · May 23, 2020

@rodrigonogueira4 Yes, I feel a flag makes more sense. Will add a --from_tf flag to evaluate_passage_ranker.py.
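A minimal sketch of such a flag (the exact option names in evaluate_passage_ranker.py may differ from what is shown here):

```python
import argparse

# Sketch of the proposed --from_tf flag; option names are assumptions.
parser = argparse.ArgumentParser()
parser.add_argument('--model_name_or_path', default='castorini/monot5-base-msmarco')
parser.add_argument('--from_tf', action='store_true',
                    help='treat model_name_or_path as a TensorFlow checkpoint')
options = parser.parse_args(['--from_tf'])

# transformers' from_pretrained accepts a from_tf keyword, so the
# try/except can become an explicit choice, e.g.:
#   model = T5ForConditionalGeneration.from_pretrained(
#       options.model_name_or_path, from_tf=options.from_tf)
print(options.from_tf)
```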

MXueguang (Member Author)

@ronakice oh, I didn't notice there was a one-line difference between my gcloud and local copies. Will remove that line.

Member

HuggingFace's workaround is something like bool("ckpt" in model_name_or_path)
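That heuristic can be written as a one-line check (a sketch; the variable names and example paths are assumptions, and the bool() wrapper is redundant since `in` already yields a bool):

```python
# Infer the checkpoint format from the path name, following the
# HuggingFace heuristic. Example paths are hypothetical.
tf_path = 'experiments/367/model.ckpt-1004000'
pt_path = 'castorini/monot5-base-msmarco'

from_tf = 'ckpt' in tf_path  # True: looks like a TF checkpoint
from_pt = 'ckpt' in pt_path  # False: no 'ckpt' in the name
print(from_tf, from_pt)
```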

MXueguang (Member Author) · May 23, 2020

model_name_or_path may be a directory that contains both a "ckpt" file and pytorch_model.bin (see https://huggingface.co/google/bert_uncased_L-10_H-128_A-2#list-files), so maybe it still needs to be specified by the user?

Member

To avoid problems with PT files containing ckpt in their name, I would just go with the flag.

ronakice (Member)

I feel like the model in the model-zoo is missing the vocab file?

MXueguang (Member Author) · May 24, 2020

> I feel like the model in the model-zoo is missing the vocab file?

@ronakice Is this because we are using t5-base as tokenizer?

ronakice (Member) · May 24, 2020

https://huggingface.co/valhalla/t5-base-squad#list-files — compare this with our list-files, and it seems like our model is missing a few files! Not sure if all of them are needed, though.

MXueguang (Member Author) · May 24, 2020

Currently in construct_t5():

  • we load the model with T5ForConditionalGeneration.from_pretrained("castorini/monot5-base-msmarco")
  • we load the tokenizer with AutoTokenizer.from_pretrained("t5-base")

so they come from different models on HuggingFace.

I think the missing files belong to the tokenizer (e.g. the tokenizer config and spiece.model / vocab file). The pretrained model itself doesn't require these files. (The original gs://neuralresearcher_data/doc2query/experiments/367 doesn't contain vocab files either.)

2020-05-24 02:04:42 [INFO] configuration_utils: loading configuration file https://s3.amazonaws.com/models.huggingface.co/bert/t5-base-config.json from cache

2020-05-24 02:04:42 [INFO] tokenization_utils: loading file https://s3.amazonaws.com/models.huggingface.co/bert/t5-spiece.model from cache

Do we need to find a way to integrate the pretrained tokenizer files to our model files?
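One way to bundle the tokenizer files with the model weights would be to save both into the same directory and upload that (a sketch, assuming the standard transformers save_pretrained API; the output directory name is hypothetical):

```python
from transformers import AutoTokenizer, T5ForConditionalGeneration


def export_combined(out_dir: str = 'monot5-base-msmarco-combined') -> None:
    """Save model weights and tokenizer files into one directory."""
    # Load weights and tokenizer from their current, separate sources.
    model = T5ForConditionalGeneration.from_pretrained('castorini/monot5-base-msmarco')
    tokenizer = AutoTokenizer.from_pretrained('t5-base')

    # Writing both into one directory yields a single set of files:
    # pytorch_model.bin and config.json from the model,
    # spiece.model and the tokenizer config from the tokenizer.
    model.save_pretrained(out_dir)
    tokenizer.save_pretrained(out_dir)
```

Uploading the resulting directory to the model-zoo would let both from_pretrained calls point at the same name.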

MXueguang (Member Author)

> I don't think we need default='false' if action='store_true'. Could you double-check?

Yes, it can be removed.
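For reference, argparse already defaults a store_true flag to False, so the explicit default is redundant (a minimal, self-contained check; the flag name is reused from the discussion above):

```python
import argparse

parser = argparse.ArgumentParser()
# No default= needed: action='store_true' implies default=False.
parser.add_argument('--from_tf', action='store_true')

print(parser.parse_args([]).from_tf)             # omitted  -> False
print(parser.parse_args(['--from_tf']).from_tf)  # present  -> True
```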

MXueguang (Member Author)

Revalidated the experiment results on the latest update:

monot5: [screenshot: 581590290744_pic_hd]

monoBERT: [screenshot: 591590294894_pic_hd]

rodrigonogueira4 (Member) left a review comment

LGTM, thanks for implementing this!

ronakice (Member)

Great! We can merge this. Thanks for doing this @MXueguang
