Question about testing on new data #30

zhaoxy92 · 2019-07-31T15:35:19Z

Hi, I'm trying to run ZOE on a new dataset, and I have the following questions:

1. In main.py, should I comment out runner.elmo_processor.load_cached_embeddings("target.min.embedding.pickle", "wikilinks.min.embedding.pickle")? If so, could you show me how these two files are generated and what the format of their raw versions is? Running on new data is currently extremely slow (30 sentences processed after one night). Any idea how I can speed things up?
2. Are there any other files or data I need to generate to test on a new dataset (maybe vocab_test.txt)?

Thank you!

Slash0BZ · 2019-07-31T17:18:04Z

1. Processing is slow for non-cached Wikipedia titles, especially on CPUs, because ZOE runs multiple ELMo inferences to generate each title's representation. I could provide a huge SQLite file (~72GB) that contains all the Wikipedia titles; do you want me to share it? With that file, you could use this function (https://github.com/CogComp/zoe/blob/master/zoe_utils.py#L39) instead of load_cached_embeddings. I also recommend caching your test set, i.e. storing the candidates found for each instance, so that you can tune your type inference at low cost. To do this, I would suggest storing the results in a map and pickling that map.
2. Everything should work fine as long as your type mapping (inference) part is working. The previous point only speeds things up and has no impact on the results.
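The "store results in a map and pickle that map" suggestion could be sketched roughly as follows. This is only an illustration, not code from the ZOE repository: the helper names (load_cache, save_cache, cached_candidates), the use of a string key per instance, and the compute callback standing in for the slow candidate-generation step are all assumptions.

```python
import os
import pickle
from typing import Callable, Dict, List


def load_cache(path: str) -> Dict[str, List[str]]:
    """Load a previously pickled {instance key: candidates} map, or start empty."""
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)
    return {}


def save_cache(cache: Dict[str, List[str]], path: str) -> None:
    """Write the map to a temporary file first, then swap it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(cache, f, protocol=pickle.HIGHEST_PROTOCOL)
    os.replace(tmp_path, path)


def cached_candidates(
    key: str,
    cache: Dict[str, List[str]],
    compute: Callable[[str], List[str]],
) -> List[str]:
    """Return cached candidates for `key`, calling `compute` only on a miss.

    `compute` is a hypothetical stand-in for the expensive step
    (ELMo inference plus candidate retrieval).
    """
    if key not in cache:
        cache[key] = compute(key)
    return cache[key]
```

With something like this, the expensive step runs once per instance; later runs that only change the type inference can call load_cache and reuse the stored candidates.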

zhaoxy92 · 2019-07-31T18:49:36Z

Thank you. Please share it with me! Really appreciate it!


Slash0BZ · 2019-08-01T13:43:32Z

Uploaded the file "elmo_cache_correct.db" to the Google Drive folder: https://drive.google.com/drive/u/1/folders/1fD6WfCEPQICGPhxqlwuVmf8uOot-jQq8?ths=true. Sorry for the delay; it's a huge file to upload.

To use it, please refer to the function I linked above (https://github.com/CogComp/zoe/blob/master/zoe_utils.py#L39) and set server_mode=False.
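Before pointing ZOE at a file this large, it may be worth checking that the download is intact and seeing what it contains. The sketch below uses only Python's standard sqlite3 module to list the tables and columns of a SQLite file opened read-only. The helper name describe_sqlite is made up for illustration, and nothing here assumes a particular schema for elmo_cache_correct.db, which this thread does not describe.

```python
import sqlite3
from pathlib import Path
from typing import Dict, List


def describe_sqlite(path: str) -> Dict[str, List[str]]:
    """Return {table name: [column names]} for a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes to (or creation of) the database file.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
            )
        ]
        schema = {}
        for table in tables:
            # PRAGMA table_info yields one row per column; index 1 is the column name.
            columns = conn.execute(f'PRAGMA table_info("{table}")').fetchall()
            schema[table] = [col[1] for col in columns]
        return schema
    finally:
        conn.close()
```

Calling describe_sqlite("elmo_cache_correct.db") from the download directory would print nothing by itself; it returns a dictionary that can be inspected or printed to confirm the file opens and has the expected tables.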

zhaoxy92 · 2019-08-09T22:13:15Z

Thank you. Downloading it now; I'll bother you again if there are any further problems!
