
Am I properly using stanza offline (coref English model - Electra Large)? #1399

Open
Zappandy opened this issue Jul 2, 2024 · 5 comments

@Zappandy commented Jul 2, 2024

I'm currently attempting to run a stanza pipeline, originally built on my local machine, on an HPC with no access to the Hugging Face Hub or the stanza model server. To work around this, I downloaded all of the models I needed and set download_method to None. While this worked for most of the English processors, the coreference processor bypassed the local files and kept trying to download the google/electra-large model.

Even after setting environment variables, HF_HUB_CACHE to the path where the HF cache is stored on the HPC and HF_HUB_OFFLINE='1', the from_pretrained calls in the bert.py script in the models/coref directory kept attempting to download files. I found that to avoid any downloads, the local_files_only parameter of the from_pretrained method must be set to True (I verified this locally with no internet connection).
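For reference, a minimal sketch of the offline check described above (the cache path is a placeholder for wherever the HF cache lives on the HPC, and google/electra-large stands in for whatever checkpoint stanza actually requests):

    import os

    # Placeholder path: wherever the HF cache is stored on the HPC
    os.environ["HF_HUB_CACHE"] = "/scratch/hf_cache"
    os.environ["HF_HUB_OFFLINE"] = "1"

    from transformers import AutoModel, AutoTokenizer

    # With local_files_only=True, transformers loads from the local cache
    # and raises an error instead of attempting any network download.
    tokenizer = AutoTokenizer.from_pretrained("google/electra-large",
                                              local_files_only=True)
    model = AutoModel.from_pretrained("google/electra-large",
                                      local_files_only=True)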

Unless I'm missing something, with the current setup I don't see how I can pass this parameter to the from_pretrained calls in bert.py without hard-coding it in the script, since the config object used there is not the stanza config dictionary I defined. It seems the config object read in the script is fetched from the model .pt file via torch.load, which of course means it won't contain the local_files_only parameter.

Am I missing something, or is this the expected behavior?

@AngledLuffa (Collaborator)

Thanks, this is a good observation. So what I'm hearing is that we need some way to pass local_files_only to the code path(s) that load the transformer models, right? But probably also to this line, which doesn't have any config at all:

    model = AutoModel.from_pretrained(config.bert_model).to(config.device)
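A hedged sketch of what threading that flag through might look like; this is illustrative only, not the actual patch, and it assumes the coref config supports attribute access like the line above:

    # Illustrative: read an (assumed) local_files_only entry from the coref
    # config, defaulting to False, and forward it to transformers.
    local_files_only = getattr(config, "local_files_only", False)
    model = AutoModel.from_pretrained(config.bert_model,
                                      local_files_only=local_files_only).to(config.device)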

@Zappandy closed this as completed Jul 3, 2024
@Zappandy reopened this Jul 3, 2024
@Zappandy closed this as not planned Jul 3, 2024
@Zappandy reopened this Jul 3, 2024
@Zappandy (Author) commented Jul 3, 2024

Yes. I don't know how feasible it would be to expose arbitrary transformers configuration through the stanza pipeline config dictionary the user defines. That may be too much, but at minimum, for an offline mode, local_files_only should be passed to every from_pretrained call whenever the user has set a cache directory where the models and tokenizers are stored.

An alternative is just to pass the local path to the from_pretrained methods, but this is less portable.
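For example (the directory is hypothetical):

    # Point from_pretrained at a local directory holding the downloaded
    # checkpoint; no network access is needed, but the hard-coded path
    # ties the setup to one machine.
    model = AutoModel.from_pretrained("/hpc/models/electra-large")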

@AngledLuffa (Collaborator)

Are you comfortable using branches? We made the local_files_only branch so that download_method=None no longer downloads from HF either, in addition to not downloading Stanza models.

The only caveat is that the coref model has changed: the new one detects singletons and uses xlm-roberta as the base model.
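If the branch works for you, usage should look like a normal offline pipeline, roughly (a sketch, assuming the coref processor and its dependencies are already downloaded locally):

    import stanza

    # With the branch, download_method=None should skip both stanza and
    # HF downloads, loading everything from the local caches instead.
    nlp = stanza.Pipeline("en", processors="tokenize,coref",
                          download_method=None)
    doc = nlp("Barack Obama was born in Hawaii. He was elected in 2008.")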

@AngledLuffa (Collaborator)

Fixed on dev? #1408

@Zappandy (Author)

Thanks, yeah, I'm comfortable using branches. I'll test it on the dev branch and try to report back ASAP.
