GlueDataset & GlueDataTrainingArguments - working on Colab #11

LedaguenelArthur opened this issue Jan 29, 2021 · 4 comments

@LedaguenelArthur

Hi everyone,

I'm currently trying to understand how to plug my own data into the PyTorch BioBERT model for relation extraction, and I see that the run_re.py script uses two utils for which I can't find any documentation: GlueDataset & GlueDataTrainingArguments.

Does anyone know where to find the documentation for these two utilities?
Or can anyone briefly explain how they work?

Thanks a lot,
Best regards,
Arthur Ledaguenel

@wonjininfo
Member

Hi Arthur!

Our repository is based on Transformers v3.0.0.
Unfortunately, GlueDataset and GlueDataTrainingArguments have been removed in the current version of Transformers (v4).
However, you can find those two utils here: Tag v3.0.0.
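
In v3.0.0 the two utilities are typically used together along the following lines (just a rough sketch to give you the idea; the paths and values below are placeholders taken from your setup):

# Sketch of the v3.0.0 API: GlueDataTrainingArguments describes the data,
# GlueDataset loads one split of it and caches the tokenized features.
from transformers import AutoTokenizer, GlueDataset, GlueDataTrainingArguments

tokenizer = AutoTokenizer.from_pretrained("dmis-lab/biobert-base-cased-v1.1")

data_args = GlueDataTrainingArguments(
    task_name="sst-2",        # GLUE-style processor to reuse (single sentence + label)
    data_dir="./GAD/1",       # directory containing train.tsv / dev.tsv / test.tsv
    max_seq_length=128,
    overwrite_cache=False,
)

# mode selects the split: "train", "dev" or "test"
train_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="train")
eval_dataset = GlueDataset(data_args, tokenizer=tokenizer, mode="dev")

The resulting datasets can then be passed to the Trainer as train_dataset and eval_dataset, which is essentially what run_re.py does internally.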

Thank you for your interest in our work!
Best,
WonJin

@LedaguenelArthur
Author

LedaguenelArthur commented Feb 3, 2021

Thank you very much for that quick answer!

For some reason I could not find any documentation for either of those two components at https://huggingface.co/transformers/v3.0.2/index.html... :/

I would like to plug my own dataset into the network, and I wonder how that would work with these two components.

Would it be enough to transform my data into the same format as GAD, split it into train.tsv, test.tsv and dev.tsv, and feed it to the GlueDataset processor?

Besides, I am working on Google Colab and therefore run into problems with the HfArgumentParser, since I am not executing the script from a shell and passing the arguments the usual way:

  • Can I set up sys.argv by hand to stick with your code? (see the sketch further down)
  • I have tried to get around this by setting up the dataclasses by hand:

model_args = ModelArguments(config_name="bert-base-cased",
                            model_name_or_path="dmis-lab/biobert-base-cased-v1.1")
data_args = DataTrainingArguments(task_name="SST-2", data_dir=DATA_DIR,
                                  max_seq_length=MAX_LENGTH, overwrite_cache=False)
training_args = TrainingArguments(per_device_train_batch_size=BATCH_SIZE,
                                  save_steps=SAVE_STEPS, seed=SEED,
                                  do_train=True, do_predict=True,
                                  learning_rate=5e-5, output_dir=OUTPUT_DIR,
                                  overwrite_output_dir=True)

but I get the following error:

[Errno 2] No such file or directory: './GAD/1/cached_train_BertTokenizer_128_sst-2.lock'

Is there a solution for that? What am I doing wrong?
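
To be concrete about the first option, this is roughly what I mean by setting sys.argv by hand (an untested sketch, with placeholder paths and values):

# Fake the command line inside the notebook so that HfArgumentParser sees the
# arguments exactly as if run_re.py had been launched from a shell.
import sys

sys.argv = [
    "run_re.py",
    "--model_name_or_path", "dmis-lab/biobert-base-cased-v1.1",
    "--task_name", "SST-2",
    "--data_dir", "./GAD/1",
    "--max_seq_length", "128",
    "--per_device_train_batch_size", "32",
    "--learning_rate", "5e-5",
    "--output_dir", "./output",
    "--overwrite_output_dir",
    "--do_train",
    "--do_predict",
]

(I suppose another option would be to pass the same list directly to parser.parse_args_into_dataclasses(args=[...]) instead of touching sys.argv.)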

I remain at your disposal if my question is unclear,
Thank you again,
Best regards,
Arthur Ledaguenel

LedaguenelArthur changed the title from "GlueDataset & GlueDataTrainingArguments documentation" to "GlueDataset & GlueDataTrainingArguments - working on Colab" on Feb 3, 2021
@clouwer

clouwer commented Nov 18, 2021

Hi, Arthur

I got the same error:
No such file or directory: '../data/RE/euadr/1/cached_train_BertTokenizerFast_128_sst-2.lock'

Is there a solution?

Thank you,
Best regards,
Clouwer

@Bharathi-A-7

@clouwer I'm facing the same error. Did you happen to find a solution?
