Fix bug in dataset loading #284
Conversation
This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download`.
Hey, in my case I use `data_files` as a string. If you would like to add this, could you check the type of the variable (whether `str` or `list`) and handle it appropriately?
@ethanhs there is a syntax error in this changeset. Can you make sure the pre-commit hooks get run when committing please?
Fixed/ran pre-commit, and checked the type of `data_files`.
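The type check requested above can be sketched roughly like this. This is a minimal illustration, not the actual axolotl code: the helper names are hypothetical, and only the core idea from the PR is assumed, namely that `hf_hub_download` accepts a single `filename` per call, so a list of `data_files` must be fetched one file at a time.

```python
def normalize_data_files(data_files):
    """Accept either a single filename (str) or a list of filenames."""
    if isinstance(data_files, str):
        return [data_files]
    if isinstance(data_files, (list, tuple)):
        return list(data_files)
    raise TypeError(f"unsupported data_files type: {type(data_files)!r}")

def download_data_files(repo_id, data_files):
    # hf_hub_download takes a single `filename` per call, so iterate
    # over the normalized list rather than passing the list directly
    from huggingface_hub import hf_hub_download
    return [
        hf_hub_download(repo_id=repo_id, filename=f, repo_type="dataset")
        for f in normalize_data_files(data_files)
    ]
```

With this shape, both the string case reported above and the list case from the original bug go through the same code path.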
Hey, could you please clarify the original bug? Could you provide an example and the error? I ran:

```
accelerate launch scripts/finetune.py examples/openllama-3b/lora.yml --prepare_ds_only --debug
```

with this config:

```yaml
datasets:
  - path: teknium/GPT4-LLM-Cleaned
    type: alpaca
    data_files:
      - alpaca_gpt4_data_unfiltered.json
      - unnatural_instructions_unfiltered.json
```

I can run this fine on the current master.
@NanoCode012 I think the code he is modifying is fallback code. Typically, dataset loading from the HF Hub happens in the conditional block before this one. I honestly don't know when this else clause might get called, but it's probably worth keeping for that edge case.
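The control flow described above (normal Hub loading, with the fixed branch as a fallback) could be sketched roughly as follows. All names here are hypothetical, and the loader functions are passed in as parameters so the flow can be exercised without network access; the real axolotl code differs.

```python
def load_with_fallback(path, data_files, load_from_hub, download_one, load_local):
    """Try the usual Hub loading path first; on failure, fall back to
    fetching the raw data files one by one (the branch this PR fixes).

    load_from_hub, download_one, and load_local stand in for
    datasets.load_dataset and huggingface_hub.hf_hub_download.
    """
    try:
        # Typical path: the datasets library resolves the repo itself
        return load_from_hub(path, data_files)
    except Exception:
        # Fallback: the downloader takes one filename per call, so a
        # list of data_files must be fetched in a loop, not passed whole
        files = [data_files] if isinstance(data_files, str) else list(data_files)
        local_paths = [download_one(path, f) for f in files]
        return load_local(local_paths)
```

This also shows why the bug only surfaces rarely: the fallback branch runs only when the primary loading path raises.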
Sounds good!
* Fix bug in dataset loading

  This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download`.

* Check type of data_files, and load accordingly
This fixes a bug when loading datasets. `d.data_files` is a list, so it cannot be directly passed to `hf_hub_download`, which only takes a single filename.