Add convert_delta_to_json to CLI #1355

Merged: 43 commits, Jul 23, 2024


KuuCi (Contributor) commented Jul 12, 2024

This PR lets users run llmfoundry convert_delta_to_json {ARGS} while preserving the behavior of the existing convert_delta_to_json script. The motivation is DLE, where we want to make the CLI in the Docker images much more intuitive.
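A minimal sketch of how a subcommand like this can be routed, using the standard-library argparse purely for illustration (the actual llmfoundry CLI may be built on a different framework). It nests the command under a data_prep group, as in the later test run in this thread, and has the subcommand delegate to a single shared function that a script entry point could also call. The function convert_delta_to_json below is a hypothetical stand-in that only echoes its inputs.

```python
import argparse
from typing import Optional, Sequence


# Hypothetical stand-in for the shared conversion routine; the real
# llm-foundry function name and signature may differ.
def convert_delta_to_json(delta_table_name: str, json_output_folder: str,
                          cluster_id: Optional[str] = None) -> dict:
    # A real implementation would read the Delta table and write JSON files;
    # this sketch just echoes its inputs so the dispatch can be checked.
    return {
        "delta_table_name": delta_table_name,
        "json_output_folder": json_output_folder,
        "cluster_id": cluster_id,
    }


def build_cli() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="llmfoundry")
    groups = parser.add_subparsers(dest="group", required=True)

    # Command group, e.g. `llmfoundry data_prep ...`
    data_prep = groups.add_parser("data_prep", help="Data preparation commands")
    commands = data_prep.add_subparsers(dest="command", required=True)

    delta = commands.add_parser("convert_delta_to_json")
    # argparse maps --delta-table-name to the attribute delta_table_name.
    delta.add_argument("--delta-table-name", required=True)
    delta.add_argument("--json-output-folder", required=True)
    delta.add_argument("--cluster-id", default=None)
    # Each subcommand records which function should handle it.
    delta.set_defaults(handler=lambda a: convert_delta_to_json(
        a.delta_table_name, a.json_output_folder, a.cluster_id))
    return parser


def main(argv: Optional[Sequence[str]] = None) -> dict:
    args = build_cli().parse_args(argv)
    return args.handler(args)


print(main(["data_prep", "convert_delta_to_json",
            "--delta-table-name", "main.arxiv.data_chunked",
            "--json-output-folder", "data_folder"]))
```

Registering the handler with set_defaults keeps each subcommand's wiring next to its arguments, so adding another data_prep command means adding one more parser rather than editing a central dispatch table.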

KuuCi marked this pull request as ready for review July 12, 2024 23:27
KuuCi requested a review from a team as a code owner July 12, 2024 23:27
KuuCi (Contributor Author) commented Jul 15, 2024

test-dt-orig-BK6Lv9 runs:

cd llm-foundry/scripts/data_prep
  python convert_delta_to_json.py \
      --delta_table_name "main.arxiv.data_chunked" \
      --json_output_folder data_folder \
      --cluster_id 0713-000001-a2kqt0bu
  ls
  ls data_folder

test-dt-cli-AT7lYi runs:

cd llm-foundry/scripts/data_prep
  llmfoundry convert_delta_to_json \
      --delta-table-name 'main.arxiv.data_chunked' \
      --json-output-folder data_folder \
      --cluster-id 0713-000001-a2kqt0bu
  ls
  ls data_folder
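The two runs above pass the same values with underscore flags (the script) and hyphenated flags (the CLI). One possible way to make both spellings resolve to identical arguments, not necessarily what this PR does, is to register each option under both names with a shared dest. A minimal argparse sketch, where build_parser and parse are hypothetical helpers:

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="convert_delta_to_json")
    # Register both spellings for each option; dest pins them to one attribute.
    parser.add_argument("--delta_table_name", "--delta-table-name",
                        dest="delta_table_name", required=True)
    parser.add_argument("--json_output_folder", "--json-output-folder",
                        dest="json_output_folder", required=True)
    parser.add_argument("--cluster_id", "--cluster-id",
                        dest="cluster_id", default=None)
    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    return build_parser().parse_args(argv)


# Same values as the two test runs, once per flag style.
script_style = parse(["--delta_table_name", "main.arxiv.data_chunked",
                      "--json_output_folder", "data_folder",
                      "--cluster_id", "0713-000001-a2kqt0bu"])
cli_style = parse(["--delta-table-name", "main.arxiv.data_chunked",
                   "--json-output-folder", "data_folder",
                   "--cluster-id", "0713-000001-a2kqt0bu"])
print(script_style == cli_style)  # True
```

Comparing the parsed namespaces is a cheap, local analogue of the end-to-end check above, where both runs are expected to produce the same data_folder contents.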

KuuCi requested a review from dakinggg July 15, 2024 17:57
KuuCi marked this pull request as draft July 16, 2024 00:55
KuuCi (Contributor Author) commented Jul 18, 2024

Not entirely sure why the smoke test is failing.

KuuCi (Contributor Author) commented Jul 18, 2024

mcli logs test-dt-orig-wEutZ2 runs:
python convert_finetuning_dataset.py \
    --dataset "Muennighoff/P3" \
    --splits train validation \
    --preprocessor "llmfoundry.data.finetuning.tasks:p3_preprocessing_function" \
    --out_root "data_folder"

mcli logs test-dt-cli-fVsqOh runs:
llmfoundry data_prep convert_finetuning_dataset \
    --dataset "Muennighoff/P3" \
    --splits train,validation \
    --preprocessor "llmfoundry.data.finetuning.tasks:p3_preprocessing_function" \
    --out-root "data_folder"
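The two runs above pass --splits differently: the script takes space-separated values (train validation), while the CLI run takes a single comma-separated string (train,validation). If a wrapper needs to accept either form, a small normalizing step is enough. normalize_splits below is a hypothetical helper for illustration, not part of llm-foundry.

```python
from typing import List, Sequence


def normalize_splits(raw: Sequence[str]) -> List[str]:
    """Flatten space-separated tokens, comma-separated tokens, or a mix.

    Hypothetical helper: both ["train", "validation"] (script style) and
    ["train,validation"] (CLI style) become ["train", "validation"].
    """
    splits: List[str] = []
    for token in raw:
        for part in token.split(","):
            part = part.strip()
            if part:  # skip empty pieces such as a trailing comma
                splits.append(part)
    return splits


print(normalize_splits(["train", "validation"]))  # ['train', 'validation']
print(normalize_splits(["train,validation"]))     # ['train', 'validation']
```

Normalizing to a plain list before calling the shared conversion code keeps the downstream logic identical regardless of which entry point parsed the arguments.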


KuuCi marked this pull request as ready for review July 18, 2024 18:22
KuuCi requested a review from a team as a code owner July 18, 2024 20:47
KuuCi (Contributor Author) commented Jul 19, 2024

mcli logs test-dt-cli-52hYHv

dakinggg enabled auto-merge (squash) July 23, 2024 18:42
dakinggg merged commit 3d7d12e into main Jul 23, 2024
9 checks passed
dakinggg deleted the dataprep-convert_delta_to_json-cli branch August 6, 2024 18:40