adds debug options to dump onnx graphs #1789

Merged


@prathikr (Contributor) commented Apr 1, 2024

This PR adds the ability to dump ONNX graphs by passing new arguments to ORTTrainer. This should make debugging easier for developers who run into ONNX Runtime errors. [docs]
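For context, ONNX Runtime's ORTModule already exposes similar switches through `DebugOptions` (its `save_onnx`, `onnx_prefix`, and `log_level` parameters, with levels taken from `LogLevel`). The sketch below is a hypothetical illustration of how trainer-level arguments could be validated and turned into keyword arguments of that shape; it is not the code merged in this PR. `OnnxDebugArgs`, `to_debug_options_kwargs`, and the list of accepted level names are assumptions, and onnxruntime is deliberately not imported so the snippet runs anywhere.

```python
from dataclasses import dataclass

# Assumed subset of ONNX Runtime LogLevel member names (not imported from onnxruntime).
KNOWN_LOG_LEVELS = ("VERBOSE", "INFO", "WARNING", "ERROR", "FATAL")


@dataclass
class OnnxDebugArgs:
    # Hypothetical trainer-side arguments; the names echo the flags in the test command.
    save_onnx: bool = False
    onnx_prefix: str = ""
    log_level: str = "WARNING"


def to_debug_options_kwargs(args: OnnxDebugArgs) -> dict:
    """Validate the arguments and return keyword arguments shaped like DebugOptions' parameters."""
    if args.save_onnx and not args.onnx_prefix:
        # This sketch requires a prefix when dumping, since the prefix names the output files.
        raise ValueError("onnx_prefix must be set when save_onnx is True")
    level = args.log_level.upper()
    if level not in KNOWN_LOG_LEVELS:
        raise ValueError(
            f"unknown log level {args.log_level!r}; expected one of {', '.join(KNOWN_LOG_LEVELS)}"
        )
    # The level is returned as a name; a caller would map it to onnxruntime's LogLevel enum.
    return {"save_onnx": args.save_onnx, "onnx_prefix": args.onnx_prefix, "log_level": level}
```

With onnxruntime-training installed, a caller could then build something like `DebugOptions(save_onnx=kw["save_onnx"], onnx_prefix=kw["onnx_prefix"], log_level=LogLevel[kw["log_level"]])`, assuming ORT's `LogLevel` enum uses these member names. Validating up front means a missing prefix fails at argument-parsing time rather than deep inside a distributed training run.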

Test Machine: 8xV100, cu118, python3.10
Test Environment:

FROM mcr.microsoft.com/aifx/acpt/stable-ubuntu2004-cu118-py310-torch222
RUN pip install accelerate evaluate transformers scikit-learn
RUN git clone https://github.com/prathikr/optimum.git && cd optimum && git checkout prathikrao/add-debug-options && python setup.py install

Test Command:

torchrun --nproc_per_node 8 optimum/examples/onnxruntime/training/language-modelling/run_clm.py \
  --model_name_or_path mistralai/Mistral-7B-v0.1 \
  --dataset_name wikitext --dataset_config_name wikitext-2-raw-v1 \
  --per_device_train_batch_size 4 --per_device_eval_batch_size 4 \
  --do_train --max_steps 10 --fp16 \
  --output_dir output_dir --overwrite_output_dir \
  --seed 30 --dataloader_num_workers 1 --block_size 512 \
  --deepspeed zero2.json \
  --save_onnx True --onnx_prefix test --log_level VERBOSE

zero2.json
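The test command passes the new options as plain strings (`--save_onnx True --onnx_prefix test --log_level VERBOSE`). As an assumption about how such flags become typed values, the sketch below mimics the boolean handling of `transformers`' `HfArgumentParser`, which parses bool fields with a string-to-bool converter plus `nargs="?"` and `const=True`. `DebugFlags`, `parse_debug_flags`, and `str_to_bool` are hypothetical names; only the flag names come from the command.

```python
import argparse
from dataclasses import dataclass


def str_to_bool(value):
    # Accept the same spellings as HfArgumentParser's string_to_bool (case-insensitive).
    if isinstance(value, bool):
        return value
    lowered = value.lower()
    if lowered in ("yes", "true", "t", "y", "1"):
        return True
    if lowered in ("no", "false", "f", "n", "0"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {value!r}")


@dataclass
class DebugFlags:
    # Hypothetical container for the three debug-related flags.
    save_onnx: bool
    onnx_prefix: str
    log_level: str


def parse_debug_flags(argv):
    parser = argparse.ArgumentParser(allow_abbrev=False)
    # nargs="?" with const=True lets both "--save_onnx" and "--save_onnx True" work.
    parser.add_argument("--save_onnx", type=str_to_bool, nargs="?", const=True, default=False)
    parser.add_argument("--onnx_prefix", type=str, default="")
    parser.add_argument("--log_level", type=str, default="WARNING")
    # Ignore the many other run_clm.py flags this sketch does not model.
    known, _unknown = parser.parse_known_args(argv)
    return DebugFlags(known.save_onnx, known.onnx_prefix, known.log_level)
```

For example, `parse_debug_flags(["--max_steps", "10", "--save_onnx", "True", "--onnx_prefix", "test"])` yields `save_onnx=True` and `onnx_prefix="test"`, while the unrelated `--max_steps 10` is left for the rest of the script.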

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@prathikr (Contributor, Author) commented Apr 3, 2024

@JingyaHuang, could you help me understand why some of these tests are failing? When I click Details, the logs only show the skipped tests, not the failing ones.

@JingyaHuang (Collaborator) left a comment

LGTM, thanks for adding the debug options! @prathikr the failing CI checks are unrelated to this change, so we should be able to merge the PR.

@JingyaHuang JingyaHuang merged commit dac8645 into huggingface:main Apr 5, 2024
39 of 45 checks passed
young-developer pushed a commit to young-developer/optimum that referenced this pull request May 10, 2024