let hf trainer handle torch compile #516

Merged: winglian merged 5 commits into main from torch-compile on Sep 13, 2023
Conversation

winglian (Collaborator)

No description provided.

src/axolotl/utils/trainer.py (outdated review thread; resolved)
@winglian added the "hold (don't merge this yet)" label on Aug 31, 2023
@@ -526,6 +526,10 @@ wandb_log_model: # "checkpoint" to log model to wandb Artifacts every `save_step
# where to save the finished model to
output_dir: ./completed-model

# whether to use torch.compile and which backend to use
torch_compile: # bool
torch_compile_backend: # Optional[str]
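
For reference, a minimal example of these options in an axolotl YAML config (the values are illustrative; inductor is PyTorch's default backend, and any value must be a backend your torch build actually provides):

    torch_compile: true
    torch_compile_backend: inductor
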
Collaborator:

Is it necessary to expose this "backend" config option?

> Refer to the PyTorch doc for possible values and note that they may change across PyTorch versions.

Also, reading the torch compile source, it seems to only have a special case for inductor.

@tmm1 (Collaborator) commented Sep 1, 2023:

Not sure; there are other backends but maybe they're not useful?

torch._dynamo.list_backends()
# ['aot_ts_nvfuser', 'cudagraphs', 'inductor', 'ipex', 'nvprims_nvfuser', 'onnxrt', 'tvm']
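
A small sketch (not from this PR) of how a configured backend could be checked against what the installed torch build actually exposes; the function name validate_backend is hypothetical:

    import torch._dynamo

    def validate_backend(name: str) -> str:
        # list_backends() reports only the backends usable in this torch build,
        # so the set can differ across PyTorch versions and installs.
        available = torch._dynamo.list_backends()
        if name not in available:
            raise ValueError(
                f"unknown torch.compile backend {name!r}; available: {available}"
            )
        return name
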

@NanoCode012 (Collaborator):
It seems like there may be small benefits to leaving it on by default. Is there any case where having it off is better?

@tmm1 (Collaborator) commented Sep 1, 2023:

> Is there any case where having it off is better?

It cannot default to on because it doesn't work: if you try to train with it enabled, training errors out.

@@ -579,6 +580,21 @@ def setup_trainer(cfg, train_dataset, eval_dataset, model, tokenizer, total_num_
if cfg.bench_dataset:
training_arguments_kwargs["bench_dataset"] = cfg.bench_dataset

if cfg.torch_compile:
if torch.__version__ < "2.1.0": # pylint: disable=protected-access
@winglian (Author):

@tmm1 lmk if you think this is good enough for now
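
For context, a minimal sketch of the gate being discussed, assuming a cfg object and a training_arguments_kwargs dict as in the diff above; this mirrors the shape of the change, not the exact PR code:

    import torch

    def apply_torch_compile_args(cfg, training_arguments_kwargs):
        # Only forward torch.compile options to transformers.TrainingArguments
        # on a torch version where compile works for training (>= 2.1.0).
        if not cfg.torch_compile:
            return
        if torch.__version__ < "2.1.0":  # same comparison style as the diff above
            raise ValueError("torch_compile requires torch>=2.1.0")
        training_arguments_kwargs["torch_compile"] = True
        if cfg.torch_compile_backend:
            training_arguments_kwargs["torch_compile_backend"] = cfg.torch_compile_backend

Both torch_compile and torch_compile_backend are existing TrainingArguments options in transformers, which is what lets the trainer handle compilation itself.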

@winglian merged commit a4e1bb6 into main on Sep 13, 2023 (6 checks passed)
@winglian deleted the torch-compile branch on September 13, 2023 at 15:42
mkeoliya pushed a commit to mkeoliya/axolotl that referenced this pull request Dec 15, 2023
* let hf trainer handle torch compile

* remove torch compile checks, include option for backend

* suppress torch errors to get further

* require min torch version of 2.1.0 for torch compile to work

---------

Co-authored-by: Aman Karmani <aman@tmm1.net>