Deprecate the nvidia/apex integration #14416

Closed · carmocca opened this issue Aug 26, 2022 · 5 comments · Fixed by #16039
@carmocca (Contributor) commented Aug 26, 2022

Proposed refactor

Deprecation:

  • Deprecate the ApexMixedPrecisionPlugin and passing Trainer(amp_backend=...). To be removed in 1.10 (a sketch of the user-facing warning follows these lists)
  • Add deprecation notices to apex throughout our docs

Removal:

  • Remove all the apex-related glue throughout the codebase
  • Remove the apex installation from CI
  • Remove apex from our docs
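As a rough illustration of the first deprecation bullet, the user-facing warning could look something like the sketch below. The helper name is hypothetical and only shows the intended message; the actual changes are tracked in the linked PRs.

```python
import warnings

def _deprecate_amp_backend(amp_backend: str) -> None:
    # Hypothetical helper: warn when a user still passes Trainer(amp_backend=...)
    # and point them at the native mixed precision support instead.
    warnings.warn(
        f"Trainer(amp_backend={amp_backend!r}) is deprecated and will be removed in v1.10."
        " Use the native mixed precision support, e.g. Trainer(precision=16), instead.",
        DeprecationWarning,
    )

# e.g. _deprecate_amp_backend("apex") could be called at Trainer construction time.
```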

Motivation

APEX AMP can be regarded as deprecated in favor of PyTorch AMP, which Michael Carilli implemented and advocated in #1337.

Most developer activity in the nvidia/apex repository happens in apex/transformer, apex/optimizers, tests/L0, and/or apex/contrib. The apex/amp directory hasn't seen changes for about two years.

Given this two-year hibernation, it would be almost impossible to resume support for the optimization levels other than O2.

It's unclear whether any NVIDIA teams use our apex plugin internally, and the NVIDIA team is unable to provide support for apex bugs.


If you enjoy Lightning, check out our other projects! ⚡

  • Metrics: Machine learning metrics for distributed, scalable PyTorch applications.

  • Lite: Enables pure PyTorch users to scale their existing code on any kind of device while retaining full control over their own loops and optimization logic.

  • Flash: The fastest way to get a Lightning baseline! A collection of tasks for fast prototyping, baselining, fine-tuning, and solving problems with deep learning.

  • Bolts: Pretrained SOTA Deep Learning models, callbacks, and more for research and production with PyTorch Lightning and PyTorch.

  • Lightning Transformers: Flexible interface for high-performance research using SOTA Transformers leveraging PyTorch Lightning, Transformers, and Hydra.

cc @tchaton @rohitgr7 @carmocca @justusschock @awaelchli @akihironitta @kaushikb11 @Borda

@carmocca carmocca added deprecation Includes a deprecation precision: apex (removed) NVIDIA/apex precision trainer: argument pl Generic label for PyTorch Lightning package labels Aug 26, 2022
@carmocca carmocca added this to the pl:future milestone Aug 26, 2022
@awaelchli (Contributor)

I'm in favor.

@rohitgr7 (Contributor)

DeepSpeed still supports amp, so we should also update the DeepSpeedPrecisionPlugin to accept amp_backend now.
Also, I would suggest naming the parameter amp_type there for consistency.

@rohitgr7 (Contributor)

Also, do you know if PyTorch is working on O1/O3 support natively?

@carmocca (Contributor, Author)

I wonder if DeepSpeed is affected by the same checkpointing problems when apex is used.

> also, do you know if PyTorch is working on O1/O3 support natively?

I don't think so. It might not provide relevant efficiency improvements for most use cases, so maybe they scrapped supporting it. @ptrblck, if you have any insights here, we'd love to hear them from you 🙇

Also relevant: pytorch/pytorch#52279

@ptrblck commented Aug 30, 2022

> also, do you know if PyTorch is working on O1/O3 support natively?

The native amp implementation via torch.amp or torch.cuda.amp is close to the legacy apex.amp O1 opt level, while the legacy O3 level was mainly used for debugging and performance testing, as it's the "pure FP16" implementation (it calls .half() on the data and model directly, which can be dangerous).
I agree that deprecating apex.amp in Lightning and focusing on the native implementation sounds like a good idea.
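For concreteness, here is a minimal sketch of the difference described above, using only public PyTorch APIs (the model and data are placeholders, and a CUDA device is assumed):

```python
import torch

model = torch.nn.Linear(32, 2).cuda()
data = torch.randn(8, 32, device="cuda")
scaler = torch.cuda.amp.GradScaler()

# O1-style (what native AMP provides): ops run in FP16 where it is safe,
# FP32 elsewhere, with gradient scaling to avoid underflow.
with torch.cuda.amp.autocast():
    loss = model(data).sum()
scaler.scale(loss).backward()

# O3-style "pure FP16": cast the model and data wholesale, with no
# autocasting or gradient scaling. Fast, but numerically risky.
model_fp16 = torch.nn.Linear(32, 2).cuda().half()
loss_fp16 = model_fp16(data.half()).sum()
loss_fp16.backward()
```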
