In case of MPS device also copy batch to CPU #3105

hyenal · 2024-03-11T11:43:00Z

What does this PR do?

This PR fixes an issue that occurs when using an MPS device with Composer's classifier. When computing metrics, `outputs` is copied to the CPU but `batch` is not, which causes a device-mismatch error downstream.

What issue(s) does this change relate to?

Fixes #3094

Before submitting

Have you read the contributor guidelines?
Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
Did you update any related docs and document your change?
Did you update any related tests and add any new tests related to your change? (see testing)
Did you run the tests locally to make sure they pass?
Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

mvpatel2000

Thanks for the PR!

I think there's a minor error -- once fixed looks good to me! Please feel free to rerequest when this is merged

composer/trainer/trainer.py

mvpatel2000 · 2024-03-14T19:35:00Z

Hm... labels can be a complex type, so directly calling `.to` on them seems to fail tests.
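
For illustration, one way to handle nested label structures is to walk the container and call `.to` only on the leaves that support it. This is a minimal sketch, not the code in this PR: `move_to_device` is a hypothetical helper name, and it assumes labels are built from dicts, lists, plain tuples, and tensor-like objects that expose `.to(device)` (as `torch.Tensor` does).

```python
def move_to_device(obj, device):
    """Recursively move tensor-like leaves of a nested structure to `device`.

    Hypothetical helper for illustration only; not part of Composer.
    """
    if isinstance(obj, dict):
        # Rebuild the dict with every value moved.
        return {key: move_to_device(value, device) for key, value in obj.items()}
    if isinstance(obj, (list, tuple)):
        # Preserve the container type (list or plain tuple).
        # Named tuples would need separate handling and are not covered here.
        return type(obj)(move_to_device(item, device) for item in obj)
    if hasattr(obj, 'to'):
        # Tensor-like leaf, e.g. torch.Tensor.to('cpu').
        return obj.to(device)
    # Anything else (ints, strings, None) is returned unchanged.
    return obj
```

With a helper like this, `move_to_device(batch, 'cpu')` would work whether `batch` is a single tensor or a dict of tensors, whereas `batch.to('cpu')` fails on a plain dict.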

hyenal · 2024-03-15T13:01:31Z

To ensure compatibility with `HuggingFaceModel`, I had to create a new method that shifts the labels outside of `eval_forward`.

I hope this is not a breaking change. It felt a bit more natural to modify the labels outside of this method, but there may be reasons for doing it there that I am unaware of.

In addition, `update_metric` will be slightly less efficient, although the label-shift operation should be very cheap.
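
As a rough illustration of why the shift is cheap, here is a sketch of a label shift for next-token prediction, written on plain Python lists rather than tensors so it runs without PyTorch. The function name `shift_labels` and the list-based representation are assumptions made for this example, not the method added in this PR. The fill value `-100` is the default `ignore_index` of PyTorch's cross-entropy loss.

```python
def shift_labels(labels, ignore_index=-100):
    """Return a copy of `labels` shifted one position to the left.

    `labels` is a list of token-id sequences, one list per example.
    After the shift, position i holds the token that follows position i
    in the original sequence. The last position has no next token, so it
    is filled with `ignore_index` and would be skipped by the loss.
    """
    return [row[1:] + [ignore_index] if row else [] for row in labels]


shifted = shift_labels([[10, 11, 12], [20, 21, 22]])
# shifted == [[11, 12, -100], [21, 22, -100]]
```

The operation is a single pass over the labels and does not touch the model outputs, which is consistent with it adding little overhead to the metric update.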

composer/models/huggingface.py

mvpatel2000 reviewed Mar 13, 2024

composer/trainer/trainer.py (outdated, resolved)

mvpatel2000 mentioned this pull request Mar 13, 2024

Device mismatch during evaluation when training on mps #2385 (Open)

Sebastien Ehrhardt added 4 commits March 14, 2024 09:21

In case of MPS device also copy batch to CPU

e1da630

metrics on CPU for MPS device

85795fd

fix variable name

ec17890

name

3d3c6d0

hyenal force-pushed the mps-batch-to-cpu branch from 63d407a to 3d3c6d0 Compare March 14, 2024 09:22

hyenal requested a review from mvpatel2000 March 14, 2024 09:22

ruff + hf transformers

81f95e2

hyenal requested a review from a team as a code owner March 14, 2024 16:38

Sebastien Ehrhardt added 7 commits March 15, 2024 09:27

label error

572b162

fix formatting

9b3444a

remove a lot of formatting changes

388321c

typo

9cddc3e

almost all tests should pass now

42519f1

don t use pop

c42a76a

new method

7ae29bf

Merge branch 'dev' into mps-batch-to-cpu

6411fe2

mvpatel2000 requested a review from dakinggg March 19, 2024 18:58

dakinggg reviewed Mar 19, 2024

composer/models/huggingface.py (outdated, resolved)

Sebastien Ehrhardt and others added 3 commits March 22, 2024 12:22

move labels only if metric is on CPU

6b2dd14

Merge branch 'dev' into mps-batch-to-cpu

f7b9c4c

Merge branch 'dev' into mps-batch-to-cpu

ac6fdd2

dakinggg approved these changes Mar 22, 2024

dakinggg enabled auto-merge (squash) March 22, 2024 18:36

dakinggg merged commit f925ef0 into mosaicml:dev Mar 22, 2024
14 checks passed

j316chuck pushed a commit that referenced this pull request May 16, 2024

In case of MPS device also copy batch to CPU (#3105)

b1e5c2d
