Test failure on macos-14 CI (M1 Mac, MPS torch backend): test_mps_batched_mask_to_box #380

GenevieveBuckley · 2024-02-06T01:09:17Z

Test test_mps_batched_mask_to_box fails on our macos-14 CI.

tests/test_sam_annotator/test_vendored.py::TestVendored.test_mps_batched_mask_to_box

=================================== FAILURES ===================================
__________________ TestVendored.test_mps_batched_mask_to_box ___________________

self = <test.test_vendored.TestVendored testMethod=test_mps_batched_mask_to_box>

    @unittest.skipIf(not (torch.backends.mps.is_available() and torch.backends.mps.is_built()),
                     "MPS Pytorch backend is not available")
    def test_mps_batched_mask_to_box(self):
>       self._test_batched_mask_to_box(device="mps")

test/test_vendored.py:37: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = <test.test_vendored.TestVendored testMethod=test_mps_batched_mask_to_box>
device = 'mps'

    def _test_batched_mask_to_box(self, device):
        from micro_sam._vendored import batched_mask_to_box
    
        mask, expected_result = self._get_mask_to_box_data()
>       mask = torch.as_tensor(mask, dtype=torch.bool, device=device)
E       RuntimeError: MPS backend out of memory (MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB). Tried to allocate 256 bytes on private pool. Use PYTORCH_MPS_HIGH_WATERMARK_RATIO=0.0 to disable upper limit for memory allocations (may cause system failure).

test/test_vendored.py:21: RuntimeError
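The failing line is the test's first MPS allocation, before batched_mask_to_box is even called. The failure can therefore probably be reproduced without micro_sam or pytest. The sketch below assumes only that torch may or may not be installed. `probe_mps_allocation` is a hypothetical helper, not part of micro_sam or PyTorch. It attempts the same kind of tiny boolean allocation and returns the error message instead of raising it. The checkerboard mask is made-up stand-in data, not the test's `_get_mask_to_box_data()`.

```python
def probe_mps_allocation(rows=4, cols=8):
    """Try one tiny bool allocation on MPS and report instead of raising.

    Hypothetical helper: not part of micro_sam or PyTorch.
    """
    try:
        import torch
    except ImportError:
        return False, "torch is not installed"

    mps = torch.backends.mps
    # Mirrors the test's skipIf condition. On the macos-14 runner both checks
    # passed, since the test ran instead of being skipped.
    if not (mps.is_available() and mps.is_built()):
        return False, "MPS backend is not available"

    # Small checkerboard mask standing in for the real test data (assumption).
    mask = [[(r + c) % 2 == 0 for c in range(cols)] for r in range(rows)]
    try:
        # Same call that fails in the traceback above.
        torch.as_tensor(mask, dtype=torch.bool, device="mps")
    except RuntimeError as err:
        return False, f"allocation failed: {err}"
    return True, "ok"


if __name__ == "__main__":
    ok, message = probe_mps_allocation()
    print("MPS allocation works" if ok else message)
```

Running this as a standalone script on the macos-14 runner would show whether the error depends on the test setup at all.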

Details:

- It fails on the macos-14 CI runner, an M1 Mac with 7 GB of memory available.
- It does not fail on my laptop locally (an M1 Mac with 16 GB of memory available).
- It does not fail on any of the other CI runners: ubuntu-latest, windows-latest, macos-latest. At the time of writing, macos-latest provides a CPU-only Mac, the GitHub Actions `macos-12` image.
- It does not matter in which order the tests are run (pytest-randomly). Running only the test_vendored.py file gives the same error, so it is probably not a memory leak coming from other tests.
- The error message says it "Tried to allocate 256 bytes on private pool", and also that nothing else is currently allocated: "MPS allocated: 0 bytes, other allocations: 0 bytes, max allowed: 7.93 GB" (full error message above). Maybe I'm misinterpreting it, or the message is just wrong? This is surprising to me.
- Setting os.environ["PYTORCH_MPS_HIGH_WATERMARK_RATIO"] = "0.0", as suggested by the error message, did not help.
- I played around with the torch.mps options (see the docs here):
  - Setting torch.mps.set_per_process_memory_fraction(2.0) did not help. Two is the maximum possible value allowed.
  - Using torch.mps.empty_cache() did not help.
  - Both torch.mps.current_allocated_memory() and torch.mps.driver_allocated_memory() reported zero bytes of allocated MPS memory before the failing line of the test.
- First noticed here: Add M1 MacOS runner to github actions jobs #370 (comment)
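The checks listed above can be rerun together in one fresh Python process on the runner. The sketch below rests on a few assumptions:

- The environment variable is set before torch is imported, because the MPS allocator may read it only once, when it is first initialised.
- `report_mps_memory` is a hypothetical helper name.
- The torch.mps calls are the ones mentioned above, which exist in PyTorch 2.2.0, the version used here.

```python
import os

# Assumption: set this before torch is imported, because the MPS allocator
# may read it only once, when it is first initialised.
os.environ.setdefault("PYTORCH_MPS_HIGH_WATERMARK_RATIO", "0.0")


def report_mps_memory(memory_fraction=None):
    """Collect MPS allocator statistics around one tiny allocation.

    Hypothetical helper; memory_fraction, if given, is passed to
    torch.mps.set_per_process_memory_fraction (valid range 0 to 2).
    """
    report = {"watermark_ratio": os.environ.get("PYTORCH_MPS_HIGH_WATERMARK_RATIO")}
    try:
        import torch
    except ImportError:
        report["error"] = "torch is not installed"
        return report
    if not torch.backends.mps.is_available():
        report["error"] = "MPS backend is not available"
        return report

    torch.mps.empty_cache()
    if memory_fraction is not None:
        torch.mps.set_per_process_memory_fraction(memory_fraction)
    report["allocated_before"] = torch.mps.current_allocated_memory()
    report["driver_before"] = torch.mps.driver_allocated_memory()
    try:
        # 256 one-byte bools, comparable to the 256-byte request in the error.
        torch.zeros(256, dtype=torch.bool, device="mps")
        report["error"] = None
    except RuntimeError as err:
        report["error"] = str(err)
    report["allocated_after"] = torch.mps.current_allocated_memory()
    report["driver_after"] = torch.mps.driver_allocated_memory()
    return report


if __name__ == "__main__":
    for key, value in report_mps_memory().items():
        print(f"{key}: {value}")
```

Calling `report_mps_memory(memory_fraction=2.0)` repeats the set_per_process_memory_fraction experiment. Note that this call sets the limit directly, so it may override the environment variable.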


constantinpape · 2024-02-06T07:37:12Z

Thanks for all the work on this @GenevieveBuckley!

Just two brief comments from my side:

- It probably makes sense to just check this again when a new PyTorch release is out. I have a feeling that these issues are related to memory inefficiencies or other problems in memory handling in the MPS backend that could get fixed in the future.
- To understand this better, we could also compare the memory profiles of using MPS and CPU as the device, to see whether they result in similar memory leaks (if not, this is a strong hint that the issue is indeed with MPS).
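One way to run the CPU-vs-MPS comparison suggested above is sketched below. `compare_devices` and `mask_workload` are hypothetical helpers. The sketch approximates CPU memory use by the process's peak resident set size, read from the Unix-only `resource` module. MPS memory use is approximated by `torch.mps.driver_allocated_memory()`.

```python
import resource
import sys


def peak_rss_bytes():
    """Peak resident set size of this process so far (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in bytes on macOS and in kilobytes on Linux.
    return peak if sys.platform == "darwin" else peak * 1024


def compare_devices(workload, devices=("cpu", "mps")):
    """Run workload(device) once per device and record simple memory figures.

    Hypothetical helper. Peak RSS only ever grows, so for a clean comparison
    run each device in its own process.
    """
    results = {}
    for device in devices:
        entry = {"error": None, "mps_driver_bytes": None}
        try:
            workload(device)
        except Exception as err:  # record the failure and carry on
            entry["error"] = f"{type(err).__name__}: {err}"
        entry["peak_rss_bytes"] = peak_rss_bytes()
        if device == "mps":
            try:
                import torch
                if torch.backends.mps.is_available():
                    entry["mps_driver_bytes"] = torch.mps.driver_allocated_memory()
            except (ImportError, AttributeError):
                pass
        results[device] = entry
    return results


def mask_workload(device):
    """Example workload: a small batch of boolean masks, loosely like the test."""
    import torch

    masks = torch.zeros((2, 64, 64), dtype=torch.bool, device=device)
    masks[:, 10:20, 30:40] = True
    return int(masks.sum())


if __name__ == "__main__":
    for device, entry in compare_devices(mask_workload).items():
        print(device, entry)
```

Failures are recorded rather than raised, so the CPU row is still produced when the MPS allocation fails on the runner.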

GenevieveBuckley · 2024-02-07T00:00:21Z

> It probably makes sense to just check this again when a new PyTorch release is out.

Ok, we can try that.

The current version of PyTorch is 2.2.0.
Here is the conda list, with all the version information:


    Name                                  Version       Build                    Channel    
  ────────────────────────────────────────────────────────────────────────────────────────────
    absl-py                               2.1.0         pyhd8ed1ab_0             conda-forge
    affogato                              0.3.3         py311h9e438b8_3          conda-forge
    aiohttp                               3.9.1         py311h05b510d_0          conda-forge
    aiosignal                             1.3.1         pyhd8ed1ab_0             conda-forge
    alabaster                             0.7.16        pyhd8ed1ab_0             conda-forge
    aom                                   3.8.1         h078ce10_0               conda-forge
    app-model                             0.2.4         pyhd8ed1ab_0             conda-forge
    appdirs                               1.4.4         pyh9f0ad1d_0             conda-forge
    appnope                               0.1.3         pyhd8ed1ab_0             conda-forge
    asciitree                             0.3.3         py_2                     conda-forge
    asttokens                             2.4.1         pyhd8ed1ab_0             conda-forge
    atk-1.0                               2.38.0        hcb7b3dd_1               conda-forge
    attrs                                 23.2.0        pyh71513ae_0             conda-forge
    babel                                 2.14.0        pyhd8ed1ab_0             conda-forge
    bioimageio.core                       0.5.11        pyhd8ed1ab_0             conda-forge
    bioimageio.spec                       0.4.9.post5   pyhd8ed1ab_0             conda-forge
    blinker                               1.7.0         pyhd8ed1ab_0             conda-forge
    blosc                                 1.21.5        hc338f07_0               conda-forge
    brotli                                1.1.0         hb547adb_1               conda-forge
    brotli-bin                            1.1.0         hb547adb_1               conda-forge
    brotli-python                         1.1.0         py311ha891d26_1          conda-forge
    brunsli                               0.1           h9f76cd9_0               conda-forge
    bzip2                                 1.0.8         h93a5062_5               conda-forge
    c-ares                                1.26.0        h93a5062_0               conda-forge
    c-blosc2                              2.13.1        ha57e6be_0               conda-forge
    sympy                                 1.12          pypyh9d50eac_103         conda-forge
    tbb                                   2021.11.0     h2ffa867_1               conda-forge
    tensorboard                           2.15.1        pyhd8ed1ab_0             conda-forge
    tensorboard-data-server               0.7.0         py311h5fb2c35_1          conda-forge
    threadpoolctl                         3.2.0         pyha21a80b_0             conda-forge
    tifffile                              2024.1.30     pyhd8ed1ab_0             conda-forge
    timm                                  0.9.12        pyhd8ed1ab_0             conda-forge
    tk                                    8.6.13        h5083fa2_1               conda-forge
    tomli                                 2.0.1         pyhd8ed1ab_0             conda-forge
    tomli-w                               1.0.0         pyhd8ed1ab_0             conda-forge
    toolz                                 0.12.1        pyhd8ed1ab_0             conda-forge
    torch_em                              0.6.1         pyhd8ed1ab_0             conda-forge
    torchvision                           0.17.0        py311_cpu                pytorch    
    tornado                               6.3.3         py311heffc1b2_1          conda-forge
    tqdm                                  4.66.1        pyhd8ed1ab_0             conda-forge
    traitlets                             5.14.1        pyhd8ed1ab_0             conda-forge
    typer                                 0.9.0         pyhd8ed1ab_0             conda-forge
    typing-extensions                     4.9.0         hd8ed1ab_0               conda-forge
    typing_extensions                     4.9.0         pyha770c72_0             conda-forge
    tzdata                                2024a         h0c530f3_0               conda-forge
    urllib3                               2.2.0         pyhd8ed1ab_0             conda-forge
    vigra                                 1.11.2        py311hb7482d5_4          conda-forge
    vispy                                 0.14.1        py311h80bfdd0_0          conda-forge
    wcwidth                               0.2.13        pyhd8ed1ab_0             conda-forge
    werkzeug                              3.0.1         pyhd8ed1ab_0             conda-forge
    wheel                                 0.42.0        pyhd8ed1ab_0             conda-forge
    wrapt                                 1.16.0        py311h05b510d_0          conda-forge
    x264                                  1!164.3095    h57fd34a_2               conda-forge
    x265                                  3.5           hbc6ce65_3               conda-forge
    xarray                                2024.1.1      pyhd8ed1ab_0             conda-forge
    xorg-libxau                           1.0.11        hb547adb_0               conda-forge
    xorg-libxdmcp                         1.1.3         h27ca646_0               conda-forge
    xxhash                                0.8.2         hb547adb_0               conda-forge
    xz                                    5.2.6         h57fd34a_0               conda-forge
    yaml                                  0.2.5         h3422bc3_2               conda-forge
    yarl                                  1.9.4         py311h05b510d_0          conda-forge
    z5py                                  2.0.17        py311h4b05729_0          conda-forge
    zarr                                  2.16.1        pyhd8ed1ab_0             conda-forge
    zeromq                                4.3.5         h965bd2d_0               conda-forge
    zfp                                   1.0.1         ha8f4885_0               conda-forge
    zipp                                  3.17.0        pyhd8ed1ab_0             conda-forge
    zlib                                  1.2.13        h53f4e23_5               conda-forge
    zlib-ng                               2.0.7         h1a8c8d9_0               conda-forge
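The excerpt above jumps from c-blosc2 to sympy, so the pytorch row itself is not visible. When re-checking with a newer PyTorch release, a short report like the one below could be attached alongside the conda list. This is a sketch: `environment_report` is a hypothetical name, and it reads versions from the imported modules rather than from conda.

```python
import platform
import sys


def environment_report():
    """Collect version details relevant to the MPS failure (hypothetical helper)."""
    report = {
        "python": sys.version.split()[0],
        "machine": platform.machine(),  # "arm64" for native Python on Apple silicon
        "macos": platform.mac_ver()[0] or None,  # mac_ver() returns "" elsewhere
    }
    try:
        import torch
    except ImportError:
        report["torch"] = None
        return report
    report["torch"] = torch.__version__
    report["mps_built"] = torch.backends.mps.is_built()
    report["mps_available"] = torch.backends.mps.is_available()
    try:
        import torchvision
        report["torchvision"] = torchvision.__version__
    except Exception:  # a mismatched torchvision can fail on import
        report["torchvision"] = None
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(f"{key}: {value}")
```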

GenevieveBuckley mentioned this issue on Feb 6, 2024 in: Add M1 MacOS runner to github actions jobs #370 (merged)
