
Contrib unit test failure in openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data #1802

Open
xwang233 opened this issue May 16, 2024 · 0 comments
Labels: bug (Something isn't working)

Describe the Bug

Contrib unit test failure in openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data

Minimal Steps/Code to Reproduce the Bug

root@b4db9ba94176:/opt/pytorch/apex/apex/contrib/test# pytest -vvvs -k test_fused_update_on_random_data
============================================================================================== test session starts ==============================================================================================
platform linux -- Python 3.10.12, pytest-8.1.1, pluggy-1.5.0 -- /usr/bin/python3
cachedir: .pytest_cache
Test order randomisation NOT enabled. Enable with --random-order or --random-order-bucket=<bucket_type>
benchmark: 4.0.0 (defaults: timer=time.perf_counter disable_gc=False min_rounds=5 min_time=0.000005 max_time=1.0 calibration_precision=10 warmup=False warmup_iterations=100000)
hypothesis profile 'default' -> database=DirectoryBasedExampleDatabase(PosixPath('/opt/pytorch/apex/apex/contrib/test/.hypothesis/examples'))
rootdir: /opt/pytorch/apex
configfile: pyproject.toml
plugins: timestamper-0.0.10, xdist-3.6.1, random-order-1.1.1, benchmark-4.0.0, rerunfailures-14.0, anyio-4.3.0, timeout-2.3.1, xdoctest-1.1.0, hypothesis-6.100.0, shard-0.1.2, cov-4.1.0, flakefinder-1.1.0
collected 113 items / 112 deselected / 1 selected
Running 1 items in this shard: apex/contrib/test/openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data

[2024-05-15 17:12:58] openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data FAILED

=================================================================================================== FAILURES ====================================================================================================
_____________________________________________________________________________ FusedAdamSWATestCase.test_fused_update_on_random_data _____________________________________________________________________________

self = <test_fused_adam_swa.FusedAdamSWATestCase testMethod=test_fused_update_on_random_data>

    def setUp(self):
        super().setUp()
        self._seed = 19260817
        random.seed(self._seed)
        torch.manual_seed(self._seed)
>       torch.backends.cudnn.deterministic = True

openfold_triton/test_fused_adam_swa.py:91:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = <torch.backends.ContextProp object at 0x7f57fdef1a80>, obj = <module 'torch.backends.cudnn' from '/opt/pytorch/pytorch/torch/backends/cudnn/__init__.py'>, val = True

    def __set__(self, obj, val):
        if not flags_frozen():
            self.setter(val)
        else:
>           raise RuntimeError(
                f"not allowed to set {obj.__name__} flags "
                "after disable_global_flags; please use flags() context manager instead"
            )
E           RuntimeError: not allowed to set torch.backends.cudnn flags after disable_global_flags; please use flags() context manager instead

../../../../pytorch/torch/backends/__init__.py:43: RuntimeError
=============================================================================================== warnings summary ================================================================================================
../../transformer/tensor_parallel/cross_entropy.py:78
  /opt/pytorch/apex/apex/transformer/tensor_parallel/cross_entropy.py:78: DeprecationWarning: invalid escape sequence '\s'
    """

../../transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_with_interleaving.py:49
  /opt/pytorch/apex/apex/transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_with_interleaving.py:49: DeprecationWarning: invalid escape sequence '\_'
    """Run interleaved 1F1B schedule with communication between pipeline stages as needed.

../../transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_without_interleaving.py:261
  /opt/pytorch/apex/apex/transformer/pipeline_parallel/schedules/fwd_bwd_pipelining_without_interleaving.py:261: DeprecationWarning: invalid escape sequence '\_'
    """Run non-interleaved 1F1B schedule, with communication between pipeline stages.

../../../../pytorch/torch/_custom_ops.py:253
  /opt/pytorch/pytorch/torch/_custom_ops.py:253: DeprecationWarning: torch.library.impl_abstract was renamed to torch.library.register_fake. Please use that instead; we will remove torch.library.impl_abstract in a future version of PyTorch.
    return torch.library.impl_abstract(qualname, func, _stacklevel=2)

../../../../vision/torchvision/transforms/_functional_pil.py:242
  /opt/pytorch/vision/torchvision/transforms/_functional_pil.py:242: DeprecationWarning: BILINEAR is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BILINEAR instead.
    interpolation: int = Image.BILINEAR,

../../../../vision/torchvision/transforms/_functional_pil.py:288
  /opt/pytorch/vision/torchvision/transforms/_functional_pil.py:288: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
    interpolation: int = Image.NEAREST,

../../../../vision/torchvision/transforms/_functional_pil.py:304
  /opt/pytorch/vision/torchvision/transforms/_functional_pil.py:304: DeprecationWarning: NEAREST is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.NEAREST or Dither.NONE instead.
    interpolation: int = Image.NEAREST,

../../../../vision/torchvision/transforms/_functional_pil.py:321
  /opt/pytorch/vision/torchvision/transforms/_functional_pil.py:321: DeprecationWarning: BICUBIC is deprecated and will be removed in Pillow 10 (2023-07-01). Use Resampling.BICUBIC instead.
    interpolation: int = Image.BICUBIC,

../optimizers/distributed_fused_adam.py:273
  /opt/pytorch/apex/apex/contrib/optimizers/distributed_fused_adam.py:273: DeprecationWarning: invalid escape sequence '\:'
    """Adam optimizer with ZeRO algorithm.

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================================================================ short test summary info ============================================================================================
FAILED openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data - RuntimeError: not allowed to set torch.backends.cudnn flags after disable_global_flags; please use flags() context manager instead
================================================================================= 1 failed, 112 deselected, 9 warnings in 7.40s =================================================================================
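
For reference, the RuntimeError itself can be reproduced outside of pytest with two lines (a sketch only; whatever calls torch.backends.disable_global_flags() in this CI environment has not been pinned down):

    import torch

    # Something in the test harness or environment presumably freezes the global flags first.
    torch.backends.disable_global_flags()

    # Any later direct assignment is then rejected, which is exactly what the test's setUp hits.
    torch.backends.cudnn.deterministic = True
    # RuntimeError: not allowed to set torch.backends.cudnn flags after disable_global_flags; ...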

Expected Behavior
The test should pass.
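
A possible workaround, following the error message's own suggestion (a sketch only, not a fix verified against this test): instead of assigning the global flag in setUp, the test could run its body under the torch.backends.cudnn.flags() context manager, which is still permitted after disable_global_flags():

    import torch

    # Instead of `torch.backends.cudnn.deterministic = True` in setUp, scope the
    # flags to the code that needs them; this path does not go through the
    # frozen global setters.
    with torch.backends.cudnn.flags(enabled=True, benchmark=False, deterministic=True):
        ...  # run the fused Adam/SWA update and the reference comparison here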

Environment
The test has been failing since 2/13/24, even though #1759 was merged on 12/14/23 and the test has not been changed since then.

Before 2/13/24 the test was skipped because of some environment setup in our CI, e.g. on 2/12/24:

openfold_triton/test_fused_adam_swa.py::FusedAdamSWATestCase::test_fused_update_on_random_data SKIPPED (Skip testing FusedAdamSWA: No module named 'einops')

cc @crcrpar @eqy @nWEIdia

xwang233 added the bug label on May 16, 2024