Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change in type promotion. Fixes to annotation.py #506

Merged
merged 3 commits into from
Oct 11, 2024
Merged

Change in type promotion. Fixes to annotation.py #506

merged 3 commits into from
Oct 11, 2024

Conversation

tompollard
Copy link
Member

@tompollard tompollard commented Oct 9, 2024

As discussed in #493, numpy v2.0 introduced changes to type promotion rules: https://numpy.org/devdocs/numpy_2_0_migration_guide.html#changes-to-numpy-data-type-promotion

Running pytest with numpy==2.0.2 and NPY_PROMOTION_STATE=weak_and_warn raises the following warnings for wfdb/io/annotation.py:

  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2222: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    while filebytes[bpi, 1] >> 2 == 59:

  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2239: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    label_store = filebytes[bpi, 1] >> 2

tests/test_plot.py::TestPlotInternal::test_get_plot_dims
  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2240: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    sample_diff += int(filebytes[bpi, 0] + 256 * (filebytes[bpi, 1] & 3))

The changes in this pull request address these issues by explicitly casting the type. I plan to follow up with several additional fixes to other modules.

@bemoody
Copy link
Collaborator

bemoody commented Oct 9, 2024

Thanks for working on this!

filebytes = filebytes.astype(np.int64)

Here, the original filebytes is an array of uint8. You're copying and converting them into an array of int64, which means multiplying memory usage by a factor of 9.

So I think this would probably work, but I think (since we're looping over the bytes in Python anyway) it'd be better to do the conversion one element at a time as needed.

That is to say, changing something like filebytes[bpi, 1] >> 2 to either int(filebytes[bpi, 1]) >> 2 if what we really want is a python integer, or np.int64(filebytes[bpi, 1]) >> 2 if what we want is a numpy integer.

NPY_PROMOTION_STATE=weak_and_warn reports that several variables changed from int64 to uint8.
@tompollard
Copy link
Member Author

Thanks! Like this? I don't see any particular reason to use int64 on the while loop, so went for int.

@cbrnr cbrnr mentioned this pull request Oct 10, 2024
@bemoody
Copy link
Collaborator

bemoody commented Oct 10, 2024

Thanks. Unless I missed something, though, it looks like there's still an issue in line 2239.

resolve UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
@bemoody
Copy link
Collaborator

bemoody commented Oct 10, 2024

In line 2241, I think we want np.int64(filebytes[bpi, 1]) & 3 instead of np.int64(filebytes[bpi, 1] & 3).

In addition to the ones you fixed, there's one more warning in line 2327 (aux_notelen & 1).

I'm not sure why not all of the warnings were showing up before.

@tompollard
Copy link
Member Author

In line 2241, I think we want np.int64(filebytes[bpi, 1]) & 3 instead of np.int64(filebytes[bpi, 1] & 3).

Sorry, good catch.

In addition to the ones you fixed, there's one more warning in line 2327 (aux_notelen & 1).

Odd, I see this now too:

tests/test_annotation.py: 10 warnings
tests/test_plot.py: 2 warnings
  /Users/tompollard/projects/wfdb-python/wfdb/io/annotation.py:2327: UserWarning: result dtype changed due to the removal of value-based promotion from NumPy. Changed from int64 to uint8.
    if aux_notelen & 1:

I have cast to int64

@tompollard tompollard force-pushed the tp/493-1 branch 2 times, most recently from 65c4b1b to e729062 Compare October 10, 2024 21:06
@tompollard
Copy link
Member Author

In line 2241, I think we want np.int64(filebytes[bpi, 1]) & 3 instead of np.int64(filebytes[bpi, 1] & 3).

Adding just np.int64(filebytes[bpi, 1]) & 3 causes the multiplication to precede the bitwise & operator, so I added an extra set of parentheses to maintain the order.

@bemoody
Copy link
Collaborator

bemoody commented Oct 11, 2024

I have cast to int64

Sorry, this (e729062) does not fix it. "aux_notelen" is causing the warning. "aux_notebytes" is not a problem and there is no reason to change it.

@bemoody
Copy link
Collaborator

bemoody commented Oct 11, 2024

There is also something strange happening involving pandas in rdann

$ NPY_PROMOTION_STATE=legacy ./ve.np2/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
2.1.2 2.2.3
$ NPY_PROMOTION_STATE=weak ./ve.np2/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
2.1.2 2.2.3
$ NPY_PROMOTION_STATE=legacy ./ve.np1/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
1.26.4 2.2.3
$ NPY_PROMOTION_STATE=weak ./ve.np1/bin/python -c 'import wfdb, numpy, pandas; print(numpy.__version__, pandas.__version__); wfdb.rdann("sample-data/1003", "atr")'
1.26.4 2.2.3
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 2019, in rdann
    annotation.set_label_elements(return_label_elements)
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 1405, in set_label_elements
    self.convert_label_attribute(contained_elements[0], e)
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 1468, in convert_label_attribute
    label_map = self.create_label_map(inplace=False)
                ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/wfdb/io/annotation.py", line 908, in create_label_map
    label_map.loc[i] = self.custom_labels.loc[i]
    ~~~~~~~~~~~~~^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/indexing.py", line 911, in __setitem__
    iloc._setitem_with_indexer(indexer, value, self.name)
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/indexing.py", line 1932, in _setitem_with_indexer
    self._setitem_with_indexer_missing(indexer, value)
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/indexing.py", line 2328, in _setitem_with_indexer_missing
    self.obj._mgr = self.obj._append(value)._mgr
                    ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/frame.py", line 10554, in _append
    other = row_df.infer_objects(copy=False).rename_axis(
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/generic.py", line 6888, in infer_objects
    new_mgr = self._mgr.convert(copy=copy)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 447, in convert
    return self.apply("convert", copy=copy, using_cow=using_copy_on_write())
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/managers.py", line 363, in apply
    applied = getattr(b, f)(**kwargs)
              ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 639, in convert
    blocks = self.split_and_operate(
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 471, in split_and_operate
    rbs = func(nb, *args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/tmp/wfdb-python/ve.np1/lib/python3.11/site-packages/pandas/core/internals/blocks.py", line 655, in convert
    res_values = lib.maybe_convert_objects(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "lib.pyx", line 2602, in pandas._libs.lib.maybe_convert_objects
OverflowError: Python int too large to convert to C long

fix 'UserWarning: result dtype changed due to the removal of value-based promotion from NumPy'.
@bemoody
Copy link
Collaborator

bemoody commented Oct 11, 2024

Looks like a straight up pandas bug:

$ NPY_PROMOTION_STATE=weak ./ve.np1/bin/python -c 'import numpy,pandas; print(numpy.__version__, pandas.__version__); print(pandas.DataFrame({"x":[1]}))'
1.26.4 2.2.3
...
OverflowError: Python int too large to convert to C long

@bemoody
Copy link
Collaborator

bemoody commented Oct 11, 2024

pandas-dev/pandas#60023

@tompollard
Copy link
Member Author

Thanks Benjamin. Interested in seeing what happens with the Pandas issue!

@tompollard tompollard merged commit c687060 into main Oct 11, 2024
14 checks passed
@tompollard tompollard deleted the tp/493-1 branch October 11, 2024 23:21
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants