Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Speed up fourier py #808

Merged
merged 9 commits into from
Mar 13, 2024
Merged

Speed up fourier py #808

merged 9 commits into from
Mar 13, 2024

Conversation

matteobachetti
Copy link
Member

@matteobachetti matteobachetti commented Mar 11, 2024

In generate_indices_of_segment_boundaries_unbinned:

  • Avoid unnecessary check for array sortedness. This saves ~1/3 of computing time in large, memory mapped arrays
  • Use numpy.searchsorted only once

Copy link

codecov bot commented Mar 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 96.43%. Comparing base (7377187) to head (9341db0).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #808   +/-   ##
=======================================
  Coverage   96.42%   96.43%           
=======================================
  Files          45       45           
  Lines        9016     9027   +11     
=======================================
+ Hits         8694     8705   +11     
  Misses        322      322           

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link

@eleonoraveronica eleonoraveronica left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work @matteobachetti, and thank you for pushing me to make my first review. I hope it would be somehow useful :) I selected "Approve" the changes, since I only suggest a possible one that is not essential.

stingray/gti.py Outdated
@@ -1565,7 +1565,7 @@ def generate_indices_of_gti_boundaries(times, gti, dt=0):
yield s, e, idx0, idx1


def generate_indices_of_segment_boundaries_unbinned(times, gti, segment_size):
def generate_indices_of_segment_boundaries_unbinned(times, gti, segment_size, check_sorted=False):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I totally agree with the idea of adding a check on the data sorting. It speeds up the code, and it leaves the choice to the user. In my opinion, when the user can choose, is always happier :)

stingray/gti.py Outdated
assert is_sorted(times), "Array is not sorted"
if check_sorted:
assert is_sorted(times), "Array is not sorted"
all_times = np.sort(np.array(list(set(np.concatenate([start, stop])))))

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line seems a bit complicated to me. If possible, I would simplify it: is list necessary? To my understanding, set applies to the ndarray in output from np.concatenate, then np.array should transform the ndarray directly in a numpy array. Would np.sort(np.array(set(np.concatenate([start, stop])))) work in the same way?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's a bit confusing, I agree 😅 . However, set does not return an iterable that numpy.array recognizes as such. When you create an array from a set, it just creates an object array with a single element (the whole set).

In [1]: import numpy as np

In [2]: start = np.arange(5)

In [3]: stop = np.arange(1, 6)

In [4]: np.array(set(np.concatenate([start, stop])))
Out[4]: array({0, 1, 2, 3, 4, 5}, dtype=object)

In [5]: np.sort(np.array(set(np.concatenate([start, stop]))))
---------------------------------------------------------------------------
AxisError                                 Traceback (most recent call last)
Cell In[5], line 1
----> 1 np.sort(np.array(set(np.concatenate([start, stop]))))

File ~/opt/anaconda3/envs/py311/lib/python3.11/site-packages/numpy/core/fromnumeric.py:1017, in sort(a, axis, kind, order)
   1015 else:
   1016     a = asanyarray(a).copy(order="K")
-> 1017 a.sort(axis=axis, kind=kind, order=order)
   1018 return a

AxisError: axis -1 is out of bounds for array of dimension 0

The line above just shows that the array has dimension 0, it's just a single object.

list transforms the set into an iterable that numpy.array understands, and the line above has the expected behavior.

In [6]: np.array(list(set(np.concatenate([start, stop]))))
Out[6]: array([0, 1, 2, 3, 4, 5])

In [7]: np.sort(np.array(list(set(np.concatenate([start, stop])))))
Out[7]: array([0, 1, 2, 3, 4, 5])

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Vele89 I added comments to explain each passage

@matteobachetti matteobachetti added this pull request to the merge queue Mar 13, 2024
Merged via the queue into main with commit e4e477c Mar 13, 2024
17 checks passed
@matteobachetti matteobachetti deleted the speed_up_fourier_py branch March 13, 2024 18:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants