546 masking time step segmentation #562

Merged
merged 40 commits into from
Feb 28, 2024


@cwmeijer (Contributor) commented Apr 18, 2023

Work in progress: Fixes #546.

This should now produce masks in which the masked time steps are clustered into contiguous segments.

Example time masks (0 = masked, 1 = unmasked; each row is a channel, time runs left to right):

```
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1

0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0

1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
```
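The masks above mask the same time steps in every channel, so masking happens in contiguous blocks of time. A minimal sketch of that idea, assuming one random keep/mask decision per equal-length segment that is stretched over the series and shared by all channels. `segment_time_mask` and its parameters are hypothetical, not this PR's API, and the PR's own masks have segments of varying length:

```python
import numpy as np


def segment_time_mask(series_length, n_channels, n_segments, p_keep, rng):
    """Hypothetical sketch: one keep/mask decision per segment, shared by all channels."""
    # One boolean decision per coarse segment (True = keep, i.e. unmasked).
    coarse = rng.random(n_segments) < p_keep
    # Map every time step to its segment (nearest-neighbour upsampling).
    segment_index = np.arange(series_length) * n_segments // series_length
    time_mask = coarse[segment_index].astype(np.float32)  # 1 = unmasked, 0 = masked
    # Repeat the same time mask for every channel: shape (n_channels, series_length).
    return np.broadcast_to(time_mask, (n_channels, series_length)).copy()


rng = np.random.default_rng(0)
mask = segment_time_mask(series_length=20, n_channels=6, n_segments=4, p_keep=0.5, rng=rng)
print(mask.astype(int))
```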



Still needs to be done before merging:
- Refactoring / polishing
   - docstrings
   - The default value for `number_of_features` should always be `None` (the original series length). Clustering, projecting, etc. should be optional, as users would probably not expect them.
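A minimal sketch of the proposed default, assuming a hypothetical helper `resolve_number_of_features` in which `None` means one feature per time step, so no clustering takes place:

```python
from typing import Optional


def resolve_number_of_features(number_of_features: Optional[int], series_length: int) -> int:
    """Hypothetical helper: None means one feature per time step (no clustering)."""
    if number_of_features is None:
        return series_length
    if not 1 <= number_of_features <= series_length:
        raise ValueError("number_of_features must be between 1 and series_length")
    return number_of_features
```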

I spent most of my time making the RISE test with the expert model pass. It now passes in many cases, but it can be somewhat brittle, for instance for certain short `series_length` values. In the general case it seems to work reliably.



```python
def generate_channel_masks(input_data: np.ndarray, number_of_masks: int, p_keep: float):
```

Review comment (Member):

Reminder: merge this function with `generate_time_step_masks` to reduce code duplication (action point from PR #554).
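One possible shape for the merged function, sketched under assumptions: the input is laid out as `(time, channels)`, `axis=0` masks whole time steps and `axis=1` masks whole channels. `generate_axis_masks` is a hypothetical name, not the function that ended up in DIANNA:

```python
import numpy as np


def generate_axis_masks(input_shape, number_of_masks, p_keep, axis, rng):
    """Hypothetical shared helper: mask whole slices along `axis`.

    Assumed layout: input_shape == (time, channels); axis=0 masks time steps,
    axis=1 masks channels.
    """
    n_slices = input_shape[axis]
    # Number of slices to mask in each mask, rounded to an integer.
    n_masked = int(round((1 - p_keep) * n_slices))
    masks = np.ones((number_of_masks, *input_shape), dtype=np.float32)
    for i in range(number_of_masks):
        # Pick distinct slices to mask.
        masked = rng.choice(n_slices, size=n_masked, replace=False)
        # Zero out the chosen slices along the requested axis only.
        index = [slice(None)] * len(input_shape)
        index[axis] = masked
        masks[i][tuple(index)] = 0
    return masks
```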

The error made the `rise_for_timeseries` tests fail, because with this bug they could not find the correct hot/cold days in the synthetic tests.
... to get a more accurate expected number of masked steps, where "expected" is meant in the statistical sense (the mean).
This also means that the test for this is no longer valid.
Also removes a duplicate parametric test.
@cwmeijer added this to In progress in SS Sprint 4 - Complete WP1 on Jan 8, 2024
@cwmeijer marked this pull request as ready for review on January 25, 2024 09:47
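```python
ceil = np.ceil(mean)
if floor != ceil:
    # Stochastic rounding: pick floor or ceil with probabilities chosen so that
    # the expected (mean) number of requested steps equals `mean`.
    user_requested_steps = int(
        np.random.choice([floor, ceil], 1, p=[ceil - mean, mean - floor]))
```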

SonarCloud code-scanning notice (Low): `numpy.random.Generator` should be preferred to `numpy.random.RandomState`. Use a `numpy.random.Generator` here instead of this legacy function.
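The excerpt above rounds a non-integer `mean` up or down at random, with probabilities chosen so that the expected result equals `mean`. A minimal sketch of the same stochastic rounding written with `numpy.random.Generator`, as SonarCloud suggests; `stochastic_round` is a hypothetical helper name:

```python
import numpy as np


def stochastic_round(mean: float, rng: np.random.Generator) -> int:
    """Round `mean` to floor or ceil so that the expected result equals `mean`."""
    floor = np.floor(mean)
    ceil = np.ceil(mean)
    if floor == ceil:  # mean is already an integer
        return int(mean)
    # P(ceil) = mean - floor and P(floor) = ceil - mean, so E[result] = mean.
    return int(rng.choice([floor, ceil], p=[ceil - mean, mean - floor]))


rng = np.random.default_rng(42)
samples = [stochastic_round(3.3, rng) for _ in range(10_000)]
print(np.mean(samples))  # should be close to 3.3
```

Averaging many calls recovers `mean`, which is the statistical sense of "expected" used in the commit message.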

```python
Returns:
    The generated masks (np.ndarray)
"""
cell_size = np.ceil(np.array(input_size) / feature_res)
up_size = (feature_res + 1) * cell_size
grid = np.random.choice(a=(True, False),
```

SonarCloud notice (Low): prefer `numpy.random.Generator` over this legacy `numpy.random` function.
```python
up_size = (number_of_features + 1) * cell_size
masks = np.empty((number_of_masks, *input_size), dtype=np.float32)
for i in range(masks.shape[0]):
    y_offset = np.random.randint(0, cell_size[0])
```

SonarCloud notice (Low): prefer `numpy.random.Generator` over this legacy `numpy.random` function.
```python
x_offset = np.random.randint(0, cell_size[1])
```

SonarCloud notice (Low): prefer `numpy.random.Generator` over this legacy `numpy.random` function.
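The excerpts above follow the RISE pattern: a coarse random grid is upscaled to `(feature_res + 1) * cell_size` and then cropped at a random `(y_offset, x_offset)` within one cell. A hedged sketch of that pattern using `numpy.random.Generator` (`rng.random`, `rng.integers`) instead of the flagged legacy calls. `rise_like_masks` is hypothetical, and for brevity it upsamples with nearest-neighbour repetition rather than the bilinear upsampling described for RISE:

```python
import numpy as np


def rise_like_masks(input_size, number_of_masks, feature_res, p_keep, rng):
    """Sketch of RISE-style image masks using numpy.random.Generator.

    Simplification (assumption): nearest-neighbour upsampling via np.repeat.
    """
    cell_size = np.ceil(np.array(input_size) / feature_res).astype(int)
    masks = np.empty((number_of_masks, *input_size), dtype=np.float32)
    for i in range(number_of_masks):
        # Coarse (feature_res + 1) x (feature_res + 1) grid of keep/mask decisions.
        grid = rng.random((feature_res + 1, feature_res + 1)) < p_keep
        # Upsample each grid cell to cell_size pixels.
        up = np.repeat(np.repeat(grid, cell_size[0], axis=0), cell_size[1], axis=1)
        # Random shift within one cell; rng.integers replaces np.random.randint.
        y_offset = rng.integers(0, cell_size[0])
        x_offset = rng.integers(0, cell_size[1])
        masks[i] = up[y_offset:y_offset + input_size[0],
                      x_offset:x_offset + input_size[1]]
    return masks
```

The extra grid cell guarantees the crop always fits: the upsampled grid is one full cell larger than the input in each direction, and each offset is smaller than one cell.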
```python
Returns:
    The generated masks (np.ndarray)
"""
grid = np.random.random(size=(number_of_masks, number_of_features,
```

SonarCloud notice (Low): prefer `numpy.random.Generator` over this legacy `numpy.random` function.
`dianna/utils/maskers.py`: Fixed
@cwmeijer (Contributor, Author) commented Jan 25, 2024

Some examples of univariate masks of length 18 with 6 features (one mask per row):

```
0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1
0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0
0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1
1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1
0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0
1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0
1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1
0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1
```
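Each example row above unmasks exactly 9 of its 18 steps, in a few contiguous runs. One hedged way to get masks with those properties: draw random values at a few control points, interpolate them to the series length with a random offset, and unmask the `p_keep` fraction of steps with the highest interpolated values. This is a sketch under those assumptions, not necessarily the PR's exact algorithm; `interpolated_time_masks` is a hypothetical name:

```python
import numpy as np


def interpolated_time_masks(series_length, number_of_masks, number_of_features, p_keep, rng):
    """Hypothetical sketch: smooth random curves thresholded into 0/1 time masks."""
    # Number of time steps to keep (unmask) in every mask.
    n_keep = int(round(p_keep * series_length))
    masks = np.zeros((number_of_masks, series_length), dtype=np.float32)
    # Control points 0 .. number_of_features + 1: one extra point so that the
    # random offset can shift the curve by up to one feature cell.
    control_x = np.arange(number_of_features + 2)
    for i in range(number_of_masks):
        control_y = rng.random(number_of_features + 2)
        offset = rng.random()
        # Position of each time step on the feature axis, shifted by the offset.
        x = offset + np.arange(series_length) * number_of_features / series_length
        curve = np.interp(x, control_x, control_y)
        # Unmask the n_keep time steps with the highest curve values.
        keep = np.argsort(curve)[series_length - n_keep:]
        masks[i, keep] = 1.0
    return masks


rng = np.random.default_rng(0)
masks = interpolated_time_masks(18, 19, 6, 0.5, rng)
print(masks.astype(int))
```

Ranking the interpolated values, instead of thresholding each one independently, fixes the number of unmasked steps per mask, while the smooth curve keeps neighbouring steps together.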


```python
masks = np.empty(masks_shape, dtype=np.float32)
for i_mask in range(masks.shape[0]):
    offset = np.random.random()
```

SonarCloud notice (Low): prefer `numpy.random.Generator` over this legacy `numpy.random` function.
@cwmeijer removed the request for review from cpranav93 on February 12, 2024 09:51
@cwmeijer (Contributor, Author) commented:

For anyone reviewing this PR: please first have a look at my blog post, which explains this approach at a high level: https://medium.com/@cwmeijer/how-to-correctly-mask-time-series-data-to-use-with-xai-90247ac252b4

```python
from dianna.utils.maskers import generate_channel_masks
from dianna.utils.maskers import generate_masks
from dianna.utils.maskers import generate_time_step_masks
```

Review comment (Contributor):

Can we also test the function `_generate_interpolated_float_masks_for_image`?
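One way to test a mask generator like this is to check properties rather than exact values. The signature of `_generate_interpolated_float_masks_for_image` is not shown in this thread, so the sketch below exercises a hypothetical stand-in, `interpolated_float_masks_for_image`, that upsamples a random grid with separable linear interpolation; a real test would call the DIANNA function instead:

```python
import numpy as np


def interpolated_float_masks_for_image(image_shape, number_of_masks, grid_res, rng):
    """Hypothetical stand-in: random grid upsampled with separable linear interpolation."""
    h, w = image_shape
    ys = np.linspace(0, grid_res - 1, h)
    xs = np.linspace(0, grid_res - 1, w)
    grid_pts = np.arange(grid_res)
    masks = np.empty((number_of_masks, h, w), dtype=np.float32)
    for i in range(number_of_masks):
        grid = rng.random((grid_res, grid_res))
        # Interpolate each grid row along x, giving shape (grid_res, w).
        rows = np.stack([np.interp(xs, grid_pts, grid[r]) for r in range(grid_res)])
        # Interpolate each resulting column along y, giving shape (h, w).
        masks[i] = np.stack([np.interp(ys, grid_pts, rows[:, c]) for c in range(w)], axis=1)
    return masks


def test_interpolated_float_masks_shape_and_range():
    rng = np.random.default_rng(0)
    masks = interpolated_float_masks_for_image((28, 32), number_of_masks=5, grid_res=4, rng=rng)
    assert masks.shape == (5, 28, 32)
    assert masks.dtype == np.float32
    # Linear interpolation of values in [0, 1) stays within [0, 1].
    assert masks.min() >= 0.0 and masks.max() <= 1.0
```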

@SarahAlidoost (Contributor) left a comment:

@cwmeijer, thank you for implementing this. I read the blog post and checked the implementation. The code could benefit from more documentation, but I saw that this will be addressed later in a separate issue. I left some minor suggestions and comments.

@cwmeijer added this to Ready for review in SS Sprint 6 on Feb 28, 2024
@cwmeijer merged commit b656425 into main on Feb 28, 2024
18 checks passed
@cwmeijer deleted the 546-masking-time-step-segmentation branch on February 28, 2024 15:47
@cwmeijer moved this from Ready for review to Done in SS Sprint 6 on Mar 5, 2024
Labels: None yet
Development: successfully merging this pull request may close the issue "smarter segmentation while creating masks in timeseries".
3 participants