546 masking time step segmentation #562

cwmeijer · 2023-04-18T09:19:54Z

Work in progress: Fixes #546.

This should now result in nice masks with clustered time steps.

time masks
(0=masked, 1=unmasked)
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 1 1 1 1
^ channels/ time -->

(0=masked, 1=unmasked)
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
0 0 0 0 0 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0
^ channels/ time -->

(0=masked, 1=unmasked)
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 1 0 0 0 0
^ channels/ time -->
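
For readers who want to reproduce the printed layout, here is a minimal sketch of how a single time-step mask can be repeated over all channels and rendered as above. The helper names `broadcast_time_mask` and `format_mask` are made up for illustration and are not part of dianna.

```python
import numpy as np


def broadcast_time_mask(time_mask, n_channels):
    """Repeat a 1D time-step mask over all channels (rows = channels, columns = time)."""
    time_mask = np.asarray(time_mask, dtype=bool)
    return np.tile(time_mask, (n_channels, 1))


def format_mask(mask):
    """Render a boolean mask as rows of 0/1, one row per channel."""
    return "\n".join(" ".join(str(int(v)) for v in row) for row in mask)


# The first example above: 6 unmasked steps, 10 masked steps, 4 unmasked steps.
time_mask = [1] * 6 + [0] * 10 + [1] * 4
mask = broadcast_time_mask(time_mask, n_channels=6)
print(format_mask(mask))
```

Because the mask is defined per time step, every channel row is identical, which is what the three examples show.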



Still needs to be done before merging:
- Refactoring / polishing
   - docstrings
   - the default value for number_of_features should always be None (meaning the original series length). Clustering, projecting, etc. should be optional, as they are probably not something the user would expect.
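
A rough sketch of the intended default, assuming `None` simply falls back to one feature per time step. The helper `resolve_number_of_features` and its validation are hypothetical and only illustrate the to-do item above.

```python
def resolve_number_of_features(series_length, number_of_features=None):
    """Return the number of features to use; None means one feature per time step."""
    if number_of_features is None:
        # No clustering requested: every time step is its own feature.
        return series_length
    if number_of_features < 1:
        # Assumption for this sketch: at least one feature is required.
        raise ValueError("number_of_features must be at least 1")
    return number_of_features


print(resolve_number_of_features(18))     # falls back to the series length
print(resolve_number_of_features(18, 6))  # explicit value is kept
```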

I spent most of my time on making the RISE test with the expert model pass. It now passes in many cases, although it can still be a bit brittle, for instance for certain short series_lengths. In the general case it now seems to work reliably.

geek-yang · 2023-04-25T09:39:10Z

dianna/utils/maskers.py


+
+def generate_channel_masks(input_data: np.ndarray, number_of_masks: int, p_keep: float):


Reminder: merge this function with generate_time_step_masks to reduce code duplication, action point from PR #554
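
One possible shape for such a merged function is sketched below. It is hypothetical: the name `generate_axis_masks`, the `axis` and `rng` parameters, and the assumed data layout (time, channels) are not taken from the PR, and it uses plain independent random masking rather than the segmented strategy.

```python
import numpy as np


def generate_axis_masks(input_data, number_of_masks, p_keep, axis, rng=None):
    """Hypothetical shared helper: mask whole slices of `input_data` along one axis.

    Assuming data shaped (time, channels): axis=0 masks entire time steps,
    axis=1 masks entire channels.
    """
    rng = np.random.default_rng() if rng is None else rng
    keep_shape = [1] * input_data.ndim
    keep_shape[axis] = input_data.shape[axis]
    # One keep/mask decision per slice along `axis`, broadcast over the other axes.
    keep = rng.random((number_of_masks, *keep_shape)) < p_keep
    return np.broadcast_to(keep, (number_of_masks, *input_data.shape)).copy()


data = np.zeros((20, 6))  # 20 time steps, 6 channels
channel_masks = generate_axis_masks(data, 4, 0.5, axis=1, rng=np.random.default_rng(0))
time_masks = generate_axis_masks(data, 4, 0.5, axis=0, rng=np.random.default_rng(0))
print(channel_masks.shape, time_masks.shape)  # (4, 20, 6) (4, 20, 6)
```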

dianna/utils/maskers.py

+    ceil = np.ceil(mean)
+    if floor != ceil:
+        user_requested_steps = int(
+            np.random.choice([floor, ceil], 1, p=[ceil - mean, mean - floor]))
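
The hunk above rounds a fractional step count to `floor` or `ceil` at random, weighted so that the average equals `mean`. A standalone sketch of that idea follows; the function name `stochastic_round` and the use of a `numpy.random.Generator` are choices made for this example, not the PR's code.

```python
import numpy as np


def stochastic_round(mean, rng):
    """Round `mean` to floor or ceil at random so that the expected value equals `mean`."""
    floor = int(np.floor(mean))
    ceil = int(np.ceil(mean))
    if floor == ceil:
        return floor
    # P(floor) = ceil - mean and P(ceil) = mean - floor; the two probabilities sum to 1.
    return int(rng.choice([floor, ceil], p=[ceil - mean, mean - floor]))


rng = np.random.default_rng(0)
samples = [stochastic_round(7.25, rng) for _ in range(20_000)]
print(np.mean(samples))  # close to 7.25
```

With a mean of 7.25 the result is 7 with probability 0.75 and 8 with probability 0.25, so 0.75 * 7 + 0.25 * 8 = 7.25. A test asserting one fixed count per mask would no longer hold, which is consistent with the commit note about the old test becoming invalid.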


dianna/utils/maskers.py


    Returns:
        The generated masks (np.ndarray)
    """
-    cell_size = np.ceil(np.array(input_size) / feature_res)
-    up_size = (feature_res + 1) * cell_size
+    grid = np.random.choice(a=(True, False),


dianna/utils/maskers.py

+    up_size = (number_of_features + 1) * cell_size
+    masks = np.empty((number_of_masks, *input_size), dtype=np.float32)
+    for i in range(masks.shape[0]):
+        y_offset = np.random.randint(0, cell_size[0])


dianna/utils/maskers.py

+    masks = np.empty((number_of_masks, *input_size), dtype=np.float32)
+    for i in range(masks.shape[0]):
+        y_offset = np.random.randint(0, cell_size[0])
+        x_offset = np.random.randint(0, cell_size[1])
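
For context, the hunks above follow the usual RISE recipe: draw a coarse random grid, upsample it to slightly more than the input size, and crop at a random offset. Below is a simplified, self-contained sketch. The variable names `number_of_features`, `cell_size`, `y_offset` and `x_offset` follow the diff; the function name `rise_like_masks`, the `rng` parameter and the nearest-neighbour upsampling are assumptions made to keep the example NumPy-only.

```python
import numpy as np


def rise_like_masks(input_size, number_of_masks, number_of_features, p_keep, rng):
    """Generate binary 2D masks by upsampling a coarse random grid and cropping at a random offset."""
    input_size = np.asarray(input_size)
    cell_size = np.ceil(input_size / number_of_features).astype(int)
    # One extra cell per axis so that any shifted crop still fits inside the upsampled grid.
    grid = rng.random((number_of_masks, number_of_features + 1, number_of_features + 1)) < p_keep
    masks = np.empty((number_of_masks, *input_size), dtype=np.float32)
    for i in range(number_of_masks):
        # Nearest-neighbour upsampling (a simplification; smooth interpolation is a common alternative).
        upsampled = np.repeat(np.repeat(grid[i], cell_size[0], axis=0), cell_size[1], axis=1)
        # Independent random shift per mask, smaller than one cell in each direction.
        y_offset = rng.integers(0, cell_size[0])
        x_offset = rng.integers(0, cell_size[1])
        masks[i] = upsampled[y_offset:y_offset + input_size[0],
                             x_offset:x_offset + input_size[1]]
    return masks


masks = rise_like_masks((10, 12), number_of_masks=5, number_of_features=4,
                        p_keep=0.5, rng=np.random.default_rng(1))
print(masks.shape)  # (5, 10, 12)
```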


dianna/utils/maskers.py

+    Returns:
+        The generated masks (np.ndarray)
+    """
+    grid = np.random.random(size=(number_of_masks, number_of_features,


cwmeijer · 2024-01-25T09:52:50Z

I resolved all of @geek-yang's comments and suggestions.
I merged the latest main.
Out of scope are:
- high-level user documentation explaining the mask generation strategy (Add documentation/docstrings/examples of strategic masking based on wave function #564, https://medium.com/@cwmeijer/how-to-correctly-mask-time-series-data-to-use-with-xai-90247ac252b4)
- an optional, completely random mask generation strategy (see Add naive random masking strategy for masking timeseries #565)

Some examples of univariate masks of length 18 with 6 features:
0 0 0 1 1 1 1 1 1 0 0 0 0 0 0 1 1 1
0 0 0 0 1 1 0 0 0 0 0 1 1 1 1 1 1 1
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0
0 0 0 1 1 1 1 1 0 0 0 1 1 1 1 0 0 0
0 0 0 0 1 1 0 0 0 0 1 1 1 1 1 1 1 0
0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1
0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 1 1 1 1 0 0 0 0 0 0 1 1 1 1 1
1 1 1 1 0 0 0 0 0 0 0 1 1 1 1 1 0 0
1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 1 1 1
0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0
1 0 0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
1 1 1 1 1 1 1 0 0 0 0 0 0 1 1 0 0 0
1 1 0 0 0 0 0 1 1 1 1 0 0 0 0 1 1 1
0 0 0 1 1 1 1 0 0 0 0 1 1 1 1 1 0 0
0 0 0 0 0 0 0 1 1 1 1 1 1 1 1 1 0 0
0 0 0 0 0 1 0 0 0 0 1 1 1 1 1 1 1 1
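
Masks with runs like these can be produced by stretching a coarse random grid over the time axis. The sketch below is a simplified illustration, not the code in this PR: the function name `segmented_time_masks`, the linear interpolation with `np.interp`, and the thresholding are all assumptions, and the kept fraction only roughly follows `p_keep`.

```python
import numpy as np


def segmented_time_masks(series_length, number_of_masks, number_of_features, p_keep, rng):
    """Sketch: stretch a coarse random grid over the time axis so kept/masked steps come in runs."""
    masks = np.empty((number_of_masks, series_length), dtype=bool)
    # Position of every time step measured in grid cells (each feature spans one cell).
    steps = np.arange(series_length) * number_of_features / series_length
    # Two extra grid points so the shifted window always stays inside the grid.
    grid_positions = np.arange(number_of_features + 2)
    for i_mask in range(number_of_masks):
        grid = rng.random(number_of_features + 2)  # one random value per grid point
        offset = rng.random()                      # independent shift per mask, in [0, 1) cells
        values = np.interp(steps + offset, grid_positions, grid)
        masks[i_mask] = values > (1 - p_keep)      # True = unmasked
    return masks


masks = segmented_time_masks(series_length=18, number_of_masks=5, number_of_features=6,
                             p_keep=0.5, rng=np.random.default_rng(0))
for row in masks:
    print(" ".join(str(int(v)) for v in row))
```

Drawing the offset inside the loop gives every mask its own shift, so segment boundaries do not line up across masks.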

dianna/utils/maskers.py

+
+    masks = np.empty(masks_shape, dtype=np.float32)
+    for i_mask in range(masks.shape[0]):
+        offset = np.random.random()


cwmeijer · 2024-02-14T11:02:41Z

For anyone reviewing this PR: please first have a look at my blog post, which explains this approach at a high level: https://medium.com/@cwmeijer/how-to-correctly-mask-time-series-data-to-use-with-xai-90247ac252b4

SarahAlidoost · 2024-02-14T14:54:17Z

tests/methods/test_maskers.py

 from dianna.utils.maskers import generate_channel_masks
 from dianna.utils.maskers import generate_masks
+from dianna.utils.maskers import generate_time_step_masks


Can we also test the function _generate_interpolated_float_masks_for_image?

SarahAlidoost

@cwmeijer thank you for implementing this. I read the blog and checked the implementation. The code can benefit from more documentation, but I saw that this will be addressed in a different issue later. I left some minor suggestions/comments.

Co-authored-by: SarahAlidoost <55081872+SarahAlidoost@users.noreply.github.com>

cwmeijer added 2 commits April 18, 2023 11:19

Merge branch 'main' into 546-masking-time-step-segmentation

d86462a

add draft working segmented time step masks (refs #546)

db439a5

geek-yang mentioned this pull request Apr 24, 2023

514 add channel masking #554

Merged

refactor time series maksing WIP

e944e4e

geek-yang reviewed Apr 25, 2023

cwmeijer added 7 commits April 26, 2023 09:27

refactor timeseries masks

beeea4e

add some general masks tests that also print masks

fb616b2

remove old time step mask function WIP

2ed154c

This may have introduced an error as the RISE time series tests no longer pass.

Merge branch 'main' into 546-masking-time-step-segmentation

e82022d

many segmented time step masking fixes

aafaf8d

Merge remote-tracking branch 'refs/remotes/origin/546-masking-time-st…

4bc1436

…ep-segmentation' into 546-masking-time-step-segmentation

add feature_res configurable for rise timeseries

3acb117

cwmeijer added this to To do in SS Sprint 1 - Pay off our technical debt Oct 3, 2023

cwmeijer added this to In progress in SS Sprint 2 - Pay off our technical debt Oct 30, 2023

cwmeijer self-assigned this Oct 30, 2023

cwmeijer added this to In progress in SS Sprint 2 - Pay off our technical debt Nov 6, 2023

cwmeijer added 2 commits November 20, 2023 10:22

fix bug swapped arguments p_keep and num_features and 2 test usages

daefb59

add failing test that checks number of masked cells in maskers

2f66363

cwmeijer added this to In progress in SS Sprint 3 - improve code quality, adding tabular funcitonality Nov 22, 2023

cwmeijer added 10 commits November 27, 2023 10:05

make test case temperatures easier and configurable

0acaeb8

add failing tests for masker

8975ad9

add printing to test for debugging (WIP)

dd6af61

Merge branch 'main' into 546-masking-time-step-segmentation

4b29c1f

parameterize rise time series test synthetic data

9d0f821

introduce failing test

split masking for image and time series and fix time series

8de390b

add tests refactor: rename parameter

fix error in consistent p_keep

47f663d

The error made the rise_for_timeseries tests fail as it couldn't find the correct hot/cold days in the synthetic tests with this bug.

make mask number condition more general

bddaf29

make masked time steps number stochastic

8f15c58

to have a more accurate expected masked step count in the statistical sense of the word expected (mean). This also means that the test for this is no longer valid. Also removes a duplicate parametric test.

lower n_masks in test to save ~15 seconds testing

d4789be

cwmeijer added this to In progress in SS Sprint 4 - Complete WP1 Jan 8, 2024

cwmeijer added 5 commits January 24, 2024 17:20

Merge remote-tracking branch 'origin/546-masking-time-step-segmentati…

728139a

…on' into 546-masking-time-step-segmentation

Merge branch 'main' into 546-masking-time-step-segmentation

463b91f

# Conflicts: # tests/methods/test_rise_timeseries.py

remove projection.ipynb temp notebook

906fe05

fix import error after merge

ba79712

replace deprecated np.bool with bool

6537d9a

cwmeijer marked this pull request as ready for review January 25, 2024 09:47

github-advanced-security bot found potential problems Jan 25, 2024

cwmeijer added 2 commits January 25, 2024 11:14

add projected mask test

711d95b

fix: random offset is now independent for each mask

778da8a

github-advanced-security bot found potential problems Jan 29, 2024

elboyran requested a review from cpranav93 February 2, 2024 13:50

cwmeijer removed the request for review from cpranav93 February 12, 2024 09:51