
Fix empty data loading in Insights #728

Closed · wants to merge 1 commit

Conversation

@Reubend (Contributor) commented Aug 3, 2021:

Previously, once Captum Insights had shown every batch in the dataset, it would stop showing any data on subsequent fetches. This change makes it recycle some batches, so that the user can keep clicking "Fetch Data" to see the effect that their settings have on the results.

This was reported in #686
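
A minimal sketch of the recycling idea described above (not the actual patch; the class and attribute names are invented for illustration):

class BatchServer:
    """Serve fresh batches while the dataset iterator lasts, then
    replay a small cache so "Fetch Data" keeps returning results."""

    def __init__(self, dataset, cache_size=4):
        self._dataset_iter = iter(dataset)
        self._batch_cache = []        # keeps only the most recent batches
        self._cache_size = cache_size
        self._replay_index = 0

    def next_batch(self):
        try:
            batch = next(self._dataset_iter)
        except StopIteration:
            # Dataset exhausted: recycle the cached batches round-robin
            # (assumes at least one batch was fetched before exhaustion).
            batch = self._batch_cache[self._replay_index % len(self._batch_cache)]
            self._replay_index += 1
            return batch
        # Remember the batch, but bound memory by keeping only a few of them.
        self._batch_cache.append(batch)
        if len(self._batch_cache) > self._cache_size:
            self._batch_cache.pop(0)
        return batch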

@facebook-github-bot (Contributor):

@Reubend has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.


@@ -439,7 +441,22 @@ def _calculate_vis_output(
        return results if results else None

    def _get_outputs(self) -> List[Tuple[List[VisualizationOutput], SampleCache]]:
        batch_data = next(self._dataset_iter)
        # If we run out of new betches, then we need to
Contributor (review comment on the line above):

betches -> batches ?

@NarineK (Contributor) commented Aug 11, 2021:

LGTM! @bilalsal, @edward-io, do you have any comments on this issue?

@@ -199,6 +200,7 @@ class scores.
        self._outputs: List[VisualizationOutput] = []
        self._config = FilterConfig(prediction="all", classes=[], num_examples=4)
        self._dataset_iter = iter(dataset)
Contributor (review comment on the line above):

Wouldn't your PR be simpler if you constructed this as cycle(iter(dataset)), since it seems you're effectively reimplementing that?

@Reubend (Contributor, Author) replied:

It would be much simpler that way, but my concern is memory usage. Per the docs, it seems that cycle

    may require significant auxiliary storage (depending on the length of the iterable)

which suggests to me that it stores every element it has seen and then loops over them again. If the dataset is huge, that could be a problem, because every batch would end up held in memory. That's why I made the cache store only a few batches, rather than cycling over the entire dataset.

However, if you don't think memory is an issue, I could do it that way instead. It would probably only matter if somebody requested a lot of attributions.
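
For context, an illustration (not code from the PR) of why itertools.cycle can be costly: it keeps an internal copy of every element it yields, so after one pass over a batch iterator the whole dataset is held in memory.

from itertools import cycle

def batches():
    for i in range(3):
        yield f"batch_{i}"  # stand-in for a real data batch

looped = cycle(batches())
print([next(looped) for _ in range(7)])
# ['batch_0', 'batch_1', 'batch_2', 'batch_0', 'batch_1', 'batch_2', 'batch_0']
# cycle() saved a copy of all three elements after the first pass; with large
# tensors as batches, that saved copy amounts to the entire dataset.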

Contributor replied:

Makes sense, thanks for clarifying. Probably best to assume memory will be an issue.

@edward-io (Contributor) left a comment:

Looks good to me!

@facebook-github-bot (Contributor):

@Reubend merged this pull request in e222265.
