Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change TextPlugin to compute index_impl() asynchronously #663

Merged
merged 2 commits into from
Oct 26, 2017

Conversation

nfelt
Copy link
Contributor

@nfelt nfelt commented Oct 20, 2017

Addresses #625 by changing TextPlugin to avoid blocking its is_active() method on reading from the filesystem. Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem. In my testing this could easily cause is_active() to take 30+ seconds that kind of scenario, which in turn blocks the /plugins_listing response for the whole of TensorBoard.

Since the TextPlugin's is_active() is based on its index_impl() (which backs its /tags route) and the expensive filesystem-checking logic is shared, this change puts the entirety of index_impl() on a background thread. This way the /tags route can also benefit from not blocking.

The semantics are now that is_active() will return False and index_impl() will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response. Once the plugin is detected to be active, is_active() will no longer kick off new background threads when invoked, but index_impl() will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.

@nfelt nfelt requested a review from chihuahua October 23, 2017 18:34
Copy link
Member

@chihuahua chihuahua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! This logic makes sense to me. The dashboard fetches tags every time it reloads. At that time, we can initiate a thread to determine if the plugin is active.

Sorry for the late reply. I've been at an all-day event. :)

self._thread_for_determining_is_active = new_thread
new_thread.start()
return False

def _determine_is_active(self):
"""Determines whether the thread is active.
"""Determines whether the plugin is active.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix.

def tags_impl(self):
# Recompute the index on demand whenever tags are requested, but do it
# in a separate thread to avoid blocking.
self._maybe_launch_index_impl_thread()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The projector plugin makes an assumption: Once it is deemed to be active, it never becomes inactive. Hence, you might consider assuming the same here - cache when the plugin is found to be active and never launch a separate thread again following that ... I think I'll defer to you to make the judgement here though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I actually opted to cache a little more aggressively than the projector plugin - the whole result of index_impl() is cached in self._index_cached, which has the benefit that tags_impl() can also return without blocking. However, unlike is_active() we don't want tags_impl() to stop refreshing the cache once it's gotten a positive result, because even a non-empty index (one that corresponds to an active plugin) may not be complete if more text summary data is still being written out to the event files. So tags_impl() needs to keep launching that background thread to refresh the cache as long as it's still being invoked itself.

Once we're at that point, we could either decide to separately cache the is_active() state permanently once it's determined to be True. This would have the benefit that is_active() would be even cheaper once it's true. However, I figured that it was slightly more complex to have two different pieces of cached data, and that we might as well just keep is_active() simple and have it reflect whatever the current active state is based on the last refresh of _index_cached - after all, if the plugin really doesn't ever go inactive after being active, the end result is the same.

Note that I do make it so that is_active() itself won't launch a new thread if the current state is active. It's only via tags_impl() that a new thread will be kicked off unconditionally; is_active() will take advantage of fresher data if it's present but won't actually refresh the data itself once the plugin is deemed active.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. We still need to launch the thread within tags_impl so that requests for tags stay fresh. SG.

tf.logging.info('TextPlugin computing index_impl() in a new thread')
self._index_cached = self.index_impl()
self._index_impl_lock.release()
self._index_impl_thread = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should self._index_impl_thread = None run before the lock is released? Otherwise, _index_impl_thread could be set to a separate thread in between? And we'd be releasing a newly started thread (and leaving the old one open)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! You're right, semantically it should definitely be set to None prior to releasing the lock. That said, _index_impl_thread actually only exists to facilitate testing (it's never read outside the test code) so in practice it's a wash. But fixed anyway.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah - true. That would've been innocuous.

Copy link
Member

@chihuahua chihuahua left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

tf.logging.info('TextPlugin computing index_impl() in a new thread')
self._index_cached = self.index_impl()
self._index_impl_lock.release()
self._index_impl_thread = None
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yeah - true. That would've been innocuous.

def tags_impl(self):
# Recompute the index on demand whenever tags are requested, but do it
# in a separate thread to avoid blocking.
self._maybe_launch_index_impl_thread()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, I see what you mean. We still need to launch the thread within tags_impl so that requests for tags stay fresh. SG.

@nfelt nfelt merged commit dac74be into tensorflow:master Oct 26, 2017
@nfelt nfelt deleted the textplugin-indeximpl-async branch October 26, 2017 23:22
jart pushed a commit to jart/tensorboard that referenced this pull request Oct 31, 2017
)

Addresses tensorflow#625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem.  Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem.  In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard.

Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread.  This way the `/tags` route can also benefit from not blocking.

The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response.  Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.
jart pushed a commit that referenced this pull request Nov 1, 2017
Addresses #625 by changing TextPlugin to avoid blocking its `is_active()` method on reading from the filesystem.  Note that "reading from the filesystem" is not only reading once, but reading in all plugin assets from all runs, which could be dozens of reads per run, potentially over the network if using a remote filesystem.  In my testing this could easily cause `is_active()` to take 30+ seconds that kind of scenario, which in turn blocks the `/plugins_listing` response for the whole of TensorBoard.

Since the TextPlugin's `is_active()` is based on its `index_impl()` (which backs its `/tags` route) and the expensive filesystem-checking logic is shared, this change puts the entirety of `index_impl()` on a background thread.  This way the `/tags` route can also benefit from not blocking.

The semantics are now that `is_active()` will return False and `index_impl()` will return a placeholder empty response when first invoked, but they'll kick off at most one background thread at a time to do the computation and store the resulting index response.  Once the plugin is detected to be active, `is_active()` will no longer kick off new background threads when invoked, but `index_impl()` will still kick off a new background thread when called (if one is not running already) so that refreshes will pick up new data as it gets written to the logdir.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants