This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Fix #8518 (sync requests being cached wrongly on timeout) #9358

Merged · 7 commits · Feb 24, 2021

1 change: 1 addition & 0 deletions changelog.d/9358.misc
@@ -0,0 +1 @@
+Added a fix that invalidates cache for empty timed-out sync responses.

3 changes: 2 additions & 1 deletion synapse/handlers/sync.py
@@ -277,8 +277,9 @@ async def wait_for_sync_for_user(
         user_id = sync_config.user.to_string()
         await self.auth.check_auth_blocking(requester=requester)

-        res = await self.response_cache.wrap(
+        res = await self.response_cache.wrap_conditional(
             sync_config.request_key,
+            lambda result: since_token != result.next_batch,

Member:

This should have had a comment. Why does doing this give the right behaviour? (I shouldn't have to go and find the PR that added it, and even having done so it's hard to correlate the description to the code.)

Contributor Author (ShadowJonathan), Mar 2, 2021:

It was described in the PR description, but I'll add a comment here in another PR, i.e. "small fixes for ResponseCache".

Member:

> It was described in the PR description,

Yeah, as I said: having to go and find the PR and grok it is a pain.

Contributor Author (ShadowJonathan):

Alright, I'll add a comment describing what that conditional does 👍

             self._wait_for_sync_for_user,
             sync_config,
             since_token,
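
For context, a hedged sketch of how the call site might read once the promised comment lands; the comment text is an assumption based on the PR title and changelog, not committed code:

        # Only cache the sync result for its timeout window if it actually
        # contains new data: a sync that timed out with nothing new to report
        # has next_batch == since_token, and caching that empty response would
        # keep serving it to clients that retry with the same since token.
        res = await self.response_cache.wrap_conditional(
            sync_config.request_key,
            lambda result: since_token != result.next_batch,
            self._wait_for_sync_for_user,
            sync_config,
            since_token,
            # ... remaining arguments as in the diff above
        )
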
34 changes: 32 additions & 2 deletions synapse/util/caches/response_cache.py
@@ -13,7 +13,7 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.
 import logging
-from typing import TYPE_CHECKING, Any, Callable, Dict, Generic, Optional, TypeVar
+from typing import TYPE_CHECKING, Any, Callable, Dict, Generic, Optional, Set, TypeVar

 from twisted.internet import defer

@@ -40,6 +40,7 @@ class ResponseCache(Generic[T]):
     def __init__(self, hs: "HomeServer", name: str, timeout_ms: float = 0):
         # Requests that haven't finished yet.
         self.pending_result_cache = {}  # type: Dict[T, ObservableDeferred]
+        self.pending_conditionals = {}  # type: Dict[T, Set[Callable[[Any], bool]]]

         self.clock = hs.get_clock()
         self.timeout_sec = timeout_ms / 1000.0

@@ -101,7 +102,11 @@ def set(self, key: T, deferred: defer.Deferred) -> defer.Deferred:
         self.pending_result_cache[key] = result

         def remove(r):
-            if self.timeout_sec:
+            should_cache = all(
+                func(r) for func in self.pending_conditionals.pop(key, [])
+            )
+
+            if self.timeout_sec and should_cache:
                 self.clock.call_later(
                     self.timeout_sec, self.pending_result_cache.pop, key, None
                 )

@@ -112,6 +117,31 @@ def remove(r):
         result.addBoth(remove)
         return result.observe()

+    def add_conditional(self, key: T, conditional: Callable[[Any], bool]):
+        self.pending_conditionals.setdefault(key, set()).add(conditional)

Comment on lines +120 to +121

Member:

This seems like a dangerous thing to add to the public interface. It seems like it would be very easy to use in a racy way.

Contributor Author (ShadowJonathan):

Should I signal that add_conditional is private by prefixing it with _ (in a new PR)?

Member:

Since it's only called in one place, I'd just inline it.
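
For illustration, a rough sketch of that inlining inside wrap_conditional (shown in the hunk just below); this is an assumption about the follow-up, not code in this PR:

        # Inlined registration, replacing the call to add_conditional():
        result = self.get(key)
        if not result or (isinstance(result, defer.Deferred) and not result.called):
            self.pending_conditionals.setdefault(key, set()).add(should_cache)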


+    def wrap_conditional(
+        self,
+        key: T,
+        should_cache: Callable[[Any], bool],
+        callback: "Callable[..., Any]",
+        *args: Any,
+        **kwargs: Any
+    ) -> defer.Deferred:
+        """The same as wrap(), but adds a conditional to the final execution.
+
+        When the final execution completes, *all* conditionals need to return True for it to properly cache,
+        else it'll not be cached in a timed fashion.
+        """
+
+        # See if there's already a result on this key that hasn't yet completed. Due to the single-threaded nature of
+        # python, adding a key immediately in the same execution thread will not cause a race condition.
+        result = self.get(key)
+        if not result or isinstance(result, defer.Deferred) and not result.called:

Member:

I think this is correct, but I had to go and look up the precedence of and vs or to check it. Please use parens to disambiguate in future.

+            self.add_conditional(key, should_cache)
Comment on lines +140 to +141

Member (richvdh), Mar 2, 2021:

Is it correct that the conditional is added when not result.called? I'd argue that cacheability of the result should be a property of the callback being used (and hence set at the same time as that callback is set) rather than of any subsequent callbacks (which are discarded).

Contributor Author (ShadowJonathan), Mar 2, 2021:

Operator precedence (from loosest to tightest binding) is or, and, not, so this is actually

(not result) or (isinstance(result, defer.Deferred) and (not result.called))

I had to check after your previous comment. It could actually be if not result and not result.called, but because result can be a plain return value, I need to insert the isinstance(..., defer.Deferred) check.

Yes, the conditional is added only to be executed after the result is called. I do agree, now that I realise it, that any future call could effectively say "hey, this shouldn't be cached anyway, because the returned value doesn't agree with what I already got from somewhere else". In that case I'd need to rewrite this to execute the conditional locally whenever it's called, and evict the cache entry when it isn't valid according to the new conditional. (The callLater already scheduled would simply execute anyway; to my knowledge there's no way to properly cancel it early at that point. If it were an asyncio Task, I could map it somewhere and then .cancel() it; if that's possible with callLater handles or something, please tell me.)

(This last approach (letting callLater still fire) could potentially evict a cache entry early if a subsequent quick wrap call has the same key, though I don't know how much of a problem that'd be, seeing as it'd evict the cache shortly after the call, and only if:

  • the cache entry was deemed to be time-cached (with conditionals or not);
  • after the function's return, another function calls wrap_conditional with a conditional that evaluates to False;
  • shortly after that wrap_conditional call and the then-immediate cache eviction, a function calls wrap* with the same key within the remaining timeout... wait.

Oh, this might actually be a problem: if the function (with the same key) hasn't returned yet on the second wrap* call, then the previous callLater will evict the deferred, which could lead to more wonky behaviour, and the fact that it evicts a call in flight is already bad enough.)

Member:

As discussed at considerable length in the room: I don't agree with your logic here. I think that supporting multiple should_cache callbacks per key significantly complicates the implementation, and is semantically dubious at best.

This comment was marked as outdated.

Contributor Author (ShadowJonathan), Mar 2, 2021:

Alright, I'll just make it a simple Dict[T, Callable] mapping then 👍

Contributor Author (ShadowJonathan):

Actually, there's more discussion on this in #9522.
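
A hedged sketch of the simpler bookkeeping agreed on here, with at most one conditional per key (an assumption about the follow-up, not code in this PR):

        # Hypothetical follow-up: a single conditional per key.
        self.pending_conditionals = {}  # type: Dict[T, Callable[[Any], bool]]

        # Registration in wrap_conditional() becomes a plain assignment:
        self.pending_conditionals[key] = should_cache

        # And remove() evaluates at most one conditional instead of all():
        conditional = self.pending_conditionals.pop(key, None)
        should_cache = conditional is None or conditional(r)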

Comment on lines +139 to +141

Member:

I feel like this is duplicating half the logic of wrap, which is kind of the worst of all worlds. Could you not make should_cache nullable, and have wrap call wrap_conditional rather than the other way around?

Contributor Author (ShadowJonathan):

I initially did that, but it became a bit of a mess, and you'd have had to shift around the positional arguments on any wrap calls currently in place.

I also didn't want to make it a keyword argument, because that would shadow any potential keyword arguments to the inner call. While I know it is trivial to "just add it", I didn't want to, because it is non-explicit to anyone not aware of this change to ResponseCache, and so it could easily become a bug.

Member:

> I initially did that, but it became a bit of a mess, and you'd have had to shift around the positional arguments on any wrap calls currently in place.

I'm not suggesting changing the signature of wrap. wrap would be:

    def wrap(
        self, key: T, callback: "Callable[..., Any]", *args: Any, **kwargs: Any
    ) -> defer.Deferred:
        return self.wrap_conditional(key, None, callback, *args, **kwargs)

Why is that a mess?

Contributor Author (ShadowJonathan):

Hmm, I'll change it to that in that PR, then 👍
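
A hedged sketch of the other half of that suggestion (shape only, not this PR's code; the lookup/compute logic that currently lives in wrap() is elided, and Optional is already in the typing import above):

    def wrap_conditional(
        self,
        key: T,
        should_cache: "Optional[Callable[[Any], bool]]",
        callback: "Callable[..., Any]",
        *args: Any,
        **kwargs: Any
    ) -> defer.Deferred:
        if should_cache is not None:
            result = self.get(key)
            if not result or (isinstance(result, defer.Deferred) and not result.called):
                self.pending_conditionals.setdefault(key, set()).add(should_cache)

        # ... followed by the lookup/compute logic that currently lives in
        # wrap(), so that wrap() can simply delegate here.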


+        return self.wrap(key, callback, *args, **kwargs)
+
     def wrap(
         self, key: T, callback: "Callable[..., Any]", *args: Any, **kwargs: Any
     ) -> defer.Deferred:
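
Finally, a deliberately synchronous toy (not Synapse code: no Deferreds, no eviction timer) illustrating the behaviour this PR is after: a conditional that rejects "empty" results keeps them out of the cache, so an identical follow-up request recomputes instead of being served the stale empty response.

from typing import Any, Callable, Dict


class TinyConditionalCache:
    """Toy stand-in for ResponseCache.wrap_conditional (synchronous, no timeout)."""

    def __init__(self) -> None:
        self._cache = {}  # type: Dict[str, Any]

    def wrap_conditional(
        self,
        key: str,
        should_cache: Callable[[Any], bool],
        callback: Callable[..., Any],
        *args: Any,
        **kwargs: Any,
    ) -> Any:
        if key in self._cache:
            return self._cache[key]
        result = callback(*args, **kwargs)
        # Only cache results the conditional approves of; an "empty" sync
        # response (next_batch == since_token) returns False here.
        if should_cache(result):
            self._cache[key] = result
        return result


cache = TinyConditionalCache()
since_token = "s1"

# Timed-out sync: nothing new, so the callback returns the same token.
first = cache.wrap_conditional("user1", lambda r: r != since_token, lambda: since_token)
assert first == since_token and "user1" not in cache._cache  # not cached

# A sync that made progress is cached and served on the next call.
cache.wrap_conditional("user1", lambda r: r != since_token, lambda: "s2")
assert cache.wrap_conditional("user1", lambda r: True, lambda: "s3") == "s2"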