This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Add some tracing spans to give insight into local joins #13439

Merged: 2 commits, Aug 3, 2022
1 change: 1 addition & 0 deletions changelog.d/13439.misc
@@ -0,0 +1 @@
+Add some tracing to give more insight into local room joins.
15 changes: 9 additions & 6 deletions synapse/handlers/message.py
@@ -52,6 +52,7 @@
 from synapse.events.snapshot import EventContext
 from synapse.events.validator import EventValidator
 from synapse.handlers.directory import DirectoryHandler
+from synapse.logging import opentracing
 from synapse.logging.context import make_deferred_yieldable, run_in_background
 from synapse.metrics.background_process_metrics import run_as_background_process
 from synapse.replication.http.send_event import ReplicationSendEventRestServlet
@@ -1372,9 +1373,10 @@ async def _persist_event(
         # and `state_groups` because they have `prev_events` that aren't persisted yet
         # (historical messages persisted in reverse-chronological order).
         if not event.internal_metadata.is_historical():
-            await self._bulk_push_rule_evaluator.action_for_event_by_user(
-                event, context
-            )
+            with opentracing.start_active_span("calculate_push_actions"):
+                await self._bulk_push_rule_evaluator.action_for_event_by_user(
+                    event, context
+                )
 
         try:
             # If we're a worker we need to hit out to the master.
@@ -1461,9 +1463,10 @@ async def cache_joined_hosts_for_event(
             state = await state_entry.get_state(
                 self._storage_controllers.state, StateFilter.all()
             )
-            joined_hosts = await self.store.get_joined_hosts(
-                event.room_id, state, state_entry
-            )
+            with opentracing.start_active_span("get_joined_hosts"):
+                joined_hosts = await self.store.get_joined_hosts(
+                    event.room_id, state, state_entry
+                )
 
             # Note that the expiry times must be larger than the expiry time in
             # _external_cache_joined_hosts_updates.
57 changes: 30 additions & 27 deletions synapse/handlers/room_member.py
@@ -32,6 +32,7 @@
 from synapse.events import EventBase
 from synapse.events.snapshot import EventContext
 from synapse.handlers.profile import MAX_AVATAR_URL_LEN, MAX_DISPLAYNAME_LEN
+from synapse.logging import opentracing
 from synapse.module_api import NOT_SPAM
 from synapse.storage.state import StateFilter
 from synapse.types import (
@@ -428,14 +429,14 @@ async def _local_membership_update(
             await self._join_rate_per_room_limiter.ratelimit(
                 requester, key=room_id, update=False
             )
 
-        result_event = await self.event_creation_handler.handle_new_client_event(
-            requester,
-            event,
-            context,
-            extra_users=[target],
-            ratelimit=ratelimit,
-        )
+        with opentracing.start_active_span("handle_new_client_event"):
+            result_event = await self.event_creation_handler.handle_new_client_event(
+                requester,
+                event,
+                context,
+                extra_users=[target],
+                ratelimit=ratelimit,
+            )
 
         if event.membership == Membership.LEAVE:
             if prev_member_event_id:
@@ -564,25 +565,26 @@ async def update_membership(
         # by application services), and then by room ID.
         async with self.member_as_limiter.queue(as_id):
             async with self.member_linearizer.queue(key):
-                result = await self.update_membership_locked(
-                    requester,
-                    target,
-                    room_id,
-                    action,
-                    txn_id=txn_id,
-                    remote_room_hosts=remote_room_hosts,
-                    third_party_signed=third_party_signed,
-                    ratelimit=ratelimit,
-                    content=content,
-                    new_room=new_room,
-                    require_consent=require_consent,
-                    outlier=outlier,
-                    historical=historical,
-                    allow_no_prev_events=allow_no_prev_events,
-                    prev_event_ids=prev_event_ids,
-                    state_event_ids=state_event_ids,
-                    depth=depth,
-                )
+                with opentracing.start_active_span("update_membership_locked"):

Member:
we should probably give the Linearizer some tracing annotations at some point, which would probably make this redundant. but that's a bigger job and this is certainly fine for now.

Contributor (author):
Curious what that job would entail?

Member:
really, much the same as you're doing here - adding with opentracing.start_active_span(...) calls around things that might be slow. I'd probably add one at the top level of Linearizer.queue (or rather, in the nested _ctx_manager), and another at the point where we wait for the lock (await new_defer).

Contributor (author):
I think this was asked elsewhere but I didn't see the answer - is there any performance penalty or other drawback to consider when adding opentracing spans?

Member:
Yes, there's a small performance penalty, which is one reason not to go completely crazy with adding spans.

There are two aspects to the performance penalty. One, each call to start_active_span results in a few function calls and python objects. Two, if tracing is actually enabled for the current request then we additionally have to serialize all the trace data and send it over the network to the jaeger server, and obviously the more spans, the more work this is.

Plus, in some ways, less can be more when it comes to looking at a trace and understanding what's going on.
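
(For illustration only: a quick, hypothetical way to get a feel for the first cost. It assumes a Synapse checkout is importable and no tracer is configured; the helper name and the numbers are not from this PR and will vary by machine.)

import timeit

from synapse.logging import opentracing


def make_span() -> None:
    # With no tracer configured this is close to a no-op, so the timing
    # mostly measures the bookkeeping cost of creating the span objects.
    with opentracing.start_active_span("overhead_probe"):
        pass


if __name__ == "__main__":
    n = 100_000
    total = timeit.timeit(make_span, number=n)
    print(f"~{total / n * 1e6:.2f} microseconds per start_active_span call")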

Contributor (author):
Makes sense, thanks for explaining!

+                    result = await self.update_membership_locked(
+                        requester,
+                        target,
+                        room_id,
+                        action,
+                        txn_id=txn_id,
+                        remote_room_hosts=remote_room_hosts,
+                        third_party_signed=third_party_signed,
+                        ratelimit=ratelimit,
+                        content=content,
+                        new_room=new_room,
+                        require_consent=require_consent,
+                        outlier=outlier,
+                        historical=historical,
+                        allow_no_prev_events=allow_no_prev_events,
+                        prev_event_ids=prev_event_ids,
+                        state_event_ids=state_event_ids,
+                        depth=depth,
+                    )
 
         return result

@@ -649,6 +651,7 @@ async def update_membership_locked(
         Returns:
             A tuple of the new event ID and stream ID.
         """
+
         content_specified = bool(content)
         if content is None:
             content = {}