Skip to content
This repository has been archived by the owner on Apr 26, 2024. It is now read-only.

Unable to paginate rooms in the space /hierarchy -> 400 Unknown pagination token #12697

Closed
MadLittleMods opened this issue May 10, 2022 · 12 comments
Labels
A-Spaces Hierarchical organization of rooms T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.

Comments

@MadLittleMods
Copy link
Contributor

MadLittleMods commented May 10, 2022

Description

Unable to paginate rooms in the space /hierarchy -> ❌ 400 Unknown pagination token

Steps to reproduce

  • Go to the space home
  • Scroll and paginate the rooms in the space until you get stuck
    • Maybe this matters, I was also searching/filtering for a specific room before that
  • Notice Failed to load list of rooms. message in Element and the underlying network request errors:

GET https://matrix-client.matrix.org/_matrix/client/v1/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=iDciuFbCVinPIwUGYtUzQLHO&limit=20 -> ❌ 400 bad request

{"errcode":"M_INVALID_PARAM","error":"Unknown pagination token"}

This could be a Element client bug but I am guessing that Synapse returned the bad pagination token in the first place. Element debug logs if you want to dive in to see if it is a client issue, https://github.com/matrix-org/element-web-rageshakes/issues/12829

Version information

  • Homeserver: matrix.org
@MadLittleMods MadLittleMods added T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues. A-Spaces Hierarchical organization of rooms labels May 10, 2022
@clokep
Copy link
Member

clokep commented May 10, 2022

The "easy" reason I can think of is that the time between pagination requests was greater than 5 minutes.

I unfortunately didn't see much in the logs (although I don't really know how to read Element Web logs...) I don't think we'll glean much from the matrix.org logs either. 😢

Looking at the server logs I see attempts to the paginate that same room from your user. I see a few requests seconds apart, which seem to also match the time taken to respond, e.g.

Request time Time to process request (s)
19:02:22,965 1.474
19:02:26,142 2.220
19:02:35,016 5.684

After that there's a 7 minute gap until the next request at 19:09:18,563. Element Web seems to continually retry over for the next ~12 minutes after this, all with the same result (unknown pagination token).

This is almost definitely due to the 5 minute session timeout we use for pagination tokens:

# The time a pagination session remains valid for.
_PAGINATION_SESSION_VALIDITY_PERIOD_MS = 5 * 60 * 1000

(This is probably exacerbated by there being no server-side search so the filter in Element Web pulls page after page trying to find the room you want...which is very inefficient.)

@MadLittleMods
Copy link
Contributor Author

MadLittleMods commented May 10, 2022

I guess it was more than 5 minutes 😪 (didn't know that restriction either). I'll create an issue in the Element Web issue tracker to handle this more gracefully. Thanks for investigating @clokep

GET https://matrix-client.matrix.org/_matrix/client/v1/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=rCkHRkRndYnCOIInVookuxrI&limit=20 -> ✅ 200 OK

date: Tue, 10 May 2022 19:02:38 GMT (response header)

{ "rooms": [...], "next_batch":"iDciuFbCVinPIwUGYtUzQLHO" }

GET https://matrix-client.matrix.org/_matrix/client/v1/rooms/!OJBlkJuUrsKnqtNnTi%3Amatrix.org/hierarchy?suggested_only=false&from=iDciuFbCVinPIwUGYtUzQLHO&limit=20 -> ❌ 400 Bad request

date: Tue, 10 May 2022 19:08:58 GMT (response header)

{"errcode":"M_INVALID_PARAM","error":"Unknown pagination token"}

@clokep
Copy link
Member

clokep commented May 10, 2022

(The 5 minutes is somewhat arbitrary, it was added to try to avoid potential resource exhaustion, we could up it...but I don't think that would solve the problem! I'd be curious what Element Web was doing to wait that long between requests?)

@t3chguy
Copy link
Member

t3chguy commented May 10, 2022

The spec doesn't say anything about the token ever expiring https://spec.matrix.org/v1.2/client-server-api/#get_matrixclientv1roomsroomidhierarchy so IMO this is a spec compliance issue in Synapse

@t3chguy
Copy link
Member

t3chguy commented May 10, 2022

I'd be curious what Element Web was doing to wait that long between requests?

The user simply didn't search/scroll for that long, the client is lazy and won't paginate unless it needs the data further in the dataset. Requesting data afresh after a given time yields the possibility of the first page of data to be entirely different from the first request causing an inconsistent flash as the data gets re-rendered.

@MadLittleMods
Copy link
Contributor Author

MadLittleMods commented May 10, 2022

I'd be curious what Element Web was doing to wait that long between requests?

The user simply didn't search/scroll for that long, the client is lazy and won't paginate unless it needs the data further in the dataset. Requesting data afresh after a given time yields the possibility of the first page of data to be entirely different from the first request causing an inconsistent flash as the data gets re-rendered.

It's definitely possible that I didn't scroll during that time. But as explained further in element-hq/element-web#22138, I was filtering for rooms which automatically paginates until it reaches the end (I didn't have results to fill up the scroll area). The loading spinner was still going when I removed the filter text and tried paginating manually where the request failed. "It feels like at a certain point, the app gave up on paginating the rooms for some reason even though the loading spinner was still present. So I was waiting around even though it was doing nothing and pagination token expired behind the scenes. Is there something in the logs indicating why it would stop paginating?" (element-hq/element-web#22138)

@t3chguy
Copy link
Member

t3chguy commented May 10, 2022

Regardless of if you stopped paginating, the spec doesn't say anything about the token ever expiring, so apps are free to assume they do not given the spec is clear elsewhere about exping things e.g expires_in on an access_token from /request_token

@turt2live
Copy link
Member

fwiw, the spec appears to be wrong here. The MSC states:

When the current response page is full, the current state should be persisted and a pagination token should be generated (if there is more data to return). To prevent resource exhaustion, the server may expire persisted data that it deems to be stale.

This was implemented in #10574 as part of maintenance work for #10495 which appeared shortly after MSC2946 was updated.

As such, the spec should be clarified. It's still unfortunate that "stale" isn't bound to a time though - at initial glance, I am a bit surprised we (SCT) let that slip into implementation detail.

@clokep
Copy link
Member

clokep commented May 11, 2022

As such, the spec should be clarified. It's still unfortunate that "stale" isn't bound to a time though - at initial glance, I am a bit surprised we (SCT) let that slip into implementation detail.

We could potentially get rid of it by serializing all the necessary state into the pagination token, but it can be quite large and would need to be signed or something so that the client doesn't tamper with it. I believe that was the original rationale for having it timeout, but it was been a while and my memory might be failing!

@squahtx
Copy link
Contributor

squahtx commented May 11, 2022

To add to that, the reason the pagination state can get arbitrarily large is because /hierarchy is recursive. We have to remember which rooms we've already seen to avoid getting stuck in loops.

If we had a non-recursive API, where clients had to recurse into child spaces themselves, that pagination state would live on the client. Then we could reduce the pagination token to just the sort key of the last room in the response, or sort key+event ID to read the state at if we wanted to get a consistent snapshot of a space's children.

@clokep
Copy link
Member

clokep commented May 11, 2022

If we had a non-recursive API, where clients had to recurse into child spaces themselves, that pagination state would live on the client.

And the reason we didn't design it this way is that it would be a lot of round-trip requests for a space which is narrow, but deep. 👍

@turt2live
Copy link
Member

sounds like a great code golf issue for someone 😇

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
A-Spaces Hierarchical organization of rooms T-Defect Bugs, crashes, hangs, security vulnerabilities, or other reported issues.
Projects
None yet
Development

No branches or pull requests

5 participants