Multi-root discovery: pragmatic, simple consensus. #216

Open
goodboy opened this issue Jul 5, 2021 · 1 comment

goodboy commented Jul 5, 2021

This issue is a more practical and immediate equivalent of the discussion in #184.

From the start of the project, a hacky, naive approach to finding the addresses of other "actors" has been the idea of an arbiter, which we implement as a "special actor" with methods for looking up (socket) addresses by name. This is of course not an ideal system: there will always be a race during a multi-tree startup for the "arbiter" address, and there is no flexible consensus system for transferring that position to another tree / root actor when the first is torn down or fails. The fragility is further emphasized by how root actors "check" that the registry (arbiter) exists, which is simply to do a fast TCP connect and drop on the supposed arbiter socket address.
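To make the startup race concrete, here is a minimal sketch using only the stdlib `socket` module. It is not the project's code: `try_claim_registry` is a hypothetical helper, and the assumption is simply that whichever root binds the well-known address first becomes the arbiter while every later root has to act as a client of it.

```python
import socket
from typing import Optional, Tuple


def try_claim_registry(addr: Tuple[str, int]) -> Optional[socket.socket]:
    """Hypothetical helper: try to become the arbiter by binding `addr`.

    Returns the listening socket if this process won the race, or
    `None` if some other process already holds the address.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(addr)
        sock.listen()
    except OSError:
        # Address already in use (or otherwise unbindable): we lost.
        sock.close()
        return None
    return sock


if __name__ == "__main__":
    # Port 0 lets the OS pick a free port for the demo.
    winner = try_claim_registry(("127.0.0.1", 0))
    assert winner is not None
    addr = winner.getsockname()

    # A second "root" starting up with the same address loses the race.
    loser = try_claim_registry(addr)
    print("second root became arbiter:", loser is not None)

    winner.close()
```

Nothing in this scheme tells the losing roots what to do once the winner exits, which is the missing fail-over piece described above.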

Summarizing the current naive/questionable design for an address registry:

  • a single socket address is allocated to some root actor designated the "arbiter" (aka a registry actor), and this address is passed to other Python programs which would like to search for actors using this same registry
  • the way to "check" if the arbiter "exists" is to do a nasty TCP connect/drop, which forces us to specially handle and remap trio.BrokenResourceError to an internal TransportClosed error which is silently ignored
  • there is no mechanism for fail-over, arbiter re-election, or transfer of the registry between trees
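The connect/drop check in the second bullet can be sketched roughly as follows. This is a stdlib-only illustration under stated assumptions, not the actual implementation: `registry_is_up`, `read_handshake`, and `serve_one` are hypothetical names, the `TransportClosed` class here is a local stand-in for the internal error mentioned above, and plain sockets are used in place of trio streams.

```python
import socket
from typing import Optional, Tuple


class TransportClosed(Exception):
    """Local stand-in for the internal error named above (hypothetical)."""


def registry_is_up(addr: Tuple[str, int], timeout: float = 0.5) -> bool:
    """Probe side: connect, then immediately drop the connection."""
    try:
        with socket.create_connection(addr, timeout=timeout):
            return True  # socket is closed on leaving the `with` block
    except OSError:
        return False


def read_handshake(conn: socket.socket) -> bytes:
    """Registry side: a probe looks like a peer that never sends anything."""
    try:
        data = conn.recv(1024)
    except ConnectionError as err:
        # Remap the low-level transport failure to the internal error.
        raise TransportClosed("peer reset the connection") from err
    if not data:
        raise TransportClosed("peer closed before sending anything")
    return data


def serve_one(listener: socket.socket) -> Optional[bytes]:
    """Accept one connection; swallow the error caused by a bare probe."""
    conn, _ = listener.accept()
    with conn:
        try:
            return read_handshake(conn)
        except TransportClosed:
            return None  # silently ignored, as described above
```

The cost of this design is visible in `serve_one`: the registry cannot tell a liveness probe apart from a real peer that failed mid-handshake, so both end up silently dropped.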

Digging into "why" this is in the code:

This "arbiter" idea was originally adopted from other "actor system" projects:

  • examples

Places to start some research

  • gossip protocol
  • matrix "federation replication" api
  • raft

WIP, will come back.

goodboy added a commit that referenced this issue Aug 28, 2023
By spawning an actor task that immediately shuts down the transport
server and then sleeps, verify that attempting to connect via the
`._discovery.find_actor()` helper delivers `None` for the `Portal`
value.

Relates to #184 and #216