This repository has been archived by the owner on Dec 5, 2017. It is now read-only.

Allow both Clients and Services to work completely offline for a period of time. #160

Open
erikstmartin opened this issue Oct 16, 2012 · 0 comments

Comments

@erikstmartin
Member

In the event of a complete doozer outage (the whole cluster is down), we should still be able to run in an offline mode, because we maintain internal lists of services.

Services:
On the service side, when we notice there are no more doozer instances to try, we mark ourselves as offline and stop sending updates to doozer. If we unregister while offline, we start hard-rejecting traffic.

We retry the doozer connection at a set interval, and when doozer comes back online we re-register ourselves.
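A minimal sketch of the service-side behavior described above, in Go. Everything here is hypothetical and for illustration only: `Registry` and `fakeRegistry` stand in for a doozer client (they are not the doozer API), and `Service`, `Sync`, and `Accepts` are invented names. In a real service, `Sync` would presumably be driven by something like a `time.Ticker` at the retry interval.

```go
package main

import (
	"errors"
	"fmt"
)

// Registry is a hypothetical stand-in for a doozer client, not the real API.
type Registry interface {
	Register(addr string) error
}

// fakeRegistry simulates a doozer cluster that can be down or up.
type fakeRegistry struct {
	up         bool
	registered map[string]bool
}

func (f *fakeRegistry) Register(addr string) error {
	if !f.up {
		return errors.New("no doozer instances reachable")
	}
	f.registered[addr] = true
	return nil
}

// Service tracks offline mode and whether it has unregistered.
type Service struct {
	Addr         string
	Offline      bool
	Unregistered bool
}

// Sync is meant to be called at a set interval. On failure the service
// marks itself offline; on success it has re-registered and leaves
// offline mode. An unregistered service sends nothing.
func (s *Service) Sync(r Registry) {
	if s.Unregistered {
		return
	}
	if err := r.Register(s.Addr); err != nil {
		s.Offline = true
		return
	}
	s.Offline = false
}

// Accepts reports whether traffic should be served. Offline services keep
// serving; unregistered ones hard-reject.
func (s *Service) Accepts() bool { return !s.Unregistered }

func main() {
	reg := &fakeRegistry{registered: map[string]bool{}}
	svc := &Service{Addr: "10.0.0.1:9000"}

	svc.Sync(reg) // cluster is down
	fmt.Println("offline:", svc.Offline, "accepts:", svc.Accepts())

	reg.up = true // cluster comes back
	svc.Sync(reg)
	fmt.Println("offline:", svc.Offline, "registered:", reg.registered[svc.Addr])
}
```

The point of keeping `Offline` and `Unregistered` separate is that an outage alone never stops the service from handling requests; only an explicit unregister does.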

Client:
Clients keep a list of services, so they can keep using the connection pools they already have. Once a client notices it has lost all connectivity to doozer, then after X failed attempts to a given host:port it removes that instance from its pool and from the internal instance list, so that no new connections are opened to it.

We retry the doozer connection at a set interval. Upon reconnecting, we rebuild our internal instance list from scratch and clean up any pools we have open to instances that are no longer in doozer or that have unregistered themselves.
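The client-side bookkeeping could look roughly like the sketch below. All names (`Client`, `ReportFailure`, `Rebuild`, and so on) are hypothetical, the pools are reduced to a boolean per instance, and `maxFailures` plays the role of "X". It only illustrates the eviction and rebuild steps, not real connection handling.

```go
package main

import "fmt"

// Client holds the internal instance list and a stand-in for per-instance
// connection pools (true means a pool is open).
type Client struct {
	maxFailures int
	instances   map[string]bool
	pools       map[string]bool
	failures    map[string]int
}

func NewClient(maxFailures int, addrs ...string) *Client {
	c := &Client{
		maxFailures: maxFailures,
		instances:   map[string]bool{},
		pools:       map[string]bool{},
		failures:    map[string]int{},
	}
	for _, a := range addrs {
		c.instances[a] = true
		c.pools[a] = true
	}
	return c
}

// ReportFailure counts a failed attempt while doozer is unreachable. After
// maxFailures attempts the instance is dropped from the pool and from the
// instance list. It returns true if the instance was evicted.
func (c *Client) ReportFailure(addr string) bool {
	if !c.instances[addr] {
		return false
	}
	c.failures[addr]++
	if c.failures[addr] < c.maxFailures {
		return false
	}
	delete(c.instances, addr)
	delete(c.pools, addr)
	delete(c.failures, addr)
	return true
}

// Rebuild replaces the instance list with what doozer reports after a
// reconnect and closes pools to instances that are no longer listed.
func (c *Client) Rebuild(live []string) {
	c.instances = make(map[string]bool, len(live))
	for _, a := range live {
		c.instances[a] = true
	}
	for a := range c.pools {
		if !c.instances[a] {
			delete(c.pools, a)
		}
	}
	c.failures = map[string]int{}
}

func (c *Client) Knows(addr string) bool   { return c.instances[addr] }
func (c *Client) HasPool(addr string) bool { return c.pools[addr] }

func main() {
	c := NewClient(2, "10.0.0.1:9000", "10.0.0.2:9000")
	c.ReportFailure("10.0.0.1:9000")
	evicted := c.ReportFailure("10.0.0.1:9000")
	fmt.Println("evicted:", evicted, "still known:", c.Knows("10.0.0.1:9000"))

	c.Rebuild([]string{"10.0.0.2:9000", "10.0.0.3:9000"})
	fmt.Println("knows new instance:", c.Knows("10.0.0.3:9000"))
}
```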

The important thing to note here is that when doozer comes back online, any of our wait() calls and similar operations must pick up a new revision, because if all nodes went down the revision count will start over.
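One way to handle the revision reset, sketched under assumptions: the `Watcher` type and its methods are hypothetical, and the sketch assumes the client can ask the store for its current revision after reconnecting and that waiting from "last seen revision + 1" is the right convention. It does not call any real doozer function.

```go
package main

import "fmt"

// Watcher remembers the last revision it has seen, so that the next wait
// can start just past it.
type Watcher struct {
	rev int64
}

// Observe records the revision of an event we received.
func (w *Watcher) Observe(rev int64) {
	if rev > w.rev {
		w.rev = rev
	}
}

// OnReconnect must be called with the store's current revision after the
// connection is re-established. It always adopts that revision, and
// reports whether the counter went backwards (the whole cluster restarted).
func (w *Watcher) OnReconnect(currentRev int64) (reset bool) {
	reset = currentRev < w.rev
	w.rev = currentRev
	return reset
}

// NextWaitRev is the revision the next wait call should start from.
func (w *Watcher) NextWaitRev() int64 { return w.rev + 1 }

func main() {
	w := &Watcher{}
	w.Observe(500)
	fmt.Println("next wait rev:", w.NextWaitRev())

	// Full outage: the cluster restarts and its revision counter starts over.
	reset := w.OnReconnect(3)
	fmt.Println("reset detected:", reset, "next wait rev:", w.NextWaitRev())
}
```

Without the unconditional assignment in `OnReconnect`, a watcher holding a pre-outage revision would wait on a revision the restarted cluster might not reach for a long time.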

This isn't a huge priority, but I think it would be a cool thing to do at some point to further the concept that skynet is built around: everything dies.
