HA - Automatic Failover #67

Open
eveiga opened this issue Feb 13, 2013 · 28 comments

Comments

@eveiga

eveiga commented Feb 13, 2013

Hi! First of all, thanks for the proxy, it has been really helpful :)

I need a decent solution for automatic failover and have already established that twemproxy doesn't support it. Any thoughts or ideas on it?

I was thinking of an external process that would use redis-sentinel and, on a switch-master event, update the IP address in nutcracker.conf and restart the service.
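
A minimal sketch of the config-rewrite step proposed above, assuming the `servers:` entries use twemproxy's `host:port:weight name` form. The function name `replace_server`, the pool name, and every address here are hypothetical. Restarting the service is left out because it depends on the local init setup.

```python
import re


def replace_server(conf_text: str, old_addr: str, new_addr: str) -> str:
    """Swap one host:port in the server entries, keeping weight and name."""
    # Matches list entries such as "   - 10.0.0.1:6379:1 shard1".
    pattern = re.compile(
        r"^([ \t]*-[ \t]*)" + re.escape(old_addr) + r"(:\d+(?:[ \t]+\S+)?[ \t]*)$",
        re.MULTILINE,
    )
    # A function replacement avoids backreference clashes with digits in new_addr.
    return pattern.sub(lambda m: m.group(1) + new_addr + m.group(2), conf_text)


# Hypothetical config text; a real nutcracker.conf will have more settings.
EXAMPLE_CONF = """\
redis_pool:
  listen: 127.0.0.1:22121
  redis: true
  servers:
   - 10.0.0.1:6379:1 shard1
   - 10.0.0.2:6379:1 shard2
"""

updated = replace_server(EXAMPLE_CONF, "10.0.0.1:6379", "10.0.0.9:6379")
print(updated)
```

Only the address changes; the weight and the node name stay as they were, so the rest of the pool definition is untouched.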

@manjuraj
Collaborator

Glad you liked it @eveiga

I believe using the external process the way you described makes sense. In fact, you can have two twemproxy processes running: one routing traffic to all the masters and the other to all the slaves. On a failover event, you switch from one twemproxy to the other.

@bmatheny

@manjuraj @eveiga that's what we do for memcache when there are events like a total failure (external process). Works quite well.

@eveiga
Author

eveiga commented Feb 13, 2013

@manjuraj I've already thought about that solution. Can I use the slave cluster to perform read operations? Or won't the hashes match the ones for the master cluster?

@bmatheny are you using my suggestion or manjuraj's?
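
On the question of whether keys would hash to the same shard in both pools: twemproxy's documentation describes an optional node name in each server entry that is hashed instead of `host:port:weight`. Assuming both pools use the same names and the same hash settings, a key should map to the same shard name in each. The sketch below illustrates the idea with toy rendezvous hashing, which is not twemproxy's ketama implementation; `pick_node` and all addresses are hypothetical.

```python
import hashlib
from typing import Sequence


def pick_node(key: str, node_ids: Sequence[str]) -> str:
    """Toy rendezvous hashing: pick the node id with the highest md5(key|id)."""
    def score(node_id: str) -> int:
        digest = hashlib.md5(f"{key}|{node_id}".encode()).hexdigest()
        return int(digest, 16)
    return max(node_ids, key=score)


# Hypothetical pools: identical node names, different addresses.
masters = {"shard1": "10.0.0.1:6379", "shard2": "10.0.0.2:6379"}
slaves = {"shard1": "10.0.1.1:6379", "shard2": "10.0.1.2:6379"}

keys = ["user:1", "user:2", "session:abc"]
# Hashing on names only, so the address behind each name does not matter.
by_name_masters = {k: pick_node(k, list(masters)) for k in keys}
by_name_slaves = {k: pick_node(k, list(slaves)) for k in keys}
print(by_name_masters == by_name_slaves)
```

Because the placement depends only on the names, swapping the address behind a name (master to slave, or old master to new master) leaves the key distribution unchanged in this toy model.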

@bmatheny

@eveiga the one you recommended. When the topology needs to change the config is updated by an external process and twem gets restarted.

@eveiga
Author

eveiga commented Feb 13, 2013

@bmatheny sorry for the boring questions :) Don't you experience a window of downtime during that restart? If so, how do you cope with it?

BTW, are you using a twemproxy pool with only slaves for reading?

@eveiga
Author

eveiga commented Feb 13, 2013

Hmm, I forgot you are using it with memcache, so I don't know if the last question fits your use case!

@bmatheny

We do see a short burst of errors. The error type is detected by the app and retried, so we generally don't 'lose' writes, and reads will fall back to the DB.
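
The retry-then-fall-back behaviour described above could look roughly like this on the application side. This is a sketch under assumptions: `read_with_fallback`, `cache_get`, and `db_get` are hypothetical names, and `ConnectionError` stands in for whatever error the real client raises while the proxy restarts.

```python
import time
from typing import Callable, Optional


def read_with_fallback(
    cache_get: Callable[[str], Optional[str]],
    db_get: Callable[[str], str],
    key: str,
    retries: int = 3,
    delay_s: float = 0.05,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Try the cache a few times; after repeated errors, read from the DB."""
    for attempt in range(retries):
        try:
            value = cache_get(key)
            if value is not None:
                return value
            break  # clean cache miss: retrying will not help
        except ConnectionError:
            if attempt < retries - 1:
                sleep(delay_s)
    return db_get(key)


calls = {"n": 0}


def flaky_cache(key: str) -> str:
    """Fails twice, as if the proxy were mid-restart, then answers."""
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("proxy restarting")
    return "cached:" + key


def always_down(key: str) -> str:
    raise ConnectionError("proxy down")


no_sleep = lambda seconds: None  # skip real waiting in the example
result_after_retry = read_with_fallback(flaky_cache, lambda k: "db:" + k, "a", sleep=no_sleep)
result_fallback = read_with_fallback(always_down, lambda k: "db:" + k, "a", sleep=no_sleep)
print(result_after_retry, result_fallback)
```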

@eveiga
Author

eveiga commented Feb 14, 2013

@bmatheny Thanks for the tips, I'll go with that solution!

@matschaffer
Contributor

@eveiga thanks for the redis-sentinel reminder. So far it looks like this will work well.

Has anyone built the bits to update twemproxy when redis-sentinel finishes a failover?

@manjuraj would you recommend anything more graceful than simply rewriting the twemproxy config and restarting it?

@eveiga
Author

eveiga commented Feb 26, 2013

@matschaffer Yes, I've developed a simple service that attaches a handler to the "switch-master" event emitted by redis-sentinel, updates twemproxy.conf with the new info and restarts the service. So far so good in testing; I'll put it in production shortly.
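
For reference, Redis Sentinel publishes the failover notification on the `+switch-master` channel with the payload `<master name> <old ip> <old port> <new ip> <new port>`. A handler like the one described above needs to parse that payload before touching the config. The sketch below shows only the parsing; `MasterSwitch` and `parse_switch_master` are hypothetical names, and the subscription itself is omitted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MasterSwitch:
    name: str      # master name as configured in sentinel
    old_addr: str  # "ip:port" of the failed master
    new_addr: str  # "ip:port" of the promoted slave


def parse_switch_master(payload: str) -> MasterSwitch:
    """Parse '<master name> <old ip> <old port> <new ip> <new port>'."""
    parts = payload.split()
    if len(parts) != 5:
        raise ValueError(f"unexpected +switch-master payload: {payload!r}")
    name, old_ip, old_port, new_ip, new_port = parts
    return MasterSwitch(name, f"{old_ip}:{old_port}", f"{new_ip}:{new_port}")


# Hypothetical payload for illustration.
event = parse_switch_master("shard1 10.0.0.1 6379 10.0.0.9 6379")
print(event)
```

The resulting old and new addresses are what a config-rewrite step would consume.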

@matschaffer
Contributor

@eveiga any chance of sharing what you've come up with?

@eveiga
Author

eveiga commented Feb 26, 2013

No problem. It's in node.js and a bit tightly coupled to our setup. Still want it?

@matschaffer
Contributor

Sure! Even just a gist is great. Always nicer to have some collaboration. :)


@eveiga
Author

eveiga commented Feb 26, 2013

https://gist.github.com/eveiga/5039007

As I said, it's pretty tightly coupled to our setup (init script paths, mails, etc.) and could be a lot more configurable, but it can give you a starting point.

Suggestions are welcome!

@manjuraj
Collaborator

If you guys can make this generic enough, we can check it into the scripts/ folder of twemproxy.

@matschaffer
Contributor

@eveiga how's yours panning out? Over here it seems to work if I'm careful about the startup order. But if the agent comes up before the sentinel, the agent seems to deadlock after a certain number of retries. Have you run into that, or are you controlling start order more carefully?
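
Whatever the cause of the stall turns out to be, one way to make an agent independent of startup order is to keep retrying the sentinel connection with a capped backoff instead of giving up after a fixed number of attempts. A minimal sketch, assuming connection failures surface as `OSError` subclasses; `connect_forever` and `fake_connect` are hypothetical and not taken from the agent's code.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


def connect_forever(
    connect: Callable[[], T],
    base_delay_s: float = 0.5,
    max_delay_s: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call connect() until it succeeds, doubling the wait up to max_delay_s."""
    delay = base_delay_s
    while True:
        try:
            return connect()
        except OSError:  # includes ConnectionRefusedError
            sleep(delay)
            delay = min(max_delay_s, delay * 2)


attempts = {"n": 0}


def fake_connect() -> str:
    """Refuses the first four attempts, as if sentinel were still starting."""
    attempts["n"] += 1
    if attempts["n"] <= 4:
        raise ConnectionRefusedError("sentinel not up yet")
    return "connected"


waits: List[float] = []  # record the waits instead of sleeping
conn = connect_forever(fake_connect, base_delay_s=1.0, max_delay_s=4.0, sleep=waits.append)
print(conn, waits)
```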

@matschaffer
Contributor

@eveiga btw, I have this up at https://github.com/matschaffer/redis_twemproxy_agent as something I can pack with npm and get some rough testing around. I took out the email notifier though since we'll probably want to notify via other means.

@eveiga
Author

eveiga commented Mar 1, 2013

Hey @matschaffer, I've assumed that the sentinel was already running, but indeed we should have some kind of reaction on a failed startup. Thanks for packing this in a new repo, I'll take a look at it during the weekend and try to do some contribution!

@matschaffer
Contributor

No problem! After further testing I'm not sure that's the case (with the startup order issue). Not sure what caused the lack of reconfiguration on my first test but I haven't been able to replicate it. My latest commit logs a lot to stdout in hopes that I can tell what's up if it happens again.

@matschaffer
Contributor

@eveiga how's this working for you? For me it was working great until I added a second sentinel. Seems like a single sentinel may or may not broadcast the failover messages. Still investigating though.

@matschaffer
Contributor

After some investigation it looks like it's not just the multiple sentinels but rather multiple masters failing at the same time. The agent doesn't seem to reliably get all the switch-master messages :(

@matschaffer
Contributor

Swapping node-sentinel for direct use of node-redis seems to help. Gonna do another test now.
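
Independently of the client library, missed pub/sub messages can be tolerated by periodically asking sentinel for the current master of each name (the `SENTINEL get-master-addr-by-name` command) and reconciling against the config. This is an alternative approach, not what the agent in this thread does. The sketch below covers only the comparison step; `diff_masters` and the sample data are hypothetical.

```python
from typing import Dict, List, Tuple


def diff_masters(
    configured: Dict[str, str],
    reported: Dict[str, str],
) -> List[Tuple[str, str, str]]:
    """Return (name, configured_addr, reported_addr) for each master that moved."""
    changes = []
    for name, addr in sorted(reported.items()):
        current = configured.get(name)
        # Ignore names the proxy config does not know about.
        if current is not None and current != addr:
            changes.append((name, current, addr))
    return changes


# Hypothetical state: both masters failed over and no events were received.
configured = {"shard1": "10.0.0.1:6379", "shard2": "10.0.0.2:6379"}
reported = {"shard1": "10.0.0.9:6379", "shard2": "10.0.0.8:6379"}

pending = diff_masters(configured, reported)
print(pending)
```

Each tuple names one address swap to apply before a single restart, so several simultaneous failovers are handled in one pass.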

@eveiga
Author

eveiga commented Mar 19, 2013

Hey @matschaffer! Sorry for the absence, I'm back on this! Thanks for the bumps on it, I'll take a look and update the production code.

@eveiga
Author

eveiga commented Mar 19, 2013

BTW: I never had more than one sentinel, so I never ran into your problem.

@idning
Contributor

idning commented Mar 21, 2014

hi, all, try https://github.com/idning/redis-mgr please

@nidhhoggr

If anyone is interested I started a C implementation of https://github.com/matschaffer/redis_twemproxy_agent at https://github.com/nidhhoggr/twemproxy_sentinel

@virendarkmr

Hi, I am stuck with the same issue. I have 2 different redis clusters, each with one master and two slaves, and sentinel is handling failover. The redis twemproxy agent works fine when I give a single sentinel IP in cli.js.
How can I handle failover for two clusters?

@douglaslps

> hi, all, try https://github.com/idning/redis-mgr please

What happened with that? I'm getting page not found.
