Not catching all switch-master messages #1

Closed
matschaffer opened this issue Mar 15, 2013 · 5 comments
@matschaffer

I ran into a problem with https://github.com/matschaffer/redis_twemproxy_agent using node-sentinel where not all messages were firing the handler, particularly in multi-failover situations (where the failed machine was running multiple Redis masters). So I'd miss some switch-master messages and end up with an incorrectly configured twemproxy.

My node-fu isn't quite strong enough to see why this is happening, but psubscribing directly with https://github.com/mranney/node_redis seems to resolve it. If you'd like me to try anything in particular, let me know.
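In case it's useful, this is roughly what the direct-psubscribe workaround looks like with node_redis (just a sketch; the sentinel address is from my setup and the handler body is a placeholder, not the actual agent code):

```js
// Sketch: subscribe straight to a Sentinel's pub/sub channel with node_redis.
// 26379 is the default sentinel port; adjust host/port for your setup.
var redis = require('redis');
var sentinel = redis.createClient(26379, '127.0.0.1');

sentinel.psubscribe('+switch-master');

sentinel.on('pmessage', function (pattern, channel, message) {
  // Subscribed this way, every failover event shows up, one per master.
  console.log(channel, message);
  // ...regenerate the twemproxy config here...
});
```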

@JvrBaena
Owner

Hi Mat!
During development I found some inconsistencies in the Sentinel documentation, both messages that aren't documented and events emitted with different payloads than the documented ones, so there may be some special circumstances under which the message emitted differs from what I am capturing.

If you could provide me with a schema of your Redis configuration (masters, slaves, sentinels and the masters they point to), I may try to reproduce the environment. Also, could you please provide me the log of the sentinel whose events we are losing? (I assume it's the switch-master event, right?)

BTW this is my first ever GitHub issue :D so thanks for reporting!

@matschaffer
Author

Hehe... congratulations! 👍

So the way I tested this is with two "proxy" boxes running twemproxy and the previous version of redis_twemproxy_agent attached to a locally running redis-sentinel.

These were both pointed at a master that was running 4 redis instances, and a slave that was running another 4.

Then I tested by turning off the master. Using node-sentinel, I wouldn't usually see the agent pick up on the switch-master messages for all 4 of the shards on the master. It tended to always get at least one of the messages, but rarely all 4, even though both sentinels actually reported all 4 switch-master events (verified by leaving redis-cli psubscribe running on both proxies).

Hope that gives you some idea. I'd provide a vagrant test case, but the chef cookbooks are a little intertwined with some proprietary stuff at the moment. For some history, this idea originally came from this twemproxy ticket.
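To spell out why missing even one of those messages matters: each '+switch-master' payload covers exactly one master, so with 4 shards on the failed box the agent needs all 4 events to rewrite the twemproxy config completely. Roughly (field layout per the Sentinel docs; the parsing helper and example values are just illustrative):

```js
// Each '+switch-master' pmessage describes a single master:
//   "<master-name> <old-ip> <old-port> <new-ip> <new-port>"
// e.g. "shard-1 10.0.0.5 6379 10.0.0.6 6379" (made-up name and addresses)
function parseSwitchMaster(message) {
  var parts = message.split(' ');
  return {
    name:    parts[0],
    oldHost: parts[1],
    oldPort: Number(parts[2]),
    newHost: parts[3],
    newPort: Number(parts[4])
  };
}
```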

@ghost assigned JvrBaena Mar 17, 2013
@JvrBaena
Owner

Hi again!

I have been testing a 'simplified' environment with 4 masters, 4 slaves and 2 sentinels, and indeed found that when psubscribing to '+switch-master' directly I was getting all 8 events, while node-sentinel started emitting events and ended up losing some of them.

...then I saw that for the message '+sentinel', which is sent whenever a sentinel detects another sentinel for a master, there was a typo: 'self.emot' instead of 'self.emit'. Shame on me! U__U

This was causing the handlers to silently die each time a +sentinel was detected, which in a multi-sentinel, multi-master environment happens each time a switch-master is issued and the sentinels update their tables for the new sentinel... so I think fixing the typo should have solved the issue (in my testing environment it has).

I have published a new version (0.0.5) that fixes the typo, so please, if you could test whether this version fixes the problem, that would be great :)
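In case anyone is curious how a single character can swallow events like that, here's a stripped-down illustration (not the actual node-sentinel source, just the shape of the bug):

```js
// Stripped-down illustration of the bug (not the real node-sentinel code).
// The first '+sentinel' message hits the typo and throws a TypeError out of
// the message handler, which is what was silently killing the handlers.
var events = require('events');
var util = require('util');

function SentinelWatcher() {
  events.EventEmitter.call(this);
}
util.inherits(SentinelWatcher, events.EventEmitter);

SentinelWatcher.prototype.handleMessage = function (channel, message) {
  var self = this;
  switch (channel) {
    case '+sentinel':
      // Buggy version: self.emot('sentinel', message);
      self.emit('sentinel', message); // fixed in 0.0.5
      break;
    case '+switch-master':
      self.emit('switch-master', message);
      break;
  }
};
```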

@matschaffer
Author

It's always the little things, isn't it? I'm technically off the project that was using that now, but I'll let its new papas know that this could do the job if they wanna try it out. Thanks!

@JvrBaena
Owner

Great!
I'll leave this issue closed for the moment, and will reopen it if I get any notice that the fix didn't solve it.

Once again, thanks for reporting! :D
