Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rabbitmq queue monitor + queue stability enhancements #570

Merged
merged 2 commits into from
Oct 28, 2015

Conversation

bookshelfdave
Copy link
Contributor

This PR adds several enhancements for RabbitMQ stability.

  1. a max-length policy is set on the /analytics exchange
  2. The RabbitMQ Management console is enabled by default with SSL on port 15672. A rabbitmgmt user is created to access this console.
  3. A Queue Length monitoring gen_server (chef_wm_actions_queue_monitoring.erl). This gen_server periodically connects to the management endpoint to check max-length and current length of the analytics queue. If the queue is at > 80% capacity, a warning is issued via lager. If the queue reaches 100% capacity, toggle a "queue at capacity" flag until messages have been read/purged from the queue. Code that publishes messages to RabbitMQ can check the state of the "queue at capacity" flag before enqueuing.
  4. Queue length statistics are reporting via the _status endpoint.

More details are available in chef_wm_actions_queue_monitoring.erl.


Notes

  • This PR adds the PropER testing tool, which is _GPL V3. HOWEVER, this is a _test-only dependency.

  • SSL for the RabbitMQ Management has been configured to only accept tlsv1.2 and tlsv1.1 (POODLE, BEAST) via:

    {ssl, [{versions, ['tlsv1.2', 'tlsv1.1']}]}
    

TODO:

  • tests in chef_wm_status_tests.erl
  • integration tests/local rabbitmq setup
  • vhost/queue hardcoded, remove!

opc_ctl = "/opt/opscode/bin/private-chef-ctl"
opc_username = OmnibusHelper.new(node).ownership['owner']
rmq_ctl_chpost = "/opt/opscode/embedded/bin/chpst -u #{opc_username} -U #{opc_username} #{rmq_ctl}"
rmq_plugins_chpost = "/opt/opscode/embedded/bin/chpst -u #{opc_username} -U #{opc_username} #{rmq_plugins}"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that the _chpost extension is used above, but part of me feels that this is a typo for _chpst that has lived for a while.

{ok, "404", _, _} ->
lager:info("RabbitMQ max-length policy not set"),
undefined;
Resp -> lager:error("Unknown response from RabbitMQ management console: ~p", [Resp]),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indentation here doesn't match the rest of the cases.

@marcparadise
Copy link
Member

This looks good, with some relatively minor comments above.

My only large area of concern is the volume of requests this will be handling. As you pointed out the timeout option is a bit of a red herring-- realistically if we ever hit that timeout we're already going to be in a really bad state.

This is very lightweight as implemented, and I'm probably being overly paranoid - I'd feel better if we had decent integrated load suite that didn't require spinning up a ton of AWS nodes.

@bookshelfdave
Copy link
Contributor Author

thanks for the review @marcparadise @stevendanna, I'll clean things up on Monday.

@bookshelfdave
Copy link
Contributor Author

ping @chef/lob

@@ -116,6 +116,54 @@
default['private_chef']['rabbitmq']['consumer_id'] = 'hotsauce'
default['private_chef']['rabbitmq']['env_path'] = "/opt/opscode/bin:/opt/opscode/embedded/bin:/usr/bin:/bin"

default['private_chef']['rabbitmq']['ssl_versions'] = "'tlsv1.2', 'tlsv1.1'"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super minor nitpick, would this be better expressed as an Array?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, do you mean "['tlsv1.2', 'tlsv1.1']" or ['tlsv1.2', 'tlsv1.1']?

@marcparadise
Copy link
Member

👍

ChangeLog-Entry: [omnibus] enable RabbitMQ Management Plugin

- queue monitor review cleanup
- set prevent_erchef_startup_on_full_capacity default to false
- change rabbitmq ssl_versions att to an array
- queue monitor queue_at_capacity wasn't getting set upon startup

POODLE fix for rabbit mgmt console:
From https://www.rabbitmq.com/mochiweb.html

"For convenience, if you do not specify ssl_opts then rabbitmq-mochiweb
will use the same options as the main RabbitMQ server does for AMQP over
SSL, but with client certificate verification turned off. If you wish to
use client certificate verification, specify ssl_opts explicitly."
bookshelfdave added a commit that referenced this pull request Oct 28, 2015
rabbitmq queue monitor + queue stability enhancements
@bookshelfdave bookshelfdave merged commit ee95460 into master Oct 28, 2015
@bookshelfdave bookshelfdave deleted the dp_rabbit_monitoring branch February 10, 2016 21:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants