Configure ironic to use Ipmitool retries #172
/test-integration
/test-integration
Why do you think so? The -R and -N options specify the retry number and the delay between retries. The global ironic retries and timeout still apply (unless we have a bug, obviously).
If Ironic is handling the retries and timeouts, it sets the -R and -N options to 1: https://github.com/openstack/ironic/blob/master/ironic/drivers/modules/ipmitool.py#L483. In that case the internal ipmitool timeout is 1 second. I agree that ironic would retry later on, but in metal3-dev-env this 1s timeout is too short for a lot of operations. Letting ipmitool handle the timeout and retries allows us to wait longer for the vbmc answer.
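To make the two modes concrete, here is a minimal sketch of how the `-R` / `-N` arguments could be chosen. It is not ironic's actual code: the function name, the `retry_timeout` and `min_interval` parameters, their defaults, and the way the budget is split into attempts are all assumptions for illustration. Only the `-R 1 -N 1` case comes from the comment above.

```python
def timing_args(use_ipmitool_retries, retry_timeout=60, min_interval=5):
    """Return illustrative -R/-N arguments for one ipmitool call.

    Hypothetical helper: parameter names and defaults are assumptions,
    not ironic's real configuration handling.
    """
    if not use_ipmitool_retries:
        # Ironic does the retrying itself, so each ipmitool call gets a
        # single attempt with a 1 second timeout (as described above).
        return ["-R", "1", "-N", "1"]
    # ipmitool retries internally: split the overall budget into attempts
    # that each wait min_interval seconds for an answer (assumed formula).
    attempts = max(1, retry_timeout // min_interval)
    return ["-R", str(attempts), "-N", str(min_interval)]


print(timing_args(False))
print(timing_args(True))
```

With the assumed defaults, the second call gives ipmitool several attempts with a longer per-attempt wait instead of a single 1 second attempt.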
Without this configuration, the ipmitool timeout is 1 second. This is too short for vbmc. This commit uses the ipmitool retry feature and extends the timeout to prevent errors such as
```
Suspicious activity detected for node... when attempting to heartbeat. Heartbeat request has been rejected as the version of ironic-python-agent indicated in the heartbeat operation should support agent token functionality.
```
/test-integration
Mmm, I think I understand what you're getting at. The retransmission rate is too fast? Do you have a link to an example where you hit this? @bfournie FYI, it may be a problem with your patch.
The issue is rather that ipmitool does not wait long enough before giving up when running with -N 1. When you configure the retries to be handled by ironic, the interval between retries is set properly, but the timeout is forced to be 1, i.e. you send a request, fail after 1s, and wait 4s before sending the next request. That's the problematic part for the Metal3 dev env.
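The arithmetic behind this can be sketched as follows. The 1s timeout and 4s pause come from the comment above. The 60 second window and the assumption that ipmitool's own retries run back to back (no pause) are illustrative only, and `listening_time` is a hypothetical helper.

```python
def listening_time(window, per_try_timeout, pause_between_tries):
    """Seconds within `window` spent actually waiting for a BMC reply."""
    elapsed = 0
    listening = 0
    while elapsed < window:
        # Never count waiting time beyond the end of the window.
        wait = min(per_try_timeout, window - elapsed)
        listening += wait
        elapsed += wait + pause_between_tries
    return listening


# Ironic-handled retries: 1s timeout, then a 4s pause before the next try.
print(listening_time(60, 1, 4))
# ipmitool-handled retries: assumed 5s timeout per try, no pause in between.
print(listening_time(60, 5, 0))
```

Under these assumptions, the first mode only listens for a reply during a small part of the window, which is why a slow vbmc answer is easily missed.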
/assign @russellb
This PR fixes the error that is otherwise visible in the ironic logs:
Ironic usually retries but sometimes fails all the retries. By waiting a bit longer, there is no need for retries. The ironic logs for an example of a failed CI run can be found here: https://jenkins.nordix.org/view/Metal3/job/airship_master_v1a3_integration_test_ubuntu/243/artifact/logs-jenkins-airship_master_v1a3_integration_test_ubuntu-243.tgz (in the docker folder)
For comparison purposes, with this PR: https://jenkins.nordix.org/job/airship_metal3io_ironic_image_v1a3_integration_test_ubuntu/47/artifact/logs-jenkins-airship_metal3io_ironic_image_v1a3_integration_test_ubuntu-47.tgz. All ipmitool-related errors are gone.
The description makes sense. Let's see if these changes make CI more stable.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: dhellmann, maelk.
/lgtm
Without this configuration, the ipmitool timeout is 1 second. This is too short
for vbmc. This commit uses the ipmitool retry feature and extends the
timeout.
This PR also sets console=ttyS0 in the IPA kernel parameters to gather IPA logs
in Metal3-dev-env.