Skip to content
This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

cleanup function will return the code which ssh returns #445

Conversation

neo-liang-sap
Copy link
Contributor

@neo-liang-sap neo-liang-sap commented Nov 16, 2020

What this PR does / why we need it:
cleanup function will return the code which ssh returns
Which issue(s) this PR fixes:
Fixes #444 and #448

Special notes for your reviewer:

Release note:

cleanup function will return the code which ssh returns

@neo-liang-sap neo-liang-sap requested a review from a team as a code owner November 16, 2020 02:17
@gardener-robot gardener-robot added needs/review Needs review size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Nov 16, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 16, 2020
@neo-liang-sap
Copy link
Contributor Author

Hi @dansible ,
I tested in following two scenario

  1. postive path - ssh returns OK and the exit status code is 0
  2. negative path - i fake ssh username to "neo" and ssh failed with exit code 255, then overall exit status is 255

BTW the openstack infra itself is not very stable - in your case i saw you were unable to ssh to gctl-os shoot but today i can ssh without error, this is the same when i was doing gardenctl ssh openstack_node - sometimes the ssh failed for unknown reason...i didn't dig out why, retry will success....

image
image

@gardener-robot gardener-robot removed the needs/review Needs review label Nov 17, 2020
}
defer a.cleanUpOpenstack(sshStatus)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the purpose of having defer as last statement of the function? I would refrain from moving the deferred cleanup call to the bottom.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during my test, if i move the defer a.cleanUpOpenstack(sshStatus) to above lines of value assignment of sshStatus , it won't get correct values.
Say if i keep defer a.cleanUpOpenstack(sshStatus) to original line, when some error occurs in ssh, sshStatus will be assigned with value 1 (for example) but in defer a.cleanUpOpenstack(sshStatus) the value of sshStatus is still 0
do you have a better idea how i can get sshStatus correctly passed to cleanUpOpenstack meanwhile i can keep the original line for the defer function? thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you want to achieve this you need to pass a pointer

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks! i will have a try

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @petersutter , i've modified the code using *int to pass values, thanks for your suggestion!
-Neo

@neo-liang-sap neo-liang-sap force-pushed the openstack-ssh-return-with-error-code branch from 6204f9b to 1a544ae Compare November 18, 2020 01:19
@gardener-robot-ci-2 gardener-robot-ci-2 added needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) and removed reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) labels Nov 18, 2020
@gardener-robot gardener-robot added size/s Size of pull request is small (see gardener-robot robot/bots/size.py) and removed size/xs Size of pull request is tiny (see gardener-robot robot/bots/size.py) labels Nov 18, 2020
@gardener-robot-ci-2 gardener-robot-ci-2 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 18, 2020
@gardener-robot-ci-3 gardener-robot-ci-3 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 18, 2020
@neo-liang-sap
Copy link
Contributor Author

@dansible , per our discussion in planning mtg this week, i implement a function to check whether IP and port are reachable (#448) , i implement this feature in this PR as this function will impact the overall exit status so i append this feature together in this PR. Please review it when you have time, thanks!
-Neo

pkg/cmd/utils.go Outdated
attemptCnt := 0
for attemptCnt < 6 {
ncCmd := fmt.Sprintf("timeout 10 nc -vtnz %s %s", ip, port)
cmd := exec.Command("bash", "-c", ncCmd)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Stop calling bash
  • Use exec. CommandContext to pass a context with a timeout to achieve the same behavior instead of calling timeout
  • Does this work on windows? I'd rather prefer a native-go solution

@gardener-robot gardener-robot added the needs/changes Needs (more) changes label Nov 18, 2020
@gardener-robot-ci-1 gardener-robot-ci-1 added the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 18, 2020
@neo-liang-sap
Copy link
Contributor Author

thanks @petersutter , i used go native method to check whether the ip and port reachable

@gardener-robot-ci-1 gardener-robot-ci-1 removed the reviewed/ok-to-test Has approval for testing (check PR in detail before setting this label because PR is run on CI/CD) label Nov 18, 2020
Comment on lines +356 to +357
timeout := time.Second * 60
conn, err := net.DialTimeout("tcp", net.JoinHostPort(ip, port), timeout)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this works without having to dial/try multiple times within a certain amount of time?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, in my test it tries within 1 min, as I set in the timeout

Copy link
Contributor

@petersutter petersutter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@gardener-robot gardener-robot added reviewed/lgtm Has approval for merging needs/changes Needs (more) changes and removed needs/changes Needs (more) changes reviewed/lgtm Has approval for merging labels Nov 18, 2020
@neo-liang-sap neo-liang-sap merged commit ca48bd0 into gardener-attic:master Nov 19, 2020
@petersutter petersutter mentioned this pull request Nov 20, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
needs/changes Needs (more) changes needs/ok-to-test Needs approval for testing (check PR in detail before setting this label because PR is run on CI/CD) size/s Size of pull request is small (see gardener-robot robot/bots/size.py)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

gardenctl ssh <openstack_node> fails with Operation timed out
7 participants