Inconsistent state when the SSH connection check (during compute creation) fails or is canceled #706

Open
trihoangvo opened this issue Nov 10, 2020 · 1 comment
Labels
bug Something isn't working

trihoangvo commented Nov 10, 2020

Bug Report

Description

When a4c sends a "compute install" task to yorc, yorc creates a VM and then connects to it over SSH to check that it is reachable. If the VM is created successfully but the connection check does not finish (it fails for one of the reasons below, or it is canceled), the deployment is left in an inconsistent state: on the cloud provider side the VM exists, but in the state machine the VM is not created.

When users undeploy the application, the "compute uninstall" task is considered successful, and the workflow goes on to uninstall the other dependent cloud resources (e.g., networks). However, the cloud provider cannot delete these resources because the VM is still there. As a result, the uninstall workflow never completes.
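
To make the failure mode concrete, here is a minimal simulation of the sequence described above. It is only a sketch and not yorc code: `FakeCloud`, `install_compute`, `uninstall_compute` and the `state` dictionary are hypothetical names, and the rule "a network cannot be deleted while any VM exists" is a simplification of the provider's behavior.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCloud:
    """Stand-in for the cloud provider (hypothetical, not a real client)."""
    vms: set = field(default_factory=set)
    networks: set = field(default_factory=set)

    def delete_network(self, name: str) -> bool:
        # Simplification: a network cannot be deleted while a VM still exists.
        if self.vms:
            return False
        self.networks.discard(name)
        return True


def install_compute(cloud: FakeCloud, state: dict, ssh_ok: bool) -> None:
    cloud.vms.add("vm-1")          # the VM now exists on the provider side
    if not ssh_ok:                 # SSH check failed or was canceled
        return                     # nothing is recorded: the inconsistency
    state["compute"] = "started"


def uninstall_compute(cloud: FakeCloud, state: dict) -> bool:
    # No recorded compute, so the step reports success without
    # touching the provider.
    if state.get("compute") is None:
        return True
    cloud.vms.discard("vm-1")
    state.pop("compute")
    return True


cloud, state = FakeCloud(networks={"net-1"}), {}
install_compute(cloud, state, ssh_ok=False)
compute_ok = uninstall_compute(cloud, state)
network_ok = cloud.delete_network("net-1")
print(compute_ok, network_ok, cloud.vms)  # True False {'vm-1'}
```

The compute step reports success while the VM is still present, so the network deletion keeps failing, which corresponds to the uninstall workflow that never completes.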

The SSH connection check may fail for several reasons:

  • cloud-init may fail to reach the metadata server to set up the VM's NIC or the public key (this is an issue on the cloud provider side, but it happens from time to time in OpenStack).
  • the user image used to boot the VM may have issues during bootstrap.

Expected behavior

The uninstall workflow deletes all resources and completes.

Actual behavior

The uninstall workflow considers the compute uninstall step as successful and does not delete the compute.

Steps to reproduce the issue

  1. Create a topology with one compute node and one network node, and deploy it.
  2. Wait until the compute is created on the cloud provider and is booting.
  3. Click un-deploy.

Additional information you deem important (e.g. issue happens only occasionally)

Happens always.

Output of yorc version

current develop

Priority

Medium.

A workaround is for users to delete the VM on the cloud provider manually so that the undeployment process can complete.

Discussion

Could we split compute creation into two steps, compute create and compute start, for better error handling? This would let users know that their VMs were created but failed to start. The Terraform part may remain the same, but we could set the state of the task to "created" / "started".
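
One way to picture this proposal is the sketch below. It is an illustration under assumptions, not yorc's actual state machine: `ComputeState`, `install_compute` and `uninstall_compute` are hypothetical names, and the provider calls are injected as plain callables. The point is that "created" is recorded before the SSH check runs, so the uninstall step still knows there is a VM to delete.

```python
from enum import Enum


class ComputeState(Enum):
    INITIAL = "initial"
    CREATED = "created"   # VM exists on the provider, connectivity unverified
    STARTED = "started"   # SSH check succeeded


def install_compute(create_vm, check_ssh) -> ComputeState:
    """Return the last state reached; both arguments are injected callables."""
    create_vm()
    state = ComputeState.CREATED      # recorded before the SSH check runs
    if check_ssh():
        state = ComputeState.STARTED
    return state


def uninstall_compute(state: ComputeState, delete_vm) -> ComputeState:
    """Delete the VM whenever it was created, even if it never started."""
    if state in (ComputeState.CREATED, ComputeState.STARTED):
        delete_vm()
    return ComputeState.INITIAL


provider_vms = []
state = install_compute(lambda: provider_vms.append("vm-1"), lambda: False)
print(state.value)                    # created
state = uninstall_compute(state, provider_vms.clear)
print(state.value, provider_vms)      # initial []
```

A real implementation would also have to persist the "created" state when the check is canceled or raises an error, which this sketch leaves out.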

@trihoangvo trihoangvo added the bug Something isn't working label Nov 10, 2020
@trihoangvo
Contributor Author

Hi @loicalbertin do you have any comments?
