This repository has been archived by the owner on Jul 25, 2022. It is now read-only.

Extension concept for cloud-provider specific operations #208

Closed
danielfoehrKn opened this issue Jun 8, 2020 · 8 comments
Labels
kind/enhancement Enhancement, improvement, extension

Comments

@danielfoehrKn

danielfoehrKn commented Jun 8, 2020

Gardener has an extensibility concept that allows cloud-provider-specific tasks to be developed independently of the Gardener core repository.

Similarly, it would be nice if gardenctl offered a concept in which it defines an interface with operations that plugins can implement to support a specific cloud provider.
Plugins could be vendored from the gardener-extension & added at compile time, or possibly even registered at runtime.

Cloud-provider-specific operations could be (non-exhaustive list):

  • ssh into a worker node via CLI
  • clean up Shoot infrastructure: forceCleanupInfrastructure(String vpcId).
    This can be helpful
    • during development, when there are left-over resources
    • to clean up Shoots when the Shoot owner lets Gardener create a new VPC for the Shoot, creates additional resources in that VPC, and then decides to delete the Shoot
  • detect & delete leaked / orphaned resources in a pre-created VPC

Let's take the operation forceCleanupInfrastructure(String vpcId) as an example:

In Gardener, the Shoot infrastructure is created and destroyed by the provider extension - see provider-aws for AWS. Each provider can create the infrastructure with any suitable approach, e.g., for AWS, provider-aws uses Terraform underneath, whereas the vSphere provider uses the vSphere API directly via a client SDK.

Each provider would then also implement the interface method forceCleanupInfrastructure(String vpcId), which forcefully deletes all Shoot-created resources plus the VPC. The actual implementation is again provider-specific. For instance, for AWS it could use the AWS API to look up all relevant resources (Elastic IPs, EC2 instances, ACLs, NAT gateways, security groups, ...) and delete every single one of them, regardless of whether the resource was created by Gardener or by the Shoot owner.

Other ideas

Additionally, these plugins could be combined with high-level gardenctl operations such as forceDeleteShoot. This operation would then:

  • annotate & mark the Shoot for deletion
  • wait & check if the infrastructure can be deleted (check Infrastructure.status) - if not: call the forceCleanupInfrastructure() plugin for the target cloud provider
  • get all Kubernetes resources in the namespace (all API groups, including CRDs), remove existing finalizers, and then delete them if no deletion timestamp is set
@danielfoehrKn danielfoehrKn added the kind/enhancement Enhancement, improvement, extension label Jun 8, 2020
@jfortin-sap-zz

@danielfoehrKn Can you assign this to me?

@jfortin-sap-zz

jfortin-sap-zz commented Jul 15, 2020

@danielfoehrKn I would like to confirm the high-level workflow with you before I get started. For example, the gardenctl AWS extension provider would live in its own GitHub repo, "gardenctl-extensions", together with all the other cloud-provider extensions. Building it would produce a shared object library file, "gardenctl-ext-provider-aws.so", which contains the interface method forceCleanupInfrastructure(String vpcId). We would then load that module at runtime and call that method when we are manipulating a Shoot on AWS and gardenctl is asked to perform a forceDeleteShoot operation. Is that correct?
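
For reference, loading such a shared object at runtime could look roughly like the sketch below, which uses Go's standard plugin package. The exported symbol name ForceCleanupInfrastructure and its signature are assumptions. The .so would have to be built with "go build -buildmode=plugin" and export a function of exactly that type.

```go
package main

import (
	"fmt"
	"plugin"
)

// loadCleanup opens the shared object at path and looks up the exported
// function. The symbol name and signature are assumptions of this sketch.
func loadCleanup(path string) (func(vpcID string) error, error) {
	p, err := plugin.Open(path)
	if err != nil {
		return nil, fmt.Errorf("open plugin %s: %w", path, err)
	}
	sym, err := p.Lookup("ForceCleanupInfrastructure")
	if err != nil {
		return nil, fmt.Errorf("lookup symbol: %w", err)
	}
	fn, ok := sym.(func(string) error)
	if !ok {
		return nil, fmt.Errorf("unexpected symbol type %T", sym)
	}
	return fn, nil
}

func main() {
	cleanup, err := loadCleanup("./gardenctl-ext-provider-aws.so")
	if err != nil {
		fmt.Println("plugin not available:", err)
		return
	}
	if err := cleanup("vpc-123"); err != nil {
		fmt.Println("cleanup failed:", err)
	}
}
```

One thing to keep in mind with this approach: the plugin package is only supported on Linux, FreeBSD and macOS, and the plugin has to be built with the same Go toolchain and the same versions of shared dependencies as gardenctl itself.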

@danielfoehrKn
Author

I think we should get together with the other gardenctl maintainers to decide whether this is something that should be worked on in the first place, when, and generally what it could look like.
However, at least from my point of view, gardenctl also has other problems that might be better tackled first. The major ones are:

  • increase stability of the existing ssh command
  • (add a verbosity flag)

@tedteng
Contributor

tedteng commented Jul 20, 2020

Regarding the stability issue with the ssh command: does it happen when accessing AWS?

I investigated that previously:
#203
#195

In my case, ssh timed out because the instance status check was still initializing, i.e. while waiting for instance-status.status,Values=ok

Waiting 60 seconds until ports are open.
Warning: Permanently added '<some-public-ip>' (ECDSA) to the list of known hosts.
channel 0: open failed: connect failed: Connection timed out
stdio forwarding failed
kex_exchange_identification: Connection closed by remote host

However, the issue does not occur every time. Besides, in my testing it takes a long time to wait for the instance status check to pass before the ssh command can be used to jump from the bastion to a node.

  • increase stability of the existing ssh command

@tedteng
Contributor

tedteng commented Jul 23, 2020

PR for #231 @danielfoehrKn

  • increase stability of the existing ssh command
  • (add a verbosity flag)

@neo-liang-sap
Contributor

Hi @jfortin-sap, since we opened several sub-tasks after the discussion to break down this issue, shall we close this central one, or would you like to use this issue to track the feature implementation?
/cc @tedteng @dansible
Thanks.
-Neo

@dansible
Contributor

dansible commented Aug 7, 2020

I think we can close this issue

@neo-liang-sap
Contributor

/close
