Enhance SONiC with kubernetes management commands #962

Merged: 41 commits, Aug 29, 2020

Conversation

renukamanavalan
Contributor

@renukamanavalan commented Jun 24, 2020

Config commands:

sudo config kubernetes server ip <IP address>
sudo config kubernetes server insecure <True/False>
sudo config kubernetes server disable <True/False>
sudo config kubernetes label add <key> <value>
sudo config kubernetes label drop <key>
sudo config kubernetes join [-a/--async] [-f/--force]
sudo config kubernetes reset

Show commands:

show kube server
show kube status
show kube nodes
show kube pods

A Python wrapper for kube_join, for invocation by other tools.
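For illustration, a minimal sketch of how another tool might call that wrapper instead of shelling out to the CLI; the import path, function name, and arguments here are assumptions, not the PR's actual API:

# Hypothetical usage sketch only; names are assumed for illustration.
from config.kube import kube_join  # assumed import path

# Roughly equivalent to 'sudo config kubernetes join --force',
# minus the click CLI layer.
kube_join(force=True)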


@lgtm-com
lgtm-com bot commented Jun 25, 2020

This pull request introduces 1 alert when merging d36a167 into 38bff50 - view on LGTM.com

new alerts:

  • 1 for Unused import


@renukamanavalan
Contributor Author

The last/outstanding alerts from LGTM can be ignored. The imports look unused, but they are actually used, through a class method of click.

@jleveque
Contributor

jleveque commented Jun 26, 2020

@renukamanavalan: It appears the LGTM warnings are correct -- those imports are unused. You don't access the kube module in either the show/main.py or config/main.py files. Please remove those imports.

@renukamanavalan
Contributor Author

renukamanavalan commented Jun 26, 2020

Not really; of course I tested this code and it works.

Here is some high-level insight:
Take config as an example. When main.py imports kube, kube.py adds "kubernetes" as a command to the config group. The config group is declared via a class method of click's group class. Being a class method, it is globally available. This class method uses an Abbreviation group to hold all registered commands, and because it maintains a single instance of that group, kube.py can transparently join its commands with all the other commands added to config via main.py. The Abbreviation group then holds the command, which click uses. Also, the entire kube.py module is imported, so when kube sub-commands are invoked, they are available locally.

Test:
When you type "sudo config ", it shows "kubernetes".
"sudo config kubernetes" shows all the sub-commands, and so on.

admin@str-s6000-acs-13:/usr/lib/python2.7/dist-packages/config$ sudo config 
aaa                     interface               loopback                reload                  vlan
acl                     interface_naming_mode   mirror_session          route                   vrf
bgp                     is_loopback_name_valid  nat                     save                    warm_restart
container               kdump                   ntp                     sflow                   watermark
dropcounters            kubernetes              pfcwd                   snmpagentaddress        ztp
ecn                     load                    platform                snmptrap                
feature                 load_mgmt_config        portchannel             syslog                  
hostname                load_minigraph          qos                     tacacs         

admin@str-s6000-acs-13:/usr/lib/python2.7/dist-packages/config$ sudo config kube 
join    label   reset   server  
admin@str-s6000-acs-13:/usr/lib/python2.7/dist-packages/config$ sudo config kube server 
disable   insecure  ip        

It is precisely the same with show.

The magic comes from the class method of click. LGTM complains because it does not see main.py calling any method from kube (no kube.* calls).
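To make the mechanism concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual sonic-utilities code) of why such an import looks unused to a linter even though it does real work:

# register_demo.py -- hypothetical illustration of click group registration
import click

@click.group()
def config():
    """Top-level group, analogous to 'sudo config'."""
    pass

# --- imagine this part lives in kube.py and runs at import time ---
@config.group()
def kubernetes():
    """Kubernetes management commands."""
    pass

@kubernetes.command()
def join():
    """Analogous to 'config kubernetes join'."""
    click.echo("joining the cluster...")
# ------------------------------------------------------------------

if __name__ == "__main__":
    # Nothing here ever calls kube.* directly; importing the module was
    # enough to register the commands, which is exactly why a linter
    # flags the import as unused.
    config()

Running "python register_demo.py kubernetes join" would print "joining the cluster...", even though no kube.* call appears anywhere in the main module.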

Main one: handle the absence of admin.conf, which would not be present if the node is not connected to the cluster yet.
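A minimal sketch of the kind of guard this implies; the file path and helper name are assumptions, not the PR's actual code:

import os

KUBE_ADMIN_CONF = "/etc/sonic/kube_admin.conf"  # assumed location

def kube_state_available():
    # Before the node joins the cluster, admin.conf does not exist,
    # so the show commands should report "not joined" rather than fail.
    return os.path.exists(KUBE_ADMIN_CONF)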
No logical code changes.
@abdosi
Contributor

abdosi commented Sep 19, 2020

@renukamanavalan Do we need this in 201911? It seems to be an enhancement.
If needed, please create a PR for 201911, as the cherry-pick has conflicts.

@rlhui

@briantopping

Apologies for the dumb question, but is there a conversation about this change somewhere?

@renukamanavalan
Contributor Author

renukamanavalan commented Sep 21, 2020

Apologies for the dumb question, but is there a conversation about this change somewhere?

Good question. There is an HLD review on this Tuesday (9/22). Please join.
Here is the work in progress: link
I will create a PR for this doc soon.

@briantopping

Thanks for the link. I only found SONiC a few days ago, rather innocently, in a search for an OSS ONIE NOS, and chose SONiC for its project momentum. When I realized it ran Docker, I was blown away. So I started searching for whether anyone had tried putting Kubelet on it. I am ecstatic to find your effort in this regard.

My initial impression is that "legacy SONiC" needs to be supported with systemd for those who will never join their switch into a control fabric such as Kubernetes. I also noted a team somewhere that was using Docker Swarm to create a switch control fabric, so it is presumptive to assume Kubernetes will be the only control plane ever used.

The unifying aspect of these different control planes (i.e., systemd, Docker Swarm, Kubernetes) is that they all use the Docker daemon socket for control. The systemd approach uses the socket indirectly, via the Docker CLI app and shell scripts. Thus, if Docker itself on a node became the source of record for the state of that node, control-plane fabrics could reasonably coexist. And to that end, it might help reduce interdependencies in the support matrix.

If it's worth connecting over this before the HLD review, please don't hesitate to reach out here or on my email.

@renukamanavalan
Contributor Author

Hi @briantopping,
Thanks for your interest and very helpful comments.

Yes, there are Docker Swarm and also OpenShift. The reasons we picked Kubernetes:
a) We need to pick one for our first step.
b) Kubernetes is the most widely adopted.
c) It has excellent community support, worldwide.

That said, the goal is to keep future adaptation to other technologies/tools as simple as possible. If you look at the core design, it is still systemd that manages at the service level. When it comes down to the docker level, it either runs a docker command or instructs the remote system to start/stop/.... In Kubernetes, it controls this through labels; in docker swarm, it could be different.

This is achieved by replacing docker commands with container commands, so docker start/stop/... become container start/stop/.... The container start, under the hood, calls docker or label create/remove based on the owner, which is local/kube. When we extend to docker-swarm, we would add one more owner type, say "swarm", and the container action commands would be updated to handle swarm.
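As an illustration only, a rough sketch of that owner-based dispatch; the function names, label scheme, and node name are assumptions, not the actual implementation:

import subprocess

def container_start(feature, owner):
    # owner is "local" or "kube"; a future "swarm" owner would be
    # one more branch here.
    if owner == "local":
        # Locally owned: drive the docker daemon directly.
        subprocess.run(["docker", "start", feature], check=True)
    elif owner == "kube":
        # Kube-owned: flip the node label that the cluster uses to
        # decide whether to schedule this feature's pod.
        subprocess.run(
            ["kubectl", "label", "node", "my-switch",  # node name assumed
             "{}_enabled=true".format(feature), "--overwrite"],
            check=True)
    else:
        raise ValueError("unknown owner: {}".format(owner))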

I guess we don't have much time to connect before the community review, which is at 8 AM PDT tomorrow. Please join. We can connect after that for the follow-ups.

@briantopping

briantopping commented Sep 21, 2020

Thanks for the updates. Yes, OpenShift and Rancher as well, if you want to consider other Kubernetes-based distributions.

I have a feeling that depending on systemd will seem obtuse in a few years, but there's a high probability I am missing requirements, details and nuances.

As a user, I would like to see Kubernetes (or any orchestrator) run as unencumbered as possible. There is a lot of tribal knowledge baked into Kubernetes (for instance) and it's a nonzero possibility that the SONiC container orchestration team would miss one or more of these nuances as it tries to make k8s work through systemd instead of the Docker socket. If that missed expectation/requirement translates into an unreliable implementation in SONiC, that's worrisome.

So instead, what if the systemd implementation were refactored to similarly act as a client to dockerd? For instance, systemctl status frr (or whatever the container is called) would work directly with Docker to provide results, allowing other orchestrators to work from the same information source (Docker itself). If it is already doing that, then why have the orchestrators talking to systemd? This shim seems to be a maintenance liability.

I am thinking about the overall level of effort to get to a robust and bulletproof implementation that works across all the orchestrators you've named, especially in the face of future implementation changes that we cannot foresee today. If the systemd dependency is removed, we are not constantly tracking their version stream, just updating kubelet or whatever and moving on.

I regret belaboring this point if you understood what I was talking about from the previous post. Just wanted to save time tomorrow when there is a bigger audience whose valuable time is a consideration.

EDIT: Re-reading your current behavior link, I think I may understand that the "complex dependencies across features" requirement may be why it is foreseen that systemd being a part of the call chain is necessary. I would advocate that something like Helm does a better job at this in a Kubernetes environment due to other infrastructure such as Jenkins X. I'm not saying that existing systemd infrastructure does not provide critical capabilities in the current worldview, only that they may be matched outside the existing SONiC ecosystem and some environments may wish to normalize all their infrastructure using a reduced tooling set. How can the solution space best include these alternative options without adding additional complexity to the overall delivery?

@renukamanavalan
Contributor Author

I have to differ a little bit here. Kubernetes is robust, but it doesn't handle all possible scenarios; that would be improbable, and becoming too complex in trying to handle "all" of them would ultimately fall apart. Our SONiC scenario, where we plan to manage daemons only, is not the mainstream scenario for kubernetes. As a sample, a daemonset does not support a restartPolicy or a backoff timer. As you know, each switch carries production traffic, and its health is of utmost importance, with minimal downtime for any dockers in any scenario. The availability of the kubernetes servers can't be assured. Plus, there are many scenarios where a switch may decide not to join. I can go on ... :-)

What we did here is take a first step, in a non-invasive fashion, that is nevertheless pretty effective in bringing in external support for container management. We are going to learn a lot from this step, and it can pave our way to the next.

Your vision of complete external control needs more than kubernetes: one more layer of orchestration that could monitor switches and kubernetes servers as a whole, and take appropriate corrective action as and when needed. We are not there yet.

Thank you,
Renuka

@briantopping

Yes! That makes perfect sense, thank you for the explanation!! Grateful for your elaboration before tomorrow's meeting. 🙏🙏🙏

@renukamanavalan renukamanavalan deleted the kube_join branch September 16, 2022 17:39