Enhance SONiC with kubernetes management commands #962

Merged: 41 commits, Aug 29, 2020

Conversation

renukamanavalan
Contributor

@renukamanavalan commented Jun 24, 2020

Config commands:

sudo config kubernetes server ip <IP address>
sudo config kubernetes server insecure <True/False>
sudo config kubernetes server disable <True/False>
sudo config kubernetes label add <key> <value>
sudo config kubernetes label drop <key>
sudo config kubernetes join [-a/--async] [-f/--force]
sudo config kubernetes reset

Show commands:

show kube server
show kube status
show kube nodes
show kube pods

A Python wrapper for kube_join, for invocation by other tools.
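For illustration, a minimal sketch of how another tool might call that wrapper instead of shelling out to the CLI; the import path, function name, and arguments here are assumptions, not the PR's actual API:

# Hypothetical usage sketch only; names are assumed for illustration.
from config.kube import kube_join  # assumed import path

# Roughly equivalent to 'sudo config kubernetes join --force',
# minus the click CLI layer.
kube_join(force=True)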


@lgtm-com
lgtm-com bot commented Jun 25, 2020

This pull request introduces 1 alert when merging d36a167 into 38bff50 - view on LGTM.com

new alerts:

  • 1 for Unused import


@renukamanavalan
Contributor Author

The last/outstanding alerts from LGTM can be ignored. The imports look unused, but they are actually used, through a class method of click.

@jleveque
Contributor

jleveque commented Jun 26, 2020

@renukamanavalan: It appears the LGTM warnings are correct -- those imports are unused. You don't access the kube module in either the show/main.py or config/main.py files. Please remove those imports.

@renukamanavalan
Contributor Author

renukamanavalan commented Jun 26, 2020

Not really; of course I tested this code and it works.

Here is some high-level insight:
Take config as an example. When main.py imports kube, kube.py adds "kubernetes" as a command to the config group. The config group is declared via a class method of click's group class. Being a class method, it is globally available. This class method uses an Abbreviation group to hold all registered commands, and because it maintains a single instance of that group, kube.py can transparently join its commands with all the other commands added to config via main.py. The Abbreviation group then holds the command, which click uses. Also, the entire kube.py module is imported, so when kube sub-commands are invoked, they are available locally.

Test:
When you type "sudo config ", it shows "kubernetes".
"sudo config kubernetes" shows all the sub-commands, and so on.

admin@str-s6000-acs-13:/usr/lib/python2.7/dist-packages/config$ sudo config 
aaa                     interface               loopback                reload                  vlan
acl                     interface_naming_mode   mirror_session          route                   vrf
bgp                     is_loopback_name_valid  nat                     save                    warm_restart
container               kdump                   ntp                     sflow                   watermark
dropcounters            kubernetes              pfcwd                   snmpagentaddress        ztp
ecn                     load                    platform                snmptrap                
feature                 load_mgmt_config        portchannel             syslog                  
hostname                load_minigraph          qos                     tacacs         

admin@str-s6000-acs-13:/usr/lib/python2.7/dist-packages/config$ sudo config kube 
join    label   reset   server  
admin@str-s6000-acs-13:/usr/lib/python2.7/dist-packages/config$ sudo config kube server 
disable   insecure  ip        

It is precisely the same with show.

The magic comes from the class method of click. LGTM complains because it does not see main.py calling any method from kube (no kube.* calls).
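To make the mechanism concrete, here is a minimal, self-contained sketch (hypothetical names, not the actual sonic-utilities code) of why such an import looks unused to a linter even though it does real work:

# register_demo.py -- hypothetical illustration of click group registration
import click

@click.group()
def config():
    """Top-level group, analogous to 'sudo config'."""
    pass

# --- imagine this part lives in kube.py and runs at import time ---
@config.group()
def kubernetes():
    """Kubernetes management commands."""
    pass

@kubernetes.command()
def join():
    """Analogous to 'config kubernetes join'."""
    click.echo("joining the cluster...")
# ------------------------------------------------------------------

if __name__ == "__main__":
    # Nothing here ever calls kube.* directly; importing the module was
    # enough to register the commands, which is exactly why a linter
    # flags the import as unused.
    config()

Running "python register_demo.py kubernetes join" would print "joining the cluster...", even though no kube.* call appears anywhere in the main module.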

Main one: handle the absence of admin.conf, which would not be present if the node is not connected to the cluster yet.
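A minimal sketch of the kind of guard this implies; the file path and helper name are assumptions, not the PR's actual code:

import os

KUBE_ADMIN_CONF = "/etc/sonic/kube_admin.conf"  # assumed location

def kube_state_available():
    # Before the node joins the cluster, admin.conf does not exist,
    # so the show commands should report "not joined" rather than fail.
    return os.path.exists(KUBE_ADMIN_CONF)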
No logical code changes.
@abdosi
Contributor

abdosi commented Sep 19, 2020

@renukamanavalan Do we need this in 201911? It seems to be an enhancement.
If needed, please create a PR for 201911, as the cherry-pick has conflicts.

@rlhui

@briantopping

Apologies for the dumb question, but is there a conversation about this change somewhere?

@renukamanavalan
Contributor Author

renukamanavalan commented Sep 21, 2020

Apologies for the dumb question, but is there a conversation about this change somewhere?

Good question. There is an HLD review on this Tuesday (9/22). Please join.
Here is the work in progress: link
I will create a PR for this doc soon.

@briantopping

Thanks for the link. I only found SONiC a few days ago, rather innocently, in a search for an OSS ONIE NOS, and chose SONiC for its project momentum. When I realized it ran Docker, I was blown away. So I started searching for whether anyone had tried putting Kubelet on it. I am ecstatic to find your effort in this regard.

My initial impression is that "legacy SONiC" needs to be supported with systemd for those who will never join their switch into a control fabric such as Kubernetes. I also noted a team somewhere that was using Docker Swarm to create a switch control fabric, so it is presumptive to assume Kubernetes will be the only control plane ever used.

The unifying aspect of these different control planes (i.e., systemd, Docker Swarm, Kubernetes) is that they all use the Docker daemon socket for control. The systemd approach uses the socket indirectly, via the Docker CLI app and shell scripts. Thus, if Docker itself on a node became the source of record for the state of that node, control-plane fabrics could reasonably coexist. And to that end, it might help reduce interdependencies in the support matrix.

If it's worth connecting over this before the HLD review, please don't hesitate to reach out here or on my email.

@renukamanavalan
Contributor Author

Hi @briantopping,
Thanks for your interest and very helpful comments.

Yes, there are Docker Swarm and also OpenShift. The reasons we picked Kubernetes:
a) We need to pick one for our first step.
b) Kubernetes is the most widely adopted.
c) It has excellent community support, worldwide.

That said, the goal is to keep future adaptation to other technologies/tools as simple as possible. If you look at the core design, it is still systemd that manages at the service level. When it comes down to the docker level, it either runs a docker command or instructs the remote system to start/stop/.... In Kubernetes, it controls this through labels; in docker swarm, it could be different.

This is achieved by replacing docker commands with container commands, so docker start/stop/... become container start/stop/.... The container start, under the hood, calls docker or label create/remove based on the owner, which is local/kube. When we extend to docker-swarm, we would add one more owner type, say "swarm", and the container action commands would be updated to handle swarm.
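As an illustration only, a rough sketch of that owner-based dispatch; the function names, label scheme, and node name are assumptions, not the actual implementation:

import subprocess

def container_start(feature, owner):
    # owner is "local" or "kube"; a future "swarm" owner would be
    # one more branch here.
    if owner == "local":
        # Locally owned: drive the docker daemon directly.
        subprocess.run(["docker", "start", feature], check=True)
    elif owner == "kube":
        # Kube-owned: flip the node label that the cluster uses to
        # decide whether to schedule this feature's pod.
        subprocess.run(
            ["kubectl", "label", "node", "my-switch",  # node name assumed
             "{}_enabled=true".format(feature), "--overwrite"],
            check=True)
    else:
        raise ValueError("unknown owner: {}".format(owner))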

I guess we don't have much time to connect before the community review, which is at 8 AM PDT tomorrow. Please join. We can connect after that for the follow-ups.

@briantopping

briantopping commented Sep 21, 2020

Thanks for the updates. Yes, OpenShift and Rancher as well, if you want to consider other Kubernetes-based distributions.

I have a feeling that depending on systemd will seem obtuse in a few years, but there's a high probability I am missing requirements, details and nuances.

As a user, I would like to see Kubernetes (or any orchestrator) run as unencumbered as possible. There is a lot of tribal knowledge baked into Kubernetes (for instance) and it's a nonzero possibility that the SONiC container orchestration team would miss one or more of these nuances as it tries to make k8s work through systemd instead of the Docker socket. If that missed expectation/requirement translates into an unreliable implementation in SONiC, that's worrisome.

So instead, what if the systemd implementation were refactored to similarly act as a client to dockerd? For instance, systemctl status frr (or whatever the container is called) would work directly with Docker to provide results, allowing other orchestrators to work from the same information source (Docker itself). If it is already doing that, then why have the orchestrators talking to systemd? This shim seems to be a maintenance liability.

I am thinking about the overall level of effort to get to a robust and bulletproof implementation that works across all the orchestrators you've named, especially in the face of future implementation changes that we cannot foresee today. If the systemd dependency is removed, we are not constantly tracking their version stream, just updating kubelet or whatever and moving on.

I regret belaboring this point if you understood what I was talking about from the previous post. Just wanted to save time tomorrow when there is a bigger audience whose valuable time is a consideration.

EDIT: Re-reading your current behavior link, I think I may understand that the "complex dependencies across features" requirement may be why it is foreseen that systemd being a part of the call chain is necessary. I would advocate that something like Helm does a better job at this in a Kubernetes environment due to other infrastructure such as Jenkins X. I'm not saying that existing systemd infrastructure does not provide critical capabilities in the current worldview, only that they may be matched outside the existing SONiC ecosystem and some environments may wish to normalize all their infrastructure using a reduced tooling set. How can the solution space best include these alternative options without adding additional complexity to the overall delivery?

@renukamanavalan
Contributor Author

I have to differ a little bit here. Kubernetes is robust, but it doesn't handle all possible scenarios; that would be improbable, and becoming too complex in trying to handle "all" of them would ultimately fall apart. Our SONiC scenario, where we plan to manage daemons only, is not the mainstream scenario for kubernetes. As a sample, a daemonset does not support a restartPolicy or a backoff timer. As you know, each switch carries production traffic, and its health is of utmost importance, with minimal downtime for any dockers in any scenario. The availability of the kubernetes servers can't be assured. Plus, there are many scenarios where a switch may decide not to join. I can go on ... :-)

What we did here is take a first step, in a non-invasive fashion, that is nevertheless pretty effective in bringing in external support for container management. We are going to learn a lot from this step, and it can pave our way to the next.

Your vision of complete external control needs more than kubernetes: one more layer of orchestration that could monitor switches and kubernetes servers as a whole, and take appropriate corrective action as and when needed. We are not there yet.

Thank you,
Renuka

@briantopping

Yes! That makes perfect sense, thank you for the explanation!! Grateful for your elaboration before tomorrow's meeting. 🙏🙏🙏

@renukamanavalan renukamanavalan deleted the kube_join branch September 16, 2022 17:39