[warmboot] warmboot LACP downtime variation #17581

Open
stepanblyschak opened this issue Dec 20, 2023 · 1 comment
Labels: Issue for 202305, NVIDIA, Triaged

stepanblyschak (Collaborator) commented Dec 20, 2023

Description

Steps to reproduce the issue:

  1. Run the sonic-mgmt upgrade_path test in warm mode, upgrading from 202205 to 202305

Describe the results you received:

Depending on whether the LAG is created on the first ports (Ethernet0, Ethernet2) or on the last ones, its restoration time differs. For example:

Dec 20 18:01:42.785263 sonic NOTICE swss#orchagent: :- initPort: Initialized port Ethernet0
Dec 20 18:01:47.467496 sonic NOTICE swss#orchagent: :- initPort: Initialized port Ethernet126

It takes ~5 sec to initialize all 56 ports, roughly 0.1 sec per port. initPort() in orchagent queries many attributes and port capabilities; each query takes time to serialize and deserialize back and forth, and the cost grows as more features are added.

E.g: https://github.com/sonic-net/sonic-swss/blob/202305/orchagent/portsorch.cpp#L4959
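
The per-port figure can be checked directly from the two log lines quoted above. The following is a minimal sketch, not part of any SONiC tooling: the helper names are hypothetical, the year is assumed to be 2023 (syslog lines carry no year), and the port count of 56 is taken from the text.

```python
from datetime import datetime

# The two orchagent log lines quoted in this issue.
LOG_LINES = [
    "Dec 20 18:01:42.785263 sonic NOTICE swss#orchagent: :- initPort: Initialized port Ethernet0",
    "Dec 20 18:01:47.467496 sonic NOTICE swss#orchagent: :- initPort: Initialized port Ethernet126",
]
PORT_COUNT = 56      # number of ports stated in the issue
ASSUMED_YEAR = 2023  # assumption: syslog timestamps omit the year


def parse_timestamp(line: str, year: int = ASSUMED_YEAR) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS.ffffff' syslog timestamp."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S.%f")


def init_span_seconds(lines) -> float:
    """Seconds between the first and last 'Initialized port' messages."""
    stamps = [parse_timestamp(l) for l in lines if "initPort: Initialized port" in l]
    return (max(stamps) - min(stamps)).total_seconds()


elapsed = init_span_seconds(LOG_LINES)
per_port = elapsed / PORT_COUNT
print(f"total {elapsed:.3f} s, about {per_port:.3f} s per port")
```

The span is measured between the first and last log messages, so it slightly underestimates the full initialization time, which is consistent with the rounded ~5 sec and ~0.1 sec figures.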

We can optimize this by creating the host interfaces as early as possible and querying everything else later.
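
To illustrate why the ordering matters, here is a toy model of the two schemes. It is only a sketch under stated assumptions and does not reflect the actual orchagent code: the per-port costs are made-up numbers chosen to add up to ~0.1 sec, the function and port names are hypothetical, and it assumes a port's host interface only becomes available once that port's initialization reaches the creation step.

```python
from dataclasses import dataclass

# Hypothetical per-port costs in seconds; illustrative, not measured.
HOSTIF_COST = 0.01  # creating the host interface
QUERY_COST = 0.09   # querying attributes and capabilities


@dataclass
class SimClock:
    """Simulated clock, so the model runs instantly and deterministically."""
    now: float = 0.0

    def spend(self, seconds: float) -> float:
        self.now += seconds
        return self.now


def interleaved(ports):
    """Each port is fully initialized (queries, then host interface) in turn."""
    clock = SimClock()
    ready = {}
    for port in ports:
        clock.spend(QUERY_COST)
        ready[port] = clock.spend(HOSTIF_COST)
    return ready


def hostif_first(ports):
    """All host interfaces are created first; queries are deferred."""
    clock = SimClock()
    ready = {port: clock.spend(HOSTIF_COST) for port in ports}
    for _ in ports:
        clock.spend(QUERY_COST)  # deferred attribute/capability queries
    return ready


ports = [f"port{i}" for i in range(56)]  # placeholder names, not real port names
for name, scheme in (("interleaved", interleaved), ("hostif first", hostif_first)):
    ready = scheme(ports)
    spread = max(ready.values()) - min(ready.values())
    print(f"{name}: last host interface at {max(ready.values()):.2f} s, "
          f"first-to-last spread {spread:.2f} s")
```

In this model the total work is unchanged, but the gap between the first and last host interface becoming available shrinks by the ratio of the two costs, which is the variation the LAG restoration time depends on.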

Describe the results you expected:

Stable under 90 sec with ~20 sec headroom.

Output of show version:

SONiC Software Version: SONiC.202305_RC.55-16d7da84c_Internal
SONiC OS Version: 11
Distribution: Debian 11.8
Kernel: 5.10.0-23-2-amd64
Build commit: 16d7da84c
Build date: Wed Dec 20 01:04:07 UTC 2023
Built by: sw-r2d2-bot@r-build-sonic-ci02-244

Platform: x86_64-mlnx_msn2700-r0
HwSKU: Mellanox-SN2700-D40C8S8
ASIC: mellanox
ASIC Count: 1
Serial Number: MT1805K20439
Model Number: MSN2700-CS2F
Hardware Revision: A2
Uptime: 18:20:06 up 19 min,  1 user,  load average: 1.15, 1.19, 1.26
Date: Wed 20 Dec 2023 18:20:06

Docker images:
REPOSITORY                                         TAG                               IMAGE ID       SIZE
docker-orchagent                                   202305_RC.55-16d7da84c_Internal   3f3b4e8d6a83   330MB
docker-orchagent                                   latest                            3f3b4e8d6a83   330MB
docker-fpm-frr                                     202305_RC.55-16d7da84c_Internal   28bf35dd59af   350MB
docker-fpm-frr                                     latest                            28bf35dd59af   350MB
docker-nat                                         202305_RC.55-16d7da84c_Internal   40fb6b1c2de0   321MB
docker-nat                                         latest                            40fb6b1c2de0   321MB
docker-sflow                                       202305_RC.55-16d7da84c_Internal   4aeff566c75d   320MB
docker-sflow                                       latest                            4aeff566c75d   320MB
docker-teamd                                       202305_RC.55-16d7da84c_Internal   f7a9e70960fb   318MB
docker-teamd                                       latest                            f7a9e70960fb   318MB
docker-macsec                                      latest                            17367fbba2ad   320MB
docker-syncd-mlnx                                  202305_RC.55-16d7da84c_Internal   ecffee80c40e   844MB
docker-syncd-mlnx                                  latest                            ecffee80c40e   844MB
docker-platform-monitor                            202305_RC.55-16d7da84c_Internal   64dd2ddb15e1   829MB
docker-platform-monitor                            latest                            64dd2ddb15e1   829MB
docker-dhcp-relay                                  latest                            c82052f8856f   308MB
docker-eventd                                      202305_RC.55-16d7da84c_Internal   08450f0634c6   300MB
docker-eventd                                      latest                            08450f0634c6   300MB
docker-sonic-telemetry                             202305_RC.55-16d7da84c_Internal   62f8c86af715   387MB
docker-sonic-telemetry                             latest                            62f8c86af715   387MB
docker-snmp                                        202305_RC.55-16d7da84c_Internal   2280ad37ce04   340MB
docker-snmp                                        latest                            2280ad37ce04   340MB
docker-lldp                                        202305_RC.55-16d7da84c_Internal   8f33b3da4f82   343MB
docker-lldp                                        latest                            8f33b3da4f82   343MB
docker-router-advertiser                           202305_RC.55-16d7da84c_Internal   c905506bff31   301MB
docker-router-advertiser                           latest                            c905506bff31   301MB
docker-mux                                         202305_RC.55-16d7da84c_Internal   8c92a2ffe0c3   349MB
docker-mux                                         latest                            8c92a2ffe0c3   349MB
docker-database                                    202305_RC.55-16d7da84c_Internal   bb154d317a72   301MB
docker-database                                    latest                            bb154d317a72   301MB
docker-sonic-mgmt-framework                        202305_RC.55-16d7da84c_Internal   634e58c34140   416MB
docker-sonic-mgmt-framework                        latest                            634e58c34140   416MB

Additional information you deem important (e.g. issue happens only occasionally):

Also, there were other warm boot related issues reported:

stepanblyschak changed the title from "[warmboot] warmboot sometimes fails to bring control plane up in 90 sec" to "[warmboot] warmboot LACP downtime variation" on Dec 27, 2023
judyjoseph added the Triaged and NVIDIA labels on Jan 3, 2024

judyjoseph (Contributor) commented:

@saiarcot895 please sync up with @stepanblyschak
