xcvrd crash on Celestica dx010 and S6100 #6978

Closed
yxieca opened this issue Mar 7, 2021 · 3 comments · Fixed by #6979

Comments

@yxieca
Contributor

yxieca commented Mar 7, 2021

Description

The nightly test is consistently failing on the Celestica DX010 platform because xcvrd is crashing.

From the show version output, it appears that the platform API implementation is either missing or wrong.

The same issue was also found on S6100. The problem was introduced between images 459 and 467. The most likely source of the regression is PR #6957.

Steps to reproduce the issue:

  1. Load the latest master image.

Describe the results you received:

/var/log/syslog:Mar 7 00:10:42.585233 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd Traceback (most recent call last):
/var/log/syslog:Mar 7 00:10:42.585514 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd File "/usr/local/bin/xcvrd", line 8, in <module>
/var/log/syslog:Mar 7 00:10:42.585710 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd sys.exit(main())
/var/log/syslog:Mar 7 00:10:42.585872 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/xcvrd/xcvrd.py", line 1379, in main
/var/log/syslog:Mar 7 00:10:42.586022 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd xcvrd.run()
/var/log/syslog:Mar 7 00:10:42.586169 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/xcvrd/xcvrd.py", line 1327, in run
/var/log/syslog:Mar 7 00:10:42.586332 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd self.init()
/var/log/syslog:Mar 7 00:10:42.586493 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/xcvrd/xcvrd.py", line 1292, in init
/var/log/syslog:Mar 7 00:10:42.586657 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd post_port_sfp_dom_info_to_db(is_warm_start, self.stop_event)
/var/log/syslog:Mar 7 00:10:42.586836 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/xcvrd/xcvrd.py", line 486, in post_port_sfp_dom_info_to_db
/var/log/syslog:Mar 7 00:10:42.587001 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd post_port_sfp_info_to_db(logical_port_name, int_tbl[asic_index], transceiver_dict, stop_event)
/var/log/syslog:Mar 7 00:10:42.587158 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd File "/usr/local/lib/python2.7/dist-packages/xcvrd/xcvrd.py", line 302, in post_port_sfp_info_to_db
/var/log/syslog:Mar 7 00:10:42.587315 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd ('dom_capability',port_info_dict['dom_capability']),
/var/log/syslog:Mar 7 00:10:42.587472 str-dx010-acs-4 INFO pmon#/supervisord: xcvrd KeyError: 'dom_capability'
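
The `KeyError` at the bottom of the traceback can be reproduced in isolation. The following is a minimal sketch, assuming a platform parser that returns transceiver info without a `dom_capability` entry. The function name `build_field_values`, the `type` field, and the sample dictionary are hypothetical and are not copied from xcvrd.

```python
def build_field_values(port_info_dict):
    """Mimic the failing pattern: index the dict directly for every field."""
    return [
        ('type', port_info_dict['type']),
        # Direct indexing raises KeyError when the parser omits this field.
        ('dom_capability', port_info_dict['dom_capability']),
    ]


# Hypothetical result from a vendor parser that does not provide dom_capability.
legacy_info = {'type': 'QSFP28 or later'}

try:
    build_field_values(legacy_info)
    crashed = False
    missing_key = None
except KeyError as err:
    crashed = True
    missing_key = err.args[0]  # name of the key that was not found
```

Because the exception is not caught anywhere up the call chain shown in the traceback, it propagates out of `main()` and the daemon exits.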

Describe the results you expected:

The system is healthy and xcvrd runs without crashing.

Output of show version:

SONiC Software Version: SONiC.HEAD.467-2560ec62
Distribution: Debian 10.8
Kernel: 4.19.0-12-2-amd64
Build commit: 2560ec62
Build date: Fri Mar  5 20:34:25 UTC 2021
Built by: johnar@jenkins-worker-22

Platform: x86_64-cel_seastone-r0
HwSKU: Celestica-DX010-C32
ASIC: broadcom
ASIC Count: 1
Traceback (most recent call last):
  File "/usr/local/bin/decode-syseeprom", line 18, in <module>
    import sonic_platform
ModuleNotFoundError: No module named 'sonic_platform'
Serial Number: 
Uptime: 02:39:54 up 15:13,  1 user,  load average: 3.16, 3.18, 3.15

Docker images:
REPOSITORY                    TAG                 IMAGE ID            SIZE
docker-platform-monitor       HEAD.467-2560ec62   ea6ac451e72d        606MB
docker-platform-monitor       latest              ea6ac451e72d        606MB
docker-sonic-mgmt-framework   HEAD.467-2560ec62   e2840e3c9ba9        616MB
docker-sonic-mgmt-framework   latest              e2840e3c9ba9        616MB
docker-sonic-telemetry        HEAD.467-2560ec62   162259bf8375        487MB
docker-sonic-telemetry        latest              162259bf8375        487MB
docker-sflow                  HEAD.467-2560ec62   8039bef09570        409MB
docker-sflow                  latest              8039bef09570        409MB
docker-teamd                  HEAD.467-2560ec62   569ea54d421d        408MB
docker-teamd                  latest              569ea54d421d        408MB
docker-nat                    HEAD.467-2560ec62   f475799e2630        411MB
docker-nat                    latest              f475799e2630        411MB
docker-macsec                 HEAD.467-2560ec62   6ca428de0e3d        411MB
docker-macsec                 latest              6ca428de0e3d        411MB
docker-orchagent              HEAD.467-2560ec62   b655d13719f3        427MB
docker-orchagent              latest              b655d13719f3        427MB
docker-syncd-brcm             HEAD.467-2560ec62   2fa3d9706c00        680MB
docker-syncd-brcm             latest              2fa3d9706c00        680MB
docker-snmp                   HEAD.467-2560ec62   a6ef34d65d60        438MB
docker-snmp                   latest              a6ef34d65d60        438MB
docker-lldp                   HEAD.467-2560ec62   b8c461a8bba9        438MB
docker-lldp                   latest              b8c461a8bba9        438MB
docker-router-advertiser      HEAD.467-2560ec62   bde381bc282b        398MB
docker-router-advertiser      latest              bde381bc282b        398MB
docker-database               HEAD.467-2560ec62   3ebc24e04168        398MB
docker-database               latest              3ebc24e04168        398MB
docker-dhcp-relay             HEAD.467-2560ec62   0f3d7d5ff724        405MB
docker-dhcp-relay             latest              0f3d7d5ff724        405MB
docker-fpm-frr                HEAD.467-2560ec62   be2f97368eff        426MB
docker-fpm-frr                latest              be2f97368eff        426MB

Additional information you deem important (e.g. issue happens only occasionally):

@yxieca yxieca changed the title xcvrd crash on Celestica dx010 xcvrd crash on Celestica dx010 and S6100 Mar 7, 2021
@lguohan
Collaborator

lguohan commented Mar 7, 2021

is this a regression? @jleveque

@lguohan
Collaborator

lguohan commented Mar 7, 2021

Reopening this issue to track the root cause.

I am concerned about these continuous regressions.

@jleveque
Contributor

jleveque commented Mar 7, 2021

is this a regression? @jleveque

Yes. Regression was introduced by sonic-net/sonic-platform-daemons#72. I have opened a fix here: sonic-net/sonic-platform-daemons#162.

jleveque added a commit to sonic-net/sonic-platform-daemons that referenced this issue Mar 9, 2021
…'N/A' (#162)

Currently, some vendors are using custom transceiver info parsers which do not yet provide the `dom_capability` field in the results of `get_transceiver_info()`. However, PR #72 introduced storing this value to State DB under the assumption that it would always be present. On platforms where this value is not present, it would cause xcvrd to crash (see issue: sonic-net/sonic-buildimage#6978).

This change prevents the crash when `dom_capability` is not present and instead saves `'N/A'` as the `dom_capability` value in State DB.
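
The defensive lookup described in the commit message can be sketched as follows. This is an illustration under stated assumptions, not the actual patch from #162: the helper names `get_field` and `build_field_values`, the constant `NOT_AVAILABLE`, and the `type` field are hypothetical.

```python
NOT_AVAILABLE = 'N/A'  # placeholder value named in the commit message


def get_field(port_info_dict, key, default=NOT_AVAILABLE):
    """Return the value for key, or the placeholder if the parser omitted it."""
    return port_info_dict.get(key, default)


def build_field_values(port_info_dict):
    """Build (field, value) pairs without assuming every key is present."""
    return [
        ('type', get_field(port_info_dict, 'type')),
        ('dom_capability', get_field(port_info_dict, 'dom_capability')),
    ]


# Hypothetical parser output that lacks dom_capability.
legacy_pairs = build_field_values({'type': 'QSFP28 or later'})
# Hypothetical parser output that includes it.
full_pairs = build_field_values({'type': 'QSFP28 or later', 'dom_capability': 'yes'})
```

Using `dict.get` with a default keeps platforms with older custom parsers working while leaving the value untouched on platforms that already report it.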
@daall daall closed this as completed Mar 17, 2021