Subinterface creation on Broadcom switches causes multiple containers to shut down #18237

Open
rlebedys opened this issue Mar 1, 2024 · 25 comments · May be fixed by #18505
Labels: DELL, Triaged (this issue has been triaged)

Comments

@rlebedys

rlebedys commented Mar 1, 2024

Description

Creating a subinterface on Broadcom-based switches (Trident 3) causes multiple containers to exit.

Steps to reproduce the issue:

  1. Execute the command config subinterface add EthernetXX.20 20

Describe the results you received:

Multiple containers (swss, syncd, and others) exit and the switch becomes unstable. The containers are in a crash loop.

Describe the results you expected:

The subinterface is created on port EthernetXX.

Output of show version:

SONiC Software Version: SONiC.202311.480461-bacd21577
SONiC OS Version: 11
Distribution: Debian 11.8
Kernel: 5.10.0-23-2-amd64
Build commit: bacd21577
Build date: Sun Feb 18 12:27:37 UTC 2024
Built by: AzDevOps@vmss-soni0033YT

Platform: x86_64-accton_as7326_56x-r0
HwSKU: Accton-AS7326-56X
ASIC: broadcom
ASIC Count: 1

Additional information you deem important (e.g. issue happens only occasionally):

Broadcom SAI version:

:~# bcmcmd "bcmsai ver"
bcmsai ver
BRCM SAI ver: [10.1.6.0], OCP SAI ver: [1.13.2], SDK ver: [sdk-6.5.29], CANCUN ver: [06.04.01]
drivshell>

Attaching the logs captured right after executing the config subinterface add command.
subinterface_add_logs.txt

@adyeung
Collaborator

adyeung commented Mar 13, 2024

I am not able to open the log; please upload the techsupport output.

adyeung added the BRCM and Triaged labels on Mar 13, 2024
@rlebedys
Author

@adyeung, I am adding the logs to the comment.

logs
Feb 19 13:59:29.504648 gs1-leaf71 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet72 admin:1 oper:1 addr:80:a2:35:26:1b:5e ifindex:313 master:0
Feb 19 13:59:29.505178 gs1-leaf71 NOTICE swss#portsyncd: :- onMsg: Publish Ethernet72(ok:up) to state db
Feb 19 13:59:29.505178 gs1-leaf71 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet72.20 admin:0 oper:0 addr:80:a2:35:26:1b:5e ifindex:315 master:0 type:vlan
Feb 19 13:59:29.505662 gs1-leaf71 WARNING pmon#xcvrd[30]: message repeated 2 times: [ $$$ Ethernet76 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000', 'supported_fecs': 'none,rs', 'host_tx_ready': 'true', 'speed': '40000', 'fec': 'N/A'}]
Feb 19 13:59:29.505662 gs1-leaf71 WARNING pmon#xcvrd[30]: $$$ Ethernet72 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok', 'netdev_oper_status': 'up', 'admin_status': 'up', 'mtu': '9100', 'supported_speeds': '40000,100000', 'supported_fecs': 'none,rs', 'host_tx_ready': 'true', 'speed': '40000', 'fec': 'N/A'}
Feb 19 13:59:29.506135 gs1-leaf71 NOTICE swss#portsyncd: :- onMsg: Cannot find Ethernet72.20 in port table
Feb 19 13:59:29.506342 gs1-leaf71 INFO systemd-udevd[88249]: ethtool: autonegotiation is unset or enabled, the speed and duplex are not writable.
Feb 19 13:59:29.506810 gs1-leaf71 INFO systemd-udevd[88249]: Using default interface naming scheme 'v247'.
Feb 19 13:59:29.508691 gs1-leaf71 NOTICE swss#portsyncd: :- onMsg: nlmsg type:16 key:Ethernet72.20 admin:1 oper:1 addr:80:a2:35:26:1b:5e ifindex:315 master:0 type:vlan
Feb 19 13:59:29.508824 gs1-leaf71 NOTICE swss#portsyncd: :- onMsg: Cannot find Ethernet72.20 in port table
Feb 19 13:59:29.509359 gs1-leaf71 NOTICE swss#orchagent: :- doTask: Removed pending neighbor DEL operation for Ethernet72:169.254.0.1 after SET operation
Feb 19 13:59:29.510046 gs1-leaf71 WARNING pmon#xcvrd[30]: $$$ Ethernet72.20 handle_port_update_event() : op=SET DB:STATE_DB Table:PORT_TABLE fvp {'state': 'ok'}
Feb 19 13:59:29.510046 gs1-leaf71 WARNING pmon#xcvrd[30]: *** Ethernet72.20STATE_DBPORT_TABLE handle_port_update_event() fvp {'index': '-1', 'key': 'Ethernet72.20', 'asic_id': 0, 'op': 'SET'}
Feb 19 13:59:29.510307 gs1-leaf71 ERR pmon#xcvrd[30]: Exception occured at CmisManagerTask thread due to KeyError(None)
Feb 19 13:59:29.510752 gs1-leaf71 DEBUG bgp#bgpcfgd: Received message : '('Ethernet72.20', 'SET', (('vrf', ''),))'
Feb 19 13:59:29.511034 gs1-leaf71 NOTICE swss#orchagent: :- addSubPort: Sub interface Ethernet72.20 inherits mtu size 9100 from parent port Ethernet72
Feb 19 13:59:29.511773 gs1-leaf71 ERR pmon#xcvrd[30]: Traceback (most recent call last):
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1523, in run
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:     self.task_worker()
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1228, in task_worker
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:     self.port_dict[lport]['host_tx_ready'] = self.get_host_tx_status(lport)
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 1100, in get_host_tx_status
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:     state_port_tbl = self.xcvr_table_helper.get_state_port_tbl(asic_index)
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:   File "/usr/local/lib/python3.9/dist-packages/xcvrd/xcvrd.py", line 2426, in get_state_port_tbl
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]:     return self.state_port_tbl[asic_id]
Feb 19 13:59:29.511983 gs1-leaf71 ERR pmon#xcvrd[30]: KeyError: None
Feb 19 13:59:29.516009 gs1-leaf71 ERR pmon#xcvrd[30]: Xcvrd: exception found at child thread CmisManagerTask due to KeyError(None)
Feb 19 13:59:29.516009 gs1-leaf71 ERR pmon#xcvrd[30]: Exiting main loop as child thread raised exception!
Feb 19 13:59:29.516009 gs1-leaf71 NOTICE swss#orchagent: :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_KEEP to host interface: Ethernet72
Feb 19 13:59:29.516009 gs1-leaf71 INFO syncd#syncd: [none] SAI_API_PORT:_brcm_sai_link_event_cb:1558 Port 127 link down event cause: LOCAL
Feb 19 13:59:29.516009 gs1-leaf71 INFO syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_config:1812 Creating vlan
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_VLAN:_brcm_sai_vlan_create_internal_vfi:4546 MC-GRP create failed with error Feature unavailable (0xfffffff0).
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_config:1852 internal vfi create failed with error -2.
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_sub_port_router_interface:3940 Sub-Port RIF L2 Config failed with error -2.
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_sub_port_router_interface:4001 SubPort Router Interface Create Failed for port:123 lag:no vlan:20 vpnid:20 vp:0x0 vfp_entry_id:0 l3_intf_id:0 rv:-2
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_xgs_create_router_interface:5176 Error in create router interface failed with error -2.
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:493 pd router intf create failed with error -2.
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:522 Router Interface Create Failed rv:-2
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_router_interface_create_err_cleanup:7140 RIF Create failed: rif_id:0 type:4 vrf:0 port-lag-id:123 lag:no vlan:20 virtual:no
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_NOT_SUPPORTED
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID: oid:0x300000000003a
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS: 80:A2:35:26:1B:5E
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_TYPE: SAI_ROUTER_INTERFACE_TYPE_SUB_PORT
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_PORT_ID: oid:0x1000000000038
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID: 20
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE: true
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE: true
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_MTU: 9100
Feb 19 13:59:29.516009 gs1-leaf71 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID: 0
Feb 19 13:59:29.516009 gs1-leaf71 ERR swss#orchagent: :- create: create status: SAI_STATUS_NOT_SUPPORTED
Feb 19 13:59:29.516009 gs1-leaf71 ERR swss#orchagent: :- addRouterIntfs: Failed to create router interface Ethernet72.20, rv:-2
Feb 19 13:59:29.516009 gs1-leaf71 ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_ROUTER_INTERFACE, status: SAI_STATUS_NOT_SUPPORTED
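The syncd lines above mix two error namespaces. The Broadcom SDK result is printed as "Feature unavailable (0xfffffff0)", while the SAI layer and orchagent report -2, which orchagent spells out as SAI_STATUS_NOT_SUPPORTED. A small sketch of the arithmetic: 0xfffffff0 reinterpreted as a signed 32-bit value is -16. Labeling -16 as BCM_E_UNAVAIL is our reading of the Broadcom SDK error codes, not something stated in these logs, and the two lookup tables are illustrative and cover only the values seen here.

```python
# Decode the two error values that appear in the syncd log.
# Assumption: the Broadcom SDK prints its return code as an unsigned 32-bit hex
# value, and "Feature unavailable" corresponds to BCM_E_UNAVAIL (-16).
# SAI status codes are negative integers; the log pairs rv:-2 with SAI_STATUS_NOT_SUPPORTED.

BCM_ERRORS = {-16: "BCM_E_UNAVAIL (Feature unavailable)"}
SAI_ERRORS = {-2: "SAI_STATUS_NOT_SUPPORTED"}


def to_signed32(value: int) -> int:
    """Reinterpret an unsigned 32-bit integer as a two's-complement signed integer."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


if __name__ == "__main__":
    bcm_rv = to_signed32(0xFFFFFFF0)
    print(bcm_rv, BCM_ERRORS.get(bcm_rv, "unknown"))
    print(-2, SAI_ERRORS.get(-2, "unknown"))
```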

Also attaching the techsupport dump archive that was taken when containers exited after subinterface creation.
sonic_dump_61W5SR3-mgmt_20240313_090936.tar.gz
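Separately from the syncd failure, the xcvrd traceback in these logs shows pmon's CmisManagerTask handling a STATE_DB PORT_TABLE update for Ethernet72.20, calling get_host_tx_status(), and indexing state_port_tbl with an ASIC id of None; the resulting KeyError makes xcvrd exit its main loop. The following is a hypothetical Python model of a defensive guard, not xcvrd's actual code: StatePortTables, handle_port_update and is_subinterface are illustrative names, and treating "parent.vlan" names as subinterfaces is an assumption based on the naming used in this issue.

```python
from typing import Dict, Optional


def is_subinterface(name: str) -> bool:
    """Treat 'Ethernet72.20'-style names as VLAN subinterfaces (assumed naming)."""
    parent, sep, vlan = name.partition(".")
    return bool(sep) and parent.startswith("Ethernet") and vlan.isdigit()


class StatePortTables:
    """Hypothetical stand-in for xcvrd's per-ASIC STATE_DB PORT_TABLE lookup."""

    def __init__(self, tables: Dict[int, Dict[str, dict]]):
        self.state_port_tbl = tables

    def get_host_tx_ready(self, lport: str, asic_index: Optional[int]) -> Optional[str]:
        # The traceback shows self.state_port_tbl[asic_id] raising KeyError(None);
        # check for an unknown ASIC index instead of indexing blindly.
        if asic_index is None or asic_index not in self.state_port_tbl:
            return None
        return self.state_port_tbl[asic_index].get(lport, {}).get("host_tx_ready")


def handle_port_update(
    tables: StatePortTables, lport: str, asic_index: Optional[int]
) -> Optional[str]:
    if is_subinterface(lport):
        # A subinterface has no transceiver of its own, so skip it entirely.
        return None
    return tables.get_host_tx_ready(lport, asic_index)
```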

@adyeung
Collaborator

adyeung commented Mar 15, 2024

The problem is specific to DellEMC-S5248f-P-25G. It appears that the community DELL td3-s5248f-25g.config.bcm is missing the SOC parameter flow_init_mode=1, which is needed for VFI MGID creation. Other parameters are also needed for VLAN VFI to work on TD3 for subinterface creation.
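A quick way to check whether a platform's config.bcm carries the SOC parameter mentioned above is to parse its key=value lines. This is a minimal sketch assuming the file uses one key=value pair per line with '#' comments; the function names are hypothetical. Only flow_init_mode=1 is listed because it is the one parameter named here; the other VLAN VFI parameters are not named in this comment, so a passing check is necessary but not sufficient.

```python
from typing import Dict, List

# Only the parameter named in this comment; other VFI-related
# parameters would have to be added by the platform owner.
REQUIRED: Dict[str, str] = {"flow_init_mode": "1"}


def parse_config_bcm(text: str) -> Dict[str, str]:
    """Parse key=value lines, skipping blank lines and '#' comments (assumed syntax)."""
    params: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            params[key.strip()] = value.strip()
    return params


def missing_params(text: str, required: Dict[str, str] = REQUIRED) -> List[str]:
    """Return the required key=value pairs that are absent or set to another value."""
    params = parse_config_bcm(text)
    return [f"{key}={value}" for key, value in required.items() if params.get(key) != value]


def check_file(path: str, required: Dict[str, str] = REQUIRED) -> List[str]:
    """Read a config.bcm file from disk and report missing parameters."""
    with open(path) as f:
        return missing_params(f.read(), required)
```

check_file(path) returns an empty list when every required parameter is present with the expected value.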

Requesting DELL contributor @aravindmani-1 to help follow up and update the file.

adyeung removed their assignment on Mar 15, 2024
adyeung added the DELL label and removed the BRCM label on Mar 15, 2024
@rlebedys
Author

Thanks for the update. I noticed the same issue on Accton-AS7326-56X and Accton-AS7726-32X; however, I no longer have access to them, so I can't collect any specific information.

@rlebedys
Author

@adyeung @aravindmani-1 is this fix going to be merged into master?

@aravindmani-1
Contributor

aravindmani-1 commented May 27, 2024

> @adyeung @aravindmani-1 is this fix going to get merged to the master?

Yes. This will be merged into the master branch.
@prgeor, could you please help merge PR #18505?

@tomvil

tomvil commented Jun 26, 2024

The same happens with accton_as7326_56x switches. Are there any updates regarding the Accton platform?

SONiC Software Version: SONiC.202405.0-dirty-20240620.233504
SONiC OS Version: 12
Distribution: Debian 12.5
Kernel: 6.1.0-11-2-amd64
Build commit: 926d03322
Build date: Thu Jun 20 22:58:12 UTC 2024

Platform: x86_64-accton_as7326_56x-r0
HwSKU: Accton-AS7326-56X
ASIC: broadcom
ASIC Count: 1

@NerijusRazvodovskis

Hey @adyeung,

perhaps you've had a chance to look at the accton_as7326_56x switches? It seems they are facing the same issue as the Dell ones.

@adyeung
Collaborator

adyeung commented Jun 27, 2024

@jostar-yang, please help update the Accton config.bcm files.

@tomvil

tomvil commented Jul 8, 2024

@jostar-yang have you had the opportunity to review this issue?

@NerijusRazvodovskis

@jostar-yang Hello, any update regarding this?

@rlebedys
Author

@aravindmani-1 any news about this?

@tomvil

tomvil commented Aug 30, 2024

@rlebedys, did you test @aravindmani-1's fix? Does it work for you? I've just tested it with the S5248F, and as soon as I add a subinterface, the containers still start to crash.

The error:

2024 Aug 30 08:09:11.250333 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_router_interface_common_config:3205 L3 intf create failed with error -2.
2024 Aug 30 08:09:11.250333 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_router_interface_common_config:3278 RIF common config create failed rv:-2
2024 Aug 30 08:09:11.250333 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_sub_port_router_interface:3947 Sub-Port RIF common Config failed with error -2.
2024 Aug 30 08:09:11.250333 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_sub_port_router_interface:4001 SubPort Router Interface Create Failed for port:49 lag:no vlan:666 vpnid:32768 vp:0xb0000001 vfp_entry_id:0 l3_intf_id:0 rv:-2
2024 Aug 30 08:09:11.250333 leaf1 INFO syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_unconfig:1964 destroy vlan
2024 Aug 30 08:09:11.250537 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_xgs_create_router_interface:5176 Error in create router interface failed with error -2.
2024 Aug 30 08:09:11.250590 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:493 pd router intf create failed with error -2.
2024 Aug 30 08:09:11.250862 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:522 Router Interface Create Failed rv:-2
2024 Aug 30 08:09:11.250909 leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_router_interface_create_err_cleanup:7140 RIF Create failed: rif_id:0 type:4 vrf:0 port-lag-id:49 lag:no vlan:666 virtual:no
2024 Aug 30 08:09:11.250958 leaf1 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_NOT_SUPPORTED
SONiC Software Version: SONiC.202405.0-dirty-20240830.091822
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-11-2-amd64
Build commit: 249c20bdf
Build date: Fri Aug 30 06:58:39 UTC 2024

Platform: x86_64-dellemc_s5248f_c3538-r0
HwSKU: DellEMC-S5248f-P-25G
ASIC: broadcom
ASIC Count: 1
Hardware Revision: N/A
Uptime: 08:06:06 up 12 min,  1 user,  load average: 2.89, 2.14, 1.36
Date: Fri 30 Aug 2024 08:06:06

Docker images:
REPOSITORY                    TAG                              IMAGE ID       SIZE
docker-dhcp-relay             latest                           934cdc88b25f   324MB
docker-dhcp-server            latest                           cdf709f8a11d   338MB
docker-fpm-frr                202405.0-dirty-20240830.091822   1b46e9d04a15   375MB
docker-fpm-frr                latest                           1b46e9d04a15   375MB
docker-macsec                 latest                           3a77d124e235   346MB
docker-lldp                   202405.0-dirty-20240830.091822   7904ddbfb954   360MB
docker-lldp                   latest                           7904ddbfb954   360MB
docker-mux                    202405.0-dirty-20240830.091822   f355662bc7b3   366MB
docker-mux                    latest                           f355662bc7b3   366MB
docker-snmp                   202405.0-dirty-20240830.091822   81a0b637c93e   354MB
docker-snmp                   latest                           81a0b637c93e   354MB
docker-sonic-gnmi             202405.0-dirty-20240830.091822   e4a31bbc8cd4   399MB
docker-sonic-gnmi             latest                           e4a31bbc8cd4   399MB
docker-sonic-mgmt-framework   202405.0-dirty-20240830.091822   6d5e68ff3033   401MB
docker-sonic-mgmt-framework   latest                           6d5e68ff3033   401MB
docker-teamd                  202405.0-dirty-20240830.091822   3f52352e3264   343MB
docker-teamd                  latest                           3f52352e3264   343MB
docker-platform-monitor       202405.0-dirty-20240830.091822   c3c08f5f6d41   440MB
docker-platform-monitor       latest                           c3c08f5f6d41   440MB
docker-sflow                  202405.0-dirty-20240830.091822   9da2da5cca1c   344MB
docker-sflow                  latest                           9da2da5cca1c   344MB
docker-router-advertiser      202405.0-dirty-20240830.091822   c932382d33d1   315MB
docker-router-advertiser      latest                           c932382d33d1   315MB
docker-orchagent              202405.0-dirty-20240830.091822   31fa919519aa   356MB
docker-orchagent              latest                           31fa919519aa   356MB
docker-nat                    202405.0-dirty-20240830.091822   85a7be8ce26d   346MB
docker-nat                    latest                           85a7be8ce26d   346MB
docker-iccpd                  202405.0-dirty-20240830.091822   d44f59428033   344MB
docker-iccpd                  latest                           d44f59428033   344MB
docker-database               202405.0-dirty-20240830.091822   59cefa77b041   323MB
docker-database               latest                           59cefa77b041   323MB
docker-eventd                 202405.0-dirty-20240830.091822   bbe4d9b78786   314MB
docker-eventd                 latest                           bbe4d9b78786   314MB
docker-syncd-brcm             202405.0-dirty-20240830.091822   3f34d16e8e42   717MB
docker-syncd-brcm             latest                           3f34d16e8e42   717MB
docker-gbsyncd-broncos        202405.0-dirty-20240830.091822   6ac692db5646   354MB
docker-gbsyncd-broncos        latest                           6ac692db5646   354MB
docker-gbsyncd-credo          202405.0-dirty-20240830.091822   3f63e3eb401e   327MB
docker-gbsyncd-credo          latest                           3f63e3eb401e   327MB

The fix is applied:

# cat /usr/share/sonic/device/x86_64-dellemc_s5248f_c3538-r0/DellEMC-S5248f-P-25G/td3-s5248f-25g.config.bcm 
...
mem_cache_enable=0
lpm_scaling_enable=0
bcm_num_cos=10
default_cpu_tx_queue=9
host_as_route_disable=1
sai_eapp_config_file=/etc/broadcom/eapps_cfg.json
sai_fast_convergence_support=1
flow_init_mode=1
sai_load_hw_config=/usr/lib/cancun/
...

@aravindmani-1
Contributor

@tomvil, could you please share the complete steps that you tried?
Did you try restarting the switch after applying the NPU configs?
In the shared logs, SAI API unsupported messages are seen.

@tomvil
Copy link

tomvil commented Aug 30, 2024

@aravindmani-1 I have built the image (202405 branch) with your commit from pull request #18505. I see the configuration is present in td3-s5248f-25g.config.bcm. And yes, I have tried restarting it.

Is there anything else I can check for you?

SAI version on my switch:

# bcmcmd "bcmsai ver"
bcmsai ver
BRCM SAI ver: [10.1.37.0], OCP SAI ver: [1.13.2], SDK ver: [sdk-6.5.29], CANCUN ver: [06.04.01]

@aravindmani-1
Contributor

aravindmani-1 commented Aug 30, 2024

Could you share the complete steps you used to recreate the issue (starting from the commands used)?

@tomvil

tomvil commented Aug 30, 2024

@aravindmani-1 here's how I reproduce the issue every time:

  1. Install a fresh image (built from the 202405 branch + your commit)
  2. Wait for the containers to become stable
  3. Add a subinterface with the command config subinterface add Ethernet0.666 666
  4. Wait a few seconds; the containers start to go down/flap.
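The reproduction steps above can be scripted so that a flap is detected rather than eyeballed. This is a sketch under assumptions: it runs as root on the switch, the containers to watch are named swss, syncd, pmon and bgp, and a restart shows up as a changed State.StartedAt in docker inspect output. The 30-second wait is arbitrary, and all helper names are hypothetical.

```python
import subprocess
import time
from typing import Dict, List, Tuple

# Assumed set of SONiC containers to watch; adjust for the platform.
CONTAINERS = ["swss", "syncd", "pmon", "bgp"]
INSPECT_FORMAT = "{{.State.Running}} {{.State.StartedAt}}"

State = Tuple[bool, str]  # (is running, StartedAt timestamp)


def parse_inspect(output: str) -> State:
    """Parse the output of `docker inspect -f INSPECT_FORMAT NAME`."""
    running, _, started = output.strip().partition(" ")
    return running == "true", started


def container_state(name: str) -> State:
    result = subprocess.run(
        ["docker", "inspect", "-f", INSPECT_FORMAT, name],
        capture_output=True, text=True, check=True,
    )
    return parse_inspect(result.stdout)


def flapped(before: Dict[str, State], after: Dict[str, State]) -> List[str]:
    """A container flapped if it stopped or was restarted (its StartedAt changed)."""
    return [
        name for name, (_, started) in before.items()
        if not after[name][0] or after[name][1] != started
    ]


def run_check(subif: str = "Ethernet0.666", vlan: str = "666", wait_s: float = 30.0) -> List[str]:
    """Steps 2-4 above: snapshot, add the subinterface, wait, snapshot again."""
    before = {name: container_state(name) for name in CONTAINERS}
    subprocess.run(["config", "subinterface", "add", subif, vlan], check=True)
    time.sleep(wait_s)  # arbitrary margin for "wait a few seconds"
    after = {name: container_state(name) for name in CONTAINERS}
    return flapped(before, after)
```

Calling run_check() on a freshly booted switch returns the names of watched containers that stopped or restarted during the wait; an empty list means none did.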

@aravindmani-1
Contributor

@tomvil, can you upload the "show techsupport" logs? When you hit the issue, please collect the logs covering the last hour using the techsupport options.

@audmas

audmas commented Sep 13, 2024

@aravindmani-1, here are the techsupport logs you requested from @tomvil:

sonic_dump_20240913_075645.tar.gz

@aravindmani-1
Copy link
Contributor

SAI errors are observed, and they are causing orchagent to crash.

    2024 Sep 13 07:45:42.302722 lt-bnk-test-leaf1 DEBUG bgp#bgpcfgd: Received message : '('Ethernet0.666', 'SET', (('vrf', ''),))'
    2024 Sep 13 07:45:42.305093 lt-bnk-test-leaf1 NOTICE swss#orchagent: :- setHostIntfsStripTag: Set SAI_HOSTIF_VLAN_TAG_KEEP to host interface: Ethernet0
    2024 Sep 13 07:45:42.309753 lt-bnk-test-leaf1 INFO syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_config:1812 Creating vlan
    2024 Sep 13 07:45:42.312347 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_SWITCH:_brcm_sai_l3_intf_create:7096 L3 intf create failed with error Feature unavailable (0xfffffff0).
    2024 Sep 13 07:45:42.312347 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_router_interface_common_config:3205 L3 intf create failed with error -2.
    2024 Sep 13 07:45:42.312347 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_router_interface_common_config:3278 RIF common config create failed rv:-2
    2024 Sep 13 07:45:42.312347 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_sub_port_router_interface:3947 Sub-Port RIF common Config failed with error -2.
    2024 Sep 13 07:45:42.312347 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_xgs_create_sub_port_router_interface:4001 SubPort Router Interface Create Failed for port:49 lag:no vlan:666 vpnid:32768 vp:0xb0000001 vfp_entry_id:0 l3_intf_id:0 rv:-2
    2024 Sep 13 07:45:42.313778 lt-bnk-test-leaf1 INFO syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:_brcm_sai_sub_router_intf_l2_unconfig:1964 destroy vlan
    2024 Sep 13 07:45:42.314156 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_xgs_create_router_interface:5176 Error in create router interface failed with error -2.
    2024 Sep 13 07:45:42.314156 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:493 pd router intf create failed with error -2.
    2024 Sep 13 07:45:42.314156 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_create_router_interface:522 Router Interface Create Failed rv:-2
    2024 Sep 13 07:45:42.314156 lt-bnk-test-leaf1 ERR syncd#syncd: [none] SAI_API_ROUTER_INTERFACE:brcm_sai_router_interface_create_err_cleanup:7140 RIF Create failed: rif_id:0 type:4 vrf:0 port-lag-id:49 lag:no vlan:666 virtual:no
    2024 Sep 13 07:45:42.314156 lt-bnk-test-leaf1 ERR syncd#syncd: :- sendApiResponse: api SAI_COMMON_API_CREATE failed in syncd mode: SAI_STATUS_NOT_SUPPORTED
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID: oid:0x300000000003a
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS: 0C:29:EF:E3:B7:80
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_TYPE: SAI_ROUTER_INTERFACE_TYPE_SUB_PORT
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_PORT_ID: oid:0x1000000000012
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID: 666
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE: true
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE: true
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_MTU: 9100
    2024 Sep 13 07:45:42.314667 lt-bnk-test-leaf1 ERR syncd#syncd: :- processQuadEvent: attr: SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID: 0
    2024 Sep 13 07:45:42.315517 lt-bnk-test-leaf1 ERR swss#orchagent: :- create: create status: SAI_STATUS_NOT_SUPPORTED
    2024 Sep 13 07:45:42.315517 lt-bnk-test-leaf1 ERR swss#orchagent: :- addRouterIntfs: Failed to create router interface Ethernet0.666, rv:-2
    2024 Sep 13 07:45:42.315517 lt-bnk-test-leaf1 ERR swss#orchagent: :- handleSaiCreateStatus: Encountered failure in create operation, exiting orchagent, SAI API: SAI_API_ROUTER_INTERFACE, status: SAI_STATUS_NOT_SUPPORTED
    2024 Sep 13 07:45:42.315517 lt-bnk-test-leaf1 NOTICE swss#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
    2024 Sep 13 07:45:42.315517 lt-bnk-test-leaf1 NOTICE syncd#syncd: :- processNotifySyncd: Invoking SAI failure dump
    2024 Sep 13 07:45:42.328020 lt-bnk-test-leaf1 NOTICE swss#orchagent: :- sai_redis_notify_syncd: invoked DUMP succeeded
    2024 Sep 13 07:45:43.278891 lt-bnk-test-leaf1 INFO swss#supervisord 2024-09-13 07:45:43,277 WARN exited: orchagent (terminated by SIGABRT (core dumped); not expected)

@adyeung, could you please check why these SAI failures are seen here?

@anilkpan

The following router interface creation attributes are not supported in the SAI version used by community SONiC:

SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE
SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE

Please try commenting out the following line in intfsorch.cpp and retry:

https://github.com/sonic-net/sonic-swss/blob/20e8b362a5e4fd5361a6f08effddd08d61d5d262/orchagent/intfsorch.cpp#L1245
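The effect of the suggested workaround can be modeled on the attribute list that syncd logged for the failing create call. The real change would be in C++ in intfsorch.cpp; this Python model only shows which attributes would remain once the two admin-state attributes are no longer sent. The attribute values are copied from the first log in this issue, the UNSUPPORTED set reflects this comment only, and strip_unsupported is a hypothetical helper.

```python
from typing import List, Set, Tuple

# Attributes logged by syncd (processQuadEvent) for the failing SUB_PORT RIF create.
RIF_ATTRS: List[Tuple[str, str]] = [
    ("SAI_ROUTER_INTERFACE_ATTR_VIRTUAL_ROUTER_ID", "oid:0x300000000003a"),
    ("SAI_ROUTER_INTERFACE_ATTR_SRC_MAC_ADDRESS", "80:A2:35:26:1B:5E"),
    ("SAI_ROUTER_INTERFACE_ATTR_TYPE", "SAI_ROUTER_INTERFACE_TYPE_SUB_PORT"),
    ("SAI_ROUTER_INTERFACE_ATTR_PORT_ID", "oid:0x1000000000038"),
    ("SAI_ROUTER_INTERFACE_ATTR_OUTER_VLAN_ID", "20"),
    ("SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE", "true"),
    ("SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE", "true"),
    ("SAI_ROUTER_INTERFACE_ATTR_MTU", "9100"),
    ("SAI_ROUTER_INTERFACE_ATTR_NAT_ZONE_ID", "0"),
]

# Attributes reported as unsupported by the SAI build in use (per this comment).
UNSUPPORTED: Set[str] = {
    "SAI_ROUTER_INTERFACE_ATTR_ADMIN_V4_STATE",
    "SAI_ROUTER_INTERFACE_ATTR_ADMIN_V6_STATE",
}


def strip_unsupported(
    attrs: List[Tuple[str, str]], unsupported: Set[str]
) -> List[Tuple[str, str]]:
    """Return the attribute list without the unsupported attributes, preserving order."""
    return [(name, value) for name, value in attrs if name not in unsupported]


if __name__ == "__main__":
    for name, value in strip_unsupported(RIF_ATTRS, UNSUPPORTED):
        print(f"{name}: {value}")
```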

@tomvil

tomvil commented Sep 18, 2024

@aravindmani-1 @adyeung @anilkpan

I found this TD3 configuration as a possible fix:

use_all_splithorizon_groups=1
riot_enable=1
sai_tunnel_support=1
riot_overlay_l3_intf_mem_size=4096
riot_overlay_l3_egress_mem_size=32768
riot_overlay_ecmp_resilient_hash_size=16384
flow_init_mode=1
host_as_route_disable=1

Tested with Dell and EdgeCore: both work as expected, and subinterface creation no longer causes crashes.

Can you advise whether this is a good approach?

Found it in #9491.
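Applying the proposed TD3 settings to an existing config.bcm by hand is error-prone when some keys (such as flow_init_mode or host_as_route_disable) are already present. A minimal sketch, assuming the file uses plain key=value lines with '#' comments: existing keys are rewritten in place, missing ones are appended, and running it twice yields the same file. apply_settings is a hypothetical helper, and SOC properties are assumed to take effect only after the switch is restarted.

```python
from typing import Dict, List

# TD3 settings proposed in this comment (taken from #9491).
PROPOSED: Dict[str, str] = {
    "use_all_splithorizon_groups": "1",
    "riot_enable": "1",
    "sai_tunnel_support": "1",
    "riot_overlay_l3_intf_mem_size": "4096",
    "riot_overlay_l3_egress_mem_size": "32768",
    "riot_overlay_ecmp_resilient_hash_size": "16384",
    "flow_init_mode": "1",
    "host_as_route_disable": "1",
}


def apply_settings(text: str, settings: Dict[str, str]) -> str:
    """Rewrite existing key=value lines for keys in `settings` and append missing keys.

    Comment lines and unrelated keys are kept unchanged.
    """
    remaining = dict(settings)
    out: List[str] = []
    for line in text.splitlines():
        key, sep, _ = line.strip().partition("=")
        key = key.strip()
        if sep and not line.lstrip().startswith("#") and key in settings:
            out.append(f"{key}={settings[key]}")
            remaining.pop(key, None)
        else:
            out.append(line)
    out.extend(f"{key}={value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"
```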

@anilkpan

@tomvil, let me check and get back to you on this.

@anilkpan

anilkpan commented Sep 19, 2024

@tomvil, you are right. Router interface admin state support is actually merged into the SAI version linked to the community build.

@tomvil

tomvil commented Sep 24, 2024

@anilkpan @aravindmani-1 will you merge these changes into master (and the 202405 branch)?
