[DNX] Orchagent/Syncd crash due to ECMP hash offset set failed with error -2 #19059

Closed
arista-nwolfe opened this issue May 23, 2024 · 9 comments
Assignees
Labels
Chassis 🤖 Modular chassis support chassis-voq Voq chassis changes P0 Priority of the issue

Comments

@arista-nwolfe
Contributor

arista-nwolfe commented May 23, 2024

On the latest master build, the orchagent and syncd containers crash on DNX platforms because of an unsupported SAI call.

ERR syncd1#syncd: [07:00.0] SAI_API_SWITCH:brcm_sai_set_switch_attribute:5078 ECMP hash offset set failed with error -2.
ERR syncd1#syncd: :- sendApiResponse: api SAI_COMMON_API_SET failed in syncd mode: SAI_STATUS_NOT_SUPPORTED
ERR syncd1#syncd: :- processQuadEvent: VID: oid:0x121000000000001 RID: oid:0x8850012100000000
ERR syncd1#syncd: :- processQuadEvent: attr: SAI_SWITCH_ATTR_ECMP_DEFAULT_HASH_OFFSET: 10
ERR swss1#orchagent: :- set: set status: SAI_STATUS_NOT_SUPPORTED
ERR swss1#orchagent: :- doAppSwitchTableTask: Failed to set switch attribute ecmp_hash_offset to 10, rv:-2
ERR swss1#orchagent: :- handleSaiSetStatus: Encountered failure in set operation, exiting orchagent, SAI API: SAI_API_SWITCH, status: SAI_STATUS_NOT_SUPPORTED
NOTICE swss1#orchagent: :- notifySyncd: sending syncd: SYNCD_INVOKE_DUMP
NOTICE syncd1#syncd: :- processNotifySyncd: Invoking SAI failure dump
NOTICE swss1#orchagent: :- sai_redis_notify_syncd: invoked DUMP succeeded
WARNING syncd0#syncd: message repeated 59 times: [ [06:00.0] SAI_API_UNSPECIFIED:sai_bulk_object_get_stats:748 Unsupported object type type 1]
NOTICE syncd0#syncd: :- threadFunction: time span 81 ms for 'start_poll:FABRIC_PORT_STAT_COUNTER:oid:0x10000000000b8'
INFO swss1#supervisord 2024-05-23 20:20:15,295 WARN exited: orchagent (terminated by SIGABRT (core dumped); not expected)

The current DNX SAI on master is 10.1.15, and the SAI source shows that this call is not supported on DNX:

_brcm_sai_loadbalance_ecmp_hash_offset_set(unsigned int val)
{
    int rv;
    unsigned int hashf_offset;

    BRCM_SAI_LOG_SWITCH(SAI_LOG_LEVEL_DEBUG, "Ecmp hash offset set %u", val);
    if (DEV_IS_DNX())
    {
        return SAI_STATUS_NOT_SUPPORTED;
    }

The SAI call was added to orchagent by #18912.

@kenneth-arista
Contributor

@arlakshm @ysmanman for awareness

@lguohan
Collaborator

lguohan commented May 24, 2024

@kperumalbfn, can you check this one? I thought the orchagent change was not merged, so why is the buildimage change causing the breakage?

@kperumalbfn
Contributor

@arista-nwolfe @kenneth-arista The sonic-swss PR https://github.com/sonic-net/sonic-swss/pull/3138/files checks the SAI attribute capability before invoking the set_switch_attribute API. This PR is already merged.

Based on the SDK code snippet above, the BCM SDK reports 'true' for the attribute capability query but then fails the set_switch_attribute call, which is inconsistent. Could you update the SDK so that the capability query returns unsupported or not_implemented on DNX platforms for the two SAI attributes? That should avoid this crash during switch initialization.
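
To make the mismatch concrete, here is a minimal sketch of the check-then-set flow under stated assumptions. It does not use the real SAI headers or the actual orchagent code: every `mock_*` type and function, the `capability_matches_set` flag, and `apply_ecmp_hash_offset` are hypothetical stand-ins. The flag models the behavior requested above (capability reporting that agrees with the set call); the actual Broadcom change is not shown in this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Mock stand-ins for illustration only; the real types come from the SAI headers. */
typedef enum {
    MOCK_STATUS_SUCCESS = 0,
    MOCK_STATUS_NOT_SUPPORTED = -2   /* mirrors the "rv:-2" seen in the log */
} mock_status_t;

typedef struct {
    bool set_implemented;            /* what the capability query reports */
} mock_attr_capability_t;

/* Hypothetical model of a vendor SAI build. */
typedef struct {
    bool is_dnx;                     /* set call is rejected when true */
    bool capability_matches_set;     /* true models the requested fix */
} mock_sai_t;

typedef enum {
    OUTCOME_APPLIED,                 /* attribute was set */
    OUTCOME_SKIPPED,                 /* capability said "no", set never attempted */
    OUTCOME_SET_FAILED               /* capability said "yes", set was rejected */
} outcome_t;

/* Capability query: without the fix it always claims the set is implemented. */
static mock_status_t mock_query_capability(const mock_sai_t *sai,
                                           mock_attr_capability_t *cap)
{
    bool set_works = !sai->is_dnx;
    cap->set_implemented = sai->capability_matches_set ? set_works : true;
    return MOCK_STATUS_SUCCESS;
}

/* Set call: rejected on DNX, like the guard in the excerpt above. */
static mock_status_t mock_set_ecmp_hash_offset(const mock_sai_t *sai,
                                               unsigned int val)
{
    (void)val;                       /* value is irrelevant to the mock */
    return sai->is_dnx ? MOCK_STATUS_NOT_SUPPORTED : MOCK_STATUS_SUCCESS;
}

/* Caller side: query the capability first, attempt the set only if allowed. */
outcome_t apply_ecmp_hash_offset(const mock_sai_t *sai, unsigned int val)
{
    mock_attr_capability_t cap = { .set_implemented = false };

    assert(sai != NULL);

    if (mock_query_capability(sai, &cap) != MOCK_STATUS_SUCCESS ||
        !cap.set_implemented) {
        return OUTCOME_SKIPPED;
    }
    if (mock_set_ecmp_hash_offset(sai, val) != MOCK_STATUS_SUCCESS) {
        /* In the reported crash, this is the point where orchagent exits. */
        return OUTCOME_SET_FAILED;
    }
    return OUTCOME_APPLIED;
}
```

With `is_dnx = true` and `capability_matches_set = false`, the sketch reaches `OUTCOME_SET_FAILED`, which corresponds to the failure in the log. With `capability_matches_set = true`, the same caller returns `OUTCOME_SKIPPED` and never issues the set.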

@lguohan
Collaborator

lguohan commented May 24, 2024

@kenneth-arista, can you help create a CSP case and ask Broadcom to fix it?

@arista-nwolfe
Contributor Author

Created CS00012352219 to track this.

@arlakshm
Contributor

@mlok-nokia @saksarav-nokia for viz...

@arlakshm arlakshm added Chassis 🤖 Modular chassis support chassis-voq Voq chassis changes P0 Priority of the issue labels May 24, 2024
@arista-nwolfe
Contributor Author

Broadcom has a fix, and Arista has confirmed that it works. Broadcom will add this fix to the next 10.x SAI.

@kenneth-arista
Contributor

DNX SAI 10.1.20 has the fix.

@rlhui
Contributor

rlhui commented Jun 19, 2024

Arista confirmed issue is fixed.

@rlhui rlhui closed this as completed Jun 19, 2024