[202205] Enhance orchagent and buffer manager in error handling (#2414) #2449

Merged

stephenxs
Collaborator

Cherry-pick #2414 to 202205.

What I did
Enhance orchagent and buffer manager

  • Buffer manager: do not insert a buffer queue into the cache if its profile is invalid, which prevents an empty string from being written to APPL_DB during initialization.
  • orchagent: handle the case where a field that references other objects is an empty string.
    Such handling existed previously but was broken by a PR last year.

Signed-off-by: Stephen Sun stephens@nvidia.com

Why I did it
Enhance the error handling logic.
In most cases, users will not encounter such scenarios in a production environment, because it is the responsibility of the front ends (e.g. the CLI) to detect an invalid configuration and prevent it from being inserted into CONFIG_DB.
However, in some cases, such as when a malformed config_db.json is composed and copied to the switch, the front ends cannot prevent that.

How I verified it
Manual and mock tests.

Details if related
For the improvement in buffer manager:

  • Previously, the logic was:
    • Declare a reference portQueue to m_portQueueLookup[port][queues], then assign fvValue(i) to portQueue.running_profile_name.
    • However, operator[] on a C++ map has a side effect: it inserts a new element if the key is not already present. If the validation check in checkBufferProfileDirection failed and there was no existing entry in the map, the newly inserted entry was left behind with an empty portQueue.running_profile_name. This is not what we want.
    • If there was already an entry configured in the map, we should not remove it on failure, because we want to shield the user from the effects of the misconfiguration and alert them to correct the error. checkBufferProfileDirection already logs the error.
  • Now it is improved as follows:
    • Avoid the reference, and initialize m_portQueueLookup[port][queues] only if a valid egress profile is configured.
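
The difference between the two patterns can be illustrated with a minimal, self-contained sketch. This is not the actual buffer manager code: PortQueueInfo, QueueLookup, isValidEgressProfile, setProfileOld and setProfileNew are hypothetical names, and the "egress_" prefix check is only an assumed stand-in for whatever checkBufferProfileDirection really validates.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for the cached per-queue state.
struct PortQueueInfo {
    std::string running_profile_name;
};

// Hypothetical stand-in for m_portQueueLookup: port -> queues -> info.
using QueueLookup = std::map<std::string, std::map<std::string, PortQueueInfo>>;

// Assumed validator, standing in for checkBufferProfileDirection.
// Here a profile is "valid" if its name starts with "egress_".
bool isValidEgressProfile(const std::string &profile) {
    return profile.rfind("egress_", 0) == 0;
}

// Old pattern: taking the reference first makes operator[] insert a
// default-constructed entry, even when validation fails afterwards.
bool setProfileOld(QueueLookup &lookup, const std::string &port,
                   const std::string &queues, const std::string &profile) {
    PortQueueInfo &portQueue = lookup[port][queues];  // may insert here
    if (!isValidEgressProfile(profile)) {
        return false;  // a new entry stays behind with an empty name
    }
    portQueue.running_profile_name = profile;
    return true;
}

// New pattern: validate first, and touch the map only on success.
// An existing entry is left untouched when validation fails.
bool setProfileNew(QueueLookup &lookup, const std::string &port,
                   const std::string &queues, const std::string &profile) {
    if (!isValidEgressProfile(profile)) {
        return false;
    }
    lookup[port][queues].running_profile_name = profile;
    return true;
}
```

With an empty lookup, calling setProfileOld with an invalid profile leaves an entry with an empty running_profile_name in the map, while setProfileNew leaves the map empty. In both versions, a previously stored valid profile survives a later invalid update.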

@stephenxs stephenxs changed the title Enhance orchagent and buffer manager in error handling (#2414) [202205] Enhance orchagent and buffer manager in error handling (Cherry-pick #2414) Sep 14, 2022
@liat-grozovik liat-grozovik changed the title [202205] Enhance orchagent and buffer manager in error handling (Cherry-pick #2414) [202205] Enhance orchagent and buffer manager in error handling (#2414) Sep 14, 2022
@stephenxs stephenxs marked this pull request as ready for review September 16, 2022 02:39
@liat-grozovik liat-grozovik merged commit 5d8636a into sonic-net:202205 Sep 19, 2022
dgsudharsan added a commit to dgsudharsan/sonic-buildimage that referenced this pull request Sep 21, 2022
Update sonic-swss submodule pointer to include the following:
* 8eea92e [202205][counters] Revert PR sonic-net#2432 for the buffer queue/pg counters improvement ([sonic-net#2462](sonic-net/sonic-swss#2462))
* 5d8636a [202205] Enhance orchagent and buffer manager in error handling (sonic-net#2414) ([sonic-net#2449](sonic-net/sonic-swss#2449))
* aa22237 [Everflow/ERSPAN] Set correct destination port and mac address when the nexthop is updated for ERSPAN mirror destination (sonic-net#2392) ([sonic-net#2455](sonic-net/sonic-swss#2455))
* 04ce7be check state_db for po before sending ARP/ND pkts (sonic-net#2444) ([sonic-net#2450](sonic-net/sonic-swss#2450))
* f0138a2 [portmgr] Fixed the orchagent crash due to late arrival of notif (sonic-net#2431) ([sonic-net#2451](sonic-net/sonic-swss#2451))
* 7cfde48 Change the log messages in addKernelNeigh/Route from ERROR to INFO ([sonic-net#2437](sonic-net/sonic-swss#2437))
* 2c5116e [202205][counters] Improve performance by polling only configured ports buffer queue/pg counters ([sonic-net#2432](sonic-net/sonic-swss#2432))

Signed-off-by: dgsudharsan <sudharsand@nvidia.com>
prsunny pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Sep 21, 2022
Update sonic-swss submodule pointer to include the following:
* 8eea92e [202205][counters] Revert PR #2432 for the buffer queue/pg counters improvement ([#2462](sonic-net/sonic-swss#2462))
* 5d8636a [202205] Enhance orchagent and buffer manager in error handling (#2414) ([#2449](sonic-net/sonic-swss#2449))
* aa22237 [Everflow/ERSPAN] Set correct destination port and mac address when the nexthop is updated for ERSPAN mirror destination (#2392) ([#2455](sonic-net/sonic-swss#2455))
* 04ce7be check state_db for po before sending ARP/ND pkts (#2444) ([#2450](sonic-net/sonic-swss#2450))
* f0138a2 [portmgr] Fixed the orchagent crash due to late arrival of notif (#2431) ([#2451](sonic-net/sonic-swss#2451))
* 7cfde48 Change the log messages in addKernelNeigh/Route from ERROR to INFO ([#2437](sonic-net/sonic-swss#2437))
* 2c5116e [202205][counters] Improve performance by polling only configured ports buffer queue/pg counters ([#2432](sonic-net/sonic-swss#2432))
@stephenxs stephenxs deleted the enhance-buffer-mgr-orch-202205 branch September 28, 2022 10:05