Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Dynamic buffer calc] Support dynamic buffer calculation #1338

Merged
merged 8 commits into from
Dec 15, 2020

Conversation

stephenxs
Copy link
Collaborator

@stephenxs stephenxs commented Jul 1, 2020

What I did

Support dynamic buffer calculation

  1. Extend the CLI options for buffermgrd:

    • -a: asic_table provided,
    • -p: peripheral_table provided

    The buffermgrd will start the dynamic headroom calculation mode with -a provided.
    Otherwise, it will start the legacy mode (pg_headroom_profile looking up)

  2. A new class is provided for dynamic buffer calculation while the old one remains.
    The daemon will instantiate the corresponding class according to the CLI option when it starts.

  3. In both modes, the buffermgrd will copy BUFFER_XXX tables from CONFIG_DB to APPL_DB and the bufferorch will consume BUFFER_XXX tables from APPL_DB

Backward compatibility
For legacy mode, the backward compatibility is provided. As mentioned above, buffermgrd will check whether the json file representing the ASIC_TABLE exists when it starts.

  • If yes it will start the dynamic buffer calculating mode
  • Otherwise, it will start the compatible mode which is the old looking up mode in the new code committed in this PR.
    This logic is in cfgmgr/buffermgrd.cpp.

The logic of buffer handling in buffermgrd isn't changed in the legacy mode. The differences are:

  • in legacy mode which is the old code, there isn't any buffer related table in APPL_DB. All tables are in CONFIG_DB.
    • buffermgrd listens to PORT and CABLE_LENGTH tables in CONFIG_DB and inserts the buffer profiles into BUFFER_PROFILE table.
    • bufferorch listens to buffer related tables in CONFIG_DB and call SAI API correspondingly.
  • In the compatible mode, buffermgrd listens to tables in CONFIG_DB and copies them into APPL_DB
    • buffermgrd
      • listens to PORT and CABLE_LENGTH tables in CONFIG_DB and inserts the buffer profiles into BUFFER_PROFILE table in CONFIG_DB (not changed)
      • listens to buffer related tables in CONFIG_DB and copies them into APPL_DB
    • bufferorch listens to APPL_DB and call SAI API correspondingly. (the difference is the db it listens to).
    • db_migrator is responsible to copy the buffer related tables from CONFIG_DB to APPL_DB when system is warmbooted from the old image to the new image for the first time.

The compatible code is in cfgmgr/buffermgr.cpp, orchagent/bufferorch.cpp and db_migrator (in the sonic-utilities PR).

Why I did it

How I verified it

  1. vs test
  2. regression test PR: [Dynamic buffer calc] Test cases for dynamic buffer calculation

Dynamic buffer details

  1. In the dynamic buffer calculation mode, there are 3 lua plugins are provided for vendor-specific operations:
    • buffer_headroom_.lua, for calculating headroom size.
    • buffer_pool_.lua, for calculating buffer pool size.
    • buffer_check_headroom_.lua, for checking whether headroom exceeds the limit
  2. During initialization, The daemon will:
    • load asic_table and peripheral_table from the given json file, parse them and push them into STATE_DB.ASIC_TABLE and STATE_DB.PERIPHERAL_TABLE respectively
    • load all plugins
    • try to load the STATE_DB.BUFFER_MAX_PARAM.mmu_size which is used for updating buffer pool size
    • a timer will be started for periodic buffer pool size audit
  3. The daemon will listen to and handle the following tables from CONFIG_DB
    The tables will be cached internally in the daemon for the purpose of saving access time
    • BUFFER_POOL:
      • if the size is provided: insert the entry to APPL_DB
      • otherwise: cache them and push to APPL_DB after the size is calculated by lua plugin
    • BUFFER_PROFILE and BUFFER_PG:
      • items for ingress lossless headroom need to be cached and handled (according to the design)
      • other items will be inserted to the APPL_DB directly
    • PORT_TABLE, for ports' speed and MTU update
    • CABLE_LENGTH, for ports' cable length
  4. Other tables will be copied to APPL_DB directly:
    • BUFFER_QUEUE
    • BUFFER_PORT_INGRESS_PROFILE_LIST
    • BUFFER_PORT_EGRESS_PROFILE_LIST
  5. BufferOrch modified accordingly:
    • Consume buffer relevant tables from APPL_DB instead of CONFIG_DB
    • For BUFFER_POOL, don't set ingress/egress and static/dynamic to sai if the pool has already existed because they are create-only
    • For BUFFER_PROFILE, don't set pool for the same reason
  6. Warm reboot:
    • db_migrator is responsible for copying the data from CONFIG_DB to APPL_DB if the switch is warm-rebooted from an old image to the new image for the first time
    • no specific handling in the daemon side
  7. Provide vstest script

@lguohan lguohan requested a review from neethajohn July 2, 2020 10:04
@stephenxs stephenxs changed the title [buffermgr/bufferorch] Support dynamic buffer calculation [Dynamic buffer calc] Support dynamic buffer calculation Aug 6, 2020
@stephenxs stephenxs marked this pull request as ready for review August 14, 2020 11:56
cfgmgr/buffermgr.cpp Outdated Show resolved Hide resolved
cfgmgr/buffermgr.cpp Show resolved Hide resolved
orchagent/bufferorch.cpp Outdated Show resolved Hide resolved
orchagent/bufferorch.cpp Outdated Show resolved Hide resolved
orchagent/bufferorch.cpp Outdated Show resolved Hide resolved
orchagent/bufferorch.cpp Outdated Show resolved Hide resolved
cfgmgr/buffermgr.cpp Show resolved Hide resolved
orchagent/bufferorch.cpp Outdated Show resolved Hide resolved
orchagent/bufferorch.cpp Show resolved Hide resolved
orchagent/bufferorch.h Outdated Show resolved Hide resolved
cfgmgr/buffermgr.cpp Show resolved Hide resolved
cfgmgr/buffermgrd.cpp Outdated Show resolved Hide resolved
cfgmgr/buffermgr.cpp Show resolved Hide resolved
cfgmgr/buffermgr.h Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Outdated Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.h Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
@keboliu
Copy link
Collaborator

keboliu commented Nov 4, 2020

retest this please

cfgmgr/buffermgrdyn.cpp Outdated Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Outdated Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
cfgmgr/buffermgrdyn.cpp Show resolved Hide resolved
tests/test_buffer.py Outdated Show resolved Hide resolved
kperumalbfn
kperumalbfn previously approved these changes Nov 19, 2020
tests/test_buffer.py Outdated Show resolved Hide resolved
tests/test_buffer.py Outdated Show resolved Hide resolved
tests/test_buffer.py Outdated Show resolved Hide resolved
tests/test_buffer.py Outdated Show resolved Hide resolved
tests/test_buffer.py Outdated Show resolved Hide resolved
tests/test_buffer.py Outdated Show resolved Hide resolved
@lgtm-com
Copy link

lgtm-com bot commented Nov 30, 2020

This pull request introduces 1 alert when merging bfc28a9 into a1d6300 - view on LGTM.com

new alerts:

  • 1 for Unused import

1. Extend the CLI options for buffermgrd:
   -a: asic_table provided,
   -p: peripheral_table provided
   The buffermgrd will start with dynamic headroom calculation mode With -a provided
   Otherwise it will start the legacy mode (pg_headroom_profile looking up)
2. A new class is provided for dynamic buffer calculation while the old one remains.
   The daemon will instantiate the corresponding class according to the CLI option when it starts.
3. In both mode, the buffermgrd will copy BUFFER_XXX tables from CONFIG_DB to APPL_DB
   and the bufferorch will consume BUFFER_XXX tables from APPL_DB
The following points are for dynamic buffer calculation mode
4. In the dynamic buffer calculation mode, there are 3 lua plugins are provided for vendor-specific operations:
   - buffer_headroom_<vendor>.lua, for calculationg headroom size.
   - buffer_pool_<vendor>.lua, for calculating buffer pool size.
   - buffer_check_headroom_<vendor>.lua, for checking whether headroom exceeds the limit
5. During initialization, The daemon will:
   - load asic_table and peripheral_table from the given json file, parse them
     and push them into STATE_DB.ASIC_TABLE and STATE_DB.PERIPHERAL_TABLE respectively
   - load all plugins
   - try to load the STATE_DB.BUFFER_MAX_PARAM.mmu_size which is used for updating buffer pool size
   - a timer will be started for periodic buffer pool size audit
6. The daemon will listen to and handle the following tables from CONFIG_DB
   The tables will be cached internally in the damon for the purpose of saving access time
   - BUFFER_POOL:
     - if size is provided: insert the entry to APPL_DB
     - otherwise: cache them and push to APPL_DB after the size is calculated by lua plugin
   - BUFFER_PROFILE and BUFFER_PG:
     - items for ingress lossless headroom need to be cached and handled (according to the design)
     - other items will be inserted to the APPL_DB directly
   - PORT_TABLE, for ports' speed and MTU update
   - CABLE_LENGTH, for ports' cable length
7. Other tables will be copied to APPL_DB directly:
   - BUFFER_QUEUE
   - BUFFER_PORT_INGRESS_PROFILE_LIST
   - BUFFER_PORT_EGRESS_PROFILE_LIST
   As the names of tables in APPL_DB differ from that in CONFIG_DB, all references should be adjusted accordingly
8. BufferOrch modified accordingly: Consume buffer relavent tables from APPL_DB instead of CONFIG_DB
9. Warm reboot:
   - db_migrator is responsible for copying the data from CONFIG_DB to APPL_DB if switch is warm-rebooted
   from an old image to the new image for the first time
   - no specific handling in the daemon side
10.Provide vstest script

Signed-off-by: Stephen Sun <stephens@nvidia.com>
- Very the profile in ASIC_DB when possible:
  in case of a new profile is created, we can get the OID of the profile
  by comparing SAI OID set before and after the creation
- Add function which can switch buffer model dynamically
- Add testcases for mtu update and non-default alpha

Signed-off-by: Stephen Sun <stephens@nvidia.com>
Signed-off-by: Stephen Sun <stephens@nvidia.com>
@neethajohn
Copy link
Contributor

@stephenxs , Dynamic buffer tests are failing. can you check?

@stephenxs
Copy link
Collaborator Author

  1. rebase and smash all the commits before the one @kperumalbfn approved (inclusive) into one in order to make it clear and resolve the conflict. No change to the code.
  2. Adjust test cases according to @neethajohn 's comments. (see the last two comments)

@stephenxs
Copy link
Collaborator Author

stephenxs commented Dec 3, 2020

@stephenxs , Dynamic buffer tests are failing. can you check?

Yes, @neethajohn , it’s expected because the sonic build image and sonic utilities haven’t been merged yet. We need the vsimage supporting dynamic buffer to pass the test.
All failures are caused by buffer model isn’t found in DEVICE_METADATA|local_host. This field is implemented in the sonic build image PR.
Meanwhile, the sonic build image PR also depends on this PR, which means we are in loop dependency there will always failures before all PRs merged.
I suggest merging in this order:

  1. sonic swss common (already merged)
  2. sonic swss PR
  3. sonic utilities PR
  4. sonic build image PR along with advancing sub module heads

…retry

In case the head element in m_toSync needs to retry, the doTask will move to the
following items and call process<TableName> for each of the items in the m_toSync.
However, as the process<TableName> always handle the first element in m_toSync,
it results in the m_toSync being blocked if the head needs retry.

Solution: pass the element to which doTask move to process<TableName> so that
it's able to handle the following element in m_toSync

Signed-off-by: Stephen Sun <stephens@nvidia.com>
neethajohn
neethajohn previously approved these changes Dec 8, 2020
@liat-grozovik
Copy link
Collaborator

@lguohan and @neethajohn can you please merge this PR so we can cont with the merge flow as suggested above? i dont have the ability to do so if vs is failing and in this case it should fail until we finish the merge flow.

kperumalbfn
kperumalbfn previously approved these changes Dec 8, 2020
@liat-grozovik
Copy link
Collaborator

retest vs please

1 similar comment
@stephenxs
Copy link
Collaborator Author

retest vs please

@stephenxs
Copy link
Collaborator Author

retest this please

@stephenxs
Copy link
Collaborator Author

The vs test failed only on the dynamic buffer test cases, which is expected. I think we are safe to merge it.
@neethajohn @lguohan Could you help with this?
Thanks.

qiluo-msft pushed a commit to sonic-net/sonic-buildimage that referenced this pull request Dec 13, 2020
**- Why I did it**
To support dynamic buffer calculation.
This PR also depends on the following PRs for sub modules
- [sonic-swss: [buffermgr/bufferorch] Support dynamic buffer calculation #1338](sonic-net/sonic-swss#1338)
- [sonic-swss-common: Dynamic buffer calculation #361](sonic-net/sonic-swss-common#361)
- [sonic-utilities: Support dynamic buffer calculation #973](sonic-net/sonic-utilities#973)

**- How I did it**
1. Introduce field `buffer_model` in `DEVICE_METADATA|localhost` to represent which buffer model is running in the system currently:
    - `dynamic` for the dynamic buffer calculation model
    - `traditional` for the traditional model in which the `pg_profile_lookup.ini` is used
2. Add the tables required for the feature:
   - ASIC_TABLE in platform/\<vendor\>/asic_table.j2
   - PERIPHERAL_TABLE in platform/\<vendor\>/peripheral_table.j2
   - PORT_PERIPHERAL_TABLE on a per-platform basis in device/\<vendor\>/\<platform\>/port_peripheral_config.j2 for each platform with gearbox installed.
   - DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN in files/build_templates/buffers_config.j2
   - Add lossless PGs (3-4) for each port in files/build_templates/buffers_config.j2
3. Copy the newly introduced j2 files into the image and rendering them when the system starts
4. Update the CLI options for buffermgrd so that it can start with dynamic mode
5. Fetches the ASIC vendor name in orchagent:
   - fetch the vendor name when creates the docker and pass it as a docker environment variable
   - `buffermgrd` can use this passed-in variable
6. Clear buffer related tables from STATE_DB when swss docker starts
7. Update the src/sonic-config-engine/tests/sample_output/buffers-dell6100.json according to the buffer_config.j2
8. Remove buffer pool sizes for ingress pools and egress_lossy_pool
   Update the buffer settings for dynamic buffer calculation
1. Initialize DEFAULT_LOSSLESS_BUFFER_PARAMETER and LOSSLESS_TRAFFIC_PATTERN when enable the dynamic buffer
   and remove them when disable it.
   Originally they were enabled by default regardless whether dynamic buffer model is enabled, which has been
   removed according to the latest review comments from Qi. So move the initialization here.
2. Remove all the dependency to buffer commands in order to make the vstest pass without sonic-utilities merged
3. Some enhancement: make the test more stable:
   - before test starting, waiting all dynamic generated buffer profiles removed
   - adjust the order in which related tables removed, to make sure no leftover after buffer model switched
   - as no dynamic profile existing before test starting, it's possible to check asic db for speed change testcase

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Copy link
Collaborator Author

retest this please

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@neethajohn
Copy link
Contributor

@stephenxs, can you look at the buffer test failures and fix them?

Signed-off-by: Stephen Sun <stephens@nvidia.com>
@stephenxs
Copy link
Collaborator Author

@stephenxs, can you look at the buffer test failures and fix them?

Hi @neethajohn done. All tests passed

@abdosi
Copy link
Contributor

abdosi commented Dec 16, 2020

@stepanblyschak this commit is causing sonic-util-build-pr failure. Can you please check ?

https://sonic-jenkins.westus2.cloudapp.azure.com/job/common/job/sonic-utilities-build-pr/3478/console

cc @daall @jleveque

@stephenxs
Copy link
Collaborator Author

@abdosi I'm checking it.

@daall
Copy link
Contributor

daall commented Dec 16, 2020

@abdosi @stephenxs it looks like it is fixed based on the latest few PRs that are still running. I think what happened is that those PRs that failed were based on swss build #2369, which didn't have the code changes in yet, and the later ones are using https://sonic-jenkins.westus2.cloudapp.azure.com/job/vs/job/sonic-swss-build/2370/ or later, which does have the code change.

I will look into updating the utilities test job to pull the bins and the tests from the same build rather than cloning the tests from the swss repo directly.

(example: https://sonic-jenkins.westus2.cloudapp.azure.com/job/common/job/sonic-utilities-build-pr/3481/console, which uses the swss binaries from 2371)

@stephenxs
Copy link
Collaborator Author

Thanks @daall . Just checked it. You are correct.

@stephenxs stephenxs deleted the dynamic-buffer-calculation branch December 17, 2020 01:03
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
Remove mst dump

Signed-off-by: Shlomi Bitton <shlomibi@nvidia.com>
EdenGri pushed a commit to EdenGri/sonic-swss that referenced this pull request Feb 28, 2022
…nic-net#1407)

This reverts commit b10622e.

**What I did**
revert changes to call sdkdump and replace with old call to mstdump

**How I did it**
reverting a previous commit [Mellanox] Add FW dump with new SAI implementation and remove mst dump sonic-net#1338

**How to verify it**
run techsupport
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants