Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial version of PINS P4Orch HLD #825

Merged
merged 1 commit into from
May 16, 2024
Merged

Conversation

mint570
Copy link
Collaborator

@mint570 mint570 commented Jul 29, 2021

High Level Design document for PINS P4Orch.

@mint570 mint570 marked this pull request as draft July 29, 2021 21:01
Copy link

@reshmaintel reshmaintel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

------- The P4Orch is a new orchagent which lives inside the OrchAgent process and picks up the entries to make the corresponding SAI API calls to program the switch hardware. ------

This may be old info ? P4OrchAgent will write the entries to ASIC DB. The demux does not happen here but after ASICDB in libsai?

@mint570
Copy link
Collaborator Author

mint570 commented Aug 23, 2021

------- The P4Orch is a new orchagent which lives inside the OrchAgent process and picks up the entries to make the corresponding SAI API calls to program the switch hardware. ------

This may be old info ? P4OrchAgent will write the entries to ASIC DB. The demux does not happen here but after ASICDB in libsai?

All orchs make SAI API calls, but they don't directly program the hardware. Sairedis is the library that orchagent uses to make the SAI call, and it will write to ASIC DB. It is done in the sairedis library.
Updated the doc to clarify that P4Orch writes into ASIC DB.

Not sure what "demux" means. Does it mean to separate P4 requests from existing SONiC requests? If so, there is no demux after ASIC DB. We define new table in APPL DB, but the SAI API and ASIC DB are unchanged.

@prsunny
Copy link
Contributor

prsunny commented Sep 8, 2021

For Copp Traps, you can refer this section to disable a trap via config_db entry. So its not required to edit/modify copp_cfg.json. Just have the entry with empty/null attributes.

For creating a hostif, can you refer this flow? We can have a discussion if needed.


## Restrictions/Limitations

The P4Orch is designed to meet the SDN requirements. And hence there are some differences from the other SONiC orchagents:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also specify how the re-programming happens if lets say there is an swss crash/restart and APP_DB is cleared during init. Call out if there are any limitations.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks.

Updated the "Restrictions/Limitations" section that P4Orch does not support warmboot and orchagent restart in the initial phase.

For my understanding, orchagent restart is similar to warmboot. In both case, the APPL DB tables won't be cleared. And orchagent will re-program the APPL DB tables in "init view" mode. This is not specific to P4Orch. It is a common feature for all orchs. Our main challenge for supporting warmboot now is that P4Orch doesn't do retry. During warmboot init, all table requests will come in a single batch. Since P4Orch doesn't do retry, dependency on SONiC table (such as port & vrf) can not be satisfied. We do not support warmboot in the initial phase.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

orchagent restart is not similar to warmboot. APP_DB shall be flushed during init. In regular cases, the configs are reprogrammed by managers. In case of p4rt, does this leave the system in such a state? What actions to reprogram the device?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about the late reply. (I didn't use my primary email to setup my github account. Need to do a better job in that.)

In our use case, we don't expect the system to recover after orchagent crashes. The system will go to "critical state". The controller will be notified and starts to drain and reboot the switch. Currently, we don't have special handling in recovering P4Orch after crashing.

I would like to understand more on how SONiC handles the crashing. This will help us improve this in the future. I have a few questions on orchagent restarts in the existing SONiC:

  1. You mentioned that the "configs are reprogrammed by managers". Are those the config managers that reads from CONFIG DB and writes to APPL DB? But for L3 forwarding, there is no CONFIG DB. Does anyone read the host routing and program them in APPL DB?
  2. When swss container restarts, will syncd container also restarts and wipes the ASIC? If not, how will syncd handle the "duplicate" L3 forwarding requests after orchanger restarts?

Thanks.

Copy link
Contributor

@prsunny prsunny Sep 24, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Yes and for BGP, it re-learns the route and program APP_DB. orchagent restart will also restart bgp
  2. Yes, syncd container restarts when orchagent restarts

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. That's very helpful.

From my understanding, the existing SONiC behavior does not meet our requirement. Especially in syncd restart, which will wipe the ASIC. It will cause packet drop even if we re-program the rules. In our usage, we do not restart syncd and orchagent. If any of them crash, the ASIC will still function (but no new rules can be programmed). The system will be in a "critical state". The controller will be notified and starts to drain and reboot the switch.

I updated the HLD a little bit to clarify that we don't support orchagent restart for P4RT table yet.

@mint570 mint570 marked this pull request as ready for review October 13, 2021 22:37
@mint570
Copy link
Collaborator Author

mint570 commented Oct 13, 2021

@prsunny @reshmaintel
PTAL
Thanks

@mint570
Copy link
Collaborator Author

mint570 commented Apr 8, 2022

@prsunny
Is this ready to merged? Once this HLD is merged, it will fill the P4Orch reference doc in the PIN HLD: https://github.com/Azure/SONiC/blob/master/doc/pins/pins_hld.md#p4-orchagent

This doc also have reference to other un-merged HLDs:
#840
#846
They should be ready to merged as well. Thanks.

@yxieca yxieca force-pushed the master branch 2 times, most recently from 8498931 to 8837dc2 Compare April 15, 2022 16:51
@zhangyanzhao
Copy link
Collaborator

@mint570 can you please sign the EasyCLA which is required to merge this PR? Thanks.

@mint570
Copy link
Collaborator Author

mint570 commented Feb 28, 2023

Done. Just need to re-run the check.

@prsunny prsunny merged commit ebf4d61 into sonic-net:master May 16, 2024
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants