Added schema for health_info, reboot_cause on chassisStateDB and added the link to pmon-test-plan #1709
base: master
module_type_switch based on the implementation PR review
@rameshraghupathy Can you add a section for DPU dark mode support? In this case,
the NPU's PMON should honor the user configuration to power OFF the DPU via the platform API.
Minor query, otherwise LGTM.
LGTM.
@rameshraghupathy Please update section 3.5 to describe how the console utility will be implemented.
Added section "2.1.1 DPUs in dark mode"
2073070 to c6b62cc
### Configuring startup and shutdown
* The DPUs can be powered down by configuring the admin_status as shown.
* The corresponding switch configDB table is also shown
#### 2.1.1 DPUs in dark mode
@rameshraghupathy Can you define what DARK mode is?
Also mention that DARK mode is enabled by default.
Done
@prgeor
* The user can use the “config chassis modules startup DPUx” to power ON a DPU. Example: “config chassis modules startup DPU0”
* The “config chassis modules shutdown DPUx” is used to power OFF a DPU. Example: “config chassis modules shutdown DPU0”
* The DPUs are powered down by configuring the admin_status as shown in the schema
* The config change event handler listens to the config change and sets the corresponding switch configDB table and also triggers the module set_admin_state() API
@rameshraghupathy Please specify where this event handler is running.
Done
@prgeor
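The handler behaviour described in the hunk above (a change to `admin_status` triggering the module `set_admin_state()` API) could be sketched roughly as follows. This is only an illustration: `FakeModule`, `handle_admin_status_change`, the dictionary of modules, and the boolean `up` argument to `set_admin_state()` are assumptions made for the example and are not taken from the PR.

```python
class FakeModule:
    """Hypothetical stand-in for a platform module object (e.g. one DPU)."""

    def __init__(self, name):
        self.name = name
        self.admin_up = None  # unknown until the handler sets it

    def set_admin_state(self, up):
        # Assumed signature: True powers the module on, False powers it off.
        self.admin_up = up
        return True


def handle_admin_status_change(modules, key, fields):
    """React to a config change for one module.

    modules: dict mapping module name (e.g. "DPU0") to a module object
    key:     module name taken from the changed table key
    fields:  changed fields, e.g. {"admin_status": "down"}
    Returns True if the admin state was applied, False otherwise.
    """
    module = modules.get(key)
    if module is None:
        return False  # unknown module, nothing to do
    status = fields.get("admin_status")
    if status not in ("up", "down"):
        return False  # ignore missing or malformed values
    return module.set_admin_state(status == "up")


modules = {"DPU0": FakeModule("DPU0")}
handle_admin_status_change(modules, "DPU0", {"admin_status": "down"})
```

In this sketch an unknown module name or an unexpected `admin_status` value is ignored rather than raised, on the assumption that the handler should keep running when it sees a malformed entry.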
@@ -128,9 +138,10 @@ Key: "CHASSIS_MODULE|DPU0"
#### DPU shutdown sequence
* There could be two possible sources for DPU shutdown. 1. A configuration change to DPU "admin_status: down" 2. The GNOI logic can trigger it.
* The GNOI server runs on the DPU even after the DPU is shutdown. |
@rameshraghupathy if the DPU is shut how can GNOI server run?
Meant to say pre-shutdown.
The GNOI server runs on the DPU even after the DPU is pre-shutdown and listens until the graceful shutdown finishes.
Fixed
@prgeor
}

```
#### DPU State
@rameshraghupathy Can you please specify that this update is done by Chassisd inside PMON? We don't need a DPU-specific agent to fetch these?
Updated:
Store the state progression (dpu_midplane_link_state, dpu_control_plane_state, dpu_data_plane_state) in the host ChassisStateDB using the push model specified in section 3.2.4 of the SONiC Chassis Platform Management & Monitoring HLD
@prgeor
@@ -676,26 +678,10 @@ fantray0 N/A fantray0.fan 55% intake Present OK 20230
fantray1 N/A fantray1.fan 56% intake Present OK 20230728 06:41:17
```

#### 3.4.1 Reboot Cause
#### 3.4.1 Reboot Cause CLIs
@rameshraghupathy Please specify which entity or service will update the chassisStateDB
The PMON on the DPU side will be responsible for updating the switch-side chassisStateDB on DPU boot up, using the push model specified in section 3.2.4 of the SONiC Chassis Platform Management & Monitoring HLD
@prgeor
@rameshraghupathy Understood, it's PMON. Which agent/daemon inside PMON?
@prgeor "Though how DPU pmon updates this is vendor dependent, it is recommended to use the sonic telemetry agent to align with the existing SONiC implementation."
Done
@rameshraghupathy In section 3.2 can you specify if the thermal management is in NPU or DPU?
@prgeor Updated. It runs on the NPU.
#### REBOOT_CAUSE DB schema
```
Key: "REBOOT_CAUSE|2023_06_18_14_56_12"
* Each DPU will update its reboot cause history in the Switch ChassisStateDB upon boot up. |
@rameshraghupathy How? Which daemon/service?
The dpu_db_util/system_health service will update the ChassisStateDB table.
* Though how DPU pmon updates this is vendor dependent, it is recommended to use the sonic telemetry agent to align with the existing SONiC implementation.
* The DPUs will limit the number of history entries to a maximum of ten. |
@rameshraghupathy Why do the DPU pmon updates need to be vendor dependent?
There is no guarantee that the SONiC running on the DPUs will necessarily be running Telemetry.