Wait till CHASSIS_APP_DB PING is successful and host_name and asic_name are valid in CONFIG_DB before starting chassis-db-cleanup #17962
@judyjoseph for viz
@judyjoseph @arlakshm @abdosi, please review this PR
We ran the complete oc with this fix and the error "Unable to connect to redis: Cannot assign requested address" is not seen. Also we didn't see the orchagent crash
/AzurePipelines run
You have several pipelines (over 10) configured to build pull requests in this repository. Specify which pipelines you would like to run by using /azp run [pipelines] command. You can specify multiple pipelines using a comma separated list.
8b72589 to 08c773a
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
Change looks good to me
@saksarav-nokia, I am trying to find an alternative solution here, because adding the hostname-config.service dependency will affect all platforms. I checked this script: can we add a check in it so that the /etc/hosts file is changed only if HOSTNAME changes? That should help our case, and we would not need to add the hostname-config.service dependency.
@judyjoseph, I think that will also fix the issue. I will test it out and update the PR.
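The check proposed above could look roughly like the following. This is only a sketch under stated assumptions, not the script discussed in the PR: the function name `update_hosts_if_changed`, its arguments, and the `127.0.0.1` mapping are hypothetical, chosen to show the idea of leaving the hosts file untouched when the hostname is unchanged.

```shell
# Hypothetical helper (not from the PR): rewrite a hosts file only when the
# hostname actually changed.
# Usage: update_hosts_if_changed <hosts_file> <current_name> <new_name>
update_hosts_if_changed() {
    local hosts_file="$1" current_name="$2" new_name="$3"

    # Hostname unchanged: leave the file in place so that concurrent readers
    # never observe it missing or half-written.
    if [ "$current_name" = "$new_name" ]; then
        return 0
    fi

    # Copy every line except those whose last field is the old hostname.
    # Lines are never dropped when the old name is "localhost", because other
    # applications rely on that entry.
    awk -v old="$current_name" 'old == "localhost" || $NF != old' \
        "$hosts_file" > "$hosts_file.new"

    # Append the new entry. The 127.0.0.1 mapping is an assumption.
    printf '127.0.0.1 %s\n' "$new_name" >> "$hosts_file.new"

    # Replace the original with a single rename.
    mv -f "$hosts_file.new" "$hosts_file"
}
```

With this shape, a restart of the hostname service during config reload would not touch the file at all unless the configured hostname differs from the current one.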
d36977c to 283d1ff
…re valid in CONIFG_DB before starting chassis-db-cleanup Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
@judyjoseph, I addressed your comments, verified the changes, and confirmed that the issue is not seen with the current changes. Please review.
/azp run Azure.sonic-buildimage
Azure Pipelines successfully started running 1 pipeline(s).
LGTM
MSFT ADO: 27704026
…re valid in CONFIG_DB before starting chassis-db-cleanup (sonic-net#17962) This PR fixes the issue reported in issue sonic-net#17945. We noticed that the chassis db cleanup is sometimes skipped when the CHASSIS_APP_DB PING fails. Also, if host_name and asic_name have not been written to CONFIG_DB, empty strings could be passed to the CHASSIS_APP_DB EVAL commands. hostname-config.service is restarted whenever config-reload or load-minigraph is done, and this service renames the file /etc/hosts in order to replace it with the new file. This interferes with swss@.service: when the swss.sh script accesses CHASSIS_APP_DB while the /etc/hosts file is renamed, the error "Unable to connect to redis: Cannot assign requested address" is seen and the CHASSIS_APP_DB EVAL command fails. As a result, the chassis db entries are not cleaned up, which causes an orchagent crash in remote LCs. --------- Signed-off-by: saksarav <sakthivadivu.saravanaraj@nokia.com>
Cherry-pick PR to 202305: #18756
@yxieca, who can review/approve this PR for 202311?
Why I did it
This PR fixes the issue reported in issue #17945.
We noticed that the chassis db cleanup is sometimes skipped when the CHASSIS_APP_DB PING fails. Also, if host_name and asic_name have not been written to CONFIG_DB, empty strings could be passed to the CHASSIS_APP_DB EVAL commands.
hostname-config.service is restarted whenever config-reload or load-minigraph is done, and this service renames the file /etc/hosts in order to replace it with the new file. This interferes with swss@.service: when the swss.sh script accesses CHASSIS_APP_DB while the /etc/hosts file is renamed, the error "Unable to connect to redis: Cannot assign requested address" is seen and the CHASSIS_APP_DB EVAL command fails. As a result, the chassis db entries are not cleaned up, which causes an orchagent crash in remote LCs.
How I did it
Wait till the CHASSIS_APP_DB PING is successful before checking for entries in the CHASSIS_APP_DB table. Also wait till host_name and asic_name are valid in CONFIG_DB.
Modified swss@.service to start after hostname-config.service
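The waiting logic described above can be sketched as a small retry helper. This is a minimal illustration rather than the code in swss.sh: the helper names `wait_until`, `chassis_app_db_ready`, and `config_field_valid` are hypothetical, the sketch assumes the database CLI exits non-zero when a PING fails, and it treats a CONFIG_DB field as valid once it is non-empty.

```shell
# Assumption: the database CLI is sonic-db-cli unless the caller overrides it.
SONIC_DB_CLI="${SONIC_DB_CLI:-sonic-db-cli}"

# Hypothetical helper: run a check until it succeeds or attempts run out.
# Usage: wait_until <max_attempts> <delay_seconds> <command> [args...]
wait_until() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while [ "$attempt" -le "$max_attempts" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 1
}

# Assumed check: the CLI exits with status 0 only when the PING succeeds.
chassis_app_db_ready() {
    "$SONIC_DB_CLI" CHASSIS_APP_DB PING >/dev/null 2>&1
}

# Hypothetical check: a CONFIG_DB field is considered valid once it is
# non-empty. The key and field names are supplied by the caller.
# Usage: config_field_valid <key> <field>
config_field_valid() {
    local value
    value=$("$SONIC_DB_CLI" CONFIG_DB HGET "$1" "$2" 2>/dev/null) || return 1
    [ -n "$value" ]
}
```

A caller might run `wait_until 60 1 chassis_app_db_ready` and then `wait_until 60 1 config_field_valid 'DEVICE_METADATA|localhost' asic_name` before starting the cleanup; the key name, attempt count, and delay shown here are assumptions. Bounding the number of attempts keeps the service from blocking forever if the database never comes up.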
How to verify it
Ran a script that performed config reload and load-minigraph 200 times and verified that the chassis db cleanup was done every time and that the orchagent crash was not seen.