Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dockers] Update critical_processes file syntax #4831

Merged
merged 6 commits into from
Jun 26, 2020
Merged

[dockers] Update critical_processes file syntax #4831

merged 6 commits into from
Jun 26, 2020

Conversation

yozhao101
Copy link
Contributor

Signed-off-by: Yong Zhao yozhao@microsoft.com

- Why I did it
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
isc-dhcp-relay. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

- How I did it

- How to verify it
We can first enable the autorestart feature of a specified container for example dhcp_relay by running the comman sudo config container feature autorestart dhcp_relay enabled on DUT. Then we can select a critical process from the command docker top dhcp_relay and use the command sudo kill -SIGKILL <pid> to kill that critical process. Final step is to check whether the container is restarted correctly or not.

- Description for the changelog

… entries. One

kind of entry is "program:xxx" which indicates a critical process.
Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@jleveque jleveque changed the title [supervisorctl] Update the syntax of critical_processes file [dockers] Update critical_processes file syntax Jun 23, 2020
critical_processes file.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
files/scripts/supervisor-proc-exit-listener Outdated Show resolved Hide resolved
files/scripts/supervisor-proc-exit-listener Outdated Show resolved Hide resolved
files/scripts/supervisor-proc-exit-listener Outdated Show resolved Hide resolved
files/scripts/supervisor-proc-exit-listener Outdated Show resolved Hide resolved
to handle the cases such as the process name is "group" or "program".

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
Copy link
Contributor

@jleveque jleveque left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please fix conflicts. Looks like your repo was out-of-date. I recently renamed "docker-lldp-sv2" to "docker-lldp" and "docker-snmp-sv2" to "docker-snmp".

@yozhao101
Copy link
Contributor Author

Please fix conflicts. Looks like your repo was out-of-date. I recently renamed "docker-lldp-sv2" to "docker-lldp" and "docker-snmp-sv2" to "docker-snmp".

Fixed the conflicts.

jleveque
jleveque previously approved these changes Jun 23, 2020
@jleveque
Copy link
Contributor

@yozhao101: This will not cherry-pick cleanly to the 201911 branch due to the directory name changes and the absence of new containers, so you will also need to open a separate PR against that branch.

@yozhao101
Copy link
Contributor Author

@yozhao101: This will not cherry-pick cleanly to the 201911 branch due to the directory name changes and the absence of new containers, so you will also need to open a separate PR against that branch.

Yes, I will open a new PR against 201911 branch.

Signed-off-by: Yong Zhao <yozhao@microsoft.com>
@lguohan
Copy link
Collaborator

lguohan commented Jun 25, 2020

retest broadcom please

@jleveque
Copy link
Contributor

Retest broadcom please

@jleveque
Copy link
Contributor

Retest broadcom please

@jleveque jleveque merged commit 4fa81b4 into sonic-net:master Jun 26, 2020
jleveque pushed a commit that referenced this pull request Jun 26, 2020
pjaipakdee19 pushed a commit to pjaipakdee19/sonic-buildimage that referenced this pull request Jul 7, 2020
**- Why I did it**
Initially, the critical_processes file contains either the name of critical process or the name of group.
For example, the critical_processes file in the dhcp_relay container contains a single group name
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need get all the critical
processes and test whether a  container can be restarted correctly if one of its critical processes is
killed. However, it will be difficult to differentiate whether the names in the critical_processes file are
the critical processes or group names. At the same time, changing the syntax in this file will separate the individual process from the groups and also makes it clear to the user.

Right now the critical_processes file contains two different kind of entries. One is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes
managed by supervisord using the name "xxx". At the same time, I also updated the logic to
parse the file critical_processes in supervisor-proc-event-listener script.

**- How to verify it**
We can first enable the autorestart feature of a specified container for example `dhcp_relay` by running the comman `sudo config container feature autorestart dhcp_relay enabled` on DUT. Then we can select a critical process from the command `docker top dhcp_relay` and use the command `sudo kill -SIGKILL <pid>` to kill that critical process. Final step is to check whether the container is restarted correctly or not.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants