[dockers] Update critical_processes file syntax #4831

yozhao101 · 2020-06-23T06:56:13Z

Signed-off-by: Yong Zhao yozhao@microsoft.com

- Why I did it
Initially, the critical_processes file contained either the name of a critical process or the name of a group.
For example, the critical_processes file in the dhcp_relay container contains a single group name,
isc-dhcp-relay. When testing the autorestart feature of each container, we need to get all the critical
processes and test whether the container is restarted correctly if one of them is killed.
With the old format, however, it is difficult to tell whether a name in the critical_processes file is
a critical process or a group. Changing the syntax of this file separates individual processes from groups and makes the distinction clear to the user.

The critical_processes file now contains two kinds of entries. One is "program:xxx", which indicates a critical process. The other is "group:xxx", which indicates a group of critical processes
managed by supervisord under the name "xxx". I also updated the logic that parses
the critical_processes file in the supervisor-proc-exit-listener script.
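
As a rough illustration of the new syntax, the sketch below splits entries into groups and programs. It is not the code from supervisor-proc-exit-listener: the function name `parse_critical_processes`, the error handling, and the process name `some_daemon` are assumptions made for the example. Only the "program:xxx" and "group:xxx" entry formats come from this PR.

```python
def parse_critical_processes(lines):
    """Split critical_processes entries into (groups, programs).

    Hypothetical helper: each non-empty line is expected to look like
    "program:<name>" or "group:<name>".
    """
    groups = []
    programs = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first ':' only, so a process that is itself
        # named "group" or "program" is still classified by its prefix.
        kind, sep, name = line.partition(':')
        kind = kind.strip()
        name = name.strip()
        if not sep or not name or kind not in ('group', 'program'):
            raise ValueError("Invalid critical_processes entry: {!r}".format(line))
        if kind == 'group':
            groups.append(name)
        else:
            programs.append(name)
    return groups, programs


# Example input; "some_daemon" is a made-up process name.
groups, programs = parse_critical_processes([
    "program:some_daemon",
    "group:isc-dhcp-relay",
    "program:group",
])
print(groups)    # ['isc-dhcp-relay']
print(programs)  # ['some_daemon', 'group']
```

An entry in the old format, such as a bare `isc-dhcp-relay`, raises `ValueError` in this sketch because it has no prefix to tell a process from a group.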

- How I did it

- How to verify it
First, enable the autorestart feature of a container, for example dhcp_relay, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then pick a critical process from the output of `docker top dhcp_relay` and kill it with `sudo kill -SIGKILL <pid>`. Finally, check whether the container is restarted correctly.
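
The same steps can be written down as a small helper. This is a sketch only: the function name `autorestart_check_commands` is hypothetical, and it just builds the three command strings from the description above without running anything, so the PID still has to be read from the `docker top` output by hand.

```python
import shlex


def autorestart_check_commands(container, pid):
    """Return the verification commands, in order, as shell strings.

    Hypothetical helper: it only formats the commands quoted in the
    PR description and does not execute them.
    """
    name = shlex.quote(container)
    return [
        # 1. Enable autorestart for the container.
        "sudo config container feature autorestart {} enabled".format(name),
        # 2. List the container's processes to pick a critical one.
        "docker top {}".format(name),
        # 3. Kill the chosen critical process.
        "sudo kill -SIGKILL {}".format(int(pid)),
    ]


for command in autorestart_check_commands("dhcp_relay", 1234):
    print(command)
```

After running the third command on the DUT, the remaining check is manual: confirm that the container was restarted.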

- Description for the changelog

… entries. One kind of entry is "program:xxx" which indicates a critical process. Another is "group:xxx" which indicates a group of critical processes managed by supervisord using the name "xxx". I also updated the logic to parse the file critical_processes in supervisor-proc-event-listener script. Signed-off-by: Yong Zhao <yozhao@microsoft.com>

files/scripts/supervisor-proc-exit-listener

jleveque

Please fix conflicts. Looks like your repo was out-of-date. I recently renamed "docker-lldp-sv2" to "docker-lldp" and "docker-snmp-sv2" to "docker-snmp".

yozhao101 · 2020-06-23T22:00:42Z

> Please fix conflicts. Looks like your repo was out-of-date. I recently renamed "docker-lldp-sv2" to "docker-lldp" and "docker-snmp-sv2" to "docker-snmp".

Fixed the conflicts.

jleveque · 2020-06-23T22:08:34Z

@yozhao101: This will not cherry-pick cleanly to the 201911 branch due to the directory name changes and the absence of new containers, so you will also need to open a separate PR against that branch.

yozhao101 · 2020-06-23T22:10:26Z

> @yozhao101: This will not cherry-pick cleanly to the 201911 branch due to the directory name changes and the absence of new containers, so you will also need to open a separate PR against that branch.

Yes, I will open a new PR against the 201911 branch.

lguohan · 2020-06-25T10:59:05Z

retest broadcom please

jleveque · 2020-06-25T17:50:54Z

Retest broadcom please

jleveque · 2020-06-25T22:19:38Z

Retest broadcom please

Backport of #4831 to the 201911 branch

**- Why I did it**
Initially, the critical_processes file contained either the name of a critical process or the name of a group. For example, the critical_processes file in the dhcp_relay container contains a single group name, `isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical processes and test whether the container is restarted correctly if one of them is killed. With the old format, however, it is difficult to tell whether a name in the critical_processes file is a critical process or a group. Changing the syntax of this file separates individual processes from groups and makes the distinction clear to the user.

The critical_processes file now contains two kinds of entries. One is "program:xxx", which indicates a critical process. The other is "group:xxx", which indicates a group of critical processes managed by supervisord under the name "xxx". I also updated the logic that parses the critical_processes file in the supervisor-proc-exit-listener script.

**- How to verify it**
First, enable the autorestart feature of a container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Then pick a critical process from the output of `docker top dhcp_relay` and kill it with `sudo kill -SIGKILL <pid>`. Finally, check whether the container is restarted correctly.

yozhao101 added 2 commits June 22, 2020 23:02

[supervisorctl] Change the "group name" to "group names".

0c88fc6

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

yozhao101 requested review from lguohan, jleveque, qiluo-msft and yxieca June 23, 2020 06:56

jleveque changed the title ~~[supervisorctl] Update the syntax of critical_processes file~~ [dockers] Update critical_processes file syntax Jun 23, 2020

[supervisorctl] Update the comment about the content of

ce883fb

critical_processes file. Signed-off-by: Yong Zhao <yozhao@microsoft.com>

jleveque suggested changes Jun 23, 2020

[docker] reorganize the function get_critical_group_and_process_list()

385c751

to handle the cases such as the process name is "group" or "program". Signed-off-by: Yong Zhao <yozhao@microsoft.com>

jleveque suggested changes Jun 23, 2020

jleveque added the Enhancement ➕ label Jun 23, 2020

Merge branch 'master' into update_format_critical_processes

4c48916

jleveque previously approved these changes Jun 23, 2020

[dockers] Use '' instead of "" in split() function to keep consistent.

e7a2b41

Signed-off-by: Yong Zhao <yozhao@microsoft.com>

yozhao101 dismissed jleveque’s stale review via e7a2b41 June 23, 2020 23:08

jleveque approved these changes Jun 23, 2020

lguohan approved these changes Jun 25, 2020

yxieca approved these changes Jun 25, 2020

yozhao101 mentioned this pull request Jun 26, 2020

[201911][dockers] Update critical_processes file syntax #4854

Merged

jleveque merged commit 4fa81b4 into sonic-net:master Jun 26, 2020

jleveque pushed a commit that referenced this pull request Jun 26, 2020

[201911][dockers] Update critical_processes file syntax (#4854)

c2364cf

Backport of #4831 to the 201911 branch
