[dockers] Update critical_processes file syntax (#4831)
**- Why I did it**
Initially, each line of the critical_processes file contained either the name of a critical process or the name of a process group.
For example, the critical_processes file in the dhcp_relay container contains a single group name,
`isc-dhcp-relay`. When testing the autorestart feature of each container, we need to get all the critical
processes and test whether the container is restarted correctly if one of its critical processes is
killed. With the old syntax, however, it is difficult to tell whether a name in the critical_processes file is
a critical process or a group. Changing the syntax of this file separates individual processes from groups and makes the distinction clear to the user.

The critical_processes file now contains two kinds of entries. One is "program:xxx", which indicates a single critical process. The other is "group:xxx", which indicates a group of critical processes
managed by supervisord under the name "xxx". I also updated the logic that
parses the critical_processes file in the supervisor-proc-exit-listener script.
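
The two entry kinds can be illustrated with a minimal Python sketch. It follows the same split-on-colon rule as the listener script in this commit, but the function name `parse_critical_processes` is hypothetical, it takes an iterable of lines instead of opening `/etc/supervisor/critical_processes`, and it raises `ValueError` where the real script logs to syslog and exits.

```python
def parse_critical_processes(lines):
    """Return (groups, programs) parsed from critical_processes lines.

    Hypothetical helper for illustration; not part of the commit.
    """
    groups, programs = [], []
    for raw in lines:
        # Same rule as the listener script: exactly one ':' per line.
        fields = raw.strip(' \n').split(':')
        if len(fields) != 2:
            raise ValueError("bad critical_processes line: {!r}".format(raw))

        kind, name = fields[0].strip(), fields[1].strip()
        if kind == 'group' and name:
            groups.append(name)
        elif kind == 'program' and name:
            programs.append(name)
        else:
            # Unknown prefix or empty name.
            raise ValueError("bad critical_processes line: {!r}".format(raw))
    return groups, programs


example = ["group:isc-dhcp-relay\n", "program:redis\n"]
print(parse_critical_processes(example))
# prints (['isc-dhcp-relay'], ['redis'])
```

Keeping groups and programs in separate lists is what lets a test enumerate the individual critical processes without guessing whether a name refers to a supervisord group.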

**- How to verify it**
First, enable the autorestart feature for a container, for example `dhcp_relay`, by running the command `sudo config container feature autorestart dhcp_relay enabled` on the DUT. Next, pick a critical process from the output of `docker top dhcp_relay` and kill it with `sudo kill -SIGKILL <pid>`. Finally, check whether the container is restarted correctly.
yozhao101 authored Jun 26, 2020
1 parent 921d132 commit 4fa81b4
Showing 26 changed files with 84 additions and 58 deletions.
2 changes: 1 addition & 1 deletion dockers/docker-database/critical_processes
@@ -1 +1 @@
-redis
+program:redis
2 changes: 1 addition & 1 deletion dockers/docker-dhcp-relay/critical_processes
@@ -1 +1 @@
-isc-dhcp-relay
+group:isc-dhcp-relay
10 changes: 5 additions & 5 deletions dockers/docker-fpm-frr/critical_processes
@@ -1,5 +1,5 @@
-zebra
-staticd
-bgpd
-fpmsyncd
-bgpcfgd
+program:zebra
+program:staticd
+program:bgpd
+program:fpmsyncd
+program:bgpcfgd
4 changes: 2 additions & 2 deletions dockers/docker-fpm-gobgp/critical_processes
@@ -1,2 +1,2 @@
-gobgpd
-fpmsyncd
+program:gobgpd
+program:fpmsyncd
8 changes: 4 additions & 4 deletions dockers/docker-fpm-quagga/critical_processes
@@ -1,4 +1,4 @@
-zebra
-bgpd
-fpmsyncd
-bgpcfgd
+program:zebra
+program:bgpd
+program:fpmsyncd
+program:bgpcfgd
6 changes: 3 additions & 3 deletions dockers/docker-lldp/critical_processes
@@ -1,3 +1,3 @@
-lldpd
-lldp-syncd
-lldpmgrd
+program:lldpd
+program:lldp_syncd
+program:lldpmgrd
4 changes: 2 additions & 2 deletions dockers/docker-nat/critical_processes
@@ -1,2 +1,2 @@
-natmgrd
-natsyncd
+program:natmgrd
+program:natsyncd
20 changes: 10 additions & 10 deletions dockers/docker-orchagent/critical_processes
@@ -1,10 +1,10 @@
-orchagent
-portsyncd
-neighsyncd
-vlanmgrd
-intfmgrd
-portmgrd
-buffermgrd
-vrfmgrd
-nbrmgrd
-vxlanmgrd
+program:orchagent
+program:portsyncd
+program:neighsyncd
+program:vlanmgrd
+program:intfmgrd
+program:portmgrd
+program:buffermgrd
+program:vrfmgrd
+program:nbrmgrd
+program:vxlanmgrd
6 changes: 3 additions & 3 deletions dockers/docker-platform-monitor/critical_processes
@@ -1,3 +1,3 @@
-ledd
-xcvrd
-psud
+program:ledd
+program:xcvrd
+program:psud
2 changes: 1 addition & 1 deletion dockers/docker-router-advertiser/critical_processes
@@ -1 +1 @@
-radvd
+program:radvd
2 changes: 1 addition & 1 deletion dockers/docker-sflow/critical_processes
@@ -1 +1 @@
-sflowmgrd
+program:sflowmgrd
4 changes: 2 additions & 2 deletions dockers/docker-snmp/critical_processes
@@ -1,2 +1,2 @@
-snmpd
-snmp-subagent
+program:snmpd
+program:snmp-subagent
2 changes: 1 addition & 1 deletion dockers/docker-sonic-restapi/critical_processes
@@ -1 +1 @@
-restapi
+program:restapi
4 changes: 2 additions & 2 deletions dockers/docker-sonic-telemetry/critical_processes
@@ -1,2 +1,2 @@
-telemetry
-dialout
+program:telemetry
+program:dialout
4 changes: 2 additions & 2 deletions dockers/docker-teamd/critical_processes
@@ -1,2 +1,2 @@
-teammgrd
-teamsyncd
+program:teammgrd
+program:teamsyncd
38 changes: 32 additions & 6 deletions files/scripts/supervisor-proc-exit-listener
@@ -10,14 +10,42 @@ import swsssdk

 from supervisor import childutils

-# Contents of file should be the names of critical processes (as defined in
-# supervisor.conf file), one per line
+# Each line of this file should specify either one critical process or one
+# critical process group, (as defined in supervisord.conf file), in the
+# following format:
+#
+# program:<process_name>
+# group:<group_name>
 CRITICAL_PROCESSES_FILE = '/etc/supervisor/critical_processes'

 # This table in databse contains the features for container and each
 # feature for a row will be configured a state or number.
 CONTAINER_FEATURE_TABLE_NAME = 'CONTAINER_FEATURE'

+# Read the critical processes/group names from CRITICAL_PROCESSES_FILE
+def get_critical_group_and_process_list():
+    critical_group_list = []
+    critical_process_list = []
+
+    with open(CRITICAL_PROCESSES_FILE, 'r') as file:
+        for line in file:
+            line_info = line.strip(' \n').split(':')
+            if len(line_info) != 2:
+                syslog.syslog(syslog.LOG_ERR, "Syntax of the line {} in critical_processes file is incorrect. Exiting...".format(line))
+                sys.exit(5)
+
+            identifier_key = line_info[0].strip()
+            identifier_value = line_info[1].strip()
+            if identifier_key == "group" and identifier_value:
+                critical_group_list.append(identifier_value)
+            elif identifier_key == "program" and identifier_value:
+                critical_process_list.append(identifier_value)
+            else:
+                syslog.syslog(syslog.LOG_ERR, "Syntax of the line {} in critical_processes file is incorrect. Exiting...".format(line))
+                sys.exit(6)
+
+    return critical_group_list, critical_process_list
+
 def main(argv):
     container_name = None
     opts, args = getopt.getopt(argv, "c:", ["container-name="])
@@ -29,9 +57,7 @@ def main(argv):
         syslog.syslog(syslog.LOG_ERR, "Container name not specified. Exiting...")
         sys.exit(1)

-    # Read the list of critical processes from a file
-    with open(CRITICAL_PROCESSES_FILE, 'r') as f:
-        critical_processes = [line.rstrip('\n') for line in f]
+    critical_group_list, critical_process_list = get_critical_group_and_process_list()

     while True:
         # Transition from ACKNOWLEDGED to READY
@@ -73,7 +99,7 @@ def main(argv):
             # If container is database or auto-restart feature is enabled and at the same time
             # a critical process exited unexpectedly, terminate supervisor
             if ((container_name == 'database' or restart_feature == 'enabled') and expected == 0 and
-                    (processname in critical_processes or groupname in critical_processes)):
+                    (processname in critical_process_list or groupname in critical_group_list)):
                 MSG_FORMAT_STR = "Process {} exited unxepectedly. Terminating supervisor..."
                 msg = MSG_FORMAT_STR.format(payload_headers['processname'])
                 syslog.syslog(syslog.LOG_INFO, msg)
2 changes: 1 addition & 1 deletion platform/barefoot/docker-syncd-bfn/critical_processes
@@ -1 +1 @@
-syncd
+program:syncd
4 changes: 2 additions & 2 deletions platform/broadcom/docker-syncd-brcm/critical_processes
@@ -1,2 +1,2 @@
-dsserve
-syncd
+program:dsserve
+program:syncd
2 changes: 1 addition & 1 deletion platform/cavium/docker-syncd-cavm/critical_processes
@@ -1 +1 @@
-syncd
+program:syncd
2 changes: 1 addition & 1 deletion platform/centec/docker-syncd-centec/critical_processes
@@ -1 +1 @@
-syncd
+program:syncd
@@ -1 +1 @@
-syncd
+program:syncd
@@ -1 +1 @@
-syncd
+program:syncd
2 changes: 1 addition & 1 deletion platform/marvell/docker-syncd-mrvl/critical_processes
@@ -1 +1 @@
-syncd
+program:syncd
2 changes: 1 addition & 1 deletion platform/mellanox/docker-syncd-mlnx/critical_processes
@@ -1 +1 @@
-syncd
+program:syncd
4 changes: 2 additions & 2 deletions platform/nephos/docker-syncd-nephos/critical_processes
@@ -1,2 +1,2 @@
-dsserve
-syncd
+program:dsserve
+program:syncd
2 changes: 1 addition & 1 deletion platform/vs/docker-syncd-vs/critical_processes
@@ -1 +1 @@
-syncd
+program:syncd
