Skip to content

Commit

Permalink
[docker-syncd]: Restart SwSS, syncd and dependent services if a criti…
Browse files Browse the repository at this point in the history
…cal process in syncd container exits unexpectedly (sonic-net#3534)

Add the same mechanism I developed for the SwSS service in sonic-net#2845 to the syncd service. However, in order to cause the SwSS service to also exit and restart in this situation, I developed a docker-wait-any program which the SwSS service uses to wait for either the swss or syncd containers to exit.
  • Loading branch information
jleveque authored and zhenggen-xu committed Jan 9, 2020
1 parent 1534785 commit ccb607a
Show file tree
Hide file tree
Showing 38 changed files with 164 additions and 8 deletions.
3 changes: 3 additions & 0 deletions files/build_templates/sonic_debian_extension.j2
Original file line number Diff line number Diff line change
Expand Up @@ -218,6 +218,9 @@ sudo cp $IMAGE_CONFIGS/hostname/hostname-config.service $FILESYSTEM_ROOT/etc/sy
echo "hostname-config.service" | sudo tee -a $GENERATED_SERVICE_FILE
sudo cp $IMAGE_CONFIGS/hostname/hostname-config.sh $FILESYSTEM_ROOT/usr/bin/

# Copy miscellaneous scripts
sudo cp $IMAGE_CONFIGS/misc/docker-wait-any $FILESYSTEM_ROOT/usr/bin/

# Copy updategraph script and service file
j2 files/build_templates/updategraph.service.j2 | sudo tee $FILESYSTEM_ROOT/etc/systemd/system/updategraph.service
sudo cp $IMAGE_CONFIGS/updategraph/updategraph $FILESYSTEM_ROOT/usr/bin/
Expand Down
62 changes: 62 additions & 0 deletions files/image_config/misc/docker-wait-any
Original file line number Diff line number Diff line change
@@ -0,0 +1,62 @@
#!/usr/bin/env python

"""
docker-wait-any
This script takes one or more Docker container names as arguments,
and it will block indefinitely while all of the specified containers
are running. If any of the specified containers stop, the script will
exit.
This script was created because the 'docker wait' command is lacking
this functionality. It will block until ALL specified containers have
stopped running. Here, we spawn multiple threads and wait on one
container per thread. If any of the threads exit, the entire
application will exit.
NOTE: This script is written against docker-py version 1.6.0. Newer
versions of docker-py have a different API.
"""

import sys
import threading
from docker import Client

# Instantiate a global event to share among our threads
g_thread_exit_event = threading.Event()


def usage():
print("Usage: {} <container_name> [<container_name> ...]".format(sys.argv[0]))
sys.exit(1)


def wait_for_container(docker_client, container_name):
docker_client.wait(container_name)

print("No longer waiting on container '{}'".format(container_name))

# Signal the main thread to exit
g_thread_exit_event.set()


def main():
thread_list = []

docker_client = Client(base_url='unix://var/run/docker.sock')

# Ensure we were passed at least one argument
if len(sys.argv) < 2:
usage()

container_names = sys.argv[1:]

for container_name in container_names:
t = threading.Thread(target=wait_for_container, args=[docker_client, container_name])
t.daemon = True
t.start()
thread_list.append(t)

# Wait until we receive an event signifying one of the containers has stopped
g_thread_exit_event.wait()
sys.exit(0)

if __name__ == '__main__':
main()
17 changes: 16 additions & 1 deletion files/scripts/swss.sh
Original file line number Diff line number Diff line change
Expand Up @@ -131,7 +131,22 @@ start() {

wait() {
start_peer_and_dependent_services
/usr/bin/${SERVICE}.sh wait

# Allow some time for peer container to start
# NOTE: This assumes Docker containers share the same names as their
# corresponding services
for SECS in {1..60}; do
RUNNING=$(docker inspect -f '{{.State.Running}}' ${PEER})
if [[ x"$RUNNING" == x"true" ]]; then
break
else
sleep 1
fi
done

# NOTE: This assumes Docker containers share the same names as their
# corresponding services
/usr/bin/docker-wait-any ${SERVICE} ${PEER}
}

stop() {
Expand Down
1 change: 1 addition & 0 deletions platform/barefoot/docker-syncd-bfn-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_BFN_RPC = docker-syncd-bfn-rpc.gz
$(DOCKER_SYNCD_BFN_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-bfn-rpc
$(DOCKER_SYNCD_BFN_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT)
$(DOCKER_SYNCD_BFN_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_BFN_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/barefoot/docker-syncd-bfn/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,8 @@ debs/{{ deb }}{{' '}}

COPY ["start.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
1 change: 1 addition & 0 deletions platform/barefoot/docker-syncd-bfn/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
2 changes: 1 addition & 1 deletion platform/broadcom/docker-syncd-brcm-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,7 @@ $(DOCKER_SYNCD_BRCM_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSAIMETADATA_DBG) \
$(LIBSAIREDIS_DBG)
endif
$(DOCKER_SYNCD_BRCM_RPC)_FILES += $(DSSERVE) $(BCMCMD)
$(DOCKER_SYNCD_BRCM_RPC)_FILES += $(DSSERVE) $(BCMCMD) $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
$(DOCKER_SYNCD_BRCM_RPC)_LOAD_DOCKERS += $(DOCKER_SYNCD_BASE)
SONIC_DOCKER_IMAGES += $(DOCKER_SYNCD_BRCM_RPC)
SONIC_STRETCH_DOCKERS += $(DOCKER_SYNCD_BRCM_RPC)
Expand Down
2 changes: 2 additions & 0 deletions platform/broadcom/docker-syncd-brcm/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,8 @@ COPY ["files/dsserve", "files/bcmcmd", "start.sh", "bcmsh", "/usr/bin/"]
RUN chmod +x /usr/bin/dsserve /usr/bin/bcmcmd

COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
2 changes: 2 additions & 0 deletions platform/broadcom/docker-syncd-brcm/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
dsserve
syncd
8 changes: 7 additions & 1 deletion platform/broadcom/docker-syncd-brcm/supervisord.conf
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand All @@ -15,7 +21,7 @@ stderr_logfile=syslog
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false
autorestart=unexpected
stdout_logfile=syslog
stderr_logfile=syslog

Expand Down
1 change: 1 addition & 0 deletions platform/cavium/docker-syncd-cavm-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_CAVM_RPC = docker-syncd-cavm-rpc.gz
$(DOCKER_SYNCD_CAVM_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-cavm-rpc
$(DOCKER_SYNCD_CAVM_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT) $(CAVM_LIBSAI) $(XP_TOOLS) $(REDIS_TOOLS)
$(DOCKER_SYNCD_CAVM_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_CAVM_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
1 change: 1 addition & 0 deletions platform/cavium/docker-syncd-cavm.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_CAVM = docker-syncd-cavm.gz
$(DOCKER_SYNCD_CAVM)_PATH = $(PLATFORM_PATH)/docker-syncd-cavm
$(DOCKER_SYNCD_CAVM)_DEPENDS += $(SYNCD) $(CAVM_LIBSAI) $(XP_TOOLS) $(REDIS_TOOLS)
$(DOCKER_SYNCD_CAVM)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_CAVM)_DEPENDS += $(SYNCD_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/cavium/docker-syncd-cavm/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ debs/{{ deb }}{{' '}}

COPY ["start.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

COPY ["profile.ini", "/etc/ssw/AS7512/"]

Expand Down
1 change: 1 addition & 0 deletions platform/cavium/docker-syncd-cavm/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
8 changes: 7 additions & 1 deletion platform/cavium/docker-syncd-cavm/supervisord.conf
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand All @@ -15,7 +21,7 @@ stderr_logfile=syslog
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false
autorestart=unexpected
stdout_logfile=syslog
stderr_logfile=syslog

Expand Down
1 change: 1 addition & 0 deletions platform/centec/docker-syncd-centec-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_CENTEC_RPC = docker-syncd-centec-rpc.gz
$(DOCKER_SYNCD_CENTEC_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-centec-rpc
$(DOCKER_SYNCD_CENTEC_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT)
$(DOCKER_SYNCD_CENTEC_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_CENTEC_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
1 change: 1 addition & 0 deletions platform/centec/docker-syncd-centec.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_CENTEC = docker-syncd-centec.gz
$(DOCKER_SYNCD_CENTEC)_PATH = $(PLATFORM_PATH)/docker-syncd-centec
$(DOCKER_SYNCD_CENTEC)_DEPENDS += $(SYNCD)
$(DOCKER_SYNCD_CENTEC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_CENTEC)_DEPENDS += $(SYNCD_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/centec/docker-syncd-centec/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,8 @@ RUN apt-get install -f kmod

COPY ["start.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
1 change: 1 addition & 0 deletions platform/centec/docker-syncd-centec/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
8 changes: 7 additions & 1 deletion platform/centec/docker-syncd-centec/supervisord.conf
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand All @@ -15,7 +21,7 @@ stderr_logfile=syslog
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false
autorestart=unexpected
stdout_logfile=syslog
stderr_logfile=syslog

Expand Down
1 change: 1 addition & 0 deletions platform/marvell-arm64/docker-syncd-mrvl-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_MRVL_RPC = docker-syncd-mrvl-rpc.gz
$(DOCKER_SYNCD_MRVL_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-mrvl-rpc
$(DOCKER_SYNCD_MRVL_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT)
$(DOCKER_SYNCD_MRVL_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_MRVL_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/marvell-arm64/docker-syncd-mrvl/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@ debs/{{ deb }}{{' '}}

COPY ["start.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
1 change: 1 addition & 0 deletions platform/marvell-armhf/docker-syncd-mrvl-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_MRVL_RPC = docker-syncd-mrvl-rpc.gz
$(DOCKER_SYNCD_MRVL_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-mrvl-rpc
$(DOCKER_SYNCD_MRVL_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT)
$(DOCKER_SYNCD_MRVL_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_MRVL_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/marvell-armhf/docker-syncd-mrvl/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -28,6 +28,8 @@ debs/{{ deb }}{{' '}}

COPY ["start.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
1 change: 1 addition & 0 deletions platform/marvell/docker-syncd-mrvl-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_MRVL_RPC = docker-syncd-mrvl-rpc.gz
$(DOCKER_SYNCD_MRVL_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-mrvl-rpc
$(DOCKER_SYNCD_MRVL_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT)
$(DOCKER_SYNCD_MRVL_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_MRVL_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/marvell/docker-syncd-mrvl/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -23,6 +23,8 @@ debs/{{ deb }}{{' '}}

COPY ["start.sh", "syncd.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
1 change: 1 addition & 0 deletions platform/marvell/docker-syncd-mrvl/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
8 changes: 7 additions & 1 deletion platform/marvell/docker-syncd-mrvl/supervisord.conf
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand All @@ -15,7 +21,7 @@ stderr_logfile=syslog
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false
autorestart=unexpected
stdout_logfile=syslog
stderr_logfile=syslog

Expand Down
1 change: 1 addition & 0 deletions platform/mellanox/docker-syncd-mlnx-rpc.mk
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,7 @@
DOCKER_SYNCD_MLNX_RPC = docker-syncd-mlnx-rpc.gz
$(DOCKER_SYNCD_MLNX_RPC)_PATH = $(PLATFORM_PATH)/docker-syncd-mlnx-rpc
$(DOCKER_SYNCD_MLNX_RPC)_DEPENDS += $(SYNCD_RPC) $(LIBTHRIFT)
$(DOCKER_SYNCD_MLNX_RPC)_FILES += $(SUPERVISOR_PROC_EXIT_LISTENER_SCRIPT)
ifeq ($(INSTALL_DEBUG_TOOLS), y)
$(DOCKER_SYNCD_MLNX_RPC)_DEPENDS += $(SYNCD_RPC_DBG) \
$(LIBSWSSCOMMON_DBG) \
Expand Down
2 changes: 2 additions & 0 deletions platform/mellanox/docker-syncd-mlnx/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -35,5 +35,7 @@ RUN apt-get clean -y && \

COPY ["start.sh", "/usr/bin/"]
COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

ENTRYPOINT ["/usr/bin/supervisord"]
1 change: 1 addition & 0 deletions platform/mellanox/docker-syncd-mlnx/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1 @@
syncd
8 changes: 7 additions & 1 deletion platform/mellanox/docker-syncd-mlnx/supervisord.conf
Original file line number Diff line number Diff line change
Expand Up @@ -3,6 +3,12 @@ logfile_maxbytes=1MB
logfile_backups=2
nodaemon=true

[eventlistener:supervisor-proc-exit-listener]
command=/usr/bin/supervisor-proc-exit-listener
events=PROCESS_STATE_EXITED
autostart=true
autorestart=unexpected

[program:start.sh]
command=/usr/bin/start.sh
priority=1
Expand All @@ -15,7 +21,7 @@ stderr_logfile=syslog
command=/usr/sbin/rsyslogd -n
priority=2
autostart=false
autorestart=false
autorestart=unexpected
stdout_logfile=syslog
stderr_logfile=syslog

Expand Down
2 changes: 2 additions & 0 deletions platform/nephos/docker-syncd-nephos/Dockerfile.j2
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,8 @@ COPY ["files/dsserve", "files/npx_diag", "start.sh", "/usr/bin/"]
RUN chmod +x /usr/bin/npx_diag /usr/bin/dsserve

COPY ["supervisord.conf", "/etc/supervisor/conf.d/"]
COPY ["files/supervisor-proc-exit-listener", "/usr/bin"]
COPY ["critical_processes", "/etc/supervisor/"]

## Clean up
RUN apt-get clean -y; apt-get autoclean -y; apt-get autoremove -y
Expand Down
2 changes: 2 additions & 0 deletions platform/nephos/docker-syncd-nephos/critical_processes
Original file line number Diff line number Diff line change
@@ -0,0 +1,2 @@
dsserve
syncd
Loading

0 comments on commit ccb607a

Please sign in to comment.