[DellEMC] Watchdog support DellEMCS6100 #3187

Closed
7 changes: 7 additions & 0 deletions build_debian.sh
@@ -250,6 +250,7 @@ sudo LANG=C DEBIAN_FRONTEND=noninteractive chroot $FILESYSTEM_ROOT apt-get -y in
locales \
flashrom \
cgroup-tools \
watchdog \
mcelog

#Adds a locale to a debian system in non-interactive mode
@@ -411,6 +412,12 @@ sudo cp files/dhcp/snmpcommunity $FILESYSTEM_ROOT/etc/dhcp/dhclient-exit-hooks.d
sudo cp files/dhcp/vrf $FILESYSTEM_ROOT/etc/dhcp/dhclient-exit-hooks.d/
sudo cp files/dhcp/dhclient.conf $FILESYSTEM_ROOT/etc/dhcp/

## Config watchdog device and disable at startup
sudo sed -i 's/#watchdog-device/watchdog-device/' $FILESYSTEM_ROOT/etc/watchdog.conf
sudo sed -i 's/run_watchdog=1/run_watchdog=0/' $FILESYSTEM_ROOT/etc/default/watchdog
sudo rm -rf $FILESYSTEM_ROOT/lib/systemd/system/wd_keepalive.service
sudo rm -rf $FILESYSTEM_ROOT/etc/init.d/wd_keepalive

Contributor Author

@paavaanan paavaanan Jul 19, 2019

  • Enabled the watchdog device in watchdog.conf.
  • The watchdog is disabled at startup and can be enabled only through the sonic_platform API, as needed.
  • Removed wd_keepalive support.
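
The two sed substitutions in this hunk can be sketched in Python to show their effect on the config text. This is only an illustration: `sed_substitute` is a hypothetical helper, and the sample file contents are made up, so the files shipped by the watchdog package may look different.

```python
def sed_substitute(text, old, new):
    """Replace the first occurrence of a literal string on each line,
    like sed 's/old/new/' when the pattern has no regex metacharacters."""
    return "\n".join(line.replace(old, new, 1) for line in text.split("\n"))


# Made-up sample contents (assumption); the real files may differ.
watchdog_conf = "#watchdog-device = /dev/watchdog\ninterval = 1\n"
default_watchdog = "run_watchdog=1\nwatchdog_module=\"none\"\n"

# Uncomment the device line, then keep the daemon from starting at boot.
watchdog_conf = sed_substitute(watchdog_conf, "#watchdog-device", "watchdog-device")
default_watchdog = sed_substitute(default_watchdog, "run_watchdog=1", "run_watchdog=0")
```

Note that the substitutions are no-ops when the expected text is absent, so a packaging change in the default files would silently leave the watchdog configuration untouched.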

## Version file
sudo mkdir -p $FILESYSTEM_ROOT/etc/sonic
sudo tee $FILESYSTEM_ROOT/etc/sonic/sonic_version.yml > /dev/null <<EOF
@@ -239,6 +239,10 @@ if [[ "$1" == "init" ]]; then
/usr/local/bin/platform_watchdog_disable.sh
fi

#Enable watchdog with nowayout
rmmod iTCO_wdt
modprobe iTCO_wdt nowayout=1

Contributor Author

  • Added nowayout support.
  • As a result, once the watchdog is started, it cannot be stopped. From the kernel watchdog API documentation:
When the device is closed, the watchdog is disabled, unless the "Magic
Close" feature is supported (see below).  This is not always such a
good idea, since if there is a bug in the watchdog daemon and it
crashes the system will not reboot.  Because of this, some of the
drivers support the configuration option "Disable watchdog shutdown on
close", CONFIG_WATCHDOG_NOWAYOUT.  If it is set to Y when compiling
the kernel, there is no way of disabling the watchdog once it has been
started.  So, if the watchdog daemon crashes, the system will reboot
after the timeout has passed. Watchdog devices also usually support
the nowayout module parameter so that this option can be controlled at
runtime.

https://www.kernel.org/doc/Documentation/watchdog/watchdog-api.txt
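
The close semantics quoted above can be illustrated with a toy model. This is a simplified simulation written for this discussion, not kernel code: `SimulatedWatchdog` and its attributes are hypothetical, and the model only captures the three rules in the quoted text (plain close, Magic Close with the 'V' character, and nowayout).

```python
class SimulatedWatchdog:
    """Toy model of the close behavior described in the kernel watchdog API text."""

    def __init__(self, nowayout=False, magic_close=True):
        self.nowayout = nowayout        # models CONFIG_WATCHDOG_NOWAYOUT / nowayout=1
        self.magic_close = magic_close  # models a driver supporting Magic Close
        self.running = False
        self._expect_release = False

    def open(self):
        # Opening the device starts the timer.
        self.running = True
        self._expect_release = False

    def write(self, data):
        # Any write pings the timer; a 'V' in the last write requests Magic Close.
        self._expect_release = self.magic_close and b"V" in data

    def close(self):
        if self.nowayout:
            return  # the timer keeps running no matter how the device is closed
        if not self.magic_close or self._expect_release:
            self.running = False
```

With `nowayout=True`, the model keeps `running` set even after writing `b"V"` and closing, which matches the behavior this change opts into.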

Contributor

The platform API provides a function to "disarm" the watchdog. I'm not sure how frequently (or even if) this will be called. However, with "nowayout" enabled, it appears that there is no way to disable the watchdog after it has started. Is this correct?

Contributor Author

@paavaanan paavaanan Aug 2, 2019

  • Yes. We cannot stop the watchdog once it is enabled with the "nowayout" option.
  • nowayout is needed because if the user-space watchdog daemon crashes and happens to close the /dev/watchdog node properly, the watchdog may never kick in.
  • The nowayout option is used to rule out this (slight) possibility.

cpu_board_mux "new_device"
switch_board_mux "new_device"
sys_eeprom "new_device"
@@ -0,0 +1,183 @@
#
# watchdog_base.py
#
# Abstract base class for implementing a platform-specific class with which
# to interact with a hardware watchdog module in SONiC
#
try:
    import os
    import sys
    import subprocess
    import logging
    import time
    from sonic_platform_base.watchdog_base import WatchdogBase
except ImportError as e:
    raise ImportError(str(e) + "- required module not found")


class Watchdog(WatchdogBase):
    """
    Abstract base class for interfacing with a hardware watchdog module
    """
    WATCHDOG_DEFAULT_FILE = "/etc/default/watchdog"
    WATCHDOG_CONFIG_FILE = "/etc/watchdog.conf"

    WATCHDOG_START = "systemctl start watchdog.service"
    WATCHDOG_STATUS = "systemctl status watchdog.service"
    WATCHDOG_STOP = "systemctl stop watchdog.service"
    WATCHDOG_RESTART = "systemctl restart watchdog.service"

    def run_command(self, command):

        proc = subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
        (out, err) = proc.communicate()

        if proc.returncode != 0:
            sys.exit(proc.returncode)

    def write_config(self, filename, search, replace):

        retval = -1

        if (not os.path.isfile(filename)):
            print filename, 'not found !'
            return retval

        with open(filename, "r") as file:
            filedata = file.read()

        if filedata:
            filedata = filedata.replace(search, replace)

        with open(filename, 'w') as file:
            file.write(filedata)

    def check_config(self, filename, param):

        filedata = open(filename).read()
        for item in filedata.split("\n"):
            if param in item:
                return item

        return None

    def get_wd_register(self, reg_name):

        retval = 'ERR'
        WATCHDOG_SYS_DIR = "/sys/class/watchdog/watchdog0"

        wd_reg_file = WATCHDOG_SYS_DIR+'/'+reg_name

        if (not os.path.isfile(wd_reg_file)):
            print wd_reg_file, 'not found !'
            return retval

        try:
            with open(wd_reg_file, 'r') as fd:
                retval = fd.read()
        except Exception as error:
            # logging expects a format string followed by its arguments
            logging.error("Unable to open %s file !", wd_reg_file)

        retval = retval.rstrip('\r\n')
        retval = retval.lstrip(" ")
        return retval

    def arm(self, seconds):
        """
        Arm the hardware watchdog with a timeout of <seconds> seconds.
        If the watchdog is currently armed, calling this function will
        simply reset the timer to the provided value. If the underlying
        hardware does not support the value provided in <seconds>, this
        method should arm the watchdog with the *next greater* available
        value.

        Returns:
            An integer specifying the *actual* number of seconds the watchdog
            was armed with. On failure returns -1.
        """

        # Max timeout is 30 seconds
        if seconds > 30:
            seconds = 30

        # Enable watchdog in boot-up
        watchdog_enable = self.check_config(
            self.WATCHDOG_DEFAULT_FILE,
            "run_watchdog=1")
        if watchdog_enable is None:
            self.write_config(
                self.WATCHDOG_DEFAULT_FILE,
                "run_watchdog=0",
                "run_watchdog=1")

        # configure watchdog-timeout
        new_timeout = 'watchdog-timeout = ' + str(seconds)
        old_timeout = self.check_config(
            self.WATCHDOG_CONFIG_FILE,
            "watchdog-timeout")

        if old_timeout is not None:
            self.write_config(
                self.WATCHDOG_CONFIG_FILE,
                old_timeout,
                new_timeout)
        else:
            with open(self.WATCHDOG_CONFIG_FILE, "a") as wd_file:
                wd_file.write(new_timeout)
            self.run_command(self.WATCHDOG_START)

        # Restart watchdog service
        self.run_command(self.WATCHDOG_RESTART)

        if self.get_wd_register("timeout") == str(seconds):
            return seconds

        return -1

    def disarm(self):
        """
        Disarm the hardware watchdog

        Returns:
            A boolean, True if watchdog is disarmed successfully, False if not
        """
        if self.get_wd_register("state") == "active":

            # Disable watchdog in boot-up
            self.write_config(
                self.WATCHDOG_DEFAULT_FILE,
                "run_watchdog=1",
                "run_watchdog=0")

Contributor

This will disable the watchdog at next boot. However, this function is meant to disable the watchdog at runtime, in the event there may ever be a need. Is this not possible because of the "nowayout" feature enabled above?

Contributor Author

Yes, you are right. The trade-off is that with nowayout there is no way to stop the watchdog (even with the magic value).
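
Given this trade-off, one option would be for disarm() to check whether a runtime stop is possible at all before reporting success. The following is a minimal sketch, not part of this PR: it assumes the kernel exposes a `nowayout` attribute next to `state` under /sys/class/watchdog/watchdog0 (this needs watchdog sysfs support), and `read_wd_attr` and `can_disarm_at_runtime` are hypothetical helper names.

```python
import os

WATCHDOG_SYS_DIR = "/sys/class/watchdog/watchdog0"


def read_wd_attr(name, sys_dir=WATCHDOG_SYS_DIR):
    """Return the stripped contents of a watchdog sysfs attribute, or None."""
    path = os.path.join(sys_dir, name)
    try:
        with open(path, "r") as fd:
            return fd.read().strip()
    except (IOError, OSError):
        # Attribute missing or unreadable (assumption: treat as unknown)
        return None


def can_disarm_at_runtime(sys_dir=WATCHDOG_SYS_DIR):
    """True only when the attribute is readable and reports nowayout as off."""
    return read_wd_attr("nowayout", sys_dir) == "0"
```

A disarm() built on this could return False when `can_disarm_at_runtime()` is False, so callers are not told that an armed watchdog was stopped when only the next-boot setting changed.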


            return True

        return False

    def is_armed(self):
        """
        Retrieves the armed state of the hardware watchdog.

        Returns:
            A boolean, True if watchdog is armed, False if not
        """

        if self.get_wd_register("state") == "active":
            return True

        return False

    def get_remaining_time(self):
        """
        If the watchdog is armed, retrieve the number of seconds remaining on
        the watchdog timer

        Returns:
            An integer specifying the number of seconds remaining on the
            watchdog timer. If the watchdog is not armed, returns -1.
        """
        if self.get_wd_register("state") == "active":
            return self.get_wd_register("timeleft")

        return -1
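
One detail in arm() worth noting: check_config() finds the existing setting with a substring match (`param in item`), so it would also match a commented-out line that contains "watchdog-timeout", and the append path writes the new setting without a trailing newline. A line-anchored alternative could look like the sketch below. `set_watchdog_timeout` is a hypothetical helper operating on the config text, and the sample text in the usage lines is made up.

```python
import re

# Match only an active (uncommented) watchdog-timeout line.
_TIMEOUT_RE = re.compile(r"^[ \t]*watchdog-timeout[ \t]*=.*$", re.MULTILINE)


def set_watchdog_timeout(conf_text, seconds):
    """Return conf_text with the active watchdog-timeout set to `seconds`."""
    new_line = "watchdog-timeout = %d" % seconds
    if _TIMEOUT_RE.search(conf_text):
        # Rewrite the first active setting in place
        return _TIMEOUT_RE.sub(new_line, conf_text, count=1)
    # No active setting: append one, keeping the file newline-terminated
    if conf_text and not conf_text.endswith("\n"):
        conf_text += "\n"
    return conf_text + new_line + "\n"


# Made-up sample config (assumption), with only a commented-out setting.
sample = "#watchdog-timeout = 60\ninterval = 1\n"
updated = set_watchdog_timeout(sample, 30)
```

Here the commented line is left alone and an active `watchdog-timeout = 30` line is appended; calling the function again on `updated` rewrites that active line rather than adding a second one.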