Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SONiC core dump utility #3499

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

rajendra-dendukuri
Copy link
Contributor

- What I did

Added new way to collect and manage application core files.

- How I did it
Core files generated by kernel are created by the host o/s and also stored
on host o/s. This applies for processes running inside container as well.
Containers do not have access to journal as well as core files. Containers
are supposed to have limited access to host o/s and core files and journal
may contain sensitive information.

To improve debugging of crashes inside a container following changes are made:

  • Install systemd-coredump in base o/s

  • Remove existing simple coredump facility

  • Enable persistent journald to store coredump history

  • Minimal default coredump configuration

  • Added a coredump-config service to generate coredump configuration

  • when INSTALL_DEBUG_TOOLS=y is set in the build. systemd-coredump tool
    is installed in all containers beside gdb

  • When SONIC_DEBUGGING_ON=y is set in the build,
    /var/log/journal and /var/lib/systemd/coredump are mapped inside container

To inspect a core file, from a container shell, issue below commands

docker exec -ti coredumpctl

- How to verify it
kill -ABRT
coredumpctl list
coredumpctl info

- Description for the changelog

Use systemd-coredump for core file management in SONiC

- A picture of a cute animal (not mandatory but encouraged)

- Install systemd-coredump in base o/s
- Remove existing simple coredump facility
- Enable persistent journald to store coredump history
- Minimal default coredump configuration
- Added a coredump-config service to generate coredump configuration

Core files generated by kernel are created by the host o/s and also stored
on host o/s. This applies for processes running inside container as well.
Containers do not have access to journal as well as core files. Containers
are supposed to have limited access to host o/s and core files and journal
may contain sensitive information.

Toimprove debugging of crashes inside a container following changes are made:

- when INSTALL_DEBUG_TOOLS=y is set in the build. systemd-coredump tool
is installed in all containers beside gdb

- When SONIC_DEBUGGING_ON=y is set in the build,
  /var/log/journal and /var/lib/systemd/coredump are mapped inside container

To inspect a core file, from a container shell, issue below commands

docker exec -ti <container-name> /bin/bash
@rajendra-dendukuri
Copy link
Contributor Author

Refer to
sonic-net/SONiC#468

slave.mk Outdated
@@ -637,6 +637,7 @@ $(addprefix $(TARGET_PATH)/, $(SONIC_INSTALLERS)) : $(TARGET_PATH)/% : \
export sonic_asic_platform="$(patsubst %-$(CONFIGURED_ARCH),%,$(CONFIGURED_PLATFORM))"
export enable_organization_extensions="$(ENABLE_ORGANIZATION_EXTENSIONS)"
export enable_dhcp_graph_service="$(ENABLE_DHCP_GRAPH_SERVICE)"
export sonic_debugging_on="$(SONIC_DEBUGGING_ON)"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a good reason to separate this from INSTALL_DEBUG_TOOLS? Maybe we can combine them both into a BUILD_DEBUG_IMAGE flag?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There will always be cases where we would want to do something separate for INSTALL_DEBUG_TOOLS compared to SONIC_DEBUGGING_ON. The idea I followed was, INSTALL_DEBUG_TOOLS will install additional debug tools (coredumpctl) installed with in the container. This adds some size to the container. So users may not want to do this bit and do it if they really want to debug using gdb.

SONIC_DEBUGGING_ON, will allow the container to have access to /var/log/journal so that it can find matching core reports.

You are correct we will need both for running the gdb, but wanted to keep them separate to give flexibility to the user.


DISABLE_COREDUMP_CONF="/etc/sysctl.d/50-disable-coredump.conf"

if [ "$(redis-cli -n 4 HGET "COREDUMP|config" "enabled")" = "false" ] ; then
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If key is present && carry value as false, then we disable. In other words the default behavior is "enabled=true". To disable, one has to create this key explicitly.

Instead, why not require a key for disabling, which would imply default is enabled.

"COREDUMP|disabled" == true

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is common place to see configuration knobs with a positive intent. So if we flip this around, it may confuse the user with other usages of true/false kind of configurations.

DISABLE_COREDUMP_CONF="/etc/sysctl.d/50-disable-coredump.conf"

if [ "$(redis-cli -n 4 HGET "COREDUMP|config" "enabled")" = "false" ] ; then
echo "kernel.core_pattern=" > ${DISABLE_COREDUMP_CONF}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this mean ?

Can you please explain the impact of disable?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When the coredump admin mode is disabled in config db, core files will not be generated. We are creating a sysctl entry to disable core dump.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give me a use case, where you would not want core file dump ?

Core file dump implies that some unexpected error occurred or user explicitly creating one with kill for a reason. In either case, the dump is required for analysis.

If this is the only purpose of coredump-config.service, I don't see a need.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. corefiles can be a space hog
  2. Multiple corefiles may be generated if a process is in infinite loop
  3. corefiles may contain sensitive information, so some applications may not want it to be recorded.
  4. We plan to re-use coredump-config.service for enable/disable of kernel coredump as well. Also there may be additional parameters that you would like to configure w.r.t core files (e.g limit on the size of core file). Current mode of operation is we chose some fixed numbers. But future extensions may make them configurable. To start with, we are providing a framework to enable/disable the feature.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is already another initiative to do core file rotation and limit the count of core files at any time per process. Turning off is definitely not the solution.

The limitation on count / size is also from Broadcom only. I need to look for it.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact you are referring that PR #468, which has the following.

" a. Support per-process core file rotation and archiving to optimize disk space "

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am referring to various configurations provided by systemd-coredump. Below is a link to it.
https://www.freedesktop.org/software/systemd/man/coredump.conf.html
User's might want some bits of it be part of ConfigDB.

The PR I was referring to is PR#729 which enables kernel core dump feature. For this feature, it is desirable that users have an enable/disable knob as kdump requires dedicated 512MB of memory.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me add Guohan to this thread. What we need is the "requirement/use case"? I don't see one.

I find it rather risky to have a DB variable to control, as it could get saved and persist across reboots, transparently, which is a big risk as it is likely to block core dumps unintentionally.

I would rather have this ability as a CLI tool, which disables temporarily and all should be back to default/enabled state upon reboot.

What we do need is the ability to limit count of cores per process and overall disk size taken by cores.

@@ -11,7 +11,9 @@ VIM = vim
OPENSSH = openssh-client
SSHPASS = sshpass
STRACE = strace
$(DOCKER_BASE_STRETCH)_DBG_IMAGE_PACKAGES += $(GDB) $(GDBSERVER) $(VIM) $(OPENSSH) $(SSHPASS) $(STRACE)
SYSTEMD_COREDUMP = systemd-coredump
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W/o this package installed, will there be core dumps created?

Plus this is already installed in build_debian.sh unconditionally. Can you please explain?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

W/O this package installed inside the container, core dumps will still be created.

Core files are always generated on host o/s and stored in the /var/lib/systemd/coredump directory.

The application that has crashed may be part of a container. So if you want to run gdb using the coredumpctl gdb command, it will not find the application binary when executed on host o/s.

So, we map the /var/lib/systemd/coredump directory to the containers (see change in docker_image_ctl.j2) and also install the coredumpctl tool here. Now, the corefile and the handy coredumpctl tool are ready for debugging the application inside the container.

@rajendra-dendukuri
Copy link
Contributor Author

retest vsimage please

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants