d8815b1cd746f2875aff435dc9f6137e0fcf68fd
Add "set -uo pipefail" to improve reliability (suggested by Emil/@xexaxo). Note that the suggestion also included "-e", but the scripts rely on checking non-zero exit statuses of pipelines, so rather than refactoring the code just to enable that option, we leave it out. Also, use bash as the shell that executes the tools: after some analysis, we found that we rely on a few bashisms that would be a pain to change, and since many scripts in SteamOS use bash and it is a very common shell in general, let's just go along with it.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
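The trade-off described above can be illustrated with a minimal sketch. This is not the project's actual code, and the helper names `count_matches` and `report_matches` are hypothetical. With `set -uo pipefail` but no `-e`, a failing pipeline does not abort the script, so the caller can still inspect its exit status and react.

```bash
#!/bin/bash
# -u: unset variables are errors; pipefail: a pipeline fails if any stage fails.
# No -e on purpose: non-zero statuses are checked by hand below.
set -uo pipefail

# Hypothetical helper: count the lines of FILE that match PATTERN.
# Thanks to pipefail, the status is non-zero when grep finds nothing
# or cannot read the file, even though "wc -l" itself succeeds.
count_matches() {
    local pattern="$1" file="$2"
    grep -- "$pattern" "$file" 2>/dev/null | wc -l
}

# Hypothetical caller: branch on the pipeline status instead of aborting.
report_matches() {
    local n
    if n="$(count_matches "$1" "$2")"; then
        echo "found $((n)) line(s)"
    else
        echo "no match or read error (status $?)"
    fi
}
```

With `-e` enabled, a bare call to `count_matches` that finds nothing would terminate the script, which is the behavior the commit chose to avoid.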
# ###########################################################################
# ########################## Arch Kdump / Pstore ###########################
# ###########################################################################
#
#
# This is the Arch Kdump/Pstore infrastructure; the goal is to collect
# data whenever a kernel crash is detected. There is a lightweight
# collection, that only grabs dmesg, and a more complete setting to grab the
# whole (compressed) vmcore. See the DETAILS section below for more info.
#
#
# ############################ HOW-TO USE IT ##############################
#
# 1. Install the package with pacman if it is not available in your system; to
# check if it's already installed, look at pacman's list of installed
# packages. Also, be sure the systemd service was properly loaded by checking
# 'systemctl status kdump-init.service'.
#
# 2. In a crash event, the dmesg log is collected, and by default this happens
# via the Pstore mechanism, i.e., no extra memory should be reserved and no
# GRUB change is required. If 'lsmod' shows "ramoops", then Pstore is in use.
# Some extra files are collected besides dmesg, like dmidecode output and the
# "/etc/os-release" file.
#
# 3. The logs are stored in a ZIP file in the folder "$MOUNT_FOLDER/logs"
# (see the config file); this file is named "kdump-TIMESTAMP.zip",
# where TIMESTAMP is the current timestamp (in UTC).
#
# 4. (IMPORTANT) Please test the infrastructure to confirm that a dummy
# crash log is collected before relying on it to debug complex issues.
# To do that, log in to a shell and execute, as the root user:
# 'echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger'
#
# This action will trigger a dummy crash and reboot the system; check if
# there is a ZIP file with the crash logs in the directory described in (3).
#
# 5. Various tunables are available in the "/usr/share/kdump.d/*" files; for
# example, users can choose Kdump instead of Pstore (USE_PSTORE_RAM),
# and if using Kdump, collect the full vmcore (FULL_COREDUMP). The vmcore is
# not stored in the ZIP file, but it's saved in "$MOUNT_FOLDER/crash".
# NOTICE that, if Kdump is used instead of Pstore (either per user's choice
# or due to some failure in Pstore), a reboot is necessary before kdump is
# usable, in order to effectively reserve crashkernel memory.
#
# 6. Error and success messages are sent to the systemd journal, so running
# 'journalctl -b | grep kdump' should bring up some information.
#
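# The check described in steps (3) and (4) can be sketched as below. This is
# only an illustration: the function name latest_crash_zip is hypothetical,
# and it assumes that the newest ZIP is the one with the latest modification
# time, since the exact TIMESTAMP format is not stated here.

```bash
#!/bin/bash
set -uo pipefail

# Hypothetical helper: print the most recently modified kdump-*.zip found in
# the given logs directory; return non-zero if there is none.
latest_crash_zip() {
    local logs_dir="$1"
    local newest="" f
    for f in "$logs_dir"/kdump-*.zip; do
        [ -e "$f" ] || continue            # the glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest="$f"
        fi
    done
    [ -n "$newest" ] && printf '%s\n' "$newest"
}

# Usage after the test crash and reboot (MOUNT_FOLDER comes from the config
# files mentioned in step 5):
#   lsmod | grep ramoops      # any output means Pstore is in use
#   latest_crash_zip "$MOUNT_FOLDER/logs" || echo "no crash ZIP found yet"
```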
#
# ############################## DETAILS ##################################
# CAVEATS / INSTRUCTIONS
# ###########################################################################
# (a) We automatically edit the GRUB config in case Pstore fails or if the
# user's choice is to use Kdump. But this requires one reboot so that the
# crashkernel memory is effectively reserved by the kernel.
#
# In case Kdump is used, the amount of crashkernel memory needed was
# determined empirically: 144M wasn't enough and 160M is unstable, so 192M
# seems good enough. This amount might change in future kernel versions,
# requiring tests using the approach suggested in step (4) above.
#
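# One way to confirm, after the reboot, that a crashkernel reservation was
# requested is to look for the parameter in the kernel command line. The
# sketch below is an illustration only; the function name
# crashkernel_from_cmdline is hypothetical and not part of this package.

```bash
#!/bin/bash
set -uo pipefail

# Hypothetical helper: print the value of the first "crashkernel=" parameter
# found in a kernel command-line string; return non-zero if it is absent.
crashkernel_from_cmdline() {
    local cmdline="$1" arg
    local -a args=()
    # Split on whitespace without triggering pathname expansion.
    read -r -a args <<< "$cmdline"
    [ "${#args[@]}" -gt 0 ] || return 1
    for arg in "${args[@]}"; do
        case "$arg" in
            crashkernel=*)
                printf '%s\n' "${arg#crashkernel=}"
                return 0
                ;;
        esac
    done
    return 1
}

# Usage on a running system:
#   crashkernel_from_cmdline "$(cat /proc/cmdline)" || echo "not set"
#   cat /sys/kernel/kexec_crash_size    # bytes actually reserved, 0 if none
```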
#
# TODOs
# ###########################################################################
# * It would be interesting to have a clean-up mechanism that keeps only the
# N most recent ZIP log files, instead of keeping all of them forever.
#
# * The Pstore ramoops back-end has some limitations that we're discussing
# with the kernel community - right now we can only collect ONE dmesg and its
# size is truncated to "record_size" bytes, not allowing a file split like
# efi-pstore; thankfully we can still collect a 2MiB dmesg, but hopefully we
# can improve that upstream.
#
# * Add a more reliable reboot mechanism - we have seen issues in the past
# with "reboot -f", and relying on the sysrq reboot as a quirk proved to be a
# safe option, so this is something to think about. Should be easy to
# implement.
#
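# The clean-up TODO above could look roughly like the sketch below. It is not
# implemented in this package; the function name prune_crash_zips is
# hypothetical, "most recent" is assumed to mean latest modification time,
# and file names are assumed to contain no newlines.

```bash
#!/bin/bash
set -uo pipefail

# Hypothetical helper: keep the KEEP most recently modified kdump-*.zip files
# in LOGS_DIR and delete the older ones.
prune_crash_zips() {
    local logs_dir="$1" keep="$2"
    local -a zips=()
    local f i
    # "ls -t" lists newest first; if nothing matches, the list stays empty.
    while IFS= read -r f; do
        zips+=("$f")
    done < <(ls -1t -- "$logs_dir"/kdump-*.zip 2>/dev/null)
    # Everything from index KEEP onwards is older than the files we keep.
    for ((i = keep; i < ${#zips[@]}; i++)); do
        rm -f -- "${zips[i]}"
    done
    return 0
}

# Usage (MOUNT_FOLDER comes from the config files):
#   prune_crash_zips "$MOUNT_FOLDER/logs" 5
```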
Description
Fork of https://gitlab.freedesktop.org/gpiccoli/kdumpst that works if you use btrfs with subvolumes