Guilherme G. Piccoli a3ab8c421b all: Refactor the whole kdump/pstore folder setting
This is a somewhat big refactor. The early design of kdump/pstore was
meant to support only the Steam Deck A/B scheme and a dracut-based
initrd. In this scheme, we had a DEVNODE (like an NVMe partition or a
devlink) representing the device node to be mounted early in boot when
kdump was in use. We also had a folder, defined in the config file,
within the filesystem on that devnode, and a temporary file used to
"carry" the composition of the full kdump path across boot-time scripts.
Yeah, kinda complex setup.

We are now moving to a more generic approach, and for that we chose a
design that is simpler and more convenient for the common cases, but
requires some extra operations to work properly with the SteamOS
dracut-based initrd. Now there is only a single path in the config file,
which should point to a read-write filesystem accessible by both scripts
executed by the systemd service. No devnode information or temp file is
used anymore.

With that, however, comes the need to discover the proper devnode and
base folder for kdump'ing early in boot, from the initrd. Using the
findmnt tool, we derive all the necessary data during the initrd
preparation phase. While at it, we also fix an "inconsistency" in our
dracut initrd creation script: installkernel() should be responsible for
removing the DRM modules, not install().
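A minimal sketch of the findmnt step described above. It assumes util-linux findmnt(8) is available while the initrd is being built. The helper names (kdump_devnode, kdump_subvol, kdump_rel_folder, kdump_mount_info) are hypothetical, because this commit message does not show the actual code. The sketch only illustrates how the SOURCE, FSTYPE and TARGET columns can be turned into a devnode, a filesystem type, an optional btrfs subvolume and a folder relative to the mount point.

```shell
# Device node part of a findmnt SOURCE: drop the "[...]" suffix that findmnt
# adds for btrfs subvolumes and bind mounts,
# e.g. "/dev/nvme0n1p8[/@home]" -> "/dev/nvme0n1p8".
kdump_devnode() {
    dev=${1%%\[*}
    printf '%s\n' "$dev"
}

# Subvolume part of a findmnt SOURCE ("/@home" in the example above), or an
# empty line if SOURCE has no "[...]" suffix.
kdump_subvol() {
    case $1 in
        *\[*\])
            sub=${1#*\[}
            sub=${sub%\]}
            printf '%s\n' "$sub"
            ;;
        *)
            printf '\n'
            ;;
    esac
}

# Path $1 relative to its mount point $2, always starting with "/".
kdump_rel_folder() {
    if [ "$2" = "/" ]; then
        printf '%s\n' "$1"
    else
        rel=${1#"$2"}
        printf '%s\n' "${rel:-/}"
    fi
}

# Ask findmnt which filesystem holds path $1: -T resolves the mount point of
# an arbitrary path, -n drops the header, -r gives raw space-separated output.
kdump_mount_info() {
    findmnt -n -r -o SOURCE,FSTYPE,TARGET -T "$1" | {
        read -r src fstype target
        printf 'DEVNODE=%s FSTYPE=%s SUBVOL=%s FOLDER=%s\n' \
            "$(kdump_devnode "$src")" "$fstype" \
            "$(kdump_subvol "$src")" "$(kdump_rel_folder "$1" "$target")"
    }
}
```

For example, 'kdump_mount_info /home/deck/kdump' prints the pieces for that path. FSTYPE is what lets the initrd include whichever filesystem driver is needed. On btrfs, the subvolume has to be kept (for example, as a subvol= mount option) so that the folder is resolved inside the right subvolume.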

On top of this (already big) change, our dracut initrd now excludes not
only the amdgpu driver/firmware, but also radeon, nvidia and i915. And
thanks to the refactor of the mount point information (using findmnt to
collect info during dracut initrd creation), we now also allow arbitrary
filesystem drivers to be included, i.e., we no longer hardcode/limit it
to ext4.

Again, mea culpa for not splitting this into multiple atomic/simple
commits; the burden of keeping a pretty git log is starting to consume
precious time.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
2023-03-31 15:34:42 -03:00

#  ###########################################################################
#  ########################## Arch Kdump / Pstore  ###########################
#  ###########################################################################
#
#
#  This is the Arch Kdump/Pstore infrastructure; the goal is to collect
#  data whenever a kernel crash is detected. There is a lightweight
#  collection mode that only grabs dmesg, and a more complete setting that
#  grabs the whole (compressed) vmcore. See the DETAILS section below for
#  more info.
#
#
#  ############################  HOW-TO USE IT  ##############################
#
#  1. Install the package with pacman if it's not yet available on your
#  system; to check whether it's already installed, look at pacman's list of
#  installed packages ('pacman -Q'). Also, make sure the systemd service was
#  properly loaded by checking 'systemctl status kdump-init.service'.
#
#  2. In the event of a crash, the dmesg log is collected; by default this
#  happens via the Pstore mechanism, i.e., no extra memory needs to be
#  reserved and no GRUB change is required. If 'lsmod' shows "ramoops", then
#  Pstore is in use. Besides dmesg, some extra files are collected, like the
#  dmidecode output and the "/etc/os-release" file.
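#  The lsmod check from step 2 can be scripted. The sketch below uses a
#  hypothetical function name, ramoops_loaded, that reads lsmod-style text
#  on stdin, so it also works on a saved lsmod output. Note that lsmod only
#  lists loadable modules, so a ramoops built into the kernel would not
#  show up this way.
#
```shell
# Hypothetical helper: succeeds if the lsmod-style listing on stdin has a
# "ramoops" entry (the module name is the first column of lsmod output).
ramoops_loaded() {
    awk '$1 == "ramoops" { found = 1 } END { exit !found }'
}

# Example check on the running system.
if lsmod 2>/dev/null | ramoops_loaded; then
    echo "ramoops is loaded: Pstore is in use"
else
    echo "ramoops is not listed by lsmod"
fi
```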
#
#  3. The logs are stored in a ZIP file in the "$MOUNT_FOLDER/logs" folder
#  (see the config file); the file is named "kdump-TIMESTAMP.zip", where
#  TIMESTAMP is the time of collection, in UTC.
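#  To find the newest log from step 3, a small sketch can be used. It
#  assumes MOUNT_FOLDER holds the value from the config file, and the
#  helper name latest_kdump_log is hypothetical. The exact TIMESTAMP format
#  isn't specified here, so the sketch sorts by file modification time
#  rather than by name.
#
```shell
# Hypothetical helper: print the most recently modified kdump-*.zip in a
# logs folder (default "$MOUNT_FOLDER/logs"); fail if there is none.
latest_kdump_log() {
    logdir=${1:-$MOUNT_FOLDER/logs}
    # ls -t sorts newest first; kdump-TIMESTAMP.zip names contain no
    # newlines, so reading ls output one name per line is safe here.
    newest=$(ls -1t "$logdir"/kdump-*.zip 2>/dev/null | head -n 1)
    [ -n "$newest" ] || return 1
    printf '%s\n' "$newest"
}

# Usage: list the files collected in the last crash report.
#   unzip -l "$(latest_kdump_log)"
```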
#
#  4. (IMPORTANT) Please test the infrastructure, checking that a dummy
#  crash log is collected, before relying on it to debug complex issues.
#  To do that, log in to a shell and execute, as the root user:
#  'echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger'
#
#  This action will trigger a dummy crash and reboot the system; check if
#  there is a ZIP file with the crash logs in the directory described in (3).
#
#  5. Various tunings are available in the "/usr/share/kdump.d/*" files; for
#  example, users can choose Kdump instead of Pstore (USE_PSTORE_RAM) and,
#  if using Kdump, collect the full vmcore (FULL_COREDUMP). The vmcore is
#  not stored in the ZIP file, but saved in "$MOUNT_FOLDER/crash".
#  NOTICE that if Kdump is used instead of Pstore (either by the user's
#  choice or due to some failure in Pstore), a reboot is necessary before
#  Kdump is usable, in order to effectively reserve the crashkernel memory.
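#  To see how the tunables from step 5 are currently set, a grep over the
#  kdump.d files is enough. This sketch assumes MOUNT_FOLDER is defined in
#  those same files, although it may live elsewhere. It makes no assumption
#  about the value format, and the helper name kdump_settings is
#  hypothetical.
#
```shell
# Hypothetical helper: print "file:line" for every assignment of the
# tunables named in step 5 (plus MOUNT_FOLDER) found in a kdump.d folder.
kdump_settings() {
    grep -H -E \
        '^[[:space:]]*(export[[:space:]]+)?(MOUNT_FOLDER|USE_PSTORE_RAM|FULL_COREDUMP)=' \
        "${1:-/usr/share/kdump.d}"/*
}
```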
#
#  6. Error and success messages are sent to the systemd journal, so running
#  'journalctl -b | grep kdump' should provide some information.
#
#
#  ##############################  DETAILS  ##################################
#  CAVEATS / INSTRUCTIONS
#  ###########################################################################
#  (a) We automatically edit the GRUB config in case Pstore fails or if the
#  user chooses Kdump. However, one reboot is required for the kernel to
#  effectively reserve the crashkernel memory.
#
#  In case Kdump is used, the necessary crashkernel memory was empirically
#  determined: 144M wasn't enough and 160M was unstable, so 192M seems good
#  enough. This amount might change in future kernel versions, requiring
#  tests using the approach suggested in step (4) above.
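#  After the reboot mentioned in (a), one way to confirm the reservation is
#  to look for crashkernel= on the kernel command line and to read
#  /sys/kernel/kexec_crash_size. That file reports the reserved size in
#  bytes (0 when nothing is reserved). This is a minimal sketch with
#  hypothetical helper names. With crashkernel=192M, the reported size is
#  expected to be 192 MiB, i.e. 192 * 1048576 = 201326592 bytes.
#
```shell
# Hypothetical helper: print the value(s) of crashkernel= found in a kernel
# command line; with no argument, the running kernel's /proc/cmdline is used.
# Prints nothing if the parameter is absent.
crashkernel_param() {
    printf '%s\n' "${1:-$(cat /proc/cmdline)}" |
        tr ' ' '\n' | sed -n 's/^crashkernel=//p'
}

# Hypothetical helper: convert a size in bytes (as reported by
# /sys/kernel/kexec_crash_size) to whole MiB.
bytes_to_mib() {
    echo $(( $1 / 1048576 ))
}

# Usage on the running system:
#   crashkernel_param
#   bytes_to_mib "$(cat /sys/kernel/kexec_crash_size)"
```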
#
#
#  TODOs
#  ###########################################################################
#  * It would be interesting to have a clean-up mechanism that keeps only the
#  N most recent ZIP log files, instead of keeping all of them forever.
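#  One possible shape for the clean-up TODO above is sketched below. It
#  assumes the logs live in "$MOUNT_FOLDER/logs" and that file modification
#  time reflects crash order. prune_kdump_logs is a hypothetical name and
#  is not part of the package.
#
```shell
# Hypothetical helper: keep only the $1 most recently modified kdump-*.zip
# files in a logs folder (default "$MOUNT_FOLDER/logs"), deleting the rest.
prune_kdump_logs() {
    keep=$1
    logdir=${2:-$MOUNT_FOLDER/logs}
    # ls -t lists newest first; tail skips the first $keep names and the
    # remaining (older) ones are removed. Names are assumed newline-free.
    ls -1t "$logdir"/kdump-*.zip 2>/dev/null | tail -n +"$((keep + 1))" |
        while IFS= read -r old; do
            rm -f -- "$old"
        done
}
```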
#
#  * The Pstore ramoops back-end has some limitations that we're discussing
#  with the kernel community: right now we can only collect ONE dmesg, and
#  its size is truncated to "record_size" bytes, with no file split like
#  efi-pstore does; thankfully we can still collect a 2MiB dmesg, but
#  hopefully we can improve that upstream.
#
#  * Add a more reliable reboot mechanism: we have seen issues in the past
#  with "reboot -f", and relying on sysrq reboot as a quirk proved to be a
#  safe option, so this is something to think about. It should be easy to
#  implement.
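#  For the reboot TODO, a sysrq-based sequence could look like the sketch
#  below. It runs an emergency sync ('s'), remounts all filesystems
#  read-only ('u'), then reboots immediately ('b'). The function name and
#  its path parameters are hypothetical; the parameters only exist so the
#  sequence can be exercised against ordinary files. The kernel carries out
#  's' and 'u' asynchronously, hence the short pauses.
#
```shell
# Hypothetical helper: sysrq-based reboot. $1 defaults to the sysrq trigger
# file and $2 to the sysrq enable knob; both are /proc entries on Linux.
sysrq_reboot() {
    trigger=${1:-/proc/sysrq-trigger}
    sysrq_ctl=${2:-/proc/sys/kernel/sysrq}
    sync                      # flush what we can through the normal path
    echo 1 > "$sysrq_ctl"     # enable all sysrq functions
    echo s > "$trigger"       # emergency sync
    sleep 1
    echo u > "$trigger"       # remount all filesystems read-only
    sleep 1
    echo b > "$trigger"       # immediate reboot, no further syncing
}
```
#  Called without arguments (as root), it writes to the real /proc entries
#  and reboots the machine right away.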
#
Description
Fork of https://gitlab.freedesktop.org/gpiccoli/kdumpst that works if you use btrfs with subvolumes
License: LGPL-2.1