Multiple fixes and improvements

OK, I should probably have split this commit into many smaller ones, to
ease review and improve bisectability. But since I didn't, I'll at least
try to summarize here what this big commit changes, going from the more
complex additions to the trivial ones:

- Added a log submission system; for now, it doesn't submit logs
anywhere, since we don't have a well-defined API for that yet. But it
saves the logs locally in a tar.zst, using timestamps to mark the moment
of log collection, and clears the old files. The name of the compressed
log tarball uses the Deck serial number and the most recently logged-in
Steam account (thanks @TonyP for that idea).

- Still on the log submission topic: we first try pstore logs and,
if there are none, we check kdump logs. If we have a vmcore, we save
it locally (after renaming), but for now we assume that this kind of
file won't be submitted.

- Since we now have makedumpfile in Holo (thanks @xexaxo), I've
removed this binary from here and made this package depend on
makedumpfile. Also adjusted the respective scripts.

- Added some error messages through logger, in case we fail to
load pstore/kdump or fail in the log submission script.

- Changed the systemd service to just do the pstore/kdump loading
and call the submit_report.sh script, which is then "disowned" so
that the systemd service finishes as fast as possible; boot delays
due to this service wouldn't be nice.

- Increased the "crashkernel" memory recommended in the documentation,
since I've noticed the most recent version of the kernel requires
a bit more - let's play safe!

- Changed timestamps to use the UTC timezone - kdump collection
happens so early that the timezone is not set yet, so let's stick
with UTC in all cases.

- Checked scripts with shellcheck[0] and improved the README.

[0] Accepted most suggestions, but some are controversial and may
introduce issues, so the scripts don't fully pass shellcheck and I
don't expect them to in the future; it's just a minor style improvement
and a failsafe for multiple shells.
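
To make the log selection and naming described above more concrete,
here is a minimal sketch. It is not the code from this commit: the
helper names, the "steamos-" prefix, the field order in the file name
and the directories passed in are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: helper names, name layout and paths are hypothetical.

# True if the directory exists and holds at least one regular file.
has_files() {
    [ -d "$1" ] && [ -n "$(find "$1" -type f 2>/dev/null | head -n 1)" ]
}

# Prefer pstore logs ($1); fall back to kdump logs ($2); else "none".
pick_log_source() {
    if has_files "$1"; then
        echo pstore
    elif has_files "$2"; then
        echo kdump
    else
        echo none
    fi
}

# Build a tarball name from serial ($1), account ($2) and a UTC timestamp.
# The optional third argument overrides the timestamp (handy for testing).
report_name() {
    stamp="${3:-$(date -u +%Y%m%d%H%M%S)}"
    printf 'steamos-%s-%s-%s.tar.zst\n' "$1" "$2" "$stamp"
}

# Pack directory $1 into the zstd-compressed tarball $2.
# Relies on GNU tar with zstd support available.
pack_logs() {
    tar --zstd -cf "$2" -C "$1" .
}
```

A caller could then run something like
pack_logs "$logdir" "$outdir/$(report_name "$serial" "$account")",
where all four variables are placeholders.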

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Guilherme G. Piccoli
2022-01-12 18:38:32 -03:00
parent 2ab4eae53e
commit 3638dc8264
10 changed files with 189 additions and 71 deletions

@@ -15,7 +15,8 @@
 # collection, that only grabs dmesg, and a more complete setting to grab the
 # whole (compressed) vmcore. The tunnings are available at /etc/default/kdump.
 #
-# Also, the infrastructure is able to configure and save pstore-RAM logs.
+# Also, the infrastructure is able to configure and save pstore-RAM logs;
+# this is the default option.
 #
 # After installation and a reboot, things should be all set EXCEPT for GRUB
 # config - please check the CAVEATS/INSTRUCTIONS section below. Notice the
@@ -27,17 +28,17 @@
 # CAVEATS / INSTRUCTIONS
 # ###########################################################################
 # (a) For now, we don't automatically edit any GRUB config, so the minimum
-# necessary action after installing this package is to add "crashkernel=160M"
+# necessary action after installing this package is to add "crashkernel=192M"
 # to your GRUB config in order subsequent boots pick this setting and do reserve
 # the memory, or else kdump cannot work. The memory amount was empirically
-# determined - 128M wasn't enough and 144M is unstable, so 160M seems good enough.
+# determined - 144M wasn't enough and 160M is unstable, so 192M seems good enough.
 # If you prefer to rely on pstore-RAM, no GRUB setting should be required; this
 # is currently the default (see /etc/default/kdump).
 #
 # (b) It requires (obviously) a RW rootfs - we've used tune2fs in order to make
 # it read-write, since it's RO by default. Also, we assume the nvme partition
 # scheme is default across all versions and didn't change with new updates
-# for example - kdump relies in mounting partitions, etc.
+# for example - both kdump and pstore relies in mounting partitions, etc.
 #
 # (c) Due to a post-transaction hook executed by libalpm (90-dracut-install.hook),
 # unfortunately after installing the kdump-steamos package *all* initramfs images
@@ -45,9 +46,9 @@
 # but for now be prepared: the installation take some (long) minutes due to that ={
 #
 # (d) Unfortunately makedumpfile from Arch Linux is not available on official
-# repos, only in AUR. So, we're hereby _packing the binary_ with all the scripts,
-# which is a temporary workaround and should be resolved later - already started
-# to "lobby" for package inclusion in the official channels:
+# repos, only in AUR. But it is available on Holo, so we make use of that.
+# Also, a discussion was started to get it included on official repos:
 # https://lists.archlinux.org/pipermail/aur-general/2022-January/036767.html
 # https://aur.archlinux.org/packages/makedumpfile/#comment-843853
 #
 #
@@ -68,13 +69,16 @@
 # in the past and relying in sysrq reboot as a quirk managed to be a safe option,
 # so this is something to think about here. Should be easy to implement.
 #
-# (5) Maybe a good idea would be to allow creating the minimum image for any
-# specified kernel, not only for the running one (which is what we do now).
-# Low-priority idea, easy to implement.
+# (5) The log submission mechanism is incomplete - we save the logs as tar.zst
+# files, but they are not submitted to any remote server, etc.
 #
 # (6) Pstore ramoops backend has some limitations that we're discussing with
 # the kernel community - right now we can only collect ONE dmesg and its
 # size is truncated on "record_size" bytes, not allowing a file split like
 # efi-pstore; hopefully we can improve that.
 #
+# (7) Maybe a good idea would be to allow creating the minimum image for any
+# specified kernel, not only for the running one (which is what we do now).
+# Low-priority idea, easy to implement.
 #
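
Caveat (a) above asks for "crashkernel=192M" to be added by hand to the
GRUB config. A quick way to confirm that the running kernel actually
picked it up is to look for the parameter in the kernel command line.
The helper below is only a sketch; its name is hypothetical and it is
not part of this package.

```shell
#!/bin/sh
# Sketch only: crashkernel_value is a hypothetical helper name.

# Print the value of crashkernel= found in the command line string $1.
# Returns non-zero if the parameter is absent. The body runs in a
# subshell so that "set -f" (no globbing while splitting) stays local.
crashkernel_value() (
    set -f
    for arg in $1; do
        case "$arg" in
            crashkernel=*)
                printf '%s\n' "${arg#crashkernel=}"
                exit 0
                ;;
        esac
    done
    exit 1
)
```

On a booted system it could be called as
crashkernel_value "$(cat /proc/cmdline)"; an empty result with a
non-zero exit status would suggest the GRUB change did not take effect.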