config/sysctl: Adjust some configuration options/sysctls to be more agnostic

Currently a lot of our configuration settings and panic sysctls are
highly specific to SteamOS, so let's make them distro-agnostic and more
focused on generic HW and the usage of a regular Arch Linux user. The
affected configs/sysctls are detailed below:

(a) Pstore memory settings: on Steam Deck we have some RAM pre-reserved
for pstore (~15M due to kernel memory alignment rounding), so it made
sense to have a bit more of that memory effectively available for
pstore. In the general case, though, users will likely need to reserve
it manually, so 4M of total memory with a 1M buffer seems more than
enough to collect a dmesg, especially considering point (e) below.
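
As a quick sanity check of the sizes in (a), the sketch below converts the
old and new values of PSTORE_MEM_AMOUNT and PSTORE_RECORD_SZ from this
commit's diff into MiB. It assumes the values are plain byte counts, which is
what the numbers suggest; the helper name to_mib is made up for illustration.

```shell
# Convert a byte count to whole MiB (to_mib is a hypothetical helper).
to_mib() {
    echo $(( $1 / (1024 * 1024) ))
}

# Old (SteamOS-tuned) and new (generic) values, copied from the diff.
old_mem=$(to_mib 5242880)
old_rec=$(to_mib 2097152)
new_mem=$(to_mib 4194304)
new_rec=$(to_mib 1048576)

echo "old: ${old_mem}M total, ${old_rec}M record"
echo "new: ${new_mem}M total, ${new_rec}M record"
```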

(b) The log storage folder was tuned for Deck, which has an A/B
partitioning scheme and a persistent /home. In general, though, standard
kdump tools "on the market" (like Debian's/Fedora's) use /var for that,
so we follow the trend here.

(c) Grub file location was also special on SteamOS, so let's make
it follow the default /boot/grub/grub.cfg here.

(d) Kdump-specific tunings: the goal for people using kdump (not pstore!)
is usually to collect the vmcore of the panicked kernel and explore it
using tools like crash/drgn. This is not the main goal on SteamOS, where
we want to collect as much info as we can get *on dmesg* and that's it
for most cases...

With that in mind, we needed the "crash_kexec_post_notifiers" parameter
to dump more info on dmesg during a panic (a potentially problematic
parameter on some HW BTW, but tested in depth on Deck) and we disabled
vmcore saving by default as well. So, let's "revert" that here, having
vmcore capturing enabled by default and dropping the post_notifiers
parameter (see next point as well).
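
To illustrate the effect of (d) on the kernel command line, here is a minimal
sketch that checks whether a parameter is present in a command-line string.
The two strings are the old and new GRUB_CMDLINE values from this commit;
has_param is a hypothetical helper, not part of the kdump scripts.

```shell
# has_param CMDLINE PARAM: succeed if PARAM appears as a whole word,
# either bare or in PARAM=value form (hypothetical helper).
has_param() {
    for word in $1; do
        case "$word" in
            "$2"|"$2"=*) return 0 ;;
        esac
    done
    return 1
}

old_cmdline="crashkernel=256M crash_kexec_post_notifiers "
new_cmdline="crashkernel=256M "

if has_param "$new_cmdline" crash_kexec_post_notifiers; then
    echo "post notifiers still requested"
else
    echo "post notifiers dropped"
fi
```

The same helper can be pointed at the running kernel with
has_param "$(cat /proc/cmdline)" crash_kexec_post_notifiers.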

(e) About the sysctls: we are more aggressive about panicking on Deck
(like panic on soft lockups) and the goal is to collect the most info
we can on dmesg, so we needed to enable panic_print to dump tasks and
whatnot on dmesg during a panic event. In the general case, people
who wish to have as much information as possible would go with
kdump, collecting the vmcore, not with pstore collecting just the dmesg.

With that said, "reduce" panic_print here to only show memory info
and CPU backtraces, and disable the soft lockup panic.
(We also cleaned up the file, dropping the comments about the choice
of *not* panicking on hung tasks or RCU stalls.)
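
The panic_print values in (e) are plain bitmasks, so they can be rebuilt from
the bit numbers listed in the sysctl file's comments. A small sketch
(panic_print_mask is a made-up helper name):

```shell
# Build a panic_print value by OR-ing together the given bit positions
# (panic_print_mask is a hypothetical helper).
panic_print_mask() {
    mask=0
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    echo "$mask"
}

# Old setting: tasks (0), memory (1), timers (2), all CPUs backtrace (6).
old=$(panic_print_mask 0 1 2 6)
# New setting: memory (1) and all CPUs backtrace (6) only.
new=$(panic_print_mask 1 6)

echo "old=$old new=$new"   # prints "old=71 new=66"
```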

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
Guilherme G. Piccoli
2023-03-22 12:21:20 -03:00
parent 90b30b6d5d
commit 03e916405f
2 changed files with 9 additions and 24 deletions


@@ -17,8 +17,8 @@
 # PSTORE_MEM_AMOUNT (decimal, in MB) in size. Also, kernel must be able to
 # allocate a contiguous memory amount of PSTORE_RECORD_SZ (decimal, MB).
 USE_PSTORE_RAM=1
-PSTORE_MEM_AMOUNT=5242880
-PSTORE_RECORD_SZ=2097152
+PSTORE_MEM_AMOUNT=4194304
+PSTORE_RECORD_SZ=1048576
 #
 #
 # Mount-related options
@@ -27,7 +27,7 @@ PSTORE_RECORD_SZ=2097152
 # be stored, as well as the kdump initrd and some ancillary data. This
 # directory should be in an accessible filesystem (read/write) and if such
 # folder doesn't exist, it'll be created.
-MOUNT_FOLDER="/home/.steamos/offload/var/kdump"
+MOUNT_FOLDER="/var/crash/kdump"
 #
 #
 # Kdump controlling settings
@@ -41,7 +41,7 @@ MOUNT_FOLDER="/home/.steamos/offload/var/kdump"
 # the most important parameters are nr_cpus=1 (to save RAM memory usage and
 # avoid some potential issues with SMP) and reset_devices (some drivers
 # rely on that for proper kdump).
-FULL_COREDUMP=0
+FULL_COREDUMP=1
 MAKEDUMPFILE_COREDUMP_CMD="-z -d 31"
 MAKEDUMPFILE_DMESG_CMD="--dump-dmesg"
 KDUMP_APPEND_CMDLINE="panic=-1 oops=panic fsck.mode=force fsck.repair=yes nr_cpus=1 reset_devices"
@@ -59,6 +59,6 @@ KDUMP_APPEND_CMDLINE="panic=-1 oops=panic fsck.mode=force fsck.repair=yes nr_cpu
 # (notice that a trailing space is required in this line, so we avoid
 # messing with other kernel parameters).
 GRUB_AUTOSET=1
-GRUB_BOOT_FILE="/efi/EFI/steamos/grub.cfg"
+GRUB_BOOT_FILE="/boot/grub/grub.cfg"
 GRUB_CFG_FILE="/etc/default/grub"
-GRUB_CMDLINE="crashkernel=256M crash_kexec_post_notifiers "
+GRUB_CMDLINE="crashkernel=256M "


@@ -5,31 +5,16 @@
 #
 # This file sets the sysctl parameters that are used by the
 # kdump package, in order to panic and reboot on severe events,
-# like oops or soft/hard lockups.
+# like oops or hard lockups.
 # We also set panic_print in order to collect more info.
 kernel.panic_on_oops = 1
-kernel.softlockup_panic = 1
 kernel.hardlockup_panic = 1
 # reboot as soon as possible after a panic event.
 kernel.panic = -1
 # dump more information when facing a panic event:
-# bit 0 - print all tasks info
 # bit 1 - print system memory info
-# bit 2 - print timer info
-# bit 6 - print all CPUs backtrace (currently on linux-next)
-kernel.panic_print = 71
-# Currently disabled, since we might get stuck in some
-# I/O operation and it won't be great panicking...
-kernel.hung_task_panic = 0
-# A bit risky to panic on that, might cause undesirable panics
-# due to small stalls. A trade-off is if we indeed have a severe
-# bug causing a (long) RCU stall, we'd not panic and have it
-# reported. But seems the risk is bigger...
-kernel.panic_on_rcu_stall = 0
+# bit 6 - print all CPUs backtrace
+kernel.panic_print = 66