submit_report.sh, kdump.etc: Add log submission mechanism
Finally, we have a functional mechanism to upload the crash logs to Valve servers (special thanks to TonyP for the API help). Documentation is present in the README.md as usual.

NOTE: it's worth reinforcing here what was already mentioned in the README: kdump-steamos doesn't perform any significant validation against malicious usage of the log submission mechanism, like DDoS, submitting a malicious ZIP binary, a very large file, etc. All of this is expected to be handled on the server side.

Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com>
110
README.md
@@ -21,30 +21,33 @@
#
#
# 1. Install the package with pacman if not available in your image - there's
# a pre-built binary package in this gitlab; to check if it's already installed
# look the pacman installed package list. Also, be sure the systemd service was
# properly loaded by checking 'systemctl status kdump-steamos.service'.
# a prebuilt binary package in this gitlab; to check if it's already installed
# look the pacman installed package list. Also, be sure the systemd service
# was properly loaded by checking 'systemctl status kdump-steamos.service'.
#
# 2. Only the dmesg is collected, and by default this happens via the Pstore
# mechanism, i.e., no extra memory should be reserved and no GRUB change is
# required. If 'lsmod' shows "ramoops", then Pstore is in use.
# 2. In a crash event, the dmesg log is collected, and by default this happens
# via the Pstore mechanism, i.e., no extra memory should be reserved and no
# GRUB change is required. If 'lsmod' shows "ramoops", then Pstore is in use.
# Besides the dmesg with some extra information (like tasks running, memory
# usage on crash, etc), more logs are collected like the image build version,
# running kernel version and dmidecode.
#
# 3. The logs are stored in a ZIP file at "/home/.steamos/offload/var/kdump/";
# besides the dmesg with some extra information, the image build version,
# running kernel version and dmidecode are stored in this ZIP file as well.
# This file is named as: "steamos-SERIAL-STEAM_USER.timestamp.zip", where
# SERIAL is the machine serial (from dmidecode), STEAM_USER is the Steam
# account name (based on the last logged Steam user) and timestamp tz is UTC.
# if this ZIP file was successfully submitted to Valve servers, this file is
# then moved into the sub-folder "sent_logs/". This file is named as:
# "steamos-SERIAL-STEAM_USER.timestamp.zip", where SERIAL is the machine
# serial (from dmidecode), STEAM_USER is the Steam account name (based on the
# last logged Steam user) and timestamp tz is UTC.
#
# 4. (IMPORTANT) Please, test the infrastructure in order to see if a dummy
# crash log is collected before using it to try debugging complex issues.
# In order to do that, login to a shell and execute, as root user:
# 'echo 1 > /proc/sys/kernel/sysrq ; echo c > /proc/sysrq-trigger'
#
# This action will trigger a dummy crash and reboot the system; check if there
# is a ZIP file with the crash logs in the directory described in (3).
# This action will trigger a dummy crash and reboot the system; check if
# there is a ZIP file with the crash logs in the directory described in (3).
#
# 5. Some tunnings are available at "/etc/default/kdump"; for example users
# 5. Some tunings are available at "/etc/default/kdump"; for example users
# can choose Kdump instead of Pstore (USE_PSTORE_RAM), and if using Kdump,
# collect the full vmcore (FULL_COREDUMP). The vmcore is not stored in the
# ZIP file, but it's saved in "/home/.steamos/offload/var/kdump/crash/".
@@ -52,9 +55,13 @@
# needed in GRUB cmdline: "crashkernel=192M crash_kexec_post_notifiers" and
# a regular reboot is necessary.
#
# 6. Error and succeeding messages are sent to systemd journal, so running
# 'journalctl | grep kdump' would hopefully bring some information. Also,
# the ZIP file collected is automatically submitted to Valve servers; see
# below under DETAILS/LOG SUBMISSION for API details, decisions made, etc.
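
After the test crash from item (4), a small helper can show which ZIP files are still pending and which were already moved to "sent_logs/". This is only a sketch: `list_logs` and the `KDUMP_DIR` override are names made up for illustration and are not part of kdump-steamos; the directory and the "steamos-" file prefix are taken from item (3).

```shell
#!/bin/sh
# KDUMP_DIR defaults to the folder named in item (3); the override is a
# hypothetical convenience for inspecting another location.
KDUMP_DIR="${KDUMP_DIR:-/home/.steamos/offload/var/kdump}"

list_logs() {
    # $1 = directory to scan; prints one crash ZIP path per line
    for f in "$1"/steamos-*.zip; do
        # If nothing matches, the glob stays literal and is skipped here
        [ -e "$f" ] && printf '%s\n' "$f"
    done
    return 0
}

echo "Pending:"; list_logs "$KDUMP_DIR"
echo "Sent:";    list_logs "$KDUMP_DIR/sent_logs"
```

Running 'journalctl | grep kdump' alongside it, as item (6) suggests, should show whether the collection or submission steps reported an error.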
#
#
# ############################## DETAILS ##################################
#
# CAVEATS / INSTRUCTIONS
# ###########################################################################
# (a) Currently, we don't automatically edit GRUB config; see TODO (1) below.
@@ -91,9 +98,8 @@
# (1) We'd like to be able to automatically edit GRUB and recreate its config
# file - implementation tests are ongoing.
#
# (2) The log submission mechanism is incomplete - we save the logs as a local
# ZIP file (as discussed in the HOW-TO), but they aren't submitted to a remote
# Valve server. There's an API in-place, so the implementation is starting.
# (2) Would be interesting to have a clean-up mechanism, to keep up to N most
# recent ZIP log files, instead of keeping all of them forever.
#
# (3) Hopefully we can fix/prevent the unnecessary re-creation of all initramfs
# images - it happens due to our package installing files on directory
@@ -101,8 +107,10 @@
#
# (4) We have a "fragile" way of determining a mount point required for Kdump;
# this is something to improve maybe, in order to make the Kdump more reliable.
# Also in the list of fragile things, VDF parsing is...complicated. Something
# that would be nice to improve as well.
#
# (5) Pstore ramoops backend has some limitations that we're discussing with
# (5) Pstore ramoops back-end has some limitations that we're discussing with
# the kernel community - right now we can only collect ONE dmesg and its
# size is truncated on "record_size" bytes, not allowing a file split like
# efi-pstore; thankfully we still can collect 2MiB dmesg, but hopefully we can
@@ -116,4 +124,68 @@
# specified kernel, not only for the running one (which is what we do now).
# Low-priority idea, easy to implement.
#
#
# LOG SUBMISSION
# ###########################################################################
# The logs collected and compressed in the ZIP file are kept in the system,
# but they also provide valuable data for Valve to identify issues in the
# field, and hopefully fix them, so users are happy. Hence, kdump-steamos
# is now capable of submitting logs to Valve servers through an API. This
# API is described below, but first it's worth mentioning some assumptions /
# decisions made in the log submission mechanism:
#
#  * First of all, we attempt to verify network connectivity by pinging the
#    host "steampowered.com" - quick pings (2 packets, 0.5s between each one)
#    are attempted; if the network is still considered unreliable after 99
#    such attempts, the log submission is aborted, but the ZIP file is of
#    course kept locally.
#
#  * The 'curl' tool is used to submit the requests to Valve servers; for
#    that, some temporary files named ".curl_XXX" are saved in the kdump
#    folder - mentioned in the point (3) above. These files are deleted
#    if the log submission mechanism works fine, or else they're currently
#    kept for debug purposes, along with a new ".curl_err" file.
#
#  * It is assumed that any throttling / anti-DoS mechanism comes from the
#    server portion, so the kdump-steamos doesn't perform any significant
# validations with this respect, only basic correctness validations.
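
The connectivity check from the first bullet can be sketched as a bounded retry loop. This is an illustration under assumptions, not the code in submit_report.sh: the function name `wait_for_network` and the `PING_CMD` override are hypothetical, and the flags assume the iputils 'ping' ('-c' for the packet count, '-i' for the interval in seconds, '-q' for quiet output).

```shell
#!/bin/sh
# PING_CMD is a hypothetical override so the loop can be exercised offline.
PING_CMD="${PING_CMD:-ping}"

wait_for_network() {
    # $1 = host to probe, $2 = maximum number of attempts
    host="$1"; max="$2"; try=0
    while [ "$try" -lt "$max" ]; do
        # Two packets, half a second apart; success means the host answered
        if "$PING_CMD" -q -c 2 -i 0.5 "$host" >/dev/null 2>&1; then
            return 0
        fi
        try=$((try + 1))
    done
    # Every attempt failed: the caller should abort and keep the ZIP locally
    return 1
}
```

A caller would run 'wait_for_network steampowered.com 99' and, on a non-zero return, skip the submission while leaving the ZIP file in place.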
#
#
#  => The API details: it works by a first POST request to Valve servers,
#  which, on success, returns 3 main components in the response. We use
#  these values to perform a PUT request with the ZIP compressed file, and
#  finally a last POST request is necessary to finish the transaction. The
#  POST requests' URL is present in "/etc/default/kdump".
#  Below, the specific format of such requests:
#
#  The first POST takes the following fields:
#
#  steamid = user Steam ID, based on the latest Steam logged user;
#  have_dump_file = 0/1 - should be 1 when sending a ZIP file;
#  dump_file_size = the ZIP file size, in bytes;
#  product = "holo" (hard-coded for now);
#  build = the SteamOS build ID, from '/etc/os-release' file;
#  version = running kernel version;
#  platform = "linux" (hard-coded for now);
#  crash_time = the timestamp (epoch) of log collection/submission;
#  stack = a really concise call trace summary, only functions/addrs;
#  note = summary of the dmesg crash info, specifically a full stack trace;
# format = "json" (hard-coded for now).
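
The field list above could be turned into a 'curl' call roughly as follows. This is a sketch, not the actual submit_report.sh code: `start_upload`, `START_URL` and the `CURL_CMD` override are hypothetical names, the URL is a placeholder (the real one is read from "/etc/default/kdump"), and sending the fields as a URL-encoded form body is an assumption.

```shell
#!/bin/sh
# Placeholder URL; the real POST URL lives in /etc/default/kdump.
START_URL="${START_URL:-https://example.invalid/start}"
# Hypothetical override that allows a dry run without touching the network.
CURL_CMD="${CURL_CMD:-curl}"

start_upload() {
    # $1 = Steam ID, $2 = ZIP path, $3 = build ID, $4 = stack summary, $5 = note
    zip_size="$(wc -c < "$2" | tr -d ' ')"
    # --data-urlencode makes curl issue a POST with a form-encoded body
    "$CURL_CMD" -s \
        --data-urlencode "steamid=$1" \
        --data-urlencode "have_dump_file=1" \
        --data-urlencode "dump_file_size=$zip_size" \
        --data-urlencode "product=holo" \
        --data-urlencode "build=$3" \
        --data-urlencode "version=$(uname -r)" \
        --data-urlencode "platform=linux" \
        --data-urlencode "crash_time=$(date +%s)" \
        --data-urlencode "stack=$4" \
        --data-urlencode "note=$5" \
        --data-urlencode "format=json" \
        "$START_URL"
}
```

Using '--data-urlencode' for every field keeps multi-line values such as the stack trace in "note" intact without manual escaping.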
#
#  The response to a successful POST has multiple fields, which can be
#  split into 3 categories:
#
#  PUT_URL = a new URL to be used in the PUT request;
#  GID = special ID used to finish the submission process in the next POST;
#  header name/value pairs = multiple pairs of name/value fields used as
# headers in the PUT request.
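
Extracting the three categories with 'jq' might look like the sketch below. The JSON paths (".response.url", ".response.gid", ".response.headers.pairs[]") are assumptions about the response layout, since this README doesn't spell it out, and the function names are hypothetical. Each function reads the response from standard input.

```shell
#!/bin/sh
# All JSON paths below are assumed, not confirmed by the README.

parse_put_url() { jq -r '.response.url'; }

parse_gid() { jq -r '.response.gid'; }

parse_headers() {
    # Emit one "Name: value" line per pair, ready to be passed to 'curl -H'
    jq -r '.response.headers.pairs[] | "\(.name): \(.value)"'
}
```

For example, with the response saved in a file, 'parse_gid < response.json' would print the GID, and 'parse_headers < response.json' would print one header per line.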
#
#  After parsing the response, we perform a PUT request to the PUT_URL, with
#  the ZIP file as a "--data-binary" component and the additional headers that
#  were collected in the first POST's response. Finally, we just POST the GID
# to the finish URL ("gid=GID_NUM") and the process is terminated.
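
The PUT and the finishing POST can be sketched like this. The names `put_zip`, `finish_upload`, `FINISH_URL` and `CURL_CMD` are hypothetical, the URL is a placeholder, and the headers are assumed to arrive as newline-separated "Name: value" lines; only "--data-binary" and the "gid=GID_NUM" field come from the text above.

```shell
#!/bin/sh
# Placeholder URL; the real finish URL lives in /etc/default/kdump.
FINISH_URL="${FINISH_URL:-https://example.invalid/finish}"
# Hypothetical override that allows a dry run without touching the network.
CURL_CMD="${CURL_CMD:-curl}"

put_zip() {
    # $1 = PUT_URL, $2 = ZIP path, $3 = newline-separated "Name: value" headers
    put_url="$1"; zip_path="$2"; headers="$3"
    # Rebuild the positional parameters as the curl argument list
    set -- -s -X PUT --data-binary "@$zip_path"
    while IFS= read -r hdr; do
        [ -n "$hdr" ] && set -- "$@" -H "$hdr"
    done <<EOF
$headers
EOF
    "$CURL_CMD" "$@" "$put_url"
}

finish_upload() {
    # $1 = GID returned by the first POST
    "$CURL_CMD" -s --data-urlencode "gid=$1" "$FINISH_URL"
}
```

Building the argument list with 'set --' keeps each header a single argument even when its value contains spaces, which plain string concatenation would not guarantee.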
#
#  Notice we heavily use the 'jq' tool to parse the JSON response, so we
#  assume the response uses this format and that it doesn't change over time.
#
```