From 55b453ed53d28da0b8d2316741e11eb22ceb39d6 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Onur=20=C3=96zkan?= Date: Wed, 17 Sep 2025 20:37:25 +0300 Subject: checkpatch: document new check PLACEHOLDER_USE MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Adds documentation for the new check PLACEHOLDER_USE in checkpatch. Link: https://lkml.kernel.org/r/20250917173725.22547-3-work@onurozkan.dev Signed-off-by: Onur Özkan Acked-by: Joe Perches Cc: Andy Whitcroft Cc: Dwaipayan Ray Cc: Jonathan Corbet Cc: Lukas Bulwahn Signed-off-by: Andrew Morton --- Documentation/dev-tools/checkpatch.rst | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/dev-tools/checkpatch.rst b/Documentation/dev-tools/checkpatch.rst index d5c47e560324..4dccd1036870 100644 --- a/Documentation/dev-tools/checkpatch.rst +++ b/Documentation/dev-tools/checkpatch.rst @@ -1245,6 +1245,16 @@ Others The patch file does not appear to be in unified-diff format. Please regenerate the patch file before sending it to the maintainer. + **PLACEHOLDER_USE** + Detects unhandled placeholder text left in cover letters or commit headers/logs. + Common placeholders include lines like:: + + *** SUBJECT HERE *** + *** BLURB HERE *** + + These typically come from autogenerated templates. Replace them with a proper + subject and description before sending. + **PRINTF_0XDECIMAL** Prefixing 0x with decimal output is defective and should be corrected. -- cgit v1.2.3 From 9544f9e6947f6508d29f0d0cc2dacaa749fc1613 Mon Sep 17 00:00:00 2001 From: Li RongQing Date: Wed, 15 Oct 2025 14:36:15 +0800 Subject: hung_task: panic when there are more than N hung tasks at the same time The hung_task_panic sysctl is currently a blunt instrument: it's all or nothing. Panicking on a single hung task can be an overreaction to a transient glitch. A more reliable indicator of a systemic problem is when multiple tasks hang simultaneously. Extend hung_task_panic to accept an integer threshold, allowing the kernel to panic only when N hung tasks are detected in a single scan. This provides finer control to distinguish between isolated incidents and system-wide failures. The accepted values are: - 0: Don't panic (unchanged) - 1: Panic on the first hung task (unchanged) - N > 1: Panic after N hung tasks are detected in a single scan The original behavior is preserved for values 0 and 1, maintaining full backward compatibility. [lance.yang@linux.dev: new changelog] Link: https://lkml.kernel.org/r/20251015063615.2632-1-lirongqing@baidu.com Signed-off-by: Li RongQing Reviewed-by: Masami Hiramatsu (Google) Reviewed-by: Lance Yang Tested-by: Lance Yang Acked-by: Andrew Jeffery [aspeed_g5_defconfig] Cc: Anshuman Khandual Cc: Arnd Bergmann Cc: David Hildenbrand Cc: Florian Wesphal Cc: Jakub Kacinski Cc: Jason A. Donenfeld Cc: Joel Granados Cc: Joel Stanley Cc: Jonathan Corbet Cc: Kees Cook Cc: Liam Howlett Cc: Lorenzo Stoakes Cc: "Paul E . McKenney" Cc: Pawan Gupta Cc: Petr Mladek Cc: Phil Auld Cc: Randy Dunlap Cc: Russell King Cc: Shuah Khan Cc: Simon Horman Cc: Stanislav Fomichev Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/kernel-parameters.txt | 20 +++++++++++++------- Documentation/admin-guide/sysctl/kernel.rst | 9 +++++---- 2 files changed, 18 insertions(+), 11 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index 6c42061ca20e..b8f8f5d74093 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -2010,14 +2010,20 @@ the added memory block itself do not be affected. hung_task_panic= - [KNL] Should the hung task detector generate panics. - Format: 0 | 1 + [KNL] Number of hung tasks to trigger kernel panic. + Format: + + When set to a non-zero value, a kernel panic will be triggered if + the number of detected hung tasks reaches this value. + + 0: don't panic + 1: panic immediately on first hung task + N: panic after N hung tasks are detected in a single scan - A value of 1 instructs the kernel to panic when a - hung task is detected. The default value is controlled - by the CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time - option. The value selected by this boot parameter can - be changed later by the kernel.hung_task_panic sysctl. + The default value is controlled by the + CONFIG_BOOTPARAM_HUNG_TASK_PANIC build-time option. The value + selected by this boot parameter can be changed later by the + kernel.hung_task_panic sysctl. hvc_iucv= [S390] Number of z/VM IUCV hypervisor console (HVC) terminal devices. Valid values: 0..8 diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index f3ee807b5d8b..0065a55bc09e 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -397,13 +397,14 @@ a hung task is detected. hung_task_panic =============== -Controls the kernel's behavior when a hung task is detected. +When set to a non-zero value, a kernel panic will be triggered if the +number of hung tasks found during a single scan reaches this value. This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. -= ================================================= += ======================================================= 0 Continue operation. This is the default behavior. -1 Panic immediately. -= ================================================= +N Panic when N hung tasks are found during a single scan. += ======================================================= hung_task_check_count -- cgit v1.2.3 From 6c2e6e2c1af1809d1d9cdbd50ac80f54f5995bdb Mon Sep 17 00:00:00 2001 From: Ye Bin Date: Sat, 25 Oct 2025 16:00:03 +0800 Subject: dynamic_debug: add support for print stack In practical problem diagnosis, especially during the boot phase, it is often desirable to know the call sequence. However, currently, apart from adding print statements and recompiling the kernel, there seems to be no good alternative. If dynamic_debug supported printing the call stack, it would be very helpful for diagnosing issues. This patch add support '+d' for dump stack. Link: https://lkml.kernel.org/r/20251025080003.312536-1-yebin@huaweicloud.com Signed-off-by: Ye Bin Cc: Jason Baron Cc: Jim Cromie Signed-off-by: Andrew Morton --- Documentation/admin-guide/dynamic-debug-howto.rst | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/dynamic-debug-howto.rst b/Documentation/admin-guide/dynamic-debug-howto.rst index 7c036590cd07..095a63892257 100644 --- a/Documentation/admin-guide/dynamic-debug-howto.rst +++ b/Documentation/admin-guide/dynamic-debug-howto.rst @@ -223,12 +223,13 @@ The flags are:: f Include the function name s Include the source file name l Include line number + d Include call trace For ``print_hex_dump_debug()`` and ``print_hex_dump_bytes()``, only the ``p`` flag has meaning, other flags are ignored. -Note the regexp ``^[-+=][fslmpt_]+$`` matches a flags specification. -To clear all flags at once, use ``=_`` or ``-fslmpt``. +Note the regexp ``^[-+=][fslmptd_]+$`` matches a flags specification. +To clear all flags at once, use ``=_`` or ``-fslmptd``. Debug messages during Boot Process -- cgit v1.2.3 From 5f264c00b669b934300dff506d0aa9f6f8f8c53e Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:36 +0800 Subject: docs: panic: correct some sys_ifo names in sysctl doc Patch series "Enable hung_task and lockup cases to dump system info on demand", v2. When working on kernel stability issues: panic, task-hung and soft/hard lockup are frequently met. And to debug them, user may need lots of system information at that time, like task call stacks, lock info, memory info, ftrace dump, etc. panic case already uses sys_info() for this purpose, and has a 'panic_sys_info' sysctl(also support cmdline setup) interface to take human readable string like "tasks,mem,timers,locks,ftrace,..." to control what kinds of information is needed. Which is also helpful to debug task-hung and lockup cases. So this patchset introduces the similar sys_info sysctl interface for task-hung and lockup cases. his is mainly for debugging and the info dumping could be intrusive, like dumping call stack for all tasks when system has huge number of tasks, similarly for ftrace dump (we may add tracing_stop() and tracing_start() around it) Locally these have been used in our bug chasing for stability issues and were helpful. As Andrew suggested, add a configurable global 'kernel_sys_info' knob. When error scenarios like panic/hung-task/lockup etc doesn't setup their own sys_info knob and calls sys_info() with parameter "0", this global knob will take effect. It could be used for other kernel cases like OOM, which may not need one dedicated sys_info knob. This patch (of 4): Some sys_info names wered forgotten to change in patch iterations, while the right names are defined in kernel/sys_info.c. Link: https://lkml.kernel.org/r/20251113111039.22701-1-feng.tang@linux.alibaba.com Link: https://lkml.kernel.org/r/20251113111039.22701-2-feng.tang@linux.alibaba.com Fixes: d747755917bf ("panic: add 'panic_sys_info' sysctl to take human readable string parameter") Signed-off-by: Feng Tang Reviewed-by: Petr Mladek Cc: Jonathan Corbet Cc: Lance Yang Cc: "Paul E . McKenney" Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'Documentation') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 0065a55bc09e..a397eeccaea7 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -911,8 +911,8 @@ to 'panic_print'. Possible values are: ============= =================================================== tasks print all tasks info mem print system memory info -timer print timers info -lock print locks info if CONFIG_LOCKDEP is on +timers print timers info +locks print locks info if CONFIG_LOCKDEP is on ftrace print ftrace buffer all_bt print all CPUs backtrace (if available in the arch) blocked_tasks print only tasks in uninterruptible (blocked) state -- cgit v1.2.3 From 8b2b9b4f6f4f7a61b7e323479ed7d9faa21d6287 Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:37 +0800 Subject: hung_task: add hung_task_sys_info sysctl to dump sys info on task-hung When task-hung happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'hung_task_sys_info' sysctl knob to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when task-hung happens, all requested information will be dumped. (refer kernel/sys_info.c for more details). Meanwhile, the newly introduced sys_info() call is used to unify some existing info-dumping knobs. [feng.tang@linux.alibaba.com: maintain consistecy established behavior, per Lance and Petr] Link: https://lkml.kernel.org/r/aRncJo1mA5Zk77Hr@U-2FWC9VHC-2323.local Link: https://lkml.kernel.org/r/20251113111039.22701-3-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang Suggested-by: Petr Mladek Reviewed-by: Petr Mladek Reviewed-by: Lance Yang Cc: Jonathan Corbet Cc: "Paul E . McKenney" Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index a397eeccaea7..45b4408dad31 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -422,6 +422,11 @@ the system boot. This file shows up if ``CONFIG_DETECT_HUNG_TASK`` is enabled. +hung_task_sys_info +================== +A comma separated list of extra system information to be dumped when +hung task is detected, for example, "tasks,mem,timers,locks,...". +Refer 'panic_sys_info' section below for more details. hung_task_timeout_secs ====================== -- cgit v1.2.3 From a9af76a78760717361cccc884dc649e30db61c8b Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:38 +0800 Subject: watchdog: add sys_info sysctls to dump sys info on system lockup When soft/hard lockup happens, developers may need different kinds of system information (call-stacks, memory info, locks, etc.) to help debugging. Add 'softlockup_sys_info' and 'hardlockup_sys_info' sysctl knobs to take human readable string like "tasks,mem,timers,locks,ftrace,...", and when system lockup happens, all requested information will be printed out. (refer kernel/sys_info.c for more details). Link: https://lkml.kernel.org/r/20251113111039.22701-4-feng.tang@linux.alibaba.com Signed-off-by: Feng Tang Reviewed-by: Petr Mladek Cc: Jonathan Corbet Cc: Lance Yang Cc: "Paul E . McKenney" Cc: Petr Mladek Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 45b4408dad31..176520283f1a 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -582,6 +582,11 @@ if leaking kernel pointer values to unprivileged users is a concern. When ``kptr_restrict`` is set to 2, kernel pointers printed using %pK will be replaced with 0s regardless of privileges. +softlockup_sys_info & hardlockup_sys_info +========================================= +A comma separated list of extra system information to be dumped when +soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...". +Refer 'panic_sys_info' section below for more details. modprobe ======== -- cgit v1.2.3 From 03ef32d665e8a23d7ce5965b8b035666cfb47866 Mon Sep 17 00:00:00 2001 From: Feng Tang Date: Thu, 13 Nov 2025 19:10:39 +0800 Subject: sys_info: add a default kernel sys_info mask Which serves as a global default sys_info mask. When users want the same system information for many error cases (panic, hung, lockup ...), they can chose to set this global knob only once, while not setting up each individual sys_info knobs. This just adds a 'lazy' option, and doesn't change existing kernel behavior as the mask is 0 by default. Link: https://lkml.kernel.org/r/20251113111039.22701-5-feng.tang@linux.alibaba.com Suggested-by: Andrew Morton Signed-off-by: Feng Tang Cc: Jonathan Corbet Cc: Lance Yang Cc: "Paul E . McKenney" Cc: Petr Mladek Cc: Steven Rostedt Signed-off-by: Andrew Morton --- Documentation/admin-guide/sysctl/kernel.rst | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'Documentation') diff --git a/Documentation/admin-guide/sysctl/kernel.rst b/Documentation/admin-guide/sysctl/kernel.rst index 176520283f1a..239da22c4e28 100644 --- a/Documentation/admin-guide/sysctl/kernel.rst +++ b/Documentation/admin-guide/sysctl/kernel.rst @@ -521,6 +521,15 @@ default), only processes with the CAP_SYS_ADMIN capability may create io_uring instances. +kernel_sys_info +=============== +A comma separated list of extra system information to be dumped when +soft/hard lockup is detected, for example, "tasks,mem,timers,locks,...". +Refer 'panic_sys_info' section below for more details. + +It serves as the default kernel control knob, which will take effect +when a kernel module calls sys_info() with parameter==0. + kexec_load_disabled =================== -- cgit v1.2.3 From fdd76c8d6327e616ee61ddd00db16753d722168c Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Mon, 17 Nov 2025 09:21:53 +0530 Subject: Documentation/ABI: add kexec and kdump sysfs interface Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size Link: https://lkml.kernel.org/r/20251117035153.1199665-1-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Cc: Aditya Gupta Cc: Baoquan he Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Signed-off-by: Andrew Morton --- Documentation/ABI/testing/sysfs-kernel-kexec-kdump | 43 ++++++++++++++++++++++ 1 file changed, 43 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..96b24565b68e --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,43 @@ +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec@lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec@lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec@lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec@lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools -- cgit v1.2.3 From aa0145563ce26a5f5a1154e9f26a2f8c21eee2ca Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Tue, 18 Nov 2025 12:40:23 +0530 Subject: crash: export crashkernel CMA reservation to userspace Add a sysfs entry /sys/kernel/kexec_crash_cma_ranges to expose all CMA crashkernel ranges. This allows userspace tools configuring kdump to determine how much memory is reserved for crashkernel. If CMA is used, tools can warn users when attempting to capture user pages with CMA reservation. The new sysfs hold the CMA ranges in below format: cat /sys/kernel/kexec_crash_cma_ranges 100000000-10c7fffff The reason for not including Crash CMA Ranges in /proc/iomem is to avoid conflicts. It has been observed that contiguous memory ranges are sometimes shown as two separate System RAM entries in /proc/iomem. If a CMA range overlaps two System RAM ranges, adding crashk_res to /proc/iomem can create a conflict. Reference [1] describes one such instance on the PowerPC architecture. Link: https://lkml.kernel.org/r/20251118071023.1673329-1-sourabhjain@linux.ibm.com Link: https://lore.kernel.org/all/20251016142831.144515-1-sourabhjain@linux.ibm.com/ [1] Signed-off-by: Sourabh Jain Acked-by: Baoquan He Cc: Aditya Gupta Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Signed-off-by: Andrew Morton --- Documentation/ABI/testing/sysfs-kernel-kexec-kdump | 10 ++++++++++ 1 file changed, 10 insertions(+) (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump index 96b24565b68e..f6089e38de5f 100644 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -41,3 +41,13 @@ Description: read only is used by the user space utility kexec to support updating the in-kernel kdump image during hotplug operations. User: Kexec tools + +What: /sys/kernel/kexec_crash_cma_ranges +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. +User: kdump service -- cgit v1.2.3 From 48a1b2321d763b5edeaf20bd4576d8c4b5df772b Mon Sep 17 00:00:00 2001 From: Pasha Tatashin Date: Sat, 1 Nov 2025 10:23:23 -0400 Subject: liveupdate: kho: move to kernel/liveupdate Move KHO to kernel/liveupdate/ in preparation of placing all Live Update core kernel related files to the same place. [pasha.tatashin@soleen.com: disable the menu when DEFERRED_STRUCT_PAGE_INIT] Link: https://lkml.kernel.org/r/CA+CK2bAvh9Oa2SLfsbJ8zztpEjrgr_hr-uGgF1coy8yoibT39A@mail.gmail.com Link: https://lkml.kernel.org/r/20251101142325.1326536-8-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin Reviewed-by: Jason Gunthorpe Reviewed-by: Mike Rapoport (Microsoft) Cc: Alexander Graf Cc: Changyuan Lyu Cc: Christian Brauner Cc: Jason Gunthorpe Cc: Jonathan Corbet Cc: Masahiro Yamada Cc: Miguel Ojeda Cc: Pratyush Yadav Cc: Randy Dunlap Cc: Simon Horman Cc: Tejun Heo Cc: Zhu Yanjun Signed-off-by: Andrew Morton --- Documentation/core-api/kho/concepts.rst | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'Documentation') diff --git a/Documentation/core-api/kho/concepts.rst b/Documentation/core-api/kho/concepts.rst index 36d5c05cfb30..d626d1dbd678 100644 --- a/Documentation/core-api/kho/concepts.rst +++ b/Documentation/core-api/kho/concepts.rst @@ -70,5 +70,5 @@ in the FDT. That state is called the KHO finalization phase. Public API ========== -.. kernel-doc:: kernel/kexec_handover.c +.. kernel-doc:: kernel/liveupdate/kexec_handover.c :export: -- cgit v1.2.3 From 9e2fd062fa1713a33380cc97ef324d086dd45ba5 Mon Sep 17 00:00:00 2001 From: Pasha Tatashin Date: Tue, 25 Nov 2025 11:58:31 -0500 Subject: liveupdate: luo_core: Live Update Orchestrator MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Patch series "Live Update Orchestrator", v8. This series introduces the Live Update Orchestrator, a kernel subsystem designed to facilitate live kernel updates using a kexec-based reboot. This capability is critical for cloud environments, allowing hypervisors to be updated with minimal downtime for running virtual machines. LUO achieves this by preserving the state of selected resources, such as memory, devices and their dependencies, across the kernel transition. As a key feature, this series includes support for preserving memfd file descriptors, which allows critical in-memory data, such as guest RAM or any other large memory region, to be maintained in RAM across the kexec reboot. The other series that use LUO, are VFIO [1], IOMMU [2], and PCI [3] preservations. Github repo of this series [4]. The core of LUO is a framework for managing the lifecycle of preserved resources through a userspace-driven interface. Key features include: - Session Management Userspace agent (i.e. luod [5]) creates named sessions, each represented by a file descriptor (via centralized agent that controls /dev/liveupdate). The lifecycle of all preserved resources within a session is tied to this FD, ensuring automatic kernel cleanup if the controlling userspace agent crashes or exits unexpectedly. - File Preservation A handler-based framework allows specific file types (demonstrated here with memfd) to be preserved. Handlers manage the serialization, restoration, and lifecycle of their specific file types. - File-Lifecycle-Bound State A new mechanism for managing shared global state whose lifecycle is tied to the preservation of one or more files. This is crucial for subsystems like IOMMU or HugeTLB, where multiple file descriptors may depend on a single, shared underlying resource that must be preserved only once. - KHO Integration LUO drives the Kexec Handover framework programmatically to pass its serialized metadata to the next kernel. The LUO state is finalized and added to the kexec image just before the reboot is triggered. In the future this step will also be removed once stateless KHO is merged [6]. - Userspace Interface Control is provided via ioctl commands on /dev/liveupdate for creating and retrieving sessions, as well as on session file descriptors for managing individual files. - Testing The series includes a set of selftests, including userspace API validation, kexec-based lifecycle tests for various session and file scenarios, and a new in-kernel test module to validate the FLB logic. Introduce LUO, a mechanism intended to facilitate kernel updates while keeping designated devices operational across the transition (e.g., via kexec). The primary use case is updating hypervisors with minimal disruption to running virtual machines. For userspace side of hypervisor update we have copyless migration. LUO is for updating the kernel. This initial patch lays the groundwork for the LUO subsystem. Further functionality, including the implementation of state transition logic, integration with KHO, and hooks for subsystems and file descriptors, will be added in subsequent patches. Create a character device at /dev/liveupdate. A new uAPI header, , will define the necessary structures. The magic number for IOCTL is registered in Documentation/userspace-api/ioctl/ioctl-number.rst. Link: https://lkml.kernel.org/r/20251125165850.3389713-1-pasha.tatashin@soleen.com Link: https://lkml.kernel.org/r/20251125165850.3389713-2-pasha.tatashin@soleen.com Link: https://lore.kernel.org/all/20251018000713.677779-1-vipinsh@google.com/ [1] Link: https://lore.kernel.org/linux-iommu/20250928190624.3735830-1-skhawaja@google.com [2] Link: https://lore.kernel.org/linux-pci/20250916-luo-pci-v2-0-c494053c3c08@kernel.org [3] Link: https://github.com/googleprodkernel/linux-liveupdate/tree/luo/v8 [4] Link: https://tinyurl.com/luoddesign [5] Link: https://lore.kernel.org/all/20251020100306.2709352-1-jasonmiu@google.com [6] Link: https://lore.kernel.org/all/20251115233409.768044-1-pasha.tatashin@soleen.com [7] Link: https://github.com/soleen/linux/blob/luo/v8b03/diff.v7.v8 [8] Signed-off-by: Pasha Tatashin Reviewed-by: Pratyush Yadav Reviewed-by: Mike Rapoport (Microsoft) Tested-by: David Matlack Cc: Aleksander Lobakin Cc: Alexander Graf Cc: Alice Ryhl Cc: Andriy Shevchenko Cc: anish kumar Cc: Anna Schumaker Cc: Bartosz Golaszewski Cc: Bjorn Helgaas Cc: Borislav Betkov Cc: Chanwoo Choi Cc: Chen Ridong Cc: Chris Li Cc: Christian Brauner Cc: Daniel Wagner Cc: Danilo Krummrich Cc: Dan Williams Cc: David Hildenbrand Cc: David Jeffery Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Guixin Liu Cc: "H. Peter Anvin" Cc: Hugh Dickins Cc: Ilpo Järvinen Cc: Ingo Molnar Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Joanthan Cameron Cc: Joel Granados Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Lennart Poettering Cc: Leon Romanovsky Cc: Leon Romanovsky Cc: Lukas Wunner Cc: Marc Rutland Cc: Masahiro Yamada Cc: Matthew Maurer Cc: Miguel Ojeda Cc: Myugnjoo Ham Cc: Parav Pandit Cc: Randy Dunlap Cc: Roman Gushchin Cc: Saeed Mahameed Cc: Samiullah Khawaja Cc: Song Liu Cc: Steven Rostedt Cc: Stuart Hayes Cc: Tejun Heo Cc: Thomas Gleinxer Cc: Thomas Weißschuh Cc: Vincent Guittot Cc: William Tu Cc: Yoann Congal Cc: Zijun Hu Cc: Pratyush Yadav Cc: Zhu Yanjun Signed-off-by: Andrew Morton --- Documentation/userspace-api/ioctl/ioctl-number.rst | 2 ++ 1 file changed, 2 insertions(+) (limited to 'Documentation') diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst b/Documentation/userspace-api/ioctl/ioctl-number.rst index 7c527a01d1cf..7232b3544cec 100644 --- a/Documentation/userspace-api/ioctl/ioctl-number.rst +++ b/Documentation/userspace-api/ioctl/ioctl-number.rst @@ -385,6 +385,8 @@ Code Seq# Include File Comments 0xB8 01-02 uapi/misc/mrvl_cn10k_dpi.h Marvell CN10K DPI driver 0xB8 all uapi/linux/mshv.h Microsoft Hyper-V /dev/mshv driver +0xBA 00-0F uapi/linux/liveupdate.h Pasha Tatashin + 0xC0 00-0F linux/usb/iowarrior.h 0xCA 00-0F uapi/misc/cxl.h Dead since 6.15 0xCA 10-2F uapi/misc/ocxl.h -- cgit v1.2.3 From 906a3306245525f408129153d5dcb7ae7e6ded44 Mon Sep 17 00:00:00 2001 From: Pasha Tatashin Date: Tue, 25 Nov 2025 11:58:38 -0500 Subject: docs: add luo documentation MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the documentation files for the Live Update Orchestrator Link: https://lkml.kernel.org/r/20251125165850.3389713-9-pasha.tatashin@soleen.com Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Reviewed-by: Pratyush Yadav Tested-by: David Matlack Cc: Aleksander Lobakin Cc: Alexander Graf Cc: Alice Ryhl Cc: Andriy Shevchenko Cc: anish kumar Cc: Anna Schumaker Cc: Bartosz Golaszewski Cc: Bjorn Helgaas Cc: Borislav Betkov Cc: Chanwoo Choi Cc: Chen Ridong Cc: Chris Li Cc: Christian Brauner Cc: Daniel Wagner Cc: Danilo Krummrich Cc: Dan Williams Cc: David Hildenbrand Cc: David Jeffery Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Guixin Liu Cc: "H. Peter Anvin" Cc: Hugh Dickins Cc: Ilpo Järvinen Cc: Ingo Molnar Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Joanthan Cameron Cc: Joel Granados Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Lennart Poettering Cc: Leon Romanovsky Cc: Leon Romanovsky Cc: Lukas Wunner Cc: Marc Rutland Cc: Masahiro Yamada Cc: Matthew Maurer Cc: Miguel Ojeda Cc: Myugnjoo Ham Cc: Parav Pandit Cc: Pratyush Yadav Cc: Randy Dunlap Cc: Roman Gushchin Cc: Saeed Mahameed Cc: Samiullah Khawaja Cc: Song Liu Cc: Steven Rostedt Cc: Stuart Hayes Cc: Tejun Heo Cc: Thomas Gleinxer Cc: Thomas Weißschuh Cc: Vincent Guittot Cc: William Tu Cc: Yoann Congal Cc: Zhu Yanjun Cc: Zijun Hu Signed-off-by: Andrew Morton --- Documentation/core-api/index.rst | 1 + Documentation/core-api/liveupdate.rst | 54 ++++++++++++++++++++++++++++++ Documentation/userspace-api/index.rst | 1 + Documentation/userspace-api/liveupdate.rst | 20 +++++++++++ 4 files changed, 76 insertions(+) create mode 100644 Documentation/core-api/liveupdate.rst create mode 100644 Documentation/userspace-api/liveupdate.rst (limited to 'Documentation') diff --git a/Documentation/core-api/index.rst b/Documentation/core-api/index.rst index 6cbdcbfa79c3..5eb0fbbbc323 100644 --- a/Documentation/core-api/index.rst +++ b/Documentation/core-api/index.rst @@ -138,6 +138,7 @@ Documents that don't fit elsewhere or which have yet to be categorized. :maxdepth: 1 librs + liveupdate netlink .. only:: subproject and html diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst new file mode 100644 index 000000000000..cca1993008d8 --- /dev/null +++ b/Documentation/core-api/liveupdate.rst @@ -0,0 +1,54 @@ +.. SPDX-License-Identifier: GPL-2.0 + +======================== +Live Update Orchestrator +======================== +:Author: Pasha Tatashin + +.. kernel-doc:: kernel/liveupdate/luo_core.c + :doc: Live Update Orchestrator (LUO) + +LUO Sessions +============ +.. kernel-doc:: kernel/liveupdate/luo_session.c + :doc: LUO Sessions + +LUO Preserving File Descriptors +=============================== +.. kernel-doc:: kernel/liveupdate/luo_file.c + :doc: LUO File Descriptors + +Live Update Orchestrator ABI +============================ +.. kernel-doc:: include/linux/kho/abi/luo.h + :doc: Live Update Orchestrator ABI + +Public API +========== +.. kernel-doc:: include/linux/liveupdate.h + +.. kernel-doc:: include/linux/kho/abi/luo.h + :functions: + +.. kernel-doc:: kernel/liveupdate/luo_core.c + :export: + +.. kernel-doc:: kernel/liveupdate/luo_file.c + :export: + +Internal API +============ +.. kernel-doc:: kernel/liveupdate/luo_core.c + :internal: + +.. kernel-doc:: kernel/liveupdate/luo_session.c + :internal: + +.. kernel-doc:: kernel/liveupdate/luo_file.c + :internal: + +See Also +======== + +- :doc:`Live Update uAPI ` +- :doc:`/core-api/kho/concepts` diff --git a/Documentation/userspace-api/index.rst b/Documentation/userspace-api/index.rst index b8c73be4fb11..8a61ac4c1bf1 100644 --- a/Documentation/userspace-api/index.rst +++ b/Documentation/userspace-api/index.rst @@ -61,6 +61,7 @@ Everything else :maxdepth: 1 ELF + liveupdate netlink/index sysfs-platform_profile vduse diff --git a/Documentation/userspace-api/liveupdate.rst b/Documentation/userspace-api/liveupdate.rst new file mode 100644 index 000000000000..41c0473e4f16 --- /dev/null +++ b/Documentation/userspace-api/liveupdate.rst @@ -0,0 +1,20 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================ +Live Update uAPI +================ +:Author: Pasha Tatashin + +ioctl interface +=============== +.. kernel-doc:: kernel/liveupdate/luo_core.c + :doc: LUO ioctl Interface + +ioctl uAPI +=========== +.. kernel-doc:: include/uapi/linux/liveupdate.h + +See Also +======== + +- :doc:`Live Update Orchestrator ` -- cgit v1.2.3 From 15fc11bb2cb649f2207bbb7140d5430d3937a8d5 Mon Sep 17 00:00:00 2001 From: Pratyush Yadav Date: Tue, 25 Nov 2025 11:58:45 -0500 Subject: docs: add documentation for memfd preservation via LUO MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add the documentation under the "Preserving file descriptors" section of LUO's documentation. Link: https://lkml.kernel.org/r/20251125165850.3389713-16-pasha.tatashin@soleen.com Signed-off-by: Pratyush Yadav Co-developed-by: Pasha Tatashin Signed-off-by: Pasha Tatashin Reviewed-by: Mike Rapoport (Microsoft) Tested-by: David Matlack Cc: Aleksander Lobakin Cc: Alexander Graf Cc: Alice Ryhl Cc: Andriy Shevchenko Cc: anish kumar Cc: Anna Schumaker Cc: Bartosz Golaszewski Cc: Bjorn Helgaas Cc: Borislav Betkov Cc: Chanwoo Choi Cc: Chen Ridong Cc: Chris Li Cc: Christian Brauner Cc: Daniel Wagner Cc: Danilo Krummrich Cc: Dan Williams Cc: David Hildenbrand Cc: David Jeffery Cc: David Rientjes Cc: Greg Kroah-Hartman Cc: Guixin Liu Cc: "H. Peter Anvin" Cc: Hugh Dickins Cc: Ilpo Järvinen Cc: Ingo Molnar Cc: Ira Weiny Cc: Jann Horn Cc: Jason Gunthorpe Cc: Jens Axboe Cc: Joanthan Cameron Cc: Joel Granados Cc: Johannes Weiner Cc: Jonathan Corbet Cc: Lennart Poettering Cc: Leon Romanovsky Cc: Leon Romanovsky Cc: Lukas Wunner Cc: Marc Rutland Cc: Masahiro Yamada Cc: Matthew Maurer Cc: Miguel Ojeda Cc: Myugnjoo Ham Cc: Parav Pandit Cc: Pratyush Yadav Cc: Randy Dunlap Cc: Roman Gushchin Cc: Saeed Mahameed Cc: Samiullah Khawaja Cc: Song Liu Cc: Steven Rostedt Cc: Stuart Hayes Cc: Tejun Heo Cc: Thomas Gleinxer Cc: Thomas Weißschuh Cc: Vincent Guittot Cc: William Tu Cc: Yoann Congal Cc: Zhu Yanjun Cc: Zijun Hu Signed-off-by: Andrew Morton --- Documentation/core-api/liveupdate.rst | 7 +++++++ Documentation/mm/index.rst | 1 + Documentation/mm/memfd_preservation.rst | 23 +++++++++++++++++++++++ 3 files changed, 31 insertions(+) create mode 100644 Documentation/mm/memfd_preservation.rst (limited to 'Documentation') diff --git a/Documentation/core-api/liveupdate.rst b/Documentation/core-api/liveupdate.rst index cca1993008d8..7960eb15a81f 100644 --- a/Documentation/core-api/liveupdate.rst +++ b/Documentation/core-api/liveupdate.rst @@ -23,6 +23,13 @@ Live Update Orchestrator ABI .. kernel-doc:: include/linux/kho/abi/luo.h :doc: Live Update Orchestrator ABI +The following types of file descriptors can be preserved + +.. toctree:: + :maxdepth: 1 + + ../mm/memfd_preservation + Public API ========== .. kernel-doc:: include/linux/liveupdate.h diff --git a/Documentation/mm/index.rst b/Documentation/mm/index.rst index ba6a8872849b..7aa2a8886908 100644 --- a/Documentation/mm/index.rst +++ b/Documentation/mm/index.rst @@ -48,6 +48,7 @@ documentation, or deleted if it has served its purpose. hugetlbfs_reserv ksm memory-model + memfd_preservation mmu_notifier multigen_lru numa diff --git a/Documentation/mm/memfd_preservation.rst b/Documentation/mm/memfd_preservation.rst new file mode 100644 index 000000000000..66e0fb6d5ef0 --- /dev/null +++ b/Documentation/mm/memfd_preservation.rst @@ -0,0 +1,23 @@ +.. SPDX-License-Identifier: GPL-2.0-or-later + +========================== +Memfd Preservation via LUO +========================== + +.. kernel-doc:: mm/memfd_luo.c + :doc: Memfd Preservation via LUO + +Memfd Preservation ABI +====================== + +.. kernel-doc:: include/linux/kho/abi/memfd.h + :doc: DOC: memfd Live Update ABI + +.. kernel-doc:: include/linux/kho/abi/memfd.h + :internal: + +See Also +======== + +- :doc:`/core-api/liveupdate` +- :doc:`/core-api/kho/concepts` -- cgit v1.2.3 From 5c991b6d9b30895744085abb592c77338d65a40d Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Tue, 18 Nov 2025 17:15:06 +0530 Subject: Documentation/ABI: mark old kexec sysfs deprecated The previous commit ("kexec: move sysfs entries to /sys/kernel/kexec") moved all existing kexec sysfs entries to a new location. The ABI document is updated to include a note about the deprecation of the old kexec sysfs entries. The following kexec sysfs entries are deprecated: - /sys/kernel/kexec_loaded - /sys/kernel/kexec_crash_loaded - /sys/kernel/kexec_crash_size - /sys/kernel/crash_elfcorehdr_size - /sys/kernel/kexec_crash_cma_ranges Link: https://lkml.kernel.org/r/20251118114507.1769455-3-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Acked-by: Baoquan He Cc: Aditya Gupta Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Signed-off-by: Andrew Morton --- .../ABI/obsolete/sysfs-kernel-kexec-kdump | 71 ++++++++++++++++++++++ Documentation/ABI/testing/sysfs-kernel-kexec-kdump | 53 ---------------- 2 files changed, 71 insertions(+), 53 deletions(-) create mode 100644 Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump delete mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump (limited to 'Documentation') diff --git a/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..ba26a6a1d2be --- /dev/null +++ b/Documentation/ABI/obsolete/sysfs-kernel-kexec-kdump @@ -0,0 +1,71 @@ +NOTE: all the ABIs listed in this file are deprecated and will be removed after 2028. + +Here are the alternative ABIs: ++------------------------------------+-----------------------------------------+ +| Deprecated | Alternative | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_loaded | /sys/kernel/kexec/loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_loaded | /sys/kernel/kexec/crash_loaded | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_size | /sys/kernel/kexec/crash_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/crash_elfcorehdr_size | /sys/kernel/kexec/crash_elfcorehdr_size | ++------------------------------------+-----------------------------------------+ +| /sys/kernel/kexec_crash_cma_ranges | /sys/kernel/kexec/crash_cma_ranges | ++------------------------------------+-----------------------------------------+ + + +What: /sys/kernel/kexec_loaded +Date: Jun 2006 +Contact: kexec@lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec_crash_loaded +Date: Jun 2006 +Contact: kexec@lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec_crash_size +Date: Dec 2009 +Contact: kexec@lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/crash_elfcorehdr_size +Date: Aug 2023 +Contact: kexec@lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools + +What: /sys/kernel/kexec_crash_cma_ranges +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. +User: kdump service diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump deleted file mode 100644 index f6089e38de5f..000000000000 --- a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump +++ /dev/null @@ -1,53 +0,0 @@ -What: /sys/kernel/kexec_loaded -Date: Jun 2006 -Contact: kexec@lists.infradead.org -Description: read only - Indicates whether a new kernel image has been loaded - into memory using the kexec system call. It shows 1 if - a kexec image is present and ready to boot, or 0 if none - is loaded. -User: kexec tools, kdump service - -What: /sys/kernel/kexec_crash_loaded -Date: Jun 2006 -Contact: kexec@lists.infradead.org -Description: read only - Indicates whether a crash (kdump) kernel is currently - loaded into memory. It shows 1 if a crash kernel has been - successfully loaded for panic handling, or 0 if no crash - kernel is present. -User: Kexec tools, Kdump service - -What: /sys/kernel/kexec_crash_size -Date: Dec 2009 -Contact: kexec@lists.infradead.org -Description: read/write - Shows the amount of memory reserved for loading the crash - (kdump) kernel. It reports the size, in bytes, of the - crash kernel area defined by the crashkernel= parameter. - This interface also allows reducing the crashkernel - reservation by writing a smaller value, and the reclaimed - space is added back to the system RAM. -User: Kdump service - -What: /sys/kernel/crash_elfcorehdr_size -Date: Aug 2023 -Contact: kexec@lists.infradead.org -Description: read only - Indicates the preferred size of the memory buffer for the - ELF core header used by the crash (kdump) kernel. It defines - how much space is needed to hold metadata about the crashed - system, including CPU and memory information. This information - is used by the user space utility kexec to support updating the - in-kernel kdump image during hotplug operations. -User: Kexec tools - -What: /sys/kernel/kexec_crash_cma_ranges -Date: Nov 2025 -Contact: kexec@lists.infradead.org -Description: read only - Provides information about the memory ranges reserved from - the Contiguous Memory Allocator (CMA) area that are allocated - to the crash (kdump) kernel. It lists the start and end physical - addresses of CMA regions assigned for crashkernel use. -User: kdump service -- cgit v1.2.3 From fb5c3644278c169c0d03b904b2bdedcaea897df7 Mon Sep 17 00:00:00 2001 From: Sourabh Jain Date: Tue, 18 Nov 2025 17:15:07 +0530 Subject: Documentation/ABI: new kexec and kdump sysfs interface Add an ABI document for following kexec and kdump sysfs interface: - /sys/kernel/kexec/loaded - /sys/kernel/kexec/crash_loaded - /sys/kernel/kexec/crash_size - /sys/kernel/kexec/crash_elfcorehdr_size - /sys/kernel/kexec/crash_cma_ranges Link: https://lkml.kernel.org/r/20251118114507.1769455-4-sourabhjain@linux.ibm.com Signed-off-by: Sourabh Jain Acked-by: Baoquan He Cc: Aditya Gupta Cc: Dave Young Cc: Hari Bathini Cc: Jiri Bohac Cc: Madhavan Srinivasan Cc: Mahesh J Salgaonkar Cc: Pingfan Liu Cc: Ritesh Harjani (IBM) Cc: Shivang Upadhyay Cc: Vivek Goyal Signed-off-by: Andrew Morton --- Documentation/ABI/testing/sysfs-kernel-kexec-kdump | 61 ++++++++++++++++++++++ 1 file changed, 61 insertions(+) create mode 100644 Documentation/ABI/testing/sysfs-kernel-kexec-kdump (limited to 'Documentation') diff --git a/Documentation/ABI/testing/sysfs-kernel-kexec-kdump b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump new file mode 100644 index 000000000000..f59051b5d96d --- /dev/null +++ b/Documentation/ABI/testing/sysfs-kernel-kexec-kdump @@ -0,0 +1,61 @@ +What: /sys/kernel/kexec/* +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: + The /sys/kernel/kexec/* directory contains sysfs files + that provide information about the configuration status + of kexec and kdump. + +What: /sys/kernel/kexec/loaded +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read only + Indicates whether a new kernel image has been loaded + into memory using the kexec system call. It shows 1 if + a kexec image is present and ready to boot, or 0 if none + is loaded. +User: kexec tools, kdump service + +What: /sys/kernel/kexec/crash_loaded +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read only + Indicates whether a crash (kdump) kernel is currently + loaded into memory. It shows 1 if a crash kernel has been + successfully loaded for panic handling, or 0 if no crash + kernel is present. +User: Kexec tools, Kdump service + +What: /sys/kernel/kexec/crash_size +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read/write + Shows the amount of memory reserved for loading the crash + (kdump) kernel. It reports the size, in bytes, of the + crash kernel area defined by the crashkernel= parameter. + This interface also allows reducing the crashkernel + reservation by writing a smaller value, and the reclaimed + space is added back to the system RAM. +User: Kdump service + +What: /sys/kernel/kexec/crash_elfcorehdr_size +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read only + Indicates the preferred size of the memory buffer for the + ELF core header used by the crash (kdump) kernel. It defines + how much space is needed to hold metadata about the crashed + system, including CPU and memory information. This information + is used by the user space utility kexec to support updating the + in-kernel kdump image during hotplug operations. +User: Kexec tools + +What: /sys/kernel/kexec/crash_cma_ranges +Date: Nov 2025 +Contact: kexec@lists.infradead.org +Description: read only + Provides information about the memory ranges reserved from + the Contiguous Memory Allocator (CMA) area that are allocated + to the crash (kdump) kernel. It lists the start and end physical + addresses of CMA regions assigned for crashkernel use. +User: kdump service -- cgit v1.2.3 From 3fa805c37dd4d3e72ae5c58800f3f46ab3ca1f70 Mon Sep 17 00:00:00 2001 From: Breno Leitao Date: Fri, 10 Oct 2025 03:36:50 -0700 Subject: vmcoreinfo: track and log recoverable hardware errors MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Introduce a generic infrastructure for tracking recoverable hardware errors (HW errors that are visible to the OS but does not cause a panic) and record them for vmcore consumption. This aids post-mortem crash analysis tools by preserving a count and timestamp for the last occurrence of such errors. On the other side, correctable errors, which the OS typically remains unaware of because the underlying hardware handles them transparently, are less relevant for crash dump and therefore are NOT tracked in this infrastructure. Add centralized logging for sources of recoverable hardware errors based on the subsystem it has been notified. hwerror_data is write-only at kernel runtime, and it is meant to be read from vmcore using tools like crash/drgn. For example, this is how it looks like when opening the crashdump from drgn. >>> prog['hwerror_data'] (struct hwerror_info[1]){ { .count = (int)844, .timestamp = (time64_t)1752852018, }, ... This helps fleet operators quickly triage whether a crash may be influenced by hardware recoverable errors (which executes a uncommon code path in the kernel), especially when recoverable errors occurred shortly before a panic, such as the bug fixed by commit ee62ce7a1d90 ("page_pool: Track DMA-mapped pages and unmap them when destroying the pool") This is not intended to replace full hardware diagnostics but provides a fast way to correlate hardware events with kernel panics quickly. Rare machine check exceptions—like those indicated by mce_flags.p5 or mce_flags.winchip—are not accounted for in this method, as they fall outside the intended usage scope for this feature's user base. [leitao@debian.org: add hw-recoverable-errors to toctree] Link: https://lkml.kernel.org/r/20251127-vmcoreinfo_fix-v1-1-26f5b1c43da9@debian.org Link: https://lkml.kernel.org/r/20251010-vmcore_hw_error-v5-1-636ede3efe44@debian.org Signed-off-by: Breno Leitao Suggested-by: Tony Luck Suggested-by: Shuai Xue Reviewed-by: Shuai Xue Reviewed-by: Hanjun Guo [APEI] Cc: Bjorn Helgaas Cc: Bob Moore Cc: Borislav Betkov Cc: "H. Peter Anvin" Cc: Ingo Molnar Cc: James Morse Cc: Konrad Rzessutek Wilk Cc: Len Brown Cc: Mahesh Salgaonkar Cc: Mauro Carvalho Chehab Cc: "Oliver O'Halloran" Cc: Omar Sandoval Cc: Thomas Gleinxer Signed-off-by: Andrew Morton --- Documentation/driver-api/hw-recoverable-errors.rst | 60 ++++++++++++++++++++++ Documentation/driver-api/index.rst | 1 + 2 files changed, 61 insertions(+) create mode 100644 Documentation/driver-api/hw-recoverable-errors.rst (limited to 'Documentation') diff --git a/Documentation/driver-api/hw-recoverable-errors.rst b/Documentation/driver-api/hw-recoverable-errors.rst new file mode 100644 index 000000000000..fc526c3454bd --- /dev/null +++ b/Documentation/driver-api/hw-recoverable-errors.rst @@ -0,0 +1,60 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================================================= +Recoverable Hardware Error Tracking in vmcoreinfo +================================================= + +Overview +-------- + +This feature provides a generic infrastructure within the Linux kernel to track +and log recoverable hardware errors. These are hardware recoverable errors +visible that might not cause immediate panics but may influence health, mainly +because new code path will be executed in the kernel. + +By recording counts and timestamps of recoverable errors into the vmcoreinfo +crash dump notes, this infrastructure aids post-mortem crash analysis tools in +correlating hardware events with kernel failures. This enables faster triage +and better understanding of root causes, especially in large-scale cloud +environments where hardware issues are common. + +Benefits +-------- + +- Facilitates correlation of hardware recoverable errors with kernel panics or + unusual code paths that lead to system crashes. +- Provides operators and cloud providers quick insights, improving reliability + and reducing troubleshooting time. +- Complements existing full hardware diagnostics without replacing them. + +Data Exposure and Consumption +----------------------------- + +- The tracked error data consists of per-error-type counts and timestamps of + last occurrence. +- This data is stored in the `hwerror_data` array, categorized by error source + types like CPU, memory, PCI, CXL, and others. +- It is exposed via vmcoreinfo crash dump notes and can be read using tools + like `crash`, `drgn`, or other kernel crash analysis utilities. +- There is no other way to read these data other than from crash dumps. +- These errors are divided by area, which includes CPU, Memory, PCI, CXL and + others. + +Typical usage example (in drgn REPL): + +.. code-block:: python + + >>> prog['hwerror_data'] + (struct hwerror_info[HWERR_RECOV_MAX]){ + { + .count = (int)844, + .timestamp = (time64_t)1752852018, + }, + ... + } + +Enabling +-------- + +- This feature is enabled when CONFIG_VMCORE_INFO is set. + diff --git a/Documentation/driver-api/index.rst b/Documentation/driver-api/index.rst index 3e2a270bd828..a35705b44799 100644 --- a/Documentation/driver-api/index.rst +++ b/Documentation/driver-api/index.rst @@ -96,6 +96,7 @@ Subsystem-specific APIs gpio/index hsi hte/index + hw-recoverable-errors i2c iio/index infiniband -- cgit v1.2.3