aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/platform
AgeCommit message (Collapse)AuthorFilesLines
2026-05-05x86/efi: Restore IRQ state in EFI page fault handlerArd Biesheuvel1-1/+10
The kernel's softirq API does not permit re-enabling softirqs while IRQs are disabled. The reason for this is that local_bh_enable() will not only re-enable delivery of softirqs over the back of IRQs, it will also handle any pending softirqs immediately, regardless of whether IRQs are enabled at that point. For this reason, commit d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") disables softirqs only when IRQs are enabled, as it is not permitted otherwise, but also unnecessary, given that asynchronous softirq delivery never happens to begin with while IRQs are disabled. However, this does mean that entering a kernel mode FPU section with IRQs enabled and leaving it with IRQs disabled leads to problems, as identified by Sashiko [0]: the EFI page fault handler is called from page_fault_oops() with IRQs disabled, and thus ends the kernel mode FPU section with IRQs disabled as well, regardless of whether IRQs were enabled when it was started. This may result in schedule() being called with a non-zero preempt_count, causing a BUG(). So take care to re-enable IRQs when handling any EFI page faults if they were taken with IRQs enabled. [0] https://sashiko.dev/#/patchset/20260430074107.27051-1-ivan.hu%40canonical.com Cc: Eric Biggers <ebiggers@kernel.org> Cc: Ivan Hu <ivan.hu@canonical.com> Cc: x86@kernel.org Cc: <stable@vger.kernel.org> Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") Reviewed-by: Eric Biggers <ebiggers@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-05-04x86/efi: Fix graceful fault handling after FPU softirq changesIvan Hu1-1/+1
Since commit d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs"), kernel_fpu_begin() calls fpregs_lock() which uses local_bh_disable() instead of the previous preempt_disable(). This sets SOFTIRQ_OFFSET in preempt_count during the entire EFI runtime service call, causing in_interrupt() to return true in normal task context. The graceful page fault handler efi_crash_gracefully_on_page_fault() uses in_interrupt() to bail out for faults in real interrupt context. With SOFTIRQ_OFFSET now set, the handler always bails out, leaving EFI firmware page faults unhandled. This escalates to die() which also sees in_interrupt() as true and calls panic("Fatal exception in interrupt"), resulting in a hard system freeze. On systems with buggy firmware that triggers page faults during EFI runtime calls (e.g., accessing unmapped memory in GetTime()), this causes an unrecoverable hang instead of the expected graceful EFI_ABORTED recovery. Fix by replacing in_interrupt() with !in_task(). This preserves the original intent of bailing for interrupts or NMI faults, while no longer falsely triggering from the FPU code path's local_bh_disable(). Fixes: d02198550423 ("x86/fpu: Improve crypto performance by making kernel-mode FPU reliably usable in softirqs") Cc: <stable@vger.kernel.org> Signed-off-by: Ivan Hu <ivan.hu@canonical.com> [ardb: Sashiko spotted that using 'in_hardirq() || in_nmi()' leaves a window where a softirq may be taken before fpregs_lock() is called, but after efi_rts_work.efi_rts_id has been assigned, and any page faults occurring in that window will then be misidentified as having been caused by the firmware. Instead, use !in_task(), which incorporates in_serving_softirq(). ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-04-18Merge tag 'memblock-v7.1-rc1' of ↵Linus Torvalds2-5/+2
git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: - improve debuggability of reserve_mem kernel parameter handling with print outs in case of a failure and debugfs info showing what was actually reserved - Make memblock_free_late() and free_reserved_area() use the same core logic for freeing the memory to buddy and ensure it takes care of updating memblock arrays when ARCH_KEEP_MEMBLOCK is enabled. * tag 'memblock-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: x86/alternative: delay freeing of smp_locks section memblock: warn when freeing reserved memory before memory map is initialized memblock, treewide: make memblock_free() handle late freeing memblock: make free_reserved_area() update memblock if ARCH_KEEP_MEMBLOCK=y memblock: extract page freeing from free_reserved_area() into a helper memblock: make free_reserved_area() more robust mm: move free_reserved_area() to mm/memblock.c powerpc: opal-core: pair alloc_pages_exact() with free_pages_exact() powerpc: fadump: pair alloc_pages_exact() with free_pages_exact() memblock: reserve_mem: fix end caclulation in reserve_mem_release_by_name() memblock: move reserve_bootmem_range() to memblock.c and make it static memblock: Add reserve_mem debugfs info memblock: Print out errors on reserve_mem parser
2026-04-17Merge tag 'integrity-v7.1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity Pull integrity updates from Mimi Zohar: "There are two main changes, one feature removal, some code cleanup, and a number of bug fixes. Main changes: - Detecting secure boot mode was limited to IMA. Make detecting secure boot mode accessible to EVM and other LSMs - IMA sigv3 support was limited to fsverity. Add IMA sigv3 support for IMA regular file hashes and EVM portable signatures Remove: - Remove IMA support for asychronous hash calculation originally added for hardware acceleration Cleanup: - Remove unnecessary Kconfig CONFIG_MODULE_SIG and CONFIG_KEXEC_SIG tests - Add descriptions of the IMA atomic flags Bug fixes: - Like IMA, properly limit EVM "fix" mode - Define and call evm_fix_hmac() to update security.evm - Fallback to using i_version to detect file change for filesystems that do not support STATX_CHANGE_COOKIE - Address missing kernel support for configured (new) TPM hash algorithms - Add missing crypto_shash_final() return value" * tag 'integrity-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/zohar/linux-integrity: evm: Enforce signatures version 3 with new EVM policy 'bit 3' integrity: Allow sigv3 verification on EVM_XATTR_PORTABLE_DIGSIG ima: add support to require IMA sigv3 signatures ima: add regular file data hash signature version 3 support ima: Define asymmetric_verify_v3() to verify IMA sigv3 signatures ima: remove buggy support for asynchronous hashes integrity: Eliminate weak definition of arch_get_secureboot() ima: Add code comments to explain IMA iint cache atomic_flags ima_fs: Correctly create securityfs files for unsupported hash algos ima: check return value of crypto_shash_final() in boot aggregate ima: Define and use a digest_size field in the ima_algo_desc structure powerpc/ima: Drop unnecessary check for CONFIG_MODULE_SIG ima: efi: Drop unnecessary check for CONFIG_MODULE_SIG/CONFIG_KEXEC_SIG ima: fallback to using i_version to detect file change evm: fix security.evm for a file with IMA signature s390: Drop unnecessary CONFIG_IMA_SECURE_AND_OR_TRUSTED_BOOT evm: Don't enable fix mode when secure boot is enabled integrity: Make arch_ima_get_secureboot integrity-wide
2026-04-14Merge tag 'x86_cpu_for_7.1-rc1' of ↵Linus Torvalds1-0/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cpu updates from Dave Hansen: - Complete LASS enabling: deal with vsyscall and EFI The existing Linear Address Space Separation (LASS) support punted on support for common EFI and vsyscall configs. Complete the implementation by supporting EFI and vsyscall=xonly. - Clean up CPUID usage in newer Intel "avs" audio driver and update the x86-cpuid-db file * tag 'x86_cpu_for_7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: tools/x86/kcpuid: Update bitfields to x86-cpuid-db v3.0 ASoC: Intel: avs: Include CPUID header at file scope ASoC: Intel: avs: Check maximum valid CPUID leaf x86/cpu: Remove LASS restriction on vsyscall emulation x86/vsyscall: Disable LASS if vsyscall mode is set to EMULATE x86/vsyscall: Restore vsyscall=xonly mode under LASS x86/traps: Consolidate user fixups in the #GP handler x86/vsyscall: Reorganize the page fault emulation code x86/cpu: Remove LASS restriction on EFI x86/efi: Disable LASS while executing runtime services x86/cpu: Defer LASS enabling until userspace comes up
2026-04-01memblock, treewide: make memblock_free() handle late freeingMike Rapoport (Microsoft)2-5/+2
It shouldn't be responsibility of memblock users to detect if they free memory allocated from memblock late and should use memblock_free_late(). Make memblock_free() and memblock_phys_free() take care of late memory freeing and drop memblock_free_late(). Link: https://patch.msgid.link/20260323074836.3653702-9-rppt@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-03-31x86/platform/geode: Fix on-stack property data use-after-return bugDmitry Torokhov1-6/+18
The PROPERTY_ENTRY_GPIO macro (and by extension PROPERTY_ENTRY_REF) creates a temporary software_node_ref_args structure on the stack when used in a runtime assignment. This results in the property pointing to data that is invalid once the function returns. Fix this by ensuring the GPIO reference data is not stored on stack and using PROPERTY_ENTRY_REF_ARRAY_LEN() to point directly to the persistent reference data. Fixes: 298c9babadb8 ("x86/platform/geode: switch GPIO buttons and LEDs to software properties") Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Daniel Scally <djrscally@gmail.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Hans de Goede <hansg@kernel.org> Cc: Heikki Krogerus <heikki.krogerus@linux.intel.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260329-property-gpio-fix-v2-1-3cca5ba136d8@gmail.com
2026-03-27Merge tag 'efi-fixes-for-v7.0-3' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "Fix a potential buffer overrun issue introduced by the previous fix for EFI boot services region reservations on x86" * tag 'efi-fixes-for-v7.0-3' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efi: efi_unmap_boot_services: fix calculation of ranges_to_free size
2026-03-20x86/efi: efi_unmap_boot_services: fix calculation of ranges_to_free sizeMike Rapoport (Microsoft)1-1/+1
ranges_to_free array should have enough room to store the entire EFI memmap plus an extra element for NULL entry. The calculation of this array size wrongly adds 1 to the overall size instead of adding 1 to the number of elements. Add parentheses to properly size the array. Reported-by: Guenter Roeck <linux@roeck-us.net> Fixes: a4b0bf6a40f3 ("x86/efi: defer freeing of boot services memory") Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-03-08Merge tag 'efi-fixes-for-v7.0-2' of ↵Linus Torvalds2-4/+53
git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi Pull EFI fix from Ard Biesheuvel: "Fix for the x86 EFI workaround keeping boot services code and data regions reserved until after SetVirtualAddressMap() completes: deferred struct page initialization may result in some of this memory being lost permanently" * tag 'efi-fixes-for-v7.0-2' of git://git.kernel.org/pub/scm/linux/kernel/git/efi/efi: x86/efi: defer freeing of boot services memory
2026-03-07Merge tag 'for-linus-7.0-rc3-tag' of ↵Linus Torvalds1-6/+1
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fixes from Juergen Gross: - a cleanup of arch/x86/kernel/head_64.S removing the pre-built page tables for Xen guests - a small comment update - another cleanup for Xen PVH guests mode - fix an issue with Xen PV-devices backed by driver domains * tag 'for-linus-7.0-rc3-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: xen/xenbus: better handle backend crash xenbus: add xenbus_device parameter to xenbus_read_driver_state() x86/PVH: Use boot params to pass RSDP address in start_info page x86/xen: update outdated comment xen/acpi-processor: fix _CST detection using undersized evaluation buffer x86/xen: Build identity mapping page tables dynamically for XENPV
2026-03-05integrity: Make arch_ima_get_secureboot integrity-wideCoiby Xu1-1/+1
EVM and other LSMs need the ability to query the secure boot status of the system, without directly calling the IMA arch_ima_get_secureboot function. Refactor the secure boot status check into a general function named arch_get_secureboot. Reported-and-suggested-by: Mimi Zohar <zohar@linux.ibm.com> Suggested-by: Roberto Sassu <roberto.sassu@huawei.com> Signed-off-by: Coiby Xu <coxu@redhat.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2026-03-03x86/efi: Disable LASS while executing runtime servicesSohil Mehta1-0/+35
Ideally, EFI runtime services should switch to kernel virtual addresses after SetVirtualAddressMap(). However, firmware implementations are known to be buggy in this regard and continue to access physical addresses. The kernel maintains a 1:1 mapping of all runtime services code and data regions to avoid breaking such firmware. LASS enforcement relies on bit 63 of the virtual address, which would block such accesses to the lower half. Unfortunately, not doing anything could lead to #GP faults when users update to a kernel with LASS enabled. One option is to use a STAC/CLAC pair to temporarily disable LASS data enforcement. However, there is no guarantee that the stray accesses would only touch data and not perform instruction fetches. Also, relying on the AC bit would depend on the runtime calls preserving RFLAGS, which is highly unlikely in practice. Instead, use the big hammer and switch off the entire LASS mechanism temporarily by clearing CR4.LASS. Runtime services are called in the context of efi_mm, which has explicitly unmapped any memory EFI isn't allowed to touch (including userspace). So, do this right after switching to efi_mm to avoid any security impact. Some runtime services can be invoked during boot when LASS isn't active. Use a global variable (similar to efi_mm) to save and restore the correct CR4.LASS state. The runtime calls are serialized with the efi_runtime_lock, so no concurrency issues are expected. Signed-off-by: Sohil Mehta <sohil.mehta@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Tony Luck <tony.luck@intel.com> Tested-by: Maciej Wieczor-Retman <maciej.wieczor-retman@intel.com> Link: https://patch.msgid.link/20260120234730.2215498-3-sohil.mehta@intel.com
2026-03-03x86/PVH: Use boot params to pass RSDP address in start_info pageHou Wenlong1-6/+1
After commit e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot params if available"), the RSDP address can be passed in boot params. Therefore, store the RSDP address in start_info page into boot params in the PVH entry instead of registering a different callback. This removes an absolute reference during the PVH entry and is more standardized. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <76675c4d49d3a8f72252076812ef8f22276230c2.1772282441.git.houwenlong.hwl@antgroup.com>
2026-02-25x86/efi: defer freeing of boot services memoryMike Rapoport (Microsoft)2-4/+53
efi_free_boot_services() frees memory occupied by EFI_BOOT_SERVICES_CODE and EFI_BOOT_SERVICES_DATA using memblock_free_late(). There are two issue with that: memblock_free_late() should be used for memory allocated with memblock_alloc() while the memory reserved with memblock_reserve() should be freed with free_reserved_area(). More acutely, with CONFIG_DEFERRED_STRUCT_PAGE_INIT=y efi_free_boot_services() is called before deferred initialization of the memory map is complete. Benjamin Herrenschmidt reports that this causes a leak of ~140MB of RAM on EC2 t3a.nano instances which only have 512MB or RAM. If the freed memory resides in the areas that memory map for them is still uninitialized, they won't be actually freed because memblock_free_late() calls memblock_free_pages() and the latter skips uninitialized pages. Using free_reserved_area() at this point is also problematic because __free_page() accesses the buddy of the freed page and that again might end up in uninitialized part of the memory map. Delaying the entire efi_free_boot_services() could be problematic because in addition to freeing boot services memory it updates efi.memmap without any synchronization and that's undesirable late in boot when there is concurrency. More robust approach is to only defer freeing of the EFI boot services memory. Split efi_free_boot_services() in two. First efi_unmap_boot_services() collects ranges that should be freed into an array then efi_free_boot_services() later frees them after deferred init is complete. Link: https://lore.kernel.org/all/ec2aaef14783869b3be6e3c253b2dcbf67dbc12a.camel@kernel.crashing.org Fixes: 916f676f8dc0 ("x86, efi: Retain boot service code until after switching to virtual mode") Cc: <stable@vger.kernel.org> Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds2-4/+4
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook2-4/+4
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-10Merge tag 'x86_cleanups_for_v7.0_rc1' of ↵Linus Torvalds1-2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Borislav Petkov: - The usual set of cleanups and simplifications all over the tree * tag 'x86_cleanups_for_v7.0_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/segment: Use MOVL when reading segment registers selftests/x86: Clean up sysret_rip coding style x86/mm: Hide mm_free_global_asid() definition under CONFIG_BROADCAST_TLB_FLUSH x86/crash: Use set_memory_p() instead of __set_memory_prot() x86/CPU/AMD: Simplify the spectral chicken fix x86/platform/olpc: Replace strcpy() with strscpy() in xo15_sci_add() x86/split_lock: Remove dead string when split_lock_detect=fatal
2026-02-10Merge tag 'x86-boot-2026-02-09' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86/boot updates from Ingo Molnar: - x86/acpi: Add acpi=spcr to use SPCR-provided default console (Shenghao Yang) - x86/acpi/boot: Correct the acpi_is_processor_usable() check again (Yazen Ghannam) - Refresh the x86 memory map (e820 table) handling code, and make the printouts a bit more informative (Ingo Molnar) * tag 'x86-boot-2026-02-09' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (30 commits) x86/acpi: Add acpi=spcr to use SPCR-provided default console x86/boot/e820: Use <linux/sizes.h> symbols for literals x86/boot/e820: Make sure e820_search_gap() finds all gaps x86/boot/e820: Simplify the e820__range_remove() API x86/boot/e820: Remove e820__range_remove()'s unused return parameter x86/boot/e820: Simplify append_e820_table() and remove restriction on single-entry tables x86/boot/e820: Standardize __init/__initdata tag placement x86/boot/e820: Simplify & clarify __e820__range_add() a bit x86/boot/e820: Rename gap_start/gap_size to max_gap_start/max_gap_start in e820_search_gap() et al x86/boot/e820: Change e820_search_gap() to search for the highest-address PCI gap x86/boot/e820: Clean up e820__setup_pci_gap()/e820_search_gap() a bit x86/boot/e820: Change struct e820_table::nr_entries type from __u32 to u32 x86/boot/e820: Standardize e820 table index variable types under 'u32' x86/boot/e820: Standardize e820 table index variable names under 'idx' x86/boot/e820: Remove unnecessary header inclusions x86/boot/e820: Clean up __refdata use a bit x86/boot/e820: Clean up __e820__range_add() a bit x86/boot/e820: Improve e820_print_type() messages x86/boot/e820: Clean up confusing and self-contradictory verbiage around E820 related resource allocations x86/boot/e820: Remove pointless early_panic() indirection ...
2026-01-12x86/xen/pvh: Enable PAE mode for 32-bit guest only when CONFIG_X86_PAE is setHou Wenlong1-0/+2
The PVH entry is available for 32-bit KVM guests, and 32-bit KVM guests do not depend on CONFIG_X86_PAE. However, mk_early_pgtbl_32() builds different pagetables depending on whether CONFIG_X86_PAE is set. Therefore, enabling PAE mode for 32-bit KVM guests without CONFIG_X86_PAE being set would result in a boot failure during CR3 loading. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Juergen Gross <jgross@suse.com> Signed-off-by: Juergen Gross <jgross@suse.com> Message-ID: <d09ce9a134eb9cbc16928a5b316969f8ba606b81.1768017442.git.houwenlong.hwl@antgroup.com>
2026-01-05x86/platform/olpc: Replace strcpy() with strscpy() in xo15_sci_add()Thorsten Blum1-2/+3
strcpy() has been deprecated¹ because it performs no bounds checking on the destination buffer, which can lead to buffer overflows. Use the safer strscpy() instead. ¹ https://www.kernel.org/doc/html/latest/process/deprecated.html#strcpy Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20251124125455.5495-2-thorsten.blum@linux.dev
2025-12-14x86/boot/e820: Simplify the e820__range_remove() APIIngo Molnar1-2/+1
Right now e820__range_remove() has two parameters to control the E820 type of the range removed: extern void e820__range_remove(u64 start, u64 size, enum e820_type old_type, bool check_type); Since E820 types start at 1, zero has a natural meaning of 'no type. Consolidate the (old_type,check_type) parameters into a single (filter_type) parameter: extern void e820__range_remove(u64 start, u64 size, enum e820_type filter_type); Note that both e820__mapped_raw_any() and e820__mapped_any() already have such semantics for their 'type' parameter, although it's currently not used with '0' by in-kernel code. Also, the __e820__mapped_all() internal helper already has such semantics implemented as well, and the e820__get_entry_type() API uses the '0' type to such effect. This simplifies not just e820__range_remove(), and synchronizes its use of type filters with other E820 API functions, but simplifies usage sites as well, such as parse_memmap_one(), beyond the reduction of the number of parameters: - else if (from) - e820__range_remove(start_at, mem_size, from, 1); else - e820__range_remove(start_at, mem_size, 0, 0); + e820__range_remove(start_at, mem_size, from); The generated code gets smaller as well: add/remove: 0/0 grow/shrink: 0/5 up/down: 0/-66 (-66) Function old new delta parse_memopt 112 107 -5 efi_init 1048 1039 -9 setup_arch 2719 2709 -10 e820__range_remove 283 273 -10 parse_memmap_opt 559 527 -32 Total: Before=22,675,600, After=22,675,534, chg -0.00% Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H . Peter Anvin <hpa@zytor.com> Cc: Andy Shevchenko <andy@kernel.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: David Woodhouse <dwmw@amazon.co.uk> Cc: Juergen Gross <jgross@suse.com> Cc: Kees Cook <keescook@chromium.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: Paul Menzel <pmenzel@molgen.mpg.de> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://patch.msgid.link/20250515120549.2820541-28-mingo@kernel.org
2025-10-11Merge tag 'x86_core_for_v6.18_rc1' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull more x86 updates from Borislav Petkov: - Remove a bunch of asm implementing condition flags testing in KVM's emulator in favor of int3_emulate_jcc() which is written in C - Replace KVM fastops with C-based stubs which avoids problems with the fastop infra related to latter not adhering to the C ABI due to their special calling convention and, more importantly, bypassing compiler control-flow integrity checking because they're written in asm - Remove wrongly used static branches and other ugliness accumulated over time in hyperv's hypercall implementation with a proper static function call to the correct hypervisor call variant - Add some fixes and modifications to allow running FRED-enabled kernels in KVM even on non-FRED hardware - Add kCFI improvements like validating indirect calls and prepare for enabling kCFI with GCC. Add cmdline params documentation and other code cleanups - Use the single-byte 0xd6 insn as the official #UD single-byte undefined opcode instruction as agreed upon by both x86 vendors - Other smaller cleanups and touchups all over the place * tag 'x86_core_for_v6.18_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (24 commits) x86,retpoline: Optimize patch_retpoline() x86,ibt: Use UDB instead of 0xEA x86/cfi: Remove __noinitretpoline and __noretpoline x86/cfi: Add "debug" option to "cfi=" bootparam x86/cfi: Standardize on common "CFI:" prefix for CFI reports x86/cfi: Document the "cfi=" bootparam options x86/traps: Clarify KCFI instruction layout compiler_types.h: Move __nocfi out of compiler-specific header objtool: Validate kCFI calls x86/fred: KVM: VMX: Always use FRED for IRQs when CONFIG_X86_FRED=y x86/fred: Play nice with invoking asm_fred_entry_from_kvm() on non-FRED hardware x86/fred: Install system vector handlers even if FRED isn't fully enabled x86/hyperv: Use direct call to hypercall-page x86/hyperv: Clean up hv_do_hypercall() KVM: x86: Remove fastops KVM: x86: Convert em_salc() to C KVM: x86: Introduce EM_ASM_3WCL KVM: x86: Introduce EM_ASM_1SRC2 KVM: x86: Introduce EM_ASM_2CL KVM: x86: Introduce EM_ASM_2W ...
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memory allocation profiling" from Suren Baghdasaryan addresses a couple of minor issues in relatively new memory allocation profiling feature - "Small cleanups" from Matthew Wilcox has a few cleanups in preparation for more memdesc work - "mm/damon: add addr_unit for DAMON_LRU_SORT and DAMON_RECLAIM" from Quanmin Yan makes some changes to DAMON in furtherance of supporting arm highmem - "selftests/mm: Add -Wunreachable-code and fix warnings" from Muhammad Anjum adds that compiler check to selftests code and fixes the fallout, by removing dead code - "Improvements to Victim Process Thawing and OOM Reaper Traversal Order" from zhongjinji makes a number of improvements in the OOM killer: mainly thawing a more appropriate group of victim threads so they can release resources - "mm/damon: misc fixups and improvements for 6.18" from SeongJae Park is a bunch of small and unrelated fixups for DAMON - "mm/damon: define and use DAMON initialization check function" from SeongJae Park implement reliability and maintainability improvements to a recently-added bug fix - "mm/damon/stat: expose auto-tuned intervals and non-idle ages" from SeongJae Park provides additional transparency to userspace clients of the DAMON_STAT information - "Expand scope of khugepaged anonymous collapse" from Dev Jain removes some constraints on khubepaged's collapsing of anon VMAs. It also increases the success rate of MADV_COLLAPSE against an anon vma - "mm: do not assume file == vma->vm_file in compat_vma_mmap_prepare()" from Lorenzo Stoakes moves us further towards removal of file_operations.mmap(). This patchset concentrates upon clearing up the treatment of stacked filesystems - "mm: Improve mlock tracking for large folios" from Kiryl Shutsemau provides some fixes and improvements to mlock's tracking of large folios. /proc/meminfo's "Mlocked" field became more accurate - "mm/ksm: Fix incorrect accounting of KSM counters during fork" from Donet Tom fixes several user-visible KSM stats inaccuracies across forks and adds selftest code to verify these counters - "mm_slot: fix the usage of mm_slot_entry" from Wei Yang addresses some potential but presently benign issues in KSM's mm_slot handling * tag 'mm-stable-2025-10-01-19-00' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (372 commits) mm: swap: check for stable address space before operating on the VMA mm: convert folio_page() back to a macro mm/khugepaged: use start_addr/addr for improved readability hugetlbfs: skip VMAs without shareable locks in hugetlb_vmdelete_list alloc_tag: fix boot failure due to NULL pointer dereference mm: silence data-race in update_hiwater_rss mm/memory-failure: don't select MEMORY_ISOLATION mm/khugepaged: remove definition of struct khugepaged_mm_slot mm/ksm: get mm_slot by mm_slot_entry() when slot is !NULL hugetlb: increase number of reserving hugepages via cmdline selftests/mm: add fork inheritance test for ksm_merging_pages counter mm/ksm: fix incorrect KSM counter handling in mm_struct during fork drivers/base/node: fix double free in register_one_node() mm: remove PMD alignment constraint in execmem_vmalloc() mm/memory_hotplug: fix typo 'esecially' -> 'especially' mm/rmap: improve mlock tracking for large folios mm/filemap: map entire large folio faultaround mm/fault: try to map the entire file folio in finish_fault() mm/rmap: mlock large folios in try_to_unmap_one() mm/rmap: fix a mlock race condition in folio_referenced_one() ...
2025-09-21x86: stop calling page_address() in free_pages()Vishal Moola (Oracle)1-1/+1
free_pages() should be used when we only have a virtual address. We should call __free_pages() directly on our page instead. Link: https://lkml.kernel.org/r/20250903185921.1785167-4-vishal.moola@gmail.com Signed-off-by: Vishal Moola (Oracle) <vishal.moola@gmail.com> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Andy Lutomirski <luto@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Justin Sanders <justin@coraid.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: "Michael S. Tsirkin" <mst@redhat.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: SeongJae Park <sj@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-09-03x86/boot: Move startup code out of __head sectionArd Biesheuvel1-1/+1
Move startup code out of the __head section, now that this no longer has a special significance. Move everything into .text or .init.text as appropriate, so that startup code is not kept around unnecessarily. [ bp: Fold in hunk to fix 32-bit CPU hotplug: Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202509022207.56fd97f4-lkp@intel.com ] Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828102202.1849035-45-ardb+git@google.com
2025-08-18objtool: Validate kCFI callsPeter Zijlstra1-0/+4
Validate that all indirect calls adhere to kCFI rules. Notably doing nocfi indirect call to a cfi function is broken. Apparently some Rust 'core' code violates this and explodes when ran with FineIBT. All the ANNOTATE_NOCFI_SYM sites are prime targets for attackers. - runtime EFI is especially henous because it also needs to disable IBT. Basically calling unknown code without CFI protection at runtime is a massice security issue. - Kexec image handover; if you can exploit this, you get to keep it :-) Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@kernel.org> Acked-by: Sean Christopherson <seanjc@google.com> Link: https://lkml.kernel.org/r/20250714103441.496787279@infradead.org
2025-07-29Merge tag 'x86_sev_for_v6.17_rc1' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Map the SNP calling area pages too so that OVMF EFI fw can issue SVSM calls properly with the goal of implementing EFI variable store in the SVSM - a component which is trusted by the guest, vs in the firmware, which is not - Allow the kernel to handle #VC exceptions from EFI runtime services properly when running as a SNP guest - Rework and cleanup the SNP guest request issue glue code a bit * tag 'x86_sev_for_v6.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Let sev_es_efi_map_ghcbs() map the CA pages too x86/sev/vc: Fix EFI runtime instruction emulation x86/sev: Drop unnecessary parameter in snp_issue_guest_request() x86/sev: Document requirement for linear mapping of guest request buffers x86/sev: Allocate request in TSC_INFO_REQ on stack virt: sev-guest: Contain snp_guest_request_ioctl in sev-guest
2025-06-29serial: 8250: Move CE4100 quirks to a module under 8250 driverAndy Shevchenko1-98/+0
There is inconvenient for maintainers and maintainership to have some quirks under architectural code. Move it to the specific quirk file like other 8250-compatible drivers do. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://lore.kernel.org/r/20250627182743.1273326-1-andriy.shevchenko@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-27x86/sev: Let sev_es_efi_map_ghcbs() map the CA pages tooGerd Hoffmann1-2/+2
OVMF EFI firmware needs access to the CA page to do SVSM protocol calls. For example, when the SVSM implements an EFI variable store, such calls will be necessary. So add that to sev_es_efi_map_ghcbs() and also rename the function to reflect the additional job it is doing now. [ bp: Massage. ] Signed-off-by: Gerd Hoffmann <kraxel@redhat.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250626114014.373748-4-kraxel@redhat.com
2025-06-24serial: ce4100: clean up serial_in/out() hooksJiri Slaby (SUSE)1-18/+21
ce4100_mem_serial_in() unnecessarily contains 4 nested 'if's. That makes the code hard to follow. Invert the conditions and return early if the particular conditions do not hold. And use "<<=" for shifting the offset in both of the hooks. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250623101246.486866-2-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-06-24serial: ce4100: fix build after serial_in/out() changesJiri Slaby (SUSE)1-3/+3
This x86_32 driver remain unnoticed, so after the commit below, the compilation now fails with: arch/x86/platform/ce4100/ce4100.c:107:16: error: incompatible function pointer types assigning to '...' from '...' To fix the error, convert also ce4100 to the new uart_port::serial_{in,out}() types. Signed-off-by: "Jiri Slaby (SUSE)" <jirislaby@kernel.org> Fixes: fc9ceb501e38 ("serial: 8250: sanitize uart_port::serial_{in,out}() types") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506190552.TqNasrC3-lkp@intel.com/ Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Link: https://lore.kernel.org/r/20250623101246.486866-1-jirislaby@kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-13Merge branch 'x86/msr' into x86/core, to resolve conflictsIngo Molnar2-4/+4
Conflicts: arch/x86/boot/startup/sme.c arch/x86/coco/sev/core.c arch/x86/kernel/fpu/core.c arch/x86/kernel/fpu/xstate.c Semantic conflict: arch/x86/include/asm/sev-internal.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/asm' into x86/core, to merge dependent commitsIngo Molnar1-2/+1
Prepare to resolve conflicts with an upstream series of fixes that conflict with pending x86 changes: 6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-13Merge branch 'x86/alternatives' into x86/core, to merge dependent commitsIngo Molnar1-5/+3
Prepare to resolve conflicts with an upstream series of fixes that conflict with pending x86 changes: 6f5bf947bab0 Merge tag 'its-for-linus-20250509' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-06Merge tag 'v6.15-rc4' into x86/asm, to pick up fixesIngo Molnar1-2/+2
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-06x86/mm: Fix false positive warning in switch_mm_irqs_off()Peter Zijlstra1-0/+1
Multiple testers reported the following new warning: WARNING: CPU: 0 PID: 0 at arch/x86/mm/tlb.c:795 Which corresponds to: if (IS_ENABLED(CONFIG_DEBUG_VM) && WARN_ON_ONCE(prev != &init_mm && !cpumask_test_cpu(cpu, mm_cpumask(next)))) cpumask_set_cpu(cpu, mm_cpumask(next)); So the problem is that unuse_temporary_mm() explicitly clears that bit; and it has to, because otherwise the flush_tlb_mm_range() in __text_poke() will try sending IPIs, which are not at all needed. See also: https://lore.kernel.org/all/20241113095550.GBZzR3pg-RhJKPDazS@fat_crate.local/ Notably, the whole {,un}use_temporary_mm() thing requires preemption to be disabled across it with the express purpose of keeping all TLB nonsense CPU local, such that invalidations can also stay local etc. However, as a side-effect, we violate this above WARN(), which sorta makes sense for the normal case, but very much doesn't make sense here. Change unuse_temporary_mm() to mark the mm_struct such that a further exception (beyond init_mm) can be grafted, to keep the warning for all the other cases. Reported-by: Chaitanya Kumar Borah <chaitanya.kumar.borah@intel.com