path: root/arch/s390/include
2025-12-14  s390/ipl: Clear SBP flag when bootprog is set  (Sven Schnelle; 1 file, -0/+1 lines)
With z16 a new flag 'search boot program' was introduced for list-directed IPL (SCSI, NVMe, ECKD DASD). If this flag is set, e.g. via selecting the "Automatic" value for the "Boot program selector" control on an HMC load panel, it is copied to the reipl structure from the initial ipl structure. When a user now sets a boot program via sysfs, the flag is not cleared and the bootloader will again automatically select the boot program, ignoring the user configuration. To avoid that, clear the SBP flag when a bootprog sysfs file is written.

Cc: stable@vger.kernel.org
Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
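A minimal sketch of the idea; the structure, field, and flag names below are illustrative assumptions, not the actual ipl code:

    /* When userspace writes an explicit boot program number, drop the
     * "search boot program" flag so the bootloader honors the explicit
     * selection. All names here are assumptions for illustration. */
    static ssize_t bootprog_store(struct kobject *kobj,
                                  struct kobj_attribute *attr,
                                  const char *buf, size_t len)
    {
            unsigned long val;

            if (kstrtoul(buf, 10, &val))
                    return -EINVAL;
            reipl_block->bootprog = val;
            reipl_block->flags &= ~IPL_FLAG_SBP;    /* assumed flag name */
            return len;
    }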
2025-12-08  s390/bug: Add missing alignment  (Heiko Carstens; 1 file, -0/+1 lines)
All objects are supposed to have a minimal alignment of two, since a couple of instructions only work with even addresses. Add the missing align statement for the file string.

Fixes: 6584ff203aec ("bugs/s390: Use 'cond_str' in __EMIT_BUG()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
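The shape of such a fix, sketched as a simplified macro (not the actual __EMIT_BUG() code):

    /* Emit a string constant with the 2-byte minimum alignment the
     * architecture expects; the macro shape is an assumption. */
    #define EMIT_FILE_STRING()                                      \
            asm volatile(                                           \
                    ".section .rodata.str,\"aMS\",@progbits,1\n"    \
                    ".balign 2\n"           /* keep even address */ \
                    "0:     .asciz \"" __FILE__ "\"\n"              \
                    ".previous\n")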
2025-12-08  s390/bug: Add missing CONFIG_BUG ifdef again  (Heiko Carstens; 1 file, -0/+4 lines)
Fall back to the generic BUG implementation in case CONFIG_BUG is disabled. This restores the old behaviour before 'cond_str' support was added. It probably doesn't matter, since nobody should disable CONFIG_BUG, but at least this is consistent with the behavior before.

Fixes: 6584ff203aec ("bugs/s390: Use 'cond_str' in __EMIT_BUG()")
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
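The usual structure for this, sketched (simplified from the common asm/bug.h pattern):

    #ifdef CONFIG_BUG
    #define HAVE_ARCH_BUG
    #define BUG()   __EMIT_BUG(0)   /* s390-specific implementation */
    #endif /* CONFIG_BUG */

    /* asm-generic provides the fallbacks when CONFIG_BUG=n */
    #include <asm-generic/bug.h>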
2025-12-07  s390/pci: Migrate s390 IRQ logic to IRQ domain API  (Tobias Schumacher; 1 file, -0/+5 lines)
s390 is one of the last architectures using the legacy API for setup and teardown of PCI MSI IRQs. Migrate the s390 IRQ allocation and teardown to the MSI parent domain API. For details, see: https://lore.kernel.org/lkml/20221111120501.026511281@linutronix.de

In detail, create an MSI parent domain for each PCI domain. When a PCI device sets up MSI or MSI-X IRQs, the library creates a per-device IRQ domain for this device, which is used by the device for allocating and freeing IRQs. The per-device domain delegates this allocation and freeing to the parent domain. In the end, the corresponding callbacks of the parent domain are responsible for allocating and freeing the IRQs.

The allocation is split into two parts:

 - zpci_msi_prepare() is called once for each device and allocates the required resources. On s390, each PCI function has its own airq vector and a summary bit, which must be configured once per function. This is done in prepare().

 - zpci_msi_alloc() can be called multiple times for allocating one or more MSI/MSI-X IRQs. This creates a mapping between the virtual IRQ number in the kernel and the hardware IRQ number.

Freeing is split into two counterparts:

 - zpci_msi_free() reverts the effects of zpci_msi_alloc(), and

 - zpci_msi_teardown() reverts the effects of zpci_msi_prepare(). This is called once when all IRQs are freed before a device is removed.

Since the parent domain in the end allocates the IRQs, the hwirq encoding must be unambiguous for all IRQs of all devices. This is achieved by encoding the hwirq using the devfn and the MSI index; a sketch follows after this entry.

Reviewed-by: Niklas Schnelle <schnelle@linux.ibm.com>
Reviewed-by: Farhan Ali <alifm@linux.ibm.com>
Signed-off-by: Tobias Schumacher <ts@linux.ibm.com>
Reviewed-by: Gerd Bayer <gbayer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
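A sketch of an unambiguous hwirq encoding of this kind; the helper name and the exact shift width are assumptions for illustration:

    /* devfn is 8 bits on PCI; shifting it above the MSI index space
     * (MSI-X allows up to 2048 vectors, i.e. 11 bits) yields a unique
     * hwirq per (device, vector) pair. */
    static inline irq_hw_number_t zpci_msi_hwirq(unsigned int devfn,
                                                 unsigned int msi_index)
    {
            return ((irq_hw_number_t)devfn << 11) | msi_index;
    }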
2025-12-07  s390/vmem: Support 2G page splitting for KASAN shadow freeing  (Vasily Gorbik; 1 file, -0/+2 lines)
Export split_pud_page() so it can be used from the vmem code and teach modify_pud_table() to split PUD-sized mappings when only a subrange needs to be removed.

If the range to be removed covers a full PUD-sized mapping, keep the existing behavior: clear the PUD entry and free the backing large page (for non-direct mappings). Otherwise, split the PUD-mapped page into PMD mappings and let the walker handle the smaller ranges.

This is needed for KASAN early shadow removal support: memory hotplug freeing the KASAN early shadow is the only expected caller that will try to free 2G PUD-mapped regions of non-direct mappings.

Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
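The decision described above, as hedged pseudo-C; apart from split_pud_page() the details are illustrative, not the actual vmem code:

    if (IS_ALIGNED(addr, PUD_SIZE) && next - addr == PUD_SIZE) {
            /* Full 2G mapping: keep the old behavior, clear the entry
             * (and free the backing large page if not direct). */
            pud_clear(pud);
    } else {
            /* Subrange only: split into PMD mappings and let the
             * walker remove the smaller ranges. */
            split_pud_page(pud, addr & PUD_MASK);
    }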
2025-12-05  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm  (Linus Torvalds; 2 files, -4/+5 lines)
Pull KVM updates from Paolo Bonzini:

"ARM:

 - Support for userspace handling of synchronous external aborts (SEAs), allowing the VMM to potentially handle the abort in a non-fatal manner

 - Large rework of the VGIC's list register handling with the goal of supporting more active/pending IRQs than available list registers in hardware. In addition, the VGIC now supports EOImode==1 style deactivations for IRQs which may occur on a separate vCPU than the one that acked the IRQ

 - Support for FEAT_XNX (user / privileged execute permissions) and FEAT_HAF (hardware update to the Access Flag) in the software page table walkers and shadow MMU

 - Allow page table destruction to reschedule, fixing long need_resched latencies observed when destroying a large VM

 - Minor fixes to KVM and selftests

LoongArch:

 - Get VM PMU capability from HW GCFG register

 - Add AVEC basic support

 - Use 64-bit register definition for EIOINTC

 - Add KVM timer test cases for tools/selftests

RISC-V:

 - SBI message passing (MPXY) support for KVM guest

 - Give a new, more specific error subcode for the case when in-kernel AIA virtualization fails to allocate IMSIC VS-file

 - Support KVM_DIRTY_LOG_INITIALLY_SET, enabling dirty log gradually in small chunks

 - Fix guest page fault within HLV* instructions

 - Flush VS-stage TLB after VCPU migration for Andes cores

s390:

 - Always allocate ESCA (Extended System Control Area), instead of starting with the basic SCA and converting to ESCA with the addition of the 65th vCPU. The price is an increased number of exits (and worse performance) on z10 and earlier processors; ESCA was introduced by z114/z196 in 2010

 - VIRT_XFER_TO_GUEST_WORK support

 - Operation exception forwarding support

 - Cleanups

x86:

 - Skip the costly "zap all SPTEs" on an MMIO generation wrap if MMIO SPTE caching is disabled, as there can't be any relevant SPTEs to zap

 - Relocate a misplaced export

 - Fix an async #PF bug where KVM would clear the completion queue when the guest transitioned in and out of paging mode, e.g. when handling an SMI and then returning to paged mode via RSM

 - Leave KVM's user-return notifier registered even when disabling virtualization, as long as kvm.ko is loaded. On reboot/shutdown, keeping the notifier registered is ok; the kernel does not use the MSRs and the callback will run cleanly and restore host MSRs if the CPU manages to return to userspace before the system goes down

 - Use the checked version of {get,put}_user()

 - Fix a long-lurking bug where KVM's lack of catch-up logic for periodic APIC timers can result in a hard lockup in the host

 - Revert the periodic kvmclock sync logic now that KVM doesn't use a clocksource that's subject to NTP corrections

 - Clean up KVM's handling of MMIO Stale Data and L1TF, and bury the latter behind CONFIG_CPU_MITIGATIONS

 - Context switch XCR0, XSS, and PKRU outside of the entry/exit fast path; the only reason they were handled in the fast path was to paper over a bug in the core #MC code, and that has long since been fixed

 - Add emulator support for AVX MOV instructions, to play nice with emulated devices whose guest drivers like to access PCI BARs with large multi-byte instructions

x86 (AMD):

 - Fix a few missing "VMCB dirty" bugs

 - Fix the worst of KVM's lack of EFER.LMSLE emulation

 - Add AVIC support for addressing 4k vCPUs in x2AVIC mode

 - Fix incorrect handling of selective CR0 writes when checking intercepts during emulation of L2 instructions

 - Fix a currently-benign bug where KVM would clobber SPEC_CTRL[63:32] on VMRUN and #VMEXIT

 - Fix a bug where KVM would corrupt the guest code stream when re-injecting a soft interrupt if the guest patched the underlying code after the VM-Exit, e.g. when Linux patches code with a temporary INT3

 - Add KVM_X86_SNP_POLICY_BITS to advertise supported SNP policy bits to userspace, and extend KVM "support" to all policy bits that don't require any actual support from KVM

x86 (Intel):

 - Use the root role from kvm_mmu_page to construct EPTPs instead of the current vCPU state, partly as worthwhile cleanup, but mostly to pave the way for tracking per-root TLB flushes, and elide EPT flushes on pCPU migration if the root is clean from a previous flush

 - Add a few missing nested consistency checks

 - Rip out support for doing "early" consistency checks via hardware as the functionality hasn't been used in years and is no longer useful in general; replace it with an off-by-default module param to WARN if hardware fails a check that KVM does not perform

 - Fix a currently-benign bug where KVM would drop the guest's SPEC_CTRL[63:32] on VM-Enter

 - Misc cleanups

 - Overhaul the TDX code to address systemic races where KVM (acting on behalf of userspace) could inadvertently trigger lock contention in the TDX-Module; KVM was either working around these in weird, ugly ways, or was simply oblivious to them (though even Yan's devilish selftests could only break individual VMs, not the host kernel)

 - Fix a bug where KVM could corrupt a vCPU's cpu_list when freeing a TDX vCPU, if creating said vCPU failed partway through

 - Fix a few sparse warnings (bad annotation, 0 != NULL)

 - Use struct_size() to simplify copying TDX capabilities to userspace

 - Fix a bug where TDX would effectively corrupt user-return MSR values if the TDX Module rejects VP.ENTER and thus doesn't clobber host MSRs as expected

Selftests:

 - Fix a math goof in mmu_stress_test when running on a single-CPU system/VM

 - Forcefully override ARCH from x86_64 to x86 to play nice with specifying ARCH=x86_64 on the command line

 - Extend a bunch of nested VMX tests to validate nested SVM as well

 - Add support for LA57 in the core VM_MODE_xxx macro, and add a test to verify KVM can save/restore nested VMX state when L1 is using 5-level paging, but L2 is not

 - Clean up the guest paging code in anticipation of sharing the core logic for nested EPT and nested NPT

guest_memfd:

 - Add NUMA mempolicy support for guest_memfd, and clean up a variety of rough edges in guest_memfd along the way

 - Define a CLASS to automatically handle get+put when grabbing a guest_memfd from a memslot to make it harder to leak references

 - Enhance KVM selftests to make it easier to develop and debug selftests like those added for guest_memfd NUMA support, e.g. where test and/or KVM bugs often result in hard-to-debug SIGBUS errors

 - Misc cleanups

Generic:

 - Use the recently-added WQ_PERCPU when creating the per-CPU workqueue for irqfd cleanup

 - Fix a goof in the dirty ring documentation

 - Fix choice of target for directed yield across different calls to kvm_vcpu_on_spin(); the function was always starting from the first vCPU instead of continuing the round-robin search"

* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (260 commits)
  KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS
  KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2
  KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX}
  KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected"
  KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot()
  KVM: arm64: Add endian casting to kvm_swap_s[12]_desc()
  KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n
  KVM: arm64: selftests: Add test for AT emulation
  KVM: arm64: nv: Expose hardware access flag management to NV guests
  KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW
  KVM: arm64: Implement HW access flag management in stage-1 SW PTW
  KVM: arm64: Propagate PTW errors up to AT emulation
  KVM: arm64: Add helper for swapping guest descriptor
  KVM: arm64: nv: Use pgtable definitions in stage-2 walk
  KVM: arm64: Handle endianness in read helper for emulated PTW
  KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW
  KVM: arm64: Call helper for reading descriptors directly
  KVM: arm64: nv: Advertise support for FEAT_XNX
  KVM: arm64: Teach ptdump about FEAT_XNX permissions
  KVM: s390: Use generic VIRT_XFER_TO_GUEST_WORK functions
  ...
2025-12-02  Merge tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux  (Linus Torvalds; 33 files, -642/+295 lines)
Pull s390 updates from Heiko Carstens:

 - Provide a new interface for dynamic configuration and deconfiguration of hotplug memory, with and without memmap_on_memory support. This makes the way memory hotplug is handled on s390 much more similar to other architectures

 - Remove compat support. There shouldn't be any compat user space around anymore, therefore get rid of a lot of code which also doesn't need to be tested anymore

 - Add stackprotector support. GCC 16 will get new compiler options which allow generating the code required for kernel stackprotector support

 - Merge the pai_crypto and pai_ext PMU drivers into a new driver. This removes a lot of duplicated code. The new driver is also extendable and allows supporting new PMUs

 - Add driver override support for AP queues

 - Rework and extend zcrypt and AP trace events to allow for tracing of crypto requests

 - Support block sizes larger than 65535 bytes for CCW tape devices

 - Since the rework of the virtual kernel address space the module area and the kernel image are within the same 4GB area. This eliminates the need for weak per cpu variables. Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU

 - Various other small improvements and fixes

* tag 's390-6.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (92 commits)
  watchdog: diag288_wdt: Remove KMSG_COMPONENT macro
  s390/entry: Use lay instead of aghik
  s390/vdso: Get rid of -m64 flag handling
  s390/vdso: Rename vdso64 to vdso
  s390: Rename head64.S to head.S
  s390/vdso: Use common STABS_DEBUG and DWARF_DEBUG macros
  s390: Add stackprotector support
  s390/modules: Simplify module_finalize() slightly
  s390: Remove KMSG_COMPONENT macro
  s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU
  s390/ap: Restrict driver_override versus apmask and aqmask use
  s390/ap: Rename mutex ap_perms_mutex to ap_attr_mutex
  s390/ap: Support driver_override for AP queue devices
  s390/ap: Use all-bits-one apmask/aqmask for vfio in_use() checks
  s390/debug: Update description of resize operation
  s390/syscalls: Switch to generic system call table generation
  s390/syscalls: Remove system call table pointer from thread_struct
  s390/uapi: Remove 31 bit support from uapi header files
  s390: Remove compat support
  tools: Remove s390 compat support
  ...
2025-12-02  Merge tag 'kvm-s390-next-6.19-1' of https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD  (Paolo Bonzini; 2 files, -4/+5 lines)
 - SCA rework
 - VIRT_XFER_TO_GUEST_WORK support
 - Operation exception forwarding support
 - Cleanups
2025-12-02  Merge tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -2/+2 lines)
Pull scoped user access updates from Thomas Gleixner:
"Scoped user mode access and related changes:

 - Implement the missing u64 user access function on ARM when CONFIG_CPU_SPECTRE=n. This makes it possible to access a 64bit value in generic code with [unsafe_]get_user(). All other architectures and ARM variants provide the relevant accessors already.

 - Ensure that ASM GOTO jump label usage in the user mode access helpers always goes through a local C scope label indirection inside the helpers. This is required because compilers do not support an ASM GOTO target which leaves an auto cleanup scope: GCC silently fails to emit the cleanup invocation and clang fails the build.

   [ Editor's note: gcc-16 will have fixed the code generation issue in commit f68fe3ddda4 ("eh: Invoke cleanups/destructors in asm goto jumps [PR122835]"). But we obviously have to deal with clang and older versions of gcc, so.. - Linus ]

   This provides generic wrapper macros and the conversion of affected architecture code to use them.

 - Scoped user mode access with auto cleanup.

   Access to user mode memory can be required in hot code paths, but if it has to be done with user controlled pointers, the access is shielded with a speculation barrier, so that the CPU cannot speculate around the address range check. Those speculation barriers impact performance quite significantly.

   This cost can be avoided by "masking" the provided pointer so it is guaranteed to be in the valid user memory access range and otherwise to point to a guaranteed unpopulated address space. This has to be done without branches so it creates an address dependency for the access, which the CPU cannot speculate ahead. This results in repeating and error prone programming patterns:

        if (can_do_masked_user_access())
                from = masked_user_read_access_begin((from));
        else if (!user_read_access_begin(from, sizeof(*from)))
                return -EFAULT;
        unsafe_get_user(val, from, Efault);
        user_read_access_end();
        return 0;
   Efault:
        user_read_access_end();
        return -EFAULT;

   which can be replaced with scopes and automatic cleanup:

        scoped_user_read_access(from, Efault)
                unsafe_get_user(val, from, Efault);
        return 0;
   Efault:
        return -EFAULT;

 - Convert code which implements the above pattern over to scoped_user_*_access(). This also corrects a couple of imbalanced masked_*_begin() instances which are harmless on most architectures, but prevent PowerPC from implementing the masking optimization.

 - Add a missing speculation barrier in copy_from_user_iter()"

* tag 'core-uaccess-2025-11-30' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  lib/strn*,uaccess: Use masked_user_{read/write}_access_begin when required
  scm: Convert put_cmsg() to scoped user access
  iov_iter: Add missing speculation barrier to copy_from_user_iter()
  iov_iter: Convert copy_from_user_iter() to masked user access
  select: Convert to scoped user access
  x86/futex: Convert to scoped user access
  futex: Convert to get/put_user_inline()
  uaccess: Provide put/get_user_inline()
  uaccess: Provide scoped user access regions
  arm64: uaccess: Use unsafe wrappers for ASM GOTO
  s390/uaccess: Use unsafe wrappers for ASM GOTO
  riscv/uaccess: Use unsafe wrappers for ASM GOTO
  powerpc/uaccess: Use unsafe wrappers for ASM GOTO
  x86/uaccess: Use unsafe wrappers for ASM GOTO
  uaccess: Provide ASM GOTO safe wrappers for unsafe_*_user()
  ARM: uaccess: Implement missing __get_user_asm_dword()
2025-12-01  Merge tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -58/+44 lines)
Pull bug handling infrastructure updates from Ingo Molnar:
"Core updates:

 - Improve WARN(), which has vararg printf like arguments, to work with the x86 #UD based WARN-optimizing infrastructure by hiding the format in the bug_table and replacing this first argument with the address of the bug-table entry, while making the actual function that's called a UD1 instruction (Peter Zijlstra)

 - Introduce the CONFIG_DEBUG_BUGVERBOSE_DETAILED Kconfig switch (Ingo Molnar, s390 support by Heiko Carstens)

Fixes and cleanups:

 - bugs/s390: Remove private WARN_ON() implementation (Heiko Carstens)

 - <asm/bugs.h>: Make i386 use GENERIC_BUG_RELATIVE_POINTERS (Peter Zijlstra)"

* tag 'core-bugs-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (31 commits)
  x86/bugs: Make i386 use GENERIC_BUG_RELATIVE_POINTERS
  x86/bug: Fix BUG_FORMAT vs KASLR
  x86_64/bug: Inline the UD1
  x86/bug: Implement WARN_ONCE()
  x86_64/bug: Implement __WARN_printf()
  x86/bug: Use BUG_FORMAT for DEBUG_BUGVERBOSE_DETAILED
  x86/bug: Add BUG_FORMAT basics
  bug: Allow architectures to provide __WARN_printf()
  bug: Implement WARN_ON() using __WARN_FLAGS()
  bug: Add report_bug_entry()
  bug: Add BUG_FORMAT_ARGS infrastructure
  bug: Clean up CONFIG_GENERIC_BUG_RELATIVE_POINTERS
  bug: Add BUG_FORMAT infrastructure
  x86: Rework __bug_table helpers
  bugs/s390: Remove private WARN_ON() implementation
  bugs/core: Reorganize fields in the first line of WARNING output, add ->comm[] output
  bugs/sh: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/parisc: Concatenate 'cond_str' with '__FILE__' in __WARN_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/riscv: Concatenate 'cond_str' with '__FILE__' in __BUG_FLAGS(), to extend WARN_ON/BUG_ON output
  bugs/riscv: Pass in 'cond_str' to __BUG_FLAGS()
  ...
2025-12-01  Merge tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  (Linus Torvalds; 1 file, -1/+1 lines)
Pull objtool updates from Ingo Molnar:

 - klp-build livepatch module generation (Josh Poimboeuf)

   Introduce new objtool features and a klp-build script to generate livepatch modules using a source .patch as input.

   This builds on concepts from the longstanding out-of-tree kpatch project which began in 2012 and has been used for many years to generate livepatch modules for production kernels. However, this is a complete rewrite which incorporates hard-earned lessons from 12+ years of maintaining kpatch. Key improvements compared to kpatch-build:

    - Integrated with objtool: Leverages objtool's existing control-flow graph analysis to help detect changed functions.
    - Works on vmlinux.o: Supports late-linked objects, making it compatible with LTO, IBT, and similar.
    - Simplified code base: ~3k fewer lines of code.
    - Upstream: No more out-of-tree #ifdef hacks, far less cruft.
    - Cleaner internals: Vastly simplified logic for symbol/section/reloc inclusion and special section extraction.
    - Robust __LINE__ macro handling: Avoids false positive binary diffs caused by the __LINE__ macro by introducing a fix-patch-lines script which injects #line directives into the source .patch to preserve the original line numbers at compile time.

 - Disassemble code with libopcodes instead of running objdump (Alexandre Chartre)

 - Disassemble support (-d option to objtool) by Alexandre Chartre, which supports the decoding of various Linux kernel code generation specials such as alternatives:

     17ef: sched_balance_find_dst_group+0x62f   mov 0x34(%r9),%edx
     17f3: sched_balance_find_dst_group+0x633   | <alternative.17f3> | X86_FEATURE_POPCNT
     17f3: sched_balance_find_dst_group+0x633   | call 0x17f8 <__sw_hweight64> | popcnt %rdi,%rax
     17f8: sched_balance_find_dst_group+0x638   cmp %eax,%edx
     ...

   jump table alternatives:

     1895: sched_use_asym_prio+0x5    test $0x8,%ch
     1898: sched_use_asym_prio+0x8    je 0x18a9 <sched_use_asym_prio+0x19>
     189a: sched_use_asym_prio+0xa    | <jump_table.189a> | JUMP
     189a: sched_use_asym_prio+0xa    | jmp 0x18ae <sched_use_asym_prio+0x1e> | nop2
     189c: sched_use_asym_prio+0xc    mov $0x1,%eax
     18a1: sched_use_asym_prio+0x11   and $0x80,%ecx
     ...

   exception table alternatives:

     native_read_msr:
     5b80: native_read_msr+0x0   mov %edi,%ecx
     5b82: native_read_msr+0x2   | <ex_table.5b82> | EXCEPTION
     5b82: native_read_msr+0x2   | rdmsr | resume at 0x5b84 <native_read_msr+0x4>
     5b84: native_read_msr+0x4   shl $0x20,%rdx
     ...

   x86 feature flag decoding (also see the X86_FEATURE_POPCNT example in sched_balance_find_dst_group() above):

     2faaf: start_thread_common.constprop.0+0x1f   jne 0x2fba4 <start_thread_common.constprop.0+0x114>
     2fab5: start_thread_common.constprop.0+0x25   | <alternative.2fab5> | X86_FEATURE_ALWAYS | X86_BUG_NULL_SEG
     2fab5: start_thread_common.constprop.0+0x25   | jmp 0x2faba <.altinstr_aux+0x2f4> | jmp 0x4b0 <start_thread_common.constprop.0+0x3f> | nop5
     2faba: start_thread_common.constprop.0+0x2a   mov $0x2b,%eax
     ...

   NOP sequence shortening:

     1048e2: snapshot_write_finalize+0xc2   je 0x104917 <snapshot_write_finalize+0xf7>
     1048e4: snapshot_write_finalize+0xc4   nop6
     1048ea: snapshot_write_finalize+0xca   nop11
     1048f5: snapshot_write_finalize+0xd5   nop11
     104900: snapshot_write_finalize+0xe0   mov %rax,%rcx
     104903: snapshot_write_finalize+0xe3   mov 0x10(%rdx),%rax

   and much more.

 - Function validation tracing support (Alexandre Chartre)

 - Various -ffunction-sections fixes (Josh Poimboeuf)

 - Clang AutoFDO (Automated Feedback-Directed Optimizations) support (Josh Poimboeuf)

 - Misc fixes and cleanups (Borislav Petkov, Chen Ni, Dylan Hatch, Ingo Molnar, John Wang, Josh Poimboeuf, Pankaj Raghav, Peter Zijlstra, Thorsten Blum)

* tag 'objtool-core-2025-12-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (129 commits)
  objtool: Fix segfault on unknown alternatives
  objtool: Build with disassembly can fail when including bdf.h
  objtool: Trim trailing NOPs in alternative
  objtool: Add wide output for disassembly
  objtool: Compact output for alternatives with one instruction
  objtool: Improve naming of group alternatives
  objtool: Add Function to get the name of a CPU feature
  objtool: Provide access to feature and flags of group alternatives
  objtool: Fix address references in alternatives
  objtool: Disassemble jump table alternatives
  objtool: Disassemble exception table alternatives
  objtool: Print addresses with alternative instructions
  objtool: Disassemble group alternatives
  objtool: Print headers for alternatives
  objtool: Preserve alternatives order
  objtool: Add the --disas=<function-pattern> action
  objtool: Do not validate IBT for .return_sites and .call_sites
  objtool: Improve tracing of alternative instructions
  objtool: Add functions to better name alternatives
  objtool: Identify the different types of alternatives
  ...
2025-11-27  KVM: s390: Enable and disable interrupts in entry code  (Heiko Carstens; 1 file, -0/+1 lines)
Move enabling and disabling of interrupts around the SIE instruction to entry code. Enabling interrupts only after the __TI_sie flag has been set guarantees that the SIE instruction is not executed if an interrupt happens between enabling interrupts and the execution of the SIE instruction. Interrupt handlers and the machine check handler forward the PSW to the sie_exit label in such cases.

This is a prerequisite for VIRT_XFER_TO_GUEST_WORK to prevent guest context from being entered when e.g. a scheduler IPI, indicating that a reschedule is required, happens right before the SIE instruction, which could lead to long delays.

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Tested-by: Andrew Donnellan <ajd@linux.ibm.com>
Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
2025-11-27  KVM: s390: Add signal_exits counter  (Andrew Donnellan; 1 file, -0/+1 lines)
Add a signal_exits counter for s390, as exists on arm64, loongarch, mips, powerpc, riscv and x86. This is used by kvm_handle_signal_exit(), which we will use when we later enable CONFIG_VIRT_XFER_TO_GUEST_WORK.

Signed-off-by: Andrew Donnellan <ajd@linux.ibm.com>
Reviewed-by: Janosch Frank <frankja@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
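For reference, the generic helper that updates this counter looks roughly like this (simplified from include/linux/kvm_host.h):

    static inline void kvm_handle_signal_exit(struct kvm_vcpu *vcpu)
    {
            vcpu->run->exit_reason = KVM_EXIT_INTR;
            vcpu->stat.signal_exits++;
    }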
2025-11-25  s390/vdso: Rename vdso64 to vdso  (Heiko Carstens; 1 file, -2/+2 lines)
Since compat is gone there is only a 64 bit vdso left. Remove the superfluous "64" suffix everywhere.

Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-24  s390: Add stackprotector support  (Heiko Carstens; 3 files, -1/+43 lines)
Stackprotector support was previously unavailable on s390 because by default compilers generate code which is not suitable for the kernel: the canary value is accessed via thread local storage, where the address of thread local storage is within access registers 0 and 1. Using those registers also for the kernel would come with a significant performance impact and more complicated kernel entry/exit code, since access register contents would have to be exchanged on every kernel entry and exit.

With the upcoming gcc 16 release new compiler options will become available which allow generating code suitable for the kernel. [1]

Compiler option -mstack-protector-guard=global instructs gcc to generate stackprotector code that refers to a global stackprotector canary value via symbol __stack_chk_guard. Access to this value is guaranteed to occur via larl and lgrl instructions. Furthermore, compiler option -mstack-protector-guard-record generates a section containing all code addresses that reference the canary value.

To allow for per task canary values the instructions which load the address of __stack_chk_guard are patched so they access a lowcore field instead: a per task canary value is available within the task_struct of each task, and is written to the per-cpu lowcore location on each context switch.

Also add sanity checks and a debugging option to be consistent with other kernel code patching mechanisms. Full debugging output can be enabled with the following kernel command line options:

  debug_stackprotector bootdebug ignore_loglevel earlyprintk dyndbg="file stackprotector.c +p"

Example debug output:

  stackprot: 0000021e402d4eda: c010005a9ae3 -> c01f00070240

where "<insn address>: <old insn> -> <new insn>".

[1] gcc commit 0cd1f03939d5 ("s390: Support global stack protector")

Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
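A sketch of the per-task canary propagation described above; the lowcore field name is an assumption for illustration:

    /* On context switch, publish the next task's canary so the patched
     * larl/lgrl references pick it up via the per-cpu lowcore. */
    if (IS_ENABLED(CONFIG_STACKPROTECTOR))
            get_lowcore()->stack_canary = next->stack_canary;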
2025-11-24  s390/percpu: Get rid of ARCH_MODULE_NEEDS_WEAK_PER_CPU  (Heiko Carstens; 1 file, -8/+0 lines)
Since the rework of the kernel virtual address space [1] the module area and the kernel image are within the same 4GB area. Therefore there is no need for the weak per cpu workaround for modules anymore. Remove it.

[1] commit c98d2ecae08f ("s390/mm: Uncouple physical vs virtual address spaces")

Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-21  Merge branch 'objtool/core'  (Peter Zijlstra; 26 files, -198/+232 lines)
Bring in the UDB and objtool data annotations to avoid conflicts while further extending the bug exceptions.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
2025-11-21  KVM: s390: Add capability that forwards operation exceptions  (Janosch Frank; 1 file, -0/+1 lines)
Setting KVM_CAP_S390_USER_OPEREXEC will forward all operation exceptions to user space. This also includes the 0x0000 instructions managed by KVM_CAP_S390_USER_INSTR0. It's helpful if user space wants to emulate instructions which do not (yet) have an opcode.

While we're at it, refine the documentation for KVM_CAP_S390_USER_INSTR0.

Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
Reviewed-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Janosch Frank <frankja@linux.ibm.com>
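From userspace, such a VM capability is enabled with the standard KVM_ENABLE_CAP ioctl; a minimal sketch:

    #include <linux/kvm.h>
    #include <sys/ioctl.h>

    static int enable_operexec(int vm_fd)
    {
            struct kvm_enable_cap cap = {
                    .cap = KVM_CAP_S390_USER_OPEREXEC,
            };
            /* Afterwards operation exceptions are forwarded to user
             * space instead of being handled in the kernel. */
            return ioctl(vm_fd, KVM_ENABLE_CAP, &cap);
    }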
2025-11-17  s390/syscalls: Switch to generic system call table generation  (Heiko Carstens; 1 file, -1/+2 lines)
The s390 syscall.tbl format differs slightly from most others, and therefore requires an s390 specific system call table generation script. With compat support gone, use the opportunity to switch to generic system call table generation. The ABI for all 64 bit system calls is now common, since there is no need anymore to specify whether system call entry points are only for 64 bit.

Furthermore, create the system call table in C instead of assembler code in order to get type checking for all system call functions contained within the table.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
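The general shape of such a C table, sketched; the generated header name is an assumption, and details differ from the real patch:

    /* Unimplemented slots fall back to sys_ni_syscall; the generated
     * entries then override individual slots with typed pointers,
     * which is what makes the compiler type-check every entry. */
    #define __SYSCALL(nr, entry) [nr] = entry,
    const sys_call_ptr_t sys_call_table[__NR_syscalls] = {
            [0 ... __NR_syscalls - 1] = sys_ni_syscall,
    #include <asm/syscall_table.h>  /* assumed generated header */
    };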
2025-11-17  s390/syscalls: Remove system call table pointer from thread_struct  (Heiko Carstens; 2 files, -2/+0 lines)
With compat support gone there is only one system call table left. Therefore remove the sys_call_table pointer from thread_struct and use the sys_call_table directly.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17  s390/uapi: Remove 31 bit support from uapi header files  (Heiko Carstens; 7 files, -233/+0 lines)
Since the kernel does not support running 31 bit / compat binaries anymore, also remove the corresponding 31 bit support from uapi header files.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17  s390: Remove compat support  (Heiko Carstens; 10 files, -335/+12 lines)
There shouldn't be any 31 bit code around anymore that matters. Remove the compat layer support required to run 31 bit code. Reason for removal is code simplification and reduced test effort.

Note that this comes without any deprecation warnings added to config options, or kernel messages, since most likely those would be ignored anyway. If it turns out there is still a reason to keep the compat layer this can be reverted at any time in the future.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-17  s390/syscalls: Add pt_regs parameter to SYSCALL_DEFINE0() syscall wrapper  (Heiko Carstens; 1 file, -8/+8 lines)
All system call wrappers should match the sys_call_ptr_t type. This is not the case for system calls without parameters. Add the missing pt_regs parameter there too.

Note: this is currently not a problem, since the parameter is unused. However, it prevents creating a correctly typed system call table in C. With the current assembler implementation this works because of missing type checking.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
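Sketched effect of the change (simplified; the real macro also emits metadata). The __s390x_sys_ prefix is taken from the call traces quoted elsewhere in this log:

    /* before: no parameter, does not match sys_call_ptr_t */
    long __s390x_sys_getpid(void);

    /* after: unused but type-correct pt_regs parameter */
    long __s390x_sys_getpid(struct pt_regs *regs);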
2025-11-17  s390/ptrace: Rename psw_t32 to psw32_t  (Heiko Carstens; 1 file, -1/+1 lines)
Use a standard "_t" suffix for psw_t32 and rename it to psw32_t.

Reviewed-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14  s390/mm: Fix __ptep_rdp() inline assembly  (Heiko Carstens; 1 file, -7/+5 lines)
When a zero ASCE is passed to the __ptep_rdp() inline assembly, the generated instruction should have the R3 field of the instruction set to zero. However the inline assembly is written incorrectly: for such cases a zero is loaded into a register allocated by the compiler and this register is then used by the instruction. This means that selected TLB entries may not be flushed since the specified ASCE does not match the one which was used when the selected TLB entries were created.

Fix this by removing the asce and opt parameters of __ptep_rdp(), since all callers always pass zero, and use a hard-coded register zero for the R3 field.

Fixes: 0807b856521f ("s390/mm: add support for RDP (Reset DAT-Protection)")
Cc: stable@vger.kernel.org
Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
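A sketch of the fixed inline assembly, close to but not necessarily identical with the actual patch; operand details are assumptions:

    static inline void __ptep_rdp(unsigned long addr, pte_t *ptep)
    {
            unsigned long pto;

            pto = __pa(ptep) & ~(PTRS_PER_PTE * sizeof(pte_t) - 1);
            /* R3 field hard-coded to zero: no ASCE comparison */
            asm volatile(
                    "       .insn   rrf,0xb98b0000,%[r1],%[r2],0,0\n"
                    : "+m" (*ptep)
                    : [r1] "a" (pto), [r2] "a" (addr & PAGE_MASK));
    }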
2025-11-14  s390/fault: Print unmodified PSW address on protection exception  (Heiko Carstens; 1 file, -0/+2 lines)
In case of a kernel crash caused by a protection exception, print the unmodified PSW address as reported by the CPU. The protection exception handler modifies the PSW address in order to keep fault handling easy, however that leads to misleading call traces. Therefore restore the original PSW address before printing it.

Before this change the output in case of a protection exception looks like this:

  Oops: 0004 ilc:2 [#1]SMP
  Krnl PSW : 0704c00180000000 000003ffe0b40d78 (sysrq_handle_crash+0x28/0x40)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
  ...
  Krnl Code: 000003ffe0b40d66: e3e0f0980024   stg   %r14,152(%r15)
             000003ffe0b40d6c: c010fffffff2   larl  %r1,000003ffe0b40d50
            #000003ffe0b40d72: c0200046b6bc   larl  %r2,000003ffe1417aea
            >000003ffe0b40d78: 92021000       mvi   0(%r1),2
             000003ffe0b40d7c: c0e5ffae03d6   brasl %r14,000003ffe0101528

With this change it looks like this:

  Oops: 0004 ilc:2 [#1]SMP
  Krnl PSW : 0704c00180000000 000003ffe0b40dfc (sysrq_handle_crash+0x2c/0x40)
             R:0 T:1 IO:1 EX:1 Key:0 M:1 W:0 P:0 AS:3 CC:0 PM:0 RI:0 EA:3
  ...
  Krnl Code: 000003ffe0b40dec: c010fffffff2   larl  %r1,000003ffe0b40dd0
             000003ffe0b40df2: c0200046b67c   larl  %r2,000003ffe1417aea
            *000003ffe0b40df8: 92021000       mvi   0(%r1),2
            >000003ffe0b40dfc: c0e5ffae03b6   brasl %r14,000003ffe0101568
             000003ffe0b40e02: 0707           bcr   0,%r7

Note that with this change the PSW address points to the instruction behind the instruction which caused the exception, like it is expected for protection exceptions. This also replaces the '#' marker in the disassembly with '*', which allows distinguishing between new and old behavior.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14  s390/processor: Add __forward_psw() helper  (Heiko Carstens; 1 file, -2/+7 lines)
Similar to __rewind_psw() add the counterpart __forward_psw(). This helps to make code more readable if a PSW address has to be forwarded, since it is more natural to write

  addr = __forward_psw(psw, ilen);

instead of

  addr = __rewind_psw(psw, -ilen);

This also renames the ilc parameter of __rewind_psw() to ilen, since the parameter reflects an instruction length, and not an instruction length code. Also change the type of ilen from unsigned long to long so it reflects that lengths can be negative or positive.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
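The helper pair then looks roughly like this, sketched from the description above:

    static inline unsigned long __rewind_psw(psw_t psw, long ilen)
    {
            unsigned long mask;

            /* Address arithmetic wraps within the current addressing mode. */
            mask = (psw.mask & PSW_MASK_EA) ? -1UL :
                   (psw.mask & PSW_MASK_BA) ? (1UL << 31) - 1 :
                                              (1UL << 24) - 1;
            return (psw.addr - ilen) & mask;
    }

    static inline unsigned long __forward_psw(psw_t psw, long ilen)
    {
            return __rewind_psw(psw, -ilen);
    }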
2025-11-14  s390/fpu: Fix false-positive kmsan report in fpu_vstl()  (Aleksei Nikiforov; 1 file, -0/+3 lines)
A false-positive kmsan report is detected when running the ping command.

The inline assembly instruction 'vstl' can write a varying number of bytes depending on the value of its 'index' argument. If 'index' > 0, 'vstl' writes at least 2 bytes. clang generates kmsan write helper calls depending on inline assembly constraints. Constraints are evaluated at compile time, but the value of the 'index' argument is known only at runtime. clang currently generates a call to __msan_instrument_asm_store with 1 byte as size.

Manually call the kmsan function to indicate the correct number of bytes written and fix the false-positive report.

This change fixes the following kmsan reports:

  [   36.563119] =====================================================
  [   36.563594] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
  [   36.563852]  virtqueue_add+0x35c6/0x7c70
  [   36.564016]  virtqueue_add_outbuf+0xa0/0xb0
  [   36.564266]  start_xmit+0x288c/0x4a20
  [   36.564460]  dev_hard_start_xmit+0x302/0x900
  [   36.564649]  sch_direct_xmit+0x340/0xea0
  [   36.564894]  __dev_queue_xmit+0x2e94/0x59b0
  [   36.565058]  neigh_resolve_output+0x936/0xb40
  [   36.565278]  __neigh_update+0x2f66/0x3a60
  [   36.565499]  neigh_update+0x52/0x60
  [   36.565683]  arp_process+0x1588/0x2de0
  [   36.565916]  NF_HOOK+0x1da/0x240
  [   36.566087]  arp_rcv+0x3e4/0x6e0
  [   36.566306]  __netif_receive_skb_list_core+0x1374/0x15a0
  [   36.566527]  netif_receive_skb_list_internal+0x1116/0x17d0
  [   36.566710]  napi_complete_done+0x376/0x740
  [   36.566918]  virtnet_poll+0x1bae/0x2910
  [   36.567130]  __napi_poll+0xf4/0x830
  [   36.567294]  net_rx_action+0x97c/0x1ed0
  [   36.567556]  handle_softirqs+0x306/0xe10
  [   36.567731]  irq_exit_rcu+0x14c/0x2e0
  [   36.567910]  do_io_irq+0xd4/0x120
  [   36.568139]  io_int_handler+0xc2/0xe8
  [   36.568299]  arch_cpu_idle+0xb0/0xc0
  [   36.568540]  arch_cpu_idle+0x76/0xc0
  [   36.568726]  default_idle_call+0x40/0x70
  [   36.568953]  do_idle+0x1d6/0x390
  [   36.569486]  cpu_startup_entry+0x9a/0xb0
  [   36.569745]  rest_init+0x1ea/0x290
  [   36.570029]  start_kernel+0x95e/0xb90
  [   36.570348]  startup_continue+0x2e/0x40
  [   36.570703]
  [   36.570798] Uninit was created at:
  [   36.571002]  kmem_cache_alloc_node_noprof+0x9e8/0x10e0
  [   36.571261]  kmalloc_reserve+0x12a/0x470
  [   36.571553]  __alloc_skb+0x310/0x860
  [   36.571844]  __ip_append_data+0x483e/0x6a30
  [   36.572170]  ip_append_data+0x11c/0x1e0
  [   36.572477]  raw_sendmsg+0x1c8c/0x2180
  [   36.572818]  inet_sendmsg+0xe6/0x190
  [   36.573142]  __sys_sendto+0x55e/0x8e0
  [   36.573392]  __s390x_sys_socketcall+0x19ae/0x2ba0
  [   36.573571]  __do_syscall+0x12e/0x240
  [   36.573823]  system_call+0x6e/0x90
  [   36.573976]
  [   36.574017] Byte 35 of 98 is uninitialized
  [   36.574082] Memory access of size 98 starts at 0000000007aa0012
  [   36.574218]
  [   36.574325] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G B N 6.17.0-dirty #16 NONE
  [   36.574541] Tainted: [B]=BAD_PAGE, [N]=TEST
  [   36.574617] Hardware name: IBM 3931 A01 703 (KVM/Linux)
  [   36.574755] =====================================================

  [   63.532541] =====================================================
  [   63.533639] BUG: KMSAN: uninit-value in virtqueue_add+0x35c6/0x7c70
  [   63.533989]  virtqueue_add+0x35c6/0x7c70
  [   63.534940]  virtqueue_add_outbuf+0xa0/0xb0
  [   63.535861]  start_xmit+0x288c/0x4a20
  [   63.536708]  dev_hard_start_xmit+0x302/0x900
  [   63.537020]  sch_direct_xmit+0x340/0xea0
  [   63.537997]  __dev_queue_xmit+0x2e94/0x59b0
  [   63.538819]  neigh_resolve_output+0x936/0xb40
  [   63.539793]  ip_finish_output2+0x1ee2/0x2200
  [   63.540784]  __ip_finish_output+0x272/0x7a0
  [   63.541765]  ip_finish_output+0x4e/0x5e0
  [   63.542791]  ip_output+0x166/0x410
  [   63.543771]  ip_push_pending_frames+0x1a2/0x470
  [   63.544753]  raw_sendmsg+0x1f06/0x2180
  [   63.545033]  inet_sendmsg+0xe6/0x190
  [   63.546006]  __sys_sendto+0x55e/0x8e0
  [   63.546859]  __s390x_sys_socketcall+0x19ae/0x2ba0
  [   63.547730]  __do_syscall+0x12e/0x240
  [   63.548019]  system_call+0x6e/0x90
  [   63.548989]
  [   63.549779] Uninit was created at:
  [   63.550691]  kmem_cache_alloc_node_noprof+0x9e8/0x10e0
  [   63.550975]  kmalloc_reserve+0x12a/0x470
  [   63.551969]  __alloc_skb+0x310/0x860
  [   63.552949]  __ip_append_data+0x483e/0x6a30
  [   63.553902]  ip_append_data+0x11c/0x1e0
  [   63.554912]  raw_sendmsg+0x1c8c/0x2180
  [   63.556719]  inet_sendmsg+0xe6/0x190
  [   63.557534]  __sys_sendto+0x55e/0x8e0
  [   63.557875]  __s390x_sys_socketcall+0x19ae/0x2ba0
  [   63.558869]  __do_syscall+0x12e/0x240
  [   63.559832]  system_call+0x6e/0x90
  [   63.560780]
  [   63.560972] Byte 35 of 98 is uninitialized
  [   63.561741] Memory access of size 98 starts at 0000000005704312
  [   63.561950]
  [   63.562824] CPU: 3 UID: 0 PID: 192 Comm: ping Tainted: G B N 6.17.0-dirty #16 NONE
  [   63.563868] Tainted: [B]=BAD_PAGE, [N]=TEST
  [   63.564751] Hardware name: IBM 3931 A01 703 (KVM/Linux)
  [   63.564986] =====================================================

Fixes: dcd3e1de9d17 ("s390/checksum: provide csum_partial_copy_nocheck()")
Signed-off-by: Aleksei Nikiforov <aleksei.nikiforov@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
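The shape of such a fix, sketched (not the exact fpu-insn.h change):

    #include <linux/kmsan-checks.h>

    /* After an inline-asm store whose length is only known at runtime,
     * tell KMSAN explicitly how many bytes were initialized; VSTL with
     * index N stores N + 1 bytes. 'dest' is an assumed variable name. */
    kmsan_unpoison_memory(dest, index + 1);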
2025-11-14  s390/mm: Remove unused flush_tlb()  (Heiko Carstens; 1 file, -2/+0 lines)
flush_tlb() exists for historic reasons and was never used. Remove it.

Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-14  s390/pai_crypto: Introduce generic event init using pai_pmu[]  (Thomas Richter; 1 file, -0/+1 lines)
To support one common PAI PMU device driver which handles both the pai_crypto and pai_ext PMUs, use a common naming scheme for structures and variables suitable for both device drivers.

Rework PAI crypto event initialization. Add a common function for event initialization. It uses the PAI characteristics stored in the pai_pmu table instead of hardcoded values. Enlarge pai_event_valid() to check all event validation aspects.

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Jan Polensky <japo@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-13  Merge tag 'v6.18-rc5' into objtool/core, to pick up fixes  (Ingo Molnar; 1 file, -1/+0 lines)
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-11-06  s390/smp: Mark pcpu_delegate() and smp_call_ipl_cpu() as __noreturn  (Thorsten Blum; 1 file, -1/+1 lines)
pcpu_delegate() never returns to its caller. If the target CPU is the current CPU, it calls __pcpu_delegate(), whose delegate function is not supposed to return. In any case, even if __pcpu_delegate() unexpectedly returns, pcpu_delegate() sends SIGP_STOP to the current CPU and waits in an infinite loop.

Annotate pcpu_delegate() with the __noreturn attribute to improve compiler optimizations. Also annotate smp_call_ipl_cpu() accordingly since it always calls pcpu_delegate().

[hca: Merge two patches from Thorsten Blum]

Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
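The annotations, sketched with simplified signatures:

    static void __noreturn pcpu_delegate(struct pcpu *pcpu, int cpu,
                                         pcpu_delegate_fn *func,
                                         void *data, unsigned long stack);

    void __noreturn smp_call_ipl_cpu(void (*func)(void *), void *data);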
2025-11-06  Merge branch 'dat-enhancement-1'  (Heiko Carstens; 3 files, -24/+7 lines)
Heiko Carstens says:

====================
Add the Dat-Enhancement facility 1 to the list of facilities which are required to start the kernel. The facility provides the CSPG and IDTE instructions.

In particular the CSPG instruction can be used to replace a valid page table entry with a different page table entry, which also differs in the page frame real address. Without the CSPG instruction it is possible to use the CSP instruction to change valid page table entries, however it only allows changing the lower or higher 32 bits of such entries, which means it cannot be used to change the page frame real address of valid page table entries.

Given that there is code around (e.g. HugeTLB vmemmap optimization) which requires changing valid page table entries of the kernel mapping, without the detour over an invalid page table entry, make the CSPG instruction unconditionally available.

The Dat-Enhancement facility 1 is available since z990, which is older than the currently supported minimum architecture (z10). Therefore adding this to the architecture level set shouldn't cause any problems.
====================

Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-06  s390/mm: Replace the CSP instruction with CSPG  (Heiko Carstens; 2 files, -18/+5 lines)
The CSPG instruction is part of the Dat-Enhancement facility 1, which is always available. Given that it can be used everywhere the CSP instruction can be used, replace CSP with CSPG everywhere. This allows removing the csp() inline assembly. Also remove the unused gmap_pmdp_csp() function.

Acked-by: Alexander Gordeev