path: root/arch
Age | Commit message | Author | Files | Lines
2026-05-25 | Merge tag 'v7.1-rc5' into driver-core-next | Danilo Krummrich | 187 | -742/+1436
We need the driver-core fixes in here as well to build on top of. Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-05-24 | x86/tlb: Convert copy_from_user() + kstrtouint() to kstrtouint_from_user() | Fushuai Wang | 1 | -12/+7
Using kstrtouint_from_user() instead of copy_from_user() + kstrtouint() makes the code simpler and less error-prone. No functional changes. [ bp: Align function args on opening brace, while at it. ] Suggested-by: Yury Norov <ynorov@nvidia.com> Signed-off-by: Fushuai Wang <wangfushuai@baidu.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Yury Norov <ynorov@nvidia.com> Link: https://patch.msgid.link/20260117145615.53455-3-fushuai.wang@linux.dev
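The simplification above can be illustrated outside the kernel. The following is a minimal userspace sketch, not the kernel implementation: parse_uint_from_buf() is a hypothetical name, and it folds the "copy into a bounded, NUL-terminated buffer" step and the "strict unsigned parse" step into one helper, which is the shape of call that kstrtouint_from_user() gives kernel callers. Unlike the kernel helper, this sketch rejects a leading '+'.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical userspace analogue: parse an unsigned int from a buffer that
 * is NOT NUL-terminated. Returns 0 on success or a negative errno value.
 */
int parse_uint_from_buf(const char *buf, size_t count, unsigned int base,
                        unsigned int *res)
{
    char tmp[32];
    char *end;
    unsigned long val;

    assert(res != NULL);
    if (count == 0 || count >= sizeof(tmp))
        return -EINVAL;

    /* Step 1: bounded copy plus explicit termination. */
    memcpy(tmp, buf, count);
    tmp[count] = '\0';

    /* strtoul() would accept whitespace and signs; reject them up front. */
    if (!isalnum((unsigned char)tmp[0]))
        return -EINVAL;

    /* Step 2: strict parse of the whole buffer. */
    errno = 0;
    val = strtoul(tmp, &end, (int)base);
    if (end == tmp)
        return -EINVAL;
    if (*end == '\n')           /* tolerate one trailing newline */
        end++;
    if (end != tmp + count)     /* anything left over is an error */
        return -EINVAL;
    if (errno == ERANGE || val > UINT_MAX)
        return -ERANGE;

    *res = (unsigned int)val;
    return 0;
}
```

A caller passes the raw buffer and its length and checks a single return value, instead of handling the copy and the parse as two separate failure points.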
2026-05-24 | arm64: dts: exynos: Add EL2 virtual timer interrupt | Marc Zyngier | 5 | -5/+10
A bunch of Samsung SoCs are missing the EL2 virtual timer interrupt despite using ARMv8.1+ CPUs. Add the missing interrupt, except for those broken designs where the interrupt is documented as not being wired. Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Link: https://patch.msgid.link/20260523140242.586031-9-maz@kernel.org Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-05-24 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 14 | -30/+110
Pull kvm fixes from Paolo Bonzini: "arm64: - Fix ITS EventID sanitisation when restoring an interrupt translation table. - Fix PPI memory leak when failing to initialise a vcpu. - Correctly return an error when the validation of a hypervisor trace descriptor fails, and limit this validation to protected mode only. RISC-V: - Fix invalid HVA warning in steal-time recording - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and pmu_snapshot_set_shmem() - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler - Fix sign extension of value for MMIO loads s390: - Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by the page table rewrite. x86: - Apply erratum #1235 workaround (disable AVIC IPI virtualization) on Hygon Family 18h, just like on AMD Family 17h. - When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the VM's configured APIC bus frequency instead of the default. This is less confusing (read: not wrong) and makes it easier to fill in CPUID information that communicates the APIC bus frequency to the guest. 
Selftests: - Do not include glibc-internal <bits/endian.h>; it worked by chance and broke building KVM selftests with musl" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235) KVM: selftests: Verify that KVM returns the configured APIC cycle length KVM: x86: Return the VM's configured APIC bus frequency when queried KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h> KVM: s390: Properly reset zero bit in PGSTE KVM: s390: vsie: Fix redundant rmap entries KVM: s390: vsie: Fix unshadowing logic KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors KVM: s390: vsie: Fix memory leak when unshadowing KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc KVM: arm64: vgic: Free private_irqs when init fails after allocation KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits RISC-V: KVM: Fix sign extension for MMIO loads RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM RISC-V: KVM: Fix invalid HVA warning in steal-time recording
2026-05-24 | Merge tag 'x86-urgent-2026-05-24' of ↵ | Linus Torvalds | 14 | -69/+134
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - On SEV guests, handle set_memory_{encrypted,decrypted}() failures more conservatively by assuming that all affected pages are unencrypted (Carlos López) - Disable broadcast TLB flush when PCID is disabled (Tom Lendacky) - Fix VMX vs. hrtimer_rearm_deferred() regression (Peter Zijlstra) - Move IRQ/NMI dispatch code from KVM into x86 core, to prepare for a KVM x2apic fix (Peter Zijlstra) - Fix incorrect munmap() size on map_vdso() failure (Guilherme Giacomo Simoes) * tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: virt: sev-guest: Explicitly leak pages in unknown state x86/mm: Disable broadcast TLB flush when PCID is disabled x86/kvm/vmx: Fix VMX vs hrtimer_rearm_deferred() x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core x86/vdso: Fix incorrect size in munmap() on map_vdso() failure
2026-05-25 | arm64: dts: allwinner: sun50i-a64: Enable DT overlays | Peter Robinson | 1 | -0/+6
Enable DT overlays on some of the Pine64 devices to enable use of addon accessories such as WiFi or audio modules. Signed-off-by: Peter Robinson <pbrobinson@gmail.com> Link: https://patch.msgid.link/20260518220455.156874-1-pbrobinson@gmail.com Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
2026-05-23 | riscv: dts: tenstorrent: Add PMU node to blackhole for Linux perf support | Michael Neuling | 1 | -0/+48
Add a riscv,pmu device tree node with SBI PMU event mappings for the SiFive X280 hardware performance counters. This enables OpenSBI to expose the SBI PMU extension, allowing Linux perf to use the 4 programmable counters (mhpmcounter3-6) across 3 event classes: instruction commit, microarchitectural, and memory system events. Event encodings are derived from the SiFive Tenstorrent X280 MC Manual (21G3.04.00) Table 13, section 3.10.5. Assisted-by: Claude:claude-opus-4-6[1m] Signed-off-by: Michael Neuling <mikey@neuling.org> Reviewed-by: Drew Fustini <fustini@kernel.org> Signed-off-by: Drew Fustini <fustini@kernel.org>
2026-05-23 | Merge tag 'nios2_updates_for_v7.2' of ↵ | Linus Torvalds | 1 | -0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux Pull nios2 fixes from Dinh Nguyen: - Implement _THIS_IP_ for inline asm - Add Simon Schuster as a maintainer and mark the NIOS2 as Supported * tag 'nios2_updates_for_v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: nios2: Implement _THIS_IP_ using inline asm MAINTAINERS: arch/nios2: Add Simon Schuster as co-maintainer
2026-05-23 | Merge tag 'loongarch-fixes-7.1-2' of ↵ | Linus Torvalds | 6 | -15/+68
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson Pull LoongArch fixes from Huacai Chen: "Rework KASLR to avoid initrd overlap, remove some unused code to avoid a build warning, fix some bugs in kprobes and KVM" * tag 'loongarch-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: LoongArch: KVM: Move some variable declarations to paravirt.h LoongArch: kprobes: Fix handling of fatal unrecoverable recursions LoongArch: kprobes: Use larch_insn_text_copy() to patch instructions LoongArch: Remove unused code to avoid build warning LoongArch: Avoid initrd overlap during kernel relocation LoongArch: Skip relocation-time KASLR if already applied efi/loongarch: Randomize kernel preferred address for KASLR
2026-05-23 | KVM: arm64: Fix arch timer interrupts for GICv3-on-GICv5 guests | Sascha Bischoff | 1 | -10/+7
When running on a GICv5 host, we push an arch-timer-specific interrupt domain for the timer interrupts. This interrupt domain is used to mask the host interrupt when a GICv5 guest is running. However, this interrupt domain is still in place when running with a GICv3 guest on GICv5 hardware. The result is that some interrupt state changes are not correctly propagated to the host irqchip driver for legacy guests. Explicitly pass irqchip state changes through to the host irqchip driver when running a GICv3-based guest on a GICv5 host. This bypasses all masking, and thereby operates just as a native GICv3 guest would, with the exception of having an additional irq domain in the hierarchy. Fixes: 9491c63b6cd7 ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5") Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20260520091949.542365-19-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-23 | KVM: arm64: vgic-v5: Atomically assign bits to PPI DVI bitmap | Sascha Bischoff | 1 | -1/+1
For GICv5 guests we make use of the DVI mechanism for PPIs where possible. When mapping a virtual irq to a physical one for a GICv5 guest, the corresponding bit in the DVI bitmap is set. When unmapping, said bit is cleared again. The key user of this mechanism is the arch timer. The existing code used the non-atomic __assign_bit() rather than doing the update atomically. This could technically result in losing state if a second PPI's DVI bit were being manipulated concurrently. Each individual bit within the DVI bitmap is guarded using vgic_irq->irq_lock, but there's no locking for the overall bitmap. Therefore, switch to using the atomic assign_bit() function instead. Fixes: 5a98d0e17e59 ("KVM: arm64: gic-v5: Implement direct injection of PPIs") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Link: https://lore.kernel.org/r/20260520091949.542365-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
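The difference between the two bit-assignment styles can be sketched in portable C11. This is an illustration under stated assumptions, not the kernel's __assign_bit()/assign_bit() code: the function names below are hypothetical, and a single word stands in for the DVI bitmap. The non-atomic variant is a plain load, modify, store sequence, so a concurrent update to a different bit in the same word can be overwritten; the atomic variant is one read-modify-write operation.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/*
 * Non-atomic assignment: if two threads run this on different bits of the
 * same word, one store can overwrite the other thread's update.
 */
void assign_bit_nonatomic(unsigned int nr, unsigned long *word, bool value)
{
    unsigned long mask;

    assert(nr < BITS_PER_WORD);
    mask = 1UL << nr;
    if (value)
        *word |= mask;
    else
        *word &= ~mask;
}

/*
 * Atomic assignment: each update is a single atomic read-modify-write, so
 * concurrent updates to other bits of the same word are preserved.
 */
void assign_bit_atomic(unsigned int nr, atomic_ulong *word, bool value)
{
    unsigned long mask;

    assert(nr < BITS_PER_WORD);
    mask = 1UL << nr;
    if (value)
        atomic_fetch_or(word, mask);
    else
        atomic_fetch_and(word, ~mask);
}
```

A per-bit lock, as described in the message, protects each bit against itself but does not stop two holders of different per-bit locks from racing on the shared word, which is why the word-level update has to be atomic.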
2026-05-23 | KVM: arm64: vgic-v5: Add missing trap handing for NV triage | Sascha Bischoff | 1 | -0/+8
As things stand, there is no support for Nested Virt with GICv5 guests yet. However, this is coming and therefore we need to be able to correctly triage the traps when running with NV. Add the missing fgtreg lookups required for that to triage_sysreg_trap(). These are specific to the FGT regs added as part of GICv5: * ICH_HFGRTR_EL2 * ICH_HFGWTR_EL2 * ICH_HFGITR_EL2 Fixes: 9d6d9514c08f ("KVM: arm64: gic-v5: Support GICv5 FGTs & FGUs") Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20260520091949.542365-11-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-23 | bpf: Recover arena kernel faults with scratch page | Kumar Kartikeya Dwivedi | 2 | -7/+15
BPF arena usage is becoming more prevalent, but kernel <-> BPF communication over arena memory is awkward today. Data has to be staged through a trusted kernel pointer with extra code and copying on the BPF side. While reads through arena pointers can use a fault-safe helper, writes don't have a good solution. The in-line alternative would need instruction emulation or asm fixup labels. Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any handed-in arena pointer, without bounds checking. A per-arena scratch page is installed by the arch fault path into empty arena kernel PTEs - x86 from page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for translation faults, both after the existing exception-table and KFENCE handling. The faulting instruction retries and the access is also reported through the program's BPF stream, preserving error reporting. bpf_prog_find_from_stack() resolves the current BPF program (and its arena) from the kernel stack - no new bpf_run_ctx state is added. Recovery covers the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower half-guard is excluded because well-behaved kfuncs only access forward from arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst. The install is lock-free via ptep_try_set(). On race-loss the winning installer's PTE is already valid, so the access retry succeeds. The arena clear path uses ptep_get_and_clear() so installer and clearer race through atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped" entries just cause one extra re-fault, cheaper than a global IPI on every install. Scratch exists only to keep the kernel from oopsing on an in-line arena access. Its presence at a PTE means the BPF program has already malfunctioned, and the violation is reported through the program's BPF stream. 
The only requirement for behavior on a scratched PTE is that the kernel doesn't crash. In particular, any user-side access through such a PTE may segfault. The shared scratch page is freed once during map destruction. BPF instruction faults continue to use the existing JIT exception-table path. This patch changes only the kernel-text fault path. No UAPI flag is added. The new behavior is the default. v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David) v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp) Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Cc: David Hildenbrand <david@kernel.org> Link: https://lore.kernel.org/r/20260522172219.1423324-3-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
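The recoverable address range described in this message (the 4 GiB arena plus the upper half-guard, with the lower half-guard excluded) can be written as a small range check. This is a sketch only: arena_fault_recoverable() is a hypothetical name, arena_base is assumed to be the first byte of the 4 GiB arena in kernel address space, and guard_sz stands in for GUARD_SZ, whose value the message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ARENA_SZ (1ULL << 32)   /* the 4 GiB arena from the commit message */

/*
 * Hypothetical helper: true if a faulting kernel address lies in the range
 * the message describes as recoverable, i.e. [arena_base,
 * arena_base + 4 GiB + guard_sz / 2).
 */
bool arena_fault_recoverable(uint64_t arena_base, uint64_t guard_sz,
                             uint64_t addr)
{
    assert(guard_sz % 2 == 0);
    if (addr < arena_base)
        return false;           /* lower half-guard and below: excluded */
    return addr - arena_base < ARENA_SZ + guard_sz / 2;
}
```

Only forward accesses from an arena pointer are covered, which matches the stated kfunc-author contract of accessing at most GUARD_SZ / 2 past a handed-in pointer.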
2026-05-23 | mm: Add ptep_try_set() for lockless empty-slot installs | Tejun Heo | 2 | -0/+24
Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is currently pte_none(). Returns true on success, false if the slot was already populated or the arch has no implementation. The intended caller is the upcoming bpf_arena kernel-side fault recovery path. The install runs from a page fault that can be nested under locks held by the faulting kernel caller (e.g. a BPF program holding raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry would A-A deadlock. Lock-free cmpxchg is the only viable option, which constrains this helper to special kernel page tables where concurrent writers cooperate via atomic accessors. The generic version in <linux/pgtable.h> returns false. x86 and arm64 override with try_cmpxchg-based implementations on the underlying pteval. Other architectures get the false stub - the callers there already fall through to oops. v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei) v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea) Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com> Cc: David Hildenbrand <david@kernel.org> Acked-by: David Hildenbrand (arm) <david@kernel.org> Link: https://lore.kernel.org/r/20260522172219.1423324-2-tj@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
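The "set only if currently empty" semantics can be sketched with a C11 compare-and-exchange. This is a userspace illustration, not the x86 or arm64 implementation: slot_try_set() is a hypothetical name, a 64-bit word stands in for the PTE value, and "empty" is taken to mean strictly zero, which the v3 note above says is narrower than pte_none().

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical analogue of a lock-free empty-slot install: store new_val
 * only if the slot currently holds 0. Returns true if this caller
 * installed the value, false if the slot was already populated.
 */
bool slot_try_set(_Atomic uint64_t *slot, uint64_t new_val)
{
    uint64_t expected = 0;      /* strict zero means "empty" in this sketch */

    assert(new_val != 0);       /* installing "empty" would be meaningless */
    return atomic_compare_exchange_strong(slot, &expected, new_val);
}
```

No lock is taken, so the helper is safe to call from a context that may already hold locks; a caller that loses the race simply finds a valid entry installed by the winner.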
2026-05-23 | KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235) | Tina Zhang | 1 | -5/+7
Hygon Family 18h CPUs are derived from AMD Family 17h (Zen1) silicon and share the same erratum #1235: hardware may read a stale IsRunning=1 bit during ICR write emulation and silently fail to generate an AVIC_IPI_FAILURE_TARGET_NOT_RUNNING VM-Exit on the sending vCPU. The absence of the VM-Exit causes KVM to miss the required wakeup of blocking target vCPUs, leading to hung vCPUs and unbounded delays in guest execution. Extend the existing AMD Family 17h erratum #1235 workaround to also cover Hygon Family 18h. With IPI virtualization disabled, KVM never sets IsRunning=1 in the Physical ID table, so every non-self IPI generates a VM-Exit and is correctly emulated. Fixes: 8de4a1c8164e ("KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235") Cc: <stable@vger.kernel.org> Signed-off-by: Tina Zhang <zhang_wei@open-hieco.net> Message-ID: <20260522040014.3380201-1-zhang_wei@open-hieco.net>
2026-05-23 | KVM: x86: Return the VM's configured APIC bus frequency when queried | Sean Christopherson | 1 | -1/+1
When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the VM's configured APIC bus frequency, not KVM's default. Aside from the fact that returning the default frequency is blatantly wrong if userspace has changed the frequency, returning the configured frequency means userspace can blindly trust the result, e.g. when filling PV CPUID information that communicates the APIC bus frequency to the guest. Fixes: 6fef518594bc ("KVM: x86: Add a capability to configure bus frequency for APIC timer") Reported-by: David Woodhouse <dwmw2@infradead.org> Closes: https://lore.kernel.org/all/ab84153e33fbe7c25667f595c56b310d4d5a93ef.camel@infradead.org Signed-off-by: Sean Christopherson <seanjc@google.com> Message-ID: <20260522173526.3539407-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2026-05-23 | Merge tag 'kvm-riscv-fixes-7.1-1' of https://github.com/kvm-riscv/linux into ↵ | Paolo Bonzini | 4 | -10/+15
HEAD KVM/riscv fixes for 7.1, take #1 - Fix invalid HVA warning in steal-time recording - Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and pmu_snapshot_set_shmem() - Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler - Fix sign extension of value for MMIO loads
2026-05-23 | Merge tag 'kvm-s390-master-7.1-2' of ↵ | Paolo Bonzini | 48 | -102/+177
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD KVM: s390: some vSIE and UCONTROL fixes Fix some memory issues and some hangs in vSIE.
2026-05-23 | Merge tag 'kvmarm-fixes-7.1-3' of ↵ | Paolo Bonzini | 3 | -3/+14
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 fixes for 7.1, take #3 - Fix ITS EventID sanitisation when restoring an interrupt translation table. - Fix PPI memory leak when failing to initialise a vcpu. - Correctly return an error when the validation of a hypervisor trace descriptor fails, and limit this validation to protected mode only.
2026-05-23 | openrisc: Fix jump_label smp syncing | Stafford Horne | 2 | -1/+4
The original commit 8c30b0018f9d ("openrisc: Add jump label support") copies from arm64 and does not properly consider how icache invalidation on remote cores works in OpenRISC. On OpenRISC remote icaches need to be invalidated, otherwise static keys may remain stale after updating. Fix SMP cache syncing by: 1. Properly invalidate remote core icaches on SMP systems by using icache_all_inv. The old code uses kick_all_cpus_sync() which runs a no-op IPI function call on remote CPUs which does execute a lot of code and flushes many cache lines in the process, but does not flush all and it's not correct on OpenRISC. 2. For architectures that do not have WRITETHROUGH caches be sure to flush the dcache after patching. To test this I first reproduced the issue using a custom test module [0]. The test confirmed that some icache lines maintained stale static_key code sequences after calling static_branch_enable(). After this patch there are no longer jump_label coherency issues. [0] https://github.com/stffrdhrn/or1k-utils/tree/master/tests/smp_static_key_test Cc: stable@vger.kernel.org # depends on openrisc: Add icache_all_inv Fixes: 8c30b0018f9d ("openrisc: Add jump label support") Signed-off-by: Stafford Horne <shorne@gmail.com>
2026-05-23 | openrisc: Add full instruction cache invalidate functions | Stafford Horne | 3 | -0/+41
Add functions to invalidate all cache lines which we will use for static_key patching. On OpenRISC there is no instruction to invalidate an entire cache so we loop and invalidate cache lines one by one. This is not extremely expensive on OpenRISC as we usually have only a few hundred cache lines. I considered using the invalidate cache page or range functions. However, tracking which ranges need invalidation would have been more expensive than flushing all pages. Cc: stable@vger.kernel.org Signed-off-by: Stafford Horne <shorne@gmail.com>
2026-05-23 | openrisc: Cache invalidation cleanup | Stafford Horne | 1 | -10/+0
When working on new cache invalidation functions I noticed these cleanups in the cache initialization code. Remove unused and commented instructions to avoid confusion. Signed-off-by: Stafford Horne <shorne@gmail.com>
2026-05-22 | net: arcnet: remove ISA and PCMCIA support; modernize documentation | Ethan Nelson-Moore | 1 | -4/+0
While ARCnet is still used in industrial environments, and cards are still manufactured, it is unlikely anyone is still using it with ISA and PCMCIA cards. Reduce future maintenance burden by removing all ISA and PCMCIA ARCnet drivers and documentation related to them. Update instructions for loading modules and passing parameters to work on modern kernels and with the com20020_pci driver. Also take the opportunity to document the rest of the module parameters, correct a file path in Documentation/networking/arcnet.rst, and change a reference to /etc/rc.inet1, which no longer exists, to refer to ifconfig. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260521001631.45434-4-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-22 | KVM: SVM: Only disable x2AVIC WRMSR interception for MSRs that are accelerated | Sean Christopherson | 1 | -36/+4
When x2AVIC is enabled, disable WRMSR interception only for MSRs that are actually accelerated by hardware. Disabling interception for MSRs that aren't accelerated is functionally "fine", and in some cases a weird "win" for performance, but only for cases that should never be triggered by a well-behaved VM (writes to read-only registers; the #GP will typically occur in the guest without taking a #VMEXIT, even for fault-like exits). But overall, disabling interception for MSRs that aren't accelerated is at best confusing and unintuitive, and at worst introduces avoidable risk, as the APM's documentation is imperfect and contradictory. The table in "15.29.3.1 Virtual APIC Register Accesses" simply states that such writes generate exits, whereas "Section 15.29.10 x2AVIC" says: x2APIC MSR intercept checks and access checks have higher priority than AVIC access permission checks. CPU behavior follows the latter (which makes perfect sense), but all in all there's simply no reason to disable interception just to make a #GP faster. Note, the set of MSRs that are passed through for write is identical to VMX's set when IPI virtualization is enabled. This is not a coincidence, and is another motivating factor for cleaning up the intercepts, as x2AVIC is functionally equivalent to APICv+IPIv. Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://patch.msgid.link/20260514213115.1637082-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-22 | KVM: SVM: Disable x2AVIC RDMSR interception for MSRs KVM actually supports | Sean Christopherson | 1 | -2/+11
When toggling x2AVIC on/off, use KVM's curated mask of x2APIC MSRs that can/should be passed through to the guest (or not) when x2AVIC is enabled. Using the effective list provided by the local APIC emulation fixes multiple (classes of) bugs, as the existing hand-coded list of MSRs is wrong on multiple fronts: - ARBPRI isn't supported by KVM, isn't accelerated by AVIC (for read or write), and its #VMEXIT is fault-like, i.e. requires decoding the instruction. Disabling interception is nonsensical and suboptimal. - DFR and ICR2 aren't supported by x2APIC and so don't need their intercepts disabled for performance reasons. While the #GP due to x2APIC being enabled has higher priority than the trap-like #VMEXIT, disabling interception of unsupported MSRs is confusing and unnecessary. - RRR is completely unsupported. - AVIC currently fails to pass through the "range of vectors" registers, IRR, ISR, and TMR, as e.g. X2APIC_MSR(APIC_IRR) only affects IRR0, and thus only disables intercept for vectors 31:0 (which are the *least* interesting registers). - TMCCT (the current APIC timer count) isn't accelerated by hardware, and generates a fault-like AVIC_UNACCELERATED_ACCESS #VMEXIT, i.e. requires KVM to decode the instruction to figure out what the guest was trying to access. Note, the only reason this isn't a fatal bug is that the AVIC architecture had the foresight to guard against buggy hypervisors. E.g. if hardware simply read from the virtual APIC page, the guest would get garbage (because the timer is emulated in software). Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode") Cc: stable@vger.kernel.org Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://patch.msgid.link/20260514213115.1637082-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-22 | KVM: x86: Add dedicated API for getting mask of accelerated x2APIC MSRs | Sean Christopherson | 3 | -5/+21
Add a dedicated local APIC API, kvm_x2apic_disable_intercept_reg_mask(), to provide the mask of x2APIC registers whose MSRs can and should be passed through to the guest when x2APIC virtualization is enabled, and use it in lieu of the open-coded equivalent VMX logic. Providing a common helper will allow sharing the logic with SVM (x2AVIC), and as a bonus eliminates the somewhat confusing code where KVM enables interception for MSR_TYPE_RW, even though only the READ case actually needs to be updated. No functional change intended. Cc: stable@vger.kernel.org Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org> Link: https://patch.msgid.link/20260514213115.1637082-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-22 | Merge tag 'arm64-fixes' of ↵ | Linus Torvalds | 2 | -2/+3
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 fixes from Catalin Marinas: - Handle probe on hinted conditional branch instructions. BC.cond instructions can be simulated in the same way as B.cond instructions, so extend the decode mask for B.cond to cover BC.cond - Flush the walk cache when unsharing PMD tables. Recent changes to huge_pmd_unshare() introduced mmu_gather::unshared_tables but the arm64 code was still treating the TLB flushing as only targeting leaf entries (TLBI VALE1IS). Fix it by using non-leaf-only instructions (TLBI VAE1IS) when tlb->unshared_tables is set * tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: arm64: tlb: Flush walk cache when unsharing PMD tables arm64: probes: Handle probes on hinted conditional branch instructions
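The "extend the decode mask" step in the first fix above can be sketched as a mask-and-compare. The constants below are assumptions taken from the A64 encodings, where B.cond and BC.cond share the top byte 0x54 and differ only in bit 4 (0 for B.cond, 1 for BC.cond); they are not quoted from the patch, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BCOND_VAL         0x54000000u
#define BCOND_MASK_NARROW 0xFF000010u  /* bit 4 must be 0: B.cond only */
#define BCOND_MASK_WIDE   0xFF000000u  /* bit 4 ignored: B.cond and BC.cond */

static_assert((BCOND_VAL & ~BCOND_MASK_WIDE) == 0,
              "match value must lie within the mask");

/* Matches B.cond but not the hinted BC.cond form. */
bool is_bcond_narrow(uint32_t insn)
{
    return (insn & BCOND_MASK_NARROW) == BCOND_VAL;
}

/* Matches both B.cond and BC.cond by no longer testing bit 4. */
bool is_bcond_wide(uint32_t insn)
{
    return (insn & BCOND_MASK_WIDE) == BCOND_VAL;
}
```

Because the two instructions have the same branch semantics, widening the mask lets one simulation routine serve both, as the message describes.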
2026-05-22 | Merge tag 's390-7.1-3' of ↵ | Linus Torvalds | 2 | -13/+28
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux Pull s390 fixes from Alexander Gordeev: - Fix PAI NNPA mismatch between counting and recording, where sampling reports twice the value - Fix loss of PAI counter increments during recording on systems with many CPUs under heavy load, while counting is not affected - On some supported machines, CHSC cannot access memory outside the DMA zone, causing CHSC command failures. Restore GFP_DMA flag when allocating memory for CHSC control blocks - Align the numbering scheme for higher-level topology structures like socket, book, drawer with other hardware identifiers e.g. in sysfs, procfs and tools like lscpu * tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: s390/topology: Use zero-based numbering for containing entities s390/cio: Restore GFP_DMA for CHSC allocation s390/pai: Fix missing PAI counter increments under heavy load s390/pai: Disable duplicate read of kernel PAI counter value
2026-05-22 | MIPS: Remove unused arch/mips/crypto directory | Ethan Nelson-Moore | 4 | -14/+0
The last MIPS crypto code was moved to lib/crypto/mips in commit c9e5ac0ab9d1 ("lib/crypto: mips/md5: Migrate optimized code into library"). However, arch/mips/crypto still contains stub Kconfig, Makefile, and .gitignore files. Remove these unnecessary files. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-05-22 | LoongArch: Remove unused arch/loongarch/crypto directory | Ethan Nelson-Moore | 3 | -11/+0
All LoongArch crypto code was moved to arch/loongarch/lib in commit 72f51a4f4b07 ("loongarch/crc32: expose CRC32 functions through lib"). However, arch/loongarch/crypto still contains stub Kconfig and Makefile files. Remove these unnecessary files. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-05-22 | crypto: riscv/aes - replace min_t with min in riscv64_aes_ctr_crypt | Thorsten Blum | 1 | -2/+2
Use the simpler min() macro since the values are unsigned and compatible. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2026-05-22 | driver core: delete useless forward declaration of "struct class" | Alexey Dobriyan | 1 | -1/+0
"struct class" is defined earlier on both cases. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Link: https://patch.msgid.link/6d5937c5-9d41-4cfe-9e42-0946e12dc72d@p183 Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22 | arm64: tlb: Flush walk cache when unsharing PMD tables | Zeng Heng | 1 | -1/+2
When huge_pmd_unshare() is called to unshare a PMD table, the tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true but the aarch64 tlb_flush() only checked tlb->freed_tables to determine whether to use TLBF_NONE (vae1is, invalidates walk cache) or TLBF_NOWALKCACHE (vale1is, leaf-only). This caused the stale PMD page table entry to remain in the walk cache after unshare, potentially leading to incorrect page table walks. Fix by including unshared_tables in the check, so that when unsharing tables, TLBF_NONE is used and the walk cache is properly invalidated.

Here is the detailed distinction between vae1is and vale1is:

| Instruction Combination  | Actual Invalidation Scope                         |
| ------------------------ | ------------------------------------------------- |
| `VAE1IS` + TTL=`0`       | All entries at all levels (full invalidation)     |
| `VAE1IS` + TTL=`2` (L2)  | Non-leaf at Level 0/1 + leaf at Level 2           |
| `VALE1IS` + TTL=`0`      | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only                        |

Signed-off-by: Zeng Heng <zengheng4@huawei.com> Fixes: 8ce720d5bd91 ("mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather") Cc: <stable@vger.kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
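The corrected flag selection can be sketched as a small decision function. This is an illustration, not the arm64 tlb_flush() code: the flag names TLBF_NONE and TLBF_NOWALKCACHE and the two field names come from the message, while the enum values, struct gather_state, and pick_tlbf() are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Flag names follow the commit message; the numeric values are arbitrary. */
enum tlbf_flags { TLBF_NONE, TLBF_NOWALKCACHE };

static_assert(TLBF_NONE != TLBF_NOWALKCACHE, "flags must be distinct");

/* Hypothetical stand-in for the two mmu_gather fields the message names. */
struct gather_state {
    bool freed_tables;
    bool unshared_tables;
};

/*
 * Walk-cache entries must be invalidated whenever page tables were freed
 * OR unshared; only otherwise is a leaf-only invalidation sufficient.
 */
enum tlbf_flags pick_tlbf(const struct gather_state *tlb)
{
    if (tlb->freed_tables || tlb->unshared_tables)
        return TLBF_NONE;           /* vae1is: also hits the walk cache */
    return TLBF_NOWALKCACHE;        /* vale1is: leaf entries only */
}
```

Before the fix, the equivalent of the unshared_tables term was missing from the condition, so an unshare with no freed tables fell through to the leaf-only case.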
2026-05-22 | serial: zs: Convert to use a platform device | Maciej W. Rozycki | 1 | -3/+57
Prevent a crash from happening as the first serial port is initialised: Console: switching to mono frame buffer device 160x64 fb0: PMAG-AA frame buffer device at tc0 DECstation Z85C30 serial driver version 0.10 CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 803ab00c, ra == 803aafe0 Oops[#1]: CPU: 0 PID: 1 Comm: swapper Not tainted 6.4.0-rc3-00031-g84a9582fd203-dirty #57 $ 0 : 00000000 10012c00 803aaeb0 00000000 $ 4 : 80e12f60 80e12f50 80e12f58 81000030 $ 8 : 00000000 805ff37c 00000000 33433538 $12 : 65732030 00000006 80c2915d 6c616972 $16 : 80e12f00 807b7630 00000000 00000000 $20 : 00000004 00000348 000001a0 807623b8 $24 : 00000018 00000000 $28 : 80c24000 80c25d60 8078b148 803aafe0 Hi : 00000000 Lo : 00000000 epc : 803ab00c serial_base_ctrl_add+0x78/0xf4 ra : 803aafe0 serial_base_ctrl_add+0x4c/0xf4 Status: 10012c03 KERNEL EXL IE Cause : 00000008 (ExcCode 02) BadVA : 0000002c PrId : 00000440 (R4400SC) Modules linked in: Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000) Stack : 80760000 00000cc0 00400044 00400040 803aa02c 80d61ab8 00000000 807b7630 80760000 807623b8 807b7628 803aa644 80386998 00000000 80e17780 80220f68 80e17780 80d61ab8 80c17d80 80e17780 80e17780 8063c798 80e17780 80383fa0 00000010 80e17780 00000000 80386998 807a0000 00000000 00400040 8038f848 807623b8 80d61ab8 00000004 80e17780 00000000 803a68e4 80c25e2c 803bb884 ... Call Trace: [<803ab00c>] serial_base_ctrl_add+0x78/0xf4 [<803aa644>] serial_core_register_port+0x174/0x69c [<8077e9ac>] zs_init+0xc8/0xfc [<800404d4>] do_one_initcall+0x40/0x2ac [<8076cecc>] kernel_init_freeable+0x1e4/0x270 [<80605bec>] kernel_init+0x20/0x108 [<800431e8>] ret_from_kernel_thread+0x14/0x1c Code: 2442aeb0 ae120024 ae0200d0 <8c67002c> 50e00001 8c670000 3c06806e 3c05806e afb30010 ---[ end trace 0000000000000000 ]--- (report at the offending commit) -- where a pointer is dereferenced that has been derived from a null pointer to the port's parent device. 
Since no device is available with legacy probing and it is no longer the preferred way to discover devices anyway, switch the driver to using a platform device and use it as the port's parent device. Update resource handling accordingly and only request the actual span of addresses used within the slot, which will have had its resource already requested by generic platform device code. Use platform_driver_probe() not just because SCC devices are fixed with solder on board and not straightforward to remove, but foremost because the associated TTY's major device number is the same as used by the dz driver and the first driver to claim it will prevent the other one from using it. Either one DZ device or some SCC devices will be present in a given system but never both at a time, and therefore we want the major device number to be claimed by the first driver to actually successfully bind to its device and platform_driver_probe() is a way to fulfil that. An unfortunate consequence of the switch to a platform device is that we now hand the console over from the bootconsole much later in the bootstrap. The firmware console handler appears good enough though to work so late and in particular with interrupts enabled. Since only one way remains to reach zs_reset() now, remove the port initialisation marker as no longer needed and go through the channel reset unconditionally. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22serial: dz: Convert to use a platform deviceMaciej W. Rozycki1-1/+54
Prevent a crash from happening as the first serial port is initialised:

  Console: switching to colour frame buffer device 160x64
  tgafb: SFB+ detected, rev=0x02
  fb0: Digital ZLX-E1 frame buffer device at 0x1e000000
  DECstation DZ serial driver version 1.04
  CPU 0 Unable to handle kernel paging request at virtual address 000000bc, epc == 8048b3a4, ra == 80470a78
  Oops[#1]:
  CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-dirty #35 NONE
  $ 0 : 00000000 1000ac00 00000004 804707ac
  $ 4 : 00000000 80e20850 80e20858 81000030
  $ 8 : 00000000 8072c81c 00000008 fefefeff
  $12 : 6c616972 00000006 80c5917f 69726420
  $16 : 80e20800 00000000 808f8968 80e20800
  $20 : 00000000 807f5a90 808b0094 808d3bc8
  $24 : 00000018 80479030
  $28 : 80c2e000 80c2fd70 00000069 80470a78
  Hi : 00000004
  Lo : 00000000
  epc : 8048b3a4 __dev_fwnode+0x0/0xc
  ra : 80470a78 serial_base_ctrl_add+0xa0/0x168
  Status: 1000ac04 IEp
  Cause : 30000008 (ExcCode 02)
  BadVA : 000000bc
  PrId : 00000220 (R3000)
  Modules linked in:
  Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
  Stack : 00400044 00400040 8046f4cc 00000000 808a6148 808a0000 808f8968 8086983c
          808e0000 8046fc84 1000ac01 00000028 80e20700 802ba3f8 80e20700 80d34a94
          80c1b900 80e20700 80e20700 80e20700 80e20700 80444650 00000000 00000000
          00000000 807f5a90 808b0094 80447080 00400040 808e0000 80d34a94 808a6148
          80d34a94 00000004 80e20700 00000000 8076974c 80469810 80c2fe3c 1000ac01
          ...
  Call Trace:
  [<8048b3a4>] __dev_fwnode+0x0/0xc
  [<80470a78>] serial_base_ctrl_add+0xa0/0x168
  [<8046fc84>] serial_core_register_port+0x1c8/0x974
  [<808c6af0>] dz_init+0x74/0xc8
  [<800470e0>] do_one_initcall+0x44/0x2d4
  [<808b111c>] kernel_init_freeable+0x258/0x308
  [<8072e434>] kernel_init+0x20/0x114
  [<80049cd0>] ret_from_kernel_thread+0x14/0x1c
  Code: 27bd0018 03e00008 2402ffea <8c8200bc> 03e00008 00000000 27bdffc0 afbe0038 afb30024
  ---[ end trace 0000000000000000 ]---

-- where a pointer is dereferenced that has been derived from a null pointer to the port's parent device. Since no device is available with legacy probing and it is no longer the preferred way to discover devices anyway, switch the driver to using a platform device and use it as the port's parent device. Update resource handling accordingly and only request the actual span of addresses used within the slot, which will have had its resource already requested by generic platform device code. Use platform_driver_probe() not just because the DZ device is fixed with solder on board and not straightforward to remove, but foremost because the associated TTY's major device number is the same as used by the zs driver and the first driver to claim it will prevent the other one from using it. Either one DZ device or some SCC devices will be present in a given system but never both at a time, and therefore we want the major device number to be claimed by the first driver to actually successfully bind to its device and platform_driver_probe() is a way to fulfil that. An unfortunate consequence of the switch to a platform device is that we now hand the console over from the bootconsole much later in the bootstrap. The firmware console handler appears good enough though to work so late and in particular with interrupts enabled.
Conversely only starting the console port so late lets the reset code fully utilise our delay handlers, so switch from udelay() to fsleep() for transmitter draining so as to avoid busy-waiting for an excessive amount of time. Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM") Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk> Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10 Link: https://patch.msgid.link/alpine.DEB.2.21.2605062326540.46195@angie.orcam.me.uk Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-05-22KVM: s390: Properly reset zero bit in PGSTEClaudio Imbrenda1-0/+1
In case of memory pressure, it's possible that a guest page gets freed and then almost immediately reused by the guest. If CMMA is enabled, _essa_clear_cbrl() will discard all pages that are either unused or zero. If a discarded page is reused before _essa_clear_cbrl() is called, and the pgste.zero bit is not cleared, the page will be discarded despite not being unused. When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This prevents the page from being accidentally discarded when not unused. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Steffen Eiden <seiden@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
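The rule stated in the commit above ("When calling _gmap_ptep_xchg(), always clear the pgste.zero bit") can be illustrated with a small userspace sketch. This is not the kernel code: the bit positions, the struct and the sketch_* names are all hypothetical, and the real PGSTE layout is not given in the commit message. The sketch only shows why a stale "zero" hint must not survive an exchange of the entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout, chosen for illustration only. */
#define SKETCH_PGSTE_ZERO   (UINT64_C(1) << 0) /* page content known to be zero */
#define SKETCH_PGSTE_UNUSED (UINT64_C(1) << 1) /* guest marked the page unused */

/* Hypothetical pairing of a page table entry with its status entry. */
struct sketch_entry {
	uint64_t pte;
	uint64_t pgste;
};

/*
 * Install a new PTE value and return the old one. The zero hint described
 * the old mapping, so it is dropped unconditionally on every exchange.
 */
static uint64_t sketch_ptep_xchg(struct sketch_entry *e, uint64_t new_pte)
{
	uint64_t old = e->pte;

	e->pte = new_pte;
	e->pgste &= ~SKETCH_PGSTE_ZERO;
	return old;
}

/* A page may be discarded only if it is unused or still known to be zero. */
static bool sketch_discardable(const struct sketch_entry *e)
{
	return (e->pgste & (SKETCH_PGSTE_ZERO | SKETCH_PGSTE_UNUSED)) != 0;
}
```

With this shape, an entry that was discardable because of the zero hint stops being discardable as soon as it is exchanged for a new mapping, which is the behaviour the commit message asks for.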
2026-05-22KVM: s390: vsie: Fix redundant rmap entriesClaudio Imbrenda1-1/+3
The address passed to the gmap rmap was not being masked. As a consequence several different (but functionally equivalent) rmap entries were being created for each shadowed table. Fix this by properly masking the address depending on the table level. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
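As a rough illustration of the masking described in the commit above, the sketch below aligns an address down to the span covered by one table before using it as an rmap key, so that every address inside the same table yields the same key. The level names, the sketch_* helpers and the span sizes (1 MiB, 2 GiB, 4 TiB) are assumptions made for the example and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative levels, not the kernel's TABLE_TYPE_* values. */
enum sketch_level {
	SKETCH_PAGE_TABLE,
	SKETCH_SEGMENT,
	SKETCH_REGION3,
};

/* Log2 of the address span covered by one table (assumed values). */
static unsigned int sketch_span_shift(enum sketch_level level)
{
	switch (level) {
	case SKETCH_PAGE_TABLE:
		return 20; /* 1 MiB */
	case SKETCH_SEGMENT:
		return 31; /* 2 GiB */
	case SKETCH_REGION3:
		return 42; /* 4 TiB */
	}
	return 12; /* fall back to a single 4 KiB page */
}

/* Mask off the bits below the table's span so equivalent addresses collide. */
static uint64_t sketch_rmap_key(uint64_t addr, enum sketch_level level)
{
	uint64_t span = UINT64_C(1) << sketch_span_shift(level);

	return addr & ~(span - 1);
}
```

Without the mask, two addresses such as 0x12345678 and 0x123fffff would produce two distinct entries for the same shadowed table, which is the redundancy the commit removes.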
2026-05-22KVM: s390: vsie: Fix unshadowing logicClaudio Imbrenda5-5/+63
In some cases (i.e. under extreme memory pressure on the host), attempting to shadow memory will result in the same memory being unshadowed, causing a loop. Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent unnecessary unshadowing and perform better checks. Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did not unshadow properly when the large page would become unprotected. Opportunistically add a check in gmap_protect_rmap() to make sure it won't be called with level == TABLE_TYPE_PAGE_TABLE. Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
2026-05-22KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errorsClaudio Imbrenda1-4/+3
Fix a memory leak that can happen if gmap_ucas_map_one() or kvm_s390_mmu_cache_topup() return error values. Also fix a similar issue in gmap_set_limit(). Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Reported-by: Jiaxin Fan <jiaxin.fan@ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
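The class of bug fixed above, a temporary cache that is leaked when an intermediate step fails, is commonly avoided with a single exit label that releases the object on every path. The following is a userspace sketch of that pattern only: the sketch_* names, the failure flags and the live-object counter are hypothetical, and the actual change to gmap_ucas_map_one() and gmap_set_limit() may be structured differently.

```c
#include <assert.h>
#include <stdlib.h>

/* Number of caches currently allocated, used to observe leaks. */
static int sketch_live_caches;

/* Hypothetical stand-in for a preallocated object cache. */
struct sketch_cache {
	int objs;
};

static struct sketch_cache *sketch_cache_new(void)
{
	struct sketch_cache *mc = calloc(1, sizeof(*mc));

	if (mc)
		sketch_live_caches++;
	return mc;
}

static void sketch_cache_free(struct sketch_cache *mc)
{
	if (!mc)
		return;
	sketch_live_caches--;
	free(mc);
}

/* Stand-in for a step that can fail; returns 0 or a negative error code. */
static int sketch_step(struct sketch_cache *mc, int fail)
{
	if (fail)
		return -12; /* stand-in for -ENOMEM */
	mc->objs++;
	return 0;
}

/* Top up the cache, then map; the cache is released on every exit path. */
static int sketch_map(int fail_topup, int fail_map)
{
	struct sketch_cache *mc = sketch_cache_new();
	int rc;

	if (!mc)
		return -12;
	rc = sketch_step(mc, fail_topup);
	if (rc)
		goto out;
	rc = sketch_step(mc, fail_map);
out:
	sketch_cache_free(mc);
	return rc;
}
```

An early "return rc;" in place of "goto out;" would reproduce the kind of leak the commit describes.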
2026-05-22KVM: s390: vsie: Fix memory leak when unshadowingClaudio Imbrenda1-1/+3
When performing a partial unshadowing, the rmap was being leaked. Add the missing kfree(). Fixes: a2c17f9270cc ("KVM: s390: New gmap code") Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com> Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com> Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
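A minimal userspace sketch of the "missing kfree()" situation described above: entries that fall inside the range being unshadowed are unlinked from a list and must also be freed, otherwise they leak. The list layout, the sketch_* names and the use of malloc()/free() in place of the kernel allocator are assumptions for illustration; the real rmap data structure is not shown in the commit message.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reverse-mapping entry kept on a singly linked list. */
struct sketch_rmap {
	struct sketch_rmap *next;
	uint64_t addr;
};

/* Number of entries currently allocated, used to observe leaks. */
static int sketch_live_rmaps;

/* Prepend an entry; on allocation failure the list is returned unchanged. */
static struct sketch_rmap *sketch_rmap_push(struct sketch_rmap *head, uint64_t addr)
{
	struct sketch_rmap *r = malloc(sizeof(*r));

	if (!r)
		return head;
	r->addr = addr;
	r->next = head;
	sketch_live_rmaps++;
	return r;
}

/* Unlink and free every entry whose address lies in [start, end). */
static struct sketch_rmap *sketch_unshadow_range(struct sketch_rmap *head,
						 uint64_t start, uint64_t end)
{
	struct sketch_rmap **link = &head;

	while (*link) {
		struct sketch_rmap *r = *link;

		if (r->addr >= start && r->addr < end) {
			*link = r->next;
			free(r); /* the step that is easy to forget */
			sketch_live_rmaps--;
		} else {
			link = &r->next;
		}
	}
	return head;
}
```

Entries outside the range stay on the list, which is what makes the unshadowing "partial".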
2026-05-22KVM: arm64: vgic-v5: Limit support to 64 PPIsMarc Zyngier3-82/+26
Although we have some code supporting 128 PPIs, the only supported configuration is 64 PPIs. There is no way to test the 128 PPI code, so it is bound to bitrot very quickly. Given that KVM/arm64's goal has always been to stick to non-IMPDEF behaviours, drop the 128 PPI support. Someone motivated enough and with very strong arguments can always bring it back -- it's all in the git history. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20260520091949.542365-10-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-22KVM: arm64: vgic: Rationalise per-CPU irq accessorMarc Zyngier1-13/+12
Despite adding the necessary infrastructure to identify irq types, vgic_get_vcpu_irq() treats GICv5 PPIs in a special way, which impairs the readability of the code. Use the existing irq classifiers to handle per-CPU irqs for all vgic types, and let the normal control flow reach global interrupt handling without any v5-specific path. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20260520091949.542365-9-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-22KVM: arm64: vgic-v5: Drop defensive checks from vgic_v5_ppi_queue_irq_unlock()Marc Zyngier1-13/+3
vgic_v5_ppi_queue_irq_unlock() performs a bunch of sanity checks that are pretty pointless as there is no code path that can result in these invariants being violated. And if they are, a nice crash is just as instructive as a warning. Drop what is evidently debug code and simplify the whole thing. Link: https://lore.kernel.org/r/20260520091949.542365-8-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-22KVM: arm64: vgic: Consolidate vgic_allocate_private_irqs_locked()Marc Zyngier1-27/+18
vgic_allocate_private_irqs_locked() calls two helpers, oddly named vgic_{,v5_}allocate_private_irq(). Not only do these helpers not allocate anything, but they also contain duplicate init code that would be better placed in the caller. Consolidate the common init code in the caller, rename the helpers to vgic_{,v5_}setup_private_irq(), and pass the irq pointer around instead of the index of the interrupt. Reviewed-by: Joey Gouly <joey.gouly@arm.com> Link: https://lore.kernel.org/r/20260520091949.542365-7-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-05-22KVM: arm64: vgic: Constify struct irq_ops usageMarc Zyngier3-7/+11
vgic-v5 has introduced much more prevalent usage of the struct irq_ops mechanism. In the process, it becomes evident that the mechanism suffers from two related problems: - it