aboutsummaryrefslogtreecommitdiff
path: root/arch/arm64/tools/cpucaps
AgeCommit message (Collapse)AuthorFilesLines
2026-04-20Merge branch 'for-next/c1-pro-erratum-4193714' into for-next/coreCatalin Marinas1-0/+1
* for-next/c1-pro-erratum-4193714: : Work around C1-Pro erratum 4193714 (CVE-2026-0995) arm64: errata: Work around early CME DVMSync acknowledgement arm64: cputype: Add C1-Pro definitions arm64: tlb: Pass the corresponding mm to __tlbi_sync_s1ish() arm64: tlb: Introduce __tlbi_sync_s1ish_{kernel,batch}() for TLB maintenance
2026-04-10arm64: errata: Work around early CME DVMSync acknowledgementCatalin Marinas1-0/+1
C1-Pro acknowledges DVMSync messages before completing the SME/CME memory accesses. Work around this by issuing an IPI to the affected CPUs if they are running in EL0 with SME enabled. Note that we avoid the local DSB in the IPI handler as the kernel runs with SCTLR_EL1.IESB=1. This is sufficient to complete SME memory accesses at EL0 on taking an exception to EL1. On the return to user path, no barrier is necessary either. See the comment in sme_set_active() and the more detailed explanation in the link below. To avoid a potential IPI flood from malicious applications (e.g. madvise(MADV_PAGEOUT) in a tight loop), track where a process is active via mm_cpumask() and only interrupt those CPUs. Link: https://lore.kernel.org/r/ablEXwhfKyJW1i7l@J2N7QTR9R3 Cc: Will Deacon <will@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: James Morse <james.morse@arm.com> Cc: Mark Brown <broonie@kernel.org> Reviewed-by: Will Deacon <will@kernel.org> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-03-26arm64: cpufeature: Add FEAT_LSUIYeoreum Yun1-0/+1
Since Armv9.6, FEAT_LSUI introduces atomic instructions that allow privileged code to access user memory without clearing the PSTATE.PAN bit. Add CPU feature detection for FEAT_LSUI. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> [catalin.marinas@arm.com: Remove commit log references to SW_PAN] Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2026-01-29Merge branch 'for-next/errata' into for-next/coreWill Deacon1-0/+1
* for-next/errata: arm64: errata: Workaround for SI L1 downstream coherency issue
2026-01-23arm64: errata: Workaround for SI L1 downstream coherency issueLucas Wei1-0/+1
When software issues a Cache Maintenance Operation (CMO) targeting a dirty cache line, the CPU and DSU cluster may optimize the operation by combining the CopyBack Write and CMO into a single combined CopyBack Write plus CMO transaction presented to the interconnect (MCN). For these combined transactions, the MCN splits the operation into two separate transactions, one Write and one CMO, and then propagates the write and optionally the CMO to the downstream memory system or external Point of Serialization (PoS). However, the MCN may return an early CompCMO response to the DSU cluster before the corresponding Write and CMO transactions have completed at the external PoS or downstream memory. As a result, stale data may be observed by external observers that are directly connected to the external PoS or downstream memory. This erratum affects any system topology in which the following conditions apply: - The Point of Serialization (PoS) is located downstream of the interconnect. - A downstream observer accesses memory directly, bypassing the interconnect. Conditions: This erratum occurs only when all of the following conditions are met: 1. Software executes a data cache maintenance operation, specifically, a clean or clean&invalidate by virtual address (DC CVAC or DC CIVAC), that hits on unique dirty data in the CPU or DSU cache. This results in a combined CopyBack and CMO being issued to the interconnect. 2. The interconnect splits the combined transaction into separate Write and CMO transactions and returns an early completion response to the CPU or DSU before the write has completed at the downstream memory or PoS. 3. A downstream observer accesses the affected memory address after the early completion response is issued but before the actual memory write has completed. This allows the observer to read stale data that has not yet been updated at the PoS or downstream memory. The implementation of workaround put a second loop of CMOs at the same virtual address whose operation meet erratum conditions to wait until cache data be cleaned to PoC. This way of implementation mitigates performance penalty compared to purely duplicate original CMO. Signed-off-by: Lucas Wei <lucaswei@google.com> Signed-off-by: Will Deacon <will@kernel.org>
2026-01-22arm64: Add support for FEAT_{LS64, LS64_V}Yicong Yang1-0/+2
Armv8.7 introduces single-copy atomic 64-byte loads and stores instructions and its variants named under FEAT_{LS64, LS64_V}. These features are identified by ID_AA64ISAR1_EL1.LS64 and the use of such instructions in userspace (EL0) can be trapped. As st64bv (FEAT_LS64_V) and st64bv0 (FEAT_LS64_ACCDATA) can not be tell apart, FEAT_LS64 and FEAT_LS64_ACCDATA which will be supported in later patch will be exported to userspace, FEAT_LS64_V will be enabled only in kernel. In order to support the use of corresponding instructions in userspace: - Make ID_AA64ISAR1_EL1.LS64 visbile to userspace - Add identifying and enabling in the cpufeature list - Expose these support of these features to userspace through HWCAP3 and cpuinfo ld64b/st64b (FEAT_LS64) and st64bv (FEAT_LS64_V) is intended for special memory (device memory) so requires support by the CPU, system and target memory location (device that support these instructions). The HWCAP3_LS64, implies the support of CPU and system (since no identification method from system, so SoC vendors should advertise support in the CPU if system also support them). Otherwise for ld64b/st64b the atomicity may not be guaranteed or a DABT will be generated, so users (probably userspace driver developer) should make sure the target memory (device) also have the support. For st64bv 0xffffffffffffffff will be returned as status result for unsupported memory so user should check it. Document the restrictions along with HWCAP3_LS64. Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Oliver Upton <oupton@kernel.org> Signed-off-by: Yicong Yang <yangyicong@hisilicon.com> Signed-off-by: Zhou Wang <wangzhou1@hisilicon.com> Signed-off-by: Will Deacon <will@kernel.org>
2025-12-01Merge branch 'kvm-arm64/nv-xnx-haf' into kvmarm/nextOliver Upton1-0/+1
* kvm-arm64/nv-xnx-haf: (22 commits) : Support for FEAT_XNX and FEAT_HAF in nested : : Add support for a couple of MMU-related features that weren't : implemented by KVM's software page table walk: : : - FEAT_XNX: Allows the hypervisor to describe execute permissions : separately for EL0 and EL1 : : - FEAT_HAF: Hardware update of the Access Flag, which in the context of : nested means software walkers must also set the Access Flag. : : The series also adds some basic support for testing KVM's emulation of : the AT instruction, including the implementation detail that AT sets the : Access Flag in KVM. KVM: arm64: at: Update AF on software walk only if VM has FEAT_HAFDBS KVM: arm64: at: Use correct HA bit in TCR_EL2 when regime is EL2 KVM: arm64: Document KVM_PGTABLE_PROT_{UX,PX} KVM: arm64: Fix spelling mistake "Unexpeced" -> "Unexpected" KVM: arm64: Add break to default case in kvm_pgtable_stage2_pte_prot() KVM: arm64: Add endian casting to kvm_swap_s[12]_desc() KVM: arm64: Fix compilation when CONFIG_ARM64_USE_LSE_ATOMICS=n KVM: arm64: selftests: Add test for AT emulation KVM: arm64: nv: Expose hardware access flag management to NV guests KVM: arm64: nv: Implement HW access flag management in stage-2 SW PTW KVM: arm64: Implement HW access flag management in stage-1 SW PTW KVM: arm64: Propagate PTW errors up to AT emulation KVM: arm64: Add helper for swapping guest descriptor KVM: arm64: nv: Use pgtable definitions in stage-2 walk KVM: arm64: Handle endianness in read helper for emulated PTW KVM: arm64: nv: Stop passing vCPU through void ptr in S2 PTW KVM: arm64: Call helper for reading descriptors directly KVM: arm64: nv: Advertise support for FEAT_XNX KVM: arm64: Teach ptdump about FEAT_XNX permissions KVM: arm64: nv: Forward FEAT_XNX permissions to the shadow stage-2 ... Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24KVM: arm64: GICv3: Detect and work around the lack of ICV_DIR_EL1 trappingMarc Zyngier1-0/+1
A long time ago, an unsuspecting architect forgot to add a trap bit for ICV_DIR_EL1 in ICH_HCR_EL2. Which was unfortunate, but what's a bit of spec between friends? Thankfully, this was fixed in a later revision, and ARM "deprecates" the lack of trapping ability. Unfortuantely, a few (billion) CPUs went out with that defect, anything ARMv8.0 from ARM, give or take. And on these CPUs, you can't trap DIR on its own, full stop. As the next best thing, we can trap everything in the common group, which is a tad expensive, but hey ho, that's what you get. You can otherwise recycle the HW in the neaby bin. Tested-by: Fuad Tabba <tabba@google.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Tested-by: Mark Brown <broonie@kernel.org> Link: https://msgid.link/20251120172540.2267180-7-maz@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-11-24arm64: Detect FEAT_XNXOliver Upton1-0/+1
Detect the feature in anticipation of using it in KVM. Reviewed-by: Marc Zyngier <maz@kernel.org> Tested-by: Marc Zyngier <maz@kernel.org> Link: https://msgid.link/20251124190158.177318-2-oupton@kernel.org Signed-off-by: Oliver Upton <oupton@kernel.org>
2025-09-17arm64: cpucaps: Add GICv5 Legacy vCPU interface (GCIE_LEGACY) capabilitySascha Bischoff1-0/+1
Implement the GCIE_LEGACY capability as a system feature to be able to check for support from KVM. The type is explicitly ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, which means that the capability is enabled early if all boot CPUs support it. Additionally, if this capability is enabled during boot, it prevents late onlining of CPUs that lack it, thereby avoiding potential mismatched configurations which would break KVM. Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-08-29Merge tag 'kvmarm-fixes-6.17-1' of ↵Paolo Bonzini1-0/+1
https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD KVM/arm64 changes for 6.17, take #2 - Correctly handle 'invariant' system registers for protected VMs - Improved handling of VNCR data aborts, including external aborts - Fixes for handling of FEAT_RAS for NV guests, providing a sane fault context during SEA injection and preventing the use of RASv1p1 fault injection hardware - Ensure that page table destruction when a VM is destroyed gives an opportunity to reschedule - Large fix to KVM's infrastructure for managing guest context loaded on the CPU, addressing issues where the output of AT emulation doesn't get reflected to the guest - Fix AT S12 emulation to actually perform stage-2 translation when necessary - Avoid attempting vLPI irqbypass when GICv4 has been explicitly disabled for a VM - Minor KVM + selftest fixes
2025-08-21arm64: Add capability denoting FEAT_RASv1p1Marc Zyngier1-0/+1
Detecting FEAT_RASv1p1 is rather complicated, as there are two ways for the architecture to advertise the same thing (always a delight...). Add a capability that will advertise this in a synthetic way to the rest of the kernel. Acked-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Marc Zyngier <maz@kernel.org> Reviewed-by: Cornelia Huck <cohuck@redhat.com> Link: https://lore.kernel.org/r/20250817202158.395078-2-maz@kernel.org Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-30Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-1/+3
Pull kvm updates from Paolo Bonzini: "ARM: - Host driver for GICv5, the next generation interrupt controller for arm64, including support for interrupt routing, MSIs, interrupt translation and wired interrupts - Use FEAT_GCIE_LEGACY on GICv5 systems to virtualize GICv3 VMs on GICv5 hardware, leveraging the legacy VGIC interface - Userspace control of the 'nASSGIcap' GICv3 feature, allowing userspace to disable support for SGIs w/o an active state on hardware that previously advertised it unconditionally - Map supporting endpoints with cacheable memory attributes on systems with FEAT_S2FWB and DIC where KVM no longer needs to perform cache maintenance on the address range - Nested support for FEAT_RAS and FEAT_DoubleFault2, allowing the guest hypervisor to inject external aborts into an L2 VM and take traps of masked external aborts to the hypervisor - Convert more system register sanitization to the config-driven implementation - Fixes to the visibility of EL2 registers, namely making VGICv3 system registers accessible through the VGIC device instead of the ONE_REG vCPU ioctls - Various cleanups and minor fixes LoongArch: - Add stat information for in-kernel irqchip - Add tracepoints for CPUCFG and CSR emulation exits - Enhance in-kernel irqchip emulation - Various cleanups RISC-V: - Enable ring-based dirty memory tracking - Improve perf kvm stat to report interrupt events - Delegate illegal instruction trap to VS-mode - MMU improvements related to upcoming nested virtualization s390x - Fixes x86: - Add CONFIG_KVM_IOAPIC for x86 to allow disabling support for I/O APIC, PIC, and PIT emulation at compile time - Share device posted IRQ code between SVM and VMX and harden it against bugs and runtime errors - Use vcpu_idx, not vcpu_id, for GA log tag/metadata, to make lookups O(1) instead of O(n) - For MMIO stale data mitigation, track whether or not a vCPU has access to (host) MMIO based on whether the page tables have MMIO pfns mapped; using VFIO is prone to false negatives - Rework the MSR interception code so that the SVM and VMX APIs are more or less identical - Recalculate all MSR intercepts from scratch on MSR filter changes, instead of maintaining shadow bitmaps - Advertise support for LKGS (Load Kernel GS base), a new instruction that's loosely related to FRED, but is supported and enumerated independently - Fix a user-triggerable WARN that syzkaller found by setting the vCPU in INIT_RECEIVED state (aka wait-for-SIPI), and then putting the vCPU into VMX Root Mode (post-VMXON). Trying to detect every possible path leading to architecturally forbidden states is hard and even risks breaking userspace (if it goes from valid to valid state but passes through invalid states), so just wait until KVM_RUN to detect that the vCPU state isn't allowed - Add KVM_X86_DISABLE_EXITS_APERFMPERF to allow disabling interception of APERF/MPERF reads, so that a "properly" configured VM can access APERF/MPERF. This has many caveats (APERF/MPERF cannot be zeroed on vCPU creation or saved/restored on suspend and resume, or preserved over thread migration let alone VM migration) but can be useful whenever you're interested in letting Linux guests see the effective physical CPU frequency in /proc/cpuinfo - Reject KVM_SET_TSC_KHZ for vm file descriptors if vCPUs have been created, as there's no known use case for changing the default frequency for other VM types and it goes counter to the very reason why the ioctl was added to the vm file descriptor. And also, there would be no way to make it work for confidential VMs with a "secure" TSC, so kill two birds with one stone - Dynamically allocation the shadow MMU's hashed page list, and defer allocating the hashed list until it's actually needed (the TDP MMU doesn't use the list) - Extract many of KVM's helpers for accessing architectural local APIC state to common x86 so that they can be shared by guest-side code for Secure AVIC - Various cleanups and fixes x86 (Intel): - Preserve the host's DEBUGCTL.FREEZE_IN_SMM when running the guest. Failure to honor FREEZE_IN_SMM can leak host state into guests - Explicitly check vmcs12.GUEST_DEBUGCTL on nested VM-Enter to prevent L1 from running L2 with features that KVM doesn't support, e.g. BTF x86 (AMD): - WARN and reject loading kvm-amd.ko instead of panicking the kernel if the nested SVM MSRPM offsets tracker can't handle an MSR (which is pretty much a static condition and therefore should never happen, but still) - Fix a variety of flaws and bugs in the AVIC device posted IRQ code - Inhibit AVIC if a vCPU's ID is too big (relative to what hardware supports) instead of rejecting vCPU creation - Extend enable_ipiv module param support to SVM, by simply leaving IsRunning clear in the vCPU's physical ID table entry - Disable IPI virtualization, via enable_ipiv, if the CPU is affected by erratum #1235, to allow (safely) enabling AVIC on such CPUs - Request GA Log interrupts if and only if the target vCPU is blocking, i.e. only if KVM needs a notification in order to wake the vCPU - Intercept SPEC_CTRL on AMD if the MSR shouldn't exist according to the vCPU's CPUID model - Accept any SNP policy that is accepted by the firmware with respect to SMT and single-socket restrictions. An incompatible policy doesn't put the kernel at risk in any way, so there's no reason for KVM to care - Drop a superfluous WBINVD (on all CPUs!) when destroying a VM and use WBNOINVD instead of WBINVD when possible for SEV cache maintenance - When reclaiming memory from an SEV guest, only do cache flushes on CPUs that have ever run a vCPU for the guest, i.e. don't flush the caches for CPUs that can't possibly have cache lines with dirty, encrypted data Generic: - Rework irqbypass to track/match producers and consumers via an xarray instead of a linked list. Using a linked list leads to O(n^2) insertion times, which is hugely problematic for use cases that create large numbers of VMs. Such use cases typically don't actually use irqbypass, but eliminating the pointless registration is a future problem to solve as it likely requires new uAPI - Track irqbypass's "token" as "struct eventfd_ctx *" instead of a "void *", to avoid making a simple concept unnecessarily difficult to understand - Decouple device posted IRQs from VFIO device assignment, as binding a VM to a VFIO group is not a requirement for enabling device posted IRQs - Clean up and document/comment the irqfd assignment code - Disallow binding multiple irqfds to an eventfd with a priority waiter, i.e. ensure an eventfd is bound to at most one irqfd through the entire host, and add a selftest to verify eventfd:irqfd bindings are globally unique - Add a tracepoint for KVM_SET_MEMORY_ATTRIBUTES to help debug issues related to private <=> shared memory conversions - Drop guest_memfd's .getattr() implementation as the VFS layer will call generic_fillattr() if inode_operations.getattr is NULL - Fix issues with dirty ring harvesting where KVM doesn't bound the processing of entries in any way, which allows userspace to keep KVM in a tight loop indefinitely - Kill off kvm_arch_{start,end}_assignment() and x86's associated tracking, now that KVM no longer uses assigned_device_count as a heuristic for either irqbypass usage or MDS mitigation Selftests: - Fix a comment typo - Verify KVM is loaded when getting any KVM module param so that attempting to run a selftest without kvm.ko loaded results in a SKIP message about KVM not being loaded/enabled (versus some random parameter not existing) - Skip tests that hit EACCES when attempting to access a file, and print a "Root required?" help message. In most cases, the test just needs to be run with elevated permissions" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (340 commits) Documentation: KVM: Use unordered list for pre-init VGIC registers RISC-V: KVM: Avoid re-acquiring memslot in kvm_riscv_gstage_map() RISC-V: KVM: Use find_vma_intersection() to search for intersecting VMAs RISC-V: perf/kvm: Add reporting of interrupt events RISC-V: KVM: Enable ring-based dirty memory tracking RISC-V: KVM: Fix inclusion of Smnpm in the guest ISA bitmap RISC-V: KVM: Delegate illegal instruction fault to VS mode RISC-V: KVM: Pass VMID as parameter to kvm_riscv_hfence_xyz() APIs RISC-V: KVM: Factor-out g-stage page table management RISC-V: KVM: Add vmid field to struct kvm_riscv_hfence RISC-V: KVM: Introduce struct kvm_gstage_mapping RISC-V: KVM: Factor-out MMU related declarations into separate headers RISC-V: KVM: Use ncsr_xyz() in kvm_riscv_vcpu_trap_redirect() RISC-V: KVM: Implement kvm_arch_flush_remote_tlbs_range() RISC-V: KVM: Don't flush TLB when PTE is unchanged RISC-V: KVM: Replace KVM_REQ_HFENCE_GVMA_VMID_ALL with KVM_REQ_TLB_FLUSH RISC-V: KVM: Rename and move kvm_riscv_local_tlb_sanitize() RISC-V: KVM: Drop the return value of kvm_riscv_vcpu_aia_init() RISC-V: KVM: Check kvm_riscv_vcpu_alloc_vector_context() return value KVM: arm64: selftests: Add FEAT_RAS EL2 registers to get-reg-list ...
2025-07-26Merge tag 'irqchip-gic-v5-host' into kvmarm/nextOliver Upton1-1/+2
GICv5 initial host support Add host kernel support for the new arm64 GICv5 architecture, which is quite a departure from the previous ones. Include support for the full gamut of the architecture (interrupt routing and delivery to CPUs, wired interrupts, MSIs, and interrupt translation). * tag 'irqchip-gic-v5-host': (32 commits) arm64: smp: Fix pNMI setup after GICv5 rework arm64: Kconfig: Enable GICv5 docs: arm64: gic-v5: Document booting requirements for GICv5 irqchip/gic-v5: Add GICv5 IWB support irqchip/gic-v5: Add GICv5 ITS support irqchip/msi-lib: Add IRQ_DOMAIN_FLAG_FWNODE_PARENT handling irqchip/gic-v3: Rename GICv3 ITS MSI parent PCI/MSI: Add pci_msi_map_rid_ctlr_node() helper function of/irq: Add of_msi_xlate() helper function irqchip/gic-v5: Enable GICv5 SMP booting irqchip/gic-v5: Add GICv5 LPI/IPI support irqchip/gic-v5: Add GICv5 IRS/SPI support irqchip/gic-v5: Add GICv5 PPI support arm64: Add support for GICv5 GSB barriers arm64: smp: Support non-SGIs for IPIs arm64: cpucaps: Add GICv5 CPU interface (GCIE) capability arm64: cpucaps: Rename GICv3 CPU interface capability arm64: Disable GICv5 read/write/instruction traps arm64/sysreg: Add ICH_HFGITR_EL2 arm64/sysreg: Add ICH_HFGWTR_EL2 ... Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-24Merge branch 'for-next/feat_mte_store_only' into for-next/coreCatalin Marinas1-0/+1
* for-next/feat_mte_store_only: : MTE feature to restrict tag checking to store only operations kselftest/arm64/mte: Add MTE_STORE_ONLY testcases kselftest/arm64/mte: Preparation for mte store only test kselftest/arm64/abi: Add MTE_STORE_ONLY feature hwcap test KVM: arm64: Expose MTE_STORE_ONLY feature to guest arm64/hwcaps: Add MTE_STORE_ONLY hwcaps arm64/kernel: Support store-only mte tag check prctl: Introduce PR_MTE_STORE_ONLY arm64/cpufeature: Add MTE_STORE_ONLY feature
2025-07-24Merge branches 'for-next/livepatch', 'for-next/user-contig-bbml2', ↵Catalin Marinas1-0/+2
'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: (23 commits) drivers/perf: hisi: Support PMUs with no interrupt drivers/perf: hisi: Relax the event number check of v2 PMUs drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver drivers/perf: hisi: Simplify the probe process for each DDRC version perf/arm-ni: Support sharing IRQs within an NI instance perf/arm-ni: Consolidate CPU affinity handling perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation perf/cxlpmu: Remove unintended newline from IRQ name format string perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe() perf: arm_spe: Relax period restriction perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE) KVM: arm64: nvhe: Disable branch generation in nVHE guests arm64: Handle BRBE booting requirements arm64/sysreg: Add BRBE registers and fields perf/arm: Add missing .suppress_bind_attrs perf/arm-cmn: Reduce stack usage during discovery perf: imx9_perf: make the read-only array mask static const perf/arm-cmn: Broaden module description for wider interconnect support ... * for-next/livepatch: : Support for HAVE_LIVEPATCH on arm64 arm64: Kconfig: Keep selects somewhat alphabetically ordered arm64: Implement HAVE_LIVEPATCH arm64: stacktrace: Implement arch_stack_walk_reliable() arm64: stacktrace: Check kretprobe_find_ret_addr() return value arm64/module: Use text-poke API for late relocations. * for-next/user-contig-bbml2: : Optimise the TLBI when folding/unfolding contigous PTEs on hardware with BBML2 and no TLB conflict aborts arm64/mm: Elide tlbi in contpte_convert() under BBML2 iommu/arm: Add BBM Level 2 smmu feature arm64: Add BBM Level 2 cpu feature arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type * for-next/misc: : Miscellaneous arm64 patches arm64/gcs: task_gcs_el0_enable() should use passed task arm64: signal: Remove ISB when resetting POR_EL0 arm64/mm: Drop redundant addr increment in set_huge_pte_at() arm64: Mark kernel as tainted on SAE and SError panic arm64/gcs: Don't call gcs_free() when releasing task_struct arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get arm64: pi: use 'targets' instead of extra-y in Makefile * for-next/acpi: : Various ACPI arm64 changes ACPI: Suppress misleading SPCR console message when SPCR table is absent ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled * for-next/debug-entry: : Simplify the debug exception entry path arm64: debug: remove debug exception registration infrastructure arm64: debug: split bkpt32 exception entry arm64: debug: split brk64 exception entry arm64: debug: split hardware watchpoint exception entry arm64: debug: split single stepping exception entry arm64: debug: refactor reinstall_suspended_bps() arm64: debug: split hardware breakpoint exception entry arm64: entry: Add entry and exit functions for debug exceptions arm64: debug: remove break/step handler registration infrastructure arm64: debug: call step handlers statically arm64: debug: call software breakpoint handlers statically arm64: refactor aarch32_break_handler() arm64: debug: clean up single_step_handler logic * for-next/feat_mte_tagged_far: : Support for reporting the non-address bits during a synchronous MTE tag check fault kselftest/arm64/mte: Add mtefar tests on check_mmap_options kselftest/arm64/mte: Refactor check_mmap_option test kselftest/arm64/mte: Add verification for address tag in signal handler kselftest/arm64/mte: Add address tag related macro and function kselftest/arm64/mte: Check MTE_FAR feature is supported kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS kselftest/arm64: Add MTE_FAR hwcap test KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature * for-next/kselftest: : Kselftest updates for arm64 kselftest/arm64: Handle attempts to disable SM on SME only systems kselftest/arm64: Fix SVE write data generation for SME only systems kselftest/arm64: Test SME on SME only systems in fp-ptrace kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace kselftest/arm64: Allow sve-ptrace to run on SME only systems kselftest/arm4: Provide local defines for AT_HWCAP3 kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace kselftest/arm64: Fix check for setting new VLs in sve-ptrace kselftest/arm64: Convert tpidr2 test to use kselftest.h * for-next/mdscr-cleanup: : Drop redundant DBG_MDSCR_* macros KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t arm64/debug: Drop redundant DBG_MDSCR_* macros * for-next/vmap-stack: : Force VMAP_STACK on arm64 arm64: remove CONFIG_VMAP_STACK checks from entry code arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN arm64: efi: Remove CONFIG_VMAP_STACK check arm64: Mandate VMAP_STACK arm64: efi: Fix KASAN false positive for EFI runtime stack arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth() arm64/gcs: Don't call gcs_free() during flush_gcs() arm64: Restrict pagetable teardown to avoid false warning docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
2025-07-08arm64: Detect FEAT_SCTLR2Oliver Upton1-0/+1
KVM is about to pick up support for SCTLR2. Add cpucap for later use in the guest/host context switch hot path. Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250708172532.1699409-2-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-07-08arm64: cpucaps: Add GICv5 CPU interface (GCIE) capabilityLorenzo Pieralisi1-0/+1
Implement the GCIE capability as a strict boot cpu capability to detect whether architectural GICv5 support is available in HW. Plug it in with a naming consistent with the existing GICv3 CPU interface capability. Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-17-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-08arm64: cpucaps: Rename GICv3 CPU interface capabilityLorenzo Pieralisi1-1/+1
In preparation for adding a GICv5 CPU interface capability, rework the existing GICv3 CPUIF capability - change its name and description so that the subsequent GICv5 CPUIF capability can be added with a more consistent naming on top. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Reviewed-by: Marc Zyngier <maz@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marc Zyngier <maz@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250703-gicv5-host-v7-16-12e71f1b3528@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-07-02arm64/cpufeature: Add MTE_STORE_ONLY featureYeoreum Yun1-0/+1
Since ARMv8.9, FEAT_MTE_STORE_ONLY can be used to restrict raise of tag check fault on store operation only. add MTE_STORE_ONLY feature. Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Reviewed-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20250618092957.2069907-2-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-07-02arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR featureYeoreum Yun1-0/+1
Add FEAT_MTE_TAGGED_FAR cpucap which makes FAR_ELx report all non-address bits on a synchronous MTE tag check fault since Armv8.9 Signed-off-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Yury Khrustalev <yury.khrustalev@arm.com> Link: https://lore.kernel.org/r/20250618084513.1761345-2-yeoreum.yun@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-06-30arm64: Add BBM Level 2 cpu featureMikołaj Lenczewski1-0/+1
The Break-Before-Make cpu feature supports multiple levels (levels 0-2), and this commit adds a dedicated BBML2 cpufeature to test against support for. To support BBML2 in as wide a range of contexts as we can, we want not only the architectural guarantees that BBML2 makes, but additionally want BBML2 to not create TLB conflict aborts. Not causing aborts avoids us having to prove that no recursive faults can be induced in any path that uses BBML2, allowing its use for arbitrary kernel mappings. This feature builds on the previous ARM64_CPUCAP_EARLY_LOCAL_CPU_FEATURE, as all early cpus must support BBML2 for us to enable it (and any later cpus must also support it to be onlined). Not onlining late cpus that do not support BBML2 is unavoidable, as we might currently be using BBML2 semantics for kernel memory regions. This could cause faults in the late cpus, and would be difficult to unwind, so let us avoid the case altogether. Signed-off-by: Mikołaj Lenczewski <miko.lenczewski@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com> Reviewed-by: Ryan Roberts <ryan.roberts@arm.com> Link: https://lore.kernel.org/r/20250625113435.26849-3-miko.lenczewski@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2025-05-23Merge branch kvm-arm64/misc-6.16 into kvmarm-master/nextMarc Zyngier1-0/+1
* kvm-arm64/misc-6.16: : . : Misc changes and improvements for 6.16: : : - Add a new selftest for the SVE host state being corrupted by a guest : : - Keep HCR_EL2.xMO set at all times for systems running with the kernel at EL2, : ensuring that the window for interrupts is slightly bigger, and avoiding : a pretty bad erratum on the AmpereOne HW : : - Replace a couple of open-coded on/off strings with str_on_off() : : - Get rid of the pKVM memblock sorting, which now appears to be superflous : : - Drop superflous clearing of ICH_LR_EOI in the LR when nesting : : - Add workaround for AmpereOne's erratum AC04_CPU_23, which suffers from : a pretty bad case of TLB corruption unless accesses to HCR_EL2 are : heavily synchronised : : - Add a per-VM, per-ITS debugfs entry to dump the state of the ITS tables : in a human-friendly fashion : . KVM: arm64: Fix documentation for vgic_its_iter_next() KVM: arm64: vgic-its: Add debugfs interface to expose ITS tables arm64: errata: Work around AmpereOne's erratum AC04_CPU_23 KVM: arm64: nv: Remove clearing of ICH_LR<n>.EOI if ICH_LR<n>.HW == 1 KVM: arm64: Drop sort_memblock_regions() KVM: arm64: selftests: Add test for SVE host corruption KVM: arm64: Force HCR_EL2.xMO to 1 at all times in VHE mode KVM: arm64: Replace ternary flags with str_on_off() helper Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-19arm64: errata: Work around AmpereOne's erratum AC04_CPU_23D Scott Phillips1-0/+1
On AmpereOne AC04, updates to HCR_EL2 can rarely corrupt simultaneous translations for data addresses initiated by load/store instructions. Only instruction initiated translations are vulnerable, not translations from prefetches for example. A DSB before the store to HCR_EL2 is sufficient to prevent older instructions from hitting the window for corruption, and an ISB after is sufficient to prevent younger instructions from hitting the window for corruption. Signed-off-by: D Scott Phillips <scott@os.amperecomputing.com> Reviewed-by: Oliver Upton <oliver.upton@linux.dev> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20250513184514.2678288-1-scott@os.amperecomputing.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-05-06arm64: Add FEAT_FGT2 capabilityMarc Zyngier1-0/+1
As we will eventually have to context-switch the FEAT_FGT2 registers in KVM (something that has been completely ignored so far), add a new cap that we will be able to check for. Signed-off-by: Marc Zyngier <maz@kernel.org>
2025-03-11KVM: arm64: Compute synthetic sysreg ESR for Apple PMUv3 trapsOliver Upton1-0/+1
Apple M* CPUs provide an IMPDEF trap for PMUv3 sysregs, where ESR_EL2.EC is a reserved value (0x3F) and a sysreg-like ISS is reported in AFSR1_EL2. Compute a synthetic ESR for these PMUv3 traps, giving the illusion of something architectural to the rest of KVM. Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305202641.428114-10-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-03-11KVM: arm64: Use a cpucap to determine if system supports FEAT_PMUv3Oliver Upton1-0/+1
KVM is about to learn some new tricks to virtualize PMUv3 on IMPDEF hardware. As part of that, we now need to differentiate host support from guest support for PMUv3. Add a cpucap to determine if an architectural PMUv3 is present to guard host usage of PMUv3 controls. Tested-by: Janne Grunau <j@jannau.net> Reviewed-by: Marc Zyngier <maz@kernel.org> Link: https://lore.kernel.org/r/20250305202641.428114-6-oliver.upton@linux.dev Signed-off-by: Oliver Upton <oliver.upton@linux.dev>
2025-01-02KVM: arm64: Work around x1e's CNTVOFF_EL2 bogosityMarc Zyngier1-0/+1
It appears that on Qualcomm's x1e CPU, CNTVOFF_EL2 doesn't really work, specially with HCR_EL2.E2H=1. A non-zero offset results in a screaming virtual timer interrupt, to the tune of a few 100k interrupts per second on a 4 vcpu VM. This is also evidenced by this CPU's inability to correctly run any of the timer selftests. The only case this doesn't break is when this register is set to 0, which breaks VM migration. When HCR_EL2.E2H=0, the timer seems to behave normally, and does not result in an interrupt storm. As a workaround, use the fact that this CPU implements FEAT_ECV, and trap all accesses to the virtual timer and counter, keeping CNTVOFF_EL2 set to zero, and emulate accesses to CVAL/TVAL/CTL and the counter itself, fixing up the timer to account for the missing offset. And if you think this is disgusting, you'd probably be right. Acked-by: Oliver Upton <oliver.upton@linux.dev> Link: https://lore.kernel.org/r/20241217142321.763801-12-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2024-11-23Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds1-0/+2
Pull kvm updates from Paolo Bonzini: "The biggest change here is eliminating the awful idea that KVM had of essentially guessing which pfns are refcounted pages. The reason to do so was that KVM needs to map both non-refcounted pages (for example BARs of VFIO devices) and VM_PFNMAP/VM_MIXMEDMAP VMAs that contain refcounted pages. However, the result was security issues in the past, and more recently the inability to map VM_IO and VM_PFNMAP memory that _is_ backed by struct page but is not refcounted. In particular this broke virtio-gpu blob resources (which directly map host graphics buffers into the guest as "vram" for the virtio-gpu device) with the amdgpu driver, because amdgpu allocates non-compound higher order pages and the tail pages could not be mapped into KVM. This requires adjusting all uses of struct page in the per-architecture code, to always work on the pfn whenever possible. The large series that did this, from David Stevens and Sean Christopherson, also cleaned up substantially the set of functions that provided arch code with the pfn for a host virtual addresses. The previous maze of twisty little passages, all different, is replaced by five functions (__gfn_to_page, __kvm_faultin_pfn, the non-__ versions of these two, and kvm_prefetch_pages) saving almost 200 lines of code. ARM: - Support for stage-1 permission indirection (FEAT_S1PIE) and permission overlays (FEAT_S1POE), including nested virt + the emulated page table walker - Introduce PSCI SYSTEM_OFF2 support to KVM + client driver. This call was introduced in PSCIv1.3 as a mechanism to request hibernation, similar to the S4 state in ACPI - Explicitly trap + hide FEAT_MPAM (QoS controls) from KVM guests. As part of it, introduce trivial initialization of the host's MPAM context so KVM can use the corresponding traps - PMU support under nested virtualization, honoring the guest hypervisor's trap configuration and event filtering when running a nested guest - Fixes to vgic ITS serialization where stale device/interrupt table entries are not zeroed when the mapping is invalidated by the VM - Avoid emulated MMIO completion if userspace has requested synchronous external abort injection - Various fixes and cleanups affecting pKVM, vCPU initialization, and selftests LoongArch: - Add iocsr and mmio bus simulation in kernel. - Add in-kernel interrupt controller emulation. - Add support for virtualization extensions to the eiointc irqchip. PPC: - Drop lingering and utterly obsolete references to PPC970 KVM, which was removed 10 years ago. - Fix incorrect documentation references to non-existing ioctls RISC-V: - Accelerate KVM RISC-V when running as a guest - Perf support to collect KVM guest statistics from host side s390: - New selftests: more ucontrol selftests and CPU model sanity checks - Support for the gen17 CPU model - List registers supported by KVM_GET/SET_ONE_REG in the documentation x86: - Cleanup KVM's handling of Accessed and Dirty bits to dedup code, improve documentation, harden against unexpected changes. Even if the hardware A/D tracking is disabled, it is possible to use the hardware-defined A/D bits to track if a PFN is Accessed and/or Dirty, and that removes a lot of special cases. - Elide TLB flushes when aging secondary PTEs, as has been done in x86's primary MMU for over 10 years. - Recover huge pages in-place in the TDP MMU when dirty page logging is toggled off, instead of zapping them and waiting until the page is re-accessed to create a huge mapping. This reduces vCPU jitter. - Batch TLB flushes when dirty page logging is toggled off. This reduces the time it takes to disable dirty logging by ~3x. - Remove the shrinker that was (poorly) attempting to reclaim shadow page tables in low-memory situations. - Clean up and optimize KVM's handling of writes to MSR_IA32_APICBASE. - Advertise CPUIDs for new instructions in Clearwater Forest - Quirk KVM's misguided behavior of initialized certain feature MSRs to their maximum supported feature set, which can result in KVM creating invalid vCPU state. E.g. initializing PERF_CAPABILITIES to a non-zero value results in the vCPU having invalid state if userspace hides PDCM from the guest, which in turn can lead to save/restore failures. - Fix KVM's handling of non-canonical checks for vCPUs that support LA57 to better follow the "architecture", in quotes because the actual behavior is poorly documented. E.g. most MSR writes and descriptor table loads ignore CR4.LA57 and operate purely on whether the CPU supports LA57. - Bypass the register cache when querying CPL from kvm_sched_out(), as filling the cache from IRQ context is generally unsafe; harden the cache accessors to try to prevent similar issues from occuring in the future. The issue that triggered this change was already fixed in 6.12, but was still kinda latent. - Advertise AMD_IBPB_RET to userspace, and fix a related bug where KVM over-advertises SPEC_CTRL when trying to support cross-vendor VMs. - Minor cleanups - Switch hugepage recovery thread to use vhost_task. These kthreads can consume significant amounts of CPU time on behalf of a VM or in response to how the VM behaves (for example how it accesses its memory); therefore KVM tried to place the thread in the VM's cgroups and charge the CPU time consumed by that work to the VM's container. However the kthreads did not process SIGSTOP/SIGCONT, and therefore cgroups which had KVM instances inside could not complete freezing. Fix this by replacing the kthread with a PF_USER_WORKER thread, via the vhost_task abstraction. Another 100+ lines removed, with generally better behavior too like having these threads properly parented in the process tree. - Revert a workaround for an old CPU erratum (Nehalem/Westmere) that didn't really work; there was really nothing to work around anyway: the broken patch was meant to fix nested virtualization, but the PERF_GLOBAL_CTRL MSR is virtualized and therefore unaffected by the erratum. - Fix 6.12 regression where CONFIG_KVM will be built as a module even if asked to be builtin, as long as neither KVM_INTEL nor KVM_AMD is 'y'. x86 selftests: - x86 selftests can now use AVX. Documentation: - Use rST internal links - Reorganize the introduction to the API document Generic: - Protect vcpu->pid accesses outside of vcpu->mutex with a rwlock instead of RCU, so that running a vCPU on a different task doesn't encounter long due to having to wait for all CPUs become quiescent. In general both reads and writes are rare, but userspace that supports confidential computing is introducing the use of "helper" vCPUs that may jump from one host processor to another. Those will be very happy to trigger a synchronize_rcu(), and the effect on performance is quite the disaster" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (298 commits) KVM: x86: Break CONFIG_KVM_X86's direct dependency on KVM_INTEL || KVM_AMD KVM: x86: add back X86_LOCAL_APIC dependency Revert "KVM: VMX: Move LOAD_IA32_PERF_GLOBAL_CTRL errata handling out of setup_vmcs_config()" KVM: x86: switch hugepage recovery thread to vhost_task KVM: x86: expose MSR_PLATFORM_INFO as a feature MSR x86: KVM: Advertise CPUIDs for new instructions in Clearwater Forest Documentation: KVM: fix malformed table irqchip/loongson-eiointc: Add virt extension support LoongArch: KVM: Add irqfd support LoongArch: KVM: Add PCHPIC user mode read and write functions LoongArch: KVM: Add PCHPIC read and write functions LoongArch: KVM: Add PCHPIC device support LoongArch: KVM: Add EIOINTC user mode read and write functions LoongArch: KVM: Add EIOINTC read and write functions LoongArch: KVM: Add EIOINTC device support LoongArch: KVM: Add IPI user mode read and write function LoongArch: KVM: Add IPI read and write function LoongArch: KVM: Add IPI device support LoongArch: KVM: Add iocsr and mmio bus simulation in kernel KVM: arm64: Pass on SVE mapping failures ...
2024-11-14Merge branches 'for-next/gcs', 'for-next/probes', 'for-next/asm-offsets', ↵Catalin Marinas1-0/+2
'for-next/tlb', 'for-next/misc', 'for-next/mte', 'for-next/sysreg', 'for-next/stacktrace', 'for-next/hwcap3', 'for-next/kselftest', 'for-next/crc32', 'for-next/guest-cca', 'for-next/haft' and 'for-next/scs', remote-tracking branch 'arm64/for-next/perf' into for-next/core * arm64/for-next/perf: perf: Switch back to struct platform_driver::remove() perf: arm_pmuv3: Add support for Samsung Mongoose PMU dt-bindings: arm: pmu: Add Samsung Mongoose core compatible perf/dwc_pcie: Fix typos in event names perf/dwc_pcie: Add support for Ampere SoCs ARM: pmuv3: Add missing write_pmuacr() perf/marvell: Marvell PEM performance monitor support perf/arm_pmuv3: Add PMUv3.9 per counter EL0 access control perf/dwc_pcie: Convert the events with mixed case to lowercase perf/cxlpmu: Support missing events in 3.1 spec perf: imx_perf: add support for i.MX91 platform dt-bindings: perf: fsl-imx-ddr: Add i.MX91 compatible drivers perf: remove unused field pmu_node * for-next/gcs: (42 commits) : arm64 Guarded Control Stack user-space support kselftest/arm64: Fix missing printf() argument in gcs/gcs-stress.c arm64/gcs: Fix outdated ptrace documentation kselftest/arm64: Ensure stable names for GCS stress test results kselftest/arm64: Validate that GCS push and write permissions work kselftest/arm64: Enable GCS for the FP stress tests kselftest/arm64: Add a GCS stress test kselftest/arm64: Add GCS signal tests kselftest/arm64: Add test coverage for GCS mode locking kselftest/arm64: Add a GCS test program built with the system libc kselftest/arm64: Add very basic GCS test program kselftest/arm64: Always run signals tests with GCS enabled kselftest/arm64: Allow signals tests