| Age | Commit message (Collapse) | Author | Files | Lines |
|
We need the driver-core fixes in here as well to build on top of.
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
Using kstrtouint_from_user() instead of copy_from_user() + kstrtouint() makes
the code simpler and less error-prone.
No functional changes.
[ bp: Align function args on opening brace, while at it. ]
Suggested-by: Yury Norov <ynorov@nvidia.com>
Signed-off-by: Fushuai Wang <wangfushuai@baidu.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Yury Norov <ynorov@nvidia.com>
Link: https://patch.msgid.link/20260117145615.53455-3-fushuai.wang@linux.dev
|
|
A bunch of Samsung SoCs are missing the EL2 virtual timer interrupt
despite using ARMv8.1+ CPUs. Add the missing interrupt, except for
those broken designs where the interrupt is documented as not being
wired.
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Signed-off-by: Marc Zyngier <maz@kernel.org>
Link: https://patch.msgid.link/20260523140242.586031-9-maz@kernel.org
Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"arm64:
- Fix ITS EventID sanitisation when restoring an interrupt
translation table.
- Fix PPI memory leak when failing to initialise a vcpu.
- Correctly return an error when the validation of a hypervisor trace
descriptor fails, and limit this validation to protected mode only.
RISC-V:
- Fix invalid HVA warning in steal-time recording
- Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info() and
pmu_snapshot_set_shmem()
- Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
- Fix sign extension of value for MMIO loads
s390:
- Fix bugs in vSIE (nested virtualization) and UCONTROL, caused by
the page table rewrite.
x86:
- Apply erratum #1235 workaround (disable AVIC IPI virtualization) on
Hygon Family 18h, just like on AMD Family 17h.
- When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM,
return the VM's configured APIC bus frequency instead of the
default. This is less confusing (read: not wrong) and makes it
easier to fill in CPUID information that communicates the APIC bus
frequency to the guest.
Selftests:
- Do not include glibc-internal <bits/endian.h>; it worked by chance
and broke building KVM selftests with musl"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
KVM: SVM: Disable AVIC IPI virtualization on Hygon Family 18h (erratum #1235)
KVM: selftests: Verify that KVM returns the configured APIC cycle length
KVM: x86: Return the VM's configured APIC bus frequency when queried
KVM: selftests: elf: Include <endian.h> instead of <bits/endian.h>
KVM: s390: Properly reset zero bit in PGSTE
KVM: s390: vsie: Fix redundant rmap entries
KVM: s390: vsie: Fix unshadowing logic
KVM: s390: Fix leaking kvm_s390_mmu_cache in case of errors
KVM: s390: vsie: Fix memory leak when unshadowing
KVM: arm64: Fix nVHE/pKVM hyp tracing error on invalid desc
KVM: arm64: vgic: Free private_irqs when init fails after allocation
KVM: arm64: vgic-its: Reject restored DTE with out-of-range num_eventid_bits
RISC-V: KVM: Fix sign extension for MMIO loads
RISC-V: KVM: Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
riscv: kvm: return SBI_ERR_FAILURE for pmu_event_info() when OOM
riscv: kvm: return SBI_ERR_FAILURE for pmu_snapshot_set_shmem() when OOM
RISC-V: KVM: Fix invalid HVA warning in steal-time recording
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Ingo Molnar:
- On SEV guests, handle set_memory_{encrypted,decrypted}() failures
more conservatively by assuming that all affected pages are
unencrypted (Carlos López)
- Disable broadcast TLB flush when PCID is disabled (Tom Lendacky)
- Fix VMX vs. hrtimer_rearm_deferred() regression (Peter Zijlstra)
- Move IRQ/NMI dispatch code from KVM into x86 core, to prepare for a
KVM x2apic fix (Peter Zijlstra)
- Fix incorrect munmap() size on map_vdso() failure (Guilherme Giacomo
Simoes)
* tag 'x86-urgent-2026-05-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
virt: sev-guest: Explicitly leak pages in unknown state
x86/mm: Disable broadcast TLB flush when PCID is disabled
x86/kvm/vmx: Fix VMX vs hrtimer_rearm_deferred()
x86/kvm/vmx: Move IRQ/NMI dispatch from KVM into x86 core
x86/vdso: Fix incorrect size in munmap() on map_vdso() failure
|
|
Enable DT overlays on some of the Pine64 devices to enable
use of addon accessories such as WiFi or audio modules.
Signed-off-by: Peter Robinson <pbrobinson@gmail.com>
Link: https://patch.msgid.link/20260518220455.156874-1-pbrobinson@gmail.com
Signed-off-by: Chen-Yu Tsai <wens@kernel.org>
|
|
Add a riscv,pmu device tree node with SBI PMU event mappings for the
SiFive X280 hardware performance counters. This enables OpenSBI to
expose the SBI PMU extension, allowing Linux perf to use the 4
programmable counters (mhpmcounter3-6) across 3 event classes:
instruction commit, microarchitectural, and memory system events.
Event encodings are derived from the SiFive Tenstorrent X280 MC Manual
(21G3.04.00) Table 13, section 3.10.5.
Assisted-by: Claude:claude-opus-4-6[1m]
Signed-off-by: Michael Neuling <mikey@neuling.org>
Reviewed-by: Drew Fustini <fustini@kernel.org>
Signed-off-by: Drew Fustini <fustini@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux
Pull nios2 fixes from Dinh Nguyen:
- Implement _THIS_IP_ for inline asm
- Add Simon Schuster as a maintainer and mark the NIOS2 as Supported
* tag 'nios2_updates_for_v7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux:
nios2: Implement _THIS_IP_ using inline asm
MAINTAINERS: arch/nios2: Add Simon Schuster as co-maintainer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch fixes from Huacai Chen:
"Rework KASLR to avoid initrd overlap, remove some unused code to avoid
a build warning, fix some bugs in kprobes and KVM"
* tag 'loongarch-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson:
LoongArch: KVM: Move some variable declarations to paravirt.h
LoongArch: kprobes: Fix handling of fatal unrecoverable recursions
LoongArch: kprobes: Use larch_insn_text_copy() to patch instructions
LoongArch: Remove unused code to avoid build warning
LoongArch: Avoid initrd overlap during kernel relocation
LoongArch: Skip relocation-time KASLR if already applied
efi/loongarch: Randomize kernel preferred address for KASLR
|
|
When running on a GICv5 host, we push an arch-timer-specific interrupt
domain for the timer interrupts. This interrupt domain is used to mask
the host interrupt when a GICv5 guest is running. However, this
interrupt domain is still in place when running with a GICv3 guest on
GICv5 hardware. The result is that some interrupt state changes are
not correctly propragated to the host irqchip driver for legacy
guests.
Explicitly pass irqchip state changes though to the host irqchip
driver when running a GICv3-based guest on a GICv5 host. This bypasses
all masking, and thereby operates just as a native GICv3 guest would,
with the exception of having an additional irq domain in the
hierarchy.
Fixes: 9491c63b6cd7 ("KVM: arm64: gic-v5: Enlighten arch timer for GICv5")
Suggested-by: Marc Zyngier <maz@kernel.org>
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-19-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
For GICv5 guests we make use of the DVI mechanism for PPIs where
possible. When mapping a virtual irq to a physical one for a GICv5
guest, the corresponding bit in the DVI bitmap is set. When unmapping,
said bit is cleared again. The key user of this mechanism is the arch
timer.
The existing code used the non-atomic __assign_bit() rather than doing
the update atomically. This could technically result in losing state
if a second PPI's DVI bit were being manipulated concurrently. Each
individual bit within the DVI bitmap is guarded using
vgic_irq->irq_lock, but there's no locking for the overall
bitmap. Therefore, switch to using the atomic assign_bit() function
instead.
Fixes: 5a98d0e17e59 ("KVM: arm64: gic-v5: Implement direct injection of PPIs")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-12-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
As things stand, there is no support for Nested Virt with GICv5 guests
yet. However, this is coming and therefore we need to be able to
correctly triage the traps when running with NV.
Add the missing fgtreg lookups required for that to
triage_sysreg_trap(). These are specific to the FGT regs added as part
of GICv5:
* ICH_HFGRTR_EL2
* ICH_HFGWTR_EL2
* ICH_HFGITR_EL2
Fixes: 9d6d9514c08f ("KVM: arm64: gic-v5: Support GICv5 FGTs & FGUs")
Link: https://sashiko.dev/#/patchset/20260319154937.3619520-1-sascha.bischoff%40arm.com
Signed-off-by: Sascha Bischoff <sascha.bischoff@arm.com>
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-11-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
BPF arena usage is becoming more prevalent, but kernel <-> BPF communication
over arena memory is awkward today. Data has to be staged through a trusted
kernel pointer with extra code and copying on the BPF side. While reads
through arena pointers can use a fault-safe helper, writes don't have a good
solution. The in-line alternative would need instruction emulation or asm
fixup labels.
Enable direct kernel-side reads and writes within GUARD_SZ / 2 of any
handed-in arena pointer, without bounds checking. A per-arena scratch page
is installed by the arch fault path into empty arena kernel PTEs - x86 from
page_fault_oops() for not-present faults, arm64 from __do_kernel_fault() for
translation faults, both after the existing exception-table and KFENCE
handling. The faulting instruction retries and the access is also reported
through the program's BPF stream, preserving error reporting.
bpf_prog_find_from_stack() resolves the current BPF program (and its arena)
from the kernel stack - no new bpf_run_ctx state is added. Recovery covers
the 4 GiB arena plus the upper half-guard (GUARD_SZ / 2). The lower
half-guard is excluded because well-behaved kfuncs only access forward from
arena pointers. The kfunc-author contract - access at most GUARD_SZ / 2 past
a handed-in pointer - is documented in Documentation/bpf/kfuncs.rst.
The install is lock-free via ptep_try_set(). On race-loss the winning
installer's PTE is already valid, so the access retry succeeds. The arena
clear path uses ptep_get_and_clear() so installer and clearer race through
atomic accessors. No flush_tlb_kernel_range() afterwards. Stale "not mapped"
entries just cause one extra re-fault, cheaper than a global IPI on every
install.
Scratch exists only to keep the kernel from oopsing on an in-line arena
access. Its presence at a PTE means the BPF program has already
malfunctioned, and the violation is reported through the program's BPF
stream. The only requirement for behavior on a scratched PTE is that the
kernel doesn't crash. In particular, any user-side access through such a PTE
may segfault. The shared scratch page is freed once during map destruction.
BPF instruction faults continue to use the existing JIT exception-table
path. This patch changes only the kernel-text fault path. No UAPI flag is
added. The new behavior is the default.
v2: Use ptep_get_and_clear() in apply_range_clear_cb(). (David)
v3: Stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL. (lkp)
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Cc: David Hildenbrand <david@kernel.org>
Link: https://lore.kernel.org/r/20260522172219.1423324-3-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Add ptep_try_set(ptep, new_pte): atomically set *ptep to new_pte iff it is
currently pte_none(). Returns true on success, false if the slot was already
populated or the arch has no implementation.
The intended caller is the upcoming bpf_arena kernel-side fault recovery
path. The install runs from a page fault that can be nested under locks
held by the faulting kernel caller (e.g. a BPF program holding
raw_res_spin_lock_irqsave on its arena's spinlock), so trylock-and-retry
would A-A deadlock. Lock-free cmpxchg is the only viable option, which
constrains this helper to special kernel page tables where concurrent
writers cooperate via atomic accessors.
The generic version in <linux/pgtable.h> returns false. x86 and arm64
override with try_cmpxchg-based implementations on the underlying pteval.
Other architectures get the false stub - the callers there already fall
through to oops.
v2: Rename to ptep_try_set(). Tighten kerneldoc. (David, Alexei)
v3: Note that strict-zero cmpxchg is narrower than pte_none(). (Andrea)
Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Suggested-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Tejun Heo <tj@kernel.org>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Cc: David Hildenbrand <david@kernel.org>
Acked-by: David Hildenbrand (arm) <david@kernel.org>
Link: https://lore.kernel.org/r/20260522172219.1423324-2-tj@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Hygon Family 18h CPUs are derived from AMD Family 17h (Zen1) silicon and
share the same erratum #1235: hardware may read a stale IsRunning=1 bit
during ICR write emulation and silently fail to generate an
AVIC_IPI_FAILURE_TARGET_NOT_RUNNING VM-Exit on the sending vCPU.
The absence of the VM-Exit causes KVM to miss the required wakeup of
blocking target vCPUs, leading to hung vCPUs and unbounded delays in
guest execution.
Extend the existing AMD Family 17h erratum #1235 workaround to also cover
Hygon Family 18h. With IPI virtualization disabled, KVM never sets
IsRunning=1 in the Physical ID table, so every non-self IPI generates a
VM-Exit and is correctly emulated.
Fixes: 8de4a1c8164e ("KVM: SVM: Disable (x2)AVIC IPI virtualization if CPU has erratum #1235")
Cc: <stable@vger.kernel.org>
Signed-off-by: Tina Zhang <zhang_wei@open-hieco.net>
Message-ID: <20260522040014.3380201-1-zhang_wei@open-hieco.net>
|
|
When KVM_CAP_X86_APIC_BUS_CYCLES_NS is queried on a specific VM, return the
VM's configured APIC bus frequency, not KVM's default. Aside from the fact
that returning the default frequency is blatantly wrong if userspace has
changed the frequency, returning the configured frequency means userspace
can blindly trust the result, e.g. when filling PV CPUID information that
communicates the APIC bus frequency to the guest.
Fixes: 6fef518594bc ("KVM: x86: Add a capability to configure bus frequency for APIC timer")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Closes: https://lore.kernel.org/all/ab84153e33fbe7c25667f595c56b310d4d5a93ef.camel@infradead.org
Signed-off-by: Sean Christopherson <seanjc@google.com>
Message-ID: <20260522173526.3539407-2-seanjc@google.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
|
|
HEAD
KVM/riscv fixes for 7.1, take #1
- Fix invalid HVA warning in steal-time recording
- Return SBI_ERR_FAILURE to guest upon OOM in pmu_event_info()
and pmu_snapshot_set_shmem()
- Fix NULL pointer dereference in SBI v0.1 SEND_IPI handler
- Fix sign extension of value for MMIO loads
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/kvms390/linux into HEAD
KVM: s390: some vSIE and UCONTROL fixes
Fix some memory issues and some hangs in vSIE.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 fixes for 7.1, take #3
- Fix ITS EventID sanitisation when restoring an interrupt translation
table.
- Fix PPI memory leak when failing to initialise a vcpu.
- Correctly return an error when the validation of a hypervisor trace
descriptor fails, and limit this validation to protected mode only.
|
|
The original commit 8c30b0018f9d ("openrisc: Add jump label support")
copies from arm64 and does not properly consider how icache invalidation
on remote cores works in OpenRISC. On OpenRISC remote icaches need to
be invalidated otherwise static key's may remain state after updating.
Fix SMP cache syncing by:
1. Properly invalidate remote core icaches on SMP systems by using
icache_all_inv. The old code uses kick_all_cpus_sync() which runs a
no-op IPI function call on remote CPU's which does execute a lot of
code and flushes many cache lines in the process, but does not flush
all and it's not correct on OpenRISC.
2. For architectures that do not have WRITETHROUGH caches be sure
to flush the dcache after patching.
To test this I first reproduced the issue using a custom test module
[0]. The test confirmed that some icache lines maintained stale
static_key code sequences after calling static_branch_enable(). After
this patch there are no longer jump_label coherency issues.
[0] https://github.com/stffrdhrn/or1k-utils/tree/master/tests/smp_static_key_test
Cc: stable@vger.kernel.org # depends on openrisc: Add icache_all_inv
Fixes: 8c30b0018f9d ("openrisc: Add jump label support")
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
Add functions to invalidate all cache lines which we will use for
static_key patching.
On OpenRISC there is no instruction to invalidate an entire cache so we
loop and invalidate cache lines one by one. This is not extremely
expensive on OpenRISC as we usually have only a few hundred cache lines.
I considered using the invalidate cache page or range functions.
However, tracking which ranges need invalidation would have been more
expensive than flushing all pages.
Cc: stable@vger.kernel.org
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
When working on new cache invalidation functions I noticed these
cleanups in the cache initialization code. Remove unused and commented
instructions to avoid confusion.
Signed-off-by: Stafford Horne <shorne@gmail.com>
|
|
While ARCnet is still used in industrial environments, and cards are
still manufactured, it is unlikely anyone is still using it with ISA
and PCMCIA cards. Reduce future maintenance burden by removing all ISA
and PCMCIA ARCnet drivers and documentation related to them. Update
instructions for loading modules and passing parameters to work on
modern kernels and with the com20020_pci driver. Also take the
opportunity to document the rest of the module parameters, correct a
file path in Documentation/networking/arcnet.rst, and change a
reference to /etc/rc.inet1, which no longer exists, to refer to
ifconfig.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260521001631.45434-4-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When x2AVIC is enabled, disable WRMSR interception only for MSRs that are
actually accelerated by hardware. Disabling interception for MSRs that
aren't accelerated is functionally "fine", and in some cases a weird "win"
for performance, but only for cases that should never be triggered by a
well-behaved VM (writes to read-only registers; the #GP will typically
occur in the guest without taking a #VMEXIT, even for fault-like exits).
But overall, disabling interception for MSRs that aren't accelerated is at
best confusing and unintuitive, and at worst introduces avoidable risk, as
the APM's documentation is imperfect and contradictory. The table in
"15.29.3.1 Virtual APIC Register Accesses" of simply states that such
writes generate exits, where as "Section 15.29.10 x2AVIC" says:
x2APIC MSR intercept checks and access checks have higher priority than
AVIC access permission checks.
CPU behavior follows the latter (which makes perfect sense), but all in
all there's simply no reason to disable interception just to make a #GP
faster.
Note, the set of MSRs that are passed through for write is identical to
VMX's set when IPI virtualization is enabled. This is not a coincidence,
and is another motiviating factor for cleaning up the intercepts, as x2AVIC
is functionally equivalent to APICv+IPIv.
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://patch.msgid.link/20260514213115.1637082-4-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
When toggling x2AVIC on/off, use KVM's curated mask of x2APIC MSRs that
can/should be passed through to the guest (or not) when 2AVIC is enabled.
Using the effective list provided by the local APIC emulation fixes
multiple (classes of) bugs, as the existing hand-coded list of MSRs is
wrong on multiple fronts:
- ARBPRI isn't supported by KVM, isn't accelerated by AVIC (for read or
write), and its #VMEXIT is fault-like, i.e. requires decoding the
instruction. Disabling interception is nonsensical and suboptimal.
- DFR and ICR2 aren't supported by x2APIC and so don't need their
intercepts disabled for performance reasons. While the #GP due to
x2APIC being abled has higher priority than the trap-like #VMEXIT,
disabling interception of unsupported MSRs is confusing and unnecessary.
- RRR is completely unsupported.
- AVIC currently fails to pass through the "range of vectors" registers,
IRR, ISR, and TMR, as e.g. X2APIC_MSR(APIC_IRR) only affects IRR0, and
thus only disables intercept for vectors 31:0 (which are the *least*
interesting registers).
- TMCCT (the current APIC timer count) isn't accelerated by hardware, and
generates a fault-like AVIC_UNACCELERATED_ACCESS #VMEXIT, i.e. requires
KVM to decode the instruction to figure out what the guest was trying to
access. Note, the only reason this isn't a fatal bug is that the AVIC
architecture had the foresight to guard against buggy hypervisors. E.g.
if hardware simply read from the virtual APIC page, the guest would get
garbage (because the timer is emulated in software).
Fixes: 4d1d7942e36a ("KVM: SVM: Introduce logic to (de)activate x2AVIC mode")
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://patch.msgid.link/20260514213115.1637082-3-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
Add a dedicated local APIC API, kvm_x2apic_disable_intercept_reg_mask(),
to provide the mask of x2APIC registers whose MSRs can and should be passed
through to the guest when x2APIC virtualization is enable, and use it in
lieu of the open-coded equivalent VMX logic. Providing a common helper
will allow sharing the logic with SVM (x2AVIC), and as a bonus eliminates
the somewhat confusing code where KVM enables interception for MSR_TYPE_RW,
even though only the READ case actually needs to be updated.
No functional change intended.
Cc: stable@vger.kernel.org
Reviewed-by: Naveen N Rao (AMD) <naveen@kernel.org>
Link: https://patch.msgid.link/20260514213115.1637082-2-seanjc@google.com
Signed-off-by: Sean Christopherson <seanjc@google.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 fixes from Catalin Marinas:
- Handle probe on hinted conditional branch instructions.
BC.cond instructions can be simulated in the same way as B.cond
instructions, so extend the decode mask for B.cond to cover BC.cond
- Flush the walk cache when unsharing PMD tables. Recent changes to
huge_pmd_unshare() introduced mmu_gather::unshared_tables but the
arm64 code was still treating the TLB flushing as only targeting leaf
entries (TLBI VALE1IS).
Fix it by using non-leaf-only instructions (TLBI VAE1IS) when
tlb->unshared_tables is set
* tag 'arm64-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux:
arm64: tlb: Flush walk cache when unsharing PMD tables
arm64: probes: Handle probes on hinted conditional branch instructions
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 fixes from Alexander Gordeev:
- Fix PAI NNPA mismatch between counting and recording, where sampling
reports twice the value
- Fix loss of PAI counter increments during recording on systems with
many CPUs under heavy load, while counting is not affected
- On some supported machines, CHSC cannot access memory outside the DMA
zone, causing CHSC command failures. Restore GFP_DMA flag when
allocating memory for CHSC control blocks
- Align the numbering scheme for higher-level topology structures like
socket, book, drawer with other hardware identifiers e.g. in sysfs,
procfs and tools like lscpu
* tag 's390-7.1-3' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux:
s390/topology: Use zero-based numbering for containing entities
s390/cio: Restore GFP_DMA for CHSC allocation
s390/pai: Fix missing PAI counter increments under heavy load
s390/pai: Disable duplicate read of kernel PAI counter value
|
|
The last MIPS crypto code was moved to lib/crypto/mips in
commit c9e5ac0ab9d1 ("lib/crypto: mips/md5: Migrate optimized code into
library"). However, arch/mips/crypto still contains stub Kconfig,
Makefile, and .gitignore files. Remove these unnecessary files.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
All LoongArch crypto code was moved to arch/loongarch/lib in
commit 72f51a4f4b07 ("loongarch/crc32: expose CRC32 functions through
lib"). However, arch/loongarch/crypto still contains stub Kconfig and
Makefile files. Remove these unnecessary files.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Use the simpler min() macro since the values are unsigned and
compatible.
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
"struct class" is defined earlier on both cases.
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Link: https://patch.msgid.link/6d5937c5-9d41-4cfe-9e42-0946e12dc72d@p183
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
When huge_pmd_unshare() is called to unshare a PMD table, the
tlb_unshare_pmd_ptdesc() function sets tlb->unshared_tables=true
but the aarch64 tlb_flush() only checked tlb->freed_tables to
determine whether to use TLBF_NONE (vae1is, invalidates walk
cache) or TLBF_NOWALKCACHE (vale1is, leaf-only).
This caused the stale PMD page table entry to remain in the walk cache
after unshare, potentially leading to incorrect page table walks.
Fix by including unshared_tables in the check, so that when
unsharing tables, TLBF_NONE is used and the walk cache is properly
invalidated.
Here is the detailed distinction between vae1is and vale1is:
| Instruction Combination | Actual Invalidation Scope |
| ------------------------ | --------------------------------------------------|
| `VAE1IS` + TTL=`0` | All entries at all levels (full invalidation) |
| `VAE1IS` + TTL=`2` (L2) | Non-leaf at Level 0/1 + leaf at Level 2 |
| `VALE1IS` + TTL=`0` | Leaf entries at all levels (non-leaf not cleared) |
| `VALE1IS` + TTL=`2` (L2) | Leaf entry at Level 2 only |
Signed-off-by: Zeng Heng <zengheng4@huawei.com>
Fixes: 8ce720d5bd91 ("mm/hugetlb: fix excessive IPI broadcasts when unsharing PMD tables using mmu_gather")
Cc: <stable@vger.kernel.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Prevent a crash from happening as the first serial port is initialised:
Console: switching to mono frame buffer device 160x64
fb0: PMAG-AA frame buffer device at tc0
DECstation Z85C30 serial driver version 0.10
CPU 0 Unable to handle kernel paging request at virtual address 0000002c, epc == 803ab00c, ra == 803aafe0
Oops[#1]:
CPU: 0 PID: 1 Comm: swapper Not tainted 6.4.0-rc3-00031-g84a9582fd203-dirty #57
$ 0 : 00000000 10012c00 803aaeb0 00000000
$ 4 : 80e12f60 80e12f50 80e12f58 81000030
$ 8 : 00000000 805ff37c 00000000 33433538
$12 : 65732030 00000006 80c2915d 6c616972
$16 : 80e12f00 807b7630 00000000 00000000
$20 : 00000004 00000348 000001a0 807623b8
$24 : 00000018 00000000
$28 : 80c24000 80c25d60 8078b148 803aafe0
Hi : 00000000
Lo : 00000000
epc : 803ab00c serial_base_ctrl_add+0x78/0xf4
ra : 803aafe0 serial_base_ctrl_add+0x4c/0xf4
Status: 10012c03 KERNEL EXL IE
Cause : 00000008 (ExcCode 02)
BadVA : 0000002c
PrId : 00000440 (R4400SC)
Modules linked in:
Process swapper (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
Stack : 80760000 00000cc0 00400044 00400040 803aa02c 80d61ab8 00000000 807b7630
80760000 807623b8 807b7628 803aa644 80386998 00000000 80e17780 80220f68
80e17780 80d61ab8 80c17d80 80e17780 80e17780 8063c798 80e17780 80383fa0
00000010 80e17780 00000000 80386998 807a0000 00000000 00400040 8038f848
807623b8 80d61ab8 00000004 80e17780 00000000 803a68e4 80c25e2c 803bb884
...
Call Trace:
[<803ab00c>] serial_base_ctrl_add+0x78/0xf4
[<803aa644>] serial_core_register_port+0x174/0x69c
[<8077e9ac>] zs_init+0xc8/0xfc
[<800404d4>] do_one_initcall+0x40/0x2ac
[<8076cecc>] kernel_init_freeable+0x1e4/0x270
[<80605bec>] kernel_init+0x20/0x108
[<800431e8>] ret_from_kernel_thread+0x14/0x1c
Code: 2442aeb0 ae120024 ae0200d0 <8c67002c> 50e00001 8c670000 3c06806e 3c05806e afb30010
---[ end trace 0000000000000000 ]---
(report at the offending commit) -- where a pointer is dereferenced that
has been derived from a null pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because SCC devices are fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the dz
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Since there is one way only remaining to reach zs_reset() now, remove
the port initialisation marker as no longer needed and go through the
channel reset unconditionally.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062328480.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Prevent a crash from happening as the first serial port is initialised:
Console: switching to colour frame buffer device 160x64
tgafb: SFB+ detected, rev=0x02
fb0: Digital ZLX-E1 frame buffer device at 0x1e000000
DECstation DZ serial driver version 1.04
CPU 0 Unable to handle kernel paging request at virtual address 000000bc, epc == 8048b3a4, ra == 80470a78
Oops[#1]:
CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.19.0-dirty #35 NONE
$ 0 : 00000000 1000ac00 00000004 804707ac
$ 4 : 00000000 80e20850 80e20858 81000030
$ 8 : 00000000 8072c81c 00000008 fefefeff
$12 : 6c616972 00000006 80c5917f 69726420
$16 : 80e20800 00000000 808f8968 80e20800
$20 : 00000000 807f5a90 808b0094 808d3bc8
$24 : 00000018 80479030
$28 : 80c2e000 80c2fd70 00000069 80470a78
Hi : 00000004
Lo : 00000000
epc : 8048b3a4 __dev_fwnode+0x0/0xc
ra : 80470a78 serial_base_ctrl_add+0xa0/0x168
Status: 1000ac04 IEp
Cause : 30000008 (ExcCode 02)
BadVA : 000000bc
PrId : 00000220 (R3000)
Modules linked in:
Process swapper/0 (pid: 1, threadinfo=(ptrval), task=(ptrval), tls=00000000)
Stack : 00400044 00400040 8046f4cc 00000000 808a6148 808a0000 808f8968 8086983c
808e0000 8046fc84 1000ac01 00000028 80e20700 802ba3f8 80e20700 80d34a94
80c1b900 80e20700 80e20700 80e20700 80e20700 80444650 00000000 00000000
00000000 807f5a90 808b0094 80447080 00400040 808e0000 80d34a94 808a6148
80d34a94 00000004 80e20700 00000000 8076974c 80469810 80c2fe3c 1000ac01
...
Call Trace:
[<8048b3a4>] __dev_fwnode+0x0/0xc
[<80470a78>] serial_base_ctrl_add+0xa0/0x168
[<8046fc84>] serial_core_register_port+0x1c8/0x974
[<808c6af0>] dz_init+0x74/0xc8
[<800470e0>] do_one_initcall+0x44/0x2d4
[<808b111c>] kernel_init_freeable+0x258/0x308
[<8072e434>] kernel_init+0x20/0x114
[<80049cd0>] ret_from_kernel_thread+0x14/0x1c
Code: 27bd0018 03e00008 2402ffea <8c8200bc> 03e00008 00000000 27bdffc0 afbe0038 afb30024
---[ end trace 0000000000000000 ]---
-- where a pointer is dereferenced that has been derived from a null
pointer to the port's parent device.
Since no device is available with legacy probing and it's not anymore a
preferable way to discover devices anyway, switch the driver to using a
platform device and use it as the port's parent device. Update resource
handling accordingly and only request the actual span of addresses used
within the slot, which will have had its resource already requested by
generic platform device code.
Use platform_driver_probe() not just because the DZ device is fixed with
solder on board and not straightforward to remove, but foremost because
the associated TTY's major device number is the same as used by the zs
driver and the first driver to claim it will prevent the other one from
using it. Either one DZ device or some SCC devices will be present in a
given system but never both at a time, and therefore we want the major
device number to be claimed by the first driver to actually successfully
bind to its device and platform_driver_probe() is a way to fulfil that.
An unfortunate consequence of the switch to a platform device is we now
hand the console over from the bootconsole much later in the bootstrap.
The firmware console handler appears good enough though to work so late
and in particular with interrupts enabled.
Conversely only starting the console port so late lets the reset code
fully utilise our delay handlers, so switch from udelay() to fsleep()
for transmitter draining so as to avoid busy-waiting for an excessive
amount of time.
Fixes: 84a9582fd203 ("serial: core: Start managing serial controllers to enable runtime PM")
Signed-off-by: Maciej W. Rozycki <macro@orcam.me.uk>
Cc: stable@vger.kernel.org # needs to use .remove_new for <= 6.10
Link: https://patch.msgid.link/alpine.DEB.2.21.2605062326540.46195@angie.orcam.me.uk
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
In case of memory pressure, it's possible that a guest page gets freed
and then almost immediately reused by the guest. If CMMA is enabled,
_essa_clear_cbrl() will discard all pages that are either unused or
zero. If a discarded page is reused before _essa_clear_cbrl() is called,
and the pgste.zero bit is not cleared, the page will be discarded
despite not being unused.
When calling _gmap_ptep_xchg(), always clear the pgste.zero bit. This
prevents the page from being accidentally discarded when not unused.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Reviewed-by: Steffen Eiden <seiden@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
The address passed to the gmap rmap was not being masked. As a
consequence several different (but functionally equivalent) rmap
entries were being created for each shadowed table.
Fix this by properly masking the address depending on the table level.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
In some cases (i.e. under extreme memory pressure on the host),
attempting to shadow memory will result in the same memory being
unshadowed, causing a loop.
Add a PGSTE bit to distinguish between shadowed memory and shadowed DAT
tables, fix the unshadowing logic in _gmap_ptep_xchg() to prevent
unnecessary unshadowing and perform better checks.
Also fix the unshadowing logic in _gmap_crstep_xchg_atomic() which did
not unshadow properly when the large page would become unprotected.
Opportunistically add a check in gmap_protect_rmap() to make sure it
won't be called with level == TABLE_TYPE_PAGE_TABLE.
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
Fix a memory leak that can happen if gmap_ucas_map_one() or
kvm_s390_mmu_cache_topup() return error values.
Also fix a similar issue in gmap_set_limit().
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Reported-by: Jiaxin Fan <jiaxin.fan@ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
When performing a partial unshadowing, the rmap was being leaked.
Add the missing kfree().
Fixes: a2c17f9270cc ("KVM: s390: New gmap code")
Signed-off-by: Claudio Imbrenda <imbrenda@linux.ibm.com>
Reviewed-by: Christoph Schlameuss <schlameuss@linux.ibm.com>
Reviewed-by: Christian Borntraeger <borntraeger@linux.ibm.com>
Signed-off-by: Christian Borntraeger <borntraeger@linux.ibm.com>
|
|
Although we have some code supporting 128 PPIs, the only supported
configuration is 64 PPIs. There is no way to test the 128 PPI code,
so it is bound to bitrot very quickly.
Given that KVM/arm64's goal has always been to stick to non-IMPDEF
behaviours, drop the 128 PPI support. Someone motivated enough and
with very strong arguments can always bring it back -- it's all in
the git history.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-10-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Despite adding the necessary infrastructure to identify irq types,
vgic_get_vcpu_irq() treats GICv5 PPIs in a special way, which
impairs the readability of the code.
Use the existing irq classifiers to handle per-CPU irqs for all
vgic types, and let the normal control flow reach global interrupt
handling without any v5-specific path.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-9-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
vgic_v5_ppi_queue_irq_unlock() performs a bunch of sanity checks
that are pretty pointless as there is no code path that can
result in these invariants to be violated. And if they are, a nice
crash is just as instructive than a warning.
Drop what is evidently debug code and simplify the whole thing.
Link: https://lore.kernel.org/r/20260520091949.542365-8-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
vgic_allocate_private_irqs_locked() calls two helpers, oddly named
vgic_{,v5_}allocate_private_irq().
Not only these helpers don't allocate anything, but they also
contain duplicate init code that would be better placed in the
caller.
Consolidate the common init code in the caller, rename the helpers
to vgic_{,v5_}setup_private_irq(), and pass the irq pointer around
instead of the index of the interrupt.
Reviewed-by: Joey Gouly <joey.gouly@arm.com>
Link: https://lore.kernel.org/r/20260520091949.542365-7-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
vgic-v5 has introduced much more prevalent usage of the struct
irq_ops mechanism.
In the process, it becomes evident that suffers from two related
problems:
- it |