|
Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
"These are the arm64 updates for 6.19.
The biggest part is the Arm MPAM driver under drivers/resctrl/.
There's a patch touching mm/ to handle spurious faults for huge pmd
(similar to the pte version). The corresponding arm64 part allows us
to avoid the TLB maintenance if a (huge) page is reused after a write
fault. There's EFI refactoring to allow runtime services with
preemption enabled and the rest is the usual perf/PMU updates and
several cleanups/typos.
Summary:
Core features:
- Basic Arm MPAM (Memory system resource Partitioning And Monitoring)
driver under drivers/resctrl/ which makes use of the fs/resctrl/ API
Perf and PMU:
- Avoid cycle counter on multi-threaded CPUs
- Extend CSPMU device probing and add additional filtering support
for NVIDIA implementations
- Add support for the PMUs on the NoC S3 interconnect
- Add additional compatible strings for new Cortex and C1 CPUs
- Add support for data source filtering to the SPE driver
- Add support for i.MX8QM and "DB" PMU in the imx PMU driver
Memory management:
- Avoid broadcast TLBI if page reused in write fault
- Elide TLB invalidation if the old PTE was not valid
- Drop redundant cpu_set_*_tcr_t0sz() macros
- Propagate pgtable_alloc() errors outside of __create_pgd_mapping()
- Propagate return value from __change_memory_common()
ACPI and EFI:
- Call EFI runtime services without disabling preemption
- Remove unused ACPI function
Miscellaneous:
- ptrace support to disable streaming on SME-only systems
- Improve sysreg generation to include a 'Prefix' descriptor
- Replace __ASSEMBLY__ with __ASSEMBLER__
- Align register dumps in the kselftest zt-test
- Remove some no longer used macros/functions
- Various spelling corrections"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
arm64/mm: Document why linear map split failure upon vm_reset_perms is not problematic
arm64/pageattr: Propagate return value from __change_memory_common
arm64/sysreg: Remove unused define ARM64_FEATURE_FIELD_BITS
KVM: arm64: selftests: Consider all 7 possible levels of cache
KVM: arm64: selftests: Remove ARM64_FEATURE_FIELD_BITS and its last user
arm64: atomics: lse: Remove unused parameters from ATOMIC_FETCH_OP_AND macros
Documentation/arm64: Fix the typo of register names
ACPI: GTDT: Get rid of acpi_arch_timer_mem_init()
perf: arm_spe: Add support for filtering on data source
perf: Add perf_event_attr::config4
perf/imx_ddr: Add support for PMU in DB (system interconnects)
perf/imx_ddr: Get and enable optional clks
perf/imx_ddr: Move ida_alloc() from ddr_perf_init() to ddr_perf_probe()
dt-bindings: perf: fsl-imx-ddr: Add compatible string for i.MX8QM, i.MX8QXP and i.MX8DXL
arm64: remove duplicate ARCH_HAS_MEM_ENCRYPT
arm64: mm: use untagged address to calculate page index
MAINTAINERS: new entry for MPAM Driver
arm_mpam: Add kunit tests for props_mismatch()
arm_mpam: Add kunit test for bitmap reset
arm_mpam: Add helper to reset saved mbwu state
...
|
|
The conversion to generic IRQ entry left some functions
in the EL1 (kernel) IRQ entry path very shallow, so drop
the __inner_functions() where appropriate, saving some
time and stack.
This is not a fix but an optimization.
Drop stale comments about irqentry_enter/exit() while we
are at it.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
exit_to_user_mode_prepare() is used for both interrupts and syscalls, but
there is extra rseq work, which is only required in the interrupt exit
case.
Split up the function and provide wrappers for syscalls and interrupts,
which allows the rseq exit work to be separated out in the next step.
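Roughly, the split has this shape (a sketch only; the wrapper names and the
exact work performed are illustrative, not necessarily the final code):
| /* Common exit work shared by both paths. */
| static __always_inline void __exit_to_user_mode_prepare(struct pt_regs *regs)
| {
|         unsigned long ti_work = read_thread_flags();
|
|         if (unlikely(ti_work & EXIT_TO_USER_MODE_WORK))
|                 ti_work = exit_to_user_mode_loop(regs, ti_work);
|
|         arch_exit_to_user_mode_prepare(regs, ti_work);
| }
|
| static __always_inline void syscall_exit_to_user_mode_prepare(struct pt_regs *regs)
| {
|         __exit_to_user_mode_prepare(regs);
| }
|
| static __always_inline void irqentry_exit_to_user_mode_prepare(struct pt_regs *regs)
| {
|         /* The interrupt-only rseq exit work moves here in the next step. */
|         __exit_to_user_mode_prepare(regs);
| }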
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Reviewed-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20251027084307.782234789@linutronix.de
|
|
We intend that EL0 exception handlers unmask all DAIF exceptions
before calling exit_to_user_mode().
When completing single-step of a suspended breakpoint, we do not call
local_daif_restore(DAIF_PROCCTX) before calling exit_to_user_mode(),
leaving all DAIF exceptions masked.
When pseudo-NMIs are not in use this is benign.
When pseudo-NMIs are in use, this is unsound. At this point interrupts
are masked by both DAIF.IF and PMR_EL1, and subsequent irq flag
manipulation may not work correctly. For example, a subsequent
local_irq_enable() within exit_to_user_mode_loop() will only unmask
interrupts via PMR_EL1 (leaving those masked via DAIF.IF), and
anything depending on interrupts being unmasked (e.g. delivery of
signals) will not work correctly.
This was detected by CONFIG_ARM64_DEBUG_PRIORITY_MASKING.
Move the call to `try_step_suspended_breakpoints()` outside of the check
so that interrupts can be unmasked even if we don't call the step handler.
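A minimal sketch of the resulting EL0 single-step exit ordering (based on the
description above; the exact code may differ):
| bool stepped = try_step_suspended_breakpoints(regs);
|
| /* Always unmask all DAIF exceptions, whether or not we stepped. */
| local_daif_restore(DAIF_PROCCTX);
| if (!stepped)
|         do_el0_softstep(esr, regs);
| exit_to_user_mode(regs);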
Fixes: 0ac7584c08ce ("arm64: debug: split single stepping exception entry")
Cc: <stable@vger.kernel.org> # 6.17
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
[catalin.marinas@arm.com: added Mark's rewritten commit log and some whitespace]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, x86, RISC-V and LoongArch use the generic entry code, which
makes maintainers' work easier and the code more elegant. Start converting
arm64 to use the generic entry infrastructure from kernel/entry/* by
switching it to generic IRQ entry, which removes 100+ lines of duplicate
code. arm64 will completely switch to generic entry in a later series.
The changes are below:
- Remove *enter_from/exit_to_kernel_mode(), and wrap with generic
irqentry_enter/exit() as their code and functionality are almost
identical.
- Define ARCH_EXIT_TO_USER_MODE_WORK and implement
arch_exit_to_user_mode_work() to check the arm64-specific thread flags
"_TIF_MTE_ASYNC_FAULT" and "_TIF_FOREIGN_FPSTATE" (a sketch follows
after this list). Also remove *enter_from/exit_to_user_mode(), and
wrap with the generic enter_from/exit_to_user_mode() because they are
exactly the same.
- Remove arm64_enter/exit_nmi() and use generic irqentry_nmi_enter/exit()
because they're exactly the same, so the temporary arm64 version
irqentry_state can also be removed.
- Remove PREEMPT_DYNAMIC code, as generic irqentry_exit_cond_resched()
has the same functionality.
- Implement arch_irqentry_exit_need_resched() with
arm64_preempt_schedule_irq() for arm64 which will allow arm64 to do
its architecture specific checks.
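A sketch of the arch hook from the second item above (illustrative; the exact
upstream flag handling may differ):
| #define ARCH_EXIT_TO_USER_MODE_WORK  (_TIF_MTE_ASYNC_FAULT | _TIF_FOREIGN_FPSTATE)
|
| static inline void arch_exit_to_user_mode_work(struct pt_regs *regs,
|                                                unsigned long ti_work)
| {
|         if (ti_work & _TIF_MTE_ASYNC_FAULT) {
|                 clear_thread_flag(TIF_MTE_ASYNC_FAULT);
|                 send_sig_fault(SIGSEGV, SEGV_MTEAERR, (void __user *)NULL, current);
|         }
|
|         if (ti_work & _TIF_FOREIGN_FPSTATE)
|                 fpsimd_restore_current_state();
| }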
Tested-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Suggested-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The arm64 entry code only preempts a kernel context upon a return from
a regular IRQ exception. The generic entry code may preempt a kernel
context for any exception return where irqentry_exit() is used, and so
may preempt other exceptions such as faults.
In preparation for moving arm64 over to the generic entry code, align
arm64 with the generic behaviour by calling
arm64_preempt_schedule_irq() from exit_to_kernel_mode(). To make this
possible, arm64_preempt_schedule_irq()
and dynamic/raw_irqentry_exit_cond_resched() are moved earlier in
the file, with no changes.
As Mark pointed out, this change has the following two key impacts:
- " We'll preempt even without taking a "real" interrupt. That
shouldn't result in preemption that wasn't possible before,
but it does change the probability of preempting at certain points,
and might have a performance impact, so probably warrants a
benchmark."
- " We will not preempt when taking interrupts from a region of kernel
code where IRQs are enabled but RCU is not watching, matching the
behaviour of the generic entry code.
This has the potential to introduce livelock if we can ever have a
screaming interrupt in such a region, so we'll need to go figure out
whether that's actually a problem.
Having this as a separate patch will make it easier to test/bisect
for that specifically."
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
To align the structure of the code with irqentry_exit_cond_resched()
from the generic entry code, hoist the need_irq_preemption()
and IS_ENABLED() checks earlier, and define different preemption check
functions based on whether dynamic preemption is enabled.
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The generic entry code uses preempt_count() and need_resched() helpers to
check if it should do preempt_schedule_irq(). Currently, arm64 uses its own
check logic, that is "READ_ONCE(current_thread_info()->preempt_count) == 0",
which is equivalent to "preempt_count() == 0 && need_resched()".
In preparation for moving arm64 over to the generic entry code, use
these helpers to replace arm64's own code and move it ahead.
No functional changes.
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The generic entry code has the form:
| raw_irqentry_exit_cond_resched()
| {
| if (!preempt_count()) {
| ...
| if (need_resched())
| preempt_schedule_irq();
| }
| }
In preparation for moving arm64 over to the generic entry code, align
the structure of the arm64 code with raw_irqentry_exit_cond_resched() from
the generic entry code.
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The generic entry code uses irqentry_state_t to track lockdep and RCU
state across exception entry and return. For historical reasons, arm64
embeds similar fields within its pt_regs structure.
In preparation for moving arm64 over to the generic entry code, pull
these fields out of arm64's pt_regs, and use a separate structure,
matching the style of the generic entry code.
No functional changes.
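A sketch of the separated state, mirroring the layout of the generic
irqentry_state_t (names illustrative):
| typedef struct arm64_irqentry_state {
|         union {
|                 bool    exit_rcu;
|                 bool    lockdep;
|         };
| } arm64_irqentry_state_t;
|
| /* Passed by value through enter_from_kernel_mode()/exit_to_kernel_mode()
|  * instead of living in pt_regs. */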
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
The generic entry code expects architecture code to provide
a regs_irqs_disabled(regs) function, but arm64 does not have this and
provides interrupts_enabled(regs), which has the opposite polarity.
In preparation for moving arm64 over to the generic entry code,
replace arm64's interrupts_enabled() with regs_irqs_disabled() and
update its callers under arch/arm64.
For the moment, a definition of interrupts_enabled() is provided for
the GICv3 driver. Once arch/arm implements regs_irqs_disabled(), this
can be removed.
Delete the fast_interrupts_enabled() macro as it is unused and we
don't want any new users to show up.
No functional changes.
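A sketch of the polarity change, based on the existing arm64 PSTATE check:
| /* Old helper: true when IRQs are enabled in the saved PSTATE. */
| #define interrupts_enabled(regs)     (!((regs)->pstate & PSR_I_BIT))
|
| /* New helper with the polarity the generic entry code expects. */
| static __always_inline bool regs_irqs_disabled(const struct pt_regs *regs)
| {
|         return regs->pstate & PSR_I_BIT;
| }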
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Suggested-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
'for-next/misc', 'for-next/acpi', 'for-next/debug-entry', 'for-next/feat_mte_tagged_far', 'for-next/kselftest', 'for-next/mdscr-cleanup' and 'for-next/vmap-stack', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf: (23 commits)
drivers/perf: hisi: Support PMUs with no interrupt
drivers/perf: hisi: Relax the event number check of v2 PMUs
drivers/perf: hisi: Add support for HiSilicon SLLC v3 PMU driver
drivers/perf: hisi: Use ACPI driver_data to retrieve SLLC PMU information
drivers/perf: hisi: Add support for HiSilicon DDRC v3 PMU driver
drivers/perf: hisi: Simplify the probe process for each DDRC version
perf/arm-ni: Support sharing IRQs within an NI instance
perf/arm-ni: Consolidate CPU affinity handling
perf/cxlpmu: Fix typos in cxl_pmu.c comments and documentation
perf/cxlpmu: Remove unintended newline from IRQ name format string
perf/cxlpmu: Fix devm_kcalloc() argument order in cxl_pmu_probe()
perf: arm_spe: Relax period restriction
perf: arm_pmuv3: Add support for the Branch Record Buffer Extension (BRBE)
KVM: arm64: nvhe: Disable branch generation in nVHE guests
arm64: Handle BRBE booting requirements
arm64/sysreg: Add BRBE registers and fields
perf/arm: Add missing .suppress_bind_attrs
perf/arm-cmn: Reduce stack usage during discovery
perf: imx9_perf: make the read-only array mask static const
perf/arm-cmn: Broaden module description for wider interconnect support
...
* for-next/livepatch:
: Support for HAVE_LIVEPATCH on arm64
arm64: Kconfig: Keep selects somewhat alphabetically ordered
arm64: Implement HAVE_LIVEPATCH
arm64: stacktrace: Implement arch_stack_walk_reliable()
arm64: stacktrace: Check kretprobe_find_ret_addr() return value
arm64/module: Use text-poke API for late relocations.
* for-next/user-contig-bbml2:
: Optimise the TLBI when folding/unfolding contiguous PTEs on hardware with BBML2 and no TLB conflict aborts
arm64/mm: Elide tlbi in contpte_convert() under BBML2
iommu/arm: Add BBM Level 2 smmu feature
arm64: Add BBM Level 2 cpu feature
arm64: cpufeature: Introduce MATCH_ALL_EARLY_CPUS capability type
* for-next/misc:
: Miscellaneous arm64 patches
arm64/gcs: task_gcs_el0_enable() should use passed task
arm64: signal: Remove ISB when resetting POR_EL0
arm64/mm: Drop redundant addr increment in set_huge_pte_at()
arm64: Mark kernel as tainted on SAE and SError panic
arm64/gcs: Don't call gcs_free() when releasing task_struct
arm64: fix unnecessary rebuilding when CONFIG_DEBUG_EFI=y
arm64/mm: Optimize loop to reduce redundant operations of contpte_ptep_get
arm64: pi: use 'targets' instead of extra-y in Makefile
* for-next/acpi:
: Various ACPI arm64 changes
ACPI: Suppress misleading SPCR console message when SPCR table is absent
ACPI: Return -ENODEV from acpi_parse_spcr() when SPCR support is disabled
* for-next/debug-entry:
: Simplify the debug exception entry path
arm64: debug: remove debug exception registration infrastructure
arm64: debug: split bkpt32 exception entry
arm64: debug: split brk64 exception entry
arm64: debug: split hardware watchpoint exception entry
arm64: debug: split single stepping exception entry
arm64: debug: refactor reinstall_suspended_bps()
arm64: debug: split hardware breakpoint exception entry
arm64: entry: Add entry and exit functions for debug exceptions
arm64: debug: remove break/step handler registration infrastructure
arm64: debug: call step handlers statically
arm64: debug: call software breakpoint handlers statically
arm64: refactor aarch32_break_handler()
arm64: debug: clean up single_step_handler logic
* for-next/feat_mte_tagged_far:
: Support for reporting the non-address bits during a synchronous MTE tag check fault
kselftest/arm64/mte: Add mtefar tests on check_mmap_options
kselftest/arm64/mte: Refactor check_mmap_option test
kselftest/arm64/mte: Add verification for address tag in signal handler
kselftest/arm64/mte: Add address tag related macro and function
kselftest/arm64/mte: Check MTE_FAR feature is supported
kselftest/arm64/mte: Register mte signal handler with SA_EXPOSE_TAGBITS
kselftest/arm64: Add MTE_FAR hwcap test
KVM: arm64: Expose FEAT_MTE_TAGGED_FAR feature to guest
arm64: Report address tag when FEAT_MTE_TAGGED_FAR is supported
arm64/cpufeature: Add FEAT_MTE_TAGGED_FAR feature
* for-next/kselftest:
: Kselftest updates for arm64
kselftest/arm64: Handle attempts to disable SM on SME only systems
kselftest/arm64: Fix SVE write data generation for SME only systems
kselftest/arm64: Test SME on SME only systems in fp-ptrace
kselftest/arm64: Test FPSIMD format data writes via NT_ARM_SVE in fp-ptrace
kselftest/arm64: Allow sve-ptrace to run on SME only systems
kselftest/arm4: Provide local defines for AT_HWCAP3
kselftest/arm64: Specify SVE data when testing VL set in sve-ptrace
kselftest/arm64: Fix test for streaming FPSIMD write in sve-ptrace
kselftest/arm64: Fix check for setting new VLs in sve-ptrace
kselftest/arm64: Convert tpidr2 test to use kselftest.h
* for-next/mdscr-cleanup:
: Drop redundant DBG_MDSCR_* macros
KVM: selftests: Change MDSCR_EL1 register holding variables as uint64_t
arm64/debug: Drop redundant DBG_MDSCR_* macros
* for-next/vmap-stack:
: Force VMAP_STACK on arm64
arm64: remove CONFIG_VMAP_STACK checks from entry code
arm64: remove CONFIG_VMAP_STACK checks from SDEI stack handling
arm64: remove CONFIG_VMAP_STACK checks from stacktrace overflow logic
arm64: remove CONFIG_VMAP_STACK conditionals from traps overflow stack
arm64: remove CONFIG_VMAP_STACK conditionals from irq stack setup
arm64: Remove CONFIG_VMAP_STACK conditionals from THREAD_SHIFT and THREAD_ALIGN
arm64: efi: Remove CONFIG_VMAP_STACK check
arm64: Mandate VMAP_STACK
arm64: efi: Fix KASAN false positive for EFI runtime stack
arm64/ptrace: Fix stack-out-of-bounds read in regs_get_kernel_stack_nth()
arm64/gcs: Don't call gcs_free() during flush_gcs()
arm64: Restrict pagetable teardown to avoid false warning
docs: arm64: Fix ICC_SRE_EL2 register typo in booting.rst
|
|
With VMAP_STACK now always enabled on arm64, remove all CONFIG_VMAP_STACK
conditionals from entry handling in arch/arm64/kernel/entry-common.c and
arch/arm64/kernel/entry.S.
This change unconditionally includes the bad stack handling and overflow
detection logic, simplifying the code and reflecting the mandatory use of
VMAP_STACK for all arm64 kernel builds.
Signed-off-by: Breno Leitao <leitao@debian.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707-arm64_vmap-v1-8-8de98ca0f91c@debian.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Now that debug exceptions are handled individually and without the need
for dynamic registration, remove the unused registration infrastructure.
This removes the external caller for `debug_exception_enter()` and
`debug_exception_exit()`.
Make them static again and remove them from the header.
Remove `early_brk64()` as it has been made redundant by
(arm64: debug: split brk64 exception entry) and is not used anymore.
Note : in `early_brk64()` `bug_brk_handler()` is called unconditionally
as a fall-through, but now `call_break_hook()` only calls it if the
immediate matches.
This does not change the behaviour in early boot, as if
`bug_brk_handler()` was called on a non-BUG immediate it would return
DBG_HOOK_ERROR anyway, which `call_break_hook()` will do if no immediate
matches.
Remove `trap_init()`, as it would be empty and a weak definition already
exists in `init/main.c`.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-14-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently all debug exceptions share common entry code and are routed
to `do_debug_exception()`, which calls dynamically-registered
handlers for each specific debug exception. This is unfortunate as
different debug exceptions have different entry handling requirements,
and it would be better to handle these distinct requirements earlier.
The BKPT32 exception can only be triggered by a BKPT instruction. Thus,
we know that the PC is a legitimate address and isn't being used to train
a branch predictor with a bogus address : we don't need to call
`arm64_apply_bp_hardening()`.
The handler for this exception only pends a signal and doesn't depend
on any per-CPU state : we don't need to inhibit preemption, nor do we
need to keep the DAIF exceptions masked, so we can unmask them earlier.
Split the BKPT32 exception entry and adjust function signatures and its
behaviour to match its relaxed constraints compared to other
debug exceptions.
We can also remove `NOKPROBE_SYMBOL`, as this cannot lead to a kprobe
recursion.
This replaces the last usage of `el0_dbg()`, so remove it.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-13-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently all debug exceptions share common entry code and are routed
to `do_debug_exception()`, which calls dynamically-registered
handlers for each specific debug exception. This is unfortunate as
different debug exceptions have different entry handling requirements,
and it would be better to handle these distinct requirements earlier.
The BRK64 exception can only be triggered by a BRK instruction. Thus,
we know that the PC is a legitimate address and isn't being used to train
a branch predictor with a bogus address : we don't need to call
`arm64_apply_bp_hardening()`.
We do not need to handle the Cortex-A76 erratum #1463225 either, as it is
only relevant for single stepping at EL1.
BRK64 does not write FAR_EL1 either, as only hardware watchpoints do so.
Split the BRK64 exception entry, adjust the function signature, and its
behaviour to match the lack of needed mitigations.
Further, as the EL0 and EL1 code paths are cleanly separated, we can split
`do_brk64()` into `do_el0_brk64()` and `do_el1_brk64()`, and call them
directly from the relevant entry paths.
Use `die()` directly for the EL1 error path, as in `do_el1_bti()` and
`do_el1_undef()`.
We can also remove `NOKPROBE_SYMBOL` for the EL0 path, as it cannot
lead to a kprobe recursion.
When taking a BRK64 exception from EL0, the exception handling is safely
preemptible : the only possible handler is `uprobe_brk_handler()`.
It only operates on task-local data and properly checks its validity,
then raises a Thread Information Flag, processed before returning
to userspace in `do_notify_resume()`, which is already preemptible.
Thus we can safely unmask interrupts and enable preemption before
handling the break itself, fixing a PREEMPT_RT issue where the handler
could call a sleeping function with preemption disabled.
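As a sketch, the resulting EL0 entry has roughly this shape (function names
as used in this series; exact signatures may differ):
| static void noinstr el0_brk64(struct pt_regs *regs, unsigned long esr)
| {
|         enter_from_user_mode(regs);
|         /* Unmask all DAIF exceptions; the handler is preemptible. */
|         local_daif_restore(DAIF_PROCCTX);
|         do_el0_brk64(esr, regs);
|         exit_to_user_mode(regs);
| }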
Given that the break hook registration is handled statically in
`call_break_hook` since
(arm64: debug: call software break handlers statically)
and that we now bypass the exception handler registration, this change
renders `early_brk64` redundant : its functionality is now handled through
the post-init path.
This also removes the last usage of `el1_dbg()`.
This also removes the last usage of `el0_dbg()` without `CONFIG_COMPAT`.
Mark it `__maybe_unused`, to prevent a warning when building this patch
without `CONFIG_COMPAT`, as the following patch removes `el0_dbg()`.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-12-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently all debug exceptions share common entry code and are routed
to `do_debug_exception()`, which calls dynamically-registered
handlers for each specific debug exception. This is unfortunate as
different debug exceptions have different entry handling requirements,
and it would be better to handle these distinct requirements earlier.
Hardware watchpoints are the only debug exceptions that will write
FAR_EL1, so we need to preserve it and pass it down.
However, they cannot be used to maliciously train branch predictors, so
we can omit calling `arm64_apply_bp_hardening()`. Nor do they need to handle
the Cortex-A76 erratum #1463225, as it only applies to single stepping
exceptions.
As the hardware watchpoint handler only returns 0 and never triggers
the call to `arm64_notify_die()`, we can call it directly from
`entry-common.c`.
Split the hardware watchpoint exception entry and adjust the behaviour
to match the lack of needed mitigations.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-11-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently all debug exceptions share common entry code and are routed
to `do_debug_exception()`, which calls dynamically-registered
handlers for each specific debug exception. This is unfortunate as
different debug exceptions have different entry handling requirements,
and it would be better to handle these distinct requirements earlier.
The single stepping exception has the most constraints : it can be
exploited to train branch predictors and it needs special handling at EL1
for the Cortex-A76 erratum #1463225. We need to preserve all those
mitigations.
However, it does not write an address at FAR_EL1, as only hardware
watchpoints do so.
The single-step handler does its own signaling if it needs to and only
returns 0, so we can call it directly from `entry-common.c`.
Split the single stepping exception entry, adjust the function signature,
keep the security mitigation and erratum handling.
Further, as the EL0 and EL1 code paths are cleanly separated, we can split
`do_softstep()` into `do_el0_softstep()` and `do_el1_softstep()` and
call them directly from the relevant entry paths.
We can also remove `NOKPROBE_SYMBOL` for the EL0 path, as it cannot
lead to a kprobe recursion.
Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that
we can do it as early as possible, and only for the exceptions coming
from EL0, where it is needed.
This is safe to do as it is `noinstr`, as are all the functions it
may call. `el0_ia()` and `el0_pc()` already call it this way.
When taking a soft-step exception from EL0, most of the single stepping
handling is safely preemptible : the only possible handler is
`uprobe_single_step_handler()`. It only operates on task-local data and
properly checks its validity, then raises a Thread Information Flag,
processed before returning to userspace in `do_notify_resume()`, which
is already preemptible.
However, the soft-step handler first calls `reinstall_suspended_bps()`
to check if there is any hardware breakpoint or watchpoint pending
or already stepped through.
This cannot be preempted as it manipulates the hardware breakpoint and
watchpoint registers.
Move the call to `try_step_suspended_breakpoints()` to `entry-common.c`
and adjust the relevant comments.
We can now safely unmask interrupts before handling the step itself,
fixing a PREEMPT_RT issue where the handler could call a sleeping function
with preemption disabled.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Closes: https://lore.kernel.org/linux-arm-kernel/Z6YW_Kx4S2tmj2BP@uudg.org/
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-10-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Currently all debug exceptions share common entry code and are routed
to `do_debug_exception()`, which calls dynamically-registered
handlers for each specific debug exception. This is unfortunate as
different debug exceptions have different entry handling requirements,
and it would be better to handle these distinct requirements earlier.
Hardware breakpoint exceptions are generated by the hardware after user
configuration. As such, they can be exploited when training branch
predictors outside of the userspace VA range: they still need to call
`arm64_apply_bp_hardening()` if needed to mitigate against this attack.
However, they do not need to handle the Cortex-A76 erratum #1463225 as
it only applies to single stepping exceptions.
It does not set an address in FAR_EL1 either, only the hardware
watchpoint does.
As the hardware breakpoint handler only returns 0 and never triggers
the call to `arm64_notify_die()`, we can call it directly from
`entry-common.c`.
Split the hardware breakpoint exception entry, and adjust the function
signature and the handling of the Cortex-A76 erratum to fit the
behaviour of the exception.
Move the call to `arm64_apply_bp_hardening()` to `entry-common.c` so that
we can do it as early as possible, and only for the exceptions coming
from EL0, where it is needed.
This is safe to do as it is `noinstr`, as are all the functions it
may call. `el0_ia()` and `el0_pc()` already call it this way.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-8-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Move the `debug_exception_enter()` and `debug_exception_exit()`
functions from mm/fault.c, as they are needed to split
the debug exceptions entry paths from the current unified one.
Make them externally visible in include/asm/exception.h until
the caller in mm/fault.c is cleaned up.
Signed-off-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Tested-by: Luis Claudio R. Goncalves <lgoncalv@redhat.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Reviewed-by: Will Deacon <will@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20250707114109.35672-7-ada.coupriediaz@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
MDSCR_EL1 has already been defined in tools sysreg format and hence can be
used in all debug monitor related call paths. But using generated sysreg
definitions causes build warnings because there is a mismatch between the
mdscr variable (u32) and GENMASK() based masks (long unsigned int). Convert all
variables handling the MDSCR_EL1 register to u64, which also reflects its
true width.
--------------------------------------------------------------------------
arch/arm64/kernel/debug-monitors.c: In function ‘disable_debug_monitors’:
arch/arm64/kernel/debug-monitors.c:108:13: warning: conversion from ‘long
unsigned int’ to ‘u32’ {aka ‘unsigned int’} changes value from
‘18446744073709518847’ to ‘4294934527’ [-Woverflow]
108 | disable = ~MDSCR_EL1_MDE;
| ^
--------------------------------------------------------------------------
While here, replace an open encoding with MDSCR_EL1_TDCC in __cpu_setup().
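The truncation in the warning above follows directly from the types:
MDSCR_EL1_MDE is a GENMASK() value (unsigned long), so its complement does
not fit in a u32:
| u32 disable32 = ~MDSCR_EL1_MDE;      /* 0xffff7fff: value silently changed  */
| u64 disable64 = ~MDSCR_EL1_MDE;      /* 0xffffffffffff7fff: full mask kept */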
Cc: Will Deacon <will@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-kernel@vger.kernel.org
Reviewed-by: Ada Couprie Diaz <ada.coupriediaz@arm.com>
Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20250613023646.1215700-2-anshuman.khandual@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Allocate a task flag used to represent the patch pending state for the
task. When a livepatch is being loaded or unloaded, the livepatch code
uses this flag to select the proper version of a kernel function being
patched for the current task.
In arch/arm64/Kconfig, select HAVE_LIVEPATCH and include proper Kconfig.
This is largely based on [1] by Suraj Jitindar Singh.
[1] https://lore.kernel.org/all/20210604235930.603-1-surajjs@amazon.com/
Cc: Suraj Jitindar Singh <surajjs@amazon.com>
Cc: Torsten Duwe <duwe@suse.de>
Acked-by: Miroslav Benes <mbenes@suse.cz>
Tested-by: Breno Leitao <leitao@debian.org>
Tested-by: Andrea della Porta <andrea.porta@suse.com>
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250630174502.842486-1-song@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
* for-next/sme-fixes: (35 commits)
arm64/fpsimd: Allow CONFIG_ARM64_SME to be selected
arm64/fpsimd: ptrace: Gracefully handle errors
arm64/fpsimd: ptrace: Mandate SVE payload for streaming-mode state
arm64/fpsimd: ptrace: Do not present register data for inactive mode
arm64/fpsimd: ptrace: Save task state before generating SVE header
arm64/fpsimd: ptrace/prctl: Ensure VL changes leave task in a valid state
arm64/fpsimd: ptrace/prctl: Ensure VL changes do not resurrect stale data
arm64/fpsimd: Make clone() compatible with ZA lazy saving
arm64/fpsimd: Clear PSTATE.SM during clone()
arm64/fpsimd: Consistently preserve FPSIMD state during clone()
arm64/fpsimd: Remove redundant task->mm check
arm64/fpsimd: signal: Use SMSTOP behaviour in setup_return()
arm64/fpsimd: Add task_smstop_sm()
arm64/fpsimd: Factor out {sve,sme}_state_size() helpers
arm64/fpsimd: Clarify sve_sync_*() functions
arm64/fpsimd: ptrace: Consistently handle partial writes to NT_ARM_(S)SVE
arm64/fpsimd: signal: Consistently read FPSIMD context
arm64/fpsimd: signal: Mandate SVE payload for streaming-mode state
arm64/fpsimd: signal: Clear PSTATE.SM when restoring FPSIMD frame only
arm64/fpsimd: Do not discard modified SVE state
...
|
|
Historically SVE state was discarded deterministically early in the
syscall entry path, before ptrace is notified of syscall entry. This
permitted ptrace to modify SVE state before and after the "real" syscall
logic was executed, with the modified state being retained.
This behaviour was changed by commit:
8c845e2731041f0f ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
That commit was intended to speed up workloads that used SVE by
opportunistically leaving SVE enabled when returning from a syscall.
The syscall entry logic was modified to truncate the SVE state without
disabling userspace access to SVE, and fpsimd_save_user_state() was
modified to discard userspace SVE state whenever
in_syscall(current_pt_regs()) is true, i.e. when
current_pt_regs()->syscallno != NO_SYSCALL.
Leaving SVE enabled opportunistically resulted in a couple of changes to
userspace visible behaviour which weren't described at the time, but are
logical consequences of opportunistically leaving SVE enabled:
* Signal handlers can observe the type of saved state in the signal's
sve_context record. When the kernel only tracks FPSIMD state, the 'vq'
field is 0 and there is no space allocated for register contents. When
the kernel tracks SVE state, the 'vq' field is non-zero and the
register contents are saved into the record.
As a result of the above commit, 'vq' (and the presence of SVE
register state) is non-deterministically zero or non-zero for a period
of time after a syscall. The effective register state is still
deterministic.
Hopefully no-one relies on this being deterministic. In general,
handlers for asynchronous events cannot expect a deterministic state.
* Similarly to signal handlers, ptrace requests can observe the type of
saved state in the NT_ARM_SVE and NT_ARM_SSVE regsets, as this is
exposed in the header flags. As a result of the above commit, this is
now in a non-deterministic state after a syscall. The effective
register state is still deterministic.
Hopefully no-one relies on this being deterministic. In general,
debuggers would have to handle this changing at arbitrary points
during program flow.
Discarding the SVE state within fpsimd_save_user_state() resulted in
other changes to userspace visible behaviour which are not desirable:
* A ptrace tracer can modify (or create) a tracee's SVE state at syscall
entry or syscall exit. As a result of the above commit, the tracee's
SVE state can be discarded non-deterministically after modification,
rather than being retained as it previously was.
Note that for co-operative tracer/tracee pairs, the tracer may
(re)initialise the tracee's state arbitrarily after the tracee sends
itself an initial SIGSTOP via a syscall, so this affects realistic
design patterns.
* The current_pt_regs()->syscallno field can be modified via ptrace, and
can be altered even when the tracee is not really in a syscall,
causing non-deterministic discarding to occur in situations where this
was not previously possible.
Further, using current_pt_regs()->syscallno in this way is unsound:
* There are data races between readers and writers of the
current_pt_regs()->syscallno field.
The current_pt_regs()->syscallno field is written in interruptible
task context using plain C accesses, and is read in irq/softirq
context using plain C accesses. These accesses are subject to data
races, with the usual concerns with tearing, etc.
* Writes to current_pt_regs()->syscallno are subject to compiler
reordering.
As current_pt_regs()->syscallno is written with plain C accesses,
the compiler is free to move those writes arbitrarily relative to
anything which doesn't access the same memory location.
In theory this could break signal return, where prior to restoring the
SVE state, restore_sigframe() calls forget_syscall(). If the write
were hoisted after restore of some SVE state, that state could be
discarded unexpectedly.
In practice that reordering cannot happen in the absence of LTO (as
cross compilation-unit function calls prevent this reordering),
and that reordering appears to be unlikely in the presence of LTO.
Additionally, since commit:
f130ac0ae4412dbe ("arm64: syscall: unmask DAIF earlier for SVCs")
... DAIF is unmasked before el0_svc_common() sets regs->syscallno to the
real syscall number. Consequently state may be saved in SVE format prior
to this point.
Considering all of the above, current_pt_regs()->syscallno should not be
used to infer whether the SVE state can be discarded. Luckily we can
instead use cpu_fp_state::to_save to track when it is safe to discard
the SVE state:
* At syscall entry, after the live SVE register state is truncated, set
cpu_fp_state::to_save to FP_STATE_FPSIMD to indicate that only the
FPSIMD portion is live and needs to be saved.
* At syscall exit, once the task's state is guaranteed to be live, set
cpu_fp_state::to_save to FP_STATE_CURRENT to indicate that TIF_SVE
must be considered to determine which state needs to be saved.
* Whenever state is modified, it must be saved+flushed prior to
manipulation. The state will be truncated if necessary when it is
saved, and reloading the state will set fp_state::to_save to
FP_STATE_CURRENT, preventing subsequent discarding.
This permits SVE state to be discarded *only* when it is known to have
been truncated (and the non-FPSIMD portions must be zero), and ensures
that SVE state is retained after it is explicitly modified.
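A sketch of the two transitions (the function names below are illustrative;
the per-CPU variable, field and constants are as described above):
| static void fpsimd_syscall_enter(void)       /* illustrative name */
| {
|         /* ... truncate the live SVE register state ... */
|
|         /* Only the FPSIMD portion is live and needs saving from here on. */
|         __this_cpu_write(fpsimd_last_state.to_save, FP_STATE_FPSIMD);
| }
|
| static void fpsimd_syscall_exit(void)        /* illustrative name */
| {
|         /* The task's state is guaranteed live again; honour TIF_SVE. */
|         __this_cpu_write(fpsimd_last_state.to_save, FP_STATE_CURRENT);
| }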
For backporting, note that this fix depends on the following commits:
* b2482807fbd4 ("arm64/sme: Optimise SME exit on syscall entry")
* f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs")
* 929fa99b1215 ("arm64/fpsimd: signal: Always save+flush state early")
Fixes: 8c845e273104 ("arm64/sve: Leave SVE enabled on syscall if we don't context switch")
Fixes: f130ac0ae441 ("arm64: syscall: unmask DAIF earlier for SVCs")
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Marc Zyngier <maz@kernel.org>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20250508132644.1395904-2-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
For an architecture to enable CONFIG_ARCH_HAS_RESCHED_LAZY, two things are
required:
1) Adding a TIF_NEED_RESCHED_LAZY flag definition
2) Checking for TIF_NEED_RESCHED_LAZY in the appropriate locations
2) is handled in a generic manner by CONFIG_GENERIC_ENTRY, which isn't
(yet) implemented for arm64. However, outside of core scheduler code,
TIF_NEED_RESCHED_LAZY only needs to be checked on a kernel exit, meaning:
o return/entry to userspace.
o return/entry to guest.
The return/entry to a guest is all handled by xfer_to_guest_mode_handle_work()
which already does the right thing, so it can be left as-is.
arm64 doesn't use common entry's exit_to_user_mode_prepare(), so update its
return to user path to check for TIF_NEED_RESCHED_LAZY and call into
schedule() accordingly.
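Concretely, the return-to-user check becomes something like (a sketch; the
exact placement in do_notify_resume() may differ):
| if (thread_flags & (_TIF_NEED_RESCHED | _TIF_NEED_RESCHED_LAZY))
|         schedule();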
Link: https://lore.kernel.org/linux-rt-users/20241216190451.1c61977c@mordecai.tesarici.cz/
Link: https://lore.kernel.org/all/xhsmh4j0fl0p3.mognet@vschneid-thinkpadt14sgen2i.remote.csb/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
[testdrive, _TIF_WORK_MASK fixlet and changelog.]
Signed-off-by: Mike Galbraith <efault@gmx.de>
[Another round of testing; changelog faff]
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20250305104925.189198-2-vschneid@redhat.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
* for-next/mops:
: More FEAT_MOPS (memcpy instructions) uses - in-kernel routines
arm64: mops: Document requirements for hypervisors
arm64: lib: Use MOPS for copy_page() and clear_page()
arm64: lib: Use MOPS for memcpy() routines
arm64: mops: Document booting requirement for HCR_EL2.MCE2
arm64: mops: Handle MOPS exceptions from EL1
arm64: probes: Disable kprobes/uprobes on MOPS instructions
# Conflicts:
# arch/arm64/kernel/entry-common.c
|
|
We will soon be using MOPS instructions in the kernel, so wire up the
exception handler to handle exceptions from EL1 caused by the copy/set
operation being stopped and resumed on a different type of CPU.
Add a helper for advancing the single step state machine, similarly to
what the EL0 exception handler does.
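A sketch of the EL1 handler shape (the single-step helper name below is an
assumption, not necessarily the final name):
| static void noinstr el1_mops(struct pt_regs *regs, unsigned long esr)
| {
|         enter_from_kernel_mode(regs);
|         do_el1_mops(regs, esr);
|         kernel_fastforward_single_step(regs);  /* assumed helper name */
|         exit_to_kernel_mode(regs);
| }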
Signed-off-by: Kristina Martsenko <kristina.martsenko@arm.com>
Link: https://lore.kernel.org/r/20240930161051.3777828-3-kristina.martsenko@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
A new exception code is defined for GCS specific faults other than
standard load/store faults, for example GCS token validation failures,
add handling for this. These faults are reported to userspace as
segfaults with code SEGV_CPERR (protection error), mirroring the
reporting for x86 shadow stack errors.
GCS faults due to memory load/store operations generate data aborts with
a flag set, these will be handled separately as part of the data abort
handling.
Since we do not currently enable GCS for EL1 we should not get any faults
there but while we're at it we wire things up there, treating any GCS
fault as fatal.
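A sketch of the EL0 reporting described above (handler name and exact
signature illustrative):
| void do_el0_gcs(struct pt_regs *regs, unsigned long esr)
| {
|         /* GCS-specific faults (e.g. token check failures) become SEGV_CPERR. */
|         force_signal_inject(SIGSEGV, SEGV_CPERR, regs->pc, 0);
| }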
Reviewed-by: Thiago Jung Bauermann <thiago.bauermann@linaro.org>
Reviewed-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20241001-arm64-gcs-v13-19-222b78d87eee@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Context tracking state related symbols currently use a mix of the
CONTEXT_ (e.g. CONTEXT_KERNEL) and CT_STATE_ (e.g. CT_STATE_MASK) prefixes.
Clean up the naming and make the ctx_state enum use the CT_STATE_ prefix.
Suggested-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Valentin Schneider <vschneid@redhat.com>
Acked-by: Frederic Weisbecker <frederic@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
|
|
When returning to a user context, the arm64 entry code masks all DAIF
exceptions before handling pending work in exit_to_user_mode_prepare()
and do_notify_resume(), where it will transiently unmask all DAIF
exceptions. This is a holdover from the old entry assembly, which
conservatively masked all DAIF exceptions, and it's only necessary to
mask interrupts at this point during the exception return path, so long
as we subsequently mask all DAIF exceptions before the actual exception
return.
While most DAIF manipulation follows a save...restore sequence, the
manipulation in do_notify_resume() is the other way around, unmasking
all DAIF exceptions before masking them again. This is unfortunate as we
unnecessarily mask Debug and SError exceptions, and it would be nice to
remove this special case to make DAIF manipulation simpler and more
consistent.
This patch changes exit_to_user_mode_prepare() and do_notify_resume() to
only mask interrupts while handling pending work, masking other DAIF
exceptions after this has completed. This removes the unusual DAIF
manipulation and allows Debug and SError exceptions to be taken for a
slightly longer window during the exception return path.
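A sketch of the reworked exit path (details elided; names as used in the
arm64 entry code):
| static __always_inline void exit_to_user_mode_prepare(struct pt_regs *regs)
| {
|         unsigned long flags;
|
|         local_irq_disable();                   /* interrupts only, not all of DAIF */
|
|         flags = read_thread_flags();
|         if (unlikely(flags & _TIF_WORK_MASK))
|                 do_notify_resume(regs, flags); /* may transiently enable IRQs */
|
|         local_daif_mask();                     /* mask everything before the eret */
| }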
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: James Morse <james.morse@arm.com>
Cc: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20240206123848.1696480-4-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Itaru Kitayama <itaru.kitayama@linux.dev>
|