| Age | Commit message (Collapse) | Author | Files | Lines |
|
Now that gcc-8 and binutils-2.30 are the minimum versions, a lot of
the individual feature checks can go away for simplification.
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
Introduce `rustc-min-version` support function that mimics
`{gcc,clang}-min-version` ones, following commit 88b61e3bff93
("Makefile.compiler: replace cc-ifversion with compiler-specific macros").
In addition, use it in the first use case we have in the kernel (which
was done independently to minimize the changes needed for the fix).
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Fiona Behrens <me@Kloenk.dev>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
|
|
Starting with Rust 1.85.0 (to be released 2025-02-20), `rustc` warns
[1] about disabling neon in the aarch64 hardfloat target:
warning: target feature `neon` cannot be toggled with
`-Ctarget-feature`: unsound on hard-float targets
because it changes float ABI
|
= note: this was previously accepted by the compiler but
is being phased out; it will become a hard error
in a future release!
= note: for more information, see issue #116344
<https://github.com/rust-lang/rust/issues/116344>
Thus, instead, use the softfloat target instead.
While trying it out, I found that the kernel sanitizers were not enabled
for that built-in target [2]. Upstream Rust agreed to backport
the enablement for the current beta so that it is ready for
the 1.85.0 release [3] -- thanks!
However, that still means that before Rust 1.85.0, we cannot switch
since sanitizers could be in use. Thus conditionally do so.
Cc: stable@vger.kernel.org # Needed in 6.12.y and 6.13.y only (Rust is pinned in older LTSs).
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Matthew Maurer <mmaurer@google.com>
Cc: Alice Ryhl <aliceryhl@google.com>
Cc: Ralf Jung <post@ralfj.de>
Cc: Jubilee Young <workingjubilee@gmail.com>
Link: https://github.com/rust-lang/rust/pull/133417 [1]
Link: https://rust-lang.zulipchat.com/#narrow/channel/131828-t-compiler/topic/arm64.20neon.20.60-Ctarget-feature.60.20warning/near/495358442 [2]
Link: https://github.com/rust-lang/rust/pull/135905 [3]
Link: https://github.com/rust-lang/rust/issues/116344
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
Reviewed-by: Trevor Gross <tmgross@umich.edu>
Tested-by: Matthew Maurer <mmaurer@google.com>
Reviewed-by: Ralf Jung <post@ralfj.de>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20250210163732.281786-1-ojeda@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
$(objtree) refers to the top of the output directory of kernel builds.
This commit adds the explicit $(objtree)/ prefix to build artifacts
needed for building external modules.
This change has no immediate impact, as the top-level Makefile
currently defines:
objtree := .
This commit prepares for supporting the building of external modules
in a different directory.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
|
|
Certain portions of code always need to be position-independent
regardless of CONFIG_RELOCATABLE, including code which is executed in an
idmap or which is executed before relocations are applied. In some
kernel configurations the LLD linker generates position-dependent
veneers for such code, and when executed these result in early boot-time
failures.
Marc Zyngier encountered a boot failure resulting from this when
building a (particularly cursed) configuration with LLVM, as he reported
to the list:
https://lore.kernel.org/linux-arm-kernel/86wmjwvatn.wl-maz@kernel.org/
In Marc's kernel configuration, the .head.text and .rodata.text sections
end up more than 128MiB apart, requiring a veneer to branch between the
two:
| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w _text
| ffff800080000000 g .head.text 0000000000000000 _text
| [mark@lakrids:~/src/linux]% usekorg 14.1.0 aarch64-linux-objdump -t vmlinux | grep -w primary_entry
| ffff8000889df0e0 g .rodata.text 000000000000006c primary_entry,
... consequently, LLD inserts a position-dependent veneer for the branch
from _stext (in .head.text) to primary_entry (in .rodata.text):
| ffff800080000000 <_text>:
| ffff800080000000: fa405a4d ccmp x18, #0x0, #0xd, pl // pl = nfrst
| ffff800080000004: 14003fff b ffff800080010000 <__AArch64AbsLongThunk_primary_entry>
...
| ffff800080010000 <__AArch64AbsLongThunk_primary_entry>:
| ffff800080010000: 58000050 ldr x16, ffff800080010008 <__AArch64AbsLongThunk_primary_entry+0x8>
| ffff800080010004: d61f0200 br x16
| ffff800080010008: 889df0e0 .word 0x889df0e0
| ffff80008001000c: ffff8000 .word 0xffff8000
... and as this is executed early in boot before the kernel is mapped in
TTBR1 this results in a silent boot failure.
Fix this by passing '--pic-veneer' to the linker, which will cause the
linker to use position-independent veneers, e.g.
| ffff800080000000 <_text>:
| ffff800080000000: fa405a4d ccmp x18, #0x0, #0xd, pl // pl = nfrst
| ffff800080000004: 14003fff b ffff800080010000 <__AArch64ADRPThunk_primary_entry>
...
| ffff800080010000 <__AArch64ADRPThunk_primary_entry>:
| ffff800080010000: f004e3f0 adrp x16, ffff800089c8f000 <__idmap_text_start>
| ffff800080010004: 91038210 add x16, x16, #0xe0
| ffff800080010008: d61f0200 br x16
I've opted to pass '--pic-veneer' unconditionally, as:
* In addition to solving the boot failure, these sequences are generally
nicer as they require fewer instructions and don't need to perform
data accesses.
* While the position-independent veneer sequences have a limited +/-2GiB
range, this is not a new restriction. Even kernels built with
CONFIG_RELOCATABLE=n are limited to 2GiB in size as we have several
structues using 32-bit relative offsets and PPREL32 relocations, which
are similarly limited to +/-2GiB in range. These include extable
entries, jump table entries, and alt_instr entries.
* GNU LD defaults to using position-independent veneers, and supports
the same '--pic-veneer' option, so this change is not expected to
adversely affect GNU LD.
I've tested with GNU LD 2.30 to 2.42 inclusive and LLVM 13.0.1 to 19.1.0
inclusive, using the kernel.org binaries from:
* https://mirrors.edge.kernel.org/pub/tools/crosstool/
* https://mirrors.edge.kernel.org/pub/tools/llvm/
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reported-by: Marc Zyngier <maz@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Will Deacon <will@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20240927101838.3061054-1-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Add all of the flags that are needed to support the shadow call stack
(SCS) sanitizer with Rust, and updates Kconfig to allow only
configurations that work.
The -Zfixed-x18 flag is required to use SCS on arm64, and requires rustc
version 1.80.0 or greater. This restriction is reflected in Kconfig.
When CONFIG_DYNAMIC_SCS is enabled, the build will be configured to
include unwind tables in the build artifacts. Dynamic SCS uses the
unwind tables at boot to find all places that need to be patched. The
-Cforce-unwind-tables=y flag ensures that unwind tables are available
for Rust code.
In non-dynamic mode, the -Zsanitizer=shadow-call-stack flag is what
enables the SCS sanitizer. Using this flag requires rustc version 1.82.0
or greater on the targets used by Rust in the kernel. This restriction
is reflected in Kconfig.
It is possible to avoid the requirement of rustc 1.80.0 by using
-Ctarget-feature=+reserve-x18 instead of -Zfixed-x18. However, this flag
emits a warning during the build, so this patch does not add support for
using it and instead requires 1.80.0 or greater.
The dependency is placed on `select HAVE_RUST` to avoid a situation
where enabling Rust silently turns off the sanitizer. Instead, turning
on the sanitizer results in Rust being disabled. We generally do not
want changes to CONFIG_RUST to result in any mitigations being changed
or turned off.
At the time of writing, rustc 1.82.0 only exists via the nightly release
channel. There is a chance that the -Zsanitizer=shadow-call-stack flag
will end up needing 1.83.0 instead, but I think it is small.
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Kees Cook <kees@kernel.org>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Link: https://lore.kernel.org/r/20240829-shadow-call-stack-v7-1-2f62a4432abf@google.com
[ Fixed indentation using spaces. - Miguel ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
On arm64 we build compressed images, but "make install" by default will
install the old non-compressed one. To actually get the compressed
image install, you need to use "make zinstall", which is not the usual
way to install a kernel.
Which may not sound like much of an issue, but when you deal with
multiple architectures (and years of your fingers knowing the regular
"make install" incantation), this inconsistency is pretty annoying.
But as Will Deacon says:
"Sadly, bootloaders being as top quality as you might expect, I don't
think we're in a position to rely on decompressor support across the
board. Our Image.gz is literally just that -- we don't have a built-in
decompressor (nor do I think we want to rush into that again after the
fun we had on arm32) and the recent EFI zboot support solves that
problem for platforms using EFI.
Changing the default 'install' target terrifies me. There are bound to
be folks with embedded boards who've scripted this and we could really
ruin their day if we quietly give them a compressed kernel that their
bootloader doesn't know how to handle :/"
So make this conditional on a new "COMPRESSED_INSTALL" option.
Cc: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
arm64 provides an equivalent to the common kernel-mode FPU API, but in a
different header and using different function names. Add a wrapper
header, and export CFLAGS adjustments as found in lib/raid6/Makefile.
Link: https://lkml.kernel.org/r/20240329072441.591471-5-samuel.holland@sifive.com
Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nicolas Schier <nicolas@fjasle.eu>
Cc: Palmer Dabbelt <palmer@rivosinc.com>
Cc: Russell King <linux@armlinux.org.uk>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: WANG Xuerui <git@xen0n.name>
Cc: Will Deacon <will@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Add a script which produces a Flat Image Tree (FIT), a single file
containing the built kernel and associated devicetree files.
Compression defaults to gzip which gives a good balance of size and
performance.
The files compress from about 86MB to 24MB using this approach.
The FIT can be used by bootloaders which support it, such as U-Boot
and Linuxboot. It permits automatic selection of the correct
devicetree, matching the compatible string of the running board with
the closest compatible string in the FIT. There is no need for
filenames or other workarounds.
Add a 'make image.fit' build target for arm64, as well.
The FIT can be examined using 'dumpimage -l'.
This uses the 'dtbs-list' file but processes only .dtb files, ignoring
the overlay .dtbo files.
This features requires pylibfdt (use 'pip install libfdt'). It also
requires compression utilities for the algorithm being used. Supported
compression options are the same as the Image.xxx files. Use
FIT_COMPRESSION to select an algorithm other than gzip.
While FIT supports a ramdisk / initrd, no attempt is made to support
this here, since it must be built separately from the Linux build.
Signed-off-by: Simon Glass <sjg@chromium.org>
Acked-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20240329032836.141899-3-sjg@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Add a new variable containing a list of possible targets. Mark them as
phony. This matches the approach taken for arch/arm
Signed-off-by: Simon Glass <sjg@chromium.org>
Reviewed-by: Nicolas Schier <n.schier@avm.de>
Link: https://lore.kernel.org/r/20240329032836.141899-2-sjg@chromium.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
This commit provides the build flags for Rust for AArch64. The core Rust
support already in the kernel does the rest. This enables the PAC ret
and BTI options in the Rust build flags to match the options that are
used when building C.
The Rust samples have been tested with this commit.
Signed-off-by: Jamie Cunliffe <Jamie.Cunliffe@arm.com>
Acked-by: Will Deacon <will@kernel.org>
Tested-by: Dirk Behme <dirk.behme@de.bosch.com>
Tested-by: Boqun Feng <boqun.feng@gmail.com>
Acked-by: Miguel Ojeda <ojeda@kernel.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Tested-by: Alice Ryhl <aliceryhl@google.com>
Tested-by: Fabien Parent <fabien.parent@linaro.org>
Link: https://lore.kernel.org/r/20231020155056.3495121-3-Jamie.Cunliffe@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Commit 2d071968a405 ("arm64: compat: Remove 32-bit sigreturn code
from the vDSO") removed all VDSO_* symbols in the compat vDSO. As a
result, vdso32-offsets.h is now empty and therefore unused. Time to
remove it.
Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com>
Link: https://lore.kernel.org/r/20240129154748.1727759-1-kevin.brodsky@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
'make vdso_install' renames arch/arm64/kernel/vdso32/vdso.so.dbg to
vdso32.so during installation, which allows 64-bit and 32-bit vdso
files to be installed in the same directory.
However, arm64 is the only architecture that requires this renaming.
To simplify the vdso_install logic, rename the in-tree vdso file so
its base name matches the installed file name.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Link: https://lore.kernel.org/r/20231117125620.1058300-1-masahiroy@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
|
|
A common issue in Makefile is a race in parallel building.
You need to be careful to prevent multiple threads from writing to the
same file simultaneously.
Commit 3939f3345050 ("ARM: 8418/1: add boot image dependencies to not
generate invalid images") addressed such a bad scenario.
A similar symptom occurs with the following command:
$ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi
[ snip ]
SORTTAB vmlinux
OBJCOPY arch/arm64/boot/Image
OBJCOPY arch/arm64/boot/Image
AS arch/arm64/boot/zboot-header.o
PAD arch/arm64/boot/vmlinux.bin
GZIP arch/arm64/boot/vmlinuz
OBJCOPY arch/arm64/boot/vmlinuz.o
LD arch/arm64/boot/vmlinuz.efi.elf
OBJCOPY arch/arm64/boot/vmlinuz.efi
The log "OBJCOPY arch/arm64/boot/Image" is displayed twice.
It indicates that two threads simultaneously enter arch/arm64/boot/
and write to arch/arm64/boot/Image.
It occasionally leads to a build failure:
$ make -j$(nproc) ARCH=arm64 Image vmlinuz.efi
[ snip ]
SORTTAB vmlinux
OBJCOPY arch/arm64/boot/Image
PAD arch/arm64/boot/vmlinux.bin
truncate: Invalid number: 'arch/arm64/boot/vmlinux.bin'
make[2]: *** [drivers/firmware/efi/libstub/Makefile.zboot:13:
arch/arm64/boot/vmlinux.bin] Error 1
make[2]: *** Deleting file 'arch/arm64/boot/vmlinux.bin'
make[1]: *** [arch/arm64/Makefile:163: vmlinuz.efi] Error 2
make[1]: *** Waiting for unfinished jobs....
make: *** [Makefile:234: __sub-make] Error 2
vmlinuz.efi depends on Image, but such a dependency is not specified
in arch/arm64/Makefile.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: SImon Glass <sjg@chromium.org>
Link: https://lore.kernel.org/r/20231119053234.2367621-1-masahiroy@kernel.org
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Currently, there is no standard implementation for vdso_install,
leading to various issues:
1. Code duplication
Many architectures duplicate similar code just for copying files
to the install destination.
Some architectures (arm, sparc, x86) create build-id symlinks,
introducing more code duplication.
2. Unintended updates of in-tree build artifacts
The vdso_install rule depends on the vdso files to install.
It may update in-tree build artifacts. This can be problematic,
as explained in commit 19514fc665ff ("arm, kbuild: make
"make install" not depend on vmlinux").
3. Broken code in some architectures
Makefile code is often copied from one architecture to another
without proper adaptation.
'make vdso_install' for parisc does not work.
'make vdso_install' for s390 installs vdso64, but not vdso32.
To address these problems, this commit introduces a generic vdso_install
rule.
Architectures that support vdso_install need to define vdso-install-y
in arch/*/Makefile. vdso-install-y lists the files to install.
For example, arch/x86/Makefile looks like this:
vdso-install-$(CONFIG_X86_64) += arch/x86/entry/vdso/vdso64.so.dbg
vdso-install-$(CONFIG_X86_X32_ABI) += arch/x86/entry/vdso/vdsox32.so.dbg
vdso-install-$(CONFIG_X86_32) += arch/x86/entry/vdso/vdso32.so.dbg
vdso-install-$(CONFIG_IA32_EMULATION) += arch/x86/entry/vdso/vdso32.so.dbg
These files will be installed to $(MODLIB)/vdso/ with the .dbg suffix,
if exists, stripped away.
vdso-install-y can optionally take the second field after the colon
separator. This is needed because some architectures install a vdso
file as a different base name.
The following is a snippet from arch/arm64/Makefile.
vdso-install-$(CONFIG_COMPAT_VDSO) += arch/arm64/kernel/vdso32/vdso.so.dbg:vdso32.so
This will rename vdso.so.dbg to vdso32.so during installation. If such
architectures change their implementation so that the base names match,
this workaround will go away.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Acked-by: Sven Schnelle <svens@linux.ibm.com> # s390
Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>
Reviewed-by: Guo Ren <guoren@kernel.org>
Acked-by: Helge Deller <deller@gmx.de> # parisc
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux
Pull arm64 updates from Catalin Marinas:
- Support for arm64 SME 2 and 2.1. SME2 introduces a new 512-bit
architectural register (ZT0, for the look-up table feature) that
Linux needs to save/restore
- Include TPIDR2 in the signal context and add the corresponding
kselftests
- Perf updates: Arm SPEv1.2 support, HiSilicon uncore PMU updates, ACPI
support to the Marvell DDR and TAD PMU drivers, reset DTM_PMU_CONFIG
(ARM CMN) at probe time
- Support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64
- Permit EFI boot with MMU and caches on. Instead of cleaning the
entire loaded kernel image to the PoC and disabling the MMU and
caches before branching to the kernel bare metal entry point, leave
the MMU and caches enabled and rely on EFI's cacheable 1:1 mapping of
all of system RAM to populate the initial page tables
- Expose the AArch32 (compat) ELF_HWCAP features to user in an arm64
kernel (the arm32 kernel only defines the values)
- Harden the arm64 shadow call stack pointer handling: stash the shadow
stack pointer in the task struct on interrupt, load it directly from
this structure
- Signal handling cleanups to remove redundant validation of size
information and avoid reading the same data from userspace twice
- Refactor the hwcap macros to make use of the automatically generated
ID registers. It should make new hwcaps writing less error prone
- Further arm64 sysreg conversion and some fixes
- arm64 kselftest fixes and improvements
- Pointer authentication cleanups: don't sign leaf functions, unify
asm-arch manipulation
- Pseudo-NMI code generation optimisations
- Minor fixes for SME and TPIDR2 handling
- Miscellaneous updates: ARCH_FORCE_MAX_ORDER is now selectable,
replace strtobool() to kstrtobool() in the cpufeature.c code, apply
dynamic shadow call stack in two passes, intercept pfn changes in
set_pte_at() without the required break-before-make sequence, attempt
to dump all instructions on unhandled kernel faults
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (130 commits)
arm64: fix .idmap.text assertion for large kernels
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
arm64: kprobes: Drop ID map text from kprobes blacklist
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
arm64/sme: Fix __finalise_el2 SMEver check
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
arm64/signal: Only read new data when parsing the ZT context
arm64/signal: Only read new data when parsing the ZA context
arm64/signal: Only read new data when parsing the SVE context
arm64/signal: Avoid rereading context frame sizes
arm64/signal: Make interface for restore_fpsimd_context() consistent
arm64/signal: Remove redundant size validation from parse_user_sigframe()
arm64/signal: Don't redundantly verify FPSIMD magic
arm64/cpufeature: Use helper macros to specify hwcaps
arm64/cpufeature: Always use symbolic name for feature value in hwcaps
arm64/sysreg: Initial unsigned annotations for ID registers
arm64/sysreg: Initial annotation of signed ID registers
...
|
|
Provide a slimline configuration intended to be booted on virtual
machines, with the goal of providing a light configuration which will
boot on and enable features available in mach-virt. This is defined in
terms of the standard defconfig, with an additional virt.config fragment
which disables options unneeded in a virtual configuration.
As a first step we just disable all the ARCH_ configuration options,
disabling the build of all the SoC specific drivers. This results in a
kernel that builds about 25% faster in my testing, if this approach
works for people we can add further options.
Signed-off-by: Mark Brown <broonie@kernel.org>
Link: https://lore.kernel.org/r/20230203-arm64-defconfigs-v1-3-cd0694a05f13@kernel.org
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
|
|
'for-next/misc', 'for-next/sme2', 'for-next/tpidr2', 'for-next/scs', 'for-next/compat-hwcap', 'for-next/ftrace', 'for-next/efi-boot-mmu-on', 'for-next/ptrauth' and 'for-next/pseudo-nmi', remote-tracking branch 'arm64/for-next/perf' into for-next/core
* arm64/for-next/perf:
perf: arm_spe: Print the version of SPE detected
perf: arm_spe: Add support for SPEv1.2 inverted event filtering
perf: Add perf_event_attr::config3
drivers/perf: fsl_imx8_ddr_perf: Remove set-but-not-used variable
perf: arm_spe: Support new SPEv1.2/v8.7 'not taken' event
perf: arm_spe: Use new PMSIDR_EL1 register enums
perf: arm_spe: Drop BIT() and use FIELD_GET/PREP accessors
arm64/sysreg: Convert SPE registers to automatic generation
arm64: Drop SYS_ from SPE register defines
perf: arm_spe: Use feature numbering for PMSEVFR_EL1 defines
perf/marvell: Add ACPI support to TAD uncore driver
perf/marvell: Add ACPI support to DDR uncore driver
perf/arm-cmn: Reset DTM_PMU_CONFIG at probe
drivers/perf: hisi: Extract initialization of "cpa_pmu->pmu"
drivers/perf: hisi: Simplify the parameters of hisi_pmu_init()
drivers/perf: hisi: Advertise the PERF_PMU_CAP_NO_EXCLUDE capability
* for-next/sysreg:
: arm64 sysreg and cpufeature fixes/updates
KVM: arm64: Use symbolic definition for ISR_EL1.A
arm64/sysreg: Add definition of ISR_EL1
arm64/sysreg: Add definition for ICC_NMIAR1_EL1
arm64/cpufeature: Remove 4 bit assumption in ARM64_FEATURE_MASK()
arm64/sysreg: Fix errors in 32 bit enumeration values
arm64/cpufeature: Fix field sign for DIT hwcap detection
* for-next/sme:
: SME-related updates
arm64/sme: Optimise SME exit on syscall entry
arm64/sme: Don't use streaming mode to probe the maximum SME VL
arm64/ptrace: Use system_supports_tpidr2() to check for TPIDR2 support
* for-next/kselftest: (23 commits)
: arm64 kselftest fixes and improvements
kselftest/arm64: Don't require FA64 for streaming SVE+ZA tests
kselftest/arm64: Copy whole EXTRA context
kselftest/arm64: Fix enumeration of systems without 128 bit SME for SSVE+ZA
kselftest/arm64: Fix enumeration of systems without 128 bit SME
kselftest/arm64: Don't require FA64 for streaming SVE tests
kselftest/arm64: Limit the maximum VL we try to set via ptrace
kselftest/arm64: Correct buffer size for SME ZA storage
kselftest/arm64: Remove the local NUM_VL definition
kselftest/arm64: Verify simultaneous SSVE and ZA context generation
kselftest/arm64: Verify that SSVE signal context has SVE_SIG_FLAG_SM set
kselftest/arm64: Remove spurious comment from MTE test Makefile
kselftest/arm64: Support build of MTE tests with clang
kselftest/arm64: Initialise current at build time in signal tests
kselftest/arm64: Don't pass headers to the compiler as source
kselftest/arm64: Remove redundant _start labels from FP tests
kselftest/arm64: Fix .pushsection for strings in FP tests
kselftest/arm64: Run BTI selftests on systems without BTI
kselftest/arm64: Fix test numbering when skipping tests
kselftest/arm64: Skip non-power of 2 SVE vector lengths in fp-stress
kselftest/arm64: Only enumerate power of two VLs in syscall-abi
...
* for-next/misc:
: Miscellaneous arm64 updates
arm64/mm: Intercept pfn changes in set_pte_at()
Documentation: arm64: correct spelling
arm64: traps: attempt to dump all instructions
arm64: Apply dynamic shadow call stack patching in two passes
arm64: el2_setup.h: fix spelling typo in comments
arm64: Kconfig: fix spelling
arm64: cpufeature: Use kstrtobool() instead of strtobool()
arm64: Avoid repeated AA64MMFR1_EL1 register read on pagefault path
arm64: make ARCH_FORCE_MAX_ORDER selectable
* for-next/sme2: (23 commits)
: Support for arm64 SME 2 and 2.1
arm64/sme: Fix __finalise_el2 SMEver check
kselftest/arm64: Remove redundant _start labels from zt-test
kselftest/arm64: Add coverage of SME 2 and 2.1 hwcaps
kselftest/arm64: Add coverage of the ZT ptrace regset
kselftest/arm64: Add SME2 coverage to syscall-abi
kselftest/arm64: Add test coverage for ZT register signal frames
kselftest/arm64: Teach the generic signal context validation about ZT
kselftest/arm64: Enumerate SME2 in the signal test utility code
kselftest/arm64: Cover ZT in the FP stress test
kselftest/arm64: Add a stress test program for ZT0
arm64/sme: Add hwcaps for SME 2 and 2.1 features
arm64/sme: Implement ZT0 ptrace support
arm64/sme: Implement signal handling for ZT
arm64/sme: Implement context switching for ZT0
arm64/sme: Provide storage for ZT0
arm64/sme: Add basic enumeration for SME2
arm64/sme: Enable host kernel to access ZT0
arm64/sme: Manually encode ZT0 load and store instructions
arm64/esr: Document ISS for ZT0 being disabled
arm64/sme: Document SME 2 and SME 2.1 ABI
...
* for-next/tpidr2:
: Include TPIDR2 in the signal context
kselftest/arm64: Add test case for TPIDR2 signal frame records
kselftest/arm64: Add TPIDR2 to the set of known signal context records
arm64/signal: Include TPIDR2 in the signal context
arm64/sme: Document ABI for TPIDR2 signal information
* for-next/scs:
: arm64: harden shadow call stack pointer handling
arm64: Stash shadow stack pointer in the task struct on interrupt
arm64: Always load shadow stack pointer directly from the task struct
* for-next/compat-hwcap:
: arm64: Expose compat ARMv8 AArch32 features (HWCAPs)
arm64: Add compat hwcap SSBS
arm64: Add compat hwcap SB
arm64: Add compat hwcap I8MM
arm64: Add compat hwcap ASIMDBF16
arm64: Add compat hwcap ASIMDFHM
arm64: Add compat hwcap ASIMDDP
arm64: Add compat hwcap FPHP and ASIMDHP
* for-next/ftrace:
: Add arm64 support for DYNAMICE_FTRACE_WITH_CALL_OPS
arm64: avoid executing padding bytes during kexec / hibernation
arm64: Implement HAVE_DYNAMIC_FTRACE_WITH_CALL_OPS
arm64: ftrace: Update stale comment
arm64: patching: Add aarch64_insn_write_literal_u64()
arm64: insn: Add helpers for BTI
arm64: Extend support for CONFIG_FUNCTION_ALIGNMENT
ACPI: Don't build ACPICA with '-Os'
Compiler attributes: GCC cold function alignment workarounds
ftrace: Add DYNAMIC_FTRACE_WITH_CALL_OPS
* for-next/efi-boot-mmu-on:
: Permit arm64 EFI boot with MMU and caches on
arm64: kprobes: Drop ID map text from kprobes blacklist
arm64: head: Switch endianness before populating the ID map
efi: arm64: enter with MMU and caches enabled
arm64: head: Clean the ID map and the HYP text to the PoC if needed
arm64: head: avoid cache invalidation when entering with the MMU on
arm64: head: record the MMU state at primary entry
arm64: kernel: move identity map out of .text mapping
arm64: head: Move all finalise_el2 calls to after __enable_mmu
* for-next/ptrauth:
: arm64 pointer authentication cleanup
arm64: pauth: don't sign leaf functions
arm64: unify asm-arch manipulation
* for-next/pseudo-nmi:
: Pseudo-NMI code generation optimisations
arm64: irqflags: use alternative branches for pseudo-NMI logic
arm64: add ARM64_HAS_GIC_PRIO_RELAXED_SYNC cpucap
arm64: make ARM64_HAS_GIC_PRIO_MASKING depend on ARM64_HAS_GIC_CPUIF_SYSREGS
arm64: rename ARM64_HAS_IRQ_PRIO_MASKING to ARM64_HAS_GIC_PRIO_MASKING
arm64: rename ARM64_HAS_SYSREG_GIC_CPUIF to ARM64_HAS_GIC_CPUIF_SYSREGS
|
|
Currently, when CONFIG_ARM64_PTR_AUTH_KERNEL=y (and
CONFIG_UNWIND_PATCH_PAC_INTO_SCS=n), we enable pointer authentication
for all functions, including leaf functions. This isn't necessary, and
is unfortunate for a few reasons:
* Any PACIASP instruction is implicitly a `BTI C` landing pad, and
forcing the addition of a PACIASP in every function introduces a
larger set of BTI gadgets than is necessary.
* The PACIASP and AUTIASP instructions make leaf functions larger than
necessary, bloating the kernel Image. For a defconfig v6.2-rc3 kernel,
this appears to add ~64KiB relative to not signing leaf functions,
which is unfortunate but not entirely onerous.
* The PACIASP and AUTIASP instructions potentially make leaf functions
more expensive in terms of performance and/or power. For many trivial
leaf functions, this is clearly unnecessary, e.g.
| <arch_local_save_flags>:
| d503233f paciasp
| d53b4220 mrs x0, daif
| d50323bf autiasp
| d65f03c0 ret
| <calibration_delay_done>:
| d503233f paciasp
| d50323bf autiasp
| d65f03c0 ret
| d503201f nop
* When CONFIG_UNWIND_PATCH_PAC_INTO_SCS=y we disable pointer
authentication for leaf functions, so clearly this is not functionally
necessary, indicates we have an inconsistent threat model, and
convolutes the Makefile logic.
We've used pointer authentication in leaf functions since the
introduction of in-kernel pointer authentication in commit:
74afda4016a7437e ("arm64: compile the kernel with ptrauth return address signing")
... but at the time we had no rationale for signing leaf functions.
Subsequently, we considered avoiding signing leaf functions:
https://lore.kernel.org/linux-arm-kernel/1586856741-26839-1-git-send-email-amit.kachhap@arm.com/
https://lore.kernel.org/linux-arm-kernel/1588149371-20310-1-git-send-email-amit.kachhap@arm.com/
... however at the time we didn't have an abundance of reasons to avoid
signing leaf functions as above (e.g. the BTI case), we had no hardware
to make performance measurements, and it was reasoned that this gave
some level of protection against a limited set of code-reuse gadgets
which would fall through to a RET. We documented this in commit:
717b938e22f8dbf0 ("arm64: Document why we enable PAC support for leaf functions")
Notably, this was before we supported any forward-edge CFI scheme (e.g.
Arm BTI, or Clang CFI/kCFI), which would prevent jumping into the middle
of a function.
In addition, even with signing forced for leaf functions, AUTIASP may be
placed before a number of instructions which might constitute such a
gadget, e.g.
| <user_regs_reset_single_step>:
| f9400022 ldr x2, [x1]
| d503233f paciasp
| d50323bf autiasp
| f9408401 ldr x1, [x0, #264]
| 720b005f tst w2, #0x200000
| b26b0022 orr x2, x1, #0x200000
| 926af821 and x1, x1, #0xffffffffffdfffff
| 9a820021 csel x1, x1, x2, eq // eq = none
| f9008401 str x1, [x0, #264]
| d65f03c0 ret
| <fpsimd_cpu_dead>:
| 2a0003e3 mov w3, w0
| 9000ff42 adrp x2, ffff800009ffd000 <xen_dynamic_chip+0x48>
| 9120e042 add x2, x2, #0x838
| 52800000 mov w0, #0x0 // #0
| d503233f paciasp
| f000d041 adrp x1, ffff800009a20000 <this_cpu_vector>
| d50323bf autiasp
| 9102c021 add x1, x1, #0xb0
| f8635842 ldr x2, [x2, w3, uxtw #3]
| f821685f str xzr, [x2, x1]
| d65f03c0 ret
| d503201f nop
So generally, trying to use AUTIASP to detect such gadgetization is not
robust, and this is dealt with far better by forward-edge CFI (which is
designed to prevent such cases). We should bite the bullet and stop
pretending that AUTIASP is a mitigation for such forward-edge
gadgetization.
For the above reasons, this patch has the kernel consistently sign
non-leaf functions and avoid signing leaf functions.
Considering a defconfig v6.2-rc3 kernel built with LLVM 15.0.6:
* The vmlinux is ~43KiB smaller:
| [mark@lakrids:~/src/linux]% ls -al vmlinux-*
| -rwxr-xr-x 1 mark mark 338547808 Jan 25 17:17 vmlinux-after
| -rwxr-xr-x 1 mark mark 338591472 Jan 25 17:22 vmlinux-before
* The resulting Image is 64KiB smaller:
| [mark@lakrids:~/src/linux]% ls -al Image-*
| -rwxr-xr-x 1 mark mark 32702976 Jan 25 17:17 Image-after
| -rwxr-xr-x 1 mark mark 32768512 Jan 25 17:22 Image-before
* There are ~400 fewer BTI gadgets:
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-before 2> /dev/null | grep -ow 'paciasp\|bti\sc\?' | sort | uniq -c
| 1219 bti c
| 61982 paciasp
| [mark@lakrids:~/src/linux]% usekorg 12.1.0 aarch64-linux-objdump -d vmlinux-after 2> /dev/null | grep -ow 'paciasp\|bti\sc\?' | sort | uniq -c
| 10099 bti c
| 52699 paciasp
Which is +8880 BTIs, and -9283 PACIASPs, for -403 unnecessary BTI
gadgets. While this is small relative to the total, distinguishing the
two cases will make it easier to analyse and reduce this set further
in future.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Amit Daniel Kachhap <amit.kachhap@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230131105809.991288-3-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
Assemblers will reject instructions not supported by a target
architecture version, and so we must explicitly tell the assembler the
latest architecture version for which we want to assemble instructions
from.
We've added a few AS_HAS_ARMV8_<N> definitions for this, in addition to
an inconsistently named AS_HAS_PAC definition, from which arm64's
top-level Makefile determines the architecture version that we intend to
target, and generates the `asm-arch` variable.
To make this a bit clearer and easier to maintain, this patch reworks
the Makefile to determine asm-arch in a single if-else-endif chain.
AS_HAS_PAC, which is defined when the assembler supports
`-march=armv8.3-a`, is renamed to AS_HAS_ARMV8_3.
As the logic for armv8.3-a is lifted out of the block handling pointer
authentication, `asm-arch` may now be set to armv8.3-a regardless of
whether support for pointer authentication is selected. This means that
it will be possible to assemble armv8.3-a instructions even if we didn't
intend to, but this is consistent with our handling of other
architecture versions, and the compiler won't generate armv8.3-a
instructions regardless.
For the moment there's no need for an CONFIG_AS_HAS_ARMV8_1, as the code
for LSE atomics and LDAPR use individual `.arch_extension` entries and
do not require the baseline asm arch to be bumped to armv8.1-a. The
other armv8.1-a features (e.g. PAN) do not require assembler support.
There should be no functional change as a result of this patch.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Mark Brown <broonie@kernel.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230131105809.991288-2-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
This patch enables support for DYNAMIC_FTRACE_WITH_CALL_OPS on arm64.
This allows each ftrace callsite to provide an ftrace_ops to the common
ftrace trampoline, allowing each callsite to invoke distinct tracer
functions without the need to fall back to list processing or to
allocate custom trampolines for each callsite. This significantly speeds
up cases where multiple distinct trace functions are used and callsites
are mostly traced by a single tracer.
The main idea is to place a pointer to the ftrace_ops as a literal at a
fixed offset from the function entry point, which can be recovered by
the common ftrace trampoline. Using a 64-bit literal avoids branch range
limitations, and permits the ops to be swapped atomically without
special considerations that apply to code-patching. In future this will
also allow for the implementation of DYNAMIC_FTRACE_WITH_DIRECT_CALLS
without branch range limitations by using additional fields in struct
ftrace_ops.
As noted in the core patch adding support for
DYNAMIC_FTRACE_WITH_CALL_OPS, this approach allows for directly invoking
ftrace_ops::func even for ftrace_ops which are dynamically-allocated (or
part of a module), without going via ftrace_ops_list_func.
Currently, this approach is not compatible with CLANG_CFI, as the
presence/absence of pre-function NOPs changes the offset of the
pre-function type hash, and there's no existing mechanism to ensure a
consistent offset for instrumented and uninstrumented functions. When
CLANG_CFI is enabled, the existing scheme with a global ops->func
pointer is used, and there should be no functional change. I am
currently working with others to allow the two to work together in
future (though this will liekly require updated compiler support).
I've benchamrked this with the ftrace_ops sample module [1], which is
not currently upstream, but available at:
https://lore.kernel.org/lkml/20230103124912.2948963-1-mark.rutland@arm.com
git://git.kernel.org/pub/scm/linux/kernel/git/mark/linux.git ftrace-ops-sample-20230109
Using that module I measured the total time taken for 100,000 calls to a
trivial instrumented function, with a number of tracers enabled with
relevant filters (which would apply to the instrumented function) and a
number of tracers enabled with irrelevant filters (which would not apply
to the instrumented function). I tested on an M1 MacBook Pro, running
under a HVF-accelerated QEMU VM (i.e. on real hardware).
Before this patch:
Number of tracers || Total time | Per-call average time (ns)
Relevant | Irrelevant || (ns) | Total | Overhead
=========+============++=============+==============+============
0 | 0 || 94,583 | 0.95 | -
0 | 1 || 93,709 | 0.94 | -
0 | 2 || 93,666 | 0.94 | -
0 | 10 || 93,709 | 0.94 | -
0 | 100 || 93,792 | 0.94 | -
---------+------------++-------------+--------------+------------
1 | 1 || 6,467,833 | 64.68 | 63.73
1 | 2 || 7,509,708 | 75.10 | 74.15
1 | 10 || 23,786,792 | 237.87 | 236.92
1 | 100 || 106,432,500 | 1,064.43 | 1063.38
---------+------------++-------------+--------------+------------
1 | 0 || 1,431,875 | 14.32 | 13.37
2 | 0 || 6,456,334 | 64.56 | 63.62
10 | 0 || 22,717,000 | 227.17 | 226.22
100 | 0 || 103,293,667 | 1032.94 | 1031.99
---------+------------++-------------+--------------+--------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
After this patch
Number of tracers || Total time | Per-call average time (ns)
Relevant | Irrelevant || (ns) | Total | Overhead
=========+============++=============+==============+============
0 | 0 || 94,541 | 0.95 | -
0 | 1 || 93,666 | 0.94 | -
0 | 2 || 93,709 | 0.94 | -
0 | 10 || 93,667 | 0.94 | -
0 | 100 || 93,792 | 0.94 | -
---------+------------++-------------+--------------+------------
1 | 1 || 281,000 | 2.81 | 1.86
1 | 2 || 281,042 | 2.81 | 1.87
1 | 10 || 280,958 | 2.81 | 1.86
1 | 100 || 281,250 | 2.81 | 1.87
---------+------------++-------------+--------------+------------
1 | 0 || 280,959 | 2.81 | 1.86
2 | 0 || 6,502,708 | 65.03 | 64.08
10 | 0 || 18,681,209 | 186.81 | 185.87
100 | 0 || 103,550,458 | 1,035.50 | 1034.56
---------+------------++-------------+--------------+------------
Note: per-call overhead is estimated relative to the baseline case
with 0 relevant tracers and 0 irrelevant tracers.
As can be seen from the above:
a) Whenever there is a single relevant tracer function associated with a
tracee, the overhead of invoking the tracer is constant, and does not
scale with the number of tracers which are *not* associated with that
tracee.
b) The overhead for a single relevant tracer has dropped to ~1/7 of the
overhead prior to this series (from 13.37ns to 1.86ns). This is
largely due to permitting calls to dynamically-allocated ftrace_ops
without going through ftrace_ops_list_func.
I've run the ftrace selftests from v6.2-rc3, which reports:
| # of passed: 110
| # of failed: 0
| # of unresolved: 3
| # of untested: 0
| # of unsupported: 0
| # of xfailed: 1
| # of undefined(test bug): 0
... where the unresolved entries were the tests for DIRECT functions
(which are not supported), and the checkbashisms selftest (which is
irrelevant here):
| [8] Test ftrace direct functions against tracers [UNRESOLVED]
| [9] Test ftrace direct functions against kprobes [UNRESOLVED]
| [62] Meta-selftest: Checkbashisms [UNRESOLVED]
... with all other tests passing (or failing as expected).
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Cc: Florent Revest <revest@chromium.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20230123134603.1064407-9-mark.rutland@arm.com
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
|
|
* for-next/ftrace:
ftrace: arm64: remove static ftrace
ftrace: arm64: move from REGS to ARGS
ftrace: abstract DYNAMIC_FTRACE_WITH_ARGS accesses
ftrace: rename ftrace_instruction_pointer_set() -> ftrace_regs_set_instruction_pointer()
ftrace: pass fregs to arch_ftrace_set_direct_caller()
|
|
This commit replaces arm64's support for FTRACE_WITH_REGS with support
for FTRACE_WITH_ARGS. This removes some overhead and complexity, and
removes some latent issues with inconsistent presentation of struct
pt_regs (which can only be reliably saved/restored at exception
boundaries).
FTRACE_WITH_REGS has been supported on arm64 since commit:
3b23e4991fb66f6d ("arm64: implement ftrace with regs")
As noted in the commit message, the major reasons for implementing
FTRACE_WITH_REGS were:
(1) To make it possible to use the ftrace graph tracer with pointer
authentication, where it's necessary to snapshot/manipulate the LR
before it is signed by the instrumented function.
(2) To make it possible to implement LIVEPATCH in future, where we need
to hook function entry before an instrumented function manipulates
the stack or argument registers. Practically speaking, we need to
preserve the argument/return registers, PC, LR, and SP.
Neither of these need a struct pt_regs, and only require the set of
registers which are live at function call/return boundaries. Our calling
convention is defined by "Procedure Call Standard for the Arm® 64-bit
Architecture (AArch64)" (AKA "AAPCS64"), which can currently be found
at:
https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst
Per AAPCS64, all function call argument and return values are held in
the following GPRs:
* X0 - X7 : parameter / result registers
* X8 : indirect result location register
* SP : stack pointer (AKA SP)
Additionally, ad function call boundaries, the following GPRs hold
context/return information:
* X29 : frame pointer (AKA FP)
* X30 : link register (AKA LR)
... and for ftrace we need to capture the instrumented address:
* PC : program counter
No other GPRs are relevant, as none of the other arguments hold
parameters or return values:
* X9 - X17 : temporaries, may be clobbered
* X18 : shadow call stack pointer (or temorary)
* X19 - X28 : callee saved
This patch implements FTRACE_WITH_ARGS for arm64, only saving/restoring
the minimal set of registers necessary. This is always sufficient to
manipulate control flow (e.g. for live-patching) or to manipulate
function arguments and return values.
This reduces the necessary stack usage from 336 bytes for pt_regs down
to 112 bytes for ftrace_regs + 32 bytes for two frame records, freeing
up 188 bytes. This could be reduced further with changes to the
unwinder.
As there is no longer a need to save different sets of registers for
different features, we no longer need distinct `ftrace_caller` and
`ftrace_regs_caller` trampolines. This allows the trampoline assembly to
be simpler, and simplifies code which previously had to handle the two
trampolines.
I've tested this with the ftrace selftests, where there are no
unexpected failures.
Co-developed-by: Florent Revest <revest@chromium.org>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Florent Revest <revest@chromium.org>
Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Will Deacon <will@kernel.org>
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://lore.kernel.org/r/20221103170520.931305-5-mark.rutland@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Implement dynamic shadow call stack support on Clang, by parsing the
unwind tables at init time to locate all occurrences of PACIASP/AUTIASP
instructions, and replacing them with the shadow call stack push and pop
instructions, respectively.
This is useful because the overhead of the shadow call stack is
difficult to justify on hardware that implements pointer authentication
(PAC), and given that the PAC instructions are executed as NOPs on
hardware that doesn't, we can just replace them without breaking
anything. As PACIASP/AUTIASP are guaranteed to be paired with respect to
manipulations of the return address, replacing them 1:1 with shadow call
stack pushes and pops is guaranteed to result in the desired behavior.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Sami Tolvanen <sami |