aboutsummaryrefslogtreecommitdiff
path: root/arch
AgeCommit message (Collapse)AuthorFilesLines
2026-01-22arm64: dts: marvell: Add SoC specific compatibles to SafeXcel cryptoAngeloGioacchino Del Regno2-2/+4
Following the changes in the binding for the SafeXcel crypto engine, add SoC specific compatibles to the existing nodes in Armada 37xx and CP11x. Signed-off-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> Signed-off-by: Gregory CLEMENT <gregory.clement@bootlin.com>
2026-01-22KVM: arm64: nv: Return correct RES0 bits for FGT registersZenghui Yu (Huawei)1-1/+1
We had extended the sysreg masking infrastructure to more general registers, instead of restricting it to VNCR-backed registers, since commit a0162020095e ("KVM: arm64: Extend masking facility to arbitrary registers"). Fix kvm_get_sysreg_res0() to reflect this fact. Note that we're sure that we only deal with FGT registers in kvm_get_sysreg_res0(), the if (sr < __VNCR_START__) is actually a never false, which should probably be removed later. Fixes: 69c19e047dfe ("KVM: arm64: Add TCR2_EL2 to the sysreg arrays") Signed-off-by: Zenghui Yu (Huawei) <zenghui.yu@linux.dev> Link: https://patch.msgid.link/20260121101631.41037-1-zenghui.yu@linux.dev Signed-off-by: Marc Zyngier <maz@kernel.org> Cc: stable@vger.kernel.org
2026-01-22Merge tag 'v6.20-rockchip-dts64-1' of ↵Krzysztof Kozlowski38-25/+2248
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt New boards: Orange Pi CM5 module + Baseboard, Radxa CM5 module + IO-board. PCIe-slot-overlay for rk3576-evb1 New peripherals: some of the video decoders on rk3576 and rk3588 Enabled peripherals: many RK3588-NPUs and a lot of other peripherals on a plethora of boards. * tag 'v6.20-rockchip-dts64-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: (40 commits) arm64: dts: rockchip: Add the vdpu383 Video Decoder on rk3576 arm64: dts: rockchip: Add the vdpu381 Video Decoders on RK3588 arm64: dts: rockchip: Add rk3588s-orangepi-cm5-base device tree dt-bindings: arm: rockchip: Add Orange Pi CM5 Base arm64: dts: rockchip: Enable second HDMI output on CM3588 arm64: dts: rockchip: Add HDMI to Gameforce Ace arm64: dts: rockchip: Enable analog sound on RK3576 EVB1 arm64: dts: rockchip: Enable HDMI sound on RK3576 EVB1 arm64: dts: rockchip: Enable HDMI sound on Luckfox Core3576 arm64: dts: rockchip: Enable HDMI sound on FriendlyElec NanoPi M5 arm64: dts: rockchip: Use a readable audio card name on NanoPi M5 arm64: dts: rockchip: enable NPU on rk3588-jaguar arm64: dts: rockchip: enable NPU on rk3588-tiger dt-bindings: arm: rockchip: fix description for Radxa CM5 dt-bindings: arm: rockchip: fix description for Radxa CM3I arm64: dts: rockchip: Add missing everest,es8388 supplies to rk3399-roc-pc-plus arm64: dts: rockchip: Enable PCIe for ArmSoM Sige1 arm64: dts: rockchip: Enable the NPU on Turing RK1 arm64: dts: rockchip: Enable the NPU on FriendlyElec CM3588 arm64: dts: rockchip: Enable the NPU on NanoPC T6/T6-LTS ... Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22KVM: arm64: Always populate FGT masks at boot timeMarc Zyngier1-9/+9
We currently only populate the FGT masks if the underlying HW does support FEAT_FGT. However, with the addition of the RES1 support for system registers, this results in a lot of noise at boot time, as reported by Nathan. That's because even if FGT isn't supported, we still check for the attribution of the bits to particular features, and not keeping the masks up-to-date leads to (fairly harmess) warnings. Given that we want these checks to be enforced even if the HW doesn't support FGT, enable the generation of FGT masks unconditionally (this is rather cheap anyway). Only the storage of the FGT configuration is avoided, which will save a tiny bit of memory on these machines. Reported-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Fixes: c259d763e6b09 ("KVM: arm64: Account for RES1 bits in DECLARE_FEAT_MAP() and co") Link: https://lore.kernel.org/r/20260120211558.GA834868@ax162 Link: https://patch.msgid.link/20260122085153.535538-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-01-22Merge tag 'v6.20-rockchip-dts32-1' of ↵Krzysztof Kozlowski1-1/+16
https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip into soc/dt HEVC decoder node for RK3288. * tag 'v6.20-rockchip-dts32-1' of https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip: ARM: dts: rockchip: Add vdec node for RK3288 Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-22Merge tag 'juno-updates-7.0' of ↵Krzysztof Kozlowski2-4/+11
https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux into soc/dt Armv8 Juno/Vexpress updates for v7.0 This contains a small set of DT updates: 1. Align DTS node naming with established coding style by replacing underscores with hyphens in node names. This is a safe change and does not affect ABI. 2. Add support for the CMN PMU on the Arm Morello platform, exposing the CMN-Skeena (CMN-600 r3p1–compatible) PMU via the standard CMN-600 binding. This enables PMU access on real Morello SDP hardware, where the registers are functional. * tag 'juno-updates-7.0' of https://git.kernel.org/pub/scm/linux/kernel/git/sudeep.holla/linux: arm64: dts: arm: Use hyphen in node names arm64: dts: morello: Add CMN PMU Signed-off-by: Krzysztof Kozlowski <krzk@kernel.org>
2026-01-21ARM: dts: qcom: switch to RPMPD_* indicesDmitry Baryshkov1-2/+2
Use generic RPMPD_* defines for power domain instead of using platform-specific defines. Reviewed-by: Bjorn Andersson <andersson@kernel.org> Acked-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251211-rework-rpmhpd-rpmpd-v2-2-a5ec4028129f@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: sm6115: Add CX_MEM/DBGC GPU regionsKonrad Dybcio1-2/+6
Describe the GPU register regions, with the former existing but not being used much if at all on this silicon, and the latter containing various debugging levers generally related to dumping the state of the IP upon a crash. Fixes: 11750af256f8 ("arm64: dts: qcom: sm6115: Add GPU nodes") Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Closes: https://lore.kernel.org/linux-arm-msm/8a64f70b-8034-45e7-86a3-0015cf357132@oss.qualcomm.com/T/#m404f1425c36b61467760f058b696b8910340a063 Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251229-topic-6115_2290_gpu_dbgc-v1-3-4a24d196389c@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: agatti: Add CX_MEM/DBGC GPU regionsKonrad Dybcio1-2/+6
Describe the GPU register regions, with the former existing but not being used much if at all on this silicon, and the latter containing various debugging levers generally related to dumping the state of the IP upon a crash. Fixes: 4faeef52c8e6 ("arm64: dts: qcom: qcm2290: Add GPU nodes") Reported-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Closes: https://lore.kernel.org/linux-arm-msm/8a64f70b-8034-45e7-86a3-0015cf357132@oss.qualcomm.com/T/#m404f1425c36b61467760f058b696b8910340a063 Signed-off-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Akhil P Oommen <akhilpo@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251229-topic-6115_2290_gpu_dbgc-v1-2-4a24d196389c@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: sm8750: add ADSP fastrpc-compute-cb nodesAlexey Klimov1-0/+61
Add ADSP fastrpc nodes for sm8750 SoC. Cc: Ekansh Gupta <quic_ekangupt@quicinc.com> Cc: Srinivas Kandagatla <srini@kernel.org> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lore.kernel.org/r/20251209-sm8750-fastrpc-adsp-v3-2-ccfff49a8af9@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: sm8750: add memory node for adsp fastrpcAlexey Klimov1-0/+8
Add optional memory heap node that can be used for ADSP fastrpc. Cc: Ekansh Gupta <quic_ekangupt@quicinc.com> Cc: Srinivas Kandagatla <srini@kernel.org> Signed-off-by: Alexey Klimov <alexey.klimov@linaro.org> Link: https://lore.kernel.org/r/20251209-sm8750-fastrpc-adsp-v3-1-ccfff49a8af9@linaro.org Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: switch to RPMPD_* indicesDmitry Baryshkov8-40/+40
Use generic RPMPD_* defines for power domain instead of using platform-specific defines. Reviewed-by: Bjorn Andersson <andersson@kernel.org> Acked-by: Bjorn Andersson <andersson@kernel.org> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20251211-rework-rpmhpd-rpmpd-v2-1-a5ec4028129f@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21Merge tag 'soc-fixes-6.19-2' of ↵Linus Torvalds21-60/+39
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC fixes from Arnd Bergmann: "The main changes are devicetree updates for qualcomm and rockchips arm64 platforms, fixing minor mistakes in SoC and board specific settings: - GPIO settings for Pinephone Pro buttons - Register ranges for rk3576 GPU - Power domains on sc8280xp - Clocks on qcom talos - dtc warnings for extraneous properties, nonstandard node names and undocument identifiers The Tegra210 platform gets a single revert for a devicetree change that caused a 6.19 regression. On 32-bit Arm, we have trivial fixes for Microchip SAMA7 devicetree files and NPCM Kconfig, as well as Andrew Jeffery being officially listed as MAINTAINER for NPCM. A single driver fix is for Qualcomm RPMHD power domains, bringing the driver up to date with a devicetree change that added additional power domains to be enabled" * tag 'soc-fixes-6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (27 commits) MAINTAINERS: Add Andrew as M: to ARM/NUVOTON NPCM ARCHITECTURE MAINTAINERS: update email address for Yixun Lan Revert "arm64: tegra: Add interconnect properties for Tegra210" arm64: dts: rockchip: Drop unsupported properties arm64: dts: rockchip: Fix gpio pinctrl node names arm64: dts: rockchip: Fix pinctrl property typo on rk3326-odroid-go3 arm64: dts: rockchip: Drop "sitronix,st7789v" fallback compatible from rk3568-wolfvision ARM: dts: microchip: sama7d65: fix size-cells property for i2c3 ARM: dts: microchip: sama7d65: fix the ranges property for flx9 arm: npcm: drop unused Kconfig ERRATA symbol arm64: dts: rockchip: Fix wrong register range of rk3576 gpu arm64: dts: rockchip: Configure MCLK for analog sound on NanoPi M5 arm64: dts: rockchip: Fix headphones widget name on NanoPi M5 ARM: dts: microchip: lan966x: Fix the access to the PHYs for pcb8290 arm64: dts: rockchip: remove redundant max-link-speed from nanopi-r4s arm64: dts: rockchip: remove dangerous max-link-speed from helios64 arm64: dts: rockchip: fix unit-address for RK3588 NPU's core1 and core2's IOMMU arm64: dts: rockchip: Fix wifi interrupts flag on Sakura Pi RK3308B arm64: dts: qcom: sm8650: Fix compile warnings in USB controller node arm64: dts: qcom: sm8550: Fix compile warnings in USB controller node ...
2026-01-21arm64: dts: amlogic: meson-s4-s905y4-khadas-vim1s: enable SDIO interfaceNick Xie1-0/+27
Enable the SDIO controller interface connected to the on-board AP6256 WiFi/BT module. Signed-off-by: Nick Xie <nick@khadas.com> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260121014725.122722-1-nick@khadas.com Signed-off-by: Neil Armstrong <neil.armstrong@linaro.org>
2026-01-21perf/x86/intel: Do not enable BTS for guestsFernand Sieber1-2/+11
By default when users program perf to sample branch instructions (PERF_COUNT_HW_BRANCH_INSTRUCTIONS) with a sample period of 1, perf interprets this as a special case and enables BTS (Branch Trace Store) as an optimization to avoid taking an interrupt on every branch. Since BTS doesn't virtualize, this optimization doesn't make sense when the request originates from a guest. Add an additional check that prevents this optimization for virtualized events (exclude_host). Reported-by: Jan H. Schönherr <jschoenh@amazon.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fernand Sieber <sieberf@amazon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20251211183604.868641-1-sieberf@amazon.com
2026-01-21arm64: dts: qcom: oneplus-enchilada: Specify i2c4 clock frequencyDavid Heidelberg1-0/+2
Per the binding, omitting the clock frequency from a Geni I2C controller node defaults the bus to 100 kHz. But at least in Linux, a friendly info print highlights the lack of explicitly defined frequency in the DeviceTree. Specify the frequency, to give it an explicit value, and to silence the log print in Linux. Downstream doesn't define any frequency, thus also using 100 kHz. Signed-off-by: David Heidelberg <david@ixit.cz> Link: https://lore.kernel.org/r/20251130-enchilada-i2c-freq-v1-1-2932480a0261@ixit.cz Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: sm6350: Add clocks for aggre1 & aggre2 NoCLuca Weiss1-0/+3
As per updated bindings, add the clocks for those two interconnects, which are required to set up QoS correctly. Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Signed-off-by: Luca Weiss <luca.weiss@fairphone.com> Link: https://lore.kernel.org/r/20251114-sm6350-icc-qos-v2-5-6af348cb9c69@fairphone.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21arm64: dts: qcom: agatti: enable FastRPC on the ADSPDmitry Baryshkov1-0/+41
On Agatti platform the ADSP provides FastRPC support. Add corresponding device node, in order to be able to utilize the DSP offload from the Linux side. Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260113-agatti-fastrpc-v2-1-b66870213f89@oss.qualcomm.com Signed-off-by: Bjorn Andersson <andersson@kernel.org>
2026-01-21Merge tag 'qcom-arm64-fixes-for-6.19' of ↵Arnd Bergmann5-13/+16
https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux into arm/fixes Qualcomm Arm64 DeviceTree fixes for v6.19 Add missing power-domains to the SC8280XP RPM power-domain and ensure these are voted for from the remoteproc instances while powering them up. Clear a couple of DeviceTree validation warnings in SM8550 and SM8650 USB controller nodes. Specify the correct display panel on the OnePlus 6. Correct the UFS clock mapping on Talos, to ensure UFS is properly clocked. Add Abel's old emails address to .mailmap. * tag 'qcom-arm64-fixes-for-6.19' of https://git.kernel.org/pub/scm/linux/kernel/git/qcom/linux: arm64: dts: qcom: sm8650: Fix compile warnings in USB controller node arm64: dts: qcom: sm8550: Fix compile warnings in USB controller node arm64: dts: qcom: sc8280xp: Add missing VDD_MXC links pmdomain: qcom: rpmhpd: Add MXC to SC8280XP dt-bindings: power: qcom,rpmpd: Add SC8280XP_MXC_AO arm64: dts qcom: sdm845-oneplus-enchilada: Specify panel name within the compatible mailmap: Update email address for Abel Vesa arm64: dts: qcom: talos: Correct UFS clocks ordering Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-01-20kallsyms/bpf: rename __bpf_address_lookup() to bpf_address_lookup()Petr Mladek3-3/+3
bpf_address_lookup() has been used only in kallsyms_lookup_buildid(). It was supposed to set @modname and @modbuildid when the symbol was in a module. But it always just cleared @modname because BPF symbols were never in a module. And it did not clear @modbuildid because the pointer was not passed. The wrapper is no longer needed. Both @modname and @modbuildid are now always initialized to NULL in kallsyms_lookup_buildid(). Remove the wrapper and rename __bpf_address_lookup() to bpf_address_lookup() because this variant is used everywhere. [akpm@linux-foundation.org: fix loongarch] Link: https://lkml.kernel.org/r/20251128135920.217303-6-pmladek@suse.com Fixes: 9294523e3768 ("module: add printk formats to add module build ID to stacktraces") Signed-off-by: Petr Mladek <pmladek@suse.com> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: Aaron Tomlin <atomlin@atomlin.com> Cc: Daniel Borkman <daniel@iogearbox.net> Cc: Daniel Gomez <da.gomez@samsung.com> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Kees Cook <kees@kernel.org> Cc: Luis Chamberalin <mcgrof@kernel.org> Cc: Marc Rutland <mark.rutland@arm.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Petr Pavlu <petr.pavlu@suse.com> Cc: Sami Tolvanen <samitolvanen@google.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20watchdog: softlockup: panic when lockup duration exceeds N thresholdsLi RongQing4-4/+4
The softlockup_panic sysctl is currently a binary option: panic immediately or never panic on soft lockups. Panicking on any soft lockup, regardless of duration, can be overly aggressive for brief stalls that may be caused by legitimate operations. Conversely, never panicking may allow severe system hangs to persist undetected. Extend softlockup_panic to accept an integer threshold, allowing the kernel to panic only when the normalized lockup duration exceeds N watchdog threshold periods. This provides finer-grained control to distinguish between transient delays and persistent system failures. The accepted values are: - 0: Don't panic (unchanged) - 1: Panic when duration >= 1 * threshold (20s default, original behavior) - N > 1: Panic when duration >= N * threshold (e.g., 2 = 40s, 3 = 60s.) The original behavior is preserved for values 0 and 1, maintaining full backward compatibility while allowing systems to tolerate brief lockups while still catching severe, persistent hangs. [lirongqing@baidu.com: v2] Link: https://lkml.kernel.org/r/20251218074300.4080-1-lirongqing@baidu.com Link: https://lkml.kernel.org/r/20251216074521.2796-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Hao Luo <haoluo@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Lance Yang <lance.yang@linux.dev> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Song Liu <song@kernel.org> Cc: Stanislav Fomichev <sdf@fomichev.me> Cc: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20kernel.h: drop hex.h and update all hex.h usersRandy Dunlap7-0/+7
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20lib/tests: convert test_uuid module to KUnitRyota Sakamoto13-13/+0
Move lib/test_uuid.c to lib/tests/uuid_kunit.c and convert it to use KUnit. This change switches the ad-hoc test code to standard KUnit test cases. The test data remains the same, but the verification logic is updated to use KUNIT_EXPECT_* macros. Also remove CONFIG_TEST_UUID from arch/*/configs/* because it is no longer used. The new CONFIG_UUID_KUNIT_TEST will be automatically enabled by CONFIG_KUNIT_ALL_TESTS. [lukas.bulwahn@redhat.com: MAINTAINERS: adjust file entry in UUID HELPERS] Link: https://lkml.kernel.org/r/20251217053907.2778515-1-lukas.bulwahn@redhat.com Link: https://lkml.kernel.org/r/20251215134322.12949-1-sakamo.ryota@gmail.com Signed-off-by: Ryota Sakamoto <sakamo.ryota@gmail.com> Signed-off-by: Lukas Bulwahn <lukas.bulwahn@redhat.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: David Gow <davidgow@google.com> Cc: Andriy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Brendan Higgins <brendan.higgins@linux.dev> Cc: Lukas Bulwahn <lukas.bulwahn@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm/mmu_gather: remove @delay_remap of __tlb_remove_page_size()Wei Yang1-4/+2
__tlb_remove_page_size() is only used in tlb_remove_page_size() with @delay_remap set to false and it is passed directly to __tlb_remove_folio_pages_size(). Remove @delay_remap of __tlb_remove_page_size() and call __tlb_remove_folio_pages_size() with false @delay_remap. Link: https://lkml.kernel.org/r/20251231030026.15938-1-richard.weiyang@gmail.com Signed-off-by: Wei Yang <richard.weiyang@gmail.com> Acked-by: SeongJae Park <sj@kernel.org> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Acked-by: Will Deacon <will@kernel.org> Acked-by: Heiko Carstens <hca@linux.ibm.com> # s390 Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Christian Borntraeger <borntraeger@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Sven Schnelle <svens@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20x86/clear_page: introduce clear_pages()Ankur Arora1-5/+12
Performance when clearing with string instructions (x86-64-stosq and similar) can vary significantly based on the chunk-size used. $ perf bench mem memset -k 4KB -s 4GB -f x86-64-stosq # Running 'mem/memset' benchmark: # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S) # Copying 4GB bytes ... 13.748208 GB/sec $ perf bench mem memset -k 2MB -s 4GB -f x86-64-stosq # Running 'mem/memset' benchmark: # function 'x86-64-stosq' (movsq-based memset() in # arch/x86/lib/memset_64.S) # Copying 4GB bytes ... 15.067900 GB/sec $ perf bench mem memset -k 1GB -s 4GB -f x86-64-stosq # Running 'mem/memset' benchmark: # function 'x86-64-stosq' (movsq-based memset() in arch/x86/lib/memset_64.S) # Copying 4GB bytes ... 38.104311 GB/sec (Both on AMD Milan.) With a change in chunk-size from 4KB to 1GB, we see the performance go from 13.7 GB/sec to 38.1 GB/sec. For the chunk-size of 2MB the change isn't quite as drastic but it is worth adding a clear_page() variant that can handle contiguous page-extents. Link: https://lkml.kernel.org/r/20260107072009.1615991-6-ankur.a.arora@oracle.com Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Raghavendra K T <raghavendra.kt@amd.com> Reviewed-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Andy Lutomirski <luto@kernel.org> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20x86/mm: simplify clear_page_*Ankur Arora3-47/+67
clear_page_rep() and clear_page_erms() are wrappers around "REP; STOS" variations. Inlining gets rid of an unnecessary CALL/RET (which isn't free when using RETHUNK speculative execution mitigations.) Fixup and rename clear_page_orig() to adapt to the changed calling convention. Also add a comment from Dave Hansen detailing various clearing mechanisms used in clear_page(). Link: https://lkml.kernel.org/r/20260107072009.1615991-5-ankur.a.arora@oracle.com Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Tested-by: Raghavendra K T <raghavendra.kt@amd.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Andy Lutomirski <luto@kernel.org> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Hildenbrand <david@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20treewide: provide a generic clear_user_page() variantDavid Hildenbrand21-26/+7
Patch series "mm: folio_zero_user: clear page ranges", v11. This series adds clearing of contiguous page ranges for hugepages. The series improves on the current discontiguous clearing approach in two ways: - clear pages in a contiguous fashion. - use batched clearing via clear_pages() wherever exposed. The first is useful because it allows us to make much better use of hardware prefetchers. The second, enables advertising the real extent to the processor. Where specific instructions support it (ex. string instructions on x86; "mops" on arm64 etc), a processor can optimize based on this because, instead of seeing a sequence of 8-byte stores, or a sequence of 4KB pages, it sees a larger unit being operated on. For instance, AMD Zen uarchs (for extents larger than LLC-size) switch to a mode where they start eliding cacheline allocation. This is helpful not just because it results in higher bandwidth, but also because now the cache is not evicting useful cachelines and replacing them with zeroes. Demand faulting a 64GB region shows performance improvement: $ perf bench mem mmap -p $pg-sz -f demand -s 64GB -l 5 baseline +series (GBps +- %stdev) (GBps +- %stdev) pg-sz=2MB 11.76 +- 1.10% 25.34 +- 1.18% [*] +115.47% preempt=* pg-sz=1GB 24.85 +- 2.41% 39.22 +- 2.32% + 57.82% preempt=none|voluntary pg-sz=1GB (similar) 52.73 +- 0.20% [#] +112.19% preempt=full|lazy [*] This improvement is because switching to sequential clearing allows the hardware prefetchers to do a much better job. [#] For pg-sz=1GB a large part of the improvement is because of the cacheline elision mentioned above. preempt=full|lazy improves upon that because, not needing explicit invocations of cond_resched() to ensure reasonable preemption latency, it can clear the full extent as a single unit. In comparison the maximum extent used for preempt=none|voluntary is PROCESS_PAGES_NON_PREEMPT_BATCH (32MB). When provided the full extent the processor forgoes allocating cachelines on this path almost entirely. (The hope is that eventually, in the fullness of time, the lazy preemption model will be able to do the same job that none or voluntary models are used for, allowing us to do away with cond_resched().) Raghavendra also tested previous version of the series on AMD Genoa and sees similar improvement [1] with preempt=lazy. $ perf bench mem map -p $page-size -f populate -s 64GB -l 10 base patched change pg-sz=2MB 12.731939 GB/sec 26.304263 GB/sec 106.6% pg-sz=1GB 26.232423 GB/sec 61.174836 GB/sec 133.2% This patch (of 8): Let's drop all variants that effectively map to clear_page() and provide it in a generic variant instead. We'll use the macro clear_user_page to indicate whether an architecture provides it's own variant. Also, clear_user_page() is only called from the generic variant of clear_user_highpage(), so define it only if the architecture does not provide a clear_user_highpage(). And, for simplicity define it in linux/highmem.h. Note that for parisc, clear_page() and clear_user_page() map to clear_page_asm(), so we can just get rid of the custom clear_user_page() implementation. There is a clear_user_page_asm() function on parisc, that seems to be unused. Not sure what's up with that. Link: https://lkml.kernel.org/r/20260107072009.1615991-1-ankur.a.arora@oracle.com Link: https://lkml.kernel.org/r/20260107072009.1615991-2-ankur.a.arora@oracle.com Signed-off-by: David Hildenbrand <david@redhat.com> Co-developed-by: Ankur Arora <ankur.a.arora@oracle.com> Signed-off-by: Ankur Arora <ankur.a.arora@oracle.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Ankur Arora <ankur.a.arora@oracle.com> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Hildenbrand <david@kernel.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Konrad Rzessutek Wilk <konrad.wilk@oracle.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Li Zhe <lizhe.67@bytedance.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Mateusz Guzik <mjguzik@gmail.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Raghavendra K T <raghavendra.kt@amd.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm: add basic tests for lazy_mmuKevin Brodsky1-1/+3
Add basic KUnit tests for the generic aspects of the lazy MMU mode: ensure that it appears active when it should, depending on how enable/disable and pause/resume pairs are nested. [akpm@linux-foundation.org: export ppc64_tlb_batch and __flush_tlb_pending to modules] [ritesh.list@gmail.com: use EXPORT_SYMBOL_IF_KUNIT()] Link: https://lkml.kernel.org/r/87a4zhkt6h.ritesh.list@gmail.com [kevin.brodsky@arm.com: move MODULE_IMPORT_NS(), add comment] Link: https://lkml.kernel.org/r/20251217163812.2633648-2-kevin.brodsky@arm.com Link: https://lkml.kernel.org/r/20251215150323.2218608-15-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Signed-off-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20x86/xen: use lazy_mmu_state when context-switchingKevin Brodsky2-5/+2
We currently set a TIF flag when scheduling out a task that is in lazy MMU mode, in order to restore it when the task is scheduled again. The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode in task_struct::lazy_mmu_state. We can therefore check that state when switching to the new task, instead of using a separate TIF flag. Link: https://lkml.kernel.org/r/20251215150323.2218608-14-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Juergen Gross <jgross@suse.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20sparc/mm: replace batch->active with is_lazy_mmu_mode_active()Kevin Brodsky2-9/+1
A per-CPU batch struct is activated when entering lazy MMU mode; its lifetime is the same as the lazy MMU section (it is deactivated when leaving the mode). Preemption is disabled in that interval to ensure that the per-CPU reference remains valid. The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode. We can therefore use the generic helper is_lazy_mmu_mode_active() to tell whether a batch struct is active instead of tracking it explicitly. Link: https://lkml.kernel.org/r/20251215150323.2218608-13-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Acked-by: Andreas Larsson <andreas@gaisler.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20powerpc/mm: replace batch->active with is_lazy_mmu_mode_active()Kevin Brodsky2-10/+1
A per-CPU batch struct is activated when entering lazy MMU mode; its lifetime is the same as the lazy MMU section (it is deactivated when leaving the mode). Preemption is disabled in that interval to ensure that the per-CPU reference remains valid. The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode. We can therefore use the generic helper is_lazy_mmu_mode_active() to tell whether a batch struct is active instead of tracking it explicitly. Link: https://lkml.kernel.org/r/20251215150323.2218608-12-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand <david@redhat.com> Reviewed-by: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand (Red Hat) <david@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20arm64: mm: replace TIF_LAZY_MMU with is_lazy_mmu_mode_active()Kevin Brodsky2-18/+4
The generic lazy_mmu layer now tracks whether a task is in lazy MMU mode. As a result we no longer need a TIF flag for that purpose - let's use the new is_lazy_mmu_mode_active() helper instead. The explicit check for in_interrupt() is no longer necessary either as is_lazy_mmu_mode_active() always returns false in interrupt context. Link: https://lkml.kernel.org/r/20251215150323.2218608-11-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm: enable lazy_mmu sections to nestKevin Brodsky1-12/+0
Despite recent efforts to prevent lazy_mmu sections from nesting, it remains difficult to ensure that it never occurs - and in fact it does occur on arm64 in certain situations (CONFIG_DEBUG_PAGEALLOC). Commit 1ef3095b1405 ("arm64/mm: Permit lazy_mmu_mode to be nested") made nesting tolerable on arm64, but without truly supporting it: the inner call to leave() disables the batching optimisation before the outer section ends. This patch actually enables lazy_mmu sections to nest by tracking the nesting level in task_struct, in a similar fashion to e.g. pagefault_{enable,disable}(). This is fully handled by the generic lazy_mmu helpers that were recently introduced. lazy_mmu sections were not initially intended to nest, so we need to clarify the semantics w.r.t. the arch_*_lazy_mmu_mode() callbacks. This patch takes the following approach: * The outermost calls to lazy_mmu_mode_{enable,disable}() trigger calls to arch_{enter,leave}_lazy_mmu_mode() - this is unchanged. * Nested calls to lazy_mmu_mode_{enable,disable}() are not forwarded to the arch via arch_{enter,leave} - lazy MMU remains enabled so the assumption is that these callbacks are not relevant. However, existing code may rely on a call to disable() to flush any batched state, regardless of nesting. arch_flush_lazy_mmu_mode() is therefore called in that situation. A separate interface was recently introduced to temporarily pause the lazy MMU mode: lazy_mmu_mode_{pause,resume}(). pause() fully exits the mode *regardless of the nesting level*, and resume() restores the mode at the same nesting level. pause()/resume() are themselves allowed to nest, so we actually store two nesting levels in task_struct: enable_count and pause_count. A new helper is_lazy_mmu_mode_active() is introduced to determine whether we are currently in lazy MMU mode; this will be used in subsequent patches to replace the various ways arch's currently track whether the mode is enabled. In summary (enable/pause represent the values *after* the call): lazy_mmu_mode_enable() -> arch_enter() enable=1 pause=0 lazy_mmu_mode_enable() -> ø enable=2 pause=0 lazy_mmu_mode_pause() -> arch_leave() enable=2 pause=1 lazy_mmu_mode_resume() -> arch_enter() enable=2 pause=0 lazy_mmu_mode_disable() -> arch_flush() enable=1 pause=0 lazy_mmu_mode_disable() -> arch_leave() enable=0 pause=0 Note: is_lazy_mmu_mode_active() is added to <linux/sched.h> to allow arch headers included by <linux/pgtable.h> to use it. Link: https://lkml.kernel.org/r/20251215150323.2218608-10-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jann Horn <jannh@google.com> Cc: Juegren Gross <jgross@suse.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ritesh Harjani (IBM) <ritesh.list@gmail.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Cc: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-20mm: bail out of lazy_mmu_mode_* in interrupt contextKevin Brodsky1-9/+0
The lazy MMU mode cannot be used in interrupt context. This is documented in <linux/pgtable.h>, but isn't consistently handled across architectures. arm64 ensures that calls to lazy_mmu_mode_* have no effect in interrupt context, because such calls do occur in certain configurations - see commit b81c688426a9 ("arm64/mm: Disable barrier batching in interrupt contexts"). Other architectures do not check this situation, most likely because it hasn't occurred so far. Let's handle this in the new generic lazy_mmu layer, in the same fashion as arm64: bail out of lazy_mmu_mode_* if in_interrupt(). Also remove the arm64 handling that is now redundant. Both arm64 and x86/Xen also ensure that any lazy MMU optimisation is disabled while in interrupt (see queue_pte_barriers() and xen_get_lazy_mode() respectively). This will be handled in the generic layer in a subsequent patch. Link: https://lkml.kernel.org/r/20251215150323.2218608-9-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Acked-by: David Hildenbrand (Red Hat) <david@kernel.org> Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Yeoreum Yun <yeoreum.yun@arm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Andreas Larsson <andreas@gaisler.com> Cc: Borislav Betkov <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: David Hildenbrand <david@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: David Woodhouse <dwmw2@infradead.org> Cc: