aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2026-04-10sched/eevdf: Clear buddies for preempt_shortVincent Guittot1-2/+4
next buddy should not prevent shorter slice preemption. Don't take buddy into account when checking if shorter slice entity can preempt and clear it if the entity with a shorter slice can preempt current. Test on snapdragon rb5: hackbench -T -p -l 16000000 -g 2 1> /dev/null & hackbench runs in cgroup /test-A cyclictest -t 1 -i 2777 -D 63 --policy=fair --mlock -h 20000 -q cyclictest runs in cgroup /test-B tip/sched/core tip/sched/core +this patch cyclictest slice (ms) (default)2.8 8 8 hackbench slice (ms) (default)2.8 20 20 Total Samples | 22679 22595 22686 Average (us) | 84 94(-12%) 59( 37%) Median (P50) (us) | 56 56( 0%) 56( 0%) 90th Percentile (us) | 64 65(- 2%) 63( 3%) 99th Percentile (us) | 1047 1273(-22%) 74( 94%) 99.9th Percentile (us) | 2431 4751(-95%) 663( 86%) Maximum (us) | 4694 8655(-84%) 3934( 55%) Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260410132321.2897789-1-vincent.guittot@linaro.org
2026-04-10Merge branches 'for-next/misc', 'for-next/tlbflush', ↵Catalin Marinas1-97/+10
'for-next/ttbr-macros-cleanup', 'for-next/kselftest', 'for-next/feat_lsui', 'for-next/mpam', 'for-next/hotplug-batched-tlbi', 'for-next/bbml2-fixes', 'for-next/sysreg', 'for-next/generic-entry' and 'for-next/acpi', remote-tracking branches 'arm64/for-next/perf' and 'arm64/for-next/read-once' into for-next/core * arm64/for-next/perf: : Perf updates perf/arm-cmn: Fix resource_size_t printk specifier in arm_cmn_init_dtc() perf/arm-cmn: Fix incorrect error check for devm_ioremap() perf: add NVIDIA Tegra410 C2C PMU perf: add NVIDIA Tegra410 CPU Memory Latency PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE-TGT PMU perf/arm_cspmu: nvidia: Add Tegra410 PCIE PMU perf/arm_cspmu: Add arm_cspmu_acpi_dev_get perf/arm_cspmu: nvidia: Add Tegra410 UCF PMU perf/arm_cspmu: nvidia: Rename doc to Tegra241 perf/arm-cmn: Stop claiming entire iomem region arm64: cpufeature: Use pmuv3_implemented() function arm64: cpufeature: Make PMUVer and PerfMon unsigned KVM: arm64: Read PMUVer as unsigned * arm64/for-next/read-once: : Fixes for __READ_ONCE() with CONFIG_LTO=y arm64, compiler-context-analysis: Permit alias analysis through __READ_ONCE() with CONFIG_LTO=y arm64: Optimize __READ_ONCE() with CONFIG_LTO=y * for-next/misc: : Miscellaneous cleanups/fixes arm64: rsi: use linear-map alias for realm config buffer arm64: Kconfig: fix duplicate word in CMDLINE help text arm64: mte: Skip TFSR_EL1 checks and barriers in synchronous tag check mode arm64/hwcap: Generate the KERNEL_HWCAP_ definitions for the hwcaps arm64: kexec: Remove duplicate allocation for trans_pgd arm64: mm: Use generic enum pgtable_level arm64: scs: Remove redundant save/restore of SCS SP on entry to/from EL0 arm64: remove ARCH_INLINE_* * for-next/tlbflush: : Refactor the arm64 TLB invalidation API and implementation arm64: mm: __ptep_set_access_flags must hint correct TTL arm64: mm: Provide level hint for flush_tlb_page() arm64: mm: Wrap flush_tlb_page() around __do_flush_tlb_range() arm64: mm: More flags for __flush_tlb_range() arm64: mm: Refactor __flush_tlb_range() to take flags arm64: mm: Refactor flush_tlb_page() to use __tlbi_level_asid() arm64: mm: Simplify __flush_tlb_range_limit_excess() arm64: mm: Simplify __TLBI_RANGE_NUM() macro arm64: mm: Re-implement the __flush_tlb_range_op macro in C arm64: mm: Inline __TLBI_VADDR_RANGE() into __tlbi_range() arm64: mm: Push __TLBI_VADDR() into __tlbi_level() arm64: mm: Implicitly invalidate user ASID based on TLBI operation arm64: mm: Introduce a C wrapper for by-range TLB invalidation arm64: mm: Re-implement the __tlbi_level macro as a C function * for-next/ttbr-macros-cleanup: : Cleanups of the TTBR1_* macros arm64/mm: Directly use TTBRx_EL1_CnP arm64/mm: Directly use TTBRx_EL1_ASID_MASK arm64/mm: Describe TTBR1_BADDR_4852_OFFSET * for-next/kselftest: : arm64 kselftest updates selftests/arm64: Implement cmpbr_sigill() to hwcap test * for-next/feat_lsui: : Futex support using FEAT_LSUI instructions to avoid toggling PAN arm64: armv8_deprecated: Disable swp emulation when FEAT_LSUI present arm64: Kconfig: Add support for LSUI KVM: arm64: Use CAST instruction for swapping guest descriptor arm64: futex: Support futex with FEAT_LSUI arm64: futex: Refactor futex atomic operation KVM: arm64: kselftest: set_id_regs: Add test for FEAT_LSUI KVM: arm64: Expose FEAT_LSUI to guests arm64: cpufeature: Add FEAT_LSUI * for-next/mpam: (40 commits) : Expose MPAM to user-space via resctrl: : - Add architecture context-switch and hiding of the feature from KVM. : - Add interface to allow MPAM to be exposed to user-space using resctrl. : - Add errata workaoround for some existing platforms. : - Add documentation for using MPAM and what shape of platforms can use resctrl arm64: mpam: Add initial MPAM documentation arm_mpam: Quirk CMN-650's CSU NRDY behaviour arm_mpam: Add workaround for T241-MPAM-6 arm_mpam: Add workaround for T241-MPAM-4 arm_mpam: Add workaround for T241-MPAM-1 arm_mpam: Add quirk framework arm_mpam: resctrl: Call resctrl_init() on platforms that can support resctrl arm64: mpam: Select ARCH_HAS_CPU_RESCTRL arm_mpam: resctrl: Add empty definitions for assorted resctrl functions arm_mpam: resctrl: Update the rmid reallocation limit arm_mpam: resctrl: Add resctrl_arch_rmid_read() arm_mpam: resctrl: Allow resctrl to allocate monitors arm_mpam: resctrl: Add support for csu counters arm_mpam: resctrl: Add monitor initialisation and domain boilerplate arm_mpam: resctrl: Add kunit test for control format conversions arm_mpam: resctrl: Add support for 'MB' resource arm_mpam: resctrl: Wait for cacheinfo to be ready arm_mpam: resctrl: Add rmid index helpers arm_mpam: resctrl: Convert to/from MPAMs fixed-point formats arm_mpam: resctrl: Hide CDP emulation behind CONFIG_EXPERT ... * for-next/hotplug-batched-tlbi: : arm64/mm: Enable batched TLB flush in unmap_hotplug_range() arm64/mm: Reject memory removal that splits a kernel leaf mapping arm64/mm: Enable batched TLB flush in unmap_hotplug_range() * for-next/bbml2-fixes: : Fixes for realm guest and BBML2_NOABORT arm64: mm: Remove pmd_sect() and pud_sect() arm64: mm: Handle invalid large leaf mappings correctly arm64: mm: Fix rodata=full block mapping support for realm guests * for-next/sysreg: : arm64 sysreg updates arm64/sysreg: Update ID_AA64SMFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ZFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64FPFR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR2_EL1 description to DDI0601 2025-12 arm64/sysreg: Update ID_AA64ISAR0_EL1 description to DDI0601 2025-12 arm64/sysreg: Update SMIDR_EL1 to DDI0601 2025-06 * for-next/generic-entry: : More arm64 refactoring towards using the generic entry code arm64: Check DAIF (and PMR) at task-switch time arm64: entry: Use split preemption logic arm64: entry: Use irqentry_{enter_from,exit_to}_kernel_mode() arm64: entry: Consistently prefix arm64-specific wrappers arm64: entry: Don't preempt with SError or Debug masked entry: Split preemption from irqentry_exit_to_kernel_mode() entry: Split kernel mode logic from irqentry_{enter,exit}() entry: Move irqentry_enter() prototype later entry: Remove local_irq_{enable,disable}_exit_to_user() entry: Fix stale comment for irqentry_enter() * for-next/acpi: : arm64 ACPI updates ACPI: AGDI: fix missing newline in error message
2026-04-10Merge branches 'pm-cpuidle', 'pm-opp' and 'pm-sleep'Rafael J. Wysocki1-2/+5
Merge cpuidle updates, OPP (operating performance points) library updates, and updates related to system suspend and hibernation for 7.1-rc1: - Refine stopped tick handling in the menu cpuidle governor and rearrange stopped tick handling in the teo cpuidle governor (Rafael Wysocki) - Add Panther Lake C-states table to the intel_idle driver (Artem Bityutskiy) - Clean up dead dependencies on CPU_IDLE in Kconfig (Julian Braha) - Simplify cpuidle_register_device() with guard() (Huisong Li) - Use performance level if available to distinguish between rates in OPP debugfs (Manivannan Sadhasivam) - Fix scoped_guard in dev_pm_opp_xlate_required_opp() (Viresh Kumar) - Return -ENODATA if the snapshot image is not loaded (Alberto Garcia) - Remove inclusion of crypto/hash.h from hibernate_64.c on x86 (Eric Biggers) * pm-cpuidle: cpuidle: Simplify cpuidle_register_device() with guard() cpuidle: clean up dead dependencies on CPU_IDLE in Kconfig intel_idle: Add Panther Lake C-states table cpuidle: governors: teo: Rearrange stopped tick handling cpuidle: governors: menu: Refine stopped tick handling * pm-opp: OPP: Move break out of scoped_guard in dev_pm_opp_xlate_required_opp() OPP: debugfs: Use performance level if available to distinguish between rates * pm-sleep: PM: hibernate: return -ENODATA if the snapshot image is not loaded PM: hibernate: x86: Remove inclusion of crypto/hash.h
2026-04-10Merge branch 'pm-cpufreq'Rafael J. Wysocki1-2/+3
Merge cpufreq updates for 7.1-rc1: - Update qcom-hw DT bindings to include Eliza hardware (Abel Vesa) - Update cpufreq-dt-platdev blocklist (Faruque Ansari) - Minor updates to driver and dt-bindings for Tegra (Thierry Reding, Rosen Penev) - Add MAINTAINERS entry for CPPC driver (Viresh Kumar) - Add support for new features: CPPC performance priority, Dynamic EPP, Raw EPP, and new unit tests for them to amd-pstate (Gautham Shenoy, Mario Limonciello) - Fix sysfs files being present when HW missing and broken/outdated documentation in the amd-pstate driver (Ninad Naik, Gautham Shenoy) - Pass the policy to cpufreq_driver->adjust_perf() to avoid using cpufreq_cpu_get() in the .adjust_perf() callback in amd-pstate which leads to a scheduling-while-atomic bug (K Prateek Nayak) - Clean up dead code in Kconfig for cpufreq (Julian Braha) - Remove max_freq_req update for pre-existing cpufreq policy and add a boost_freq_req QoS request to save the boost constraint instead of overwriting the last scaling_max_freq constraint (Pierre Gondois) - Embed cpufreq QoS freq_req objects in cpufreq policy so they all are allocated in one go along with the policy to simplify lifetime rules and avoid error handling issues (Viresh Kumar) - Use DMI max speed when CPPC is unavailable in the acpi-cpufreq scaling driver (Henry Tseng) - Switch policy_is_shared() in cpufreq to using cpumask_nth() instead of cpumask_weight() because the former is more efficient (Yury Norov) - Use sysfs_emit() in sysfs show functions for cpufreq governor attributes (Thorsten Blum) - Update intel_pstate to stop returning an error when "off" is written to its status sysfs attribute while the driver is already off (Fabio De Francesco) - Include current frequency in the debug message printed by __cpufreq_driver_target() (Pengjie Zhang) * pm-cpufreq: (38 commits) cpufreq/amd-pstate: Add POWER_SUPPLY select for dynamic EPP MAINTAINERS: amd-pstate: Step down as maintainer, add Prateek as reviewer cpufreq: Pass the policy to cpufreq_driver->adjust_perf() cpufreq/amd-pstate: Pass the policy to amd_pstate_update() cpufreq/amd-pstate-ut: Add a unit test for raw EPP cpufreq/amd-pstate: Add support for raw EPP writes cpufreq/amd-pstate: Add support for platform profile class cpufreq/amd-pstate: add kernel command line to override dynamic epp cpufreq/amd-pstate: Add dynamic energy performance preference Documentation: amd-pstate: fix dead links in the reference section cpufreq/amd-pstate: Cache the max frequency in cpudata Documentation/amd-pstate: Add documentation for amd_pstate_floor_{freq,count} Documentation/amd-pstate: List amd_pstate_prefcore_ranking sysfs file Documentation/amd-pstate: List amd_pstate_hw_prefcore sysfs file amd-pstate-ut: Add a testcase to validate the visibility of driver attributes amd-pstate-ut: Add module parameter to select testcases amd-pstate: Introduce a tracepoint trace_amd_pstate_cppc_req2() amd-pstate: Add sysfs support for floor_freq and floor_count amd-pstate: Add support for CPPC_REQ2 and FLOOR_PERF x86/cpufeatures: Add AMD CPPC Performance Priority feature. ...
2026-04-09cgroup/rdma: fix swapped arguments in pr_warn() format stringcuitao1-1/+1
The format string says "device %p ... rdma cgroup %p" but the arguments were passed as (cg, device), printing them in the wrong order. Signed-off-by: cuitao <cuitao@kylinos.cn> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-09bpf: Fix use-after-free in offloaded map/prog info fillJiayuan Chen1-6/+4
When querying info for an offloaded BPF map or program, bpf_map_offload_info_fill_ns() and bpf_prog_offload_info_fill_ns() obtain the network namespace with get_net(dev_net(offmap->netdev)). However, the associated netdev's netns may be racing with teardown during netns destruction. If the netns refcount has already reached 0, get_net() performs a refcount_t increment on 0, triggering: refcount_t: addition on 0; use-after-free. Although rtnl_lock and bpf_devs_lock ensure the netdev pointer remains valid, they cannot prevent the netns refcount from reaching zero. Fix this by using maybe_get_net() instead of get_net(). maybe_get_net() uses refcount_inc_not_zero() and returns NULL if the refcount is already zero, which causes ns_get_path_cb() to fail and the caller to return -ENOENT -- the correct behavior when the netns is being destroyed. Fixes: 675fc275a3a2d ("bpf: offload: report device information for offloaded programs") Fixes: 52775b33bb507 ("bpf: offload: report device information about offloaded maps") Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Closes: https://lore.kernel.org/bpf/f0aa3678-79c9-47ae-9e8c-02a3d1df160a@hust.edu.cn/ Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260409023733.168050-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-09bpf: Drop pkt_end markers on arithmetic to prevent is_pkt_ptr_branch_takenDaniel Borkmann1-8/+22
When a pkt pointer acquires AT_PKT_END or BEYOND_PKT_END range from a comparison, and then, known-constant arithmetic is performed, adjust_ptr_min_max_vals() copies the stale range via dst_reg->raw = ptr_reg->raw without clearing the negative reg->range sentinel values. This lets is_pkt_ptr_branch_taken() choose one branch direction and skip going through the other. Fix this by clearing negative pkt range values (that is, AT_PKT_END and BEYOND_PKT_END) after arithmetic on pkt pointers. This ensures is_pkt_ptr_branch_taken() returns unknown and both branches are properly verified. Fixes: 6d94e741a8ff ("bpf: Support for pointers beyond pkt_end.") Reported-by: STAR Labs SG <info@starlabs.sg> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260409155016.536608-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-09Merge tag 'dma-mapping-7.0-2026-04-09' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fix from Marek Szyprowski: "A fix for DMA-mapping subsystem, which hides annoying, false-positive warnings from DMA-API debug on coherent platforms like x86_64 (Mikhail Gavrilov)" * tag 'dma-mapping-7.0-2026-04-09' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement
2026-04-08bpf: Remove static qualifier from local subprog pointerDaniel Borkmann1-2/+2
The local subprog pointer in create_jt() and visit_abnormal_return_insn() was declared static. It is unconditionally assigned via bpf_find_containing_subprog() before every use. Thus, the static qualifier serves no purpose and rather creates confusion. Just remove it. Fixes: e40f5a6bf88a ("bpf: correct stack liveness for tail calls") Fixes: 493d9e0d6083 ("bpf, x86: add support for indirect jumps") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20260408191242.526279-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08bpf: Fix ld_{abs,ind} failure path analysis in subprogsDaniel Borkmann1-2/+31
Usage of ld_{abs,ind} instructions got extended into subprogs some time ago via commit 09b28d76eac4 ("bpf: Add abnormal return checks."). These are only allowed in subprograms when the latter are BTF annotated and have scalar return types. The code generator in bpf_gen_ld_abs() has an abnormal exit path (r0=0 + exit) from legacy cBPF times. While the enforcement is on scalar return types, the verifier must also simulate the path of abnormal exit if the packet data load via ld_{abs,ind} failed. This is currently not the case. Fix it by having the verifier simulate both success and failure paths, and extend it in similar ways as we do for tail calls. The success path (r0=unknown, continue to next insn) is pushed onto stack for later validation and the r0=0 and return to the caller is done on the fall-through side. Fixes: 09b28d76eac4 ("bpf: Add abnormal return checks.") Reported-by: STAR Labs SG <info@starlabs.sg> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260408191242.526279-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08bpf: Propagate error from visit_tailcall_insnDaniel Borkmann1-2/+5
Commit e40f5a6bf88a ("bpf: correct stack liveness for tail calls") added visit_tailcall_insn() but did not check its return value. Fixes: e40f5a6bf88a ("bpf: correct stack liveness for tail calls") Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260408191242.526279-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08bpf: Make find_linfo widely availableKumar Kartikeya Dwivedi2-42/+38
Move find_linfo() as bpf_find_linfo() into core.c to allow for its use in the verifier in subsequent patches. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/20260408021359.3786905-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08bpf: Extract bpf_get_linfo_file_lineKumar Kartikeya Dwivedi1-8/+21
Extract bpf_get_linfo_file_line as its own function so that the logic to obtain the file, line, and line number for a given program can be shared in subsequent patches. Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260408021359.3786905-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-08Merge branch kvm-arm64/hyp-tracing into kvmarm-master/nextMarc Zyngier9-53/+2557
* kvm-arm64/hyp-tracing: (40 commits) : . : EL2 tracing support, adding both 'remote' ring-buffer : infrastructure and the tracing itself, courtesy of : Vincent Donnefort. From the cover letter: : : "The growing set of features supported by the hypervisor in protected : mode necessitates debugging and profiling tools. Tracefs is the : ideal candidate for this task: : : * It is simple to use and to script. : : * It is supported by various tools, from the trace-cmd CLI to the : Android web-based perfetto. : : * The ring-buffer, where are stored trace events consists of linked : pages, making it an ideal structure for sharing between kernel and : hypervisor. : : This series first introduces a new generic way of creating remote events and : remote buffers. Then it adds support to the pKVM hypervisor." : . tracing: selftests: Extend hotplug testing for trace remotes tracing: Non-consuming read for trace remotes with an offline CPU tracing: Adjust cmd_check_undefined to show unexpected undefined symbols tracing: Restore accidentally removed SPDX tag KVM: arm64: avoid unused-variable warning tracing: Generate undef symbols allowlist for simple_ring_buffer KVM: arm64: tracing: add ftrace dependency tracing: add more symbols to whitelist tracing: Update undefined symbols allow list for simple_ring_buffer KVM: arm64: Fix out-of-tree build for nVHE/pKVM tracing tracing: selftests: Add hypervisor trace remote tests KVM: arm64: Add selftest event support to nVHE/pKVM hyp KVM: arm64: Add hyp_enter/hyp_exit events to nVHE/pKVM hyp KVM: arm64: Add event support to the nVHE/pKVM hyp and trace remote KVM: arm64: Add trace reset to the nVHE/pKVM hyp KVM: arm64: Sync boot clock with the nVHE/pKVM hyp KVM: arm64: Add trace remote for the nVHE/pKVM hyp KVM: arm64: Add tracing capability for the nVHE/pKVM hyp KVM: arm64: Support unaligned fixmap in the pKVM hyp KVM: arm64: Initialise hyp_nr_cpus for nVHE hyp ... Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-04-08perf/events: Replace READ_ONCE() with standard pgtable accessorsAnshuman Khandual1-3/+3
Replace raw READ_ONCE() dereferences of pgtable entries with corresponding standard page table accessors pxdp_get() in perf_get_pgtable_size(). These accessors default to READ_ONCE() on platforms that don't override them. So there is no functional change on such platforms. However arm64 platform is being extended to support 128 bit page tables via a new architecture feature i.e FEAT_D128 in which case READ_ONCE() will not provide required single copy atomic access for 128 bit page table entries. Although pxdp_get() accessors can later be overridden on arm64 platform to extend required single copy atomicity support on 128 bit entries. Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260227062744.2215491-1-anshuman.khandual@arm.com
2026-04-08sched/rt: Cleanup global RT bandwidth functionsMichal Koutný1-24/+0
The commit 5f6bd380c7bdb ("sched/rt: Remove default bandwidth control") and followup changes made a few of the functions unnecessary, drop them for simplicity. Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260323-sched-rert_groups-v3-3-1e7d5ed6b249@suse.com
2026-04-08sched/rt: Move group schedulability check to sched_rt_global_validate()Michal Koutný1-9/+8
The sched_rt_global_constraints() function is a remnant that used to set up global RT throttling but that is no more since commit 5f6bd380c7bdb ("sched/rt: Remove default bandwidth control") and the function ended up only doing schedulability check. Move the check into the validation function where it fits better. (The order of validations sched_dl_global_validate() and sched_rt_global_validate() shouldn't matter.) Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260323-sched-rert_groups-v3-2-1e7d5ed6b249@suse.com
2026-04-08sched/rt: Skip group schedulable check with rt_group_sched=0Michal Koutný1-3/+2
The warning from the commit 87f1fb77d87a6 ("sched: Add RT_GROUP WARN checks for non-root task_groups") is wrong -- it assumes that only task_groups with rt_rq are traversed, however, the schedulability check would iterate all task_groups even when rt_group_sched=0 is disabled at boot time but some non-root task_groups exist. The schedulability check is supposed to validate: a) that children don't overcommit its parent, b) no RT task group overcommits global RT limit. but with rt_group_sched=0 there is no (non-trivial) hierarchy of RT groups, therefore skip the validation altogether. Otherwise, writes to the global sched_rt_runtime_us knob will be rejected with incorrect validation error. This fix is immaterial with CONFIG_RT_GROUP_SCHED=n. Fixes: 87f1fb77d87a6 ("sched: Add RT_GROUP WARN checks for non-root task_groups") Signed-off-by: Michal Koutný <mkoutny@suse.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260323-sched-rert_groups-v3-1-1e7d5ed6b249@suse.com
2026-04-08sched/deadline: Use revised wakeup rule for dl_serverPeter Zijlstra1-1/+1
John noted that commit 115135422562 ("sched/deadline: Fix 'stuck' dl_server") unfixed the issue from commit a3a70caf7906 ("sched/deadline: Fix dl_server behaviour"). The issue in commit 115135422562 was for wakeups of the server after the deadline; in which case you *have* to start a new period. The case for a3a70caf7906 is wakeups before the deadline. Now, because the server is effectively running a least-laxity policy, it means that any wakeup during the runnable phase means dl_entity_overflow() will be true. This means we need to adjust the runtime to allow it to still run until the existing deadline expires. Use the revised wakeup rule for dl_defer entities. Fixes: 115135422562 ("sched/deadline: Fix 'stuck' dl_server") Reported-by: John Stultz <jstultz@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Juri Lelli <juri.lelli@redhat.com> Tested-by: John Stultz <jstultz@google.com> Link: https://patch.msgid.link/20260404102244.GB22575@noisy.programming.kicks-ass.net
2026-04-08entry: Split kernel mode logic from irqentry_{enter,exit}()Mark Rutland1-95/+8
The generic irqentry code has entry/exit functions specifically for exceptions taken from user mode, but doesn't have entry/exit functions specifically for exceptions taken from kernel mode. It would be helpful to have separate entry/exit functions specifically for exceptions taken from kernel mode. This would make the structure of the entry code more consistent, and would make it easier for architectures to manage logic specific to exceptions taken from kernel mode. Move the logic specific to kernel mode out of irqentry_enter() and irqentry_exit() into new irqentry_enter_from_kernel_mode() and irqentry_exit_to_kernel_mode() functions. These are marked __always_inline and placed in irq-entry-common.h, as with irqentry_enter_from_user_mode() and irqentry_exit_to_user_mode(), so that they can be inlined into architecture-specific wrappers. The existing out-of-line irqentry_enter() and irqentry_exit() functions retained as callers of the new functions. The lockdep assertion from irqentry_exit() is moved into irqentry_exit_to_user_mode() and irqentry_exit_to_kernel_mode(). This was previously missing from irqentry_exit_to_user_mode() when called directly, and any new lockdep assertion failure relating from this change is a latent bug. Aside from the lockdep change noted above, there should be no functional change as a result of this change. [ tglx: Updated kernel doc ] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-5-mark.rutland@arm.com
2026-04-08entry: Remove local_irq_{enable,disable}_exit_to_user()Mark Rutland1-2/+2
local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() are never overridden by architecture code, and are always equivalent to local_irq_enable() and local_irq_disable(). These functions were added on the assumption that arm64 would override them to manage 'DAIF' exception masking, as described by Thomas Gleixner in these threads: https://lore.kernel.org/all/20190919150809.340471236@linutronix.de/ https://lore.kernel.org/all/alpine.DEB.2.21.1910240119090.1852@nanos.tec.linutronix.de/ In practice arm64 did not need to override either. Prior to moving to the generic irqentry code, arm64's management of DAIF was reworked in commit: 97d935faacde ("arm64: Unmask Debug + SError in do_notify_resume()") Since that commit, arm64 only masks interrupts during the 'prepare' step when returning to user mode, and masks other DAIF exceptions later. Within arm64_exit_to_user_mode(), the arm64 entry code is as follows: local_irq_disable(); exit_to_user_mode_prepare_legacy(regs); local_daif_mask(); mte_check_tfsr_exit(); exit_to_user_mode(); Remove the unnecessary local_irq_enable_exit_to_user() and local_irq_disable_exit_to_user() functions. Signed-off-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Jinjie Ruan <ruanjinjie@huawei.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407131650.3813777-3-mark.rutland@arm.com
2026-04-07bpf: Allow overwriting referenced dynptr when refcnt > 1Amery Hung1-2/+21
The verifier currently does not allow overwriting a referenced dynptr's stack slot to prevent resource leak. This is because referenced dynptr holds additional resources that requires calling specific helpers to release. This limitation can be relaxed when there are multiple copies of the same dynptr. Whether it is the orignial dynptr or one of its clones, as long as there exists at least one other dynptr with the same ref_obj_id (to be used to release the reference), its stack slot should be allowed to be overwritten. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260406150548.1354271-2-ameryhung@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07bpf: Clear delta when clearing reg id for non-{add,sub} opsDaniel Borkmann1-28/+28
When a non-{add,sub} alu op such as xor is performed on a scalar register that previously had a BPF_ADD_CONST delta, the else path in adjust_reg_min_max_vals() only clears dst_reg->id but leaves dst_reg->delta unchanged. This stale delta can propagate via assign_scalar_id_before_mov() when the register is later used in a mov. It gets a fresh id but keeps the stale delta from the old (now-cleared) BPF_ADD_CONST. This stale delta can later propagate leading to a verifier-vs- runtime value mismatch. The clear_id label already correctly clears both delta and id. Make the else path consistent by also zeroing the delta when id is cleared. More generally, this introduces a helper clear_scalar_id() which internally takes care of zeroing. There are various other locations in the verifier where only the id is cleared. By using the helper we catch all current and future locations. Fixes: 98d7ca374ba4 ("bpf: Track delta between "linked" registers.") Reported-by: STAR Labs SG <info@starlabs.sg> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260407192421.508817-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07bpf: Fix linked reg delta tracking when src_reg == dst_regDaniel Borkmann1-1/+2
Consider the case of rX += rX where src_reg and dst_reg are pointers to the same bpf_reg_state in adjust_reg_min_max_vals(). The latter first modifies the dst_reg in-place, and later in the delta tracking, the subsequent is_reg_const(src_reg)/reg_const_value(src_reg) reads the post-{add,sub} value instead of the original source. This is problematic since it sets an incorrect delta, which sync_linked_regs() then propagates to linked registers, thus creating a verifier-vs-runtime mismatch. Fix it by just skipping this corner case. Fixes: 98d7ca374ba4 ("bpf: Track delta between "linked" registers.") Reported-by: STAR Labs SG <info@starlabs.sg> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20260407192421.508817-1-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07bpf: Prefer vmlinux symbols over module symbols for unqualified kprobesAndrey Grodzovsky1-0/+8
When an unqualified kprobe target exists in both vmlinux and a loaded module, number_of_same_symbols() returns a count greater than 1, causing kprobe attachment to fail with -EADDRNOTAVAIL even though the vmlinux symbol is unambiguous. When no module qualifier is given and the symbol is found in vmlinux, return the vmlinux-only count without scanning loaded modules. This preserves the existing behavior for all other cases: - Symbol only in a module: vmlinux count is 0, falls through to module scan as before. - Symbol qualified with MOD:SYM: mod != NULL, unchanged path. - Symbol ambiguous within vmlinux itself: count > 1 is returned as-is. Fixes: 926fe783c8a6 ("tracing/kprobes: Fix symbol counting logic by looking at modules as well") Fixes: 9d8616034f16 ("tracing/kprobes: Add symbol counting check when module loads") Suggested-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Ihor Solodrai <ihor.solodrai@linux.dev> Signed-off-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Link: https://lore.kernel.org/r/20260407203912.1787502-2-andrey.grodzovsky@crowdstrike.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07bpf: Retire rcu_trace_implies_rcu_gp()Kumar Kartikeya Dwivedi4-48/+19
RCU Tasks Trace grace period implies RCU grace period, and this guarantee is expected to remain in the future. Only BPF is the user of this predicate, hence retire the API and clean up all in-tree users. RCU Tasks Trace is now implemented on SRCU-fast and its grace period mechanism always has at least one call to synchronize_rcu() as it is required for SRCU-fast's correctness (it replaces the smp_mb() that SRCU-fast readers skip). So, RCU-tt GP will always imply RCU GP. Reviewed-by: Puranjay Mohan <puranjay@kernel.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260407162234.785270-1-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07workqueue: use NR_STD_WORKER_POOLS instead of hardcoded valueManinder Singh1-2/+2
use NR_STD_WORKER_POOLS for irq_work_fns[] array definition. NR_STD_WORKER_POOLS is also 2, but better to use MACRO. Initialization loop for_each_bh_worker_pool() also uses same MACRO. Signed-off-by: Maninder Singh <maninder1.s@samsung.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-07Merge tag 'mm-hotfixes-stable-2026-04-06-15-27' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "Eight hotfixes. All are cc:stable and seven are for MM. All are singletons - please see the changelogs for details" * tag 'mm-hotfixes-stable-2026-04-06-15-27' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: ocfs2: fix out-of-bounds write in ocfs2_write_end_inline mm/damon/stat: deallocate damon_call() failure leaking damon_ctx mm/vma: fix memory leak in __mmap_region() mm/memory_hotplug: maintain N_NORMAL_MEMORY during hotplug mm/damon/sysfs: dealloc repeat_call_control if damon_call() fails mm: reinstate unconditional writeback start in balance_dirty_pages() liveupdate: propagate file deserialization failures mm: filemap: fix nr_pages calculation overflow in filemap_map_pages()
2026-04-07alarmtimer: Access timerqueue node under lock in suspendZhan Xusheng1-4/+8
In alarmtimer_suspend(), timerqueue_getnext() is called under base->lock, but next->expires is read after the lock is released. This is safe because suspend freezes all relevant task contexts, but reading the node while holding the lock makes the code easier to reason about and not worry about a theoretical UAF. Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260407143627.19405-1-zhanxusheng@xiaomi.com
2026-04-07bpf: Drop task_to_inode and inet_conn_established from lsm sleepable hooksJiayuan Chen1-3/+0
bpf_lsm_task_to_inode() is called under rcu_read_lock() and bpf_lsm_inet_conn_established() is called from softirq context, so neither hook can be used by sleepable LSM programs. Fixes: 423f16108c9d8 ("bpf: Augment the set of sleepable LSM hooks") Reported-by: Quan Sun <2022090917019@std.uestc.edu.cn> Reported-by: Yinhao Hu <dddddd@hust.edu.cn> Reported-by: Kaiyan Mei <M202472210@hust.edu.cn> Reported-by: Dongliang Mu <dzm91@hust.edu.cn> Closes: https://lore.kernel.org/bpf/3ab69731-24d1-431a-a351-452aafaaf2a5@std.uestc.edu.cn/T/#u Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Link: https://lore.kernel.org/r/20260407122334.344072-1-jiayuan.chen@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-07tick/nohz: Fix inverted return value in check_tick_dependency() fast pathJosh Snyder1-1/+1
Commit 56534673cea7f ("tick/nohz: Optimize check_tick_dependency() with early return") added a fast path that returns !val when the tick_stop tracepoint is disabled. This is inverted: the slow path returns true when a dependency IS found (val != 0), but !val returns true when val is zero (no dependency). The result is that can_stop_full_tick() sees "dependency found" when there are none, and the tick never stops on nohz_full CPUs. Fix this by returning !!val instead of !val, matching the slow-path semantics. Fixes: 56534673cea7f ("tick/nohz: Optimize check_tick_dependency() with early return") Signed-off-by: Josh Snyder <josh@code406.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Assisted-by: Claude:claude-opus-4-6 Link: https://patch.msgid.link/20260402-fix-idle-tick2-v1-1-eecb589649d3@code406.com
2026-04-07sched/fair: Avoid overflow in enqueue_entity()K Prateek Nayak1-2/+30
Here is one scenario which was triggered when running: stress-ng --yield=32 -t 10000000s& while true; do perf bench sched messaging -p -t -l 100000 -g 16; done on a 256CPUs machine after about an hour into the run: __enqeue_entity: entity_key(-141245081754) weight(90891264) overflow_mul(5608800059305154560) vlag(57498) delayed?(0) cfs_rq: zero_vruntime(3809707759657809) sum_w_vruntime(0) sum_weight(0) nr_queued(1) cfs_rq->curr: entity_key(0) vruntime(3809707759657809) deadline(3809723966988476) weight(37) The above comes from __enqueue_entity() after a place_entity(). Breaking this down: vlag_initial = 57498 vlag = (57498 * (37 + 90891264)) / 37 = 141,245,081,754 vruntime = 3809707759657809 - 141245081754 = 3,809,566,514,576,055 entity_key(se, cfs_rq) = -141,245,081,754 Now, multiplying the entity_key with its own weight results to 5,608,800,059,305,154,560 (same as what overflow_mul() suggests) but in Python, without overflow, this would be: -1,2837,944,014,404,397,056 Avoid the overflow (without doing the division for avg_vruntime()), by moving zero_vruntime to the new entity when it is heavier. Fixes: 4823725d9d1d ("sched/fair: Increase weight bits for avg_vruntime") Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com> [peterz: suggested 'weight > load' condition] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260407120052.GG3738010@noisy.programming.kicks-ass.net
2026-04-07sched: Use u64 for bandwidth ratio calculationsJoseph Salisbury3-3/+3
to_ratio() computes BW_SHIFT-scaled bandwidth ratios from u64 period and runtime values, but it returns unsigned long. tg_rt_schedulable() also stores the current group limit and the accumulated child sum in unsigned long. On 32-bit builds, large bandwidth ratios can be truncated and the RT group sum can wrap when enough siblings are present. That can let an overcommitted RT hierarchy pass the schedulability check, and it also narrows the helper result for other callers. Return u64 from to_ratio() and use u64 for the RT group totals so bandwidth ratios are preserved and compared at full width on both 32-bit and 64-bit builds. Fixes: b40b2e8eb521 ("sched: rt: multi level group constraints") Assisted-by: Codex:GPT-5 Signed-off-by: Joseph Salisbury <joseph.salisbury@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260403210014.2713404-1-joseph.salisbury@oracle.com
2026-04-06bpf: Do not ignore offsets for loads from insn_arraysAnton Protopopov1-0/+20
When a pointer to PTR_TO_INSN is dereferenced, the offset field of the BPF_LDX_MEM instruction can be nonzero. Patch the verifier to not ignore this field. Reported-by: Jiyong Yang <ksur673@gmail.com> Fixes: 493d9e0d6083 ("bpf, x86: add support for indirect jumps") Signed-off-by: Anton Protopopov <a.s.protopopov@gmail.com> Link: https://lore.kernel.org/r/20260406160141.36943-2-a.s.protopopov@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06bpf: Avoid -Wflex-array-members-not-at-end warningsGustavo A. R. Silva1-5/+7
Apparently, struct bpf_empty_prog_array exists entirely to populate a single element of "items" in a global variable. "null_prog" is only used during the initializer. None of this is needed; globals will be correctly sized with an array initializer of a flexible-array member. So, remove struct bpf_empty_prog_array and adjust the rest of the code, accordingly. With these changes, fix the following warnings: ./include/linux/bpf.h:2369:31: warning: structure containing a flexible array member is not at the end of another structure [-Wflex-array-member-not-at-end] Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r/acr7Whmn0br3xeBP@kspp Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06bpf: Enable unaligned accesses for syscall ctxKumar Kartikeya Dwivedi1-2/+1
Don't reject usage of fixed unaligned offsets for syscall ctx. Tests will be added in later commits. Unaligned offsets already work for variable offsets. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260406194403.1649608-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06bpf: Support variable offsets for syscall PTR_TO_CTXKumar Kartikeya Dwivedi1-38/+65
Allow accessing PTR_TO_CTX with variable offsets in syscall programs. Fixed offsets are already enabled for all program types that do not convert their ctx accesses, since the changes we made in the commit de6c7d99f898 ("bpf: Relax fixed offset check for PTR_TO_CTX"). Note that we also lift the restriction on passing syscall context into helpers, which was not permitted before, and passing modified syscall context into kfuncs. The structure of check_mem_access can be mostly shared and preserved, but we must use check_mem_region_access to correctly verify access with variable offsets. The check made in check_helper_mem_access is hardened to only allow PTR_TO_CTX for syscall programs to be passed in as helper memory. This was the original intention of the existing code anyway, and it makes little sense for other program types' context to be utilized as a memory buffer. In case a convincing example presents itself in the future, this check can be relaxed further. We also no longer use the last-byte access to simulate helper memory access, but instead go through check_mem_region_access. Since this no longer updates our max_ctx_offset, we must do so manually, to keep track of the maximum offset at which the program ctx may be accessed. Take care to ensure that when arg_type is ARG_PTR_TO_CTX, we do not relax any fixed or variable offset constraints around PTR_TO_CTX even in syscall programs, and require them to be passed unmodified. There are several reasons why this is necessary. First, if we pass a modified ctx, then the global subprog's accesses will not update the max_ctx_offset to its true maximum offset, and can lead to out of bounds accesses. Second, tail called program (or extension program replacing global subprog) where their max_ctx_offset exceeds the program they are being called from can also cause issues. For the latter, unmodified PTR_TO_CTX is the first requirement for the fix, the second is ensuring max_ctx_offset >= the program they are being called from, which has to be a separate change not made in this commit. All in all, we can hint using arg_type when we expect ARG_PTR_TO_CTX and make our relaxation around offsets conditional on it. Drop coverage of syscall tests from verifier_ctx.c temporarily for negative cases until they are updated in subsequent commits. Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com> Acked-by: Puranjay Mohan <puranjay@kernel.org> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260406194403.1649608-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-04-06liveupdate: propagate file deserialization failuresLeo Timmins1-2/+7
luo_session_deserialize() ignored the return value from luo_file_deserialize(). As a result, a session could be left partially restored even though the /dev/liveupdate open path treats deserialization failures as fatal. Propagate the error so a failed file deserialization aborts session deserialization instead of silently continuing. Link: https://lkml.kernel.org/r/20260325044608.8407-1-leotimmins1974@gmail.com Link: https://lkml.kernel.org/r/20260325044608.8407-2-leotimmins1974@gmail.com Fixes: 16cec0d26521 ("liveupdate: luo_session: add ioctls for file preservation") Signed-off-by: Leo Timmins <leotimmins1974@gmail.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Reviewed-by: Pratyush Yadav <pratyush@kernel.org> Cc: Mike Rapoport <rppt@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05bpf: Fix stale offload->prog pointer after constant blindingMingTao Huang1-0/+2
When a dev-bound-only BPF program (BPF_F_XDP_DEV_BOUND_ONLY) undergoes JIT compilation with constant blinding enabled (bp