linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2026-06-08	mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device	Youngjun Park	2	-4/+13
	Patch series "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device", v8. Currently, in the uswsusp path, only the swap type value is retrieved at lookup time without holding a reference. If swapoff races after the type is acquired, subsequent slot allocations operate on a stale swap device. Additionally, grabbing and releasing the swap device reference on every slot allocation is inefficient across the entire hibernation swap path. This patch series addresses these issues: - Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device from the point it is looked up until the session completes. - Patch 2: Removes the overhead of per-slot reference counting in alloc/free paths and cleans up the redundant SWP_WRITEOK check. This patch (of 2): Hibernation via uswsusp (/dev/snapshot ioctls) has a race window: after selecting the resume swap area but before user space is frozen, swapoff may run and invalidate the selected swap device. Fix this by pinning the swap device with SWP_HIBERNATION while it is in use. The pin is exclusive, which is sufficient since hibernate_acquire() already prevents concurrent hibernation sessions. The kernel swsusp path (sysfs-based hibernate/resume) uses find_hibernation_swap_type() which is not affected by the pin. It freezes user space before touching swap, so swapoff cannot race. Introduce dedicated helpers: - pin_hibernation_swap_type(): Look up and pin the swap device. Used by the uswsusp path. - find_hibernation_swap_type(): Lookup without pinning. Used by the kernel swsusp path. - unpin_hibernation_swap_type(): Clear the hibernation pin. While a swap device is pinned, swapoff is prevented from proceeding. Link: https://lore.kernel.org/20260323160822.1409904-1-youngjun.park@lge.com Link: https://lore.kernel.org/20260323160822.1409904-2-youngjun.park@lge.com Signed-off-by: Youngjun Park <youngjun.park@lge.com> Reviewed-by: Kairui Song <kasong@tencent.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: "Rafael J . Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-06-02	mm: introduce for_each_free_list()	Brendan Jackman	1	-4/+4
	Patch series "mm: misc cleanups from __GFP_UNMAPPED series". In v2 of the __GFP_UNMAPPED series [0], we realised that some of the patches could potentially be merged as independent cleanups. These are all independent of one another, if you think some are useful cleanups and others are pointless churn, it should be fine to just pick whatever subset you prefer. No functional change intended. This patch (of 4): There are a couple of places that iterate over the freelists with awareness of the data structures' layout. It seems ideally, code outside of mm should not be aware of the page allocator's freelists at all. But, this patch just doesn't hide them completely, it's just a meek incremental step in that direction: provide a macro to iterate over it without needing to be aware of the actual struct fields. Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-0-dacdf5402be8@google.com Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-1-dacdf5402be8@google.com Link: https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/ [0] Signed-off-by: Brendan Jackman <jackmanb@google.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Axel Rasmussen <axelrasmussen@google.com> Cc: Barry Song <baohua@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Kairui Song <kasong@tencent.com> Cc: Len Brown <lenb@kernel.org> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Wei Xu <weixugc@google.com> Cc: Yuanchu Xie <yuanchu@google.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28	kasan: skip HW tagging for all kernel thread stacks	Muhammad Usama Anjum	1	-2/+3
	HW-tag KASAN never checks kernel stacks because stack pointers carry the match-all tag, so setting/poisoning tags is pure overhead. - Add __GFP_SKIP_KASAN to THREADINFO_GFP so every stack allocator that uses it skips tagging (fork path plus arch users) - Add __GFP_SKIP_KASAN to GFP_VMAP_STACK for the fork-specific vmap stacks. - When reusing cached vmap stacks, skip kasan_unpoison_range() if HW tags are enabled. Software KASAN is unchanged; this only affects tag-based KASAN. Link: https://lore.kernel.org/20260429102704.680174-3-dev.jain@arm.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@arm.com> Signed-off-by: Dev Jain <dev.jain@arm.com> Reviewed-by: Catalin Marinas <catalin.marinas@arm.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Ben Segall <bsegall@google.com> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: "Uladzislau Rezki (Sony)" <urezki@gmail.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-28	bpf: arena: use page_ref_count() instead of page_mapped() in arena_free_pages()	David Hildenbrand (Arm)	1	-1/+1
	Pages that BPF arena code maps are allocated through bpf_map_alloc_pages(), which does not allocate folios but pages. In the future, pages will not have a mapcount, only folios will. Converting the code to use folios and rely on folio_mapped() sounds like the wrong approach. Should BPF arena code allocate folios and use folio_mapped() here? But likely we would not want to use folios here longterm, as we don't really need folio information. Hard to tell. But in the meantime, we can simply use the page refcount instead, as a heuristic whether the page might be mapped to user space and we would want to try zapping it, so we can get rid of page_mapped(). Page allocation will give us a page with a refcount of 1. Any user space mapping adds a page reference. While there can be references from other subsystems (e.g., GUP), in the common case for this test here relying on the page count is good enough. Link: https://lore.kernel.org/20260427-page_mapped-v1-2-e89c3592c74c@kernel.org Signed-off-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Eduard Zingerman <eddyz87@gmail.com> Cc: Harry Yoo <harry@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com> Cc: Liam R. Howlett <liam@infradead.org> Cc: Lorenzo Stoakes <ljs@kernel.org> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Rich Felker <dalias@libc.org> Cc: Rik van Riel <riel@surriel.com> Cc: Song Liu <song@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Yonghong Song <yonghong.song@linux.dev> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-26	Merge tag 'mm-hotfixes-stable-2026-05-25-16-22' of ↵	Linus Torvalds	1	-6/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address post-7.1 issues or aren't considered suitable for backporting. All patches are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-05-25-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "mm: introduce a new page type for page pool in page type" mm/vmalloc: do not trigger BUG() on BH disabled context MAINTAINERS, mailmap: change email for Eugen Hristev mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page kernel/fork: validate exit_signal in kernel_clone() mm: memcontrol: propagate NMI slab stats to memcg vmstats mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one zram: fix use-after-free in zram_writeback_endio memfd: deny writeable mappings when implying SEAL_WRITE ipc: limit next_id allocation to the valid ID range Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" MAINTAINERS: .mailmap: update after GEHC spin-off
2026-05-24	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds	7	-20/+93
	Pull bpf fixes from Alexei Starovoitov: - Fix bpf_throw() and global subprog combination (Kumar Kartikeya Dwivedi) - Fix out of bounds access in BPF interpreter (Yazhou Tang) - Fix potential out of bounds access in inner per-cpu array map (Guannan Wang) - Reject NULL data/sig in bpf_verify_pkcs7_signature (KP Singh) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: libbpf: fix off-by-one in emit_signature_match jump offset bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature selftests/bpf: Cover global subprog exception leaks bpf: Check global subprog exception paths bpf: make bpf_session_is_return() reference optional bpf: Use array_map_meta_equal for percpu array inner map replacement selftests/bpf: Add test for large offset bpf-to-bpf call bpf: Fix s16 truncation for large bpf-to-bpf call offsets bpf: Fix out-of-bounds read in bpf_patch_call_args()
2026-05-22	Merge tag 'sched_ext-for-7.1-rc4-fixes' of ↵	Linus Torvalds	1	-3/+36
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Spurious WARN in ops_dequeue() racing with concurrent dispatch - Self-deadlock between scheduler disable and a concurrent sub-sched enable * tag 'sched_ext-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix spurious WARN on stale ops_state in ops_dequeue() sched_ext: Fix deadlock between scx_root_disable() and concurrent forks
2026-05-22	Merge tag 'cgroup-for-7.1-rc4-fixes' of ↵	Linus Torvalds	1	-14/+23
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two rstat fixes: - Out-of-bounds access in the css_rstat_updated() BPF kfunc when called with an unchecked user-supplied cpu - Over-strict NMI guard after the recent switch to try_cmpxchg left sparc and ppc64 unable to queue rstat updates from NMI" * tag 'cgroup-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: rstat: relax NMI guard after switch to try_cmpxchg cgroup/rstat: validate cpu before css_rstat_cpu() access
2026-05-22	Merge tag 'dma-mapping-7.1-2026-05-22' of ↵	Linus Torvalds	3	-7/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two minor updates for the DMA-mapping code, mainly fixing some rare corner cases (Petr Tesarik, Jianpeng Chang)" * tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: move dma_map_resource() sanity check into debug code dma-direct: fix use of max_pfn
2026-05-22	Merge tag 'trace-v7.1-rc4' of ↵	Linus Torvalds	2	-8/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Avoid NULL return from hist_field_name() The function hist_field_name() is directly passed to a strcat() which does not handle "NULL" characters. Return a zero length string when size is greater than the limit. This is used only to output already created histograms and no field currently is greater than the limit. But it should still not return NULL. - Do not call map->ops->elt_free() on allocation failure When elt_alloc() fails, it should not call the map->ops->elt_free() function if it exists, as that function may not be able to handle the free on allocation failures. The ->elt_free() should only be called when elt_alloc() succeeds. * tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Do not call map->ops->elt_free() if elt_alloc() fails tracing: Avoid NULL return from hist_field_name() on truncation
2026-05-21	kernel/fork: validate exit_signal in kernel_clone()	Deepanshu Kartikey	1	-6/+5
	When a child process exits, it sends exit_signal to its parent via do_notify_parent(). The clone() syscall constructs exit_signal as: (lower_32_bits(clone_flags) & CSIGNAL) CSIGNAL is 0xff, so values in the range 65-255 are possible. However, valid_signal() only accepts signals up to _NSIG (64 on x86_64). A non-zero non-valid exit_signal acts the same as exit_signal == 0: the parent process is not signaled when the child terminates. The syzkaller reproducer triggers this by calling clone() with flags=0x80, resulting in exit_signal = (0x80 & CSIGNAL) = 128, which exceeds _NSIG and is not a valid signal. The v1 of this patch added the check only in the clone() syscall handler, which is incomplete. kernel_clone() has other callers such as sys_ia32_clone() which would remain unprotected. Move the check to kernel_clone() to cover all callers. Since the valid_signal() check is now in kernel_clone() and covers all callers including clone3(), the same check in copy_clone_args_from_user() becomes redundant and is removed. The higher 32bits check for clone3() is kept as it is clone3() specific. Note that this is a user-visible change: previously, passing an invalid exit_signal to clone() was silently accepted. The man page for clone() does not document any defined behavior for invalid exit_signal values, so rejecting them with -EINVAL is the correct behavior. It is unlikely that any sane application relies on passing an invalid exit_signal. [oleg@redhat.com: the comment above kernel_clone() should be updated] Link: https://lore.kernel.org/abwvgU17W8wuW2-J@redhat.com Link: https://lore.kernel.org/20260316151956.563558-1-kartikey406@gmail.com Fixes: 3f2c788a1314 ("fork: prevent accidental access to clone3 features") Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bbe6b99feefc3a0842de Tested-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/20260307064202.353405-1-kartikey406@gmail.com/T/ [v1] Link: https://lore.kernel.org/all/20260316104536.558108-1-kartikey406@gmail.com/T/ [v2] Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ben Segall <bsegall@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-21	Merge tag 'trace-ringbuffer-v7.1-rc4' of ↵	Linus Torvalds	3	-8/+29
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fixes from Steven Rostedt: - Fix reporting MISSED EVENTS in trace iterator When the "trace" file is read with tracing enabled, if the writer were to pass the iterator reader, it resets, sets a "missed_events" flag and continues. The tracing output checks for missed events and if there are some, it prints out "[LOST EVENTS]" to let the user know events were dropped. But the clearing of the missed_events happened when the tracing system queried the ring buffer iterator about missed events. This was premature as the ring buffer is per CPU, and the tracing code reads all the CPU buffers and checks for missed events when it is read. If the CPU iterator that had missed events isn't printed next, the output for the LOST EVENTS is lost. Clear the missed_events flag when the iterator moves to the next event and not when the missed_events flag is queried. Also clear it on reset. - Flush and stop the persistent ring buffer on panic On panic the persistent ring buffer is used to debug what caused the panic. But on some architectures, it requires flushing the memory from cache, otherwise, the ring buffer persistent memory may not have the last events and this could also cause the ring buffer to be corrupted on the next boot. - Fix nr_subbufs initialization in simple_ring_buffer_init_mm The remote simple ring buffer meta data nr_subbufs is initialized too early and gets cleared later on, making it zero and not reflect the actual number of sub-buffers. - Fix unload_page for simple_ring_buffer init rollback On error, the pages loaded need to be unloaded. To unload a page it is expected that: page = load_page(va); -> unload_page(page). But the code was doing: unload_page(va) and not unload_page(page). - Create output file from cmd_check_undefined The check for undefined symbols checks if the file .o.checked exists and if so it skips doing the work. But the .o.checked file never was created making every build do the work even when it was already done previously. * tag 'trace-ringbuffer-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Create output file from cmd_check_undefined tracing: Fix unload_page for simple_ring_buffer init rollback tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm() ring-buffer: Flush and stop persistent ring buffer on panic ring-buffer: Fix reporting of missed events in iterator
2026-05-21	sched_ext: Fix spurious WARN on stale ops_state in ops_dequeue()	Samuele Mariotti	1	-2/+15
	ops_dequeue() can race with finish_dispatch() and spuriously trigger the "queued task must be in BPF scheduler's custody" warning. ops_dequeue() snapshots p->scx.ops_state via atomic_long_read_acquire() and then, in the SCX_OPSS_QUEUED arm, asserts that SCX_TASK_IN_CUSTODY is set. The two reads are not atomic w.r.t. a concurrent finish_dispatch() running on another CPU: CPU 1 CPU 2 ===== ===== dequeue_task_scx() ops_dequeue() opss = read_acquire(ops_state) = SCX_OPSS_QUEUED finish_dispatch() cmpxchg ops_state: SCX_OPSS_QUEUED -> SCX_OPSS_DISPATCHING [succeeds] dispatch_enqueue(SCX_DSQ_GLOBAL, SCX_ENQ_CLEAR_OPSS) call_task_dequeue() p->scx.flags &= ~SCX_TASK_IN_CUSTODY WARN_ON_ONCE(!(p->scx.flags & SCX_TASK_IN_CUSTODY)) /* opss is stale: QUEUED, * but task already claimed */ set_release(ops_state, SCX_OPSS_NONE) The race has been observed via two distinct call chains: the most common goes through sched_setaffinity(), a rarer variant through sched_change_begin(). For SCX_DSQ_GLOBAL / SCX_DSQ_BYPASS, dispatch_enqueue() clears SCX_TASK_IN_CUSTODY before clearing ops_state to SCX_OPSS_NONE (intentional, to avoid concurrent non-atomic RMW of p->scx.flags against ops_dequeue()). The window between those two writes is exactly what ops_dequeue() observes as "QUEUED without custody". The observed state is not actually inconsistent, it just means CPU 1 has already claimed the task and the QUEUED value held by CPU 2 is stale. Re-read ops_state in that case; the next read is guaranteed to return SCX_OPSS_DISPATCHING or SCX_OPSS_NONE, both of which exit the switch cleanly. The retry is bounded: once IN_CUSTODY is cleared, ops_state has already advanced past QUEUED for this dispatch cycle, and a fresh QUEUED would require re-enqueue under p's rq lock, which CPU 2 holds. Changes in v2: - Use READ_ONCE() for p->scx.flags to ensure fresh reads and prevent compiler reordering in the lockless path - Add cpu_relax() to reduce power consumption and improve performance during the spin-wait - Use unlikely() to optimize branch prediction for the common case - Expand the in-code comment to document the race condition and bounded retry guarantee Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics") Suggested-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Samuele Mariotti <smariotti@disroot.org> Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-21	tracing: Do not call map->ops->elt_free() if elt_alloc() fails	Masami Hiramatsu (Google)	1	-4/+13
	In paths where tracing_map_elt_alloc() failed to allocate objects, the map->ops->elt_alloc() call was never successful. In this case, map->ops->elt_free() should not be called. Link: https://sashiko.dev/#/patchset/20260520223101.34710-1-rosenp%40gmail.com Cc: stable@vger.kernel.org Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Rosen Penev <rosenp@gmail.com> Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: 2734b629525a ("tracing: Add per-element variable support to tracing_map") Link: https://patch.msgid.link/177933895460.108746.5396070821443932634.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	tracing: Create output file from cmd_check_undefined	Thomas Weißschuh	1	-1/+2
	As the output file is currently never created, the check will run every time, even if the inputs have not changed. Create an empty output file which allows make to skip the execution when it is not necessary. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260520-tracing-ringbuffer-check-v1-1-d979cfab1338@weissschuh.net Fixes: 1211907ac0b5 ("tracing: Generate undef symbols allowlist for simple_ring_buffer") Fixes: 58b4bd18390e ("tracing: Adjust cmd_check_undefined to show unexpected undefined symbols") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	tracing: Fix unload_page for simple_ring_buffer init rollback	Vincent Donnefort	1	-1/+1
	The unload_page callback expects the return value of load_page() as its argument: ret = load_page(va); unload(ret). Fix the rollback code in simple_ring_buffer_init_mm() where the descriptor's VA is used instead of the loaded page address. Link: https://patch.msgid.link/20260512141614.1759430-1-vdonnefort@google.com Fixes: 635923081c79 ("tracing: load/unload page callbacks for simple_ring_buffer") Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm()	David Carlier	1	-1/+1
	nr_subbufs in the ring buffer metadata is always initialized to zero because it is assigned from cpu_buffer->nr_pages before the page initialization loop has run. While nr_subbufs is not currently read by the kernel, it should reflect the actual buffer geometry in the meta page for correctness. Move the assignment after the page loop so that cpu_buffer->nr_pages holds the final count. Link: https://patch.msgid.link/20260512135420.99194-1-devnexen@gmail.com Fixes: 34e5b958bdad ("tracing: Introduce simple_ring_buffer") Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	ring-buffer: Flush and stop persistent ring buffer on panic	Masami Hiramatsu (Google)	1	-0/+22
	On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	ring-buffer: Fix reporting of missed events in iterator	Steven Rostedt	1	-5/+3
	When tracing is active while reading the trace file, if the iterator reading the buffer detects that the writer has passed the iterator head, it will reset and set a "missed events" flag. This flag is passed to the output processing to show the user that events were missed: CPU:4 [LOST EVENTS] The problem is that the flag is reset after it is checked in ring_buffer_iter_dropped(). But the "trace" file iterates over all the CPU ring buffers and it will check if they are dropped when figuring out which buffer to print next. This prematurely clears the missed_events flag if the CPU buffer with the missed events is not the one that is printed next. On the iteration where the CPU buffer with the missed events is printed, the check if it had missed events would return false and the output does not show that events were missed. Do not reset the missed_events flag when checking if there were missed events, but instead clear it when moving the iterator head to the next event. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260520220801.4fd09d13@fedora Fixes: c9b7a4a72ff64 ("ring-buffer/tracing: Have iterator acknowledge dropped events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20	tracing: Avoid NULL return from hist_field_name() on truncation	David Carlier	1	-4/+2
	hist_field_name() returns "" everywhere except the fully-qualified VAR_REF/EXPR case, where snprintf() truncation returns NULL early and bypasses the bottom NULL->"" guard. Callers don't expect NULL: strcat(expr, hist_field_name(field, 0)) at trace_events_hist.c:1758 and the strcmp() in the sort-key match loop at :4804 both deref it. system and event_name are bounded by MAX_EVENT_NAME_LEN, but the field name on a VAR_REF is kstrdup'd from a histogram variable name parsed out of the trigger string and has no length cap, so a long enough var name in a fully qualified reference can reach the truncation path. Keep the length check but leave field_name as "" on overflow. Link: https://patch.msgid.link/20260508195747.25492-1-devnexen@gmail.com Fixes: 5ec1d1e97de1 ("tracing: Rebuild full_name on each hist_field_name() call") Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-20	cgroup: rstat: relax NMI guard after switch to try_cmpxchg	Cunlong Li	1	-4/+3
	Commit 36df6e3dbd7e ("cgroup: make css_rstat_updated nmi safe") used this_cpu_cmpxchg() for the lockless insertion, and therefore required both ARCH_HAVE_NMI_SAFE_CMPXCHG and ARCH_HAS_NMI_SAFE_THIS_CPU_OPS in the NMI guard: on archs without the latter, this_cpu_cmpxchg() falls back to "local_irq_save() + plain cmpxchg", and local_irq_save() cannot mask NMIs. Commit 3309b63a2281 ("cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated") later replaced this_cpu_cmpxchg() with plain try_cmpxchg() to fix cross-CPU lockless-list corruption, but left the NMI guard untouched. After that switch, css_rstat_updated() no longer performs any this_cpu_*() RMW operations and only relies on the arch having NMI-safe cmpxchg, so ARCH_HAS_NMI_SAFE_THIS_CPU_OPS is no longer required in the guard. Relax the guard accordingly so that archs which have HAVE_NMI and ARCH_HAVE_NMI_SAFE_CMPXCHG but not ARCH_HAS_NMI_SAFE_THIS_CPU_OPS (e.g. sparc, powerpc on PPC64/BOOK3S) can benefit from the existing CONFIG_MEMCG_NMI_SAFETY_REQUIRES_ATOMIC path. Without this, the css is never queued in NMI on those archs, and the atomics staged by account_{slab,kmem}_nmi_safe() are not drained by flush_nmi_stats(). Fixes: 3309b63a2281 ("cgroup: rstat: use LOCK CMPXCHG in css_rstat_updated") Signed-off-by: Cunlong Li <shenxiaogll@gmail.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-20	Merge tag 'rcu-fixes.v7.1-20260519a' of ↵	Linus Torvalds	1	-6/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU fixes from Boqun Feng: "Fix a regression introduced by commit 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible"). SRCU may queue works on CPUs that are "possible" but never have been online. In such a case, the work callbacks may not be executed until the corresponding CPU gets online, and as the callbacks accumulates, workqueue lockups will fire. Fix this by avoiding queuing works on CPUs that have never been online" * tag 'rcu-fixes.v7.1-20260519a' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: srcu: Don't queue workqueue handlers to never-online CPUs
2026-05-20	bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature	KP Singh	1	-0/+5
	__bpf_dynptr_data() can return NULL (FILE dynptrs, any non-contiguous backing). bpf_verify_pkcs7_signature() forwards the pointer to verify_pkcs7_signature() unchecked, causing a NULL deref in asn1_ber_decoder() reachable from a sleepable BPF LSM at lsm.s/bpf. NULL-check both pointers and reject with -EINVAL. Mirrors the guards already in kernel/bpf/crypto.c. Fixes: 865b0566d8f1 ("bpf: Add bpf_verify_pkcs7_signature() kfunc") Reported-by: Xianrui Dong <dongxianrui1@gmail.com> Signed-off-by: KP Singh <kpsingh@kernel.org> Reviewed-by: Amery Hung <ameryhung@gmail.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20260520024059.313468-1-kpsingh@kernel.org Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
2026-05-18	cgroup/rstat: validate cpu before css_rstat_cpu() access	Qing Ming	1	-10/+20
	css_rstat_updated() is exposed as a BPF kfunc and accepts a caller-provided cpu argument. The function uses cpu for per-cpu rstat lookups without checking whether it refers to a valid possible CPU. A BPF iter/cgroup program with CAP_BPF and CAP_PERFMON can pass an invalid cpu value. On an unfixed UBSCAN_BOUNDS test kernel, cpu == 0x7fffffff triggers: UBSAN: array-index-out-of-bounds in kernel/cgroup/rstat.c:31:9 index 2147483647 is out of range for type 'long unsigned int [64]' Call Trace: css_rstat_updated bpf_iter_run_prog cgroup_iter_seq_show bpf_seq_read Add cpu validation to the BPF-facing css_rstat_updated() kfunc and move the common implementation to __css_rstat_updated() for in-kernel callers. Fixes: a319185be9f5 ("cgroup: bpf: enable bpf programs to integrate with rstat") Signed-off-by: Qing Ming <a0yami@mailbox.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-18	srcu: Don't queue workqueue handlers to never-online CPUs	Paul E. McKenney	1	-6/+6
	While an srcu_struct structure is in the midst of switching from CPU-0 to all-CPUs state, it can attempt to invoke callbacks for CPUs that have never been online. Worse yet, it can attempt in invoke callbacks for CPUs that never will be online, even including imaginary CPUs not in cpu_possible_mask. This can cause hangs on s390, which is not set up to deal with workqueue handlers being scheduled on such CPUs. This commit therefore causes Tree SRCU to refrain from queueing workqueue handlers on CPUs that have not yet (and might never) come online. Because callbacks are not invoked on CPUs that have not been online, it is an error to invoke call_srcu(), synchronize_srcu(), or synchronize_srcu_expedited() on a CPU that is not yet fully online. However, it turns out to be less code to redirect the callbacks from too-early invocations of call_srcu() than to warn about such invocations. This commit therefore also redirects callbacks queued on not-yet-fully-online CPUs to the boot CPU. Reported-by: Vasily Gorbik <gor@linux.ibm.com> Fixes: 61bbcfb50514 ("srcu: Push srcu_node allocation to GP when non-preemptible") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: Vasily Gorbik <gor@linux.ibm.com> Tested-by: Samir <samir@linux.ibm.com> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Boqun Feng <boqun@kernel.org>
2026-05-18	dma-mapping: move dma_map_resource() sanity check into debug code	Jianpeng Chang	2	-5/+8
	dma_map_resource() uses pfn_valid() to ensure the range is not RAM. However, pfn_valid() only checks for availability of the memory map for a PFN but it does not ensure that the PFN is actually backed by RAM. On ARM64 with SPARSEMEM (128MB section granularity), MMIO addresses that share a section with RAM will falsely trigger the WARN_ON_ONCE and cause dma_map_resource() to return DMA_MAPPING_ERROR. This causes a WARNING on Raspberry Pi 4 during spi_bcm2835 probe because the SPI FIFO register (0xfe204004) falls in the same sparsemem section as the end of RAM (0xf8000000-0xfbffffff), both in section 31 (0xf8000000-0xffffffff). Move the sanity check from dma_map_resource() into debug_dma_map_phys() and replace the unreliable pfn_valid() with pfn_valid() && !PageReserved(), which correctly identifies actual usable RAM without false positives for MMIO regions that happen to have struct pages. Since dma_map_resource() is dma_map_phys(DMA_ATTR_MMIO), the check applies equally to both APIs. Any non-reserved page represents kernel memory to a sufficient degree that using DMA_ATTR_MMIO on it is almost certainly wrong and risks breaking coherency on non-coherent platforms. ZONE_DEVICE pages used for PCI P2P DMA (MEMORY_DEVICE_PCI_P2PDMA) have PageReserved set, so they will not trigger a false positive. The check no longer blocks the mapping and uses err_printk() to integrate with dma-debug filtering. Fixes: f7326196a781 ("dma-mapping: export new dma_*map_phys() interface") Reviewed-by: Robin Murphy <robin.murphy@arm.com> Signed-off-by: Jianpeng Chang <jianpeng.chang.cn@windriver.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260513072209.1486986-1-jianpeng.chang.cn@windriver.com
2026-05-17	sched_ext: Fix deadlock between scx_root_disable() and concurrent forks	Tejun Heo	1	-1/+21
	scx_root_disable() enters SCX_DISABLING before it grabs scx_enable_mutex to clear __scx_switched_all and scx_switching_all. task_should_scx() short-circuits on DISABLING, so forks in that window land on fair while next_active_class() still skips fair - the new tasks stall. This can deadlock the disable path itself: scx_alloc_and_add_sched() runs under scx_enable_mutex and creates a helper kthread; if that new kthread is one of the stalled fair tasks, the mutex holder waits forever and scx_root_disable() can never make progress. Only sub-sched support exposes this, since sub-sched enables are the only path where scx_alloc_and_add_sched() can race the root's disable. Move the DISABLING check after @scx_switching_all. @scx_switching_all serves as a proxy for __scx_switched_all, so while it's set, forks keep going to scx. Once cleared, DISABLING applies normally. v2: Reword in-source comment and description. (Andrea) Fixes: 337ec00b1d9c ("sched_ext: Implement cgroup sub-sched enabling and disabling") Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
2026-05-17	Merge tag 'trace-v7.1-rc3' of ↵	Linus Torvalds	2	-4/+4
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Add more functions to the remote allowed list randconfig found more functions that are allowed for the remote code for s390 and arm. Add them to the allowed list. - Fix remote_test error path If one of the simple ring buffers fails to load, the code is supposed to rollback its initialized buffers. Instead of rolling back the buffers for the failed load, it uses the global variable and rolls back all the successfully loaded buffers. * tag 'trace-v7.1-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix desc in error path for the trace remote test module ring-buffer remote: Avoid unexpected symbol warnings (arm, s390)
2026-05-17	bpf: Check global subprog exception paths	Kumar Kartikeya Dwivedi	2	-7/+29
	Global subprogs are verified independently and are not descended into when their callers are symbolically executed. This means a caller can hold references or locks across a global subprog call that may throw, while the verifier only checks the non-exceptional return path at the call site. Record whether a subprog might throw in the CFG summary pass, alongside the existing might_sleep and packet-data-changing summaries, and propagate that effect through reachable callees. When a global subprog is marked as possibly throwing, push the normal continuation and validate the exceptional path immediately at the call site, avoiding a synthetic exception state and associated special case in the pruning checks. Fixes: f18b03fabaa9 ("bpf: Implement BPF exceptions") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20260517075530.3461166-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-17	Merge tag 'irq-urgent-2026-05-17' of ↵	Linus Torvalds	2	-2/+14
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull IRQ fixes from Ingo Molnar: - Fix use-after-free in irq_work_single() on PREEMPT_RT (Jiayuan Chen) - Don't call add_interrupt_randomness() for NMIs in handle_percpu_devid_irq() (Mark Rutland) - Remove unused function in the ath79-cpu irqchip driver causing LKP CI build warnings (Rosen Penev) - Fix IRQ allocation/teardown leakage regressions in the GICv5 irqchip driver (Sascha Bischoff) - Fix an IRQ trigger type regression in the Meson S4 SoC irqchip driver (Xianwei Zhao) - Fix CPU offlining regression in the RiscV IMSIC irqchip driver (Yong-Xuan Wang) * tag 'irq-urgent-2026-05-17' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: irq_work: Fix use-after-free in irq_work_single() on PREEMPT_RT irqchip/riscv-imsic: Clear interrupt move state during CPU offlining irqchip/meson-gpio: Use the correct register in meson_s4_gpio_irq_set_type() irqchip/ath79-cpu: Remove unused function genirq/chip: Don't call add_interrupt_randomness() for NMIs irqchip/gic-v5: Allocate ITS parent LPIs as a range irqchip/gic-v5: Support range allocation for LPIs irqchip/gic-v5: Move LPI allocation into the LPI domain
2026-05-16	tracing: Fix desc in error path for the trace remote test module	Vincent Donnefort	1	-2/+2
	During initialisation in remote_test_load(), if one of the simple_ring_buffer fails to initialise, the error path attempts to rollback initialised buffers. However, the rollback incorrectly uses the global pointer to the trace descriptor, which is only set upon successful load completion. Fix the error path by using the local pointer to the descriptor. Link: https://patch.msgid.link/20260515201616.337469-1-vdonnefort@google.com Fixes: ea908a2b79c8 ("tracing: Add a trace remote module for testing") Reported-by: Sashiko <sashiko-bot@kernel.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> base-commit: 5d6919055dec134de3c40167a490f33c74c12581 Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-15	ring-buffer remote: Avoid unexpected symbol warnings (arm, s390)	Arnd Bergmann	1	-2/+2
	The now more verbose check found more architecture specific symbol missing from the whitelist, during randconfig testing on s390 and 32-bit arm: Unexpected symbols in kernel/trace/simple_ring_buffer.o: U __aeabi_unwind_cpp_pr1 Unexpected symbols in kernel/trace/simple_ring_buffer.o: U __s390_indirect_jump_r1 U __s390_indirect_jump_r10 U __s390_indirect_jump_r14 U __s390_indirect_jump_r2 U __s390_indirect_jump_r5 U __s390_indirect_jump_r7 U __s390_indirect_jump_r8 U __s390_indirect_jump_r9 make[6]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:160: kernel/trace/simple_ring_buffer.o.checked] Error 1 Add these to the list and keep it roughly sorted into sanitizer and architecture symbols. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Marc Zyngier <maz@kernel.org> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Paolo Bonzini <pbonzini@redhat.com> Link: https://patch.msgid.link/20260515105717.1023007-1-arnd@kernel.org Fixes: 1211907ac0b5 ("tracing: Generate undef symbols allowlist for simple_ring_buffer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-15	bpf: make bpf_session_is_return() reference optional	Arnd Bergmann	1	-0/+4
	Building without CONFIG_BPF_EVENTS produces a build-time warning: WARN: resolve_btfids: unresolved symbol bpf_session_is_return The function is actually defined in kernel/trace/bpf_trace.o, which is built conditionally based on configuration. Make the reference to this function conditional as well, as is already done in the bpf verifier for other functions. Fixes: 8fe4dc4f6456 ("bpf: change prototype of bpf_session_{cookie,is_return}") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20260515113242.2706303-1-arnd@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-14	Merge tag 'audit-pr-20260513' of ↵	Linus Torvalds	2	-1/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit Pull audit fixes from Paul Moore: - Correctly log the inheritable capabilities - Honor AUDIT_LOCKED in the AUDIT_TRIM and AUDIT_MAKE_EQUIV commands * tag 'audit-pr-20260513' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/audit: audit: enforce AUDIT_LOCKED for AUDIT_TRIM and AUDIT_MAKE_EQUIV audit: fix incorrect inheritable capability in CAPSET records
2026-05-14	ptrace: slightly saner 'get_dumpable()' logic	Linus Torvalds	2	-6/+17
	The 'dumpability' of a task is fundamentally about the memory image of the task - the concept comes from whether it can core dump or not - and makes no sense when you don't have an associated mm. And almost all users do in fact use it only for the case where the task has a mm pointer. But we have one odd special case: ptrace_may_access() uses 'dumpable' to check various other things entirely independently of the MM (typically explicitly using flags like PTRACE_MODE_READ_FSCREDS). Including for threads that no longer have a VM (and maybe never did, like most kernel threads). It's not what this flag was designed for, but it is what it is. The ptrace code does check that the uid/gid matches, so you do have to be uid-0 to see kernel thread details, but this means that the traditional "drop capabilities" model doesn't make any difference for this all. Make it all make a bit more sense by saying that if you don't have a MM pointer, we'll use a cached "last dumpability" flag if the thread ever had a MM (it will be zero for kernel threads since it is never set), and require a proper CAP_SYS_PTRACE capability to override. Reported-by: Qualys Security Advisory <qsa@qualys.com> Cc: Oleg Nesterov <oleg@redhat.com> Cc: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-05-14	bpf: Use array_map_meta_equal for percpu array inner map replacement	Guannan Wang	1	-1/+1
	percpu_array_map_ops.map_meta_equal points to the generic bpf_map_meta_equal(), which does not compare max_entries. When a percpu array serves as an inner map, replacing it with one that has fewer max_entries bypasses the check. Since percpu_array_map_gen_lookup() inlines the original template's index_mask as a JIT immediate, a lookup on the replacement map can access pptrs[] out of bounds. Point percpu_array_map_ops.map_meta_equal to array_map_meta_equal(), which already enforces the max_entries equality check. Add a selftest to verify that replacing a percpu array inner map with a differently-sized one is rejected. Fixes: db69718b8efa ("bpf: inline bpf_map_lookup_elem() for PERCPU_ARRAY maps") Signed-off-by: Guannan Wang <wgnbuaa@gmail.com> Acked-by: Mykyta Yatsenko <yatsenko@meta.com> Link: https://lore.kernel.org/r