linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
5 days	Merge tag 'dma-mapping-7.1-2026-06-11' of ↵	Linus Torvalds	2	-2/+2
	git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fix from Marek Szyprowski: "Three more fixes for the DMA-mapping code, related to PCI P2PDMA, DMA debug and DMA link ranges API (Li RongQing and Jason Gunthorpe)" * tag 'dma-mapping-7.1-2026-06-11' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: iommu/dma: Do not try to iommu_map a 0 length region in swiotlb dma-debug: fix physical address retrieval in debug_dma_sync_sg_for_device dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments
7 days	Merge tag 'trace-rv-v7.1-rc6-2' of ↵	Linus Torvalds	4	-14/+9
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull runtime verifier fixes from Steven Rostedt: - Fix reset ordering on per-task destruction Reset the task before dropping the slot instead of after, which was causing out-of-bound memory accesses. - Fix HA monitor synchronization and cleanup Ensure synchronous cleanup for HA monitors by running timer callbacks in RCU read-side critical sections and using synchronize_rcu() during destruction. - Avoid armed timers after tasks exit Add automatic cleanup for per-task HA monitors to prevent timers from firing after task exit. - Fix memory ordering for DA/HA monitors Fix race conditions during monitor start by using release-acquire semantics for the monitoring flag. - Fix initialization for DA/HA monitors Ensure monitors are not initialized relying on potentially corrupted state like the monitoring flag, that is not reset by all monitors type and may have an unknown state in monitors reusing the storage (per-task). - Fix memory safety in per-task and per-object monitors Prevent use-after-free and out-of-bounds access by synchronizing with in-flight tracepoint probes using tracepoint_synchronize_unregister() before freeing monitor storage or releasing task slots. - Adjust monitors for preemptible tracepoints Fix monitors that relied on tracepoints disabling preemption. Explicitly disable task migration when per-CPU monitors handle events to avoid accessing the wrong state and update the opid monitor logic. - Fix incorrect __user specifier usage Remove __user from a non-pointer variable in the extract_params() helper. - Fix bugs in the rv tool Ensure strings are NUL-terminated, fix substring matching in monitor searches, and improve cleanup and exit status handling. - Fix several bugs in rvgen Fix LTL literal stringification, subparsers' options handling, and suffix stripping in dot2k. * tag 'trace-rv-v7.1-rc6-2' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: verification/rvgen: Fix ltl2k writing True as a literal verification/rvgen: Fix options shared among commands verification/rvgen: Fix suffix strip in dot2k tools/rv: Fix cleanup after failed trace setup tools/rv: Fix substring match when listing container monitors tools/rv: Fix substring match bug in monitor name search tools/rv: Ensure monitor name and desc are NUL-terminated rv: Use 0 to check preemption enabled in opid rv: Prevent task migration while handling per-CPU events rv: Ensure synchronous cleanup for HA monitors rv: Add automatic cleanup handlers for per-task HA monitors rv: Do not rely on clean monitor when initialising HA rv: Fix monitor start ordering and memory ordering for monitoring flag rv: Ensure all pending probes terminate on per-obj monitor destroy rv: Prevent in-flight per-task handlers from using invalid slots rv: Reset per-task DA monitors before releasing the slot rv: Fix __user specifier usage in extract_params()
9 days	Merge tag 'timers-urgent-2026-06-07' of ↵	Linus Torvalds	3	-4/+8
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: - Fix the arch_inlined_clockevent_set_next_coupled() prototype in the !CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST case (Naveen Kumar Chaudhary) - Fix an off-by-1 bug in the sys_settimeofday() usecs validation code (Naveen Kumar Chaudhary) - Mark vdso_k__data pointers as __ro_after_init (Thomas Weißschuh) - Fix livelock race in tmigr_handle_remote_up() (Amit Matityahu) tag 'timers-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: timers/migration: Fix livelock in tmigr_handle_remote_up() vdso/datastore: Mark vdso_k_*_data pointers as __ro_after_init time: Fix off-by-one in settimeofday() usec validation clockevents: Fix duplicate type specifier in stub function parameter
9 days	Merge tag 'locking-urgent-2026-06-07' of ↵	Linus Torvalds	3	-1/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull locking fixes from Ingo Molnar: - Fix a NULL pointer dereference bug in the FUTEX_CMP_REQUEUE_PI code (Ji'an Zhou) - Fix a NULL pointer dereference bug in the rtmutex code (Davidlohr Bueso) * tag 'locking-urgent-2026-06-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: locking/rtmutex: Skip remove_waiter() when waiter is not enqueued futex/requeue: Prevent NULL pointer dereference in remove_waiter() on self-deadlock
10 days	Merge tag 'vfs-7.1-rc7.fixes' of ↵	Linus Torvalds	2	-3/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix error handling in ovl_cache_get() - Tighten access checks for exited tasks in pidfd_getfd() - Fix selftests leak in __wait_for_test() - Limit FUSE_NOTIFY_RETRIEVE to uptodate folios - Reject fuse_notify() pagecache ops on directories - Clear JOBCTL_PENDING_MASK for caller in zap_other_threads() - Fix failure to unlock in nfsd4_create_file() - Fix pointer arithmetic in qnx6 directory iteration - Fix UAF due to unlocked ->mnt_ns read in may_decode_fh() - Avoid potential null folio->mapping deref during iomap error reporting * tag 'vfs-7.1-rc7.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: iomap: avoid potential null folio->mapping deref during error reporting fhandle: fix UAF due to unlocked ->mnt_ns read in may_decode_fh() fs/qnx6: fix pointer arithmetic in directory iteration VFS: fix possible failure to unlock in nfsd4_create_file() signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads() fuse: reject fuse_notify() pagecache ops on directories fuse: limit FUSE_NOTIFY_RETRIEVE to uptodate folios selftests: harness: fix pidfd leak in __wait_for_test pidfd: refuse access to tasks that have started exiting harder ovl: keep err zero after successful ovl_cache_get()
11 days	Merge tag 'probes-fixes-v7.1-rc6' of ↵	Linus Torvalds	1	-2/+0
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing/probes fix from Masami Hiramatsu: "Fix the eprobe event parser to point error position correctly" * tag 'probes-fixes-v7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Point the error offset correctly for eprobe argument error
12 days	timers/migration: Fix livelock in tmigr_handle_remote_up()	Amit Matityahu	1	-2/+6
	tmigr_handle_remote_cpu() skips timer_expire_remote() when cpu == smp_processor_id(), assuming the local softirq path already handled this CPU's timers. This assumption is wrong because jiffies can advance after the handling of the CPU's global timers in run_timer_base(BASE_GLOBAL) and before tmigr_handle_remote() evaluates the expiry times. As a consequence a timer which expires after the CPU local timer wheel advanced and becomes expired in the remote handling is ignored and the callback is never invoked and removed from the timer wheel. What's worse is that fetch_next_timer_interrupt_remote() keeps reporting it as expired, and the event is re-queued with expires == now on each iteration. The goto-again loop spins indefinitely. Fix this by calling timer_expire_remote() unconditionally. That's minimal overhead for the common case as __run_timer_base() returns immediately if there is nothing to expire in the local wheel. [ tglx: Amend change log and add a comment ] Fixes: 7ee988770326 ("timers: Implement the hierarchical pull model") Reported-by: Alon Kariv <alonka@amazon.com> Signed-off-by: Amit Matityahu <amitmat@amazon.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260603170139.33628-1-amitmat@amazon.com
13 days	locking/rtmutex: Skip remove_waiter() when waiter is not enqueued	Davidlohr Bueso	2	-1/+4
	syzbot triggered the following splat in remove_waiter() via FUTEX_CMP_REQUEUE_PI: KASAN: null-ptr-deref in range [0x0000000000000a88-0x0000000000000a8f] class_raw_spinlock_constructor remove_waiter+0x159/0x1200 kernel/locking/rtmutex.c:1561 rt_mutex_start_proxy_lock+0x103/0x120 futex_requeue+0x10e4/0x20d0 __x64_sys_futex+0x34f/0x4d0 task_blocks_on_rt_mutex() does not arm the waiter upon deadlock detection, leaving waiter->task nil, where 3bfdc63936dd ("rtmutex: Use waiter::task instead of current in remove_waiter()") made this fatal. Furthermore, rt_mutex_start_proxy_lock() should not be calling into remove_waiter() upon a successfully grabbing the rtmutex. 1a1fb985f2e2 ("futex: Handle early deadlock return correctly"), moved the remove_waiter() out of __rt_mutex_start_proxy_lock() (where 'ret' was only ever 0 or < 0) into the wrapper. Tighten this check to account for try_to_take_rt_mutex(). Fixes: 3bfdc63936dd ("rtmutex: Use waiter::task instead of current in remove_waiter()") Reported-by: syzbot+78147abe6c524f183ee9@syzkaller.appspotmail.com Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Closes: https://lore.kernel.org/all/69f114ac.050a0220.ac8b.0003.GAE@google.com/ Link: https://patch.msgid.link/20260507112913.1019537-1-dave@stgolabs.net
13 days	Merge tag 'cgroup-for-7.1-rc6-fixes' of ↵	Linus Torvalds	1	-6/+7
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "One cpuset fix and a maintenance update, both low-risk: - Fix cpuset partition CPU accounting under sibling CPU exclusion that could produce wrong CPU assignments and trigger scheduling-domain warnings. Includes selftests. - Update an email address in MAINTAINERS" * tag 'cgroup-for-7.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup/cpuset: Change Ridong's email cgroup/cpuset: Add test cases for sibling CPU exclusion on partition update cgroup/cpuset: Use effective_xcpus in partcmd_update add/del mask calculation
13 days	Merge tag 'sched_ext-for-7.1-rc6-fixes' of ↵	Linus Torvalds	1	-4/+6
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: "Two low-risk fixes: - Drop a spurious warning that can fire during cgroup migration while a sched_ext scheduler is loaded - Fix a drgn-based debug script that broke after scheduler state moved into a per-scheduler struct" * tag 'sched_ext-for-7.1-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task() tools/sched_ext: Fix scx_show_state per-scheduler state reads
13 days	dma-debug: fix physical address retrieval in debug_dma_sync_sg_for_device	Li RongQing	1	-1/+1
	In debug_dma_sync_sg_for_device(), when iterating over a scatterlist, the debug entry population mistakenly uses the head of the scatterlist 'sg' to fetch the physical address via sg_phys(), instead of using the current iterator variable 's'. This causes dma-debug to track the physical address of the very first scatterlist entry for all subsequent entries in the list. Fix this by passing the correct loop iterator 's' to sg_phys() Fixes: 9d4f645a1fd49ee ("dma-debug: store a phys_addr_t in struct dma_debug_entry") Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260603123708.1665-1-lirongqing@baidu.com
13 days	rv: Use 0 to check preemption enabled in opid	Gabriele Monaco	1	-7/+1
	Tracepoint handlers no longer run with preemption disabled by default since a46023d5616 ("tracing: Guard __DECLARE_TRACE() use of __DO_TRACE_CALL() with SRCU-fast"), the opid monitor should now count 1 in the preemption count as preemption disabled. Change the rule for preempt_off to preempt > 0. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260601153840.124372-11-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
13 days	rv: Add automatic cleanup handlers for per-task HA monitors	Gabriele Monaco	3	-6/+6
	Hybrid automata monitors may start timers, depending on the model, these may remain active on an exiting task and cause false positives or even access freed memory. Add an enable/disable hook in the HA code, currently only populated by the per-task handler for registration and deregistration. This hooks to the sched_process_exit event and ensures the timer is stopped for every exiting task. The handler is enabled automatically but may be disabled, for instance if the monitor uses the event for another purpose (but should still manually ensure timers are stopped). Fixes: f5587d1b6ec9 ("rv: Add Hybrid Automata monitor type") Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260601153840.124372-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
13 days	rv: Fix __user specifier usage in extract_params()	Gabriele Monaco	1	-1/+2
	The attributes variables extracted from syscalls in the helper are both defined with the __user specifier although only the actual pointer to user data should be marked. Remove the __user specifier from attr. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604150820.Ny143u6X-lkp@intel.com Fixes: b133207deb72 ("rv: Add nomiss deadline monitor") Reviewed-by: Wen Yang <wen.yang@linux.dev> Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260601153840.124372-2-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
13 days	dma-mapping: direct: fix missing mapping for THRU_HOST_BRIDGE segments	Li RongQing	1	-1/+1
	In dma_direct_map_sg(), the case PCI_P2PDMA_MAP_THRU_HOST_BRIDGE incorrectly used 'break' instead of falling through to MAP_NONE. As a result, segments traversing the host bridge skipped the required dma_direct_map_phys() call entirely, leaving sg->dma_address uninitialized and leading to DMA failures. Fix this by using 'fallthrough;'. Fixes: a25e7962db0d79 ("PCI/P2PDMA: Refactor the p2pdma mapping helpers") Reviewed-by: Logan Gunthorpe <logang@deltatee.com> Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260603013723.2439-1-lirongqing@baidu.com
14 days	sched_ext: Don't warn on NULL cgrp_moving_from in scx_cgroup_move_task()	Tejun Heo	1	-4/+6
	A WARN fires when systemd's user manager writes "+cpu +memory +pids" to its own subtree_control while a sched_ext scheduler is loaded: WARNING: at kernel/sched/ext.c:3227 scx_cgroup_move_task+0xa8/0xb0 scx_cgroup_move_task+0xa8/0xb0 sched_move_task+0x134/0x290 cpu_cgroup_attach+0x39/0x70 cgroup_migrate_execute+0x37d/0x450 cgroup_update_dfl_csses+0x1e3/0x270 cgroup_subtree_control_write+0x3e7/0x440 scx_cgroup_can_attach() arms cgrp_moving_from only when a task's cpu cgroup changes. It can still be NULL when scx_cgroup_move_task() runs, through this sequence: Step Result --------------------------------- ---------------------------------- 1. cpu enabled on cgroup G cpu css = A 2. cpu toggled off then on for G A killed, B created (same cgroup) 3. an exiting task keeps A alive migration skips it, A now stale 4. +memory migrates G stale A vs current B pulls cpu in 5. cpu attach runs for all tasks hits a live, cpu-unchanged task 6. scx_cgroup_move_task() on it cgrp_moving_from NULL -> WARN The mismatch is that scx_cgroup_can_attach() keys on cgroup identity while migration drives the move on css identity, so a NULL cgrp_moving_from here is a legitimate css-only migration, not a missing prep. The call is already gated on cgrp_moving_from, so just drop the warning. ops.cgroup_prep_move() and ops.cgroup_move() stay paired. Fixes: 819513666966 ("sched_ext: Add cgroup support") Cc: stable@vger.kernel.org # v6.12+ Reported-by: Matt Fleming <mfleming@cloudflare.com> Closes: https://lore.kernel.org/all/20260601124156.2205704-1-mfleming@cloudflare.com/ Signed-off-by: Tejun Heo <tj@kernel.org> Reviewed-by: Andrea Righi <arighi@nvidia.com>
14 days	futex/requeue: Prevent NULL pointer dereference in remove_waiter() on ↵	Ji'an Zhou	1	-0/+6
	self-deadlock When FUTEX_CMP_REQUEUE_PI requeues a non-top waiter that already owns the target PI futex, task_blocks_on_rt_mutex() returns -EDEADLK before setting waiter->task. The subsequent remove_waiter() in rt_mutex_start_proxy_lock() dereferences the NULL waiter->task, causing a kernel crash. Add a self-deadlock check for non-top waiters before calling rt_mutex_start_proxy_lock(), analogous to the top-waiter check in futex_lock_pi_atomic(). Fixes: 3bfdc63936dd4773109b7b8c280c0f3b5ae7d349 ("rtmutex: Use waiter::task instead of current in remove_waiter()") Signed-off-by: Ji'an Zhou <eilaimemedsnaimel@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org
14 days	time: Fix off-by-one in settimeofday() usec validation	Naveen Kumar Chaudhary	1	-1/+1
	The validation check uses '>' instead of '>=' when comparing tv_usec against USEC_PER_SEC, allowing the value 1000000 through. After conversion to nanoseconds (*= 1000), this produces tv_nsec == NSEC_PER_SEC, violating the timespec invariant that tv_nsec must be less than NSEC_PER_SEC. Use '>=' to reject tv_usec values that are not in the valid range of 0 to 999999. Fixes: 5e0fb1b57bea ("y2038: time: avoid timespec usage in settimeofday()") Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Acked-by: John Stultz <jstultz@google.com> Link: https://patch.msgid.link/4rikk44zew3s6577dugmx4jyblz7o5c57niuap6ct3td5yfm6w@gh7pcumg7qor
14 days	clockevents: Fix duplicate type specifier in stub function parameter	Naveen Kumar Chaudhary	1	-1/+1
	The stub for arch_inlined_clockevent_set_next_coupled() has 'u64 u64 cycles' in its parameter list. Since u64 is a typedef, the compiler parses the second 'u64' as the parameter name, making 'cycles' an unused token. Remove the duplicate so the parameter is correctly named. Fixes: 89f951a1e8ad ("clockevents: Provide support for clocksource coupled comparators") Signed-off-by: Naveen Kumar Chaudhary <naveen.osdev@gmail.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/7tostpvxzdn6tobmyow63a5rweatls5kux3scqp2vzhe7mv6uq@ecr746b4hyhf
2026-05-30	Merge tag 'liveupdate-fixes-2026-05-30' of ↵	Linus Torvalds	1	-24/+32
	git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux Pull liveupdate fixes from Mike Rapoport: "Two kexec handover regression fixes: - fix order calculation for kho_unpreserve_pages() to make sure sure that the order calculation in kho_unpreserve_pages() mathes the order calculation in kho_preserve_pages(). - fix math in calculation of KHO_TREE_MAX_DEPTH to make it work with 16KB pages" * tag 'liveupdate-fixes-2026-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux: kho: fix order calculation for kho_unpreserve_pages() kho: fix KHO_TREE_MAX_DEPTH for non-4KB page sizes
2026-05-30	tracing/probes: Point the error offset correctly for eprobe argument error	Masami Hiramatsu (Google)	1	-2/+0
	Fix to point the error offset correctly for eprobe argument error. In the cleanup commit 1b8b0cd754cd ("tracing/probes: Move event parameter fetching code to common parser"), due to incorrect backward compatibility aimed at conforming to the test specifications, the error location was set to 0 when a non-existent formal parameter was specified for Eprobe. However, this should be corrected in both the test and the implementation to point correct error position. Link: https://lore.kernel.org/all/177967567399.209006.1451571244515632097.stgit@devnote2/ Fixes: 1b8b0cd754cd ("tracing/probes: Move event parameter fetching code to common parser") Cc: stable@vger.kernel.org Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-27	cgroup/cpuset: Use effective_xcpus in partcmd_update add/del mask calculation	Sun Shaojie	1	-6/+7
	When sibling CPU exclusion occurs, a partition's user_xcpus may contain CPUs that were never actually granted to it. These CPUs are present in user_xcpus(cs) but not in cs->effective_xcpus. The partcmd_update path in update_parent_effective_cpumask() uses user_xcpus(cs) (via the local variable xcpus) to compute the addmask (CPUs to return to parent) and delmask (CPUs to request from parent). This is incorrect: 1) When newmask removes a CPU that was previously excluded by a sibling, addmask incorrectly includes that CPU and tries to return it to the parent even though the partition never actually owned it, causing CPU overlap with sibling partitions and triggering warnings in generate_sched_domains(). 2) When newmask adds a previously excluded CPU that is now available, delmask fails to request it from the parent because user_xcpus(cs) already includes it. Fix this by using cs->effective_xcpus instead of user_xcpus(cs) in all partcmd_update paths that calculate addmask or delmask, including the PERR_NOCPUS error handling paths. Reproducers: Example 1 - Removing a sibling-excluded CPU incorrectly returns it: # cd /sys/fs/cgroup # echo "0-1" > a1/cpuset.cpus # echo "root" > a1/cpuset.cpus.partition # echo "0-2" > b1/cpuset.cpus # echo "root" > b1/cpuset.cpus.partition # echo "2" > b1/cpuset.cpus # cat cpuset.cpus.effective # Actual: 0-1,3 Expected: 3 Example 2 - Expanding to a previously excluded CPU fails to request it: # cd /sys/fs/cgroup # echo "0-1" > a1/cpuset.cpus # echo "root" > a1/cpuset.cpus.partition # echo "0-2" > b1/cpuset.cpus # echo "root" > b1/cpuset.cpus.partition # echo "member" > a1/cpuset.cpus.partition # echo "1-2" > b1/cpuset.cpus # cat cpuset.cpus.effective # Actual: 0-1,3 Expected: 0,3 Fixes: 2a3602030d80 ("cgroup/cpuset: Don't invalidate sibling partitions on cpuset.cpus conflict") Cc: stable@vger.kernel.org # v7.0+ Suggested-by: Zhang Guopeng <zhangguopeng@kylinos.cn> Signed-off-by: Sun Shaojie <sunshaojie@kylinos.cn> Reviewed-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-26	Merge tag 'mm-hotfixes-stable-2026-05-25-16-22' of ↵	Linus Torvalds	1	-6/+5
	git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address post-7.1 issues or aren't considered suitable for backporting. All patches are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-05-25-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "mm: introduce a new page type for page pool in page type" mm/vmalloc: do not trigger BUG() on BH disabled context MAINTAINERS, mailmap: change email for Eugen Hristev mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page kernel/fork: validate exit_signal in kernel_clone() mm: memcontrol: propagate NMI slab stats to memcg vmstats mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one zram: fix use-after-free in zram_writeback_endio memfd: deny writeable mappings when implying SEAL_WRITE ipc: limit next_id allocation to the valid ID range Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" MAINTAINERS: .mailmap: update after GEHC spin-off
2026-05-26	kho: fix order calculation for kho_unpreserve_pages()	Pratyush Yadav (Google)	1	-24/+32
	Commit 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes") made sure preservations from kho_preserve_pages() do not span multiple NUMA nodes. If they do, the order is reduced and tried again. The same logic was not implemented for kho_unpreserve_pages(). This can result in unpreserve calculating a different order than preserve, and thus not actually unpreserving the pages. Fix this by moving the order calculation logic to __kho_preserve_pages_order() and use it from both preserve and unpreserve paths. Move __kho_unpreserve() down to avoid having a forward declaration. Its users are further down in the file anyway. Also, it results in grouping for all the page-level preservation and unpreservation functions. This unfortunately makes the diff hard to read, but the main change in __kho_unpreserve() is to call __kho_preserve_pages_order() instead of open-coding the order calculation. Fixes: 91e74fa8b1bc ("kho: make sure preservations do not span multiple NUMA nodes") Cc: stable@vger.kernel.org Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Samiullah Khawaja <skhawaja@google.com> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://patch.msgid.link/20260519133332.2498092-1-pratyush@kernel.org Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-05-24	Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf	Linus Torvalds	7	-20/+93
	Pull bpf fixes from Alexei Starovoitov: - Fix bpf_throw() and global subprog combination (Kumar Kartikeya Dwivedi) - Fix out of bounds access in BPF interpreter (Yazhou Tang) - Fix potential out of bounds access in inner per-cpu array map (Guannan Wang) - Reject NULL data/sig in bpf_verify_pkcs7_signature (KP Singh) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: libbpf: fix off-by-one in emit_signature_match jump offset bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature selftests/bpf: Cover global subprog exception leaks bpf: Check global subprog exception paths bpf: make bpf_session_is_return() reference optional bpf: Use array_map_meta_equal for percpu array inner map replacement selftests/bpf: Add test for large offset bpf-to-bpf call bpf: Fix s16 truncation for large bpf-to-bpf call offsets bpf: Fix out-of-bounds read in bpf_patch_call_args()
2026-05-22	Merge tag 'sched_ext-for-7.1-rc4-fixes' of ↵	Linus Torvalds	1	-3/+36
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: - Spurious WARN in ops_dequeue() racing with concurrent dispatch - Self-deadlock between scheduler disable and a concurrent sub-sched enable * tag 'sched_ext-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: sched_ext: Fix spurious WARN on stale ops_state in ops_dequeue() sched_ext: Fix deadlock between scx_root_disable() and concurrent forks
2026-05-22	Merge tag 'cgroup-for-7.1-rc4-fixes' of ↵	Linus Torvalds	1	-14/+23
	git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup Pull cgroup fixes from Tejun Heo: "Two rstat fixes: - Out-of-bounds access in the css_rstat_updated() BPF kfunc when called with an unchecked user-supplied cpu - Over-strict NMI guard after the recent switch to try_cmpxchg left sparc and ppc64 unable to queue rstat updates from NMI" * tag 'cgroup-for-7.1-rc4-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroup: rstat: relax NMI guard after switch to try_cmpxchg cgroup/rstat: validate cpu before css_rstat_cpu() access
2026-05-22	signal: clear JOBCTL_PENDING_MASK for caller in zap_other_threads()	Aleksandr Nogikh	1	-0/+1
	When a multi-threaded process receives a stop signal (e.g., SIGSTOP), do_signal_stop() sets JOBCTL_STOP_PENDING and JOBCTL_STOP_CONSUME on all threads and sets signal->group_stop_count to the number of threads. If one of the threads concurrently calls execve(), de_thread() invokes zap_other_threads() to kill all other threads. zap_other_threads() aborts the pending group stop by resetting signal->group_stop_count to 0 and clears the JOBCTL_PENDING_MASK for all other threads. However, it fails to clear the job control flags for the calling thread. When execve() completes, the calling thread returns to user mode and checks for pending signals. Seeing the stale JOBCTL_STOP_PENDING flag, it calls do_signal_stop(), which invokes task_participate_group_stop(). Since JOBCTL_STOP_CONSUME is still set, it attempts to decrement the already-zero signal->group_stop_count, triggering a warning: sig->group_stop_count == 0 WARNING: CPU: 1 PID: 6475 at kernel/signal.c:373 task_participate_group_stop+0x215/0x2d0 Call Trace: <TASK> do_signal_stop+0x3be/0x5c0 kernel/signal.c:2619 get_signal+0xa8c/0x1330 kernel/signal.c:2884 arch_do_signal_or_restart+0xbc/0x840 arch/x86/kernel/signal.c:337 exit_to_user_mode_loop+0x8c/0x4d0 kernel/entry/common.c:98 do_syscall_64+0x33e/0xf80 arch/x86/entry/syscall_64.c:100 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Fix this race condition by clearing the JOBCTL_PENDING_MASK for the calling thread in zap_other_threads(), ensuring it does not retain any stale job control state after the thread group is destroyed. This aligns with other functions that tear down a thread group and abort group stops, such as zap_process() and complete_signal(), which correctly clear these flags for all threads including the current one. Fixes: 39efa3ef3a37 ("signal: Use GROUP_STOP_PENDING to stop once for a single group stop") Assisted-by: Gemini:gemini-3.1-pro-preview Gemini:gemini-3-flash-preview syzbot Reported-by: syzbot+b109633ea805cac54a61@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=b109633ea805cac54a61 Link: https://syzkaller.appspot.com/ai_job?id=d70208cc-862b-4fe3-bf02-3031e10cd0b3 Signed-off-by: Aleksandr Nogikh <nogikh@google.com> Link: https://patch.msgid.link/20260521142240.2973022-1-nogikh@google.com Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-22	Merge tag 'dma-mapping-7.1-2026-05-22' of ↵	Linus Torvalds	3	-7/+10
	git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two minor updates for the DMA-mapping code, mainly fixing some rare corner cases (Petr Tesarik, Jianpeng Chang)" * tag 'dma-mapping-7.1-2026-05-22' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma-mapping: move dma_map_resource() sanity check into debug code dma-direct: fix use of max_pfn
2026-05-22	Merge tag 'trace-v7.1-rc4' of ↵	Linus Torvalds	2	-8/+15
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Avoid NULL return from hist_field_name() The function hist_field_name() is directly passed to a strcat() which does not handle "NULL" characters. Return a zero length string when size is greater than the limit. This is used only to output already created histograms and no field currently is greater than the limit. But it should still not return NULL. - Do not call map->ops->elt_free() on allocation failure When elt_alloc() fails, it should not call the map->ops->elt_free() function if it exists, as that function may not be able to handle the free on allocation failures. The ->elt_free() should only be called when elt_alloc() succeeds. * tag 'trace-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Do not call map->ops->elt_free() if elt_alloc() fails tracing: Avoid NULL return from hist_field_name() on truncation
2026-05-21	kernel/fork: validate exit_signal in kernel_clone()	Deepanshu Kartikey	1	-6/+5
	When a child process exits, it sends exit_signal to its parent via do_notify_parent(). The clone() syscall constructs exit_signal as: (lower_32_bits(clone_flags) & CSIGNAL) CSIGNAL is 0xff, so values in the range 65-255 are possible. However, valid_signal() only accepts signals up to _NSIG (64 on x86_64). A non-zero non-valid exit_signal acts the same as exit_signal == 0: the parent process is not signaled when the child terminates. The syzkaller reproducer triggers this by calling clone() with flags=0x80, resulting in exit_signal = (0x80 & CSIGNAL) = 128, which exceeds _NSIG and is not a valid signal. The v1 of this patch added the check only in the clone() syscall handler, which is incomplete. kernel_clone() has other callers such as sys_ia32_clone() which would remain unprotected. Move the check to kernel_clone() to cover all callers. Since the valid_signal() check is now in kernel_clone() and covers all callers including clone3(), the same check in copy_clone_args_from_user() becomes redundant and is removed. The higher 32bits check for clone3() is kept as it is clone3() specific. Note that this is a user-visible change: previously, passing an invalid exit_signal to clone() was silently accepted. The man page for clone() does not document any defined behavior for invalid exit_signal values, so rejecting them with -EINVAL is the correct behavior. It is unlikely that any sane application relies on passing an invalid exit_signal. [oleg@redhat.com: the comment above kernel_clone() should be updated] Link: https://lore.kernel.org/abwvgU17W8wuW2-J@redhat.com Link: https://lore.kernel.org/20260316151956.563558-1-kartikey406@gmail.com Fixes: 3f2c788a1314 ("fork: prevent accidental access to clone3 features") Signed-off-by: Deepanshu Kartikey <Kartikey406@gmail.com> Signed-off-by: Oleg Nesterov <oleg@redhat.com> Reported-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=bbe6b99feefc3a0842de Tested-by: syzbot+bbe6b99feefc3a0842de@syzkaller.appspotmail.com Link: https://lore.kernel.org/all/20260307064202.353405-1-kartikey406@gmail.com/T/ [v1] Link: https://lore.kernel.org/all/20260316104536.558108-1-kartikey406@gmail.com/T/ [v2] Acked-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Michal Hocko <mhocko@suse.com> Cc: Ben Segall <bsegall@google.com> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam@infradead.org> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Mel Gorman <mgorman@suse.de> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@kernel.org> Cc: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-05-21	Merge tag 'trace-ringbuffer-v7.1-rc4' of ↵	Linus Torvalds	3	-8/+29
	git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull ring-buffer fixes from Steven Rostedt: - Fix reporting MISSED EVENTS in trace iterator When the "trace" file is read with tracing enabled, if the writer were to pass the iterator reader, it resets, sets a "missed_events" flag and continues. The tracing output checks for missed events and if there are some, it prints out "[LOST EVENTS]" to let the user know events were dropped. But the clearing of the missed_events happened when the tracing system queried the ring buffer iterator about missed events. This was premature as the ring buffer is per CPU, and the tracing code reads all the CPU buffers and checks for missed events when it is read. If the CPU iterator that had missed events isn't printed next, the output for the LOST EVENTS is lost. Clear the missed_events flag when the iterator moves to the next event and not when the missed_events flag is queried. Also clear it on reset. - Flush and stop the persistent ring buffer on panic On panic the persistent ring buffer is used to debug what caused the panic. But on some architectures, it requires flushing the memory from cache, otherwise, the ring buffer persistent memory may not have the last events and this could also cause the ring buffer to be corrupted on the next boot. - Fix nr_subbufs initialization in simple_ring_buffer_init_mm The remote simple ring buffer meta data nr_subbufs is initialized too early and gets cleared later on, making it zero and not reflect the actual number of sub-buffers. - Fix unload_page for simple_ring_buffer init rollback On error, the pages loaded need to be unloaded. To unload a page it is expected that: page = load_page(va); -> unload_page(page). But the code was doing: unload_page(va) and not unload_page(page). - Create output file from cmd_check_undefined The check for undefined symbols checks if the file .o.checked exists and if so it skips doing the work. But the .o.checked file never was created making every build do the work even when it was already done previously. * tag 'trace-ringbuffer-v7.1-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Create output file from cmd_check_undefined tracing: Fix unload_page for simple_ring_buffer init rollback tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm() ring-buffer: Flush and stop persistent ring buffer on panic ring-buffer: Fix reporting of missed events in iterator
2026-05-21	sched_ext: Fix spurious WARN on stale ops_state in ops_dequeue()	Samuele Mariotti	1	-2/+15
	ops_dequeue() can race with finish_dispatch() and spuriously trigger the "queued task must be in BPF scheduler's custody" warning. ops_dequeue() snapshots p->scx.ops_state via atomic_long_read_acquire() and then, in the SCX_OPSS_QUEUED arm, asserts that SCX_TASK_IN_CUSTODY is set. The two reads are not atomic w.r.t. a concurrent finish_dispatch() running on another CPU: CPU 1 CPU 2 ===== ===== dequeue_task_scx() ops_dequeue() opss = read_acquire(ops_state) = SCX_OPSS_QUEUED finish_dispatch() cmpxchg ops_state: SCX_OPSS_QUEUED -> SCX_OPSS_DISPATCHING [succeeds] dispatch_enqueue(SCX_DSQ_GLOBAL, SCX_ENQ_CLEAR_OPSS) call_task_dequeue() p->scx.flags &= ~SCX_TASK_IN_CUSTODY WARN_ON_ONCE(!(p->scx.flags & SCX_TASK_IN_CUSTODY)) /* opss is stale: QUEUED, * but task already claimed */ set_release(ops_state, SCX_OPSS_NONE) The race has been observed via two distinct call chains: the most common goes through sched_setaffinity(), a rarer variant through sched_change_begin(). For SCX_DSQ_GLOBAL / SCX_DSQ_BYPASS, dispatch_enqueue() clears SCX_TASK_IN_CUSTODY before clearing ops_state to SCX_OPSS_NONE (intentional, to avoid concurrent non-atomic RMW of p->scx.flags against ops_dequeue()). The window between those two writes is exactly what ops_dequeue() observes as "QUEUED without custody". The observed state is not actually inconsistent, it just means CPU 1 has already claimed the task and the QUEUED value held by CPU 2 is stale. Re-read ops_state in that case; the next read is guaranteed to return SCX_OPSS_DISPATCHING or SCX_OPSS_NONE, both of which exit the switch cleanly. The retry is bounded: once IN_CUSTODY is cleared, ops_state has already advanced past QUEUED for this dispatch cycle, and a fresh QUEUED would require re-enqueue under p's rq lock, which CPU 2 holds. Changes in v2: - Use READ_ONCE() for p->scx.flags to ensure fresh reads and prevent compiler reordering in the lockless path - Add cpu_relax() to reduce power consumption and improve performance during the spin-wait - Use unlikely() to optimize branch prediction for the common case - Expand the in-code comment to document the race condition and bounded retry guarantee Fixes: ebf1ccff79c4 ("sched_ext: Fix ops.dequeue() semantics") Suggested-by: Andrea Righi <arighi@nvidia.com> Signed-off-by: Samuele Mariotti <smariotti@disroot.org> Signed-off-by: Paolo Valente <paolo.valente@unimore.it> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-05-21	tracing: Do not call map->ops->elt_free() if elt_alloc() fails	Masami Hiramatsu (Google)	1	-4/+13
	In paths where tracing_map_elt_alloc() failed to allocate objects, the map->ops->elt_alloc() call was never successful. In this case, map->ops->elt_free() should not be called. Link: https://sashiko.dev/#/patchset/20260520223101.34710-1-rosenp%40gmail.com Cc: stable@vger.kernel.org Cc: Tom Zanussi <tom.zanussi@linux.intel.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Rosen Penev <rosenp@gmail.com> Reported-by: Sashiko <sashiko-bot@kernel.org> Fixes: 2734b629525a ("tracing: Add per-element variable support to tracing_map") Link: https://patch.msgid.link/177933895460.108746.5396070821443932634.stgit@devnote2 Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	tracing: Create output file from cmd_check_undefined	Thomas Weißschuh	1	-1/+2
	As the output file is currently never created, the check will run every time, even if the inputs have not changed. Create an empty output file which allows make to skip the execution when it is not necessary. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Marc Zyngier <maz@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260520-tracing-ringbuffer-check-v1-1-d979cfab1338@weissschuh.net Fixes: 1211907ac0b5 ("tracing: Generate undef symbols allowlist for simple_ring_buffer") Fixes: 58b4bd18390e ("tracing: Adjust cmd_check_undefined to show unexpected undefined symbols") Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	tracing: Fix unload_page for simple_ring_buffer init rollback	Vincent Donnefort	1	-1/+1
	The unload_page callback expects the return value of load_page() as its argument: ret = load_page(va); unload(ret). Fix the rollback code in simple_ring_buffer_init_mm() where the descriptor's VA is used instead of the loaded page address. Link: https://patch.msgid.link/20260512141614.1759430-1-vdonnefort@google.com Fixes: 635923081c79 ("tracing: load/unload page callbacks for simple_ring_buffer") Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	tracing: Fix nr_subbufs initialization in simple_ring_buffer_init_mm()	David Carlier	1	-1/+1
	nr_subbufs in the ring buffer metadata is always initialized to zero because it is assigned from cpu_buffer->nr_pages before the page initialization loop has run. While nr_subbufs is not currently read by the kernel, it should reflect the actual buffer geometry in the meta page for correctness. Move the assignment after the page loop so that cpu_buffer->nr_pages holds the final count. Link: https://patch.msgid.link/20260512135420.99194-1-devnexen@gmail.com Fixes: 34e5b958bdad ("tracing: Introduce simple_ring_buffer") Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Assisted-by: Claude:claude-opus-4-7 Signed-off-by: David Carlier <devnexen@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	ring-buffer: Flush and stop persistent ring buffer on panic	Masami Hiramatsu (Google)	1	-0/+22
	On real hardware, panic and machine reboot may not flush hardware cache to memory. This means the persistent ring buffer, which relies on a coherent state of memory, may not have its events written to the buffer and they may be lost. Moreover, there may be inconsistency with the counters which are used for validation of the integrity of the persistent ring buffer which may cause all data to be discarded. To avoid this issue, stop recording of the ring buffer on panic and flush the cache of the ring buffer's memory. Fixes: e645535a954a ("tracing: Add option to use memmapped memory for trace boot instance") Cc: stable@vger.kernel.org Cc: Will Deacon <will@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Link: https://patch.msgid.link/177751969602.2136606.12031934362587643488.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-05-21	ring-buffer: Fix reporting of missed events in iterator	Steven Rostedt	1	-5/+3
	When tracing is active while reading the trace file, if the iterator reading the buffer detects that the writer has passed the iterator head, it will reset and set a "missed events" flag. This flag is passed to the output processing to show the user that events were missed: CPU:4 [LOST EVENTS] The problem is that the