path: root/kernel
Age | Commit message | Author | Files | Lines
2026-03-31 | Merge tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq | Linus Torvalds | 1 | -3/+22
Pull workqueue fix from Tejun Heo:
- Fix false positive stall reports on weakly ordered architectures where the lockless worklist/timestamp check in the watchdog can observe stale values due to memory reordering. Recheck under pool->lock to confirm.
* tag 'wq-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
  workqueue: Better describe stall check
  workqueue: Fix false positive stall reports
2026-03-31 | Merge tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup | Linus Torvalds | 2 | -12/+105
Pull cgroup fixes from Tejun Heo:
- Fix cgroup rmdir racing with dying tasks. Deferred task cgroup unlink introduced a window where cgroup.procs is empty but the cgroup is still populated, causing rmdir to fail with -EBUSY and selftest failures. Make rmdir wait for dying tasks to fully leave and fix selftests to not depend on synchronous populated updates.
- Fix cpuset v1 task migration failure from empty cpusets under strict security policies. When CPU hotplug removes the last CPU from a v1 cpuset, tasks must be migrated to an ancestor without a security_task_setscheduler() check that would block the migration.
* tag 'cgroup-for-7.0-rc6-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup:
  cgroup/cpuset: Skip security check for hotplug induced v1 task migration
  cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach()
  cgroup: Fix cgroup_drain_dying() testing the wrong condition
  selftests/cgroup: Don't require synchronous populated update on task exit
  cgroup: Wait for dying tasks to leave on rmdir
2026-03-31 | cgroup/cpuset: Skip security check for hotplug induced v1 task migration | Waiman Long | 1 | -0/+10
When a CPU hot removal causes a v1 cpuset to lose all its CPUs, the cpuset hotplug handler will schedule a work function to migrate tasks in that cpuset with no CPU to its ancestor to enable those tasks to continue running. If a strict security policy is in place, however, the task migration may fail when the security_task_setscheduler() call in cpuset_can_attach() returns an -EACCES error. That means those tasks will have no CPU to run on. The system administrators will have to explicitly intervene to either add CPUs to that cpuset or move the tasks elsewhere, if they are aware of the problem. This problem was found by a reported test failure in the LTP's cpuset_hotplug_test.sh. Fix this problem by treating this special case as an exception and skipping the setsched security check in cpuset_can_attach() when a v1 cpuset with tasks has no CPU left. With that patch applied, the cpuset_hotplug_test.sh test runs successfully. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-03-31 | cgroup/cpuset: Simplify setsched decision check in task iteration loop of cpuset_can_attach() | Waiman Long | 1 | -9/+10
Centralize the check required to run security_task_setscheduler() in the task iteration loop of cpuset_can_attach() outside of the loop as it has no dependency on the characteristics of the tasks themselves. There is no functional change. Signed-off-by: Waiman Long <longman@redhat.com> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-03-31 | tracing: Remove duplicate latency_fsnotify() stub | Steven Rostedt | 2 | -3/+2
When SNAPSHOT is defined but FSNOTIFY is not, the latency_fsnotify() function is turned into a static inline stub. But this stub was defined in both trace.h and trace_snapshot.c, causing a build error when CONFIG_SNAPSHOT is defined but FSNOTIFY is not. The stub is not needed in trace_snapshot.c as it will be defined in trace.h; remove it from the C file. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260330205859.24c0aae3@gandalf.local.home Fixes: bade44fe5462 ("tracing: Move snapshot code out of trace.c and into trace_snapshot.c") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603310604.lGE9LDBK-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-31 | tracing: Preserve repeated trace_trigger boot parameters | Wesley Atwell | 1 | -3/+10
trace_trigger= tokenizes bootup_trigger_buf in place and stores pointers into that buffer for later trigger registration. Repeated trace_trigger= parameters overwrite the buffer contents from earlier calls, leaving only the last set of parsed event and trigger strings. Keep each new trace_trigger= string at the end of bootup_trigger_buf and parse only the appended range. That preserves the earlier event and trigger strings while still letting repeated parameters queue additional boot-time triggers. This also lets Bootconfig array values work naturally when they expand to repeated trace_trigger= entries. Before this change, only the last trace_trigger= instance survived boot. Link: https://patch.msgid.link/20260330181103.1851230-2-atwellwea@gmail.com Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
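The append-and-parse idea described above can be sketched in userspace C. This is a minimal illustration, not the kernel code: the buffer size, the `append_trigger_param()` helper and the token array are all hypothetical names, and the comma-only tokenizing is an assumption made to keep the example small.

```c
#include <stddef.h>
#include <string.h>

#define TRIGGER_BUF_SIZE 256	/* hypothetical buffer size */
#define MAX_TRIGGERS 16		/* hypothetical token limit */

char trigger_buf[TRIGGER_BUF_SIZE];
size_t trigger_buf_used;
char *trigger_tokens[MAX_TRIGGERS];
int nr_trigger_tokens;

/*
 * Hypothetical helper: copy one parameter value to the end of the buffer
 * and tokenize only the newly appended range, so pointers saved by
 * earlier calls keep pointing at intact strings.
 * Returns 0 on success, -1 when the buffer or token array is full.
 */
int append_trigger_param(const char *str)
{
	size_t len = strlen(str) + 1;	/* include the trailing NUL */
	char *tok, *next;

	if (len > TRIGGER_BUF_SIZE - trigger_buf_used)
		return -1;

	tok = trigger_buf + trigger_buf_used;
	memcpy(tok, str, len);
	trigger_buf_used += len;

	/* Split the appended range on commas, in place. */
	for (; tok; tok = next) {
		char *comma = strchr(tok, ',');

		next = NULL;
		if (comma) {
			*comma = '\0';
			next = comma + 1;
		}
		if (*tok == '\0')
			continue;	/* skip empty tokens */
		if (nr_trigger_tokens == MAX_TRIGGERS)
			return -1;
		trigger_tokens[nr_trigger_tokens++] = tok;
	}
	return 0;
}
```

Calling the helper twice, once per repeated parameter, leaves the tokens from the first call untouched because the second call never writes below `trigger_buf_used`.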
2026-03-31 | tracing: Append repeated boot-time tracing parameters | Wesley Atwell | 4 | -6/+42
Some tracing boot parameters already accept delimited value lists, but their __setup() handlers keep only the last instance seen at boot. Make repeated instances append to the same boot-time buffer in the format each parser already consumes. Use a shared trace_append_boot_param() helper for the ftrace filters, trace_options, and kprobe_event boot parameters. This also lets Bootconfig array values work naturally when they expand to repeated param=value entries. Before this change, only the last instance from each repeated parameter survived boot. Link: https://patch.msgid.link/20260330181103.1851230-1-atwellwea@gmail.com Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
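As a rough illustration of appending repeated values into one delimited buffer, the sketch below joins each new value onto the existing contents. The name `append_boot_param()`, its signature and the choice of delimiter are assumptions for this example; the actual trace_append_boot_param() helper may look different.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: append val to the NUL-terminated string in buf,
 * inserting delim first when buf already holds an earlier value.
 * Returns 0 on success, -1 if the result would not fit in size bytes
 * (buf is left unchanged in that case).
 */
int append_boot_param(char *buf, size_t size, const char *val, char delim)
{
	size_t used = strlen(buf);
	size_t len = strlen(val);
	/* optional delimiter + value + trailing NUL */
	size_t need = len + (used ? 1 : 0) + 1;

	if (need > size - used)
		return -1;

	if (used)
		buf[used++] = delim;
	memcpy(buf + used, val, len + 1);
	return 0;
}
```

With a comma delimiter, two calls with "a*" and "b*" produce "a*,b*", which is the same form a parser would see if the user had written a single delimited list.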
2026-03-31 | printk: ringbuffer: fix errors in comments | Loïc Grégoire | 2 | -7/+7
The printk ringbuffer implementation is described in the comment as using three ringbuffers, but the current implementation uses two (desc and data). Update the comment so it matches the code. Fix a few more known issues in the comments. Signed-off-by: Loïc Grégoire <loicgre@gmail.com> Reviewed-by: John Ogness <john.ogness@linutronix.de> Link: https://patch.msgid.link/20260328021855.53956-1-loicgre@gmail.com [pmladek@suse.com: Fixed a few more issues in the comments by John Ogness.] Signed-off-by: John Ogness <john.ogness@linutronix.de> Reviewed-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Petr Mladek <pmladek@suse.com>
2026-03-31 | rv: Add nomiss deadline monitor | Gabriele Monaco | 10 | -0/+713
Add the deadline monitors collection to validate the deadline scheduler, both for deadline tasks and servers. The currently implemented monitors are:
* nomiss: validate that dl entities run to completion before their deadline
Reviewed-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20260330111010.153663-13-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31 | sched/deadline: Move some utility functions to deadline.h | Gabriele Monaco | 1 | -27/+1
Some utility functions on sched_dl_entity can be useful outside of deadline.c, for instance for modelling, without relying on raw structure fields. Move functions like dl_task_of and dl_is_implicit to deadline.h to make them available outside. Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20260330111010.153663-12-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31 | sched: Add deadline tracepoints | Gabriele Monaco | 2 | -0/+28
Add the following tracepoints:
* sched_dl_throttle(dl_se, cpu, type): Called when a deadline entity is throttled
* sched_dl_replenish(dl_se, cpu, type): Called when a deadline entity's runtime is replenished
* sched_dl_update(dl_se, cpu, type): Called when a deadline entity updates without throttle or replenish
* sched_dl_server_start(dl_se, cpu, type): Called when a deadline server is started
* sched_dl_server_stop(dl_se, cpu, type): Called when a deadline server is stopped
Those tracepoints can be useful to validate the deadline scheduler with RV and are not exported to tracefs. Reviewed-by: Phil Auld <pauld@redhat.com> Acked-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20260330111010.153663-11-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31 | rv: Convert the opid monitor to a hybrid automaton | Gabriele Monaco | 5 | -153/+61
The opid monitor validates that wakeup and need_resched events only occur with interrupts and preemption disabled by following the preemptirq tracepoints. As reported in [1], those tracepoints might be inaccurate in some situations (e.g. NMIs). Since the monitor doesn't validate other ordering properties, remove the dependency on preemptirq tracepoints and convert the monitor to a hybrid automaton to validate the constraint during event handling. This makes the monitor more robust by also removing the workaround for interrupts missing the preemption tracepoints, which was working on PREEMPT_RT only and allows the monitor to be built on kernels without the preemptirqs tracepoints. [1] - https://lore.kernel.org/lkml/20250625120823.60600-1-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31 | rv: Add sample hybrid monitor stall | Gabriele Monaco | 7 | -0/+266
Add a sample monitor to showcase hybrid/timed automata. The stall monitor identifies tasks stalled for longer than a threshold and reacts when that happens. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31 | rv: Add Hybrid Automata monitor type | Gabriele Monaco | 2 | -0/+76
Deterministic automata define which events are allowed in every state, but cannot define more sophisticated constraints that take into account the system's environment (e.g. time or other states not producing events). Add the Hybrid Automata monitor type as an extension of Deterministic automata where each state transition validates a constraint on a finite number of environment variables. Hybrid automata can be used to implement timed automata, where the environment variables are clocks. Also implement the necessary functionality to handle clock constraints (ns or jiffy granularity) on states and events. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-3-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
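To make the notion of a transition guarded by a clock constraint concrete, here is a small userspace sketch of a two-state timed automaton. It does not use the RV monitor API; the states, events, `DEADLINE_NS` value and `monitor_handle_event()` function are all invented for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

enum state { IDLE, RUNNING, STATE_MAX };
enum event { START, STOP, EVENT_MAX };

#define INVALID STATE_MAX	/* marks a transition the automaton forbids */
#define DEADLINE_NS 1000	/* hypothetical clock constraint */

struct monitor {
	enum state cur;
	uint64_t clk_start_ns;	/* time at which the clock was last reset */
};

/* Deterministic part: which event is allowed in which state. */
static const uint8_t next_state[STATE_MAX][EVENT_MAX] = {
	[IDLE]    = { [START] = RUNNING, [STOP] = INVALID },
	[RUNNING] = { [START] = INVALID, [STOP] = IDLE },
};

/*
 * Returns true if the event is accepted, false on a violation.
 * The hybrid part is the extra guard on RUNNING --STOP--> IDLE:
 * the clock started at START must still be below DEADLINE_NS.
 */
bool monitor_handle_event(struct monitor *m, enum event ev, uint64_t now_ns)
{
	enum state next = next_state[m->cur][ev];

	if (next == INVALID)
		return false;

	if (m->cur == RUNNING && ev == STOP &&
	    now_ns - m->clk_start_ns >= DEADLINE_NS)
		return false;

	if (ev == START)
		m->clk_start_ns = now_ns;	/* reset the clock */

	m->cur = next;
	return true;
}
```

A purely deterministic automaton would accept any START/STOP alternation; the clock guard is what lets this sketch additionally reject a STOP that arrives too late.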
2026-03-31 | Merge branch 'dma-contig-for-7.1-modules-prep-v4' into dma-mapping-for-next | Marek Szyprowski | 1 | -6/+60
Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com>
2026-03-31 | dma: contiguous: Export dev_get_cma_area() | Maxime Ripard | 1 | -0/+1
The CMA dma-buf heap uses the dev_get_cma_area() function to retrieve the default contiguous area. Now that this function is no longer inlined, and since we want to turn the CMA heap into a module, let's export it. Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-4-e18fda504419@kernel.org
2026-03-31 | dma: contiguous: Make dma_contiguous_default_area static | Maxime Ripard | 1 | -1/+1
Now that dev_get_cma_area() is no longer inline, we don't have any user of dma_contiguous_default_area() outside of contiguous.c so we can make it static. Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-3-e18fda504419@kernel.org
2026-03-31 | dma: contiguous: Make dev_get_cma_area() a proper function | Maxime Ripard | 1 | -0/+8
As we try to enable dma-buf heaps, and the CMA one in particular, to compile as modules, we need to export dev_get_cma_area(). It's currently implemented as an inline function that returns either the content of device->cma_area or dma_contiguous_default_area. Thus, it means we need to export dma_contiguous_default_area, which isn't really something we want any module to have access to. Instead, let's make dev_get_cma_area() a proper function we will be able to export so we can avoid exporting dma_contiguous_default_area. Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-2-e18fda504419@kernel.org
2026-03-31 | dma: contiguous: Turn heap registration logic around | Maxime Ripard | 1 | -5/+50
The CMA heap instantiation was initially developed by having the contiguous DMA code call into the CMA heap to create a new instance every time a reserved memory area is probed. Turning the CMA heap into a module would create a dependency of the kernel on a module, which doesn't work. Let's turn the logic around and do the opposite: store all the reserved memory CMA regions into the contiguous DMA code, and provide an iterator for the heap to use when it probes. Signed-off-by: Maxime Ripard <mripard@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260331-dma-buf-heaps-as-modules-v4-1-e18fda504419@kernel.org
2026-03-31 | bpf: Fix block device hooks names | Jiri Olsa | 1 | -3/+3
Use proper names for block device hooks names. Fixes: 46df585fcff7 ("bpf: classify block device hooks appropriately") Reported-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Closes: https://lore.kernel.org/bpf/acrVKUy_EPiFFmV9@krava/T/#m7c7906a1ff4029e29185aec3266dbf5c8996dbf7 Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Venkat Rao Bagalkote <venkat88@linux.ibm.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Acked-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/bpf/20260330210344.3073712-1-jolsa@kernel.org
2026-03-30 | rcutorture: Test call_srcu() with preemption disabled and not | Paul E. McKenney | 1 | -0/+7
This commit tests invoking call_srcu() with preemption both enabled and disabled, via acquiring of pi lock. [ Joel: reword commit message. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | rcu: Add BOOTPARAM_RCU_STALL_PANIC Kconfig option | Gustavo Luiz Duarte | 2 | -1/+25
Add a Kconfig option to set the default value of the kernel.panic_on_rcu_stall sysctl, allowing the kernel to be built with panic-on-RCU-stall enabled by default. This is useful for high-availability systems that require automatic recovery (via panic_timeout) when a CPU stall is detected, without needing userspace to configure the sysctl at boot. This follows the pattern established by BOOTPARAM_SOFTLOCKUP_PANIC and BOOTPARAM_HUNG_TASK_PANIC. The runtime sysctl can still override the Kconfig default. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Gustavo Luiz Duarte <gustavold@gmail.com> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | torture: Avoid modulo-zero error in torture_hrtimeout_ns() | Paul E. McKenney | 1 | -1/+1
Currently, all calls to torture_hrtimeout_ns() either provide a non-zero fuzzt_ns or a NULL trsp, either of which avoids taking the modulus of a zero-valued fuzzt_ns. But this code should do a better job of defending itself, so this commit explicitly checks fuzzt_ns and avoids the modulus when its value is zero. Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
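The defensive check can be sketched as follows. This is a simplified userspace analogue, not the kernel function: `fuzzed_timeout_ns()` and its pointer-to-random-value parameter are stand-ins assumed for this example, replacing the real torture random state.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical analogue of the fuzzing step: add a pseudo-random offset
 * in [0, fuzz_ns) to base_ns. The modulus is taken only when a random
 * source is supplied AND fuzz_ns is non-zero, so a zero fuzz_ns can
 * never reach the % operator.
 */
uint64_t fuzzed_timeout_ns(uint64_t base_ns, uint32_t fuzz_ns,
			   const uint64_t *rnd)
{
	uint64_t timeout = base_ns;

	if (rnd && fuzz_ns)
		timeout += *rnd % fuzz_ns;

	return timeout;
}
```

Checking both conditions in one place means callers no longer have to know that passing a zero fuzz value is only safe together with a NULL random source.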
2026-03-30 | rcu/nocb: Extract nocb_bypass_needs_flush() to reduce duplication | Joel Fernandes | 1 | -14/+37
The bypass flush decision logic is duplicated in rcu_nocb_try_bypass() and nocb_gp_wait() with similar conditions. This commit therefore extracts the functionality into a common helper function, nocb_bypass_needs_flush(), improving code readability. A flush_faster parameter is added to control the flushing thresholds and timeouts. This design was in the original commit d1b222c6be1f ("rcu/nocb: Add bypass callback queueing") to avoid having the GP kthread aggressively flush the bypass queue. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | rcu/nocb: Consolidate rcu_nocb_cpu_offload/deoffload functions | Joel Fernandes | 1 | -35/+35
The rcu_nocb_cpu_offload() and rcu_nocb_cpu_deoffload() functions are nearly duplicates. Therefore, extract the common logic into rcu_nocb_cpu_toggle_offload() which takes an 'offload' boolean, and make both exported functions simple wrappers. This eliminates a bunch of duplicate code at the call sites, namely mutex locking, CPU hotplug locking and CPU online checks. Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | rcu-tasks: Remove unnecessary smp_store_release() in cblist_init_generic() | Zqiang | 1 | -3/+3
cblist_init_generic() is executed during the early boot phase due to commit 30ef09635b9e ("rcu-tasks: Initialize callback lists at rcu_init() time"). At this time, only the boot CPU is online and IRQs are disabled. This commit therefore replaces smp_store_release() and WRITE_ONCE() in cblist_init_generic() with plain assignments. Signed-off-by: Zqiang <qiang.zhang@linux.dev> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | rcuscale: Ditch rcu_scale_shutdown in favor of torture_shutdown_init() | Paul E. McKenney | 1 | -57/+21
The torture_shutdown_init() function spawns a shutdown kthread in a manner very similar to that implemented by rcu_scale_shutdown(). This commit therefore re-implements rcu_scale_shutdown() in terms of torture_shutdown_init(). This patch was generated by Claude given as input the patch making the same transformation of ref_scale_shutdown(). Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | refscale: Ditch ref_scale_shutdown in favor of torture_shutdown_init() | Paul E. McKenney | 1 | -37/+14
The torture_shutdown_init() function spawns a shutdown kthread in a manner very similar to that implemented by ref_scale_shutdown(). This commit therefore re-implements ref_scale_shutdown in terms of torture_shutdown_init(). The initial draft of this patch was generated by version 2.1.16 of the Claude AI/LLM, but trained and configured for use by my employer, and prompted to refer to Linux-kernel source code. This initial draft failed to provide a forward reference to ref_scale_cleanup(), passed zero to torture_shutdown_init() for an unwelcome insta-shutdown, and failed to pass the kvm.sh --duration argument in as a refscale module parameter. On the other hand, it did catch the need to NULL main_task on the post-test self-shutdown code path, which I might well have forgotten to do. This version of the patch fixes those problems, and in fact very little of the initial draft remains. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | rcutorture: Add a textbook-style trivial preemptible RCU | Paul E. McKenney | 4 | -1/+93
This commit adds a trivial textbook implementation of preemptible RCU to rcutorture ("torture_type=trivial-preempt"), similar in spirit to the existing "torture_type=trivial" textbook implementation of non-preemptible RCU. Neither trivial RCU implementation has any value for production use; both are intended only to keep Paul honest in his introductory writings and presentations. [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
2026-03-30 | PM: EM: Fix NULL pointer dereference when perf domain ID is not found | Changwoo Min | 1 | -0/+2
dev_energymodel_nl_get_perf_domains_doit() calls em_perf_domain_get_by_id() but does not check the return value before passing it to __em_nl_get_pd_size(). When a caller supplies a non-existent perf domain ID, em_perf_domain_get_by_id() returns NULL, and __em_nl_get_pd_size() immediately dereferences pd->cpus (struct offset 0x30), causing a NULL pointer dereference. The sister handler dev_energymodel_nl_get_perf_table_doit() already handles this correctly via __em_nl_get_pd_table_id(), which returns NULL and causes the caller to return -EINVAL. Add the same NULL check in the get-perf-domains do handler. Fixes: 380ff27af25e ("PM: EM: Add dump to get-perf-domains in the EM YNL spec") Reported-by: Yi Lai <yi1.lai@linux.intel.com> Closes: https://lore.kernel.org/lkml/aXiySM79UYfk+ytd@ly-workstation/ Signed-off-by: Changwoo Min <changwoo@igalia.com> Cc: 6.19+ <stable@vger.kernel.org> # 6.19+ [ rjw: Subject and changelog edits ] Link: https://patch.msgid.link/20260329073615.649976-1-changwoo@igalia.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2026-03-30 | Merge branch 'for-7.0-fixes' into for-7.1 | Tejun Heo | 3 | -26/+74
Conflict in kernel/sched/ext.c init_sched_ext_class() between: 415cb193bb97 ("sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback") which adds cpus_to_sync cpumask allocation, and: 84b1a0ea0b7c ("sched_ext: Implement scx_bpf_dsq_reenq() for user DSQs") 8c1b9453fde6 ("sched_ext: Convert deferred_reenq_locals from llist to regular list") which add deferred_reenq init code at the same location. Both are independent additions. Include both. Signed-off-by: Tejun Heo <tj@kernel.org>
2026-03-30 | sched_ext: Fix SCX_KICK_WAIT deadlock by deferring wait to balance callback | Tejun Heo | 2 | -25/+73
SCX_KICK_WAIT busy-waits in kick_cpus_irq_workfn() using smp_cond_load_acquire() until the target CPU's kick_sync advances. Because the irq_work runs in hardirq context, the waiting CPU cannot reschedule and its own kick_sync never advances. If multiple CPUs form a wait cycle, all CPUs deadlock. Replace the busy-wait in kick_cpus_irq_workfn() with resched_curr() to force the CPU through do_pick_task_scx(), which queues a balance callback to perform the wait. The balance callback drops the rq lock and enables IRQs following the sched_core_balance() pattern, so the CPU can process IPIs while waiting. The local CPU's kick_sync is advanced on entry to do_pick_task_scx() and continuously during the wait, ensuring any CPU that starts waiting for us sees the advancement and cannot form cyclic dependencies. Fixes: 90e55164dad4 ("sched_ext: Implement SCX_KICK_WAIT") Cc: stable@vger.kernel.org # v6.12+ Reported-by: Christian Loehle <christian.loehle@arm.com> Link: https://lore.kernel.org/r/20260316100249.1651641-1-christian.loehle@arm.com Signed-off-by: Tejun Heo <tj@kernel.org> Tested-by: Christian Loehle <christian.loehle@arm.com>
2026-03-30 | dma-debug: suppress cacheline overlap warning when arch has no DMA alignment requirement | Mikhail Gavrilov | 1 | -0/+1
When CONFIG_DMA_API_DEBUG is enabled, the DMA debug infrastructure tracks active mappings per cacheline and warns if two different DMA mappings share the same cacheline ("cacheline tracking EEXIST, overlapping mappings aren't supported"). On x86_64, ARCH_KMALLOC_MINALIGN defaults to 8, so small kmalloc allocations (e.g. the 8-byte hub->buffer and hub->status in the USB hub driver) frequently land in the same 64-byte cacheline. When both are DMA-mapped, this triggers a false positive warning. This has been reported repeatedly since v5.14 (when the EEXIST check was added) across various USB host controllers and devices including xhci_hcd with USB hubs, USB audio devices, and USB ethernet adapters. The cacheline overlap is only a real concern on architectures that require DMA buffer alignment to cacheline boundaries (i.e. where ARCH_DMA_MINALIGN >= L1_CACHE_BYTES). On architectures like x86_64 where dma_get_cache_alignment() returns 1, the hardware is cache-coherent and overlapping cacheline mappings are harmless. Suppress the EEXIST warning when dma_get_cache_alignment() is less than L1_CACHE_BYTES, indicating the architecture does not require cacheline-aligned DMA buffers. Verified with a kernel module reproducer that performs two kmalloc(8) allocations back-to-back and DMA-maps both:
Before: allocations share a cacheline, EEXIST fires within ~50 pairs
After: same cacheline pair found, but no warning emitted
Fixes: 2b4bbc6231d7 ("dma-debug: report -EEXIST errors in add_dma_entry") Link: https://bugzilla.kernel.org/show_bug.cgi?id=215740 Suggested-by: Harry Yoo <harry@kernel.org> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20260327124156.24820-1-mikhail.v.gavrilov@gmail.com
2026-03-29 | Merge tag 'timers-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -1/+1
Pull timer fix from Ingo Molnar:
"Fix an argument order bug in the alarm timer forwarding logic, which may cause missed expirations or incorrect overrun accounting"
* tag 'timers-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  alarmtimer: Fix argument order in alarm_timer_forward()
2026-03-29 | Merge tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 3 | -2/+11
Pull futex fixes from Ingo Molnar:
- Tighten up the sys_futex_requeue() ABI a bit, to disallow dissimilar futex flags and potential UaF access (Peter Zijlstra)
- Fix UaF between futex_key_to_node_opt() and vma_replace_policy() (Hao-Yu Yang)
- Clear stale exiting pointer in futex_lock_pi() retry path, which triggered a warning (and potential misbehavior) in stress-testing (Davidlohr Bueso)
* tag 'locking-urgent-2026-03-29' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  futex: Clear stale exiting pointer in futex_lock_pi() retry path
  futex: Fix UaF between futex_key_to_node_opt() and vma_replace_policy()
  futex: Require sys_futex_requeue() to have identical flags
2026-03-29 | bpf: Support struct btf_struct_meta via KF_IMPLICIT_ARGS | Ihor Solodrai | 2 | -73/+283
The following kfuncs currently accept void *meta__ign argument:
* bpf_obj_new_impl
* bpf_obj_drop_impl
* bpf_percpu_obj_new_impl
* bpf_percpu_obj_drop_impl
* bpf_refcount_acquire_impl
* bpf_list_push_back_impl
* bpf_list_push_front_impl
* bpf_rbtree_add_impl
The __ign suffix is an indicator for the verifier to skip the argument in check_kfunc_args(). Then, in fixup_kfunc_call() the verifier may set the value of this argument to struct btf_struct_meta * kptr_struct_meta from insn_aux_data. BPF programs must pass a dummy NULL value when calling these kfuncs. Additionally, the list and rbtree _impl kfuncs also accept an implicit u64 argument, which doesn't require __ign suffix because it's a scalar, and BPF programs explicitly pass 0. Add new kfuncs with KF_IMPLICIT_ARGS [1], that correspond to each _impl kfunc accepting meta__ign. The existing _impl kfuncs remain unchanged for backwards compatibility. To support this, add "btf_struct_meta" to the list of recognized implicit argument types in resolve_btfids. Implement is_kfunc_arg_implicit() in the verifier, that determines implicit args by inspecting both a non-_impl BTF prototype of the kfunc. Update the special_kfunc_list in the verifier and relevant checks to support both the old _impl and the new KF_IMPLICIT_ARGS variants of btf_struct_meta users.
[1] https://lore.kernel.org/bpf/20260120222638.3976562-1-ihor.solodrai@linux.dev/
Signed-off-by: Ihor Solodrai <ihor.solodrai@linux.dev> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20260327203241.3365046-1-ihor.solodrai@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-03-28 | tracing: Remove spurious default precision from show_event_trigger/filter formats | David Laight | 1 | -2/+2
Change 2d8b7f9bf8e6e ("tracing: Have show_event_trigger/filter format a bit more in columns") added space padding to align the output. However it used ("%*.s", len, "") which requests the default precision. It doesn't matter here whether the userspace default (0) or kernel default (no precision) is used, but the format should be "%*s". Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260326201824.3919-1-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
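The difference between the two formats can be observed with the userspace C library, where a lone '.' means a precision of zero. The sketch below only demonstrates standard snprintf() behaviour; it says nothing about the kernel's own vsnprintf(), and the function names are made up for the example.

```c
#include <stddef.h>
#include <stdio.h>

/* Old form: width from the argument list, precision given as a bare '.'. */
int format_old(char *out, size_t size, int width, const char *s)
{
	return snprintf(out, size, "%*.s|", width, s);
}

/* New form: width only, no precision at all. */
int format_new(char *out, size_t size, int width, const char *s)
{
	return snprintf(out, size, "%*s|", width, s);
}
```

With an empty string both forms emit `width` spaces, which is why the commit notes that the choice of default does not matter for this padding use. With a non-empty string such as "ab" and a width of 4, the userspace old form prints four spaces (the zero precision truncates the string) while the new form prints two spaces followed by "ab".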
2026-03-28 | Merge tag 'trace-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace | Linus Torvalds | 2 | -18/+71
Pull tracing fixes from Steven Rostedt:
- Fix potential deadlock in osnoise and hotplug
  The interface_lock can be called by a osnoise thread and the CPU shutdown logic of osnoise can wait for this thread to finish. But cpus_read_lock() can also be taken while holding the interface_lock. This produces a circular lock dependency and can cause a deadlock. Swap the ordering of cpus_read_lock() and the interface_lock to have interface_lock taken within the cpus_read_lock() context to prevent this circular dependency.
- Fix freeing of event triggers in early boot up
  If the same trigger is added on the kernel command line, the second one will fail to be applied and the trigger created will be freed. This calls into the deferred logic and creates a kernel thread to do the freeing. But the command line logic is called before kernel threads can be created and this leads to a NULL pointer dereference. Delay freeing event triggers until late init.
* tag 'trace-v7.0-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace:
  tracing: Drain deferred trigger frees if kthread creation fails
  tracing: Fix potential deadlock in cpu hotplug with osnoise
2026-03-28 | tracing: Remove tracing_alloc_snapshot() when snapshot isn't defined | Steven Rostedt | 1 | -7/+0
The function tracing_alloc_snapshot() is only used between trace.c and trace_snapshot.c. When snapshot isn't configured, it's not used at all. The stub function was defined as a global with no users and no prototype causing build issues. Remove the function when snapshot isn't configured as nothing is calling it. Also remove the EXPORT_SYMBOL_GPL() that was associated with it as it's not used outside of the tracing subsystem which also includes any modules. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260328101946.2c4ef4a5@robin Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/acb-IuZ4vDkwwQLW@sirena.co.uk/ Fixes: bade44fe546212 (tracing: Move snapshot code out of trace.c and into trace_snapshot.c) Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-28 | posix-timers: Fix stale function name in comment | Zhan Xusheng | 1 | -1/+1
The comment in exit_itimers() still refers to itimer_delete(), which was replaced by posix_timer_delete(). Update the comment accordingly. Signed-off-by: Zhan Xusheng <zhanxusheng@xiaomi.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260326142210.98632-1-zhanxusheng@xiaomi.com
2026-03-28 | futex: Clear stale exiting pointer in futex_lock_pi() retry path | Davidlohr Bueso | 1 | -1/+2
Fuzzing/stressing futexes triggered:
  WARNING: kernel/futex/core.c:825 at wait_for_owner_exiting+0x7a/0x80, CPU#11: futex_lock_pi_s/524
When futex_lock_pi_atomic() sees the owner is exiting, it returns -EBUSY and stores a refcounted task pointer in 'exiting'. After wait_for_owner_exiting() consumes that reference, the local pointer is never reset to NULL. Upon a retry, if futex_lock_pi_atomic() returns a different error, the bogus pointer is passed to wait_for_owner_exiting().
  CPU0                          CPU1                                      CPU2
  futex_lock_pi(uaddr)
  // acquires the PI futex
  exit()
    futex_cleanup_begin()
      futex_state = EXITING;
                                futex_lock_pi(uaddr)
                                  futex_lock_pi_atomic()
                                    attach_to_pi_owner()
                                      // observes EXITING
                                      *exiting = owner; // takes ref
                                      return -EBUSY
                                  wait_for_owner_exiting(-EBUSY, owner)
                                    put_task_struct(); // drops ref
                                // exiting still points to owner
                                goto retry;
                                  futex_lock_pi_atomic()
                                    lock_pi_update_atomic()
                                      cmpxchg(uaddr)
                                                                          *uaddr ^= WAITERS // whatever
                                      // value changed
                                      return -EAGAIN;
                                  wait_for_owner_exiting(-EAGAIN, exiting) // stale
                                    WARN_ON_ONCE(exiting)
Fix this by resetting upon retry, essentially aligning it with requeue_pi. Fixes: 3ef240eaff36 ("futex: Prevent exit livelock") Signed-off-by: Davidlohr Bueso <dave@stgolabs.net> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260326001759.4129680-1-dave@stgolabs.net
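The stale-pointer pattern and its fix can be modelled in a few lines of userspace C. Everything here is a simplified stand-in: `try_lock()` plays the role of futex_lock_pi_atomic() with a scripted sequence of results, `wait_for_exit()` mimics wait_for_owner_exiting(), and the `stale_seen` flag replaces the kernel warning. None of these names are kernel APIs.

```c
#include <errno.h>
#include <stddef.h>

struct task { int refs; };

struct task owner = { .refs = 1 };
int attempt;		/* which scripted result try_lock() returns next */
int stale_seen;		/* set when a stale pointer reaches wait_for_exit() */

/* Stand-in for the atomic lock attempt: -EBUSY, then -EAGAIN, then success. */
int try_lock(struct task **exiting)
{
	switch (attempt++) {
	case 0:
		owner.refs++;		/* take a reference on the exiting owner */
		*exiting = &owner;
		return -EBUSY;
	case 1:
		return -EAGAIN;		/* does not touch *exiting */
	default:
		return 0;
	}
}

/* Stand-in for the wait helper: consumes the reference only on -EBUSY. */
void wait_for_exit(int ret, struct task *exiting)
{
	if (ret != -EBUSY) {
		if (exiting)
			stale_seen = 1;	/* the kernel warns at this point */
		return;
	}
	exiting->refs--;		/* drop the reference */
}

/* reset_on_retry selects between the buggy (0) and fixed (1) behaviour. */
int lock_with_retry(int reset_on_retry)
{
	struct task *exiting = NULL;
	int ret;

	attempt = 0;
	stale_seen = 0;
retry:
	if (reset_on_retry)
		exiting = NULL;		/* the fix: clear before each attempt */
	ret = try_lock(&exiting);
	if (ret == -EBUSY || ret == -EAGAIN) {
		wait_for_exit(ret, exiting);
		goto retry;
	}
	return ret;
}
```

In the model, `lock_with_retry(0)` reaches the second wait with the pointer left over from the first attempt, while `lock_with_retry(1)` clears it at the retry label and the second wait sees NULL.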
2026-03-28tracing: Drain deferred trigger frees if kthread creation failsWesley Atwell1-13/+66
Boot-time trigger registration can fail before the trigger-data cleanup kthread exists. Deferring those frees until late init is fine, but the post-boot fallback must still drain the deferred list if kthread creation never succeeds. Otherwise, boot-deferred nodes can accumulate on trigger_data_free_list, later frees fall back to synchronously freeing only the current object, and the older queued entries are leaked forever. To trigger this, add the following to the kernel command line: trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon The second traceon trigger will fail and be freed. This triggers a NULL pointer dereference and crashes the kernel. Keep the deferred boot-time behavior, but when kthread creation fails, drain the whole queued list synchronously. Do the same in the late-init drain path so queued entries are not stranded there either. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com Fixes: 61d445af0a7c ("tracing: Add bulk garbage collection of freeing event_trigger_data") Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
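The leak described in this commit can be modelled in user space as follows. This is a minimal sketch, assuming a plain singly linked list in place of trigger_data_free_list and a worker_ok flag in place of the kthread creation result; struct node, defer_free(), drain_deferred(), release() and run_scenario() are hypothetical names that do not appear in the kernel.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical node type; the kernel uses its own list helpers. */
struct node { struct node *next; };

static struct node *deferred;   /* entries queued before the worker exists */
static int freed;               /* number of frees, for illustration only */

static void defer_free(struct node *n)
{
	n->next = deferred;
	deferred = n;
}

/* Free every queued entry synchronously. */
static void drain_deferred(void)
{
	while (deferred) {
		struct node *n = deferred;

		deferred = n->next;
		free(n);
		freed++;
	}
}

/* worker_ok == 0 models a failed worker (kthread) creation. */
void release(struct node *n, int worker_ok, int drain_on_failure)
{
	if (worker_ok) {
		defer_free(n);          /* the worker would free it later */
		return;
	}
	if (drain_on_failure)
		drain_deferred();       /* the fix: do not strand older entries */
	free(n);
	freed++;
}

/* Queue three entries, then release a fourth with the worker unavailable.
 * Returns how many objects the release path itself freed. */
int run_scenario(int drain_on_failure)
{
	freed = 0;
	deferred = NULL;
	for (int i = 0; i < 3; i++) {
		struct node *n = malloc(sizeof(*n));

		assert(n);
		defer_free(n);
	}
	struct node *last = malloc(sizeof(*last));

	assert(last);
	release(last, 0, drain_on_failure);
	int result = freed;

	drain_deferred();               /* tidy up so the sketch does not leak */
	return result;
}
```

Without the drain, run_scenario(0) frees only the current object and the three queued entries stay on the list; with it, run_scenario(1) frees all four.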
2026-03-27fork: zero vmap stack using clear_pages() instead of memset()Linus Walleij1-1/+1
After the introduction of clear_pages() we exploit the fact that the process vm_area is allocated in contiguous pages to just clear them all in one swift operation. Link: https://lkml.kernel.org/r/20260224-mm-fork-clear-pages-v1-1-184c65a72d49@kernel.org Signed-off-by: Linus Walleij <linusw@kernel.org> Suggested-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://lore.kernel.org/linux-mm/dpnwsp7dl4535rd7qmszanw6u5an2p74uxfex4dh53frpb7pu3@2bnjjavjrepe/ Suggested-by: Pasha Tatashin <pasha.tatashin@soleen.com> Link: https://lore.kernel.org/20240311164638.2015063-7-pasha.tatashin@soleen.com Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Ankur Arora <ankur.a.arora@oracle.com> Cc: Ben Segall <bsegall@google.com> Cc: David Hildenbrand <david@kernel.org> Cc: Dietmar Eggemann <dietmar.eggemann@arm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Kees Cook <kees@kernel.org> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vincent Guittot <vincent.guittot@linaro.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27do_notify_parent: sanitize the valid_signal() checksOleg Nesterov1-2/+3
Now that kernel_clone() checks valid_signal(args->exit_signal), the "sig" argument of do_notify_parent() must always be valid or we have a bug. However, do_notify_parent() only checks that sig != -1 at the start, then it does another valid_signal() check before __send_signal_locked(). This is confusing. Change do_notify_parent() to WARN and return early if valid_signal(sig) is false. Link: https://lkml.kernel.org/r/abld-ilvMEZ7VgMw@redhat.com Signed-off-by: Oleg Nesterov <oleg@redhat.com> Acked-by: Deepanshu Kartikey <Kartikey406@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27watchdog/hardlockup: improve buddy system detection timelinessMayank Rungta2-11/+17
Currently, the buddy system only performs checks every 3rd sample. With a 4-second interval, if a check window is missed, the next check occurs 12 seconds later, potentially delaying hard lockup detection for up to 24 seconds. Modify the buddy system to perform checks at every interval (4s). Introduce a missed-interrupt threshold to maintain the existing grace period while reducing the detection window to 8-12 seconds. Best and worst case detection scenarios: Before (12s check window): - Best case: Lockup occurs after first check but just before heartbeat interval. Detected in ~8s (8s till next check). - Worst case: Lockup occurs just after a check. Detected in ~24s (missed check + 12s till next check + 12s logic). After (4s check window with threshold of 3): - Best case: Lockup occurs just before a check. Detected in ~8s (0s till 1st check + 4s till 2nd + 4s till 3rd). - Worst case: Lockup occurs just after a check. Detected in ~12s (4s till 1st check + 4s till 2nd + 4s till 3rd). Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-4-45bd8a0cc7ed@google.com Signed-off-by: Mayank Rungta <mrungta@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ian Rogers <irogers@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Stephane Erainan <eranian@google.com> Cc: Wang Jinchao <wangjinchao600@gmail.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
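The 8-12 second window for the new scheme can be reproduced with a short calculation. The sketch below is an illustration only and assumes that checks fire at exact multiples of the 4-second interval and that the first check after the lockup already counts as a miss, as in the best-case description above; detect_latency_ms() and the two macros are hypothetical names, not kernel symbols.

```c
#include <assert.h>

#define INTERVAL_S      4       /* check interval from the commit message */
#define MISS_THRESHOLD  3       /* missed-interrupt threshold from the message */

/* lockup_ms: time of the lockup in milliseconds, measured from a check at
 * t = 0. Returns the delay between the lockup and its detection. */
long detect_latency_ms(long lockup_ms)
{
	const long interval_ms = INTERVAL_S * 1000L;
	/* First check that happens strictly after the lockup. */
	long next_check = (lockup_ms / interval_ms + 1) * interval_ms;
	int missed = 0;

	assert(lockup_ms >= 0);
	for (;;) {
		if (++missed >= MISS_THRESHOLD)
			return next_check - lockup_ms;
		next_check += interval_ms;
	}
}
```

A lockup 1 ms before a check (lockup_ms = 3999) is detected after 8001 ms, and one 1 ms after a check (lockup_ms = 1) after 11999 ms, matching the ~8s and ~12s figures.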
2026-03-27watchdog: update saved interrupts during checkMayank Rungta1-5/+13
Currently, arch_touch_nmi_watchdog() causes an early return that skips updating hrtimer_interrupts_saved. This leads to stale comparisons and delayed lockup detection. I found this issue because in our system the serial console is fairly chatty. For example, the 8250 console driver frequently calls touch_nmi_watchdog() via console_write(). If a CPU locks up after a timer interrupt but before the next watchdog check, we see the following sequence: * watchdog_hardlockup_check() saves counter (e.g., 1000) * Timer runs and updates the counter (1001) * touch_nmi_watchdog() is called * CPU locks up * 10s pass: check() notices touch, returns early, skips update * 10s pass: check() saves counter (1001) * 10s pass: check() finally detects lockup This delays detection to 30 seconds. With this fix, we detect the lockup in 20 seconds. Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-2-45bd8a0cc7ed@google.com Signed-off-by: Mayank Rungta <mrungta@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ian Rogers <irogers@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Stephane Erainan <eranian@google.com> Cc: Wang Jinchao <wangjinchao600@gmail.com> Cc: Yunhui Cui <cuiyunhui@bytedance.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
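The 30-second versus 20-second difference follows from the order in which the saved counter is updated and the "touched" flag is tested. The model below is a sketch under assumptions, not the code in kernel/watchdog.c: struct wd, check_old(), check_new() and checks_until_detected() are hypothetical, and the starting state mirrors the sequence in the message (saved 1000, counter 1001, watchdog touched, CPU already locked up).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-CPU state: live interrupt count, last saved count,
 * and whether the watchdog was touched since the previous check. */
struct wd {
	unsigned long counter;
	unsigned long saved;
	bool touched;
};

/* Old ordering: a touch returns early and leaves 'saved' stale. */
static bool check_old(struct wd *w)
{
	if (w->touched) {
		w->touched = false;
		return false;
	}
	if (w->counter == w->saved)
		return true;
	w->saved = w->counter;
	return false;
}

/* New ordering: 'saved' is refreshed before the touch is considered. */
static bool check_new(struct wd *w)
{
	bool stalled = (w->counter == w->saved);

	w->saved = w->counter;
	if (w->touched) {
		w->touched = false;
		return false;
	}
	return stalled;
}

/* Number of checks after the lockup until it is reported. */
int checks_until_detected(bool fixed)
{
	struct wd w = { .counter = 1001, .saved = 1000, .touched = true };

	for (int n = 1; n <= 10; n++)
		if (fixed ? check_new(&w) : check_old(&w))
			return n;
	assert(!"lockup never detected");
	return -1;
}
```

With one check every 10 seconds, checks_until_detected(false) yields 3 checks (30 seconds) and checks_until_detected(true) yields 2 checks (20 seconds).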
2026-03-27watchdog: return early in watchdog_hardlockup_check()Mayank Rungta1-58/+59
Patch series "watchdog/hardlockup: Improvements to hardlockup", v2. This series addresses limitations in the hardlockup detector implementations and updates the documentation to reflect actual behavior and recent changes. The changes are structured as follows: Refactoring (Patch 1) ===================== Patch 1 refactors watchdog_hardlockup_check() to return early if no lockup is detected. This reduces the indentation level of the main logic block, serving as a clean base for the subsequent changes. Hardlockup Detection Improvements (Patches 2 & 4) ================================================= The hardlockup detector logic relies on updating saved interrupt counts to determine if the CPU is making progress. Patch 2 ensures that the saved interrupt count is updated unconditionally before checking the "touched" flag. This prevents stale comparisons which can delay detection. This is a logic fix that ensures the detector remains accurate even when the watchdog is frequently touched. Patch 4 improves the Buddy detector's timeliness. The current checking interval (every 3rd sample) causes high variability in detection time (up to 24s). This patch changes the Buddy detector to check at every hrtimer interval (4s) with a missed-interrupt threshold of 3, narrowing the detection window to a consistent 8-12 second range. Documentation Updates (Patches 3 & 5) ===================================== The current documentation does not fully capture the variable nature of detection latency or the details of the Buddy system. Patch 3 removes the strict "10 seconds" definition of a hardlockup, which was misleading given the periodic nature of the detector. It adds a "Detection Overhead" section to the admin guide, using "Best Case" and "Worst Case" scenarios to illustrate that detection time can vary significantly (e.g., ~6s to ~20s). Patch 5 adds a dedicated section for the Buddy detector, which was previously undocumented.
It details the mechanism, the new timing logic, and known limitations. This patch (of 5): Invert the `is_hardlockup(cpu)` check in `watchdog_hardlockup_check()` to return early when a hardlockup is not detected. This flattens the main logic block, reducing the indentation level and making the code easier to read and maintain. This refactoring serves as a preparation patch for future hardlockup changes. Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-0-45bd8a0cc7ed@google.com Link: https://lkml.kernel.org/r/20260312-hardlockup-watchdog-fixes-v2-1-45bd8a0cc7ed@google.com Signed-off-by: Mayank Rungta <mrungta@google.com> Reviewed-by: Douglas Anderson <dianders@chromium.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Cc: Ian Rogers <irogers@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Li Huafei <lihuafei1@huawei.com> Cc: Max Kellermann <max.kellermann@ionos.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Stephane Erai