aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2025-07-03bpf: Add show_fdinfo for kprobe_multiTao Chen1-0/+27
Show kprobe_multi link info with fdinfo, the info as follows: link_type: kprobe_multi link_id: 1 prog_tag: a69740b9746f7da8 prog_id: 21 kprobe_cnt: 8 missed: 0 cookie func 1 bpf_fentry_test1+0x0/0x20 7 bpf_fentry_test2+0x0/0x20 2 bpf_fentry_test3+0x0/0x20 3 bpf_fentry_test4+0x0/0x20 4 bpf_fentry_test5+0x0/0x20 5 bpf_fentry_test6+0x0/0x20 6 bpf_fentry_test7+0x0/0x20 8 bpf_fentry_test8+0x0/0x10 Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-3-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03bpf: Add show_fdinfo for uprobe_multiTao Chen1-0/+44
Show uprobe_multi link info with fdinfo, the info as follows: link_type: uprobe_multi link_id: 9 prog_tag: e729f789e34a8eca prog_id: 39 uprobe_cnt: 3 pid: 0 path: /home/dylane/bpf/tools/testing/selftests/bpf/test_progs cookie offset ref_ctr_offset 3 0xa69f13 0x0 1 0xa69f1e 0x0 2 0xa69f29 0x0 Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-2-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03bpf: Show precise link_type for {uprobe,kprobe}_multi fdinfoTao Chen2-7/+12
Alexei suggested, 'link_type' can be more precise and differentiate for human in fdinfo. In fact BPF_LINK_TYPE_KPROBE_MULTI includes kretprobe_multi type, the same as BPF_LINK_TYPE_UPROBE_MULTI, so we can show it more concretely. link_type: kprobe_multi link_id: 1 prog_tag: d2b307e915f0dd37 ... link_type: kretprobe_multi link_id: 2 prog_tag: ab9ea0545870781d ... link_type: uprobe_multi link_id: 9 prog_tag: e729f789e34a8eca ... link_type: uretprobe_multi link_id: 10 prog_tag: 7db356c03e61a4d4 Co-developed-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Tao Chen <chen.dylane@linux.dev> Link: https://lore.kernel.org/r/20250702153958.639852-1-chen.dylane@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-03bpf: Add bpf_dynptr_memset() kfuncIhor Solodrai1-0/+47
Currently there is no straightforward way to fill dynptr memory with a value (most commonly zero). One can do it with bpf_dynptr_write(), but a temporary buffer is necessary for that. Implement bpf_dynptr_memset() - an analogue of memset() from libc. Signed-off-by: Ihor Solodrai <isolodrai@meta.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20250702210309.3115903-2-isolodrai@meta.com
2025-07-03PM: sleep: console: Fix the black screen issuetuhaowen1-1/+6
When the computer enters sleep status without a monitor connected, the system switches the console to the virtual terminal tty63(SUSPEND_CONSOLE). If a monitor is subsequently connected before waking up, the system skips the required VT restoration process during wake-up, leaving the console on tty63 instead of switching back to tty1. To fix this issue, a global flag vt_switch_done is introduced to record whether the system has successfully switched to the suspend console via vt_move_to_console() during suspend. If the switch was completed, vt_switch_done is set to 1. Later during resume, this flag is checked to ensure that the original console is restored properly by calling vt_move_to_console(orig_fgconsole, 0). This prevents scenarios where the resume logic skips console restoration due to incorrect detection of the console state, especially when a monitor is reconnected before waking up. Signed-off-by: tuhaowen <tuhaowen@uniontech.com> Link: https://patch.msgid.link/20250611032345.29962-1-tuhaowen@uniontech.com Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-07-03irqdomain: Add device pointer to irq_domain_info and msi_domain_infoThomas Gleixner2-1/+3
Add device pointer to irq_domain_info and msi_domain_info, so that the device can be specified at domain creation time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Nam Cao <namcao@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/943e52403b20cf13c320d55bd4446b4562466aab.1750860131.git.namcao@linutronix.de
2025-07-03Merge tag 'ktime-get-clock-ts64-for-ptp' of ↵Paolo Abeni1-0/+33
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Base implementation for PTP with a temporary CLOCK_AUX* workaround to allow integration of depending changes into the networking tree. Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-07-03Merge tag 'ktime-get-clock-ts64-for-ptp' into timers/ptpThomas Gleixner1-0/+34
Pull the base implementation of ktime_get_clock_ts64() for PTP, which contains a temporary CLOCK_AUX* workaround. That was created to allow integration of depending changes into the networking tree. The workaround is going to be removed in a subsequent change in the timekeeping tree. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2025-07-03timekeeping: Provide ktime_get_clock_ts64()Thomas Gleixner1-0/+33
PTP implements an inline switch case for taking timestamps from various POSIX clock IDs, which already consumes quite some text space. Expanding it for auxiliary clocks really becomes too big for inlining. Provide a out of line version. The function invalidates the timestamp in case the clock is invalid. The invalidation allows to implement a validation check without the need to propagate a return value through deep existing call chains. Due to merge logistics this temporarily defines CLOCK_AUX[_LAST] if undefined, so that the plain branch, which does not contain any of the core timekeeper changes, can be pulled into the networking tree as prerequisite for the PTP side changes. These temporary defines are removed after that branch is merged into the tip::timers/ptp branch. That way the result in -next or upstream in the next merge window has zero dependencies. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250701132628.357686408@linutronix.de
2025-07-03perf: Revert to requiring CAP_SYS_ADMIN for uprobesPeter Zijlstra1-1/+1
Jann reports that uprobes can be used destructively when used in the middle of an instruction. The kernel only verifies there is a valid instruction at the requested offset, but due to variable instruction length cannot determine if this is an instruction as seen by the intended execution stream. Additionally, Mark Rutland notes that on architectures that mix data in the text segment (like arm64), a similar things can be done if the data word is 'mistaken' for an instruction. As such, require CAP_SYS_ADMIN for uprobes. Fixes: c9e0924e5c2b ("perf/core: open access to probes for CAP_PERFMON privileged process") Reported-by: Jann Horn <jannh@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/CAG48ez1n4520sq0XrWYDHKiKxE_+WCfAK+qt9qkY4ZiBGmL-5g@mail.gmail.com
2025-07-02bpf: Avoid warning on multiple referenced args in callPaul Chaignon1-4/+4
The description of full helper calls in syzkaller [1] and the addition of kernel warnings in commit 0df1a55afa83 ("bpf: Warn on internal verifier errors") allowed syzbot to reach a verifier state that was thought to indicate a verifier bug [2]: 12: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 verifier bug: more than one arg with ref_obj_id R2 2 2 This error can be reproduced with the program from the previous commit: 0: (b7) r2 = 20 1: (b7) r3 = 0 2: (18) r1 = 0xffff92cee3cbc600 4: (85) call bpf_ringbuf_reserve#131 5: (55) if r0 == 0x0 goto pc+3 6: (bf) r1 = r0 7: (bf) r2 = r0 8: (85) call bpf_tcp_raw_gen_syncookie_ipv4#204 9: (95) exit bpf_tcp_raw_gen_syncookie_ipv4 expects R1 and R2 to be ARG_PTR_TO_FIXED_SIZE_MEM (with a size of at least sizeof(struct iphdr) for R1). R0 is a ring buffer payload of 20B and therefore matches this requirement. The verifier reaches the check on ref_obj_id while verifying R2 and rejects the program because the helper isn't supposed to take two referenced arguments. This case is a legitimate rejection and doesn't indicate a kernel bug, so we shouldn't log it as such and shouldn't emit a kernel warning. Link: https://github.com/google/syzkaller/pull/4313 [1] Link: https://lore.kernel.org/all/686491d6.a70a0220.3b7e22.20ea.GAE@google.com/T/ [2] Fixes: 457f44363a88 ("bpf: Implement BPF ring buffer and verifier support for it") Fixes: 0df1a55afa83 ("bpf: Warn on internal verifier errors") Reported-by: syzbot+69014a227f8edad4d8c6@syzkaller.appspotmail.com Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/cd09afbfd7bef10bbc432d72693f78ffdc1e8ee5.1751463262.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02bpf: avoid jump misprediction for PTR_TO_MEM | PTR_UNTRUSTEDEduard Zingerman1-1/+1
Commit f2362a57aeff ("bpf: allow void* cast using bpf_rdonly_cast()") added a notion of PTR_TO_MEM | MEM_RDONLY | PTR_UNTRUSTED type. This simultaneously introduced a bug in jump prediction logic for situations like below: p = bpf_rdonly_cast(..., 0); if (p) a(); else b(); Here verifier would wrongly predict that else branch cannot be taken. This happens because: - Function reg_not_null() assumes that PTR_TO_MEM w/o PTR_MAYBE_NULL flag cannot be zero. - The call to bpf_rdonly_cast() returns a rdonly_untrusted_mem value w/o PTR_MAYBE_NULL flag. Tracking of PTR_MAYBE_NULL flag for untrusted PTR_TO_MEM does not make sense, as the main purpose of the flag is to catch null pointer access errors. Such errors are not possible on load of PTR_UNTRUSTED values and verifier makes sure that PTR_UNTRUSTED can't be passed to helpers or kfuncs. Hence, modify reg_not_null() to assume that nullness of untrusted PTR_TO_MEM is not known. Fixes: f2362a57aeff ("bpf: allow void* cast using bpf_rdonly_cast()") Suggested-by: Alexei Starovoitov <alexei.starovoitov@gmail.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250702073620.897517-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
2025-07-02smp: Defer check for local execution in smp_call_function_many_cond()Yury Norov [NVIDIA]1-7/+3
Defer check for local execution to the actual place where it is needed, which removes the extra local variable. Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250623000010.10124-5-yury.norov@gmail.com
2025-07-02bpf: Mark cgroup_subsys_state->cgroup RCU safeSong Liu1-0/+5
Mark struct cgroup_subsys_state->cgroup as safe under RCU read lock. This will enable accessing css->cgroup from a bpf css iterator. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/20250623063854.1896364-4-song@kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-02bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's nodeSong Liu1-0/+3
BPF programs, such as LSM and sched_ext, would benefit from tags on cgroups. One common practice to apply such tags is to set xattrs on cgroupfs folders. Introduce kfunc bpf_cgroup_read_xattr, which allows reading cgroup's xattr. Note that, we already have bpf_get_[file|dentry]_xattr. However, these two APIs are not ideal for reading cgroupfs xattrs, because: 1) These two APIs only works in sleepable contexts; 2) There is no kfunc that matches current cgroup to cgroupfs dentry. bpf_cgroup_read_xattr is generic and can be useful for many program types. It is also safe, because it requires trusted or rcu protected argument (KF_RCU). Therefore, we make it available to all program types. Signed-off-by: Song Liu <song@kernel.org> Link: https://lore.kernel.org/20250623063854.1896364-3-song@kernel.org Acked-by: Tejun Heo <tj@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-02tracing: tprobe-events: Register tracepoint when enable tprobe eventMasami Hiramatsu (Google)1-165/+217
As same as fprobe, register tracepoint stub function only when enabling tprobe events. The major changes are introducing a list of tracepoint_user and its lock, and tprobe_event_module_nb, which is another module notifier for module loading/unloading. By spliting the lock from event_mutex and a module notifier for trace_fprobe, it solved AB-BA lock dependency issue between event_mutex and tracepoint_module_list_mutex. Link: https://lore.kernel.org/all/174343538901.843280.423773753642677941.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-02tracing: fprobe-events: Register fprobe-events only when it is enabledMasami Hiramatsu (Google)2-121/+121
Currently fprobe events are registered when it is defined. Thus it will give some overhead even if it is disabled. This changes it to register the fprobe only when it is enabled. Link: https://lore.kernel.org/all/174343537128.843280.16131300052837035043.stgit@devnote2/ Suggested-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-02tracing: tprobe-events: Support multiple tprobes on the same tracepointMasami Hiramatsu (Google)1-50/+201
Allow user to set multiple tracepoint-probe events on the same tracepoint. After the last tprobe-event is removed, the tracepoint callback is unregistered. Link: https://lore.kernel.org/all/174343536245.843280.6548776576601537671.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-02tracing: tprobe-events: Remove mod field from tprobe-eventMasami Hiramatsu (Google)1-14/+9
Remove unneeded 'mod' struct module pointer field from trace_fprobe because we don't need to save this info. Link: https://lore.kernel.org/all/174343535351.843280.5868426549023332120.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-02tracing: probe-events: Cleanup entry-arg storing codeMasami Hiramatsu (Google)1-59/+69
Cleanup __store_entry_arg() so that it is easier to understand. The main complexity may come from combining the loops for finding stored-entry-arg and max-offset and appending new entry. This split those different loops into 3 parts, lookup the same entry-arg, find the max offset and append new entry. Link: https://lore.kernel.org/all/174323039929.348535.4705349977127704120.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-07-01bpf: Reject %p% format string in bprintf-like helpersPaul Chaignon1-3/+8
static const char fmt[] = "%p%"; bpf_trace_printk(fmt, sizeof(fmt)); The above BPF program isn't rejected and causes a kernel warning at runtime: Please remove unsupported %\x00 in format string WARNING: CPU: 1 PID: 7244 at lib/vsprintf.c:2680 format_decode+0x49c/0x5d0 This happens because bpf_bprintf_prepare skips over the second %, detected as punctuation, while processing %p. This patch fixes it by not skipping over punctuation. %\x00 is then processed in the next iteration and rejected. Reported-by: syzbot+e2c932aec5c8a6e1d31c@syzkaller.appspotmail.com Fixes: 48cac3f4a96d ("bpf: Implement formatted output helpers with bstr_printf") Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/a0e06cc479faec9e802ae51ba5d66420523251ee.1751395489.git.paul.chaignon@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01bpf: Warn on internal verifier errorsPaul Chaignon1-109/+102
This patch is a follow up to commit 1cb0f56d9618 ("bpf: WARN_ONCE on verifier bugs"). It generalizes the use of verifier_error throughout the verifier, in particular for logs previously marked "verifier internal error". As a consequence, all of those verifier bugs will now come with a kernel warning (under CONFIG_DBEUG_KERNEL) detectable by fuzzers. While at it, some error messages that were too generic (ex., "bpf verifier is misconfigured") have been reworded. Signed-off-by: Paul Chaignon <paul.chaignon@gmail.com> Link: https://lore.kernel.org/r/aGQqnzMyeagzgkCK@Tunnel Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-07-01lib/group_cpus: Let group_cpu_evenly() return the number of initialized masksDaniel Wagner1-6/+5
group_cpu_evenly() might have allocated less groups then requested: group_cpu_evenly() __group_cpus_evenly() alloc_nodes_groups() # allocated total groups may be less than numgrps when # active total CPU number is less then numgrps In this case, the caller will do an out of bound access because the caller assumes the masks returned has numgrps. Return the number of groups created so the caller can limit the access range accordingly. Acked-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Daniel Wagner <wagi@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20250617-isolcpus-queue-counters-v1-1-13923686b54b@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-07-01time/timecounter: Fix the lie that struct cyclecounter is constGreg Kroah-Hartman1-1/+1
In both the read callback for struct cyclecounter, and in struct timecounter, struct cyclecounter is declared as a const pointer. Unfortunatly, a number of users of this pointer treat it as a non-const pointer as it is burried in a larger structure that is heavily modified by the callback function when accessed. This lie had been hidden by the fact that container_of() "casts away" a const attribute of a pointer without any compiler warning happening at all. Fix this all up by removing the const attribute in the needed places so that everyone can see that the structure really isn't const, but can, and is, modified by the users of it. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/2025070124-backyard-hurt-783a@gregkh
2025-07-01sched/core: Fix migrate_swap() vs. hotplugPeter Zijlstra2-10/+15
On Mon, Jun 02, 2025 at 03:22:13PM +0800, Kuyo Chang wrote: > So, the potential race scenario is: > > CPU0 CPU1 > // doing migrate_swap(cpu0/cpu1) > stop_two_cpus() > ... > // doing _cpu_down() > sched_cpu_deactivate() > set_cpu_active(cpu, false); > balance_push_set(cpu, true); > cpu_stop_queue_two_works > __cpu_stop_queue_work(stopper1,...); > __cpu_stop_queue_work(stopper2,..); > stop_cpus_in_progress -> true > preempt_enable(); > ... > 1st balance_push > stop_one_cpu_nowait > cpu_stop_queue_work > __cpu_stop_queue_work > list_add_tail -> 1st add push_work > wake_up_q(&wakeq); -> "wakeq is empty. > This implies that the stopper is at wakeq@migrate_swap." > preempt_disable > wake_up_q(&wakeq); > wake_up_process // wakeup migrate/0 > try_to_wake_up > ttwu_queue > ttwu_queue_cond ->meet below case > if (cpu == smp_processor_id()) > return false; > ttwu_do_activate > //migrate/0 wakeup done > wake_up_process // wakeup migrate/1 > try_to_wake_up > ttwu_queue > ttwu_queue_cond > ttwu_queue_wakelist > __ttwu_queue_wakelist > __smp_call_single_queue > preempt_enable(); > > 2nd balance_push > stop_one_cpu_nowait > cpu_stop_queue_work > __cpu_stop_queue_work > list_add_tail -> 2nd add push_work, so the double list add is detected > ... > ... > cpu1 get ipi, do sched_ttwu_pending, wakeup migrate/1 > So this balance_push() is part of schedule(), and schedule() is supposed to switch to stopper task, but because of this race condition, stopper task is stuck in WAKING state and not actually visible to be picked. Therefore CPU1 can do another schedule() and end up doing another balance_push() even though the last one hasn't been done yet. This is a confluence of fail, where both wake_q and ttwu_wakelist can cause crucial wakeups to be delayed, resulting in the malfunction of balance_push. Since there is only a single stopper thread to be woken, the wake_q doesn't really add anything here, and can be removed in favour of direct wakeups of the stopper thread. Then add a clause to ttwu_queue_cond() to ensure the stopper threads are never queued / delayed. Of all 3 moving parts, the last addition was the balance_push() machinery, so pick that as the point the bug was introduced. Fixes: 2558aacff858 ("sched/hotplug: Ensure only per-cpu kthreads run during hotplug") Reported-by: Kuyo Chang <kuyo.chang@mediatek.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Kuyo Chang <kuyo.chang@mediatek.com> Link: https://lkml.kernel.org/r/20250605100009.GO39944@noisy.programming.kicks-ass.net
2025-07-01sched: Fix preemption string of preempt_dynamic_noneThomas Weißschuh1-1/+1
Zero is a valid value for "preempt_dynamic_mode", namely "preempt_dynamic_none". Fix the off-by-one in preempt_model_str(), so that "preempty_dynamic_none" is correctly formatted as PREEMPT(none) instead of PREEMPT(undef). Fixes: 8bdc5daaa01e ("sched: Add a generic function to return the preemption string") Signed-off-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com> Cc: stable@vger.kernel.org Link: https://lkml.kernel.org/r/20250626-preempt-str-none-v2-1-526213b70a89@linutronix.de
2025-06-30Merge tag 'entry-split-for-arm' into core/entryThomas Gleixner4-117/+119
Prerequisite for ARM[64] generic entry conversion Merge it into the entry branch so further changes can be based on it.
2025-06-30entry: Split generic entry into generic exception and syscall entryJinjie Ruan4-117/+119
Currently CONFIG_GENERIC_ENTRY enables both the generic exception entry logic and the generic syscall entry logic, which are otherwise loosely coupled. Introduce separate config options for these so that architectures can select the two independently. This will make it easier for architectures to migrate to generic entry code. Suggested-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Walleij <linus.walleij@linaro.org> Link: https://lore.kernel.org/20250213130007.1418890-2-ruanjinjie@huawei.com Link: https://lore.kernel.org/all/20250624-generic-entry-split-v1-1-53d5ef4f94df@linaro.org [Linus Walleij: rebase onto v6.16-rc1]
2025-06-30perf/core: Fix the WARN_ON_ONCE is out of lock protected regionLuo Gengkun1-2/+2
commit 3172fb986666 ("perf/core: Fix WARN in perf_cgroup_switch()") try to fix a concurrency problem between perf_cgroup_switch and perf_cgroup_event_disable. But it does not to move the WARN_ON_ONCE into lock-protected region, so the warning is still be triggered. Fixes: 3172fb986666 ("perf/core: Fix WARN in perf_cgroup_switch()") Signed-off-by: Luo Gengkun <luogengkun@huaweicloud.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/20250626135403.2454105-1-luogengkun@huaweicloud.com
2025-06-29Merge tag 'perf_urgent_for_v6.16_rc4' of ↵Linus Torvalds2-5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Make sure an AUX perf event is really disabled when it overruns * tag 'perf_urgent_for_v6.16_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/aux: Fix pending disable flow when the AUX ring buffer overruns
2025-06-28Merge tag 'trace-v6.16-rc3' of ↵Linus Torvalds1-7/+7
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: - Fix possible UAF on error path in filter_free_subsystem_filters() When freeing a subsystem filter, the filter for the subsystem is passed in to be freed and all the events within the subsystem will have their filter freed too. In order to free without waiting for RCU synchronization, list items are allocated to hold what is going to be freed to free it via a call_rcu(). If the allocation of these items fails, it will call the synchronization directly and free after that (causing a bit of delay for the user). The subsystem filter is first added to this list and then the filters for all the events under the subsystem. The bug is if one of the allocations of the list items for the event filters fail to allocate, it jumps to the "free_now" label which will free the subsystem filter, then all the items on the allocated list, and then the event filters that were not added to the list yet. But because the subsystem filter was added first, it gets freed twice. The solution is to add the subsystem filter after the events, and then if any of the allocations fail it will not try to free any of them twice * tag 'trace-v6.16-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix filter logic error
2025-06-27Merge tag 'mm-hotfixes-stable-2025-06-27-16-56' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "16 hotfixes. 6 are cc:stable and the remainder address post-6.15 issues or aren't considered necessary for -stable kernels. 5 are for MM" * tag 'mm-hotfixes-stable-2025-06-27-16-56' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: MAINTAINERS: add Lorenzo as THP co-maintainer mailmap: update Duje Mihanović's email address selftests/mm: fix validate_addr() helper crashdump: add CONFIG_KEYS dependency mailmap: correct name for a historical account of Zijun Hu mailmap: add entries for Zijun Hu fuse: fix runtime warning on truncate_folio_batch_exceptionals() scripts/gdb: fix dentry_name() lookup mm/damon/sysfs-schemes: free old damon_sysfs_scheme_filter->memcg_path on write mm/alloc_tag: fix the kmemleak false positive issue in the allocation of the percpu variable tag->counters lib/group_cpus: fix NULL pointer dereference from group_cpus_evenly() mm/hugetlb: remove unnecessary holding of hugetlb_lock MAINTAINERS: add missing files to mm page alloc section MAINTAINERS: add tree entry to mm init block mm: add OOM killer maintainer structure fs/proc/task_mmu: fix PAGE_IS_PFNZERO detection for the huge zero folio
2025-06-27tracing: Fix filter logic errorEdward Adam Davis1-7/+7
If the processing of the tr->events loop fails, the filter that has been added to filter_head will be released twice in free_filter_list(&head->rcu) and __free_filter(filter). After adding the filter of tr->events, add the filter to the filter_head process to avoid triggering uaf. Link: https://lore.kernel.org/tencent_4EF87A626D702F816CD0951CE956EC32CD0A@qq.com Fixes: a9d0aab5eb33 ("tracing: Fix regression of filter waiting a long time on RCU synchronization") Reported-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=daba72c4af9915e9c894 Tested-by: syzbot+daba72c4af9915e9c894@syzkaller.appspotmail.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Edward Adam Davis <eadavis@qq.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-06-27bpf: guard BTF_ID_FLAGS(bpf_cgroup_read_xattr) with CONFIG_BPF_LSMEduard Zingerman1-1/+1
Function bpf_cgroup_read_xattr is defined in fs/bpf_fs_kfuncs.c, which is compiled only when CONFIG_BPF_LSM is set. Add CONFIG_BPF_LSM check to bpf_cgroup_read_xattr spec in common_btf_ids in kernel/bpf/helpers.c to avoid build failures for configs w/o CONFIG_BPF_LSM. Build failure example: BTF .tmp_vmlinux1.btf.o btf_encoder__tag_kfunc: failed to find kfunc 'bpf_cgroup_read_xattr' in BTF ... WARN: resolve_btfids: unresolved symbol bpf_cgroup_read_xattr make[2]: *** [scripts/Makefile.vmlinux:91: vmlinux.unstripped] Error 255 Fixes: 535b070f4a80 ("bpf: Introduce bpf_cgroup_read_xattr to read xattr of cgroup's node") Reported-by: Jake Hillion <jakehillion@meta.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20250627175309.2710973-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-27timekeeping: Provide interface to control auxiliary clocksThomas Gleixner1-0/+116
Auxiliary clocks are disabled by default and attempts to access them fail. Provide an interface to enable/disable them at run-time. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.444626478@linutronix.de
2025-06-27timekeeping: Provide update for auxiliary timekeepersThomas Gleixner1-0/+19
Update the auxiliary timekeepers periodically. For now this is tied to the system timekeeper update from the tick. This might be revisited and moved out of the tick. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.382451331@linutronix.de
2025-06-27timekeeping: Provide adjtimex() for auxiliary clocksThomas Gleixner1-0/+16
The behaviour is close to clock_adtime(CLOCK_REALTIME) with the following differences: 1) ADJ_SETOFFSET adjusts the auxiliary clock offset 2) ADJ_TAI is not supported 3) Leap seconds are not supported 4) PPS is not supported Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.317946543@linutronix.de
2025-06-27timekeeping: Prepare do_adtimex() for auxiliary clocksThomas Gleixner1-3/+33
Exclude ADJ_TAI, leap seconds and PPS functionality as they make no sense in the context of auxiliary clocks and provide a time stamp based on the actual clock. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.253203783@linutronix.de
2025-06-27timekeeping: Make do_adjtimex() reusableThomas Gleixner1-46/+56
Split out the actual functionality of adjtimex() and make do_adjtimex() a wrapper which feeds the core timekeeper into it and handles the result including audit at the call site. This allows to reuse the actual functionality for auxiliary clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.187322876@linutronix.de
2025-06-27timekeeping: Add auxiliary clock support to __timekeeping_inject_offset()Thomas Gleixner1-8/+31
Redirect the relative offset adjustment to the auxiliary clock offset instead of modifying CLOCK_REALTIME, which has no meaning in context of these clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.124057787@linutronix.de
2025-06-27timekeeping: Make timekeeping_inject_offset() reusableThomas Gleixner1-11/+15
Split out the inner workings for auxiliary clock support and feed the core time keeper into it. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183758.059934561@linutronix.de
2025-06-27timekeeping: Provide time setter for auxiliary clocksThomas Gleixner1-0/+44
Add clock_settime(2) support for auxiliary clocks. The function affects the AUX offset which is added to the "monotonic" clock readout of these clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.995688714@linutronix.de
2025-06-27timekeeping: Add minimal posix-timers support for auxiliary clocksThomas Gleixner3-0/+25
Provide clock_getres(2) and clock_gettime(2) for auxiliary clocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.932220594@linutronix.de
2025-06-27timekeeping: Provide time getters for auxiliary clocksThomas Gleixner1-0/+65
Provide interfaces similar to the ktime_get*() family which provide access to the auxiliary clocks. These interfaces have a boolean return value, which indicates whether the accessed clock is valid or not. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.868342628@linutronix.de
2025-06-27timekeeping: Update auxiliary timekeepers on clocksource changeThomas Gleixner1-0/+33
Propagate a system clocksource change to the auxiliary timekeepers so that they can pick up the new clocksource. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: John Stultz <jstultz@google.com> Link: https://lore.kernel.org/all/20250625183757.803890875@linutronix.de
2025-06-27bpf: Fix string kfuncs names in doc commentsViktor Malik1-3/+3
Documentation comments for bpf_strnlen and bpf_strcspn contained incorrect function names. Fixes: e91370550f1f ("bpf: Add kfuncs for read-only string operations") Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Closes: https://lore.kernel.org/bpf/20250627174759.3a435f86@canb.auug.org.au/T/#u Signed-off-by: Viktor Malik <vmalik@redhat.com> Link: https://lore.kernel.org/r/20250627082001.237606-1-vmalik@redhat.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26Merge branch 'vfs-6.17.bpf' of ↵Alexei Starovoitov2-0/+8
https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Merge branch 'vfs-6.17.bpf' from vfs tree into bpf-next/master and resolve conflicts. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2025-06-26smp: Use cpumask_any_but() in smp_call_function_many_cond()Yury Norov [NVIDIA]1-6/+1
smp_call_function_many_cond() opencodes cpumask_any_but(). Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250623000010.10124-3-yury.norov@gmail.com
2025-06-26smp: Improve locality in smp_call_function_any()Yury Norov [NVIDIA]1-16/+3
smp_call_function_any() tries to make a local call as it's the cheapest option, or switches to a CPU in the same node. If it's not possible, the algorithm gives up and searches for any CPU, in a numerical order. Instead, it can search for the best CPU based on NUMA locality, including the 2nd nearest hop (a set of equidistant nodes), and higher. sched_numa_find_nth_cpu() does exactly that, and also helps to drop most of the housekeeping code. Signed-off-by: Yury Norov [NVIDIA] <yury.norov@gmail.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250623000010.10124-2-yury.norov@gmail.com
2025-06-26PM: Restrict swap use to later in the suspend sequenceMario Limonciello4-10/+2
Currently swap is restricted before drivers have had a chance to do their prepare() PM callbacks. Restricting swap this early means that if a driver needs to evict some content from memory into sawp in it's prepare callback, it won't be able to. On AMD dGPUs this can lead to failed suspends under memory pressure situations as all VRAM must be evicted to system memory or swap. Move the swap restriction to right after all devices have had a chance to do the prepare() callback. If there is any problem with the sequence, restore swap in the appropriate dpm resume callbacks or error handling paths. Closes: https://github.com/ROCm/ROCK-Kernel-Driver/issues/174 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2362 Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Tested-by: Nat Wittstock <nat@fardog.io> Tested-by: Lucian Langa <lucilanga@7pot.org> Link: https://patch.msgid.link/20250613214413.4127087-1-superm1@kernel.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>