aboutsummaryrefslogtreecommitdiff
path: root/kernel/trace
AgeCommit message (Collapse)AuthorFilesLines
2026-04-01rv: Allow epoll in rtapp-sleep monitorNam Cao2-46/+60
Since commit 0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock"), epoll_wait is real-time-safe syscall for sleeping. Add epoll_wait to the list of rt-safe sleeping APIs. Signed-off-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Gabriele Monaco <gmonaco@redhat.com> Link: https://lore.kernel.org/r/20260401130828.3115428-1-namcao@linutronix.de Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31tracing: Remove duplicate latency_fsnotify() stubSteven Rostedt2-3/+2
When the SNAPSHOT is defined but FSNOTIFY is not the latency_fsnotify() function is turned into a static inline stub. But this stub was defined in both trace.h and trace_snapshot.c causing a error in build when CONFIG_SNAPSHOT is defined but FSNOTIFY is not. The stub is not needed in trace_snapshot.c as it will be defined in trace.h, remove it from the C file. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260330205859.24c0aae3@gandalf.local.home Fixes: bade44fe5462 ("tracing: Move snapshot code out of trace.c and into trace_snapshot.c") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202603310604.lGE9LDBK-lkp@intel.com/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-31tracing: Preserve repeated trace_trigger boot parametersWesley Atwell1-3/+10
trace_trigger= tokenizes bootup_trigger_buf in place and stores pointers into that buffer for later trigger registration. Repeated trace_trigger= parameters overwrite the buffer contents from earlier calls, leaving only the last set of parsed event and trigger strings. Keep each new trace_trigger= string at the end of bootup_trigger_buf and parse only the appended range. That preserves the earlier event and trigger strings while still letting repeated parameters queue additional boot-time triggers. This also lets Bootconfig array values work naturally when they expand to repeated trace_trigger= entries. Before this change, only the last trace_trigger= instance survived boot. Link: https://patch.msgid.link/20260330181103.1851230-2-atwellwea@gmail.com Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-31tracing: Append repeated boot-time tracing parametersWesley Atwell4-6/+42
Some tracing boot parameters already accept delimited value lists, but their __setup() handlers keep only the last instance seen at boot. Make repeated instances append to the same boot-time buffer in the format each parser already consumes. Use a shared trace_append_boot_param() helper for the ftrace filters, trace_options, and kprobe_event boot parameters. This also lets Bootconfig array values work naturally when they expand to repeated param=value entries. Before this change, only the last instance from each repeated parameter survived boot. Link: https://patch.msgid.link/20260330181103.1851230-1-atwellwea@gmail.com Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-31rv: Add nomiss deadline monitorGabriele Monaco10-0/+713
Add the deadline monitors collection to validate the deadline scheduler, both for deadline tasks and servers. The currently implemented monitors are: * nomiss: validate dl entities run to completion before their deadiline Reviewed-by: Nam Cao <namcao@linutronix.de> Reviewed-by: Juri Lelli <juri.lelli@redhat.com> Link: https://lore.kernel.org/r/20260330111010.153663-13-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31rv: Convert the opid monitor to a hybrid automatonGabriele Monaco5-153/+61
The opid monitor validates that wakeup and need_resched events only occur with interrupts and preemption disabled by following the preemptirq tracepoints. As reported in [1], those tracepoints might be inaccurate in some situations (e.g. NMIs). Since the monitor doesn't validate other ordering properties, remove the dependency on preemptirq tracepoints and convert the monitor to a hybrid automaton to validate the constraint during event handling. This makes the monitor more robust by also removing the workaround for interrupts missing the preemption tracepoints, which was working on PREEMPT_RT only and allows the monitor to be built on kernels without the preemptirqs tracepoints. [1] - https://lore.kernel.org/lkml/20250625120823.60600-1-gmonaco@redhat.com Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-8-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31rv: Add sample hybrid monitor stallGabriele Monaco7-0/+266
Add a sample monitor to showcase hybrid/timed automata. The stall monitor identifies tasks stalled for longer than a threshold and reacts when that happens. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-7-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-31rv: Add Hybrid Automata monitor typeGabriele Monaco2-0/+76
Deterministic automata define which events are allowed in every state, but cannot define more sophisticated constraint taking into account the system's environment (e.g. time or other states not producing events). Add the Hybrid Automata monitor type as an extension of Deterministic automata where each state transition is validating a constraint on a finite number of environment variables. Hybrid automata can be used to implement timed automata, where the environment variables are clocks. Also implement the necessary functionality to handle clock constraints (ns or jiffy granularity) on state and events. Reviewed-by: Nam Cao <namcao@linutronix.de> Link: https://lore.kernel.org/r/20260330111010.153663-3-gmonaco@redhat.com Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
2026-03-28tracing: Remove spurious default precision from show_event_trigger/filter ↵David Laight1-2/+2
formats Change 2d8b7f9bf8e6e ("tracing: Have show_event_trigger/filter format a bit more in columns") added space padding to align the output. However it used ("%*.s", len, "") which requests the default precision. It doesn't matter here whether the userspace default (0) or kernel default (no precision) is used, but the format should be "%*s". Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Andrew Morton <akpm@linux-foundation.org> Link: https://patch.msgid.link/20260326201824.3919-1-david.laight.linux@gmail.com Signed-off-by: David Laight <david.laight.linux@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reviewed-by: Petr Mladek <pmladek@suse.com> Reviewed-by: Aaron Tomlin <atomlin@atomlin.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-28tracing: Remove tracing_alloc_snapshot() when snapshot isn't definedSteven Rostedt1-7/+0
The function tracing_alloc_snapshot() is only used between trace.c and trace_snapshot.c. When snapshot isn't configured, it's not used at all. The stub function was defined as a global with no users and no prototype causing build issues. Remove the function when snapshot isn't configured as nothing is calling it. Also remove the EXPORT_SYMBOL_GPL() that was associated with it as it's not used outside of the tracing subsystem which also includes any modules. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260328101946.2c4ef4a5@robin Reported-by: Mark Brown <broonie@kernel.org> Closes: https://lore.kernel.org/all/acb-IuZ4vDkwwQLW@sirena.co.uk/ Fixes: bade44fe546212 (tracing: Move snapshot code out of trace.c and into trace_snapshot.c) Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-28tracing: Drain deferred trigger frees if kthread creation failsWesley Atwell1-13/+66
Boot-time trigger registration can fail before the trigger-data cleanup kthread exists. Deferring those frees until late init is fine, but the post-boot fallback must still drain the deferred list if kthread creation never succeeds. Otherwise, boot-deferred nodes can accumulate on trigger_data_free_list, later frees fall back to synchronously freeing only the current object, and the older queued entries are leaked forever. To trigger this, add the following to the kernel command line: trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon The second traceon trigger will fail and be freed. This triggers a NULL pointer dereference and crashes the kernel. Keep the deferred boot-time behavior, but when kthread creation fails, drain the whole queued list synchronously. Do the same in the late-init drain path so queued entries are not stranded there either. Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com Fixes: 61d445af0a7c ("tracing: Add bulk garbage collection of freeing event_trigger_data") Signed-off-by: Wesley Atwell <atwellwea@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-27tracing: Fix potential deadlock in cpu hotplug with osnoiseLuo Haiyang1-5/+5
The following sequence may leads deadlock in cpu hotplug: task1 task2 task3 ----- ----- ----- mutex_lock(&interface_lock) [CPU GOING OFFLINE] cpus_write_lock(); osnoise_cpu_die(); kthread_stop(task3); wait_for_completion(); osnoise_sleep(); mutex_lock(&interface_lock); cpus_read_lock(); [DEAD LOCK] Fix by swap the order of cpus_read_lock() and mutex_lock(&interface_lock). Cc: stable@vger.kernel.org Cc: <mathieu.desnoyers@efficios.com> Cc: <zhang.run@zte.com.cn> Cc: <yang.tao172@zte.com.cn> Cc: <ran.xiaokai@zte.com.cn> Fixes: bce29ac9ce0bb ("trace: Add osnoise tracer") Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-26tracing: Move snapshot code out of trace.c and into trace_snapshot.cSteven Rostedt4-1177/+1188
The trace.c file was a dumping ground for most tracing code. Start organizing it better by moving various functions out into their own files. Move all the snapshot code, including the max trace code into its own trace_snapshot.c file. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260324140145.36352d6a@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-24ring-buffer: Show what clock function is used on timestamp errorsSteven Rostedt1-4/+6
The testing for tracing was triggering a timestamp count issue that was always off by one. This has been happening for some time but has never been reported by anyone else. It was finally discovered to be an issue with the "uptime" (jiffies) clock that happened to be traced and the internal recursion caused the discrepancy. This would have been much easier to solve if the clock function being used was displayed when the error was detected. Add the clock function to the error output. Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260323202212.479bb288@gandalf.local.home Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-24Merge commit 'f35dbac6942171dc4ce9398d1d216a59224590a9' into ↵Steven Rostedt2-10/+28
trace/ring-buffer/core The commit f35dbac69421 ("ring-buffer: Fix to update per-subbuf entries of persistent ring buffer") was a fix and merged upstream. It is needed for some other work in the ring buffer. The current branch has the remote buffer code that is shared with the Arm64 subsystem and can't be rebased. Merge in the upstream commit to allow continuing of the ring buffer work. Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-25tracing: fprobe: do not zero out unused fgraph_dataMartin Kaiser1-2/+0
If fprobe_entry does not fill the allocated fgraph_data completely, the unused part does not have to be zeroed. fgraph_data is a short-lived part of the shadow stack. The preceding length field allows locating the end regardless of the content. Link: https://lore.kernel.org/all/20260324084804.375764-1-martin@kaiser.cx/ Signed-off-by: Martin Kaiser <martin@kaiser.cx> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2026-03-23tracing: Pretty-print enum parameters in function argumentsDonglin Peng1-1/+11
Currently, print_function_args() prints enum parameter values in decimal format, reducing trace log readability. Use BTF information to resolve enum parameters and print their symbolic names (where available). This improves readability by showing meaningful identifiers instead of raw numbers. Before: mod_memcg_lruvec_state(lruvec=0xffff..., idx=5, val=320) After: mod_memcg_lruvec_state(lruvec=0xffff..., idx=5 [NR_SLAB_RECLAIMABLE_B], val=320) Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Link: https://patch.msgid.link/20260209071949.4040193-1-dolinux.peng@gmail.com Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-23tracing: Free up file->private_data for use by individual eventsPetr Pavlu2-6/+0
The tracing_open_file_tr() function currently copies the trace_event_file pointer from inode->i_private to file->private_data when the file is successfully opened. This duplication is not particularly useful, as all event code should utilize event_file_file() or event_file_data() to retrieve a trace_event_file pointer from a file struct and these access functions read file->f_inode->i_private. Moreover, this setup requires the code for opening hist files to explicitly clear file->private_data before calling single_open(), since this function expects the private_data member to be set to NULL and uses it to store a pointer to a seq_file. Remove the unnecessary setting of file->private_data in tracing_open_file_tr() and simplify the hist code. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260219162737.314231-6-petr.pavlu@suse.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-23tracing: Clean up access to trace_event_file from a file pointerPetr Pavlu2-9/+14
The tracing code provides two functions event_file_file() and event_file_data() to obtain a trace_event_file pointer from a file struct. The primary method to use is event_file_file(), as it checks for the EVENT_FILE_FL_FREED flag to determine whether the event is being removed. The second function event_file_data() is an optimization for retrieving the same data when the event_mutex is still held. In the past, when removing an event directory in remove_event_file_dir(), the code set i_private to NULL for all event files and readers were expected to check for this state to recognize that the event is being removed. In the case of event_id_read(), the value was read using event_file_data() without acquiring the event_mutex. This required event_file_data() to use READ_ONCE() when retrieving the i_private data. With the introduction of eventfs, i_private is assigned when an eventfs inode is allocated and remains set throughout its lifetime. Remove the now unnecessary READ_ONCE() access to i_private in both event_file_file() and event_file_data(). Inline the access to i_private in remove_event_file_dir(), which allows event_file_data() to handle i_private solely as a trace_event_file pointer. Add a check in event_file_data() to ensure that the event_mutex is held and that file->flags doesn't have the EVENT_FILE_FL_FREED flag set. Finally, move event_file_data() immediately after event_file_code() since the latter provides a comment explaining how both functions should be used together. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260219162737.314231-5-petr.pavlu@suse.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-23tracing: Remove unnecessary check for EVENT_FILE_FL_FREEDPetr Pavlu1-6/+2
The event_filter_write() function calls event_file_file() to retrieve a trace_event_file associated with a given file struct. If a non-NULL pointer is returned, the function then checks whether the trace_event_file instance has the EVENT_FILE_FL_FREED flag set. This check is redundant because event_file_file() already performs this validation and returns NULL if the flag is set. The err value is also already initialized to -ENODEV. Remove the unnecessary check for EVENT_FILE_FL_FREED in event_filter_write(). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20260219162737.314231-4-petr.pavlu@suse.com Signed-off-by: Petr Pavlu <petr.pavlu@suse.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-23tracing: move __printf() attribute on __ftrace_vbprintk()Arnd Bergmann1-0/+1
The sunrpc change to use trace_printk() for debugging caused a new warning for every instance of dprintk() in some configurations, when -Wformat-security is enabled: fs/nfs/getroot.c: In function 'nfs_get_root': fs/nfs/getroot.c:90:17: error: format not a string literal and no format arguments [-Werror=format-security] 90 | nfs_errorf(fc, "NFS: Couldn't getattr on root"); I've been slowly chipping away at those warnings over time with the intention of enabling them by default in the future. While I could not figure out why this only happens for this one instance, I see that the __trace_bprintk() function is always called with a local variable as the format string, rather than a literal. Move the __printf(2,3) annotation on this function from the declaration to the caller. As this is can only be validated for literals, the attribute on the declaration causes the warnings every time, but removing it entirely introduces a new warning on the __ftrace_vbprintk() definition. The format strings still get checked because the underlying literal keeps getting passed into __trace_printk() in the "else" branch, which is not taken but still evaluated for compile-time warnings. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Anna Schumaker <anna.schumaker@oracle.com> Cc: Chuck Lever <chuck.lever@oracle.com> Cc: Simon Horman <horms@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Yury Norov <ynorov@nvidia.com> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://patch.msgid.link/20260203164545.3174910-1-arnd@kernel.org Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer") Acked-by: Jeff Layton <jlayton@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-23tracing: Adjust cmd_check_undefined to show unexpected undefined symbolsNathan Chancellor1-1/+7
When the check_undefined command in kernel/trace/Makefile fails, there is no output, making it hard to understand why the build failed. Capture the output of the $(NM) + grep command and print it when failing to make it clearer what the problem is. Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer") Signed-off-by: Nathan Chancellor <nathan@kernel.org> Reviewed-by: Vincent Donnefort <vdonnefort@google.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260320-cmd_check_undefined-verbose-v1-1-54fc5b061f94@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-21ftrace: Use hash argument for tmp_ops in update_ftrace_direct_modJiri Olsa1-2/+2
The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger the slow path for all direct callers to be able to safely modify attached addresses. At the moment we use ops->func_hash for tmp_ops filter, which represents all the systems attachments. It's faster to use just the passed hash filter, which contains only the modified sites and is always a subset of the ops->func_hash. Cc: Alexei Starovoitov <ast@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Menglong Dong <menglong8.dong@gmail.com> Cc: Song Liu <song@kernel.org> Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org Fixes: e93672f770d7 ("ftrace: Add update_ftrace_direct_mod function") Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21ring-buffer: Fix to update per-subbuf entries of persistent ring bufferMasami Hiramatsu (Google)1-1/+1
Since the validation loop in rb_meta_validate_events() updates the same cpu_buffer->head_page->entries, the other subbuf entries are not updated. Fix to use head_page to update the entries field, since it is the cursor in this loop. Cc: stable@vger.kernel.org Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Ian Rogers <irogers@google.com> Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events") Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21tracing: Fix trace_marker copy link list updatesSteven Rostedt1-9/+10
When the "copy_trace_marker" option is enabled for an instance, anything written into /sys/kernel/tracing/trace_marker is also copied into that instances buffer. When the option is set, that instance's trace_array descriptor is added to the marker_copies link list. This list is protected by RCU, as all iterations uses an RCU protected list traversal. When the instance is deleted, all the flags that were enabled are cleared. This also clears the copy_trace_marker flag and removes the trace_array descriptor from the list. The issue is after the flags are called, a direct call to update_marker_trace() is performed to clear the flag. This function returns true if the state of the flag changed and false otherwise. If it returns true here, synchronize_rcu() is called to make sure all readers see that its removed from the list. But since the flag was already cleared, the state does not change and the synchronization is never called, leaving a possible UAF bug. Move the clearing of all flags below the updating of the copy_trace_marker option which then makes sure the synchronization is performed. Also use the flag for checking the state in update_marker_trace() instead of looking at if the list is empty. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home Fixes: 7b382efd5e8a ("tracing: Allow the top level trace_marker to write into another instances") Reported-by: Sasha Levin <sashal@kernel.org> Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/ Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21tracing: Fix failure to read user space from system call trace eventsSteven Rostedt1-0/+17
The system call trace events call trace_user_fault_read() to read the user space part of some system calls. This is done by grabbing a per-cpu buffer, disabling migration, enabling preemption, calling copy_from_user(), disabling preemption, enabling migration and checking if the task was preempted while preemption was enabled. If it was, the buffer is considered corrupted and it tries again. There's a safety mechanism that will fail out of this loop if it fails 100 times (with a warning). That warning message was triggered in some pi_futex stress tests. Enabling the sched_switch trace event and traceoff_on_warning, showed the problem: pi_mutex_hammer-1375 [006] d..21 138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 pi_mutex_hammer-1375 [006] d..21 138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0 migration/6-47 [006] d..2. 138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95 What happened was the task 1375 was flagged to be migrated. When preemption was enabled, the migration thread woke up to migrate that task, but failed because migration for that task was disabled. This caused the loop to fail to exit because the task scheduled out while trying to read user space. Every time the task enabled preemption the migration thread would schedule in, try to migrate the task, fail and let the task continue. But because the loop would only enable preemption with migration disabled, it would always fail because each time it enabled preemption to read user space, the migration thread would try to migrate it. To solve this, when the loop fails to read user space without being scheduled out, enabled and disable preemption with migration enabled. This will allow the migration task to successfully migrate the task and the next loop should succeed to read user space without being scheduled out. Cc: stable@vger.kernel.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home Fixes: 64cf7d058a005 ("tracing: Have trace_marker use per-cpu data to read user space") Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-21Merge tag 'v7.0-rc4' into timers/core, to resolve conflictIngo Molnar8-28/+105
Resolve conflict between this change in the upstream kernel: 4c652a47722f ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline") ... and this pending change in timers/core: 0e98eb14814e ("entry: Prepare for deferred hrtimer rearming") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-03-18tracing: Restore accidentally removed SPDX tagMarc Zyngier1-1/+1
Restore the SPDX tag that was accidentally dropped. Fixes: 7e4b6c94300e3 ("tracing: add more symbols to whitelist") Reported-by: Nathan Chancellor <nathan@kernel.org> Cc: Arnd Bergmann <arnd@kernel.org> Cc: Vincent Donnefort <vdonnefort@google.com> Cc: Steven Rostedt <rostedt@goodmis.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260317194252.1890568-1-maz@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-17tracing: Generate undef symbols allowlist for simple_ring_bufferVincent Donnefort1-8/+38
Compiler and tooling-generated symbols are difficult to maintain across all supported architectures. Make the allowlist more robust by replacing the harcoded list with a mechanism that automatically detects these symbols. This mechanism generates a C function designed to trigger common compiler-inserted symbols. Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Reviewed-by: Nathan Chancellor <nathan@kernel.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260316092845.3367411-1-vdonnefort@google.com [maz: added __msan prefix to allowlist as pointed out by Arnd] Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-12tracing: add more symbols to whitelistArnd Bergmann1-1/+3
Randconfig builds show a number of cryptic build errors from hitting undefined symbols in simple_ring_buffer.o: make[7]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:147: kernel/trace/simple_ring_buffer.o.checked] Error 1 These happen with CONFIG_TRACE_BRANCH_PROFILING, CONFIG_KASAN_HW_TAGS, CONFIG_STACKPROTECTOR, CONFIG_DEBUG_IRQFLAGS and indirectly from WARN_ON(). Add exceptions for each one that I have hit so far on arm64, x86_64 and arm randconfig builds. Other architectures likely hit additional ones, so it would be nice to produce a little more verbose output that include the name of the missing symbols directly. Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer") Signed-off-by: Arnd Bergmann <arnd@arndb.de> Link: https://patch.msgid.link/20260312123601.625063-2-arnd@kernel.org Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-12tracing: Update undefined symbols allow list for simple_ring_bufferVincent Donnefort1-1/+2
Undefined symbols are not allowed for simple_ring_buffer.c. But some compiler emitted symbols are missing in the allowlist. Update it. Reported-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer") Closes: https://lore.kernel.org/all/20260311221816.GA316631@ax162/ Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20260312113535.2213350-1-vdonnefort@google.com Signed-off-by: Marc Zyngier <maz@kernel.org>
2026-03-12tracing: Use explicit array size instead of sentinel elements in symbol printingThomas Weißschuh (Schneider Electric)3-12/+15
The sentinel value added by the wrapper macros __print_symbolic() et al prevents the callers from adding their own trailing comma. This makes constructing symbol list dynamically based on kconfig values tedious. Drop the sentinel elements, so callers can either specify the trailing comma or not, just like in regular array initializers. Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-2-095357392669@linutronix.de
2026-03-09tracing: Check for undefined symbols in simple_ring_bufferVincent Donnefort1-0/+16
The simple_ring_buffer implementation must remain simple enough to be used by the pKVM hypervisor. Prevent the object build if unresolved symbols are found. Link: https://patch.msgid.link/20260309162516.2623589-19-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: load/unload page callbacks for simple_ring_bufferVincent Donnefort1-19/+72
Add load/unload callback used for each admitted page in the ring-buffer. This will be later useful for the pKVM hypervisor which uses a different VA space and need to dynamically map/unmap the ring-buffer pages. Link: https://patch.msgid.link/20260309162516.2623589-18-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Add a trace remote module for testingVincent Donnefort4-0/+281
Add a module to help testing the tracefs support for trace remotes. This module: * Use simple_ring_buffer to write into a ring-buffer. * Declare a single "selftest" event that can be triggered from user-space. * Register a "test" trace remote. This is intended to be used by trace remote selftests. Link: https://patch.msgid.link/20260309162516.2623589-15-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Introduce simple_ring_bufferVincent Donnefort3-0/+468
Add a simple implementation of the kernel ring-buffer. This intends to be used later by ring-buffer remotes such as the pKVM hypervisor, hence the need for a cut down version (write only) without any dependency. Link: https://patch.msgid.link/20260309162516.2623589-14-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09ring-buffer: Export buffer_data_page and macrosVincent Donnefort1-35/+1
In preparation for allowing the writing of ring-buffer compliant pages outside of ring_buffer.c, move buffer_data_page and timestamps encoding macros into the publicly available ring_buffer_types.h. Link: https://patch.msgid.link/20260309162516.2623589-13-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Add events/ root files to trace remotesVincent Donnefort2-2/+139
Just like for the kernel events directory, add 'enable', 'header_page' and 'header_event' at the root of the trace remote events/ directory. Link: https://patch.msgid.link/20260309162516.2623589-11-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Add events to trace remotesVincent Donnefort1-5/+259
An event is predefined point in the writer code that allows to log data. Following the same scheme as kernel events, add remote events, described to user-space within the events/ tracefs directory found in the corresponding trace remote. Remote events are expected to be described during the trace remote registration. Add also a .enable_event callback for trace_remote to toggle the event logging, if supported. Link: https://patch.msgid.link/20260309162516.2623589-10-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Add init callback to trace remotesVincent Donnefort1-1/+6
Add a .init call back so the trace remote callers can add entries to the tracefs directory. Link: https://patch.msgid.link/20260309162516.2623589-9-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Add non-consuming read to trace remotesVincent Donnefort3-16/+326
Allow reading the trace file for trace remotes. This performs a non-consuming read of the trace buffer. Link: https://patch.msgid.link/20260309162516.2623589-8-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Add reset to trace remotesVincent Donnefort1-0/+45
Allow to reset the trace remote buffer by writing to the Tracefs "trace" file. This is similar to the regular Tracefs interface. Link: https://patch.msgid.link/20260309162516.2623589-7-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09tracing: Introduce trace remotesVincent Donnefort5-1/+630
A trace remote relies on ring-buffer remotes to read and control compatible tracing buffers, written by entity such as firmware or hypervisor. Add a Tracefs directory remotes/ that contains all instances of trace remotes. Each instance follows the same hierarchy as any other to ease the support by existing user-space tools. This currently does not provide any event support, which will come later. Link: https://patch.msgid.link/20260309162516.2623589-6-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09ring-buffer: Add non-consuming read for ring-buffer remotesVincent Donnefort1-6/+69
Hopefully, the remote will only swap pages on the kernel instruction (via the swap_reader_page() callback). This means we know at what point the ring-buffer geometry has changed. It is therefore possible to rearrange the kernel view of that ring-buffer to allow non-consuming read. Link: https://patch.msgid.link/20260309162516.2623589-5-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09ring-buffer: Introduce ring-buffer remotesVincent Donnefort1-8/+225
Add ring-buffer remotes to support entities outside of the kernel (such as firmware or a hypervisor) that writes events into a ring-buffer using the tracefs format Require a description of the ring-buffer pages (struct trace_buffer_desc) and callbacks (swap_reader_page and reset) to set up the ring-buffer on the kernel side. Expect the remote entity to maintain and update the meta-page. Link: https://patch.msgid.link/20260309162516.2623589-4-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09ring-buffer: Store bpage pointers into subbuf_idsVincent Donnefort1-8/+11
The subbuf_ids field allows to point to a specific page from the ring-buffer based on its ID. As a preparation or the upcoming ring-buffer remote support, point this array to the buffer_page instead of the buffer_data_page. Link: https://patch.msgid.link/20260309162516.2623589-3-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-03-09ring-buffer: Add page statistics to the meta-pageVincent Donnefort1-0/+2
Add two fields pages_touched and pages_lost to the ring-buffer meta-page. Those fields are useful to get the number of used pages in the ring-buffer. Link: https://patch.msgid.link/20260309162516.2623589-2-vdonnefort@google.com Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Vincent Donnefort <vdonnefort@google.com> Signed