| Age | Commit message (Collapse) | Author | Files | Lines |
|
Since commit 0c43094f8cc9 ("eventpoll: Replace rwlock with spinlock"),
epoll_wait is real-time-safe syscall for sleeping.
Add epoll_wait to the list of rt-safe sleeping APIs.
Signed-off-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Gabriele Monaco <gmonaco@redhat.com>
Link: https://lore.kernel.org/r/20260401130828.3115428-1-namcao@linutronix.de
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
|
|
When the SNAPSHOT is defined but FSNOTIFY is not the latency_fsnotify()
function is turned into a static inline stub. But this stub was defined in
both trace.h and trace_snapshot.c causing a error in build when
CONFIG_SNAPSHOT is defined but FSNOTIFY is not. The stub is not needed in
trace_snapshot.c as it will be defined in trace.h, remove it from the C
file.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260330205859.24c0aae3@gandalf.local.home
Fixes: bade44fe5462 ("tracing: Move snapshot code out of trace.c and into trace_snapshot.c")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202603310604.lGE9LDBK-lkp@intel.com/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
trace_trigger= tokenizes bootup_trigger_buf in place and stores pointers
into that buffer for later trigger registration. Repeated trace_trigger=
parameters overwrite the buffer contents from earlier calls, leaving
only the last set of parsed event and trigger strings.
Keep each new trace_trigger= string at the end of bootup_trigger_buf and
parse only the appended range. That preserves the earlier event and
trigger strings while still letting repeated parameters queue additional
boot-time triggers.
This also lets Bootconfig array values work naturally when they expand
to repeated trace_trigger= entries.
Before this change, only the last trace_trigger= instance survived boot.
Link: https://patch.msgid.link/20260330181103.1851230-2-atwellwea@gmail.com
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Some tracing boot parameters already accept delimited value lists, but
their __setup() handlers keep only the last instance seen at boot.
Make repeated instances append to the same boot-time buffer in the
format each parser already consumes.
Use a shared trace_append_boot_param() helper for the ftrace filters,
trace_options, and kprobe_event boot parameters.
This also lets Bootconfig array values work naturally when they expand
to repeated param=value entries.
Before this change, only the last instance from each repeated
parameter survived boot.
Link: https://patch.msgid.link/20260330181103.1851230-1-atwellwea@gmail.com
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add the deadline monitors collection to validate the deadline scheduler,
both for deadline tasks and servers.
The currently implemented monitors are:
* nomiss:
validate dl entities run to completion before their deadiline
Reviewed-by: Nam Cao <namcao@linutronix.de>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://lore.kernel.org/r/20260330111010.153663-13-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
|
|
The opid monitor validates that wakeup and need_resched events only
occur with interrupts and preemption disabled by following the
preemptirq tracepoints.
As reported in [1], those tracepoints might be inaccurate in some
situations (e.g. NMIs).
Since the monitor doesn't validate other ordering properties, remove the
dependency on preemptirq tracepoints and convert the monitor to a hybrid
automaton to validate the constraint during event handling.
This makes the monitor more robust by also removing the workaround for
interrupts missing the preemption tracepoints, which was working on
PREEMPT_RT only and allows the monitor to be built on kernels without
the preemptirqs tracepoints.
[1] - https://lore.kernel.org/lkml/20250625120823.60600-1-gmonaco@redhat.com
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260330111010.153663-8-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
|
|
Add a sample monitor to showcase hybrid/timed automata.
The stall monitor identifies tasks stalled for longer than a threshold
and reacts when that happens.
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260330111010.153663-7-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
|
|
Deterministic automata define which events are allowed in every state,
but cannot define more sophisticated constraint taking into account the
system's environment (e.g. time or other states not producing events).
Add the Hybrid Automata monitor type as an extension of Deterministic
automata where each state transition is validating a constraint on a
finite number of environment variables.
Hybrid automata can be used to implement timed automata, where the
environment variables are clocks.
Also implement the necessary functionality to handle clock constraints
(ns or jiffy granularity) on state and events.
Reviewed-by: Nam Cao <namcao@linutronix.de>
Link: https://lore.kernel.org/r/20260330111010.153663-3-gmonaco@redhat.com
Signed-off-by: Gabriele Monaco <gmonaco@redhat.com>
|
|
formats
Change 2d8b7f9bf8e6e ("tracing: Have show_event_trigger/filter format a bit more in columns")
added space padding to align the output.
However it used ("%*.s", len, "") which requests the default precision.
It doesn't matter here whether the userspace default (0) or kernel
default (no precision) is used, but the format should be "%*s".
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: https://patch.msgid.link/20260326201824.3919-1-david.laight.linux@gmail.com
Signed-off-by: David Laight <david.laight.linux@gmail.com>
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Reviewed-by: Petr Mladek <pmladek@suse.com>
Reviewed-by: Aaron Tomlin <atomlin@atomlin.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The function tracing_alloc_snapshot() is only used between trace.c and
trace_snapshot.c. When snapshot isn't configured, it's not used at all.
The stub function was defined as a global with no users and no prototype
causing build issues.
Remove the function when snapshot isn't configured as nothing is calling
it.
Also remove the EXPORT_SYMBOL_GPL() that was associated with it as it's
not used outside of the tracing subsystem which also includes any modules.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260328101946.2c4ef4a5@robin
Reported-by: Mark Brown <broonie@kernel.org>
Closes: https://lore.kernel.org/all/acb-IuZ4vDkwwQLW@sirena.co.uk/
Fixes: bade44fe546212 (tracing: Move snapshot code out of trace.c and into trace_snapshot.c)
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Boot-time trigger registration can fail before the trigger-data cleanup
kthread exists. Deferring those frees until late init is fine, but the
post-boot fallback must still drain the deferred list if kthread
creation never succeeds.
Otherwise, boot-deferred nodes can accumulate on
trigger_data_free_list, later frees fall back to synchronously freeing
only the current object, and the older queued entries are leaked
forever.
To trigger this, add the following to the kernel command line:
trace_event=sched_switch trace_trigger=sched_switch.traceon,sched_switch.traceon
The second traceon trigger will fail and be freed. This triggers a NULL
pointer dereference and crashes the kernel.
Keep the deferred boot-time behavior, but when kthread creation fails,
drain the whole queued list synchronously. Do the same in the late-init
drain path so queued entries are not stranded there either.
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20260324221326.1395799-3-atwellwea@gmail.com
Fixes: 61d445af0a7c ("tracing: Add bulk garbage collection of freeing event_trigger_data")
Signed-off-by: Wesley Atwell <atwellwea@gmail.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The following sequence may leads deadlock in cpu hotplug:
task1 task2 task3
----- ----- -----
mutex_lock(&interface_lock)
[CPU GOING OFFLINE]
cpus_write_lock();
osnoise_cpu_die();
kthread_stop(task3);
wait_for_completion();
osnoise_sleep();
mutex_lock(&interface_lock);
cpus_read_lock();
[DEAD LOCK]
Fix by swap the order of cpus_read_lock() and mutex_lock(&interface_lock).
Cc: stable@vger.kernel.org
Cc: <mathieu.desnoyers@efficios.com>
Cc: <zhang.run@zte.com.cn>
Cc: <yang.tao172@zte.com.cn>
Cc: <ran.xiaokai@zte.com.cn>
Fixes: bce29ac9ce0bb ("trace: Add osnoise tracer")
Link: https://patch.msgid.link/20260326141953414bVSj33dAYktqp9Oiyizq8@zte.com.cn
Reviewed-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Luo Haiyang <luo.haiyang@zte.com.cn>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The trace.c file was a dumping ground for most tracing code. Start
organizing it better by moving various functions out into their own files.
Move all the snapshot code, including the max trace code into its own
trace_snapshot.c file.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260324140145.36352d6a@gandalf.local.home
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The testing for tracing was triggering a timestamp count issue that was
always off by one. This has been happening for some time but has never
been reported by anyone else. It was finally discovered to be an issue
with the "uptime" (jiffies) clock that happened to be traced and the
internal recursion caused the discrepancy. This would have been much
easier to solve if the clock function being used was displayed when the
error was detected.
Add the clock function to the error output.
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260323202212.479bb288@gandalf.local.home
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
trace/ring-buffer/core
The commit f35dbac69421 ("ring-buffer: Fix to update per-subbuf entries of
persistent ring buffer") was a fix and merged upstream. It is needed for
some other work in the ring buffer. The current branch has the remote
buffer code that is shared with the Arm64 subsystem and can't be rebased.
Merge in the upstream commit to allow continuing of the ring buffer work.
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
If fprobe_entry does not fill the allocated fgraph_data completely, the
unused part does not have to be zeroed.
fgraph_data is a short-lived part of the shadow stack. The preceding
length field allows locating the end regardless of the content.
Link: https://lore.kernel.org/all/20260324084804.375764-1-martin@kaiser.cx/
Signed-off-by: Martin Kaiser <martin@kaiser.cx>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
Currently, print_function_args() prints enum parameter values
in decimal format, reducing trace log readability.
Use BTF information to resolve enum parameters and print their
symbolic names (where available). This improves readability by
showing meaningful identifiers instead of raw numbers.
Before:
mod_memcg_lruvec_state(lruvec=0xffff..., idx=5, val=320)
After:
mod_memcg_lruvec_state(lruvec=0xffff..., idx=5 [NR_SLAB_RECLAIMABLE_B], val=320)
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Link: https://patch.msgid.link/20260209071949.4040193-1-dolinux.peng@gmail.com
Signed-off-by: Donglin Peng <pengdonglin@xiaomi.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tracing_open_file_tr() function currently copies the trace_event_file
pointer from inode->i_private to file->private_data when the file is
successfully opened. This duplication is not particularly useful, as all
event code should utilize event_file_file() or event_file_data() to
retrieve a trace_event_file pointer from a file struct and these access
functions read file->f_inode->i_private. Moreover, this setup requires the
code for opening hist files to explicitly clear file->private_data before
calling single_open(), since this function expects the private_data member
to be set to NULL and uses it to store a pointer to a seq_file.
Remove the unnecessary setting of file->private_data in
tracing_open_file_tr() and simplify the hist code.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-6-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The tracing code provides two functions event_file_file() and
event_file_data() to obtain a trace_event_file pointer from a file struct.
The primary method to use is event_file_file(), as it checks for the
EVENT_FILE_FL_FREED flag to determine whether the event is being removed.
The second function event_file_data() is an optimization for retrieving the
same data when the event_mutex is still held.
In the past, when removing an event directory in remove_event_file_dir(),
the code set i_private to NULL for all event files and readers were
expected to check for this state to recognize that the event is being
removed. In the case of event_id_read(), the value was read using
event_file_data() without acquiring the event_mutex. This required
event_file_data() to use READ_ONCE() when retrieving the i_private data.
With the introduction of eventfs, i_private is assigned when an eventfs
inode is allocated and remains set throughout its lifetime.
Remove the now unnecessary READ_ONCE() access to i_private in both
event_file_file() and event_file_data(). Inline the access to i_private in
remove_event_file_dir(), which allows event_file_data() to handle i_private
solely as a trace_event_file pointer. Add a check in event_file_data() to
ensure that the event_mutex is held and that file->flags doesn't have the
EVENT_FILE_FL_FREED flag set. Finally, move event_file_data() immediately
after event_file_code() since the latter provides a comment explaining how
both functions should be used together.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-5-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The event_filter_write() function calls event_file_file() to retrieve
a trace_event_file associated with a given file struct. If a non-NULL
pointer is returned, the function then checks whether the trace_event_file
instance has the EVENT_FILE_FL_FREED flag set. This check is redundant
because event_file_file() already performs this validation and returns NULL
if the flag is set. The err value is also already initialized to -ENODEV.
Remove the unnecessary check for EVENT_FILE_FL_FREED in
event_filter_write().
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Tom Zanussi <zanussi@kernel.org>
Link: https://patch.msgid.link/20260219162737.314231-4-petr.pavlu@suse.com
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The sunrpc change to use trace_printk() for debugging caused
a new warning for every instance of dprintk() in some configurations,
when -Wformat-security is enabled:
fs/nfs/getroot.c: In function 'nfs_get_root':
fs/nfs/getroot.c:90:17: error: format not a string literal and no format arguments [-Werror=format-security]
90 | nfs_errorf(fc, "NFS: Couldn't getattr on root");
I've been slowly chipping away at those warnings over time with the
intention of enabling them by default in the future. While I could not
figure out why this only happens for this one instance, I see that the
__trace_bprintk() function is always called with a local variable as
the format string, rather than a literal.
Move the __printf(2,3) annotation on this function from the declaration
to the caller. As this is can only be validated for literals, the
attribute on the declaration causes the warnings every time, but
removing it entirely introduces a new warning on the __ftrace_vbprintk()
definition.
The format strings still get checked because the underlying literal keeps
getting passed into __trace_printk() in the "else" branch, which is not
taken but still evaluated for compile-time warnings.
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Anna Schumaker <anna.schumaker@oracle.com>
Cc: Chuck Lever <chuck.lever@oracle.com>
Cc: Simon Horman <horms@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Yury Norov <ynorov@nvidia.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Link: https://patch.msgid.link/20260203164545.3174910-1-arnd@kernel.org
Fixes: ec7d8e68ef0e ("sunrpc: add a Kconfig option to redirect dfprintk() output to trace buffer")
Acked-by: Jeff Layton <jlayton@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the check_undefined command in kernel/trace/Makefile fails, there
is no output, making it hard to understand why the build failed. Capture
the output of the $(NM) + grep command and print it when failing to make
it clearer what the problem is.
Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260320-cmd_check_undefined-verbose-v1-1-54fc5b061f94@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The modify logic registers temporary ftrace_ops object (tmp_ops) to trigger
the slow path for all direct callers to be able to safely modify attached
addresses.
At the moment we use ops->func_hash for tmp_ops filter, which represents all
the systems attachments. It's faster to use just the passed hash filter, which
contains only the modified sites and is always a subset of the ops->func_hash.
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Menglong Dong <menglong8.dong@gmail.com>
Cc: Song Liu <song@kernel.org>
Link: https://patch.msgid.link/20260312123738.129926-1-jolsa@kernel.org
Fixes: e93672f770d7 ("ftrace: Add update_ftrace_direct_mod function")
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Since the validation loop in rb_meta_validate_events() updates the same
cpu_buffer->head_page->entries, the other subbuf entries are not updated.
Fix to use head_page to update the entries field, since it is the cursor
in this loop.
Cc: stable@vger.kernel.org
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Ian Rogers <irogers@google.com>
Fixes: 5f3b6e839f3c ("ring-buffer: Validate boot range memory events")
Link: https://patch.msgid.link/177391153882.193994.17158784065013676533.stgit@mhiramat.tok.corp.google.com
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
When the "copy_trace_marker" option is enabled for an instance, anything
written into /sys/kernel/tracing/trace_marker is also copied into that
instances buffer. When the option is set, that instance's trace_array
descriptor is added to the marker_copies link list. This list is protected
by RCU, as all iterations uses an RCU protected list traversal.
When the instance is deleted, all the flags that were enabled are cleared.
This also clears the copy_trace_marker flag and removes the trace_array
descriptor from the list.
The issue is after the flags are called, a direct call to
update_marker_trace() is performed to clear the flag. This function
returns true if the state of the flag changed and false otherwise. If it
returns true here, synchronize_rcu() is called to make sure all readers
see that its removed from the list.
But since the flag was already cleared, the state does not change and the
synchronization is never called, leaving a possible UAF bug.
Move the clearing of all flags below the updating of the copy_trace_marker
option which then makes sure the synchronization is performed.
Also use the flag for checking the state in update_marker_trace() instead
of looking at if the list is empty.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260318185512.1b6c7db4@gandalf.local.home
Fixes: 7b382efd5e8a ("tracing: Allow the top level trace_marker to write into another instances")
Reported-by: Sasha Levin <sashal@kernel.org>
Closes: https://lore.kernel.org/all/20260225133122.237275-1-sashal@kernel.org/
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The system call trace events call trace_user_fault_read() to read the user
space part of some system calls. This is done by grabbing a per-cpu
buffer, disabling migration, enabling preemption, calling
copy_from_user(), disabling preemption, enabling migration and checking if
the task was preempted while preemption was enabled. If it was, the buffer
is considered corrupted and it tries again.
There's a safety mechanism that will fail out of this loop if it fails 100
times (with a warning). That warning message was triggered in some
pi_futex stress tests. Enabling the sched_switch trace event and
traceoff_on_warning, showed the problem:
pi_mutex_hammer-1375 [006] d..21 138.981648: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981651: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981656: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981659: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981664: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981667: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981671: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981675: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981679: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981682: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981687: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981690: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981695: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981698: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981703: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981706: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981711: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981714: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981719: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981722: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981727: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981730: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
pi_mutex_hammer-1375 [006] d..21 138.981735: sched_switch: prev_comm=pi_mutex_hammer prev_pid=1375 prev_prio=95 prev_state=R+ ==> next_comm=migration/6 next_pid=47 next_prio=0
migration/6-47 [006] d..2. 138.981738: sched_switch: prev_comm=migration/6 prev_pid=47 prev_prio=0 prev_state=S ==> next_comm=pi_mutex_hammer next_pid=1375 next_prio=95
What happened was the task 1375 was flagged to be migrated. When
preemption was enabled, the migration thread woke up to migrate that task,
but failed because migration for that task was disabled. This caused the
loop to fail to exit because the task scheduled out while trying to read
user space.
Every time the task enabled preemption the migration thread would schedule
in, try to migrate the task, fail and let the task continue. But because
the loop would only enable preemption with migration disabled, it would
always fail because each time it enabled preemption to read user space,
the migration thread would try to migrate it.
To solve this, when the loop fails to read user space without being
scheduled out, enabled and disable preemption with migration enabled. This
will allow the migration task to successfully migrate the task and the
next loop should succeed to read user space without being scheduled out.
Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Link: https://patch.msgid.link/20260316130734.1858a998@gandalf.local.home
Fixes: 64cf7d058a005 ("tracing: Have trace_marker use per-cpu data to read user space")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Resolve conflict between this change in the upstream kernel:
4c652a47722f ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline")
... and this pending change in timers/core:
0e98eb14814e ("entry: Prepare for deferred hrtimer rearming")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
|
|
Restore the SPDX tag that was accidentally dropped.
Fixes: 7e4b6c94300e3 ("tracing: add more symbols to whitelist")
Reported-by: Nathan Chancellor <nathan@kernel.org>
Cc: Arnd Bergmann <arnd@kernel.org>
Cc: Vincent Donnefort <vdonnefort@google.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/20260317194252.1890568-1-maz@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Compiler and tooling-generated symbols are difficult to maintain
across all supported architectures. Make the allowlist more robust by
replacing the harcoded list with a mechanism that automatically detects
these symbols.
This mechanism generates a C function designed to trigger common
compiler-inserted symbols.
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Tested-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260316092845.3367411-1-vdonnefort@google.com
[maz: added __msan prefix to allowlist as pointed out by Arnd]
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Randconfig builds show a number of cryptic build errors from
hitting undefined symbols in simple_ring_buffer.o:
make[7]: *** [/home/arnd/arm-soc/kernel/trace/Makefile:147: kernel/trace/simple_ring_buffer.o.checked] Error 1
These happen with CONFIG_TRACE_BRANCH_PROFILING, CONFIG_KASAN_HW_TAGS,
CONFIG_STACKPROTECTOR, CONFIG_DEBUG_IRQFLAGS and indirectly from WARN_ON().
Add exceptions for each one that I have hit so far on arm64, x86_64 and arm
randconfig builds.
Other architectures likely hit additional ones, so it would be nice
to produce a little more verbose output that include the name of the
missing symbols directly.
Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Link: https://patch.msgid.link/20260312123601.625063-2-arnd@kernel.org
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
Undefined symbols are not allowed for simple_ring_buffer.c. But some
compiler emitted symbols are missing in the allowlist. Update it.
Reported-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Fixes: a717943d8ecc ("tracing: Check for undefined symbols in simple_ring_buffer")
Closes: https://lore.kernel.org/all/20260311221816.GA316631@ax162/
Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Link: https://patch.msgid.link/20260312113535.2213350-1-vdonnefort@google.com
Signed-off-by: Marc Zyngier <maz@kernel.org>
|
|
The sentinel value added by the wrapper macros __print_symbolic() et al
prevents the callers from adding their own trailing comma. This makes
constructing symbol list dynamically based on kconfig values tedious.
Drop the sentinel elements, so callers can either specify the trailing
comma or not, just like in regular array initializers.
Signed-off-by: Thomas Weißschuh (Schneider Electric) <thomas.weissschuh@linutronix.de>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Link: https://patch.msgid.link/20260311-hrtimer-cleanups-v1-2-095357392669@linutronix.de
|
|
The simple_ring_buffer implementation must remain simple enough to be
used by the pKVM hypervisor. Prevent the object build if unresolved
symbols are found.
Link: https://patch.msgid.link/20260309162516.2623589-19-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add load/unload callback used for each admitted page in the ring-buffer.
This will be later useful for the pKVM hypervisor which uses a different
VA space and need to dynamically map/unmap the ring-buffer pages.
Link: https://patch.msgid.link/20260309162516.2623589-18-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a module to help testing the tracefs support for trace remotes. This
module:
* Use simple_ring_buffer to write into a ring-buffer.
* Declare a single "selftest" event that can be triggered from
user-space.
* Register a "test" trace remote.
This is intended to be used by trace remote selftests.
Link: https://patch.msgid.link/20260309162516.2623589-15-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a simple implementation of the kernel ring-buffer. This intends to
be used later by ring-buffer remotes such as the pKVM hypervisor, hence
the need for a cut down version (write only) without any dependency.
Link: https://patch.msgid.link/20260309162516.2623589-14-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In preparation for allowing the writing of ring-buffer compliant pages
outside of ring_buffer.c, move buffer_data_page and timestamps encoding
macros into the publicly available ring_buffer_types.h.
Link: https://patch.msgid.link/20260309162516.2623589-13-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Just like for the kernel events directory, add 'enable', 'header_page'
and 'header_event' at the root of the trace remote events/ directory.
Link: https://patch.msgid.link/20260309162516.2623589-11-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
An event is predefined point in the writer code that allows to log
data. Following the same scheme as kernel events, add remote events,
described to user-space within the events/ tracefs directory found in
the corresponding trace remote.
Remote events are expected to be described during the trace remote
registration.
Add also a .enable_event callback for trace_remote to toggle the event
logging, if supported.
Link: https://patch.msgid.link/20260309162516.2623589-10-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add a .init call back so the trace remote callers can add entries to the
tracefs directory.
Link: https://patch.msgid.link/20260309162516.2623589-9-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow reading the trace file for trace remotes. This performs a
non-consuming read of the trace buffer.
Link: https://patch.msgid.link/20260309162516.2623589-8-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Allow to reset the trace remote buffer by writing to the Tracefs "trace"
file. This is similar to the regular Tracefs interface.
Link: https://patch.msgid.link/20260309162516.2623589-7-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
A trace remote relies on ring-buffer remotes to read and control
compatible tracing buffers, written by entity such as firmware or
hypervisor.
Add a Tracefs directory remotes/ that contains all instances of trace
remotes. Each instance follows the same hierarchy as any other to ease
the support by existing user-space tools.
This currently does not provide any event support, which will come
later.
Link: https://patch.msgid.link/20260309162516.2623589-6-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Hopefully, the remote will only swap pages on the kernel instruction (via
the swap_reader_page() callback). This means we know at what point the
ring-buffer geometry has changed. It is therefore possible to rearrange
the kernel view of that ring-buffer to allow non-consuming read.
Link: https://patch.msgid.link/20260309162516.2623589-5-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add ring-buffer remotes to support entities outside of the kernel (such
as firmware or a hypervisor) that writes events into a ring-buffer using
the tracefs format
Require a description of the ring-buffer pages (struct
trace_buffer_desc) and callbacks (swap_reader_page and reset) to set up
the ring-buffer on the kernel side.
Expect the remote entity to maintain and update the meta-page.
Link: https://patch.msgid.link/20260309162516.2623589-4-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
The subbuf_ids field allows to point to a specific page from the
ring-buffer based on its ID. As a preparation or the upcoming
ring-buffer remote support, point this array to the buffer_page instead
of the buffer_data_page.
Link: https://patch.msgid.link/20260309162516.2623589-3-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Add two fields pages_touched and pages_lost to the ring-buffer
meta-page. Those fields are useful to get the number of used pages in
the ring-buffer.
Link: https://patch.msgid.link/20260309162516.2623589-2-vdonnefort@google.com
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed |