aboutsummaryrefslogtreecommitdiff
path: root/kernel/trace/Makefile
AgeCommit message (Collapse)AuthorFilesLines
2025-11-26ftrace: Allow tracing of some of the tracing codeSteven Rostedt1-0/+17
There is times when tracing the tracing infrastructure can be useful for debugging the tracing code. Currently all files in the tracing directory are set to "notrace" the functions. Add a new config option FUNCTION_SELF_TRACING that will allow some of the files in the tracing infrastructure to be traced. It requires a config to enable because it will add noise to the function tracer if events and other tracing features are enabled. Tracing functions and events together is quite common, so not tracing the event code should be the default. Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Tom Zanussi <zanussi@kernel.org> Link: https://patch.msgid.link/20251120181514.736f2d5f@gandalf.local.home Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-07-30tracing: Have eprobes have their own config optionSteven Rostedt1-1/+1
Eprobes were added in 5.15 and were selected whenever any of the other probe events were selected. If kprobe events were enabled (which it is by default if kprobes are enabled) it would enable eprobe events as well. The same for uprobes and fprobes. Have eprobes have its own config and it gets enabled by default if tracing is enabled. Link: https://lore.kernel.org/all/20250729102636.b7cce553e7cc263722b12365@kernel.org/ Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Randy Dunlap <rdunlap@infradead.org> Link: https://lore.kernel.org/20250730140945.360286733@kernel.org Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-08-23tracing/probes: Move finding func-proto API and getting func-param API to ↵Masami Hiramatsu (Google)1-0/+1
trace_btf Move generic function-proto find API and getting function parameter API to BTF library code from trace_probe.c. This will avoid redundant efforts on different feature. Link: https://lore.kernel.org/all/169272155255.160970.719426926348706349.stgit@devnote2/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2023-06-06tracing/probes: Add fprobe events for tracing function entry and exit.Masami Hiramatsu (Google)1-0/+1
Add fprobe events for tracing function entry and exit instead of kprobe events. With this change, we can continue to trace function entry/exit even if the CONFIG_KPROBES_ON_FTRACE is not available. Since CONFIG_KPROBES_ON_FTRACE requires the CONFIG_DYNAMIC_FTRACE_WITH_REGS, it is not available if the architecture only supports CONFIG_DYNAMIC_FTRACE_WITH_ARGS. And that means kprobe events can not probe function entry/exit effectively on such architecture. But this can be solved if the dynamic events supports fprobe events. The fprobe event is a new dynamic events which is only for the function (symbol) entry and exit. This event accepts non register fetch arguments so that user can trace the function arguments and return values. The fprobe events syntax is here; f[:[GRP/][EVENT]] FUNCTION [FETCHARGS] f[MAXACTIVE][:[GRP/][EVENT]] FUNCTION%return [FETCHARGS] E.g. # echo 'f vfs_read $arg1' >> dynamic_events # echo 'f vfs_read%return $retval' >> dynamic_events # cat dynamic_events f:fprobes/vfs_read__entry vfs_read arg1=$arg1 f:fprobes/vfs_read__exit vfs_read%return arg1=$retval # echo 1 > events/fprobes/enable # head -n 20 trace | tail # TASK-PID CPU# ||||| TIMESTAMP FUNCTION # | | | ||||| | | sh-142 [005] ...1. 448.386420: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386436: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.386451: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386458: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.386469: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.386476: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 sh-142 [005] ...1. 448.602073: vfs_read__entry: (vfs_read+0x4/0x340) arg1=0xffff888007f7c540 sh-142 [005] ..... 448.602089: vfs_read__exit: (ksys_read+0x75/0x100 <- vfs_read) arg1=0x1 Link: https://lore.kernel.org/all/168507469754.913472.6112857614708350210.stgit@mhiramat.roam.corp.google.com/ Reported-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/all/202302011530.7vm4O8Ro-lkp@intel.com/ Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-07-30rv: Add Runtime Verification (RV) interfaceDaniel Bristot de Oliveira1-0/+1
RV is a lightweight (yet rigorous) method that complements classical exhaustive verification techniques (such as model checking and theorem proving) with a more practical approach to complex systems. RV works by analyzing the trace of the system's actual execution, comparing it against a formal specification of the system behavior. RV can give precise information on the runtime behavior of the monitored system while enabling the reaction for unexpected events, avoiding, for example, the propagation of a failure on safety-critical systems. The development of this interface roots in the development of the paper: De Oliveira, Daniel Bristot; Cucinotta, Tommaso; De Oliveira, Romulo Silva. Efficient formal verification for the Linux kernel. In: International Conference on Software Engineering and Formal Methods. Springer, Cham, 2019. p. 315-332. And: De Oliveira, Daniel Bristot. Automata-based formal analysis and verification of the real-time Linux kernel. PhD Thesis, 2020. The RV interface resembles the tracing/ interface on purpose. The current path for the RV interface is /sys/kernel/tracing/rv/. It presents these files: "available_monitors" - List the available monitors, one per line. For example: # cat available_monitors wip wwnr "enabled_monitors" - Lists the enabled monitors, one per line; - Writing to it enables a given monitor; - Writing a monitor name with a '!' prefix disables it; - Truncating the file disables all enabled monitors. For example: # cat enabled_monitors # echo wip > enabled_monitors # echo wwnr >> enabled_monitors # cat enabled_monitors wip wwnr # echo '!wip' >> enabled_monitors # cat enabled_monitors wwnr # echo > enabled_monitors # cat enabled_monitors # Note that more than one monitor can be enabled concurrently. "monitoring_on" - It is an on/off general switcher for monitoring. Note that it does not disable enabled monitors or detach events, but stop the per-entity monitors of monitoring the events received from the system. It resembles the "tracing_on" switcher. "monitors/" Each monitor will have its one directory inside "monitors/". There the monitor specific files will be presented. The "monitors/" directory resembles the "events" directory on tracefs. For example: # cd monitors/wip/ # ls desc enable # cat desc wakeup in preemptive per-cpu testing monitor. # cat enable 0 For further information, see the comments in the header of kernel/trace/rv/rv.c from this patch. Link: https://lkml.kernel.org/r/a4bfe038f50cb047bfb343ad0e12b0e646ab308b.1659052063.git.bristot@kernel.org Cc: Wim Van Sebroeck <wim@linux-watchdog.org> Cc: Guenter Roeck <linux@roeck-us.net> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: "Paul E. McKenney" <paulmck@kernel.org> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Gabriele Paoloni <gpaoloni@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Clark Williams <williams@redhat.com> Cc: Tao Zhou <tao.zhou@linux.dev> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-trace-devel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-05-26tracing: Disable kcov on trace_preemptirq.cCongyu Liu1-0/+4
Functions in trace_preemptirq.c could be invoked from early interrupt code that bypasses kcov trace function's in_task() check. Disable kcov on this file to reduce random code coverage. Link: https://lkml.kernel.org/r/20220523063033.1778974-1-liu3101@purdue.edu Acked-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Congyu Liu <liu3101@purdue.edu> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...
2022-03-17rethook: Add a generic return hookMasami Hiramatsu1-0/+1
Add a return hook framework which hooks the function return. Most of the logic came from the kretprobe, but this is independent from kretprobe. Note that this is expected to be used with other function entry hooking feature, like ftrace, fprobe, adn kprobes. Eventually this will replace the kretprobe (e.g. kprobe + rethook = kretprobe), but at this moment, this is just an additional hook. Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735285066.1084943.9259661137330166643.stgit@devnote2
2022-03-17fprobe: Add ftrace based probe APIsMasami Hiramatsu1-0/+1
The fprobe is a wrapper API for ftrace function tracer. Unlike kprobes, this probes only supports the function entry, but this can probe multiple functions by one fprobe. The usage is similar, user will set their callback to fprobe::entry_handler and call register_fprobe*() with probed functions. There are 3 registration interfaces, - register_fprobe() takes filtering patterns of the functin names. - register_fprobe_ips() takes an array of ftrace-location addresses. - register_fprobe_syms() takes an array of function names. The registered fprobes can be unregistered with unregister_fprobe(). e.g. struct fprobe fp = { .entry_handler = user_handler }; const char *targets[] = { "func1", "func2", "func3"}; ... ret = register_fprobe_syms(&fp, targets, ARRAY_SIZE(targets)); ... unregister_fprobe(&fp); Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org> Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/164735283857.1084943.1154436951479395551.stgit@devnote2
2022-02-10user_events: Add minimal support for trace_event into ftraceBeau Belgrave1-0/+1
Minimal support for interacting with dynamic events, trace_event and ftrace. Core outline of flow between user process, ioctl and trace_event APIs. User mode processes that wish to use trace events to get data into ftrace, perf, eBPF, etc are limited to uprobes today. The user events features enables an ABI for user mode processes to create and write to trace events that are isolated from kernel level trace events. This enables a faster path for tracing from user mode data as well as opens managed code to participate in trace events, where stub locations are dynamic. User processes often want to trace only when it's useful. To enable this a set of pages are mapped into the user process space that indicate the current state of the user events that have been registered. User processes can check if their event is hooked to a trace/probe, and if it is, emit the event data out via the write() syscall. Two new files are introduced into tracefs to accomplish this: user_events_status - This file is mmap'd into participating user mode processes to indicate event status. user_events_data - This file is opened and register/delete ioctl's are issued to create/open/delete trace events that can be used for tracing. The typical scenario is on process start to mmap user_events_status. Processes then register the events they plan to use via the REG ioctl. The ioctl reads and updates the passed in user_reg struct. The status_index of the struct is used to know the byte in the status page to check for that event. The write_index of the struct is used to describe that event when writing out to the fd that was used for the ioctl call. The data must always include this index first when writing out data for an event. Data can be written either by write() or by writev(). For example, in memory: int index; char data[]; Psuedo code example of typical usage: struct user_reg reg; int page_fd = open("user_events_status", O_RDWR); char *page_data = mmap(NULL, PAGE_SIZE, PROT_READ, MAP_SHARED, page_fd, 0); close(page_fd); int data_fd = open("user_events_data", O_RDWR); reg.size = sizeof(reg); reg.name_args = (__u64)"test"; ioctl(data_fd, DIAG_IOCSREG, &reg); int status_id = reg.status_index; int write_id = reg.write_index; struct iovec io[2]; io[0].iov_base = &write_id; io[0].iov_len = sizeof(write_id); io[1].iov_base = payload; io[1].iov_len = sizeof(payload); if (page_data[status_id]) writev(data_fd, io, 2); User events are also exposed via the dynamic_events tracefs file for both create and delete. Current status is exposed via the user_events_status tracefs file. Simple example to register a user event via dynamic_events: echo u:test >> dynamic_events cat dynamic_events u:test If an event is hooked to a probe, the probe hooked shows up: echo 1 > events/user_events/test/enable cat user_events_status 1:test # Used by ftrace Active: 1 Busy: 1 Max: 4096 If an event is not hooked to a probe, no probe status shows up: echo 0 > events/user_events/test/enable cat user_events_status 1:test Active: 1 Busy: 0 Max: 4096 Users can describe the trace event format via the following format: name[:FLAG1[,FLAG2...] [field1[;field2...]] Each field has the following format: type name Example for char array with a size of 20 named msg: echo 'u:detailed char[20] msg' >> dynamic_events cat dynamic_events u:detailed char[20] msg Data offsets are based on the data written out via write() and will be updated to reflect the correct offset in the trace_event fields. For dynamic data it is recommended to use the new __rel_loc data type. This type will be the same as __data_loc, but the offset is relative to this entry. This allows user_events to not worry about what common fields are being inserted before the data. The above format is valid for both the ioctl and the dynamic_events file. Link: https://lkml.kernel.org/r/20220118204326.2169-2-beaub@linux.microsoft.com Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2021-10-05tracing: Place trace_pid_list logic into abstract functionsSteven Rostedt (VMware)1-0/+1
Instead of having the logic that does trace_pid_list open coded, wrap it in abstract functions. This will allow a rewrite of the logic that implements the trace_pid_list without affecting the users. Note, this causes a change in behavior. Every time a pid is written into the set_*_pid file, it creates a new list and uses RCU to update it. If pid_max is lowered, but there was a pid currently in the list that was higher than pid_max, those pids will now be removed on updating the list. The old behavior kept that from happening. The rewrite of the pid_list logic will no longer depend on pid_max, and will return the old behavior. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-08-20tracing: Add a probe that attaches to trace eventsTzvetomir Stoyanov (VMware)1-0/+1
A new dynamic event is introduced: event probe. The event is attached to an existing tracepoint and uses its fields as arguments. The user can specify custom format string of the new event, select what tracepoint arguments will be printed and how to print them. An event probe is created by writing configuration string in 'dynamic_events' ftrace file: e[:[SNAME/]ENAME] SYSTEM/EVENT [FETCHARGS] - Set an event probe -:SNAME/ENAME - Delete an event probe Where: SNAME - System name, if omitted 'eprobes' is used. ENAME - Name of the new event in SNAME, if omitted the SYSTEM_EVENT is used. SYSTEM - Name of the system, where the tracepoint is defined, mandatory. EVENT - Name of the tracepoint event in SYSTEM, mandatory. FETCHARGS - Arguments: <name>=$<field>[:TYPE] - Fetch given filed of the tracepoint and print it as given TYPE with given name. Supported types are: (u8/u16/u32/u64/s8/s16/s32/s64), basic type (x8/x16/x32/x64), hexadecimal types "string", "ustring" and bitfield. Example, attach an event probe on openat system call and print name of the file that will be opened: echo "e:esys/eopen syscalls/sys_enter_openat file=\$filename:string" >> dynamic_events A new dynamic event is created in events/esys/eopen/ directory. It can be deleted with: echo "-:esys/eopen" >> dynamic_events Filters, triggers and histograms can be attached to the new event, it can be matched in synthetic events. There is one limitation - an event probe can not be attached to kprobe, uprobe or another event probe. Link: https://lkml.kernel.org/r/20210812145805.2292326-1-tz.stoyanov@gmail.com Link: https://lkml.kernel.org/r/20210819152825.142428383@goodmis.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Co-developed-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-06-25trace: Add osnoise tracerDaniel Bristot de Oliveira1-0/+1
In the context of high-performance computing (HPC), the Operating System Noise (*osnoise*) refers to the interference experienced by an application due to activities inside the operating system. In the context of Linux, NMIs, IRQs, SoftIRQs, and any other system thread can cause noise to the system. Moreover, hardware-related jobs can also cause noise, for example, via SMIs. The osnoise tracer leverages the hwlat_detector by running a similar loop with preemption, SoftIRQs and IRQs enabled, thus allowing all the sources of *osnoise* during its execution. Using the same approach of hwlat, osnoise takes note of the entry and exit point of any source of interferences, increasing a per-cpu interference counter. The osnoise tracer also saves an interference counter for each source of interference. The interference counter for NMI, IRQs, SoftIRQs, and threads is increased anytime the tool observes these interferences' entry events. When a noise happens without any interference from the operating system level, the hardware noise counter increases, pointing to a hardware-related noise. In this way, osnoise can account for any source of interference. At the end of the period, the osnoise tracer prints the sum of all noise, the max single noise, the percentage of CPU available for the thread, and the counters for the noise sources. Usage Write the ASCII text "osnoise" into the current_tracer file of the tracing system (generally mounted at /sys/kernel/tracing). For example:: [root@f32 ~]# cd /sys/kernel/tracing/ [root@f32 tracing]# echo osnoise > current_tracer It is possible to follow the trace by reading the trace trace file:: [root@f32 tracing]# cat trace # tracer: osnoise # # _-----=> irqs-off # / _----=> need-resched # | / _---=> hardirq/softirq # || / _--=> preempt-depth MAX # || / SINGLE Interference counters: # |||| RUNTIME NOISE % OF CPU NOISE +-----------------------------+ # TASK-PID CPU# |||| TIMESTAMP IN US IN US AVAILABLE IN US HW NMI IRQ SIRQ THREAD # | | | |||| | | | | | | | | | | <...>-859 [000] .... 81.637220: 1000000 190 99.98100 9 18 0 1007 18 1 <...>-860 [001] .... 81.638154: 1000000 656 99.93440 74 23 0 1006 16 3 <...>-861 [002] .... 81.638193: 1000000 5675 99.43250 202 6 0 1013 25 21 <...>-862 [003] .... 81.638242: 1000000 125 99.98750 45 1 0 1011 23 0 <...>-863 [004] .... 81.638260: 1000000 1721 99.82790 168 7 0 1002 49 41 <...>-864 [005] .... 81.638286: 1000000 263 99.97370 57 6 0 1006 26 2 <...>-865 [006] .... 81.638302: 1000000 109 99.98910 21 3 0 1006 18 1 <...>-866 [007] .... 81.638326: 1000000 7816 99.21840 107 8 0 1016 39 19 In addition to the regular trace fields (from TASK-PID to TIMESTAMP), the tracer prints a message at the end of each period for each CPU that is running an osnoise/CPU thread. The osnoise specific fields report: - The RUNTIME IN USE reports the amount of time in microseconds that the osnoise thread kept looping reading the time. - The NOISE IN US reports the sum of noise in microseconds observed by the osnoise tracer during the associated runtime. - The % OF CPU AVAILABLE reports the percentage of CPU available for the osnoise thread during the runtime window. - The MAX SINGLE NOISE IN US reports the maximum single noise observed during the runtime window. - The Interference counters display how many each of the respective interference happened during the runtime window. Note that the example above shows a high number of HW noise samples. The reason being is that this sample was taken on a virtual machine, and the host interference is detected as a hardware interference. Tracer options The tracer has a set of options inside the osnoise directory, they are: - osnoise/cpus: CPUs at which a osnoise thread will execute. - osnoise/period_us: the period of the osnoise thread. - osnoise/runtime_us: how long an osnoise thread will look for noise. - osnoise/stop_tracing_us: stop the system tracing if a single noise higher than the configured value happens. Writing 0 disables this option. - osnoise/stop_tracing_total_us: stop the system tracing if total noise higher than the configured value happens. Writing 0 disables this option. - tracing_threshold: the minimum delta between two time() reads to be considered as noise, in us. When set to 0, the default value will be used, which is currently 5 us. Additional Tracing In addition to the tracer, a set of tracepoints were added to facilitate the identification of the osnoise source. - osnoise:sample_threshold: printed anytime a noise is higher than the configurable tolerance_ns. - osnoise:nmi_noise: noise from NMI, including the duration. - osnoise:irq_noise: noise from an IRQ, including the duration. - osnoise:softirq_noise: noise from a SoftIRQ, including the duration. - osnoise:thread_noise: noise from a thread, including the duration. Note that all the values are *net values*. For example, if while osnoise is running, another thread preempts the osnoise thread, it will start a thread_noise duration at the start. Then, an IRQ takes place, preempting the thread_noise, starting a irq_noise. When the IRQ ends its execution, it will compute its duration, and this duration will be subtracted from the thread_noise, in such a way as to avoid the double accounting of the IRQ execution. This logic is valid for all sources of noise. Here is one example of the usage of these tracepoints:: osnoise/8-961 [008] d.h. 5789.857532: irq_noise: local_timer:236 start 5789.857529929 duration 1845 ns osnoise/8-961 [008] dNh. 5789.858408: irq_noise: local_timer:236 start 5789.858404871 duration 2848 ns migration/8-54 [008] d... 5789.858413: thread_noise: migration/8:54 start 5789.858409300 duration 3068 ns osnoise/8-961 [008] .... 5789.858413: sample_threshold: start 5789.858404555 duration 8723 ns interferences 2 In this example, a noise sample of 8 microseconds was reported in the last line, pointing to two interferences. Looking backward in the trace, the two previous entries were about the migration thread running after a timer IRQ execution. The first event is not part of the noise because it took place one millisecond before. It is worth noticing that the sum of the duration reported in the tracepoints is smaller than eight us reported in the sample_threshold. The reason roots in the overhead of the entry and exit code that happens before and after any interference execution. This justifies the dual approach: measuring thread and tracing. Link: https://lkml.kernel.org/r/e649467042d60e7b62714c9c6751a56299d15119.1624372313.git.bristot@redhat.com Cc: Phil Auld <pauld@redhat.com> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Kate Carcia <kcarcia@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Ingo Molnar <mingo@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Alexandre Chartre <alexandre.chartre@oracle.com> Cc: Clark Willaims <williams@redhat.com> Cc: John Kacur <jkacur@redhat.com> Cc: Juri Lelli <juri.lelli@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Signed-off-by: Daniel Bristot de Oliveira <bristot@redhat.com> [ Made the following functions static: trace_irqentry_callback() trace_irqexit_callback() trace_intel_irqentry_callback() trace_intel_irqexit_callback() Added to include/trace.h: osnoise_arch_register() osnoise_arch_unregister() Fixed define logic for LATENCY_FS_NOTIFY Reported-by: kernel test robot <lkp@intel.com> ] Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2021-02-26tracing: add error_report_end trace pointAlexander Potapenko1-0/+1
Patch series "Add error_report_end tracepoint to KFENCE and KASAN", v3. This patchset adds a tracepoint, error_repor_end, that is to be used by KFENCE, KASAN, and potentially other bug detection tools, when they print an error report. One of the possible use cases is userspace collection of kernel error reports: interested parties can subscribe to the tracing event via tracefs, and get notified when an error report occurs. This patch (of 3): Introduce error_report_end tracepoint. It can be used in debugging tools like KASAN, KFENCE, etc. to provide extensions to the error reporting mechanisms (e.g. allow tests hook into error reporting, ease error report collection from production kernels). Another benefit would be making use of ftrace for debugging or benchmarking the tools themselves. Should we need it, the tracepoint name leaves us with the possibility to introduce a complementary error_report_start tracepoint in the future. Link: https://lkml.kernel.org/r/20210121131915.1331302-1-glider@google.com Link: https://lkml.kernel.org/r/20210121131915.1331302-2-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Suggested-by: Marco Elver <elver@google.com> Cc: Andrey Konovalov <andreyknvl@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-11-06ftrace: Add recording of functions that caused recursionSteven Rostedt (VMware)1-0/+1
This adds CONFIG_FTRACE_RECORD_RECURSION that will record to a file "recursed_functions" all the functions that caused recursion while a callback to the function tracer was running. Link: https://lkml.kernel.org/r/20201106023548.102375687@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Guo Ren <guoren@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Helge Deller <deller@gmx.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Borislav Petkov <bp@alien8.de> Cc: x86@kernel.org Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Kees Cook <keescook@chromium.org> Cc: Anton Vorontsov <anton@enomsg.org> Cc: Colin Cross <ccross@android.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Jiri Kosina <jikos@kernel.org> Cc: Miroslav Benes <mbenes@suse.cz> Cc: Petr Mladek <pmladek@suse.com> Cc: Joe Lawrence <joe.lawrence@redhat.com> Cc: Kamalesh Babulal <kamalesh@linux.vnet.ibm.com> Cc: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: linux-doc@vger.kernel.org Cc: linux-kernel@vger.kernel.org Cc: linux-csky@vger.kernel.org Cc: linux-parisc@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: linux-s390@vger.kernel.org Cc: live-patching@vger.kernel.org Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-08-09Merge tag 'kbuild-v5.9' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - run the checker (e.g. sparse) after the compiler - remove unneeded cc-option tests for old compiler flags - fix tar-pkg to install dtbs - introduce ccflags-remove-y and asflags-remove-y syntax - allow to trace functions in sub-directories of lib/ - introduce hostprogs-always-y and userprogs-always-y syntax - various Makefile cleanups * tag 'kbuild-v5.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: stop filtering out $(GCC_PLUGINS_CFLAGS) from cc-option base kbuild: include scripts/Makefile.* only when relevant CONFIG is enabled kbuild: introduce hostprogs-always-y and userprogs-always-y kbuild: sort hostprogs before passing it to ifneq kbuild: move host .so build rules to scripts/gcc-plugins/Makefile kbuild: Replace HTTP links with HTTPS ones kbuild: trace functions in subdirectories of lib/ kbuild: introduce ccflags-remove-y and asflags-remove-y kbuild: do not export LDFLAGS_vmlinux kbuild: always create directories of targets powerpc/boot: add DTB to 'targets' kbuild: buildtar: add dtbs support kbuild: remove cc-option test of -ffreestanding kbuild: remove cc-option test of -fno-stack-protector Revert "kbuild: Create directory for target DTB" kbuild: run the checker after the compiler
2020-08-10kbuild: introduce ccflags-remove-y and asflags-remove-yMasahiro Yamada1-2/+2
CFLAGS_REMOVE_<file>.o filters out flags when compiling a particular object, but there is no convenient way to do that for every object in a directory. Add ccflags-remove-y and asflags-remove-y to make it easily. Use ccflags-remove-y to clean up some Makefiles. The add/remove order works as follows: [1] KBUILD_CFLAGS specifies compiler flags used globally [2] ccflags-y adds compiler flags for all objects in the current Makefile [3] ccflags-remove-y removes compiler flags for all objects in the current Makefile (New feature) [4] CFLAGS_<file> adds compiler flags per file. [5] CFLAGS_REMOVE_<file> removes compiler flags per file. Having [3] before [4] allows us to remove flags from most (but not all) objects in the current Makefile. For example, kernel/trace/Makefile removes $(CC_FLAGS_FTRACE) from all objects in the directory, then adds it back to trace_selftest_dynamic.o and CFLAGS_trace_kprobe_selftest.o The same applies to lib/livepatch/Makefile. Please note ccflags-remove-y has no effect to the sub-directories. In contrast, the previous notation got rid of compiler flags also from all the sub-directories. The following are not affected because they have no sub-directories: arch/arm/boot/compressed/ arch/powerpc/xmon/ arch/sh/ kernel/trace/ However, lib/ has several sub-directories. To keep the behavior, I added ccflags-remove-y to all Makefiles in subdirectories of lib/, except the following: lib/vdso/Makefile - Kbuild does not descend into this Makefile lib/raid/test/Makefile - This is not used for the kernel build I think commit 2464a609ded0 ("ftrace: do not trace library functions") excluded too much. In the next commit, I will remove ccflags-remove-y from the sub-directories of lib/. Suggested-by: Sami Tolvanen <samitolvanen@google.com> Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Brendan Higgins <brendanhiggins@google.com> (KUnit) Tested-by: Anders Roxell <anders.roxell@linaro.org>
2020-07-13bpf: Use dedicated bpf_trace_printk event instead of trace_printk()Alan Maguire1-0/+2
The bpf helper bpf_trace_printk() uses trace_printk() under the hood. This leads to an alarming warning message originating from trace buffer allocation which occurs the first time a program using bpf_trace_printk() is loaded. We can instead create a trace event for bpf_trace_printk() and enable it in-kernel when/if we encounter a program using the bpf_trace_printk() helper. With this approach, trace_printk() is not used directly and no warning message appears. This work was started by Steven (see Link) and finished by Alan; added Steven's Signed-off-by with his permission. Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Link: https://lore.kernel.org/r/20200628194334.6238b933@oasis.local.home Link: https://lore.kernel.org/bpf/1594641154-18897-2-git-send-email-alan.maguire@oracle.com
2020-06-11Rebase locking/kcsan to locking/urgentThomas Gleixner1-0/+3
Merge the state of the locking kcsan branch before the read/write_once() and the atomics modifications got merged. Squash the fallout of the rebase on top of the read/write once and atomic fallback work into the merge. The history of the original branch is preserved in tag locking-kcsan-2020-06-02. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2020-06-01tracing: Move synthetic events to a separate fileTom Zanussi1-0/+1
With the addition of the in-kernel synthetic event API, synthetic events are no longer specifically tied to the histogram triggers. The synthetic event code is also making trace_event_hist.c very bloated, so for those reasons, move it to a separate file, trace_events_synth.c, along with a new trace_synth.h header file. Because synthetic events are now independent from hist triggers, add a new CONFIG_SYNTH_EVENTS config option, and have CONFIG_HIST_TRIGGERS select it, and have CONFIG_SYNTH_EVENT_GEN_TEST depend on it. Link: http://lkml.kernel.org/r/4d1fa1f85ed5982706ac44844ac92451dcb04715.1590693308.git.zanussi@kernel.org Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-03-21kcsan, trace: Make KCSAN compatible with tracingMarco Elver1-0/+3
Previously the system would lock up if ftrace was enabled together with KCSAN. This is due to recursion on reporting if the tracer code is instrumented with KCSAN. To avoid this for all types of tracing, disable KCSAN instrumentation for all of kernel/trace. Furthermore, since KCSAN relies on udelay() to introduce delay, we have to disable ftrace for udelay() (currently done for x86) in case KCSAN is used together with lockdep and ftrace. The reason is that it may corrupt lockdep IRQ flags tracing state due to a peculiar case of recursion (details in Makefile comment). Reported-by: Qian Cai <cai@lca.pw> Tested-by: Qian Cai <cai@lca.pw> Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2020-01-30tracing: Add kprobe event command generation test moduleTom Zanussi1-0/+1
Add a test module that checks the basic functionality of the in-kernel kprobe event command generation API by creating kprobe events from a module. Link: http://lkml.kernel.org/r/97e502b204f9dba948e3fa3a4315448298218787.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-30tracing: Add synth event generation test moduleTom Zanussi1-0/+1
Add a test module that checks the basic functionality of the in-kernel synthetic event generation API by generating and tracing synthetic events from a module. Link: http://lkml.kernel.org/r/fcb4dd9eb9eefb70ab20538d3529d51642389664.1580323897.git.zanussi@kernel.org Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Tom Zanussi <zanussi@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2020-01-13tracing/boot: Add boot-time tracingMasami Hiramatsu1-0/+1
Setup tracing options via extra boot config in addition to kernel command line. This adds following commands support. These are applied to the global trace instance. - ftrace.options = OPT1[,OPT2...] Enable given ftrace options. - ftrace.trace_clock = CLOCK Set given CLOCK to ftrace's trace_clock. - ftrace.buffer_size = SIZE Configure ftrace buffer size to SIZE. You can use "KB" or "MB" for that SIZE. - ftrace.events = EVENT[, EVENT2...] Enable given events on boot. You can use a wild card in EVENT. - ftrace.tracer = TRACER Set TRACER to current tracer on boot. (e.g. function) Note that this is NOT replacing the kernel parameters, because this boot config based setting is later than that. If you want to trace earlier boot events, you still need kernel parameters. Link: http://lkml.kernel.org/r/157867237723.17873.17494943526320587488.stgit@devnote2 Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
2019-12-02tracing: Introduce trace event injectionCong Wang1-0/+1
We have been trying to use rasdaemon to monitor hardware errors like correctable memory errors. rasdaemon uses trace events to monitor various hardware errors. In order to test it, we have to inject some hardware errors, unfortunately not all of them provide error injections. MCE does provide a way to inject MCE errors, but errors like PCI error and devlink error don't, it is not easy to add error injection to each of them. Instead, it is relatively easier to just allow users to inject trace events in a generic way so that all trace events can be injected. This patch introduces trace event injection, where a new 'inject' is added to each tracepoint directory. Users could write into this file with key=value pairs to specify the value of each fields of the trace event, all unspecified fields are set to zero values by default. For example, for the net/net_dev_queue tracepoint, we can inject: INJECT=/sys/kernel/debug/tracing/events/net/net_dev_queue/inject