aboutsummaryrefslogtreecommitdiff
path: root/kernel
AgeCommit message (Collapse)AuthorFilesLines
2022-12-08memcg: Fix possible use-after-free in memcg_write_event_control()Tejun Heo1-1/+0
memcg_write_event_control() accesses the dentry->d_name of the specified control fd to route the write call. As a cgroup interface file can't be renamed, it's safe to access d_name as long as the specified file is a regular cgroup file. Also, as these cgroup interface files can't be removed before the directory, it's safe to access the parent too. Prior to 347c4a874710 ("memcg: remove cgroup_event->cft"), there was a call to __file_cft() which verified that the specified file is a regular cgroupfs file before further accesses. The cftype pointer returned from __file_cft() was no longer necessary and the commit inadvertently dropped the file type check with it allowing any file to slip through. With the invarients broken, the d_name and parent accesses can now race against renames and removals of arbitrary files and cause use-after-free's. Fix the bug by resurrecting the file type check in __file_cft(). Now that cgroupfs is implemented through kernfs, checking the file operations needs to go through a layer of indirection. Instead, let's check the superblock and dentry type. Signed-off-by: Tejun Heo <tj@kernel.org> Fixes: 347c4a874710 ("memcg: remove cgroup_event->cft") Cc: stable@kernel.org # v3.14+ Reported-by: Jann Horn <jannh@google.com> Acked-by: Johannes Weiner <hannes@cmpxchg.org> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-05proc: proc_skip_spaces() shouldn't think it is working on C stringsLinus Torvalds1-12/+13
proc_skip_spaces() seems to think it is working on C strings, and ends up being just a wrapper around skip_spaces() with a really odd calling convention. Instead of basing it on skip_spaces(), it should have looked more like proc_skip_char(), which really is the exact same function (except it skips a particular character, rather than whitespace). So use that as inspiration, odd coding and all. Now the calling convention actually makes sense and works for the intended purpose. Reported-and-tested-by: Kyle Zeng <zengyhkyle@gmail.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-05proc: avoid integer type confusion in get_proc_longLinus Torvalds1-3/+2
proc_get_long() is passed a size_t, but then assigns it to an 'int' variable for the length. Let's not do that, even if our IO paths are limited to MAX_RW_COUNT (exactly because of these kinds of type errors). So do the proper test in the rigth type. Reported-by: Kyle Zeng <zengyhkyle@gmail.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-12-04Merge tag 'perf_urgent_for_v6.1_rc8' of ↵Linus Torvalds1-4/+13
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fix from Borislav Petkov: - Fix a use-after-free case where the perf pending task callback would see an already freed event * tag 'perf_urgent_for_v6.1_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Fix perf_pending_task() UaF
2022-11-29Merge tag 'net-6.1-rc8-2' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf, can and wifi. Current release - new code bugs: - eth: mlx5e: - use kvfree() in mlx5e_accel_fs_tcp_create() - MACsec, fix RX data path 16 RX security channel limit - MACsec, fix memory leak when MACsec device is deleted - MACsec, fix update Rx secure channel active field - MACsec, fix add Rx security association (SA) rule memory leak Previous releases - regressions: - wifi: cfg80211: don't allow multi-BSSID in S1G - stmmac: set MAC's flow control register to reflect current settings - eth: mlx5: - E-switch, fix duplicate lag creation - fix use-after-free when reverting termination table Previous releases - always broken: - ipv4: fix route deletion when nexthop info is not specified - bpf: fix a local storage BPF map bug where the value's spin lock field can get initialized incorrectly - tipc: re-fetch skb cb after tipc_msg_validate - wifi: wilc1000: fix Information Element parsing - packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE - sctp: fix memory leak in sctp_stream_outq_migrate() - can: can327: fix potential skb leak when netdev is down - can: add number of missing netdev freeing on error paths - aquantia: do not purge addresses when setting the number of rings - wwan: iosm: - fix incorrect skb length leading to truncated packet - fix crash in peek throughput test due to skb UAF" * tag 'net-6.1-rc8-2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) net: ethernet: renesas: ravb: Fix promiscuous mode after system resumed MAINTAINERS: Update maintainer list for chelsio drivers ionic: update MAINTAINERS entry sctp: fix memory leak in sctp_stream_outq_migrate() packet: do not set TP_STATUS_CSUM_VALID on CHECKSUM_COMPLETE net/mlx5: Lag, Fix for loop when checking lag Revert "net/mlx5e: MACsec, remove replay window size limitation in offload path" net: marvell: prestera: Fix a NULL vs IS_ERR() check in some functions net: tun: Fix use-after-free in tun_detach() net: mdiobus: fix unbalanced node reference count net: hsr: Fix potential use-after-free tipc: re-fetch skb cb after tipc_msg_validate mptcp: fix sleep in atomic at close time mptcp: don't orphan ssk in mptcp_close() dsa: lan9303: Correct stat name ipv4: Fix route deletion when nexthop info is not specified net: wwan: iosm: fix incorrect skb length net: wwan: iosm: fix crash in peek throughput test net: wwan: iosm: fix dma_alloc_coherent incompatible pointer type net: wwan: iosm: fix kernel test robot reported error ...
2022-11-29perf: Fix perf_pending_task() UaFPeter Zijlstra1-4/+13
Per syzbot it is possible for perf_pending_task() to run after the event is free()'d. There are two related but distinct cases: - the task_work was already queued before destroying the event; - destroying the event itself queues the task_work. The first cannot be solved using task_work_cancel() since perf_release() itself might be called from a task_work (____fput), which means the current->task_works list is already empty and task_work_cancel() won't be able to find the perf_pending_task() entry. The simplest alternative is extending the perf_event lifetime to cover the task_work. The second is just silly, queueing a task_work while you know the event is going away makes no sense and is easily avoided by re-arranging how the event is marked STATE_DEAD and ensuring it goes through STATE_OFF on the way down. Reported-by: syzbot+9228d6098455bb209ec8@syzkaller.appspotmail.com Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Marco Elver <elver@google.com>
2022-11-28Merge tag 'trace-v6.1-rc6' of ↵Linus Torvalds8-11/+35
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix osnoise duration type to 64bit not 32bit - Have histogram triggers be able to handle an unexpected NULL pointer for the record event, which can happen when the histogram first starts up - Clear out ring buffers when dynamic events are removed, as the type that is saved in the ring buffer is used to read the event, and a stale type that is reused by another event could cause use after free issues - Trivial comment fix - Fix memory leak in user_event_create() * tag 'trace-v6.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Free buffers when a used dynamic event is removed tracing: Add tracing_reset_all_online_cpus_unlocked() function tracing: Fix race where histograms can be called before the event tracing/osnoise: Fix duration type tracing/user_events: Fix memory leak in user_event_create() tracing/hist: add in missing * in comment blocks
2022-11-27Merge tag 'perf_urgent_for_v6.1_rc7' of ↵Linus Torvalds1-2/+25
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: "Two more fixes to the perf sigtrap handling: - output the address in the sample only when it has been requested - handle the case where user-only events can hit in kernel and thus upset the sigtrap sanity checking" * tag 'perf_urgent_for_v6.1_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf: Consider OS filter fail perf: Fixup SIGTRAP and sample_flags interaction
2022-11-25Merge tag 'pm-6.1-rc7' of ↵Linus Torvalds1-15/+15
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull power management fixes from Rafael Wysocki: "These revert a recent change in the schedutil cpufreq governor that had not been expected to make any functional difference, but turned out to introduce a performance regression, fix an initialization issue in the amd-pstate driver and make it actually replace the venerable ACPI cpufreq driver on the supported systems by default. Specifics: - Revert a recent schedutil cpufreq governor change that introduced a performace regression on Pixel 6 (Sam Wu) - Fix amd-pstate driver initialization after running the kernel via kexec (Wyes Karny) - Turn amd-pstate into a built-in driver which allows it to take precedence over acpi-cpufreq by default on supported systems and amend it with a mechanism to disable this behavior (Perry Yuan) - Update amd-pstate documentation in accordance with the other changes made to it (Perry Yuan)" * tag 'pm-6.1-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: Documentation: add amd-pstate kernel command line options Documentation: amd-pstate: add driver working mode introduction cpufreq: amd-pstate: add amd-pstate driver parameter for mode selection cpufreq: amd-pstate: change amd-pstate driver to be built-in type cpufreq: amd-pstate: cpufreq: amd-pstate: reset MSR_AMD_PERF_CTL register at init Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy"
2022-11-25Merge tag 'mm-hotfixes-stable-2022-11-24' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "24 MM and non-MM hotfixes. 8 marked cc:stable and 16 for post-6.0 issues. There have been a lot of hotfixes this cycle, and this is quite a large batch given how far we are into the -rc cycle. Presumably a reflection of the unusually large amount of MM material which went into 6.1-rc1" * tag 'mm-hotfixes-stable-2022-11-24' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (24 commits) test_kprobes: fix implicit declaration error of test_kprobes nilfs2: fix nilfs_sufile_mark_dirty() not set segment usage as dirty mm/cgroup/reclaim: fix dirty pages throttling on cgroup v1 mm: fix unexpected changes to {failslab|fail_page_alloc}.attr swapfile: fix soft lockup in scan_swap_map_slots hugetlb: fix __prep_compound_gigantic_page page flag setting kfence: fix stack trace pruning proc/meminfo: fix spacing in SecPageTables mm: multi-gen LRU: retry folios written back while isolated mailmap: update email address for Satya Priya mm/migrate_device: return number of migrating pages in args->cpages kbuild: fix -Wimplicit-function-declaration in license_is_gpl_compatible MAINTAINERS: update Alex Hung's email address mailmap: update Alex Hung's email address mm: mmap: fix documentation for vma_mas_szero mm/damon/sysfs-schemes: skip stats update if the scheme directory is removed mm/memory: return vm_fault_t result from migrate_to_ram() callback mm: correctly charge compressed memory to its memcg ipc/shm: call underlying open/close vm_ops gcov: clang: fix the buffer overflow issue ...
2022-11-24perf: Consider OS filter failPeter Zijlstra1-2/+22
Some PMUs (notably the traditional hardware kind) have boundary issues with the OS filter. Specifically, it is possible for perf_event_attr::exclude_kernel=1 events to trigger in-kernel due to SKID or errata. This can upset the sigtrap logic some and trigger the WARN. However, if this invalid sample is the first we must not loose the SIGTRAP, OTOH if it is the second, it must not override the pending_addr with a (possibly) invalid one. Fixes: ca6c21327c6a ("perf: Fix missing SIGTRAPs") Reported-by: Pengfei Xu <pengfei.xu@intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Marco Elver <elver@google.com> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Link: https://lkml.kernel.org/r/Y3hDYiXwRnJr8RYG@xpf.sh.intel.com
2022-11-24perf: Fixup SIGTRAP and sample_flags interactionPeter Zijlstra1-1/+4
The perf_event_attr::sigtrap functionality relies on data->addr being set. However commit 7b0846301531 ("perf: Use sample_flags for addr") changed this to only initialize data->addr when not 0. Fixes: 7b0846301531 ("perf: Use sample_flags for addr") Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lkml.kernel.org/r/Y3426b4OimE%2FI5po%40hirez.programming.kicks-ass.net
2022-11-23tracing: Free buffers when a used dynamic event is removedSteven Rostedt (Google)2-1/+12
After 65536 dynamic events have been added and removed, the "type" field of the event then uses the first type number that is available (not currently used by other events). A type number is the identifier of the binary blobs in the tracing ring buffer (known as events) to map them to logic that can parse the binary blob. The issue is that if a dynamic event (like a kprobe event) is traced and is in the ring buffer, and then that event is removed (because it is dynamic, which means it can be created and destroyed), if another dynamic event is created that has the same number that new event's logic on parsing the binary blob will be used. To show how this can be an issue, the following can crash the kernel: # cd /sys/kernel/tracing # for i in `seq 65536`; do echo 'p:kprobes/foo do_sys_openat2 $arg1:u32' > kprobe_events # done For every iteration of the above, the writing to the kprobe_events will remove the old event and create a new one (with the same format) and increase the type number to the next available on until the type number reaches over 65535 which is the max number for the 16 bit type. After it reaches that number, the logic to allocate a new number simply looks for the next available number. When an dynamic event is removed, that number is then available to be reused by the next dynamic event created. That is, once the above reaches the max number, the number assigned to the event in that loop will remain the same. Now that means deleting one dynamic event and created another will reuse the previous events type number. This is where bad things can happen. After the above loop finishes, the kprobes/foo event which reads the do_sys_openat2 function call's first parameter as an integer. # echo 1 > kprobes/foo/enable # cat /etc/passwd > /dev/null # cat trace cat-2211 [005] .... 2007.849603: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849620: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849838: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 cat-2211 [005] .... 2007.849880: foo: (do_sys_openat2+0x0/0x130) arg1=4294967196 # echo 0 > kprobes/foo/enable Now if we delete the kprobe and create a new one that reads a string: # echo 'p:kprobes/foo do_sys_openat2 +0($arg2):string' > kprobe_events And now we can the trace: # cat trace sendmail-1942 [002] ..... 530.136320: foo: (do_sys_openat2+0x0/0x240) arg1= cat-2046 [004] ..... 530.930817: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.930961: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.934278: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" cat-2046 [004] ..... 530.934563: foo: (do_sys_openat2+0x0/0x240) arg1="������������������������������������������������������������������������������������������������" bash-1515 [007] ..... 534.299093: foo: (do_sys_openat2+0x0/0x240) arg1="kkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkkk���������@��4Z����;Y�����U And dmesg has: ================================================================== BUG: KASAN: use-after-free in string+0xd4/0x1c0 Read of size 1 at addr ffff88805fdbbfa0 by task cat/2049 CPU: 0 PID: 2049 Comm: cat Not tainted 6.1.0-rc6-test+ #641 Hardware name: Hewlett-Packard HP Compaq Pro 6300 SFF/339A, BIOS K01 v03.03 07/14/2016 Call Trace: <TASK> dump_stack_lvl+0x5b/0x77 print_report+0x17f/0x47b kasan_report+0xad/0x130 string+0xd4/0x1c0 vsnprintf+0x500/0x840 seq_buf_vprintf+0x62/0xc0 trace_seq_printf+0x10e/0x1e0 print_type_string+0x90/0xa0 print_kprobe_event+0x16b/0x290 print_trace_line+0x451/0x8e0 s_show+0x72/0x1f0 seq_read_iter+0x58e/0x750 seq_read+0x115/0x160 vfs_read+0x11d/0x460 ksys_read+0xa9/0x130 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd RIP: 0033:0x7fc2e972ade2 Code: c0 e9 b2 fe ff ff 50 48 8d 3d b2 3f 0a 00 e8 05 f0 01 00 0f 1f 44 00 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 0f 05 <48> 3d 00 f0 ff ff 77 56 c3 0f 1f 44 00 00 48 83 ec 28 48 89 54 24 RSP: 002b:00007ffc64e687c8 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 RAX: ffffffffffffffda RBX: 0000000000020000 RCX: 00007fc2e972ade2 RDX: 0000000000020000 RSI: 00007fc2e980d000 RDI: 0000000000000003 RBP: 00007fc2e980d000 R08: 00007fc2e980c010 R09: 0000000000000000 R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000020f00 R13: 0000000000000003 R14: 0000000000020000 R15: 0000000000020000 </TASK> The buggy address belongs to the physical page: page:ffffea00017f6ec0 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x5fdbb flags: 0xfffffc0000000(node=0|zone=1|lastcpupid=0x1fffff) raw: 000fffffc0000000 0000000000000000 ffffea00017f6ec8 0000000000000000 raw: 0000000000000000 0000000000000000 00000000ffffffff 0000000000000000 page dumped because: kasan: bad access detected Memory state around the buggy address: ffff88805fdbbe80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbbf00: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff >ffff88805fdbbf80: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ^ ffff88805fdbc000: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ffff88805fdbc080: ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ff ================================================================== This was found when Zheng Yejian sent a patch to convert the event type number assignment to use IDA, which gives the next available number, and this bug showed up in the fuzz testing by Yujie Liu and the kernel test robot. But after further analysis, I found that this behavior is the same as when the event type numbers go past the 16bit max (and the above shows that). As modules have a similar issue, but is dealt with by setting a "WAS_ENABLED" flag when a module event is enabled, and when the module is freed, if any of its events were enabled, the ring buffer that holds that event is also cleared, to prevent reading stale events. The same can be done for dynamic events. If any dynamic event that is being removed was enabled, then make sure the buffers they were enabled in are now cleared. Link: https://lkml.kernel.org/r/20221123171434.545706e3@gandalf.local.home Link: https://lore.kernel.org/all/20221110020319.1259291-1-zhengyejian1@huawei.com/ Cc: stable@vger.kernel.org Cc: Andrew Morton <akpm@linux-foundation.org> Depends-on: e18eb8783ec49 ("tracing: Add tracing_reset_all_online_cpus_unlocked() function") Depends-on: 5448d44c38557 ("tracing: Add unified dynamic event framework") Depends-on: 6212dd29683ee ("tracing/kprobes: Use dyn_event framework for kprobe events") Depends-on: 065e63f951432 ("tracing: Only have rmmod clear buffers that its events were active in") Depends-on: 575380da8b469 ("tracing: Only clear trace buffer on module unload if event was traced") Fixes: 77b44d1b7c283 ("tracing/kprobes: Rename Kprobe-tracer to kprobe-event") Reported-by: Zheng Yejian <zhengyejian1@huawei.com> Reported-by: Yujie Liu <yujie.liu@intel.com> Reported-by: kernel test robot <yujie.liu@intel.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Add tracing_reset_all_online_cpus_unlocked() functionSteven Rostedt (Google)4-4/+12
Currently the tracing_reset_all_online_cpus() requires the trace_types_lock held. But only one caller of this function actually has that lock held before calling it, and the other just takes the lock so that it can call it. More users of this function is needed where the lock is not held. Add a tracing_reset_all_online_cpus_unlocked() function for the one use case that calls it without being held, and also add a lockdep_assert to make sure it is held when called. Then have tracing_reset_all_online_cpus() take the lock internally, such that callers do not need to worry about taking it. Link: https://lkml.kernel.org/r/20221123192741.658273220@goodmis.org Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Zheng Yejian <zhengyejian1@huawei.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-23tracing: Fix race where histograms can be called before the eventSteven Rostedt (Google)1-0/+3
commit 94eedf3dded5 ("tracing: Fix race where eprobes can be called before the event") fixed an issue where if an event is soft disabled, and the trigger is being added, there's a small window where the event sees that there's a trigger but does not see that it requires reading the event yet, and then calls the trigger with the record == NULL. This could be solved with adding memory barriers in the hot path, or to make sure that all the triggers requiring a record check for NULL. The latter was chosen. Commit 94eedf3dded5 set the eprobe trigger handle to check for NULL, but the same needs to be done with histograms. Link: https://lore.kernel.org/linux-trace-kernel/20221118211809.701d40c0f8a757b0df3c025a@kernel.org/ Link: https://lore.kernel.org/linux-trace-kernel/20221123164323.03450c3a@gandalf.local.home Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Reported-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22gcov: clang: fix the buffer overflow issueMukesh Ojha1-0/+2
Currently, in clang version of gcov code when module is getting removed gcov_info_add() incorrectly adds the sfn_ptr->counter to all the dst->functions and it result in the kernel panic in below crash report. Fix this by properly handling it. [ 8.899094][ T599] Unable to handle kernel write to read-only memory at virtual address ffffff80461cc000 [ 8.899100][ T599] Mem abort info: [ 8.899102][ T599] ESR = 0x9600004f [ 8.899103][ T599] EC = 0x25: DABT (current EL), IL = 32 bits [ 8.899105][ T599] SET = 0, FnV = 0 [ 8.899107][ T599] EA = 0, S1PTW = 0 [ 8.899108][ T599] FSC = 0x0f: level 3 permission fault [ 8.899110][ T599] Data abort info: [ 8.899111][ T599] ISV = 0, ISS = 0x0000004f [ 8.899113][ T599] CM = 0, WnR = 1 [ 8.899114][ T599] swapper pgtable: 4k pages, 39-bit VAs, pgdp=00000000ab8de000 [ 8.899116][ T599] [ffffff80461cc000] pgd=18000009ffcde003, p4d=18000009ffcde003, pud=18000009ffcde003, pmd=18000009ffcad003, pte=00600000c61cc787 [ 8.899124][ T599] Internal error: Oops: 9600004f [#1] PREEMPT SMP [ 8.899265][ T599] Skip md ftrace buffer dump for: 0x1609e0 .... .., [ 8.899544][ T599] CPU: 7 PID: 599 Comm: modprobe Tainted: G S OE 5.15.41-android13-8-g38e9b1af6bce #1 [ 8.899547][ T599] Hardware name: XXX (DT) [ 8.899549][ T599] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--) [ 8.899551][ T599] pc : gcov_info_add+0x9c/0xb8 [ 8.899557][ T599] lr : gcov_event+0x28c/0x6b8 [ 8.899559][ T599] sp : ffffffc00e733b00 [ 8.899560][ T599] x29: ffffffc00e733b00 x28: ffffffc00e733d30 x27: ffffffe8dc297470 [ 8.899563][ T599] x26: ffffffe8dc297000 x25: ffffffe8dc297000 x24: ffffffe8dc297000 [ 8.899566][ T599] x23: ffffffe8dc0a6200 x22: ffffff880f68bf20 x21: 0000000000000000 [ 8.899569][ T599] x20: ffffff880f68bf00 x19: ffffff8801babc00 x18: ffffffc00d7f9058 [ 8.899572][ T599] x17: 0000000000088793 x16: ffffff80461cbe00 x15: 9100052952800785 [ 8.899575][ T599] x14: 0000000000000200 x13: 0000000000000041 x12: 9100052952800785 [ 8.899577][ T599] x11: ffffffe8dc297000 x10: ffffffe8dc297000 x9 : ffffff80461cbc80 [ 8.899580][ T599] x8 : ffffff8801babe80 x7 : ffffffe8dc2ec000 x6 : ffffffe8dc2ed000 [ 8.899583][ T599] x5 : 000000008020001f x4 : fffffffe2006eae0 x3 : 000000008020001f [ 8.899586][ T599] x2 : ffffff8027c49200 x1 : ffffff8801babc20 x0 : ffffff80461cb3a0 [ 8.899589][ T599] Call trace: [ 8.899590][ T599] gcov_info_add+0x9c/0xb8 [ 8.899592][ T599] gcov_module_notifier+0xbc/0x120 [ 8.899595][ T599] blocking_notifier_call_chain+0xa0/0x11c [ 8.899598][ T599] do_init_module+0x2a8/0x33c [ 8.899600][ T599] load_module+0x23cc/0x261c [ 8.899602][ T599] __arm64_sys_finit_module+0x158/0x194 [ 8.899604][ T599] invoke_syscall+0x94/0x2bc [ 8.899607][ T599] el0_svc_common+0x1d8/0x34c [ 8.899609][ T599] do_el0_svc+0x40/0x54 [ 8.899611][ T599] el0_svc+0x94/0x2f0 [ 8.899613][ T599] el0t_64_sync_handler+0x88/0xec [ 8.899615][ T599] el0t_64_sync+0x1b4/0x1b8 [ 8.899618][ T599] Code: f905f56c f86e69ec f86e6a0f 8b0c01ec (f82e6a0c) [ 8.899620][ T599] ---[ end trace ed5218e9e5b6e2e6 ]--- Link: https://lkml.kernel.org/r/1668020497-13142-1-git-send-email-quic_mojha@quicinc.com Fixes: e178a5beb369 ("gcov: clang support") Signed-off-by: Mukesh Ojha <quic_mojha@quicinc.com> Reviewed-by: Peter Oberparleiter <oberpar@linux.ibm.com> Tested-by: Peter Oberparleiter <oberpar@linux.ibm.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Tom Rix <trix@redhat.com> Cc: <stable@vger.kernel.org> [5.2+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-11-22tracing/osnoise: Fix duration typeDaniel Bristot de Oliveira1-3/+3
The duration type is a 64 long value, not an int. This was causing some long noise to report wrong values. Change the duration to a 64 bits value. Link: https://lkml.kernel.org/r/a93d8a8378c7973e9c609de05826533c9e977939.1668692096.git.bristot@kernel.org Cc: stable@vger.kernel.org Cc: Daniel Bristot de Oliveira <bristot@kernel.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Fixes: bce29ac9ce0b ("trace: Add osnoise tracer") Signed-off-by: Daniel Bristot de Oliveira <bristot@kernel.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22tracing/user_events: Fix memory leak in user_event_create()Xiu Jianfeng1-1/+3
Before current_user_event_group(), it has allocated memory and save it in @name, this should freed before return error. Link: https://lkml.kernel.org/r/20221115014445.158419-1-xiujianfeng@huawei.com Fixes: e5d271812e7a ("tracing/user_events: Move pages/locks into groups to prepare for namespaces") Signed-off-by: Xiu Jianfeng <xiujianfeng@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Acked-by: Beau Belgrave <beaub@linux.microsoft.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22tracing/hist: add in missing * in comment blocksColin Ian King1-2/+2
There are a couple of missing * in comment blocks. Fix these. Cleans up two clang warnings: kernel/trace/trace_events_hist.c:986: warning: bad line: kernel/trace/trace_events_hist.c:3229: warning: bad line: Link: https://lkml.kernel.org/r/20221020133019.1547587-1-colin.i.king@gmail.com Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-22Revert "cpufreq: schedutil: Move max CPU capacity to sugov_policy"Sam Wu1-15/+15
This reverts commit 6d5afdc97ea71958287364a1f1d07e59ef151b11. On a Pixel 6 device, it is observed that this commit increases latency by approximately 50ms, or 20%, in migrating a task that requires full CPU utilization from a LITTLE CPU to Fmax on a big CPU. Reverting this change restores the latency back to its original baseline value. Fixes: 6d5afdc97ea7 ("cpufreq: schedutil: Move max CPU capacity to sugov_policy") Signed-off-by: Sam Wu <wusamuel@google.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-21bpf: Do not copy spin lock field from user in bpf_selem_allocXu Kuohai1-1/+1
bpf_selem_alloc function is used by inode_storage, sk_storage and task_storage maps to set map value, for these map types, there may be a spin lock in the map value, so if we use memcpy to copy the whole map value from user, the spin lock field may be initialized incorrectly. Since the spin lock field is zeroed by kzalloc, call copy_map_value instead of memcpy to skip copying the spin lock field to fix it. Fixes: 6ac99e8f23d4 ("bpf: Introduce bpf sk local storage") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Link: https://lore.kernel.org/r/20221114134720.1057939-2-xukuohai@huawei.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2022-11-20Merge tag 'trace-probes-v6.1' of ↵Linus Torvalds4-20/+45
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing/probes fixes from Steven Rostedt: - Fix possible NULL pointer dereference on trace_event_file in kprobe_event_gen_test_exit() - Fix NULL pointer dereference for trace_array in kprobe_event_gen_test_exit() - Fix memory leak of filter string for eprobes - Fix a possible memory leak in rethook_alloc() - Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case which can cause a possible use-after-free - Fix warning in eprobe filter creation - Fix eprobe filter creation as it picked the wrong event for the fields * tag 'trace-probes-v6.1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/eprobe: Fix eprobe filter to make a filter correctly tracing/eprobe: Fix warning in filter creation kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace case rethook: fix a potential memleak in rethook_alloc() tracing/eprobe: Fix memory leak of filter string tracing: kprobe: Fix potential null-ptr-deref on trace_array in kprobe_event_gen_test_exit() tracing: kprobe: Fix potential null-ptr-deref on trace_event_file in kprobe_event_gen_test_exit()
2022-11-20Merge tag 'trace-v6.1-rc5' of ↵Linus Torvalds7-43/+71
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix polling to block on watermark like the reads do, as user space applications get confused when the select says read is available, and then the read blocks - Fix accounting of ring buffer dropped pages as it is what is used to determine if the buffer is empty or not - Fix memory leak in tracing_read_pipe() - Fix struct trace_array warning about being declared in parameters - Fix accounting of ftrace pages used in output at start up. - Fix allocation of dyn_ftrace pages by subtracting one from order instead of diving it by 2 - Static analyzer found a case were a pointer being used outside of a NULL check (rb_head_page_deactivate()) - Fix possible NULL pointer dereference if kstrdup() fails in ftrace_add_mod() - Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event() - Fix bad pointer dereference in register_synth_event() on error path - Remove unused __bad_type_size() method - Fix possible NULL pointer dereference of entry in list 'tr->err_log' - Fix NULL pointer deference race if eprobe is called before the event setup * tag 'trace-v6.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix race where eprobes can be called before the event tracing: Fix potential null-pointer-access of entry in list 'tr->err_log' tracing: Remove unused __bad_type_size() method tracing: Fix wild-memory-access in register_synth_event() tracing: Fix memory leak in test_gen_synth_cmd() and test_empty_synth_event() ftrace: Fix null pointer dereference in ftrace_add_mod() ring_buffer: Do not deactivate non-existant pages ftrace: Optimize the allocation for mcount entries ftrace: Fix the possible incorrect kernel message tracing: Fix warning on variable 'struct trace_array' tracing: Fix memory leak in tracing_read_pipe() ring-buffer: Include dropped pages in counting dirty patches tracing/ring-buffer: Have polling block on watermark
2022-11-20tracing: Fix race where eprobes can be called before the eventSteven Rostedt (Google)1-0/+3
The flag that tells the event to call its triggers after reading the event is set for eprobes after the eprobe is enabled. This leads to a race where the eprobe may be triggered at the beginning of the event where the record information is NULL. The eprobe then dereferences the NULL record causing a NULL kernel pointer bug. Test for a NULL record to keep this from happening. Link: https://lore.kernel.org/linux-trace-kernel/20221116192552.1066630-1-rafaelmendsr@gmail.com/ Link: https://lore.kernel.org/linux-trace-kernel/20221117214249.2addbe10@gandalf.local.home Cc: Linux Trace Kernel <linux-trace-kernel@vger.kernel.org> Cc: Tzvetomir Stoyanov <tz.stoyanov@gmail.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: stable@vger.kernel.org Fixes: 7491e2c442781 ("tracing: Add a probe that attaches to trace events") Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-20Merge tag 'sched_urgent_for_v6.1_rc6' of ↵Linus Torvalds2-19/+52
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull scheduler fixes from Borislav Petkov: - Fix a small race on the task's exit path where there's a misunderstanding whether the task holds rq->lock or not - Prevent processes from getting killed when using deprecated or unknown rseq ABI flags in order to be able to fuzz the rseq() syscall with syzkaller * tag 'sched_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: sched: Fix race in task_call_func() rseq: Use pr_warn_once() when deprecated/unknown ABI flags are encountered
2022-11-20Merge tag 'perf_urgent_for_v6.1_rc6' of ↵Linus Torvalds1-6/+19
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf fixes from Borislav Petkov: - Fix an intel PT erratum where CPUs do not support single range output for more than 4K - Fix a NULL ptr dereference which can happen after an NMI interferes with the event enabling dance in amd_pmu_enable_all() - Free the events array too when freeing uncore contexts on CPU online, thereby fixing a memory leak - Improve the pending SIGTRAP check * tag 'perf_urgent_for_v6.1_rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel/pt: Fix sampling using single range output perf/x86/amd: Fix crash due to race between amd_pmu_enable_all, perf NMI and throttling perf/x86/amd/uncore: Fix memory leak for events array perf: Improve missing SIGTRAP checking
2022-11-17tracing: Fix potential null-pointer-access of entry in list 'tr->err_log'Zheng Yejian1-4/+5
Entries in list 'tr->err_log' will be reused after entry number exceed TRACING_LOG_ERRS_MAX. The cmd string of the to be reused entry will be freed first then allocated a new one. If the allocation failed, then the entry will still be in list 'tr->err_log' but its 'cmd' field is set to be NULL, later access of 'cmd' is risky. Currently above problem can cause the loss of 'cmd' information of first entry in 'tr->err_log'. When execute `cat /sys/kernel/tracing/error_log`, reproduce logs like: [ 37.495100] trace_kprobe: error: Maxactive is not for kprobe(null) ^ [ 38.412517] trace_kprobe: error: Maxactive is not for kprobe Command: p4:myprobe2 do_sys_openat2 ^ Link: https://lore.kernel.org/linux-trace-kernel/20221114104632.3547266-1-zhengyejian1@huawei.com Fixes: 1581a884b7ca ("tracing: Remove size restriction on tracing_log_err cmd strings") Signed-off-by: Zheng Yejian <zhengyejian1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-17tracing: Remove unused __bad_type_size() methodQiujun Huang1-2/+0
__bad_type_size() is unused after commit 04ae87a52074("ftrace: Rework event_create_dir()"). So, remove it. Link: https://lkml.kernel.org/r/D062EC2E-7DB7-4402-A67E-33C3577F551E@gmail.com Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Qiujun Huang <hqjagain@gmail.com> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-11-18tracing/eprobe: Fix eprobe filter to make a filter correctlyMasami Hiramatsu (Google)1-1/+1
Since the eprobe filter was defined based on the eprobe's trace event itself, it doesn't work correctly. Use the original trace event of the eprobe when making the filter so that the filter works correctly. Without this fix: # echo 'e syscalls/sys_enter_openat \ flags_rename=$flags:u32 if flags < 1000' >> dynamic_events # echo 1 > events/eprobes/sys_enter_openat/enable [ 114.551550] event trace: Could not enable event sys_enter_openat -bash: echo: write error: Invalid argument With this fix: # echo 'e syscalls/sys_enter_openat \ flags_rename=$flags:u32 if flags < 1000' >> dynamic_events # echo 1 > events/eprobes/sys_enter_openat/enable # tail trace cat-241 [000] ...1. 266.498449: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0 cat-242 [000] ...1. 266.977640: sys_enter_openat: (syscalls.sys_enter_openat) flags_rename=0 Link: https://lore.kernel.org/all/166823166395.1385292.8931770640212414483.stgit@devnote3/ Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support") Reported-by: Rafael Mendonca <rafaelmendsr@gmail.com> Tested-by: Rafael Mendonca <rafaelmendsr@gmail.com> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18tracing/eprobe: Fix warning in filter creationRafael Mendonca1-1/+1
The filter pointer (filterp) passed to create_filter() function must be a pointer that references a NULL pointer, otherwise, we get a warning when adding a filter option to the event probe: root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core sched/sched_stat_runtime \ runtime=$runtime:u32 if cpu < 4' >> dynamic_events [ 5034.340439] ------------[ cut here ]------------ [ 5034.341258] WARNING: CPU: 0 PID: 223 at kernel/trace/trace_events_filter.c:1939 create_filter+0x1db/0x250 [...] stripped [ 5034.345518] RIP: 0010:create_filter+0x1db/0x250 [...] stripped [ 5034.351604] Call Trace: [ 5034.351803] <TASK> [ 5034.351959] ? process_preds+0x1b40/0x1b40 [ 5034.352241] ? rcu_read_lock_bh_held+0xd0/0xd0 [ 5034.352604] ? kasan_set_track+0x29/0x40 [ 5034.352904] ? kasan_save_alloc_info+0x1f/0x30 [ 5034.353264] create_event_filter+0x38/0x50 [ 5034.353573] __trace_eprobe_create+0x16f4/0x1d20 [ 5034.353964] ? eprobe_dyn_event_release+0x360/0x360 [ 5034.354363] ? mark_held_locks+0xa6/0xf0 [ 5034.354684] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 5034.355105] ? trace_hardirqs_on+0x41/0x120 [ 5034.355417] ? _raw_spin_unlock_irqrestore+0x35/0x60 [ 5034.355751] ? __create_object+0x5b7/0xcf0 [ 5034.356027] ? lock_is_held_type+0xaf/0x120 [ 5034.356362] ? rcu_read_lock_bh_held+0xb0/0xd0 [ 5034.356716] ? rcu_read_lock_bh_held+0xd0/0xd0 [ 5034.357084] ? kasan_set_track+0x29/0x40 [ 5034.357411] ? kasan_save_alloc_info+0x1f/0x30 [ 5034.357715] ? __kasan_kmalloc+0xb8/0xc0 [ 5034.357985] ? write_comp_data+0x2f/0x90 [ 5034.358302] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 5034.358691] ? argv_split+0x381/0x460 [ 5034.358949] ? write_comp_data+0x2f/0x90 [ 5034.359240] ? eprobe_dyn_event_release+0x360/0x360 [ 5034.359620] trace_probe_create+0xf6/0x110 [ 5034.359940] ? trace_probe_match_command_args+0x240/0x240 [ 5034.360376] eprobe_dyn_event_create+0x21/0x30 [ 5034.360709] create_dyn_event+0xf3/0x1a0 [ 5034.360983] trace_parse_run_command+0x1a9/0x2e0 [ 5034.361297] ? dyn_event_release+0x500/0x500 [ 5034.361591] dyn_event_write+0x39/0x50 [ 5034.361851] vfs_write+0x311/0xe50 [ 5034.362091] ? dyn_event_seq_next+0x40/0x40 [ 5034.362376] ? kernel_write+0x5b0/0x5b0 [ 5034.362637] ? write_comp_data+0x2f/0x90 [ 5034.362937] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 5034.363258] ? ftrace_syscall_enter+0x544/0x840 [ 5034.363563] ? write_comp_data+0x2f/0x90 [ 5034.363837] ? __sanitizer_cov_trace_pc+0x25/0x60 [ 5034.364156] ? write_comp_data+0x2f/0x90 [ 5034.364468] ? write_comp_data+0x2f/0x90 [ 5034.364770] ksys_write+0x158/0x2a0 [ 5034.365022] ? __ia32_sys_read+0xc0/0xc0 [ 5034.365344] __x64_sys_write+0x7c/0xc0 [ 5034.365669] ? syscall_enter_from_user_mode+0x53/0x70 [ 5034.366084] do_syscall_64+0x60/0x90 [ 5034.366356] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 5034.366767] RIP: 0033:0x7ff0b43938f3 [...] stripped [ 5034.371892] </TASK> [ 5034.374720] ---[ end trace 0000000000000000 ]--- Link: https://lore.kernel.org/all/20221108202148.1020111-1-rafaelmendsr@gmail.com/ Fixes: 752be5c5c910 ("tracing/eprobe: Add eprobe filter support") Signed-off-by: Rafael Mendonca <rafaelmendsr@gmail.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18kprobes: Skip clearing aggrprobe's post_handler in kprobe-on-ftrace caseLi Huafei1-1/+7
In __unregister_kprobe_top(), if the currently unregistered probe has post_handler but other child probes of the aggrprobe do not have post_handler, the post_handler of the aggrprobe is cleared. If this is a ftrace-based probe, there is a problem. In later calls to disarm_kprobe(), we will use kprobe_ftrace_ops because post_handler is NULL. But we're armed with kprobe_ipmodify_ops. This triggers a WARN in __disarm_kprobe_ftrace() and may even cause use-after-free: Failed to disarm kprobe-ftrace at kernel_clone+0x0/0x3c0 (error -2) WARNING: CPU: 5 PID: 137 at kernel/kprobes.c:1135 __disarm_kprobe_ftrace.isra.21+0xcf/0xe0 Modules linked in: testKprobe_007(-) CPU: 5 PID: 137 Comm: rmmod Not tainted 6.1.0-rc4-dirty #18 [...] Call Trace: <TASK> __disable_kprobe+0xcd/0xe0 __unregister_kprobe_top+0x12/0x150 ? mutex_lock+0xe/0x30 unregister_kprobes.part.23+0x31/0xa0 unregister_kprobe+0x32/0x40 __x64_sys_delete_module+0x15e/0x260 ? do_user_addr_fault+0x2cd/0x6b0 do_syscall_64+0x3a/0x90 entry_SYSCALL_64_after_hwframe+0x63/0xcd [...] For the kprobe-on-ftrace case, we keep the post_handler setting to identify this aggrprobe armed with kprobe_ipmodify_ops. This way we can disarm it correctly. Link: https://lore.kernel.org/all/20221112070000.35299-1-lihuafei1@huawei.com/ Fixes: 0bc11ed5ab60 ("kprobes: Allow kprobes coexist with livepatch") Reported-by: Zhao Gongyi <zhaogongyi@huawei.com> Suggested-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Li Huafei <lihuafei1@huawei.com> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18rethook: fix a potential memleak in rethook_alloc()Yi Yang1-1/+3
In rethook_alloc(), the variable rh is not freed or passed out if handler is NULL, which could lead to a memleak, fix it. Link: https://lore.kernel.org/all/20221110104438.88099-1-yiyang13@huawei.com/ [Masami: Add "rethook:" tag to the title.] Fixes: 54ecbe6f1ed5 ("rethook: Add a generic return hook") Cc: stable@vger.kernel.org Signed-off-by: Yi Yang <yiyang13@huawei.com> Acke-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2022-11-18tracing/eprobe: Fix memory leak of filter stringRafael Mendonca1-0/+1
The filter string doesn't get freed when a dynamic event is deleted. If a filter is set, then memory is leaked: root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \ sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events root@localhost:/sys/kernel/tracing# echo "-:egroup/stat_runtime_4core" >> dynamic_events root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak [ 224.416373] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak) root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff88810156f1b8 (size 8): comm "bash", pid 224, jiffies 4294935612 (age 55.800s) hex dump (first 8 bytes): 63 70 75 20 3c 20 34 00 cpu < 4. backtrace: [<000000009f880725>] __kmem_cache_alloc_node+0x18e/0x720 [<0000000042492946>] __kmalloc+0x57/0x240 [<0000000034ea7995>] __trace_eprobe_create+0x1214/0x1d30 [<00000000d70ef730>] trace_probe_create+0xf6/0x110 [<00000000915c7b16>] eprobe_dyn_event_create+0x21/0x30 [<000000000d894386>] create_dyn_event+0xf3/0x1a0 [<00000000e9af57d5>] trace_parse_run_command+0x1a9/0x2e0 [<0000000080777f18>] dyn_event_write+0x39/0x50 [<0000000089f0ec73>] vfs_write+0x311/0xe50 [<000000003da1bdda>] ksys_write+0x158/0x2a0 [<00000000bb1e616e>] __x64_sys_write+0x7c/0xc0 [<00000000e8aef1f7>] do_syscall_64+0x60/0x90 [<00000000fe7fe8ba>] entry_SYSCALL_64_after_hwframe+0x63/0xcd Additionally, in __trace_eprobe_create() function, if an error occurs after the call to trace_eprobe_parse_filter(), which allocates the filter string, then memory is also leaked. That can be reproduced by creating the same event probe twice: root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \ sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events root@localhost:/sys/kernel/tracing# echo 'e:egroup/stat_runtime_4core \ sched/sched_stat_runtime runtime=$runtime:u32 if cpu < 4' >> dynamic_events -bash: echo: write error: File exists root@localhost:/sys/kernel/tracing# echo scan > /sys/kernel/debug/kmemleak [ 207.871584] kmemleak: 1 new suspected memory leaks (see /sys/kernel/debug/kmemleak) root@localhost:/sys/kernel/tracing# cat /sys/kernel/debug/kmemleak unreferenced object 0xffff8881020d17a8 (size 8): comm "bash", pid 223, jiffies 4294938308 (age 31.000s) hex dump (first 8 bytes): 63 70 75 20 3c 20