aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-02-06bpf: Prepare for bpf_selem_unlink_nofail()Amery Hung2-37/+37
The next patch will introduce bpf_selem_unlink_nofail() to handle rqspinlock errors. bpf_selem_unlink_nofail() will allow an selem to be partially unlinked from map or local storage. Save memory allocation method in selem so that later an selem can be correctly freed even when SDATA(selem)->smap is init to NULL. In addition, keep track of memory charge to the owner in local storage so that later bpf_selem_unlink_nofail() can return the correct memory charge to the owner. Updating local_storage->mem_charge is protected by local_storage->lock. Finally, extract miscellaneous tasks performed when unlinking an selem from local_storage into bpf_selem_unlink_storage_nolock_misc(). It will be reused by bpf_selem_unlink_nofail(). This patch also takes the chance to remove local_storage->smap, which is no longer used since commit f484f4a3e058 ("bpf: Replace bpf memory allocator with kmalloc_nolock() in local storage"). Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-10-ameryhung@gmail.com
2026-02-06bpf: Remove unused percpu counter from bpf_local_storage_map_freeAmery Hung6-12/+6
Percpu locks have been removed from cgroup and task local storage. Now that all local storage no longer use percpu variables as locks preventing recursion, there is no need to pass them to bpf_local_storage_map_free(). Remove the argument from the function. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-9-ameryhung@gmail.com
2026-02-06bpf: Remove cgroup local storage percpu counterAmery Hung1-51/+8
The percpu counter in cgroup local storage is no longer needed as the underlying bpf_local_storage can now handle deadlock with the help of rqspinlock. Remove the percpu counter and related migrate_{disable, enable}. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-8-ameryhung@gmail.com
2026-02-06bpf: Remove task local storage percpu counterAmery Hung2-136/+18
The percpu counter in task local storage is no longer needed as the underlying bpf_local_storage can now handle deadlock with the help of rqspinlock. Remove the percpu counter and related migrate_{disable, enable}. Since the percpu counter is removed, merge back bpf_task_storage_get() and bpf_task_storage_get_recur(). This will allow the bpf syscalls and helpers to run concurrently on the same CPU, removing the spurious -EBUSY error. bpf_task_storage_get(..., F_CREATE) will now always succeed with enough free memory unless being called recursively. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-7-ameryhung@gmail.com
2026-02-06bpf: Change local_storage->lock and b->lock to rqspinlockAmery Hung2-22/+47
Change bpf_local_storage::lock and bpf_local_storage_map_bucket::lock from raw_spin_lock to rqspinlock. Finally, propagate errors from raw_res_spin_lock_irqsave() to syscall return or BPF helper return. In bpf_local_storage_destroy(), ignore return from raw_res_spin_lock_irqsave() for now. A later patch will correctly handle errors correctly in bpf_local_storage_destroy() so that it can unlink selems even when failing to acquire locks. For __bpf_local_storage_map_cache(), instead of handling the error, skip updating the cache. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-6-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_unlink to failableAmery Hung6-49/+39
To prepare changing both bpf_local_storage_map_bucket::lock and bpf_local_storage::lock to rqspinlock, convert bpf_selem_unlink() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Open code bpf_selem_unlink_storage() in the only caller, bpf_selem_unlink(), since unlink_map and unlink_storage must be done together after all the necessary locks are acquired. For bpf_local_storage_map_free(), ignore the return from bpf_selem_unlink() for now. A later patch will allow it to unlink selems even when failing to acquire locks. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-5-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_link_map to failableAmery Hung3-7/+16
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_link_map() to failable. It still always succeeds and returns 0 until the change happens. No functional change. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-4-ameryhung@gmail.com
2026-02-06bpf: Convert bpf_selem_unlink_map to failableAmery Hung1-18/+39
To prepare for changing bpf_local_storage_map_bucket::lock to rqspinlock, convert bpf_selem_unlink_map() to failable. It still always succeeds and returns 0 for now. Since some operations updating local storage cannot fail in the middle, open-code bpf_selem_unlink_map() to take the b->lock before the operation. There are two such locations: - bpf_local_storage_alloc() The first selem will be unlinked from smap if cmpxchg owner_storage_ptr fails, which should not fail. Therefore, hold b->lock when linking until allocation complete. Helpers that assume b->lock is held by callers are introduced: bpf_selem_link_map_nolock() and bpf_selem_unlink_map_nolock(). - bpf_local_storage_update() The three step update process: link_map(new_selem), link_storage(new_selem), and unlink_map(old_selem) should not fail in the middle. In bpf_selem_unlink(), bpf_selem_unlink_map() and bpf_selem_unlink_storage() should either all succeed or fail as a whole instead of failing in the middle. So, return if unlink_map() failed. Remove the selem_linked_to_map_lockless() check as an selem in the common paths (not bpf_local_storage_map_free() or bpf_local_storage_destroy()), will be unlinked under b->lock and local_storage->lock and therefore no other threads can unlink the selem from map at the same time. In bpf_local_storage_destroy(), ignore the return of bpf_selem_unlink_map() for now. A later patch will allow bpf_local_storage_destroy() to unlink selems even when failing to acquire locks. Note that while this patch removes all callers of selem_linked_to_map(), a later patch that introduces bpf_selem_unlink_nofail() will use it again. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-3-ameryhung@gmail.com
2026-02-06bpf: Select bpf_local_storage_map_bucket based on bpf_local_storageAmery Hung3-7/+13
A later bpf_local_storage refactor will acquire all locks before performing any update. To simplified the number of locks needed to take in bpf_local_storage_map_update(), determine the bucket based on the local_storage an selem belongs to instead of the selem pointer. Currently, when a new selem needs to be created to replace the old selem in bpf_local_storage_map_update(), locks of both buckets need to be acquired to prevent racing. This can be simplified if the two selem belongs to the same bucket so that only one bucket needs to be locked. Therefore, instead of hashing selem, hashing the local_storage pointer the selem belongs. Performance wise, this is slightly better as update now requires locking one bucket. It should not change the level of contention on one bucket as the pointers to local storages of selems in a map are just as unique as pointers to selems. Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Amery Hung <ameryhung@gmail.com> Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org> Link: https://patch.msgid.link/20260205222916.1788211-2-ameryhung@gmail.com
2026-02-06PCI/AER: Clear stale errors on reporting agents upon probeLukas Wunner1-1/+25
Correctable and Uncorrectable Error Status Registers on reporting agents are cleared upon PCI device enumeration in pci_aer_init() to flush past events. They're cleared again when an error is handled by the AER driver. If an agent reports a new error after pci_aer_init() and before the AER driver has probed on the corresponding Root Port or Root Complex Event Collector, that error is not handled by the AER driver: It clears the Root Error Status Register on probe, but neglects to re-clear the Correctable and Uncorrectable Error Status Registers on reporting agents. The error will eventually be reported when another error occurs. Which is irritating because to an end user it appears as if the earlier error has just happened. Amend the AER driver to clear stale errors on reporting agents upon probe. Skip reporting agents which have not invoked pci_aer_init() yet to avoid using an uninitialized pdev->aer_cap. They're recognizable by the error bits in the Device Control register still being clear. Reporting agents may execute pci_aer_init() after the AER driver has probed, particularly when devices are hotplugged or removed/rescanned via sysfs. For this reason, it continues to be necessary that pci_aer_init() clears Correctable and Uncorrectable Error Status Registers. Reported-by: Lucas Van <lucas.van@intel.com> # off-list Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Lucas Van <lucas.van@intel.com> Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Link: https://patch.msgid.link/3011c2ed30c11f858e35e29939add754adea7478.1769332702.git.lukas@wunner.de
2026-02-06PCI: Don't claim disabled bridge windowsIlpo Järvinen1-0/+2
The commit 8278c6914306 ("PCI: Preserve bridge window resource type flags") changed bridge window resource behavior such that flags are no longer zero if the bridge window is not valid or is disabled (mainly to preserve the type flags for later use). If a bridge window has its limit smaller than base address, pci_read_bridge_*() sets both IORESOURCE_UNSET and IORESOURCE_DISABLED to indicate the bridge window exists but is not valid with the current base and limit configuration. The code in pci_claim_bridge_resources() still depends on the old behavior of checking validity of the bridge window solely based on !r->flags, whereas after 8278c6914306, also IORESOURCE_DISABLED may indicate bridge window addresses are not valid. While pci_claim_resource() does check IORESOURCE_UNSET, pci_claim_bridge_resource() attempts to clip the resource if pci_claim_resource() fails, which is not correct for bridge window resources that are not valid. As pci_bus_clip_resource() performs clipping regardless of flags and then clears IORESOURCE_UNSET, it should not be called unless the resource is valid. The problem is visible in this log: pci 0000:20:00.0: PCI bridge to [bus 21] pci 0000:20:00.0: bridge window [io size 0x0000 disabled]: can't claim; no address assigned pci 0000:20:00.0: [io 0x0000-0xffffffffffffffff disabled] clipped to [io 0x0000-0xffff disabled] Add IORESOURCE_DISABLED check in pci_claim_bridge_resources() to only claim bridge windows that appear to have a valid configuration. Fixes: 8278c6914306 ("PCI: Preserve bridge window resource type flags") Reported-by: Sizhe Liu <liusizhe5@huawei.com> Link: https://lore.kernel.org/all/20260203023545.2753811-1-liusizhe5@huawei.com Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/4d9228d6-a230-6ddf-e300-fbf42d523863@linux.intel.com
2026-02-06perf build: Reduce pmu-events related copying and mkdirsIan Rogers1-68/+149
When building to an output directory the previous code would remove files and then copy the source files over. Each source file copy would have a rule to make its directory. All JSON for every architecture was considered a source file. This led to unnecessary copying as a file would be deleted and then the same file copied again, unnecessary directory making, and copying of files not used in the build. A side-effect would be a lot of build messages. This change makes it so that all computed output files are created and then compared to all files in the OUTPUT directory. By filtering out the files that would be copied, unnecessary files can be determined and then deleted - note, this is a phony target which would remake the pmu-events.c if always depended upon, and so the dependency is conditional on there being files to remove. This has some overhead as the $(OUTPUT)/pmu-events is "find" over rather than just "rm -fr", but the savings from unnecessary copying, etc. should make up for this new make overhead. The copy target just does copying but has a dependency on the directory it needs being built, avoiding repetitive mkdirs. The source files for copying only consider the JEVENTS_ARCH unless the JEVENTS_ARCH is all. The metric JSON is only generated if appropriate, rather than always being generated and jevents.py deciding whether or not to use the files. The mypy and pylint targets are fixed as variable names had changed but the rules not updated. The line count of a build with "make -C tools/perf O=/tmp/perf clean all" prior to this change was 2181 lines, after this change it is 1596 lines. This is a reduction of 585 lines or about 27%. The generated pmu-events.c for JEVENTS_ARCH "x86" and "all" were validated as being identical after this change. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06vfio/fsl-mc: add myself as maintainerIoana Ciornei3-7/+3
Add myself as maintainer of the vfio/fsl-mc driver. The driver is still highly in use on Layerscape DPAA2 SoCs. Signed-off-by: Ioana Ciornei <ioana.ciornei@nxp.com> Link: https://lore.kernel.org/r/20260204100913.3197966-1-ioana.ciornei@nxp.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-02-06vfio: selftests: only build tests on arm64 and x86_64Ted Logan1-0/+9
Only build vfio self-tests on arm64 and x86_64; these are the only architectures where the vfio self-tests are run. Addresses compiler warnings for format and conversions on i386. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601211830.aBEjmEFD-lkp@intel.com/ Signed-off-by: Ted Logan <tedlogan@fb.com> Reviewed-by: David Matlack <dmatlack@google.com> Link: https://lore.kernel.org/r/20260202-vfio-selftest-only-64bit-v2-1-9c3ebb37f0f4@fb.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2026-02-06PCI: rzg3s-host: Fix device node reference leak in rzg3s_pcie_host_parse_port()Felix Gu1-1/+2
In rzg3s_pcie_host_parse_port(), of_get_next_child() returns a device node with an incremented reference count that must be released with of_node_put(). The current code fails to call of_node_put() which causes a reference leak. Use the __free(device_node) attribute to ensure automatic cleanup when the variable goes out of scope. Fixes: 7ef502fb35b2 ("PCI: Add Renesas RZ/G3S host controller driver") Signed-off-by: Felix Gu <ustc.gu@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Reviewed-by: Claudiu Beznea <claudiu.beznea.uj@bp.renesas.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20260204-rzg3s-v1-1-142bc81c3312@gmail.com
2026-02-06perf lock contention: fix segfault in `lock contention -b/--use-bpf`Tycho Andersen (AMD)1-0/+3
When run on a kernel without BTF info, perf crashes: libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find valid kernel BTF Program received signal SIGSEGV, Segmentation fault. 0x00005555556915b7 in btf.type_cnt () (gdb) bt #0 0x00005555556915b7 in btf.type_cnt () #1 0x0000555555691fbc in btf_find_by_name_kind () #2 0x00005555556920d0 in btf.find_by_name_kind () #3 0x00005555558a1b7c in init_numa_data (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:125 #4 0x00005555558a264b in lock_contention_prepare (con=0x7fffffffd0a0) at util/bpf_lock_contention.c:313 #5 0x0000555555620702 in __cmd_contention (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2084 #6 0x0000555555622c8d in cmd_lock (argc=0, argv=0x7fffffffea10) at builtin-lock.c:2755 #7 0x0000555555651451 in run_builtin (p=0x555556104f00 <commands+576>, argc=3, argv=0x7fffffffea10) at perf.c:349 #8 0x00005555556516ed in handle_internal_command (argc=3, argv=0x7fffffffea10) at perf.c:401 #9 0x000055555565184e in run_argv (argcp=0x7fffffffe7fc, argv=0x7fffffffe7f0) at perf.c:445 #10 0x0000555555651b9f in main (argc=3, argv=0x7fffffffea10) at perf.c:553 Check if btf loading failed, and don't do anything with it in init_numa_data(). This leads to the following error message, instead of just a crash: libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find valid kernel BTF libbpf: kernel BTF is missing at '/sys/kernel/btf/vmlinux', was CONFIG_DEBUG_INFO_BTF enabled? libbpf: failed to find valid kernel BTF libbpf: Error loading vmlinux BTF: -ESRCH libbpf: failed to load BPF skeleton 'lock_contention_bpf': -ESRCH Failed to load lock-contention BPF skeleton lock contention BPF setup failed Signed-off-by: Tycho Andersen (AMD) <tycho@kernel.org> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Chun-Tse Shao <ctshao@google.com> Cc: Ian Rogers <irogers@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06perf sort: Replace static cacheline size with sysconf cacheline sizeRicky Ringler1-5/+12
Testing: - Built perf - Executed perf mem record and report Committer notes: This addresses a TODO and improves the situation where record and report/c2c are performed on the same machine or in machines with the same cacheline size, but the proper way is to store the cacheline size in the perf.data header at 'record' time and then use it at post processing time. Signed-off-by: Ricky Ringler <ricky.ringler@proton.me> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20260129004223.26799-1-ricky.ringler@proton.me Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06perf test: Fix test case Leader sampling on s390Thomas Richter1-1/+15
The subtest 'Leader sampling' some time fails on s390. - for z/VM guest: Disable the test for z/VM guest. There is no CPU Measurement facility to run the test successfully. - for LPAR: Use correct event names. A detailed analysis follows here: Now to the debugging and investigation: 1. With command perf record -e '{cycles,cycles}:S' -- .... the first cycles event starts sampling. On s390 this sets up sampling with a frequency of 4000 Hz. This translates to hardware sample rate of 1377000 instructions per micro-second to meet a frequency of 4000 HZ. 2. With first event cycles now sampling into a hardware buffer, an interrupt is triggered each time a sampling buffer gets full. The interrupt handler is then invoked and debug output shows the processing of samples. The size of one hardware sample is 32 bytes. With an interrupt triggered when the hardware buffer page of 4KB gets full, the interrupt handler processes 128 samples. (This is taken from s390 specific fast debug data gathering) 2025-11-07 14:35:51.977248 000003ffe013cbfa \ perf_event_count_update event->count 0x0 count 0x1502e8 2025-11-07 14:35:51.977248 000003ffe013cbfa \ perf_event_count_update event->count 0x1502e8 count 0x1502e8 2025-11-07 14:35:51.977248 000003ffe013cbfa \ perf_event_count_update event->count 0x2a05d0 count 0x1502e8 2025-11-07 14:35:51.977252 000003ffe013cbfa \ perf_event_count_update event->count 0x3f08b8 count 0x1502e8 2025-11-07 14:35:51.977252 000003ffe013cbfa \ perf_event_count_update event->count 0x540ba0 count 0x1502e8 2025-11-07 14:35:51.977253 000003ffe013cbfa \ perf_event_count_update event->count 0x690e88 count 0x1502e8 2025-11-07 14:35:51.977254 000003ffe013cbfa \ perf_event_count_update event->count 0x7e1170 count 0x1502e8 2025-11-07 14:35:51.977254 000003ffe013cbfa \ perf_event_count_update event->count 0x931458 count 0x1502e8 2025-11-07 14:35:51.977254 000003ffe013cbfa \ perf_event_count_update event->count 0xa81740 count 0x1502e8 3. The value is constantly increasing by the number of instructions executed to generate a sample entry. This is the first line of the pairs of lines. count 0x1502e8 --> 1377000 # perf script | grep 1377000 | wc -l 214 # perf script | wc -l 428 # That is 428 lines in total, and half of the lines contain value 1377000. 4. The second event cycles is opened against the counting PMU, which is an independent PMU and is not interrupt driven. Once enabled it runs in the background and keeps running, incrementing silently about 400+ counters. The counter values are read via assembly instructions. This second counter PMU's read call back function is called when the interrupt handler of the sampling facility processes each sample. The function call sequence is: perf_event_overflow() +--> __perf_event_overflow() +--> __perf_event_output() +--> perf_output_sample() +--> perf_output_read() +--> perf_output_read_group() for_each_sibling_event(sub, leader) { values[n++] = perf_event_count(sub, self); printk("%s sub %p values %#lx\n", __func__, sub, values[n-1]); } The last function perf_event_count() is invoked on the second event cylces *on* the counting PMU. An added printk statement shows the following lines in the dmesg output: # dmesg|grep perf_output_read_group |head -10 [ 332.368620] perf_output_read_group sub 00000000d80b7c1f values 0x3a80917 (1) [ 332.368624] perf_output_read_group sub 00000000d80b7c1f values 0x3a86c7f (2) [ 332.368627] perf_output_read_group sub 00000000d80b7c1f values 0x3a89c15 (3) [ 332.368629] perf_output_read_group sub 00000000d80b7c1f values 0x3a8c895 (4) [ 332.368631] perf_output_read_group sub 00000000d80b7c1f values 0x3a8f569 (5) [ 332.368633] perf_output_read_group sub 00000000d80b7c1f values 0x3a9204b [ 332.368635] perf_output_read_group sub 00000000d80b7c1f values 0x3a94790 [ 332.368637] perf_output_read_group sub 00000000d80b7c1f values 0x3a9704b [ 332.368638] perf_output_read_group sub 00000000d80b7c1f values 0x3a99888 # This correlates with the output of # perf report -D | grep 'id 00000000000000'|head -10 ..... id 0000000000000006, value 00000000001502e8, lost 0 ..... id 000000000000000e, value 0000000003a80917, lost 0 --> line (1) above ..... id 0000000000000006, value 00000000002a05d0, lost 0 ..... id 000000000000000e, value 0000000003a86c7f, lost 0 --> line (2) above ..... id 0000000000000006, value 00000000003f08b8, lost 0 ..... id 000000000000000e, value 0000000003a89c15, lost 0 --> line (3) above ..... id 0000000000000006, value 0000000000540ba0, lost 0 ..... id 000000000000000e, value 0000000003a8c895, lost 0 --> line (4) above ..... id 0000000000000006, value 0000000000690e88, lost 0 ..... id 000000000000000e, value 0000000003a8f569, lost 0 --> line (5) above Summary: - Above command starts the CPU sampling facility, with runs interrupt driven when a 4KB page is full. An interrupt processes the 128 samples and calls eventually perf_output_read_group() for each sample to save it in the event's ring buffer. - At that time the CPU counting facility is invoked to read the value of the event cycles. This value is saved as the second value in the sample_read structure. - The first and odd lines in the perf script output displays the period value between 2 samples being created by hardware. It is the number of instructions executes before the hardware writes a sample. - The second and even lines in the perf script output displays the number of CPU cycles needed to process each sample and save it in the event's ring buffer. These 2 different values can never be identical on s390. Since event leader sampling is not possible on s390 the perf tool will return EOPNOTSUPP soon. Perpare the test case for that. Suggested-by: James Clark <james.clark@linaro.org> Reviewed-by: Jan Polensky <japo@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Tested-by: Jan Polensky <japo@linux.ibm.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Sumanth Korikkar <sumanthk@linux.ibm.com> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06perf annotate: Fix register usage in data type profilingNamhyung Kim1-31/+30
On data type profiling, it tried to match register name with a partial string. For example, it allowed to match with "%rbp)" or "%rdi,8)". But with recent change in the area, it doesn't match anymore and break the data type profiling. Let's pass the correct register name by removing the unwanted part. Add arch__dwarf_regnum() to handle it in a single place. Closes: 7d3n23li6drroxrdlpxn7ixehdeszkjdftah3zyngjl2qs22ef@yelcjv53v42o Reported-by: Dmitry Dolgov <9erthalion6@gmail.com> Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Zecheng Li <zli94@ncsu.edu> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06perf stat: Ensure metrics are displayed even with failed eventsChun-Tse Shao3-40/+29
Currently, `perf stat` skips or hides metrics when the underlying hardware events cannot be counted (e.g., due to insufficient permissions or unsupported events). In `--metric-only` mode, this often results in missing columns or blank spaces, making the output difficult to parse. Modify the logic to ensure metrics are consistently displayed by propagating NAN (Not a Number) through the expression evaluator. Specifically: 1. Update `prepare_metric()` in stat-shadow.c to treat uncounted events (where `run == 0`) as NAN. This leverages the existing math in expr.y to propagate NAN through metric expressions. 2. Remove the early return in the display logic's `printout()` function that was previously skipping metrics in `--metric-only` mode for failed events. l 3. Simplify `perf_stat__skip_metric_event()` to no longer depend on event runtime. Tested: 1. `perf all metrics test` did not crash while paranoid is 2. 2. Multiple combinations with `CPUs_utilized` while paranoid is 2. $ ./perf stat -M CPUs_utilized -a -- sleep 1 Performance counter stats for 'system wide': <not supported> msec cpu-clock:u # nan CPUs CPUs_utilized 1,006,356,120 duration_time 1.004375550 seconds time elapsed $ ./perf stat -M CPUs_utilized -a -j -- sleep 1 {"counter-value" : "<not supported>", "unit" : "msec", "event" : "cpu-clock:u", "event-runtime" : 0, "pcnt-running" : 100.00, "metric-value" : "nan", "metric-unit" : "CPUs CPUs_utilized"} {"counter-value" : "1006642462.000000", "unit" : "", "event" : "duration_time", "event-runtime" : 1, "pcnt-running" : 100.00} $ ./perf stat -M CPUs_utilized -a --metric-only -- sleep 1 Performance counter stats for 'system wide': CPUs CPUs_utilized nan 1.004424652 seconds time elapsed $ ./perf stat -M CPUs_utilized -a --metric-only -j -- sleep 1 {"CPUs CPUs_utilized" : "none"} Reviewed-by: Ian Rogers <irogers@google.com> Signed-off-by: Chun-Tse Shao <ctshao@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06perf test addr2line_inlines: Ensure inline information shows on LBR leavesIan Rogers1-0/+28
Expand the addr2line inline function testing to also run for an LBR callchain, skipping if LBR support isn't present. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06perf callchain lbr: Make the leaf IP that of the sampleIan Rogers1-4/+16
The current IP of a leaf function when reported from a perf record with "--call-graph lbr" is the "to" field of the LBR branch stack record. The sample for the event being recorded may be further into the function and there may be inlining information associated with it. Rather than use the branch stack "to" field in this case switch to the callchain appending the sample->ip and thereby allowing the inline information to show. Before this change: ``` $ perf record --call-graph lbr perf test -w inlineloop ... $ perf script --fields +srcline ... perf-inlineloop 467586 4649.344493: 950905 cpu_core/cycles/P: 55dfda2829c0 parent+0x0 (perf) inlineloop.c:31 55dfda282a96 inlineloop+0x86 (perf) inlineloop.c:47 55dfda236420 run_workload+0x59 (perf) builtin-test.c:715 55dfda236b03 cmd_test+0x413 (perf) builtin-test.c:825 ... ``` After this change: ``` $ perf record --call-graph lbr perf test -w inlineloop ... $ perf script --fields +srcline ... perf-inlineloop 529703 11878.680815: 950905 cpu_core/cycles/P: 555ce86be9e6 leaf+0x26 inlineloop.c:20 (inlined) 555ce86be9e6 middle+0x26 inlineloop.c:27 (inlined) 555ce86be9e6 parent+0x26 (perf) inlineloop.c:32 555ce86bea96 inlineloop+0x86 (perf) inlineloop.c:47 555ce8672420 run_workload+0x59 (perf) builtin-test.c:715 555ce8672b03 cmd_test+0x413 (perf) builtin-test.c:825 ... ``` Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Dmitriy Vyukov <dvyukov@google.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: James Clark <james.clark@linaro.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Krzysztof Łopatowski <krzysztof.m.lopatowski@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Weilin Wang <weilin.wang@intel.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06Merge tag 'mm-hotfixes-stable-2026-02-06-12-37' of ↵Linus Torvalds4-49/+80
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "A couple of late-breaking MM fixes. One against a new-in-this-cycle patch and the other addresses a locking issue which has been there for over a year" * tag 'mm-hotfixes-stable-2026-02-06-12-37' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm/memory-failure: reject unsupported non-folio compound page procfs: avoid fetching build ID while holding VMA lock
2026-02-06perf kvm stat: Fix build errorLeo Yan1-3/+3
Since commit ceea279f9376 ("perf kvm stat: Remove use of the arch directory"), a native build on Arm64 machine reports: util/kvm-stat-arch/kvm-stat-x86.c:7:10: fatal error: asm/svm.h: No such file or directory 7 | #include <asm/svm.h> | ^~~~~~~~~~~ compilation terminated. The build fails to find x86's asm headers when building for Arm64. Fix this by including asm headers with relative path instead. Fixes: ceea279f9376 ("perf kvm stat: Remove use of the arch directory") Signed-off-by: Leo Yan <leo.yan@arm.com> Link: https://lore.kernel.org/r/20260206-perf_fix_kvm_stat_error-v1-1-ad40115876be@arm.com Cc: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: James Clark <james.clark@linaro.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2026-02-06Merge tag 'trace-v6.19-rc7' of ↵Linus Torvalds3-24/+36
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fix from Steven Rostedt: - Fix event format field alignments for 32 bit architectures The fields in the event format files are used to parse the raw binary buffer data by applications. If they are incorrect, then the application produces garbage. On 32 bit architectures, the function graph 64bit calltime and rettime were off by 4bytes. That's because the actual fields are in a packed structure but the macros used by the ftrace events did not mark them as packed, and instead, gave them their natural alignment which made their offsets off by 4 bytes. There are macros to have a packed field within an embedded structure of an event, but there's no macro for normal fields within a packed structure of the event. The macro __field_packed() was used for the packed embedded structure field. Rename that to __field_desc_packed() (to match the non-packed embedded field macro __field_desc()), and make __field_packed() for fields that are in a packed event structure (which matches the unpacked __field() macro). Switch the calltime and rettime fields of the function graph event to use the new __field_packed() and this makes the offsets correct. * tag 'trace-v6.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing: Fix ftrace event field alignments
2026-02-06tracing/kprobes: Skip setup_boot_kprobe_events() when no cmdline eventYaxiong Tian1-0/+4
When the 'kprobe_event=' kernel command-line parameter is not provided, there is no need to execute setup_boot_kprobe_events(). This change optimizes the initialization function init_kprobe_trace() by skipping unnecessary work and effectively prevents potential blocking that could arise from contention on the event_mutex lock in subsequent operations. Link: https://patch.msgid.link/20260204015401.163748-1-tianyaxiong@kylinos.cn Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-06blktrace: Make init_blk_tracer() asynchronousYaxiong Tian1-1/+22
The init_blk_tracer() function causes significant boot delay as it waits for the trace_event_sem lock held by trace_event_update_all(). Specifically, its child function register_trace_event() requires this lock, which is occupied for an extended period during boot. To resolve this, the execution of primary init_blk_tracer() is moved to the trace_init_wq workqueue, allowing it to run asynchronously, and prevent blocking the main boot thread. Link: https://patch.msgid.link/20260204015353.163331-1-tianyaxiong@kylinos.cn Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-06tracing: Rename `eval_map_wq` and allow other parts of tracing use itYaxiong Tian2-9/+10
The eval_map_work_func() function, though queued in eval_map_wq, holds the trace_event_sem read-write lock for a long time during kernel boot. This causes blocking issues for other functions. Rename eval_map_wq to trace_init_wq and make it global, thereby allowing other parts of tracing to schedule work on this queue asynchronously and avoiding blockage of the main boot thread. Link: https://patch.msgid.link/20260204015344.162818-1-tianyaxiong@kylinos.cn Suggested-by: Steven Rostedt <rostedt@goodmis.org> Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org> Signed-off-by: Yaxiong Tian <tianyaxiong@kylinos.cn> Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2026-02-06rust: dma: add missing __rust_helper annotationsDirk Behme1-5/+5
The commit d8932355f8c5 ("rust: dma: add helpers for architectures without CONFIG_HAS_DMA") missed adding the __rust_helper annotations. Add them. Reported-by: Gary Guo <gary@garyguo.net> Closes: https://lore.kernel.org/rust-for-linux/DFW4F5OSDO7A.TBUOX6RCN8G7@garyguo.net/ Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Alice Ryhl <aliceryhl@google.com> Link: https://patch.msgid.link/20260126071738.1670967-1-dirk.behme@de.bosch.com [ Fix minor checkpatch.pl warning. - Danilo ] Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-02-06samples: rust: pci: Remove some additional `.as_ref()` for `dev_*` printDirk Behme1-3/+3
The commit 600de1c008b2 ("rust: pci: remove redundant `.as_ref()` for `dev_*` print") removed `.as_ref()` for `dev_*` prints. Nearly at the same time the commit e62e48adf76c ("sample: rust: pci: add tests for config space routines") was merged. Which missed this removal, then. Signed-off-by: Dirk Behme <dirk.behme@de.bosch.com> Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Link: https://patch.msgid.link/20260202064001.176787-1-dirk.behme@de.bosch.com Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-02-06Merge tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-clientLinus Torvalds7-27/+69
Pull ceph fixes from Ilya Dryomov: "One RBD and two CephFS fixes which address potential oopses. The RBD thing is more of a rare edge case that pops up in our CI, while the two CephFS scenarios are regressions that were reported by users and can be triggered trivially in normal operation. All marked for stable" * tag 'ceph-for-6.19-rc9' of https://github.com/ceph/ceph-client: ceph: fix NULL pointer dereference in ceph_mds_auth_match() ceph: fix oops due to invalid pointer for kfree() in parse_longname() rbd: check for EOD after exclusive lock is ensured to be held
2026-02-06Merge tag 'dma-mapping-6.19-2026-02-06' of ↵Linus Torvalds2-10/+25
git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux Pull dma-mapping fixes from Marek Szyprowski: "Two minor fixes for the DMA-mapping subsystem: - check for the rare case of the allocation failure of the global CMA pool (Shanker Donthineni) - avoid perf buffer overflow when tracing large scatter-gather lists (Deepanshu Kartikey)" * tag 'dma-mapping-6.19-2026-02-06' of git://git.kernel.org/pub/scm/linux/kernel/git/mszyprowski/linux: dma: contiguous: Check return value of dma_contiguous_reserve_area() tracing/dma: Cap dma_map_sg tracepoint arrays to prevent buffer overflow
2026-02-06Merge tag 'iommu-fix-v6.19-rc8' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux Pull iommu fix from Joerg Roedel: - Fix wrong definition of PASID_FLAG_PWSNP bit. This caused DMAR errors on Arrow Lake platforms. * tag 'iommu-fix-v6.19-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux: iommu/vt-d: Treat PAGE_SNOOP and PWSNP separately
2026-02-06Merge tag 'pmdomain-v6.19-rc3-3' of ↵Linus Torvalds4-8/+34
git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm Pull pmdomain fixes from Ulf Hansson: - imx: - Fix system wakeup support for imx8mp power domains - Fix potential out-of-range access for imx8m power domains - Fix the imx8mm gpu hang - qcom: Fix off-by-one error for highest state in rpmpd * tag 'pmdomain-v6.19-rc3-3' of git://git.kernel.org/pub/scm/linux/kernel/git/ulfh/linux-pm: pmdomain: imx8mp-blk-ctrl: Keep usb phy power domain on for system wakeup pmdomain: imx8mp-blk-ctrl: Keep gpc power domain on for system wakeup pmdomain: imx8m-blk-ctrl: fix out-of-range access of bc->domains pmdomain: imx: gpcv2: Fix the imx8mm gpu hang due to wrong adb400 reset pmdomain: qcom: rpmpd: fix off-by-one error in clamping to the highest state
2026-02-06Merge tag 'gpio-fixes-for-v6.19' of ↵Linus Torvalds2-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux Pull gpio fixes from Bartosz Golaszewski: - fix incorrect retval check in gpio-loongson-64bit - fix GPIO counting with ACPI * tag 'gpio-fixes-for-v6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/brgl/linux: gpio: loongson-64bit: Fix incorrect NULL check after devm_kcalloc() gpiolib: acpi: Fix gpio count with string references
2026-02-06Merge tag 'sound-6.19' of ↵Linus Torvalds18-56/+234
git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound Pull sound fixes from Takashi Iwai: "A collection of small fixes. It became a bit larger than wished, but all of them are device-specific small fixes, and it should be still fairly safe to take at the last minute. Included are a few quirks and fixes for Intel, AMD, HD-audio, and USB-audio, as well as a race fix in aloop driver and corrections of Cirrus firmware kunit test" * tag 'sound-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/tiwai/sound: ALSA: hda/realtek: Enable headset mic for Acer Nitro 5 ASoC: fsl_xcvr: fix missing lock in fsl_xcvr_mode_put() ASoC: dt-bindings: ti,tlv320aic3x: Add compatible string ti,tlv320aic23 ASoC: amd: fix memory leak in acp3x pdm dma ops ALSA: usb-audio: fix broken logic in snd_audigy2nx_led_update() ALSA: aloop: Fix racy access at PCM trigger ASoC: rt1320: fix intermittent no-sound issue ASoC: SOF: Intel: use hdev->info.link_mask directly firmware: cs_dsp: rate-limit log messages in KUnit builds ASoC: amd: yc: Add quirk for HP 200 G2a 16 ASoC: cs42l43: Correct handling of 3-pole jack load detection ASoC: Intel: sof_es8336: Add DMI quirk for Huawei BOD-WXX9 ASoC: sof_sdw: Add a quirk for Lenovo laptop using sidecar amps with cs42l43
2026-02-06Merge tag 'slab-for-6.19-rc8-fix' of ↵Linus Torvalds1-1/+5
git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab Pull slab fix from Vlastimil Babka: "A stable fix for memory allocation profiling tag not being cleared when aborting an allocation due to memcg charge failure (Hao Ge)" * tag 'slab-for-6.19-rc8-fix' of git://git.kernel.org/pub/scm/linux/kernel/git/vbabka/slab: mm/slab: Add alloc_tagging_slab_free_hook for memcg_alloc_abort_single
2026-02-06Merge branch 'fix-some-corner-cases-in-xskxceiver'Alexei Starovoitov1-1/+3
Larysa Zaremba says: ==================== Fix some corner cases in xskxceiver While working on XDP and AF_XDP support for ixgbevf driver, I came across two distinct problems that caused tests to fail when they shouldn't have. ==================== Link: https://patch.msgid.link/20260203155103.2305816-1-larysa.zaremba@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-06selftests/xsk: fix number of Tx frags in invalid packetLarysa Zaremba1-1/+1
The issue occurs in TOO_MANY_FRAGS test case when xdp_zc_max_segs is set to an odd number. TOO_MANY_FRAGS test case contains an invalid packet consisting of (xdp_zc_max_segs) frags. Every frag, even the last one has XDP_PKT_CONTD flag set. This packet is expected to be dropped. After that, there is a valid linear packet, which is expected to be received back. Once (xdp_zc_max_segs) is an odd number, the last packet cannot be received, if packet forwarding between Rx and Tx interfaces relies on the ethernet header, e.g. checks for ETH_P_LOOPBACK. Packet is malformed, if all traffic is looped. Turns out, sending function processes multiple invalid frags as if they were in 2-frag packets. So once the invalid mbuf packet contains an odd number of those, the valid packet after gets paired with the previous invalid descriptor, and hence does not get an ethernet header generated, so it is either dropped or malformed. Make invalid packets in verbatim mode always have only a single frag. For such packets, number of frags is otherwise meaningless, as descriptor flags are pre-configured in verbatim mode and packet data is not generated for invalid descriptors. Fixes: 697604492b64 ("selftests/xsk: add invalid descriptor test for multi-buffer") Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com> Link: https://lore.kernel.org/r/202