aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2025-11-01tracing: fprobe: use rhltable for fprobe_ip_tableMenglong Dong1-1/+2
For now, all the kernel functions who are hooked by the fprobe will be added to the hash table "fprobe_ip_table". The key of it is the function address, and the value of it is "struct fprobe_hlist_node". The budget of the hash table is FPROBE_IP_TABLE_SIZE, which is 256. And this means the overhead of the hash table lookup will grow linearly if the count of the functions in the fprobe more than 256. When we try to hook all the kernel functions, the overhead will be huge. Therefore, replace the hash table with rhltable to reduce the overhead. Link: https://lore.kernel.org/all/20250819031825.55653-1-dongml2@chinatelecom.cn/ Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
2025-10-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-2/+24
Cross-merge networking fixes after downstream PR (net-6.18-rc4). No conflicts, adjacent changes: drivers/net/ethernet/stmicro/stmmac/stmmac_main.c ded9813d17d3 ("net: stmmac: Consider Tx VLAN offload tag length for maxSDU") 26ab9830beab ("net: stmmac: replace has_xxxx with core_type") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-31can: convert generic HW timestamp ioctl to ndo_hwtstamp callbacksVadim Fedorenko1-1/+5
Can has generic implementation of ndo_eth_ioctl which implements only HW timestamping commands. Implement generic ndo_hwtstamp callbacks and use it in drivers instead of generic ioctl interface. Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Kory Maincent <kory.maincent@bootlin.com> Reviewed-by: Vincent Mailhol <mailhol@kernel.org> Link: https://patch.msgid.link/20251029231620.1135640-2-vadim.fedorenko@linux.dev Signed-off-by: Marc Kleine-Budde <mkl@pengutronix.de>
2025-10-31filemap: Add folio_next_pos()Matthew Wilcox (Oracle)1-0/+11
Replace the open-coded implementation in ocfs2 (which loses the top 32 bits on 32-bit architectures) with a helper in pagemap.h. Fixes: 35edec1d52c0 (ocfs2: update truncate handling of partial clusters) Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20251024170822.1427218-2-willy@infradead.org Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: ocfs2-devel@lists.linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-31Merge back system sleep material for 6.19Rafael J. Wysocki2-6/+11
2025-10-31libfs: allow to specify s_d_flagsChristian Brauner1-0/+1
Make it possible for pseudo filesystems to specify default dentry flags. Link: https://patch.msgid.link/20251029-work-namespace-nstree-listns-v4-1-2e6f823ebdc0@kernel.org Tested-by: syzbot@syzkaller.appspotmail.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-30freezer: Clarify that only cgroup1 freezer uses PM freezerTejun Heo1-4/+8
cgroup1 freezer piggybacks on the PM freezer, which inadvertently allowed userspace to produce uninterruptible tasks at will. To avoid the issue, cgroup2 freezer switched to a separate job control based mechanism. While this happened a long time ago, the code and comment haven't been updated making it confusing to people who aren't familiar with the history. Rename cgroup_freezing() to cgroup1_freezing() and update comments on top of freezing() and frozen() to clarify that cgroup2 freezer isn't covered by the PM freezer mechanism. Signed-off-by: Tejun Heo <tj@kernel.org> Suggested-by: Qu Wenruo <wqu@suse.com> Link: https://patch.msgid.link/aPZ3q6Hm865NicBC@slm.duckdns.org Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-10-30prandom: remove next_pseudo_random32Markus Theil1-6/+0
next_pseudo_random32 implements a LCG with known bad statistical properties and was only used in two pieces of testing code. With no remaining users now, remove it. Signed-off-by: Markus Theil <theil.markus@gmail.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2025-10-30sysctl: fix kernel-doc format warningRandy Dunlap1-7/+4
Describe the "type" struct member using '@type' and move it together with the rest of the doc for ctl_table_header to avoid a kernel-doc warning: Warning: include/linux/sysctl.h:178 Incorrect use of kernel-doc format: * enum type - Enumeration to differentiate between ctl target types Fixes: 2f2665c13af4 ("sysctl: replace child with an enumeration") Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Joel Granados <joel.granados@kernel.org>
2025-10-29byteorder: Add le64_to_cpu_array() and cpu_to_le64_array()Eric Biggers1-0/+16
Add le64_to_cpu_array() and cpu_to_le64_array(). These mirror the corresponding 32-bit functions. These will be used by the BLAKE2b code. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20251018043106.375964-6-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2025-10-29compiler_types: Introduce __nocfi_genericNathan Chancellor1-0/+6
There are two different ways that LLVM can expand kCFI operand bundles in LLVM IR: generically in the middle end or using an architecture specific sequence when lowering LLVM IR to machine code in the backend. The generic pass allows any architecture to take advantage of kCFI but the expansion of these bundles in the middle end can mess with optimizations that may turn indirect calls into direct calls when the call target is known at compile time, such as after inlining. Add __nocfi_generic, dependent on an architecture selecting CONFIG_ARCH_USES_CFI_GENERIC_LLVM_PASS, to disable kCFI bundle generation in functions where only the generic kCFI pass may cause problems. Link: https://github.com/ClangBuiltLinux/linux/issues/2124 Signed-off-by: Nathan Chancellor <nathan@kernel.org> Link: https://patch.msgid.link/20251025-idpf-fix-arm-kcfi-build-error-v1-1-ec57221153ae@kernel.org Signed-off-by: Kees Cook <kees@kernel.org>
2025-10-29scsi: firmware: xilinx: Add APIs for UFS PHY initializationAjay Neeli2-0/+39
- Add APIs for UFS PHY initialization. - Verify M-PHY TX-RX configuration readiness. - Confirm SRAM initialization and Set SRAM bypass. - Retrieve UFS calibration values. Signed-off-by: Ajay Neeli <ajay.neeli@amd.com> Acked-by: Senthil Nathan Thangaraj <senthilnathan.thangaraj@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251021113003.13650-4-ajay.neeli@amd.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29scsi: firmware: xilinx: Add support for secure read/write ioctl interfaceIzhar Ameer Shaikh1-0/+15
Add support for a generic ioctl read/write interface using which users can request firmware to perform read/write operations on a protected and secure address space. The functionality is introduced through the means of two new IOCTL IDs which extend the existing PM_IOCTL EEMI API: - IOCTL_READ_REG - IOCTL_MASK_WRITE_REG The caller only passes the node id of the given device and an offset. The base address is not exposed to the caller and internally retrieved by the firmware. Firmware will enforce an access policy on the incoming read/write request. Signed-off-by: Izhar Ameer Shaikh <izhar.ameer.shaikh@amd.com> Reviewed-by: Tanmay Shah <tanmay.shah@amd.com> Signed-off-by: Radhey Shyam Pandey <radhey.shyam.pandey@amd.com> Signed-off-by: Ajay Neeli <ajay.neeli@amd.com> Acked-by: Senthil Nathan Thangaraj <senthilnathan.thangaraj@amd.com> Acked-by: Michal Simek <michal.simek@amd.com> Acked-by: Bart Van Assche <bvanassche@acm.org> Link: https://patch.msgid.link/20251021113003.13650-3-ajay.neeli@amd.com Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2025-10-29net: phy: add iterator mdiobus_for_each_phyHeiner Kallweit1-1/+10
Add an iterator for all PHY's on a MII bus, and phy_find_next() as a prerequisite. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/cd112f15-401a-43d9-8525-9ff0965a68cd@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29ipv4: icmp: Add RFC 5837 supportIdo Schimmel1-0/+32
Add the ability to append the incoming IP interface information to ICMPv4 error messages in accordance with RFC 5837 and RFC 4884. This is required for more meaningful traceroute results in unnumbered networks. The feature is disabled by default and controlled via a new sysctl ("net.ipv4.icmp_errors_extension_mask") which accepts a bitmask of ICMP extensions to append to ICMP error messages. Currently, only a single value is supported, but the interface and the implementation should be able to support more extensions, if needed. Clone the skb and copy the relevant data portions before modifying the skb as the caller of __icmp_send() still owns the skb after the function returns. This should be fine since by default ICMP error messages are rate limited to 1000 per second and no more than 1 per second per specific host. Trim or pad the packet to 128 bytes before appending the ICMP extension structure in order to be compatible with legacy applications that assume that the ICMP extension structure always starts at this offset (the minimum length specified by RFC 4884). Reviewed-by: Petr Machata <petrm@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ido Schimmel <idosch@nvidia.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20251027082232.232571-2-idosch@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-29fs: Make wbc_to_tag() inline and use it in fs.Julian Sun1-0/+7
The logic in wbc_to_tag() is widely used in file systems, so modify this function to be inline and use it in file systems. This patch has only passed compilation tests, but it should be fine. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29writeback: allow the file system to override MIN_WRITEBACK_PAGESChristoph Hellwig2-0/+6
The relatively low minimal writeback size of 4MiB means that written back inodes on rotational media are switched a lot. Besides introducing additional seeks, this also can lead to extreme file fragmentation on zoned devices when a lot of files are cached relative to the available writeback bandwidth. Add a superblock field that allows the file system to override the default size. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251017034611.651385-3-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm: rename filemap_fdatawrite_range_kick to filemap_flush_rangeChristoph Hellwig1-3/+3
Rename filemap_fdatawrite_range_kick to filemap_flush_range because it is the ranged version of filemap_flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-11-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm: remove __filemap_fdatawrite_rangeChristoph Hellwig1-2/+0
Use filemap_fdatawrite_range and filemap_fdatawrite_range_kick instead of the low-level __filemap_fdatawrite_range that requires the caller to know the internals of the writeback_control structure and remove __filemap_fdatawrite_range now that it is trivial and only two callers would be left. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-10-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm: remove filemap_fdatawrite_wbcChristoph Hellwig1-2/+0
Replace filemap_fdatawrite_wbc, which exposes a writeback_control to the callers with a filemap_writeback helper that takes all the possible arguments and declares the writeback_control itself. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-9-hch@lst.de Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29mm,btrfs: add a filemap_flush_nr helperChristoph Hellwig1-0/+1
Abstract out the btrfs-specific behavior of kicking off I/O on a number of pages on an address_space into a well-defined helper. Note: there is no kerneldoc comment for the new function because it is not part of the public API. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20251024080431.324236-7-hch@lst.de Reviewed-by: David Hildenbrand <david@redhat.com> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29regmap: add flat cache with sparse validitySander Vanheule1-6/+11
The flat regcache will always assume the data in the cache is valid. Since the cache is preferred over hardware access, this may shadow the actual state of the device. Add a new containing cache structure with the flat data table and a bitmap indicating cache validity. REGCACHE_FLAT will still behave as before, as the validity is ignored. Define new cache type REGCACHE_FLAT_S: a flat cache with sparse validity. The sparse validity is used to determine if a hardware access should occur to initialize the cache on the fly, vs. at regmap init for REGCACHE_FLAT. Contrary to REGCACHE_FLAT, this allows us to implement regcache_ops.drop. Signed-off-by: Sander Vanheule <sander@svanheule.net> Link: https://patch.msgid.link/20251029081248.52607-2-sander@svanheule.net Signed-off-by: Mark Brown <broonie@kernel.org>
2025-10-29perf: Support deferred user unwindPeter Zijlstra3-13/+14
Add support for deferred userspace unwind to perf. Where perf currently relies on in-place stack unwinding; from NMI context and all that. This moves the userspace part of the unwind to right before the return-to-userspace. This has two distinct benefits, the biggest is that it moves the unwind to a faultable context. It becomes possible to fault in debug info (.eh_frame, SFrame etc.) that might not otherwise be readily available. And secondly, it de-duplicates the user callchain where multiple samples happen during the same kernel entry. To facilitate this the perf interface is extended with a new record type: PERF_RECORD_CALLCHAIN_DEFERRED and two new attribute flags: perf_event_attr::defer_callchain - to request the user unwind be deferred perf_event_attr::defer_output - to request PERF_RECORD_CALLCHAIN_DEFERRED records The existing PERF_RECORD_SAMPLE callchain section gets a new context type: PERF_CONTEXT_USER_DEFERRED After which will come a single entry, denoting the 'cookie' of the deferred callchain that should be attached here, matching the 'cookie' field of the above mentioned PERF_RECORD_CALLCHAIN_DEFERRED. The 'defer_callchain' flag is expected on all events with PERF_SAMPLE_CALLCHAIN. The 'defer_output' flag is expect on the event responsible for collecting side-band events (like mmap, comm etc.). Setting 'defer_output' on multiple events will get you duplicated PERF_RECORD_CALLCHAIN_DEFERRED records. Based on earlier patches by Josh and Steven. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251023150002.GR4067720@noisy.programming.kicks-ass.net
2025-10-29unwind_user/x86: Teach FP unwind about start of functionPeter Zijlstra1-0/+1
When userspace is interrupted at the start of a function, before we get a chance to complete the frame, unwind will miss one caller. X86 has a uprobe specific fixup for this, add bits to the generic unwinder to support this. Suggested-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251024145156.GM4068168@noisy.programming.kicks-ass.net
2025-10-29unwind: Implement compat fp unwindPeter Zijlstra1-0/+1
It is important to be able to unwind compat tasks too. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250924080119.613695709@infradead.org
2025-10-29unwind: Make unwind_task_info::unwind_mask consistentPeter Zijlstra2-3/+4
The unwind_task_info::unwind_mask was manipulated using a mixture of: regular store WRITE_ONCE() try_cmpxchg() set_bit() atomic_long_*() Clean up and make it consistently atomic_long_t. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20250924080119.384384486@infradead.org
2025-10-29unwind: Simplify unwind_reset_info()Peter Zijlstra1-14/+14
Invert the condition of the first if and make it an early exit to reduce an indent level for the rest fo the function. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20250924080118.777916262@infradead.org
2025-10-29unwind: Add required include filesPeter Zijlstra1-0/+2
To be self sufficient, the file needs to include linux/types.h. This provides things like u32/u64 and struct callback_head. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20250924080118.665787071@infradead.org
2025-10-29unwind: Shorten linesPeter Zijlstra1-4/+14
There are some exceptionally long lines that cause ugly wrapping. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Steven Rostedt (Google) <rostedt@goodmis.org> Link: https://patch.msgid.link/20250924080118.545274393@infradead.org
2025-10-29dma-mapping: remove unused map_page callbackLeon Romanovsky1-7/+0
After conversion of arch code to use physical address mapping, there are no users of .map_page() and .unmap_page() callbacks, so let's remove them. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-14-3bbfe3a25cdf@kernel.org
2025-10-29dma-mapping: remove unused mapping resource callbacksLeon Romanovsky1-6/+0
After ARM and XEN conversions to use physical addresses for the mapping, there are no in-kernel users for map_resource/unmap_resource callbacks, so remove them. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-6-3bbfe3a25cdf@kernel.org
2025-10-29dma-mapping: prepare dma_map_ops to conversion to physical addressLeon Romanovsky1-0/+7
Add new .map_phys() and .unmap_phys() callbacks to dma_map_ops as a preparation to replace .map_page() and .unmap_page() respectively. Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251015-remove-map-page-v5-1-3bbfe3a25cdf@kernel.org
2025-10-29dma-mapping: benchmark: Restore padding to ensure uABI remained consistentQinxin Xia1-0/+1
The padding field in the structure was previously reserved to maintain a stable interface for potential new fields, ensuring compatibility with user-space shared data structures. However,it was accidentally removed by tiantao in a prior commit, which may lead to incompatibility between user space and the kernel. This patch reinstates the padding to restore the original structure layout and preserve compatibility. Fixes: 8ddde07a3d28 ("dma-mapping: benchmark: extract a common header file for map_benchmark definition") Cc: stable@vger.kernel.org Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Reported-by: Barry Song <baohua@kernel.org> Closes: https://lore.kernel.org/lkml/CAGsJ_4waiZ2+NBJG+SCnbNk+nQ_ZF13_Q5FHJqZyxyJTcEop2A@mail.gmail.com/ Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251028120900.2265511-2-xiaqinxin@huawei.com
2025-10-29tools/dma: move dma_map_benchmark from selftests to tools/dmaQinxin Xia1-32/+0
dma_map_benchmark is a standalone developer tool rather than an automated selftest. It has no pass/fail criteria, expects manual invocation, and is built as a normal userspace binary. Move it to tools/dma/ and add a minimal Makefile. Suggested-by: Marek Szyprowski <m.szyprowski@samsung.com> Suggested-by: Barry Song <baohua@kernel.org> Signed-off-by: Qinxin Xia <xiaqinxin@huawei.com> Acked-by: Barry Song <baohua@kernel.org> Signed-off-by: Marek Szyprowski <m.szyprowski@samsung.com> Link: https://lore.kernel.org/r/20251028120900.2265511-3-xiaqinxin@huawei.com
2025-10-29Merge branch 'linus/master' into sched/core, to resolve conflictPeter Zijlstra17-16/+127
Conflicts: kernel/sched/ext.c Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-10-28net: rps: softnet_data reorg to make enqueue_to_backlog() fastEric Dumazet1-1/+10
enqueue_to_backlog() is showing up in kernel profiles on hosts with many cores, when RFS/RPS is used. The following softnet_data fields need to be updated: - input_queue_tail - input_pkt_queue (next, prev, qlen, lock) - backlog.state (if input_pkt_queue was empty) Unfortunately they are currenly using two cache lines: /* --- cacheline 3 boundary (192 bytes) --- */ call_single_data_t csd __attribute__((__aligned__(64))); /* 0xc0 0x20 */ struct softnet_data * rps_ipi_next; /* 0xe0 0x8 */ unsigned int cpu; /* 0xe8 0x4 */ unsigned int input_queue_tail; /* 0xec 0x4 */ struct sk_buff_head input_pkt_queue; /* 0xf0 0x18 */ /* --- cacheline 4 boundary (256 bytes) was 8 bytes ago --- */ struct napi_struct backlog __attribute__((__aligned__(8))); /* 0x108 0x1f0 */ Add one ____cacheline_aligned_in_smp to make sure they now are using a single cache line. Also, because napi_struct has written fields, make @state its first field. We want to make sure that cpus adding packets to sd->input_pkt_queue are not slowing down cpus processing their backlog because of false sharing. After this patch new layout is: /* --- cacheline 5 boundary (320 bytes) --- */ long int pad[3] __attribute__((__aligned__(64))); /* 0x140 0x18 */ unsigned int input_queue_tail; /* 0x158 0x4 */ /* XXX 4 bytes hole, try to pack */ struct sk_buff_head input_pkt_queue; /* 0x160 0x18 */ struct napi_struct backlog __attribute__((__aligned__(8))); /* 0x178 0x1f0 */ Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20251024091240.3292546-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-10-28tracing: Add trace_seq_pop() and seq_buf_pop()Steven Rostedt2-0/+30
In order to allow an interface to remove an added character from the trace_seq and seq_buf descriptors, add helper functions trace_seq_pop() and seq_buf_pop(). Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Takaya Saeki <takayas@google.com> Cc: Tom Zanussi <zanussi@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ian Rogers <irogers@google.com> Cc: Douglas Raillard <douglas.raillard@arm.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Link: https://lore.kernel.org/20251028231148.594898736@kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2025-10-29Add SDCA UMP/FDL supportMark Brown1-10/+11
Merge series from Charles Keepax <ckeepax@opensource.cirrus.com>: Next installment of the SDCA changes, hopefully the next series after this should be the full class driver. It is worth noting this series has a build dependency on a patch working its way through the PM/ACPI tree: commit ac46f5b6c661 ("ACPICA: Add SoundWire File Table (SWFT) signature") git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git But we can probably worry about that later, as normally there is a reasonable amount of review on these SDCA series'. This series broadly breaks down into 3 chunks, first there are several changes to remove the assumption that the struct device used for SDCA purposes represents the SoundWire slave. This is because the SDCA class driver will be made of an auxiliary driver for each SDCA Function, thus the SoundWire slave will be on the parent device for each individual driver. Then there are patches to add support for UMP/FDL. And then finally since the rest of the HID support is there and UMP was the last missing part required a small patch to add a function to allow reporting of HID events from SDCA devices.
2025-10-28fbcon: Set fb_display[i]->mode to NULL when the mode is releasedQuanmin Yan1-0/+2
Recently, we discovered the following issue through syzkaller: BUG: KASAN: slab-use-after-free in fb_mode_is_equal+0x285/0x2f0 Read of size 4 at addr ff11000001b3c69c by task syz.xxx ... Call Trace: <TASK> dump_stack_lvl+0xab/0xe0 print_address_description.constprop.0+0x2c/0x390 print_report+0xb9/0x280 kasan_report+0xb8/0xf0 fb_mode_is_equal+0x285/0x2f0 fbcon_mode_deleted+0x129/0x180 fb_set_var+0xe7f/0x11d0 do_fb_ioctl+0x6a0/0x750 fb_ioctl+0xe0/0x140 __x64_sys_ioctl+0x193/0x210 do_syscall_64+0x5f/0x9c0 entry_SYSCALL_64_after_hwframe+0x76/0x7e Based on experimentation and analysis, during framebuffer unregistration, only the memory of fb_info->modelist is freed, without setting the corresponding fb_display[i]->mode to NULL for the freed modes. This leads to UAF issues during subsequent accesses. Here's an example of reproduction steps: 1. With /dev/fb0 already registered in the system, load a kernel module to register a new device /dev/fb1; 2. Set fb1's mode to the global fb_display[] array (via FBIOPUT_CON2FBMAP); 3. Switch console from fb to VGA (to allow normal rmmod of the ko); 4. Unload the kernel module, at this point fb1's modelist is freed, leaving a wild pointer in fb_display[]; 5. Trigger the bug via system calls through fb0 attempting to delete a mode from fb0. Add a check in do_unregister_framebuffer(): if the mode to be freed exists in fb_display[], set the corresponding mode pointer to NULL. Signed-off-by: Quanmin Yan <yanquanmin1@huawei.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Helge Deller <deller@gmx.de> Cc: stable@vger.kernel.org
2025-10-28PCI: Enable host bridge emulation for PCI_DOMAINS_GENERIC platformsDan Williams1-0/+7
The ability to emulate a host bridge is useful not only for hardware PCI controllers like CONFIG_VMD, or virtual PCI controllers like CONFIG_PCI_HYPERV, but also for test and development scenarios like CONFIG_SAMPLES_DEVSEC [1]. One stumbling block for defining CONFIG_SAMPLES_DEVSEC, a sample implementation of a platform TSM for PCI Device Security, is the need to accommodate PCI_DOMAINS_GENERIC architectures alongside x86 [2]. In support of supplementing the existing CONFIG_PCI_BRIDGE_EMUL infrastructure for host bridges: * Introduce pci_bus_find_emul_domain_nr() as a common way to find a free PCI domain number whether that is to reuse the existing dynamic allocation code in the !ACPI case, or to assign an unused domain above the last ACPI segment. * Convert pci-hyperv to the new allocator so that the PCI core can unconditionally assume that bridge->domain_nr != PCI_DOMAIN_NR_NOT_SET is the dynamically allocated case. A follow on patch can also convert vmd to the new scheme. Currently vmd is limited to CONFIG_PCI_DOMAINS_GENERIC=n (x86) so, unlike pci-hyperv, it does not immediately conflict with this new pci_bus_find_emul_domain_nr() mechanism. Link: http://lore.kernel.org/174107249038.1288555.12362100502109498455.stgit@dwillia2-xfh.jf.intel.com [1] Reported-by: Suzuki K Poulose <suzuki.poulose@arm.com> Closes: http://lore.kernel.org/20250311144601.145736-3-suzuki.poulose@arm.com [2] Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Suzuki K Poulose <suzuki.poulose@arm.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Cc: Lorenzo Pieralisi <lpieralisi@kernel.org> Cc: Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: Rob Herring <robh@kernel.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Link: https://patch.msgid.link/20251024224622.1470555-2-dan.j.williams@intel.com
2025-10-28rcu: Add a small-width RCU watching counter debug optionValentin Schneider1-7/+37
A later commit will reduce the size of the RCU watching counter to free up some bits for another purpose. Paul suggested adding a config option to test the extreme case where the counter is reduced to its minimum usable width for rcutorture to poke at, so do that. Make it only configurable under RCU_EXPERT. While at it, add a comment to explain the layout of context_tracking->state. Link: http://lore.kernel.org/r/4c2cb573-168f-4806-b1d9-164e8276e66a@paulmck-laptop Suggested-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Valentin Schneider <vschneid@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2025-10-28regmap: irq: Correct documentation of wake_invert flagShawn Guo1-1/+1
Per commit 9442490a0286 ("regmap: irq: Support wake IRQ mask inversion") the wake_invert flag is to support enable register, so cleared bits are wake disabled. Fixes: 68622bdfefb9 ("regmap: irq: document mask/wake_invert flags") Cc: stable@vger.kernel.org Signed-off-by: Shawn Guo <shawnguo@kernel.org> Link: https://patch.msgid.link/20251024082344.2188895-1-shawnguo2@yeah.net Signed-off-by: Mark Brown <broonie@kernel.org>
2025-10-28platform/x86: int3472: Fix double free of GPIO device during unregisterQiu Wenbo1-1/+0
regulator_unregister() already frees the associated GPIO device. On ThinkPad X9 (Lunar Lake), this causes a double free issue that leads to random failures when other drivers (typically Intel THC) attempt to allocate interrupts. The root cause is that the reference count of the pinctrl_intel_platform module unexpectedly drops to zero when this driver defers its probe. This behavior can also be reproduced by unloading the module directly. Fix the issue by removing the redundant release of the GPIO device during regulator unregistration. Cc: stable@vger.kernel.org Fixes: 1e5d088a52c2 ("platform/x86: int3472: Stop using devm_gpiod_get()") Signed-off-by: Qiu Wenbo <qiuwenbo@kylinsec.com.cn> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com> Reviewed-by: Hans de Goede <hansg@kernel.org> Reviewed-by: Daniel Scally <dan.scally@ideasonboard.com> Link: https://patch.msgid.link/20251028063009.289414-1-qiuwenbo@gnome.org Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
2025-10-28usb: gadget: udc: fix use-after-free in usb_gadget_state_workJimmy Hu1-0/+5
A race condition during gadget teardown can lead to a use-after-free in usb_gadget_state_work(), as reported by KASAN: BUG: KASAN: invalid-access in sysfs_notify+0x2c/0xd0 Workqueue: events usb_gadget_state_work The fundamental race occurs because a concurrent event (e.g., an interrupt) can call usb_gadget_set_state() and schedule gadget->work at any time during the cleanup process in usb_del_gadget(). Commit 399a45e5237c ("usb: gadget: core: flush gadget workqueue after device removal") attempted to fix this by moving flush_work() to after device_del(). However, this does not fully solve the race, as a new work item can still be scheduled *after* flush_work() completes but before the gadget's memory is freed, leading to the same use-after-free. This patch fixes the race condition robustly by introducing a 'teardown' flag and a 'state_lock' spinlock to the usb_gadget struct. The flag is set during cleanup in usb_del_gadget() *before* calling flush_work() to prevent any new work from being scheduled once cleanup has commenced. The scheduling site, usb_gadget_set_state(), now checks this flag under the lock before queueing the work, thus safely closing the race window. Fixes: 5702f75375aa9 ("usb: gadget: udc-core: move sysfs_notify() to a workqueue") Cc: stable <stable@kernel.org> Signed-off-by: Jimmy Hu <hhhuuu@google.com> Link: https://patch.msgid.link/20251023054945.233861-1-hhhuuu@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-10-28block: make REQ_OP_ZONE_OPEN a write operationDamien Le Moal1-5/+5
A REQ_OP_OPEN_ZONE request changes the condition of a sequential zone of a zoned block device to the explicitly open condition (BLK_ZONE_COND_EXP_OPEN). As such, it should be considered a write operation. Change this operation code to be an odd number to reflect this. The following operation numbers are changed to keep the numbering compact. No problems were reported without this change as this operation has no data. However, this unifies the zone operation to reflect that they modify the device state and also allows strengthening checks in the block layer, e.g. checking if this operation is not issued against a read-only device. Fixes: 6c1b1da58f8c ("block: add zone open, close and finish operations") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-28block: fix op_is_zone_mgmt() to handle REQ_OP_ZONE_RESET_ALLDamien Le Moal1-0/+1
REQ_OP_ZONE_RESET_ALL is a zone management request. Fix op_is_zone_mgmt() to return true for that operation, like it already does for REQ_OP_ZONE_RESET. While no problems were reported without this fix, this change allows strengthening checks in various block device drivers (scsi sd, virtioblk, DM) where op_is_zone_mgmt() is used to verify that a zone management command is not being issued to a regular block device. Fixes: 6c1b1da58f8c ("block: add zone open, close and finish operations") Cc: stable@vger.kernel.org Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2025-10-28regulator: pca9450: add input supply linksMark Brown17-16/+127
Merge series from Oleksij Rempel <o.rempel@pengutronix.de>: This series adds input supply definitions for the NXP PCA9450 PMIC. Some systems detect power events such as undervoltage before the PMIC. To allow correct propagation of such events, each regulator must define its upstream input supply. The first patch updates the devicetree binding to document new *-supply properties, and the second patch adds matching .supply_name entries in the driver. Changes in this series: - Document INL1, INB13, INB26 and INB45 supply properties - Link all LDO and BUCK regulators to their corresponding input groups
2025-10-28net/mlx5: Add balance ID support for LAG multiplane groupsMark Bloch1-1/+1
Implement balance ID support for multiplane LAG configurations. This feature enables per-multiplane group load balancing by extending the software system image GUID with a balance ID component. Key implementations: - Enable lag_per_mp_group capability when supported by hardware. - Append load_balance_id to software system image GUID when conditions are met. - Increase MLX5_SW_IMAGE_GUID_MAX_BYTES from 8 to 9 to accommodate the extra byte. The balance ID is appended to the system image GUID only when both load_balance_id and lag_per_mp_group capabilities are available, ensuring backward compatibility while enabling enhanced LAG functionality. This enhancement allows for more granular load balancing control in complex multi-plane LAG deployments, improving network performance and flexibility. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761211020-925651-6-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28net/mlx5: Add software system image GUID infrastructureMark Bloch1-0/+3
Replace direct hardware system image GUID usage with a new software system image GUID function that supports variable-length identifiers. Key changes: - Add mlx5_query_nic_sw_system_image_guid() function with length parameter. - Update all callsites to use the new function and buffer/length approach. - Modify mapping contexts to use byte arrays instead of u64 keys. - Update devcom matching to support variable-length keys. - Change mlx5_same_hw_devs() to use buffer comparison instead of u64. This refactoring prepares the infrastructure for balance ID support, which requires extending the system image GUID with additional data. The change maintains backward compatibility while enabling future enhancements. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Shay Drori <shayd@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/1761211020-925651-3-git-send-email-tariqt@nvidia.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-10-28pinctrl: pinconf-generic: Add properties 'skew-delay-{in,out}put-ps'Antonio Borneo1-0/+8
Add the properties 'skew-delay-input-ps' and 'skew-delay-output-ps' to the generic parameters used for parsing DT files. This allows to specify the independent skew delay value for the two directions. This enables drivers that use the generic pin configuration to get the value passed through these new properties. Signed-off-by: Antonio Borneo <antonio.borneo@foss.st.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org>