aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2026-05-01Merge tag 'nf-26-05-01' of ↵Jakub Kicinski1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following batch contains Netfilter fixes for net: 1) Replace skb_try_make_writable() by skb_ensure_writable() in nft_fwd_netdev and the flowtable to deal with uncloned packets having their network header in paged fragments. 2) Drop packet if output device does not exist and ensure sufficient headroom in nft_fwd_netdev before transmitting the skb. 3) Use the existing dup recursion counter in nft_fwd_netdev for the neigh_xmit variant, from Weiming Shi. 4) Add .check_hooks interface to x_tables to detach the control plane hook check based on the match/target configuration. Then, update nft_compat to use .check_hooks from .validate path, this fixes a lack of hook validation for several match/targets. 5) Fix incorrect .usersize in xt_CT, from Florian Westphal. 6) Fix a memleak with netdev tables in dormant state, from Florian Westphal. 7) Several patches to check if the packet is a fragment, then skip layer 4 inspection, for x_tables and nf_tables; as well as common nf_socket infrastructure. The xt_hashlimit match drops fragments to stay consistent with the existing approach when failing to parse the layer 4 protocol header. 8) Ensure sufficient headroom in the flowtable before transmitting the skb. 9) Fix the flowtable inline vlan approach for double-tagged vlan: Reverse the iteration over .encap[] since it represents the encapsulation as seen from the ingress path. Postpone pushing layer 2 header so output device is available to calculate needed headroom. Finally, add and use nf_flow_vlan_push() to fix it. 10) Fix flowtable inline pppoe with GSO packets. Moreover, use FLOW_OFFLOAD_XMIT_DIRECT to fill up destination hardware address since neighbour cache does not exist in pppoe. 11) Use skb_pull_rcsum() to decapsulate vlan and pppoe headers, for double-tagged vlan in particular this should provide some benefits in certain scenarios. More notes regarding 9-11): - sashiko is also signalling to use it for IPIP headers, but that needs more adjustments such setting skb->protocol after removing the IPIP header, will follow up in a separated patch. - I plan to submit selftests to cover double-tagged-vlan. As for pppoe, it should be possible but that would mandate a few userspace dependencies. This has been semi-automatically tested by me and reporters describing broken double-vlan-tagged and pppoe currently in the flowtable. * tag 'nf-26-05-01' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: flowtable: use skb_pull_rcsum() to pop vlan/pppoe header netfilter: flowtable: fix inline pppoe encapsulation in xmit path netfilter: flowtable: fix inline vlan encapsulation in xmit path netfilter: flowtable: ensure sufficient headroom in xmit path netfilter: xtables: fix L4 header parsing for non-first fragments netfilter: nf_tables: skip L4 header parsing for non-first fragments netfilter: nf_socket: skip socket lookup for non-first fragments netfilter: nf_tables: fix netdev hook allocation memleak with dormant tables netfilter: xt_CT: fix usersize for v1 and v2 revision netfilter: nft_compat: run xt_check_hooks_{match,target}() from .validate netfilter: x_tables: add .check_hooks to matches and targets netfilter: nft_fwd_netdev: use recursion counter in neigh egress path netfilter: nft_fwd_netdev: add device and headroom validate with neigh forwarding netfilter: replace skb_try_make_writable() by skb_ensure_writable() ==================== Link: https://patch.msgid.link/20260501122237.296262-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-01smb: smbdirect: introduce and use include/linux/smbdirect.hStefan Metzmacher1-0/+186
This makes it easier to rebuild cifs.ko and ksmbd.ko against a running kernel. Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/linux-cifs/aehrPuY60VMcYGU8@infradead.org/ Cc: Steve French <smfrench@gmail.com> Cc: Tom Talpey <tom@talpey.com> Cc: Long Li <longli@microsoft.com> Cc: Namjae Jeon <linkinjeon@kernel.org> Cc: Christoph Hellwig <hch@infradead.org> Cc: linux-cifs@vger.kernel.org Cc: samba-technical@lists.samba.org Signed-off-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Steve French <stfrench@microsoft.com>
2026-05-01alarmtimer: Remove unused interfacesThomas Gleixner1-3/+0
All alarmtimer users are converted to alarm_start_timer(). Remove the now unused interfaces. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://patch.msgid.link/20260408114952.670899355@kernel.org
2026-05-01alarmtimer: Provide alarm_start_timer()Thomas Gleixner1-0/+6
Alarm timers utilize hrtimers for normal operation and only switch to the RTC on suspend. In order to catch already expired timers early and without going through a timer interrupt cycle, provide a new start function which internally uses hrtimer_start_range_ns_user(). If hrtimer_start_range_ns_user() detects an already expired timer, it does not queue it. In that case remove the timer from the alarm base as well. Return the status queued or not back to the caller to handle the early expiry. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: John Stultz <jstultz@google.com> Link: https://patch.msgid.link/20260408114952.332822525@kernel.org
2026-05-01hrtimer: Provide hrtimer_start_range_ns_user()Thomas Gleixner1-3/+17
Calvin reported an odd NMI watchdog lockup which claims that the CPU locked up in user space. He provided a reproducer, which set's up a timerfd based timer and then rearms it in a loop with an absolute expiry time of 1ns. As the expiry time is in the past, the timer ends up as the first expiring timer in the per CPU hrtimer base and the clockevent device is programmed with the minimum delta value. If the machine is fast enough, this ends up in a endless loop of programming the delta value to the minimum value defined by the clock event device, before the timer interrupt can fire, which starves the interrupt and consequently triggers the lockup detector because the hrtimer callback of the lockup mechanism is never invoked. The clockevents code already has a last resort mechanism to prevent that, but it's sensible to catch such issues before trying to reprogram the clock event device. Provide a variant of hrtimer_start_range_ns(), which sanity checks the timer after queueing it. It does not so before because the timer might be armed and therefore needs to be dequeued. also we optimize for the latest possible point to check, so that the clock event prevention is avoided as much as possible. If the timer is already expired _before_ the clock event is reprogrammed, remove the timer from the queue and signal to the caller that the operation failed by returning false. That allows the caller to take immediate action without going through the loops and hoops of the hrtimer interrupt. The queueing code can't invoke the timer callback as the caller might hold a lock which is taken in the callback. Add a tracepoint which allows to analyze the expired at start situation. Reported-by: Calvin Owens <calvin@wbinvd.org> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Calvin Owens <calvin@wbinvd.org> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260408114951.995031895@kernel.org
2026-05-01rseq: Protect rseq_reset() against interruptsThomas Gleixner1-0/+2
rseq_reset() uses memset() to clear the tasks rseq data. That's racy against membarrier() and preemption. Guard it with irqsave to cure this. Fixes: faba9d250eae ("rseq: Introduce struct rseq_data") Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Link: https://patch.msgid.link/20260428224427.353887714%40kernel.org Cc: stable@vger.kernel.org
2026-05-01Merge tag 'block-7.1-20260430' of ↵Linus Torvalds2-3/+4
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: - MD pull request via Yu: - Fix a raid5 UAF on IO across the reshape position - Avoid failing RAID1/RAID10 devices for invalid IO errors - Fix RAID10 divide-by-zero when far_copies is zero - Restore bitmap grow through sysfs - Use mddev_is_dm() instead of open-coding gendisk checks - Use ATTRIBUTE_GROUPS() for md default sysfs attributes - Replace open-coded wait loops with wait_event helpers - NVMe pull request via Keith: - Target data transfer size configuation (Aurelien) - Enable P2P for RDMA (Shivaji Kant) - TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar) - TCP host updates (Alistair, Chaitanya) - Authentication updates (Alistair, Daniel, Chris Leech) - Multipath fixes (John Garry) - New quirks (Alan Cui, Tao Jiang) - Apple driver fix (Fedor Pchelkin) - PCI admin doorbell update fix (Keith) - Properly propagate CDROM read-only state to the block layer * tag 'block-7.1-20260430' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (35 commits) md: use ATTRIBUTE_GROUPS() for md default sysfs attributes md: use mddev_is_dm() instead of open-coding gendisk checks md/raid1: replace wait loop with wait_event_idle() in raid1_write_request() md/md-bitmap: add a none backend for bitmap grow md/md-bitmap: split bitmap sysfs groups md: factor bitmap creation away from sysfs handling md: use mddev_lock_nointr() in mddev_suspend_and_lock_nointr() md: replace wait loop with wait_event() in md_handle_request() md/raid10: fix divide-by-zero in setup_geo() with zero far_copies md/raid1,raid10: don't fail devices for invalid IO errors MAINTAINERS: Add Xiao Ni as md/raid reviewer md/raid5: Fix UAF on IO across the reshape position cdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro() nvme-auth: Hash DH shared secret to create session key nvme-pci: fix missed admin queue sq doorbell write nvme-auth: Include SC_C in RVAL controller hash nvme-tcp: teardown circular locking fixes nvmet-tcp: Don't clear tls_key when freeing sq Revert "nvmet-tcp: Don't free SQ on authentication success" nvme: skip trace completion for host path errors ...
2026-05-01Merge tag 'mm-hotfixes-stable-2026-04-30-15-39' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM fixes from Andrew Morton: "20 hotfixes. All are for MM (and for MMish maintainers). 9 are cc:stable and the remainder are for post-7.0 issues or aren't deemed suitable for backporting. There are two DAMON series from SeongJae Park which address races which could lead to use-after-free errors, and avoid the possibility of presenting stale parameter values to users" * tag 'mm-hotfixes-stable-2026-04-30-15-39' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: mm: memcontrol: fix rcu unbalance in get_non_dying_memcg_end() mm/userfaultfd: detect VMA type change after copy retry in mfill_copy_folio_retry() MAINTAINERS: remove stale kdump project URL mm/damon/stat: detect and use fresh enabled value mm/damon/lru_sort: detect and use fresh enabled and kdamond_pid values mm/damon/reclaim: detect and use fresh enabled and kdamond_pid values selftests/mm: specify requirement for PROC_MEM_ALWAYS_FORCE=y mm/damon/sysfs-schemes: protect path kfree() with damon_sysfs_lock mm/damon/sysfs-schemes: protect memcg_path kfree() with damon_sysfs_lock MAINTAINERS: update Li Wang's email address MAINTAINERS, mailmap: update email address for Qi Zheng MAINTAINERS: update Liam's email address mm/hugetlb_cma: round up per_node before logging it MAINTAINERS: fix regex pattern in CORE MM category mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap() mm: start background writeback based on per-wb threshold for strictlimit BDIs kho: fix error handling in kho_add_subtree() liveupdate: fix return value on session allocation failure mailmap: update entry for Dan Carpenter vmalloc: fix buffer overflow in vrealloc_node_align()
2026-04-30Merge tag 'mtd/fixes-for-7.1-rc2' of ↵Linus Torvalds1-0/+7
git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux Pull mtd fixes from Miquel Raynal: "Besides an out-of-bound bug, this is about properly supporting Winbond octal SPI NAND chips which use a specific pattern for stuffing more address bits in some operations. This uses the spi-mem flag in SPI NAND that was added to the spi-mem layer just before the merge window through the spi tree" * tag 'mtd/fixes-for-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux: mtd: spinand: winbond: Fix ODTR write VCR on W35NxxJW mtd: spinand: winbond: Set the packed page read flag to W35N02/04JW mtd: spinand: Add support for packed read data ODTR commands mtd: spi-nor: debugfs: fix out-of-bounds read in spi_nor_params_show()
2026-04-30Merge tag 'net-7.1-rc2' of ↵Linus Torvalds3-0/+33
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from netfilter. Current release - regressions: - ipmr: free mr_table after RCU grace period. Previous releases - regressions: - core: add net_iov_init() and use it to initialize ->page_type - sched: taprio: fix NULL pointer dereference in class dump - netfilter: nf_tables: - use list_del_rcu for netlink hooks - fix strict mode inbound policy matching - tcp: make probe0 timer handle expired user timeout - vrf: fix a potential NPD when removing a port from a VRF - eth: ice: - fix NULL pointer dereference in ice_reset_all_vfs() - fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw Previous releases - always broken: - page_pool: fix memory-provider leak in error path - sched: sch_cake: annotate data-races in cake_dump_stats() - mptcp: fix scheduling with atomic in timestamp sockopt - psp: check for device unregister when creating assoc - tls: fix strparser anchor skb leak on offload RX setup failure - eth: - stmmac: prevent NULL deref when RX memory exhausted - airoha: do not read uninitialized fragment address - rtl8150: fix use-after-free in rtl8150_start_xmit() Misc: - add Ido Schimmel as IPv4/IPv6 maintainer - add David Heidelberg as NFC subsystem maintainer" * tag 'net-7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (79 commits) net/sched: cls_flower: revert unintended changes sfc: fix error code in efx_devlink_info_running_versions() net: tls: fix strparser anchor skb leak on offload RX setup failure ice: add dpll peer notification for paired SMA and U.FL pins ice: fix missing dpll notifications for SW pins dpll: export __dpll_pin_change_ntf() for use under dpll_lock ice: fix SMA and U.FL pin state changes affecting paired pin ice: fix missing SMA pin initialization in DPLL subsystem ice: fix infinite recursion in ice_cfg_tx_topo via ice_init_dev_hw ice: fix NULL pointer dereference in ice_reset_all_vfs() iavf: add VIRTCHNL_OP_ADD_VLAN to success completion handler iavf: wait for PF confirmation before removing VLAN filters iavf: stop removing VLAN filters from PF on interface down iavf: rename IAVF_VLAN_IS_NEW to IAVF_VLAN_ADDING page_pool: fix memory-provider leak in page_pool_create_percpu() error path bonding: 3ad: implement proper RCU rules for port->aggregator net: airoha: Do not return err in ndo_stop() callback hv_sock: fix ARM64 support MAINTAINERS: update the IPv4/IPv6 entry and add Ido Schimmel selftests: drv-net: clarify linters and frameworks in README ...
2026-04-30kcsan: Silence -Wmaybe-uninitialized when calling __kcsan_check_access()Marco Elver1-3/+3
Some subsystems enable -Wmaybe-uninitialized [1], which can trigger false positives when KCSAN is enabled. Specifically, passing an uninitialized variable to functions that instrument accesses (e.g., copy_from_user()) results in calls to __kcsan_check_access(). Because __kcsan_check_access() takes a `const volatile void *ptr`, GCC infers that the function may only read the memory location, and thus warns if the passed variable is uninitialized. However, KCSAN is a dynamic analysis tool for data race detection; while it does read the memory location to detect concurrent modifications, the "initialized'ness" of the memory location is irrelevant for its analysis. Use absolute_pointer() in __kcsan_check_write(), kcsan_check_write(), and kcsan_check_atomic_write() to hide the pointer from the compiler, preventing it from concluding that the pointer passed points to uninitialized memory. This fixes warnings like: | CC fs/ntfs3/file.o | In file included from include/asm-generic/rwonce.h:27, | from arch/arm64/include/asm/rwonce.h:81, | from include/linux/compiler.h:369, | from include/linux/array_size.h:5, | from include/linux/kernel.h:16, | from include/linux/backing-dev.h:12, | from fs/ntfs3/file.c:10: | In function 'instrument_copy_from_user_before', | inlined from '_inline_copy_from_user' at include/linux/uaccess.h:184:2, | inlined from 'copy_from_user' at include/linux/uaccess.h:221:9, | inlined from 'ntfs_ioctl_fitrim' at fs/ntfs3/file.c:77:6, | inlined from 'ntfs_ioctl' at fs/ntfs3/file.c:164:10: | include/linux/kcsan-checks.h:220:28: error: 'range' may be used uninitialized [-Werror=maybe-uninitialized] | 220 | #define kcsan_check_access __kcsan_check_access | | ^ | include/linux/kcsan-checks.h:311:9: note: in expansion of macro 'kcsan_check_access' | 311 | kcsan_check_access(ptr, size, KCSAN_ACCESS_WRITE) | | ^~~~~~~~~~~~~~~~~~ | include/linux/instrumented.h:147:9: note: in expansion of macro 'kcsan_check_write' | 147 | kcsan_check_write(to, n); | | ^~~~~~~~~~~~~~~~~ | include/linux/kcsan-checks.h: In function 'ntfs_ioctl': | include/linux/kcsan-checks.h:37:6: note: by argument 1 of type 'const volatile void *' to '__kcsan_check_access' declared here | 37 | void __kcsan_check_access(const volatile void *ptr, size_t size, int type); | | ^~~~~~~~~~~~~~~~~~~~ | fs/ntfs3/file.c:65:29: note: 'range' declared here | 65 | struct fstrim_range range; | | ^~~~~ Link: https://lore.kernel.org/all/5da10cca-875b-418d-b54e-6be3ea32c266@app.fastmail.com/ [1] Reported-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Arnd Bergmann <arnd@arndb.de> Tested-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Marco Elver <elver@google.com>
2026-04-30irqchip/gic: Replace __ASSEMBLY__ with __ASSEMBLER__Thomas Huth2-3/+3
While the GCC and Clang compilers already define __ASSEMBLER__ automatically when compiling assembly code, __ASSEMBLY__ is a macro that only gets defined by the Makefiles in the kernel. This can be very confusing when switching between userspace and kernelspace coding, or when dealing with uapi headers that rather should use __ASSEMBLER__ instead. Standardize now on the __ASSEMBLER__ macro that is provided by the compilers. Signed-off-by: Thomas Huth <thuth@redhat.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20260421113012.146528-1-thuth@redhat.com
2026-04-30dpll: export __dpll_pin_change_ntf() for use under dpll_lockIvan Vecera1-0/+1
Export __dpll_pin_change_ntf() so that drivers can send pin change notifications from within pin callbacks, which are already called under dpll_lock. Using dpll_pin_change_ntf() in that context would deadlock. Add lockdep_assert_held() to catch misuse without the lock held. Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: Ivan Vecera <ivecera@redhat.com> Signed-off-by: Petr Oros <poros@redhat.com> Tested-by: Alexander Nowlin <alexander.nowlin@intel.com> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20260427-jk-iwl-net-petr-oros-fixes-v1-9-cdcb48303fd8@intel.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-30netfilter: x_tables: add .check_hooks to matches and targetsPablo Neira Ayuso1-0/+8
Add a new .check_hooks interface for checking if the match/target is used from the validate hook according to its configuration. Move existing conditional hook check based on the match/target configuration from .checkentry to .check_hooks for the following matches/targets: - addrtype - devgroup - physdev - policy - set - TCPMSS - SET This is a preparation patch to fix nft_compat, not functional changes are intended. Based on patch from Florian Westphal. Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2026-04-29Merge tag 'trace-v7.1-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace Pull tracing fixes from Steven Rostedt: - Fix inverted check of registering the stats for branch tracing When calling register_stat_tracer() which returns zero on success and negative on error, the callers were checking the return of zero as an error and printing a warning message. Because this was just a normal printk() message and not a WARN(), it wasn't caught in any testing. Fix the check to print the warning message when an error actually happens. - Fix a typo in a comment in tracepoint.h - Limit the size of event probes to 3K in size It is possible to create a dynamic event probe via the tracefs system that is greater than the max size of an event that the ring buffer can hold. This basically causes the event to become useless. Limit the size of an event probe to be 3K as that should be large enough to handle any dynamic events being created, and fits within the PAGE_SIZE sub-buffers of the ring buffer. * tag 'trace-v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: tracing/probes: Limit size of event probe to 3K tracepoint: Fix typo in tracepoint.h comment tracing: branch: Fix inverted check on stat tracer registration
2026-04-28Merge tag 'nf-26-04-28' of ↵Jakub Kicinski1-0/+29
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf Pablo Neira Ayuso says: ==================== Netfilter fixes for net The following patchset contains Netfilter fixes for net: 1) IEEE1394 ARP payload contains no target hardware address in the ARP packet. Apparently, arp_tables was never updated to deal with IEEE1394 ARP properly. To deal with this, return no match in case the target hardware address selector is used, either for inverse or normal match. Moreover, arpt_mangle disallows mangling of the target hardware and IP address because, it is not worth to adjust the offset calculation to fix this, we suspect no users of arp_tables for this family. 2) Use list_del_rcu() to delete device hooks in nf_tables, this hook list is RCU protected, concurrent netlink dump readers can be walking on this list, fix it by adding a helper function and use it for consistency. From Florian Westphal. 3) Add list_splice_rcu(), this is useful for joining the local list of new device hooks to the RCU protected hook list in chain and flowtable. Reviewed by Paul E. McKenney. 4) Use list_splice_rcu() to publish the new device hooks in chain and flowtable to fix concurrent netlink dump traversal. 5) Add a new hook transaction object to track device hook deletions. The current approach moves device hooks to be deleted around during the preparation phase, this breaks concurrent RCU reader via netlink dump. This new hook transaction is combined with NFT_HOOK_REMOVE flag to annotate hooks for removal in the preparation phase. 6) xt_policy inbound policy check in strict mode can lead to out-of-bound access of the secpath array due to incorrect. The iteration over the secpath needs to be reversed in the inbound to check for the human readable policy, expecting inner in first position and outer in second position, the secpath from inbound actually stores outer in first position then in second position. From Jiexun Wang. 7) Fix possible zero shift in nft_bitwise triggering UBSAN splat, reject zero shift from control plane, from Kai Ma. 8) Replace simple_strtoul() in the conntrack SIP helper since it relies on nul-terminated strings. From Florian Westphal. * tag 'nf-26-04-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf: netfilter: nf_conntrack_sip: don't use simple_strtoul netfilter: reject zero shift in nft_bitwise netfilter: xt_policy: fix strict mode inbound policy matching netfilter: nf_tables: add hook transactions for device deletions netfilter: nf_tables: join hook list via splice_list_rcu() in commit phase rculist: add list_splice_rcu() for private lists netfilter: nf_tables: use list_del_rcu for netlink hooks netfilter: arp_tables: fix IEEE1394 ARP payload parsing ==================== Link: https://patch.msgid.link/20260428095840.51961-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-28Merge tag 'sched_ext-for-7.1-rc1-fixes' of ↵Linus Torvalds2-3/+10
git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext Pull sched_ext fixes from Tejun Heo: "The merge window pulled in the cgroup sub-scheduler infrastructure, and new AI reviews are accelerating bug reporting and fixing - hence the larger than usual fixes batch: - Use-after-frees during scheduler load/unload: - The disable path could free the BPF scheduler while deferred irq_work / kthread work was still in flight - cgroup setter callbacks read the active scheduler outside the rwsem that synchronizes against teardown Fix both, and reuse the disable drain in the enable error paths so the BPF JIT page can't be freed under live callbacks. - Several BPF op invocations didn't tell the framework which runqueue was already locked, so helper kfuncs that re-acquire the runqueue by CPU could deadlock on the held lock Fix the affected callsites, including recursive parent-into-child dispatch. - The hardlockup notifier ran from NMI but eventually took a non-NMI-safe lock. Bounce it through irq_work. - A handful of bugs in the new sub-scheduler hierarchy: - helper kfuncs hard-coded the root instead of resolving the caller's scheduler - the enable error path tried to disable per-task state that had never been initialized, and leaked cpus_read_lock on the way out - a sysfs object was leaked on every load/unload - the dispatch fast-path used the root scheduler instead of the task's - a couple of CONFIG #ifdef guards were misclassified - Verifier-time hardening: BPF programs of unrelated struct_ops types (e.g. tcp_congestion_ops) could call sched_ext kfuncs - a semantic bug and, once sub-sched was enabled, a KASAN out-of-bounds read. Now rejected at load. Plus a few NULL and cross-task argument checks on sched_ext kfuncs, and a selftest covering the new deny. - rhashtable (Herbert): restore the insecure_elasticity toggle and bounce the deferred-resize kick through irq_work to break a lock-order cycle observable from raw-spinlock callers. sched_ext's scheduler-instance hash is the first user of both. - The bypass-mode load balancer used file-scope cpumasks; with multiple scheduler instances now possible, those raced. Move to per-instance cpumasks, plus a follow-up to skip tasks whose recorded CPU is stale relative to the new owning runqueue. - Smaller fixes: - a dispatch queue's first-task tracking misbehaved when a parked iterator cursor sat in the list - the runqueue's next-class wasn't promoted on local-queue enqueue, leaving an SCX task behind RT in edge cases - the reference qmap scheduler stopped erroring on legitimate cross-scheduler task-storage misses" * tag 'sched_ext-for-7.1-rc1-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext: (26 commits) sched_ext: Fix scx_flush_disable_work() UAF race sched_ext: Call wakeup_preempt() in local_dsq_post_enq() sched_ext: Release cpus_read_lock on scx_link_sched() failure in root enable sched_ext: Reject NULL-sch callers in scx_bpf_task_set_slice/dsq_vtime sched_ext: Refuse cross-task select_cpu_from_kfunc calls sched_ext: Align cgroup #ifdef guards with SUB_SCHED vs GROUP_SCHED sched_ext: Make bypass LB cpumasks per-scheduler sched_ext: Pass held rq to SCX_CALL_OP() for core_sched_before sched_ext: Pass held rq to SCX_CALL_OP() for dump_cpu/dump_task sched_ext: Save and restore scx_locked_rq across SCX_CALL_OP sched_ext: Use dsq->first_task instead of list_empty() in dispatch_enqueue() FIFO-tail sched_ext: Resolve caller's scheduler in scx_bpf_destroy_dsq() / scx_bpf_dsq_nr_queued() sched_ext: Read scx_root under scx_cgroup_ops_rwsem in cgroup setters sched_ext: Don't disable tasks in scx_sub_enable_workfn() abort path sched_ext: Skip tasks with stale task_rq in bypass_lb_cpu() sched_ext: Guard scx_dsq_move() against NULL kit->dsq after failed iter_new sched_ext: Unregister sub_kset on scheduler disable sched_ext: Defer scx_hardlockup() out of NMI sched_ext: sync disable_irq_work in bpf_scx_unreg() sched_ext: Fix local_dsq_post_enq() to use task's scheduler in sub-sched ...
2026-04-29driver core: move dev_has_sync_state() to drivers/base/base.hDanilo Krummrich1-14/+0
All callers of dev_has_sync_state() are in drivers/base/ and any attempt to use it outside of driver-core should require good justification, so there is no need to have it defined in include/linux/device.h. Thus, move it to drivers/base/base.h. Suggested-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Link: https://lore.kernel.org/driver-core/CAJZ5v0jkm9K9=-U_51FMsyxN2msdouRnz4sEjmxG0Btd6Hmw0w@mail.gmail.com/ Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260420234153.2898532-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-04-29driver core: use READ_ONCE() for dev->driver in dev_has_sync_state()Danilo Krummrich1-1/+4
dev_has_sync_state() reads dev->driver twice without holding device_lock() -- once for the NULL check and once to dereference ->sync_state. Some callers only hold device_links_write_lock, which doesn't prevent a concurrent unbind from clearing dev->driver via device_unbind_cleanup(). Fix it by reading dev->driver exactly once with READ_ONCE(), pairing with the WRITE_ONCE() in device_set_driver(). Link: https://lore.kernel.org/driver-core/DHW8QPU1VU1F.3P6PH69HLFBYC@kernel.org/ Fixes: ac338acf514e ("driver core: Add dev_has_sync_state()") Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Saravana Kannan <saravanak@kernel.org> Link: https://patch.msgid.link/20260418162221.1121873-1-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-04-28tracepoint: Fix typo in tracepoint.h commentSheng Che Peng1-1/+1
Change "my" to "may" in the description of subsystem configurations. Link: https://patch.msgid.link/20260422021819.1788091-1-synte4028@gmail.com Signed-off-by: Sheng Che Peng <synte4028@gmail.com> Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
2026-04-28workqueue: fix devm_alloc_workqueue() va_list misuseBreno Leitao1-2/+4
devm_alloc_workqueue() built a va_list and passed it as a single positional argument to the variadic alloc_workqueue() macro: va_start(args, max_active); wq = alloc_workqueue(fmt, flags, max_active, args); va_end(args); C does not allow forwarding a va_list through a ... parameter. alloc_workqueue() expands to alloc_workqueue_noprof(), which runs its own va_start() over its ... params, so the inner vsnprintf(wq->name, sizeof(wq->name), fmt, args) in __alloc_workqueue() received the outer va_list object as the first variadic slot rather than the caller's actual format arguments. Add a new static helper alloc_workqueue_va() that wraps __alloc_workqueue() and runs wq_init_lockdep() on success, and fold both alloc_workqueue_noprof() and devm_alloc_workqueue_noprof() onto it as suggested by Tejun. The wq_init_lockdep() step is required on the devm path too, otherwise __flush_workqueue()'s on-stack COMPLETION_INITIALIZER_ONSTACK_MAP would NULL-deref wq->lockdep_map. No caller changes are required. devm_alloc_ordered_workqueue() is a macro forwarding to devm_alloc_workqueue() and inherits the fix. Two in-tree callers actively trigger the broken path on every probe: drivers/power/supply/mt6370-charger.c:889 drivers/power/supply/max77705_charger.c:649 both of which use devm_alloc_ordered_workqueue(dev, "%s", 0, dev_name(dev)). A standalone reproducer module is available at[1]. Link: https://github.com/leitao/debug/blob/main/workqueue/valist/wq_va_test.c [1] Fixes: 1dfc9d60a69e ("workqueue: devres: Add device-managed allocate workqueue") Signed-off-by: Breno Leitao <leitao@debian.org> Signed-off-by: Tejun Heo <tj@kernel.org>
2026-04-27ipmr: Free mr_table after RCU grace period.Kuniyuki Iwashima1-0/+3
With CONFIG_IP_MROUTE_MULTIPLE_TABLES=n, ipmr_fib_lookup() does not check if net->ipv4.mrt is NULL. Since default_device_exit_batch() is called after ->exit_rtnl(), a device could receive IGMP packets and access net->ipv4.mrt during/after ipmr_rules_exit_rtnl(). If ipmr_rules_exit_rtnl() had already cleared it and freed the memory, the access would trigger null-ptr-deref or use-after-free. Let's fix it by using RCU helper and free mrt after RCU grace period. In addition, check_net(net) is added to mroute_clean_tables() and ipmr_cache_unresolved() to synchronise via mfc_unres_lock. This prevents ipmr_cache_unresolved() from putting skb into c->_c.mfc_un.unres.unresolved after mroute_clean_tables() purges it. For the same reason, timer_shutdown_sync() is moved after mroute_clean_tables(). Since rhltable_destroy() holds mutex internally, rcu_work is used, and it is placed as the first member because rcu_head must be placed within <4K offset. mr_table is alraedy 3864 bytes without rcu_work. Note that IP6MR is not yet converted to ->exit_rtnl(), so this change is not needed for now but will be. Fixes: b22b01867406 ("ipmr: Convert ipmr_net_exit_batch() to ->exit_rtnl().") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260423053456.4097409-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-27Merge tag 'fsnotify_for_v7.1-rc2' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull fsnotify fixes from Jan Kara: "Three fixes for fsnotify / fanotify" * tag 'fsnotify_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: fsnotify: fix inode reference leak in fsnotify_recalc_mask() fanotify: Fix spelling mistake "enforecement" -> "enforcement" fanotify: fix false positive on permission events
2026-04-27hfsplus: Add a sanity check for btree node sizeEdward Adam Davis1-0/+1
Syzbot reported an uninit-value bug in [1] with a corrupted HFS+ image, during the file system mounting process, specifically while loading the catalog, a corrupted node_size value of 1 caused the rec_off argument passed to hfs_bnode_read_u16() (within hfs_bnode_find()) to be excessively large. Consequently, the function failed to return a valid value to initialize the off variable, triggering the bug [1]. Every node starts from BTree node descriptor: struct hfs_bnode_desc. So, the size of node cannot be lesser than that. However, technical specification declares that: "The node size (which is expressed in bytes) must be power of two, from 512 through 32,768, inclusive." Add a check for btree node size base on technical specification. [1] BUG: KMSAN: uninit-value in hfsplus_bnode_find+0x141c/0x1600 fs/hfsplus/bnode.c:584 hfsplus_bnode_find+0x141c/0x1600 fs/hfsplus/bnode.c:584 hfsplus_btree_open+0x169a/0x1e40 fs/hfsplus/btree.c:382 hfsplus_fill_super+0x111f/0x2770 fs/hfsplus/super.c:553 get_tree_bdev_flags+0x6e6/0x920 fs/super.c:1694 get_tree_bdev+0x38/0x50 fs/super.c:1717 hfsplus_get_tree+0x35/0x40 fs/hfsplus/super.c:709 vfs_get_tree+0xb3/0x5d0 fs/super.c:1754 fc_mount fs/namespace.c:1193 [inline] Fixes: 8ad2c6a36ac4 ("hfsplus: validate b-tree node 0 bitmap at mount time") Reported-by: syzbot+217eb327242d08197efb@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=217eb327242d08197efb Signed-off-by: Edward Adam Davis <eadavis@qq.com> Reviewed-by: Viacheslav Dubeyko <slava@dubeyko.com> Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com> Link: https://lore.kernel.org/r/tencent_5ED373437A697F83A4A446B771577626CD05@qq.com Signed-off-by: Viacheslav Dubeyko <slava@dubeyko.com>
2026-04-27Merge tag 'mailbox-v7.1' of ↵Linus Torvalds2-4/+14
git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox Pull mailbox updates from Jassi Brar: - core: fix NULL message handling and add API to query TX queue slots - test: resolve concurrency bugs, dangling IRQs, and memory leaks - dt-bindings: qcom: add Eliza IPCC - mtk: fix address calculation and pointer handling bugs - cix: resolve SCMI suspend timeouts - misc memory allocation optimizations and cleanups * tag 'mailbox-v7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/jassibrar/mailbox: mailbox: mailbox-test: make data_ready a per-instance variable mailbox: mailbox-test: initialize struct earlier mailbox: mailbox-test: don't free the reused channel mailbox: mailbox-test: handle channel errors consistently mailbox: update kdoc for struct mbox_controller mailbox: add sanity check for channel array mailbox: mailbox-test: free channels on probe error mailbox: prefix new constants with MBOX_ dt-bindings: mailbox: qcom-ipcc: Document the Eliza Inter-Processor Communication Controller mailbox: cix: Add IRQF_NO_SUSPEND to mailbox interrupt mailbox: Fix NULL message support in mbox_send_message() mailbox: remove superfluous internal header mailbox: correct kdoc title for mbox_bind_client mailbox: test: really ignore optional memory resources mailbox: exynos: drop superfluous mbox setting per channel mailbox: mtk-cmdq: Fix CURR and END addr for task insert case mailbox: mtk-vcp-mailbox: Fix the return value in mtk_vcp_mbox_xlate() mailbox: hi6220: kzalloc + kcalloc to kzalloc mailbox: rockchip: kzalloc + kcalloc to kzalloc mailbox: add API to query available TX queue slots
2026-04-27cdrom, scsi: sr: propagate read-only status to block layer via set_disk_ro()Daan De Meyer1-0/+1
The cdrom core never calls set_disk_ro() for a registered device, so BLKROGET on a CD-ROM device always returns 0 (writable), even when the drive has no write capabilities and writes will inevitably fail. This causes problems for userspace that relies on BLKROGET to determine whether a block device is read-only. For example, systemd's loop device setup uses BLKROGET to decide whether to create a loop device with LO_FLAGS_READ_ONLY. Without the read-only flag, writes pass through the loop device to the CD-ROM and fail with I/O errors. systemd-fsck similarly checks BLKROGET to decide whether to run fsck in no-repair mode (-n). The write-capability bits in cdi->mask come from two different sources: CDC_DVD_RAM and CDC_CD_RW are populated by the driver from the MODE SENSE capabilities page (page 0x2A) before register_cdrom() is called, while CDC_MRW_W and CDC_RAM require the MMC GET CONFIGURATION command and were only probed by cdrom_open_write() at device open time. This meant that any attempt to compute the writable state from the full mask at probe time was incorrect, because the GET CONFIGURATION bits were still unset (and cdi->mask is initialized such that capabilities are assumed present). Fix this by factoring the GET CONFIGURATION probing out of cdrom_open_write() into a new exported helper, cdrom_probe_write_features(), and having sr call it from sr_probe() right after get_capabilities() has populated the MODE SENSE bits. register_cdrom() then calls set_disk_ro() based on the full write-capability mask (CDC_DVD_RAM | CDC_MRW_W | CDC_RAM | CDC_CD_RW) so the block layer reflects the drive's actual write support. The feature queries used (CDF_MRW and CDF_RWRT via GET CONFIGURATION with RT=00) report drive-level capabilities that are persistent across media, so a single probe before register_cdrom() is sufficient and the redundant probe at open time is dropped. With set_disk_ro() now accurate, the long-vestigial cd->writeable flag in sr can go: get_capabilities() used to set cd->writeable based on the same four mask bits, but because CDC_MRW_W and CDC_RAM default to "capability present" in cdi->mask and aren't touched by MODE SENSE, the condition that gated cd->writeable was always true, making it unconditionally 1. Replace the corresponding gate in sr_init_command() with get_disk_ro(cd->disk), which turns a previously no-op check into a real one and also catches kernel-internal bio writers that bypass blkdev_write_iter()'s bdev_read_only() check. The sd driver (SCSI disks) does not have this problem because it checks the MODE SENSE Write Protect bit and calls set_disk_ro() accordingly. The sr driver cannot use the same approach because the MMC specification does not define the WP bit in the MODE SENSE device-specific parameter byte for CD-ROM devices. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Daan De Meyer <daan@amutable.com> Reviewed-by: Phillip Potter <phil@philpotter.co.uk> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Phillip Potter <phil@philpotter.co.uk> Link: https://patch.msgid.link/20260427210139.1400-2-phil@philpotter.co.uk Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-27Merge tag 'nvme-7.1-2026-04-24' of git://git.infradead.org/nvme into block-7.1Jens Axboe1-3/+3
Pull NVMe fixes from Keith: "- Target data transfer size confiruation (Aurelien) - Enable P2P for RDMA (Shivaji Kant) - TCP target updates (Maurizio, Alistair, Chaitanya, Shivam Kumar) - TCP host updates (Alistair, Chaitanya) - Authentication updates (Alistair, Daniel, Chris Leech) - Multipath fixes (John Garry) - New quirks (Alan Cui, Tao Jiang) - Apple driver fix (Fedor Pchelkin) - PCI admin doorbell update fix (Keith)" * tag 'nvme-7.1-2026-04-24' of git://git.infradead.org/nvme: (22 commits) nvme-auth: Hash DH shared secret to create session key nvme-pci: fix missed admin queue sq doorbell write nvme-auth: Include SC_C in RVAL controller hash nvme-tcp: teardown circular locking fixes nvmet-tcp: Don't clear tls_key when freeing sq Revert "nvmet-tcp: Don't free SQ on authentication success" nvme: skip trace completion for host path errors nvme-pci: add quirk for Memblaze Pblaze5 (0x1c5f:0x0555) nvme-multipath: put module reference when delayed removal work is canceled nvme: expose TLS mode nvme-apple: drop invalid put of admin queue reference count nvme-core: fix parameter name in comment nvmet: avoid recursive nvmet-wq flush in nvmet_ctrl_free nvme-multipath: drop head pointer check in nvme_mpath_clear_current_path() nvme: add quirk NVME_QUIRK_IGNORE_DEV_SUBNQN for 144d:a808 (Samsung PM981/983/970 EVO Plus ) nvmet-tcp: fix race between ICReq handling and queue teardown nvmet-tcp: remove redundant calls to nvmet_tcp_fatal_error() nvmet-tcp: propagate nvmet_tcp_build_pdu_iovec() errors to its callers nvme: enable PCI P2PDMA support for RDMA transport nvmet: introduce new mdts configuration entry ...
2026-04-27mtd: spinand: Add support for packed read data ODTR commandsMiquel Raynal1-0/+7
Some devices stuff address bits in the double byte opcode (in place of the repeated byte) in order to be able to increase the size of the devices, without adding extra address bytes. Create a flag to identify those devices. When the flag is set, use the "packed" variant for the read data operation. Signed-off-by: Miquel Raynal <miquel.raynal@bootlin.com>
2026-04-27MAINTAINERS: update Liam's email addressLiam R. Howlett1-1/+1
Switching to private email address. Update all contact information Add an entry to mailmap at the same time. Link: https://lore.kernel.org/20260422184310.2682901-1-liam@infradead.org Signed-off-by: Liam R. Howlett <liam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-27mm/vma: do not try to unmap a VMA if mmap_prepare() invoked from mmap()Lorenzo Stoakes1-1/+1
The mmap_prepare hook functionality includes the ability to invoke mmap_prepare() from the mmap() hook of existing 'stacked' drivers, that is ones which are capable of calling the mmap hooks of other drivers/file systems (e.g. overlayfs, shm). As part of the mmap_prepare action functionality, we deal with errors by unmapping the VMA should one arise. This works in the usual mmap_prepare case, as we invoke this action at the last moment, when the VMA is established in the maple tree. However, the mmap() hook passes a not-fully-established VMA pointer to the caller (which is the motivation behind the mmap_prepare() work), which is detached. So attempting to unmap a VMA in this state will be problematic, with the most obvious symptom being a warning in vma_mark_detached(), because the VMA is already detached. It's also unncessary - the mmap() handler will clean up the VMA on error. So to fix this issue, this patch propagates whether or not an mmap action is being completed via the compatibility layer or directly. If the former, then we do not attempt VMA cleanup, if the latter, then we do. This patch also updates the userland VMA tests to reflect the change. Link: https://lore.kernel.org/20260421102150.189982-1-ljs@kernel.org Fixes: ac0a3fc9c07d ("mm: add ability to take further action in vm_area_desc") Signed-off-by: Lorenzo Stoakes <ljs@kernel.org> Reported-by: syzbot+db390288d141a1dccf96@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/69e69734.050a0220.24bfd3.0027.GAE@google.com/ Cc: David Hildenbrand <david@kernel.org> Cc: Jann Horn <jannh@google.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Suren Baghdasaryan <surenb@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-27Merge drm/drm-fixes into drm-misc-fixesThomas Zimmermann394-5425/+8622
Getting fixes and updates from v7.1-rc1. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
2026-04-26driver core: Replace dev->offline + ->offline_disabled with accessorsDouglas Anderson1-6/+6
In C, bitfields are not necessarily safe to modify from multiple threads without locking. Switch "offline" and "offline_disabled" over to the "flags" field so modifications are safe. Cc: Rafael J. Wysocki <rafael@kernel.org> Acked-by: Mark Brown <broonie@kernel.org> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Reviewed-by: Danilo Krummrich <dakr@kernel.org> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Acked-by: Marek Szyprowski <m.szyprowski@samsung.com> Signed-off-by: Douglas Anderson <dianders@chromium.org> Link: https://patch.msgid.link/20260406162231.v5.9.I897d478b4a9361d79cd5073207c1062fd4d0d0e4@changeid Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-04-26driver core: Replace dev->of_node_reused with dev_of_node_reused()Douglas Anderson1-3/+4
In C, bitfields are not necessarily safe to modify from multiple threads without locking. Switch "of_node_reused" over to the "flags" field so modifications are safe. Cc: Johan Hovold <johan@kernel.org> Acked-by: Mark Brown <broonie@kernel.org&g