Age | Commit message | Author | Files | Lines
3 days | ksmbd: enforce FILE_READ_ATTRIBUTES on SMB_FIND_FILE_POSIX_INFORMATION | Gil Portnoy | 1 | -0/+6
find_file_posix_info() in smb2_query_info() returns file metadata (owner uid, group gid, mode, inode, size, allocation size, hard-link count and all four timestamps) but performs no per-handle access check. Every sibling query handler gates on the handle's granted access first -- get_file_basic_info(), get_file_all_info(), get_file_network_open_info() and get_file_attribute_tag_info() all reject a handle lacking FILE_READ_ATTRIBUTES_LE with -EACCES. The POSIX handler is gated only by the connection-scoped tcon->posix_extensions flag, which is not a per-handle authorization, so a handle opened with only FILE_WRITE_DATA is correctly denied FileBasicInformation yet is allowed the strict-superset POSIX info. Mirror the FILE_READ_ATTRIBUTES_LE gate the sibling info handlers already use. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
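The per-handle gate described in this commit can be sketched in userspace C. This is a minimal model under stated assumptions, not the ksmbd patch itself: `struct handle_model` and `posix_info_allowed()` are hypothetical names, and the masks are the host-endian MS-SMB2 access bits rather than the kernel's `__le32` `*_LE` constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* MS-SMB2 access-mask bits, host-endian (the kernel uses __le32 *_LE forms). */
#define FILE_WRITE_DATA      0x00000002u
#define FILE_READ_ATTRIBUTES 0x00000080u

/* Hypothetical stand-in for the per-handle state (fp->daccess in ksmbd). */
struct handle_model {
	uint32_t daccess; /* access granted when the handle was opened */
};

/* Return 0 if a metadata query may proceed on this handle, -EACCES otherwise. */
static int posix_info_allowed(const struct handle_model *h)
{
	assert(h);
	if (!(h->daccess & FILE_READ_ATTRIBUTES))
		return -EACCES;
	return 0;
}
```

The point of the model is that the decision depends only on the handle's granted access, not on any connection-wide flag such as tcon->posix_extensions.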
3 days | ksmbd: reject non-VALID session in compound request branch | Gil Portnoy | 1 | -0/+5
smb2_check_user_session() takes a shortcut for any operation that is not the first in a COMPOUND request: it reuses work->sess (the session bound by the first operation) and validates only the SessionId, then returns "valid". It never re-checks work->sess->state == SMB2_SESSION_VALID, and a SessionId of 0xFFFFFFFFFFFFFFFF (ULLONG_MAX, the MS-SMB2 related-operation value) skips even the id comparison. The standalone path (ksmbd_session_lookup_all() plus the SESSION_SETUP state machine) does enforce the VALID state; the compound branch bypasses all of it. A SESSION_SETUP carrying only an NTLM Type-1 (NtLmNegotiate) blob publishes a fresh SMB2_SESSION_IN_PROGRESS session whose sess->user is still NULL (->user is assigned later, by ntlm_authenticate()). Used as operation 1 of a COMPOUND with operation 2 = TREE_CONNECT (related, SessionId=ULLONG_MAX, \\host\IPC$), the tree-connect then runs on that IN_PROGRESS session and reaches ksmbd_ipc_tree_connect_request(), which dereferences user_name(sess->user) with sess->user == NULL (transport_ipc.c:687/701/704) -> remote NULL-pointer dereference and a kernel Oops that wedges the ksmbd worker for all clients. Reject any non-first compound operation that lands on a session which is not SMB2_SESSION_VALID, mirroring the validity the standalone lookup path enforces. SESSION_SETUP itself legitimately runs on an IN_PROGRESS session, but it is never carried as a non-first compound operation, so multi-leg authentication is unaffected by this check. Fixes: 5005bcb42191 ("ksmbd: validate session id and tree id in the compound request") Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
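A rough userspace model of the compound-branch check may help. The names `struct sess_model`, `enum sess_state` and `compound_session_ok()` are hypothetical and only illustrate the rule that the related-operation SessionId may skip the id comparison but never the state check; the real function's return codes are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* MS-SMB2 related-operation SessionId: reuse the previous request's session. */
#define RELATED_SESSION_ID UINT64_MAX

/* Hypothetical session states for this sketch. */
enum sess_state { SESS_IN_PROGRESS, SESS_VALID, SESS_EXPIRED };

/* Hypothetical model of the session bound by the first compound operation. */
struct sess_model {
	uint64_t id;
	enum sess_state state;
};

/* Decide whether a non-first compound operation may run on the bound session. */
static bool compound_session_ok(const struct sess_model *bound, uint64_t req_id)
{
	assert(bound);
	/* The related-operation id skips the id comparison ... */
	if (req_id != RELATED_SESSION_ID && req_id != bound->id)
		return false;
	/* ... but the session must still be fully authenticated. */
	return bound->state == SESS_VALID;
}
```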
3 days | ksmbd: compress SMB2 READ responses | Namjae Jeon | 6 | -0/+157
Handle SMB2_READFLAG_REQUEST_COMPRESSED for non-RDMA reads. Flatten the response iov, emit chained or unchained LZ77 transforms when compression is beneficial, and retain the generated buffer until the work item is released. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | ksmbd: negotiate and decode SMB2 compression | Namjae Jeon | 7 | -13/+210
Parse the SMB 3.1.1 compression capabilities context and negotiate LZ77 with optional chained Pattern_V1 support. Advertise compression on tree connections and decode compressed requests before normal SMB dispatch. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | cifs: negotiate chained SMB2 compression capabilities | Namjae Jeon | 2 | -9/+37
Advertise LZ77 and Pattern_V1 with chained transform support in the SMB 3.1.1 compression negotiate context. Validate the server's returned algorithm list and flags, then retain the negotiated capabilities for a future compressed transform receive implementation. This patch only negotiates capabilities. It does not request compressed READ responses or add a compressed transform receive path. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb: add common SMB2 compression transform helpers | Namjae Jeon | 4 | -8/+410
Implement common validation, compression and decompression for SMB2 compression transforms. Support unchained LZ77 and chained NONE, LZ77 and Pattern_V1 payloads. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb: move LZ77 compression into common code | Namjae Jeon | 7 | -44/+176
Move the LZ77 codec in cifs.ko to smb/common/ so both the SMB client and ksmbd can use it. Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | ksmbd: add per-handle permission check to FILE_LINK_INFORMATION | Gil Portnoy | 1 | -0/+5
The FILE_LINK_INFORMATION arm of smb2_set_info_file() calls smb2_create_link() with no per-handle fp->daccess check. On the ReplaceIfExists path smb2_create_link() unlinks an existing file at the target name (ksmbd_vfs_remove_file) and creates a hardlink (ksmbd_vfs_link); neither helper checks daccess. A handle opened with FILE_READ_DATA only (no FILE_DELETE, no FILE_WRITE_DATA) can therefore delete an arbitrary file in the share and plant a hardlink over its name. The sibling delete/move arms in the same switch already gate: FILE_RENAME_INFORMATION and FILE_DISPOSITION_INFORMATION both require FILE_DELETE_LE; FILE_FULL_EA_INFORMATION requires FILE_WRITE_EA_LE. Gate the link arm the same way as its closest analogue (rename), since it mutates the namespace and, on replace, deletes an existing entry. This is a sibling of commit cc57232cae23 ("ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE"). Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | ksmbd: add a permission check for FSCTL_SET_ZERO_DATA | Gil Portnoy | 1 | -0/+6
FSCTL_SET_ZERO_DATA in smb2_ioctl() destroys file data via ksmbd_vfs_zero_data() -> vfs_fallocate(PUNCH_HOLE/ZERO_RANGE) after checking only the share-level KSMBD_TREE_CONN_FLAG_WRITABLE, with no per-handle access check. A handle opened with only FILE_WRITE_ATTRIBUTES still yields an FMODE_WRITE filp (FILE_WRITE_ATTRIBUTES is part of FILE_WRITE_DESIRE_ACCESS_LE, so smb2_create_open_flags() opens it O_WRONLY), so the vfs_fallocate FMODE_WRITE check does not stop it; only the missing fp->daccess gate would. Reproduced on mainline 7.1-rc7 with KASAN by an authenticated SMB client: a FILE_WRITE_ATTRIBUTES-only handle zeroed 4096 bytes of file data it had no FILE_WRITE_DATA right to (6/6; a FILE_READ_DATA-only handle was correctly denied). This is the unfixed sibling of commit cc57232cae23 ("ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE"). Because SET_ZERO_DATA writes data (not an attribute), require FILE_WRITE_DATA. Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | ksmbd: add a WRITE_DAC/WRITE_OWNER check to SMB2 SET_INFO SECURITY | Gil Portnoy | 1 | -0/+3
commit cc57232cae23 ("ksmbd: fix FSCTL permission bypass by adding a permission check for FSCTL_SET_SPARSE") added a fp->daccess gate to fsctl_set_sparse and noted that "similar handle-level checks exist in other functions but are missing here." The SMB2 SET_INFO SECURITY arm is one of the missing ones, and the most security-relevant: smb2_set_info_sec() calls set_info_sec() with no per-handle access check. set_info_sec() (fs/smb/server/smbacl.c) re-permissions the file: it rewrites owner/group/mode via notify_change(), rewrites the POSIX ACL via set_posix_acl(), and on KSMBD_SHARE_FLAG_ACL_XATTR shares removes and rewrites the Windows security descriptor via ksmbd_vfs_set_sd_xattr(). Every other persistent-mutation arm of the sibling handler smb2_set_info_file() checks fp->daccess first (FILE_WRITE_DATA / FILE_DELETE / FILE_WRITE_EA / FILE_WRITE_ATTRIBUTES); the SECURITY arm — which mutates the access control itself — is the only one with no gate. A client can therefore open a handle with FILE_WRITE_ATTRIBUTES only (no FILE_WRITE_DAC / FILE_WRITE_OWNER) and use SMB2_SET_INFO with InfoType SMB2_O_INFO_SECURITY to rewrite the file's DACL and owner, granting itself access the handle's daccess never carried. Unlike the FSCTL data arms this is a metadata/xattr operation, so there is no FMODE_WRITE VFS backstop — the missing fp->daccess check is the entire gate. Setting a security descriptor is the WRITE_DAC / WRITE_OWNER operation, so require at least one of those on the handle before re-permissioning the file. -EACCES is mapped to STATUS_ACCESS_DENIED by smb2_set_info(). Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
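The "at least one of WRITE_DAC / WRITE_OWNER" rule from this commit can be illustrated with a small userspace sketch. `set_security_allowed()` is a hypothetical helper, and the masks are host-endian MS-SMB2 access bits, not the kernel's `*_LE` constants.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* MS-SMB2 access-mask bits, host-endian. */
#define FILE_WRITE_ATTRIBUTES 0x00000100u
#define FILE_WRITE_DAC        0x00040000u
#define FILE_WRITE_OWNER      0x00080000u

/*
 * Hypothetical gate for a security-descriptor update: the handle must carry
 * at least one of WRITE_DAC or WRITE_OWNER; other write bits do not count.
 */
static int set_security_allowed(uint32_t daccess)
{
	if (!(daccess & (FILE_WRITE_DAC | FILE_WRITE_OWNER)))
		return -EACCES;
	return 0;
}
```

A handle opened with FILE_WRITE_ATTRIBUTES only is rejected by this model, which is the case the commit message describes.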
3 days | ksmbd: fix use-after-free of a deferred file_lock on SMB2_CLOSE then SMB2_CANCEL | Gil Portnoy | 1 | -7/+7
Commit f580d27e8928 ("ksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCEL") made smb2_cancel() skip a work whose state is KSMBD_WORK_CANCELLED, so its cancel_fn cannot be fired a second time.

But KSMBD_WORK has three states (ACTIVE, CANCELLED, CLOSED), and the same freeing producer path is reached for CLOSED too: SMB2_CLOSE on the locking handle -> set_close_state_blocked_works() sets the deferred work's state to KSMBD_WORK_CLOSED and wakes the smb2_lock() worker. The worker takes the non-ACTIVE early-exit, locks_free_lock()s the file_lock and, because the state is not KSMBD_WORK_CANCELLED, takes the STATUS_RANGE_NOT_LOCKED branch with "goto out2" -- which, like the cancelled branch, skips release_async_work(). The work stays on conn->async_requests with a live cancel_fn = smb2_remove_blocked_lock pointing at the freed file_lock.

A subsequent SMB2_CANCEL for the same AsyncId then passes the KSMBD_WORK_CANCELLED-only guard (its state is KSMBD_WORK_CLOSED), so smb2_cancel() fires cancel_fn again over the freed file_lock -- the same use-after-free fixed, via SMB2_CLOSE instead of a first SMB2_CANCEL:

  BUG: KASAN: slab-use-after-free in __locks_delete_block
    __locks_delete_block
    locks_delete_block
    ksmbd_vfs_posix_lock_unblock
    smb2_remove_blocked_lock
    smb2_cancel            <- 2nd SMB2_CANCEL fires cancel_fn
    handle_ksmbd_work
  Allocated by ...: locks_alloc_lock <- smb2_lock
  Freed by ...: locks_free_lock <- smb2_lock (non-ACTIVE early-exit)
  ... cache file_lock_cache of size 192

Reproduced on mainline 7.1-rc7 (which already contains f580d27e8928) with KASAN by an authenticated SMB client; the double-SMB2_CANCEL control is silent on that kernel, so the splat is attributable to the CLOSE trigger.

Only an ACTIVE deferred work may have its cancel_fn fired: both terminal states (CANCELLED and CLOSED) reach the smb2_lock() early-exit that frees the file_lock and skips release_async_work(). Guard on KSMBD_WORK_ACTIVE so any non-active work is skipped.
Fixes: f580d27e8928 ("ksmbd: fix use-after-free of a deferred file_lock on double SMB2_CANCEL") Cc: stable@vger.kernel.org Signed-off-by: Gil Portnoy <dddhkts1@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
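The difference between guarding on "not CANCELLED" and guarding on "ACTIVE" can be shown with a small state-machine sketch. Everything here (`struct work_model`, `cancel_work()`, `count_cancel()`) is a hypothetical userspace model, not ksmbd code; it only demonstrates that an ACTIVE-only guard keeps the callback from firing for either terminal state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three work states named in the commit. */
enum work_state { WORK_ACTIVE, WORK_CANCELLED, WORK_CLOSED };

struct work_model {
	enum work_state state;
	void (*cancel_fn)(void *arg);
	void *cancel_arg;
};

/* Example callback: counts invocations so a double fire is observable. */
static void count_cancel(void *arg)
{
	++*(int *)arg;
}

/* Fire the cancel callback only for a work that is still ACTIVE. */
static bool cancel_work(struct work_model *w)
{
	assert(w);
	if (w->state != WORK_ACTIVE)
		return false; /* CANCELLED and CLOSED are both terminal */
	w->state = WORK_CANCELLED;
	if (w->cancel_fn)
		w->cancel_fn(w->cancel_arg);
	return true;
}
```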
3 days | smb: server: remove code guarded by nonexistent config option | Ethan Nelson-Moore | 1 | -3/+0
A small piece of code in fs/smb/server/smb_common.c depends on CONFIG_SMB_INSECURE_SERVER, which has never been defined in the mainline kernel, but was present in old out-of-tree versions of ksmbd. Remove this dead code. Discovered while searching for CONFIG_* symbols referenced in code but not defined in any Kconfig file. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb/server: fix incorrect file size in get_file_compression_info() | ChenXiaoSong | 1 | -1/+1
Before this patch, we got the wrong file size:
  - client: touch /mnt/file
  - client: smbinfo setcompression default /mnt/file
  - client: dd if=/dev/zero of=/mnt/file bs=1 count=1000
  - client: smbinfo filecompressioninfo /mnt/file
      Compressed File Size: 4096
      Compression Format: 2 (LZNT1)

After this patch, we get the correct file size:
  - client: smbinfo filecompressioninfo /mnt/file
      Compressed File Size: 1000
      Compression Format: 2 (LZNT1)

Note that the actual compressed file size must be obtained by other methods. For Btrfs, use the following command to get the actual compressed file size:
  - server: compsize /export/file
      Processed 1 file, 0 regular extents (0 refs), 1 inline.
      Type   Perc  Disk Usage  Uncompressed  Referenced
      TOTAL  4%    47B         1000B         1000B
      zlib   4%    47B         1000B         1000B

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb/server: get compression format in get_file_compression_info() | ChenXiaoSong | 1 | -1/+6
I have added the `filecompressioninfo` subcommand to `smbinfo` in cifs-utils.git.

Example:
  1. client: smbinfo setcompression lznt1 /mnt/file
  2. client: smbinfo filecompressioninfo /mnt/file
       Compressed File Size: 104857600
       Compression Format: 2 (LZNT1)
       Compression Unit Shift: 0
       Chunk Shift: 0
       Cluster Shift: 0

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb/server: implement FSCTL_SET_COMPRESSION ioctl handler | ChenXiaoSong | 4 | -1/+100
Example:
  1. client: smbinfo setcompression no /mnt/file
  2. client: smbinfo getcompression /mnt/file
       Compression: 0 (NONE)
  3. client: smbinfo setcompression lznt1 /mnt/file
  4. client: smbinfo getcompression /mnt/file
       Compression: 2 (LZNT1)
  5. client: smbinfo setcompression default /mnt/file
  6. client: smbinfo getcompression /mnt/file
       Compression: 2 (LZNT1)

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb/server: implement FSCTL_GET_COMPRESSION ioctl handler | ChenXiaoSong | 4 | -1/+53
Example:
  1. server: chattr +c /export/file
  2. client: smbinfo getcompression /mnt/file
       Compression: 2 (LZNT1)

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb/server: get compression file attribute on open | ChenXiaoSong | 3 | -1/+23
Example:
  1. server: chattr +c /export/file
  2. client: lsattr /mnt/file
       --------c------------- /mnt/file

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb: move compression definitions into common/fscc.h | ChenXiaoSong | 4 | -12/+18
These definitions will also be used by ksmbd, so move them into a common header file.

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | smb: remove duplicate server/smbfsctl.h | ChenXiaoSong | 3 | -97/+12
Rename the following places:
  - FSCTL_COPYCHUNK -> FSCTL_SRV_COPYCHUNK
  - FSCTL_COPYCHUNK_WRITE -> FSCTL_SRV_COPYCHUNK_WRITE
  - FSCTL_REQUEST_RESUME_KEY -> FSCTL_SRV_REQUEST_RESUME_KEY

server/smbfsctl.h contains the following additional definitions compared to common/smbfsctl.h:
  - IO_REPARSE_TAG_LX_SYMLINK_LE
  - IO_REPARSE_TAG_AF_UNIX_LE
  - IO_REPARSE_TAG_LX_FIFO_LE
  - IO_REPARSE_TAG_LX_CHR_LE
  - IO_REPARSE_TAG_LX_BLK_LE

Signed-off-by: ChenXiaoSong <chenxiaosong@kylinos.cn>
Acked-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | ksmbd: prevent path traversal bypass by restricting caseless retry | Namjae Jeon | 1 | -1/+1
ksmbd_vfs_path_lookup() enforces LOOKUP_BENEATH to restrict path resolution within the share root. When a crafted path attempts to escape the share boundary using parent-directory components ('..'), vfs_path_parent_lookup() detects this and immediately fails, returning -EXDEV. However, a bug exists in __ksmbd_vfs_kern_path() under caseless mode. The function fails to intercept the -EXDEV error and erroneously falls through to the caseless retry logic, which is intended only for genuinely missing files. During this retry process, the path is reconstructed, leading to an unintended LOOKUP_BENEATH bypass that allows write-capable users to create zero-length files or directories outside the exported share. Fix this by ensuring that the execution only proceeds to the caseless lookup retry when the error is specifically -ENOENT. Any other errors, such as -EXDEV from a path traversal attempt, must be returned immediately. Cc: stable@vger.kernel.org Reported-by: Y s65 <yu4ys@outlook.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
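The control flow this fix describes, retrying only on -ENOENT, can be sketched as follows. This is a simplified userspace model: `lookup_with_retry()`, the `lookup_fn` type and the stub lookups are all hypothetical and stand in for the exact and caseless lookups; they are not ksmbd functions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical lookup signature: returns 0 or a negative errno. */
typedef int (*lookup_fn)(const char *path);

static int caseless_calls; /* counts how often the retry path ran */

/* Stubs standing in for the real lookups. */
static int stub_exact_enoent(const char *path) { (void)path; return -ENOENT; }
static int stub_exact_exdev(const char *path) { (void)path; return -EXDEV; }
static int stub_caseless_ok(const char *path) { (void)path; caseless_calls++; return 0; }

static int lookup_with_retry(const char *path, bool caseless,
			     lookup_fn exact, lookup_fn retry)
{
	int err = exact(path);

	/* Only a genuinely missing name may fall through to the caseless retry. */
	if (err != -ENOENT || !caseless)
		return err;
	return retry(path);
}
```

With this shape, an -EXDEV from a traversal attempt is returned to the caller untouched and the retry path never sees the crafted path.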
3 days | ksmbd: fix UAF of struct file_lock in SMB2_LOCK deferred-lock cancellation | Davide Ornaghi | 1 | -18/+16
When a blocking byte-range lock request is deferred in the FILE_LOCK_DEFERRED path, ksmbd registers the asynchronous work into the connection's async_requests list via setup_async_work(). The cancel callback smb2_remove_blocked_lock() holds a reference to the flock. If the lock waiter is subsequently woken up but the work state is no longer KSMBD_WORK_ACTIVE (e.g., due to a concurrent cancellation), the cleanup path calls locks_free_lock(flock) without dequeuing the work from the async_requests list. Concurrently, smb2_cancel() walks the list under conn->request_lock and invokes the cancel callback, which then dereferences the already freed 'flock'. This leads to a slab-use-after-free inside __wake_up_common. Fix this by restructuring the cleanup logic after the worker returns from ksmbd_vfs_posix_lock_wait(). Move list_del(&smb_lock->llist) and release_async_work(work) to the top of the cleanup block. This guarantees that the async work is completely dequeued and serialized under conn->request_lock before locks_free_lock(flock) is called, rendering the flock unreachable for any concurrent smb2_cancel(). Cc: stable@vger.kernel.org Signed-off-by: Davide Ornaghi <d.ornaghi97@gmail.com> Signed-off-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
3 days | ksmbd: fix use-after-free in same_client_has_lease() | Guangshuo Li | 1 | -0/+6
same_client_has_lease() returns an opinfo pointer from ci->m_op_list after dropping ci->m_lock without taking a reference. smb_grant_oplock() then dereferences that pointer in copy_lease() and when checking breaking_cnt. A concurrent close can remove the old lease from ci->m_op_list and drop the last reference before the caller uses the returned pointer, leading to a use-after-free. Take a reference when same_client_has_lease() selects an existing lease, drop any previous match while scanning, and release the returned reference in smb_grant_oplock() after copying the lease state. Fixes: e2f34481b24d ("cifsd: add server-side procedures for SMB3") Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
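The "take a reference on the match, drop the previous match while scanning, release after use" pattern can be sketched in userspace C11. This is only a model under stated assumptions: `struct lease_model`, `lease_get()`, `lease_put()` and `find_lease_get()` are hypothetical, the list lock (ci->m_lock in the commit) is omitted, and nothing is freed when the count reaches zero.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical lease entry with a reference count. */
struct lease_model {
	atomic_int refcount;
	int client_id;
};

/* Take a reference unless the object is already on its way out (count 0). */
static bool lease_get(struct lease_model *l)
{
	int old = atomic_load(&l->refcount);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&l->refcount, &old, old + 1))
			return true;
	}
	return false;
}

static void lease_put(struct lease_model *l)
{
	atomic_fetch_sub(&l->refcount, 1);
}

/* Return the last matching lease with a reference held, or NULL. */
static struct lease_model *find_lease_get(struct lease_model *list, size_t n,
					  int client_id)
{
	struct lease_model *match = NULL;

	for (size_t i = 0; i < n; i++) {
		if (list[i].client_id != client_id || !lease_get(&list[i]))
			continue;
		if (match)
			lease_put(match); /* drop the previous match */
		match = &list[i];
	}
	return match;
}
```

The caller owns exactly one reference on the returned entry and is expected to call `lease_put()` once it has copied what it needs.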
3 days | ksmbd: fix out-of-bounds read in smb_check_perm_dacl() | Hem Parekh | 1 | -1/+3
The permission-check ACE walk in smb_check_perm_dacl() validates the ACE header size and caps sid.num_subauth at SID_MAX_SUB_AUTHORITIES, but it never checks that ace->size is actually large enough to contain num_subauth sub-authorities before compare_sids() dereferences them. CIFS_SID_BASE_SIZE covers the SID header up to but excluding the sub_auth[] array, and offsetof(struct smb_ace, sid) is the ACE header, so the existing guards only guarantee the 8-byte SID base, i.e. zero sub-authorities. compare_sids() then reads ace->sid.sub_auth[i] for i < min(local_sid->num_subauth, ace->sid.num_subauth). The local comparison SIDs (sid_everyone, sid_unix_NFS_mode, and the id_to_sid() result) always have at least one sub-authority, and an attacker controls the ACE revision and authority bytes (which lie within the in-bounds SID base), so they can match one of those SIDs and force the sub_auth read. A crafted ACE with size == 16 and num_subauth >= 1 placed at the tail of the security descriptor therefore causes a heap out-of-bounds read of up to SID_MAX_SUB_AUTHORITIES * sizeof(__le32) bytes past the pntsd allocation. The security descriptor is loaded by ksmbd_vfs_get_sd_xattr() into a buffer sized exactly to the on-disk data (kzalloc(sd_size) in ndr_decode_v4_ntacl()), so the read lands past the allocation. The malformed descriptor can be stored verbatim via SMB2_SET_INFO (the DACL is not normalised before being written to the security.NTACL xattr) and the read fires on a subsequent SMB2_CREATE access check, making this reachable by an authenticated client on a share that uses ACL xattrs. Add the missing num_subauth-versus-ace_size check, mirroring the identical guards already present in the sibling parsers parse_dacl() and smb_inherit_dacl(). 
Fixes: d07b26f39246 ("ksmbd: require minimum ACE size in smb_check_perm_dacl()") Cc: stable@vger.kernel.org Signed-off-by: Hem Parekh <hemparekh1596@gmail.com> Acked-by: Namjae Jeon <linkinjeon@kernel.org> Signed-off-by: Steve French <stfrench@microsoft.com>
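The missing bound can be written out as arithmetic: an ACE must be at least header (8 bytes) + SID base (8 bytes) + 4 bytes per sub-authority. The sketch below is a userspace illustration with a hypothetical helper `ace_sid_in_bounds()`; the two size constants are assumptions that follow the sizes given in the commit message (a 16-byte ACE holds zero sub-authorities).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SID_MAX_SUB_AUTHORITIES 15
#define ACE_HEADER_SIZE 8u /* type, flags, size, access mask */
#define SID_BASE_SIZE   8u /* revision, num_subauth, 6 authority bytes */

/*
 * Hypothetical check: does an ACE of ace_size bytes really contain
 * num_subauth 32-bit sub-authorities after the ACE header and SID base?
 */
static bool ace_sid_in_bounds(uint16_t ace_size, uint8_t num_subauth)
{
	size_t needed;

	if (num_subauth > SID_MAX_SUB_AUTHORITIES)
		return false;
	needed = ACE_HEADER_SIZE + SID_BASE_SIZE +
		 sizeof(uint32_t) * num_subauth;
	return ace_size >= needed;
}
```

The crafted case from the commit message, size == 16 with num_subauth >= 1, fails this check because it needs at least 20 bytes.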
4 days | ipv4: fib_rule: Move fib4_rules_exit() to ->exit(). | Kuniyuki Iwashima | 2 | -13/+8
syzbot reported use-after-free of net->ipv4.rules_ops. [0]

It can be reproduced with these commands:

  while true; do
    ip netns add ns1
    ip -n ns1 link set dev lo up
    ip -n ns1 address add 192.0.2.1/24 dev lo
    ip -n ns1 link add name dummy1 up type dummy
    ip -n ns1 address add 198.51.100.1/24 dev dummy1
    ip -n ns1 rule add ipproto tcp sport 12345 table 12345
    ip -n ns1 fou add port 5555 ipproto 47 local 192.0.2.1 peer 198.51.100.2 peer_port 54321
    ip netns del ns1
  done

The cited commit moved fib4_rules_exit() earlier to ->exit_rtnl(), but the kernel socket destroyed in ->exit() could eventually reach __fib_lookup().

I left fib4_rules_exit() in ->exit_rtnl() because fib4_rule_delete() calls fib_unmerge(), which requires RTNL.

However, when ->delete() is called, ->configure() has already been called, thus fib_unmerge() in ->delete() has no effect.

Let's remove fib_unmerge() in fib4_rule_delete() and move fib4_rules_exit() to ->exit().

Many thanks to Ido Schimmel for providing the nice repro very quickly.

Note that we can make fib_rules_ops.delete() return void once net-next opens.
[0]: BUG: KASAN: slab-use-after-free in fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321 Read of size 8 at addr ffff88804ec4c680 by task kworker/u8:21/12641 CPU: 0 UID: 0 PID: 12641 Comm: kworker/u8:21 Not tainted syzkaller #0 PREEMPT(full) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 05/09/2026 Workqueue: netns cleanup_net Call Trace: <TASK> dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120 print_address_description+0x55/0x1e0 mm/kasan/report.c:378 print_report+0x58/0x70 mm/kasan/report.c:482 kasan_report+0x117/0x150 mm/kasan/report.c:595 fib_rules_lookup+0x15e/0xeb0 net/core/fib_rules.c:321 __fib_lookup+0x106/0x210 net/ipv4/fib_rules.c:96 ip_route_output_key_hash_rcu+0x294/0x2720 net/ipv4/route.c:2811 ip_route_output_key_hash+0x18d/0x2a0 net/ipv4/route.c:2702 __ip_route_output_key include/net/route.h:169 [inline] ip_route_output_flow+0x2a/0x150 net/ipv4/route.c:2929 ip4_datagram_release_cb+0x89d/0xbe0 net/ipv4/datagram.c:118 release_sock+0x206/0x260 net/core/sock.c:3861 inet_shutdown+0x2b1/0x390 net/ipv4/af_inet.c:950 udp_tunnel_sock_release+0x6d/0x80 net/ipv4/udp_tunnel_core.c:197 fou_release net/ipv4/fou_core.c:562 [inline] fou_exit_net+0x17d/0x1f0 net/ipv4/fou_core.c:1230 ops_exit_list net/core/net_namespace.c:199 [inline] ops_undo_list+0x43d/0x8d0 net/core/net_namespace.c:252 cleanup_net+0x572/0x810 net/core/net_namespace.c:702 process_one_work kernel/workqueue.c:3314 [inline] process_scheduled_works+0xa8e/0x14e0 kernel/workqueue.c:3397 worker_thread+0xa47/0xfb0 kernel/workqueue.c:3478 kthread+0x389/0x470 kernel/kthread.c:436 ret_from_fork+0x514/0xb70 arch/x86/kernel/process.c:158 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:245 </TASK> Fixes: 759923cf03b0 ("ipv4: fib: Convert fib_net_exit_batch() to ->exit_rtnl().") Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/6a315824.b0403584.28d0ff.0000.GAE@google.com/ Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> 
Link: https://patch.msgid.link/20260616191359.4142661-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | net: serialize netif_running() check in enqueue_to_backlog() | Eric Dumazet | 1 | -2/+4
Syzbot reported a KASAN slab-use-after-free in fib_rules_lookup(). The root cause is a race condition where packets can escape the backlog flushing during device unregistration (e.g., during netns exit). Commit e9e4dd3267d0 ("net: do not process device backlog during unregistration") introduced a lockless netif_running() check in enqueue_to_backlog() to prevent queuing packets to an unregistering device. However, this creates a TOCTOU race window. A lockless transmitter (like veth_xmit) can pass the check before dev_close() clears IFF_UP. If the transmitter is then delayed, flush_all_backlogs() can run and finish before the transmitter grabs the backlog lock and queues the packet. The packet then escapes the flush and triggers UAF later when processed. Fix this by moving the netif_running() check inside the backlog lock. This serializes the check with the flush work (which also grabs the lock). We then either queue the packet before the flush runs (so it gets flushed), or check netif_running() after the flush/close completes (so it gets dropped). Fixes: e9e4dd3267d0 ("net: do not process device backlog during unregistration") Reported-by: syzbot+965506b59a2de0b6905c@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/6a315824.b0403584.28d0ff.0000.GAE@google.com/T/#u Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Julian Anastasov <ja@ssi.bg> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260616141317.407791-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
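The ordering argument, check the running flag while holding the same lock the flush takes, can be modelled in userspace C11. This is a deliberately simplified sketch: `struct backlog_model`, `enqueue_packet()`, `close_device()` and `flush_backlog()` are hypothetical, and a spinning `atomic_flag` stands in for the backlog lock. It shows the shape of the fix, not the kernel code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of a per-CPU backlog guarded by a spinlock. */
struct backlog_model {
	atomic_flag lock;
	atomic_bool dev_running;
	int queued;
};

static void model_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		; /* spin until the holder releases the lock */
}

static void model_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

/* The running check happens inside the lock, so it cannot straddle a flush. */
static bool enqueue_packet(struct backlog_model *b)
{
	bool ok;

	model_lock(&b->lock);
	ok = atomic_load(&b->dev_running);
	if (ok)
		b->queued++;
	model_unlock(&b->lock);
	return ok;
}

/* Stand-in for the close step that clears the running state. */
static void close_device(struct backlog_model *b)
{
	atomic_store(&b->dev_running, false);
}

/* Drop everything queued so far; returns the number of flushed packets. */
static int flush_backlog(struct backlog_model *b)
{
	int flushed;

	model_lock(&b->lock);
	flushed = b->queued;
	b->queued = 0;
	model_unlock(&b->lock);
	return flushed;
}
```

Because close happens before flush and both enqueue and flush take the lock, an enqueue either lands before the flush (and is flushed) or runs after it (and sees the device as not running).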
4 days | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 63 | -238/+823
Merge in late fixes in preparation for the net-next PR.

Conflicts:

net/tls/tls_sw.c
  406e8a651a7b ("net: skmsg: preserve sg.copy across SG transforms")
  79511603a65b ("tls: remove dead sockmap (psock) handling from the SW path")

drivers/net/ethernet/microsoft/mana/mana_en.c
  f8fd56977eeea ("net: mana: guard TX wq object destroy with INVALID_MANA_HANDLE check")
  d07efe5a6e641 ("net: mana: Use per-queue allocation for tx_qp to reduce allocation size")
  https://lore.kernel.org/ajAPXu-C_PuTgV-a@sirena.org.uk

No adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | net: skmsg: preserve sg.copy across SG transforms | Yiming Qian | 4 | -4/+44
The sk_msg sg.copy bitmap is part of the scatterlist entry ownership state. A set bit tells sk_msg_compute_data_pointers() not to expose the entry through writable BPF ctx->data. This protects entries backed by pages that are not private to the sk_msg, such as splice-backed file page-cache pages. Several sk_msg transform paths move, copy, split, or compact msg->sg.data[] entries without moving the matching sg.copy bit. This can make an externally backed entry arrive at a new slot with a clear copy bit. A later SK_MSG verdict can then expose sg_virt(sge) as writable ctx->data and BPF stores can modify the original page cache. Keep sg.copy synchronized with sg.data[] whenever entries are transferred, shifted, split, or copied into a new sk_msg. Clear the bit when an entry is replaced by a newly allocated private page or freed. This covers the BPF pull/push/pop helpers, sk_msg_shift_left/right(), sk_msg_xfer(), and tls_split_open_record(), including the partial tail entry created during TLS open-record splitting. Fixes: d3b18ad31f93 ("tls: add bpf support to sk_msg handling") Cc: stable@vger.kernel.org Reported-by: Yiming Qian <yimingqian591@gmail.com> Reported-by: Keenan Dong <keenanat2000@gmail.com> Signed-off-by: Yiming Qian <yimingqian591@gmail.com> Link: https://patch.msgid.link/20260610062137.49075-1-yimingqian591@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
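The invariant, a slot's copy bit travels with the slot, can be illustrated with a flat-array model. This is a hypothetical sketch: the real sk_msg uses a ring of scatterlist entries, whereas `struct msg_model` here is a plain array with a 32-bit bitmap, and `move_slot()` / `remove_slot()` are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NR_SLOTS 8

/* Hypothetical flat model: data[i] is an entry, bit i of copy marks it external. */
struct msg_model {
	int data[NR_SLOTS];
	uint32_t copy;
	int len;
};

/* Move one entry together with its copy bit. */
static void move_slot(struct msg_model *m, int dst, int src)
{
	m->data[dst] = m->data[src];
	if (m->copy & (1u << src))
		m->copy |= 1u << dst;
	else
		m->copy &= ~(1u << dst);
}

/* Remove entry idx by shifting the tail left, keeping the bitmap in sync. */
static void remove_slot(struct msg_model *m, int idx)
{
	assert(idx >= 0 && idx < m->len);
	for (int i = idx; i + 1 < m->len; i++)
		move_slot(m, i, i + 1);
	m->len--;
	m->data[m->len] = 0;
	m->copy &= ~(1u << m->len); /* the vacated slot owns nothing */
}
```

If `move_slot()` copied only `data[]`, an externally backed entry would arrive at its new index with a clear bit, which is the situation the commit message warns about.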
4 days | Merge branch 'appletalk-move-the-protocol-out-of-tree' | Jakub Kicinski | 39 | -3665/+3
Jakub Kicinski says:

====================
appletalk: move the protocol out of tree

This tiny series moves appletalk out of tree, to:

  https://github.com/linux-netdev/mod-orphan

Core maintainers are unable to keep up with the rate of security bug reports and fixes. Nobody seems to care about appletalk enough to review the patches. As Eric pointed out, Mac OS dropped AppleTalk over a decade ago.
====================

Link: https://patch.msgid.link/20260615222935.947233-1-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | appletalk: move the protocol out of tree | Jakub Kicinski | 37 | -3694/+3
AppleTalk was removed in Mac OS X 10.6 (Snow Leopard) in 2009, according to Wikipedia. We recently got a burst of AI generated fixes to this protocol which nobody is reviewing. Let AppleTalk follow AX.25 and hamradio out of the Linux tree.

We will maintain the code at:

  github.com/linux-netdev/mod-orphan

for anyone interested in playing with it.

Retain the uAPI for now. No strong reason, simply because I suspect keeping it will be less controversial.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20260615222935.947233-3-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | appletalk: stop storing per-interface state in struct net_device | Jakub Kicinski | 4 | -16/+45
AppleTalk keeps its per-interface control block (struct atalk_iface) directly in struct net_device (dev->atalk_ptr). This is the only thing tying the protocol into the core net_device layout and is the sole blocker to moving AppleTalk out of tree.

Replace dev->atalk_ptr with a small ifindex-keyed hashtable internal to ddp.c. The existing atalk_interfaces list stays the owner of the iface objects; the hashtable is purely a fast dev->iface index and reuses the same atalk_interfaces_lock.

AFAICT this patch does not make this code any more racy than it already is. I'm sure Sashiko will point out some basically existing bugs. AFAICT atalk_interfaces_lock is the innermost lock already.

Acked-by: Stephen Hemminger <stephen@networkplumber.org>
Link: https://patch.msgid.link/20260615222935.947233-2-kuba@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
4 days | Merge branch 'for-7.2/bpf' into for-linus | Jiri Kosina | 2 | -7/+87
4 days | Merge branch 'for-7.2/cleanup_driver_data' into for-linus | Jiri Kosina | 4 | -18/+36
- semantic cleanup fixes for 'hid_device_id::driver_data' (Pawel Zalewski)
4 days | Merge branch 'for-7.2/core' into for-linus | Jiri Kosina | 5 | -17/+33
4 days | Merge branch 'for-7.2/cp2112' into for-linus | Jiri Kosina | 1 | -0/+41
- fwnode support for cp2112 (Danny Kaehn)
- fix for cp2112 firmware-based speed configuration, if available (Danny Kaehn)
4 days | Merge branch 'for-7.2/i2c-hid' into for-linus | Jiri Kosina | 1 | -2/+2
4 days | Merge branch 'for-7.2/logitech' into for-linus | Jiri Kosina | 1 | -1/+37
- fix for high resolution scrolling for Logitech HID++ 2.0 devices (Lauri Saurus)
4 days | Merge branch 'for-7.2/multitouch' into for-linus | Jiri Kosina | 1 | -1/+146
- UX improvement fixes for Yoga Book 9 (Dave Carey)
4 days | Merge branch 'for-7.2/nintendo' into for-linus | Jiri Kosina | 2 | -12/+71
- support for HORI Wireless Switch Pad (Hector Zelaya)
4 days | Merge branch 'for-7.2/oxp' into for-linus | Jiri Kosina | 5 | -0/+1606
- support for OneXPlayer (Derek J. Clark)
4 daysMerge branch 'for-7.2/playstation' into for-linusJiri Kosina1-0/+31
4 daysMerge branch 'for-7.2/rakk' into for-linusJiri Kosina4-0/+90
- support for Rakk Dasig X (Karl Cayme)
4 daysMerge branch 'for-7.2/sony' into for-linusJiri Kosina1-13/+4
4 daysMerge branch 'for-7.2/wacom' into for-linusJiri Kosina1-13/+11
- memory corruption and scheduling while atomic and error fixes (Jinmo Yang)
- error handling fix (Myeonghun Pak)
4 daysMerge branch 'for-7.2/wiimote' into for-linusJiri Kosina5250-200751/+176120
4 daysRDMA/irdma: Replace waitqueue and flag with completionJacob Moroni3-13/+9
The driver previously used a waitqueue along with an explicit request_done flag, but without proper barriers around request_done. An earlier patch by Gui-Dong Han <hanguidong02@gmail.com> attempted to fix this by adding the missing memory barriers. Rather than adding the barriers, this patch replaces the waitqueue+flag with a completion, which is designed for this exact purpose. Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions") Link: https://patch.msgid.link/r/20260616155601.1081448-1-jmoroni@google.com Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
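To illustrate why a completion removes the need for hand-placed barriers around a done flag, here is a user-space analogue built on POSIX threads. It is only a sketch: the sketch_* names are hypothetical, and the kernel's struct completion is implemented differently. The point it shows is that the counter is only ever read or written under the lock, so the waiter cannot miss or misorder the update.

```c
#include <assert.h>
#include <pthread.h>

/* User-space stand-in for a completion (hypothetical names). */
struct sketch_completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned int done; /* number of completions not yet consumed */
};

static void sketch_init_completion(struct sketch_completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Signal one waiter; the counter changes only while the lock is held. */
static void sketch_complete(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done++;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Block until a completion is available, then consume it. */
static void sketch_wait_for_completion(struct sketch_completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (c->done == 0)
		pthread_cond_wait(&c->cond, &c->lock);
	c->done--;
	pthread_mutex_unlock(&c->lock);
}

/* Thread body standing in for the code path that finishes a request. */
static void *sketch_request_handler(void *arg)
{
	sketch_complete(arg);
	return NULL;
}
```

A waiter that arrives after the signal still returns immediately, because the count persists until it is consumed.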
4 daysRDMA/hns: Fix memory leak of bonding resourcesJunxian Huang1-1/+1
In a corner case of concurrent driver removal and driver reset, bonding resource is first released in hns_roce_hw_v2_exit() during driver removal, and then is allocated again in hns_roce_register_device() during driver reset. This leads to memory leak because the release timing has already passed.

This may also lead to a kernel panic as below because of the leaked notifier callback:

  Call trace:
   0xffffa20fccc04978 (P)
   raw_notifier_call_chain+0x20/0x38
   call_netdevice_notifiers_info+0x60/0xb8
   netdev_lower_state_changed+0x4c/0xb8

As Sashiko suggested, the teardown order of bonding resources should be inverted to make sure the resources are released when the driver is removed.

Fixes: b37ad2e290fc ("RDMA/hns: Initialize bonding resources")
Link: https://patch.msgid.link/r/20260613102045.811623-1-huangjunxian6@hisilicon.com
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 daysRDMA/rtrs-srv: Bound RDMA-Write length to chunk size in rdma_write_sgZhenhao Wan1-2/+3
When the server answers an RTRS READ, rdma_write_sg() builds the source scatter/gather entry for the IB_WR_RDMA_WRITE that returns data to the peer. Its length is taken directly from the wire descriptor:

  plist->length = le32_to_cpu(id->rd_msg->desc[0].len);

rd_msg points into the chunk buffer that the remote peer filled via RDMA-WRITE-WITH-IMM (rtrs_srv_rdma_done() -> process_io_req() -> process_read()), so desc[0].len is attacker-controlled and, before this change, was only rejected when zero.

The source address is the fixed chunk start (dma_addr[msg_id]) and the source lkey is the PD-wide local_dma_lkey, which is not tied to the chunk's MR mapping, so the verbs layer does not constrain the transfer length to max_chunk_size. msg_id and off are bounded against queue_depth and max_chunk_size in rtrs_srv_rdma_done(), but desc[0].len is a separate field that was not checked against the chunk size.

A peer that advertises desc[0].len larger than max_chunk_size can make the posted RDMA write read past the chunk's mapped region. The resulting behaviour depends on the IOMMU configuration: with no IOMMU or in passthrough mode the read may extend into memory adjacent to the chunk and be returned to the peer, which can disclose host memory; with a translating IOMMU the out-of-range access is expected to fault and abort the connection. In either case the transfer exceeds what the protocol permits and is driven by a remote peer.

Reject a descriptor length above max_chunk_size, mirroring the existing off >= max_chunk_size bound in rtrs_srv_rdma_done(). Legitimate clients do not exceed it: the client sets desc[0].len to its MR length, which is capped at the negotiated max_io_size (max_chunk_size - MAX_HDR_SIZE).
Fixes: 9cb837480424 ("RDMA/rtrs: server: main functionality") Link: https://patch.msgid.link/r/20260612-master-v1-1-70cde5c6fdc9@gmail.com Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Zhenhao Wan <whi4ed0g@gmail.com> Reviewed-by: Md Haris Iqbal <haris.iqbal@ionos.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
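The check described in the RDMA/rtrs-srv message can be sketched in plain user-space C as follows. This is an illustration under assumptions, not the patch itself: the helper names are hypothetical, and -EINVAL is an assumed error code that the message does not specify.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* Decode a little-endian 32-bit wire field (user-space stand-in for le32_to_cpu()). */
static uint32_t sketch_le32_decode(const uint8_t b[4])
{
	return (uint32_t)b[0] |
	       ((uint32_t)b[1] << 8) |
	       ((uint32_t)b[2] << 16) |
	       ((uint32_t)b[3] << 24);
}

/*
 * Validate a peer-supplied descriptor length against the chunk size.
 * Zero was already rejected before the change; the upper bound is the new part.
 * The -EINVAL return value is an assumption made for this sketch.
 */
static int sketch_check_desc_len(uint32_t desc_len, uint32_t max_chunk_size)
{
	if (desc_len == 0 || desc_len > max_chunk_size)
		return -EINVAL;
	return 0;
}
```

A length equal to max_chunk_size is accepted here because the message rejects only lengths above it; legitimate clients stay below that anyway, since their length is capped at max_chunk_size - MAX_HDR_SIZE.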
4 daysdocs: infiniband: correct name of option to enable the ib_uverbs moduleEthan Nelson-Moore2-2/+2
The Infiniband documentation states that CONFIG_INFINIBAND_USER_VERBS should be used to enable the ib_uverbs module. However, this option was renamed to CONFIG_INFINIBAND_USER_ACCESS in commit 17781cd6186c ("[PATCH] IB: clean up user access config options"). Update the documentation to reflect this. Link: https://patch.msgid.link/r/20260616002027.67925-1-enelsonmoore@gmail.com Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Reviewed-by: Dongliang Mu <dzm91@hust.edu.cn> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
4 daysRDMA/bnxt_re: Reject GET_TOGGLE_MEM when toggle page was not allocatedSelvin Xavier1-0/+4
If a user calls BNXT_RE_METHOD_GET_TOGGLE_MEM on a device that does not support the CQ/SRQ toggle feature, uctx_cq_page or uctx_srq_page will be NULL. Add an explicit -EOPNOTSUPP return after capturing the address from uctx_cq_page / uctx_srq_page if the address is zero. Fixes: e275919d9669 ("RDMA/bnxt_re: Share a page to expose per CQ info with userspace") Fixes: 181028a0d84c ("RDMA/bnxt_re: Share a page to expose per SRQ info with userspace") Link: https://patch.msgid.link/r/20260615224751.232802-16-selvin.xavier@broadcom.com Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
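A minimal user-space sketch of the guard described above, assuming hypothetical types and names (struct sketch_uctx, sketch_get_toggle_mem()); only the "capture the address, then return -EOPNOTSUPP if it is zero" ordering comes from the commit message.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

enum sketch_res_type { SKETCH_RES_CQ, SKETCH_RES_SRQ };

/* Hypothetical user context: a page pointer is NULL when the toggle feature is unsupported. */
struct sketch_uctx {
	void *cq_page;
	void *srq_page;
};

/* Capture the page address first, then refuse the request if it is zero. */
static int sketch_get_toggle_mem(const struct sketch_uctx *uctx,
				 enum sketch_res_type type, uintptr_t *addr)
{
	uintptr_t a;

	switch (type) {
	case SKETCH_RES_CQ:
		a = (uintptr_t)uctx->cq_page;
		break;
	case SKETCH_RES_SRQ:
		a = (uintptr_t)uctx->srq_page;
		break;
	default:
		return -EINVAL; /* assumed handling for unknown types */
	}

	if (!a)
		return -EOPNOTSUPP;

	*addr = a;
	return 0;
}
```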
4 daysRDMA/bnxt_re: Fail DBR related page allocation UAPIs if the feature is disabledSelvin Xavier1-0/+4
No need to support the DBR related page allocations if the pacing feature is disabled. Fail the request if pacing is disabled. Fixes: ea2224857882 ("RDMA/bnxt_re: Update alloc_page uapi for pacing") Link: https://patch.msgid.link/r/20260615224751.232802-15-selvin.xavier@broadcom.com Reviewed-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>