aboutsummaryrefslogtreecommitdiff
path: root/fs/nfs
AgeCommit message (Collapse)AuthorFilesLines
9 daysNFS: Fix size read races in truncate, fallocate and copy offloadTrond Myklebust3-14/+27
If the pre-operation file size is read before locking the inode and quiescing O_DIRECT writes, then nfs_truncate_last_folio() might end up overwriting valid file data. Fixes: b1817b18ff20 ("NFS: Protect against 'eof page pollution'") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
12 daysNFS: Don't immediately return directory delegations when disabledAnna Schumaker1-1/+1
The function nfs_inode_evict_delegation() immediately and synchronously returns a delegation when called. This means we can't call it from nfs4_have_delegation(), since that function could be called under a lock. Instead we should mark the delegation for return and let the state manager handle it for us. Fixes: b6d2a520f463 ("NFS: Add a module option to disable directory delegations") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-07NFS/localio: Deal with page bases that are > PAGE_SIZETrond Myklebust1-0/+2
When resending requests, etc, the page base can quickly grow larger than the page size. Fixes: 091bdcfcece0 ("nfs/localio: refactor iocb and iov_iter_bvec initialization") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org>
2026-01-07NFS/localio: Stop further I/O upon hitting an errorTrond Myklebust1-16/+14
If the call into the filesystem results in an I/O error, then the next chunk of data won't be contiguous with the end of the last successful chunk. So break out of the I/O loop and report the results. Currently the localio code will do this for a short read/write, but not for an error. Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-of-order") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Mike Snitzer <snitzer@kernel.org>
2026-01-07NFSv4.x: Directory delegations don't require any state recoveryTrond Myklebust2-1/+10
The state recovery code in nfs_end_delegation_return() is intended to allow regular files to recover cached open and lock state. It has no function for directory delegations, and may cause corruption. Fixes: 156b09482933 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04NFSv4: Don't free slots prematurely if requesting a directory delegationTrond Myklebust1-8/+39
When requesting a directory delegation, it is imperative to hold the slot until the delegation state has been recorded. Otherwise, if a recall comes in, the call to referring_call_exists() will assume the processing is done, and when it doesn't find a delegation, it will assume it has been returned. Fixes: 156b09482933 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04NFSv4: Fix nfs_clear_verifier_delegated() for delegated directoriesTrond Myklebust1-8/+49
If the client returns a directory delegation, then look up all the child dentries, and clear their 'verifier delegated' bit, unless subject to a file delegation. Similarly, if a file delegation is being returned, check if there is a directory delegation before clearing a 'verifier delegated' bit. Reported-by: Christoph Hellwig <hch@lst.de> Fixes: 156b09482933 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04NFS: Fix directory delegation verifier checksAnna Schumaker1-19/+2
Doing this check in nfs_check_verifier() resulted in many, many more lookups on the wire when running Christoph's delegation benchmarking script. After some experimentation, I found that we can treat directory delegations exactly the same as having a delegated verifier when we reach nfs4_lookup_revalidate() for the best performance. Reported-by: Christoph Hellwig <hch@lst.de> Fixes: 156b09482933 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04pnfs/blocklayout: Fix memory leak in bl_parse_scsi()Zilin Guan1-2/+4
In bl_parse_scsi(), if the block device length is zero, the function returns immediately without releasing the file reference obtained via bl_open_path(), leading to a memory leak. Fix this by jumping to the out_blkdev_put label to ensure the file reference is properly released. Fixes: d76c769c8db4c ("pnfs/blocklayout: Don't add zero-length pnfs_block_dev") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04pnfs/flexfiles: Fix memory leak in nfs4_ff_alloc_deviceid_node()Zilin Guan1-1/+1
In nfs4_ff_alloc_deviceid_node(), if the allocation for ds_versions fails, the function jumps to the out_scratch label without freeing the already allocated dsaddrs list, leading to a memory leak. Fix this by jumping to the out_err_drain_dsaddrs label, which properly frees the dsaddrs list before cleaning up other resources. Fixes: d67ae825a59d6 ("pnfs/flexfiles: Add the FlexFile Layout Driver") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04NFS: Fix a deadlock involving nfs_release_folio()Trond Myklebust3-1/+38
Wang Zhaolong reports a deadlock involving NFSv4.1 state recovery waiting on kthreadd, which is attempting to reclaim memory by calling nfs_release_folio(). The latter cannot make progress due to state recovery being needed. It seems that the only safe thing to do here is to kick off a writeback of the folio, without waiting for completion, or else kicking off an asynchronous commit. Reported-by: Wang Zhaolong <wangzhaolong@huaweicloud.com> Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-04pNFS: Fix a deadlock when returning a delegation during open()Trond Myklebust3-30/+51
Ben Coddington reports seeing a hang in the following stack trace: 0 [ffffd0b50e1774e0] __schedule at ffffffff9ca05415 1 [ffffd0b50e177548] schedule at ffffffff9ca05717 2 [ffffd0b50e177558] bit_wait at ffffffff9ca061e1 3 [ffffd0b50e177568] __wait_on_bit at ffffffff9ca05cfb 4 [ffffd0b50e1775c8] out_of_line_wait_on_bit at ffffffff9ca05ea5 5 [ffffd0b50e177618] pnfs_roc at ffffffffc154207b [nfsv4] 6 [ffffd0b50e1776b8] _nfs4_proc_delegreturn at ffffffffc1506586 [nfsv4] 7 [ffffd0b50e177788] nfs4_proc_delegreturn at ffffffffc1507480 [nfsv4] 8 [ffffd0b50e1777f8] nfs_do_return_delegation at ffffffffc1523e41 [nfsv4] 9 [ffffd0b50e177838] nfs_inode_set_delegation at ffffffffc1524a75 [nfsv4] 10 [ffffd0b50e177888] nfs4_process_delegation at ffffffffc14f41dd [nfsv4] 11 [ffffd0b50e1778a0] _nfs4_opendata_to_nfs4_state at ffffffffc1503edf [nfsv4] 12 [ffffd0b50e1778c0] _nfs4_open_and_get_state at ffffffffc1504e56 [nfsv4] 13 [ffffd0b50e177978] _nfs4_do_open at ffffffffc15051b8 [nfsv4] 14 [ffffd0b50e1779f8] nfs4_do_open at ffffffffc150559c [nfsv4] 15 [ffffd0b50e177a80] nfs4_atomic_open at ffffffffc15057fb [nfsv4] 16 [ffffd0b50e177ad0] nfs4_file_open at ffffffffc15219be [nfsv4] 17 [ffffd0b50e177b78] do_dentry_open at ffffffff9c09e6ea 18 [ffffd0b50e177ba8] vfs_open at ffffffff9c0a082e 19 [ffffd0b50e177bd0] dentry_open at ffffffff9c0a0935 The issue is that the delegreturn is being asked to wait for a layout return that cannot complete because a state recovery was initiated. The state recovery cannot complete until the open() finishes processing the delegations it was given. The solution is to propagate the existing flags that indicate a non-blocking call to the function pnfs_roc(), so that it knows not to wait in this situation. Reported-by: Benjamin Coddington <bcodding@hammerspace.com> Fixes: 29ade5db1293 ("pNFS: Wait on outstanding layoutreturns to complete in pnfs_roc()") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-12-12Merge tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds19-81/+328
Pull NFS client updates from Trond Myklebust: "Bugfixes: - Fix 'nlink' attribute update races when unlinking a file - Add missing initialisers for the directory verifier in various places - Don't regress the NFSv4 open state due to misordered racing replies - Ensure the NFSv4.x callback server uses the correct transport connection - Fix potential use-after-free races when shutting down the NFSv4.x callback server - Fix a pNFS layout commit crash - Assorted fixes to ensure correct propagation of mount options when the client crosses a filesystem boundary and triggers the VFS automount code - More localio fixes Features and cleanups: - Add initial support for basic directory delegations - SunRPC back channel code cleanups" * tag 'nfs-for-6.19-1' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: (24 commits) NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegations nfs/localio: remove 61 byte hole from needless ____cacheline_aligned nfs/localio: remove alignment size checking in nfs_is_local_dio_possible NFS: Fix up the automount fs_context to use the correct cred NFS: Fix inheritance of the block sizes when automounting NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flags Revert "nfs: ignore SB_RDONLY when mounting nfs" Revert "nfs: clear SB_RDONLY before getting superblock" Revert "nfs: ignore SB_RDONLY when remounting nfs" NFS: Add a module option to disable directory delegations NFS: Shortcut lookup revalidations if we have a directory delegation NFS: Request a directory delegation during RENAME NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK NFS: Add support for sending GDD_GETATTR NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalid NFSv4.1: protect destroying and nullifying bc_serv structure SUNRPC: new helper function for stopping backchannel server SUNRPC: cleanup common code in backchannel request NFSv4.1: pass transport for callback shutdown NFSv4: ensure the open stateid seqid doesn't go backwards ...
2025-12-05nfs/localio: fix regression due to out-of-order __put_credMike Snitzer1-31/+17
Commit f2060bdc21d7 ("nfs/localio: add refcounting for each iocb IO associated with NFS pgio header") inadvertantly reintroduced the same potential for __put_cred() triggering BUG_ON(cred == current->cred) that commit 992203a1fba5 ("nfs/localio: restore creds before releasing pageio data") fixed. Fix this by saving and restoring the cred around each {read,write}_iter call within the respective for loop of nfs_local_call_{read,write} using scoped_with_creds(). NOTE: this fix started by first reverting the following commits: 94afb627dfc2 ("nfs: use credential guards in nfs_local_call_read()") bff3c841f7bd ("nfs: use credential guards in nfs_local_call_write()") 1d18101a644e ("Merge tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs") followed by narrowly fixing the cred lifetime issue by using scoped_with_creds(). In doing so, this commit's changes appear more extensive than they really are (as evidenced by comparing to v6.18's fs/nfs/localio.c). Reported-by: Zorro Lang <zlang@redhat.com> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Acked-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/linux-next/20251205111942.4150b06f@canb.auug.org.au/ Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-05NFSv4: Handle NFS4ERR_NOTSUPP errors for directory delegationsTrond Myklebust1-8/+17
The error NFS4ERR_NOTSUPP will be returned for operations that are legal, but not supported by the server. Fixes: 156b09482933 ("NFS: Request a directory delegation on ACCESS, CREATE, and UNLINK") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-12-05nfs/localio: remove 61 byte hole from needless ____cacheline_alignedMike Snitzer1-1/+1
struct nfs_local_kiocb used ____cacheline_aligned on its iters[] array and as the structure evolved it caused a 61 byte hole to form. Fix this by removing ____cacheline_aligned and reordering iters[] before iter_is_dio_aligned[]. Fixes: 6a218b9c3183 ("nfs/localio: do not issue misaligned DIO out-of-order") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-12-05nfs/localio: remove alignment size checking in nfs_is_local_dio_possibleMike Snitzer1-2/+0
This check to ensure dio_offset_align isn't larger than PAGE_SIZE is no longer relevant (older iterations of NFS Direct was allocating misaligned head and tail pages but no longer does, so this check isn't needed). Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-12-01Merge tag 'vfs-6.19-rc1.directory.delegations' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull directory delegations update from Christian Brauner: "This contains the work for recall-only directory delegations for knfsd. Add support for simple, recallable-only directory delegations. This was decided at the fall NFS Bakeathon where the NFS client and server maintainers discussed how to merge directory delegation support. The approach starts with recallable-only delegations for several reasons: 1. RFC8881 has gaps that are being addressed in RFC8881bis. In particular, it requires directory position information for CB_NOTIFY callbacks, which is difficult to implement properly under Linux. The spec is being extended to allow that information to be omitted. 2. Client-side support for CB_NOTIFY still lags. The client side involves heuristics about when to request a delegation. 3. Early indication shows simple, recallable-only delegations can help performance. Anna Schumaker mentioned seeing a multi-minute speedup in xfstests runs with them enabled. With these changes, userspace can also request a read lease on a directory that will be recalled on conflicting accesses. This may be useful for applications like Samba. Users can disable leases altogether via the fs.leases-enable sysctl if needed. VFS changes: - Dedicated Type for Delegations Introduce struct delegated_inode to track inodes that may have delegations that need to be broken. This replaces the previous approach of passing raw inode pointers through the delegation breaking code paths, providing better type safety and clearer semantics for the delegation machinery. - Break parent directory delegations in open(..., O_CREAT) codepath - Allow mkdir to wait for delegation break on parent - Allow rmdir to wait for delegation break on parent - Add try_break_deleg calls for parents to vfs_link(), vfs_rename(), and vfs_unlink() - Make vfs_create(), vfs_mknod(), and vfs_symlink() break delegations on parent directory - Clean up argument list for vfs_create() - Expose delegation support to userland Filelock changes: - Make lease_alloc() take a flags argument - Rework the __break_lease API to use flags - Add struct delegated_inode - Push the S_ISREG check down to ->setlease handlers - Lift the ban on directory leases in generic_setlease NFSD changes: - Allow filecache to hold S_IFDIR files - Allow DELEGRETURN on directories - Wire up GET_DIR_DELEGATION handling Fixes: - Fix kernel-doc warnings in __fcntl_getlease - Add needed headers for new struct delegation definition" * tag 'vfs-6.19-rc1.directory.delegations' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: add needed headers for new struct delegation definition filelock: __fcntl_getlease: fix kernel-doc warnings vfs: expose delegation support to userland nfsd: wire up GET_DIR_DELEGATION handling nfsd: allow DELEGRETURN on directories nfsd: allow filecache to hold S_IFDIR files filelock: lift the ban on directory leases in generic_setlease vfs: make vfs_symlink break delegations on parent dir vfs: make vfs_mknod break delegations on parent directory vfs: make vfs_create break delegations on parent directory vfs: clean up argument list for vfs_create() vfs: break parent dir delegations in open(..., O_CREAT) codepath vfs: allow rmdir to wait for delegation break on parent vfs: allow mkdir to wait for delegation break on parent vfs: add try_break_deleg calls for parents to vfs_{link,rename,unlink} filelock: push the S_ISREG check down to ->setlease handlers filelock: add struct delegated_inode filelock: rework the __break_lease API to use flags filelock: make lease_alloc() take a flags argument
2025-12-01Merge tag 'kernel-6.19-rc1.cred' of ↵Linus Torvalds2-23/+30
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull cred guard updates from Christian Brauner: "This contains substantial credential infrastructure improvements adding guard-based credential management that simplifies code and eliminates manual reference counting in many subsystems. Features: - Kernel Credential Guards Add with_kernel_creds() and scoped_with_kernel_creds() guards that allow using the kernel credentials without allocating and copying them. This was requested by Linus after seeing repeated prepare_kernel_creds() calls that duplicate the kernel credentials only to drop them again later. The new guards completely avoid the allocation and never expose the temporary variable to hold the kernel credentials anywhere in callers. - Generic Credential Guards Add scoped_with_creds() guards for the common override_creds() and revert_creds() pattern. This builds on earlier work that made override_creds()/revert_creds() completely reference count free. - Prepare Credential Guards Add prepare credential guards for the more complex pattern of preparing a new set of credentials and overriding the current credentials with them: - prepare_creds() - modify new creds - override_creds() - revert_creds() - put_cred() Cleanups: - Make init_cred static since it should not be directly accessed - Add kernel_cred() helper to properly access the kernel credentials - Fix scoped_class() macro that was introduced two cycles ago - coredump: split out do_coredump() from vfs_coredump() for cleaner credential handling - coredump: move revert_cred() before coredump_cleanup() - coredump: mark struct mm_struct as const - coredump: pass struct linux_binfmt as const - sev-dev: use guard for path" * tag 'kernel-6.19-rc1.cred' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (36 commits) trace: use override credential guard trace: use prepare credential guard coredump: use override credential guard coredump: use prepare credential guard coredump: split out do_coredump() from vfs_coredump() coredump: mark struct mm_struct as const coredump: pass struct linux_binfmt as const coredump: move revert_cred() before coredump_cleanup() sev-dev: use override credential guards sev-dev: use prepare credential guard sev-dev: use guard for path cred: add prepare credential guard net/dns_resolver: use credential guards in dns_query() cgroup: use credential guards in cgroup_attach_permissions() act: use credential guards in acct_write_process() smb: use credential guards in cifs_get_spnego_key() nfs: use credential guards in nfs_idmap_get_key() nfs: use credential guards in nfs_local_call_write() nfs: use credential guards in nfs_local_call_read() erofs: use credential guards ...
2025-12-01Merge tag 'vfs-6.19-rc1.inode' of ↵Linus Torvalds2-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-11-29NFS: Fix up the automount fs_context to use the correct credTrond Myklebust1-0/+5
When automounting, the fs_context should be fixed up to use the cred from the parent filesystem, since the operation is just extending the namespace. Authorisation to enter that namespace will already have been provided by the preceding lookup. Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-29NFS: Fix inheritance of the block sizes when automountingTrond Myklebust5-17/+38
Only inherit the block sizes that were actually specified as mount parameters for the parent mount. Fixes: 62a55d088cd8 ("NFS: Additional refactoring for fs_context conversion") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-29NFS: Automounted filesystems should inherit ro,noexec,nodev,sync flagsTrond Myklebust2-4/+6
When a filesystem is being automounted, it needs to preserve the user-set superblock mount options, such as the "ro" flag. Reported-by: Li Lingfeng <lilingfeng3@huawei.com> Link: https://lore.kernel.org/all/20240604112636.236517-3-lilingfeng@huaweicloud.com/ Fixes: f2aedb713c28 ("NFS: Add fs_context support.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-29Revert "nfs: ignore SB_RDONLY when mounting nfs"Trond Myklebust1-1/+1
This reverts commit 52cb7f8f177878b4f22397b9c4d2c8f743766be3. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos<alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 52cb7f8f1778 ("nfs: ignore SB_RDONLY when mounting nfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-29Revert "nfs: clear SB_RDONLY before getting superblock"Trond Myklebust1-9/+0
This reverts commit 8cd9b785943c57a136536250da80ba1eb6f8eb18. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos<alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 8cd9b785943c ("nfs: clear SB_RDONLY before getting superblock") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-29Revert "nfs: ignore SB_RDONLY when remounting nfs"Trond Myklebust1-10/+0
This reverts commit 80c4de6ab44c14e910117a02f2f8241ffc6ec54a. Silently ignoring the "ro" and "rw" mount options causes user confusion, and regressions. Reported-by: Alkis Georgopoulos<alkisg@gmail.com> Cc: Li Lingfeng <lilingfeng3@huawei.com> Fixes: 80c4de6ab44c ("nfs: ignore SB_RDONLY when remounting nfs") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Add a module option to disable directory delegationsAnna Schumaker3-0/+11
When this option is disabled then the client will not request directory delegations or check if we have one during the revalidation paths. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Shortcut lookup revalidations if we have a directory delegationAnna Schumaker3-0/+27
Holding a directory delegation means we know that nobody else has modified the directory on the server, so we can take a few revalidation shortcuts. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Request a directory delegation during RENAMEAnna Schumaker4-4/+10
If we notice that we're renaming a file within a directory then we take that as a sign that the user is working with the current directory and may want a delegation to avoid extra revalidations when possible. The nfs_request_directory_delegation() function exists within the NFS v4 module, so I add an extra flag to rename_setup() to indicate if a dentry is being renamed within the same parent directory. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Request a directory delegation on ACCESS, CREATE, and UNLINKAnna Schumaker3-4/+58
This patch adds a new flag: NFS_INO_REQ_DIR_DELEG to signal that a directory wants to request a directory delegation the next time it does a GETATTR. I have the client request a directory delegation when doing an access, create, or unlink call since these calls indicate that a user is working with a directory. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFS: Add support for sending GDD_GETATTRAnna Schumaker1-0/+106
I add this to the existing GETATTR compound as an option extra step that we can send if the "dir_deleg" flag is set to 'true'. Actually enabling this value will happen in a later patch. Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFSv4/pNFS: Clear NFS_INO_LAYOUTCOMMIT in pnfs_mark_layout_stateid_invalidJonathan Curley1-0/+1
Fixes a crash when layout is null during this call stack: write_inode -> nfs4_write_inode -> pnfs_layoutcommit_inode pnfs_set_layoutcommit relies on the lseg refcount to keep the layout around. Need to clear NFS_INO_LAYOUTCOMMIT otherwise we might attempt to reference a null layout. Fixes: fe1cf9469d7bc ("pNFS: Clear all layout segment state in pnfs_mark_layout_stateid_invalid") Signed-off-by: Jonathan Curley <jcurley@purestorage.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFSv4.1: protect destroying and nullifying bc_serv structureOlga Kornievskaia1-1/+1
When we are shutting down the client, we free the callback server structure and then at a later pointer we free the transport used by the client. Yet, it's possible that after the callback server is freed, the transport receives a backchannel request at which point we can dereferene freed memory. Instead, do the freeing the bc server and nullying bc_serv under the lock. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFSv4.1: pass transport for callback shutdownOlga Kornievskaia3-4/+10
When we are setting up the 4.1 callback server, we pass in the appropriate rpc_xprt transport pointer with which to associate the callback server structure. Similarly, pass in the rpc_xprt pointer for when we are shutting down the callback. This will be used to make sure that we free the server structure and then clear the rpc_xprt's bc_server pointer in a safe manner. Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-23NFSv4: ensure the open stateid seqid doesn't go backwardsScott Mayhew2-2/+12
We have observed an NFSv4 client receiving a LOCK reply with a status of NFS4ERR_OLD_STATEID and subsequently retrying the LOCK request with an earlier seqid value in the stateid. As this was for a new lockowner, that would imply that nfs_set_open_stateid_locked() had updated the open stateid seqid with an earlier value. Looking at nfs_set_open_stateid_locked(), if the incoming seqid is out of sequence, the task will sleep on the state->waitq for up to 5 seconds. If the task waits for the full 5 seconds, then after finishing the wait it'll update the open stateid seqid with whatever value the incoming seqid has. If there are multiple waiters in this scenario, then the last one to perform said update may not be the one with the highest seqid. Add a check to ensure that the seqid can only be incremented, and add a tracepoint to indicate when old seqids are skipped. Signed-off-by: Scott Mayhew <smayhew@redhat.com> Reviewed-by: Benjamin Coddington <bcodding@hammerspace.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-19NFS: Initialise verifiers for visible dentries in _nfs4_open_and_get_stateTrond Myklebust1-13/+14
Ensure that the verifiers are initialised before calling d_splice_alias() in _nfs4_open_and_get_state(). Reported-by: Michael Stoler <michael.stoler@vastdata.com> Fixes: cf5b4059ba71 ("NFSv4: Fix races between open and dentry revalidation") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-19NFS: Initialise verifiers for visible dentries in nfs_atomic_open()Trond Myklebust1-1/+1
Ensure that the verifiers are initialised before calling d_splice_alias() in nfs_atomic_open(). Reported-by: Michael Stoler <michael.stoler@vastdata.com> Fixes: 809fd143de88 ("NFSv4: Ensure nfs_atomic_open set the dentry verifier on ENOENT") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-19NFS: Initialise verifiers for visible dentries in readdir and lookupTrond Myklebust1-2/+4
Ensure that the verifiers are initialised before calling d_splice_alias() in both nfs_prime_dcache() and nfs_lookup(). Reported-by: Michael Stoler <michael.stoler@vastdata.com> Fixes: a1147b8281bd ("NFS: Fix up directory verifier races") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-17NFS: Avoid changing nlink when file removes and attribute updates raceTrond Myklebust1-6/+13
If a file removal races with another operation that updates its attributes, then skip the change to nlink, and just mark the attributes as being stale. Reported-by: Aiden Lambert <alambert48@gatech.edu> Fixes: 59a707b0d42e ("NFS: Ensure we revalidate the inode correctly after remove or rename") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2025-11-12filelock: push the S_ISREG check down to ->setlease handlersJeff Layton1-0/+2
When nfsd starts requesting directory delegations, setlease handlers may see requests for leases on directories. Push the !S_ISREG check down into the non-trivial setlease handlers, so we can selectively enable them where they're supported. FUSE is special: It's the only filesystem that supports atomic_open and allows kernel-internal leases. atomic_open is issued when the VFS doesn't know the state of the dentry being opened. If the file doesn't exist, it may be created, in which case the dir lease should be broken. The existing kernel-internal lease implementation has no provision for this. Ensure that we don't allow directory leases by default going forward by explicitly disabling them there. Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20251111-dir-deleg-ro-v6-4-52f3feebb2f2@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-10NFS: Fix LTP test failures when timestamps are delegatedDai Ngo1-6/+12
The utimes01 and utime06 tests fail when delegated timestamps are enabled, specifically in subtests that modify the atime and mtime fields using the 'nobody' user ID. The problem can be reproduced as follow: # echo "/media *(rw,no_root_squash,sync)" >> /etc/exports # export -ra # mount -o rw,nfsvers=4.2 127.0.0.1:/media /tmpdir # cd /opt/ltp # ./runltp -d /tmpdir -s utimes01 # ./runltp -d /tmpdir -s utime06 This issue occurs because nfs_setattr does not verify the inode's UID against the caller's fsuid when delegated timestamps are permitted for the inode. This patch adds the UID check and if it does not match then the request is sent to the server for permission checking. Fixes: e12912d94137 ("NFSv4: Add support for delegated atime and mtime attributes") Signed-off-by: Dai Ngo <dai.ngo@oracle.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFSv4: Fix an incorrect parameter when calling nfs4_call_sync()Trond Myklebust1-3/+6
The Smatch static checker noted that in _nfs4_proc_lookupp(), the flag RPC_TASK_TIMEOUT is being passed as an argument to nfs4_init_sequence(), which is clearly incorrect. Since LOOKUPP is an idempotent operation, nfs4_init_sequence() should not ask the server to cache the result. The RPC_TASK_TIMEOUT flag needs to be passed down to the RPC layer. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reported-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com> Fixes: 76998ebb9158 ("NFSv4: Observe the NFS_MOUNT_SOFTREVAL flag in _nfs4_proc_lookupp") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFS: sysfs: fix leak when nfs_client kobject add failsYang Xiuwei1-0/+1
If adding the second kobject fails, drop both references to avoid sysfs residue and memory leak. Fixes: e96f9268eea6 ("NFS: Make all of /sys/fs/nfs network-namespace unique") Signed-off-by: Yang Xiuwei <yangxiuwei@kylinos.cn> Reviewed-by: Benjamin Coddington <ben.coddington@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFSv2/v3: Fix error handling in nfs_atomic_open_v23()Trond Myklebust1-3/+4
When nfs_do_create() returns an EEXIST error, it means that a regular file could not be created. That could mean that a symlink needs to be resolved. If that's the case, a lookup needs to be kicked off. Reported-by: Stephen Abbene <sabbene87@gmail.com> Link: https://bugzilla.kernel.org/show_bug.cgi?id=220710 Fixes: 7c6c5249f061 ("NFS: add atomic_open for NFSv3 to handle O_TRUNC correctly.") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: do not issue misaligned DIO out-of-orderMike Snitzer1-76/+52
From https://lore.kernel.org/linux-nfs/aQHASIumLJyOoZGH@infradead.org/ On Wed, Oct 29, 2025 at 12:20:40AM -0700, Christoph Hellwig wrote: > On Mon, Oct 27, 2025 at 12:18:30PM -0400, Mike Snitzer wrote: > > LOCALIO's misaligned DIO will issue head/tail followed by O_DIRECT > > middle (via AIO completion of that aligned middle). So out of order > > relative to file offset. > > That's in general a really bad idea. It will obviously work, but > both on SSDs and out of place write file systems it is a sure way > to increase your garbage collection overhead a lot down the line. Fix this by never issuing misaligned DIO out of order. This fix means the DIO-aligned middle will only use AIO completion if there is no misaligned end segment. Otherwise, all 3 segments of a misaligned DIO will be issued without AIO completion to ensure file offset increases properly for all partial READ or WRITE situations. Factoring out nfs_local_iter_setup() helps standardize repetitive nfs_local_iters_setup_dio() code and is inspired by cleanup work that Chuck Lever did on the NFSD Direct code. Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: Ensure DIO WRITE's IO on stable storage upon completionMike Snitzer1-1/+5
LOCALIO's misaligned DIO WRITE support requires synchronous IO for any misaligned head and/or tail that are issued using buffered IO. In addition, it is important that the O_DIRECT middle be on stable storage upon its completion via AIO. Otherwise, a misaligned DIO WRITE could mix buffered IO for the head/tail and direct IO for the DIO-aligned middle -- which could lead to problems associated with deferred writes to stable storage (such as out of order partial completions causing incorrect advancement of the file's offset, etc). Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: backfill missing partial read support for misaligned DIOMike Snitzer1-4/+20
Misaligned DIO read can be split into 3 IOs, must handle potential for short read from each component IO (follows same pattern used for handling partial writes, except upper layer read code handles advancing offset before retry). Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: add refcounting for each iocb IO associated with NFS pgio headerMike Snitzer1-43/+67
Improve completion handling of as many as 3 IOs associated with each misaligned DIO by using a atomic_t to track completion of each IO. Update nfs_local_pgio_done() to use precise atomic_t accounting for remaining iov_iter (up to 3) associated with each iocb, so that each NFS LOCALIO pgio header is only released after all IOs have completed. But also allow early return if/when a short read or write occurs. Fixes reported BUG: KASAN: slab-use-after-free in nfs_local_call_read: https://lore.kernel.org/linux-nfs/aPSvi5Yr2lGOh5Jh@dell-per750-06-vm-07.rhts.eng.pek2.redhat.com/ Reported-by: Yongcheng Yang <yoyang@redhat.com> Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10nfs/localio: remove unecessary ENOTBLK handling in DIO WRITE supportMike Snitzer1-10/+3
Each filesystem is meant to fallback to retrying DIO in terms buffered IO when it might encounter -ENOTBLK when issuing DIO (which can happen if the VFS cannot invalidate the page cache). So NFS doesn't need special handling for -ENOTBLK. Also, explicitly initialize a couple DIO related iocb members rather than simply rely on data structure zeroing. Fixes: c817248fc831 ("nfs/localio: add proper O_DIRECT support for READ and WRITE") Reported-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>
2025-11-10NFS: Check the TLS certificate fields in nfs_match_client()Trond Myklebust1-0/+8
If the TLS security policy is of type RPC_XPRTSEC_TLS_X509, then the cert_serial and privkey_serial fields need to match as well since they define the client's identity, as presented to the server. Fixes: 90c9550a8d65 ("NFS: support the kernel keyring for TLS") Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com> Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com>