aboutsummaryrefslogtreecommitdiff
path: root/fs
AgeCommit message (Collapse)AuthorFilesLines
2026-01-13getname_flags() massage, part 1Al Viro1-17/+16
In case of long name don't reread what we'd already copied. memmove() it instead. That avoids the possibility of ending up with empty name there and the need to look at the flags on the slow path. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13ntfs: ->d_compare() must not blockAl Viro5-24/+20
... so don't use __getname() there. Switch it (and ntfs_d_hash(), while we are at it) to kmalloc(PATH_MAX, GFP_NOWAIT). Yes, ntfs_d_hash() almost certainly can do with smaller allocations, but let ntfs folks deal with that - keep the allocation size as-is for now. Stop abusing names_cachep in ntfs, period - various uses of that thing in there have nothing to do with pathnames; just use k[mz]alloc() and be done with that. For now let's keep sizes as-in, but AFAICS none of the users actually want PATH_MAX. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13get rid of audit_reusename()Al Viro1-8/+3
Originally we tried to avoid multiple insertions into audit names array during retry loop by a cute hack - memorize the userland pointer and if there already is a match, just grab an extra reference to it. Cute as it had been, it had problems - two identical pointers had audit aux entries merged, two identical strings did not. Having different behaviour for syscalls that differ only by addresses of otherwise identical string arguments is obviously wrong - if nothing else, compiler can decide to merge identical string literals. Besides, this hack does nothing for non-audited processes - they get a fresh copy for retry. It's not time-critical, but having behaviour subtly differ that way is bogus. These days we have very few places that import filename more than once (9 functions total) and it's easy to massage them so we get rid of all re-imports. With that done, we don't need audit_reusename() anymore. There's no need to memorize userland pointer either. Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13do_readlinkat(): import pathname only onceAl Viro1-3/+3
Take getname_flags() and putname() outside of retry loop. Since getname_flags() is the only thing that cares about LOOKUP_EMPTY, don't bother with setting LOOKUP_EMPTY in lookup_flags - just pass it to getname_flags() and be done with that. The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13do_sys_truncate(): import pathname only onceAl Viro1-1/+4
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent to plain getname(). The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13user_statfs(): import pathname only onceAl Viro1-1/+3
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent to plain getname(). The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13chroot(2): import pathname only onceAl Viro1-1/+3
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent to plain getname(). The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13chdir(2): import pathname only onceAl Viro1-1/+3
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. In this case we never pass LOOKUP_EMPTY, so getname_flags() is equivalent to plain getname(). The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13do_utimes_path(): import pathname only onceAl Viro1-6/+7
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. Since we have the default logics for use of LOOKUP_EMPTY (passed iff AT_EMPTY_PATH is present in flags), just use getname_uflags() and don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags() will pass the right thing to getname_flags() and filename_lookup() doesn't care about LOOKUP_EMPTY at all. The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13do_fchownat(): import pathname only onceAl Viro1-5/+6
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. Since we have the default logics for use of LOOKUP_EMPTY (passed iff AT_EMPTY_PATH is present in flags), just use getname_uflags() and don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags() will pass the right thing to getname_flags() and filename_lookup() doesn't care about LOOKUP_EMPTY at all. The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13do_fchmodat(): import pathname only onceAl Viro1-4/+4
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. Since we have the default logics for use of LOOKUP_EMPTY (passed iff AT_EMPTY_PATH is present in flags), just use getname_uflags() and don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags() will pass the right thing to getname_flags() and filename_lookup() doesn't care about LOOKUP_EMPTY at all. The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13do_faccessat(): import pathname only onceAl Viro1-3/+4
Convert the user_path_at() call inside a retry loop into getname_flags() + filename_lookup() + putname() and leave only filename_lookup() inside the loop. Since we have the default logics for use of LOOKUP_EMPTY (passed iff AT_EMPTY_PATH is present in flags), just use getname_uflags() and don't bother with setting LOOKUP_EMPTY in lookup_flags - getname_uflags() will pass the right thing to getname_flags() and filename_lookup() doesn't care about LOOKUP_EMPTY at all. The things could be further simplified by use of cleanup.h stuff, but let's not clutter the patch with that. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13init_link(): turn into a trivial wrapper for do_linkat()Al Viro1-31/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13init_symlink(): turn into a trivial wrapper for do_symlinkat()Al Viro1-13/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13init_mkdir(): turn into a trivial wrapper for do_mkdirat()Al Viro1-18/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13init_mknod(): turn into a trivial wrapper for do_mknodat()Al Viro3-21/+3
Same as init_unlink() and init_rmdir() already are; the only obstacle is do_mknodat() being static. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-13Merge tag 'gfs2-for-6.19-rc6' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 revert from Andreas Gruenbacher: "Revert bad commit "gfs2: Fix use of bio_chain" I was originally assuming that there must be a bug in gfs2 because gfs2 chains bios in the opposite direction of what bio_chain_and_submit() expects. It turns out that the bio chains are set up in "reverse direction" intentionally so that the first bio's bi_end_io callback is invoked rather than the last bio's callback. We want the first bio's callback invoked for the following reason: The initial bio starts page aligned and covers one or more pages. When it terminates at a non-page-aligned offset, subsequent bios are added to handle the remaining portion of the final page. Upon completion of the bio chain, all affected pages need to be be marked as read, and only the first bio references all of these pages" * tag 'gfs2-for-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: Revert "gfs2: Fix use of bio_chain"
2026-01-13xfs: set max_agbno to allow sparse alloc of last full inode chunkBrian Foster1-5/+6
Sparse inode cluster allocation sets min/max agbno values to avoid allocating an inode cluster that might map to an invalid inode chunk. For example, we can't have an inode record mapped to agbno 0 or that extends past the end of a runt AG of misaligned size. The initial calculation of max_agbno is unnecessarily conservative, however. This has triggered a corner case allocation failure where a small runt AG (i.e. 2063 blocks) is mostly full save for an extent to the EOFS boundary: [2050,13]. max_agbno is set to 2048 in this case, which happens to be the offset of the last possible valid inode chunk in the AG. In practice, we should be able to allocate the 4-block cluster at agbno 2052 to map to the parent inode record at agbno 2048, but the max_agbno value precludes it. Note that this can result in filesystem shutdown via dirty trans cancel on stable kernels prior to commit 9eb775968b68 ("xfs: walk all AGs if TRYLOCK passed to xfs_alloc_vextent_iterate_ags") because the tail AG selection by the allocator sets t_highest_agno on the transaction. If the inode allocator spins around and finds an inode chunk with free inodes in an earlier AG, the subsequent dir name creation path may still fail to allocate due to the AG restriction and cancel. To avoid this problem, update the max_agbno calculation to the agbno prior to the last chunk aligned agbno in the AG. This is not necessarily the last valid allocation target for a sparse chunk, but since inode chunks (i.e. records) are chunk aligned and sparse allocs are cluster sized/aligned, this allows the sb_spino_align alignment restriction to take over and round down the max effective agbno to within the last valid inode chunk in the AG. Note that even though the allocator improvements in the aforementioned commit seem to avoid this particular dirty trans cancel situation, the max_agbno logic improvement still applies as we should be able to allocate from an AG that has been appropriately selected. The more important target for this patch however are older/stable kernels prior to this allocator rework/improvement. Cc: stable@vger.kernel.org # v4.2 Fixes: 56d1115c9bc7 ("xfs: allocate sparse inode chunks on full chunk allocation failure") Signed-off-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: Fix xfs_grow_last_rtg()Nirjhar Roy (IBM)1-1/+1
The last rtg should be able to grow when the size of the last is less than (and not equal to) sb_rgextents. xfs_growfs with realtime groups fails without this patch. The reason is that, xfs_growfs_rtg() tries to grow the last rt group even when the last rt group is at its maximal size i.e, sb_rgextents. It fails with the following messages: XFS (loop0): Internal error block >= mp->m_rsumblocks at line 253 of file fs/xfs/libxfs/xfs_rtbitmap.c. Caller xfs_rtsummary_read_buf+0x20/0x80 XFS (loop0): Corruption detected. Unmount and run xfs_repair XFS (loop0): Internal error xfs_trans_cancel at line 976 of file fs/xfs/xfs_trans.c. Caller xfs_growfs_rt_bmblock+0x402/0x450 XFS (loop0): Corruption of in-memory data (0x8) detected at xfs_trans_cancel+0x10a/0x1f0 (fs/xfs/xfs_trans.c:977). Shutting down filesystem. XFS (loop0): Please unmount the filesystem and rectify the problem(s) Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: improve the assert at the top of xfs_log_coverChristoph Hellwig1-3/+5
Move each condition into a separate assert so that we can see which on triggered. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: fix an overly long line in xfs_rtgroup_calc_geometryChristoph Hellwig1-1/+2
Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: mark __xfs_rtgroup_extents staticChristoph Hellwig2-27/+25
__xfs_rtgroup_extents is not used outside of xfs_rtgroup.c, so mark it static. Move it and xfs_rtgroup_extents up in the file to avoid forward declarations. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13xfs: Fix the return value of xfs_rtcopy_summary()Nirjhar Roy (IBM)1-1/+1
xfs_rtcopy_summary() should return the appropriate error code instead of always returning 0. The caller of this function which is xfs_growfs_rt_bmblock() is already handling the error. Fixes: e94b53ff699c ("xfs: cache last bitmap block in realtime allocator") Signed-off-by: Nirjhar Roy (IBM) <nirjhar.roy.lists@gmail.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: stable@vger.kernel.org # v6.7 Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-13ext4: convert to new fserror helpersDarrick J. Wong2-4/+11
Use the new fserror functions to report metadata errors to fsnotify. Note that ext4 inconsistently passes around negative and positive error numbers all over the codebase, so we force them all to negative for consistency in what we report to fserror, and fserror ensures that only positive error numbers are passed to fanotify, per the fanotify(7) manpage. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402693.3490369.5875002879192895558.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13xfs: translate fsdax media errors into file "data lost" errors when convenientDarrick J. Wong1-0/+4
Translate fsdax persistent failure notifications into file data loss events when it's convenient, aka when the inode is already incore. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402673.3490369.1672039530408369208.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13xfs: report fs metadata errors via fsnotifyDarrick J. Wong2-0/+18
Report filesystem corruption problems to the fserror helpers so that fsnotify can also convey metadata problems to userspace. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402652.3490369.2609467634858507969.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13iomap: report file I/O errors to the VFSDarrick J. Wong3-1/+40
Wire up iomap so that it reports all file read and write errors to the VFS (and hence fsnotify) via the new fserror mechanism. Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402631.3490369.729008983502742314.stgit@frogsfrogsfrogs Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13fs: report filesystem and file I/O errors to fsnotifyDarrick J. Wong3-1/+198
Create some wrapper code around struct super_block so that filesystems have a standard way to queue filesystem metadata and file I/O error reports to have them sent to fsnotify. If a filesystem wants to provide an error number, it must supply only negative error numbers. These are stored internally as negative numbers, but they are converted to positive error numbers before being passed to fanotify, per the fanotify(7) manpage. Implementations of super_operations::report_error are passed the raw internal event data. Note that we have to play some shenanigans with mempools and queue_work so that the error handling doesn't happen outside of process context, and the event handler functions (both ->report_error and fsnotify) can handle file I/O error messages without having to worry about whatever locks might be held. This asynchronicity requires that unmount wait for pending events to clear. Add a new callback to the superblock operations structure so that filesystem drivers can themselves respond to file I/O errors if they so desire. This will be used for an upcoming self-healing patchset for XFS. Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402610.3490369.4378391061533403171.stgit@frogsfrogsfrogs Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13uapi: promote EFSCORRUPTED and EUCLEAN to errno.hDarrick J. Wong7-15/+0
Stop definining these privately and instead move them to the uapi errno.h so that they become canonical instead of copy pasta. Cc: linux-api@vger.kernel.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402587.3490369.17659117524205214600.stgit@frogsfrogsfrogs Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13romfs: check sb_set_blocksize() return valueDeepanshu Kartikey1-1/+4
romfs_fill_super() ignores the return value of sb_set_blocksize(), which can fail if the requested block size is incompatible with the block device's configuration. This can be triggered by setting a loop device's block size larger than PAGE_SIZE using ioctl(LOOP_SET_BLOCK_SIZE, 32768), then mounting a romfs filesystem on that device. When sb_set_blocksize(sb, ROMBSIZE) is called with ROMBSIZE=4096 but the device has logical_block_size=32768, bdev_validate_blocksize() fails because the requested size is smaller than the device's logical block size. sb_set_blocksize() returns 0 (failure), but romfs ignores this and continues mounting. The superblock's block size remains at the device's logical block size (32768). Later, when sb_bread() attempts I/O with this oversized block size, it triggers a kernel BUG in folio_set_bh(): kernel BUG at fs/buffer.c:1582! BUG_ON(size > PAGE_SIZE); Fix by checking the return value of sb_set_blocksize() and failing the mount with -EINVAL if it returns 0. Reported-by: syzbot+9c4e33e12283d9437c25@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=9c4e33e12283d9437c25 Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Link: https://patch.msgid.link/20260113084037.1167887-1-kartikey406@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-13fuse: add setlease file operationJeff Layton1-0/+1
Add the setlease file_operation to fuse_file_operations, pointing to generic_setlease. A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Add generic_setlease to retain the ability to set leases on this filesystem. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260112130121.25965-1-jlayton@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12ecryptfs: Replace memcpy + NUL termination in ecryptfs_copy_filenameThorsten Blum1-14/+4
Use kmemdup_nul() to copy 'name' instead of using memcpy() followed by a manual NUL termination. Remove the local return variable and the goto label to simplify the code. No functional changes. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Acked-by: Tyler Hicks <code@tyhicks.com> Signed-off-by: Tyler Hicks <code@tyhicks.com>
2026-01-12NFS: Don't immediately return directory delegations when disabledAnna Schumaker1-1/+1
The function nfs_inode_evict_delegation() immediately and synchronously returns a delegation when called. This means we can't call it from nfs4_have_delegation(), since that function could be called under a lock. Instead we should mark the delegation for return and let the state manager handle it for us. Fixes: b6d2a520f463 ("NFS: Add a module option to disable directory delegations") Signed-off-by: Anna Schumaker <anna.schumaker@oracle.com> Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
2026-01-12fs: add immutable rootfsChristian Brauner4-12/+143
Currently pivot_root() doesn't work on the real rootfs because it cannot be unmounted. Userspace has to do a recursive removal of the initramfs contents manually before continuing the boot. Really all we want from the real rootfs is to serve as the parent mount for anything that is actually useful such as the tmpfs or ramfs for initramfs unpacking or the rootfs itself. There's no need for the real rootfs to actually be anything meaningful or useful. Add a immutable rootfs called "nullfs" that can be selected via the "nullfs_rootfs" kernel command line option. The kernel will mount a tmpfs/ramfs on top of it, unpack the initramfs and fire up userspace which mounts the rootfs and can then just do: chdir(rootfs); pivot_root(".", "."); umount2(".", MNT_DETACH); and be done with it. (Ofc, userspace can also choose to retain the initramfs contents by using something like pivot_root(".", "/initramfs") without unmounting it.) Technically this also means that the rootfs mount in unprivileged namespaces doesn't need to become MNT_LOCKED anymore as it's guaranteed that the immutable rootfs remains permanently empty so there cannot be anything revealed by unmounting the covering mount. In the future this will also allow us to create completely empty mount namespaces without risking to leak anything. systemd already handles this all correctly as it tries to pivot_root() first and falls back to MS_MOVE only when that fails. This goes back to various discussion in previous years and a LPC 2024 presentation about this very topic. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-3-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: add init_pivot_root()Christian Brauner3-47/+72
We will soon be able to pivot_root() with the introduction of the immutable rootfs. Add a wrapper for kernel internal usage. Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-2-88dd1c34a204@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: ensure that internal tmpfs mount gets mount id zeroChristian Brauner1-1/+1
and the rootfs get mount id one as it always has. Before we actually mount the rootfs we create an internal tmpfs mount which has mount id zero but is never exposed anywhere. Continue that "tradition". Link: https://patch.msgid.link/20260112-work-immutable-rootfs-v2-1-88dd1c34a204@kernel.org Fixes: 7f9bfafc5f49 ("fs: use xarray for old mount id") Reviewed-by: Jeff Layton <jlayton@kernel.org> Cc: stable@vger.kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12btrfs: fix memory leaks in create_space_info() error pathsJiasheng Jiang1-2/+6
In create_space_info(), the 'space_info' object is allocated at the beginning of the function. However, there are two error paths where the function returns an error code without freeing the allocated memory: 1. When create_space_info_sub_group() fails in zoned mode. 2. When btrfs_sysfs_add_space_info_type() fails. In both cases, 'space_info' has not yet been added to the fs_info->space_info list, resulting in a memory leak. Fix this by adding an error handling label to kfree(space_info) before returning. Fixes: 2be12ef79fe9 ("btrfs: Separate space_info create/update") Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12btrfs: invalidate pages instead of truncate after reflinkingFilipe Manana1-10/+13
Qu reported that generic/164 often fails because the read operations get zeroes when it expects to either get all bytes with a value of 0x61 or 0x62. The issue stems from truncating the pages from the page cache instead of invalidating, as truncating can zero page contents. This zeroing is not just in case the range is not page sized (as it's commented in truncate_inode_pages_range()) but also in case we are using large folios, they need to be split and the splitting fails. Stealing Qu's comment in the thread linked below: "We can have the following case: 0 4K 8K 12K 16K | | | | | |<---- Extent A ----->|<----- Extent B ------>| The page size is still 4K, but the folio we got is 16K. Then if we remap the range for [8K, 16K), then truncate_inode_pages_range() will get the large folio 0 sized 16K, then call truncate_inode_partial_folio(). Which later calls folio_zero_range() for the [8K, 16K) range first, then tries to split the folio into smaller ones to properly drop them from the cache. But if splitting failed (e.g. racing with other operations holding the filemap lock), the partially zeroed large folio will be kept, resulting the range [8K, 16K) being zeroed meanwhile the folio is still a 16K sized large one." So instead of truncating, invalidate the page cache range with a call to filemap_invalidate_inode(), which besides not doing any zeroing also ensures that while it's invalidating folios, no new folios are added. This helps ensure that buffered reads that happen while a reflink operation is in progress always get either the whole old data (the one before the reflink) or the whole new data, which is what generic/164 expects. Link: https://lore.kernel.org/linux-btrfs/7fb9b44f-9680-4c22-a47f-6648cb109ddf@suse.com/ Reported-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12btrfs: update the Kconfig string for CONFIG_BTRFS_EXPERIMENTALQu Wenruo1-1/+5
The following new features are missing: - Async checksum - Shutdown ioctl and auto-degradation - Larger block size support Which is dependent on larger folios. Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-01-12Revert "gfs2: Fix use of bio_chain"Andreas Gruenbacher1-1/+1
This reverts commit 8a157e0a0aa5143b5d94201508c0ca1bb8cfb941. That commit incorrectly assumed that the bio_chain() arguments were swapped in gfs2. However, gfs2 intentionally constructs bio chains so that the first bio's bi_end_io callback is invoked when all bios in the chain have completed, unlike bio chains where the last bio's callback is invoked. Fixes: 8a157e0a0aa5 ("gfs2: Fix use of bio_chain") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-12xfs: enable non-blocking timestamp updatesChristoph Hellwig1-6/+12
The lazytime path using the generic helpers can never block in XFS because there is no ->dirty_inode method that could block. Allow non-blocking timestamp updates for this case by replacing generic_update_time with the open coded version without the S_NOWAIT check. Fixes: 66fa3cedf16a ("fs: Add async write file modification handling.") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-12-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12xfs: implement ->sync_lazytimeChristoph Hellwig2-29/+20
Switch to the new explicit lazytime syncing method instead of trying to second guess what could be a lazytime update in ->dirty_inode. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-11-hch@lst.de Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: refactor file_update_time_flagsChristoph Hellwig1-16/+15
Split all the inode timestamp flags into a helper. This not only makes the code a bit more readable, but also optimizes away the further checks as soon as know we need an update. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-10-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: add support for non-blocking timestamp updatesChristoph Hellwig7-11/+50
Currently file_update_time_flags unconditionally returns -EAGAIN if any timestamp needs to be updated and IOCB_NOWAIT is passed. This makes non-blocking direct writes impossible on file systems with granular enough timestamps. Pass IOCB_NOWAIT to ->update_time and return -EAGAIN if it could block. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-9-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: add a ->sync_lazytime methodChristoph Hellwig1-2/+11
Allow the file system to explicitly implement lazytime syncing instead of pigging back on generic inode dirtying. This allows to simplify the XFS implementation and prepares for non-blocking lazytime timestamp updates. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-8-hch@lst.de Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: factor out a sync_lazytime helperChristoph Hellwig4-14/+20
Centralize how we synchronize a lazytime update into the actual on-disk timestamp into a single helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-7-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: refactor ->update_time handlingChristoph Hellwig14-142/+135
Pass the type of update (atime vs c/mtime plus version) as an enum instead of a set of flags that caused all kinds of confusion. Because inode_update_timestamps now can't return a modified version of those flags, return the I_DIRTY_* flags needed to persist the update, which is what the main caller in generic_update_time wants anyway, and which is suitable for the other callers that only want to know if an update happened. The whole update_time path keeps the flags argument, which will be used to support non-blocking updates soon even if it is unused, and (the slightly renamed) inode_update_time also gains the possibility to return a negative errno to support this. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-6-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fat: cleanup the flags for fat_truncate_timeChristoph Hellwig7-51/+44
Fat only has a single on-disk timestamp covering ctime and mtime. Add fat-specific flags that indicate which timestamp fat_truncate_time should update to make this more clear. This allows removing no-op fat_truncate_time calls with the S_CTIME flag and prepares for removing the S_* flags. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-5-hch@lst.de Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12nfs: split nfs_update_timestampsChristoph Hellwig1-16/+15
The VFS paths update either the atime or ctime and mtime but never mix between atime and the others. Split nfs_update_timestamps to match this to prepare for cleaning up the VFS interfaces. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-4-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12fs: allow error returns from generic_update_timeChristoph Hellwig4-12/+7
Now that no caller looks at the updated flags, switch generic_update_time to the same calling convention as the ->update_time method and return 0 or a negative errno. This prepares for adding non-blocking timestamp updates that could return -EAGAIN. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260108141934.2052404-3-hch@lst.de Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>