path: root/fs
Age  Commit message  Author  Files  Lines
2026-01-17  f2fs: fix to avoid mapping wrong physical block for swapfile  Chao Yu  1  -7/+7
Xiaolong Guo reported a f2fs bug in bugzilla [1]

[1] https://bugzilla.kernel.org/show_bug.cgi?id=220951

Quoted:

"When using stress-ng's swap stress test on F2FS filesystem with kernel 6.6+, the system experiences data corruption leading to either:
1 dm-verity corruption errors and device reboot
2 F2FS node corruption errors and boot hangs

The issue occurs specifically when:
1 Using F2FS filesystem (ext4 is unaffected)
2 Swapfile size is less than F2FS section size (2MB)
3 Swapfile has fragmented physical layout (multiple non-contiguous extents)
4 Kernel version is 6.6+ (6.1 is unaffected)

The root cause is in check_swap_activate() function in fs/f2fs/data.c. When the first extent of a small swapfile (< 2MB) is not aligned to section boundaries, the function incorrectly treats it as the last extent, failing to map subsequent extents. This results in incorrect swap_extent creation where only the first extent is mapped, causing subsequent swap writes to overwrite wrong physical locations (other files' data).

Steps to Reproduce
1 Setup a device with F2FS-formatted userdata partition
2 Compile stress-ng from https://github.com/ColinIanKing/stress-ng
3 Run swap stress test: (Android devices)
    adb shell "cd /data/stressng; ./stress-ng-64 --metrics-brief --timeout 60 --swap 0"

Log:
1 Ftrace shows in kernel 6.6, only first extent is mapped during second f2fs_map_blocks call in check_swap_activate():
    stress-ng-swap-8990: f2fs_map_blocks: ino=11002, file offset=0, start blkaddr=0x43143, len=0x1
  (Only 4KB mapped, not the full swapfile)
2 in kernel 6.1, both extents are correctly mapped:
    stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=0, start blkaddr=0x13cd4, len=0x1
    stress-ng-swap-5966: f2fs_map_blocks: ino=28011, file offset=1, start blkaddr=0x60c84b, len=0xff

The problematic code is in check_swap_activate():

    if ((pblock - SM_I(sbi)->main_blkaddr) % blks_per_sec ||
        nr_pblocks % blks_per_sec ||
        !f2fs_valid_pinned_area(sbi, pblock)) {
        bool last_extent = false;

        not_aligned++;

        nr_pblocks = roundup(nr_pblocks, blks_per_sec);
        if (cur_lblock + nr_pblocks > sis->max)
            nr_pblocks -= blks_per_sec;

        /* this extent is last one */
        if (!nr_pblocks) {
            nr_pblocks = last_lblock - cur_lblock;
            last_extent = true;
        }

        ret = f2fs_migrate_blocks(inode, cur_lblock, nr_pblocks);
        if (ret) {
            if (ret == -ENOENT)
                ret = -EINVAL;
            goto out;
        }

        if (!last_extent)
            goto retry;
    }

When the first extent is unaligned and roundup(nr_pblocks, blks_per_sec) exceeds sis->max, we subtract blks_per_sec resulting in nr_pblocks = 0. The code then incorrectly assumes this is the last extent, sets nr_pblocks = last_lblock - cur_lblock (entire swapfile), and performs migration. After migration, it doesn't retry mapping, so subsequent extents are never processed.
"

In order to fix this issue, we need to lookup block mapping info after we migrate all blocks in the tail of swapfile.

Cc: stable@kernel.org
Fixes: 9703d69d9d15 ("f2fs: support file pinning for zoned devices")
Cc: Daeho Jeong <daehojeong@google.com>
Reported-and-tested-by: Xiaolong Guo <guoxiaolong2008@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220951
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
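The size adjustment described in the quoted report can be replayed in user space. The sketch below is only an illustration of that arithmetic: `plan_unaligned_extent()`, `struct extent_plan` and `roundup_to()` are hypothetical names, and the numbers in the usage note (512 blocks per section, a 256-block swapfile) are assumptions chosen to match "swapfile smaller than a 2MB section", not values taken from the patch.

```c
#include <stdbool.h>

/* Round x up to the next multiple of y (y > 0), like the kernel's roundup(). */
static unsigned int roundup_to(unsigned int x, unsigned int y)
{
	return ((x + y - 1) / y) * y;
}

/* Hypothetical result type: what would be handed to the migration step. */
struct extent_plan {
	unsigned int nr_pblocks; /* number of blocks to migrate */
	bool last_extent;        /* true if the extent was treated as the tail */
};

/*
 * Replay the adjustment from the quoted check_swap_activate() snippet for
 * one unaligned extent. No filesystem is involved; this is arithmetic only.
 */
static struct extent_plan plan_unaligned_extent(unsigned int cur_lblock,
						unsigned int nr_pblocks,
						unsigned int last_lblock,
						unsigned int sis_max,
						unsigned int blks_per_sec)
{
	struct extent_plan plan = { 0, false };

	nr_pblocks = roundup_to(nr_pblocks, blks_per_sec);
	if (cur_lblock + nr_pblocks > sis_max)
		nr_pblocks -= blks_per_sec;

	/* A zero length is interpreted as "this extent is the last one". */
	if (!nr_pblocks) {
		nr_pblocks = last_lblock - cur_lblock;
		plan.last_extent = true;
	}

	plan.nr_pblocks = nr_pblocks;
	return plan;
}
```

Under these assumptions, `plan_unaligned_extent(0, 1, 256, 256, 512)` yields `nr_pblocks = 256` with `last_extent = true`: the one-block first extent is rounded up to 512, exceeds the swapfile size, drops to 0, and is then mistaken for the tail. With a swapfile larger than a section, for example `plan_unaligned_extent(0, 1, 4096, 4096, 512)`, the same extent yields 512 blocks and `last_extent = false`, so the caller would retry the mapping.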
2026-01-17  f2fs: avoid f2fs_map_blocks() for consecutive holes in readpages  Chao Yu  1  -6/+15
For a large run of consecutive holes spanning {d,id,did}nodes, we don't need to call f2fs_map_blocks() once per hole block. Instead, we can use map.m_next_pgofs as a hint for the next potentially valid block, and skip calling f2fs_map_blocks() for the range [cur_pgofs + 1, .m_next_pgofs).

1) regular case

    touch /mnt/f2fs/file
    truncate -s $((1024*1024*1024)) /mnt/f2fs/file
    time dd if=/mnt/f2fs/file of=/dev/null bs=1M count=1024

Before:
    real 0m0.706s
    user 0m0.000s
    sys  0m0.706s

After:
    real 0m0.620s
    user 0m0.008s
    sys  0m0.611s

2) large folio case

    touch /mnt/f2fs/file
    truncate -s $((1024*1024*1024)) /mnt/f2fs/file
    f2fs_io setflags immutable /mnt/f2fs/file
    sync
    echo 3 > /proc/sys/vm/drop_caches
    time dd if=/mnt/f2fs/file of=/dev/null bs=1M count=1024

Before:
    real 0m0.438s
    user 0m0.004s
    sys  0m0.433s

After:
    real 0m0.368s
    user 0m0.004s
    sys  0m0.364s

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
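The idea of using a "next valid offset" hint to avoid per-block lookups can be shown with a small user-space model. This is a sketch under stated assumptions, not the f2fs code: `struct map_result`, `read_range()` and `one_block_map()` are hypothetical, and the toy mapping simply reports one valid block with a hint pointing at it.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical lookup result, loosely modelled on the hint described above. */
struct map_result {
	int mapped;        /* non-zero if the block at pgofs holds data */
	size_t next_pgofs; /* hint: first offset that may hold data */
};

typedef struct map_result (*map_fn)(size_t pgofs, void *ctx);

/*
 * Walk [start, end). After a lookup reports a hole, offsets below the hint
 * are treated as holes without another lookup. Returns the lookup count and
 * stores the number of mapped blocks in *mapped_out.
 */
static size_t read_range(size_t start, size_t end, map_fn map, void *ctx,
			 size_t *mapped_out)
{
	size_t lookups = 0, mapped = 0;
	size_t hole_end = start; /* offsets below this are known holes */

	for (size_t pgofs = start; pgofs < end; pgofs++) {
		if (pgofs < hole_end)
			continue; /* known hole: nothing to look up */

		struct map_result r = map(pgofs, ctx);
		lookups++;
		if (r.mapped)
			mapped++;
		else if (r.next_pgofs > pgofs)
			hole_end = r.next_pgofs;
	}

	*mapped_out = mapped;
	return lookups;
}

/* Toy mapping: exactly one offset (passed via ctx) holds data. */
static struct map_result one_block_map(size_t pgofs, void *ctx)
{
	size_t valid = *(size_t *)ctx;
	struct map_result r;

	r.mapped = (pgofs == valid);
	/* Point at the valid block, or past everything once it is behind us. */
	r.next_pgofs = (pgofs < valid) ? valid : SIZE_MAX;
	return r;
}
```

With the single valid block at offset 1000 and a range of [0, 2000), this walk performs three lookups (at 0, 1000 and 1001) instead of 2000, which is the kind of saving the commit message's timings reflect.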
2026-01-17  f2fs: advance index and offset after zeroing in large folio read  Nanzhe Zhao  1  -3/+1
In f2fs_read_data_large_folio(), the block zeroing path calls folio_zero_range() and then continues the loop. However, it fails to advance index and offset before continuing.

This can cause the loop to repeatedly process the same subpage of the folio, leading to stalls/hangs and incorrect progress when reading large folios with holes/zeroed blocks.

Fix it by advancing index and offset unconditionally in the loop iteration, so they are updated even when the zeroing path continues.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
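The bug pattern, a `continue` that skips the cursor updates, and its fix can be illustrated outside the kernel. The sketch below is a generic model, not the f2fs function: `read_blocks()`, `BLK` and the `is_hole` array are hypothetical. Placing the updates in the loop header is one way to make them unconditional.

```c
#include <stddef.h>
#include <string.h>

#define BLK 4 /* toy block size in bytes, an assumption for this example */

/*
 * Fill dst (nblocks * BLK bytes) from src, zeroing blocks flagged as holes.
 * Returns the number of blocks that were zeroed.
 */
static size_t read_blocks(unsigned char *dst, const unsigned char *src,
			  const int *is_hole, size_t nblocks)
{
	size_t zeroed = 0;
	size_t index, offset;

	/*
	 * index and offset advance in the loop header, so the `continue`
	 * in the zeroing path cannot skip them and stall on one block.
	 */
	for (index = 0, offset = 0; index < nblocks; index++, offset += BLK) {
		if (is_hole[index]) {
			memset(dst + offset, 0, BLK);
			zeroed++;
			continue;
		}
		memcpy(dst + offset, src + offset, BLK);
	}
	return zeroed;
}
```

Had the updates lived at the bottom of the loop body instead, the zeroing path would re-test the same `index` forever, which matches the stall described in the commit message.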
2026-01-17  f2fs: add 'folio_in_bio' to handle readahead folios with no BIO submission  Nanzhe Zhao  1  -0/+7
f2fs_read_data_large_folio() can build a single read BIO across multiple folios during readahead. If a folio ends up having none of its subpages added to the BIO (e.g. all subpages are zeroed / treated as holes), it will never be seen by f2fs_finish_read_bio(), so folio_end_read() is never called. This leaves the folio locked and not marked uptodate.

Track whether the current folio has been added to a BIO via a local 'folio_in_bio' bool flag, and when iterating readahead folios, explicitly mark the folio uptodate (on success) and unlock it when nothing was added.

Signed-off-by: Nanzhe Zhao <nzzhao@126.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: avoid unnecessary block mapping lookups in f2fs_read_data_large_folio  Yongpeng Yang  1  -1/+1
In the second call to f2fs_map_blocks within f2fs_read_data_large_folio, map.m_len exceeds the logical address space to be read. This patch ensures map.m_len does not exceed the required address space.

Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: detect more inconsistent cases in sanity_check_node_footer()  Chao Yu  1  -3/+12
Let's enhance sanity_check_node_footer() to detect more inconsistent cases as below:

Node Type             Node Footer Info
===================   =============================
NODE_TYPE_REGULAR     inode = true and xnode = true
NODE_TYPE_INODE       inode = false or xnode = true
NODE_TYPE_XATTR       inode = true or xnode = false
NODE_TYPE_NON_INODE   inode = false

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: fix to do sanity check on node footer in {read,write}_end_io  Chao Yu  4  -19/+32
-----------[ cut here ]------------
kernel BUG at fs/f2fs/data.c:358!
Call Trace:
 <IRQ>
 blk_update_request+0x5eb/0xe70 block/blk-mq.c:987
 blk_mq_end_request+0x3e/0x70 block/blk-mq.c:1149
 blk_complete_reqs block/blk-mq.c:1224 [inline]
 blk_done_softirq+0x107/0x160 block/blk-mq.c:1229
 handle_softirqs+0x283/0x870 kernel/softirq.c:579
 __do_softirq kernel/softirq.c:613 [inline]
 invoke_softirq kernel/softirq.c:453 [inline]
 __irq_exit_rcu+0xca/0x1f0 kernel/softirq.c:680
 irq_exit_rcu+0x9/0x30 kernel/softirq.c:696
 instr_sysvec_apic_timer_interrupt arch/x86/kernel/apic/apic.c:1050 [inline]
 sysvec_apic_timer_interrupt+0xa6/0xc0 arch/x86/kernel/apic/apic.c:1050
 </IRQ>

f2fs_write_end_io() detects an inconsistency between the node page index (nid) and footer.nid of the node page.

If the footer of a node page is corrupted in a fuzzed image and the corrupted node page is loaded asynchronously, e.g. via f2fs_ra_node_pages() or f2fs_ra_node_page(), no sanity check is done on the node footer. Once the node page becomes dirty, we will hit this bug after node page writeback.

Cc: stable@kernel.org
Reported-by: syzbot+803dd716c4310d16ff3a@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=803dd716c4310d16ff3a
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: fix to do sanity check on node footer in __write_node_folio()  Chao Yu  1  -1/+5
Add a node footer sanity check during node folio writeback. If the sanity check fails, shut down the filesystem to avoid looping to redirty and write back the folio in .writepages.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: clean up the type parameter in f2fs_sync_meta_pages()  Yangyang Zang  3  -11/+10
Clean up code to improve readability, no logic changes.

Signed-off-by: Yangyang Zang <zangyangyang1@xiaomi.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: support non-4KB block size without packed_ssa feature  Daeho Jeong  7  -95/+121
Currently, F2FS requires the packed_ssa feature to be enabled when utilizing non-4KB block sizes (e.g., 16KB). This restriction limits the flexibility of filesystem formatting options.

This patch allows F2FS to support non-4KB block sizes even when the packed_ssa feature is disabled. It adjusts the SSA calculation logic to correctly handle summary entries in larger blocks without the packed layout.

Cc: stable@kernel.org
Fixes: 7ee8bc3942f2 ("f2fs: revert summary entry count from 2048 to 512 in 16kb block support")
Signed-off-by: Daeho Jeong <daehojeong@google.com>
Reviewed-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: make FAULT_DISCARD obsolete  Chao Yu  2  -16/+4
__blkdev_issue_discard() in __submit_discard_cmd() will never fail, so let's make FAULT_DISCARD fault injection obsolete.

Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-17  f2fs: fix to avoid UAF in f2fs_write_end_io()  Chao Yu  1  -3/+9
syzbot reported a use-after-free issue in f2fs_write_end_io(). It is caused by the race condition below:

loop device                             umount
- worker_thread
 - loop_process_work
  - do_req_filebacked
   - lo_rw_aio
    - lo_rw_aio_complete
     - blk_mq_end_request
      - blk_update_request
       - f2fs_write_end_io
        - dec_page_count
        - folio_end_writeback
                                        - kill_f2fs_super
                                         - kill_block_super
                                          - f2fs_put_super
                                         : free(sbi)
       : get_pages(, F2FS_WB_CP_DATA)
         accessed sbi which is freed

In kill_f2fs_super(), all page caches of f2fs inodes are dropped before free(sbi) is called. This guarantees that all folios have ended their writeback, so it is safe to access sbi before the last folio_end_writeback().

Let's relocate the ckpt thread wakeup flow before folio_end_writeback() to resolve this issue.

Cc: stable@kernel.org
Fixes: e234088758fc ("f2fs: avoid wait if IO end up when do_checkpoint for better performance")
Reported-by: syzbot+b4444e3c972a7a124187@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=b4444e3c972a7a124187
Signed-off-by: Chao Yu <chao@kernel.org>
Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-16  mount: add OPEN_TREE_NAMESPACE  Christian Brauner  3  -16/+161
When creating containers the setup usually involves using CLONE_NEWNS via clone3() or unshare(). This copies the caller's complete mount namespace. The runtime will also assemble a new rootfs and then use pivot_root() to switch the old mount tree with the new rootfs. Afterward it will recursively umount the old mount tree thereby getting rid of all mounts.

On a basic system here where the mount table isn't particularly large this still copies about 30 mounts. Copying all of these mounts only to get rid of them later is pretty wasteful.

This is exacerbated if intermediary mount namespaces are used that only exist for a very short amount of time and are immediately destroyed again causing a ton of mounts to be copied and destroyed needlessly.

With a large mount table and a system where thousands or ten-thousands of containers are spawned in parallel this quickly becomes a bottleneck increasing contention on the semaphore.

Extend open_tree() with a new OPEN_TREE_NAMESPACE flag. Similar to OPEN_TREE_CLONE only the indicated mount tree is copied. Instead of returning a file descriptor referring to that mount tree OPEN_TREE_NAMESPACE will cause open_tree() to return a file descriptor to a new mount namespace. In that new mount namespace the copied mount tree has been mounted on top of a copy of the real rootfs.

The caller can setns() into that mount namespace and perform any additionally required setup such as move_mount() detached mounts in there.

This allows OPEN_TREE_NAMESPACE to function as a combined unshare(CLONE_NEWNS) and pivot_root().

A caller may for example choose to create an extremely minimal rootfs:

    fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);

This will create a mount namespace where "wootwoot" has become the rootfs mounted on top of the real rootfs. The caller can now setns() into this new mount namespace and assemble additional mounts.

This also works with user namespaces:

    unshare(CLONE_NEWUSER);
    fd_mntns = open_tree(-EBADF, "/var/lib/containers/wootwoot", OPEN_TREE_NAMESPACE);

which creates a new mount namespace owned by the earlier created user namespace with "wootwoot" as the rootfs mounted on top of the real rootfs.

Link: https://patch.msgid.link/20251229-work-empty-namespace-v1-1-bfb24c7b061f@kernel.org
Tested-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Aleksa Sarai <cyphar@cyphar.com>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Suggested-by: Christian Brauner <brauner@kernel.org>
Suggested-by: Aleksa Sarai <cyphar@cyphar.com>
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  vfs: document d_dispose_if_unused()  Miklos Szeredi  1  -0/+10
Add a warning about the danger of using this function without proper locking preventing eviction.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-7-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  fuse: shrink once after all buckets have been scanned  Miklos Szeredi  1  -1/+1
In fuse_dentry_tree_work() move the shrink_dentry_list() out from the loop.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-6-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  fuse: clean up fuse_dentry_tree_work()  Miklos Szeredi  1  -14/+14
- Change time_after64() to time_before64(), since the latter is exclusively used in this file to compare dentry/inode timeout with current time.

- Move the break statement from the else branch to the if branch, reducing indentation.

Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-5-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  fuse: add need_resched() before unlocking bucket  Miklos Szeredi  1  -3/+5
In fuse_dentry_tree_work() no need to unlock/lock dentry_hash[i].lock on each iteration.

Suggested-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-4-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  fuse: make sure dentry is evicted if stale  Miklos Szeredi  1  -0/+4
d_dispose_if_unused() may find the dentry with a positive refcount, in which case it won't be put on the dispose list even though it has already timed out.

"Reinstall" the d_delete() callback, which was optimized out in fuse_dentry_settime(). This will result in the dentry being evicted as soon as the refcount hits zero.

Fixes: ab84ad597386 ("fuse: new work queue to periodically invalidate expired dentries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-3-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  fuse: fix race when disposing stale dentries  Miklos Szeredi  1  -9/+2
In fuse_dentry_tree_work() just before d_dispose_if_unused() the dentry could get evicted, resulting in UAF.

Move unlocking dentry_hash[i].lock to after the dispose. To do this, fuse_dentry_tree_del_node() needs to be moved from fuse_dentry_prune() to fuse_dentry_release() to prevent an ABBA deadlock.

The lock ordering becomes:

 -> dentry_bucket.lock
    -> dentry.d_lock

Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Closes: https://lore.kernel.org/all/20251206014242.GO1712166@ZenIV/
Fixes: ab84ad597386 ("fuse: new work queue to periodically invalidate expired dentries")
Signed-off-by: Miklos Szeredi <mszeredi@redhat.com>
Link: https://patch.msgid.link/20260114145344.468856-2-mszeredi@redhat.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-16  sysfs(2): fs_index() argument is _not_ a pathname  Al Viro  1  -6/+3
... it's a filesystem type name.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  ksmbd: use CLASS(filename_kernel)  Al Viro  1  -5/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  user_statfs(): switch to CLASS(filename)  Al Viro  1  -2/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  statx: switch to CLASS(filename_maybe_null)  Al Viro  1  -13/+5
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  quotactl_block(): switch to CLASS(filename)  Al Viro  1  -2/+1
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  chroot(2): switch to CLASS(filename)  Al Viro  1  -9/+4
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  move_mount(2): switch to CLASS(filename_maybe_null)  Al Viro  1  -4/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  namei.c: switch user pathname imports to CLASS(filename{,_flags})  Al Viro  1  -15/+6
filename_flags is used by user_path_at(). I suspect that mixing LOOKUP_EMPTY with real lookup flags had been a mistake all along; the former belongs to pathname import, the latter - to pathwalk. Right now none of the remaining in-tree callers of user_path_at() are getting LOOKUP_EMPTY in flags, so user_path_at() could probably be switched to CLASS(filename)...

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  namei.c: convert getname_kernel() callers to CLASS(filename_kernel)  Al Viro  1  -26/+10
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  do_f{chmod,chown,access}at(): use CLASS(filename_uflags)  Al Viro  1  -6/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  do_readlinkat(): switch to CLASS(filename_flags)  Al Viro  1  -6/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  do_sys_truncate(): switch to CLASS(filename)  Al Viro  1  -7/+5
Note that failures from filename_lookup() are final - ESTALE returned by it means that retry had been done by filename_lookup() and it failed there.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  do_utimes_path(): switch to CLASS(filename_uflags)  Al Viro  1  -5/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  chdir(2): unspaghettify a bit...  Al Viro  1  -17/+10
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  do_fchownat(): unspaghettify a bit...  Al Viro  1  -16/+12
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  fspick(2): use CLASS(filename_flags)  Al Viro  1  -3/+3
That kills the last place where we mix LOOKUP_EMPTY with lookup flags proper.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  name_to_handle_at(): use CLASS(filename_uflags)  Al Viro  1  -3/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  vfs_open_tree(): use CLASS(filename_uflags)  Al Viro  1  -3/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  do_open_execat(): don't care about LOOKUP_EMPTY  Al Viro  1  -2/+0
do_file_open() doesn't.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  mount_setattr(2): don't mess with LOOKUP_EMPTY  Al Viro  1  -3/+2
just use CLASS(filename_uflags) + filename_lookup()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  file_[gs]etattr(2): switch to CLASS(filename_maybe_null)  Al Viro  1  -4/+2
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  filename_...xattr(): don't consume filename reference  Al Viro  1  -25/+8
Callers switched to CLASS(filename_maybe_null) (in fs/xattr.c) and CLASS(filename_complete_delayed) (in io_uring/xattr.c).

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  non-consuming variants of do_{unlinkat,rmdir}()  Al Viro  4  -17/+19
similar to previous commit; replacements are filename_{unlinkat,rmdir}()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  non-consuming variant of do_mknodat()  Al Viro  3  -11/+11
similar to previous commit; replacement is filename_mknodat()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  non-consuming variant of do_mkdirat()  Al Viro  3  -9/+9
similar to previous commit; replacement is filename_mkdirat()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  non-consuming variant of do_symlinkat()  Al Viro  3  -15/+15
similar to previous commit; replacement is filename_symlinkat()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  non-consuming variant of do_linkat()  Al Viro  3  -18/+16
similar to previous commit; replacement is filename_linkat()

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  non-consuming variant of do_renameat2()  Al Viro  2  -16/+16
filename_renameat2() replaces do_renameat2(); unlike the latter, it does not drop filename references - these days it can be just as easily arranged in the caller.

Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-01-16  Merge tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux  Linus Torvalds  5  -39/+41
Pull xfs fixes from Carlos Maiolino:
 "Just a few obvious fixes and some 'cosmetic' changes"

* tag 'xfs-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/fs/xfs/xfs-linux:
  xfs: set max_agbno to allow sparse alloc of last full inode chunk
  xfs: Fix xfs_grow_last_rtg()
  xfs: improve the assert at the top of xfs_log_cover
  xfs: fix an overly long line in xfs_rtgroup_calc_geometry
  xfs: mark __xfs_rtgroup_extents static
  xfs: Fix the return value of xfs_rtcopy_summary()
  xfs: fix memory leak in xfs_growfs_check_rtgeom()
2026-01-16  ntfs3: Restore NULL folio initialization in ntfs_writepages()  Nathan Chancellor  1  -1/+1
Clang warns (or errors with CONFIG_WERROR=y):

  fs/ntfs3/inode.c:1021:6: error: variable 'folio' is used uninitialized whenever 'if' condition is true [-Werror,-Wsometimes-uninitialized]
   1021 |         if (is_resident(ni)) {
        |             ^~~~~~~~~~~~~~~
  fs/ntfs3/inode.c:1024:48: note: uninitialized use occurs here
   1024 |         while ((folio = writeback_iter(mapping, wbc, folio, &err)))
        |                                                      ^~~~~

folio should be initialized to NULL for the first iteration of writeback_iter() to start the loop properly. Restore the NULL initialization of folio that was lost in the recent iomap conversion to clear up the warning.

Fixes: 099ef9a ("fs/ntfs3: implement iomap-based file operations")
Reported-by: kernel test robot <lkp@intel.com>
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Closes: https://lore.kernel.org/oe-kbuild-all/202601010644.FIhOXy6Y-lkp@intel.com/
Closes: https://lore.kernel.org/r/202601010513.axd56bks-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
[almaz.alexandrovich@paragon-software.com: added a few more tags]
Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
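The "pass NULL to start, pass the previous element to continue" iterator contract behind this warning can be modelled in a few lines of user-space C. This is a hedged sketch: `struct item`, `next_item()` and `sum_ids()` are hypothetical stand-ins and say nothing about how writeback_iter() is implemented.

```c
#include <stddef.h>

struct item {
	int id;
};

/*
 * Toy iterator: pass NULL to begin the walk, then pass back the previous
 * item to obtain the next one. Returns NULL when the walk is finished.
 */
static struct item *next_item(struct item *items, size_t n, struct item *prev)
{
	if (n == 0)
		return NULL;
	if (!prev)
		return &items[0];

	size_t idx = (size_t)(prev - items) + 1;
	return idx < n ? &items[idx] : NULL;
}

static int sum_ids(struct item *items, size_t n)
{
	/* The cursor must start as NULL so the first call begins the walk. */
	struct item *cur = NULL;
	int sum = 0;

	while ((cur = next_item(items, n, cur)))
		sum += cur->id;
	return sum;
}
```

If `cur` were left uninitialized, the first `next_item()` call would receive an indeterminate pointer, which is the situation the compiler diagnostic above points at.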
2026-01-16  quota: fix livelock between quotactl and freeze_super  Abhishek Bapat  1  -0/+1
When a filesystem is frozen, quotactl_block() enters a retry loop waiting for the filesystem to thaw. It acquires s_umount, checks the freeze state, drops s_umount and uses the sb_start_write() - sb_end_write() pair to wait for the unfreeze.

However, this retry loop can trigger a livelock, specifically on kernels with preemption disabled.

The mechanism is as follows:
1. freeze_super() sets SB_FREEZE_WRITE and calls sb_wait_write().
2. sb_wait_write() calls percpu_down_write(), which initiates synchronize_rcu().
3. Simultaneously, quotactl_block() spins in its retry loop, immediately executing the sb_start_write() - sb_end_write() pair.
4. Because the kernel is non-preemptible and the loop contains no scheduling points, quotactl_block() never yields the CPU. This prevents that CPU from reaching an RCU quiescent state.
5. synchronize_rcu() in the freezer thread waits indefinitely for the quotactl_block() CPU to report a quiescent state.
6. quotactl_block() spins indefinitely waiting for the freezer to advance, which it cannot do as it is blocked on the RCU sync.

This results in a hang of the freezer process and 100% CPU usage by the quota process.

While this can occur intermittently on multi-core systems, it reproduces reliably on a node with the following script, which runs both the freezer and the quota toggle on the same CPU:

  # mkfs.ext4 -O quota /dev/sda 2g && mkdir a_mount
  # mount /dev/sda -o quota,usrquota,grpquota a_mount
  # taskset -c 3 bash -c "while true; do xfs_freeze -f a_mount; \
    xfs_freeze -u a_mount; done" &
  # taskset -c 3 bash -c "while true; do quotaon a_mount; \
    quotaoff a_mount; done" &

Adding cond_resched() to the retry loop fixes the issue. It acts as an RCU quiescent state, allowing synchronize_rcu() in percpu_down_write() to complete.

Fixes: 576215cffdef ("fs: Drop wait_unfrozen wait queue")
Signed-off-by: Abhishek Bapat <abhishekbapat@google.com>
Link: https://patch.msgid.link/20260115213103.1089129-1-abhishekbapat@google.com
Signed-off-by: Jan Kara <jack@suse.cz>