linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2026-01-28	Merge tag 'scrub-syzbot-fixes-7.0_2026-01-25' of ↵	Carlos Maiolino	23	-181/+115
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge xfs: syzbot fixes for online fsck [3/3] Fix various syzbot complaints about scrub that Jiaming Zhang found. With a bit of luck, this should all go splendidly. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-28	Merge tag 'attr-pptr-speedup-7.0_2026-01-25' of ↵	Carlos Maiolino	6	-17/+157
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge xfs: improve shortform attr performance [2/3] Improve performance of the xattr (and parent pointer) code when the attr structure is in short format and we can therefore perform all updates in a single transaction. Avoiding the attr intent code brings a very nice speedup in those operations. With a bit of luck, this should all go splendidly. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-28	Merge tag 'attr-leaf-freemap-fixes-7.0_2026-01-25' of ↵	Carlos Maiolino	3	-63/+155
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge xfs: fix problems in the attr leaf freemap code [1/3] Running generic/753 for hours revealed data corruption problems in the attr leaf block space management code. Under certain circumstances, freemap entries are left with zero size but a nonzero offset. If that offset happens to be the same offset as the end of the entries array during an attr set operation, the leaf entry table expansion will push the freemap record offset upwards without checking for overlap with any other freemap entries. If there happened to be a second freemap entry overlapping with the newly allocated leaf entry space, then the next attr set operation might find that space and overwrite the leaf entry, thereby corrupting the leaf block. Fix this by zeroing the freemap offset any time we set the size to zero. If a subsequent attr set operation finds no space in the freemap, it will compact the block and regenerate the freemaps. With a bit of luck, this should all go splendidly. Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-28	Merge tag 'health-monitoring-7.0_2026-01-20' of ↵	Carlos Maiolino	31	-26/+3044
	https://git.kernel.org/pub/scm/linux/kernel/git/djwong/xfs-linux into xfs-7.0-merge xfs: autonomous self healing of filesystems [v7] This patchset builds new functionality to deliver live information about filesystem health events to userspace. This is done by creating an anonymous file that can be read() for events by userspace programs. Events are captured by hooking various parts of XFS and iomap so that metadata health failures, file I/O errors, and major changes in filesystem state (unmounts, shutdowns, etc.) can be observed by programs. When an event occurs, the hook functions queue an event object to each event anonfd for later processing. Programs must have CAP_SYS_ADMIN to open the anonfd and there's a maximum event lag to prevent resource overconsumption. The events themselves can be read() from the anonfd as C structs for the xfs_healer daemon. In userspace, we create a new daemon program that will read the event objects and initiate repairs automatically. This daemon is managed entirely by systemd and will not block unmounting of the filesystem unless repairs are ongoing. They are auto-started by a starter service that uses fanotify. This patchset depends on the new fserror code that Christian Brauner has tentatively accepted for Linux 7.0: https://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs.git/log/?h=vfs-7.0.fserror v7: more cleanups of the media verification ioctl, improve comments, and reuse the bio v6: fix pi-breaking bugs, make verify failures trigger health reports and filter bio status flags better v5: add verify-media ioctl, collapse small helper funcs with only one caller v4: drop multiple client support so we can make direct calls into healthmon instead of chasing pointers and doing indirect calls v3: drag out of rfc status With a bit of luck, this should all go splendidly. Conflicts: This merge required an update on files: - fs/xfs/xfs_healthmon.c - fs/xfs/xfs_verify_media.c Such change was required because a parallel developement changed XFS header file xfs.h naming to xfs_platform.h, so the merge required to update those includes in both files above Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Signed-off-by: Carlos Maiolino <cem@kernel.org>
2026-01-28	isofs: support full length file names (255 instead of 253)	Shawn Landden	1	-1/+1
	Linux file names are in principle limited only to PATH_MAX (which is 4096) but the code in get_rock_ridge_filename() limits them to 253 characters. As mentioned by Jan Kara, the Rockridge standard to ECMA119/ISO9660 has no limit of file name length, but this limits file names to the traditional 255 NAME_MAX value. Signed-off-by: Shawn Landden <slandden@gmail.com> Link: https://patch.msgid.link/CA+49okq0ouJvAx0=txR_gyNKtZj55p3Zw4MB8jXZsGr4bEGjRA@mail.gmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2026-01-28	erofs: mark inodes without acls in erofs_read_inode()	Gao Xiang	3	-1/+26
	Similar to commit 91ef18b567da ("ext4: mark inodes without acls in __ext4_iget()"), the ACL state won't be read when the file owner performs a lookup, and the RCU fast path for lookups won't work because the ACL state remains unknown. If there are no extended attributes, or if the xattr filter indicates that no ACL xattr is present, call cache_no_acl() directly. Reviewed-by: Hongbo Li <lihongbo22@huawei.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-01-27	fs/ntfs3: Fix slab-out-of-bounds read in DeleteIndexEntryRoot	Jiasheng Jiang	1	-0/+3
	In the 'DeleteIndexEntryRoot' case of the 'do_action' function, the entry size ('esize') is retrieved from the log record without adequate bounds checking. Specifically, the code calculates the end of the entry ('e2') using: e2 = Add2Ptr(e1, esize); It then calculates the size for memmove using 'PtrOffset(e2, ...)', which subtracts the end pointer from the buffer limit. If 'esize' is maliciously large, 'e2' exceeds the used buffer size. This results in a negative offset which, when cast to size_t for memmove, interprets as a massive unsigned integer, leading to a heap buffer overflow. This commit adds a check to ensure that the entry size ('esize') strictly fits within the remaining used space of the index header before performing memory operations. Fixes: b46acd6a6a62 ("fs/ntfs3: Add NTFS journal") Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Signed-off-by: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
2026-01-26	fat: avoid parent link count underflow in rmdir	Zhiyu Zhang	2	-2/+12
	Corrupted FAT images can leave a directory inode with an incorrect i_nlink (e.g. 2 even though subdirectories exist). rmdir then unconditionally calls drop_nlink(dir) and can drive i_nlink to 0, triggering the WARN_ON in drop_nlink(). Add a sanity check in vfat_rmdir() and msdos_rmdir(): only drop the parent link count when it is at least 3, otherwise report a filesystem error. Link: https://lkml.kernel.org/r/20260101111148.1437-1-zhiyuzhang999@gmail.com Fixes: 9a53c3a783c2 ("[PATCH] r/o bind mounts: unlink: monitor i_nlink") Signed-off-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Reported-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Closes: https://lore.kernel.org/linux-fsdevel/aVN06OKsKxZe6-Kv@casper.infradead.org/T/#t Tested-by: Zhiyu Zhang <zhiyuzhang999@gmail.com> Acked-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Christian Brauner <brauner@kernel.org> Cc: Jan Kara <jack@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26	ocfs2: add check for free bits before allocation in ocfs2_move_extent()	Deepanshu Kartikey	1	-1/+6
	Add a check to verify the group descriptor has enough free bits before attempting allocation in ocfs2_move_extent(). This prevents a kernel BUG_ON crash in ocfs2_block_group_set_bits() when the move_extents ioctl is called on a crafted or corrupted filesystem. The existing validation in ocfs2_validate_gd_self() only checks static metadata consistency (bg_free_bits_count <= bg_bits) when the descriptor is first read from disk. However, during move_extents operations, multiple allocations can exhaust the free bits count below the requested allocation size, triggering BUG_ON(le16_to_cpu(bg->bg_free_bits_count) < num_bits). The debug trace shows the issue clearly: - Block group 32 validated with bg_free_bits_count=427 - Repeated allocations decreased count: 427 -> 171 -> 43 -> ... -> 1 - Final request for 2 bits with only 1 available triggers BUG_ON By adding an early check in ocfs2_move_extent() right after ocfs2_find_victim_alloc_group(), we return -ENOSPC gracefully instead of crashing the kernel. This also avoids unnecessary work in ocfs2_probe_alloc_group() and __ocfs2_move_extent() when the allocation will fail. Link: https://lkml.kernel.org/r/20260104133504.14810-1-kartikey406@gmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reported-by: syzbot+7960178e777909060224@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=7960178e777909060224 Link: https://lore.kernel.org/all/20251231115801.293726-1-kartikey406@gmail.com/T/ [v1] Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-26	ocfs2: adjust function name reference	Julia Lawall	1	-1/+1
	There is no function dlm_mast_regions(). However, dlm_match_regions() is passed the buffer "local", which it uses internally, so it seems like dlm_match_regions() was intended. Link: https://lkml.kernel.org/r/20251230142513.95467-1-Julia.Lawall@inria.fr Signed-off-by: Julia Lawall <Julia.Lawall@inria.fr> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-27	f2fs: decrease maximum flush retry count in f2fs_enable_checkpoint()	Chao Yu	2	-1/+3
	It's rare case that sync_inodes_sb() always skips to flush some drity datas, so it's enough to give extra three more chances to flush data. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: optimize NAT block loading during checkpoint write	Yongpeng Yang	1	-1/+13
	Under stress tests with frequent metadata operations, checkpoint write time can become excessively long. Analysis shows that the slowdown is caused by synchronous, one-by-one reads of NAT blocks during checkpoint processing. The issue can be reproduced with the following workload: 1. seq 1 650000 \| xargs -P 16 -n 1 touch 2. sync # avoid checkpoint write during deleting 3. delete 1 file every 455 files 4. echo 3 > /proc/sys/vm/drop_caches 5. sync # trigger checkpoint write This patch submits read I/O for all NAT blocks required in the __flush_nat_entry_set() phase in advance, reducing the overhead of synchronous waiting for individual NAT block reads. The NAT block flush latency before and after the change is as below: \| \|NAT blocks accessed\|NAT blocks read\|Flush time (ms)\| \|-------------\|-------------------\|---------------\|---------------\| \|Before change\|1205 \|1191 \|158 \| \|After change \|1264 \|1242 \|11 \| With a similar number of NAT blocks accessed and read from disk, adding NAT block readahead reduces the total NAT block flush time by more than 90%. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: change size parameter of __has_cursum_space() to unsigned int	Yongpeng Yang	1	-1/+1
	All callers of __has_cursum_space() pass an unsigned int value as the size parameter. Change the parameter type to unsigned int accordingly. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: add write latency stats for NAT and SIT blocks in f2fs_write_checkpoint	Yongpeng Yang	2	-2/+6
	This patch adds separate write latency accounting for NAT and SIT blocks in f2fs_write_checkpoint(). Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: pin files do not require sbi->writepages lock for ordering	Yongpeng Yang	1	-0/+2
	For pinned files, the file mapping is already established before writing, and since the writes are in IPU, there is no need to acquire the sbi->writepages lock to guarantee write ordering. Signed-off-by: Yongpeng Yang <yangyongpeng@xiaomi.com> Reviewed-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: fix to show simulate_lock_timeout correctly	Chao Yu	1	-1/+2
	Commit d36de29f4bb5 ("f2fs: sysfs: introduce inject_lock_timeout") introduces a bug as below, fix it. cat /sys/fs/f2fs/vdx/inject_lock_timeout s/fs/f2fs/vdx/inject_lock_timeout: Invalid argument Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: introduce FAULT_SKIP_WRITE	Chao Yu	3	-0/+6
	In order to simulate skipped write during enable_checkpoint(). Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-27	f2fs: check skipped write in f2fs_enable_checkpoint()	Chao Yu	4	-4/+55
	This patch introduces sbi->nr_pages[F2FS_SKIPPED_WRITE] to record any skipped write during data flush in f2fs_enable_checkpoint(). So in the loop of data flush, if there is any skipped write in previous flush, let's retry sync_inode_sb(), otherwise, all dirty data written before f2fs_enable_checkpoint() should have been persisted, then break the retry loop. Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org>
2026-01-26	Merge tag 'vfs-6.19-rc8.fixes' of ↵	Linus Torvalds	13	-39/+75
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix the the buggy conversion of fuse_reverse_inval_entry() introduced during the creation rework - Disallow nfs delegation requests for directories by setting simple_nosetlease() - Require an opt-in for getting readdir flag bits outside of S_DT_MASK set in d_type - Fix scheduling delayed writeback work by only scheduling when the dirty time expiry interval is non-zero and cancel the delayed work if the interval is set to zero - Use rounded_jiffies_interval for dirty time work - Check the return value of sb_set_blocksize() for romfs - Wait for batched folios to be stable in __iomap_get_folio() - Use private naming for fuse hash size - Fix the stale dentry cleanup to prevent a race that causes a UAF * tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: document d_dispose_if_unused() fuse: shrink once after all buckets have been scanned fuse: clean up fuse_dentry_tree_work() fuse: add need_resched() before unlocking bucket fuse: make sure dentry is evicted if stale fuse: fix race when disposing stale dentries fuse: use private naming for fuse hash size writeback: use round_jiffies_relative for dirtytime_work iomap: wait for batched folios to be stable in __iomap_get_folio romfs: check sb_set_blocksize() return value docs: clarify that dirtytime_expire_seconds=0 disables writeback writeback: fix 100% CPU usage when dirtytime_expire_interval is 0 readdir: require opt-in for d_type flags vboxsf: don't allow delegations to be set on directories ceph: don't allow delegations to be set on directories gfs2: don't allow delegations to be set on directories 9p: don't allow delegations to be set on directories smb/client: properly disallow delegations on directories nfs: properly disallow delegation requests on directories fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-26	xdrgen: Add enum value validation to generated decoders	Chuck Lever	2	-20/+87
	XDR enum decoders generated by xdrgen do not verify that incoming values are valid members of the enum. Incoming out-of-range values from malicious or buggy peers propagate through the system unchecked. Add validation logic to generated enum decoders using a switch statement that explicitly lists valid enumerator values. The compiler optimizes this to a simple range check when enum values are dense (contiguous), while correctly rejecting invalid values for sparse enums with gaps in their value ranges. The --no-enum-validation option on the source subcommand disables this validation when not needed. The minimum and maximum fields in _XdrEnum, which were previously unused placeholders for a range-based validation approach, have been removed since the switch-based validation handles both dense and sparse enums correctly. Because the new mechanism results in substantive changes to generated code, existing .x files are regenerated. Unrelated white space and semicolon changes in the generated code are due to recent commit 1c873a2fd110 ("xdrgen: Don't generate unnecessary semicolon") and commit 38c4df91242b ("xdrgen: Address some checkpatch whitespace complaints"). Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	nfsd: fix return error code for nfsd_map_name_to_[ug]id	Anthony Iliopoulos	1	-0/+4
	idmap lookups can time out while the cache is waiting for a userspace upcall reply. In that case cache_check() returns -ETIMEDOUT to callers. The nfsd_map_name_to_[ug]id functions currently proceed with attempting to map the id to a kuid despite a potentially temporary failure to perform the idmap lookup. This results in the code returning the error NFSERR_BADOWNER which can cause client operations to return to userspace with failure. Fix this by returning the failure status before attempting kuid mapping. This will return NFSERR_JUKEBOX on idmap lookup timeout so that clients can retry the operation instead of aborting it. Fixes: 65e10f6d0ab0 ("nfsd: Convert idmap to use kuids and kgids") Cc: stable@vger.kernel.org Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	nfsd: never defer requests during idmap lookup	Anthony Iliopoulos	3	-8/+58
	During v4 request compound arg decoding, some ops (e.g. SETATTR) can trigger idmap lookup upcalls. When those upcall responses get delayed beyond the allowed time limit, cache_check() will mark the request for deferral and cause it to be dropped. This prevents nfs4svc_encode_compoundres from being executed, and thus the session slot flag NFSD4_SLOT_INUSE never gets cleared. Subsequent client requests will fail with NFSERR_JUKEBOX, given that the slot will be marked as in-use, making the SEQUENCE op fail. Fix this by making sure that the RQ_USEDEFERRAL flag is always clear during nfs4svc_decode_compoundargs(), since no v4 request should ever be deferred. Fixes: 2f425878b6a7 ("nfsd: don't use the deferral service, return NFS4ERR_DELAY") Signed-off-by: Anthony Iliopoulos <ailiop@suse.com> Reviewed-by: NeilBrown <neil@brown.name> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	NFSD: fix setting FMODE_NOCMTIME in nfs4_open_delegation	Olga Kornievskaia	1	-1/+2
	fstests generic/215 and generic/407 were failing because the server wasn't updating mtime properly. When deleg attribute support is not compiled in and thus no attribute delegation was given, the server was skipping updating mtime and ctime because FMODE_NOCMTIME was uncoditionally set for the write delegation. Fixes: e5e9b24ab8fa ("nfsd: freeze c/mtime updates with outstanding WRITE_ATTRS delegation") Cc: stable@vger.kernel.org Signed-off-by: Olga Kornievskaia <okorniev@redhat.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	nfsd: fix nfs4_file refcount leak in nfsd_get_dir_deleg()	Jeff Layton	1	-1/+4
	Claude pointed out that there is a nfs4_file refcount leak in nfsd_get_dir_deleg(). Ensure that the reference to "fp" is released before returning. Fixes: 8b99f6a8c116 ("nfsd: wire up GET_DIR_DELEGATION handling") Cc: stable@vger.kernel.org Cc: Chris Mason <clm@meta.com> Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	nfsd: use workqueue enable/disable APIs for v4_end_grace sync	NeilBrown	2	-14/+9
	"nfsd: provide locking for v4_end_grace" introduced a client_tracking_active flag protected by nn->client_lock to prevent the laundromat from being scheduled before client tracking initialization or after shutdown begins. That commit is suitable for backporting to LTS kernels that predate commit 86898fa6b8cd ("workqueue: Implement disable/enable for (delayed) work items"). However, the workqueue subsystem in recent kernels provides enable_delayed_work() and disable_delayed_work_sync() for this purpose. Using this mechanism enable us to remove the client_tracking_active flag and associated spinlock operations while preserving the same synchronization guarantees, which is a cleaner long-term approach. Signed-off-by: NeilBrown <neil@brown.name> Tested-by: Li Lingfeng <lilingfeng3@huawei.com> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	NFS: NFSERR_INVAL is not defined by NFSv2	Chuck Lever	2	-2/+2
	A documenting comment in include/uapi/linux/nfs.h claims incorrectly that NFSv2 defines NFSERR_INVAL. There is no such definition in either RFC 1094 or https://pubs.opengroup.org/onlinepubs/9629799/chap7.htm NFS3ERR_INVAL is introduced in RFC 1813. NFSD returns NFSERR_INVAL for PROC_GETACL, which has no specification (yet). However, nfsd_map_status() maps nfserr_symlink and nfserr_wrong_type to nfserr_inval, which does not align with RFC 1094. This logic was introduced only recently by commit 438f81e0e92a ("nfsd: move error choice for incorrect object types to version-specific code."). Given that we have no INVAL or SERVERFAULT status in NFSv2, probably the only choice is NFSERR_IO. Fixes: 438f81e0e92a ("nfsd: move error choice for incorrect object types to version-specific code.") Reviewed-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	nfsd: prefix notification in nfsd4_finalize_deleg_timestamps() with "nfsd: "	Jeff Layton	1	-1/+1
	Make it distinct that this message comes from nfsd. Signed-off-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	locks: ensure vfs_test_lock() never returns FILE_LOCK_DEFERRED	NeilBrown	2	-7/+14
	FILE_LOCK_DEFERRED can be returned when creating or removing a lock, but not when testing for a lock. This support was explicitly removed in Commit 09802fd2a8ca ("lockd: rip out deferred lock handling from testlock codepath") However the test in nlmsvc_testlock() suggests that it can be returned, only nlm cannot handle it. To aid clarity, remove the test and instead put a similar test and warning in vfs_test_lock(). If the impossible happens, convert FILE_LOCK_DEFERRED to -EIO. Signed-off-by: NeilBrown <neil@brown.name> Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	NFSD: Add instructions on how to deal with xdrgen files	Chuck Lever	1	-1/+9
	xdrgen requires a number of Python packages on the build system. We don't want to add these to the kernel build dependency list, which is long enough already. The generated files are generated manually using $ cd fs/nfsd && make xdrgen whenever the .x files are modified, then they are checked into the kernel repo so others do not need to rebuild them. Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	NFSD: Clean up nfsd4_check_open_attributes()	Chuck Lever	1	-19/+21
	The @rqstp parameter was introduced in commit 3c8e03166ae2 ("NFSv4: do exact check about attribute specified") but has never been used. Reduce indentation in callers to improve readability. Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
2026-01-26	gfs2: Fix slab-use-after-free in qd_put	Andreas Gruenbacher	1	-0/+1
	Commit a475c5dd16e5 ("gfs2: Free quota data objects synchronously") started freeing quota data objects during filesystem shutdown instead of putting them back onto the LRU list, but it failed to remove these objects from the LRU list, causing LRU list corruption. This caused use-after-free when the shrinker (gfs2_qd_shrink_scan) tried to access already-freed objects on the LRU list. Fix this by removing qd objects from the LRU list before freeing them in qd_put(). Initial fix from Deepanshu Kartikey <kartikey406@gmail.com>. Fixes: a475c5dd16e5 ("gfs2: Free quota data objects synchronously") Reported-by: syzbot+046b605f01802054bff0@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=046b605f01802054bff0 Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: Introduce glock_{type,number,sbd} helpers	Andreas Gruenbacher	12	-112/+124
	Introduce glock_type(), glock_number(), and glock_sbd() helpers for accessing a glock's type, number, and super block pointer more easily. Created with Coccinelle using the following semantic patch: @@ struct gfs2_glock gl; @@ - gl->gl_name.ln_type + glock_type(gl) @@ struct gfs2_glock gl; @@ - gl->gl_name.ln_number + glock_number(gl) @@ struct gfs2_glock gl; @@ - gl->gl_name.ln_sbd + glock_sbd(gl) glock_sbd() is a macro because it is used with const as well as non-const struct gfs2_glock arguments. Instances in macro definitions, particularly in tracepoint definitions, replaced by hand. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: gfs2_glock_hold cleanup	Andreas Gruenbacher	1	-2/+2
	Use lockref_get_not_dead() instead of an unguarded __lockref_is_dead() check. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs: Use fixed GL_GLOCK_MIN_HOLD time	Andreas Gruenbacher	1	-1/+1
	GL_GLOCK_MIN_HOLD represents the minimum time (in jiffies) that a glock should be held before being eligible for release. It is currently defined as 10, meaning that the duration depends on the timer interrupt frequency (CONFIG_HZ). Change that time to a constant 10ms independent of CONFIG_HZ. On CONFIG_HZ=1000 systems, the value remains the same. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: Fix gfs2_log_get_bio argument type	Andreas Gruenbacher	1	-3/+3
	Fix the type of gfs2_log_get_bio()'s op argument: callers pass in a blk_opf_t value and the function passes that value on as a blk_opf_t value, so the current argument type makes no sense. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: gfs2_chain_bio start sector fix	Andreas Gruenbacher	1	-3/+3
	Pass the start sector into gfs2_chain_bio(): the new bio isn't necessarily contiguous with the previous one. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: Initialize bio->bi_opf early	Andreas Gruenbacher	3	-22/+26
	Pass the right blk_opf_t value to bio_alloc() so that ->bi_ops is initialized correctly and doesn't have to be changed later. Adjust the call chain to pass that value through to where it is needed (and only there). Add a separate blk_opf_t argument to gfs2_chain_bio() instead of copying the value from the previous bio. Fixes: 8a157e0a0aa5 ("gfs2: Fix use of bio_chain") Reported-by: syzbot+f6539d4ce3f775aee0cc@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=f6539d4ce3f775aee0cc Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: Rename gfs2_log_submit_{bio -> write}	Andreas Gruenbacher	3	-6/+6
	Rename gfs2_log_submit_bio() to gfs2_log_submit_write(): this function isn't used for submitting log reads. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: Do not cancel internal demote requests	Andreas Gruenbacher	3	-15/+35
	Trying to change the state of a glock may result in a "conversion deadlock" error, indicating that the requested state transition would cause a deadlock. In this case, we unlock and retry the state change. It makes no sense to try canceling those unlock requests, but the current code is not aware of that. In addition, if a locking request is canceled shortly after it is made, the cancelation request can currently overtake the locking request. This may result in deadlocks. Fix both of these bugs by repurposing the GLF_PENDING_REPLY flag into a GLF_MAY_CANCEL flag which is set only when the current locking request can be canceled. When trying to cancel a locking request in gfs2_glock_dq(), wait for this flag to be set. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: run_queue cleanup	Andreas Gruenbacher	1	-13/+7
	In run_queue(), instead of always setting the GLF_LOCK flag, only set it when the flag is actually needed. This avoids having to undo the flag setting later. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: Retries missing in gfs2_{rename,exchange}	Andreas Gruenbacher	3	-14/+43
	Fix a bug in gfs2's asynchronous glock handling for rename and exchange operations. The original async implementation from commit ad26967b9afa ("gfs2: Use async glocks for rename") mentioned that retries were needed but never implemented them, causing operations to fail with -ESTALE instead of retrying on timeout. Also makes the waiting interruptible. In addition, the timeouts used were too high for situations in which timing out is a rare but expected scenario. Switch to shorter timeouts with randomization and exponentional backoff. Fixes: ad26967b9afa ("gfs2: Use async glocks for rename") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-26	gfs2: glock cancelation flag fix	Andreas Gruenbacher	1	-0/+2
	When an asynchronous glock holder is dequeued that hasn't been granted yet (HIF_HOLDER not set) and no dlm operation is in progress on behalf of that holder (GLF_LOCK not set), the dequeuing takes place in __gfs2_glock_dq(). There, we are not clearing the HIF_WAIT flag and waking up waiters. Fix that. This bug prevents the same holder from being enqueued later (gfs2_glock_nq()) without first reinitializing it (gfs2_holder_reinit()). The code doesn't currently use this pattern, but this will change in the next commit. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-24	xfs: check for deleted cursors when revalidating two btrees	Darrick J. Wong	2	-3/+32
	The free space and inode btree repair functions will rebuild both btrees at the same time, after which it needs to evaluate both btrees to confirm that the corruptions are gone. However, Jiaming Zhang ran syzbot and produced a crash in the second xchk_allocbt call. His root-cause analysis is as follows (with minor corrections): In xrep_revalidate_allocbt(), xchk_allocbt() is called twice (first for BNOBT, second for CNTBT). The cause of this issue is that the first call nullified the cursor required by the second call. Let's first enter xrep_revalidate_allocbt() via following call chain: xfs_file_ioctl() -> xfs_ioc_scrubv_metadata() -> xfs_scrub_metadata() -> `sc->ops->repair_eval(sc)` -> xrep_revalidate_allocbt() xchk_allocbt() is called twice in this function. In the first call: /* Note that sc->sm->sm_type is XFS_SCRUB_TYPE_BNOPT now / xchk_allocbt() -> xchk_btree() -> `bs->scrub_rec(bs, recp)` -> xchk_allocbt_rec() -> xchk_allocbt_xref() -> xchk_allocbt_xref_other() since sm_type is XFS_SCRUB_TYPE_BNOBT, pur is set to &sc->sa.cnt_cur. Kernel called xfs_alloc_get_rec() and returned -EFSCORRUPTED. Call chain: xfs_alloc_get_rec() -> xfs_btree_get_rec() -> xfs_btree_check_block() -> (XFS_IS_CORRUPT \|\| XFS_TEST_ERROR), the former is false and the latter is true, return -EFSCORRUPTED. This should be caused by ioctl$XFS_IOC_ERROR_INJECTION I guess. Back to xchk_allocbt_xref_other(), after receiving -EFSCORRUPTED from xfs_alloc_get_rec(), kernel called xchk_should_check_xref(). In this function, curpp (points to sc->sa.cnt_cur) is nullified. Back to xrep_revalidate_allocbt(), since sc->sa.cnt_cur has been nullified, it then triggered null-ptr-deref via xchk_allocbt() (second call) -> xchk_btree(). So. The bnobt revalidation failed on a cross-reference attempt, so we deleted the cntbt cursor, and then crashed when we tried to revalidate the cntbt. Therefore, check for a null cntbt cursor before that revalidation, and mark the repair incomplete. Also we can ignore the second tree entirely if the first tree was rebuilt but is already corrupt. Apply the same fix to xrep_revalidate_iallocbt because it has the same problem. Cc: r772577952@gmail.com Link: https://lore.kernel.org/linux-xfs/CANypQFYU5rRPkTy=iG5m1Lp4RWasSgrHXAh3p8YJojxV0X15dQ@mail.gmail.com/T/#m520c7835fad637eccf843c7936c200589427cc7e Cc: <stable@vger.kernel.org> # v6.8 Fixes: dbfbf3bdf639a2 ("xfs: repair inode btrees") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jiaming Zhang <r772577952@gmail.com>
2026-01-24	xfs: fix UAF in xchk_btree_check_block_owner	Darrick J. Wong	1	-2/+5
	We cannot dereference bs->cur when trying to determine if bs->cur aliases bs->sc->sa.{bno,rmap}_cur after the latter has been freed. Fix this by sampling before type before any freeing could happen. The correct temporal ordering was broken when we removed xfs_btnum_t. Cc: r772577952@gmail.com Cc: <stable@vger.kernel.org> # v6.9 Fixes: ec793e690f801d ("xfs: remove xfs_btnum_t") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jiaming Zhang <r772577952@gmail.com>
2026-01-24	xfs: check return value of xchk_scrub_create_subord	Darrick J. Wong	3	-1/+7
	Fix this function to return NULL instead of a mangled ENOMEM, then fix the callers to actually check for a null pointer and return ENOMEM. Most of the corrections here are for code merged between 6.2 and 6.10. Cc: r772577952@gmail.com Cc: <stable@vger.kernel.org> # v6.12 Fixes: 1a5f6e08d4e379 ("xfs: create subordinate scrub contexts for xchk_metadata_inode_subtype") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jiaming Zhang <r772577952@gmail.com>
2026-01-24	xfs: only call xf{array,blob}_destroy if we have a valid pointer	Darrick J. Wong	5	-9/+24
	Only call the xfarray and xfblob destructor if we have a valid pointer, and be sure to null out that pointer afterwards. Note that this patch fixes a large number of commits, most of which were merged between 6.9 and 6.10. Cc: r772577952@gmail.com Cc: <stable@vger.kernel.org> # v6.12 Fixes: ab97f4b1c03075 ("xfs: repair AGI unlinked inode bucket lists") Signed-off-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Jiaming Zhang <r772577952@gmail.com>
2026-01-23	et4: allow zeroout when doing written to unwritten split	Ojaswin Mujoo	2	-9/+122
	Currently, when we are doing an extent split and convert operation of written to unwritten extent (example, as done by ZERO_RANGE), we don't allow the zeroout fallback in case the extent tree manipulation fails. This is mostly because zeroout might take unsually long and the fact that this code path is more tolerant to failures than endio. Since we have zeroout machinery in place, we might as well use it hence lift this restriction. To mitigate zeroout taking too long respect the max zeroout limit here so that the operation finishes relatively fast. Also, add kunit tests for this case. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/1c3349020b8e098a63f293b84bc8a9b56011cef4.1769149131.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-01-23	ext4: refactor split and convert extents	Ojaswin Mujoo	1	-165/+112
	ext4_split_convert_extents() has been historically prone to subtle bugs and inconsistent behavior due to the way all the various flags interact with the extent split and conversion process. For example, callers like ext4_convert_unwritten_extents_endio() and convert_initialized_extents() needed to open code extent conversion despite passing CONVERT or CONVERT_UNWRITTEN flags because ext4_split_convert_extents() wasn't performing the conversion. Hence, refactor ext4_split_convert_extents() to clearly enforce the semantics of each flag. The major changes here are: * Clearly separate the split and convert process: * ext4_split_extent() and ext4_split_extent_at() are now only responsible to perform the split. * ext4_split_convert_extents() is now responsible to perform extent conversion after calling ext4_split_extent() for splitting. * This helps get rid of all the MARK_UNWRIT* flags. * Clearly enforce the semantics of flags passed to ext4_split_convert_extents(): * EXT4_GET_BLOCKS_CONVERT: Will convert the split extent to written * EXT4_GET_BLOCKS_CONVERT_UNWRITTEN: Will convert the split extent to unwritten * Modify all callers to enforce the above semantics. * Use ext4_split_convert_extents() instead of ext4_split_extents() in ext4_ext_convert_to_initialized() for uniformity. * Now that ext4_split_convert_extents() is handling caching to es, we dont need to do it in ext4_split_extent_zeroout(). * Cleanup all callers open coding the conversion logic. Further, modify kuniy tests to pass flags based on the new semantics. >From an end user point of view, we should not see any changes in behavior of ext4. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/2084a383d69ceefbaa293b8fcf725365eca0a349.1769149131.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-01-23	ext4: refactor zeroout path and handle all cases	Ojaswin Mujoo	1	-98/+185
	Currently, zeroout is used as a fallback in case we fail to split/convert extents in the "traditional" modify-the-extent-tree way. This is essential to mitigate failures in critical paths like extent splitting during endio. However, the logic is very messy and not easy to follow. Further, the fragile use of various flags has made it prone to errors. Refactor zeroout out logic by moving it up to ext4_split_extents(). Further, zeroout correctly based on the type of conversion we want, ie: - unwritten to written: Zeroout everything around the mapped range. - written to unwritten: Zeroout only the mapped range. Also, ext4_ext_convert_to_initialized() now passes EXT4_GET_BLOCKS_CONVERT to make the intention clear. Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/e1b51dedeca7c0b1f702141d91edfe4230560e7b.1769149131.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-01-23	ext4: propagate flags to ext4_convert_unwritten_extents_endio()	Ojaswin Mujoo	1	-6/+3
	Currently, callers like ext4_convert_unwritten_extents() pass EXT4_EX_NOCACHE flag to avoid caching extents however this is not respected by ext4_convert_unwritten_extents_endio(). Hence, modify it to accept flags from the caller and to pass the flags on to other extent manipulation functions it calls. This makes sure the NOCACHE flag is respected throughout the code path. Also, since the caller already passes METADATA_NOFAIL and CONVERT flags we don't need to explicitly pass it anymore. Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Signed-off-by: Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/7c2139e0ad32c49c19b194f72219e15d613de284.1769149131.git.ojaswin@linux.ibm.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>