aboutsummaryrefslogtreecommitdiff
path: root/fs/gfs2
AgeCommit message (Collapse)AuthorFilesLines
3 daysMerge tag 'vfs-6.19-rc8.fixes' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs fixes from Christian Brauner: - Fix the the buggy conversion of fuse_reverse_inval_entry() introduced during the creation rework - Disallow nfs delegation requests for directories by setting simple_nosetlease() - Require an opt-in for getting readdir flag bits outside of S_DT_MASK set in d_type - Fix scheduling delayed writeback work by only scheduling when the dirty time expiry interval is non-zero and cancel the delayed work if the interval is set to zero - Use rounded_jiffies_interval for dirty time work - Check the return value of sb_set_blocksize() for romfs - Wait for batched folios to be stable in __iomap_get_folio() - Use private naming for fuse hash size - Fix the stale dentry cleanup to prevent a race that causes a UAF * tag 'vfs-6.19-rc8.fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: document d_dispose_if_unused() fuse: shrink once after all buckets have been scanned fuse: clean up fuse_dentry_tree_work() fuse: add need_resched() before unlocking bucket fuse: make sure dentry is evicted if stale fuse: fix race when disposing stale dentries fuse: use private naming for fuse hash size writeback: use round_jiffies_relative for dirtytime_work iomap: wait for batched folios to be stable in __iomap_get_folio romfs: check sb_set_blocksize() return value docs: clarify that dirtytime_expire_seconds=0 disables writeback writeback: fix 100% CPU usage when dirtytime_expire_interval is 0 readdir: require opt-in for d_type flags vboxsf: don't allow delegations to be set on directories ceph: don't allow delegations to be set on directories gfs2: don't allow delegations to be set on directories 9p: don't allow delegations to be set on directories smb/client: properly disallow delegations on directories nfs: properly disallow delegation requests on directories fuse: fix conversion of fuse_reverse_inval_entry() to start_removing()
2026-01-12Revert "gfs2: Fix use of bio_chain"Andreas Gruenbacher1-1/+1
This reverts commit 8a157e0a0aa5143b5d94201508c0ca1bb8cfb941. That commit incorrectly assumed that the bio_chain() arguments were swapped in gfs2. However, gfs2 intentionally constructs bio chains so that the first bio's bi_end_io callback is invoked when all bios in the chain have completed, unlike bio chains where the last bio's callback is invoked. Fixes: 8a157e0a0aa5 ("gfs2: Fix use of bio_chain") Cc: stable@vger.kernel.org Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-01-12gfs2: don't allow delegations to be set on directoriesJeff Layton1-0/+1
With the advent of directory leases, it's necessary to set the ->setlease() handler in directory file_operations to properly deny them. In the "nolock" case however, there is no need to deny them. Fixes: e6d28ebc17eb ("filelock: push the S_ISREG check down to ->setlease handlers") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260107-setlease-6-19-v1-4-85f034abcc57@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-03Merge tag 'gfs2-for-6.19' of ↵Linus Torvalds22-764/+382
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Major withdraw / error handling overhaul based on dlm's new DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like node failures. Make withdraws asynchronous - Fix a bug in commit e4a8b5481c59a that caused 'df' to remain out of sync. ('df' is still allowed to go slightly out of sync for short periods of time) - Prevent recusive memory reclaim in gfs2_unstuff_dinode() - Clean up SDF_JOURNAL_LIVE flag handling - Fix remote evict for read-only filesystems - Fix a misuse of bio_chain() - Various other minor cleanups * tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits) gfs2: Fix use of bio_chain gfs2: Clean up SDF_JOURNAL_LIVE flag handling gfs2: No longer thaw filesystems during a withdraw gfs2: Withdraw immediately in gfs2_trans_add_meta gfs2: New gfs2_withdraw_helper gfs2: Clean up properly during a withdraw gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks} Revert "gfs2: fix infinite loop when checking ail item count before go_inval" Revert "gfs2: Allow some glocks to be used during withdraw" Revert "gfs2: Check for log write errors before telling dlm to unlock" Revert "gfs2: fix a deadlock on withdraw-during-mount" Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6) Revert "gfs2: don't stop reads while withdraw in progress" gfs2: Rename LM_FLAG_{NOEXP -> RECOVER} gfs2: Kill gfs2_io_error_bh_wd ...
2025-12-02gfs2: Fix use of bio_chainAndreas Gruenbacher1-1/+1
In gfs2_chain_bio(), the call to bio_chain() has its arguments swapped. The result is leaked bios and incorrect synchronization (only the last bio will actually be waited for). This code is only used during mount and filesystem thaw, so the bug normally won't be noticeable. Reported-by: Stephen Zhang <starzhangzsd@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-12-01Merge tag 'vfs-6.19-rc1.folio' of ↵Linus Torvalds1-2/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull folio updates from Christian Brauner: "Add a new folio_next_pos() helper function that returns the file position of the first byte after the current folio. This is a common operation in filesystems when needing to know the end of the current folio. The helper is lifted from btrfs which already had its own version, and is now used across multiple filesystems and subsystems: - btrfs - buffer - ext4 - f2fs - gfs2 - iomap - netfs - xfs - mm This fixes a long-standing bug in ocfs2 on 32-bit systems with files larger than 2GiB. Presumably this is not a common configuration, but the fix is backported anyway. The other filesystems did not have bugs, they were just mildly inefficient. This also introduce uoff_t as the unsigned version of loff_t. A recent commit inadvertently changed a comparison from being unsigned (on 64-bit systems) to being signed (which it had always been on 32-bit systems), leading to sporadic fstests failures. Generally file sizes are restricted to being a signed integer, but in places where -1 is passed to indicate "up to the end of the file", it is convenient to have an unsigned type to ensure comparisons are always unsigned regardless of architecture" * tag 'vfs-6.19-rc1.folio' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Add uoff_t mm: Use folio_next_pos() xfs: Use folio_next_pos() netfs: Use folio_next_pos() iomap: Use folio_next_pos() gfs2: Use folio_next_pos() f2fs: Use folio_next_pos() ext4: Use folio_next_pos() buffer: Use folio_next_pos() btrfs: Use folio_next_pos() filemap: Add folio_next_pos()
2025-12-01Merge tag 'vfs-6.19-rc1.writeback' of ↵Linus Torvalds1-4/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull writeback updates from Christian Brauner: "Features: - Allow file systems to increase the minimum writeback chunk size. The relatively low minimal writeback size of 4MiB means that written back inodes on rotational media are switched a lot. Besides introducing additional seeks, this also can lead to extreme file fragmentation on zoned devices when a lot of files are cached relative to the available writeback bandwidth. This adds a superblock field that allows the file system to override the default size, and sets it to the zone size for zoned XFS. - Add logging for slow writeback when it exceeds sysctl_hung_task_timeout_secs. This helps identify tasks waiting for a long time and pinpoint potential issues. Recording the starting jiffies is also useful when debugging a crashed vmcore. - Wake up waiting tasks when finishing the writeback of a chunk Cleanups: - filemap_* writeback interface cleanups. Adding filemap_fdatawrite_wbc ended up being a mistake, as all but the original btrfs caller should be using better high level interfaces instead. This series removes all these low-level interfaces, switches btrfs to a more specific interface, and cleans up other too low-level interfaces. With this the writeback_control that is passed to the writeback code is only initialized in three places. - Remove __filemap_fdatawrite, __filemap_fdatawrite_range, and filemap_fdatawrite_wbc - Add filemap_flush_nr helper for btrfs - Push struct writeback_control into start_delalloc_inodes in btrfs - Rename filemap_fdatawrite_range_kick to filemap_flush_range - Stop opencoding filemap_fdatawrite_range in 9p, ocfs2, and mm - Make wbc_to_tag() inline and use it in fs" * tag 'vfs-6.19-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: Make wbc_to_tag() inline and use it in fs. xfs: set s_min_writeback_pages for zoned file systems writeback: allow the file system to override MIN_WRITEBACK_PAGES writeback: cleanup writeback_chunk_size mm: rename filemap_fdatawrite_range_kick to filemap_flush_range mm: remove __filemap_fdatawrite_range mm: remove filemap_fdatawrite_wbc mm: remove __filemap_fdatawrite mm,btrfs: add a filemap_flush_nr helper btrfs: push struct writeback_control into start_delalloc_inodes btrfs: use the local tmp_inode variable in start_delalloc_inodes ocfs2: don't opencode filemap_fdatawrite_range in ocfs2_journal_submit_inode_data_buffers 9p: don't opencode filemap_fdatawrite_range in v9fs_mmap_vm_close mm: don't opencode filemap_fdatawrite_range in filemap_invalidate_inode writeback: Add logging for slow writeback (exceeds sysctl_hung_task_timeout_secs) writeback: Wake up waiting tasks when finishing the writeback of a chunk.
2025-12-01Merge tag 'vfs-6.19-rc1.inode' of ↵Linus Torvalds5-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs inode updates from Christian Brauner: "Features: - Hide inode->i_state behind accessors. Open-coded accesses prevent asserting they are done correctly. One obvious aspect is locking, but significantly more can be checked. For example it can be detected when the code is clearing flags which are already missing, or is setting flags when it is illegal (e.g., I_FREEING when ->i_count > 0) - Provide accessors for ->i_state, converts all filesystems using coccinelle and manual conversions (btrfs, ceph, smb, f2fs, gfs2, overlayfs, nilfs2, xfs), and makes plain ->i_state access fail to compile - Rework I_NEW handling to operate without fences, simplifying the code after the accessor infrastructure is in place Cleanups: - Move wait_on_inode() from writeback.h to fs.h - Spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb for clarity - Cosmetic fixes to LRU handling - Push list presence check into inode_io_list_del() - Touch up predicts in __d_lookup_rcu() - ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage - Assert on ->i_count in iput_final() - Assert ->i_lock held in __iget() Fixes: - Add missing fences to I_NEW handling" * tag 'vfs-6.19-rc1.inode' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (22 commits) dcache: touch up predicts in __d_lookup_rcu() fs: push list presence check into inode_io_list_del() fs: cosmetic fixes to lru handling fs: rework I_NEW handling to operate without fences fs: make plain ->i_state access fail to compile xfs: use the new ->i_state accessors nilfs2: use the new ->i_state accessors overlayfs: use the new ->i_state accessors gfs2: use the new ->i_state accessors f2fs: use the new ->i_state accessors smb: use the new ->i_state accessors ceph: use the new ->i_state accessors btrfs: use the new ->i_state accessors Manual conversion to use ->i_state accessors of all places not covered by coccinelle Coccinelle-based conversion to use ->i_state accessors fs: provide accessors for ->i_state fs: spell out fenced ->i_state accesses with explicit smp_wmb/smp_rmb fs: move wait_on_inode() from writeback.h to fs.h fs: add missing fences to I_NEW handling ocfs2: retire ocfs2_drop_inode() and I_WILL_FREE usage ...
2025-11-26gfs2: Clean up SDF_JOURNAL_LIVE flag handlingAndreas Gruenbacher3-36/+33
Change do_withdraw() to clear the SDF_JOURNAL_LIVE flag under the log flush lock. In addition, change __gfs2_trans_begin() to check if the filesystem is already known to be withdrawn using gfs2_withdrawn(). Then, once we are holding the log flush lock, check if the SDF_JOURNAL_LIVE flag is still set. This second check ensures that the filesystem will remain live until the transaction is submitted. With these changes, it is no longer useful to clear SDF_JOURNAL_LIVE in gfs2_end_log_write() after calling gfs2_withdraw(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: No longer thaw filesystems during a withdrawAndreas Gruenbacher3-22/+0
Previously, when a withdraw occurred, we would wait for another node to recover our journal. This also meant that frozen filesystem needed to be thawed because otherwise, other nodes wouldn't be able to recover the filesystem. With the reversal of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"), we are no longer waiting for journal recovery during a withdraw, so we no longer need to thaw frozen filesystems, either. This also fixes a potential deadlock reported by lockdep when running xfstest generic/108. In addition, there is nothing left in do_withdraw() that would require taking sd_freeze_mutex, so don't bother taking that lock there anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Withdraw immediately in gfs2_trans_add_metaAndreas Gruenbacher1-4/+1
We can now withdraw while the log is locked. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: New gfs2_withdraw_helperAndreas Gruenbacher5-28/+90
Currently, when a gfs2 filesystem is withdrawn, an "offline" uevent is triggered that invokes gfs2-util's gfs2_withdraw_helper script. The purpose of this script is to deactivate the filesystem's block device so that it can be withdrawn immediately, even before all the filesystem's caches have been discarded. The script provided by gfs2-utils never did anything useful, and there was no way for it to report back its status to the kernel. To fix that, extend the gfs2_withdraw_helper mechanism so that the script can report one of the following results by writing the corresponding value into "/sys$DEVPATH/lock_module/withdraw": 0 - The shared block device has been marked inactive. Future write operations will fail. 1 - The shared block device may still be active and carry out write operations. If the "offline" uevent isn't reacted upon within the timeout configured in /sys$DEVPATH/tune/withdraw_helper_timeout (default 5 seconds), the event handler is assumed to have failed. In addition, add an additional "errors=deactivate" mount option. With these changes, if fatal errors are detected on a gfs2 filesystem and the filesystem is mounted with the "errors=panic" option, the kernel will panic immediately. Otherwise, an attempt will be made to deactivate the underlying block device. If successful, the kernel will release all cluster-wide locks immediately so that the rest of the cluster can continue. If unsuccessful, the kernel will either panic ("errors=deactivate"), or it will purge all filesystem I/O before releasing all cluster-wide locks ("errors=withdraw"). Note that the gfs2_withdraw_helper script still needs to be fixed to take advantage of these improvements. It could be changed to use a mechanism like LVM Persistent Reservations. "dmsetup suspend" is not a suitable mechanism as it infinitely postpones I/O operations, which may prevent withdraw from completing. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Clean up properly during a withdrawAndreas Gruenbacher1-37/+48
During a withdraw, we don't want to write out any more data than we have to, so in do_xmote(), skip the ->go_sync() glock operation. We still want to keep calling ->go_inval() to discard any cached data or metadata, whether clean or dirty. We do still allow glocks to transition into state LM_ST_UNLOCKED. This has the desired side effect of calling ->go_inval() and invalidating the glock caches. Function gfs2_withdraw_glocks() is already used for dequeuing any left-over waiters. We still want that to happen, but additionally, we want all glocks to be unlocked. Finally, we change function do_promote() to refuse any further promotions. This commit cleans up the leftovers of commit 86934198eefa ("gfs2: Clear flags when withdraw prevents xmote"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks}Andreas Gruenbacher3-5/+5
Rename function gfs2_gl_dq_holders() to gfs2_withdraw_glocks(). This function will soon be used for more than just dequeuing holders. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: fix infinite loop when checking ail item count before go_inval"Andreas Gruenbacher2-16/+4
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts commit 33dbd1e41a1d ("gfs2: fix infinite loop when checking ail item count before go_inval"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Allow some glocks to be used during withdraw"Andreas Gruenbacher4-46/+6
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts commit a72d2401f54b ("gfs2: Allow some glocks to be used during withdraw"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Check for log write errors before telling dlm to unlock"Andreas Gruenbacher1-16/+0
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts the rest of d93ae386ef3d ("gfs2: Check for log write errors before telling dlm to unlock"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: fix a deadlock on withdraw-during-mount"Andreas Gruenbacher1-35/+18
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts commit 865cc3e9cc0b ("gfs2: fix a deadlock on withdraw-during-mount"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6)Andreas Gruenbacher1-124/+3
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts parts of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6)Andreas Gruenbacher1-3/+2
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts parts of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6)Andreas Gruenbacher1-7/+3
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts parts of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6)Andreas Gruenbacher4-86/+0
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts parts of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6)Andreas Gruenbacher6-38/+0
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts parts of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6)Andreas Gruenbacher5-36/+1
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. Reverts parts of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26Revert "gfs2: don't stop reads while withdraw in progress"Andreas Gruenbacher5-22/+6
The current withdraw code duplicates the journal recovery code gfs2 already has for dealing with node failures, and it does so poorly. That code was added because when releasing a lockspace, we didn't have a way to indicate that the lockspace needs recovery. We now do have this feature, so the current withdraw code can be removed almost entirely. This is one of several steps towards that. The withdrawing node has no role in recovering from the withdraw anymore, so it also no longer needs to read metadata blocks after a withdraw. We now only need to set a single bit in gfs2_withdraw(), so switch from try_cmpxchg() to test_and_set_bit(). Reverts commit 8cc67f704f4b ("gfs2: don't stop reads while withdraw in progress"). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Rename LM_FLAG_{NOEXP -> RECOVER}Andreas Gruenbacher6-20/+23
GFS sets the LM_FLAG_NOEXP flag on locking requests it makes during journal recovery, so rename the flag to LM_FLAG_RECOVER for improved code readability. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Kill gfs2_io_error_bh_wdAndreas Gruenbacher5-22/+10
All callers of gfs2_io_error_bh() call gfs2_withdraw() as well, so change gfs2_io_error_bh() to call gfs2_withdraw() directly. This also brings it in line with other similar error reporting functions. With that, gfs2_io_error_bh() is the same as gfs2_io_error_bh_wd(), so remove the latter. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Withdraw immediately on log write errorsAndreas Gruenbacher2-14/+2
Now that gfs2_withdraw() is asynchronous, immediately withdraw when a log write error is detected. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Rename gfs2_{withdrawing_or_ => }withdrawnAndreas Gruenbacher15-43/+42
With delayed withdraws and the SDF_WITHDRAWING flag gone, we can now rename gfs2_withdrawing_or_withdrawn() back to gfs2_withdrawn(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Get rid of delayed withdrawsAndreas Gruenbacher8-74/+16
Now that gfs2_withdraw() is asynchronous, is can be called in any context and there is no more need for gfs2_withdraw_delayed() or for turning delayed withdraws into actual withdraws. Remove the now-obsolete code. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Asynchronous withdrawAndreas Gruenbacher5-25/+37
So far, withdraws are carried out in the context of the calling task. When another task tries to withdraw while a withdraw is already underway, that task blocks as well. Change that to carry out withdraws asynchronously in workqueue context and don't block the task triggering the withdraw anymore. Fixes: syzbot+6b156e132970e550194c@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Add clean argument to lm_unmount hookAndreas Gruenbacher4-5/+15
Add a 'clean' argument to ->lm_unmount() that indicates whether the filesystem is clean or needs recovery. Set clean to true for normal unmounts, and to false for withdraws. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Clean up quotad timeout handlingAndreas Gruenbacher1-31/+27
Instead of tracking the remaining time, track the deadline of each of the timeouts. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Fix "gfs2: Switch to wait_event in gfs2_quotad"Andreas Gruenbacher1-1/+1
Commit e4a8b5481c59a ("gfs2: Switch to wait_event in gfs2_quotad") broke cyclic statfs syncing, so the numbers reported by "df" could easily get completely out of sync with reality. Fix this by reverting part of commit e4a8b5481c59a for now. A follow-up commit will clean this code up later. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Minor cosmetic remote delete cleanupsAndreas Gruenbacher1-3/+9
Rename gfs2_try_evict() to gfs2_try_to_evict(). The GIF_DEFER_DELETE flag has been superceded by the GLF_DEFER_DELETE flag, so fix a left-over comment. Add a clarifying comment to delete_work_func(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: fix remote evict for read-only filesystemsAndreas Gruenbacher1-2/+1
When a node tries to delete an inode, it first requests exclusive access to the iopen glock. This triggers demote requests on all remote nodes currently holding the iopen glock. To satisfy those requests, the remote nodes evict the inode in question, or they poke the corresponding inode glock to signal that the inode is still in active use. This behavior doesn't depend on whether or not a filesystem is read-only, so remove the incorrect read-only check. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: fix freeze error handlingAlexey Velichayshiy1-3/+1
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"), the freeze error handling is broken because gfs2_do_thaw() overwrites the 'error' variable, causing incorrect processing of the original freeze error. Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean() fails but ignoring its return value to preserve the original freeze error for proper reporting. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic") Cc: stable@vger.kernel.org # v6.5+ Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Prevent recursive memory reclaimAndreas Gruenbacher4-2/+21
Function new_inode() returns a new inode with inode->i_mapping->gfp_mask set to GFP_HIGHUSER_MOVABLE. This value includes the __GFP_FS flag, so allocations in that address space can recurse into filesystem memory reclaim. We don't want that to happen because it can consume a significant amount of stack memory. Worse than that is that it can also deadlock: for example, in several places, gfs2_unstuff_dinode() is called inside filesystem transactions. This calls filemap_grab_folio(), which can allocate a new folio, which can trigger memory reclaim. If memory reclaim recurses into the filesystem and starts another transaction, a deadlock will ensue. To fix these kinds of problems, prevent memory reclaim from recursing into filesystem code by making sure that the gfp_mask of inode address spaces doesn't include __GFP_FS. The "meta" and resource group address spaces were already using GFP_NOFS as their gfp_mask (which doesn't include __GFP_FS). The default value of GFP_HIGHUSER_MOVABLE is less restrictive than GFP_NOFS, though. To avoid being overly limiting, use the default value and only knock off the __GFP_FS flag. I'm not sure if this will actually make a difference, but it also shouldn't hurt. This patch is loosely based on commit ad22c7a043c2 ("xfs: prevent stack overflows from page cache allocation"). Fixes xfstest generic/273. Fixes: dc0b9435238c ("gfs: Don't use GFP_NOFS in gfs2_unstuff_dinode") Reviewed-by: Andrew Price <anprice@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-25fs: rework I_NEW handling to operate without fencesMateusz Guzik1-1/+1
In the inode hash code grab the state while ->i_lock is held. If found to be set, synchronize the sleep once more with the lock held. In the real world the flag is not set most of the time. Apart from being simpler to reason about, it comes with a minor speed up as now clearing the flag does not require the smp_mb() fence. While here rename wait_on_inode() to wait_on_new_inode() to line it up with __wait_on_freeing_inode(). Christian Brauner <brauner@kernel.org> says: As per the discussion in [1] I folded in the diff sent in [2]. Link: https://lore.kernel.org/69238e4d.a70a0220.d98e3.006e.GAE@google.com [1] Link: https://lore.kernel.org/c2kpawomkbvtahjm7y5mposbhckb7wxthi3iqy5yr22ggpucrm@ufvxwy233qxo [2] Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20251010221737.1403539-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-11-09gfs2: Use bio_add_folio_nofail()Matthew Wilcox (Oracle)1-2/+1
As the label says, we've just allocated a new BIO so we know we can add this folio to it. We now have bio_add_folio_nofail() for this purpose. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-05iomap: add caller-provided callbacks for read and readaheadJoanne Koong1-3/+3
Add caller-provided callbacks for read and readahead so that it can be used generically, especially by filesystems that are not block-based. In particular, this: * Modifies the read and readahead interface to take in a struct iomap_read_folio_ctx that is publicly defined as: struct iomap_read_folio_ctx { const struct iomap_read_ops *ops; struct folio *cur_folio; struct readahead_control *rac; void *read_ctx; }; where struct iomap_read_ops is defined as: struct iomap_read_ops { int (*read_folio_range)(const struct iomap_iter *iter, struct iomap_read_folio_ctx *ctx, size_t len); void (*read_submit)(struct iomap_read_folio_ctx *ctx); }; read_folio_range() reads in the folio range and is required by the caller to provide. read_submit() is optional and is used for submitting any pending read requests. * Modifies existing filesystems that use iomap for read and readahead to use the new API, through the new statically inlined helpers iomap_bio_read_folio() and iomap_bio_readahead(). There is no change in functionality for those filesystems. Signed-off-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-31gfs2: document ip in __gfs2_holder_init kernel-doc commentSukrut Heroorkar1-1/+1
Building with W=1 reports: Warning: fs/gfs2/glock.c:1248 function parameter 'ip' not described in '__gfs2_holder_init' The ip parameter was added when __gfs2_holder_init started saving the gfs2_glock_nq_init caller's return address to gh_ip. This makes it easier to backtrack which holder took the lock. Document @ip to silence this warning. Fixes: b016d9a84abd ("gfs2: Save ip from gfs2_glock_nq_init") Signed-off-by: Sukrut Heroorkar <hsukrut3@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-10-31gfs2/sysfs: Replace sprintf/snprintf with sysfs_emitUtkarsh Singh1-15/+15
Documentation/filesystems/sysfs.rst mentions that show() should only use sysfs_emit() or sysfs_emit_at() when formatting values returned to user space. This patch updates the GFS2 sysfs interface accordingly. It replaces uses of sprintf() and snprintf() in all *_show() functions with sysfs_emit() to align with current kernel sysfs API best practices. It also updates the TUNE_ATTR_2 macro to use sysfs_emit() instead of snprintf(). Signed-off-by: Utkarsh Singh <utkarsh.singh.em@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-10-31gfs2: Use folio_next_pos()Matthew Wilcox (Oracle)1-2/+1
This is one instruction more efficient than open-coding folio_pos() + folio_size(). It's the equivalent of (x + y) << z rather than x << z + y << z. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20251024170822.1427218-7-willy@infradead.org Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: gfs2@lists.linux.dev Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-29fs: Make wbc_to_tag() inline and use it in fs.Julian Sun1-4/+1
The logic in wbc_to_tag() is widely used in file systems, so modify this function to be inline and use it in file systems. This patch has only passed compilation tests, but it should be fine. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20gfs2: use the new ->i_state accessorsMateusz Guzik4-5/+5
Change generated with coccinelle and fixed up by hand as appropriate. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-03Merge tag 'pull-finish_no_open' of ↵Linus Torvalds1-17/+9
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull finish_no_open updates from Al Viro: "finish_no_open calling conventions change to simplify callers" * tag 'pull-finish_no_open' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: slightly simplify nfs_atomic_open() simplify gfs2_atomic_open() simplify fuse_atomic_open() simplify nfs_atomic_open_v23() simplify vboxsf_dir_atomic_open() simplify cifs_atomic_open() 9p: simplify v9fs_vfs_atomic_open_dotl() 9p: simplify v9fs_vfs_atomic_open() allow finish_no_open(file, ERR_PTR(-E...))
2025-10-02Merge tag 'mm-stable-2025-10-01-19-00' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "mm, swap: improve cluster scan strategy" from Kairui Song improves performance and reduces the failure rate of swap cluster allocation - "support large align and nid in Rust allocators" from Vitaly Wool permits Rust allocators to set NUMA node and large alignment when perforning slub and vmalloc reallocs - "mm/damon/vaddr: support stat-purpose DAMOS" from Yueyang Pan extend DAMOS_STAT's handling of the DAMON operations sets for virtual address spaces for ops-level DAMOS filters - "execute PROCMAP_QUERY ioctl under per-vma lock" from Suren Baghdasaryan reduces mmap_lock contention during reads of /proc/pid/maps - "mm/mincore: minor clean up for swap cache checking" from Kairui Song performs some cleanup in the swap code - "mm: vm_normal_page*() improvements" from David Hildenbrand provides code cleanup in the pagemap code - "add persistent huge zero folio support" from Pankaj Raghav provides a block layer speedup by optionalls making the huge_zero_pagepersistent, instead of releasing it when its refcount falls to zero - "kho: fixes and cleanups" from Mike Rapoport adds a few touchups to the recently added Kexec Handover feature - "mm: make mm->flags a bitmap and 64-bit on all arches" from Lorenzo Stoakes turns mm_struct.flags into a bitmap. To end the constant struggle with space shortage on 32-bit conflicting with 64-bit's needs - "mm/swapfile.c and swap.h cleanup" from Chris Li cleans up some swap code - "selftests/mm: Fix false positives and skip unsupported tests" from Donet Tom fixes a few things in our selftests code - "prctl: extend PR_SET_THP_DISABLE to only provide THPs when advised" from David Hildenbrand "allows individual processes to opt-out of THP=always into THP=madvise, without affecting other workloads on the system". It's a long story - the [1/N] changelog spells out the considerations - "Add and use memdesc_flags_t" from Matthew Wilcox gets us started on the memdesc project. Please see https://kernelnewbies.org/MatthewWilcox/Memdescs and https://blogs.oracle.com/linux/post/introducing-memdesc - "Tiny optimization for large read operations" from Chi Zhiling improves the efficiency of the pagecache read path - "Better split_huge_page_test result check" from Zi Yan improves our folio splitting selftest code - "test that rmap behaves as expected" from Wei Yang adds some rmap selftests - "remove write_cache_pages()" from Christoph Hellwig removes that function and converts its two remaining callers - "selftests/mm: uffd-stress fixes" from Dev Jain fixes some UFFD selftests issues - "introduce kernel file mapped folios" from Boris Burkov introduces the concept of "kernel file pages". Using these permits btrfs to account its metadata pages to the root cgroup, rather than to the cgroups of random inappropriate tasks - "mm/pageblock: improve readability of some pageblock handling" from Wei Yang provides some readability improvements to the page allocator code - "mm/damon: support ARM32 with LPAE" from SeongJae Park teaches DAMON to understand arm32 highmem - "tools: testing: Use existing atomic.h for vma/maple tests" from Brendan Jackman performs some code cleanups and deduplication under tools/testing/ - "maple_tree: Fix testing for 32bit compiles" from Liam Howlett fixes a couple of 32-bit issues in tools/testing/radix-tree.c - "kasan: unify kasan_enabled() and remove arch-specific implementations" from Sabyrzhan Tasbolatov moves KASAN arch-specific initialization code into a common arch-neutral implementation - "mm: remove zpool" from Johannes Weiner removes zspool - an indirection layer which now only redirects to a single thing (zsmalloc) - "mm: task_stack: Stack handling cleanups" from Pasha Tatashin makes a couple of cleanups in the fork code - "mm: remove nth_page()" from David Hildenbrand makes rather a lot of adjustments at various nth_page() callsites, eventually permitting the removal of that undesirable helper function - "introduce kasan.write_only option in hw-tags" from Yeoreum Yun creates a KASAN read-only mode for ARM, using that architecture's memory tagging feature. It is felt that a read-only mode KASAN is suitable for use in production systems rather than debug-only - "mm: hugetlb: cleanup hugetlb folio allocation" from Kefeng Wang does some tidying in the hugetlb folio allocation code - "mm: establish const-correctness for pointer parameters" from Max Kellermann makes quite a number of the MM API functions more accurate about the constness of their arguments. This was getting in the way of subsystems (in this case CEPH) when they attempt to improving their own const/non-const accuracy - "Cleanup free_pages() misuse" from Vishal Moola fixes a number of code sites which were confused over when to use free_pages() vs __free_pages() - "Add Rust abstraction for Maple Trees" from Alice Ryhl makes the mapletree code accessible to Rust. Required by nouveau and by its forthcoming successor: the new Rust Nova driver - "selftests/mm: split_huge_page_test: split_pte_mapped_thp improvements" from David Hildenbrand adds a fix and some cleanups to the thp selftesting code - "mm, swap: introduce swap table as swap cache (phase I)" from Chris Li and Kairui Song is the first step along the path to implementing "swap tables" - a new approach to swap allocation and state tracking which is expected to yield speed and space improvements. This patchset itself yields a 5-20% performance benefit in some situations - "Some ptdesc cleanups" from Matthew Wilcox utilizes the new memdesc layer to clean up the ptdesc code a little - "Fix va_high_addr_switch.sh test failure" from Chunyu Hu fixes some issues in our 5-level pagetable selftesting code - "Minor fixes for memor