aboutsummaryrefslogtreecommitdiff
path: root/fs/gfs2/super.c
AgeCommit message (Collapse)AuthorFilesLines
2026-04-15Merge tag 'gfs2-for-7.1' of ↵Linus Torvalds1-24/+87
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix possible data loss during inode evict - Fix a race during bufdata allocation - More careful cleaning up during a withdraw - Prevent excessive log flushing under memory pressure - Various other minor fixes and cleanups * tag 'gfs2-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: prevent NULL pointer dereference during unmount gfs2: hide error messages after withdraw gfs2: wait for withdraw earlier during unmount gfs2: inode directory consistency checks gfs2: gfs2_log_flush withdraw fixes gfs2: add some missing log locking gfs2: fix address space truncation during withdraw gfs2: drain ail under sd_log_flush_lock gfs2: bufdata allocation race gfs2: Remove trans_drain code duplication gfs2: Move gfs2_remove_from_journal to log.c gfs2: Get rid of gfs2_log_[un]lock helpers gfs2: less aggressive low-memory log flushing gfs2: Fix data loss during inode evict gfs2: minor evict_[un]linked_inode cleanup gfs2: Avoid unnecessary transactions in evict_linked_inode gfs2: Remove unnecessary check in gfs2_evict_inode gfs2: Call unlock_new_inode before d_instantiate
2026-04-13Merge tag 'vfs-7.1-rc1.writeback' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs writeback updates from Christian Brauner: "This introduces writeback helper APIs and converts f2fs, gfs2 and nfs to stop accessing writeback internals directly" * tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nfs: stop using writeback internals for WB_WRITEBACK accounting gfs2: stop using writeback internals for dirty_exceeded check f2fs: stop using writeback internals for dirty_exceeded checks writeback: prep helpers for dirty-limit and writeback accounting
2026-04-07gfs2: hide error messages after withdrawAndreas Gruenbacher1-1/+1
In gfs2_evict_inode(), don't issue error messages once a withdraw has already occurred. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: wait for withdraw earlier during unmountAndreas Gruenbacher1-2/+3
During an unmount, wait for potential withdraw to complete before calling gfs2_make_fs_ro(). This will allow gfs2_make_fs_ro() to skip much of its work. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-04-07gfs2: fix address space truncation during withdrawAndreas Gruenbacher1-13/+28
When a withdrawn filesystem's inodes are being evicted, the address spaces of those inodes still need to be truncated but we can no longer start new transactions. We still don't want gfs2_invalidate_folio() to race with gfs2_log_flush(), so take a read lock on sdp->sd_log_flush_lock in that case. (It may not be obvious, but gfs2_invalidate_folio() is a jdata-only address space operation.) Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-03-23gfs2: Fix data loss during inode evictAndreas Gruenbacher1-8/+28
When gfs2_evict_inode() is called on an inode with unwritten data in the page cache, the page cache needs to be written before it can be truncated. This doesn't always happen. Fix that by changing gfs2_evict_inode() to always either call evict_linked_inode() or evict_unlinked_inode(). Inside evict_unlinked_inode(), first check if the inode is dirty. If it is, make sure the inode glock is held and write back the data and metadata. If it isn't, skip those steps. Also, make sure that gfs2_evict_inode() calls gfs2_evict_inode() and evict_unlinked_inode() only if ip->i_gl is not NULL; this avoids unnecessary complications there. Fixes xfstest generic/211. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-03-23gfs2: minor evict_[un]linked_inode cleanupAndreas Gruenbacher1-6/+8
Add gl helper variables in evict_unlinked_inode() and evict_linked_inode(). This patch isn't very interesting by itself, but it makes the next patch more readable. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-03-23gfs2: Avoid unnecessary transactions in evict_linked_inodeAndreas Gruenbacher1-5/+30
In evict_linked_inode(), the truncate_inode_pages() calls are carried out inside a transaction. This code was added to what was then function gfs2_delete_inode() in commit 16615be18cadf ("[GFS2] Clean up journaled data writing"). These transactions are only used for creating revokes for the jdata buffers in the journal, so don't create such transactions when we know that the address space doesn't contain any jdata buffers for this inode and truncate the metadata address space outside of the transaction. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-02-24gfs2: Remove unnecessary check in gfs2_evict_inodeAndreas Gruenbacher1-1/+1
We are no longer using LM_FLAG_TRY or LM_FLAG_TRY_1CB during inode evict, so ret cannot be GLR_TRYFAILED here. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds1-2/+2
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-2/+2
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-17gfs2: stop using writeback internals for dirty_exceeded checkKundan Kumar1-1/+1
Convert gfs2 dirty_exceeded handling to use the writeback core helper instead of accessing writeback directly. Reviewed-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Suggested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com> Signed-off-by: Anuj Gupta <anuj20.g@samsung.com> Link: https://patch.msgid.link/20260213054634.79785-4-kundan.kumar@samsung.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-03gfs2: fix memory leaks in gfs2_fill_super error pathDeepanshu Kartikey1-1/+3
Fix two memory leaks in the gfs2_fill_super() error handling path when transitioning a filesystem to read-write mode fails. First leak: kthread objects (thread_struct, task_struct, etc.) When gfs2_freeze_lock_shared() fails after init_threads() succeeds, the created kernel threads (logd and quotad) are never destroyed. This occurs because the fail_per_node label doesn't call gfs2_destroy_threads(). Second leak: quota bitmap buffer (8192 bytes) When gfs2_make_fs_rw() fails after gfs2_quota_init() succeeds but before other operations complete, the allocated quota bitmap is never freed. The fix moves thread cleanup to the fail_per_node label to handle all error paths uniformly. gfs2_destroy_threads() is safe to call unconditionally as it checks for NULL pointers. Quota cleanup is added in gfs2_make_fs_rw() to properly handle the withdrawal case where quota initialization succeeds but the filesystem is then withdrawn. Thread leak backtrace (gfs2_freeze_lock_shared failure): unreferenced object 0xffff88801d7bca80 (size 4480): copy_process+0x3a1/0x4670 kernel/fork.c:2422 kernel_clone+0xf3/0x6e0 kernel/fork.c:2779 kthread_create_on_node+0x100/0x150 kernel/kthread.c:478 init_threads+0xab/0x350 fs/gfs2/ops_fstype.c:611 gfs2_fill_super+0xe5c/0x1240 fs/gfs2/ops_fstype.c:1265 Quota leak backtrace (gfs2_make_fs_rw failure): unreferenced object 0xffff88812de7c000 (size 8192): gfs2_quota_init+0xe5/0x820 fs/gfs2/quota.c:1409 gfs2_make_fs_rw+0x7a/0xe0 fs/gfs2/super.c:149 gfs2_fill_super+0xfbb/0x1240 fs/gfs2/ops_fstype.c:1275 Reported-by: syzbot+aac438d7a1c44071e04b@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=aac438d7a1c44071e04b Fixes: 6c7410f44961 ("gfs2: gfs2_freeze_lock_shared cleanup") Fixes: b66f723bb552 ("gfs2: Improve gfs2_make_fs_rw error handling") Link: https://lore.kernel.org/all/20260131062509.77974-1-kartikey406@gmail.com/T/ [v1] Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: No longer thaw filesystems during a withdrawAndreas Gruenbacher1-14/+0
Previously, when a withdraw occurred, we would wait for another node to recover our journal. This also meant that frozen filesystem needed to be thawed because otherwise, other nodes wouldn't be able to recover the filesystem. With the reversal of commit 601ef0d52e96 ("gfs2: Force withdraw to replay journals and wait for it to finish"), we are no longer waiting for journal recovery during a withdraw, so we no longer need to thaw frozen filesystems, either. This also fixes a potential deadlock reported by lockdep when running xfstest generic/108. In addition, there is nothing left in do_withdraw() that would require taking sd_freeze_mutex, so don't bother taking that lock there anymore. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: New gfs2_withdraw_helperAndreas Gruenbacher1-0/+3
Currently, when a gfs2 filesystem is withdrawn, an "offline" uevent is triggered that invokes gfs2-util's gfs2_withdraw_helper script. The purpose of this script is to deactivate the filesystem's block device so that it can be withdrawn immediately, even before all the filesystem's caches have been discarded. The script provided by gfs2-utils never did anything useful, and there was no way for it to report back its status to the kernel. To fix that, extend the gfs2_withdraw_helper mechanism so that the script can report one of the following results by writing the corresponding value into "/sys$DEVPATH/lock_module/withdraw": 0 - The shared block device has been marked inactive. Future write operations will fail. 1 - The shared block device may still be active and carry out write operations. If the "offline" uevent isn't reacted upon within the timeout configured in /sys$DEVPATH/tune/withdraw_helper_timeout (default 5 seconds), the event handler is assumed to have failed. In addition, add an additional "errors=deactivate" mount option. With these changes, if fatal errors are detected on a gfs2 filesystem and the filesystem is mounted with the "errors=panic" option, the kernel will panic immediately. Otherwise, an attempt will be made to deactivate the underlying block device. If successful, the kernel will release all cluster-wide locks immediately so that the rest of the cluster can continue. If unsuccessful, the kernel will either panic ("errors=deactivate"), or it will purge all filesystem I/O before releasing all cluster-wide locks ("errors=withdraw"). Note that the gfs2_withdraw_helper script still needs to be fixed to take advantage of these improvements. It could be changed to use a mechanism like LVM Persistent Reservations. "dmsetup suspend" is not a suitable mechanism as it infinitely postpones I/O operations, which may prevent withdraw from completing. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Rename LM_FLAG_{NOEXP -> RECOVER}Andreas Gruenbacher1-1/+1
GFS sets the LM_FLAG_NOEXP flag on locking requests it makes during journal recovery, so rename the flag to LM_FLAG_RECOVER for improved code readability. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Rename gfs2_{withdrawing_or_ => }withdrawnAndreas Gruenbacher1-5/+5
With delayed withdraws and the SDF_WITHDRAWING flag gone, we can now rename gfs2_withdrawing_or_withdrawn() back to gfs2_withdrawn(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: Asynchronous withdrawAndreas Gruenbacher1-1/+1
So far, withdraws are carried out in the context of the calling task. When another task tries to withdraw while a withdraw is already underway, that task blocks as well. Change that to carry out withdraws asynchronously in workqueue context and don't block the task triggering the withdraw anymore. Fixes: syzbot+6b156e132970e550194c@syzkaller.appspotmail.com Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-11-26gfs2: fix freeze error handlingAlexey Velichayshiy1-3/+1
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"), the freeze error handling is broken because gfs2_do_thaw() overwrites the 'error' variable, causing incorrect processing of the original freeze error. Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean() fails but ignoring its return value to preserve the original freeze error for proper reporting. Found by Linux Verification Center (linuxtesting.org) with SVACE. Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic") Cc: stable@vger.kernel.org # v6.5+ Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-09-15fs: rename generic_delete_inode() and generic_drop_inode()Mateusz Guzik1-1/+1
generic_delete_inode() is rather misleading for what the routine is doing. inode_just_drop() should be much clearer. The new naming is inconsistent with generic_drop_inode(), so rename that one as well with inode_ as the suffix. No functional changes. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-09gfs2: Remove GIF_ALLOC_FAILED flagAndreas Gruenbacher1-4/+2
Get rid of the GIF_ALLOC_FAILED flag; we can now be confident that the additional consistency check isn't needed. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-26Merge tag 'gfs2-for-6.16' of ↵Linus Torvalds1-88/+6
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb will never change. This is the counterpart of commit 9e888998ea4d ("writeback: fix false warning in inode_to_wb()") - Fix a hang introduced by commit 8d391972ae2d ("gfs2: Remove __gfs2_writepage()"): prevent gfs2_logd from creating transactions for jdata pages while trying to flush the log - Fix a race between gfs2_create_inode() and gfs2_evict_inode() by deallocating partially created inodes on the gfs2_create_inode() error path - Fix a bug in the journal head lookup code that could cause mount to fail after successful recovery - Various smaller fixes and cleanups from various people * tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits) gfs2: No more gfs2_find_jhead caching gfs2: Get rid of duplicate log head lookup gfs2: Simplify clean_journal gfs2: Simplify gfs2_log_pointers_init gfs2: Move gfs2_log_pointers_init gfs2: Minor comments fix gfs2: Don't start unnecessary transactions during log flush gfs2: Move gfs2_trans_add_databufs gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio gfs2: avoid inefficient use of crc32_le_shift() gfs2: Do not call iomap_zero_range beyond eof gfs: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch gfs2: Fix usage of bio->bi_status in gfs2_end_log_write gfs2: deallocate inodes in gfs2_create_inode gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc gfs2: Move gfs2_dinode_dealloc gfs2: Don't reread inodes unnecessarily gfs2: gfs2_create_inode error handling fix gfs2: Remove unnecessary NULL check before free_percpu() gfs2: check sb_min_blocksize return value ...
2025-05-22gfs2: No more gfs2_find_jhead cachingAndreas Gruenbacher1-1/+1
We are no longer calling gfs2_find_jhead() on the same log twice, so there is no more reason for keeping the log contents cached across those calls. In addition, log head lookup and log header writing didn't go through the same address space and so the caching wasn't even fully working, anyway. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22gfs2: Get rid of duplicate log head lookupAndreas Gruenbacher1-12/+3
Currently at mount time, the recovery code looks up the current log head and, if necessary, replays the log and writes a recovery header to indicate that the log is clean. It does that for each log that may need recovery. We also know that our own log will always be checked as part of that process. Then, the mount code looks up the log head of our own log again. The double log head lookup can be costly, but more importantly, it is unnecessary because we can trivially compute the position of the log head after recovery; all we need to do for that is bump the position and lh_sequence by one when writing a recovery header. With that in mind, move the call to gfs2_log_pointers_init() into gfs2_recover_func() and get rid of the double lookup in gfs2_make_fs_rw(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-22gfs2: Simplify gfs2_log_pointers_initAndreas Gruenbacher1-2/+1
Move the initialization of sdp->sd_log_sequence and sdp->sd_log_flush_head inside gfs2_log_pointers_init(). Use gfs2_replay_incr_blk(). Before this change, the log head lookup code in freeze_go_xmote_bh() didn't update sdp->sd_log_flush_head. This is now fixed, but the code in freeze_go_xmote_bh() appears to be pretty useless in the first place: on a frozen filesystem, the log head will not change. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-05-09super: add filesystem freezing helpers for suspend and hibernateChristian Brauner1-9/+13
Allow the power subsystem to support filesystem freeze for suspend and hibernate. For some kernel subsystems it is paramount that they are guaranteed that they are the owner of the freeze to avoid any risk of deadlocks. This is the case for the power subsystem. Enable it to recognize whether it did actually freeze the filesystem. If userspace has 10 filesystems and suspend/hibernate manges to freeze 5 and then fails on the 6th for whatever odd reason (current or future) then power needs to undo the freeze of the first 5 filesystems. It can't just walk the list again because while it's unlikely that a new filesystem got added in the meantime it still cannot tell which filesystems the power subsystem actually managed to get a freeze reference count on that needs to be dropped during thaw. There's various ways out of this ugliness. For example, record the filesystems the power subsystem managed to freeze on a temporary list in the callbacks and then walk that list backwards during thaw to undo the freezing or make sure that the power subsystem just actually exclusively freezes things it can freeze and marking such filesystems as being owned by power for the duration of the suspend or resume cycle. I opted for the latter as that seemed the clean thing to do even if it means more code changes. If hibernation races with filesystem freezing (e.g. DM reconfiguration), then hibernation need not freeze a filesystem because it's already frozen but userspace may thaw the filesystem before hibernation actually happens. If the race happens the other way around, DM reconfiguration may unexpectedly fail with EBUSY. So allow FREEZE_EXCL to nest with other holders. An exclusive freezer cannot be undone by any of the other concurrent freezers. Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-04-24gfs2: deallocate inodes in gfs2_create_inodeAndreas Gruenbacher1-5/+1
When creating and destroying inodes, we are relying on the inode hash table to make sure that for a given inode number, only a single inode will exist. We then link that inode to its inode and iopen glock and let those glocks point back at the inode. However, when iget_failed() is called, the inode is removed from the inode hash table before gfs_evict_inode() is called, and uniqueness is no longer guaranteed. Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work around that problem by detaching the inode glock from the inode before calling iget_failed(), but that broke the inode deallocation code in gfs_evict_inode(). To fix that, deallocate partially created inodes in gfs2_create_inode() instead of relying on gfs_evict_inode() for doing that. This means that gfs2_evict_inode() and its helper functions will no longer see partially created inodes, and so some simplifications are possible there. Fixes: 9ffa18884cce ("gfs2: gl_object races fix") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_deallocAndreas Gruenbacher1-1/+1
Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass that information explicitly instead. This allows for a cleaner follow-up patch. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21gfs2: Move gfs2_dinode_deallocAndreas Gruenbacher1-68/+0
Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages() from super.c to inode.c. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-21gfs2: replace sd_aspace with sd_inodeAndreas Gruenbacher1-1/+1
Currently, sdp->sd_aspace and the per-inode metadata address spaces use sb->s_bdev->bd_mapping->host as their ->host; folios in those address spaces will thus appear to be on bdev rather than on gfs2 filesystems. This is a problem because gfs2 doesn't support cgroup writeback (SB_I_CGROUPWB), but bdev does. Fix that by using a "dummy" gfs2 inode as ->host in those address spaces. When coming from a folio, folio->mapping->host->i_sb will then be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in sb->s_iflags. Based on a previous version from Bob Peterson from several years ago. Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure this out. Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-04-07gfs2: pass through holder from the VFS for freeze/thawChristian Brauner1-6/+8
The filesystem's freeze/thaw functions can be called from contexts where the holder isn't userspace but the kernel, e.g., during systemd suspend/hibernate. So pass through the freeze/thaw flags from the VFS instead of hard-coding them. Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-10gfs2: skip if we cannot defer deleteAndreas Gruenbacher1-2/+2
In gfs2_evict_inode(), in the unlikely case that we cannot defer deleting the inode, it is not safe to fall back to deleting the inode; the only valid choice we have is to skip the delete. In addition, in evict_should_delete(), if we cannot lock the inode glock exclusively, we are in a bad enough state that skipping the delete is likely a better choice than trying to recover from the failure later. Fixes: c5b7a2400edc ("gfs2: Only defer deletes when we have an iopen glock") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: minor evict fixAndreas Gruenbacher1-14/+3
In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we detach the iopen glock from the inode without calling glock_clear_object(). This leads to a warning in glock_set_object() when the same inode is recreated and the glock is reused. Fix that by only detaching the iopen glock in gfs2_evict_inode(). In addition, remove the dequeue code from evict_should_delete(); we already perform a conditional dequeue in gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-03-10gfs2: Replace GIF_DEFER_DELETE with GLF_DEFER_DELETEAndreas Gruenbacher1-1/+2
Having this flag attached to the iopen glock instead of the inode is much simpler; it eliminates a protential weird race in gfs2_try_evict(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-19gfs2: Only defer deletes when we have an iopen glockAndreas Gruenbacher1-4/+7
The mechanism to defer deleting unlinked inodes is tied to delete_work_func(), which is tied to iopen glocks. When we don't have an iopen glock, we must carry out deletes immediately instead. Fixes a NULL pointer dereference in gfs2_evict_inode(). Fixes: 8c21c2c71e66 ("gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: gfs2_evict_inode clarificationAndreas Gruenbacher1-3/+3
When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is never initialized, but that isn't obvious; if it did initialize gh and then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to release it. To clarify the code, change gfs2_evict_inode() to always check if gh needs to be released, no matter what evict_should_delete() returns. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Update to the evict / remote delete documentationAndreas Gruenbacher1-3/+3
Try to be a bit more clear and remove some duplications. We cannot actually get rid of the verification step eventually, so remove the comment saying so. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inodeAndreas Gruenbacher1-1/+8
Move calls to gfs2_queue_verify_delete() into gfs2_evict_inode(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Return enum evict_behavior from gfs2_upgrade_iopen_glockAndreas Gruenbacher1-6/+14
In case an iopen glock cannot be upgraded, function gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode() whether deleting the inode should be deferred or skipped altogether. Change the function to return the appropriate enum evict_behavior value to indicate that. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Rename dinode_demise to evict_behaviorAndreas Gruenbacher1-18/+19
Rename enum dinode_demise to evict_behavior and its items SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE, SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE. In gfs2_evict_inode(), add a separate variable of type enum evict_behavior instead of implicitly casting to int. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Rename GIF_{DEFERRED -> DEFER}_DELETEAndreas Gruenbacher1-1/+1
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode() should take, so rename the flag to GIF_DEFER_DELETE to clarify. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-11-05gfs2: Faster gfs2_upgrade_iopen_glock wakeupsAndreas Gruenbacher1-2/+2
Move function needs_demote() to glock.h and rename it to glock_needs_demote(). In handle_callback(), wake up the glock when setting the GLF_PENDING_DEMOTE flag as well. (Setting the GLF_DEMOTE flag already triggered a wake-up.) With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock() to wake up when either of those flags is set for the inode glock: the faster we can react to contention, the better. The GLF_PENDING_DEMOTE flag is only used for inode glocks (see gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in gfs2_drop_inode(). Still, using glock_needs_demote() there as well makes the code a little easier to read. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-10-22KMSAN: uninit-value in inode_go_dump (5)Qianqiang Liu1-0/+2
When mounting of a corrupted disk image fails, the error message printed can reference uninitialized inode fields. To prevent that from happening, always initialize those fields. Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-09-25gfs2: Fix unlinked inode cleanupAndreas Gruenbacher1-1/+1
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work"), function delete_work_func() was used to trigger the eviction of in-memory inodes from remote as well as deleting unlinked inodes at a later point. These two kinds of work were then split into two kinds of work, and the two places in the code were deferred deletion of inodes is required accidentally ended up queuing the wrong kind of work. This caused unlinked inodes to be left behind, which could in the worst case fill up filesystems and require a filesystem check to recover. Fix that by queuing the right kind of work in try_rgrp_unlink() and gfs2_drop_inode(). Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-29gfs2: Revise glock reference counting modelAndreas Gruenbacher1-1/+0
In the current glock reference counting model, a bias of one is added to a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED). A glock is removed from the lru_list when it is enqueued, and added back when it is dequeued. This isn't a very appropriate model because most glocks are held for long periods of time (for example, the inode "owns" references to its inode and iopen glocks as long as the inode is cached even when the glock state changes to LM_ST_UNLOCKED), and they can only be freed when they are no longer referenced, anyway. Fix this by getting rid of the refcount bias for locked glocks. That way, we can use lockref_put_or_lock() to efficiently drop all but the last glock reference, and put the glock onto the lru_list when the last reference is dropped. When find_insert_glock() returns a reference to a cached glock, it removes the glock from the lru_list. Dumping the "glocks" and "glstats" debugfs files also takes glock references, but instead of removing the glocks from the lru_list in that case as well, we leave them on the list. This ensures that dumping those files won't perturb the order of the glocks on the lru_list. In addition, when the last reference to an *unlocked* glock is dropped, we immediately free it; this preserves the preexisting behavior. If it later turns out that caching unlocked glocks is useful in some situations, we can change the caching strategy. It is currently unclear if a glock that has no active references can have the GLF_LFLUSH flag set. To make sure that such a glock won't accidentally be evicted due to memory pressure, we add a GLF_LFLUSH check to gfs2_dispose_glock_lru(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-05-07gfs2: make timeout values more explicitWolfram Sang1-3/+2
'timeout' is a vague name for the return value of wait_event_*_timeout because it actually returns the time left. Because the variable is never used later, just drop the return value. Since variable 'timeout' is then only used to carry a fixed timeout value, drop this in favor of a fixed function argument as in the other call to wait_event_timeout() above. Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-29gfs2: gfs2_freeze_unlock cleanupAndreas Gruenbacher1-6/+6
Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh as its argument, so clean up the code by passing in sdp instead. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-24gfs2: Fix potential glock use-after-free on unmountAndreas Gruenbacher1-3/+0
When a DLM lockspace is released and there ares still locks in that lockspace, DLM will unlock those locks automatically. Commit fb6791d100d1b started exploiting this behavior to speed up filesystem unmount: gfs2 would simply free glocks it didn't want to unlock and then release the lockspace. This didn't take the bast callbacks for asynchronous lock contention notifications into account, which remain active until until a lock is unlocked or its lockspace is released. To prevent those callbacks from accessing deallocated objects, put the glocks that should not be unlocked on the sd_dead_glocks list, release the lockspace, and only then free those glocks. As an additional measure, ignore unexpected ast and bast callbacks if the receiving glock is dead. Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Cc: David Teigland <teigland@redhat.com>
2024-04-09gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_asyncAndreas Gruenbacher1-2/+2
Function gfs2_glock_queue_put() puts a glock reference by enqueuing glock work instead of putting the reference directly. This ensures that the operation won't sleep, but it is costly and really only necessary when putting the final glock reference. Replace it with a new gfs2_glock_put_async() function that only queues glock work when putting the last glock reference. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-04-09gfs2: Fix NULL pointer dereference in gfs2_log_flushAndreas Gruenbacher1-0/+4
In gfs2_jindex_free(), set sdp->sd_jdesc to NULL under the log flush lock to provide exclusion against gfs2_log_flush(). In gfs2_log_flush(), check if sdp->sd_jdesc is non-NULL before dereferencing it. Otherwise, we could run into a NULL pointer dereference when outstanding glock work races with an unmount (glock_work_func -> run_queue -> do_xmote -> inode_go_sync -> gfs2_log_flush). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>