|
Merge tag 'gfs2-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix possible data loss during inode evict
- Fix a race during bufdata allocation
- More careful cleaning up during a withdraw
- Prevent excessive log flushing under memory pressure
- Various other minor fixes and cleanups
* tag 'gfs2-for-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2:
gfs2: prevent NULL pointer dereference during unmount
gfs2: hide error messages after withdraw
gfs2: wait for withdraw earlier during unmount
gfs2: inode directory consistency checks
gfs2: gfs2_log_flush withdraw fixes
gfs2: add some missing log locking
gfs2: fix address space truncation during withdraw
gfs2: drain ail under sd_log_flush_lock
gfs2: bufdata allocation race
gfs2: Remove trans_drain code duplication
gfs2: Move gfs2_remove_from_journal to log.c
gfs2: Get rid of gfs2_log_[un]lock helpers
gfs2: less aggressive low-memory log flushing
gfs2: Fix data loss during inode evict
gfs2: minor evict_[un]linked_inode cleanup
gfs2: Avoid unnecessary transactions in evict_linked_inode
gfs2: Remove unnecessary check in gfs2_evict_inode
gfs2: Call unlock_new_inode before d_instantiate
|
|
Merge tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull vfs writeback updates from Christian Brauner:
"This introduces writeback helper APIs and converts f2fs, gfs2 and nfs
to stop accessing writeback internals directly"
* tag 'vfs-7.1-rc1.writeback' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
nfs: stop using writeback internals for WB_WRITEBACK accounting
gfs2: stop using writeback internals for dirty_exceeded check
f2fs: stop using writeback internals for dirty_exceeded checks
writeback: prep helpers for dirty-limit and writeback accounting
|
|
In gfs2_evict_inode(), don't issue error messages once a withdraw has already
occurred.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
During an unmount, wait for potential withdraw to complete before calling
gfs2_make_fs_ro(). This will allow gfs2_make_fs_ro() to skip much of its work.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When a withdrawn filesystem's inodes are being evicted, the address spaces of
those inodes still need to be truncated but we can no longer start new
transactions. We still don't want gfs2_invalidate_folio() to race with
gfs2_log_flush(), so take a read lock on sdp->sd_log_flush_lock in that case.
(It may not be obvious, but gfs2_invalidate_folio() is a jdata-only address
space operation.)
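For illustration, a minimal sketch of the locking described above (the
helper name and the surrounding truncation code are assumptions; only the
use of sdp->sd_log_flush_lock follows from this change):

  static void truncate_withdrawn_mapping(struct gfs2_sbd *sdp,
                                         struct address_space *mapping)
  {
          if (gfs2_withdrawn(sdp)) {
                  /* No new transactions after a withdraw, but still keep
                   * folio invalidation from racing with gfs2_log_flush(). */
                  down_read(&sdp->sd_log_flush_lock);
                  truncate_inode_pages(mapping, 0);
                  up_read(&sdp->sd_log_flush_lock);
                  return;
          }
          /* Normal case: an active transaction provides the exclusion. */
          truncate_inode_pages(mapping, 0);
  }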
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When gfs2_evict_inode() is called on an inode with unwritten data in the
page cache, the page cache needs to be written before it can be
truncated. This doesn't always happen. Fix that by changing
gfs2_evict_inode() to always either call evict_linked_inode() or
evict_unlinked_inode().
Inside evict_unlinked_inode(), first check if the inode is dirty. If it
is, make sure the inode glock is held and write back the data and
metadata. If it isn't, skip those steps.
Also, make sure that gfs2_evict_inode() calls evict_linked_inode() and
evict_unlinked_inode() only if ip->i_gl is not NULL; this avoids
unnecessary complications there.
Fixes xfstest generic/211.
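Schematically, the resulting top-level dispatch looks roughly as follows
(the linked/unlinked decision and all error handling are simplified
assumptions):

  static void gfs2_evict_inode(struct inode *inode)
  {
          struct gfs2_inode *ip = GFS2_I(inode);

          if (ip->i_gl) {
                  /* Simplified: the real decision is more involved than
                   * just looking at the link count. */
                  if (inode->i_nlink)
                          evict_linked_inode(inode);
                  else
                          evict_unlinked_inode(inode);
          }
          truncate_inode_pages_final(&inode->i_data);
          clear_inode(inode);
  }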
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Add gl helper variables in evict_unlinked_inode() and
evict_linked_inode(). This patch isn't very interesting by itself, but
it makes the next patch more readable.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In evict_linked_inode(), the truncate_inode_pages() calls are carried
out inside a transaction. This code was added to what was then function
gfs2_delete_inode() in commit 16615be18cadf ("[GFS2] Clean up journaled
data writing").
These transactions are only used for creating revokes for the jdata
buffers in the journal, so don't create such transactions when we know
that the address space doesn't contain any jdata buffers for this inode,
and truncate the metadata address space outside of the transaction.
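A rough sketch of the resulting logic (helper name and transaction
reservation are assumptions; only the "transaction for jdata buffers only"
structure follows from the text above):

  static void truncate_inode_mappings(struct gfs2_inode *ip)
  {
          struct inode *inode = &ip->i_inode;
          struct gfs2_sbd *sdp = GFS2_SB(inode);

          if (gfs2_is_jdata(ip) && inode->i_data.nrpages) {
                  /* Revokes for journaled data buffers need a transaction. */
                  if (gfs2_trans_begin(sdp, 0, sdp->sd_jdesc->jd_blocks))
                          return;
                  truncate_inode_pages(&inode->i_data, 0);
                  gfs2_trans_end(sdp);
          } else {
                  truncate_inode_pages(&inode->i_data, 0);
          }
          /* Metadata lives in the glock address space; truncate it outside
           * of any transaction. */
          truncate_inode_pages(gfs2_glock2aspace(ip->i_gl), 0);
  }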
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
We are no longer using LM_FLAG_TRY or LM_FLAG_TRY_1CB during inode
evict, so ret cannot be GLR_TRYFAILED here.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
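To illustrate the three patterns (struct and variable names are made up
and their definitions omitted; the _obj/_objs/_flex forms are as described
above):

  struct foo *f;
  struct bar *b;
  struct baz *z;        /* ends in a flexible array member 'items' */
  size_t n = 16;

  /* before */
  f = kmalloc(sizeof(*f), GFP_KERNEL);
  b = kmalloc_array(n, sizeof(*b), GFP_KERNEL);
  z = kmalloc(struct_size(z, items, n), GFP_KERNEL);

  /* after */
  f = kmalloc_obj(*f, GFP_KERNEL);
  b = kmalloc_objs(*b, n, GFP_KERNEL);
  z = kmalloc_flex(*z, items, n, GFP_KERNEL);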
|
|
Convert gfs2 dirty_exceeded handling to use the writeback core helper
instead of accessing writeback directly.
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Kundan Kumar <kundan.kumar@samsung.com>
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Link: https://patch.msgid.link/20260213054634.79785-4-kundan.kumar@samsung.com
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Fix two memory leaks in the gfs2_fill_super() error handling path when
transitioning a filesystem to read-write mode fails.
First leak: kthread objects (thread_struct, task_struct, etc.)
When gfs2_freeze_lock_shared() fails after init_threads() succeeds, the
created kernel threads (logd and quotad) are never destroyed. This
occurs because the fail_per_node label doesn't call
gfs2_destroy_threads().
Second leak: quota bitmap buffer (8192 bytes)
When gfs2_make_fs_rw() fails after gfs2_quota_init() succeeds but
before other operations complete, the allocated quota bitmap is never
freed.
The fix moves thread cleanup to the fail_per_node label to handle all
error paths uniformly. gfs2_destroy_threads() is safe to call
unconditionally as it checks for NULL pointers. Quota cleanup is added
in gfs2_make_fs_rw() to properly handle the withdrawal case where
quota initialization succeeds but the filesystem is then withdrawn.
Thread leak backtrace (gfs2_freeze_lock_shared failure):
unreferenced object 0xffff88801d7bca80 (size 4480):
copy_process+0x3a1/0x4670 kernel/fork.c:2422
kernel_clone+0xf3/0x6e0 kernel/fork.c:2779
kthread_create_on_node+0x100/0x150 kernel/kthread.c:478
init_threads+0xab/0x350 fs/gfs2/ops_fstype.c:611
gfs2_fill_super+0xe5c/0x1240 fs/gfs2/ops_fstype.c:1265
Quota leak backtrace (gfs2_make_fs_rw failure):
unreferenced object 0xffff88812de7c000 (size 8192):
gfs2_quota_init+0xe5/0x820 fs/gfs2/quota.c:1409
gfs2_make_fs_rw+0x7a/0xe0 fs/gfs2/super.c:149
gfs2_fill_super+0xfbb/0x1240 fs/gfs2/ops_fstype.c:1275
Reported-by: syzbot+aac438d7a1c44071e04b@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=aac438d7a1c44071e04b
Fixes: 6c7410f44961 ("gfs2: gfs2_freeze_lock_shared cleanup")
Fixes: b66f723bb552 ("gfs2: Improve gfs2_make_fs_rw error handling")
Link: https://lore.kernel.org/all/20260131062509.77974-1-kartikey406@gmail.com/T/ [v1]
Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Previously, when a withdraw occurred, we would wait for another node to
recover our journal. This also meant that a frozen filesystem needed to
be thawed because otherwise, other nodes wouldn't be able to recover the
filesystem. With the reversal of commit 601ef0d52e96 ("gfs2: Force
withdraw to replay journals and wait for it to finish"), we are no
longer waiting for journal recovery during a withdraw, so we no longer
need to thaw frozen filesystems, either. This also fixes a potential
deadlock reported by lockdep when running xfstest generic/108.
In addition, there is nothing left in do_withdraw() that would require
taking sd_freeze_mutex, so don't bother taking that lock there anymore.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Currently, when a gfs2 filesystem is withdrawn, an "offline" uevent is
triggered that invokes gfs2-utils' gfs2_withdraw_helper script. The
purpose of this script is to deactivate the filesystem's block device so
that it can be withdrawn immediately, even before all the filesystem's
caches have been discarded. The script provided by gfs2-utils never did
anything useful, and there was no way for it to report back its status
to the kernel.
To fix that, extend the gfs2_withdraw_helper mechanism so that the
script can report one of the following results by writing the
corresponding value into "/sys$DEVPATH/lock_module/withdraw":
0 - The shared block device has been marked inactive. Future write
operations will fail.
1 - The shared block device may still be active and carry out
write operations.
If the "offline" uevent isn't acted upon within the timeout configured
in /sys$DEVPATH/tune/withdraw_helper_timeout (default 5 seconds), the
event handler is assumed to have failed.
In addition, add a new "errors=deactivate" mount option.
With these changes, if fatal errors are detected on a gfs2 filesystem
and the filesystem is mounted with the "errors=panic" option, the kernel
will panic immediately. Otherwise, an attempt will be made to
deactivate the underlying block device. If successful, the kernel will
release all cluster-wide locks immediately so that the rest of the
cluster can continue. If unsuccessful, the kernel will either panic
("errors=deactivate"), or it will purge all filesystem I/O before
releasing all cluster-wide locks ("errors=withdraw").
Note that the gfs2_withdraw_helper script still needs to be fixed to
take advantage of these improvements. It could be changed to use a
mechanism like LVM Persistent Reservations. "dmsetup suspend" is not a
suitable mechanism as it infinitely postpones I/O operations, which may
prevent withdraw from completing.
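A hypothetical userspace helper along these lines could report its result
like this (the "0"/"1" status values and the sysfs path are from the
description above; the deactivation step itself is omitted):

  #include <fcntl.h>
  #include <limits.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <unistd.h>

  int main(void)
  {
          const char *devpath = getenv("DEVPATH");  /* set for uevent helpers */
          char path[PATH_MAX];
          int fd, status = 1;     /* 1: device may still be active */

          if (!devpath)
                  return 1;
          /* ... try to deactivate the shared block device here;
           * on success, set status = 0 ... */
          snprintf(path, sizeof(path), "/sys%s/lock_module/withdraw", devpath);
          fd = open(path, O_WRONLY);
          if (fd < 0)
                  return 1;
          dprintf(fd, "%d\n", status);
          close(fd);
          return 0;
  }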
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
gfs2 sets the LM_FLAG_NOEXP flag on locking requests it makes during
journal recovery, so rename the flag to LM_FLAG_RECOVER for improved
code readability.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
With delayed withdraws and the SDF_WITHDRAWING flag gone, we can now
rename gfs2_withdrawing_or_withdrawn() back to gfs2_withdrawn().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
So far, withdraws are carried out in the context of the calling task.
When another task tries to withdraw while a withdraw is already
underway, that task blocks as well. Change that to carry out withdraws
asynchronously in workqueue context so that the task triggering the
withdraw no longer blocks.
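Schematically (the work item field and flag names are illustrative, not
necessarily the ones used in the code):

  static void gfs2_withdraw_work(struct work_struct *work)
  {
          struct gfs2_sbd *sdp = container_of(work, struct gfs2_sbd,
                                              sd_withdraw_work);

          do_withdraw(sdp);       /* runs in workqueue context */
  }

  void gfs2_withdraw(struct gfs2_sbd *sdp)
  {
          if (test_and_set_bit(SDF_WITHDRAW_IN_PROG, &sdp->sd_flags))
                  return;         /* a withdraw is already underway */
          queue_work(system_unbound_wq, &sdp->sd_withdraw_work);
  }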
Reported-by: syzbot+6b156e132970e550194c@syzkaller.appspotmail.com
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
After commit b77b4a4815a9 ("gfs2: Rework freeze / thaw logic"),
the freeze error handling is broken because gfs2_do_thaw()
overwrites the 'error' variable, causing incorrect processing
of the original freeze error.
Fix this by calling gfs2_do_thaw() when gfs2_lock_fs_check_clean()
fails but ignoring its return value to preserve the original
freeze error for proper reporting.
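The resulting pattern, roughly (the wrapper name and surrounding context
are assumptions):

  static int gfs2_freeze_fs_sketch(struct gfs2_sbd *sdp)
  {
          int error;

          error = gfs2_lock_fs_check_clean(sdp);
          if (error) {
                  /* Thaw again, but deliberately ignore the thaw result so
                   * that the original freeze error is what gets reported. */
                  gfs2_do_thaw(sdp);
                  return error;
          }
          return 0;
  }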
Found by Linux Verification Center (linuxtesting.org) with SVACE.
Fixes: b77b4a4815a9 ("gfs2: Rework freeze / thaw logic")
Cc: stable@vger.kernel.org # v6.5+
Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
generic_delete_inode() is rather misleading for what the routine is
doing. inode_just_drop() should be much clearer.
The new naming is inconsistent with generic_drop_inode(), so rename that
one as well with inode_ as the prefix.
No functional changes.
Signed-off-by: Mateusz Guzik <mjguzik@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
Get rid of the GIF_ALLOC_FAILED flag; we can now be confident that the
additional consistency check isn't needed.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Merge tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2
Pull gfs2 updates from Andreas Gruenbacher:
- Fix the long-standing warnings in inode_to_wb() when CONFIG_LOCKDEP
is enabled: gfs2 doesn't support cgroup writeback and so inode->i_wb
will never change. This is the counterpart of commit 9e888998ea4d
("writeback: fix false warning in inode_to_wb()")
- Fix a hang introduced by commit 8d391972ae2d ("gfs2: Remove
__gfs2_writepage()"): prevent gfs2_logd from creating transactions
for jdata pages while trying to flush the log
- Fix a race between gfs2_create_inode() and gfs2_evict_inode() by
deallocating partially created inodes on the gfs2_create_inode()
error path
- Fix a bug in the journal head lookup code that could cause mount to
fail after successful recovery
- Various smaller fixes and cleanups from various people
* tag 'gfs2-for-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits)
gfs2: No more gfs2_find_jhead caching
gfs2: Get rid of duplicate log head lookup
gfs2: Simplify clean_journal
gfs2: Simplify gfs2_log_pointers_init
gfs2: Move gfs2_log_pointers_init
gfs2: Minor comments fix
gfs2: Don't start unnecessary transactions during log flush
gfs2: Move gfs2_trans_add_databufs
gfs2: Rename jdata_dirty_folio to gfs2_jdata_dirty_folio
gfs2: avoid inefficient use of crc32_le_shift()
gfs2: Do not call iomap_zero_range beyond eof
gfs2: don't check for AOP_WRITEPAGE_ACTIVATE in gfs2_write_jdata_batch
gfs2: Fix usage of bio->bi_status in gfs2_end_log_write
gfs2: deallocate inodes in gfs2_create_inode
gfs2: Move GIF_ALLOC_FAILED check out of gfs2_ea_dealloc
gfs2: Move gfs2_dinode_dealloc
gfs2: Don't reread inodes unnecessarily
gfs2: gfs2_create_inode error handling fix
gfs2: Remove unnecessary NULL check before free_percpu()
gfs2: check sb_min_blocksize return value
...
|
|
We are no longer calling gfs2_find_jhead() on the same log twice, so
there is no more reason for keeping the log contents cached across those
calls. In addition, log head lookup and log header writing didn't go
through the same address space and so the caching wasn't even fully
working, anyway.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Currently at mount time, the recovery code looks up the current log head
and, if necessary, replays the log and writes a recovery header to
indicate that the log is clean. It does that for each log that may need
recovery. We also know that our own log will always be checked as part
of that process. Then, the mount code looks up the log head of our own
log again.
The double log head lookup can be costly, but more importantly, it is
unnecessary because we can trivially compute the position of the log
head after recovery; all we need to do for that is bump the position and
lh_sequence by one when writing a recovery header.
With that in mind, move the call to gfs2_log_pointers_init() into
gfs2_recover_func() and get rid of the double lookup in
gfs2_make_fs_rw().
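Schematically, right after the recovery header has been written at the old
head position (field names as in the in-core log header; the wrap-around
is simplified):

  head->lh_blkno = (head->lh_blkno + 1) % jd->jd_blocks;  /* wrap in the journal */
  head->lh_sequence++;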
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move the initialization of sdp->sd_log_sequence and
sdp->sd_log_flush_head inside gfs2_log_pointers_init(). Use
gfs2_replay_incr_blk().
Before this change, the log head lookup code in freeze_go_xmote_bh()
didn't update sdp->sd_log_flush_head. This is now fixed, but the code
in freeze_go_xmote_bh() appears to be pretty useless in the first place:
on a frozen filesystem, the log head will not change.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Allow the power subsystem to support filesystem freeze for
suspend and hibernate.
For some kernel subsystems it is paramount that they are guaranteed that
they are the owner of the freeze to avoid any risk of deadlocks. This is
the case for the power subsystem. Enable it to recognize whether it did
actually freeze the filesystem.
If userspace has 10 filesystems and suspend/hibernate manages to freeze 5
and then fails on the 6th for whatever odd reason (current or future),
then power needs to undo the freeze of the first 5 filesystems. It can't
just walk the list again because, while it's unlikely that a new
filesystem got added in the meantime, it still cannot tell which
filesystems the power subsystem actually managed to get a freeze
reference count on that needs to be dropped during thaw.
There's various ways out of this ugliness. For example, record the
filesystems the power subsystem managed to freeze on a temporary list in
the callbacks and then walk that list backwards during thaw to undo the
freezing or make sure that the power subsystem just actually exclusively
freezes things it can freeze and marking such filesystems as being owned
by power for the duration of the suspend or resume cycle. I opted for
the latter as that seemed the clean thing to do even if it means more
code changes.
If hibernation races with filesystem freezing (e.g. DM reconfiguration),
then hibernation need not freeze a filesystem because it's already
frozen but userspace may thaw the filesystem before hibernation actually
happens.
If the race happens the other way around, DM reconfiguration may
unexpectedly fail with EBUSY.
So allow FREEZE_EXCL to nest with other holders. An exclusive freezer
cannot be undone by any of the other concurrent freezers.
Link: https://lore.kernel.org/r/20250329-work-freeze-v2-6-a47af37ecc3d@kernel.org
Reviewed-by: Jan Kara <jack@suse.cz>
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
When creating and destroying inodes, we are relying on the inode hash
table to make sure that for a given inode number, only a single inode
will exist. We then link that inode to its inode and iopen glocks and
let those glocks point back at the inode. However, when iget_failed()
is called, the inode is removed from the inode hash table before
gfs2_evict_inode() is called, and uniqueness is no longer guaranteed.
Commit f1046a472b70 ("gfs2: gl_object races fix") was trying to work
around that problem by detaching the inode glock from the inode before
calling iget_failed(), but that broke the inode deallocation code in
gfs2_evict_inode().
To fix that, deallocate partially created inodes in gfs2_create_inode()
instead of relying on gfs2_evict_inode() for doing that.
This means that gfs2_evict_inode() and its helper functions will no
longer see partially created inodes, and so some simplifications are
possible there.
Fixes: 9ffa18884cce ("gfs2: gl_object races fix")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Don't check for the GIF_ALLOC_FAILED flag in gfs2_ea_dealloc() and pass
that information explicitly instead. This allows for a cleaner
follow-up patch.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move gfs2_dinode_dealloc() and its helper gfs2_final_release_pages()
from super.c to inode.c.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Currently, sdp->sd_aspace and the per-inode metadata address spaces use
sb->s_bdev->bd_mapping->host as their ->host; folios in those address
spaces will thus appear to be on bdev rather than on gfs2 filesystems.
This is a problem because gfs2 doesn't support cgroup writeback
(SB_I_CGROUPWB), but bdev does.
Fix that by using a "dummy" gfs2 inode as ->host in those address
spaces. When coming from a folio, folio->mapping->host->i_sb will then
be a gfs2 super block and the SB_I_CGROUPWB flag will not be set in
sb->s_iflags.
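To illustrate why the ->host inode matters (simplified; the real check in
the writeback code is more involved):

  static bool folio_sb_uses_cgroup_wb(struct folio *folio)
  {
          struct super_block *sb = folio->mapping->host->i_sb;

          return sb->s_iflags & SB_I_CGROUPWB;
  }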
Based on a previous version from Bob Peterson from several years ago.
Thanks to Tetsuo Handa, Jan Kara, and Rafael Aquini for helping figure
this out.
Fixes: aaa2cacf8184 ("writeback: add lockdep annotation to inode_to_wb()")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The filesystem's freeze/thaw functions can be called from contexts where
the holder isn't userspace but the kernel, e.g., during systemd
suspend/hibernate. So pass through the freeze/thaw flags from the VFS
instead of hard-coding them.
Signed-off-by: Christian Brauner <brauner@kernel.org>
|
|
In gfs2_evict_inode(), in the unlikely case that we cannot defer
deleting the inode, it is not safe to fall back to deleting the inode;
the only valid choice we have is to skip the delete.
In addition, in evict_should_delete(), if we cannot lock the inode glock
exclusively, we are in a bad enough state that skipping the delete is
likely a better choice than trying to recover from the failure later.
Fixes: c5b7a2400edc ("gfs2: Only defer deletes when we have an iopen glock")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In evict_should_delete(), when gfs2_upgrade_iopen_glock() fails, we
detach the iopen glock from the inode without calling
glock_clear_object(). This leads to a warning in glock_set_object()
when the same inode is recreated and the glock is reused.
Fix that by only detaching the iopen glock in gfs2_evict_inode().
In addition, remove the dequeue code from evict_should_delete(); we
already perform a conditional dequeue in gfs2_evict_inode().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Having this flag attached to the iopen glock instead of the inode is
much simpler; it eliminates a potential weird race in gfs2_try_evict().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The mechanism to defer deleting unlinked inodes is tied to
delete_work_func(), which is tied to iopen glocks. When we don't have
an iopen glock, we must carry out deletes immediately instead.
Fixes a NULL pointer dereference in gfs2_evict_inode().
Fixes: 8c21c2c71e66 ("gfs2: Call gfs2_queue_verify_delete from gfs2_evict_inode")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When function evict_should_delete() returns SHOULD_DEFER_EVICTION, gh is
never initialized, but that isn't obvious; if it did initialize gh and
then return SHOULD_DEFER_EVICTION, gfs2_evict_inode() would fail to
release it. To clarify the code, change gfs2_evict_inode() to always
check if gh needs to be released, no matter what evict_should_delete()
returns.
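The unconditional cleanup then looks roughly like this:

  if (gfs2_holder_initialized(&gh))
          gfs2_glock_dq_uninit(&gh);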
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Try to be a bit clearer and remove some duplication. We cannot actually
get rid of the verification step at some point in the future, so remove
the comment claiming that we can.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move calls to gfs2_queue_verify_delete() into gfs2_evict_inode().
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In case an iopen glock cannot be upgraded, function
gfs2_upgrade_iopen_glock() needs to communicate to gfs2_evict_inode()
whether deleting the inode should be deferred or skipped altogether.
Change the function to return the appropriate enum evict_behavior value
to indicate that.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Rename enum dinode_demise to evict_behavior and its items
SHOULD_DELETE_DINODE to EVICT_SHOULD_DELETE,
SHOULD_NOT_DELETE_DINODE to EVICT_SHOULD_SKIP_DELETE, and
SHOULD_DEFER_EVICTION to EVICT_SHOULD_DEFER_DELETE.
In gfs2_evict_inode(), add a separate variable of type enum
evict_behavior instead of implicitly casting to int.
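For reference, the renamed enum (a sketch based on the names above):

  enum evict_behavior {
          EVICT_SHOULD_DELETE,
          EVICT_SHOULD_SKIP_DELETE,
          EVICT_SHOULD_DEFER_DELETE,
  };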
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
The GIF_DEFERRED_DELETE flag indicates an action that gfs2_evict_inode()
should take, so rename the flag to GIF_DEFER_DELETE to clarify.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Move function needs_demote() to glock.h and rename it to
glock_needs_demote(). In handle_callback(), wake up the glock when
setting the GLF_PENDING_DEMOTE flag as well. (Setting the GLF_DEMOTE
flag already triggered a wake-up.)
With that, check for glock_needs_demote() in gfs2_upgrade_iopen_glock()
to wake up when either of those flags is set for the inode glock: the
faster we can react to contention, the better.
The GLF_PENDING_DEMOTE flag is only used for inode glocks (see
gfs2_glock_cb()) so it's okay to only check for the GLF_DEMOTE flag in
gfs2_drop_inode(). Still, using glock_needs_demote() there as well
makes the code a little easier to read.
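The helper itself is essentially (exact form assumed):

  static inline bool glock_needs_demote(struct gfs2_glock *gl)
  {
          return test_bit(GLF_DEMOTE, &gl->gl_flags) ||
                 test_bit(GLF_PENDING_DEMOTE, &gl->gl_flags);
  }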
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When mounting of a corrupted disk image fails, the error message printed
can reference uninitialized inode fields. To prevent that from happening,
always initialize those fields.
Reported-by: syzbot+aa0730b0a42646eb1359@syzkaller.appspotmail.com
Signed-off-by: Qianqiang Liu <qianqiang.liu@163.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Before commit f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete"
work"), function delete_work_func() was used to trigger the eviction of
in-memory inodes from remote as well as deleting unlinked inodes at a
later point. These two kinds of work were then split into two kinds of
work, and the two places in the code were deferred deletion of inodes is
required accidentally ended up queuing the wrong kind of work. This
caused unlinked inodes to be left behind, which could in the worst case
fill up filesystems and require a filesystem check to recover.
Fix that by queuing the right kind of work in try_rgrp_unlink() and
gfs2_drop_inode().
Fixes: f0e56edc2ec7 ("gfs2: Split the two kinds of glock "delete" work")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In the current glock reference counting model, a bias of one is added to
a glock's refcount when it is locked (gl->gl_state != LM_ST_UNLOCKED).
A glock is removed from the lru_list when it is enqueued, and added back
when it is dequeued. This isn't a very appropriate model because most
glocks are held for long periods of time (for example, the inode "owns"
references to its inode and iopen glocks as long as the inode is cached
even when the glock state changes to LM_ST_UNLOCKED), and they can only
be freed when they are no longer referenced, anyway.
Fix this by getting rid of the refcount bias for locked glocks. That
way, we can use lockref_put_or_lock() to efficiently drop all but the
last glock reference, and put the glock onto the lru_list when the last
reference is dropped. When find_insert_glock() returns a reference to a
cached glock, it removes the glock from the lru_list.
Dumping the "glocks" and "glstats" debugfs files also takes glock
references, but instead of removing the glocks from the lru_list in that
case as well, we leave them on the list. This ensures that dumping
those files won't perturb the order of the glocks on the lru_list.
In addition, when the last reference to an *unlocked* glock is dropped,
we immediately free it; this preserves the preexisting behavior. If it
later turns out that caching unlocked glocks is useful in some
situations, we can change the caching strategy.
It is currently unclear if a glock that has no active references can
have the GLF_LFLUSH flag set. To make sure that such a glock won't
accidentally be evicted due to memory pressure, we add a GLF_LFLUSH
check to gfs2_dispose_glock_lru().
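A rough sketch of the resulting put path (names simplified; the LRU and
freeing details are more subtle in the real code):

  void gfs2_glock_put_sketch(struct gfs2_glock *gl)
  {
          if (lockref_put_or_lock(&gl->gl_lockref))
                  return;                         /* not the last reference */

          /* Last reference: gl_lockref.lock is now held, count is still 1. */
          gl->gl_lockref.count--;
          if (gl->gl_state != LM_ST_UNLOCKED) {
                  gfs2_glock_add_to_lru(gl);      /* keep locked glocks cached */
                  spin_unlock(&gl->gl_lockref.lock);
                  return;
          }
          spin_unlock(&gl->gl_lockref.lock);
          gfs2_glock_free(gl);                    /* unlocked glocks are freed at once */
  }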
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
'timeout' is a vague name for the return value of wait_event_*_timeout
because it actually returns the time left. Because the variable is never
used later, just drop the return value. Since variable 'timeout' is then
only used to carry a fixed timeout value, drop this in favor of a fixed
function argument as in the other call to wait_event_timeout() above.
Signed-off-by: Wolfram Sang <wsa+renesas@sang-engineering.com>
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
Function gfs2_freeze_unlock() is always called with &sdp->sd_freeze_gh
as its argument, so clean up the code by passing in sdp instead.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
When a DLM lockspace is released and there are still locks in that
lockspace, DLM will unlock those locks automatically. Commit
fb6791d100d1b started exploiting this behavior to speed up filesystem
unmount: gfs2 would simply free glocks it didn't want to unlock and then
release the lockspace. This didn't take the bast callbacks for
asynchronous lock contention notifications into account, which remain
active until a lock is unlocked or its lockspace is released.
To prevent those callbacks from accessing deallocated objects, put the
glocks that should not be unlocked on the sd_dead_glocks list, release
the lockspace, and only then free those glocks.
As an additional measure, ignore unexpected ast and bast callbacks if
the receiving glock is dead.
Fixes: fb6791d100d1b ("GFS2: skip dlm_unlock calls in unmount")
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
Cc: David Teigland <teigland@redhat.com>
|
|
Function gfs2_glock_queue_put() puts a glock reference by enqueuing
glock work instead of putting the reference directly. This ensures that
the operation won't sleep, but it is costly and really only necessary
when putting the final glock reference. Replace it with a new
gfs2_glock_put_async() function that only queues glock work when putting
the last glock reference.
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|
|
In gfs2_jindex_free(), set sdp->sd_jdesc to NULL under the log flush
lock to provide exclusion against gfs2_log_flush().
In gfs2_log_flush(), check if sdp->sd_jdesc is non-NULL before
dereferencing it. Otherwise, we could run into a NULL pointer
dereference when outstanding glock work races with an unmount
(glock_work_func -> run_queue -> do_xmote -> inode_go_sync ->
gfs2_log_flush).
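Schematically:

  /* gfs2_jindex_free(), during unmount */
  down_write(&sdp->sd_log_flush_lock);
  sdp->sd_jdesc = NULL;
  up_write(&sdp->sd_log_flush_lock);

  /* gfs2_log_flush(), with sd_log_flush_lock already held for writing */
  if (unlikely(!sdp->sd_jdesc))
          goto out;       /* raced with unmount */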
Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
|