aboutsummaryrefslogtreecommitdiff
path: root/fs/gfs2/file.c
AgeCommit message (Collapse)AuthorFilesLines
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-2/+2
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-12fs: remove simple_nosetlease()Jeff Layton1-2/+0
Setting ->setlease() to a NULL pointer now has the same effect as setting it to simple_nosetlease(). Remove all of the setlease file_operations that are set to simple_nosetlease, and the function itself. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-24-ea4dec9b67fa@kernel.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12gfs2: add a setlease file operationJeff Layton1-0/+1
gfs2_file_fops_nolock() already has this explicitly set, so it's only necessary to set this in gfs2_dir_fops_nolock(). A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-10-ea4dec9b67fa@kernel.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12gfs2: don't allow delegations to be set on directoriesJeff Layton1-0/+1
With the advent of directory leases, it's necessary to set the ->setlease() handler in directory file_operations to properly deny them. In the "nolock" case however, there is no need to deny them. Fixes: e6d28ebc17eb ("filelock: push the S_ISREG check down to ->setlease handlers") Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260107-setlease-6-19-v1-4-85f034abcc57@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-12-03Merge tag 'gfs2-for-6.19' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Major withdraw / error handling overhaul based on dlm's new DLM_RELEASE_RECOVER feature: this allows gfs to treat withdraws like node failures. Make withdraws asynchronous - Fix a bug in commit e4a8b5481c59a that caused 'df' to remain out of sync. ('df' is still allowed to go slightly out of sync for short periods of time) - Prevent recusive memory reclaim in gfs2_unstuff_dinode() - Clean up SDF_JOURNAL_LIVE flag handling - Fix remote evict for read-only filesystems - Fix a misuse of bio_chain() - Various other minor cleanups * tag 'gfs2-for-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (35 commits) gfs2: Fix use of bio_chain gfs2: Clean up SDF_JOURNAL_LIVE flag handling gfs2: No longer thaw filesystems during a withdraw gfs2: Withdraw immediately in gfs2_trans_add_meta gfs2: New gfs2_withdraw_helper gfs2: Clean up properly during a withdraw gfs2: Rename gfs2_{gl_dq_holders => withdraw_glocks} Revert "gfs2: fix infinite loop when checking ail item count before go_inval" Revert "gfs2: Allow some glocks to be used during withdraw" Revert "gfs2: Check for log write errors before telling dlm to unlock" Revert "gfs2: fix a deadlock on withdraw-during-mount" Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (6/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (5/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (4/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (3/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (2/6) Revert "gfs2: Force withdraw to replay journals and wait for it to finish" (1/6) Revert "gfs2: don't stop reads while withdraw in progress" gfs2: Rename LM_FLAG_{NOEXP -> RECOVER} gfs2: Kill gfs2_io_error_bh_wd ...
2025-11-26gfs2: Rename gfs2_{withdrawing_or_ => }withdrawnAndreas Gruenbacher1-1/+1
With delayed withdraws and the SDF_WITHDRAWING flag gone, we can now rename gfs2_withdrawing_or_withdrawn() back to gfs2_withdrawn(). Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-10-20gfs2: use the new ->i_state accessorsMateusz Guzik1-1/+1
Change generated with coccinelle and fixed up by hand as appropriate. Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-12gfs2: Add proper lockspace lockingAndreas Gruenbacher1-8/+15
GFS2 has been calling functions like dlm_lock() even after the lockspace that these functions operate on has been released with dlm_release_lockspace(). It has always assumed that those functions would return -EINVAL in that case, but that was never guaranteed, and it certainly is no longer the case since commit 4db41bf4f04f ("dlm: remove ls_local_handle from struct dlm_ls"). To fix that, add proper lockspace locking. Fixes: 3e11e5304150 ("GFS2: ignore unlock failures after withdraw") Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Andrew Price <anprice@redhat.com>
2025-07-28Merge tag 'vfs-6.17-rc1.iomap' of ↵Linus Torvalds1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs iomap updates from Christian Brauner: - Refactor the iomap writeback code and split the generic and ioend/bio based writeback code. There are two methods that define the split between the generic writeback code, and the implemementation of it, and all knowledge of ioends and bios now sits below that layer. - Add fuse iomap support for buffered writes and dirty folio writeback. This is needed so that granular uptodate and dirty tracking can be used in fuse when large folios are enabled. This has two big advantages. For writes, instead of the entire folio needing to be read into the page cache, only the relevant portions need to be. For writeback, only the dirty portions need to be written back instead of the entire folio. * tag 'vfs-6.17-rc1.iomap' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fuse: refactor writeback to use iomap_writepage_ctx inode fuse: hook into iomap for invalidating and checking partial uptodateness fuse: use iomap for folio laundering fuse: use iomap for writeback fuse: use iomap for buffered writes iomap: build the writeback code without CONFIG_BLOCK iomap: add read_folio_range() handler for buffered writes iomap: improve argument passing to iomap_read_folio_sync iomap: replace iomap_folio_ops with iomap_write_ops iomap: export iomap_writeback_folio iomap: move folio_unlock out of iomap_writeback_folio iomap: rename iomap_writepage_map to iomap_writeback_folio iomap: move all ioend handling to ioend.c iomap: add public helpers for uptodate state manipulation iomap: hide ioends from the generic writeback code iomap: refactor the writeback interface iomap: cleanup the pending writeback tracking in iomap_writepage_map_blocks iomap: pass more arguments using the iomap writeback context iomap: header diet
2025-07-14iomap: replace iomap_folio_ops with iomap_write_opsChristoph Hellwig1-1/+2
The iomap_folio_ops are only used for buffered writes, including the zero and unshare variants. Rename them to iomap_write_ops to better describe the usage, and pass them through the call chain like the other operation specific methods instead of through the iomap. xfs_iomap_valid grows a IOMAP_HOLE check to keep the existing behavior that never attached the folio_ops to a iomap representing a hole. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/20250710133343.399917-12-hch@lst.de Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Brian Foster <bfoster@redhat.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-04tree-wide: s/struct fileattr/struct file_kattr/gChristian Brauner1-2/+2
Now that we expose struct file_attr as our uapi struct rename all the internal struct to struct file_kattr to clearly communicate that it is a kernel internal struct. This is similar to struct mount_{k}attr and others. Link: https://lore.kernel.org/20250703-restlaufzeit-baurecht-9ed44552b481@brauner Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-03-18gfs2: some comment clarificationsAndreas Gruenbacher1-4/+4
Since commit e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now"), we unconditionally drop the inode glock before trying to fault in more pages. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2025-01-14gfs2: Truncate address space when flipping GFS2_DIF_JDATA flagAndreas Gruenbacher1-0/+1
Truncate an inode's address space when flipping the GFS2_DIF_JDATA flag: depending on that flag, the pages in the address space will either use buffer heads or iomap_folio_state structs, and we cannot mix the two. Reported-by: Kun Hu <huk23@m.fudan.edu.cn>, Jiaji Qin <jjtan24@m.fudan.edu.cn> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-10-02Merge patch series "Fixup NLM and kNFSD file lock callbacks"Christian Brauner1-0/+2
Benjamin Coddington <bcodding@redhat.com> says: Last year both GFS2 and OCFS2 had some work done to make their locking more robust when exported over NFS. Unfortunately, part of that work caused both NLM (for NFS v3 exports) and kNFSD (for NFSv4.1+ exports) to no longer send lock notifications to clients. This in itself is not a huge problem because most NFS clients will still poll the server in order to acquire a conflicted lock, but now that I've noticed it I can't help but try to fix it because there are big advantages for setups that might depend on timely lock notifications, and we've supported that as a feature for a long time. Its important for NLM and kNFSD that they do not block their kernel threads inside filesystem's file_lock implementations because that can produce deadlocks. We used to make sure of this by only trusting that posix_lock_file() can correctly handle blocking lock calls asynchronously, so the lock managers would only setup their file_lock requests for async callbacks if the filesystem did not define its own lock() file operation. However, when GFS2 and OCFS2 grew the capability to correctly handle blocking lock requests asynchronously, they started signalling this behavior with EXPORT_OP_ASYNC_LOCK, and the check for also trusting posix_lock_file() was inadvertently dropped, so now most filesystems no longer produce lock notifications when exported over NFS. I tried to fix this by simply including the old check for lock(), but the resulting include mess and layering violations was more than I could accept. There's a much cleaner way presented here using an fop_flag, which while potentially flag-greedy, greatly simplifies the problem and grooms the way for future uses by both filesystems and lock managers alike. * patches from https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com: exportfs: Remove EXPORT_OP_ASYNC_LOCK NLM/NFSD: Fix lock notifications for async-capable filesystems gfs2/ocfs2: set FOP_ASYNC_LOCK fs: Introduce FOP_ASYNC_LOCK NFS: trace: show TIMEDOUT instead of 0x6e nfsd: use system_unbound_wq for nfsd_file_gc_worker() nfsd: count nfsd_file allocations nfsd: fix refcount leak when file is unhashed after being found nfsd: remove unneeded EEXIST error check in nfsd_do_file_acquire nfsd: add list_head nf_gc to struct nfsd_file Link: https://lore.kernel.org/r/cover.1726083391.git.bcodding@redhat.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-12gfs2/ocfs2: set FOP_ASYNC_LOCKBenjamin Coddington1-0/+2
Both GFS2 and OCFS2 use DLM locking, which will allow async lock requests. Signal this support by setting FOP_ASYNC_LOCK. Signed-off-by: Benjamin Coddington <bcodding@redhat.com> Link: https://lore.kernel.org/r/fc4163dbbf33c58e5a8b8ee8cb8c57e555f53ce5.1726083391.git.bcodding@redhat.com Reviewed-by: Jeff Layton <jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-09-03iomap: add a private argument for iomap_file_buffered_writeJosef Bacik1-1/+1
In order to switch fuse over to using iomap for buffered writes we need to be able to have the struct file for the original write, in case we have to read in the page to make it uptodate. Handle this by using the existing private field in the iomap_iter, and add the argument to iomap_file_buffered_write. This will allow us to pass the file in through the iomap buffered write path, and is flexible for any other file systems needs. Signed-off-by: Josef Bacik <josef@toxicpanda.com> Link: https://lore.kernel.org/r/7f55c7c32275004ba00cddf862d970e6e633f750.1724755651.git.josef@toxicpanda.com Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-05-14Merge tag 'gfs2-for-v6.10' of ↵Linus Torvalds1-29/+30
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Properly fix the glock shrinker this time: it broke in commit "gfs2: Make glock lru list scanning safer" and commit "gfs2: fix glock shrinker ref issues" wasn't actually enough to fix it - On unmount, keep glocks around long enough that no more dlm callbacks can occur on them - Some more folio conversion patches from Matthew Wilcox - Lots of other smaller fixes and cleanups * tag 'gfs2-for-v6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (27 commits) gfs2: make timeout values more explicit gfs2: Convert gfs2_aspace_writepage() to use a folio gfs2: Add a migrate_folio operation for journalled files gfs2: Simplify gfs2_read_super gfs2: Convert gfs2_page_mkwrite() to use a folio gfs2: gfs2_freeze_unlock cleanup gfs2: Remove and replace gfs2_glock_queue_work gfs2: do_xmote fixes gfs2: finish_xmote cleanup gfs2: Unlock fewer glocks on unmount gfs2: Fix potential glock use-after-free on unmount gfs2: Remove ill-placed consistency check gfs2: Fix lru_count accounting gfs2: Fix "Make glock lru list scanning safer" Revert "gfs2: fix glock shrinker ref issues" gfs2: Fix "ignore unlock failures after withdraw" gfs2: Get rid of unnecessary test_and_set_bit gfs2: Don't set GLF_LOCK in gfs2_dispose_glock_lru gfs2: Replace gfs2_glock_queue_put with gfs2_glock_put_async gfs2: Get rid of gfs2_glock_queue_put in signal_our_withdraw ...
2024-04-29gfs2: Convert gfs2_page_mkwrite() to use a folioMatthew Wilcox (Oracle)1-29/+30
Convert the incoming page to a folio and use it throughout, saving several calls to compound_head(). Also use 'pos' for file position rather than the ambiguous 'offset' and convert 'length' to type size_t in case we get some truly ridiculous sized folios in the future. This function should now be large-folio safe, but I may have missed something. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2024-02-05gfs2: adapt to breakup of struct file_lockJeff Layton1-6/+5
Most of the existing APIs have remained the same, but subsystems that access file_lock fields directly need to reach into struct file_lock_core now. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-38-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05filelock: split common fields into struct file_lock_coreJeff Layton1-0/+1
In a future patch, we're going to split file leases into their own structure. Since a lot of the underlying machinery uses the same fields move those into a new file_lock_core, and embed that inside struct file_lock. For now, add some macros to ensure that we can continue to build while the conversion is in progress. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-17-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2024-02-05gfs2: convert to using new filelock helpersJeff Layton1-4/+4
Convert to using the new file locking helper functions. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://lore.kernel.org/r/20240131-flsplit-v3-9-c6129007ee8d@kernel.org Reviewed-by: NeilBrown <neilb@suse.de> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-12-20gfs2: Rename gfs2_withdrawn to gfs2_withdrawing_or_withdrawnAndreas Gruenbacher1-1/+1
This function checks whether the filesystem has been been marked to be withdrawn eventually or has been withdrawn already. Rename this function to avoid confusing code like checking for gfs2_withdrawing() when gfs2_withdrawn() has already returned true. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-12-20gfs2: Mark withdraws as unlikelyAndreas Gruenbacher1-1/+1
Mark the gfs2_withdrawn(), gfs2_withdrawing(), and gfs2_withdraw_in_prog() inline functions as likely to return %false. This allows to get rid of likely() and unlikely() annotations at the call sites of those functions. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-11-02gfs2: Clean up gfs2_alloc_parms initializersAndreas Gruenbacher1-2/+2
When intializing a struct, all fields that are not explicitly mentioned are zeroed out already. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-09-22gfs2: Don't update inode timestamps for direct writesAndreas Gruenbacher1-4/+10
During direct reads and writes, the caller is holding the inode glock in deferred mode, which doesn't allow metadata updates. However, a previous change caused callers to update the inode modification time before carrying out direct writes, which caused the inode glock to be converted to exclusive mode for the timestamp update, only to be immediately converted back to deferred mode for the direct write. This locks out other direct readers and writers and wreaks havoc on performance. Fix that by reverting to not updating the inode modification time for direct writes. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-08-31Merge tag 'dlm-6.6' of ↵Linus Torvalds1-6/+3
git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm Pull dlm updates from David Teigland: - Allow blocking posix lock requests to be interrupted while waiting. This requires a cancel request to be sent to the userspace daemon where posix lock requests are processed across the cluster. - Fix a posix lock patch from the previous cycle in which lock requests from different file systems could be mixed up. - Fix some long standing problems with nfs posix lock cancelation. - Add a new debugfs file for printing queued callbacks. - Stop modifying buffers that have been used to receive a message. - Misc cleanups and some refactoring. * tag 'dlm-6.6' of git://git.kernel.org/pub/scm/linux/kernel/git/teigland/linux-dlm: dlm: fix plock lookup when using multiple lockspaces fs: dlm: don't use RCOM_NAMES for version detection fs: dlm: create midcomms nodes when configure fs: dlm: constify receive buffer fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy fs: dlm: drop rxbuf manipulation in dlm_copy_master_names fs: dlm: get recovery sequence number as parameter fs: dlm: cleanup lock order fs: dlm: remove clear_members_cb fs: dlm: add plock dev tracepoints fs: dlm: check on plock ops when exit dlm fs: dlm: debugfs for queued callbacks fs: dlm: remove unused processed_nodes fs: dlm: add missing spin_unlock fs: dlm: fix F_CANCELLK to cancel pending request fs: dlm: allow to F_SETLKW getting interrupted fs: dlm: remove twice newline
2023-08-29Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linuxLinus Torvalds1-8/+8
Pull block updates from Jens Axboe: "Pretty quiet round for this release. This contains: - Add support for zoned storage to ublk (Andreas, Ming) - Series improving performance for drivers that mark themselves as needing a blocking context for issue (Bart) - Cleanup the flush logic (Chengming) - sed opal keyring support (Greg) - Fixes and improvements to the integrity support (Jinyoung) - Add some exports for bcachefs that we can hopefully delete again in the future (Kent) - deadline throttling fix (Zhiguo) - Series allowing building the kernel without buffer_head support (Christoph) - Sanitize the bio page adding flow (Christoph) - Write back cache fixes (Christoph) - MD updates via Song: - Fix perf regression for raid0 large sequential writes (Jan) - Fix split bio iostat for raid0 (David) - Various raid1 fixes (Heinz, Xueshi) - raid6test build fixes (WANG) - Deprecate bitmap file support (Christoph) - Fix deadlock with md sync thread (Yu) - Refactor md io accounting (Yu) - Various non-urgent fixes (Li, Yu, Jack) - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li, Ming, Nitesh, Ruan, Tejun, Thomas, Xu)" * tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits) block: use strscpy() to instead of strncpy() block: sed-opal: keyring support for SED keys block: sed-opal: Implement IOC_OPAL_REVERT_LSP block: sed-opal: Implement IOC_OPAL_DISCOVERY blk-mq: prealloc tags when increase tagset nr_hw_queues blk-mq: delete redundant tagset map update when fallback blk-mq: fix tags leak when shrink nr_hw_queues ublk: zoned: support REQ_OP_ZONE_RESET_ALL md: raid0: account for split bio in iostat accounting md/raid0: Fix performance regression for large sequential writes md/raid0: Factor out helper for mapping and submitting a bio md raid1: allow writebehind to work on any leg device set WriteMostly md/raid1: hold the barrier until handle_read_error() finishes md/raid1: free the r1bio before waiting for blocked rdev md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io() blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init drivers/rnbd: restore sysfs interface to rnbd-client md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid() raid6: test: only check for Altivec if building on powerpc hosts raid6: test: make sure all intermediate and artifact files are .gitignored ...
2023-08-28Merge tag 'v6.6-vfs.ctime' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs timestamp updates from Christian Brauner: "This adds VFS support for multi-grain timestamps and converts tmpfs, xfs, ext4, and btrfs to use them. This carries acks from all relevant filesystems. The VFS always uses coarse-grained timestamps when updating the ctime and mtime after a change. This has the benefit of allowing filesystems to optimize away a lot of metadata updates, down to around 1 per jiffy, even when a file is under heavy writes. Unfortunately, this has always been an issue when we're exporting via NFSv3, which relies on timestamps to validate caches. A lot of changes can happen in a jiffy, so timestamps aren't sufficient to help the client decide to invalidate the cache. Even with NFSv4, a lot of exported filesystems don't properly support a change attribute and are subject to the same problems with timestamp granularity. Other applications have similar issues with timestamps (e.g., backup applications). If we were to always use fine-grained timestamps, that would improve the situation, but that becomes rather expensive, as the underlying filesystem would have to log a lot more metadata updates. This introduces fine-grained timestamps that are used when they are actively queried. This uses the 31st bit of the ctime tv_nsec field to indicate that something has queried the inode for the mtime or ctime. When this flag is set, on the next mtime or ctime update, the kernel will fetch a fine-grained timestamp instead of the usual coarse-grained one. As POSIX generally mandates that when the mtime changes, the ctime must also change the kernel always stores normalized ctime values, so only the first 30 bits of the tv_nsec field are ever used. Filesytems can opt into this behavior by setting the FS_MGTIME flag in the fstype. Filesystems that don't set this flag will continue to use coarse-grained timestamps. Various preparatory changes, fixes and cleanups are included: - Fixup all relevant places where POSIX requires updating ctime together with mtime. This is a wide-range of places and all maintainers provided necessary Acks. - Add new accessors for inode->i_ctime directly and change all callers to rely on them. Plain accesses to inode->i_ctime are now gone and it is accordingly rename to inode->__i_ctime and commented as requiring accessors. - Extend generic_fillattr() to pass in a request mask mirroring in a sense the statx() uapi. This allows callers to pass in a request mask to only get a subset of attributes filled in. - Rework timestamp updates so it's possible to drop the @now parameter the update_time() inode operation and associated helpers. - Add inode_update_timestamps() and convert all filesystems to it removing a bunch of open-coding" * tag 'v6.6-vfs.ctime' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (107 commits) btrfs: convert to multigrain timestamps ext4: switch to multigrain timestamps xfs: switch to multigrain timestamps tmpfs: add support for multigrain timestamps fs: add infrastructure for multigrain timestamps fs: drop the timespec64 argument from update_time xfs: have xfs_vn_update_time gets its own timestamp fat: make fat_update_time get its own timestamp fat: remove i_version handling from fat_update_time ubifs: have ubifs_update_time use inode_update_timestamps btrfs: have it use inode_update_timestamps fs: drop the timespec64 arg from generic_update_time fs: pass the request_mask to generic_fillattr fs: remove silly warning from current_time gfs2: fix timestamp handling on quota inodes fs: rename i_ctime field to __i_ctime selinux: convert to ctime accessor functions security: convert to ctime accessor functions apparmor: convert to ctime accessor functions sunrpc: convert to ctime accessor functions ...
2023-08-07gfs2: Don't use filemap_splice_readBob Peterson1-2/+2
Starting with patch 2cb1e08985, gfs2 started using the new function filemap_splice_read rather than the old (and subsequently deleted) function generic_file_splice_read. filemap_splice_read works by taking references to a number of folios in the page cache and splicing those folios into a pipe. The folios are then read from the pipe and the folio references are dropped. This can take an arbitrary amount of time. We cannot allow that in gfs2 because those folio references will pin the inode glock to the node and prevent it from being demoted, which can lead to cluster-wide deadlocks. Instead, use copy_splice_read. (In addition, the old generic_file_splice_read called into ->read_iter, which called gfs2_file_read_iter, which took the inode glock during the operation. The new filemap_splice_read interface does not take the inode glock anymore. This is fixable, but it still wouldn't prevent cluster-wide deadlocks.) Fixes: 2cb1e08985e3 ("splice: Use filemap_splice_read() instead of generic_file_splice_read()") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-08-02fs: rename and move block_page_mkwrite_returnChristoph Hellwig1-8/+8
block_page_mkwrite_return is neither block nor mkwrite specific, and should not be under CONFIG_BLOCK. Move it to mm.h and rename it to vmf_fs_error. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Christian Brauner <brauner@kernel.org> Link: https://lore.kernel.org/r/20230801172201.1923299-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-07-24gfs2: convert to ctime accessor functionsJeff Layton1-1/+1
In later patches, we're going to change how the inode's ctime field is used. Switch to using accessor functions instead of raw accesses of inode->i_ctime. Signed-off-by: Jeff Layton <jlayton@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-Id: <20230705190309.579783-45-jlayton@kernel.org> Signed-off-by: Christian Brauner <brauner@kernel.org>
2023-07-20fs: dlm: fix F_CANCELLK to cancel pending requestAlexander Aring1-6/+3
This patch fixes the current handling of F_CANCELLK by not just doing a unlock as we need to try to cancel a lock at first. A unlock makes sense on a non-blocking lock request but if it's a blocking lock request we need to cancel the request until it's not granted yet. This patch is fixing this behaviour by first try to cancel a lock request and if it's failed it's unlocking the lock which seems to be granted. Note: currently the nfs locking handling was disabled by commit 40595cdc93ed ("nfs: block notification on fs with its own ->lock"). However DLM was never being updated regarding to this change. Future patches will try to fix lockd lock requests for DLM. This patch is currently assuming the upstream DLM lockd handling is correct. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-07-04Merge tag 'gfs2-v6.4-rc5-fixes' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 updates from Andreas Gruenbacher: - Move the freeze/thaw logic from glock callback context to process / worker thread context to prevent deadlocks - Fix a quota reference couting bug in do_qc() - Carry on deallocating inodes even when gfs2_rindex_update() fails - Retry filesystem-internal reads when they are interruped by a signal - Eliminate kmap_atomic() in favor of kmap_local_page() / memcpy_{from,to}_page() - Get rid of noop_direct_IO - And a few more minor fixes and cleanups * tag 'gfs2-v6.4-rc5-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: (23 commits) gfs2: Add quota_change type gfs2: Use memcpy_{from,to}_page where appropriate gfs2: Convert remaining kmap_atomic calls to kmap_local_page gfs2: Replace deprecated kmap_atomic with kmap_local_page gfs: Get rid of unnucessary locking in inode_go_dump gfs2: gfs2_freeze_lock_shared cleanup gfs2: Replace sd_freeze_state with SDF_FROZEN flag gfs2: Rework freeze / thaw logic gfs2: Rename SDF_{FS_FROZEN => FREEZE_INITIATOR} gfs2: Reconfiguring frozen filesystem already rejected gfs2: Rename gfs2_freeze_lock{ => _shared } gfs2: Rename the {freeze,thaw}_super callbacks gfs2: Rename remaining "transaction" glock references gfs2: retry interrupted internal reads gfs2: Fix possible data races in gfs2_show_options() gfs2: Fix duplicate should_fault_in_pages() call gfs2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO method gfs2: Don't remember delete unless it's successful gfs2: Update rl_unlinked before releasing rgrp lock gfs2: Fix gfs2_qa_get imbalance in gfs2_quota_hold ...
2023-06-28Merge tag 'mm-stable-2023-06-24-19-15' of ↵Linus Torvalds1-5/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull mm updates from Andrew Morton: - Yosry Ahmed brought back some cgroup v1 stats in OOM logs - Yosry has also eliminated cgroup's atomic rstat flushing - Nhat Pham adds the new cachestat() syscall. It provides userspace with the ability to query pagecache status - a similar concept to mincore() but more powerful and with improved usability - Mel Gorman provides more optimizations for compaction, reducing the prevalence of page rescanning - Lorenzo Stoakes has done some maintanance work on the get_user_pages() interface - Liam Howlett continues with cleanups and maintenance work to the maple tree code. Peng Zhang also does some work on maple tree - Johannes Weiner has done some cleanup work on the compaction code - David Hildenbrand has contributed additional selftests for get_user_pages() - Thomas Gleixner has contributed some maintenance and optimization work for the vmalloc code - Baolin Wang has provided some compaction cleanups, - SeongJae Park continues maintenance work on the DAMON code - Huang Ying has done some maintenance on the swap code's usage of device refcounting - Christoph Hellwig has some cleanups for the filemap/directio code - Ryan Roberts provides two patch series which yield some rationalization of the kernel's access to pte entries - use the provided APIs rather than open-coding accesses - Lorenzo Stoakes has some fixes to the interaction between pagecache and directio access to file mappings - John Hubbard has a series of fixes to the MM selftesting code - ZhangPeng continues the folio conversion campaign - Hugh Dickins has been working on the pagetable handling code, mainly with a view to reducing the load on the mmap_lock - Catalin Marinas has reduced the arm64 kmalloc() minimum alignment from 128 to 8 - Domenico Cerasuolo has improved the zswap reclaim mechanism by reorganizing the LRU management - Matthew Wilcox provides some fixups to make gfs2 work better with the buffer_head code - Vishal Moola also has done some folio conversion work - Matthew Wilcox has removed the remnants of the pagevec code - their functionality is migrated over to struct folio_batch * tag 'mm-stable-2023-06-24-19-15' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (380 commits) mm/hugetlb: remove hugetlb_set_page_subpool() mm: nommu: correct the range of mmap_sem_read_lock in task_mem() hugetlb: revert use of page_cache_next_miss() Revert "page cache: fix page_cache_next/prev_miss off by one" mm/vmscan: fix root proactive reclaim unthrottling unbalanced node mm: memcg: rename and document global_reclaim() mm: kill [add|del]_page_to_lru_list() mm: compaction: convert to use a folio in isolate_migratepages_block() mm: zswap: fix double invalidate with exclusive loads mm: remove unnecessary pagevec includes mm: remove references to pagevec mm: rename invalidate_mapping_pagevec to mapping_try_invalidate mm: remove struct pagevec net: convert sunrpc from pagevec to folio_batch i915: convert i915_gpu_error to use a folio_batch pagevec: rename fbatch_count() mm: remove check_move_unevictable_pages() drm: convert drm_gem_put_pages() to use a folio_batch i915: convert shmem_sg_free_table() to use a folio_batch scatterlist: add sg_set_folio() ...
2023-06-26Merge tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linuxLinus Torvalds1-2/+2
Pull splice updates from Jens Axboe: "This kills off ITER_PIPE to avoid a race between truncate, iov_iter_revert() on the pipe and an as-yet incomplete DMA to a bio with unpinned/unref'ed pages from an O_DIRECT splice read. This causes memory corruption. Instead, we either use (a) filemap_splice_read(), which invokes the buffered file reading code and splices from the pagecache into the pipe; (b) copy_splice_read(), which bulk-allocates a buffer, reads into it and then pushes the filled pages into the pipe; or (c) handle it in filesystem-specific code. Summary: - Rename direct_splice_read() to copy_splice_read() - Simplify the calculations for the number of pages to be reclaimed in copy_splice_read() - Turn do_splice_to() into a helper, vfs_splice_read(), so that it can be used by overlayfs and coda to perform the checks on the lower fs - Make vfs_splice_read() jump to copy_splice_read() to handle direct-I/O and DAX - Provide shmem with its own splice_read to handle non-existent pages in the pagecache. We don't want a ->read_folio() as we don't want to populate holes, but filemap_get_pages() requires it - Provide overlayfs with its own splice_read to call down to a lower layer as overlayfs doesn't provide ->read_folio() - Provide coda with its own splice_read to call down to a lower layer as coda doesn't provide ->read_folio() - Direct ->splice_read to copy_splice_read() in tty, procfs, kernfs and random files as they just copy to the output buffer and don't splice pages - Provide wrappers for afs, ceph, ecryptfs, ext4, f2fs, nfs, ntfs3, ocfs2, orangefs, xfs and zonefs to do locking and/or revalidation - Make cifs use filemap_splice_read() - Replace pointers to generic_file_splice_read() with pointers to filemap_splice_read() as DIO and DAX are handled in the caller; filesystems can still provide their own alternate ->splice_read() op - Remove generic_file_splice_read() - Remove ITER_PIPE and its paraphernalia as generic_file_splice_read was the only user" * tag 'for-6.5/splice-2023-06-23' of git://git.kernel.dk/linux: (31 commits) splice: kdoc for filemap_splice_read() and copy_splice_read() iov_iter: Kill ITER_PIPE splice: Remove generic_file_splice_read() splice: Use filemap_splice_read() instead of generic_file_splice_read() cifs: Use filemap_splice_read() trace: Convert trace/seq to use copy_splice_read() zonefs: Provide a splice-read wrapper xfs: Provide a splice-read wrapper orangefs: Provide a splice-read wrapper ocfs2: Provide a splice-read wrapper ntfs3: Provide a splice-read wrapper nfs: Provide a splice-read wrapper f2fs: Provide a splice-read wrapper ext4: Provide a splice-read wrapper ecryptfs: Provide a splice-read wrapper ceph: Provide a splice-read wrapper afs: Provide a splice-read wrapper 9p: Add splice_read wrapper net: Make sock_splice_read() use copy_splice_read() by default tty, proc, kernfs, random: Use copy_splice_read() ...
2023-06-12gfs2: Fix duplicate should_fault_in_pages() callBob Peterson1-1/+1
In gfs2_file_buffered_write(), we currently jump from the second call of function should_fault_in_pages() to above the first call, so should_fault_in_pages() is getting called twice in a row, causing it to accidentally fall back to single-page writes rather than trying the more efficient multi-page writes first. Fix that by moving the retry label to the correct place, behind the first call to should_fault_in_pages(). Fixes: e1fa9ea85ce8 ("gfs2: Stop using glock holder auto-demotion for now") Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-06-12gfs2: set FMODE_CAN_ODIRECT instead of a dummy direct_IO methodChristoph Hellwig1-0/+3
Since commit a2ad63daa88b ("VFS: add FMODE_CAN_ODIRECT file flag"), file systems can just set the FMODE_CAN_ODIRECT flag at open time instead of wiring up a dummy direct_IO method to indicate support for direct I/O. Remove .direct_IO from gfs2_aops and set FMODE_CAN_ODIRECT in gfs2_open_common for regular files that do not use data journalling. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-06-09iomap: update ki_pos in iomap_file_buffered_writeChristoph Hellwig1-3/+1
All callers of iomap_file_buffered_write need to updated ki_pos, move it into common code. Link: https://lkml.kernel.org/r/20230601145904.1385409-8-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Andreas Gruenbacher <agruenba@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Damien Le Moal <dlemoal@kernel.org> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Theodore Ts'o <tytso@mit.edu> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-09backing_dev: remove current->backing_dev_infoChristoph Hellwig1-2/+0
Patch series "cleanup the filemap / direct I/O interaction", v4. This series cleans up some of the generic write helper calling conventions and the page cache writeback / invalidation for direct I/O. This is a spinoff from the no-bufferhead kernel project, for which we'll want to an use iomap based buffered write path in the block layer. This patch (of 12): The last user of current->backing_dev_info disappeared in commit b9b1335e6403 ("remove bdi_congested() and wb_congested() and related functions"). Remove the field and all assignments to it. Link: https://lkml.kernel.org/r/20230601145904.1385409-1-hch@lst.de Link: https://lkml.kernel.org/r/20230601145904.1385409-2-hch@lst.de Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Acked-by: Theodore Ts'o <tytso@mit.edu> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Andreas Gruenbacher <agruenba@redhat.com> Cc: Anna Schumaker <anna@kernel.org> Cc: Chao Yu <chao@kernel.org> Cc: Ilya Dryomov <idryomov@gmail.com> Cc: Jaegeuk Kim <jaegeuk@kernel.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Matthew Wilcox <willy@infradead.org> Cc: Miklos Szeredi <miklos@szeredi.hu> Cc: Miklos Szeredi <mszeredi@redhat.com> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Xiubo Li <xiubli@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-06-01gfs2: Don't get stuck writing page onto itself under direct I/OAndreas Gruenbacher1-3/+14
When a direct I/O write is performed, iomap_dio_rw() invalidates the part of the page cache which the write is going to before carrying out the write. In the odd case, the direct I/O write will be reading from the same page it is writing to. gfs2 carries out writes with page faults disabled, so it should have been obvious that this page invalidation can cause iomap_dio_rw() to never make any progress. Currently, gfs2 will end up in an endless retry loop in gfs2_file_direct_write() instead, though. Break this endless loop by limiting the number of retries and falling back to buffered I/O after that. Also simplify should_fault_in_pages() sightly and add a comment to make the above case easier to understand. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com>
2023-05-24splice: Use filemap_splice_read() instead of generic_file_splice_read()David Howells1-2/+2
Replace pointers to generic_file_splice_read() with calls to filemap_splice_read(). Signed-off-by: David Howells <dhowells@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christian Brauner <brauner@kernel.org> cc: Jens Axboe <axboe@kernel.dk> cc: Al Viro <viro@zeniv.linux.org.uk> cc: David Hildenbrand <david@redhat.com> cc: John Hubbard <jhubbard@nvidia.com> cc: linux-mm@kvack.org cc: linux-block@vger.kernel.org cc: linux-fsdevel@vger.kernel.org Link: https://lore.kernel.org/r/20230522135018.2742245-29-dhowells@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-02-20Merge tag 'fs.idmapped.v6.3' of ↵Linus Torvalds1-2/+2
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/idmapping Pull vfs idmapping updates from Christian Brauner: - Last cycle we introduced the dedicated struct mnt_idmap type for mount idmapping and the required infrastucture in 256c8aed2b42 ("fs: introduce dedicated idmap type for mounts"). As promised in last cycle's pull request message this converts everything to rely on struct mnt_idmap. Currently we still pass around the plain namespace that was attached to a mount. This is in general pretty convenient but it makes it easy to conflate namespaces that are relevant on the filesystem with namespaces that are relevant on the mount level. Especially for