aboutsummaryrefslogtreecommitdiff
path: root/fs/jbd2
AgeCommit message (Collapse)AuthorFilesLines
2 daysMerge tag 'ext4_for_linus-7.2-rc1' of ↵Linus Torvalds3-127/+22
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - A major rework of the fast commit mechanism to avoid lock contention and deadlocks. We also export snapshot statistics in /proc/fs/ext4/*/fc_info - Performance optimization for directory hash computation by processing input in 4-byte chunks and removing function pointers, along with new KUnit tests for directory hash - Cleanups in JBD2 to remove special slabs and use kmalloc() instead - Various bug fixes, including: - Early validation of donor superblock in EXT4_IOC_MOVE_EXT to avoid cross-fs deadlock - Fix for a kernel BUG in ext4_write_inline_data_end under data=journal - Fix for a NULL dereference in jbd2_journal_dirty_metadata when handle is aborted - Fix for an underflow in JBD2 fast commit block initialization check - Fix for LOGFLUSH shutdown ordering to ensure ordered data writeback - Miscellaneous fixes for error path return values and KUnit assertions * tag 'ext4_for_linus-7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: ext4: validate donor file superblock early in EXT4_IOC_MOVE_EXT ext4: fix kernel BUG in ext4_write_inline_data_end ext4: fix ERR_PTR(0) in ext4_mkdir() jbd2: remove special jbd2 slabs ext4: remove mention of PageWriteback ext4: improve str2hashbuf by processing 4-byte chunks and removing function pointers ext4: add Kunit coverage for directory hash computation ext4: fast commit: export snapshot stats in fc_info ext4: fast commit: add lock_updates tracepoint ext4: fast commit: avoid i_data_sem by dropping ext4_map_blocks() in snapshots ext4: fast commit: avoid self-deadlock in inode snapshotting ext4: fast commit: avoid waiting for FC_COMMITTING ext4: lockdep: handle i_data_sem subclassing for special inodes ext4: fast commit: snapshot inode state before writing log jbd2: fix integer underflow in jbd2_journal_initialize_fast_commit() ext4: fix fast commit wait/wake bit mapping on 64-bit jbd2: check for aborted handle in jbd2_journal_dirty_metadata() ext4: Use %pe to print PTR_ERR() ext4: fix LOGFLUSH shutdown ordering to allow ordered-mode data writeback ext4: replace KUnit tests for memcmp() with KUNIT_ASSERT_MEMEQ()
6 daysMerge tag 'vfs-7.2-rc1.misc' of ↵Linus Torvalds2-8/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe throughput improves 6-28% and average write latency drops 5-22%; under memory pressure - when the cost of holding the mutex across reclaim is highest - throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests. - uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC). - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd. - docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration. Fixes: - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf where SB_I_NOEXEC silences the warning. - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails. - mount: honour SB_NOUSER in the new mount API. - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path. - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT. - selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP. - filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n. - init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state. - fs/coredump: reduce redundant log noise in validate_coredump_safety(). - iomap: pass the correct length to fserror_report_io() in __iomap_write_begin(). - backing-file: fix the backing_file_open() kerneldoc. Cleanups: - initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes. - Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc(). - Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence. - Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path(). - fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES(). - vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases. - dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags. - iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow. - fs/pipe: write to ->poll_usage only once. - vfs: remove an always-taken if-branch in find_next_fd(). - dcache: use kmalloc_flex() for struct external_name in __d_alloc(). - namei: use QSTR() instead of QSTR_INIT() in path_pts(). - sync_file_range: delete dead S_ISLNK code. - Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes" * tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits) backing-file: fix backing_file_open() kerneldoc parameter iomap: pass the correct len to fserror_report_io in __iomap_write_begin vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags bpf: add bpf_real_inode() kfunc fs/read_write: Do not export __kernel_write() to the entire world libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() mount: honour SB_NOUSER in the new mount API fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling selftests/pipe: add pipe_bench microbenchmark fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write fs: retire stale comment in fget_task_next() fs: fix spelling mistakes in comment bfs: replace get_zeroed_page() with kzalloc() binfmt_misc: replace __get_free_page() with kmalloc() configfs: replace __get_free_pages() with kzalloc() fs/namespace: use __getname() to allocate mntpath buffer fs/select: replace __get_free_page() with kmalloc() ...
11 daysjbd2: remove special jbd2 slabsMatthew Wilcox (Oracle)3-128/+13
When jbd2 was originally written, kmalloc() would not guarantee memory alignment for the requested objects. Since commit 59bb47985c1d in 2019, kmalloc has guaranteed natural alignment for power-of-two allocations. We can now remove the jbd2 special slabs and just use kmalloc() directly. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: Tal Zussman <tz2294@columbia.edu> Link: https://patch.msgid.link/20260528171413.1088143-1-willy@infradead.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-06-04jbd2: Convert jbd2_write_superblock() to bh_submit()Matthew Wilcox (Oracle)1-3/+1
Avoid an extra indirect function call and changing the buffer refcount by using bh_submit() instead of submit_bh(). Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20260528173150.1093780-18-willy@infradead.org Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: linux-ext4@vger.kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-06-04jbd2: Convert journal commit to bh_submit()Matthew Wilcox (Oracle)1-6/+7
Avoid an extra indirect function call by using bh_submit() instead of submit_bh() in journal_submit_commit_record() and jbd2_journal_commit_transaction(). These both use journal_end_buffer_io_sync(), so it's more straightforward to do them both at once. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20260528173150.1093780-17-willy@infradead.org Acked-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Cc: linux-ext4@vger.kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-06-03jbd2: fix integer underflow in jbd2_journal_initialize_fast_commit()Junrui Luo1-0/+2
jbd2_journal_initialize_fast_commit() validates journal capacity by checking (journal->j_last - num_fc_blks < JBD2_MIN_JOURNAL_BLOCKS). Both j_last and num_fc_blks are unsigned, so when num_fc_blks exceeds j_last the subtraction wraps to a large value, bypassing the bounds check. The resulting underflow corrupts j_last, j_fc_first, and j_free, leading to journal abort. Fix by checking num_fc_blks against j_last before the subtraction, returning -EFSCORRUPTED. Fixes: 6866d7b3f2bb ("ext4 / jbd2: add fast commit initialization") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Fixes: e029c5f27987 ("ext4: make num of fast commit blocks configurable") Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Fixes: e029c5f279872 ("ext4: make num of fast commit blocks configurable") Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/SYBPR01MB7881663C927DE9D7BBF4D1DFAF062@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-06-03jbd2: check for aborted handle in jbd2_journal_dirty_metadata()Deepanshu Kartikey1-2/+7
jbd2_journal_dirty_metadata() unconditionally dereferences handle->h_transaction at function entry to obtain the journal pointer: transaction_t *transaction = handle->h_transaction; journal_t *journal = transaction->t_journal; However, h_transaction may legitimately be NULL for an aborted handle. The is_handle_aborted() helper in include/linux/jbd2.h explicitly treats !h_transaction as one of the aborted states: if (handle->h_aborted || !handle->h_transaction) return 1; Every other entry point in fs/jbd2/transaction.c (jbd2_journal_get_{write,undo,create}_access, jbd2_journal_extend, jbd2_journal_restart, jbd2_journal_stop, etc.) guards against this with an is_handle_aborted() check before any dereference of h_transaction. jbd2_journal_dirty_metadata() was missing this guard. This is reachable from ocfs2's xattr code. ocfs2_xa_set() intentionally falls through to ocfs2_xa_journal_dirty() even after ocfs2_xa_prepare_entry() fails, on the assumption that the buffer needs to be journaled to record any partial modifications (see the comment above the out_dirty label in fs/ocfs2/xattr.c). If the failure was caused by the journal being aborted -- e.g. an underlying I/O error during a sub-operation such as __ocfs2_remove_xattr_range() -- the handle's h_transaction has been cleared by the abort path, and the unconditional deref in jbd2_journal_dirty_metadata() becomes a NULL deref. Reproduced by syzbot with a crafted ocfs2 image where I/O against the loop device backing the mount is sabotaged via LOOP_SET_STATUS64 between two setxattr() calls, causing the second setxattr (which truncates an external xattr value) to abort the journal mid-flight: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000 KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] RIP: jbd2_journal_dirty_metadata+0x4a/0xd30 fs/jbd2/transaction.c:1520 Call Trace: ocfs2_journal_dirty+0x130/0x700 fs/ocfs2/journal.c:831 ocfs2_xa_journal_dirty fs/ocfs2/xattr.c:1483 [inline] ocfs2_xa_set+0x15e3/0x2ec0 fs/ocfs2/xattr.c:2294 ocfs2_xattr_block_set+0x3e0/0x33c0 fs/ocfs2/xattr.c:3016 __ocfs2_xattr_set_handle+0x6b3/0xf50 fs/ocfs2/xattr.c:3418 ocfs2_xattr_set+0xf3f/0x13e0 fs/ocfs2/xattr.c:3681 __vfs_setxattr+0x43c/0x480 fs/xattr.c:218 ... Fix by adding the standard is_handle_aborted() guard at the top of jbd2_journal_dirty_metadata() and returning -EROFS, matching the pattern used by every other entry point in this file. ocfs2_journal_dirty() already handles a non-zero return from jbd2_journal_dirty_metadata() correctly. Reported-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=98f651460e558a21baae Tested-by: syzbot+98f651460e558a21baae@syzkaller.appspotmail.com Signed-off-by: Deepanshu Kartikey <kartikey406@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260507050605.50081-1-kartikey406@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-05-27jbd2: replace __get_free_pages() with kmalloc()Mike Rapoport (Microsoft)1-5/+2
jbd2_alloc() falls back from kmem_cache_alloc() to __get_free_pages() for allocations larger than PAGE_SIZE. But kmalloc() can handle such cases with essentially the same fallback. Replace use of __get_free_pages() with kmalloc() and simplify jbd2_free() as both kmem_cache_alloc() and kmalloc() allocations can be freed with kfree(). Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Link: https://patch.msgid.link/20260523-b4-fs-v1-10-275e36a83f0e@kernel.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-27fs: jbd2: use clear_and_wake_up_bit() in journal_end_buffer_io_sync()Agatha Isabelle Moreira1-3/+1
Use `clear_and_wake_up_bit()` in `journal_end_buffer_io_sync()`, since the helper was introduced in 'commit 8236b0ae31c83 ("bdi: wake up concurrent wb_shutdown() callers.")' as a generic way of doing the same sequence of operations: clear_bit_unlock(); smp_mb__after_atomic(); wake_up_bit(); The helper was first implemented to avoid bugs caused by forgetting to call `wake_up_bit()` after `clear_bit_unlock()`. Since `journal_end_buffer_io_sync()` was first introduced by 'commit 470decc613ab2 ("jbd2: initial copy of files from jbd")' and last modified in this operation by 'commit 4e857c58efeb9 ("arch: Mass conversion of smp_mb__*()")', years before `clear_and_wake_up_bit()`, it still uses the open-coded sequence. Replace the open-coded sequence with the helper to avoid duplicate code and reduce code paths to maintain. Suggested-by: shuo chen <1289151713@qq.com> Link: https://lore.kernel.org/kernelnewbies/agzoqV835-co4kAN@guidai/T/#t Signed-off-by: Agatha Isabelle Moreira <code@agatha.dev> Link: https://patch.msgid.link/ag4SrrOl7R2DcLLi@guidai Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-04-17Merge tag 'ext4_for_linux-7.0-rc1' of ↵Linus Torvalds4-55/+155
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Refactor code paths involved with partial block zero-out in prearation for converting ext4 to use iomap for buffered writes - Remove use of d_alloc() from ext4 in preparation for the deprecation of this interface - Replace some J_ASSERTS with a journal abort so we can avoid a kernel panic for a localized file system error - Simplify various code paths in mballoc, move_extent, and fast commit - Fix rare deadlock in jbd2_journal_cancel_revoke() that can be triggered by generic/013 when blocksize < pagesize - Fix memory leak when releasing an extended attribute when its value is stored in an ea_inode - Fix various potential kunit test bugs in fs/ext4/extents.c - Fix potential out-of-bounds access in check_xattr() with a corrupted file system - Make the jbd2_inode dirty range tracking safe for lockless reads - Avoid a WARN_ON when writeback files due to a corrupted file system; we already print an ext4 warning indicatign that data will be lost, so the WARN_ON is not necessary and doesn't add any new information * tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) jbd2: fix deadlock in jbd2_journal_cancel_revoke() ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all() ext4: fix possible null-ptr-deref in mbt_kunit_exit() ext4: fix possible null-ptr-deref in extents_kunit_exit() ext4: fix the error handling process in extents_kunit_init). ext4: call deactivate_super() in extents_kunit_exit() ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init() ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access ext4: zero post-EOF partial block before appending write ext4: move pagecache_isize_extended() out of active handle ext4: remove ctime/mtime update from ext4_alloc_file_blocks() ext4: unify SYNC mode checks in fallocate paths ext4: ensure zeroed partial blocks are persisted in SYNC mode ext4: move zero partial block range functions out of active handle ext4: pass allocate range as loff_t to ext4_alloc_file_blocks() ext4: remove handle parameters from zero partial block functions ext4: move ordered data handling out of ext4_block_do_zero_range() ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range() ext4: factor out journalled block zeroing range ext4: rename and extend ext4_block_truncate_page() ...
2026-04-13Merge tag 'vfs-7.1-rc1.kino' of ↵Linus Torvalds2-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
2026-04-09jbd2: fix deadlock in jbd2_journal_cancel_revoke()Zhang Yi1-3/+5
Commit f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()") changed jbd2_journal_cancel_revoke() to use __find_get_block_nonatomic() which holds the folio lock instead of i_private_lock. This breaks the lock ordering (folio -> buffer) and causes an ABBA deadlock when the filesystem blocksize < pagesize: T1 T2 ext4_mkdir() ext4_init_new_dir() ext4_append() ext4_getblk() lock_buffer() <- A sync_blockdev() blkdev_writepages() writeback_iter() writeback_get_folio() folio_lock() <- B ext4_journal_get_create_access() jbd2_journal_cancel_revoke() __find_get_block_nonatomic() folio_lock() <- B block_write_full_folio() lock_buffer() <- A This can occasionally cause generic/013 to hang. Fix by only calling __find_get_block_nonatomic() when the passed buffer_head doesn't belong to the bdev, which is the only case that we need to look up its bdev alias. Otherwise, the lookup is redundant since the found buffer_head is equal to the one we passed in. Fixes: f76d4c28a46a ("fs/jbd2: use sleeping version of __find_get_block()") Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20260409114204.917154-1-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-04-09jbd2: store jinode dirty range in PAGE_SIZE unitsLi Chen3-23/+58
jbd2_inode fields are updated under journal->j_list_lock, but some paths read them without holding the lock (e.g. fast commit helpers and ordered truncate helpers). READ_ONCE() alone is not sufficient for the dirty range fields when they are stored as loff_t because 32-bit platforms can observe torn loads. Store the dirty range in PAGE_SIZE units as pgoff_t instead. Represent the dirty range end as an exclusive end page. This avoids a special sentinel value and keeps MAX_LFS_FILESIZE on 32-bit representable. Publish a new dirty range by updating end_page before start_page, and treat start_page >= end_page as empty in the accessor for robustness. Use READ_ONCE() on the read side and WRITE_ONCE() on the write side for the dirty range and i_flags to match the existing lockless access pattern. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-5-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09jbd2: gracefully abort on transaction state corruptionsMilos Nikic1-28/+86
Auditing the jbd2 codebase reveals several legacy J_ASSERT calls that enforce internal state machine invariants (e.g., verifying jh->b_transaction or jh->b_next_transaction pointers). When these invariants are broken, the journal is in a corrupted state. However, triggering a fatal panic brings down the entire system for a localized filesystem error. This patch targets a specific class of these asserts: those residing inside functions that natively return integer error codes, booleans, or error pointers. It replaces the hard J_ASSERTs with WARN_ON_ONCE to capture the offending stack trace, safely drops any held locks, gracefully aborts the journal, and returns -EINVAL. This prevents a catastrophic kernel panic while ensuring the corrupted journal state is safely contained and upstream callers (like ext4 or ocfs2) can gracefully handle the aborted handle. Functions modified in fs/jbd2/transaction.c: - jbd2__journal_start() - do_get_write_access() - jbd2_journal_dirty_metadata() - jbd2_journal_forget() - jbd2_journal_try_to_free_buffers() - jbd2_journal_file_inode() Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-3-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-09jbd2: gracefully abort instead of panicking on unlocked bufferMilos Nikic1-1/+6
In jbd2_journal_get_create_access(), if the caller passes an unlocked buffer, the code currently triggers a fatal J_ASSERT. While an unlocked buffer here is a clear API violation and a bug in the caller, crashing the entire system is an overly severe response. It brings down the whole machine for a localized filesystem inconsistency. Replace the J_ASSERT with a WARN_ON_ONCE to capture the offending caller's stack trace, and return an error (-EINVAL). This allows the journal to gracefully abort the transaction, protecting data integrity without causing a kernel panic. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20260304172016.23525-2-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-03-27jbd2: gracefully abort on checkpointing state corruptionsMilos Nikic1-2/+13
This patch targets two internal state machine invariants in checkpoint.c residing inside functions that natively return integer error codes. - In jbd2_cleanup_journal_tail(): A blocknr of 0 indicates a severely corrupted journal superblock. Replaced the J_ASSERT with a WARN_ON_ONCE and a graceful journal abort, returning -EFSCORRUPTED. - In jbd2_log_do_checkpoint(): Replaced the J_ASSERT_BH checking for an unexpected buffer_jwrite state. If the warning triggers, we explicitly drop the just-taken get_bh() reference and call __flush_batch() to safely clean up any previously queued buffers in the j_chkpt_bhs array, preventing a memory leak before returning -EFSCORRUPTED. Signed-off-by: Milos Nikic <nikic.milos@gmail.com> Reviewed-by: Andreas Dilger <adilger@dilger.ca> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Baokun Li <libaokun@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260311041548.159424-1-nikic.milos@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2026-03-06treewide: change inode->i_ino from unsigned long to u64Jeff Layton2-3/+3
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds1-2/+1
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds2-4/+4
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook2-7/+6
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2025-11-26jbd2: fix the inconsistency between checksum and data in memory for journal sbYe Bin1-0/+14
Copying the file system while it is mounted as read-only results in a mount failure: [~]# mkfs.ext4 -F /dev/sdc [~]# mount /dev/sdc -o ro /mnt/test [~]# dd if=/dev/sdc of=/dev/sda bs=1M [~]# mount /dev/sda /mnt/test1 [ 1094.849826] JBD2: journal checksum error [ 1094.850927] EXT4-fs (sda): Could not load journal inode mount: mount /dev/sda on /mnt/test1 failed: Bad message The process described above is just an abstracted way I came up with to reproduce the issue. In the actual scenario, the file system was mounted read-only and then copied while it was still mounted. It was found that the mount operation failed. The user intended to verify the data or use it as a backup, and this action was performed during a version upgrade. Above issue may happen as follows: ext4_fill_super set_journal_csum_feature_set(sb) if (ext4_has_metadata_csum(sb)) incompat = JBD2_FEATURE_INCOMPAT_CSUM_V3; if (test_opt(sb, JOURNAL_CHECKSUM) jbd2_journal_set_features(sbi->s_journal, compat, 0, incompat); lock_buffer(journal->j_sb_buffer); sb->s_feature_incompat |= cpu_to_be32(incompat); //The data in the journal sb was modified, but the checksum was not updated, so the data remaining in memory has a mismatch between the data and the checksum. unlock_buffer(journal->j_sb_buffer); In this case, the journal sb copied over is in a state where the checksum and data are inconsistent, so mounting fails. To solve the above issue, update the checksum in memory after modifying the journal sb. Fixes: 4fd5ea43bc11 ("jbd2: checksum journal superblock") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Reviewed-by: Darrick J. Wong <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251103010123.3753631-1-yebin@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-11-26jbd2: store more accurate errno in superblock when possibleWengang Wang3-9/+13
When jbd2_journal_abort() is called, the provided error code is stored in the journal superblock. Some existing calls hard-code -EIO even when the actual failure is not I/O related. This patch updates those calls to pass more accurate error codes, allowing the superblock to record the true cause of failure. This helps improve diagnostics and debugging clarity when analyzing journal aborts. Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Message-ID: <20251031210501.7337-1-wen.gang.wang@oracle.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-26jbd2: avoid bug_on in jbd2_journal_get_create_access() when file system ↵Ye Bin1-5/+14
corrupted There's issue when file system corrupted: ------------[ cut here ]------------ kernel BUG at fs/jbd2/transaction.c:1289! Oops: invalid opcode: 0000 [#1] SMP KASAN PTI CPU: 5 UID: 0 PID: 2031 Comm: mkdir Not tainted 6.18.0-rc1-next RIP: 0010:jbd2_journal_get_create_access+0x3b6/0x4d0 RSP: 0018:ffff888117aafa30 EFLAGS: 00010202 RAX: 0000000000000000 RBX: ffff88811a86b000 RCX: ffffffff89a63534 RDX: 1ffff110200ec602 RSI: 0000000000000004 RDI: ffff888100763010 RBP: ffff888100763000 R08: 0000000000000001 R09: ffff888100763028 R10: 0000000000000003 R11: 0000000000000000 R12: 0000000000000000 R13: ffff88812c432000 R14: ffff88812c608000 R15: ffff888120bfc000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f91d6970c99 CR3: 00000001159c4000 CR4: 00000000000006f0 Call Trace: <TASK> __ext4_journal_get_create_access+0x42/0x170 ext4_getblk+0x319/0x6f0 ext4_bread+0x11/0x100 ext4_append+0x1e6/0x4a0 ext4_init_new_dir+0x145/0x1d0 ext4_mkdir+0x326/0x920 vfs_mkdir+0x45c/0x740 do_mkdirat+0x234/0x2f0 __x64_sys_mkdir+0xd6/0x120 do_syscall_64+0x5f/0xfa0 entry_SYSCALL_64_after_hwframe+0x76/0x7e The above issue occurs with us in errors=continue mode when accompanied by storage failures. There have been many inconsistencies in the file system data. In the case of file system data inconsistency, for example, if the block bitmap of a referenced block is not set, it can lead to the situation where a block being committed is allocated and used again. As a result, the following condition will not be satisfied then trigger BUG_ON. Of course, it is entirely possible to construct a problematic image that can trigger this BUG_ON through specific operations. In fact, I have constructed such an image and easily reproduced this issue. Therefore, J_ASSERT() holds true only under ideal conditions, but it may not necessarily be satisfied in exceptional scenarios. Using J_ASSERT() directly in abnormal situations would cause the system to crash, which is clearly not what we want. So here we directly trigger a JBD abort instead of immediately invoking BUG_ON. Fixes: 470decc613ab ("[PATCH] jbd2: initial copy of files from jbd") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20251025072657.307851-1-yebin@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-11-13jbd2: use a weaker annotation in journal handlingByungchul Park1-1/+1
jbd2 journal handling code doesn't want jbd2_might_wait_for_commit() to be placed between start_this_handle() and stop_this_handle(). So it marks the region with rwsem_acquire_read() and rwsem_release(). However, the annotation is too strong for that purpose. We don't have to use more than try lock annotation for that. rwsem_acquire_read() implies: 1. might be a waiter on contention of the lock. 2. enter to the critical section of the lock. All we need in here is to act 2, not 1. So trylock version of annotation is sufficient for that purpose. Now that dept partially relies on lockdep annotaions, dept interpets rwsem_acquire_read() as a potential wait and might report a deadlock by the wait. Replace it with trylock version of annotation. Signed-off-by: Byungchul Park <byungchul@sk.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: stable@kernel.org Message-ID: <20251024073940.1063-1-byungchul@sk.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-11-13jbd2: use a per-journal lock_class_key for jbd2_trans_commit_keyTetsuo Handa1-2/+4
syzbot is reporting possibility of deadlock due to sharing lock_class_key for jbd2_handle across ext4 and ocfs2. But this is a false positive, for one disk partition can't have two filesystems at the same time. Reported-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=6e493c165d26d6fcbf72 Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Tested-by: syzbot+6e493c165d26d6fcbf72@syzkaller.appspotmail.com Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <987110fc-5470-457a-a218-d286a09dd82f@I-love.SAKURA.ne.jp> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-10-10jbd2: ensure that all ongoing I/O complete before freeing blocksZhang Yi1-4/+9
When releasing file system metadata blocks in jbd2_journal_forget(), if this buffer has not yet been checkpointed, it may have already been written back, currently be in the process of being written back, or has not yet written back. jbd2_journal_forget() calls jbd2_journal_try_remove_checkpoint() to check the buffer's status and add it to the current transaction if it has not been written back. This buffer can only be reallocated after the transaction is committed. jbd2_journal_try_remove_checkpoint() attempts to lock the buffer and check its dirty status while holding the buffer lock. If the buffer has already been written back, everything proceeds normally. However, there are two issues. First, the function returns immediately if the buffer is locked by the write-back process. It does not wait for the write-back to complete. Consequently, until the current transaction is committed and the block is reallocated, there is no guarantee that the I/O will complete. This means that ongoing I/O could write stale metadata to the newly allocated block, potentially corrupting data. Second, the function unlocks the buffer as soon as it detects that the buffer is still dirty. If a concurrent write-back occurs immediately after this unlocking and before clear_buffer_dirty() is called in jbd2_journal_forget(), data corruption can theoretically still occur. Although these two issues are unlikely to occur in practice since the undergoing metadata writeback I/O does not take this long to complete, it's better to explicitly ensure that all ongoing I/O operations are completed. Fixes: 597599268e3b ("jbd2: discard dirty data when forgetting an un-journalled buffer") Cc: stable@kernel.org Suggested-by: Jan Kara <jack@suse.cz> Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Message-ID: <20250916093337.3161016-2-yi.zhang@huaweicloud.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-09-25jbd2: increase IO priority of checkpointJulian Sun1-1/+1
In commit 6a3afb6ac6df ("jbd2: increase the journal IO's priority"), the priority of IOs initiated by jbd2 has been raised, exempting them from WBT throttling. Checkpoint is also a crucial operation of jbd2. While no serious issues have been observed so far, it should still be reasonable to exempt checkpoint from WBT throttling. Signed-off-by: Julian Sun <sunjunchao@bytedance.com> Reviewed-by: Zhang Yi <yi.zhang@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-08-13jbd2: prevent softlockup in jbd2_log_do_checkpoint()Baokun Li1-0/+1
Both jbd2_log_do_checkpoint() and jbd2_journal_shrink_checkpoint_list() periodically release j_list_lock after processing a batch of buffers to avoid long hold times on the j_list_lock. However, since both functions contend for j_list_lock, the combined time spent waiting and processing can be significant. jbd2_journal_shrink_checkpoint_list() explicitly calls cond_resched() when need_resched() is true to avoid softlockups during prolonged operations. But jbd2_log_do_checkpoint() only exits its loop when need_resched() is true, relying on potentially sleeping functions like __flush_batch() or wait_on_buffer() to trigger rescheduling. If those functions do not sleep, the kernel may hit a softlockup. watchdog: BUG: soft lockup - CPU#3 stuck for 156s! [kworker/u129:2:373] CPU: 3 PID: 373 Comm: kworker/u129:2 Kdump: loaded Not tainted 6.6.0+ #10 Hardware name: Huawei TaiShan 2280 /BC11SPCD, BIOS 1.27 06/13/2017 Workqueue: writeback wb_workfn (flush-7:2) pstate: 20000005 (nzCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--) pc : native_queued_spin_lock_slowpath+0x358/0x418 lr : jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] Call trace: native_queued_spin_lock_slowpath+0x358/0x418 jbd2_log_do_checkpoint+0x31c/0x438 [jbd2] __jbd2_log_wait_for_space+0xfc/0x2f8 [jbd2] add_transaction_credits+0x3bc/0x418 [jbd2] start_this_handle+0xf8/0x560 [jbd2] jbd2__journal_start+0x118/0x228 [jbd2] __ext4_journal_start_sb+0x110/0x188 [ext4] ext4_do_writepages+0x3dc/0x740 [ext4] ext4_writepages+0xa4/0x190 [ext4] do_writepages+0x94/0x228 __writeback_single_inode+0x48/0x318 writeback_sb_inodes+0x204/0x590 __writeback_inodes_wb+0x54/0xf8 wb_writeback+0x2cc/0x3d8 wb_do_writeback+0x2e0/0x2f8 wb_workfn+0x80/0x2a8 process_one_work+0x178/0x3e8 worker_thread+0x234/0x3b8 kthread+0xf0/0x108 ret_from_fork+0x10/0x20 So explicitly call cond_resched() in jbd2_log_do_checkpoint() to avoid softlockup. Cc: stable@kernel.org Signed-off-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250812063752.912130-1-libaokun@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar1-1/+1
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-05-20jbd2: remove journal_t argument from jbd2_superblock_csum()Eric Biggers1-3/+3
Since jbd2_superblock_csum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-5-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20jbd2: remove journal_t argument from jbd2_chksum()Eric Biggers3-12/+12
Since jbd2_chksum() no longer uses its journal_t argument, remove it. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Baokun Li <libaokun1@huawei.com> Link: https://patch.msgid.link/20250513053809.699974-4-ebiggers@kernel.org Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20ext4/jbd2: convert jbd2_journal_blocks_per_page() to support large folioZhang Yi1-3/+4
jbd2_journal_blocks_per_page() returns the number of blocks in a single page. Rename it to jbd2_journal_blocks_per_folio() and make it returns the number of blocks in the largest folio, preparing for the calculation of journal credits blocks when allocating blocks within a large folio in the writeback path. Signed-off-by: Zhang Yi <yi.zhang@huawei.com> Link: https://patch.msgid.link/20250512063319.3539411-5-yi.zhang@huaweicloud.com Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2025-05-20jbd2: fix data-race and null-ptr-deref in jbd2_journal_dirty_metadata()Jeongjun Park1-2/+3
Since handle->h_transaction may be a NULL pointer, so we should change it to call is_handle_aborted(handle) first before dereferencing it. And the following data-race was reported in my fuzzer: ================================================================== BUG: KCSAN: data-race in jbd2_journal_dirty_metadata / jbd2_journal_dirty_metadata write to 0xffff888011024104 of 4 bytes by task 10881 on cpu 1: jbd2_journal_dirty_metadata+0x2a5/0x770 fs/jbd2/transaction.c:1556 __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358 ext4_do_update_inode fs/ext4/inode.c:5220 [inline] ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869 __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074 ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103 .... read to 0xffff888011024104 of 4 bytes by task 10880 on cpu 0: jbd2_journal_dirty_metadata+0xf2/0x770 fs/jbd2/transaction.c:1512 __ext4_handle_dirty_metadata+0xe7/0x4b0 fs/ext4/ext4_jbd2.c:358 ext4_do_update_inode fs/ext4/inode.c:5220 [inline] ext4_mark_iloc_dirty+0x32c/0xd50 fs/ext4/inode.c:5869 __ext4_mark_inode_dirty+0xe1/0x450 fs/ext4/inode.c:6074 ext4_dirty_inode+0x98/0xc0 fs/ext4/inode.c:6103 .... value changed: 0x00000000 -> 0x00000001 ================================================================== This issue is caused by missing data-race annotation for jh->b_modified. Therefore, the missing annotation needs to be added. Reported-by: syzbot+de24c3fe3c4091051710@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=de24c3fe3c4091051710 Fixes: 6e06ae88edae ("jbd2: speedup jbd2_journal_dirty_metadata()") Signed-off-by: Jeongjun Park <aha310510@gmail.com> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250514130855.99010-1-aha310510@gmail.com Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org
2025-05-08ext4: rework fast commit commit pathHarshad Shirwadkar1-2/+0
This patch reworks fast commit's commit path to remove locking the journal for the entire duration of a fast commit. Instead, we only lock the journal while marking all the eligible inodes as "committing". This allows handles to make progress in parallel with the fast co