aboutsummaryrefslogtreecommitdiff
path: root/fs/ocfs2
AgeCommit message (Collapse)AuthorFilesLines
2026-04-21Merge tag 'pull-dcache-busy-wait' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache busy loop updates from Al Viro: "Fix livelocks in shrink_dcache_tree() If shrink_dcache_tree() finds a dentry in the middle of being killed by another thread, it has to wait until the victim finishes dying, gets detached from the tree and ceases to pin its parent. The way we used to deal with that amounted to busy-wait; unfortunately, it's not just inefficient but can lead to reliably reproducible hard livelocks. Solved by having shrink_dentry_tree() attach a completion to such dentry, with dentry_unlist() calling complete() on all objects attached to it. With a bit of care it can be done without growing struct dentry or adding overhead in normal case" * tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of busy-waiting in shrink_dcache_tree() dcache.c: more idiomatic "positives are not allowed" sanity checks struct dentry: make ->d_u anonymous for_each_alias(): helper macro for iterating through dentries of given inode
2026-04-17Merge tag 'ext4_for_linux-7.0-rc1' of ↵Linus Torvalds1-2/+7
git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 Pull ext4 updates from Ted Ts'o: - Refactor code paths involved with partial block zero-out in prearation for converting ext4 to use iomap for buffered writes - Remove use of d_alloc() from ext4 in preparation for the deprecation of this interface - Replace some J_ASSERTS with a journal abort so we can avoid a kernel panic for a localized file system error - Simplify various code paths in mballoc, move_extent, and fast commit - Fix rare deadlock in jbd2_journal_cancel_revoke() that can be triggered by generic/013 when blocksize < pagesize - Fix memory leak when releasing an extended attribute when its value is stored in an ea_inode - Fix various potential kunit test bugs in fs/ext4/extents.c - Fix potential out-of-bounds access in check_xattr() with a corrupted file system - Make the jbd2_inode dirty range tracking safe for lockless reads - Avoid a WARN_ON when writeback files due to a corrupted file system; we already print an ext4 warning indicatign that data will be lost, so the WARN_ON is not necessary and doesn't add any new information * tag 'ext4_for_linux-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4: (37 commits) jbd2: fix deadlock in jbd2_journal_cancel_revoke() ext4: fix missing brelse() in ext4_xattr_inode_dec_ref_all() ext4: fix possible null-ptr-deref in mbt_kunit_exit() ext4: fix possible null-ptr-deref in extents_kunit_exit() ext4: fix the error handling process in extents_kunit_init). ext4: call deactivate_super() in extents_kunit_exit() ext4: fix miss unlock 'sb->s_umount' in extents_kunit_init() ext4: fix bounds check in check_xattrs() to prevent out-of-bounds access ext4: zero post-EOF partial block before appending write ext4: move pagecache_isize_extended() out of active handle ext4: remove ctime/mtime update from ext4_alloc_file_blocks() ext4: unify SYNC mode checks in fallocate paths ext4: ensure zeroed partial blocks are persisted in SYNC mode ext4: move zero partial block range functions out of active handle ext4: pass allocate range as loff_t to ext4_alloc_file_blocks() ext4: remove handle parameters from zero partial block functions ext4: move ordered data handling out of ext4_block_do_zero_range() ext4: rename ext4_block_zero_page_range() to ext4_block_zero_range() ext4: factor out journalled block zeroing range ext4: rename and extend ext4_block_truncate_page() ...
2026-04-16Merge tag 'mm-nonmm-stable-2026-04-15-04-20' of ↵Linus Torvalds15-117/+244
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "pid: make sub-init creation retryable" (Oleg Nesterov) Make creation of init in a new namespace more robust by clearing away some historical cruft which is no longer needed. Also some documentation fixups - "selftests/fchmodat2: Error handling and general" (Mark Brown) Fix and a cleanup for the fchmodat2() syscall selftest - "lib: polynomial: Move to math/ and clean up" (Andy Shevchenko) - "hung_task: Provide runtime reset interface for hung task detector" (Aaron Tomlin) Give administrators the ability to zero out /proc/sys/kernel/hung_task_detect_count - "tools/getdelays: use the static UAPI headers from tools/include/uapi" (Thomas Weißschuh) Teach getdelays to use the in-kernel UAPI headers rather than the system-provided ones - "watchdog/hardlockup: Improvements to hardlockup" (Mayank Rungta) Several cleanups and fixups to the hardlockup detector code and its documentation - "lib/bch: fix undefined behavior from signed left-shifts" (Josh Law) A couple of small/theoretical fixes in the bch code - "ocfs2/dlm: fix two bugs in dlm_match_regions()" (Junrui Luo) - "cleanup the RAID5 XOR library" (Christoph Hellwig) A quite far-reaching cleanup to this code. I can't do better than to quote Christoph: "The XOR library used for the RAID5 parity is a bit of a mess right now. The main file sits in crypto/ despite not being cryptography and not using the crypto API, with the generic implementations sitting in include/asm-generic and the arch implementations sitting in an asm/ header in theory. The latter doesn't work for many cases, so architectures often build the code directly into the core kernel, or create another module for the architecture code. Change this to a single module in lib/ that also contains the architecture optimizations, similar to the library work Eric Biggers has done for the CRC and crypto libraries later. After that it changes to better calling conventions that allow for smarter architecture implementations (although none is contained here yet), and uses static_call to avoid indirection function call overhead" - "lib/list_sort: Clean up list_sort() scheduling workarounds" (Kuan-Wei Chiu) Clean up this library code by removing a hacky thing which was added for UBIFS, which UBIFS doesn't actually need - "Fix bugs in extract_iter_to_sg()" (Christian Ehrhardt) Fix a few bugs in the scatterlist code, add in-kernel tests for the now-fixed bugs and fix a leak in the test itself - "kdump: Enable LUKS-encrypted dump target support in ARM64 and PowerPC" (Coiby Xu) Enable support of the LUKS-encrypted device dump target on arm64 and powerpc - "ocfs2: consolidate extent list validation into block read callbacks" (Joseph Qi) Cleanup, simplify, and make more robust ocfs2's validation of extent list fields (Kernel test robot loves mounting corrupted fs images!) * tag 'mm-nonmm-stable-2026-04-15-04-20' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (127 commits) ocfs2: validate group add input before caching ocfs2: validate bg_bits during freefrag scan ocfs2: fix listxattr handling when the buffer is full doc: watchdog: fix typos etc update Sean's email address ocfs2: use get_random_u32() where appropriate ocfs2: split transactions in dio completion to avoid credit exhaustion ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path() ocfs2: validate extent block list fields during block read ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec() ocfs2: validate dx_root extent list fields during block read ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRY ocfs2: handle invalid dinode in ocfs2_group_extend .get_maintainer.ignore: add Askar ocfs2: validate bg_list extent bounds in discontig groups checkpatch: exclude forward declarations of const structs tools/accounting: handle truncated taskstats netlink messages taskstats: set version in TGID exit notifications ocfs2/heartbeat: fix slot mapping rollback leaks on error paths arm64,ppc64le/kdump: pass dm-crypt keys to kdump kernel ...
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-15ocfs2: validate group add input before cachingZhengYuan Huang1-5/+7
[BUG] OCFS2_IOC_GROUP_ADD can trigger a BUG_ON in ocfs2_set_new_buffer_uptodate(): kernel BUG at fs/ocfs2/uptodate.c:509! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:ocfs2_set_new_buffer_uptodate+0x194/0x1e0 fs/ocfs2/uptodate.c:509 Code: ffffe88f 42b9fe4c 89e64889 dfe8b4df Call Trace: ocfs2_group_add+0x3f1/0x1510 fs/ocfs2/resize.c:507 ocfs2_ioctl+0x309/0x6e0 fs/ocfs2/ioctl.c:887 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 x64_sys_call+0x1144/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7bbfb55a966d [CAUSE] ocfs2_group_add() calls ocfs2_set_new_buffer_uptodate() on a user-controlled group block before ocfs2_verify_group_and_input() validates that block number. That helper is only valid for newly allocated metadata and asserts that the block is not already present in the chosen metadata cache. The code also uses INODE_CACHE(inode) even though the group descriptor belongs to main_bm_inode and later journal accesses use that cache context instead. [FIX] Validate the on-disk group descriptor before caching it, then add it to the metadata cache tracked by INODE_CACHE(main_bm_inode). Keep the validation failure path separate from the later cleanup path so we only remove the buffer from that cache after it has actually been inserted. This keeps the group buffer lifetime consistent across validation, journaling, and cleanup. Link: https://lkml.kernel.org/r/20260410020209.3786348-1-gality369@gmail.com Fixes: 7909f2bf8353 ("[PATCH 2/2] ocfs2: Implement group add for online resize") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: validate bg_bits during freefrag scanZhengYuan Huang1-1/+17
[BUG] A crafted filesystem can trigger an out-of-bounds bitmap walk when OCFS2_IOC_INFO is issued with OCFS2_INFO_FL_NON_COHERENT. BUG: KASAN: use-after-free in instrument_atomic_read include/linux/instrumented.h:68 [inline] BUG: KASAN: use-after-free in _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] BUG: KASAN: use-after-free in test_bit_le include/asm-generic/bitops/le.h:21 [inline] BUG: KASAN: use-after-free in ocfs2_info_freefrag_scan_chain fs/ocfs2/ioctl.c:495 [inline] BUG: KASAN: use-after-free in ocfs2_info_freefrag_scan_bitmap fs/ocfs2/ioctl.c:588 [inline] BUG: KASAN: use-after-free in ocfs2_info_handle_freefrag fs/ocfs2/ioctl.c:662 [inline] BUG: KASAN: use-after-free in ocfs2_info_handle_request+0x1c66/0x3370 fs/ocfs2/ioctl.c:754 Read of size 8 at addr ffff888031bce000 by task syz.0.636/1435 Call Trace: __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0xbe/0x130 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:378 [inline] print_report+0xd1/0x650 mm/kasan/report.c:482 kasan_report+0xfb/0x140 mm/kasan/report.c:595 check_region_inline mm/kasan/generic.c:186 [inline] kasan_check_range+0x11c/0x200 mm/kasan/generic.c:200 __kasan_check_read+0x11/0x20 mm/kasan/shadow.c:31 instrument_atomic_read include/linux/instrumented.h:68 [inline] _test_bit include/asm-generic/bitops/instrumented-non-atomic.h:141 [inline] test_bit_le include/asm-generic/bitops/le.h:21 [inline] ocfs2_info_freefrag_scan_chain fs/ocfs2/ioctl.c:495 [inline] ocfs2_info_freefrag_scan_bitmap fs/ocfs2/ioctl.c:588 [inline] ocfs2_info_handle_freefrag fs/ocfs2/ioctl.c:662 [inline] ocfs2_info_handle_request+0x1c66/0x3370 fs/ocfs2/ioctl.c:754 ocfs2_info_handle+0x18d/0x2a0 fs/ocfs2/ioctl.c:828 ocfs2_ioctl+0x632/0x6e0 fs/ocfs2/ioctl.c:913 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 ... [CAUSE] ocfs2_info_freefrag_scan_chain() uses on-disk bg_bits directly as the bitmap scan limit. The coherent path reads group descriptors through ocfs2_read_group_descriptor(), which validates the descriptor before use. The non-coherent path uses ocfs2_read_blocks_sync() instead and skips that validation, so an impossible bg_bits value can drive the bitmap walk past the end of the block. [FIX] Compute the bitmap capacity from the filesystem format with ocfs2_group_bitmap_size(), report descriptors whose bg_bits exceeds that limit, and clamp the scan to the computed capacity. This keeps the freefrag report going while avoiding reads beyond the buffer. Link: https://lkml.kernel.org/r/20260410034220.3825769-1-gality369@gmail.com Fixes: d24a10b9f8ed ("Ocfs2: Add a new code 'OCFS2_INFO_FREEFRAG' for o2info ioctl.") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: fix listxattr handling when the buffer is fullZhengYuan Huang1-2/+2
[BUG] If an OCFS2 inode has both inline and block-based xattrs, listxattr() can return a size larger than the caller's buffer when the inline names consume that buffer exactly. kernel BUG at mm/usercopy.c:102! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:usercopy_abort+0xb7/0xd0 mm/usercopy.c:102 Call Trace: __check_heap_object+0xe3/0x120 mm/slub.c:8243 check_heap_object mm/usercopy.c:196 [inline] __check_object_size mm/usercopy.c:250 [inline] __check_object_size+0x5c5/0x780 mm/usercopy.c:215 check_object_size include/linux/ucopysize.h:22 [inline] check_copy_size include/linux/ucopysize.h:59 [inline] copy_to_user include/linux/uaccess.h:219 [inline] listxattr+0xb0/0x170 fs/xattr.c:926 filename_listxattr fs/xattr.c:958 [inline] path_listxattrat+0x137/0x320 fs/xattr.c:988 __do_sys_listxattr fs/xattr.c:1001 [inline] __se_sys_listxattr fs/xattr.c:998 [inline] __x64_sys_listxattr+0x7f/0xd0 fs/xattr.c:998 ... [CAUSE] Commit 936b8834366e ("ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().") replaced the old per-handler list accounting with ocfs2_xattr_list_entry(), but it kept using size == 0 to detect probe mode. That assumption stops being true once ocfs2_listxattr() finishes the inline-xattr pass. If the inline names fill the caller buffer exactly, the block-xattr pass runs with a non-NULL buffer and a remaining size of zero. ocfs2_xattr_list_entry() then skips the bounds check, keeps counting block names, and returns a positive size larger than the supplied buffer. [FIX] Detect probe mode by testing whether the destination buffer pointer is NULL instead of whether the remaining size is zero. That restores the pre-refactor behavior and matches the OCFS2 getxattr helpers. Once the remaining buffer reaches zero while more names are left, the block-xattr pass now returns -ERANGE instead of reporting a size larger than the allocated list buffer. Link: https://lkml.kernel.org/r/20260410040339.3837162-1-gality369@gmail.com Fixes: 936b8834366e ("ocfs2: Refactor xattr list and remove ocfs2_xattr_handler().") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: use get_random_u32() where appropriateDavid Carlier1-1/+1
Use the typed random integer helpers instead of get_random_bytes() when filling a single integer variable. The helpers return the value directly, require no pointer or size argument, and better express intent. Link: https://lkml.kernel.org/r/20260405154720.4732-1-devnexen@gmail.com Signed-off-by: David Carlier <devnexen@gmail.com Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: split transactions in dio completion to avoid credit exhaustionHeming Zhao1-29/+45
During ocfs2 dio operations, JBD2 may report warnings via following call trace: ocfs2_dio_end_io_write ocfs2_mark_extent_written ocfs2_change_extent_flag ocfs2_split_extent ocfs2_try_to_merge_extent ocfs2_extend_rotate_transaction ocfs2_extend_trans jbd2__journal_restart start_this_handle output: JBD2: kworker/6:2 wants too many credits credits:5450 rsv_credits:0 max:5449 To prevent exceeding the credits limit, modify ocfs2_dio_end_io_write() to handle extents in a batch of transaction. Additionally, relocate ocfs2_del_inode_from_orphan(). The orphan inode should only be removed from the orphan list after the extent tree update is complete. This ensures that if a crash occurs in the middle of extent tree updates, we won't leave stale blocks beyond EOF. This patch also changes the logic for updating the inode size and removing orphan, making it similar to ext4_dio_write_end_io(). Both operations are performed only when everything looks good. Finally, thanks to Jans and Joseph for providing the bug fix prototype and suggestions. Link: https://lkml.kernel.org/r/20260402134328.27334-2-heming.zhao@suse.com Signed-off-by: Heming Zhao <heming.zhao@suse.com> Suggested-by: Jan Kara <jack@suse.cz> Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: remove redundant l_next_free_rec check in __ocfs2_find_path()Joseph Qi1-12/+0
The l_next_free_rec > l_count check after ocfs2_read_extent_block() in __ocfs2_find_path() is now redundant, as ocfs2_validate_extent_block() already performs this validation at block read time. Remove the duplicate check to avoid maintaining the same validation in two places. Link: https://lkml.kernel.org/r/20260403090803.3860971-5-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: validate extent block list fields during block readJoseph Qi1-1/+22
Add extent list validation to ocfs2_validate_extent_block() so that corrupted on-disk fields are caught early at block read time rather than during extent tree traversal. Two checks are added: - l_count must equal the expected value from ocfs2_extent_recs_per_eb(), catching blocks with a corrupted record count before any array iteration. - l_next_free_rec must not exceed l_count, preventing out-of-bounds access when iterating over extent records. Link: https://lkml.kernel.org/r/20260403090803.3860971-4-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: remove empty extent list check in ocfs2_dx_dir_lookup_rec()Joseph Qi1-12/+3
The full extent list check is introduced by commit 44acc46d182f, which is to avoid NULL pointer dereference if a dirent is not found. Reworking the error message to not reference rec. Instead, report major_hash being looked up and l_next_free_rec, which naturally covers both failure cases (empty extent list and no matching record) without needing a separate l_next_free_rec == 0 guard. Link: https://lkml.kernel.org/r/20260403090803.3860971-3-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Joel Becker <jlbec@evilplan.org> Cc: Jun Piao <piaojun@huawei.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Mark Fasheh <mark@fasheh.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: validate dx_root extent list fields during block readJoseph Qi1-9/+25
Patch series "ocfs2: consolidate extent list validation into block read callbacks". ocfs2 validates extent list fields (l_count, l_next_free_rec) at various points during extent tree traversal. This is fragile because each caller must remember to check for corrupted on-disk data before using it. This series moves those checks into the block read validation callbacks (ocfs2_validate_dx_root and ocfs2_validate_extent_block), so corrupted fields are caught early at block read time. Redundant post-read checks are then removed. This patch (of 4): Move the extent list l_count validation from ocfs2_dx_dir_lookup_rec() into ocfs2_validate_dx_root(), so that corrupted on-disk fields are caught early at block read time rather than during directory lookups. Additionally, add a l_next_free_rec <= l_count check to prevent out-of-bounds access when iterating over extent records. Both checks are skipped for inline dx roots (OCFS2_DX_FLAG_INLINE), which use dr_entries instead of dr_list. Link: https://lkml.kernel.org/r/20260403090803.3860971-1-joseph.qi@linux.alibaba.com Link: https://lkml.kernel.org/r/20260403090803.3860971-2-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Heming Zhao <heming.zhao@suse.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: fix use-after-free in ocfs2_fault() when VM_FAULT_RETRYTejas Bharambe2-10/+7
filemap_fault() may drop the mmap_lock before returning VM_FAULT_RETRY, as documented in mm/filemap.c: "If our return value has VM_FAULT_RETRY set, it's because the mmap_lock may be dropped before doing I/O or by lock_folio_maybe_drop_mmap()." When this happens, a concurrent munmap() can call remove_vma() and free the vm_area_struct via RCU. The saved 'vma' pointer in ocfs2_fault() then becomes a dangling pointer, and the subsequent trace_ocfs2_fault() call dereferences it -- a use-after-free. Fix this by saving ip_blkno as a plain integer before calling filemap_fault(), and removing vma from the trace event. Since ip_blkno is copied by value before the lock can be dropped, it remains valid regardless of what happens to the vma or inode afterward. Link: https://lkml.kernel.org/r/20260410083816.34951-1-tejas.bharambe@outlook.com Fixes: 614a9e849ca6 ("ocfs2: Remove FILE_IO from masklog.") Signed-off-by: Tejas Bharambe <tejas.bharambe@outlook.com> Reported-by: syzbot+a49010a0e8fcdeea075f@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=a49010a0e8fcdeea075f Suggested-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: handle invalid dinode in ocfs2_group_extendZhengYuan Huang1-3/+7
[BUG] kernel BUG at fs/ocfs2/resize.c:308! Oops: invalid opcode: 0000 [#1] SMP KASAN NOPTI RIP: 0010:ocfs2_group_extend+0x10aa/0x1ae0 fs/ocfs2/resize.c:308 Code: 8b8520ff ffff83f8 860f8580 030000e8 5cc3c1fe Call Trace: ... ocfs2_ioctl+0x175/0x6e0 fs/ocfs2/ioctl.c:869 vfs_ioctl fs/ioctl.c:51 [inline] __do_sys_ioctl fs/ioctl.c:597 [inline] __se_sys_ioctl fs/ioctl.c:583 [inline] __x64_sys_ioctl+0x197/0x1e0 fs/ioctl.c:583 x64_sys_call+0x1144/0x26a0 arch/x86/include/generated/asm/syscalls_64.h:17 do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline] do_syscall_64+0x93/0xf80 arch/x86/entry/syscall_64.c:94 entry_SYSCALL_64_after_hwframe+0x76/0x7e ... [CAUSE] ocfs2_group_extend() assumes that the global bitmap inode block returned from ocfs2_inode_lock() has already been validated and BUG_ONs when the signature is not a dinode. That assumption is too strong for crafted filesystems because the JBD2-managed buffer path can bypass structural validation and return an invalid dinode to the resize ioctl. [FIX] Validate the dinode explicitly in ocfs2_group_extend(). If the global bitmap buffer does not contain a valid dinode, report filesystem corruption with ocfs2_error() and fail the resize operation instead of crashing the kernel. Link: https://lkml.kernel.org/r/20260401092303.3709187-1-gality369@gmail.com Fixes: 10995aa2451a ("ocfs2: Morph the haphazard OCFS2_IS_VALID_DINODE() checks.") Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2: validate bg_list extent bounds in discontig groupsZhengYuan Huang1-0/+25
[BUG] Running ocfs2 on a corrupted image with a discontiguous block group whose bg_list.l_next_free_rec is set to an excessively large value triggers a KASAN use-after-free crash: BUG: KASAN: use-after-free in ocfs2_bg_discontig_fix_by_rec fs/ocfs2/suballoc.c:1678 [inline] BUG: KASAN: use-after-free in ocfs2_bg_discontig_fix_result+0x4a4/0x560 fs/ocfs2/suballoc.c:1715 Read of size 4 at addr ffff88801a85f000 by task syz.0.115/552 Call Trace: ... __asan_report_load4_noabort+0x14/0x30 mm/kasan/report_generic.c:380 ocfs2_bg_discontig_fix_by_rec fs/ocfs2/suballoc.c:1678 [inline] ocfs2_bg_discontig_fix_result+0x4a4/0x560 fs/ocfs2/suballoc.c:1715 ocfs2_search_one_group fs/ocfs2/suballoc.c:1752 [inline] ocfs2_claim_suballoc_bits+0x13c3/0x1cd0 fs/ocfs2/suballoc.c:1984 ocfs2_claim_new_inode+0x2e7/0x8a0 fs/ocfs2/suballoc.c:2292 ocfs2_mknod_locked.constprop.0+0x121/0x2a0 fs/ocfs2/namei.c:637 ocfs2_mknod+0xc71/0x2400 fs/ocfs2/namei.c:384 ocfs2_create+0x158/0x390 fs/ocfs2/namei.c:676 lookup_open.isra.0+0x10a1/0x1460 fs/namei.c:3796 open_last_lookups fs/namei.c:3895 [inline] path_openat+0x11fe/0x2ce0 fs/namei.c:4131 do_filp_open+0x1f6/0x430 fs/namei.c:4161 do_sys_openat2+0x117/0x1c0 fs/open.c:1437 do_sys_open fs/open.c:1452 [inline] __do_sys_openat fs/open.c:1468 [inline] ... [CAUSE] ocfs2_bg_discontig_fix_result() iterates over bg->bg_list.l_recs[] using l_next_free_rec as the upper bound without any sanity check: for (i = 0; i < le16_to_cpu(bg->bg_list.l_next_free_rec); i++) { rec = &bg->bg_list.l_recs[i]; l_next_free_rec is read directly from the on-disk group descriptor and is trusted blindly. On a 4 KiB block device, bg_list.l_recs[] can hold at most 235 entries (ocfs2_extent_recs_per_gd(sb)). A corrupted or crafted filesystem image can set l_next_free_rec to an arbitrarily large value, causing the loop to index past the end of the group descriptor buffer_head data page and into an adjacent freed page. [FIX] Validate discontiguous bg_list.l_count against ocfs2_extent_recs_per_gd(sb), then reject l_next_free_rec values that exceed l_count. This keeps the on-disk extent list self-consistent and matches how the rest of ocfs2 uses l_count as the extent-list bound. Link: https://lkml.kernel.org/r/20260401021622.3560952-1-gality369@gmail.com Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-15ocfs2/heartbeat: fix slot mapping rollback leaks on error pathsYufan Chen1-27/+56
o2hb_map_slot_data() allocates hr_tmp_block, hr_slots, hr_slot_data, and pages in stages. If a later allocation fails, the current code returns without unwinding the earlier allocations. o2hb_region_dev_store() also leaves slot mapping resources behind when setup aborts, and it keeps hr_aborted_start/hr_node_deleted set across retries. That leaves stale state behind after a failed start. Factor the slot cleanup into o2hb_unmap_slot_data(), use it from both o2hb_map_slot_data() and o2hb_region_release(), and call it from the dev_store() rollback after stopping a started heartbeat thread. While freeing pages, clear each hr_slot_data entry as it is released, and reset the start state before each new setup attempt. This closes the slot mapping leak on allocation/setup failure paths and keeps failed setup attempts retryable. Link: https://lkml.kernel.org/r/20260330153428.19586-1-yufan.chen@linux.dev Signed-off-by: Yufan Chen <ericterminal@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-13Merge tag 'vfs-7.1-rc1.bh.metadata' of ↵Linus Torvalds2-4/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs buffer_head updates from Christian Brauner: "This cleans up the mess that has accumulated over the years in metadata buffer_head tracking for inodes. It moves the tracking into dedicated structure in filesystem-private part of the inode (so that we don't use private_list, private_data, and private_lock in struct address_space), and also moves couple other users of private_data and private_list so these are removed from struct address_space saving 3 longs in struct inode for 99% of inodes" * tag 'vfs-7.1-rc1.bh.metadata' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) fs: Drop i_private_list from address_space fs: Drop mapping_metadata_bhs from address space ext4: Track metadata bhs in fs-private inode part minix: Track metadata bhs in fs-private inode part udf: Track metadata bhs in fs-private inode part fat: Track metadata bhs in fs-private inode part bfs: Track metadata bhs in fs-private inode part affs: Track metadata bhs in fs-private inode part ext2: Track metadata bhs in fs-private inode part fs: Provide functions for handling mapping_metadata_bhs directly fs: Switch inode_has_buffers() to take mapping_metadata_bhs fs: Make bhs point to mapping_metadata_bhs fs: Move metadata bhs tracking to a separate struct fs: Fold fsync_buffers_list() into sync_mapping_buffers() fs: Drop osync_buffers_list() kvm: Use private inode list instead of i_private_list fs: Remove i_private_data aio: Stop using i_private_data and i_private_lock hugetlbfs: Stop using i_private_data fs: Stop using i_private_data for metadata bh tracking ...
2026-04-13Merge tag 'vfs-7.1-rc1.kino' of ↵Linus Torvalds9-27/+27
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
2026-04-09ocfs2: use jbd2 jinode dirty range accessorLi Chen1-2/+7
ocfs2 journal commit callback reads jbd2_inode dirty range fields without holding journal->j_list_lock. Use jbd2_jinode_get_dirty_range() to get the range in bytes. Suggested-by: Jan Kara <jack@suse.cz> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Li Chen <me@linux.beauty> Link: https://patch.msgid.link/20260306085643.465275-4-me@linux.beauty Signed-off-by: Theodore Ts'o <tytso@mit.edu>
2026-04-06ocfs2: fix out-of-bounds write in ocfs2_write_end_inlineJoseph Qi1-0/+10
KASAN reports a use-after-free write of 4086 bytes in ocfs2_write_end_inline, called from ocfs2_write_end_nolock during a copy_file_range splice fallback on a corrupted ocfs2 filesystem mounted on a loop device. The actual bug is an out-of-bounds write past the inode block buffer, not a true use-after-free. The write overflows into an adjacent freed page, which KASAN reports as UAF. The root cause is that ocfs2_try_to_write_inline_data trusts the on-disk id_count field to determine whether a write fits in inline data. On a corrupted filesystem, id_count can exceed the physical maximum inline data capacity, causing writes to overflow the inode block buffer. Call trace (crash path): vfs_copy_file_range (fs/read_write.c:1634) do_splice_direct splice_direct_to_actor iter_file_splice_write ocfs2_file_write_iter generic_perform_write ocfs2_write_end ocfs2_write_end_nolock (fs/ocfs2/aops.c:1949) ocfs2_write_end_inline (fs/ocfs2/aops.c:1915) memcpy_from_folio <-- KASAN: write OOB So add id_count upper bound check in ocfs2_validate_inode_block() to alongside the existing i_size check to fix it. Link: https://lkml.kernel.org/r/20260403063830.3662739-1-joseph.qi@linux.alibaba.com Signed-off-by: Joseph Qi <joseph.qi@linux.alibaba.com> Reported-by: syzbot+62c1793956716ea8b28a@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=62c1793956716ea8b28a Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05fs: remove unncessary pagevec.h includesTal Zussman1-1/+0
Remove unused pagevec.h includes from .c files. These were found with the following command: grep -rl '#include.*pagevec\.h' --include='*.c' | while read f; do grep -qE 'PAGEVEC_SIZE|folio_batch' "$f" || echo "$f" done There are probably more removal candidates in .h files, but those are more complex to analyze. Link: https://lkml.kernel.org/r/20260225-pagevec_cleanup-v2-2-716868cc2d11@columbia.edu Signed-off-by: Tal Zussman <tz2294@columbia.edu> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Zi Yan <ziy@nvidia.com> Acked-by: Chris Li <chrisl@kernel.org> Reviewed-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Christian Brauner <brauner@kernel.org> Cc: David Hildenbrand (Arm) <david@kernel.org> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-02for_each_alias(): helper macro for iterating through dentries of given inodeAl Viro1-1/+1
Most of the places using d_alias are loops iterating through all aliases for given inode; introduce a helper macro (for_each_alias(dentry, inode)) and convert open-coded instances of such loop to it. They are easier to read that way and it reduces the noise on the next steps. You _must_ hold inode->i_lock over that thing. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-03-27ocfs2/dlm: fix off-by-one in dlm_match_regions() region comparisonJunrui Luo1-1/+1
The local-vs-remote region comparison loop uses '<=' instead of '<', causing it to read one entry past the valid range of qr_regions. The other loops in the same function correctly use '<'. Fix the loop condition to use '<' for consistency and correctness. Link: https://lkml.kernel.org/r/SYBPR01MB78813DA26B50EC5E01F00566AF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Fixes: ea2034416b54 ("ocfs2/dlm: Add message DLM_QUERY_REGION") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27ocfs2/dlm: validate qr_numregions in dlm_match_regions()Junrui Luo1-0/+8
Patch series "ocfs2/dlm: fix two bugs in dlm_match_regions()". In dlm_match_regions(), the qr_numregions field from a DLM_QUERY_REGION network message is used to drive loops over the qr_regions buffer without sufficient validation. This series fixes two issues: - Patch 1 adds a bounds check to reject messages where qr_numregions exceeds O2NM_MAX_REGIONS. The o2net layer only validates message byte length; it does not constrain field values, so a crafted message can set qr_numregions up to 255 and trigger out-of-bounds reads past the 1024-byte qr_regions buffer. - Patch 2 fixes an off-by-one in the local-vs-remote comparison loop, which uses '<=' instead of '<', reading one entry past the valid range even when qr_numregions is within bounds. This patch (of 2): The qr_numregions field from a DLM_QUERY_REGION network message is used directly as loop bounds in dlm_match_regions() without checking against O2NM_MAX_REGIONS. Since qr_regions is sized for at most O2NM_MAX_REGIONS (32) entries, a crafted message with qr_numregions > 32 causes out-of-bounds reads past the qr_regions buffer. Add a bounds check for qr_numregions before entering the loops. Link: https://lkml.kernel.org/r/SYBPR01MB7881A334D02ACEE5E0645801AF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Link: https://lkml.kernel.org/r/SYBPR01MB788166F524AD04E262E174BEAF7BA@SYBPR01MB7881.ausprd01.prod.outlook.com Fixes: ea2034416b54 ("ocfs2/dlm: Add message DLM_QUERY_REGION") Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reported-by: Yuhao Jiang <danisjiang@gmail.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27ocfs2: remove redundant error code assignmentAlexey Velichayshiy1-1/+0
Remove the error assignment for variable 'ret' during correct code execution. In subsequent execution, variable 'ret' is overwritten. Found by Linux Verification Center (linuxtesting.org) with SVACE. Link: https://lkml.kernel.org/r/20260307234809.88421-1-a.velichayshiy@ispras.ru Signed-off-by: Alexey Velichayshiy <a.velichayshiy@ispras.ru> Acked-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Jun Piao <piaojun@huawei.com> Cc: Heming Zhao <heming.zhao@suse.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-03-27ocfs2: fix possible deadlock between unlink and dio_end_io_writeJoseph Qi1-2/+1
ocfs2_unlink takes orphan dir inode_lock first and then ip_alloc_sem, while in ocfs2_dio_end_io_write, it acquires these locks in reverse order. This creates an ABBA lock ordering violat