aboutsummaryrefslogtreecommitdiff
path: root/fs/udf
AgeCommit message (Collapse)AuthorFilesLines
5 daysMerge tag 'fs_for_v7.2-rc1' of ↵Linus Torvalds2-3/+18
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull udf, isofs, ext2, and quota updates from Jan Kara: - Assorted udf & isofs fixes for maliciously formatted devices - Cleanups to use kmalloc() instead of __get_free_page() - Removal of deprecated DAX code from ext2 * tag 'fs_for_v7.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: validate VAT inode size for old VAT format udf: validate VAT header length against the VAT inode size udf: validate sparing table length as an entry count, not a byte count isofs: bound Rock Ridge symlink components to the SL record ext2: fix ignored return value of generic_write_sync() ext2: Remove deprecated DAX support isofs: replace __get_free_page() with kmalloc() quota: allocate dquot_hash with kmalloc() udf: validate free block extents against the partition length
6 daysMerge tag 'pull-fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/viro/vfsLinus Torvalds1-1/+1
Pull udf fix from Al Viro: "I just noticed that a udf fix had been sitting in #fixes since February; still applicable, Jan's Acked-by applied. Very belated pull request" * tag 'pull-fixes' of gitolite.kernel.org:pub/scm/linux/kernel/git/viro/vfs: udf: fix nls leak on udf_fill_super() failure
6 daysudf: fix nls leak on udf_fill_super() failureAl Viro1-1/+1
On all failure exits that go to error_out there we have already moved the nls reference from uopt->nls_map to sbi->s_nls_map, leaving NULL behind. Fixes: c4e89cc674ac ("udf: convert to new mount API") Acked-by: Jan Kara <jack@suse.cz> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
9 daysudf: validate VAT inode size for old VAT formatJan Kara1-0/+5
Validate VAT inode is large enough to contain at least the header for pre-2.00 UDF media format. Signed-off-by: Jan Kara <jack@suse.cz>
9 daysudf: validate VAT header length against the VAT inode sizeBryam Vargas1-0/+8
udf_load_vat() takes the virtual partition's start offset straight from the on-disk VAT 2.0 header without checking it against the VAT inode size: map->s_type_specific.s_virtual.s_start_offset = le16_to_cpu(vat20->lengthHeader); map->s_type_specific.s_virtual.s_num_entries = (sbi->s_vat_inode->i_size - map->s_type_specific.s_virtual.s_start_offset) >> 2; lengthHeader is a fully attacker-controlled 16-bit value. If it exceeds the VAT inode size, the s_num_entries subtraction underflows to a huge count, which defeats the "block > s_num_entries" bound in udf_get_pblock_virt15(); and on the ICB-inline path that function reads ((__le32 *)(iinfo->i_data + s_start_offset))[block] so a large s_start_offset indexes past the inode's in-ICB data. Mounting a crafted UDF image with a virtual (VAT) partition then triggers an out-of-bounds read. Reject a VAT whose header length does not leave room for at least one entry within the VAT inode. Fixes: fa5e08156335 ("udf: Handle VAT packed inside inode properly") Cc: stable@vger.kernel.org Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Link: https://patch.msgid.link/20260612-b4-disp-9a2317ee-v1-1-fefef5736154@proton.me Signed-off-by: Jan Kara <jack@suse.cz>
9 daysudf: validate sparing table length as an entry count, not a byte countBryam Vargas1-1/+2
udf_load_sparable_map() accepts a sparing table when sizeof(*st) + le16_to_cpu(st->reallocationTableLen) > sb->s_blocksize is false, i.e. it treats reallocationTableLen as a number of BYTES that must fit in the block. But the table is walked as an array of 8-byte sparingEntry elements: for (i = 0; i < le16_to_cpu(st->reallocationTableLen); i++) { struct sparingEntry *entry = &st->mapEntry[i]; ... entry->origLocation ... } in udf_get_pblock_spar15() and udf_relocate_blocks(). A reallocationTableLen of N therefore passes the check whenever sizeof(*st) + N <= blocksize, yet the consumers index sizeof(*st) + N * sizeof(struct sparingEntry) bytes -- up to ~8x the block. On a crafted UDF image this is an out-of-bounds read in udf_get_pblock_spar15(); udf_relocate_blocks() additionally feeds the same length to udf_update_tag(), whose crc_itu_t() reads far past the block, and its memmove() through st->mapEntry[] is an out-of-bounds write. Validate reallocationTableLen as the entry count it is, with struct_size(). Fixes: 1df2ae31c724 ("udf: Fortify loading of sparing table") Cc: stable@vger.kernel.org Signed-off-by: Bryam Vargas <hexlabsecurity@proton.me> Link: https://patch.msgid.link/20260612-b4-disp-91780c4e-v1-1-f15112ff6882@proton.me Signed-off-by: Jan Kara <jack@suse.cz>
2026-05-18udf: validate free block extents against the partition lengthMichael Bommarito1-2/+3
udf_free_blocks() checks the logical block number and count against the partition length, but drops the extent offset from that final bound. A crafted extent can pass the guard while logicalBlockNum + offset + count points past the partition, which later indexes past the space bitmap array. A single ftruncate(2) on a file backed by such an extent reliably panics the kernel. This is a local availability issue. On desktop systems where UDisks/polkit allows the active user to mount removable UDF media without CAP_SYS_ADMIN, an unprivileged local user can supply the crafted filesystem and trigger the panic by truncating a writable file on it. Systems that require root or CAP_SYS_ADMIN to mount the image have a higher prerequisite. No confidentiality or integrity impact is claimed: the reproduced primitive is an out-of-bounds read of a bitmap pointer slot followed by a kernel panic. Use the already computed logicalBlockNum + offset + count value for the partition length check. Also make load_block_bitmap() reject an out-of-range block group before indexing s_block_bitmap[], so corrupted callers cannot walk past the flexible array. Fixes: 56e69e59751d ("udf: prevent integer overflow in udf_bitmap_free_blocks()") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260515142327.1120767-1-michael.bommarito@gmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2026-04-22udf: reject descriptors with oversized CRC lengthMichael Bommarito1-2/+6
udf_read_tagged() skips CRC verification when descCRCLength + sizeof(struct tag) exceeds the block size. A crafted UDF image can set descCRCLength to an oversized value to bypass CRC validation entirely; the descriptor is then accepted based solely on the 8-bit tag checksum, which is trivially recomputable. Reject such descriptors instead of silently accepting them. A legitimate single-block descriptor should never have a CRC length that exceeds the block. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Assisted-by: Claude:claude-opus-4-6 Assisted-by: Codex:gpt-5-4 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Link: https://patch.msgid.link/20260413211240.853662-1-michael.bommarito@gmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2026-04-15Merge tag 'fs_for_v7.1-rc1' of ↵Linus Torvalds2-7/+8
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, udf, quota updates from Jan Kara: - A fix for a race in quota code that can expose ocfs2 to use-after-free issues - UDF fix to avoid memory corruption in face of corrupted format - Couple of ext2 fixes for better handling of fs corruption - Some more various code cleanups in UDF & ext2 * tag 'fs_for_v7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: ext2: reject inodes with zero i_nlink and valid mode in ext2_iget() ext2: use get_random_u32() where appropriate quota: Fix race of dquot_scan_active() with quota deactivation udf: fix partition descriptor append bookkeeping ext2: avoid drop_nlink() during unlink of zero-nlink inode in ext2_unlink() ext2: guard reservation window dump with EXT2FS_DEBUG ext2: replace BUG_ON with WARN_ON_ONCE in ext2_get_blocks ext2: remove stale TODO about kmap fs: udf: avoid assignment in condition when selecting allocation goal
2026-04-13Merge tag 'vfs-7.1-rc1.bh.metadata' of ↵Linus Torvalds9-13/+26
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs buffer_head updates from Christian Brauner: "This cleans up the mess that has accumulated over the years in metadata buffer_head tracking for inodes. It moves the tracking into dedicated structure in filesystem-private part of the inode (so that we don't use private_list, private_data, and private_lock in struct address_space), and also moves couple other users of private_data and private_list so these are removed from struct address_space saving 3 longs in struct inode for 99% of inodes" * tag 'vfs-7.1-rc1.bh.metadata' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (42 commits) fs: Drop i_private_list from address_space fs: Drop mapping_metadata_bhs from address space ext4: Track metadata bhs in fs-private inode part minix: Track metadata bhs in fs-private inode part udf: Track metadata bhs in fs-private inode part fat: Track metadata bhs in fs-private inode part bfs: Track metadata bhs in fs-private inode part affs: Track metadata bhs in fs-private inode part ext2: Track metadata bhs in fs-private inode part fs: Provide functions for handling mapping_metadata_bhs directly fs: Switch inode_has_buffers() to take mapping_metadata_bhs fs: Make bhs point to mapping_metadata_bhs fs: Move metadata bhs tracking to a separate struct fs: Fold fsync_buffers_list() into sync_mapping_buffers() fs: Drop osync_buffers_list() kvm: Use private inode list instead of i_private_list fs: Remove i_private_data aio: Stop using i_private_data and i_private_lock hugetlbfs: Stop using i_private_data fs: Stop using i_private_data for metadata bh tracking ...
2026-04-13Merge tag 'vfs-7.1-rc1.kino' of ↵Linus Torvalds5-21/+21
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs i_ino updates from Christian Brauner: "For historical reasons, the inode->i_ino field is an unsigned long, which means that it's 32 bits on 32 bit architectures. This has caused a number of filesystems to implement hacks to hash a 64-bit identifier into a 32-bit field, and deprives us of a universal identifier field for an inode. This changes the inode->i_ino field from an unsigned long to a u64. This shouldn't make any material difference on 64-bit hosts, but 32-bit hosts will see struct inode grow by at least 4 bytes. This could have effects on slabcache sizes and field alignment. The bulk of the changes are to format strings and tracepoints, since the kernel itself doesn't care that much about the i_ino field. The first patch changes some vfs function arguments, so check that one out carefully. With this change, we may be able to shrink some inode structures. For instance, struct nfs_inode has a fileid field that holds the 64-bit inode number. With this set of changes, that field could be eliminated. I'd rather leave that sort of cleanups for later just to keep this simple" * tag 'vfs-7.1-rc1.kino' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: nilfs2: fix 64-bit division operations in nilfs_bmap_find_target_in_group() EVM: add comment describing why ino field is still unsigned long vfs: remove externs from fs.h on functions modified by i_ino widening treewide: fix missed i_ino format specifier conversions ext4: fix signed format specifier in ext4_load_inode trace event treewide: change inode->i_ino from unsigned long to u64 nilfs2: widen trace event i_ino fields to u64 f2fs: widen trace event i_ino fields to u64 ext4: widen trace event i_ino fields to u64 zonefs: widen trace event i_ino fields to u64 hugetlbfs: widen trace event i_ino fields to u64 ext2: widen trace event i_ino fields to u64 cachefiles: widen trace event i_ino fields to u64 vfs: widen trace event i_ino fields to u64 net: change sock.sk_ino and sock_i_ino() to u64 audit: widen ino fields to u64 vfs: widen inode hash/lookup functions to u64
2026-03-27udf: Fix race between file type conversion and writebackJan Kara1-18/+15
udf_setsize() can race with udf_writepages() as follows: udf_setsize() udf_writepages() if (iinfo->i_alloc_type == ICBTAG_FLAG_AD_IN_ICB) err = udf_expand_file_adinicb(inode); err = udf_extend_file(inode, newsize); udf_adinicb_writepages() memcpy_from_file_folio() - crash because inode size is too big. Fix the problem by checking the file type under folio lock in udf_handle_page_wb() handler called from __mpage_writepages() which properly serializes with udf_expand_file_adinicb(). Reported-by: Jianzhou Zhao <luckd0g@163.com> Link: https://lore.kernel.org/all/f622c01.67ac.19cdbdd777d.Coremail.luckd0g@163.com Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260326140635.15895-4-jack@suse.cz Signed-off-by: Jan Kara <jack@suse.cz>
2026-03-26udf: Track metadata bhs in fs-private inode partJan Kara9-14/+25
Track metadata bhs for an inode in fs-private part of the inode. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-80-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26udf: Sync and invalidate metadata buffers from udf_evict_inode()Jan Kara1-0/+2
There are only very few filesystems using generic metadata buffer head tracking and everybody is paying the overhead. When we remove this tracking for inode reclaim code .evict will start to see inodes with metadata buffers attached so write them out and prune them. Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-58-jack@suse.cz Tested-by: syzbot@syzkaller.appspotmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-26udf: Switch to generic_buffers_fsync()Jan Kara2-2/+2
UDF uses metadata bh list attached to inode. Switch it to generic_buffers_fsync() instead of generic_file_fsync() as we'll be removing metadata bh handling from generic_file_fsync(). Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260326095354.16340-51-jack@suse.cz Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-19udf: fix partition descriptor append bookkeepingSeohyeon Maeng1-1/+3
Mounting a crafted UDF image with repeated partition descriptors can trigger a heap out-of-bounds write in part_descs_loc[]. handle_partition_descriptor() deduplicates entries by partition number, but appended slots never record partnum. As a result duplicate Partition Descriptors are appended repeatedly and num_part_descs keeps growing. Once the table is full, the growth path still sizes the allocation from partnum even though inserts are indexed by num_part_descs. If partnum is already aligned to PART_DESC_ALLOC_STEP, ALIGN(partnum, step) can keep the old capacity and the next append writes past the end of the table. Store partnum in the appended slot and size growth from the next append count so deduplication and capacity tracking follow the same model. Fixes: ee4af50ca94f ("udf: Fix mounting of Win7 created UDF filesystems") Cc: stable@vger.kernel.org Signed-off-by: Seohyeon Maeng <bioloidgp@gmail.com> Link: https://patch.msgid.link/20260310081652.21220-1-bioloidgp@gmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2026-03-06treewide: change inode->i_ino from unsigned long to u64Jeff Layton5-21/+21
On 32-bit architectures, unsigned long is only 32 bits wide, which causes 64-bit inode numbers to be silently truncated. Several filesystems (NFS, XFS, BTRFS, etc.) can generate inode numbers that exceed 32 bits, and this truncation can lead to inode number collisions and other subtle bugs on 32-bit systems. Change the type of inode->i_ino from unsigned long to u64 to ensure that inode numbers are always represented as 64-bit values regardless of architecture. Update all format specifiers treewide from %lu/%lx to %llu/%llx to match the new type, along with corresponding local variable types. This is the bulk treewide conversion. Earlier patches in this series handled trace events separately to allow trace field reordering for better struct packing on 32-bit. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-12-2257ad83d372@kernel.org Acked-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-27fs: udf: avoid assignment in condition when selecting allocation goalAdarsh Das1-6/+5
Avoid assignment inside an if condition when choosing the block allocation goal in inode_getblk(), and make the priority order explicit. No functional change. [JK: Fixup conditions to really not change functionality] Signed-off-by: Adarsh Das <adarshdas950@gmail.com> Link: https://patch.msgid.link/20260206125638.94194-1-adarshdas950@gmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook1-1/+1
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds1-1/+1
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds1-4/+4
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-9/+7
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'mm-nonmm-stable-2026-02-12-10-48' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - "ocfs2: give ocfs2 the ability to reclaim suballocator free bg" saves disk space by teaching ocfs2 to reclaim suballocator block group space (Heming Zhao) - "Add ARRAY_END(), and use it to fix off-by-one bugs" adds the ARRAY_END() macro and uses it in various places (Alejandro Colomar) - "vmcoreinfo: support VMCOREINFO_BYTES larger than PAGE_SIZE" makes the vmcore code future-safe, if VMCOREINFO_BYTES ever exceeds the page size (Pnina Feder) - "kallsyms: Prevent invalid access when showing module buildid" cleans up kallsyms code related to module buildid and fixes an invalid access crash when printing backtraces (Petr Mladek) - "Address page fault in ima_restore_measurement_list()" fixes a kexec-related crash that can occur when booting the second-stage kernel on x86 (Harshit Mogalapalli) - "kho: ABI headers and Documentation updates" updates the kexec handover ABI documentation (Mike Rapoport) - "Align atomic storage" adds the __aligned attribute to atomic_t and atomic64_t definitions to get natural alignment of both types on csky, m68k, microblaze, nios2, openrisc and sh (Finn Thain) - "kho: clean up page initialization logic" simplifies the page initialization logic in kho_restore_page() (Pratyush Yadav) - "Unload linux/kernel.h" moves several things out of kernel.h and into more appropriate places (Yury Norov) - "don't abuse task_struct.group_leader" removes the usage of ->group_leader when it is "obviously unnecessary" (Oleg Nesterov) - "list private v2 & luo flb" adds some infrastructure improvements to the live update orchestrator (Pasha Tatashin) * tag 'mm-nonmm-stable-2026-02-12-10-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (107 commits) watchdog/hardlockup: simplify perf event probe and remove per-cpu dependency procfs: fix missing RCU protection when reading real_parent in do_task_stat() watchdog/softlockup: fix sample ring index wrap in need_counting_irqs() kcsan, compiler_types: avoid duplicate type issues in BPF Type Format kho: fix doc for kho_restore_pages() tests/liveupdate: add in-kernel liveupdate test liveupdate: luo_flb: introduce File-Lifecycle-Bound global state liveupdate: luo_file: Use private list list: add kunit test for private list primitives list: add primitives for private list manipulations delayacct: fix uapi timespec64 definition panic: add panic_force_cpu= parameter to redirect panic to a specific CPU netclassid: use thread_group_leader(p) in update_classid_task() RDMA/umem: don't abuse current->group_leader drm/pan*: don't abuse current->group_leader drm/amd: kill the outdated "Only the pthreads threading model is supported" checks drm/amdgpu: don't abuse current->group_leader android/binder: use same_thread_group(proc->tsk, current) in binder_mmap() android/binder: don't abuse current->group_leader kho: skip memoryless NUMA nodes when reserving scratch areas ...
2026-02-09Merge tag 'vfs-7.0-rc1.fserror' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs error reporting updates from Christian Brauner: "This contains the changes to support generic I/O error reporting. Filesystems currently have no standard mechanism for reporting metadata corruption and file I/O errors to userspace via fsnotify. Each filesystem (xfs, ext4, erofs, f2fs, etc.) privately defines EFSCORRUPTED, and error reporting to fanotify is inconsistent or absent entirely. This introduces a generic fserror infrastructure built around struct super_block that gives filesystems a standard way to queue metadata and file I/O error reports for delivery to fsnotify. Errors are queued via mempools and queue_work to avoid holding filesystem locks in the notification path; unmount waits for pending events to drain. A new super_operations::report_error callback lets filesystem drivers respond to file I/O errors themselves (to be used by an upcoming XFS self-healing patchset). On the uapi side, EFSCORRUPTED and EUCLEAN are promoted from private per-filesystem definitions to canonical errno.h values across all architectures" * tag 'vfs-7.0-rc1.fserror' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: ext4: convert to new fserror helpers xfs: translate fsdax media errors into file "data lost" errors when convenient xfs: report fs metadata errors via fsnotify iomap: report file I/O errors to the VFS fs: report filesystem and file I/O errors to fsnotify uapi: promote EFSCORRUPTED and EUCLEAN to errno.h
2026-01-20kernel.h: drop hex.h and update all hex.h usersRandy Dunlap1-0/+1
Remove <linux/hex.h> from <linux/kernel.h> and update all users/callers of hex.h interfaces to directly #include <linux/hex.h> as part of the process of putting kernel.h on a diet. Removing hex.h from kernel.h means that 36K C source files don't have to pay the price of parsing hex.h for the roughly 120 C source files that need it. This change has been build-tested with allmodconfig on most ARCHes. Also, all users/callers of <linux/hex.h> in the entire source tree have been updated if needed (if not already #included). Link: https://lkml.kernel.org/r/20251215005206.2362276-1-rdunlap@infradead.org Signed-off-by: Randy Dunlap <rdunlap@infradead.org> Reviewed-by: Andy Shevchenko <andriy.shevchenko@intel.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Yury Norov (NVIDIA) <yury.norov@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-13uapi: promote EFSCORRUPTED and EUCLEAN to errno.hDarrick J. Wong1-2/+0
Stop definining these privately and instead move them to the uapi errno.h so that they become canonical instead of copy pasta. Cc: linux-api@vger.kernel.org Signed-off-by: Darrick J. Wong <djwong@kernel.org> Link: https://patch.msgid.link/176826402587.3490369.17659117524205214600.stgit@frogsfrogsfrogs Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-01-12udf: add setlease file operationJeff Layton2-0/+4
Add the setlease file_operation pointing to generic_setlease to the udf file_operations structures. A future patch will change the default behavior to reject lease attempts with -EINVAL when there is no setlease file operation defined. Add generic_setlease to retain the ability to set leases on this filesystem. Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260108-setlease-6-20-v1-20-ea4dec9b67fa@kernel.org Acked-by: Al Viro <viro@zeniv.linux.org.uk> Acked-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-10-20Coccinelle-based conversion to use ->i_state accessorsMateusz Guzik1-1/+1
All places were patched by coccinelle with the default expecting that ->i_lock is held, afterwards entries got fixed up by hand to use unlocked variants as needed. The script: @@ expression inode, flags; @@ - inode->i_state & flags + inode_state_read(inode) & flags @@ expression inode, flags; @@ - inode->i_state &= ~flags + inode_state_clear(inode, flags) @@ expression inode, flag1, flag2; @@ - inode->i_state &= ~flag1 & ~flag2 + inode_state_clear(inode, flag1 | flag2) @@ expression inode, flags; @@ - inode->i_state |= flags + inode_state_set(inode, flags) @@ expression inode, flags; @@ - inode->i_state = flags + inode_state_assign(inode, flags) @@ expression inode, flags; @@ - flags = inode->i_state + flags = inode_state_read(inode) @@ expression inode, flags; @@ - READ_ONCE(inode->i_state) & flags + inode_state_read(inode) & flags Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-09-22fs: udf: fix OOB read in lengthAllocDescs handlingLarshin Sergey1-0/+3
When parsing Allocation Extent Descriptor, lengthAllocDescs comes from on-disk data and must be validated against the block size. Crafted or corrupted images may set lengthAllocDescs so that the total descriptor length (sizeof(allocExtDesc) + lengthAllocDescs) exceeds the buffer, leading udf_update_tag() to call crc_itu_t() on out-of-bounds memory and trigger a KASAN use-after-free read. BUG: KASAN: use-after-free in crc_itu_t+0x1d5/0x2b0 lib/crc-itu-t.c:60 Read of size 1 at addr ffff888041e7d000 by task syz-executor317/5309 CPU: 0 UID: 0 PID: 5309 Comm: syz-executor317 Not tainted 6.12.0-rc4-syzkaller-00261-g850925a8133c #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:94 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:120 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 crc_itu_t+0x1d5/0x2b0 lib/crc-itu-t.c:60 udf_update_tag+0x70/0x6a0 fs/udf/misc.c:261 udf_write_aext+0x4d8/0x7b0 fs/udf/inode.c:2179 extent_trunc+0x2f7/0x4a0 fs/udf/truncate.c:46 udf_truncate_tail_extent+0x527/0x7e0 fs/udf/truncate.c:106 udf_release_file+0xc1/0x120 fs/udf/file.c:185 __fput+0x23f/0x880 fs/file_table.c:431 task_work_run+0x24f/0x310 kernel/task_work.c:239 exit_task_work include/linux/task_work.h:43 [inline] do_exit+0xa2f/0x28e0 kernel/exit.c:939 do_group_exit+0x207/0x2c0 kernel/exit.c:1088 __do_sys_exit_group kernel/exit.c:1099 [inline] __se_sys_exit_group kernel/exit.c:1097 [inline] __x64_sys_exit_group+0x3f/0x40 kernel/exit.c:1097 x64_sys_call+0x2634/0x2640 arch/x86/include/generated/asm/syscalls_64.h:232 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f </TASK> Validate the computed total length against epos->bh->b_size. Found by Linux Verification Center (linuxtesting.org) with Syzkaller. Reported-by: syzbot+8743fca924afed42f93e@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=8743fca924afed42f93e Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Cc: stable@vger.kernel.org Signed-off-by: Larshin Sergey <Sergey.Larshin@kaspersky.com> Link: https://patch.msgid.link/20250922131358.745579-1-Sergey.Larshin@kaspersky.com Signed-off-by: Jan Kara <jack@suse.cz>
2025-07-28Merge tag 'fs_for_v6.17-rc1' of ↵Linus Torvalds2-14/+27
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull udf and ext2 updates from Jan Kara: "A few udf and ext2 fixes and cleanups" * tag 'fs_for_v6.17-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Verify partition map count udf: stop using write_cache_pages ext2: Handle fiemap on empty files to prevent EINVAL
2025-07-16fs: change write_begin/write_end interface to take struct kiocb *Taotao Chen1-4/+7
Change the address_space_operations callbacks write_begin() and write_end() to take struct kiocb * as the first argument instead of struct file *. Update all affected function prototypes, implementations, call sites, and related documentation across VFS, filesystems, and block layer. Part of a series refactoring address_space_operations write_begin and write_end callbacks to use struct kiocb for passing write context and flags. Signed-off-by: Taotao Chen <chentaotao@didiglobal.com> Link: https://lore.kernel.org/20250716093559.217344-4-chentaotao@didiglobal.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-07-11udf: Verify partition map countJan Kara1-2/+11
Verify that number of partition maps isn't insanely high which can lead to large allocation in udf_sb_alloc_partition_maps(). All partition maps have to fit in the LVD which is in a single block. Reported-by: syzbot+478f2c1a6f0f447a46bb@syzkaller.appspotmail.com Signed-off-by: Jan Kara <jack@suse.cz>
2025-07-11udf: stop using write_cache_pagesChristoph Hellwig1-12/+16
Stop using the obsolete write_cache_pages and use writeback_iter directly. Use the chance to refactor the inacb writeback code to not have a separate writeback helper. Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20250711081036.564232-1-hch@lst.de
2025-05-07udf: Make sure i_lenExtents is uptodate on inode evictionJan Kara1-1/+1
UDF maintains total length of all extents in i_lenExtents. Generally we keep extent lengths (and thus i_lenExtents) block aligned because it makes the file appending logic simpler. However the standard mandates that the inode size must match the length of all extents and thus we trim the last extent when closing the file. To catch possible bugs we also verify that i_lenExtents matches i_size when evicting inode from memory. Commit b405c1e58b73 ("udf: refactor udf_next_aext() to handle error") however broke the code updating i_lenExtents and thus udf_evict_inode() ended up spewing lots of errors about incorrectly sized extents although the extents were actually sized properly. Fix the updating of i_lenExtents to silence the errors. Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2025-04-01Merge tag 'mm-stable-2025-03-30-16-52' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - The series "Enable strict percpu address space checks" from Uros Bizjak uses x86 named address space qualifiers to provide compile-time checking of percpu area accesses. This has caused a small amount of fallout - two or three issues were reported. In all cases the calling code was found to be incorrect. - The series "Some cleanup for memcg" from Chen Ridong implements some relatively monir cleanups for the memcontrol code. - The series "mm: fixes for device-exclusive entries (hmm)" from David Hildenbrand fixes a boatload of issues which David found then using device-exclusive PTE entries when THP is enabled. More work is needed, but this makes thins better - our own HMM selftests now succeed. - The series "mm: zswap: remove z3fold and zbud" from Yosry Ahmed remove the z3fold and zbud implementations. They have been deprecated for half a year and nobody has complained. - The series "mm: further simplify VMA merge operation" from Lorenzo Stoakes implements numerous simplifications in this area. No runtime effects are anticipated. - The series "mm/madvise: remove redundant mmap_lock operations from process_madvise()" from SeongJae Park rationalizes the locking in the madvise() implementation. Performance gains of 20-25% were observed in one MADV_DONTNEED microbenchmark. - The series "Tiny cleanup and improvements about SWAP code" from Baoquan He contains a number of touchups to issues which Baoquan noticed when working on the swap code. - The series "mm: kmemleak: Usability improvements" from Catalin Marinas implements a couple of improvements to the kmemleak user-visible output. - The series "mm/damon/paddr: fix large folios access and schemes handling" from Usama Arif provides a couple of fixes for DAMON's handling of large folios. - The series "mm/damon/core: fix wrong and/or useless damos_walk() behaviors" from SeongJae Park fixes a few issues with the accuracy of kdamond's walking of DAMON regions. - The series "expose mapping wrprotect, fix fb_defio use" from Lorenzo Stoakes changes the interaction between framebuffer deferred-io and core MM. No functional changes are anticipated - this is preparatory work for the future removal of page structure fields. - The series "mm/damon: add support for hugepage_size DAMOS filter" from Usama Arif adds a DAMOS filter which permits the filtering by huge page sizes. - The series "mm: permit guard regions for file-backed/shmem mappings" from Lorenzo Stoakes extends the guard region feature from its present "anon mappings only" state. The feature now covers shmem and file-backed mappings. - The series "mm: batched unmap lazyfree large folios during reclamation" from Barry Song cleans up and speeds up the unmapping for pte-mapped large folios. - The series "reimplement per-vma lock as a refcount" from Suren Baghdasaryan puts the vm_lock back into the vma. Our reasons for pulling it out were largely bogus and that change made the code more messy. This patchset provides small (0-10%) improvements on one microbenchmark. - The series "Docs/mm/damon: misc DAMOS filters documentation fixes and improves" from SeongJae Park does some maintenance work on the DAMON docs. - The series "hugetlb/CMA improvements for large systems" from Frank van der Linden addresses a pile of issues which have been observed when using CMA on large machines. - The series "mm/damon: introduce DAMOS filter type for unmapped pages" from SeongJae Park enables users of DMAON/DAMOS to filter my the page's mapped/unmapped status. - The series "zsmalloc/zram: there be preemption" from Sergey Senozhatsky teaches zram to run its compression and decompression operations preemptibly. - The series "selftests/mm: Some cleanups from trying to run them" from Brendan Jackman fixes a pile of unrelated issues which Brendan encountered while runnimg our selftests. - The series "fs/proc/task_mmu: add guard region bit to pagemap" from Lorenzo Stoakes permits userspace to use /proc/pid/pagemap to determine whether a particular page is a guard page. - The series "mm, swap: remove swap slot cache" from Kairui Song removes the swap slot cache from the allocation path - it simply wasn't being effective. - The series "mm: cleanups for device-exclusive entries (hmm)" from David Hildenbrand implements a number of unrelated cleanups in this code. - The series "mm: Rework generic PTDUMP configs" from Anshuman Khandual implements a number of preparatoty cleanups to the GENERIC_PTDUMP Kconfig logic. - The series "mm/damon: auto-tune aggregation interval" from SeongJae Park implements a feedback-driven automatic tuning feature for DAMON's aggregation interval tuning. - The series "Fix lazy mmu mode" from Ryan Roberts fixes some issues in powerpc, sparc and x86 lazy MMU implementations. Ryan did this in preparation for implementing lazy mmu mode for arm64 to optimize vmalloc. - The series "mm/page_alloc: Some clarifications for migratetype fallback" from Brendan Jackman reworks some commentary to make the code easier to follow. - The series "page_counter cleanup and size reduction" from Shakeel Butt cleans up the page_counter code and fixes a size increase which we accidentally added late last year. - The series "Add a command line option that enables control of how many threads should be used to allocate huge pages" from Thomas Prescher does that. It allows the careful operator to significantly reduce boot time by tuning the parallalization of huge page initialization. - The series "Fix calculations in trace_balance_dirty_pages() for cgwb" from Tang Yizhou fixes the tracing output from the dirty page balancing code. - The series "mm/damon: make allow filters after reject filters useful and intuitive" from SeongJae Park improves the handling of allow and reject filters. Behaviour is made more consistent and the documention is updated accordingly. - The series "Switch zswap to object read/write APIs" from Yosry Ahmed updates zswap to the new object read/write APIs and thus permits the removal of some legacy code from zpool and zsmalloc. - The series "Some trivial cleanups for shmem" from Baolin Wang does as it claims. - The series "fs/dax: Fix ZONE_DEVICE page reference counts" from Alistair Popple regularizes the weird ZONE_DEVICE page refcount handling in DAX, permittig the removal of a number of special-case checks. - The series "refactor mremap and fix bug" from Lorenzo Stoakes is a preparatoty refactoring and cleanup of the mremap() code. - The series "mm: MM owner tracking for large folios (!hugetlb) + CONFIG_NO_PAGE_MAPCOUNT" from David Hildenbrand reworks the manner in which we determine whether a large folio is known to be mapped exclusively into a single MM. - The series "mm/damon: add sysfs dirs for managing DAMOS filters based on handling layers" from SeongJae Park adds a couple of new sysfs directories to ease the management of DAMON/DAMOS filters. - The series "arch, mm: reduce code duplication in mem_init()" from Mike Rapoport consolidates many per-arch implementations of mem_init() into code generic code, where that is practical. - The series "mm/damon/sysfs: commit parameters online via damon_call()" from SeongJae Park continues the cleaning up of sysfs access to DAMON internal data. - The series "mm: page_ext: Introduce new iteration API" from Luiz Capitulino reworks the page_ext initialization to fix a boot-time crash which was observed with an unusual combination of compile and cmdline options. - The series "Buddy allocator like (or non-uniform) folio split" from Zi Yan reworks the code to split a folio into smaller folios. The main benefit is lessened memory consumption: fewer post-split folios are generated. - The series "Minimize xa_node allocation during xarry split" from Zi Yan reduces the number of xarray xa_nodes which are generated during an xarray split. - The series "drivers/base/memory: Two cleanups" from Gavin Shan performs some maintenance work on the drivers/base/memory code. - The series "Add tracepoints for lowmem reserves, watermarks and totalreserve_pages" from Martin Liu adds some more tracepoints to the page allocator code. - The series "mm/madvise: cleanup requests validations and classifications" from SeongJae Park cleans up some warts which SeongJae observed during his earlier madvise work. - The series "mm/hwpoison: Fix regressions in memory failure handling" from Shuai Xue addresses two quite serious regressions which Shuai has observed in the memory-failure implementation. - The series "mm: reliable huge page allocator" from Johannes Weiner makes huge page allocations cheaper and more reliable by reducing fragmentation. - The series "Minor memcg cleanups & prep for memdescs" from Matthew Wilcox is preparatory work for the future implementation of memdescs. - The series "track memory used by balloon drivers" from Nico Pache introduces a way to track memory used by our various balloon drivers. - The series "mm/damon: introduce DAMOS filter type for active pages" from Nhat Pham permits users to filter for active/inactive pages, separately for file and anon pages. - The series "Adding Proactive Memory Reclaim Statistics" from Hao Jia separates the proactive reclaim statistics from the direct reclaim statistics. - The series "mm/vmscan: don't try to reclaim hwpoison folio" from Jinjiang Tu fixes our handling of hwpoisoned pages within the reclaim code. * tag 'mm-stable-2025-03-30-16-52' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (431 commits) mm/page_alloc: remove unnecessary __maybe_unused in order_to_pindex() x86/mm: restore early initialization of high_memory for 32-bits mm/vmscan: don't try to reclaim hwpoison folio mm/hwpoison: introduce folio_contain_hwpoisoned_page() helper cgroup: docs: add pswpin and pswpout items in cgroup v2 doc mm: vmscan: split proactive reclaim statistics from direct reclaim statistics selftests/mm: speed up split_huge_page_test selftests/mm: uffd-unit-tests support for hugepages > 2M docs/mm/damon/design: document active DAMOS filter type mm/damon: implement a new DAMOS filter type for active pages fs/dax: don't disassociate zero page entries MM documentation: add "Unaccepted" meminfo entry selftests/mm: add commentary about 9pfs bugs fork: use __vmalloc_node() for stack allocation docs/mm: Physical Memory: Populate the "Zones" section xen: balloon: update the NR_BALLOON_PAGES state hv_balloon: update the NR_BALLOON_PAGES state balloon_compaction: update the NR_BALLOON_PAGES state meminfo: add a per node counter for balloon drivers mm: remove references to folio in __memcg_kmem_uncharge_page() ...
2025-03-31Merge tag 'fs_for_v6.15-rc1' of ↵Linus Torvalds1-0/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull ext2, udf, and isofs updates from Jan Kara: - conversion of ext2 to the new mount API - small folio conversion work for ext2 - a fix of an unexpected return value in udf in inode_getblk() - a fix of handling of corrupted directory in isofs * tag 'fs_for_v6.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: udf: Fix inode_getblk() return value ext2: Make ext2_params_spec static ext2: create ext2_msg_fc for use during parsing ext2: convert to the new mount API ext2: Remove reference to bh->b_page isofs: fix KMSAN uninit-value bug in do_isofs_readdir()
2025-03-16fs: convert block_commit_write() to take a folioMatthew Wilcox (Oracle)1-1/+1
All callers now have a folio, so pass it in instead of converting folio->page->folio. Link: https://lkml.kernel.org/r/20250217192009.437916-1-willy@infradead.org Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-13udf: Fix inode_getblk() return valueJan Kara1-0/+1
Smatch noticed that inode_getblk() can return 1 on successful mapping of a block instead of expected 0 after commit b405c1e58b73 ("udf: refactor udf_next_aext() to handle error"). This could confuse some of the callers and lead to strange failures (although the one reported by Smatch in udf_mkdir() is impossible to trigger in practice). Fix the return value of inode_getblk(). Link: https://lore.kernel.org/all/cb514af7-bbe0-435b-934f-dd1d7a16d2cd@stanley.mountain Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com> Fixes: b405c1e58b73 ("udf: refactor udf_next_aext() to handle error") CC: stable@vger.kernel.org Signed-off-by: Jan Kara <jack@suse.cz>
2025-02-27