aboutsummaryrefslogtreecommitdiff
path: root/Documentation/filesystems
AgeCommit message (Collapse)AuthorFilesLines
2 daysMerge tag 'docs-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linuxLinus Torvalds3-18/+24
Pull documentation updates from Jonathan Corbet: "Things have calmed down a bit on the docs front, with no earthshaking changes this time around: - Ongoing work on the Japanese and Portuguese translations - Better integration of the MAINTAINERS file into the rendered documents, including a search interface - A seemingly infinite supply of fixes for typos, minor grammatical issues, and related problems that LLMs find with abandon" * tag 'docs-7.2' of git://git.kernel.org/pub/scm/linux/kernel/git/docs/linux: (93 commits) docs: pt_BR: Translate 3.Early-stage.rst into Portuguese docs: pt_BR: update "Purpose of Defconfigs" section in maintainer-soc.rst Documentation: bug-hunting.rst: fix grammar docs/ja_JP: translate submitting-patches.rst (interleaved-replies) docs: Fix minor grammatical error docs/{it_it,sp_SP,zh_CN,zh_TW}: update references to removed CONFIG_DEBUG_SLAB Documentation: process: fix brackets Documentation: arch: fix brackets docs/dyndbg: explain flags parse 1st docs/dyndbg: update examples \012 to \n docs: kernel-parameters: Fix stale sticore file paths docs: real-time: Fix duplicated sched(7) text docs: kgdb: Fix stale source file paths docs: sonypi: Fix stale header file path docs: kernel-parameters: Remove sa1100ir IrDA parameter iommu: Documentation: rearrange, update kernel-parameters docs: md: fix grammar in speed_limit description docs: changes.rst: restore pahole 1.26 minimum (regressed by sort) Documentation: Fix syntax of kmalloc_objs example in coding style doc docs: pt_BR: update maintainer-handbooks ...
2 daysMerge tag 'x86_cache_for_v7.2_rc1' of ↵Linus Torvalds1-6/+16
gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip Pull x86 resource control updates from Borislav Petkov: "Preparatory work for MPAM counter assignment: - Simplify the error handling path when creating monitor group event configuration directories - Make the MBM event filter configurable only on architectures that support it and expose this with the respective file modes in the event config - Disallow the MBA software controller on systems where MBM counters are assignable, as it requires continuous bandwidth measurement that assignable counters do not guarantee - Replace a compile-time Kconfig option for fixed counter assignment with a per-architecture runtime property, and expose whether the counter assignment mode is changeable to userspace - Continue counter allocation across all domains instead of aborting at the first failure - Document that automatic MBM counter assignment is best effort and may not assign counters to all domains - Document the behavior of task ID 0 and idle tasks in the resctrl tasks file" * tag 'x86_cache_for_v7.2_rc1' of gitolite.kernel.org:pub/scm/linux/kernel/git/tip/tip: fs/resctrl: Document tasks file behaviour for task id 0 and idle tasks fs/resctrl: Document that automatic counter assignment is best effort fs/resctrl: Continue counter allocation after failure fs/resctrl: Add monitor property 'mbm_cntr_assign_fixed' fs/resctrl: Disallow the software controller when MBM counters are assignable x86,fs/resctrl: Create 'event_filter' files read only if they're not configurable fs/resctrl: Tidy up the error path in resctrl_mkdir_event_configs()
4 daysMerge tag 'pull-dcache' of ↵Linus Torvalds1-0/+17
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache updates from Al Viro: - d_alloc_parallel() API change (Neil's with my changes) - NORCU fixes - Reorganization and simplification of dentry eviction logic - Simplifying rcu_read_lock() scopes in fs/dcache.c - Secondary roots work - getting rid of NFS fake root dentries and dealing with remaining shrink_dcache_for_umount() and shrink_dentry_list() races - making cursors NORCU (surprisingly easy) * tag 'pull-dcache' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (22 commits) make cursors NORCU nfs: get rid of fake root dentries wind ->s_roots via ->d_sib instead of ->d_hash shrink_dentry_tree(): unify the calls of shrink_dentry_list() shrinking rcu_read_lock() scope in d_alloc_parallel() d_walk(): shrink rcu_read_lock() scope document dentry_kill() adjust calling conventions of lock_for_kill(), fold __dentry_kill() into dentry_kill() Document rcu_read_lock() use in select_collect2() Shift rcu_read_{,un}lock() inside fast_dput() simplify safety for lock_for_kill() slowpath fold lock_for_kill() and __dentry_kill() into common helper fold lock_for_kill() into shrink_kill() shrink_dentry_list(): start with removing from shrink list d_prune_aliases(): make sure to skip NORCU aliases kill d_dispose_if_unused() make to_shrink_list() return whether it has moved dentry to list select_collect(): ignore dentries on shrink lists if they have positive refcounts find_acceptable_alias(): skip NORCU aliases with zero refcount fix a race between d_find_any_alias() and final dput() of NORCU dentries ...
4 daysMerge tag 'vfs-7.2-rc1.procfs' of ↵Linus Torvalds1-1/+18
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull procfs updates from Christian Brauner: - Revamp fs/filesystems.c The file was a mess with a hand-rolled linked list in desperate need of a cleanup. The filesystems list is now RCU-ified, /proc files can be marked permanent from outside fs/proc/, and the string emitted when reading /proc/filesystems is pre-generated and cached instead of pointer-chasing and printfing entry by entry on every read. The file is read frequently because libselinux reads it and is linked into numerous frequently used programs (even ones you would not suspect, like sed!). Scalability also improves since reference maintenance on open/close is bypassed. open+read+close cycle single-threaded (ops/s): before: 442732 after: 1063462 (+140%) open+read+close cycle with 20 processes (ops/s): before: 606177 after: 3300576 (+444%) A follow-up patch adds missing unlocks in some corner cases and tidies things up. - Relax the mount visibility check for subset=pid mounts When procfs is mounted with subset=pid, all static files become unavailable and only the dynamic pid information is accessible. In that case there is no point in imposing the full mount visibility restrictions on the mounter - everything that can be hidden in procfs is already inaccessible. These restrictions prevented procfs from being mounted inside rootless containers since almost all container implementations overmount parts of procfs to hide certain directories. As part of this /proc/self/net is only shown in subset=pid mounts for CAP_NET_ADMIN, reconfiguring subset=pid is rejected, the SB_I_USERNS_VISIBLE superblock flag is replaced with an FS_USERNS_MOUNT_RESTRICTED filesystem flag, fully visible mounts are recorded in a list, and the mount restrictions are finally documented. - Protect ptrace_may_access() with exec_update_lock in procfs Most uses of ptrace_may_access() in procfs should hold exec_update_lock to avoid TOCTOU issues with concurrent privileged execve() (like setuid binary execution). This fixes the easy cases - the owner and visibility checks and the FD link permission checks - with the gnarlier ones to follow later. * tag 'vfs-7.2-rc1.procfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: fs: fix ups and tidy ups to /proc/filesystems caching proc: protect ptrace_may_access() with exec_update_lock (FD links) proc: protect ptrace_may_access() with exec_update_lock (part 1) docs: proc: add documentation about mount restrictions proc: handle subset=pid separately in userns visibility checks proc: prevent reconfiguring subset=pid proc: subset=pid: Show /proc/self/net only for CAP_NET_ADMIN fs: cache the string generated by reading /proc/filesystems sysfs: remove trivial sysfs_get_tree() wrapper fs: RCU-ify filesystems list fs: move SB_I_USERNS_VISIBLE to FS_USERNS_MOUNT_RESTRICTED proc: allow to mark /proc files permanent outside of fs/proc/ namespace: record fully visible mounts in list
4 daysMerge tag 'vfs-7.2-rc1.misc' of ↵Linus Torvalds3-1/+196
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "Features: - Reduce pipe->mutex contention by pre-allocating pages outside the lock in anon_pipe_write(). anon_pipe_write() called alloc_page() once per page while holding pipe->mutex. The allocation can sleep doing direct reclaim and runs memcg charging, which extends the critical section and stalls any concurrent reader on the same mutex. Now up to 8 pages are pre-allocated before the mutex is taken, leftovers are recycled into the per-pipe tmp_page[] cache before unlock, and any remainder is released after unlock, keeping the allocator out of the critical section on both sides. On a writers x readers sweep with 64KB writes against a 1 MB pipe throughput improves 6-28% and average write latency drops 5-22%; under memory pressure - when the cost of holding the mutex across reclaim is highest - throughput improves 21-48% and latency drops 17-33%. The microbenchmark is added to selftests. - uaccess/sockptr: fix the ignored_trailing logic in copy_struct_to_user() to behave as documented and the usize check in copy_struct_from_sockptr() for user pointers, and add copy_struct_{from,to}_bounce_buffer() and copy_struct_to_sockptr() helpers for upcoming users (IPPROTO_SMBDIRECT, IPPROTO_QUIC). - bpf: add a sleepable bpf_real_inode() kfunc that resolves the real inode backing a dentry via d_real_inode(). On overlayfs the inode attached to the dentry doesn't carry the underlying device information; this is used by the filesystem restriction BPF program that was merged into systemd. - docs: add guidelines for submitting new filesystems, motivated by the maintenance burden abandoned and untestable filesystems impose on VFS developers, blocking infrastructure work like folio conversions and iomap migration. Fixes: - libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() and drop the now-redundant assignments in callers. This began as a one-line dma-buf fix for a path_noexec() warning; a pseudo filesystem has no reason not to set SB_I_NOEXEC. All init_pseudo() callers were audited: the only visible effect is on dma-buf where SB_I_NOEXEC silences the warning. - Handle set_blocksize() failures in legacy filesystems (bfs, hpfs, qnx4, jfs, befs, affs, isofs, minix, ntfs3, omfs). Mounting a device with a sector size > PAGE_SIZE crashed roughly half of them; the rest had the same missing error handling pattern. Plus a follow-up releasing the superblock buffer_head when setting the minix v3 block size fails. - mount: honour SB_NOUSER in the new mount API. - fs/fcntl: fix a SOFTIRQ-unsafe lock order in fasync signaling by switching the process-group paths of send_sigio() and send_sigurg() from read_lock(&tasklist_lock) to RCU, matching the single-PID path. - vfs: add an FS_USERNS_DELEGATABLE flag and set it for NFS, fixing delegated NFS mounts (fsopen() in a container with the mount performed by a privileged daemon) that broke when non-init s_user_ns was tied to FS_USERNS_MOUNT. - selftests/namespaces: fix a hang in nsid_test where an unreaped grandchild kept the TAP pipe write-end open, a waitpid(-1) race in listns_efault_test, and a false FAIL on kernels without listns() where the tests should SKIP. - filelock: fix the break_lease() stub signature for CONFIG_FILE_LOCKING=n. - init/initramfs_test: wait for the async initramfs unpacking before running; the test and do_populate_rootfs() share the parser state. - fs/coredump: reduce redundant log noise in validate_coredump_safety(). - iomap: pass the correct length to fserror_report_io() in __iomap_write_begin(). - backing-file: fix the backing_file_open() kerneldoc. Cleanups: - initramfs: refactor the cpio hex header parsing to use hex2bin() instead of the hand-rolled simple_strntoul() which is reverted, and extend the initramfs KUnit tests to cover header fields with 0x prefixes. - Replace __get_free_pages() and friends with kmalloc()/kzalloc() across quota, proc, ocfs2/dlm, nilfs2, nfs, nfsd, libfs, jfs, jbd2, isofs, fuse, select, namespace, configfs, binfmt_misc, bfs, and the do_mounts init code - part of the larger work of replacing page allocator calls with kmalloc(). - Use clear_and_wake_up_bit() in unlock_buffer() and journal_end_buffer_io_sync() instead of open-coding the sequence. - Drop unused VFS exports: unexport drop_super_exclusive(), remove start_removing_user_path_at(), and fold __start_removing_path() into start_removing_path(). - fs/read_write: narrow the __kernel_write() export with EXPORT_SYMBOL_FOR_MODULES(). - vfs: uapi: retire octal and hex constants in favor of (1 << n) for the O_ flags. Finding a free bit for a new flag across the architectures was needlessly hard with the mixed bases. - dcache: add extra sanity checks of dead dentries in dentry_free() via a new DENTRY_WARN_ONCE() that also prints d_flags. - iov_iter: use kmemdup_array() in dup_iter() to harden the allocation against multiplication overflow. - fs/pipe: write to ->poll_usage only once. - vfs: remove an always-taken if-branch in find_next_fd(). - dcache: use kmalloc_flex() for struct external_name in __d_alloc(). - namei: use QSTR() instead of QSTR_INIT() in path_pts(). - sync_file_range: delete dead S_ISLNK code. - Comment fixes: retire a stale comment in fget_task_next() and fix assorted spelling mistakes" * tag 'vfs-7.2-rc1.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (73 commits) backing-file: fix backing_file_open() kerneldoc parameter iomap: pass the correct len to fserror_report_io in __iomap_write_begin vfs: add FS_USERNS_DELEGATABLE flag and set it for NFS filelock: fix break_lease() stub signature for CONFIG_FILE_LOCKING=n vfs: uapi: retire octal and hex numbers in favor of (1 << n) for O_ flags bpf: add bpf_real_inode() kfunc fs/read_write: Do not export __kernel_write() to the entire world libfs: drop redundant SB_I_NOEXEC/SB_I_NODEV in init_pseudo() callers libfs: set SB_I_NOEXEC and SB_I_NODEV by default in init_pseudo() mount: honour SB_NOUSER in the new mount API fs/fcntl: fix SOFTIRQ-unsafe lock order in fasync signaling selftests/pipe: add pipe_bench microbenchmark fs/pipe: pre-allocate pages outside pipe->mutex in anon_pipe_write fs: retire stale comment in fget_task_next() fs: fix spelling mistakes in comment bfs: replace get_zeroed_page() with kzalloc() binfmt_misc: replace __get_free_page() with kmalloc() configfs: replace __get_free_pages() with kzalloc() fs/namespace: use __getname() to allocate mntpath buffer fs/select: replace __get_free_page() with kmalloc() ...
4 daysMerge tag 'vfs-7.2-rc1.bh' of ↵Linus Torvalds1-14/+0
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull buffer_head updates from Christian Brauner: "This removes b_end_io from struct buffer_head. Instead of setting bio->bi_end_io to end_bio_bh_io_sync() which then calls bh->b_end_io(), the new bh_submit() and __bh_submit() interfaces set bio->bi_end_io to the appropriate completion handler directly, replacing two indirect function calls in the completion path with one. It is also one fewer function pointer in the middle of a writable data structure that can be corrupted, it shrinks struct buffer_head from 104 to 96 bytes allowing roughly 7% more buffer_heads to be cached in the same amount of memory, and it removes some atomic operations as the buffer refcount is no longer incremented before calling the end_io handler. All in-tree users (fs/buffer.c itself, ext4, jbd2, ocfs2, gfs2, nilfs2, and md-bitmap) are converted, and submit_bh(), mark_buffer_async_write(), and end_buffer_write_sync() are removed" * tag 'vfs-7.2-rc1.bh' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (34 commits) buffer: Remove end_buffer_write_sync() buffer: Change calling convention for end_buffer_read_sync() buffer: Remove b_end_io buffer: Remove submit_bh() md-bitmap: Convert read_file_page and write_file_page to bh_submit() nilfs2: Convert nilfs_mdt_submit_block to bh_submit() nilfs2: Convert nilfs_gccache_submit_read_data to bh_submit() nilfs2: Convert nilfs_btnode_submit_block to bh_submit() buffer: Remove mark_buffer_async_write() gfs2: Convert gfs2_aspace_write_folio to bh_submit() gfs2: Remove use of b_end_io in gfs2_meta_read_endio() gfs2: Convert gfs2_dir_readahead to bh_submit() gfs2: Convert gfs2_metapath_ra to bh_submit() ocfs2: Convert ocfs2_write_super_or_backup to bh_submit() ocfs2: Convert ocfs2_read_blocks to bh_submit() ocfs2: Convert ocfs2_read_block to bh_submit() ocfs2: Convert ocfs2_write_block to bh_submit() jbd2: Convert jbd2_write_superblock() to bh_submit() jbd2: Convert journal commit to bh_submit() ext4: Convert ext4_commit_super() to bh_submit() ...
13 dayskill d_dispose_if_unused()Al Viro1-0/+11
Rename to_shrink_list() into __move_to_shrink_list(), document and export it. Switch d_dispose_if_unused() users to that and kill d_dispose_if_unused() itself. Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
13 daysVFS: use wait_var_event for waiting in d_alloc_parallel()NeilBrown1-0/+6
Parallel lookup starts with a call of d_alloc_parallel(). That primitive either returns a matching hashed dentry or allocates a new one in the in-lookup state and returns it to the caller. Once the caller is done with lookup, it indicates so either by call of d_{splice_alias,add}() or by call of d_done_lookup(); at that point dentry leaves the in-lookup state. If d_alloc_parallel() finds a matching in-lookup dentry, it must wait for that dentry to leave the in-lookup state, one way or another. Currently by supplying wait_queue_head to d_alloc_parallel(). If d_alloc_parallel() creates a new in-lookup dentry, the address of that wait_queue_head is stored in ->d_wait of new dentry and stays there while it's in the in-lookup; subsequent d_alloc_parallel() will wait on the queue found in the matching in-lookup dentry. Transition out of in-lookup state wakes waiters on that queue (if any). That works, but the calling conventions are inconvenient - the caller must supply wait_queue_head and make sure that it survives at least until the new in-lookup dentry leaves the in-lookup state. That amounts to boilerplate in the d_alloc_parallel() callers that are followed by a call of d_lookup_done() in the same function; in cases like nfs asynchronous unlink it gets worse than that. This patch changes d_alloc_parallel() to use wake_up_var_locked() to wake up waiters, and wait_var_event_spinlock() to wait. dentry->d_lock is used for synchronisation as it is already held and the relevant times. That eliminates the need of caller-supplied wait_queue_head, simplifying the calling conventions. Better yet, we only need one bit of information stored in dentry itself: whether there are any waiters to be woken up, and that can be easily stored in ->d_flags; ->d_wait goes away. The reason we need that bit (DCACHE_LOOKUP_WAITERS) is that with wait_var machinery the queues are shared with all kinds of stuff and there's no way tell if any of the waiters have anything to do with our dentry; most of the time none of them will be relevant, so we need to avoid the pointless wakeups. Another benefit of the new scheme comes from the fact that wakeups have to be done outside of write-side critical areas of ->i_dir_seq; with the old scheme we need to carry the value picked from ->d_wait from __d_lookup_unhash() to the place where we actually wake the waiters up. Now we can just leave DCACHE_LOOKUP_WAITERS in ->d_flags until we get to doing wakeups - that's done within the same ->d_lock scope, so we are fine; new bit is accessed only under ->d_lock and it's seen only on dentries with DCACHE_PAR_LOOKUP in ->d_flags. __d_lookup_unhash() no longer needs to re-init ->d_lru. That was previously shared (in a union) with ->d_wait but ->d_wait is now gone so it no longer corrupts ->d_lru. Co-developed-by: Al Viro <viro@zeniv.linux.org.uk> # saner handling of flags Signed-off-by: NeilBrown <neil@brown.name> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2026-06-04buffer: Remove b_end_ioMatthew Wilcox (Oracle)1-14/+0
This shrinks buffer_head by 8 bytes, letting us pack more buffer heads per slab. With a Debian config, it shrinks from 104 bytes to 96 bytes which is 42 objects per 4KiB page rather than 39, a 7% reduction in the amount of memory used. Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org> Link: https://patch.msgid.link/20260528173150.1093780-33-willy@infradead.org Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-21fs: remove start_removing_user_path_atChristoph Hellwig1-1/+0
This function is entirely unused, remove it. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260511072239.2456725-3-hch@lst.de Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-21docs: add guidelines for submitting new filesystemsAmir Goldstein2-0/+196
This document is motivated by the ongoing maintenance burden that abandoned and untestable filesystems impose on VFS developers, blocking infrastructure changes such as folio conversions and iomap migration. This week alone, two new filesystems were proposed on linux-fsdevel (VMUFAT and FTRFS), highlighting the need for documented guidelines that new filesystem authors can refer to before submission. Multiple recent discussions on linux-fsdevel have touched on the criteria for merging new filesystems and for deprecating old ones, covering topics such as modern VFS interface adoption, testability, userspace utilities, maintainer commitment, and user base viability. Add Documentation/filesystems/adding-new-filesystems.rst describing the technical requirements and community expectations for merging a new filesystem into the kernel. The guidelines cover: - Alternatives to consider before proposing a new in-kernel filesystem - Technical requirements: modern VFS interfaces (iomap, folios, fs_context mount API), testability, and userspace utilities - Community expectations: identified maintainers, demonstrated commitment, sustained backing, and a clear user base - Ongoing obligations after merging, including the risk of deprecation for unmaintained filesystems Link: https://lore.kernel.org/linux-fsdevel/20260411151155.321214-1-adrianmcmenamin@gmail.com/ Link: https://lore.kernel.org/linux-fsdevel/20260413142357.515792-1-aurelien@hackers.camp/ Link: https://lore.kernel.org/linux-fsdevel/yndtg2jbj55fzd2kkhsmel4pp5ll5xfvfiaqh24tdct3jiqosd@jzbfzf3rrxrd/ Link: https://lore.kernel.org/linux-fsdevel/20260124091742.GA43313@macsyma.local/ Link: https://lore.kernel.org/lkml/20260111140345.3866-1-linkinjeon@kernel.org/ Cc: Christian Brauner <brauner@kernel.org> Cc: Alexander Viro <viro@zeniv.linux.org.uk> Cc: Jan Kara <jack@suse.cz> Cc: Theodore Tso <tytso@mit.edu> Cc: Christoph Hellwig <hch@lst.de> Cc: Darrick J. Wong <djwong@kernel.org> Cc: Matthew Wilcox <willy@infradead.org> Assisted-by: Cursor:claude-4-opus Signed-off-by: Amir Goldstein <amir73il@gmail.com> Link: https://patch.msgid.link/20260422125212.1743006-1-amir73il@gmail.com Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-15docs: locking: Fix stale dquot.c pathCosta Shulyupin1-1/+1
The quota code was moved from fs/dquot.c to fs/quota/dquot.c in commit 884d179dff3a ("quota: Move quota files into separate directory"). Update the reference. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260503160221.1594319-2-costa.shul@redhat.com>
2026-05-11docs: proc: add documentation about mount restrictionsAlexey Gladkov1-1/+18
procfs has a number of mounting restrictions that are not documented anywhere. Signed-off-by: Alexey Gladkov <legion@kernel.org> Link: https://patch.msgid.link/e7cb804df3c1759ee17cf9df1dc4c211d63d7a5f.1777278334.git.legion@kernel.org Reviewed-by: Aleksa Sarai <aleksa@amutable.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-05-08fs/resctrl: Document tasks file behaviour for task id 0 and idle tasksBen Horgan1-0/+5
When 0 is written to the tasks file it is interpreted as the current task in rdtgroup_move_task(). Each CPU's idle task has task_struct::pid set to 0 and, on x86, task_struct::closid to RESCTRL_RESERVED_CLOSID and task_struct::rmid to RESCTRL_RESERVED_RMID. Equivalently, on MPAM platforms, thread_info::mpam_partid_pmg is encoded with PARTID and PMG set to RESCTRL_RESERVED_CLOSID and RESCTRL_RESERVED_RMID, respectively. As there is no interface to change these from the default, the resctrl configuration for the idle tasks is fixed and they always behave equivalently to a task in the default tasks file and so take their configuration from the cpus/cpus_list files. On read of the tasks file, show_rdt_tasks() filters out any 0 PID. Hence, a task id of 0 is never shown in the tasks file and the idle tasks are not represented either. Document the user visible behaviour. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com
2026-05-08fs/resctrl: Document that automatic counter assignment is best effortBen Horgan1-0/+6
When using automatic counter assignment it's useful for a user to know which counters they can expect to be assigned on group creation. Document that automatic counter assignment is best effort and how to discover any assignment failures. Suggested-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Babu Moger <babu.moger@amd.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com
2026-05-06x86,fs/resctrl: Create 'event_filter' files read only if they're not ↵Ben Horgan1-6/+5
configurable When the counter assignment mode is mbm_event resctrl assumes the MBM events are configurable and exposes the 'event_filter' files. These files live at info/L3_MON/event_configs/<event>/event_filter and are used to display and set the event configuration. The MPAM architecture has support for configuring the memory bandwidth utilization (MBWU) counters to only count reads or only count writes. However, in MPAM, this event filtering support is optional in the hardware (and not yet implemented in the MPAM driver) but MBM counter assignment is always possible for MPAM MBWU counters. In order to support mbm_event mode with MPAM, create the 'event_filter' files read only if the event configuration can't be changed. A user can still chmod the file and so also return early with an error from event_filter_write(). Introduce a new monitor property, mbm_cntr_configurable, to indicate whether or not assignable MBM counters are configurable. On x86, set this to true whenever mbm_cntr_assignable is true to keep existing behaviour. Signed-off-by: Ben Horgan <ben.horgan@arm.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Reviewed-by: Babu Moger <babu.moger@amd.com> Tested-by: Babu Moger <babu.moger@amd.com> Link: https://lore.kernel.org/20260506082855.3694761-1-ben.horgan@arm.com
2026-05-03Documentation: filesystems: cramfs: correct stale hard-link and endianness ↵Nicolas Pitre1-8/+14
claims Two paragraphs in cramfs.rst have been misleading for a long time: - "Hard links are supported, but hard linked files will still have a link count of 1": mkcramfs does not preserve hard links; it deduplicates by content (eliminate_doubles()). Two names for the same on-disk inode in the source tree become two separate (content-shared) entries in the image, and cramfs always reports a link count of 1. - "Currently, cramfs must be written and read with architectures of the same endianness ... PAGE_SIZE == 4096 ... is a bug, but it hasn't been decided what the best fix is": the endianness situation has been settled for years -- the kernel checks for CRAMFS_MAGIC_WEND in cramfs_fill_super() and refuses the mount, and mkcramfs has gained -B / -L for producing images of the opposite endianness from the build host (useful for cross-builds, but the reader still needs to match). Restate this accurately. Signed-off-by: Nicolas Pitre <nico@fluxnic.net> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260422211039.270552-1-nico@fluxnic.net>
2026-05-03docs: proc: fix minor grammar and formatting issuesMyro Demma1-2/+2
Fix missing "from" in "prevent <pid> being reused" and add spacing in vm_area_struct range notation for readability. No functional changes. Signed-off-by: Myro Demma <myro@myromyro.com> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260428133001.11384-1-myro@myromyro.com>
2026-04-27Merge tag 'fs_for_v7.1-rc2' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs Pull isofs and udf fixes from Jan Kara: "Several isofs and udf fixes" * tag 'fs_for_v7.1-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs: docs: isofs: replace dead ECMA-119 FTP link udf: reject descriptors with oversized CRC length isofs: use QSTR_LEN() in isofs_cmp isofs: validate block number from NFS file handle in isofs_export_iget isofs: validate Rock Ridge CE continuation extent against volume size
2026-04-27docs: isofs: replace dead ECMA-119 FTP linkZiran Zhang1-1/+1
The original link is no longer valid. Replace it with the official PDF of the 2nd edition. The new link points to the exact 2nd edition that the existing comment in isofs.rst refers to. Signed-off-by: Ziran Zhang <zhangcoder@yeah.net> Link: https://patch.msgid.link/20260425142943.6809-1-zhangcoder@yeah.net Signed-off-by: Jan Kara <jack@suse.cz>
2026-04-27Documentation: proc: fix section numbering in table of contentsBaolin Liu1-7/+7
Commit e24ccaaf7ec4 ("block: remove last remaining traces of IDE documentation") removed the IDE section but left its table of contents entry behind. Fix the stale entry and renumber the following sections. Fixes: e24ccaaf7ec4 ("block: remove last remaining traces of IDE documentation") Signed-off-by: Baolin Liu <liubaolin@kylinos.cn> Acked-by: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Jonathan Corbet <corbet@lwn.net> Message-ID: <20260424090654.19229-1-liubaolin12138@163.com>
2026-04-21Merge tag 'pull-dcache-busy-wait' of ↵Linus Torvalds1-0/+10
git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull dcache busy loop updates from Al Viro: "Fix livelocks in shrink_dcache_tree() If shrink_dcache_tree() finds a dentry in the middle of being killed by another thread, it has to wait until the victim finishes dying, gets detached from the tree and ceases to pin its parent. The way we used to deal with that amounted to busy-wait; unfortunately, it's not just inefficient but can lead to reliably reproducible hard livelocks. Solved by having shrink_dentry_tree() attach a completion to such dentry, with dentry_unlist() calling complete() on all objects attached to it. With a bit of care it can be done without growing struct dentry or adding overhead in normal case" * tag 'pull-dcache-busy-wait' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: get rid of busy-waiting in shrink_dcache_tree() dcache.c: more idiomatic "positives are not allowed" sanity checks struct dentry: make ->d_u anonymous for_each_alias(): helper macro for iterating through dentries of given inode
2026-04-20Merge tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linuxLinus Torvalds2-0/+87
Pull nfsd updates from Chuck Lever: - filehandle signing to defend against filehandle-guessing attacks (Benjamin Coddington) The server now appends a SipHash-2-4 MAC to each filehandle when the new "sign_fh" export option is enabled. NFSD then verifies filehandles received from clients against the expected MAC; mismatches return NFS error STALE - convert the entire NLMv4 server-side XDR layer from hand-written C to xdrgen-generated code, spanning roughly thirty patches (Chuck Lever) XDR functions are generally boilerplate code and are easy to get wrong. The goals of this conversion are improved memory safety, lower maintenance burden, and groundwork for eventual Rust code generation for these functions. - improve pNFS block/SCSI layout robustness with two related changes (Dai Ngo) SCSI persistent reservation fencing is now tracked per client and per device via an xarray, to avoid both redundant preempt operations on devices already fenced and a potential NFSD deadlock when all nfsd threads are waiting for a layout return. - scalability and infrastructure improvements Sincere thanks to all contributors, reviewers, testers, and bug reporters who participated in the v7.1 NFSD development cycle. * tag 'nfsd-7.1' of git://git.kernel.org/pub/scm/linux/kernel/git/cel/linux: (83 commits) NFSD: Docs: clean up pnfs server timeout docs nfsd: fix comment typo in nfsxdr nfsd: fix comment typo in nfs3xdr NFSD: convert callback RPC program to per-net namespace NFSD: use per-operation statidx for callback procedures svcrdma: Use contiguous pages for RDMA Read sink buffers SUNRPC: Add svc_rqst_page_release() helper SUNRPC: xdr.h: fix all kernel-doc warnings svcrdma: Factor out WR chain linking into helper svcrdma: Add Write chunk WRs to the RPC's Send WR chain svcrdma: Clean up use of rdma->sc_pd->device svcrdma: Clean up use of rdma->sc_pd->device in Receive paths svcrdma: Add fair queuing for Send Queue access SUNRPC: Optimize rq_respages allocation in svc_alloc_arg SUNRPC: Track consumed rq_pages entries svcrdma: preserve rq_next_page in svc_rdma_save_io_pages SUNRPC: Handle NULL entries in svc_rqst_release_pages SUNRPC: Allocate a separate Reply page array SUNRPC: Tighten bounds checking in svc_rqst_replace_page NFSD: Sign filehandles ...
2026-04-19Merge tag 'mm-stable-2026-04-18-02-14' of ↵Linus Torvalds1-0/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, simplifications and speedups to the memfd_lou code - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd - "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels - "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect() - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree * tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits) MAINTAINERS: add page cache reviewer mm/vmscan: avoid false-positive -Wuninitialized warning MAINTAINERS: update Dave's kdump reviewer email address MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE MAINTAINERS: drop include/linux/kho/abi/ from KHO MAINTAINERS: update KHO and LIVE UPDATE maintainers MAINTAINERS: update kexec/kdump maintainers entries mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd() selftests: mm: skip charge_reserved_hugetlb without killall userfaultfd: allow registration of ranges below mmap_min_addr mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update mm/hugetlb: fix early boot crash on parameters without '=' separator zram: reject unrecognized type= values in recompress_store() docs: proc: document ProtectionKey in smaps mm/mprotect: special-case small folios when applying permissions mm/mprotect: move softleaf code out of the main function mm: remove '!root_reclaim' checking in should_abort_scan() mm/sparse: fix comment for section map alignment mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete() selftests/mm: transhuge_stress: skip the test when thp not available ...
2026-04-18docs: proc: document ProtectionKey in smapsKevin Brodsky1-0/+4
The ProtectionKey entry was added in v4.9; back then it was x86-specific, but it now lives in generic code and applies to all architectures supporting pkeys (currently x86, power, arm64). Time to document it: add a paragraph to proc.rst about the ProtectionKey entry. [akpm@linux-foundation.org: s/system/hardware/, per review discussion] [akpm@linux-foundation.org: s/hardware/CPU/] Link: https://lore.kernel.org/20260407125133.564182-1-kevin.brodsky@arm.com Signed-off-by: Kevin Brodsky <kevin.brodsky@arm.com> Reported-by: Yury Khrustalev <yury.khrustalev@arm.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Reviewed-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Lorenzo Stoakes <ljs@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kevin Brodsky <kevin.brodsky@arm.com> Cc: Marc Rutland <mark.rutland@arm.com> Cc: Shuah Khan <skhan@linuxfoundation.org> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-17Merge tag 'ntfs-for-7.1-rc1-v2' of ↵Linus Torvalds2-0/+160
git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs Pull ntfs resurrection from Namjae Jeon: "Ever since Kari Argillander’s 2022 report [1] regarding the state of the ntfs3 driver, I have spent the last 4 years working to provide full write support and current trends (iomap, no buffer head, folio), enhanced performance, stable maintenance, utility support including fsck for NTFS in Linux. This new implementation is built upon the clean foundation of the original read-only NTFS driver, adding: - Write support: Implemented full write support based on the classic read-only NTFS driver. Added delayed allocation to improve write performance through multi-cluster allocation and reduced fragmentation of the cluster bitmap. - iomap conversion: Switched buffered IO (reads/writes), direct IO, file extent mapping, readpages, and writepages to use iomap. - Remove buffer_head: Completely removed buffer_head usage by converting to folios. As a result, the dependency on CONFIG_BUFFER_HEAD has been removed from Kconfig. - Stability improvements: The new ntfs driver passes 326 xfstests, compared to 273 for ntfs3. All tests passed by ntfs3 are a complete subset of the tests passed by this implementation. Added support for fallocate, idmapped mounts, permissions, and more. xfstests Results report: Total tests run: 787 Passed : 326 Failed : 38 Skipped : 423 Failed tests breakdown: - 34 tests require metadata journaling - 4 other tests: 094: No unwritten extent concept in NTFS on-disk format 563: cgroup v2 aware writeback accounting not supported 631: RENAME_WHITEOUT support required 787: NFS delegation test" Link: https://lore.kernel.org/all/da20d32b-5185-f40b-48b8-2986922d8b25@stargateuniverse.net/ [1] [ Let's see if this undead filesystem ends up being of the "Easter miracle" kind, or the "Nosferatu of filesystems" kind... ] * tag 'ntfs-for-7.1-rc1-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/linkinjeon/ntfs: (46 commits) ntfs: remove redundant out-of-bound checks ntfs: add bound checking to ntfs_external_attr_find ntfs: add bound checking to ntfs_attr_find ntfs: fix ignoring unreachable code warnings ntfs: fix inconsistent indenting warnings ntfs: fix variable dereferenced before check warnings ntfs: prefer IS_ERR_OR_NULL() over manual NULL check ntfs: harden ntfs_listxattr against EA entries ntfs: harden ntfs_ea_lookup against malformed EA entries ntfs: check $EA query-length in ntfs_ea_get ntfs: validate WSL EA payload sizes ntfs: fix WSL ea restore condition ntfs: add missing newlines to pr_err() messages ntfs: fix pointer/integer casting warnings ntfs: use ->mft_no instead of ->i_ino in prints ntfs: change mft_no type to u64 ntfs: select FS_IOMAP in Kconfig ntfs: add MODULE_ALIAS_FS ntfs: reduce stack usage in ntfs_write_mft_block() ntfs: fix sysctl table registration and path ...
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds2-0/+169
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_