aboutsummaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)AuthorFilesLines
2026-05-28Merge v7.1-rc5 into drm-nextSimona Vetter36-159/+714
Boris Brezillion needs the gem lru fixes 379e8f1ca5e9 ("drm/gem: Make the GEM LRU lock part of drm_device") backmerged for drm-misc-next. That also means we need to sort out the rename conflict in panthor with the fixup patch from Boris from drm-tip. Signed-off-by: Simona Vetter <simona.vetter@ffwll.ch>
2026-05-28PCI: Add pci_ats_required() for CXL.cache capable devicesNicolin Chen1-0/+3
Controlled by IOMMU drivers, ATS can be enabled "on demand", when a given PASID on a device is attached to an I/O page table. This is working, even when a device has no translation on its RID (i.e., RID is IOMMU bypassed). However, certain PCIe devices require non-PASID ATS on their RID even when the RID is IOMMU bypassed. Call this "ATS always on" in IOMMU term. For example, CXL spec r4.0 notes in sec 3.2.5.13 Memory Type on CXL.cache: "To source requests on CXL.cache, devices need to get the Host Physical Address (HPA) from the Host by means of an ATS request on CXL.io." In other words, the CXL.cache capability requires ATS; otherwise, it can't access host physical memory. Introduce a new pci_ats_required() helper for the IOMMU driver to scan a PCI device and shift ATS policies between "on demand" and "always on". Add the support for CXL.cache devices first. Pre-CXL devices will be added in quirks.c file. Note that pci_ats_required() validates against pci_ats_supported(), so we ensure that untrusted devices (e.g. external ports) will not be always on. This maintains the existing ATS security policy regarding potential side- channel attacks via ATS. Cc: linux-cxl@vger.kernel.org Suggested-by: Vikram Sethi <vsethi@nvidia.com> Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Tested-by: Nirmoy Das <nirmoyd@nvidia.com> Acked-by: Nirmoy Das <nirmoyd@nvidia.com> Reviewed-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Nicolin Chen <nicolinc@nvidia.com> Reviewed-by: Yi Liu <yi.l.liu@intel.com> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-05-28iommu, debugobjects: avoid gcc-16.1 section mismatch warningsArnd Bergmann1-0/+11
gcc-16 has gained some more advanced inter-procedual optimization techniques that enable it to inline the dummy_tlb_add_page() and dummy_tlb_flush() function pointers into a specialized version of __arm_v7s_unmap: WARNING: modpost: vmlinux: section mismatch in reference: __arm_v7s_unmap+0x2cc (section: .text) -> dummy_tlb_add_page (section: .init.text) ERROR: modpost: Section mismatches detected. >From what I can tell, the transformation is correct, as this is only called when __arm_v7s_unmap() is called from arm_v7s_do_selftests(), which is also __init. Since __arm_v7s_unmap() however is not __init, gcc cannot inline the inner function calls directly. In debug_objects_selftest(), the same thing happens. Both the caller and the leaf function are __init, but the IPA pulls it into a non-init one: WARNING: modpost: vmlinux: section mismatch in reference: lookup_object_or_alloc+0x7c (section: .text.lookup_object_or_alloc) -> is_static_object (section: .init.text) Marking the affected functions as not "__init" would reliably avoid this issue but is not a good solution because it removes an otherwise correct annotation. I tried marking the functions as 'noinline', but that ended up not covering all the affected configurations. With some more experimenting, I found that marking these functions as __attribute__((noipa)) is both logical and reliable. In order to keep the syntax readable, add a custom macro for this in include/linux/compiler_attributes.h next to other related macros and use it to annotate both files. Link: https://lore.kernel.org/all/abRB6g-48ZX6Yl2r@willie-the-truck/ Cc: Will Deacon <will@kernel.org> Cc: Thomas Gleixner <tglx@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Miguel Ojeda <ojeda@kernel.org> Cc: linux-kbuild@vger.kernel.org Cc: stable@vger.kernel.org Signed-off-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Will Deacon <will@kernel.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Miguel Ojeda <ojeda@kernel.org> Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
2026-05-27compiler-clang.h: Drop explicit version number from "all" diagnostic macroNathan Chancellor1-2/+2
This is more consistent with what commit 7efa84b5cdd6 ("compiler-gcc.h: Introduce __diag_GCC_all") did for GCC. Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-16-b3b8cda46bdd@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-05-27compiler-clang.h: Remove __cleanup -Wunused-variable workaroundNathan Chancellor1-9/+0
Now that the minimum supported version of LLVM for building the kernel has been raised to 17.0.1, the redefinition of __cleanup with __maybe_unused added to it is unnecessary because the referenced LLVM change is present in all supported LLVM versions. Drop it. Link: https://patch.msgid.link/20260517-bump-minimum-supported-llvm-version-to-17-v2-15-b3b8cda46bdd@kernel.org Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-05-27mshv: Add conditional VMBus dependencyMichael Kelley1-0/+4
When the VMBus driver is not part of the kernel (CONFIG_HYPERV_VMBUS=n), the MSHV root driver fails to link: ERROR: modpost: "hv_vmbus_exists" [drivers/hv/mshv_root.ko] undefined! Fix this while meeting these requirements: * It must be possible to include the MSHV root driver without the VMBus driver. In such case, the MSHV root driver can be built-in to the kernel image, or it can be built as a separate module. * If both the MSHV root driver and the VMBus driver are present, the MSHV root driver and VMBus driver can both be built-in, or they can both be separate modules. Or the MSHV root driver can be a module while the VMBus driver can be built-in, but the reverse is disallowed. Regardless of the build choices, the VMBus driver must be loaded before the MSHV driver in order for the SynIC to be managed properly (see comments in the MSHV SynIC code). The fix has two parts: * Add a Kconfig entry for MSHV_ROOT to depend on HYPERV_VMBUS if HYPERV_VMBUS is present. The entry disallows MSHV_ROOT being built-in when HYPERV_VMBUS is a module, but without requiring that HYPERV_VMBUS be built. * Add a stub implementation of hv_vmbus_exists() for when the VMBus driver is not present so that the MSHV root driver has no module dependency on VMBus. When the VMBus driver *is* present, the module dependency ensures that the VMBus driver loads first when both are built as modules. Existing code ensures that the VMBus driver loads first if it is built-in. The VMBus driver uses subsys_initcall(), which is initcall level 4. The MSHV root driver uses module_init(), which becomes device_init() when built-in, and device_init() is initcall level 6. Reported-by: Arnd Bergmann <arnd@arndb.de> Closes: https://lore.kernel.org/all/20260520074044.923728-1-arnd@kernel.org/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Arnd Bergmann <arnd@arndb.de> Reviewed-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Hardik Garg <hargar@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-05-27rust: driver core: drop drvdata before devres releaseDanilo Krummrich1-2/+2
Move the post_unbind_rust callback before devres_release_all() in device_unbind_cleanup(). With drvdata() removed, the driver's bus device private data is only accessible by the owning driver itself. It is hence safe to drop the driver's bus device private data before devres actions are released. This reordering is the key enabler for Higher-Ranked Lifetime Types (HRT) in Rust device drivers -- it allows driver structs to hold direct references to devres-managed resources, because the bus device private data (and with it all such references) is guaranteed to be dropped while the underlying devres resources are still alive. Without this change, devres resources would be freed first, leaving the driver's bus device private data with dangling references during its destructor. Reviewed-by: Alexandre Courbot <acourbot@nvidia.com> Reviewed-by: Gary Guo <gary@garyguo.net> Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Link: https://patch.msgid.link/20260525202921.124698-6-dakr@kernel.org Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-05-26Merge tag 'nf-next-26-05-25' of ↵Jakub Kicinski1-0/+17
https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Florian Westphal says: ==================== netfilter: updates for net-next The following patchset contains Netfilter fixes and small enhancements: 1) Disable 32-bit x_tables compatibility (32bit binaries on 64bit kernel) interface in user namespaces. This is 'last warning' before this is removed for good. 2) Add a configuration toggle for netfilter GCOV profiling. Provide dedicated toggles for ipset and ipvs. 3) Remove modular support for nfnetlink and restrict it to built-in only. From Pablo Neira Ayuso. 4) Use per-rule hash initval in nf_conncount. This avoids unecessary lock contention with short keys (e.g. conntrack zones) in different namespaces. 5) Use nf_ct_exp_net() in ctnetlink expectation dumps. From Pratham Gupta. 6) Remove a dead conditional in nft_set_rbtree. 7) Fix conntrack helper policy updates to apply per-class values correctly. From David Carlier. 8) Fix an off-by-one OOB read in nf_conntrack_irc:parse_dcc(). Use strict less-than comparison in the newline search loop to respect the exclusive-end pointer convention. From Muhammad Bilal. 9) Fix typos in nf_conntrack_proto_tcp comments. From Avinash Duduskar. 10) Restore performance optimization in nft_set_pipapo_avx2 by passing the next map index. Refactor lookup logic for clarity and add a DEBUG_NET check to document this. 11) Avoid (harmless) u16 overflow in nf_conntrack_ftp when parsing FTP PORT and EPRT commands. Ignore commands where single octet exceeds 255. From Giuseppe Caruso. Patch 12, which removes incorrect (and obviously unused) code from nft_byteorder was kept back to avoid a net -> net-next merge conflict. * tag 'nf-next-26-05-25' of https://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_conntrack_ftp: avoid u16 overflows netfilter: nft_set_pipapo_avx2: restore performance optimization netfilter: nf_conntrack_proto_tcp: fix typos in comments netfilter: nf_conntrack_irc: fix parse_dcc() off-by-one OOB read netfilter: nfnl_cthelper: apply per-class values when updating policies netfilter: nft_set_rbtree: remove dead conditional netfilter: ctnetlink: use nf_ct_exp_net() in expectation dump netfilter: nf_conncount: use per-rule hash initval netfilter: allow nfnetlink built-in only netfilter: add option for GCOV profiling netfilter: x_tables: disable 32bit compat interface in user namespaces ==================== Link: https://patch.msgid.link/20260525182924.28456-1-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-26llc: Add SPDX id lines to some llc source filesTim Bird1-7/+1
Most of the lls source files are missing SPDX-License-Identifier lines. Add appropriate IDs to these files, and remove other license info from the header. In once case, leave the existing id line and just remove the license reference text. Signed-off-by: Tim Bird <tim.bird@sony.com> Link: https://patch.msgid.link/20260522225508.24006-1-tim.bird@sony.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-26audit: use 'unsigned int' instead of 'unsigned'Ricardo Robaina2-8/+8
Address checkpatch.pl warning below, across the audit subsystem: WARNING: Prefer 'unsigned int' to bare use of 'unsigned' Minor cleanup, no functional changes. Signed-off-by: Ricardo Robaina <rrobaina@redhat.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2026-05-26call_once:: Fix typo in comment for call_once()Jiun Jeong1-1/+1
Change "succesfully" to "successfully" in the kerneldoc comment of call_once(). Signed-off-by: Jiun Jeong <jiun.jeong.cs@gmail.com> Link: https://patch.msgid.link/20260501144413.49419-1-jiun.jeong.cs@gmail.com [sean: don't scope to KVM, massage changelog] Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-26block: don't set BIO_QUIET for BLK_STS_AGAINChristoph Hellwig1-1/+0
Commit abb30460bda2 ("block: mark bio_wouldblock_error() bio with BIO_QUIET") added this to suppress buffer_head warnings, but neither when this commit was added nor now any buffer_head using code actually ever sets REQ_NOWAIT which can lead to BLK_STS_AGAIN. Remove the special handling for now. If we ever plan to use REQ_NOWAIT for buffer_head based I/O we're better off handling BLK_STS_AGAIN in the completion handler as it actually needs to retry the I/O as well. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20260518063336.507369-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-26block: switch numa_node to int in blk_mq_hw_ctx and init_requestMateusz Nowicki1-2/+2
numa_node in blk_mq_hw_ctx and the matching argument of blk_mq_ops::init_request can be NUMA_NO_NODE (-1). Declared as unsigned int, NUMA_NO_NODE becomes UINT_MAX and walks off nvme_dev::descriptor_pools[] on CONFIG_NUMA=n [1]. Switch the field and the callback prototype to int and update all in-tree init_request implementations. No functional change: cpu_to_node(), kmalloc_node() and blk_alloc_flush_queue() already take int. Link: https://lore.kernel.org/linux-nvme/20260522150628.399288-1-mateusz.nowicki@posteo.net/ [1] Link: https://lore.kernel.org/linux-nvme/20260309062840.2937858-2-iam@sung-woo.kim/ Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Suggested-by: Sung-woo Kim <iam@sung-woo.kim> Signed-off-by: Mateusz Nowicki <mateusz.nowicki@posteo.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://patch.msgid.link/20260523125210.272274-1-mateusz.nowicki@posteo.net Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-05-26KVM: SEV: Pin source page for write when adding CPUID data for SNP guestSean Christopherson1-1/+2
When populating a guest_memfd instance with the initial CPUID data for an SNP guest, acquire a writable pin on the source page as KVM will write back the "correct" CPUID information if the userspace provided data is rejected by trusted firmware. Because KVM writes to the source page using a kernel mapping, pinning for read could result in KVM clobbering read-only memory. Note, well-behaved VMMs are unlikely to be affected, as CPUID information is almost always dynamically generated by userspace, i.e. it's unlikely for the CPUID information to be backed by a read-only mapping. Fixes: 2a62345b30529 ("KVM: guest_memfd: GUP source pages prior to populating guest memory") Cc: stable@vger.kernel.org Signed-off-by: Ackerley Tng <ackerleytng@google.com> Link: https://patch.msgid.link/20260522-fix-sev-gmem-post-populate-v2-1-3f196bfad5a1@google.com [sean: rewrite shortlog and changelog, tag for stable@] Signed-off-by: Sean Christopherson <seanjc@google.com>
2026-05-26Merge tag 'mm-hotfixes-stable-2026-05-25-16-22' of ↵Linus Torvalds4-28/+27
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull misc fixes from Andrew Morton: "13 hotfixes. 9 are for MM. 9 are cc:stable and the remaining 4 address post-7.1 issues or aren't considered suitable for backporting. All patches are singletons - please see the individual changelogs for details" * tag 'mm-hotfixes-stable-2026-05-25-16-22' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: Revert "mm: introduce a new page type for page pool in page type" mm/vmalloc: do not trigger BUG() on BH disabled context MAINTAINERS, mailmap: change email for Eugen Hristev mm/migrate_device: fix pgtable leak in migrate_vma_insert_huge_pmd_page kernel/fork: validate exit_signal in kernel_clone() mm: memcontrol: propagate NMI slab stats to memcg vmstats mm/damon/sysfs-schemes: delete tried region in regions_rmdirs() mm/rmap: initialize nr_pages to 1 at loop start in try_to_unmap_one zram: fix use-after-free in zram_writeback_endio memfd: deny writeable mappings when implying SEAL_WRITE ipc: limit next_id allocation to the valid ID range Revert "mm/hugetlbfs: update hugetlbfs to use mmap_prepare" MAINTAINERS: .mailmap: update after GEHC spin-off
2026-05-26include: Remove unused jz4740-battery.hCosta Shulyupin1-15/+0
The last user was removed in commit aea12071d6fc ("power/supply: Drop obsolete JZ4740 driver") and replaced by a self-contained IIO-based driver. No file includes this header. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2026-05-26include: Remove unused jz4740-adc.hCosta Shulyupin1-33/+0
The last user was the JZ4740 MFD ADC driver, removed in commit ff71266aa490 ("mfd: Drop obsolete JZ4740 driver") and replaced by a self-contained IIO driver. No file includes or references this header. Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Costa Shulyupin <costa.shul@redhat.com> Signed-off-by: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
2026-05-26genirq: Add rcuref count to struct irq_descThomas Gleixner1-0/+2
Prepare for a smarter iterator for /proc/interrupts so that the next interrupt descriptor can be cached after lookup. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Link: https://patch.msgid.link/20260517194931.917415190@kernel.org
2026-05-26genirq: Cache the condition for /proc/interrupts exposureThomas Gleixner1-0/+1
show_interrupts() evaluates a boatload of conditions to establish whether it should expose an interrupt in /proc/interrupts or not. That can be simplified by caching the condition in an internal status flag, which is updated when one of the relevant inputs changes. The irq_desc::kstat_irq check is dropped because visible interrupt descriptors always have a valid pointer. As a result the number of instructions and branches for reading /proc/interrupts is reduced significantly. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Reviewed-by: Radu Rendec <radu@rendec.net> Link: https://patch.msgid.link/20260517194931.680943749@kernel.org
2026-05-26genirq/proc: Utilize irq_desc::tot_count to avoid evaluationThomas Gleixner1-3/+3
Interrupts which are not marked per CPU increment not only the per CPU statistics, but also the accumulation counter irq_desc::tot_count. Change the counter to type unsigned long so it does not produce sporadic zeros due to wrap arounds on 64-bit machines and do a quick check for non per CPU interrupts. If the counter is zero, then simply emit a full set of zero strings. That spares the evaluation of the per CPU counters completely for interrupts with zero events. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Reviewed-by: Radu Rendec <radu@rendec.net> Link: https://patch.msgid.link/20260517194931.115522199@kernel.org
2026-05-26genirq/proc: Avoid formatting zero counts in /proc/interruptsThomas Gleixner1-0/+1
A large portion of interrupt count entries are zero. There is no point in formatting the zero value as it is way cheeper to just emit a constant string. Collect the number of consecutive zero counts and emit them in one go before a non-zero count and at the end of the line. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dmitry Ilvokhin <d@ilvokhin.com> Reviewed-by: Radu Rendec <radu@rendec.net> Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com> Link: https://patch.msgid.link/20260517194931.034728540@kernel.org
2026-05-26firmware: meson: sm: Thermal calibration read via secure monitorRonald Claveau1-0/+3
Add SM_THERMAL_CALIB_READ to the secure monitor command enum and introduce meson_sm_get_thermal_calib() to allow drivers to retrieve thermal sensor calibration data through the firmware interface. Signed-off-by: Ronald Claveau <linux-kernel-dev@aliel.fr> Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org> Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> Link: https://patch.msgid.link/20260424-add-thermal-t7-vim4-v5-2-9040ca36afe2@aliel.fr
2026-05-26exec: free the old mm outside the exec locksChristian Brauner1-0/+1
exec_mmap() installs the new mm and then tears the old one down while still holding exec_update_lock for writing -- and with cred_guard_mutex held all the way to setup_new_exec(): setmax_mm_hiwater_rss(&tsk->signal->maxrss, old_mm); mm_update_next_owner(old_mm); mmput(old_mm); Neither lock is needed for this. exec_update_lock only exists to make the mm swap atomic with the later commit_creds(), so that permission-checking readers (proc, ptrace, the futex robust list, perf, kcmp, mm_access()) never observe the new mm together with the old credentials. Those readers all operate on task->mm, i.e. the new mm after the swap; none looks at the detached old mm, its ->owner or signal->maxrss. cred_guard_mutex guards credential calculation and is equally irrelevant here. The cost is real: __mmput() runs exit_mmap() over the entire old address space and can block in exit_aio() waiting for in-flight AIO, all while holding exec_update_lock for writing and cred_guard_mutex. For execve() of a large process this blocks ptrace_attach() and every exec_update_lock reader for the duration of the teardown. Stash the old mm in bprm->old_mm and release it from setup_new_exec() after both locks are dropped. setup_new_exec() still runs before setup_arg_pages() and the segment mappings, so the old address space is freed before the new one is populated and peak memory is unchanged. The ordering constraints are kept: old_mm's mmap_lock is still dropped in exec_mmap() before mm_update_next_owner() (required since commit 31a78f23bac0 ("mm owner: fix race between swapoff and exit")), and mm_update_next_owner() still precedes mmput(); both run in the execing task's context, as mm_update_next_owner() requires. If exec swaps the mm but fails before setup_new_exec() runs the old mm would leak, so add a backstop in free_bprm(). The lazy-tlb case (old_mm == NULL, e.g. kernel_execve()) has no address space to free and is left in exec_mmap(). Link: https://patch.msgid.link/20260522-work-exit_mm-v1-1-bd32d5a560bb@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-26exec_state: relocate dumpable informationChristian Brauner (Amutable)6-44/+15
The dumpable flag captured at execve() is consulted by __ptrace_may_access() and several /proc owner / visibility checks. It lives on mm_struct today, which exit_mm() clears from the task long before the task itself is reaped. exec_state is anchored to the execve() that established the current privilege domain. CLONE_VM siblings refcount-share the parent's exec_state via copy_exec_state(); non-CLONE_VM clones allocate a fresh exec_state inheriting the parent's dumpable mode and user_ns reference via task_exec_state_copy(). execve() allocates a fresh instance (via alloc_task_exec_state() in begin_new_exec()) and installs it under task_lock + exec_update_lock with task_exec_state_replace(). init_task uses a static instance. The dumpable mode now lives on task->exec_state->dumpable. task->mm->flags no longer carries dumpability; MMF_DUMPABLE_MASK is removed, but MMF_DUMPABLE_BITS is reserved so MMF_DUMP_FILTER_* bit positions remain stable for the /proc/<pid>/coredump_filter ABI. The task->user_dumpable cache bit and its assignment in exit_mm() are removed; readers go through get_dumpable(task) directly. coredump_params gains a snapshot field cprm.dumpable, populated from get_dumpable(current) at vfs_coredump() entry, replacing the previous __get_dumpable(cprm->mm_flags) consumers in fs/coredump.c and fs/pidfs.c. The user namespace recorded at execve() is consulted by __ptrace_may_access() and by /proc/PID/* owner derivation. Move the captured user_ns onto task_exec_state, which stays attached to the task past exit_mm() and across exit_files(). bprm grows a user_ns field staged in bprm_mm_init() with the caller's user_ns, narrowed by would_dump() to the closest privileged ancestor, and consumed by exec_mmap() via alloc_task_exec_state(bprm->user_ns). free_bprm() releases the staging reference. mm_struct loses ->user_ns entirely. Initializers in init-mm, efi_mm, and the implicit one in mm_init()/dup_mm()/mm_alloc() are removed; __mmdrop() drops the matching put_user_ns(). The kthread_use_mm() WARN_ON_ONCE(!mm->user_ns) is no longer meaningful and goes too. Reviewed-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260520-work-task_exec_state-v3-4-69f895bc1385@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-26ptrace: add ptracer_access_allowed()Christian Brauner (Amutable)1-0/+1
Add a helper that encapsulates all of the logic for checking ptrace access and remove open-coded versions in follow-up patches. Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: David Hildenbrand (arm) <david@kernel.org> Link: https://patch.msgid.link/20260520-work-task_exec_state-v3-3-69f895bc1385@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-26exec: introduce struct task_exec_stateChristian Brauner (Amutable)2-0/+33
Introduce struct task_exec_state, a per-task RCU-protected structure that holds the dumpable mode and the user namespace and stays attached to the task for its full lifetime. task_exec_state_rcu() is the canonical reader: asserts RCU or task_lock is held, WARNs on a NULL state, returns the rcu_dereference()'d pointer. Reviewed-by: Jann Horn <jannh@google.com> Link: https://patch.msgid.link/20260520-work-task_exec_state-v3-2-69f895bc1385@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-26sched/coredump: introduce enum task_dumpableChristian Brauner (Amutable)2-5/+12
Replace the SUID_DUMP_DISABLE/USER/ROOT preprocessor constants with enum task_dumpable. Numeric values are preserved (kernel.suid_dumpable sysctl and prctl(PR_SET_DUMPABLE) ABI), so this is a pure rename with no behavioral change. Subsequent commits relocate dumpability onto a per-task structure where the enum type will allow stronger type-checking on the new API. Reviewed-by: Jann Horn <jannh@google.com> Reviewed-by: David Hildenbrand (arm) <david@kernel.org> Link: https://patch.msgid.link/20260520-work-task_exec_state-v3-1-69f895bc1385@kernel.org Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
2026-05-26kho: fix KHO_TREE_MAX_DEPTH for non-4KB page sizesGeorge Guo1-1/+1
KHO_TREE_MAX_DEPTH is calculated as: DIV_ROUND_UP(KHO_ORDER_0_LOG2 - KHO_BITMAP_SIZE_LOG2, KHO_TABLE_SIZE_LOG2) + 1 For systems with 16KB pages (e.g. arm64 with CONFIG_ARM64_16K_PAGES=y or LoongArch), this gives a depth of 4. Since levels are 0 based, with depth = 4 the effective top level is 3 and the top-level shift at bit 39. PAGE_SHIFT = 14 KHO_BITMAP_SIZE_LOG2 = PAGE_SHIFT + 3 = 17 KHO_TABLE_SIZE_LOG2 = log(2; (1 << PAGE_SHIFT) / 8) = 11 shift = ((3 - 1) * KHO_TABLE_SIZE_LOG2) + KHO_BITMAP_SIZE_LOG2 = 39 The order-0 bit sits at bit 50 (KHO_ORDER_0_LOG2 = 64 - PAGE_SHIFT = 50). When inserting or reading a key, the index extracted at the top level is: (1 << 50) >> 39 = 2048 2048 is exactly the table size (PAGE_SIZE / sizeof(phys_addr_t) = 2048 for 16KB pages), so it wraps to 0, aliasing the order bit to index 0 and losing it silently. On the second kernel, kho_radix_decode_key() sees a key without the order bit, calls fls64() on the wrong bit, computes a wrong order and thus a garbage physical address. phys_to_page() of that address faults in kho_preserved_memory_reserve(), causing a kernel panic early in boot. Fix by adding +1 to the DIV_ROUND_UP numerator so the formula accounts for the order bit itself, giving depth 5 for 16KB pages. The top-level shift becomes 50, and (1 << 50) >> 50 = 1, which is nonzero and unambiguous. For 4KB and 64KB page sizes the depth is unchanged. Link: https://patch.msgid.link/20260509024415.33190-1-dongtai.guo@linux.dev Fixes: 3f2ad90060f6 ("kho: adopt radix tree for preserved memory tracking") Tested-by: Kexin Liu <liukexin@kylinos.cn> Signed-off-by: George Guo <guodongtai@kylinos.cn> Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com> [rppt: added actual math to the changelog] Signed-off-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
2026-05-26irqchip/riscv-imsic: Add nr_guest_files in per-HART local configGuo Ren (Alibaba DAMO Academy)1-1/+4
Add nr_guest_files in per-HART local config to represent the number of guest files available on a particular HART whereas the nr_guest_files in the global config represents the number of guest files available across all HARTs. This allows KVM RISC-V to use nr_guest_files from per-HART local config for asymmetric big.Little systems. Signed-off-by: Guo Ren (Alibaba DAMO Academy) <guoren@kernel.org> Acked-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Anup Patel <anup.patel@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260525094945.3721783-2-anup.patel@oss.qualcomm.com Signed-off-by: Anup Patel <anup@brainfault.org>
2026-05-25net/mlx5: Add SPF function type for page managementMoshe Shemesh1-0/+1
Add MLX5_SPF to enum mlx5_func_type so SPFs get their own page counter, and add the corresponding WARN check at page cleanup. Wait for SPF pages to be reclaimed during ECPF teardown, alongside the existing host PF and VF page waits. SPF page requests are always identified by vhca_id, so the legacy func_id_to_type() path is not reached for satellite PFs. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260521110843.367329-13-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-05-25Merge branch 'arena_direct_access' of ↵Tejun Heo187-3088/+2929
git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next into for-7.2
2026-05-25remoteproc: use rsc_table_for_each_entry() in rproc_handle_resources()Mukesh Ojha1-0/+53
Replace the open-coded resource table iteration loop in rproc_handle_resources() with the rsc_table_for_each_entry() helper. The remoteproc-specific dispatch logic (vendor resource handling via rproc_handle_rsc(), RSC_LAST bounds check, handler table lookup) is moved into a local callback rproc_handle_rsc_entry(), keeping the iteration mechanics in one canonical place. The callback receives the payload offset within the table so that handlers which write back into the resource table (e.g. rproc_handle_carveout() recording a dynamically allocated address via rsc_offset) continue to work correctly. No functional change. Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260506050107.1985033-3-mukesh.ojha@oss.qualcomm.com
2026-05-25remoteproc: Move resource table data structure to its own headerMukesh Ojha2-268/+307
The resource table data structure has traditionally been associated with the remoteproc framework, where the resource table is included as a section within the remote processor firmware binary. However, it is also possible to obtain the resource table through other means—such as from a reserved memory region populated by the boot firmware, statically maintained driver data, or via a secure SMC call—when it is not embedded in the firmware. There are multiple Qualcomm remote processors (e.g., Venus, Iris, GPU, etc.) in the upstream kernel that do not use the remoteproc framework to manage their lifecycle for various reasons. When Linux is running at EL2, similar to the Qualcomm PAS driver (qcom_q6v5_pas.c), client drivers for subsystems like video and GPU may also want to use the resource table SMC call to retrieve and map resources before they are used by the remote processor. In such cases, the resource table data structure is no longer tightly coupled with the remoteproc headers. Client drivers that do not use the remoteproc framework should still be able to parse the resource table obtained through alternative means. Therefore, there is a need to decouple the resource table definitions from the remoteproc headers. Signed-off-by: Mukesh Ojha <mukesh.ojha@oss.qualcomm.com> Link: https://lore.kernel.org/r/20260506050107.1985033-2-mukesh.ojha@oss.qualcomm.com
2026-05-25Merge branch 'arena_direct_access'Alexei Starovoitov3-0/+58
Tejun Heo says: ==================== This makes BPF arena memory directly dereferenceable from kernel code (struct_ops callbacks, kfuncs). Each arena gets a per-arena scratch page that an arch fault hook installs into empty PTEs on kernel-side faults, after KFENCE. The faulting instruction retries and the violation is reported through the program's BPF stream. v4: - Patch 1: note that the strict-zero cmpxchg is narrower than pte_none() in inline comments on both x86 and arm64. (Andrea) - Patch 2: stub bpf_arena_handle_page_fault() for !CONFIG_BPF_SYSCALL via a new include/linux/bpf_defs.h. (lkp) - Patch 7: scx_arena_alloc() retries via a loop instead of a single retry on pool growth. (Andrea) - Picked up Reviewed-by tags from Emil and Andrea. v3: https://lore.kernel.org/r/20260520235052.4180316-1-tj@kernel.org v2: https://lore.kernel.org/r/20260517211232.1670594-1-tj@kernel.org v1 (RFC): https://lore.kernel.org/r/20260427105109.2554518-1-tj@kernel.org Motivation ---------- sched_ext's ops_cid.set_cmask() hands the BPF scheduler a struct scx_cmask *. The kernel translates a kernel cpumask to a cmask, but it had no way to write into the arena, so the cmask lived in kernel memory and was passed as a trusted pointer. BPF cmask helpers all operate on arena cmasks though, so the BPF side had to word-by-word probe-read the kernel cmask into an arena cmask via cmask_copy_from_kernel() before any helper could touch it. It works, but is clumsy. The shape isn't unique to set_cmask. Sub-scheduler support is on the way and more sched_ext callbacks will want to pass structured data to BPF. Anywhere a kfunc or struct_ops callback wants to hand a struct to a BPF program, arena residence is the natural answer. Approach -------- Each arena gets a per-arena scratch page. Arenas stay sparsely mapped as today - PTEs are populated only for allocated pages. A new arch fault hook (bpf_arena_handle_page_fault) is wired into x86 page_fault_oops() and arm64 __do_kernel_fault(), after KFENCE. When a kernel-side access faults inside an arena's kern_vm range, the helper walks the stack to find the BPF program responsible, range-checks the fault address against prog->aux->arena, and atomically installs the scratch page into the empty PTE via the new ptep_try_set() wrapper. The kernel instruction retries and reads/writes the scratch page. Free paths and map destruction treat scratch as non-owned. Real allocation refuses to overwrite scratch (apply_range_set_cb returns -EBUSY). A scratched address stays dead until map destroy, since its presence means the BPF program has already malfunctioned. The mechanism is default behavior - no UAPI flag. What this preserves ------------------- All the debugging properties of today's sparse-PTE design are preserved: * BPF programs still fault on unmapped arena accesses. The fault semantics (instruction retry with rdst = 0) and the violation report through bpf_streams are unchanged for prog-side accesses. * The first kernel-side touch of an unmapped address is reported via bpf_streams the same way as a prog-side fault, with the stack walk attributing it to the originating prog. * User-side fault on a never-scratched address still lazy-allocates a real page (or returns SIGSEGV under BPF_F_SEGV_ON_FAULT). User-side fault on a scratched address SIGSEGVs. What changes for the kernel-side caller is just that an unmapped deref no longer oopses - it retries through the scratch page and emits a violation report. The same shape today's BPF instruction faults have. Patches 1-2 (atomic PTE install + arena scratch-page recovery) -------------------------------------------------------------- mm: Add ptep_try_set() for lockless empty-slot installs bpf: Recover arena kernel faults with scratch page Patches 3-5 (helpers used by struct_ops registration) ----------------------------------------------------- bpf: Add sleepable variant of bpf_arena_alloc_pages for kernel callers bpf: Add bpf_struct_ops_for_each_prog() bpf/arena: Add bpf_arena_map_kern_vm_start() and bpf_prog_arena() ==================== Link: https://lore.kernel.org/bpf/20260522172219.1423324-1-tj@kernel.org/ Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-25firmware: zynqmp: Add dynamic CSU register discovery and sysfs interfaceRonak Jain1-1/+3
Add support for dynamically discovering and exposing Configuration Security Unit (CSU) registers through sysfs. Leverage the existing PM_QUERY_DATA API to discover available registers at runtime, making the interface flexible and maintainable. Key features: - Dynamic register discovery using PM_QUERY_DATA API * PM_QID_GET_NODE_COUNT: Query number of available registers * PM_QID_GET_NODE_NAME: Query register names by index - Automatic sysfs attribute creation under csu_registers/ group - Read operations via existing IOCTL_READ_REG API - Write operations via existing IOCTL_MASK_WRITE_REG API The sysfs interface is created at: /sys/devices/platform/firmware:zynqmp-firmware/csu_registers/ Currently supported registers include: - multiboot (CSU_MULTI_BOOT) - idcode (CSU_IDCODE, read-only) - pcap-status (CSU_PCAP_STATUS, read-only) The dynamic discovery approach allows firmware to control which registers are exposed without requiring kernel changes, improving maintainability and security. The firmware does not currently expose per-register access mode information, so the kernel cannot distinguish read-only registers from read-write ones at discovery time. All discovered registers are therefore created with sysfs mode 0644, and the firmware is responsible for rejecting writes to registers it treats as read-only (for example idcode and pcap-status); that error is propagated back to userspace from the store callback. If a per-register access-mode query is added to the firmware in the future, sysfs permissions can be tightened to match. CSU register discovery is an optional feature: on firmware that lacks support for PM_QID_GET_NODE_COUNT or PM_QID_GET_NODE_NAME, the probe returns gracefully without exposing any sysfs entries. To keep the memory footprint minimal on that path, partial devm allocations made during discovery are explicitly released on failure so that no memory lingers until device unbind when the feature is unavailable. Signed-off-by: Ronak Jain <ronak.jain@amd.com> Signed-off-by: Michal Simek <michal.simek@amd.com> Link: https://lore.kernel.org/r/20260520093654.3303917-3-ronak.jain@amd.com
2026-05-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf 7.1-rc5Alexei Starovoitov30-79/+411
Cross-merge BPF and other fixes after downstream PR. Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-05-25ASoC: add shared BCLK rate constraint for cross-DAI coordinationMark Brown12-29/+350
Troy Mitchell <troy.mitchell@linux.spacemit.com> says: On some SoCs (e.g. SpacemiT K3), multiple I2S controllers share the same physical BCLK. When one controller is already streaming, the others must use hw_params that result in the same BCLK rate, otherwise the shared clock would be reconfigured and corrupt the active stream. This series adds framework-level support for this constraint: Patch 1 adds the dt-bindings for the spacemit,k3-i2s compatible. The K3 SoC uses the same I2S IP as K1 but requires additional clocks: a dedicated sysclk_div, along with c_sysclk and c_bclk which are shared across multiple I2S controllers. Patch 2 adds a DEFINE_GUARD wrapping snd_soc_card_mutex_lock() and snd_soc_card_mutex_unlock() so that scope-based locking picks up the SND_SOC_CARD_CLASS_RUNTIME lockdep subclass. Patch 3 adds the constraint logic in soc-pcm.c. During PCM open, every DAI that has a bclk clock pointer gets a hw_rule registered unconditionally. The rule callback runs at hw_refine time: it scans the card for an active peer sharing the same physical BCLK (via clk_is_match()) that has already completed hw_params, then constrains the current stream's rate to match the established BCLK rate. The first DAI to complete hw_params is unconstrained; subsequent DAIs must match. Two modes are supported: - Default (I2S): BCLK = rate * channels * sample_bits. The rule derives the valid rate range from the current channel and sample_bits intervals. - Explicit ratio (TDM): if the driver sets dai->bclk_ratio (e.g. slots * slot_width), the rule computes the single valid rate as active_bclk_rate / bclk_ratio. This series was prompted by review feedback on the SpacemiT K3 I2S series, where a vendor-specific fixed-sample-rate property was rejected in favor of a generic framework solution: https://lore.kernel.org/all/afFqgF6ZRwYdfUmL@sirena.co.uk/ Link: https://patch.msgid.link/20260522-i2s-same-blk-v4-0-a71a86faaa20@linux.spacemit.com
2026-05-25Merge branch 'for-linus' into for-nextTakashi Iwai26-130/+364
Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-05-25Merge tag 'v7.1-rc5' into driver-core-nextDanilo Krummrich50-174/+787
We need the driver-core fixes in here as well to build on top of. Signed-off-by: Danilo Krummrich <dakr@kernel.org>
2026-05-24regulator: add support for SGM3804 Dual Output driverMark Brown36-159/+714
Neil Armstrong <neil.armstrong@linaro.org> says: Add support for the SG Micro SGM3804 Single Inductor Dual Output Buck/Boost Converter used to power LCD panels a provide positive and negative power rails with configurable voltage and active discharge function for each output. The SGM3804 is powered by the enable GPIO pins inputs and only supports I2C write messages. In order to add flexibility and simplify the driver, the regmap cache is enabled and populated with default values since we can't write registers when the 2 GPIOs are down. This regulator is used to provide vsn and vsn power to the Ayaneo Pocket S2 dual-DSI LCD panel. Link: https://patch.msgid.link/20260522-topic-sm8650-ayaneo-pocket-s2-sgm3804-v5-0-bd6b1c300ecc@linaro.org
2026-05-24netfilter: x_tables: disable 32bit compat interface in user namespacesFlorian Westphal1-0/+17
This feature is required to use 32bit arp/ip/ip6/ebtables binaries on 64bit kernels. I don't think there are many users left. Support has been a compile-time option since 2021 and defaults to off since 2023. The XTABLES_COMPAT config option is already off in many distributions including Debian and Fedora. Give a few more months before complete removal but disable support in user namespaces already. Assisted-by: Claude:claude-sonnet-4-6 Signed-off-by: Florian Westphal <fw@strlen.de>
2026-05-24Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds3-4/+9
Pull bpf fixes from Alexei Starovoitov: - Fix bpf_throw() and global subprog combination (Kumar Kartikeya Dwivedi) - Fix out of bounds access in BPF interpreter (Yazhou Tang) - Fix potential out of bounds access in inner per-cpu array map (Guannan Wang) - Reject NULL data/sig in bpf_verify_pkcs7_signature (KP Singh) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: libbpf: fix off-by-one in emit_signature_match jump offset bpf: Reject NULL data/sig in bpf_verify_pkcs7_signature selftests/bpf: Cover global subprog exception leaks bpf: Check global subprog exception paths bpf: make bpf_session_is_return() reference optional bpf: Use array_map_meta_equal for percpu array inner map replacement selftests/bpf: Add test for large offset bpf-to-bpf call bpf: Fix s16 truncation for large bpf-to-bpf call offsets bpf: Fix out-of-bounds read in bpf_patch_call_args()
2026-05-24rcu: Document rcu_access_pointer() feeding into cmpxchg()Paul E. McKenney1-5/+7
This commit documents the rcu_access_pointer() use case for