linux.git

Age	Commit message	Author	Files	Lines
2025-09-30	Merge tag 'kvm-x86-misc-6.18' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini	1	-3/+5
	KVM x86 changes for 6.18
- Don't (re)check L1 intercepts when completing userspace I/O to fix a flaw where a misbehaving userspace (a.k.a. syzkaller) could swizzle L1's intercepts and trigger a variety of WARNs in KVM.
- Emulate PERF_CNTR_GLOBAL_STATUS_SET for PerfMonV2 guests, as the MSR is supposed to exist for v2 PMUs.
- Allow Centaur CPU leaves (base 0xC000_0000) for Zhaoxin CPUs.
- Clean up KVM's vector hashing code for delivering lowest priority IRQs.
- Clean up the fastpath handler code to only handle IPIs and WRMSRs that are actually "fast", as opposed to handling those that KVM _hopes_ are fast, and in the process of doing so add fastpath support for TSC_DEADLINE writes on AMD CPUs.
- Clean up a pile of PMU code in anticipation of adding support for mediated vPMUs.
- Add support for the immediate forms of RDMSR and WRMSRNS, sans full emulator support (KVM should never need to emulate the MSRs outside of forced emulation and other contrived testing scenarios).
- Clean up the MSR APIs in preparation for CET and FRED virtualization, as well as mediated vPMU support.
- Reject a fully in-kernel IRQCHIP if EOIs are protected, i.e. for TDX VMs, as KVM can't faithfully emulate an I/O APIC for such guests.
- Rework KVM_REQ_MSR_FILTER_CHANGED into a generic RECALC_INTERCEPTS in preparation for mediated vPMU support, as KVM will need to recalculate MSR intercepts in response to PMU refreshes for guests with mediated vPMUs.
- Misc cleanups and minor fixes.
2025-09-30	Merge tag 'kvm-x86-selftests-6.18' of https://github.com/kvm-x86/linux into HEAD	Paolo Bonzini	16	-109/+303
	KVM selftests changes for 6.18
- Add #DE coverage in the fastops test (the only exception that's guest-triggerable in fastop-emulated instructions).
- Fix PMU selftests errors encountered on Granite Rapids (GNR), Sierra Forest (SRF) and Clearwater Forest (CWF).
- Minor cleanups and improvements.
2025-09-30	Merge tag 'loongarch-kvm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson into HEAD	Paolo Bonzini	32	-73/+538
	LoongArch KVM changes for v6.18
1. Add PTW feature detection on new hardware.
2. Add sign extension with kernel MMIO/IOCSR emulation.
3. Improve in-kernel IPI emulation.
4. Improve in-kernel PCH-PIC emulation.
5. Move kvm_iocsr tracepoint out of generic code.
2025-09-30	Merge tag 'kvm-riscv-6.18-1' of https://github.com/kvm-riscv/linux into HEAD	Paolo Bonzini	22	-56/+394
	KVM/riscv changes for 6.18
- Added SBI FWFT extension for Guest/VM with misaligned delegation and pointer masking PMLEN features
- Added ONE_REG interface for SBI FWFT extension
- Added Zicbop and bfloat16 extensions for Guest/VM
- Enabled more common KVM selftests for RISC-V such as access_tracking_perf_test, dirty_log_perf_test, memslot_modification_stress_test, memslot_perf_test, mmu_stress_test, and rseq_test
- Added SBI v3.0 PMU enhancements in KVM and perf driver
2025-09-30	Merge tag 'kvmarm-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD	Paolo Bonzini	29	-177/+479
	KVM/arm64 updates for 6.18
- Add support for FF-A 1.2 as the secure memory conduit for pKVM, allowing more registers to be used as part of the message payload.
- Change the way pKVM allocates its VM handles, making sure that the privileged hypervisor is never tricked into using uninitialised data.
- Speed up MMIO range registration by avoiding unnecessary RCU synchronisation, which results in VMs starting much quicker.
- Add the dump of the instruction stream when panicking in the EL2 payload, just like the rest of the kernel has always done. This will hopefully help debugging non-VHE setups.
- Add 52bit PA support to the stage-1 page-table walker, and make use of it to populate the fault level reported to the guest on failing to translate a stage-1 walk.
- Add NV support to the GICv3-on-GICv5 emulation code, ensuring feature parity for guests, irrespective of the host platform.
- Fix some really ugly architecture problems when dealing with debug in a nested VM. This has some bad performance impacts, but is at least correct.
- Add enough infrastructure to be able to disable EL2 features and give effective values to the EL2 control registers. This then allows a bunch of features to be turned off, which helps cross-host migration.
- Large rework of the selftest infrastructure to allow most tests to transparently run at EL2. This is the first step towards enabling NV testing.
- Various fixes and improvements all over the map, including one BE fix, just in time for the removal of the feature.
2025-09-30	Merge tag 'kvmarm-fixes-6.17-2' of https://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD	Paolo Bonzini	16	-56/+168
	KVM/arm64 changes for 6.17, round #3
- Invalidate nested MMUs upon freeing the PGD to avoid WARNs when visiting from an MMU notifier
- Fixes to the TLB match process and TLB invalidation range for managing the VNCR pseudo-TLB
- Prevent SPE from erroneously profiling guests due to UNKNOWN reset values in PMSCR_EL1
- Fix save/restore of host MDCR_EL2 to account for eagerly programming at vcpu_load() on VHE systems
- Correct lock ordering when dealing with VGIC LPIs, avoiding scenarios where an xarray's spinlock was nested with a raw spinlock
- Permit stage-2 read permission aborts which are possible in the case of NV depending on the guest hypervisor's stage-2 translation
- Call raw_spin_unlock() instead of the internal spinlock API
- Fix parameter ordering when assigning VBAR_EL1
[Pull into kvm/master to fix conflicts. - Paolo]
2025-09-30	Merge tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup	Linus Torvalds	4	-0/+679
	Pull cgroup updates from Tejun Heo:
- Extensive cpuset code cleanup and refactoring work with no functional changes: CPU mask computation logic refactoring, introducing new helpers, removing redundant code paths, and improving error handling for better maintainability.
- A few bug fixes to cpuset including fixes for partition creation failures when isolcpus is in use, missing error returns, and null pointer access prevention in free_tmpmasks().
- Core cgroup changes include replacing the global percpu_rwsem with per-threadgroup rwsem when writing to cgroup.procs for better scalability, workqueue conversions to use WQ_PERCPU and system_percpu_wq to prepare for workqueue default switching from percpu to unbound, and removal of unused code including the post_attach callback.
- New cgroup.stat.local time accounting feature that tracks frozen time duration.
- Misc changes including selftests updates (new freezer time tests and backward compatibility fixes), documentation sync, string function safety improvements, and 64-bit division fixes.
* tag 'cgroup-for-6.18' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: (39 commits)
  cpuset: remove is_prs_invalid helper
  cpuset: remove impossible warning in update_parent_effective_cpumask
  cpuset: remove redundant special case for null input in node mask update
  cpuset: fix missing error return in update_cpumask
  cpuset: Use new excpus for nocpu error check when enabling root partition
  cpuset: fix failure to enable isolated partition when containing isolcpus
  Documentation: cgroup-v2: Sync manual toctree
  cpuset: use partition_cpus_change for setting exclusive cpus
  cpuset: use parse_cpulist for setting cpus.exclusive
  cpuset: introduce partition_cpus_change
  cpuset: refactor cpus_allowed_validate_change
  cpuset: refactor out validate_partition
  cpuset: introduce cpus_excl_conflict and mems_excl_conflict helpers
  cpuset: refactor CPU mask buffer parsing logic
  cpuset: Refactor exclusive CPU mask computation logic
  cpuset: change return type of is_partition_[in]valid to bool
  cpuset: remove unused assignment to trialcs->partition_root_state
  cpuset: move the root cpuset write check earlier
  cgroup/cpuset: Remove redundant rcu_read_lock/unlock() in spin_lock
  cgroup: Remove redundant rcu_read_lock/unlock() in spin_lock
  ...
2025-09-30	selftests/net: add tcp_port_share to .gitignore	Gopi Krishna Menon	1	-0/+1
	Add the tcp_port_share test binary to .gitignore to avoid accidentally staging the build artifact. Signed-off-by: Gopi Krishna Menon <krishnagopi487@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20250929163140.122383-1-krishnagopi487@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-30	Merge branch 'for-6.18/selftests' into for-linus	Benjamin Tissoires	1	-245/+423
	- update vmtest.sh (Benjamin Tissoires)
2025-09-30	selftests: drv-net: psp: add tests for destroying devices	Jakub Kicinski	5	-3/+68
	Add tests for making sure a device can disappear while associations exist. This is netdevsim-only since destroying real devices is trickier. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-9-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30	selftests: drv-net: psp: add test for auto-adjusting TCP MSS	Jakub Kicinski	1	-0/+52
	Test TCP MSS getting auto-adjusted. PSP adds an encapsulation overhead of 40B per packet, when used in transport mode without any virtualization cookie or other optional PSP header fields. The kernel should adjust the MSS for a connection after PSP tx state is reached. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-8-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
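A back-of-the-envelope sketch of the MSS arithmetic described above, assuming a 1500-byte MTU and no TCP options (timestamps alone would take another 12 bytes). `tcp_mss()` is a hypothetical helper written for illustration, not the kernel's MSS code; `PSP_OVERHEAD` is the 40B per-packet figure quoted in the commit message.

```python
# Header sizes in bytes. PSP_OVERHEAD is the 40B transport-mode figure from
# the commit message (no virtualization cookie or optional PSP fields).
IPV4_HDR = 20
IPV6_HDR = 40
TCP_HDR = 20  # base TCP header, no options
PSP_OVERHEAD = 40


def tcp_mss(mtu: int, ipv6: bool, psp: bool, tcp_opts: int = 0) -> int:
    """Payload bytes per segment that still fit in one MTU-sized packet."""
    ip_hdr = IPV6_HDR if ipv6 else IPV4_HDR
    overhead = ip_hdr + TCP_HDR + tcp_opts
    if psp:
        overhead += PSP_OVERHEAD
    return mtu - overhead


if __name__ == "__main__":
    plain = tcp_mss(1500, ipv6=True, psp=False)
    with_psp = tcp_mss(1500, ipv6=True, psp=True)
    print(plain, with_psp, plain - with_psp)  # 1440 1400 40
```

The point the test checks is only the difference: once PSP tx state is reached, each segment must carry 40 fewer payload bytes.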
2025-09-30	selftests: drv-net: psp: add connection breaking tests	Jakub Kicinski	1	-1/+91
	Add tests checking conditions that lead to connections breaking: using a bad key, and a connection getting stuck if the device key is rotated twice. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-7-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30	selftests: drv-net: psp: add association tests	Jakub Kicinski	4	-4/+167
	Add tests for exercising PSP associations for TCP sockets. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-6-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30	selftests: drv-net: psp: add basic data transfer and key rotation tests	Jakub Kicinski	1	-3/+191
	Add basic tests for sending data over PSP and making sure that key rotation toggles the MSB of the SPI. Deploy PSP responder on the remote end. We also need a healthy dose of common helpers for setting up the connections, assertions and interrogating socket state on the Python side. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-5-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
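A minimal sketch of the SPI property the key rotation test checks: each rotation flips the most significant bit of the 32-bit SPI, so two rotations bring the bit back to its original value. `rotate_spi()` and `key_slot()` are hypothetical helpers; reading the MSB as "which of two device keys is in use" is an assumption taken from the PSP architecture description, not something stated in this commit.

```python
SPI_MSB = 1 << 31  # most significant bit of the 32-bit SPI


def rotate_spi(spi: int) -> int:
    """SPI expected after one device key rotation: only the MSB is flipped."""
    return (spi ^ SPI_MSB) & 0xFFFFFFFF


def key_slot(spi: int) -> int:
    """Assumed meaning of the MSB: which of two device keys the SPI uses."""
    return (spi >> 31) & 1


if __name__ == "__main__":
    spi = 0x00001234
    once = rotate_spi(spi)
    twice = rotate_spi(once)
    print(hex(once), hex(twice))            # 0x80001234 0x1234
    print(key_slot(once), key_slot(twice))  # 1 0
```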
2025-09-30	selftests: drv-net: add PSP responder	Jakub Kicinski	3	-0/+493
	PSP tests need the remote system to support PSP, and some PSP capable application to exchange data with. Create a simple PSP responder app which we can build and deploy to the remote host. The tests themselves can be written in Python but for ease of deploying the responder is in C (using C YNL). Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-4-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30	selftests: drv-net: base device access API test	Jakub Kicinski	7	-3/+93
	Simple PSP test for getting info about PSP devices. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Daniel Zahka <daniel.zahka@gmail.com> Link: https://patch.msgid.link/20250927225420.1443468-3-kuba@kernel.org Reviewed-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-30	selftests: bonding: add ipsec offload test	Hangbin Liu	3	-1/+162
	This introduces a test for IPsec offload over bonding, using netdevsim for the testing process, as veth interfaces do not support IPsec offload. The test ensures that IPsec offload remains operational even after a failover event occurs in the bonding configuration. Here is the test result:
TEST: bond_ipsec_offload (active_slave eth0) [ OK ]
TEST: bond_ipsec_offload (active_slave eth1) [ OK ]
Reviewed-by: Petr Machata <petrm@nvidia.com> Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Link: https://patch.msgid.link/20250925023304.472186-2-liuhangbin@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-09-29	Merge tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux	Linus Torvalds	1	-1/+1
	Pull powerpc updates from Madhavan Srinivasan:
- powerpc support for BPF arena and arena atomics
- Patches to switch to msi parent domain (per-device MSI domains)
- Add a lock contention tracepoint in the queued spinlock slowpath
- Fixes for underflow in pseries/powernv msi and pci paths
- Switch from legacy-of-mm-gpiochip dependency to platform driver
- Fixes for handling TLB misses
- Introduce support for powerpc papr-hvpipe
- Add vpa-dtl PMU driver for pseries platform
- Misc fixes and cleanups
Thanks to Aboorva Devarajan, Aditya Bodkhe, Andrew Donnellan, Athira Rajeev, Cédric Le Goater, Christophe Leroy, Erhard Furtner, Gautam Menghani, Geert Uytterhoeven, Haren Myneni, Hari Bathini, Joe Lawrence, Kajol Jain, Kienan Stewart, Linus Walleij, Mahesh Salgaonkar, Nam Cao, Nicolas Schier, Nysal Jan K.A., Ritesh Harjani (IBM), Ruben Wauters, Saket Kumar Bhaskar, Shashank MS, Shrikanth Hegde, Tejas Manhas, Thomas Gleixner, Thomas Huth, Thorsten Blum, Tyrel Datwyler, and Venkat Rao Bagalkote.
* tag 'powerpc-6.18-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (49 commits)
  powerpc/pseries: Define __u{8,32} types in papr_hvpipe_hdr struct
  genirq/msi: Remove msi_post_free()
  powerpc/perf/vpa-dtl: Add documentation for VPA dispatch trace log PMU
  powerpc/perf/vpa-dtl: Handle the writing of perf record when aux wake up is needed
  powerpc/perf/vpa-dtl: Add support to capture DTL data in aux buffer
  powerpc/perf/vpa-dtl: Add support to setup and free aux buffer for capturing DTL data
  docs: ABI: sysfs-bus-event_source-devices-vpa-dtl: Document sysfs event format entries for vpa_dtl pmu
  powerpc/vpa_dtl: Add interface to expose vpa dtl counters via perf
  powerpc/time: Expose boot_tb via accessor
  powerpc/32: Remove PAGE_KERNEL_TEXT to fix startup failure
  powerpc/fprobe: fix updated fprobe for function-graph tracer
  powerpc/ftrace: support CONFIG_FUNCTION_GRAPH_RETVAL
  powerpc64/modules: replace stub allocation sentinel with an explicit counter
  powerpc64/modules: correctly iterate over stubs in setup_ftrace_ool_stubs
  powerpc/ftrace: ensure ftrace record ops are always set for NOPs
  powerpc/603: Really copy kernel PGD entries into all PGDIRs
  powerpc/8xx: Remove left-over instruction and comments in DataStoreTLBMiss handler
  powerpc/pseries: HVPIPE changes to support migration
  powerpc/pseries: Enable hvpipe with ibm,set-system-parameter RTAS
  powerpc/pseries: Enable HVPIPE event message interrupt
  ...
2025-09-29	Merge tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux	Linus Torvalds	1	-0/+24
	Pull RISC-V updates from Paul Walmsley:
- Replacement of __ASSEMBLY__ with __ASSEMBLER__ in header files (other architectures have already merged this type of cleanup)
- The introduction of ioremap_wc() for RISC-V
- Cleanup of the RISC-V kprobes code to use mostly-extant macros rather than open code
- A RISC-V kprobes unit test
- An architecture-specific endianness swap macro set implementation, leveraging some dedicated RISC-V instructions for this purpose if they are available
- The ability to identify and communicate to userspace the presence of a MIPS P8700-specific ISA extension, and to leverage its MIPS-specific PAUSE implementation in cpu_relax()
- Several other miscellaneous cleanups
* tag 'riscv-for-linus-6.18-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (39 commits)
  riscv: errata: Fix the PAUSE Opcode for MIPS P8700
  riscv: hwprobe: Document MIPS xmipsexectl vendor extension
  riscv: hwprobe: Add MIPS vendor extension probing
  riscv: Add xmipsexectl instructions
  riscv: Add xmipsexectl as a vendor extension
  dt-bindings: riscv: Add xmipsexectl ISA extension description
  riscv: cpufeature: add validation for zfa, zfh and zfhmin
  perf: riscv: skip empty batches in counter start
  selftests: riscv: Add README for RISC-V KSelfTest
  riscv: sbi: Switch to new sys-off handler API
  riscv: Move vendor errata definitions to new header
  RISC-V: ACPI: enable parsing the BGRT table
  riscv: Enable ARCH_HAVE_NMI_SAFE_CMPXCHG
  riscv: pi: use 'targets' instead of extra-y in Makefile
  riscv: introduce asm/swab.h
  riscv: mmap(): use unsigned offset type in riscv_sys_mmap
  drivers/perf: riscv: Remove redundant ternary operators
  riscv: mm: Use mmu-type from FDT to limit SATP mode
  riscv: mm: Return intended SATP mode for noXlvl options
  riscv: kprobes: Remove duplication of RV_EXTRACT_ITYPE_IMM
  ...
2025-09-29	Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux	Linus Torvalds	14	-32/+144
	Pull arm64 updates from Will Deacon:
"There's good stuff across the board, including some nice mm improvements for CPUs with the 'noabort' BBML2 feature and a clever patch to allow ptdump to play nicely with block mappings in the vmalloc area.
Confidential computing:
- Add support for accepting secrets from firmware (e.g. ACPI CCEL) and mapping them with appropriate attributes.
CPU features:
- Advertise atomic floating-point instructions to userspace
- Extend Spectre workarounds to cover additional Arm CPU variants
- Extend list of CPUs that support break-before-make level 2 and guarantee not to generate TLB conflict aborts for changes of mapping granularity (BBML2_NOABORT)
- Add GCS support to our uprobes implementation.
Documentation:
- Remove bogus SME documentation concerning register state when entering/exiting streaming mode.
Entry code:
- Switch over to the generic IRQ entry code (GENERIC_IRQ_ENTRY)
- Micro-optimise syscall entry path with a compiler branch hint.
Memory management:
- Enable huge mappings in vmalloc space even when kernel page-table dumping is enabled
- Tidy up the types used in our early MMU setup code
- Rework rodata= for closer parity with the behaviour on x86
- For CPUs implementing BBML2_NOABORT, utilise block mappings in the linear map even when rodata= applies to virtual aliases
- Don't re-allocate the virtual region between '_text' and '_stext', as doing so confused tools parsing /proc/vmcore.
Miscellaneous:
- Clean-up Kconfig menuconfig text for architecture features
- Avoid redundant bitmap_empty() during determination of supported SME vector lengths
- Re-enable warnings when building the 32-bit vDSO object
- Avoid breaking our eggs at the wrong end.
Perf and PMUs:
- Support for v3 of the Hisilicon L3C PMU
- Support for Hisilicon's MN and NoC PMUs
- Support for Fujitsu's Uncore PMU
- Support for SPE's extended event filtering feature
- Preparatory work to enable data source filtering in SPE
- Support for multiple lanes in the DWC PCIe PMU
- Support for i.MX94 in the IMX DDR PMU driver
- MAINTAINERS update (Thank you, Yicong)
- Minor driver fixes (PERF_IDX2OFF() overflow, CMN register offsets).
Selftests:
- Add basic LSFE check to the existing hwcaps test
- Support nolibc in GCS tests
- Extend SVE ptrace test to pass unsupported regsets and invalid vector lengths
- Minor cleanups (typos, cosmetic changes).
System registers:
- Fix ID_PFR1_EL1 definition
- Fix incorrect signedness of some fields in ID_AA64MMFR4_EL1
- Sync TCR_EL1 definition with the latest Arm ARM (L.b)
- Be stricter about the input fed into our AWK sysreg generator script
- Typo fixes and removal of redundant definitions.
ACPI, EFI and PSCI:
- Decouple Arm's "Software Delegated Exception Interface" (SDEI) support from the ACPI GHES code so that it can be used by platforms booted with device-tree
- Remove unnecessary per-CPU tracking of the FPSIMD state across EFI runtime calls
- Fix a node refcount imbalance in the PSCI device-tree code.
CPU Features:
- Ensure register sanitisation is applied to fields in ID_AA64MMFR4
- Expose AIDR_EL1 to userspace via sysfs, primarily so that KVM guests can reliably query the underlying CPU types from the VMM
- Re-enabling of SME support (CONFIG_ARM64_SME) as a result of fixes to our context-switching, signal handling and ptrace code"
* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (93 commits)
  arm64: cpufeature: Remove duplicate asm/mmu.h header
  arm64: Kconfig: Make CPU_BIG_ENDIAN depend on BROKEN
  perf/dwc_pcie: Fix use of uninitialized variable
  arm/syscalls: mark syscall invocation as likely in invoke_syscall
  Documentation: hisi-pmu: Add introduction to HiSilicon V3 PMU
  Documentation: hisi-pmu: Fix of minor format error
  drivers/perf: hisi: Add support for L3C PMU v3
  drivers/perf: hisi: Refactor the event configuration of L3C PMU
  drivers/perf: hisi: Extend the field of tt_core
  drivers/perf: hisi: Extract the event filter check of L3C PMU
  drivers/perf: hisi: Simplify the probe process of each L3C PMU version
  drivers/perf: hisi: Export hisi_uncore_pmu_isr()
  drivers/perf: hisi: Relax the event ID check in the framework
  perf: Fujitsu: Add the Uncore PMU driver
  arm64: map [_text, _stext) virtual address range non-executable+read-only
  arm64/sysreg: Update TCR_EL1 register
  arm64: Enable vmalloc-huge with ptdump
  arm64: cpufeature: add Neoverse-V3AE to BBML2 allow list
  arm64: errata: Apply workarounds for Neoverse-V3AE
  arm64: cputype: Add Neoverse-V3AE definitions
  ...
2025-09-29	selftest: packetdrill: Import client-ack-dropped-then-recovery-ms-timestamps.pkt	Kuniyuki Iwashima	1	-0/+46
	This test also has no non-experimental version, so the experimental version is converted to FO. The comment in the .pkt file explains the scenario in detail. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Import sockopt-fastopen-key.pkt	Kuniyuki Iwashima	2	-0/+76
	sockopt-fastopen-key.pkt does not have a non-experimental version, so the experimental version is converted, FOEXP -> FO. The test sets net.ipv4.tcp_fastopen_key=0-0-0-0 and instead sets another key via setsockopt(TCP_FASTOPEN_KEY). The first listener generates a valid cookie in response to a TFO option without a cookie, and the second listener creates a TFO socket using the valid cookie. TCP_FASTOPEN_KEY is adjusted to use the common key in default.sh so that we can use TFO_COOKIE and support dualstack. Similarly, TFO_COOKIE_ZERO for the 0-0-0-0 key is defined. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-13-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Refine tcp_fastopen_server_reset-after-disconnect.pkt.	Kuniyuki Iwashima	1	-3/+7
	These changes are applied to follow the imported packetdrill tests.
* Call setsockopt(TCP_FASTOPEN)
* Remove unnecessary accept() delay
* Add assertion for TCP states
* Rename to tcp_fastopen_server_trigger-rst-reconnect.pkt.
Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Import opt34/*-trigger-rst.pkt.	Kuniyuki Iwashima	2	-0/+44
	This imports the non-experimental version of opt34/*-trigger-rst.pkt.
                                   | accept() | SYN data |
-----------------------------------+----------+----------+
listener-closed-trigger-rst.pkt    | no       | unread   |
unread-data-closed-trigger-rst.pkt | yes      | unread   |
Both files test that close()ing a SYN_RECV socket with unread SYN data triggers RST. The files are renamed to have the common prefix, trigger-rst. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Import opt34/reset-* tests.	Kuniyuki Iwashima	4	-0/+138
	This imports the non-experimental version of opt34/reset-*.pkt.
                                 | Child   | RST                           | sk_err  |
---------------------------------+---------+-------------------------------+---------+
reset-after-accept.pkt           | TFO     | after accept(), SYN_RECV      | read()  |
reset-close-with-unread-data.pkt | TFO     | after accept(), SYN_RECV      | write() |
reset-before-accept.pkt          | TFO     | before accept(), SYN_RECV     | read()  |
reset-non-tfo-socket.pkt         | non-TFO | before accept(), ESTABLISHED  | write() |
The first 3 files test scenarios where a SYN_RECV socket receives RST before/after accept() and the data in SYN must be read() without error, but the following read() or first write() returns ECONNRESET. The last test is similar but uses a non-TFO socket. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Import opt34/icmp-before-accept.pkt.	Kuniyuki Iwashima	1	-0/+49
	This imports the non-experimental version of icmp-before-accept.pkt. This file tests the scenario where an ICMP unreachable packet for a not-yet-accept()ed socket changes its state to TCP_CLOSE, but the SYN data must be read without error, and the following read() returns EHOSTUNREACH. Note that this test supports only IPv4 because it uses ICMP. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Import opt34/fin-close-socket.pkt.	Kuniyuki Iwashima	1	-0/+30
	This imports the non-experimental version of fin-close-socket.pkt. This file tests the scenario where a TFO child socket's state transitions from SYN_RECV to CLOSE_WAIT before accept()ed. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Add test for experimental option.	Kuniyuki Iwashima	1	-0/+37
	The only difference between non-experimental vs experimental TFO option handling is SYN+ACK generation. When tcp_parse_fastopen_option() parses a TFO option, it sets tcp_fastopen_cookie.exp to false if the option number is 34, and true if 255. The value is carried to tcp_options_write() to generate a TFO option with the same option number. Other than that, all the TFO handling is the same and the kernel must generate the same cookie regardless of the option number. Let's add a test for the handling so that we can consolidate fastopen/server/ tests and fastopen/server/opt34 tests. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
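A minimal Python sketch of the round trip described above: parse either form of the TFO option, remember whether it was experimental, and echo the cookie back using the same option kind. The function names are hypothetical. The sketch uses 254, the value of `TCPOPT_EXP` in `include/net/tcp.h` (the RFC 6994 experimental kind), together with the 0xF989 magic; the real `tcp_parse_options()` also walks the full option list, which is omitted here.

```python
import struct

TCPOPT_FASTOPEN = 34            # TFO option kind (non-experimental)
TCPOPT_EXP = 254                # experimental kind, as defined in include/net/tcp.h
TCPOPT_FASTOPEN_MAGIC = 0xF989  # 16-bit magic that marks an experimental TFO option


def parse_tfo_option(opt: bytes):
    """Return (cookie, exp) for a single TFO option, or None otherwise."""
    if len(opt) < 2 or opt[1] < 2 or opt[1] > len(opt):
        return None
    kind, body = opt[0], opt[2:opt[1]]
    if kind == TCPOPT_FASTOPEN:
        return body, False
    if kind == TCPOPT_EXP and len(body) >= 2:
        (magic,) = struct.unpack("!H", body[:2])
        if magic == TCPOPT_FASTOPEN_MAGIC:
            return body[2:], True
    return None


def write_tfo_option(cookie: bytes, exp: bool) -> bytes:
    """Build the SYN+ACK option with the same kind the client used."""
    if exp:
        return (bytes([TCPOPT_EXP, 4 + len(cookie)])
                + struct.pack("!H", TCPOPT_FASTOPEN_MAGIC) + cookie)
    return bytes([TCPOPT_FASTOPEN, 2 + len(cookie)]) + cookie


if __name__ == "__main__":
    cookie = bytes.fromhex("0011223344556677")
    for exp in (False, True):
        opt = write_tfo_option(cookie, exp)
        print(opt[0], parse_tfo_option(opt) == (cookie, exp))
```

Only the `exp` flag travels from parsing to writing; the cookie bytes are identical in both forms, which is why one test can cover the experimental path.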
2025-09-29	selftest: packetdrill: Add test for TFO_SERVER_WO_SOCKOPT1.	Kuniyuki Iwashima	1	-0/+21
	TFO_SERVER_WO_SOCKOPT1 is no longer enabled by default, and each server test requires setsockopt(TCP_FASTOPEN). Let's add a basic test for TFO_SERVER_WO_SOCKOPT1. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Import TFO server basic tests.	Kuniyuki Iwashima	5	-0/+157
	This imports basic TFO server tests from google/packetdrill. The repository has two versions of tests for most scenarios; one uses the non-experimental option (34), and the other uses the experimental option (255) with 0xF989. This only imports the following tests of the non-experimental version placed in [0]. I will add a specific test for the experimental option handling later.
                           | TFO | Cookie | Payload |
---------------------------+-----+--------+---------+
basic-rw.pkt               | yes | yes    | yes     |
basic-zero-payload.pkt     | yes | yes    | no      |
basic-cookie-not-reqd.pkt  | yes | no     | yes     |
basic-non-tfo-listener.pkt | no  | yes    | yes     |
pure-syn-data.pkt          | yes | no     | yes     |
The original pure-syn-data.pkt missed setsockopt(TCP_FASTOPEN) and unintentionally did not test the TFO server in some scenarios, so setsockopt() is added where needed. In addition, the non-TFO scenario is stripped as it is covered by basic-non-tfo-listener.pkt. Also, I added the basic- prefix. Link: https://github.com/google/packetdrill/tree/bfc96251310f/gtests/net/tcp/fastopen/server/opt34 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	selftest: packetdrill: Define common TCP Fast Open cookie.	Kuniyuki Iwashima	2	-0/+3
	TCP Fast Open cookie is generated in __tcp_fastopen_cookie_gen_cipher(). The cookie value is generated from src/dst IPs and a key configured by setsockopt(TCP_FASTOPEN_KEY) or net.ipv4.tcp_fastopen_key. The default.sh sets net.ipv4.tcp_fastopen_key, and the original packetdrill defines the corresponding cookie as TFO_COOKIE in run_all.py. [0] Then, each test does not need to care about the value, and we can easily update TFO_COOKIE in case __tcp_fastopen_cookie_gen_cipher() changes the algorithm. However, some tests use the bare hex value for specific IPv4 addresses and do not support IPv6. Let's define the same TFO_COOKIE in ksft_runner.sh. We will replace such bare hex values with TFO_COOKIE except for a single test for setsockopt(TCP_FASTOPEN_KEY). Link: https://github.com/google/packetdrill/blob/7230b3990f94/gtests/net/packetdrill/run_all.py#L65 #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
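The shared TFO_COOKIE works because cookie generation is deterministic: with a fixed key and fixed addresses the cookie never changes, so it can be computed once and reused, while a different key yields a different cookie. A minimal sketch of that property follows. It deliberately uses HMAC-SHA256 truncated to 8 bytes as a stand-in keyed hash; it is not the algorithm in `__tcp_fastopen_cookie_gen_cipher()`, `tfo_cookie()` is a hypothetical name, and the addresses are arbitrary documentation examples.

```python
import hashlib
import hmac
import ipaddress


def tfo_cookie(key: bytes, saddr: str, daddr: str) -> bytes:
    """Stand-in cookie: HMAC-SHA256 over (saddr, daddr), truncated to 8 bytes.

    Not the kernel's algorithm; it only shares the inputs (addresses + key).
    Works for both IPv4 and IPv6 address strings.
    """
    msg = ipaddress.ip_address(saddr).packed + ipaddress.ip_address(daddr).packed
    return hmac.new(key, msg, hashlib.sha256).digest()[:8]


if __name__ == "__main__":
    key = bytes(16)  # all-zero 16-byte key
    a = tfo_cookie(key, "192.0.2.1", "192.0.2.2")
    b = tfo_cookie(key, "192.0.2.1", "192.0.2.2")
    c = tfo_cookie(b"\x01" * 16, "192.0.2.1", "192.0.2.2")
    print(a == b, a == c)  # True False
```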
2025-09-29	selftest: packetdrill: Require explicit setsockopt(TCP_FASTOPEN).	Kuniyuki Iwashima	1	-1/+1
	To enable TCP Fast Open on a server, net.ipv4.tcp_fastopen must have 0x2 (TFO_SERVER_ENABLE), and we need to do either:
1. Call setsockopt(TCP_FASTOPEN) for the socket
2. Additionally set 0x400 (TFO_SERVER_WO_SOCKOPT1) in net.ipv4.tcp_fastopen
The default.sh sets 0x70403 so that each test does not need setsockopt(). (0x1 is TFO_CLIENT_ENABLE, and 0x70000 is ...???) However, some tests overwrite net.ipv4.tcp_fastopen without TFO_SERVER_WO_SOCKOPT1 and forget to call setsockopt(TCP_FASTOPEN). For example, pure-syn-data.pkt [0] unintentionally tests non-TFO servers, except in the first scenario. To prevent such an accident, let's require explicit setsockopt(). TFO_CLIENT_ENABLE is necessary for tcp_syscall_bad_arg_fastopen-invalid-buf-ptr.pkt. Link: https://github.com/google/packetdrill/blob/bfc96251310f/gtests/net/tcp/fastopen/server/opt34/pure-syn-data.pkt #[0] Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
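A small sketch for reading a net.ipv4.tcp_fastopen value with only the three bits named in this commit message. `decode()` and `server_tfo_mode()` are hypothetical helpers; any remaining bits (such as the 0x70000 part of 0x70403) are reported as leftover rather than guessed at.

```python
# Bit values named in the commit message; no other bits are assumed.
TFO_CLIENT_ENABLE = 0x1
TFO_SERVER_ENABLE = 0x2
TFO_SERVER_WO_SOCKOPT1 = 0x400

KNOWN_BITS = {
    TFO_CLIENT_ENABLE: "TFO_CLIENT_ENABLE",
    TFO_SERVER_ENABLE: "TFO_SERVER_ENABLE",
    TFO_SERVER_WO_SOCKOPT1: "TFO_SERVER_WO_SOCKOPT1",
}


def decode(value):
    """Split a sysctl value into known flag names and leftover bits."""
    names = [name for bit, name in KNOWN_BITS.items() if value & bit]
    leftover = value & ~sum(KNOWN_BITS)
    return names, leftover


def server_tfo_mode(value):
    """How a TFO listener behaves for a given net.ipv4.tcp_fastopen value."""
    if not value & TFO_SERVER_ENABLE:
        return "disabled"
    if value & TFO_SERVER_WO_SOCKOPT1:
        return "enabled on every listener"
    return "needs setsockopt(TCP_FASTOPEN)"


if __name__ == "__main__":
    names, leftover = decode(0x70403)
    print(names, hex(leftover))
    print(server_tfo_mode(0x70403))  # old default.sh value
    print(server_tfo_mode(0x3))      # server enabled, no WO_SOCKOPT1
```

With 0x70403 every listener gets TFO implicitly, which is how a test that forgets setsockopt() can still pass; with 0x3 only sockets that call setsockopt(TCP_FASTOPEN) do.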
2025-09-29	selftest: packetdrill: Set ktap_set_plan properly for single protocol test.	Kuniyuki Iwashima	1	-2/+2
	The cited commit forgot to update the ktap_set_plan call. ktap_set_plan sets the number of tests (KSFT_NUM_TESTS), which must match the number of executed tests (KTAP_CNT_PASS + KTAP_CNT_SKIP + KTAP_CNT_XFAIL) in ktap_finished. Otherwise, the selftest exit()s with 1. Let's adjust KSFT_NUM_TESTS based on supported protocols. While at it, fix up some misalignment. Fixes: a5c10aa3d1ba ("selftests/net: packetdrill: Support single protocol test.") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20250927213022.1850048-2-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
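The plan-versus-result check described above can be modeled in a few lines. This is a Python rendering of the rule as stated in the message (pass only if PASS + SKIP + XFAIL equals the plan), using the usual kselftest exit codes KSFT_PASS = 0 and KSFT_FAIL = 1; plan_for() and the protocol names are hypothetical, assuming one KTAP test per supported protocol.

```python
KSFT_PASS = 0
KSFT_FAIL = 1


def ktap_finished(num_tests: int, cnt_pass: int, cnt_skip: int, cnt_xfail: int) -> int:
    """Exit status: pass only if pass + skip + xfail equals the plan."""
    if cnt_pass + cnt_skip + cnt_xfail == num_tests:
        return KSFT_PASS
    return KSFT_FAIL


def plan_for(supported_protocols: list) -> int:
    """Hypothetical: one KTAP test per protocol the script supports."""
    return len(supported_protocols)
```

A plan computed for every protocol while a single-protocol script runs only one test yields KSFT_FAIL even though nothing failed; deriving the plan from the supported protocols keeps the counts in agreement.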
2025-09-29	selftests: mptcp: join: validate new laminar endp	Matthieu Baerts (NGI0)	2	-0/+78
	Here are a few sub-tests for mptcp_join.sh, validating the new 'laminar' endpoint type. In a setup where subflows created using the routing rules would be rejected by the listener, and where the latter announces one IP address, some cases are verified: - Without any 'laminar' endpoints: no new subflows are created. - With one 'laminar' endpoint: a second subflow is created. - With multiple 'laminar' endpoints: 2 IPv4 subflows are created. - With one 'laminar' endpoint, but the server announcing a second IP address, only one subflow is created. - With one 'laminar' + 'subflow' endpoint, the same endpoint is only used once. Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20250927-net-next-mptcp-rcv-path-imp-v1-8-5da266aa9c1a@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-29	Merge tag 'seccomp-v6.18-rc1' of ↵	Linus Torvalds	1	-0/+131
	git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull seccomp update from Kees Cook: - Fix race with WAIT_KILLABLE_RECV (Johannes Nixdorf) * tag 'seccomp-v6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: selftests/seccomp: Add a test for the WAIT_KILLABLE_RECV fast reply race seccomp: Fix a race with WAIT_KILLABLE_RECV if the tracer replies too fast
2025-09-29	Merge tag 'namespace-6.18-rc1' of ↵	Linus Torvalds	6	-0/+2493
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull namespace updates from Christian Brauner: "This contains a larger set of changes around the generic namespace infrastructure of the kernel. Each specific namespace type (net, cgroup, mnt, ...) embeds a struct ns_common which carries the reference count of the namespace and so on. We open-coded and cargo-culted so many quirks for each namespace type that it just wasn't scalable anymore. So given there's a bunch of new changes coming in that area, I've started cleaning all of this up. The core change is to make it possible to correctly initialize every namespace uniformly and derive the correct initialization settings from the type of the namespace such as namespace operations, namespace type and so on. This leaves the new ns_common_init() function with a single parameter which is the specific namespace type which derives the correct parameters statically. This also means the compiler will yell as soon as someone does something remotely fishy. The ns_common_init() addition also allows us to remove ns_alloc_inum() and drops any special-casing of the initial network namespace in the network namespace initialization code that Linus complained about. Another part is reworking the reference counting. The reference counting was open-coded and copy-pasted for each namespace type even though they all followed the same rules. This also removes all open accesses to the reference count and makes it private and only uses a very small set of dedicated helpers to manipulate them just like we do for e.g., files. In addition, this generalizes the mount namespace iteration infrastructure introduced a few cycles ago. As a reminder, the vfs makes it possible to iterate sequentially and bidirectionally through all mount namespaces on the system or all mount namespaces that the caller holds privilege over. This allows userspace to iterate over all mounts in all mount namespaces using the listmount() and statmount() system calls.
Each mount namespace has a unique identifier for the lifetime of the system that is exposed to userspace. The network namespace also has a unique identifier working exactly the same way. This extends the concept to all other namespace types. The new nstree type makes it possible to look up namespaces purely by their identifier and to walk the namespace list sequentially and bidirectionally for all namespace types, allowing userspace to iterate through all namespaces. Looking up namespaces in the namespace tree works completely locklessly. This also means we can move the mount namespace onto the generic infrastructure and remove a bunch of code and members from struct mnt_namespace itself. There's a bunch of stuff coming on top of this in the future, but for now this uses the generic namespace tree to extend a concept introduced first for pidfs a few cycles ago. For a while now we have supported pidfs file handles for pidfds. This has proven to be very useful. This extends the concept to cover namespaces as well. It is possible to encode and decode namespace file handles using the common name_to_handle_at() and open_by_handle_at() APIs. As with pidfs file handles, namespace file handles are exhaustive, meaning it is not required to actually hold a reference to nsfs in order to decode (aka open_by_handle_at()) a namespace file handle. Instead, the FD_NSFS_ROOT constant can be passed which will let the kernel grab a reference to the root of nsfs internally and thus decode the file handle. Namespace file descriptors can already be derived from pidfds, which means they aren't subject to overmount protection bugs. IOW, it's irrelevant if the caller would not have access to an appropriate /proc/<pid>/ns/ directory as they could always just derive the namespace based on a pidfd already. It has the same advantage as pidfds. It's possible to reliably and for the lifetime of the system refer to a namespace without pinning any resources and to compare them trivially.
Permission checking is kept simple. If the caller is located in the namespace the file handle refers to, they are able to open it; otherwise, they must hold privilege over the owning namespace of the relevant namespace. The namespace file handle layout is exposed as uapi and has a stable and extensible format. For now it simply contains the namespace identifier, the namespace type, and the inode number. The stable format means that userspace may construct its own namespace file handles without going through name_to_handle_at(), as is already allowed for pidfs and cgroup file handles" * tag 'namespace-6.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (65 commits) ns: drop assert ns: move ns type into struct ns_common nstree: make struct ns_tree private ns: add ns_debug() ns: simplify ns_common_init() further cgroup: add missing ns_common include ns: use inode initializer for initial namespaces selftests/namespaces: verify initial namespace inode numbers ns: rename to __ns_ref nsfs: port to ns_ref_() helpers net: port to ns_ref_() helpers uts: port to ns_ref_() helpers ipv4: use check_net() net: use check_net() net-sysfs: use check_net() user: port to ns_ref_() helpers time: port to ns_ref_() helpers pid: port to ns_ref_() helpers ipc: port to ns_ref_() helpers cgroup: port to ns_ref_() helpers ...
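The lookup-by-identifier and bidirectional-walk semantics described for nstree can be illustrated with a toy user-space model. Everything here is hypothetical: NsTree, Namespace, and their methods are not kernel or uapi names, and neither the kernel's actual data structure nor its lockless lookups are modeled. A sorted list plus a dict simply makes the ordering semantics concrete.

```python
import bisect
from dataclasses import dataclass


@dataclass(frozen=True)
class Namespace:
    ns_id: int    # unique for the lifetime of the system
    ns_type: str  # e.g. "net", "mnt", "uts"


class NsTree:
    """Toy model: namespaces ordered by their unique identifier."""

    def __init__(self) -> None:
        self._ids = []      # sorted list of ns_id values
        self._by_id = {}    # ns_id -> Namespace

    def add(self, ns: Namespace) -> None:
        if ns.ns_id in self._by_id:
            raise ValueError("namespace id already present")
        bisect.insort(self._ids, ns.ns_id)
        self._by_id[ns.ns_id] = ns

    def remove(self, ns_id: int) -> None:
        del self._by_id[ns_id]  # raises KeyError if absent
        self._ids.pop(bisect.bisect_left(self._ids, ns_id))

    def lookup(self, ns_id: int):
        """Find a namespace purely by its identifier."""
        return self._by_id.get(ns_id)

    def _match(self, ns: Namespace, ns_type) -> bool:
        return ns_type is None or ns.ns_type == ns_type

    def next_after(self, ns_id: int, ns_type=None):
        """First namespace with a larger id (optionally of one type)."""
        for cur in self._ids[bisect.bisect_right(self._ids, ns_id):]:
            if self._match(self._by_id[cur], ns_type):
                return self._by_id[cur]
        return None

    def prev_before(self, ns_id: int, ns_type=None):
        """Last namespace with a smaller id (optionally of one type)."""
        for cur in reversed(self._ids[:bisect.bisect_left(self._ids, ns_id)]):
            if self._match(self._by_id[cur], ns_type):
                return self._by_id[cur]
        return None

    def walk(self, ns_type=None, reverse=False):
        """Iterate all namespaces in id order, forwards or backwards."""
        ids = list(reversed(self._ids)) if reverse else list(self._ids)
        for cur in ids:
            if self._match(self._by_id[cur], ns_type):
                yield self._by_id[cur]
```

Because identifiers are never reused in this model, a caller can resume a walk from the last id it saw, in either direction, without holding any reference to the namespace it stopped at.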
2025-09-29	Merge tag 'vfs-6.18-rc1.mount' of ↵	Linus Torvalds	3	-1/+132
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull vfs mount updates from Christian Brauner: "This contains some work around mount api handling: - Output the warning message for mnt_too_revealing() triggered during fsmount() to the fscontext log. This makes it possible for the mount tool to output appropriate warnings on the command line. For example, with the newest fsopen()-based mount(8) from util-linux, the error messages now look like: # mount -t proc proc /tmp mount: /tmp: fsmount() failed: VFS: Mount too revealing. dmesg(1) may have more information after failed mount system call. - Do not consume fscontext log entries when returning -EMSGSIZE Userspace generally expects APIs that return -EMSGSIZE to allow for them to adjust their buffer size and retry the operation. However, the fscontext log would previously clear the message even in the -EMSGSIZE case. Given that it is very cheap for us to check whether the buffer is too small before we remove the message from the ring buffer, let's just do that instead. - Drop an unused argument from do_remount()" * tag 'vfs-6.18-rc1.mount' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: vfs: fs/namespace.c: remove ms_flags argument from do_remount selftests/filesystems: add basic fscontext log tests fscontext: do not consume log entries when returning -EMSGSIZE vfs: output mount_too_revealing() errors to fscontext docs/vfs: Remove mentions to the old mount API helpers fscontext: add custom-prefix log helpers fs: Remove mount_bdev fs: Remove mount_nodev
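The -EMSGSIZE change is about ordering: check the buffer size before removing the message. The toy model below makes that ordering explicit. FsContextLog and read_with_retry() are hypothetical names, the deque stands in for the kernel-side log, and raising ENODATA for an empty log is an assumption of this sketch.

```python
import errno
from collections import deque


class FsContextLog:
    """Toy model of an fs_context message log drained by read()."""

    def __init__(self) -> None:
        self._msgs = deque()

    def log(self, msg: str) -> None:
        self._msgs.append(msg.encode())

    def read(self, bufsize: int) -> bytes:
        if not self._msgs:
            raise OSError(errno.ENODATA, "no messages")  # assumed errno
        msg = self._msgs[0]
        # Check the size *before* dequeuing, so EMSGSIZE leaves the
        # message queued for a retry with a larger buffer.
        if len(msg) > bufsize:
            raise OSError(errno.EMSGSIZE, "buffer too small")
        self._msgs.popleft()
        return msg


def read_with_retry(log: FsContextLog, bufsize: int = 8) -> bytes:
    """Userspace pattern: grow the buffer and retry on EMSGSIZE."""
    while True:
        try:
            return log.read(bufsize)
        except OSError as e:
            if e.errno != errno.EMSGSIZE:
                raise
            bufsize *= 2
```

Under the old behaviour the message would already have been dequeued when EMSGSIZE was returned, so the retry loop above would lose it; checking first makes the retry pattern work.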
2025-09-29	Merge tag 'vfs-6.18-rc1.misc' of ↵	Linus Torvalds	3	-0/+213
	git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs Pull misc vfs updates from Christian Brauner: "This contains the usual selection of misc updates for this cycle. Features: - Add "initramfs_options" parameter to set initramfs mount options. This allows adding specific mount options to the rootfs, e.g., to limit the memory size - Add RWF_NOSIGNAL flag for pwritev2() Add RWF_NOSIGNAL flag for pwritev2. This flag prevents the SIGPIPE signal from being raised when writing on disconnected pipes or sockets. The flag is handled directly by the pipe filesystem and converted to the existing MSG_NOSIGNAL flag for sockets - Allow passing the pid namespace as a procfs mount option Ever since the introduction of pid namespaces, procfs has had very implicit behaviour surrounding them (the pidns used by a procfs mount is auto-selected based on the mounting process's active pidns, and the pidns itself is basically hidden once the mount has been constructed) This implicit behaviour has historically meant that userspace was required to do some special dances in order to configure the pidns of a procfs mount as desired. Examples include: * In order to bypass the mnt_too_revealing() check, Kubernetes creates a procfs mount from an empty pidns so that user namespaced containers can be nested (without this, the nested containers would fail to mount procfs) But this requires forking off a helper process because you cannot just one-shot this using mount(2) * Container runtimes in general need to fork into a container before configuring its mounts, which can lead to security issues in the case of shared-pidns containers (a privileged process in the pidns can interact with your container runtime process) While SUID_DUMP_DISABLE and user namespaces make this less of an issue, the strict need for this due to a minor uAPI wart is kind of unfortunate Things would be much easier if there was a way for userspace to just specify the pidns they want.
So this pull request contains changes to implement a new "pidns" argument which can be set using fsconfig(2): fsconfig(procfd, FSCONFIG_SET_FD, "pidns", NULL, nsfd); fsconfig(procfd, FSCONFIG_SET_STRING, "pidns", "/proc/self/ns/pid", 0); or classic mount(2) / mount(8): // mount -t proc -o pidns=/proc/self/ns/pid proc /tmp/proc mount("proc", "/tmp/proc", "proc", MS_..., "pidns=/proc/self/ns/pid"); Cleanups: - Remove the last references to EXPORT_OP_ASYNC_LOCK - Make file_remove_privs_flags() static - Remove redundant __GFP_NOWARN when GFP_NOWAIT is used - Use try_cmpxchg() in start_dir_add() - Use try_cmpxchg() in sb_init_done_wq() - Replace offsetof() with struct_size() in ioctl_file_dedupe_range() - Remove vfs_ioctl() export - Replace rwlock() with spinlock in epoll code as rwlock causes priority inversion on preempt rt kernels - Make ns_entries in fs/proc/namespaces const - Use a switch statement in init_special_inode() just like we do in may_open() - Use struct_size() in dir_add() in the initramfs code - Use str_plural() in rd_load_image() - Replace strcpy() with strscpy() in find_link() - Rename generic_delete_inode() to inode_just_drop() and generic_drop_inode() to inode_generic_drop() - Remove unused arguments from fcntl_{g,s}et_rw_hint() Fixes: - Document @name parameter for name_contains_dotdot() helper - Fix spelling mistake - Always return zero from replace_fd() instead of the file descriptor number - Limit the size for copy_file_range() in compat mode to prevent a signed overflow - Fix debugfs mount options not being applied - Verify the inode mode when loading it from disk in minixfs - Verify the inod