| Age | Commit message (Collapse) | Author | Files | Lines |
|
Replace manual range and mask validations with netlink policy
annotations in ctnetlink code paths, so that the netlink core rejects
invalid values early and can generate extack errors.
- CTA_PROTOINFO_TCP_STATE: reject values > TCP_CONNTRACK_SYN_SENT2 at
policy level, removing the manual >= TCP_CONNTRACK_MAX check.
- CTA_PROTOINFO_TCP_WSCALE_ORIGINAL/REPLY: reject values > TCP_MAX_WSCALE
(14). The normal TCP option parsing path already clamps to this value,
but the ctnetlink path accepted 0-255, causing undefined behavior when
used as a u32 shift count.
- CTA_FILTER_ORIG_FLAGS/REPLY_FLAGS: use NLA_POLICY_MASK with
CTA_FILTER_F_ALL, removing the manual mask checks.
- CTA_EXPECT_FLAGS: use NLA_POLICY_MASK with NF_CT_EXPECT_MASK, adding
a new mask define grouping all valid expect flags.
Extracted from a broader nf-next patch by Florian Westphal, scoped to
ctnetlink for the fixes tree.
Fixes: c8e2078cfe41 ("[NETFILTER]: ctnetlink: add support for internal tcp connection tracking flags handling")
Signed-off-by: David Carlier <devnexen@gmail.com>
Co-developed-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
|
|
Pull kvm fixes from Paolo Bonzini:
"Quite a large pull request, partly due to skipping last week and
therefore having material from ~all submaintainers in this one. About
a fourth of it is a new selftest, and a couple more changes are large
in number of files touched (fixing a -Wflex-array-member-not-at-end
compiler warning) or lines changed (reformatting of a table in the API
documentation, thanks rST).
But who am I kidding---it's a lot of commits and there are a lot of
bugs being fixed here, some of them on the nastier side like the
RISC-V ones.
ARM:
- Correctly handle deactivation of interrupts that were activated
from LRs. Since EOIcount only denotes deactivation of interrupts
that are not present in an LR, start EOIcount deactivation walk
*after* the last irq that made it into an LR
- Avoid calling into the stubs to probe for ICH_VTR_EL2.TDS when pKVM
is already enabled -- not only thhis isn't possible (pKVM will
reject the call), but it is also useless: this can only happen for
a CPU that has already booted once, and the capability will not
change
- Fix a couple of low-severity bugs in our S2 fault handling path,
affecting the recently introduced LS64 handling and the even more
esoteric handling of hwpoison in a nested context
- Address yet another syzkaller finding in the vgic initialisation,
where we would end-up destroying an uninitialised vgic with nasty
consequences
- Address an annoying case of pKVM failing to boot when some of the
memblock regions that the host is faulting in are not page-aligned
- Inject some sanity in the NV stage-2 walker by checking the limits
against the advertised PA size, and correctly report the resulting
faults
PPC:
- Fix a PPC e500 build error due to a long-standing wart that was
exposed by the recent conversion to kmalloc_obj(); rip out all the
ugliness that led to the wart
RISC-V:
- Prevent speculative out-of-bounds access using array_index_nospec()
in APLIC interrupt handling, ONE_REG regiser access, AIA CSR
access, float register access, and PMU counter access
- Fix potential use-after-free issues in kvm_riscv_gstage_get_leaf(),
kvm_riscv_aia_aplic_has_attr(), and kvm_riscv_aia_imsic_has_attr()
- Fix potential null pointer dereference in
kvm_riscv_vcpu_aia_rmw_topei()
- Fix off-by-one array access in SBI PMU
- Skip THP support check during dirty logging
- Fix error code returned for Smstateen and Ssaia ONE_REG interface
- Check host Ssaia extension when creating AIA irqchip
x86:
- Fix cases where CPUID mitigation features were incorrectly marked
as available whenever the kernel used scattered feature words for
them
- Validate _all_ GVAs, rather than just the first GVA, when
processing a range of GVAs for Hyper-V's TLB flush hypercalls
- Fix a brown paper bug in add_atomic_switch_msr()
- Use hlist_for_each_entry_srcu() when traversing mask_notifier_list,
to fix a lockdep warning; KVM doesn't hold RCU, just irq_srcu
- Ensure AVIC VMCB fields are initialized if the VM has an in-kernel
local APIC (and AVIC is enabled at the module level)
- Update CR8 write interception when AVIC is (de)activated, to fix a
bug where the guest can run in perpetuity with the CR8 intercept
enabled
- Add a quirk to skip the consistency check on FREEZE_IN_SMM, i.e. to
allow L1 hypervisors to set FREEZE_IN_SMM. This reverts (by
default) an unintentional tightening of userspace ABI in 6.17, and
provides some amount of backwards compatibility with hypervisors
who want to freeze PMCs on VM-Entry
- Validate the VMCS/VMCB on return to a nested guest from SMM,
because either userspace or the guest could stash invalid values in
memory and trigger the processor's consistency checks
Generic:
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from
being unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end
- Document that vcpu->mutex is take outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite
being rather unintuitive
Selftests:
- Increase the maximum number of NUMA nodes in the guest_memfd
selftest to 64 (from 8)"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (43 commits)
KVM: selftests: Verify SEV+ guests can read and write EFER, CR0, CR4, and CR8
Documentation: kvm: fix formatting of the quirks table
KVM: x86: clarify leave_smm() return value
selftests: kvm: add a test that VMX validates controls on RSM
selftests: kvm: extract common functionality out of smm_test.c
KVM: SVM: check validity of VMCB controls when returning from SMM
KVM: VMX: check validity of VMCS controls when returning from SMM
KVM: SVM: Set/clear CR8 write interception when AVIC is (de)activated
KVM: SVM: Initialize AVIC VMCB fields if AVIC is enabled with in-kernel APIC
KVM: x86: Introduce KVM_X86_QUIRK_VMCS12_ALLOW_FREEZE_IN_SMM
KVM: x86: Fix SRCU list traversal in kvm_fire_mask_notifiers()
KVM: VMX: Fix a wrong MSR update in add_atomic_switch_msr()
KVM: x86: hyper-v: Validate all GVAs during PV TLB flush
KVM: x86: synthesize CPUID bits only if CPU capability is set
KVM: PPC: e500: Rip out "struct tlbe_ref"
KVM: PPC: e500: Fix build error due to using kmalloc_obj() with wrong type
KVM: selftests: Increase 'maxnode' for guest_memfd tests
KVM: arm64: pkvm: Don't reprobe for ICH_VTR_EL2.TDS on CPU hotplug
KVM: arm64: vgic: Pick EOIcount deactivations from AP-list tail
KVM: arm64: Remove the redundant ISB in __kvm_at_s1e2()
...
|
|
HEAD
KVM generic changes for 7.0
- Remove a subtle pseudo-overlay of kvm_stats_desc, which, aside from being
unnecessary and confusing, triggered compiler warnings due to
-Wflex-array-member-not-at-end.
- Document that vcpu->mutex is take outside of kvm->slots_lock and
kvm->slots_arch_lock, which is intentional and desirable despite being
rather unintuitive.
|
|
Pull drm fixes from Dave Airlie:
"Weekly fixes pull.
There is one mm fix in here for a HMM livelock triggered by the xe
driver tests. Otherwise it's a pretty wide range of fixes across the
board, ttm UAF regression fix, amdgpu fixes, nouveau doesn't crash my
laptop anymore fix, and a fair bit of misc.
Seems about right for rc3.
mm:
- mm: Fix a hmm_range_fault() livelock / starvation problem
pagemap:
- Revert "drm/pagemap: Disable device-to-device migration"
ttm:
- fix function return breaking reclaim
- fix build failure on PREEMPT_RT
- fix bo->resource UAF
dma-buf:
- include ioctl.h in uapi header
sched:
- fix kernel doc warning
amdgpu:
- LUT fixes
- VCN5 fix
- Dispclk fix
- SMU 13.x fix
- Fix race in VM acquire
- PSP 15.x fix
- UserQ fix
amdxdna:
- fix invalid payload for failed command
- fix NULL ptr dereference
- fix major fw version check
- avoid inconsistent fw state on error
i915/display:
- Fix for Lenovo T14 G7 display not refreshing
xe:
- Do not preempt fence signaling CS instructions
- Some leak and finalization fixes
- Workaround fix
nouveau:
- avoid runtime suspend oops when using dp aux
panthor:
- fix gem_sync argument ordering
solomon:
- fix incorrect display output
renesas:
- fix DSI divider programming
ethosu:
- fix job submit error clean-up refcount
- fix NPU_OP_ELEMENTWISE validation
- handle possible underflows in IFM size calcs"
* tag 'drm-fixes-2026-03-07' of https://gitlab.freedesktop.org/drm/kernel: (38 commits)
accel: ethosu: Handle possible underflow in IFM size calculations
accel: ethosu: Fix NPU_OP_ELEMENTWISE validation with scalar
accel: ethosu: Fix job submit error clean-up refcount underflows
accel/amdxdna: Split mailbox channel create function
drm/panthor: Correct the order of arguments passed to gem_sync
Revert "drm/syncobj: Fix handle <-> fd ioctls with dirty stack"
drm/ttm: Fix bo resource use-after-free
nouveau/dpcd: return EBUSY for aux xfer if the device is asleep
accel/amdxdna: Fix major version check on NPU1 platform
drm/amdgpu/userq: refcount userqueues to avoid any race conditions
drm/amdgpu/userq: Consolidate wait ioctl exit path
drm/amdgpu/psp: Use Indirect access address for GFX to PSP mailbox
drm/amdgpu: Fix use-after-free race in VM acquire
drm/amd/pm: remove invalid gpu_metrics.energy_accumulator on smu v13.0.x
drm/xe: Fix memory leak in xe_vm_madvise_ioctl
drm/xe/reg_sr: Fix leak on xa_store failure
drm/xe/xe2_hpg: Correct implementation of Wa_16025250150
drm/xe/gsc: Fix GSC proxy cleanup on early initialization failure
Revert "drm/pagemap: Disable device-to-device migration"
drm/i915/psr: Fix for Panel Replay X granularity DPCD register handling
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Fix a typo in the mock_file help text
- Fix a comment regarding IORING_SETUP_TASKRUN_FLAG in the
io_uring.h UAPI header
- Use READ_ONCE() for reading refill queue entries
- Reject SEND_VECTORIZED for fixed buffer sends, as it isn't
implemented. Currently this flag is silently ignored
This is in preparation for making these work, but first we
need a fixup so that older kernels will correctly reject them
- Ensure "0" means default for the rx page size
* tag 'io_uring-7.0-20260305' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/zcrx: use READ_ONCE with user shared RQEs
io_uring/mock: Fix typo in help text
io_uring/net: reject SEND_VECTORIZED when unsupported
io_uring: correct comment for IORING_SETUP_TASKRUN_FLAG
io_uring/zcrx: don't set rx_page_size when not requested
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes
A return type fix for ttm, a display fix for solomon, several misc fixes
for amdxdna, a DSI clock rate fix for rz-du, a uapi fix for syncobj, a
possible build failure fix for dma-buf, a doc warning fix for sched, a
build failure fix for ttm tests, and a crash fix when suspended for
nouveau.
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://patch.msgid.link/20260305-ludicrous-quirky-raven-7cdafd@houat
|
|
include/uapi/linux/dma-buf.h uses several macros from ioctl.h to define
its ioctl commands. However, it does not include ioctl.h itself. So,
if userspace source code tries to include the dma-buf.h file without
including ioctl.h, it can result in build failures.
Therefore, include ioctl.h in the dma-buf UAPI header.
Signed-off-by: Isaac J. Manjarres <isaacmanjarres@google.com>
Reviewed-by: T.J. Mercier <tjmercier@google.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Christian König <christian.koenig@amd.com>
Link: https://lore.kernel.org/r/20260303002309.1401849-1-isaacmanjarres@google.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler fixes from Ingo Molnar:
- Fix zero_vruntime tracking when there's a single task running
- Fix slice protection logic
- Fix the ->vprot logic for reniced tasks
- Fix lag clamping in mixed slice workloads
- Fix objtool uaccess warning (and bug) in the
!CONFIG_RSEQ_SLICE_EXTENSION case caused by unexpected un-inlining,
which triggers with older compilers
- Fix a comment in the rseq registration rseq_size bound check code
- Fix a legacy RSEQ ABI quirk that handled 32-byte area sizes
differently, which special size we now reached naturally and want to
avoid. The visible ugliness of the new reserved field will be avoided
the next time the RSEQ area is extended.
* tag 'sched-urgent-2026-03-01' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
rseq: slice ext: Ensure rseq feature size differs from original rseq size
rseq: Clarify rseq registration rseq_size bound check comment
sched/core: Fix wakeup_preempt's next_class tracking
rseq: Mark rseq_arm_slice_extension_timer() __always_inline
sched/fair: Fix lag clamp
sched/eevdf: Update se->vprot in reweight_entity()
sched/fair: Only set slice protection at pick time
sched/fair: Fix zero_vruntime tracking
|
|
Sync with a recent liburing fix, which corrects the comment explaining
when the IORING_SETUP_TASKRUN_FLAG setup flag is valid to use. May be
use with COOP_TASKRUN or DEFER_TASKRUN, not useful without either of
this task_work mechanisms being used.
Link: https://github.com/axboe/liburing/pull/1543
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
fb82437fdd8c ("PCI: Change capability register offsets to hex") incorrectly
converted the PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 value from decimal 52 to hex
0x32:
-#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 52 /* v2 endpoints with link end here */
+#define PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 0x32 /* end of v2 EPs w/ link */
This broke PCI capabilities in a VMM because subsequent ones weren't
DWORD-aligned.
Change PCI_CAP_EXP_ENDPOINT_SIZEOF_V2 to the correct value of 0x34.
fb82437fdd8c was from Baruch Siach <baruch@tkos.co.il>, but this was not
Baruch's fault; it's a mistake I made when applying the patch.
Fixes: fb82437fdd8c ("PCI: Change capability register offsets to hex")
Reported-by: David Woodhouse <dwmw2@infradead.org>
Closes: https://lore.kernel.org/all/3ae392a0158e9d9ab09a1d42150429dd8ca42791.camel@infradead.org
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Krzysztof Wilczyński <kwilczynski@kernel.org>
|
|
Before rseq became extensible, its original size was 32 bytes even
though the active rseq area was only 20 bytes. This had the following
impact in terms of userspace ecosystem evolution:
* The GNU libc between 2.35 and 2.39 expose a __rseq_size symbol set
to 32, even though the size of the active rseq area is really 20.
* The GNU libc 2.40 changes this __rseq_size to 20, thus making it
express the active rseq area.
* Starting from glibc 2.41, __rseq_size corresponds to the
AT_RSEQ_FEATURE_SIZE from getauxval(3).
This means that users of __rseq_size can always expect it to
correspond to the active rseq area, except for the value 32, for
which the active rseq area is 20 bytes.
Exposing a 32 bytes feature size would make life needlessly painful
for userspace. Therefore, add a reserved field at the end of the
rseq area to bump the feature size to 33 bytes. This reserved field
is expected to be replaced with whatever field will come next,
expecting that this field will be larger than 1 byte.
The effect of this change is to increase the size from 32 to 64 bytes
before we actually have fields using that memory.
Clarify the allocation size and alignment requirements in the struct
rseq uapi comment.
Change the value returned by getauxval(AT_RSEQ_ALIGN) to return the
value of the active rseq area size rounded up to next power of 2, which
guarantees that the rseq structure will always be aligned on the nearest
power of two large enough to contain it, even as it grows. Change the
alignment check in the rseq registration accordingly.
This will minimize the amount of ABI corner-cases we need to document
and require userspace to play games with. The rule stays simple when
__rseq_size != 32:
#define rseq_field_available(field) (__rseq_size >= offsetofend(struct rseq_abi, field))
Signed-off-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260220200642.1317826-3-mathieu.desnoyers@efficios.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux
Pull Hyper-V updates from Wei Liu:
- Debugfs support for MSHV statistics (Nuno Das Neves)
- Support for the integrated scheduler (Stanislav Kinsburskii)
- Various fixes for MSHV memory management and hypervisor status
handling (Stanislav Kinsburskii)
- Expose more capabilities and flags for MSHV partition management
(Anatol Belski, Muminul Islam, Magnus Kulke)
- Miscellaneous fixes to improve code quality and stability (Carlos
López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh
Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros
Bizjak)
- PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka)
* tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits)
mshv: Handle insufficient root memory hypervisor statuses
mshv: Handle insufficient contiguous memory hypervisor status
mshv: Introduce hv_deposit_memory helper functions
mshv: Introduce hv_result_needs_memory() helper function
mshv: Add SMT_ENABLED_GUEST partition creation flag
mshv: Add nested virtualization creation flag
Drivers: hv: vmbus: Simplify allocation of vmbus_evt
mshv: expose the scrub partition hypercall
mshv: Add support for integrated scheduler
mshv: Use try_cmpxchg() instead of cmpxchg()
x86/hyperv: Fix error pointer dereference
x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV
Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT
x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn
x86/hyperv: Use savesegment() instead of inline asm() to save segment registers
mshv: fix SRCU protection in irqfd resampler ack handler
mshv: make field names descriptive in a header struct
x86/hyperv: Update comment in hyperv_cleanup()
mshv: clear eventfd counter on irqfd shutdown
x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Jakub Kicinski:
"Including fixes from Netfilter.
Current release - new code bugs:
- net: fix backlog_unlock_irq_restore() vs CONFIG_PREEMPT_RT
- eth: mlx5e: XSK, Fix unintended ICOSQ change
- phy_port: correctly recompute the port's linkmodes
- vsock: prevent child netns mode switch from local to global
- couple of kconfig fixes for new symbols
Previous releases - regressions:
- nfc: nci: fix false-positive parameter validation for packet data
- net: do not delay zero-copy skbs in skb_attempt_defer_free()
Previous releases - always broken:
- mctp: ensure our nlmsg responses to user space are zero-initialised
- ipv6: ioam: fix heap buffer overflow in __ioam6_fill_trace_data()
- fixes for ICMP rate limiting
Misc:
- intel: fix PCI device ID conflict between i40e and ipw2200"
* tag 'net-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (85 commits)
net: nfc: nci: Fix parameter validation for packet data
net/mlx5e: Use unsigned for mlx5e_get_max_num_channels
net/mlx5e: Fix deadlocks between devlink and netdev instance locks
net/mlx5e: MACsec, add ASO poll loop in macsec_aso_set_arm_event
net/mlx5: Fix misidentification of write combining CQE during poll loop
net/mlx5e: Fix misidentification of ASO CQE during poll loop
net/mlx5: Fix multiport device check over light SFs
bonding: alb: fix UAF in rlb_arp_recv during bond up/down
bnge: fix reserving resources from FW
eth: fbnic: Advertise supported XDP features.
rds: tcp: fix uninit-value in __inet_bind
net/rds: Fix NULL pointer dereference in rds_tcp_accept_one
octeontx2-af: Fix default entries mcam entry action
net/mlx5e: XSK, Fix unintended ICOSQ change
ipv6: icmp: icmpv6_xrlim_allow() optimization if net.ipv6.icmp.ratelimit is zero
ipv4: icmp: icmpv4_xrlim_allow() optimization if net.ipv4.icmp_ratelimit is zero
ipv6: icmp: remove obsolete code in icmpv6_xrlim_allow()
inet: move icmp_global_{credit,stamp} to a separate cache line
icmp: prevent possible overflow in icmp_global_allow()
selftests/net: packetdrill: add ipv4-mapped-ipv6 tests
...
|
|
Add support for HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST
to allow userspace VMMs to enable SMT for guest partitions.
Expose this via new MSHV_PT_BIT_SMT_ENABLED_GUEST flag in the UAPI.
Without this flag, the hypervisor schedules guest VPs incorrectly,
causing SMT unusable.
Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to
indicate support for nested virtualization during partition creation.
This enables clearer configuration and capability checks for nested
virtualization scenarios.
Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com>
Signed-off-by: Muminul Islam <muislam@microsoft.com>
Signed-off-by: Wei Liu <wei.liu@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine
Pull dmaengine updates from Vinod Koul:
"Core:
- Add Frank Li as susbstem reviewer to help with reviews
New Support:
- Mediatek support for Dimensity 6300 and 9200 controller
- Qualcomm Kaanapali and Glymur GPI DMA engine
- Synopsis DW AXI Agilex5
- Renesas RZ/V2N SoC
- Atmel microchip lan9691-dma
- Tegra ADMA tegra264
Updates:
- sg_nents_for_dma() helper use in subsystem
- pm_runtime_mark_last_busy() redundant call update for subsystem
- Residue support for xilinx AXIDMA driver
- Intel Max SGL Size Support and capabilities for DSA3.0
- AXI dma larger than 32bits address support"
* tag 'dmaengine-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/vkoul/dmaengine: (64 commits)
dmaengine: add Frank Li as reviewer
dt-bindings: dma: qcom,gpi: Update max interrupts lines to 16
dmaengine: fsl-edma: don't explicitly disable clocks in .remove()
dmaengine: xilinx: xdma: use sg_nents_for_dma() helper
dmaengine: sh: use sg_nents_for_dma() helper
dmaengine: sa11x0: use sg_nents_for_dma() helper
dmaengine: qcom: bam_dma: use sg_nents_for_dma() helper
dmaengine: qcom: adm: use sg_nents_for_dma() helper
dmaengine: pxa-dma: use sg_nents_for_dma() helper
dmaengine: lgm: use sg_nents_for_dma() helper
dmaengine: k3dma: use sg_nents_for_dma() helper
dmaengine: dw-axi-dmac: use sg_nents_for_dma() helper
dmaengine: bcm2835-dma: use sg_nents_for_dma() helper
dmaengine: axi-dmac: use sg_nents_for_dma() helper
dmaengine: altera-msgdma: use sg_nents_for_dma() helper
scatterlist: introduce sg_nents_for_dma() helper
dmaengine: idxd: Add Max SGL Size Support for DSA3.0
dmaengine: idxd: Expose DSA3.0 capabilities through sysfs
dmaengine: sh: rz-dmac: Make channel irq local
dmaengine: pl08x: Fix comment stating the difference between PL080 and PL081
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/IIO driver updates from Greg KH:
"Here is the big set of char/misc/iio and other smaller driver
subsystem changes for 7.0-rc1. Lots of little things in here,
including:
- Loads of iio driver changes and updates and additions
- gpib driver updates
- interconnect driver updates
- i3c driver updates
- hwtracing (coresight and intel) driver updates
- deletion of the obsolete mwave driver
- binder driver updates (rust and c versions)
- mhi driver updates (causing a merge conflict, see below)
- mei driver updates
- fsi driver updates
- eeprom driver updates
- lots of other small char and misc driver updates and cleanups
All of these have been in linux-next for a while, with no reported
issues"
* tag 'char-misc-7.0-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (297 commits)
mux: mmio: fix regmap leak on probe failure
rust_binder: return p from rust_binder_transaction_target_node()
drivers: android: binder: Update ARef imports from sync::aref
rust_binder: fix needless borrow in context.rs
iio: magn: mmc5633: Fix Kconfig for combination of I3C as module and driver builtin
iio: sca3000: Fix a resource leak in sca3000_probe()
iio: proximity: rfd77402: Add interrupt handling support
iio: proximity: rfd77402: Document device private data structure
iio: proximity: rfd77402: Use devm-managed mutex initialization
iio: proximity: rfd77402: Use kernel helper for result polling
iio: proximity: rfd77402: Align polling timeout with datasheet
iio: cros_ec: Allow enabling/disabling calibration mode
iio: frequency: ad9523: correct kernel-doc bad line warning
iio: buffer: buffer_impl.h: fix kernel-doc warnings
iio: gyro: itg3200: Fix unchecked return value in read_raw
MAINTAINERS: add entry for ADE9000 driver
iio: accel: sca3000: remove unused last_timestamp field
iio: accel: adxl372: remove unused int2_bitmask field
iio: adc: ad7766: Use iio_trigger_generic_data_rdy_poll()
iio: magnetometer: Remove IRQF_ONESHOT
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull more io_uring updates from Jens Axboe:
"This is a mix of cleanups and fixes. No major fixes in here, just a
bunch of little fixes. Some of them marked for stable as it fixes
behavioral issues
- Fix an issue with SOCKET_URING_OP_SETSOCKOPT for netlink sockets,
due to a too restrictive check on it having an ioctl handler
- Remove a redundant SQPOLL check in ring creation
- Kill dead accounting for zero-copy send, which doesn't use ->buf
or ->len post the initial setup
- Fix missing clamp of the allocation hint, which could cause
allocations to fall outside of the range the application asked
for. Still within the allowed limits.
- Fix for IORING_OP_PIPE's handling of direct descriptors
- Tweak to the API for the newly added BPF filters, making them
more future proof in terms of how applications deal with them
- A few fixes for zcrx, fixing a few error handling conditions
- Fix for zcrx request flag checking
- Add support for querying the zcrx page size
- Improve the NO_SQARRAY static branch inc/dec, avoiding busy
conditions causing too much traffic
- Various little cleanups"
* tag 'io_uring-7.0-20260216' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/bpf_filter: pass in expected filter payload size
io_uring/bpf_filter: move filter size and populate helper into struct
io_uring/cancel: de-unionize file and user_data in struct io_cancel_data
io_uring/rsrc: improve regbuf iov validation
io_uring: remove unneeded io_send_zc accounting
io_uring/cmd_net: fix too strict requirement on ioctl
io_uring: delay sqarray static branch disablement
io_uring/query: add query.h copyright notice
io_uring/query: return support for custom rx page size
io_uring/zcrx: check unsupported flags on import
io_uring/zcrx: fix post open error handling
io_uring/zcrx: fix sgtable leak on mapping failures
io_uring: use the right type for creds iteration
io_uring/openclose: fix io_pipe_fixed() slot tracking for specific slots
io_uring/filetable: clamp alloc_hint to the configured alloc range
io_uring/rsrc: replace reg buffer bit field with flags
io_uring/zcrx: improve types for size calculation
io_uring/tctx: avoid modifying loop variable in io_ring_add_registered_file
io_uring: simplify IORING_SETUP_DEFER_TASKRUN && !SQPOLL check
|
|
Musl defines its own struct ethhdr and thus defines __UAPI_DEF_ETHHDR to
zero. To avoid struct redefinition errors, user space is therefore
supposed to include netinet/if_ether.h before (or instead of)
linux/if_ether.h. To relieve them from this burden, include the libc
header here if not building for kernel space.
Reported-by: Alyssa Ross <hi@alyssa.is>
Suggested-by: Florian Westphal <fw@strlen.de>
Signed-off-by: Phil Sutter <phil@nwl.cc>
Signed-off-by: Florian Westphal <fw@strlen.de>
|
|
It's quite possible that opcodes that have payloads attached to them,
like IORING_OP_OPENAT/OPENAT2 or IORING_OP_SOCKET, that these paylods
can change over time. For example, on the openat/openat2 side, the
struct open_how argument is extensible, and could be extended in the
future to allow further arguments to be passed in.
Allow registration of a cBPF filter to give the size of the filter as
seen by userspace. If that filter is for an opcode that takes extra
payload data, allow it if the application payload expectation is the
same size than the kernels. If that is the case, the kernel supports
filtering on the payload that the application expects. If the size
differs, the behavior depends on the IO_URING_BPF_FILTER_SZ_STRICT flag:
1) If IO_URING_BPF_FILTER_SZ_STRICT is set and the size expectation
differs, fail the attempt to load the filter.
2) If IO_URING_BPF_FILTER_SZ_STRICT isn't set, allow the filter if
the userspace pdu size is smaller than what the kernel offers.
3) Regardless if IO_URING_BPF_FILTER_SZ_STRICT, fail loading the filter
if the userspace pdu size is bigger than what the kernel supports.
An attempt to load a filter due to sizing will error with -EMSGSIZE.
For that error, the registration struct will have filter->pdu_size
populated with the pdu size that the kernel uses.
Reported-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull more misc vfs updates from Christian Brauner:
"Features:
- Optimize close_range() from O(range size) to O(active FDs) by using
find_next_bit() on the open_fds bitmap instead of linearly scanning
the entire requested range. This is a significant improvement for
large-range close operations on sparse file descriptor tables.
- Add FS_XFLAG_VERITY file attribute for fs-verity files, retrievable
via FS_IOC_FSGETXATTR and file_getattr(). The flag is read-only.
Add tracepoints for fs-verity enable and verify operations,
replacing the previously removed debug printk's.
- Prevent nfsd from exporting special kernel filesystems like pidfs
and nsfs. These filesystems have custom ->open() and ->permission()
export methods that are designed for open_by_handle_at(2) only and
are incompatible with nfsd. Update the exportfs documentation
accordingly.
Fixes:
- Fix KMSAN uninit-value in ovl_fill_real() where strcmp() was used
on a non-null-terminated decrypted directory entry name from
fscrypt. This triggered on encrypted lower layers when the
decrypted name buffer contained uninitialized tail data.
The fix also adds VFS-level name_is_dot(), name_is_dotdot(), and
name_is_dot_dotdot() helpers, replacing various open-coded "." and
".." checks across the tree.
- Fix read-only fsflags not being reset together with xflags in
vfs_fileattr_set(). Currently harmless since no read-only xflags
overlap with flags, but this would cause inconsistencies for any
future shared read-only flag
- Return -EREMOTE instead of -ESRCH from PIDFD_GET_INFO when the
target process is in a different pid namespace. This lets userspace
distinguish "process exited" from "process in another namespace",
matching glibc's pidfd_getpid() behavior
Cleanups:
- Use C-string literals in the Rust seq_file bindings, replacing the
kernel::c_str!() macro (available since Rust 1.77)
- Fix typo in d_walk_ret enum comment, add porting notes for the
readlink_copy() calling convention change"
* tag 'vfs-7.0-rc1.misc.2' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
fs: add porting notes about readlink_copy()
pidfs: return -EREMOTE when PIDFD_GET_INFO is called on another ns
nfsd: do not allow exporting of special kernel filesystems
exportfs: clarify the documentation of open()/permission() expotrfs ops
fsverity: add tracepoints
fs: add FS_XFLAG_VERITY for fs-verity files
rust: seq_file: replace `kernel::c_str!` with C-Strings
fs: dcache: fix typo in enum d_walk_ret comment
ovl: use name_is_dot* helpers in readdir code
fs: add helpers name_is_dot{,dot,_dotdot}
ovl: Fix uninit-value in ovl_fill_real
fs: reset read-only fsflags together with xflags
fs/file: optimize close_range() complexity from O(N) to O(Sparse)
|
|
Add a copyright notice to io_uring's query uapi header.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add an ability to query if the zcrx rx page size setting is available.
Note, even when the API is supported by io_uring, the registration can
still get rejected for various reasons, e.g. when the NIC or the driver
doesn't support it, when the particular specified size is unsupported,
when the memory area doesn't satisfy all requirements, etc.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Pull virtio updates from Michael Tsirkin:
- in-order support in virtio core
- multiple address space support in vduse
- fixes, cleanups all over the place, notably dma alignment fixes for
non-cache-coherent systems
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (59 commits)
vduse: avoid adding implicit padding
vhost: fix caching attributes of MMIO regions by setting them explicitly
vdpa/mlx5: update MAC address handling in mlx5_vdpa_set_attr()
vdpa/mlx5: reuse common function for MAC address updates
vdpa/mlx5: update mlx_features with driver state check
crypto: virtio: Replace package id with numa node id
crypto: virtio: Remove duplicated virtqueue_kick in virtio_crypto_skcipher_crypt_req
crypto: virtio: Add spinlock protection with virtqueue notification
Documentation: Add documentation for VDUSE Address Space IDs
vduse: bump version number
vduse: add vq group asid support
vduse: merge tree search logic of IOTLB_GET_FD and IOTLB_GET_INFO ioctls
vduse: take out allocations from vduse_dev_alloc_coherent
vduse: remove unused vaddr parameter of vduse_domain_free_coherent
vduse: refactor vdpa_dev_add for goto err handling
vhost: forbid change vq groups ASID if DRIVER_OK is set
vdpa: document set_group_asid thread safety
vduse: return internal vq group struct as map token
vduse: add vq group support
vduse: add v1 API definition
...
|
|
Pull KVM updates from Paolo Bonzini:
"Loongarch:
- Add more CPUCFG mask bits
- Improve feature detection
- Add lazy load support for FPU and binary translation (LBT) register
state
- Fix return value for memory reads from and writes to in-kernel
devices
- Add support for detecting preemption from within a guest
- Add KVM steal time test case to tools/selftests
ARM:
- Add support for FEAT_IDST, allowing ID registers that are not
implemented to be reported as a normal trap rather than as an UNDEF
exception
- Add sanitisation of the VTCR_EL2 register, fixing a number of
UXN/PXN/XN bugs in the process
- Full handling of RESx bits, instead of only RES0, and resulting in
SCTLR_EL2 being added to the list of sanitised registers
- More pKVM fixes for features that are not supposed to be exposed to
guests
- Make sure that MTE being disabled on the pKVM host doesn't give it
the ability to attack the hypervisor
- Allow pKVM's host stage-2 mappings to use the Force Write Back
version of the memory attributes by using the "pass-through'
encoding
- Fix trapping of ICC_DIR_EL1 on GICv5 hosts emulating GICv3 for the
guest
- Preliminary work for guest GICv5 support
- A bunch of debugfs fixes, removing pointless custom iterators
stored in guest data structures
- A small set of FPSIMD cleanups
- Selftest fixes addressing the incorrect alignment of page
allocation
- Other assorted low-impact fixes and spelling fixes
RISC-V:
- Fixes for issues discoverd by KVM API fuzzing in
kvm_riscv_aia_imsic_has_attr(), kvm_riscv_aia_imsic_rw_attr(), and
kvm_riscv_vcpu_aia_imsic_update()
- Allow Zalasr, Zilsd and Zclsd extensions for Guest/VM
- Transparent huge page support for hypervisor page tables
- Adjust the number of available guest irq files based on MMIO
register sizes found in the device tree or the ACPI tables
- Add RISC-V specific paging modes to KVM selftests
- Detect paging mode at runtime for selftests
s390:
- Performance improvement for vSIE (aka nested virtualization)
- Completely new memory management. s390 was a special snowflake that
enlisted help from the architecture's page table management to
build hypervisor page tables, in particular enabling sharing the
last level of page tables. This however was a lot of code (~3K
lines) in order to support KVM, and also blocked several features.
The biggest advantages is that the page size of userspace is
completely independent of the page size used by the guest:
userspace can mix normal pages, THPs and hugetlbfs as it sees fit,
and in fact transparent hugepages were not possible before. It's
also now possible to have nested guests and guests with huge pages
running on the same host
- Maintainership change for s390 vfio-pci
- Small quality of life improvement for protected guests
x86:
- Add support for giving the guest full ownership of PMU hardware
(contexted switched around the fastpath run loop) and allowing
direct access to data MSRs and PMCs (restricted by the vPMU model).
KVM still intercepts access to control registers, e.g. to enforce
event filtering and to prevent the guest from profiling sensitive
host state. This is more accurate, since it has no risk of
contention and thus dropped events, and also has significantly less
overhead.
For more information, see the commit message for merge commit
bf2c3138ae36 ("Merge tag 'kvm-x86-pmu-6.20' ...")
- Disallow changing the virtual CPU model if L2 is active, for all
the same reasons KVM disallows change the model after the first
KVM_RUN
- Fix a bug where KVM would incorrectly reject host accesses to PV
MSRs when running with KVM_CAP_ENFORCE_PV_FEATURE_CPUID enabled,
even if those were advertised as supported to userspace,
- Fix a bug with protected guest state (SEV-ES/SNP and TDX) VMs,
where KVM would attempt to read CR3 configuring an async #PF entry
- Fail the build if EXPORT_SYMBOL_GPL or EXPORT_SYMBOL is used in KVM
(for x86 only) to enforce usage of EXPORT_SYMBOL_FOR_KVM_INTERNAL.
Only a few exports that are intended for external usage, and those
are allowed explicitly
- When checking nested events after a vCPU is unblocked, ignore
-EBUSY instead of WARNing. Userspace can sometimes put the vCPU
into what should be an impossible state, and spurious exit to
userspace on -EBUSY does not really do anything to solve the issue
- Also throw in the towel and drop the WARN on INIT/SIPI being
blocked when vCPU is in Wait-For-SIPI, which also resulted in
playing whack-a-mole with syzkaller stuffing architecturally
impossible states into KVM
- Add support for new Intel instructions that don't require anything
beyond enumerating feature flags to userspace
- Grab SRCU when reading PDPTRs in KVM_GET_SREGS2
- Add WARNs to guard against modifying KVM's CPU caps outside of the
intended setup flow, as nested VMX in particular is sensitive to
unexpected changes in KVM's golden configuration
- Add a quirk to allow userspace to opt-in to actually suppress EOI
broadcasts when the suppression feature is enabled by the guest
(currently limited to split IRQCHIP, i.e. userspace I/O APIC).
Sadly, simply fixing KVM to honor Suppress EOI Broadcasts isn't an
option as some userspaces have come to rely on KVM's buggy behavior
(KVM advertises Supress EOI Broadcast irrespective of whether or
not userspace I/O APIC supports Directed EOIs)
- Clean up KVM's handling of marking mapped vCPU pages dirty
- Drop a pile of *ancient* sanity checks hidden behind in KVM's
unused ASSERT() macro, most of which could be trivially triggered
by the guest and/or user, and all of which were useless
- Fold "struct dest_map" into its sole user, "struct rtc_status", to
make it more obvious what the weird parameter is used for, and to
allow fropping these RTC shenanigans if CONFIG_KVM_IOAPIC=n
- Bury all of ioapic.h, i8254.h and related ioctls (including
KVM_CREATE_IRQCHIP) behind CONFIG_KVM_IOAPIC=y
- Add a regression test for recent APICv update fixes
- Handle "hardware APIC ISR", a.k.a. SVI, updates in
kvm_apic_update_apicv() to consolidate the updates, and to
co-locate SVI updates with the updates for KVM's own cache of ISR
information
- Drop a dead function declaration
- Minor cleanups
x86 (Intel):
- Rework KVM's handling of VMCS updates while L2 is active to
temporarily switch to vmcs01 instead of deferring the update until
the next nested VM-Exit.
The deferred updates approach directly contributed to several bugs,
was proving to be a maintenance burden due to the difficulty in
auditing the correctness of deferred updates, and was polluting
"struct nested_vmx" with a growing pile of booleans
- Fix an SGX bug where KVM would incorrectly try to handle EPCM page
faults, and instead always reflect them into the guest. Since KVM
doesn't shadow EPCM entries, EPCM violations cannot be due to KVM
interference and can't be resolved by KVM
- Fix a bug where KVM would register its posted interrupt wakeup
handler even if loading kvm-intel.ko ultimately failed
- Disallow access to vmcb12 fields that aren't fully supported,
mostly to avoid weirdness and complexity for FRED and other
features, where KVM wants enable VMCS shadowing for fields that
conditionally exist
- Print out the "bad" offsets and values if kvm-intel.ko refuses to
load (or refuses to online a CPU) due to a VMCS config mismatch
x86 (AMD):
- Drop a user-triggerable WARN on nested_svm_load_cr3() failure
- Add support for virtualizing ERAPS. Note, correct virtualization of
ERAPS relies on an upcoming, publicly announced change in the APM
to reduce the set of conditions where hardware (i.e. KVM) *must*
flush the RAP
- Ignore nSVM intercepts for instructions that are not supported
according to L1's virtual CPU model
- Add support for expedited writes to the fast MMIO bus, a la VMX's
fastpath for EPT Misconfig
- Don't set GIF when clearing EFER.SVME, as GIF exists independently
of SVM, and allow userspace to restore nested state with GIF=0
- Treat exit_code as an unsigned 64-bit value through all of KVM
- Add support for fetching SNP certificates from userspace
- Fix a bug where KVM would use vmcb02 instead of vmcb01 when
emulating VMLOAD or VMSAVE on behalf of L2
- Misc fixes and cleanups
x86 selftests:
- Add a regression test for TPR<=>CR8 synchronization and IRQ masking
- Overhaul selftest's MMU infrastructure to genericize stage-2 MMU
support, and extend x86's infrastructure to support EPT and NPT
(for L2 guests)
- Extend several nested VMX tests to also cover nested SVM
- Add a selftest for nested VMLOAD/VMSAVE
- Rework the nested dirty log test, originally added as a regression
test for PML where KVM logged L2 GPAs instead of L1 GPAs, to
improve test coverage and to hopefully make the test easier to
understand and maintain
guest_memfd:
- Remove kvm_gmem_populate()'s preparation tracking and half-baked
hugepage handling. SEV/SNP was the only user of the tracking and it
can do it via the RMP
- Retroactively document and enforce (for SNP) that
KVM_SEV_SNP_LAUNCH_UPDATE and KVM_TDX_INIT_MEM_REGION require the
source page to be 4KiB aligned, to avoid non-trivial complexity for
something that no known VMM seems to be doing and to avoid an API
special case for in-place conversion, which simply can't support
unaligned sources
- When populating guest_memfd memory, GUP the source page in common
code and pass the refcounted page to the vendor callback, instead
of letting vendor code do the heavy lifting. Doing so avoids a
looming deadlock bug with in-place due an AB-BA conflict betwee
mmap_lock and guest_memfd's filemap invalidate lock
Generic:
- Fix a bug where KVM would ignore the vCPU's selected address space
when creating a vCPU-specific mapping of guest memory. Actually
this bug could not be hit even on x86, the only architecture with
multiple address spaces, but it's a bug nevertheless"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (267 commits)
KVM: s390: Increase permitted SE header size to 1 MiB
MAINTAINERS: Replace backup for s390 vfio-pci
KVM: s390: vsie: Fix race in acquire_gmap_shadow()
KVM: s390: vsie: Fix race in walk_guest_tables()
KVM: s390: Use guest address to mark guest page dirty
irqchip/riscv-imsic: Adjust the number of available guest irq files
RISC-V: KVM: Transparent huge page support
RISC-V: KVM: selftests: Add Zalasr extensions to get-reg-list test
RISC-V: KVM: Allow Zalasr extensions for Guest/VM
KVM: riscv: selftests: Add riscv vm satp modes
KVM: riscv: selftests: add Zilsd and Zclsd extension to get-reg-list test
riscv: KVM: allow Zilsd and Zclsd extensions for Guest/VM
RISC-V: KVM: Skip IMSIC update if vCPU IMSIC state is not initialized
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_rw_attr()
RISC-V: KVM: Fix null pointer dereference in kvm_riscv_aia_imsic_has_attr()
RISC-V: KVM: Remove unnecessary 'ret' assignment
KVM: s390: Add explicit padding to struct kvm_s390_keyop
KVM: LoongArch: selftests: Add steal time test case
LoongArch: KVM: Add paravirt vcpu_is_preempted() support in guest side
LoongArch: KVM: Add paravirt preempt feature in hypervisor side
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Paul Walmsley:
- Add support for control flow integrity for userspace processes.
This is based on the standard RISC-V ISA extensions Zicfiss and
Zicfilp
- Improve ptrace behavior regarding vector registers, and add some
selftests
- Optimize our strlen() assembly
- Enable the ISO-8859-1 code page as built-in, similar to ARM64, for
EFI volume mounting
- Clean up some code slightly, including defining copy_user_page() as
copy_page() rather than memcpy(), aligning us with other
architectures; and using max3() to slightly simplify an expression
in riscv_iommu_init_check()
* tag 'riscv-for-linus-7.0-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (42 commits)
riscv: lib: optimize strlen loop efficiency
selftests: riscv: vstate_exec_nolibc: Use the regular prctl() function
selftests: riscv: verify ptrace accepts valid vector csr values
selftests: riscv: verify ptrace rejects invalid vector csr inputs
selftests: riscv: verify syscalls discard vector context
selftests: riscv: verify initial vector state with ptrace
selftests: riscv: test ptrace vector interface
riscv: ptrace: validate input vector csr registers
riscv: csr: define vtype register elements
riscv: vector: init vector context with proper vlenb
riscv: ptrace: return ENODATA for inactive vector extension
kselftest |