aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw/irdma
AgeCommit message (Collapse)AuthorFilesLines
7 daysRDMA: During rereg_mr ensure that REREG_ACCESS is compatibleJason Gunthorpe1-0/+4
If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be re-evaluated to ensure it is properly pinned as RW. Since the umem is hidden inside each driver's mr struct add a ib_umem_check_rereg() function that each driver has to call before processing IB_MR_REREG_ACCESS. mlx4 has to retain its duplicate ib_access_writable check because it implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items in place sequentially while the MR is live, so it will continue to not support this combination. Cc: stable@vger.kernel.org Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage") Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com Reported-by: Philip Tsukerman <philiptsukerman@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-31RDMA/irdma: Add missing comp_mask check in alloc_ucontextJason Gunthorpe1-1/+3
irdma has a comp_mask field that was never checked for validity, check it. Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-31RDMA: Consolidate patterns with offsetofend() to ib_copy_validate_udata_in()Jason Gunthorpe1-5/+5
Go treewide and consolidate all existing patterns using: * offsetofend() and variations * ib_is_udata_cleared() * ib_copy_from_udata() into a direct call to the new ib_copy_validate_udata_in(). Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30Merge branch 'master' into rdma-nextLeon Romanovsky4-36/+44
Let's bring v7.0-rc6 to the -next branch, so we can merge the DMA attributes fix [1] without merge conflicts. [1] https://lore.kernel.org/all/20260323-umem-dma-attrs-v1-1-d6890f2e6a1e@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org> * master: (1688 commits) Linux 7.0-rc6 ...
2026-03-30RDMA: Properly propagate the number of CQEs as unsigned intLeon Romanovsky1-1/+1
Instead of checking whether the number of CQEs is negative or zero, fix the .resize_user_cq() declaration to use unsigned int. This better reflects the expected value range. The sanity check is then handled correctly in ib_uvbers. Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA: Clarify that CQ resize is a user‑space verbLeon Romanovsky1-1/+1
The CQ resize operation is used only by uverbs. Make this explicit. Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-03-30RDMA/irdma: Add support for GEN4 hardwareJacob Moroni4-5/+12
GEN4 hardware is similar to GEN3 and requires only a few special cases. Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-30RDMA/irdma: Provide scratch buffers to firmware for internal useJay Bhat5-3/+61
For GEN3 and higher, FW requires scratch buffers for bookkeeping during cleanup, specifically during QP and MR destroy ops. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Harden depth calculation functionsShiraz Saleem1-17/+22
An issue was exposed where OS can pass in U32_MAX for SQ/RQ/SRQ size. This can cause integer overflow and truncation of SQ/RQ/SRQ depth returning a success when it should have failed. Harden the functions to do all depth calculations and boundary checking in u64 sizes. Fixes: 563e1feb5f6e ("RDMA/irdma: Add SRQ support") Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Return EINVAL for invalid arp index errorTatyana Nikolova1-7/+10
When rdma_connect() fails due to an invalid arp index, user space rdma core reports ENOMEM which is confusing. Modify irdma_make_cm_node() to return the correct error code. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Fix deadlock during netdev reset with active connectionsAnil Samal1-1/+2
Resolve deadlock that occurs when user executes netdev reset while RDMA applications (e.g., rping) are active. The netdev reset causes ice driver to remove irdma auxiliary driver, triggering device_delete and subsequent client removal. During client removal, uverbs_client waits for QP reference count to reach zero while cma_client holds the final reference, creating circular dependency and indefinite wait in iWARP mode. Skip QP reference count wait during device reset to prevent deadlock. Fixes: c8f304d75f6c ("RDMA/irdma: Prevent QP use after free") Signed-off-by: Anil Samal <anil.samal@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Remove reset check from irdma_modify_qp_to_err()Tatyana Nikolova1-2/+0
During reset, irdma_modify_qp() to error should be called to disconnect the QP. Without this fix, if not preceded by irdma_modify_qp() to error, the API call irdma_destroy_qp() gets stuck waiting for the QP refcount to go to zero, because the cm_node associated with this QP isn't disconnected. Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Clean up unnecessary dereference of event->cm_nodeIvan Barrera1-6/+6
The cm_node is available and the usage of cm_node and event->cm_node seems arbitrary. Clean up unnecessary dereference of event->cm_node. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Ivan Barrera <ivan.d.barrera@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Remove a NOP wait_event() in irdma_modify_qp_roce()Tatyana Nikolova1-2/+0
Remove a NOP wait_event() in irdma_modify_qp_roce() which is relevant for iWARP and likely a copy and paste artifact for RoCEv2. The wait event is for sending a reset on a TCP connection, after the reset has been requested in irdma_modify_qp(), which occurs only in iWarp mode. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Update ibqp state to error if QP is already in error stateTatyana Nikolova1-0/+2
In irdma_modify_qp() update ibqp state to error if the irdma QP is already in error state, otherwise the ibqp state which is visible to the consumer app remains stale. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-18RDMA/irdma: Initialize free_qp completion before using itJacob Moroni1-1/+1
In irdma_create_qp, if ib_copy_to_udata fails, it will call irdma_destroy_qp to clean up which will attempt to wait on the free_qp completion, which is not initialized yet. Fix this by initializing the completion before the ib_copy_to_udata call. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-08RDMA/irdma: Add support for revocable pinned dmabuf importJacob Moroni1-12/+93
Use the new API to support importing pinned dmabufs from exporters that require revocation, such as VFIO. The revoke semantic is achieved by issuing a HW invalidation command but not freeing the key. This prevents further accesses to the region (they will result in an invalid key AE), but also keeps the key reserved until the region is actually deregistered (i.e., ibv_dereg_mr) so that a new MR registration cannot acquire the same key. Tested with lockdep+kasan and a memfd backed dmabuf. The rereg_mr path is explicitly blocked in libibverbs for dmabuf MRs (more specifically, any MR not of type IBV_MR_TYPE_MR), so the rereg_mr path for dmabufs was tested with a modified libibverbs. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260305170826.3803155-6-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-05Add support for TLP emulationLeon Romanovsky1-1/+1
This series adds support for Transaction Layer Packet (TLP) emulation response gateway regions, enabling userspace device emulation software to write TLP responses directly to lower layers without kernel driver involvement. Currently, the mlx5 driver exposes VirtIO emulation access regions via the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that ioctl to also support allocating TLP response gateway channels for PCI device emulation use cases. Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-04RDMA/irdma: Fix double free related to rereg_user_mrJacob Moroni1-0/+1
If IB_MR_REREG_TRANS is set during rereg_user_mr, the umem will be released and a new one will be allocated in irdma_rereg_mr_trans. If any step of irdma_rereg_mr_trans fails after the new umem is allocated, it releases the umem, but does not set iwmr->region to NULL. The problem is that this failure is propagated to the user, who will then call ibv_dereg_mr (as they should). Then, the dereg_mr path will see a non-NULL umem and attempt to call ib_umem_release again. Fix this by setting iwmr->region to NULL after ib_umem_release. Fixed: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region") Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260227152743.1183388-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-03-02RDMA: Complete k[z|m|c]alloc-to-k[z|m]alloc_obj conversionLeon Romanovsky1-2/+2
Commits bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") and 69050f8d6d07 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") updated various k[z|m|c]alloc calls to their k[z|m]alloc_obj counterparts. This commit finalizes that transition within the RDMA subsystem. Link: https://patch.msgid.link/20260226-complete-alloc-conversion-v1-1-ebf1df1c2518@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-25RDMA: Move DMA block iterator logic into dedicated filesLeon Romanovsky1-1/+1
The DMA iterator logic was mixed into verbs and umem-specific code, forcing all users to include rdma/ib_umem.h. Move the block iterator logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and rdma/ib_verbs.h can be separated in a follow-up patch. Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2026-02-24RDMA/irdma: Fix kernel stack leak in irdma_create_user_ah()Jason Gunthorpe1-1/+1
struct irdma_create_ah_resp { // 8 bytes, no padding __u32 ah_id; // offset 0 - SET (uresp.ah_id = ah->sc_ah.ah_info.ah_idx) __u8 rsvd[4]; // offset 4 - NEVER SET <- LEAK }; rsvd[4]: 4 bytes of stack memory leaked unconditionally. Only ah_id is assigned before ib_respond_udata(). The reserved members of the structure were not zeroed. Cc: stable@vger.kernel.org Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://patch.msgid.link/3-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds4-12/+6
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds6-13/+13
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook7-31/+34
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-12Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds8-82/+165
Pull rdma updates from Jason Gunthorpe: "Usual smallish cycle. The NFS biovec work to push it down into RDMA instead of indirecting through a scatterlist is pretty nice to see, been talked about for a long time now. - Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe - Small driver improvements and minor bug fixes to hns, mlx5, rxe, mana, mlx5, irdma - Robusness improvements in completion processing for EFA - New query_port_speed() verb to move past limited IBA defined speed steps - Support for SG_GAPS in rts and many other small improvements - Rare list corruption fix in iwcm - Better support different page sizes in rxe - Device memory support for mana - Direct bio vec to kernel MR for use by NFS-RDMA - QP rate limiting for bnxt_re - Remote triggerable NULL pointer crash in siw - DMA-buf exporter support for RDMA mmaps like doorbells" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/mlx5: Implement DMABUF export ops RDMA/uverbs: Add DMABUF object type and operations RDMA/uverbs: Support external FD uobjects RDMA/siw: Fix potential NULL pointer dereference in header processing RDMA/umad: Reject negative data_len in ib_umad_write IB/core: Extend rate limit support for RC QPs RDMA/mlx5: Support rate limit only for Raw Packet QP RDMA/bnxt_re: Report QP rate limit in debugfs RDMA/bnxt_re: Report packet pacing capabilities when querying device RDMA/bnxt_re: Add support for QP rate limiting MAINTAINERS: Drop RDMA files from Hyper-V section RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc svcrdma: use bvec-based RDMA read/write API RDMA/core: add rdma_rw_max_sge() helper for SQ sizing RDMA/core: add MR support for bvec-based RDMA operations RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations RDMA/core: add bio_vec based RDMA read/write API RDMA/irdma: Use kvzalloc for paged memory DMA address array RDMA/rxe: Fix race condition in QP timer handlers RDMA/mana_ib: Add device‑memory support ...
2026-01-28RDMA/irdma: Use kvzalloc for paged memory DMA address arrayCarlos Bilbao1-3/+3
Allocate array chunk->dmainfo.dmaaddrs using kvzalloc() to allow the allocation to fall back to vmalloc when contiguous memory is unavailable (instead of failing and logging page allocation warnings). Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org> Link: https://patch.msgid.link/20260128014446.405247-1-carlos.bilbao@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-25RDMA/irdma: Use CQ ID for CEQE contextJacob Moroni6-51/+127
The hardware allows for an opaque CQ context field to be carried over into CEQEs for the CQ. Previously, a pointer to the CQ was used for this context. In the normal CQ destroy flow, the CEQ ring is scrubbed to remove any preexisting CEQEs for the CQ that may not have been processed yet so that the CQ structure is not dereferenced in the CEQ ISR after the CQ has been freed. However, in some cases, it is possible for a CEQE to be in flight in HW even after the CQ destroy command completion is received, so it could be missed during the scrub. To protect against this, we can take advantage of the CQ table that already exists and use the CQ ID for this context rather than a CQ pointer. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260120212546.1893076-2-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-25RDMA/irdma: Add enum defs for reserved CQs/QPsJacob Moroni2-10/+22
Added definitions for the special reserved CQs and QPs. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260120212546.1893076-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/irdma: Remove fixed 1 ms delay during AH wait loopJacob Moroni3-9/+11
The AH CQP command wait loop executes in an atomic context and was using a fixed 1 ms delay. Since many AH create commands can complete much faster than 1 ms, use poll_timeout_us_atomic with a 1 us delay. Also, use the timeout value indicated during the capability exchange rather than a hard-coded value. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260105180550.2907858-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2026-01-13RDMA/irdma: Remove redundant dma_wmb() before writel()Jacob Moroni2-5/+0
A dma_wmb() is not necessary before a writel() because writel() already has an even stronger store barrier. A dma_wmb() is only required to order writes to consistent/DMA memory whereas the barrier in writel() is specified to order writes to DMA memory as well as MMIO. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20260103172517.2088895-1-jmoroni@google.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-12-18RDMA/irdma: Simplify bool conversionJiapeng Chong2-4/+2
./drivers/infiniband/hw/irdma/ctrl.c:5792:10-15: WARNING: conversion to bool not needed here. ./drivers/infiniband/hw/irdma/uk.c:1412:6-11: WARNING: conversion to bool not needed here. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=27521 Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Link: https://patch.msgid.link/20251204092414.1261795-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-12-17RDMA/irdma: avoid invalid read in irdma_net_eventMichal Schmidt1-1/+2
irdma_net_event() should not dereference anything from "neigh" (alias "ptr") until it has checked that the event is NETEVENT_NEIGH_UPDATE. Other events come with different structures pointed to by "ptr" and they may be smaller than struct neighbour. Move the read of neigh->dev under the NETEVENT_NEIGH_UPDATE case. The bug is mostly harmless, but it triggers KASAN on debug kernels: BUG: KASAN: stack-out-of-bounds in irdma_net_event+0x32e/0x3b0 [irdma] Read of size 8 at addr ffffc900075e07f0 by task kworker/27:2/542554 CPU: 27 PID: 542554 Comm: kworker/27:2 Kdump: loaded Not tainted 5.14.0-630.el9.x86_64+debug #1 Hardware name: [...] Workqueue: events rt6_probe_deferred Call Trace: <IRQ> dump_stack_lvl+0x60/0xb0 print_address_description.constprop.0+0x2c/0x3f0 print_report+0xb4/0x270 kasan_report+0x92/0xc0 irdma_net_event+0x32e/0x3b0 [irdma] notifier_call_chain+0x9e/0x180 atomic_notifier_call_chain+0x5c/0x110 rt6_do_redirect+0xb91/0x1080 tcp_v6_err+0xe9b/0x13e0 icmpv6_notify+0x2b2/0x630 ndisc_redirect_rcv+0x328/0x530 icmpv6_rcv+0xc16/0x1360 ip6_protocol_deliver_rcu+0xb84/0x12e0 ip6_input_finish+0x117/0x240 ip6_input+0xc4/0x370 ipv6_rcv+0x420/0x7d0 __netif_receive_skb_one_core+0x118/0x1b0 process_backlog+0xd1/0x5d0 __napi_poll.constprop.0+0xa3/0x440 net_rx_action+0x78a/0xba0 handle_softirqs+0x2d4/0x9c0 do_softirq+0xad/0xe0 </IRQ> Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions") Link: https://patch.msgid.link/r/20251127143150.121099-1-mschmidt@redhat.com Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2025-12-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds14-225/+114
Pull rdma updates from Jason Gunthorpe: "This has another new RDMA driver 'bng_en' for latest generation Broadcom NICs. There might be one more new driver still to come. Otherwise it is a fairly quite cycle. Summary: - Minor driver bug fixes and updates to cxgb4, rxe, rdmavt, bnxt_re, mlx5 - Many bug fix patches for irdma - WQ_PERCPU annotations and system_dfl_wq changes - Improved mlx5 support for "other eswitches" and multiple PFs - 1600Gbps link speed reporting support. Four Digits Now! - New driver bng_en for latest generation Broadcom NICs - Bonding support for hns - Adjust mlx5's hmm based ODP to work with the very large address space created by the new 5 level paging default on x86 - Lockdep fixups in rxe and siw" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits) RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep RDMA/siw: reclassify sockets in order to avoid false positives from lockdep RDMA/bng_re: Remove prefetch instruction RDMA/core: Reduce cond_resched() frequency in __ib_umem_release RDMA/irdma: Fix SRQ shadow area address initialization RDMA/irdma: Remove doorbell elision logic RDMA/irdma: Do not set IBK_LOCAL_DMA_LKEY for GEN3+ RDMA/irdma: Do not directly rely on IB_PD_UNSAFE_GLOBAL_RKEY RDMA/irdma: Add missing mutex destroy RDMA/irdma: Fix SIGBUS in AEQ destroy RDMA/irdma: Add a missing kfree of struct irdma_pci_f for GEN2 RDMA/irdma: Fix data race in irdma_free_pble RDMA/irdma: Fix data race in irdma_sc_ccq_arm RDMA/mlx5: Add support for 1600_8x lane speed RDMA/core: Add new IB rate for XDR (8x) support IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled RDMA/bnxt_re: Pass correct flag for dma mr creation RDMA/bnxt_re: Fix the inline size for GenP7 devices RDMA/hns: Support reset recovery for bond RDMA/hns: Support link state reporting for bond ...
2025-11-26RDMA/irdma: Fix SRQ shadow area address initializationJijun Wang1-1/+1
Fix SRQ shadow area address initialization. Fixes: 563e1feb5f6e ("RDMA/irdma: Add SRQ support") Signed-off-by: Jijun Wang <jijun.wang@intel.com> Signed-off-by: Jay Bhat <jay.bhat@intel.com> Link: https://patch.msgid.link/20251125025350.180-10-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Remove doorbell elision logicJacob Moroni3-31/+2
In some cases, this logic can result in doorbell writes being skipped when they should not have been (at least on GEN3 HW), so remove it. This also means that the mb() can be safely downgraded to dma_wmb(). Fixes: 551c46edc769 ("RDMA/irdma: Add user/kernel shared libraries") Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-9-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Do not set IBK_LOCAL_DMA_LKEY for GEN3+Jacob Moroni1-1/+2
The GEN3 hardware does not appear to support IBK_LOCAL_DMA_LKEY. Attempts to use it will result in an AE. Fixes: eb31dfc2b41a ("RDMA/irdma: Restrict Memory Window and CQE Timestamping to GEN3") Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-8-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Do not directly rely on IB_PD_UNSAFE_GLOBAL_RKEYJacob Moroni4-10/+12
The HW disables bounds checking for MRs with a length of zero, so the driver will only allow a zero length MR if the "all_memory" flag is set, and this flag is only set if IB_PD_UNSAFE_GLOBAL_RKEY is set for the PD. This means that the "get_dma_mr" method will currently fail unless the IB_PD_UNSAFE_GLOBAL_RKEY flag is set. This has not been an issue because the "get_dma_mr" method is only ever invoked if the device does not support the local DMA key or if IB_PD_UNSAFE_GLOBAL_RKEY is set, and so far, all IRDMA HW supports the local DMA lkey. However, some new HW does not support the local DMA lkey, so the "get_dma_mr" method needs to work without IB_PD_UNSAFE_GLOBAL_RKEY being set. To support HW that does not allow the local DMA lkey, the logic has been changed to pass an explicit flag to indicate when a dma_mr is being created so that the zero length will be allowed. Also, the "all_memory" flag has been forced to false for normal MR allocation since these MRs are never supposed to provide global unsafe rkey semantics anyway; only the MR created with "get_dma_mr" should support this. Fixes: bb6d73d9add6 ("RDMA/irdma: Prevent zero-length STAG registration") Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-7-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Add missing mutex destroyAnil Samal3-2/+10
Add missing destroy of ah_tbl_lock and vchnl_mutex. Fixes: d5edd33364a5 ("RDMA/irdma: RDMA/irdma: Add GEN3 core driver support") Signed-off-by: Anil Samal <anil.samal@intel.com> Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-6-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Fix SIGBUS in AEQ destroyKrzysztof Czurylo1-1/+2
Removes write to IRDMA_PFINT_AEQCTL register prior to destroying AEQ, as this register does not exist in GEN3+ hardware and this kind of IRQ configuration is no longer required. Fixes: b800e82feba7 ("RDMA/irdma: Add GEN3 support for AEQ and CEQ") Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-5-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Add a missing kfree of struct irdma_pci_f for GEN2Tatyana Nikolova1-0/+2
During a refactor of the irdma GEN2 code, the kfree of the irdma_pci_f struct in icrdma_remove(), which was originally introduced upstream as part of commit 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X") was accidentally removed. Fixes: 0c2b80cac96e ("RDMA/irdma: Refactor GEN2 auxiliary driver") Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-4-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Fix data race in irdma_free_pbleKrzysztof Czurylo1-2/+4
Protects pble_rsrc counters with mutex to prevent data race. Fixes the following data race in irdma_free_pble reported by KCSAN: BUG: KCSAN: data-race in irdma_free_pble [irdma] / irdma_free_pble [irdma] write to 0xffff91430baa0078 of 8 bytes by task 16956 on cpu 5: irdma_free_pble+0x3b/0xb0 [irdma] irdma_dereg_mr+0x108/0x110 [irdma] ib_dereg_mr_user+0x74/0x160 [ib_core] uverbs_free_mr+0x26/0x30 [ib_uverbs] destroy_hw_idr_uobject+0x4a/0x90 [ib_uverbs] uverbs_destroy_uobject+0x7b/0x330 [ib_uverbs] uobj_destroy+0x61/0xb0 [ib_uverbs] ib_uverbs_run_method+0x1f2/0x380 [ib_uverbs] ib_uverbs_cmd_verbs+0x365/0x440 [ib_uverbs] ib_uverbs_ioctl+0x111/0x190 [ib_uverbs] __x64_sys_ioctl+0xc9/0x100 do_syscall_64+0x44/0xa0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 read to 0xffff91430baa0078 of 8 bytes by task 16953 on cpu 2: irdma_free_pble+0x23/0xb0 [irdma] irdma_dereg_mr+0x108/0x110 [irdma] ib_dereg_mr_user+0x74/0x160 [ib_core] uverbs_free_mr+0x26/0x30 [ib_uverbs] destroy_hw_idr_uobject+0x4a/0x90 [ib_uverbs] uverbs_destroy_uobject+0x7b/0x330 [ib_uverbs] uobj_destroy+0x61/0xb0 [ib_uverbs] ib_uverbs_run_method+0x1f2/0x380 [ib_uverbs] ib_uverbs_cmd_verbs+0x365/0x440 [ib_uverbs] ib_uverbs_ioctl+0x111/0x190 [ib_uverbs] __x64_sys_ioctl+0xc9/0x100 do_syscall_64+0x44/0xa0 entry_SYSCALL_64_after_hwframe+0x6e/0xd8 value changed: 0x0000000000005a62 -> 0x0000000000005a68 Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager") Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-3-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-26RDMA/irdma: Fix data race in irdma_sc_ccq_armKrzysztof Czurylo1-0/+3
Adds a lock around irdma_sc_ccq_arm body to prevent inter-thread data race. Fixes data race in irdma_sc_ccq_arm() reported by KCSAN: BUG: KCSAN: data-race in irdma_sc_ccq_arm [irdma] / irdma_sc_ccq_arm [irdma] read to 0xffff9d51b4034220 of 8 bytes by task 255 on cpu 11: irdma_sc_ccq_arm+0x36/0xd0 [irdma] irdma_cqp_ce_handler+0x300/0x310 [irdma] cqp_compl_worker+0x2a/0x40 [irdma] process_one_work+0x402/0x7e0 worker_thread+0xb3/0x6d0 kthread+0x178/0x1a0 ret_from_fork+0x2c/0x50 write to 0xffff9d51b4034220 of 8 bytes by task 89 on cpu 3: irdma_sc_ccq_arm+0x7e/0xd0 [irdma] irdma_cqp_ce_handler+0x300/0x310 [irdma] irdma_wait_event+0xd4/0x3e0 [irdma] irdma_handle_cqp_op+0xa5/0x220 [irdma] irdma_hw_flush_wqes+0xb1/0x300 [irdma] irdma_flush_wqes+0x22e/0x3a0 [irdma] irdma_cm_disconn_true+0x4c7/0x5d0 [irdma] irdma_disconnect_worker+0x35/0x50 [irdma] process_one_work+0x402/0x7e0 worker_thread+0xb3/0x6d0 kthread+0x178/0x1a0 ret_from_fork+0x2c/0x50 value changed: 0x0000000000024000 -> 0x0000000000034000 Fixes: 3f49d6842569 ("RDMA/irdma: Implement HW Admin Queue OPs") Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251125025350.180-2-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-12RDMA/irdma: Remove redundant NULL check of udata in irdma_create_user_ah()Tuo Li1-1/+1
The variable udata cannot be NULL because irdma_create_user_ah() always receives it. Therefore, the if() check can be safely removed. Signed-off-by: Tuo Li <islituo@gmail.com> Link: https://patch.msgid.link/20251112120253.68945-1-islituo@gmail.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-09RDMA/irdma: Remove unused CQ registryJacob Moroni3-122/+3
The CQ registry was never actually used (ceq->reg_cq was always NULL), so remove the dead code. Signed-off-by: Jacob Moroni <jmoroni@google.com> Link: https://patch.msgid.link/20251105162841.31786-1-jmoroni@google.com Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2025-11-06RDMA/irdma: Take a lock before moving SRQ tail in poll_cqJay Bhat3-0/+7
Need to take an SRQ lock in poll_cq before moving SRQ tail. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Reviewed-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-7-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-02RDMA/irdma: CQ size and shadow update changes for GEN3Jay Bhat5-40/+56
CQ shadow area should not be updated at the end of a page (once every 64th CQ entry), except when CQ has no more CQEs. SW must also increase the requested CQ size by 1 and make sure the CQ is not exactly one page in size. This is to address a quirk in the hardware. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-4-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-02RDMA/irdma: Silently consume unsignaled completionsJay Bhat2-1/+7
In case we get an unsignaled error completion, we silently consume the CQE by pretending the QP does not exist. Without this, bookkeeping for signaled completions does not work correctly. Signed-off-by: Jay Bhat <jay.bhat@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-5-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-02RDMA/irdma: Initialize cqp_cmds_info to prevent resource leaksJay Bhat3-12/+1
Failure to initialize info.create field to false in certain cases was resulting in incorrect status code going to rdma-core when dereg_mr failed during reset. To fix this, memset entire cqp_request->info in irdma_alloc_and_get_cqp_request() function, so that this is not spread all over the code. Signed-off-by: Bhat, Jay <jay.bhat@intel.com> Reviewed-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-2-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2025-11-02RDMA/irdma: Enforce local fence for LOCAL_INV WRsJacob Moroni1-1/+1
Enforce local fence for LOCAL_INV WRs to avoid spurious FASTREG_VALID_MKEY async events during heavy invalidation/registration activity. Signed-off-by: Jacob Moroni <jmoroni@google.com> Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com> Link: https://patch.msgid.link/20251031021726.1003-3-tatyana.e.nikolova@intel.com Signed-off-by: Leon Romanovsky <leon@kernel.org>