| Age | Commit message (Collapse) | Author | Files | Lines |
|
If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be
re-evaluated to ensure it is properly pinned as RW. Since the umem is
hidden inside each driver's mr struct add a ib_umem_check_rereg() function
that each driver has to call before processing IB_MR_REREG_ACCESS.
mlx4 has to retain its duplicate ib_access_writable check because it
implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items
in place sequentially while the MR is live, so it will continue to not
support this combination.
Cc: stable@vger.kernel.org
Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage")
Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com
Reported-by: Philip Tsukerman <philiptsukerman@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
irdma has a comp_mask field that was never checked for validity, check
it.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Go treewide and consolidate all existing patterns using:
* offsetofend() and variations
* ib_is_udata_cleared()
* ib_copy_from_udata()
into a direct call to the new ib_copy_validate_udata_in().
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Let's bring v7.0-rc6 to the -next branch, so we can merge the DMA
attributes fix [1] without merge conflicts.
[1] https://lore.kernel.org/all/20260323-umem-dma-attrs-v1-1-d6890f2e6a1e@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
* master: (1688 commits)
Linux 7.0-rc6
...
|
|
Instead of checking whether the number of CQEs is negative or zero, fix the
.resize_user_cq() declaration to use unsigned int. This better reflects the
expected value range. The sanity check is then handled correctly in ib_uvbers.
Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The CQ resize operation is used only by uverbs. Make this explicit.
Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
GEN4 hardware is similar to GEN3 and requires only a few special cases.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
For GEN3 and higher, FW requires scratch buffers for bookkeeping
during cleanup, specifically during QP and MR destroy ops.
Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
An issue was exposed where OS can pass in U32_MAX for SQ/RQ/SRQ size.
This can cause integer overflow and truncation of SQ/RQ/SRQ depth
returning a success when it should have failed.
Harden the functions to do all depth calculations and boundary
checking in u64 sizes.
Fixes: 563e1feb5f6e ("RDMA/irdma: Add SRQ support")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When rdma_connect() fails due to an invalid arp index, user space rdma core
reports ENOMEM which is confusing. Modify irdma_make_cm_node() to return the
correct error code.
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Resolve deadlock that occurs when user executes netdev reset while RDMA
applications (e.g., rping) are active. The netdev reset causes ice
driver to remove irdma auxiliary driver, triggering device_delete and
subsequent client removal. During client removal, uverbs_client waits
for QP reference count to reach zero while cma_client holds the final
reference, creating circular dependency and indefinite wait in iWARP
mode. Skip QP reference count wait during device reset to prevent
deadlock.
Fixes: c8f304d75f6c ("RDMA/irdma: Prevent QP use after free")
Signed-off-by: Anil Samal <anil.samal@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
During reset, irdma_modify_qp() to error should be called to disconnect
the QP. Without this fix, if not preceded by irdma_modify_qp() to error, the
API call irdma_destroy_qp() gets stuck waiting for the QP refcount to go
to zero, because the cm_node associated with this QP isn't disconnected.
Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The cm_node is available and the usage of cm_node and event->cm_node
seems arbitrary. Clean up unnecessary dereference of event->cm_node.
Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager")
Signed-off-by: Ivan Barrera <ivan.d.barrera@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Remove a NOP wait_event() in irdma_modify_qp_roce() which is relevant
for iWARP and likely a copy and paste artifact for RoCEv2. The wait event
is for sending a reset on a TCP connection, after the reset has been
requested in irdma_modify_qp(), which occurs only in iWarp mode.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In irdma_modify_qp() update ibqp state to error if the irdma QP is already
in error state, otherwise the ibqp state which is visible to the consumer
app remains stale.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In irdma_create_qp, if ib_copy_to_udata fails, it will call
irdma_destroy_qp to clean up which will attempt to wait on
the free_qp completion, which is not initialized yet. Fix this
by initializing the completion before the ib_copy_to_udata call.
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Use the new API to support importing pinned dmabufs from exporters
that require revocation, such as VFIO. The revoke semantic is
achieved by issuing a HW invalidation command but not freeing
the key. This prevents further accesses to the region (they will
result in an invalid key AE), but also keeps the key reserved
until the region is actually deregistered (i.e., ibv_dereg_mr)
so that a new MR registration cannot acquire the same key.
Tested with lockdep+kasan and a memfd backed dmabuf.
The rereg_mr path is explicitly blocked in libibverbs for dmabuf MRs
(more specifically, any MR not of type IBV_MR_TYPE_MR), so the rereg_mr
path for dmabufs was tested with a modified libibverbs.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260305170826.3803155-6-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This series adds support for Transaction Layer Packet (TLP) emulation
response gateway regions, enabling userspace device emulation software
to write TLP responses directly to lower layers without kernel driver
involvement.
Currently, the mlx5 driver exposes VirtIO emulation access regions via
the MLX5_IB_METHOD_VAR_OBJ_ALLOC ioctl. This series extends that
ioctl to also support allocating TLP response gateway channels for
PCI device emulation use cases.
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
If IB_MR_REREG_TRANS is set during rereg_user_mr, the
umem will be released and a new one will be allocated
in irdma_rereg_mr_trans. If any step of irdma_rereg_mr_trans
fails after the new umem is allocated, it releases the umem,
but does not set iwmr->region to NULL. The problem is that
this failure is propagated to the user, who will then call
ibv_dereg_mr (as they should). Then, the dereg_mr path will
see a non-NULL umem and attempt to call ib_umem_release again.
Fix this by setting iwmr->region to NULL after ib_umem_release.
Fixed: 5ac388db27c4 ("RDMA/irdma: Add support to re-register a memory region")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260227152743.1183388-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Commits bf4afc53b77a ("Convert 'alloc_obj' family to use the new default
GFP_KERNEL argument") and 69050f8d6d07 ("treewide: Replace kmalloc with
kmalloc_obj for non-scalar types") updated various k[z|m|c]alloc calls to their
k[z|m]alloc_obj counterparts.
This commit finalizes that transition within the RDMA subsystem.
Link: https://patch.msgid.link/20260226-complete-alloc-conversion-v1-1-ebf1df1c2518@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The DMA iterator logic was mixed into verbs and umem-specific code,
forcing all users to include rdma/ib_umem.h. Move the block iterator
logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and
rdma/ib_verbs.h can be separated in a follow-up patch.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
struct irdma_create_ah_resp { // 8 bytes, no padding
__u32 ah_id; // offset 0 - SET (uresp.ah_id = ah->sc_ah.ah_info.ah_idx)
__u8 rsvd[4]; // offset 4 - NEVER SET <- LEAK
};
rsvd[4]: 4 bytes of stack memory leaked unconditionally. Only ah_id is assigned before ib_respond_udata().
The reserved members of the structure were not zeroed.
Cc: stable@vger.kernel.org
Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs")
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Link: https://patch.msgid.link/3-v1-83e918d69e73+a9-rdma_udata_rc_jgg@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"Usual smallish cycle. The NFS biovec work to push it down into RDMA
instead of indirecting through a scatterlist is pretty nice to see,
been talked about for a long time now.
- Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe
- Small driver improvements and minor bug fixes to hns, mlx5, rxe,
mana, mlx5, irdma
- Robusness improvements in completion processing for EFA
- New query_port_speed() verb to move past limited IBA defined speed
steps
- Support for SG_GAPS in rts and many other small improvements
- Rare list corruption fix in iwcm
- Better support different page sizes in rxe
- Device memory support for mana
- Direct bio vec to kernel MR for use by NFS-RDMA
- QP rate limiting for bnxt_re
- Remote triggerable NULL pointer crash in siw
- DMA-buf exporter support for RDMA mmaps like doorbells"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
RDMA/mlx5: Implement DMABUF export ops
RDMA/uverbs: Add DMABUF object type and operations
RDMA/uverbs: Support external FD uobjects
RDMA/siw: Fix potential NULL pointer dereference in header processing
RDMA/umad: Reject negative data_len in ib_umad_write
IB/core: Extend rate limit support for RC QPs
RDMA/mlx5: Support rate limit only for Raw Packet QP
RDMA/bnxt_re: Report QP rate limit in debugfs
RDMA/bnxt_re: Report packet pacing capabilities when querying device
RDMA/bnxt_re: Add support for QP rate limiting
MAINTAINERS: Drop RDMA files from Hyper-V section
RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc
svcrdma: use bvec-based RDMA read/write API
RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
RDMA/core: add MR support for bvec-based RDMA operations
RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations
RDMA/core: add bio_vec based RDMA read/write API
RDMA/irdma: Use kvzalloc for paged memory DMA address array
RDMA/rxe: Fix race condition in QP timer handlers
RDMA/mana_ib: Add device‑memory support
...
|
|
Allocate array chunk->dmainfo.dmaaddrs using kvzalloc() to allow the
allocation to fall back to vmalloc when contiguous memory is unavailable
(instead of failing and logging page allocation warnings).
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Carlos Bilbao (Lambda) <carlos.bilbao@kernel.org>
Link: https://patch.msgid.link/20260128014446.405247-1-carlos.bilbao@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The hardware allows for an opaque CQ context field to be carried
over into CEQEs for the CQ. Previously, a pointer to the CQ was used
for this context. In the normal CQ destroy flow, the CEQ ring is
scrubbed to remove any preexisting CEQEs for the CQ that may not have
been processed yet so that the CQ structure is not dereferenced in the
CEQ ISR after the CQ has been freed.
However, in some cases, it is possible for a CEQE to be in flight in
HW even after the CQ destroy command completion is received, so it
could be missed during the scrub.
To protect against this, we can take advantage of the CQ table that
already exists and use the CQ ID for this context rather than a CQ
pointer.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260120212546.1893076-2-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Added definitions for the special reserved CQs and QPs.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260120212546.1893076-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The AH CQP command wait loop executes in an atomic context and was
using a fixed 1 ms delay. Since many AH create commands can complete
much faster than 1 ms, use poll_timeout_us_atomic with a 1 us delay.
Also, use the timeout value indicated during the capability exchange
rather than a hard-coded value.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260105180550.2907858-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
A dma_wmb() is not necessary before a writel() because writel()
already has an even stronger store barrier. A dma_wmb() is only
required to order writes to consistent/DMA memory whereas the
barrier in writel() is specified to order writes to DMA memory as
well as MMIO.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20260103172517.2088895-1-jmoroni@google.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
./drivers/infiniband/hw/irdma/ctrl.c:5792:10-15: WARNING: conversion to bool not needed here.
./drivers/infiniband/hw/irdma/uk.c:1412:6-11: WARNING: conversion to bool not needed here.
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=27521
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Link: https://patch.msgid.link/20251204092414.1261795-1-jiapeng.chong@linux.alibaba.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
irdma_net_event() should not dereference anything from "neigh" (alias
"ptr") until it has checked that the event is NETEVENT_NEIGH_UPDATE.
Other events come with different structures pointed to by "ptr" and they
may be smaller than struct neighbour.
Move the read of neigh->dev under the NETEVENT_NEIGH_UPDATE case.
The bug is mostly harmless, but it triggers KASAN on debug kernels:
BUG: KASAN: stack-out-of-bounds in irdma_net_event+0x32e/0x3b0 [irdma]
Read of size 8 at addr ffffc900075e07f0 by task kworker/27:2/542554
CPU: 27 PID: 542554 Comm: kworker/27:2 Kdump: loaded Not tainted 5.14.0-630.el9.x86_64+debug #1
Hardware name: [...]
Workqueue: events rt6_probe_deferred
Call Trace:
<IRQ>
dump_stack_lvl+0x60/0xb0
print_address_description.constprop.0+0x2c/0x3f0
print_report+0xb4/0x270
kasan_report+0x92/0xc0
irdma_net_event+0x32e/0x3b0 [irdma]
notifier_call_chain+0x9e/0x180
atomic_notifier_call_chain+0x5c/0x110
rt6_do_redirect+0xb91/0x1080
tcp_v6_err+0xe9b/0x13e0
icmpv6_notify+0x2b2/0x630
ndisc_redirect_rcv+0x328/0x530
icmpv6_rcv+0xc16/0x1360
ip6_protocol_deliver_rcu+0xb84/0x12e0
ip6_input_finish+0x117/0x240
ip6_input+0xc4/0x370
ipv6_rcv+0x420/0x7d0
__netif_receive_skb_one_core+0x118/0x1b0
process_backlog+0xd1/0x5d0
__napi_poll.constprop.0+0xa3/0x440
net_rx_action+0x78a/0xba0
handle_softirqs+0x2d4/0x9c0
do_softirq+0xad/0xe0
</IRQ>
Fixes: 915cc7ac0f8e ("RDMA/irdma: Add miscellaneous utility definitions")
Link: https://patch.msgid.link/r/20251127143150.121099-1-mschmidt@redhat.com
Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull rdma updates from Jason Gunthorpe:
"This has another new RDMA driver 'bng_en' for latest generation
Broadcom NICs. There might be one more new driver still to come.
Otherwise it is a fairly quite cycle. Summary:
- Minor driver bug fixes and updates to cxgb4, rxe, rdmavt, bnxt_re,
mlx5
- Many bug fix patches for irdma
- WQ_PERCPU annotations and system_dfl_wq changes
- Improved mlx5 support for "other eswitches" and multiple PFs
- 1600Gbps link speed reporting support. Four Digits Now!
- New driver bng_en for latest generation Broadcom NICs
- Bonding support for hns
- Adjust mlx5's hmm based ODP to work with the very large address
space created by the new 5 level paging default on x86
- Lockdep fixups in rxe and siw"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (65 commits)
RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep
RDMA/siw: reclassify sockets in order to avoid false positives from lockdep
RDMA/bng_re: Remove prefetch instruction
RDMA/core: Reduce cond_resched() frequency in __ib_umem_release
RDMA/irdma: Fix SRQ shadow area address initialization
RDMA/irdma: Remove doorbell elision logic
RDMA/irdma: Do not set IBK_LOCAL_DMA_LKEY for GEN3+
RDMA/irdma: Do not directly rely on IB_PD_UNSAFE_GLOBAL_RKEY
RDMA/irdma: Add missing mutex destroy
RDMA/irdma: Fix SIGBUS in AEQ destroy
RDMA/irdma: Add a missing kfree of struct irdma_pci_f for GEN2
RDMA/irdma: Fix data race in irdma_free_pble
RDMA/irdma: Fix data race in irdma_sc_ccq_arm
RDMA/mlx5: Add support for 1600_8x lane speed
RDMA/core: Add new IB rate for XDR (8x) support
IB/mlx5: Reduce IMR KSM size when 5-level paging is enabled
RDMA/bnxt_re: Pass correct flag for dma mr creation
RDMA/bnxt_re: Fix the inline size for GenP7 devices
RDMA/hns: Support reset recovery for bond
RDMA/hns: Support link state reporting for bond
...
|
|
Fix SRQ shadow area address initialization.
Fixes: 563e1feb5f6e ("RDMA/irdma: Add SRQ support")
Signed-off-by: Jijun Wang <jijun.wang@intel.com>
Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Link: https://patch.msgid.link/20251125025350.180-10-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In some cases, this logic can result in doorbell writes being
skipped when they should not have been (at least on GEN3 HW),
so remove it. This also means that the mb() can be safely
downgraded to dma_wmb().
Fixes: 551c46edc769 ("RDMA/irdma: Add user/kernel shared libraries")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-9-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The GEN3 hardware does not appear to support IBK_LOCAL_DMA_LKEY. Attempts
to use it will result in an AE.
Fixes: eb31dfc2b41a ("RDMA/irdma: Restrict Memory Window and CQE Timestamping to GEN3")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-8-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The HW disables bounds checking for MRs with a length of zero, so
the driver will only allow a zero length MR if the "all_memory"
flag is set, and this flag is only set if IB_PD_UNSAFE_GLOBAL_RKEY
is set for the PD.
This means that the "get_dma_mr" method will currently fail unless
the IB_PD_UNSAFE_GLOBAL_RKEY flag is set. This has not been an issue
because the "get_dma_mr" method is only ever invoked if the device
does not support the local DMA key or if IB_PD_UNSAFE_GLOBAL_RKEY
is set, and so far, all IRDMA HW supports the local DMA lkey.
However, some new HW does not support the local DMA lkey, so the
"get_dma_mr" method needs to work without IB_PD_UNSAFE_GLOBAL_RKEY
being set.
To support HW that does not allow the local DMA lkey, the logic has
been changed to pass an explicit flag to indicate when a dma_mr is
being created so that the zero length will be allowed.
Also, the "all_memory" flag has been forced to false for normal MR
allocation since these MRs are never supposed to provide global
unsafe rkey semantics anyway; only the MR created with "get_dma_mr"
should support this.
Fixes: bb6d73d9add6 ("RDMA/irdma: Prevent zero-length STAG registration")
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-7-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add missing destroy of ah_tbl_lock and vchnl_mutex.
Fixes: d5edd33364a5 ("RDMA/irdma: RDMA/irdma: Add GEN3 core driver support")
Signed-off-by: Anil Samal <anil.samal@intel.com>
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-6-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Removes write to IRDMA_PFINT_AEQCTL register prior to destroying AEQ,
as this register does not exist in GEN3+ hardware and this kind of IRQ
configuration is no longer required.
Fixes: b800e82feba7 ("RDMA/irdma: Add GEN3 support for AEQ and CEQ")
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-5-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
During a refactor of the irdma GEN2 code, the kfree of the irdma_pci_f struct
in icrdma_remove(), which was originally introduced upstream as part of
commit 80f2ab46c2ee ("irdma: free iwdev->rf after removing MSI-X")
was accidentally removed.
Fixes: 0c2b80cac96e ("RDMA/irdma: Refactor GEN2 auxiliary driver")
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-4-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Protects pble_rsrc counters with mutex to prevent data race.
Fixes the following data race in irdma_free_pble reported by KCSAN:
BUG: KCSAN: data-race in irdma_free_pble [irdma] / irdma_free_pble [irdma]
write to 0xffff91430baa0078 of 8 bytes by task 16956 on cpu 5:
irdma_free_pble+0x3b/0xb0 [irdma]
irdma_dereg_mr+0x108/0x110 [irdma]
ib_dereg_mr_user+0x74/0x160 [ib_core]
uverbs_free_mr+0x26/0x30 [ib_uverbs]
destroy_hw_idr_uobject+0x4a/0x90 [ib_uverbs]
uverbs_destroy_uobject+0x7b/0x330 [ib_uverbs]
uobj_destroy+0x61/0xb0 [ib_uverbs]
ib_uverbs_run_method+0x1f2/0x380 [ib_uverbs]
ib_uverbs_cmd_verbs+0x365/0x440 [ib_uverbs]
ib_uverbs_ioctl+0x111/0x190 [ib_uverbs]
__x64_sys_ioctl+0xc9/0x100
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
read to 0xffff91430baa0078 of 8 bytes by task 16953 on cpu 2:
irdma_free_pble+0x23/0xb0 [irdma]
irdma_dereg_mr+0x108/0x110 [irdma]
ib_dereg_mr_user+0x74/0x160 [ib_core]
uverbs_free_mr+0x26/0x30 [ib_uverbs]
destroy_hw_idr_uobject+0x4a/0x90 [ib_uverbs]
uverbs_destroy_uobject+0x7b/0x330 [ib_uverbs]
uobj_destroy+0x61/0xb0 [ib_uverbs]
ib_uverbs_run_method+0x1f2/0x380 [ib_uverbs]
ib_uverbs_cmd_verbs+0x365/0x440 [ib_uverbs]
ib_uverbs_ioctl+0x111/0x190 [ib_uverbs]
__x64_sys_ioctl+0xc9/0x100
do_syscall_64+0x44/0xa0
entry_SYSCALL_64_after_hwframe+0x6e/0xd8
value changed: 0x0000000000005a62 -> 0x0000000000005a68
Fixes: e8c4dbc2fcac ("RDMA/irdma: Add PBLE resource manager")
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-3-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Adds a lock around irdma_sc_ccq_arm body to prevent inter-thread data race.
Fixes data race in irdma_sc_ccq_arm() reported by KCSAN:
BUG: KCSAN: data-race in irdma_sc_ccq_arm [irdma] / irdma_sc_ccq_arm [irdma]
read to 0xffff9d51b4034220 of 8 bytes by task 255 on cpu 11:
irdma_sc_ccq_arm+0x36/0xd0 [irdma]
irdma_cqp_ce_handler+0x300/0x310 [irdma]
cqp_compl_worker+0x2a/0x40 [irdma]
process_one_work+0x402/0x7e0
worker_thread+0xb3/0x6d0
kthread+0x178/0x1a0
ret_from_fork+0x2c/0x50
write to 0xffff9d51b4034220 of 8 bytes by task 89 on cpu 3:
irdma_sc_ccq_arm+0x7e/0xd0 [irdma]
irdma_cqp_ce_handler+0x300/0x310 [irdma]
irdma_wait_event+0xd4/0x3e0 [irdma]
irdma_handle_cqp_op+0xa5/0x220 [irdma]
irdma_hw_flush_wqes+0xb1/0x300 [irdma]
irdma_flush_wqes+0x22e/0x3a0 [irdma]
irdma_cm_disconn_true+0x4c7/0x5d0 [irdma]
irdma_disconnect_worker+0x35/0x50 [irdma]
process_one_work+0x402/0x7e0
worker_thread+0xb3/0x6d0
kthread+0x178/0x1a0
ret_from_fork+0x2c/0x50
value changed: 0x0000000000024000 -> 0x0000000000034000
Fixes: 3f49d6842569 ("RDMA/irdma: Implement HW Admin Queue OPs")
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251125025350.180-2-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The variable udata cannot be NULL because irdma_create_user_ah() always
receives it. Therefore, the if() check can be safely removed.
Signed-off-by: Tuo Li <islituo@gmail.com>
Link: https://patch.msgid.link/20251112120253.68945-1-islituo@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The CQ registry was never actually used (ceq->reg_cq was always NULL),
so remove the dead code.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Link: https://patch.msgid.link/20251105162841.31786-1-jmoroni@google.com
Acked-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Need to take an SRQ lock in poll_cq before moving SRQ tail.
Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Reviewed-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251031021726.1003-7-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
CQ shadow area should not be updated at the end of a page (once every
64th CQ entry), except when CQ has no more CQEs. SW must also increase
the requested CQ size by 1 and make sure the CQ is not exactly one page
in size. This is to address a quirk in the hardware.
Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251031021726.1003-4-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In case we get an unsignaled error completion, we silently consume the CQE by
pretending the QP does not exist. Without this, bookkeeping for signaled
completions does not work correctly.
Signed-off-by: Jay Bhat <jay.bhat@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251031021726.1003-5-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Failure to initialize info.create field to false in certain cases
was resulting in incorrect status code going to rdma-core when dereg_mr
failed during reset. To fix this, memset entire cqp_request->info
in irdma_alloc_and_get_cqp_request() function, so that this is not spread
all over the code.
Signed-off-by: Bhat, Jay <jay.bhat@intel.com>
Reviewed-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Krzysztof Czurylo <krzysztof.czurylo@intel.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251031021726.1003-2-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Enforce local fence for LOCAL_INV WRs to
avoid spurious FASTREG_VALID_MKEY async events
during heavy invalidation/registration activity.
Signed-off-by: Jacob Moroni <jmoroni@google.com>
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Link: https://patch.msgid.link/20251031021726.1003-3-tatyana.e.nikolova@intel.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|