| Age | Commit message (Collapse) | Author | Files | Lines |
|
atomic_write_reply() at drivers/infiniband/sw/rxe/rxe_resp.c
unconditionally dereferences 8 bytes at payload_addr(pkt):
value = *(u64 *)payload_addr(pkt);
check_rkey() previously accepted an ATOMIC_WRITE request with pktlen ==
resid == 0 because the length validation only compared pktlen against
resid. A remote initiator that sets the RETH length to 0 therefore reaches
atomic_write_reply() with a zero-byte logical payload, and the responder
reads sizeof(u64) bytes from past the logical end of the packet into
skb->head tailroom, then writes those 8 bytes into the attacker's MR via
rxe_mr_do_atomic_write(). That is a remote disclosure of 4 bytes of kernel
tailroom per probe (the other 4 bytes are the packet's own trailing ICRC).
IBA oA19-28 defines ATOMIC_WRITE as exactly 8 bytes. Anything else is
protocol-invalid. Hoist a strict length check into check_rkey() so the
responder never reaches the unchecked dereference, and keep the existing
WRITE-family length logic for the normal RDMA WRITE path.
Reproduced on mainline with an unmodified rxe driver: a sustained
zero-length ATOMIC_WRITE probe repeatedly leaks adjacent skb head-buffer
bytes into the attacker's MR, including recognisable kernel strings and
partial kernel-direct-map pointer words. With this patch applied the
responder rejects the PDU and the MR stays all-zero.
Cc: stable@vger.kernel.org
Fixes: 034e285f8b99 ("RDMA/rxe: Make responder support atomic write on RC service")
Link: https://patch.msgid.link/r/20260418162141.3610201-1-michael.bommarito@gmail.com
Assisted-by: Claude:claude-opus-4-7
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Even after applying commit 7244491dab34 ("RDMA/rxe: Validate pad and ICRC
before payload_size() in rxe_rcv"), a single unauthenticated UDP packet
can still trigger panic. That patch handled payload_size() underflow only
for valid opcodes with short packets, not for packets carrying an unknown
opcode. The unknown-opcode OOB read described below predates that commit
and reaches back to the initial Soft RoCE driver.
The check added there reads
pkt->paylen < header_size(pkt) + bth_pad(pkt) + RXE_ICRC_SIZE
where header_size(pkt) expands to rxe_opcode[pkt->opcode].length. The
rxe_opcode[] array has 256 entries but is only populated for defined IB
opcodes; any other entry (for example opcode 0xff) is zero-initialized, so
length == 0 and the check degenerates to
pkt->paylen < 0 + bth_pad(pkt) + RXE_ICRC_SIZE
which does not constrain pkt->paylen enough. rxe_icrc_hdr() then computes
rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES
which underflows when length == 0 and passes a huge value to rxe_crc32(),
causing an out-of-bounds read of the skb payload.
Reproduced on v7.0-rc7 with that fix applied, QEMU/KVM with
CONFIG_RDMA_RXE=y and CONFIG_KASAN=y, after
rdma link add rxe0 type rxe netdev eth0
A single 48-byte UDP packet to port 4791 with BTH opcode=0xff and
QPN=IB_MULTICAST_QPN triggers:
BUG: KASAN: slab-out-of-bounds in crc32_le+0x115/0x170
Read of size 1 at addr ...
The buggy address is located 0 bytes to the right of
allocated 704-byte region
Call Trace:
crc32_le+0x115/0x170
rxe_icrc_hdr.isra.0+0x226/0x300
rxe_icrc_check+0x13f/0x3a0
rxe_rcv+0x6e1/0x16e0
rxe_udp_encap_recv+0x20a/0x320
udp_queue_rcv_one_skb+0x7ed/0x12c0
Subsequent packets with the same shape fault on unmapped memory and panic
the kernel. The trigger requires only module load and "rdma link add"; no
QP, no connection, and no authentication.
Fix this by rejecting packets whose opcode has no rxe_opcode[] entry,
detected via the zero mask or zero length, before any length arithmetic
runs.
Cc: stable@vger.kernel.org
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260414111555.3386793-1-michael.bommarito@gmail.com
Assisted-by: Claude:claude-opus-4-6
Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull rdma updates from Jason Gunthorpe:
"The usual collection of driver changes, more core infrastructure
updates that typical this cycle:
- Minor cleanups and kernel-doc fixes in bnxt_re, hns, rdmavt, efa,
ocrdma, erdma, rtrs, hfi1, ionic, and pvrdma
- New udata validation framework and driver updates
- Modernize CQ creation interface in mlx4 and mlx5, manage CQ umem in
core
- Promote UMEM to a core component, split out DMA block iterator
logic
- Introduce FRMR pools with aging, statistics, pinned handles, and
netlink control and use it in mlx5
- Add PCIe TLP emulation support in mlx5
- Extend umem to work with revocable pinned dmabuf's and use it in
irdma
- More net namespace improvements for rxe
- GEN4 hardware support in irdma
- First steps to MW and UC support in mana_ib
- Support for CQ umem and doorbells in bnxt_re
- Drop opa_vnic driver from hfi1
Fixes:
- IB/core zero dmac neighbor resolution race
- GID table memory free
- rxe pad/ICRC validation and r_key async errors
- mlx4 external umem for CQ
- umem DMA attributes on unmap
- mana_ib RX steering on RSS QP destroy"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (116 commits)
RDMA/core: Fix user CQ creation for drivers without create_cq
RDMA/ionic: bound node_desc sysfs read with %.64s
IB/core: Fix zero dmac race in neighbor resolution
RDMA/mana_ib: Support memory windows
RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv
RDMA/core: Prefer NLA_NUL_STRING
RDMA/core: Fix memory free for GID table
RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in()
RDMA: Remove redundant = {} for udata req structs
RDMA/irdma: Add missing comp_mask check in alloc_ucontext
RDMA/hns: Add missing comp_mask check in create_qp
RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm()
RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask
RDMA/hns: Use ib_copy_validate_udata_in()
RDMA/mlx4: Use ib_copy_validate_udata_in() for QP
RDMA/mlx4: Use ib_copy_validate_udata_in()
RDMA/mlx5: Use ib_copy_validate_udata_in() for MW
RDMA/mlx5: Use ib_copy_validate_udata_in() for SRQ
RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq
RDMA: Use ib_copy_validate_udata_in() for implicit full structs
...
|
|
rxe_rcv() currently checks only that the incoming packet is at least
header_size(pkt) bytes long before payload_size() is used.
However, payload_size() subtracts both the attacker-controlled BTH pad
field and RXE_ICRC_SIZE from pkt->paylen:
payload_size = pkt->paylen - offset[RXE_PAYLOAD] - bth_pad(pkt)
- RXE_ICRC_SIZE
This means a short packet can still make payload_size() underflow even
if it includes enough bytes for the fixed headers. Simply requiring
header_size(pkt) + RXE_ICRC_SIZE is not sufficient either, because a
packet with a forged non-zero BTH pad can still leave payload_size()
negative and pass an underflowed value to later receive-path users.
Fix this by validating pkt->paylen against the full minimum length
required by payload_size(): header_size(pkt) + bth_pad(pkt) +
RXE_ICRC_SIZE.
Cc: stable@vger.kernel.org
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Link: https://patch.msgid.link/r/20260401121907.1468366-1-hkbinbinbin@gmail.com
Signed-off-by: hkbinbin <hkbinbinbin@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Similar to the prior patch, these patterns are open coding an
offsetofend() using sizeof(), which targets the last member of the
current struct.
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Instead of checking whether the number of CQEs is negative or zero, fix the
.resize_user_cq() declaration to use unsigned int. This better reflects the
expected value range. The sanity check is then handled correctly in ib_uvbers.
Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The CQ resize operation is used only by uverbs. Make this explicit.
Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
After introducing dellink handling and per-net namespace management
for IPv4 and IPv6 sockets, extend rxe to create and destroy RDMA links
within each network namespace.
With this change, RDMA links can be instantiated both in init_net and
in other network namespaces. The lifecycle of the RDMA link is now tied
to the corresponding namespace and is properly cleaned up when the
namespace or link is removed.
This ensures rxe behaves correctly in multi-namespace environments and
keeps socket and RDMA link resources consistent across namespace
creation and teardown.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20260313023058.13020-4-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add a net namespace implementation file to rxe to manage the
lifecycle of IPv4 and IPv6 sockets per network namespace.
This implementation handles the creation and destruction of the
sockets both for init_net and for dynamically created network
namespaces. The sockets are initialized when a namespace becomes
active and are properly released when the namespace is removed.
This change provides the infrastructure needed for rxe to operate
correctly in environments using multiple network namespaces.
Reviewed-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20260313023058.13020-3-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This patch continues the effort to refactor workqueue APIs, which has begun
with the changes introducing new workqueues and a new alloc_workqueue flag:
commit 128ea9f6ccfb ("workqueue: Add system_percpu_wq and system_dfl_wq")
commit 930c2ea566af ("workqueue: Add new WQ_PERCPU flag")
The point of the refactoring is to eventually alter the default behavior of
workqueues to become unbound by default so that their workload placement is
optimized by the scheduler.
Before that to happen, workqueue users must be converted to the better named
new workqueues with no intended behaviour changes:
system_wq -> system_percpu_wq
system_unbound_wq -> system_dfl_wq
This way the old obsolete workqueues (system_wq, system_unbound_wq) can be
removed in the future.
This specific driver already allocate an unbound workqueue named "rxe_wq",
so replace system_unbound_wq with this one instead of system_dfl_wq.
Link: https://lore.kernel.org/all/20250221112003.1dSuoGyc@linutronix.de/
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20260318152748.837388-1-marco.crivellari@suse.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
As IPv6 is built-in only, the ipv6_stub infrastructure is no longer
necessary.
Convert all drivers currently utilizing ipv6_stub to make direct
function calls. The fallback functions introduced previously will
prevent linkage errors when CONFIG_IPV6 is disabled.
Signed-off-by: Fernando Fernandez Mancera <fmancera@suse.de>
Tested-by: Ricardo B. Marlière <rbm@suse.com>
Reviewed-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Antonio Quartulli <antonio@openvpn.net>
Reviewed-by: Edward Cree <ecree.xilinx@gmail.com>
Link: https://patch.msgid.link/20260325120928.15848-7-fmancera@suse.de
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Table 63 of the IBTA spec lists R_Key violations as a class C
error. 9.9.3.1.3 Responder Class C Fault Behavior indicates an
affiliated asynchronous error should be generated at the responder
if the error can be associated to a QP but not a particular RX WQE.
Relevant portion of the spec:
C9-222.1.1: For an HCA responder using Reliable Connection service, for
a Class C responder side error, the error shall be reported to the
requester by generating the appropriate NAK code as specified in Table 63
Responder Error Behavior Summary on page 448. If the error can be related
to a particular QP but cannot be related to a particular WQE on that
receive queue (e.g. the error occurred while executing an RDMA Write
Request without immediate data), the error shall be reported to the
responder’s client as an Affiliated Asynchronous error. See Section
10.10.2.3 Asynchronous Errors on page 576 for details. If the error can be
related to a particular WQE on a given receive queue, the QP shall be
placed into the error state and the error shall be reported to the
responder’s client as a Completion error.
Generate an affiliated asynchronous error upon Rkey violations
if the opcode does not carry an immediate. This causes async
events at the responder for all ops that generate R_Key violations
except WRITE_WITH_IMM, where the error can ride in with the RX WQE.
Signed-off-by: Evan Green <evgreen@meta.com>
Link: https://patch.msgid.link/20260220185533.252759-1-evgreen@meta.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This is the exact same thing as the 'alloc_obj()' version, only much
smaller because there are a lot fewer users of the *alloc_flex()
interface.
As with alloc_obj() version, this was done entirely with mindless brute
force, using the same script, except using 'flex' in the pattern rather
than 'objs*'.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"Usual smallish cycle. The NFS biovec work to push it down into RDMA
instead of indirecting through a scatterlist is pretty nice to see,
been talked about for a long time now.
- Various code improvements in irdma, rtrs, qedr, ocrdma, irdma, rxe
- Small driver improvements and minor bug fixes to hns, mlx5, rxe,
mana, mlx5, irdma
- Robusness improvements in completion processing for EFA
- New query_port_speed() verb to move past limited IBA defined speed
steps
- Support for SG_GAPS in rts and many other small improvements
- Rare list corruption fix in iwcm
- Better support different page sizes in rxe
- Device memory support for mana
- Direct bio vec to kernel MR for use by NFS-RDMA
- QP rate limiting for bnxt_re
- Remote triggerable NULL pointer crash in siw
- DMA-buf exporter support for RDMA mmaps like doorbells"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits)
RDMA/mlx5: Implement DMABUF export ops
RDMA/uverbs: Add DMABUF object type and operations
RDMA/uverbs: Support external FD uobjects
RDMA/siw: Fix potential NULL pointer dereference in header processing
RDMA/umad: Reject negative data_len in ib_umad_write
IB/core: Extend rate limit support for RC QPs
RDMA/mlx5: Support rate limit only for Raw Packet QP
RDMA/bnxt_re: Report QP rate limit in debugfs
RDMA/bnxt_re: Report packet pacing capabilities when querying device
RDMA/bnxt_re: Add support for QP rate limiting
MAINTAINERS: Drop RDMA files from Hyper-V section
RDMA/uverbs: Add __GFP_NOWARN to ib_uverbs_unmarshall_recv() kmalloc
svcrdma: use bvec-based RDMA read/write API
RDMA/core: add rdma_rw_max_sge() helper for SQ sizing
RDMA/core: add MR support for bvec-based RDMA operations
RDMA/core: use IOVA-based DMA mapping for bvec RDMA operations
RDMA/core: add bio_vec based RDMA read/write API
RDMA/irdma: Use kvzalloc for paged memory DMA address array
RDMA/rxe: Fix race condition in QP timer handlers
RDMA/mana_ib: Add device‑memory support
...
|
|
I encontered the following warning:
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:249 at rxe_sched_task+0x1c8/0x238 [rdma_rxe], CPU#0: swapper/0/0
...
libsha1 [last unloaded: ip6_udp_tunnel]
CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Tainted: G C 6.19.0-rc5-64k-v8+ #37 PREEMPT
Tainted: [C]=CRAP
Hardware name: Raspberry Pi 4 Model B Rev 1.2
Call trace:
rxe_sched_task+0x1c8/0x238 [rdma_rxe] (P)
retransmit_timer+0x130/0x188 [rdma_rxe]
call_timer_fn+0x68/0x4d0
__run_timers+0x630/0x888
...
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:38 at rxe_sched_task+0x1c0/0x238 [rdma_rxe], CPU#0: swapper/0/0
...
WARNING: drivers/infiniband/sw/rxe/rxe_task.c:111 at do_work+0x488/0x5c8 [rdma_rxe], CPU#3: kworker/u17:4/93400
...
refcount_t: underflow; use-after-free.
WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x138/0x1a0, CPU#3: kworker/u17:4/93400
The issue is caused by a race condition between retransmit_timer() and
rxe_destroy_qp, leading to the Queue Pair's (QP) reference count dropping
to zero during timer handler execution.
It seems this warning is harmless because rxe_qp_do_cleanup() will flush
all pending timers and requests.
Example of flow causing the issue:
CPU0 CPU1
retransmit_timer() {
spin_lock_irqsave
rxe_destroy_qp()
__rxe_cleanup()
__rxe_put() // qp->ref_count decrease to 0
rxe_qp_do_cleanup() {
if (qp->valid) {
rxe_sched_task() {
WARN_ON(rxe_read(task->qp) <= 0);
}
}
spin_unlock_irqrestore
}
spin_lock_irqsave
qp->valid = 0
spin_unlock_irqrestore
}
Ensure the QP's reference count is maintained and its validity is checked
within the timer callbacks by adding calls to rxe_get(qp) and corresponding
rxe_put(qp) after use.
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Fixes: d94671632572 ("RDMA/rxe: Rewrite rxe_task.c")
Link: https://patch.msgid.link/20260120074437.623018-1-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The current implementation incorrectly handles memory regions (MRs) with
page sizes different from the system PAGE_SIZE. The core issue is that
rxe_set_page() is called with mr->page_size step increments, but the
page_list stores individual struct page pointers, each representing
PAGE_SIZE of memory.
ib_sg_to_page() has ensured that when i>=1 either
a) SG[i-1].dma_end and SG[i].dma_addr are contiguous
or
b) SG[i-1].dma_end and SG[i].dma_addr are mr->page_size aligned.
This leads to incorrect iova-to-va conversion in scenarios:
1) page_size < PAGE_SIZE (e.g., MR: 4K, system: 64K):
ibmr->iova = 0x181800
sg[0]: dma_addr=0x181800, len=0x800
sg[1]: dma_addr=0x173000, len=0x1000
Access iova = 0x181800 + 0x810 = 0x182010
Expected VA: 0x173010 (second SG, offset 0x10)
Before fix:
- index = (0x182010 >> 12) - (0x181800 >> 12) = 1
- page_offset = 0x182010 & 0xFFF = 0x10
- xarray[1] stores system page base 0x170000
- Resulting VA: 0x170000 + 0x10 = 0x170010 (wrong)
2) page_size > PAGE_SIZE (e.g., MR: 64K, system: 4K):
ibmr->iova = 0x18f800
sg[0]: dma_addr=0x18f800, len=0x800
sg[1]: dma_addr=0x170000, len=0x1000
Access iova = 0x18f800 + 0x810 = 0x190010
Expected VA: 0x170010 (second SG, offset 0x10)
Before fix:
- index = (0x190010 >> 16) - (0x18f800 >> 16) = 1
- page_offset = 0x190010 & 0xFFFF = 0x10
- xarray[1] stores system page for dma_addr 0x170000
- Resulting VA: system page of 0x170000 + 0x10 = 0x170010 (wrong)
Yi Zhang reported a kernel panic[1] years ago related to this defect.
Solution:
1. Replace xarray with pre-allocated rxe_mr_page array for sequential
indexing (all MR page indices are contiguous)
2. Each rxe_mr_page stores both struct page* and offset within the
system page
3. Handle MR page_size != PAGE_SIZE relationships:
- page_size > PAGE_SIZE: Split MR pages into multiple system pages
- page_size <= PAGE_SIZE: Store offset within system page
4. Add boundary checks and compatibility validation
This ensures correct iova-to-va conversion regardless of MR page size
and system PAGE_SIZE relationship, while improving performance through
array-based sequential access.
Tests on 4K and 64K PAGE_SIZE hosts:
- rdma-core/pytests
$ ./build/bin/run_tests.py --dev eth0_rxe
- blktest:
$ TIMEOUT=30 QUICK_RUN=1 USE_RXE=1 NVMET_TRTYPES=rdma ./check nvme srp rnbd
[1] https://lore.kernel.org/all/CAHj4cs9XRqE25jyVw9rj9YugffLn5+f=1znaBEnu1usLOciD+g@mail.gmail.com/T/
Fixes: 592627ccbdff ("RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20260116032753.2574363-1-lizhijian@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In rxe_map_mr_sg(), the `page_offset` member of the `rxe_mr` struct
was initialized based on `ibmr.iova`, which will be updated inside
ib_sg_to_pages() later.
Consequently, the value assigned to `page_offset` was incorrect. However,
since `page_offset` was never utilized throughout the code, it can be safely
removed to clean up the codebase and avoid future confusion.
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20260116032833.2574627-1-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In rxe_srq_from_init(), the queue pointer 'q' is assigned to
'srq->rq.queue' before copying the SRQ number to user space.
If copy_to_user() fails, the function calls rxe_queue_cleanup()
to free the queue, but leaves the now-invalid pointer in
'srq->rq.queue'.
The caller of rxe_srq_from_init() (rxe_create_srq) eventually
calls rxe_srq_cleanup() upon receiving the error, which triggers
a second rxe_queue_cleanup() on the same memory, leading to a
double free.
The call trace looks like this:
kmem_cache_free+0x.../0x...
rxe_queue_cleanup+0x1a/0x30 [rdma_rxe]
rxe_srq_cleanup+0x42/0x60 [rdma_rxe]
rxe_elem_release+0x31/0x70 [rdma_rxe]
rxe_create_srq+0x12b/0x1a0 [rdma_rxe]
ib_create_srq_user+0x9a/0x150 [ib_core]
Fix this by moving 'srq->rq.queue = q' after copy_to_user.
Fixes: aae0484e15f0 ("IB/rxe: avoid srq memory leak")
Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
Link: https://patch.msgid.link/20260112015412.29458-1-jiashengjiangcool@gmail.com
Reviewed-by: Zhu Yanjun <yanjun.Zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
rxe_odp_map_range_and_lock() must release umem_odp->umem_mutex when an
error occurs, including cases where rxe_check_pagefault() fails.
Fixes: 2fae67ab63db ("RDMA/rxe: Add support for Send/Recv/Write/Read with ODP")
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Link: https://patch.msgid.link/20251226094112.3042583-1-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
On kernels build with CONFIG_PROVE_LOCKING, CONFIG_MODULES
and CONFIG_DEBUG_LOCK_ALLOC 'rmmod rdma_rxe' is no longer
possible.
For the global recv sockets rxe_net_exit() is where we
call rxe_release_udp_tunnel-> udp_tunnel_sock_release(),
which means the sockets are destroyed before 'rmmod rdma_rxe'
finishes, so there's no need to protect against
rxe_recv_slock_key and rxe_recv_sk_key disappearing
while the sockets are still alive.
Fixes: 80a85a771deb ("RDMA/rxe: reclassify sockets in order to avoid false positives from lockdep")
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://patch.msgid.link/20251219140408.2300163-1-metze@samba.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
While developing IPPROTO_SMBDIRECT support for the code
under fs/smb/common/smbdirect [1], I noticed false positives like this:
[+0,003927] ============================================
[+0,000532] WARNING: possible recursive locking detected
[+0,000611] 6.18.0-rc5-metze-kasan-lockdep.02+ #1 Tainted: G OE
[+0,000835] --------------------------------------------
[+0,000729] ksmbd:r5445/3609 is trying to acquire lock:
[+0,000709] ffff88800b9570f8 (k-sk_lock-AF_INET){+.+.}-{0:0},
at: inet_shutdown+0x52/0x360
[+0,000831]
but task is already holding lock:
[+0,000684] ffff88800654af78 (k-sk_lock-AF_INET){+.+.}-{0:0},
at: smbdirect_sk_close+0x122/0x790 [smbdirect]
[+0,000928]
other info that might help us debug this:
[+0,005552] Possible unsafe locking scenario:
[+0,000723] CPU0
[+0,000359] ----
[+0,000377] lock(k-sk_lock-AF_INET);
[+0,000478] lock(k-sk_lock-AF_INET);
[+0,000498]
*** DEADLOCK ***
[+0,001012] May be due to missing lock nesting notation
[+0,000831] 3 locks held by ksmbd:r5445/3609:
[+0,000484] #0: ffff88800654af78 (k-sk_lock-AF_INET){+.+.}-{0:0},
at: smbdirect_sk_close+0x122/0x790 [smbdirect]
[+0,001000] #1: ffff888020a40458 (&id_priv->handler_mutex){+.+.}-{4:4},
at: rdma_lock_handler+0x17/0x30 [rdma_cm]
[+0,000982] #2: ffff888020a40350 (&id_priv->qp_mutex){+.+.}-{4:4},
at: rdma_destroy_qp+0x5d/0x1f0 [rdma_cm]
[+0,000934]
stack backtrace:
[+0,000589] CPU: 0 UID: 0 PID: 3609 Comm: ksmbd:r5445 Kdump: loaded
Tainted: G OE
6.18.0-rc5-metze-kasan-lockdep.02+ #1 PREEMPT(voluntary)
[+0,000023] Tainted: [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[+0,000004] Hardware name: innotek GmbH VirtualBox/VirtualBox,
BIOS VirtualBox 12/01/2006
...
[+0,000010] print_deadlock_bug+0x245/0x330
[+0,000014] validate_chain+0x32a/0x590
[+0,000012] __lock_acquire+0x535/0xc30
[+0,000013] lock_acquire.part.0+0xb3/0x240
[+0,000017] ? inet_shutdown+0x52/0x360
[+0,000013] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000007] ? mark_held_locks+0x46/0x90
[+0,000012] lock_acquire+0x60/0x140
[+0,000006] ? inet_shutdown+0x52/0x360
[+0,000028] lock_sock_nested+0x3b/0xf0
[+0,000009] ? inet_shutdown+0x52/0x360
[+0,000008] inet_shutdown+0x52/0x360
[+0,000010] kernel_sock_shutdown+0x5b/0x90
[+0,000011] rxe_qp_do_cleanup+0x4ef/0x810 [rdma_rxe]
[+0,000043] ? __pfx_rxe_qp_do_cleanup+0x10/0x10 [rdma_rxe]
[+0,000030] execute_in_process_context+0x2b/0x170
[+0,000013] rxe_qp_cleanup+0x1c/0x30 [rdma_rxe]
[+0,000021] __rxe_cleanup+0x1cf/0x2e0 [rdma_rxe]
[+0,000036] ? __pfx___rxe_cleanup+0x10/0x10 [rdma_rxe]
[+0,000020] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000006] ? __kasan_check_read+0x11/0x20
[+0,000012] rxe_destroy_qp+0xe1/0x230 [rdma_rxe]
[+0,000035] ib_destroy_qp_user+0x217/0x450 [ib_core]
[+0,000074] rdma_destroy_qp+0x83/0x1f0 [rdma_cm]
[+0,000034] smbdirect_connection_destroy_qp+0x98/0x2e0 [smbdirect]
[+0,000017] ? __pfx_smb_direct_logging_needed+0x10/0x10 [ksmbd]
[+0,000044] smbdirect_connection_destroy+0x698/0xed0 [smbdirect]
[+0,000023] ? __pfx_smbdirect_connection_destroy+0x10/0x10 [smbdirect]
[+0,000033] ? __pfx_smb_direct_logging_needed+0x10/0x10 [ksmbd]
[+0,000031] smbdirect_connection_destroy_sync+0x42b/0x9f0 [smbdirect]
[+0,000029] ? mark_held_locks+0x46/0x90
[+0,000012] ? __pfx_smbdirect_connection_destroy_sync+0x10/0x10 [smbdirect]
[+0,000019] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000007] ? trace_hardirqs_on+0x64/0x70
[+0,000029] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000010] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000006] ? __smbdirect_connection_schedule_disconnect+0x339/0x4b0
[+0,000021] smbdirect_sk_destroy+0xb0/0x680 [smbdirect]
[+0,000024] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000006] ? trace_hardirqs_on+0x64/0x70
[+0,000006] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000005] ? __local_bh_enable_ip+0xba/0x150
[+0,000011] sk_common_release+0x66/0x340
[+0,000010] smbdirect_sk_close+0x12a/0x790 [smbdirect]
[+0,000023] ? ip_mc_drop_socket+0x1e/0x240
[+0,000013] inet_release+0x10a/0x240
[+0,000011] smbdirect_sock_release+0x502/0xe80 [smbdirect]
[+0,000015] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000024] sock_release+0x91/0x1c0
[+0,000010] smb_direct_free_transport+0x31/0x50 [ksmbd]
[+0,000025] ksmbd_conn_free+0x1d0/0x240 [ksmbd]
[+0,000040] smb_direct_disconnect+0xb2/0x120 [ksmbd]
[+0,000023] ? srso_alias_return_thunk+0x5/0xfbef5
[+0,000018] ksmbd_conn_handler_loop+0x94e/0xf10 [ksmbd]
...
I'll also add reclassify to the smbdirect socket code [1],
but I think it's better to have it in both direction
(below and above the RDMA layer).
[1]
https://git.samba.org/?p=metze/linux/wip.git;a=shortlog;h=refs/heads/master-ipproto-smbdirect
Cc: Zhu Yanjun <zyjzyj2000@gmail.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Cc: linux-cifs@vger.kernel.org
Signed-off-by: Stefan Metzmacher <metze@samba.org>
Link: https://patch.msgid.link/20251127105614.2040922-1-metze@samba.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
A NULL pointer dereference can occur in rxe_srq_chk_attr() when
ibv_modify_srq() is invoked twice in succession under certain error
conditions. The first call may fail in rxe_queue_resize(), which leads
rxe_srq_from_attr() to set srq->rq.queue = NULL. The second call then
triggers a crash (null deref) when accessing
srq->rq.queue->buf->index_mask.
Call Trace:
<TASK>
rxe_modify_srq+0x170/0x480 [rdma_rxe]
? __pfx_rxe_modify_srq+0x10/0x10 [rdma_rxe]
? uverbs_try_lock_object+0x4f/0xa0 [ib_uverbs]
? rdma_lookup_get_uobject+0x1f0/0x380 [ib_uverbs]
ib_uverbs_modify_srq+0x204/0x290 [ib_uverbs]
? __pfx_ib_uverbs_modify_srq+0x10/0x10 [ib_uverbs]
? tryinc_node_nr_active+0xe6/0x150
? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x2c0/0x470 [ib_uverbs]
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
? uverbs_fill_udata+0xed/0x4f0 [ib_uverbs]
ib_uverbs_run_method+0x55a/0x6e0 [ib_uverbs]
? __pfx_ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x10/0x10 [ib_uverbs]
ib_uverbs_cmd_verbs+0x54d/0x800 [ib_uverbs]
? __pfx_ib_uverbs_cmd_verbs+0x10/0x10 [ib_uverbs]
? __pfx___raw_spin_lock_irqsave+0x10/0x10
? __pfx_do_vfs_ioctl+0x10/0x10
? ioctl_has_perm.constprop.0.isra.0+0x2c7/0x4c0
? __pfx_ioctl_has_perm.constprop.0.isra.0+0x10/0x10
ib_uverbs_ioctl+0x13e/0x220 [ib_uverbs]
? __pfx_ib_uverbs_ioctl+0x10/0x10 [ib_uverbs]
__x64_sys_ioctl+0x138/0x1c0
do_syscall_64+0x82/0x250
? fdget_pos+0x58/0x4c0
? ksys_write+0xf3/0x1c0
? __pfx_ksys_write+0x10/0x10
? do_syscall_64+0xc8/0x250
? __pfx_vm_mmap_pgoff+0x10/0x10
? fget+0x173/0x230
? fput+0x2a/0x80
? ksys_mmap_pgoff+0x224/0x4c0
? do_syscall_64+0xc8/0x250
? do_user_addr_fault+0x37b/0xfe0
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
? clear_bhb_loop+0x50/0xa0
entry_SYSCALL_64_after_hwframe+0x76/0x7e
Fixes: 8700e3e7c485 ("Soft RoCE driver")
Tested-by: Liu Yi <asatsuyu.liu@gmail.com>
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20251027215203.1321-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The variable page_offset is being assigned a value at the start of
a loop and being redundantly zero'd at the end of the loop, there
is no code that reads the zero'd value. The assignment is redundant
and can be removed.
Signed-off-by: Colin Ian King <coking@nvidia.com>
Link: https://patch.msgid.link/20251014120343.2528608-1-coking@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"A new Pensando ionic driver, a new Gen 3 HW support for Intel irdma,
and lots of small bnxt_re improvements.
- Small bug fixes and improves to hfi1, efa, mlx5, erdma, rdmarvt,
siw
- Allow userspace access to IB service records through the rdmacm
- Optimize dma mapping for erdma
- Fix shutdown of the GSI QP in mana
- Support relaxed ordering MR and fix a corruption bug with mlx5 DMA
Data Direct
- Many improvement to bnxt_re:
- Debugging features and counters
- Improve performance of some commands
- Change flow_label reporting in completions
- Mirror vnic
- RDMA flow support
- New RDMA driver for Pensando Ethernet devices: ionic
- Gen 3 hardware support for the Intel irdma driver
- Fix rdma routing resolution with VRFs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (85 commits)
RDMA/ionic: Fix memory leak of admin q_wr
RDMA/siw: Always report immediate post SQ errors
RDMA/bnxt_re: improve clarity in ALLOC_PAGE handler
RDMA/irdma: Remove unused struct irdma_cq fields
RDMA/irdma: Fix positive vs negative error codes in irdma_post_send()
RDMA/bnxt_re: Remove non-statistics counters from hw_counters
RDMA/bnxt_re: Add debugfs info entry for device and resource information
RDMA/bnxt_re: Fix incorrect errno used in function comments
RDMA: Use %pe format specifier for error pointers
RDMA/ionic: Use ether_addr_copy instead of memcpy
RDMA/ionic: Fix build failure on SPARC due to xchg() operand size
RDMA/rxe: Fix race in do_task() when draining
IB/sa: Fix sa_local_svc_timeout_ms read race
IB/ipoib: Ignore L3 master device
RDMA/core: Use route entry flag to decide on loopback traffic
RDMA/core: Resolve MAC of next-hop device without ARP support
RDMA/core: Squash a single user static function
RDMA/irdma: Update Kconfig
RDMA/irdma: Extend CQE Error and Flush Handling for GEN3 Devices
RDMA/irdma: Add Atomic Operations support
...
|
|
When do_task() exhausts its iteration budget (!ret), it sets the state
to TASK_STATE_IDLE to reschedule, without a secondary check on the
current task->state. This can overwrite the TASK_STATE_DRAINING state
set by a concurrent call to rxe_cleanup_task() or rxe_disable_task().
While state changes are protected by a spinlock, both rxe_cleanup_task()
and rxe_disable_task() release the lock while waiting for the task to
finish draining in the while(!is_done(task)) loop. The race occurs if
do_task() hits its iteration limit and acquires the lock in this window.
The cleanup logic may then proceed while the task incorrectly
reschedules itself, leading to a potential use-after-free.
This bug was introduced during the migration from tasklets to workqueues,
where the special handling for the draining case was lost.
Fix this by restoring the original pre-migration behavior. If the state is
TASK_STATE_DRAINING when iterations are exhausted, set cont to 1 to
force a new loop iteration. This allows the task to finish its work, so
that a subsequent iteration can reach the switch statement and correctly
transition the state to TASK_STATE_DRAINED, stopping the task as intended.
Fixes: 9b4b7c1f9f54 ("RDMA/rxe: Add workqueue support for rxe tasks")
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Gui-Dong Han <hanguidong02@gmail.com>
Link: https://patch.msgid.link/20250919025212.1682087-1-hanguidong02@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
When skb packets are sent out, these skb packets still depends on
the rxe resources, for example, QP, sk, when these packets are
destroyed.
If these rxe resources are released when the skb packets are destroyed,
the call traces will appear.
To avoid skb packets hang too long time in some network devices,
a timestamp is added when these skb packets are created. If these
skb packets hang too long time in network devices, these network
devices can free these skb packets to release rxe resources.
Reported-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com
Closes: https://syzkaller.appspot.com/bug?extid=8425ccfb599521edb153
Tested-by: syzbot+8425ccfb599521edb153@syzkaller.appspotmail.com
Fixes: 1a633bdc8fd9 ("RDMA/rxe: Let destroy qp succeed with stuck packet")
Signed-off-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Link: https://patch.msgid.link/20250726013104.463570-1-yanjun.zhu@linux.dev
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
- Various minor code cleanups and fixes for hns, iser, cxgb4, hfi1,
rxe, erdma, mana_ib
- Prefetch supprot for rxe ODP
- Remove memory window support from hns as new device FW is no longer
support it
- Remove qib, it is very old and obsolete now, Cornelis wishes to
restructure the hfi1/qib shared layer
- Fix a race in destroying CQs where we can still end up with work
running because the work is cancled before the driver stops
triggering it
- Improve interaction with namespaces:
* Follow the devlink namespace for newly spawned RDMA devices
* Create iopoib net devces in the parent IB device's namespace
* Allow CAP_NET_RAW checks to pass in user namespaces
- A new flow control scheme for IB MADs to try and avoid queue
overflows in the network
- Fix 2G message sizes in bnxt_re
- Optimize mkey layout for mlx5 DMABUF
- New "DMA Handle" concept to allow controlling PCI TPH and steering
tags
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (71 commits)
RDMA/siw: Change maintainer email address
RDMA/mana_ib: add support of multiple ports
RDMA/mlx5: Refactor optional counters steering code
RDMA/mlx5: Add DMAH support for reg_user_mr/reg_user_dmabuf_mr
IB: Extend UVERBS_METHOD_REG_MR to get DMAH
RDMA/mlx5: Add DMAH object support
RDMA/core: Introduce a DMAH object and its alloc/free APIs
IB/core: Add UVERBS_METHOD_REG_MR on the MR object
net/mlx5: Add support for device steering tag
net/mlx5: Expose IFC bits for TPH
PCI/TPH: Expose pcie_tph_get_st_table_size()
RDMA/mlx5: Fix incorrect MKEY masking
RDMA/mlx5: Fix returned type from _mlx5r_umr_zap_mkey()
RDMA/mlx5: remove redundant check on err on return expression
RDMA/mana_ib: add additional port counters
RDMA/mana_ib: Fix DSCP value in modify QP
RDMA/efa: Add CQ with external memory support
RDMA/core: Add umem "is_contiguous" and "start_dma_addr" helpers
RDMA/uverbs: Add a common way to create CQ with umem
RDMA/mlx5: Optimize DMABUF mkey page size
...
|
|
Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers.
It will be used in mlx5 driver as part of the next patch from the
series.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Commit cc34acd577f1 ("docs: net: document new locking reality")
introduced netif_ vs dev_ function semantics: the former expects locked
netdev, the latter takes care of the locking. We don't strictly
follow this semantics on either side, but there are more dev_xxx handlers
now that don't fit. Rename them to netif_xxx where appropriate.
Signed-off-by: Stanislav Fomichev <sdf@fomichev.me>
Link: https://patch.msgid.link/20250717172333.1288349-6-sdf@fomichev.me
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The lookup_mr() function returns NULL on error. It never returns error
pointers.
Fixes: 9284bc34c773 ("RDMA/rxe: Enable asynchronous prefetch for ODP MRs")
Fixes: 3576b0df1588 ("RDMA/rxe: Implement synchronous prefetch for ODP MRs")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://patch.msgid.link/685c1430.050a0220.18b0ef.da83@mx.google.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
hmm_pfn_to_page() does not return NULL. ib_umem_odp_map_dma_and_lock()
should return an error in case the target pages cannot be mapped until
timeout, so these checks can safely be removed.
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
Link: https://patch.msgid.link/20250611162758.10000-1-dskmtsd@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Calling ibv_advise_mr(3) with flags other than IBV_ADVISE_MR_FLAG_FLUSH
invokes an asynchronous request. It is best-effort, and thus can safely be
deferred to the system-wide workqueue.
The reference counter in rxe_mr is used to ensure that the MRs persist and
that rxe is not terminated until the queued work is done.
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
Link: https://patch.msgid.link/20250522111955.3227-3-dskmtsd@gmail.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Minimal implementation of ibv_advise_mr(3) requires synchronous calls being
successful with the IBV_ADVISE_MR_FLAG_FLUSH flag. Asynchronous requests,
which are best-effort, will be supported subsequently.
Signed-off-by: Daisuke Matsuda <dskmtsd@gmail.com>
Link: https://patch.msgid.link/20250522111955.3227-2-dskmtsd@gmail.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Move this API to the canonical timer_*() namespace.
[ tglx: Redone against pre rc1 ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
|