| Age | Commit message (Collapse) | Author | Files | Lines |
|
If IB_MR_REREG_ACCESS changes from RO to RW then the umem has to be
re-evaluated to ensure it is properly pinned as RW. Since the umem is
hidden inside each driver's mr struct add a ib_umem_check_rereg() function
that each driver has to call before processing IB_MR_REREG_ACCESS.
mlx4 has to retain its duplicate ib_access_writable check because it
implements IB_MR_REREG_ACCESS | IB_MR_REREG_TRANS by changing both items
in place sequentially while the MR is live, so it will continue to not
support this combination.
Cc: stable@vger.kernel.org
Fixes: b40656aa7d55 ("RDMA/umem: remove FOLL_FORCE usage")
Link: https://patch.msgid.link/r/0-v1-06fb1a2d6cf5+107-rereg_access_jgg@nvidia.com
Reported-by: Philip Tsukerman <philiptsukerman@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Sashiko points out that mlx4_srq_alloc() was not undone during error
unwind, add the missing call to mlx4_srq_free().
Cc: stable@vger.kernel.org
Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters")
Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=8
Link: https://patch.msgid.link/r/11-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Now that all of the udata request structs are loaded with the helpers
the callers should not pre-zero them. The helpers all guarantee that
the entire struct is filled with something.
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
All of these cases require a 0 comp_mask. Consolidate these into
using ib_copy_validate_udata_in_cm() and remove the open coded
comp_mask test.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Move the validation of the udata to the same function that copies it.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Follow the last member of each struct at the point
MLX4_IB_UVERBS_ABI_VERSION was set to 4.
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Similar to the prior patch, these patterns are open coding an
offsetofend() using sizeof(), which targets the last member of the
current struct.
Reviewed-by: Long Li <longli@microsoft.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Bernard Metzler <bernard.metzler@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Go treewide and consolidate all existing patterns using:
* offsetofend() and variations
* ib_is_udata_cleared()
* ib_copy_from_udata()
into a direct call to the new ib_copy_validate_udata_in().
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
When the mlx4 firmware reports the MLX4_DEV_CAP_FLAG2_SW_CQ_INIT capability,
libmlx4 from the rdma-core package expects the driver to initialize memory
at the address provided in the buf_addr parameter of ucmd.
This behavior cannot be supported by any external umem implementation, so
restrict it accordingly.
Fixes: f45f195af521 ("RDMA/mlx4: Introduce a modern CQ creation interface")
Reported-by: Jiri Pirko <jiri@nvidia.com>
Link: https://patch.msgid.link/20260325-fix-mlx4-external-umem-v1-1-1c7c0e779329@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Instead of checking whether the number of CQEs is negative or zero, fix the
.resize_user_cq() declaration to use unsigned int. This better reflects the
expected value range. The sanity check is then handled correctly in ib_uvbers.
Link: https://patch.msgid.link/20260319-resize_cq-cqe-v1-1-b78c6efc1def@nvidia.com
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The CQ resize operation is used only by uverbs. Make this explicit.
Link: https://patch.msgid.link/20260318-resize_cq-type-v1-2-b2846ed18846@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Commits bf4afc53b77a ("Convert 'alloc_obj' family to use the new default
GFP_KERNEL argument") and 69050f8d6d07 ("treewide: Replace kmalloc with
kmalloc_obj for non-scalar types") updated various k[z|m|c]alloc calls to their
k[z|m]alloc_obj counterparts.
This commit finalizes that transition within the RDMA subsystem.
Link: https://patch.msgid.link/20260226-complete-alloc-conversion-v1-1-ebf1df1c2518@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The CQ creation flags do not need to be cached, as they are used
immediately at the point where they are stored. Remove the unused
field and reclaim 4 bytes.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-14-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The uverbs CQ creation UAPI allows users to supply their own umem when
creating a CQ. Update mlx4 to support this model while preserving compatibility
with the legacy interface that allocates umem internally.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-13-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Inline the mlx4_ib_get_cq_umem helper function into its two call sites
(mlx4_ib_create_cq and mlx4_alloc_resize_umem) to prepare for the
transition to modern CQ creation interface.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-12-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
The DMA iterator logic was mixed into verbs and umem-specific code,
forcing all users to include rdma/ib_umem.h. Move the block iterator
logic into iter.c and rdma/iter.h so that rdma/ib_umem.h and
rdma/ib_verbs.h can be separated in a follow-up patch.
Link: https://patch.msgid.link/20260213-refactor-umem-v1-1-f3be85847922@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Conversion performed via this Coccinelle script:
// SPDX-License-Identifier: GPL-2.0-only
// Options: --include-headers-for-types --all-includes --include-headers --keep-comments
virtual patch
@gfp depends on patch && !(file in "tools") && !(file in "samples")@
identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex,
kzalloc_obj,kzalloc_objs,kzalloc_flex,
kvmalloc_obj,kvmalloc_objs,kvmalloc_flex,
kvzalloc_obj,kvzalloc_objs,kvzalloc_flex};
@@
ALLOC(...
- , GFP_KERNEL
)
$ make coccicheck MODE=patch COCCI=gfp.cocci
Build and boot tested x86_64 with Fedora 42's GCC and Clang:
Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01
Signed-off-by: Kees Cook <kees@kernel.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This converts some of the visually simpler cases that have been split
over multiple lines. I only did the ones that are easy to verify the
resulting diff by having just that final GFP_KERNEL argument on the next
line.
Somebody should probably do a proper coccinelle script for this, but for
me the trivial script actually resulted in an assertion failure in the
middle of the script. I probably had made it a bit _too_ trivial.
So after fighting that far a while I decided to just do some of the
syntactically simpler cases with variations of the previous 'sed'
scripts.
The more syntactically complex multi-line cases would mostly really want
whitespace cleanup anyway.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This was done entirely with mindless brute force, using
git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' |
xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/'
to convert the new alloc_obj() users that had a simple GFP_KERNEL
argument to just drop that argument.
Note that due to the extreme simplicity of the scripting, any slightly
more complex cases spread over multiple lines would not be triggered:
they definitely exist, but this covers the vast bulk of the cases, and
the resulting diff is also then easier to check automatically.
For the same reason the 'flex' versions will be done as a separate
conversion.
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
|
|
This is the result of running the Coccinelle script from
scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to
avoid scalar types (which need careful case-by-case checking), and
instead replace kmalloc-family calls that allocate struct or union
object instances:
Single allocations: kmalloc(sizeof(TYPE), ...)
are replaced with: kmalloc_obj(TYPE, ...)
Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...)
are replaced with: kmalloc_objs(TYPE, COUNT, ...)
Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...)
are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...)
(where TYPE may also be *VAR)
The resulting allocations no longer return "void *", instead returning
"TYPE *".
Signed-off-by: Kees Cook <kees@kernel.org>
|
|
Currently if a user enqueue a work item using schedule_delayed_work() the
used wq is "system_wq" (per-cpu wq) while queue_delayed_work() use
WORK_CPU_UNBOUND (used when a cpu is not specified). The same applies to
schedule_work() that is using system_wq and queue_work(), that makes use
again of WORK_CPU_UNBOUND.
This lack of consistentcy cannot be addressed without refactoring the API.
alloc_workqueue() treats all queues as per-CPU by default, while unbound
workqueues must opt-in via WQ_UNBOUND.
This default is suboptimal: most workloads benefit from unbound queues,
allowing the scheduler to place worker threads where they’re needed and
reducing noise when CPUs are isolated.
This change adds a new WQ_PERCPU flag to explicitly request
alloc_workqueue() to be per-cpu when WQ_UNBOUND has not been specified.
With the introduction of the WQ_PERCPU flag (equivalent to !WQ_UNBOUND),
any alloc_workqueue() caller that doesn’t explicitly specify WQ_UNBOUND
must now use WQ_PERCPU.
Once migration is complete, WQ_UNBOUND can be removed and unbound will
become the implicit default.
CC: Yishai Hadas <yishaih@nvidia.com>
Suggested-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Marco Crivellari <marco.crivellari@suse.com>
Link: https://patch.msgid.link/20251101163121.78400-5-marco.crivellari@suse.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Convert error logging throughout the RDMA subsystem to use
the %pe format specifier instead of PTR_ERR() with integer
format specifiers.
Link: https://patch.msgid.link/e81ec02df1e474be20417fb62e779776e3f47a50.1758217936.git.leon@kernel.org
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Extend UVERBS_METHOD_REG_MR to get DMAH and pass it to all drivers.
It will be used in mlx5 driver as part of the next patch from the
series.
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Reviewed-by: Edward Srouji <edwards@nvidia.com>
Link: https://patch.msgid.link/2ae1e628c0675db81f092cc00d3ad6fbf6139405.1752752567.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
In drivers/infiniband/hw/mlx4/mcg.c and drivers/infiniband/hw/mlx5/mr.c,
`msecs_to_jiffies` is used to convert milliseconds to jiffies.
For constant milliseconds, using `msecs_to_jiffies` introduces additional
computational overhead. For example, it is unnecessary to check
if m > jiffies_to_msecs(MAX_JIFFY_OFFSET) or (int)m < 0 for constants,
while using `secs_to_jiffies` can avoid these extra calculations.
Link: https://patch.msgid.link/r/20250326221955611qu6Ix3Pt5WgKvhL6sTySX@zte.com.cn
Signed-off-by: Peng Jiang <jiang.peng9@zte.com.cn>
Signed-off-by: Ye Xingchen <ye.xingchen@zte.com.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In addition to dispatching event, some private stuffs need to be
done in this driver's link status event handler. Implement the new
report_port_event() ops with the link status event codes.
Signed-off-by: Yuyu Li <liyuyu6@huawei.com>
Signed-off-by: Junxian Huang <huangjunxian6@hisilicon.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Replace an open coding of rdma_umem_for_each_dma_block() with
the proper function.
Link: https://patch.msgid.link/0bf595962c964fb8918743405acf9103a5a85983.1733233299.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Convert mlx4 to use ib_umem_find_best_pgsz() instead of open-coded
variant to calculate MTT size.
Link: https://patch.msgid.link/c39ec6f5d4664c439a72f2961728ebb5895a9f07.1733233299.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Smatch generates the following false error report:
drivers/infiniband/hw/mlx4/main.c:393 mlx4_ib_del_gid() error: uninitialized symbol 'gids'.
Traditionally, we are not changing kernel code and asking people to fix
the tools. However in this case, the fix can be done by simply rearranging
the code to be more clear.
Fixes: e26be1bfef81 ("IB/mlx4: Implement ib_device callbacks")
Link: https://patch.msgid.link/6a3a1577463da16962463fcf62883a87506e9b62.1733233426.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
|
|
Let alloc_ordered_workqueue() format the workqueue name instead of calling
snprintf() explicitly.
Link: https://patch.msgid.link/r/20240823101840.515398-5-ruanjinjie@huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Let alloc_ordered_workqueue() format the workqueue name instead of calling
snprintf() explicitly.
Link: https://patch.msgid.link/r/20240823101840.515398-4-ruanjinjie@huawei.com
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Changes the create_cq verb signature by sending the entire uverbs attr
bundle as a parameter. This allows drivers to send driver specific attrs
through ioctl for the create_cq verb and access them in their driver
specific code.
Also adds a new enum value for driver specific ioctl attributes for
methods already supporting UHW.
Link: https://lore.kernel.org/r/ed147343987c0d43fd391c1b2f85e2f425747387.1719512393.git.leon@kernel.org
Signed-off-by: Akiva Goldberger <agoldberger@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
drivers/infiniband/hw/mlx4/alias_GUID.c: In function ‘mlx4_ib_init_alias_guid_service’:
drivers/infiniband/hw/mlx4/alias_GUID.c:878:74: error: ‘%d’ directive
output may be truncated writing between 1 and 11 bytes into a region of
size 5 [-Werror=format-truncation=]
878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i);
| ^~
drivers/infiniband/hw/mlx4/alias_GUID.c:878:63: note: directive argument in the range [-2147483641, 2147483646]
878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i);
| ^~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/alias_GUID.c:878:17: note: ‘snprintf’ output
between 12 and 22 bytes into a destination of size 15
878 | snprintf(alias_wq_name, sizeof alias_wq_name, "alias_guid%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
Fixes: a0c64a17aba8 ("mlx4: Add alias_guid mechanism")
Link: https://lore.kernel.org/r/1951c9500109ca7e36dcd523f8a5f2d0d2a608d1.1718554641.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Increase size of the name array to avoid truncated output warning.
drivers/infiniband/hw/mlx4/mad.c: In function ‘mlx4_ib_alloc_demux_ctx’:
drivers/infiniband/hw/mlx4/mad.c:2197:47: error: ‘%d’ directive output
may be truncated writing between 1 and 11 bytes into a region of size 4
[-Werror=format-truncation=]
2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port);
| ^~
drivers/infiniband/hw/mlx4/mad.c:2197:38: note: directive argument in
the range [-2147483645, 2147483647]
2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port);
| ^~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mad.c:2197:9: note: ‘snprintf’ output between
10 and 20 bytes into a destination of size 12
2197 | snprintf(name, sizeof(name), "mlx4_ibt%d", port);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mad.c:2205:48: error: ‘%d’ directive output
may be truncated writing between 1 and 11 bytes into a region of size 3
[-Werror=format-truncation=]
2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port);
| ^~
drivers/infiniband/hw/mlx4/mad.c:2205:38: note: directive argument in
the range [-2147483645, 2147483647]
2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port);
| ^~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mad.c:2205:9: note: ‘snprintf’ output between
11 and 21 bytes into a destination of size 12
2205 | snprintf(name, sizeof(name), "mlx4_ibwi%d", port);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mad.c:2213:48: error: ‘%d’ directive output
may be truncated writing between 1 and 11 bytes into a region of size 3
[-Werror=format-truncation=]
2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port);
| ^~
drivers/infiniband/hw/mlx4/mad.c:2213:38: note: directive argument in
the range [-2147483645, 2147483647]
2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port);
| ^~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/mad.c:2213:9: note: ‘snprintf’ output between
11 and 21 bytes into a destination of size 12
2213 | snprintf(name, sizeof(name), "mlx4_ibud%d", port);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: all warnings being treated as errors
make[6]: *** [scripts/Makefile.build:244: drivers/infiniband/hw/mlx4/mad.o] Error 1
Fixes: fc06573dfaf8 ("IB/mlx4: Initialize SR-IOV IB support for slaves in master context")
Link: https://lore.kernel.org/r/f3798b3ce9a410257d7e1ec7c9e285f1352e256a.1718554569.git.leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
In order to be sure that 'buff' is never truncated, its size should be
12, not 11.
When building with W=1, this fixes the following warnings:
drivers/infiniband/hw/mlx4/sysfs.c: In function ‘add_port_entries’:
drivers/infiniband/hw/mlx4/sysfs.c:268:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
268 | sprintf(buff, "%d", i);
| ^
drivers/infiniband/hw/mlx4/sysfs.c:268:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11
268 | sprintf(buff, "%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~
drivers/infiniband/hw/mlx4/sysfs.c:286:34: error: ‘sprintf’ may write a terminating nul past the end of the destination [-Werror=format-overflow=]
286 | sprintf(buff, "%d", i);
| ^
drivers/infiniband/hw/mlx4/sysfs.c:286:17: note: ‘sprintf’ output between 2 and 12 bytes into a destination of size 11
286 | sprintf(buff, "%d", i);
| ^~~~~~~~~~~~~~~~~~~~~~
Fixes: c1e7e466120b ("IB/mlx4: Add iov directory in sysfs under the ib device")
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/0bb1443eb47308bc9be30232cc23004c4d4cf43e.1695448530.git.christophe.jaillet@wanadoo.fr
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Pull rdma updates from Jason Gunthorpe:
"Many small changes across the subystem, some highlights:
- Usual driver cleanups in qedr, siw, erdma, hfi1, mlx4/5, irdma,
mthca, hns, and bnxt_re
- siw now works over tunnel and other netdevs with a MAC address by
removing assumptions about a MAC/GID from the connection manager
- "Doorbell Pacing" for bnxt_re - this is a best effort scheme to
allow userspace to slow down the doorbell rings if the HW gets full
- irdma egress VLAN priority, better QP/WQ sizing
- rxe bug fixes in queue draining and srq resizing
- Support more ethernet speed options in the core layer
- DMABUF support for bnxt_re
- Multi-stage MTT support for erdma to allow much bigger MR
registrations
- A irdma fix with a CVE that came in too late to go to -rc, missing
bounds checking for 0 length MRs"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (87 commits)
IB/hfi1: Reduce printing of errors during driver shut down
RDMA/hfi1: Move user SDMA system memory pinning code to its own file
RDMA/hfi1: Use list_for_each_entry() helper
RDMA/mlx5: Fix trailing */ formatting in block comment
RDMA/rxe: Fix redundant break statement in switch-case.
RDMA/efa: Fix wrong resources deallocation order
RDMA/siw: Call llist_reverse_order in siw_run_sq
RDMA/siw: Correct wrong debug message
RDMA/siw: Balance the reference of cep->kref in the error path
Revert "IB/isert: Fix incorrect release of isert connection"
RDMA/bnxt_re: Fix kernel doc errors
RDMA/irdma: Prevent zero-length STAG registration
RDMA/erdma: Implement hierarchical MTT
RDMA/erdma: Refactor the storage structure of MTT entries
RDMA/erdma: Renaming variable names and field names of struct erdma_mem
RDMA/hns: Support hns HW stats
RDMA/hns: Dump whole QP/CQ/MR resource in raw
RDMA/irdma: Add missing kernel-doc in irdma_setup_umode_qp()
RDMA/mlx4: Copy union directly
RDMA/irdma: Drop unused kernel push code
...
|
|
Use the auxiliary bus to perform device management of the infiniband
part of the mlx4 driver.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Use a notifier to implement mlx4_dispatch_event() in preparation to
switch mlx4_en and mlx4_ib to be an auxiliary device.
A problem is that if the mlx4_interface.event callback was replaced with
something as mlx4_adrv.event then the implementation of
mlx4_dispatch_event() would need to acquire a lock on a given device
before executing this callback. That is necessary because otherwise
there is no guarantee that the associated driver cannot get unbound when
the callback is running. However, taking this lock is not possible
because mlx4_dispatch_event() can be invoked from the hardirq context.
Using an atomic notifier allows the driver to accurately record when it
wants to receive these events and solves this problem.
A handler registration is done by both mlx4_en and mlx4_ib at the end of
their mlx4_interface.add callback. This matches the current situation
when mlx4_add_device() would enable events for a given device
immediately after this callback, by adding the device on the
mlx4_priv.list.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Function mlx4_dispatch_event() takes an 'unsigned long' as its event
parameter. The actual value is none (MLX4_DEV_EVENT_CATASTROPHIC_ERROR),
a pointer to mlx4_eqe (MLX4_DEV_EVENT_PORT_MGMT_CHANGE), or a 32-bit
integer (remaining events).
In preparation to switch mlx4_en and mlx4_ib to be an auxiliary device,
the mlx4_interface.event callback is replaced with a notifier and
function mlx4_dispatch_event() gets updated to invoke
atomic_notifier_call_chain(). This requires forwarding the input 'param'
value from the former function to the latter. A problem is that the
notifier call takes 'void *' as its 'param' value, compared to
'unsigned long' used by mlx4_dispatch_event(). Re-passing the value
would need either punning it to 'void *' or passing down the address of
the input 'param'. Both approaches create a number of unnecessary casts.
Change instead the input 'param' of mlx4_dispatch_event() from
'unsigned long' to 'void *'. A mlx4_eqe pointer can be passed directly,
callers using an int value are adjusted to pass its address.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Simplify the mlx4 driver interface by removing mlx4_get_protocol_dev()
and the associated mlx4_interface.get_dev callbacks. This is done in
preparation to use an auxiliary bus to model the mlx4 driver structure.
The change is motivated by the following situation:
* The mlx4_en interface is being initialized by mlx4_en_add() and
mlx4_en_activate().
* The latter activate function calls mlx4_en_init_netdev() ->
register_netdev() to register a new net_device.
* A netdev event NETDEV_REGISTER is raised for the device.
* The netdev notififier mlx4_ib_netdev_event() is called and it invokes
mlx4_ib_scan_netdevs() -> mlx4_get_protocol_dev() ->
mlx4_en_get_netdev() [via mlx4_interface.get_dev].
This chain creates a problem when mlx4_en gets switched to be an
auxiliary driver. It contains two device calls which would both need to
take a respective device lock.
Avoid this situation by updating mlx4_ib_scan_netdevs() to no longer
call mlx4_get_protocol_dev() but instead to utilize the information
passed in net_device.parent and net_device.dev_port. This data is
sufficient to determine that an updated port is one that the mlx4_ib
driver should take care of and to keep mlx4_ib_dev.iboe.netdevs up to
date.
Following that, update mlx4_ib_get_netdev() to also not call
mlx4_get_protocol_dev() and instead scan all current netdevs to find
find a matching one. Note that mlx4_ib_get_netdev() is called early from
ib_register_device() and cannot use data tracked in
mlx4_ib_dev.iboe.netdevs which is not at that point yet set.
Finally, remove function mlx4_get_protocol_dev() and the
mlx4_interface.get_dev callbacks (only mlx4_en_get_netdev()) as they
became unused.
Signed-off-by: Petr Pavlu <petr.pavlu@suse.com>
Tested-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Tariq Toukan <tariqt@nvidia.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
|
|
Copy union directly instead of using memcpy().
Note that in this case, a direct assignment is more readable and
consistent with the subsequent assignments.
This addresses the following -Wstringop-overflow warning seen in s390
with defconfig:
drivers/infiniband/hw/mlx4/main.c:296:33: warning: writing 16 bytes into a region of size 0 [-Wstringop-overflow=]
296 | memcpy(&port_gid_table->gids[free].gid,
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
297 | &attr->gid, sizeof(attr->gid));
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This helps with the ongoing efforts to globally enable
-Wstringop-overflow.
Link: https://github.com/KSPP/linux/issues/308
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZNvimeRAPkJ24zRG@work
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Remove unnecessary variable initializations.
Signed-off-by: Ruan Jinjie <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230728065139.3411703-1-ruanjinjie@huawei.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
This code is trying to ensure that only the flags specified in the list
are allowed. The problem is that ucmd->rx_hash_fields_mask is a u64 and
the flags are an enum which is treated as a u32 in this context. That
means the test doesn't check whether the highest 32 bits are zero.
Fixes: 4d02ebd9bbbd ("IB/mlx4: Fix RSS hash fields restrictions")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/233ed975-982d-422a-b498-410f71d8a101@moroto.mountain
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The ucmd->log_sq_bb_count variable is controlled by the user so this
shift can wrap. Fix it by using check_shl_overflow() in the same way
that it was done in commit 515f60004ed9 ("RDMA/hns: Prevent undefined
behavior in hns_roce_set_user_sq_size()").
Fixes: 839041329fd3 ("IB/mlx4: Sanity check userspace send queue sizes")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/r/a8dfbd1d-c019-4556-930b-bab1ded73b10@kili.mountain
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Move the call of qp event handler from atomic to workqueue context,
so that the handler is able to block. This is needed by following
patches.
Signed-off-by: Mark Zhang <markzhang@nvidia.com>
Reviewed-by: Patrisious Haddad <phaddad@nvidia.com>
Link: https://lore.kernel.org/r/0cd17b8331e445f03942f4bb28d447f24ac5669d.1672821186.git.leonro@nvidia.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The call netdev_{put, hold} of dev_{put, hold} will check NULL,
so there is no need to check before using dev_{put, hold}.
Fix the following coccicheck warnings:
/drivers/infiniband/hw/mlx4/main.c:1311:2-10: WARNING:
WARNING NULL check before dev_{put, hold} functions is not needed.
/drivers/infiniband/hw/mlx4/main.c:148:2-10: WARNING:
WARNING NULL check before dev_{put, hold} functions is not needed.
/drivers/infiniband/hw/mlx4/main.c:1959:3-11: WARNING:
WARNING NULL check before dev_{put, hold} functions is not needed.
/drivers/infiniband/hw/mlx4/main.c:1962:3-10: WARNING:
WARNING NULL check before dev_{put, hold} functions is not needed.
Signed-off-by: zhang songyi <zhang.songyi@zte.com.cn>
Link: https://lore.kernel.org/r/202211291554079687539@zte.com.cn
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Yury Norov <yury.norov@gmail.com>
Reviewed-by: Jan Kara <jack@suse.cz> # for ext4
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> # for sch_cake
Acked-by: Chuck Lever <chuck.lever@oracle.com> # for nfsd
Acked-by: Jakub Kicinski <kuba@kernel.org>
Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> # for thunderbolt
Acked-by: Darrick J. Wong <djwong@kernel.org> # for xfs
Acked-by: Helge Deller <deller@gmx.de> # for parisc
Acked-by: Heiko Carstens <hca@linux.ibm.com> # for s390
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
|
|
Set 'iova' and 'length' on ib_mr in ib_uverbs and ib_core layers to let all
drivers have the members filled. Also, this commit removes redundancy in
the respective drivers.
Previously, commit 04c0a5fcfcf65 ("IB/uverbs: Set IOVA on IB MR in uverbs
layer") changed to set 'iova', but seems to have missed 'length' and the
ib_core layer at that time.
Fixes: 04c0a5fcfcf65 ("IB/uverbs: Set IOVA on IB MR in uverbs layer")
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220921080844.1616883-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Flushing system-wide workqueues is dangerous and will be forbidden.
Replace system_wq with local cm_wq.
Link: https://lore.kernel.org/r/22f7183b-cc16-5a34-e879-7605f5efc6e6@I-love.SAKURA.ne.jp
Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Split out flags from ib_device::device_cap_flags that are only used
internally to the kernel into kernel_cap_flags that is not part of the
uapi. This limits the device_cap_flags to being the same bitmap that will
be copied to userspace.
This cleanly splits out the uverbs flags from the kernel flags to avoid
confusion in the flags bitmap.
Add some short comments describing which each of the kernel flags is
connected to. Remove unused kernel flags.
Link: https://lore.kernel.org/r/0-v2-22c19e565eef+139a-kern_caps_jgg@nvidia.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Max Gurtovoy <mgurtovoy@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
|
|
Pull rdma updates from Jason Gunthorpe:
- Minor bug fixes in mlx5, mthca, pvrdma, rtrs, mlx4, hfi1, hns
- Minor cleanups: coding style, useless includes and documentation
- Reorganize how multicast processing works in rxe
- Replace a red/black tree with xarray in rxe which improves performance
- DSCP support and HW address handle re-use in irdma
- Simplify the mailbox command handling in hns
- Simplify iser now that FMR is eliminated
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (93 commits)
RDMA/nldev: Prevent underflow in nldev_stat_set_counter_dynamic_doit()
IB/iser: Fix error flow in case of registration failure
IB/iser: Generalize map/unmap dma tasks
IB/iser: Use iser_fr_desc as registration context
IB/iser: Remove iser_reg_data_sg helper function
RDMA/rxe: Use standard names for ref counting
RDMA/rxe: Replace red-black trees by xarrays
RDMA/rxe: Shorten pool names in rxe_pool.c
RDMA/rxe: Move max_elem into rxe_type_info
RDMA/rxe: Replace obj by elem in declaration
RDMA/rxe: Delete _locked() APIs for pool objects
RDMA/rxe: Reverse the sense of RXE_POOL_NO_ALLOC
RDMA/rxe: Replace mr by rkey in responder resources
RDMA/rxe: Fix ref error in rxe_av.c
RDMA/hns: Use the reserved loopback QPs to free MR before destroying MPT
RDMA/irdma: Add support for address handle re-use
RDMA/qib: Fix typos in comments
RDMA/mlx5: Fix memory leak in error flow for subscribe event routine
Revert "RDMA/core: Fix ib_qp_usecnt_dec() called when error"
RDMA/rxe: Remove useless argument for update_state()
...
|