aboutsummaryrefslogtreecommitdiff
path: root/drivers/infiniband
AgeCommit message (Collapse)AuthorFilesLines
2026-05-02RDMA/hns: Fix unlocked call to hns_roce_qp_remove()Jason Gunthorpe1-0/+7
Sashiko points out that hns_roce_qp_remove() requires the caller to hold locks. The error flow in hns_roce_create_qp_common() doesn't hold those locks for the error unwind so it risks corrupting memory. Grab the same locks the other two callers use. Cc: stable@vger.kernel.org Fixes: e088a685eae9 ("RDMA/hns: Support rq record doorbell for the user space") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=9 Link: https://patch.msgid.link/r/15-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/hns: Fix xarray race in hns_roce_create_qp_common()Jason Gunthorpe1-3/+3
Similar to the SRQ case the hr_qp is stored in the xarray before it is fully initialized. Unlike the SRQ case the error unwinds do not wait for the completion so keep the refcount 0 until the function succeeds. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://patch.msgid.link/r/14-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Suggested-by: Junxian Huang <huangjunxian6@hisilicon.com> Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/hns: Fix xarray race in hns_roce_create_srq()Jason Gunthorpe1-6/+6
Sashiko points out that once the srq memory is stored into the xarray by alloc_srqc() it can immediately be looked up by: xa_lock(&srq_table->xa); srq = xa_load(&srq_table->xa, srqn & (hr_dev->caps.num_srqs - 1)); if (srq) refcount_inc(&srq->refcount); xa_unlock(&srq_table->xa); Which will fail refcount debug because the refcount is 0 and then crash: srq->event(srq, event_type); Because event is NULL. Use refcount_inc_not_zero() instead to ensure a partially prepared srq is never retrieved from the event handler and fix the ordering of the initialization so refcount becomes 1 only after it is fully ready. All the initialization must be done before calling free_srqc() since it depends on the completion and refcount. Fixes: 9a4435375cd1 ("IB/hns: Add driver files for hns RoCE driver") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=3 Link: https://patch.msgid.link/r/13-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Junxian Huang <huangjunxian6@hisilicon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mlx4: Fix resource leak on error in mlx4_ib_create_srq()Jason Gunthorpe1-1/+3
Sashiko points out that mlx4_srq_alloc() was not undone during error unwind, add the missing call to mlx4_srq_free(). Cc: stable@vger.kernel.org Fixes: 225c7b1feef1 ("IB/mlx4: Add a driver Mellanox ConnectX InfiniBand adapters") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=8 Link: https://patch.msgid.link/r/11-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/vmw_pvrdma: Fix double free on pvrdma_alloc_ucontext() error pathJason Gunthorpe1-1/+1
Sashiko points out that pvrdma_uar_free() is already called within pvrdma_dealloc_ucontext(), so calling it before triggers a double free. Cc: stable@vger.kernel.org Fixes: 29c8d9eba550 ("IB: Add vmw_pvrdma driver") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/10-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/ocrdma: Don't NULL deref uctx on errors in ocrdma_copy_pd_uresp()Jason Gunthorpe1-2/+2
Sashiko points out that pd->uctx isn't initialized until late in the function so all these error flow references are NULL and will crash. Use the uctx that isn't NULL. Cc: stable@vger.kernel.org Fixes: fe2caefcdf58 ("RDMA/ocrdma: Add driver for Emulex OneConnect IBoE RDMA adapter") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/9-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/ocrdma: Clarify the mm_head searchingJason Gunthorpe1-2/+2
The intention of this code is to find matching entries exactly, the driver never creates phys_addr's with different lens so the current expression is not a bug, but it doesn't make sense and confuses review tooling. Search for exact match instead. Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/8-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mana: Fix error unwind in mana_ib_create_qp_rss()Jason Gunthorpe1-1/+3
Sashiko points out that mana_ib_cfg_vport_steering() is leaked, the normal destroy path cleans it up. Cc: stable@vger.kernel.org Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Link: https://sashiko.dev/#/patchset/0-v1-e911b76a94d1%2B65d95-rdma_udata_rep_jgg%40nvidia.com?part=4 Link: https://patch.msgid.link/r/7-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mana: Fix mana_destroy_wq_obj() cleanup in mana_ib_create_qp_rss()Jason Gunthorpe1-5/+4
Sashiko points out there are two bugs here in the error unwind flow, both related to how the WQ table is unwound. First there is a double i-- on the first failure path due to the while loop having a i--, remove it. Second if mana_ib_install_cq_cb() fails then mana_create_wq_obj() is not undone due to the above i--. Cc: stable@vger.kernel.org Fixes: c15d7802a424 ("RDMA/mana_ib: Add CQ interrupt support for RAW QP") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1 Link: https://patch.msgid.link/r/6-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mana: Remove user triggerable WARN_ON() in mana_ib_create_qp_rss()Jason Gunthorpe1-2/+3
Sashiko points out that the user can specify WQs sharing the same CQ as a part of the uAPI and this will trigger the WARN_ON() then go on to corrupt the kernel. Just reject it outright and fail the QP creation. Cc: stable@vger.kernel.org Fixes: c15d7802a424 ("RDMA/mana_ib: Add CQ interrupt support for RAW QP") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1 Link: https://patch.msgid.link/r/5-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mana: Validate rx_hash_key_lenJason Gunthorpe1-0/+3
Sashiko points out that rx_hash_key_len comes from a uAPI structure and is blindly passed to memcpy, allowing the userspace to trash kernel memory. Bounds check it so the memcpy cannot overflow. Cc: stable@vger.kernel.org Fixes: 0266a177631d ("RDMA/mana_ib: Add a driver for Microsoft Azure Network Adapter") Link: https://sashiko.dev/#/patchset/0-v2-1c49eeb88c48%2B91-rdma_udata_rep_jgg%40nvidia.com?part=1 Link: https://patch.msgid.link/r/4-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mlx5: Add missing store/release for lock elision patternJason Gunthorpe2-6/+6
mlx5 has a common pattern implementing a device-global singleton resource where it checks the resource pointer for !NULL and then skips obtaining the lock. This is not ordered properly as observing !NULL doesn't mean that all the data under that pointer is also visible on this CPU when the lock is not taken. Use a release/acquire pairing to explicitly manage this. Pointed out by sashiko, Codex found more cases. Fixes: 5895e70f2e6e ("IB/mlx5: Allocate resources just before first QP/SRQ is created") Fixes: 638420115cc4 ("IB/mlx5: Create UMR QP just before first reg_mr occurs") Link: https://sashiko.dev/#/patchset/SYBPR01MB7881E1E0970268BD69C0BA75AF2B2%40SYBPR01MB7881.ausprd01.prod.outlook.com Link: https://patch.msgid.link/r/3-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Assisted-by: Codex:GPT-5.5 Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/mlx5: Restore zero-init to mlx5_ib_modify_qp() ucmdJason Gunthorpe1-1/+1
Sashiko points out the check for inlen==0 got missed, the ={} was not redundant, put it back. Fixes: a9cd442a5347 ("RDMA: Remove redundant = {} for udata req structs") Link: https://patch.msgid.link/r/2-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-05-02RDMA/ionic: Fix typo in format stringJason Gunthorpe1-1/+1
Applying the corrupted patch by hand mangled the format string, put the s in the right place. Cc: stable@vger.kernel.org Fixes: 654a27f25530 ("RDMA/ionic: bound node_desc sysfs read with %.64s") Link: https://patch.msgid.link/r/1-v1-41f3135e5565+9d2-rdma_ai_fixes1_jgg@nvidia.com Reported-by: Brad Spengler <brad.spengler@opensrcsec.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29RDMA/mlx5: Fix null-ptr-deref in Raw Packet QP creationMichael Guralnik1-0/+5
Raw Packet QPs are unique in that they support separate send and receive queues, using 2 different user-provided buffers. They can also be created with one of the queues having size 0, allowing a send-only or receive-only QP. The Raw Packet RQ umem is created in the common user QP creation path, which allows zero-length queues. Add a later validation of the RQ umem in Raw Packet QP creation path when an RQ was requested. This prevents possible null-ptr dereference crashes, as seen in the below trace: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [#1] SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] CPU: 6 UID: 0 PID: 3539 Comm: raw_packet_umem Not tainted 6.19.0-rc1+ #166 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib] Code: ff df 41 57 49 89 ff 41 56 41 55 41 89 d5 41 54 4d 89 cc 4c 8d 4f 30 55 4c 89 ca 48 89 f5 53 48 c1 ea 03 48 89 cb 48 83 ec 18 <80> 3c 02 00 44 89 04 24 0f 85 01 02 00 00 48 ba 00 00 00 00 00 fc RSP: 0018:ff1100013966f4e0 EFLAGS: 00010282 RAX: dffffc0000000000 RBX: 00000000ffffffc0 RCX: 00000000ffffffc0 RDX: 0000000000000006 RSI: 00000ffffffff000 RDI: 0000000000000000 RBP: 00000ffffffff000 R08: 0000000000000040 R09: 0000000000000030 R10: 0000000000000000 R11: 0000000000000000 R12: ff1100013966f648 R13: 0000000000000005 R14: ff1100013966f980 R15: 0000000000000000 FS: 00007fae6c82f740(0000) GS:ff11000898ba1000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000200000000000 CR3: 000000010f96c005 CR4: 0000000000373eb0 Call Trace: <TASK> create_qp+0x747d/0xc740 [mlx5_ib] ? is_module_address+0x18/0x110 ? _create_user_qp.constprop.0+0x18e0/0x18e0 [mlx5_ib] ? __module_address+0x49/0x210 ? is_module_address+0x68/0x110 ? static_obj+0x67/0x90 ? lockdep_init_map_type+0x58/0x200 mlx5_ib_create_qp+0xc85/0x2620 [mlx5_ib] ? find_held_lock+0x2b/0x80 ? create_qp+0xc740/0xc740 [mlx5_ib] ? lock_release+0xcb/0x260 ? lockdep_init_map_type+0x58/0x200 ? __init_swait_queue_head+0xcb/0x150 create_qp.part.0+0x558/0x7c0 [ib_core] ib_create_qp_user+0xa0/0x4f0 [ib_core] ? rdma_lookup_get_uobject+0x1e4/0x400 [ib_uverbs] create_qp+0xe4f/0x1d10 [ib_uverbs] ? ib_uverbs_rereg_mr+0xd40/0xd40 [ib_uverbs] ? ib_uverbs_cq_event_handler+0x120/0x120 [ib_uverbs] ? __might_fault+0x81/0x100 ? lock_release+0xcb/0x260 ? _copy_from_user+0x3e/0x90 ib_uverbs_create_qp+0x10a/0x150 [ib_uverbs] ? ib_uverbs_ex_create_qp+0xe0/0xe0 [ib_uverbs] ? __might_fault+0x81/0x100 ? lock_release+0xcb/0x260 ib_uverbs_write+0x7e5/0xc90 [ib_uverbs] ? uverbs_devnode+0xc0/0xc0 [ib_uverbs] ? lock_acquire+0xfa/0x2b0 ? find_held_lock+0x2b/0x80 ? finish_task_switch.isra.0+0x189/0x6c0 vfs_write+0x1c0/0xf70 ? lockdep_hardirqs_on_prepare+0xde/0x170 ? kernel_write+0x5a0/0x5a0 ? __switch_to+0x527/0xe60 ? __schedule+0x10a3/0x3950 ? io_schedule_timeout+0x110/0x110 ksys_write+0x170/0x1c0 ? __x64_sys_read+0xb0/0xb0 ? trace_hardirqs_off.part.0+0x4e/0xe0 do_syscall_64+0x70/0x1360 entry_SYSCALL_64_after_hwframe+0x4b/0x53 RIP: 0033:0x7fae6ca3118d Code: ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 5b cc 0c 00 f7 d8 64 89 01 48 RSP: 002b:00007ffe678ca308 EFLAGS: 00000213 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 00007ffe678ca448 RCX: 00007fae6ca3118d RDX: 0000000000000070 RSI: 0000200000000280 RDI: 0000000000000003 RBP: 00007ffe678ca320 R08: 00000000ffffffff R09: 00007fae6c8ec5b8 R10: 0000000000000064 R11: 0000000000000213 R12: 0000000000000001 R13: 0000000000000000 R14: 00007fae6cb71000 R15: 0000000000404df0 </TASK> Modules linked in: mlx5_ib mlx5_fwctl mlx5_core bonding ip6_gre ip6_tunnel tunnel6 ip_gre gre rdma_ucm ib_uverbs rdma_cm iw_cm ib_ipoib ib_cm ib_umad ib_core rpcsec_gss_krb5 auth_rpcgss oid_registry overlay nfnetlink zram zsmalloc fuse scsi_transport_iscsi [last unloaded: mlx5_core] ---[ end trace 0000000000000000 ]--- RIP: 0010:__mlx5_umem_find_best_quantized_pgoff+0x37/0x280 [mlx5_ib] Fixes: 0fb2ed66a14c ("IB/mlx5: Add create and destroy functionality for Raw Packet QP") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-5-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29RDMA/core: Fix rereg_mr use-after-free raceMichael Guralnik1-2/+7
When a driver creates a new MR during rereg_user_mr, a race window exists between rdma_alloc_commit_uobject() for the new MR and the point where the code reads that MR to populate the response keys. A concurrent rereg_mr or destroy_mr could destroy the MR in this window and cause UAF in the first thread. Racing flow between two rereg_mr calls: CPU0 CPU1 ---- ---- rereg_user_mr(mr_handle) uobj_get_write(mr_handle) -> mr0 mr1 = driver→rereg() rdma_alloc_commit_uobject(mr1) // mr1 replaced mr0 and is unlocked uobj_put_destroy(mr0) rereg_user_mr(mr_handle) uobj_get_write(mr_handle) -> mr1 mr2 = driver→rereg() rdma_alloc_commit_uobject(mr2) // mr2 replaced mr1 and is unlocked uobj_put_destroy(mr1) // Destroys mr1! resp.lkey = mr1->lkey; // UAF - mr1 was freed! resp.rkey = mr1->rkey; // UAF - mr1 was freed! Fix by storing lkey/rkey in local variables before the new MR is unlocked and using the local variables to set the user response. Fixes: 6e0954b11c05 ("RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-4-4621fa52de0e@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29IB/core: Fix IPv6 netlink message size in ib_nl_ip_send_msg()Maher Sanalla1-1/+1
When resolving an RDMA-CM IPv6 address, ib_nl_ip_send_msg() sends a netlink request to the userspace daemon to perform IP-to-GID resolution in certain cases. The function allocates the netlink message buffer using nla_total_size(sizeof(size)), which passes 8 bytes (the size of size_t) instead of 16 bytes (the size of an IPv6 address). This results in an 8-byte under-allocation. This is currently masked by nlmsg_new() over-allocation of the skb in its internal logic. However, the code remains incorrect. Fix the issue by supplying the proper IPv6 address length to nla_total_size(). Fixes: ae43f8286730 ("IB/core: Add IP to GID netlink offload") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-3-4621fa52de0e@nvidia.com Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29RDMA/mlx5: Fix UAF in DCT destroy due to race with createEdward Srouji1-1/+8
A potential race condition exists between mlx5_core_destroy_dct() and mlx5_core_create_dct() that can lead to a use-after-free. After _mlx5_core_destroy_dct() releases the DCT to firmware, the DCTN can be immediately reallocated for a new DCT being created concurrently. If the create path stores the new DCT in the xarray before the destroy path erases it, the destroy will incorrectly delete the new DCT's entry. Later accesses then hit freed memory. Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq() that only erases the entry if it hasn't already been replaced (still contains XA_ZERO_ENTRY), preserving any newly created DCT. Fixes: afff24899846 ("RDMA/mlx5: Handle DCT QP logic separately from low level QP interface") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-2-4621fa52de0e@nvidia.com Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-29RDMA/mlx5: Fix UAF in SRQ destroy due to race with createEdward Srouji1-1/+8
A race condition exists between mlx5_cmd_destroy_srq() and mlx5_cmd_create_srq() that can lead to a use-after-free (UAF) [1]. After destroy_srq_split() releases the SRQ to firmware, the SRQN can be immediately reallocated for a new SRQ being created concurrently. If the create path stores the new SRQ in the xarray before the destroy path erases it, the destroy will incorrectly delete the new SRQ's entry. Later accesses then hit freed memory. Fix by replacing the unconditional xa_erase_irq() with xa_cmpxchg_irq() that only erases the entry if it hasn't already been replaced (still contains XA_ZERO_ENTRY), preserving any newly created SRQ. [1] RIP: 0010:mlx5_cmd_destroy_srq+0xd8/0x110 [mlx5_ib] Code: 89 e1 ba 06 04 00 00 4c 89 f6 48 89 ef e8 80 19 70 e1 c6 83 a0 0f 00 00 00 fb 5b 44 89 e8 5d 41 5c 41 5d 41 5e c3 cc cc cc cc <0f> 0b 48 89 c2 83 e2 03 48 83 fa 02 75 08 48 3d 05 c0 ff ff 77 08 RSP: 0018:ff110001037b7d08 EFLAGS: 00010286 RAX: 0000000000000000 RBX: ff1100010bb9c000 RCX: 0000000000000000 RDX: 0000000000000000 RSI: 0000000000000000 RDI: ff110001037b7c90 RBP: ff1100010bb9cfa0 R08: 0000000000000000 R09: 0000000000000000 R10: ff110001037b7da0 R11: ff11000104f29580 R12: ff1100010e2ac090 R13: 000000000000000d R14: 0000000000000001 R15: ff11000105336300 FS: 00007fa24787c740(0000) GS:ff1100046eb8d000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fa247984e90 CR3: 0000000109d59005 CR4: 0000000000373eb0 Call Trace: <TASK> mlx5_ib_destroy_srq+0x25/0xa0 [mlx5_ib] ib_destroy_srq_user+0x21/0x90 [ib_core] uverbs_free_srq+0x1b/0x50 [ib_uverbs] destroy_hw_idr_uobject+0x1e/0x50 [ib_uverbs] uverbs_destroy_uobject+0x35/0x180 [ib_uverbs] __uverbs_cleanup_ufile+0xdd/0x140 [ib_uverbs] uverbs_destroy_ufile_hw+0x38/0xf0 [ib_uverbs] ib_uverbs_close+0x17/0xa0 [ib_uverbs] __fput+0xe0/0x2a0 __x64_sys_close+0x3a/0x80 do_syscall_64+0x55/0xac0 entry_SYSCALL_64_after_hwframe+0x76/0x7e RIP: 0033:0x7fa247984ea4 Code: 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa 80 3d a5 51 0e 00 00 74 13 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 3c c3 0f 1f 00 55 48 89 e5 48 83 ec 10 89 7d RSP: 002b:00007ffecfa79498 EFLAGS: 00000202 ORIG_RAX: 0000000000000003 RAX: ffffffffffffffda RBX: 0000200000000080 RCX: 00007fa247984ea4 RDX: 0000000000000040 RSI: 0000200000000200 RDI: 0000000000000003 RBP: 00007ffecfa794e0 R08: 00007ffecfa794e0 R09: 00007ffecfa794e0 R10: 0000000000000000 R11: 0000000000000202 R12: 0000000000000001 R13: 0000000000000000 R14: 0000200000000000 R15: 0000200000000009 </TASK> ---[ end trace 0000000000000000 ]--- Fixes: fd89099d635e ("RDMA/mlx5: Issue FW command to destroy SRQ on reentry") Link: https://patch.msgid.link/r/20260427-security-bug-fixes-v3-1-4621fa52de0e@nvidia.com Signed-off-by: Edward Srouji <edwards@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-28RDMA/mlx5: Fix error path fall-through in mlx5_ib_dev_res_srq_init()Junrui Luo1-0/+1
mlx5_ib_dev_res_srq_init() allocates two SRQs, s0 and s1. When ib_create_srq() fails for s1, the error branch destroys s0 but falls through and unconditionally assigns the freed s0 and the ERR_PTR s1 to devr->s0 and devr->s1. This leads to several problems: the lock-free fast path checks "if (devr->s1) return 0;" and treats the ERR_PTR as already initialised; users in mlx5_ib_create_qp() dereference the freed SRQ or ERR_PTR via to_msrq(devr->s0)->msrq.srqn; and mlx5_ib_dev_res_cleanup() dereferences the ERR_PTR and double-frees s0 on teardown. Fix by adding the same `goto unlock` in the s1 failure path. Cc: stable@vger.kernel.org Fixes: 5895e70f2e6e ("IB/mlx5: Allocate resources just before first QP/SRQ is created") Link: https://patch.msgid.link/r/SYBPR01MB7881E1E0970268BD69C0BA75AF2B2@SYBPR01MB7881.ausprd01.prod.outlook.com Reported-by: Yuhao Jiang <danisjiang@gmail.com> Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-28RDMA/rxe: Reject non-8-byte ATOMIC_WRITE payloadsMichael Bommarito1-1/+13
atomic_write_reply() at drivers/infiniband/sw/rxe/rxe_resp.c unconditionally dereferences 8 bytes at payload_addr(pkt): value = *(u64 *)payload_addr(pkt); check_rkey() previously accepted an ATOMIC_WRITE request with pktlen == resid == 0 because the length validation only compared pktlen against resid. A remote initiator that sets the RETH length to 0 therefore reaches atomic_write_reply() with a zero-byte logical payload, and the responder reads sizeof(u64) bytes from past the logical end of the packet into skb->head tailroom, then writes those 8 bytes into the attacker's MR via rxe_mr_do_atomic_write(). That is a remote disclosure of 4 bytes of kernel tailroom per probe (the other 4 bytes are the packet's own trailing ICRC). IBA oA19-28 defines ATOMIC_WRITE as exactly 8 bytes. Anything else is protocol-invalid. Hoist a strict length check into check_rkey() so the responder never reaches the unchecked dereference, and keep the existing WRITE-family length logic for the normal RDMA WRITE path. Reproduced on mainline with an unmodified rxe driver: a sustained zero-length ATOMIC_WRITE probe repeatedly leaks adjacent skb head-buffer bytes into the attacker's MR, including recognisable kernel strings and partial kernel-direct-map pointer words. With this patch applied the responder rejects the PDU and the MR stays all-zero. Cc: stable@vger.kernel.org Fixes: 034e285f8b99 ("RDMA/rxe: Make responder support atomic write on RC service") Link: https://patch.msgid.link/r/20260418162141.3610201-1-michael.bommarito@gmail.com Assisted-by: Claude:claude-opus-4-7 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-28RDMA/rxe: Reject unknown opcodes before ICRC processingMichael Bommarito1-0/+11
Even after applying commit 7244491dab34 ("RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv"), a single unauthenticated UDP packet can still trigger panic. That patch handled payload_size() underflow only for valid opcodes with short packets, not for packets carrying an unknown opcode. The unknown-opcode OOB read described below predates that commit and reaches back to the initial Soft RoCE driver. The check added there reads pkt->paylen < header_size(pkt) + bth_pad(pkt) + RXE_ICRC_SIZE where header_size(pkt) expands to rxe_opcode[pkt->opcode].length. The rxe_opcode[] array has 256 entries but is only populated for defined IB opcodes; any other entry (for example opcode 0xff) is zero-initialized, so length == 0 and the check degenerates to pkt->paylen < 0 + bth_pad(pkt) + RXE_ICRC_SIZE which does not constrain pkt->paylen enough. rxe_icrc_hdr() then computes rxe_opcode[pkt->opcode].length - RXE_BTH_BYTES which underflows when length == 0 and passes a huge value to rxe_crc32(), causing an out-of-bounds read of the skb payload. Reproduced on v7.0-rc7 with that fix applied, QEMU/KVM with CONFIG_RDMA_RXE=y and CONFIG_KASAN=y, after rdma link add rxe0 type rxe netdev eth0 A single 48-byte UDP packet to port 4791 with BTH opcode=0xff and QPN=IB_MULTICAST_QPN triggers: BUG: KASAN: slab-out-of-bounds in crc32_le+0x115/0x170 Read of size 1 at addr ... The buggy address is located 0 bytes to the right of allocated 704-byte region Call Trace: crc32_le+0x115/0x170 rxe_icrc_hdr.isra.0+0x226/0x300 rxe_icrc_check+0x13f/0x3a0 rxe_rcv+0x6e1/0x16e0 rxe_udp_encap_recv+0x20a/0x320 udp_queue_rcv_one_skb+0x7ed/0x12c0 Subsequent packets with the same shape fault on unmapped memory and panic the kernel. The trigger requires only module load and "rdma link add"; no QP, no connection, and no authentication. Fix this by rejecting packets whose opcode has no rxe_opcode[] entry, detected via the zero mask or zero length, before any length arithmetic runs. Cc: stable@vger.kernel.org Fixes: 8700e3e7c485 ("Soft RoCE driver") Link: https://patch.msgid.link/r/20260414111555.3386793-1-michael.bommarito@gmail.com Assisted-by: Claude:claude-opus-4-6 Signed-off-by: Michael Bommarito <michael.bommarito@gmail.com> Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-28IB/hfi1: Fix potential use-after-free in PIO and SDMA map teardownLi RongQing2-2/+7
The current teardown logic for dd->pio_map and dd->sdma_map frees the structures while they might still be accessed by RCU readers. Although the pointer is nulled under a spinlock, the memory is reclaimed before waiting for the grace period to end. This patch fixes the sequence by: 1. Extracting the pointer under the lock. 2. Clearing the RCU-protected pointer. 3. Waiting for readers to finish with synchronize_rcu(). 4. Finally freeing the memory. Fixes: 7724105686e7 ("IB/hfi1: add driver files") Link: https://patch.msgid.link/r/20260206050836.5890-1-lirongqing@baidu.com Signed-off-by: Li RongQing <lirongqing@baidu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-21Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsiLinus Torvalds1-0/+1
Pull SCSI updates from James Bottomley: "Usual driver updates (ufs, lpfc, fnic, target, mpi3mr). The substantive core changes are adding a 'serial' sysfs attribute and getting sd to support > PAGE_SIZE sectors" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (98 commits) scsi: target: Don't validate ignored fields in PROUT PREEMPT scsi: qla2xxx: Use nr_cpu_ids instead of NR_CPUS for qp_cpu_map allocation scsi: ufs: core: Disable timestamp for Kioxia THGJFJT0E25BAIP scsi: mpi3mr: Fix typo scsi: sd: fix missing put_disk() when device_add(&disk_dev) fails scsi: libsas: Delete unused to_dom_device() and to_dev_attr() scsi: storvsc: Handle PERSISTENT_RESERVE_IN truncation for Hyper-V vFC scsi: iscsi_tcp: Remove unneeded selections of CRYPTO and CRYPTO_MD5 scsi: lpfc: Update lpfc version to 15.0.0.0 scsi: lpfc: Add PCI ID support for LPe42100 series adapters scsi: lpfc: Introduce 128G link speed selection and support scsi: lpfc: Check ASIC_ID register to aid diagnostics during failed fw updates scsi: lpfc: Update construction of SGL when XPSGL is enabled scsi: lpfc: Remove deprecated PBDE feature scsi: lpfc: Add REG_VFI mailbox cmd error handling scsi: lpfc: Log MCQE contents for mbox commands with no context scsi: lpfc: Select mailbox rq_create cmd version based on SLI4 if_type scsi: lpfc: Break out of IRQ affinity assignment when mask reaches nr_cpu_ids scsi: ufs: core: Make the header files self-contained scsi: ufs: core: Remove an include directive from ufshcd-crypto.h ...
2026-04-20Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds150-7491/+3950
Pull rdma updates from Jason Gunthorpe: "The usual collection of driver changes, more core infrastructure updates that typical this cycle: - Minor cleanups and kernel-doc fixes in bnxt_re, hns, rdmavt, efa, ocrdma, erdma, rtrs, hfi1, ionic, and pvrdma - New udata validation framework and driver updates - Modernize CQ creation interface in mlx4 and mlx5, manage CQ umem in core - Promote UMEM to a core component, split out DMA block iterator logic - Introduce FRMR pools with aging, statistics, pinned handles, and netlink control and use it in mlx5 - Add PCIe TLP emulation support in mlx5 - Extend umem to work with revocable pinned dmabuf's and use it in irdma - More net namespace improvements for rxe - GEN4 hardware support in irdma - First steps to MW and UC support in mana_ib - Support for CQ umem and doorbells in bnxt_re - Drop opa_vnic driver from hfi1 Fixes: - IB/core zero dmac neighbor resolution race - GID table memory free - rxe pad/ICRC validation and r_key async errors - mlx4 external umem for CQ - umem DMA attributes on unmap - mana_ib RX steering on RSS QP destroy" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (116 commits) RDMA/core: Fix user CQ creation for drivers without create_cq RDMA/ionic: bound node_desc sysfs read with %.64s IB/core: Fix zero dmac race in neighbor resolution RDMA/mana_ib: Support memory windows RDMA/rxe: Validate pad and ICRC before payload_size() in rxe_rcv RDMA/core: Prefer NLA_NUL_STRING RDMA/core: Fix memory free for GID table RDMA/hns: Remove the duplicate calls to ib_copy_validate_udata_in() RDMA: Remove redundant = {} for udata req structs RDMA/irdma: Add missing comp_mask check in alloc_ucontext RDMA/hns: Add missing comp_mask check in create_qp RDMA/mlx5: Pull comp_mask validation into ib_copy_validate_udata_in_cm() RDMA: Use ib_copy_validate_udata_in_cm() for zero comp_mask RDMA/hns: Use ib_copy_validate_udata_in() RDMA/mlx4: Use ib_copy_validate_udata_in() for QP RDMA/mlx4: Use ib_copy_validate_udata_in() RDMA/mlx5: Use ib_copy_validate_udata_in() for MW RDMA/mlx5: Use ib_copy_validate_udata_in() for SRQ RDMA/pvrdma: Use ib_copy_validate_udata_in() for srq RDMA: Use ib_copy_validate_udata_in() for implicit full structs ...
2026-04-17RDMA/core: Fix user CQ creation for drivers without create_cqMichael Margolin1-2/+1
CQ creation is failing for drivers that only implement create_user_cq (e.g. EFA), when buffer isn't provided by userspace. This because of a leftover check that requires create_cq existence in such case. Remove the create_cq existence check from the no-buffer path. The buffer is optional and drivers that handle their own memory should work through create_user_cq regardless. Fixes: 584ec74748e6 ("RDMA/core: Prepare create CQ path for API unification") Link: https://patch.msgid.link/r/20260416201408.13980-1-mrgolin@amazon.com Signed-off-by: Michael Margolin <mrgolin@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-04-16Merge tag 'for-linus-fwctl' of ↵Linus Torvalds4-4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl Pull fwctl updates from Jason Gunthorpe: - New fwctl driver for Broadcom RDMA NICs - Bug fix for non-modular builds * tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl: fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal fwctl/bnxt_fwctl: Add documentation entries fwctl/bnxt_fwctl: Add bnxt fwctl device fwctl/bnxt_en: Create an aux device for fwctl fwctl/bnxt_en: Refactor aux bus functions to be more generic fwctl/bnxt_en: Move common definitions to include/linux/bnxt/
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-15Merge tag 'drm-next-2026-04-15' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds4-13/+3
Pull drm updates from Dave Airlie: "Highlights: - new DRM RAS infrastructure using netlink - amdgpu: enable DC on CIK APUs, and more IP enablement, and more user queue work - xe: purgeable BO support, and new hw enablement - dma-buf : add revocable operations Full summary: mm: - two-pass MMU interval notifiers - add gpu active/reclaim per-node stat counters math: - provide __KERNEL_DIV_ROUND_CLOSEST() in UAPI - implement DIV_ROUND_CLOSEST() with __KERNEL_DIV_ROUND_CLOSEST() rust: - shared tag with driver-core: register macro and io infra - core: rework DMA coherent API - core: add interop::list to interop with C linked lists - core: add more num::Bounded operations - core: enable generic_arg_infer and add EMSGSIZE - workqueue: add ARef<T> support for work and delayed work - add GPU buddy allocator abstraction - add DRM shmem GEM helper abstraction - allow drm:::Device to dispatch work and delayed work items to driver private data - add dma_resv_lock helper and raw accessors core: - introduce DRM RAS infrastructure over netlink - add connector panel_type property - fourcc: add ARM interleaved 64k modifier - colorop: add destroy helper - suballoc: split into alloc and init helpers - mode: provide DRM_ARGB_GET*() macros for reading color components edid: - provide drm_output_color_Format dma-buf: - provide revoke mechanism for shared buffers - rename move_notify to invalidate_mappings - always enable move_notify - protect dma_fence_ops with RCU and improve locking - clean pages with helpers atomic: - allocate drm_private_state via callback - helper: use system_percpu_wq buddy: - make buddy allocator available to gpu level - add kernel-doc for buddy allocator - improve aligned allocation ttm: - fix fence signalling - improve tests and docs - improve handling of gfp_retry_mayfail - use per-node stat counters to track memory allocations - port pool to use list_lru - drop NUMA specific pools - make pool shrinker numa aware - track allocated pages per numa node coreboot: - cleanup coreboot framebuffer support sched: - fix race condition in drm_sched_fini pagemap: - enable THP support - pass pagemap_addr by reference gem-shmem: - Track page accessed/dirty status across mmap/vmap gpusvm: - reenable device to device migration - fix unbalanced unclock bridge: - anx7625: Support USB-C plus DT bindings - connector: Fix EDID detection - dw-hdmi-qp: Support Vendor-Specfic and SDP Infoframes; improve others - fsl-ldb: Fix visual artifacts plus related DT property 'enable-termination-resistor' - imx8qxp-pixel-link: Improve bridge reference handling - lt9611: Support Port-B-only input plus DT bindings - tda998x: Support DRM_BRIDGE_ATTACH_NO_CONNECTOR; Clean up - Support TH1520 HDMI plus DT bindings - waveshare-dsi: Fix register and attach; Support 1..4 DSI lanes plus DT bindings - anx7625: Fix USB Type-C handling - cdns-mhdp8546-core: Handle HDCP state in bridge atomic_check - Support Lontium LT8713SX DP MST bridge plus DT bindings - analogix_dp: Use DP helpers for link training panel: - panel-jdi-lt070me05000: Use mipi-dsi multi functions - panel-edp: Support Add AUO B116XAT04.1 (HW: 1A); Support CMN N116BCL-EAK (C2); Support FriendlyELEC plus DT changes - panel-edp: Fix timings for BOE NV140WUM-N64 - ilitek-ili9882t: Allow GPIO calls to sleep - jadard: Support TAIGUAN XTI05101-01A - lxd: Support LXD M9189A plus DT bindings - mantix: Fix pixel clock; Clean up - motorola: Support Motorola Atrix 4G and Droid X2 plus DT bindings - novatek: Support Novatek/Tianma NT37700F plus DT bindings - simple: Support EDT ET057023UDBA plus DT bindings; Support Powertip PH800480T032-ZHC19 plus DT bindings; Support Waveshare 13.3" - novatek-nt36672a: Use mipi_dsi_*_multi() functions - panel-edp: Support BOE NV153WUM-N42, CMN N153JCA-ELK, CSW MNF307QS3-2 - support Himax HX83121A plus DT bindings - support JuTouch JT070TM041 plus DT bindings - support Samsung S6E8FC0 plus DT bindings - himax-hx83102c: support Samsung S6E8FC0 plus DT bindings; support backlight - ili9806e: support Rocktech RK050HR345-CT106A plus DT bindings - simple: support Tianma TM050RDH03 plus DT bindings amdgpu: - enable DC by default on CIK APUs - userq fence ioctl param size fixes - set panel_type to OLED for eDP - refactor DC i2c code - FAMS2 update - rework ttm handling to allow multiple engines - DC DCE 6.x cleanup - DC support for NUTMEG/TRAVIS DP bridge - DCN 4.2 support - GC12 idle power fix for compute - use struct drm_edid in non-DC code - enable NV12/P010 support on primary planes - support newer IP discovery tables - VCN/JPEG 5.0.2 support - GC/MES 12.1 updates - USERQ fixes - add DC idle state manager - eDP DSC seamless boot amdkfd: - GC 12.1 updates - non 4K page fixes xe: - basic Xe3p_LPG and NVL-P enabling patches - allow VM_BIND decompress support - add purgeable buffer object support - add xe_vm_get_property_ioctl - restrict multi-lrc to VCS/VECS engines - allow disabling VM overcommit in fault mode - dGPU memory optimizations - Workaround cleanups and simplification - Allow VFs VRAM quote changes using sysfs - convert GT stats to per-cpu counters - pagefault refactors - enable multi-queue on xe3p_xpc - disable DCC on PTL - make MMIO communication more robust - disable D3Cold for BMG on specific platforms - vfio: improve FLR sync for Xe VFIO i915/display: - C10/C20/LT PHY PLL divider verification - use trans push mechanism to generate PSR frame change on LNL+ - refactor DP DSC slice config - VGA decode refactoring - refactor DPT, gen2-4 overlay, masked field register macro helpers - refactor stolen memory allocation decisions - prepare for UHBR DP tunnels - refactor LT PHY PLL to use DPLL framework - implement register polling/waiting in display code - add shared stepping header between i915 and display i915: - fix potential overflow of shmem scatterlist length nouveau: - provide Z cull info to userspace - initial GA100 support - shutdown on PCI device shutdown nova-core: - harden GSP command queue - add support for large RPCs - simplify GSP sequencer and message handling - refactor falcon firmware handling - convert to new register macro - conver to new DMA coherent API - use checked arithmetic - add debugfs support for gsp-rm log buffers - fix aux device registration for multi-GPU msm: - CI: - Uprev mesa - Restore CI jobs for Qualcomm APQ8016 and APQ8096 devices - Core: - Switched to of_get_available_child_by_name() - DPU: - Fixes for DSC panels - Fixed brownout because of the frequency / OPP mismatch - Quad pipe preparation (not enabled yet) - Switched to virtual planes by default - Dropped VBIF_NRT support - Added support for Eliza platform - Reworked alpha handling - Switched to correct CWB definitions on Eliza - Dropped dummy INTF_0 on MSM8953 - Corrected INTFs related to DP-MST - DP: - Removed debug prints looking into PHY internals - DSI: - Fixes for DSC panels - RGB101010 support - Support for SC8280XP - Moved PHY bindings from display/ to phy/ - GPU: - Preemption support for x2-85 and a840 - IFPC support for a840 - SKU detection support for x2-85 and a840 - Expose AQE support (VK ray-pipeline) - Avoid locking in VM_BIND fence signaling path - Fix to avoid reclaim in GPU snapshot path - Disallow foreign mapping of _NO_SHARE BOs - HDMI: - Fixed infoframes programming - MDP5: - Dropped support for MSM8974v1 - Dropped now unused code for MSM8974 v1 and SDM660 / MSM8998 panthor: - add tracepoints for power and IRQs - fix fence handling - extend timestamp query with flags - support various sources for timestamp queries tyr: - fix names and model/versions rockchip: - vop2: use drm logging function - rk3576 displayport support - support CRTC background color atmel-hlcdc: - support sana5d65 LCD controller tilcdc: - use DT bindings schema - use managed DRM interfaces - support DRM_BRIDGE_ATTACH_NO_CONNECTOR verisilicon: - support DC8200 + DT bindings virtgpu: - support PRIME import with 3D enabled komeda: - fix integer overflow in AFBC checks mcde: - improve bridge handling gma500: - use drm client buffer for fbdev framebuffer amdxdna: - add sensors ioctls - provide NPU power estimate - support column utilization sensor - allow