| Age | Commit message (Collapse) | Author | Files | Lines |
|
Add mlx5_lag_query_bond_speed() to query the aggregated speed of
lag configurations with a bond device.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Add port change event handling logic for MPESW LAG mode, ensuring
VFs are updated when the speed of LAG physical ports changes.
This triggers a speed update workflow when relevant port state changes
occur, enabling consistent and accurate reporting of VF bandwidth.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Currently, vports report only their parent's uplink speed, which in LAG
setups does not reflect the true aggregated bandwidth. This makes it
hard for upper-layer software to optimize load balancing decisions
based on accurate bandwidth information.
Fix the issue by calculating the possible maximum speed of a LAG as
the sum of speeds of all active uplinks that are part of the LAG.
Propagate this effective max speed to vports associated with the LAG
whenever a relevant event occurs, such as physical port link state
changes or LAG creation/modification.
With this change, upper-layer components receive accurate bandwidth
information corresponding to the active members of the LAG and can
make better load balancing decisions.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Introduce the max_tx_speed field to the query and modify_vport_state
structures.
Add the esw_vport_state_max_tx_speed capability bit, indicating
the firmware support modifying the max_tx_speed field via the
MODIFY_VPORT_STATE command.
Signed-off-by: Or Har-Toov <ohartoov@nvidia.com>
Reviewed-by: Maher Sanalla <msanalla@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Edward Srouji <edwards@nvidia.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
|
|
Wang Zhaolong reports a deadlock involving NFSv4.1 state recovery
waiting on kthreadd, which is attempting to reclaim memory by calling
nfs_release_folio(). The latter cannot make progress due to state
recovery being needed.
It seems that the only safe thing to do here is to kick off a writeback
of the folio, without waiting for completion, or else kicking off an
asynchronous commit.
Reported-by: Wang Zhaolong <wangzhaolong@huaweicloud.com>
Fixes: 96780ca55e3c ("NFS: fix up nfs_release_folio() to try to release the page")
Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
|
|
Some low-level drivers (LLD) access block layer crypto fields, such as
rq->crypt_keyslot and rq->crypt_ctx within `struct request`, to
configure hardware for inline encryption. However, SCSI Error Handling
(EH) commands (e.g., TEST UNIT READY, START STOP UNIT) should not
involve any encryption setup.
To prevent drivers from erroneously applying crypto settings during EH,
this patch saves the original values of rq->crypt_keyslot and
rq->crypt_ctx before an EH command is prepared via scsi_eh_prep_cmnd().
These fields in the 'struct request' are then set to NULL. The original
values are restored in scsi_eh_restore_cmnd() after the EH command
completes.
This ensures that the block layer crypto context does not leak into EH
command execution.
Signed-off-by: Brian Kao <powenkao@google.com>
Link: https://patch.msgid.link/20251218031726.2642834-1-powenkao@google.com
Cc: stable@vger.kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
|
|
Directly increment the TSO features incurs a side effect: it will also
directly clear the flags in NETIF_F_ALL_FOR_ALL on the master device,
which can cause issues such as the inability to enable the nocache copy
feature on the bonding driver.
The fix is to include NETIF_F_ALL_FOR_ALL in the update mask, thereby
preventing it from being cleared.
Fixes: b0ce3508b25e ("bonding: allow TSO being set on bonding master")
Signed-off-by: Di Zhu <zhud@hygon.cn>
Link: https://patch.msgid.link/20251224012224.56185-1-zhud@hygon.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull core entry fix from Borislav Petkov:
- Make sure clang inlines trivial local_irq_* helpers
* tag 'core_urgent_for_v6.19_rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
entry: Always inline local_irq_{enable,disable}_exit_to_user()
|
|
Up until now, all monitoring events were associated with the L3 resource and it
made sense to use the L3 specific "struct rdt_mon_domain *" argument to functions
operating on domains.
Telemetry events will be tied to a new resource with its instances represented
by a new domain structure that, just like struct rdt_mon_domain, starts with
the generic struct rdt_domain_hdr.
Prepare to support domains belonging to different resources by changing the
calling convention of functions operating on domains. Pass the generic header
and use that to find the domain specific structure where needed.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
|
|
Every resctrl resource has a list of domain structures. struct rdt_ctrl_domain
and struct rdt_mon_domain both begin with struct rdt_domain_hdr with
rdt_domain_hdr::type used in validity checks before accessing the domain of
a particular type.
Add the resource id to struct rdt_domain_hdr in preparation for a new monitoring
domain structure that will be associated with a new monitoring resource. Improve
existing domain validity checks with a new helper domain_header_is_valid()
that checks both domain type and resource id. domain_header_is_valid() should
be used before every call to container_of() that accesses a domain structure.
Signed-off-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
Reviewed-by: Reinette Chatre <reinette.chatre@intel.com>
Link: https://lore.kernel.org/20251217172121.12030-1-tony.luck@intel.com
|
|
'20251117-mdss-resets-msm8917-msm8937-v2-1-a7e9bbdaac96@mainlining.org' into HEAD
Merge the addition of MDSS reset to the MSM8917 GCC binding, in order to
get access to the introduced constant.
|
|
Export and namespace those not prefixed with drm_* so
it becomes possible to write custom commit tail functions
in individual drivers using the helper infrastructure.
Tested-by: Marek Vasut <marek.vasut+renesas@mailbox.org>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Cc: stable@vger.kernel.org # v6.17+
Fixes: c9b1150a68d9 ("drm/atomic-helper: Re-order bridge chain pre-enable and post-disable")
Reviewed-by: Aradhya Bhatia <aradhya.bhatia@linux.dev>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Tested-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20251205-drm-seq-fix-v1-3-fda68fa1b3de@ideasonboard.com
|
|
This reverts commit c9b1150a68d9362a0827609fc0dc1664c0d8bfe1.
Changing the enable/disable sequence has caused regressions on multiple
platforms: R-Car, MCDE, Rockchip. A series (see link below) was sent to
fix these, but it was decided that it's better to revert the original
patch and change the enable/disable sequence only in the tidss driver.
Reverting this commit breaks tidss's DSI and OLDI outputs, which will be
fixed in the following commits.
Signed-off-by: Tomi Valkeinen <tomi.valkeinen@ideasonboard.com>
Link: https://lore.kernel.org/all/20251202-mcde-drm-regression-thirdfix-v6-0-f1bffd4ec0fa%40kernel.org/
Fixes: c9b1150a68d9 ("drm/atomic-helper: Re-order bridge chain pre-enable and post-disable")
Cc: stable@vger.kernel.org # v6.17+
Reviewed-by: Aradhya Bhatia <aradhya.bhatia@linux.dev>
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Reviewed-by: Linus Walleij <linusw@kernel.org>
Tested-by: Linus Walleij <linusw@kernel.org>
Signed-off-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20251205-drm-seq-fix-v1-1-fda68fa1b3de@ideasonboard.com
|
|
Add some of the UFS symbol rx/tx muxes were not initially described.
Signed-off-by: Taniya Das <taniya.das@oss.qualcomm.com>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Link: https://lore.kernel.org/r/20260103-ufs_symbol_clk-v2-1-51828cc76236@oss.qualcomm.com
Signed-off-by: Bjorn Andersson <andersson@kernel.org>
|
|
The __opt annotation was originally introduced specifically for
buffer/size argument pairs in bpf_dynptr_slice() and
bpf_dynptr_slice_rdwr(), allowing the buffer pointer to be NULL while
still validating the size as a constant. The __nullable annotation
serves the same purpose but is more general and is already used
throughout the BPF subsystem for raw tracepoints, struct_ops, and other
kfuncs.
This patch unifies the two annotations by replacing __opt with
__nullable. The key change is in the verifier's
get_kfunc_ptr_arg_type() function, where mem/size pair detection is now
performed before the nullable check. This ensures that buffer/size
pairs are correctly classified as KF_ARG_PTR_TO_MEM_SIZE even when the
buffer is nullable, while adding an !arg_mem_size condition to the
nullable check prevents interference with mem/size pair handling.
When processing KF_ARG_PTR_TO_MEM_SIZE arguments, the verifier now uses
is_kfunc_arg_nullable() instead of the removed is_kfunc_arg_optional()
to determine whether to skip size validation for NULL buffers.
This is the first documentation added for the __nullable annotation,
which has been in use since it was introduced but was previously
undocumented.
No functional changes to verifier behavior - nullable buffer/size pairs
continue to work exactly as before.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102221513.1961781-1-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Introduce bpf_map_memcg_enter() and bpf_map_memcg_exit() helpers to
reduce code duplication in memcg context management.
bpf_map_memcg_enter() gets the memcg from the map, sets it as active,
and returns both the previous and the now active memcg.
bpf_map_memcg_exit() restores the previous active memcg and releases the
reference obtained during enter.
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102200230.25168-2-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux
Pull crypto library fix from Eric Biggers:
"Fix the kunit_run_irq_test() function (which I recently added for the
CRC and crypto tests) to be less timing-dependent.
This fixes flakiness in the polyval kunit test suite"
* tag 'libcrypto-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux:
kunit: Enforce task execution in {soft,hard}irq contexts
|
|
Pull rdma fixes from Jason Gunthorpe:
- Fix several syzkaller found bugs:
- Poor parsing of the RDMA_NL_LS_OP_IP_RESOLVE netlink
- GID entry refcount leaking when CM destruction races with
multicast establishment
- Missing refcount put in ib_del_sub_device_and_put()
- Fixup recently introduced uABI padding for 32 bit consistency
- Avoid user triggered math overflow in MANA and AFA
- Reading invalid netdev data during an event
- kdoc fixes
- Fix never-working gid copying in ib_get_gids_from_rdma_hdr
- Typo in bnxt when validating the BAR
- bnxt mis-parsed IB_SEND_IP_CSUM so it didn't work always
- bnxt out of bounds access in bnxt related to the counters on new
devices
- Allocate the bnxt PDE table with the right sizing
- Use dma_free_coherent() correctly in bnxt
- Allow rxe to be unloadable when CONFIG_PROVE_LOCKING by adjusting the
tracking of the global sockets it uses
- Missing unlocking on error path in rxe
- Compute the right number of pages in a MR in rtrs
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
RDMA/bnxt_re: fix dma_free_coherent() pointer
RDMA/rtrs: Fix clt_path::max_pages_per_mr calculation
IB/rxe: Fix missing umem_odp->umem_mutex unlock on error path
RDMA/bnxt_re: Fix to use correct page size for PDE table
RDMA/bnxt_re: Fix OOB write in bnxt_re_copy_err_stats()
RDMA/bnxt_re: Fix IB_SEND_IP_CSUM handling in post_send
RDMA/core: always drop device refcount in ib_del_sub_device_and_put()
RDMA/rxe: let rxe_reclassify_recv_socket() call sk_owner_put()
RDMA/bnxt_re: Fix incorrect BAR check in bnxt_qplib_map_creq_db()
RDMA/core: Fix logic error in ib_get_gids_from_rdma_hdr()
RDMA/efa: Remove possible negative shift
RTRS/rtrs: clean up rtrs headers kernel-doc
RDMA/irdma: avoid invalid read in irdma_net_event
RDMA/mana_ib: check cqe length for kernel CQs
RDMA/irdma: Fix irdma_alloc_ucontext_resp padding
RDMA/ucma: Fix rdma_ucm_query_ib_service_resp struct padding
RDMA/cm: Fix leaking the multicast GID table reference
RDMA/core: Check for the presence of LS_NLA_TYPE_DGID correctly
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux
Pull io_uring fixes from Jens Axboe:
- Removed dead argument length for io_uring_validate_mmap_request()
- Use GFP_NOWAIT for overflow CQEs on legacy ring setups rather than
GFP_ATOMIC, which makes it play nicer with memcg limits
- Fix a potential circular locking issue with tctx node removal and
exec based cancelations
* tag 'io_uring-6.19-20260102' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux:
io_uring/memmap: drop unused sz param in io_uring_validate_mmap_request()
io_uring/tctx: add separate lock for list of tctx's in ctx
io_uring: use GFP_NOWAIT for overflow CQEs on legacy rings
|
|
Now that KF_TRUSTED_ARGS is the default for all kfuncs, remove the
explicit KF_TRUSTED_ARGS flag from all kfunc definitions and remove the
flag itself.
Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Link: https://lore.kernel.org/r/20260102180038.2708325-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
I haven't found an NFSERR_EAGAIN in RFCs 1094, 1813, 7530, or 8881.
None of these RFCs have an NFS status code that match the numeric
value "11".
Based on the meaning of the EAGAIN errno, I presume the use of this
status in NFSD means NFS4ERR_DELAY. So replace the one usage of
nfserr_eagain, and remove it from NFSD's NFS status conversion
tables.
As far as I can tell, NFSERR_EAGAIN has existed since the pre-git
era, but was not actually used by any code until commit f4e44b393389
("NFSD: delay unmount source's export after inter-server copy
completed."), at which time it become possible for NFSD to return
a status code of 11 (which is not valid NFS protocol).
Fixes: f4e44b393389 ("NFSD: delay unmount source's export after inter-server copy completed.")
Cc: stable@vger.kernel.org
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Jeff Layton <jlayton@kernel.org>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
|
|
Pull drm fixes from Dave Airlie:
"Happy New Year, jetlagged fixes from me, still pretty quiet, xe is
most of this, with i915/nouveau/imagination fixes and some shmem
cleanups.
shmem:
- docs and MODULE_LICENSE fix
xe:
- Ensure svm device memory is idle before migration completes
- Fix a SVM debug printout
- Use READ_ONCE() / WRITE_ONCE() for g2h_fence
i915:
- Fix eb_lookup_vmas() failure path
nouveau:
- fix prepare_fb warnings
imagination:
- prevent export of protected objects"
* tag 'drm-fixes-2026-01-02' of https://gitlab.freedesktop.org/drm/kernel:
drm/i915/gem: Zero-initialize the eb.vma array in i915_gem_do_execbuffer
drm/xe/guc: READ/WRITE_ONCE g2h_fence->done
drm/pagemap, drm/xe: Ensure that the devmem allocation is idle before use
drm/xe/svm: Fix a debug printout
drm/gem-shmem: Fix the MODULE_LICENSE() string
drm/gem-shmem: Fix typos in documentation
drm/nouveau/dispnv50: Don't call drm_atomic_get_crtc_state() in prepare_fb
drm/imagination: Disallow exporting of PM/FW protected objects
|
|
ctx->tcxt_list holds the tasks using this ring, and it's currently
protected by the normal ctx->uring_lock. However, this can cause a
circular locking issue, as reported by syzbot, where cancelations off
exec end up needing to remove an entry from this list:
======================================================
WARNING: possible circular locking dependency detected
syzkaller #0 Tainted: G L
------------------------------------------------------
syz.0.9999/12287 is trying to acquire lock:
ffff88805851c0a8 (&ctx->uring_lock){+.+.}-{4:4}, at: io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
but task is already holding lock:
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733
which lock already depends on the new lock.
the existing dependency chain (in reverse order) is:
-> #2 (&sig->cred_guard_mutex){+.+.}-{4:4}:
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
proc_pid_attr_write+0x547/0x630 fs/proc/base.c:2837
vfs_write+0x27e/0xb30 fs/read_write.c:684
ksys_write+0x145/0x250 fs/read_write.c:738
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #1 (sb_writers#3){.+.+}-{0:0}:
percpu_down_read_internal include/linux/percpu-rwsem.h:53 [inline]
percpu_down_read_freezable include/linux/percpu-rwsem.h:83 [inline]
__sb_start_write include/linux/fs/super.h:19 [inline]
sb_start_write+0x4d/0x1c0 include/linux/fs/super.h:125
mnt_want_write+0x41/0x90 fs/namespace.c:499
open_last_lookups fs/namei.c:4529 [inline]
path_openat+0xadd/0x3dd0 fs/namei.c:4784
do_filp_open+0x1fa/0x410 fs/namei.c:4814
io_openat2+0x3e0/0x5c0 io_uring/openclose.c:143
__io_issue_sqe+0x181/0x4b0 io_uring/io_uring.c:1792
io_issue_sqe+0x165/0x1060 io_uring/io_uring.c:1815
io_queue_sqe io_uring/io_uring.c:2042 [inline]
io_submit_sqe io_uring/io_uring.c:2320 [inline]
io_submit_sqes+0xbf4/0x2140 io_uring/io_uring.c:2434
__do_sys_io_uring_enter io_uring/io_uring.c:3280 [inline]
__se_sys_io_uring_enter+0x2e0/0x2b60 io_uring/io_uring.c:3219
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
-> #0 (&ctx->uring_lock){+.+.}-{4:4}:
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
other info that might help us debug this:
Chain exists of:
&ctx->uring_lock --> sb_writers#3 --> &sig->cred_guard_mutex
Possible unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&sig->cred_guard_mutex);
lock(sb_writers#3);
lock(&sig->cred_guard_mutex);
lock(&ctx->uring_lock);
*** DEADLOCK ***
1 lock held by syz.0.9999/12287:
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: prepare_bprm_creds fs/exec.c:1360 [inline]
#0: ffff88802db5a2e0 (&sig->cred_guard_mutex){+.+.}-{4:4}, at: bprm_execve+0xb9/0x1400 fs/exec.c:1733
stack backtrace:
CPU: 0 UID: 0 PID: 12287 Comm: syz.0.9999 Tainted: G L syzkaller #0 PREEMPT(full)
Tainted: [L]=SOFTLOCKUP
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 10/25/2025
Call Trace:
<TASK>
dump_stack_lvl+0xe8/0x150 lib/dump_stack.c:120
print_circular_bug+0x2e2/0x300 kernel/locking/lockdep.c:2043
check_noncircular+0x12e/0x150 kernel/locking/lockdep.c:2175
check_prev_add kernel/locking/lockdep.c:3165 [inline]
check_prevs_add kernel/locking/lockdep.c:3284 [inline]
validate_chain kernel/locking/lockdep.c:3908 [inline]
__lock_acquire+0x15a6/0x2cf0 kernel/locking/lockdep.c:5237
lock_acquire+0x107/0x340 kernel/locking/lockdep.c:5868
__mutex_lock_common kernel/locking/mutex.c:614 [inline]
__mutex_lock+0x187/0x1350 kernel/locking/mutex.c:776
io_uring_del_tctx_node+0xf0/0x2c0 io_uring/tctx.c:179
io_uring_clean_tctx+0xd4/0x1a0 io_uring/tctx.c:195
io_uring_cancel_generic+0x6ca/0x7d0 io_uring/cancel.c:646
io_uring_task_cancel include/linux/io_uring.h:24 [inline]
begin_new_exec+0x10ed/0x2440 fs/exec.c:1131
load_elf_binary+0x9f8/0x2d70 fs/binfmt_elf.c:1010
search_binary_handler fs/exec.c:1669 [inline]
exec_binprm fs/exec.c:1701 [inline]
bprm_execve+0x92e/0x1400 fs/exec.c:1753
do_execveat_common+0x510/0x6a0 fs/exec.c:1859
do_execve fs/exec.c:1933 [inline]
__do_sys_execve fs/exec.c:2009 [inline]
__se_sys_execve fs/exec.c:2004 [inline]
__x64_sys_execve+0x94/0xb0 fs/exec.c:2004
do_syscall_x64 arch/x86/entry/syscall_64.c:63 [inline]
do_syscall_64+0xec/0xf80 arch/x86/entry/syscall_64.c:94
entry_SYSCALL_64_after_hwframe+0x77/0x7f
RIP: 0033:0x7ff3a8b8f749
Code: ff ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 40 00 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 a8 ff ff ff f7 d8 64 89 01 48
RSP: 002b:00007ff3a9a97038 EFLAGS: 00000246 ORIG_RAX: 000000000000003b
RAX: ffffffffffffffda RBX: 00007ff3a8de5fa0 RCX: 00007ff3a8b8f749
RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000200000000400
RBP: 00007ff3a8c13f91 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000
R13: 00007ff3a8de6038 R14: 00007ff3a8de5fa0 R15: 00007ff3a8f0fa28
</TASK>
Add a separate lock just for the tctx_list, tctx_lock. This can nest
under ->uring_lock, where necessary, and be used separately for list
manipulation. For the cancelation off exec side, this removes the
need to grab ->uring_lock, hence fixing the circular locking
dependency.
Reported-by: syzbot+b0e3b77ffaa8a4067ce5@syzkaller.appspotmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This commit creates an rcu_tasks_trace_expedite_current() function
that expedites the current (and possibly the next) RCU Tasks Trace
grace period.
If the current RCU Tasks Trace grace period is already waiting, that wait
will complete before the expediting takes effect. If there is no RCU
Tasks Trace grace period in flight, this function might well create one.
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
When expressing RCU Tasks Trace in terms of SRCU-fast, it was
necessary to keep a nesting count and per-CPU srcu_ctr structure
pointer in the task_struct structure, which is slow to access.
But an alternative is to instead make rcu_read_lock_tasks_trace() and
rcu_read_unlock_tasks_trace(), which match the underlying SRCU-fast
semantics, avoiding the task_struct accesses.
When all callers have switched to the new API, the previous
rcu_read_lock_trace() and rcu_read_unlock_trace() APIs will be removed.
The rcu_read_{,un}lock_{,tasks_}trace() functions need to use smp_mb()
only if invoked where RCU is not watching, that is, from locations where
a call to rcu_is_watching() would return false. In architectures that
define the ARCH_WANTS_NO_INSTR Kconfig option, use of noinstr and friends
ensures that tracing happens only where RCU is watching, so those
architectures can dispense entirely with the read-side calls to smp_mb().
Other architectures include these read-side calls by default, but in many
installations there might be either larger than average tolerance for
risk, prohibition of removing tracing on a running system, or careful
review and approval of removing of tracing. Such installations can
build their kernels with CONFIG_TASKS_TRACE_RCU_NO_MB=y to avoid those
read-side calls to smp_mb(), thus accepting responsibility for run-time
removal of tracing from code regions that RCU is not watching.
Those wishing to disable read-side memory barriers for an entire
architecture can select this TASKS_TRACE_RCU_NO_MB Kconfig option,
hence the polarity.
[ paulmck: Apply Peter Zijlstra feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
Now that RCU Tasks Trace has been re-implemented in terms of SRCU-fast,
the ->trc_ipi_to_cpu, ->trc_blkd_cpu, ->trc_blkd_node, ->trc_holdout_list,
and ->trc_reader_special task_struct fields are no longer used.
In addition, the rcu_tasks_trace_qs(), rcu_tasks_trace_qs_blkd(),
exit_tasks_rcu_finish_trace(), and rcu_spawn_tasks_trace_kthread(),
show_rcu_tasks_trace_gp_kthread(), rcu_tasks_trace_get_gp_data(),
rcu_tasks_trace_torture_stats_print(), and get_rcu_tasks_trace_gp_kthread()
functions and all the other functions that they invoke are no longer used.
Also, the TRC_NEED_QS and TRC_NEED_QS_CHECKED CPP macros are no longer used.
Neither are the rcu_tasks_trace_lazy_ms and rcu_task_ipi_delay rcupdate
module parameters and the TASKS_TRACE_RCU_READ_MB Kconfig option.
This commit therefore removes all of them.
[ paulmck: Apply Alexei Starovoitov feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
This commit saves more than 500 lines of RCU code by re-implementing
RCU Tasks Trace in terms of SRCU-fast. Follow-up work will remove
more code that does not cause problems by its presence, but that is no
longer required.
This variant places smp_mb() in rcu_read_{,un}lock_trace(), and in the
same place that srcu_read_{,un}lock() would put them. These smp_mb()
calls will be removed on common-case architectures in a later commit.
In the meantime, it serves to enforce ordering between the underlying
srcu_read_{,un}lock_fast() markers and the intervening critical section,
even on architectures that permit attaching tracepoints on regions of
code not watched by RCU. Such architectures defeat SRCU-fast's use of
implicit single-instruction, interrupts-disabled, and atomic-operation
RCU read-side critical sections, which have no effect when RCU is not
watching. The aforementioned later commit will insert these smp_mb()
calls only on architectures that have not used noinstr to prevent
attaching tracepoints to code where RCU is not watching.
[ paulmck: Apply kernel test robot, Boqun Feng, and Zqiang feedback. ]
[ paulmck: Split out Tiny SRCU fixes per Andrii Nakryiko feedback. ]
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Tested-by: kernel test robot <oliver.sang@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Reviewed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Boqun Feng <boqun.feng@gmail.com>
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-next
Core Changes:
- Dynamic pagemaps and multi-device SVM (Thomas)
Driver Changes:
- Introduce SRIOV scheduler Groups (Daniele)
- Configure migration queue as low latency (Francois)
- Don't use absolute path in generated header comment (Calvin Owens)
- Add SoC remapper support for system controller (Umesh)
- Insert compiler barriers in GuC code (Jonathan)
- Rebar updates (Lucas)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aVOiULyYdnFbq-JB@fedora
|
|
https://gitlab.freedesktop.org/drm/xe/kernel into drm-fixes
Core Changes:
- Ensure a SVM device memory allocation is idle before migration complete (Thomas)
Driver Changes:
- Fix a SVM debug printout (Thomas)
- Use READ_ONCE() / WRITE_ONCE() for g2h_fence (Jonathan)
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Thomas Hellstrom <thomas.hellstrom@linux.intel.com>
Link: https://patch.msgid.link/aVOTf6-whmkgrUuq@fedora
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86
Pull x86 platform driver fixes from Ilpo Järvinen:
- alienware-wmi-wmax: Area-51, x16, and 16X Aurora laptops support
- asus-armoury:
- Fix FA507R PPT data
- Add TDP data for more laptop models
- asus-nb-wmi: Asus Zenbook 14 display toggle key support
- dell-lis3lv02d: Dell Latitude 5400 support
- hp-bioscfg: Fix out-of-bounds array access in ACPI package parsing
- ibm_rtl: Fix EBDA signature search pointer arithmetic
- ideapad-laptop: Reassign KEY_CUT to KEY_SELECTIVE_SCREENSHOT
- intel/pmt:
- Fix kobject memory leak on init failure
- Use valid pointers on error handling path
- intel/vsec: Correct kernel doc comments
- mellanox: mlxbf-pmc: Fix event names
- msi-laptop: Add sysfs_remove_group()
- samsumg-galaxybook: Do not cast pointer to a shorter type
- think-lmi: WMI certificate thumbprint support for ThinkCenter
- uniwill: Tuxedo Book BA15 Gen10 support
* tag 'platform-drivers-x86-v6.19-2' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (22 commits)
platform/x86: asus-armoury: add support for G835LW
platform/x86: asus-armoury: fix ppt data for FA507R
platform/x86/intel/pmt/discovery: use valid device pointer in dev_err_probe
platform/x86: hp-bioscfg: Fix out-of-bounds array access in ACPI package parsing
platform/x86: asus-armoury: add support for G615LR
platform/x86: asus-armoury: add support for FA608UM
platform/x86: asus-armoury: add support for GA403WR
platform/x86: asus-armoury: add support for GU605CR
platform/x86: ideapad-laptop: Reassign KEY_CUT to KEY_SELECTIVE_SCREENSHOT
platform/x86: samsung-galaxybook: Fix problematic pointer cast
platform/x86/intel/pmt: Fix kobject memory leak on init failure
platform/x86/intel/vsec: correct kernel-doc comments
platform/x86: ibm_rtl: fix EBDA signature search pointer arithmetic
platform/x86: msi-laptop: add missing sysfs_remove_group()
platform/x86: think-lmi: Add WMI certificate thumbprint support for ThinkCenter
platform/x86: dell-lis3lv02d: Add Latitude 5400
platform/mellanox: mlxbf-pmc: Remove trailing whitespaces from event names
platform/x86: asus-nb-wmi: Add keymap for display toggle
platform/x86/uniwill: Add TUXEDO Book BA15 Gen10
platform/x86: alienware-wmi-wmax: Add support for Alienware 16X Aurora
...
|
|
Pull VFIO fixes from Alex Williamson:
- Restrict ROM access to dword to resolve a regression introduced with
qword access seen on some Intel NICs. Update VGA region access to the
same given lack of precedent for 64-bit users (Kevin Tian)
- Fix missing .get_region_info_caps callback in the xe-vfio-pci variant
driver due to integration through the DRM tree (Michal Wajdeczko)
- Add aligned 64-bit access macros to tools/include/linux/types.h,
allowing removal of uapi/linux/type.h includes from various vfio
selftest, resolving redefinition warnings for integration with KVM
selftests (David Matlack)
- Fix error path memory leak in pds-vfio-pci variant driver (Zilin Guan)
- Fix error path use-after-free in xe-vfio-pci variant driver (Alper Ak)
* tag 'vfio-v6.19-rc4' of https://github.com/awilliam/linux-vfio:
vfio/xe: Fix use-after-free in xe_vfio_pci_alloc_file()
vfio/pds: Fix memory leak in pds_vfio_dirty_enable()
vfio: selftests: Drop <uapi/linux/types.h> includes
tools include: Add definitions for __aligned_{l,b}e64
vfio/xe: Add default handler for .get_region_info_caps
vfio/pci: Disable qword access to the VGA region
vfio/pci: Disable qword access to the PCI ROM bar
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from Bluetooth and WiFi. Notably this includes the fix
for the iwlwifi issue you reported.
Current release - regressions:
- core: avoid prefetching NULL pointers
- wifi:
- iwlwifi: implement settime64 as stub for MVM/MLD PTP
- mac80211: fix list iteration in ieee80211_add_virtual_monitor()
- handshake: fix null-ptr-deref in handshake_complete()
- eth: mana: fix use-after-free in reset service rescan path
Previous releases - regressions:
- openvswitch: avoid needlessly taking the RTNL on vport destroy
- dsa: properly keep track of conduit reference
- ipv4:
- fix error route reference count leak with nexthop objects
- fib: restore ECMP balance from loopback
- mptcp: ensure context reset on disconnect()
- bluetooth: fix potential UaF in btusb
- nfc: fix deadlock between nfc_unregister_device and
rfkill_fop_write
- eth:
- gve: defer interrupt enabling until NAPI registration
- i40e: fix scheduling in set_rx_mode
- macb: relocate mog_init_rings() callback from macb_mac_link_up()
to macb_open()
- rtl8150: fix memory leak on usb_submit_urb() failure
- wifi: 8192cu: fix tid out of range in rtl92cu_tx_fill_desc()
Previous releases - always broken:
- ip6_gre: make ip6gre_header() robust
- ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
- af_unix: don't post cmsg for SO_INQ unless explicitly asked for
- phy: mediatek: fix nvmem cell reference leak in
mt798x_phy_calibration
- wifi: mac80211: discard beacon frames to non-broadcast address
- eth:
- iavf: fix off-by-one issues in iavf_config_rss_reg()
- stmmac: fix the crash issue for zero copy XDP_TX action
- team: fix check for port enabled when priority changes"
* tag 'net-6.19-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (64 commits)
ipv6: fix a BUG in rt6_get_pcpu_route() under PREEMPT_RT
net: rose: fix invalid array index in rose_kill_by_device()
net: enetc: do not print error log if addr is 0
net: macb: Relocate mog_init_rings() callback from macb_mac_link_up() to macb_open()
selftests: fib_test: Add test case for ipv4 multi nexthops
net: fib: restore ECMP balance from loopback
selftests: fib_nexthops: Add test cases for error routes deletion
ipv4: Fix reference count leak when using error routes with nexthop objects
net: usb: sr9700: fix incorrect command used to write single register
ipv6: BUG() in pskb_expand_head() as part of calipso_skbuff_setattr()
usbnet: avoid a possible crash in dql_completed()
gve: defer interrupt enabling until NAPI registration
net: stmmac: fix the crash issue for zero copy XDP_TX action
octeontx2-pf: fix "UBSAN: shift-out-of-bounds error"
af_unix: don't post cmsg for SO_INQ unless explicitly asked for
net: mana: Fix use-after-free in reset service rescan path
net: avoid prefetching NULL pointers
net: bridge: Describe @tunnel_hash member in net_bridge_vlan_group struct
net: nfc: fix deadlock between nfc_unregister_device and rfkill_fop_write
net: usb: asix: validate PHY address before use
...
|
|
Many bridge drivers store a next_bridge pointer in their private data and
use it for attach and sometimes other purposes. This is going to be risky
when bridge hot-unplug is used.
Considering this example scenario:
1. pipeline: encoder --> bridge A --> bridge B --> bridge C
2. encoder takes a reference to bridge B
3. bridge B takes a next_bridge reference to bridge C
4. encoder calls (bridge B)->b_foo(), which in turns references
next_bridge, e.g.:
b_foo() {
bar(b->next_bridge);
}
If bridges B and C are removed, bridge C can be freed but B is still
allocated because the encoder holds a reference to B. So when step 4
happens, 'b->next-bridge' would be a use-after-free.
Calling drm_bridge_put() in the B bridge .remove function does not solve
the problem as it leaves a (potentially long) risk window between B removal
and the final deallocation of B. A safe moment to put the B reference is in
__drm_bridge_free(), when the last reference has been put. This can be done
by drivers in the .destroy func. However to avoid the need for so many
drivers to implement a .destroy func, just offer a next_bridge pointer to
all bridges that is automatically put it in __drm_bridge_free(), exactly
when the .destroy func is called.
Suggested-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/all/20251201-thick-jasmine-oarfish-1eceb0@houat/
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20251216-drm-bridge-alloc-getput-drm_of_find_bridge-v3-6-b5165fab8058@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
|
|
of_drm_find_bridge() does not increment the refcount for the returned
bridge, but that is required now. However converting it and all its users
is not realistically doable at once given the large amount of (direct and
indirect) callers and the complexity of some.
Solve this issue by creating a new of_drm_find_and_get_bridge() function
that is identical to of_drm_find_bridge() except also it takes a
reference. Then of_drm_find_bridge() will be deprecated to be eventually
removed.
Suggested-by: Maxime Ripard <mripard@kernel.org>
Link: https://lore.kernel.org/dri-devel/20250319-stylish-lime-mongoose-0a18ad@houat/
Reviewed-by: Maxime Ripard <mripard@kernel.org>
Link: https://patch.msgid.link/20251216-drm-bridge-alloc-getput-drm_of_find_bridge-v3-1-b5165fab8058@bootlin.com
Signed-off-by: Luca Ceresoli <luca.ceresoli@bootlin.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull misc fixes from Andrew Morton:
"27 hotfixes. 12 are cc:stable, 18 are MM.
There's a patch series from Jiayuan Chen which fixes some
issues with KASAN and vmalloc. Apart from that it's the usual
shower of singletons - please see the respective changelogs
for details"
* tag 'mm-hotfixes-stable-2025-12-28-21-50' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (27 commits)
mm/ksm: fix pte_unmap_unlock of wrong address in break_ksm_pmd_entry
mm/page_owner: fix memory leak in page_owner_stack_fops->release()
mm/memremap: fix spurious large folio warning for FS-DAX
MAINTAINERS: notify the "Device Memory" community of memory hotplug changes
sparse: update MAINTAINERS info
mm/page_alloc: report 1 as zone_batchsize for !CONFIG_MMU
mm: consider non-anon swap cache folios in folio_expected_ref_count()
rust: maple_tree: rcu_read_lock() in destructor to silence lockdep
mm: memcg: fix unit conversion for K() macro in OOM log
mm: fixup pfnmap memory failure handling to use pgoff
tools/mm/page_owner_sort: fix timestamp comparison for stable sorting
selftests/mm: fix thread state check in uffd-unit-tests
kernel/kexec: fix IMA when allocation happens in CMA area
kernel/kexec: change the prototype of kimage_map_segment()
MAINTAINERS: add ABI headers to KHO and LIVE UPDATE
.mailmap: remove one of the entries for WangYuli
mm/damon/vaddr: fix missing pte_unmap_unlock in damos_va_migrate_pmd_entry()
MAINTAINERS: update one straggling entry for Bartosz Golaszewski
mm/page_alloc: change all pageblocks migrate type on coalescing
mm: leafops.h: correct kernel-doc function param. names
...
|
|
io_uring manages issued and pending IOPOLL read/write requests in a
singly linked list. One downside of that is that individual items
cannot easily be removed from that list, and as a result, io_uring
will only complete a completed request N in that list if 0..N-1 are
also complete. For homogenous IO this isn't necessarily an issue,
but if different devices are involved in polling in the same ring, or
if disparate IO from the same device is being polled for, this can
defer completion of some requests unnecessarily.
Move to a doubly linked list for iopoll completions instead, making it
possible to easily complete whatever requests that were polled done
successfully.
Co-developed-by: Fengnan Chang <fengnanchang@gmail.com>
Link: https://lore.kernel.org/io-uring/20251210085501.84261-1-changfengnan@bytedance.com/
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator
Pull regulator fixes from Mark Brown:
"A couple of fixes from Thomas, making the UAPI headers more robustly
correct and ensuring they are covered by checkpatch, and one from
Andreas fixing an update for a change to the DT bindings that I missed
was requested during bindings review in the newly added fp9931 driver"
* tag 'regulator-fix-v6.19-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/broonie/regulator:
regulator: fp9931: fix regulator node pointer
regulator: Add UAPI headers to MAINTAINERS
regulator: uapi: Use UAPI integer type
|
|
When one iio device is a consumer of another, it is possible that
the ->info_exist_lock of both ends up being taken when reading the
value of the consumer device.
Since they currently belong to the same lockdep class (being
initialized in a single location with mutex_init()), that results in a
lockdep warning
CPU0
----
lock(&iio_dev_opaque->info_exist_lock);
lock(&iio_dev_opaque->info_exist_lock);
*** DEADLOCK ***
May be due to missing lock nesting notation
4 locks held by sensors/414:
#0: c31fd6dc (&p->lock){+.+.}-{3:3}, at: seq_read_iter+0x44/0x4e4
#1: c4f5a1c4 (&of->mutex){+.+.}-{3:3}, at: kernfs_seq_start+0x1c/0xac
#2: c2827548 (kn->active#34){.+.+}-{0:0}, at: kernfs_seq_start+0x30/0xac
#3: c1dd2b68 (&iio_dev_opaque->info_exist_lock){+.+.}-{3:3}, at: iio_read_channel_processed_scale+0x24/0xd8
stack backtrace:
CPU: 0 UID: 0 PID: 414 Comm: sensors Not tainted 6.17.11 #5 NONE
Hardware na |