| Age | Commit message (Collapse) | Author | Files | Lines |
|
The first parameter to kcpustat_field() is a pointer to the cpu kcpustat to
be fetched from. This parameter is error prone because a copy to a kcpustat
could be passed by accident instead of the original one. Also the kcpustat
structure can already be retrieved with the help of the mandatory CPU
argument.
Remove the needless parameter.
Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Link: https://patch.msgid.link/20260508131647.43868-4-frederic@kernel.org
|
|
rcu-merge.2026.05.24
rcutorture.2026.05.24: Torture-test updates
misc.2026.05.24: Miscellaneous RCU updates
|
|
BPF programs should have no need in looking into struct io_ring_ctx, if
anything, most of such cases would be anti patterns like looking up ring
indices directly via the context.
Replace it with a new empty structure, which is just an alias to struct
io_ring_ctx. It'll create a new BTF type and fail verification if a BPF
program tries to access it (beyond the first byte). It'll also give more
flexibility for the future, and otherwise it can be made aligned with
io_ring_ctx as before with struct groups if ever needed or extended in a
different way.
Fixes: d0e437b76bd3c ("io_uring/bpf-ops: implement loop_step with BPF struct_ops")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://patch.msgid.link/5f6ca3649e9e0bae8667db4357e28dd00cd07901.1780394491.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM fixes from Andrew Morton:
"13 hotfixes. All are for MM. 10 are cc:stable and the remaining 3
address post-7.1 issues or aren't considered suitable for backporting.
There's a three-patch series "userfaultfd: verify VMA state across
UFFDIO_COPY retry" from Mike Rapoport which fixes a few uffd things.
The rest are singletons - please see the individual changelogs for
details"
* tag 'mm-hotfixes-stable-2026-06-01-20-58' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm:
userfaultfd: remove redundant check in vm_uffd_ops()
userfaultfd: refuse to __mfill_atomic_pte() for unsupported VMAs
userfaultfd: verify VMA state across UFFDIO_COPY retry
mm/huge_memory: update file PMD counter before folio_put()
mm/huge_memory: update file PUD counter before folio_put()
mm/hugetlb_vmemmap: fix incorrect vmemmap restore in rollback
mm/damon/ops-common: call folio_test_lru() after folio_get()
mm/cma: fix reserved page leak on activation failure
mm/memory-failure: fix hugetlb_lock AA deadlock in get_huge_page_for_hwpoison
mm/hugetlb: restore reservation on error in hugetlb folio copy paths
mm/cma_debug: fix invalid accesses for inactive CMA areas
memcg: use round-robin victim selection in refill_stock
mm/hugetlb: avoid false positive lockdep assertion
|
|
The empty zero page is used to back any kernel or user space mapping
that is supposed to remain cleared, and so the page itself is never
supposed to be modified.
So mark it as const, which moves it into .rodata rather than .bss: on
most architectures, this ensures that both the kernel's mapping of it
and any aliases that are accessible via the kernel direct (linear) map
are mapped read-only, and cannot be used (inadvertently or maliciously)
to corrupt the contents of the zero page.
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Reviewed-by: Kevin Brodsky <kevin.brodsky@arm.com>
Acked-by: David Hildenbrand (Arm) <david@kernel.org>
Reviewed-by: Jann Horn <jannh@google.com>
Reviewed-by: Feng Tang <feng.tang@linux.alibaba.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Will Deacon <will@kernel.org>
|
|
Uwe Kleine-König (The Capable Hub) <u.kleine-koenig@baylibre.com> says:
this series targets to use named initializers for platform_device_id
arrays. In general these are better readable for humans and more robust
to changes in the respective struct definition.
This robustness is needed as I want to do
Link: https://patch.msgid.link/cover.1779878004.git.u.kleine-koenig@baylibre.com
|
|
There is an bug in which an uninitialized stack variable is used in
rseq_exit_user_update() as reported by syzbot:
BUG: KMSAN: kernel-infoleak in rseq_set_ids_get_csaddr include/linux/rseq_entry.h:502 [inline]
The local variable:
struct rseq_ids ids = {
.cpu_id = task_cpu(t),
.mm_cid = task_mm_cid(t),
.node_id = cpu_to_node(ids.cpu_id),
};
According to the C standard, the evaluation order of expressions in an
initializer list is indeterminately sequenced. The compiler (Clang, in
this KMSAN build) evaluates `cpu_to_node(ids.cpu_id)` *before*
`ids.cpu_id` is initialized with `task_cpu(t)`.
This is fixed by moving the assignment of ids.node_id outside the
structure initialization.
Fixes: 82f572449cfe ("rseq: Implement read only ABI enforcement for optimized RSEQ V2 mode")
Closes: https://syzkaller.appspot.com/bug?extid=185a631927096f9da2fc
Reported-by: syzbot+185a631927096f9da2fc@syzkaller.appspotmail.com
Signed-off-by: Qing Wang <wangqing7171@gmail.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://patch.msgid.link/20260602030854.574038-1-wangqing7171@gmail.com
|
|
Now that the proxy path uses ->is_blocked, use the '->is_blocked &&
!->blocked_on' state instead of PROXY_WAKING. Notably, this is where a
blocked_on relation is broken but the donor task might still need a return
migration.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260526113322.596522894%40infradead.org
|
|
Add link to the task this task is proxying for, and use it so
the mutex owner can do an intelligent hand-off of the mutex to
the task that the owner is running on behalf.
[jstultz: This patch was split out from larger proxy patch]
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Juri Lelli <juri.lelli@redhat.com>
Signed-off-by: Valentin Schneider <valentin.schneider@arm.com>
Signed-off-by: Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-8-jstultz@google.com
|
|
Add a new is_blocked flag to the task struct. This flag is set
by try_to_block_task() and cleared by ttwu_do_wakeup() and
tracks if the task is blocked.
Traditionally this would mirror !p->on_rq, however due things
like DELAY_DEQUEUE and PROXY_EXEC, this can diverge, so its
useful to manage separately.
Additionally with this, we might be able to get rid of the
p->se.sched_delayed (ab)use in the core code (eventually).
Taken whole cloth from Peter's email:
https://lore.kernel.org/lkml/20260501132143.GC1026330@noisy.programming.kicks-ass.net/
With a few additional p->is_blocked = 0 in a few cases where
we return current if blocked_on gets zeroed or there is
no owner. This may hint that these current special cases
might be dropped eventually.
This change also helps resolve wait-queue stalls seen with
proxy-execution. See previous patch attempts for details:
https://lore.kernel.org/lkml/20260430215103.2978955-2-jstultz@google.com/
Reported-by: Vineeth Pillai <vineethrp@google.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-7-jstultz@google.com
|
|
This patch adds logic so try_to_wake_up() will notice if we are
waking a task where blocked_on == PROXY_WAKING, and if necessary
dequeue the task so the wakeup will naturally return-migrate the
donor task back to a cpu it can run on.
This helps performance as we do the dequeue and wakeup under the
locks normally taken in the try_to_wake_up() and avoids having
to do proxy_force_return() from __schedule(), which has to
re-take similar locks and then force a pick again loop.
This was split out from the larger proxy patch, and
significantly reworked.
Credits for the original patch go to:
Peter Zijlstra (Intel) <peterz@infradead.org>
Juri Lelli <juri.lelli@redhat.com>
Valentin Schneider <valentin.schneider@arm.com>
Connor O'Brien <connoro@google.com>
Signed-off-by: John Stultz <jstultz@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260512025635.2840817-6-jstultz@google.com
|
|
Pick up urgent fixes.
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
|
|
ktime_get_snapshot() resolves to ktime_get_snapshot_id(CLOCK_REALTIME).
Make it obvious in the code and convert the readout to use the
snapshot::systime and monoraw fields instead of snapshot::real and raw,
which aregoing away.
Similar to the PPS generators, avoid the more expensive snapshot when
CONFIG_NTP_PPS is disabled.
No functional change intended.
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195557.123410250@kernel.org
|
|
ktime_get_snapshot() provides a snapshot of the underlying clocksource
counter value and the corresponding CLOCK_MONOTONIC_RAW, CLOCK_REALTIME and
CLOCK_BOOTTIME timestamps.
There is no usage of CLOCK_REALTIME and CLOCK_BOOTTIME at the same time and
CLOCK_BOOTTIME support was just added for the ARM64 KVM tracing mechanism,
which needs CLOCK_BOOTTIME and the underlying clocksource counter value.
ktime_get_snapshot() is also not suitable for usage with CLOCK_AUX, but
that's a prerequisite to support PTP hardware timestamping for CLOCK_AUX
steering.
As a first step, rename ktime_get_snapshot() to ktime_get_snapshot_id(),
which now takes a clockid argument to select the clock which needs to be
captured. The result is stored in system_time_snapshot::systime, which will
replace the system_time_snapshot::real/boot members once all usage sites
have been converted.
ktime_get_snapshot() is a simple wrapper which hands in CLOCK_REALTIME as
clockid argument for the conversion period. That means CLOCK_REALTIME is
now captured twice, but that redunancy is only temporary.
As all usage sites of struct system_time_snapshot has to be updated anyway,
rename the 'raw' member to 'monoraw' for clarity.
No functional change vs. current users of ktime_get_snapshot()
Signed-off-by: Thomas Gleixner <tglx@kernel.org>
Tested-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Arthur Kiyanovski <akiyano@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: Thomas Weißschuh <thomas.weissschuh@linutronix.de>
Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
Link: https://patch.msgid.link/20260529195556.971591633@kernel.org
|
|
Carlos Song (OSS) <carlos.song@oss.nxp.com> says:
This series fixes two issues in the fsl-lpspi DMA transfer error paths.
Patch 1 replaces the deprecated dmaengine_terminate_all() with
dmaengine_terminate_sync() across all error paths in
fsl_lpspi_dma_transfer().
Patch 2 fixes a missing RX DMA channel termination when TX descriptor
preparation fails. Since the RX channel is already submitted and issued
before the TX descriptor is prepared, returning -EINVAL without
terminating the RX channel leaves it running against buffers that the
SPI core will unmap, potentially causing memory corruption.
Link: https://patch.msgid.link/20260525062357.3191349-1-carlos.song@oss.nxp.com
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux
Pull liveupdate fixes from Mike Rapoport:
"Two kexec handover regression fixes:
- fix order calculation for kho_unpreserve_pages() to make sure sure
that the order calculation in kho_unpreserve_pages() mathes the
order calculation in kho_preserve_pages().
- fix math in calculation of KHO_TREE_MAX_DEPTH to make it work with
16KB pages"
* tag 'liveupdate-fixes-2026-05-30' of git://git.kernel.org/pub/scm/linux/kernel/git/liveupdate/linux:
kho: fix order calculation for kho_unpreserve_pages()
kho: fix KHO_TREE_MAX_DEPTH for non-4KB page sizes
|
|
All buses have been converted from driver_set_override() to the generic
driver_override infrastructure introduced in commit cb3d1049f4ea
("driver core: generalize driver_override in struct device").
Buses now either opt into the generic sysfs callbacks via the
bus_type::driver_override flag, or use device_set_driver_override() /
__device_set_driver_override() directly.
Thus, remove the now-unused driver_set_override() helper.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20260505133935.3772495-6-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: e95060478244 ("rpmsg: Introduce a driver override mechanism")
Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20260505133935.3772495-5-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Tested-by: Michael Kelley <mhklinux@outlook.com>
Reviewed-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: d765edbb301c ("vmbus: add driver_override support")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20260505133935.3772495-4-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 2959ab247061 ("cdx: add the cdx bus driver")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20260505133935.3772495-3-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
When a driver is probed through __driver_attach(), the bus' match()
callback is called without the device lock held, thus accessing the
driver_override field without a lock, which can cause a UAF.
Fix this by using the driver-core driver_override infrastructure taking
care of proper locking internally.
Note that calling match() from __driver_attach() without the device lock
held is intentional. [1]
Link: https://lore.kernel.org/driver-core/DGRGTIRHA62X.3RY09D9SOK77P@kernel.org/ [1]
Reported-by: Gui-Dong Han <hanguidong02@gmail.com>
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220789
Fixes: 3cf385713460 ("ARM: 8256/1: driver coamba: add device binding path 'driver_override'")
Reviewed-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Link: https://patch.msgid.link/20260505133935.3772495-2-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty
Pull tty/serial driver fixes from Greg KH:
"Here are some small serial driver fixes for 7.1-rc6. Included in here
are:
- mips serial driver fixes to resolve some long-standing issues with
how they interacted with the console. That's the "majority" of the
changes in this merge request
- sh-sci driver regression fix
- 8250 driver regression fixes
- other small serial driver fixes for reported problems.
All of these have been in linux-next for over a week with no reported
issues"
* tag 'tty-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty:
serial: dz: Enable modular build
serial: zs: Convert to use a platform device
serial: dz: Convert to use a platform device
serial: zs: Switch to using channel reset
serial: zs: Fix bootconsole handover lockup
serial: dz: Fix bootconsole handover lockup
serial: dz: Fix bootconsole message clobbering at chip reset
serial: 8250_dw: dispatch SysRq character in dw8250_handle_irq()
serial: 8250: dispatch SysRq character in serial8250_handle_irq()
serial: core: introduce guard(uart_port_lock_check_sysrq_irqsave)
tty: serial: samsung: Remove redundant port lock acquisition in rx helpers
serial: altera_jtaguart: handle uart_add_one_port() failures
serial: qcom_geni: fix kfifo underflow when flush precedes DMA completion IRQ
serial: fsl_lpuart: fix rx buffer and DMA map leaks in start_rx_dma
tty: add missing tty_driver include to tty_port.h
serial: qcom-geni: fix UART_RX_PAR_EN bit position
serial: sh-sci: fix memory region release in error path
tty: serial: pch_uart: add check for dma_alloc_coherent()
serial: zs: Fix swapped RI/DSR modem line transition counting
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc
Pull char/misc/iio fixes from Greg KH:
"Here are some small char/misc/iio driver fixes for 7.1-rc6. Included
in here are:
- lots of small IIO driver fixes for reported problems.
- Android binder bugfixes for reported issues.
- small comedi test driver fixes
- counter driver fix
- parport driver fix (people still use this?)
- rpi driver fix
- uio driver fix
All of these have been in linux-next for over a week with no reported
problems"
* tag 'char-misc-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/char-misc: (41 commits)
Revert "gpib: cb7210: Fix region leak when request_irq fails"
misc: rp1: Send IACK on IRQ activate to fix kdump/kexec
gpib: cb7210: Fix region leak when request_irq fails
parport: Fix race between port and client registration
uio: uio_pci_generic_sva: fix double free of devm_kzalloc() memory
rust_binder: Avoid holding lock when dropping delivered_death
rust_binder: avoid calling pending_oneway_finished() on TF_UPDATE_TXN
comedi: comedi_test: fix check for valid scan_begin_src in waveform_ai_cmdtest()
comedi: comedi_test: Fix limiting of convert_arg in waveform_ai_cmdtest()
iio: adc: viperboard: Fix error handling in vprbrd_iio_read_raw
iio: gyro: itg3200: fix i2c read into the wrong stack location
iio: dac: ad5686: fix powerdown control on dual-channel devices
iio: dac: ad5686: acquire lock when doing powerdown control
iio: temperature: tsys01: fix broken PROM checksum validation
iio: dac: ad3530r: Fix AD3531/AD3531R powerdown mode strings
iio: buffer: hw-consumer: fix use-after-free in error path
iio: dac: ad5686: fix input raw value check
iio: dac: ad5686: fix ref bit initialization for single-channel parts
iio: ssp_sensors: cancel delayed work_refresh on remove
iio: adc: meson-saradc: fix calibration buffer leak on error
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux
Pull clang build fix from Nathan Chancellor:
"A small fix to disable -Wattribute-alias for clang in the few places
it is already disabled for GCC, now that tip of tree clang has
implemented -Wattribute-alias as GCC has"
* tag 'clang-fixes-7.1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/nathan/linux:
Disable -Wattribute-alias for clang-23 and newer
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux
Pull iommu fixes from Joerg Roedel:
- Fix compile warning with gcc-16.1
- Intel VT-d: Simplify calculate_psi_aligned_address()
- MAINTAINERS updates
* tag 'iommu-fixes-v7.1-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/iommu/linux:
MAINTAINERS: Add my employer to my entries
MAINTAINERS: Add Vasant Hegde to reviewers of AMD IOMMU
iommu, debugobjects: avoid gcc-16.1 section mismatch warnings
iommu/vt-d: Simplify calculate_psi_aligned_address()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid
Pull HID fixes from Benjamin Tissoires:
- buffer overflow fix for lenovo (Kean) and wacom (Lee Jones) drivers
- segfaults prevention in lenovo-go driver when used with an emulated
device (Louis Clinckx)
- cleanup of resources in u2fzero (Myeonghun Pak)
- a quirk for a USB mouse and a cleanup in hid.h (hlleng and Liu Kai)
* tag 'hid-for-linus-2026052801' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid:
HID: wacom: Fix OOB write in wacom_hid_set_device_mode()
HID: lenovo-go: drop dead NULL check on to_usb_interface()
HID: lenovo-go: reject non-USB transports in probe
HID: lenovo: Fix buffer over-read and unaligned access in X12 Tab raw_event handler
HID: quirks: Add ALWAYS_POLL quirk for SIGMACHIP USB mouse
HID: remove duplicate hid_warn_ratelimited definition
HID: u2fzero: free allocated URB on probe errors
|
|
Panther Lake-H SoC memory controller registers for memory topology have
been updated, but the current igen6_edac driver still uses old generation
ones to incorrectly parse memory topology.
Fix the issue by adding memory topology parsing function pointers to the
'struct res_config' and creating a new configuration structure for Panther
Lake-H SoCs to enable igen6_edac to parse memory correctly.
Fixes: 0be9f1af3902 ("EDAC/igen6: Add Intel Panther Lake-H SoCs support")
Fixes: 4c36e6106997 ("EDAC/igen6: Add more Intel Panther Lake-H SoCs support")
Signed-off-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Link: https://patch.msgid.link/20260403054029.3950383-3-qiuxu.zhuo@intel.com
|
|
|
|
People have gone to the trouble of writing this kernel-doc; the
least we can do is publish it.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@linux.dev>
Link: https://patch.msgid.link/20260528175905.1102280-3-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
This is a simple helper which replaces page_folio(bvec->bv_page).
Minor improvement in readability, but the real motivation is to reduce
the number of references to bvec->bv_page so that it can be changed
with less work.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Cc: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: William Kucharski <william.kucharski@linux.dev>
Link: https://patch.msgid.link/20260528175905.1102280-2-willy@infradead.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Commit cd959a3562050d ("sched_ext: Add a DL server for sched_ext tasks")
introduced an ext_server deadline server to protect sched_ext tasks from
fair/RT starvation, mirroring the existing fair_server.
Currently, both servers reserve their 50ms/1000ms bandwidth at boot,
regardless of whether a BPF scheduler is loaded. Unused bandwidth is
still reclaimed at runtime by other classes, but the static reservation
prevents the RT class from implicitly using that headroom when one of
the two classes is guaranteed to be empty.
A sysadmin can work around this by writing
/sys/kernel/debug/sched/{fair,ext}_server/cpu*/runtime, but that
requires manual action and not all systems expose debugfs.
A better approach is to make server bandwidth reservations dynamic: only
the scheduling policy that is currently active should register its
reservation, while the inactive one should not artificially hold
capacity (keeping both reservations only when the BPF scheduler is
running in partial mode):
+---------------------------------------------+-------------+------------+
| BPF scheduler state | fair server | ext server |
+---------------------------------------------+-------------+------------+
| not loaded (default boot) | reserved | none |
| loaded full mode (!SCX_OPS_SWITCH_PARTIAL) | none | reserved |
| loaded partial mode (SCX_OPS_SWITCH_PARTIAL)| reserved | reserved |
+---------------------------------------------+-------------+------------+
To achieve this, introduce an "attached/detached" state for each
deadline server, so the kernel can decide whether a server's bandwidth
should be accounted in global bandwidth tracking.
At boot, the system starts with only the fair server contributing to
bandwidth accounting. When a BPF scheduler is enabled, the ext server is
attached and may replace or complement the fair server depending on
whether full or partial mode is used. When sched_ext is disabled, the
system restores the previous deadline bandwidth values and behavior.
The transition logic ensures that switching between scheduling modes is
consistent and reversible, without losing runtime configuration or
requiring manual intervention.
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Juri Lelli <juri.lelli@redhat.com>
Link: https://patch.msgid.link/20260526164420.638711-2-arighi@nvidia.com
|
|
The only user of LAST_XXX outside of fs/namei.c is fs/smb/server/vfs.c;
ksmbd_vfs_path_lookup() calls vfs_path_parent_lookup() and expects a
LAST_NORM last type (or it will be ENOENT). ksmbd_vfs_rename() also calls
vfs_path_parent_lookup() but forgets the LAST_NORM check.
It does not really make sense to have vfs_path_parent_lookup() expose
the last_type because it is only needed to ensure it is LAST_NORM. So
let's do this check in vfs_path_parent_lookup() instead and keep the
LAST_XXX internal to fs/namei.c. This changes the ENOENT errno in
ksmbd_vfs_path_lookup() to EINVAL, which matches better with how this is
handled by callers of filename_parentat().
Signed-off-by: Jori Koolstra <jkoolstra@xs4all.nl>
Link: https://patch.msgid.link/20260528175854.57626-1-jkoolstra@xs4all.nl
Reviewed-by: Amir Goldstein <amir73il@gmail.com>
Reviewed-by: NeilBrown <neil@brown.name>
Reviewed-by: Namjae Jeon <linkinjeon@kernel.org>
Signed-off-by: Christian Brauner (Amutable) <brauner@kernel.org>
|
|
The only user of msg->msg_iocb was AF_ALG, but that's deprecated.
It can be removed entirely at the cost of only supporting synchronous
operations. This doesn't break userspace, which will silently block
(for a bounded amount of time) in io_submit instead of operating
asynchronously.
This also makes struct msghdr smaller, helping every other caller of
sendmsg().
Signed-off-by: Demi Marie Obenour <demiobenour@gmail.com>
Acked-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The driver notifies the hardware to handle task through
doorbell. Currently, doorbell is enabled by default. To
prevent the process from sending doorbells during hardware
reset scenarios, which could cause the hardware to process
doorbells and trigger new errors:
For example, when the physical machine is resetting the device,
doorbells are still being sent from the virtual machine.
Therefore, the driver disables doorbell during hardware
unavailability. After hardware initialization is completed,
doorbell is enabled, and any task sent during the unavailability
period will return errors.
The hardware supports the PF to disable doorbells for all functions,
while the VF can only disable its own doorbell function. When the PF
is reset, it will disable doorbells for all functions. When VF is
reset, it only disables its own doorbell and does not affect tasks
on other functions.
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
When executing operations on crypto devices, hardware errors
are inevitable. For certain errors, a full device reset is
required to recover. However, in certain cases, only a
specific function may fail, while other functions can still
operate normally. A system-wide RAS reset in such cases would
unnecessarily impact functioning components.
This patch introduces function-level granularity handling,
enabling targeted resets of only the error-reporting
functions without affecting other operational functions.
Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
usage counter
To avoid accessing memory of a suspended device, and since the counter
interface used by PM involves sleep operations, the counter interface
cannot be placed in the interrupt top half. Therefore, the interface for
acquiring the interrupt status in the RAS reset flow that resides in the
interrupt context needs to be moved to the bottom half for processing.
Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
The problem that the VF device cannot obtain the isolation
status and isolation threshold of the device is resolved.
The accelerator driver can query the device isolation status
and threshold via the VF device using the fault query sysfs
interface under uacce. Note that only the PF device supports
isolation policy configuration, while the VF device is
limited to read-only query operations.
Signed-off-by: Zhushuai Yin <yinzhushuai@huawei.com>
Signed-off-by: Zongyu Wu <wuzongyu1@huawei.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
|
|
Two concurrent madvise(MADV_HWPOISON) calls on the same hugetlb page can
trigger a recursive spinlock self-deadlock (AA deadlock) on hugetlb_lock
when racing with a concurrent unmap:
thread#0 thread#1
-------- --------
madvise(folio, MADV_HWPOISON)
-> poisons the folio successfully
madvise(folio, MADV_HWPOISON) unmap(folio)
try_memory_failure_hugetlb
get_huge_page_for_hwpoison
spin_lock_irq(&hugetlb_lock) <- held
__get_huge_page_for_hwpoison
hugetlb_update_hwpoison()
-> MF_HUGETLB_FOLIO_PRE_POISONED
goto out:
folio_put()
refcount: 1 -> 0
free_huge_folio()
spin_lock_irqsave(&hugetlb_lock)
-> AA DEADLOCK!
The out: path in __get_huge_page_for_hwpoison() calls folio_put() to drop
the GUP reference while the hugetlb_lock is still held by the hugetlb.c
wrapper get_huge_page_for_hwpoison(). If concurrent unmap has released
the page table mapping reference, folio_put() drops the folio refcount to
zero, triggering free_huge_folio() which attempts to re-acquire the
non-recursive hugetlb_lock.
Fix this by moving hugetlb_lock acquisition from the hugetlb.c wrapper
into get_huge_page_for_hwpoison(). Place spin_unlock_irq() before the
folio_put() at the out: label so the folio is always released outside the
lock.
[akpm@linux-foundation.org: fix race, rename label per Miaohe]
Link: https://sashiko.dev/#/patchset/20260522010305.4099834-1-mawupeng1@huawei.com
Link: https://lore.kernel.org/f39f405e-4b4b-8f79-70fe-a2b5b62114eb@huawei.com
Link: https://lore.kernel.org/20260522010305.4099834-1-mawupeng1@huawei.com
Fixes: 405ce051236c ("mm/hwpoison: fix race between hugetlb free/demotion and memory_failure_hugetlb()")
Signed-off-by: Wupeng Ma <mawupeng1@huawei.com>
Acked-by: Oscar Salvador (SUSE) <osalvador@kernel.org>
Acked-by: Muchun Song <muchun.song@linux.dev>
Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
Acked-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: David Hildenbrand <david@kernel.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Lorenzo Stoakes <ljs@kernel.org>
Cc: Michal Hocko <mhocko@suse.com>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Vlastimil Babka <vbabka@kernel.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
drivers"
Danilo Krummrich <dakr@kernel.org> says:
Currently, Rust device drivers access device resources such as PCI BAR mappings
and I/O memory regions through Devres<T>.
Devres::access() provides zero-overhead access by taking a &Device<Bound>
reference as proof that the device is still bound. Since a &Device<Bound> is
available in almost all contexts by design, Devres is mostly a type-system level
proof that the resource is valid, but it can also be used from scopes without
this guarantee through its try_access() accessor.
This works well in general, but has a few limitations:
- Every access to a device resource goes through Devres::access(), which
despite zero cost, adds boilerplate to every access site.
- Destructors do not receive a &Device<Bound>, so they must use try_access(),
which can fail. In practice the access succeeds if teardown ordering is
correct, but the type system can't express this, forcing drivers to handle a
failure path that should never be taken.
- Sharing a resource across components (e.g. passing a BAR to a sub-component)
requires Arc<Devres<T>>.
- Device references must be stored as ARef<Device> rather than plain &Device
borrows.
These limitations stem from the driver's bus device private data being 'static
-- the driver struct cannot borrow from the device reference it receives in
probe(), even though it structurally cannot outlive the device binding.
This series introduces Higher-Ranked Lifetime Types (HRT) for Rust device
drivers. An HRT is a type that is generic over a lifetime -- it does not have a
fixed lifetime, but can be instantiated with any lifetime chosen by the caller.
Bus driver traits use a Generic Associated Type (GAT) type Data<'bound> to
introduce the lifetime on the private data, rather than parameterizing the
Driver trait itself. This avoids a driver trait global lifetime and avoids the
need for ForLt for bus device private data, making the bus implementations much
simpler. ForLt is only needed for auxiliary registration data, where the
lifetime is not introduced by a trait callback but must be threaded through
Registration.
With HRT, driver structs carry a lifetime parameter tied to the device binding
scope -- the interval of a bus device being bound to a driver. Device resources
like pci::Bar<'bound> and IoMem<'bound> are handed out with this lifetime, so
the compiler enforces at build time that they do not escape the binding scope.
Before:
struct MyDriver {
pdev: ARef<pci::Device>,
bar: Devres<pci::Bar<BAR_SIZE>>,
}
let io = self.bar.access(dev)?;
io.read32(OFFSET);
After:
struct MyDriver<'bound> {
pdev: &'bound pci::Device,
bar: pci::Bar<'bound, BAR_SIZE>,
}
self.bar.read32(OFFSET);
Lifetime-parameterized device resources can be put into a Devres at any point
via Bar::into_devres() / IoMem::into_devres(), providing the exact same
semantics as before. This is useful for resources shared across subsystem
boundaries where revocation is needed.
This also synergizes with the upcoming self-referential initialization support
in pin-init, which allows one field of the driver struct to borrow another
during initialization without unsafe code.
The same pattern is applied to auxiliary device registration data as a first
example beyond bus device private data. Registration<F: ForLt> can hold
lifetime-parameterized data tied to the parent driver's binding scope. Since the
auxiliary bus guarantees that the parent remains bound while the auxiliary
device is registered, the registration data can safely borrow the parent's
device resources.
More generally, binding resource lifetimes to a registration scope applies to
every registration that is scoped to a driver binding -- auxiliary devices,
class devices, IRQ handlers, workqueues.
A follow-up series extends this to class device registrations, starting with
DRM, so that class device callbacks (IOCTLs, etc.) can safely access device
resources through the separate registration data bound to the registration's
lifetime without Devres indirection.
Thanks to Gary for coming up with the ForLt implementation; thanks to Alice for
the early discussions around lifetime-parameterized private data that helped
shape the direction of this work.
Link: https://patch.msgid.link/20260525202921.124698-1-dakr@kernel.org
Signed-off-by: Danilo Krummrich <dakr@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"This is again significantly bigger than the same point into the
previous cycle, but at least smaller than last week.
I'm not aware of any pending regression for the current cycle.
Including fixes from netfilter.
Current release - regressions:
- netfilter: walk fib6_siblings under RCU
Previous releases - regressions:
- netlink: fix sending unassigned nsid after assigned one
- bridge: fix sleep in atomic context in netlink path
- sched: fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop
- ipv4: fix net->ipv4.sysctl_local_reserved_ports UaF
- eth: tun: free page on short-frame rejection in tun_xdp_one()
Previous releases - always broken:
- skbuff: fix missing zerocopy reference in pskb_carve helpers
- handshake: drain pending requests at net namespace exit
- ethtool:
- rss: avoid modifying the RSS context response
- module: avoid leaking a netdev ref on module flash errors
- coalesce: cap profile updates at NET_DIM_PARAMS_NUM_PROFILES
- netfilter: fix dst corruption in same register operation
- nfc: hci: fix out-of-bounds read in HCP header parsing
- ipv6: exthdrs: refresh nh pointer after ipv6_hop_jumbo()
- eth:
- vti: use ip6_tnl.net in vti6_changelink().
- vxlan: do not reuse cached ip_hdr() value after
skb_tunnel_check_pmtu()"
* tag 'net-7.1-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (94 commits)
dpll: zl3073x: make frequency monitor a per-device attribute
dpll: zl3073x: use __dpll_device_change_ntf() and remove change_work
dpll: export __dpll_device_change_ntf() for use under dpll_lock
net/handshake: Drain pending requests at net namespace exit
net/handshake: Verify file-reference balance in submit paths
net/handshake: Close the submit-side sock_hold race
net/handshake: hand off the pinned file reference to accept_doit
net/handshake: Take a long-lived file reference at submit
net/handshake: Pass negative errno through handshake_complete()
nvme-tcp: store negative errno in queue->tls_err
net/handshake: Use spin_lock_bh for hn_lock
net: skbuff: fix missing zerocopy reference in pskb_carve helpers
net: hibmcge: move dma_rmb() after dma_sync_single_for_cpu() in RX path
net: hibmcge: disable Relaxed Ordering to fix RX packet corruption
selftests/tc-testing: Add netem test case exercising loops
selftests/tc-testing: Add mirred test cases exercising loops
net/sched: act_mirred: Fix return code in early mirred redirect error paths
net/sched: act_mirred: Fix blockcast recursion bypass leading to stack overflow
net/sched: Fix ethx:ingress -> ethy:egress -> ethx:ingress mirred loop
net/sched: fix packet loop on netem when duplicate is on
...
|
|
A user can enable io accounting for passthrough requests, so export the
helper that checks if the request should be tracked. This will enable
stacking drivers to to report iostats for passthrough workloads. Since
the stacking request_queue may not be the one providing the request, the
API has to add a parameter for the caller to specify which one to check.
Reviewed-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://patch.msgid.link/20260528010041.1533124-2-kbusch@meta.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
|
|
Add a helper that sets bi_status and call bio_endio() as that is a very
common pattern and convert the core block code over to it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Md Haris Iqbal <haris.iqbal@linux.dev>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@kernel.org>
|