David Yang says:
====================
net: dsa: yt921x: Add DCB/QoS support
This series adds DCB/QoS support to the driver.
v5: https://lore.kernel.org/r/20260128215202.2244266-1-mmyangfl@gmail.com
v4: https://lore.kernel.org/r/20260127020847.1482724-1-mmyangfl@gmail.com
v3: https://lore.kernel.org/r/20260125001328.3784006-1-mmyangfl@gmail.com
v2: https://lore.kernel.org/r/20260122194233.2777550-1-mmyangfl@gmail.com
v1: https://lore.kernel.org/r/20260119185935.2072685-1-mmyangfl@gmail.com
====================
Link: https://patch.msgid.link/20260131021854.3405036-1-mmyangfl@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Set up global DSCP/PCP priority mappings and add related DCB methods.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-6-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
yt921x_chip_setup() is already pretty long, and is going to become
longer. Split it into parts.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-5-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Create a helper function to centralize the logic for enabling and
disabling VLAN awareness on a port.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-4-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This is required by the DCB/QoS support in the switch driver, since rx
packets will have non-zero priorities.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-3-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Packet priority is part of the tag, and the priority and code fields are
used by both tx and rx. Revise the code to reflect these facts.
Signed-off-by: David Yang <mmyangfl@gmail.com>
Link: https://patch.msgid.link/20260131021854.3405036-2-mmyangfl@gmail.com
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This adds a binding for the Faraday FTSSP010 SSP controller,
a pretty straightforward synchronous serial port and SPI
controller.
The bindings are submitted separately because the one device
that has this controller uses it in a "nonstandard way" with
regard to the electronics, which makes it impossible to develop
or test a proper driver. However, we want to be able to add
this resource to the device trees, and the binding is not complex.
Signed-off-by: Linus Walleij <linusw@kernel.org>
Link: https://patch.msgid.link/20260203-gemini-ssp-bindings-v1-1-6d85c9c72371@kernel.org
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
If the SDCA property is not present in the DisCo table, assume it
is present. This allows DAI links to be created from codec_info_list
instead of being skipped.
Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260203095923.3741674-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Use functional topologies to support all RT721-related topology and
amplifier combinations,
e.g. sof-ptl-rt721.tplg and sof-ptl-rt721-l3-rt1320-l3.tplg.
If these entries are not removed, they will all use sof-ptl-rt721.tplg.
Signed-off-by: Mac Chiang <mac.chiang@intel.com>
Reviewed-by: Liam Girdwood <liam.r.girdwood@intel.com>
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Link: https://patch.msgid.link/20260203100027.3741754-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The existing code tests if (bt_link_mask_override) to overwrite the BT
link mask. This doesn't allow the user to disable the BT link mask; the
user may want to disable the BT link even when it is detected by the NHLT.
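A minimal sketch of the idea (names and parameter layout are assumptions,
not the actual patch): use a sentinel so an explicit override of 0 is
distinguishable from "not set".

#include <linux/types.h>

static int bt_link_mask_override = -1;  /* -1: not set by the user */

static u32 resolve_bt_link_mask(u32 nhlt_mask)
{
        /* An explicit override of 0 now disables the BT link even
         * when the NHLT detected one.
         */
        if (bt_link_mask_override >= 0)
                return bt_link_mask_override;
        return nhlt_mask;
}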
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Link: https://patch.msgid.link/20260203111545.3742255-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Currently the SoundWire BPT stream uses the paired link DMA but does not
reserve it. This works without any issue because we assume the SoundWire
BPT will not run simultaneously with audio streams.
To support simultaneous audio and BPT streams, we need to use the
hda_dma_prepare/cleanup helpers to reserve the paired link host DMA.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev>
Link: https://patch.msgid.link/20260203114027.3742558-4-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The SoundWire BPT stream needs to use link and host DMAs, so we need
helpers to prepare and clean up the link and host DMAs. Currently the
SoundWire BPT stream uses the hda_cl_prepare/cleanup helpers. This works
fine because we assume the SoundWire BPT will not run simultaneously
with audio streams. The new helpers are copied from hda_cl_prepare/cleanup,
with an added flag to reserve the paired host and link DMAs. The new
helpers will be used by both the code loader and SoundWire BPT.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev>
Link: https://patch.msgid.link/20260203114027.3742558-3-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Currently, hda_dsp_stream_get/put are used to get/put the host DMA.
However, we may want to use an HDA stream for which both the host and
link DMAs are available. Add a helper to find such an HDA stream and
reserve/release it.
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Pierre-Louis Bossart <pierre-louis.bossart@linux.dev>
Link: https://patch.msgid.link/20260203114027.3742558-2-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
This patch adds a setting to resolve the intermittent no-sound issue.
Signed-off-by: Shuming Fan <shumingf@realtek.com>
Link: https://patch.msgid.link/20260203084827.768238-1-shumingf@realtek.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
The link_mask variable is not changed after being set to
hdev->info.link_mask until it is reused for another purpose: collecting
the used SoundWire links to set mach->mach_params.links. Besides, the
link_mask variable should be reset before any link id is added to it.
To fix the issue above and avoid confusion, use the hdev->info.link_mask
variable directly to check whether a SoundWire link is enabled.
Fixes: 5226d19d4cae ("ASoC: SOF: Intel: use sof_sdw as default SDW machine driver")
Signed-off-by: Bard Liao <yung-chuan.liao@linux.intel.com>
Reviewed-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
Reviewed-by: Péter Ujfalusi <peter.ujfalusi@linux.intel.com>
Link: https://patch.msgid.link/20260203072405.3716307-1-yung-chuan.liao@linux.intel.com
Signed-off-by: Mark Brown <broonie@kernel.org>
|
|
Add current (iova, len) to the iotlb gather, regardless of the setting
of PT_FEAT_FLUSH_RANGE or PT_FEAT_FLUSH_RANGE_NO_GAPS.
In gather_range_pages(), the current IOVA range is only added to
iotlb_gather when PT_FEAT_FLUSH_RANGE is set. Yet a virtual IOMMU with
NpCache uses only PT_FEAT_FLUSH_RANGE_NO_GAPS. In that case, iotlb_gather
will stay empty (start=ULONG_MAX, end=0) after initialization, and the
current (iova, len) will not be added to the iotlb_gather, causing
subsequent iommu_iotlb_sync() to perform IOTLB invalidation with wrong
parameters (e.g., amd_iommu_iotlb_sync() computes size from
gather->end - gather->start + 1, leading to an invalid range).
The disjoint check and sync for PT_FEAT_FLUSH_RANGE_NO_GAPS remain
unchanged: when the new range is disjoint from the existing gather,
we still sync first and then add the new range, so semantics for
NO_GAPS are preserved.
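A hedged sketch of the corrected flow, using the generic iotlb gather
helpers (the iommupt context and feature plumbing are assumed):

#include <linux/iommu.h>

static void gather_range(struct iommu_domain *domain,
                         struct iommu_iotlb_gather *gather,
                         unsigned long iova, size_t len, bool no_gaps)
{
        /* NO_GAPS semantics preserved: sync first when the new range
         * is disjoint from what has been gathered so far.
         */
        if (no_gaps && iommu_iotlb_gather_is_disjoint(gather, iova, len))
                iommu_iotlb_sync(domain, gather);

        /* Always record the current (iova, len), for both flush modes */
        iommu_iotlb_gather_add_range(gather, iova, len);
}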
Fixes: 7c53f4238aa8 ("iommupt: Add unmap_pages op")
Cc: stable@vger.kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Yu Zhang <zhangyu1@linux.microsoft.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
With concurrent TLB invalidations, completion wait randomly gets timed out
because cmd_sem_val was incremented outside the IOMMU spinlock, allowing
CMD_COMPL_WAIT commands to be queued out of sequence and breaking the
ordering assumption in wait_on_sem().
Move the cmd_sem_val increment under iommu->lock so completion sequence
allocation is serialized with command queuing.
Also remove the unnecessary return.
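A hedged sketch of the fixed ordering (helper names from the driver; the
exact function body is assumed):

static int iommu_completion_wait(struct amd_iommu *iommu)
{
        struct iommu_cmd cmd;
        unsigned long flags;
        u64 data;
        int ret;

        if (!iommu->need_sync)
                return 0;

        raw_spin_lock_irqsave(&iommu->lock, flags);

        /* Sequence allocation is now serialized with command queuing,
         * so CMD_COMPL_WAIT commands can no longer be queued out of
         * sequence.
         */
        data = ++iommu->cmd_sem_val;
        build_completion_wait(&cmd, iommu, data);
        ret = __iommu_queue_command_sync(iommu, &cmd, false);

        raw_spin_unlock_irqrestore(&iommu->lock, flags);
        if (ret)
                return ret;

        return wait_on_sem(iommu, data); /* waiting stays outside the lock */
}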
Fixes: d2a0cac10597 ("iommu/amd: move wait_on_sem() out of spinlock")
Tested-by: Srikanth Aithal <sraithal@amd.com>
Reported-by: Srikanth Aithal <sraithal@amd.com>
Signed-off-by: Ankit Soni <Ankit.Soni@amd.com>
Reviewed-by: Vasant Hegde <vasant.hegde@amd.com>
Signed-off-by: Joerg Roedel <joerg.roedel@amd.com>
|
|
Not pausing it means that the TCM work can be queued on a non-freezable
workqueue which, on resume, is re-activated before the driver's resume
is called.
The TCM work might then send commands to the FW before the device has
resumed, leading to an assert.
Closes: https://lore.kernel.org/linux-wireless/aTDoDiD55qlUZ0pn@debian.local/
Tested-by: Chris Bainbridge <chris.bainbridge@gmail.com>
Fixes: e8bb19c1d590 ("wifi: iwlwifi: support fast resume")
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260129212650.05621f3faedb.I44df9cf9183b5143df8078131e0d87c0fd7e1763@changeid
|
|
mlo_scan_start_wk is not canceled on disconnection. In fact, it is not
canceled anywhere except in the restart cleanup, where we don't really
need to.
This can cause an init-after-queue issue: if, for example, the work was
queued and then drv_change_interface was executed.
This can also cause a use-after-free if the work is executed after the
vif is freed.
Fixes: 9748ad82a9d9 ("wifi: iwlwifi: defer MLO scan after link activation")
Reviewed-by: Johannes Berg <johannes.berg@intel.com>
Signed-off-by: Miri Korenblit <miriam.rachel.korenblit@intel.com>
Link: https://patch.msgid.link/20260129212650.a36482a60719.I5bf64a108ca39dacb5ca0dcd8b7258a3ce8db74c@changeid
|
|
Currently, s32g_pcie_parse_ports() exercises the 'err_port' path even
during the success case. This results in ports getting deleted after
successful parsing of Root Ports.
Hence, skip the removal of Root Ports during success.
Fixes: 5cbc7d3e316e ("PCI: s32g: Add NXP S32G PCIe controller driver (RC)")
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
[mani: reworded subject and description]
Signed-off-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260202151050.1446165-1-vincent.guittot@linaro.org
|
|
If debugging is enabled, the DEBUG statement will fail due to a bad
fat-fingered cast.
Fixes: 102ada7ca37ed ("apparmor: fix fmt string type error in process_strs_entry")
Signed-off-by: John Johansen <john.johansen@canonical.com>
|
|
wlc_phy_txpwr_srom_read_lcnphy() in wlc_phy_attach_lcnphy() always
returns true, making the error handling code unreachable. Change the
function's return type to void and remove the dead code, similar to
the cleanup done for wlc_phy_txpwr_srom_read_nphy() in commit
47f0e32ffe4e ("wifi: brcmsmac: phy: Remove unreachable code").
Signed-off-by: Ingyu Jang <ingyujang25@korea.ac.kr>
Acked-by: Arend van Spriel <arend.vanspriel@broadcom.com>
Link: https://patch.msgid.link/20260131172355.3367673-1-ingyujang25@korea.ac.kr
Signed-off-by: Johannes Berg <johannes.berg@intel.com>
|
|
Somehow this got changed to NULL when I ported this upstream. No
idea how that happened.
Reported-by: Carlos Llamas <cmllamas@google.com>
Closes: https://lore.kernel.org/r/aXkEiC1sGOGfDuzI@google.com
Fixes: c1ea31205edf ("rust_binder: add binder_transaction tracepoint")
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Acked-by: Carlos Llamas <cmllamas@google.com>
Link: https://patch.msgid.link/20260128-binder-fix-target-node-null-v1-1-78d198ef55a5@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The 'max' argument of ida_alloc_max() takes the maximum valid ID and not
the "count". Using an ID of BINDERFS_MAX_MINOR (1 << 20) for dev->minor
would exceed the limits of minor numbers (20-bits). Fix this off-by-one
error by subtracting 1 from the 'max'.
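A condensed sketch of the pattern (surrounding context assumed):

        /* ida_alloc_max() takes the maximum *valid* ID, not a count:
         * passing BINDERFS_MAX_MINOR (1 << 20) could hand out a minor
         * one past the 20-bit minor space.
         */
        minor = ida_alloc_max(&binderfs_minors, BINDERFS_MAX_MINOR - 1,
                              GFP_KERNEL);
        if (minor < 0)
                return minor; /* e.g. -ENOSPC once the space is exhausted */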
Cc: stable@vger.kernel.org
Fixes: 3ad20fe393b3 ("binder: implement binderfs")
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Link: https://patch.msgid.link/20260127235545.2307876-2-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The 'max' argument of ida_alloc_max() takes the maximum valid ID and not
the "count". Using an ID of BINDERFS_MAX_MINOR (1 << 20) for dev->minor
would exceed the limits of minor numbers (20-bits). Fix this off-by-one
error by subtracting 1 from the 'max'.
Cc: stable@vger.kernel.org
Fixes: eafedbc7c050 ("rust_binder: add Rust Binder driver")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/r/202512181203.IOv6IChH-lkp@intel.com/
Signed-off-by: Carlos Llamas <cmllamas@google.com>
Reviewed-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260127235545.2307876-1-cmllamas@google.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Update call sites in binder files to import `ARef`
from `sync::aref` instead of `types`.
This aligns with the ongoing effort to move `ARef` and
`AlwaysRefCounted` to sync.
Suggested-by: Benno Lossin <lossin@kernel.org>
Link: https://github.com/Rust-for-Linux/linux/issues/1173
Signed-off-by: Shankari Anand <shankari.ak0208@gmail.com>
Acked-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260102202714.184223-2-shankari.ak0208@gmail.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Clippy warns about a needless borrow in context.rs:
error: this expression creates a reference which is immediately dereferenced by the compiler
--> drivers/android/binder/context.rs:141:18
|
141 | func(&proc);
| ^^^^^ help: change this to: `proc`
Remove the unnecessary borrow to satisfy clippy and improve code
cleanliness. No functional change.
Signed-off-by: Shivam Kalra <shivamklr@cock.li>
Acked-by: Alice Ryhl <aliceryhl@google.com>
Link: https://patch.msgid.link/20260130182842.217821-1-shivamklr@cock.li
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Currently, if the command line passed to kexec_file_load() exceeds
the supported limit of the kernel being kexec'd, -EINVAL is returned
to userspace, which is consistent across architectures. Since
-EINVAL is not specific to this case, the kexec tool cannot provide
a specific reason for the failure. Many architectures emit an error
message in this case. Add a similar error message, including the
effective limit, since the command line length is configurable.
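A hedged sketch of such a message (variable names hypothetical; the
limit comes from the kernel image being loaded):

        if (cmdline_len > max_cmdline_len) {
                pr_err("command line too long: %zu bytes, kernel supports at most %zu\n",
                       cmdline_len, max_cmdline_len);
                return -EINVAL;
        }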
Acked-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Enable BLK_DEV_NULL_BLK as module in defconfig and debug_defconfig, so the
Null Test Block Device Driver can be easily used for testing purposes.
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Recently [1] s390 got stackprotector support. Document this.
[1] commit f5730d44e05e ("s390: Add stackprotector support")
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
|
|
Heiner Kallweit says:
====================
net: phy: remove modalias-based MDIO device bus matching
modalias-based MDIO device bus matching has only one user (dsa-loop),
where we can replace modalias-based matching with a simple custom
match function. This, and first patch of the series, lay the foundation
for removing modalias-based matching.
====================
Link: https://patch.msgid.link/d9543e7d-23e1-4dba-a6b3-35dcd6a35dec@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The last user, dsa_loop, has been migrated away from modalias-based
matching, so we can remove this feature now. It was the only user of
MDIO_NAME_SIZE, so also remove this constant.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/ce1c6df0-4785-4b28-8322-32dc6bceea18@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
This change is a prerequisite for removing the MDIO device modalias,
as dsa_loop is the only user. Switch from modalias to a custom
bus match function.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Reviewed-by: Vladimir Oltean <olteanv@gmail.com>
Tested-by: Vladimir Oltean <olteanv@gmail.com>
Link: https://patch.msgid.link/15a4318f-50b5-4df5-874e-e387ee070a9d@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
The primary reason for this change is to remove the misuse of
MDIO_NAME_SIZE here, so that this constant can be removed in a follow-up
patch. The use case here is simply a chip name without any relationship
to an MDIO device. There is also no need to reserve a longer char array,
so make the name a pointer.
Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com>
Link: https://patch.msgid.link/61bc14fa-eed3-43b6-ae40-b98063e81578@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
|
|
Add a test case to verify correct synchronization between concurrent
readers and a revocation.
The test setup involves:
1. Consumer 1 enters the critical section (SRCU read lock) and verifies
access to the resource.
2. Provider attempts to revoke the resource. This should block until
Consumer 1 releases the lock.
3. Consumer 2 attempts to enter the critical section while revocation
is pending. It should see the resource as revoked (NULL).
4. Consumer 1 exits, allowing the revocation to complete.
This ensures that the SRCU mechanism correctly enforces grace periods
and that new readers are properly prevented from accessing the resource
once revocation has begun.
A way to run the test:
$ ./tools/testing/kunit/kunit.py run \
--kconfig_add CONFIG_REVOCABLE_KUNIT_TEST=y \
--kconfig_add CONFIG_PROVE_LOCKING=y \
--kconfig_add CONFIG_DEBUG_KERNEL=y \
--kconfig_add CONFIG_DEBUG_INFO=y \
--kconfig_add CONFIG_DEBUG_INFO_DWARF5=y \
--kconfig_add CONFIG_KASAN=y \
--kconfig_add CONFIG_DETECT_HUNG_TASK=y \
--kconfig_add CONFIG_DEFAULT_HUNG_TASK_TIMEOUT="10" \
--arch=x86_64 --raw_output=all \
revocable_test
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20260129143733.45618-5-tzungbi@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The struct revocable handle stores the SRCU read-side index (idx) for
the duration of a resource access. If multiple threads share the same
struct revocable instance, they race on writing to the idx field,
corrupting the SRCU state and potentially causing unsafe unlocks.
Refactor the API to replace revocable_alloc()/revocable_free() with
revocable_init()/revocable_deinit(). This change requires the caller
to provide the storage for struct revocable.
By moving storage ownership to the caller, the API ensures that
concurrent users maintain their own private idx storage, eliminating
the race condition.
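A hedged usage sketch under the new API (accessor names assumed from the
series; use_resource() is hypothetical):

        struct revocable rev; /* caller-provided storage, one per user */
        void *res;

        revocable_init(&rev, rp);

        res = revocable_try_access(&rev); /* SRCU idx saved in rev */
        if (res)
                use_resource(res);
        revocable_release(&rev);

        revocable_deinit(&rev);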
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/all/20260124170535.11756-4-johan@kernel.org/
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20260129143733.45618-4-tzungbi@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Add a test to verify that revocable_alloc() correctly handles race
conditions where the provider is being released.
The test covers three scenarios:
1. Allocating from a NULL provider.
2. Allocating from a provider that has been detached (pointer is NULL).
3. Allocating from a provider that is in the process of destruction
(refcount is 0), simulating a race between revocable_alloc() and
revocable_provider_release().
A way to run the test:
$ ./tools/testing/kunit/kunit.py run \
--kconfig_add CONFIG_REVOCABLE_KUNIT_TEST=y \
--kconfig_add CONFIG_PROVE_LOCKING=y \
--kconfig_add CONFIG_DEBUG_KERNEL=y \
--kconfig_add CONFIG_DEBUG_INFO=y \
--kconfig_add CONFIG_DEBUG_INFO_DWARF5=y \
--kconfig_add CONFIG_KASAN=y \
--kconfig_add CONFIG_DETECT_HUNG_TASK=y \
--kconfig_add CONFIG_DEFAULT_HUNG_TASK_TIMEOUT="10" \
--arch=x86_64 --raw_output=all \
revocable_test
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20260129143733.45618-3-tzungbi@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
There are two race conditions when allocating a revocable instance:
1. After a struct revocable_provider is revoked, the caller might still
hold a dangling pointer to it. A subsequent call to
revocable_alloc() can trigger a use-after-free.
2. If revocable_provider_release() runs concurrently with
revocable_alloc(), the memory of struct revocable_provider can be
accessed during or after kfree().
To fix these:
- Manage the lifetime of struct revocable_provider using RCU. Annotate
pointers to it with __rcu and use kfree_rcu() for deallocation.
- Update revocable_alloc() to safely acquire a reference using RCU
primitives.
- Update revocable_provider_revoke() to take a double pointer (`**rp`).
It atomically NULLs out the caller's pointer before starting
revocation. This prevents the caller from holding a dangling pointer.
- Drop devm_revocable_provider_alloc(). The devm-managed model cannot
support the required double-pointer semantic for safe pointer nulling.
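A hedged sketch of the RCU acquire path (field names such as refcnt are
assumptions):

        struct revocable_provider *rp;

        rcu_read_lock();
        rp = rcu_dereference(*rpp); /* rpp is the __rcu double pointer */
        /* A zero refcount means revocable_provider_release() is running
         * or has already run, so treat the provider as gone.
         */
        if (rp && !refcount_inc_not_zero(&rp->refcnt))
                rp = NULL;
        rcu_read_unlock();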
Reported-by: Johan Hovold <johan@kernel.org>
Closes: https://lore.kernel.org/all/aXdy-b3GOJkzGqYo@hovoldconsulting.com/
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
Link: https://patch.msgid.link/20260129143733.45618-2-tzungbi@kernel.org
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Capitalize comment starts to match kernel coding style.
Fix spelling: "reciever" -> "receiver"
Fix grammar: "it's" (contraction of "it is") -> "its" (possessive)
Remove uncertainty from "Clear error bits?" comment.
Compile tested only.
Signed-off-by: Micah Ostrow <bluefox9516@gmail.com>
Link: https://patch.msgid.link/20260127181735.57132-1-bluefox9516@gmail.com
Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
|
|
Consider the following sequence on a CPU configured with nohz_full:
1) A task P runs in cgroup A, and cgroup A becomes throttled due to CFS
   bandwidth control. The gse (of cgroup A) to which task P is attached
   is dequeued, and the CPU switches to idle.
2) Before cgroup A is unthrottled, task P is migrated from cgroup A to
another cgroup B (not throttled).
During sched_move_task(), task P is observed as queued but not
running, and therefore no resched_curr() is triggered.
3) Since the CPU is nohz_full, it remains in do_idle() waiting for an
explicit scheduling event, i.e., resched_curr().
4) For kernels <= 5.10: later, cgroup A is unthrottled. However, task P
   has already been migrated out of cgroup A, so unthrottle_cfs_rq()
   may observe load_weight == 0 and return early without calling
   resched_curr(). For kernels >= 6.6: the unthrottling path normally
   triggers resched_curr() in almost all cases, even when no runnable
   tasks remain in the unthrottled cgroup, preventing the idle stall
   described above. However, if cgroup A is removed before it gets
   unthrottled, the unthrottling path for cgroup A is never executed.
   As a result, no resched_curr() is called.
5) At this point, task P is runnable in cgroup B (not throttled), but
   the CPU remains in do_idle() with no pending reschedule point. The
   system stays in this state until an unrelated event (e.g. a new task
   wakeup) triggers a resched_curr() and breaks the nohz_full idle
   state; only then does task P finally get scheduled.
The root cause is that sched_move_task() may classify the task as only
queued, not running, and therefore fails to trigger a resched_curr(),
while the later unthrottling path no longer has visibility of the
migrated task.
Preserve the existing behavior for running tasks by issuing
resched_curr(), and explicitly invoke check_preempt_curr() for tasks
that were queued at the time of migration. This ensures that runnable
tasks are reconsidered for scheduling even when nohz_full suppresses
periodic ticks.
Fixes: 29f59db3a74b ("sched: group-scheduler core")
Signed-off-by: Zicheng Qu <quzicheng@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Aaron Lu <ziqianlu@bytedance.com>
Tested-by: Aaron Lu <ziqianlu@bytedance.com>
Link: https://patch.msgid.link/20260130083438.1122457-1-quzicheng@huawei.com
|
|
Use the %pe format specifier when printing PTR_ERR() error values
to make error messages more readable.
Found by Coccinelle:
./cpufreq_schedutil.c:685:49-56: WARNING: Consider using %pe to print PTR_ERR()
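For illustration, a generic before/after (not the exact call site):

        thread = kthread_create(worker_fn, NULL, "sugov:%d", cpu);
        if (IS_ERR(thread)) {
                /* %ld would print a raw negative integer; %pe prints
                 * the symbolic name, e.g. -EINVAL.
                 */
                pr_err("failed to create thread: %pe\n", thread);
                return PTR_ERR(thread);
        }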
Signed-off-by: zenghongling <zenghongling@kylinos.cn>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260120083333.148385-1-zenghongling@kylinos.cn
|
|
CPU0 becomes overloaded when hosting a CPU-bound RT task, a non-CPU-bound
RT task, and a CFS task stuck in kernel space. When other CPUs switch from
RT to non-RT tasks, RT load balancing (LB) is triggered; with
HAVE_RT_PUSH_IPI enabled, they send IPIs to CPU0 to drive the execution
of rto_push_irq_work_func. During push_rt_task on CPU0,
if next_task->prio < rq->donor->prio, resched_curr() sets NEED_RESCHED
and after the push operation completes, CPU0 calls rto_next_cpu().
Since only CPU0 is overloaded in this scenario, rto_next_cpu() should
ideally return -1 (no further IPI needed).
However, when multiple CPUs invoke tell_cpu_to_push() during LB, each
increments rd->rto_loop_next. Even when rd->rto_cpu is set to -1, the mismatch between
rd->rto_loop and rd->rto_loop_next forces rto_next_cpu() to restart its
search from -1. With CPU0 remaining overloaded (satisfying rt_nr_migratory
&& rt_nr_total > 1), it gets reselected, causing CPU0 to queue irq_work to
itself and send self-IPIs repeatedly. As long as CPU0 stays overloaded and
other CPUs run pull_rt_tasks(), it falls into an infinite self-IPI loop,
which triggers a CPU hardlockup due to continuous self-interrupts.
The triggering scenario is as follows:
cpu0 cpu1 cpu2
pull_rt_task
tell_cpu_to_push
<------------irq_work_queue_on
rto_push_irq_work_func
push_rt_task
resched_curr(rq) pull_rt_task
rto_next_cpu tell_cpu_to_push
<-------------------------- atomic_inc(rto_loop_next)
rd->rto_loop != next
rto_next_cpu
irq_work_queue_on
rto_push_irq_work_func
Fix the redundant self-IPIs by filtering out the initiating CPU in
rto_next_cpu(). This solution has been verified to eliminate the
spurious self-IPIs and prevent the CPU hardlockup scenario.
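One way to realize the filter, as a hedged sketch (exact placement
inside rto_next_cpu() is assumed):

        cpu = cpumask_next(rd->rto_cpu, rd->rto_mask);

        /* Never pick the CPU that is currently running the push
         * irq_work: it has just tried to push its own tasks, so a
         * self-IPI can only spin.
         */
        if (cpu == smp_processor_id())
                cpu = cpumask_next(cpu, rd->rto_mask);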
Fixes: 4bdced5c9a29 ("sched/rt: Simplify the IPI based RT balancing logic")
Suggested-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Signed-off-by: Chen Jinghuang <chenjinghuang2@huawei.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Steven Rostedt (Google) <rostedt@goodmis.org>
Reviewed-by: Valentin Schneider <vschneid@redhat.com>
Link: https://patch.msgid.link/20260122012533.673768-1-chenjinghuang2@huawei.com
|
|
The read-mostly sched_clock_irqtime flag may share a cacheline with the
frequently updated nohz struct. Convert it to a static key to avoid
this false-sharing issue.
The only user of disable_sched_clock_irqtime()
is tsc_.*mark_unstable(), which may be invoked in atomic context
and would require a workqueue to disable a static key. But both of them
call clear_sched_clock_stable() just before calling
disable_sched_clock_irqtime(), so we can reuse
"sched_clock_work" to also disable sched_clock_irqtime().
One additional case needs handling: if the TSC is marked unstable
before the late_initcall() phase, sched_clock_work will not be invoked
and sched_clock_irqtime will stay enabled although the clock is unstable:
tsc_init()
enable_sched_clock_irqtime() # irqtime accounting is enabled here
...
if (unsynchronized_tsc()) # true
mark_tsc_unstable()
clear_sched_clock_stable()
__sched_clock_stable_early = 0;
...
if (static_key_count(&sched_clock_running.key) == 2)
# Only happens at sched_clock_init_late()
__clear_sched_clock_stable(); # Never executed
...
# late_initcall() phase
sched_clock_init_late()
if (__sched_clock_stable_early) # Already false
__set_sched_clock_stable(); # sched_clock is never marked stable
# TSC unstable, but sched_clock_work won't run to disable irqtime
So we also need to call disable_sched_clock_irqtime() in
sched_clock_init_late() if the clock is unstable.
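The static-key shape of the change, sketched (names matching the patch
are assumed):

#include <linux/jump_label.h>

DEFINE_STATIC_KEY_FALSE(sched_clock_irqtime);

/* The read-mostly check compiles to a patched branch, so the hot path
 * no longer loads a flag that may share a cacheline with nohz state.
 */
static inline bool irqtime_enabled(void)
{
        return static_branch_likely(&sched_clock_irqtime);
}

void enable_sched_clock_irqtime(void)
{
        static_branch_enable(&sched_clock_irqtime);
}

void disable_sched_clock_irqtime(void)
{
        /* May sleep; hence the sched_clock_work indirection for the
         * atomic-context callers mentioned above.
         */
        static_branch_disable(&sched_clock_irqtime);
}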
Reported-by: Benjamin Lei <benjamin.lei@intel.com>
Suggested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Suggested-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Signed-off-by: Wangyang Guo <wangyang.guo@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: K Prateek Nayak <kprateek.nayak@amd.com>
Reviewed-by: Tim Chen <tim.c.chen@linux.intel.com>
Reviewed-by: Tianyou Li <tianyou.li@intel.com>
Reviewed-by: Shrikanth Hegde <sshegde@linux.ibm.com>
Tested-by: K Prateek Nayak <kprateek.nayak@amd.com>
Link: https://patch.msgid.link/20260127072509.2627346-1-wangyang.guo@intel.com
|
|
Add a new kselftest to verify that the total_bw value in
/sys/kernel/debug/sched/debug remains consistent across all CPUs
under different sched_ext BPF program states:
1. Before a BPF scheduler is loaded
2. While a BPF scheduler is loaded and active
3. After a BPF scheduler is unloaded
The test runs CPU stress threads to ensure DL server bandwidth
values stabilize before checking consistency. This helps catch
potential issues with DL server bandwidth accounting during
sched_ext transitions.
Co-developed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-8-arighi@nvidia.com
|
|
Add a selftest to validate the correct behavior of the deadline server
for the ext_sched_class.
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Emil Tsalapatis <emil@etsalapatis.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-7-arighi@nvidia.com
|
|
There are two problems with sched_server_write_common() that can cause the
dl_server to malfunction upon attempting to change the parameters:
1) when, after having disabled the dl_server by setting runtime=0, it is
enabled again while tasks are already enqueued. In this case is_active would
still be 0 and dl_server_start() would not be called.
2) when dl_server_apply_params() fails, runtime is not applied and does
not reflect the new state.
Instead have dl_server_start() check its actual dl_runtime, and have
sched_server_write_common() unconditionally (re)start the dl_server. It will
automatically stop if there isn't anything to do, so spurious activation is
harmless -- while failing to start it is a problem.
While there, move the printk out of the locked region and make it symmetric,
also printing on enable.
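A condensed sketch of the new start logic (the enqueue path is reduced
to one assumed call):

void dl_server_start(struct sched_dl_entity *dl_se)
{
        /* Decide based on the server's actual runtime, not is_active,
         * so re-enabling while tasks are already enqueued works.
         */
        if (!dl_se->dl_runtime)
                return;

        enqueue_dl_entity(dl_se, ENQUEUE_WAKEUP);
}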
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://patch.msgid.link/20260203103407.GK1282955@noisy.programming.kicks-ass.net
|
|
When a sched_ext server is loaded, tasks in the fair class are
automatically moved to the sched_ext class. Add support to modify the
ext server parameters similar to how the fair server parameters are
modified.
Re-use common code between ext and fair servers as needed.
Co-developed-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-6-arighi@nvidia.com
|
|
sched_ext currently suffers starvation due to RT: the same workload,
when converted to EXT, can get zero runtime if RT is running 100% of the
time, causing EXT processes to stall. Fix it by adding a DL server for EXT.
A kselftest is also included later to confirm that both DL servers are
functioning correctly:
# ./runner -t rt_stall
===== START =====
TEST: rt_stall
DESCRIPTION: Verify that RT tasks cannot stall SCHED_EXT tasks
OUTPUT:
TAP version 13
1..1
# Runtime of FAIR task (PID 1511) is 0.250000 seconds
# Runtime of RT task (PID 1512) is 4.750000 seconds
# FAIR task got 5.00% of total runtime
ok 1 PASS: FAIR task got more than 4.00% of runtime
TAP version 13
1..1
# Runtime of EXT task (PID 1514) is 0.250000 seconds
# Runtime of RT task (PID 1515) is 4.750000 seconds
# EXT task got 5.00% of total runtime
ok 2 PASS: EXT task got more than 4.00% of runtime
TAP version 13
1..1
# Runtime of FAIR task (PID 1517) is 0.250000 seconds
# Runtime of RT task (PID 1518) is 4.750000 seconds
# FAIR task got 5.00% of total runtime
ok 3 PASS: FAIR task got more than 4.00% of runtime
TAP version 13
1..1
# Runtime of EXT task (PID 1521) is 0.250000 seconds
# Runtime of RT task (PID 1522) is 4.750000 seconds
# EXT task got 5.00% of total runtime
ok 4 PASS: EXT task got more than 4.00% of runtime
ok 1 rt_stall #
===== END =====
Co-developed-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Andrea Righi <arighi@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-5-arighi@nvidia.com
|
|
Currently the DL server interface for applying parameters checks
CFS-internals to identify if the server is active. This is error-prone
and makes it difficult when adding new servers in the future.
Fix it by using dl_server_active(), which is also used by the DL server
code to determine if the DL server was started.
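The check then reduces to something like this sketch:

        /* Ask the DL server itself instead of peeking at CFS internals */
        if (dl_server_active(dl_se))
                dl_server_stop(dl_se);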
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-4-arighi@nvidia.com
|
|
Updating "ppos" on error conditions does not make much sense. The pattern
is to return the error code directly without modifying the position, or
modify the position on success and return the number of bytes written.
Since on success, the return value of apply is 0, there is no point in
modifying ppos either. Fix it by removing all this and just returning
error code or number of bytes written on success.
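The resulting pattern, sketched (sched_server_apply() is a hypothetical
stand-in for the apply step):

#include <linux/fs.h>

static ssize_t sched_server_write(struct file *filp, const char __user *ubuf,
                                  size_t cnt, loff_t *ppos)
{
        int ret;

        ret = sched_server_apply(filp, ubuf, cnt);
        if (ret)
                return ret; /* error: leave *ppos untouched */

        *ppos += cnt; /* success: advance and report bytes written */
        return cnt;
}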
Signed-off-by: Joel Fernandes <joelagnelf@nvidia.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Juri Lelli <juri.lelli@redhat.com>
Reviewed-by: Andrea Righi <arighi@nvidia.com>
Acked-by: Tejun Heo <tj@kernel.org>
Tested-by: Christian Loehle <christian.loehle@arm.com>
Link: https://patch.msgid.link/20260126100050.3854740-3-arighi@nvidia.com
|