path: root/drivers/pci
Age | Commit message | Author | Files | Lines
2025-11-24 | mm/zone_device: rename page_free callback to folio_free | Balbir Singh | 1 | -2/+3
Change page_free to folio_free to make the folio support for zone device-private more consistent. The PCI P2PDMA callback has also been updated and changed to folio_free() as a result. For drivers that do not support folios (yet), the folio is converted back into page via &folio->page and the page is used as is, in the current callback implementation. Link: https://lkml.kernel.org/r/20251001065707.920170-3-balbirs@nvidia.com Signed-off-by: Balbir Singh <balbirs@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Zi Yan <ziy@nvidia.com> Cc: Joshua Hahn <joshua.hahnjy@gmail.com> Cc: Rakie Kim <rakie.kim@sk.com> Cc: Byungchul Park <byungchul@sk.com> Cc: Gregory Price <gourry@gourry.net> Cc: Ying Huang <ying.huang@linux.alibaba.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com> Cc: Nico Pache <npache@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Dev Jain <dev.jain@arm.com> Cc: Barry Song <baohua@kernel.org> Cc: Lyude Paul <lyude@redhat.com> Cc: Danilo Krummrich <dakr@kernel.org> Cc: David Airlie <airlied@gmail.com> Cc: Simona Vetter <simona@ffwll.ch> Cc: Ralph Campbell <rcampbell@nvidia.com> Cc: Mika Penttilä <mpenttil@redhat.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Francois Dugast <francois.dugast@intel.com> Cc: Madhavan Srinivasan <maddy@linux.ibm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Felix Kuehling <Felix.Kuehling@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Cc: "Christian König" <christian.koenig@amd.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
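As a rough illustration of that fallback — the driver, the helper and the ops instance here are hypothetical; only the folio_free member name follows the change described above:

  /* Hypothetical zone-device driver that is not folio-aware yet. */
  static void foo_free_device_page(struct page *page)
  {
  	/* Return the device-private page to the driver's own free list. */
  }

  static void foo_folio_free(struct folio *folio)
  {
  	/* No folio-aware path yet: fall back to the head page. */
  	struct page *page = &folio->page;

  	foo_free_device_page(page);
  }

  static const struct dev_pagemap_ops foo_pagemap_ops = {
  	.folio_free	= foo_folio_free,	/* previously .page_free */
  };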
2025-11-24 | treewide: Drop pci_save_state() after pci_restore_state() | Lukas Wunner | 1 | -1/+0
In 2009, commit c82f63e411f1 ("PCI: check saved state before restore") changed the behavior of pci_restore_state() such that it became necessary to call pci_save_state() afterwards, lest recovery from subsequent PCI errors fails. The commit has just been reverted and so all the pci_save_state() after pci_restore_state() calls that have accumulated in the tree are now superfluous. Drop them. Two drivers chose a different approach to achieve the same result: drivers/scsi/ipr.c and drivers/net/ethernet/intel/e1000e/netdev.c set the pci_dev's "state_saved" flag to true before calling pci_restore_state(). Drop this as well. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # qat Link: https://patch.msgid.link/c2b28cc4defa1b743cf1dedee23c455be98b397a.1760274044.git.lukas@wunner.de
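Roughly, the driver-side pattern being dropped looks like this — the callback name and driver are hypothetical; pci_restore_state() and pci_save_state() are the real PCI core helpers:

  static pci_ers_result_t foo_io_slot_reset(struct pci_dev *pdev)
  {
  	pci_restore_state(pdev);	/* bring Config Space back after the reset */

  	/*
  	 * Until the adjacent "PCI/ERR: Ensure error recoverability at all
  	 * times" change, drivers had to re-save here because
  	 * pci_restore_state() invalidated the saved state.  That extra call
  	 * is what this treewide patch removes:
  	 *
  	 *	pci_save_state(pdev);
  	 */

  	return PCI_ERS_RESULT_RECOVERED;
  }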
2025-11-24 | PCI/ERR: Ensure error recoverability at all times | Lukas Wunner | 2 | -3/+3
When the PCI core gained power management support in 2002, it introduced pci_save_state() and pci_restore_state() helpers to restore Config Space after a D3hot or D3cold transition, which implies a Soft or Fundamental Reset (PCIe r7.0 sec 5.8):

  https://git.kernel.org/tglx/history/c/a5287abe398b

In 2006, EEH and AER were introduced to recover from errors by performing a reset. Because errors can occur at any time, drivers began calling pci_save_state() on probe to ensure recoverability.

In 2009, recoverability was foiled by commit c82f63e411f1 ("PCI: check saved state before restore"): It amended pci_restore_state() to bail out if the "state_saved" flag has been cleared. The flag is cleared by pci_restore_state() itself, hence a saved state is now allowed to be restored only once and is then invalidated. That doesn't seem to make sense because the saved state should be good enough to be reused.

Soon after, drivers began to work around this behavior by calling pci_save_state() immediately after pci_restore_state(), see e.g. commit b94f2d775a71 ("igb: call pci_save_state after pci_restore_state"). Hilariously, two drivers even set the "state_saved" flag to true before invoking pci_restore_state(), see ipr_reset_restore_cfg_space() and e1000_io_slot_reset().

Despite these workarounds, recoverability at all times is not guaranteed: E.g. when a PCIe port goes through a runtime suspend and resume cycle, the "state_saved" flag is cleared by:

  pci_pm_runtime_resume()
    pci_pm_default_resume_early()
      pci_restore_state()

... and hence on a subsequent AER event, the port's Config Space cannot be restored. Riana reports a recovery failure of a GPU-integrated PCIe switch and has root-caused it to the behavior of pci_restore_state(). Another workaround would be necessary, namely calling pci_save_state() in pcie_port_device_runtime_resume().

The motivation of commit c82f63e411f1 was to prevent restoring state if pci_save_state() hasn't been called before. But that can be achieved by saving state already on device addition, after Config Space has been initialized. A desirable side effect is that devices become recoverable even if no driver gets bound. This renders the commit unnecessary, so revert it.

Reported-by: Riana Tauro <riana.tauro@intel.com> # off-list
Signed-off-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Tested-by: Riana Tauro <riana.tauro@intel.com>
Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org>
Link: https://patch.msgid.link/9e34ce61c5404e99ffdd29205122c6fb334b38aa.1763483367.git.lukas@wunner.de
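Conceptually, the reverted behavior sits in pci_restore_state(); this is only a sketch of the control flow described above, not the verbatim kernel implementation:

  void pci_restore_state(struct pci_dev *dev)
  {
  	if (!dev->state_saved)	/* original intent of c82f63e411f1: never */
  		return;		/* restore state that was never saved */

  	/* ... write the saved Config Space back to the device ... */

  	/*
  	 * The part being reverted: clearing the flag made the saved state
  	 * single-use, so after e.g. a runtime resume a later AER reset had
  	 * nothing left to restore.
  	 */
  	/* dev->state_saved = false; */
  }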
2025-11-24 | PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw | Lukas Wunner | 2 | -4/+0
The state_saved flag tells the PCI core whether a driver assumes responsibility to save Config Space and put the device into a low power state on suspend. The flag is currently initialized to false on enumeration, even though it already is false (because struct pci_dev is zeroed by kzalloc()) and even though it is set to false before commencing the suspend sequence (the only code path where it's relevant). The flag is also set to false in pci_pm_thaw(), i.e. on resume, when it's no longer relevant. Drop these two superfluous flag assignments for simplicity. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Link: https://patch.msgid.link/fd167945bd7852e1ca08cd4b202130659eea2c2f.1763483367.git.lukas@wunner.de
2025-11-24 | PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths | Lukas Wunner | 1 | -0/+4
When a PCI device is suspended, it is normally the PCI core's job to save Config Space and put the device into a low power state. However drivers are allowed to assume these responsibilities. When they do, the PCI core can tell by looking at the state_saved flag in struct pci_dev: The flag is cleared before commencing the suspend sequence and it is set when pci_save_state() is called. If the PCI core finds the flag set late in the suspend sequence, it refrains from calling pci_save_state() itself. But there are two corner cases where the PCI core neglects to clear the flag before commencing the suspend sequence: * If a driver has legacy PCI PM callbacks, pci_legacy_suspend() neglects to clear the flag. The (stale) flag is subsequently queried by pci_legacy_suspend() itself and pci_legacy_suspend_late(). * If a device has no driver or its driver has no PCI PM callbacks, pci_pm_freeze() neglects to clear the flag. The (stale) flag is subsequently queried by pci_pm_freeze_noirq(). The flag may be set prior to suspend if the device went through error recovery: Drivers commonly invoke pci_restore_state() + pci_save_state() to restore Config Space after reset. The flag may also be set if drivers call pci_save_state() on probe to allow for recovery from subsequent errors. The result is that pci_legacy_suspend_late() and pci_pm_freeze_noirq() don't call pci_save_state() and so the state that will be restored on resume is the one recorded on last error recovery or on probe, not the one that the device had on suspend. If the two states happen to be identical, there's no problem. Reinstate clearing the flag in pci_legacy_suspend() and pci_pm_freeze(). The two functions used to do that until commit 4b77b0a2ba27 ("PCI: Clear saved_state after the state has been restored") deemed it unnecessary because it assumed that it's sufficient to clear the flag on resume in pci_restore_state(). The commit seemingly did not take into account that pci_save_state() and pci_restore_state() are not only used by power management code, but also for error recovery. Devices without driver or whose driver has no PCI PM callbacks may be in runtime suspend when pci_pm_freeze() is called. Their state has already been saved, so don't clear the flag to skip a pointless pci_save_state() in pci_pm_freeze_noirq(). None of the drivers with legacy PCI PM callbacks seem to use runtime PM, so clear the flag unconditionally in their case. Fixes: 4b77b0a2ba27 ("PCI: Clear saved_state after the state has been restored") Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Cc: stable@vger.kernel.org # v2.6.32+ Link: https://patch.msgid.link/094f2aad64418710daf0940112abe5a0afdc6bce.1763483367.git.lukas@wunner.de
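A minimal sketch of the reinstated clearing, assuming the legacy suspend path has roughly this shape (the real pci_legacy_suspend() contains additional checks that are omitted here):

  static int pci_legacy_suspend(struct device *dev, pm_message_t state)
  {
  	struct pci_dev *pci_dev = to_pci_dev(dev);
  	struct pci_driver *drv = to_pci_driver(dev->driver);

  	/*
  	 * Reinstated: a state saved earlier (error recovery, probe) must not
  	 * make the late suspend phase skip pci_save_state().
  	 */
  	pci_dev->state_saved = false;

  	if (drv && drv->suspend)
  		return drv->suspend(pci_dev, state);

  	return 0;
  }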
2025-11-24 | PCI: dw-rockchip: Configure L1SS support | Shawn Lin | 1 | -0/+40
L1 PM Substates for RC mode require support in the dw-rockchip driver, including proper handling of the CLKREQ# sideband signal. It is mostly handled by hardware, but software still needs to set the clkreq fields in the PCIE_CLIENT_POWER_CON register to match the hardware implementation.

For more details, see section '18.6.6.4 L1 Substate' in the RK3568 TRM 1.1 Part 2, or section '11.6.6.4 L1 Substate' in the RK3588 TRM 1.0 Part 2.

[bhelgaas: set pci->l1ss_support so DWC core preserves L1SS Capability bits; drop corresponding code here, include updates from https://lore.kernel.org/r/aRRG8wv13HxOCqgA@ryzen]
Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/1761187883-150120-1-git-send-email-shawn.lin@rock-chips.com
Link: https://patch.msgid.link/20251118214312.2598220-4-helgaas@kernel.org
2025-11-24 | PCI: tegra194: Remove unnecessary L1SS disable code | Bjorn Helgaas | 1 | -40/+5
The DWC core clears the L1 Substates Supported bits unless the driver sets the "dw_pcie.l1ss_support" flag. The tegra194 init_host_aspm() sets "dw_pcie.l1ss_support" if the platform has the "supports-clkreq" DT property. If "supports-clkreq" is absent, "dw_pcie.l1ss_support" is not set, and the DWC core will clear the L1 Substates Supported bits. The tegra194 code to clear the L1 Substates Supported bits is unnecessary, so remove it. Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251118214312.2598220-3-helgaas@kernel.org
2025-11-24 | PCI: dwc: Advertise L1 PM Substates only if driver requests it | Bjorn Helgaas | 5 | -0/+33
L1 PM Substates require the CLKREQ# signal and may also require device-specific support. If CLKREQ# is not supported or driver support is lacking, enabling L1.1 or L1.2 may cause errors when accessing devices, e.g.,

  nvme nvme0: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0x10

If the kernel is built with CONFIG_PCIEASPM_POWER_SUPERSAVE=y or users enable L1.x via sysfs, users may trip over these errors even if L1 Substates haven't been enabled by firmware or the driver.

To prevent such errors, disable advertising the L1 PM Substates unless the driver sets "dw_pcie.l1ss_support" to indicate that it knows CLKREQ# is present and any device-specific configuration has been done.

Set "dw_pcie.l1ss_support" in tegra194 (if DT includes the "supports-clkreq" property) and qcom (for cfg_2_7_0, cfg_1_9_0, cfg_1_34_0, and cfg_sc8280xp controllers) so they can continue to use L1 Substates.

Based on Niklas's patch: https://patch.msgid.link/20251017163252.598812-2-cassel@kernel.org

[bhelgaas: drop hiding for endpoints]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251118214312.2598220-2-helgaas@kernel.org
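A hedged sketch of the opt-in from a glue driver's host init callback — the 'supports-clkreq' device property mirrors the tegra194 case, the function name is made up, and only dw_pcie.l1ss_support comes from the description above:

  static int foo_pcie_host_init(struct dw_pcie_rp *pp)
  {
  	struct dw_pcie *pci = to_dw_pcie_from_pp(pp);

  	/*
  	 * Advertise L1 PM Substates only when CLKREQ# is wired up and any
  	 * controller-specific setup has been done; otherwise the DWC core
  	 * clears the L1 Substates Supported bits.
  	 */
  	if (device_property_read_bool(pci->dev, "supports-clkreq"))
  		pci->l1ss_support = true;

  	return 0;
  }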
2025-11-24 | PCI: dwc: Fix wrong PORT_LOGIC_LTSSM_STATE_MASK definition | Shawn Lin | 1 | -1/+1
As per DesignWare Cores PCI Express Controller Databook, section 5.50, SII: Debug Signals, cxpl_debug_info[63:0]: [5:0] smlh_ltssm_state: LTSSM current state. Encoding is same as the dedicated smlh_ltssm_state output. The mask should be 6 bits, from 0 to 5. Hence, fix the mask definition. Fixes: 23fe5bd4be90 ("PCI: keystone: Cleanup ks_pcie_link_up()") Signed-off-by: Shawn Lin <shawn.lin@rock-chips.com> [mani: reworded description] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/1763122140-203068-1-git-send-email-shawn.lin@rock-chips.com
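The corrected definition then covers bits [5:0]; a sketch using the kernel's GENMASK() helper (the constant's previous value and exact home in the driver headers are not shown above):

  #include <linux/bits.h>

  /* smlh_ltssm_state occupies cxpl_debug_info[5:0], i.e. six bits */
  #define PORT_LOGIC_LTSSM_STATE_MASK	GENMASK(5, 0)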
2025-11-24 | PCI: pwrctrl: Add power control driver for TC9563 | Krishna Chaitanya Chundru | 3 | -0/+665
TC9563 is a PCIe switch with one upstream and three downstream ports. One of the downstream ports is connected to an integrated Ethernet MAC endpoint; the other two are available to connect to external devices. A host connects to the TC9563 via the upstream port.

The TC9563 switch needs to be configured after powering on and before the PCIe link is up. The PCIe controller driver already enables link training on the host side even before this driver probes. Due to this, when this driver enables power to the switch, the switch participates in link training and the PCIe link may come up before the switch is configured through I2C. Once the link is up, the configuration done through I2C will not have any effect.

To prevent the host from participating in link training, disable link training on the host side to ensure the link does not come up before the switch is configured via I2C. The TC9563 is then configured through I2C based on DT properties and the type of each port.

Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com>
[bhelgaas: squash fixes from https://lore.kernel.org/r/20251120065116.13647-2-mani@kernel.org https://lore.kernel.org/r/20251120065116.13647-3-mani@kernel.org]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Bjorn Andersson <andersson@kernel.org>
Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20251101-tc9563-v9-6-de3429f7787a@oss.qualcomm.com
2025-11-24 | PCI: Use max() instead of max_t() to ease static analysis | David Laight | 1 | -2/+1
In this code:

  used_buses = max_t(unsigned int, available_buses,
                     pci_hotplug_bus_size - 1);

max_t() casts the 'unsigned long' pci_hotplug_bus_size (either 32 or 64 bits) to the 'unsigned int' (32-bit) result type, so there's a potential of discarding significant bits. Instead, use max(a, b), which casts 'unsigned int' to 'unsigned long' and cannot discard significant bits.

In this case, pci_hotplug_bus_size is constrained to <= 0xff by pci_setup(), so this doesn't fix a bug, but it makes static analysis easier.

Signed-off-by: David Laight <david.laight.linux@gmail.com>
[bhelgaas: commit log]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Link: https://patch.msgid.link/20251119224140.8616-26-david.laight.linux@gmail.com
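The resulting change presumably looks like this; the replacement line is inferred from the description above rather than quoted from the patch:

  - used_buses = max_t(unsigned int, available_buses,
  -                    pci_hotplug_bus_size - 1);
  + used_buses = max(available_buses, pci_hotplug_bus_size - 1);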
2025-11-24 | s390: Remove KMSG_COMPONENT macro | Heiko Carstens | 1 | -2/+1
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message catalog" which never made it upstream. Remove the macro in order to get rid of a pointless indirection. Replace all users with the string it defines. In almost all cases this leads to a simple replacement like this:

  - #define KMSG_COMPONENT "appldata"
  - #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt
  + #define pr_fmt(fmt) "appldata: " fmt

Except for some special cases this is just mechanical/scripted work.

Acked-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-11-22 | PCI: iproc: Implement MSI controller node detection with of_msi_xlate() | Lorenzo Pieralisi | 1 | -17/+5
The functionality implemented in the iproc driver in order to detect an OF MSI controller node is now fully implemented in of_msi_xlate(). Replace the current msi-map/msi-parent parsing code with of_msi_xlate(). Since of_msi_xlate() is also a deviceID mapping API, pass in a fictitious 0 as deviceID - the driver only requires detecting the OF MSI controller node not the deviceID mapping per-se (of_msi_xlate() return value is ignored for the same reason). Signed-off-by: Lorenzo Pieralisi <lpieralisi@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Frank Li <Frank.Li@nxp.com> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251021124103.198419-5-lpieralisi@kernel.org
2025-11-22 | Merge tag 'v6.18-rc3' into irq/msi | Thomas Gleixner | 8 | -112/+53
Pick up OF changes to resolve dependencies
2025-11-20 | PCI/P2PDMA: Provide an access to pci_p2pdma_map_type() function | Leon Romanovsky | 1 | -2/+12
Provide access to the pci_p2pdma_map_type() function to allow subsystems to determine the appropriate mapping type for P2PDMA transfers between a provider and target device.

The pci_p2pdma_map_type() function is the core P2P layer version of the existing public, but struct page focused, pci_p2pdma_state() function. It returns the same result. It is required to use the P2P subsystem from drivers that don't use the struct page layer.

Like __pci_p2pdma_update_state() it is not an exported function. The idea is that only subsystem code will implement mapping helpers for taking in phys_addr_t lists; this is deliberately not made accessible to every driver to prevent abuse. Following patches will use this function to implement a shared DMA mapping helper for DMABUF.

Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-4-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20 | PCI/P2PDMA: Refactor to separate core P2P functionality from memory allocation | Leon Romanovsky | 1 | -31/+120
Refactor the PCI P2PDMA subsystem to separate the core peer-to-peer DMA functionality from the optional memory allocation layer. This creates a two-tier architecture: The core layer provides P2P mapping functionality for physical addresses based on PCI device MMIO BARs and integrates with the DMA API for mapping operations. This layer is required for all P2PDMA users. The optional upper layer provides memory allocation capabilities including gen_pool allocator, struct page support, and sysfs interface for user space access. This separation allows subsystems like DMABUF to use only the core P2P mapping functionality without the overhead of memory allocation features they don't need. The core functionality is now available through the new pcim_p2pdma_provider() function that returns a p2pdma_provider structure. Tested-by: Alex Mastro <amastro@fb.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Ankit Agrawal <ankita@nvidia.com> Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-3-d7f71607f371@nvidia.com Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20 | PCI/P2PDMA: Separate the mmap() support from the core logic | Leon Romanovsky | 1 | -20/+23
Currently the P2PDMA code requires a pgmap and a struct page to function. These were serving three important purposes:

 - DMA API compatibility, where scatterlist required a struct page as input
 - Life cycle management, where the percpu_ref is used to prevent UAF during device hot unplug
 - A way to get the P2P provider data through the pci_p2pdma_pagemap

The DMA API now has a new flow, and has gained phys_addr_t support, so it no longer needs struct pages to perform P2P mapping. Lifecycle management can be delegated to the user; DMABUF for instance has a suitable invalidation protocol that does not require struct page. Finding the P2P provider data can also be managed by the caller without the need to look it up from the phys_addr.

Split the P2PDMA code into two layers. The optional upper layer, effectively, provides a way to mmap() P2P memory into a VMA by providing struct page, pgmap, a genalloc and sysfs. The lower layer provides the actual P2P infrastructure and is wrapped up in a new struct p2pdma_provider. Rework the mmap layer to use the new p2pdma_provider based APIs.

Drivers that do not want to put P2P memory into VMAs can allocate a struct p2pdma_provider after probe() starts and free it before remove() completes. When DMA mapping, the driver must convey the struct p2pdma_provider to the DMA mapping code along with a phys_addr of the MMIO BAR slice to map. The driver must ensure that no DMA mapping outlives the lifetime of the struct p2pdma_provider.

The intended target of this new API layer is DMABUF. There is usually only a single p2pdma_provider for a DMABUF exporter. Most drivers can establish the p2pdma_provider during probe, access the single instance during DMABUF attach and use that to drive the DMA mapping. DMABUF provides an invalidation mechanism that can guarantee all DMA is halted and the DMA mappings are undone prior to destroying the struct p2pdma_provider. This ensures there is no UAF through DMABUFs that are lingering past driver removal.

The new p2pdma_provider layer cannot be used to create P2P memory that can be mapped into VMAs, be used with pin_user_pages(), O_DIRECT, and so on. These use cases must still use the mmap() layer. The p2pdma_provider layer is principally for DMABUF-like use cases where DMABUF natively manages the life cycle and access instead of VMAs/pin_user_pages()/struct page.

In addition, remove the bus_off field from pci_p2pdma_map_state since it duplicates information already available in the pgmap structure. The bus_offset is only used in one location (pci_p2pdma_bus_addr_map) and is always identical to pgmap->bus_offset.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Tested-by: Alex Mastro <amastro@fb.com>
Tested-by: Nicolin Chen <nicolinc@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Acked-by: Ankit Agrawal <ankita@nvidia.com>
Link: https://lore.kernel.org/r/20251120-dmabuf-vfio-v9-1-d7f71607f371@nvidia.com
Signed-off-by: Alex Williamson <alex@shazbot.org>
2025-11-20 | Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net | Jakub Kicinski | 4 | -28/+48
Cross-merge networking fixes after downstream PR (net-6.18-rc7).

No conflicts, adjacent changes:

  tools/testing/selftests/net/af_unix/Makefile
    e1bb28bf13f4 ("selftest: af_unix: Add test for SO_PEEK_OFF.")
    45a1cd8346ca ("selftests: af_unix: Add tests for ECONNRESET and EOF semantics")

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-19 | PCI: epf-test: Switch to use %ptSp | Andy Shevchenko | 1 | -3/+2
Use %ptSp instead of open coded variants to print content of struct timespec64 in human readable format. Acked-by: Bjorn Helgaas <bhelgaas@google.com> Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Link: https://patch.msgid.link/20251113150217.3030010-16-andriy.shevchenko@linux.intel.com Signed-off-by: Petr Mladek <pmladek@suse.com>
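For illustration, a caller that previously open-coded the seconds/nanoseconds split could print the timespec64 directly — assuming, per the description above, that %ptSp consumes a pointer to struct timespec64; the surrounding function is made up:

  #include <linux/printk.h>
  #include <linux/time64.h>

  static void foo_report_duration(u64 ns)
  {
  	struct timespec64 ts = ns_to_timespec64(ns);

  	/* was something like: pr_info("took %llu.%09lu s\n", ...); */
  	pr_info("took %ptSp\n", &ts);
  }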
2025-11-18 | PCI: qcom: Implement .assert_perst() | Krishna Chaitanya Chundru | 1 | -0/+13
Add support for assert_perst() for switches like TC9563, which require configuration before the PCIe link is established. Such devices use this function op to assert PERST# before configuring the device and once the configuration is done they de-assert PERST#. Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20251101-tc9563-v9-5-de3429f7787a@oss.qualcomm.com
2025-11-18 | PCI: dwc: Implement .assert_perst() for dwc glue drivers | Krishna Chaitanya Chundru | 2 | -0/+18
Add .assert_perst() hook for dwc glue drivers to register with assert_perst() of pci ops, allowing for better control over the link initialization and shutdown process. Implement assert_perst() function op for dwc drivers. Signed-off-by: Krishna Chaitanya Chundru <krishna.chundru@oss.qualcomm.com> [bhelgaas: squash dwc host support] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Manivannan Sadhasivam <mani@kernel.org> Link: https://patch.msgid.link/20251101-tc9563-v9-3-de3429f7787a@oss.qualcomm.com Link: https://patch.msgid.link/20251101-tc9563-v9-4-de3429f7787a@oss.qualcomm.com
2025-11-17 | PCI: stm32: Don't use 'proxy' headers | Andy Shevchenko | 3 | -2/+17
Update header inclusions to follow IWYU (Include What You Use) principle. In particular, replace of_gpio.h, which is subject to removal by the GPIOLIB subsystem, with the respective headers that are being used by the driver. Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251114185534.3287497-1-andriy.shevchenko@linux.intel.com
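An illustrative shape of such an IWYU cleanup — the headers named here are typical replacements for of_gpio.h, not the actual stm32 diff:

  - #include <linux/of_gpio.h>		/* 'proxy' header, slated for removal */
  + #include <linux/gpio/consumer.h>	/* gpiod_*() consumer API */
  + #include <linux/of.h>			/* OF node handling */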
2025-11-17 | PCI: stm32: Fix EP page_size alignment | Christian Bruel | 1 | -0/+2
pci_epc_mem_alloc_addr() allocates a CPU address from the ATU window phys base and a page number. Set the ep->page_size so the resulting CPU address is correctly aligned with the ATU required alignment. Fixes: 151f3d29baf4 ("PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25") Signed-off-by: Christian Bruel <christian.bruel@foss.st.com> [mani: added fixes tag] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251114-atu_align_ep-v1-1-88da5366fa04@foss.st.com
2025-11-17 | PCI: stm32: Fix LTSSM EP race with start link | Christian Bruel | 1 | -31/+8
If the host has deasserted PERST# and started link training before the link is started on EP side, enabling LTSSM before the endpoint registers are initialized in the perst_irq handler results in probing incorrect values. Thus, wait for the PERST# level-triggered interrupt to start link training at the end of initialization and cleanup the stm32_pcie_[start stop]_link functions. Fixes: 151f3d29baf4 ("PCI: stm32-ep: Add PCIe Endpoint support for STM32MP25") Signed-off-by: Christian Bruel <christian.bruel@foss.st.com> [mani: added fixes tag] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> [bhelgaas: wrap line] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Link: https://patch.msgid.link/20251114-perst_ep-v1-1-e7976317a890@foss.st.com
2025-11-17 | PCI: spacemit: Add SpacemiT PCIe host driver | Alex Elder | 3 | -0/+371
Introduce a driver for the PCIe host controller found in the SpacemiT K1 SoC. The hardware is derived from the Synopsys DesignWare PCIe IP. The driver supports up to three PCIe ports operating at PCIe link speed up to 5 GT/s. The first port uses a combo PHY, which may be configured for use for USB3 instead. Signed-off-by: Alex Elder <elder@riscstar.com> [mani: added FIXME to the comment on disabling ASPM L1] Signed-off-by: Manivannan Sadhasivam <mani@kernel.org> Tested-by: Jason Montleon <jmontleo@redhat.com> Tested-by: Johannes Erdfelt <johannes@erdfelt.com> Tested-by: Aurelien Jarno <aurelien@aurel32.net> Link: https://patch.msgid.link/20251113214540.2623070-6-elder@riscstar.com
2025-11-14 | Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux | Jakub Kicinski | 1 | -3/+13
Tariq Toukan says:

====================
mlx5-next updates 2025-11-13

The following pull-request contains common mlx5 updates
====================

* 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux:
  net/mlx5: Expose definition for 1600Gbps link mode
  net/mlx5: fs, set non default device per namespace
  net/mlx5: fs, Add other_eswitch support for steering tables
  net/mlx5: Add OTHER_ESWITCH HW capabilities
  net/mlx5: Add direct ST mode support for RDMA
  PCI/TPH: Expose pcie_tph_get_st_table_loc()
  {rdma,net}/mlx5: Query vports mac address from device

Link: https://patch.msgid.link/1763027252-1168760-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-14 | Merge tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci | Linus Torvalds | 4 | -28/+48
Pull pci fixes from Bjorn Helgaas:

 - Cache the ASPM L0s/L1 Supported bits early so quirks can override them if necessary (Bjorn Helgaas)

 - Add quirks for PA Semi and Freescale Root Ports and a HiSilicon Wi-Fi device that are reported to have broken L0s and L1 (Shawn Lin, Bjorn Helgaas)

* tag 'pci-v6.18-fixes-5' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci:
  PCI/ASPM: Avoid L0s and L1 on Hi1105 [19e5:1105] Wi-Fi
  PCI/ASPM: Avoid L0s and L1 on PA Semi [1959:a002] Root Ports
  PCI/ASPM: Avoid L0s and L1 on Freescale [1957:0451] Root Ports
  PCI/ASPM: Convert quirks to override advertised link states
  PCI/ASPM: Add pcie_aspm_remove_cap() to override advertised link states
  PCI/ASPM: Cache L0s/L1 Supported so advertised link states can be overridden
2025-11-14 | PCI/TSM: Add 'dsm' and 'bound' attributes for dependent functions | Dan Williams | 1 | -20/+110
PCI/TSM sysfs for physical function 0 devices, i.e. the "DSM" (Device Security Manager), contains the 'connect' and 'disconnect' attributes. After a successful 'connect' operation the DSM, its dependent functions (SR-IOV virtual functions, non-zero multi-functions, or downstream endpoints of a switch DSM) are candidates for being transitioned into a TDISP (TEE Device Interface Security Protocol) operational state, via pci_tsm_bind(). At present sysfs is blind to which devices are capable of TDISP operation and it is ambiguous which functions are serviced by which DSMs. Add a 'dsm' attribute to identify a function's DSM device, and add a 'bound' attribute to identify when a function has entered a TDISP operational state. Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Lukas Wunner <lukas@wunner.de> Cc: Samuel Ortiz <sameo@rivosinc.com> Cc: Alexey Kardashevskiy <aik@amd.com> Cc: Xu Yilun <yilun.xu@linux.intel.com> Cc: Suzuki K Poulose <suzuki.poulose@arm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251113021446.436830-9-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14 | PCI/TSM: Add pci_tsm_guest_req() for managing TDIs | Dan Williams | 1 | -0/+60
A PCIe device function interface assigned to a TVM is a TEE Device Interface (TDI). A TDI instantiated by pci_tsm_bind() needs additional steps taken by the TVM to be accepted into the TVM's Trusted Compute Boundary (TCB) and transitioned to the RUN state.

pci_tsm_guest_req() is a channel for the guest to request TDISP collateral, like Device Interface Reports, and effect TDISP state changes, like LOCKED->RUN transitions. Similar to IDE establishment and pci_tsm_bind(), these are long-running operations involving SPDM message passing via the DOE mailbox.

The path for a TVM to invoke pci_tsm_guest_req() is:

 * TSM triggers exit via guest-to-host-interface ABI (implementation specific)
 * VMM invokes handler (KVM handle_exit() -> userspace io)
 * handler issues request (userspace io handler -> ioctl() -> pci_tsm_guest_req())
 * handler supplies response
 * VMM posts response, notifies/re-enters TVM

This path is purely a transport for messages from TVM to platform TSM. By design the host kernel does not and must not care about the content of these messages, i.e. the host kernel is not in the TCB of the TVM.

As this is an opaque passthrough interface, similar to fwctl, the kernel requires that implementations stay within the bounds defined by 'enum pci_tsm_req_scope'. Violation of those expectations likely has market and regulatory consequences. Out-of-scope requests are blocked by default.

Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251113021446.436830-8-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14 | PCI/TSM: Add pci_tsm_bind() helper for instantiating TDIs | Dan Williams | 1 | -1/+108
After a PCIe device has established a secure link and session between a TEE Security Manager (TSM) and its local Device Security Manager (DSM), the device or its subfunctions are candidates to be bound to a private memory context, a TVM. A PCIe device function interface assigned to a TVM is a TEE Device Interface (TDI). The pci_tsm_bind() requests the low-level TSM driver to associate the device with private MMIO and private IOMMU context resources of a given TVM represented by a @kvm argument. A device in the bound state corresponds to the TDISP protocol LOCKED state and awaits validation by the TVM. It is a 'struct pci_tsm_link_ops' operation because, similar to IDE establishment, it involves host side resource establishment and context setup on behalf of the guest. It is also expected to be performed lazily to allow for operation of the device in non-confidential "shared" context for pre-lock configuration. Co-developed-by: Xu Yilun <yilun.xu@linux.intel.com> Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> Link: https://patch.msgid.link/20251113021446.436830-7-dan.j.williams@intel.com Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14 | PCI/IDE: Initialize an ID for all IDE streams | Dan Williams | 3 | -0/+132
The PCIe spec defines two types of streams - selective and link. Each stream has an ID from the same bucket, so a stream ID does not tell the type. The spec defines an "enable" bit for every stream and requires stream IDs to be unique among all enabled streams, but there is no such requirement for disabled streams.

However, when IDE_KM is programming keys, an IDE-capable device needs to know the type of stream being programmed so it can write the keys directly to the hardware: keys are relatively large, there may be many of them, and devices often struggle to keep around large data that is not being used.

Walk through all streams on a device and initialise the IDs to some unique number, both link and selective.

The weakest part of this proposal is the host bridge ide_stream_ids_ida. Technically, a Stream ID only needs to be unique within a given partner pair. However, with "anonymous" / unassigned streams there is no convenient place to track the available IDs. Proceed with an ida in the host bridge for now, but consider moving this tracking to be an ide_stream_ids_ida per device.

Co-developed-by: Alexey Kardashevskiy <aik@amd.com>
Signed-off-by: Alexey Kardashevskiy <aik@amd.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251113021446.436830-6-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14 | PCI/IDE: Add Address Association Register setup for downstream MMIO | Xu Yilun | 1 | -9/+108
The address ranges for downstream Address Association Registers need to cover memory addresses for all functions (PFs/VFs/downstream devices) managed by a Device Security Manager (DSM). The proposed solution is to get the memory (32-bit only) range and prefetchable-memory (64-bit capable) range from the immediate ancestor downstream port (either the direct-attach RP or the deepest switch port when switch attached).

Similar to RID association, address associations will be set by default if hardware sets 'Number of Address Association Register Blocks' in the 'Selective IDE Stream Capability Register' to a non-zero value. TSM drivers can opt out of the settings by zeroing out unwanted / unsupported address ranges. E.g. TDX Connect only supports prefetchable (64-bit capable) memory ranges for the Address Association setting.

If the immediate downstream port provides both a memory range and a prefetchable-memory range, but the IDE partner port only provides 1 Address Association Register block, then the TSM driver can pick which range to associate, or let the PCI core prioritize memory.

Note, the Address Association Register setup for upstream requests is still uncertain so is not included.

Co-developed-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
Co-developed-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Arto Merilainen <amerilainen@nvidia.com>
Signed-off-by: Xu Yilun <yilun.xu@linux.intel.com>
Co-developed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Link: https://patch.msgid.link/20251114010227.567693-1-dan.j.williams@intel.com
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
2025-11-14 | PCI/sysfs: Use PM_RUNTIME_ACQUIRE()/PM_RUNTIME_ACQUIRE_ERR() | Rafael J. Wysocki | 1 | -2/+2
Use new PM_RUNTIME_ACQUIRE() and PM_RUNTIME_ACQUIRE_ERR() wrapper macros to make the code look more straightforward. No intentional functional impact. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Reviewed-by: Dhruva Gole <d-gole@ti.com> Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com> [ rjw; Typo fix in the changelog ] Link: https://patch.msgid.link/3932581.kQq0lBPeGt@rafael.j.wysocki Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2025-11-14 | PCI: Convert BAR sizes bitmasks to u64 | Ilpo Järvinen | 3 | -4/+4
PCIe r7.0, sec 7.8.6, defines resizable BAR sizes beyond the currently supported maximum of 128TB, which will require more than u32 to store the entire bitmask. Convert Resizable BAR related functions to use u64 bitmask for BAR sizes to make the typing more future-proof. The support for the larger BAR sizes themselves is not added at this point. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-12-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Add pci_rebar_get_max_size() | Ilpo Järvinen | 1 | -0/+23
Add pci_rebar_get_max_size() to allow simplifying code that wants to know the maximum possible size for a Resizable BAR. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-9-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Add pci_rebar_size_supported() helper | Ilpo Järvinen | 2 | -13/+20
Many callers of pci_rebar_get_possible_sizes() are interested in finding out if a particular encoded BAR Size (PCIe r7.0, sec 7.8.6.3) is supported by the particular BAR. Add pci_rebar_size_supported() into PCI core to make it easy for the drivers to determine if the BAR size is supported or not. Use the new function in pci_resize_resource() and in pci_iov_vf_bar_set_size(). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://patch.msgid.link/20251113180053.27944-6-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Improve Resizable BAR functions kernel doc | Ilpo Järvinen | 1 | -14/+21
Fix the copy-pasted errors in the Resizable BAR handling functions kernel doc and generally improve wording choices. Fix the formatting errors of the Return: line. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-5-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Move pci_rebar_size_to_bytes() and export it | Ilpo Järvinen | 2 | -4/+12
pci_rebar_size_to_bytes() is in drivers/pci/pci.h but would be useful for endpoint drivers as well. Move the function to rebar.c and export it. In addition, convert the literal to where the number comes from (PCI_REBAR_MIN_SIZE). Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-4-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Move pci_rebar_bytes_to_size() and clean it up | Ilpo Järvinen | 1 | -0/+23
Move pci_rebar_bytes_to_size() from include/linux/pci.h to rebar.c as it does not look very trivial and is not expected to be performance critical. Convert literals to use a newly added PCI_REBAR_MIN_SIZE define. Also add kernel doc for the function as the function is exported. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Michael J. Ruhl <mjruhl@habana.ai> Link: https://patch.msgid.link/20251113180053.27944-3-ilpo.jarvinen@linux.intel.com
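A sketch of the two conversions being consolidated, based on the Resizable BAR encoding (value n selects 2^(n+20) bytes, so 0 is 1 MiB and 27 is 128 TiB); the function names are illustrative rather than the exported pci_rebar_*() symbols, and the literal 20 is presumably what the new PCI_REBAR_MIN_SIZE define documents:

  #include <linux/log2.h>
  #include <linux/minmax.h>

  static inline u64 rebar_size_to_bytes(int size)
  {
  	return 1ULL << (size + 20);	/* 20 == ilog2(1 MiB) */
  }

  static inline int rebar_bytes_to_size(u64 bytes)
  {
  	bytes = roundup_pow_of_two(bytes);

  	/* smallest encoding that fits, clamped to the 1 MiB minimum */
  	return max(ilog2(bytes), 20) - 20;
  }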
2025-11-14 | PCI: Move Resizable BAR code to rebar.c | Ilpo Järvinen | 5 | -235/+249
For lack of a better place to put it, Resizable BAR code has been placed inside pci.c and setup-res.c that do not use it for anything. Upcoming changes are going to add more Resizable BAR related functions, increasing the code size. As pci.c is huge as is, move the Resizable BAR related code and the BAR resize code from setup-res.c to rebar.c. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christian König <christian.koenig@amd.com> Link: https://patch.msgid.link/20251113180053.27944-2-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Prevent restoring assigned resources | Ilpo Järvinen | 1 | -0/+4
restore_dev_resource() copies saved addresses and flags from the struct pci_dev_resource back to the struct resource, typically, during rollback from a failure or in preparation for a retry attempt. If the resource is within resource tree, the resource must not be modified as the resource tree could be corrupted. Thus, it's a bug to call restore_dev_resource() for assigned resources (which did happen due to logic flaws in the BAR resize rollback). Add WARN_ON_ONCE() into restore_dev_resource() to detect such bugs easily and return without altering the resource to prevent corruption. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Link: https://patch.msgid.link/20251113162628.5946-12-ilpo.jarvinen@linux.intel.com
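A minimal sketch of the added guard, assuming restore_dev_resource() copies the saved fields back roughly as follows (abridged; the res->parent test is the usual "is this resource in the tree" check):

  static void restore_dev_resource(struct pci_dev_resource *dev_res)
  {
  	struct resource *res = dev_res->res;

  	/*
  	 * An assigned resource sits in the resource tree; rewriting it here
  	 * could corrupt the tree, so warn once and leave it untouched.
  	 */
  	if (WARN_ON_ONCE(res->parent))
  		return;

  	res->start = dev_res->start;
  	res->end = dev_res->end;
  	res->flags = dev_res->flags;
  }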
2025-11-14 | PCI: Add kerneldoc for pci_resize_resource() | Ilpo Järvinen | 1 | -0/+20
As pci_resize_resource() is meant to be used also outside of PCI core, document the interface with kerneldoc. Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Tested-by: Alex Bennée <alex.bennee@linaro.org> # AVA, AMD GPU Link: https://patch.msgid.link/20251113162628.5946-8-ilpo.jarvinen@linux.intel.com
2025-11-14 | PCI: Fix restoring BARs on BAR resize rollback path | Ilpo Järvinen | 4 | -58/+86
The BAR resize operation is implemented in the pci_resize_resource() and pbus_reassign_bridge_resources() functions. pci_resize_resource() can be called either from __resource_resize_store() via sysfs or directly by the driver for the Endpoint Device.

pci_resize_resource() requires that the caller has released the device resources that share the bridge window with the BAR to be resized, as otherwise the bridge window is pinned in place and cannot be changed. pbus_reassign_bridge_resources() rolls back resources if the resize operation fails, but rollback is performed only for the bridge windows. Because releasing the device resources is done by the caller of the BAR resize interface, these functions performing the BAR resize do not have access to the device resources as they were before the resize.

pbus_reassign_bridge_resources() could try __pci_bridge_assign_resources() after rolling back the bridge windows as they were; however, that will not guarantee the resources are assigned, due to differences in how FW and the kernel assign the resources (alignment of the start address and tail).

To perform rollback robustly, the BAR resize interface has to be altered to also release the device resources that share the bridge window with the BAR to be resized. Also, remove restoring