| Age | Commit message | Author | Files | Lines |
|
The mmio regmap allocated during probe is never freed.
Switch to using the device managed allocator so that the regmap is
released on probe failures (e.g. probe deferral) and on driver unbind.
Fixes: a5caf03188e4 ("soc: ti: k3-socinfo: Do not use syscon helper to build regmap")
Cc: stable@vger.kernel.org # 6.15
Cc: Andrew Davis <afd@ti.com>
Signed-off-by: Johan Hovold <johan@kernel.org>
Acked-by: Andrew Davis <afd@ti.com>
Link: https://patch.msgid.link/20251127134942.2121-1-johan@kernel.org
Signed-off-by: Nishanth Menon <nm@ti.com>
|
|
There seems to be nothing preventing this driver from being compile
tested so enable that by adding the missing input prompt.
Fixes: 907a2b7e2fc7 ("soc: ti: add k3 platforms chipid module driver")
Signed-off-by: Johan Hovold <johan@kernel.org>
Link: https://patch.msgid.link/20251127135455.2497-1-johan@kernel.org
Signed-off-by: Nishanth Menon <nm@ti.com>
|
|
According to ACPI spec, entry method in LPI sub-package must be a
buffer or an integer.
The driver disables any state whose entry method is invalid
by zeroing flags in struct acpi_lpi_state.
The entry method is essential to cpuidle, so a debug log for this case
is useful to developers.
Signed-off-by: Huisong Li <lihuisong@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
[ rjw: Subject and changelog edits, changed "illegal" to "invalid" ]
Link: https://patch.msgid.link/20251125064702.3666149-1-lihuisong@huawei.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
In pcie_ptm_create_debugfs(), if devm_kasprintf() fails after successfully
allocating ptm_debugfs with kzalloc(), the function returns without freeing
the allocated memory, resulting in a memory leak.
Free ptm_debugfs before returning in the devm_kasprintf() error path and in
pcie_ptm_destroy_debugfs().
Fixes: 132833405e61 ("PCI: Add debugfs support for exposing PTM context")
Signed-off-by: Aadityarangan Shridhar Iyengar <adiyenga@cisco.com>
[bhelgaas: squash additional fix from Mani:
https://lore.kernel.org/r/pdp4xc4d5ee3e547mmdro5riui3mclduqdl7j6iclfbozo2a4c@7m3qdm6yrhuv]
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: Manivannan Sadhasivam <mani@kernel.org>
Link: https://patch.msgid.link/20260111163650.33168-1-adiyenga@cisco.com
|
|
In acpi_processor_errata_piix4(), the pointer dev is first assigned an IDE
device and then reassigned an ISA device:
  dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB, ...);
  dev = pci_get_subsys(..., PCI_DEVICE_ID_INTEL_82371AB_0, ...);
If the first lookup succeeds but the second fails, dev becomes NULL. This
leads to a potential null-pointer dereference when dev_dbg() is called:
  if (errata.piix4.bmisx)
    dev_dbg(&dev->dev, ...);
To prevent this, use two temporary pointers and retrieve each device
independently, avoiding overwriting dev with a possible NULL value.
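A minimal user-space sketch of the two-pointer pattern described above.
Everything in it (struct fake_dev, struct errata_report, check_errata())
is a hypothetical stand-in for illustration, not the kernel code: each
lookup result keeps its own pointer, and each message only uses a pointer
that was checked for NULL.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a looked-up device (not struct pci_dev). */
struct fake_dev {
    const char *name;
    int erratum;    /* nonzero if this device needs the workaround */
};

/* Which device each debug message would name; NULL means no message. */
struct errata_report {
    const char *first_msg_dev;
    const char *second_msg_dev;
};

/*
 * Either argument may be NULL, modelling a failed lookup. Because the
 * two results live in separate pointers, a failed second lookup cannot
 * overwrite the result of the first one.
 */
struct errata_report check_errata(const struct fake_dev *ide_lookup,
                                  const struct fake_dev *isa_lookup)
{
    const struct fake_dev *ide_dev = ide_lookup;
    const struct fake_dev *isa_dev = isa_lookup;
    struct errata_report r = { NULL, NULL };

    if (ide_dev && ide_dev->erratum)
        r.first_msg_dev = ide_dev->name;
    if (isa_dev && isa_dev->erratum)
        r.second_msg_dev = isa_dev->name;
    return r;
}
```

With the IDE device present and the ISA lookup failing, the first message
still names the IDE device and nothing is dereferenced through NULL.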
Signed-off-by: Tuo Li <islituo@gmail.com>
[ rjw: Subject adjustment, added an empty code line ]
Link: https://patch.msgid.link/20260111163214.202262-1-islituo@gmail.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The JWIPC JVC9100 has its serial IRQs (10 and 11) described
as ActiveLow in the DSDT, which the kernel overrides to EdgeHigh, which
breaks the serial ports.
irq 10, level, active-low, shared, skip-override
irq 11, level, active-low, shared, skip-override
Add the JVC9100 to the irq1_level_low_skip_override[] quirk table to fix
this.
Signed-off-by: Ai Chao <aichao@kylinos.cn>
Link: https://patch.msgid.link/20260113072719.4154485-1-aichao@kylinos.cn
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Enable Runtime PM for the mipi_i3c_hci_pci driver. Introduce helpers to
allow and forbid Runtime PM during probe and remove, using pm_runtime APIs.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-22-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Implement optional Runtime PM support for the MIPI I3C HCI driver.
Introduce runtime suspend and resume callbacks to manage bus state and
restore hardware configuration after resume. Optionally enable autosuspend
with a default delay of 1 second, and add helper functions to control
Runtime PM during probe and remove.
Read quirks from i3c_hci_driver_ids[] and set new quirk
HCI_QUIRK_RPM_ALLOWED for intel-lpss-i3c devices to enable runtime PM for
them.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-21-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Master drivers currently manage Runtime PM individually, but all require
runtime resume for bus operations. This can be centralized in common code.
Add optional Runtime PM support to ensure the parent device is runtime
resumed before bus operations and auto-suspended afterward.
Notably, do not call ->bus_cleanup() if runtime resume fails. Master
drivers that opt in to core runtime PM support must take that into account.
Also provide an option to allow IBIs and hot-joins while runtime suspended.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-20-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Prepare for future reuse. Move master dynamic address setting logic from
i3c_hci_bus_init() into a dedicated helper function,
i3c_hci_set_master_dyn_addr().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-19-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Prepare i3c_hci_reset_and_init() to support runtime resume. Update it to
handle the case where the I/O mode has already been selected.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-18-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Prepare for future reuse. Move core initialization logic from
i3c_hci_init() into a dedicated helper function,
i3c_hci_reset_and_init().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-17-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Prepare for future reuse. Move the IO mode setting logic from
i3c_hci_init() into a dedicated helper function, i3c_hci_set_io_mode().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-16-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Prepare for future reuse of the reset sequence in other contexts, such as
power management. Move the software reset logic from i3c_hci_init() into a
dedicated helper function, i3c_hci_software_reset().
Software reset should never fail. Print an error message if it does.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-15-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Introduce helper functions to suspend and resume PIO operations. These
are required to prepare for upcoming Runtime PM support, ensuring that
PIO state is properly managed during power transitions.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-14-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Move the PIO register setup logic out of hci_pio_init() into a new
helper, __hci_pio_init(). This refactoring prepares for Runtime PM
support by allowing PIO registers to be reinitialized independently
after resume.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-13-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Introduce helper functions to suspend and resume DMA operations. These
are required to prepare for upcoming Runtime PM support, ensuring that
DMA state is properly managed during power transitions.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-12-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Split the ring setup logic out of hci_dma_init() into a new helper
hci_dma_init_rings(). This refactoring prepares for Runtime PM support
by allowing DMA rings to be reinitialized independently after resume.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-11-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Add a dedicated function to restore the Device Address Table (DAT) in
preparation for Runtime PM support. This will allow reprogramming the DAT
after the controller resumes from a low-power state.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-10-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Prepare for Runtime PM support, which requires restoring the Device Address
Table (DAT) registers after resume. Maintain a copy of DAT in memory so it
can be reprogrammed when the controller is powered back up.
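The shadow-copy idea can be sketched in user space as follows. This is an
illustration under stated assumptions, not the driver's code: struct
dat_ctx, DAT_ENTRIES and the dat_*() helpers are hypothetical names, and a
plain array stands in for the DAT registers.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DAT_ENTRIES 8   /* hypothetical table size */

struct dat_ctx {
    uint32_t hw[DAT_ENTRIES];      /* stands in for the DAT registers */
    uint32_t shadow[DAT_ENTRIES];  /* in-memory copy kept by the driver */
};

/* Every write updates the shadow copy as well as the "hardware". */
void dat_write(struct dat_ctx *c, unsigned int idx, uint32_t val)
{
    assert(idx < DAT_ENTRIES);
    c->shadow[idx] = val;
    c->hw[idx] = val;
}

/* Model a power-down in which the register contents are lost. */
void dat_power_loss(struct dat_ctx *c)
{
    memset(c->hw, 0, sizeof(c->hw));
}

/* Reprogram the registers from the shadow copy after resume. */
void dat_restore(struct dat_ctx *c)
{
    for (unsigned int i = 0; i < DAT_ENTRIES; i++)
        c->hw[i] = c->shadow[i];
}
```

The cost of this approach is one extra store per DAT write; in exchange,
resume does not need to rebuild the table from device state.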
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-9-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The driver already uses devres for resource management, but the standard
resource-managed DMA allocation helpers cannot be used because they assume
the DMA device matches the managed device.
To address this, factor out the deallocation logic from hci_dma_cleanup()
into a new helper, hci_dma_free(), and register it as a devres action.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-8-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The driver already uses managed resources, so convert the PIO data
structure allocation to devm_kzalloc(). Remove the manual kfree().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-7-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The driver already uses managed resources, so convert the Device Address
Table (DAT) bitmap allocation to use devm_bitmap_zalloc(). Remove the
manual cleanup routine.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-6-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
IBI disable failures are not indicative of a software bug, so using
WARN_ON() is not appropriate. Replace these warnings with dev_err().
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-5-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
To prevent inconsistent state when an error occurs, ensure the hot-join
flag is updated only when enabling or disabling hot-join succeeds.
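A small sketch of the "update the flag only on success" rule, written as
stand-alone C. The names struct fake_master, hw_set_hotjoin() and
set_hotjoin() are hypothetical and only model the behaviour described
above.

```c
#include <assert.h>
#include <stdbool.h>

struct fake_master {
    bool hotjoin;    /* cached state reported to user space */
    int hw_result;   /* hypothetical: what the controller op returns */
};

/* Hypothetical controller operation: 0 on success, negative on error. */
int hw_set_hotjoin(struct fake_master *m, bool enable)
{
    (void)enable;
    return m->hw_result;
}

int set_hotjoin(struct fake_master *m, bool enable)
{
    int ret = hw_set_hotjoin(m, enable);

    if (ret)
        return ret;        /* failure: leave the cached flag untouched */

    m->hotjoin = enable;   /* success: the flag now matches the hardware */
    return 0;
}
```

If the flag were written before the controller call, a failed call would
leave the cached state claiming hot-join is enabled when it is not.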
Fixes: 317bacf960a48 ("i3c: master: add enable(disable) hot join in sys entry")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-4-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
Wait for the bus to fully disable before proceeding, ensuring that no
operations are still in progress. Synchronize the IRQ handler only after
interrupt signals have been disabled. This approach also handles cases
where bus disable might fail, preventing race conditions and ensuring a
consistent shutdown sequence.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-3-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
The MIPI I3C HCI specification does not define reset values for
RING_OPERATION1 fields, and some controllers (e.g., Intel) do not clear
them during a software reset. Ensure the ring pointers are explicitly
set to zero during bus initialization to avoid inconsistent state.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113072702.16268-2-adrian.hunter@intel.com
Signed-off-by: Alexandre Belloni <alexandre.belloni@bootlin.com>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media
Pull media fixes from Mauro Carvalho Chehab:
- ov02c10: some fixes related to preserving bayer pattern and
horizontal control
- ipu-bridge: Add quirks for some Dell XPS laptops with inverted
sensors
- mali-c55: Fix version identifier logic
- rzg2l-cru: csi-2: fix RZ/V2H input sizes on some variants
* tag 'media/v6.19-3' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media:
media: ov02c10: Remove unnecessary hflip and vflip pointers
media: ipu-bridge: Add DMI quirk for Dell XPS laptops with upside down sensors
media: ov02c10: Fix the horizontal flip control
media: ov02c10: Adjust x-win/y-win when changing flipping to preserve bayer-pattern
media: ov02c10: Fix bayer-pattern change after default vflip change
media: rzg2l-cru: csi-2: Support RZ/V2H input sizes
media: uapi: mali-c55-config: Remove version identifier
media: mali-c55: Remove duplicated version check
media: Documentation: mali-c55: Use v4l2-isp version identifier
|
|
During pre-production development, drivers may provide both ACPI and OF
match tables while a formal ACPI HID for the device is not yet
allocated. Such devices are enumerated via PRP0001. In this case,
acpi_device_get_match_data() consults only the driver’s ACPI match table
and returns NULL, even though the device was successfully matched via
PRP0001.
This behavior also risks breaking existing PRP0001 setups if a driver
later gains an ACPI HID, as the presence of an ACPI match table changes
the match-data lookup path.
Make acpi_device_get_match_data() use the same precedence as driver
matching by using __acpi_match_device(). Return match data from the
acpi_id or of_id that was actually matched.
Remove now-unused acpi_of_device_get_match_data().
Signed-off-by: Kartik Rajput <kkartik@nvidia.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Link: https://patch.msgid.link/20260114082306.48119-1-kkartik@nvidia.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
After U-Boot initializes PCIe with "pcie enum", Linux fails to detect
an NVMe disk on some boot cycles with:
phy phy-32f00000.pcie-phy.0: phy poweron failed --> -110
Discussion with NXP identified that the iMX8MP PCIe PHY PLL may fail to
lock when re-initialized without a reset cycle [1].
The issue reproduces on 7% of tested hardware platforms, with a 30-40%
failure rate per affected device across boot cycles.
Insert a reset cycle in the power-on routine to ensure the PHY is
initialized from a known state.
[1] https://community.nxp.com/t5/i-MX-Processors/iMX8MP-PCIe-initialization-in-U-Boot/m-p/2248437#M242401
Signed-off-by: Rafael Beims <rafael.beims@toradex.com>
Cc: stable@vger.kernel.org
Link: https://patch.msgid.link/20251223150254.1075221-1-rafael@beims.me
Signed-off-by: Vinod Koul <vkoul@kernel.org>
|
|
When Firmware First is enabled, BIOS handles errors first and then it
makes them available to the kernel via the Common Platform Error Record
(CPER) sections (UEFI 2.11 Appendix N.2.13). Linux parses the CPER
sections via one of two similar paths, either ELOG or GHES. The errors
managed by ELOG are signaled to the BIOS by the I/O Machine Check
Architecture (I/O MCA).
Currently, ELOG and GHES show some inconsistencies in how they report to
userspace via trace events.
Therefore, make the two mentioned paths act similarly by tracing the CPER
CXL Protocol Error Section.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20260114101543.85926-6-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Make a helper out of cxl_cper_post_prot_err() that checks the CXL agent
type and copies the CPER CXL protocol error information to a work data
structure.
Export the new symbol for reuse by ELOG.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20260114101543.85926-5-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Move the CPER CXL protocol errors validity check out of
cxl_cper_post_prot_err() to new cxl_cper_sec_prot_err_valid() and limit
the serial number check only to CXL agents that are CXL devices (UEFI
v2.10, Appendix N.2.13).
Export the new symbol for reuse by ELOG.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
[ rjw: Subject tweak ]
Link: https://patch.msgid.link/20260114101543.85926-4-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
I/O Machine Check Architecture events may signal failing PCIe components
or links. The AER event contains details on what was happening on the wire
when the error was signaled.
Trace the CPER PCIe Error section (UEFI v2.11, Appendix N.2.7) reported
by the I/O MCA.
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20260114101543.85926-3-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ghes_do_proc() has a catch-all for unknown or unhandled CPER formats
(UEFI v2.11 Appendix N 2.3), but extlog_print() does not. This gap was
noticed by a RAS test that injected CXL protocol errors which were
notified to extlog_print() via the IOMCA (I/O Machine Check
Architecture) mechanism. Bring parity to the extlog_print() path by
including a similar log_non_standard_event().
Reviewed-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Dave Jiang <dave.jiang@intel.com>
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Reviewed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com>
Reviewed-by: Qiuxu Zhuo <qiuxu.zhuo@intel.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Signed-off-by: Fabio M. De Francesco <fabio.m.de.francesco@linux.intel.com>
Link: https://patch.msgid.link/20260114101543.85926-2-fabio.m.de.francesco@linux.intel.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Performance testing on ARMv8 systems shows significant overhead from
error status handling in the SEA error path:
- ghes_peek_estatus(): 8,138.3 ns (21,160 cycles).
- ghes_clear_estatus(): 2,038.3 ns (5,300 cycles).
Apply the same optimization used in ghes_notify_nmi() to
ghes_notify_sea() by checking for active errors before processing.
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20260112032239.30023-4-xueshuai@linux.alibaba.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Refactor the GHES driver by extracting common functionality into
reusable helper functions:
1. ghes_has_active_errors() - Checks if any error sources in a given list
have active errors
2. ghes_map_error_status() - Maps error status address to virtual address
3. ghes_unmap_error_status() - Unmaps error status virtual address
4. Use `guard(rcu)()` instead of explicit `rcu_read_lock()`/`rcu_read_unlock()`.
These helpers eliminate code duplication in the NMI path and prepare for
similar usage in the SEA path in a subsequent patch.
No functional change intended.
Tested-by: Tony Luck <tony.luck@intel.com>
Reviewed-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20260112032239.30023-3-xueshuai@linux.alibaba.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
ghes_notify_nmi() is called for every NMI and must check whether the NMI was
generated because an error was signalled by platform firmware.
This check is very expensive as for each registered GHES NMI source it reads
from the acpi generic address attached to this error source to get the physical
address of the acpi_hest_generic_status block. It then checks the "block_status"
to see if an error was logged.
The ACPI/APEI code must create virtual mappings for each of those physical
addresses, and tear them down afterwards. On an Icelake system this takes around
15,000 TSC cycles. Enough to disturb efforts to profile system performance.
If that were not bad enough, there are some atomic accesses in the code path
that will cause cache line bounces between CPUs. A problem that gets worse as
the core count increases.
But BIOS changes neither the acpi generic address nor the physical address of
the acpi_hest_generic_status block. So this walk can be done once when the NMI is
registered to save the virtual address (unmapping if the NMI is ever unregistered).
The "block_status" can be checked directly in the NMI handler. This can be done
without any atomic accesses.
Resulting time to check that there is not an error record is around 900 cycles.
Reported-by: Andi Kleen <andi.kleen@intel.com>
Signed-off-by: Tony Luck <tony.luck@intel.com>
Tested-by: Tony Luck <tony.luck@intel.com>
Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
Link: https://patch.msgid.link/20260112032239.30023-2-xueshuai@linux.alibaba.com
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The current logic at cper_print_fw_err() doesn't check whether the
error record length is big enough to cover the offset. With bad firmware,
if the offset is beyond the actual record, length -= offset will
underflow, making it dump the entire memory.
The end result can be:
- the logic taking a lot of time dumping large regions of memory;
- data disclosure due to the memory dumps;
- an OOPS, if it tries to dump an unmapped memory region.
Fix it by checking if the section length is too small before doing
a hex dump.
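The underflow and the check that avoids it can be illustrated with a few
lines of stand-alone C. This is a sketch, not the kernel function: the
helper name payload_len() and the use of uint32_t for both values are
assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Return the number of payload bytes that follow `offset` in a record of
 * `length` bytes, or 0 when the record is too short to contain the
 * offset at all. Checking before subtracting avoids the unsigned
 * wrap-around of `length - offset`.
 */
uint32_t payload_len(uint32_t length, uint32_t offset)
{
    if (length < offset)
        return 0;          /* malformed record: nothing to dump */
    return length - offset;
}
```

Without the check, a 16-byte record with an offset of 32 would yield
16 - 32 in 32-bit unsigned arithmetic, which wraps to 4294967280 bytes
to dump.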
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/1752b5ba63a3e2f148ddee813b36c996cc617e86.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The logic at ghes_new() prevents allocating records that are too large by
checking whether they're bigger than GHES_ESTATUS_MAX_SIZE (currently, 64KB).
However, the allocation is done with the actual number of pages from the
CPER BIOS table location, which can be smaller.
Bad firmware could therefore send data with a different size, which might
be bigger than the allocated memory, causing an OOPS:
Unable to handle kernel paging request at virtual address fff00000f9b40000
Mem abort info:
ESR = 0x0000000096000007
EC = 0x25: DABT (current EL), IL = 32 bits
SET = 0, FnV = 0
EA = 0, S1PTW = 0
FSC = 0x07: level 3 translation fault
Data abort info:
ISV = 0, ISS = 0x00000007, ISS2 = 0x00000000
CM = 0, WnR = 0, TnD = 0, TagAccess = 0
GCS = 0, Overlay = 0, DirtyBit = 0, Xs = 0
swapper pgtable: 4k pages, 52-bit VAs, pgdp=000000008ba16000
[fff00000f9b40000] pgd=180000013ffff403, p4d=180000013fffe403, pud=180000013f85b403, pmd=180000013f68d403, pte=0000000000000000
Internal error: Oops: 0000000096000007 [#1] SMP
Modules linked in:
CPU: 0 UID: 0 PID: 303 Comm: kworker/0:1 Not tainted 6.19.0-rc1-00002-gda407d200220 #34 PREEMPT
Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
Workqueue: kacpi_notify acpi_os_execute_deferred
pstate: 214020c5 (nzCv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
pc : hex_dump_to_buffer+0x30c/0x4a0
lr : hex_dump_to_buffer+0x328/0x4a0
sp : ffff800080e13880
x29: ffff800080e13880 x28: ffffac9aba86f6a8 x27: 0000000000000083
x26: fff00000f9b3fffc x25: 0000000000000004 x24: 0000000000000004
x23: ffff800080e13905 x22: 0000000000000010 x21: 0000000000000083
x20: 0000000000000001 x19: 0000000000000008 x18: 0000000000000010
x17: 0000000000000001 x16: 00000007c7f20fec x15: 0000000000000020
x14: 0000000000000008 x13: 0000000000081020 x12: 0000000000000008
x11: ffff800080e13905 x10: ffff800080e13988 x9 : 0000000000000000
x8 : 0000000000000000 x7 : 0000000000000001 x6 : 0000000000000020
x5 : 0000000000000030 x4 : 00000000fffffffe x3 : 0000000000000000
x2 : ffffac9aba78c1c8 x1 : ffffac9aba76d0a8 x0 : 0000000000000008
Call trace:
hex_dump_to_buffer+0x30c/0x4a0 (P)
print_hex_dump+0xac/0x170
cper_estatus_print_section+0x90c/0x968
cper_estatus_print+0xf0/0x158
__ghes_print_estatus+0xa0/0x148
ghes_proc+0x1bc/0x220
ghes_notify_hed+0x5c/0xb8
notifier_call_chain+0x78/0x148
blocking_notifier_call_chain+0x4c/0x80
acpi_hed_notify+0x28/0x40
acpi_ev_notify_dispatch+0x50/0x80
acpi_os_execute_deferred+0x24/0x48
process_one_work+0x15c/0x3b0
worker_thread+0x2d0/0x400
kthread+0x148/0x228
ret_from_fork+0x10/0x20
Code: 6b14033f 540001ad a94707e2 f100029f (b8747b44)
---[ end trace 0000000000000000 ]---
Prevent that by taking the actual allocated area into account when
checking for CPER length.
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/4e70310a816577fabf37d94ed36cde4ad62b1e0a.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
There is logic inside GHES/CPER to detect if the section_length
is too small, but it doesn't detect if it is too big.
Currently, if the firmware receives an ARM processor CPER record
stating that a section length is big, the kernel will blindly trust
section_length, producing a very long dump. For instance, a 67-byte
record with ERR_INFO_NUM set to 46198 and section length
set to 854918320 would dump a lot of data, going way past the
firmware memory-mapped area.
Fix it by adding logic to prevent it from going past the buffer
if ERR_INFO_NUM is too big, making it report instead:
[Hardware Error]: Hardware error from APEI Generic Hardware Error Source: 1
[Hardware Error]: event severity: recoverable
[Hardware Error]: Error 0, type: recoverable
[Hardware Error]: section_type: ARM processor error
[Hardware Error]: MIDR: 0xff304b2f8476870a
[Hardware Error]: section length: 854918320, CPER size: 67
[Hardware Error]: section length is too big
[Hardware Error]: firmware-generated error record is incorrect
[Hardware Error]: ERR_INFO_NUM is 46198
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject and changelog tweaks ]
Link: https://patch.msgid.link/41cd9f6b3ace3cdff7a5e864890849e4b1c58b63.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
If the BIOS generates a very small ARM Processor Error, or
an incomplete one, the current logic will fail to dereference
err->section_length
and
ctx_info->size
Add checks to avoid that. With these changes, such GHESv2
records won't cause OOPSes like this:
[ 1.492129] Internal error: Oops: 0000000096000005 [#1] SMP
[ 1.495449] Modules linked in:
[ 1.495820] CPU: 0 UID: 0 PID: 9 Comm: kworker/0:0 Not tainted 6.18.0-rc1-00017-gabadcc3553dd-dirty #18 PREEMPT
[ 1.496125] Hardware name: QEMU QEMU Virtual Machine, BIOS unknown 02/02/2022
[ 1.496433] Workqueue: kacpi_notify acpi_os_execute_deferred
[ 1.496967] pstate: 814000c5 (Nzcv daIF +PAN -UAO -TCO +DIT -SSBS BTYPE=--)
[ 1.497199] pc : log_arm_hw_error+0x5c/0x200
[ 1.497380] lr : ghes_handle_arm_hw_error+0x94/0x220
0xffff8000811c5324 is in log_arm_hw_error (../drivers/ras/ras.c:75).
70 err_info = (struct cper_arm_err_info *)(err + 1);
71 ctx_info = (struct cper_arm_ctx_info *)(err_info + err->err_info_num);
72 ctx_err = (u8 *)ctx_info;
73
74 for (n = 0; n < err->context_info_num; n++) {
75 sz = sizeof(struct cper_arm_ctx_info) + ctx_info->size;
76 ctx_info = (struct cper_arm_ctx_info *)((long)ctx_info + sz);
77 ctx_len += sz;
78 }
79
and similar ones while trying to access section_length on an
error dump with too small size.
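The kind of check described above can be sketched as a bounds-checked walk
over variable-length entries. This is an illustration only: struct ctx_hdr
is a simplified stand-in for the context-info header, and walk_entries()
is a hypothetical helper, not the function added by the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for a variable-length context entry header. */
struct ctx_hdr {
    uint16_t version;
    uint16_t type;
    uint32_t size;    /* bytes of payload following this header */
};

/*
 * Walk `count` entries stored back to back in buf[0..buf_len).
 * Returns the total number of bytes consumed, or -1 if any header or
 * payload would extend past the end of the buffer.
 */
long walk_entries(const uint8_t *buf, size_t buf_len, unsigned int count)
{
    size_t off = 0;    /* invariant: off <= buf_len */

    assert(buf != NULL);
    for (unsigned int n = 0; n < count; n++) {
        struct ctx_hdr hdr;

        if (buf_len - off < sizeof(hdr))
            return -1;                  /* header is truncated */
        memcpy(&hdr, buf + off, sizeof(hdr));
        off += sizeof(hdr);

        if (hdr.size > buf_len - off)
            return -1;                  /* payload is truncated */
        off += hdr.size;
    }
    return (long)off;
}
```

Comparing against the remaining space (buf_len - off) rather than adding
sizes to the offset first keeps the checks themselves free of overflow.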
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Reviewed-by: Jonathan Cameron <jonathan.cameron@huawei.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>
[ rjw: Subject tweaks ]
Link: https://patch.msgid.link/7fd9f38413be05ee2d7cfdb0dc31ea2274cf1a54.1767871950.git.mchehab+huawei@kernel.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
The commit d2fe192348f9 (“nvme: only allow entering LIVE from CONNECTING
state”) disallows controller state transitions directly from RESETTING
to LIVE. However, the NVMe PCIe subsystem reset path relies on this
transition to recover the controller on PowerPC (PPC) systems.
On PPC systems, issuing a subsystem reset causes a temporary loss of
communication with the NVMe adapter. A subsequent PCIe MMIO read then
triggers EEH recovery, which restores the PCIe link and brings the
controller back online. For EEH recovery to proceed correctly, the
controller must transition back to the LIVE state.
Due to the changes introduced by commit d2fe192348f9 (“nvme: only allow
entering LIVE from CONNECTING state”), the controller can no longer
transition directly from RESETTING to LIVE. As a result, EEH recovery
exits prematurely, leaving the controller stuck in the RESETTING state.
Fix this by explicitly transitioning the controller state from RESETTING
to CONNECTING and then to LIVE. This satisfies the updated state
transition rules and allows the controller to be successfully recovered
on PPC systems following a PCIe subsystem reset.
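The two-step transition can be modelled with a deliberately reduced state
machine. This sketch is an assumption-laden illustration: it has only
three states and three allowed transitions, whereas the real controller
state machine has more of both, and all names in it are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum ctrl_state { ST_LIVE, ST_RESETTING, ST_CONNECTING };

/* Reduced rule set: LIVE may be entered only from CONNECTING. */
bool transition_allowed(enum ctrl_state from, enum ctrl_state to)
{
    switch (to) {
    case ST_LIVE:
        return from == ST_CONNECTING;
    case ST_CONNECTING:
        return from == ST_RESETTING;
    case ST_RESETTING:
        return from == ST_LIVE;
    }
    return false;
}

/* Apply a transition if the rules allow it; report whether it happened. */
bool change_state(enum ctrl_state *s, enum ctrl_state to)
{
    assert(s != NULL);
    if (!transition_allowed(*s, to))
        return false;
    *s = to;
    return true;
}

/* Recovery path: pass through CONNECTING instead of jumping to LIVE. */
bool recover(enum ctrl_state *s)
{
    return change_state(s, ST_CONNECTING) && change_state(s, ST_LIVE);
}
```

In this model a direct RESETTING to LIVE request is rejected and leaves
the state unchanged, which mirrors the stuck-in-RESETTING symptom; the
two-step path succeeds.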
Cc: stable@vger.kernel.org
Fixes: d2fe192348f9 ("nvme: only allow entering LIVE from CONNECTING state")
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
|
|
Because vb2_dma_sg_alloc() cannot allocate buffers that respect the
device's DMA limits [1][2], affected devices always hit the following
error: "swiotlb buffer is full (sz: 393216 bytes), total 65536 (slots),
used 2358 (slots)" and the UVC gadget function cannot work at all.
The videobuf2-dma-sg.c driver has no proper fix for this issue so far.
For the UVC gadget, the videobuf2 subsystem does not call dma_map() on
large vmalloc-backed buffers when allocating the video buffers, but it
does for dma_sg-backed buffers, so the issue only occurs with
vb2_dma_sg_alloc().
To work around the issue, retry vb2_reqbufs() with vb2_vmalloc_memops if
it fails to allocate buffers with vb2_dma_sg_memops. With vmalloc-backed
buffers, the UVC gadget allocates a small buffer for each usb_request to
do the DMA transfer, and the uvc driver then copies data from the large
buffer into the small buffers.
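The retry logic can be sketched in plain C as follows. This is not the
driver code: reqbufs_fn, enum backend_sketch and the two stubs are
hypothetical stand-ins for the real videobuf2 calls, used only to show the
"try scatter-gather first, then fall back" control flow.

```c
#include <errno.h>

/* Hypothetical labels for the two memory backends discussed above. */
enum backend_sketch { BACKEND_DMA_SG, BACKEND_VMALLOC };

/* Hypothetical allocator callback: 0 on success, negative errno on failure. */
typedef int (*reqbufs_fn)(enum backend_sketch backend, unsigned int count);

/* Try the scatter-gather backend first, then retry with vmalloc. */
static int reqbufs_with_fallback(reqbufs_fn reqbufs, unsigned int count,
				 enum backend_sketch *used)
{
	int ret;

	ret = reqbufs(BACKEND_DMA_SG, count);
	if (ret == 0) {
		*used = BACKEND_DMA_SG;
		return 0;
	}

	ret = reqbufs(BACKEND_VMALLOC, count);
	if (ret == 0)
		*used = BACKEND_VMALLOC;
	return ret;
}

/* Stub mimicking a platform where only the scatter-gather path fails. */
static int stub_sg_fails(enum backend_sketch backend, unsigned int count)
{
	(void)count;
	return backend == BACKEND_DMA_SG ? -ENOMEM : 0;
}

/* Stub where every backend fails, so the error is propagated. */
static int stub_all_fail(enum backend_sketch backend, unsigned int count)
{
	(void)backend;
	(void)count;
	return -ENOMEM;
}
```

The caller learns which backend was used through the "used" output
parameter, which is left untouched when both attempts fail.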
Link[1]: https://lore.kernel.org/linux-media/20230828075420.2009568-1-anle.pan@nxp.com/
Link[2]: https://lore.kernel.org/linux-media/20230914145812.12851-1-hui.fang@nxp.com/
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Link: https://patch.msgid.link/20260113-uvc-gadget-fix-patch-v2-4-62950ef5bcb5@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
uvcg_queue_init() may fail, but its return value is currently ignored.
Propagate the error code from uvcg_queue_init() to correctly report
initialization failures.
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://patch.msgid.link/20260113-uvc-gadget-fix-patch-v2-3-62950ef5bcb5@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
According to USB specification:
For full-/high-speed isochronous endpoints, the bInterval value is
used as the exponent for a 2^(bInterval-1) value.
To correctly convert bInterval to interval_duration:
interval_duration = 2^(bInterval-1) * frame_interval
Because video->interval is in units of 100 ns, add a comment to make
that clear.
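As a worked illustration of the formula above, the following sketch
computes the interval duration in units of 100 ns. The function name and
the two constants are hypothetical; the constants assume a 125 us
microframe for high-speed and a 1 ms frame for full-speed.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed frame lengths, expressed in units of 100 ns. */
#define FRAME_100NS		10000u	/* full-speed frame: 1 ms */
#define MICROFRAME_100NS	 1250u	/* high-speed microframe: 125 us */

/*
 * interval_duration = 2^(bInterval-1) * frame_interval
 * Valid bInterval values for isochronous endpoints are 1..16.
 */
static uint32_t iso_interval_duration(unsigned int b_interval,
				      uint32_t frame_interval)
{
	assert(b_interval >= 1 && b_interval <= 16);
	return (UINT32_C(1) << (b_interval - 1)) * frame_interval;
}
```

For example, bInterval = 4 on a high-speed endpoint gives
8 * 1250 = 10000 units, that is 1 ms.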
Fixes: 48dbe731171e ("usb: gadget: uvc: set req_size and n_requests based on the frame interval")
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://patch.msgid.link/20260113-uvc-gadget-fix-patch-v2-2-62950ef5bcb5@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
The current req_payload_size calculation has two issues:
(1) The first time req_payload_size is calculated for the buffers,
reqs_per_frame = 0 is used as the divisor in DIV_ROUND_UP(), so
the result is undefined.
This happens because VIDIOC_STREAMON is always executed after
VIDIOC_QBUF, so video->reqs_per_frame is 0 until VIDIOC_STREAMON
is run.
(2) buf->req_payload_size may be larger than max_req_size.
Take the YUYV pixel format as an example:
If bInterval = 1, video->interval = 666666, high-speed:
video->reqs_per_frame = 666666 / 1250 = 534
720p: buf->req_payload_size = 1843200 / 534 = 3452
1080p: buf->req_payload_size = 4147200 / 534 = 7766
The controller cannot operate normally with such a req_payload_size.
To fix these issues, assign max_req_size to buf->req_payload_size when
video->reqs_per_frame = 0, and limit buf->req_payload_size to
video->req_size if it is larger than video->req_size. Since max_req_size
is used in many places, add it to struct uvc_video and set its value once
the endpoint is enabled.
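The two guards described above can be sketched as a small helper. This is
an illustration rather than the driver code: req_payload_size_sketch() is
a hypothetical name, DIV_ROUND_UP is redefined locally so the example
compiles outside the kernel, and the value 3072 in the usage note is only
an example limit.

```c
#include <assert.h>
#include <stdint.h>

/* Local copy of the rounding-up division helper for this sketch. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

/*
 * Return the per-request payload size for a frame of frame_bytes bytes.
 * - If reqs_per_frame is still 0 (streaming not started), fall back to
 *   max_req_size instead of dividing by zero.
 * - Otherwise never return more than req_size.
 */
static uint32_t req_payload_size_sketch(uint32_t frame_bytes,
					uint32_t reqs_per_frame,
					uint32_t req_size,
					uint32_t max_req_size)
{
	uint32_t size;

	assert(req_size > 0 && max_req_size > 0);

	if (reqs_per_frame == 0)
		return max_req_size;

	size = DIV_ROUND_UP(frame_bytes, reqs_per_frame);
	return size > req_size ? req_size : size;
}
```

With the 720p YUYV figures from the example above and an assumed limit of
3072 bytes for both req_size and max_req_size, the helper returns 3072
both before streaming starts (reqs_per_frame = 0) and afterwards
(reqs_per_frame = 534), since the unclamped value would exceed the limit.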
Fixes: 98ad03291560 ("usb: gadget: uvc: set req_length based on payload by nreqs instead of req_size")
Cc: stable@vger.kernel.org
Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://patch.msgid.link/20260113-uvc-gadget-fix-patch-v2-1-62950ef5bcb5@nxp.com
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
Ignore USB role switches if dwc3-apple is already in the desired state.
The USB-C port controller on M2 and M1/M2 Pro/Max/Ultra devices issues
additional interrupts which result in USB role switches to the already
active role.
Ignore these USB role switches to ensure the USB-C port controller and
dwc3-apple are always in a consistent state. This matches the behaviour
in __dwc3_set_mode() in core.c.
This fixes detection of USB 2.0 and 3.x devices on the affected systems.
The reset caused by the additional role switch appears to leave the USB
devices in a state which prevents detection when the phy and dwc3 are
brought back up again.
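The idea of ignoring redundant role switches can be sketched as an early
return when the requested role equals the current one. The types and
function below are hypothetical and do not reflect the actual dwc3-apple
data structures; the reset counter only stands in for the hardware
reinitialization a real role switch would trigger.

```c
#include <stdbool.h>

/* Hypothetical role values for this sketch. */
enum usb_role_sketch { ROLE_NONE, ROLE_HOST, ROLE_DEVICE };

/* Hypothetical glue state: current role plus a reset counter. */
struct glue_sketch {
	enum usb_role_sketch role;
	unsigned int resets;	/* how often the hardware was reinitialized */
};

/* Switch roles; return false if the request was ignored as redundant. */
static bool set_role_sketch(struct glue_sketch *g, enum usb_role_sketch role)
{
	/* Already in the desired state: do not reset anything. */
	if (g->role == role)
		return false;

	g->resets++;		/* stands in for tearing down and re-probing */
	g->role = role;
	return true;
}
```

A repeated request for the active role therefore leaves both the role and
the reset counter unchanged.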
Fixes: 0ec946d32ef7 ("usb: dwc3: Add Apple Silicon DWC3 glue layer driver")
Cc: stable <stable@kernel.org>
Signed-off-by: Janne Grunau <j@jannau.net>
Acked-by: Thinh Nguyen <Thinh.Nguyen@synopsys.com>
Reviewed-by: Sven Peter <sven@kernel.org>
Tested-by: Sven Peter <sven@kernel.org> # M1 mac mini and macbook air
Link: https://patch.msgid.link/20260109-apple-dwc3-role-switch-v1-1-11623b0f6222@jannau.net
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
|
|
All MERT catastrophic errors except the VF's LMTT fault are serious, so
we shouldn't limit our handling to printing debug messages.
Change CATERR message to error level and then declare the device
as wedged to match expectation from the design document. For the
LMTT faults, add a note about adding tracking of this unexpected
VF activity.
While at it, rename register field definitions to match the BSpec.
Also drop trailing include guard name from the regs.h file.
BSpec: 74625
Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com>
Cc: Lukasz Laguna <lukasz.laguna@intel.com>
Reviewed-by: Lukasz Laguna <lukasz.laguna@intel.com>
Link: https://patch.msgid.link/20260112183716.28700-1-michal.wajdeczko@intel.com
|