aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2025-12-03drm/panthor: fix queue_reset_timeout_lockedChia-I Wu1-12/+12
queue_check_job_completion calls queue_reset_timeout_locked to reset the timeout when progress is made. We want the reset to happen when the timeout is running, not when it is suspended. Fixes: 345c5b7cc0f85 ("drm/panthor: Make the timeout per-queue instead of per-job") Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20251202174028.1600218-1-olvaffe@gmail.com
2025-12-03drm/panthor: Remove redundant call to disable the MCUAkash Goel1-1/+0
This commit removes the redundant call to disable the MCU firmware in the suspend path. Fixes: 514072549865 ("drm/panthor: Support GLB_REQ.STATE field for Mali-G1 GPUs") Signed-off-by: Akash Goel <akash.goel@arm.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20251203091911.145623-1-akash.goel@arm.com
2025-12-03drm/panthor: Unlock the locked region before disabling an ASBoris Brezillon1-0/+10
An AS can be disabled in the middle of a VM operation (VM being evicted from an AS slot, for instance). In that case, we need the locked section to be unlocked before releasing the slot. v2: - Add an lockdep_assert_held() in panthor_mmu_as_disable() - Collect R-bs v3: - Don't reset the locked_region range in the as_disable() path Fixes: 6e2d3b3e8589 ("drm/panthor: Add support for atomic page table updates") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20251203121750.404340-4-boris.brezillon@collabora.com
2025-12-03drm/panthor: Make sure caches are flushed/invalidated when an AS is recycledBoris Brezillon1-6/+20
When we re-assign a slot to a different VM, we need to make sure the old VM caches are flushed before doing the switch. Specialize panthor_mmu_as_disable() so we can skip the slot programmation while still getting the cache flushing, and call this helper from panthor_vm_active() when an idle slot is recycled. v2: - Collect R-bs Fixes: 6e2d3b3e8589 ("drm/panthor: Add support for atomic page table updates") Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20251203121750.404340-3-boris.brezillon@collabora.com
2025-12-03drm/panthor: Drop a WARN_ON() in group_free_queue()Boris Brezillon1-3/+2
It appears the timeout can still be enabled when we reach that point, because of the asynchronous progress check done on queues that resets the timer when jobs are still in-flight, but progress was made. We could add more checks to make sure the timer is not re-enabled when a group can't run anymore, but we don't have a group to pass to queue_check_job_completion() in some context. It's just as safe (we just want to be sure the timer is stopped before we destroy the queue) and simpler to drop the WARN_ON() in group_free_queue(). v2: - Collect R-bs Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Liviu Dudau <liviu.dudau@arm.com> Link: https://patch.msgid.link/20251203121750.404340-2-boris.brezillon@collabora.com
2025-12-03Merge drm/drm-next into drm-xe-nextThomas Hellström1456-15950/+72361
Backmerging to bring in a needed dependency for the Xe VFIO driver variant. This should ideally have been done before we commited that, so we now have a small window in drm-xe-next where that driver doesn't compile. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512030331.I8CveRre-lkp@intel.com/ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com>
2025-12-03drm/gem-shmem: revert the 8-byte alignment constraintLudovic Desroches1-1/+1
Using drm_mode_size_dumb() to compute the size of dumb buffers introduced an 8-byte alignment constraint on the pitch that wasn’t present before. Let’s remove this constraint, which isn’t necessarily required and may cause buffers to be allocated larger than needed. Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 4977dcecb931 ("drm/gem-shmem: Compute dumb-buffer sizes with drm_mode_size_dumb()") Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251126-lcd_pitch_alignment-v1-2-991610a1e369@microchip.com
2025-12-03drm/gem-dma: revert the 8-byte alignment constraintLudovic Desroches1-1/+1
Using drm_mode_size_dumb() to compute the size of dumb buffers introduced an 8-byte alignment constraint on the pitch that wasn’t present before. Let’s remove this constraint, which isn’t necessarily required and may cause buffers to be allocated larger than needed. Signed-off-by: Ludovic Desroches <ludovic.desroches@microchip.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: dcacfcd35cef ("drm/gem-dma: Compute dumb-buffer sizes with drm_mode_size_dumb()") Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20251126-lcd_pitch_alignment-v1-1-991610a1e369@microchip.com
2025-12-03drm/i915/crtc: Expose sharpness only if num_scalers is >= 2Nemesa Garg1-1/+1
CASF requires the second scaler for sharpness. Do not expose the SHARPNESS_STRENGTH property if the CRTC has fewer than two scalers. v2: Modify header and commit message. [Ankit] Signed-off-by: Nemesa Garg <nemesa.garg@intel.com> Reviewed-by: Ankit Nautiyal <ankit.k.nautiyal@intel.com> Signed-off-by: Animesh Manna <animesh.manna@intel.com> Link: https://patch.msgid.link/20251126084152.3905905-1-nemesa.garg@intel.com
2025-12-03Merge tag 'amd-drm-next-6.19-2025-12-02' of ↵Dave Airlie65-184/+1972
https://gitlab.freedesktop.org/agd5f/linux into drm-next amd-drm-next-6.19-2025-12-02: amdgpu: - Unified MES fix - SMU 11 unbalanced irq fix - Fix for driver reloading on APUs - pp_table sysfs fix - Fix memory leak in fence handling - HDMI fix - DC cursor fixes - eDP panel parsing fix - Brightness fix - DC analog fixes - EDID retry fixes - UserQ fixes - RAS fixes - IP discovery fix - Add missing locking in amdgpu_ttm_access_memory_sdma() - Smart Power OLED fix - PRT and page fault fixes for GC 6-8 - VMID reservation fix - ACP platform device fix - Add missing vm fault handling for GC 11-12 - VPE fix amdkfd: - Partitioning fix Signed-off-by: Dave Airlie <airlied@redhat.com> From: Alex Deucher <alexander.deucher@amd.com> Link: https://patch.msgid.link/20251202220101.2039347-1-alexander.deucher@amd.com
2025-12-02drm/i915/display: Use HAS_LT_PHY() for LT PHY AUX powerGustavo Sousa1-8/+8
Bspec states that the new AUX power enable/disable sequences are associated with the LT PHY. As such, use HAS_LT_PHY() instead of IP checks in those paths in the driver code. While at it, also move the comment that we can't use the power status flag to the "else" branch, since that comment is not applicable for the LT PHY. Bspec: 68967 Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Suraj Kandpal <suraj.kandpal@intel.com> Suggested-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20251202012306.9315-9-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/display: Move HAS_LT_PHY() to intel_display_device.hGustavo Sousa2-2/+1
We will need to HAS_LT_PHY() that macro in code outside of LT PHY implementation. Move its definition to intel_display_device.h. Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Suraj Kandpal <suraj.kandpal@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20251202012306.9315-8-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/display: Use platform check in HAS_LT_PHY()Gustavo Sousa1-1/+1
NVL uses the Lake Tahoe PHY for display output and the driver recently added the macro HAS_LT_PHY() to allow selecting code paths specific for that type of PHY. While NVL uses Xe3p_LPD as display IP, the type of PHY is actually defined at the SoC level, so use a platform check instead of display version. Bspec: 74199 Cc: Suraj Kandpal <suraj.kandpal@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Reviewed-by: Dnyaneshwar Bhadane <dnyaneshwar.bhadane@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20251202012306.9315-7-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/nvls: Add NVL-S display supportSai Teja Pottumuttu2-1/+8
Add platform description and PCI IDs for NVL-S. BSpec: 74201 Signed-off-by: Sai Teja Pottumuttu <sai.teja.pottumuttu@intel.com> Reviewed-by: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://patch.msgid.link/20251202012306.9315-6-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/xe3p_lpd: Handle underrun debug bitsGustavo Sousa6-0/+175
Xe3p_LPD added several bits containing information that can be relevant to debugging FIFO underruns. Add the logic necessary to handle them when reporting underruns. This was adapted from the initial patch[1] from Sai Teja Pottumuttu. [1] https://lore.kernel.org/all/20251015-xe3p_lpd-basic-enabling-v1-12-d2d1e26520aa@intel.com/ v2: - Handle FBC-related bits in intel_fbc.c. (Ville) - Do not use intel_fbc_id_for_pipe (promoted from skl_fbc_id_for_pipe), but use pipe's primary plane to get the FBC instance. (Ville, Matt) - Improve code readability by moving logic specific to logging fields of UNDERRUN_DBG1 to a separate function. (Jani) v3: - Use crtc->base.primary instead of cycling through all crtcs Bspec: 69111, 69561, 74411, 74412 Cc: Jani Nikula <jani.nikula@intel.com> Cc: Matt Roper <matthew.d.roper@intel.com> Cc: Ville Syrjälä <ville.syrjala@linux.intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Signed-off-by: Matt Atwood <matthew.s.atwood@intel.com> Link: https://patch.msgid.link/20251202012306.9315-5-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/display: Handle dedicated external ports in intel_encoder_is_tc()Gustavo Sousa3-1/+27
Starting with Xe3p_LPD, the VBT has a new field, called in the driver "dedicated_external", which tells that a Type-C capable port is physically connected to a PHY outside of the Type-C subsystem. When that's the case, the driver must not do the extra Type-C programming for that port. Update intel_encoder_is_tc() to check for that case. While at it, add a note to intel_phy_is_tc() to remind us that it is about whether the respective port is a Type-C capable port rather than the PHY itself. (Maybe it would be a nice idea to rename intel_phy_is_tc()?) Note that this was handled with a new bool member added to struct intel_digital_port instead of having querying the VBT directly because VBT memory is freed (intel_bios_driver_remove) before encoder cleanup (intel_ddi_encoder_destroy), which would cause an oops to happen when the latter calls intel_encoder_is_tc(). This could be fixed by keeping VBT data around longer, but that's left for a follow-up work, if deemed necessary. v2: - Drop printing info about dedicated external, now that we are doing it when parsing the VBT. (Jani) - Add a FIXME comment on the code explaining why we need to store dedicated_external in struct intel_digital_port. (Jani) v3: - Simplify the code by using NULL check for dig_port to avoid using intel_encoder_is_dig_port(). (Imre) Cc: Imre Deak <imre.deak@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20251202012306.9315-4-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/power: Use intel_encoder_is_tc()Gustavo Sousa1-10/+23
Starting with Xe3p_LPD, when intel_phy_is_tc() returns true, it does not necessarily mean that the port is connected to a PHY in the Type-C subsystem. The reason is that there is now a VBT field called dedicated_external that will indicate that a Type-C capable port is connected to a (most likely) combo/dedicated PHY. When that's the case, we must not do the extra programming required for Type-C connections. In an upcoming change, we will modify intel_encoder_is_tc() to take the VBT field dedicated_external into consideration. Update intel_display_power_well.c to use that function instead of intel_phy_is_tc(). Note that, even though icl_aux_power_well_{enable,disable} are not part of Xe3p_LPD's display paths, we modify them anyway for uniformity. v2: - Add and use icl_aux_pw_is_tc_phy() helper to avoid explicit encoder lookup. (Imre) Cc: Imre Deak <imre.deak@intel.com> Cc: Shekhar Chauhan <shekhar.chauhan@intel.com> Reviewed-by: Suraj Kandpal <suraj.kandpal@intel.com> # v1 Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20251202012306.9315-3-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/i915/vbt: Add fields dedicated_external and dyn_port_over_tcGustavo Sousa3-2/+76
VBT version 264 adds new fields associated to Xe3p_LPD's new ways of configuring SoC for TC ports and PHYs. Update the code to match the updates in VBT. The new field dedicated_external is used to represent TC ports that are connected to PHYs outside of the Type-C subsystem, meaning that they behave like dedicated ports and don't require the extra Type-C programming. In an upcoming change, we will update the driver to take this field into consideration when detecting the type of port. The new field dyn_port_over_tc is used to inform that the TC port can be dynamically allocated for a legacy connector in the Type-C subsystem, which is a new feature in Xe3p_LPD. In upcoming changes, we will use that field in order to handle the IOM resource management programming required for that. Note that, when dedicated_external is set, the fields dp_usb_type_c and tbt are tagged as "don't care" in the spec, so they should be ignored in that case, so also add a sanitization function to take care of forcing them to zero when dedicated_external is true. v2: - Use sanitization function to force dp_usb_type_c and tbt fields to be zero instead of adding a intel_bios_encoder_is_dedicated_external() check in each of their respective accessor functions. (Jani) - Print info about dedicated external ports in print_ddi_port(). (Jani) v3: - Also zero out field dyn_port_over_tc when dedicated_external is set. (Imre) - Use intel_bios_encoder_is_dedicated_external() directly instead of storing return value into variable in print_ddi_port(). (Imre) - Also print info for dyn_port_over_tc in print_ddi_port(). (Imre) Bspec: 20124, 68954, 74304 Cc: Imre Deak <imre.deak@intel.com> Cc: Jani Nikula <jani.nikula@linux.intel.com> Cc: Shekhar Chauhan <shekhar.chauhan@intel.com> Signed-off-by: Gustavo Sousa <gustavo.sousa@intel.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Link: https://patch.msgid.link/20251202012306.9315-2-matthew.s.atwood@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02Revert "drm/amd: Skip power ungate during suspend for VPE"Mario Limonciello (AMD)1-2/+1
Skipping power ungate exposed some scenarios that will fail like below: ``` amdgpu: Register(0) [regVPEC_QUEUE_RESET_REQ] failed to reach value 0x00000000 != 0x00000001n amdgpu 0000:c1:00.0: amdgpu: VPE queue reset failed ... amdgpu: [drm] *ERROR* wait_for_completion_timeout timeout! ``` The underlying s2idle issue that prompted this commit is going to be fixed in BIOS. This reverts commit 2a6c826cfeedd7714611ac115371a959ead55bda. Fixes: 2a6c826cfeed ("drm/amd: Skip power ungate during suspend for VPE") Cc: stable@vger.kernel.org Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reported-by: Konstantin <answer2019@yandex.ru> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220812 Reported-by: Matthew Schwartz <matthew.schwartz@linux.dev> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu: use common defines for HUB faultsAlex Deucher5-8/+21
Use common definitions for the fault bits in the IH sourc data for the gmc9-12 memory hub faults Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc12: add amdgpu_vm_handle_fault() handlingAlex Deucher1-0/+27
We need to call amdgpu_vm_handle_fault() on page fault on all gfx9 and newer parts to properly update the page tables, not just for recoverable page faults. Cc: stable@vger.kernel.org Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc11: add amdgpu_vm_handle_fault() handlingAlex Deucher1-0/+27
We need to call amdgpu_vm_handle_fault() on page fault on all gfx9 and newer parts to properly update the page tables, not just for recoverable page faults. Cc: stable@vger.kernel.org Reviewed-by: Timur Kristóf <timur.kristof@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu: use static ids for ACP platform devsBrady Norander1-2/+8
mfd_add_hotplug_devices() assigns child platform devices with PLATFORM_DEVID_AUTO, but the ACP machine drivers expect the platform device names to never change. Use mfd_add_devices() instead and give each cell a unique id. Signed-off-by: Brady Norander <bradynorander@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/sdma6: Update SDMA 6.0.3 FW version to include UMQ ↵Srinivasan Shanmugam1-1/+1
protected-fence fix On GFX11.0.3, earlier SDMA firmware versions issue the PROTECTED_FENCE write from the user VMID (e.g. VMID 8) instead of VMID 0. This causes a GPU VM protection fault when SDMA tries to write the secure fence location, as seen in the UMQ SDMA test (cs-sdma-with-IP-DMA-UMQ) Fixes the below GPU page fault: [ 514.037189] amdgpu 0000:0b:00.0: amdgpu: [gfxhub] page fault (src_id:0 ring:40 vmid:8 pasid:32770) [ 514.037199] amdgpu 0000:0b:00.0: amdgpu: Process pid 0 thread pid 0 [ 514.037205] amdgpu 0000:0b:00.0: amdgpu: in page starting at address 0x00007fff00409000 from client 10 [ 514.037212] amdgpu 0000:0b:00.0: amdgpu: GCVM_L2_PROTECTION_FAULT_STATUS:0x00841A51 [ 514.037217] amdgpu 0000:0b:00.0: amdgpu: Faulty UTCL2 client ID: SDMA0 (0xd) [ 514.037223] amdgpu 0000:0b:00.0: amdgpu: MORE_FAULTS: 0x1 [ 514.037227] amdgpu 0000:0b:00.0: amdgpu: WALKER_ERROR: 0x0 [ 514.037232] amdgpu 0000:0b:00.0: amdgpu: PERMISSION_FAULTS: 0x5 [ 514.037236] amdgpu 0000:0b:00.0: amdgpu: MAPPING_ERROR: 0x0 [ 514.037241] amdgpu 0000:0b:00.0: amdgpu: RW: 0x1 v2: Updated commit message v3: s/gfx11.0.3/sdma 6.0.3/ in patch title (Alex) Cc: Alex Deucher <alexander.deucher@amd.com> Cc: Christian König <christian.koenig@amd.com> Cc: stable@vger.kernel.org Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu: Forward VMID reservation errorsNatalie Vock1-2/+1
Otherwise userspace may be fooled into believing it has a reserved VMID when in reality it doesn't, ultimately leading to GPU hangs when SPM is used. Fixes: 80e709ee6ecc ("drm/amdgpu: add option params to enforce process isolation between graphics and compute") Cc: stable@vger.kernel.org Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Natalie Vock <natalie.vock@gmx.de> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc8: Delegate VM faults to soft IRQ handler ringTimur Kristóf1-0/+6
On old GPUs, it may be an issue that handling the interrupts from VM faults is too slow and the interrupt handler (IH) ring may overflow, which can cause an eventual hang. Delegate the processing of all VM faults to the soft IRQ handler ring. As a result, we spend much less time in the IRQ handler that interacts with the HW IH ring, which significantly reduces the chance of hangs/reboots. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc7: Delegate VM faults to soft IRQ handler ringTimur Kristóf1-0/+6
On old GPUs, it may be an issue that handling the interrupts from VM faults is too slow and the interrupt handler (IH) ring may overflow, which can cause an eventual hang. Delegate the processing of all VM faults to the soft IRQ handler ring. As a result, we spend much less time in the IRQ handler that interacts with the HW IH ring, which significantly reduces the chance of hangs/reboots. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc6: Delegate VM faults to soft IRQ handler ringTimur Kristóf1-0/+6
On old GPUs, it may be an issue that handling the interrupts from VM faults is too slow and the interrupt handler (IH) ring may overflow, which can cause an eventual hang. Delegate the processing of all VM faults to the soft IRQ handler ring. As a result, we spend much less time in the IRQ handler that interacts with the HW IH ring, which significantly reduces the chance of hangs/reboots. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc6: Cache VM fault infoTimur Kristóf1-0/+4
Call amdgpu_vm_update_fault_cache on GMC v6 similarly to how we do in GMC v7-v8 so that VM fault info can be used later by userspace for debugging. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/gmc6: Don't print MC client as it's unknownTimur Kristóf1-6/+4
The VM_CONTEXT1_PROTECTION_FAULT_MCCLIENT register doesn't exist on GMC v6 so we can't print the MC client as a string like we do on GMC v7-v8. However, we still print the mc_id from VM_CONTEXT1_PROTECTION_FAULT_STATUS. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/cz_ih: Enable soft IRQ handler ringTimur Kristóf1-0/+10
We are going to use the soft IRQ handler ring on GMC v8 to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/tonga_ih: Enable soft IRQ handler ringTimur Kristóf1-0/+10
We are going to use the soft IRQ handler ring on GMC v8 to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/iceland_ih: Enable soft IRQ handler ringTimur Kristóf1-0/+10
We are going to use the soft IRQ handler ring on GMC v8 to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/cik_ih: Enable soft IRQ handler ringTimur Kristóf1-0/+12
We are going to use the soft IRQ handler ring on GMC v7 (CIK) to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu/si_ih: Enable soft IRQ handler ringTimur Kristóf1-0/+12
We are going to use the soft IRQ handler ring on GMC v6 (SI) to process interrupts from VM faults. Signed-off-by: Timur Kristóf <timur.kristof@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amd/display: fix typo in display_mode_core_structs.hAditya Gollamudi1-1/+1
Fix a typo in a comment, change "enviroment" to "environment" in drivers/gpu/drm/amd/display/dc/dml2/display_mode_core_structs.h Fixes: e6a8a000cfe6 ("drm/amd/display: Rename dml2 to dml2_0 folder") Acked-by: Aurabindo Pillai <aurabindo.pillai@amd.com> Signed-off-by: Aditya Gollamudi <adigollamudi@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amd/display: fix Smart Power OLED not working after S4Ian Chen1-0/+6
[HOW] Before enable smart power OLED, we need to call set pipe to let DMUB get correct ABM config. Reviewed-by: Robin Chen <robin.chen@amd.com> Signed-off-by: Ian Chen <ian.chen@amd.com> Signed-off-by: Roman Li <roman.li@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amd/display: Move RGB-type check for audio sync to DCE HW sequenceIvan Lipski2-2/+4
[Why&How] DVI-A & VGA connectors are applicable to DCE ASICs, so move them to dce110_hwseq.c to block audio sync on SIGNAL_TYPE_RGB for DCE ASICs. Signed-off-by: Ivan Lipski <ivan.lipski@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/amdgpu: add missing lock to amdgpu_ttm_access_memory_sdmaPierre-Eric Pelloux-Prayer1-0/+2
Users of ttm entities need to hold the gtt_window_lock before using them to guarantee proper ordering of jobs. Cc: stable@vger.kernel.org Fixes: cb5cc4f573e1 ("drm/amdgpu: improve debug VRAM access performance using sdma") Signed-off-by: Pierre-Eric Pelloux-Prayer <pierre-eric.pelloux-prayer@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-02drm/xe/gt: Use scope-based forcewakeRaag Jadav1-22/+10
Switch runtime PM code to use scope-based forcewake for consistency with other parts of the driver. Signed-off-by: Raag Jadav <raag.jadav@intel.com> Reviewed-by: Matt Roper <matthew.d.roper@intel.com> Link: https://patch.msgid.link/20251128082212.294592-1-raag.jadav@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-12-02drm/xe/vf: Add debugfs entries to test VF double migrationSatyanarayana K V P3-0/+60
VF migration sends a marker to the GUC before resource fixups begin, and repeats the marker with the RESFIX_DONE notification. This prevents the GUC from submitting jobs during double migration events. To reliably test double migration, a second migration must be triggered while fixups from the first migration are still in progress. Since fixups complete quickly, reproducing this scenario is difficult. Introduce debugfs controls to add delays in the post-fixup phase, creating a deterministic window for subsequent migrations. New debugfs entries: /sys/kernel/debug/dri/BDF/ ├── tile0 │ ├─gt0 │ │ ├──vf │ │ │ ├── resfix_stoppers resfix_stoppers: Predefined checkpoints that allow the migration process to pause at specific stages. The stages are given below. VF_MIGRATION_WAIT_RESFIX_START - BIT(0) VF_MIGRATION_WAIT_FIXUPS - BIT(1) VF_MIGRATION_WAIT_RESTART_JOBS - BIT(2) VF_MIGRATION_WAIT_RESFIX_DONE - BIT(3) Each state will pause with a 1-second delay per iteration, continuing until its corresponding bit is cleared. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Acked-by: Adam Miszczak <adam.miszczak@linux.intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251201095011.21453-10-satyanarayana.k.v.p@intel.com
2025-12-02drm/xe/vf: Requeue recovery on GuC MIGRATION error during VF post-migrationSatyanarayana K V P2-0/+9
Handle GuC response `XE_GUC_RESPONSE_VF_MIGRATED` as a special case in the VF post-migration recovery flow. When this error occurs, it indicates that a new migration was detected while the resource fixup process was still in progress. Instead of failing immediately, requeue the VF into the recovery path to allow proper handling of the new migration event. This improves robustness of VF recovery in SR-IOV environments where migrations can overlap with resource fixup steps. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251201095011.21453-9-satyanarayana.k.v.p@intel.com
2025-12-02drm/xe/vf: Introduce RESFIX start marker supportSatyanarayana K V P4-51/+195
In scenarios involving double migration, the VF KMD may encounter situations where it is instructed to re-migrate before having the opportunity to send RESFIX_DONE for the initial migration. This can occur when the fix-up for the prior migration is still underway, but the VF KMD is migrated again. Consequently, this may lead to the possibility of sending two migration notifications (i.e., pending fix-up for the first migration and a second notification for the new migration). Upon receiving the first RES_FIX notification, the GuC will resume VF submission on the GPU, potentially resulting in undefined behavior, such as system hangs or crashes. To avoid this, post migration, a marker is sent to the GUC prior to the start of resource fixups to indicate start of resource fixups. The same marker is sent along with RESFIX_DONE notification so that GUC can avoid submitting jobs to HW in case of double migration. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251201095011.21453-8-satyanarayana.k.v.p@intel.com
2025-12-02drm/xe/vf: Enable VF migration only on supported GuC versionsSatyanarayana K V P1-1/+21
Enable VF migration starting with GuC 70.54.0 (compatibility version 1.27.0) which supports additional VF2GUC_RESFIX_START message required to handle migration recovery in a more robust way. Signed-off-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Cc: Tomasz Lis <tomasz.lis@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/20251201095011.21453-7-satyanarayana.k.v.p@intel.com
2025-12-02drm/{i915, xe}/display: make pxp key check part of bo interfaceJani Nikula5-32/+15
Add intel_bo_key_check() next to intel_bo_is_protected() where it feels like it belongs, and drop the extra pxp compat header. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251201172730.2154668-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-12-02drm/xe/compat: remove unused i915_active.h and i915_active_types.hJani Nikula2-35/+0
Commit 965930962a41 ("drm/i915/frontbuffer: Fix intel_frontbuffer lifetime handling") dropped the last xe display users of the headers. They're still used in intel_overlay.c, but it's not built as part of xe. Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://patch.msgid.link/20251201171050.2145833-1-jani.nikula@intel.com Signed-off-by: Jani Nikula <jani.nikula@intel.com>
2025-12-02drm/fbdev-helper: Remove drm_fb_helper_debug_enter/_leave()Thomas Zimmermann1-108/+0
Remove the debug_enter/debug_leave helpers, as there are no DRM drivers supporting debugging with kgdb. Remove code to keep track of existing fbdev-emulation state. None of this required any longer. Also remove mode_set_base_atomic from struct drm_crtc_helper_funcs, which has no callers or implementations. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Daniel Thompson (RISCstar) <danielt@kernel.org> Link: https://patch.msgid.link/20251125130634.1080966-5-tzimmermann@suse.de
2025-12-02drm/radeon: Do not implement mode_set_base_atomic callbackThomas Zimmermann3-81/+26
Remove the implementation of the CRTC helper mode_set_base_atomic from radeon. It pretends to provide mode setting for kdb debugging, but has been broken for some time. Kdb output has been supported only for non-atomic mode setting since commit 9c79e0b1d096 ("drm/fb-helper: Give up on kgdb for atomic drivers") from 2017. While radeon provides non-atomic mode setting, kdb assumes that the GEM buffer object is at a fixed location in video memory. This assumption currently blocks radeon from converting to generic fbdev emulation. Fbdev-ttm helpers use a shadow buffer with a movable GEM buffer object. Triggering kdb does therefore not update the display. Another problem is that the current implementation does not handle USB keyboard input. Therefore a serial terminal is required. Then when continuing from the debugger, radeon fails with an error: [7]kdb> go [ 40.345523][ C7] BUG: scheduling while atomic: bash/1580/0x00110003 [...] [ 40.345613][ C7] schedule+0x27/0xd0 [ 40.345615][ C7] schedule_timeout+0x7b/0x100 [ 40.345617][ C7] ? __pfx_process_timeout+0x10/0x10 [ 40.345619][ C7] msleep+0x31/0x50 [ 40.345621][ C7] radeon_crtc_load_lut+0x2e4/0xcb0 [radeon 31c1ee785de120fcfd0babcc09babb3770252b4e] [ 40.345698][ C7] radeon_crtc_gamma_set+0xe/0x20 [radeon 31c1ee785de120fcfd0babcc09babb3770252b4e] [ 40.345760][ C7] drm_fb_helper_debug_leave+0xd8/0x130 [ 40.345763][ C7] kgdboc_post_exp_handler+0x54/0x70 [...] and the system hangs. Support for kdb feels pretty much broken. Hence remove the whole kdb support from radeon. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Daniel Thompson (RISCstar) <danielt@kernel.org> Link: https://patch.msgid.link/20251125130634.1080966-4-tzimmermann@suse.de
2025-12-02drm/nouveau: Do not implement mode_set_base_atomic callbackThomas Zimmermann1-20/+4
Remove the implementation of the CRTC helper mode_set_base_atomic from nouveau. It pretends to provide mode setting for kdb debugging, but has been broken for some time. Kdb output has been supported only for non-atomic mode setting since commit 9c79e0b1d096 ("drm/fb-helper: Give up on kgdb for atomic drivers") from 2017. While nouveau provides non-atomic mode setting for some devices, kdb assumes that the GEM buffer object is at a fixed location in video memory. This has not been the case since commit 4a16dd9d18a0 ("drm/nouveau/kms: switch to drm fbdev helpers") from 2022. Fbdev-ttm helpers use a shadow buffer with a movable GEM buffer object. Triggering kdb does therefore not update the display. Hence remove the whole kdb support from nouveau. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Lyude Paul <lyude@redhat.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Daniel Thompson (RISCstar) <danielt@kernel.org> Link: https://patch.msgid.link/20251125130634.1080966-3-tzimmermann@suse.de
2025-12-02drm/amdgpu: Do not implement mode_set_base_atomic callbackThomas Zimmermann3-72/+33
Remove all implementations of the CRTC helper mode_set_base_atomic from amdgpu. It pretends to provide mode setting for kdb debugging, but has been broken for some time. Kdb output has been supported only for non-atomic mode setting since commit 9c79e0b1d096 ("drm/fb-helper: Give up on kgdb for atomic drivers") from 2017. While amdgpu provides non-atomic mode setting for some devices, kdb assumes that the GEM buffer object is at a fixed location in video memory. This has not been the case since commit 087451f372bf ("drm/amdgpu: use generic fb helpers instead of setting up AMD own's.") from 2021. Fbdev-ttm helpers use a shadow buffer with a movable GEM buffer object. Triggering kdb does not update the display. Hence remove the whole kdb support from amdgpu. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Simona Vetter <simona.vetter@ffwll.ch> Acked-by: Daniel Thompson (RISCstar) <danielt@kernel.org> Link: https://patch.msgid.link/20251125130634.1080966-2-tzimmermann@suse.de