aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu
AgeCommit message (Collapse)AuthorFilesLines
2025-09-10drm/xe: Convert xe_bo_create_user() for exhaustive evictionThomas Hellström9-61/+115
Use the xe_validation_guard() to convert xe_bo_create_user() for exhaustive eviction. v2: - Adapt to argument changes of xe_validation_guard() Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v1 Link: https://lore.kernel.org/r/20250908101246.65025-4-thomas.hellstrom@linux.intel.com
2025-09-10drm/xe: Introduce an xe_validation wrapper around drm_execThomas Hellström2-0/+351
Introduce a validation wrapper xe_validation_guard() as a helper intended to be used around drm_exec transactions what perform validations. Once TTM can handle exhaustive eviction we could remove this wrapper or make it mostly a NO-OP unless other functionality is added to it. Currently the wrapper takes a read lock upon entry and if the transaction hits an OOM, all locks are released and the transaction is retried with a write-lock. If all other validations participate in this scheme, the transaction with the write lock will be the only transaction validating and should have access to all available non-pinned memory. There is currently a problem in that TTM converts -EDEADLOCKS to -ENOMEM, and with ww_mutex slowpath error injections, we can hit -ENOMEMs without having actually ran out of memory. We abuse ww_mutex internals to detect such situations until TTM is fixes to not convert the error code. In the meantime, injecting ww_mutex slowpath -EDEADLOCKs is a good way to test the implementation in the absence of real OOMs. Just introduce the wrapper in this commit. It will be hooked up to the driver in following commits. v2: - Mark class_xe_validation conditional so that the loop is skipped on initialization error. - Argument sanitation (Matt Brost) - Fix conditional execution of xe_validation_ctx_fini() (Matt Brost) - Add a no_block mode for upcoming use in the CPU fault handler. v4: - Update kerneldoc. (Xe CI). Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Link: https://lore.kernel.org/r/20250908101246.65025-3-thomas.hellstrom@linux.intel.com
2025-09-10drm/xe: Pass down drm_exec context to validationThomas Hellström21-110/+407
We want all validation (potential backing store allocation) to be part of a drm_exec transaction. Therefore add a drm_exec pointer argument to xe_bo_validate() and ___xe_bo_create_locked(). Upcoming patches will deal with making all (or nearly all) calls to these functions part of a drm_exec transaction. In the meantime, define special values of the drm_exec pointer: XE_VALIDATION_UNIMPLEMENTED: Implementation of the drm_exec transaction has not been done yet. XE_VALIDATION_UNSUPPORTED: Some Middle-layers (dma-buf) doesn't allow the drm_exec context to be passed down to map_attachment where validation takes place. XE_VALIDATION_OPT_OUT: May be used only for kunit tests where exhaustive eviction isn't crucial and the ROI of converting those is very small. For XE_VALIDATION_UNIMPLEMENTED and XE_VALIDATION_OPT_OUT there is also a lockdep check that a drm_exec transaction can indeed start at the location where the macro is expanded. This is to encourage developers to take this into consideration early in the code development process. v2: - Fix xe_vm_set_validation_exec() imbalance. Add an assert that hopefully catches future instances of this (Matt Brost) v3: - Extend to psmi_alloc_object Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> #v3 Link: https://lore.kernel.org/r/20250908101246.65025-2-thomas.hellstrom@linux.intel.com
2025-09-09drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any timeDavid Rosca2-8/+16
There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active). Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 8908fdce0634a623404e9923ed2f536101a39db5) Cc: stable@vger.kernel.org
2025-09-09drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packagesDavid Rosca1-31/+21
There can be multiple engine info packages in one IB and the first one may be common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding first engine info. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit dc8f9f0f45166a6b37864e7a031c726981d6e5fc) Cc: stable@vger.kernel.org
2025-09-09drm/amd/amdgpu: Declare isp firmware binary filePratap Nirujogi1-0/+2
Declare isp firmware file isp_4_1_1.bin required by isp4.1.1 device. Suggested-by: Alexey Zagorodnikov <xglooom@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit d97b74a833eba1f4f69f67198fd98ef036c0e5f9) Cc: stable@vger.kernel.org
2025-09-09drm/amd/display: use udelay rather than fsleepAlex Deucher1-1/+1
This function can be called from an atomic context so we can't use fsleep(). Fixes: 01f60348d8fb ("drm/amd/display: Fix 'failed to blank crtc!'") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549 Cc: Wen Chen <Wen.Chen3@amd.com> Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 27e4dc2c0543fd1808cc52bd888ee1e0533c4a2e)
2025-09-09drm/amdgpu: fix a memory leak in fence cleanup when unloadingAlex Deucher1-2/+0
Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called after that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op as the sched entities we never freed because the ring pointers were already set to NULL. Remove the NULL setting. Reported-by: Lin.Cao <lincao12@amd.com> Cc: Vitaly Prosyak <vitaly.prosyak@amd.com> Cc: Christian König <christian.koenig@amd.com> Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit a525fa37aac36c4591cc8b07ae8957862415fbd5) Cc: stable@vger.kernel.org
2025-09-09drm/amdgpu/vcn: Change amdgpu_vcn_sw_fini return to voidRodrigo Siqueira11-39/+19
The function amdgpu_vcn_sw_fini() returns an integer, but this number is always 0. This commit changes the amdgpu_vcn_sw_fini() return to void, and eliminates all checks to this return across different VCNs. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu/vcn: Document IRQ per-instance irq behavior for VCN 4.0.3Rodrigo Siqueira1-1/+5
When examining the VCN function init, it is common to find a loop that initializes VCN rings, which uses one IRQ per instance. However, VCN 4.0.3 deviates from this pattern, as it includes a distinct field to differentiate instances, which results in a slightly different ring init. This commit makes this difference explicit by using a fixed index when initializing the ring buffer and also adds a comment. Signed-off-by: Rodrigo Siqueira <siqueira@igalia.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: validate userq hw unmap status for destroying userqPrike Liang1-0/+5
Before destroying the userq buffer object, it requires validating the userq HW unmap status and ensuring the userq is unmapped from hardware. If the user HW unmap failed, then it needs to reset the queue for reusing. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: Wire up MMIO_REMAP placement and User-visible stringsSrinivasan Shanmugam4-0/+18
Wire up the conversions and strings for the new MMIO_REMAP placement: * amdgpu_mem_type_to_domain() maps AMDGPU_PL_MMIO_REMAP -> domain * amdgpu_bo_placement_from_domain() accepts the new domain * amdgpu_bo_mem_stats_placement() and amdgpu_bo_print_info() report it * res cursor supports the new placement * fdinfo prints "mmioremap" for the new placement Cc: Christian König <christian.koenig@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu/ttm: Add New AMDGPU_PL_MMIO_REMAP PlacementSrinivasan Shanmugam1-1/+2
Introduce a kernel-internal TTM placement type AMDGPU_PL_MMIO_REMAP for the HDP flush MMIO remap page Plumbing added: - amdgpu_res_cursor.{first,next}: treat MMIO_REMAP like DOORBELL - amdgpu_ttm_io_mem_reserve(): return BAR bus address + offset for MMIO_REMAP, mark as uncached I/O - amdgpu_ttm_io_mem_pfn(): PFN from register BAR - amdgpu_res_cpu_visible(): visible to CPU - amdgpu_evict_flags()/amdgpu_bo_move(): non-migratable - amdgpu_ttm_tt_pde_flags(): map as SYSTEM - amdgpu_bo_mem_stats_placement(): report AMDGPU_PL_MMIO_REMAP - amdgpu_fdinfo: print “mmioremap” bucket label Cc: Alex Deucher <alexander.deucher@amd.com> Suggested-by: Christian König <christian.koenig@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu/vcn: Allow limiting ctx to instance 0 for AV1 at any timeDavid Rosca2-8/+16
There is no reason to require this to happen on first submitted IB only. We need to wait for the queue to be idle, but it can be done at any time (including when there are multiple video sessions active). Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu/vcn4: Fix IB parsing with multiple engine info packagesDavid Rosca1-31/+21
There can be multiple engine info packages in one IB and the first one may be common engine, not decode/encode. We need to parse the entire IB instead of stopping after finding first engine info. Signed-off-by: David Rosca <david.rosca@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: clean up the amdgpu_userq_active()Prike Liang2-18/+0
This is no invocation for amdgpu_userq_active(). Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu/jpeg: Move parse_cs to amdgpu_jpeg.cSathishkumar S10-70/+83
Rename jpeg_v2_dec_ring_parse_cs to amdgpu_jpeg_dec_parse_cs and move it to amdgpu_jpeg.c as it is shared among jpeg versions. Signed-off-by: Sathishkumar S <sathishkumar.sundararaju@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amd/display: Remove duplicated codeRay Wu3-9/+0
[Why&How] Remove duplicated code Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tom Chung <chiahsuan.chung@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: validate userq input argsPrike Liang2-32/+56
This will help on validating the userq input args, and rejecting for the invalid userq request at the IOCTLs first place. Signed-off-by: Prike Liang <Prike.Liang@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu/uapi: Introduce AMDGPU_GEM_DOMAIN_MMIO_REMAPSrinivasan Shanmugam1-0/+3
Add a new GEM domain bit AMDGPU_GEM_DOMAIN_MMIO_REMAP to allow userspace to request the MMIO remap (HDP flush) page via GEM_CREATE. - include/uapi/drm/amdgpu_drm.h: * define AMDGPU_GEM_DOMAIN_MMIO_REMAP * include the bit in AMDGPU_GEM_DOMAIN_MASK v2: Add early reject in amdgpu_gem_create_ioctl() (Alex). Cc: Christian König <christian.koenig@amd.com> Suggested-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amd/amdgpu: Declare isp firmware binary filePratap Nirujogi1-0/+2
Declare isp firmware file isp_4_1_1.bin required by isp4.1.1 device. Suggested-by: Alexey Zagorodnikov <xglooom@gmail.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Pratap Nirujogi <pratap.nirujogi@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amd/display: use udelay rather than fsleepAlex Deucher1-1/+1
This function can be called from an atomic context so we can't use fsleep(). Fixes: 01f60348d8fb ("drm/amd/display: Fix 'failed to blank crtc!'") Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4549 Cc: Wen Chen <Wen.Chen3@amd.com> Cc: Fangzhi Zuo <jerry.zuo@amd.com> Cc: Nicholas Kazlauskas <nicholas.kazlauskas@amd.com> Cc: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Harry Wentland <harry.wentland@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: Fix NULL ptr deref in amdgpu_device_cache_switch_state()John Olender1-1/+1
Kaveri has no upstream bridge, therefore parent is NULL. $ lspci -PP ... 00:01.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Kaveri [Radeon R7 Graphics] (rev d4) For comparison, Raphael: $ lspci -PP ... 00:08.1 PCI bridge: Advanced Micro Devices, Inc. [AMD] Raphael/Granite Ridge Internal GPP Bridge to Bus [C:A] ... 00:08.1/0e:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Raphael (rev c5) Fixes: 1dd2fa0e00f1 ("drm/amdgpu: Save and restore switch state") Link: https://lore.kernel.org/amd-gfx/38fe6513-f8a9-4669-8e86-89c54c465611@gmail.com/ Reviewed-by: Candice Li <candice.li@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: John Olender <john.olender@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/amdgpu: fix a memory leak in fence cleanup when unloadingAlex Deucher1-2/+0
Commit b61badd20b44 ("drm/amdgpu: fix usage slab after free") reordered when amdgpu_fence_driver_sw_fini() was called after that patch, amdgpu_fence_driver_sw_fini() effectively became a no-op as the sched entities we never freed because the ring pointers were already set to NULL. Remove the NULL setting. Reported-by: Lin.Cao <lincao12@amd.com> Cc: Vitaly Prosyak <vitaly.prosyak@amd.com> Cc: Christian König <christian.koenig@amd.com> Fixes: b61badd20b44 ("drm/amdgpu: fix usage slab after free") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-09-09drm/xe/guc: Recommend GUC v70.49.4 for PTL, BMGJulia Filipchuk1-2/+2
UAPI compatibility version 1.24.4 Resolves various BMG, SRIOV issues and adds new PTL features. Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Zhanjun Dong <zhanjun.dong@intel.com> Signed-off-by: Matt Roper <matthew.d.roper@intel.com> Link: https://lore.kernel.org/r/20250829235305.672287-7-julia.filipchuk@intel.com
2025-09-09drm/xe/guc: Don't invoke disable_ct action during replacementMichal Wajdeczko1-1/+1
During second CT initialization step, known as post_hwconfig, we want to replace previously registered CT disable devm action to make sure it will be invoked prior to releasing underlying BO. But to replace this action we don't need to execute it right away since we know that CT was disabled prior to that late init step and we already assert that. Use devm_remove_action() instead to avoid extra message about 'disabling CT' that could be seen now: [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled ... DEVRES REM ff11000149320940 guc_action_disable_ct (16 bytes) [drm:guc_ct_change_state [xe]] GT0: GuC CT communication channel disabled DEVRES ADD ff110001664fc040 guc_action_disable_ct (16 bytes) Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908102053.539-3-michal.wajdeczko@intel.com
2025-09-09drm/xe/guc: Always add CT disable action during second init stepMichal Wajdeczko1-6/+5
On DGFX, during init_post_hwconfig() step, we are reinitializing CTB BO in VRAM and we have to replace cleanup action to disable CT communication prior to release of underlying BO. But that introduces some discrepancy between DGFX and iGFX, as for iGFX we keep previously added disable CT action that would be called during unwind much later. To keep the same flow on both types of platforms, always replace old cleanup action and register new one. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Satyanarayana K V P <satyanarayana.k.v.p@intel.com> Link: https://lore.kernel.org/r/20250908102053.539-2-michal.wajdeczko@intel.com
2025-09-09drm/xe: Extend Wa_13011645652 to PTL-H, WCLJulia Filipchuk1-1/+2
Expand workaround to additional graphics architectures. Cc: Vinay Belgaumkar <vinay.belgaumkar@intel.com> Cc: Stuart Summers <stuart.summers@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Thomas Hellström <thomas.hellstrom@linux.intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: intel-xe@lists.freedesktop.org Cc: <stable@vger.kernel.org> # v6.17+ Signed-off-by: Julia Filipchuk <julia.filipchuk@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250903190122.1028373-2-julia.filipchuk@intel.com Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> (cherry picked from commit 6fc957185e1691bb6dfa4193698a229db537c2a2) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/xe: Block exec and rebind worker while evicting for suspend / hibernateThomas Hellström6-1/+88
When the xe pm_notifier evicts for suspend / hibernate, there might be racing tasks trying to re-validate again. This can lead to suspend taking excessive time or get stuck in a live-lock. This behaviour becomes much worse with the fix that actually makes re-validation bring back bos to VRAM rather than letting them remain in TT. Prevent that by having exec and the rebind worker waiting for a completion that is set to block by the pm_notifier before suspend and is signaled by the pm_notifier after resume / wakeup. It's probably still possible to craft malicious applications that block suspending. More work is pending to fix that. v3: - Avoid wait_for_completion() in the kernel worker since it could potentially cause work item flushes from freezable processes to wait forever. Instead terminate the rebind workers if needed and re-launch at resume. (Matt Auld) v4: - Fix some bad naming and leftover debug printouts. - Fix kerneldoc. - Use drmm_mutex_init() for the xe->rebind_resume_lock (Matt Auld). - Rework the interface of xe_vm_rebind_resume_worker (Matt Auld). Link: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/4288 Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-4-thomas.hellstrom@linux.intel.com (cherry picked from commit 599334572a5a99111015fbbd5152ce4dedc2f8b7) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/xe: Allow the pm notifier to continue on failureThomas Hellström1-10/+7
Its actions are opportunistic anyway and will be completed on device suspend. Marking as a fix to simplify backporting of the fix that follows in the series. v2: - Keep the runtime pm reference over suspend / hibernate and document why. (Matt Auld, Rodrigo Vivi): Fixes: c6a4d46ec1d7 ("drm/xe: evict user memory in PM notifier") Cc: Matthew Auld <matthew.auld@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: <stable@vger.kernel.org> # v6.16+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-3-thomas.hellstrom@linux.intel.com (cherry picked from commit ebd546fdffddfcaeab08afdd68ec93052c8fa740) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/xe: Attempt to bring bos back to VRAM after evictionThomas Hellström5-16/+16
VRAM+TT bos that are evicted from VRAM to TT may remain in TT also after a revalidation following eviction or suspend. This manifests itself as applications becoming sluggish after buffer objects get evicted or after a resume from suspend or hibernation. If the bo supports placement in both VRAM and TT, and we are on DGFX, mark the TT placement as fallback. This means that it is tried only after VRAM + eviction. This flaw has probably been present since the xe module was upstreamed but use a Fixes: commit below where backporting is likely to be simple. For earlier versions we need to open- code the fallback algorithm in the driver. v2: - Remove check for dgfx. (Matthew Auld) - Update the xe_dma_buf kunit test for the new strategy (CI) - Allow dma-buf to pin in current placement (CI) - Make xe_bo_validate() for pinned bos a NOP. Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/5995 Fixes: a78a8da51b36 ("drm/ttm: replace busy placement with flags v6") Cc: Matthew Brost <matthew.brost@intel.com> Cc: Matthew Auld <matthew.auld@intel.com> Cc: <stable@vger.kernel.org> # v6.9+ Signed-off-by: Thomas Hellström <thomas.hellstrom@linux.intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Link: https://lore.kernel.org/r/20250904160715.2613-2-thomas.hellstrom@linux.intel.com (cherry picked from commit cb3d7b3b46b799c96b54f8e8fe36794a55a77f0b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/xe/configfs: Don't touch survivability_mode on finiMichal Wajdeczko1-1/+2
This is a user controlled configfs attribute, we should not modify that outside the configfs attr.store() implementation. Fixes: bc417e54e24b ("drm/xe: Enable configfs support for survivability mode") Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Lucas De Marchi <lucas.demarchi@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Stuart Summers <stuart.summers@intel.com> Reviewed-by: Lucas De Marchi <lucas.demarchi@intel.com> Link: https://lore.kernel.org/r/20250904103521.7130-1-michal.wajdeczko@intel.com (cherry picked from commit 079a5c83dbd23db7a6eed8f558cf75e264d8a17b) Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/xe: Never report L3 bank mask for media GT going forwardMatt Roper4-15/+24
We currently report an L3 bank mask as part of the GT topology on both GTs (primary and media) because a copy of the L3 bank fuse register exists on both GTs (e.g., $gsi_offset + 0x9130 on Xe3). After recent discussions it's come to light that the only known userspace software that uses this part of the uapi (the compute UMD and Mesa) only uses the value reported for the primary GT; the value reported for the media GT is ignored by both projects, and the media UMDs don't have any use for L3 information today. Since we always strive to have our uapi match the specific needs of userspace and not include additional unused baggage, let's officially drop L3 bank reporting on the media GT going forward and only keep it around for the primary GT where it actually gets used. This change will only apply to future platforms (Xe3 and later); even though it would probably be safe to remove it from Xe1/Xe2 as well, we don't want to take any chances with changing existing ABI. Note that we'd already disabled reading/reporting of the L3 bank for the media GT on PTL in commit 9ab440a9d042 ("drm/xe/ptl: L3bank mask is not available on the media GT") because it was discovered that the copy of the fuse registers on the media GT were just reporting a bogus ~0 value rather than an accurate mask. So this is just extending that PTL behavior forward to WCL and other future platforms. Note that we're also free to reinstate this part of the uapi in the future if/when some new userspace consumer emerges that _does_ have a use for media-specific L3 bank masks. Cc: Fei Yang <fei.yang@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Link: https://lore.kernel.org/r/20250905215614.796247-3-matthew.d.roper@intel.com Signed-off-by: Matt Roper <matthew.d.roper@intel.com>
2025-09-09amd/amdkfd: correct mem limit calculation for small APUsYifan Zhang1-12/+32
Current mem limit check leaks some GTT memory (reserved_for_pt reserved_for_ras + adev->vram_pin_size) for small APUs. Since carveout VRAM is tunable on APUs, there are three case regarding the carveout VRAM size relative to GTT: 1. 0 < carveout < gtt apu_prefer_gtt = true, is_app_apu = false 2. carveout > gtt / 2 apu_prefer_gtt = false, is_app_apu = false 3. 0 = carveout apu_prefer_gtt = true, is_app_apu = true It doesn't make sense to check below limitation in case 1 (default case, small carveout) because the values in the below expression are mixed with carveout and gtt. adev->kfd.vram_used[xcp_id] + vram_needed > vram_size - reserved_for_pt - reserved_for_ras - atomic64_read(&adev->vram_pin_size) gtt: kfd.vram_used, vram_needed, vram_size carveout: reserved_for_pt, reserved_for_ras, adev->vram_pin_size In case 1, vram allocation will go to gtt domain, skip vram check since ttm_mem_limit check already cover this allocation. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit fa7c99f04f6dd299388e9282812b14e95558ac8e)
2025-09-09drm/amdkfd: fix p2p links bug in topologyEric Huang1-1/+2
When creating p2p links, KFD needs to check XGMI link with two conditions, hive_id and is_sharing_enabled, but it is missing to check is_sharing_enabled, so add it to fix the error. Signed-off-by: Eric Huang <jinhuieric.huang@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 36cc7d13178d901982da7a122c883861d98da624)
2025-09-09drm/amd/display: remove oem i2c adapter on finishGeoffrey McRae1-1/+12
Fixes a bug where unbinding of the GPU would leave the oem i2c adapter registered resulting in a null pointer dereference when applications try to access the invalid device. Fixes: 3d5470c97314 ("drm/amd/display/dm: add support for OEM i2c bus") Cc: Harry Wentland <harry.wentland@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Geoffrey McRae <geoffrey.mcrae@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 89923fb7ead4fdd37b78dd49962d9bb5892403e6) Cc: stable@vger.kernel.org
2025-09-09drm/amd/display: Drop dm_prepare_suspend() and dm_complete()Mario Limonciello (AMD)1-21/+0
[Why] dm_prepare_suspend() was added in commit 50e0bae34fa6b ("drm/amd/display: Add and use new dm_prepare_suspend() callback") to allow display to turn off earlier in the suspend sequence. This caused a regression that HDMI audio sometimes didn't work properly after resume unless audio was playing during suspend. [How] Drop dm_prepare_suspend() callback. All code in it will still run during dm_suspend(). Also drop unnecessary dm_complete() callback. dm_complete() was used for failed prepare and also for any case of successful resume. The code in it already runs in dm_resume(). This change will introduce more time that the display is turned on during suspend sequence. The compositor can turn it off sooner if desired. Cc: Harry Wentland <harry.wentland@amd.com> Reported-by: Przemysław Kopa <prz.kopa@gmail.com> Closes: https://lore.kernel.org/amd-gfx/1cea0d56-7739-4ad9-bf8e-c9330faea2bb@kernel.org/T/#m383d9c08397043a271b36c32b64bb80e524e4b0f Reported-by: Kalvin <hikaph+oss@gmail.com> Closes: https://github.com/alsa-project/alsa-lib/issues/465 Closes: https://gitlab.freedesktop.org/pipewire/pipewire/-/issues/4809 Fixes: 50e0bae34fa6b ("drm/amd/display: Add and use new dm_prepare_suspend() callback") Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 2fd653b9bb5aacec5d4c421ab290905898fe85a2) Cc: stable@vger.kernel.org
2025-09-09drm/amd/display: Correct sequences and delays for DCN35 PG & RCGOvidiu Bunea7-164/+111
[why] The current PG & RCG programming in driver has some gaps and incorrect sequences. [how] Added delays after ungating clocks to allow ramp up, increased polling to allow more time for power up, and removed the incorrect sequences. Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Charlene Liu <charlene.liu@amd.com> Signed-off-by: Ovidiu Bunea <ovidiu.bunea@amd.com> Signed-off-by: Wayne Lin <wayne.lin@amd.com> Tested-by: Dan Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 1bde5584e297921f45911ae874b0175dce5ed4b5) Cc: stable@vger.kernel.org
2025-09-09drm/amd/display: Disable DPCD Probe QuirkFangzhi Zuo1-0/+1
Disable dpcd probe quirk to native aux. Signed-off-by: Fangzhi Zuo <Jerry.Zuo@amd.com> Reviewed-by: Imre Deak <imre.deak@intel.com> Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4500 Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Link: https://lore.kernel.org/r/20250904191351.746707-1-Jerry.Zuo@amd.com Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit c5f4fb40584ee591da9fa090c6f265d11cbb1acf) Cc: stable@vger.kernel.org # 6.16.y: 5281cbe0b55a Cc: stable@vger.kernel.org # 6.16.y: 0b4aa85e8981 Cc: stable@vger.kernel.org # 6.16.y: b87ed522b364 Cc: stable@vger.kernel.org # 6.16.y
2025-09-09drm/i915/gt: Fix memory leak in hangcheck selftestJonathan Cavitt1-3/+1
In active_engines, if intel_context_create fails, we need to go backwards through all the created contexts to free/put them. However, the way this is currently performed skips the first created context, as if count == 1, then --count returns 0 and exits the while-loop prematurely without performing the intel_context_put on context 0. Fix this by post-decrementing count in the while-loop, rather than pre-decrementing it. This change makes the prior guard against count underflowing unnecessary, as the while-loop exits when count == 0. Signed-off-by: Jonathan Cavitt <jonathan.cavitt@intel.com> Cc: Krzysztof Karas <krzysztof.karas@intel.com> Cc: Andi Shyti <andi.shyti@linux.intel.com> Cc: Sebastian Brzezinka <sebastian.brzezinka@intel.com> Reviewed-by: Krzysztof Karas <krzysztof.karas@intel.com> Reviewed-by: Andi Shyti <andi.shyti@linux.intel.com> Signed-off-by: Andi Shyti <andi.shyti@linux.intel.com> Link: https://lore.kernel.org/r/20250904193041.12888-2-jonathan.cavitt@intel.com
2025-09-09drm/xe/debugfs: Don't expose dgfx residencies attributes on VFMichal Wajdeczko1-1/+1
In addition of checking if we are running on the BATTLEMAGE, we should also check for not being a VF driver, as VFs can't access necessary registers, and doing so leads to: | .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0 | RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe] | Call Trace: | xe_mmio_read32+0x110/0x280 [xe] | read_residency_counter+0x42/0xd0 [xe] | dgfx_pkg_residencies_show+0x115/0x190 [xe] | .. [drm] Package G2 counter failed to read, ret -19 or | .. [drm] GT0: VF is trying to read an inaccessible register 0x35b004+0x0 | RIP: 0010:xe_gt_sriov_vf_read32+0x5e2/0x8a0 [xe] | Call Trace: | xe_mmio_read32+0x110/0x280 [xe] | read_residency_counter+0x42/0xd0 [xe] | dgfx_pcie_link_residencies_show+0xe7/0x160 [xe] | .. [drm] PCIE LINK L0 RESIDENCY counter failed to read, ret -19 Similarly, there is no point to expose inject_csc_hw_error on VFs, as HW errors support is already disabled for VFs. Bspec: 53221 Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Soham Purkait <soham.purkait@intel.com> Cc: Rodrigo Vivi <rodrigo.vivi@intel.com> Cc: Riana Tauro <riana.tauro@intel.com> Reviewed-by: Gustavo Sousa <gustavo.sousa@intel.com> Acked-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250905173625.8398-1-michal.wajdeczko@intel.com
2025-09-09drm/i915: Drop unused struct_mutex from drm_i915_privateLuiz Otavio Mello2-10/+0
The struct_mutex field in drm_i915_private is no longer used anywhere in the driver. This patch removes it completely to clean up unused code and avoid confusion. Signed-off-by: Luiz Otavio Mello <luiz.mello@estudante.ufscar.br> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250908131518.36625-9-luiz.mello@estudante.ufscar.br Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/i915: Clean-up outdated struct_mutex commentsLuiz Otavio Mello2-4/+2
The struct_mutex will be removed from the DRM subsystem, as it was a legacy BKL that was only used by i915 driver. After review, it was concluded that its usage was no longer necessary This patch updates various comments in the i915 codebase to either remove or clarify references to struct_mutex, in order to prevent future misunderstandings. * i915_drv.h: Removed the statement that stolen_lock is the inner lock when overlaps with struct_mutex, since struct_mutex is no longer used in the driver. * i915_gem.c: Removed parentheses suggesting usage of struct_mutex, which which is no longer used. Signed-off-by: Luiz Otavio Mello <luiz.mello@estudante.ufscar.br> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250908131518.36625-8-luiz.mello@estudante.ufscar.br Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/i915/display: Remove outdated struct_mutex commentsLuiz Otavio Mello1-5/+1
The struct_mutex will be removed from the DRM subsystem, as it was a legacy BKL that was only used by i915 driver. After review, it was concluded that its usage was no longer necessary This patch update a comment about struct_mutex in i915/display, in order to prevent future misunderstandings. * intel_fbc.c: Removed the statement that intel_fbc->lock is the inner lock when overlapping with struct_mutex, since struct_mutex is no longer used anywhere in the driver. Signed-off-by: Luiz Otavio Mello <luiz.mello@estudante.ufscar.br> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250908131518.36625-7-luiz.mello@estudante.ufscar.br Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/i915/gem: Clean-up outdated struct_mutex commentsLuiz Otavio Mello5-10/+10
The struct_mutex will be removed from the DRM subsystem, as it was a legacy BKL that was only used by i915 driver. After review, it was concluded that its usage was no longer necessary This patch updates various comments in the i915/gem and i915/gt codebase to either remove or clarify references to struct_mutex, in order to prevent future misunderstandings. * i915_gem_execbuffer.c: Replace reference to struct_mutex with vm->mutex, as noted in the eb_reserve() function, which states that vm->mutex handles deadlocks. * i915_gem_object.c: Change struct_mutex by drm_i915_gem_object->vma.lock. i915_gem_object_unbind() in i915_gem.c states that this lock is who actually protects the unbind. * i915_gem_shrinker.c: The correct lock is actually i915->mm.obj, as already documented in its declaration. * i915_gem_wait.c: The existing comment already mentioned that struct_mutex was no longer necessary. Updated to refer to a generic global lock instead. * intel_reset_types.h: Cleaned up the comment text. Updated to refer to a generic global lock instead. Signed-off-by: Luiz Otavio Mello <luiz.mello@estudante.ufscar.br> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250908131518.36625-6-luiz.mello@estudante.ufscar.br Acked-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/i915: Replace struct_mutex in intel_guc_logLuiz Otavio Mello2-2/+11
Remove the use of struct_mutex from intel_guc_log.c and replace it with a dedicated lock, guc_lock, defined within the intel_guc_log struct.      The struct_mutex was previously used to protect concurrent access and modification of intel_guc_log->level in intel_guc_log_set_level(). However, it was suggested that the lock should reside within the intel_guc_log struct itself.      Initialize the new guc_lock in intel_guc_log_init_early(), alongside the existing relay.lock. The lock is initialized using drmm_mutex_init(), which also ensures it is properly destroyed when the driver is unloaded. Signed-off-by: Luiz Otavio Mello <luiz.mello@estudante.ufscar.br> Suggested-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250908131518.36625-5-luiz.mello@estudante.ufscar.br Acked-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/i915: Change mutex initialization in intel_guc_logLuiz Otavio Mello1-1/+6
The intel_guc_log->relay.lock is currently initialized in intel_guc_log_init_early(), but it lacks a corresponding destructor, which can lead to a memory leak. This patch replaces the use of mutex_init() with drmm_mutex_init(), which ensures the lock is properly destroyed when the driver is unloaded. Signed-off-by: Luiz Otavio Mello <luiz.mello@estudante.ufscar.br> Reviewed-by: Rodrigo Vivi <rodrigo.vivi@intel.com> Link: https://lore.kernel.org/r/20250908131518.36625-4-luiz.mello@estudante.ufscar.br Acked-by: Tvrtko Ursulin <tursulin@ursulin.net> Signed-off-by: Rodrigo Vivi <rodrigo.vivi@intel.com>
2025-09-09drm/bridge: ite-it6263: Support HDMI vendor specific infoframeLiu Ying1-20/+44
IT6263 supports HDMI vendor specific infoframe. The infoframe header and payload are configurable via NULL packet registers. The infoframe is enabled and disabled via PKT_NULL_CTRL register. Add the HDMI vendor specific infoframe support. Signed-off-by: Liu Ying <victor.liu@nxp.com> Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com> Link: https://lore.kernel.org/r/20250908-it6263-vendor-specific-infoframe-v2-1-3f2ebd9135ad@nxp.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-09-09drm/bridge: write full Audio InfoFrameDmitry Baryshkov2-17/+24
Instead of writing the first byte of the infoframe (and hoping that the rest is default / zeroes), hook Audio InfoFrame support into the write_infoframe / clear_infoframes callbacks and use drm_atomic_helper_connector_hdmi_update_audio_infoframe() to write the frame. Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250903-adv7511-audio-infoframe-v1-2-05b24459b9a4@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
2025-09-09drm/bridge: adv7511: use update latch for AVI infoframesDmitry Baryshkov1-2/+13
Instead of disabling and then reenabling AVI infoframe, use the recommended way of updating it on the fly: latch current values using the ADV7511_REG_INFOFRAME_UPDATE register. Acked-by: Maxime Ripard <mripard@kernel.org> Link: https://lore.kernel.org/r/20250903-adv7511-audio-infoframe-v1-1-05b24459b9a4@oss.qualcomm.com Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>