aboutsummaryrefslogtreecommitdiff
path: root/drivers/gpu/drm/amd/amdgpu
AgeCommit message (Collapse)AuthorFilesLines
2023-06-13Revert "drm/amdgpu: remove TOPDOWN flags when allocating VRAM in large bar ↵Arunpravin Paneer Selvam1-1/+1
system" This reverts commit c105518679b6e87232874ffc989ec403bee59664. This patch disables the TOPDOWN flag for APU and few dGPU cards which has the VRAM size equal to the BAR size. When we enable the TOPDOWN flag, we get the free blocks at the highest available memory region and we don't split the lower order blocks. This change is required to keep off the fragmentation related issues particularly in ASIC which has VRAM space <= 500MiB Hence, we are reverting this patch. Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2270 Signed-off-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-06-13drm/amdgpu: vcn_4_0 set instance 0 init sched score to 1Sonny Jiang1-1/+5
Only vcn0 can process AV1 codecx. In order to use both vcn0 and vcn1 in h264/265 transcode to AV1 cases, set vcn0 sched score to 1 at initialization time. Signed-off-by: Sonny Jiang <sonjiang@amd.com> Reviewed-by: Leo Liu <leo.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-06-13drm/amd: Tighten permissions on VBIOS flashing attributesMario Limonciello1-2/+2
Non-root users shouldn't be able to try to trigger a VBIOS flash or query the flashing status. This should be reserved for users with the appropriate permissions. Cc: stable@vger.kernel.org Fixes: 8424f2ccb3c0 ("drm/amdgpu/psp: Add vbflash sysfs interface support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-13drm/amd: Make sure image is written to trigger VBIOS image update flowMario Limonciello1-0/+3
The VBIOS image update flow requires userspace to: 1) Write the image to `psp_vbflash` 2) Read `psp_vbflash` 3) Poll `psp_vbflash_status` to check for completion If userspace reads `psp_vbflash` before writing an image, it's possible that it causes problems that can put the dGPU into an invalid state. Explicitly check that an image has been written before letting a read succeed. Cc: stable@vger.kernel.org Fixes: 8424f2ccb3c0 ("drm/amdgpu/psp: Add vbflash sysfs interface support") Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-13drm/amdgpu: add missing radeon secondary PCI IDAlex Deucher1-0/+1
0x5b70 is a missing RV370 secondary id. Add it so we don't try and probe it with amdgpu. Cc: michel@daenzer.net Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Tested-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-06-13drm/amdgpu: Implement gfx9 patch functions for resubmissionJiadong Zhu1-0/+80
Patch the packages including CONTEXT_CONTROL and WRITE_DATA for gfx9 during the resubmission scenario. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.3.x
2023-06-13drm/amdgpu: Modify indirect buffer packages for resubmissionJiadong Zhu4-0/+102
When the preempted IB frame resubmitted to cp, we need to modify the frame data including: 1. set PRE_RESUME 1 in CONTEXT_CONTROL. 2. use meta data(DE and CE) read from CSA in WRITE_DATA. Add functions to save the location the first time IBs emitted and callback to patch the package when resubmission happens. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.3.x
2023-06-13drm/amdgpu: Program gds backup address as zero if no gds allocatedJiadong Zhu1-5/+8
It is firmware requirement to set gds_backup_addrlo and gds_backup_addrhi of DE meta both zero if no gds partition is allocated for the frame. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.3.x
2023-06-13drm/amdgpu: Reset CP_VMID_PREEMPT after trailing fence signaledJiadong Zhu1-4/+4
When MEC executes unmap_queue for mid command buffer preemption, it will kick the write pointer of the gfx ring, set CP_VMID_PREEMPT to trigger the preemption and wait for CP_VMID_PREEMPT becomes zero after the preemption done. There is a race condition that PFP may excute the resetting command before MEC set CP_VMID_PREEMPT. As a result, hang happens as CP_VMID_PREEMPT is always 0xffff. To avoid this, we send resetting CP_VMID_PREEMPT command after the trailing fence is siganled and update gfx write pointer explicitly. Signed-off-by: Jiadong Zhu <Jiadong.Zhu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.3.x Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2535
2023-06-07drm/amdgpu: change reserved vram info printYiPeng Chai1-3/+4
The link object of mgr->reserved_pages is the blocks variable in struct amdgpu_vram_reservation, not the link variable in struct drm_buddy_block. Signed-off-by: YiPeng Chai <YiPeng.Chai@amd.com> Reviewed-by: Arunpravin Paneer Selvam <Arunpravin.PaneerSelvam@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-06-07drm/amdgpu: fix xclk freq on CHIP_STONEYChia-I Wu1-2/+9
According to Alex, most APUs from that time seem to have the same issue (vbios says 48Mhz, actual is 100Mhz). I only have a CHIP_STONEY so I limit the fixup to CHIP_STONEY Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-06-07Revert "drm/amdgpu: switch to golden tsc registers for raven/raven2"Alex Deucher1-40/+0
This reverts commit f03eb1d26c2739b75580f58bbab4ab2d5d3eba46. This results in inconsistent timing reported via asynchronous GPU queries. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html Cc: Jesse.Zhang@amd.com Cc: michel@daenzer.net Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-07Revert "drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according ↵Alex Deucher1-14/+19
to revision id" This reverts commit 9d2d1827af295fd6971786672c41c4dba3657154. This results in inconsistent timing reported via asynchronous GPU queries. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html Cc: Jesse.Zhang@amd.com Cc: michel@daenzer.net Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-07Revert "drm/amdgpu: change the reference clock for raven/raven2"Alex Deucher1-3/+4
This reverts commit fbc24293ca16b3b9ef891fe32ccd04735a6f8dc1. This results in inconsistent timing reported via asynchronous GPU queries. Link: https://lists.freedesktop.org/archives/amd-gfx/2023-May/093731.html Cc: Jesse.Zhang@amd.com Cc: michel@daenzer.net Reviewed-by: Michel Dänzer <mdaenzer@redhat.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-07drm/amd: Make lack of `ACPI_FADT_LOW_POWER_S0` or `CONFIG_AMD_PMC` louder ↵Mario Limonciello1-2/+2
during suspend path Users have reported that s2idle wasn't working on OEM Phoenix systems, but it was root caused to be because `CONFIG_AMD_PMC` wasn't set in the distribution kernel config. To make this more apparent, raise the messaging to err instead of warn. Link: https://bugzilla.kernel.org/show_bug.cgi?id=217497 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-07drm/amdgpu: fix Null pointer dereference error in amdgpu_device_recover_vramHoratio Zhang2-7/+4
Use the function of amdgpu_bo_vm_destroy to handle the resource release of shadow bo. During the amdgpu_mes_self_test, shadow bo released, but vmbo->shadow_list was not, which caused a null pointer reference error in amdgpu_device_recover_vram when GPU reset. Fixes: 6c032c37ac3e ("drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)") Signed-off-by: xinhui pan <xinhui.pan@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Acked-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-06-07drm/amd: Disallow s0ix without BIOS support againMario Limonciello1-2/+6
commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") showed improvements to power consumption over suspend when s0ix wasn't enabled in BIOS and the system didn't support S3. This patch however was misguided because the reason the system didn't support S3 was because SMT was disabled in OEM BIOS setup. This prevented the BIOS from allowing S3. Also allowing GPUs to use the s2idle path actually causes problems if they're invoked on systems that may not support s2idle in the platform firmware. `systemd` has a tendency to try to use `s2idle` if `deep` fails for any reason, which could lead to unexpected flows. The original commit also fixed a problem during resume from suspend to idle without hardware support, but this is no longer necessary with commit ca4751866397 ("drm/amd: Don't allow s0ix on APUs older than Raven") Revert commit cf488dcd0ab7 ("drm/amd: Allow s0ix without BIOS support") to make it match the expected behavior again. Cc: Rafael Ávila de Espíndola <rafael@espindo.la> Link: https://github.com/torvalds/linux/blob/v6.1/drivers/gpu/drm/amd/amdgpu/amdgpu_acpi.c#L1060 Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/2599 Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-31drm/amdgpu: enable tmz by default for GC 11.0.1Ikshwaku Chauhan1-1/+2
Add IP GC 11.0.1 in the list of target to have tmz enabled by default. Signed-off-by: Ikshwaku Chauhan <ikshwaku.chauhan@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-05-31drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v4_0Horatio Zhang1-7/+21
Add ras_poison_irq and functions. And fix the amdgpu_irq_put call trace in jpeg_v4_0_hw_fini. [ 50.497562] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu] [ 50.497619] RSP: 0018:ffffaa2400fcfcb0 EFLAGS: 00010246 [ 50.497620] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [ 50.497621] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 50.497621] RBP: ffffaa2400fcfcd0 R08: 0000000000000000 R09: 0000000000000000 [ 50.497622] R10: 0000000000000000 R11: 0000000000000000 R12: ffff99b2105242d8 [ 50.497622] R13: 0000000000000000 R14: ffff99b210500000 R15: ffff99b210500000 [ 50.497623] FS: 0000000000000000(0000) GS:ffff99b518480000(0000) knlGS:0000000000000000 [ 50.497623] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 50.497624] CR2: 00007f9d32aa91e8 CR3: 00000001ba210000 CR4: 0000000000750ee0 [ 50.497624] PKRU: 55555554 [ 50.497625] Call Trace: [ 50.497625] <TASK> [ 50.497627] jpeg_v4_0_hw_fini+0x43/0xc0 [amdgpu] [ 50.497693] jpeg_v4_0_suspend+0x13/0x30 [amdgpu] [ 50.497751] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu] [ 50.497802] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu] [ 50.497854] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu] [ 50.497905] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu] [ 50.498005] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu] [ 50.498060] process_one_work+0x21f/0x400 [ 50.498063] worker_thread+0x200/0x3f0 [ 50.498064] ? process_one_work+0x400/0x400 [ 50.498065] kthread+0xee/0x120 [ 50.498067] ? kthread_complete_and_exit+0x20/0x20 [ 50.498068] ret_from_fork+0x22/0x30 Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-31drm/amdgpu: add RAS POISON interrupt funcs for jpeg_v2_6Horatio Zhang1-6/+22
Add ras_poison_irq and functions. Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-31drm/amdgpu: separate ras irq from jpeg instance irq for UVD_POISONHoratio Zhang2-1/+29
Separate jpegbRAS poison consumption handling from the instance irq, and register dedicated ras_poison_irq src and funcs for UVD_POISON. v2: - Separate ras irq from jpeg instance irq - Improve the subject and code comments v3: - Split the patch into three parts - Improve the code comments Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-31drm/amdgpu: add RAS POISON interrupt funcs for vcn_v4_0Horatio Zhang1-6/+30
Add ras_poison_irq and functions. And fix the amdgpu_irq_put call trace in vcn_v4_0_hw_fini. [ 44.563572] RIP: 0010:amdgpu_irq_put+0xa4/0xc0 [amdgpu] [ 44.563629] RSP: 0018:ffffb36740edfc90 EFLAGS: 00010246 [ 44.563630] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000000 [ 44.563630] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000 [ 44.563631] RBP: ffffb36740edfcb0 R08: 0000000000000000 R09: 0000000000000000 [ 44.563631] R10: 0000000000000000 R11: 0000000000000000 R12: ffff954c568e2ea8 [ 44.563631] R13: 0000000000000000 R14: ffff954c568c0000 R15: ffff954c568e2ea8 [ 44.563632] FS: 0000000000000000(0000) GS:ffff954f584c0000(0000) knlGS:0000000000000000 [ 44.563632] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 44.563633] CR2: 00007f028741ba70 CR3: 000000026ca10000 CR4: 0000000000750ee0 [ 44.563633] PKRU: 55555554 [ 44.563633] Call Trace: [ 44.563634] <TASK> [ 44.563634] vcn_v4_0_hw_fini+0x62/0x160 [amdgpu] [ 44.563700] vcn_v4_0_suspend+0x13/0x30 [amdgpu] [ 44.563755] amdgpu_device_ip_suspend_phase2+0x240/0x470 [amdgpu] [ 44.563806] amdgpu_device_ip_suspend+0x41/0x80 [amdgpu] [ 44.563858] amdgpu_device_pre_asic_reset+0xd9/0x4a0 [amdgpu] [ 44.563909] amdgpu_device_gpu_recover.cold+0x548/0xcf1 [amdgpu] [ 44.564006] amdgpu_debugfs_reset_work+0x4c/0x80 [amdgpu] [ 44.564061] process_one_work+0x21f/0x400 [ 44.564062] worker_thread+0x200/0x3f0 [ 44.564063] ? process_one_work+0x400/0x400 [ 44.564064] kthread+0xee/0x120 [ 44.564065] ? kthread_complete_and_exit+0x20/0x20 [ 44.564066] ret_from_fork+0x22/0x30 Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-31drm/amdgpu: add RAS POISON interrupt funcs for vcn_v2_6Horatio Zhang1-4/+21
Add ras_poison_irq and functions. Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-31drm/amdgpu: separate ras irq from vcn instance irq for UVD_POISONHoratio Zhang2-1/+29
Separate vcn RAS poison consumption handling from the instance irq, and register dedicated ras_poison_irq src and funcs for UVD_POISON. v2: - Separate ras irq from vcn instance irq - Improve the subject and code comments v3: - Split the patch into three parts - Improve the code comments Suggested-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-24drm/amdgpu: don't enable secure display on incompatible platformsJesse Zhang1-1/+7
[why] [drm] psp gfx command LOAD_TA(0x1) failed and response status is (0x7) [drm] psp gfx command INVOKE_CMD(0x3) failed and response status is (0x4) amdgpu 0000:04:00.0: amdgpu: Secure display: Generic Failure. [how] don't enable secure display on incompatible platforms Suggested-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Jesse zhang <jesse.zhang@amd.com> Reviewed-by: Aaron Liu <aaron.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-24drm:amd:amdgpu: Fix missing buffer object unlock in failure pathSukrut Bellary2-2/+6
smatch warning - 1) drivers/gpu/drm/amd/amdgpu/gfx_v9_0.c:3615 gfx_v9_0_kiq_resume() warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'. 2) drivers/gpu/drm/amd/amdgpu/gfx_v10_0.c:6901 gfx_v10_0_kiq_resume() warn: inconsistent returns 'ring->mqd_obj->tbo.base.resv'. Signed-off-by: Sukrut Bellary <sukrut.bellary@linux.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-18drm/amdgpu: skip disabling fence driver src_irqs when device is unpluggedGuchun Chen1-1/+2
When performing device unbind or halt, we have disabled all irqs at the very begining like amdgpu_pci_remove or amdgpu_device_halt. So amdgpu_irq_put for irqs stored in fence driver should not be called any more, otherwise, below calltrace will arrive. [ 139.114088] WARNING: CPU: 2 PID: 1550 at drivers/gpu/drm/amd/amdgpu/amdgpu_irq.c:616 amdgpu_irq_put+0xf6/0x110 [amdgpu] [ 139.114655] Call Trace: [ 139.114655] <TASK> [ 139.114657] amdgpu_fence_driver_hw_fini+0x93/0x130 [amdgpu] [ 139.114836] amdgpu_device_fini_hw+0xb6/0x350 [amdgpu] [ 139.114955] amdgpu_driver_unload_kms+0x51/0x70 [amdgpu] [ 139.115075] amdgpu_pci_remove+0x63/0x160 [amdgpu] [ 139.115193] ? __pm_runtime_resume+0x64/0x90 [ 139.115195] pci_device_remove+0x3a/0xb0 [ 139.115197] device_remove+0x43/0x70 [ 139.115198] device_release_driver_internal+0xbd/0x140 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-18drm/amdgpu/gmc11: implement get_vbios_fb_size()Alex Deucher1-1/+20
Implement get_vbios_fb_size() so we can properly reserve the vbios splash screen to avoid potential artifacts on the screen during the transition from the pre-OS console to the OS console. Acked-by: Sunil Khatri <sunil.khatri@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-05-18drm/amdgpu: Differentiate between Raven2 and Raven/Picasso according to ↵Jesse Zhang1-19/+14
revision id Due to the raven2 and raven/picasso maybe have the same GC_HWIP version. So differentiate them by revision id. Signed-off-by: shanshengwang <shansheng.wang@amd.com> Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com> Acked-by: Evan Quan <evan.quan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-18drm/amdgpu/gfx11: Adjust gfxoff before powergating on gfx11 as wellGuilherme G. Piccoli1-1/+7
(Bas: speculative change to mirror gfx10/gfx9) Signed-off-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-05-18drm/amdgpu/gfx10: Disable gfxoff before disabling powergating.Bas Nieuwenhuizen1-1/+7
Otherwise we get a full system lock (looks like a FW mess). Copied the order from the GFX9 powergating code. Fixes: 366468ff6c34 ("drm/amdgpu: Allow GfxOff on Vangogh as default") Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2545 Signed-off-by: Bas Nieuwenhuizen <bas@basnieuwenhuizen.nl> Tested-by: Guilherme G. Piccoli <gpiccoli@igalia.com> Cc: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-17drm/amdgpu/gfx11: update gpu_clock_counter logicAlex Deucher1-4/+7
This code was written prior to previous updates to this logic for other chips. The RSC registers are part of SMUIO which is an always on block so there is no need to disable gfxoff. Additionally add the carryover and preemption checks. v2: rebase Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.y: 5591a051b86b: drm/amdgpu: refine get gpu clock counter method Cc: stable@vger.kernel.org # 6.2.y: 5591a051b86b: drm/amdgpu: refine get gpu clock counter method Cc: stable@vger.kernel.org # 6.3.y: 5591a051b86b: drm/amdgpu: refine get gpu clock counter method
2023-05-11drm/amdgpu: change gfx 11.0.4 external_id rangeYifan Zhang1-1/+1
gfx 11.0.4 range starts from 0x80. Fixes: 311d52367d0a ("drm/amdgpu: add soc21 common ip block support for GC 11.0.4") Cc: stable@vger.kernel.org Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reported-by: Yogesh Mohan Marimuthu <Yogesh.Mohanmarimuthu@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Tim Huang <Tim.Huang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-11drm/amdgpu/jpeg: Remove harvest checking for JPEG3Saleemkhan Jamadar1-0/+1
Register CC_UVD_HARVESTING is obsolete for JPEG 3.1.2 Signed-off-by: Saleemkhan Jamadar <saleemkhan.jamadar@amd.com> Reviewed-by: Veerabadhran Gopalakrishnan <Veerabadhran.Gopalakrishnan@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org # 6.1.x
2023-05-11drm/amdgpu/gfx: disable gfx9 cp_ecc_error_irq only when enabling legacy gfx rasGuchun Chen1-1/+2
gfx9 cp_ecc_error_irq is only enabled when legacy gfx ras is assert. So in gfx_v9_0_hw_fini, interrupt disablement for cp_ecc_error_irq should be executed under such condition, otherwise, an amdgpu_irq_put calltrace will occur. [ 7283.170322] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.170964] RSP: 0018:ffff9a5fc3967d00 EFLAGS: 00010246 [ 7283.170967] RAX: ffff98d88afd3040 RBX: ffff98d89da20000 RCX: 0000000000000000 [ 7283.170969] RDX: 0000000000000000 RSI: ffff98d89da2bef8 RDI: ffff98d89da20000 [ 7283.170971] RBP: ffff98d89da20000 R08: ffff98d89da2ca18 R09: 0000000000000006 [ 7283.170973] R10: ffffd5764243c008 R11: 0000000000000000 R12: 0000000000001050 [ 7283.170975] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.170978] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.170981] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.170983] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.170986] Call Trace: [ 7283.170988] <TASK> [ 7283.170989] gfx_v9_0_hw_fini+0x1c/0x6d0 [amdgpu] [ 7283.171655] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.172245] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.172823] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.173412] pci_pm_freeze+0x54/0xc0 [ 7283.173419] ? __pfx_pci_pm_freeze+0x10/0x10 [ 7283.173425] dpm_run_callback+0x98/0x200 [ 7283.173430] __device_suspend+0x164/0x5f0 v2: drop gfx11 as it's fixed in a different solution by retiring cp_ecc_irq funcs(Hawking) Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: disable sdma ecc irq only when sdma RAS is enabled in suspendGuchun Chen1-3/+5
sdma_v4_0_ip is shared on a few asics, but in sdma_v4_0_hw_fini, driver unconditionally disables ecc_irq which is only enabled on those asics enabling sdma ecc. This will introduce a warning in suspend cycle on those chips with sdma ip v4.0, while without sdma ecc. So this patch correct this. [ 7283.166354] RIP: 0010:amdgpu_irq_put+0x45/0x70 [amdgpu] [ 7283.167001] RSP: 0018:ffff9a5fc3967d08 EFLAGS: 00010246 [ 7283.167019] RAX: ffff98d88afd3770 RBX: 0000000000000001 RCX: 0000000000000000 [ 7283.167023] RDX: 0000000000000000 RSI: ffff98d89da30390 RDI: ffff98d89da20000 [ 7283.167025] RBP: ffff98d89da20000 R08: 0000000000036838 R09: 0000000000000006 [ 7283.167028] R10: ffffd5764243c008 R11: 0000000000000000 R12: ffff98d89da30390 [ 7283.167030] R13: ffff98d89da38978 R14: ffffffff999ae15a R15: ffff98d880130105 [ 7283.167032] FS: 0000000000000000(0000) GS:ffff98d996f00000(0000) knlGS:0000000000000000 [ 7283.167036] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 7283.167039] CR2: 00000000f7a9d178 CR3: 00000001c42ea000 CR4: 00000000003506e0 [ 7283.167041] Call Trace: [ 7283.167046] <TASK> [ 7283.167048] sdma_v4_0_hw_fini+0x38/0xa0 [amdgpu] [ 7283.167704] amdgpu_device_ip_suspend_phase2+0x101/0x1a0 [amdgpu] [ 7283.168296] amdgpu_device_suspend+0x103/0x180 [amdgpu] [ 7283.168875] amdgpu_pmops_freeze+0x21/0x60 [amdgpu] [ 7283.169464] pci_pm_freeze+0x54/0xc0 Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Tao Zhou <tao.zhou1@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: Fix vram recover doesn't work after whole GPU reset (v2)Lin.Cao1-1/+5
v1: Vmbo->shadow is used to back vram bo up when vram lost. So that we should set shadow as vmbo->shadow to recover vmbo->bo v2: Modify if(vmbo->shadow) shadow = vmbo->shadow as if(!vmbo->shadow) continue; Fixes: e18aaea733da ("drm/amdgpu: move shadow_list to amdgpu_bo_vm") Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Lin.Cao <lincao12@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: drop gfx_v11_0_cp_ecc_error_irq_funcsHoratio Zhang2-49/+5
The gfx.cp_ecc_error_irq is retired in gfx11. In gfx_v11_0_hw_fini still use amdgpu_irq_put to disable this interrupt, which caused the call trace in this function. [ 102.873958] Call Trace: [ 102.873959] <TASK> [ 102.873961] gfx_v11_0_hw_fini+0x23/0x1e0 [amdgpu] [ 102.874019] gfx_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.874072] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.874122] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.874172] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.874223] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.874321] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.874375] process_one_work+0x21f/0x3f0 [ 102.874377] worker_thread+0x200/0x3e0 [ 102.874378] ? process_one_work+0x3f0/0x3f0 [ 102.874379] kthread+0xfd/0x130 [ 102.874380] ? kthread_complete_and_exit+0x20/0x20 [ 102.874381] ret_from_fork+0x22/0x30 v2: - Handle umc and gfx ras cases in separated patch - Retired the gfx_v11_0_cp_ecc_error_irq_funcs in gfx11 v3: - Improve the subject and code comments - Add judgment on gfx11 in the function of amdgpu_gfx_ras_late_init v4: - Drop the define of CP_ME1_PIPE_INST_ADDR_INTERVAL and SET_ECC_ME_PIPE_STATE which using in gfx_v11_0_set_cp_ecc_error_state - Check cp_ecc_error_irq.funcs rather than ip version for a more sustainable life v5: - Simplify judgment conditions Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-11drm/amdgpu: set gfx9 onwards APU atomics support to be trueYifan Zhang1-0/+6
APUs w/ gfx9 onwards doesn't reply on PCIe atomics, rather it is internal path w/ native atomic support. Set have_atomics_support to true. Signed-off-by: Yifan Zhang <yifan1.zhang@amd.com> Reviewed-by: Lang Yu <lang.yu@amd.com> Acked-by: Felix Kuehling <Felix.Kuehling@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-11drm/amdgpu/nv: update VCN 3 max HEVC encoding resolutionThong Thai1-6/+16
Update the maximum resolution reported for HEVC encoding on VCN 3 devices to reflect its 8K encoding capability. v2: Also update the max height for H.264 encoding to match spec. (Ruijing) Signed-off-by: Thong Thai <thong.thai@amd.com> Reviewed-by: Ruijing Dong <ruijing.dong@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-05Merge tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drmLinus Torvalds12-65/+76
Pull more drm fixes from Dave Airlie: "This is the fixes for the last couple of weeks for i915 and last 3 weeks for amdgpu, lots of them but pretty scattered around and all pretty small. amdgpu: - SR-IOV fixes - DCN 3.2 fixes - DC mclk handling fixes - eDP fixes - SubVP fixes - HDCP regression fix - DSC fixes - DC FP fixes - DCN 3.x fixes - Display flickering fix when switching between vram and gtt - Z8 power saving fix - Fix hang when skipping modeset - GPU reset fixes - Doorbell fix when resizing BARs - Fix spurious warnings in gmc - Locking fix for AMDGPU_SCHED IOCTL - SR-IOV fix - DCN 3.1.4 fix - DCN 3.2 fix - Fix job cleanup when CS is aborted i915: - skl pipe source size check - mtl transcoder mask fix - DSI power on sequence fix - GuC versioning corner case fix" * tag 'drm-next-2023-05-05' of git://anongit.freedesktop.org/drm/drm: (48 commits) drm/amdgpu: drop redundant sched job cleanup when cs is aborted drm/amd/display: filter out invalid bits in pipe_fuses drm/amd/display: Change default Z8 watermark values drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOV drm/amdgpu: add a missing lock for AMDGPU_SCHED drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini() drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini drm/amdgpu: Enable doorbell selfring after resize FB BAR drm/amdgpu: Use the default reset when loading or reloading the driver drm/amdgpu: Fix mode2 reset for sienna cichlid drm/i915/dsi: Use unconditional msleep() instead of intel_dsi_msleep() drm/i915/mtl: Add the missing CPU transcoder mask in intel_device_info drm/i915/guc: Actually return an error if GuC version range check fails drm/amd/display: Lowering min Z8 residency time drm/amd/display: fix flickering caused by S/G mode drm/amd/display: Set min_width and min_height capability for DCN30 drm/amd/display: Isolate remaining FPU code in DCN32 drm/amd/display: Update bounding box values for DCN321 drm/amd/display: Do not clear GPINT register when releasing DMUB from reset ...
2023-05-03drm/amdgpu: drop redundant sched job cleanup when cs is abortedGuchun Chen1-10/+3
Once command submission failed due to userptr invalidation in amdgpu_cs_submit, legacy code will perform cleanup of scheduler job. However, it's not needed at all, as former commit has integrated job cleanup stuff into amdgpu_job_free. Otherwise, because of double free, a NULL pointer dereference will occur in such scenario. Bug: https://gitlab.freedesktop.org/drm/amd/-/issues/2457 Fixes: f7d66fb2ea43 ("drm/amdgpu: cleanup scheduler job initialization v2") Signed-off-by: Guchun Chen <guchun.chen@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: disable SDMA WPTR_POLL_ENABLE for SR-IOVHorace Chen1-4/+1
[Why] This WPTR_POLL_ENABLE is a hardware contigious polling which will cause FCLK and UCLK to keep on a high level. Mostly its case can be covered by F32_WPTR_POLL_ENABLE which polls by firmware. So to save power, SR-IOV also needs to disable this bit Signed-off-by: Horace Chen <horace.chen@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: add a missing lock for AMDGPU_SCHEDChia-I Wu1-1/+5
mgr->ctx_handles should be protected by mgr->lock. v2: improve commit message v3: add a Fixes tag Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Christian König <christian.koenig@amd.com> Fixes: 52c6a62c64fa ("drm/amdgpu: add interface for editing a foreign process's priority v3") Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: fix an amdgpu_irq_put() issue in gmc_v9_0_hw_fini()Hamza Mahfooz1-1/+0
As made mention of in commit 08c677cb0b43 ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_fini") and commit 13af556104fa ("drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_fini"). It is meaningless to call amdgpu_irq_put() for gmc.ecc_irq. So, remove it from gmc_v9_0_hw_fini(). Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: 3029c855d79f ("drm/amdgpu: Fix desktop freezed after gpu-reset") Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Hamza Mahfooz <hamza.mahfooz@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v10_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v10_0_hw_fini, which also leads to the call trace. [ 82.340264] Call Trace: [ 82.340265] <TASK> [ 82.340269] gmc_v10_0_hw_fini+0x83/0xa0 [amdgpu] [ 82.340447] gmc_v10_0_suspend+0xe/0x20 [amdgpu] [ 82.340623] amdgpu_device_ip_suspend_phase2+0x127/0x1c0 [amdgpu] [ 82.340789] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 82.340955] amdgpu_device_pre_asic_reset+0xdd/0x2b0 [amdgpu] [ 82.341122] amdgpu_device_gpu_recover.cold+0x4dd/0xbb2 [amdgpu] [ 82.341359] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 82.341529] process_one_work+0x21d/0x3f0 [ 82.341535] worker_thread+0x1fa/0x3c0 [ 82.341538] ? process_one_work+0x3f0/0x3f0 [ 82.341540] kthread+0xff/0x130 [ 82.341544] ? kthread_complete_and_exit+0x20/0x20 [ 82.341547] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: fix amdgpu_irq_put call trace in gmc_v11_0_hw_finiHoratio Zhang1-1/+0
The gmc.ecc_irq is enabled by firmware per IFWI setting, and the host driver is not privileged to enable/disable the interrupt. So, it is meaningless to use the amdgpu_irq_put function in gmc_v11_0_hw_fini, which also leads to the call trace. [ 102.980303] Call Trace: [ 102.980303] <TASK> [ 102.980304] gmc_v11_0_hw_fini+0x54/0x90 [amdgpu] [ 102.980357] gmc_v11_0_suspend+0xe/0x20 [amdgpu] [ 102.980409] amdgpu_device_ip_suspend_phase2+0x240/0x460 [amdgpu] [ 102.980459] amdgpu_device_ip_suspend+0x3d/0x80 [amdgpu] [ 102.980520] amdgpu_device_pre_asic_reset+0xd9/0x490 [amdgpu] [ 102.980573] amdgpu_device_gpu_recover.cold+0x548/0xce6 [amdgpu] [ 102.980687] amdgpu_debugfs_reset_work+0x4c/0x70 [amdgpu] [ 102.980740] process_one_work+0x21f/0x3f0 [ 102.980741] worker_thread+0x200/0x3e0 [ 102.980742] ? process_one_work+0x3f0/0x3f0 [ 102.980743] kthread+0xfd/0x130 [ 102.980743] ? kthread_complete_and_exit+0x20/0x20 [ 102.980744] ret_from_fork+0x22/0x30 Signed-off-by: Horatio Zhang <Hongkun.Zhang@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Guchun Chen <guchun.chen@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> Link: https://gitlab.freedesktop.org/drm/amd/-/issues/2522 Fixes: c8b5a95b5709 ("drm/amdgpu: Fix desktop freezed after gpu-reset") Cc: stable@vger.kernel.org
2023-05-03drm/amdgpu: Enable doorbell selfring after resize FB BARShane Xiao3-30/+41
[Why] The selfring doorbell aperture will change when resize FB BAR successfully during gmc sw init, we should reorder the sequence of enabling doorbell selfring aperture. [How] Move enable_doorbell_selfring_aperture from *_common_hw_init to *_common_late_init. This fixes the potential issue that GPU ring its own doorbell when this device is in translated mode when iommu is on. v2: Remove *_enable_doorbell_aperture functions (Christian) v3: Add comments to note that why we need enable doorbell selfring late (Christian) Signed-off-by: Shane Xiao <shane.xiao@amd.com> Signed-off-by: Aaron Liu <aaron.liu@amd.com> Tested-by: Xiaomeng Hou <Xiaomeng.Hou@amd.com> Reviewed-by: Christian K�nig <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: Use the default reset when loading or reloading the driverlyndonli1-0/+7
Below call trace and errors are observed when reloading amdgpu driver with the module parameter reset_method=3. It should do a default reset when loading or reloading the driver, regardless of the module parameter reset_method. v2: add comments inside and modify commit messages. [ +2.180243] [drm] psp gfx command ID_LOAD_TOC(0x20) failed and response status is (0x0) [ +0.000011] [drm:psp_hw_start [amdgpu]] *ERROR* Failed to load toc [ +0.000890] [drm:psp_hw_start [amdgpu]] *ERROR* PSP tmr init failed! [ +0.020683] [drm:amdgpu_fill_buffer [amdgpu]] *ERROR* Trying to clear memory with ring turned off. [ +0.000003] RIP: 0010:amdgpu_bo_release_notify+0x1ef/0x210 [amdgpu] [ +0.000004] Call Trace: [ +0.000003] <TASK> [ +0.000008] ttm_bo_release+0x2c4/0x330 [amdttm] [ +0.000026] amdttm_bo_put+0x3c/0x70 [amdttm] [ +0.000020] amdgpu_bo_free_kernel+0xe6/0x140 [amdgpu] [ +0.000728] psp_v11_0_ring_destroy+0x34/0x60 [amdgpu] [ +0.000826] psp_hw_init+0xe7/0x2f0 [amdgpu] [ +0.000813] amdgpu_device_fw_loading+0x1ad/0x2d0 [amdgpu] [ +0.000731] amdgpu_device_init.cold+0x108e/0x2002 [amdgpu] [ +0.001071] ? do_pci_enable_device+0xe1/0x110 [ +0.000011] amdgpu_driver_load_kms+0x1a/0x160 [amdgpu] [ +0.000729] amdgpu_pci_probe+0x179/0x3a0 [amdgpu] Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2023-05-03drm/amdgpu: Fix mode2 reset for sienna cichlidlyndonli1-1/+1
Before this change, sienna_cichlid_get_reset_handler will always return NULL, although the module parameter reset_method is 3 when loading amdgpu driver. Signed-off-by: lyndonli <Lyndon.Li@amd.com> Signed-off-by: Yunxiang Li <Yunxiang.Li@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Kenneth Feng <kenneth.feng@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>