linux.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author	Files	Lines
2025-12-17	drm/xe/oa: Move default oa unit assignment earlier during stream open	Ashutosh Dixit	1	-4/+4
	De-referencing param.oa_unit, when an OA unit id is not provided during stream open, results in NPD below. Oops: general protection fault, probably for non-canonical address... KASAN: null-ptr-deref in range... RIP: 0010:xe_oa_stream_open_ioctl+0x169/0x38a0 xe_observation_ioctl+0x19f/0x270 drm_ioctl_kernel+0x1f4/0x410 Fix this by moving default oa unit assignment before the dereference. Reported-by: Peter Senna Tschudin <peter.senna@linux.intel.com> Closes: https://gitlab.freedesktop.org/drm/xe/kernel/-/issues/6840 Fixes: c7e269aa565f ("drm/xe/oa: Allow exec_queue's to be specified only for OAG OA unit") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Harish Chegondi <harish.chegondi@intel.com> Link: https://patch.msgid.link/20251212061850.1565459-2-ashutosh.dixit@intel.com
2025-12-17	drm/xe/pf: Add handling for MLRC adverse event threshold	Daniele Ceraolo Spurio	2	-0/+10
	Since it is illegal to register a MLRC context when scheduler groups are enabled, the GuC consider the VF doing so as an adverse event. Like for other adverse event, there is a threshold for how many times the event can happen before the GuC throws an error, which we need to add support for. Signed-off-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-5-michal.wajdeczko@intel.com
2025-12-17	drm/xe/pf: Prepare for new threshold KLVs	Michal Wajdeczko	3	-10/+23
	We want to extend our macro-based KLV list definitions with new information about the version from which given KLV is supported. Prepare our code generators to emit dedicated version check if a KLV was defined with the version information. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-4-michal.wajdeczko@intel.com
2025-12-17	drm/xe/guc: Introduce GUC_FIRMWARE_VER_AT_LEAST helper	Michal Wajdeczko	3	-3/+24
	There are already few places in the code where we need to check GuC firmware version. Wrap existing raw conditions into a named helper macro to make it clear and avoid explicit call of the MAKE_GUC_VER. Suggested-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Cc: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251216214902.1429-3-michal.wajdeczko@intel.com
2025-12-17	drm/xe: Introduce IF_ARGS macro utility	Michal Wajdeczko	2	-0/+81
	We want to extend our macro-based KLV list definitions with new information about the version from which given KLV is supported. Add utility IF_ARGS macro that can be used in code generators to emit different code based on the presence of additional arguments. Introduce macro itself and extend our kunit tests to cover it. We will use this macro in next patch. Signed-off-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Reviewed-by: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com> Acked-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217224018.3490-1-michal.wajdeczko@intel.com
2025-12-17	drm/xe: Fix NULL pointer dereference in xe_exec_ioctl	Tapani Pälli	1	-6/+4
	Helper function xe_sync_needs_wait expects sync->fence when accessing flags, patch makes sure we call only when sync->fence exists. v2: move null checking to xe_sync_needs_wait and make xe_sync_entry_wait utilize this helper (Matthew Auld) v3: further simplify code (Matthew Auld) Fixes NULL pointer dereference seen with Vulkan workloads: [ 118.410401] RIP: 0010:xe_sync_needs_wait+0x27/0x50 [xe] Fixes: 4ac9048d0501 ("drm/xe: Wait on in-syncs when swicthing to dma-fence mode") Signed-off-by: Tapani Pälli <tapani.palli@intel.com> Reviewed-by: Matthew Auld <matthew.auld@intel.com> Reviewed-by: Matthew Brost <matthew.brost@intel.com> Signed-off-by: Matthew Brost <matthew.brost@intel.com> Link: https://patch.msgid.link/20251217132412.435755-1-tapani.palli@intel.com
2025-12-17	drm/panthor: fix for dma-fence safe access rules	Chia-I Wu	1	-0/+4
	Commit 506aa8b02a8d6 ("dma-fence: Add safe access helpers and document the rules") details the dma-fence safe access rules. The most common culprit is that drm_sched_fence_get_timeline_name may race with group_free_queue. Signed-off-by: Chia-I Wu <olvaffe@gmail.com> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Reviewed-by: Liviu Dudau <liviu.dudau@arm.com> Reviewed-by: Steven Price <steven.price@arm.com> Cc: stable@vger.kernel.org # v6.17+ Signed-off-by: Steven Price <steven.price@arm.com> Link: https://patch.msgid.link/20251204174545.399059-1-olvaffe@gmail.com
2025-12-17	drm/panthor: Fix NULL pointer dereference on panthor_fw_unplug	Karunika Choo	1	-4/+0
	This patch removes the MCU halt and wait for halt procedures during panthor_fw_unplug() as the MCU can be in a variety of states or the FW may not even be loaded/initialized at all, the latter of which can lead to a NULL pointer dereference. It should be safe on unplug to just disable the MCU without waiting for it to halt as it may not be able to. Fixes: 514072549865 ("drm/panthor: Support GLB_REQ.STATE field for Mali-G1 GPUs") Suggested-by: Boris Brezillon <boris.brezillon@collabora.com> Signed-off-by: Karunika Choo <karunika.choo@arm.com> Reviewed-by: Liviu Dudau <liviu@dudau.co.uk> Reviewed-by: Boris Brezillon <boris.brezillon@collabora.com> Link: https://patch.msgid.link/20251215203312.1084182-1-karunika.choo@arm.com Signed-off-by: Boris Brezillon <boris.brezillon@collabora.com>
2025-12-17	drm/xe/xe_sriov_vfio: Fix return value in xe_sriov_vfio_migration_supported()	Dan Carpenter	1	-1/+1
	The xe_sriov_vfio_migration_supported() function is type bool so returning -EPERM means returning true. Return false instead. Fixes: 17f22465c5a5 ("drm/xe/pf: Export helpers for VFIO") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTLEZ4g-FD-iMQ2V@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-12-17	drm/xe/vf: fix return type in vf_migration_init_late()	Dan Carpenter	1	-1/+1
	The vf_migration_init_late() function is supposed to return zero on success and negative error codes on failure. The error code eventually gets propagated back to the probe() function and returned. The problem is it's declared as type bool so it returns true on error. Change it to type int instead. Fixes: 2e2dab20dd66 ("drm/xe/vf: Enable VF migration only on supported GuC versions") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Michal Wajdeczko <michal.wajdeczko@intel.com> Link: https://patch.msgid.link/aTK9pwJ_roc8vpDi@stanley.mountain Signed-off-by: Michał Winiarski <michal.winiarski@intel.com>
2025-12-16	drm/xe/oa: Always set OAG_OAGLBCTXCTRL_COUNTER_RESUME	Ashutosh Dixit	1	-3/+4
	Reports can be written out to the OA buffer using ways other than periodic sampling. These include mmio trigger and context switches. To support these use cases, when periodic sampling is not enabled, OAG_OAGLBCTXCTRL_COUNTER_RESUME must be set. Fixes: 1db9a9dc90ae ("drm/xe/oa: OA stream initialization (OAG)") Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-4-ashutosh.dixit@intel.com
2025-12-16	drm/xe/rtp: Whitelist OAMERT MMIO trigger registers	Ashutosh Dixit	1	-0/+19
	Whitelist OAMERT registers to enable userspace to execute MMIO triggers on OAMERT units. Registers are whitelisted for compute and copy class engines. Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Link: https://patch.msgid.link/20251205212613.826224-3-ashutosh.dixit@intel.com
2025-12-16	drm/xe/oa/uapi: Expose MERT OA unit	Ashutosh Dixit	2	-3/+43
	A MERT OA unit is available in the SoC on some platforms. Add support for this OA unit and expose it to userspace. The MERT OA unit does not have any HW engines attached, but is otherwise similar to an OAM unit. Signed-off-by: Lucas De Marchi <lucas.demarchi@intel.com> Reviewed-by: Umesh Nerlige Ramappa <umesh.nerlige.ramappa@intel.com> Signed-off-by: Ashutosh Dixit <ashutosh.dixit@intel.com> Link: https://patch.msgid.link/20251205212613.826224-2-ashutosh.dixit@intel.com
2025-12-16	drm/amdkfd: Fix improper NULL termination of queue restore SMI event string	Brian Kocoloski	1	-1/+1
	Pass character "0" rather than NULL terminator to properly format queue restoration SMI events. Currently, the NULL terminator precedes the newline character that is intended to delineate separate events in the SMI event buffer, which can break userspace parsers. Signed-off-by: Brian Kocoloski <brian.kocoloski@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 6e7143e5e6e21f9d5572e0390f7089e6d53edf3c)
2025-12-16	drm/amd/pm: restore SCLK settings after S0ix resume	mythilam	2	-5/+37
	User-configured SCLK(GPU core clock)frequencies were not persisting across S0ix suspend/resume cycles on smu v14 hardware. The issue occurred because of the code resetting clock frequency to zero during resume. This patch addresses the problem by: - Preserving user-configured values in driver and sets the clock frequency across resume - Preserved settings are sent to the hardware during resume Signed-off-by: mythilam <mythilam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 20ba98326f4c69e6bf8d1f42942ece485a675b27)
2025-12-16	drm/amdgpu: fix a job->pasid access race in gpu recovery	Alex Deucher	1	-2/+8
	Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue. The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Example KASAN trace: [ 493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323 [ 493.074892] [ 493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G E 6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary) [ 493.076493] Tainted: [E]=UNSIGNED_MODULE [ 493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] [ 493.076512] Call Trace: [ 493.076515] <TASK> [ 493.076518] dump_stack_lvl+0x64/0x80 [ 493.076529] print_report+0xce/0x630 [ 493.076536] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 493.076541] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 493.076545] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077253] kasan_report+0xb8/0xf0 [ 493.077258] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077965] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.078672] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu] [ 493.079378] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu] [ 493.080111] amdgpu_job_timedout+0x642/0x1400 [amdgpu] [ 493.080903] ? pick_task_fair+0x24e/0x330 [ 493.080910] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu] [ 493.081702] ? _raw_spin_lock+0x75/0xc0 [ 493.081708] ? __pfx__raw_spin_lock+0x10/0x10 [ 493.081712] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched] [ 493.081721] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081725] process_one_work+0x679/0xff0 [ 493.081732] worker_thread+0x6ce/0xfd0 [ 493.081736] ? __pfx_worker_thread+0x10/0x10 [ 493.081739] kthread+0x376/0x730 [ 493.081744] ? __pfx_kthread+0x10/0x10 [ 493.081748] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081751] ? __pfx_kthread+0x10/0x10 [ 493.081755] ret_from_fork+0x247/0x330 [ 493.081761] ? __pfx_kthread+0x10/0x10 [ 493.081764] ret_from_fork_asm+0x1a/0x30 [ 493.081771] </TASK> Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Link: https://github.com/HansKristian-Work/vkd3d-proton/pull/2670 Cc: SRINIVASAN.SHANMUGAM@amd.com Cc: vitaly.prosyak@amd.com Cc: christian.koenig@amd.com Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 20880a3fd5dd7bca1a079534cf6596bda92e107d)
2025-12-16	drm/amd/display: Fix DP no audio issue	Charlene Liu	1	-4/+4
	[why] need to enable APG_CLOCK_ENABLE enable first also need to wake up az from D3 before access az block Reviewed-by: Swapnil Patel <swapnil.patel@amd.com> Signed-off-by: Charlene Liu <Charlene.Liu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit bf5e396957acafd46003318965500914d5f4edfa)
2025-12-16	drm/amd/display: Fix scratch registers offsets for DCN351	Ray Wu	1	-4/+4
	[Why] Different platforms use different NBIO header files, causing display code to use differnt offset and read wrong accelerated status. [How] - Unified NBIO offset header file across platform. - Correct scratch registers offsets to proper locations. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667 Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 576e032e909c8a6bb3d907b4ef5f6abe0f644199) Cc: stable@vger.kernel.org
2025-12-16	drm/amd/display: Fix scratch registers offsets for DCN35	Ray Wu	1	-4/+4
	[Why] Different platforms use differnet NBIO header files, causing display code to use differnt offset and read wrong accelerated status. [How] - Unified NBIO offset header file across platform. - Correct scratch registers offsets to proper locations. Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4667 Cc: Mario Limonciello <mario.limonciello@amd.com> Cc: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Ray Wu <ray.wu@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 49a63bc8eda0304ba307f5ba68305f936174f72d) Cc: stable@vger.kernel.org
2025-12-16	drm/amd: Resume the device in thaw() callback when console suspend is disabled	Mario Limonciello (AMD)	1	-1/+4
	If console suspend has been disabled using `no_console_suspend` also wake up during thaw() so that some messages can be seen for debugging. Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4191 Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com> (cherry picked from commit 63387cbbb714d9f0d179d9d4560de1408d0906de)
2025-12-16	drm/radeon: Convert legacy DRM logging in evergreen.c to drm_* helpers	Abhishek Rajput	1	-50/+62
	Replace DRM_DEBUG(), DRM_ERROR(), and DRM_INFO() calls with the corresponding drm_dbg(), drm_err(), and drm_info() helpers in the radeon driver. The drm_() logging helpers take a struct drm_device argument, allowing the DRM core to prefix log messages with the correct device name and instance. This is required to correctly distinguish log messages on systems with multiple GPUs. This change aligns radeon with the DRM TODO item: "Convert logging to drm_* functions with drm_device parameter". Signed-off-by: Abhishek Rajput <abhiraj21put@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Add gfx v12_1 interrupt source header	Hawking Zhang	3	-6/+142
	To acommandate specific interrupt source for gfx v12_1 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdkfd: Override KFD SVM mappings for GFX 12.1	Mukul Joshi	1	-1/+3
	Override the local MTYPE mappings in KFD SVM code with mtype_local modprobe param for GFX 12.1. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Alex Sierra <alex.sierra@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: correct rlc autoload for xcc harvest	Likun Gao	1	-1/+2
	If the number instances of firmware is RLC_NUM_INS_CODE0(Only 1 inst), need to copy it directly for rlcautolad. For the firmware which instances number bigger than 1, only copy for enabled XCC to save copy time. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: add gfx sysfs support for gfx_v12_1	Likun Gao	1	-0/+4
	Add gfx sysfs support for gfx_v12_1. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu/mes_v12_1: fix mes access xcd register	Jack Xiao	1	-0/+20
	Fix to use local register offset inside die for mes fw accessing local/remote xcd register. Signed-off-by: Jack Xiao <Jack.Xiao@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: normalize reg addr as local xcc for gfx v12_1	Likun Gao	1	-0/+30
	Normalize registers address to local xcc address for gfx v12_1. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: support xcc harvest for ih translate	Likun Gao	1	-4/+6
	Support xcc harvest for ih translate to logic xcc. V2: Only check available instances Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Correct inst_id input from physical to logic	Likun Gao	1	-25/+25
	Correct inst_id input from physical to logic for sdma v7_1. V2: Show real instance number on logic xcc. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: use physical xcc id to get rrmt	Likun Gao	1	-10/+16
	Use physical xcc_id to get rrmt on misc_op for mes v12_1. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/radeon: Convert logging in radeon_display.c to drm_* helpers	Mukesh Ogare	1	-27/+39
	Replace DRM_ERROR() and DRM_INFO() calls in drivers/gpu/drm/radeon/radeon_display.c with the corresponding drm_err() and drm_info() helpers. The drm_() logging functions take a struct drm_device argument, allowing the DRM core to prefix log messages with the correct device name and instance. This is required to correctly distinguish log messages on systems with multiple GPUs. This change aligns radeon with the DRM TODO item: "Convert logging to drm_* functions with drm_device parameter". Acked-by: Christian König <christian.koenig@amd.com> Signed-off-by: Mukesh Ogare <mukeshogare871@gmail.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdkfd: Fix improper NULL termination of queue restore SMI event string	Brian Kocoloski	1	-1/+1
	Pass character "0" rather than NULL terminator to properly format queue restoration SMI events. Currently, the NULL terminator precedes the newline character that is intended to delineate separate events in the SMI event buffer, which can break userspace parsers. Signed-off-by: Brian Kocoloski <brian.kocoloski@amd.com> Reviewed-by: Philip Yang <Philip.Yang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Correct xcc_id input to GET_INST from physical to logic	Likun Gao	3	-37/+24
	Correct xcc_id input to GET_INST from physical to logic for gfx_v12_1. Signed-off-by: Likun Gao <Likun.Gao@amd.com> Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Fix CP_MEC_MDBASE in multi-xcc for gfx v12_1	Michael Chen	1	-94/+98
	Need to allocate memory for MEC FW data and program registers CP_MEC_MDBASE for each XCC respectively. Signed-off-by: Michael Chen <michael.chen@amd.com> Acked-by: Harish Kasiviswanathan <Harish.Kasiviswanathan@amd.com> Reviewed-by: Shaoyun.liu <Shaoyun.liu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Support 57bit fault address for GFX 12.1.0	Philip Yang	1	-1/+1
	The gmc fault virtual address is up to 57bit for 5 level page table, this also works with 48bit virtual address for 4 level page table. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Add pde3 table invalidation request for GFX 12.1.0	Philip Yang	2	-0/+3
	Set pde3 invalidation request bit during tlb flush for up to 5 level page table. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Christian König <christian.koenig@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdkfd: Update LDS, Scratch base for 57bit address	Philip Yang	2	-10/+14
	For 5-level page tables, update compute vmid sh_mem_base LDS aperture and Scratch aperture base address to above 57-bit, use the same setting from gfx vmid, we can remove the duplicate macro. Update queue pdd lds_base and scratch_base to the same value as sh_mem_base setting. Then application get process apertures return the correct value to access LDS and Scratch memory for 57bit address 5-level page tables. This may pass to MES in future when mapping queue. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Enable 5-level page table for GFX 12.1.0	Philip Yang	1	-3/+3
	GFX 12.1.0 support 57bit virtual, 52bit physical address, set PDE max_level to 4, min_vm_size to 128PB to enable GPU vm 5-level page tables to support 57bit virtual address. Signed-off-by: Philip Yang <Philip.Yang@amd.com> Acked-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: init RS64_MEC_P2/P3_STACK for gfx12.1	Feifei Xu	1	-0/+2
	Add GFX12.1 MEC P2/P3 STACK firmware init. Signed-off-by: Feifei Xu <Feifei.Xu@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Fix CU info calculations for GFX 12.1	Mukul Joshi	1	-51/+27
	This patch fixes the CU info calculations for gfx 12.1. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdkfd: Update CWSR area calculations for GFX 12.1	Mukul Joshi	1	-9/+54
	Update the SGPR, VGPR, HWREG size and number of waves supported for GFX 12.1 CWSR memory limits. The CU calculation changed in topology, as a result, the values need to be updated. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Feifei Xu <Feifei.Xu@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Add soc v1_0 ih client id table	Hawking Zhang	6	-8/+96
	To acommandate the specific ih client for soc v1_0 Signed-off-by: Hawking Zhang <Hawking.Zhang@amd.com> Reviewed-by: Likun Gao <Likun.Gao@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Flush TLB on all XCCs on GFX 12.1	Mukul Joshi	1	-1/+1
	Currently, the driver code is flushing TLB on XCC 0 only. Fix it by flushing on all XCCs within the partition. Signed-off-by: Mukul Joshi <mukul.joshi@amd.com> Reviewed-by: Felix Kuehling <felix.kuehling@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amd/pm: restore SCLK settings after S0ix resume	mythilam	2	-5/+37
	User-configured SCLK(GPU core clock)frequencies were not persisting across S0ix suspend/resume cycles on smu v14 hardware. The issue occurred because of the code resetting clock frequency to zero during resume. This patch addresses the problem by: - Preserving user-configured values in driver and sets the clock frequency across resume - Preserved settings are sent to the hardware during resume Signed-off-by: mythilam <mythilam@amd.com> Acked-by: Alex Deucher <alexander.deucher@amd.com> Reviewed-by: Yang Wang <kevinyang.wang@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: do not use amdgpu_bo_gpu_offset_no_check individually	Saleemkhan Jamadar	3	-2/+30
	This should not be used indiviually, use amdgpu_bo_gpu_offset with bo reserved. v3 - unpin bo in queue destroy (Christian) v2 - pin bo so that offset returned won't change after unlock (Christian) Signed-off-by: Saleemkhan Jamadar <saleemkhan083@gmail.com> Suggested-by: Christian König <christian.koenig@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Change set ip clock/power gating param	Lijo Lazar	2	-8/+6
	It's not required to use generic void , change to struct amdgpu_device . Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Use helper to get ip block	Lijo Lazar	1	-25/+18
	Replace individual searches with the utility function get_ip_block Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: Move ip block related functions	Lijo Lazar	4	-443/+450
	Move ip block related functions to amdgpu_ip.c. No functional change intended. Signed-off-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Alex Deucher <alexander.deucher@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amdgpu: fix a job->pasid access race in gpu recovery	Alex Deucher	1	-2/+8
	Avoid a possible UAF in GPU recovery due to a race between the sched timeout callback and the tdr work queue. The gpu recovery function calls drm_sched_stop() and later drm_sched_start(). drm_sched_start() restarts the tdr queue which will eventually free the job. If the tdr queue frees the job before time out callback completes, the job will be freed and we'll get a UAF when accessing the pasid. Cache it early to avoid the UAF. Example KASAN trace: [ 493.058141] BUG: KASAN: slab-use-after-free in amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.067530] Read of size 4 at addr ffff88b0ce3f794c by task kworker/u128:1/323 [ 493.074892] [ 493.076485] CPU: 9 UID: 0 PID: 323 Comm: kworker/u128:1 Tainted: G E 6.16.0-1289896.2.zuul.bf4f11df81c1410bbe901c4373305a31 #1 PREEMPT(voluntary) [ 493.076493] Tainted: [E]=UNSIGNED_MODULE [ 493.076495] Hardware name: TYAN B8021G88V2HR-2T/S8021GM2NR-2T, BIOS V1.03.B10 04/01/2019 [ 493.076500] Workqueue: amdgpu-reset-dev drm_sched_job_timedout [gpu_sched] [ 493.076512] Call Trace: [ 493.076515] <TASK> [ 493.076518] dump_stack_lvl+0x64/0x80 [ 493.076529] print_report+0xce/0x630 [ 493.076536] ? _raw_spin_lock_irqsave+0x86/0xd0 [ 493.076541] ? __pfx__raw_spin_lock_irqsave+0x10/0x10 [ 493.076545] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077253] kasan_report+0xb8/0xf0 [ 493.077258] ? amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.077965] amdgpu_device_gpu_recover+0x968/0x990 [amdgpu] [ 493.078672] ? __pfx_amdgpu_device_gpu_recover+0x10/0x10 [amdgpu] [ 493.079378] ? amdgpu_coredump+0x1fd/0x4c0 [amdgpu] [ 493.080111] amdgpu_job_timedout+0x642/0x1400 [amdgpu] [ 493.080903] ? pick_task_fair+0x24e/0x330 [ 493.080910] ? __pfx_amdgpu_job_timedout+0x10/0x10 [amdgpu] [ 493.081702] ? _raw_spin_lock+0x75/0xc0 [ 493.081708] ? __pfx__raw_spin_lock+0x10/0x10 [ 493.081712] drm_sched_job_timedout+0x1b0/0x4b0 [gpu_sched] [ 493.081721] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081725] process_one_work+0x679/0xff0 [ 493.081732] worker_thread+0x6ce/0xfd0 [ 493.081736] ? __pfx_worker_thread+0x10/0x10 [ 493.081739] kthread+0x376/0x730 [ 493.081744] ? __pfx_kthread+0x10/0x10 [ 493.081748] ? __pfx__raw_spin_lock_irq+0x10/0x10 [ 493.081751] ? __pfx_kthread+0x10/0x10 [ 493.081755] ret_from_fork+0x247/0x330 [ 493.081761] ? __pfx_kthread+0x10/0x10 [ 493.081764] ret_from_fork_asm+0x1a/0x30 [ 493.081771] </TASK> Fixes: a72002cb181f ("drm/amdgpu: Make use of drm_wedge_task_info") Link: https://github.com/HansKristian-Work/vkd3d-proton/pull/2670 Cc: SRINIVASAN.SHANMUGAM@amd.com Cc: vitaly.prosyak@amd.com Cc: christian.koenig@amd.com Suggested-by: Matthew Brost <matthew.brost@intel.com> Reviewed-by: Srinivasan Shanmugam <srinivasan.shanmugam@amd.com> Reviewed-by: Lijo Lazar <lijo.lazar@amd.com> Reviewed-by: Christian König <christian.koenig@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
2025-12-16	drm/amd/display: Promote DC to 3.2.363	Taimur Hassan	1	-1/+1
	This version brings along the following updates: - Replay Video Conferencing V2 - Fix scratch registers offsets for DCN35 and DCN351 - Fix DP no audio issue - Add use_max_lsw parameter - Fix presentation of Z8 efficiency - Add USB-C DP Alt Mode lane limitation in DCN32 - Support DRR granularity - Don't disable DPCD mst_en if sink connected - Set enable_legacy_fast_update to false for DCN35/351 - Split update_planes_and_stream_v3 into parts (V2) Signed-off-by: Taimur Hassan <Syed.Hassan@amd.com> Signed-off-by: Chenyu Chen <chen-yu.chen@amd.com> Acked-by: Wayne Lin <Wayne.Lin@amd.com> Tested-by: Daniel Wheeler <daniel.wheeler@amd.com> Signed-off-by: Alex Deucher <alexander.deucher@amd.com>