| Age | Commit message (Collapse) | Author | Files | Lines |
|
This reverts commit 7294863a6f01248d72b61d38478978d638641bee.
This commit was erroneously applied again after commit 0ab5d711ec74
("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
removed it, leading to very hard to debug crashes, when used with a system with two
AMD GPUs of which only one supports ASPM.
Link: https://lore.kernel.org/linux-acpi/20251006120944.7880-1-spasswolf@web.de/
Link: https://github.com/acpica/acpica/issues/1060
Fixes: 0ab5d711ec74 ("drm/amd: Refactor `amdgpu_aspm` to be evaluated per device")
Signed-off-by: Bert Karwatzki <spasswolf@web.de>
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 97a9689300eb2b393ba5efc17c8e5db835917080)
Cc: stable@vger.kernel.org
|
|
[Why&How]
Right now, the HDMI HPD filter is enabled by default at 1500ms.
We want to disable it by default, as most modern displays with HDMI do
not require it for DPMS mode.
The HPD can instead be enabled as a driver parameter with a custom delay
value in ms (up to 5000ms).
Fixes: c918e75e1ed9 ("drm/amd/display: Add an HPD filter for HDMI")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4859
Signed-off-by: Ivan Lipski <ivan.lipski@amd.com>
Reviewed-by: Mario Limonciello (AMD) <superm1@kernel.org>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 6a681cd9034587fe3550868bacfbd639d1c6891f)
|
|
If console suspend has been disabled using `no_console_suspend` also
wake up during thaw() so that some messages can be seen for debugging.
Closes: https://gitlab.freedesktop.org/drm/amd/-/work_items/4191
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
(cherry picked from commit 63387cbbb714d9f0d179d9d4560de1408d0906de)
|
|
Now that the DC analog connector support and VCE1 support landed,
amdgpu is at feature parity with the old radeon driver
on SI dGPUs.
Enabling the amdgpu driver by default for SI dGPUs has the
following benefits:
- More stable OpenGL support through RadeonSI
- Vulkan support through RADV
- Improved performance
- Better display features through DC
Users who want to keep using the old driver can do so using:
amdgpu.si_support=0 radeon.si_support=1
v2:
- Update documentation in Kconfig file
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The amdgpu driver has been working well on CIK dGPUs for years.
Now that the DC analog connector support landed, these GPUs
are at feature parity with the old radeon driver.
Additionally, amdgpu yields extra performance, supports Vulkan
and provides more display features through DC as well as more
robust power management.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Move the determination into a separate function.
Change amdgpu.si_support and amdgpu.cik_support so that their
default value is -1 (default).
This prepares the code for changing the default driver based
on the chip.
Also adjust the module param documentation.
Signed-off-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The comment already explains it but the module parameter help text
doesn't.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4684
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For a mode-1 reset done at the end of S3 on PSPv11 dGPUs, only check if
TOS is unloaded.
Fixes: 32f73741d6ee ("drm/amdgpu: Wait for bootloader after PSPv11 reset")
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4649
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
pm_runtime_put_autosuspend(), pm_runtime_put_sync_autosuspend(),
pm_runtime_autosuspend() and pm_request_autosuspend() now include a call
to pm_runtime_mark_last_busy(). Remove the now-redundant explicit call to
pm_runtime_mark_last_busy().
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
This commit refactors the AMDGPU userqueue management subsystem to replace
IDR (ID Allocation) with XArray for improved performance, scalability, and
maintainability. The changes address several issues with the previous IDR
implementation and provide better locking semantics.
Key changes:
1. **Global XArray Introduction**:
- Added `userq_doorbell_xa` to `struct amdgpu_device` for global queue tracking
- Uses doorbell_index as key for efficient global lookup
- Replaces the previous `userq_mgr_list` linked list approach
2. **Per-process XArray Conversion**:
- Replaced `userq_idr` with `userq_mgr_xa` in `struct amdgpu_userq_mgr`
- Maintains per-process queue tracking with queue_id as key
- Uses XA_FLAGS_ALLOC for automatic ID allocation
3. **Locking Improvements**:
- Removed global `userq_mutex` from `struct amdgpu_device`
- Replaced with fine-grained XArray locking using XArray's internal spinlocks
4. **Runtime Idle Check Optimization**:
- Updated `amdgpu_runtime_idle_check_userq()` to use xa_empty
5. **Queue Management Functions**:
- Converted all IDR operations to equivalent XArray functions:
- `idr_alloc()` → `xa_alloc()`
- `idr_find()` → `xa_load()`
- `idr_remove()` → `xa_erase()`
- `idr_for_each()` → `xa_for_each()`
Benefits:
- **Performance**: XArray provides better scalability for large numbers of queues
- **Memory Efficiency**: Reduced memory overhead compared to IDR
- **Thread Safety**: Improved locking semantics with XArray's internal spinlocks
v2: rename userq_global_xa/userq_xa to userq_doorbell_xa/userq_mgr_xa
Remove xa_lock and use its own lock.
v3: Set queue->userq_mgr = uq_mgr in amdgpu_userq_create()
v4: use xa_store_irq (Christian)
hold the read side of the reset lock while creating/destroying queues and the manager data structure. (Chritian)
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Suggested-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
The shutdown() callback uses amdgpu_ip_suspend() which doesn't notify
drm clients during shutdown. This could lead to hangs.
[How]
Change amdgpu_pci_shutdown() to call the same sequence as suspend/resume.
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There has been multiple complains that 10 seconds are usually to long.
The original requirement for longer timeout came from compute tests on
AMDVLK, since that is no longer a topic reduce the timeout back to 2
seconds for all queues.
While at it also remove any special handling for compute queues under
SRIOV or pass through.
v2: fix checkpatch warning.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The Constant Engine found on gfx6-gfx10 HW has been a notorious source of
problems.
RADV never used it in the first place, radeonsi only used it for a few
releases around 2017 for gfx6-gfx9 before dropping support for it as
well.
While investigating another problem I just recently found that submitting
to the CE seems to be completely broken on gfx9 for quite a while.
Since nobody complained about that problem it most likely means that
nobody is using any of the affected radeonsi versions on current Linux
kernels any more.
So to potentially phase out the support for the CE and eliminate another
source of problems block submitting CE IBs unless it is enabled again
using a debug flag.
Signed-off-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Acked-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pull drm updates from Dave Airlie:
"cross-subsystem:
- i2c-hid: Make elan touch controllers power on after panel is
enabled
- dt bindings for STM32MP25 SoC
- pci vgaarb: use screen_info helpers
- rust pin-init updates
- add MEI driver for late binding firmware update/load
uapi:
- add ioctl for reassigning GEM handles
- provide boot_display attribute on boot-up devices
core:
- document DRM_MODE_PAGE_FLIP_EVENT
- add vendor specific recovery method to drm device wedged uevent
gem:
- Simplify gpuvm locking
ttm:
- add interface to populate buffers
sched:
- Fix race condition in trace code
atomic:
- Reallow no-op async page flips
display:
- dp: Fix command length
video:
- Improve pixel-format handling for struct screen_info
rust:
- drop Opaque<> from ioctl args
- Alloc:
- BorrowedPage type and AsPageIter traits
- Implement Vmalloc::to_page() and VmallocPageIter
- DMA/Scatterlist:
- Add dma::DataDirection and type alias for dma_addr_t
- Abstraction for struct scatterlist and sg_table
- DRM:
- simplify use of generics
- add DriverFile type alias
- drop Object::SIZE
- Rust:
- pin-init tree merge
- Various methods for AsBytes and FromBytes traits
gpuvm:
- Support madvice in Xe driver
gpusvm:
- fix hmm_pfn_to_map_order usage in gpusvm
bridge:
- Improve and fix ref counting on bridge management
- cdns-dsi: Various improvements to mode setting
- Support Solomon SSD2825 plus DT bindings
- Support Waveshare DSI2DPI plus DT bindings
- Support Content Protection property
- display-connector: Improve DP display detection
- Add support for Radxa Ra620 plus DT bindings
- adv7511: Provide SPD and HDMI infoframes
- it6505: Replace crypto_shash with sha()
- synopsys: Add support for DW DPTX Controller plus DT bindings
- adv7511: Write full Audio infoframe
- ite6263: Support vendor-specific infoframes
- simple: Add support for Realtek RTD2171 DP-to-HDMI plus DT bindings
panel:
- panel-edp: Support mt8189 Chromebooks; Support BOE NV140WUM-N64;
Support SHP LQ134Z1; Fixes
- panel-simple: Support Olimex LCD-OLinuXino-5CTS plus DT bindings
- Support Samsung AMS561RA01
- Support Hydis HV101HD1 plus DT bindings
- ilitek-ili9881c: Refactor mode setting; Add support for Bestar
BSD1218-A101KL68 LCD plus DT bindings
- lvds: Add support for Ampire AMP19201200B5TZQW-T03 to DT bindings
- edp: Add support for additonal mt8189 Chromebook panels
- lvds: Add DT bindings for EDT ETML0700Z8DHA
amdgpu:
- add CRIU support for gem objects
- RAS updates
- VCN SRAM load fixes
- EDID read fixes
- eDP ALPM support
- Documentation updates
- Rework PTE flag generation
- DCE6 fixes
- VCN devcoredump cleanup
- MMHUB client id fixes
- VCN 5.0.1 RAS support
- SMU 13.0.x updates
- Expanded PCIe DPC support
- Expanded VCN reset support
- VPE per queue reset support
- give kernel jobs unique id for tracing
- pre-populate exported buffers
- cyan skillfish updates
- make vbios build number available in sysfs
- userq updates
- HDCP updates
- support MMIO remap page as ttm pool
- JPEG parser updates
- DCE6 DC updates
- use devm for i2c buses
- GPUVM locking updates
- Drop non-DC DCE11 code
- improve fallback handling for pixel encoding
amdkfd:
- SVM/page migration fixes
- debugfs fixes
- add CRIO support for gem objects
- SVM updates
radeon:
- use dev_warn_once in CS parsers
xe:
- add madvise interface
- add DRM_IOCTL_XE_VM_QUERY_MEMORY_RANGE_ATTRS to query VMA count
and memory attributes
- drop L# bank mask reporting from media GT3 on Xe3+.
- add SLPC power_profile sysfs interface
- add configs attribs to add post/mid context-switch commands
- handle firmware reported hardware errors notifying userspace with
device wedged uevent
- use same dir structure across sysfs/debugfs
- cleanup and future proof vram region init
- add G-states and PCI link states to debugfs
- Add SRIOV support for CCS surfaces on Xe2+
- Enable SRIOV PF mode by default on supported platforms
- move flush to common code
- extended core workarounds for Xe2/3
- use DRM scheduler for delayed GT TLB invalidations
- configs improvements and allow VF device enablement
- prep work to expose mmio regions to userspace
- VF migration support added
- prepare GPU SVM for THP migration
- start fixing XE_PAGE_SIZE vs PAGE_SIZE
- add PSMI support for hw validation
- resize VF bars to max possible size according to number of VFs
- Ensure GT is in C0 during resume
- pre-populate exported buffers
- replace xe_hmm with gpusvm
- add more SVM GT stats to debugfs
- improve fake pci and WA kunnit handle for new platform testing
- Test GuC to GuC comms to add debugging
- use attribute groups to simplify sysfs registration
- add Late Binding firmware code to interact with MEI
i915:
- apply multiple JSL/EHL/Gen7/Gen6 workarounds properly
- protect against overflow in active_engine()
- Use try_cmpxchg64() in __active_lookup()
- include GuC registers in error state
- get rid of dev->struct_mutex
- iopoll: generalize read_poll_timout
- lots more display refactoring
- Reject HBR3 in any eDP Panel
- Prune modes for YUV420
- Display Wa fix, additions, and updates
- DP: Fix 2.7 Gbps link training on g4x
- DP: Adjust the idle pattern handling
- DP: Shuffle the link training code a bit
- Don't set/read the DSI C clock divider on GLK
- Enable_psr kernel parameter changes
- Type-C enabled/disconnected dp-alt sink
- Wildcat Lake enabling
- DP HDR updates
- DRAM detection
- wait PSR idle on dsb commit
- Remove FBC modulo 4 restriction for ADL-P+
- panic: refactor framebuffer allocation
habanalabs:
- debug/visibility improvements
- vmalloc-backed coherent mmap support
- HLDIO infrastructure
nova-core:
- various register!() macro improvements
- minor vbios/firmware fixes/refactoring
- advance firmware boot stages; process Booter and patch signatures
- process GSP and GSP bootloader
- Add r570.144 firmware bindings and update to it
- Move GSP boot code to own module
- Use new pin-init features to store driver's private data in a
single allocation
- Update ARef import from sync::aref
nova-drm:
- Update ARef import from sync::aref
tyr:
- initial driver skeleton for a rust driver for ARM Mali GPUs
- capable of powering up, query metadata and provide it to userspace.
msm:
- GPU and Core:
- in DT bindings describe clocks per GPU type
- GMU bandwidth voting for x1-85
- a623/a663 speedbins
- cleanup some remaining no-iommu leftovers after VM_BIND conversion
- fix GEM obj 32b size truncation
- add missing VM_BIND param validation
- IFPC for x1-85 and a750
- register xml and gen_header.py sync from mesa
- Display:
- add missing bindings for display on SC8180X
- added DisplayPort MST bindings
- conversion from round_rate() to determine_rate()
amdxdna:
- add IOCTL_AMDXDNA_GET_ARRAY
- support user space allocated buffers
- streamline PM interfaces
- Refactoring wrt. hardware contexts
- improve error reporting
nouveau:
- use GSP firmware by default
- improve error reporting
- Pre-populate exported buffers
ast:
- Clean up detection of DRAM config
exynos:
- add DSIM bridge driver support for Exynos7870
- Document Exynos7870 DSIM compatible in dt-binding
panthor:
- Print task/pid on errors
- Add support for Mali G710, G510, G310, Gx15, Gx20, Gx25
- Improve cache flushing
- Fail VM bind if BO has offset
renesas:
- convert to RUNTIME_PM_OPS
rcar-du:
- Make number of lanes configurable
- Use RUNTIME_PM_OPS
- Add support for DSI commands
rocket:
- Add driver for Rockchip NPU plus DT bindings
- Use kfree() and sizeof() correctly
- Test DMA status
rockchip:
- dsi2: Add support for RK3576 plus DT bindings
- Add support for RK3588 DPTX output
tidss:
- Use crtc_ fields for programming display mode
- Remove other drivers from aperture
pixpaper:
- Add support for Mayqueen Pixpaper plus DT bindings
v3d:
- Support querying nubmer of GPU resets for KHR_robustness
stm:
- Clean up logging
- ltdc: Add support support for STM32MP257F-EV1 plus DT bindings
sitronix:
- st7571-i2c: Add support for inverted displays and 2-bit grayscale
tidss:
- Convert to kernel's FIELD_ macros
vesadrm:
- Support 8-bit palette mode
imagination:
- Improve power management
- Add support for TH1520 GPU
- Support Risc-V architectures
v3d:
- Improve job management and locking
vkms:
- Support variants of ARGB8888, ARGB16161616, RGB565, RGB888 and P01x
- Spport YUV with 16-bit components"
* tag 'drm-next-2025-10-01' of https://gitlab.freedesktop.org/drm/kernel: (1455 commits)
drm/amd: Add name to modes from amdgpu_connector_add_common_modes()
drm/amd: Drop some common modes from amdgpu_connector_add_common_modes()
drm/amdgpu: update MODULE_PARM_DESC for freesync_video
drm/amd: Use dynamic array size declaration for amdgpu_connector_add_common_modes()
drm/amd/display: Share dce100_validate_global with DCE6-8
drm/amd/display: Share dce100_validate_bandwidth with DCE6-8
drm/amdgpu: Fix fence signaling race condition in userqueue
amd/amdkfd: enhance kfd process check in switch partition
amd/amdkfd: resolve a race in amdgpu_amdkfd_device_fini_sw
drm/amd/display: Reject modes with too high pixel clock on DCE6-10
drm/amd: Drop unnecessary check in amdgpu_connector_add_common_modes()
drm/amd/display: Only enable common modes for eDP and LVDS
drm/amdgpu: remove the redeclaration of variable i
drm/amdgpu/userq: assign an error code for invalid userq va
drm/amdgpu: revert "rework reserved VMID handling" v2
drm/amdgpu: remove leftover from enforcing isolation by VMID
drm/amdgpu: Add fallback to pipe reset if KCQ ring reset fails
accel/habanalabs: add Infineon version check
accel/habanalabs/gaudi2: read preboot status after recovering from dirty state
accel/habanalabs: add HL_GET_P_STATE passthrough type
...
|
|
To better describe what it does.
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/3756
Reviewed-by: Timur Kristóf <timur.kristof@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
[Why]
commit 530694f54dd5e ("drm/amdgpu: do not resume device in thaw for
normal hibernation") optimized the flow for systems that are going
into S4 where the power would be turned off. Basically the thaw()
callback wouldn't resume the device if the hibernation image was
successfully created since the system would be powered off.
This however isn't the correct flow for a system entering into
s0i3 after the hibernation image is created. Some of the amdgpu
callbacks have different behavior depending upon the intended
state of the suspend.
[How]
Use pm_hibernation_mode_is_suspend() as an input to decide whether
to run resume during thaw() callback.
Reported-by: Ionut Nechita <ionut_n2001@yahoo.com>
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4573
Tested-by: Ionut Nechita <ionut_n2001@yahoo.com>
Fixes: 530694f54dd5e ("drm/amdgpu: do not resume device in thaw for normal hibernation")
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Tested-by: Kenneth Crudup <kenny@panix.com>
Signed-off-by: Mario Limonciello (AMD) <superm1@kernel.org>
Cc: 6.17+ <stable@vger.kernel.org> # 6.17+: 495c8d35035e: PM: hibernate: Add pm_hibernation_mode_is_suspend()
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
|
|
Add additional PCI IDs to the cyan skillfish family.
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add new ioctl DRM_IOCTL_AMDGPU_GEM_LIST_HANDLES.
This ioctl returns a list of bos with their handles, sizes,
and flags and domains.
This ioctl is meant to be used during CRIU checkpoint and
provide information needed to reconstruct the bos
in CRIU restore.
Userspace for this and the next change can be found at
https://github.com/checkpoint-restore/criu/pull/2613
Signed-off-by: David Francis <David.Francis@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
User queues are disabled before GEM objects are released
(protecting against user app crashes).
No races with PCI hot-unplug (because drm_dev_enter prevents cleanup
if iewdevice is being removed).
Cc: Christian König <christian.koenig@amd.com>
Cc: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
When power management is not enabled in the kernel build, the newly
added hibernation changes cause a link failure:
arm-linux-gnueabi-ld: drivers/gpu/drm/amd/amdgpu/amdgpu_drv.o: in function `amdgpu_pmops_thaw':
amdgpu_drv.c:(.text+0x1514): undefined reference to `pm_hibernate_is_recovering'
Make the power management code in this driver conditional on
CONFIG_PM and CONFIG_PM_SLEEP
Fixes: 530694f54dd5 ("drm/amdgpu: do not resume device in thaw for normal hibernation")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Link: https://lore.kernel.org/r/20250714081635.4071570-1-arnd@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The variable `pm_suspend_target_state` is conditionally defined only when
`CONFIG_SUSPEND` is enabled (see `include/linux/suspend.h`). Directly
referencing it without guarding by `#ifdef CONFIG_SUSPEND` causes build
failures when suspend functionality is disabled (e.g., `CONFIG_SUSPEND=n`).
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Perry Yuan <perry.yuan@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fix dcdebugmask description.
Signed-off-by: Yann Dirson <ydirson@free.fr>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
https://gitlab.freedesktop.org/agd5f/linux into drm-next
amd-drm-next-6.17-2025-07-17:
amdgpu:
- Partition fixes
- Reset fixes
- RAS fixes
- i2c fix
- MPC updates
- DSC cleanup
- EDID fixes
- Display idle D3 update
- IPS updates
- DMUB updates
- Retimer fix
- Replay fixes
- Fix DC memory leak
- Initial support for smartmux
- DCN 4.0.1 degamma LUT fix
- Per queue reset cleanups
- Track ring state associated with a fence
- SR-IOV fixes
- SMU fixes
- Per queue reset improvements for GC 9+ compute
- Per queue reset improvements for GC 10+ gfx
- Per queue reset improvements for SDMA 5+
- Per queue reset improvements for JPEG 2+
- Per queue reset improvements for VCN 2+
- GC 8 fix
- ISP updates
amdkfd:
- Enable KFD on LoongArch
radeon:
- Drop console lock during suspend/resume
UAPI:
- Add userq slot info to INFO IOCTL
Used for IGT userq validation tests (https://lists.freedesktop.org/archives/igt-dev/2025-July/093228.html)
From: Alex Deucher <alexander.deucher@amd.com>
Link: https://lore.kernel.org/r/20250717213827.2061581-1-alexander.deucher@amd.com
Signed-off-by: Dave Airlie <airlied@redhat.com>
|
|
https://gitlab.freedesktop.org/drm/misc/kernel into drm-next
drm-misc-next for 6.17:
UAPI Changes:
Cross-subsystem Changes:
Core Changes:
- mode_config: Change fb_create prototype to pass the drm_format_info
and avoid redundant lookups in drivers
- sched: kunit improvements, memory leak fixes, reset handling
improvements
- tests: kunit EDID update
Driver Changes:
- amdgpu: Hibernation fixes, structure lifetime fixes
- nouveau: sched improvements
- sitronix: Add Sitronix ST7567 Support
- bridge:
- Make connector available to bridge detect hook
- panel:
- More refcounting changes
- New panels: BOE NE14QDM
Signed-off-by: Dave Airlie <airlied@redhat.com>
From: Maxime Ripard <mripard@redhat.com>
Link: https://lore.kernel.org/r/20250717-efficient-kudu-of-fantasy-ff95e0@houat
|
|
For kernel compute queues, align the timeout with
other kernel queues (10 sec). This had previously
been set higher for OpenCL when it used kernel
queues, but now OpenCL uses KFD user queues which
don't have a timeout limitation. This also aligns
with SR-IOV which already used a shorter timeout.
Additionally the longer timeout negatively impacts
the user experience with kernel queues for interactive
applications.
Reviewed-by: Kent Russell <kent.russell@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
add eeprom data checksum check before driver unload. reset eeprom
and save correct data to eeprom when check failed
Signed-off-by: ganglxie <ganglxie@amd.com>
Reviewed-by: Tao Zhou <tao.zhou1@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
For normal hibernation, GPU do not need to be resumed in thaw since it is
not involved in writing the hibernation image. Skip resume in this case
can reduce the hibernation time.
On VM with 8 * 192GB VRAM dGPUs, 98% VRAM usage and 1.7TB system memory,
this can save 50 minutes.
Signed-off-by: Samuel Zhang <guoqing.zhang@amd.com>
Tested-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Lijo Lazar <lijo.lazar@amd.com>
Link: https://lore.kernel.org/r/20250710062313.3226149-6-guoqing.zhang@amd.com
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
|
|
This reverts commit 5fb90421fa0fbe0a968274912101fe917bf1c47b.
The original patch moved `amdgpu_userq_mgr_fini()` to the driver's
`postclose` callback, which is called after `drm_gem_release()` in
the DRM file cleanup sequence.If a user application crashes or aborts
without cleaning up its user queues, 'drm_gem_release()` may free
GEM objects that are still referenced by active user queues, leading
to use-after-free. By reverting, we ensure that user queues are
disabled and cleaned up before any GEM objects are released,
preventing this class of bug. However, this reintroduces a race
during PCI hot-unplug, where device removal can race with per-file
cleanup, leading to use-after-free in suspend/unplug paths.
This will be fixed in the next patch.
Fixes: 5fb90421fa0f ("drm/amdgpu: fix slab-use-after-free in amdgpu_userq_mgr_fini+0x70c")
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Acked-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Pass amdgpu device context instead of drm device context to some
amdgpu_device_* functions. DRM device context is not required in those
functions. No functional change.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Fixes error with dev_info_once usage in amdgpu_device_asic_has_dc_support.
Signed-off-by: Lijo Lazar <lijo.lazar@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202506281140.mXfWT3EN-lkp@intel.com/
Fixes: a3e510fd69c3 ("drm/amdgpu: Convert from DRM_* to dev_*")
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
There is a spelling mistake in a pr_info message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The issue was reproduced on NV10 using IGT pci_unplug test.
It is expected that `amdgpu_driver_postclose_kms()` is called prior to `amdgpu_drm_release()`.
However, the bug is that `amdgpu_fpriv` was freed in `amdgpu_driver_postclose_kms()`, and then
later accessed in `amdgpu_drm_release()` via a call to `amdgpu_userq_mgr_fini()`.
As a result, KASAN detected a use-after-free condition, as shown in the log below.
The proposed fix is to move the calls to `amdgpu_eviction_fence_destroy()` and
`amdgpu_userq_mgr_fini()` into `amdgpu_driver_postclose_kms()`, so they are invoked before
`amdgpu_fpriv` is freed.
This also ensures symmetry with the initialization path in `amdgpu_driver_open_kms()`,
where the following components are initialized:
- `amdgpu_userq_mgr_init()`
- `amdgpu_eviction_fence_init()`
- `amdgpu_ctx_mgr_init()`
Correspondingly, in `amdgpu_driver_postclose_kms()` we should clean up using:
- `amdgpu_userq_mgr_fini()`
- `amdgpu_eviction_fence_destroy()`
- `amdgpu_ctx_mgr_fini()`
This change eliminates the use-after-free and improves consistency in resource management between open and close paths.
[ +0.094367] ==================================================================
[ +0.000026] BUG: KASAN: slab-use-after-free in amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
[ +0.000866] Write of size 8 at addr ffff88811c068c60 by task amd_pci_unplug/1737
[ +0.000026] CPU: 3 UID: 0 PID: 1737 Comm: amd_pci_unplug Not tainted 6.14.0+ #2
[ +0.000008] Hardware name: ASUS System Product Name/ROG STRIX B550-F GAMING (WI-FI), BIOS 1401 12/03/2020
[ +0.000004] Call Trace:
[ +0.000004] <TASK>
[ +0.000003] dump_stack_lvl+0x76/0xa0
[ +0.000010] print_report+0xce/0x600
[ +0.000009] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
[ +0.000790] ? srso_return_thunk+0x5/0x5f
[ +0.000007] ? kasan_complete_mode_report_info+0x76/0x200
[ +0.000008] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
[ +0.000684] kasan_report+0xbe/0x110
[ +0.000007] ? amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
[ +0.000601] __asan_report_store8_noabort+0x17/0x30
[ +0.000007] amdgpu_userq_mgr_fini+0x70c/0x730 [amdgpu]
[ +0.000801] ? __pfx_amdgpu_userq_mgr_fini+0x10/0x10 [amdgpu]
[ +0.000819] ? srso_return_thunk+0x5/0x5f
[ +0.000008] amdgpu_drm_release+0xa3/0xe0 [amdgpu]
[ +0.000604] __fput+0x354/0xa90
[ +0.000010] __fput_sync+0x59/0x80
[ +0.000005] __x64_sys_close+0x7d/0xe0
[ +0.000006] x64_sys_call+0x2505/0x26f0
[ +0.000006] do_syscall_64+0x7c/0x170
[ +0.000004] ? kasan_record_aux_stack+0xae/0xd0
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? kmem_cache_free+0x398/0x580
[ +0.000006] ? __fput+0x543/0xa90
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __fput+0x543/0xa90
[ +0.000004] ? __kasan_check_read+0x11/0x20
[ +0.000007] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? __kasan_check_read+0x11/0x20
[ +0.000003] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? fpregs_assert_state_consistent+0x21/0xb0
[ +0.000006] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? syscall_exit_to_user_mode+0x4e/0x240
[ +0.000005] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? do_syscall_64+0x88/0x170
[ +0.000003] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? do_syscall_64+0x88/0x170
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? irqentry_exit+0x43/0x50
[ +0.000004] ? srso_return_thunk+0x5/0x5f
[ +0.000004] ? exc_page_fault+0x7c/0x110
[ +0.000006] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000005] RIP: 0033:0x7ffff7b14f67
[ +0.000005] Code: ff e8 0d 16 02 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 03 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 41 c3 48 83 ec 18 89 7c 24 0c e8 73 ba f7 ff
[ +0.000004] RSP: 002b:00007fffffffe358 EFLAGS: 00000246 ORIG_RAX: 0000000000000003
[ +0.000006] RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffff7b14f67
[ +0.000003] RDX: 0000000000000000 RSI: 00007ffff7f5755a RDI: 0000000000000003
[ +0.000003] RBP: 00007fffffffe380 R08: 0000555555568170 R09: 0000000000000000
[ +0.000003] R10: 0000000000000000 R11: 0000000000000246 R12: 00007fffffffe5c8
[ +0.000003] R13: 00005555555552a9 R14: 0000555555557d48 R15: 00007ffff7ffd040
[ +0.000007] </TASK>
[ +0.000286] Allocated by task 425 on cpu 11 at 29.751192s:
[ +0.000013] kasan_save_stack+0x28/0x60
[ +0.000008] kasan_save_track+0x18/0x70
[ +0.000006] kasan_save_alloc_info+0x38/0x60
[ +0.000006] __kasan_kmalloc+0xc1/0xd0
[ +0.000005] __kmalloc_cache_noprof+0x1bd/0x430
[ +0.000006] amdgpu_driver_open_kms+0x172/0x760 [amdgpu]
[ +0.000521] drm_file_alloc+0x569/0x9a0
[ +0.000008] drm_client_init+0x1b7/0x410
[ +0.000007] drm_fbdev_client_setup+0x174/0x470
[ +0.000007] drm_client_setup+0x8a/0xf0
[ +0.000006] amdgpu_pci_probe+0x50b/0x10d0 [amdgpu]
[ +0.000482] local_pci_probe+0xe7/0x1b0
[ +0.000008] pci_device_probe+0x5bf/0x890
[ +0.000005] really_probe+0x1fd/0x950
[ +0.000007] __driver_probe_device+0x307/0x410
[ +0.000005] driver_probe_device+0x4e/0x150
[ +0.000006] __driver_attach+0x223/0x510
[ +0.000005] bus_for_each_dev+0x102/0x1a0
[ +0.000006] driver_attach+0x3d/0x60
[ +0.000005] bus_add_driver+0x309/0x650
[ +0.000005] driver_register+0x13d/0x490
[ +0.000006] __pci_register_driver+0x1ee/0x2b0
[ +0.000006] xfrm_ealg_get_byidx+0x43/0x50 [xfrm_algo]
[ +0.000008] do_one_initcall+0x9c/0x3e0
[ +0.000007] do_init_module+0x29e/0x7f0
[ +0.000006] load_module+0x5c75/0x7c80
[ +0.000006] init_module_from_file+0x106/0x180
[ +0.000007] idempotent_init_module+0x377/0x740
[ +0.000006] __x64_sys_finit_module+0xd7/0x180
[ +0.000006] x64_sys_call+0x1f0b/0x26f0
[ +0.000006] do_syscall_64+0x7c/0x170
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000013] Freed by task 1737 on cpu 9 at 76.455063s:
[ +0.000010] kasan_save_stack+0x28/0x60
[ +0.000006] kasan_save_track+0x18/0x70
[ +0.000005] kasan_save_free_info+0x3b/0x60
[ +0.000006] __kasan_slab_free+0x54/0x80
[ +0.000005] kfree+0x127/0x470
[ +0.000006] amdgpu_driver_postclose_kms+0x455/0x760 [amdgpu]
[ +0.000485] drm_file_free.part.0+0x5b1/0xba0
[ +0.000007] drm_file_free+0x13/0x30
[ +0.000006] drm_client_release+0x1c4/0x2b0
[ +0.000006] drm_fbdev_ttm_fb_destroy+0xd2/0x120 [drm_ttm_helper]
[ +0.000007] put_fb_info+0x97/0xe0
[ +0.000006] unregister_framebuffer+0x197/0x380
[ +0.000005] drm_fb_helper_unregister_info+0x94/0x100
[ +0.000005] drm_fbdev_client_unregister+0x3c/0x80
[ +0.000007] drm_client_dev_unregister+0x144/0x330
[ +0.000006] drm_dev_unregister+0x49/0x1b0
[ +0.000006] drm_dev_unplug+0x4c/0xd0
[ +0.000006] amdgpu_pci_remove+0x58/0x130 [amdgpu]
[ +0.000482] pci_device_remove+0xae/0x1e0
[ +0.000006] device_remove+0xc7/0x180
[ +0.000006] device_release_driver_internal+0x3d4/0x5a0
[ +0.000007] device_release_driver+0x12/0x20
[ +0.000006] pci_stop_bus_device+0x104/0x150
[ +0.000006] pci_stop_and_remove_bus_device_locked+0x1b/0x40
[ +0.000005] remove_store+0xd7/0xf0
[ +0.000007] dev_attr_store+0x3f/0x80
[ +0.000006] sysfs_kf_write+0x125/0x1d0
[ +0.000005] kernfs_fop_write_iter+0x2ea/0x490
[ +0.000007] vfs_write+0x90d/0xe70
[ +0.000006] ksys_write+0x119/0x220
[ +0.000006] __x64_sys_write+0x72/0xc0
[ +0.000006] x64_sys_call+0x18ab/0x26f0
[ +0.000005] do_syscall_64+0x7c/0x170
[ +0.000005] entry_SYSCALL_64_after_hwframe+0x76/0x7e
[ +0.000013] The buggy address belongs to the object at ffff88811c068000
which belongs to the cache kmalloc-rnd-01-4k of size 4096
[ +0.000016] The buggy address is located 3168 bytes inside of
freed 4096-byte region [ffff88811c068000, ffff88811c069000)
[ +0.000022] The buggy address belongs to the physical page:
[ +0.000010] page: refcount:0 mapcount:0 mapping:0000000000000000 index:0xffff88811c06e000 pfn:0x11c068
[ +0.000006] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ +0.000006] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ +0.000007] page_type: f5(slab)
[ +0.000007] raw: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000
[ +0.000005] raw: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000
[ +0.000006] head: 0017ffffc0000040 ffff88810004c140 dead000000000122 0000000000000000
[ +0.000005] head: ffff88811c06e000 0000000080040002 00000000f5000000 0000000000000000
[ +0.000006] head: 0017ffffc0000003 ffffea0004701a01 ffffffffffffffff 0000000000000000
[ +0.000005] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[ +0.000004] page dumped because: kasan: bad access detected
[ +0.000011] Memory state around the buggy address:
[ +0.000009] ffff88811c068b00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000012] ffff88811c068b80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] >ffff88811c068c00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ^
[ +0.000010] ffff88811c068c80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ffff88811c068d00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
[ +0.000011] ==================================================================
Cc: Alex Deucher <alexander.deucher@amd.com>
Cc: Christian König <christian.koenig@amd.com>
Cc: Lijo Lazar <lijo.lazar@amd.com>
Cc: Jesse Zhang <Jesse.Zhang@amd.com>
Cc: Arvind Yadav <arvind.yadav@amd.com>
v2: drop amdgpu_drm_release() and assign drm_release()
as the callback directly.(Alex)
Fixes: adba0929736a ("drm/amdgpu: Fix Illegal opcode in command stream Error")
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Signed-off-by: Vitaly Prosyak <vitaly.prosyak@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Just use kmalloc for the fences in the rare case we need
an independent fence.
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
complete() callbacks are supposed to handle reversing anything
that occurred during prepare() callbacks. They'll be called on every
power state transition, and will also be called if the sequence is
failed (such as an aborted suspend).
Add support for IP blocks to support this action.
Reviewed-by: Alex Hung <alex.hung@amd.com>
Link: https://lore.kernel.org/r/20250602014432.3538345-2-superm1@kernel.org
Signed-off-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Add debug mask to disable kernel logs of RAS correctable errors,
including both ACA and CE error counter kernel messages.
Signed-off-by: Xiang Liu <xiang.liu@amd.com>
Reviewed-by: Hawking Zhang <Hawking.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
The current cleanup order during file descriptor close can lead to
a race condition where the eviction fence worker attempts to access
a destroyed mutex from the user queue manager:
[ 517.294055] DEBUG_LOCKS_WARN_ON(lock->magic != lock)
[ 517.294060] WARNING: CPU: 8 PID: 2030 at kernel/locking/mutex.c:564
[ 517.294094] Workqueue: events amdgpu_eviction_fence_suspend_worker [amdgpu]
The issue occurs because:
1. We destroy the user queue manager (including its mutex) first
2. Then try to destroy eviction fences which may have pending work
3. The eviction fence worker may try to access the already-destroyed mutex
Fix this by reordering the cleanup to:
1. First mark the fd as closing and destroy eviction fences,
which flushes any pending work
2. Then safely destroy the user queue manager after we're certain
no more fence work will be executed
The copy in amdgpu_driver_postclose_kms() needs to be removed (Christian)
Reviewed-by: Christian König <christian.koenig@amd.com>
Reviewed-by: Prike Liang <Prike.Liang@amd.com>
Reviewed-by: Arvind Yadav <Arvind.Yadav@amd.com>
Signed-off-by: Jesse Zhang <Jesse.Zhang@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
In VM debug mode, it is desirable to notify the application
to correct the freeing sequence by unmapping the memory before
destroying the userptr in the old userptr path. Add a bitmask
to decide whether to send gpu vm fault to the applition.
Signed-off-by: Shane Xiao <shane.xiao@amd.com>
Reviewed-by: Christian König <christian.koenig@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Set the s3/s0ix and s4 flags in the pm notifier so that we can skip
the resource evictions properly in pm prepare based on whether
we are suspending or hibernating. Drop the eviction as processes
are not frozen at this time, we we can end up getting stuck trying
to evict VRAM while applications continue to submit work which
causes the buffers to get pulled back into VRAM.
v2: Move suspend flags out of pm notifier (Mario)
Closes: https://gitlab.freedesktop.org/drm/amd/-/issues/4178
Fixes: 2965e6355dcd ("drm/amd: Add Suspend/Hibernate notification callback support")
Cc: Mario Limonciello <mario.limonciello@amd.com>
Reviewed-by: Mario Limonciello <mario.limonciello@amd.com>
Signed-off-by: Alex Deucher <alexander.deucher@amd.com>
|
|
Adjusted the enforce isolation setting handling to include the ability
to disable the cleaner shader without affecting isolation between tasks.
v2: Updated enfo |