aboutsummaryrefslogtreecommitdiff
path: root/drivers/hv
AgeCommit message (Collapse)AuthorFilesLines
2026-04-24Merge tag 'drm-fixes-2026-04-24' of https://gitlab.freedesktop.org/drm/kernelLinus Torvalds1-1/+1
Pull more drm fixes from Dave Airlie: "These are the regular fixes that have built up over last couple of weeks, all pretty minor and spread all over. atomic: - raise the vblank timeout to avoid it on virtual drivers - fix colorop duplication bridge: - stm_lvds: state check fix - dw-mipi-dsi: bridge reference leak fix panel: - visionx-rm69299: init fix dma-fence: - fix sparse warning dma-buf: - UAF fix panthor: - mapping fix arcgpu: - device_node reference leak fix nouveau: - memory leak in error path fix - overflow in reloc path for old hw fix hv: - Kconfig fix v3d: - infinite loop fix" * tag 'drm-fixes-2026-04-24' of https://gitlab.freedesktop.org/drm/kernel: drm/nouveau: fix u32 overflow in pushbuf reloc bounds check MAINTAINERS: split hisilicon maintenance and add Yongbang Shi for hibmc-drm matainers drm/v3d: Reject empty multisync extension to prevent infinite loop drm/panel: visionox-rm69299: Make use of prepare_prev_first drm/drm_atomic: duplicate colorop states if plane color pipeline in use drm/nouveau: fix nvkm_device leak on aperture removal failure hv: Select CONFIG_SYSFB only for CONFIG_HYPERV_VMBUS dma-fence: Silence sparse warning in dma_fence_describe drm/bridge: dw-mipi-dsi: Fix bridge leak when host attach fails drm/arcpgu: fix device node leak drm/panthor: Fix outdated function documentation drm/panthor: Extend VM locked region for remap case to be a superset dma-buf: fix UAF in dma_buf_put() tracepoint drm/bridge: stm_lvds: Do not fail atomic_check on disabled connector drm/atomic: Increase timeout in drm_atomic_helper_wait_for_vblanks()
2026-04-24Merge tag 'drm-misc-fixes-2026-04-23' of ↵Dave Airlie1-1/+1
https://gitlab.freedesktop.org/drm/misc/kernel into drm-fixes This week in drm-misc-fixes, we have: - A patch to raise the vblank timeout to avoid it on virtual drivers - a state check fix for stm_lvds - a use-after-free fix for dma-buf - a mapping fix for panthor - a device_node reference leak fix for arcgpu - a bridge reference leak fix for dw-mipi-dsi - a sparse warning fix for dma-fence - a kconfig fix for hv - a memory leak fix for nouveau - a fix to duplicate colorop when duplicating states - a panel initialisation order fix for visionox-rm69299 - a fix to prevent an infinite loop for v3d - an overflow fix for nouveau Signed-off-by: Dave Airlie <airlied@redhat.com> From: Maxime Ripard <mripard@redhat.com> Link: https://patch.msgid.link/20260423-realistic-eager-reindeer-4dacf7@houat
2026-04-22Merge tag 'hyperv-next-signed-20260421' of ↵Linus Torvalds13-45/+741
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V updates from Wei Liu: - Fix cross-compilation for hv tools (Aditya Garg) - Fix vmemmap_shift exceeding MAX_FOLIO_ORDER in mshv_vtl (Naman Jain) - Limit channel interrupt scan to relid high water mark (Michael Kelley) - Export hv_vmbus_exists() and use it in pci-hyperv (Dexuan Cui) - Fix cleanup and shutdown issues for MSHV (Jork Loeser) - Introduce more tracing support for MSHV (Stanislav Kinsburskii) * tag 'hyperv-next-signed-20260421' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Skip LP/VP creation on kexec x86/hyperv: move stimer cleanup to hv_machine_shutdown() Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing mshv: Add tracepoint for GPA intercept handling mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER tools: hv: Fix cross-compilation Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv mshv: Introduce tracing support Drivers: hv: vmbus: Limit channel interrupt scan to relid high water mark
2026-04-22x86/hyperv: Skip LP/VP creation on kexecJork Loeser1-0/+47
After a kexec the logical processors and virtual processors already exist in the hypervisor because they were created by the previous kernel. Attempting to add them again causes either a BUG_ON or corrupted VP state leading to MCEs in the new kernel. Add hv_lp_exists() to probe whether an LP is already present by calling HVCALL_GET_LOGICAL_PROCESSOR_RUN_TIME. When it succeeds the LP exists and we skip the add-LP and create-VP loops entirely. Also add hv_call_notify_all_processors_started() which informs the hypervisor that all processors are online. This is required after adding LPs (fresh boot) and is a no-op on kexec since we skip that path. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Co-developed-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Signed-off-by: Stanislav Kinsburskii <stanislav.kinsburskii@gmail.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22x86/hyperv: move stimer cleanup to hv_machine_shutdown()Jork Loeser1-1/+0
Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to hv_machine_shutdown() in the platform code. This ensures stimer cleanup happens before the vmbus unload, which is required for root partition kexec to work correctly. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowingJork Loeser1-1/+0
vmbus_alloc_synic_and_connect() declares a local 'int hyperv_cpuhp_online' that shadows the file-scope global of the same name. The cpuhp state returned by cpuhp_setup_state() is stored in the local, leaving the global at 0 (CPUHP_OFFLINE). When hv_kexec_handler() or hv_machine_shutdown() later call cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the BUG_ON in __cpuhp_remove_state_cpuslocked(). Remove the local declaration so the cpuhp state is stored in the file-scope global where hv_kexec_handler() and hv_machine_shutdown() expect it. Fixes: 2647c96649ba ("Drivers: hv: Support establishing the confidential VMBus connection") Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-22mshv: Add tracepoint for GPA intercept handlingStanislav Kinsburskii2-2/+33
Provide visibility into GPA intercept operations for debugging and performance analysis of Microsoft Hypervisor guest memory management. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds3-15/+22
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-15hv: Select CONFIG_SYSFB only for CONFIG_HYPERV_VMBUSThomas Zimmermann1-1/+1
Hyperv's sysfb access only exists in the VMBUS support. Therefore only select CONFIG_SYSFB for CONFIG_HYPERV_VMBUS. Avoids sysfb code on systems that don't need it. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Fixes: 96959283a58d ("Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests") Cc: Michael Kelley <mhklinux@outlook.com> Cc: Saurabh Sengar <ssengar@linux.microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: "K. Y. Srinivasan" <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Long Li <longli@microsoft.com> Cc: linux-hyperv@vger.kernel.org Cc: <stable@vger.kernel.org> # v6.16+ Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://patch.msgid.link/20260402092305.208728-2-tzimmermann@suse.de
2026-04-14Merge tag 'irq-core-2026-04-12' of ↵Linus Torvalds2-6/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core irq updates from Thomas Gleixner: - Invoke add_interrupt_randomness() in handle_percpu_devid_irq() and cleanup the workaround in the Hyper-V driver, which would now invoke it twice on ARM64. Removing it from the driver requires to add it to the x86 system vector entry point - Remove the pointles cpu_read_lock() around reading CPU possible mask, which is read only after init - Add documentation for the interaction between device tree bindings and the interrupt type defines in irq.h - Delete stale defines in the matrix allocator and the equivalent in loongarch * tag 'irq-core-2026-04-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Drivers: hv: Move add_interrupt_randomness() to hypervisor callback sysvec genirq/chip: Invoke add_interrupt_randomness() in handle_percpu_devid_irq() genirq/affinity: Remove cpus_read_lock() while reading cpu_possible_mask genirq/matrix, LoongArch: Delete IRQ_MATRIX_BITS leftovers genirq: Document interaction between <linux/irq.h> and DT binding defines
2026-04-14mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDERNaman Jain1-3/+9
When registering VTL0 memory via MSHV_ADD_VTL0_MEMORY, the kernel computes pgmap->vmemmap_shift as the number of trailing zeros in the OR of start_pfn and last_pfn, intending to use the largest compound page order both endpoints are aligned to. However, this value is not clamped to MAX_FOLIO_ORDER, so a sufficiently aligned range (e.g. physical range [0x800000000000, 0x800080000000), corresponding to start_pfn=0x800000000 with 35 trailing zeros) can produce a shift larger than what memremap_pages() accepts, triggering a WARN and returning -EINVAL: WARNING: ... memremap_pages+0x512/0x650 requested folio size unsupported The MAX_FOLIO_ORDER check was added by commit 646b67d57589 ("mm/memremap: reject unreasonable folio/compound page sizes in memremap_pages()"). Fix this by clamping vmemmap_shift to MAX_FOLIO_ORDER so we always request the largest order the kernel supports, in those cases, rather than an out-of-range value. Also fix the error path to propagate the actual error code from devm_memremap_pages() instead of hard-coding -EFAULT, which was masking the real -EINVAL return. Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") Cc: stable@vger.kernel.org Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hypervDexuan Cui1-12/+8
With commit f84b21da3624 ("PCI: hv: Don't load the driver for baremetal root partition"), the bare metal Linux root partition won't use the pci-hyperv driver, but when a Linux VM runs on the Linux root partition, pci-hyperv's module_init function init_hv_pci_drv() can still run, e.g. in the case of CONFIG_PCI_HYPERV=y, even if the VMBus driver is not used in such a VM (i.e. the hv_vmbus driver's init function returns -ENODEV due to vmbus_root_device being NULL). In such a Linux VM, init_hv_pci_drv() runs with a side effect: the 3 hvpci_block_ops callbacks are set to functions that depend on hv_vmbus. Later, when the MLX driver in such a VM invokes the callbacks, e.g. in drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c: mlx5_hv_register_invalidate(), hvpci_block_ops.reg_blk_invalidate() is hv_register_block_invalidate() rather than a NULL function pointer, and hv_register_block_invalidate() assumes that it can find a struct hv_pcibus_device from pdev->bus->sysdata, which is false in such a VM. Consequently, hv_register_block_invalidate() -> get_pcichild_wslot() -> spin_lock_irqsave() may hang since it can be accessing an invalid spinlock pointer. Fix the issue by exporting hv_vmbus_exists() and using it in pci-hyperv: hv_root_partition() is true and hv_nested is false ==> hv_vmbus_exists() is false. hv_root_partition() is true and hv_nested is true ==> hv_vmbus_exists() is true. hv_root_partition() is false ==> hv_vmbus_exists() is true. While at it, rename vmbus_exists() to hv_vmbus_exists() to follow the convention that all public functions have the hv_ prefix; also change the return value's type from int to bool to make the code more readable; also move the two pr_info() calls. Reported-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14mshv: Introduce tracing supportStanislav Kinsburskii8-15/+629
Introduces various trace events and use them in the corresponding places in the driver. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14Drivers: hv: vmbus: Limit channel interrupt scan to relid high water markMichael Kelley3-11/+15
When checking for VMBus channel interrupts, current code always scans the full SynIC receive interrupt bit array to get the relid of the interrupting channels. The array has HV_EVENT_FLAGS_COUNT (2048) bits. But VMs rarely have more than 100 channels, and the relid is typically a small integer that is densely assigned by the Hyper-V host. It's wasteful to scan 2048 bits when it is highly unlikely that anything will be found past bit 100. The waste is double with Confidential VMBus because there are two receive interrupt arrays that must be scanned: one for the hypervisor SynIC and one for the paravisor SynIC. Improve the scanning by tracking the largest relid that has been offered by the Hyper-V host. Then when checking for VMBus channel interrupts, only scan up to this high water mark. When channels are rescinded, it's not worth the complexity to recalculate the high water mark. Hyper-V tends to reuse the rescinded relids for any new channels that are subsequently added, and the performance benefit of exactly tracking the high water mark would be minimal. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Roman Kisel <vdso@mailbox.org> Reviewed-by: Roman Kisel <vdso@mailbox.org> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-05drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepareLorenzo Stoakes (Oracle)2-14/+21
The f_op->mmap interface is deprecated, so update the vmbus driver to use its successor, mmap_prepare. This updates all callbacks which referenced the function pointer hv_mmap_ring_buffer to instead reference hv_mmap_prepare_ring_buffer, utilising the newly introduced compat_set_desc_from_vma() and __compat_vma_mmap() to be able to implement this change. The UIO HV generic driver is the only user of hv_create_ring_sysfs(), which is the only function which references vmbus_channel->mmap_prepare_ring_buffer which, in turn, is the only external interface to hv_mmap_prepare_ring_buffer. This patch therefore updates this caller to use mmap_prepare instead, which also previously used vm_iomap_memory(), so this change replaces it with its mmap_prepare equivalent, mmap_action_simple_ioremap(). [akpm@linux-foundation.org: restore struct vmbus_channel comment, per Michael Kelley] Link: https://lkml.kernel.org/r/05467cb62267d750e5c770147517d4df0246cda6.1774045440.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bodo Stroesser <bostroesser@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Hildenbrand <david@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Richard Weinberger <richard@nod.at> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: Wei Liu <wei.liu@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-05hv_balloon: set unspecified page reporting orderYuvraj Sakshith1-1/+1
Explicitly mention page reporting order to be set to default value using PAGE_REPORTING_ORDER_UNSPECIFIED fallback value. Link: https://lkml.kernel.org/r/20260303113032.3008371-4-yuvraj.sakshith@oss.qualcomm.com Signed-off-by: Yuvraj Sakshith <yuvraj.sakshith@oss.qualcomm.com> Acked-by: David Hildenbrand (Arm) <david@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Cc: Brendan Jackman <jackmanb@google.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Eugenio Pérez <eperezma@redhat.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jason Wang <jasowang@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Wei Liu <wei.liu@kernel.org> Cc: Xuan Zhuo <xuanzhuo@linux.alibaba.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-04Drivers: hv: Move add_interrupt_randomness() to hypervisor callback sysvecMichael Kelley2-6/+0
The Hyper-V ISRs, for normal guests and when running in the hypervisor root patition, are calling add_interrupt_randomness() as a primary source of entropy. The call is currently in the ISRs as a common place to handle both x86/x64 and arm64. On x86/x64, hypervisor interrupts come through a custom sysvec entry, and do not go through a generic interrupt handler. On arm64, hypervisor interrupts come through an emulated GICv3. GICv3 uses the generic handler handle_percpu_devid_irq(), which does not do add_interrupt_randomness() -- unlike its counterpart handle_percpu_irq(). But handle_percpu_devid_irq() is now updated to do the add_interrupt_randomness(). So add_interrupt_randomness() is now needed only in Hyper-V's x86/x64 custom sysvec path. Move add_interrupt_randomness() from the Hyper-V ISRs into the Hyper-V x86/x64 custom sysvec path, matching the existing STIMER0 sysvec path. With this change, add_interrupt_randomness() is no longer called from any device drivers, which is appropriate. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://patch.msgid.link/20260402202400.1707-3-mhklkml@zohomail.com
2026-04-04mshv: Fix infinite fault loop on permission-denied GPA interceptsStanislav Kinsburskii1-3/+12
Prevent infinite fault loops when guests access memory regions without proper permissions. Currently, mshv_handle_gpa_intercept() attempts to remap pages for all faults on movable memory regions, regardless of whether the access type is permitted. When a guest writes to a read-only region, the remap succeeds but the region remains read-only, causing immediate re-fault and spinning the vCPU indefinitely. Validate intercept access type against region permissions before attempting remaps. Reject writes to non-writable regions and executes to non-executable regions early, returning false to let the VMM handle the intercept appropriately. This also closes a potential DoS vector where malicious guests could intentionally trigger these fault loops to consume host resources. Fixes: b9a66cd5ccbb ("mshv: Add support for movable memory regions") Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-18mshv: Fix error handling in mshv_region_pinStanislav Kinsburskii1-2/+4
The current error handling has two issues: First, pin_user_pages_fast() can return a short pin count (less than requested but greater than zero) when it cannot pin all requested pages. This is treated as success, leading to partially pinned regions being used, which causes memory corruption. Second, when an error occurs mid-loop, already pinned pages from the current batch are not properly accounted for before calling mshv_region_invalidate_pages(), causing a page reference leak. Treat short pins as errors and fix partial batch accounting before cleanup. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-13mshv: Fix use-after-free in mshv_map_user_memory error pathStanislav Kinsburskii1-1/+1
In the error path of mshv_map_user_memory(), calling vfree() directly on the region leaves the MMU notifier registered. When userspace later unmaps the memory, the notifier fires and accesses the freed region, causing a use-after-free and potential kernel panic. Replace vfree() with mshv_partition_put() to properly unregister the MMU notifier before freeing the region. Fixes: b9a66cd5ccbb9 ("mshv: Add support for movable memory regions") Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-12mshv: pass struct mshv_user_mem_region by referenceMukesh R1-13/+13
For unstated reasons, function mshv_partition_ioctl_set_memory passes struct mshv_user_mem_region by value instead of by reference. Change it to pass by reference. Signed-off-by: Mukesh R <mrathor@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-03-11Revert "mshv: expose the scrub partition hypercall"Wei Liu1-1/+0
This reverts commit 36d6cbb62133fc6eea28f380409e0fb190f3dfbe. Calling this as a passthrough hypercall leaves the VM in an inconsistent state. Revert before it is released. Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-25mshv: add arm64 support for doorbell & intercept SINTsAnirudh Rayabharam (Microsoft)1-10/+109
On x86, the HYPERVISOR_CALLBACK_VECTOR is used to receive synthetic interrupts (SINTs) from the hypervisor for doorbells and intercepts. There is no such vector reserved for arm64. On arm64, the hypervisor exposes a synthetic register that can be read to find the INTID that should be used for SINTs. This INTID is in the PPI range. To better unify the code paths, introduce mshv_sint_vector_init() that either reads the synthetic register and obtains the INTID (arm64) or just uses HYPERVISOR_CALLBACK_VECTOR as the interrupt vector (x86). Reviewed-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-25mshv: refactor synic init and cleanupAnirudh Rayabharam (Microsoft)3-65/+75
Rename mshv_synic_init() to mshv_synic_cpu_init() and mshv_synic_cleanup() to mshv_synic_cpu_exit() to better reflect that these functions handle per-cpu synic setup and teardown. Use mshv_synic_init/cleanup() to perform init/cleanup that is not per-cpu. Move all the synic related setup from mshv_parent_partition_init. Move the reboot notifier to mshv_synic.c because it currently only operates on the synic cpuhp state. Move out synic_pages from the global mshv_root since its use is now completely local to mshv_synic.c. This is in preparation for adding more stuff to mshv_synic_init(). No functional change. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook1-1/+1
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds2-4/+2
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds10-19/+19
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook16-42/+38
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-20Merge tag 'hyperv-next-signed-20260218' of ↵Linus Torvalds15-203/+1663
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V updates from Wei Liu: - Debugfs support for MSHV statistics (Nuno Das Neves) - Support for the integrated scheduler (Stanislav Kinsburskii) - Various fixes for MSHV memory management and hypervisor status handling (Stanislav Kinsburskii) - Expose more capabilities and flags for MSHV partition management (Anatol Belski, Muminul Islam, Magnus Kulke) - Miscellaneous fixes to improve code quality and stability (Carlos López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros Bizjak) - PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka) * tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits) mshv: Handle insufficient root memory hypervisor statuses mshv: Handle insufficient contiguous memory hypervisor status mshv: Introduce hv_deposit_memory helper functions mshv: Introduce hv_result_needs_memory() helper function mshv: Add SMT_ENABLED_GUEST partition creation flag mshv: Add nested virtualization creation flag Drivers: hv: vmbus: Simplify allocation of vmbus_evt mshv: expose the scrub partition hypercall mshv: Add support for integrated scheduler mshv: Use try_cmpxchg() instead of cmpxchg() x86/hyperv: Fix error pointer dereference x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn x86/hyperv: Use savesegment() instead of inline asm() to save segment registers mshv: fix SRCU protection in irqfd resampler ack handler mshv: make field names descriptive in a header struct x86/hyperv: Update comment in hyperv_cleanup() mshv: clear eventfd counter on irqfd shutdown x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap() ...
2026-02-19mshv: Handle insufficient root memory hypervisor statusesStanislav Kinsburskii2-0/+16
When creating guest partition objects, the hypervisor may fail to allocate root partition pages and return an insufficient memory status. In this case, deposit memory using the root partition ID instead. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Mukesh R <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19mshv: Handle insufficient contiguous memory hypervisor statusStanislav Kinsburskii2-0/+5
The HV_STATUS_INSUFFICIENT_CONTIGUOUS_MEMORY status indicates that the hypervisor lacks sufficient contiguous memory for its internal allocations. When this status is encountered, allocate and deposit HV_MAX_CONTIGUOUS_ALLOCATION_PAGES contiguous pages to the hypervisor. HV_MAX_CONTIGUOUS_ALLOCATION_PAGES is defined in the hypervisor headers, a deposit of this size will always satisfy the hypervisor's requirements. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Mukesh R <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19mshv: Introduce hv_deposit_memory helper functionsStanislav Kinsburskii3-20/+29
Introduce hv_deposit_memory_node() and hv_deposit_memory() helper functions to handle memory deposit with proper error handling. The new hv_deposit_memory_node() function takes the hypervisor status as a parameter and validates it before depositing pages. It checks for HV_STATUS_INSUFFICIENT_MEMORY specifically and returns an error for unexpected status codes. This is a precursor patch to new out-of-memory error codes support. No functional changes intended. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Mukesh R <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-19mshv: Introduce hv_result_needs_memory() helper functionStanislav Kinsburskii3-16/+25
Replace direct comparisons of hv_result(status) against HV_STATUS_INSUFFICIENT_MEMORY with a new hv_result_needs_memory() helper function. This improves code readability and provides a consistent and extendable interface for checking out-of-memory conditions in hypercall results. No functional changes intended. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Reviewed-by: Mukesh R <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18mshv: Add SMT_ENABLED_GUEST partition creation flagAnatol Belski1-0/+2
Add support for HV_PARTITION_CREATION_FLAG_SMT_ENABLED_GUEST to allow userspace VMMs to enable SMT for guest partitions. Expose this via new MSHV_PT_BIT_SMT_ENABLED_GUEST flag in the UAPI. Without this flag, the hypervisor schedules guest VPs incorrectly, causing SMT unusable. Signed-off-by: Anatol Belski <anbelski@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18mshv: Add nested virtualization creation flagMuminul Islam1-0/+2
Introduce HV_PARTITION_CREATION_FLAG_NESTED_VIRTUALIZATION_CAPABLE to indicate support for nested virtualization during partition creation. This enables clearer configuration and capability checks for nested virtualization scenarios. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Muminul Islam <muislam@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18Drivers: hv: vmbus: Simplify allocation of vmbus_evtMichael Kelley1-14/+8
The per-cpu variable vmbus_evt is currently dynamically allocated. It's only 8 bytes, so just allocate it statically to simplify and save a few lines of code. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18mshv: expose the scrub partition hypercallMagnus Kulke1-0/+1
This hypercall needs to be exposed for VMMs to soft-reboot guests. It will reset APIC and synthetic interrupt controller state, among others. Signed-off-by: Magnus Kulke <magnuskulke@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18mshv: Add support for integrated schedulerStanislav Kinsburskii1-32/+50
Query the hypervisor for integrated scheduler support and use it if configured. Microsoft Hypervisor originally provided two schedulers: root and core. The root scheduler allows the root partition to schedule guest vCPUs across physical cores, supporting both time slicing and CPU affinity (e.g., via cgroups). In contrast, the core scheduler delegates vCPU-to-physical-core scheduling entirely to the hypervisor. Direct virtualization introduces a new privileged guest partition type - L1 Virtual Host (L1VH) — which can create child partitions from its own resources. These child partitions are effectively siblings, scheduled by the hypervisor's core scheduler. This prevents the L1VH parent from setting affinity or time slicing for its own processes or guest VPs. While cgroups, CFS, and cpuset controllers can still be used, their effectiveness is unpredictable, as the core scheduler swaps vCPUs according to its own logic (typically round-robin across all allocated physical CPUs). As a result, the system may appear to "steal" time from the L1VH and its children. To address this, Microsoft Hypervisor introduces the integrated scheduler. This allows an L1VH partition to schedule its own vCPUs and those of its guests across its "physical" cores, effectively emulating root scheduler behavior within the L1VH, while retaining core scheduler behavior for the rest of the system. The integrated scheduler is controlled by the root partition and gated by the vmm_enable_integrated_scheduler capability bit. If set, the hypervisor supports the integrated scheduler. The L1VH partition must then check if it is enabled by querying the corresponding extended partition property. If this property is true, the L1VH partition must use the root scheduler logic; otherwise, it must use the core scheduler. This requirement makes reading VMM capabilities in L1VH partition a requirement too. Signed-off-by: Andreea Pintilie <anpintil@microsoft.com> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18mshv: Use try_cmpxchg() instead of cmpxchg()Uros Bizjak2-4/+4
Use !try_cmpxchg() instead of cmpxchg (*ptr, old, new) != old. x86 CMPXCHG instruction returns success in ZF flag, so this change saves a compare after CMPXCHG. The generated assembly code improves from e.g.: 415: 48 8b 44 24 30 mov 0x30(%rsp),%rax 41a: 48 8b 54 24 38 mov 0x38(%rsp),%rdx 41f: f0 49 0f b1 91 a8 02 lock cmpxchg %rdx,0x2a8(%r9) 426: 00 00 428: 48 3b 44 24 30 cmp 0x30(%rsp),%rax 42d: 0f 84 09 ff ff ff je 33c <...> to: 415: 48 8b 44 24 30 mov 0x30(%rsp),%rax 41a: 48 8b 54 24 38 mov 0x38(%rsp),%rdx 41f: f0 49 0f b1 91 a8 02 lock cmpxchg %rdx,0x2a8(%r9) 426: 00 00 428: 0f 84 0e ff ff ff je 33c <...> No functional change intended. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Wei Liu <wei.liu@kernel.org> Cc: Dexuan Cui <decui@microsoft.com> Cc: Long Li <longli@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18