aboutsummaryrefslogtreecommitdiff
path: root/drivers/hv/vmbus_drv.c
AgeCommit message (Collapse)AuthorFilesLines
10 daysMerge tag 'hyperv-next-signed-20260421' of ↵Linus Torvalds1-20/+9
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V updates from Wei Liu: - Fix cross-compilation for hv tools (Aditya Garg) - Fix vmemmap_shift exceeding MAX_FOLIO_ORDER in mshv_vtl (Naman Jain) - Limit channel interrupt scan to relid high water mark (Michael Kelley) - Export hv_vmbus_exists() and use it in pci-hyperv (Dexuan Cui) - Fix cleanup and shutdown issues for MSHV (Jork Loeser) - Introduce more tracing support for MSHV (Stanislav Kinsburskii) * tag 'hyperv-next-signed-20260421' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: x86/hyperv: Skip LP/VP creation on kexec x86/hyperv: move stimer cleanup to hv_machine_shutdown() Drivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowing mshv: Add tracepoint for GPA intercept handling mshv_vtl: Fix vmemmap_shift exceeding MAX_FOLIO_ORDER tools: hv: Fix cross-compilation Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hyperv mshv: Introduce tracing support Drivers: hv: vmbus: Limit channel interrupt scan to relid high water mark
11 daysx86/hyperv: move stimer cleanup to hv_machine_shutdown()Jork Loeser1-1/+0
Move hv_stimer_global_cleanup() from vmbus's hv_kexec_handler() to hv_machine_shutdown() in the platform code. This ensures stimer cleanup happens before the vmbus unload, which is required for root partition kexec to work correctly. Co-developed-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Anirudh Rayabharam <anrayabh@linux.microsoft.com> Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
11 daysDrivers: hv: vmbus: fix hyperv_cpuhp_online variable shadowingJork Loeser1-1/+0
vmbus_alloc_synic_and_connect() declares a local 'int hyperv_cpuhp_online' that shadows the file-scope global of the same name. The cpuhp state returned by cpuhp_setup_state() is stored in the local, leaving the global at 0 (CPUHP_OFFLINE). When hv_kexec_handler() or hv_machine_shutdown() later call cpuhp_remove_state(hyperv_cpuhp_online) they pass 0, which hits the BUG_ON in __cpuhp_remove_state_cpuslocked(). Remove the local declaration so the cpuhp state is stored in the file-scope global where hv_kexec_handler() and hv_machine_shutdown() expect it. Fixes: 2647c96649ba ("Drivers: hv: Support establishing the confidential VMBus connection") Signed-off-by: Jork Loeser <jloeser@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Anirudh Rayabharam (Microsoft) <anirudh@anirudhrb.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds1-12/+19
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-14Drivers: hv: vmbus: Export hv_vmbus_exists() and use it in pci-hypervDexuan Cui1-12/+8
With commit f84b21da3624 ("PCI: hv: Don't load the driver for baremetal root partition"), the bare metal Linux root partition won't use the pci-hyperv driver, but when a Linux VM runs on the Linux root partition, pci-hyperv's module_init function init_hv_pci_drv() can still run, e.g. in the case of CONFIG_PCI_HYPERV=y, even if the VMBus driver is not used in such a VM (i.e. the hv_vmbus driver's init function returns -ENODEV due to vmbus_root_device being NULL). In such a Linux VM, init_hv_pci_drv() runs with a side effect: the 3 hvpci_block_ops callbacks are set to functions that depend on hv_vmbus. Later, when the MLX driver in such a VM invokes the callbacks, e.g. in drivers/net/ethernet/mellanox/mlx5/core/lib/hv.c: mlx5_hv_register_invalidate(), hvpci_block_ops.reg_blk_invalidate() is hv_register_block_invalidate() rather than a NULL function pointer, and hv_register_block_invalidate() assumes that it can find a struct hv_pcibus_device from pdev->bus->sysdata, which is false in such a VM. Consequently, hv_register_block_invalidate() -> get_pcichild_wslot() -> spin_lock_irqsave() may hang since it can be accessing an invalid spinlock pointer. Fix the issue by exporting hv_vmbus_exists() and using it in pci-hyperv: hv_root_partition() is true and hv_nested is false ==> hv_vmbus_exists() is false. hv_root_partition() is true and hv_nested is true ==> hv_vmbus_exists() is true. hv_root_partition() is false ==> hv_vmbus_exists() is true. While at it, rename vmbus_exists() to hv_vmbus_exists() to follow the convention that all public functions have the hv_ prefix; also change the return value's type from int to bool to make the code more readable; also move the two pr_info() calls. Reported-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-14Drivers: hv: vmbus: Limit channel interrupt scan to relid high water markMichael Kelley1-6/+1
When checking for VMBus channel interrupts, current code always scans the full SynIC receive interrupt bit array to get the relid of the interrupting channels. The array has HV_EVENT_FLAGS_COUNT (2048) bits. But VMs rarely have more than 100 channels, and the relid is typically a small integer that is densely assigned by the Hyper-V host. It's wasteful to scan 2048 bits when it is highly unlikely that anything will be found past bit 100. The waste is double with Confidential VMBus because there are two receive interrupt arrays that must be scanned: one for the hypervisor SynIC and one for the paravisor SynIC. Improve the scanning by tracking the largest relid that has been offered by the Hyper-V host. Then when checking for VMBus channel interrupts, only scan up to this high water mark. When channels are rescinded, it's not worth the complexity to recalculate the high water mark. Hyper-V tends to reuse the rescinded relids for any new channels that are subsequently added, and the performance benefit of exactly tracking the high water mark would be minimal. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Roman Kisel <vdso@mailbox.org> Reviewed-by: Roman Kisel <vdso@mailbox.org> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-04-05drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepareLorenzo Stoakes (Oracle)1-12/+19
The f_op->mmap interface is deprecated, so update the vmbus driver to use its successor, mmap_prepare. This updates all callbacks which referenced the function pointer hv_mmap_ring_buffer to instead reference hv_mmap_prepare_ring_buffer, utilising the newly introduced compat_set_desc_from_vma() and __compat_vma_mmap() to be able to implement this change. The UIO HV generic driver is the only user of hv_create_ring_sysfs(), which is the only function which references vmbus_channel->mmap_prepare_ring_buffer which, in turn, is the only external interface to hv_mmap_prepare_ring_buffer. This patch therefore updates this caller to use mmap_prepare instead, which also previously used vm_iomap_memory(), so this change replaces it with its mmap_prepare equivalent, mmap_action_simple_ioremap(). [akpm@linux-foundation.org: restore struct vmbus_channel comment, per Michael Kelley] Link: https://lkml.kernel.org/r/05467cb62267d750e5c770147517d4df0246cda6.1774045440.git.ljs@kernel.org Signed-off-by: Lorenzo Stoakes (Oracle) <ljs@kernel.org> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Vlastimil Babka (SUSE) <vbabka@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Torgue <alexandre.torgue@foss.st.com> Cc: Al Viro <viro@zeniv.linux.org.uk> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Bodo Stroesser <bostroesser@gmail.com> Cc: Christian Brauner <brauner@kernel.org> Cc: Clemens Ladisch <clemens@ladisch.de> Cc: David Hildenbrand <david@kernel.org> Cc: David Howells <dhowells@redhat.com> Cc: Dexuan Cui <decui@microsoft.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Jan Kara <jack@suse.cz> Cc: Jann Horn <jannh@google.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Liam Howlett <liam.howlett@oracle.com> Cc: Long Li <longli@microsoft.com> Cc: Marc Dionne <marc.dionne@auristor.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Mike Rapoport <rppt@kernel.org> Cc: Miquel Raynal <miquel.raynal@bootlin.com> Cc: Pedro Falcato <pfalcato@suse.de> Cc: Richard Weinberger <richard@nod.at> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Suren Baghdasaryan <surenb@google.com> Cc: Vignesh Raghavendra <vigneshr@ti.com> Cc: Wei Liu <wei.liu@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-04Drivers: hv: Move add_interrupt_randomness() to hypervisor callback sysvecMichael Kelley1-3/+0
The Hyper-V ISRs, for normal guests and when running in the hypervisor root patition, are calling add_interrupt_randomness() as a primary source of entropy. The call is currently in the ISRs as a common place to handle both x86/x64 and arm64. On x86/x64, hypervisor interrupts come through a custom sysvec entry, and do not go through a generic interrupt handler. On arm64, hypervisor interrupts come through an emulated GICv3. GICv3 uses the generic handler handle_percpu_devid_irq(), which does not do add_interrupt_randomness() -- unlike its counterpart handle_percpu_irq(). But handle_percpu_devid_irq() is now updated to do the add_interrupt_randomness(). So add_interrupt_randomness() is now needed only in Hyper-V's x86/x64 custom sysvec path. Move add_interrupt_randomness() from the Hyper-V ISRs into the Hyper-V x86/x64 custom sysvec path, matching the existing STIMER0 sysvec path. With this change, add_interrupt_randomness() is no longer called from any device drivers, which is appropriate. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Acked-by: Wei Liu <wei.liu@kernel.org> Link: https://patch.msgid.link/20260402202400.1707-3-mhklkml@zohomail.com
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds1-3/+3
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-5/+5
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-20Merge tag 'hyperv-next-signed-20260218' of ↵Linus Torvalds1-14/+72
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull Hyper-V updates from Wei Liu: - Debugfs support for MSHV statistics (Nuno Das Neves) - Support for the integrated scheduler (Stanislav Kinsburskii) - Various fixes for MSHV memory management and hypervisor status handling (Stanislav Kinsburskii) - Expose more capabilities and flags for MSHV partition management (Anatol Belski, Muminul Islam, Magnus Kulke) - Miscellaneous fixes to improve code quality and stability (Carlos López, Ethan Nelson-Moore, Li RongQing, Michael Kelley, Mukesh Rathor, Purna Pavan Chandra Aekkaladevi, Stanislav Kinsburskii, Uros Bizjak) - PREEMPT_RT fixes for vmbus interrupts (Jan Kiszka) * tag 'hyperv-next-signed-20260218' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (34 commits) mshv: Handle insufficient root memory hypervisor statuses mshv: Handle insufficient contiguous memory hypervisor status mshv: Introduce hv_deposit_memory helper functions mshv: Introduce hv_result_needs_memory() helper function mshv: Add SMT_ENABLED_GUEST partition creation flag mshv: Add nested virtualization creation flag Drivers: hv: vmbus: Simplify allocation of vmbus_evt mshv: expose the scrub partition hypercall mshv: Add support for integrated scheduler mshv: Use try_cmpxchg() instead of cmpxchg() x86/hyperv: Fix error pointer dereference x86/hyperv: Reserve 3 interrupt vectors used exclusively by MSHV Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RT x86/hyperv: Remove ASM_CALL_CONSTRAINT with VMMCALL insn x86/hyperv: Use savesegment() instead of inline asm() to save segment registers mshv: fix SRCU protection in irqfd resampler ack handler mshv: make field names descriptive in a header struct x86/hyperv: Update comment in hyperv_cleanup() mshv: clear eventfd counter on irqfd shutdown x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap() ...
2026-02-18Drivers: hv: vmbus: Simplify allocation of vmbus_evtMichael Kelley1-14/+8
The per-cpu variable vmbus_evt is currently dynamically allocated. It's only 8 bytes, so just allocate it statically to simplify and save a few lines of code. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Long Li <longli@microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-18Drivers: hv: vmbus: Use kthread for vmbus interrupts on PREEMPT_RTJan Kiszka1-1/+65
Resolves the following lockdep report when booting PREEMPT_RT on Hyper-V with related guest support enabled: [ 1.127941] hv_vmbus: registering driver hyperv_drm [ 1.132518] ============================= [ 1.132519] [ BUG: Invalid wait context ] [ 1.132521] 6.19.0-rc8+ #9 Not tainted [ 1.132524] ----------------------------- [ 1.132525] swapper/0/0 is trying to lock: [ 1.132526] ffff8b9381bb3c90 (&channel->sched_lock){....}-{3:3}, at: vmbus_chan_sched+0xc4/0x2b0 [ 1.132543] other info that might help us debug this: [ 1.132544] context-{2:2} [ 1.132545] 1 lock held by swapper/0/0: [ 1.132547] #0: ffffffffa010c4c0 (rcu_read_lock){....}-{1:3}, at: vmbus_chan_sched+0x31/0x2b0 [ 1.132557] stack backtrace: [ 1.132560] CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.19.0-rc8+ #9 PREEMPT_{RT,(lazy)} [ 1.132565] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS Hyper-V UEFI Release v4.1 09/25/2025 [ 1.132567] Call Trace: [ 1.132570] <IRQ> [ 1.132573] dump_stack_lvl+0x6e/0xa0 [ 1.132581] __lock_acquire+0xee0/0x21b0 [ 1.132592] lock_acquire+0xd5/0x2d0 [ 1.132598] ? vmbus_chan_sched+0xc4/0x2b0 [ 1.132606] ? lock_acquire+0xd5/0x2d0 [ 1.132613] ? vmbus_chan_sched+0x31/0x2b0 [ 1.132619] rt_spin_lock+0x3f/0x1f0 [ 1.132623] ? vmbus_chan_sched+0xc4/0x2b0 [ 1.132629] ? vmbus_chan_sched+0x31/0x2b0 [ 1.132634] vmbus_chan_sched+0xc4/0x2b0 [ 1.132641] vmbus_isr+0x2c/0x150 [ 1.132648] __sysvec_hyperv_callback+0x5f/0xa0 [ 1.132654] sysvec_hyperv_callback+0x88/0xb0 [ 1.132658] </IRQ> [ 1.132659] <TASK> [ 1.132660] asm_sysvec_hyperv_callback+0x1a/0x20 As code paths that handle vmbus IRQs use sleepy locks under PREEMPT_RT, the vmbus_isr execution needs to be moved into thread context. Open- coding this allows to skip the IPI that irq_work would additionally bring and which we do not need, being an IRQ, never an NMI. This affects both x86 and arm64, therefore hook into the common driver logic. Signed-off-by: Jan Kiszka <jan.kiszka@siemens.com> Reviewed-by: Florian Bezdeka <florian.bezdeka@siemens.com> Tested-by: Florian Bezdeka <florian.bezdeka@siemens.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-14drivers: hv: vmbus_drv: Remove reference to hpyerv_fbPrasanna Kumar T S M1-2/+2
Remove hyperv_fb reference as the driver is removed. Signed-off-by: Prasanna Kumar T S M <ptsm@linux.microsoft.com> Signed-off-by: Helge Deller <deller@gmx.de>
2025-12-16sysfb: Replace screen_info with sysfb_primary_displayThomas Zimmermann1-3/+3
Replace the global screen_info with sysfb_primary_display of type struct sysfb_display_info. Adapt all users of screen_info. Instances of screen_info are defined for x86, loongarch and EFI, with only one instance compiled into a specific build. Replace all of them with sysfb_primary_display. All existing users of screen_info are updated by pointing them to sysfb_primary_display.screen instead. This introduces some churn to the code, but has no impact on functionality. Boot parameters and EFI config tables are unchanged. They transfer screen_info as before. The logic in EFI's alloc_screen_info() changes slightly, as it now returns the screen field of sysfb_primary_display. Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Acked-by: Arnd Bergmann <arnd@arndb.de> Acked-by: Ard Biesheuvel <ardb@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> # drivers/pci/ Reviewed-by: Richard Lyu <richard.lyu@suse.com> Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2025-12-09Merge tag 'hyperv-next-signed-20251207' of ↵Linus Torvalds1-65/+123
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Enhancements to Linux as the root partition for Microsoft Hypervisor: - Support a new mode called L1VH, which allows Linux to drive the hypervisor running the Azure Host directly - Support for MSHV crash dump collection - Allow Linux's memory management subsystem to better manage guest memory regions - Fix issues that prevented a clean shutdown of the whole system on bare metal and nested configurations - ARM64 support for the MSHV driver - Various other bug fixes and cleanups - Add support for Confidential VMBus for Linux guest on Hyper-V - Secure AVIC support for Linux guests on Hyper-V - Add the mshv_vtl driver to allow Linux to run as the secure kernel in a higher virtual trust level for Hyper-V * tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits) mshv: Cleanly shutdown root partition with MSHV mshv: Use reboot notifier to configure sleep state mshv: Add definitions for MSHV sleep state configuration mshv: Add support for movable memory regions mshv: Add refcount and locking to mem regions mshv: Fix huge page handling in memory region traversal mshv: Move region management to mshv_regions.c mshv: Centralize guest memory region destruction mshv: Refactor and rename memory region handling functions mshv: adjust interrupt control structure for ARM64 Drivers: hv: use kmalloc_array() instead of kmalloc() mshv: Add ioctl for self targeted passthrough hvcalls Drivers: hv: Introduce mshv_vtl driver Drivers: hv: Export some symbols for mshv_vtl static_call: allow using STATIC_CALL_TRAMP_STR() from assembly mshv: Extend create partition ioctl to support cpu features mshv: Allow mappings that overlap in uaddr mshv: Fix create memory region overlap check mshv: add WQ_PERCPU to alloc_workqueue users Drivers: hv: Use kmalloc_array() instead of kmalloc() ...
2025-11-15Drivers: hv: Export some symbols for mshv_vtlNaman Jain1-1/+3
MSHV_VTL driver is going to be introduced, which is supposed to provide interface for Virtual Machine Monitors (VMMs) to control Virtual Trust Level (VTL). Export the symbols needed to make it work (vmbus_isr, hv_context and hv_post_message). Co-developed-by: Roman Kisel <romank@linux.microsoft.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Co-developed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Signed-off-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202506110544.q0NDMQVc-lkp@intel.com/ Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Support establishing the confidential VMBus connectionRoman Kisel1-62/+106
To establish the confidential VMBus connection the CoCo VM, the guest first checks on the confidential VMBus availability, and then proceeds to initializing the communication stack. Implement that in the VMBus driver initialization. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Rename the SynIC enable and disable routinesRoman Kisel1-3/+3
The confidential VMBus requires support for the both hypervisor facing SynIC and the paravisor one. Rename the functions that enable and disable SynIC with the hypervisor. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: Rename fields for SynIC message and event pagesRoman Kisel1-3/+3
Confidential VMBus requires interacting with two SynICs -- one provided by the host hypervisor, and one provided by the paravisor. Each SynIC requires its own message and event pages. Rename the existing host-accessible SynIC message and event pages with the "hyp_" prefix to clearly distinguish them from the paravisor ones. The field name is also changed in mshv_root.* for consistency. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-15Drivers: hv: VMBus protocol version 6.0Roman Kisel1-0/+12
The confidential VMBus is supported starting from the protocol version 6.0 onwards. Provide the required definitions. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-11-14syscore: Pass context data to callbacksThierry Reding1-5/+9
Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-09-30Drivers: hv: vmbus: Fix typos in vmbus_drv.cAlok Tiwari1-2/+2
Fix two minor typos in vmbus_drv.c: - Correct "reponsible" -> "responsible" in a comment. - Add missing newline in pr_err() message ("channeln" -> "channel\n"). These are cosmetic changes only and do not affect functionality. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-30Drivers: hv: vmbus: Fix sysfs output format for ring buffer indexAlok Tiwari1-2/+2
The sysfs attributes out_read_index and out_write_index in vmbus_drv.c currently use %d to print outbound.current_read_index and outbound.current_write_index. These fields are u32 values, so printing them with %d (signed) is not logically correct. Update the format specifier to %u to correctly match their type. No functional change, only fixes the sysfs output format. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-09-30Drivers: hv: vmbus: Clean up sscanf format specifier in target_cpu_store()Alok Tiwari1-1/+1
The target_cpu_store() function parses the target CPU from the sysfs buffer using sscanf(). The format string currently uses "%uu", which is redundant. The compiler ignores the extra "u", so there is no incorrect parsing at runtime. Update the format string to use "%u" for clarity and consistency. Signed-off-by: Alok Tiwari <alok.a.tiwari@oracle.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2025-07-09Drivers: hv: Fix the check for HYPERVISOR_CALLBACK_VECTORNaman Jain1-4/+5
__is_defined(HYPERVISOR_CALLBACK_VECTOR) would return 1, only if HYPERVISOR_CALLBACK_VECTOR macro is defined as 1. However its value is 0xf3 and this leads to __is_defined() returning 0. The expectation was to just check whether this MACRO is defined or not and get 1 if it's defined. Replace __is_defined with #ifdef blocks instead to fix it. Fixes: 1dc5df133b98 ("Drivers: hv: vmbus: Get the IRQ number from DeviceTree") Cc: stable@kernel.org Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Reviewed-by: Roman Kisel <romank@linux.microsoft.com> Link: https://lore.kernel.org/r/20250707084322.1763-1-namjain@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250707084322.1763-1-namjain@linux.microsoft.com>
2025-06-03Merge tag 'hyperv-next-signed-20250602' of ↵Linus Torvalds1-10/+85
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel) - Fixes for Hyper-V UIO driver (Long Li) - Fixes for Hyper-V PCI driver (Michael Kelley) - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley) - Documentation updates for Hyper-V VMBus (Michael Kelley) - Enhance logging for hv_kvp_daemon (Shradha Gupta) * tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits) Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir Documentation: hyperv: Update VMBus doc with new features and info PCI: hv: Remove unnecessary flex array in struct pci_packet Drivers: hv: Remove hv_alloc/free_* helpers Drivers: hv: Use kzalloc for panic page allocation uio_hv_generic: Align ring size to system page uio_hv_generic: Use correct size for interrupt and monitor pages Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary arch/x86: Provide the CPU number in the wakeup AP callback x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap() PCI: hv: Get vPCI MSI IRQ domain from DeviceTree ACPI: irq: Introduce acpi_get_gsi_dispatcher() Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device() Drivers: hv: vmbus: Get the IRQ number from DeviceTree dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties arm64, x86: hyperv: Report the VTL the system boots in arm64: hyperv: Initialize the Virtual Trust Level field Drivers: hv: Provide arch-neutral implementation of get_vtl() Drivers: hv: Enable VTL mode for arm64 ...
2025-05-23Drivers: hv: vmbus: Add comments about races with "channels" sysfs dirMichael Kelley1-2/+40
The VMBus driver code has some inherent races in the creation of the "channels" sysfs subdirectory and its per-channel numbered subdirectories. These races have not generally been recognized or understood. Add some comments to call them out. No code changes. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250514225508.52629-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250514225508.52629-1-mhklinux@outlook.com>
2025-05-23Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device()Roman Kisel1-8/+15
The ARM64 PCI code for hyperv needs to know the VMBus root device, and it is private. Provide a function that returns it. Rename it from "hv_dev" as "hv_dev" as a symbol is very overloaded. No functional changes. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250428210742.435282-10-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250428210742.435282-10-romank@linux.microsoft.com>
2025-05-23Drivers: hv: vmbus: Get the IRQ number from DeviceTreeRoman Kisel1-0/+30
The VMBus driver uses ACPI for interrupt assignment on arm64 hence it won't function in the VTL mode where only DeviceTree can be used. Update the VMBus driver to discover interrupt configuration from DT. Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/20250428210742.435282-9-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250428210742.435282-9-romank@linux.microsoft.com>
2025-05-21drivers: hv: fix up const issue with vmbus_chan_bin_attrsGreg Kroah-Hartman1-1/+1
In commit 9bec944506fa ("sysfs: constify attribute_group::bin_attrs"), the bin_attributes are now required to be const. Due to merge issues, the original commit could not modify this structure (it came in through a different branch.) Fix this up now by setting the variable properly. Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Naman Jain <namjain@linux.microsoft.com> Fixes: 9bec944506fa ("sysfs: constify attribute_group::bin_attrs") Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02Drivers: hv: Make the sysfs node size for the ring buffer dynamicNaman Jain1-1/+10
The ring buffer size varies across VMBus channels. The size of sysfs node for the ring buffer is currently hardcoded to 4 MB. Userspace clients either use fstat() or hardcode this size for doing mmap(). To address this, make the sysfs node size dynamic to reflect the actual ring buffer size for each channel. This will ensure that fstat() on ring sysfs node always returns the correct size of ring buffer. Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-3-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-05-02uio_hv_generic: Fix sysfs creation path for ring bufferNaman Jain1-1/+99
On regular bootup, devices get registered to VMBus first, so when uio_hv_generic driver for a particular device type is probed, the device is already initialized and added, so sysfs creation in hv_uio_probe() works fine. However, when the device is removed and brought back, the channel gets rescinded and the device again gets registered to VMBus. However this time, the uio_hv_generic driver is already registered to probe for that device and in this case sysfs creation is tried before the device's kobject gets initialized completely. Fix this by moving the core logic of sysfs creation of ring buffer, from uio_hv_generic to HyperV's VMBus driver, where the rest of the sysfs attributes for the channels are defined. While doing that, make use of attribute groups and macros, instead of creating sysfs directly, to ensure better error handling and code flow. Problematic path: vmbus_process_offer (A new offer comes for the VMBus device) vmbus_add_channel_work vmbus_device_register |-> device_register | |... | |-> hv_uio_probe | |... | |-> sysfs_create_bin_file (leads to a warning as | the primary channel's kobject, which is used to | create the sysfs file, is not yet initialized) |-> kset_create_and_add |-> vmbus_add_channel_kobj (initialization of the primary channel's kobject happens later) Above code flow is sequential and the warning is always reproducible in this path. Fixes: 9ab877a6ccf8 ("uio_hv_generic: make ring buffer attribute for primary channel") Cc: stable@kernel.org Suggested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Suggested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Naman Jain <namjain@linux.microsoft.com> Link: https://lore.kernel.org/r/20250502074811.2022-2-namjain@linux.microsoft.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-25Merge tag 'hyperv-next-signed-20250324' of ↵Linus Torvalds1-22/+32
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Add support for running as the root partition in Hyper-V (Microsoft Hypervisor) by exposing /dev/mshv (Nuno and various people) - Add support for CPU offlining in Hyper-V (Hamza Mahfooz) - Misc fixes and cleanups (Roman Kisel, Tianyu Lan, Wei Liu, Michael Kelley, Thorsten Blum) * tag 'hyperv-next-signed-20250324' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (24 commits) x86/hyperv: fix an indentation issue in mshyperv.h x86/hyperv: Add comments about hv_vpset and var size hypercall input args Drivers: hv: Introduce mshv_root module to expose /dev/mshv to VMMs hyperv: Add definitions for root partition driver to hv headers x86: hyperv: Add mshv_handler() irq handler and setup function Drivers: hv: Introduce per-cpu event ring tail Drivers: hv: Export some functions for use by root partition module acpi: numa: Export node_to_pxm() hyperv: Introduce hv_recommend_using_aeoi() arm64/hyperv: Add some missing functions to arm64 x86/mshyperv: Add support for extended Hyper-V features hyperv: Log hypercall status codes as strings x86/hyperv: Fix check of return value from snp_set_vmsa() x86/hyperv: Add VTL mode callback for restarting the system x86/hyperv: Add VTL mode emergency restart callback hyperv: Remove unused union and structs hyperv: Add CONFIG_MSHV_ROOT to gate root partition support hyperv: Change hv_root_partition into a function hyperv: Convert hypercall statuses to linux error codes drivers/hv: add CPU offlining support ...
2025-03-10Drivers: hv: vmbus: Don't release fb_mmio resource in vmbus_free_mmio()Michael Kelley1-0/+13
The VMBus driver manages the MMIO space it owns via the hyperv_mmio resource tree. Because the synthetic video framebuffer portion of the MMIO space is initially setup by the Hyper-V host for each guest, the VMBus driver does an early reserve of that portion of MMIO space in the hyperv_mmio resource tree. It saves a pointer to that resource in fb_mmio. When a VMBus driver requests MMIO space and passes "true" for the "fb_overlap_ok" argument, the reserved framebuffer space is used if possible. In that case it's not necessary to do another request against the "shadow" hyperv_mmio resource tree because that resource was already requested in the early reserve steps. However, the vmbus_free_mmio() function currently does no special handling for the fb_mmio resource. When a framebuffer device is removed, or the driver is unbound, the current code for vmbus_free_mmio() releases the reserved resource, leaving fb_mmio pointing to memory that has been freed. If the same or another driver is subsequently bound to the device, vmbus_allocate_mmio() checks against fb_mmio, and potentially gets garbage. Furthermore a second unbind operation produces this "nonexistent resource" error because of the unbalanced behavior between vmbus_allocate_mmio() and vmbus_free_mmio(): [ 55.499643] resource: Trying to free nonexistent resource <0x00000000f0000000-0x00000000f07fffff> Fix this by adding logic to vmbus_free_mmio() to recognize when MMIO space in the fb_mmio reserved area would be released, and don't release it. This filtering ensures the fb_mmio resource always exists, and makes vmbus_free_mmio() more parallel with vmbus_allocate_mmio(). Fixes: be000f93e5d7 ("drivers:hv: Track allocations of children of hv_vmbus in private resource tree") Signed-off-by: Michael Kelley <mhklinux@outlook.com> Tested-by: Saurabh Sengar <ssengar@linux.microsoft.com> Reviewed-by: Saurabh Sengar <ssengar@linux.microsoft.com> Link: https://lore.kernel.org/r/20250310035208.275764-1-mhklinux@outlook.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250310035208.275764-1-mhklinux@outlook.com>
2025-02-22hyperv: Change hv_root_partition into a functionNuno Das Neves1-1/+1
Introduce hv_curr_partition_type to store the partition type as an enum. Right now this is limited to guest or root partition, but there will be other kinds in future and the enum is easily extensible. Set up hv_curr_partition_type early in Hyper-V initialization with hv_identify_partition_type(). hv_root_partition() just queries this value, and shouldn't be called before that. Making this check into a function sets the stage for adding a config option to gate the compilation of root partition code. In particular, hv_root_partition() can be stubbed out always be false if root partition support isn't desired. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Easwar Hariharan <eahariha@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Link: https://lore.kernel.org/r/1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <1740167795-13296-3-git-send-email-nunodasneves@linux.microsoft.com>
2025-02-13drivers/hv: introduce vmbus_channel_set_cpu()Hamza Mahfooz1-21/