path: root/arch/x86
Age | Commit message | Author | Files | Lines
9 days | Merge tag 'perf-urgent-2026-01-24' of ↵ | Linus Torvalds | 1 | -2/+11
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf events fixes from Ingo Molnar: - Fix mmap_count warning & bug when creating a group member event with the PERF_FLAG_FD_OUTPUT flag - Disable the sample period == 1 branch events BTS optimization on guests, because BTS is not virtualized * tag 'perf-urgent-2026-01-24' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Do not enable BTS for guests perf: Fix refcount warning on event->mmap_count increment
11 days | x86: make page fault handling disable interrupts properly | Cedric Xing | 1 | -10/+5
There's a big comment in the x86 do_page_fault() about our interrupt disabling code: * User address page fault handling might have reenabled * interrupts. Fixing up all potential exit points of * do_user_addr_fault() and its leaf functions is just not * doable w/o creating an unholy mess or turning the code * upside down. but it turns out that comment is subtly wrong, and the code as a result is also wrong. Because it's certainly true that we may have re-enabled interrupts when handling user page faults. And it's most certainly true that we don't want to bother fixing up all the cases. But what isn't true is that it's limited to user address page faults. The confusion stems from the fact that we have logic here that depends on the address range of the access, but other code then depends on the _context_ the access was done in. The two are not related, even though both of them are about user-vs-kernel. In other words, both user and kernel addresses can cause interrupts to have been enabled (eg when __bad_area_nosemaphore() gets called for user accesses to kernel addresses). As a result we should make sure to disable interrupts again regardless of the address range before returning to the low-level fault handling code. The __bad_area_nosemaphore() code actually did disable interrupts again after enabling them, just not consistently. Ironically, as noted in the original comment, fixing up all the cases is just not worth it, when the simple solution is to just do it unconditionally in one single place. So remove the incomplete case that unsuccessfully tried to do what the comment said was "not doable" in commit ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code"), and just make it do the simple and straightforward thing. 
Signed-off-by: Cedric Xing <cedric.xing@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Fixes: ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code") Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
13 days | perf/x86/intel: Do not enable BTS for guests | Fernand Sieber | 1 | -2/+11
By default when users program perf to sample branch instructions (PERF_COUNT_HW_BRANCH_INSTRUCTIONS) with a sample period of 1, perf interprets this as a special case and enables BTS (Branch Trace Store) as an optimization to avoid taking an interrupt on every branch. Since BTS doesn't virtualize, this optimization doesn't make sense when the request originates from a guest. Add an additional check that prevents this optimization for virtualized events (exclude_host). Reported-by: Jan H. Schönherr <jschoenh@amazon.de> Suggested-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Fernand Sieber <sieberf@amazon.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20251211183604.868641-1-sieberf@amazon.com
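The decision described in this commit message can be illustrated with a minimal, self-contained sketch. The names `sketch_event` and `sketch_use_bts` are hypothetical stand-ins and not the kernel's actual structures or functions; only the three conditions (branch-instructions event, sample period of 1, exclude_host) are taken from the message above.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, simplified stand-in for the relevant event attributes. */
struct sketch_event {
    bool is_branch_instructions; /* event is PERF_COUNT_HW_BRANCH_INSTRUCTIONS */
    uint64_t sample_period;      /* requested sample period */
    bool exclude_host;           /* event is meant to count in the guest only */
};

/* Returns true when the BTS optimization would be chosen in this sketch. */
static bool sketch_use_bts(const struct sketch_event *ev)
{
    /* The special case only applies to branch sampling with period 1. */
    if (!ev->is_branch_instructions || ev->sample_period != 1)
        return false;

    /* BTS is not virtualized, so skip it for guest-originated events. */
    if (ev->exclude_host)
        return false;

    return true;
}
```

In this sketch a host event with a period of 1 still takes the BTS path, while the same request with exclude_host set falls back to regular sampling.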
2026-01-19 | x86/kfence: avoid writing L1TF-vulnerable PTEs | Andrew Cooper | 1 | -5/+24
For native, the choice of PTE is fine. There's real memory backing the non-present PTE. However, for XenPV, Xen complains:

    (XEN) d1 L1TF-vulnerable L1e 8010000018200066 - Shadowing

To explain, some background on XenPV pagetables:

Xen PV guests control their own pagetables; they choose the new PTE value, and use hypercalls to make changes so Xen can audit for safety.

In addition to a regular reference count, Xen also maintains a type reference count, e.g. SegDesc (referenced by vGDT/vLDT), Writable (referenced with _PAGE_RW) or L{1..4} (referenced by vCR3 or a lower pagetable level). This is in order to prevent e.g. a page being inserted into the pagetables for which the guest has a writable mapping.

For non-present mappings, all other bits become software accessible, and typically contain metadata rather than a real frame address. There is nothing that a reference count could sensibly be tied to. As such, even if Xen could recognise the address as currently safe, nothing would prevent that frame from changing owner to another VM in the future.

When Xen detects a PV guest writing a L1TF-PTE, it responds by activating shadow paging. This is normally only used for the live phase of migration, and comes with a reasonable overhead.

KFENCE only cares about getting #PF to catch wild accesses; it doesn't care about the value for non-present mappings. Use a fully inverted PTE, to avoid hitting the slow path when running under Xen.

While adjusting the logic, take the opportunity to skip all actions if the PTE is already in the right state, halve the number of PVOps callouts, and skip TLB maintenance on a !P -> P transition which benefits non-Xen cases too.
Link: https://lkml.kernel.org/r/20260106180426.710013-1-andrew.cooper3@citrix.com Fixes: 1dc0da6e9ec0 ("x86, kfence: enable KFENCE for x86") Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-01-18 | Merge tag 'x86-urgent-2026-01-18' of ↵ | Linus Torvalds | 2 | -4/+20
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 fixes from Ingo Molnar: - Fix resctrl initialization on Hygon CPUs - Fix resctrl memory bandwidth counters on Hygon CPUs - Fix x86 self-tests build bug * tag 'x86-urgent-2026-01-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86: Add selftests include path for kselftest.h after centralization x86/resctrl: Fix memory bandwidth counter width for Hygon x86/resctrl: Add missing resctrl initialization for Hygon
2026-01-16 | Merge tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl | Linus Torvalds | 1 | -5/+5
Pull Compute Express Link (CXL) fixes from Dave Jiang:
- Recognize all ZONE_DEVICE users as physaddr consumers
- Fix format string for extended_linear_cache_size_show()
- Fix target list setup for multiple decoders sharing the same downstream port
- Restore HBIW check before dereferencing platform data
- Fix potential infinite loop in __cxl_dpa_reserve()
- Check for invalid addresses returned from translation functions on error
* tag 'cxl-fixes-6.19-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl:
  cxl: Check for invalid addresses returned from translation functions on errors
  cxl/hdm: Fix potential infinite loop in __cxl_dpa_reserve()
  cxl/acpi: Restore HBIW check before dereferencing platform_data
  cxl/port: Fix target list setup for multiple decoders sharing the same dport
  cxl/region: fix format string for resource_size_t
  x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers
2026-01-13 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 3 | -6/+54
Pull x86 kvm fixes from Paolo Bonzini: - Avoid freeing stack-allocated node in kvm_async_pf_queue_task - Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1 * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: selftests: kvm: Verify TILELOADD actually #NM faults when XFD[18]=1 selftests: kvm: try getting XFD and XSAVE state out of sync selftests: kvm: replace numbered sync points with actions x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1 x86/kvm: Avoid freeing stack-allocated node in kvm_async_pf_queue_task
2026-01-13 | x86/resctrl: Fix memory bandwidth counter width for Hygon | Xiaochen Shen | 2 | -2/+16
The memory bandwidth calculation relies on reading the hardware counter and measuring the delta between samples. To ensure accurate measurement, the software reads the counter frequently enough to prevent it from rolling over twice between reads. The default Memory Bandwidth Monitoring (MBM) counter width is 24 bits. Hygon CPUs provide a 32-bit width counter, but they do not support the MBM capability CPUID leaf (0xF.[ECX=1]:EAX) to report the width offset (from 24 bits). Consequently, the kernel falls back to the 24-bit default counter width, which causes incorrect overflow handling on Hygon CPUs. Fix this by explicitly setting the counter width offset to 8 bits (resulting in a 32-bit total counter width) for Hygon CPUs. Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251209062650.1536952-3-shenxiaochen@open-hieco.net
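The effect of the counter width on the delta calculation can be shown with a small sketch. The helper names (`sketch_mbm_delta`, `sketch_mbm_width`) are hypothetical and this is not presented as the kernel's code; it only assumes what the message states: a 24-bit default width, plus a width offset of 8 bits on Hygon for a 32-bit counter.

```c
#include <stdint.h>

#define SKETCH_MBM_BASE_WIDTH 24u /* default MBM counter width in bits */

/* Total counter width for a given width offset (8 on Hygon per the message). */
static unsigned int sketch_mbm_width(unsigned int width_offset)
{
    return SKETCH_MBM_BASE_WIDTH + width_offset;
}

/*
 * Delta between two raw reads of a counter that is `width` bits wide
 * (1 <= width <= 64). Shifting both values to the top of a 64-bit word
 * makes the subtraction wrap at exactly `width` bits, so one rollover
 * between reads is handled correctly.
 */
static uint64_t sketch_mbm_delta(uint64_t prev, uint64_t cur, unsigned int width)
{
    unsigned int shift = 64u - width;

    return ((cur << shift) - (prev << shift)) >> shift;
}
```

With a real 32-bit counter that advanced from 0 to 0x02000000, assuming a 24-bit width yields a delta of 0, whereas the 32-bit width yields the true delta. That mismatch is the kind of incorrect overflow handling the fix addresses.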
2026-01-13 | x86/resctrl: Add missing resctrl initialization for Hygon | Xiaochen Shen | 1 | -2/+4
Hygon CPUs supporting Platform QoS features currently undergo partial resctrl initialization through resctrl_cpu_detect() in the Hygon BSP init helper and AMD/Hygon common initialization code. However, several critical data structures remain uninitialized for Hygon CPUs in the following paths: - get_mem_config()-> __rdt_get_mem_config_amd(): rdt_resource::membw,alloc_capable hw_res::num_closid - rdt_init_res_defs()->rdt_init_res_defs_amd(): rdt_resource::cache hw_res::msr_base,msr_update Add the missing AMD/Hygon common initialization to ensure proper Platform QoS functionality on Hygon CPUs. Fixes: d8df126349da ("x86/cpu/hygon: Add missing resctrl_cpu_detect() in bsp_init helper") Signed-off-by: Xiaochen Shen <shenxiaochen@open-hieco.net> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Reinette Chatre <reinette.chatre@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251209062650.1536952-2-shenxiaochen@open-hieco.net
2026-01-11 | Merge tag 'x86-urgent-2026-01-11' of ↵ | Linus Torvalds | 1 | -0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fix from Ingo Molnar: "Disable GCOV instrumentation in the SEV noinstr.c collection of SEV noinstr methods, to further robustify the code" * tag 'x86-urgent-2026-01-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Disable GCOV on noinstr object
2026-01-11 | treewide: Update email address | Thomas Gleixner | 4 | -4/+4
In a vain attempt to consolidate the email zoo switch everything to the kernel.org account. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-01-10 | x86/fpu: Clear XSTATE_BV[i] in guest XSAVE state whenever XFD[i]=1 | Sean Christopherson | 2 | -3/+38
When loading guest XSAVE state via KVM_SET_XSAVE, and when updating XFD in response to a guest WRMSR, clear XFD-disabled features in the saved (or to be restored) XSTATE_BV to ensure KVM doesn't attempt to load state for features that are disabled via the guest's XFD. Because the kernel executes XRSTOR with the guest's XFD, saving XSTATE_BV[i]=1 with XFD[i]=1 will cause XRSTOR to #NM and panic the kernel. E.g. if fpu_update_guest_xfd() sets XFD without clearing XSTATE_BV: ------------[ cut here ]------------ WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#29: amx_test/848 Modules linked in: kvm_intel kvm irqbypass CPU: 29 UID: 1000 PID: 848 Comm: amx_test Not tainted 6.19.0-rc2-ffa07f7fd437-x86_amx_nm_xfd_non_init-vm #171 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:exc_device_not_available+0x101/0x110 Call Trace: <TASK> asm_exc_device_not_available+0x1a/0x20 RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90 switch_fpu_return+0x4a/0xb0 kvm_arch_vcpu_ioctl_run+0x1245/0x1e40 [kvm] kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm] __x64_sys_ioctl+0x8f/0xd0 do_syscall_64+0x62/0x940 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> ---[ end trace 0000000000000000 ]--- This can happen if the guest executes WRMSR(MSR_IA32_XFD) to set XFD[18] = 1, and a host IRQ triggers kernel_fpu_begin() prior to the vmexit handler's call to fpu_update_guest_xfd(). 
and if userspace stuffs XSTATE_BV[i]=1 via KVM_SET_XSAVE: ------------[ cut here ]------------ WARNING: arch/x86/kernel/traps.c:1524 at exc_device_not_available+0x101/0x110, CPU#14: amx_test/867 Modules linked in: kvm_intel kvm irqbypass CPU: 14 UID: 1000 PID: 867 Comm: amx_test Not tainted 6.19.0-rc2-2dace9faccd6-x86_amx_nm_xfd_non_init-vm #168 NONE Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:exc_device_not_available+0x101/0x110 Call Trace: <TASK> asm_exc_device_not_available+0x1a/0x20 RIP: 0010:restore_fpregs_from_fpstate+0x36/0x90 fpu_swap_kvm_fpstate+0x6b/0x120 kvm_load_guest_fpu+0x30/0x80 [kvm] kvm_arch_vcpu_ioctl_run+0x85/0x1e40 [kvm] kvm_vcpu_ioctl+0x2c3/0x8f0 [kvm] __x64_sys_ioctl+0x8f/0xd0 do_syscall_64+0x62/0x940 entry_SYSCALL_64_after_hwframe+0x4b/0x53 </TASK> ---[ end trace 0000000000000000 ]--- The new behavior is consistent with the AMX architecture. Per Intel's SDM, XSAVE saves XSTATE_BV as '0' for components that are disabled via XFD (and non-compacted XSAVE saves the initial configuration of the state component): If XSAVE, XSAVEC, XSAVEOPT, or XSAVES is saving the state component i, the instruction does not generate #NM when XCR0[i] = IA32_XFD[i] = 1; instead, it operates as if XINUSE[i] = 0 (and the state component was in its initial state): it saves bit i of XSTATE_BV field of the XSAVE header as 0; in addition, XSAVE saves the initial configuration of the state component (the other instructions do not save state component i). Alternatively, KVM could always do XRSTOR with XFD=0, e.g. by using a constant XFD based on the set of enabled features when XSAVEing for a struct fpu_guest. 
However, having XSTATE_BV[i]=1 for XFD-disabled features can only happen in the above interrupt case, or in similar scenarios involving preemption on preemptible kernels, because fpu_swap_kvm_fpstate()'s call to save_fpregs_to_fpstate() saves the outgoing FPU state with the current XFD; and that is (on all but the first WRMSR to XFD) the guest XFD. Therefore, XFD can only go out of sync with XSTATE_BV in the above interrupt case, or in similar scenarios involving preemption on preemptible kernels, and we can consider it (de facto) part of KVM ABI that KVM_GET_XSAVE returns XSTATE_BV[i]=0 for XFD-disabled features.
Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Cc: stable@vger.kernel.org
Fixes: 820a6ee944e7 ("kvm: x86: Add emulation for IA32_XFD", 2022-01-14)
Signed-off-by: Sean Christopherson <seanjc@google.com>
[Move clearing of XSTATE_BV from fpu_copy_uabi_to_guest_fpstate to kvm_vcpu_ioctl_x86_set_xsave. - Paolo]
Reviewed-by: Binbin Wu <binbin.wu@linux.intel.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
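The invariant this commit enforces, that no bit is set in XSTATE_BV for a feature whose XFD bit is set, comes down to a single mask operation. The sketch below is illustrative only: `sketch_sanitize_xstate_bv` is a hypothetical name, and the real change operates on the guest's saved XSAVE header rather than on plain integers.

```c
#include <stdint.h>

/*
 * Clear every XSTATE_BV bit whose corresponding XFD bit is set, so that a
 * later XRSTOR executed with this XFD value is never asked to load state
 * for an XFD-disabled feature.
 */
static uint64_t sketch_sanitize_xstate_bv(uint64_t xstate_bv, uint64_t xfd)
{
    return xstate_bv & ~xfd;
}
```

For example, with XFD[18]=1 (the bit used in the traces above), a saved XSTATE_BV of `(1 << 18) | 0x7` becomes `0x7`.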
2026-01-05 | x86/kaslr: Recognize all ZONE_DEVICE users as physaddr consumers | Dan Williams | 1 | -5/+5
Commit 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems") is too narrow. The effect being mitigated in that commit is caused by ZONE_DEVICE, on which PCI_P2PDMA has a dependency. ZONE_DEVICE, in general, lets any physical address be added to the direct-map. I.e. not only ACPI hotplug ranges, CXL Memory Windows, or EFI Specific Purpose Memory, but also any PCI MMIO range for the DEVICE_PRIVATE and PCI_P2PDMA cases. Update the mitigation, limit KASLR entropy, to apply in all ZONE_DEVICE=y cases.

Distro kernels typically have PCI_P2PDMA=y, so the practical exposure of this problem is limited to the PCI_P2PDMA=n case.

A potential path to recover entropy would be to walk ACPI and determine the limits for hotplug and PCI MMIO before kernel_randomize_memory(). On smaller systems that could yield some KASLR address bits. This needs additional investigation to determine if some limited ACPI table scanning can happen this early without an open coded solution like arch/x86/boot/compressed/acpi.c needs to deploy.

Cc: Ingo Molnar <mingo@kernel.org>
Cc: Kees Cook <kees@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Andy Lutomirski <luto@kernel.org>
Cc: Logan Gunthorpe <logang@deltatee.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Hildenbrand <david@redhat.com>
Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com>
Cc: "Liam R. Howlett" <Liam.Howlett@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Mike Rapoport <rppt@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Michal Hocko <mhocko@suse.com>
Fixes: 7ffb791423c7 ("x86/kaslr: Reduce KASLR entropy on most x86 systems")
Cc: <stable@vger.kernel.org>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reviewed-by: Balbir Singh <balbirs@nvidia.com>
Tested-by: Yasunori Goto <y-goto@fujitsu.com>
Acked-by: Dave Hansen <dave.hansen@linux.intel.com>
Link: http://patch.msgid.link/692e08b2516d4_261c1100a3@dwillia2-mobl4.notmuch
Signed-off-by: Dave Jiang <dave.jiang@intel.com>
2026-01-05 | x86/sev: Disable GCOV on noinstr object | Brendan Jackman | 1 | -0/+2
With Debian clang version 19.1.7 (3+build5) there are calls to kasan_check_write() from __sev_es_nmi_complete(), which violates noinstr. Fix it by disabling GCOV for the noinstr object, as has been done for previous such instrumentation issues. Note that this file already disables __SANITIZE_ADDRESS__ and __SANITIZE_THREAD__, thus calls like kasan_check_write() ought to be nops regardless of GCOV. This has been fixed in other patches. However, to avoid any other accidental instrumentation showing up, (and since, in principle GCOV is instrumentation and hence should be disabled for noinstr code anyway), disable GCOV overall as well. Signed-off-by: Brendan Jackman <jackmanb@google.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Acked-by: Marco Elver <elver@google.com> Link: https://patch.msgid.link/20251216-gcov-inline-noinstr-v3-3-10244d154451@google.com
2026-01-01 | x86/kvm: Avoid freeing stack-allocated node in kvm_async_pf_queue_task | Ryosuke Yasuoka | 1 | -3/+16
kvm_async_pf_queue_task() can incorrectly try to kfree() a node allocated on the stack of kvm_async_pf_task_wait_schedule(). This occurs when a task requests a PF while another task's PF request with the same token is still pending. Since the token is derived from the (u32)address in exc_page_fault(), two different tasks can generate the same token. Currently, kvm_async_pf_queue_task() assumes that any entry found in the list is a dummy entry and tries to kfree() it. To fix this, add a flag to the node structure to distinguish stack-allocated nodes, and only kfree() the node if it is a dummy entry. Signed-off-by: Ryosuke Yasuoka <ryasuoka@redhat.com> Message-ID: <20251206140939.144038-1-ryasuoka@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
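The fix pattern described above, tagging each node so that only heap-allocated dummy entries are ever freed, can be sketched in userspace C as follows. The struct layout, the `dummy` field name, and `sketch_release_node` are assumptions made for illustration; the message does not spell out the actual names, and `free()` stands in for `kfree()`.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical, reduced version of a wait-queue node keyed by token. */
struct sketch_node {
    unsigned int token;
    bool dummy; /* true only for heap-allocated placeholder entries */
};

/*
 * Release a node found in the list. Only dummy entries were allocated on
 * the heap; a node living on another task's stack must never be freed.
 * Returns true if the node was freed.
 */
static bool sketch_release_node(struct sketch_node *n)
{
    if (!n->dummy)
        return false;

    free(n);
    return true;
}
```

Without the flag, two tasks whose faults produce the same token could lead the lookup to treat a stack-resident node as a dummy and free it.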
2025-12-29 | x86/microcode/AMD: Fix Entrysign revision check for Zen5/Strix Halo | Rong Zhang | 1 | -1/+1
Zen5 also contains family 1Ah, models 70h-7Fh, which are mistakenly missing from cpu_has_entrysign(). Add the missing range. Fixes: 8a9fb5129e8e ("x86/microcode/AMD: Limit Entrysign signature checking to known generations") Signed-off-by: Rong Zhang <i@rong.moe> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: stable@kernel.org Link: https://patch.msgid.link/20251229182245.152747-1-i@rong.moe
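As a rough illustration, the added range amounts to a family and model comparison like the one below. This is a sketch and not the body of cpu_has_entrysign(): the function name is hypothetical, and the other ranges that function checks are deliberately left out because the message does not list them.

```c
#include <stdbool.h>

/* True for family 1Ah, models 70h-7Fh, the range named in the commit. */
static bool sketch_is_fam1a_model_70_7f(unsigned int family, unsigned int model)
{
    return family == 0x1a && model >= 0x70 && model <= 0x7f;
}
```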
2025-12-21 | Merge tag 'x86-urgent-2025-12-21' of ↵ | Linus Torvalds | 6 | -5/+35
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: - Fix FPU core dumps on certain CPU models - Fix htmldocs build warning - Export TLB tracing event name via header - Remove unused constant from <linux/mm_types.h> - Fix comments - Fix whitespace noise in documentation - Fix variadic structure's definition to un-confuse UBSAN - Fix posted MSI interrupts irq_retrigger() bug - Fix asm build failure with older GCC builds * tag 'x86-urgent-2025-12-21' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bug: Fix old GCC compile fails x86/msi: Make irq_retrigger() functional for posted MSI x86/platform/uv: Fix UBSAN array-index-out-of-bounds mm: Remove tlb_flush_reason::NR_TLB_FLUSH_REASONS from <linux/mm_types.h> x86/mm/tlb/trace: Export the TLB_REMOTE_WRONG_CPU enum in <trace/events/tlb.h> x86/sgx: Remove unmatched quote in __sgx_encl_extend function comment x86/boot/Documentation: Fix whitespace noise in boot.rst x86/fpu: Fix FPU state core dump truncation on CPUs with no extended xfeatures x86/boot/Documentation: Fix htmldocs build warning due to malformed table in boot.rst
2025-12-20 | x86/irqflags: Use ASM_OUTPUT_RM in native_save_fl() | Eric Dumazet | 1 | -1/+1
clang is generating very inefficient code for native_save_fl() which is used for local_irq_save() in critical spots.

Allowing the "pop %0" to use memory:

1) forces the compiler to add annoying stack canaries when CONFIG_STACKPROTECTOR_STRONG=y in many places.
2) Almost always is followed by an immediate "move memory,register"

One good example is _raw_spin_lock_irqsave, with 8 extra instructions

    ffffffff82067a30 <_raw_spin_lock_irqsave>:
    ffffffff82067a30: ...
    ffffffff82067a39: 53 push %rbx
    // Three instructions to adjust the stack, read the per-cpu canary
    // and copy it to 8(%rsp)
    ffffffff82067a3a: 48 83 ec 10 sub $0x10,%rsp
    ffffffff82067a3e: 65 48 8b 05 da 15 45 02 mov %gs:0x24515da(%rip),%rax # <__stack_chk_guard>
    ffffffff82067a46: 48 89 44 24 08 mov %rax,0x8(%rsp)
    ffffffff82067a4b: 9c pushf
    // instead of pop %rbx, compiler uses 2 instructions.
    ffffffff82067a4c: 8f 04 24 pop (%rsp)
    ffffffff82067a4f: 48 8b 1c 24 mov (%rsp),%rbx
    ffffffff82067a53: fa cli
    ffffffff82067a54: b9 01 00 00 00 mov $0x1,%ecx
    ffffffff82067a59: 31 c0 xor %eax,%eax
    ffffffff82067a5b: f0 0f b1 0f lock cmpxchg %ecx,(%rdi)
    ffffffff82067a5f: 75 1d jne ffffffff82067a7e <_raw_spin_lock_irqsave+0x4e>
    // three instructions to check the stack canary
    ffffffff82067a61: 65 48 8b 05 b7 15 45 02 mov %gs:0x24515b7(%rip),%rax # <__stack_chk_guard>
    ffffffff82067a69: 48 3b 44 24 08 cmp 0x8(%rsp),%rax
    ffffffff82067a6e: 75 17 jne ffffffff82067a87
    ...
    // One extra instruction to adjust the stack.
    ffffffff82067a73: 48 83 c4 10 add $0x10,%rsp
    ...
    // One more instruction in case the stack was mangled.
    ffffffff82067a87: e8 a4 35 ff ff call ffffffff8205b030 <__stack_chk_fail>

This patch changes nothing for gcc, but for clang saves ~20000 bytes of text even though more functions are inlined.

    $ size vmlinux.gcc.before vmlinux.gcc.after vmlinux.clang.before vmlinux.clang.after
    text data bss dec hex filename
    45565821 25005462 4704800 75276083 47c9f33 vmlinux.gcc.before
    45565821 25005462 4704800 75276083 47c9f33 vmlinux.gcc.after
    45121072 24638617 5533040 75292729 47ce039 vmlinux.clang.before
    45093887 24638633 5536808 75269328 47c84d0 vmlinux.clang.after

    $ scripts/bloat-o-meter -t vmlinux.clang.before vmlinux.clang.after
    add/remove: 1/2 grow/shrink: 21/533 up/down: 2250/-22112 (-19862)

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Uros Bizjak <ubizjak@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-20 | Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm | Linus Torvalds | 7 | -17/+26
Pull x86 kvm fixes from Paolo Bonzini:
"x86 fixes. Everyone else is already in holiday mood apparently.
- Add a missing 'break' to fix param parsing in the rseq selftest
- Apply runtime updates to the _current_ CPUID when userspace is setting CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid dropping the pending update
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not supported by KVM and leads to a use-after-free due to KVM failing to unbind the memslot from the previously-associated guest_memfd instance
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting flags-only changes on KVM_MEM_GUEST_MEMFD memslots, e.g. for dirty logging
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined as -1ull (a 64-bit value)
- Update SVI when activating APICv to fix a bug where a post-activation EOI for an in-service IRQ would effectively be lost due to SVI being stale
- Immediately refresh APICv controls (if necessary) on a nested VM-Exit instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is effectively ignored because KVM thinks the vCPU already has the correct APICv settings"
* tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm:
  KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit
  KVM: VMX: Update SVI during runtime APICv activation
  KVM: nSVM: Set exit_code_hi to -1 when synthesizing SVM_EXIT_ERR (failed VMRUN)
  KVM: nSVM: Clear exit_code_hi in VMCB when synthesizing nested VM-Exits
  KVM: Harden and prepare for modifying existing guest_memfd memslots
  KVM: Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot
  KVM: selftests: Add a CPUID testcase for KVM_SET_CPUID2 with runtime updates
  KVM: x86: Apply runtime updates to current CPUID during KVM_SET_CPUID{,2}
  KVM: selftests: Add missing "break" in rseq_test's param parsing
2025-12-20 | Merge tag 'for-linus-6.19-rc2-tag' of ↵ | Linus Torvalds | 1 | -1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull xen fix from Juergen Gross: "Just a single patch fixing a sparse warning" * tag 'for-linus-6.19-rc2-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: Fix sparse warning in enlighten_pv.c
2025-12-18 | Merge tag 'kvm-x86-fixes-6.19-rc1' of https://github.com/kvm-x86/linux into HEAD | Paolo Bonzini | 7 | -17/+26
KVM fixes for 6.19-rc1
- Add a missing "break" to fix param parsing in the rseq selftest.
- Apply runtime updates to the _current_ CPUID when userspace is setting CPUID, e.g. as part of vCPU hotplug, to fix a false positive and to avoid dropping the pending update.
- Disallow toggling KVM_MEM_GUEST_MEMFD on an existing memslot, as it's not supported by KVM and leads to a use-after-free due to KVM failing to unbind the memslot from the previously-associated guest_memfd instance.
- Harden against similar KVM_MEM_GUEST_MEMFD goofs, and prepare for supporting flags-only changes on KVM_MEM_GUEST_MEMFD memslots, e.g. for dirty logging.
- Set exit_code[63:32] to -1 (all 0xffs) when synthesizing a nested SVM_EXIT_ERR (a.k.a. VMEXIT_INVALID) #VMEXIT, as VMEXIT_INVALID is defined as -1ull (a 64-bit value).
- Update SVI when activating APICv to fix a bug where a post-activation EOI for an in-service IRQ would effectively be lost due to SVI being stale.
- Immediately refresh APICv controls (if necessary) on a nested VM-Exit instead of deferring the update via KVM_REQ_APICV_UPDATE, as the request is effectively ignored because KVM thinks the vCPU already has the correct APICv settings.
2025-12-18 | x86/bug: Fix old GCC compile fails | Peter Zijlstra | 1 | -1/+1
For some mysterious reasons the GCC 8 and 9 preprocessor manages to sporadically fumble _ASM_BYTES(0x0f, 0x0b): $ grep ".byte[ ]*0x0f" defconfig-build/drivers/net/wireless/realtek/rtlwifi/base.s 1: .byte0x0f,0x0b ; 1: .byte 0x0f,0x0b ; which makes the assembler upset and all that. While there are more _ASM_BYTES() users (notably the NOP instructions), those don't seem affected. Therefore replace the offending ASM_UD2 with one using the ud2 mnemonic. Reported-by: Jean Delvare <jdelvare@suse.de> Suggested-by: Uros Bizjak <ubizjak@gmail.com> Fixes: 85a2d4a890dc ("x86,ibt: Use UDB instead of 0xEA") Cc: stable@kernel.org Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251218104659.GT3911114@noisy.programming.kicks-ass.net
2025-12-17 | x86/msi: Make irq_retrigger() functional for posted MSI | Thomas Gleixner | 2 | -0/+30
Luigi reported that retriggering a posted MSI interrupt does not work correctly. The reason is that the retrigger happens at the vector domain by sending an IPI to the actual vector on the target CPU. That works correctly exactly once because the posted MSI interrupt chip does not issue an EOI as that's only required for the posted MSI notification vector itself. As a consequence the vector becomes stale in the ISR, which not only affects this vector but also any lower priority vector in the affected APIC because the ISR bit is not cleared. Luigi proposed to set the vector in the remap PIR bitmap and raise the posted MSI notification vector. That works, but that still does not cure a related problem: If there is ever a stray interrupt on such a vector, then the related APIC ISR bit becomes stale due to the lack of EOI as described above. Unlikely to happen, but if it happens it's not debuggable at all. So instead of playing games with the PIR, this can be actually solved for both cases by: 1) Keeping track of the posted interrupt vector handler state 2) Implementing a posted MSI specific irq_ack() callback which checks that state. If the posted vector handler is inactive it issues an EOI, otherwise it delegates that to the posted handler. This is correct versus affinity changes and concurrent events on the posted vector as the actual handler invocation is serialized through the interrupt descriptor lock. Fixes: ed1e48ea4370 ("iommu/vt-d: Enable posted mode for device MSIs") Reported-by: Luigi Rizzo <lrizzo@google.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Luigi Rizzo <lrizzo@google.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20251125214631.044440658@linutronix.de Closes: https://lore.kernel.org/lkml/20251124104836.3685533-1-lrizzo@google.com
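The two-step approach in this message (track whether the posted vector handler is running, and let the ack callback decide who issues the EOI) can be modeled with a small sketch. Everything here is hypothetical: the struct, the field names, and the counter that stands in for an actual APIC EOI write. It models the decision only and makes no claim about the kernel implementation.

```c
#include <stdbool.h>

/* Hypothetical model of the per-CPU state the message describes. */
struct sketch_posted_state {
    bool handler_active;    /* true while the posted MSI notification handler runs */
    unsigned int eoi_count; /* number of EOIs this sketch has "issued" */
};

/*
 * Model of a posted-MSI-specific ack callback: issue an EOI only when the
 * interrupt arrived directly on the vector (retrigger IPI or stray
 * interrupt); otherwise leave the EOI to the posted notification handler.
 */
static void sketch_posted_msi_ack(struct sketch_posted_state *s)
{
    if (!s->handler_active)
        s->eoi_count++; /* stand-in for writing the APIC EOI register */
}
```

In this model, an ack outside the posted handler increments the EOI count once, so the ISR bit would not stay stale, while an ack during the posted handler leaves the count unchanged.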
2025-12-17 | Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf | Linus Torvalds | 2 | -12/+39
Pull bpf fixes from Alexei Starovoitov: - Fix BPF builds due to -fms-extensions. selftests (Alexei Starovoitov), bpftool (Quentin Monnet). - Fix build of net/smc when CONFIG_BPF_SYSCALL=y, but CONFIG_BPF_JIT=n (Geert Uytterhoeven) - Fix livepatch/BPF interaction and support reliable unwinding through BPF stack frames (Josh Poimboeuf) - Do not audit capability check in arm64 JIT (Ondrej Mosnacek) - Fix truncated dmabuf BPF iterator reads (T.J. Mercier) - Fix verifier assumptions of bpf_d_path's output buffer (Shuran Liu) - Fix warnings in libbpf when built with -Wdiscarded-qualifiers under C23 (Mikhail Gavrilov) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: selftests/bpf: add regression test for bpf_d_path() bpf: Fix verifier assumptions of bpf_d_path's output buffer selftests/bpf: Add test for truncated dmabuf_iter reads bpf: Fix truncated dmabuf iterator reads x86/unwind/orc: Support reliable unwinding through BPF stack frames bpf: Add bpf_has_frame_pointer() bpf, arm64: Do not audit capability check in do_jit() libbpf: Fix -Wdiscarded-qualifiers under C23 bpftool: Fix build warnings due to MS extensions net: smc: SMC_HS_CTRL_BPF should depend on BPF_JIT selftests/bpf: Add -fms-extensions to bpf build flags
2025-12-16 | x86/xen: Fix sparse warning in enlighten_pv.c | Juergen Gross | 1 | -1/+1
The sparse tool issues a warning for arch/x86/xen/enlighten_pv.c:

    arch/x86/xen/enlighten_pv.c:120:9: sparse: sparse: incorrect type in initializer (different address spaces)
      expected void const [noderef] __percpu *__vpp_verify
      got bool *

This is due to the percpu variable xen_in_preemptible_hcall being exported via EXPORT_SYMBOL_GPL() instead of EXPORT_PER_CPU_SYMBOL_GPL().

Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202512140856.Ic6FetG6-lkp@intel.com/
Fixes: fdfd811ddde3 ("x86/xen: allow privcmd hypercalls to be preempted")
Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Signed-off-by: Juergen Gross <jgross@suse.com>
Message-ID: <20251215115112.15072-1-jgross@suse.com>
2025-12-14 x86/platform/uv: Fix UBSAN array-index-out-of-bounds Kyle Meyer 1 -1/+1
When UBSAN is enabled, multiple array-index-out-of-bounds messages are printed: [ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:276:23 [ 0.000000] [ T0] index 1 is out of range for type '<unknown> [1]' ... [ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:277:32 [ 0.000000] [ T0] index 1 is out of range for type '<unknown> [1]' ... [ 0.000000] [ T0] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:282:16 [ 0.000000] [ T0] index 1 is out of range for type '<unknown> [1]' ... [ 0.515850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1344:23 [ 0.519851] [ T1] index 1 is out of range for type '<unknown> [1]' ... [ 0.603850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1345:32 [ 0.607850] [ T1] index 1 is out of range for type '<unknown> [1]' ... [ 0.691850] [ T1] UBSAN: array-index-out-of-bounds in arch/x86/kernel/apic/x2apic_uv_x.c:1353:20 [ 0.695850] [ T1] index 1 is out of range for type '<unknown> [1]' One-element arrays have been deprecated: https://docs.kernel.org/process/deprecated.html#zero-length-and-one-element-arrays Switch entry in struct uv_systab to a flexible array member to fix UBSAN array-index-out-of-bounds messages. sizeof(struct uv_systab) is passed to early_memremap() and ioremap(). The flexible array member is not accessed until the UV system table size is used to remap the entire UV system table, so changes to sizeof(struct uv_systab) have no impact. Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/aTxksN-3otY41WvQ@hpe.com
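The difference between a one-element array and a flexible array member can be sketched in plain userspace C. The struct and function names below (demo_entry, demo_table_old, demo_table, demo_table_alloc) are hypothetical and do not mirror the real struct uv_systab layout; the sketch only illustrates why indexing past entry[0] trips UBSAN in the old style and how sizeof() shrinks by one entry in the new one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical table entry; not the real UV system table entry. */
struct demo_entry {
	uint64_t type;
	uint64_t offset;
};

/* Deprecated style: a one-element array used as a variable-length tail.
 * The declared bound is 1, so entry[1] is formally out of bounds. */
struct demo_table_old {
	uint32_t revision;
	uint32_t nentries;
	struct demo_entry entry[1];
};

/* Preferred style: a C99 flexible array member with no declared bound. */
struct demo_table {
	uint32_t revision;
	uint32_t nentries;
	struct demo_entry entry[];
};

/* Allocate a zeroed table with room for n entries; release with free(). */
static struct demo_table *demo_table_alloc(uint32_t n)
{
	struct demo_table *t;

	t = calloc(1, sizeof(*t) + (size_t)n * sizeof(t->entry[0]));
	if (!t)
		return NULL;
	t->nentries = n;
	return t;
}
```

With the flexible array member, sizeof(struct demo_table) covers only the fixed header, so any caller that sizes a mapping from sizeof() has to be checked for whether it relied on the extra element, which is the point the commit message makes about early_memremap() and ioremap().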
2025-12-14 Merge tag 'perf-urgent-2025-12-12' of ↵ Linus Torvalds 2 -4/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull perf event fixes from Ingo Molnar: - Fix NULL pointer dereference crash in the Intel PMU driver - Fix missing read event generation on task exit - Fix AMD uncore driver init error handling - Fix whitespace noise * tag 'perf-urgent-2025-12-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common() perf/core: Fix missing read event generation on task exit perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error perf/uprobes: Remove <space><Tab> whitespace noise
2025-12-13 x86/sgx: Remove unmatched quote in __sgx_encl_extend function comment Thorsten Blum 1 -1/+1
There is no opening quote. Remove the unmatched closing quote. Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Signed-off-by: Ingo Molnar <mingo@kernel.org> Reviewed-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251210125628.544916-1-thorsten.blum@linux.dev
2025-12-13 x86/hv: Add gitignore entry for generated header file Linus Torvalds 1 -0/+1
Commit 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") added a new generated header file for the offsets into the mshv_vtl_cpu_context structure to be used by the low-level assembly code. But it didn't add the .gitignore file to go with it, so 'git status' and friends will mention it. Let's add the gitignore file before somebody thinks that generated header should be committed. Fixes: 7bfe3b8ea6e3 ("Drivers: hv: Introduce mshv_vtl driver") Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2025-12-12 perf/x86/intel: Fix NULL event dereference crash in handle_pmi_common() Evan Li 1 -0/+3
handle_pmi_common() may observe an active bit set in cpuc->active_mask while the corresponding cpuc->events[] entry has already been cleared, which leads to a NULL pointer dereference. This can happen when interrupt throttling stops all events in a group while PEBS processing is still in progress. perf_event_overflow() can trigger perf_event_throttle_group(), which stops the group and clears the cpuc->events[] entry, but the active bit may still be set when handle_pmi_common() iterates over the events. The following recent fix: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss") moved the cpuc->events[] clearing from x86_pmu_stop() to x86_pmu_del() and relied on cpuc->active_mask/pebs_enabled checks. However, handle_pmi_common() can still encounter a NULL cpuc->events[] entry despite the active bit being set. Add an explicit NULL check on the event pointer before using it, to cover this legitimate scenario and avoid the NULL dereference crash. Fixes: 7e772a93eb61 ("perf/x86: Fix NULL event access and potential PEBS record loss") Reported-by: kitta <kitta@linux.alibaba.com> Co-developed-by: kitta <kitta@linux.alibaba.com> Signed-off-by: Evan Li <evan.li@linux.alibaba.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251212084943.2124787-1-evan.li@linux.alibaba.com Closes: https://bugzilla.kernel.org/show_bug.cgi?id=220855
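The shape of the fix (skip a counter whose active bit is set but whose event slot is already NULL) can be sketched in userspace C. This is an illustration under stated assumptions, not the kernel code: demo_event, demo_cpu_state and demo_handle_overflows are hypothetical names, and the real handler does considerably more work per event.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAX_COUNTERS 8

/* Hypothetical event object; only counts handled overflows. */
struct demo_event {
	uint64_t overflows;
};

/* Hypothetical per-CPU state: an active bitmask plus event slots.
 * A slot may already be NULL while its active bit is still set. */
struct demo_cpu_state {
	uint64_t active_mask;
	struct demo_event *events[DEMO_MAX_COUNTERS];
};

/* Walk the overflow status bits and return how many events were handled. */
static int demo_handle_overflows(struct demo_cpu_state *cpuc, uint64_t status)
{
	int handled = 0;

	for (int bit = 0; bit < DEMO_MAX_COUNTERS; bit++) {
		struct demo_event *event;

		if (!(status & (1ULL << bit)))
			continue;
		if (!(cpuc->active_mask & (1ULL << bit)))
			continue;

		/* The active bit alone is not proof that the slot is populated. */
		event = cpuc->events[bit];
		if (!event)
			continue;

		event->overflows++;
		handled++;
	}

	return handled;
}
```

The explicit pointer check costs one branch per overflowed counter and avoids depending on the ordering between clearing the slot and clearing the bit.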
2025-12-10 x86/fpu: Fix FPU state core dump truncation on CPUs with no extended xfeatures Yongxin Liu 1 -2/+2
Zero can be a valid value of num_records. For example, on Intel Atom x6425RE, only x87 and SSE are supported (features 0, 1), and fpu_user_cfg.max_features is 3. The for_each_extended_xfeature() loop only iterates feature 2, which is not enabled, so num_records = 0. This is valid and should not cause core dump failure. The issue is that dump_xsave_layout_desc() returns 0 for both genuine errors (dump_emit() failure) and valid cases (no extended features). Use negative return values for errors and only abort on genuine failures. Fixes: ba386777a30b ("x86/elf: Add a new FPU buffer layout info to x86 core files") Signed-off-by: Yongxin Liu <yongxin.liu@windriver.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://patch.msgid.link/20251210000219.4094353-2-yongxin.liu@windriver.com
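The return-value convention described here (a non-negative record count on success, a negative errno on failure) can be sketched as follows. The names demo_emit, demo_dump_layout and demo_core_dump_ok are hypothetical, as is the choice of -EIO; the sketch does not reproduce dump_xsave_layout_desc() and only shows how zero records stops being confused with an error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical emitter: stands in for a write that may fail. */
static bool demo_emit(bool fail)
{
	return !fail;
}

/* Emit one record per enabled feature in [first, last].
 * Returns the number of records written (0 is valid) or a negative errno. */
static int demo_dump_layout(unsigned int enabled_mask, unsigned int first,
			    unsigned int last, bool fail_emit)
{
	int num_records = 0;

	for (unsigned int i = first; i <= last; i++) {
		if (!(enabled_mask & (1u << i)))
			continue;
		if (!demo_emit(fail_emit))
			return -EIO;
		num_records++;
	}

	return num_records;
}

/* The caller aborts only on a genuine (negative) error. */
static bool demo_core_dump_ok(int ret)
{
	return ret >= 0;
}
```

With only features 0 and 1 enabled and the loop covering feature 2 alone, the function returns 0 and the caller proceeds, which matches the Atom x6425RE case in the commit message.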
2025-12-09 x86/unwind/orc: Support reliable unwinding through BPF stack frames Josh Poimboeuf 1 -12/+27
BPF JIT programs and trampolines use a frame pointer, so the current ORC unwinder strategy of falling back to frame pointers (when an ORC entry is missing) usually works in practice when unwinding through BPF JIT stack frames. However, that frame pointer fallback is just a guess, so the unwind gets marked unreliable for live patching, which can cause livepatch transition stalls. Make the common case reliable by calling the bpf_has_frame_pointer() helper to detect the valid frame pointer region of BPF JIT programs and trampolines. Fixes: ee9f8fce9964 ("x86/unwind: Add the ORC unwinder") Reported-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Closes: https://lore.kernel.org/0e555733-c670-4e84-b2e6-abb8b84ade38@crowdstrike.com Acked-by: Song Liu <song@kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/a18505975662328c8ffb1090dded890c6f8c1004.1764818927.git.jpoimboe@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org>
2025-12-09 bpf: Add bpf_has_frame_pointer() Josh Poimboeuf 1 -0/+12
Introduce a bpf_has_frame_pointer() helper that unwinders can call to determine whether a given instruction pointer is within the valid frame pointer region of a BPF JIT program or trampoline (i.e., after the prologue, before the epilogue). This will enable livepatch (with the ORC unwinder) to reliably unwind through BPF JIT frames. Acked-by: Song Liu <song@kernel.org> Acked-and-tested-by: Andrey Grodzovsky <andrey.grodzovsky@crowdstrike.com> Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Link: https://lore.kernel.org/r/fd2bc5b4e261a680774b28f6100509fd5ebad2f0.1764818927.git.jpoimboe@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Jiri Olsa <jolsa@kernel.org>
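The kind of check such a helper performs (is the instruction pointer past the prologue and before the epilogue of a JIT image?) can be sketched in userspace C. Everything below is an assumption for illustration: demo_jit_image, its fields and demo_has_frame_pointer are hypothetical and are not the kernel's data structures or the actual bpf_has_frame_pointer() implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one JIT image. Offsets are relative to
 * start; the frame pointer is assumed valid in [fp_start, fp_end). */
struct demo_jit_image {
	uintptr_t start;   /* address of the first instruction */
	size_t len;        /* total image length in bytes */
	size_t fp_start;   /* offset just after the prologue */
	size_t fp_end;     /* offset where the epilogue begins */
};

/* Return true only if ip lies inside the image and inside the region
 * where the frame pointer has been set up and not yet torn down. */
static bool demo_has_frame_pointer(const struct demo_jit_image *img,
				   uintptr_t ip)
{
	size_t off;

	if (ip < img->start || ip - img->start >= img->len)
		return false;

	off = ip - img->start;
	return off >= img->fp_start && off < img->fp_end;
}
```

An unwinder that gets true from a check like this can treat the frame pointer as trustworthy rather than as a guess, which is what lets the unwind be marked reliable.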
2025-12-10 Merge tag 'auto-type-conversion-for-v6.19-rc1' of ↵ Linus Torvalds 3 -5/+5
git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto Pull __auto_type to auto conversion from Peter Anvin: "Convert '__auto_type' to 'auto', defining a macro for 'auto' unless C23+ is in use" * tag 'auto-type-conversion-for-v6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/hpa/linux-auto: tools/virtio: replace "__auto_type" with "auto" selftests/bpf: replace "__auto_type" with "auto" arch/x86: replace "__auto_type" with "auto" arch/nios2: replace "__auto_type" and adjacent equivalent with "auto" fs/proc: replace "__auto_type" with "const auto" include/linux: change "__auto_type" to "auto" compiler_types.h: add "auto" as a macro for "__auto_type"
2025-12-09 perf/x86/amd/uncore: Fix the return value of amd_uncore_df_event_init() on error Sandipan Das 1 -4/+1
If amd_uncore_event_init() fails, return an error irrespective of the pmu_version. Setting hwc->config should be safe even if there is an error so use this opportunity to simplify the code. Closes: https://lore.kernel.org/all/aTaI0ci3vZ44lmBn@stanley.mountain/ Fixes: d6389d3ccc13 ("perf/x86/amd/uncore: Refactor uncore management") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Signed-off-by: Sandipan Das <sandipan.das@amd.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/076935e23a70335d33bd6e23308b75ae0ad35ba2.1765268667.git.sandipan.das@amd.com
2025-12-08 arch/x86: replace "__auto_type" with "auto" H. Peter Anvin 3 -5/+5
Replace instances of "__auto_type" with "auto" in: arch/x86/include/asm/bug.h arch/x86/include/asm/string_64.h arch/x86/include/asm/uaccess_64.h Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com>
2025-12-09 Merge tag 'hyperv-next-signed-20251207' of ↵ Linus Torvalds 10 -21/+1071
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Enhancements to Linux as the root partition for Microsoft Hypervisor: - Support a new mode called L1VH, which allows Linux to drive the hypervisor running the Azure Host directly - Support for MSHV crash dump collection - Allow Linux's memory management subsystem to better manage guest memory regions - Fix issues that prevented a clean shutdown of the whole system on bare metal and nested configurations - ARM64 support for the MSHV driver - Various other bug fixes and cleanups - Add support for Confidential VMBus for Linux guest on Hyper-V - Secure AVIC support for Linux guests on Hyper-V - Add the mshv_vtl driver to allow Linux to run as the secure kernel in a higher virtual trust level for Hyper-V * tag 'hyperv-next-signed-20251207' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (58 commits) mshv: Cleanly shutdown root partition with MSHV mshv: Use reboot notifier to configure sleep state mshv: Add definitions for MSHV sleep state configuration mshv: Add support for movable memory regions mshv: Add refcount and locking to mem regions mshv: Fix huge page handling in memory region traversal mshv: Move region management to mshv_regions.c mshv: Centralize guest memory region destruction mshv: Refactor and rename memory region handling functions mshv: adjust interrupt control structure for ARM64 Drivers: hv: use kmalloc_array() instead of kmalloc() mshv: Add ioctl for self targeted passthrough hvcalls Drivers: hv: Introduce mshv_vtl driver Drivers: hv: Export some symbols for mshv_vtl static_call: allow using STATIC_CALL_TRAMP_STR() from assembly mshv: Extend create partition ioctl to support cpu features mshv: Allow mappings that overlap in uaddr mshv: Fix create memory region overlap check mshv: add WQ_PERCPU to alloc_workqueue users Drivers: hv: Use kmalloc_array() instead of kmalloc() ...
2025-12-08 KVM: nVMX: Immediately refresh APICv controls as needed on nested VM-Exit Dongli Zhang 1 -1/+2
If an APICv status update was pended while L2 was active, immediately refresh vmcs01's controls instead of pending KVM_REQ_APICV_UPDATE as kvm_vcpu_update_apicv() only calls into vendor code if a change is necessary. E.g. if APICv is inhibited, and then activated while L2 is running: kvm_vcpu_update_apicv() | -> __kvm_vcpu_update_apicv() | -> apic->apicv_active = true | -> vmx_refresh_apicv_exec_ctrl() | -> vmx->nested.update_vmcs01_apicv_status = true | -> return Then L2 exits to L1: __nested_vmx_vmexit() | -> kvm_make_request(KVM_REQ_APICV_UPDATE) vcpu_enter_guest(): KVM_REQ_APICV_UPDATE -> kvm_vcpu_update_apicv() | -> __kvm_vcpu_update_apicv() | -> return // because if (apic->apicv_active == activate) Reported-by: Chao Gao <chao.gao@intel.com> Closes: https://lore.kernel.org/all/aQ2jmnN8wUYVEawF@intel.com Fixes: 7c69661e225c ("KVM: nVMX: Defer APICv updates while L2 is active until L1 is active") Cc: stable@vger.kernel.org Signed-off-by: Dongli Zhang <dongli.zhang@oracle.com> [sean: write changelog] Link: https://patch.msgid.link/20251205231913.441872-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2025-12-08KVM: VMX: Update SVI during runtime APICv activationDongli Zhang2-9<