aboutsummaryrefslogtreecommitdiff
path: root/arch/x86/kernel/apic
AgeCommit message (Collapse)AuthorFilesLines
2026-04-14Merge tag 'x86-cleanups-2026-04-13' of ↵Linus Torvalds1-6/+6
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: - Consolidate AMD and Hygon cases in parse_topology() (Wei Wang) - asm constraints cleanups in __iowrite32_copy() (Uros Bizjak) - Drop AMD Extended Interrupt LVT macros (Naveen N Rao) - Don't use REALLY_SLOW_IO for delays (Juergen Gross) - paravirt cleanups (Juergen Gross) - FPU code cleanups (Borislav Petkov) - split-lock handling code cleanups (Borislav Petkov, Ronan Pigott) * tag 'x86-cleanups-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Correct the comment explaining what xfeatures_in_use() does x86/split_lock: Don't warn about unknown split_lock_detect parameter x86/fpu: Correct misspelled xfeaures_to_write local var x86/apic: Drop AMD Extended Interrupt LVT macros x86/cpu/topology: Consolidate AMD and Hygon cases in parse_topology() block/floppy: Don't use REALLY_SLOW_IO for delays x86/paravirt: Replace io_delay() hook with a bool x86/irqflags: Preemptively move include paravirt.h directive where it belongs x86/split_lock: Restructure the unwieldy switch-case in sld_state_show() x86/local: Remove trailing semicolon from _ASM_XADD in local_add_return() x86/asm: Use inout "+" asm onstraint modifiers in __iowrite32_copy()
2026-04-11Merge branch 'timers/urgent' into timers/coreThomas Gleixner1-2/+16
to resolve the conflict with urgent fixes.
2026-04-04x86/apic: Drop AMD Extended Interrupt LVT macrosNaveen N Rao (AMD)1-6/+6
AMD defines Extended Interrupt Local Vector Table (EILVT) registers to allow for additional interrupt sources. While the APIC registers for those are unique to AMD, the format of those registers follows the standard LVT registers. Drop EILVT-specific macros in favor of the standard APIC LVT macros. Drop unused APIC_EILVT_NR_AMD_K8 and APIC_EILVT_LVTOFF while at it. No functional change. [ bp: Merge the two cleanup patches into one. ] Signed-off-by: Naveen N Rao (AMD) <naveen@kernel.org> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Manali Shukla <manali.shukla@amd.com> Link: https://patch.msgid.link/b98d69037c0102d2ccd082a941888a689cd214c9.1775019269.git.naveen@kernel.org
2026-03-21Merge tag 'v7.0-rc4' into timers/core, to resolve conflictIngo Molnar1-0/+6
Resolve conflict between this change in the upstream kernel: 4c652a47722f ("rseq: Mark rseq_arm_slice_extension_timer() __always_inline") ... and this pending change in timers/core: 0e98eb14814e ("entry: Prepare for deferred hrtimer rearming") Signed-off-by: Ingo Molnar <mingo@kernel.org>
2026-03-20x86/platform/uv: Handle deconfigured socketsKyle Meyer1-2/+16
When a socket is deconfigured, it's mapped to SOCK_EMPTY (0xffff). This causes a panic while allocating UV hub info structures. Fix this by using NUMA_NO_NODE, allowing UV hub info structures to be allocated on valid nodes. Fixes: 8a50c5851927 ("x86/platform/uv: UV support for sub-NUMA clustering") Signed-off-by: Kyle Meyer <kyle.meyer@hpe.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/ab2BmGL0ehVkkjKk@hpe.com
2026-03-10x86/apic: Disable x2apic on resume if the kernel expects soShashank Balaji1-0/+6
When resuming from s2ram, firmware may re-enable x2apic mode, which may have been disabled by the kernel during boot either because it doesn't support IRQ remapping or for other reasons. This causes the kernel to continue using the xapic interface, while the hardware is in x2apic mode, which causes hangs. This happens on defconfig + bare metal + s2ram. Fix this in lapic_resume() by disabling x2apic if the kernel expects it to be disabled, i.e. when x2apic_mode = 0. The ACPI v6.6 spec, Section 16.3 [1] says firmware restores either the pre-sleep configuration or initial boot configuration for each CPU, including MSR state: When executing from the power-on reset vector as a result of waking from an S2 or S3 sleep state, the platform firmware performs only the hardware initialization required to restore the system to either the state the platform was in prior to the initial operating system boot, or to the pre-sleep configuration state. In multiprocessor systems, non-boot processors should be placed in the same state as prior to the initial operating system boot. (further ahead) If this is an S2 or S3 wake, then the platform runtime firmware restores minimum context of the system before jumping to the waking vector. This includes: CPU configuration. Platform runtime firmware restores the pre-sleep configuration or initial boot configuration of each CPU (MSR, MTRR, firmware update, SMBase, and so on). Interrupts must be disabled (for IA-32 processors, disabled by CLI instruction). (and other things) So at least as per the spec, re-enablement of x2apic by the firmware is allowed if "x2apic on" is a part of the initial boot configuration. [1] https://uefi.org/specs/ACPI/6.6/16_Waking_and_Sleeping.html#initialization [ bp: Massage. ] Fixes: 6e1cb38a2aef ("x64, x2apic/intr-remap: add x2apic support, including enabling interrupt-remapping") Co-developed-by: Rahul Bukte <rahul.bukte@sony.com> Signed-off-by: Rahul Bukte <rahul.bukte@sony.com> Signed-off-by: Shashank Balaji <shashank.mahadasyam@sony.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Thomas Gleixner <tglx@kernel.org> Reviewed-by: Sohil Mehta <sohil.mehta@intel.com> Cc: stable@vger.kernel.org Link: https://patch.msgid.link/20260306-x2apic-fix-v2-1-bee99c12efa3@sony.com
2026-02-27x86/apic: Enable TSC coupled programming modeThomas Gleixner1-6/+6
The TSC deadline timer is directly coupled to the TSC and setting the next deadline is tedious as the clockevents core code converts the CLOCK_MONOTONIC based absolute expiry time to a relative expiry by reading the current time from the TSC. It converts that delta to cycles and hands the result to lapic_next_deadline(), which then has read to the TSC and add the delta to program the timer. The core code now supports coupled clock event devices and can provide the expiry time in TSC cycles directly without reading the TSC at all. This obviouly works only when the TSC is the current clocksource, but that's the default for all modern CPUs which implement the TSC deadline timer. If the TSC is not the current clocksource (e.g. early boot) then the core code falls back to the relative set_next_event() callback as before. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163430.076565985@kernel.org
2026-02-27x86/apic: Avoid the PVOPS indirection for the TSC deadline timerThomas Gleixner1-3/+10
XEN PV does not emulate the TSC deadline timer, so the PVOPS indirection for writing the deadline MSR can be avoided completely. Use native_wrmsrq() instead. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.877429827@kernel.org
2026-02-27x86/apic: Remove pointless fence in lapic_next_deadline()Thomas Gleixner1-9/+7
lapic_next_deadline() contains a fence before the TSC read and the write to the TSC_DEADLINE MSR with a content free and therefore useless comment: /* This MSR is special and need a special fence: */ The MSR is not really special. It is just not a serializing MSR, but that does not matter at all in this context as all of these operations are strictly CPU local. The only thing the fence prevents is that the RDTSC is speculated ahead, but that's not really relevant as the delta is calculated way before based on a previous TSC read and therefore inaccurate by definition. So removing the fence is just making it slightly more inaccurate in the worst case, but that is irrelevant as it's way below the actual system immanent latencies and variations. Signed-off-by: Thomas Gleixner <tglx@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20260224163429.809059527@kernel.org
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds1-1/+1
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-1/+1
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-01-26x86/apic: Inline __x2apic_send_IPI_dest()Eric Dumazet2-7/+9
Avoid one call/ret in networking RPS/RFS fast path, with no space cost. scripts/bloat-o-meter -t vmlinux.before vmlinux.after add/remove: 0/2 grow/shrink: 2/0 up/down: 96/-97 (-1) Function old new delta x2apic_send_IPI 146 198 +52 __x2apic_send_IPI_mask 326 370 +44 __pfx___x2apic_send_IPI_dest 16 - -16 __x2apic_send_IPI_dest 81 - -81 Total: Before=20861234, After=20861233, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Thomas Gleixner <tglx@kernel.org> Link: https://patch.msgid.link/20251223052026.4141498-1-edumazet@google.com
2025-12-05Merge tag 'soc-drivers-6.19' of ↵Linus Torvalds2-8/+21
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc Pull SoC driver updates from Arnd Bergmann: "This is the first half of the driver changes: - A treewide interface change to the "syscore" operations for power management, as a preparation for future Tegra specific changes - Reset controller updates with added drivers for LAN969x, eic770 and RZ/G3S SoCs - Protection of system controller registers on Renesas and Google SoCs, to prevent trivially triggering a system crash from e.g. debugfs access - soc_device identification updates on Nvidia, Exynos and Mediatek - debugfs support in the ST STM32 firewall driver - Minor updates for SoC drivers on AMD/Xilinx, Renesas, Allwinner, TI - Cleanups for memory controller support on Nvidia and Renesas" * tag 'soc-drivers-6.19' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (114 commits) memory: tegra186-emc: Fix missing put_bpmp Documentation: reset: Remove reset_controller_add_lookup() reset: fix BIT macro reference reset: rzg2l-usbphy-ctrl: Fix a NULL vs IS_ERR() bug in probe reset: th1520: Support reset controllers in more subsystems reset: th1520: Prepare for supporting multiple controllers dt-bindings: reset: thead,th1520-reset: Add controllers for more subsys dt-bindings: reset: thead,th1520-reset: Remove non-VO-subsystem resets reset: remove legacy reset lookup code clk: davinci: psc: drop unused reset lookup reset: rzg2l-usbphy-ctrl: Add support for RZ/G3S SoC reset: rzg2l-usbphy-ctrl: Add support for USB PWRRDY dt-bindings: reset: renesas,rzg2l-usbphy-ctrl: Document RZ/G3S support reset: eswin: Add eic7700 reset driver dt-bindings: reset: eswin: Documentation for eic7700 SoC reset: sparx5: add LAN969x support dt-bindings: reset: microchip: Add LAN969x support soc: rockchip: grf: Add select correct PWM implementation on RK3368 soc/tegra: pmc: Add USB wake events for Tegra234 amba: tegra-ahb: Fix device leak on SMMU enable ...
2025-12-02Merge tag 'x86_misc_for_6.19-rc1' of ↵Linus Torvalds2-2/+4
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 updates from Dave Hansen: "The most significant are some changes to ensure that symbols exported for KVM are used only by KVM modules themselves, along with some related cleanups. In true x86/misc fashion, the other patch is completely unrelated and just enhances an existing pr_warn() to make it clear to users how they have tainted their kernel when something is mucking with MSRs. Summary: - Make MSR-induced taint easier for users to track down - Restrict KVM-specific exports to KVM itself" * tag 'x86_misc_for_6.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possible x86/mm: Drop unnecessary export of "ptdump_walk_pgd_level_debugfs" x86/mtrr: Drop unnecessary export of "mtrr_state" x86/bugs: Drop unnecessary export of "x86_spec_ctrl_base" x86/msr: Add CPU_OUT_OF_SPEC taint name to "unrecognized" pr_warn(msg)
2025-11-14syscore: Pass context data to callbacksThierry Reding2-8/+21
Several drivers can benefit from registering per-instance data along with the syscore operations. To achieve this, move the modifiable fields out of the syscore_ops structure and into a separate struct syscore that can be registered with the framework. Add a void * driver data field for drivers to store contextual data that will be passed to the syscore ops. Acked-by: Rafael J. Wysocki (Intel) <rafael@kernel.org> Signed-off-by: Thierry Reding <treding@nvidia.com>
2025-11-12x86: Restrict KVM-induced symbol exports to KVM modules where obvious/possibleSean Christopherson2-2/+4
Extend KVM's export macro framework to provide EXPORT_SYMBOL_FOR_KVM(), and use the helper macro to export symbols for KVM throughout x86 if and only if KVM will build one or more modules, and only for those modules. To avoid unnecessary exports when CONFIG_KVM=m but kvm.ko will not be built (because no vendor modules are selected), let arch code #define EXPORT_SYMBOL_FOR_KVM to suppress/override the exports. Note, the set of symbols to restrict to KVM was generated by manual search and audit; any "misses" are due to human error, not some grand plan. Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Kai Huang <kai.huang@intel.com> Tested-by: Kai Huang <kai.huang@intel.com> Link: https://patch.msgid.link/20251112173944.1380633-5-seanjc%40google.com
2025-11-07x86/apic: Fix frequency in apic=verbose log outputJulian Stecklina1-6/+9
When apic=verbose is specified, the LAPIC timer calibration prints its results to the console. At least while debugging virtualization code, the CPU and bus frequencies are printed incorrectly. Specifically, for a 1.7 GHz CPU with 1 GHz bus frequency and HZ=1000, the log includes a superfluous 0 after the period: ..... calibration result: 999978 ..... CPU clock speed is 1696.0783 MHz. ..... host bus clock speed is 999.0978 MHz. Looking at the code, this only worked as intended for HZ=100. After the fix, the correct frequency is printed: ..... calibration result: 999828 ..... CPU clock speed is 1696.507 MHz. ..... host bus clock speed is 999.828 MHz. There is no functional change to the LAPIC calibration here, beyond the printing format changes. [ bp: - Massage commit message - Figures it should apply this patch about ~4 years later - Massage it into the current code ] Suggested-by: Markus Napierkowski <markus.napierkowski@cyberus-technology.de> Signed-off-by: Julian Stecklina <julian.stecklina@cyberus-technology.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://patch.msgid.link/20211030142148.143261-1-js@alien8.de
2025-10-21x86/ioapic: Simplify mp_irqdomain_alloc() slightlyChristophe JAILLET1-1/+1
The IRQ return value of irq_find_mapping() is only tested for existence, not used for anything else. So, this call can be replaced by a slightly simpler irq_resolve_mapping() call, which reduces generated code size a bit (x86-64 allmodconfig): text data bss dec hex filename 82142 38633 18048 138823 21e47 arch/x86/kernel/apic/io_apic.o.before 81932 38633 18048 138613 21d75 arch/x86/kernel/apic/io_apic.o.after [ mingo: Fixed & simplified the changelog ] Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: kernel-janitors@vger.kernel.org Link: https://patch.msgid.link/cb3a4968538637aac3a5ae4f5ecc4f5eb43376ea.1760861877.git.christophe.jaillet@wanadoo.fr
2025-09-04x86/apic/savic: Do not use snp_abort()Borislav Petkov (AMD)1-3/+3
This function is going away so replace the callsites with the equivalent functionality. Add a new SAVIC-specific termination reason. If more granularity is needed there, it will be revisited in the future. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de>
2025-09-01x86/apic: Enable Secure AVIC in the control MSRNeeraj Upadhyay1-1/+2
With all the pieces in place now, enable Secure AVIC in the Secure AVIC Control MSR. Any access to x2APIC MSRs are emulated by the hypervisor before Secure AVIC is enabled in the control MSR. Post Secure AVIC enablement, all x2APIC MSR accesses (whether accelerated by AVIC hardware or trapped as a #VC exception) operate on the vCPU's APIC backing page. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828112126.209028-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Add kexec support for Secure AVICNeeraj Upadhyay2-0/+11
Add a apic->teardown() callback to disable Secure AVIC before rebooting into the new kernel. This ensures that the new kernel does not access the old APIC backing page which was allocated by the previous kernel. Such accesses can happen if there are any APIC accesses done during the guest boot before Secure AVIC driver probe is done by the new kernel (as Secure AVIC would have remained enabled in the Secure AVIC control MSR). Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828112008.209013-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Handle EOI writes for Secure AVIC guestsNeeraj Upadhyay1-1/+30
Secure AVIC accelerates the guest's EOI MSR writes for edge-triggered interrupts. For level-triggered interrupts, EOI MSR writes trigger a #VC exception with an SVM_EXIT_AVIC_UNACCELERATED_ACCESS error code. To complete EOI handling, the #VC exception handler would need to trigger a GHCB protocol MSR write event to notify the hypervisor about completion of the level-triggered interrupt. Hypervisor notification is required for cases like emulated IO-APIC, to complete and clear interrupt in the IO-APIC's interrupt state. However, #VC exception handling adds extra performance overhead for APIC register writes. In addition, for Secure AVIC, some unaccelerated APIC register MSR writes are trapped, whereas others are faulted. This results in additional complexity in #VC exception handling for unaccelerated APIC MSR accesses. So, directly do a GHCB protocol based APIC EOI MSR write from apic->eoi() callback for level-triggered interrupts. Use WRMSR for edge-triggered interrupts, so that hardware re-evaluates any pending interrupt which can be delivered to the guest vCPU. For level-triggered interrupts, re-evaluation happens on return from VMGEXIT corresponding to the GHCB event for APIC EOI MSR write. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828111654.208987-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Read and write LVT* APIC registers from HV for SAVIC guestsNeeraj Upadhyay1-10/+10
The Hypervisor needs information about the current state of the LVT registers for device emulation and NMIs. So, forward reads and write of these registers to the hypervisor for Secure AVIC enabled guests. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828111356.208972-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Allow NMI to be injected from hypervisor for Secure AVICNeeraj Upadhyay1-0/+2
Secure AVIC requires the "AllowedNmi" bit in the Secure AVIC Control MSR to be set for an NMI to be injected from the hypervisor. So set it. Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828111243.208946-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Add support to send NMI IPI for Secure AVICNeeraj Upadhyay1-9/+17
Secure AVIC introduces a new field in the APIC backing page "NmiReq" that has to be set by the guest to request a NMI IPI through APIC_ICR write. Add support to set NmiReq appropriately to send NMI IPI. Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828111213.208933-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Support LAPIC timer for Secure AVICNeeraj Upadhyay2-2/+7
Secure AVIC requires the LAPIC timer to be emulated by the hypervisor. KVM already supports emulating the LAPIC timer using hrtimers. In order to emulate it, APIC_LVTT, APIC_TMICT and APIC_TDCR register values need to be propagated to the hypervisor for arming the timer. APIC_TMCCT register value has to be read from the hypervisor, which is required for calibrating the APIC timer. So, read/write all APIC timer registers from/to the hypervisor. Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828110926.208866-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Add support to send IPI for Secure AVICNeeraj Upadhyay1-3/+135
Secure AVIC hardware accelerates only Self-IPI, i.e. on WRMSR to APIC_SELF_IPI and APIC_ICR (with destination shorthand equal to Self) registers, hardware takes care of updating the APIC_IRR in the APIC backing page of the vCPU. For other IPI types (cross-vCPU, broadcast IPIs), software needs to take care of updating the APIC_IRR state of the target vCPUs and to ensure that the target vCPUs notice the new pending interrupt. Add new callbacks in the Secure AVIC driver for sending IPI requests. These callbacks update the IRR in the target guest vCPU's APIC backing page. To ensure that the remote vCPU notices the new pending interrupt, reuse the GHCB MSR handling code in vc_handle_msr() to issue APIC_ICR MSR-write GHCB protocol event to the hypervisor. For Secure AVIC guests, on APIC_ICR write MSR exits, the hypervisor notifies the target vCPU by either sending an AVIC doorbell (if target vCPU is running) or by waking up the non-running target vCPU. Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828110824.208851-1-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Add an update_vector() callback for Secure AVICNeeraj Upadhyay1-0/+23
Add an update_vector() callback to set/clear the ALLOWED_IRR field in a vCPU's APIC backing page for vectors which are emulated by the hypervisor. The ALLOWED_IRR field indicates the interrupt vectors which the guest allows the hypervisor to inject (typically for emulated devices). Interrupt vectors used exclusively by the guest itself and the vectors which are not emulated by the hypervisor, such as IPI vectors, should not be set by the guest in the ALLOWED_IRR fields. As clearing/setting state of a vector will also be used in subsequent commits for other APIC registers (such as APIC_IRR update for sending IPI), add a common update_vector() in the Secure AVIC driver. [ bp: Massage commit message. ] Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828110255.208779-4-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Add update_vector() callback for APIC driversNeeraj Upadhyay1-11/+17
Add an update_vector() callback to allow APIC drivers to perform driver specific operations on external vector allocation/teardown on a CPU. This callback will be used by the Secure AVIC APIC driver to configure the vectors which a guest vCPU allows the hypervisor to send to it. As system vectors have fixed vector assignments and are not dynamically allocated, add an apic_update_vector() public API to facilitate update_vector() callback invocation for them. This will be used for Secure AVIC enabled guests to allow the hypervisor to inject system vectors which are emulated by the hypervisor such as APIC timer vector and HYPERVISOR_CALLBACK_VECTOR. While at it, cleanup line break in apic_update_irq_cfg(). Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250828110255.208779-3-Neeraj.Upadhyay@amd.com
2025-09-01x86/apic: Initialize APIC ID for Secure AVICNeeraj Upadhyay1-0/+6
Initialize the APIC ID in the Secure AVIC APIC backing page with the APIC_ID MSR value read from the hypervisor. CPU topology evaluation later during boot would catch and report any duplicate APIC ID for two CPUs. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828110255.208779-2-Neeraj.Upadhyay@amd.com
2025-08-31x86/apic: Populate .read()/.write() callbacks of Secure AVIC driverNeeraj Upadhyay1-2/+120
Add read() and write() APIC callback functions to read and write the x2APIC registers directly from the guest APIC backing page of a vCPU. The x2APIC registers are mapped at an offset within the guest APIC backing page which is the same as their x2APIC MMIO offset. Secure AVIC adds new registers such as ALLOWED_IRRs (which are at 4-byte offset within the IRR register offset range) and NMI_REQ to the APIC register space. When Secure AVIC is enabled, accessing the guest's APIC registers through RD/WRMSR results in a #VC exception (for non-accelerated register accesses) with error code VMEXIT_AVIC_NOACCEL. The #VC exception handler can read/write the x2APIC register in the guest APIC backing page to complete the RDMSR/WRMSR. Since doing this would increase the latency of accessing the x2APIC registers, instead of doing RDMSR/WRMSR based register accesses and handling reads/writes in the #VC exception, directly read/write the APIC registers from/to the guest APIC backing page of the vCPU in read() and write() callbacks of the Secure AVIC APIC driver. [ bp: Massage commit message. ] Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828110255.208779-1-Neeraj.Upadhyay@amd.com
2025-08-31x86/apic: Initialize Secure AVIC APIC backing pageNeeraj Upadhyay2-0/+38
With Secure AVIC, the APIC backing page is owned and managed by the guest. Allocate and initialize APIC backing page for all guest CPUs. The NPT entry for a vCPU's APIC backing page must always be present when the vCPU is running in order for Secure AVIC to function. A VMEXIT_BUSY is returned on VMRUN and the vCPU cannot be resumed otherwise. To handle this, notify GPA of the vCPU's APIC backing page to the hypervisor by using the SVM_VMGEXIT_SECURE_AVIC GHCB protocol event. Before executing VMRUN, the hypervisor makes use of this information to make sure the APIC backing page is mapped in the NPT. [ bp: Massage commit message. ] Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828070334.208401-3-Neeraj.Upadhyay@amd.com
2025-08-28x86/apic: Add new driver for Secure AVICNeeraj Upadhyay2-0/+64
The Secure AVIC feature provides SEV-SNP guests hardware acceleration for performance sensitive APIC accesses while securely managing the guest-owned APIC state through the use of a private APIC backing page. This helps prevent the hypervisor from generating unexpected interrupts for a vCPU or otherwise violate architectural assumptions around the APIC behavior. Add a new x2APIC driver that will serve as the base of the Secure AVIC support. It is initially the same as the x2APIC physical driver (without IPI callbacks), but will be modified as features are implemented. As the new driver does not implement Secure AVIC features yet, if the hypervisor sets the Secure AVIC bit in SEV_STATUS, maintain the existing behavior to enforce the guest termination. [ bp: Massage commit message. ] Co-developed-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Kishon Vijay Abraham I <kvijayab@amd.com> Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Tianyu Lan <tiala@microsoft.com> Link: https://lore.kernel.org/20250828070334.208401-2-Neeraj.Upadhyay@amd.com
2025-08-25x86/apic: Make the ISR clearing saneThomas Gleixner1-40/+37
apic_pending_intr_clear() is fundamentally voodoo programming. It's primary purpose is to clear stale ISR bits in the local APIC, which would otherwise lock the corresponding interrupt priority level. The comments and the implementation claim falsely that after clearing the stale ISR bits, eventually stale IRR bits would be turned into ISR bits and can be cleared as well. That's just wishful thinking because: 1) If interrupts are disabled, the APIC does not propagate an IRR bit to the ISR. 2) If interrupts are enabled, then the APIC propagates the IRR bit to the ISR and raises the interrupt in the CPU, which means that code _cannot_ observe the ISR bit for any of those IRR bits. Rename the function to reflect the purpose and make exactly _one_ attempt to EOI the pending ISR bits and add comments why traversing the pending bit map in low to high priority order is correct. Instead of trying to "clear" IRR bits, simply print a warning message when the IRR is not empty. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/871ppwih4s.ffs@tglx
2025-07-15x86/apic: Move apic_update_irq_cfg() call to apic_update_vector()Neeraj Upadhyay1-2/+2
All callers of apic_update_vector() also call apic_update_irq_cfg() after it. So, move the apic_update_irq_cfg() call to apic_update_vector(). No functional change intended. Signed-off-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Link: https://lore.kernel.org/20250709033242.267892-18-Neeraj.Upadhyay@amd.com
2025-06-03Merge tag 'hyperv-next-signed-20250602' of ↵Linus Torvalds3-3/+9
git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux Pull hyperv updates from Wei Liu: - Support for Virtual Trust Level (VTL) on arm64 (Roman Kisel) - Fixes for Hyper-V UIO driver (Long Li) - Fixes for Hyper-V PCI driver (Michael Kelley) - Select CONFIG_SYSFB for Hyper-V guests (Michael Kelley) - Documentation updates for Hyper-V VMBus (Michael Kelley) - Enhance logging for hv_kvp_daemon (Shradha Gupta) * tag 'hyperv-next-signed-20250602' of git://git.kernel.org/pub/scm/linux/kernel/git/hyperv/linux: (23 commits) Drivers: hv: Always select CONFIG_SYSFB for Hyper-V guests Drivers: hv: vmbus: Add comments about races with "channels" sysfs dir Documentation: hyperv: Update VMBus doc with new features and info PCI: hv: Remove unnecessary flex array in struct pci_packet Drivers: hv: Remove hv_alloc/free_* helpers Drivers: hv: Use kzalloc for panic page allocation uio_hv_generic: Align ring size to system page uio_hv_generic: Use correct size for interrupt and monitor pages Drivers: hv: Allocate interrupt and monitor pages aligned to system page boundary arch/x86: Provide the CPU number in the wakeup AP callback x86/hyperv: Fix APIC ID and VP index confusion in hv_snp_boot_ap() PCI: hv: Get vPCI MSI IRQ domain from DeviceTree ACPI: irq: Introduce acpi_get_gsi_dispatcher() Drivers: hv: vmbus: Introduce hv_get_vmbus_root_device() Drivers: hv: vmbus: Get the IRQ number from DeviceTree dt-bindings: microsoft,vmbus: Add interrupt and DMA coherence properties arm64, x86: hyperv: Report the VTL the system boots in arm64: hyperv: Initialize the Virtual Trust Level field Drivers: hv: Provide arch-neutral implementation of get_vtl() Drivers: hv: Enable VTL mode for arm64 ...
2025-05-27Merge tag 'timers-cleanups-2025-05-25' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer cleanups from Thomas Gleixner: "Another set of timer API cleanups: - Convert init_timer*(), try_to_del_timer_sync() and destroy_timer_on_stack() over to the canonical timer_*() namespace convention. There is another large conversion pending, which has not been included because it would have caused a gazillion of merge conflicts in next. The conversion scripts will be run towards the end of the merge window and a pull request sent once all conflict dependencies have been merged" * tag 'timers-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: treewide, timers: Rename destroy_timer_on_stack() as timer_destroy_on_stack() treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try() timers: Rename init_timers() as timers_init() timers: Rename NEXT_TIMER_MAX_DELTA as TIMER_NEXT_MAX_DELTA timers: Rename __init_timer_on_stack() as __timer_init_on_stack() timers: Rename __init_timer() as __timer_init() timers: Rename init_timer_on_stack_key() as timer_init_key_on_stack() timers: Rename init_timer_key() as timer_init_key()
2025-05-27Merge tag 'irq-cleanups-2025-05-25' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull irq cleanups from Thomas Gleixner: "A set of cleanups for the generic interrupt subsystem: - Consolidate on one set of functions for the interrupt domain code to get rid of pointlessly duplicated code with only marginal different semantics. - Update the documentation accordingly and consolidate the coding style of the irqdomain header" * tag 'irq-cleanups-2025-05-25' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (46 commits) irqdomain: Consolidate coding style irqdomain: Fix kernel-doc and add it to Documentation Documentation: irqdomain: Update it Documentation: irq-domain.rst: Simple improvements Documentation: irq/concepts: Minor improvements Documentation: irq/concepts: Add commas and reflow irqdomain: Improve kernel-docs of functions irqdomain: Make struct irq_domain_info variables const irqdomain: Use irq_domain_instantiate()'s return value as initializers irqdomain: Drop irq_linear_revmap() pinctrl: keembay: Switch to irq_find_mapping() irqchip/armada-370-xp: Switch to irq_find_mapping() gpu: ipu-v3: Switch to irq_find_mapping() gpio: idt3243x: Switch to irq_find_mapping() sh: Switch to irq_find_mapping() powerpc: Switch to irq_find_mapping() irqdomain: Drop irq_domain_add_*() functions powerpc: Switch irq_domain_add_nomap() to use fwnode thermal: Switch to irq_domain_create_linear() soc: Switch to irq_domain_create_*() ...
2025-05-23arch/x86: Provide the CPU number in the wakeup AP callbackRoman Kisel3-3/+9
When starting APs, confidential guests and paravisor guests need to know the CPU number, and the pattern of using the linear search has emerged in several places. With N processors that leads to the O(N^2) time complexity. Provide the CPU number in the AP wake up callback so that one can get the CPU number in constant time. Suggested-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Roman Kisel <romank@linux.microsoft.com> Reviewed-by: Tom Lendacky <thomas.lendacky@amd.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Link: https://lore.kernel.org/r/20250507182227.7421-3-romank@linux.microsoft.com Signed-off-by: Wei Liu <wei.liu@kernel.org> Message-ID: <20250507182227.7421-3-romank@linux.microsoft.com>
2025-05-16x86/io_apic: Switch to of_fwnode_handle()Jiri Slaby (SUSE)1-1/+1
of_node_to_fwnode() is irqdomain's reimplementation of the "officially" defined of_fwnode_handle(). The former is in the process of being removed, so use the latter instead. [ tglx: Fix up subject prefix ] Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250319092951.37667-11-jirislaby@kernel.org
2025-05-13Merge branch 'x86/msr' into x86/core, to resolve conflictsIngo Molnar2-11/+13
Conflicts: arch/x86/boot/startup/sme.c arch/x86/coco/sev/core.c arch/x86/kernel/fpu/core.c arch/x86/kernel/fpu/xstate.c Semantic conflict: arch/x86/include/asm/sev-internal.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-05-08treewide, timers: Rename try_to_del_timer_sync() as timer_delete_sync_try()Ingo Molnar1-1/+1
Move this API to the canonical timer_*() namespace. Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/20250507175338.672442-9-mingo@kernel.org
2025-05-02x86/msr: Add explicit includes of <asm/msr.h>Xin Li (Intel)2-0/+2
For historic reasons there are some TSC-related functions in the <asm/msr.h> header, even though there's an <asm/tsc.h> header. To facilitate the relocation of rdtsc{,_ordered}() from <asm/msr.h> to <asm/tsc.h> and to eventually eliminate the inclusion of <asm/msr.h> in <asm/tsc.h>, add an explicit <asm/msr.h> dependency to the source files that reference definitions from <asm/msr.h>. [ mingo: Clarified the changelog. ] Signed-off-by: Xin Li (Intel) <xin@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Brian Gerst <brgerst@gmail.com> Cc: Juergen Gross <jgross@suse.com> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Kees Cook <keescook@chromium.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Uros Bizjak <ubizjak@gmail.com> Link: https://lore.kernel.org/r/20250501054241.1245648-1-xin@zytor.com
2025-04-18x86/asm: Rename rep_nop() to native_pause()Uros Bizjak1-1/+1
Rename rep_nop() function to what it really does. No functional change intended. Suggested-by: David Laight <david.laight.linux@gmail.com> Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: H. Peter Anvin <hpa@zytor.com> Link: https://lore.kernel.org/r/20250418080805.83679-1-ubizjak@gmail.com
2025-04-10x86/msr: Rename 'wrmsrl()' to 'wrmsrq()'Ingo Molnar1-5/+5
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-10x86/msr: Rename 'rdmsrl()' to 'rdmsrq()'Ingo Molnar2-6/+6
Suggested-by: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Juergen Gross <jgross@suse.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Xin Li <xin@zytor.com> Cc: Linus Torvalds <torvalds@linux-foundation.org>
2025-04-04irqdomain: Rename irq_set_default_host() to irq_set_default_domain()Jiri Slaby (SUSE)1-1/+1
Naming interrupt domains host is confusing at best and the irqdomain code uses both domain and host inconsistently. Therefore rename irq_set_default_host() to irq_set_default_domain(). Signed-off-by: Jiri Slaby (SUSE) <jirislaby@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/