Age | Commit message | Author | Files | Lines
2026-02-04dt-bindings: leds: Add issi,is31fl3293 to leds-is31fl32xxDaniel Mack1-0/+1
This variant supports 3 channels with 4096 brightness steps. Signed-off-by: Daniel Mack <daniel@zonque.org> Acked-by: Conor Dooley <conor.dooley@microchip.com> Link: https://patch.msgid.link/20251219154521.643312-2-daniel@zonque.org Signed-off-by: Lee Jones <lee@kernel.org>
2026-02-04leds: expresswire: Fix chip state breakageDuje Mihanović2-10/+17
It is possible to put the KTD2801 chip in an unknown/undefined state by changing the brightness very rapidly (for example, with a brightness slider). When this happens, the brightness is stuck on max and cannot be changed until the chip is power cycled. Fix this by disabling interrupts while talking to the chip. While at it, make expresswire_power_off() use fsleep() and also unexport some functions meant to be internal. Fixes: 1368d06dd2c9 ("leds: Introduce ExpressWire library") Tested-by: Karel Balej <balejk@matfyz.cz> Signed-off-by: Duje Mihanović <duje@dujemihanovic.xyz> Link: https://patch.msgid.link/20251217-expresswire-fix-v2-1-4a02b10acd96@dujemihanovic.xyz Signed-off-by: Lee Jones <lee@kernel.org>
2026-02-04mm/slab: only allow SLAB_OBJ_EXT_IN_OBJ for unmergeable cachesHarry Yoo3-3/+4
While SLAB_OBJ_EXT_IN_OBJ reduces the memory overhead of accounting slab objects, it prevents slab merging because merging can change the metadata layout. As pointed out by Vlastimil Babka, disabling merging solely for this memory optimization may not be a net win, because disabling slab merging tends to increase overall memory usage. Restrict SLAB_OBJ_EXT_IN_OBJ to caches that are already unmergeable for other reasons (e.g., those with constructors or SLAB_TYPESAFE_BY_RCU). Suggested-by: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260127103151.21883-3-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: place slabobj_ext metadata in unused space within s->sizeHarry Yoo3-11/+101
When a cache has a high s->align value and s->object_size is not aligned to it, each object ends up with some unused space because of alignment. If this wasted space is big enough, we can use it to store the slabobj_ext metadata instead of wasting it.

On my system, this happens with caches like kmem_cache, mm_struct, pid, task_struct, sighand_cache, xfs_inode, and others.

To place the slabobj_ext metadata within each object, the existing slab_obj_ext() logic can still be used by setting:

- slab->obj_exts = slab_address(slab) + (slabobj_ext offset)
- stride = s->size

slab_obj_ext() doesn't need to know where the metadata is stored, so this method works without adding extra overhead to slab_obj_ext().

A good example benefiting from this optimization is xfs_inode (object_size: 992, align: 64). To measure memory savings, 2 million files were created on XFS.

[ MEMCG=y, MEM_ALLOC_PROFILING=n ]

Before patch (creating ~2.64M directories on xfs):
  Slab:          5175976 kB
  SReclaimable:  3837524 kB
  SUnreclaim:    1338452 kB

After patch (creating ~2.64M directories on xfs):
  Slab:          5152912 kB
  SReclaimable:  3838568 kB
  SUnreclaim:    1314344 kB (-23.54 MiB)

Enjoy the memory savings!

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Harry Yoo <harry.yoo@oracle.com>
Link: https://patch.msgid.link/20260113061845.159790-10-harry.yoo@oracle.com
Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
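The xfs_inode numbers in this message can be checked with a few lines of arithmetic. The sketch below is user-space C, not kernel code, and makes two simplifying assumptions: that s->size is simply object_size rounded up to the alignment (ignoring free pointers, red zones, and debug metadata), and that a slabobj_ext entry takes 8 bytes without memory profiling or 16 bytes with it (the per-object sizes implied by the 224/448-byte arrays for 28 objects quoted in the ext4_inode_cache commit). All function names are hypothetical.

```c
#include <stddef.h>

/* Round x up to the next multiple of a; a must be a power of two. */
static size_t align_up(size_t x, size_t a)
{
	return (x + a - 1) & ~(a - 1);
}

/*
 * Alignment padding at the end of each object, assuming the object
 * stride is just object_size rounded up to align (a simplification).
 */
static size_t unused_per_object(size_t object_size, size_t align)
{
	return align_up(object_size, align) - object_size;
}

/* Nonzero if an ext_size-byte metadata entry fits in that padding. */
static int obj_ext_fits_in_padding(size_t object_size, size_t align,
				   size_t ext_size)
{
	return unused_per_object(object_size, align) >= ext_size;
}
```

With object_size 992 and align 64, the stride is 1024 and each object carries 32 bytes of padding, enough for either entry size under the assumptions above. An object whose size is already a multiple of the alignment has no padding to reuse.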
2026-02-04mm/slab: move [__]ksize and slab_ksize() to mm/slub.cHarry Yoo4-89/+86
To access SLUB's internal implementation details beyond cache flags in ksize(), move __ksize(), ksize(), and slab_ksize() to mm/slub.c. [vbabka@suse.cz: also make __ksize() static and move its kerneldoc to ksize() ] Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-9-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: save memory by allocating slabobj_ext array from leftoverHarry Yoo1-5/+150
The leftover space in a slab is always smaller than s->size, and kmem caches for large objects that are not power-of-two sizes tend to have a greater amount of leftover space per slab. In some cases, the leftover space is larger than the size of the slabobj_ext array for the slab.

An excellent example of such a cache is ext4_inode_cache. On my system, the object size is 1136, with a preferred order of 3, 28 objects per slab, and 960 bytes of leftover space per slab. Since the size of the slabobj_ext array is only 224 bytes (w/o mem profiling) or 448 bytes (w/ mem profiling) per slab, the entire array fits within the leftover space.

Allocate the slabobj_exts array from this unused space instead of using kcalloc() when it is large enough. The array is allocated from unused space only when creating new slabs, and it doesn't try to utilize unused space if alloc_slab_obj_exts() is called after slab creation because implementing lazy allocation involves more expensive synchronization.

The implementation and evaluation of lazy allocation from unused space is left as future work. As pointed out by Vlastimil Babka [1], it could be beneficial when a slab cache without SLAB_ACCOUNT can be created, and some of the allocations from the cache use __GFP_ACCOUNT. For example, xarray does that.

To avoid unnecessary overhead when MEMCG (with SLAB_ACCOUNT) and MEM_ALLOC_PROFILING are not used for the cache, allocate the slabobj_ext array only when either of them is enabled on slab allocation.

[ MEMCG=y, MEM_ALLOC_PROFILING=n ]

Before patch (creating ~2.64M directories on ext4):
  Slab:          4747880 kB
  SReclaimable:  4169652 kB
  SUnreclaim:     578228 kB

After patch (creating ~2.64M directories on ext4):
  Slab:          4724020 kB
  SReclaimable:  4169188 kB
  SUnreclaim:     554832 kB (-22.84 MiB)

Enjoy the memory savings!

Link: https://lore.kernel.org/linux-mm/48029aab-20ea-4d90-bfd1-255592b2018e@suse.cz [1] Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-8-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
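The ext4_inode_cache figures quoted above follow from simple division. This user-space sketch reproduces them under stated assumptions: 4 KiB pages, a slab that holds only back-to-back objects of the given size, and 8 or 16 bytes per slabobj_ext entry (the values implied by 224 and 448 bytes for 28 objects). It ignores any alignment requirement on the start of the array, which real code would have to honor. The helper names are made up for illustration.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096u	/* assumption: 4 KiB pages */

/* Total bytes in a slab of the given page order. */
static size_t slab_bytes(unsigned int order)
{
	return (size_t)SKETCH_PAGE_SIZE << order;
}

/* Number of objects of 'size' bytes that fit in the slab. */
static size_t objects_per_slab(size_t size, unsigned int order)
{
	return slab_bytes(order) / size;
}

/* Bytes left over after the last object; always smaller than 'size'. */
static size_t leftover_bytes(size_t size, unsigned int order)
{
	return slab_bytes(order) - objects_per_slab(size, order) * size;
}

/* Nonzero if an array with one ext_size entry per object fits in the leftover. */
static int obj_exts_fit_in_leftover(size_t size, unsigned int order,
				    size_t ext_size)
{
	return objects_per_slab(size, order) * ext_size <=
	       leftover_bytes(size, order);
}
```

For size 1136 and order 3 this gives 28 objects and 960 leftover bytes, so both the 224-byte and the 448-byte array fit. A power-of-two object size leaves no leftover at all, which matches the observation that non-power-of-two caches benefit most.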
2026-02-04mm/memcontrol,alloc_tag: handle slabobj_ext access under KASAN poisonHarry Yoo3-40/+95
In the near future, slabobj_ext may reside outside the allocated slab object range within a slab, which could be reported as an out-of-bounds access by KASAN. As suggested by Andrey Konovalov [1], explicitly disable KASAN and KMSAN checks when accessing slabobj_ext within slab allocator, memory profiling, and memory cgroup code. While an alternative approach could be to unpoison slabobj_ext, out-of-bounds accesses outside the slab allocator are generally more common. Move metadata_access_enable()/disable() helpers to mm/slab.h so that it can be used outside mm/slub.c. However, as suggested by Suren Baghdasaryan [2], instead of calling them directly from mm code (which is more prone to errors), change users to access slabobj_ext via get/put APIs: - Users should call get_slab_obj_exts() to access slabobj_metadata and call put_slab_obj_exts() when it's done. - From now on, accessing it outside the section covered by get_slab_obj_exts() ~ put_slab_obj_exts() is illegal. This ensures that accesses to slabobj_ext metadata won't be reported as access violations. Call kasan_reset_tag() in slab_obj_ext() before returning the address to prevent SW or HW tag-based KASAN from reporting false positives. Suggested-by: Andrey Konovalov <andreyknvl@gmail.com> Suggested-by: Suren Baghdasaryan <surenb@google.com> Link: https://lore.kernel.org/linux-mm/CA+fCnZezoWn40BaS3cgmCeLwjT+5AndzcQLc=wH3BjMCu6_YCw@mail.gmail.com [1] Link: https://lore.kernel.org/linux-mm/CAJuCfpG=Lb4WhYuPkSpdNO4Ehtjm1YcEEK0OM=3g9i=LxmpHSQ@mail.gmail.com [2] Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-7-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: use stride to access slabobj_extHarry Yoo2-4/+35
Use a configurable stride value when accessing slab object extension metadata instead of assuming a fixed sizeof(struct slabobj_ext). Store stride value in free bits of slab->counters field. This allows for flexibility in cases where the extension is embedded within slab objects. Since these free bits exist only on 64-bit, any future optimizations that need to change stride value cannot be enabled on 32-bit architectures. Suggested-by: Vlastimil Babka <vbabka@suse.cz> Reviewed-by: Suren Baghdasaryan <surenb@google.com> Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-6-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
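The idea of a stride can be illustrated outside the kernel. In the sketch below, struct obj_ext and obj_ext_at() are hypothetical stand-ins, not the kernel's struct slabobj_ext or slab_obj_ext(); the point is only that one lookup formula, base + stride * index, covers both a dense array (stride equal to the entry size) and entries embedded in objects (stride equal to the object size).

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a per-object extension record. */
struct obj_ext {
	uintptr_t data;
};

/*
 * Return the extension record for object 'index'.
 * 'base' points at the record for object 0 and 'stride' is the
 * distance in bytes between consecutive records.
 */
static struct obj_ext *obj_ext_at(void *base, size_t stride, size_t index)
{
	return (struct obj_ext *)((char *)base + stride * index);
}
```

With a separately allocated array, the caller passes stride = sizeof(struct obj_ext) and the function behaves like ordinary indexing. With records embedded at a fixed offset inside each object, the caller passes the address of the first embedded record as base and the object size as stride, and the lookup code stays the same.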
2026-02-04mm/slab: abstract slabobj_ext access via new slab_obj_ext() helperHarry Yoo3-32/+79
Currently, the slab allocator assumes that slab->obj_exts is a pointer to an array of struct slabobj_ext objects. However, to support storage methods where struct slabobj_ext is embedded within objects, the slab allocator should not make this assumption. Instead of directly dereferencing the slabobj_exts array, abstract access to struct slabobj_ext via helper functions. Introduce a new API for slabobj_ext metadata access: slab_obj_ext(slab, obj_exts, index) - returns the pointer to struct slabobj_ext element at the given index. Directly dereferencing the return value of slab_obj_exts() is no longer allowed. Instead, slab_obj_ext() must always be used to access individual struct slabobj_ext objects. Convert all users to use these APIs. No functional changes intended. Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-5-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04ext4: specify the free pointer offset for ext4_inode_cacheHarry Yoo1-6/+13
Convert ext4_inode_cache to use the kmem_cache_args interface and specify a free pointer offset. Since ext4_inode_cache uses a constructor, the free pointer would be placed after the object to prevent overwriting fields used by the constructor. However, some fields such as ->i_flags are not used by the constructor and can safely be repurposed for the free pointer. Specify the free pointer offset at i_flags to reduce the object size. Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-4-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: allow specifying free pointer offset when using constructorHarry Yoo3-17/+21
When a slab cache has a constructor, the free pointer is placed after the object because certain fields must not be overwritten even after the object is freed. However, some fields that the constructor does not initialize can safely be overwritten after free. Allow specifying the free pointer offset within the object, reducing the overall object size when some fields can be reused for the free pointer. Adjust the document accordingly. Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260113061845.159790-3-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: use unsigned long for orig_size to ensure proper metadata alignHarry Yoo1-7/+7
When both KASAN and SLAB_STORE_USER are enabled, accesses to struct kasan_alloc_meta fields can be misaligned on 64-bit architectures. This occurs because orig_size is currently defined as unsigned int, which only guarantees 4-byte alignment. When struct kasan_alloc_meta is placed after orig_size, it may end up at a 4-byte boundary rather than the required 8-byte boundary on 64-bit systems. Note that 64-bit architectures without HAVE_EFFICIENT_UNALIGNED_ACCESS are assumed to require 64-bit accesses to be 64-bit aligned. See HAVE_64BIT_ALIGNED_ACCESS and commit adab66b71abf ("Revert: "ring-buffer: Remove HAVE_64BIT_ALIGNED_ACCESS"") for more details. Change orig_size from unsigned int to unsigned long to ensure proper alignment for any subsequent metadata. This should not waste additional memory because kmalloc objects are already aligned to at least ARCH_KMALLOC_MINALIGN. Closes: https://lore.kernel.org/all/aPrLF0OUK651M4dk@hyeyoo Suggested-by: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: stable@vger.kernel.org Fixes: 6edf2576a6cc ("mm/slub: enable debugging memory wasting of kmalloc") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Closes: https://lore.kernel.org/all/aPrLF0OUK651M4dk@hyeyoo/ Link: https://patch.msgid.link/20260113061845.159790-2-harry.yoo@oracle.com Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
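The alignment problem described here comes down to offset arithmetic. The following user-space sketch assumes an LP64 target, where unsigned int is 4 bytes and unsigned long is 8 bytes, and assumes the next piece of metadata is packed directly after the orig_size field. The sizes are passed as plain numbers so the result does not depend on the machine the sketch is compiled on; the function names are illustrative only.

```c
#include <stddef.h>

/* Offset of whatever is packed directly after an orig_size field. */
static size_t next_meta_offset(size_t orig_size_off, size_t orig_size_bytes)
{
	return orig_size_off + orig_size_bytes;
}

/* Nonzero if 'off' is a multiple of 'align' (align is a power of two). */
static int is_aligned(size_t off, size_t align)
{
	return (off & (align - 1)) == 0;
}
```

If orig_size starts at an 8-byte-aligned offset such as 64, a 4-byte field pushes the following metadata to offset 68, which is only 4-byte aligned. An 8-byte field puts it at offset 72, which keeps 8-byte alignment for any 64-bit fields that follow.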
2026-02-04slub: clarify object field layout commentsHao Li1-33/+49
The comments above check_pad_bytes() document the field layout of a single object. Rewrite them to improve clarity and precision. Also update an outdated comment in calculate_sizes(). Suggested-by: Harry Yoo <harry.yoo@oracle.com> Acked-by: Harry Yoo <harry.yoo@oracle.com> Signed-off-by: Hao Li <hao.li@linux.dev> Link: https://patch.msgid.link/20251229122415.192377-1-hao.li@linux.dev Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04mm/slab: avoid allocating slabobj_ext array from its own slabHarry Yoo1-7/+53
When allocating slabobj_ext array in alloc_slab_obj_exts(), the array can be allocated from the same slab we're allocating the array for. This led to obj_exts_in_slab() incorrectly returning true [1], although the array is not allocated from wasted space of the slab. Vlastimil Babka observed that this problem should be fixed even when ignoring its incompatibility with obj_exts_in_slab(), because it creates slabs that are never freed as there is always at least one allocated object. To avoid this, use the next kmalloc size or large kmalloc when the array can be allocated from the same cache we're allocating the array for. In case of random kmalloc caches, there are multiple kmalloc caches for the same size and the cache is selected based on the caller address. Because it is fragile to ensure the same caller address is passed to kmalloc_slab(), kmalloc_noprof(), and kmalloc_node_noprof(), bump the size to (s->object_size + 1) when the sizes are equal, instead of directly comparing the kmem_cache pointers. Note that this doesn't happen when memory allocation profiling is disabled, as when the allocation of the array is triggered by memory cgroup (KMALLOC_CGROUP), the array is allocated from KMALLOC_NORMAL. Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202601231457.f7b31e09-lkp@intel.com [1] Cc: stable@vger.kernel.org Fixes: 4b8736964640 ("mm/slab: add allocation accounting into slab allocation and free paths") Signed-off-by: Harry Yoo <harry.yoo@oracle.com> Link: https://patch.msgid.link/20260126125714.88008-1-harry.yoo@oracle.com Reviewed-by: Hao Li <hao.li@linux.dev> Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
2026-02-04driver core: disable revocable code from buildGreg Kroah-Hartman3-3/+2
The revocable code is still under active discussion, and there are no in-kernel users of it. So disable it from the build for now so that no one suffers from it being present in the tree, yet leave it in the source tree so that others can easily test it by reverting this commit and building off of it for future releases. Acked-by: Danilo Krummrich <dakr@kernel.org> Link: https://patch.msgid.link/2026020307-rimmed-dreamy-5a67@gregkh Reviewed-by: Tzung-Bi Shih <tzungbi@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2026-02-04drm/mgag200: fix mgag200_bmc_stop_scanout()Jacob Keller2-19/+18
The mgag200_bmc_stop_scanout() function is called by the .atomic_disable() handler for the MGA G200 VGA BMC encoder. This function performs a few register writes to inform the BMC of an upcoming mode change, and then polls to wait until the BMC actually stops. The polling is implemented using a busy loop with udelay() and an iteration timeout of 300, resulting in the function blocking for 300 milliseconds. The function gets called ultimately by the output_poll_execute work thread for the DRM output change polling thread of the mgag200 driver: kworker/0:0-mm_ 3528 [000] 4555.315364: ffffffffaa0e25b3 delay_halt.part.0+0x33 ffffffffc03f6188 mgag200_bmc_stop_scanout+0x178 ffffffffc087ae7a disable_outputs+0x12a ffffffffc087c12a drm_atomic_helper_commit_tail+0x1a ffffffffc03fa7b6 mgag200_mode_config_helper_atomic_commit_tail+0x26 ffffffffc087c9c1 commit_tail+0x91 ffffffffc087d51b drm_atomic_helper_commit+0x11b ffffffffc0509694 drm_atomic_commit+0xa4 ffffffffc05105e8 drm_client_modeset_commit_atomic+0x1e8 ffffffffc0510ce6 drm_client_modeset_commit_locked+0x56 ffffffffc0510e24 drm_client_modeset_commit+0x24 ffffffffc088a743 __drm_fb_helper_restore_fbdev_mode_unlocked+0x93 ffffffffc088a683 drm_fb_helper_hotplug_event+0xe3 ffffffffc050f8aa drm_client_dev_hotplug+0x9a ffffffffc088555a output_poll_execute+0x29a ffffffffa9b35924 process_one_work+0x194 ffffffffa9b364ee worker_thread+0x2fe ffffffffa9b3ecad kthread+0xdd ffffffffa9a08549 ret_from_fork+0x29 On a server running ptp4l with the mgag200 driver loaded, we found that ptp4l would sometimes get blocked from execution because of this busy waiting loop. Every so often, approximately once every 20 minutes -- though with large variance -- the output_poll_execute() thread would detect some sort of change that required performing a hotplug event which results in attempting to stop the BMC scanout, resulting in a 300msec delay on one CPU. On this system, ptp4l was pinned to a single CPU. 
When the output_poll_execute() thread ran on that CPU, it blocked ptp4l from executing for its 300 millisecond duration. This resulted in PTP service disruptions such as failure to send a SYNC message on time, failure to handle ANNOUNCE messages on time, and clock check warnings from the application. All of this despite the application being configured with FIFO_RT and a higher priority than the background workqueue tasks. (However, note that the kernel did not use CONFIG_PREEMPT...) It is unclear if the event is due to a faulty VGA connection, another bug, or actual events causing a change in the connection. At least on the system under test it is not a one-time event and consistently causes disruption to the time sensitive applications. The function has some helpful comments explaining what steps it is attempting to take. In particular, steps 3a and 3b are explained as such: 3a - The third step is to verify if there is an active scan. We are waiting on a 0 on remhsyncsts (XSPAREREG<0>). 3b - This step occurs only if the remote is actually scanning. We are waiting for the end of the frame which is a 1 on remvsyncsts (XSPAREREG<1>). The actual steps 3a and 3b are implemented as while loops with a non-sleeping udelay(). The first step iterates while the tmp value at position 0 is *not* set. That is, it keeps iterating as long as the bit is zero. If the bit is already 0 (because there is no active scan), it will iterate the entire 300 attempts which wastes 300 milliseconds in total. This is the opposite of what the description claims. The step 3b logic only executes if we do not iterate over the entire 300 attempts in the first loop. If it does trigger, it is trying to check and wait for a 1 on the remvsyncsts. However, again the condition is actually inverted and it will loop as long as the bit is 1, stopping once it hits zero (rather than the explained attempt to wait until we see a 1). 
Worse, both loops are implemented using non-sleeping waits which spin instead of allowing the scheduler to run other processes. If the kernel is not configured to allow arbitrary preemption, it will waste valuable CPU time doing nothing. There does not appear to be any documentation for the BMC register interface, beyond what is in the comments here. It seems more probable that the comment here is correct and the implementation accidentally got inverted from the intended logic. Reading through other DRM driver implementations, it does not appear that the .atomic_enable or .atomic_disable handlers need to delay instead of sleep. For example, the ast_astdp_encoder_helper_atomic_disable() function calls ast_dp_set_phy_sleep() which uses msleep(). The "atomic" in the name is referring to the atomic modesetting support, which is the support to enable atomic configuration from userspace, and not to the "atomic context" of the kernel. There is no reason to use udelay() here if a sleep would be sufficient. Replace the while loops with a read_poll_timeout() based implementation that will sleep between iterations, and which stops polling once the condition is met (instead of looping as long as the condition is met). This aligns with the commented behavior and avoids blocking on the CPU while doing nothing. Note the RREG_DAC is implemented using a statement expression to allow working properly with the read_poll_timeout family of functions. The other RREG_<TYPE> macros ought to be cleaned up to have better semantics, and several places in the mgag200 driver could make use of RREG_DAC or similar RREG_* macros should likely be cleaned up for better semantics as well, but that task has been left as a future cleanup for a non-bugfix. 
Fixes: 414c45310625 ("mgag200: initial g200se driver (v2)") Suggested-by: Thomas Zimmermann <tzimmermann@suse.de> Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Reviewed-by: Thomas Zimmermann <tzimmermann@suse.de> Reviewed-by: Jocelyn Falempe <jfalempe@redhat.com> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de> Link: https://patch.msgid.link/20260202-jk-mgag200-fix-bad-udelay-v2-1-ce1e9665987d@intel.com
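The difference between "loop while the condition holds" and "poll until the condition is met" can be shown with a small user-space model. This is not the driver code and does not use the kernel's read_poll_timeout(); poll_reg(), struct fake_reg, and fake_read() are hypothetical names, the register is simulated, and the relax() hook merely marks the place where real code would sleep between reads.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint8_t (*read_reg_fn)(void *ctx);

/*
 * Poll until (reg & mask) == want, giving up after max_tries reads.
 * Returns 0 once the condition is met and -1 on timeout. relax() is
 * called between reads; kernel code would sleep there instead of
 * spinning.
 */
static int poll_reg(read_reg_fn read_reg, void *ctx, uint8_t mask,
		    uint8_t want, unsigned int max_tries,
		    void (*relax)(void))
{
	for (unsigned int i = 0; i < max_tries; i++) {
		if ((read_reg(ctx) & mask) == want)
			return 0;
		if (relax)
			relax();
	}
	return -1;
}

/* Simulated register: returns 'before' for the first flip_after reads. */
struct fake_reg {
	unsigned int reads;
	unsigned int flip_after;
	uint8_t before;
	uint8_t after;
};

static uint8_t fake_read(void *ctx)
{
	struct fake_reg *r = ctx;

	return r->reads++ < r->flip_after ? r->before : r->after;
}
```

Waiting for a 0 on bit 0 with this helper returns after a single read when the bit is already clear, which is the behavior the comment for step 3a describes. The inverted loop described in this commit message would instead have used all 300 iterations in exactly that case.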
2026-02-04Merge tag 'soc_fsl-6.20-1' of ↵Arnd Bergmann5-63/+233
https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux into soc/drivers FSL SOC Changes for 6.20 Freescale Management Complex: - Convert fsl-mc bus to bus callbacks - Fix a use-after-free - Drop redundant error messages - Fix resource release on some error paths Freescale QUICC Engine: - Add an interrupt controller for IO Ports - Use scoped for-each OF child loop * tag 'soc_fsl-6.20-1' of https://git.kernel.org/pub/scm/linux/kernel/git/chleroy/linux: bus: fsl-mc: fix an error handling in fsl_mc_device_add() soc: fsl: qe: qe_ports_ic: Consolidate chained IRQ handler install/remove dt-bindings: soc: fsl: qe: Add an interrupt controller for QUICC Engine Ports soc: fsl: qe: Add an interrupt controller for QUICC Engine Ports soc: fsl: qe: Simplify with scoped for each OF child loop bus: fsl-mc: fix use-after-free in driver_override_show() bus: fsl-mc: Convert to bus callbacks bus: fsl-mc: Drop error message in probe function Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-02-04Merge tag 'socfpga_dts_updates_for_v6.20_v3' of ↵Arnd Bergmann14-109/+336
git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux into soc/dt SoCFPGA DTS updates for v6.20, version 3 - dt-bindings updates: - Add intel,socfpga-agilex5-socdk-modular for the Agilex5 mod board - Add intel,socfpga-agilex-emmc for the Agilex eMMC daughter board - Move entries in intel,socfpga.yaml into altera.yaml - Add syscon as a fallback for sys-mgr - Add dma-coherent property for Agilex5 NAND and DMA - Add support for the Agilex5 modular board - Add IOMMUS property for ethernet nodes for Agilex5 - Use lowercase hex for dts files - Add #address-cells and #size-cells for sram - Fix dtbs_check warning for fpga-region - Move dma controller node for Agilex5 under simple-bus - Add support for the Agilex eMMC daughter board * tag 'socfpga_dts_updates_for_v6.20_v3' of git://git.kernel.org/pub/scm/linux/kernel/git/dinguyen/linux: dt-bindings: intel: Add Agilex eMMC support arm64: dts: socfpga: agilex: add emmc support arm64: dts: intel: agilex5: Add simple-bus node on top of dma controller node ARM: dts: socfpga: fix dtbs_check warning for fpga-region ARM: dts: socfpga: add #address-cells and #size-cells for sram node dt-bindings: altera: document syscon as fallback for sys-mgr arm64: dts: altera: Use lowercase hex dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml arm64: dts: socfpga: agilex5: Add IOMMUS property for ethernet nodes arm64: dts: socfpga: agilex5: add support for modular board dt-bindings: intel: Add Agilex5 SoCFPGA modular board arm64: dts: socfpga: agilex5: Add dma-coherent property Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2026-02-04Merge branch 'tape-devices'Heiko Carstens12-3195/+845
Jan Höppner says: ==================== Quite a lot of the tape device driver code is outdated as devices and storage systems supported by that code aren't supported by IBM anymore for a long time. Especially physical tape devices are not supported or used directly anymore. The only tape storage system supported by IBM is the Virtual Tape Server (VTS) family with TS7700 systems [1]. Host systems will only talk to VTS and are presented with the virtualized 3490E tape device type only. VTS can and still uses tape libraries with physical 3592 cartridges as storage backends (e.g. TS4500). However, these are never seen by any host. The general goal/idea for the tape device driver is to only support VTS from now on. This series gets rid of old outdated code that is not relevant to VTS. There is probably quite a bit more that could be cleaned up or could be improved. However, this is a first run to cleanup the code base and somewhat reduce maintenance burden. [1] https://www.ibm.com/products/ts7700 [2] https://www.ibm.com/products/ts4500 ==================== Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04s390/tape: Rename tape_34xx.c to tape_3490.cJan Höppner3-132/+132
The driver now exclusively supports 3490 tape devices, given support for 3480 tape devices has been removed. Update the device driver name, its source file name, and change any occurrences of "34xx/34XX" to "3490" in the source code and comments. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04s390/tape: Cleanup sense data analysis and error handlingJan Höppner1-120/+0
Quite a few Error Recovery Action (ERA) codes and sense data entries are not relevant anymore for the Virtual Tape Server (VTS) and are not being used by VTS. Most of them were relevant for actual physical errors when a tape cartridge got stuck or a tape didn't rewind properly for example. Remove these codes from the sense data analysis as it's dead code anyway. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04s390/tape: Remove 3480 tape device typeJan Höppner4-61/+26
The only supported device type by the Virtual Tape Server is 3490. The 3480 device type was an old physical tape model and doesn't exist anymore. Remove 3480 from the list and any mention of it. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04s390/tape: Remove unused command definitionsJan Höppner1-33/+5
Quite a few command definitions are either not used or don't exist for 3490 tape devices on Virtual Tape Servers (VTS). Clean up the list, which makes it easier to understand what commands are actually implemented by the driver. The lists below outline the exact reason for the removal. Descriptions of existing commands are adapted to reflect the removal of support for old device types.

The following commands don't exist in VTS for 3490 devices and are unused:

  INVALID_00            0x00
  DIAG_MODE_SET         0x0B
  FORCE_STREAM_CNT      0xEB
  LOOP_WRITE_TO_READ    0x8B
  MODE_SET_C3           0xC3
  MODE_SET_CB           0xCB
  MODE_SET_D3           0xD3
  NEW_MODE_SET          0xEB
  RELEASE               0xD4
  REQ_TRK_IN_ERROR      0x1B
  RESERVE               0xF4
  SET_DIAGNOSE          0x4B

The following command definitions are not used:

  CONTROL_ACCESS        0xE3
  PERF_SUBSYS_FUNC      0x77
  READ_BACKWARD         0x0C
  READ_BUFFER           0x12
  READ_BUFF_LOG         0x24
  READ_CONFIG_DATA      0xFA
  READ_DEV_CHAR         0x64
  READ_MESSAGE_ID       0x4E
  READ_SUBSYS_DATA      0x3E
  SENSE_GROUP_ID        0x34
  SENSE_ID              0xE4
  SET_GROUP_ID          0xAF
  SET_INTERFACE_ID      0x73
  SET_TAPE_WRITE_IMMED  0xC3
  SUSPEND               0x5B
  SYNC                  0x43

Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com>
Reviewed-by: Jens Remus <jremus@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04s390/tape: Remove special block id handlingJan Höppner2-202/+2
For real 3490 tape models a logical block was composed of a direction bit (wrap), a segment number, a format mode, and a logical block number. This is represented in a 4byte identifier. The Virtual Tape Server (VTS) emulates 3490 tape devices and uses a stripped block id format where bit 0-9 of the 4byte identifier are always 0. Bit 10-31 represent the logical block number. All tapes use the 3480-2 XF format, which was defined via TAPE34XX_FMT_3480_2_XF but never used. There is also no special handling required anymore as this is the only format being used. Since VTS doesn't require any special handling of block ids and format, corresponding code is removed. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
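The stripped block id layout can be expressed as a single mask. The sketch below assumes the bit numbering used in this message follows the IBM convention, where bit 0 is the most significant bit of the 4-byte identifier, so bits 10-31 are the low 22 bits. The macro and function names are invented for illustration and are not taken from the driver.

```c
#include <stdint.h>

/* Bits 10-31 (IBM numbering, bit 0 = MSB) are the low 22 bits. */
#define BLOCK_NR_BITS	22u
#define BLOCK_NR_MASK	((UINT32_C(1) << BLOCK_NR_BITS) - 1)	/* 0x003fffff */

/* Extract the logical block number from a 4-byte block id. */
static uint32_t block_id_to_block_nr(uint32_t block_id)
{
	return block_id & BLOCK_NR_MASK;
}

/* Nonzero if bits 0-9 are all zero, as expected for the stripped format. */
static int block_id_is_stripped(uint32_t block_id)
{
	return (block_id & ~BLOCK_NR_MASK) == 0;
}
```

Because the upper ten bits are always zero in this format, the block id and the logical block number are numerically identical, which is consistent with the commit removing the special handling rather than replacing it.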
2026-02-04s390/tape: Remove tape load display supportJan Höppner6-99/+1
The LOAD_DISPLAY (LDD) X'9F' is still accepted by the Virtual Tape Server (VTS) but does not perform any action. Remove all functions and definitions related to this command. The tape_34xx_ioctl() function is also removed as it was mainly used to handle additional ioctl functionality. LOAD_DISPLAY was the only left case. All other ioctls are handled in tapechar_ioctl(). With LOAD_DISPLAY, the remaining definitions in asm/tape390.h are gone. Delete the file. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04s390/tape: Remove support for 3590/3592 modelsJan Höppner8-1871/+2
Physical 3590/3592 tape models have not been supported for a very long time. The Virtual Tape Server (VTS) emulates and presents only 3490E models to the host. This is the only supported model and storage server. Remove the entire code base for 3590/3592 models as it can be considered dead code for quite some time already. Signed-off-by: Jan Höppner <hoeppner@linux.ibm.com> Reviewed-by: Jens Remus <jremus@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2026-02-04x86/hyperv: Update comment in hyperv_cleanup()Michael Kelley1-3/+7
The comment in hyperv_cleanup() became out-of-date as a result of commit c8ed0812646e ("x86/hyperv: Use direct call to hypercall-page"). Update the comment. No code or functional change. Signed-off-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: clear eventfd counter on irqfd shutdownCarlos López2-4/+2
While unhooking from the irqfd waitqueue, clear the internal eventfd counter by using eventfd_ctx_remove_wait_queue() instead of remove_wait_queue(), preventing potential spurious interrupts. This removes the need to store a pointer into the workqueue, as the eventfd already keeps track of it. This mimics what other similar subsystems do on their equivalent paths with their irqfds (KVM, Xen, ACRN support, etc). Signed-off-by: Carlos López <clopez@suse.de> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04x86/hyperv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()Michael Kelley1-3/+3
When running with a paravisor and SEV-SNP, the GHCB page is provided by the paravisor instead of being allocated by Linux. The provided page is normal memory, but is outside of the physical address space seen by Linux. As such it cannot be accessed via the kernel's direct map, and must be explicitly mapped to a kernel virtual address. Current code uses ioremap_cache() and iounmap() to map and unmap the page. These functions are for use on I/O address space that may not behave as normal memory, so they generate or expect addresses with the __iomem attribute. For normal memory, the preferred functions are memremap() and memunmap(), which operate similarly but without __iomem. At the time of the original work on CoCo VMs on Hyper-V, memremap() did not support creating a decrypted mapping, so ioremap_cache() was used instead, since I/O address space is always mapped decrypted. memremap() has since been enhanced to allow decrypted mappings, so replace ioremap_cache() with memremap() when mapping the GHCB page. Similarly, replace iounmap() with memunmap(). As a side benefit, the replacement cleans up 'sparse' warnings about __iomem mismatches. The replacement is done to use the correct functions as long-term goodness and to clean up the sparse warnings. No runtime bugs are fixed. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202311111925.iPGGJik4-lkp@intel.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04Drivers: hv: Use memremap()/memunmap() instead of ioremap_cache()/iounmap()Michael Kelley1-6/+6
When running with a paravisor or in the root partition, the SynIC event and message pages are provided by the paravisor or hypervisor respectively, instead of being allocated by Linux. The provided pages are normal memory, but are outside of the physical address space seen by Linux. As such they cannot be accessed via the kernel's direct map, and must be explicitly mapped to a kernel virtual address. Current code uses ioremap_cache() and iounmap() to map and unmap the pages. These functions are for use on I/O address space that may not behave as normal memory, so they generate or expect addresses with the __iomem attribute. For normal memory, the preferred functions are memremap() and memunmap(), which operate similarly but without __iomem. At the time of the original work on CoCo VMs on Hyper-V, memremap() did not support creating a decrypted mapping, so ioremap_cache() was used instead, since I/O address space is always mapped decrypted. memremap() has since been enhanced to allow decrypted mappings, so replace ioremap_cache() with memremap() when mapping the event and message pages. Similarly, replace iounmap() with memunmap(). As a side benefit, the replacement cleans up 'sparse' warnings about __iomem mismatches. The replacement is done to use the correct functions as long-term goodness and to clean up the sparse warnings. No runtime bugs are fixed. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601170445.JtZQwndW-lkp@intel.com/ Closes: https://lore.kernel.org/oe-kbuild-all/202512150359.fMdmbddk-lkp@intel.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04x86/hyperv: Move hv crash init after hypercall pg setupMukesh R1-1/+3
hv_root_crash_init() does not set up hypervisor crash collection for bare-metal cases because, when it is called, the hypercall page is not yet set up. The fix is simple: move the crash init call after the hypercall page setup. Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04PCI: hv: remove unnecessary module_init/exit functionsEthan Nelson-Moore1-12/+0
The pci-hyperv-intf driver has unnecessary empty module_init and module_exit functions. Remove them. Note that if a module_init function exists, a module_exit function must also exist; otherwise, the module cannot be unloaded. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Add debugfs to view hypervisor statisticsNuno Das Neves4-2/+785
Introduce a debugfs interface to expose root and child partition stats when running with mshv_root. Create a debugfs directory "mshv" containing 'stats' files organized by type and id. A stats file contains a number of counters depending on its type, e.g. an excerpt from a VP stats file:

    TotalRunTime             : 1997602722
    HypervisorRunTime        : 649671371
    RemoteNodeRunTime        : 0
    NormalizedRunTime        : 1997602721
    IdealCpu                 : 0
    HypercallsCount          : 1708169
    HypercallsTime           : 111914774
    PageInvalidationsCount   : 0
    PageInvalidationsTime    : 0

On a root partition with some active child partitions, the entire directory structure may look like:

    mshv/
      stats             # hypervisor stats
      lp/               # logical processors
        0/              # LP id
          stats         # LP 0 stats
        1/
        2/
        3/
      partition/        # partition stats
        1/              # root partition id
          stats         # root partition stats
          vp/           # root virtual processors
            0/          # root VP id
              stats     # root VP 0 stats
            1/
            2/
            3/
        42/             # child partition id
          stats         # child partition stats
          vp/           # child VPs
            0/          # child VP id
              stats     # child VP 0 stats
            1/
        43/
        55/

On L1VH, some stats are not present as it does not own the hardware like the root partition does:
- The hypervisor and lp stats are not present
- L1VH's partition directory is named "self" because it can't get its own id
- Some of L1VH's partition and VP stats fields are not populated, because it can't map its own HV_STATS_AREA_PARENT page.
Co-developed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Co-developed-by: Praveen K Paladugu <prapal@linux.microsoft.com> Signed-off-by: Praveen K Paladugu <prapal@linux.microsoft.com> Co-developed-by: Mukesh Rathor <mrathor@linux.microsoft.com> Signed-off-by: Mukesh Rathor <mrathor@linux.microsoft.com> Co-developed-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Co-developed-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Jinank Jain <jinankjain@microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Add data for printing stats page countersNuno Das Neves1-0/+490
Introduce mshv_debugfs_counters.c, containing static data corresponding to HV_*_COUNTER enums in the hypervisor source. Defining the enum members as an array instead makes more sense, since it will be iterated over to print counter information to debugfs. Include hypervisor, logical processor, partition, and virtual processor counters. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
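The reasoning for using an array instead of an enum can be illustrated with a minimal userspace sketch. It is not the code from mshv_debugfs_counters.c: the table name `vp_counter_names`, the helper `format_counters()`, and the slot indices are all hypothetical, and only the counter names are taken from the VP stats excerpt in the debugfs commit. The point is that a name table indexed by counter slot can be walked in one loop, which an enum alone cannot offer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof((a)[0]))

/*
 * Hypothetical name table: the array index is the counter's slot in the
 * stats page. The indices here are invented for illustration; slot 0 is
 * deliberately left unnamed to show how gaps are skipped.
 */
static const char *const vp_counter_names[] = {
	[1] = "TotalRunTime",
	[2] = "HypervisorRunTime",
	[3] = "RemoteNodeRunTime",
};

/*
 * Format every named counter as "Name : value" lines into buf.
 * data must hold at least ARRAY_SIZE(vp_counter_names) entries.
 * Returns the number of counters written.
 */
static int format_counters(char *buf, size_t len, const uint64_t *data)
{
	size_t off = 0;
	int written = 0;

	assert(buf != NULL && data != NULL);

	for (size_t i = 0; i < ARRAY_SIZE(vp_counter_names); i++) {
		int n;

		if (!vp_counter_names[i])
			continue;	/* unnamed slot, nothing to print */

		n = snprintf(buf + off, len - off, "%-24s : %llu\n",
			     vp_counter_names[i],
			     (unsigned long long)data[i]);
		if (n < 0 || (size_t)n >= len - off)
			break;		/* out of space, stop early */

		off += (size_t)n;
		written++;
	}
	return written;
}
```

Adding a counter in this layout means adding one table entry; the printing loop does not change.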
2026-02-04mshv: Update hv_stats_page definitionsNuno Das Neves2-19/+15
hv_stats_page belongs in hvhdk.h, move it there. It does not require a union to access the data for different counters, just use a single u64 array for simplicity and to match the Windows definitions. While at it, correct the ARM64 value for VpRootDispatchThreadBlocked. Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Always map child vp stats pages regardless of scheduler typeStanislav Kinsburskii1-17/+8
Currently vp->vp_stats_pages is only used by the root scheduler for fast interrupt injection. Soon, vp_stats_pages will also be needed for exposing child VP stats to userspace via debugfs. Mapping the pages a second time to a different address causes an error on L1VH. Remove the scheduler requirement and always map the vp stats pages. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Improve mshv_vp_stats_map/unmap(), add them to mshv_root.hStanislav Kinsburskii2-17/+54
These functions are currently only used to map child partition VP stats, on root partition. However, they will soon be used on L1VH, and also used for mapping the host's own VP stats. Introduce a helper is_l1vh_parent() to determine whether we are mapping our own VP stats. In this case, do not attempt to map the PARENT area. Note this is a different case than mapping PARENT on an older hypervisor where it is not available at all, so must be handled separately. On unmap, pass the stats pages since on L1VH the kernel allocates them and they must be freed in hv_unmap_stats_page(). Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Use typed hv_stats_page pointersStanislav Kinsburskii3-11/+14
Refactor all relevant functions to use struct hv_stats_page pointers instead of void pointers for stats page mapping and unmapping, thus improving type safety and code clarity across the Hyper-V stats mapping APIs. Signed-off-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Ignore second stats page map result failurePurna Pavan Chandra Aekkaladevi2-4/+51
Older versions of the hypervisor do not have a concept of separate SELF and PARENT stats areas. In this case, mapping the HV_STATS_AREA_SELF page is sufficient - it's the only page and it contains all available stats. Mapping HV_STATS_AREA_PARENT returns HV_STATUS_INVALID_PARAMETER which currently causes module init to fail on older hypervisor versions. Detect this case and gracefully fall back to populating stats_pages[HV_STATS_AREA_PARENT] with the already-mapped SELF page. Add comments to clarify the behavior, including a clarification of why this isn't needed for hv_call_map_stats_page2() which always supports PARENT and SELF areas. Signed-off-by: Purna Pavan Chandra Aekkaladevi <paekkaladevi@linux.microsoft.com> Signed-off-by: Nuno Das Neves <nunodasneves@linux.microsoft.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Acked-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Reviewed-by: Michael Kelley <mhklinux@outlook.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
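The fallback described above might look roughly like the following userspace sketch. Everything in it is a stand-in: `map_stats_pages()`, the `map_fn` callback, `old_hv_map()`, and the numeric status values are hypothetical and only mimic the shape of the logic, with the hypercall replaced by a mock that behaves like an older hypervisor.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the HV_STATS_AREA_* and HV_STATUS_* values. */
enum stats_area { AREA_SELF = 0, AREA_PARENT = 1, AREA_COUNT };

#define STATUS_SUCCESS			0
#define STATUS_INVALID_PARAMETER	5	/* illustrative value only */

/* Mock signature for "map one stats area and return its address". */
typedef int (*map_fn)(enum stats_area area, void **page);

/*
 * Map SELF first, then try PARENT. If PARENT is rejected with
 * "invalid parameter", assume an older hypervisor with a single stats
 * area and reuse the SELF mapping for the PARENT slot.
 */
static int map_stats_pages(map_fn map, void *pages[AREA_COUNT])
{
	int status;

	assert(map != NULL);

	status = map(AREA_SELF, &pages[AREA_SELF]);
	if (status != STATUS_SUCCESS)
		return status;

	status = map(AREA_PARENT, &pages[AREA_PARENT]);
	if (status == STATUS_INVALID_PARAMETER) {
		pages[AREA_PARENT] = pages[AREA_SELF];
		return STATUS_SUCCESS;
	}

	/* Any other failure is returned; real code would also unmap SELF. */
	return status;
}

/* Mock of an older hypervisor: only the SELF area can be mapped. */
static char self_page[4096];

static int old_hv_map(enum stats_area area, void **page)
{
	if (area == AREA_PARENT)
		return STATUS_INVALID_PARAMETER;
	*page = self_page;
	return STATUS_SUCCESS;
}
```

With this shape, callers can always read `pages[AREA_PARENT]` without checking which hypervisor generation they run on, at the cost of needing care on unmap so the shared page is not released twice.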
2026-02-04mshv: Use EPOLLIN and EPOLLHUP instead of POLLIN and POLLHUPMichael Kelley1-4/+4
mshv code currently uses the POLLIN and POLLHUP flags. Starting with commit a9a08845e9acb ("vfs: do bulk POLL* -> EPOLL* replacement") the intent is to use the EPOLL* versions throughout the kernel. The comment at the top of mshv_eventfd.c describes it as being inspired by the KVM implementation, which was changed by the above mentioned commit in 2018 to use EPOLL*. mshv_eventfd.c is much newer than 2018 and there's no statement as to why it must use the POLL* versions. So change it to use the EPOLL* versions. This change also resolves a 'sparse' warning. No functional change, and the generated code is the same. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601220948.MUTO60W4-lkp@intel.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Stanislav Kinsburskii <skinsburskii@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
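One way to see why the generated code can stay the same is that the two flag families share numeric values. The sketch below checks this from userspace with the Linux libc headers, which is an assumption of the example; in the kernel the difference is the `__poll_t` type that 'sparse' checks, which userspace headers do not model. The helper name `check_poll_flags()` is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <sys/epoll.h>

/*
 * Verify that the POLL* and EPOLL* flags used by the commit have the
 * same bit values in the userspace headers. Returns 1 when they match;
 * the asserts abort otherwise.
 */
static int check_poll_flags(void)
{
	assert((unsigned int)EPOLLIN == (unsigned int)POLLIN);
	assert((unsigned int)EPOLLHUP == (unsigned int)POLLHUP);
	return 1;
}
```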
2026-02-04x86/hyperv: fix a compiler warning in hv_crash.cMukesh R1-2/+1
Fix a compiler warning that status is defined but not used. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202512301641.FC6OAbGM-lkp@intel.com/ Signed-off-by: Mukesh R <mrathor@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
2026-02-04mshv: Fix compiler warning about cast converting incompatible function typeMichael Kelley1-2/+3
In mshv_vtl_sint_ioctl_pause_msg_stream(), the reference to function mshv_vtl_synic_mask_vmbus_sint() is cast to type smp_call_func_t. The cast generates a compiler warning because the function signature of mshv_vtl_synic_mask_vmbus_sint() doesn't match smp_call_func_t. There's no actual bug here because the mis-matched function signatures are compatible at runtime. Nonetheless, eliminate the compiler warning by changing the function signature of mshv_vtl_synic_mask_vmbus_sint() to match what on_each_cpu() expects. Remove the cast because it is then no longer necessary. No functional change. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202601170352.qbh3EKH5-lkp@intel.com/ Signed-off-by: Michael Kelley <mhklinux@outlook.com> Reviewed-by: Naman Jain <namjain@linux.microsoft.com> Signed-off-by: Wei Liu <wei.liu@kernel.org>
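The pattern of matching the callback signature instead of casting can be sketched in userspace as follows. `call_func_t` has the same shape as the kernel's `smp_call_func_t` (a function taking `void *` and returning nothing); `run_on_each()`, `struct mask_request`, and `mask_sint()` are hypothetical stand-ins for on_each_cpu() and the mshv function, not the actual driver code.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the kernel's smp_call_func_t. */
typedef void (*call_func_t)(void *info);

/* Stand-in for on_each_cpu(): calls fn once per simulated CPU. */
static void run_on_each(call_func_t fn, void *info, int ncpus)
{
	assert(fn != NULL);
	for (int cpu = 0; cpu < ncpus; cpu++)
		fn(info);
}

/* Hypothetical argument block passed through the void pointer. */
struct mask_request {
	int mask;	/* requested mask state */
	int calls;	/* how many times the callback ran */
};

/*
 * The callback takes void * so that it matches call_func_t exactly,
 * then converts to its real argument type inside the body. No cast of
 * the function pointer is needed at the call site.
 */
static void mask_sint(void *info)
{
	struct mask_request *req = info;

	req->calls++;
	(void)req->mask;	/* a real callback would apply the mask here */
}
```

A call such as `run_on_each(mask_sint, &req, 4)` then compiles without a cast, so the compiler can still check that the callback really has the expected type.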