aboutsummaryrefslogtreecommitdiff
path: root/drivers/base/memory.c
AgeCommit message (Collapse)AuthorFilesLines
2025-12-05Merge tag 'mm-stable-2025-12-03-21-26' of ↵Linus Torvalds1-3/+3
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: "__vmalloc()/kvmalloc() and no-block support" (Uladzislau Rezki) Rework the vmalloc() code to support non-blocking allocations (GFP_ATOIC, GFP_NOWAIT) "ksm: fix exec/fork inheritance" (xu xin) Fix a rare case where the KSM MMF_VM_MERGE_ANY prctl state is not inherited across fork/exec "mm/zswap: misc cleanup of code and documentations" (SeongJae Park) Some light maintenance work on the zswap code "mm/page_owner: add debugfs files 'show_handles' and 'show_stacks_handles'" (Mauricio Faria de Oliveira) Enhance the /sys/kernel/debug/page_owner debug feature by adding unique identifiers to differentiate the various stack traces so that userspace monitoring tools can better match stack traces over time "mm/page_alloc: pcp->batch cleanups" (Joshua Hahn) Minor alterations to the page allocator's per-cpu-pages feature "Improve UFFDIO_MOVE scalability by removing anon_vma lock" (Lokesh Gidra) Address a scalability issue in userfaultfd's UFFDIO_MOVE operation "kasan: cleanups for kasan_enabled() checks" (Sabyrzhan Tasbolatov) "drivers/base/node: fold node register and unregister functions" (Donet Tom) Clean up the NUMA node handling code a little "mm: some optimizations for prot numa" (Kefeng Wang) Cleanups and small optimizations to the NUMA allocation hinting code "mm/page_alloc: Batch callers of free_pcppages_bulk" (Joshua Hahn) Address long lock hold times at boot on large machines. These were causing (harmless) softlockup warnings "optimize the logic for handling dirty file folios during reclaim" (Baolin Wang) Remove some now-unnecessary work from page reclaim "mm/damon: allow DAMOS auto-tuned for per-memcg per-node memory usage" (SeongJae Park) Enhance the DAMOS auto-tuning feature "mm/damon: fixes for address alignment issues in DAMON_LRU_SORT and DAMON_RECLAIM" (Quanmin Yan) Fix DAMON_LRU_SORT and DAMON_RECLAIM with certain userspace configuration "expand mmap_prepare functionality, port more users" (Lorenzo Stoakes) Enhance the new(ish) file_operations.mmap_prepare() method and port additional callsites from the old ->mmap() over to ->mmap_prepare() "Fix stale IOTLB entries for kernel address space" (Lu Baolu) Fix a bug (and possible security issue on non-x86) in the IOMMU code. In some situations the IOMMU could be left hanging onto a stale kernel pagetable entry "mm/huge_memory: cleanup __split_unmapped_folio()" (Wei Yang) Clean up and optimize the folio splitting code "mm, swap: misc cleanup and bugfix" (Kairui Song) Some cleanups and a minor fix in the swap discard code "mm/damon: misc documentation fixups" (SeongJae Park) "mm/damon: support pin-point targets removal" (SeongJae Park) Permit userspace to remove a specific monitoring target in the middle of the current targets list "mm: MISC follow-up patches for linux/pgalloc.h" (Harry Yoo) A couple of cleanups related to mm header file inclusion "mm/swapfile.c: select swap devices of default priority round robin" (Baoquan He) improve the selection of swap devices for NUMA machines "mm: Convert memory block states (MEM_*) macros to enums" (Israel Batista) Change the memory block labels from macros to enums so they will appear in kernel debug info "ksm: perform a range-walk to jump over holes in break_ksm" (Pedro Demarchi Gomes) Address an inefficiency when KSM unmerges an address range "mm/damon/tests: fix memory bugs in kunit tests" (SeongJae Park) Fix leaks and unhandled malloc() failures in DAMON userspace unit tests "some cleanups for pageout()" (Baolin Wang) Clean up a couple of minor things in the page scanner's writeback-for-eviction code "mm/hugetlb: refactor sysfs/sysctl interfaces" (Hui Zhu) Move hugetlb's sysfs/sysctl handling code into a new file "introduce VM_MAYBE_GUARD and make it sticky" (Lorenzo Stoakes) Make the VMA guard regions available in /proc/pid/smaps and improves the mergeability of guarded VMAs "mm: perform guard region install/remove under VMA lock" (Lorenzo Stoakes) Reduce mmap lock contention for callers performing VMA guard region operations "vma_start_write_killable" (Matthew Wilcox) Start work on permitting applications to be killed when they are waiting on a read_lock on the VMA lock "mm/damon/tests: add more tests for online parameters commit" (SeongJae Park) Add additional userspace testing of DAMON's "commit" feature "mm/damon: misc cleanups" (SeongJae Park) "make VM_SOFTDIRTY a sticky VMA flag" (Lorenzo Stoakes) Address the possible loss of a VMA's VM_SOFTDIRTY flag when that VMA is merged with another "mm: support device-private THP" (Balbir Singh) Introduce support for Transparent Huge Page (THP) migration in zone device-private memory "Optimize folio split in memory failure" (Zi Yan) "mm/huge_memory: Define split_type and consolidate split support checks" (Wei Yang) Some more cleanups in the folio splitting code "mm: remove is_swap_[pte, pmd]() + non-swap entries, introduce leaf entries" (Lorenzo Stoakes) Clean up our handling of pagetable leaf entries by introducing the concept of 'software leaf entries', of type softleaf_t "reparent the THP split queue" (Muchun Song) Reparent the THP split queue to its parent memcg. This is in preparation for addressing the long-standing "dying memcg" problem, wherein dead memcg's linger for too long, consuming memory resources "unify PMD scan results and remove redundant cleanup" (Wei Yang) A little cleanup in the hugepage collapse code "zram: introduce writeback bio batching" (Sergey Senozhatsky) Improve zram writeback efficiency by introducing batched bio writeback support "memcg: cleanup the memcg stats interfaces" (Shakeel Butt) Clean up our handling of the interrupt safety of some memcg stats "make vmalloc gfp flags usage more apparent" (Vishal Moola) Clean up vmalloc's handling of incoming GFP flags "mm: Add soft-dirty and uffd-wp support for RISC-V" (Chunyan Zhang) Teach soft dirty and userfaultfd write protect tracking to use RISC-V's Svrsw60t59b extension "mm: swap: small fixes and comment cleanups" (Youngjun Park) Fix a small bug and clean up some of the swap code "initial work on making VMA flags a bitmap" (Lorenzo Stoakes) Start work on converting the vma struct's flags to a bitmap, so we stop running out of them, especially on 32-bit "mm/swapfile: fix and cleanup swap list iterations" (Youngjun Park) Address a possible bug in the swap discard code and clean things up a little [ This merge also reverts commit ebb9aeb980e5 ("vfio/nvgrace-gpu: register device memory for poison handling") because it looks broken to me, I've asked for clarification - Linus ] * tag 'mm-stable-2025-12-03-21-26' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (321 commits) mm: fix vma_start_write_killable() signal handling mm/swapfile: use plist_for_each_entry in __folio_throttle_swaprate mm/swapfile: fix list iteration when next node is removed during discard fs/proc/task_mmu.c: fix make_uffd_wp_huge_pte() huge pte handling mm/kfence: add reboot notifier to disable KFENCE on shutdown memcg: remove inc/dec_lruvec_kmem_state helpers selftests/mm/uffd: initialize char variable to Null mm: fix DEBUG_RODATA_TEST indentation in Kconfig mm: introduce VMA flags bitmap type tools/testing/vma: eliminate dependency on vma->__vm_flags mm: simplify and rename mm flags function for clarity mm: declare VMA flags by bit zram: fix a spelling mistake mm/page_alloc: optimize lowmem_reserve max lookup using its semantic monotonicity mm/vmscan: skip increasing kswapd_failures when reclaim was boosted pagemap: update BUDDY flag documentation mm: swap: remove scan_swap_map_slots() references from comments mm: swap: change swap_alloc_slow() to void mm, swap: remove redundant comment for read_swap_cache_async mm, swap: use SWP_SOLIDSTATE to determine if swap is rotational ...
2025-11-16mm: change type of parameter for memory_notifyIsrael Batista1-2/+2
memory_notify() is responsible for sending events related to memory hotplugging to a notification queue. Since all the events must match one of the values from the enum memory_block_state, it is appropriate to change the function parameter type to make this condition explicit at compile time. Link: https://lkml.kernel.org/r/20251029195617.2210700-4-linux@israelbatista.dev.br Signed-off-by: Israel Batista <linux@israelbatista.dev.br> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-11-16mm: change type of state in struct memory_blockIsrael Batista1-1/+1
The state of a memory block should be restricted to values specified in the documentation of the memory hotplug API. However, since the state field in the memory_block struct was defined as an unsigned long, this restriction was not enforced at compile time. With the introduction of the enum memory_block_state, it is now possible to incorporate the desired semantics in the field declaration and enforce these restrictions at compile time. [akpm@linux-foundation.org: fix whitespace, per Randy] Link: https://lkml.kernel.org/r/20251029195617.2210700-3-linux@israelbatista.dev.br Signed-off-by: Israel Batista <linux@israelbatista.dev.br> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Reviewed-by: Lorenzo Stoakes <lorenzo.stoakes@oracle.com> Cc: Omar Sandoval <osandov@osandov.com> Cc: Randy Dunlap <rdunlap@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-14mm/memory_hotplug: Remove MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiersSumanth Korikkar1-22/+1
MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers were introduced to prepare the transition of memory to and from a physically accessible state. This enhancement was crucial for implementing the "memmap on memory" feature for s390. With introduction of dynamic (de)configuration of hotpluggable memory, memory can be brought to accessible state before add_memory(). Memory can be brought to inaccessible state before remove_memory(). Hence, there is no need of MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers anymore. This basically reverts commit c5f1e2d18909 ("mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiers") Additionally, apply minor adjustments to the function parameters of move_pfn_range_to_zone() and mhp_supports_memmap_on_memory() to ensure compatibility with the latest branch. Acked-by: David Hildenbrand <david@redhat.com> Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2025-10-03drivers/base: move memory_block_add_nid() into the callerHannes Reinecke1-18/+18
Now the node id only needs to be set for early memory, so move memory_block_add_nid() into the caller and rename it into memory_block_add_nid_early(). This allows us to further simplify the code by dropping the 'context' argument to do_register_memory_block_under_node(). Link: https://lkml.kernel.org/r/20250729064637.51662-4-hare@kernel.org Suggested-by: David Hildenbrand <david@redhat.com> Signed-off-by: Hannes Reinecke <hare@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-03mm/memory_hotplug: activate node before adding new memory blocksHannes Reinecke1-2/+2
The sysfs attributes for memory blocks require the node ID to be set and initialized, so move the node activation before adding new memory blocks. This also has the nice side effect that the BUG_ON() can be converted into a WARN_ON() as we now can handle registration errors. Link: https://lkml.kernel.org/r/20250729064637.51662-3-hare@kernel.org Fixes: b9ff036082cd ("mm/memory_hotplug.c: make add_memory_resource use __try_online_node") Signed-off-by: Hannes Reinecke <hare@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-10-03drivers/base/memory: add node id parameter to add_memory_block()Hannes Reinecke1-11/+4
Patch series "mm/memory_hotplug: fixup crash during uevent handling", v4. we have some udev rules trying to read the sysfs attribute 'valid_zones' during an memory 'add' event, causing a crash in zone_for_pfn_range(). Debugging found that mem->nid was set to NUMA_NO_NODE, which crashed in NODE_DATA(nid). Further analysis revealed that we're running into a race with udev event processing: add_memory_resource() has this function calls: 1) __try_online_node() 2) arch_add_memory() 3) create_memory_block_devices() -> calls device_register() -> memory 'add' event 4) node_set_online()/__register_one_node() -> calls device_register() -> node 'add' event 5) register_memory_blocks_under_node() -> sets mem->nid Which, to the uninitated, is ... weird ... Why do we try to online the node in 1), but only register the node in 4) _after_ we have created the memory blocks in 3) ? And why do we set the 'nid' value in 5), when the uevent (which might need to see the correct 'nid' value) is sent out in 3) ? There must be a reason, I'm sure ... So here's a small patchset to fixup uevent ordering. The first patch adds a 'nid' parameter to add_memory_blocks() (to avoid mem->nid being initialized with NUMA_NO_NODE), and the second patch reshuffles the code in add_memory_resource() to fully initialize the node prior to calling create_memory_block_devices() so that the node is valid at that time and uevent processing will see correct values in sysfs. This patch (of 3): We have some udev rules trying to read the sysfs attribute 'valid_zones' during an memory 'add' event, causing a crash in zone_for_pfn_range(). Debugging found that mem->nid was set to NUMA_NO_NODE, which crashed in NODE_DATA(nid). Further analysis revealed that we're running into a race with udev event processing: add_memory_resource() has this function calls: 1) __try_online_node() 2) arch_add_memory() 3) create_memory_block_devices() -> calls device_register() -> memory 'add' event 4) node_set_online()/__register_one_node() -> calls device_register() -> node 'add' event 5) register_memory_blocks_under_node() -> sets mem->nid Which, to the uninitated, is ... weird ... Why do we try to online the node in 1), but only register the node in 4) _after_ we have created the memory blocks in 3) ? And why do we set the 'nid' value in 5), when the uevent (which might need to see the correct 'nid' value) is sent out in 3) ? There must be a reason, I'm sure ... So here's a small patchset to fixup uevent ordering. The first patch adds a 'nid' parameter to add_memory_blocks() (to avoid mem->nid being initialized with NUMA_NO_NODE), and the second patch reshuffles the code in add_memory_resource() to fully initialize the node prior to calling create_memory_block_devices() so that the node is valid at that time and uevent processing will see correct values in sysfs. This patch (of 3): Add a 'nid' parameter to add_memory_block() to initialize the memory block with the correct node id. Link: https://lkml.kernel.org/r/20250729064637.51662-1-hare@kernel.org Link: https://lkml.kernel.org/r/20250729064637.51662-2-hare@kernel.org Signed-off-by: Hannes Reinecke <hare@kernel.org> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Reviewed-by: Donet Tom <donettom@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-07-09drivers/base/node: optimize memory block registration to reduce boot timeDonet Tom1-17/+4
Patch series "drivers/base/node.c: optimization and cleanups", v7. This patch (of 7) During node device initialization, `memory blocks` are registered under each NUMA node. The `memory blocks` to be registered are identified using the node's start and end PFNs, which are obtained from the node's pg_data However, not all PFNs within this range necessarily belong to the same node—some may belong to other nodes. Additionally, due to the discontiguous nature of physical memory, certain sections within a `memory block` may be absent. As a result, `memory blocks` that fall between a node's start and end PFNs may span across multiple nodes, and some sections within those blocks may be missing. `Memory blocks` have a fixed size, which is architecture dependent. Due to these considerations, the memory block registration is currently performed as follows: for_each_online_node(nid): start_pfn = pgdat->node_start_pfn; end_pfn = pgdat->node_start_pfn + node_spanned_pages; for_each_memory_block_between(PFN_PHYS(start_pfn), PFN_PHYS(end_pfn)) mem_blk = memory_block_id(pfn_to_section_nr(pfn)); pfn_mb_start=section_nr_to_pfn(mem_blk->start_section_nr) pfn_mb_end = pfn_start + memory_block_pfns - 1 for (pfn = pfn_mb_start; pfn < pfn_mb_end; pfn++): if (get_nid_for_pfn(pfn) != nid): continue; else do_register_memory_block_under_node(nid, mem_blk, MEMINIT_EARLY); Here, we derive the start and end PFNs from the node's pg_data, then determine the memory blocks that may belong to the node. For each `memory block` in this range, we inspect all PFNs it contains and check their associated NUMA node ID. If a PFN within the block matches the current node, the memory block is registered under that node. If CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, get_nid_for_pfn() performs a binary search in the `memblock regions` to determine the NUMA node ID for a given PFN. If it is not enabled, the node ID is retrieved directly from the struct page. On large systems, this process can become time-consuming, especially since we iterate over each `memory block` and all PFNs within it until a match is found. When CONFIG_DEFERRED_STRUCT_PAGE_INIT is enabled, the additional overhead of the binary search increases the execution time significantly, potentially leading to soft lockups during boot. In this patch, we iterate over `memblock region` to identify the `memory blocks` that belong to the current NUMA node. `memblock regions` are contiguous memory ranges, each associated with a single NUMA node, and they do not span across multiple nodes. for_each_memory_region(r): // r => region if (!node_online(r->nid)): continue; else for_each_memory_block_between(r->base, r->base + r->size - 1): do_register_memory_block_under_node(r->nid, mem_blk, MEMINIT_EARLY); We iterate over all memblock regions, and if the node associated with the region is online, we calculate the start and end memory blocks based on the region's start and end PFNs. We then register all the memory blocks within that range under the region node. Test Results on My system with 32TB RAM ======================================= 1. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT enabled. Without this patch ------------------ Startup finished in 1min 16.528s (kernel) With this patch --------------- Startup finished in 17.236s (kernel) - 78% Improvement 2. Boot time with CONFIG_DEFERRED_STRUCT_PAGE_INIT disabled. Without this patch ------------------ Startup finished in 28.320s (kernel) With this patch --------------- Startup finished in 15.621s (kernel) - 46% Improvement [donettom@linux.ibm.com: restore removed extra line] Link: https://lkml.kernel.org/r/20250609140354.467908-1-donettom@linux.ibm.com Link: https://lkml.kernel.org/r/2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com Link: https://lkml.kernel.org/r/2a0a05c2dffc62a742bf1dd030098be4ce99be28.1748452241.git.donettom@linux.ibm.com Signed-off-by: Donet Tom <donettom@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-05-11memory: implement memory_block_advise/probe_max_sizeGregory Price1-0/+51
Patch series "memory,x86,acpi: hotplug memory alignment advisement", v8. When physical address regions are not aligned to memory block size, the misaligned portion is lost (stranded capacity). Block size (min/max/selected) is architecture defined. Most architectures tend to use the minimum block size or some simplistic heurist. On x86, memory block size increases up to 2GB, and is otherwise fitted to the alignment of non-hotplug (i.e. not special purpose memory). CXL exposes its memory for management through the ACPI CEDT (CXL Early Detection Table) in a field called the CXL Fixed Memory Window. Per the CXL specification, this memory must be aligned to at least 256MB. When a CFMW aligns on a size less than the block size, this causes a loss of up to 2GB per CFMW on x86. It is not uncommon for CFMW to be allocated per-device - though this behavior is BIOS defined. This patch set provides 3 things: 1) implement advise/query functions in driverse/base/memory.c to report/query architecture agnostic hotplug block alignment advice. 2) update x86 memblock size logic to consider the hotplug advice 3) add code in acpi/numa/srat.c to report CFMW alignment advice The advisement interfaces are design to be called during arch_init code prior to allocator and smp_init. start_kernel will call these through setup_arch() (via acpi and mm/init_64.c on x86), which occurs prior to mm_core_init and smp_init - so no need for atomics. There's an attempt to signal callers to advise() that query has already occurred, but this is predicated on the notion that query actually occurs (which presently only happens on the x86 arch). This is to assist debugging future users. Otherwise, the advise() call has been marked __init to help static discovery of bad call times. Once query is called the first time, it will always return the same value. Interfaces return -EBUSY and 0 respectively on systems without hotplug. This patch (of 3): Hotplug memory sources may have opinions on what the memblock size should be - usually for alignment purposes. For example, CXL memory extents can be 256MB with a matching alignment. If this size/alignment is smaller than the block size, it can result in stranded capacity. Implement memory_block_advise_max_size for use prior to allocator init, for software to advise the system on the max block size. Implement memory_block_probe_max_size for use by arch init code to calculate the best block size. Use of advice is architecture defined. The probe value can never change after first probe. Calls to advise after probe will return -EBUSY to aid debugging. On systems without hotplug, always return -ENODEV and 0 respectively. Link: https://lkml.kernel.org/r/20250127153405.3379117-1-gourry@gourry.net Link: https://lkml.kernel.org/r/20250127153405.3379117-2-gourry@gourry.net Signed-off-by: Gregory Price <gourry@gourry.net> Suggested-by: Ira Weiny <ira.weiny@intel.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Acked-by: Dan Williams <dan.j.williams@intel.com> Tested-by: Fan Ni <fan.ni@samsung.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Alison Schofield <alison.schofield@intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Borislav Betkov <bp@alien8.de> Cc: Bruno Faccini <bfaccini@nvidia.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Haibo Xu <haibo1.xu@intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joanthan Cameron <Jonathan.Cameron@huawei.com> Cc: Len Brown <lenb@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Robert Richter <rrichter@amd.com> Cc: Thomas Gleinxer <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-04-15drivers/base/memory: Avoid overhead from for_each_present_section_nr()Gavin Shan1-24/+17
for_each_present_section_nr() was introduced to add_boot_memory_block() by commit 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()"). It causes unnecessary overhead when the present sections are really sparse. next_present_section_nr() called by the macro to find the next present section, which is far away from the spanning sections in the specified block. Too much time consumed by next_present_section_nr() in this case, which can lead to softlockup as observed by Aditya Gupta on IBM Power10 machine. watchdog: BUG: soft lockup - CPU#248 stuck for 22s! [swapper/248:1] Modules linked in: CPU: 248 UID: 0 PID: 1 Comm: swapper/248 Not tainted 6.15.0-rc1-next-20250408 #1 VOLUNTARY Hardware name: 9105-22A POWER10 (raw) 0x800200 opal:v7.1-107-gfda75d121942 PowerNV NIP: c00000000209218c LR: c000000002092204 CTR: 0000000000000000 REGS: c00040000418fa30 TRAP: 0900 Not tainted (6.15.0-rc1-next-20250408) MSR: 9000000002009033 <SF,HV,VEC,EE,ME,IR,DR,RI,LE> CR: 28000428 XER: 00000000 CFAR: 0000000000000000 IRQMASK: 0 GPR00: c000000002092204 c00040000418fcd0 c000000001b08100 0000000000000040 GPR04: 0000000000013e00 c000c03ffebabb00 0000000000c03fff c000400fff587f80 GPR08: 0000000000000000 00000000001196f7 0000000000000000 0000000028000428 GPR12: 0000000000000000 c000000002e80000 c00000000001007c 0000000000000000 GPR16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR20: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000 GPR28: c000000002df7f70 0000000000013dc0 c0000000011dd898 0000000008000000 NIP [c00000000209218c] memory_dev_init+0x114/0x1e0 LR [c000000002092204] memory_dev_init+0x18c/0x1e0 Call Trace: [c00040000418fcd0] [c000000002092204] memory_dev_init+0x18c/0x1e0 (unreliable) [c00040000418fd50] [c000000002091348] driver_init+0x78/0xa4 [c00040000418fd70] [c0000000020063ac] kernel_init_freeable+0x22c/0x370 [c00040000418fde0] [c0000000000100a8] kernel_init+0x34/0x25c [c00040000418fe50] [c00000000000cd94] ret_from_kernel_user_thread+0x14/0x1c Avoid the overhead by folding for_each_present_section_nr() to the outer loop. add_boot_memory_block() is dropped after that. Fixes: 61659efdb35c ("drivers/base/memory: improve add_boot_memory_block()") Closes: https://lore.kernel.org/linux-mm/20250409180344.477916-1-adityag@linux.ibm.com Reported-by: Aditya Gupta <adityag@linux.ibm.com> Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Tested-by: Aditya Gupta <adityag@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Link: https://lore.kernel.org/r/20250410125110.1232329-1-gshan@redhat.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2025-03-17drivers/base/memory: improve add_boot_memory_block()Gavin Shan1-9/+8
Patch series "drivers/base/memory: Two cleanups", v3. Two cleanups to drivers/base/memory. This patch (of 2)L It's unnecessary to count the present sections for the specified block since the block will be added if any section in the block is present. Besides, for_each_present_section_nr() can be reused as Andrew Morton suggested. Improve by using for_each_present_section_nr() and dropping the unnecessary @section_count. No functional changes intended. Link: https://lkml.kernel.org/r/20250311233045.148943-1-gshan@redhat.com Link: https://lkml.kernel.org/r/20250311233045.148943-2-gshan@redhat.com Signed-off-by: Gavin Shan <gshan@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Acked-by: Oscar Salvador <osalvador@suse.de> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-03-16drivers/base/memory: simplify outputting of valid_zones_show()Shiyang Ruan1-8/+4
No need to specify position at the first writing to the buf because the @len is always 0 at this time. Use sysfs_emit() instead to simplify it. Also avoid setting/checking default_zone with a conditional operator. Link: https://lkml.kernel.org/r/20250108015223.1522887-1-ruansy.fnst@fujitsu.com Signed-off-by: Shiyang Ruan <ruansy.fnst@fujitsu.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2025-01-25mm: add build-time option for hotplug memory default online typeGregory Price1-2/+2
Memory hotplug presently auto-onlines memory into a zone the kernel deems appropriate if CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y. The memhp_default_state boot param enables runtime config, but it's not possible to do this at build-time. Remove CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE, and replace it with CONFIG_MHP_DEFAULT_ONLINE_TYPE_* choices that sync with the boot param. Selections: CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE => mhp_default_online_type = "offline" Memory will not be onlined automatically. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO => mhp_default_online_type = "online" Memory will be onlined automatically in a zone deemed. appropriate by the kernel. CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_KERNEL => mhp_default_online_type = "online_kernel" Memory will be onlined automatically. The zone may allow kernel data (e.g. ZONE_NORMAL). CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_MOVABLE => mhp_default_online_type = "online_movable" Memory will be onlined automatically. The zone will be ZONE_MOVABLE. Default to CONFIG_MHP_DEFAULT_ONLINE_TYPE_OFFLINE to match the existing default CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n behavior. Existing users of CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y should use CONFIG_MHP_DEFAULT_ONLINE_TYPE_ONLINE_AUTO. [gourry@gourry.net: update KConfig comments] Link: https://lkml.kernel.org/r/20241226182918.648799-1-gourry@gourry.net Link: https://lkml.kernel.org/r/20241220210709.300066-1-gourry@gourry.net Signed-off-by: Gregory Price <gourry@gourry.net> Acked-by: David Hildenbrand <david@redhat.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-04-23crash: add a new kexec flag for hotplug supportSourabh Jain1-1/+1
Commit a72bbec70da2 ("crash: hotplug support for kexec_load()") introduced a new kexec flag, `KEXEC_UPDATE_ELFCOREHDR`. Kexec tool uses this flag to indicate to the kernel that it is safe to modify the elfcorehdr of the kdump image loaded using the kexec_load system call. However, it is possible that architectures may need to update kexec segments other then elfcorehdr. For example, FDT (Flatten Device Tree) on PowerPC. Introducing a new kexec flag for every new kexec segment may not be a good solution. Hence, a generic kexec flag bit, `KEXEC_CRASH_HOTPLUG_SUPPORT`, is introduced to share the CPU/Memory hotplug support intent between the kexec tool and the kernel for the kexec_load system call. Now we have two kexec flags that enables crash hotplug support for kexec_load system call. First is KEXEC_UPDATE_ELFCOREHDR (only used in x86), and second is KEXEC_CRASH_HOTPLUG_SUPPORT (for all architectures). To simplify the process of finding and reporting the crash hotplug support the following changes are introduced. 1. Define arch specific function to process the kexec flags and determine crash hotplug support 2. Rename the @update_elfcorehdr member of struct kimage to @hotplug_support and populate it for both kexec_load and kexec_file_load syscalls, because architecture can update more than one kexec segment 3. Let generic function crash_check_hotplug_support report hotplug support for loaded kdump image based on value of @hotplug_support To bring the x86 crash hotplug support in line with the above points, the following changes have been made: - Introduce the arch_crash_hotplug_support function to process kexec flags and determine crash hotplug support - Remove the arch_crash_hotplug_[cpu|memory]_support functions Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://msgid.link/20240326055413.186534-3-sourabhjain@linux.ibm.com
2024-02-21mm/memory_hotplug: introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE notifiersSumanth Korikkar1-1/+22
Patch series "implement "memmap on memory" feature on s390". This series provides "memmap on memory" support on s390 platform. "memmap on memory" allows struct pages array to be allocated from the hotplugged memory range instead of allocating it from main system memory. s390 currently preallocates struct pages array for all potentially possible memory, which ensures memory onlining always succeeds, but with the cost of significant memory consumption from the available system memory during boottime. In certain extreme configuration, this could lead to ipl failure. "memmap on memory" ensures struct pages array are populated from self contained hotplugged memory range instead of depleting the available system memory and this could eliminate ipl failure on s390 platform. On other platforms, system might go OOM when the physically hotplugged memory depletes the available memory before it is onlined. Hence, "memmap on memory" feature was introduced as described in commit a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range"). Unlike other architectures, s390 memory blocks are not physically accessible until it is online. To make it physically accessible two new memory notifiers MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE are added and this notifier lets the hypervisor inform that the memory should be made physically accessible. This allows for "memmap on memory" initialization during memory hotplug onlining phase, which is performed before calling MEM_GOING_ONLINE notifier. Patch 1 introduces MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. New mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced to ensure altmap cannot be written when adding memory - before it is set online. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch. Patches 2 allocates vmemmap pages from self-contained memory range for s390. It allocates memory map (struct pages array) from the hotplugged memory range, rather than using system memory by passing altmap to vmemmap functions. Patch 3 removes unhandled memory notifier types on s390. Patch 4 implements MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers on s390. MEM_PREPARE_ONLINE memory notifier makes memory block physical accessible via sclp assign command. The notifier ensures self-contained memory maps are accessible and hence enabling the "memmap on memory" on s390. MEM_FINISH_OFFLINE memory notifier shifts the memory block to an inaccessible state via sclp unassign command. Patch 5 finally enables MHP_MEMMAP_ON_MEMORY on s390. This patch (of 5): Introduce MEM_PREPARE_ONLINE/MEM_FINISH_OFFLINE memory notifiers to prepare the transition of memory to and from a physically accessible state. This enhancement is crucial for implementing the "memmap on memory" feature for s390 in a subsequent patch. Platforms such as x86 can support physical memory hotplug via ACPI. When there is physical memory hotplug, ACPI event leads to the memory addition with the following callchain: acpi_memory_device_add() -> acpi_memory_enable_device() -> __add_memory() After this, the hotplugged memory is physically accessible, and altmap support prepared, before the "memmap on memory" initialization in memory_block_online() is called. On s390, memory hotplug works in a different way. The available hotplug memory has to be defined upfront in the hypervisor, but it is made physically accessible only when the user sets it online via sysfs, currently in the MEM_GOING_ONLINE notifier. This is too late and "memmap on memory" initialization is performed before calling MEM_GOING_ONLINE notifier. During the memory hotplug addition phase, altmap support is prepared and during the memory onlining phase s390 requires memory to be physically accessible and then subsequently initiate the "memmap on memory" initialization process. The memory provider will handle new MEM_PREPARE_ONLINE / MEM_FINISH_OFFLINE notifications and make the memory accessible. The mhp_flag MHP_OFFLINE_INACCESSIBLE is introduced and is relevant when used along with MHP_MEMMAP_ON_MEMORY, because the altmap cannot be written (e.g., poisoned) when adding memory -- before it is set online. This allows for adding memory with an altmap that is not currently made available by a hypervisor. When onlining that memory, the hypervisor can be instructed to make that memory accessible via the new notifiers and the onlining phase will not require any memory allocations, which is helpful in low-memory situations. All architectures ignore unknown memory notifiers. Therefore, the introduction of these new notifiers does not result in any functional modifications across architectures. Link: https://lkml.kernel.org/r/20240108132747.3238763-1-sumanthk@linux.ibm.com Link: https://lkml.kernel.org/r/20240108132747.3238763-2-sumanthk@linux.ibm.com Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Suggested-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Suggested-by: David Hildenbrand <david@redhat.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-01-18Merge tag 'driver-core-6.8-rc1' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are the set of driver core and kernfs changes for 6.8-rc1. Nothing major in here this release cycle, just lots of small cleanups and some tweaks on kernfs that in the very end, got reverted and will come back in a safer way next release cycle. Included in here are: - more driver core 'const' cleanups and fixes - fw_devlink=rpm is now the default behavior - kernfs tiny changes to remove some string functions - cpu handling in the driver core is updated to work better on many systems that add topologies and cpus after booting - other minor changes and cleanups All of the cpu handling patches have been acked by the respective maintainers and are coming in here in one series. Everything has been in linux-next for a while with no reported issues" * tag 'driver-core-6.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: (51 commits) Revert "kernfs: convert kernfs_idr_lock to an irq safe raw spinlock" kernfs: convert kernfs_idr_lock to an irq safe raw spinlock class: fix use-after-free in class_register() PM: clk: make pm_clk_add_notifier() take a const pointer EDAC: constantify the struct bus_type usage kernfs: fix reference to renamed function driver core: device.h: fix Excess kernel-doc description warning driver core: class: fix Excess kernel-doc description warning driver core: mark remaining local bus_type variables as const driver core: container: make container_subsys const driver core: bus: constantify subsys_register() calls driver core: bus: make bus_sort_breadthfirst() take a const pointer kernfs: d_obtain_alias(NULL) will do the right thing... driver core: Better advertise dev_err_probe() kernfs: Convert kernfs_path_from_node_locked() from strlcpy() to strscpy() kernfs: Convert kernfs_name_locked() from strlcpy() to strscpy() kernfs: Convert kernfs_walk_ns() from strlcpy() to strscpy() initramfs: Expose retained initrd as sysfs file fs/kernfs/dir: obey S_ISGID kernel/cgroup: use kernfs_create_dir_ns() ...
2023-12-21driver core: mark remaining local bus_type variables as constGreg Kroah-Hartman1-1/+1
Now that the driver core can properly handle constant struct bus_type, change the local driver core bus_type variables to be a constant structure as well, placing them into read-only memory which can not be modified at runtime. Cc: Ira Weiny <ira.weiny@intel.com> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: David Hildenbrand <david@redhat.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Kevin Hilman <khilman@kernel.org> Cc: Ulf Hansson <ulf.hansson@linaro.org> Cc: Len Brown <len.brown@intel.com> Acked-by: William Breathitt Gray <william.gray@linaro.org> Acked-by: Dave Ertman <david.m.ertman@intel.com> Link: https://lore.kernel.org/r/2023121908-paver-follow-cc21@gregkh Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-06mm/memory_hotplug: add missing mem_hotplug_lockSumanth Korikkar1-3/+15
From Documentation/core-api/memory-hotplug.rst: When adding/removing/onlining/offlining memory or adding/removing heterogeneous/device memory, we should always hold the mem_hotplug_lock in write mode to serialise memory hotplug (e.g. access to global/zone variables). mhp_(de)init_memmap_on_memory() functions can change zone stats and struct page content, but they are currently called w/o the mem_hotplug_lock. When memory block is being offlined and when kmemleak goes through each populated zone, the following theoretical race conditions could occur: CPU 0: | CPU 1: memory_offline() | -> offline_pages() | -> mem_hotplug_begin() | ... | -> mem_hotplug_done() | | kmemleak_scan() | -> get_online_mems() | ... -> mhp_deinit_memmap_on_memory() | [not protected by mem_hotplug_begin/done()]| Marks memory section as offline, | Retrieves zone_start_pfn poisons vmemmap struct pages and updates | and struct page members. the zone related data | | ... | -> put_online_mems() Fix this by ensuring mem_hotplug_lock is taken before performing mhp_init_memmap_on_memory(). Also ensure that mhp_deinit_memmap_on_memory() holds the lock. online/offline_pages() are currently only called from memory_block_online/offline(), so it is safe to move the locking there. Link: https://lkml.kernel.org/r/20231120145354.308999-2-sumanthk@linux.ibm.com Fixes: a08a2ae34613 ("mm,memory_hotplug: allocate memmap from the added memory range") Signed-off-by: Sumanth Korikkar <sumanthk@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@linux.ibm.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Alexander Gordeev <agordeev@linux.ibm.com> Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: kernel test robot <lkp@intel.com> Cc: <stable@vger.kernel.org> [5.15+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-29Merge tag 'mm-nonmm-stable-2023-08-28-22-48' of ↵Linus Torvalds1-0/+13
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull non-MM updates from Andrew Morton: - An extensive rework of kexec and crash Kconfig from Eric DeVolder ("refactor Kconfig to consolidate KEXEC and CRASH options") - kernel.h slimming work from Andy Shevchenko ("kernel.h: Split out a couple of macros to args.h") - gdb feature work from Kuan-Ying Lee ("Add GDB memory helper commands") - vsprintf inclusion rationalization from Andy Shevchenko ("lib/vsprintf: Rework header inclusions") - Switch the handling of kdump from a udev scheme to in-kernel handling, by Eric DeVolder ("crash: Kernel handling of CPU and memory hot un/plug") - Many singleton patches to various parts of the tree * tag 'mm-nonmm-stable-2023-08-28-22-48' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (81 commits) document while_each_thread(), change first_tid() to use for_each_thread() drivers/char/mem.c: shrink character device's devlist[] array x86/crash: optimize CPU changes crash: change crash_prepare_elf64_headers() to for_each_possible_cpu() crash: hotplug support for kexec_load() x86/crash: add x86 crash hotplug support crash: memory and CPU hotplug sysfs attributes kexec: exclude elfcorehdr from the segment digest crash: add generic infrastructure for crash hotplug support crash: move a few code bits to setup support of crash hotplug kstrtox: consistently use _tolower() kill do_each_thread() nilfs2: fix WARNING in mark_buffer_dirty due to discarded buffer reuse scripts/bloat-o-meter: count weak symbol sizes treewide: drop CONFIG_EMBEDDED lockdep: fix static memory detection even more lib/vsprintf: declare no_hash_pointers in sprintf.h lib/vsprintf: split out sprintf() and friends kernel/fork: stop playing lockless games for exe_file replacement adfs: delete unused "union adfs_dirtail" definition ...
2023-08-24crash: memory and CPU hotplug sysfs attributesEric DeVolder1-0/+13
Introduce the crash_hotplug attribute for memory and CPUs for use by userspace. These attributes directly facilitate the udev rule for managing userspace re-loading of the crash kernel upon hot un/plug changes. For memory, expose the crash_hotplug attribute to the /sys/devices/system/memory directory. For example: # udevadm info --attribute-walk /sys/devices/system/memory/memory81 looking at device '/devices/system/memory/memory81': KERNEL=="memory81" SUBSYSTEM=="memory" DRIVER=="" ATTR{online}=="1" ATTR{phys_device}=="0" ATTR{phys_index}=="00000051" ATTR{removable}=="1" ATTR{state}=="online" ATTR{valid_zones}=="Movable" looking at parent device '/devices/system/memory': KERNELS=="memory" SUBSYSTEMS=="" DRIVERS=="" ATTRS{auto_online_blocks}=="offline" ATTRS{block_size_bytes}=="8000000" ATTRS{crash_hotplug}=="1" For CPUs, expose the crash_hotplug attribute to the /sys/devices/system/cpu directory. For example: # udevadm info --attribute-walk /sys/devices/system/cpu/cpu0 looking at device '/devices/system/cpu/cpu0': KERNEL=="cpu0" SUBSYSTEM=="cpu" DRIVER=="processor" ATTR{crash_notes}=="277c38600" ATTR{crash_notes_size}=="368" ATTR{online}=="1" looking at parent device '/devices/system/cpu': KERNELS=="cpu" SUBSYSTEMS=="" DRIVERS=="" ATTRS{crash_hotplug}=="1" ATTRS{isolated}=="" ATTRS{kernel_max}=="8191" ATTRS{nohz_full}==" (null)" ATTRS{offline}=="4-7" ATTRS{online}=="0-3" ATTRS{possible}=="0-7" ATTRS{present}=="0-3" With these sysfs attributes in place, it is possible to efficiently instruct the udev rule to skip crash kernel reloading for kernels configured with crash hotplug support. For example, the following is the proposed udev rule change for RHEL system 98-kexec.rules (as the first lines of the rule file): # The kernel updates the crash elfcorehdr for CPU and memory changes SUBSYSTEM=="cpu", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" SUBSYSTEM=="memory", ATTRS{crash_hotplug}=="1", GOTO="kdump_reload_end" When examined in the context of 98-kexec.rules, the above rules test if crash_hotplug is set, and if so, the userspace initiated unload-then-reload of the crash kernel is skipped. CPU and memory checks are separated in accordance with CONFIG_HOTPLUG_CPU and CONFIG_MEMORY_HOTPLUG kernel config options. If an architecture supports, for example, memory hotplug but not CPU hotplug, then the /sys/devices/system/memory/crash_hotplug attribute file is present, but the /sys/devices/system/cpu/crash_hotplug attribute file will NOT be present. Thus the udev rule skips userspace processing of memory hot un/plug events, but the udev rule will evaluate false for CPU events, thus allowing userspace to process CPU hot un/plug events (ie the unload-then-reload of the kdump capture kernel). Link: https://lkml.kernel.org/r/20230814214446.6659-5-eric.devolder@oracle.com Signed-off-by: Eric DeVolder <eric.devolder@oracle.com> Reviewed-by: Sourabh Jain <sourabhjain@linux.ibm.com> Acked-by: Hari Bathini <hbathini@linux.ibm.com> Acked-by: Baoquan He <bhe@redhat.com> Cc: Akhil Raj <lf32.dev@gmail.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Cc: Borislav Petkov (AMD) <bp@alien8.de> Cc: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Dave Young <dyoung@redhat.com> Cc: David Hildenbrand <david@redhat.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Mimi Zohar <zohar@linux.ibm.com> Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Sean Christopherson <seanjc@google.com> Cc: Takashi Iwai <tiwai@suse.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Thomas Weißschuh <linux@weissschuh.net> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Vivek Goyal <vgoyal@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-08-21mm/memory_hotplug: embed vmem_altmap details in memory blockAneesh Kumar K.V1-10/+17
With memmap on memory, some architecture needs more details w.r.t altmap such as base_pfn, end_pfn, etc to unmap vmemmap memory. Instead of computing them again when we remove a memory block, embed vmem_altmap details in struct memory_block if we are using memmap on memory block feature. [yangyingliang@huawei.com: fix error return code in add_memory_resource()] Link: https://lkml.kernel.org/r/20230809081552.1351184-1-yangyingliang@huawei.com Link: https://lkml.kernel.org/r/20230808091501.287660-7-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Signed-off-by: Yang Yingliang <yangyingliang@huawei.com> Acked-by: Michal Hocko <mhocko@suse.com> Acked-by: David Hildenbrand <david@redhat.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Oscar Salvador <osalvador@suse.de> Cc: Vishal Verma <vishal.l.verma@intel.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2023-01-20drivers/base/memory: Fix comments for phys_index_show()Gavin Shan1-7/