From 9c860d1d5d69f9cb19eb7c36573ee14065a9c85a Mon Sep 17 00:00:00 2001 From: Brendan Jackman Date: Wed, 13 May 2026 12:35:13 +0000 Subject: mm: introduce for_each_free_list() Patch series "mm: misc cleanups from __GFP_UNMAPPED series". In v2 of the __GFP_UNMAPPED series [0], we realised that some of the patches could potentially be merged as independent cleanups. These are all independent of one another, if you think some are useful cleanups and others are pointless churn, it should be fine to just pick whatever subset you prefer. No functional change intended. This patch (of 4): There are a couple of places that iterate over the freelists with awareness of the data structures' layout. It seems ideally, code outside of mm should not be aware of the page allocator's freelists at all. But, this patch just doesn't hide them completely, it's just a meek incremental step in that direction: provide a macro to iterate over it without needing to be aware of the actual struct fields. Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-0-dacdf5402be8@google.com Link: https://lore.kernel.org/20260513-page_alloc-unmapped-prep-v1-1-dacdf5402be8@google.com Link: https://lore.kernel.org/all/20260320-page_alloc-unmapped-v2-0-28bf1bd54f41@google.com/ [0] Signed-off-by: Brendan Jackman Acked-by: Mike Rapoport (Microsoft) Reviewed-by: Vlastimil Babka (SUSE) Cc: Axel Rasmussen Cc: Barry Song Cc: David Hildenbrand Cc: Johannes Weiner Cc: Kairui Song Cc: Len Brown Cc: Liam R. Howlett Cc: Lorenzo Stoakes Cc: Michal Hocko Cc: "Rafael J. Wysocki" Cc: Shakeel Butt Cc: Suren Baghdasaryan Cc: Wei Xu Cc: Yuanchu Xie Cc: Zi Yan Signed-off-by: Andrew Morton --- kernel/power/snapshot.c | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'kernel/power') diff --git a/kernel/power/snapshot.c b/kernel/power/snapshot.c index a564650734dc..d933b5b2c05d 100644 --- a/kernel/power/snapshot.c +++ b/kernel/power/snapshot.c @@ -1244,8 +1244,9 @@ unsigned int snapshot_additional_pages(struct zone *zone) static void mark_free_pages(struct zone *zone) { unsigned long pfn, max_zone_pfn, page_count = WD_PAGE_COUNT; + struct list_head *free_list; unsigned long flags; - unsigned int order, t; + unsigned int order; struct page *page; if (zone_is_empty(zone)) @@ -1269,9 +1270,8 @@ static void mark_free_pages(struct zone *zone) swsusp_unset_page_free(page); } - for_each_migratetype_order(order, t) { - list_for_each_entry(page, - &zone->free_area[order].free_list[t], buddy_list) { + for_each_free_list(free_list, zone, order) { + list_for_each_entry(page, free_list, buddy_list) { unsigned long i; pfn = page_to_pfn(page); -- cgit v1.2.3 From c13a0316aef5f4b73e8b4bf6943737f836d65e1d Mon Sep 17 00:00:00 2001 From: Youngjun Park Date: Tue, 24 Mar 2026 01:08:21 +0900 Subject: mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device Patch series "mm/swap, PM: hibernate: fix swapoff race in uswsusp by pinning swap device", v8. Currently, in the uswsusp path, only the swap type value is retrieved at lookup time without holding a reference. If swapoff races after the type is acquired, subsequent slot allocations operate on a stale swap device. Additionally, grabbing and releasing the swap device reference on every slot allocation is inefficient across the entire hibernation swap path. This patch series addresses these issues: - Patch 1: Fixes the swapoff race in uswsusp by pinning the swap device from the point it is looked up until the session completes. - Patch 2: Removes the overhead of per-slot reference counting in alloc/free paths and cleans up the redundant SWP_WRITEOK check. This patch (of 2): Hibernation via uswsusp (/dev/snapshot ioctls) has a race window: after selecting the resume swap area but before user space is frozen, swapoff may run and invalidate the selected swap device. Fix this by pinning the swap device with SWP_HIBERNATION while it is in use. The pin is exclusive, which is sufficient since hibernate_acquire() already prevents concurrent hibernation sessions. The kernel swsusp path (sysfs-based hibernate/resume) uses find_hibernation_swap_type() which is not affected by the pin. It freezes user space before touching swap, so swapoff cannot race. Introduce dedicated helpers: - pin_hibernation_swap_type(): Look up and pin the swap device. Used by the uswsusp path. - find_hibernation_swap_type(): Lookup without pinning. Used by the kernel swsusp path. - unpin_hibernation_swap_type(): Clear the hibernation pin. While a swap device is pinned, swapoff is prevented from proceeding. Link: https://lore.kernel.org/20260323160822.1409904-1-youngjun.park@lge.com Link: https://lore.kernel.org/20260323160822.1409904-2-youngjun.park@lge.com Signed-off-by: Youngjun Park Reviewed-by: Kairui Song Cc: Baoquan He Cc: Barry Song Cc: Chris Li Cc: Kemeng Shi Cc: Nhat Pham Cc: "Rafael J . Wysocki" Signed-off-by: Andrew Morton --- kernel/power/swap.c | 2 +- kernel/power/user.c | 15 ++++++++++++--- 2 files changed, 13 insertions(+), 4 deletions(-) (limited to 'kernel/power') diff --git a/kernel/power/swap.c b/kernel/power/swap.c index 2e64869bb5a0..cc4764149e8f 100644 --- a/kernel/power/swap.c +++ b/kernel/power/swap.c @@ -341,7 +341,7 @@ static int swsusp_swap_check(void) * This is called before saving the image. */ if (swsusp_resume_device) - res = swap_type_of(swsusp_resume_device, swsusp_resume_block); + res = find_hibernation_swap_type(swsusp_resume_device, swsusp_resume_block); else res = find_first_swap(&swsusp_resume_device); if (res < 0) diff --git a/kernel/power/user.c b/kernel/power/user.c index be77f3556bd7..d0fcfba7ac23 100644 --- a/kernel/power/user.c +++ b/kernel/power/user.c @@ -71,7 +71,7 @@ static int snapshot_open(struct inode *inode, struct file *filp) memset(&data->handle, 0, sizeof(struct snapshot_handle)); if ((filp->f_flags & O_ACCMODE) == O_RDONLY) { /* Hibernating. The image device should be accessible. */ - data->swap = swap_type_of(swsusp_resume_device, 0); + data->swap = pin_hibernation_swap_type(swsusp_resume_device, 0); data->mode = O_RDONLY; data->free_bitmaps = false; error = pm_notifier_call_chain_robust(PM_HIBERNATION_PREPARE, PM_POST_HIBERNATION); @@ -90,8 +90,10 @@ static int snapshot_open(struct inode *inode, struct file *filp) data->free_bitmaps = !error; } } - if (error) + if (error) { + unpin_hibernation_swap_type(data->swap); hibernate_release(); + } data->frozen = false; data->ready = false; @@ -115,6 +117,7 @@ static int snapshot_release(struct inode *inode, struct file *filp) data = filp->private_data; data->dev = 0; free_all_swap_pages(data->swap); + unpin_hibernation_swap_type(data->swap); if (data->frozen) { pm_restore_gfp_mask(); free_basic_memory_bitmaps(); @@ -235,11 +238,17 @@ static int snapshot_set_swap_area(struct snapshot_data *data, offset = swap_area.offset; } + /* + * Unpin the swap device if a swap area was already + * set by SNAPSHOT_SET_SWAP_AREA. + */ + unpin_hibernation_swap_type(data->swap); + /* * User space encodes device types as two-byte values, * so we need to recode them */ - data->swap = swap_type_of(swdev, offset); + data->swap = pin_hibernation_swap_type(swdev, offset); if (data->swap < 0) return swdev ? -ENODEV : -EINVAL; data->dev = swdev; -- cgit v1.2.3