aboutsummaryrefslogtreecommitdiff
path: root/drivers/block
AgeCommit message (Collapse)AuthorFilesLines
2 daysublk: fix use-after-free in ublk_cancel_cmd()Ming Lei1-5/+15
When ublk_reset_ch_dev() clears io->cmd via ublk_queue_reinit() concurrently with ublk_cancel_cmd(), ublk_cancel_cmd() can read a stale pointer and pass it to io_uring_cmd_done(), causing a use-after-free. Fix by synchronizing the two paths with ubq->cancel_lock: - ublk_cancel_cmd(): read and clear io->cmd under cancel_lock, then call io_uring_cmd_done() on the saved local copy outside the lock. - ublk_reset_ch_dev(): hold cancel_lock across ublk_queue_reinit() so that io->cmd and io->flags are cleared atomically with respect to ublk_cancel_cmd(). Fixes: 216c8f5ef0f2 ("ublk: replace monitor with cancelable uring_cmd") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260508123746.242018-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
4 daysublk: validate physical_bs_shift, io_min_shift and io_opt_shiftMing Lei1-1/+17
ublk_validate_params() checks logical_bs_shift is within [9, PAGE_SHIFT] but has no upper bound for physical_bs_shift, io_min_shift, or io_opt_shift. A malicious userspace can set any of these to a large value (e.g., 44), causing undefined behavior from `1 << shift` in ublk_ctrl_start_dev() since the result is stored in 32-bit unsigned int. Cap all three at ilog2(SZ_256M) (28). 256M is big enough to cover all practical block sizes, and originates from the maximum physical block size possible in NVMe (lba_size * (1 + npwg), where npwg is 16-bit). Also zero out ub->params with memset() when copy_from_user() fails or ublk_validate_params() returns error, so that no stale or partial params survive for a subsequent START_DEV to consume. Fixes: 71f28f3136af ("ublk_drv: add io_uring based userspace block driver") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260506082238.22363-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
9 daysublk: don't issue uring_cmd from fallback task workJens Axboe1-1/+3
When ublk_ch_uring_cmd_cb() runs as fallback task work (e.g., because the submitting task is exiting), the command should not be issued as current is a kworker, not the daemon task. This can cause io->task to capture the wrong task in __ublk_fetch(), leading to a task mismatch warning in ublk_uring_cmd_cancel_fn(). Check tw.cancel and return -ECANCELED instead of issuing the command from fallback context. Fixes: 3421c7f68bba ("ublk: make sure io cmd handled in submitter task context") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260501112312.947327-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-24Merge tag 'block-7.1-20260424' of ↵Linus Torvalds2-107/+139
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block fixes from Jens Axboe: - Series for zloop, fixing a variety of issues - t10-pi code cleanup - Fix for a merge window regression with the bio memory allocation mask - Fix for a merge window regression in ublk, caused by an issue with the maple tree iteration code at teardown - ublk self tests additions - Zoned device pgmap fixes - Various little cleanups and fixes * tag 'block-7.1-20260424' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (21 commits) Revert "floppy: fix reference leak on platform_device_register() failure" ublk: avoid unpinning pages under maple tree spinlock ublk: refactor common helper ublk_shmem_remove_ranges() ublk: fix maple tree lockdep warning in ublk_buf_cleanup selftests: ublk: add ublk auto integrity test selftests: ublk: enable test_integrity_02.sh on fio 3.42 selftests: ublk: remove unused argument to _cleanup block: only restrict bio allocation gfp mask asked to block block/blk-throttle: Add WQ_PERCPU to alloc_workqueue users block: Add WQ_PERCPU to alloc_workqueue users block: relax pgmap check in bio_add_page for compatible zone device pages block: add pgmap check to biovec_phys_mergeable floppy: fix reference leak on platform_device_register() failure ublk: use unchecked copy helpers for bio page data t10-pi: reduce ref tag code duplication zloop: remove irq-safe locking zloop: factor out zloop_mark_{full,empty} helpers zloop: set RQF_QUIET when completing requests on deleted devices zloop: improve the unaligned write pointer warning zloop: use vfs_truncate ...
2026-04-24Merge tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-clientLinus Torvalds1-3/+3
Pull ceph updates from Ilya Dryomov: "We have a series from Alex which extends CephFS client metrics with support for per-subvolume data I/O performance and latency tracking (metadata operations aren't included) and a good variety of fixes and cleanups across RBD and CephFS" * tag 'ceph-for-7.1-rc1' of https://github.com/ceph/ceph-client: ceph: add subvolume metrics collection and reporting ceph: parse subvolume_id from InodeStat v9 and store in inode ceph: handle InodeStat v8 versioned field in reply parsing libceph: Fix slab-out-of-bounds access in auth message processing rbd: fix null-ptr-deref when device_add_disk() fails crush: cleanup in crush_do_rule() method ceph: clear s_cap_reconnect when ceph_pagelist_encode_32() fails ceph: only d_add() negative dentries when they are unhashed libceph: update outdated comment in ceph_sock_write_space() libceph: Remove obsolete session key alignment logic ceph: fix num_ops off-by-one when crypto allocation fails libceph: Prevent potential null-ptr-deref in ceph_handle_auth_reply()
2026-04-23Revert "floppy: fix reference leak on platform_device_register() failure"Jens Axboe1-7/+3
This reverts commit e784f2ea0b4fd0e7b70028ff8218f22456c5dcf8. Jiri says the patch is buggy, and it looks like he is right revert it for now. Link: https://lore.kernel.org/linux-block/897f442d-4e04-4b70-b716-38fd10b8af36@kernel.org/ Reported-by: Jiri Slaby <jirislaby@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-23ublk: avoid unpinning pages under maple tree spinlockMing Lei1-10/+46
ublk_shmem_remove_ranges() calls unpin_user_pages() while holding the maple tree spinlock (mas_lock). Although unpin_user_pages() is safe in atomic context, holding the spinlock across potentially many page unpinning operations is not ideal. Split into __ublk_shmem_remove_ranges() which erases up to 64 ranges under mas_lock, collecting base_pfn and nr_pages into a temporary xarray. Then drop the lock and unpin pages outside spinlock context. ublk_shmem_remove_ranges() loops until all matching ranges are processed. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260423033058.2805135-4-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-23ublk: refactor common helper ublk_shmem_remove_ranges()Ming Lei1-40/+29
Extract the shared walk+erase+unpin+kfree loop into ublk_shmem_remove_ranges(). When buf_index >= 0, only ranges matching that index are removed; when buf_index < 0, all ranges are removed. Also extract ublk_unpin_range_pages() to share the page unpinning loop. Convert both __ublk_ctrl_unreg_buf() and ublk_buf_cleanup() to use the new helper. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260423033058.2805135-3-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-23ublk: fix maple tree lockdep warning in ublk_buf_cleanupMing Lei1-0/+4
ublk_buf_cleanup() iterates the maple tree with mas_for_each() without holding mas_lock, triggering a lockdep splat on CONFIG_PROVE_RCU kernels since mas_find() internally uses rcu_dereference_check() which requires either RCU or the tree lock. Fix by holding mas_lock around the iteration, and call mas_erase() before freeing each range to avoid dangling pointers in the tree. Fixes: 5e864438e285 ("ublk: replace xarray with IDA for shmem buffer index allocation") Reported-by: Jens Axboe <axboe@kernel.dk> Closes: https://lore.kernel.org/linux-block/0349d72d-dff8-4f9f-b448-919fa5ae96da@kernel.dk/ Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260423033058.2805135-2-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-22rbd: fix null-ptr-deref when device_add_disk() failsDawei Feng1-3/+3
do_rbd_add() publishes the device with device_add() before calling device_add_disk(). If device_add_disk() fails after device_add() succeeds, the error path calls rbd_free_disk() directly and then later falls through to rbd_dev_device_release(), which calls rbd_free_disk() again. This double teardown can leave blk-mq cleanup operating on invalid state and trigger a null-ptr-deref in __blk_mq_free_map_and_rqs(), reached from blk_mq_free_tag_set(). Fix this by following the normal remove ordering: call device_del() before rbd_dev_device_release() when device_add_disk() fails after device_add(). That keeps the teardown sequence consistent and avoids re-entering disk cleanup through the wrong path. The bug was first flagged by an experimental analysis tool we are developing for kernel memory-management bugs while analyzing v6.13-rc1. The tool is still under development and is not yet publicly available. We reproduced the bug on v7.0 with a real Ceph backend and a QEMU x86_64 guest booted with KASAN and CONFIG_FAILSLAB enabled. The reproducer confines failslab injections to the __add_disk() range and injects fail-nth while mapping an RBD image through /sys/bus/rbd/add_single_major. On the unpatched kernel, fail-nth=4 reliably triggered the fault: Oops: general protection fault, probably for non-canonical address 0xdffffc0000000000: 0000 [#1] SMP KASAN NOPTI KASAN: null-ptr-deref in range [0x0000000000000000-0x0000000000000007] CPU: 0 UID: 0 PID: 273 Comm: bash Not tainted 7.0.0-01247-gd60bc1401583 #6 PREEMPT(lazy) Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 RIP: 0010:__blk_mq_free_map_and_rqs+0x8c/0x240 Code: 00 00 48 8b 6b 60 41 89 f4 49 c1 e4 03 4c 01 e5 45 85 ed 0f 85 0a 01 00 00 48 b8 00 00 00 00 00 fc ff df 48 89 e9 48 c1 e9 03 <80> 3c 01 00 0f 85 31 01 00 00 4c 8b 6d 00 4d 85 ed 0f 84 e2 00 00 RSP: 0018:ff1100000ab0fac8 EFLAGS: 00000246 RAX: dffffc0000000000 RBX: ff1100000c4806a0 RCX: 0000000000000000 RDX: 0000000000000002 RSI: 0000000000000000 RDI: ff1100000c4806f4 RBP: 0000000000000000 R08: 0000000000000001 R09: ffe21c000189001b R10: ff1100000c4800df R11: ff1100006cf37be0 R12: 0000000000000000 R13: 0000000000000000 R14: ff1100000c480700 R15: ff1100000c480004 FS: 00007f0fbe8fe740(0000) GS:ff110000e5851000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007fe53473b2e0 CR3: 0000000012eef000 CR4: 00000000007516f0 PKRU: 55555554 Call Trace: <TASK> blk_mq_free_tag_set+0x77/0x460 do_rbd_add+0x1446/0x2b80 ? __pfx_do_rbd_add+0x10/0x10 ? lock_acquire+0x18c/0x300 ? find_held_lock+0x2b/0x80 ? sysfs_file_kobj+0xb6/0x1b0 ? __pfx_sysfs_kf_write+0x10/0x10 kernfs_fop_write_iter+0x2f4/0x4a0 vfs_write+0x98e/0x1000 ? expand_files+0x51f/0x850 ? __pfx_vfs_write+0x10/0x10 ksys_write+0xf2/0x1d0 ? __pfx_ksys_write+0x10/0x10 do_syscall_64+0x115/0x690 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f0fbea15907 Code: 10 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 f3 0f 1e fa 64 8b 04 25 18 00 00 00 85 c0 75 10 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 51 c3 48 83 ec 28 48 89 54 24 18 48 89 74 24 RSP: 002b:00007ffe22346ea8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 RAX: ffffffffffffffda RBX: 0000000000000058 RCX: 00007f0fbea15907 RDX: 0000000000000058 RSI: 0000563ace6c0ef0 RDI: 0000000000000001 RBP: 0000563ace6c0ef0 R08: 0000563ace6c0ef0 R09: 6b6435726d694141 R10: 5250337279762f78 R11: 0000000000000246 R12: 0000000000000058 R13: 00007f0fbeb1c780 R14: ff1100000c480700 R15: ff1100000c480004 </TASK> With this fix applied, rerunning the reproducer over fail-nth=1..256 yields no KASAN reports. [ idryomov: rename err_out_device_del -> err_out_device ] Cc: stable@vger.kernel.org Fixes: 27c97abc30e2 ("rbd: add add_disk() error handling") Signed-off-by: Zilin Guan <zilin@seu.edu.cn> Signed-off-by: Dawei Feng <dawei.feng@seu.edu.cn> Reviewed-by: Ilya Dryomov <idryomov@gmail.com> Signed-off-by: Ilya Dryomov <idryomov@gmail.com>
2026-04-19Merge tag 'mm-stable-2026-04-18-02-14' of ↵Linus Torvalds1-1/+4
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull more MM updates from Andrew Morton: - "Eliminate Dying Memory Cgroup" (Qi Zheng and Muchun Song) Address the longstanding "dying memcg problem". A situation wherein a no-longer-used memory control group will hang around for an extended period pointlessly consuming memory - "fix unexpected type conversions and potential overflows" (Qi Zheng) Fix a couple of potential 32-bit/64-bit issues which were identified during review of the "Eliminate Dying Memory Cgroup" series - "kho: history: track previous kernel version and kexec boot count" (Breno Leitao) Use Kexec Handover (KHO) to pass the previous kernel's version string and the number of kexec reboots since the last cold boot to the next kernel, and print it at boot time - "liveupdate: prevent double preservation" (Pasha Tatashin) Teach LUO to avoid managing the same file across different active sessions - "liveupdate: Fix module unloading and unregister API" (Pasha Tatashin) Address an issue with how LUO handles module reference counting and unregistration during module unloading - "zswap pool per-CPU acomp_ctx simplifications" (Kanchana Sridhar) Simplify and clean up the zswap crypto compression handling and improve the lifecycle management of zswap pool's per-CPU acomp_ctx resources - "mm/damon/core: fix damon_call()/damos_walk() vs kdmond exit race" (SeongJae Park) Address unlikely but possible leaks and deadlocks in damon_call() and damon_walk() - "mm/damon/core: validate damos_quota_goal->nid" (SeongJae Park) Fix a couple of root-only wild pointer dereferences - "Docs/admin-guide/mm/damon: warn commit_inputs vs other params race" (SeongJae Park) Update the DAMON documentation to warn operators about potential races which can occur if the commit_inputs parameter is altered at the wrong time - "Minor hmm_test fixes and cleanups" (Alistair Popple) Bugfixes and a cleanup for the HMM kernel selftests - "Modify memfd_luo code" (Chenghao Duan) Cleanups, simplifications and speedups to the memfd_lou code - "mm, kvm: allow uffd support in guest_memfd" (Mike Rapoport) Support for userfaultfd in guest_memfd - "selftests/mm: skip several tests when thp is not available" (Chunyu Hu) Fix several issues in the selftests code which were causing breakage when the tests were run on CONFIG_THP=n kernels - "mm/mprotect: micro-optimization work" (Pedro Falcato) A couple of nice speedups for mprotect() - "MAINTAINERS: update KHO and LIVE UPDATE entries" (Pratyush Yadav) Document upcoming changes in the maintenance of KHO, LUO, memfd_luo, kexec, crash, kdump and probably other kexec-based things - they are being moved out of mm.git and into a new git tree * tag 'mm-stable-2026-04-18-02-14' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (121 commits) MAINTAINERS: add page cache reviewer mm/vmscan: avoid false-positive -Wuninitialized warning MAINTAINERS: update Dave's kdump reviewer email address MAINTAINERS: drop include/linux/liveupdate from LIVE UPDATE MAINTAINERS: drop include/linux/kho/abi/ from KHO MAINTAINERS: update KHO and LIVE UPDATE maintainers MAINTAINERS: update kexec/kdump maintainers entries mm/migrate_device: remove dead migration entry check in migrate_vma_collect_huge_pmd() selftests: mm: skip charge_reserved_hugetlb without killall userfaultfd: allow registration of ranges below mmap_min_addr mm/vmstat: fix vmstat_shepherd double-scheduling vmstat_update mm/hugetlb: fix early boot crash on parameters without '=' separator zram: reject unrecognized type= values in recompress_store() docs: proc: document ProtectionKey in smaps mm/mprotect: special-case small folios when applying permissions mm/mprotect: move softleaf code out of the main function mm: remove '!root_reclaim' checking in should_abort_scan() mm/sparse: fix comment for section map alignment mm/page_io: use sio->len for PSWPIN accounting in sio_read_complete() selftests/mm: transhuge_stress: skip the test when thp not available ...
2026-04-18zram: reject unrecognized type= values in recompress_store()Andrew Stellman1-0/+2
recompress_store() parses the type= parameter with three if statements checking for "idle", "huge", and "huge_idle". An unrecognized value silently falls through with mode left at 0, causing the recompression pass to run with no slot filter — processing all slots instead of the intended subset. Add a !mode check after the type parsing block to return -EINVAL for unrecognized values, consistent with the function's other parameter validation. Link: https://lore.kernel.org/20260407153027.42425-1-astellman@stellman-greene.com Signed-off-by: Andrew Stellman <astellman@stellman-greene.com> Suggested-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-18zram: do not forget to endio for partial discard requestsSergey Senozhatsky1-1/+2
As reported by Qu Wenruo and Avinesh Kumar, the following getconf PAGESIZE 65536 blkdiscard -p 4k /dev/zram0 takes literally forever to complete. zram doesn't support partial discards and just returns immediately w/o doing any discard work in such cases. The problem is that we forget to endio on our way out, so blkdiscard sleeps forever in submit_bio_wait(). Fix this by jumping to end_bio label, which does bio_endio(). Link: https://lore.kernel.org/20260331074255.777019-1-senozhatsky@chromium.org Fixes: 0120dd6e4e20 ("zram: make zram_bio_discard more self-contained") Signed-off-by: Sergey Senozhatsky <senozhatsky@chromium.org> Reported-by: Qu Wenruo <wqu@suse.com> Closes: https://lore.kernel.org/linux-block/92361cd3-fb8b-482e-bc89-15ff1acb9a59@suse.com Tested-by: Qu Wenruo <wqu@suse.com> Reported-by: Avinesh Kumar <avinesh.kumar@suse.com> Closes: https://bugzilla.suse.com/show_bug.cgi?id=1256530 Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Brian Geffon <bgeffon@google.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: Minchan Kim <minchan@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-04-17floppy: fix reference leak on platform_device_register() failureGuangshuo Li1-3/+7
When platform_device_register() fails in do_floppy_init(), the embedded struct device in floppy_device[drive] has already been initialized by device_initialize(), but the failure path jumps to out_remove_drives without dropping the device reference for the current drive. Previously registered floppy devices are cleaned up in out_remove_drives, but the device for the drive that fails registration is not, leading to a reference leak. The issue was identified by a static analysis tool I developed and confirmed by manual review. Fix this by calling put_device() for the current floppy device before jumping to the common cleanup path. Fixes: 94fd0db7bfb4a ("[PATCH] Floppy: Add cmos attribute to floppy driver") Cc: stable@vger.kernel.org Signed-off-by: Guangshuo Li <lgs201920130244@gmail.com> Link: https://patch.msgid.link/20260415145708.3331818-1-lgs201920130244@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-17ublk: use unchecked copy helpers for bio page dataMing Lei1-2/+10
Bio pages may originate from slab caches that lack a usercopy region (e.g. jbd2 frozen metadata buffers allocated via jbd2_alloc()). When CONFIG_HARDENED_USERCOPY is enabled, copy_to_iter() calls check_copy_size() which rejects these slab pages, triggering a kernel BUG in usercopy_abort(). This is a false positive: the data is ordinary block I/O content — the same data the loop driver writes to its backing file via vfs_iter_write(). The bvec length is always trusted, so the size check in check_copy_size() is not needed either. Switch to _copy_to_iter()/_copy_from_iter() which skip the check_copy_size() wrapper while the underlying copy_to_user() remains unchanged. Acked-by: Caleb Sander Mateos <csander@purestorage.com> Fixes: 2299ceec364e ("ublk: use copy_{to,from}_iter() for user copy") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://patch.msgid.link/20260415230246.808176-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-15Merge tag 'mm-stable-2026-04-13-21-45' of ↵Linus Torvalds5-154/+156
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull MM updates from Andrew Morton: - "maple_tree: Replace big node with maple copy" (Liam Howlett) Mainly prepararatory work for ongoing development but it does reduce stack usage and is an improvement. - "mm, swap: swap table phase III: remove swap_map" (Kairui Song) Offers memory savings by removing the static swap_map. It also yields some CPU savings and implements several cleanups. - "mm: memfd_luo: preserve file seals" (Pratyush Yadav) File seal preservation to LUO's memfd code - "mm: zswap: add per-memcg stat for incompressible pages" (Jiayuan Chen) Additional userspace stats reportng to zswap - "arch, mm: consolidate empty_zero_page" (Mike Rapoport) Some cleanups for our handling of ZERO_PAGE() and zero_pfn - "mm/kmemleak: Improve scan_should_stop() implementation" (Zhongqiu Han) A robustness improvement and some cleanups in the kmemleak code - "Improve khugepaged scan logic" (Vernon Yang) Improve khugepaged scan logic and reduce CPU consumption by prioritizing scanning tasks that access memory frequently - "Make KHO Stateless" (Jason Miu) Simplify Kexec Handover by transitioning KHO from an xarray-based metadata tracking system with serialization to a radix tree data structure that can be passed directly to the next kernel - "mm: vmscan: add PID and cgroup ID to vmscan tracepoints" (Thomas Ballasi and Steven Rostedt) Enhance vmscan's tracepointing - "mm: arch/shstk: Common shadow stack mapping helper and VM_NOHUGEPAGE" (Catalin Marinas) Cleanup for the shadow stack code: remove per-arch code in favour of a generic implementation - "Fix KASAN support for KHO restored vmalloc regions" (Pasha Tatashin) Fix a WARN() which can be emitted the KHO restores a vmalloc area - "mm: Remove stray references to pagevec" (Tal Zussman) Several cleanups, mainly udpating references to "struct pagevec", which became folio_batch three years ago - "mm: Eliminate fake head pages from vmemmap optimization" (Kiryl Shutsemau) Simplify the HugeTLB vmemmap optimization (HVO) by changing how tail pages encode their relationship to the head page - "mm/damon/core: improve DAMOS quota efficiency for core layer filters" (SeongJae Park) Improve two problematic behaviors of DAMOS that makes it less efficient when core layer filters are used - "mm/damon: strictly respect min_nr_regions" (SeongJae Park) Improve DAMON usability by extending the treatment of the min_nr_regions user-settable parameter - "mm/page_alloc: pcp locking cleanup" (Vlastimil Babka) The proper fix for a previously hotfixed SMP=n issue. Code simplifications and cleanups ensued - "mm: cleanups around unmapping / zapping" (David Hildenbrand) A bunch of cleanups around unmapping and zapping. Mostly simplifications, code movements, documentation and renaming of zapping functions - "support batched checking of the young flag for MGLRU" (Baolin Wang) Batched checking of the young flag for MGLRU. It's part cleanups; one benchmark shows large performance benefits for arm64 - "memcg: obj stock and slab stat caching cleanups" (Johannes Weiner) memcg cleanup and robustness improvements - "Allow order zero pages in page reporting" (Yuvraj Sakshith) Enhance free page reporting - it is presently and undesirably order-0 pages when reporting free memory. - "mm: vma flag tweaks" (Lorenzo Stoakes) Cleanup work following from the recent conversion of the VMA flags to a bitmap - "mm/damon: add optional debugging-purpose sanity checks" (SeongJae Park) Add some more developer-facing debug checks into DAMON core - "mm/damon: test and document power-of-2 min_region_sz requirement" (SeongJae Park) An additional DAMON kunit test and makes some adjustments to the addr_unit parameter handling - "mm/damon/core: make passed_sample_intervals comparisons overflow-safe" (SeongJae Park) Fix a hard-to-hit time overflow issue in DAMON core - "mm/damon: improve/fixup/update ratio calculation, test and documentation" (SeongJae Park) A batch of misc/minor improvements and fixups for DAMON - "mm: move vma_(kernel|mmu)_pagesize() out of hugetlb.c" (David Hildenbrand) Fix a possible issue with dax-device when CONFIG_HUGETLB=n. Some code movement was required. - "zram: recompression cleanups and tweaks" (Sergey Senozhatsky) A somewhat random mix of fixups, recompression cleanups and improvements in the zram code - "mm/damon: support multiple goal-based quota tuning algorithms" (SeongJae Park) Extend DAMOS quotas goal auto-tuning to support multiple tuning algorithms that users can select - "mm: thp: reduce unnecessary start_stop_khugepaged()" (Breno Leitao) Fix the khugpaged sysfs handling so we no longer spam the logs with reams of junk when starting/stopping khugepaged - "mm: improve map count checks" (Lorenzo Stoakes) Provide some cleanups and slight fixes in the mremap, mmap and vma code - "mm/damon: support addr_unit on default monitoring targets for modules" (SeongJae Park) Extend the use of DAMON core's addr_unit tunable - "mm: khugepaged cleanups and mTHP prerequisites" (Nico Pache) Cleanups to khugepaged and is a base for Nico's planned khugepaged mTHP support - "mm: memory hot(un)plug and SPARSEMEM cleanups" (David Hildenbrand) Code movement and cleanups in the memhotplug and sparsemem code - "mm: remove CONFIG_ARCH_ENABLE_MEMORY_HOTREMOVE and cleanup CONFIG_MIGRATION" (David Hildenbrand) Rationalize some memhotplug Kconfig support - "change young flag check functions to return bool" (Baolin Wang) Cleanups to change all young flag check functions to return bool - "mm/damon/sysfs: fix memory leak and NULL dereference issues" (Josh Law and SeongJae Park) Fix a few potential DAMON bugs - "mm/vma: convert vm_flags_t to vma_flags_t in vma code" (Lorenzo Stoakes) Convert a lot of the existing use of the legacy vm_flags_t data type to the new vma_flags_t type which replaces it. Mainly in the vma code. - "mm: expand mmap_prepare functionality and usage" (Lorenzo Stoakes) Expand the mmap_prepare functionality, which is intended to replace the deprecated f_op->mmap hook which has been the source of bugs and security issues for some time. Cleanups, documentation, extension of mmap_prepare into filesystem drivers - "mm/huge_memory: refactor zap_huge_pmd()" (Lorenzo Stoakes) Simplify and clean up zap_huge_pmd(). Additional cleanups around vm_normal_folio_pmd() and the softleaf functionality are performed. * tag 'mm-stable-2026-04-13-21-45' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (369 commits) mm: fix deferred split queue races during migration mm/khugepaged: fix issue with tracking lock mm/huge_memory: add and use has_deposited_pgtable() mm/huge_memory: add and use normal_or_softleaf_folio_pmd() mm: add softleaf_is_valid_pmd_entry(), pmd_to_softleaf_folio() mm/huge_memory: separate out the folio part of zap_huge_pmd() mm/huge_memory: use mm instead of tlb->mm mm/huge_memory: remove unnecessary sanity checks mm/huge_memory: deduplicate zap deposited table call mm/huge_memory: remove unnecessary VM_BUG_ON_PAGE() mm/huge_memory: add a common exit path to zap_huge_pmd() mm/huge_memory: handle buggy PMD entry in zap_huge_pmd() mm/huge_memory: have zap_huge_pmd return a boolean, add kdoc mm/huge: avoid big else branch in zap_huge_pmd() mm/huge_memory: simplify vma_is_specal_huge() mm: on remap assert that input range within the proposed VMA mm: add mmap_action_map_kernel_pages[_full]() uio: replace deprecated mmap hook with mmap_prepare in uio_info drivers: hv: vmbus: replace deprecated mmap hook with mmap_prepare mm: allow handling of stacked mmap_prepare hooks in more drivers ...
2026-04-15zloop: remove irq-safe lockingChristoph Hellwig1-22/+15
All of zloop runs in user context, so drop the irq-safe locking. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20260414081811.549755-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-15zloop: factor out zloop_mark_{full,empty} helpersChristoph Hellwig1-22/+26
Move a few chunks of duplicated code into helpers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20260414081811.549755-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-15zloop: set RQF_QUIET when completing requests on deleted devicesChristoph Hellwig1-1/+3
Reduce the dmesg spam for tests that involve device deletion. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20260414081811.549755-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-15zloop: improve the unaligned write pointer warningChristoph Hellwig1-3/+3
Use the IS_ALIGNED helper and avoid extra conversions, and tell the user what the unaligned size is. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20260414081811.549755-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-15zloop: use vfs_truncateChristoph Hellwig1-15/+2
While vfs_truncate does various extra checks that we don't really need, it is always better to use a VFS helper rather than open coding the logic. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20260414081811.549755-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-15zloop: fix write pointer calculation in zloop_forget_cacheChristoph Hellwig1-2/+11
The write pointer is absolute and in sector units, so we need to convert it to a relative byte address first. Fixes: c505448748f7 ("zloop: forget write cache on force removal") Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://patch.msgid.link/20260414081811.549755-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-14Merge tag 'x86-cleanups-2026-04-13' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Ingo Molnar: - Consolidate AMD and Hygon cases in parse_topology() (Wei Wang) - asm constraints cleanups in __iowrite32_copy() (Uros Bizjak) - Drop AMD Extended Interrupt LVT macros (Naveen N Rao) - Don't use REALLY_SLOW_IO for delays (Juergen Gross) - paravirt cleanups (Juergen Gross) - FPU code cleanups (Borislav Petkov) - split-lock handling code cleanups (Borislav Petkov, Ronan Pigott) * tag 'x86-cleanups-2026-04-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/fpu: Correct the comment explaining what xfeatures_in_use() does x86/split_lock: Don't warn about unknown split_lock_detect parameter x86/fpu: Correct misspelled xfeaures_to_write local var x86/apic: Drop AMD Extended Interrupt LVT macros x86/cpu/topology: Consolidate AMD and Hygon cases in parse_topology() block/floppy: Don't use REALLY_SLOW_IO for delays x86/paravirt: Replace io_delay() hook with a bool x86/irqflags: Preemptively move include paravirt.h directive where it belongs x86/split_lock: Restructure the unwieldy switch-case in sld_state_show() x86/local: Remove trailing semicolon from _ASM_XADD in local_add_return() x86/asm: Use inout "+" asm onstraint modifiers in __iowrite32_copy()
2026-04-13Merge tag 'for-7.1/block-20260411' of ↵Linus Torvalds7-509/+1151
git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux Pull block updates from Jens Axboe: - Add shared memory zero-copy I/O support for ublk, bypassing per-I/O copies between kernel and userspace by matching registered buffer PFNs at I/O time. Includes selftests. - Refactor bio integrity to support filesystem initiated integrity operations and arbitrary buffer alignment. - Clean up bio allocation, splitting bio_alloc_bioset() into clear fast and slow paths. Add bio_await() and bio_submit_or_kill() helpers, unify synchronous bi_end_io callbacks. - Fix zone write plug refcount handling and plug removal races. Add support for serializing zone writes at QD=1 for rotational zoned devices, yielding significant throughput improvements. - Add SED-OPAL ioctls for Single User Mode management and a STACK_RESET command. - Add io_uring passthrough (uring_cmd) support to the BSG layer. - Replace pp_buf in partition scanning with struct seq_buf. - zloop improvements and cleanups. - drbd genl cleanup, switching to pre_doit/post_doit. - NVMe pull request via Keith: - Fabrics authentication updates - Enhanced block queue limits support - Workqueue usage updates - A new write zeroes device quirk - Tagset cleanup fix for loop device - MD pull requests via Yu Kuai: - Fix raid5 soft lockup in retry_aligned_read() - Fix raid10 deadlock with check operation and nowait requests - Fix raid1 overlapping writes on writemostly disks - Fix sysfs deadlock on array_state=clear - Proactive RAID-5 parity building with llbitmap, with write_zeroes_unmap optimization for initial sync - Fix llbitmap barrier ordering, rdev skipping, and bitmap_ops version mismatch fallback - Fix bcache use-after-free and uninitialized closure - Validate raid5 journal metadata payload size - Various cleanups - Various other fixes, improvements, and cleanups * tag 'for-7.1/block-20260411' of git://git.kernel.org/pub/scm/linux/kernel/git/axboe/linux: (146 commits) ublk: fix tautological comparison warning in ublk_ctrl_reg_buf scsi: bsg: fix buffer overflow in scsi_bsg_uring_cmd() block: refactor blkdev_zone_mgmt_ioctl MAINTAINERS: update ublk driver maintainer email Documentation: ublk: address review comments for SHMEM_ZC docs ublk: allow buffer registration before device is started ublk: replace xarray with IDA for shmem buffer index allocation ublk: simplify PFN range loop in __ublk_ctrl_reg_buf ublk: verify all pages in multi-page bvec fall within registered range ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support xfs: use bio_await in xfs_zone_gc_reset_sync block: add a bio_submit_or_kill helper block: factor out a bio_await helper block: unify the synchronous bi_end_io callbacks xfs: fix number of GC bvecs selftests/ublk: add read-only buffer registration test selftests/ublk: add filesystem fio verify test for shmem_zc selftests/ublk: add hugetlbfs shmem_zc test for loop target selftests/ublk: add shared memory zero-copy test selftests/ublk: add UBLK_F_SHMEM_ZC support for loop target ...
2026-04-10ublk: fix tautological comparison warning in ublk_ctrl_reg_bufMing Lei1-8/+6
On 32-bit architectures, 'unsigned long size' can never exceed UBLK_SHMEM_BUF_SIZE_MAX (1ULL << 32), causing a tautological comparison warning. Validate buf_reg.len (__u64) directly before using it, and consolidate all input validation into a single check. Also remove the unnecessary local variables 'addr' and 'size' since buf_reg.addr and buf_reg.len can be used directly. Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202604101952.3NOzqnu9-lkp@intel.com/ Fixes: 23b3b6f0b584 ("ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer support") Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260410124136.3983429-1-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09ublk: allow buffer registration before device is startedMing Lei1-51/+28
Before START_DEV, there is no disk, no queue, no I/O dispatch, so the maple tree can be safely modified under ub->mutex alone without freezing the queue. Add ublk_lock_buf_tree()/ublk_unlock_buf_tree() helpers that take ub->mutex first, then freeze the queue if device is started. This ordering (mutex -> freeze) is safe because ublk_stop_dev_unlocked() already holds ub->mutex when calling del_gendisk() which freezes the queue. Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260409133020.3780098-6-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09ublk: replace xarray with IDA for shmem buffer index allocationMing Lei1-46/+46
Remove struct ublk_buf which only contained nr_pages that was never read after registration. Use IDA for pure index allocation instead of xarray. Make __ublk_ctrl_unreg_buf() return int so the caller can detect invalid index without a separate lookup. Simplify ublk_buf_cleanup() to walk the maple tree directly and unpin all pages in one pass, instead of iterating the xarray by buffer index. Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260409133020.3780098-5-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09ublk: simplify PFN range loop in __ublk_ctrl_reg_bufMing Lei1-3/+2
Use the for-loop increment instead of a manual `i++` past the last page, and fix the mtree_insert_range end key accordingly. Suggested-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260409133020.3780098-4-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09ublk: verify all pages in multi-page bvec fall within registered rangeMing Lei1-6/+11
rq_for_each_bvec() yields multi-page bvecs where bv_page is only the first page. ublk_try_buf_match() only validated the start PFN against the maple tree, but a bvec can span multiple pages past the end of a registered range. Use mas_walk() instead of mtree_load() to obtain the range boundaries stored in the maple tree, and check that the bvec's end PFN does not exceed the range. Also remove base_pfn from struct ublk_buf_range since mas.index already provides the range start PFN. Reported-by: Caleb Sander Mateos <csander@purestorage.com> Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260409133020.3780098-3-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-09ublk: widen ublk_shmem_buf_reg.len to __u64 for 4GB buffer supportMing Lei1-1/+8
The __u32 len field cannot represent a 4GB buffer (0x100000000 overflows to 0). Change it to __u64 so buffers up to 4GB can be registered. Add a reserved field for alignment and validate it is zero. The kernel enforces a default max of 4GB (UBLK_SHMEM_BUF_SIZE_MAX) which may be increased in future. Signed-off-by: Ming Lei <tom.leiming@gmail.com> Link: https://patch.msgid.link/20260409133020.3780098-2-tom.leiming@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07ublk: eliminate permanent pages[] array from struct ublk_bufMing Lei1-32/+55
The pages[] array (kvmalloc'd, 8 bytes per page = 2MB for a 1GB buffer) was stored permanently in struct ublk_buf but only needed during pin_user_pages_fast() and maple tree construction. Since the maple tree already stores PFN ranges via ublk_buf_range, struct page pointers can be recovered via pfn_to_page() during unregistration. Make pages[] a temporary allocation in ublk_ctrl_reg_buf(), freed immediately after the maple tree is built. Rewrite __ublk_ctrl_unreg_buf() to iterate the maple tree for matching buf_index entries, recovering struct page pointers via pfn_to_page() and unpinning in batches of 32. Simplify ublk_buf_erase_ranges() to iterate the maple tree by buf_index instead of walking the now-removed pages[] array. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://patch.msgid.link/20260331153207.3635125-5-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07ublk: enable UBLK_F_SHMEM_ZC feature flagMing Lei1-3/+4
Add UBLK_F_SHMEM_ZC (1ULL << 19) to the UAPI header and UBLK_F_ALL. Switch ublk_support_shmem_zc() and ublk_dev_support_shmem_zc() from returning false to checking the actual flag, enabling the shared memory zero-copy feature for devices that request it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://patch.msgid.link/20260331153207.3635125-4-ming.lei@redhat.com [axboe: ublk_buf_reg -> ublk_shmem_buf_reg errors] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07ublk: add PFN-based buffer matching in I/O pathMing Lei1-1/+76
Add ublk_try_buf_match() which walks a request's bio_vecs, looks up each page's PFN in the per-device maple tree, and verifies all pages belong to the same registered buffer at contiguous offsets. Add ublk_iod_is_shmem_zc() inline helper for checking whether a request uses the shmem zero-copy path. Integrate into the I/O path: - ublk_setup_iod(): if pages match a registered buffer, set UBLK_IO_F_SHMEM_ZC and encode buffer index + offset in addr - ublk_start_io(): skip ublk_map_io() for zero-copy requests - __ublk_complete_rq(): skip ublk_unmap_io() for zero-copy requests The feature remains disabled (ublk_support_shmem_zc() returns false) until the UBLK_F_SHMEM_ZC flag is enabled in the next patch. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://patch.msgid.link/20260331153207.3635125-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07ublk: add UBLK_U_CMD_REG_BUF/UNREG_BUF control commandsMing Lei1-0/+295
Add control commands for registering and unregistering shared memory buffers for zero-copy I/O: - UBLK_U_CMD_REG_BUF (0x18): pins pages from userspace, inserts PFN ranges into a per-device maple tree for O(log n) lookup during I/O. Buffer pointers are tracked in a per-device xarray. Returns the assigned buffer index. - UBLK_U_CMD_UNREG_BUF (0x19): removes PFN entries and unpins pages. Queue freeze/unfreeze is handled internally so userspace need not quiesce the device during registration. Also adds: - UBLK_IO_F_SHMEM_ZC flag and addr encoding helpers in UAPI header (16-bit buffer index supporting up to 65536 buffers) - Data structures (ublk_buf, ublk_buf_range) and xarray/maple tree - __ublk_ctrl_reg_buf() helper for PFN insertion with error unwinding - __ublk_ctrl_unreg_buf() helper for cleanup reuse - ublk_support_shmem_zc() / ublk_dev_support_shmem_zc() stubs (returning false — feature not enabled yet) Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://patch.msgid.link/20260331153207.3635125-2-ming.lei@redhat.com [axboe: fixup ublk_buf_reg -> ublk_shmem_buf_reg errors, comments] Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-07drbd: use get_random_u64() where appropriateDavid Carlier2-3/+3
Use the typed random integer helpers instead of get_random_bytes() when filling a single integer variable. The helpers return the value directly, require no pointer or size argument, and better express intent. Signed-off-by: David Carlier <devnexen@gmail.com> Reviewed-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://patch.msgid.link/20260405154704.4610-1-devnexen@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-06drbd: remove DRBD_GENLA_F_MANDATORY flag handlingChristoph Böhmwalder4-79/+6
DRBD used a custom mechanism to mark netlink attributes as "mandatory": bit 14 of nla_type was repurposed as DRBD_GENLA_F_MANDATORY. Attributes sent from userspace that had this bit present and that were unknown to the kernel would lead to an error. Since commit ef6243acb478 ("genetlink: optionally validate strictly/dumps"), the generic netlink layer rejects unknown top-level attributes when strict validation is enabled. DRBD never opted out of strict validation, so unknown top-level attributes are already rejected by the netlink core. The mandatory flag mechanism was required for nested attributes, because these are parsed liberally, silently dropping attributes unknown to the kernel. This prepares for the move to a new YNL-based family, which will use the now-default strict parsing. The current family is not expected to gain any new attributes, which makes this change safe. Old userspace that still sets bit 14 is unaffected: nla_type() strips it before __nla_validate_parse() performs attribute validation, so the bit never reaches DRBD. Remove all references to the mandatory flag in DRBD. Cc: Johannes Berg <johannes.berg@intel.com> Cc: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Christoph Böhmwalder <christoph.boehmwalder@linbit.com> Link: https://patch.msgid.link/20260403132953.2248751-1-christoph.boehmwalder@linbit.com Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-04-06ublk: reset per-IO canceled flag on each fetchUday Shankar1-8/+13
If a ublk server starts recovering devices but dies before issuing fetch commands for all IOs, cancellation of the fetch commands that were successfully issued may never complete. This is because the per-IO canceled flag can remain set even after the fetch for that IO has been submitted - the per-IO canceled flags for all IOs in a queue are reset together only once all IOs for that queue have been fetched. So if a nonempty proper subset of the IOs for a queue are fetched when the ublk server dies, the IOs in that subset will never successfully be canceled, as their canceled flags remain set, and this prevents ublk_cancel_cmd from actually calling io_uring_cmd_done on the commands, despite the fact that they are outstanding. Fix this by resetting the per-IO cancel flags imm