aboutsummaryrefslogtreecommitdiff
path: root/fs/btrfs/extent-tree.c
AgeCommit message (Collapse)AuthorFilesLines
2026-04-07btrfs: decrease indentation of find_free_extent_update_loopJohannes Thumshirn1-54/+55
Decrease the indentation of find_free_extent_update_loop(), by inverting the check if the loop state is smaller than LOOP_NO_EMPTY_SIZE. This also allows for an early return from find_free_extent_update_loop(), in case LOOP_NO_EMPTY_SIZE is already set at this point. While at it change a if () { } else if else pattern to all using curly braces and be consistent with the rest of btrfs code. Also change 'int exists' to 'bool have_trans' giving it a more meaningful name and type. No functional changes intended. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: remove atomic parameter from btrfs_buffer_uptodate()Qu Wenruo1-1/+1
That parameter was introduced by commit b9fab919b748 ("Btrfs: avoid sleeping in verify_parent_transid while atomic"). At that time we needed to lock the extent buffer range inside the io tree to avoid content changes, thus it could sleep. But that behavior is no longer there, as later commit 9e2aff90fc2a ("btrfs: stop using lock_extent in btrfs_buffer_uptodate") dropped the io tree lock. We can remove the @atomic parameter safely now. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: revalidate cached tree blocks on the uptodate pathZhengYuan Huang1-1/+1
read_extent_buffer_pages_nowait() returns immediately when an extent buffer is already marked uptodate. On that cache-hit path, the caller supplied btrfs_tree_parent_check is not re-run. This can let read_tree_root_path() accept a cached tree block whose actual header level/owner does not match the expected value derived from the parent. E.g. a corrupted root item that points to a tree block which doesn't even belong to that root, and has mismatching level/owner. But that tree block is already read and cached, later the corrupted tree root got read from disk and hit the cached tree block. Fix this by re-validating cached extent buffers against the supplied btrfs_tree_parent_check on the uptodate path, and make read_tree_root_path() pass its check to btrfs_buffer_uptodate(). This makes cache hits and fresh reads follow the same tree-parent verification rules, and turns the corruption into a read failure instead of constructing an inconsistent root object. Signed-off-by: ZhengYuan Huang <gality369@gmail.com> Reviewed-by: Qu Wenruo <wqu@suse.com> [ Resolve the conflict with extent_buffer_uptodate() helper, handle transid mismatch case ] Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-04-07btrfs: zoned: remove redundant space_info lock and variable in ↵Jiasheng Jiang1-6/+2
do_allocation_zoned() In do_allocation_zoned(), the code acquires space_info->lock before block_group->lock. However, the critical section does not access or modify any members of the space_info structure. Thus, the lock is redundant as it provides no necessary synchronization here. This change simplifies the locking logic and aligns the function with other zoned paths, such as __btrfs_add_free_space_zoned(), which only rely on block_group->lock. Since the 'space_info' local variable is no longer used after removing the lock calls, it is also removed. Removing this unnecessary lock reduces contention on the global space_info lock, improving concurrency in the zoned allocation path. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-31btrfs: fix incorrect return value after changing leaf in ↵robbieko1-1/+1
lookup_extent_data_ref() After commit 1618aa3c2e01 ("btrfs: simplify return variables in lookup_extent_data_ref()"), the err and ret variables were merged into a single ret variable. However, when btrfs_next_leaf() returns 0 (success), ret is overwritten from -ENOENT to 0. If the first key in the next leaf does not match (different objectid or type), the function returns 0 instead of -ENOENT, making the caller believe the lookup succeeded when it did not. This can lead to operations on the wrong extent tree item, potentially causing extent tree corruption. Fix this by returning -ENOENT directly when the key does not match, instead of relying on the ret variable. Fixes: 1618aa3c2e01 ("btrfs: simplify return variables in lookup_extent_data_ref()") CC: stable@vger.kernel.org # 6.12+ Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: robbieko <robbieko@synology.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-17btrfs: check for NULL root after calls to btrfs_csum_root()Filipe Manana1-2/+18
btrfs_csum_root() can return a NULL pointer in case the root we are looking for is not in the rb tree that tracks roots. So add checks to every caller that is missing such check to log a message and return an error. Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-btrfs/20260208161657.3972997-1-clm@meta.com/ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-03-17btrfs: check for NULL root after calls to btrfs_extent_root()Filipe Manana1-3/+75
btrfs_extent_root() can return a NULL pointer in case the root we are looking for is not in the rb tree that tracks roots. So add checks to every caller that is missing such check to log a message and return an error. The same applies to callers of btrfs_block_group_root(), since it calls btrfs_extent_root(). Reported-by: Chris Mason <clm@meta.com> Link: https://lore.kernel.org/linux-btrfs/20260208161657.3972997-1-clm@meta.com/ Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-26btrfs: handle discard errors in in btrfs_finish_extent_commit()Jingkai Tan1-1/+7
Coverity (ID: 1226842) reported that the return value of btrfs_discard_extent() is assigned to ret but is immediately overwritten by unpin_extent_range() without being checked. Use the same error handling that is done later in the same function. Signed-off-by: Jingkai Tan <contact@jingk.ai> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: fix transaction commit blocking during trim of unallocated spacejinbaohong1-22/+134
When trimming unallocated space, btrfs_trim_fs() holds the device_list_mutex for the entire duration while iterating through all devices. On large filesystems with significant unallocated space, this operation can take minutes to hours on large storage systems. This causes a problem because btrfs_run_dev_stats(), which is called during transaction commit, also requires device_list_mutex: btrfs_trim_fs() mutex_lock(&fs_devices->device_list_mutex) list_for_each_entry(device, ...) btrfs_trim_free_extents(device) mutex_unlock(&fs_devices->device_list_mutex) commit_transaction() btrfs_run_dev_stats() mutex_lock(&fs_devices->device_list_mutex) // blocked! ... While trim is running, all transaction commits are blocked waiting for the mutex. Fix this by refactoring btrfs_trim_free_extents() to process devices in bounded chunks (up to 2GB per iteration) and release device_list_mutex between chunks. Signed-off-by: robbieko <robbieko@synology.com> Signed-off-by: jinbaohong <jinbaohong@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: handle user interrupt properly in btrfs_trim_fs()jinbaohong1-0/+11
When a fatal signal is pending or the process is freezing, btrfs_trim_block_group() and btrfs_trim_free_extents() return -ERESTARTSYS. Currently this is treated as a regular error: the loops continue to the next iteration and count it as a block group or device failure. Instead, break out of the loops immediately and return -ERESTARTSYS to userspace without counting it as a failure. Also skip the device loop entirely if the block group loop was interrupted. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: jinbaohong <jinbaohong@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: preserve first error in btrfs_trim_fs()jinbaohong1-6/+9
When multiple block groups or devices fail during trim, preserve the first error encountered rather than the last one. The first error is typically more useful for debugging as it represents the original failure, while subsequent errors may be cascading effects. Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: jinbaohong <jinbaohong@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: continue trimming remaining devices on failurejinbaohong1-1/+1
Commit 93bba24d4b5a ("btrfs: Enhance btrfs_trim_fs function to handle error better") intended to make device trimming continue even if one device fails, tracking failures and reporting them at the end. However, it used 'break' instead of 'continue', causing the loop to exit on the first device failure. Fix this by replacing 'break' with 'continue'. Fixes: 93bba24d4b5a ("btrfs: Enhance btrfs_trim_fs function to handle error better") CC: stable@vger.kernel.org # 5.4+ Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Robbie Ko <robbieko@synology.com> Signed-off-by: jinbaohong <jinbaohong@synology.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: remove pointless out labels from extent-tree.cFilipe Manana1-14/+10
Some functions (lookup_extent_data_ref(), __btrfs_mod_ref() and btrfs_free_tree_block()) have an 'out' label that does nothing but return, making it pointless. Simplify this by removing the label and returning instead of gotos plus setting the 'ret' variable. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: use the btrfs_block_group_end() helper everywhereFilipe Manana1-5/+4
We have a helper to calculate a block group's exclusive end offset, but we only use it in some places. Update every site that open codes the calculation to use the helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: handle discarding fully-remapped block groupsMark Harmstone1-0/+2
Discard normally works by iterating over the free-space entries of a block group. This doesn't work for fully-remapped block groups, as we removed their free-space entries when we started relocation. For sync discard, call btrfs_discard_extent() when we commit the transaction in which the last identity remap was removed. For async discard, add a new function btrfs_trim_fully_remapped_block_group() to be called by the discard worker, which iterates over the block group's range using the normal async discard rules. Once we reach the end, remove the chunk's stripes and device extents to get back its free space. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: add do_remap parameter to btrfs_discard_extent()Mark Harmstone1-4/+5
btrfs_discard_extent() can be called either when an extent is removed or from walking the free-space tree. With a remapped block group these two things are no longer equivalent: the extent's addresses are remapped, while the free-space tree exclusively uses underlying addresses. Add a do_remap parameter to btrfs_discard_extent() and btrfs_map_discard(), saying whether or not the address needs to be run through the remap tree first. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: move existing remaps before relocating block groupMark Harmstone1-2/+4
If when relocating a block group we find that `remap_bytes` > 0 in its block group item, that means that it has been the destination block group for another that has been remapped. We need to search the remap tree for any remap backrefs within this range, and move the data to a third block group. This is because otherwise btrfs_translate_remap() could end up following an unbounded chain of remaps, which would only get worse over time. We only relocate one block group at a time, so `remap_bytes` will only ever go down while we are doing this. Once we're finished we set the REMAPPED flag on the block group, which will permanently prevent any other data from being moved to within it. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: handle deletions from remapped block groupMark Harmstone1-6/+88
Handle the case where we free an extent from a block group that has the REMAPPED flag set. Because the remap tree is orthogonal to the extent tree, for data this may be within any number of identity remaps or actual remaps. If we're freeing a metadata node, this will be wholly inside one or the other. btrfs_remove_extent_from_remap_tree() searches the remap tree for the remaps that cover the range in question, then calls remove_range_from_remap_tree() for each one, to punch a hole in the remap and adjust the free-space tree. For an identity remap, remove_range_from_remap_tree() will adjust the block group's `identity_remap_count` if this changes. If it reaches zero we mark the block group as fully remapped. For an identity remap, remove_range_from_remap_tree() will adjust the block group's `identity_remap_count` if this changes. If it reaches zero we mark the block group as fully remapped. Fully remapped block groups have their chunk stripes removed and their device extents freed, which makes the disk space available again to the chunk allocator. This happens asynchronously: in the cleaner thread for sync discard and nodiscard, and (in a later patch) in the discard worker for async discard. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: allow mounting filesystems with remap-tree incompat flagMark Harmstone1-0/+2
If we encounter a filesystem with the remap-tree incompat flag set, validate its compatibility with the other flags, and load the remap tree using the values that have been added to the superblock. The remap-tree feature depends on the free-space-tree, but no-holes and block-group-tree have been made dependencies to reduce the testing matrix. Similarly I'm not aware of any reason why mixed-bg and zoned would be incompatible with remap-tree, but this is blocked for the time being until it can be fully tested. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: don't add metadata items for the remap tree to the extent treeMark Harmstone1-1/+30
There is the following potential problem with the remap tree and delayed refs: * Remapped extent freed in a delayed ref, which removes an entry from the remap tree * Remap tree now small enough to fit in a single leaf * Corruption as we now have a level-0 block with a level-1 metadata item in the extent tree One solution to this would be to rework the remap tree code so that it operates via delayed refs. But as we're hoping to remove cow-only metadata items in the future anyway, change things so that the remap tree doesn't have any entries in the extent tree. This also has the benefit of reducing write amplification. We also make it so that the clear_cache mount option is a no-op, as with the extent tree v2, as the free-space tree can no longer be recreated from the extent tree. Finally disable relocating the remap tree itself, which is added back in a later patch. As it is we would get corruption as the traditional relocation method walks the extent tree, and we're removing its metadata items. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Mark Harmstone <mark@harmstone.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: tag as unlikely error handling in run_one_delayed_ref()Filipe Manana1-3/+5
We don't expect to get errors unless we have a corrupted fs, bad RAM or a bug. So tag the error handling as unlikely. This slightly reduces the module's text size on x86_64 using gcc 14.2.0-19 from Debian. Before this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1939458 172512 15592 2127562 2076ca fs/btrfs/btrfs.ko After this change: $ size fs/btrfs/btrfs.ko text data bss dec hex filename 1939398 172512 15592 2127502 20768e fs/btrfs/btrfs.ko Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: remove unnecessary else branch in run_one_delayed_ref()Filipe Manana1-3/+1
There is no need for an else branch to deal with an unexpected delayed ref type. We can just change the previous branch to deal with this by checking if the ref type is not BTRFS_EXTENT_OWNER_REF_KEY, since that branch is useless as it only sets 'ret' to zero when it's already zero. So merge the two branches. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: don't BUG() on unexpected delayed ref type in run_one_delayed_ref()Filipe Manana1-8/+12
There is no need to BUG(), we can just return an error and log an error message. Reviewed-by: Boris Burkov <boris@bur.io> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: zoned: re-flow prepare_allocation_zoned()Johannes Thumshirn1-17/+24
Re-flow prepare allocation zoned to make it a bit more readable by returning early and removing unnecessary indentations. This patch does not change any functionality. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: merge setting ret and return retDavid Sterba1-7/+3
In many places we have pattern: ret = ...; return ret; This can be simplified to a direct return, removing 'ret' if not otherwise needed. The places in self tests are not converted so we can add more test cases without changing surrounding code (extent-map-tests.c:test_case_4()). Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: simplify boolean argument for btrfs_inc_ref()/btrfs_dec_ref()Sun YangKai1-12/+6
Replace open-coded if/else blocks with the boolean directly and introduce local const bool variables, making the code shorter and easier to read. Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: use true/false for boolean parameters in btrfs_inc_ref()/btrfs_dec_ref()Sun YangKai1-4/+4
Replace integer literals 0/1 with true/false when calling btrfs_inc_ref() and btrfs_dec_ref() to make the code self-documenting and avoid mixing bool/integer types. Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: update comment for visit_node_for_delete()Sun YangKai1-1/+0
Drop the obsolete @refs parameter from the comment so the argument list matches the current function signature after commit f8c4d59de23c9 ("btrfs: drop unused parameter refs from visit_node_for_delete()"). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-25btrfs: remaining BTRFS_PATH_AUTO_FREE conversionsDavid Sterba1-25/+16
Do the remaining btrfs_path conversion to the auto cleaning, this seems to be the last one. Most of the conversions are trivial, only adding the declaration and removing the freeing, or changing the goto patterns to return. There are some functions with many changes, like __btrfs_free_extent(), btrfs_remove_from_free_space_tree() or btrfs_add_to_free_space_tree() but it still follows the same pattern. Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: use booleans for delalloc arguments and struct find_free_extent_ctlFilipe Manana1-9/+7
The struct find_free_extent_ctl uses an int for the 'delalloc' field but it's always used as a boolean, and its value is used to be passed to several functions to signal if we are dealing with delalloc. The same goes for the 'is_data' argument from btrfs_reserve_extent(). So change the type from int to bool and move the field definition in the find_free_extent_ctl structure so that it's close to other bool fields and reduces the size of the structure from 144 down to 136 bytes (at the moment it's only declared in the stack of btrfs_reserve_extent(), never allocated otherwise). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: use bool type for btrfs_path members used as booleansFilipe Manana1-4/+4
Many fields of struct btrfs_path are used as booleans but their type is an unsigned int (of one 1 bit width to save space). Change the type to bool keeping the :1 suffix so that they combine with the previous u8 fields in order to save space. This makes the code more clear by using explicit true/false and more in line with the preferred style, preserving the size of the structure. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: headers cleanup to remove unnecessary local includesQu Wenruo1-0/+1
[BUG] When I tried to remove btrfs_bio::fs_info and use btrfs_bio::inode to grab the fs_info, the header "btrfs_inode.h" is needed to access the full btrfs_inode structure. Then btrfs will fail to compile. [CAUSE] There is a recursive including chain: "bio.h" -> "btrfs_inode.h" -> "extent_map.h" -> "compression.h" -> "bio.h" That recursive including is causing problems for btrfs. [ENHANCEMENT] To reduce the risk of recursive including: - Remove unnecessary local includes from btrfs headers Either the included header is pulled in by other headers, or is completely unnecessary. - Remove btrfs local includes if the header only requires a pointer In that case let the implementing C file to pull the required header. This is especially important for headers like "btrfs_inode.h" which pulls in a lot of other btrfs headers, thus it's a mine field of recursive including. - Remove unnecessary temporary structure definition Either if we have included the header defining the structure, or completely unused. Now including "btrfs_inode.h" inside "bio.h" is completely fine, although "btrfs_inode.h" still includes "extent_map.h", but that header only includes "fs.h", no more chain back to "bio.h". Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: apply the AUTO_K(V)FREE macros throughout the codeMiquel Sabaté Solà1-10/+7
Apply the AUTO_KFREE and AUTO_KVFREE macros wherever it makes sense. Since this macro is expected to improve code readability, it has been avoided in places where the lifetime of objects wasn't easy to follow and a cleanup attribute would've made things worse; or when the cleanup section of a function involved many other things and thus there was no readability impact anyways. This change has also not been applied in extremely short functions where readability was clearly not an issue. Signed-off-by: Miquel Sabaté Solà <mssola@mssola.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: remove pointless label and goto from unpin_extent_range()Filipe Manana1-5/+3
There's no need to have an 'out' label and jump there in case we can not find a block group. We can simply return directly since there are no resources to release, removing the need for the label and the 'ret' variable. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: reduce block group critical section in unpin_extent_range()Filipe Manana1-8/+8
There's no need to update the bytes_pinned, bytes_readonly and max_extent_size fields of the space_info while inside the critical section delimited by the block group's lock. So move that out of the block group's critical section, but sill inside the space_info's critical section. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: change 'reserved' argument from pin_down_extent() to boolFilipe Manana1-6/+6
It's used as a boolean, so convert it from int type to bool type. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: remove 'reserved' argument from btrfs_pin_extent()Filipe Manana1-8/+7
All callers pass a value of 1 (true) to it, so remove it. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: use local variable for space_info in pin_down_extent()Filipe Manana1-9/+10
Instead of dereferencing the block group multiple times to access its space_info, use a local variable to shorten the code horizontal wise and make it easier to read. Also, while at it, also rename the block group argument from 'cache' to 'bg', as the cache name is confusing and it's from the old days where the block group structure was named as 'btrfs_block_group_cache'. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: reduce block group critical section in pin_down_extent()Filipe Manana1-5/+5
There's no need to update the bytes_reserved and bytes_may_use fields of the space_info while holding the block group's spinlock. We are only making the critical section longer than necessary. So move the space_info updates outside of the block group's critical section. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: use the key format macros when printing keysFilipe Manana1-7/+7
Change all locations that print a key to use the new macros to print them in order to ensure a consistent style and avoid repetitive code. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-11-24btrfs: remove fs_info argument from btrfs_dump_space_info()Filipe Manana1-2/+1
We don't need it since we can grab fs_info from the given space_info. So remove the fs_info argument. Reviewed-by: Qu Wenruo <wqu@suse.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Anand Jain <asj@kernel.org> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23btrfs: add unlikely annotations to branches leading to transaction abortDavid Sterba1-19/+19
The unlikely() annotation is a static prediction hint that compiler may use to reorder code out of hot path. We use it elsewhere (namely tree-checker.c) for error branches that almost never happen. Transaction abort is one such error, the btrfs_abort_transaction() inlines code to check the state and print a warning, this ought to be out of the hot path. The most common pattern is when transaction abort is called after checking a return value and the control flow leads to a quick return. In other cases it may not be necessary to add unlikely() e.g. when the function returns anyway or the control flow is not changed noticeably. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23btrfs: add unlikely annotations to branches leading to EIODavid Sterba1-2/+2
The unlikely() annotation is a static prediction hint that compiler may use to reorder code out of hot path. We use it elsewhere (namely tree-checker.c) for error branches that almost never happen, where EIO is one of them. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23btrfs: add unlikely annotations to branches leading to EUCLEANDavid Sterba1-16/+16
The unlikely() annotation is a static prediction hint that compiler may use to reorder code out of hot path. We use it elsewhere (namely tree-checker.c) for error branches that almost never happen, where EUCLEAN (a corruption) is one of them. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-23btrfs: fix typos in comments and stringsDavid Sterba1-4/+4
Annual typo fixing pass. Strangely codespell found only about 30% of what is in this patch, the rest was done manually using text spellchecker with a custom dictionary of acceptable terms. Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-22btrfs: convert several int parameters to boolDavid Sterba1-8/+8
We're almost done cleaning misused int/bool parameters. Convert a bunch of them, found by manual grepping. Note that btrfs_sync_fs() needs an int as it's mandated by the struct super_operations prototype. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2025-09-22btrfs: zoned: refine extent allocator hint selectionNaohiro Aota1-2/+4
The hint block group selection in the extent allocator is wrong in the first place, as it can select the dedicated data relocation block group for the normal data allocation. Since we separated the normal data space_info and the data relocation space_info, we can easily identify a block group is for data relocation or not. Do not choose it for the normal data allocation. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-22btrfs: fix ssd_spread overallocationBoris Burkov1-16/+17
If the ssd_spread mount option is enabled, then we run the so called clustered allocator for data block groups. In practice, this results in creating a btrfs_free_cluster which caches a block_group and borrows its free extents for allocation. Since the introduction of allocation size classes in 6.1, there has been a bug in the interaction between that feature and ssd_spread. find_free_extent() has a number of nested loops. The loop going over the allocation stages, stored in ffe_ctl->loop and managed by find_free_extent_update_loop(), the loop over the raid levels, and the loop over all the block_groups in a space_info. The size class feature relies on the block_group loop to ensure it gets a chance to see a block_group of a given size class. However, the clustered allocator uses the cached cluster block_group and breaks that loop. Each call to do_allocation() will really just go back to the same cached block_group. Normally, this is OK, as the allocation either succeeds and we don't want to loop any more or it fails, and we clear the cluster and return its space to the block_group. But with size classes, the allocation can succeed, then later fail, outside of do_allocation() due to size class mismatch. That latter failure is not properly handled due to the highly complex multi loop logic. The result is a painful loop where we continue to allocate the same num_bytes from the cluster in a tight loop until it fails and releases the cluster and lets us try a new block_group. But by then, we have skipped great swaths of the available block_groups and are likely to fail to allocate, looping the outer loop. In pathological cases like the reproducer below, the cached block_group is often the very last one, in which case we don't perform this tight bg loop but instead rip through the ffe stages to LOOP_CHUNK_ALLOC and allocate a chunk, which is now the last one, and we enter the tight inner loop until an allocation failure. Then allocation succeeds on the final block_group and if the next allocation is a size mismatch, the exact same thing happens again. Triggering this is as easy as mounting with -o ssd_spread and then running: mount -o ssd_spread $dev $mnt dd if=/dev/zero of=$mnt/big bs=16M count=1 &>/dev/null dd if=/dev/zero of=$mnt/med bs=4M count=1 &>/dev/null sync if you do the two writes + sync in a loop, you can force btrfs to spin an excessive amount on semi-successful clustered allocations, before ultimately failing and advancing to the stage where we force a chunk allocation. This results in 2G of data allocated per iteration, despite only using ~20M of data. By using a small size classed extent, the inner loop takes longer and we can spin for longer. The simplest, shortest term fix to unbreak this is to make the clustered allocator size_class aware in the dumbest way, where it fails on size class mismatch. This may hinder the operation of the clustered allocator, but better hindered than completely broken and terribly overallocating. Further re-design improvements are also in the works. Fixes: 52bb7a2166af ("btrfs: introduce size class to block group allocator") CC: stable@vger.kernel.org # 6.1+ Reported-by: David Sterba <dsterba@suse.com> Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-21btrfs: add btrfs prefix to is_fstree() and make it return boolFilipe Manana1-3/+3
This is an exported function and therefore it should have a 'btrfs_' prefix, to make it clear it's btrfs specific, avoid future name collisions with code outside btrfs, and make its naming consistent with most other btrfs exported functions. So add a 'btrfs_' prefix to it and make it return bool instead of int, since all we need is to return true or false. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2025-07-21btrfs: add btrfs prefix to free space tree exported functionsFilipe Manana1-2/+2
A few of the free space tree exported functions have a 'btrfs_' prefix in their name, but most don't. Not only is this inconsistent, the preferred style is to have such a prefix, to avoid potential collisions in the future with other kernel code and offer a somewhat better readibility by making it obvious in calls sites that we are calling btrfs specific code. So add the 'btrfs_' prefix to all free space tree functions that are missing it. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>