aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-02-03btrfs: zoned: re-flow prepare_allocation_zoned()Johannes Thumshirn1-17/+24
Re-flow prepare allocation zoned to make it a bit more readable by returning early and removing unnecessary indentations. This patch does not change any functionality. Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: shrink the size of btrfs_bioQu Wenruo1-7/+8
This is done by: - Shrink the size of btrfs_bio::mirror_num From 32 bits unsigned int to u16. Normally btrfs mirror number is either 0 (all profiles), 1 (all profiles), 2 (DUP/RAID1/RAID10/RAID5), 3 (RAID1C3) or 4 (RAID1C4). But for RAID6 the mirror number can go as large as the number of devices of that chunk. Currently the limit for number of devices for a data chunk is BTRFS_MAX_DEVS(), which is around 500 for the default 16K nodesize. And if going the max 64K nodesize, we can have a little over 2000 devices for a chunk. Although I'd argue it's way overkilled, we don't reject such cases yet thus u8 is not going to cut it, and have to use u16 (max out at 64K). - Use bit fields for boolean members Although it's not always safe for racy call sites, those members are safe. * csum_search_commit_root * is_scrub Those two are set immediately after bbio allocation and no more writes after allocation, thus they are very safe. * async_csum * can_use_append Those two are set for each split range, and after that there is no writes into those two members in different threads, thus they are also safe. And there are spaces for 4 more bits before increasing the size of btrfs_bio again, which should be future proof enough. - Reorder the structure members Now we always put the largest member first (after the huge 120 bytes union), making it easier to fill any holes. This reduce the size of btrfs_bio by 8 bytes, from 312 bytes to 304 bytes. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: remove ASSERT compatibility for gcc < 8.xDavid Sterba1-17/+0
The minimum gcc version is 8 since 118c40b7b50340 ("kbuild: require gcc-8 and binutils-2.30"), the workaround for missing __VA_OPT__ support is not needed. Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: pass level to _btrfs_printk() to avoid parsing level from stringDavid Sterba2-45/+30
There's code in _btrfs_printk() to parse the message level from the input string so we can augment the message with the level description for better visibility in the logs. The parsing code has evolved over time, see commits: - 40f7828b36e3b9 ("btrfs: better handle btrfs_printk() defaults") - 262c5e86fec7cf ("printk/btrfs: handle more message headers") - 533574c6bc30cf ("btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout") - 4da35113426d16 ("btrfs: add varargs to btrfs_error") As we are using the specific level helpers everywhere we can simply pass the message level so we don't have to parse it. The proper printk() message header is created as KERN_SOH + "level". Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: simplify internal btrfs_printk helpersDavid Sterba1-5/+13
The printk() can be compiled out depending on CONFIG_PRINTK, this is reflected in our helpers. The indirection is provided by btrfs_printk() used in the ratelimited and RCU wrapper macros. Drop the btrfs_printk() helper and define the ratelimit and RCU helpers directly when CONFIG_PRINTK is undefined. This will allow further changes to the _btrfs_printk() interface (which is internal), any message in other code should use the level-specific helpers. Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: rename btrfs_create_block_group_cache to btrfs_create_block_groupJohannes Thumshirn1-4/+4
struct btrfs_block_group used to be called struct btrfs_block_group_cache but got renamed to btrfs_block_group with commit 32da5386d9a4 ("btrfs: rename btrfs_block_group_cache"). Rename btrfs_create_block_group_cache() to btrfs_create_block_group() to reflect that change. Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: merge setting ret and return retDavid Sterba13-63/+32
In many places we have pattern: ret = ...; return ret; This can be simplified to a direct return, removing 'ret' if not otherwise needed. The places in self tests are not converted so we can add more test cases without changing surrounding code (extent-map-tests.c:test_case_4()). Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: remove dead assignment in prepare_one_folio()Massimiliano Pellizzer1-4/+2
In prepare_one_folio(), ret is initialized to 0 at declaration, and in an error path we assign ret = 0 before jumping to the again label to retry the operation. However, ret is immediately overwritten by ret = set_folio_extent_mapped(folio) after the again label. Both assignments are never observed by any code path, therefore they can be safely removed. Signed-off-by: Massimiliano Pellizzer <mpellizzer.dev@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: replace for_each_set_bit() with for_each_set_bitmap()Qu Wenruo1-10/+16
Inside extent_io.c, there are several simple call sites doing things like: for_each_set_bit(bit, bitmap, bitmap_size) { /* handle one fs block */ } The workload includes: - set_bit() Inside extent_writepage_io(). This can be replaced with a bitmap_set(). - btrfs_folio_set_lock() - btrfs_mark_ordered_io_finished() Inside writepage_delalloc(). Instead of calling it multiple times, we can pass a range into the function with one call. Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: concentrate the error handling of submit_one_sector()Qu Wenruo1-14/+15
Currently submit_one_sector() has only one failure path from btrfs_get_extent(). However the error handling is split into two parts, one inside submit_one_sector(), which clears the dirty flag and finishes the writeback for the fs block. The other part is to submit any remaining bio inside bio_ctrl and mark the ordered extent finished for the fs block. There is no special reason that we must split the error handling, let's just concentrate all the error handling into submit_one_sector(). Reviewed-by: Boris Burkov <boris@bur.io> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: search for larger extent maps inside btrfs_do_readpage()Qu Wenruo1-1/+14
[CORNER CASE] If we have the following file extents layout, btrfs_get_extent() can return a smaller hole during read, and cause unnecessary extra tree searches: item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53 generation 9 type 1 (regular) extent data disk byte 13631488 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) item 7 key (257 EXTENT_DATA 32768) itemoff 15757 itemsize 53 generation 9 type 1 (regular) extent data disk byte 13635584 nr 4096 extent data offset 0 nr 4096 ram 4096 extent compression 0 (none) In above case, range [0, 4K) and [32K, 36K) are regular extents, and there is a hole in range [4K, 32K), and the fs has "no-holes" feature, meaning the hole will not have a file extent item. [INEFFICIENCY] Assume the system has 4K page size, and we're doing readahead for range [4K, 32K), no large folio yet. btrfs_readahead() for range [4K, 32K) |- btrfs_do_readpage() for folio 4K | |- get_extent_map() for range [4K, 8K) | |- btrfs_get_extent() for range [4K, 8K) | We hit item 6, then for the next item 7. | At this stage we know range [4K, 32K) is a hole. | But our search range is only [4K, 8K), not reaching 32K, thus | we go into not_found: tag, returning a hole em for [4K, 8K). | |- btrfs_do_readpage() for folio 8K | |- get_extent_map() for range [8K, 12K) | |- btrfs_get_extent() for range [8K, 12K) | We hit the same item 6, and then item 7. | But still we goto not_found tag, inserting a new hole em, | which will be merged with previous one. | | [ Repeat the same btrfs_get_extent() calls until the end ] So we're calling btrfs_get_extent() again and again, just for a different part of the same hole range [4K, 32K). [ENHANCEMENT] Make btrfs_do_readpage() to search for a larger extent map if readahead is involved. For btrfs_readahead() we have bio_ctrl::ractl set, and lock extents for the whole readahead range. If we find bio_ctrl::ractl is set, we can use that end range as extent map search end, this allows btrfs_get_extent() to return a much larger hole, thus reduce the need to call btrfs_get_extent() again and again. btrfs_readahead() for range [4K, 32K) |- btrfs_do_readpage() for folio 4K | |- get_extent_map() for range [4K, 32K) | |- btrfs_get_extent() for range [4K, 32K) | We hit item 6, then for the next item 7. | At this stage we know range [4K, 32K) is a hole. | So the hole em for range [4K, 32K) is returned. | |- btrfs_do_readpage() for folio 8K | |- get_extent_map() for range [8K, 32K) | The cached hole em range [4K, 32K) covers the range, | and reuse that em. | | [ Repeat the same btrfs_get_extent() calls until the end ] Now we only call btrfs_get_extent() once for the whole range [4K, 32K), other than the old 8 times. Such change will reduce the overhead of reading large holes a little. For current experimental build (with larger folios) on aarch64, there will be a tiny but consistent ~1% improvement reading a large hole file: Reading a 1GiB sparse file (all hole) using xfs_io, with 64K block size, the result is the time needed to read the whole file, reported from xfs_io. 32 runs, experimental build (with large folios). 64K page size, 4K fs block size. - Avg before: 0.20823 s - Avg after: 0.20635 s - Diff: -0.9% Reviewed-by: Filipe Manana <fdmanana@suse.com> Signed-off-by: Qu Wenruo <wqu@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: introduce BTRFS_PATH_AUTO_RELEASE() helperQu Wenruo4-22/+23
There are already several bugs with on-stack btrfs_path involved, even it is already a little safer than btrfs_path pointers (only leaks the extent buffers, not the btrfs_path structure itself) - Patch "btrfs: make sure extent and csum paths are always released in scrub_raid56_parity_stripe()" - Patch "btrfs: fix a potential path leak in print_data_reloc_error()" Thus there is a real need to apply auto release for those on-stack paths. Introduces a new macro, BTRFS_PATH_AUTO_RELEASE() which defines one on-stack btrfs_path structure, initialize it all to 0, then call btrfs_release_path() on it when exiting the scope. This applies to current 3 on-stack path usages: - defrag_get_extent() in defrag.c - print_data_reloc_error() in inode.c There is a special case where we want to release the path early before the time consuming iterate_extent_inodes() call, thus that manual early release is kept as is, with an extra comment added. - scrub_radi56_parity_stripe() in scrub.c Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: enable direct IO for bs > ps casesQu Wenruo1-15/+2
Previously direct IO was disabled if the fs block size was larger than the page size, the reasons are: - Iomap direct IO can split the range ignoring the fs block alignment Which could trigger the bio size check from btrfs_submit_bio(). - The buffer is only ensured to be contiguous in user space memory The underlying physical memory is not ensured to be contiguous, and that can cause problems for the checksum generation/verification and RAID56 handling. However the above problems are solved by the following upstream commits: - 001397f5ef49 ("iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag") Which added an extra flag that can be utilized by the fs to ensure the bio submitted by iomap is always aligned to fs block size. - ec20799064c8 ("btrfs: enable encoded read/write/send for bs > ps cases") - 8870dbeedcf9 ("btrfs: raid56: enable bs > ps support") Which makes btrfs to handle bios that are not backed by large folios but still are aligned to fs block size. As the commits have been merged we can enable direct IO support for bs > ps cases. Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Qu Wenruo <wqu@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: switch to library APIs for checksumsEric Biggers10-103/+137
Make btrfs use the library APIs instead of crypto_shash, for all checksum computations. This has many benefits: - Allows future checksum types, e.g. XXH3 or CRC64, to be more easily supported. Only a library API will be needed, not crypto_shash too. - Eliminates the overhead of the generic crypto layer, including an indirect call for every function call and other API overhead. A microbenchmark of btrfs_check_read_bio() with crc32c checksums shows a speedup from 658 cycles to 608 cycles per 4096-byte block. - Decreases the stack usage of btrfs by reducing the size of checksum contexts from 384 bytes to 240 bytes, and by eliminating the need for some functions to declare a checksum context at all. - Increases reliability. The library functions always succeed and return void. In contrast, crypto_shash can fail and return errors. Also, the library functions are guaranteed to be available when btrfs is loaded; there's no longer any need to use module softdeps to try to work around the crypto modules sometimes not being loaded. - Fixes a bug where blake2b checksums didn't work on kernels booted with fips=1. Since btrfs checksums are for integrity only, it's fine for them to use non-FIPS-approved algorithms. Note that with having to handle 4 algorithms instead of just 1-2, this commit does result in a slightly positive diffstat. That being said, this wouldn't have been the case if btrfs had actually checked for errors from crypto_shash, which technically it should have been doing. Reviewed-by: Ard Biesheuvel <ardb@kernel.org> Reviewed-by: Neal Gompa <neal@gompa.dev> Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: zoned: don't zone append to conventional zoneJohannes Thumshirn2-10/+12
In case of a zoned RAID, it can happen that a data write is targeting a sequential write required zone and a conventional zone. In this case the bio will be marked as REQ_OP_ZONE_APPEND but for the conventional zone, this needs to be REQ_OP_WRITE. The setting of REQ_OP_ZONE_APPEND is deferred to the last possible time in btrfs_submit_dev_bio(), but the decision if we can use zone append is cached in btrfs_bio. CC: Naohiro Aota <naohiro.aota@wdc.com> Fixes: e9b9b911e03c ("btrfs: add raid stripe tree to features enabled with debug config") Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: relax squota parent qgroup deletion ruleBoris Burkov1-15/+35
Currently, with squotas, we do not allow removing a parent qgroup with no members if it still has usage accounted to it. This makes it really difficult to recover from accounting bugs, as we have no good way of getting back to 0 usage. Instead, allow deletion (it's safe at 0 members..) while still warning about the inconsistency by adding a squota parent check. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: check squota parent usage on membership changeBoris Burkov1-0/+39
We could have detected the quick inherit bug more directly if we had an extra warning about squota hierarchy consistency while modifying the hierarchy. In squotas, the parent usage always simply adds up to the sum of its children, so we can just check for that when changing membership and detect more accounting bugs. Reviewed-by: Qu Wenruo <wqu@suse.com> Signed-off-by: Boris Burkov <boris@bur.io> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: simplify boolean argument for btrfs_inc_ref()/btrfs_dec_ref()Sun YangKai2-38/+18
Replace open-coded if/else blocks with the boolean directly and introduce local const bool variables, making the code shorter and easier to read. Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: use true/false for boolean parameters in btrfs_inc_ref()/btrfs_dec_ref()Sun YangKai2-14/+14
Replace integer literals 0/1 with true/false when calling btrfs_inc_ref() and btrfs_dec_ref() to make the code self-documenting and avoid mixing bool/integer types. Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: update comment for visit_node_for_delete()Sun YangKai1-1/+0
Drop the obsolete @refs parameter from the comment so the argument list matches the current function signature after commit f8c4d59de23c9 ("btrfs: drop unused parameter refs from visit_node_for_delete()"). Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Sun YangKai <sunk67188@gmail.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-03btrfs: raid56: fix memory leak of btrfs_raid_bio::stripe_uptodate_bitmapFilipe Manana1-0/+1
We allocate the bitmap but we never free it in free_raid_bio_pointers(). Fix this by adding a bitmap_free() call against the stripe_uptodate_bitmap of a raid bio. Fixes: 1810350b04ef ("btrfs: raid56: move sector_ptr::uptodate into a dedicated bitmap") Reported-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/linux-btrfs/20260126045315.GA31641@lst.de/ Reviewed-by: Qu Wenruo <wqu@suse.com> Tested-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: David Sterba <dsterba@suse.com> Signed-off-by: David Sterba <dsterba@suse.com>
2026-02-02MIPS: tools: relocs: Ship a definition of R_MIPS_PC32Yao Zi1-0/+7
R_MIPS_PC32 is a GNU extension, its definition is available in glibc only since 2.39 (released in 2024), and not available in musl libc yet. Provide our own definition for R_MIPS_PC32 and use it if necessary to fix relocs tool building on musl and older glibc systems. Fixes: ff79d31eb536 ("mips: Add support for PC32 relocations in vmlinux") Signed-off-by: Yao Zi <me@ziyao.cc> Link: https://patch.msgid.link/20260202041610.61389-1-me@ziyao.cc Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-02streamline_config.pl: remove superfluous exclamation markDiego Viola1-1/+1
In order to make the output cleaner and more consistent with other scripts. Signed-off-by: Diego Viola <diego.viola@gmail.com> Link: https://patch.msgid.link/20260202054541.17399-1-diego.viola@gmail.com Signed-off-by: Nathan Chancellor <nathan@kernel.org>
2026-02-02Merge branch 'devlink-and-mlx5-support-cross-function-rate-scheduling'Jakub Kicinski4-15/+17
Tariq Toukan says: ==================== devlink and mlx5: Support cross-function rate scheduling [part] Apply trivial cleanups from the series to make it smaller. ==================== Link: https://patch.msgid.link/20260128112544.1661250-1-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02devlink: Refactor devlink_rate_nodes_checkCosmin Ratiu4-12/+16
devlink_rate_nodes_check() was used to verify there are no devlink rate nodes created when switching the esw mode. Rate management code is about to become more complex, so refactor this function: - remove unused param 'mode'. - add a new 'rate_filter' param. - rename to devlink_rates_check(). - expose devlink_rate_is_node() to be used as a rate filter. This makes it more usable from multiple places, so use it from those places as well. Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Carolina Jubran <cjubran@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260128112544.1661250-6-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02devlink: Reverse locking order for nested instancesCosmin Ratiu1-3/+1
Commit [1] defined the locking expectations for nested devlink instances: the nested-in devlink instance lock needs to be acquired before the nested devlink instance lock. The code handling devlink rels was architected with that assumption in mind. There are no actual users of double locking yet but that is about to change in the upcoming patches in the series. Code operating on nested devlink instances will require also obtaining the nested-in instance lock, but such code may already be called from a variety of places with the nested devlink instance lock. Then, there's no way to acquire the nested-in lock other than making sure that all callers acquire it first. Reversing the nested lock order allows incrementally acquiring the nested-in instance lock when needed (perhaps even a chain of locks up to the root) without affecting any caller. The only affected use of nesting is devlink_nl_nested_fill(), which iterates over nested devlink instances with the RCU lock, without locking them, so there's no possibility of deadlock. So this commit just updates a comment regarding the nested locks. [1] commit c137743bce02b ("devlink: introduce object and nested devlink relationship infra") Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com> Reviewed-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Link: https://patch.msgid.link/20260128112544.1661250-4-tariqt@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-03erofs: avoid some unnecessary #ifdefsFerry Meng3-24/+14
They can either be removed or replaced with IS_ENABLED(). Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com> Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com> Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-02Merge branch 'net-stmmac-pcs-preparation'Jakub Kicinski4-31/+44
Russell King says: ==================== net: stmmac: pcs preparation These three patches prepare for the PCS changes, which, subject to Qualcomm testing, should be coming in the next cycle. ==================== Link: https://patch.msgid.link/aXyRlFw7ZuhRPiKo@shell.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: stmmac: handle integrated PCS phy_intf_sel separatelyRussell King (Oracle)3-3/+23
The dwmac core has no support for SGMII without using its integrated PCS. Thus, PHY_INTF_SEL_SGMII is only supported when this block is present, and it makes no sense for stmmac_get_phy_intf_sel() to decode this. None of the platform glue users that use stmmac_get_phy_intf_sel() directly accept PHY_INTF_SEL_SGMII as a valid mode. Check whether a PCS will be used by the driver for the interface mode, and if it is the integrated PCS, query the integrated PCS for the phy_intf_sel_i value to use. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vlmOa-00000006zvB-1fIe@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: stmmac: move most PCS register definitions to stmmac_pcs.cRussell King (Oracle)2-26/+18
Move most of the PCS register offset definitions to stmmac_pcs.c. Since stmmac_pcs.c only ever passes zero into the register offset macros, remove that ability, making them simple constant integer definitions. Add appropriate descriptions of the registers, pointing out their similarity with their IEEE 802.3 counterparts. Make use of the BMSR definitions for the GMAC_AN_STATUS register and remove the driver private versions. Note that BMSR_LSTATUS is non-low-latching, unlike it's 802.3z counterpart. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vlmOV-00000006zv5-1CwO@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: stmmac: clear half-duplex caps where unsupportedRussell King (Oracle)2-2/+3
Where a core supports hardware features, but does not indicate support for half-duplex, clear phylink's half-duplex 1G, 100M and 10M capability bits to disallow half-duplex operation and advertisement of these link modes. This will avoid the need for special code in the PCS driver to do this based on the ESTATUS register bits, as the support in the PCS is dependent on the same synthesis choice as the MAC core. Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com> Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com> Link: https://patch.msgid.link/E1vlmOQ-00000006zuz-0ffN@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: spacemit: k1-emac: fix jumbo frame supportTomas Hlavacek1-6/+15
The driver never programs the MAC frame size and jabber registers, causing the hardware to reject frames larger than the default 1518 bytes even when larger DMA buffers are allocated. Program MAC_MAXIMUM_FRAME_SIZE, MAC_TRANSMIT_JABBER_SIZE, and MAC_RECEIVE_JABBER_SIZE based on the configured MTU. Also fix the maximum buffer size from 4096 to 4095, since the descriptor buffer size field is only 12 bits. Account for double VLAN tags in frame size calculations. Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC") Cc: stable@vger.kernel.org Signed-off-by: Tomas Hlavacek <tmshlvck@gmail.com> Link: https://patch.msgid.link/20260130102301.477514-1-tmshlvck@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02Merge branch 'add-support-for-renesas-rz-g3l-gbeth'Jakub Kicinski3-11/+70
Biju Das says: ==================== Add support for Renesas RZ/G3L GBETH From: Biju Das <biju.das.jz@bp.renesas.com> The Renesas RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30 compared to other Renesas SoC such as RZ/V2H that use MAC version 5.20. The RZ/G3L GBETH requires an extra clock compared to RZ/G3E and has pps interrupts. Document the Renesas RZ/G3L GBETH IP in bindings and add support for the RZ/G3L GBETH in dwmac-renesas-gbeth glue driver. ==================== Link: https://patch.msgid.link/20260131161250.5047-1-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: stmmac: dwmac-renesas-gbeth: Add support for RZ/G3L SoCBiju Das1-0/+1
Compared to other Renesas GBETH stmmac glue drivers, RZ/G3L GBETH IP use the version Synopsys DesignWare MAC (version 5.30). It has an extra clock compared to RZ/V2H and has ptp_pps_o interrupts. Add support for RZ/G3L GBETH by reusing device data of RZ/V2H and can be extended to add other functionalities later. Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260131161250.5047-3-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02dt-bindings: net: renesas,rzv2h-gbeth: Document Renesas RZ/G3L SoCBiju Das2-11/+69
Add device tree binding support for the Gigabit Ethernet (GBETH) IP on Renesas RZ/G3L SoC. This SoC uses different Synopsys DesignWare MAC version 5.30 compared to RZ/G3E. RZ/G3L requires an extra clock compared to RZ/G3E and has pps interrupts. Add a new compatible string "renesas,r9a08g046-gbeth" for RZ/G3L SoC and update the schema to handle hardware differences between SoC variants. Extend the base snps,dwmac.yaml schema to accommodate the PPS interrupts. Acked-by: Conor Dooley <conor.dooley@microchip.com> Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com> Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com> Link: https://patch.msgid.link/20260131161250.5047-2-biju.das.jz@bp.renesas.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-03erofs: handle end of filesystem properly for file-backed mountsGao Xiang1-12/+8
I/O requests beyond the end of the filesystem should be zeroed out, similar to loopback devices and that is what we expect. Fixes: ce63cb62d794 ("erofs: support unencoded inodes for fileio") Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-03erofs: separate plain and compressed filesystems formallyGao Xiang7-45/+39
The EROFS on-disk format uses a tiny, plain metadata design that prioritizes performance and minimizes complex inconsistencies against common writable disk filesystems (almost all serious metadata inconsistency cannot happen in well-designed immutable filesystems like EROFS). EROFS deliberately avoids artificial design flaws to eliminate serious security risks from untrusted remote sources by design, although human-made implementation bugs can still happen sometimes. Currently, there is no strict check to prevent compressed inodes, especially LZ4-compressed inodes, from being read in plain filesystems. Starting with erofs-utils 1.0 and Linux 5.3, LZ4_0PADDING sb feature is automatically enabled for LZ4-compressed EROFS images to support in-place decompression. Furthermore, since Linux 5.4 LTS is no longer supported, we no longer need to handle ancient LZ4-compressed EROFS images generated by erofs-utils prior to 1.0. To formally distinguish different filesystem types for improved security: - Use the presence of LZ4_0PADDING or a non-zero `dsb->u1.lz4_max_distance` as a marker for compressed filesystems containing LZ4-compressed inodes only; - For other algorithms, use `dsb->u1.available_compr_algs` bitmap. Note: LZ4_0PADDING has been supported since Linux 5.4 (the first formal kernel version), so exposing it via sysfs is no longer necessary and is now deprecated (but remain it for five more years until 2031): `dsb->u1` has been strictly non-zero for all EROFS images containing compressed inodes starting with erofs-utils v1.3 and it is actually a much better marker for compressed filesystems. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-03erofs: use inode_set_cached_link()Gao Xiang1-13/+21
Symlink lengths are now cached in in-memory inodes directly so that readlink can be sped up. Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
2026-02-02dt-bindings: intel: Add Agilex eMMC supportNg Tze Yee1-0/+1
Agilex devkit support a separate eMMC daughter card. Document Agilex eMMC daughter board compatible. [dinguyen] becauce of patch 1cb8486ac5f3 ("dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml"), I moved the change to altera.yaml file. Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Signed-off-by: Ng Tze Yee <tzeyee.ng@altera.com> Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
2026-02-02Documentation: document liveupdate cmdline parameterLi Chen1-0/+5
liveupdate is used to enable Live Update Orchestrator (LUO) early during boot. Add it to kernel-parameters.txt so users can discover and use it. Link: https://lkml.kernel.org/r/20260130112036.359806-1-me@linux.beauty Signed-off-by: Li Chen <me@linux.beauty> Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: "Borislav Petkov (AMD)" <bp@alien8.de> Cc: Frank van der Linden <fvdl@google.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Kees Cook <kees@kernel.org> Cc: Li RongQing <lirongqing@baidu.com> Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Pratyush Yadav <ptyadav@amazon.de> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-02mm, shmem: prevent infinite loop on truncate raceKairui Song1-9/+14
When truncating a large swap entry, shmem_free_swap() returns 0 when the entry's index doesn't match the given index due to lookup alignment. The failure fallback path checks if the entry crosses the end border and aborts when it happens, so truncate won't erase an unexpected entry or range. But one scenario was ignored. When `index` points to the middle of a large swap entry, and the large swap entry doesn't go across the end border, find_get_entries() will return that large swap entry as the first item in the batch with `indices[0]` equal to `index`. The entry's base index will be smaller than `indices[0]`, so shmem_free_swap() will fail and return 0 due to the "base < index" check. The code will then call shmem_confirm_swap(), get the order, check if it crosses the END boundary (which it doesn't), and retry with the same index. The next iteration will find the same entry again at the same index with same indices, leading to an infinite loop. Fix this by retrying with a round-down index, and abort if the index is smaller than the truncate range. Link: https://lkml.kernel.org/r/aXo6ltB5iqAKJzY8@KASONG-MC4 Fixes: 809bc86517cc ("mm: shmem: support large folio swap out") Fixes: 8a1968bd997f ("mm/shmem, swap: fix race of truncate and swap entry split") Signed-off-by: Kairui Song <kasong@tencent.com> Reported-by: Chris Mason <clm@meta.com> Closes: https://lore.kernel.org/linux-mm/20260128130336.727049-1-clm@meta.com/ Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com> Cc: Baoquan He <bhe@redhat.com> Cc: Barry Song <baohua@kernel.org> Cc: Chris Li <chrisl@kernel.org> Cc: Hugh Dickins <hughd@google.com> Cc: Kemeng Shi <shikemeng@huaweicloud.com> Cc: Nhat Pham <nphamcs@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-02mailmap: update Alexander Mikhalitsyn's emailsAlexander Mikhalitsyn1-0/+1
Link: https://lkml.kernel.org/r/20260128173915.162309-1-alexander@mihalicyn.com Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-02liveupdate: luo_file: do not clear serialized_data on unfreezePratyush Yadav (Google)1-2/+0
Patch series "liveupdate: fixes in error handling". This series contains some fixes in LUO's error handling paths. The first patch deals with failed freeze() attempts. The cleanup path calls unfreeze, and that clears some data needed by later unpreserve calls. The second patch is a bit more involved. It deals with failed retrieve() attempts. To do so properly, it reworks some of the error handling logic in luo_file core. Both these fixes are "theoretical" -- in the sense that I have not been able to reproduce either of them in normal operation. The only supported file type right now is memfd, and there is nothing userspace can do right now to make it fail its retrieve or freeze. I need to make the retrieve or freeze fail by artificially injecting errors. The injected errors trigger a use-after-free and a double-free. That said, once more complex file handlers are added or memfd preservation is used in ways not currently expected or covered by the tests, we will be able to see them on real systems. This patch (of 2): The unfreeze operation is supposed to undo the effects of the freeze operation. serialized_data is not set by freeze, but by preserve. Consequently, the unpreserve operation needs to access serialized_data to undo the effects of the preserve operation. This includes freeing the serialized data structures for example. If a freeze callback fails, unfreeze is called for all frozen files. This would clear serialized_data for them. Since live update has failed, it can be expected that userspace aborts, releasing all sessions. When the sessions are released, unpreserve will be called for all files. The unfrozen files will see 0 in their serialized_data. This is not expected by file handlers, and they might either fail, leaking data and state, or might even crash or cause invalid memory access. Do not clear serialized_data on unfreeze so it gets passed on to unpreserve. There is no need to clear it on unpreserve since luo_file will be freed immediately after. Link: https://lkml.kernel.org/r/20260126230302.2936817-1-pratyush@kernel.org Link: https://lkml.kernel.org/r/20260126230302.2936817-2-pratyush@kernel.org Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks") Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org> Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org> Cc: Pasha Tatashin <pasha.tatashin@soleen.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-02x86/kfence: fix booting on 32bit non-PAE systemsAndrew Cooper1-3/+4
The original patch inverted the PTE unconditionally to avoid L1TF-vulnerable PTEs, but Linux doesn't make this adjustment in 2-level paging. Adjust the logic to use the flip_protnone_guard() helper, which is a nop on 2-level paging but inverts the address bits in all other paging modes. This doesn't matter for the Xen aspect of the original change. Linux no longer supports running 32bit PV under Xen, and Xen doesn't support running any 32bit PV guests without using PAE paging. Link: https://lkml.kernel.org/r/20260126211046.2096622-1-andrew.cooper3@citrix.com Fixes: b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs") Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Closes: https://lore.kernel.org/lkml/CAKFNMokwjw68ubYQM9WkzOuH51wLznHpEOMSqtMoV1Rn9JV_gw@mail.gmail.com/ Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: Alexander Potapenko <glider@google.com> Cc: Marco Elver <elver@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Jann Horn <jannh@google.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2026-02-02bpf: Replace snprintf("%s") with strscpyThorsten Blum1-2/+3
Replace snprintf("%s") with the faster and more direct strscpy(). Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev> Link: https://lore.kernel.org/r/20260201215247.677121-2-thorsten.blum@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2026-02-03platform/chrome: lightbar: Fix lightbar_program_ex alignmentGwendal Grignou1-2/+2
Make sure sub-command of lightbar command starts with a 8bit parameter to ensure alignment. Fixes: 9600b8bdbfe4 ("platform/chrome: lightbar: Add support for large sequence") Signed-off-by: Gwendal Grignou <gwendal@google.com> Link: https://lore.kernel.org/r/20260202100621.3608437-1-gwendal@google.com Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
2026-02-02Merge branch '100GbE' of ↵Jakub Kicinski4-88/+136
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2026-01-30 (ice, i40e) This series contains updates to ice and i40e drivers. Grzegorz and Jake resolve issues around timing for E825 that can cause Tx timestamps to be missed/interrupts not generated on ice. Aaron Ma defers restart of PTP work until after after VSIs are rebuilt to prevent NULL pointer dereference for ice. Mohammad Heib removes calls to udp_tunnel_get_rx_info() in ice and i40e which violates locking expectations and is unneeded. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: drop udp_tunnel_get_rx_info() call from i40e_open() ice: drop udp_tunnel_get_rx_info() call from ndo_open() ice: Fix PTP NULL pointer dereference during VSI rebuild ice: PTP: fix missing timestamps on E825 hardware ice: fix missing TX timestamps interrupts on E825 devices ==================== Link: https://patch.msgid.link/20260130185401.1091523-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02Merge branch 'mptcp-implement-read_sock-and-splice_read'Jakub Kicinski6-19/+308
Matthieu Baerts says: ==================== mptcp: implement .read_sock and .splice_read This series is a preparation work for future in-kernel MPTCP sockets usage. Here, two interfaces are implemented: read_sock and splice_read. As a result of this series, splice() with MPTCP sockets -- which was already supported -- is now improved. - Patches 1-2: .read_sock implementation - Patches 3-4: .splice_read implementation - Patches 5-6: validate splice() support with MPTCP sockets. ==================== Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-0-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02selftests: mptcp: connect: cover splice modeGeliang Tang2-0/+6
The "splice" alternate mode for mptcp_connect.sh/.c is available now, this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by default. Note that this mode is also supported by stable kernel versions, but optimised in this patch series. Suggested-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02selftests: mptcp: add splice io modeGeliang Tang1-1/+78
This patch adds a new 'splice' io mode for mptcp_connect to test the newly added read_sock() and splice_read() functions of MPTCP. do_splice() efficiently transfers data directly between two file descriptors (infd and outfd) without copying to userspace, using Linux's splice() system call. Usage: ./mptcp_connect.sh -m splice Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel