|
Re-flow prepare_allocation_zoned() to make it a bit more readable by
returning early and removing unnecessary indentation.
This patch does not change any functionality.
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
This is done by:
- Shrink the size of btrfs_bio::mirror_num
From a 32-bit unsigned int to a u16.
Normally btrfs mirror number is either 0 (all profiles), 1 (all
profiles), 2 (DUP/RAID1/RAID10/RAID5), 3 (RAID1C3) or 4 (RAID1C4).
But for RAID6 the mirror number can go as large as the number of
devices of that chunk.
Currently the limit for number of devices for a data chunk is
BTRFS_MAX_DEVS(), which is around 500 for the default 16K nodesize.
And if going with the maximum 64K nodesize, we can have a little over 2000
devices for a chunk.
Although I'd argue it's way overkill, we don't reject such cases yet,
thus u8 is not going to cut it, and we have to use u16 (maxing out at 64K).
- Use bit fields for boolean members
Although bit fields are not always safe for racy call sites, these members
are safe:
* csum_search_commit_root
* is_scrub
These two are set immediately after bbio allocation, with no further
writes afterwards, thus they are safe.
* async_csum
* can_use_append
These two are set for each split range, and after that there are no
writes to those two members from different threads, thus they are also
safe.
And there is space for 4 more bits before the size of btrfs_bio has to
increase again, which should be future-proof enough.
- Reorder the structure members
Now we always put the largest member first (after the huge 120-byte
union), making it easier to fill any holes.
This reduces the size of btrfs_bio by 8 bytes, from 312 bytes to 304 bytes.
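As an illustration only (this is not the actual btrfs_bio definition; the
field names come from the description above, everything else is assumed),
the packing idea looks roughly like:

    u16 mirror_num;                 /* u16 instead of unsigned int */
    u8 csum_search_commit_root:1;   /* set once right after allocation */
    u8 is_scrub:1;                  /* set once right after allocation */
    u8 async_csum:1;                /* set per split range, single writer */
    u8 can_use_append:1;            /* set per split range, single writer */
    /* 4 more bits are still free before the structure has to grow */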
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The minimum gcc version is 8 since commit 118c40b7b50340 ("kbuild: require
gcc-8 and binutils-2.30"), so the workaround for missing __VA_OPT__ support
is no longer needed.
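For reference, a minimal sketch of what __VA_OPT__ makes possible (the
macro below is hypothetical, not taken from the btrfs code):

    /* The comma is emitted only when variadic arguments are present,
     * so no ##__VA_ARGS__ workaround is needed with gcc >= 8. */
    #define example_log(fmt, ...) printk(fmt __VA_OPT__(,) __VA_ARGS__)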
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There's code in _btrfs_printk() to parse the message level from the
input string so we can augment the message with the level description
for better visibility in the logs.
The parsing code has evolved over time, see commits:
- 40f7828b36e3b9 ("btrfs: better handle btrfs_printk() defaults")
- 262c5e86fec7cf ("printk/btrfs: handle more message headers")
- 533574c6bc30cf ("btrfs: use printk_get_level and printk_skip_level, add __printf, fix fallout")
- 4da35113426d16 ("btrfs: add varargs to btrfs_error")
As we are using the specific level helpers everywhere, we can simply pass
the message level so we don't have to parse it. The proper printk()
message header is created as KERN_SOH + "level", as sketched below.
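A minimal sketch of the idea (the macro below is hypothetical, not the
actual btrfs helper):

    /* The header is just KERN_SOH followed by the level character,
     * no parsing of the format string is needed. */
    #define example_btrfs_printk(level, fmt, ...) \
            printk(KERN_SOH level "BTRFS: " fmt, ##__VA_ARGS__)

    /* e.g. example_btrfs_printk("6", "device %s mounted\n", name); "6" == info */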
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
The printk() can be compiled out depending on CONFIG_PRINTK, and this is
reflected in our helpers. The indirection is provided by btrfs_printk(),
used in the ratelimited and RCU wrapper macros.
Drop the btrfs_printk() helper and define the ratelimit and RCU helpers
directly when CONFIG_PRINTK is undefined. This will allow further
changes to the _btrfs_printk() interface (which is internal); any
message in other code should use the level-specific helpers.
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
struct btrfs_block_group used to be called struct
btrfs_block_group_cache but got renamed to btrfs_block_group with
commit 32da5386d9a4 ("btrfs: rename btrfs_block_group_cache").
Rename btrfs_create_block_group_cache() to btrfs_create_block_group() to
reflect that change.
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In many places we have the pattern:
ret = ...;
return ret;
This can be simplified to a direct return, removing 'ret' if not
otherwise needed. The places in self tests are not converted so we can
add more test cases without changing surrounding code
(extent-map-tests.c:test_case_4()).
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In prepare_one_folio(), ret is initialized to 0 at declaration,
and in an error path we assign ret = 0 before jumping to the
again label to retry the operation. However, ret is immediately
overwritten by ret = set_folio_extent_mapped(folio) after the
again label.
Neither assignment is ever observed by any code path,
therefore both can be safely removed.
Signed-off-by: Massimiliano Pellizzer <mpellizzer.dev@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Inside extent_io.c, there are several simple call sites doing things
like:
for_each_set_bit(bit, bitmap, bitmap_size) {
/* handle one fs block */
}
The per-bit workloads include:
- set_bit()
Inside extent_writepage_io().
This can be replaced with a single bitmap_set() call (see the sketch
after this list).
- btrfs_folio_set_lock()
- btrfs_mark_ordered_io_finished()
Inside writepage_delalloc().
Instead of calling them multiple times, we can pass a range into the
function with one call.
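A sketch of the set_bit() conversion with hypothetical variable names (not
the actual extent_writepage_io() code):

    /* Before: one set_bit() per fs block */
    for (unsigned int i = start_bit; i < start_bit + nbits; i++)
            set_bit(i, dirty_bitmap);

    /* After: one ranged call covering all blocks at once */
    bitmap_set(dirty_bitmap, start_bit, nbits);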
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently submit_one_sector() has only one failure path from
btrfs_get_extent().
However the error handling is split into two parts, one inside
submit_one_sector(), which clears the dirty flag and finishes the
writeback for the fs block.
The other part is to submit any remaining bio inside bio_ctrl and mark
the ordered extent finished for the fs block.
There is no special reason that we must split the error handling, so let's
just concentrate all the error handling into submit_one_sector().
Reviewed-by: Boris Burkov <boris@bur.io>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
[CORNER CASE]
If we have the following file extents layout, btrfs_get_extent() can
return a smaller hole during read, and cause unnecessary extra tree
searches:
item 6 key (257 EXTENT_DATA 0) itemoff 15810 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 13631488 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
item 7 key (257 EXTENT_DATA 32768) itemoff 15757 itemsize 53
generation 9 type 1 (regular)
extent data disk byte 13635584 nr 4096
extent data offset 0 nr 4096 ram 4096
extent compression 0 (none)
In the above case, ranges [0, 4K) and [32K, 36K) are regular extents, and
there is a hole in range [4K, 32K), and the fs has "no-holes" feature,
meaning the hole will not have a file extent item.
[INEFFICIENCY]
Assume the system has 4K page size, and we're doing readahead for range
[4K, 32K), no large folio yet.
btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
| |- get_extent_map() for range [4K, 8K)
| |- btrfs_get_extent() for range [4K, 8K)
| We hit item 6, then the next item 7.
| At this stage we know range [4K, 32K) is a hole.
| But our search range is only [4K, 8K), not reaching 32K, thus
| we go to the not_found: tag, returning a hole em for [4K, 8K).
|
|- btrfs_do_readpage() for folio 8K
| |- get_extent_map() for range [8K, 12K)
| |- btrfs_get_extent() for range [8K, 12K)
| We hit the same item 6, and then item 7.
| But we still go to the not_found tag, inserting a new hole em,
| which will be merged with the previous one.
|
| [ Repeat the same btrfs_get_extent() calls until the end ]
So we're calling btrfs_get_extent() again and again, just for a
different part of the same hole range [4K, 32K).
[ENHANCEMENT]
Make btrfs_do_readpage() search for a larger extent map when readahead
is involved.
For btrfs_readahead() we have bio_ctrl::ractl set, and we lock extents for
the whole readahead range.
If we find bio_ctrl::ractl is set, we can use the readahead end as the
extent map search end; this allows btrfs_get_extent() to return a much
larger hole, reducing the need to call btrfs_get_extent() again and again.
btrfs_readahead() for range [4K, 32K)
|- btrfs_do_readpage() for folio 4K
| |- get_extent_map() for range [4K, 32K)
| |- btrfs_get_extent() for range [4K, 32K)
| We hit item 6, then the next item 7.
| At this stage we know range [4K, 32K) is a hole.
| So the hole em for range [4K, 32K) is returned.
|
|- btrfs_do_readpage() for folio 8K
| |- get_extent_map() for range [8K, 32K)
| The cached hole em range [4K, 32K) covers the range,
| and reuse that em.
|
| [ Repeat reusing the cached extent map until the end ]
Now we only call btrfs_get_extent() once for the whole range [4K, 32K),
instead of the old 8 times.
Such a change reduces the overhead of reading large holes a little.
For the current experimental build (with large folios) on aarch64, there
is a tiny but consistent ~1% improvement when reading a large all-hole file:
Reading a 1GiB sparse file (all hole) using xfs_io, with 64K block
size, the result is the time needed to read the whole file, reported
from xfs_io.
32 runs, experimental build (with large folios).
64K page size, 4K fs block size.
- Avg before: 0.20823 s
- Avg after: 0.20635 s
- Diff: -0.9%
Reviewed-by: Filipe Manana <fdmanana@suse.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
There are already several bugs involving on-stack btrfs_path, even though
it is already a little safer than btrfs_path pointers (only the extent
buffers are leaked, not the btrfs_path structure itself):
- Patch "btrfs: make sure extent and csum paths are always released in
scrub_raid56_parity_stripe()"
- Patch "btrfs: fix a potential path leak in print_data_reloc_error()"
Thus there is a real need to apply auto release for those on-stack paths.
Introduce a new macro, BTRFS_PATH_AUTO_RELEASE(), which defines an
on-stack btrfs_path structure, initializes it all to 0, then calls
btrfs_release_path() on it when exiting the scope (see the sketch after
the usage list below).
This applies to the current 3 on-stack path usages:
- defrag_get_extent() in defrag.c
- print_data_reloc_error() in inode.c
There is a special case where we want to release the path early, before
the time-consuming iterate_extent_inodes() call, thus that manual
early release is kept as is, with an extra comment added.
- scrub_raid56_parity_stripe() in scrub.c
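A minimal sketch of how such a scope-based release can be built on the
compiler cleanup attribute (only the macro name comes from this change;
the helper and usage below are hypothetical):

    static inline void example_path_cleanup(struct btrfs_path *path)
    {
            btrfs_release_path(path);
    }

    #define BTRFS_PATH_AUTO_RELEASE(name) \
            struct btrfs_path name __attribute__((cleanup(example_path_cleanup))) = { 0 }

    void example(void)
    {
            BTRFS_PATH_AUTO_RELEASE(path);

            /* ... use &path ... */
    }       /* btrfs_release_path(&path) runs automatically here */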
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Previously direct IO was disabled if the fs block size was larger than
the page size. The reasons were:
- Iomap direct IO can split the range ignoring the fs block alignment
Which could trigger the bio size check from btrfs_submit_bio().
- The buffer is only ensured to be contiguous in user space memory
The underlying physical memory is not ensured to be contiguous, and
that can cause problems for the checksum generation/verification and
RAID56 handling.
However the above problems are solved by the following upstream commits:
- 001397f5ef49 ("iomap: add IOMAP_DIO_FSBLOCK_ALIGNED flag")
Which added an extra flag that can be utilized by the fs to ensure
the bio submitted by iomap is always aligned to fs block size.
- ec20799064c8 ("btrfs: enable encoded read/write/send for bs > ps cases")
- 8870dbeedcf9 ("btrfs: raid56: enable bs > ps support")
Which makes btrfs handle bios that are not backed by large folios
but are still aligned to the fs block size.
As the commits have been merged we can enable direct IO support for
bs > ps cases.
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Qu Wenruo <wqu@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Make btrfs use the library APIs instead of crypto_shash, for all
checksum computations. This has many benefits:
- Allows future checksum types, e.g. XXH3 or CRC64, to be more easily
supported. Only a library API will be needed, not crypto_shash too.
- Eliminates the overhead of the generic crypto layer, including an
indirect call for every function call and other API overhead. A
microbenchmark of btrfs_check_read_bio() with crc32c checksums shows a
speedup from 658 cycles to 608 cycles per 4096-byte block.
- Decreases the stack usage of btrfs by reducing the size of checksum
contexts from 384 bytes to 240 bytes, and by eliminating the need for
some functions to declare a checksum context at all.
- Increases reliability. The library functions always succeed and
return void. In contrast, crypto_shash can fail and return errors.
Also, the library functions are guaranteed to be available when btrfs
is loaded; there's no longer any need to use module softdeps to try to
work around the crypto modules sometimes not being loaded.
- Fixes a bug where blake2b checksums didn't work on kernels booted with
fips=1. Since btrfs checksums are for integrity only, it's fine for
them to use non-FIPS-approved algorithms.
Note that with having to handle 4 algorithms instead of just 1-2, this
commit does result in a slightly positive diffstat. That being said,
this wouldn't have been the case if btrfs had actually checked for
errors from crypto_shash, which technically it should have been doing.
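For illustration, a checksum of one block with the library API boils down
to a single direct call (the variables and the zero seed below are
hypothetical, not the actual btrfs call sites):

    #include <linux/crc32.h>

    u32 csum = crc32c(0, data, blocksize);  /* no crypto_shash request needed */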
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Neal Gompa <neal@gompa.dev>
Signed-off-by: Eric Biggers <ebiggers@kernel.org>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
In case of a zoned RAID, it can happen that a data write is targeting a
sequential write required zone and a conventional zone. In this case the
bio will be marked as REQ_OP_ZONE_APPEND but for the conventional zone,
this needs to be REQ_OP_WRITE.
The setting of REQ_OP_ZONE_APPEND is deferred to the last possible time in
btrfs_submit_dev_bio(), but the decision whether we can use zone append is
cached in btrfs_bio.
CC: Naohiro Aota <naohiro.aota@wdc.com>
Fixes: e9b9b911e03c ("btrfs: add raid stripe tree to features enabled with debug config")
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Naohiro Aota <naohiro.aota@wdc.com>
Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Currently, with squotas, we do not allow removing a parent qgroup with
no members if it still has usage accounted to it. This makes it really
difficult to recover from accounting bugs, as we have no good way of
getting back to 0 usage.
Instead, allow deletion (it's safe with 0 members) while still warning
about the inconsistency by adding a squota parent check.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We could have detected the quick inherit bug more directly if we had
an extra warning about squota hierarchy consistency while modifying the
hierarchy. In squotas, the parent's usage is always simply the sum of
its children's, so we can just check for that when changing membership and
detect more accounting bugs.
Reviewed-by: Qu Wenruo <wqu@suse.com>
Signed-off-by: Boris Burkov <boris@bur.io>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Replace open-coded if/else blocks with the boolean directly and introduce
local const bool variables, making the code shorter and easier to read.
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Replace integer literals 0/1 with true/false when calling
btrfs_inc_ref() and btrfs_dec_ref() to make the code self-documenting
and avoid mixing bool/integer types.
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
Drop the obsolete @refs parameter from the comment so the argument list
matches the current function signature after commit f8c4d59de23c9
("btrfs: drop unused parameter refs from visit_node_for_delete()").
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Sun YangKai <sunk67188@gmail.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
We allocate the bitmap but we never free it in free_raid_bio_pointers().
Fix this by adding a bitmap_free() call against the stripe_uptodate_bitmap
of a raid bio.
Fixes: 1810350b04ef ("btrfs: raid56: move sector_ptr::uptodate into a dedicated bitmap")
Reported-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/linux-btrfs/20260126045315.GA31641@lst.de/
Reviewed-by: Qu Wenruo <wqu@suse.com>
Tested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Filipe Manana <fdmanana@suse.com>
Reviewed-by: David Sterba <dsterba@suse.com>
Signed-off-by: David Sterba <dsterba@suse.com>
|
|
R_MIPS_PC32 is a GNU extension, its definition is available in glibc
only since 2.39 (released in 2024), and not available in musl libc yet.
Provide our own definition for R_MIPS_PC32 and use it if necessary to
fix relocs tool building on musl and older glibc systems.
Fixes: ff79d31eb536 ("mips: Add support for PC32 relocations in vmlinux")
Signed-off-by: Yao Zi <me@ziyao.cc>
Link: https://patch.msgid.link/20260202041610.61389-1-me@ziyao.cc
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
In order to make the output cleaner and more consistent with other
scripts.
Signed-off-by: Diego Viola <diego.viola@gmail.com>
Link: https://patch.msgid.link/20260202054541.17399-1-diego.viola@gmail.com
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
|
|
Tariq Toukan says:
====================
devlink and mlx5: Support cross-function rate scheduling [part]
Apply trivial cleanups from the series to make it smaller.
====================
Link: https://patch.msgid.link/20260128112544.1661250-1-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
devlink_rate_nodes_check() was used to verify there are no devlink rate
nodes created when switching the esw mode.
Rate management code is about to become more complex, so refactor this
function:
- remove unused param 'mode'.
- add a new 'rate_filter' param.
- rename to devlink_rates_check().
- expose devlink_rate_is_node() to be used as a rate filter.
This makes it more usable from multiple places, so use it from those
places as well.
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Carolina Jubran <cjubran@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260128112544.1661250-6-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Commit [1] defined the locking expectations for nested devlink
instances: the nested-in devlink instance lock needs to be acquired
before the nested devlink instance lock. The code handling devlink rels
was architected with that assumption in mind.
There are no actual users of double locking yet but that is about to
change in the upcoming patches in the series.
Code operating on nested devlink instances will require also obtaining
the nested-in instance lock, but such code may already be called from a
variety of places with the nested devlink instance lock. Then, there's
no way to acquire the nested-in lock other than making sure that all
callers acquire it first.
Reversing the nested lock order allows incrementally acquiring the
nested-in instance lock when needed (perhaps even a chain of locks up to
the root) without affecting any caller.
The only affected use of nesting is devlink_nl_nested_fill(), which
iterates over nested devlink instances with the RCU lock, without
locking them, so there's no possibility of deadlock.
So this commit just updates a comment regarding the nested locks.
[1] commit c137743bce02b ("devlink: introduce object and nested devlink
relationship infra")
Signed-off-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Link: https://patch.msgid.link/20260128112544.1661250-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
They can either be removed or replaced with IS_ENABLED().
Signed-off-by: Ferry Meng <mengferry@linux.alibaba.com>
Reviewed-by: Gao Xiang <hsiangkao@linux.alibaba.com>
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Russell King says:
====================
net: stmmac: pcs preparation
These three patches prepare for the PCS changes, which, subject
to Qualcomm testing, should be coming in the next cycle.
====================
Link: https://patch.msgid.link/aXyRlFw7ZuhRPiKo@shell.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The dwmac core has no support for SGMII without using its integrated
PCS. Thus, PHY_INTF_SEL_SGMII is only supported when this block is
present, and it makes no sense for stmmac_get_phy_intf_sel() to decode
this.
None of the platform glue users that use stmmac_get_phy_intf_sel()
directly accept PHY_INTF_SEL_SGMII as a valid mode.
Check whether a PCS will be used by the driver for the interface mode,
and if it is the integrated PCS, query the integrated PCS for the
phy_intf_sel_i value to use.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOa-00000006zvB-1fIe@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Move most of the PCS register offset definitions to stmmac_pcs.c.
Since stmmac_pcs.c only ever passes zero into the register offset
macros, remove that ability, making them simple constant integer
definitions.
Add appropriate descriptions of the registers, pointing out their
similarity with their IEEE 802.3 counterparts. Make use of the
BMSR definitions for the GMAC_AN_STATUS register and remove the
driver private versions.
Note that BMSR_LSTATUS is non-low-latching, unlike its 802.3z
counterpart.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOV-00000006zv5-1CwO@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Where a core supports hardware features, but does not indicate support
for half-duplex, clear phylink's half-duplex 1G, 100M and 10M
capability bits to disallow half-duplex operation and advertisement of
these link modes.
This will avoid the need for special code in the PCS driver to do this
based on the ESTATUS register bits, as the support in the PCS is
dependent on the same synthesis choice as the MAC core.
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Tested-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Reviewed-by: Maxime Chevallier <maxime.chevallier@bootlin.com>
Tested-by: Mohd Ayaan Anwar <mohd.anwar@oss.qualcomm.com>
Link: https://patch.msgid.link/E1vlmOQ-00000006zuz-0ffN@rmk-PC.armlinux.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The driver never programs the MAC frame size and jabber registers,
causing the hardware to reject frames larger than the default 1518
bytes even when larger DMA buffers are allocated.
Program MAC_MAXIMUM_FRAME_SIZE, MAC_TRANSMIT_JABBER_SIZE, and
MAC_RECEIVE_JABBER_SIZE based on the configured MTU. Also fix the
maximum buffer size from 4096 to 4095, since the descriptor buffer
size field is only 12 bits. Account for double VLAN tags in frame
size calculations.
Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC")
Cc: stable@vger.kernel.org
Signed-off-by: Tomas Hlavacek <tmshlvck@gmail.com>
Link: https://patch.msgid.link/20260130102301.477514-1-tmshlvck@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Biju Das says:
====================
Add support for Renesas RZ/G3L GBETH
From: Biju Das <biju.das.jz@bp.renesas.com>
The Renesas RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30
compared to other Renesas SoCs such as RZ/V2H that use MAC version 5.20.
The RZ/G3L GBETH requires an extra clock compared to RZ/G3E and has pps
interrupts. Document the Renesas RZ/G3L GBETH IP in the bindings and add
support for the RZ/G3L GBETH in the dwmac-renesas-gbeth glue driver.
====================
Link: https://patch.msgid.link/20260131161250.5047-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Compared to other Renesas GBETH stmmac glue drivers, the RZ/G3L GBETH IP
uses Synopsys DesignWare MAC version 5.30. It has an extra clock
compared to RZ/V2H and has ptp_pps_o interrupts. Add support for RZ/G3L
GBETH by reusing the device data of RZ/V2H; it can be extended to add other
functionality later.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-3-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add device tree binding support for the Gigabit Ethernet (GBETH) IP on
Renesas RZ/G3L SoC. This SoC uses a different Synopsys DesignWare MAC
version (5.30) compared to RZ/G3E.
RZ/G3L requires an extra clock compared to RZ/G3E and has pps interrupts.
Add a new compatible string "renesas,r9a08g046-gbeth" for RZ/G3L SoC and
update the schema to handle hardware differences between SoC variants.
Extend the base snps,dwmac.yaml schema to accommodate the PPS interrupts.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-2-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I/O requests beyond the end of the filesystem should be zeroed out,
similar to loopback devices, and that is what we expect.
Fixes: ce63cb62d794 ("erofs: support unencoded inodes for fileio")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
The EROFS on-disk format uses a tiny, plain metadata design that
prioritizes performance and minimizes complex inconsistencies against
common writable disk filesystems (almost all serious metadata
inconsistency cannot happen in well-designed immutable filesystems like
EROFS). EROFS deliberately avoids artificial design flaws to eliminate
serious security risks from untrusted remote sources by design,
although human-made implementation bugs can still happen sometimes.
Currently, there is no strict check to prevent compressed inodes,
especially LZ4-compressed inodes, from being read in plain filesystems.
Starting with erofs-utils 1.0 and Linux 5.3, the LZ4_0PADDING sb feature
is automatically enabled for LZ4-compressed EROFS images to support
in-place decompression. Furthermore, since Linux 5.4 LTS is no longer
supported, we no longer need to handle ancient LZ4-compressed EROFS
images generated by erofs-utils prior to 1.0.
To formally distinguish different filesystem types for improved
security:
- Use the presence of LZ4_0PADDING or a non-zero
`dsb->u1.lz4_max_distance` as a marker for compressed filesystems
containing LZ4-compressed inodes only;
- For other algorithms, use `dsb->u1.available_compr_algs` bitmap.
Note: LZ4_0PADDING has been supported since Linux 5.4 (the first formal
kernel version), so exposing it via sysfs is no longer necessary and is
now deprecated (but keep it for five more years, until 2031):
`dsb->u1` has been strictly non-zero for all EROFS images containing
compressed inodes starting with erofs-utils v1.3 and it is actually
a much better marker for compressed filesystems.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
Symlink lengths are now cached in in-memory inodes directly so that
readlink can be sped up.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
The Agilex devkit supports a separate eMMC daughter card. Document the
Agilex eMMC daughter board compatible.
[dinguyen] because of patch 1cb8486ac5f3 ("dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml"),
I moved the change to the altera.yaml file.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Ng Tze Yee <tzeyee.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
liveupdate is used to enable Live Update Orchestrator (LUO) early during
boot. Add it to kernel-parameters.txt so users can discover and use it.
Link: https://lkml.kernel.org/r/20260130112036.359806-1-me@linux.beauty
Signed-off-by: Li Chen <me@linux.beauty>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When truncating a large swap entry, shmem_free_swap() returns 0 when the
entry's index doesn't match the given index due to lookup alignment. The
failure fallback path checks if the entry crosses the end border and
aborts when it happens, so truncate won't erase an unexpected entry or
range. But one scenario was ignored.
When `index` points to the middle of a large swap entry, and the large
swap entry doesn't go across the end border, find_get_entries() will
return that large swap entry as the first item in the batch with
`indices[0]` equal to `index`. The entry's base index will be smaller
than `indices[0]`, so shmem_free_swap() will fail and return 0 due to the
"base < index" check. The code will then call shmem_confirm_swap(), get
the order, check if it crosses the END boundary (which it doesn't), and
retry with the same index.
The next iteration will find the same entry again at the same index with
the same indices, leading to an infinite loop.
Fix this by retrying with a rounded-down index, and abort if the
rounded-down index falls below the start of the truncate range.
Link: https://lkml.kernel.org/r/aXo6ltB5iqAKJzY8@KASONG-MC4
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Fixes: 8a1968bd997f ("mm/shmem, swap: fix race of truncate and swap entry split")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: Chris Mason <clm@meta.com>
Closes: https://lore.kernel.org/linux-mm/20260128130336.727049-1-clm@meta.com/
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Link: https://lkml.kernel.org/r/20260128173915.162309-1-alexander@mihalicyn.com
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "liveupdate: fixes in error handling".
This series contains some fixes in LUO's error handling paths.
The first patch deals with failed freeze() attempts. The cleanup path
calls unfreeze, and that clears some data needed by later unpreserve
calls.
The second patch is a bit more involved. It deals with failed retrieve()
attempts. To do so properly, it reworks some of the error handling logic
in luo_file core.
Both these fixes are "theoretical" -- in the sense that I have not been
able to reproduce either of them in normal operation. The only supported
file type right now is memfd, and there is nothing userspace can do right
now to make it fail its retrieve or freeze. I need to make the retrieve
or freeze fail by artificially injecting errors. The injected errors
trigger a use-after-free and a double-free.
That said, once more complex file handlers are added or memfd preservation
is used in ways not currently expected or covered by the tests, we will be
able to see them on real systems.
This patch (of 2):
The unfreeze operation is supposed to undo the effects of the freeze
operation. serialized_data is not set by freeze, but by preserve.
Consequently, the unpreserve operation needs to access serialized_data to
undo the effects of the preserve operation. This includes freeing the
serialized data structures for example.
If a freeze callback fails, unfreeze is called for all frozen files. This
would clear serialized_data for them. Since live update has failed, it
can be expected that userspace aborts, releasing all sessions. When the
sessions are released, unpreserve will be called for all files. The
unfrozen files will see 0 in their serialized_data. This is not expected
by file handlers, and they might either fail, leaking data and state, or
might even crash or cause invalid memory access.
Do not clear serialized_data on unfreeze so it gets passed on to
unpreserve. There is no need to clear it on unpreserve since luo_file
will be freed immediately after.
Link: https://lkml.kernel.org/r/20260126230302.2936817-1-pratyush@kernel.org
Link: https://lkml.kernel.org/r/20260126230302.2936817-2-pratyush@kernel.org
Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks")
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The original patch inverted the PTE unconditionally to avoid
L1TF-vulnerable PTEs, but Linux doesn't make this adjustment in 2-level
paging.
Adjust the logic to use the flip_protnone_guard() helper, which is a nop
on 2-level paging but inverts the address bits in all other paging modes.
This doesn't matter for the Xen aspect of the original change. Linux no
longer supports running 32bit PV under Xen, and Xen doesn't support
running any 32bit PV guests without using PAE paging.
Link: https://lkml.kernel.org/r/20260126211046.2096622-1-andrew.cooper3@citrix.com
Fixes: b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs")
Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Closes: https://lore.kernel.org/lkml/CAKFNMokwjw68ubYQM9WkzOuH51wLznHpEOMSqtMoV1Rn9JV_gw@mail.gmail.com/
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Replace snprintf("%s") with the faster and more direct strscpy().
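A minimal before/after sketch with hypothetical buffer names:

    /* Before */
    snprintf(name, sizeof(name), "%s", src);
    /* After */
    strscpy(name, src, sizeof(name));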
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20260201215247.677121-2-thorsten.blum@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
Make sure the sub-command of the lightbar command starts with an 8-bit
parameter to ensure alignment.
Fixes: 9600b8bdbfe4 ("platform/chrome: lightbar: Add support for large sequence")
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Link: https://lore.kernel.org/r/20260202100621.3608437-1-gwendal@google.com
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-30 (ice, i40e)
This series contains updates to ice and i40e drivers.
Grzegorz and Jake resolve issues around timing for E825 that can cause Tx
timestamps to be missed/interrupts not generated on ice.
Aaron Ma defers the restart of PTP work until after VSIs are rebuilt
to prevent a NULL pointer dereference for ice.
Mohammad Heib removes calls to udp_tunnel_get_rx_info() in ice and i40e
which violates locking expectations and is unneeded.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: drop udp_tunnel_get_rx_info() call from i40e_open()
ice: drop udp_tunnel_get_rx_info() call from ndo_open()
ice: Fix PTP NULL pointer dereference during VSI rebuild
ice: PTP: fix missing timestamps on E825 hardware
ice: fix missing TX timestamps interrupts on E825 devices
====================
Link: https://patch.msgid.link/20260130185401.1091523-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: implement .read_sock and .splice_read
This series is a preparation work for future in-kernel MPTCP sockets
usage. Here, two interfaces are implemented: read_sock and splice_read.
As a result of this series, splice() with MPTCP sockets -- which was
already supported -- is now improved.
- Patches 1-2: .read_sock implementation
- Patches 3-4: .splice_read implementation
- Patches 5-6: validate splice() support with MPTCP sockets.
====================
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-0-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The "splice" alternate mode for mptcp_connect.sh/.c is available now,
this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.
Note that this mode is also supported by stable kernel versions, but
optimised in this patch series.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.
do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.
Usage:
./mptcp_connect.sh -m splice
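A minimal userspace sketch of the forwarding idea (a hypothetical helper,
not the selftest's do_splice() implementation): data moves from infd to
outfd through a pipe, without ever being copied into a userspace buffer.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <unistd.h>

    static int splice_fd_to_fd(int infd, int outfd, size_t chunk)
    {
            int pfd[2], ret = 0;
            ssize_t in, out;

            if (pipe(pfd) < 0)
                    return -1;
            for (;;) {
                    /* pull up to 'chunk' bytes from infd into the pipe */
                    in = splice(infd, NULL, pfd[1], NULL, chunk, SPLICE_F_MOVE);
                    if (in <= 0) {
                            ret = in < 0 ? -1 : 0;
                            break;
                    }
                    /* push everything from the pipe out to outfd */
                    while (in > 0) {
                            out = splice(pfd[0], NULL, outfd, NULL, in, SPLICE_F_MOVE);
                            if (out <= 0) {
                                    ret = -1;
                                    goto out;
                            }
                            in -= out;
                    }
            }
    out:
            close(pfd[0]);
            close(pfd[1]);
            return ret;
    }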
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>