| Age | Commit message | Author | Files | Lines |
|
The driver never programs the MAC frame size and jabber registers,
causing the hardware to reject frames larger than the default 1518
bytes even when larger DMA buffers are allocated.
Program MAC_MAXIMUM_FRAME_SIZE, MAC_TRANSMIT_JABBER_SIZE, and
MAC_RECEIVE_JABBER_SIZE based on the configured MTU. Also fix the
maximum buffer size from 4096 to 4095, since the descriptor buffer
size field is only 12 bits. Account for double VLAN tags in frame
size calculations.
Fixes: bfec6d7f2001 ("net: spacemit: Add K1 Ethernet MAC")
Cc: stable@vger.kernel.org
Signed-off-by: Tomas Hlavacek <tmshlvck@gmail.com>
Link: https://patch.msgid.link/20260130102301.477514-1-tmshlvck@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
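
For illustration only, a minimal userspace sketch of the frame-size arithmetic this commit describes. The helper names and the exact header accounting (Ethernet header, two VLAN tags, FCS) are assumptions and are not taken from the driver.

```c
#include <assert.h>
#include <stdint.h>

#define ETH_HLEN     14u  /* destination MAC + source MAC + EtherType */
#define ETH_FCS_LEN   4u  /* trailing frame check sequence */
#define VLAN_HLEN     4u  /* one 802.1Q tag */

/* Assumed per-frame overhead: header, two stacked VLAN tags, FCS. */
#define FRAME_OVERHEAD (ETH_HLEN + 2u * VLAN_HLEN + ETH_FCS_LEN)

/* A 12-bit size field can encode at most 2^12 - 1 = 4095 bytes. */
#define DESC_BUF_SIZE_BITS 12u
#define MAX_DESC_BUF_SIZE  ((1u << DESC_BUF_SIZE_BITS) - 1u)

/* Hypothetical helper: largest frame the MAC must accept for this MTU. */
uint32_t max_frame_size(uint32_t mtu)
{
    assert(mtu <= UINT32_MAX - FRAME_OVERHEAD);
    return mtu + FRAME_OVERHEAD;
}

/* Hypothetical helper: does such a frame fit in one descriptor buffer? */
int fits_single_buffer(uint32_t mtu)
{
    return max_frame_size(mtu) <= MAX_DESC_BUF_SIZE;
}
```

With these assumptions a 1500-byte MTU needs 1526 bytes, which is above the 1518-byte default mentioned in the commit message.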
|
|
Biju Das says:
====================
Add support for Renesas RZ/G3L GBETH
From: Biju Das <biju.das.jz@bp.renesas.com>
The Renesas RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30
compared to other Renesas SoCs such as RZ/V2H that use MAC version 5.20.
The RZ/G3L GBETH requires an extra clock compared to RZ/G3E and has pps
interrupts. Document the Renesas RZ/G3L GBETH IP in bindings and add
support for the RZ/G3L GBETH in dwmac-renesas-gbeth glue driver.
====================
Link: https://patch.msgid.link/20260131161250.5047-1-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Unlike the GBETH IP handled by the other Renesas stmmac glue drivers, the
RZ/G3L GBETH IP uses Synopsys DesignWare MAC version 5.30. It has an extra
clock compared to RZ/V2H and has ptp_pps_o interrupts. Add support for
RZ/G3L GBETH by reusing the device data of RZ/V2H; this can be extended
with other functionality later.
Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-3-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add device tree binding support for the Gigabit Ethernet (GBETH) IP on
Renesas RZ/G3L SoC. This SoC uses a different Synopsys DesignWare MAC
version (5.30) than RZ/G3E.
RZ/G3L requires an extra clock compared to RZ/G3E and has pps interrupts.
Add a new compatible string "renesas,r9a08g046-gbeth" for RZ/G3L SoC and
update the schema to handle hardware differences between SoC variants.
Extend the base snps,dwmac.yaml schema to accommodate the PPS interrupts.
Acked-by: Conor Dooley <conor.dooley@microchip.com>
Signed-off-by: Biju Das <biju.das.jz@bp.renesas.com>
Reviewed-by: Lad Prabhakar <prabhakar.mahadev-lad.rj@bp.renesas.com>
Link: https://patch.msgid.link/20260131161250.5047-2-biju.das.jz@bp.renesas.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
I/O requests beyond the end of the filesystem should be zeroed out, as
loopback devices do; that is the expected behavior.
Fixes: ce63cb62d794 ("erofs: support unencoded inodes for fileio")
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
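
A minimal userspace sketch of the zero-fill behavior described above, assuming a flat in-memory image. The function name and signature are hypothetical and do not mirror the EROFS fileio code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch: serve a read of `len` bytes at offset `off` from an
 * image of `img_size` bytes, zero-filling whatever lies past the end.
 */
void read_zero_past_end(const unsigned char *img, size_t img_size,
                        size_t off, unsigned char *dst, size_t len)
{
    size_t avail = 0;

    assert(img && dst);
    if (off < img_size) {
        avail = img_size - off;
        if (avail > len)
            avail = len;
        memcpy(dst, img + off, avail);
    }
    /* Everything beyond the end of the image reads back as zeroes. */
    memset(dst + avail, 0, len - avail);
}
```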
|
|
The EROFS on-disk format uses a tiny, plain metadata design that
prioritizes performance and minimizes complex inconsistencies against
common writable disk filesystems (almost all serious metadata
inconsistency cannot happen in well-designed immutable filesystems like
EROFS). EROFS deliberately avoids artificial design flaws to eliminate
serious security risks from untrusted remote sources by design,
although human-made implementation bugs can still happen sometimes.
Currently, there is no strict check to prevent compressed inodes,
especially LZ4-compressed inodes, from being read in plain filesystems.
Starting with erofs-utils 1.0 and Linux 5.3, LZ4_0PADDING sb feature
is automatically enabled for LZ4-compressed EROFS images to support
in-place decompression. Furthermore, since Linux 5.4 LTS is no longer
supported, we no longer need to handle ancient LZ4-compressed EROFS
images generated by erofs-utils prior to 1.0.
To formally distinguish different filesystem types for improved
security:
- Use the presence of LZ4_0PADDING or a non-zero
`dsb->u1.lz4_max_distance` as a marker for compressed filesystems
containing LZ4-compressed inodes only;
- For other algorithms, use `dsb->u1.available_compr_algs` bitmap.
Note: LZ4_0PADDING has been supported since Linux 5.4 (the first formal
kernel version), so exposing it via sysfs is no longer necessary and is
now deprecated (but it will remain for five more years, until 2031):
`dsb->u1` has been strictly non-zero for all EROFS images containing
compressed inodes starting with erofs-utils v1.3 and it is actually
a much better marker for compressed filesystems.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
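
A rough sketch of the marker logic listed above, for illustration only. The structure layout, the feature bit values, and in particular the FEAT_COMPR_CFGS selector that decides how `u1` is interpreted are assumptions, not the on-disk format definition.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical feature bits; the real on-disk values are not given here. */
#define FEAT_LZ4_0PADDING (1u << 0)
#define FEAT_COMPR_CFGS   (1u << 1)  /* assumed: selects the bitmap view of u1 */

struct sb_sketch {
    uint32_t features;
    union {
        uint16_t available_compr_algs;  /* bitmap of algorithms in use */
        uint16_t lz4_max_distance;      /* LZ4-only images */
    } u1;
};

/* Returns true when the image is marked as containing compressed inodes. */
bool has_compressed_inodes(const struct sb_sketch *sb)
{
    assert(sb);
    if (sb->features & FEAT_COMPR_CFGS)
        return sb->u1.available_compr_algs != 0;
    return (sb->features & FEAT_LZ4_0PADDING) ||
           sb->u1.lz4_max_distance != 0;
}
```

A plain image then has neither the feature bit nor a non-zero `u1`, so compressed inodes found in it can be rejected up front.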
|
|
Symlink lengths are now cached in in-memory inodes directly so that
readlink can be sped up.
Signed-off-by: Gao Xiang <hsiangkao@linux.alibaba.com>
|
|
The Agilex devkit supports a separate eMMC daughter card. Document the
Agilex eMMC daughter board compatible.
[dinguyen] Because of patch 1cb8486ac5f3 ("dt-bindings: arm: altera: combine Intel's SoCFPGA into altera.yaml"),
I moved the change to the altera.yaml file.
Acked-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Signed-off-by: Ng Tze Yee <tzeyee.ng@altera.com>
Signed-off-by: Dinh Nguyen <dinguyen@kernel.org>
|
|
liveupdate is used to enable Live Update Orchestrator (LUO) early during
boot. Add it to kernel-parameters.txt so users can discover and use it.
Link: https://lkml.kernel.org/r/20260130112036.359806-1-me@linux.beauty
Signed-off-by: Li Chen <me@linux.beauty>
Acked-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Borislav Petkov (AMD)" <bp@alien8.de>
Cc: Frank van der Linden <fvdl@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jonathan Corbet <corbet@lwn.net>
Cc: Kees Cook <kees@kernel.org>
Cc: Li RongQing <lirongqing@baidu.com>
Cc: Pawan Gupta <pawan.kumar.gupta@linux.intel.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Pratyush Yadav <ptyadav@amazon.de>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
When truncating a large swap entry, shmem_free_swap() returns 0 when the
entry's index doesn't match the given index due to lookup alignment. The
failure fallback path checks if the entry crosses the end border and
aborts when it happens, so truncate won't erase an unexpected entry or
range. But one scenario was ignored.
When `index` points to the middle of a large swap entry, and the large
swap entry doesn't go across the end border, find_get_entries() will
return that large swap entry as the first item in the batch with
`indices[0]` equal to `index`. The entry's base index will be smaller
than `indices[0]`, so shmem_free_swap() will fail and return 0 due to the
"base < index" check. The code will then call shmem_confirm_swap(), get
the order, check if it crosses the END boundary (which it doesn't), and
retry with the same index.
The next iteration will find the same entry again at the same index with
the same indices, leading to an infinite loop.
Fix this by retrying with a round-down index, and abort if the index is
smaller than the truncate range.
Link: https://lkml.kernel.org/r/aXo6ltB5iqAKJzY8@KASONG-MC4
Fixes: 809bc86517cc ("mm: shmem: support large folio swap out")
Fixes: 8a1968bd997f ("mm/shmem, swap: fix race of truncate and swap entry split")
Signed-off-by: Kairui Song <kasong@tencent.com>
Reported-by: Chris Mason <clm@meta.com>
Closes: https://lore.kernel.org/linux-mm/20260128130336.727049-1-clm@meta.com/
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
Cc: Baoquan He <bhe@redhat.com>
Cc: Barry Song <baohua@kernel.org>
Cc: Chris Li <chrisl@kernel.org>
Cc: Hugh Dickins <hughd@google.com>
Cc: Kemeng Shi <shikemeng@huaweicloud.com>
Cc: Nhat Pham <nphamcs@gmail.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
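
The round-down retry can be sketched in plain C as follows. This is an illustration under stated assumptions: the helper names are hypothetical, and a large entry of order `order` is assumed to start at an index aligned to 2^order.

```c
#include <assert.h>
#include <stdbool.h>

/* Base index of the 2^order-aligned entry that covers `index`. */
unsigned long entry_base(unsigned long index, unsigned int order)
{
    assert(order < 8 * sizeof(unsigned long));
    return index & ~((1UL << order) - 1);
}

/*
 * Hypothetical retry decision: store the rounded-down index and return
 * true when it stays inside the truncate range, return false to abort.
 */
bool retry_index(unsigned long index, unsigned int order,
                 unsigned long start, unsigned long *retry)
{
    unsigned long base = entry_base(index, order);

    assert(retry);
    if (base < start)
        return false;  /* entry begins before the range: abort */
    *retry = base;
    return true;
}
```

Retrying at the entry's base index means the next lookup no longer returns an entry whose base is below the index, so the loop makes progress.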
|
|
Link: https://lkml.kernel.org/r/20260128173915.162309-1-alexander@mihalicyn.com
Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@futurfusion.io>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Patch series "liveupdate: fixes in error handling".
This series contains some fixes in LUO's error handling paths.
The first patch deals with failed freeze() attempts. The cleanup path
calls unfreeze, and that clears some data needed by later unpreserve
calls.
The second patch is a bit more involved. It deals with failed retrieve()
attempts. To do so properly, it reworks some of the error handling logic
in luo_file core.
Both these fixes are "theoretical" -- in the sense that I have not been
able to reproduce either of them in normal operation. The only supported
file type right now is memfd, and there is nothing userspace can do right
now to make it fail its retrieve or freeze. I need to make the retrieve
or freeze fail by artificially injecting errors. The injected errors
trigger a use-after-free and a double-free.
That said, once more complex file handlers are added or memfd preservation
is used in ways not currently expected or covered by the tests, we will be
able to see them on real systems.
This patch (of 2):
The unfreeze operation is supposed to undo the effects of the freeze
operation. serialized_data is not set by freeze, but by preserve.
Consequently, the unpreserve operation needs to access serialized_data to
undo the effects of the preserve operation. This includes freeing the
serialized data structures for example.
If a freeze callback fails, unfreeze is called for all frozen files. This
would clear serialized_data for them. Since live update has failed, it
can be expected that userspace aborts, releasing all sessions. When the
sessions are released, unpreserve will be called for all files. The
unfrozen files will see 0 in their serialized_data. This is not expected
by file handlers, and they might either fail, leaking data and state, or
might even crash or cause invalid memory access.
Do not clear serialized_data on unfreeze so it gets passed on to
unpreserve. There is no need to clear it on unpreserve since luo_file
will be freed immediately after.
Link: https://lkml.kernel.org/r/20260126230302.2936817-1-pratyush@kernel.org
Link: https://lkml.kernel.org/r/20260126230302.2936817-2-pratyush@kernel.org
Fixes: 7c722a7f44e0 ("liveupdate: luo_file: implement file systems callbacks")
Signed-off-by: Pratyush Yadav (Google) <pratyush@kernel.org>
Reviewed-by: Mike Rapoport (Microsoft) <rppt@kernel.org>
Cc: Pasha Tatashin <pasha.tatashin@soleen.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
The original patch inverted the PTE unconditionally to avoid
L1TF-vulnerable PTEs, but Linux doesn't make this adjustment in 2-level
paging.
Adjust the logic to use the flip_protnone_guard() helper, which is a nop
on 2-level paging but inverts the address bits in all other paging modes.
This doesn't matter for the Xen aspect of the original change. Linux no
longer supports running 32bit PV under Xen, and Xen doesn't support
running any 32bit PV guests without using PAE paging.
Link: https://lkml.kernel.org/r/20260126211046.2096622-1-andrew.cooper3@citrix.com
Fixes: b505f1944535 ("x86/kfence: avoid writing L1TF-vulnerable PTEs")
Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Closes: https://lore.kernel.org/lkml/CAKFNMokwjw68ubYQM9WkzOuH51wLznHpEOMSqtMoV1Rn9JV_gw@mail.gmail.com/
Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Tested-by: Borislav Petkov (AMD) <bp@alien8.de>
Cc: Alexander Potapenko <glider@google.com>
Cc: Marco Elver <elver@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Jann Horn <jannh@google.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
Replace snprintf("%s") with the faster and more direct strscpy().
Signed-off-by: Thorsten Blum <thorsten.blum@linux.dev>
Link: https://lore.kernel.org/r/20260201215247.677121-2-thorsten.blum@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
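
For readers outside the kernel, a userspace stand-in for a bounded string copy is sketched below. bounded_copy() is a hypothetical name; it always NUL-terminates and reports truncation with -1, where the kernel helper is understood to return a negative errno instead.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in: copy at most size - 1 bytes, always terminate,
 * return the copied length or -1 when the source had to be truncated.
 */
long bounded_copy(char *dst, const char *src, size_t size)
{
    size_t len;

    assert(dst && src);
    if (size == 0)
        return -1;
    len = strlen(src);
    if (len >= size) {
        memcpy(dst, src, size - 1);
        dst[size - 1] = '\0';
        return -1;
    }
    memcpy(dst, src, len + 1);  /* includes the terminating NUL */
    return (long)len;
}
```

Unlike snprintf("%s"), no format string has to be parsed, which is the "more direct" part of the commit message.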
|
|
Make sure the sub-command of the lightbar command starts with an 8-bit
parameter to ensure alignment.
Fixes: 9600b8bdbfe4 ("platform/chrome: lightbar: Add support for large sequence")
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Link: https://lore.kernel.org/r/20260202100621.3608437-1-gwendal@google.com
Signed-off-by: Tzung-Bi Shih <tzungbi@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue
Tony Nguyen says:
====================
Intel Wired LAN Driver Updates 2026-01-30 (ice, i40e)
This series contains updates to ice and i40e drivers.
Grzegorz and Jake resolve issues around timing for E825 that can cause Tx
timestamps to be missed/interrupts not generated on ice.
Aaron Ma defers restart of PTP work until after VSIs are rebuilt
to prevent NULL pointer dereference for ice.
Mohammad Heib removes calls to udp_tunnel_get_rx_info() in ice and i40e
which violate locking expectations and are unneeded.
* '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue:
i40e: drop udp_tunnel_get_rx_info() call from i40e_open()
ice: drop udp_tunnel_get_rx_info() call from ndo_open()
ice: Fix PTP NULL pointer dereference during VSI rebuild
ice: PTP: fix missing timestamps on E825 hardware
ice: fix missing TX timestamps interrupts on E825 devices
====================
Link: https://patch.msgid.link/20260130185401.1091523-1-anthony.l.nguyen@intel.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Matthieu Baerts says:
====================
mptcp: implement .read_sock and .splice_read
This series is a preparation work for future in-kernel MPTCP sockets
usage. Here, two interfaces are implemented: read_sock and splice_read.
As a result of this series, splice() with MPTCP sockets -- which was
already supported -- is now improved.
- Patches 1-2: .read_sock implementation
- Patches 3-4: .splice_read implementation
- Patches 5-6: validate splice() support with MPTCP sockets.
====================
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-0-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The "splice" alternate mode for mptcp_connect.sh/.c is available now;
this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by
default.
Note that this mode is also supported by stable kernel versions, but
optimised in this patch series.
Suggested-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds a new 'splice' io mode for mptcp_connect to test
the newly added read_sock() and splice_read() functions of MPTCP.
do_splice() efficiently transfers data directly between two file
descriptors (infd and outfd) without copying to userspace, using
Linux's splice() system call.
Usage:
./mptcp_connect.sh -m splice
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-5-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch implements .splice_read interface of mptcp struct proto_ops
as mptcp_splice_read() with reference to tcp_splice_read().
Corresponding to __tcp_splice_read(), __mptcp_splice_read() is defined,
invoking mptcp_read_sock() instead of tcp_read_sock().
mptcp_splice_read() is almost the same as tcp_splice_read(), except for
sock_rps_record_flow().
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-4-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h
so that they can be used by MPTCP in the next patch.
Suggested-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Current in-kernel TCP sockets -- i.e. from nvme_tcp_try_recv() -- need
to call .read_sock interface of struct proto_ops, but it's not
implemented in MPTCP.
This patch implements it with reference to __tcp_read_sock() and
__mptcp_recvmsg_mskq().
Corresponding to tcp_recv_skb(), a new helper for MPTCP named
mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue.
Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses
sk->sk_rcvbuf as the max read length. The LISTEN status is checked
before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf()
are invoked after the loop. In the loop, all flags checks for
__mptcp_recvmsg_mskq() are removed.
Reviewed-by: Hannes Reinecke <hare@kernel.org>
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-2-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch extracts the free skb related code in __mptcp_recvmsg_mskq()
into a new helper mptcp_eat_recv_skb().
This new helper will be used in the next patch.
Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn>
Reviewed-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-1-31332ba70d7f@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Claudiu Manoil says:
====================
ENETC v4 hardware integration fixes
ENETC v4 targeted fixes addressing SoC level integration issues
regarding AXI settings and register access width.
====================
Link: https://patch.msgid.link/20260130141035.272471-1-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
It is not recommended to access the 32‑bit registers of this hardware IP
using lower‑width accessors (i.e. 16‑bit), and the only exception to
this rule was introduced in the initial ENETC v1 driver for the PMAR1
register, which holds the lower 16 bits of the primary MAC address of
an SI. Meanwhile, this exception has been replicated in the v4 driver
code as well.
Since LS1028 (the only SoC with ENETC v1) is not affected by this issue,
the current patch converts the 16‑bit reads from PMAR1 starting with
ENETC v4.
Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260130141035.272471-5-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For ENETC v4, which is integrated into more complex SoCs (compared to v1),
16‑bit register writes are blocked in the SoC interconnect on some chips.
To be fair, it is not recommended to access 32‑bit registers of this IP
using lower‑width accessors (i.e. 16‑bit), and the only exception to
this rule was introduced by me in the initial ENETC v1 driver for the
PMAR1 register, which holds the lower 16 bits of the primary MAC address
of an SI. Meanwhile, this exception has been replicated for v4 as well.
Since LS1028 (the only SoC with ENETC v1) is not affected by this issue,
the current patch fixes the 16‑bit writes to PMAR1 starting with ENETC
v4.
Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260130141035.272471-4-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
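
A sketch of what a full-width PMAR1 update could look like, using a plain struct in place of MMIO. The register model, the byte order, and the choice to write zero to the upper half of PMAR1 are all assumptions made for illustration, not the driver's actual accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for two 32-bit MMIO registers. */
struct pmar_regs {
    volatile uint32_t pmar0;  /* first four MAC bytes */
    volatile uint32_t pmar1;  /* last two MAC bytes in the low half */
};

void set_primary_mac(struct pmar_regs *regs, const uint8_t mac[6])
{
    assert(regs && mac);
    regs->pmar0 = (uint32_t)mac[0] | (uint32_t)mac[1] << 8 |
                  (uint32_t)mac[2] << 16 | (uint32_t)mac[3] << 24;
    /* One 32-bit store instead of a 16-bit one; upper half written as 0. */
    regs->pmar1 = (uint32_t)mac[4] | (uint32_t)mac[5] << 8;
}
```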
|
|
For ENETC v4 these settings are controlled by the global ENETC
command cache attribute registers (EnCAR), from the IERB register
block.
The hardcoded CDBR cacheability settings were inherited from LS1028A,
and should be removed from the ENETC v4 driver as they conflict
with the global IERB settings.
Fixes: e3f4a0a8ddb4 ("net: enetc: add command BD ring support for i.MX95 ENETC")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260130141035.272471-3-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
For ENETC v4 these settings are controlled by the global ENETC
message and buffer cache attribute registers (EnBCAR and EnMCAR),
from the IERB register block.
The hardcoded cacheability settings were inherited from LS1028A,
and should be removed from the ENETC v4 driver as they conflict
with the global IERB settings.
Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF")
Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com>
Reviewed-by: Wei Fang <wei.fang@nxp.com>
Link: https://patch.msgid.link/20260130141035.272471-2-claudiu.manoil@nxp.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Takahiro Itazuri says:
====================
ptp: vmclock: Add VM generation counter and ACPI notification
Similarly to live migration, starting a VM from some serialized state
(aka snapshot) is an event which calls for adjusting guest clocks, hence
a hypervisor should increase the disruption_marker before resuming the
VM vCPUs, letting the guest know.
However, loading a snapshot is slightly different than live migration,
especially since we can start multiple VMs from the same serialized
state. Apart from adjusting clocks, the guest needs to take additional
action during such events, e.g. recreate UUIDs, reset network
adapters/connections, reseed entropy pools, etc. These actions are not
necessary during live migration. This calls for a differentiation
between the two triggering events.
We differentiate between the two events via an extra field in the
vmclock_abi, called vm_generation_counter. Whereas hypervisors should
increase the disruption marker in both cases, they should only increase
vm_generation_counter when a snapshot is loaded in a VM (not during live
migration).
Additionally, we attach an ACPI notification to VMClock. Implementing
the notification is optional for the device. VMClock device will declare
that it implements the notification by setting
VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors
that implement the notification must send an ACPI notification every
time seq_count changes to an even number. The driver will propagate
these notifications to userspace via the poll() interface.
====================
Link: https://patch.msgid.link/20260130173704.12575-1-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
To output UTC would involve complex calculations about whether the time
elapsed since the reference time has crossed the end of the month when
a leap second takes effect. I've prototyped that, but it made me sad.
Much better to report TAI, which is what PHCs should do anyway.
And much much simpler.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-8-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Now that we added device tree support we can remove dependency on
CONFIG_ACPI.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-7-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
As we finalised the spec, we spotted that vmgenid actually says that the
_HID is supposed to be hypervisor-specific. Although in the 13 years
since the original vmgenid doc was published, nobody seems to have cared
about using _HID to distinguish between implementations on different
hypervisors, and we only ever use the _CID.
For consistency, match the _CID of "VMCLOCK" too.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-6-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add device tree support to the ptp_vmclock driver, allowing it to probe
via device tree in addition to ACPI.
Handle optional interrupt for clock disruption notifications, mirroring
the ACPI notification behaviour.
Although the interrupt is marked as 'optional' in the DT bindings, if
the device *advertises* the VMCLOCK_FLAG_NOTIFICATION_PRESENT flag then it
*should* have an interrupt. The driver will refuse to initialize if not.
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-5-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The vmclock device provides a PTP clock source and precise timekeeping
across live migration and snapshot/restore operations.
The binding has a required memory region containing the vmclock_abi
structure and an optional interrupt for clock disruption notifications.
The full spec is at https://uapi-group.org/specifications/specs/vmclock/
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-4-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add optional support for device notifications in VMClock. When
supported, the hypervisor will send a device notification every time it
updates the seq_count to a new even value.
Moreover, add support for poll() in VMClock as a means to propagate this
notification to user space. poll() will return a POLLIN event to
listeners every time seq_count changes to a value different than the one
last seen (since open() or last read()/pread()). This means that when
poll() returns a POLLIN event, listeners need to use read() to observe
what has changed and update the reader's view of seq_count. In other
words, after a poll() returned, all subsequent calls to poll() will
immediately return with a POLLIN event until the listener calls read().
The device advertises support for the notification mechanism by setting
flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If
the flag is not present the driver won't setup the ACPI notification
handler and poll() will always immediately return POLLHUP.
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Signed-off-by: Takahiro Itazuri <itazur@amazon.com>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
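
The poll()/read() contract described above can be modelled with a few lines of C. This is a sketch of the bookkeeping only; the type and function names are hypothetical and no real file operations are involved.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct device_state { uint32_t seq_count; };  /* updated by the hypervisor */
struct reader_state { uint32_t seen_seq; };   /* one per open file */

/* open(): remember the sequence count current at open time. */
void reader_open(struct reader_state *r, const struct device_state *d)
{
    assert(r && d);
    r->seen_seq = d->seq_count;
}

/* poll(): readable whenever seq_count differs from the last value seen. */
bool reader_poll_in(const struct reader_state *r, const struct device_state *d)
{
    assert(r && d);
    return d->seq_count != r->seen_seq;
}

/* read(): refresh the reader's view, which clears the readable condition. */
void reader_read(struct reader_state *r, const struct device_state *d)
{
    assert(r && d);
    r->seen_seq = d->seq_count;
}
```

Until reader_read() runs, every reader_poll_in() call keeps returning true, matching the "immediately return with a POLLIN event" wording.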
|
|
Similar to live migration, loading a VM from some saved state (aka
snapshot) is also an event that calls for clock adjustments in the
guest. However, guests might want to take more actions as a response to
such events, e.g. discarding UUIDs, resetting network connections,
reseeding entropy pools, etc. These are actions that guests don't
typically take during live migration, so add a new field in the
vmclock_abi called vm_generation_counter which informs the guest about
such events.
Hypervisor advertises support for vm_generation_counter through the
VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the
presence of this bit in vmclock_abi flags field before using this field.
Signed-off-by: Babis Chalios <bchalios@amazon.es>
Signed-off-by: David Woodhouse <dwmw@amazon.co.uk>
Reviewed-by: David Woodhouse <dwmw@amazon.co.uk>
Tested-by: Takahiro Itazuri <itazur@amazon.com>
Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
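
How a guest might tell the two events apart is sketched below. The struct is a simplified stand-in for vmclock_abi, the flag bit value is hypothetical, and the action names are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit position; the real flag value is not given here. */
#define FLAG_VM_GEN_COUNTER_PRESENT (1ull << 0)

/* Simplified stand-in for the relevant vmclock_abi fields. */
struct clock_view {
    uint64_t flags;
    uint64_t disruption_marker;
    uint64_t vm_generation_counter;
};

enum guest_action {
    ACT_NONE,
    ACT_ADJUST_CLOCKS,                 /* live migration */
    ACT_ADJUST_CLOCKS_AND_REGENERATE,  /* snapshot load: UUIDs, entropy, ... */
};

enum guest_action classify_event(const struct clock_view *old,
                                 const struct clock_view *cur)
{
    assert(old && cur);
    /* Only trust the counter when the hypervisor advertises it. */
    if ((cur->flags & FLAG_VM_GEN_COUNTER_PRESENT) &&
        cur->vm_generation_counter != old->vm_generation_counter)
        return ACT_ADJUST_CLOCKS_AND_REGENERATE;
    if (cur->disruption_marker != old->disruption_marker)
        return ACT_ADJUST_CLOCKS;
    return ACT_NONE;
}
```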
|
|
Eric Dumazet says:
====================
ipv6: misc changes in output path
Small optimizations mostly in ip6_xmit() path.
TX performance increases by about 3 %.
Patches 5-7: add dst4_mtu() and dst6_mtu() to save space.
Last patch colocates inet6_cork in inet_cork_full.
This series reduces kernel size by 494 bytes on x86_64:
scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 4/2 grow/shrink: 9/23 up/down: 665/-1159 (-494)
Function                                      old     new   delta
ip6_finish_output_gso_slowpath_drop             -     197    +197
ip6_xmit                                     1452    1595    +143
do_ipv6_getsockopt                           2855    2950     +95
kzalloc_noprof                                  -      55     +55
ip4ip6_err                                    918     955     +37
__icmp_send                                  1499    1532     +33
do_ip_getsockopt                             2573    2605     +32
__ip6_append_data                            4109    4137     +28
__pfx_kzalloc_noprof                            -      16     +16
__pfx_ip6_finish_output_gso_slowpath_drop       -      16     +16
ipmr_prepare_xmit                            1232    1238      +6
ip6_forward                                  1905    1909      +4
ip6_cork_release                              108     111      +3
ipv6_push_nfrag_opts                          489     486      -3
ipv6_push_frag_opts                            90      87      -3
ip6_finish_output2                           1446    1437      -9
ip6_tnl_xmit                                 2639    2627     -12
ip6_default_advmss                            176     160     -16
__ip6_rt_update_pmtu                         1087    1071     -16
tcp_v6_syn_recv_sock                         1715    1696     -19
tcp_v4_syn_recv_sock                         1107    1088     -19
__ip_make_skb                                1339    1320     -19
ip_setup_cork                                 406     385     -21
ip6_setup_cork                                732     710     -22
rawv6_push_pending_frames                     581     556     -25
ip6_push_pending_frames                       184     157     -27
udpv6_splice_eof                              203     170     -33
ip6_flush_pending_frames                      220     183     -37
ip6_append_data                               349     312     -37
udp_v6_push_pending_frames                    155     115     -40
sit_tunnel_xmit                              1957    1914     -43
__pfx_dst_mtu                                  64       -     -64
tcp_v4_mtu_reduced                            289     220     -69
tcp_v6_mtu_reduced                            209     139     -70
ip6_make_skb                                  574     484     -90
ip6_finish_output                             827     697    -130
dst_mtu                                       160       -    -160
fib6_nh_mtu_change                            511     336    -175
Total: Before=22584400, After=22583906, chg -0.00%
====================
Link: https://patch.msgid.link/20260130210303.3888261-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
All inet6_cork users also use one inet_cork_full.
Reduce number of parameters and increase data locality.
This saves ~275 bytes of code on x86_64.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When we expect an IPv4 dst, use dst4_mtu() instead of dst_mtu()
to save some code space.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-8-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
When we expect an IPv6 dst, use dst6_mtu() instead of dst_mtu()
to save some code space.
Due to current dst6_mtu() implementation, only convert
users in IPv6 stack.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-7-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat,
because it is generic.
Indeed, clang does not always inline it.
Add dst4_mtu() and dst6_mtu() helpers for callers that
expect either ipv4_mtu() or ip6_mtu() to be called.
These helpers are always inlined.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
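
A minimal userspace sketch of the idea above, not the kernel code: a generic helper that dispatches through an ops pointer, next to family-specific helpers that call the implementation directly and are forced inline. All names ending in _sketch, the simplified structs, and the fallback MTU values are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-ins for the kernel structures. */
struct dst_entry;

struct dst_ops {
	unsigned int (*mtu)(const struct dst_entry *dst);
};

struct dst_entry {
	const struct dst_ops *ops;
	unsigned int cached_mtu;	/* 0 means "no cached value" */
};

/* Stand-ins for the family-specific implementations. The fallback
 * values are illustrative only. */
static unsigned int ipv4_mtu_sketch(const struct dst_entry *dst)
{
	return dst->cached_mtu ? dst->cached_mtu : 1500;
}

static unsigned int ip6_mtu_sketch(const struct dst_entry *dst)
{
	return dst->cached_mtu ? dst->cached_mtu : 1280;
}

/* Generic helper: must go through the ops pointer, which costs an
 * indirect call (or extra dispatch code) at every call site. */
static unsigned int dst_mtu_sketch(const struct dst_entry *dst)
{
	assert(dst != NULL && dst->ops != NULL);
	return dst->ops->mtu(dst);
}

/* Specialized helpers: the caller already knows the family, so the
 * implementation is called directly and the helper is always inlined. */
static inline __attribute__((always_inline))
unsigned int dst4_mtu_sketch(const struct dst_entry *dst)
{
	return ipv4_mtu_sketch(dst);
}

static inline __attribute__((always_inline))
unsigned int dst6_mtu_sketch(const struct dst_entry *dst)
{
	return ip6_mtu_sketch(dst);
}
```

A caller that knows it holds an IPv4 dst would use dst4_mtu_sketch(dst) and get the same value as dst_mtu_sketch(dst) without the dispatch.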
|
|
When a packet is dropped because it is too big, use SKB_DROP_REASON_PKT_TOO_BIG.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-5-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
ip6_xmit() makes sure there is enough headroom in the skb,
so it can use __skb_push() instead of the out-of-line skb_push().
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-4-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
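
A userspace sketch of the distinction, under stated assumptions: struct buf, push_checked() and push_unchecked() are hypothetical stand-ins for an skb, skb_push() and __skb_push(). The checked variant here returns NULL on underrun, whereas the kernel's skb_push() panics.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical buffer: data points somewhere at or after head, and the
 * gap between the two is the available headroom. */
struct buf {
	unsigned char *head;
	unsigned char *data;
	size_t len;
};

/* Unchecked push: the caller has already guaranteed enough headroom,
 * so this is just pointer arithmetic and can be inlined. */
static inline unsigned char *push_unchecked(struct buf *b, size_t n)
{
	b->data -= n;
	b->len += n;
	return b->data;
}

/* Checked push: verifies headroom on every call before moving data. */
static unsigned char *push_checked(struct buf *b, size_t n)
{
	if ((size_t)(b->data - b->head) < n)
		return NULL;	/* not enough headroom */
	return push_unchecked(b, n);
}
```

When the headroom check has already been done once up front, repeating it inside every push only adds code and a function call.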
|
|
1) daddr is unlikely to be a multicast address in ip6_finish_output2().
2) ip6_finish_output_gso_slowpath_drop() should not be called often.
3) ip6_fragment() should not be called often.
4) opt is unlikely to be set.
5) ip6_xmit() and ip6_forward() mostly send packets that are not too big.
6) Most __ip6_make_skb() calls are for UDP packets,
not ICMPV6 ones.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-3-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
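
Hints of the kind listed above can be sketched in plain C with the GCC/Clang __builtin_expect builtin, which is what the likely()/unlikely() macros wrap. The pick_output_path() function and its enum are hypothetical and only illustrate marking the common case (packet fits the MTU) as the expected branch.

```c
#include <assert.h>

#ifndef likely
#define likely(x)   __builtin_expect(!!(x), 1)
#define unlikely(x) __builtin_expect(!!(x), 0)
#endif

enum out_path { OUT_SEND, OUT_FRAGMENT, OUT_DROP };

/* Hypothetical output decision: most packets fit the MTU, so that
 * branch is marked likely and the compiler can lay it out as the
 * fall-through path. The hints change code layout, not results. */
static enum out_path pick_output_path(unsigned int len, unsigned int mtu,
				      int may_fragment)
{
	if (likely(len <= mtu))
		return OUT_SEND;
	if (may_fragment)
		return OUT_FRAGMENT;
	return OUT_DROP;
}
```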
|
|
With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing
a pointer to an automatic variable.
Change these exported functions to return 'u8 proto'
instead of void.
- ipv6_push_nfrag_opts()
- ipv6_push_frag_opts()
For instance, replace
ipv6_push_frag_opts(skb, opt, &proto);
with:
proto = ipv6_push_frag_opts(skb, opt, proto);
Note that even after this change, ip6_xmit() has to use a stack canary
because of @first_hop variable.
Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com>
Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
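
The calling-convention change can be illustrated outside the kernel. In this sketch, struct pkt, push_opt_ptr() and push_opt_val() are hypothetical: the first updates the next-header value through a pointer, so the caller's local has its address taken; the second takes and returns the value, so the local can stay in a register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical packet: each pushed header records the protocol number
 * of whatever follows it. */
struct pkt {
	uint8_t nexthdr[8];
	unsigned int n;
};

/* Before: in/out parameter. The caller must pass &proto, which forces
 * proto onto the stack. */
static void push_opt_ptr(struct pkt *p, uint8_t opt_type, uint8_t *proto)
{
	assert(p->n < sizeof(p->nexthdr));
	p->nexthdr[p->n++] = *proto;	/* new header points at old proto */
	*proto = opt_type;		/* outer header must now name us */
}

/* After: value in, value out. No address of a local is taken. */
static uint8_t push_opt_val(struct pkt *p, uint8_t opt_type, uint8_t proto)
{
	assert(p->n < sizeof(p->nexthdr));
	p->nexthdr[p->n++] = proto;
	return opt_type;
}
```

Both variants leave the packet and the protocol value in the same state; only the way the value travels differs.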
|
|
The rx->skey field contains a struct tipc_aead_key with GCM-AES
encryption keys used for TIPC cluster communication. Using plain
kfree() leaves this sensitive key material in freed memory pages
where it could potentially be recovered.
Switch to kfree_sensitive() to ensure the key material is zeroed
before the memory is freed.
Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange")
Signed-off-by: Daniel Hodges <hodgesd@meta.com>
Link: https://patch.msgid.link/20260131180114.2121438-1-hodgesd@meta.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
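
A userspace analogue of the zero-before-free pattern, as a sketch only: secure_zero() and free_sensitive() are hypothetical names, and unlike kfree_sensitive() the caller must pass the length explicitly. The volatile pointer is there so the compiler does not optimize away stores to memory that is about to be freed.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Overwrite len bytes with zeros through a volatile pointer so the
 * stores are not elided as dead writes. */
static void secure_zero(void *p, size_t len)
{
	volatile unsigned char *v = p;

	while (len--)
		*v++ = 0;
}

/* Zero the buffer, then release it. NULL is accepted, like free(). */
static void free_sensitive(void *p, size_t len)
{
	if (!p)
		return;
	secure_zero(p, len);
	free(p);
}
```

A plain memset() followed by free() may be removed by the optimizer, which is why the wipe goes through a dedicated helper.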
|
|
Many network drivers have unnecessary empty module_init and module_exit
functions. Remove them (including some that just print a message). Note
that if a module_init function exists, a module_exit function must also
exist; otherwise, the module cannot be unloaded.
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Acked-by: Marc Kleine-Budde <mkl@pengutronix.de>
Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de>
Reviewed-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Ping-Ke Shih <pkshih@realtek.com>
Acked-by: Toke Høiland-Jørgensen <toke@toke.dk>
Link: https://patch.msgid.link/20260131004327.18112-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The module version is useless, and the only thing these drivers' init
routines did besides pci_register_driver was to print the driver name
and/or version.
Acked-by: Francois Romieu <romieu@fr.zoreil.com> (epic100)
Reviewed-by: Simon Horman <horms@kernel.org> (epic100, sis900)
Reviewed-by: Sai Krishna <saikrishnag@marvell.com> (epic100)
Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com>
Link: https://patch.msgid.link/20260131022441.56274-1-enelsonmoore@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Add the following check, to detect bugs sooner for CONFIG_DEBUG_NET=y
builds.
DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head);
Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20260130160253.2936789-1-edumazet@google.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
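
How a warn-once sanity check of this shape behaves can be sketched in userspace. DEBUG_WARN_ON_ONCE_SKETCH(), struct buf, check_buf() and warn_count are hypothetical; the real macro is compiled out unless CONFIG_DEBUG_NET=y, which this sketch does not model.

```c
#include <assert.h>
#include <stdio.h>

static int warn_count;	/* number of warnings actually emitted */

/* Report a violated invariant the first time it is seen at this call
 * site, then stay silent, so a hot path cannot flood the log. */
#define DEBUG_WARN_ON_ONCE_SKETCH(cond)					\
	do {								\
		static int warned_;					\
		if ((cond) && !warned_) {				\
			warned_ = 1;					\
			warn_count++;					\
			fprintf(stderr, "WARN: %s\n", #cond);		\
		}							\
	} while (0)

/* Hypothetical buffer with the same invariant: data must never point
 * before head. */
struct buf {
	unsigned char *head;
	unsigned char *data;
};

static void check_buf(const struct buf *b)
{
	DEBUG_WARN_ON_ONCE_SKETCH(b->data < b->head);
}
```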
|
|
Sean Anderson says:
====================
net: phy: dp83867: Always program R/SGMII enable bits
The hardware designers at my company neglected to read the datasheet for
this PHY and did not add appropriate resistors to configure it for
SGMII. Add support for configuring it based on phy-mode instead of
relying on the resistors for a suitable default.
====================
Link: https://patch.msgid.link/20260129171205.3868605-1-sean.anderson@linux.dev
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|