Age  Commit message  Author  Files  Lines
2026-02-02Merge branch '100GbE' of ↵Jakub Kicinski4-88/+136
git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue Tony Nguyen says: ==================== Intel Wired LAN Driver Updates 2026-01-30 (ice, i40e) This series contains updates to ice and i40e drivers. Grzegorz and Jake resolve issues around timing for E825 that can cause Tx timestamps to be missed/interrupts not generated on ice. Aaron Ma defers restart of PTP work until after VSIs are rebuilt to prevent NULL pointer dereference for ice. Mohammad Heib removes calls to udp_tunnel_get_rx_info() in ice and i40e which violate locking expectations and are unneeded. * '100GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/tnguy/net-queue: i40e: drop udp_tunnel_get_rx_info() call from i40e_open() ice: drop udp_tunnel_get_rx_info() call from ndo_open() ice: Fix PTP NULL pointer dereference during VSI rebuild ice: PTP: fix missing timestamps on E825 hardware ice: fix missing TX timestamps interrupts on E825 devices ==================== Link: https://patch.msgid.link/20260130185401.1091523-1-anthony.l.nguyen@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02Merge branch 'mptcp-implement-read_sock-and-splice_read'Jakub Kicinski6-19/+308
Matthieu Baerts says: ==================== mptcp: implement .read_sock and .splice_read This series is a preparation work for future in-kernel MPTCP sockets usage. Here, two interfaces are implemented: read_sock and splice_read. As a result of this series, splice() with MPTCP sockets -- which was already supported -- is now improved. - Patches 1-2: .read_sock implementation - Patches 3-4: .splice_read implementation - Patches 5-6: validate splice() support with MPTCP sockets. ==================== Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-0-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02selftests: mptcp: connect: cover splice modeGeliang Tang2-0/+6
The "splice" alternate mode for mptcp_connect.sh/.c is available now, this patch adds mptcp_connect_splice.sh to test it in the MPTCP CI by default. Note that this mode is also supported by stable kernel versions, but optimised in this patch series. Suggested-by: Matthieu Baerts <matttbe@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-6-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02selftests: mptcp: add splice io modeGeliang Tang1-1/+78
This patch adds a new 'splice' io mode for mptcp_connect to test the newly added read_sock() and splice_read() functions of MPTCP. do_splice() efficiently transfers data directly between two file descriptors (infd and outfd) without copying to userspace, using Linux's splice() system call. Usage: ./mptcp_connect.sh -m splice Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Co-developed-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-5-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
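For readers unfamiliar with the pattern, here is a minimal userspace sketch of moving data between two descriptors with splice(2) through an intermediate pipe. The helper name splice_copy() and its error handling are assumptions made for illustration; this is not the selftest's do_splice().

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>
#include <sys/types.h>

/*
 * Hypothetical helper: move up to len bytes from infd to outfd through an
 * intermediate pipe, without copying the payload into a userspace buffer.
 * Returns the number of bytes moved, or -1 on error.
 */
static ssize_t splice_copy(int infd, int outfd, size_t len)
{
	int pipefd[2];
	ssize_t total = 0;

	if (pipe(pipefd) < 0)
		return -1;

	while ((size_t)total < len) {
		/* Stage 1: source descriptor -> pipe. */
		ssize_t in = splice(infd, NULL, pipefd[1], NULL,
				    len - (size_t)total, SPLICE_F_MOVE);
		if (in < 0) {
			total = -1;
			break;
		}
		if (in == 0)
			break;	/* end of input */

		/* Stage 2: pipe -> destination, draining what stage 1 queued. */
		while (in > 0) {
			ssize_t out = splice(pipefd[0], NULL, outfd, NULL,
					     (size_t)in, SPLICE_F_MOVE);
			if (out <= 0) {
				total = -1;
				goto done;
			}
			in -= out;
			total += out;
		}
	}
done:
	close(pipefd[0]);
	close(pipefd[1]);
	return total;
}
```

splice(2) requires at least one side of each call to be a pipe, which is why a socket-to-socket or socket-to-file transfer is done in two stages. With blocking descriptors the call waits for data, so a caller would normally pass a length it expects to become available.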
2026-02-02mptcp: implement .splice_readGeliang Tang1-0/+117
This patch implements .splice_read interface of mptcp struct proto_ops as mptcp_splice_read() with reference to tcp_splice_read(). Corresponding to __tcp_splice_read(), __mptcp_splice_read() is defined, invoking mptcp_read_sock() instead of tcp_read_sock(). mptcp_splice_read() is almost the same as tcp_splice_read(), except for sock_rps_record_flow(). Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-4-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02tcp: export tcp_splice_stateGeliang Tang2-11/+13
Export struct tcp_splice_state and tcp_splice_data_recv() in net/tcp.h so that they can be used by MPTCP in the next patch. Suggested-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-3-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02mptcp: implement .read_sockGeliang Tang1-0/+82
Current in-kernel TCP sockets -- i.e. from nvme_tcp_try_recv() -- need to call .read_sock interface of struct proto_ops, but it's not implemented in MPTCP. This patch implements it with reference to __tcp_read_sock() and __mptcp_recvmsg_mskq(). Corresponding to tcp_recv_skb(), a new helper for MPTCP named mptcp_recv_skb() is added to peek a skb from sk->sk_receive_queue. Compared with __mptcp_recvmsg_mskq(), mptcp_read_sock() uses sk->sk_rcvbuf as the max read length. The LISTEN status is checked before the while loop, and mptcp_recv_skb() and mptcp_cleanup_rbuf() are invoked after the loop. In the loop, all flags checks for __mptcp_recvmsg_mskq() are removed. Reviewed-by: Hannes Reinecke <hare@kernel.org> Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-2-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02mptcp: add eat_recv_skb helperGeliang Tang1-7/+12
This patch extracts the free skb related code in __mptcp_recvmsg_mskq() into a new helper mptcp_eat_recv_skb(). This new helper will be used in the next patch. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Reviewed-by: Mat Martineau <martineau@kernel.org> Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260130-net-next-mptcp-splice-v2-1-31332ba70d7f@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02Merge branch 'enetc-v4-hardware-integration-fixes'Jakub Kicinski4-14/+24
Claudiu Manoil says: ==================== ENETC v4 hardware integration fixes ENETC v4 targeted fixes addressing SoC level integration issues regarding AXI settings and register access width. ==================== Link: https://patch.msgid.link/20260130141035.272471-1-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: enetc: Convert 16-bit register reads to 32-bit for ENETC v4Claudiu Manoil2-4/+15
It is not recommended to access the 32‑bit registers of this hardware IP using lower‑width accessors (i.e. 16‑bit), and the only exception to this rule was introduced in the initial ENETC v1 driver for the PMAR1 register, which holds the lower 16 bits of the primary MAC address of an SI. Meanwhile, this exception has been replicated in the v4 driver code as well. Since LS1028 (the only SoC with ENETC v1) is not affected by this issue, the current patch converts the 16‑bit reads from PMAR1 starting with ENETC v4. Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260130141035.272471-5-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: enetc: Convert 16-bit register writes to 32-bit for ENETC v4Claudiu Manoil1-2/+2
For ENETC v4, which is integrated into more complex SoCs (compared to v1), 16‑bit register writes are blocked in the SoC interconnect on some chips. To be fair, it is not recommended to access 32‑bit registers of this IP using lower‑width accessors (i.e. 16‑bit), and the only exception to this rule was introduced by me in the initial ENETC v1 driver for the PMAR1 register, which holds the lower 16 bits of the primary MAC address of an SI. Meanwhile, this exception has been replicated for v4 as well. Since LS1028 (the only SoC with ENETC v1) is not affected by this issue, the current patch fixes the 16‑bit writes to PMAR1 starting with ENETC v4. Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260130141035.272471-4-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
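The idea behind the two PMAR1 patches can be sketched in plain C against a simulated register. Everything below is an illustrative model: the accessor names are hypothetical, the register is an ordinary variable rather than MMIO, and placing the 16-bit MAC fragment in bits 15:0 with the upper half written as zero is an assumption, not something the commit messages state.

```c
#include <stdint.h>

/* Simulated 32-bit register; a real driver would go through MMIO accessors. */
static volatile uint32_t fake_pmar1;

static uint32_t reg_read32(volatile uint32_t *reg)
{
	return *reg;
}

static void reg_write32(volatile uint32_t *reg, uint32_t val)
{
	*reg = val;
}

/* Read with a full-width access, then narrow to 16 bits in software. */
static uint16_t pmar1_get_lower16(volatile uint32_t *reg)
{
	return (uint16_t)(reg_read32(reg) & 0xffff);
}

/*
 * Write the 16-bit fragment with a single 32-bit store instead of a
 * 16-bit store. Assumption: the upper 16 bits may be written as zero.
 */
static void pmar1_set_lower16(volatile uint32_t *reg, uint16_t val)
{
	reg_write32(reg, (uint32_t)val);
}
```

The point of the model is only that the bus never sees a sub-word transaction; the narrowing and widening happen in the CPU.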
2026-02-02net: enetc: Remove CBDR cacheability AXI settings for ENETC v4Claudiu Manoil1-4/+0
For ENETC v4 these settings are controlled by the global ENETC command cache attribute registers (EnCAR), from the IERB register block. The hardcoded CBDR cacheability settings were inherited from LS1028A, and should be removed from the ENETC v4 driver as they conflict with the global IERB settings. Fixes: e3f4a0a8ddb4 ("net: enetc: add command BD ring support for i.MX95 ENETC") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260130141035.272471-3-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: enetc: Remove SI/BDR cacheability AXI settings for ENETC v4Claudiu Manoil1-4/+7
For ENETC v4 these settings are controlled by the global ENETC message and buffer cache attribute registers (EnBCAR and EnMCAR), from the IERB register block. The hardcoded cacheability settings were inherited from LS1028A, and should be removed from the ENETC v4 driver as they conflict with the global IERB settings. Fixes: 99100d0d9922 ("net: enetc: add preliminary support for i.MX95 ENETC PF") Signed-off-by: Claudiu Manoil <claudiu.manoil@nxp.com> Reviewed-by: Wei Fang <wei.fang@nxp.com> Link: https://patch.msgid.link/20260130141035.272471-2-claudiu.manoil@nxp.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02Merge branch 'ptp-vmclock-add-vm-generation-counter-and-acpi-notification'Jakub Kicinski5-26/+279
Takahiro Itazuri says: ==================== ptp: vmclock: Add VM generation counter and ACPI notification Similarly to live migration, starting a VM from some serialized state (aka snapshot) is an event which calls for adjusting guest clocks, hence a hypervisor should increase the disruption_marker before resuming the VM vCPUs, letting the guest know. However, loading a snapshot is slightly different from live migration, especially since we can start multiple VMs from the same serialized state. Apart from adjusting clocks, the guest needs to take additional action during such events, e.g. recreate UUIDs, reset network adapters/connections, reseed entropy pools, etc. These actions are not necessary during live migration. This calls for a differentiation between the two triggering events. We differentiate between the two events via an extra field in the vmclock_abi, called vm_generation_counter. Whereas hypervisors should increase the disruption marker in both cases, they should only increase vm_generation_counter when a snapshot is loaded in a VM (not during live migration). Additionally, we attach an ACPI notification to VMClock. Implementing the notification is optional for the device. VMClock device will declare that it implements the notification by setting VMCLOCK_FLAG_NOTIFICATION_PRESENT bit in vmclock_abi flags. Hypervisors that implement the notification must send an ACPI notification every time seq_count changes to an even number. The driver will propagate these notifications to userspace via the poll() interface. ==================== Link: https://patch.msgid.link/20260130173704.12575-1-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ptp: ptp_vmclock: return TAI not UTCDavid Woodhouse1-5/+5
To output UTC would involve complex calculations about whether the time elapsed since the reference time has crossed the end of the month when a leap second takes effect. I've prototyped that, but it made me sad. Much better to report TAI, which is what PHCs should do anyway. And much much simpler. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-8-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ptp: ptp_vmclock: remove dependency on CONFIG_ACPIDavid Woodhouse2-5/+11
Now that we added device tree support we can remove dependency on CONFIG_ACPI. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-7-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ptp: ptp_vmclock: add 'VMCLOCK' to ACPI device matchDavid Woodhouse1-0/+1
As we finalised the spec, we spotted that vmgenid actually says that the _HID is supposed to be hypervisor-specific. Although in the 13 years since the original vmgenid doc was published, nobody seems to have cared about using _HID to distinguish between implementations on different hypervisors, and we only ever use the _CID. For consistency, match the _CID of "VMCLOCK" too. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-6-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ptp: ptp_vmclock: Add device tree supportDavid Woodhouse1-6/+61
Add device tree support to the ptp_vmclock driver, allowing it to probe via device tree in addition to ACPI. Handle optional interrupt for clock disruption notifications, mirroring the ACPI notification behaviour. Although the interrupt is marked as 'optional' in the DT bindings, if the device *advertises* the VMCLOCK_FLAG_NOTIFICATION_PRESENT then it *should* have an interrupt. The driver will refuse to initialize if not. Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: Takahiro Itazuri <itazur@amazon.com> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-5-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02dt-bindings: ptp: Add amazon,vmclockDavid Woodhouse2-0/+47
The vmclock device provides a PTP clock source and precise timekeeping across live migration and snapshot/restore operations. The binding has a required memory region containing the vmclock_abi structure and an optional interrupt for clock disruption notifications. The full spec is at https://uapi-group.org/specifications/specs/vmclock/ Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Babis Chalios <bchalios@amazon.es> Reviewed-by: Krzysztof Kozlowski <krzysztof.kozlowski@oss.qualcomm.com> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-4-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ptp: vmclock: support device notificationsBabis Chalios2-19/+148
Add optional support for device notifications in VMClock. When supported, the hypervisor will send a device notification every time it updates the seq_count to a new even value. Moreover, add support for poll() in VMClock as a means to propagate this notification to user space. poll() will return a POLLIN event to listeners every time seq_count changes to a value different than the one last seen (since open() or last read()/pread()). This means that when poll() returns a POLLIN event, listeners need to use read() to observe what has changed and update the reader's view of seq_count. In other words, after a poll() returned, all subsequent calls to poll() will immediately return with a POLLIN event until the listener calls read(). The device advertises support for the notification mechanism by setting flag VMCLOCK_FLAG_NOTIFICATION_PRESENT in vmclock_abi flags field. If the flag is not present the driver won't setup the ACPI notification handler and poll() will always immediately return POLLHUP. Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Signed-off-by: Takahiro Itazuri <itazur@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-3-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
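The poll()/read() contract described above can be modelled as a small state machine. This is a sketch only: the struct, the function names and the flag bit value are hypothetical placeholders, and the code models the described behaviour rather than reproducing the driver.

```c
#include <poll.h>
#include <stdint.h>

/* Placeholder bit position, NOT the value from the vmclock ABI header. */
#define MODEL_FLAG_NOTIFICATION_PRESENT (1u << 0)

/* Per-open-file state: the seq_count this reader last observed. */
struct reader_state {
	uint32_t seen_seq;
};

/* What poll() would report for this reader, per the commit message. */
static short model_poll(uint64_t dev_flags, uint32_t dev_seq,
			const struct reader_state *rs)
{
	/* No notification support advertised: always POLLHUP. */
	if (!(dev_flags & MODEL_FLAG_NOTIFICATION_PRESENT))
		return POLLHUP;

	/* Readable until the reader catches up with the device. */
	return dev_seq != rs->seen_seq ? POLLIN : 0;
}

/* read()/pread() refreshes the reader's view and so clears POLLIN. */
static void model_read(uint32_t dev_seq, struct reader_state *rs)
{
	rs->seen_seq = dev_seq;
}
```

A listener therefore loops on poll(), and on POLLIN calls read() before polling again; skipping the read() makes every later poll() return immediately.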
2026-02-02ptp: vmclock: add vm generation counterBabis Chalios1-0/+15
Similar to live migration, loading a VM from some saved state (aka snapshot) is also an event that calls for clock adjustments in the guest. However, guests might want to take more actions as a response to such events, such as discarding UUIDs, resetting network connections, reseeding entropy pools, etc. These are actions that guests don't typically take during live migration, so add a new field in the vmclock_abi called vm_generation_counter which informs the guest about such events. Hypervisor advertises support for vm_generation_counter through the VMCLOCK_FLAG_VM_GEN_COUNTER_PRESENT flag. Users need to check the presence of this bit in vmclock_abi flags field before using this field. Signed-off-by: Babis Chalios <bchalios@amazon.es> Signed-off-by: David Woodhouse <dwmw@amazon.co.uk> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Tested-by: Takahiro Itazuri <itazur@amazon.com> Link: https://patch.msgid.link/20260130173704.12575-2-itazur@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
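A guest-side consumer might distinguish the two events roughly as follows. This is a hedged sketch: the struct layout, field widths, enum names and flag bit are assumptions for illustration and do not mirror the real vmclock_abi definition.

```c
#include <stdint.h>

/* Placeholder bit position, NOT the value from the vmclock ABI header. */
#define MODEL_FLAG_VM_GEN_COUNTER_PRESENT (1u << 0)

/* Hypothetical snapshot of the fields a guest would compare. */
struct clock_view {
	uint64_t flags;
	uint64_t disruption_marker;
	uint64_t vm_generation_counter;
};

enum vm_event {
	VM_EVENT_NONE,			/* nothing changed */
	VM_EVENT_CLOCK_DISRUPTION,	/* e.g. live migration: adjust clocks */
	VM_EVENT_SNAPSHOT_LOADED,	/* also reseed entropy, drop UUIDs, ... */
};

static enum vm_event classify_event(const struct clock_view *old,
				    const struct clock_view *cur)
{
	/* Only trust the counter if the hypervisor advertises it. */
	if ((cur->flags & MODEL_FLAG_VM_GEN_COUNTER_PRESENT) &&
	    cur->vm_generation_counter != old->vm_generation_counter)
		return VM_EVENT_SNAPSHOT_LOADED;

	if (cur->disruption_marker != old->disruption_marker)
		return VM_EVENT_CLOCK_DISRUPTION;

	return VM_EVENT_NONE;
}
```

Checking the flag first matters: without it the counter field carries no defined meaning, so a change there must not trigger the heavier snapshot handling.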
2026-02-02Merge branch 'ipv6-misc-changes-in-output-path'Jakub Kicinski20-125/+144
Eric Dumazet says:
====================
ipv6: misc changes in output path

Small optimizations mostly in ip6_xmit() path. TX performance increases by about 3 %.
Patches 5-7: add dst4_mtu() and dst6_mtu() to save space.
Last patch colocates inet6_cork in inet_cork_full.

This series reduces kernel size by 494 bytes on x86_64:

scripts/bloat-o-meter -t vmlinux.old vmlinux.new
add/remove: 4/2 grow/shrink: 9/23 up/down: 665/-1159 (-494)
Function                                     old     new   delta
ip6_finish_output_gso_slowpath_drop            -     197    +197
ip6_xmit                                    1452    1595    +143
do_ipv6_getsockopt                          2855    2950     +95
kzalloc_noprof                                 -      55     +55
ip4ip6_err                                   918     955     +37
__icmp_send                                 1499    1532     +33
do_ip_getsockopt                            2573    2605     +32
__ip6_append_data                           4109    4137     +28
__pfx_kzalloc_noprof                           -      16     +16
__pfx_ip6_finish_output_gso_slowpath_drop      -      16     +16
ipmr_prepare_xmit                           1232    1238      +6
ip6_forward                                 1905    1909      +4
ip6_cork_release                             108     111      +3
ipv6_push_nfrag_opts                         489     486      -3
ipv6_push_frag_opts                           90      87      -3
ip6_finish_output2                          1446    1437      -9
ip6_tnl_xmit                                2639    2627     -12
ip6_default_advmss                           176     160     -16
__ip6_rt_update_pmtu                        1087    1071     -16
tcp_v6_syn_recv_sock                        1715    1696     -19
tcp_v4_syn_recv_sock                        1107    1088     -19
__ip_make_skb                               1339    1320     -19
ip_setup_cork                                406     385     -21
ip6_setup_cork                               732     710     -22
rawv6_push_pending_frames                    581     556     -25
ip6_push_pending_frames                      184     157     -27
udpv6_splice_eof                             203     170     -33
ip6_flush_pending_frames                     220     183     -37
ip6_append_data                              349     312     -37
udp_v6_push_pending_frames                   155     115     -40
sit_tunnel_xmit                             1957    1914     -43
__pfx_dst_mtu                                 64       -     -64
tcp_v4_mtu_reduced                           289     220     -69
tcp_v6_mtu_reduced                           209     139     -70
ip6_make_skb                                 574     484     -90
ip6_finish_output                            827     697    -130
dst_mtu                                      160       -    -160
fib6_nh_mtu_change                           511     336    -175
Total: Before=22584400, After=22583906, chg -0.00%
====================
Link: https://patch.msgid.link/20260130210303.3888261-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: colocate inet6_cork in inet_cork_fullEric Dumazet5-41/+40
All inet6_cork users also use one inet_cork_full. Reduce number of parameters and increase data locality. This saves ~275 bytes of code on x86_64. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-9-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv4: use dst4_mtu() instead of dst_mtu()Eric Dumazet7-14/+13
When we expect an IPv4 dst, use dst4_mtu() instead of dst_mtu() to save some code space. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-8-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: use dst6_mtu() instead of dst_mtu()Eric Dumazet6-16/+19
When we expect an IPv6 dst, use dst6_mtu() instead of dst_mtu() to save some code space. Due to current dst6_mtu() implementation, only convert users in IPv6 stack. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-7-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02inet: add dst4_mtu() and dst6_mtu() helpersEric Dumazet2-0/+12
With CONFIG_MITIGATION_RETPOLINE=y dst_mtu() is a bit fat, because it is generic. Indeed, clang does not always inline it. Add dst4_mtu() and dst6_mtu() helpers for callers that expect either ipv4_mtu() or ip6_mtu() to be called. These helpers are always inlined. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-6-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
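The trade-off can be illustrated with a toy model: a generic accessor that dispatches through a function pointer versus a family-specific helper that calls the known implementation directly. All types and names below are simplified stand-ins invented for this sketch, not the kernel's dst_entry or its helpers.

```c
struct dst_model;

/* Per-family operations table, as a generic interface would use. */
struct dst_ops_model {
	unsigned int (*mtu)(const struct dst_model *dst);
};

struct dst_model {
	const struct dst_ops_model *ops;
	unsigned int cached_mtu;
};

/* Stand-in for the IPv4 implementation. */
static unsigned int ipv4_mtu_model(const struct dst_model *dst)
{
	return dst->cached_mtu;
}

static const struct dst_ops_model ipv4_ops_model = {
	.mtu = ipv4_mtu_model,
};

/* Generic path: indirect call, which is comparatively costly when
 * indirect branches are hardened (e.g. with retpolines). */
static unsigned int dst_mtu_generic_model(const struct dst_model *dst)
{
	return dst->ops->mtu(dst);
}

/* Family-specific path: the caller already knows the dst is IPv4, so the
 * target is named directly and the compiler is free to inline it. */
static inline unsigned int dst4_mtu_model(const struct dst_model *dst)
{
	return ipv4_mtu_model(dst);
}
```

Both paths return the same value for an IPv4 dst; the specialised helper simply removes the dispatch from call sites that never needed it.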
2026-02-02ipv6: use SKB_DROP_REASON_PKT_TOO_BIG in ip6_xmit()Eric Dumazet1-1/+1
When a too big packet is dropped, use SKB_DROP_REASON_PKT_TOO_BIG. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-5-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: use __skb_push() in ip6_xmit()Eric Dumazet1-2/+2
ip6_xmit() makes sure there is enough headroom in the skb, so it can use __skb_push() instead of the out-of-line skb_push(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-4-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: add some unlikely()/likely() clauses in ip6_output.cEric Dumazet1-12/+12
1) daddr is unlikely to be multicast in ip6_finish_output2(). 2) ip6_finish_output_gso_slowpath_drop() should not be called often. 3) ip6_fragment() should not be called often. 4) opt is unlikely to be set. 5) ip6_xmit() and ip6_forward() mostly send packets that are not too big. 6) Most __ip6_make_skb() calls are for UDP packets, not ICMPV6 ones. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-3-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02ipv6: pass proto by value to ipv6_push_nfrag_opts() and ipv6_push_frag_opts()Eric Dumazet4-40/+46
With CONFIG_STACKPROTECTOR_STRONG=y, it is better to avoid passing a pointer to an automatic variable. Change these exported functions to return 'u8 proto' instead of void. - ipv6_push_nfrag_opts() - ipv6_push_frag_opts() For instance, replace ipv6_push_frag_opts(skb, opt, &proto); with: proto = ipv6_push_frag_opts(skb, opt, proto); Note that even after this change, ip6_xmit() has to use a stack canary because of @first_hop variable. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130210303.3888261-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
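The shape of the API change can be shown with a toy extension-header push. The struct and function names are hypothetical; only the calling convention (pointer to a local versus value in, value out) reflects the commit message.

```c
#include <stdint.h>

#define MODEL_NEXTHDR_DEST 60	/* IPv6 destination options header */
#define MODEL_NEXTHDR_UDP  17

/* Toy extension header: it only records which header follows it. */
struct ext_hdr_model {
	uint8_t nexthdr;
};

/* Old style: the caller must take the address of a local variable, which
 * makes strong stack protection add a canary to the calling function. */
static void push_opt_by_pointer(struct ext_hdr_model *h, uint8_t *proto)
{
	h->nexthdr = *proto;		/* chain to the previous first header */
	*proto = MODEL_NEXTHDR_DEST;	/* this header is now the first one */
}

/* New style: proto travels by value and the updated value is returned. */
static uint8_t push_opt_by_value(struct ext_hdr_model *h, uint8_t proto)
{
	h->nexthdr = proto;
	return MODEL_NEXTHDR_DEST;
}
```

The two variants compute the same result; the by-value form just keeps the caller's variable in a register-friendly form whose address never escapes.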
2026-02-02tipc: use kfree_sensitive() for session key materialDaniel Hodges1-2/+2
The rx->skey field contains a struct tipc_aead_key with GCM-AES encryption keys used for TIPC cluster communication. Using plain kfree() leaves this sensitive key material in freed memory pages where it could potentially be recovered. Switch to kfree_sensitive() to ensure the key material is zeroed before the memory is freed. Fixes: 1ef6f7c9390f ("tipc: add automatic session key exchange") Signed-off-by: Daniel Hodges <hodgesd@meta.com> Link: https://patch.msgid.link/20260131180114.2121438-1-hodgesd@meta.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
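As a userspace analogue of the zero-before-free idea, the sketch below wipes a buffer through a volatile pointer so the stores are not optimised away, then frees it. The helper names are hypothetical, and unlike kfree_sensitive() this version needs the caller to pass the length.

```c
#include <stddef.h>
#include <stdlib.h>

/* Overwrite len bytes with zeros; the volatile qualifier keeps the
 * compiler from eliding the stores as dead writes before free(). */
static void wipe_model(void *p, size_t len)
{
	volatile unsigned char *b = p;

	while (len--)
		*b++ = 0;
}

/* Hypothetical helper: scrub key material, then release the memory. */
static void free_sensitive_model(void *p, size_t len)
{
	if (!p)
		return;
	wipe_model(p, len);
	free(p);
}
```

A plain memset() followed by free() is the pattern this guards against, since a compiler may legally drop the memset() when the buffer is never read again.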
2026-02-02net: remove unnecessary module_init/exit functionsEthan Nelson-Moore10-156/+0
Many network drivers have unnecessary empty module_init and module_exit functions. Remove them (including some that just print a message). Note that if a module_init function exists, a module_exit function must also exist; otherwise, the module cannot be unloaded. Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> Acked-by: Michael Grzeschik <m.grzeschik@pengutronix.de> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Ping-Ke Shih <pkshih@realtek.com> Acked-by: Toke Høiland-Jørgensen <toke@toke.dk> Link: https://patch.msgid.link/20260131004327.18112-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: ethernet: use module_pci_driver; remove useless driver versionsEthan Nelson-Moore3-103/+3
The module version is useless, and the only thing these drivers' init routines did besides pci_register_driver was to print the driver name and/or version. Acked-by: Francois Romieu <romieu@fr.zoreil.com> (epic100) Reviewed-by: Simon Horman <horms@kernel.org> (epic100, sis900) Reviewed-by: Sai Krishna <saikrishnag@marvell.com> (epic100) Signed-off-by: Ethan Nelson-Moore <enelsonmoore@gmail.com> Link: https://patch.msgid.link/20260131022441.56274-1-enelsonmoore@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: add a debug check in __skb_push()Eric Dumazet1-0/+1
Add the following check, to detect bugs sooner for CONFIG_DEBUG_NET=y builds. DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head); Signed-off-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260130160253.2936789-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
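What the new check catches can be modelled without real skbs. In this sketch the buffer tracks the data pointer as a signed offset from head (to keep the model free of out-of-bounds pointer arithmetic), and a flag stands in for the one-time warning; the struct and function are hypothetical.

```c
/* Toy buffer: data_off is the distance of 'data' from 'head' (headroom). */
struct skb_model {
	long data_off;
	unsigned int len;
	int warned;	/* stand-in for DEBUG_NET_WARN_ON_ONCE() firing */
};

/* Prepend n bytes without a bounds check, as an unchecked push would,
 * then apply the debug check: data must not end up before head. */
static long model_push(struct skb_model *skb, unsigned int n)
{
	skb->data_off -= (long)n;
	skb->len += n;

	if (skb->data_off < 0)
		skb->warned = 1;

	return skb->data_off;
}
```

Pushing exactly the available headroom is fine; pushing one more header than the headroom allows is the caller bug that the debug build now reports at the point of the push.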
2026-02-02Merge branch 'net-phy-dp83867-always-program-r-sgmii-enable-bits'Jakub Kicinski1-40/+23
Sean Anderson says: ==================== net: phy: dp83867: Always program R/SGMII enable bits The hardware designers at my company neglected to read the datasheet for this PHY and did not add appropriate resistors to configure it for SGMII. Add support for configuring it based on phy-mode instead of relying on the resistors for a suitable default. ==================== Link: https://patch.msgid.link/20260129171205.3868605-1-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: phy: dp83867: Always program R/SGMII enable bitsSean Anderson1-24/+10
If the board designers have neglected to populate the appropriate resistors on the strapping pins then the phy may default to the wrong interface mode. Enable/disable the RGMII/SGMII enable bits as necessary to select the correct interface. The dp83867 strapping pins have four levels and typically configure two features at once. LED_0 controls both port mirroring and whether SGMII is enabled. If it is pulled to VDDIO, both port mirroring and SGMII will be enabled. For variants of the dp83867 that do not support SGMII, this will prevent data from being transferred. As we now explicitly set the SGMII and RGMII enable bits, we do not need to detect whether SGMII has been inadvertently enabled. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260129171205.3868605-3-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: phy: dp83867: Program TX FIFO for all interfacesSean Anderson1-17/+14
All supported interfaces use the TX FIFO register at least some of the time, so there's no point in checking the interface. Retain the check for the RX FIFO level since it is only used by SGMII. Signed-off-by: Sean Anderson <sean.anderson@linux.dev> Link: https://patch.msgid.link/20260129171205.3868605-2-sean.anderson@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02net: stmmac: fix stm32 (and potentially others) resume regressionRussell King (Oracle)1-1/+2
Marek reported that suspending stm32 causes the following errors when the interface is administratively down: $ echo devices > /sys/power/pm_test $ echo mem > /sys/power/state ... ck_ker_eth2stp already disabled ... ck_ker_eth2stp already unprepared ... On suspend, stm32 starts the eth2stp clock in its suspend method, and stops it in the resume method. This is because the blamed commit omits the call to the platform glue ->suspend() method, but does make the call to the platform glue ->resume() method. This problem affects all other converted drivers as well - e.g. looking at the PCIe drivers, pci_save_state() will not be called, but pci_restore_state() will be. Similar issues affect all other drivers. Fix this by always calling the ->suspend() method, even when the network interface is down. This fixes all the conversions to the platform glue ->suspend() and ->resume() methods. Link: https://lore.kernel.org/r/20260114081809.12758-1-marex@nabladev.com Fixes: 07bbbfe7addf ("net: stmmac: add suspend()/resume() platform ops") Reported-by: Marek Vasut <marex@nabladev.com> Tested-by: Marek Vasut <marex@nabladev.com> Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/E1vlujh-00000007Hkw-2p6r@rmk-PC.armlinux.org.uk Signed-off-by: Jakub Kicinski <kuba@kernel.org>
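The imbalance can be reproduced with a toy clock whose enable count must never go below zero. The model follows the description above (the glue's suspend hook starts the clock, its resume hook stops it); all names are hypothetical and the real driver logic is far more involved.

```c
/* Toy clock with an enable count and a tally of unbalanced disables. */
struct clk_model {
	int enable_count;
	int underflow_warnings;	/* "already disabled" style warnings */
};

static void clk_enable_model(struct clk_model *c)
{
	c->enable_count++;
}

static void clk_disable_model(struct clk_model *c)
{
	if (c->enable_count == 0) {
		c->underflow_warnings++;
		return;
	}
	c->enable_count--;
}

/* Platform glue hooks, per the commit message's description. */
static void glue_suspend(struct clk_model *c) { clk_enable_model(c); }
static void glue_resume(struct clk_model *c)  { clk_disable_model(c); }

/*
 * One suspend/resume cycle. With always_call_suspend == 0 the glue's
 * suspend hook is skipped when the interface is down (the regression);
 * with 1 it is always called (the fix). Returns the warning count.
 */
static int suspend_resume_cycle(int iface_up, int always_call_suspend)
{
	struct clk_model c = { 0, 0 };

	if (iface_up || always_call_suspend)
		glue_suspend(&c);
	glue_resume(&c);

	return c.underflow_warnings;
}
```

Only the combination "interface down, suspend hook skipped" produces a warning, which matches the report that the errors appear when the interface is administratively down.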
2026-02-02fs: consolidate fsverity_info lookup in buffer.cChristoph Hellwig1-16/+11
Look up the fsverity_info once in end_buffer_async_read_io, and then pass it along to the I/O completion workqueue in struct postprocess_bh_ctx. This amortizes the lookup better once it becomes less efficient. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Reviewed-by: Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20260202060754.270269-8-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-02fsverity: push out fsverity_info lookupChristoph Hellwig7-35/+63
Pass a struct fsverity_info to the verification and readahead helpers, and push the lookup into the callers. Right now this is a very dumb almost mechanical move that open codes a lot of fsverity_info_addr() calls in the file systems. The subsequent patches will clean this up. This prepares for reducing the number of fsverity_info lookups, which will allow amortizing them better when using a more expensive lookup method. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Acked-by: David Sterba <dsterba@suse.com> # btrfs Link: https://lore.kernel.org/r/20260202060754.270269-7-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-02fsverity: deconstify the inode pointer in struct fsverity_infoChristoph Hellwig3-5/+6
A lot of file system code expects a non-const inode pointer. Dropping the const qualifier here allows using the inode pointer in verify_data_block and prepares for further argument reductions. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Darrick J. Wong" <djwong@kernel.org> Link: https://lore.kernel.org/r/20260202060754.270269-6-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-02fsverity: kick off hash readahead at data I/O submission timeChristoph Hellwig9-74/+161
Currently all reads of the fsverity hashes are kicked off from the data I/O completion handler, leading to needlessly dependent I/O. This is worked around a bit by performing readahead on the level 0 nodes, but that is still fairly ineffective. Switch to a model where the ->read_folio and ->readahead methods instead kick off explicit readahead of the fsverity hashes so they are usually available at I/O completion time. For 64k sequential reads on my test VM this improves read performance from 2.4GB/s - 2.6GB/s to 3.5GB/s - 3.9GB/s. The improvements for random reads are likely to be even bigger. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: David Sterba <dsterba@suse.com> # btrfs Link: https://lore.kernel.org/r/20260202060754.270269-5-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-02ext4: move ->read_folio and ->readahead to readpage.cChristoph Hellwig3-30/+31
Keep all the read into pagecache code in a single file. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Acked-by: Theodore Ts'o <tytso@mit.edu> Link: https://lore.kernel.org/r/20260202060754.270269-4-hch@lst.de Signed-off-by: Eric Biggers <ebiggers@kernel.org>
2026-02-02net: rss: fix reporting RXH_XFRM_NO_CHANGE as input_xfrm for contextsJakub Kicinski2-10/+2
Initializing input_xfrm to RXH_XFRM_NO_CHANGE in RSS contexts is problematic. I think I did this to make it clear that the context does not have its own settings applied. But unlike ETH_RSS_HASH_NO_CHANGE which is zero, RXH_XFRM_NO_CHANGE is 0xff. We need to be careful when reading the value back, and remember to treat 0xff as 0. Remove the initialization and switch to storing 0. This lets us also remove the workaround in ethnl_rss_set(). Get side does not need any adjustments and context get no longer reports: RSS input transformation: symmetric-xor: on symmetric-or-xor: on Unknown bits in RSS input transformation: 0xfc for NICs which don't support input_xfrm. Remove the init of hfunc to ETH_RSS_HASH_NO_CHANGE while at it. As already mentioned this is a noop since ETH_RSS_HASH_NO_CHANGE is 0 and struct is zalloc'd. But as this fix exemplifies storing NO_CHANGE as state is fragile. This issue is implicitly caught by running our selftests because YNL in selftests errors out on unknown bits. Fixes: d3e2c7bab124 ("ethtool: rss: support setting input-xfrm via Netlink") Link: https://patch.msgid.link/20260130190311.811129-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
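The "Unknown bits ... 0xfc" report above follows directly from decoding 0xff as a bitmask. The sketch below is illustrative only: decode_input_xfrm() is a hypothetical helper, not kernel or ethtool code, and the two flag values are assumed to match the uapi definitions (bit 0 for symmetric-xor, bit 1 for symmetric-or-xor).

```python
# Assumed flag values; RXH_XFRM_NO_CHANGE = 0xff is stated in the commit message.
RXH_XFRM_SYM_XOR = 1 << 0
RXH_XFRM_SYM_OR_XOR = 1 << 1
RXH_XFRM_NO_CHANGE = 0xFF

KNOWN_FLAGS = {
    "symmetric-xor": RXH_XFRM_SYM_XOR,
    "symmetric-or-xor": RXH_XFRM_SYM_OR_XOR,
}


def decode_input_xfrm(value):
    """Hypothetical decoder: return (names of known flags set, unknown bits)."""
    names = [name for name, bit in KNOWN_FLAGS.items() if value & bit]
    known_mask = RXH_XFRM_SYM_XOR | RXH_XFRM_SYM_OR_XOR
    unknown = value & ~known_mask & 0xFF
    return names, unknown


# A context that stored NO_CHANGE as state reports both flags plus stray bits.
leaked_names, leaked_unknown = decode_input_xfrm(RXH_XFRM_NO_CHANGE)
# A context that stores 0 reports nothing.
clean_names, clean_unknown = decode_input_xfrm(0)
```

Storing 0 instead of the NO_CHANGE sentinel means the reader never has to remember a special case when the value is read back.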
2026-02-02net: l3mdev: use skb_dst_dev_rcu() in l3mdev_l3_out()Eric Dumazet1-3/+4
Extend the RCU section a bit so that we can use the safer skb_dst_dev_rcu() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260130191906.3781856-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02bnxt_en: Allow ntuple filters for dropsJoe Damato1-6/+7
It appears that in commit 7efd79c0e689 ("bnxt_en: Add drop action support for ntuple"), bnxt gained support for ntuple filters for packet drops. However, support for this does not seem to work in recent kernels or against net-next:

  % sudo ethtool -U eth0 flow-type udp4 src-ip 1.1.1.1 action -1
  rmgr: Cannot insert RX class rule: Operation not supported
  Cannot insert classification rule

The issue is that the existing code uses ethtool_get_flow_spec_ring_vf, which will return a non-zero value if the ring_cookie is set to RX_CLS_FLOW_DISC, which then causes bnxt_add_ntuple_cls_rule to return -EOPNOTSUPP because it thinks the user is trying to set an ntuple filter for a vf. Fix this by first checking that the ring_cookie is not RX_CLS_FLOW_DISC. After this patch, ntuple filters for drops can be added:

  % sudo ethtool -U eth0 flow-type udp4 src-ip 1.1.1.1 action -1
  Added rule with ID 0

  % ethtool -n eth0
  44 RX rings available
  Total 1 rules

  Filter: 0
      Rule Type: UDP over IPv4
      Src IP addr: 1.1.1.1 mask: 0.0.0.0
      Dest IP addr: 0.0.0.0 mask: 255.255.255.255
      TOS: 0x0 mask: 0xff
      Src port: 0 mask: 0xffff
      Dest port: 0 mask: 0xffff
      Action: Drop

Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260131003042.2570434-1-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
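The ordering problem described above can be modelled in a few lines. This is a rough sketch, not the driver code: the constants are assumed to match the ethtool uapi header (RX_CLS_FLOW_DISC as all ones, the VF field in bits 32-39 of ring_cookie), and check_old() / check_fixed() are hypothetical stand-ins for the relevant part of bnxt_add_ntuple_cls_rule.

```python
# Assumed uapi values; treat them as illustrative if your headers differ.
RX_CLS_FLOW_DISC = 0xFFFFFFFFFFFFFFFF
ETHTOOL_RX_FLOW_SPEC_RING_VF = 0x000000FF00000000
ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF = 32


def flow_spec_ring_vf(ring_cookie):
    """Model of ethtool_get_flow_spec_ring_vf(): extract the VF field."""
    return (ring_cookie & ETHTOOL_RX_FLOW_SPEC_RING_VF) >> ETHTOOL_RX_FLOW_SPEC_RING_VF_OFF


def check_old(ring_cookie):
    """Hypothetical pre-fix order: the VF test runs first."""
    if flow_spec_ring_vf(ring_cookie):
        return "-EOPNOTSUPP"
    if ring_cookie == RX_CLS_FLOW_DISC:
        return "drop"
    return "ring"


def check_fixed(ring_cookie):
    """Hypothetical post-fix order: the drop cookie is recognized first."""
    if ring_cookie == RX_CLS_FLOW_DISC:
        return "drop"
    if flow_spec_ring_vf(ring_cookie):
        return "-EOPNOTSUPP"
    return "ring"


# "action -1" corresponds to the all-ones drop cookie, whose VF field reads 0xff.
vf_of_drop = flow_spec_ring_vf(RX_CLS_FLOW_DISC)
old_result = check_old(RX_CLS_FLOW_DISC)
new_result = check_fixed(RX_CLS_FLOW_DISC)
```

Because the drop cookie has every bit set, any field extracted from it is non-zero, so sentinel checks have to come before field decoding.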
2026-02-02tools: ynl: cli: make the output compactJakub Kicinski1-3/+6
Make the default (non-JSON) output more compact. Looking at RSS context dumps is pretty much impossible without this, because default print shows the indirection table with line per entry: 'indir': [0, 1, 2, ... And indirection tables have 100-200 entries each. The compact output is far more readable: 'indir': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, Reviewed-by: Donald Hunter <donald.hunter@gmail.com> Link: https://patch.msgid.link/20260131203029.1173492-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
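The difference between the two layouts can be reproduced with Python's standard pprint module. This is only an illustration of the effect, assuming a compact-style pretty printer; it does not claim to show what the YNL CLI patch actually changes, and the sample reply dict is made up.

```python
import pprint

# Made-up RSS context reply with a 128-entry indirection table over 8 queues.
reply = {"indir": [i % 8 for i in range(128)]}

# Default pprint puts each list element on its own line once the list
# no longer fits within the line width.
verbose = pprint.pformat(reply)

# compact=True packs as many elements per line as the width allows.
compact = pprint.pformat(reply, compact=True)

verbose_lines = len(verbose.splitlines())
compact_lines = len(compact.splitlines())
```

Printing both strings side by side shows the one-entry-per-line layout collapsing into a handful of wrapped lines.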
2026-02-02docs: networking: mention that RSS table should be 4x the queue countJakub Kicinski1-4/+8
Spell out the recommendation that the RSS table should be 4x the queue count to avoid traffic imbalance. Include minor rephrasing and removal of the explicit 128 entry example since a 128 entry table is inadequate on modern machines. Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131225454.1225151-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-02selftests: drv-net: rss: validate min RSS table sizeJakub Kicinski2-0/+89
Add a test which checks that the RSS table is at least 4x the max queue count supported by the device. The original RSS spec from Microsoft stated that the RSS indirection table should be 2 to 8 times the CPU count, presumably assuming queue per CPU. If the CPU count is not a power of two, however, a power-of-2 table 2x larger than queue count results in a 33% traffic imbalance. Validate that the indirection table is at least 4x the queue count. This lowers the imbalance to 16% which empirically appears to be more acceptable to memcache-like workloads. Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260131225454.1225151-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
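The 33% and 16% figures can be reproduced with a small calculation. The sketch below makes two assumptions that are not stated in the commit message: the table is filled round-robin (entry i maps to queue i % queues), and imbalance is measured as the relative gap between the most and least loaded queue. imbalance() is a hypothetical helper, not part of the selftest.

```python
from fractions import Fraction


def imbalance(table_size, queues):
    """Relative gap between the busiest and idlest queue, as a Fraction."""
    counts = [0] * queues
    for entry in range(table_size):
        counts[entry % queues] += 1  # assumed round-robin fill
    return Fraction(max(counts) - min(counts), max(counts))


# 6 queues is not a power of two. The smallest power-of-2 table holding at
# least 2x the queue count is 16 entries; for 4x it is 32 entries.
two_x = imbalance(16, 6)   # per-queue entries: 3, 3, 3, 3, 2, 2
four_x = imbalance(32, 6)  # per-queue entries: 6, 6, 5, 5, 5, 5
```

With these assumptions, two_x evaluates to 1/3 (about 33%) and four_x to 1/6 (about 16%), matching the numbers quoted above for this queue count; other queue counts give different values.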
2026-02-02net: spacemit: display phy driver informationChukun Pan1-0/+2
Print the PHY driver used and interrupt status after connection. Signed-off-by: Chukun Pan <amadeus@jmu.edu.cn> Reviewed-by: Andrew Lunn <andrew@lunn.ch> Link: https://patch.msgid.link/20260201100001.33102-1-amadeus@jmu.edu.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>