aboutsummaryrefslogtreecommitdiff
path: root/net/ipv4
AgeCommit message (Collapse)AuthorFilesLines
2026-03-10net: add xmit recursion limit to tunnel xmit functionsWeiming Shi1-0/+13
Tunnel xmit functions (iptunnel_xmit, ip6tunnel_xmit) lack their own recursion limit. When a bond device in broadcast mode has GRE tap interfaces as slaves, and those GRE tunnels route back through the bond, multicast/broadcast traffic triggers infinite recursion between bond_xmit_broadcast() and ip_tunnel_xmit()/ip6_tnl_xmit(), causing kernel stack overflow. The existing XMIT_RECURSION_LIMIT (8) in the no-qdisc path is not sufficient because tunnel recursion involves route lookups and full IP output, consuming much more stack per level. Use a lower limit of 4 (IP_TUNNEL_RECURSION_LIMIT) to prevent overflow. Add recursion detection using dev_xmit_recursion helpers directly in iptunnel_xmit() and ip6tunnel_xmit() to cover all IPv4/IPv6 tunnel paths including UDP encapsulated tunnels (VXLAN, Geneve, etc.). Move dev_xmit_recursion helpers from net/core/dev.h to public header include/linux/netdevice.h so they can be used by tunnel code. BUG: KASAN: stack-out-of-bounds in blake2s.constprop.0+0xe7/0x160 Write of size 32 at addr ffff88810033fed0 by task kworker/0:1/11 Workqueue: mld mld_ifc_work Call Trace: <TASK> __build_flow_key.constprop.0 (net/ipv4/route.c:515) ip_rt_update_pmtu (net/ipv4/route.c:1073) iptunnel_xmit (net/ipv4/ip_tunnel_core.c:84) ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847) gre_tap_xmit (net/ipv4/ip_gre.c:779) dev_hard_start_xmit (net/core/dev.c:3887) sch_direct_xmit (net/sched/sch_generic.c:347) __dev_queue_xmit (net/core/dev.c:4802) bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312) bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279) bond_start_xmit (drivers/net/bonding/bond_main.c:5530) dev_hard_start_xmit (net/core/dev.c:3887) __dev_queue_xmit (net/core/dev.c:4841) ip_finish_output2 (net/ipv4/ip_output.c:237) ip_output (net/ipv4/ip_output.c:438) iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86) gre_tap_xmit (net/ipv4/ip_gre.c:779) dev_hard_start_xmit (net/core/dev.c:3887) sch_direct_xmit (net/sched/sch_generic.c:347) __dev_queue_xmit (net/core/dev.c:4802) bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312) bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279) bond_start_xmit (drivers/net/bonding/bond_main.c:5530) dev_hard_start_xmit (net/core/dev.c:3887) __dev_queue_xmit (net/core/dev.c:4841) ip_finish_output2 (net/ipv4/ip_output.c:237) ip_output (net/ipv4/ip_output.c:438) iptunnel_xmit (net/ipv4/ip_tunnel_core.c:86) ip_tunnel_xmit (net/ipv4/ip_tunnel.c:847) gre_tap_xmit (net/ipv4/ip_gre.c:779) dev_hard_start_xmit (net/core/dev.c:3887) sch_direct_xmit (net/sched/sch_generic.c:347) __dev_queue_xmit (net/core/dev.c:4802) bond_dev_queue_xmit (drivers/net/bonding/bond_main.c:312) bond_xmit_broadcast (drivers/net/bonding/bond_main.c:5279) bond_start_xmit (drivers/net/bonding/bond_main.c:5530) dev_hard_start_xmit (net/core/dev.c:3887) __dev_queue_xmit (net/core/dev.c:4841) mld_sendpack mld_ifc_work process_one_work worker_thread </TASK> Fixes: 745e20f1b626 ("net: add a recursion limit in xmit path") Reported-by: Xiang Mei <xmei5@asu.edu> Signed-off-by: Weiming Shi <bestswngs@gmail.com> Link: https://patch.msgid.link/20260306160133.3852900-2-bestswngs@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-09tcp: inline tcp_chrono_start()Eric Dumazet1-24/+0
tcp_chrono_start() is small enough, and used in TCP sendmsg() fast path (from tcp_skb_entail()). Note clang is already inlining it from functions in tcp_output.c. Inlining it improves performance and reduces bloat : $ scripts/bloat-o-meter -t vmlinux.old vmlinux.new add/remove: 0/2 grow/shrink: 1/0 up/down: 1/-84 (-83) Function old new delta tcp_skb_entail 280 281 +1 __pfx_tcp_chrono_start 16 - -16 tcp_chrono_start 68 - -68 Total: Before=25192434, After=25192351, chg -0.00% Note that tcp_chrono_stop() is too big. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Neal Cardwell <ncardwell@google.com> Link: https://patch.msgid.link/20260308123549.2924460-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09tcp: avoid dst->ops->check() call in tcp_v{4,6}_do_rcv()Eric Dumazet1-1/+1
If incoming skb dst matches the socket cached one, there is no need to call again dst->ops->check(). Network layer already validated the skb dst for us, usually from tcp_v{4,6}_early_demux(). Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260306154322.1086539-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09tcp: move tcp_v4_early_demux() to net/ipv4/ip_input.cEric Dumazet2-38/+39
tcp_v4_early_demux() has a single caller : ip_rcv_finish_core(). Move it to net/ipv4/ip_input.c and mark it static, for possible compiler/linker optimizations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260306131130.654991-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09tcp: remove unused hash_size from struct tcp_out_optionsKeita Morisaki1-1/+0
hash_size is declared but never read. The MD5 path always uses a fixed size of 16, and the TCP-AO path uses tcp_ao_maclen(). This closes a 7-byte hole and reduces the struct size from 96 to 88 bytes. Suggested-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Keita Morisaki <kmta1236@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260307051619.51685-1-kmta1236@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09net: nexthop: fix percpu use-after-free in remove_nh_grp_entryMehul Rao1-3/+11
When removing a nexthop from a group, remove_nh_grp_entry() publishes the new group via rcu_assign_pointer() then immediately frees the removed entry's percpu stats with free_percpu(). However, the synchronize_net() grace period in the caller remove_nexthop_from_groups() runs after the free. RCU readers that entered before the publish still see the old group and can dereference the freed stats via nh_grp_entry_stats_inc() -> get_cpu_ptr(nhge->stats), causing a use-after-free on percpu memory. Fix by deferring the free_percpu() until after synchronize_net() in the caller. Removed entries are chained via nh_list onto a local deferred free list. After the grace period completes and all RCU readers have finished, the percpu stats are safely freed. Fixes: f4676ea74b85 ("net: nexthop: Add nexthop group entry stats") Cc: stable@vger.kernel.org Signed-off-by: Mehul Rao <mehulrao@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/20260306233821.196789-1-mehulrao@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-09net: Add SPDX ids to some source filesTim Bird6-3/+7
Add SPDX-License-Identifier lines to several source files under the network sub-directory. Work on files in the core, dns_resolver, ipv4, ipv6 and netfilter sub-dirs. Remove boilerplate and license reference text to avoid ambiguity. Rusty Russell has expressed that his contributions were intended to be GPL-2.0-or-later. Signed-off-by: Tim Bird <tim.bird@sony.com> Link: https://patch.msgid.link/20260305004724.87469-1-tim.bird@sony.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-06net: annotate data races around sk->sk_protJiayuan Chen1-2/+6
inet_sendmsg() and inet_recvmsg() access sk->sk_prot without lock_sock() or any other synchronization. sock_replace_proto() (used by sockmap), TLS and MPTCP can change sk->sk_prot under us, so these functions need READ_ONCE() to avoid load tearing. Signed-off-by: Jiayuan Chen <jiayuan.chen@linux.dev> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260304064253.16955-1-jiayuan.chen@linux.dev Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-06inet_diag: report delayed ack timer informationEric Dumazet2-6/+11
inet_sk_diag_fill() populates r->idiag_timer with the following precedence order: 1 - Retransmit timer. 4 - Probe0 timer. 2 - Keepalive timer. This patch adds a new value, last in the list, if other timers are not active. 5 - Delayed ACK timer. A corresponding iproute2 patch will follow to replace "unknown" with "delack": ESTAB 10 0 [2002:a05:6830:1f86::]:12875 [2002:a05:6830:1f85::]:50438 timer:(unknown,003ms,0) ino:152178 sk:3004 cgroup:unreachable:189 <-> skmem:(r1344,rb12780520,t0,tb262144,f2752,w0,o250,bl0,d0) ts usec_ts ... Also add the following enum in uapi/linux/inet_diag.h as suggested by David Ahern. enum { IDIAG_TIMER_OFF, IDIAG_TIMER_ON, IDIAG_TIMER_KEEPALIVE, IDIAG_TIMER_TIMEWAIT, IDIAG_TIMER_PROBE0, IDIAG_TIMER_DELACK, }; Neal Cardwell suggested to test for ICSK_ACK_TIMER: inet_csk_clear_xmit_timer() does not call sk_stop_timer() because INET_CSK_CLEAR_TIMERS is unset. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260305114829.2163276-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-06net: change sock.sk_ino and sock_i_ino() to u64Jeff Layton4-4/+4
inode->i_ino is being converted to a u64. sock.sk_ino (which caches the inode number) must also be widened to avoid truncation on 32-bit architectures where unsigned long is only 32 bits. Change sk_ino from unsigned long to u64, and update the return type of sock_i_ino() to match. Fix all format strings that print the result of sock_i_ino() (%lu -> %llu), and widen the intermediate variables and function parameters in the diag modules that were using int to hold the inode number. Note that the UAPI socket diag structures (inet_diag_msg.idiag_inode, unix_diag_msg.udiag_ino, etc.) are all __u32 and cannot be changed without breaking the ABI. The assignments to those fields will silently truncate, which is the existing behavior. Acked-by: Marc Kleine-Budde <mkl@pengutronix.de> # for net/can Signed-off-by: Jeff Layton <jlayton@kernel.org> Link: https://patch.msgid.link/20260304-iino-u64-v3-3-2257ad83d372@kernel.org Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-03-05tcp: Initialise ehash secrets during connect() and listen().Kuniyuki Iwashima1-2/+15
inet_ehashfn() and inet6_ehashfn() initialise random secrets on the first call by net_get_random_once(). While the init part is patched out using static keys, with CONFIG_STACKPROTECTOR_STRONG=y, this causes a compiler to generate a stack canary due to an automatic variable, unsigned long ___flags, in the DO_ONCE() macro being passed to __do_once_start(). With FDO, this is visible in __inet_lookup_established() and __inet6_lookup_established() too. Let's initialise the secrets by get_random_sleepable_once() in the slow paths: inet_hash() for listen(), and inet_hash_connect() and inet6_hash_connect() for connect(). Note that IPv6 listener will initialise both IPv4 & IPv6 secrets in inet_hash() for IPv4-mapped IPv6 address. With the patch, the stack size is reduced by 16 bytes (___flags + a stack canary) and NOPs for the static key go away. Before: __inet6_lookup_established() ... push %rbx sub $0x38,%rsp # stack is 56 bytes mov %edx,%ebx # sport mov %gs:0x299419f(%rip),%rax # load stack canary mov %rax,0x30(%rsp) and store it onto stack mov 0x440(%rdi),%r15 # net->ipv4.tcp_death_row.hashinfo nop 32: mov %r8d,%ebp # hnum shl $0x10,%ebp # hnum << 16 nop 3d: mov 0x70(%rsp),%r14d # sdif or %ebx,%ebp # INET_COMBINED_PORTS(sport, hnum) mov 0x11a8382(%rip),%eax # inet6_ehashfn() ... After: __inet6_lookup_established() ... push %rbx sub $0x28,%rsp # stack is 40 bytes mov 0x60(%rsp),%ebp # sdif mov %r8d,%r14d # hnum shl $0x10,%r14d # hnum << 16 or %edx,%r14d # INET_COMBINED_PORTS(sport, hnum) mov 0x440(%rdi),%rax # net->ipv4.tcp_death_row.hashinfo mov 0x1194f09(%rip),%r10d # inet6_ehashfn() ... Suggested-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260303235424.3877267-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05tcp: shrink per-packet memset in __tcp_transmit_skb()Keita Morisaki1-4/+11
Use struct_group() to group the three fields in tcp_out_options that are read unconditionally by tcp_options_write() and bpf_skops_write_hdr_opt() (mss, bpf_opt_len, num_sack_blocks), then replace the full-struct memset with a targeted memset of only that group. struct tcp_out_options is 40 bytes without MPTCP and 96 bytes with CONFIG_MPTCP=y (typical distro config). Every remaining field is either assigned before first use by tcp_established_options()/tcp_syn_options(), or gated behind its OPTION_* flag in tcp_options_write(). This memset runs on every transmitted TCP packet, so shrinking it from 96 (or 40) bytes to 4 bytes reduces per-packet overhead on the hot path. Assembly comparison (x86-64, GCC 13, CONFIG_MPTCP=y): Before: rep stos zeroing 96 bytes (5 instructions, 12 8-byte stores) After: movl $0x0 zeroing 4 bytes (1 instruction, 1 store) Also add opts->options = 0 at the top of tcp_syn_options(), which already used |= without a prior clear. tcp_established_options() already clears opts->options at its top. Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com> Signed-off-by: Keita Morisaki <kmta1236@gmail.com> Acked-by: Matthieu Baerts (NGI0) <matttbe@kernel.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260304111517.2088694-1-kmta1236@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-05Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski13-72/+77
Cross-merge networking fixes after downstream PR (net-7.0-rc3). No conflicts. Adjacent changes: net/netfilter/nft_set_rbtree.c fb7fb4016300 ("netfilter: nf_tables: clone set on flush only") 3aea466a4399 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04net: use ktime_t in struct scm_timestamping_internalEric Dumazet1-39/+22
Instead of using struct timespec64 in scm_timestamping_internal, use ktime_t, saving 24 bytes in kernel stack. This makes tcp_update_recv_tstamps() small enough to be inlined. The ktime_t -> timespec64 conversions happen after socket lock has been released in tcp_recvmsg(), and only if the application requested them. $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 0/2 grow/shrink: 5/4 up/down: 146/-277 (-131) Function old new delta tcp_zerocopy_receive 2383 2425 +42 mptcp_recvmsg 1565 1607 +42 tcp_recvmsg_locked 3797 3823 +26 put_cmsg_scm_timestamping64 131 149 +18 put_cmsg_scm_timestamping 131 149 +18 __pfx_tcp_update_recv_tstamps 16 - -16 do_tcp_getsockopt 4024 4006 -18 tcp_recv_timestamp 474 430 -44 tcp_zc_handle_leftover 417 371 -46 __sock_recv_timestamp 1087 1031 -56 tcp_update_recv_tstamps 97 - -97 Total: Before=25223788, After=25223657, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com> Link: https://patch.msgid.link/20260304012747.881644-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04tcp: move tcp_do_parse_auth_options() to net/ipv4/tcp.cEric Dumazet2-54/+54
tcp_do_parse_auth_options() fast path user is tcp_inbound_hash(). Move tcp_do_parse_auth_options() right before tcp_inbound_hash() so that it can be (auto)inlined by the compiler. As a bonus, stack canary is removed from tcp_inbound_hash(). Also use EXPORT_IPV6_MOD(tcp_do_parse_auth_options). $ scripts/bloat-o-meter -t vmlinux.0 vmlinux add/remove: 0/0 grow/shrink: 1/0 up/down: 131/0 (131) Function old new delta tcp_inbound_hash 565 696 +131 Total: Before=25223788, After=25223919, chg +0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260303191243.557245-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-04tcp: secure_seq: add back ports to TS offsetEric Dumazet3-25/+31
This reverts 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets") tcp_tw_recycle went away in 2017. Zhouyan Deng reported off-path TCP source port leakage via SYN cookie side-channel that can be fixed in multiple ways. One of them is to bring back TCP ports in TS offset randomization. As a bonus, we perform a single siphash() computation to provide both an ISN and a TS offset. Fixes: 28ee1b746f49 ("secure_seq: downgrade to per-host timestamp offsets") Reported-by: Zhouyan Deng <dengzhouyan_nwpu@163.com> Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Acked-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260302205527.1982836-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03net/tcp-md5: Fix MAC comparison to be constant-timeEric Biggers3-2/+5
To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: cfb6eeb4c860 ("[TCP]: MD5 Signature Option (RFC2385) support.") Fixes: 658ddaaf6694 ("tcp: md5: RST: getting md5 key from listener") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Link: https://patch.msgid.link/20260302203409.13388-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03net: ipv4: fix ARM64 alignment fault in multipath hash seedYung Chih Su1-2/+3
`struct sysctl_fib_multipath_hash_seed` contains two u32 fields (user_seed and mp_seed), making it an 8-byte structure with a 4-byte alignment requirement. In `fib_multipath_hash_from_keys()`, the code evaluates the entire struct atomically via `READ_ONCE()`: mp_seed = READ_ONCE(net->ipv4.sysctl_fib_multipath_hash_seed).mp_seed; While this silently works on GCC by falling back to unaligned regular loads which the ARM64 kernel tolerates, it causes a fatal kernel panic when compiled with Clang and LTO enabled. Commit e35123d83ee3 ("arm64: lto: Strengthen READ_ONCE() to acquire when CONFIG_LTO=y") strengthens `READ_ONCE()` to use Load-Acquire instructions (`ldar` / `ldapr`) to prevent compiler reordering bugs under Clang LTO. Since the macro evaluates the full 8-byte struct, Clang emits a 64-bit `ldar` instruction. ARM64 architecture strictly requires `ldar` to be naturally aligned, thus executing it on a 4-byte aligned address triggers a strict Alignment Fault (FSC = 0x21). Fix the read side by moving the `READ_ONCE()` directly to the `u32` member, which emits a safe 32-bit `ldar Wn`. Furthermore, Eric Dumazet pointed out that `WRITE_ONCE()` on the entire struct in `proc_fib_multipath_hash_set_seed()` is also flawed. Analysis shows that Clang splits this 8-byte write into two separate 32-bit `str` instructions. While this avoids an alignment fault, it destroys atomicity and exposes a tear-write vulnerability. Fix this by explicitly splitting the write into two 32-bit `WRITE_ONCE()` operations. Finally, add the missing `READ_ONCE()` when reading `user_seed` in `proc_fib_multipath_hash_seed()` to ensure proper pairing and concurrency safety. Fixes: 4ee2a8cace3f ("net: ipv4: Add a sysctl to set multipath hash seed") Signed-off-by: Yung Chih Su <yuuchihsu@gmail.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260302060247.7066-1-yuuchihsu@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-03net/tcp-ao: Fix MAC comparison to be constant-timeEric Biggers2-1/+3
To prevent timing attacks, MACs need to be compared in constant time. Use the appropriate helper function for this. Fixes: 0a3a809089eb ("net/tcp: Verify inbound TCP-AO signed segments") Cc: stable@vger.kernel.org Signed-off-by: Eric Biggers <ebiggers@kernel.org> Reviewed-by: Dmitry Safonov <0x7f454c46@gmail.com> Link: https://patch.msgid.link/20260302203600.13561-1-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02dccp Remove inet_hashinfo2_init_mod().Kuniyuki Iwashima1-26/+8
Commit c92c81df93df ("net: dccp: fix kernel crash on module load") added inet_hashinfo2_init_mod() for DCCP. Commit 22d6c9eebf2e ("net: Unexport shared functions for DCCP.") removed EXPORT_SYMBOL_GPL() it but forgot to remove the function itself. Let's remove inet_hashinfo2_init_mod(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260301063756.1581685-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Don't hold RTNL for ipmr_rtm_route().Kuniyuki Iwashima1-13/+21
ipmr_mfc_add() and ipmr_mfc_delete() are already protected by a dedicated mutex. rtm_to_ipmr_mfcc() calls __ipmr_get_table(), __dev_get_by_index(), amd ipmr_find_vif(). Once __dev_get_by_index() is converted to dev_get_by_index_rcu(), we can move the other two functions under that same RCU section and drop RTNL for ipmr_rtm_route(). Let's do that conversion and drop ASSERT_RTNL() in mr_call_mfc_notifiers(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-16-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Add dedicated mutex for mrt->{mfc_hash,mfc_cache_list}.Kuniyuki Iwashima1-6/+22
We will no longer hold RTNL for ipmr_rtm_route() to modify the MFC hash table. Only __dev_get_by_index() in rtm_to_ipmr_mfcc() is the RTNL dependant, otherwise, we just need protection for mrt->mfc_hash and mrt->mfc_cache_list. Let's add a new mutex for ipmr_mfc_add(), ipmr_mfc_delete(), and mroute_clean_tables() (setsockopt(MRT_FLUSH or MRT_DONE)). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-15-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr/ip6mr: Convert net->ipv[46].ipmr_seq to atomic_t.Kuniyuki Iwashima1-2/+2
We will no longer hold RTNL for ipmr_mfc_add() and ipmr_mfc_delete(). MFC entry can be loosely connected with VIF by its index for mrt->vif_table[] (stored in mfc_parent), but the two tables are not synchronised. i.e. Even if VIF 1 is removed, MFC for VIF 1 is not automatically removed. The only field that the MFC/VIF interfaces share is net->ipv[46].ipmr_seq, which is protected by RTNL. Adding a new mutex for both just to protect a single field is overkill. Let's convert the field to atomic_t. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-14-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Call fib_rules_unregister() without RTNL.Kuniyuki Iwashima1-2/+11
fib_rules_unregister() removes ops from net->rules_ops under spinlock, calls ops->delete() for each rule, and frees the ops. ipmr_rules_ops_template does not have ->delete(), and any operation does not require RTNL there. Let's move fib_rules_unregister() from ipmr_rules_exit_rtnl() to ipmr_net_exit(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-12-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Remove RTNL in ipmr_rules_init() and ipmr_net_init().Kuniyuki Iwashima1-10/+5
When ipmr_free_table() is called from ipmr_rules_init() or ipmr_net_init(), the netns is not yet published. Thus, no device should have been registered, and mroute_clean_tables() will not call vif_delete(), so unregister_netdevice_many() is unnecessary. unregister_netdevice_many() does nothing if the list is empty, but it requires RTNL due to the unconditional ASSERT_RTNL() at the entry of unregister_netdevice_many_notify(). Let's remove unnecessary RTNL and ASSERT_RTNL() and instead add WARN_ON_ONCE() in ipmr_free_table(). Note that we use a local list for the new WARN_ON_ONCE() because dev_kill_list passed from ipmr_rules_exit_rtnl() may have some devices when other ops->init() fails after ipmr durnig setup_net(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-11-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Convert ipmr_net_exit_batch() to ->exit_rtnl().Kuniyuki Iwashima1-18/+13
ipmr_net_ops uses ->exit_batch() to acquire RTNL only once for dying network namespaces. ipmr does not depend on the ordering of ->exit_rtnl() and ->exit_batch() of other pernet_operations (unlike fib_net_ops). Once ipmr_free_table() is called and all devices are queued for destruction in ->exit_rtnl(), later during NETDEV_UNREGISTER, ipmr_device_event() will not see anything in vif table and just do nothing. Let's convert ipmr_net_exit_batch() to ->exit_rtnl(). Note that fib_rules_unregister() does not need RTNL and we will remove RTNL and unregister_netdevice_many() in ipmr_net_init(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-10-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Move unregister_netdevice_many() out of ipmr_free_table().Kuniyuki Iwashima1-8/+17
This is a prep commit to convert ipmr_net_exit_batch() to ->exit_rtnl(). Let's move unregister_netdevice_many() in ipmr_free_table() to its callers. Now ipmr_rules_exit() can do batching all tables per netns. Note that later we will remove RTNL and unregister_netdevice_many() in ipmr_rules_init(). Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-9-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Move unregister_netdevice_many() out of mroute_clean_tables().Kuniyuki Iwashima1-10/+24
This is a prep commit to convert ipmr_net_exit_batch() to ->exit_rtnl(). Let's move unregister_netdevice_many() in mroute_clean_tables() to its callers. As a bonus, mrtsock_destruct() can do batching for all tables. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-8-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Convert ipmr_rtm_dumproute() to RCU.Kuniyuki Iwashima1-9/+20
ipmr_rtm_dumproute() calls mr_table_dump() or mr_rtm_dumproute(), and mr_rtm_dumproute() finally calls mr_table_dump(). mr_table_dump() calls the passed function, _ipmr_fill_mroute(). _ipmr_fill_mroute() is a wrapper of ipmr_fill_mroute() to cast struct mr_mfc * to struct mfc_cache *. ipmr_fill_mroute() can be already called safely under RCU. Let's convert ipmr_rtm_dumproute() to RCU. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-7-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Convert ipmr_rtm_getroute() to RCU.Kuniyuki Iwashima2-18/+20
ipmr_rtm_getroute() calls __ipmr_get_table(), ipmr_cache_find(), and ipmr_fill_mroute(). The table is not removed until netns dismantle, and net->ipv4.mr_tables is managed with RCU list API, so __ipmr_get_table() is safe under RCU. struct mfc_cache is freed by mr_cache_put() after RCU grace period, so we can use ipmr_cache_find() under RCU. rcu_read_lock() around it was just to avoid lockdep splat for rhl_for_each_entry_rcu(). ipmr_fill_mroute() calls mr_fill_mroute(), which properly uses RCU. Let's drop RTNL for ipmr_rtm_getroute() and use RCU instead. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-6-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Use MAXVIFS in mroute_msgsize().Kuniyuki Iwashima1-5/+4
mroute_msgsize() calculates skb size needed for ipmr_fill_mroute(). The size differs based on mrt->maxvif. We will drop RTNL for ipmr_rtm_getroute() and mrt->maxvif may change under RCU. To avoid -EMSGSIZE, let's calculate the size with the maximum value of mrt->maxvif, MAXVIFS. struct rtnexthop is 8 bytes and MAXVIFS is 32, so the maximum delta is 256 bytes, which is small enough. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-5-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Convert ipmr_rtm_dumplink() to RCU.Kuniyuki Iwashima1-12/+17
net->ipv4.mr_tables is updated under RTNL and can be read safely under RCU. Once created, the multicast route tables are not removed until netns dismantle. ipmr_rtm_dumplink() does not need RTNL protection for ipmr_for_each_table() and ipmr_fill_table() if RCU is held. Even if mrt->maxvif changes concurrently, ipmr_fill_vif() returns true to continue dumping the next table. Let's convert it to RCU. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-4-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02ipmr: Annotate access to mrt->mroute_do_{pim,assert,wrvifwhole}.Kuniyuki Iwashima1-10/+10
These fields in struct mr_table are updated in ip_mroute_setsockopt() under RTNL: * mroute_do_pim * mroute_do_assert * mroute_do_wrvifwhole However, ip_mroute_getsockopt() does not hold RTNL and read the first two fields locklessly, and ip_mr_forward() reads all the three under RCU. pim_rcv_v1() also reads mroute_do_pim locklessly. Let's use WRITE_ONCE() and READ_ONCE() for them. Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260228221800.1082070-3-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-02net: remove addr_len argument of recvmsg() handlersEric Dumazet9-48/+38
Use msg->msg_namelen as a place holder instead of a temporary variable, notably in inet[6]_recvmsg(). This removes stack canaries and allows tail-calls. $ scripts/bloat-o-meter -t vmlinux.old vmlinux add/remove: 0/0 grow/shrink: 2/19 up/down: 26/-532 (-506) Function old new delta rawv6_recvmsg 744 767 +23 vsock_dgram_recvmsg 55 58 +3 vsock_connectible_recvmsg 50 47 -3 unix_stream_recvmsg 161 158 -3 unix_seqpacket_recvmsg 62 59 -3 unix_dgram_recvmsg 42 39 -3 tcp_recvmsg 546 543 -3 mptcp_recvmsg 1568 1565 -3 ping_recvmsg 806 800 -6 tcp_bpf_recvmsg_parser 983 974 -9 ip_recv_error 588 576 -12 ipv6_recv_rxpmtu 442 428 -14 udp_recvmsg 1243 1224 -19 ipv6_recv_error 1046 1024 -22 udpv6_recvmsg 1487 1461 -26 raw_recvmsg 465 437 -28 udp_bpf_recvmsg 1027 984 -43 sock_common_recvmsg 103 27 -76 inet_recvmsg 257 175 -82 inet6_recvmsg 257 175 -82 tcp_bpf_recvmsg 663 568 -95 Total: Before=25143834, After=25143328, chg -0.00% Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://patch.msgid.link/20260227151120.1346573-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28icmp: fix ICMP error source address when xfrm policy matchesAntony Antony1-1/+0
When an IPsec gateway generates an ICMP error (e.g., Destination Host Unreachable), the source address incorrectly shows the unreachable destination instead of the gateway's address. IPv6 behaves correctly. Before fix: ping 10.1.6.3 From 10.1.6.3 icmp_seq=1 Destination Host Unreachable (wrong - 10.1.6.3 is the unreachable host) After fix: ping 10.1.6.3 From 10.1.5.2 icmp_seq=1 Destination Host Unreachable (correct - 10.1.5.2 is the gateway) The fix removes the memcpy that overwrote fl4 with fl4_dec after xfrm_lookup(). A follow-up commit adds a selftest. Fixes: 415b3334a21a ("icmp: Fix regression in nexthop resolution during replies.") Cc: stable+noautosel@kernel.org # Avoid false positives in tests Signed-off-by: Antony Antony <antony.antony@secunet.com> Acked-by: Tobias Brunner <tobias@strongswan.org> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://patch.msgid.link/19a0156ff6e76baa323a81d710510d399a6ff63a.1772101380.git.antony.antony@secunet.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28tcp: give up on stronger sk_rcvbuf checks (for now)Jakub Kicinski1-15/+1
We hit another corner case which leads to TcpExtTCPRcvQDrop Connections which send RPCs in the 20-80kB range over loopback experience spurious drops. The exact conditions for most of the drops I investigated are that: - socket exchanged >1MB of data so its not completely fresh - rcvbuf is around 128kB (default, hasn't grown) - there is ~60kB of data in rcvq - skb > 64kB arrives The sum of skb->len (!) of both of the skbs (the one already in rcvq and the arriving one) is larger than rwnd. My suspicion is that this happens because __tcp_select_window() rounds the rwnd up to (1 << wscale) if less than half of the rwnd has been consumed. Eric suggests that given the number of Fixes we already have pointing to 1d2fbaad7cd8 it's probably time to give up on it, until a bigger revamp of rmem management. Also while we could risk tweaking the rwnd math, there are other drops on workloads I investigated, after the commit in question, not explained by this phenomenon. Suggested-by: Eric Dumazet <edumazet@google.com> Link: https://lore.kernel.org/20260225122355.585fd57b@kernel.org Fixes: 1d2fbaad7cd8 ("tcp: stronger sk_rcvbuf checks") Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260227003359.2391017-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-28udp: Unhash auto-bound connected sk from 4-tuple hash table when disconnected.Kuniyuki Iwashima1-10/+15
Let's say we bind() an UDP socket to the wildcard address with a non-zero port, connect() it to an address, and disconnect it from the address. bind() sets SOCK_BINDPORT_LOCK on sk->sk_userlocks (but not SOCK_BINDADDR_LOCK), and connect() calls udp_lib_hash4() to put the socket into the 4-tuple hash table. Then, __udp_disconnect() calls sk->sk_prot->rehash(sk). It computes a new hash based on the wildcard address and moves the socket to a new slot in the 4-tuple hash table, leaving a garbage in the chain that no packet hits. Let's remove such a socket from 4-tuple hash table when disconnected. Note that udp_sk(sk)->udp_portaddr_hash needs to be udpated after udp_hash4_dec(hslot2) in udp_unhash4(). Fixes: 78c91ae2c6de ("ipv4/udp: Add 4-tuple hash for connected socket") Signed-off-by: Kuniyuki Iwashima <kuniyu@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260227035547.3321327-1-kuniyu@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-27inet: annotate data-races around isk->inet_numEric Dumazet2-5/+5
UDP/TCP lookups are using RCU, thus isk->inet_num accesses should use READ_ONCE() and WRITE_ONCE() where needed. Fixes: 3ab5aee7fe84 ("net: Convert TCP & DCCP hash tables to use RCU / hlist_nulls") Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260225203545.1512417-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26net: annotate data-races around sk->sk_{data_ready,write_space}Eric Dumazet6-12/+14
skmsg (and probably other layers) are changing these pointers while other cpus might read them concurrently. Add corresponding READ_ONCE()/WRITE_ONCE() annotations for UDP, TCP and AF_UNIX. Fixes: 604326b41a6f ("bpf, sockmap: convert to generic sk_msg interface") Reported-by: syzbot+87f770387a9e5dc6b79b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/699ee9fc.050a0220.1cd54b.0009.GAE@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: John Fastabend <john.fastabend@gmail.com> Cc: Jakub Sitnicki <jakub@cloudflare.com> Cc: Willem de Bruijn <willemdebruijn.kernel@gmail.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20260225131547.1085509-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26inet: remove three EXPORT_SYMBOL()Eric Dumazet1-3/+1
inet_rcv_saddr_equal() and inet_csk_listen_stop() are not used from any modules. inet_csk_accept() can use EXPORT_IPV6_MOD() Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260225134023.1176738-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski29-74/+81
Cross-merge networking fixes after downstream PR (net-7.0-rc2). Conflicts: tools/testing/selftests/drivers/net/hw/rss_ctx.py 19c3a2a81d2b ("selftests: drv-net: rss: Generate unique ports for RSS context tests") ce5a0f4612db ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up") include/net/inet_connection_sock.h 858d2a4f67ff6 ("tcp: fix potential race in tcp_v6_syn_recv_sock()") fcd3d039fab69 ("tcp: make tcp_v{4,6}_send_check() static") https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c 69050f8d6d075 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") bf4afc53b77ae ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") 8a96b9144f18a ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()") Adjacent changes: net/netfilter/ipvs/ip_vs_ctl.c c59bd9e62e06 ("ipvs: use more counters to avoid service lookups") bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26Merge tag 'net-7.0-rc2' of ↵Linus Torvalds6-10/+25
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-25netfilter: nf_tables: remove register tracking infrastructureFlorian Westphal3-4/+0
This facility was disabled in commit 9e539c5b6d9c ("netfilter: nf_tables: disable expression reduction infra"), because not all nft_exprs guarantee they will update the destination register: some may set NFT_BREAK instead to cancel evaluation of the rule. This has been dead code ever since. There are no plans to salvage this at this time, so remove this. Signed-off-by: Florian Westphal <fw@strlen.de> Link: https://patch.msgid.link/20260224205048.4718-10-fw@strlen.de Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-25tcp: re-enable acceptance of FIN packets when RWIN is 0Simon Baatz1-4/+14