aboutsummaryrefslogtreecommitdiff
path: root/net/netfilter/ipvs
AgeCommit message (Collapse)AuthorFilesLines
2025-12-10ipvs: fix ipv4 null-ptr-deref in route error pathSlavin Liu1-0/+3
The IPv4 code path in __ip_vs_get_out_rt() calls dst_link_failure() without ensuring skb->dev is set, leading to a NULL pointer dereference in fib_compute_spec_dst() when ipv4_link_failure() attempts to send ICMP destination unreachable messages. The issue emerged after commit ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") started calling __ip_options_compile() from ipv4_link_failure(). This code path eventually calls fib_compute_spec_dst() which dereferences skb->dev. An attempt was made to fix the NULL skb->dev dereference in commit 0113d9c9d1cc ("ipv4: fix null-deref in ipv4_link_failure"), but it only addressed the immediate dev_net(skb->dev) dereference by using a fallback device. The fix was incomplete because fib_compute_spec_dst() later in the call chain still accesses skb->dev directly, which remains NULL when IPVS calls dst_link_failure(). The crash occurs when: 1. IPVS processes a packet in NAT mode with a misconfigured destination 2. Route lookup fails in __ip_vs_get_out_rt() before establishing a route 3. The error path calls dst_link_failure(skb) with skb->dev == NULL 4. ipv4_link_failure() → ipv4_send_dest_unreach() → __ip_options_compile() → fib_compute_spec_dst() 5. fib_compute_spec_dst() dereferences NULL skb->dev Apply the same fix used for IPv6 in commit 326bf17ea5d4 ("ipvs: fix ipv6 route unreach panic"): set skb->dev from skb_dst(skb)->dev before calling dst_link_failure(). KASAN: null-ptr-deref in range [0x0000000000000328-0x000000000000032f] CPU: 1 PID: 12732 Comm: syz.1.3469 Not tainted 6.6.114 #2 RIP: 0010:__in_dev_get_rcu include/linux/inetdevice.h:233 RIP: 0010:fib_compute_spec_dst+0x17a/0x9f0 net/ipv4/fib_frontend.c:285 Call Trace: <TASK> spec_dst_fill net/ipv4/ip_options.c:232 spec_dst_fill net/ipv4/ip_options.c:229 __ip_options_compile+0x13a1/0x17d0 net/ipv4/ip_options.c:330 ipv4_send_dest_unreach net/ipv4/route.c:1252 ipv4_link_failure+0x702/0xb80 net/ipv4/route.c:1265 dst_link_failure include/net/dst.h:437 __ip_vs_get_out_rt+0x15fd/0x19e0 net/netfilter/ipvs/ip_vs_xmit.c:412 ip_vs_nat_xmit+0x1d8/0xc80 net/netfilter/ipvs/ip_vs_xmit.c:764 Fixes: ed0de45a1008 ("ipv4: recompile ip options in ipv4_link_failure") Signed-off-by: Slavin Liu <slavin452@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-11-28net: Remove KMSG_COMPONENT macroHeiko Carstens30-60/+30
The KMSG_COMPONENT macro is a leftover of the s390 specific "kernel message catalog" from 2008 [1] which never made it upstream. The macro was added to s390 code to allow for an out-of-tree patch which used this to generate unique message ids. Also this out-of-tree patch doesn't exist anymore. The pattern of how the KMSG_COMPONENT macro is used can also be found at some non s390 specific code, for whatever reasons. Besides adding an indirection it is unused. Remove the macro in order to get rid of a pointless indirection. Replace all users with the string it defines. In all cases this leads to a simple replacement like this: - #define KMSG_COMPONENT "af_iucv" - #define pr_fmt(fmt) KMSG_COMPONENT ": " fmt + #define pr_fmt(fmt) "af_iucv: " fmt [1] https://lwn.net/Articles/292650/ Signed-off-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Alexandra Winter <wintera@linux.ibm.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Sidraya Jayagond <sidraya@linux.ibm.com> Link: https://patch.msgid.link/20251126140705.1944278-1-hca@linux.ibm.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops connect() callbacks to use sockaddr_unsizedKees Cook1-1/+1
Update all struct proto_ops connect() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-3-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-04net: Convert proto_ops bind() callbacks to use sockaddr_unsizedKees Cook1-2/+2
Update all struct proto_ops bind() callback function prototypes from "struct sockaddr *" to "struct sockaddr_unsized *" to avoid lying to the compiler about object sizes. Calls into struct proto handlers gain casts that will be removed in the struct proto conversion patch. No binary changes expected. Signed-off-by: Kees Cook <kees@kernel.org> Link: https://patch.msgid.link/20251104002617.2752303-2-kees@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-09-24ipvs: Defer ip_vs_ftp unregister during netns cleanupSlavin Liu1-1/+3
On the netns cleanup path, __ip_vs_ftp_exit() may unregister ip_vs_ftp before connections with valid cp->app pointers are flushed, leading to a use-after-free. Fix this by introducing a global `exiting_module` flag, set to true in ip_vs_ftp_exit() before unregistering the pernet subsystem. In __ip_vs_ftp_exit(), skip ip_vs_ftp unregister if called during netns cleanup (when exiting_module is false) and defer it to __ip_vs_cleanup_batch(), which unregisters all apps after all connections are flushed. If called during module exit, unregister ip_vs_ftp immediately. Fixes: 61b1ab4583e2 ("IPVS: netns, add basic init per netns.") Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Slavin Liu <slavin452@gmail.com> Signed-off-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-09-11ipvs: Use READ_ONCE/WRITE_ONCE for ipvs->enableZhang Tengfei4-20/+17
KCSAN reported a data-race on the `ipvs->enable` flag, which is written in the control path and read concurrently from many other contexts. Following a suggestion by Julian, this patch fixes the race by converting all accesses to use `WRITE_ONCE()/READ_ONCE()`. This lightweight approach ensures atomic access and acts as a compiler barrier, preventing unsafe optimizations where the flag is checked in loops (e.g., in ip_vs_est.c). Additionally, the `enable` checks in the fast-path hooks (`ip_vs_in_hook`, `ip_vs_out_hook`, `ip_vs_forward_icmp`) are removed. These are unnecessary since commit 857ca89711de ("ipvs: register hooks only with services"). The `enable=0` condition they check for can only occur in two rare and non-fatal scenarios: 1) after hooks are registered but before the flag is set, and 2) after hooks are unregistered on cleanup_net. In the worst case, a single packet might be mishandled (e.g., dropped), which does not lead to a system crash or data corruption. Adding a check in the performance-critical fast-path to handle this harmless condition is not a worthwhile trade-off. Fixes: 857ca89711de ("ipvs: register hooks only with services") Reported-by: syzbot+1651b5234028c294c339@syzkaller.appspotmail.com Closes: https://syzkaller.appspot.com/bug?extid=1651b5234028c294c339 Suggested-by: Julian Anastasov <ja@ssi.bg> Link: https://lore.kernel.org/lvs-devel/2189fc62-e51e-78c9-d1de-d35b8e3657e3@ssi.bg/ Signed-off-by: Zhang Tengfei <zhtfdev@gmail.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-08-13ipvs: Fix estimator kthreads preferred affinityFrederic Weisbecker1-1/+2
The estimator kthreads' affinity are defined by sysctl overwritten preferences and applied through a plain call to the scheduler's affinity API. However since the introduction of managed kthreads preferred affinity, such a practice shortcuts the kthreads core code which eventually overwrites the target to the default unbound affinity. Fix this with using the appropriate kthread's API. Fixes: d1a89197589c ("kthread: Default affine kthread to its preferred NUMA node") Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Florian Westphal <fw@strlen.de>
2025-07-25ipvs: Rename del_timer in comment in ip_vs_conn_expire_now()WangYuli1-1/+1
Commit 8fa7292fee5c ("treewide: Switch/rename to timer_delete[_sync]()") switched del_timer to timer_delete, but did not modify the comment for ip_vs_conn_expire_now(). Now fix it. Signed-off-by: WangYuli <wangyuli@uniontech.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-07-02net: dst: annotate data-races around dst->obsoleteEric Dumazet1-1/+1
(dst_entry)->obsolete is read locklessly, add corresponding annotations. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@google.com> Link: https://patch.msgid.link/20250630121934.3399505-2-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-06-08treewide, timers: Rename from_timer() to timer_container_of()Ingo Molnar4-4/+7
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-05-21sctp: use skb_crc32c() instead of __skb_checksum()Eric Biggers1-1/+1
Make sctp_compute_cksum() just use the new function skb_crc32c(), instead of calling __skb_checksum() with a skb_checksum_ops struct that does CRC32C. This is faster and simpler. Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://patch.msgid.link/20250519175012.36581-6-ebiggers@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-05-07ipvs: fix uninit-value for saddr in do_output_route4Julian Anastasov1-19/+8
syzbot reports for uninit-value for the saddr argument [1]. commit 4754957f04f5 ("ipvs: do not use random local source address for tunnels") already implies that the input value of saddr should be ignored but the code is still reading it which can prevent to connect the route. Fix it by changing the argument to ret_saddr. [1] BUG: KMSAN: uninit-value in do_output_route4+0x42c/0x4d0 net/netfilter/ipvs/ip_vs_xmit.c:147 do_output_route4+0x42c/0x4d0 net/netfilter/ipvs/ip_vs_xmit.c:147 __ip_vs_get_out_rt+0x403/0x21d0 net/netfilter/ipvs/ip_vs_xmit.c:330 ip_vs_tunnel_xmit+0x205/0x2380 net/netfilter/ipvs/ip_vs_xmit.c:1136 ip_vs_in_hook+0x1aa5/0x35b0 net/netfilter/ipvs/ip_vs_core.c:2063 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf7/0x400 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] __ip_local_out+0x758/0x7e0 net/ipv4/ip_output.c:118 ip_local_out net/ipv4/ip_output.c:127 [inline] ip_send_skb+0x6a/0x3c0 net/ipv4/ip_output.c:1501 udp_send_skb+0xfda/0x1b70 net/ipv4/udp.c:1195 udp_sendmsg+0x2fe3/0x33c0 net/ipv4/udp.c:1483 inet_sendmsg+0x1fc/0x280 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg+0x267/0x380 net/socket.c:727 ____sys_sendmsg+0x91b/0xda0 net/socket.c:2566 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2620 __sys_sendmmsg+0x41d/0x880 net/socket.c:2702 __compat_sys_sendmmsg net/compat.c:360 [inline] __do_compat_sys_sendmmsg net/compat.c:367 [inline] __se_compat_sys_sendmmsg net/compat.c:364 [inline] __ia32_compat_sys_sendmmsg+0xc8/0x140 net/compat.c:364 ia32_sys_call+0x3ffa/0x41f0 arch/x86/include/generated/asm/syscalls_32.h:346 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline] __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/syscall_32.c:306 do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:369 entry_SYSENTER_compat_after_hwframe+0x84/0x8e Uninit was created at: slab_post_alloc_hook mm/slub.c:4167 [inline] slab_alloc_node mm/slub.c:4210 [inline] __kmalloc_cache_noprof+0x8fa/0xe00 mm/slub.c:4367 kmalloc_noprof include/linux/slab.h:905 [inline] ip_vs_dest_dst_alloc net/netfilter/ipvs/ip_vs_xmit.c:61 [inline] __ip_vs_get_out_rt+0x35d/0x21d0 net/netfilter/ipvs/ip_vs_xmit.c:323 ip_vs_tunnel_xmit+0x205/0x2380 net/netfilter/ipvs/ip_vs_xmit.c:1136 ip_vs_in_hook+0x1aa5/0x35b0 net/netfilter/ipvs/ip_vs_core.c:2063 nf_hook_entry_hookfn include/linux/netfilter.h:154 [inline] nf_hook_slow+0xf7/0x400 net/netfilter/core.c:626 nf_hook include/linux/netfilter.h:269 [inline] __ip_local_out+0x758/0x7e0 net/ipv4/ip_output.c:118 ip_local_out net/ipv4/ip_output.c:127 [inline] ip_send_skb+0x6a/0x3c0 net/ipv4/ip_output.c:1501 udp_send_skb+0xfda/0x1b70 net/ipv4/udp.c:1195 udp_sendmsg+0x2fe3/0x33c0 net/ipv4/udp.c:1483 inet_sendmsg+0x1fc/0x280 net/ipv4/af_inet.c:851 sock_sendmsg_nosec net/socket.c:712 [inline] __sock_sendmsg+0x267/0x380 net/socket.c:727 ____sys_sendmsg+0x91b/0xda0 net/socket.c:2566 ___sys_sendmsg+0x28d/0x3c0 net/socket.c:2620 __sys_sendmmsg+0x41d/0x880 net/socket.c:2702 __compat_sys_sendmmsg net/compat.c:360 [inline] __do_compat_sys_sendmmsg net/compat.c:367 [inline] __se_compat_sys_sendmmsg net/compat.c:364 [inline] __ia32_compat_sys_sendmmsg+0xc8/0x140 net/compat.c:364 ia32_sys_call+0x3ffa/0x41f0 arch/x86/include/generated/asm/syscalls_32.h:346 do_syscall_32_irqs_on arch/x86/entry/syscall_32.c:83 [inline] __do_fast_syscall_32+0xb0/0x110 arch/x86/entry/syscall_32.c:306 do_fast_syscall_32+0x38/0x80 arch/x86/entry/syscall_32.c:331 do_SYSENTER_32+0x1f/0x30 arch/x86/entry/syscall_32.c:369 entry_SYSENTER_compat_after_hwframe+0x84/0x8e CPU: 0 UID: 0 PID: 22408 Comm: syz.4.5165 Not tainted 6.15.0-rc3-syzkaller-00019-gbc3372351d0c #0 PREEMPT(undef) Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/12/2025 Reported-by: syzbot+04b9a82855c8aed20860@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/68138dfa.050a0220.14dd7d.0017.GAE@google.com/ Fixes: 4754957f04f5 ("ipvs: do not use random local source address for tunnels") Signed-off-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-04-08Merge tag 'crc-for-linus' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux Pull CRC cleanups from Eric Biggers: "Finish cleaning up the CRC kconfig options by removing the remaining unnecessary prompts and an unnecessary 'default y', removing CONFIG_LIBCRC32C, and documenting all the CRC library options" * tag 'crc-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiggers/linux: lib/crc: remove CONFIG_LIBCRC32C lib/crc: document all the CRC library kconfig options lib/crc: remove unnecessary prompt for CONFIG_CRC_ITU_T lib/crc: remove unnecessary prompt for CONFIG_CRC_T10DIF lib/crc: remove unnecessary prompt for CONFIG_CRC16 lib/crc: remove unnecessary prompt for CONFIG_CRC_CCITT lib/crc: remove unnecessary prompt for CONFIG_CRC32 and drop 'default y'
2025-04-05treewide: Switch/rename to timer_delete[_sync]()Thomas Gleixner2-4/+4
timer_delete[_sync]() replaces del_timer[_sync](). Convert the whole tree over and remove the historical wrapper inlines. Conversion was done with coccinelle plus manual fixups where necessary. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2025-04-04lib/crc: remove CONFIG_LIBCRC32CEric Biggers1-1/+1
Now that LIBCRC32C does nothing besides select CRC32, make every option that selects LIBCRC32C instead select CRC32 directly. Then remove LIBCRC32C. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com> Acked-by: Ard Biesheuvel <ardb@kernel.org> Link: https://lore.kernel.org/r/20250401221600.24878-8-ebiggers@kernel.org Signed-off-by: Eric Biggers <ebiggers@google.com>
2025-03-12ipvs: prevent integer overflow in do_ip_vs_get_ctl()Dan Carpenter1-4/+4
The get->num_services variable is an unsigned int which is controlled by the user. The struct_size() function ensures that the size calculation does not overflow an unsigned long, however, we are saving the result to an int so the calculation can overflow. Both "len" and "get->num_services" come from the user. This check is just a sanity check to help the user and ensure they are using the API correctly. An integer overflow here is not a big deal. This has no security impact. Save the result from struct_size() type size_t to fix this integer overflow bug. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2025-01-05ipvs: speed up reads from ip_vs_conn proc fileFlorian Westphal1-22/+28
Reading is very slow because ->start() performs a linear re-scan of the entire hash table until it finds the successor to the last dumped element. The current implementation uses 'pos' as the 'number of elements to skip, then does linear iteration until it has skipped 'pos' entries. Store the last bucket and the number of elements to skip in that bucket instead, so we can resume from bucket b directly. before this patch, its possible to read ~35k entries in one second, but each read() gets slower as the number of entries to skip grows: time timeout 60 cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all real 1m0.007s user 0m0.003s sys 0m59.956s 140386 /tmp/all Only ~100k more got read in remaining the remaining 59s, and did not get nowhere near the 1m entries that are stored at the time. after this patch, dump completes very quickly: time cat /proc/net/ip_vs_conn > /tmp/all; wc -l /tmp/all real 0m2.286s user 0m0.004s sys 0m2.281s 1000001 /tmp/all Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-12-18ipvs: Fix clamp() of ip_vs_conn_tab on small memory systemsDavid Laight1-2/+2
The 'max_avail' value is calculated from the system memory size using order_base_2(). order_base_2(x) is defined as '(x) ? fn(x) : 0'. The compiler generates two copies of the code that follows and then expands clamp(max, min, PAGE_SHIFT - 12) (11 on 32bit). This triggers a compile-time assert since min is 5. In reality a system would have to have less than 512MB memory for the bounds passed to clamp to be reversed. Swap the order of the arguments to clamp() to avoid the warning. Replace the clamp_val() on the line below with clamp(). clamp_val() is just 'an accident waiting to happen' and not needed here. Detected by compile time checks added to clamp(), specifically: minmax.h: use BUILD_BUG_ON_MSG() for the lo < hi test in clamp() Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Closes: https://lore.kernel.org/all/CA+G9fYsT34UkGFKxus63H6UVpYi5GRZkezT9MRLfAbM3f6ke0g@mail.gmail.com/ Fixes: 4f325e26277b ("ipvs: dynamically limit the connection hash table") Tested-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Reviewed-by: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Signed-off-by: David Laight <david.laight@aculab.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-28ipvs: fix UB due to uninitialized stack access in ip_vs_protocol_init()Jinghao Jia1-3/+1
Under certain kernel configurations when building with Clang/LLVM, the compiler does not generate a return or jump as the terminator instruction for ip_vs_protocol_init(), triggering the following objtool warning during build time: vmlinux.o: warning: objtool: ip_vs_protocol_init() falls through to next function __initstub__kmod_ip_vs_rr__935_123_ip_vs_rr_init6() At runtime, this either causes an oops when trying to load the ipvs module or a boot-time panic if ipvs is built-in. This same issue has been reported by the Intel kernel test robot previously. Digging deeper into both LLVM and the kernel code reveals this to be a undefined behavior problem. ip_vs_protocol_init() uses a on-stack buffer of 64 chars to store the registered protocol names and leaves it uninitialized after definition. The function calls strnlen() when concatenating protocol names into the buffer. With CONFIG_FORTIFY_SOURCE strnlen() performs an extra step to check whether the last byte of the input char buffer is a null character (commit 3009f891bb9f ("fortify: Allow strlen() and strnlen() to pass compile-time known lengths")). This, together with possibly other configurations, cause the following IR to be generated: define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #5 section ".init.text" align 16 !kcfi_type !29 { %1 = alloca [64 x i8], align 16 ... 14: ; preds = %11 %15 = getelementptr inbounds i8, ptr %1, i64 63 %16 = load i8, ptr %15, align 1 %17 = tail call i1 @llvm.is.constant.i8(i8 %16) %18 = icmp eq i8 %16, 0 %19 = select i1 %17, i1 %18, i1 false br i1 %19, label %20, label %23 20: ; preds = %14 %21 = call i64 @strlen(ptr noundef nonnull dereferenceable(1) %1) #23 ... 23: ; preds = %14, %11, %20 %24 = call i64 @strnlen(ptr noundef nonnull dereferenceable(1) %1, i64 noundef 64) #24 ... } The above code calculates the address of the last char in the buffer (value %15) and then loads from it (value %16). Because the buffer is never initialized, the LLVM GVN pass marks value %16 as undefined: %13 = getelementptr inbounds i8, ptr %1, i64 63 br i1 undef, label %14, label %17 This gives later passes (SCCP, in particular) more DCE opportunities by propagating the undef value further, and eventually removes everything after the load on the uninitialized stack location: define hidden i32 @ip_vs_protocol_init() local_unnamed_addr #0 section ".init.text" align 16 !kcfi_type !11 { %1 = alloca [64 x i8], align 16 ... 12: ; preds = %11 %13 = getelementptr inbounds i8, ptr %1, i64 63 unreachable } In this way, the generated native code will just fall through to the next function, as LLVM does not generate any code for the unreachable IR instruction and leaves the function without a terminator. Zero the on-stack buffer to avoid this possible UB. Fixes: 1da177e4c3f4 ("Linux-2.6.12-rc2") Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402100205.PWXIz1ZK-lkp@intel.com/ Co-developed-by: Ruowen Qin <ruqin@redhat.com> Signed-off-by: Ruowen Qin <ruqin@redhat.com> Signed-off-by: Jinghao Jia <jinghao7@illinois.edu> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-11net: convert to nla_get_*_default()Johannes Berg1-4/+1
Most of the original conversion is from the spatch below, but I edited some and left out other instances that were either buggy after conversion (where default values don't fit into the type) or just looked strange. @@ expression attr, def; expression val; identifier fn =~ "^nla_get_.*"; fresh identifier dfn = fn ## "_default"; @@ ( -if (attr) - val = fn(attr); -else - val = def; +val = dfn(attr, def); | -if (!attr) - val = def; -else - val = fn(attr); +val = dfn(attr, def); | -if (!attr) - return def; -return fn(attr); +return dfn(attr, def); | -attr ? fn(attr) : def +dfn(attr, def) | -!attr ? def : fn(attr) +dfn(attr, def) ) Signed-off-by: Johannes Berg <johannes.berg@intel.com> Reviewed-by: Toke Høiland-Jørgensen <toke@kernel.org> Link: https://patch.msgid.link/20241108114145.0580b8684e7f.I740beeaa2f70ebfc19bfca1045a24d6151992790@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-10-02move asm/unaligned.h to linux/unaligned.hAl Viro2-2/+2
asm/unaligned.h is always an include of asm-generic/unaligned.h; might as well move that thing to linux/unaligned.h and include that - there's nothing arch-specific in that header. auto-generated by the following: for i in `git grep -l -w asm/unaligned.h`; do sed -i -e "s/asm\/unaligned.h/linux\/unaligned.h/" $i done for i in `git grep -l -w asm-generic/unaligned.h`; do sed -i -e "s/asm-generic\/unaligned.h/linux\/unaligned.h/" $i done git mv include/asm-generic/unaligned.h include/linux/unaligned.h git mv tools/include/asm-generic/unaligned.h tools/include/linux/unaligned.h sed -i -e "/unaligned.h/d" include/asm-generic/Kbuild sed -i -e "s/__ASM_GENERIC/__LINUX/" include/linux/unaligned.h tools/include/linux/unaligned.h
2024-07-24sysctl: treewide: constify the ctl_table argument of proc_handlersJoel Granados1-6/+6
const qualify the struct ctl_table argument in the proc_handler function signatures. This is a prerequisite to moving the static ctl_table structs into .rodata data which will ensure that proc_handler function pointers cannot be modified. This patch has been generated by the following coccinelle script: ``` virtual patch @r1@ identifier ctl, write, buffer, lenp, ppos; identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)"; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos); @r2@ identifier func, ctl, write, buffer, lenp, ppos; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int write, void *buffer, size_t *lenp, loff_t *ppos) { ... } @r3@ identifier func; @@ int func( - struct ctl_table * + const struct ctl_table * ,int , void *, size_t *, loff_t *); @r4@ identifier func, ctl; @@ int func( - struct ctl_table *ctl + const struct ctl_table *ctl ,int , void *, size_t *, loff_t *); @r5@ identifier func, write, buffer, lenp, ppos; @@ int func( - struct ctl_table * + const struct ctl_table * ,int write, void *buffer, size_t *lenp, loff_t *ppos); ``` * Code formatting was adjusted in xfs_sysctl.c to comply with code conventions. The xfs_stats_clear_proc_handler, xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where adjusted. * The ctl_table argument in proc_watchdog_common was const qualified. This is called from a proc_handler itself and is calling back into another proc_handler, making it necessary to change it as part of the proc_handler migration. Co-developed-by: Thomas Weißschuh <linux@weissschuh.net> Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Co-developed-by: Joel Granados <j.granados@samsung.com> Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-17ipvs: properly dereference pe in ip_vs_add_serviceChen Hanxiao1-5/+5
Use pe directly to resolve sparse warning: net/netfilter/ipvs/ip_vs_ctl.c:1471:27: warning: dereference of noderef expression Fixes: 39b972231536 ("ipvs: handle connections started by real-servers") Signed-off-by: Chen Hanxiao <chenhx.fnst@fujitsu.com> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-06-26ipvs: Avoid unnecessary calls to skb_is_gso_sctpIsmael Luceno1-2/+2
In the context of the SCTP SNAT/DNAT handler, these calls can only return true. Fixes: e10d3ba4d434 ("ipvs: Fix checksumming on GSO of SCTP packets") Signed-off-by: Ismael Luceno <iluceno@suse.de> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-05-28ipvs: constify ctl_table arguments of utility functionsThomas Weißschuh1-3/+4
The sysctl core is preparing to only expose instances of struct ctl_table as "const". This will also affect the ctl_table argument of sysctl handlers. As the function prototype of all sysctl handlers throughout the tree needs to stay consistent that change will be done in one commit. To reduce the size of that final commit, switch utility functions which are not bound by "typedef proc_handler" to "const struct ctl_table". No functional change. Signed-off-by: Thomas Weißschuh <linux@weissschuh.net> Link: https://lore.kernel.org/r/20240527-sysctl-const-handler-net-v1-5-16523767d0b2@weissschuh.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-05-08ipvs: allow some sysctls in non-init user namespacesAlexander Mikhalitsyn1-4/+15
Let's make all IPVS sysctls writtable even when network namespace is owned by non-initial user namespace. Let's make a few sysctls to be read-only for non-privileged users: - sync_qlen_max - sync_sock_size - run_estimation - est_cpulist - est_nice I'm trying to be conservative with this to prevent introducing any security issues in there. Maybe, we can allow more sysctls to be writable, but let's do this on-demand and when we see real use-case. This patch is motivated by user request in the LXC project [1]. Having this can help with running some Kubernetes [2] or Docker Swarm [3] workloads inside the system containers. Link: https://github.com/lxc/lxc/issues/4278 [1] Link: https://github.com/kubernetes/kubernetes/blob/b722d017a34b300a2284b890448e5a605f21d01e/pkg/proxy/ipvs/proxier.go#L103 [2] Link: https://github.com/moby/libnetwork/blob/3797618f9a38372e8107d8c06f6ae199e1133ae8/osl/namespace_linux.go#L682 [3] Cc: Julian Anastasov <ja@ssi.bg> Cc: Simon Horman <horms@verge.net.au> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-08ipvs: add READ_ONCE barrier for ipvs->sysctl_amemthreshAlexander Mikhalitsyn1-7/+7
Cc: Julian Anastasov <ja@ssi.bg> Cc: Simon Horman <horms@verge.net.au> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: Jozsef Kadlecsik <kadlec@netfilter.org> Cc: Florian Westphal <fw@strlen.de> Suggested-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Alexander Mikhalitsyn <aleksandr.mikhalitsyn@canonical.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-05-03netfilter: Remove the now superfluous sentinel elements from ctl_table arrayJoel Granados3-12/+3
This commit comes at the tail end of a greater effort to remove the empty elements at the end of the ctl_table arrays (sentinels) which will reduce the overall build time size of the kernel and run time memory bloat by ~64 bytes per sentinel (further information Link : https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/) * Remove sentinel elements from ctl_table structs * Remove instances where an array element is zeroed out to make it look like a sentinel. This is not longer needed and is safe after commit c899710fe7f9 ("networking: Update to register_net_sysctl_sz") added the array size to the ctl_table registration * Remove the need for having __NF_SYSCTL_CT_LAST_SYSCTL as the sysctl array size is now in NF_SYSCTL_CT_LAST_SYSCTL * Remove extra element in ctl_table arrays declarations Acked-by: Kees Cook <keescook@chromium.org> # loadpin & yama Signed-off-by: Joel Granados <j.granados@samsung.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-30inet: introduce dst_rtable() helperEric Dumazet1-1/+1
I added dst_rt6_info() in commit e8dfd42c17fa ("ipv6: introduce dst_rt6_info() helper") This patch does a similar change for IPv4. Instead of (struct rtable *)dst casts, we can use : #define dst_rtable(_ptr) \ container_of_const(_ptr, struct rtable, dst) Patch is smaller than IPv6 one, because IPv4 has skb_rtable() helper. Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Reviewed-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/20240429133009.1227754-1-edumazet@google.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-29ipv6: introduce dst_rt6_info() helperEric Dumazet1-7/+7
Instead of (struct rt6_info *)dst casts, we can use : #define dst_rt6_info(_ptr) \ container_of_const(_ptr, struct rt6_info, dst) Some places needed missing const qualifiers : ip6_confirm_neigh(), ipv6_anycast_destination(), ipv6_unicast_destination(), has_gateway() v2: added missing parts (David Ahern) Signed-off-by: Eric Dumazet <edumazet@google.com> Reviewed-by: David Ahern <dsahern@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-04-25Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-2/+4
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/ti/icssg/icssg_prueth.c net/mac80211/chan.c 89884459a0b9 ("wifi: mac80211: fix idle calculation with multi-link") 87f5500285fb ("wifi: mac80211: simplify ieee80211_assign_link_chanctx()") https://lore.kernel.org/all/20240422105623.7b1fbda2@canb.auug.org.au/ net/unix/garbage.c 1971d13ffa84 ("af_unix: Suppress false-positive lockdep splat for spin_lock() in __unix_gc().") 4090fa373f0e ("af_unix: Replace garbage collection algorithm.") drivers/net/ethernet/ti/icssg/icssg_prueth.c drivers/net/ethernet/ti/icssg/icssg_common.c 4dcd0e83ea1d ("net: ti: icssg-prueth: Fix signedness bug in prueth_init_rx_chns()") e2dc7bfd677f ("net: ti: icssg-prueth: Move common functions into a separate file") No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-04-25ipvs: Fix checksumming on GSO of SCTP packetsIsmael Luceno1-2/+4
It was observed in the wild that pairs of consecutive packets would leave the IPVS with the same wrong checksum, and the issue only went away when disabling GSO. IPVS needs to avoid computing the SCTP checksum when using GSO. Fixes: 90017accff61 ("sctp: Add GSO support") Co-developed-by: Firo Yang <firo.yang@suse.com> Signed-off-by: Ismael Luceno <iluceno@suse.de> Tested-by: Andreas Taschner <andreas.taschner@suse.com> Acked-by: Julian Anastasov <ja@ssi.bg> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-04-01ip_tunnel: convert __be16 tunnel flags to bitmapsAlexander Lobakin2-11/+15
Historically, tunnel flags like TUNNEL_CSUM or TUNNEL_ERSPAN_OPT have been defined as __be16. Now all of those 16 bits are occupied and there's no more free space for new flags. It can't be simply switched to a bigger container with no adjustments to the values, since it's an explicit Endian storage, and on LE systems (__be16)0x0001 equals to (__be64)0x0001000000000000. We could probably define new 64-bit flags depending on the Endianness, i.e. (__be64)0x0001 on BE and (__be64)0x00010000... on LE, but that would introduce an Endianness dependency and spawn a ton of Sparse warnings. To mitigate them, all of those places which were adjusted with this change would be touched anyway, so why not define stuff properly if there's no choice. Define IP_TUNNEL_*_BIT counterparts as a bit number instead of the value already coded and a fistful of <16 <-> bitmap> converters and helpers. The two flags which have a different bit position are SIT_ISATAP_BIT and VTI_ISVTI_BIT, as they were defined not as __cpu_to_be16(), but as (__force __be16), i.e. had different positions on LE and BE. Now they both have strongly defined places. Change all __be16 fields which were used to store those flags, to IP_TUNNEL_DECLARE_FLAGS() -> DECLARE_BITMAP(__IP_TUNNEL_FLAG_NUM) -> unsigned long[1] for now, and replace all TUNNEL_* occurrences to their bitmap counterparts. Use the converters in the places which talk to the userspace, hardware (NFP) or other hosts (GRE header). The rest must explicitly use the new flags only. This must be done at once, otherwise there will be too many conversions throughout the code in the intermediate commits. Finally, disable the old __be16 flags for use in the kernel code (except for the two 'irregular' flags mentioned above), to prevent any accidental (mis)use of them. For the userspace, nothing is changed, only additions were made. Most noticeable bloat-o-meter difference (.text): vmlinux: 307/-1 (306) gre.ko: 62/0 (62) ip_gre.ko: 941/-217 (724) [*] ip_tunnel.ko: 390/-900 (-510) [**] ip_vti.ko: 138/0 (138) ip6_gre.ko: 534/-18 (516) [*] ip6_tunnel.ko: 118/-10 (108) [*] gre_flags_to_tnl_flags() grew, but still is inlined [**] ip_tunnel_find() got uninlined, hence such decrease The average code size increase in non-extreme case is 100-200 bytes per module, mostly due to sizeof(long) > sizeof(__be16), as %__IP_TUNNEL_FLAG_NUM is less than %BITS_PER_LONG and the compilers are able to expand the majority of bitmap_*() calls here into direct operations on scalars. Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Alexander Lobakin <aleksander.lobakin@intel.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-29ipvs: Simplify the allocation of ip_vs_conn slab cachesKunwu Chan1-3/+1
Use the new KMEM_CACHE() macro instead of direct kmem_cache_create to simplify the creation of SLAB caches. Signed-off-by: Kunwu Chan <chentao@kylinos.cn> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Florian Westphal <fw@strlen.de>
2024-01-17ipvs: avoid stat macros calls from preemptible contextFedor Pchelkin1-2/+2
Inside decrement_ttl() upon discovering that the packet ttl has exceeded, __IP_INC_STATS and __IP6_INC_STATS macros can be called from preemptible context having the following backtrace: check_preemption_disabled: 48 callbacks suppressed BUG: using __this_cpu_add() in preemptible [00000000] code: curl/1177 caller is decrement_ttl+0x217/0x830 CPU: 5 PID: 1177 Comm: curl Not tainted 6.7.0+ #34 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0xbd/0xe0 check_preemption_disabled+0xd1/0xe0 decrement_ttl+0x217/0x830 __ip_vs_get_out_rt+0x4e0/0x1ef0 ip_vs_nat_xmit+0x205/0xcd0 ip_vs_in_hook+0x9b1/0x26a0 nf_hook_slow+0xc2/0x210 nf_hook+0x1fb/0x770 __ip_local_out+0x33b/0x640 ip_local_out+0x2a/0x490 __ip_queue_xmit+0x990/0x1d10 __tcp_transmit_skb+0x288b/0x3d10 tcp_connect+0x3466/0x5180 tcp_v4_connect+0x1535/0x1bb0 __inet_stream_connect+0x40d/0x1040 inet_stream_connect+0x57/0xa0 __sys_connect_file+0x162/0x1a0 __sys_connect+0x137/0x160 __x64_sys_connect+0x72/0xb0 do_syscall_64+0x6f/0x140 entry_SYSCALL_64_after_hwframe+0x6e/0x76 RIP: 0033:0x7fe6dbbc34e0 Use the corresponding preemption-aware variants: IP_INC_STATS and IP6_INC_STATS. Found by Linux Verification Center (linuxtesting.org). Fixes: 8d8e20e2d7bb ("ipvs: Decrement ttl") Signed-off-by: Fedor Pchelkin <pchelkin@ispras.ru> Acked-by: Julian Anastasov <ja@ssi.bg> Acked-by: Simon Horman <horms@kernel.org> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-01-11Merge tag 'net-next-6.8' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "The most interesting thing is probably the networking structs reorganization and a significant amount of changes is around self-tests. Core & protocols: - Analyze and reorganize core networking structs (socks, netdev, netns, mibs) to optimize cacheline consumption and set up build time warnings to safeguard against future header changes This improves TCP performances with many concurrent connections up to 40% - Add page-pool netlink-based introspection, exposing the memory usage and recycling stats. This helps indentify bad PP users and possible leaks - Refine TCP/DCCP source port selection to no longer favor even source port at connect() time when IP_LOCAL_PORT_RANGE is set. This lowers the time taken by connect() for hosts having many active connections to the same destination - Refactor the TCP bind conflict code, shrinking related socket structs - Refactor TCP SYN-Cookie handling, as a preparation step to allow arbitrary SYN-Cookie processing via eBPF - Tune optmem_max for 0-copy usage, increasing the default value to 128KB and namespecifying it - Allow coalescing for cloned skbs coming from page pools, improving RX performances with some common configurations - Reduce extension header parsing overhead at GRO time - Add bridge MDB bulk deletion support, allowing user-space to request the deletion of matching entries - Reorder nftables struct members, to keep data accessed by the datapath first - Introduce TC block ports tracking and use. This allows supporting multicast-like behavior at the TC layer - Remove UAPI support for retired TC qdiscs (dsmark, CBQ and ATM) and classifiers (RSVP and tcindex) - More data-race annotations - Extend the diag interface to dump TCP bound-only sockets - Conditional notification of events for TC qdisc class and actions - Support for WPAN dynamic associations with nearby devices, to form a sub-network using a specific PAN ID - Implement SMCv2.1 virtual ISM device support - Add support for Batman-avd mulicast packet type BPF: - Tons of verifier improvements: - BPF register bounds logic and range support along with a large test suite - log improvements - complete precision tracking support for register spills - track aligned STACK_ZERO cases as imprecise spilled registers. This improves the verifier "instructions processed" metric from single digit to 50-60% for some programs - support for user's global BPF subprogram arguments with few commonly requested annotations for a better developer experience - support tracking of BPF_JNE which helps cases when the compiler transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like - several fixes - Add initial TX metadata implementation for AF_XDP with support in mlx5 and stmmac drivers. Two types of offloads are supported right now, that is, TX timestamp and TX checksum offload - Fix kCFI bugs in BPF all forms of indirect calls from BPF into kernel and from kernel into BPF work with CFI enabled. This allows BPF to work with CONFIG_FINEIBT=y - Change BPF verifier logic to validate global subprograms lazily instead of unconditionally before the main program, so they can be guarded using BPF CO-RE techniques - Support uid/gid options when mounting bpffs - Add a new kfunc which acquires the associated cgroup of a task within a specific cgroup v1 hierarchy where the latter is identified by its id - Extend verifier to allow bpf_refcount_acquire() of a map value field obtained via direct load which is a use-case needed in sched_ext - Add BPF link_info support for uprobe multi link along with bpftool integration for the latter - Support for VLAN tag in XDP hints - Remove deprecated bpfilter kernel leftovers given the project is developed in user-space (https://github.com/facebook/bpfilter) Misc: - Support for parellel TC self-tests execution - Increase MPTCP self-tests coverage - Updated the bridge documentation, including several so-far undocumented features - Convert all the net self-tests to run in unique netns, to avoid random failures due to conflict and allow concurrent runs - Add TCP-AO self-tests - Add kunit tests for both cfg80211 and mac80211 - Autogenerate Netlink families documentation from YAML spec - Add yml-gen support for fixed headers and recursive nests, the tool can now generate user-space code for all genetlink families for which we have specs - A bunch of additional module descriptions fixes - Catch incorrect freeing of pages belonging to a page pool Driver API: - Rust abstractions for network PHY drivers; do not cover yet the full C API, but already allow implementing functional PHY drivers in rust - Introduce queue and NAPI support in the netdev Netlink interface, allowing complete access to the device <> NAPIs <> queues relationship - Introduce notifications filtering for devlink to allow control application scale to thousands of instances - Improve PHY validation, requesting rate matching information for each ethtool link mode supported by both the PHY and host - Add support for ethtool symmetric-xor RSS hash - ACPI based Wifi band RFI (WBRF) mitigation feature for the AMD platform - Expose pin fractional frequency offset value over new DPLL generic netlink attribute - Convert older drivers to platform remove callback returning void - Add support for PHY package MMD read/write New hardware / drivers: - Ethernet: - Octeon CN10K devices - Broadcom 5760X P7 - Qualcomm SM8550 SoC - Texas Instrument DP83TG720S PHY - Bluetooth: - IMC Networks Bluetooth radio Removed: - WiFi: - libertas 16-bit PCMCIA support - Atmel at76c50x drivers - HostAP ISA/PCMCIA style 802.11b driver - zd1201 802.11b USB dongles - Orinoco ISA/PCMCIA 802.11b driver - Aviator/Raytheon driver - Planet WL3501 driver - RNDIS USB 802.11b driver Driver updates: - Ethernet high-speed NICs: - Intel (100G, ice, idpf): - allow one by one port representors creation and removal - add temperature and clock information reporting - add get/set for ethtool's header split ringparam - add again FW logging - adds support switchdev hardware packet mirroring - iavf: implement symmetric-xor RSS hash - igc: add support for concurrent physical and free-running timers - i40e: increase the allowable descriptors - nVidia/Mellanox: - Preparation for Socket-Direct multi-dev netdev. That will allow in future releases combining multiple PFs devices attached to different NUMA nodes under the same netdev - Broadcom (bnxt): - TX completion handling improvements - add basic ntuple filter support - reduce MSIX vectors usage for MQPRIO offload - add VXLAN support, USO offload and TX coalesce completion for P7 - Marvell Octeon EP: - xmit-more support - add PF-VF mailbox support and use it for FW notifications for VFs - Wangxun (ngbe/txgbe): - implement ethtool functions to operate pause param, ring param, coalesce channel number and msglevel - Netronome/Corigine (nfp): - add flow-steering support - support UDP segmentation offload - Ethernet NICs embedded, slower, virtual: - Xilinx AXI: remove duplicate DMA code adopting the dma engine driver - stmmac: add support for HW-accelerated VLAN stripping - TI AM654x sw: add mqprio, frame preemption & coalescing - gve: add support for non-4k page sizes. - virtio-net: su