aboutsummaryrefslogtreecommitdiff
path: root/include/net
AgeCommit message (Collapse)AuthorFilesLines
2024-11-18Merge tag 'lsm-pr-20241112' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: "Thirteen patches, all focused on moving away from the current 'secid' LSM identifier to a richer 'lsm_prop' structure. This move will help reduce the translation that is necessary in many LSMs, offering better performance, and make it easier to support different LSMs in the future" * tag 'lsm-pr-20241112' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: remove lsm_prop scaffolding netlabel,smack: use lsm_prop for audit data audit: change context data from secid to lsm_prop lsm: create new security_cred_getlsmprop LSM hook audit: use an lsm_prop in audit_names lsm: use lsm_prop in security_inode_getsecid lsm: use lsm_prop in security_current_getsecid audit: update shutdown LSM data lsm: use lsm_prop in security_ipc_getsecid audit: maintain an lsm_prop in audit_context lsm: add lsmprop_to_secctx hook lsm: use lsm_prop in security_audit_rule_match lsm: add the lsm_prop data structure
2024-11-18ipv6/udp: Add 4-tuple hash for connected socketPhilo Lu1-0/+2
Implement ipv6 udp hash4 like that in ipv4. The major difference is that the hash value should be calculated with udp6_ehashfn(). Besides, ipv4-mapped ipv6 address is handled before hash() and rehash(). Export udp_ehashfn because now we use it in udpv6 rehash. Core procedures of hash/unhash/rehash are same as ipv4, and udpv4 and udpv6 share the same udptable, so some functions in ipv4 hash4 can also be shared. Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com> Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com> Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18ipv4/udp: Add 4-tuple hash for connected socketPhilo Lu1-1/+15
Currently, the udp_table has two hash table, the port hash and portaddr hash. Usually for UDP servers, all sockets have the same local port and addr, so they are all on the same hash slot within a reuseport group. In some applications, UDP servers use connect() to manage clients. In particular, when firstly receiving from an unseen 4 tuple, a new socket is created and connect()ed to the remote addr:port, and then the fd is used exclusively by the client. Once there are connected sks in a reuseport group, udp has to score all sks in the same hash2 slot to find the best match. This could be inefficient with a large number of connections, resulting in high softirq overhead. To solve the problem, this patch implement 4-tuple hash for connected udp sockets. During connect(), hash4 slot is updated, as well as a corresponding counter, hash4_cnt, in hslot2. In __udp4_lib_lookup(), hslot4 will be searched firstly if the counter is non-zero. Otherwise, hslot2 is used like before. Note that only connected sockets enter this hash4 path, while un-connected ones are not affected. hlist_nulls is used for hash4, because we probably move to another hslot wrongly when lookup with concurrent rehash. Then we check nulls at the list end to see if we should restart lookup. Because udp does not use SLAB_TYPESAFE_BY_RCU, we don't need to touch sk_refcnt when lookup. Stress test results (with 1 cpu fully used) are shown below, in pps: (1) _un-connected_ socket as server [a] w/o hash4: 1,825176 [b] w/ hash4: 1,831750 (+0.36%) (2) 500 _connected_ sockets as server [c] w/o hash4: 290860 (only 16% of [a]) [d] w/ hash4: 1,889658 (+3.1% compared with [b]) With hash4, compute_score is skipped when lookup, so [d] is slightly better than [b]. Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com> Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com> Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18net/udp: Add 4-tuple hash list basisPhilo Lu1-3/+82
Add a new hash list, hash4, in udp table. It will be used to implement 4-tuple hash for connected udp sockets. This patch adds the hlist to table, and implements helpers and the initialization. 4-tuple hash is implemented in the following patch. hash4 uses hlist_nulls to avoid moving wrongly onto another hlist due to concurrent rehash, because rehash() can happen with lookup(). Co-developed-by: Cambda Zhu <cambda@linux.alibaba.com> Signed-off-by: Cambda Zhu <cambda@linux.alibaba.com> Co-developed-by: Fred Chen <fred.cc@alibaba-inc.com> Signed-off-by: Fred Chen <fred.cc@alibaba-inc.com> Co-developed-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Yubing Qiu <yubing.qiuyubing@alibaba-inc.com> Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18net/udp: Add a new struct for hash2 slotPhilo Lu1-4/+34
Preparing for udp 4-tuple hash (uhash4 for short). To implement uhash4 without cache line missing when lookup, hslot2 is used to record the number of hashed sockets in hslot4. Thus adding a new struct udp_hslot_main with field hash4_cnt, which is used by hash2. The new struct is used to avoid doubling the size of udp_hslot. Before uhash4 lookup, firstly checking hash4_cnt to see if there are hashed sks in hslot4. Because hslot2 is always used in lookup, there is no cache line miss. Related helpers are updated, and use the helpers as possible. uhash4 is implemented in following patches. Signed-off-by: Philo Lu <lulie@linux.alibaba.com> Acked-by: Willem de Bruijn <willemb@google.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-18Merge tag 'ipsec-next-2024-11-15' of ↵David S. Miller2-3/+15
git://git.kernel.org/pub/scm/linux/kernel/git/klassert/ipsec-next Steffen Klassert says: ==================== ipsec-next-11-15 1) Add support for RFC 9611 per cpu xfrm state handling. 2) Add inbound and outbound xfrm state caches to speed up state lookups. 3) Convert xfrm to dscp_t. From Guillaume Nault. 4) Fix error handling in build_aevent. From Everest K.C. 5) Replace strncpy with strscpy_pad in copy_to_user_auth. From Daniel Yang. 6) Fix an uninitialized symbol during acquire state insertion. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-11-15Merge tag 'for-net-next-2024-11-14' of ↵Jakub Kicinski3-6/+108
git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next Luiz Augusto von Dentz says: ==================== bluetooth-next pull request for net-next: - btusb: add Foxconn 0xe0fc for Qualcomm WCN785x - btmtk: Fix ISO interface handling - Add quirk for ATS2851 - btusb: Add RTL8852BE device 0489:e123 - ISO: Do not emit LE PA/BIG Create Sync if previous is pending - btusb: Add USB HW IDs for MT7920/MT7925 - btintel_pcie: Add handshake between driver and firmware - btintel_pcie: Add recovery mechanism - hci_conn: Use disable_delayed_work_sync - SCO: Use kref to track lifetime of sco_conn - ISO: Use kref to track lifetime of iso_conn - btnxpuart: Add GPIO support to power save feature - btusb: Add 0x0489:0xe0f3 and 0x13d3:0x3623 for Qualcomm WCN785x * tag 'for-net-next-2024-11-14' of git://git.kernel.org/pub/scm/linux/kernel/git/bluetooth/bluetooth-next: (51 commits) Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNC Bluetooth: fix use-after-free in device_for_each_child() Bluetooth: btintel: Direct exception event to bluetooth stack Bluetooth: hci_core: Fix calling mgmt_device_connected Bluetooth: hci_bcm: Use the devm_clk_get_optional() helper Bluetooth: ISO: Send BIG Create Sync via hci_sync Bluetooth: hci_conn: Remove alloc from critical section Bluetooth: ISO: Use kref to track lifetime of iso_conn Bluetooth: SCO: Use kref to track lifetime of sco_conn Bluetooth: HCI: Add IPC(11) bus type Bluetooth: btusb: Add 3 HWIDs for MT7925 Bluetooth: btusb: Add new VID/PID 0489/e124 for MT7925 Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slave Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending Bluetooth: ISO: Fix matching parent socket for BIS slave Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending Bluetooth: btrtl: Decrease HCI_OP_RESET timeout from 10 s to 2 s Bluetooth: btbcm: fix missing of_node_put() in btbcm_get_board_name() Bluetooth: btusb: Add new VID/PID 0489/e111 for MT7925 Bluetooth: btmtk: adjust the position to init iso data anchor ... ==================== Link: https://patch.msgid.link/20241114214731.1994446-1-luiz.dentz@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-15Merge tag 'nf-next-24-11-15' of ↵Jakub Kicinski1-12/+13
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following patchset contains Netfilter updates for net-next: 1) Extended netlink error reporting if nfnetlink attribute parser fails, from Donald Hunter. 2) Incorrect request_module() module, from Simon Horman. 3) A series of patches to reduce memory consumption for set element transactions. Florian Westphal says: "When doing a flush on a set or mass adding/removing elements from a set, each element needs to allocate 96 bytes to hold the transactional state. In such cases, virtually all the information in struct nft_trans_elem is the same. Change nft_trans_elem to a flex-array, i.e. a single nft_trans_elem can hold multiple set element pointers. The number of elements that can be stored in one nft_trans_elem is limited by the slab allocator, this series limits the compaction to at most 62 elements as it caps the reallocation to 2048 bytes of memory." 4) A series of patches to prepare the transition to dscp_t in .flowi_tos. From Guillaume Nault. 5) Support for bitwise operations with two source registers, from Jeremy Sowden. * tag 'nf-next-24-11-15' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: bitwise: add support for doing AND, OR and XOR directly netfilter: bitwise: rename some boolean operation functions netfilter: nf_dup4: Convert nf_dup_ipv4_route() to dscp_t. netfilter: nft_fib: Convert nft_fib4_eval() to dscp_t. netfilter: rpfilter: Convert rpfilter_mt() to dscp_t. netfilter: flow_offload: Convert nft_flow_route() to dscp_t. netfilter: ipv4: Convert ip_route_me_harder() to dscp_t. netfilter: nf_tables: allocate element update information dynamically netfilter: nf_tables: switch trans_elem to real flex array netfilter: nf_tables: prepare nft audit for set element compaction netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structure netfilter: nf_tables: add nft_trans_commit_list_add_elem helper netfilter: bpf: Pass string literal as format argument of request_module() netfilter: nfnetlink: Report extack policy errors for batched ops ==================== Link: https://patch.msgid.link/20241115133207.8907-1-pablo@netfilter.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14Bluetooth: MGMT: Add initial implementation of MGMT_OP_HCI_CMD_SYNCLuiz Augusto von Dentz1-0/+10
This adds the initial implementation of MGMT_OP_HCI_CMD_SYNC as documented in mgmt-api (BlueZ tree): Send HCI command and wait for event Command =========================================== Command Code: 0x005B Controller Index: <controller id> Command Parameters: Opcode (2 Octets) Event (1 Octet) Timeout (1 Octet) Parameter Length (2 Octets) Parameter (variable) Return Parameters: Response (1-variable Octets) This command may be used to send a HCI command and wait for an (optional) event. The HCI command is specified by the Opcode, any arbitrary is supported including vendor commands, but contrary to the like of Raw/User channel it is run as an HCI command send by the kernel since it uses its command synchronization thus it is possible to wait for a specific event as a response. Setting event to 0x00 will cause the command to wait for either HCI Command Status or HCI Command Complete. Timeout is specified in seconds, setting it to 0 will cause the default timeout to be used. Possible errors: Failed Invalid Parameters Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: HCI: Add IPC(11) bus typeLuiz Augusto von Dentz1-0/+1
Zephyr(1) has been using the same bus defines as Linux so tools likes of btmon, etc, are able to decode the bus used by the driver to transport HCI packets. Link: https://github.com/zephyrproject-rtos/zephyr/pull/80808 Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: ISO: Update hci_conn_hash_lookup_big for Broadcast slaveIulia Tanasescu1-1/+11
Currently, hci_conn_hash_lookup_big only checks for BIS master connections, by filtering out connections with the destination address set. This commit updates this function to also consider BIS slave connections, since it is also used for a Broadcast Receiver to set an available BIG handle before issuing the LE BIG Create Sync command. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pendingIulia Tanasescu2-0/+30
The Bluetooth Core spec does not allow a LE BIG Create sync command to be sent to Controller if another one is pending (Vol 4, Part E, page 2586). In order to avoid this issue, the HCI_CONN_CREATE_BIG_SYNC was added to mark that the LE BIG Create Sync command has been sent for a hcon. Once the BIG Sync Established event is received, the hcon flag is erased and the next pending hcon is handled. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pendingIulia Tanasescu2-1/+36
The Bluetooth Core spec does not allow a LE PA Create sync command to be sent to Controller if another one is pending (Vol 4, Part E, page 2493). In order to avoid this issue, the HCI_CONN_CREATE_PA_SYNC was added to mark that the LE PA Create Sync command has been sent for a hcon. Once the PA Sync Established event is received, the hcon flag is erased and the next pending hcon is handled. Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Bluetooth: Add new quirks for ATS2851Danil Pylaev2-4/+20
This adds quirks for broken extended create connection, and write auth payload timeout. Signed-off-by: Danil Pylaev <danstiv404@gmail.com> Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
2024-11-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski2-2/+12
Cross-merge networking fixes after downstream PR (net-6.12-rc8). Conflicts: tools/testing/selftests/net/.gitignore 252e01e68241 ("selftests: net: add netlink-dumps to .gitignore") be43a6b23829 ("selftests: ncdevmem: Move ncdevmem under drivers/net/hw") https://lore.kernel.org/all/20241113122359.1b95180a@canb.auug.org.au/ drivers/net/phy/phylink.c 671154f174e0 ("net: phylink: ensure PHY momentary link-fails are handled") 7530ea26c810 ("net: phylink: remove "using_mac_select_pcs"") Adjacent changes: drivers/net/ethernet/stmicro/stmmac/dwmac-intel-plat.c 5b366eae7193 ("stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines") e96321fad3ad ("net: ethernet: Switch back to struct platform_driver::remove()") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-14Merge tag 'net-6.12-rc8' of ↵Linus Torvalds1-0/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bluetooth. Quite calm week. No new regression under investigation. Current release - regressions: - eth: revert "igb: Disable threaded IRQ for igb_msix_other" Current release - new code bugs: - bluetooth: btintel: direct exception event to bluetooth stack Previous releases - regressions: - core: fix data-races around sk->sk_forward_alloc - netlink: terminate outstanding dump on socket close - mptcp: error out earlier on disconnect - vsock: fix accept_queue memory leak - phylink: ensure PHY momentary link-fails are handled - eth: mlx5: - fix null-ptr-deref in add rule err flow - lock FTE when checking if active - eth: dwmac-mediatek: fix inverted handling of mediatek,mac-wol Previous releases - always broken: - sched: fix u32's systematic failure to free IDR entries for hnodes. - sctp: fix possible UAF in sctp_v6_available() - eth: bonding: add ns target multicast address to slave device - eth: mlx5: fix msix vectors to respect platform limit - eth: icssg-prueth: fix 1 PPS sync" * tag 'net-6.12-rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (38 commits) net: sched: u32: Add test case for systematic hnode IDR leaks selftests: bonding: add ns multicast group testing bonding: add ns target multicast address to slave device net: ti: icssg-prueth: Fix 1 PPS sync stmmac: dwmac-intel-plat: fix call balance of tx_clk handling routines net: Make copy_safe_from_sockptr() match documentation net: stmmac: dwmac-mediatek: Fix inverted handling of mediatek,mac-wol ipmr: Fix access to mfc_cache_list without lock held samples: pktgen: correct dev to DEV net: phylink: ensure PHY momentary link-fails are handled mptcp: pm: use _rcu variant under rcu_read_lock mptcp: hold pm lock when deleting entry mptcp: update local address flags when setting it net: sched: cls_u32: Fix u32's systematic failure to free IDR entries for hnodes. MAINTAINERS: Re-add cancelled Renesas driver sections Revert "igb: Disable threaded IRQ for igb_msix_other" Bluetooth: btintel: Direct exception event to bluetooth stack Bluetooth: hci_core: Fix calling mgmt_device_connected virtio/vsock: Improve MSG_ZEROCOPY error handling vsock: Fix sk_error_queue memory leak ...
2024-11-14netfilter: nf_tables: allocate element update information dynamicallyFlorian Westphal1-3/+7
Move the timeout/expire/flag members from nft_trans_one_elem struct into a dybamically allocated structure, only needed when timeout update was requested. This halves size of nft_trans_one_elem struct and allows to compact up to 124 elements in one transaction container rather than 62. This halves memory requirements for a large flush or insert transaction, where ->update remains NULL. Care has to be taken to release the extra data in all spots, including abort path. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-14netfilter: nf_tables: prepare for multiple elements in nft_trans_elem structureFlorian Westphal1-12/+9
Add helpers to release the individual elements contained in the trans_elem container structure. No functional change intended. Followup patch will add 'nelems' member and will turn 'priv' into a flexible array. These helpers can then loop over all elements. Care needs to be taken to handle a mix of new elements and existing elements that are being updated (e.g. timeout refresh). Before this patch, NEWSETELEM transaction with update is released early so nft_trans_set_elem_destroy() won't get called, so we need to skip elements marked as update. Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-11-14bonding: add ns target multicast address to slave deviceHangbin Liu1-0/+2
Commit 4598380f9c54 ("bonding: fix ns validation on backup slaves") tried to resolve the issue where backup slaves couldn't be brought up when receiving IPv6 Neighbor Solicitation (NS) messages. However, this fix only worked for drivers that receive all multicast messages, such as the veth interface. For standard drivers, the NS multicast message is silently dropped because the slave device is not a member of the NS target multicast group. To address this, we need to make the slave device join the NS target multicast group, ensuring it can receive these IPv6 NS messages to validate the slave’s status properly. There are three policies before joining the multicast group: 1. All settings must be under active-backup mode (alb and tlb do not support arp_validate), with backup slaves and slaves supporting multicast. 2. We can add or remove multicast groups when arp_validate changes. 3. Other operations, such as enslaving, releasing, or setting NS targets, need to be guarded by arp_validate. Fixes: 4e24be018eb9 ("bonding: add new parameter ns_targets") Signed-off-by: Hangbin Liu <liuhangbin@gmail.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-13net: simplify eeecfg_mac_can_tx_lpiHeiner Kallweit1-4/+1
Simplify the function. Signed-off-by: Heiner Kallweit <hkallweit1@gmail.com> Reviewed-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Link: https://patch.msgid.link/f9a4623b-b94c-466c-8733-62057c6d9a17@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13Merge tag 'wireless-next-2024-11-13' of ↵Jakub Kicinski1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.13 Most likely the last -next pull request for v6.13. Most changes are in Realtek and Qualcomm drivers, otherwise not really anything noteworthy. Major changes: mac80211 * EHT 1024 aggregation size for transmissions ath12k * switch to using wiphy_lock() and remove ar->conf_mutex * firmware coredump collection support * add debugfs support for a multitude of statistics ath11k * dt: document WCN6855 hardware inputs ath9k * remove include/linux/ath9k_platform.h ath5k * Arcadyan ARV45XX AR2417 & Gigaset SX76[23] AR241[34]A support rtw88: * 8821au and 8812au USB adapters support rtw89 * thermal protection * firmware secure boot for WiFi 6 chip * tag 'wireless-next-2024-11-13' of git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next: (154 commits) Revert "wifi: iwlegacy: do not skip frames with bad FCS" wifi: mac80211: pass MBSSID config by reference wifi: mac80211: Support EHT 1024 aggregation size in TX net: rfkill: gpio: Add check for clk_enable() wifi: brcmfmac: Fix oops due to NULL pointer dereference in brcmf_sdiod_sglist_rw() wifi: Switch back to struct platform_driver::remove() wifi: ipw2x00: libipw_rx_any(): fix bad alignment wifi: brcmfmac: release 'root' node in all execution paths wifi: iwlwifi: mvm: don't call power_update_mac in fast suspend wifi: iwlwifi: s/IWL_MVM_INVALID_STA/IWL_INVALID_STA wifi: iwlwifi: bump minimum API version in BZ/SC to 92 wifi: iwlwifi: move IWL_LMAC_*_INDEX to fw/api/context.h wifi: iwlwifi: be less noisy if the NIC is dead in S3 wifi: iwlwifi: mvm: tell iwlmei when we finished suspending wifi: iwlwifi: allow fast resume on ax200 wifi: iwlwifi: mvm: support new initiator and responder command version wifi: iwlwifi: mvm: use wiphy locked debugfs for low-latency wifi: iwlwifi: mvm: MLO scan upon channel condition degradation wifi: iwlwifi: mvm: support new versions of the wowlan APIs wifi: iwlwifi: mvm: allow always calling iwl_mvm_get_bss_vif() ... ==================== Link: https://patch.msgid.link/20241113172918.A8A11C4CEC3@smtp.kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-13Merge tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpfLinus Torvalds1-2/+10
Pull bpf fixes from Daniel Borkmann: - Fix a mismatching RCU unlock flavor in bpf_out_neigh_v6 (Jiawei Ye) - Fix BPF sockmap with kTLS to reject vsock and unix sockets upon kTLS context retrieval (Zijian Zhang) - Fix BPF bits iterator selftest for s390x (Hou Tao) * tag 'bpf-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf: Fix mismatched RCU unlock flavour in bpf_out_neigh_v6 bpf: Add sk_is_inet and IS_ICSK check in tls_sw_has_ctx_tx/rx selftests/bpf: Use -4095 as the bad address for bits iterator
2024-11-12net: ip: make ip_route_use_hint() return drop reasonsMenglong Dong1-3/+4
In this commit, we make ip_route_use_hint() return drop reasons. The drop reasons that we return are similar to what we do in ip_route_input_slow(), and no drop reasons are added in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_mkroute_input/__mkroute_input return drop reasonsMenglong Dong1-0/+7
In this commit, we make ip_mkroute_input() and __mkroute_input() return drop reasons. The drop reason "SKB_DROP_REASON_ARP_PVLAN_DISABLE" is introduced for the case: the packet which is not IP is forwarded to the in_dev, and the proxy_arp_pvlan is not enabled. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_route_input() return drop reasonsMenglong Dong1-3/+4
In this commit, we make ip_route_input() return skb drop reasons that come from ip_route_input_noref(). Meanwhile, adjust all the call to it. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_route_input_noref() return drop reasonsMenglong Dong1-7/+8
In this commit, we make ip_route_input_noref() return drop reasons, which come from ip_route_input_rcu(). We need adjust the callers of ip_route_input_noref() to make sure the return value of ip_route_input_noref() is used properly. The errno that ip_route_input_noref() returns comes from ip_route_input and bpf_lwt_input_reroute in the origin logic, and we make them return -EINVAL on error instead. In the following patch, we will make ip_route_input() returns drop reasons too. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_route_input_slow() return drop reasonsMenglong Dong1-0/+6
In this commit, we make ip_route_input_slow() return skb drop reasons, and following new skb drop reasons are added: SKB_DROP_REASON_IP_INVALID_DEST The only caller of ip_route_input_slow() is ip_route_input_rcu(), and we adjust it by making it return -EINVAL on error. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make ip_mc_validate_source() return drop reasonMenglong Dong2-3/+7
Make ip_mc_validate_source() return drop reason, and adjust the call of it in ip_route_input_mc(). Another caller of it is ip_rcv_finish_core->udp_v4_early_demux, and the errno is not checked in detail, so we don't do more adjustment for it. The drop reason "SKB_DROP_REASON_IP_LOCALNET" is added in this commit. Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-12net: ip: make fib_validate_source() support drop reasonsMenglong Dong2-0/+22
In this commit, we make fib_validate_source() and __fib_validate_source() return -reason instead of errno on error. The return value of fib_validate_source can be -errno, 0, and 1. It's hard to make fib_validate_source() return drop reasons directly. The fib_validate_source() will return 1 if the scope of the source(revert) route is HOST. And the __mkroute_input() will mark the skb with IPSKB_DOREDIRECT in this case (combine with some other conditions). And then, a REDIRECT ICMP will be sent in ip_forward() if this flag exists. We can't pass this information to __mkroute_input if we make fib_validate_source() return drop reasons. Therefore, we introduce the wrapper fib_validate_source_reason() for fib_validate_source(), which will return the drop reasons on error. In the origin logic, LINUX_MIB_IPRPFILTER will be counted if fib_validate_source() return -EXDEV. And now, we need to adjust it by checking "reason == SKB_DROP_REASON_IP_RPFILTER". However, this will take effect only after the patch "net: ip: make ip_route_input_noref() return drop reasons", as we can't pass the drop reasons from fib_validate_source() to ip_rcv_finish_core() in this patch. Following new drop reasons are added in this patch: SKB_DROP_REASON_IP_LOCAL_SOURCE SKB_DROP_REASON_IP_INVALID_SOURCE Signed-off-by: Menglong Dong <dongml2@chinatelecom.cn> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-11net: Add control functions for irq suspensionMartin Karsten1-0/+3
The napi_suspend_irqs routine bootstraps irq suspension by elongating the defer timeout to irq_suspend_timeout. The napi_resume_irqs routine effectively cancels irq suspension by forcing the napi to be scheduled immediately. Signed-off-by: Martin Karsten <mkarsten@uwaterloo.ca> Co-developed-by: Joe Damato <jdamato@fastly.com> Signed-off-by: Joe Damato <jdamato@fastly.com> Tested-by: Joe Damato <jdamato@fastly.com> Tested-by: Martin Karsten <mkarsten@uwaterloo.ca> Acked-by: Stanislav Fomichev <sdf@fomichev.me> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Link: https://patch.msgid.link/20241109050245.191288-3-jdamato@fastly.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Register rtnl_dellink() and rtnl_setlink() with ↵Kuniyuki Iwashima1-0/+1
RTNL_FLAG_DOIT_PERNET_WIP. Currently, rtnl_setlink() and rtnl_dellink() cannot be fully converted to per-netns RTNL due to a lack of handling peer/lower/upper devices in different netns. For example, when we change a device in rtnl_setlink() and need to propagate that to its upper devices, we want to avoid acquiring all netns locks, for which we do not know the upper limit. The same situation happens when we remove a device. rtnl_dellink() could be transformed to remove a single device in the requested netns and delegate other devices to per-netns work, and rtnl_setlink() might be ? Until we come up with a better idea, let's use a new flag RTNL_FLAG_DOIT_PERNET_WIP for rtnl_dellink() and rtnl_setlink(). This will unblock converting RTNL users where such devices are not related. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241108004823.29419-11-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Add peer_type in struct rtnl_link_ops.Kuniyuki Iwashima1-0/+2
In ops->newlink(), veth, vxcan, and netkit call rtnl_link_get_net() with a net pointer, which is the first argument of ->newlink(). rtnl_link_get_net() could return another netns based on IFLA_NET_NS_PID and IFLA_NET_NS_FD in the peer device's attributes. We want to get it and fill rtnl_nets->nets[] in advance in rtnl_newlink() for per-netns RTNL. All of the three get the peer netns in the same way: 1. Call rtnl_nla_parse_ifinfomsg() 2. Call ops->validate() (vxcan doesn't have) 3. Call rtnl_link_get_net_tb() Let's add a new field peer_type to struct rtnl_link_ops and prefetch netns in the peer ifla to add it to rtnl_nets in rtnl_newlink(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Nikolay Aleksandrov <razor@blackwall.org> Link: https://patch.msgid.link/20241108004823.29419-6-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Remove __rtnl_link_register()Kuniyuki Iwashima1-2/+0
link_ops is protected by link_ops_mutex and no longer needs RTNL, so we have no reason to have __rtnl_link_register() separately. Let's remove it and call rtnl_link_register() from ifb.ko and dummy.ko. Note that both modules' init() work on init_net only, so we need not export pernet_ops_rwsem and can use rtnl_net_lock() there. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-4-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Protect link_ops by mutex.Kuniyuki Iwashima1-1/+1
rtnl_link_unregister() holds RTNL and calls synchronize_srcu(), but rtnl_newlink() will acquire SRCU frist and then RTNL. Then, we need to unlink ops and call synchronize_srcu() outside of RTNL to avoid the deadlock. rtnl_link_unregister() rtnl_newlink() ---- ---- lock(rtnl_mutex); lock(&ops->srcu); lock(rtnl_mutex); sync(&ops->srcu); Let's move as such and add a mutex to protect link_ops. Now, link_ops is protected by its dedicated mutex and rtnl_link_register() no longer needs to hold RTNL. While at it, we move the initialisation of ops->dellink and ops->srcu out of the mutex scope. Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11rtnetlink: Remove __rtnl_link_unregister().Kuniyuki Iwashima1-1/+0
rtnl_link_unregister() holds RTNL and calls __rtnl_link_unregister(), where we call synchronize_srcu() to wait inflight RTM_NEWLINK requests for per-netns RTNL. We put synchronize_srcu() in __rtnl_link_unregister() due to ifb.ko and dummy.ko. However, rtnl_newlink() will acquire SRCU before RTNL later in this series. Then, lockdep will detect the deadlock: rtnl_link_unregister() rtnl_newlink() ---- ---- lock(rtnl_mutex); lock(&ops->srcu); lock(rtnl_mutex); sync(&ops->srcu); To avoid the problem, we must call synchronize_srcu() before RTNL in rtnl_link_unregister(). As a preparation, let's remove __rtnl_link_unregister(). Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241108004823.29419-2-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-11net: netlink: add nla_get_*_default() accessorsJohannes Berg1-0/+262
There are quite a number of places that use patterns such as if (attr) val = nla_get_u16(attr); else val = DEFAULT; Add nla_get_u16_default() and friends like that to not have to type this out all the time. Acked-by: Toke Høiland-Jørgensen <toke@kernel.org> Acked-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Johannes Berg <johannes.berg@intel.com> Link: https://patch.msgid.link/20241108114145.acd2aadb03ac.I3df6aac71d38a5baa1c0a03d0c7e82d4395c030e@changeid Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Create netdev->neighbour associationGilad Naaman2-7/+14
Create a mapping between a netdev and its neighoburs, allowing for much cheaper flushes. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Link: https://patch.msgid.link/20241107160444.2913124-7-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Remove bare neighbour::next pointerGilad Naaman1-3/+1
Remove the now-unused neighbour::next pointer, leaving struct neighbour solely with the hlist_node implementation. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-6-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Convert iteration to use hlist+macroGilad Naaman1-4/+1
Remove all usage of the bare neighbour::next pointer, replacing them with neighbour::hash and its for_each macro. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-5-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Define neigh_for_each_in_bucketGilad Naaman1-0/+6
Introduce neigh_for_each_in_bucket in neighbour.h, to help iterate over the neighbour table more succinctly. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-3-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09neighbour: Add hlist_node to struct neighbourGilad Naaman1-0/+2
Add a doubly-linked node to neighbours, so that they can be deleted without iterating the entire bucket they're in. Signed-off-by: Gilad Naaman <gnaaman@drivenets.com> Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20241107160444.2913124-2-gnaaman@drivenets.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-09net: mctp: Expose transport binding identifier via IFLA attributeKhang Nguyen2-1/+21
MCTP control protocol implementations are transport binding dependent. Endpoint discovery is mandatory based on transport binding. Message timing requirements are specified in each respective transport binding specification. However, we currently have no means to get this information from MCTP links. Add a IFLA_MCTP_PHYS_BINDING netlink link attribute, which represents the transport type using the DMTF DSP0239-defined type numbers, returned as part of RTM_GETLINK data. We get an IFLA_MCTP_PHYS_BINDING attribute for each MCTP link, for example: - 0x00 (unspec) for loopback interface; - 0x01 (SMBus/I2C) for mctpi2c%d interfaces; and - 0x05 (serial) for mctpserial%d interfaces. Signed-off-by: Khang Nguyen <khangng@os.amperecomputing.com> Reviewed-by: Matt Johnston <matt@codeconstruct.com.au> Link: https://patch.msgid.link/20241105071915.821871-1-khangng@os.amperecomputing.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07ipv4: Prepare ip_route_output() to future .flowi4_tos conversion.Guillaume Nault1-3/+3
Convert the "tos" parameter of ip_route_output() to dscp_t. This way we'll have a dscp_t value directly available when .flowi4_tos will eventually be converted to dscp_t. All ip_route_output() callers but one set this "tos" parameter to 0 and therefore don't need to be adapted to the new prototype. Only br_nf_pre_routing_finish() needs conversion. It can just use ip4h_dscp() to get the DSCP field from the IPv4 header. Signed-off-by: Guillaume Nault <gnault@redhat.com> Reviewed-by: Ido Schimmel <idosch@nvidia.com> Link: https://patch.msgid.link/0f10d031dd44c70aae9bc6e19391cb30d5c2fe71.1730928699.git.gnault@redhat.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+4
Cross-merge networking fixes after downstream PR (net-6.12-rc7). Conflicts: drivers/net/ethernet/freescale/enetc/enetc_pf.c e15c5506dd39 ("net: enetc: allocate vf_state during PF probes") 3774409fd4c6 ("net: enetc: build enetc_pf_common.c as a separate module") https://lore.kernel.org/20241105114100.118bd35e@canb.auug.org.au Adjacent changes: drivers/net/ethernet/ti/am65-cpsw-nuss.c de794169cf17 ("net: ethernet: ti: am65-cpsw: Fix multi queue Rx on J7") 4a7b2ba94a59 ("net: ethernet: ti: am65-cpsw: Use tstats instead of open coded version") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-11-07wifi: mac80211: fix description of ieee80211_set_active_links() for new sequenceZong-Zhe Yang1-1/+1
The sequence of calls has changed, but the description is inconsistent. So, fix the description. Fixes: 188a1bf89432 ("wifi: mac80211: re-order assigning channel in activate links") Signed-off-by: Zong-Zhe Yang <kevin_yang@realtek.com> Link: https://patch.msgid.link/20241101082143.11138-1-kevin_yang@realtek.com Signed-off-by: Johannes Berg <johannes.berg@intel.com>
2024-11-07Merge tag 'nf-next-24-11-07' of ↵Paolo Abeni1-1/+2
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next Pablo Neira Ayuso says: ==================== Netfilter updates for net-next The following series contains Netfilter updates for net-next: 1) Make legacy xtables configs user selectable, from Breno Leitao. 2) Fix a few sparse warnings related to percpu, from Uros Bizjak. 3) Use strscpy_pad, from Justin Stitt. 4) Use nft_trans_elem_alloc() in catchall flush, from Florian Westphal. 5) A series of 7 patches to fix false positive with CONFIG_RCU_LIST=y. Florian also sees possible issue with 10 while module load/removal when requesting an expression that is available via module. As for patch 11, object is being updated so reference on the module already exists so I don't see any real issue. Florian says: "Unfortunately there are many more errors, and not all are false positives. First patches pass lockdep_commit_lock_is_held() to the rcu list traversal macro so that those splats are avoided. The last two patches are real code change as opposed to 'pass the transaction mutex to relax rcu check': Those two lists are not protected by transaction mutex so could be altered in parallel. This targets nf-next because these are long-standing issues." netfilter pull request 24-11-07 * tag 'nf-next-24-11-07' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next: netfilter: nf_tables: must hold rcu read lock while iterating object type list netfilter: nf_tables: must hold rcu read lock while iterating expression type list netfilter: nf_tables: avoid false-positive lockdep splats with basechain hook netfilter: nf_tables: avoid false-positive lockdep splats in set walker netfilter: nf_tables: avoid false-positive lockdep splats with flowtables netfilter: nf_tables: avoid false-positive lockdep splats with sets netfilter: nf_tables: avoid false-positive lockdep splat on rule deletion netfilter: nf_tables: prefer nft_trans_elem_alloc helper netfilter: nf_tables: replace deprecated strncpy with strscpy_pad netfilter: nf_tables: Fix percpu address space issues in nf_tables_api.c netfilter: Make legacy configs user selectable ==================== Link: https://patch.msgid.link/20241106234625.168468-1-pablo@netfilter.org Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-11-07netfilter: nf_tables: wait for rcu grace period on net_device removalPablo Neira Ayuso1-0/+4
8c873e219970 ("netfilter: core: free hooks with call_rcu") removed synchronize_net() call when unregistering basechain hook, however, net_device removal event handler for the NFPR