aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/broadcom/bnxt/bnxt.c
AgeCommit message (Collapse)AuthorFilesLines
11 daysMerge tag 'net-7.1-rc1' of ↵Linus Torvalds1-27/+31
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from Netfilter. Steady stream of fixes. Last two weeks feel comparable to the two weeks before the merge window. Lots of AI-aided bug discovery. A newer big source is Sashiko/Gemini (Roman Gushchin's system), which points out issues in existing code during patch review (maybe 25% of fixes here likely originating from Sashiko). Nice thing is these are often fixed by the respective maintainers, not drive-bys. Current release - new code bugs: - kconfig: MDIO_PIC64HPSC should depend on ARCH_MICROCHIP Previous releases - regressions: - add async ndo_set_rx_mode and switch drivers which we promised to be called under the per-netdev mutex to it - dsa: remove duplicate netdev_lock_ops() for conduit ethtool ops - hv_sock: report EOF instead of -EIO for FIN - vsock/virtio: fix MSG_PEEK calculation on bytes to copy Previous releases - always broken: - ipv6: fix possible UAF in icmpv6_rcv() - icmp: validate reply type before using icmp_pointers - af_unix: drop all SCM attributes for SOCKMAP - netfilter: fix a number of bugs in the osf (OS fingerprinting) - eth: intel: fix timestamp interrupt configuration for E825C Misc: - bunch of data-race annotations" * tag 'net-7.1-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (148 commits) rxrpc: Fix error handling in rxgk_extract_token() rxrpc: Fix re-decryption of RESPONSE packets rxrpc: Fix rxrpc_input_call_event() to only unshare DATA packets rxrpc: Fix missing validation of ticket length in non-XDR key preparsing rxgk: Fix potential integer overflow in length check rxrpc: Fix conn-level packet handling to unshare RESPONSE packets rxrpc: Fix potential UAF after skb_unshare() failure rxrpc: Fix rxkad crypto unalignment handling rxrpc: Fix memory leaks in rxkad_verify_response() net: rds: fix MR cleanup on copy error m68k: mvme147: Make me the maintainer net: txgbe: fix firmware version check selftests/bpf: check epoll readiness during reuseport migration tcp: call sk_data_ready() after listener migration vhost_net: fix sleeping with preempt-disabled in vhost_net_busy_poll() ipv6: Cap TLV scan in ip6_tnl_parse_tlv_enc_lim tipc: fix double-free in tipc_buf_append() llc: Return -EINPROGRESS from llc_ui_connect() ipv4: icmp: validate reply type before using icmp_pointers selftests/net: packetdrill: cover RFC 5961 5.2 challenge ACK on both edges ...
13 daysbnxt: use snapshot in bnxt_cfg_rx_modeStanislav Fomichev1-14/+15
With the introduction of ndo_set_rx_mode_async (as discussed in [1]) we can call bnxt_cfg_rx_mode directly. Convert bnxt_cfg_rx_mode to use uc/mc snapshots and move its call in bnxt_sp_task to the section that resets BNXT_STATE_IN_SP_TASK. Switch to direct call in bnxt_set_rx_mode. Link: https://lore.kernel.org/netdev/CACKFLi=5vj8hPqEUKDd8RTw3au5G+zRgQEqjF+6NZnyoNm90KA@mail.gmail.com/ [1] Cc: Michael Chan <michael.chan@broadcom.com> Cc: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260416185712.2155425-9-sdf@fomichev.me Signed-off-by: Paolo Abeni <pabeni@redhat.com>
13 daysbnxt: convert to ndo_set_rx_mode_asyncStanislav Fomichev1-14/+17
Convert bnxt from ndo_set_rx_mode to ndo_set_rx_mode_async. bnxt_set_rx_mode, bnxt_mc_list_updated and bnxt_uc_list_updated now take explicit uc/mc list parameters and iterate with netdev_hw_addr_list_for_each instead of netdev_for_each_{uc,mc}_addr. The bnxt_cfg_rx_mode internal caller passes the real lists under netif_addr_lock_bh. BNXT_RX_MASK_SP_EVENT is still used here, next patch converts to the direct call. Cc: Michael Chan <michael.chan@broadcom.com> Cc: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260416185712.2155425-8-sdf@fomichev.me Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-16Merge tag 'for-linus-fwctl' of ↵Linus Torvalds1-20/+29
git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl Pull fwctl updates from Jason Gunthorpe: - New fwctl driver for Broadcom RDMA NICs - Bug fix for non-modular builds * tag 'for-linus-fwctl' of git://git.kernel.org/pub/scm/linux/kernel/git/fwctl/fwctl: fwctl: Fix class init ordering to avoid NULL pointer dereference on device removal fwctl/bnxt_fwctl: Add documentation entries fwctl/bnxt_fwctl: Add bnxt fwctl device fwctl/bnxt_en: Create an aux device for fwctl fwctl/bnxt_en: Refactor aux bus functions to be more generic fwctl/bnxt_en: Move common definitions to include/linux/bnxt/
2026-04-12net: bnxt: Dispatch to SW USOJoe Damato1-0/+5
Wire in the SW USO path added in preceding commits when hardware USO is not possible. When a GSO skb with SKB_GSO_UDP_L4 arrives and the NIC lacks HW USO capability, redirect to bnxt_sw_udp_gso_xmit() which handles software segmentation into individual UDP frames submitted directly to the TX ring. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-10-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12net: bnxt: Add SW GSO completion and teardown supportJoe Damato1-9/+66
Update __bnxt_tx_int and bnxt_free_one_tx_ring_skbs to handle SW GSO segments: - MID segments: adjust tx_pkts/tx_bytes accounting and skip skb free (the skb is shared across all segments and freed only once) - LAST segments: call tso_dma_map_complete() to tear down the IOVA mapping if one was used. On the fallback path, payload DMA unmapping is handled by the existing per-BD dma_unmap_len walk. Both MID and LAST completions advance tx_inline_cons to release the segment's inline header slot back to the ring. is_sw_gso is initialized to zero, so the new code paths are not run. Add logic for feature advertisement and guardrails for ring sizing. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-9-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12net: bnxt: Add TX inline buffer infrastructureJoe Damato1-0/+35
Add per-ring pre-allocated inline buffer fields (tx_inline_buf, tx_inline_dma, tx_inline_size) to bnxt_tx_ring_info and helpers to allocate and free them. A producer and consumer (tx_inline_prod, tx_inline_cons) are added to track which slot(s) of the inline buffer are in-use. The inline buffer will be used by the SW USO path for pre-allocated, pre-DMA-mapped per-segment header copies. In the future, this could be extended to support TX copybreak. Allocation helper is marked __maybe_unused in this commit because it will be wired in later. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-6-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12net: bnxt: Use dma_unmap_len for TX completion unmappingJoe Damato1-22/+41
Store the DMA mapping length in each TX buffer descriptor via dma_unmap_len_set at submit time, and use dma_unmap_len at completion time. This is a no-op for normal packets but prepares for software USO, where header BDs set dma_unmap_len to 0 because the header buffer is unmapped collectively rather than per-segment. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-5-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12net: bnxt: Add a helper for tx_bd_extJoe Damato1-7/+2
Factor out some code to setup tx_bd_exts into a helper function. This helper will be used by SW USO implementation in the following commits. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-4-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-12net: bnxt: Export bnxt_xmit_get_cfa_actionJoe Damato1-1/+1
Export bnxt_xmit_get_cfa_action so that it can be used in future commits which add software USO support to bnxt. Suggested-by: Jakub Kicinski <kuba@kernel.org> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260408230607.2019402-3-joe@dama.to Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-24/+52
Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()") 0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic") 57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c 4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") 687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling") ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") 323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-01bnxt_en: Restore default stat ctxs for ULP when resource is availablePavan Chebbi1-0/+2
During resource reservation, if the L2 driver does not have enough MSIX vectors to provide to the RoCE driver, it sets the stat ctxs for ULP also to 0 so that we don't have to reserve it unnecessarily. However, subsequently the user may reduce L2 rings thereby freeing up some resources that the L2 driver can now earmark for RoCE. In this case, the driver should restore the default ULP stat ctxs to make sure that all RoCE resources are ready for use. The RoCE driver may fail to initialize in this scenario without this fix. Fixes: d630624ebd70 ("bnxt_en: Utilize ulp client resources if RoCE is not registered") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-4-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-01bnxt_en: Don't assume XDP is never enabled in bnxt_init_dflt_ring_mode()Michael Chan1-4/+9
The original code made the assumption that when we set up the initial default ring mode, we must be just loading the driver and XDP cannot be enabled yet. This is not true when the FW goes through a resource or capability change. Resource reservations will be cancelled and reinitialized with XDP already enabled. devlink reload with XDP enabled will also have the same issue. This scenario will cause the ring arithmetic to be all wrong in the bnxt_init_dflt_ring_mode() path causing failure: bnxt_en 0000:a1:00.0 ens2f0np0: bnxt_setup_int_mode err: ffffffea bnxt_en 0000:a1:00.0 ens2f0np0: bnxt_request_irq err: ffffffea bnxt_en 0000:a1:00.0 ens2f0np0: nic open fail (rc: ffffffea) Fix it by properly accounting for XDP in the bnxt_init_dflt_ring_mode() path by using the refactored helper functions in the previous patch. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Fixes: ec5d31e3c15d ("bnxt_en: Handle firmware reset status during IF_UP.") Fixes: 228ea8c187d8 ("bnxt_en: implement devlink dev reload driver_reinit") Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-01bnxt_en: Refactor some basic ring setup and adjustment logicMichael Chan1-16/+37
Refactor out the basic code that trims the default rings, sets up and adjusts XDP TX rings and CP rings. There is no change in behavior. This is to prepare for the next bug fix patch. Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260331065138.948205-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-30bnxt_en: set backing store type from query typePengpeng Hou1-4/+4
bnxt_hwrm_func_backing_store_qcaps_v2() stores resp->type from the firmware response in ctxm->type and later uses that value to index fixed backing-store metadata arrays such as ctx_arr[] and bnxt_bstore_to_trace[]. ctxm->type is fixed by the current backing-store query type and matches the array index of ctx->ctx_arr. Set ctxm->type from the current loop variable instead of depending on resp->type. Also update the loop to advance type from next_valid_type in the for statement, which keeps the control flow simpler for non-valid and unchanged entries. Fixes: 6a4d0774f02d ("bnxt_en: Add support for new backing store query firmware API") Signed-off-by: Pengpeng Hou <pengpeng@iscas.ac.cn> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Tested-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260328234357.43669-1-pengpeng@iscas.ac.cn Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-29bnxt_en: Move bnxt_rss_ext_op into headerChris J Arges1-17/+0
This allows bnxt_rss_ext_op to be used by other functions. In addition this modifies the rxcmp argument to be const since the function only reads from this structure. Reviewed-by: Joe Damato <joe@dama.to> Signed-off-by: Chris J Arges <carges@cloudflare.com> Link: https://patch.msgid.link/20260325201139.2501937-4-carges@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-29bnxt_en: Implement XDP RSS hash metadata extractionChris J Arges1-0/+5
Add support for extracting RSS hash values and hash types from hardware completion descriptors in XDP programs for bnxt_en. Add IP_TYPE definition for determining if completion is ipv4 or ipv6. In addition add ITYPE_ICMP flag for identifying ICMP completions. Signed-off-by: Chris J Arges <carges@cloudflare.com> Reviewed-by: Joe Damato <joe@dama.to> Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-29bnxt_en: use bnxt_xdp_buff for xdp contextChris J Arges1-10/+19
This adds bnxt_xdp_buff which embeds the xdp_buff struct and stores pointers to hardware RX completion descriptors (rx_cmp and rx_cmp_ext) along with the completion type. Signed-off-by: Chris J Arges <carges@cloudflare.com> Link: https://patch.msgid.link/20260325201139.2501937-2-carges@cloudflare.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27fwctl/bnxt_en: Refactor aux bus functions to be more genericPavan Chebbi1-19/+28
Up until now there was only one auxiliary device that bnxt created and that was for RoCE driver. bnxt fwctl is also going to use an aux bus device that bnxt should create. This requires some nomenclature changes and refactoring of the existing bnxt aux dev functions. Convert 'aux_priv' and 'edev' members of struct bnxt into arrays where each element contains supported auxbus device's data. Move struct bnxt_aux_priv from bnxt.h to ulp.h because that is where it belongs. Make aux bus init/uninit/add/del functions more generic which will loop through all the aux device types. Make bnxt_ulp_start/stop functions (the only other common functions applicable to any aux device) loop through the aux devices to update their config and states. Make callers of bnxt_ulp_start() call it only when there are no errors. Also, as an improvement in code, bnxt_register_dev() can skip unnecessary dereferencing of edev from bp, instead use the edev pointer from the function parameter. Future patches will reuse these functions to add an aux bus device for fwctl. Link: https://patch.msgid.link/r/20260314151605.932749-3-pavan.chebbi@broadcom.com Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-27fwctl/bnxt_en: Move common definitions to include/linux/bnxt/Pavan Chebbi1-1/+1
We have common definitions that are now going to be used by more than one component outside of bnxt (bnxt_re and fwctl) Move bnxt_ulp.h to include/linux/bnxt/ as ulp.h. Link: https://patch.msgid.link/r/20260314151605.932749-2-pavan.chebbi@broadcom.com Reviewed-by: Andy Gospodarek <gospo@broadcom.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Cc: linux-rdma@vger.kernel.org Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2026-03-23ethtool: Track user-provided RSS indirection table sizeBjörn Töpel1-2/+1
Track the number of indirection table entries the user originally provided (context 0/default as well!). Replace IFF_RXFH_CONFIGURED with rss_indir_user_size: the flag is redundant now that user_size captures the same information. Add ethtool_rxfh_indir_lost() for drivers that must reset the indirection table. Convert bnxt and mlx5 to use it. Signed-off-by: Björn Töpel <bjorn@kernel.org> Link: https://patch.msgid.link/20260320085826.1957255-2-bjorn@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-19Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-0/+2
Cross-merge networking fixes after downstream PR (net-7.0-rc5). net/netfilter/nft_set_rbtree.c 598adea720b97 ("netfilter: revert nft_set_rbtree: validate open interval overlap") 3aea466a43998 ("netfilter: nft_set_rbtree: don't disable bh when acquiring tree lock") https://lore.kernel.org/abgaQBpeGstdN4oq@sirena.org.uk No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-17bnxt_en: fix OOB access in DBG_BUF_PRODUCER async event handlerJunrui Luo1-0/+2
The ASYNC_EVENT_CMPL_EVENT_ID_DBG_BUF_PRODUCER handler in bnxt_async_event_process() uses a firmware-supplied 'type' field directly as an index into bp->bs_trace[] without bounds validation. The 'type' field is a 16-bit value extracted from DMA-mapped completion ring memory that the NIC writes directly to host RAM. A malicious or compromised NIC can supply any value from 0 to 65535, causing an out-of-bounds access into kernel heap memory. The bnxt_bs_trace_check_wrap() call then dereferences bs_trace->magic_byte and writes to bs_trace->last_offset and bs_trace->wrapped, leading to kernel memory corruption or a crash. Fix by adding a bounds check and defining BNXT_TRACE_MAX as DBG_LOG_BUFFER_FLUSH_REQ_TYPE_ERR_QPC_TRACE + 1 to cover all currently defined firmware trace types (0x0 through 0xc). Fixes: 84fcd9449fd7 ("bnxt_en: Manage the FW trace context memory") Reported-by: Yuhao Jiang <danisjiang@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Junrui Luo <moonafterrain@outlook.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/SYBPR01MB7881A253A1C9775D277F30E9AF42A@SYBPR01MB7881.ausprd01.prod.outlook.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-43/+34
Cross-merge networking fixes after downstream PR (net-7.0-rc2). Conflicts: tools/testing/selftests/drivers/net/hw/rss_ctx.py 19c3a2a81d2b ("selftests: drv-net: rss: Generate unique ports for RSS context tests") ce5a0f4612db ("selftests: drv-net: rss_ctx: test RSS contexts persist after ifdown/up") include/net/inet_connection_sock.h 858d2a4f67ff6 ("tcp: fix potential race in tcp_v6_syn_recv_sock()") fcd3d039fab69 ("tcp: make tcp_v{4,6}_send_check() static") https://lore.kernel.org/aZ8PSFLzBrEU3I89@sirena.org.uk drivers/net/ethernet/mellanox/mlx5/core/en/xsk/setup.c drivers/net/ethernet/mellanox/mlx5/core/en/xsk/pool.c 69050f8d6d075 ("treewide: Replace kmalloc with kmalloc_obj for non-scalar types") bf4afc53b77ae ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") 8a96b9144f18a ("net/mlx5e: Alloc xsk channel param out of mlx5e_open_xsk()") Adjacent changes: net/netfilter/ipvs/ip_vs_ctl.c c59bd9e62e06 ("ipvs: use more counters to avoid service lookups") bf4afc53b77a ("Convert 'alloc_obj' family to use the new default GFP_KERNEL argument") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-26Merge tag 'net-7.0-rc2' of ↵Linus Torvalds1-6/+7
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from IPsec, Bluetooth and netfilter Current release - regressions: - wifi: fix dev_alloc_name() return value check - rds: fix recursive lock in rds_tcp_conn_slots_available Current release - new code bugs: - vsock: lock down child_ns_mode as write-once Previous releases - regressions: - core: - do not pass flow_id to set_rps_cpu() - consume xmit errors of GSO frames - netconsole: avoid OOB reads, msg is not nul-terminated - netfilter: h323: fix OOB read in decode_choice() - tcp: re-enable acceptance of FIN packets when RWIN is 0 - udplite: fix null-ptr-deref in __udp_enqueue_schedule_skb(). - wifi: brcmfmac: fix potential kernel oops when probe fails - phy: register phy led_triggers during probe to avoid AB-BA deadlock - eth: - bnxt_en: fix deleting of Ntuple filters - wan: farsync: fix use-after-free bugs caused by unfinished tasklets - xscale: check for PTP support properly Previous releases - always broken: - tcp: fix potential race in tcp_v6_syn_recv_sock() - kcm: fix zero-frag skb in frag_list on partial sendmsg error - xfrm: - fix race condition in espintcp_close() - always flush state and policy upon NETDEV_UNREGISTER event - bluetooth: - purge error queues in socket destructors - fix response to L2CAP_ECRED_CONN_REQ - eth: - mlx5: - fix circular locking dependency in dump - fix "scheduling while atomic" in IPsec MAC address query - gve: fix incorrect buffer cleanup for QPL - team: avoid NETDEV_CHANGEMTU event when unregistering slave - usb: validate USB endpoints" * tag 'net-7.0-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) netfilter: nf_conntrack_h323: fix OOB read in decode_choice() dpaa2-switch: validate num_ifs to prevent out-of-bounds write net: consume xmit errors of GSO frames vsock: document write-once behavior of the child_ns_mode sysctl vsock: lock down child_ns_mode as write-once selftests/vsock: change tests to respect write-once child ns mode net/mlx5e: Fix "scheduling while atomic" in IPsec MAC address query net/mlx5: Fix missing devlink lock in SRIOV enable error path net/mlx5: E-switch, Clear legacy flag when moving to switchdev net/mlx5: LAG, disable MPESW in lag_disable_change() net/mlx5: DR, Fix circular locking dependency in dump selftests: team: Add a reference count leak test team: avoid NETDEV_CHANGEMTU event when unregistering slave net: mana: Fix double destroy_workqueue on service rescan PCI path MAINTAINERS: Update maintainer entry for QUALCOMM ETHQOS ETHERNET DRIVER dpll: zl3073x: Remove redundant cleanup in devm_dpll_init() selftests/net: packetdrill: Verify acceptance of FIN packets when RWIN is 0 tcp: re-enable acceptance of FIN packets when RWIN is 0 vsock: Use container_of() to get net namespace in sysctl handlers net: usb: kaweth: validate USB endpoints ...
2026-02-24eth: bnxt: rename ring_err_stats -> ring_drv_statsJakub Kicinski1-9/+9
We recently added GRO stats to bnxt, which are maintained by the driver. Having "err" in the name of the struct for ring stats no longer makes sense (as pointed out by Michael, see Link). Rename them to "drv" stats, as these are all maintained by the driver (even if partially based on info from descriptors). Michael suggested calling these misc, happy to go back to that. IMHO "drv" is a bit more meaningful that "misc". Pure rename using sed, no functional changes. Link: https://lore.kernel.org/CACKFLimgibJ0qkM1AacZVh8MKKy-pE_AAc4KPKZ7GUqebmXW9A@mail.gmail.com Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260223203702.4137801-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-22Convert remaining multi-line kmalloc_obj/flex GFP_KERNEL usesKees Cook1-3/+3
Conversion performed via this Coccinelle script: // SPDX-License-Identifier: GPL-2.0-only // Options: --include-headers-for-types --all-includes --include-headers --keep-comments virtual patch @gfp depends on patch && !(file in "tools") && !(file in "samples")@ identifier ALLOC = {kmalloc_obj,kmalloc_objs,kmalloc_flex, kzalloc_obj,kzalloc_objs,kzalloc_flex, kvmalloc_obj,kvmalloc_objs,kvmalloc_flex, kvzalloc_obj,kvzalloc_objs,kvzalloc_flex}; @@ ALLOC(... - , GFP_KERNEL ) $ make coccicheck MODE=patch COCCI=gfp.cocci Build and boot tested x86_64 with Fedora 42's GCC and Clang: Linux version 6.19.0+ (user@host) (gcc (GCC) 15.2.1 20260123 (Red Hat 15.2.1-7), GNU ld version 2.44-12.fc42) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Linux version 6.19.0+ (user@host) (clang version 20.1.8 (Fedora 20.1.8-4.fc42), LLD 20.1.8) #1 SMP PREEMPT_DYNAMIC 1970-01-01 Signed-off-by: Kees Cook <kees@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert more 'alloc_obj' cases to default GFP_KERNEL argumentsLinus Torvalds1-8/+4
This converts some of the visually simpler cases that have been split over multiple lines. I only did the ones that are easy to verify the resulting diff by having just that final GFP_KERNEL argument on the next line. Somebody should probably do a proper coccinelle script for this, but for me the trivial script actually resulted in an assertion failure in the middle of the script. I probably had made it a bit _too_ trivial. So after fighting that far a while I decided to just do some of the syntactically simpler cases with variations of the previous 'sed' scripts. The more syntactically complex multi-line cases would mostly really want whitespace cleanup anyway. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21Convert 'alloc_obj' family to use the new default GFP_KERNEL argumentLinus Torvalds1-14/+14
This was done entirely with mindless brute force, using git grep -l '\<k[vmz]*alloc_objs*(.*, GFP_KERNEL)' | xargs sed -i 's/\(alloc_objs*(.*\), GFP_KERNEL)/\1)/' to convert the new alloc_obj() users that had a simple GFP_KERNEL argument to just drop that argument. Note that due to the extreme simplicity of the scripting, any slightly more complex cases spread over multiple lines would not be triggered: they definitely exist, but this covers the vast bulk of the cases, and the resulting diff is also then easier to check automatically. For the same reason the 'flex' versions will be done as a separate conversion. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-37/+31
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-20bnxt_en: Fix deleting of Ntuple filtersPavan Chebbi1-0/+3
Ntuple filters can be deleted when the interface is down. The current code blindly sends the filter delete command to FW. When the interface is down, all the VNICs are deleted in the FW. When the VNIC is freed in the FW, all the associated filters are also freed. We need not send the free command explicitly. Sending such command will generate FW error in the dmesg. In order to fix this, we can safely return from bnxt_hwrm_cfa_ntuple_filter_free() when BNXT_STATE_OPEN is not true which confirms the VNICs have been deleted. Fixes: 8336a974f37d ("bnxt_en: Save user configured filters in a lookup list") Suggested-by: Michael Chan <michael.chan@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260219185313.2682148-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-20bnxt_en: Fix RSS context delete logicPavan Chebbi1-6/+4
We need to free the corresponding RSS context VNIC in FW everytime an RSS context is deleted in driver. Commit 667ac333dbb7 added a check to delete the VNIC in FW only when netif_running() is true to help delete RSS contexts with interface down. Having that condition will make the driver leak VNICs in FW whenever close() happens with active RSS contexts. On the subsequent open(), as part of RSS context restoration, we will end up trying to create extra VNICs for which we did not make any reservation. FW can fail this request, thereby making us lose active RSS contexts. Suppose an RSS context is deleted already and we try to process a delete request again, then the HWRM functions will check for validity of the request and they simply return if the resource is already freed. So, even for delete-when-down cases, netif_running() check is not necessary. Remove the netif_running() condition check when deleting an RSS context. Reported-by: Jakub Kicinski <kicinski@meta.com> Fixes: 667ac333dbb7 ("eth: bnxt: allow deleting RSS contexts when the device is down") Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260219185313.2682148-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10bnxt_en: Check RSS contexts in bnxt_need_reserve_rings()Michael Chan1-0/+2
bnxt_need_reserve_rings() checks all resources except HW RSS contexts to determine if a new reservation is required. For completeness, add the check for HW RSS contexts. This makes the code more complete after the recent commit to increase the number of RSS contexts for a larger RSS indirection table: Fixes: 51b9d3f948b8 ("bnxt_en: Use a larger RSS indirection table on P5_PLUS chips") Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260207235118.1987301-3-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-10bnxt_en: Refactor bnxt_need_reserve_rings()Michael Chan1-15/+30
bnxt_need_reserve_rings() checks 6 ring resources against the reserved values to determine if a new reservation is needed. Factor out the code to collect the total resources into a new helper function bnxt_get_total_resources() to make the code cleaner and easier to read. Instead of individual scalar variables, use the struct bnxt_hw_rings to hold all the ring resources. Using the struct, hwr.cp replaces the nq variable and the chip specific hwr.cp_p5 replaces cp on newer chips. There is no change in behavior. This will make it easier to check the RSS context resource in the next patch. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Signed-off-by: Michael Chan <michael.chan@broadcom.com> Reviewed-by: Joe Damato <joe@dama.to> Link: https://patch.msgid.link/20260207235118.1987301-2-michael.chan@broadcom.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-09eth: bnxt: gather and report HW-GRO statsJakub Kicinski1-2/+13
Count and report HW-GRO stats as seen by the kernel. The device stats for GRO seem to not reflect the reality, perhaps they count sessions which did not actually result in any aggregation. Also they count wire packets, so we have to count super-frames, anyway. Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260207003509.3927744-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-02-06bnxt_en: Remove jumbo_remove step from TX pathAlice Mikityanska1-21/+0
Now that the kernel doesn't insert HBH for BIG TCP IPv6 packets, remove unnecessary steps from the bnxt_en TX path, that used to check and remove HBH. Signed-off-by: Alice Mikityanska <alice@isovalent.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260205133925.526371-9-alice.kernel@fastmail.im Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-29eth: bnxt: make sure we populate the qcfg defaults on old FW/HWJakub Kicinski1-0/+1
The driver now depends on the core to tell it what the rx page size should be for the agg ring. We must populate the ndo_default_qcfg callback even if we don't support any queue ops. This fixes: Oops: divide error: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN RIP: 0010:bnxt_alloc_rx_page_pool (drivers/net/ethernet/broadcom/bnxt/bnxt.c:3852) with fw version 225.1.109.0. Link: https://lore.kernel.org/20250421222827.283737-20-kuba@kernel.org Fixes: f96e1b35779e ("eth: bnxt: support qcfg provided rx page size") Reviewed-by: Michael Chan <michael.chan@broadcom.com> Link: https://patch.msgid.link/20260128193258.125274-1-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23eth: bnxt: plug bnxt_validate_qcfg() into qopsJakub Kicinski1-5/+6
Plug bnxt_validate_qcfg() back into qops, where it was in my old RFC. Link: https://patch.msgid.link/20260122005113.2476634-7-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23net: introduce a trivial netdev_queue_config()Jakub Kicinski1-3/+3
We may choose to extend or reimplement the logic which renders the per-queue config. The drivers should not poke directly into the queue state. Add a helper for drivers to use when they want to query the config for a specific queue. Link: https://patch.msgid.link/20260122005113.2476634-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23eth: bnxt: always set the queue mgmt opsJakub Kicinski1-1/+5
Core provides a centralized callback for validating per-queue settings but the callback is part of the queue management ops. Having the ops conditionally set complicates the parts of the driver which could otherwise lean on the core to feed it the correct settings. Always set the queue ops, but provide no restart-related callbacks if queue ops are not supported by the device. This should maintain current behavior, the check in netdev_rx_queue_restart() looks both at op struct and individual ops. Reviewed-by: Subbaraya Sundeep <sbhatta@marvell.com> Link: https://patch.msgid.link/20260122005113.2476634-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20Merge tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linuxJakub Kicinski1-29/+94
Pavel Begunkov says: ==================== Add support for providers with large rx buffer Many modern NICs support configurable receive buffer lengths, and zcrx and memory providers can use buffers larger than 4K to improve performance. When paired with hw-gro larger rx buffer sizes can drastically reduce the number of buffers traversing the stack and save a lot of processing time. It also allows to give to users larger contiguous chunks of data. Single stream benchmarks showed up to ~30% CPU util improvement. E.g. comparison for 4K vs 32K buffers using a 200Gbit NIC: packets=23987040 (MB=2745098), rps=199559 (MB/s=22837) CPU %usr %nice %sys %iowait %irq %soft %idle 0 1.53 0.00 27.78 2.72 1.31 66.45 0.22 packets=24078368 (MB=2755550), rps=200319 (MB/s=22924) CPU %usr %nice %sys %iowait %irq %soft %idle 0 0.69 0.00 8.26 31.65 1.83 57.00 0.57 This series adds net infrastructure for memory providers configuring the size and implements it for bnxt. It's an opt-in feature for drivers, they should advertise support for the parameter in the qops and must check if the hardware supports the given size. It's limited to memory providers as it drastically simplifies implementation. It doesn't affect the fast path zcrx uAPI, and the user exposed parameter is defined in zcrx terms, which allows it to be flexible and adjusted in the future. A liburing example can be found at [2] full branch: [1] https://github.com/isilence/linux.git zcrx/large-buffers-v8 Liburing example: [2] https://github.com/isilence/liburing.git zcrx/rx-buf-len * tag 'net-queue-rx-buf-len-v9' of https://github.com/isilence/linux: io_uring/zcrx: document area chunking parameter selftests: iou-zcrx: test large chunk sizes eth: bnxt: support qcfg provided rx page size eth: bnxt: adjust the fill level of agg queues with larger buffers eth: bnxt: store rx buffer size per queue net: pass queue rx page size from memory provider net: add bare bone queue configs net: reduce indent of struct netdev_queue_mgmt_ops members net: memzero mp params when closing a queue ==================== Link: https://patch.msgid.link/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14eth: bnxt: support qcfg provided rx page sizePavel Begunkov1-1/+35
Implement support for qcfg provided rx page sizes. For that, implement the ndo_default_qcfg callback and validate the config on restart. Also, use the current config's value in bnxt_init_ring_struct to retain the correct size across resets. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-14eth: bnxt: adjust the fill level of agg queues with larger buffersJakub Kicinski1-4/+21
The driver tries to provision more agg buffers than header buffers since multiple agg segments can reuse the same header. The calculation / heuristic tries to provide enough pages for 65k of data for each header (or 4 frags per header if the result is too big). This calculation is currently global to the adapter. If we increase the buffer sizes 8x we don't want 8x the amount of memory sitting on the rings. Luckily we don't have to fill the rings completely, adjust the fill level dynamically in case particular queue has buffers larger than the global size. Signed-off-by: Jakub Kicinski <kuba@kernel.org> [pavel: rebase on top of agg_size_fac] Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-14eth: bnxt: store rx buffer size per queuePavel Begunkov1-23/+33
Instead of using a constant buffer length, allow configuring the size for each queue separately. There is no way to change the length yet, and it'll be passed from memory providers in a later patch. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-14net: add bare bone queue configsPavel Begunkov1-2/+6
We'll need to pass extra parameters when allocating a queue for memory providers. Define a new structure for queue configurations, and pass it to qapi callbacks. It's empty for now, actual parameters will be added in following patches. Configurations should persist across resets, and for that they're default-initialised on device registration and stored in struct netdev_rx_queue. We also add a new qapi callback for defaulting a given config. It must be implemented if a driver wants to use queue configs and is optional otherwise. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2026-01-10bnxt_en: Implement ethtool_ops -> get_link_ext_state()Michael Chan1-1/+24
Map the link_down_reason from the FW to the ethtool link_ext_state when it is available. Also log it to the link down dmesg when it is available. Add 2 new link_ext_state enums to the UAPI: ETHTOOL_LINK_EXT_STATE_OTP_SPEED_VIOLATION ETHTOOL_LINK_EXT_STATE_BMC_REQUEST_DOWN to cover OTP (one-time-programmable) speed restrictions and BMC (Baseboard management controller) forcing the link down. Reviewed-by: Andy Gospodarek <andrew.gospodarek@broadcom.com> Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com> Reviewed-by: Pavan Chebbi <pavan.chebbi@broadcom.com> Sign