aboutsummaryrefslogtreecommitdiff
path: root/drivers/net/ethernet/meta/fbnic
AgeCommit message (Collapse)AuthorFilesLines
37 hourseth: fbnic: fix double-free of PCS on phylink creation failureBobby Eshleman1-1/+2
fbnic_phylink_create() stores the newly allocated PCS in fbn->pcs and then calls phylink_create(). When phylink_create() fails, the error path correctly destroys the PCS via xpcs_destroy_pcs(), but the caller, fbnic_netdev_alloc(), responds by invoking fbnic_netdev_free() which calls fbnic_phylink_destroy(). That function finds fbn->pcs non-NULL and calls xpcs_destroy_pcs() a second time on the already-freed object, triggering a refcount underflow use-after-free: [ 1.934973] fbnic 0000:01:00.0: Failed to create Phylink interface, err: -22 [ 1.935103] ------------[ cut here ]------------ [ 1.935179] refcount_t: underflow; use-after-free. [ 1.935252] WARNING: lib/refcount.c:28 at refcount_warn_saturate+0x59/0x90, CPU#0: swapper/0/1 [ 1.935389] Modules linked in: [ 1.935484] CPU: 0 UID: 0 PID: 1 Comm: swapper/0 Not tainted 7.0.0-virtme-04244-g1f5ffc672165-dirty #1 PREEMPT(lazy) [ 1.935661] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 [ 1.935826] RIP: 0010:refcount_warn_saturate+0x59/0x90 [ 1.935931] Code: 44 48 8d 3d 49 f9 a7 01 67 48 0f b9 3a e9 bf 1e 96 00 48 8d 3d 48 f9 a7 01 67 48 0f b9 3a c3 cc cc cc cc 48 8d 3d 47 f9 a7 01 <67> 48 0f b9 3a c3 cc cc cc cc 48 8d 3d 46 f9 a7 01 67 48 0f b9 3a [ 1.936274] RSP: 0000:ffffd0d440013c58 EFLAGS: 00010246 [ 1.936376] RAX: 0000000000000000 RBX: ffff8f39c188c278 RCX: 000000000000002b [ 1.936524] RDX: ffff8f39c004f000 RSI: 0000000000000003 RDI: ffffffff96abab00 [ 1.936692] RBP: ffff8f39c188c240 R08: ffffffff96988e88 R09: 00000000ffffdfff [ 1.936835] R10: ffffffff96878ea0 R11: 0000000000000187 R12: 0000000000000000 [ 1.936970] R13: ffff8f39c0cef0c8 R14: ffff8f39c1ac01c0 R15: 0000000000000000 [ 1.937114] FS: 0000000000000000(0000) GS:ffff8f3ba08b4000(0000) knlGS:0000000000000000 [ 1.937273] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1.937382] CR2: ffff8f3b3ffff000 CR3: 0000000172642001 CR4: 0000000000372ef0 [ 1.937540] Call Trace: [ 1.937619] <TASK> [ 1.937698] xpcs_destroy_pcs+0x25/0x40 [ 1.937783] fbnic_netdev_alloc+0x1e5/0x200 [ 1.937859] fbnic_probe+0x230/0x370 [ 1.937939] local_pci_probe+0x3e/0x90 [ 1.938013] pci_device_probe+0xbb/0x1e0 [ 1.938091] ? sysfs_do_create_link_sd+0x6d/0xe0 [ 1.938188] really_probe+0xc1/0x2b0 [ 1.938282] __driver_probe_device+0x73/0x120 [ 1.938371] driver_probe_device+0x1e/0xe0 [ 1.938466] __driver_attach+0x8d/0x190 [ 1.938560] ? __pfx___driver_attach+0x10/0x10 [ 1.938663] bus_for_each_dev+0x7b/0xd0 [ 1.938758] bus_add_driver+0xe8/0x210 [ 1.938854] driver_register+0x60/0x120 [ 1.938929] ? __pfx_fbnic_init_module+0x10/0x10 [ 1.939026] fbnic_init_module+0x25/0x60 [ 1.939109] do_one_initcall+0x49/0x220 [ 1.939202] ? rdinit_setup+0x20/0x40 [ 1.939304] kernel_init_freeable+0x1b0/0x310 [ 1.939449] ? __pfx_kernel_init+0x10/0x10 [ 1.939560] kernel_init+0x1a/0x1c0 [ 1.939640] ret_from_fork+0x1ed/0x240 [ 1.939730] ? __pfx_kernel_init+0x10/0x10 [ 1.939805] ret_from_fork_asm+0x1a/0x30 [ 1.939886] </TASK> [ 1.939927] ---[ end trace 0000000000000000 ]--- [ 1.940184] fbnic 0000:01:00.0: Netdev allocation failed Instead of calling fbnic_phylink_destroy(), the prior initialization of netdev should just be unrolled with free_netdev() and clearing fbd->netdev. Clearing fbd->netdev to NULL avoids UAF in init_failure_mode where callers guard by checking !fbd->netdev, such as fbnic_mdio_read_pmd(). These callers remain active even after a failed probe, so fdb->netdev still needs to be cleared. Fixes: d0fe7104c795 ("fbnic: Replace use of internal PCS w/ Designware XPCS") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260504-fbnic-pcs-fix-v2-1-de45192821d9@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-21fbnic: convert to ndo_set_rx_mode_asyncStanislav Fomichev4-11/+19
Convert fbnic from ndo_set_rx_mode to ndo_set_rx_mode_async. The driver's __fbnic_set_rx_mode() now takes explicit uc/mc list parameters and uses __hw_addr_sync_dev() on the snapshots instead of __dev_uc_sync/__dev_mc_sync on the netdev directly. Update callers in fbnic_up, fbnic_fw_config_after_crash, fbnic_bmc_rpc_check and fbnic_set_mac to pass the real address lists calling __fbnic_set_rx_mode outside the async work path. Cc: Alexander Duyck <alexanderduyck@fb.com> Cc: kernel-team@meta.com Reviewed-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com> Signed-off-by: Stanislav Fomichev <sdf@fomichev.me> Link: https://patch.msgid.link/20260416185712.2155425-6-sdf@fomichev.me Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-04-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Merge in late fixes in preparation for the net-next PR. Conflicts: include/net/sch_generic.h a6bd339dbb351 ("net_sched: fix skb memory leak in deferred qdisc drops") ff2998f29f390 ("net: sched: introduce qdisc-specific drop reason tracing") https://lore.kernel.org/adz0iX85FHMz0HdO@sirena.org.uk drivers/net/ethernet/airoha/airoha_eth.c 1acdfbdb516b ("net: airoha: Fix VIP configuration for AN7583 SoC") bf3471e6e6c0 ("net: airoha: Make flow control source port mapping dependent on nbq parameter") Adjacent changes: drivers/net/ethernet/airoha/airoha_ppe.c f44218cd5e6a ("net: airoha: Reset PPE cpu port configuration in airoha_ppe_hw_init()") 7da62262ec96 ("inet: add ip_local_port_step_width sysctl to improve port usage distribution") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-09eth: fbnic: Use wake instead of startMohsin Bashir1-1/+1
fbnic_up() calls netif_tx_start_all_queues(), which only clears __QUEUE_STATE_DRV_XOFF. If qdisc backlog has accumulated on any TX queue before the reconfiguration (e.g. ring resize via ethtool -G), start does not call __netif_schedule() to kick the qdisc, so the pending backlog is never drained and the queue stalls. Switch to netif_tx_wake_all_queues(), which clears DRV_XOFF and also calls __netif_schedule() on every queue, ensuring any backlog that built up before the down/up cycle is promptly dequeued. Fixes: bc6107771bb4 ("eth: fbnic: Allocate a netdevice and napi vectors with queues") Signed-off-by: Mohsin Bashir <hmohsin@meta.com> Link: https://patch.msgid.link/20260408002415.2963915-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski3-5/+5
Cross-merge networking fixes after downstream PR (net-7.0-rc7). Conflicts: net/vmw_vsock/af_vsock.c b18c83388874 ("vsock: initialize child_ns_mode_locked in vsock_net_init()") 0de607dc4fd8 ("vsock: add G2H fallback for CIDs not owned by H2G transport") Adjacent changes: drivers/net/ethernet/broadcom/bnxt/bnxt_ethtool.c ceee35e5674a ("bnxt_en: Refactor some basic ring setup and adjustment logic") 57cdfe0dc70b ("bnxt_en: Resize RSS contexts on channel count change") drivers/net/wireless/intel/iwlwifi/mld/mac80211.c 4d56037a02bd ("wifi: iwlwifi: mld: block EMLSR during TDLS connections") 687a95d204e7 ("wifi: iwlwifi: mld: correctly set wifi generation data") drivers/net/wireless/intel/iwlwifi/mld/scan.h b6045c899e37 ("wifi: iwlwifi: mld: Refactor scan command handling") ec66ec6a5a8f ("wifi: iwlwifi: mld: Fix MLO scan timing") drivers/net/wireless/intel/iwlwifi/mvm/fw.c 078df640ef05 ("wifi: iwlwifi: mld: add support for iwl_mcc_allowed_ap_type_cmd v 2") 323156c3541e ("wifi: iwlwifi: mvm: don't send a 6E related command when not supported") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-04-02eth: fbnic: Increase FBNIC_QUEUE_SIZE_MIN to 64Dimitri Daskalakis1-1/+1
On systems with 64K pages, RX queues will be wedged if users set the descriptor count to the current minimum (16). Fbnic fragments large pages into 4K chunks, and scales down the ring size accordingly. With 64K pages and 16 descriptors, the ring size mask is 0 and will never be filled. 32 descriptors is another special case that wedges the RX rings. Internally, the rings track pages for the head/tail pointers, not page fragments. So with 32 descriptors, there's only 1 usable page as one ring slot is kept empty to disambiguate between an empty/full ring. As a result, the head pointer never advances and the HW stalls after consuming 16 page fragments. Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260401162848.2335350-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-31fbnic: Set Relaxed Ordering PCIe TLP attributes for DMA enginesAlexander Duyck4-2/+27
Add ATTR CSR bit field definitions for the DMA engine TLP header configuration registers: AW_CFG: RDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9] AR_CFG: TDE_ATTR[17:15], RQM_ATTR[14:12], TQM_ATTR[11:9] These fields control the PCIe TLP attribute bits for outbound transactions from the TQM, RQM, RDE (write path), and TDE (read path) DMA engines. An enum is added with standard PCIe TLP attribute values: NS (No Snoop), RO (Relaxed Ordering), and IDO (ID-based Ordering). Read the PCIe Relaxed Ordering capability at probe time and store it in fbnic_dev. Configure Relaxed Ordering on the PCIe TLP attributes in fbnic_mbx_init_desc_ring when the capability is enabled. For the write path (AW_CFG), set RO on RDE and TQM attributes. For the read path (AR_CFG), set RO on all three attributes (TDE, RQM, TQM). This allows the PCIe fabric to reorder these transactions for improved throughput. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260327204445.3074446-1-dimitri.daskalakis1@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-27eth: fbnic: Fix debugfs output for BDQ's with page fragsDimitri Daskalakis1-1/+1
The rings size_mask represents the number of pages, so we need to determine the number of page frags when dumping the descriptors. Fixes: df04373b0dab ("eth fbnic: Add debugfs hooks for tx/rx rings") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260324195123.3486219-3-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-27eth: fbnic: Account for page fragments when updating BDQ tailDimitri Daskalakis1-3/+3
FBNIC supports fixed size buffers of 4K. When PAGE_SIZE > 4K, we fragment the page across multiple descriptors (FBNIC_BD_FRAG_COUNT). When refilling the BDQ, the correct number of entries are populated, but tail was only incremented by one. So on a system with 64K pages, HW would get one descriptor refilled for every 16 we populate. Additionally, we program the ring size in the HW when enabling the BDQ. This was not accounting for page fragments, so on systems with 64K pages, the HW used 1/16th of the ring. Fixes: 0cb4c0a13723 ("eth: fbnic: Implement Rx queue alloc/start/stop/free") Signed-off-by: Dimitri Daskalakis <daskald@meta.com> Link: https://patch.msgid.link/20260324195123.3486219-2-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-03-10eth fbnic: Add mailbox self testMike Marciniszyn (Meta)3-0/+142
The mailbox self test ensures the interface to and from the firmware is healthy by sending a test message and fielding the response from the firmware. This patch uses the new completion API [1][2] that allocates a completion structure, binds the completion to the TEST message, and uses a new FW parsing routine that wraps the completion processing around the TLV parser. Link: https://patch.msgid.link/20250516164804.741348-1-lee@trager.us [1] Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com [2] Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-6-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10eth fbnic: TLV support for use by MBX self testMike Marciniszyn (Meta)2-0/+303
The TLV (Type-Value-Length) self uses a known set of data to create a TLV message. These routines support the MBX self test by creating the test messages and parsing the response message coming back from the firmware. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-5-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10eth fbnic: Add msix self testMike Marciniszyn (Meta)3-0/+214
This function is meant to test the global interrupt registers and the PCIe IP MSI-X functionality. It essentially goes through and tests various combinations of the set, clear, and mask bits in order to verify the behavior is as we expect it to be from the driver. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-4-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-10eth fbnic: Add register self testMike Marciniszyn (Meta)3-0/+197
The register test will be used to verify hardware is behaving as expected. The test itself will have us writing to registers that should have no side effects due to us resetting after the test has been completed. While the test is being run the interface should be offline. This patch counts on the first patch of this series to export netif_open() and also ensures that the half close calls netif_close() to avoid deadlock. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260307105847.1438-3-mike.marciniszyn@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05eth: fbnic: Fetch TX pause storm statsMohsin Bashir4-0/+20
With pause storm protection in place, track the occurrence of pause storm events. Since there is a one-to-one mapping between pause storm interrupts and events, use the interrupt count to track this metric. ./ethtool -I -a eth0 Pause parameters for eth0: Autonegotiate: off RX: off TX: on Statistics: tx_pause_frames: 759657 rx_pause_frames: 0 tx_pause_storm_events: 219 Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-5-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-03-05eth: fbnic: Add protection against pause stormMohsin Bashir7-0/+186
Add protection against TX pause storms. A pause storm occurs when a device fails to send received packets up to the stack. When a pause storm is detected (pause state persists beyond the configured timeout), the device stops sending the pause frames and begins dropping packets instead of back-pressuring. The timeout is configurable via ethtool tunable (pfc-prevention-tout) with a maximum value of 10485ms, and the default value of 500ms. Once the device transitions to the storm-detected state, the service task periodically attempts recovery, returning the device to normal operation to handle any subsequent pause storm episodes. Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260302230149.1580195-4-mohsin.bashr@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-21Convert 'alloc_flex' family to use the new default GFP_KERNEL argumentLinus Torvalds1-1/+1
This is the exact same thing as the 'alloc_obj()' version, only much smaller because there are a lot fewer users of the *alloc_flex() interface. As with alloc_obj() version, this was done entirely with mindless brute force, using the same script, except using 'flex' in the pattern rather than 'objs*'. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2026-02-21treewide: Replace kmalloc with kmalloc_obj for non-scalar typesKees Cook1-1/+1
This is the result of running the Coccinelle script from scripts/coccinelle/api/kmalloc_objs.cocci. The script is designed to avoid scalar types (which need careful case-by-case checking), and instead replace kmalloc-family calls that allocate struct or union object instances: Single allocations: kmalloc(sizeof(TYPE), ...) are replaced with: kmalloc_obj(TYPE, ...) Array allocations: kmalloc_array(COUNT, sizeof(TYPE), ...) are replaced with: kmalloc_objs(TYPE, COUNT, ...) Flex array allocations: kmalloc(struct_size(PTR, FAM, COUNT), ...) are replaced with: kmalloc_flex(*PTR, FAM, COUNT, ...) (where TYPE may also be *VAR) The resulting allocations no longer return "void *", instead returning "TYPE *". Signed-off-by: Kees Cook <kees@kernel.org>
2026-02-19eth: fbnic: Advertise supported XDP features.Dimitri Daskalakis1-0/+2
Drivers are supposed to advertise the XDP features they support. This was missed while adding XDP support. Before: $ ynl --family netdev --dump dev-get ... {'ifindex': 3, 'xdp-features': set(), 'xdp-rx-metadata-features': set(), 'xsk-features': set()}, ... After: $ ynl --family netdev --dump dev-get ... {'ifindex': 3, 'xdp-features': {'basic', 'rx-sg'}, 'xdp-rx-metadata-features': set(), 'xsk-features': set()}, ... Fixes: 168deb7b31b2 ("eth: fbnic: Add support for XDP_TX action") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Link: https://patch.msgid.link/20260218030620.3329608-1-dimitri.daskalakis1@gmail.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-18eth: fbnic: Add validation for MTU changesDimitri Daskalakis1-0/+18
Increasing the MTU beyond the HDS threshold causes the hardware to fragment packets across multiple buffers. If a single-buffer XDP program is attached, the driver will drop all multi-frag frames. While we can't prevent a remote sender from sending non-TCP packets larger than the MTU, this will prevent users from inadvertently breaking new TCP streams. Traditionally, drivers supported XDP with MTU less than 4Kb (packet per page). Fbnic currently prevents attaching XDP when MTU is too high. But it does not prevent increasing MTU after XDP is attached. Fixes: 1b0a3950dbd4 ("eth: fbnic: Add XDP pass, drop, abort support") Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
2026-02-17eth: fbnic: set DMA_HINT_L4 for all flowsBobby Eshleman2-3/+5
fbnic always advertises ETHTOOL_TCP_DATA_SPLIT_ENABLED via ethtool .get_ringparam. To enable proper splitting for all flow types, even for IP/Ethernet flows, this patch sets DMA_HINT_L4 unconditionally for all RSS and NFC flow steering rules. According to the spec, L4 falls back to L3 if no valid L4 is found, and L3 falls back to L2 if no L3 is found. This makes sure that the correct header boundary is used regardless of traffic type. This is important for zero-copy use cases where we must ensure that all ZC packets are split correctly. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-3-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17eth: fbnic: increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytesBobby Eshleman1-1/+1
Increase FBNIC_HDR_BYTES_MIN from 128 to 256 bytes. The previous minimum was too small to guarantee that very long L2+L3+L4 headers always fit within the header buffer. When EN_HDR_SPLIT is disabled and a packet exceeds MAX_HEADER_BYTES, splitting occurs at that byte offset instead of the header boundary, resulting in some of the header landing in the payload page. The increased minimum ensures headers always fit with the MAX_HEADER_BYTES cut off and land in the header page. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-2-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-17eth: fbnic: set FBNIC_QUEUE_RDE_CTL0_EN_HDR_SPLIT on RDE_CTL0Bobby Eshleman1-11/+14
Fix EN_HDR_SPLIT configuration by writing the field to RDE_CTL0 instead of RDE_CTL1. Because drop mode configuration and header splitting enablement both use RDE_CTL0, we consolidate these configurations into the single function fbnic_config_drop_mode. Fixes: 2b30fc01a6c7 ("eth: fbnic: Add support for HDS configuration") Signed-off-by: Bobby Eshleman <bobbyeshleman@meta.com> Acked-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260211-fbnic-tcp-hds-fixes-v1-1-55d050e6f606@meta.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-13fbnic: close fw_log race between users and teardownChengfeng Ye2-10/+12
Fixes a theoretical race on fw_log between the teardown path and fw_log write functions. fw_log is written inside fbnic_fw_log_write() and can be reached from the mailbox handler fbnic_fw_msix_intr(), but fw_log is freed before IRQ/MBX teardown during cleanup, resulting in a potential data race of dereferencing a freed/null variable. Possible Interleaving Scenario: CPU0: fbnic_fw_msix_intr() // Entry fbnic_fw_log_write() if (fbnic_fw_log_ready()) // true ... preempt ... CPU1: fbnic_remove() // Entry fbnic_fw_log_free() vfree(log->data_start); log->data_start = NULL; CPU0: continues, walks log->entries or writes to log->data_start The initialization also has an incorrect order problem, as the fw_log is currently allocated after MBX setup during initialization. Fix the problems by adjusting the synchronization order to put initialization in place before the mailbox is enabled, and not cleared until after the mailbox has been disabled. Fixes: ecc53b1b46c89 ("eth: fbnic: Enable firmware logging") Signed-off-by: Chengfeng Ye <dg573847474@gmail.com> Link: https://patch.msgid.link/20260211191329.530886-1-dg573847474@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-30eth fbnic: Add debugfs hooks for tx/rx ringsMike Marciniszyn (Meta)5-1/+400
Add debugfs hooks to display tx/rx rings for each napi vector. Note that the cloning mechanism in fbnic_ethtool.c for configuration changes protects against concurrency issues with simultaneous config changes along with debugs ring accesses. The configuration switch builds up the new configuration offline, takes the current config down, which removes the debugfs nv files, and switches to the new configuration. The new configuration is brought up which brings the debugfs files back on top of the new configuration rings. The interaction with fbnic_queue_stop() and fbnic_queue_start() will similarly delete and add the files for the indicated vector. Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260127200644.11640-3-mike.marciniszyn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-30eth fbnic: Add debugfs hooks for firmware mailboxMike Marciniszyn (Meta)3-1/+50
This patch adds reporting the Rx and Tx information interfacing with the firmware. The result of reading fbnic/fw_mbx is: Rx Rdy: 1 Head: 11 Tail: 10 Idx Len E Addr F H Raw ---------------------------------- 00 4096 0 000101fea000 0 1 1000000101fea001 01 4096 0 000101feb000 0 1 1000000101feb001 . . . 15 4096 0 000101fe9000 0 1 1000000101fe9001 Tx Rdy: 1 Head: 4 Tail: 4 Idx Len E Addr F H Raw ---------------------------------- 00 0004 1 00010321b000 1 1 000440010321b003 01 0004 1 00010228d000 1 1 000440010228d003 . . . 15 0004 1 00010321b000 1 1 000440010321b003 Signed-off-by: Mike Marciniszyn (Meta) <mike.marciniszyn@gmail.com> Link: https://patch.msgid.link/20260127200644.11640-2-mike.marciniszyn@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-23net: fbnic: convert to use .get_rx_ring_countBreno Leitao1-4/+8
Use the newly introduced .get_rx_ring_count ethtool ops callback instead of handling ETHTOOL_GRXRINGS directly in .get_rxnfc(). Signed-off-by: Breno Leitao <leitao@debian.org> Reviewed-by: Brett Creeley <brett.creeley@amd.com> Link: https://patch.msgid.link/20260122-grxring_big_v4-v2-5-94dbe4dcaa10@debian.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Update RX mbox timeout valueMohsin Bashir3-5/+14
While waiting for completions on read requests, driver is using different timeout values for different messages. Make use of a single timeout value. Introduce a wrapper function to handle the wait, which also simplify maintaining the 80 char line limit. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-6-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Remove retry supportMohsin Bashir1-19/+5
The driver retries sensor read requests from firmware, but this is unnecessary. A functioning firmware should respond to each request within the timeout period. Remove the retry logic and set the timeout to the sum of all retry timeouts. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-5-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Reuse RX mailbox pagesMohsin Bashir1-9/+16
Currently, the RX mailbox frees and reallocates a page for each received message. Since FW Rx messages are processed synchronously, and nothing hold these pages (unlike skbs which we hand over to the stack), reuse the pages and put them back on the Rx ring. Now that we ensure the ring is always fully populated we don't have to worry about filling it up after partial population during init, either. Update fbnic_mbx_process_rx_msgs() to recycle pages after message processing. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-4-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Allocate all pages for RX mailboxMohsin Bashir1-5/+9
Now that memory is allocated with GFP_KERNEL, allocation failures should be extremely rare. Ensure the FW communication ring is always fully populated with free pages, and hard fail initialization otherwise. This enables simplifications in next patches. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-3-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-20eth: fbnic: Use GFP_KERNEL to allocting mbx pagesMohsin Bashir1-2/+1
Replace GFP_ATOMIC with GFP_KERNEL for mailbox RX page allocation. Since interrupt handler is threaded GFP_KERNEL is a safe option to reduce allocation failures. Also remove __GFP_NOWARN so the kernel reports a warning on allocation failure to aid debugging. Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20260115003353.4150771-2-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2026-01-14net: add bare bone queue configsPavel Begunkov1-2/+6
We'll need to pass extra parameters when allocating a queue for memory providers. Define a new structure for queue configurations, and pass it to qapi callbacks. It's empty for now, actual parameters will be added in following patches. Configurations should persist across resets, and for that they're default-initialised on device registration and stored in struct netdev_rx_queue. We also add a new qapi callback for defaulting a given config. It must be implemented if a driver wants to use queue configs and is optional otherwise. Suggested-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
2025-12-04Merge tag 'pci-v6.19-changes' of ↵Linus Torvalds1-1/+0
git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci Pull PCI updates from Bjorn Helgaas: "Enumeration: - Enable host bridge emulation for PCI_DOMAINS_GENERIC platforms (Dan Williams) - Switch vmd from custom domain number allocator to the common allocator to prevent a potential race with new non-VMD buses (Dan Williams) - Enable Precision Time Measurement (PTM) only if device advertises support for a relevant role, to prevent invalid PTM Requests that cause ACS violations that are reported as AER Uncorrectable Non-Fatal errors (Mika Westerberg) Resource management: - Prevent resource tree corruption when BAR resize fails (Ilpo Järvinen) - Restore BARs to the original size if a BAR resize fails (Ilpo Järvinen) - Remove BAR release from BAR resize attempts by the xe, i915, and amdgpu drivers so the PCI core can restore BARs if the resize fails (Ilpo Järvinen) - Move Resizable BAR code to rebar.c (Ilpo Järvinen) - Add pci_rebar_size_supported() and use it in i915 and xe (Ilpo Järvinen) - Add pci_rebar_get_max_size() and use it in xe and amdgpu (Ilpo Järvinen) Power management and error handling: - For drivers using PCI legacy suspend, save config state at suspend so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - For devices with no driver or a driver that lacks power management, save config state at hibernate so that state (not any earlier state from enumeration, probe, or error recovery) will be restored when resuming (Lukas Wunner) - Save device config space on device addition, before driver binding, so error recovery works more reliably (Lukas Wunner) - Drop pci_save_state() from several drivers that no longer need it since the PCI core always does it and pci_restore_state() no longer invalidates the saved state (Lukas Wunner) - Document use of pci_save_state() by drivers to capture the state they want restored during error recovery (Lukas Wunner) Power control: - Add a struct pci_ops.assert_perst() function pointer to assert/deassert PCIe PERST# and implement it for the qcom driver (Krishna Chaitanya Chundru) - Add DT binding and pwrctrl driver for the Toshiba TC9563 PCIe switch, which must be held in reset after poweron so the pwrctrl driver can configure the switch via I2C before bringing up the links (Krishna Chaitanya Chundru) Endpoint framework: - Convert the endpoint doorbell test to use a threaded IRQ to fix a 'sleeping while atomic' issue (Bhanu Seshu Kumar Valluri) - Add endpoint VNTB MSI doorbell support to reduce latency between host and endpoint (Frank Li) New native PCIe controller drivers: - Add CIX Sky1 host controller DT binding and driver (Hans Zhang) - Add NXP S32G host controller DT binding and driver (Vincent Guittot) - Add Renesas RZ/G3S host controller DT binding and driver (Claudiu Beznea) - Add SpacemiT K1 host controller DT binding and driver (Alex Elder) Amlogic Meson PCIe controller driver: - Update DT binding to name DBI region 'dbi', not 'elbi', and update driver to support both (Manivannan Sadhasivam) Apple PCIe controller driver: - Move struct pci_host_bridge allocation from pci_host_common_init() to callers, which significantly simplifies pcie-apple (Marc Zyngier) Broadcom STB PCIe controller driver: - Disable advertising ASPM L0s support correctly (Jim Quinlan) - Add a panic/die handler to print diagnostic info in case PCIe caused an unrecoverable abort (Jim Quinlan) Cadence PCIe controller driver: - Add module support for Cadence platform host and endpoint controller driver (Manikandan K Pillai) - Split headers into 'legacy' (LGA) and 'high perf' (HPA) to prepare for new CIX Sky1 driver (Manikandan K Pillai) MediaTek PCIe controller driver: - Convert DT binding to YAML schema (Christian Marangi) - Add Airoha AN7583 DT compatible and driver support (Christian Marangi) Qualcomm PCIe controller driver: - Add Qualcomm Kaanapali to SM8550 DT binding (Qiang Yu) - Add required 'power-domains' and 'resets' to qcom sa8775p, sc7280, sc8280xp, sm8150, sm8250, sm8350, sm8450, sm8550, x1e80100 DT schemas (Krzysztof Kozlowski) - Look up OPP using both frequency and data rate (not just frequency) so RPMh votes can account for both (Krishna Chaitanya Chundru) Rockchip DesignWare PCIe controller driver: - Add Rockchip RK3528 compatible strings in DT binding (Yao Zi) STMicroelectronics STM32MP25 PCIe controller driver: - Fix a race between link training and endpoint register initialization (Christian Bruel) - Align endpoint allocations to match the ATU requirements (Christian Bruel) Synopsys DesignWare PCIe controller driver: - Clear L1 PM Substate Capability 'Supported' bits unless glue driver says it's supported, which prevents users from enabling non-working L1SS. Currently only qcom and tegra194 support L1SS (Bjorn Helgaas) - Remove now-superfluous L1SS disable code from tegra194 (Bjorn Helgaas) - Configure L1SS support in dw-rockchip when DT says 'supports-clkreq' (Shawn Lin) TI Keystone PCIe controller driver: - Fail the probe instead of silently succeeding if ks_pcie_of_data didn't specify Root Complex or Endpoint mode (Siddharth Vadapalli) - Make keystone buildable as a loadable module, except on ARM32 where hook_fault_code() is __init (Siddharth Vadapalli)" * tag 'pci-v6.19-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (100 commits) MAINTAINERS: Add Manivannan Sadhasivam as PCI/pwrctrl maintainer MAINTAINERS: Add CIX Sky1 PCIe controller driver maintainer PCI: sky1: Add PCIe host support for CIX Sky1 dt-bindings: PCI: Add CIX Sky1 PCIe Root Complex bindings PCI: cadence: Add support for High Perf Architecture (HPA) controller MAINTAINERS: Add NXP S32G PCIe controller driver maintainer PCI: s32g: Add NXP S32G PCIe controller driver (RC) PCI: dwc: Add register and bitfield definitions dt-bindings: PCI: s32g: Add NXP S32G PCIe controller PCI: Add Renesas RZ/G3S host controller driver PCI: host-generic: Move bridge allocation outside of pci_host_common_init() dt-bindings: PCI: Add Renesas RZ/G3S PCIe controller binding PCI: Validate pci_rebar_size_supported() input Documentation: PCI: Amend error recovery doc with pci_save_state() rules treewide: Drop pci_save_state() after pci_restore_state() PCI/ERR: Ensure error recoverability at all times PCI/PM: Stop needlessly clearing state_saved on enumeration and thaw PCI/PM: Reinstate clearing state_saved in legacy and !PM codepaths PCI: dw-rockchip: Configure L1SS support PCI: tegra194: Remove unnecessary L1SS disable code ...
2025-11-27Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski1-1/+1
Conflicts: net/xdp/xsk.c 0ebc27a4c67d ("xsk: avoid data corruption on cq descriptor number") 8da7bea7db69 ("xsk: add indirect call for xsk_destruct_skb") 30ed05adca4a ("xsk: use a smaller new lock for shared pool case") https://lore.kernel.org/20251127105450.4a1665ec@canb.auug.org.au https://lore.kernel.org/eb4eee14-7e24-4d1b-b312-e9ea738fefee@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-27fbnic: Replace use of internal PCS w/ Designware XPCSAlexander Duyck4-63/+54
As we have exposed the PCS registers via the SWMII we can now start looking at connecting the XPCS driver to those registers and let it mange the PCS instead of us doing it directly from the fbnic driver. For now this just gets us the ability to detect link. The hope is in the future to add some of the vendor specific registers to begin enabling XPCS configuration of the interface. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374325295.959489.14521115864034905277.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Add SW shim for MDIO interface to PMD and PCSAlexander Duyck5-0/+205
In order for us to support a PCS device we need to add an MDIO bus to allow the drivers to have access to the registers for the device. This change adds such an interface. The interface will consist of 2 PHY addrs, the first one consisting of a PMD and PCS, and the second just being a PCS. There is a need for 2 PHYs addrs due to the fact that in order to support the 50GBase-CR2 mode we will need to access and configure the PCS vendor registers and RSFEC registers from the second lane identical to the first. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374324532.959489.15389723111560978054.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Add handler for reporting link down event statisticsAlexander Duyck1-0/+9
We were previously not displaying the number of link_down_events tracked by the device. With this change we should now be able to display the value. The value itself tracks the calls from the phylink interface to the mac_link_down call. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374323824.959489.6915296616773178954.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Add logic to track PMD state via MAC/PCS signalsAlexander Duyck9-52/+175
One complication with the design of our part is that the PMD doesn't provide a direct signal to the host. Instead we have visibility to signals that the PCS provides to the MAC that allow us to check the link state through that. We will need to account for several things in the PMD and firmware when managing the link. Specifically when the link first starts to come up the PMD will cause the link to flap. This is due to the firmware starting a training cycle when the link is first detected. This will cause link flapping if we were to immediately report link up when the PCS first detects it. To address that we are adding a pmd_state variable that is meant to be a countdown of sorts indicating the state of the PMD. If the link is down or has been reconfigured the PMD will start out in the initialize state. By default the link is assumed to be in the SEND_DATA state if it is available on initial link inspection. If link is detected while in the initialize state the PMD state will switch to training, and if after 4 seconds the link is still stable we will transition to link_ready, and finally the send_data state. With this we can avoid link flapping when a cable is first connected to the NIC. One side effect of this is that we need to pull the link state away from the PCS. For now we use a union of the PCS link state register value and the pmd_state. The plan is to add a PMD register to report the pmd_state to the phylink interface. With that we can then look at switching over to the use of the XPCS driver for fbnic instead of having an internal one. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374323107.959489.14951134213387615059.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-27fbnic: Rename PCS IRQ to MAC IRQ as it is actually a MAC interruptAlexander Duyck6-32/+32
Throughout several spots in the code I had called out the IRQ as being related to the PCS. However the actual IRQ is a part of the MAC and it is just exposing PCS data. To more accurately reflect the owner of the calls this change makes it so that we rename the functions and values that are taking in the interrupt value and processing it to reflect that it is a MAC call and not a PCS one. This change is mostly motivated by the fact that we will be moving the handling of this interrupt from being PCS focused to being more PMA/PMD focused as this will drive the phydev driver that I am adding instead of driving the PCS directly. Signed-off-by: Alexander Duyck <alexanderduyck@fb.com> Link: https://patch.msgid.link/176374322373.959489.12018231545479053860.stgit@ahduyck-xeon-server.home.arpa Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2025-11-26eth: fbnic: Fix counter roll-over issueMohsin Bashir1-1/+1
Fix a potential counter roll-over issue in fbnic_mbx_alloc_rx_msgs() when calculating descriptor slots. The issue occurs when head - tail results in a large positive value (unsigned) and the compiler interprets head - tail - 1 as a signed value. Since FBNIC_IPC_MBX_DESC_LEN is a power of two, use a masking operation, which is a common way of avoiding this problem when dealing with these sort of ring space calculations. Fixes: da3cde08209e ("eth: fbnic: Add FW communication mechanism") Signed-off-by: Mohsin Bashir <mohsin.bashr@gmail.com> Link: https://patch.msgid.link/20251125211704.3222413-1-mohsin.bashr@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-25drivers: net: fbnic: Return the true error in fbnic_alloc_napi_vectors.Dimitri Daskalakis1-1/+1
The error case in fbnic_alloc_napi_vectors defaulted to returning ENOMEM. This can mask the true error case, causing confusion. Signed-off-by: Dimitri Daskalakis <dimitri.daskalakis1@gmail.com> Link: https://patch.msgid.link/20251124200518.1848029-1-dimitri.daskalakis1@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-24treewide: Drop pci_save_state() after pci_restore_state()Lukas Wunner1-1/+0
In 2009, commit c82f63e411f1 ("PCI: check saved state before restore") changed the behavior of pci_restore_state() such that it became necessary to call pci_save_state() afterwards, lest recovery from subsequent PCI errors fails. The commit has just been reverted and so all the pci_save_state() after pci_restore_state() calls that have accumulated in the tree are now superfluous. Drop them. Two drivers chose a different approach to achieve the same result: drivers/scsi/ipr.c and drivers/net/ethernet/intel/e1000e/netdev.c set the pci_dev's "state_saved" flag to true before calling pci_restore_state(). Drop this as well. Signed-off-by: Lukas Wunner <lukas@wunner.de> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Acked-by: Dave Jiang <dave.jiang@intel.com> Acked-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> # qat Link: https://patch.msgid.link/c2b28cc4defa1b743cf1dedee23c455be98b397a.1760274044.git.lukas@wunner.de
2025-11-20eth: fbnic: access @pp through netmem_desc instead of pageByungchul Park1-1/+2
To eliminate the use of struct page in page pool, the page pool users should use netmem descriptor and APIs instead. Make fbnic access @pp through netmem_desc instead of page. Signed-off-by: Byungchul Park <byungchul@sk.com> Link: https://patch.msgid.link/20251120011118.73253-1-byungchul@sk.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-11-17eth: fbnic: Configure RDE settings for pause frameMohsin Bashir4-4/+28
fbnic supports pause frames. When pause frames are enabled presumably user