author     Jakub Kicinski <kuba@kernel.org>  2025-04-14 17:29:18 -0700
committer  Jakub Kicinski <kuba@kernel.org>  2025-04-14 17:29:19 -0700
commit     a4cba7e98e35e618b3b4e1fce9746caad67cb308 (patch)
tree       51efa82b68d2b7c23ce030b326108d3a64b0ae11
parent     4129a75a76a755c151be1ebc0e7d618a4d21e41f (diff)
parent     3db55f8cc8d329a97e06fb44347b64a0ca44e780 (diff)
Merge branch 'net-mlx5-hws-refactor-action-ste-handling'
Tariq Toukan says:

====================
net/mlx5: HWS, Refactor action STE handling

This patch series by Vlad refactors how action STEs are handled for
hardware steering.

Definitions
-----------

* STE (Steering Table Entry): a building block for steering rules.
  Simple rules consist of a single STE that specifies both the match
  value and what actions to do. For more complex rules we have one or
  more match STEs that point to one or more action STEs. It is these
  action STEs which this patch series is primarily concerned with.

* RTC (Rule Table Context): a table that contains STEs. A matcher
  currently consists of a match RTC and, if necessary, an action RTC.
  This patch series decouples action RTCs from matchers and moves
  action RTCs to a central pool.

* Matcher: a logical container for steering rules. While the items
  above describe hardware concepts, a matcher is purely a software
  construct.

Current situation
-----------------

As mentioned above, a matcher currently consists of a match RTC (or
more, in case of complex matchers) and zero or one action RTCs. An
action RTC is only allocated if the matcher contains sufficiently
complicated action templates, or many actions.

When adding a rule, we decide based on its action template whether it
requires action STEs. If yes, we allocate the required number of
action STEs from the matcher's action RTC.

When updating a rule, we need to prevent the rule ever being in an
invalid state. So we need to allocate and write new action STEs first,
then update the match STE to point to them, and finally release the
old action STEs. So there is a state when a rule needs double the
action STEs it normally uses.

Thus, for a given matcher of log_sz=N, log_action_ste_sz=A, the action
RTC log_size is (N + A + 1). We need enough space to hold all the
rules' action STEs, and effectively double that space to account for
the not very common case of rules being updated. We could manage with
much fewer extra action STEs, but RTCs are allocated in powers of two.
This results in effective utilization of action RTCs of 50%, outside
rule update cases.

This is further complicated when resizing matchers. To avoid updating
all the rules to point to new match STEs, we keep existing action RTCs
around as resize_data, and only free them when the matcher is freed.

Action STE pool
---------------

This patch series decouples action RTCs from matchers by creating a
per-queue pool. When a rule needs to allocate action STEs it does so
from the pool, creating a new RTC if needed. During update, two sets
of action STEs are in use, possibly from different RTCs.

The pool is sharded per-queue to avoid lock contention. Each per-queue
pool consists of 3 elements, corresponding to rx-only, tx-only and
rx-and-tx use cases. The series takes this approach because
bidirectional rules require that their action STEs have the same index
in the rx- and tx-RTCs, and using a single RTC would result in
unidirectional rules wasting the STEs for the unused direction.

Pool elements, in turn, consist of a list of RTCs. The driver
progressively allocates larger RTCs as they are needed, to amortize
the cost of allocation.

Allocation of elements (STEs) inside RTCs is modelled by an existing
mechanism, somewhat confusingly also known as a pool. The first few
patches in the series refactor this abstraction to simplify it and
adapt it to the new schema.

Finally, this series implements periodic cleanup of unused action RTCs
as a new feature.
Previously, once a matcher allocated an action RTC, it would only be
freed when the matcher was freed. This resulted in a lot of wasted
memory for matchers that had previously grown, but were now mostly
unused. Conversely, action STE pools have a timestamp of when they
were last used. A cleanup routine periodically checks all pools. If a
pool's last usage was too far in the past, it is destroyed.

Benchmarks
----------

The test module creates a batch of (1 << 18) rules per queue and then
deletes them, in a loop. The rules are complex enough to require two
action STEs per rule. Each queue is manipulated from a separate kernel
workqueue, so there is a 1:1 correspondence between threads and
queues.

There are sleep statements between insert and delete batches so that
memory usage can be evaluated using `free -m`. The numbers below are
the diff between base memory usage (without the mlx5 module inserted)
and peak usage while running a test. The values are rounded to the
nearest hundred megabytes. The `queues` column lists how many queues
the test used.

  queues    mem_before    mem_after
       1        1300M         800M
       4        4000M        2300M
       8        7300M        3300M

Across all of the tests, insertion and deletion rates are the same
before and after these patches.

Summary of the patches
----------------------

* Patch 1: Fix matcher action template attach to avoid overrunning the
  buffer and to correctly report errors.
* Patches 2-7: Clean up the existing pool abstraction. Clarify its
  semantics and use cases, simplify the API and its callers.
* Patch 8: Implement the new action STE pool structure.
* Patch 9: Use the action STE pool when manipulating rules.
* Patch 10: Remove the action RTC from the matcher.
* Patch 11: Add logic to periodically check and free unused action
  RTCs.
* Patch 12: Export action STE tables in debugfs for our dump tool.
====================

Link: https://patch.msgid.link/1744312662-356571-1-git-send-email-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
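As a quick worked illustration of the (N + A + 1) sizing discussed
above, the following standalone sketch plugs in the benchmark
parameters (2^18 rules per queue, two action STEs per rule, i.e. A=1);
it is illustrative only and not part of the series:

/* Illustrative, standalone calculation of the pre-series per-matcher
 * action RTC sizing; the variable names follow the cover letter.
 */
#include <stdio.h>

int main(void)
{
	unsigned int n = 18; /* matcher log_sz: 1 << 18 rules */
	unsigned int a = 1;  /* log_action_ste_sz: two action STEs per rule */
	unsigned long long rtc_slots = 1ULL << (n + a + 1); /* 2^20 slots */
	unsigned long long steady_use = 1ULL << (n + a);    /* 2^19 STEs */

	/* Outside of rule updates, only half of the action RTC is used. */
	printf("action RTC slots: %llu, steady-state STEs: %llu (%.0f%%)\n",
	       rtc_slots, steady_use, 100.0 * steady_use / rtc_slots);
	return 0;
}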
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/Makefile | 3
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c | 56
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h | 8
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c | 467
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h | 69
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c | 98
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.h | 9
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.c | 1
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/cmd.h | 1
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.c | 8
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/context.h | 2
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.c | 71
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/debug.h | 2
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/internal.h | 1
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.c | 420
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/matcher.h | 26
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.c | 515
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/pool.h | 103
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.c | 69
-rw-r--r--  drivers/net/ethernet/mellanox/mlx5/core/steering/hws/rule.h | 12
20 files changed, 972 insertions(+), 969 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/Makefile b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
index 568bbe5f83f5..d292e6a9e22c 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/Makefile
+++ b/drivers/net/ethernet/mellanox/mlx5/core/Makefile
@@ -154,7 +154,8 @@ mlx5_core-$(CONFIG_MLX5_HW_STEERING) += steering/hws/cmd.o \
steering/hws/vport.o \
steering/hws/bwc_complex.o \
steering/hws/fs_hws_pools.o \
- steering/hws/fs_hws.o
+ steering/hws/fs_hws.o \
+ steering/hws/action_ste_pool.o
#
# SF device
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
index b5332c54d4fb..bef4d25c1a2a 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.c
@@ -238,6 +238,7 @@ hws_action_fixup_stc_attr(struct mlx5hws_context *ctx,
enum mlx5hws_table_type table_type,
bool is_mirror)
{
+ struct mlx5hws_pool *pool;
bool use_fixup = false;
u32 fw_tbl_type;
u32 base_id;
@@ -253,13 +254,11 @@ hws_action_fixup_stc_attr(struct mlx5hws_context *ctx,
use_fixup = true;
break;
}
+ pool = stc_attr->ste_table.ste_pool;
if (!is_mirror)
- base_id = mlx5hws_pool_chunk_get_base_id(stc_attr->ste_table.ste_pool,
- &stc_attr->ste_table.ste);
+ base_id = mlx5hws_pool_get_base_id(pool);
else
- base_id =
- mlx5hws_pool_chunk_get_base_mirror_id(stc_attr->ste_table.ste_pool,
- &stc_attr->ste_table.ste);
+ base_id = mlx5hws_pool_get_base_mirror_id(pool);
*fixup_stc_attr = *stc_attr;
fixup_stc_attr->ste_table.ste_obj_id = base_id;
@@ -337,7 +336,7 @@ __must_hold(&ctx->ctrl_lock)
if (!mlx5hws_context_cap_dynamic_reparse(ctx))
stc_attr->reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE;
- obj_0_id = mlx5hws_pool_chunk_get_base_id(stc_pool, stc);
+ obj_0_id = mlx5hws_pool_get_base_id(stc_pool);
/* According to table/action limitation change the stc_attr */
use_fixup = hws_action_fixup_stc_attr(ctx, stc_attr, &fixup_stc_attr, table_type, false);
@@ -353,7 +352,7 @@ __must_hold(&ctx->ctrl_lock)
if (table_type == MLX5HWS_TABLE_TYPE_FDB) {
u32 obj_1_id;
- obj_1_id = mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, stc);
+ obj_1_id = mlx5hws_pool_get_base_mirror_id(stc_pool);
use_fixup = hws_action_fixup_stc_attr(ctx, stc_attr,
&fixup_stc_attr,
@@ -393,11 +392,11 @@ __must_hold(&ctx->ctrl_lock)
stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_DROP;
stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT;
stc_attr.stc_offset = stc->offset;
- obj_id = mlx5hws_pool_chunk_get_base_id(stc_pool, stc);
+ obj_id = mlx5hws_pool_get_base_id(stc_pool);
mlx5hws_cmd_stc_modify(ctx->mdev, obj_id, &stc_attr);
if (table_type == MLX5HWS_TABLE_TYPE_FDB) {
- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, stc);
+ obj_id = mlx5hws_pool_get_base_mirror_id(stc_pool);
mlx5hws_cmd_stc_modify(ctx->mdev, obj_id, &stc_attr);
}
@@ -1575,17 +1574,15 @@ hws_action_create_dest_match_range_definer(struct mlx5hws_context *ctx)
return definer;
}
-static struct mlx5hws_matcher_action_ste *
+static struct mlx5hws_range_action_table *
hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx,
struct mlx5hws_definer *definer,
u32 miss_ft_id)
{
struct mlx5hws_cmd_rtc_create_attr rtc_attr = {0};
- struct mlx5hws_action_default_stc *default_stc;
- struct mlx5hws_matcher_action_ste *table_ste;
+ struct mlx5hws_range_action_table *table_ste;
struct mlx5hws_pool_attr pool_attr = {0};
struct mlx5hws_pool *ste_pool, *stc_pool;
- struct mlx5hws_pool_chunk *ste;
u32 *rtc_0_id, *rtc_1_id;
u32 obj_id;
int ret;
@@ -1604,7 +1601,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx,
pool_attr.table_type = MLX5HWS_TABLE_TYPE_FDB;
pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE;
- pool_attr.flags = MLX5HWS_POOL_FLAGS_FOR_STE_ACTION_POOL;
pool_attr.alloc_log_sz = 1;
table_ste->pool = mlx5hws_pool_create(ctx, &pool_attr);
if (!table_ste->pool) {
@@ -1616,8 +1612,6 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx,
rtc_0_id = &table_ste->rtc_0_id;
rtc_1_id = &table_ste->rtc_1_id;
ste_pool = table_ste->pool;
- ste = &table_ste->ste;
- ste->order = 1;
rtc_attr.log_size = 0;
rtc_attr.log_depth = 0;
@@ -1629,18 +1623,16 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx,
rtc_attr.fw_gen_wqe = true;
rtc_attr.is_scnd_range = true;
- obj_id = mlx5hws_pool_chunk_get_base_id(ste_pool, ste);
+ obj_id = mlx5hws_pool_get_base_id(ste_pool);
rtc_attr.pd = ctx->pd_num;
rtc_attr.ste_base = obj_id;
- rtc_attr.ste_offset = ste->offset;
rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx);
rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(MLX5HWS_TABLE_TYPE_FDB, false);
/* STC is a single resource (obj_id), use any STC for the ID */
stc_pool = ctx->stc_pool;
- default_stc = ctx->common_res.default_stc;
- obj_id = mlx5hws_pool_chunk_get_base_id(stc_pool, &default_stc->default_hit);
+ obj_id = mlx5hws_pool_get_base_id(stc_pool);
rtc_attr.stc_base = obj_id;
ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_0_id);
@@ -1650,11 +1642,11 @@ hws_action_create_dest_match_range_table(struct mlx5hws_context *ctx,
}
/* Create mirror RTC */
- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(ste_pool, ste);
+ obj_id = mlx5hws_pool_get_base_mirror_id(ste_pool);
rtc_attr.ste_base = obj_id;
rtc_attr.table_type = mlx5hws_table_get_res_fw_ft_type(MLX5HWS_TABLE_TYPE_FDB, true);
- obj_id = mlx5hws_pool_chunk_get_base_mirror_id(stc_pool, &default_stc->default_hit);
+ obj_id = mlx5hws_pool_get_base_mirror_id(stc_pool);
rtc_attr.stc_base = obj_id;
ret = mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_1_id);
@@ -1677,9 +1669,9 @@ free_ste:
return NULL;
}
-static void
-hws_action_destroy_dest_match_range_table(struct mlx5hws_context *ctx,
- struct mlx5hws_matcher_action_ste *table_ste)
+static void hws_action_destroy_dest_match_range_table(
+ struct mlx5hws_context *ctx,
+ struct mlx5hws_range_action_table *table_ste)
{
mutex_lock(&ctx->ctrl_lock);
@@ -1691,12 +1683,11 @@ hws_action_destroy_dest_match_range_table(struct mlx5hws_context *ctx,
mutex_unlock(&ctx->ctrl_lock);
}
-static int
-hws_action_create_dest_match_range_fill_table(struct mlx5hws_context *ctx,
- struct mlx5hws_matcher_action_ste *table_ste,
- struct mlx5hws_action *hit_ft_action,
- struct mlx5hws_definer *range_definer,
- u32 min, u32 max)
+static int hws_action_create_dest_match_range_fill_table(
+ struct mlx5hws_context *ctx,
+ struct mlx5hws_range_action_table *table_ste,
+ struct mlx5hws_action *hit_ft_action,
+ struct mlx5hws_definer *range_definer, u32 min, u32 max)
{
struct mlx5hws_wqe_gta_data_seg_ste match_wqe_data = {0};
struct mlx5hws_wqe_gta_data_seg_ste range_wqe_data = {0};
@@ -1792,7 +1783,7 @@ mlx5hws_action_create_dest_match_range(struct mlx5hws_context *ctx,
u32 min, u32 max, u32 flags)
{
struct mlx5hws_cmd_stc_modify_attr stc_attr = {0};
- struct mlx5hws_matcher_action_ste *table_ste;
+ struct mlx5hws_range_action_table *table_ste;
struct mlx5hws_action *hit_ft_action;
struct mlx5hws_definer *definer;
struct mlx5hws_action *action;
@@ -1837,7 +1828,6 @@ mlx5hws_action_create_dest_match_range(struct mlx5hws_context *ctx,
stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT;
stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_JUMP_TO_STE_TABLE;
stc_attr.reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE;
- stc_attr.ste_table.ste = table_ste->ste;
stc_attr.ste_table.ste_pool = table_ste->pool;
stc_attr.ste_table.match_definer_id = ctx->caps->trivial_match_definer;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h
index 64b76075f7f8..25fa0d4c9221 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action.h
@@ -118,6 +118,12 @@ struct mlx5hws_action_template {
u8 only_term;
};
+struct mlx5hws_range_action_table {
+ struct mlx5hws_pool *pool;
+ u32 rtc_0_id;
+ u32 rtc_1_id;
+};
+
struct mlx5hws_action {
u8 type;
u8 flags;
@@ -186,7 +192,7 @@ struct mlx5hws_action {
size_t size;
} remove_header;
struct {
- struct mlx5hws_matcher_action_ste *table_ste;
+ struct mlx5hws_range_action_table *table_ste;
struct mlx5hws_action *hit_ft_action;
struct mlx5hws_definer *definer;
} range;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c
new file mode 100644
index 000000000000..5766a9c82f96
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.c
@@ -0,0 +1,467 @@
+// SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB
+/* Copyright (c) 2025 NVIDIA Corporation & Affiliates */
+
+#include "internal.h"
+
+static const char *
+hws_pool_opt_to_str(enum mlx5hws_pool_optimize opt)
+{
+ switch (opt) {
+ case MLX5HWS_POOL_OPTIMIZE_NONE:
+ return "rx-and-tx";
+ case MLX5HWS_POOL_OPTIMIZE_ORIG:
+ return "rx-only";
+ case MLX5HWS_POOL_OPTIMIZE_MIRROR:
+ return "tx-only";
+ default:
+ return "unknown";
+ }
+}
+
+static int
+hws_action_ste_table_create_pool(struct mlx5hws_context *ctx,
+ struct mlx5hws_action_ste_table *action_tbl,
+ enum mlx5hws_pool_optimize opt, size_t log_sz)
+{
+ struct mlx5hws_pool_attr pool_attr = { 0 };
+
+ pool_attr.pool_type = MLX5HWS_POOL_TYPE_STE;
+ pool_attr.table_type = MLX5HWS_TABLE_TYPE_FDB;
+ pool_attr.flags = MLX5HWS_POOL_FLAG_BUDDY;
+ pool_attr.opt_type = opt;
+ pool_attr.alloc_log_sz = log_sz;
+
+ action_tbl->pool = mlx5hws_pool_create(ctx, &pool_attr);
+ if (!action_tbl->pool) {
+ mlx5hws_err(ctx, "Failed to allocate STE pool\n");
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int hws_action_ste_table_create_single_rtc(
+ struct mlx5hws_context *ctx,
+ struct mlx5hws_action_ste_table *action_tbl,
+ enum mlx5hws_pool_optimize opt, size_t log_sz, bool tx)
+{
+ struct mlx5hws_cmd_rtc_create_attr rtc_attr = { 0 };
+ u32 *rtc_id;
+
+ rtc_attr.log_depth = 0;
+ rtc_attr.update_index_mode = MLX5_IFC_RTC_STE_UPDATE_MODE_BY_OFFSET;
+ /* Action STEs use the default always hit definer. */
+ rtc_attr.match_definer_0 = ctx->caps->trivial_match_definer;
+ rtc_attr.is_frst_jumbo = false;
+ rtc_attr.miss_ft_id = 0;
+ rtc_attr.pd = ctx->pd_num;
+ rtc_attr.reparse_mode = mlx5hws_context_get_reparse_mode(ctx);
+
+ if (tx) {
+ rtc_attr.table_type = FS_FT_FDB_TX;
+ rtc_attr.ste_base =
+ mlx5hws_pool_get_base_mirror_id(action_tbl->pool);
+ rtc_attr.stc_base =
+ mlx5hws_pool_get_base_mirror_id(ctx->stc_pool);
+ rtc_attr.log_size =
+ opt == MLX5HWS_POOL_OPTIMIZE_ORIG ? 0 : log_sz;
+ rtc_id = &action_tbl->rtc_1_id;
+ } else {
+ rtc_attr.table_type = FS_FT_FDB_RX;
+ rtc_attr.ste_base = mlx5hws_pool_get_base_id(action_tbl->pool);
+ rtc_attr.stc_base = mlx5hws_pool_get_base_id(ctx->stc_pool);
+ rtc_attr.log_size =
+ opt == MLX5HWS_POOL_OPTIMIZE_MIRROR ? 0 : log_sz;
+ rtc_id = &action_tbl->rtc_0_id;
+ }
+
+ return mlx5hws_cmd_rtc_create(ctx->mdev, &rtc_attr, rtc_id);
+}
+
+static int
+hws_action_ste_table_create_rtcs(struct mlx5hws_context *ctx,
+ struct mlx5hws_action_ste_table *action_tbl,
+ enum mlx5hws_pool_optimize opt, size_t log_sz)
+{
+ int err;
+
+ err = hws_action_ste_table_create_single_rtc(ctx, action_tbl, opt,
+ log_sz, false);
+ if (err)
+ return err;
+
+ err = hws_action_ste_table_create_single_rtc(ctx, action_tbl, opt,
+ log_sz, true);
+ if (err) {
+ mlx5hws_cmd_rtc_destroy(ctx->mdev, action_tbl->rtc_0_id);
+ return err;
+ }
+
+ return 0;
+}
+
+static void
+hws_action_ste_table_destroy_rtcs(struct mlx5hws_action_ste_table *action_tbl)
+{
+ mlx5hws_cmd_rtc_destroy(action_tbl->pool->ctx->mdev,
+ action_tbl->rtc_1_id);
+ mlx5hws_cmd_rtc_destroy(action_tbl->pool->ctx->mdev,
+ action_tbl->rtc_0_id);
+}
+
+static int
+hws_action_ste_table_create_stc(struct mlx5hws_context *ctx,
+ struct mlx5hws_action_ste_table *action_tbl)
+{
+ struct mlx5hws_cmd_stc_modify_attr stc_attr = { 0 };
+
+ stc_attr.action_offset = MLX5HWS_ACTION_OFFSET_HIT;
+ stc_attr.action_type = MLX5_IFC_STC_ACTION_TYPE_JUMP_TO_STE_TABLE;
+ stc_attr.reparse_mode = MLX5_IFC_STC_REPARSE_IGNORE;
+ stc_attr.ste_table.ste_pool = action_tbl->pool;
+ stc_attr.ste_table.match_definer_id = ctx->caps->trivial_match_definer;
+
+ return mlx5hws_action_alloc_single_stc(ctx, &stc_attr,
+ MLX5HWS_TABLE_TYPE_FDB,
+ &action_tbl->stc);
+}
+
+static struct mlx5hws_action_ste_table *
+hws_action_ste_table_alloc(struct mlx5hws_action_ste_pool_element *parent_elem)
+{
+ enum mlx5hws_pool_optimize opt = parent_elem->opt;
+ struct mlx5hws_context *ctx = parent_elem->ctx;
+ struct mlx5hws_action_ste_table *action_tbl;
+ size_t log_sz;
+ int err;
+
+ log_sz = min(parent_elem->log_sz ?
+ parent_elem->log_sz +
+ MLX5HWS_ACTION_STE_TABLE_STEP_LOG_SZ :
+ MLX5HWS_ACTION_STE_TABLE_INIT_LOG_SZ,
+ MLX5HWS_ACTION_STE_TABLE_MAX_LOG_SZ);
+
+ action_tbl = kzalloc(sizeof(*action_tbl), GFP_KERNEL);
+ if (!action_tbl)
+ return ERR_PTR(-ENOMEM);
+
+ err = hws_action_ste_table_create_pool(ctx, action_tbl, opt, log_sz);
+ if (err)
+ goto free_tbl;
+
+ err = hws_action_ste_table_create_rtcs(ctx, action_tbl, opt, log_sz);
+ if (err)
+ goto destroy_pool;
+
+ err = hws_action_ste_table_create_stc(ctx, action_tbl);
+ if (err)
+ goto destroy_rtcs;
+
+ action_tbl->parent_elem = parent_elem;
+ INIT_LIST_HEAD(&action_tbl->list_node);
+ action_tbl->last_used = jiffies;
+ list_add(&action_tbl->list_node, &parent_elem->available);
+ parent_elem->log_sz = log_sz;
+
+ mlx5hws_dbg(ctx,
+ "Allocated %s action STE table log_sz %zu; STEs (%d, %d); RTCs (%d, %d); STC %d\n",
+ hws_pool_opt_to_str(opt), log_sz,
+ mlx5hws_pool_get_base_id(action_tbl->pool),
+ mlx5hws_pool_get_base_mirror_id(action_tbl->pool),
+ action_tbl->rtc_0_id, action_tbl->rtc_1_id,
+ action_tbl->stc.offset);
+
+ return action_tbl;
+
+destroy_rtcs:
+ hws_action_ste_table_destroy_rtcs(action_tbl);
+destroy_pool:
+ mlx5hws_pool_destroy(action_tbl->pool);
+free_tbl:
+ kfree(action_tbl);
+
+ return ERR_PTR(err);
+}
+
+static void
+hws_action_ste_table_destroy(struct mlx5hws_action_ste_table *action_tbl)
+{
+ struct mlx5hws_context *ctx = action_tbl->parent_elem->ctx;
+
+ mlx5hws_dbg(ctx,
+ "Destroying %s action STE table: STEs (%d, %d); RTCs (%d, %d); STC %d\n",
+ hws_pool_opt_to_str(action_tbl->parent_elem->opt),
+ mlx5hws_pool_get_base_id(action_tbl->pool),
+ mlx5hws_pool_get_base_mirror_id(action_tbl->pool),
+ action_tbl->rtc_0_id, action_tbl->rtc_1_id,
+ action_tbl->stc.offset);
+
+ mlx5hws_action_free_single_stc(ctx, MLX5HWS_TABLE_TYPE_FDB,
+ &action_tbl->stc);
+ hws_action_ste_table_destroy_rtcs(action_tbl);
+ mlx5hws_pool_destroy(action_tbl->pool);
+
+ list_del(&action_tbl->list_node);
+ kfree(action_tbl);
+}
+
+static int
+hws_action_ste_pool_element_init(struct mlx5hws_context *ctx,
+ struct mlx5hws_action_ste_pool_element *elem,
+ enum mlx5hws_pool_optimize opt)
+{
+ elem->ctx = ctx;
+ elem->opt = opt;
+ INIT_LIST_HEAD(&elem->available);
+ INIT_LIST_HEAD(&elem->full);
+
+ return 0;
+}
+
+static void hws_action_ste_pool_element_destroy(
+ struct mlx5hws_action_ste_pool_element *elem)
+{
+ struct mlx5hws_action_ste_table *action_tbl, *p;
+
+ /* This should be empty, but attempt to free its elements anyway. */
+ list_for_each_entry_safe(action_tbl, p, &elem->full, list_node)
+ hws_action_ste_table_destroy(action_tbl);
+
+ list_for_each_entry_safe(action_tbl, p, &elem->available, list_node)
+ hws_action_ste_table_destroy(action_tbl);
+}
+
+static int hws_action_ste_pool_init(struct mlx5hws_context *ctx,
+ struct mlx5hws_action_ste_pool *pool)
+{
+ enum mlx5hws_pool_optimize opt;
+ int err;
+
+ mutex_init(&pool->lock);
+
+ /* Rules which are added for both RX and TX must use the same action STE
+ * indices for both. If we were to use a single table, then RX-only and
+ * TX-only rules would waste the unused entries. Thus, we use separate
+ * table sets for the three cases.
+ */
+ for (opt = MLX5HWS_POOL_OPTIMIZE_NONE; opt < MLX5HWS_POOL_OPTIMIZE_MAX;
+ opt++) {
+ err = hws_action_ste_pool_element_init(ctx, &pool->elems[opt],
+ opt);
+ if (err)
+ goto destroy_elems;
+ pool->elems[opt].parent_pool = pool;
+ }
+
+ return 0;
+
+destroy_elems:
+ while (opt-- > MLX5HWS_POOL_OPTIMIZE_NONE)
+ hws_action_ste_pool_element_destroy(&pool->elems[opt]);
+
+ return err;
+}
+
+static void hws_action_ste_pool_destroy(struct mlx5hws_action_ste_pool *pool)
+{
+ int opt;
+
+ for (opt = MLX5HWS_POOL_OPTIMIZE_MAX - 1;
+ opt >= MLX5HWS_POOL_OPTIMIZE_NONE; opt--)
+ hws_action_ste_pool_element_destroy(&pool->elems[opt]);
+}
+
+static void hws_action_ste_pool_element_collect_stale(
+ struct mlx5hws_action_ste_pool_element *elem, struct list_head *cleanup)
+{
+ struct mlx5hws_action_ste_table *action_tbl, *p;
+ unsigned long expire_time, now;
+
+ expire_time = secs_to_jiffies(MLX5HWS_ACTION_STE_POOL_EXPIRE_SECONDS);
+ now = jiffies;
+
+ list_for_each_entry_safe(action_tbl, p, &elem->available, list_node) {
+ if (mlx5hws_pool_full(action_tbl->pool) &&
+ time_before(action_tbl->last_used + expire_time, now))
+ list_move(&action_tbl->list_node, cleanup);
+ }
+}
+
+static void hws_action_ste_table_cleanup_list(struct list_head *cleanup)
+{
+ struct mlx5hws_action_ste_table *action_tbl, *p;
+
+ list_for_each_entry_safe(action_tbl, p, cleanup, list_node)
+ hws_action_ste_table_destroy(action_tbl);
+}
+
+static void hws_action_ste_pool_cleanup(struct work_struct *work)
+{
+ enum mlx5hws_pool_optimize opt;
+ struct mlx5hws_context *ctx;
+ LIST_HEAD(cleanup);
+ int i;
+
+ ctx = container_of(work, struct mlx5hws_context,
+ action_ste_cleanup.work);
+
+ for (i = 0; i < ctx->queues; i++) {
+ struct mlx5hws_action_ste_pool *p = &ctx->action_ste_pool[i];
+
+ mutex_lock(&p->lock);
+ for (opt = MLX5HWS_POOL_OPTIMIZE_NONE;
+ opt < MLX5HWS_POOL_OPTIMIZE_MAX; opt++)
+ hws_action_ste_pool_element_collect_stale(
+ &p->elems[opt], &cleanup);
+ mutex_unlock(&p->lock);
+ }
+
+ hws_action_ste_table_cleanup_list(&cleanup);
+
+ schedule_delayed_work(&ctx->action_ste_cleanup,
+ secs_to_jiffies(
+ MLX5HWS_ACTION_STE_POOL_CLEANUP_SECONDS));
+}
+
+int mlx5hws_action_ste_pool_init(struct mlx5hws_context *ctx)
+{
+ struct mlx5hws_action_ste_pool *pool;
+ size_t queues = ctx->queues;
+ int i, err;
+
+ pool = kcalloc(queues, sizeof(*pool), GFP_KERNEL);
+ if (!pool)
+ return -ENOMEM;
+
+ for (i = 0; i < queues; i++) {
+ err = hws_action_ste_pool_init(ctx, &pool[i]);
+ if (err)
+ goto free_pool;
+ }
+
+ ctx->action_ste_pool = pool;
+
+ INIT_DELAYED_WORK(&ctx->action_ste_cleanup,
+ hws_action_ste_pool_cleanup);
+ schedule_delayed_work(
+ &ctx->action_ste_cleanup,
+ secs_to_jiffies(MLX5HWS_ACTION_STE_POOL_CLEANUP_SECONDS));
+
+ return 0;
+
+free_pool:
+ while (i--)
+ hws_action_ste_pool_destroy(&pool[i]);
+ kfree(pool);
+
+ return err;
+}
+
+void mlx5hws_action_ste_pool_uninit(struct mlx5hws_context *ctx)
+{
+ size_t queues = ctx->queues;
+ int i;
+
+ cancel_delayed_work_sync(&ctx->action_ste_cleanup);
+
+ for (i = 0; i < queues; i++)
+ hws_action_ste_pool_destroy(&ctx->action_ste_pool[i]);
+
+ kfree(ctx->action_ste_pool);
+}
+
+static struct mlx5hws_action_ste_pool_element *
+hws_action_ste_choose_elem(struct mlx5hws_action_ste_pool *pool,
+ bool skip_rx, bool skip_tx)
+{
+ if (skip_rx)
+ return &pool->elems[MLX5HWS_POOL_OPTIMIZE_MIRROR];
+
+ if (skip_tx)
+ return &pool->elems[MLX5HWS_POOL_OPTIMIZE_ORIG];
+
+ return &pool->elems[MLX5HWS_POOL_OPTIMIZE_NONE];
+}
+
+static int
+hws_action_ste_table_chunk_alloc(struct mlx5hws_action_ste_table *action_tbl,
+ struct mlx5hws_action_ste_chunk *chunk)
+{
+ int err;
+
+ err = mlx5hws_pool_chunk_alloc(action_tbl->pool, &chunk->ste);
+ if (err)
+ return err;
+
+ chunk->action_tbl = action_tbl;
+ action_tbl->last_used = jiffies;
+
+ return 0;
+}
+
+int mlx5hws_action_ste_chunk_alloc(struct mlx5hws_action_ste_pool *pool,
+ bool skip_rx, bool skip_tx,
+ struct mlx5hws_action_ste_chunk *chunk)
+{
+ struct mlx5hws_action_ste_pool_element *elem;
+ struct mlx5hws_action_ste_table *action_tbl;
+ bool found;
+ int err;
+
+ if (skip_rx && skip_tx)
+ return -EINVAL;
+
+ mutex_lock(&pool->lock);
+
+ elem = hws_action_ste_choose_elem(pool, skip_rx, skip_tx);
+
+ mlx5hws_dbg(elem->ctx,
+ "Allocating action STEs skip_rx %d skip_tx %d order %d\n",
+ skip_rx, skip_tx, chunk->ste.order);
+
+ found = false;
+ list_for_each_entry(action_tbl, &elem->available, list_node) {
+ if (!hws_action_ste_table_chunk_alloc(action_tbl, chunk)) {
+ found = true;
+ break;
+ }
+ }
+
+ if (!found) {
+ action_tbl = hws_action_ste_table_alloc(elem);
+ if (IS_ERR(action_tbl)) {
+ err = PTR_ERR(action_tbl);
+ goto out;
+ }
+
+ err = hws_action_ste_table_chunk_alloc(action_tbl, chunk);
+ if (err)
+ goto out;
+ }
+
+ if (mlx5hws_pool_empty(action_tbl->pool))
+ list_move(&action_tbl->list_node, &elem->full);
+
+ err = 0;
+
+out:
+ mutex_unlock(&pool->lock);
+
+ return err;
+}
+
+void mlx5hws_action_ste_chunk_free(struct mlx5hws_action_ste_chunk *chunk)
+{
+ struct mutex *lock = &chunk->action_tbl->parent_elem->parent_pool->lock;
+
+ mlx5hws_dbg(chunk->action_tbl->pool->ctx,
+ "Freeing action STEs offset %d order %d\n",
+ chunk->ste.offset, chunk->ste.order);
+
+ mutex_lock(lock);
+ mlx5hws_pool_chunk_free(chunk->action_tbl->pool, &chunk->ste);
+ chunk->action_tbl->last_used = jiffies;
+ list_move(&chunk->action_tbl->list_node,
+ &chunk->action_tbl->parent_elem->available);
+ mutex_unlock(lock);
+}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h
new file mode 100644
index 000000000000..a8ba97359e31
--- /dev/null
+++ b/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/action_ste_pool.h
@@ -0,0 +1,69 @@
+/* SPDX-License-Identifier: GPL-2.0 OR Linux-OpenIB */
+/* Copyright (c) 2025 NVIDIA Corporation & Affiliates */
+
+#ifndef ACTION_STE_POOL_H_
+#define ACTION_STE_POOL_H_
+
+#define MLX5HWS_ACTION_STE_TABLE_INIT_LOG_SZ 10
+#define MLX5HWS_ACTION_STE_TABLE_STEP_LOG_SZ 1
+#define MLX5HWS_ACTION_STE_TABLE_MAX_LOG_SZ 20
+
+#define MLX5HWS_ACTION_STE_POOL_CLEANUP_SECONDS 300
+#define MLX5HWS_ACTION_STE_POOL_EXPIRE_SECONDS 300
+
+struct mlx5hws_action_ste_pool_element;
+
+struct mlx5hws_action_ste_table {
+ struct mlx5hws_action_ste_pool_element *parent_elem;
+ /* Wraps the RTC and STE range for this given action. */
+ struct mlx5hws_pool *pool;
+ /* Match STEs use this STC to jump to this pool's RTC. */
+ struct mlx5hws_pool_chunk stc;
+ u32 rtc_0_id;
+ u32 rtc_1_id;
+ struct list_head list_node;
+ unsigned long last_used;
+};
+
+struct mlx5hws_action_ste_pool_element {
+ struct mlx5hws_context *ctx;
+ struct mlx5hws_action_ste_pool *parent_pool;
+ size_t log_sz; /* Size of the largest table so far. */
+ enum mlx5hws_pool_optimize opt;
+ struct list_head available;
+ struct list_head full;
+};
+
+/* Central repository of action STEs. The context contains one of these pools
+ * per queue.
+ */
+struct mlx5hws_action_ste_pool {
+ /* Protects the entire pool. We have one pool per queue and only one
+ * operation can be active per rule at a given time. Thus this lock
+ * protects solely against concurrent garbage collection and we expect
+ * very little contention.
+ */
+ struct mutex lock;
+ struct mlx5hws_action_ste_pool_element elems[MLX5HWS_POOL_OPTIMIZE_MAX];
+};
+
+/* A chunk of STEs and the table it was allocated from. Used by rules. */
+struct mlx5hws_action_ste_chunk {
+ struct mlx5hws_action_ste_table *action_tbl;
+ struct mlx5hws_pool_chunk ste;
+};
+
+int mlx5hws_action_ste_pool_init(struct mlx5hws_context *ctx);
+
+void mlx5hws_action_ste_pool_uninit(struct mlx5hws_context *ctx);
+
+/* Callers are expected to fill chunk->ste.order. On success, this function
+ * populates chunk->tbl and chunk->ste.offset.
+ */
+int mlx5hws_action_ste_chunk_alloc(struct mlx5hws_action_ste_pool *pool,
+ bool skip_rx, bool skip_tx,
+ struct mlx5hws_action_ste_chunk *chunk);
+
+void mlx5hws_action_ste_chunk_free(struct mlx5hws_action_ste_chunk *chunk);
+
+#endif /* ACTION_STE_POOL_H_ */
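
For orientation, here is a minimal caller-side sketch of the
allocation contract documented above. It is an assumption-laden
illustration, not code from the patches: the helper name, queue index
and surrounding error handling are invented for the example.

/* Hypothetical helper: allocate two action STEs (order 1), usable in
 * both directions, from the given queue's pool. On success the pool
 * fills chunk->action_tbl and chunk->ste.offset; the rule writes its
 * action STEs there and later calls mlx5hws_action_ste_chunk_free().
 */
static int example_alloc_action_stes(struct mlx5hws_context *ctx,
				     u16 queue_id,
				     struct mlx5hws_action_ste_chunk *chunk)
{
	chunk->ste.order = 1; /* caller fills the order, as noted above */

	return mlx5hws_action_ste_chunk_alloc(&ctx->action_ste_pool[queue_id],
					      false /* skip_rx */,
					      false /* skip_tx */,
					      chunk);
}

A symmetric teardown would simply call
mlx5hws_action_ste_chunk_free(chunk), which frees the STEs and moves
the owning table back to its element's available list.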
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/steering/hws/bwc.c b/drivers/net/ethernet/m