From 31f61ac33032ee87ea404d6d996ba2c386502a36 Mon Sep 17 00:00:00 2001 From: Amery Hung Date: Tue, 14 Apr 2026 12:10:14 -0700 Subject: bpf: Refactor dynptr mutability tracking Redefine dynptr mutability and fix inconsistency in the verifier and kfunc signatures. Dynptr mutability is at two levels. The first is the bpf_dynptr structure and the second is the memory the dynptr points to. The verifier currently tracks the mutability of the bpf_dynptr struct through helper and kfunc prototypes, where "const struct bpf_dynptr *" means the structure itself is immutable. The second level is tracked in the upper bit of bpf_dynptr->size at runtime and is not changed in this patch. There are two types of inconsistency in the verifier regarding the mutability of the bpf_dynptr struct. First, there are many existing kfuncs whose prototypes are wrong. For example, bpf_dynptr_adjust() mutates a dynptr's start and offset but marks the argument as a const pointer. At the same time, many other kfuncs do not mutate the dynptr but mark the argument as mutable. Second, the verifier currently does not honor the const qualifier in kfunc prototypes, as it decides whether to tag the arg_type with MEM_RDONLY based on the register state. Since all the verifier cares about is preventing CONST_PTR_TO_DYNPTR from being destroyed in callbacks and global subprograms, redefine the mutability at the bpf_dynptr level to just bpf_dynptr_kern->data. Then, explicitly prohibit passing CONST_PTR_TO_DYNPTR to an argument tagged with MEM_UNINIT or OBJ_RELEASE. The mutability of a dynptr's view is not really interesting so drop MEM_RDONLY annotation for dynptr from the helpers and kfuncs. Plus, if the mutability of the entire bpf_dynptr were to be done correctly, it would kill the bpf_dynptr_adjust() usage in callbacks and global subprograms.

Implementation-wise:
- First, make sure all kfunc args are correctly tagged: Tag the dynptr argument of bpf_dynptr_file_discard() with OBJ_RELEASE.
- Then, in process_dynptr_func(), make sure CONST_PTR_TO_DYNPTR cannot be passed to an argument tagged with MEM_UNINIT or OBJ_RELEASE. For MEM_UNINIT, it is already checked by is_dynptr_reg_valid_uninit(). For OBJ_RELEASE, check against OBJ_RELEASE instead of MEM_RDONLY and drop a now identical check in unmark_stack_slots_dynptr().
- Remove the mutually exclusive check between MEM_UNINIT and MEM_RDONLY, but don't add a MEM_UNINIT and OBJ_RELEASE version as it is obviously wrong.

Note that while this patch stops following the C semantics for the mutability of bpf_dynptr, the prototypes of kfuncs are still fixed to maintain the correct C semantics in the implementation. Adding or removing the const qualifier does not break backward compatibility. In addition, fix kfuncs dropping the const qualifier when casting the opaque bpf_dynptr to bpf_dynptr_kern. In test_kfunc_dynptr_param.c, initialize dynptr to 0 to avoid -Wuninitialized-const-pointer warning.

Signed-off-by: Amery Hung Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/bpf/20260414191014.1218567-1-ameryhung@gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index b4b703c90ca9..3cb6b9e70080 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -3622,8 +3622,8 @@ static inline int bpf_fd_reuseport_array_update_elem(struct bpf_map *map, struct bpf_key *bpf_lookup_user_key(s32 serial, u64 flags); struct bpf_key *bpf_lookup_system_key(u64 id); void bpf_key_put(struct bpf_key *bkey); -int bpf_verify_pkcs7_signature(struct bpf_dynptr *data_p, - struct bpf_dynptr *sig_p, +int bpf_verify_pkcs7_signature(const struct bpf_dynptr *data_p, + const struct bpf_dynptr *sig_p, struct bpf_key *trusted_keyring); #else @@ -3641,8 +3641,8 @@ static inline void bpf_key_put(struct bpf_key *bkey) { } -static inline int bpf_verify_pkcs7_signature(struct 
bpf_dynptr *data_p, - struct bpf_dynptr *sig_p, +static inline int bpf_verify_pkcs7_signature(const struct bpf_dynptr *data_p, + const struct bpf_dynptr *sig_p, struct bpf_key *trusted_keyring) { return -EOPNOTSUPP; -- cgit v1.2.3 From f7a6b9eaff3e6693ba3b19c5812e28538049bbf2 Mon Sep 17 00:00:00 2001 From: Alan Maguire Date: Fri, 17 Apr 2026 15:30:18 +0100 Subject: bpf: Extend BTF UAPI vlen, kinds to use unused bits BTF maximum vlen is encoded using 16 bits with a maximum vlen of 65535. This has sufficed for structs, function parameters and enumerated type values. However, with upcoming BTF location information - in particular information about inline sites - this limit is surpassed. Use bits 16-23 - currently unused in BTF info - to extend to 24 bits, giving a max vlen of (2^24 - 1), or 16 million. Also extend BTF kind encoding from 5 to 7 bits, giving a maximum available number of kinds of 128. Since with the BTF location work we use another 3 kinds, we are fast approaching the current limit of 32. Convert BTF_MAX_* values to enums to allow them to be encoded in kernel BTF; this will allow us to detect if the running kernel supports a 24-bit vlen or not. Add one for max _possible_ (not used) kind. Fix up a few places in the kernel where a 16-bit vlen is assumed; remove BTF_INFO_MASK as now all bits are used. The vlen expansion was suggested by Andrii in [1]; the kind expansion is tackled here too as it may be needed also to support new kinds in BTF. 
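The widened layout described in this commit message can be sketched as follows. This is only an illustration under stated assumptions: it assumes the kind field stays at bit 24 and kind_flag at bit 31, and the EX_* and ex_* names are hypothetical, not the UAPI macros. It also shows why a u16 return type would silently truncate a vlen above 65535.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Assumed layout after the change (hypothetical EX_* names):
 *   bits  0-23  vlen      (previously bits 0-15)
 *   bits 24-30  kind      (previously bits 24-28)
 *   bit  31     kind_flag
 */
#define EX_VLEN_BITS	24
#define EX_KIND_BITS	7
#define EX_VLEN_MASK	0x00ffffffu
#define EX_KIND_MASK	0x7fu
#define EX_KIND_SHIFT	24
#define EX_KFLAG_SHIFT	31

/* All 32 bits are in use, which is why a mask of valid bits is no longer needed. */
static_assert(EX_VLEN_BITS + EX_KIND_BITS + 1 == 32, "info word is fully used");

/* Pack kind, vlen and kind_flag into one info word. */
static inline uint32_t ex_btf_info(uint32_t kind, uint32_t vlen, uint32_t kflag)
{
	return ((kflag & 1u) << EX_KFLAG_SHIFT) |
	       ((kind & EX_KIND_MASK) << EX_KIND_SHIFT) |
	       (vlen & EX_VLEN_MASK);
}

static inline uint32_t ex_info_vlen(uint32_t info)
{
	return info & EX_VLEN_MASK;
}

static inline uint32_t ex_info_kind(uint32_t info)
{
	return (info >> EX_KIND_SHIFT) & EX_KIND_MASK;
}

static inline uint32_t ex_info_kflag(uint32_t info)
{
	return info >> EX_KFLAG_SHIFT;
}

/* What a u16 return type would do to a 24-bit vlen. */
static inline uint16_t ex_info_vlen_u16(uint32_t info)
{
	return (uint16_t)ex_info_vlen(info);
}
```

With this layout a vlen of 70000 round-trips through ex_info_vlen(), while the u16 variant would return 70000 modulo 65536.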
[1] https://lore.kernel.org/bpf/CAEf4BzZx=X6vGqcA8SPU6D+v6k+TR=ZewebXMuXtpmML058piw@mail.gmail.com/ Suggested-by: Andrii Nakryiko Signed-off-by: Alan Maguire Acked-by: Mykyta Yatsenko Link: https://lore.kernel.org/r/20260417143023.1551481-2-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index 48108471c5b1..c82d0d689059 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -415,12 +415,12 @@ static inline bool btf_type_is_array(const struct btf_type *t) return BTF_INFO_KIND(t->info) == BTF_KIND_ARRAY; } -static inline u16 btf_type_vlen(const struct btf_type *t) +static inline u32 btf_type_vlen(const struct btf_type *t) { return BTF_INFO_VLEN(t->info); } -static inline u16 btf_vlen(const struct btf_type *t) +static inline u32 btf_vlen(const struct btf_type *t) { return btf_type_vlen(t); } -- cgit v1.2.3 From 12628ffaf98b708a80857a462613119b9e16de4c Mon Sep 17 00:00:00 2001 From: Mykyta Yatsenko Date: Wed, 22 Apr 2026 12:41:07 -0700 Subject: bpf: Add bpf_prog_run_array_sleepable() Add bpf_prog_run_array_sleepable() for running BPF program arrays on faultable tracepoints. Unlike bpf_prog_run_array_uprobe(), it includes per-program recursion checking for private stack safety and hardcodes is_uprobe to false. Skip dummy_bpf_prog at the top of the loop. When bpf_prog_array_delete_safe() replaces a detached program with dummy_bpf_prog on allocation failure, the dummy is statically allocated and has NULL active, stats, and aux fields. Identify it by prog->len == 0, since every real program has at least one instruction. Keep bpf_prog_run_array_uprobe() unchanged for uprobe callers. 
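The control flow described above (skip the dummy placeholder, guard against recursion, AND the results together) can be modelled in user space roughly as below. This is a simplified sketch, not the kernel code: struct ex_prog, struct ex_item and ex_run_array() are hypothetical stand-ins, the recursion counter is a plain int rather than per-CPU state, and RCU and migration handling are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a BPF program. */
struct ex_prog {
	uint32_t len;		/* 0 marks the dummy placeholder */
	int active;		/* recursion depth (per-CPU in the kernel) */
	uint32_t misses;	/* runs skipped because of recursion */
	uint32_t (*run)(struct ex_prog *prog, const void *ctx);
};

/* Hypothetical stand-in for an array item; a NULL prog ends the array. */
struct ex_item {
	struct ex_prog *prog;
};

static inline uint32_t ex_ret0(struct ex_prog *prog, const void *ctx)
{
	(void)prog; (void)ctx;
	return 0;
}

static inline uint32_t ex_ret1(struct ex_prog *prog, const void *ctx)
{
	(void)prog; (void)ctx;
	return 1;
}

static inline uint32_t ex_run_array(struct ex_item *item, const void *ctx)
{
	struct ex_prog *prog;
	uint32_t ret = 1;

	if (!item)
		return ret;

	for (; (prog = item->prog); item++) {
		/* The dummy has no counters or entry point, so skip it first. */
		if (!prog->len)
			continue;

		/* Recursion guard: count a miss and undo the increment. */
		if (++prog->active != 1) {
			prog->misses++;
			prog->active--;
			continue;
		}

		assert(prog->run);	/* real programs always have an entry point */
		ret &= prog->run(prog, ctx);
		prog->active--;
	}
	return ret;
}
```

Checking len before touching any counter matters in this model for the same reason given in the message: the placeholder has nothing valid behind its other fields.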
Signed-off-by: Mykyta Yatsenko Acked-by: Kumar Kartikeya Dwivedi Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-2-99005dff21ef@meta.com Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf.h | 50 ++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 50 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 3cb6b9e70080..d3aea3931b85 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -3079,6 +3079,56 @@ void bpf_dynptr_set_null(struct bpf_dynptr_kern *ptr); void bpf_dynptr_set_rdonly(struct bpf_dynptr_kern *ptr); void bpf_prog_report_arena_violation(bool write, unsigned long addr, unsigned long fault_ip); +static __always_inline u32 +bpf_prog_run_array_sleepable(const struct bpf_prog_array *array, + const void *ctx, bpf_prog_run_fn run_prog) +{ + const struct bpf_prog_array_item *item; + struct bpf_prog *prog; + struct bpf_run_ctx *old_run_ctx; + struct bpf_trace_run_ctx run_ctx; + u32 ret = 1; + + if (unlikely(!array)) + return ret; + + migrate_disable(); + + run_ctx.is_uprobe = false; + + old_run_ctx = bpf_set_run_ctx(&run_ctx.run_ctx); + item = &array->items[0]; + while ((prog = READ_ONCE(item->prog))) { + /* Skip dummy_bpf_prog placeholder (len == 0) */ + if (unlikely(!prog->len)) { + item++; + continue; + } + + if (unlikely(!bpf_prog_get_recursion_context(prog))) { + bpf_prog_inc_misses_counter(prog); + bpf_prog_put_recursion_context(prog); + item++; + continue; + } + + run_ctx.bpf_cookie = item->bpf_cookie; + + if (!prog->sleepable) { + guard(rcu)(); + ret &= run_prog(prog, ctx); + } else { + ret &= run_prog(prog, ctx); + } + + bpf_prog_put_recursion_context(prog); + item++; + } + bpf_reset_run_ctx(old_run_ctx); + migrate_enable(); + return ret; +} + #else /* !CONFIG_BPF_SYSCALL */ static inline struct bpf_prog *bpf_prog_get(u32 ufd) { -- cgit v1.2.3 From 57918341dd19e5ca8a77622ffae3db19e5ba4cc7 Mon Sep 17 00:00:00 2001 From: Mykyta Yatsenko Date: Wed, 22 Apr 
2026 12:41:08 -0700 Subject: bpf: Add sleepable support for classic tracepoint programs Add trace_call_bpf_faultable(), a variant of trace_call_bpf() for faultable tracepoints that supports sleepable BPF programs. It uses rcu_tasks_trace for lifetime protection and bpf_prog_run_array_sleepable() for per-program RCU flavor selection, following the uprobe_prog_run() pattern. Restructure perf_syscall_enter() and perf_syscall_exit() to run BPF programs before perf event processing. Previously, BPF ran after the per-cpu perf trace buffer was allocated under preempt_disable, requiring cleanup via perf_swevent_put_recursion_context() on filter. Now BPF runs in faultable context before preempt_disable, reading syscall arguments from local variables instead of the per-cpu trace record, removing the dependency on buffer allocation. This allows sleepable BPF programs to execute and avoids unnecessary buffer allocation when BPF filters the event. The perf event submission path (buffer allocation, fill, submit) remains under preempt_disable as before. Since BPF no longer runs within the buffer allocation context, the fake_regs output parameter to perf_trace_buf_alloc() is no longer needed and is replaced with NULL. Add an attach-time check in __perf_event_set_bpf_prog() to reject sleepable BPF_PROG_TYPE_TRACEPOINT programs on non-syscall tracepoints, since only syscall tracepoints run in faultable context. This prepares the classic tracepoint runtime and attach paths for sleepable programs. The verifier changes to allow loading sleepable BPF_PROG_TYPE_TRACEPOINT programs are in a subsequent patch. 
To: Peter Zijlstra To: Steven Rostedt Signed-off-by: Mykyta Yatsenko Acked-by: Kumar Kartikeya Dwivedi # for BPF bits Acked-by: Steven Rostedt Link: https://lore.kernel.org/bpf/20260422-sleepable_tracepoints-v13-3-99005dff21ef@meta.com Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/trace_events.h | 6 ++++++ 1 file changed, 6 insertions(+) (limited to 'include/linux') diff --git a/include/linux/trace_events.h b/include/linux/trace_events.h index 40a43a4c7caf..d49338c44014 100644 --- a/include/linux/trace_events.h +++ b/include/linux/trace_events.h @@ -770,6 +770,7 @@ trace_trigger_soft_disabled(struct trace_event_file *file) #ifdef CONFIG_BPF_EVENTS unsigned int trace_call_bpf(struct trace_event_call *call, void *ctx); +unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx); int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie); void perf_event_detach_bpf_prog(struct perf_event *event); int perf_event_query_prog_array(struct perf_event *event, void __user *info); @@ -792,6 +793,11 @@ static inline unsigned int trace_call_bpf(struct trace_event_call *call, void *c return 1; } +static inline unsigned int trace_call_bpf_faultable(struct trace_event_call *call, void *ctx) +{ + return 1; +} + static inline int perf_event_attach_bpf_prog(struct perf_event *event, struct bpf_prog *prog, u64 bpf_cookie) { -- cgit v1.2.3 From 9b9f0b42703ceb88332bcb19453c4288c2683e34 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Wed, 22 Apr 2026 20:35:01 -0700 Subject: bpf: Prepare verifier logs for upcoming kfunc stack arguments This change prepares verifier log reporting for upcoming kfunc stack argument support. Currently verifier log code mostly assumes that an argument can be described directly by a register number. That works for arguments passed in `R1` to `R5`, but it does not work once kfunc arguments can also be passed on the stack. 
Introduce an opaque `argno_t` type that encodes both register-based and arg-based references. Four helpers form the interface: - argno_from_reg(regno): create from a register number - argno_from_arg(arg): create from a 1-based arg number - reg_from_argno(a): extract register number, or -1 - arg_from_argno(a): extract arg number, or -1 reg_arg_name() converts an argno_t to a human-readable string for verifier logs: "R%d" for register arguments, or "*(R11-off)" for stack arguments beyond R5. Update selftests accordingly. Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260423033501.2539667-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index b148f816f25b..d5b4303315dd 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -913,6 +913,7 @@ struct bpf_verifier_env { * e.g., in reg_type_str() to generate reg_type string */ char tmp_str_buf[TMP_STR_BUF_LEN]; + char tmp_arg_name[32]; struct bpf_insn insn_buf[INSN_BUF_SIZE]; struct bpf_insn epilogue_buf[INSN_BUF_SIZE]; struct bpf_scc_callchain callchain_buf; -- cgit v1.2.3 From 246ad6e5ee259669692bdb7fb353e8c5d5bba628 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Wed, 22 Apr 2026 20:35:06 -0700 Subject: bpf: Introduce bpf register BPF_REG_PARAMS Introduce BPF_REG_PARAMS as a dedicated BPF register for stack argument accesses. It occupies the BPF register number 11 (R11), which is used as the base pointer for the stack argument area, keeping it separate from the R10-based (BPF_REG_FP) program stack. The kernel-internal hidden register BPF_REG_AX previously occupied slot 11 (MAX_BPF_REG). With BPF_REG_PARAMS taking that slot, BPF_REG_AX moves to slot 12 and MAX_BPF_EXT_REG increases accordingly. 
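As a quick sanity check of the renumbering described above, the slot values can be written out as constants. The EX_* names are hypothetical mirrors of the kernel macros; the only assumption is that MAX_BPF_REG is 11, which follows from the statement that BPF_REG_PARAMS is R11.

```c
#include <assert.h>

/* Hypothetical mirrors of the macros touched by this patch. */
enum {
	EX_MAX_BPF_REG		= 11,			/* R0..R10 precede it */
	EX_BPF_REG_PARAMS	= EX_MAX_BPF_REG,	/* R11: stack argument base */
	EX_BPF_REG_AX		= EX_MAX_BPF_REG + 1,	/* hidden register, now slot 12 */
	EX_MAX_BPF_EXT_REG	= EX_MAX_BPF_REG + 2,	/* one past the last extended reg */
};

/* The hidden register must still fit inside the extended register file. */
static_assert(EX_BPF_REG_AX < EX_MAX_BPF_EXT_REG, "AX within extended range");
static_assert(EX_BPF_REG_PARAMS != EX_BPF_REG_AX, "PARAMS and AX use distinct slots");
```

Any array sized by the extended register count grows by one entry as a result.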
Acked-by: Puranjay Mohan Acked-by: Kumar Kartikeya Dwivedi Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260423033506.2542005-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index 1ec6d5ba64cc..b77d0b06db6e 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -58,8 +58,9 @@ struct ctl_table_header; #define BPF_REG_H BPF_REG_9 /* hlen, callee-saved */ /* Kernel hidden auxiliary/helper register. */ -#define BPF_REG_AX MAX_BPF_REG -#define MAX_BPF_EXT_REG (MAX_BPF_REG + 1) +#define BPF_REG_PARAMS MAX_BPF_REG +#define BPF_REG_AX (MAX_BPF_REG + 1) +#define MAX_BPF_EXT_REG (MAX_BPF_REG + 2) #define MAX_BPF_JIT_REG MAX_BPF_EXT_REG /* unused opcode to mark special call to bpf_tail_call() helper */ -- cgit v1.2.3 From 4439328d3878c97fdf5ddec828a43ea07c388452 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Wed, 22 Apr 2026 20:35:11 -0700 Subject: bpf: Reuse MAX_BPF_FUNC_ARGS for maximum number of arguments Currently, MAX_BPF_FUNC_ARGS is used for tracepoint related progs where the number of parameters cannot exceed MAX_BPF_FUNC_ARGS. Here, MAX_BPF_FUNC_ARGS is reused to set a limit on the number of arguments for bpf functions and kfuncs. The current value for MAX_BPF_FUNC_ARGS is 12, which should be sufficient for the majority of bpf functions and kfuncs. 
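The split implied by this limit can be sketched as a small classifier. The comment added by this patch states that args 1-5 are passed in registers and args 6-12 via stack arg slots; everything else here is an assumption for illustration, in particular the ex_* names and the 1-based slot numbering returned by ex_arg_stack_slot().

```c
#include <assert.h>
#include <stdbool.h>

#define EX_MAX_FUNC_ARGS	12	/* value of MAX_BPF_FUNC_ARGS */
#define EX_MAX_REG_ARGS		5	/* args 1-5 travel in R1-R5 */

static_assert(EX_MAX_FUNC_ARGS > EX_MAX_REG_ARGS, "some args live on the stack");

/* Argument numbers are 1-based. */
static inline bool ex_arg_valid(int arg)
{
	return arg >= 1 && arg <= EX_MAX_FUNC_ARGS;
}

/* Register number (1..5) for a register argument, 0 otherwise. */
static inline int ex_arg_reg(int arg)
{
	return (ex_arg_valid(arg) && arg <= EX_MAX_REG_ARGS) ? arg : 0;
}

/* Hypothetical 1-based stack slot (1..7) for args 6-12, 0 otherwise. */
static inline int ex_arg_stack_slot(int arg)
{
	return (ex_arg_valid(arg) && arg > EX_MAX_REG_ARGS) ?
	       arg - EX_MAX_REG_ARGS : 0;
}
```

With a limit of 12, at most seven arguments ever need a stack slot in this model.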
Acked-by: Puranjay Mohan Acked-by: Kumar Kartikeya Dwivedi Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260423033511.2542870-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 5 +++++ 1 file changed, 5 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index d3aea3931b85..715b6df9c403 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -1151,6 +1151,11 @@ struct bpf_prog_offload { /* The longest tracepoint has 12 args. * See include/trace/bpf_probe.h + * + * Also reuse this macro for maximum number of arguments a BPF function + * or a kfunc can have. Args 1-5 are passed in registers, args 6-12 via + * stack arg slots. The JIT may map some stack arg slots to registers based + * on the native calling convention (e.g., arg 6 to R9 on x86-64). */ #define MAX_BPF_FUNC_ARGS 12 -- cgit v1.2.3 From 256f0071f9b61ae5028f749449fd3fdad015889d Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Fri, 24 Apr 2026 15:52:42 -0700 Subject: bpf: representation and basic operations on circular numbers MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit adds basic definitions for cnum32/cnum64. This is a unified numeric range representation for signed and unsigned domains. Inspired by an old post from Shung-Hsi Yu [1] and paper [2]. The correctness of the operations is verified using the cbmc model checker; the test source code can be found in a separate repo [3]. The cnum64_cnum32_intersect() function is notable, because it handles several cases that verifier.c:deduce_bounds_64_from_32() does not. Given: - a is a 64-bit range - b is a 32-bit range - t is a refined 64-bit range, such that ∀ v ∈ a, (u32)v ∈ b: v ∈ t. 
cnum64_cnum32_intersect() makes the following deductions: (A): 'b' is a sub-range of the first or the last 32-bit sub-range of 'a': 64-bit number axis ---> N*2^32 (N+1)*2^32 (N+2)*2^32 (N+3)*2^32 ||------|---|=====|-------||----------|=====|-------||----------|=====|----|--|| | |< b >| |< b >| |< b >| | | | | | |<--+--------------------------- a ---------------------------+--->| | | |<-------------------------- t -------------------------->| (B) 'b' does not intersect with the first of the last 32-bit sub-range of 'a': N*2^32 (N+1)*2^32 (N+2)*2^32 (N+3)*2^32 ||--|=====|----|----------||--|=====|---------------||--|=====|------------|--|| |< b >| | |< b >| |< b >| | | | | | |<-------------+--------- a -------------------|----------->| | | |<-------- t ------------------>| (C) 'b' crosses 0/U32_MAX boundary: N*2^32 (N+1)*2^32 (N+2)*2^32 (N+3)*2^32 ||===|---------|------|===||===|----------------|===||===|---------|------|===|| |b >| | |< b||b >| |< b||b >| | |< b| | | | | |<-----+----------------- a --------------+-------->| | | |<---------------- t ------------->| Current implementation of deduce_bounds_64_from_32() only handles case (A). [1] https://lore.kernel.org/all/ZTZxoDJJbX9mrQ9w@u94a/ [2] https://jorgenavas.github.io/papers/ACM-TOPLAS-wrapped.pdf [3] https://github.com/eddyz87/cnum-verif/tree/master Signed-off-by: Eduard Zingerman Link: https://lore.kernel.org/r/20260424-cnums-everywhere-rfc-v1-v3-1-ca434b39a486@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/cnum.h | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 80 insertions(+) create mode 100644 include/linux/cnum.h (limited to 'include/linux') diff --git a/include/linux/cnum.h b/include/linux/cnum.h new file mode 100644 index 000000000000..a7259b105b45 --- /dev/null +++ b/include/linux/cnum.h @@ -0,0 +1,80 @@ +/* SPDX-License-Identifier: GPL-2.0-only */ +/* Copyright (c) 2026 Meta Platforms, Inc. and affiliates. 
*/ + +#ifndef _LINUX_CNUM_H +#define _LINUX_CNUM_H + +#include + +/* + * cnum32: a circular number. + * A unified representation for signed and unsigned ranges. + * + * Assume that a 32-bit range is a circle, with 0 being in the 12 o'clock + * position, numbers placed sequentially in clockwise order and U32_MAX + * in the 11 o'clock position. Signed values map onto the same circle: + * S32_MAX sits at 5 o'clock, S32_MIN sits at 6 o'clock (opposite 0), + * negative values occupy the left half and positive values the right half. + * + * @cnum32 represents an arc on this circle drawn clockwise. + * @base corresponds to the first value of the range. + * @size corresponds to the number of integers in the range excluding @base. + * (The @base is excluded to avoid integer overflow when representing the full + * 0..U32_MAX range, which corresponds to 2^32, which can't be stored in u32). + * + * For example: {U32_MAX, 1} corresponds to signed range [-1, 0], + * {S32_MAX, 1} corresponds to unsigned range [S32_MAX, S32_MIN]. 
+ */ +struct cnum32 { + u32 base; + u32 size; +}; + +#define CNUM32_UNBOUNDED ((struct cnum32){ .base = 0, .size = U32_MAX }) +#define CNUM32_EMPTY ((struct cnum32){ .base = U32_MAX, .size = U32_MAX }) + +struct cnum32 cnum32_from_urange(u32 min, u32 max); +struct cnum32 cnum32_from_srange(s32 min, s32 max); +u32 cnum32_umin(struct cnum32 cnum); +u32 cnum32_umax(struct cnum32 cnum); +s32 cnum32_smin(struct cnum32 cnum); +s32 cnum32_smax(struct cnum32 cnum); +struct cnum32 cnum32_intersect(struct cnum32 a, struct cnum32 b); +void cnum32_intersect_with(struct cnum32 *dst, struct cnum32 src); +void cnum32_intersect_with_urange(struct cnum32 *dst, u32 min, u32 max); +void cnum32_intersect_with_srange(struct cnum32 *dst, s32 min, s32 max); +bool cnum32_contains(struct cnum32 cnum, u32 v); +bool cnum32_is_const(struct cnum32 cnum); +bool cnum32_is_empty(struct cnum32 cnum); +struct cnum32 cnum32_add(struct cnum32 a, struct cnum32 b); +struct cnum32 cnum32_negate(struct cnum32 a); + +/* Same as cnum32 but for 64-bit ranges */ +struct cnum64 { + u64 base; + u64 size; +}; + +#define CNUM64_UNBOUNDED ((struct cnum64){ .base = 0, .size = U64_MAX }) +#define CNUM64_EMPTY ((struct cnum64){ .base = U64_MAX, .size = U64_MAX }) + +struct cnum64 cnum64_from_urange(u64 min, u64 max); +struct cnum64 cnum64_from_srange(s64 min, s64 max); +u64 cnum64_umin(struct cnum64 cnum); +u64 cnum64_umax(struct cnum64 cnum); +s64 cnum64_smin(struct cnum64 cnum); +s64 cnum64_smax(struct cnum64 cnum); +struct cnum64 cnum64_intersect(struct cnum64 a, struct cnum64 b); +void cnum64_intersect_with(struct cnum64 *dst, struct cnum64 src); +void cnum64_intersect_with_urange(struct cnum64 *dst, u64 min, u64 max); +void cnum64_intersect_with_srange(struct cnum64 *dst, s64 min, s64 max); +bool cnum64_contains(struct cnum64 cnum, u64 v); +bool cnum64_is_const(struct cnum64 cnum); +bool cnum64_is_empty(struct cnum64 cnum); +struct cnum64 cnum64_add(struct cnum64 a, struct cnum64 b); +struct cnum64 
cnum64_negate(struct cnum64 a); + +struct cnum32 cnum32_from_cnum64(struct cnum64 cnum); +struct cnum64 cnum64_cnum32_intersect(struct cnum64 a, struct cnum32 b); + +#endif /* _LINUX_CNUM_H */ -- cgit v1.2.3 From b93f7180f0bc37336cb26b43aa4796973d84852e Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Fri, 24 Apr 2026 15:52:43 -0700 Subject: bpf: use accessor functions for bpf_reg_state min/max fields Replace direct access to bpf_reg_state->{smin,smax,umin,umax, s32_min,s32_max,u32_min,u32_max}_value with getter/setter inline functions, preparing for future switch to cnum-based internal representation. Signed-off-by: Eduard Zingerman Link: https://lore.kernel.org/r/20260424-cnums-everywhere-rfc-v1-v3-2-ca434b39a486@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 64 ++++++++++++++++++++++++++++++++++++++++++++ 1 file changed, 64 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index d5b4303315dd..bf3ffa56bbe5 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -209,6 +209,70 @@ struct bpf_reg_state { bool precise; }; +static inline s64 reg_smin(const struct bpf_reg_state *reg) +{ + return reg->smin_value; +} + +static inline s64 reg_smax(const struct bpf_reg_state *reg) +{ + return reg->smax_value; +} + +static inline u64 reg_umin(const struct bpf_reg_state *reg) +{ + return reg->umin_value; +} + +static inline u64 reg_umax(const struct bpf_reg_state *reg) +{ + return reg->umax_value; +} + +static inline s32 reg_s32_min(const struct bpf_reg_state *reg) +{ + return reg->s32_min_value; +} + +static inline s32 reg_s32_max(const struct bpf_reg_state *reg) +{ + return reg->s32_max_value; +} + +static inline u32 reg_u32_min(const struct bpf_reg_state *reg) +{ + return reg->u32_min_value; +} + +static inline u32 reg_u32_max(const struct bpf_reg_state *reg) +{ + return reg->u32_max_value; +} + +static inline void reg_set_srange32(struct 
bpf_reg_state *reg, s32 smin, s32 smax) +{ + reg->s32_min_value = smin; + reg->s32_max_value = smax; +} + +static inline void reg_set_urange32(struct bpf_reg_state *reg, u32 umin, u32 umax) +{ + reg->u32_min_value = umin; + reg->u32_max_value = umax; +} + +static inline void reg_set_srange64(struct bpf_reg_state *reg, s64 smin, s64 smax) +{ + reg->smin_value = smin; + reg->smax_value = smax; +} + +static inline void reg_set_urange64(struct bpf_reg_state *reg, u64 umin, u64 umax) +{ + reg->umin_value = umin; + reg->umax_value = umax; +} + enum bpf_stack_slot_type { STACK_INVALID, /* nothing was stored in this stack slot */ STACK_SPILL, /* register spilled into stack */ -- cgit v1.2.3 From bbc631085503a7fde9617be18b0657cc9a83910a Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Fri, 24 Apr 2026 15:52:44 -0700 Subject: bpf: replace min/max fields with struct cnum{32,64} Replace eight independent s64, u64, s32, u32 min/max fields in bpf_reg_state with two circular number fields: - cnum64 for a unified signed/unsigned 64-bit range tracking; - cnum32 for a unified signed/unsigned 32-bit range tracking. Each cnum represents a range as a single arc on the circular number line (base + size), from which signed and unsigned bounds are derived on demand via accessor functions introduced in the preceding commit. Notable changes: - Signed<->unsigned deductions in __reg_deduce_bounds() are removed. - 64<->32 bit deductions are replaced with: - reg->r32 = cnum32_intersect(reg->r32, cnum32_from_cnum64(reg->r64)); this is functionally equivalent to the old code. - reg->r64 = cnum64_cnum32_intersect(reg->r64, reg->r32); this handles a few additional cases, see commit message for "bpf: representation and basic operations on circular numbers". - regs_refine_cond_op() now computes results in terms of operations on sets, e.g. for JNE: /* Complement of the range [val, val] as cnum64. 
*/ lo = (struct cnum64){ val + 1, U64_MAX - 1 }; reg1->r64 = cnum64_intersect(reg1->r64, lo); - For add, sub operations on scalars replace explicit bounds computations with cnum{32,64}_{add,negate}. - For add, sub operations on pointers deduplicate with arithmetic operations on scalars and use cnum{32,64}_{add,negate}. - For and, or, xor operations on scalars remove explicit signed bounds computations. - range_bounds_violation() reduces to checking cnum_is_empty(). - const_tnum_range_mismatch() reduces to checking cnum_is_const(). Selftest adjustments: a few existing tests are updated because a single cnum arc cannot always represent what the old system expressed as the intersection of independent signed and unsigned ranges. For example, if the old system tracked u64=[0, U64_MAX-U32_MAX+2] and s64=[S64_MIN+2, 2] independently, their intersection is a tight two-point set. A single cnum must pick the shorter arc, losing the other constraint. These cases are documented with comments in the adjusted tests. reg_bounds.c is updated with logic similar to cnum64_cnum32_intersect(). Instead of using cnums it inspects intersection between 'b' and first / last / next-after-first / previous-before-last sub-ranges of 'a'. reg_bounds.c is also updated to skip test cases that rely on signed and unsigned ranges intersecting in two intervals, as such cases are not representable by a single cnum. 
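To make the arc arithmetic in this commit message concrete, here is a minimal user-space sketch of the JNE complement and of how signed and unsigned bounds can be derived from a single arc. It is not the kernel implementation: struct ex_cnum64 and the ex_* helpers are hypothetical, the empty range is not modelled, and the formulas are one straightforward reading of the base/size representation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cnum64: the arc covers base .. base + size. */
struct ex_cnum64 {
	uint64_t base;
	uint64_t size;
};

/* The signed projections below rely on the usual two's complement conversion. */
static_assert((int64_t)UINT64_MAX == -1, "two's complement conversion assumed");

/* Complement of the single value val, as in the JNE example. */
static inline struct ex_cnum64 ex_not_equal(uint64_t val)
{
	return (struct ex_cnum64){ val + 1, UINT64_MAX - 1 };
}

/* Membership: the clockwise distance from base must fit in size. */
static inline bool ex_contains(struct ex_cnum64 c, uint64_t v)
{
	return v - c.base <= c.size;
}

/* True if the arc passes from U64_MAX to 0. */
static inline bool ex_wraps_unsigned(struct ex_cnum64 c)
{
	return c.base > UINT64_MAX - c.size;
}

/* True if the arc passes from S64_MAX to S64_MIN. */
static inline bool ex_wraps_signed(struct ex_cnum64 c)
{
	return c.base + (1ull << 63) > UINT64_MAX - c.size;
}

static inline uint64_t ex_umin(struct ex_cnum64 c)
{
	return ex_wraps_unsigned(c) ? 0 : c.base;
}

static inline uint64_t ex_umax(struct ex_cnum64 c)
{
	return ex_wraps_unsigned(c) ? UINT64_MAX : c.base + c.size;
}

static inline int64_t ex_smin(struct ex_cnum64 c)
{
	return ex_wraps_signed(c) ? INT64_MIN : (int64_t)c.base;
}

static inline int64_t ex_smax(struct ex_cnum64 c)
{
	return ex_wraps_signed(c) ? INT64_MAX : (int64_t)(c.base + c.size);
}
```

In this sketch the arc for "not equal to 5" still excludes 5 exactly, even though its unsigned projection is the full [0, U64_MAX] range; the min/max view is coarser than the arc itself.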
The following "crafted" test cases are affected: - reg_bounds_crafted/(s64)[0xffffffffffff8000; 0x7fff] (u32) [0; 0x1f] - reg_bounds_crafted/(s64)[0; 0x1f] (u32) [0xffffffffffffff80; 0x7f] - reg_bounds_crafted/(s64)[0xffffffffffffff80; 0x7f] (u32) [0; 0x1f] - reg_bounds_crafted/(u64)[0; 1] (s32) [1; 2147483648] - reg_bounds_crafted/(u64)[1; 2147483648] (s32) [0; 1] - reg_bounds_crafted/(u64)[0; 0xffffffff00000000] (s64) 0 - reg_bounds_crafted/(u64)0 (s64) [0; 0xffffffff00000000] - reg_bounds_crafted/(u64)[0; 0xffffffff00000000] (s32) 0 - reg_bounds_crafted/(u64)0 (s32) [0; 0xffffffff00000000] - reg_bounds_crafted/(s64)[S64_MIN; 0] (u64) S64_MIN - reg_bounds_crafted/(s64)S64_MIN (u64) [S64_MIN; 0] - reg_bounds_crafted/(s32)[S32_MIN; 0] (u32) S32_MIN - reg_bounds_crafted/(s32)S32_MIN (u32) [S32_MIN; 0] - reg_bounds_crafted/(s64)[0; 0x1f] (u32) [0xffffffff80000000; 0x7fffffff] - reg_bounds_crafted/(s64)[0xffffffff80000000; 0x7fffffff] (u32) [0; 0x1f] - reg_bounds_crafted/(s64)[0; 0x1f] (u32) [0xffffffffffff8000; 0x7fff] As well as some reg_bounds_roand_{consts,ranges}_A_B, where A and B differ in sign domain. Signed-off-by: Eduard Zingerman Link: https://lore.kernel.org/r/20260424-cnums-everywhere-rfc-v1-v3-3-ca434b39a486@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 39 +++++++++++++++------------------------ 1 file changed, 15 insertions(+), 24 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index bf3ffa56bbe5..101ca6cc5424 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -8,6 +8,7 @@ #include /* for struct btf and btf_id() */ #include /* for MAX_BPF_STACK */ #include +#include /* Maximum variable offset umax_value permitted when resolving memory accesses. 
* In practice this is far bigger than any realistic pointer offset; this limit @@ -120,14 +121,8 @@ struct bpf_reg_state { * These refer to the same value as var_off, not necessarily the actual * contents of the register. */ - s64 smin_value; /* minimum possible (s64)value */ - s64 smax_value; /* maximum possible (s64)value */ - u64 umin_value; /* minimum possible (u64)value */ - u64 umax_value; /* maximum possible (u64)value */ - s32 s32_min_value; /* minimum possible (s32)value */ - s32 s32_max_value; /* maximum possible (s32)value */ - u32 u32_min_value; /* minimum possible (u32)value */ - u32 u32_max_value; /* maximum possible (u32)value */ + struct cnum64 r64; /* 64-bit range as circular number */ + struct cnum32 r32; /* 32-bit range as circular number */ /* For PTR_TO_PACKET, used to find other pointers with the same variable * offset, so they can share range knowledge. * For PTR_TO_MAP_VALUE_OR_NULL this is used to share which map value we @@ -211,66 +206,62 @@ struct bpf_reg_state { static inline s64 reg_smin(const struct bpf_reg_state *reg) { - return reg->smin_value; + return cnum64_smin(reg->r64); } static inline s64 reg_smax(const struct bpf_reg_state *reg) { - return reg->smax_value; + return cnum64_smax(reg->r64); } static inline u64 reg_umin(const struct bpf_reg_state *reg) { - return reg->umin_value; + return cnum64_umin(reg->r64); } static inline u64 reg_umax(const struct bpf_reg_state *reg) { - return reg->umax_value; + return cnum64_umax(reg->r64); } static inline s32 reg_s32_min(const struct bpf_reg_state *reg) { - return reg->s32_min_value; + return cnum32_smin(reg->r32); } static inline s32 reg_s32_max(const struct bpf_reg_state *reg) { - return reg->s32_max_value; + return cnum32_smax(reg->r32); } static inline u32 reg_u32_min(const struct bpf_reg_state *reg) { - return reg->u32_min_value; + return cnum32_umin(reg->r32); } static inline u32 reg_u32_max(const struct bpf_reg_state *reg) { - return reg->u32_max_value; + return 
cnum32_umax(reg->r32); } static inline void reg_set_srange32(struct bpf_reg_state *reg, s32 smin, s32 smax) { - reg->s32_min_value = smin; - reg->s32_max_value = smax; + reg->r32 = cnum32_from_srange(smin, smax); } static inline void reg_set_urange32(struct bpf_reg_state *reg, u32 umin, u32 umax) { - reg->u32_min_value = umin; - reg->u32_max_value = umax; + reg->r32 = cnum32_from_urange(umin, umax); } static inline void reg_set_srange64(struct bpf_reg_state *reg, s64 smin, s64 smax) { - reg->smin_value = smin; - reg->smax_value = smax; + reg->r64 = cnum64_from_srange(smin, smax); } static inline void reg_set_urange64(struct bpf_reg_state *reg, u64 umin, u64 umax) { - reg->umin_value = umin; - reg->umax_value = umax; + reg->r64 = cnum64_from_urange(umin, umax); } enum bpf_stack_slot_type { -- cgit v1.2.3 From cd5b460ed1eca9e48f3eb07db1ee0a522c0eaa23 Mon Sep 17 00:00:00 2001 From: Eduard Zingerman Date: Sat, 25 Apr 2026 15:48:23 -0700 Subject: bpf: range_within() must check cnum ranges instead of min/max pairs states.c:range_within() must be updated to properly check if cnum-based range in an old state is a superset of a range in the cur state. Currently it makes the decision using min/max accessors: reg_umin(old) <= reg_umin(cur) <= reg_umax(old) This is wrong for cnums that cross both UT_MAX/0 and ST_MAX/ST_MIN boundaries. Consider cnum32{base=0x7FFFFFF0, size=0x80000020}, which represents values [0x7FFFFFF0, ..., U32_MAX, 0, ..., 0x10]. Its projections are u32_min/max=0/U32_MAX, s32_min/max=S32_MIN/MAX. A register with range [0x100, 0x200] (which lies entirely in the gap of the wrapping range) would pass the min/max check despite having no overlap with the actual cnum arc. This commit replaces min/max comparison with cnum{32,64}_is_subset() operation. The operation implementation is verified using cbmc model checker in [1]. 
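The failure mode described above can be reproduced with a small user-space model. This is a sketch under stated assumptions, not the kernel's cnum32_is_subset(): the ex_* names are hypothetical, only the unsigned projection is modelled for the old check, and the subset test is one straightforward way to compare two arcs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct cnum32: the arc covers base .. base + size. */
struct ex_cnum32 {
	uint32_t base;
	uint32_t size;
};

/* Same encoding as CNUM32_EMPTY in the header. */
static inline bool ex_is_empty(struct ex_cnum32 c)
{
	return c.base == UINT32_MAX && c.size == UINT32_MAX;
}

static inline uint32_t ex_umin(struct ex_cnum32 c)
{
	return c.base > UINT32_MAX - c.size ? 0 : c.base;
}

static inline uint32_t ex_umax(struct ex_cnum32 c)
{
	return c.base > UINT32_MAX - c.size ? UINT32_MAX : c.base + c.size;
}

/* Old-style check on projected bounds (unsigned half only). */
static inline bool ex_minmax_within(struct ex_cnum32 old, struct ex_cnum32 cur)
{
	return ex_umin(old) <= ex_umin(cur) && ex_umax(cur) <= ex_umax(old);
}

/* Arc-based check: inner must start inside outer and end before outer does. */
static inline bool ex_is_subset(struct ex_cnum32 outer, struct ex_cnum32 inner)
{
	uint32_t off = inner.base - outer.base;	/* clockwise distance */

	if (ex_is_empty(inner))
		return true;
	if (ex_is_empty(outer))
		return false;
	if (outer.size == UINT32_MAX)		/* full circle */
		return true;
	return off <= outer.size && inner.size <= outer.size - off;
}

/* The example from the commit message, as a self-check. */
static inline void ex_check_commit_example(void)
{
	struct ex_cnum32 old = { 0x7FFFFFF0u, 0x80000020u };
	struct ex_cnum32 cur = { 0x100u, 0x100u };	/* [0x100, 0x200] */

	assert(ex_minmax_within(old, cur));	/* projections say "within" */
	assert(!ex_is_subset(old, cur));	/* the arc says otherwise */
}
```

The projected bounds of the wrapping arc are [0, U32_MAX], so any range passes the min/max comparison, while the arc comparison rejects [0x100, 0x200] because it lies in the gap.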
[1] https://github.com/eddyz87/cnum-verif/ Fixes: bbc631085503 ("bpf: replace min/max fields with struct cnum{32,64}") Signed-off-by: Eduard Zingerman Link: https://lore.kernel.org/r/20260425-cnum-range-within-v1-1-2fdca70cb09d@gmail.com Signed-off-by: Alexei Starovoitov --- include/linux/cnum.h | 2 ++ 1 file changed, 2 insertions(+) (limited to 'include/linux') diff --git a/include/linux/cnum.h b/include/linux/cnum.h index a7259b105b45..49b7d0c7645d 100644 --- a/include/linux/cnum.h +++ b/include/linux/cnum.h @@ -48,6 +48,7 @@ bool cnum32_is_const(struct cnum32 cnum); bool cnum32_is_empty(struct cnum32 cnum); struct cnum32 cnum32_add(struct cnum32 a, struct cnum32 b); struct cnum32 cnum32_negate(struct cnum32 a); +bool cnum32_is_subset(struct cnum32 outer, struct cnum32 inner); /* Same as cnum32 but for 64-bit ranges */ struct cnum64 { @@ -73,6 +74,7 @@ bool cnum64_is_const(struct cnum64 cnum); bool cnum64_is_empty(struct cnum64 cnum); struct cnum64 cnum64_add(struct cnum64 a, struct cnum64 b); struct cnum64 cnum64_negate(struct cnum64 a); +bool cnum64_is_subset(struct cnum64 outer, struct cnum64 inner); struct cnum32 cnum32_from_cnum64(struct cnum64 cnum); struct cnum64 cnum64_cnum32_intersect(struct cnum64 a, struct cnum32 b); -- cgit v1.2.3 From f603e84ab7918db6470c0b06b46ece7fbdb71e9a Mon Sep 17 00:00:00 2001 From: Paul Chaignon Date: Thu, 30 Apr 2026 10:44:28 +0200 Subject: bpf: Print breakdown of insns processed by subprogs When using global functions (i.e. subprogs), the verifier performs function-by-function verification. In that case, the sum of the instructions processed in each global function and in the main program counts towards the 1 million instructions limit. Only that sum is reported in the verifier logs. While starting to use global functions in Cilium (finally!), we found it can be useful to have the breakdown per global function, to understand exactly where the budget is currently spent. 
This patch implements this breakdown, under BPF_LOG_STATS, as done for the stack depths. When iterating over subprogs, we need to skip the hidden subprogs at the end because they don't have a corresponding func_info_aux entry and calling bpf_subprog_is_global() would result in an OOB access. Signed-off-by: Paul Chaignon Link: https://lore.kernel.org/bpf/5590f9c67e614ec9054d0c7e74e87cc690a52c56.1777538384.git.paul.chaignon@gmail.com Signed-off-by: Kumar Kartikeya Dwivedi --- include/linux/bpf_verifier.h | 1 + 1 file changed, 1 insertion(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 101ca6cc5424..976e2b2f40e8 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -779,6 +779,7 @@ struct bpf_subprog_info { u32 exit_idx; /* Index of one of the BPF_EXIT instructions in this subprogram */ u16 stack_depth; /* max. stack depth used by this function */ u16 stack_extra; + u32 insn_processed; /* offsets in range [stack_depth .. fastcall_stack_off) * are used for bpf_fastcall spills and fills. */ -- cgit v1.2.3 From f28771c0691bcb7f477a0f35550b17b88c32dea8 Mon Sep 17 00:00:00 2001 From: Leon Hwang Date: Tue, 12 May 2026 23:31:50 +0800 Subject: bpf: Extend BPF syscall with common attributes support Add generic BPF syscall support for passing common attributes. The initial set of common attributes includes: 1. 'log_buf': User-provided buffer for storing logs. 2. 'log_size': Size of the log buffer. 3. 'log_level': Log verbosity level. 4. 'log_true_size': Actual log size reported by kernel. The common-attribute pointer and its size are passed as the 4th and 5th syscall arguments. A new command bit, 'BPF_COMMON_ATTRS' ('1 << 16'), indicates that common attributes are supplied. This commit adds syscall and uapi plumbing. Command-specific handling is added in follow-up patches. 
Signed-off-by: Leon Hwang Link: https://lore.kernel.org/r/20260512153157.28382-2-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/syscalls.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/syscalls.h b/include/linux/syscalls.h index f5639d5ac331..50055ab73649 100644 --- a/include/linux/syscalls.h +++ b/include/linux/syscalls.h @@ -936,7 +936,8 @@ asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, asmlinkage long sys_getrandom(char __user *buf, size_t count, unsigned int flags); asmlinkage long sys_memfd_create(const char __user *uname_ptr, unsigned int flags); -asmlinkage long sys_bpf(int cmd, union bpf_attr __user *attr, unsigned int size); +asmlinkage long sys_bpf(int cmd, union bpf_attr __user *attr, unsigned int size, + struct bpf_common_attr __user *attr_common, unsigned int size_common); asmlinkage long sys_execveat(int dfd, const char __user *filename, const char __user *const __user *argv, const char __user *const __user *envp, int flags); -- cgit v1.2.3 From 503c039ffeca7530ce9d6446a07b4bb776180b45 Mon Sep 17 00:00:00 2001 From: Leon Hwang Date: Tue, 12 May 2026 23:31:52 +0800 Subject: bpf: Refactor reporting log_true_size for prog_load The next commit will add support for reporting logs via extended common attributes, including 'log_true_size'. To prepare for that, refactor the 'log_true_size' reporting logic by introducing a new struct bpf_log_attr to encapsulate log-related behavior: * bpf_log_attr_init(): initialize log fields, which will support extended common attributes in the next commit. * bpf_log_attr_finalize(): handle log finalization and write back 'log_true_size' to userspace. 
Acked-by: Andrii Nakryiko Signed-off-by: Leon Hwang Link: https://lore.kernel.org/r/20260512153157.28382-4-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf.h | 4 +++- include/linux/bpf_verifier.h | 12 ++++++++++++ 2 files changed, 15 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf.h b/include/linux/bpf.h index 14759972f148..9e16e91647d3 100644 --- a/include/linux/bpf.h +++ b/include/linux/bpf.h @@ -2919,7 +2919,9 @@ int bpf_check_uarg_tail_zero(bpfptr_t uaddr, size_t expected_size, size_t actual_size); /* verify correctness of eBPF program */ -int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr, u32 uattr_size); +struct bpf_log_attr; +int bpf_check(struct bpf_prog **fp, union bpf_attr *attr, bpfptr_t uattr, + struct bpf_log_attr *attr_log); #ifndef CONFIG_BPF_JIT_ALWAYS_ON void bpf_patch_call_args(struct bpf_insn *insn, u32 stack_depth); diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 976e2b2f40e8..8d27ad1f9f94 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -755,6 +755,18 @@ static inline bool bpf_verifier_log_needed(const struct bpf_verifier_log *log) return log && log->level; } +struct bpf_log_attr { + char __user *ubuf; + u32 size; + u32 level; + u32 offsetof_true_size; + bpfptr_t uattr; +}; + +int bpf_log_attr_init(struct bpf_log_attr *log, u64 log_buf, u32 log_size, u32 log_level, + u32 offsetof_log_true_size, bpfptr_t uattr); +int bpf_log_attr_finalize(struct bpf_log_attr *attr, struct bpf_verifier_log *log); + #define BPF_MAX_SUBPROGS 256 struct bpf_subprog_arg_info { -- cgit v1.2.3 From ac89d33fdd8183df39fe92ffa525be7af6feb9d1 Mon Sep 17 00:00:00 2001 From: Leon Hwang Date: Tue, 12 May 2026 23:31:53 +0800 Subject: bpf: Add syscall common attributes support for prog_load BPF_PROG_LOAD can now take log parameters from both union bpf_attr and struct bpf_common_attr. 
The merge rules are: - if both sides provide a complete log tuple (buf/size/level) and they match, use it; - if only one side provides log parameters, use that one; - if both sides provide complete tuples but they differ, return -EINVAL. Signed-off-by: Leon Hwang Link: https://lore.kernel.org/r/20260512153157.28382-5-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 8d27ad1f9f94..8433430dedb7 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -764,7 +764,8 @@ struct bpf_log_attr { }; int bpf_log_attr_init(struct bpf_log_attr *log, u64 log_buf, u32 log_size, u32 log_level, - u32 offsetof_log_true_size, bpfptr_t uattr); + u32 offsetof_log_true_size, bpfptr_t uattr, struct bpf_common_attr *common, + bpfptr_t uattr_common, u32 size_common); int bpf_log_attr_finalize(struct bpf_log_attr *attr, struct bpf_verifier_log *log); #define BPF_MAX_SUBPROGS 256 -- cgit v1.2.3 From ceeb7eda94a3548958b30818495ef7eb12898727 Mon Sep 17 00:00:00 2001 From: Leon Hwang Date: Tue, 12 May 2026 23:31:54 +0800 Subject: bpf: Add syscall common attributes support for btf_load BPF_BTF_LOAD can now take log parameters from both union bpf_attr and struct bpf_common_attr, with the same merge rules as BPF_PROG_LOAD: - if both sides provide a complete log tuple (buf/size/level) and they match, use it; - if only one side provides log parameters, use that one; - if both sides provide complete tuples but they differ, return -EINVAL. 
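The merge rules listed above (shared by BPF_PROG_LOAD and BPF_BTF_LOAD) can be sketched in userspace C as follows. This is an illustration only, not the kernel code: struct log_tuple and the log_tuple_* helpers are hypothetical names, and treating a side as "provided" when any of buf/size/level is nonzero is an assumption, since the text above does not spell out how partially filled tuples are handled.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for one side's log parameters. */
struct log_tuple {
	uint64_t buf;
	uint32_t size;
	uint32_t level;
};

/* Assumption: a side "provides" log parameters if any field is nonzero. */
static bool log_tuple_provided(const struct log_tuple *t)
{
	return t->buf || t->size || t->level;
}

static bool log_tuple_equal(const struct log_tuple *a, const struct log_tuple *b)
{
	return a->buf == b->buf && a->size == b->size && a->level == b->level;
}

/* Returns 0 and fills *out, or -EINVAL when both sides are set but differ. */
static int log_tuple_merge(const struct log_tuple *attr,
			   const struct log_tuple *common,
			   struct log_tuple *out)
{
	bool a = log_tuple_provided(attr);
	bool c = log_tuple_provided(common);

	if (a && c && !log_tuple_equal(attr, common))
		return -EINVAL;
	/* Either both match, only one is set, or neither is set. */
	*out = a ? *attr : *common;
	return 0;
}
```

Here attr stands for the values taken from union bpf_attr and common for those taken from struct bpf_common_attr; when neither side is set, the merged tuple is all zeroes and logging stays disabled.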
Acked-by: Andrii Nakryiko Signed-off-by: Leon Hwang Link: https://lore.kernel.org/r/20260512153157.28382-6-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/btf.h | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'include/linux') diff --git a/include/linux/btf.h b/include/linux/btf.h index c82d0d689059..240401d9b25b 100644 --- a/include/linux/btf.h +++ b/include/linux/btf.h @@ -145,7 +145,8 @@ const char *btf_get_name(const struct btf *btf); void btf_get(struct btf *btf); void btf_put(struct btf *btf); const struct btf_header *btf_header(const struct btf *btf); -int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, u32 uattr_sz); +struct bpf_log_attr; +int btf_new_fd(const union bpf_attr *attr, bpfptr_t uattr, struct bpf_log_attr *attr_log); struct btf *btf_get_by_fd(int fd); int btf_get_info_by_fd(const struct btf *btf, const union bpf_attr *attr, -- cgit v1.2.3 From 49f9b2b2a18c5ce06b21fc2b3399352d80dee0c6 Mon Sep 17 00:00:00 2001 From: Leon Hwang Date: Tue, 12 May 2026 23:31:55 +0800 Subject: bpf: Add syscall common attributes support for map_create Many BPF_MAP_CREATE validation failures currently return -EINVAL without any explanation to userspace. Plumb common syscall log attributes into map_create(), create a verifier log from bpf_common_attr::log_buf/log_size/log_level, and report map-creation failure reasons through that buffer. This improves debuggability by allowing userspace to inspect why map creation failed and read back log_true_size from common attributes. 
Signed-off-by: Leon Hwang Link: https://lore.kernel.org/r/20260512153157.28382-7-leon.hwang@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 3 +++ 1 file changed, 3 insertions(+) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 8433430dedb7..c15a4c26a43b 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -766,6 +766,9 @@ struct bpf_log_attr { int bpf_log_attr_init(struct bpf_log_attr *log, u64 log_buf, u32 log_size, u32 log_level, u32 offsetof_log_true_size, bpfptr_t uattr, struct bpf_common_attr *common, bpfptr_t uattr_common, u32 size_common); +struct bpf_verifier_log *bpf_log_attr_create_vlog(struct bpf_log_attr *attr_log, + struct bpf_common_attr *common, bpfptr_t uattr, + u32 size); int bpf_log_attr_finalize(struct bpf_log_attr *attr, struct bpf_verifier_log *log); #define BPF_MAX_SUBPROGS 256 -- cgit v1.2.3 From ede2dc5c6b571ce6d3aacf5a81933f8c5d5e6c7d Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Tue, 12 May 2026 21:49:54 -0700 Subject: bpf: Convert bpf_get_spilled_reg macro to static inline function Convert the bpf_get_spilled_reg() macro to a static inline function for better type safety and readability. This also simplifies the macro definition in preparation for upcoming stack argument support which will introduce additional macros. No functional change. 
Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260513044954.2382693-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index c15a4c26a43b..203fb751eeae 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -552,10 +552,14 @@ struct bpf_verifier_state { u32 may_goto_depth; }; -#define bpf_get_spilled_reg(slot, frame, mask) \ - (((slot < frame->allocated_stack / BPF_REG_SIZE) && \ - ((1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & (mask))) \ - ? &frame->stack[slot].spilled_ptr : NULL) +static inline struct bpf_reg_state * +bpf_get_spilled_reg(int slot, struct bpf_func_state *frame, u32 mask) +{ + if (slot < frame->allocated_stack / BPF_REG_SIZE && + (1 << frame->stack[slot].slot_type[BPF_REG_SIZE - 1]) & mask) + return &frame->stack[slot].spilled_ptr; + return NULL; +} /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */ #define bpf_for_each_spilled_reg(iter, frame, reg, mask) \ -- cgit v1.2.3 From 78bbe61632f11b1091c03259f92b6559489222ae Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Tue, 12 May 2026 21:50:05 -0700 Subject: bpf: Add helper functions for r11-based stack argument insns MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add three static inline helper functions — is_stack_arg_ldx(), is_stack_arg_st(), and is_stack_arg_stx() — that identify r11-based (BPF_REG_PARAMS) instructions used for stack argument passing. These helpers encapsulate the detailed encoding requirements (operand size, register, offset alignment and sign) and hide raw BPF_REG_PARAMS usage from the verifier, making call sites more readable and explicit. A later patch ("bpf: Enable r11 based insns") will wire these helpers into the verifier. 
Until then, check_and_resolve_insns() rejects any r11-based registers. Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260513045005.2383881-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/filter.h | 21 +++++++++++++++++++++ 1 file changed, 21 insertions(+) (limited to 'include/linux') diff --git a/include/linux/filter.h b/include/linux/filter.h index b77d0b06db6e..918d9b34eac6 100644 --- a/include/linux/filter.h +++ b/include/linux/filter.h @@ -749,6 +749,27 @@ static inline u32 bpf_prog_run_pin_on_cpu(const struct bpf_prog *prog, return ret; } +static inline bool is_stack_arg_ldx(const struct bpf_insn *insn) +{ + return insn->code == (BPF_LDX | BPF_MEM | BPF_DW) && + insn->src_reg == BPF_REG_PARAMS && + insn->off > 0 && insn->off % 8 == 0; +} + +static inline bool is_stack_arg_st(const struct bpf_insn *insn) +{ + return insn->code == (BPF_ST | BPF_MEM | BPF_DW) && + insn->dst_reg == BPF_REG_PARAMS && + insn->off < 0 && insn->off % 8 == 0; +} + +static inline bool is_stack_arg_stx(const struct bpf_insn *insn) +{ + return insn->code == (BPF_STX | BPF_MEM | BPF_DW) && + insn->dst_reg == BPF_REG_PARAMS && + insn->off < 0 && insn->off % 8 == 0; +} + #define BPF_SKB_CB_LEN QDISC_CB_PRIV_LEN struct bpf_skb_data_end { -- cgit v1.2.3 From 0f6bd5e7a804af27e7f34b8306afde7a6b269318 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Tue, 12 May 2026 21:50:15 -0700 Subject: bpf: Support stack arguments for bpf functions MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Currently BPF functions (subprogs) are limited to 5 register arguments. With [1], the compiler can emit code that passes additional arguments via a dedicated stack area through bpf register BPF_REG_PARAMS (r11), introduced in an earlier patch ([2]). 
The compiler uses positive r11 offsets for incoming (callee-side) args and negative r11 offsets for outgoing (caller-side) args, following the x86_64/arm64 calling convention direction. There is an 8-byte gap at offset 0 separating two regions: Incoming (callee reads): r11+8 (arg6), r11+16 (arg7), ... Outgoing (caller writes): r11-8 (arg6), r11-16 (arg7), ... The following is an example to show how stack arguments are saved and transferred between caller and callee: int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) { ... bar(a1, a2, a3, a4, a5, a6, a7, a8); ... } Caller (foo) Callee (bar) ============ ============ Incoming (positive offsets): Incoming (positive offsets): r11+8: [incoming arg 6] r11+8: [incoming arg 6] <-+ r11+16: [incoming arg 7] r11+16: [incoming arg 7] <-|+ r11+24: [incoming arg 8] <-||+ Outgoing (negative offsets): ||| r11-8: [outgoing arg 6 to bar] -------->-------------------------+|| r11-16: [outgoing arg 7 to bar] -------->--------------------------+| r11-24: [outgoing arg 8 to bar] -------->---------------------------+ If the bpf function has more than one call: int foo(int a1, int a2, int a3, int a4, int a5, int a6, int a7) { ... bar1(a1, a2, a3, a4, a5, a6, a7, a8); ... bar2(a1, a2, a3, a4, a5, a6, a7, a8, a9); ... 
} Caller (foo) Callee (bar2) ============ ============== Incoming (positive offsets): Incoming (positive offsets): r11+8: [incoming arg 6] r11+8: [incoming arg 6] <+ r11+16: [incoming arg 7] r11+16: [incoming arg 7] <|+ r11+24: [incoming arg 8] <||+ Outgoing for bar2 (negative offsets): r11+32: [incoming arg 9] <|||+ r11-8: [outgoing arg 6] ---->----------->-------------------------+||| r11-16: [outgoing arg 7] ---->----------->--------------------------+|| r11-24: [outgoing arg 8] ---->----------->---------------------------+| r11-32: [outgoing arg 9] ---->----------->----------------------------+ The verifier tracks outgoing stack arguments in stack_arg_regs[] and out_stack_arg_cnt in bpf_func_state, separately from the regular r10 stack. The callee does not copy incoming args — it reads them directly from the caller's outgoing slots at positive r11 offsets. Similar to stacksafe(), introduce stack_arg_safe() to do pruning check. Outgoing stack arg slots are invalidated when the callee returns (e.g. in prepare_func_exit), not at call time. This allows the callee to read incoming args from the caller's outgoing slots during verification. The following are a few examples. Example 1: *(u64 *)(r11 - 8) = r6; *(u64 *)(r11 - 16) = r7; call bar1; // arg6 = r6, arg7 = r7 call bar2; // expected with 2 stack arguments, failed Example 2: To fix the Example 1: *(u64 *)(r11 - 8) = r6; *(u64 *)(r11 - 16) = r7; call bar1; // arg6 = r6, arg7 = r7 *(u64 *)(r11 - 8) = r8; *(u64 *)(r11 - 16) = r9; call bar2; // arg6 = r8, arg7 = r9 Example 3: The compiler can hoist the shared stack arg stores above the branch: *(u64 *)(r11 - 16) = r7; if cond goto else; *(u64 *)(r11 - 8) = r8; call bar1; // arg6 = r8, arg7 = r7 goto end; else: *(u64 *)(r11 - 8) = r9; call bar2; // arg6 = r9, arg7 = r7 end: Example 4: Within a loop: loop: *(u64 *)(r11 - 8) = r6; // arg6, before loop call bar; // reuses arg6 each iteration if ... 
goto loop; A separate max_out_stack_arg_cnt field in bpf_subprog_info tracks the deepest outgoing slot actually written. The intent is to reject programs that write to slots beyond what any callee expects. This is necessary for the JIT. Similar to typical compiler-generated code, enforce the following orderings: - all stack arg reads must come before any stack arg write - all stack arg reads must come before any call to a bpf function, kfunc or helper This is needed because the JIT may emit 'mov' insns for reads and writes that use the same register, and a bpf function, kfunc or helper call invalidates all arguments immediately after the call. For callback functions with stack arguments, the kernel needs to set up the parameter types (including stack parameters) properly so that the callback function can retrieve this information for verification. Global subprogs and freplace with >5 args are not yet supported. [1] https://github.com/llvm/llvm-project/pull/189060 [2] https://lore.kernel.org/bpf/20260423033506.2542005-1-yonghong.song@linux.dev/ Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260513045015.2385013-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 43 +++++++++++++++++++++++++++++++++++++++---- 1 file changed, 39 insertions(+), 4 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 203fb751eeae..5398a02a1280 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -402,6 +402,7 @@ struct bpf_func_state { bool in_callback_fn; bool in_async_callback_fn; bool in_exception_callback_fn; + bool no_stack_arg_load; /* For callback calling functions that limit number of possible * callback executions (e.g. bpf_loop) keeps track of current * simulated iteration number. @@ -427,6 +428,9 @@ struct bpf_func_state { * `stack`. allocated_stack is always a multiple of BPF_REG_SIZE.
*/ int allocated_stack; + + u16 out_stack_arg_cnt; /* Number of outgoing on-stack argument slots */ + struct bpf_reg_state *stack_arg_regs; /* Outgoing on-stack arguments */ }; #define MAX_CALL_FRAMES 8 @@ -465,8 +469,10 @@ struct bpf_jmp_history_entry { u64 linked_regs; }; -/* Maximum number of register states that can exist at once */ -#define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE) * MAX_CALL_FRAMES) +/* Maximum number of bpf_reg_state objects that can exist at once */ +#define MAX_STACK_ARG_SLOTS (MAX_BPF_FUNC_ARGS - MAX_BPF_FUNC_REG_ARGS) +#define BPF_ID_MAP_SIZE ((MAX_BPF_REG + MAX_BPF_STACK / BPF_REG_SIZE + \ + MAX_STACK_ARG_SLOTS) * MAX_CALL_FRAMES) struct bpf_verifier_state { /* call stack tracking */ struct bpf_func_state *frame[MAX_CALL_FRAMES]; @@ -561,12 +567,27 @@ bpf_get_spilled_reg(int slot, struct bpf_func_state *frame, u32 mask) return NULL; } +static inline struct bpf_reg_state * +bpf_get_spilled_stack_arg(int slot, struct bpf_func_state *frame) +{ + if (slot < frame->out_stack_arg_cnt && + frame->stack_arg_regs[slot].type != NOT_INIT) + return &frame->stack_arg_regs[slot]; + return NULL; +} + /* Iterate over 'frame', setting 'reg' to either NULL or a spilled register. */ #define bpf_for_each_spilled_reg(iter, frame, reg, mask) \ for (iter = 0, reg = bpf_get_spilled_reg(iter, frame, mask); \ iter < frame->allocated_stack / BPF_REG_SIZE; \ iter++, reg = bpf_get_spilled_reg(iter, frame, mask)) +/* Iterate over 'frame', setting 'reg' to either NULL or a spilled stack arg. 
*/ +#define bpf_for_each_spilled_stack_arg(iter, frame, reg) \ + for (iter = 0, reg = bpf_get_spilled_stack_arg(iter, frame); \ + iter < frame->out_stack_arg_cnt; \ + iter++, reg = bpf_get_spilled_stack_arg(iter, frame)) + #define bpf_for_each_reg_in_vstate_mask(__vst, __state, __reg, __mask, __expr) \ ({ \ struct bpf_verifier_state *___vstate = __vst; \ @@ -584,6 +605,11 @@ bpf_get_spilled_reg(int slot, struct bpf_func_state *frame, u32 mask) continue; \ (void)(__expr); \ } \ + bpf_for_each_spilled_stack_arg(___j, __state, __reg) { \ + if (!__reg) \ + continue; \ + (void)(__expr); \ + } \ } \ }) @@ -815,12 +841,21 @@ struct bpf_subprog_info { bool keep_fastcall_stack: 1; bool changes_pkt_data: 1; bool might_sleep: 1; - u8 arg_cnt:3; + u8 arg_cnt:4; enum priv_stack_mode priv_stack_mode; - struct bpf_subprog_arg_info args[MAX_BPF_FUNC_REG_ARGS]; + struct bpf_subprog_arg_info args[MAX_BPF_FUNC_ARGS]; + u16 stack_arg_cnt; /* incoming + max outgoing */ + u16 max_out_stack_arg_cnt; }; +static inline u16 bpf_in_stack_arg_cnt(const struct bpf_subprog_info *sub) +{ + if (sub->arg_cnt > MAX_BPF_FUNC_REG_ARGS) + return sub->arg_cnt - MAX_BPF_FUNC_REG_ARGS; + return 0; +} + struct bpf_verifier_env; struct backtrack_state { -- cgit v1.2.3 From 3a656670fd6da624f6241038ca4cf350f24fd5e8 Mon Sep 17 00:00:00 2001 From: Yonghong Song Date: Tue, 12 May 2026 21:50:20 -0700 Subject: bpf: Refactor jmp history to use dedicated spi/frame fields Move stack slot index (spi) and frame number out of the flags field in bpf_jmp_history_entry into dedicated bitfields. This simplifies the encoding and makes room for new flags. Previously, spi and frame were packed into the lower 9 bits of the 12-bit flags field (3 bits frame + 6 bits spi), with INSN_F_STACK_ACCESS at BIT(9) and INSN_F_DST/SRC_REG_STACK at BIT(10)/BIT(11). But this has no room for an INSN_F_* flag for stack arguments. 
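The previous packing described above can be sketched in plain C to show why the 12-bit flags field was full. The OLD_* names and the pack/unpack helpers are local to this sketch (hypothetical names, not kernel identifiers); the bit positions are taken from the description above.

```c
#include <assert.h>
#include <stdint.h>

/* Old layout as described above; the OLD_* names exist only in this sketch. */
enum {
	OLD_F_FRAMENO_MASK	= 0x7,			/* 3 bits: frame number */
	OLD_F_SPI_MASK		= 0x3f,			/* 6 bits: stack slot index */
	OLD_F_SPI_SHIFT		= 3,			/* spi sits above the frame bits */
	OLD_F_STACK_ACCESS	= 1u << 9,
	OLD_F_DST_REG_STACK	= 1u << 10,
	OLD_F_SRC_REG_STACK	= 1u << 11,
	OLD_F_ALL_BITS		= (1u << 12) - 1,	/* whole 12-bit flags field */
};

/* Pack spi and frame into the low 9 bits and mark the stack access. */
static uint32_t old_pack_stack_access(uint32_t spi, uint32_t frame)
{
	return OLD_F_STACK_ACCESS |
	       ((spi & OLD_F_SPI_MASK) << OLD_F_SPI_SHIFT) |
	       (frame & OLD_F_FRAMENO_MASK);
}

static uint32_t old_unpack_spi(uint32_t flags)
{
	return (flags >> OLD_F_SPI_SHIFT) & OLD_F_SPI_MASK;
}

static uint32_t old_unpack_frame(uint32_t flags)
{
	return flags & OLD_F_FRAMENO_MASK;
}
```

With spi = 63 and frame = 7 the packed value already occupies bits 0 through 9, and adding the two register flags sets every one of the 12 bits, which is the lack of room for a further INSN_F_* flag that the message refers to.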
To resolve this issue, bpf_jmp_history_entry field idx is narrowed to 20 bits (sufficient for insn indices up to 1M), and the freed bits hold spi (6 bits) and frame (3 bits) as dedicated struct fields. The flags enum is simplified accordingly: INSN_F_STACK_ACCESS -> BIT(0) INSN_F_DST_REG_STACK -> BIT(1) INSN_F_SRC_REG_STACK -> BIT(2) which allows more room for additional INSN_F_* flags. bpf_push_jmp_history() now takes explicit spi and frame parameters instead of encoding them into flags. The insn_stack_access_flags(), insn_stack_access_spi(), and insn_stack_access_frameno() helpers are removed. No functional change. Signed-off-by: Yonghong Song Link: https://lore.kernel.org/r/20260513045020.2385962-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov --- include/linux/bpf_verifier.h | 37 ++++++++++++++++--------------------- 1 file changed, 16 insertions(+), 21 deletions(-) (limited to 'include/linux') diff --git a/include/linux/bpf_verifier.h b/include/linux/bpf_verifier.h index 5398a02a1280..3ec338169981 100644 --- a/include/linux/bpf_verifier.h +++ b/include/linux/bpf_verifier.h @@ -435,40 +435,35 @@ struct bpf_func_state { #define MAX_CALL_FRAMES 8 -/* instruction history flags, used in bpf_jmp_history_entry.flags field */ +/* instruction history flags, used in bpf_jmp_history_entry.flags field. + * Frame number and SPI are stored in dedicated fields of bpf_jmp_history_entry. 
+ */ enum { - /* instruction references stack slot through PTR_TO_STACK register; - * we also store stack's frame number in lower 3 bits (MAX_CALL_FRAMES is 8) - * and accessed stack slot's index in next 6 bits (MAX_BPF_STACK is 512, - * 8 bytes per slot, so slot index (spi) is [0, 63]) - */ - INSN_F_FRAMENO_MASK = 0x7, /* 3 bits */ - - INSN_F_SPI_MASK = 0x3f, /* 6 bits */ - INSN_F_SPI_SHIFT = 3, /* shifted 3 bits to the left */ + INSN_F_STACK_ACCESS = BIT(0), - INSN_F_STACK_ACCESS = BIT(9), - - INSN_F_DST_REG_STACK = BIT(10), /* dst_reg is PTR_TO_STACK */ - INSN_F_SRC_REG_STACK = BIT(11), /* src_reg is PTR_TO_STACK */ - /* total 12 bits are used now. */ + INSN_F_DST_REG_STACK = BIT(1), /* dst_reg is PTR_TO_STACK */ + INSN_F_SRC_REG_STACK = BIT(2), /* src_reg is PTR_TO_STACK */ }; -static_assert(INSN_F_FRAMENO_MASK + 1 >= MAX_CALL_FRAMES); -static_assert(INSN_F_SPI_MASK + 1 >= MAX_BPF_STACK / 8); - str