diff options
| author | Linus Torvalds <torvalds@linux-foundation.org> | 2020-10-15 18:42:13 -0700 |
|---|---|---|
| committer | Linus Torvalds <torvalds@linux-foundation.org> | 2020-10-15 18:42:13 -0700 |
| commit | 9ff9b0d392ea08090cd1780fb196f36dbb586529 (patch) | |
| tree | 276a3a5c4525b84dee64eda30b423fc31bf94850 /tools/lib | |
| parent | 840e5bb326bbcb16ce82dd2416d2769de4839aea (diff) | |
| parent | 105faa8742437c28815b2a3eb8314ebc5fd9288c (diff) | |
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
- Add redirect_neigh() BPF packet redirect helper, allowing to limit
stack traversal in common container configs and improving TCP
back-pressure.
Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain.
- Expand netlink policy support and improve policy export to user
space. (Ge)netlink core performs request validation according to
declared policies. Expand the expressiveness of those policies
(min/max length and bitmasks). Allow dumping policies for particular
commands. This is used for feature discovery by user space (instead
of kernel version parsing or trial and error).
- Support IGMPv3/MLDv2 multicast listener discovery protocols in
bridge.
- Allow more than 255 IPv4 multicast interfaces.
- Add support for Type of Service (ToS) reflection in SYN/SYN-ACK
packets of TCPv6.
- In Multi-patch TCP (MPTCP) support concurrent transmission of data on
multiple subflows in a load balancing scenario. Enhance advertising
addresses via the RM_ADDR/ADD_ADDR options.
- Support SMC-Dv2 version of SMC, which enables multi-subnet
deployments.
- Allow more calls to same peer in RxRPC.
- Support two new Controller Area Network (CAN) protocols - CAN-FD and
ISO 15765-2:2016.
- Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit
kernel problem.
- Add TC actions for implementing MPLS L2 VPNs.
- Improve nexthop code - e.g. handle various corner cases when nexthop
objects are removed from groups better, skip unnecessary
notifications and make it easier to offload nexthops into HW by
converting to a blocking notifier.
- Support adding and consuming TCP header options by BPF programs,
opening the doors for easy experimental and deployment-specific TCP
option use.
- Reorganize TCP congestion control (CC) initialization to simplify
life of TCP CC implemented in BPF.
- Add support for shipping BPF programs with the kernel and loading
them early on boot via the User Mode Driver mechanism, hence reusing
all the user space infra we have.
- Support sleepable BPF programs, initially targeting LSM and tracing.
- Add bpf_d_path() helper for returning full path for given 'struct
path'.
- Make bpf_tail_call compatible with bpf-to-bpf calls.
- Allow BPF programs to call map_update_elem on sockmaps.
- Add BPF Type Format (BTF) support for type and enum discovery, as
well as support for using BTF within the kernel itself (current use
is for pretty printing structures).
- Support listing and getting information about bpf_links via the bpf
syscall.
- Enhance kernel interfaces around NIC firmware update. Allow
specifying overwrite mask to control if settings etc. are reset
during update; report expected max time operation may take to users;
support firmware activation without machine reboot incl. limits of
how much impact reset may have (e.g. dropping link or not).
- Extend ethtool configuration interface to report IEEE-standard
counters, to limit the need for per-vendor logic in user space.
- Adopt or extend devlink use for debug, monitoring, fw update in many
drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx,
dpaa2-eth).
- In mlxsw expose critical and emergency SFP module temperature alarms.
Refactor port buffer handling to make the defaults more suitable and
support setting these values explicitly via the DCBNL interface.
- Add XDP support for Intel's igb driver.
- Support offloading TC flower classification and filtering rules to
mscc_ocelot switches.
- Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as
fixed interval period pulse generator and one-step timestamping in
dpaa-eth.
- Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3)
offload.
- Add Lynx PHY/PCS MDIO module, and convert various drivers which have
this HW to use it. Convert mvpp2 to split PCS.
- Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as
7-port Mediatek MT7531 IP.
- Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver,
and wcn3680 support in wcn36xx.
- Improve performance for packets which don't require much offloads on
recent Mellanox NICs by 20% by making multiple packets share a
descriptor entry.
- Move chelsio inline crypto drivers (for TLS and IPsec) from the
crypto subtree to drivers/net. Move MDIO drivers out of the phy
directory.
- Clean up a lot of W=1 warnings, reportedly the actively developed
subsections of networking drivers should now build W=1 warning free.
- Make sure drivers don't use in_interrupt() to dynamically adapt their
code. Convert tasklets to use new tasklet_setup API (sadly this
conversion is not yet complete).
* tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits)
Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH"
net, sockmap: Don't call bpf_prog_put() on NULL pointer
bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo
bpf, sockmap: Add locking annotations to iterator
netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements
net: fix pos incrementment in ipv6_route_seq_next
net/smc: fix invalid return code in smcd_new_buf_create()
net/smc: fix valid DMBE buffer sizes
net/smc: fix use-after-free of delayed events
bpfilter: Fix build error with CONFIG_BPFILTER_UMH
cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr
net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info
bpf: Fix register equivalence tracking.
rxrpc: Fix loss of final ack on shutdown
rxrpc: Fix bundle counting for exclusive connections
netfilter: restore NF_INET_NUMHOOKS
ibmveth: Identify ingress large send packets.
ibmveth: Switch order of ibmveth_helper calls.
cxgb4: handle 4-tuple PEDIT to NAT mode translation
selftests: Add VRF route leaking tests
...
Diffstat (limited to 'tools/lib')
| -rw-r--r-- | tools/lib/bpf/Makefile | 28 | ||||
| -rw-r--r-- | tools/lib/bpf/bpf.c | 70 | ||||
| -rw-r--r-- | tools/lib/bpf/bpf.h | 39 | ||||
| -rw-r--r-- | tools/lib/bpf/bpf_core_read.h | 120 | ||||
| -rw-r--r-- | tools/lib/bpf/bpf_helpers.h | 49 | ||||
| -rw-r--r-- | tools/lib/bpf/bpf_prog_linfo.c | 3 | ||||
| -rw-r--r-- | tools/lib/bpf/bpf_tracing.h | 4 | ||||
| -rw-r--r-- | tools/lib/bpf/btf.c | 1631 | ||||
| -rw-r--r-- | tools/lib/bpf/btf.h | 103 | ||||
| -rw-r--r-- | tools/lib/bpf/btf_dump.c | 87 | ||||
| -rw-r--r-- | tools/lib/bpf/hashmap.c | 3 | ||||
| -rw-r--r-- | tools/lib/bpf/hashmap.h | 12 | ||||
| -rw-r--r-- | tools/lib/bpf/libbpf.c | 3407 | ||||
| -rw-r--r-- | tools/lib/bpf/libbpf.h | 12 | ||||
| -rw-r--r-- | tools/lib/bpf/libbpf.map | 38 | ||||
| -rw-r--r-- | tools/lib/bpf/libbpf_common.h | 2 | ||||
| -rw-r--r-- | tools/lib/bpf/libbpf_internal.h | 147 | ||||
| -rw-r--r-- | tools/lib/bpf/libbpf_probes.c | 8 | ||||
| -rw-r--r-- | tools/lib/bpf/netlink.c | 128 | ||||
| -rw-r--r-- | tools/lib/bpf/nlattr.c | 9 | ||||
| -rw-r--r-- | tools/lib/bpf/ringbuf.c | 8 | ||||
| -rw-r--r-- | tools/lib/bpf/xsk.c | 383 | ||||
| -rw-r--r-- | tools/lib/bpf/xsk.h | 9 |
23 files changed, 4726 insertions, 1574 deletions
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile index 9ae8f4ef0aac..5f9abed3e226 100644 --- a/tools/lib/bpf/Makefile +++ b/tools/lib/bpf/Makefile @@ -1,6 +1,9 @@ # SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) # Most of this file is copied from tools/lib/traceevent/Makefile +RM ?= rm +srctree = $(abs_srctree) + LIBBPF_VERSION := $(shell \ grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \ sort -rV | head -n1 | cut -d'_' -f2) @@ -56,7 +59,7 @@ ifndef VERBOSE endif FEATURE_USER = .libbpf -FEATURE_TESTS = libelf libelf-mmap zlib bpf reallocarray +FEATURE_TESTS = libelf zlib bpf FEATURE_DISPLAY = libelf zlib bpf INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi @@ -95,27 +98,18 @@ PC_FILE = libbpf.pc ifdef EXTRA_CFLAGS CFLAGS := $(EXTRA_CFLAGS) else - CFLAGS := -g -Wall -endif - -ifeq ($(feature-libelf-mmap), 1) - override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT -endif - -ifeq ($(feature-reallocarray), 0) - override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY + CFLAGS := -g -O2 endif # Append required CFLAGS -override CFLAGS += $(EXTRA_WARNINGS) +override CFLAGS += $(EXTRA_WARNINGS) -Wno-switch-enum override CFLAGS += -Werror -Wall -override CFLAGS += -fPIC override CFLAGS += $(INCLUDES) override CFLAGS += -fvisibility=hidden override CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 # flags specific for shared library -SHLIB_FLAGS := -DSHARED +SHLIB_FLAGS := -DSHARED -fPIC ifeq ($(VERBOSE),1) Q = @@ -197,7 +191,7 @@ $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED) @ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION) $(OUTPUT)libbpf.a: $(BPF_IN_STATIC) - $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^ + $(QUIET_LINK)$(RM) -f $@; $(AR) rcs $@ $^ $(OUTPUT)libbpf.pc: $(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \ @@ -271,10 +265,10 @@ install: install_lib install_pkgconfig install_headers ### Cleaning rules config-clean: - $(call QUIET_CLEAN, config) + $(call QUIET_CLEAN, feature-detect) $(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null -clean: +clean: config-clean $(call QUIET_CLEAN, libbpf) $(RM) -rf $(CMD_TARGETS) \ *~ .*.d .*.cmd LIBBPF-CFLAGS $(BPF_HELPER_DEFS) \ $(SHARED_OBJDIR) $(STATIC_OBJDIR) \ @@ -301,7 +295,7 @@ cscope: cscope -b -q -I $(srctree)/include -f cscope.out tags: - rm -f TAGS tags + $(RM) -f TAGS tags ls *.c *.h | xargs $(TAGS_PROG) -a # Declare the contents of the .PHONY variable as phony. We keep that diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c index 0750681057c2..d27e34133973 100644 --- a/tools/lib/bpf/bpf.c +++ b/tools/lib/bpf/bpf.c @@ -32,9 +32,6 @@ #include "libbpf.h" #include "libbpf_internal.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - /* * When building perf, unistd.h is overridden. __NR_bpf is * required to be defined explicitly. @@ -589,19 +586,31 @@ int bpf_link_create(int prog_fd, int target_fd, enum bpf_attach_type attach_type, const struct bpf_link_create_opts *opts) { + __u32 target_btf_id, iter_info_len; union bpf_attr attr; if (!OPTS_VALID(opts, bpf_link_create_opts)) return -EINVAL; + iter_info_len = OPTS_GET(opts, iter_info_len, 0); + target_btf_id = OPTS_GET(opts, target_btf_id, 0); + + if (iter_info_len && target_btf_id) + return -EINVAL; + memset(&attr, 0, sizeof(attr)); attr.link_create.prog_fd = prog_fd; attr.link_create.target_fd = target_fd; attr.link_create.attach_type = attach_type; attr.link_create.flags = OPTS_GET(opts, flags, 0); - attr.link_create.iter_info = - ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0)); - attr.link_create.iter_info_len = OPTS_GET(opts, iter_info_len, 0); + + if (iter_info_len) { + attr.link_create.iter_info = + ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0)); + attr.link_create.iter_info_len = iter_info_len; + } else if (target_btf_id) { + attr.link_create.target_btf_id = target_btf_id; + } return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr)); } @@ -715,6 +724,37 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr) return ret; } +int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts) +{ + union bpf_attr attr; + int ret; + + if (!OPTS_VALID(opts, bpf_test_run_opts)) + return -EINVAL; + + memset(&attr, 0, sizeof(attr)); + attr.test.prog_fd = prog_fd; + attr.test.cpu = OPTS_GET(opts, cpu, 0); + attr.test.flags = OPTS_GET(opts, flags, 0); + attr.test.repeat = OPTS_GET(opts, repeat, 0); + attr.test.duration = OPTS_GET(opts, duration, 0); + attr.test.ctx_size_in = OPTS_GET(opts, ctx_size_in, 0); + attr.test.ctx_size_out = OPTS_GET(opts, ctx_size_out, 0); + attr.test.data_size_in = OPTS_GET(opts, data_size_in, 0); + attr.test.data_size_out = OPTS_GET(opts, data_size_out, 0); + attr.test.ctx_in = ptr_to_u64(OPTS_GET(opts, ctx_in, NULL)); + attr.test.ctx_out = ptr_to_u64(OPTS_GET(opts, ctx_out, NULL)); + attr.test.data_in = ptr_to_u64(OPTS_GET(opts, data_in, NULL)); + attr.test.data_out = ptr_to_u64(OPTS_GET(opts, data_out, NULL)); + + ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr)); + OPTS_SET(opts, data_size_out, attr.test.data_size_out); + OPTS_SET(opts, ctx_size_out, attr.test.ctx_size_out); + OPTS_SET(opts, duration, attr.test.duration); + OPTS_SET(opts, retval, attr.test.retval); + return ret; +} + static int bpf_obj_get_next_id(__u32 start_id, __u32 *next_id, int cmd) { union bpf_attr attr; @@ -818,7 +858,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd) return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr)); } -int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, +int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, bool do_log) { union bpf_attr attr = {}; @@ -875,3 +915,19 @@ int bpf_enable_stats(enum bpf_stats_type type) return sys_bpf(BPF_ENABLE_STATS, &attr, sizeof(attr)); } + +int bpf_prog_bind_map(int prog_fd, int map_fd, + const struct bpf_prog_bind_opts *opts) +{ + union bpf_attr attr; + + if (!OPTS_VALID(opts, bpf_prog_bind_opts)) + return -EINVAL; + + memset(&attr, 0, sizeof(attr)); + attr.prog_bind_map.prog_fd = prog_fd; + attr.prog_bind_map.map_fd = map_fd; + attr.prog_bind_map.flags = OPTS_GET(opts, flags, 0); + + return sys_bpf(BPF_PROG_BIND_MAP, &attr, sizeof(attr)); +} diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h index 015d13f25fcc..875dde20d56e 100644 --- a/tools/lib/bpf/bpf.h +++ b/tools/lib/bpf/bpf.h @@ -174,8 +174,9 @@ struct bpf_link_create_opts { __u32 flags; union bpf_iter_link_info *iter_info; __u32 iter_info_len; + __u32 target_btf_id; }; -#define bpf_link_create_opts__last_field iter_info_len +#define bpf_link_create_opts__last_field target_btf_id LIBBPF_API int bpf_link_create(int prog_fd, int target_fd, enum bpf_attach_type attach_type, @@ -234,7 +235,7 @@ LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type, __u32 query_flags, __u32 *attach_flags, __u32 *prog_ids, __u32 *prog_cnt); LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd); -LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, +LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size, bool do_log); LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, __u32 *buf_len, __u32 *prog_id, __u32 *fd_type, @@ -243,6 +244,40 @@ LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf, enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */ LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type); +struct bpf_prog_bind_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ + __u32 flags; +}; +#define bpf_prog_bind_opts__last_field flags + +LIBBPF_API int bpf_prog_bind_map(int prog_fd, int map_fd, + const struct bpf_prog_bind_opts *opts); + +struct bpf_test_run_opts { + size_t sz; /* size of this struct for forward/backward compatibility */ + const void *data_in; /* optional */ + void *data_out; /* optional */ + __u32 data_size_in; + __u32 data_size_out; /* in: max length of data_out + * out: length of data_out + */ + const void *ctx_in; /* optional */ + void *ctx_out; /* optional */ + __u32 ctx_size_in; + __u32 ctx_size_out; /* in: max length of ctx_out + * out: length of cxt_out + */ + __u32 retval; /* out: return code of the BPF program */ + int repeat; + __u32 duration; /* out: average per repetition in ns */ + __u32 flags; + __u32 cpu; +}; +#define bpf_test_run_opts__last_field cpu + +LIBBPF_API int bpf_prog_test_run_opts(int prog_fd, + struct bpf_test_run_opts *opts); + #ifdef __cplusplus } /* extern "C" */ #endif diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h index eae5cccff761..bbcefb3ff5a5 100644 --- a/tools/lib/bpf/bpf_core_read.h +++ b/tools/lib/bpf/bpf_core_read.h @@ -19,32 +19,52 @@ enum bpf_field_info_kind { BPF_FIELD_RSHIFT_U64 = 5, }; +/* second argument to __builtin_btf_type_id() built-in */ +enum bpf_type_id_kind { + BPF_TYPE_ID_LOCAL = 0, /* BTF type ID in local program */ + BPF_TYPE_ID_TARGET = 1, /* BTF type ID in target kernel */ +}; + +/* second argument to __builtin_preserve_type_info() built-in */ +enum bpf_type_info_kind { + BPF_TYPE_EXISTS = 0, /* type existence in target kernel */ + BPF_TYPE_SIZE = 1, /* type size in target kernel */ +}; + +/* second argument to __builtin_preserve_enum_value() built-in */ +enum bpf_enum_value_kind { + BPF_ENUMVAL_EXISTS = 0, /* enum value existence in kernel */ + BPF_ENUMVAL_VALUE = 1, /* enum value value relocation */ +}; + #define __CORE_RELO(src, field, info) \ __builtin_preserve_field_info((src)->field, BPF_FIELD_##info) #if __BYTE_ORDER == __LITTLE_ENDIAN #define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \ - bpf_probe_read((void *)dst, \ - __CORE_RELO(src, fld, BYTE_SIZE), \ - (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) + bpf_probe_read_kernel( \ + (void *)dst, \ + __CORE_RELO(src, fld, BYTE_SIZE), \ + (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) #else /* semantics of LSHIFT_64 assumes loading values into low-ordered bytes, so * for big-endian we need to adjust destination pointer accordingly, based on * field byte size */ #define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \ - bpf_probe_read((void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \ - __CORE_RELO(src, fld, BYTE_SIZE), \ - (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) + bpf_probe_read_kernel( \ + (void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \ + __CORE_RELO(src, fld, BYTE_SIZE), \ + (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET)) #endif /* * Extract bitfield, identified by s->field, and return its value as u64. * All this is done in relocatable manner, so bitfield changes such as * signedness, bit size, offset changes, this will be handled automatically. - * This version of macro is using bpf_probe_read() to read underlying integer - * storage. Macro functions as an expression and its return type is - * bpf_probe_read()'s return value: 0, on success, <0 on error. + * This version of macro is using bpf_probe_read_kernel() to read underlying + * integer storage. Macro functions as an expression and its return type is + * bpf_probe_read_kernel()'s return value: 0, on success, <0 on error. */ #define BPF_CORE_READ_BITFIELD_PROBED(s, field) ({ \ unsigned long long val = 0; \ @@ -92,15 +112,75 @@ enum bpf_field_info_kind { __builtin_preserve_field_info(field, BPF_FIELD_EXISTS) /* - * Convenience macro to get byte size of a field. Works for integers, + * Convenience macro to get the byte size of a field. Works for integers, * struct/unions, pointers, arrays, and enums. */ #define bpf_core_field_size(field) \ __builtin_preserve_field_info(field, BPF_FIELD_BYTE_SIZE) /* - * bpf_core_read() abstracts away bpf_probe_read() call and captures offset - * relocation for source address using __builtin_preserve_access_index() + * Convenience macro to get BTF type ID of a specified type, using a local BTF + * information. Return 32-bit unsigned integer with type ID from program's own + * BTF. Always succeeds. + */ +#define bpf_core_type_id_local(type) \ + __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_LOCAL) + +/* + * Convenience macro to get BTF type ID of a target kernel's type that matches + * specified local type. + * Returns: + * - valid 32-bit unsigned type ID in kernel BTF; + * - 0, if no matching type was found in a target kernel BTF. + */ +#define bpf_core_type_id_kernel(type) \ + __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET) + +/* + * Convenience macro to check that provided named type + * (struct/union/enum/typedef) exists in a target kernel. + * Returns: + * 1, if such type is present in target kernel's BTF; + * 0, if no matching type is found. + */ +#define bpf_core_type_exists(type) \ + __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_EXISTS) + +/* + * Convenience macro to get the byte size of a provided named type + * (struct/union/enum/typedef) in a target kernel. + * Returns: + * >= 0 size (in bytes), if type is present in target kernel's BTF; + * 0, if no matching type is found. + */ +#define bpf_core_type_size(type) \ + __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE) + +/* + * Convenience macro to check that provided enumerator value is defined in + * a target kernel. + * Returns: + * 1, if specified enum type and its enumerator value are present in target + * kernel's BTF; + * 0, if no matching enum and/or enum value within that enum is found. + */ +#define bpf_core_enum_value_exists(enum_type, enum_value) \ + __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_EXISTS) + +/* + * Convenience macro to get the integer value of an enumerator value in + * a target kernel. + * Returns: + * 64-bit value, if specified enum type and its enumerator value are + * present in target kernel's BTF; + * 0, if no matching enum and/or enum value within that enum is found. + */ +#define bpf_core_enum_value(enum_type, enum_value) \ + __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_VALUE) + +/* + * bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures + * offset relocation for source address using __builtin_preserve_access_index() * built-in, provided by Clang. * * __builtin_preserve_access_index() takes as an argument an expression of @@ -115,8 +195,8 @@ enum bpf_field_info_kind { * (local) BTF, used to record relocation. */ #define bpf_core_read(dst, sz, src) \ - bpf_probe_read(dst, sz, \ - (const void *)__builtin_preserve_access_index(src)) + bpf_probe_read_kernel(dst, sz, \ + (const void *)__builtin_preserve_access_index(src)) /* * bpf_core_read_str() is a thin wrapper around bpf_probe_read_str() @@ -124,8 +204,8 @@ enum bpf_field_info_kind { * argument. */ #define bpf_core_read_str(dst, sz, src) \ - bpf_probe_read_str(dst, sz, \ - (const void *)__builtin_preserve_access_index(src)) + bpf_probe_read_kernel_str(dst, sz, \ + (const void *)__builtin_preserve_access_index(src)) #define ___concat(a, b) a ## b #define ___apply(fn, n) ___concat(fn, n) @@ -239,15 +319,17 @@ enum bpf_field_info_kind { * int x = BPF_CORE_READ(s, a.b.c, d.e, f, g); * * BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF - * CO-RE relocatable bpf_probe_read() wrapper) calls, logically equivalent to: + * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically + * equivalent to: * 1. const void *__t = s->a.b.c; * 2. __t = __t->d.e; * 3. __t = __t->f; * 4. return __t->g; * * Equivalence is logical, because there is a heavy type casting/preservation - * involved, as well as all the reads are happening through bpf_probe_read() - * calls using __builtin_preserve_access_index() to emit CO-RE relocations. + * involved, as well as all the reads are happening through + * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to + * emit CO-RE relocations. * * N.B. Only up to 9 "field accessors" are supported, which should be more * than enough for any practical purpose. diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h index e9a4ecddb7a5..2bdb7d6dbad2 100644 --- a/tools/lib/bpf/bpf_helpers.h +++ b/tools/lib/bpf/bpf_helpers.h @@ -32,6 +32,9 @@ #ifndef __always_inline #define __always_inline __attribute__((always_inline)) #endif +#ifndef __noinline +#define __noinline __attribute__((noinline)) +#endif #ifndef __weak #define __weak __attribute__((weak)) #endif @@ -51,6 +54,52 @@ #endif /* + * Helper macro to throw a compilation error if __bpf_unreachable() gets + * built into the resulting code. This works given BPF back end does not + * implement __builtin_trap(). This is useful to assert that certain paths + * of the program code are never used and hence eliminated by the compiler. + * + * For example, consider a switch statement that covers known cases used by + * the program. __bpf_unreachable() can then reside in the default case. If + * the program gets extended such that a case is not covered in the switch + * statement, then it will throw a build error due to the default case not + * being compiled out. + */ +#ifndef __bpf_unreachable +# define __bpf_unreachable() __builtin_trap() +#endif + +/* + * Helper function to perform a tail call with a constant/immediate map slot. + */ +static __always_inline void +bpf_tail_call_static(void *ctx, const void *map, const __u32 slot) +{ + if (!__builtin_constant_p(slot)) + __bpf_unreachable(); + + /* + * Provide a hard guarantee that LLVM won't optimize setting r2 (map + * pointer) and r3 (constant map index) from _different paths_ ending + * up at the _same_ call insn as otherwise we won't be able to use the + * jmpq/nopl retpoline-free patching by the x86-64 JIT in the kernel + * given they mismatch. See also d2e4c1e6c294 ("bpf: Constant map key + * tracking for prog array pokes") for details on verifier tracking. + * + * Note on clobber list: we need to stay in-line with BPF calling + * convention, so even if we don't end up using r0, r4, r5, we need + * to mark them as clobber so that LLVM doesn't end up using them + * before / after the call. + */ + asm volatile("r1 = %[ctx]\n\t" + "r2 = %[map]\n\t" + "r3 = %[slot]\n\t" + "call 12" + :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot) + : "r0", "r1", "r2", "r3", "r4", "r5"); +} + +/* * Helper structure used by eBPF C program * to describe BPF map attributes to libbpf loader */ diff --git a/tools/lib/bpf/bpf_prog_linfo.c b/tools/lib/bpf/bpf_prog_linfo.c index bafca49cb1e6..3ed1a27b5f7c 100644 --- a/tools/lib/bpf/bpf_prog_linfo.c +++ b/tools/lib/bpf/bpf_prog_linfo.c @@ -8,9 +8,6 @@ #include "libbpf.h" #include "libbpf_internal.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - struct bpf_prog_linfo { void *raw_linfo; void *raw_jited_linfo; diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h index eebf020cbe3e..f9ef37707888 100644 --- a/tools/lib/bpf/bpf_tracing.h +++ b/tools/lib/bpf/bpf_tracing.h @@ -289,9 +289,9 @@ struct pt_regs; #define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP #else #define BPF_KPROBE_READ_RET_IP(ip, ctx) \ - ({ bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); }) + ({ bpf_probe_read_kernel(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); }) #define BPF_KRETPROBE_READ_RET_IP(ip, ctx) \ - ({ bpf_probe_read(&(ip), sizeof(ip), \ + ({ bpf_probe_read_kernel(&(ip), sizeof(ip), \ (void *)(PT_REGS_FP(ctx) + sizeof(ip))); }) #endif diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c index 6bdbc389b493..231b07203e3d 100644 --- a/tools/lib/bpf/btf.c +++ b/tools/lib/bpf/btf.c @@ -1,6 +1,7 @@ // SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause) /* Copyright (c) 2018 Facebook */ +#include <byteswap.h> #include <endian.h> #include <stdio.h> #include <stdlib.h> @@ -21,26 +22,78 @@ #include "libbpf_internal.h" #include "hashmap.h" -/* make sure libbpf doesn't use kernel-only integer typedefs */ -#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64 - #define BTF_MAX_NR_TYPES 0x7fffffffU #define BTF_MAX_STR_OFFSET 0x7fffffffU static struct btf_type btf_void; struct btf { - union { - struct btf_header *hdr; - void *data; - }; - struct btf_type **types; - const char *strings; - void *nohdr_data; + /* raw BTF data in native endianness */ + void *raw_data; + /* raw BTF data in non-native endianness */ + void *raw_data_swapped; + __u32 raw_size; + /* whether target endianness differs from the native one */ + bool swapped_endian; + + /* + * When BTF is loaded from an ELF or raw memory it is stored + * in a contiguous memory block. The hdr, type_data, and, strs_data + * point inside that memory region to their respective parts of BTF + * representation: + * + * +--------------------------------+ + * | Header | Types | Strings | + * +--------------------------------+ + * ^ ^ ^ + * | | | + * hdr | | + * types_data-+ | + * strs_data------------+ + * + * If BTF data is later modified, e.g., due to types added or + * removed, BTF deduplication performed, etc, this contiguous + * representation is broken up into three independently allocated + * memory regions to be able to modify them independently. + * raw_data is nulled out at that point, but can be later allocated + * and cached again if user calls btf__get_raw_data(), at which point + * raw_data will contain a contiguous copy of header, types, and + * strings: + * + * +----------+ +---------+ +-----------+ + * | Header | | Types | | Strings | + * +----------+ +---------+ +-----------+ + * ^ ^ ^ + * | | | + * hdr | | + * types_data----+ | + * strs_data------------------+ + * + * +----------+---------+-----------+ + * | Header | Types | Strings | + * raw_data----->+----------+---------+-----------+ + */ + struct btf_header *hdr; + + void *types_data; + size_t types_data_cap; /* used size stored in hdr->type_len */ + + /* type ID to `struct btf_type *` lookup index */ + __u32 *type_offs; + size_t type_offs_cap; __u32 nr_types; - __u32 types_size; - __u32 data_size; + + void *strs_data; + size_t strs_data_cap; /* used size stored in hdr->str_len */ + + /* lookup index for each unique string in strings section */ + struct hashmap *strs_hash; + /* whether strings are already deduplicated */ + bool strs_deduped; + /* BTF object FD, if loaded into kernel */ int fd; + + /* Pointer size (in bytes) for a target architecture of this BTF */ int ptr_sz; }; @@ -49,60 +102,114 @@ static inline __u64 ptr_to_u64(const void *ptr) return (__u64) (unsigned long) ptr; } -static int btf_add_type(struct btf *btf, struct btf_type *t) +/* Ensure given dynamically allocated memory region pointed to by *data* with + * capacity of *cap_cnt* elements each taking *elem_sz* bytes has enough + * memory to accomodate *add_cnt* new elements, assuming *cur_cnt* elements + * are already used. At most *max_cnt* elements can be ever allocated. + * If necessary, memory is reallocated and all existing data is copied over, + * new pointer to the memory region is stored at *data, new memory region + * capacity (in number of elements) is stored in *cap. + * On success, memory pointer to the beginning of unused memory is returned. + * On error, NULL is returned. + */ +void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz, + size_t cur_cnt, size_t max_cnt, size_t add_cnt) { - if (btf->types_size - btf->nr_types < 2) { - struct btf_type **new_types; - __u32 expand_by, new_size; + size_t new_cnt; + void *new_data; - if (btf->types_size == BTF_MAX_NR_TYPES) - return -E2BIG; + if (cur_cnt + add_cnt <= *cap_cnt) + return *data + cur_cnt * elem_sz; - expand_by = max(btf->types_size >> 2, 16U); - new_size = min(BTF_MAX_NR_TYPES, btf->types_size + expand_by); + /* requested more than the set limit */ + if (cur_cnt + add_cnt > max_cnt) + return NULL; - new_types = realloc(btf->types, sizeof(*new_types) * new_size); - if (!new_types) - return -ENOMEM; + new_cnt = *cap_cnt; + new_cnt += new_cnt / 4; /* expand by 25% */ + if (new_cnt < 16) /* but at least 16 elements */ + new_cnt = 16; + if (new_cnt > max_cnt) /* but not exceeding a set limit */ + new_cnt = max_cnt; + if (new_cnt < cur_cnt + add_cnt) /* also ensure we have enough memory */ + new_cnt = cur_cnt + add_cnt; + + new_data = libbpf_reallocarray(*data, new_cnt, elem_sz); + if (!new_data) + return NULL; - if (btf->nr_types == 0) - new_types[0] = &btf_void; + /* zero out newly allocated portion of memory */ + memset(new_data + (*cap_cnt) * elem_sz, 0, (new_cnt - *cap_cnt) * elem_sz); |
