aboutsummaryrefslogtreecommitdiff
path: root/tools/lib
diff options
context:
space:
mode:
authorLinus Torvalds <torvalds@linux-foundation.org>2020-10-15 18:42:13 -0700
committerLinus Torvalds <torvalds@linux-foundation.org>2020-10-15 18:42:13 -0700
commit9ff9b0d392ea08090cd1780fb196f36dbb586529 (patch)
tree276a3a5c4525b84dee64eda30b423fc31bf94850 /tools/lib
parent840e5bb326bbcb16ce82dd2416d2769de4839aea (diff)
parent105faa8742437c28815b2a3eb8314ebc5fd9288c (diff)
Merge tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski: - Add redirect_neigh() BPF packet redirect helper, allowing to limit stack traversal in common container configs and improving TCP back-pressure. Daniel reports ~10Gbps => ~15Gbps single stream TCP performance gain. - Expand netlink policy support and improve policy export to user space. (Ge)netlink core performs request validation according to declared policies. Expand the expressiveness of those policies (min/max length and bitmasks). Allow dumping policies for particular commands. This is used for feature discovery by user space (instead of kernel version parsing or trial and error). - Support IGMPv3/MLDv2 multicast listener discovery protocols in bridge. - Allow more than 255 IPv4 multicast interfaces. - Add support for Type of Service (ToS) reflection in SYN/SYN-ACK packets of TCPv6. - In Multi-patch TCP (MPTCP) support concurrent transmission of data on multiple subflows in a load balancing scenario. Enhance advertising addresses via the RM_ADDR/ADD_ADDR options. - Support SMC-Dv2 version of SMC, which enables multi-subnet deployments. - Allow more calls to same peer in RxRPC. - Support two new Controller Area Network (CAN) protocols - CAN-FD and ISO 15765-2:2016. - Add xfrm/IPsec compat layer, solving the 32bit user space on 64bit kernel problem. - Add TC actions for implementing MPLS L2 VPNs. - Improve nexthop code - e.g. handle various corner cases when nexthop objects are removed from groups better, skip unnecessary notifications and make it easier to offload nexthops into HW by converting to a blocking notifier. - Support adding and consuming TCP header options by BPF programs, opening the doors for easy experimental and deployment-specific TCP option use. - Reorganize TCP congestion control (CC) initialization to simplify life of TCP CC implemented in BPF. - Add support for shipping BPF programs with the kernel and loading them early on boot via the User Mode Driver mechanism, hence reusing all the user space infra we have. - Support sleepable BPF programs, initially targeting LSM and tracing. - Add bpf_d_path() helper for returning full path for given 'struct path'. - Make bpf_tail_call compatible with bpf-to-bpf calls. - Allow BPF programs to call map_update_elem on sockmaps. - Add BPF Type Format (BTF) support for type and enum discovery, as well as support for using BTF within the kernel itself (current use is for pretty printing structures). - Support listing and getting information about bpf_links via the bpf syscall. - Enhance kernel interfaces around NIC firmware update. Allow specifying overwrite mask to control if settings etc. are reset during update; report expected max time operation may take to users; support firmware activation without machine reboot incl. limits of how much impact reset may have (e.g. dropping link or not). - Extend ethtool configuration interface to report IEEE-standard counters, to limit the need for per-vendor logic in user space. - Adopt or extend devlink use for debug, monitoring, fw update in many drivers (dsa loop, ice, ionic, sja1105, qed, mlxsw, mv88e6xxx, dpaa2-eth). - In mlxsw expose critical and emergency SFP module temperature alarms. Refactor port buffer handling to make the defaults more suitable and support setting these values explicitly via the DCBNL interface. - Add XDP support for Intel's igb driver. - Support offloading TC flower classification and filtering rules to mscc_ocelot switches. - Add PTP support for Marvell Octeontx2 and PP2.2 hardware, as well as fixed interval period pulse generator and one-step timestamping in dpaa-eth. - Add support for various auth offloads in WiFi APs, e.g. SAE (WPA3) offload. - Add Lynx PHY/PCS MDIO module, and convert various drivers which have this HW to use it. Convert mvpp2 to split PCS. - Support Marvell Prestera 98DX3255 24-port switch ASICs, as well as 7-port Mediatek MT7531 IP. - Add initial support for QCA6390 and IPQ6018 in ath11k WiFi driver, and wcn3680 support in wcn36xx. - Improve performance for packets which don't require much offloads on recent Mellanox NICs by 20% by making multiple packets share a descriptor entry. - Move chelsio inline crypto drivers (for TLS and IPsec) from the crypto subtree to drivers/net. Move MDIO drivers out of the phy directory. - Clean up a lot of W=1 warnings, reportedly the actively developed subsections of networking drivers should now build W=1 warning free. - Make sure drivers don't use in_interrupt() to dynamically adapt their code. Convert tasklets to use new tasklet_setup API (sadly this conversion is not yet complete). * tag 'net-next-5.10' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2583 commits) Revert "bpfilter: Fix build error with CONFIG_BPFILTER_UMH" net, sockmap: Don't call bpf_prog_put() on NULL pointer bpf, selftest: Fix flaky tcp_hdr_options test when adding addr to lo bpf, sockmap: Add locking annotations to iterator netfilter: nftables: allow re-computing sctp CRC-32C in 'payload' statements net: fix pos incrementment in ipv6_route_seq_next net/smc: fix invalid return code in smcd_new_buf_create() net/smc: fix valid DMBE buffer sizes net/smc: fix use-after-free of delayed events bpfilter: Fix build error with CONFIG_BPFILTER_UMH cxgb4/ch_ipsec: Replace the module name to ch_ipsec from chcr net: sched: Fix suspicious RCU usage while accessing tcf_tunnel_info bpf: Fix register equivalence tracking. rxrpc: Fix loss of final ack on shutdown rxrpc: Fix bundle counting for exclusive connections netfilter: restore NF_INET_NUMHOOKS ibmveth: Identify ingress large send packets. ibmveth: Switch order of ibmveth_helper calls. cxgb4: handle 4-tuple PEDIT to NAT mode translation selftests: Add VRF route leaking tests ...
Diffstat (limited to 'tools/lib')
-rw-r--r--tools/lib/bpf/Makefile28
-rw-r--r--tools/lib/bpf/bpf.c70
-rw-r--r--tools/lib/bpf/bpf.h39
-rw-r--r--tools/lib/bpf/bpf_core_read.h120
-rw-r--r--tools/lib/bpf/bpf_helpers.h49
-rw-r--r--tools/lib/bpf/bpf_prog_linfo.c3
-rw-r--r--tools/lib/bpf/bpf_tracing.h4
-rw-r--r--tools/lib/bpf/btf.c1631
-rw-r--r--tools/lib/bpf/btf.h103
-rw-r--r--tools/lib/bpf/btf_dump.c87
-rw-r--r--tools/lib/bpf/hashmap.c3
-rw-r--r--tools/lib/bpf/hashmap.h12
-rw-r--r--tools/lib/bpf/libbpf.c3407
-rw-r--r--tools/lib/bpf/libbpf.h12
-rw-r--r--tools/lib/bpf/libbpf.map38
-rw-r--r--tools/lib/bpf/libbpf_common.h2
-rw-r--r--tools/lib/bpf/libbpf_internal.h147
-rw-r--r--tools/lib/bpf/libbpf_probes.c8
-rw-r--r--tools/lib/bpf/netlink.c128
-rw-r--r--tools/lib/bpf/nlattr.c9
-rw-r--r--tools/lib/bpf/ringbuf.c8
-rw-r--r--tools/lib/bpf/xsk.c383
-rw-r--r--tools/lib/bpf/xsk.h9
23 files changed, 4726 insertions, 1574 deletions
diff --git a/tools/lib/bpf/Makefile b/tools/lib/bpf/Makefile
index 9ae8f4ef0aac..5f9abed3e226 100644
--- a/tools/lib/bpf/Makefile
+++ b/tools/lib/bpf/Makefile
@@ -1,6 +1,9 @@
# SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
# Most of this file is copied from tools/lib/traceevent/Makefile
+RM ?= rm
+srctree = $(abs_srctree)
+
LIBBPF_VERSION := $(shell \
grep -oE '^LIBBPF_([0-9.]+)' libbpf.map | \
sort -rV | head -n1 | cut -d'_' -f2)
@@ -56,7 +59,7 @@ ifndef VERBOSE
endif
FEATURE_USER = .libbpf
-FEATURE_TESTS = libelf libelf-mmap zlib bpf reallocarray
+FEATURE_TESTS = libelf zlib bpf
FEATURE_DISPLAY = libelf zlib bpf
INCLUDES = -I. -I$(srctree)/tools/include -I$(srctree)/tools/include/uapi
@@ -95,27 +98,18 @@ PC_FILE = libbpf.pc
ifdef EXTRA_CFLAGS
CFLAGS := $(EXTRA_CFLAGS)
else
- CFLAGS := -g -Wall
-endif
-
-ifeq ($(feature-libelf-mmap), 1)
- override CFLAGS += -DHAVE_LIBELF_MMAP_SUPPORT
-endif
-
-ifeq ($(feature-reallocarray), 0)
- override CFLAGS += -DCOMPAT_NEED_REALLOCARRAY
+ CFLAGS := -g -O2
endif
# Append required CFLAGS
-override CFLAGS += $(EXTRA_WARNINGS)
+override CFLAGS += $(EXTRA_WARNINGS) -Wno-switch-enum
override CFLAGS += -Werror -Wall
-override CFLAGS += -fPIC
override CFLAGS += $(INCLUDES)
override CFLAGS += -fvisibility=hidden
override CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64
# flags specific for shared library
-SHLIB_FLAGS := -DSHARED
+SHLIB_FLAGS := -DSHARED -fPIC
ifeq ($(VERBOSE),1)
Q =
@@ -197,7 +191,7 @@ $(OUTPUT)libbpf.so.$(LIBBPF_VERSION): $(BPF_IN_SHARED)
@ln -sf $(@F) $(OUTPUT)libbpf.so.$(LIBBPF_MAJOR_VERSION)
$(OUTPUT)libbpf.a: $(BPF_IN_STATIC)
- $(QUIET_LINK)$(RM) $@; $(AR) rcs $@ $^
+ $(QUIET_LINK)$(RM) -f $@; $(AR) rcs $@ $^
$(OUTPUT)libbpf.pc:
$(QUIET_GEN)sed -e "s|@PREFIX@|$(prefix)|" \
@@ -271,10 +265,10 @@ install: install_lib install_pkgconfig install_headers
### Cleaning rules
config-clean:
- $(call QUIET_CLEAN, config)
+ $(call QUIET_CLEAN, feature-detect)
$(Q)$(MAKE) -C $(srctree)/tools/build/feature/ clean >/dev/null
-clean:
+clean: config-clean
$(call QUIET_CLEAN, libbpf) $(RM) -rf $(CMD_TARGETS) \
*~ .*.d .*.cmd LIBBPF-CFLAGS $(BPF_HELPER_DEFS) \
$(SHARED_OBJDIR) $(STATIC_OBJDIR) \
@@ -301,7 +295,7 @@ cscope:
cscope -b -q -I $(srctree)/include -f cscope.out
tags:
- rm -f TAGS tags
+ $(RM) -f TAGS tags
ls *.c *.h | xargs $(TAGS_PROG) -a
# Declare the contents of the .PHONY variable as phony. We keep that
diff --git a/tools/lib/bpf/bpf.c b/tools/lib/bpf/bpf.c
index 0750681057c2..d27e34133973 100644
--- a/tools/lib/bpf/bpf.c
+++ b/tools/lib/bpf/bpf.c
@@ -32,9 +32,6 @@
#include "libbpf.h"
#include "libbpf_internal.h"
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
/*
* When building perf, unistd.h is overridden. __NR_bpf is
* required to be defined explicitly.
@@ -589,19 +586,31 @@ int bpf_link_create(int prog_fd, int target_fd,
enum bpf_attach_type attach_type,
const struct bpf_link_create_opts *opts)
{
+ __u32 target_btf_id, iter_info_len;
union bpf_attr attr;
if (!OPTS_VALID(opts, bpf_link_create_opts))
return -EINVAL;
+ iter_info_len = OPTS_GET(opts, iter_info_len, 0);
+ target_btf_id = OPTS_GET(opts, target_btf_id, 0);
+
+ if (iter_info_len && target_btf_id)
+ return -EINVAL;
+
memset(&attr, 0, sizeof(attr));
attr.link_create.prog_fd = prog_fd;
attr.link_create.target_fd = target_fd;
attr.link_create.attach_type = attach_type;
attr.link_create.flags = OPTS_GET(opts, flags, 0);
- attr.link_create.iter_info =
- ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
- attr.link_create.iter_info_len = OPTS_GET(opts, iter_info_len, 0);
+
+ if (iter_info_len) {
+ attr.link_create.iter_info =
+ ptr_to_u64(OPTS_GET(opts, iter_info, (void *)0));
+ attr.link_create.iter_info_len = iter_info_len;
+ } else if (target_btf_id) {
+ attr.link_create.target_btf_id = target_btf_id;
+ }
return sys_bpf(BPF_LINK_CREATE, &attr, sizeof(attr));
}
@@ -715,6 +724,37 @@ int bpf_prog_test_run_xattr(struct bpf_prog_test_run_attr *test_attr)
return ret;
}
+int bpf_prog_test_run_opts(int prog_fd, struct bpf_test_run_opts *opts)
+{
+ union bpf_attr attr;
+ int ret;
+
+ if (!OPTS_VALID(opts, bpf_test_run_opts))
+ return -EINVAL;
+
+ memset(&attr, 0, sizeof(attr));
+ attr.test.prog_fd = prog_fd;
+ attr.test.cpu = OPTS_GET(opts, cpu, 0);
+ attr.test.flags = OPTS_GET(opts, flags, 0);
+ attr.test.repeat = OPTS_GET(opts, repeat, 0);
+ attr.test.duration = OPTS_GET(opts, duration, 0);
+ attr.test.ctx_size_in = OPTS_GET(opts, ctx_size_in, 0);
+ attr.test.ctx_size_out = OPTS_GET(opts, ctx_size_out, 0);
+ attr.test.data_size_in = OPTS_GET(opts, data_size_in, 0);
+ attr.test.data_size_out = OPTS_GET(opts, data_size_out, 0);
+ attr.test.ctx_in = ptr_to_u64(OPTS_GET(opts, ctx_in, NULL));
+ attr.test.ctx_out = ptr_to_u64(OPTS_GET(opts, ctx_out, NULL));
+ attr.test.data_in = ptr_to_u64(OPTS_GET(opts, data_in, NULL));
+ attr.test.data_out = ptr_to_u64(OPTS_GET(opts, data_out, NULL));
+
+ ret = sys_bpf(BPF_PROG_TEST_RUN, &attr, sizeof(attr));
+ OPTS_SET(opts, data_size_out, attr.test.data_size_out);
+ OPTS_SET(opts, ctx_size_out, attr.test.ctx_size_out);
+ OPTS_SET(opts, duration, attr.test.duration);
+ OPTS_SET(opts, retval, attr.test.retval);
+ return ret;
+}
+
static int bpf_obj_get_next_id(__u32 start_id, __u32 *next_id, int cmd)
{
union bpf_attr attr;
@@ -818,7 +858,7 @@ int bpf_raw_tracepoint_open(const char *name, int prog_fd)
return sys_bpf(BPF_RAW_TRACEPOINT_OPEN, &attr, sizeof(attr));
}
-int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
+int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf, __u32 log_buf_size,
bool do_log)
{
union bpf_attr attr = {};
@@ -875,3 +915,19 @@ int bpf_enable_stats(enum bpf_stats_type type)
return sys_bpf(BPF_ENABLE_STATS, &attr, sizeof(attr));
}
+
+int bpf_prog_bind_map(int prog_fd, int map_fd,
+ const struct bpf_prog_bind_opts *opts)
+{
+ union bpf_attr attr;
+
+ if (!OPTS_VALID(opts, bpf_prog_bind_opts))
+ return -EINVAL;
+
+ memset(&attr, 0, sizeof(attr));
+ attr.prog_bind_map.prog_fd = prog_fd;
+ attr.prog_bind_map.map_fd = map_fd;
+ attr.prog_bind_map.flags = OPTS_GET(opts, flags, 0);
+
+ return sys_bpf(BPF_PROG_BIND_MAP, &attr, sizeof(attr));
+}
diff --git a/tools/lib/bpf/bpf.h b/tools/lib/bpf/bpf.h
index 015d13f25fcc..875dde20d56e 100644
--- a/tools/lib/bpf/bpf.h
+++ b/tools/lib/bpf/bpf.h
@@ -174,8 +174,9 @@ struct bpf_link_create_opts {
__u32 flags;
union bpf_iter_link_info *iter_info;
__u32 iter_info_len;
+ __u32 target_btf_id;
};
-#define bpf_link_create_opts__last_field iter_info_len
+#define bpf_link_create_opts__last_field target_btf_id
LIBBPF_API int bpf_link_create(int prog_fd, int target_fd,
enum bpf_attach_type attach_type,
@@ -234,7 +235,7 @@ LIBBPF_API int bpf_prog_query(int target_fd, enum bpf_attach_type type,
__u32 query_flags, __u32 *attach_flags,
__u32 *prog_ids, __u32 *prog_cnt);
LIBBPF_API int bpf_raw_tracepoint_open(const char *name, int prog_fd);
-LIBBPF_API int bpf_load_btf(void *btf, __u32 btf_size, char *log_buf,
+LIBBPF_API int bpf_load_btf(const void *btf, __u32 btf_size, char *log_buf,
__u32 log_buf_size, bool do_log);
LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
__u32 *buf_len, __u32 *prog_id, __u32 *fd_type,
@@ -243,6 +244,40 @@ LIBBPF_API int bpf_task_fd_query(int pid, int fd, __u32 flags, char *buf,
enum bpf_stats_type; /* defined in up-to-date linux/bpf.h */
LIBBPF_API int bpf_enable_stats(enum bpf_stats_type type);
+struct bpf_prog_bind_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ __u32 flags;
+};
+#define bpf_prog_bind_opts__last_field flags
+
+LIBBPF_API int bpf_prog_bind_map(int prog_fd, int map_fd,
+ const struct bpf_prog_bind_opts *opts);
+
+struct bpf_test_run_opts {
+ size_t sz; /* size of this struct for forward/backward compatibility */
+ const void *data_in; /* optional */
+ void *data_out; /* optional */
+ __u32 data_size_in;
+ __u32 data_size_out; /* in: max length of data_out
+ * out: length of data_out
+ */
+ const void *ctx_in; /* optional */
+ void *ctx_out; /* optional */
+ __u32 ctx_size_in;
+ __u32 ctx_size_out; /* in: max length of ctx_out
+ * out: length of cxt_out
+ */
+ __u32 retval; /* out: return code of the BPF program */
+ int repeat;
+ __u32 duration; /* out: average per repetition in ns */
+ __u32 flags;
+ __u32 cpu;
+};
+#define bpf_test_run_opts__last_field cpu
+
+LIBBPF_API int bpf_prog_test_run_opts(int prog_fd,
+ struct bpf_test_run_opts *opts);
+
#ifdef __cplusplus
} /* extern "C" */
#endif
diff --git a/tools/lib/bpf/bpf_core_read.h b/tools/lib/bpf/bpf_core_read.h
index eae5cccff761..bbcefb3ff5a5 100644
--- a/tools/lib/bpf/bpf_core_read.h
+++ b/tools/lib/bpf/bpf_core_read.h
@@ -19,32 +19,52 @@ enum bpf_field_info_kind {
BPF_FIELD_RSHIFT_U64 = 5,
};
+/* second argument to __builtin_btf_type_id() built-in */
+enum bpf_type_id_kind {
+ BPF_TYPE_ID_LOCAL = 0, /* BTF type ID in local program */
+ BPF_TYPE_ID_TARGET = 1, /* BTF type ID in target kernel */
+};
+
+/* second argument to __builtin_preserve_type_info() built-in */
+enum bpf_type_info_kind {
+ BPF_TYPE_EXISTS = 0, /* type existence in target kernel */
+ BPF_TYPE_SIZE = 1, /* type size in target kernel */
+};
+
+/* second argument to __builtin_preserve_enum_value() built-in */
+enum bpf_enum_value_kind {
+ BPF_ENUMVAL_EXISTS = 0, /* enum value existence in kernel */
+ BPF_ENUMVAL_VALUE = 1, /* enum value value relocation */
+};
+
#define __CORE_RELO(src, field, info) \
__builtin_preserve_field_info((src)->field, BPF_FIELD_##info)
#if __BYTE_ORDER == __LITTLE_ENDIAN
#define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \
- bpf_probe_read((void *)dst, \
- __CORE_RELO(src, fld, BYTE_SIZE), \
- (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
+ bpf_probe_read_kernel( \
+ (void *)dst, \
+ __CORE_RELO(src, fld, BYTE_SIZE), \
+ (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
#else
/* semantics of LSHIFT_64 assumes loading values into low-ordered bytes, so
* for big-endian we need to adjust destination pointer accordingly, based on
* field byte size
*/
#define __CORE_BITFIELD_PROBE_READ(dst, src, fld) \
- bpf_probe_read((void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \
- __CORE_RELO(src, fld, BYTE_SIZE), \
- (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
+ bpf_probe_read_kernel( \
+ (void *)dst + (8 - __CORE_RELO(src, fld, BYTE_SIZE)), \
+ __CORE_RELO(src, fld, BYTE_SIZE), \
+ (const void *)src + __CORE_RELO(src, fld, BYTE_OFFSET))
#endif
/*
* Extract bitfield, identified by s->field, and return its value as u64.
* All this is done in relocatable manner, so bitfield changes such as
* signedness, bit size, offset changes, this will be handled automatically.
- * This version of macro is using bpf_probe_read() to read underlying integer
- * storage. Macro functions as an expression and its return type is
- * bpf_probe_read()'s return value: 0, on success, <0 on error.
+ * This version of macro is using bpf_probe_read_kernel() to read underlying
+ * integer storage. Macro functions as an expression and its return type is
+ * bpf_probe_read_kernel()'s return value: 0, on success, <0 on error.
*/
#define BPF_CORE_READ_BITFIELD_PROBED(s, field) ({ \
unsigned long long val = 0; \
@@ -92,15 +112,75 @@ enum bpf_field_info_kind {
__builtin_preserve_field_info(field, BPF_FIELD_EXISTS)
/*
- * Convenience macro to get byte size of a field. Works for integers,
+ * Convenience macro to get the byte size of a field. Works for integers,
* struct/unions, pointers, arrays, and enums.
*/
#define bpf_core_field_size(field) \
__builtin_preserve_field_info(field, BPF_FIELD_BYTE_SIZE)
/*
- * bpf_core_read() abstracts away bpf_probe_read() call and captures offset
- * relocation for source address using __builtin_preserve_access_index()
+ * Convenience macro to get BTF type ID of a specified type, using a local BTF
+ * information. Return 32-bit unsigned integer with type ID from program's own
+ * BTF. Always succeeds.
+ */
+#define bpf_core_type_id_local(type) \
+ __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_LOCAL)
+
+/*
+ * Convenience macro to get BTF type ID of a target kernel's type that matches
+ * specified local type.
+ * Returns:
+ * - valid 32-bit unsigned type ID in kernel BTF;
+ * - 0, if no matching type was found in a target kernel BTF.
+ */
+#define bpf_core_type_id_kernel(type) \
+ __builtin_btf_type_id(*(typeof(type) *)0, BPF_TYPE_ID_TARGET)
+
+/*
+ * Convenience macro to check that provided named type
+ * (struct/union/enum/typedef) exists in a target kernel.
+ * Returns:
+ * 1, if such type is present in target kernel's BTF;
+ * 0, if no matching type is found.
+ */
+#define bpf_core_type_exists(type) \
+ __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_EXISTS)
+
+/*
+ * Convenience macro to get the byte size of a provided named type
+ * (struct/union/enum/typedef) in a target kernel.
+ * Returns:
+ * >= 0 size (in bytes), if type is present in target kernel's BTF;
+ * 0, if no matching type is found.
+ */
+#define bpf_core_type_size(type) \
+ __builtin_preserve_type_info(*(typeof(type) *)0, BPF_TYPE_SIZE)
+
+/*
+ * Convenience macro to check that provided enumerator value is defined in
+ * a target kernel.
+ * Returns:
+ * 1, if specified enum type and its enumerator value are present in target
+ * kernel's BTF;
+ * 0, if no matching enum and/or enum value within that enum is found.
+ */
+#define bpf_core_enum_value_exists(enum_type, enum_value) \
+ __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_EXISTS)
+
+/*
+ * Convenience macro to get the integer value of an enumerator value in
+ * a target kernel.
+ * Returns:
+ * 64-bit value, if specified enum type and its enumerator value are
+ * present in target kernel's BTF;
+ * 0, if no matching enum and/or enum value within that enum is found.
+ */
+#define bpf_core_enum_value(enum_type, enum_value) \
+ __builtin_preserve_enum_value(*(typeof(enum_type) *)enum_value, BPF_ENUMVAL_VALUE)
+
+/*
+ * bpf_core_read() abstracts away bpf_probe_read_kernel() call and captures
+ * offset relocation for source address using __builtin_preserve_access_index()
* built-in, provided by Clang.
*
* __builtin_preserve_access_index() takes as an argument an expression of
@@ -115,8 +195,8 @@ enum bpf_field_info_kind {
* (local) BTF, used to record relocation.
*/
#define bpf_core_read(dst, sz, src) \
- bpf_probe_read(dst, sz, \
- (const void *)__builtin_preserve_access_index(src))
+ bpf_probe_read_kernel(dst, sz, \
+ (const void *)__builtin_preserve_access_index(src))
/*
* bpf_core_read_str() is a thin wrapper around bpf_probe_read_str()
@@ -124,8 +204,8 @@ enum bpf_field_info_kind {
* argument.
*/
#define bpf_core_read_str(dst, sz, src) \
- bpf_probe_read_str(dst, sz, \
- (const void *)__builtin_preserve_access_index(src))
+ bpf_probe_read_kernel_str(dst, sz, \
+ (const void *)__builtin_preserve_access_index(src))
#define ___concat(a, b) a ## b
#define ___apply(fn, n) ___concat(fn, n)
@@ -239,15 +319,17 @@ enum bpf_field_info_kind {
* int x = BPF_CORE_READ(s, a.b.c, d.e, f, g);
*
* BPF_CORE_READ will decompose above statement into 4 bpf_core_read (BPF
- * CO-RE relocatable bpf_probe_read() wrapper) calls, logically equivalent to:
+ * CO-RE relocatable bpf_probe_read_kernel() wrapper) calls, logically
+ * equivalent to:
* 1. const void *__t = s->a.b.c;
* 2. __t = __t->d.e;
* 3. __t = __t->f;
* 4. return __t->g;
*
* Equivalence is logical, because there is a heavy type casting/preservation
- * involved, as well as all the reads are happening through bpf_probe_read()
- * calls using __builtin_preserve_access_index() to emit CO-RE relocations.
+ * involved, as well as all the reads are happening through
+ * bpf_probe_read_kernel() calls using __builtin_preserve_access_index() to
+ * emit CO-RE relocations.
*
* N.B. Only up to 9 "field accessors" are supported, which should be more
* than enough for any practical purpose.
diff --git a/tools/lib/bpf/bpf_helpers.h b/tools/lib/bpf/bpf_helpers.h
index e9a4ecddb7a5..2bdb7d6dbad2 100644
--- a/tools/lib/bpf/bpf_helpers.h
+++ b/tools/lib/bpf/bpf_helpers.h
@@ -32,6 +32,9 @@
#ifndef __always_inline
#define __always_inline __attribute__((always_inline))
#endif
+#ifndef __noinline
+#define __noinline __attribute__((noinline))
+#endif
#ifndef __weak
#define __weak __attribute__((weak))
#endif
@@ -51,6 +54,52 @@
#endif
/*
+ * Helper macro to throw a compilation error if __bpf_unreachable() gets
+ * built into the resulting code. This works given BPF back end does not
+ * implement __builtin_trap(). This is useful to assert that certain paths
+ * of the program code are never used and hence eliminated by the compiler.
+ *
+ * For example, consider a switch statement that covers known cases used by
+ * the program. __bpf_unreachable() can then reside in the default case. If
+ * the program gets extended such that a case is not covered in the switch
+ * statement, then it will throw a build error due to the default case not
+ * being compiled out.
+ */
+#ifndef __bpf_unreachable
+# define __bpf_unreachable() __builtin_trap()
+#endif
+
+/*
+ * Helper function to perform a tail call with a constant/immediate map slot.
+ */
+static __always_inline void
+bpf_tail_call_static(void *ctx, const void *map, const __u32 slot)
+{
+ if (!__builtin_constant_p(slot))
+ __bpf_unreachable();
+
+ /*
+ * Provide a hard guarantee that LLVM won't optimize setting r2 (map
+ * pointer) and r3 (constant map index) from _different paths_ ending
+ * up at the _same_ call insn as otherwise we won't be able to use the
+ * jmpq/nopl retpoline-free patching by the x86-64 JIT in the kernel
+ * given they mismatch. See also d2e4c1e6c294 ("bpf: Constant map key
+ * tracking for prog array pokes") for details on verifier tracking.
+ *
+ * Note on clobber list: we need to stay in-line with BPF calling
+ * convention, so even if we don't end up using r0, r4, r5, we need
+ * to mark them as clobber so that LLVM doesn't end up using them
+ * before / after the call.
+ */
+ asm volatile("r1 = %[ctx]\n\t"
+ "r2 = %[map]\n\t"
+ "r3 = %[slot]\n\t"
+ "call 12"
+ :: [ctx]"r"(ctx), [map]"r"(map), [slot]"i"(slot)
+ : "r0", "r1", "r2", "r3", "r4", "r5");
+}
+
+/*
* Helper structure used by eBPF C program
* to describe BPF map attributes to libbpf loader
*/
diff --git a/tools/lib/bpf/bpf_prog_linfo.c b/tools/lib/bpf/bpf_prog_linfo.c
index bafca49cb1e6..3ed1a27b5f7c 100644
--- a/tools/lib/bpf/bpf_prog_linfo.c
+++ b/tools/lib/bpf/bpf_prog_linfo.c
@@ -8,9 +8,6 @@
#include "libbpf.h"
#include "libbpf_internal.h"
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
struct bpf_prog_linfo {
void *raw_linfo;
void *raw_jited_linfo;
diff --git a/tools/lib/bpf/bpf_tracing.h b/tools/lib/bpf/bpf_tracing.h
index eebf020cbe3e..f9ef37707888 100644
--- a/tools/lib/bpf/bpf_tracing.h
+++ b/tools/lib/bpf/bpf_tracing.h
@@ -289,9 +289,9 @@ struct pt_regs;
#define BPF_KRETPROBE_READ_RET_IP BPF_KPROBE_READ_RET_IP
#else
#define BPF_KPROBE_READ_RET_IP(ip, ctx) \
- ({ bpf_probe_read(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
+ ({ bpf_probe_read_kernel(&(ip), sizeof(ip), (void *)PT_REGS_RET(ctx)); })
#define BPF_KRETPROBE_READ_RET_IP(ip, ctx) \
- ({ bpf_probe_read(&(ip), sizeof(ip), \
+ ({ bpf_probe_read_kernel(&(ip), sizeof(ip), \
(void *)(PT_REGS_FP(ctx) + sizeof(ip))); })
#endif
diff --git a/tools/lib/bpf/btf.c b/tools/lib/bpf/btf.c
index 6bdbc389b493..231b07203e3d 100644
--- a/tools/lib/bpf/btf.c
+++ b/tools/lib/bpf/btf.c
@@ -1,6 +1,7 @@
// SPDX-License-Identifier: (LGPL-2.1 OR BSD-2-Clause)
/* Copyright (c) 2018 Facebook */
+#include <byteswap.h>
#include <endian.h>
#include <stdio.h>
#include <stdlib.h>
@@ -21,26 +22,78 @@
#include "libbpf_internal.h"
#include "hashmap.h"
-/* make sure libbpf doesn't use kernel-only integer typedefs */
-#pragma GCC poison u8 u16 u32 u64 s8 s16 s32 s64
-
#define BTF_MAX_NR_TYPES 0x7fffffffU
#define BTF_MAX_STR_OFFSET 0x7fffffffU
static struct btf_type btf_void;
struct btf {
- union {
- struct btf_header *hdr;
- void *data;
- };
- struct btf_type **types;
- const char *strings;
- void *nohdr_data;
+ /* raw BTF data in native endianness */
+ void *raw_data;
+ /* raw BTF data in non-native endianness */
+ void *raw_data_swapped;
+ __u32 raw_size;
+ /* whether target endianness differs from the native one */
+ bool swapped_endian;
+
+ /*
+ * When BTF is loaded from an ELF or raw memory it is stored
+ * in a contiguous memory block. The hdr, type_data, and, strs_data
+ * point inside that memory region to their respective parts of BTF
+ * representation:
+ *
+ * +--------------------------------+
+ * | Header | Types | Strings |
+ * +--------------------------------+
+ * ^ ^ ^
+ * | | |
+ * hdr | |
+ * types_data-+ |
+ * strs_data------------+
+ *
+ * If BTF data is later modified, e.g., due to types added or
+ * removed, BTF deduplication performed, etc, this contiguous
+ * representation is broken up into three independently allocated
+ * memory regions to be able to modify them independently.
+ * raw_data is nulled out at that point, but can be later allocated
+ * and cached again if user calls btf__get_raw_data(), at which point
+ * raw_data will contain a contiguous copy of header, types, and
+ * strings:
+ *
+ * +----------+ +---------+ +-----------+
+ * | Header | | Types | | Strings |
+ * +----------+ +---------+ +-----------+
+ * ^ ^ ^
+ * | | |
+ * hdr | |
+ * types_data----+ |
+ * strs_data------------------+
+ *
+ * +----------+---------+-----------+
+ * | Header | Types | Strings |
+ * raw_data----->+----------+---------+-----------+
+ */
+ struct btf_header *hdr;
+
+ void *types_data;
+ size_t types_data_cap; /* used size stored in hdr->type_len */
+
+ /* type ID to `struct btf_type *` lookup index */
+ __u32 *type_offs;
+ size_t type_offs_cap;
__u32 nr_types;
- __u32 types_size;
- __u32 data_size;
+
+ void *strs_data;
+ size_t strs_data_cap; /* used size stored in hdr->str_len */
+
+ /* lookup index for each unique string in strings section */
+ struct hashmap *strs_hash;
+ /* whether strings are already deduplicated */
+ bool strs_deduped;
+ /* BTF object FD, if loaded into kernel */
int fd;
+
+ /* Pointer size (in bytes) for a target architecture of this BTF */
int ptr_sz;
};
@@ -49,60 +102,114 @@ static inline __u64 ptr_to_u64(const void *ptr)
return (__u64) (unsigned long) ptr;
}
-static int btf_add_type(struct btf *btf, struct btf_type *t)
+/* Ensure given dynamically allocated memory region pointed to by *data* with
+ * capacity of *cap_cnt* elements each taking *elem_sz* bytes has enough
+ * memory to accomodate *add_cnt* new elements, assuming *cur_cnt* elements
+ * are already used. At most *max_cnt* elements can be ever allocated.
+ * If necessary, memory is reallocated and all existing data is copied over,
+ * new pointer to the memory region is stored at *data, new memory region
+ * capacity (in number of elements) is stored in *cap.
+ * On success, memory pointer to the beginning of unused memory is returned.
+ * On error, NULL is returned.
+ */
+void *btf_add_mem(void **data, size_t *cap_cnt, size_t elem_sz,
+ size_t cur_cnt, size_t max_cnt, size_t add_cnt)
{
- if (btf->types_size - btf->nr_types < 2) {
- struct btf_type **new_types;
- __u32 expand_by, new_size;
+ size_t new_cnt;
+ void *new_data;
- if (btf->types_size == BTF_MAX_NR_TYPES)
- return -E2BIG;
+ if (cur_cnt + add_cnt <= *cap_cnt)
+ return *data + cur_cnt * elem_sz;
- expand_by = max(btf->types_size >> 2, 16U);
- new_size = min(BTF_MAX_NR_TYPES, btf->types_size + expand_by);
+ /* requested more than the set limit */
+ if (cur_cnt + add_cnt > max_cnt)
+ return NULL;
- new_types = realloc(btf->types, sizeof(*new_types) * new_size);
- if (!new_types)
- return -ENOMEM;
+ new_cnt = *cap_cnt;
+ new_cnt += new_cnt / 4; /* expand by 25% */
+ if (new_cnt < 16) /* but at least 16 elements */
+ new_cnt = 16;
+ if (new_cnt > max_cnt) /* but not exceeding a set limit */
+ new_cnt = max_cnt;
+ if (new_cnt < cur_cnt + add_cnt) /* also ensure we have enough memory */
+ new_cnt = cur_cnt + add_cnt;
+
+ new_data = libbpf_reallocarray(*data, new_cnt, elem_sz);
+ if (!new_data)
+ return NULL;
- if (btf->nr_types == 0)
- new_types[0] = &btf_void;
+ /* zero out newly allocated portion of memory */
+ memset(new_data + (*cap_cnt) * elem_sz, 0, (new_cnt - *cap_cnt) * elem_sz);