aboutsummaryrefslogtreecommitdiff
path: root/security/selinux/include
AgeCommit message (Collapse)AuthorFilesLines
2026-01-13selinux: add support for BPF token access controlEric Suen5-1/+12
BPF token support was introduced to allow a privileged process to delegate limited BPF functionality—such as map creation and program loading—to an unprivileged process: https://lore.kernel.org/linux-security-module/20231130185229.2688956-1-andrii@kernel.org/ This patch adds SELinux support for controlling BPF token access. With this change, SELinux policies can now enforce constraints on BPF token usage based on both the delegating (privileged) process and the recipient (unprivileged) process. Supported operations currently include: - map_create - prog_load High-level workflow: 1. An unprivileged process creates a VFS context via `fsopen()` and obtains a file descriptor. 2. This descriptor is passed to a privileged process, which configures BPF token delegation options and mounts a BPF filesystem. 3. SELinux records the `creator_sid` of the privileged process during mount setup. 4. The unprivileged process then uses this BPF fs mount to create a token and attach it to subsequent BPF syscalls. 5. During verification of `map_create` and `prog_load`, SELinux uses `creator_sid` and the current SID to check policy permissions via: avc_has_perm(creator_sid, current_sid, SECCLASS_BPF, BPF__MAP_CREATE, NULL); The implementation introduces two new permissions: - map_create_as - prog_load_as At token creation time, SELinux verifies that the current process has the appropriate `*_as` permission (depending on the `allowed_cmds` value in the bpf_token) to act on behalf of the `creator_sid`. Example SELinux policy: allow test_bpf_t self:bpf { map_create map_read map_write prog_load prog_run map_create_as prog_load_as }; Additionally, a new policy capability bpf_token_perms is added to ensure backward compatibility. If disabled, previous behavior ((checks based on current process SID)) is preserved. Signed-off-by: Eric Suen <ericsu@linux.microsoft.com> Tested-by: Daniel Durning <danieldurning.work@gmail.com> Reviewed-by: Daniel Durning <danieldurning.work@gmail.com> [PM: merge fuzz, subject tweaks, whitespace tweaks, line length tweaks] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-12-03Merge tag 'selinux-pr-20251201' of ↵Linus Torvalds5-0/+56
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Improve the granularity of SELinux labeling for memfd files Currently when creating a memfd file, SELinux treats it the same as any other tmpfs, or hugetlbfs, file. While simple, the drawback is that it is not possible to differentiate between memfd and tmpfs files. This adds a call to the security_inode_init_security_anon() LSM hook and wires up SELinux to provide a set of memfd specific access controls, including the ability to control the execution of memfds. As usual, the commit message has more information. - Improve the SELinux AVC lookup performance Adopt MurmurHash3 for the SELinux AVC hash function instead of the custom hash function currently used. MurmurHash3 is already used for the SELinux access vector table so the impact to the code is minimal, and performance tests have shown improvements in both hash distribution and latency. See the commit message for the performance measurments. - Introduce a Kconfig option for the SELinux AVC bucket/slot size While we have the ability to grow the number of AVC hash buckets today, the size of the buckets (slot size) is fixed at 512. This pull request makes that slot size configurable at build time through a new Kconfig knob, CONFIG_SECURITY_SELINUX_AVC_HASH_BITS. * tag 'selinux-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: improve bucket distribution uniformity of avc_hash() selinux: Move avtab_hash() to a shared location for future reuse selinux: Introduce a new config to make avc cache slot size adjustable memfd,selinux: call security_inode_init_security_anon()
2025-12-03Merge tag 'lsm-pr-20251201' of ↵Linus Torvalds2-0/+28
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM updates from Paul Moore: - Rework the LSM initialization code What started as a "quick" patch to enable a notification event once all of the individual LSMs were initialized, snowballed a bit into a 30+ patch patchset when everything was done. Most of the patches, and diffstat, is due to splitting out the initialization code into security/lsm_init.c and cleaning up some of the mess that was there. While not strictly necessary, it does cleanup the code signficantly, and hopefully makes the upkeep a bit easier in the future. Aside from the new LSM_STARTED_ALL notification, these changes also ensure that individual LSM initcalls are only called when the LSM is enabled at boot time. There should be a minor reduction in boot times for those who build multiple LSMs into their kernels, but only enable a subset at boot. It is worth mentioning that nothing at present makes use of the LSM_STARTED_ALL notification, but there is work in progress which is dependent upon LSM_STARTED_ALL. - Make better use of the seq_put*() helpers in device_cgroup * tag 'lsm-pr-20251201' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: (36 commits) lsm: use unrcu_pointer() for current->cred in security_init() device_cgroup: Refactor devcgroup_seq_show to use seq_put* helpers lsm: add a LSM_STARTED_ALL notification event lsm: consolidate all of the LSM framework initcalls selinux: move initcalls to the LSM framework ima,evm: move initcalls to the LSM framework lockdown: move initcalls to the LSM framework apparmor: move initcalls to the LSM framework safesetid: move initcalls to the LSM framework tomoyo: move initcalls to the LSM framework smack: move initcalls to the LSM framework ipe: move initcalls to the LSM framework loadpin: move initcalls to the LSM framework lsm: introduce an initcall mechanism into the LSM framework lsm: group lsm_order_parse() with the other lsm_order_*() functions lsm: output available LSMs when debugging lsm: cleanup the debug and console output in lsm_init.c lsm: add/tweak function header comment blocks in lsm_init.c lsm: fold lsm_init_ordered() into security_init() lsm: cleanup initialize_lsm() and rename to lsm_init_single() ...
2025-11-20selinux: rename the cred_security_struct variables to "crsec"Paul Moore1-2/+2
Along with the renaming from task_security_struct to cred_security_struct, rename the local variables to "crsec" from "tsec". This both fits with existing conventions and helps distinguish between task and cred related variables. No functional changes. Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20selinux: move avdcache to per-task security structStephen Smalley1-2/+12
The avdcache is meant to be per-task; move it to a new task_security_struct that is duplicated per-task. Cc: stable@vger.kernel.org Fixes: 5d7ddc59b3d89b724a5aa8f30d0db94ff8d2d93f ("selinux: reduce path walk overhead") Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: line length fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-11-20selinux: rename task_security_struct to cred_security_structStephen Smalley1-4/+4
Before Linux had cred structures, the SELinux task_security_struct was per-task and although the structure was switched to being per-cred long ago, the name was never updated. This change renames it to cred_security_struct to avoid confusion and pave the way for the introduction of an actual per-task security structure for SELinux. No functional change. Cc: stable@vger.kernel.org Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23selinux: improve bucket distribution uniformity of avc_hash()Hongru Zhang1-5/+6
Reuse the already implemented MurmurHash3 algorithm. Under heavy stress testing (on an 8-core system sustaining over 50,000 authentication events per second), sample once per second and take the mean of 1800 samples: 1. Bucket utilization rate and length of longest chain +--------------------------+-----------------------------------------+ | | bucket utilization rate / longest chain | | +--------------------+--------------------+ | | no-patch | with-patch | +--------------------------+--------------------+--------------------+ | 512 nodes, 512 buckets | 52.5%/7.5 | 60.2%/5.7 | +--------------------------+--------------------+--------------------+ | 1024 nodes, 512 buckets | 68.9%/12.1 | 80.2%/9.7 | +--------------------------+--------------------+--------------------+ | 2048 nodes, 512 buckets | 83.7%/19.4 | 93.4%/16.3 | +--------------------------+--------------------+--------------------+ | 8192 nodes, 8192 buckets | 49.5%/11.4 | 60.3%/7.4 | +--------------------------+--------------------+--------------------+ 2. avc_search_node latency (total latency of hash operation and table lookup) +--------------------------+-----------------------------------------+ | | latency of function avc_search_node | | +--------------------+--------------------+ | | no-patch | with-patch | +--------------------------+--------------------+--------------------+ | 512 nodes, 512 buckets | 87ns | 84ns | +--------------------------+--------------------+--------------------+ | 1024 nodes, 512 buckets | 97ns | 96ns | +--------------------------+--------------------+--------------------+ | 2048 nodes, 512 buckets | 118ns | 113ns | +--------------------------+--------------------+--------------------+ | 8192 nodes, 8192 buckets | 106ns | 99ns | +--------------------------+--------------------+--------------------+ Although MurmurHash3 has higher overhead than the bitwise operations in the original algorithm, the data shows that the MurmurHash3 achieves better distribution, reducing average lookup time. Consequently, the total latency of hashing and table lookup is lower than before. Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com> [PM: whitespace fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-23selinux: Move avtab_hash() to a shared location for future reuseHongru Zhang1-0/+46
This is a preparation patch, no functional change. Signed-off-by: Hongru Zhang <zhanghongru@xiaomi.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22memfd,selinux: call security_inode_init_security_anon()Thiébaud Weksteen4-0/+9
Prior to this change, no security hooks were called at the creation of a memfd file. It means that, for SELinux as an example, it will receive the default type of the filesystem that backs the in-memory inode. In most cases, that would be tmpfs, but if MFD_HUGETLB is passed, it will be hugetlbfs. Both can be considered implementation details of memfd. It also means that it is not possible to differentiate between a file coming from memfd_create and a file coming from a standard tmpfs mount point. Additionally, no permission is validated at creation, which differs from the similar memfd_secret syscall. Call security_inode_init_security_anon during creation. This ensures that the file is setup similarly to other anonymous inodes. On SELinux, it means that the file will receive the security context of its task. The ability to limit fexecve on memfd has been of interest to avoid potential pitfalls where /proc/self/exe or similar would be executed [1][2]. Reuse the "execute_no_trans" and "entrypoint" access vectors, similarly to the file class. These access vectors may not make sense for the existing "anon_inode" class. Therefore, define and assign a new class "memfd_file" to support such access vectors. Guard these changes behind a new policy capability named "memfd_class". [1] https://crbug.com/1305267 [2] https://lore.kernel.org/lkml/20221215001205.51969-1-jeffxu@google.com/ Signed-off-by: Thiébaud Weksteen <tweek@google.com> Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Acked-by: Hugh Dickins <hughd@google.com> [PM: subj tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-10-22selinux: move initcalls to the LSM frameworkPaul Moore2-0/+28
SELinux currently has a number of initcalls so we've created a new function, selinux_initcall(), which wraps all of these initcalls so that we have a single initcall function that can be registered with the LSM framework. Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-09-30Merge tag 'lsm-pr-20250926' of ↵Linus Torvalds1-0/+20
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Move the management of the LSM BPF security blobs into the framework In order to enable multiple LSMs we need to allocate and free the various security blobs in the LSM framework and not the individual LSMs as they would end up stepping all over each other. - Leverage the lsm_bdev_alloc() helper in lsm_bdev_alloc() Make better use of our existing helper functions to reduce some code duplication. - Update the Rust cred code to use 'sync::aref' Part of a larger effort to move the Rust code over to the 'sync' module. - Make CONFIG_LSM dependent on CONFIG_SECURITY As the CONFIG_LSM Kconfig setting is an ordered list of the LSMs to enable a boot, it obviously doesn't make much sense to enable this when CONFIG_SECURITY is disabled. - Update the LSM and CREDENTIALS sections in MAINTAINERS with Rusty bits Add the Rust helper files to the associated LSM and CREDENTIALS entries int the MAINTAINERS file. We're trying to improve the communication between the two groups and making sure we're all aware of what is going on via cross-posting to the relevant lists is a good way to start. * tag 'lsm-pr-20250926' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: lsm: CONFIG_LSM can depend on CONFIG_SECURITY MAINTAINERS: add the associated Rust helper to the CREDENTIALS section MAINTAINERS: add the associated Rust helper to the LSM section rust,cred: update AlwaysRefCounted import to sync::aref security: use umax() to improve code lsm,selinux: Add LSM blob support for BPF objects lsm: use lsm_blob_alloc() in lsm_bdev_alloc()
2025-09-07selinux: enable per-file labeling for functionfsNeill Kapron3-0/+8
This patch adds support for genfscon per-file labeling of functionfs files as well as support for userspace to apply labels after new functionfs endpoints are created. This allows for separate labels and therefore access control on a per-endpoint basis. An example use case would be for the default endpoint EP0 used as a restricted control endpoint, and additional usb endpoints to be used by other more permissive domains. It should be noted that if there are multiple functionfs mounts on a system, genfs file labels will apply to all mounts, and therefore will not likely be as useful as the userspace relabeling portion of this patch - the addition to selinux_is_genfs_special_handling(). This patch introduces the functionfs_seclabel policycap to maintain existing functionfs genfscon behavior unless explicitly enabled. Signed-off-by: Neill Kapron <nkapron@google.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: trim changelog, apply boolean logic fixup] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11lsm,selinux: Add LSM blob support for BPF objectsBlaise Boscaccy1-0/+20
This patch introduces LSM blob support for BPF maps, programs, and tokens to enable LSM stacking and multiplexing of LSM modules that govern BPF objects. Additionally, the existing BPF hooks used by SELinux have been updated to utilize the new blob infrastructure, removing the assumption of exclusive ownership of the security pointer. Signed-off-by: Blaise Boscaccy <bboscaccy@linux.microsoft.com> [PM: dropped local variable init, style fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-08-11selinux: Remove unused function selinux_policycap_netif_wildcard()Yue Haibing1-6/+0
This is unused since commit a3d3043ef24a ("selinux: get netif_wildcard policycap from policy instead of cache"). Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19selinux: optimize selinux_inode_getattr/permission() based on ↵Stephen Smalley1-0/+8
neveraudit|permissive Extend the task avdcache to also cache whether the task SID is both permissive and neveraudit, and return immediately if so in both selinux_inode_getattr() and selinux_inode_permission(). The same approach could be applied to many of the hook functions although the avdcache would need to be updated for more than directory search checks in order for this optimization to be beneficial for checks on objects other than directories. To test, apply https://github.com/SELinuxProject/selinux/pull/473 to your selinux userspace, build and install libsepol, and use the following CIL policy module: $ cat neverauditpermissive.cil (typeneveraudit unconfined_t) (typepermissive unconfined_t) Without this module inserted, running the following commands: perf record make -jN # on an already built allmodconfig tree perf report --sort=symbol,dso yields the following percentages (only showing __d_lookup_rcu for reference and only showing relevant SELinux functions): 1.65% [k] __d_lookup_rcu 0.53% [k] selinux_inode_permission 0.40% [k] selinux_inode_getattr 0.15% [k] avc_lookup 0.05% [k] avc_has_perm 0.05% [k] avc_has_perm_noaudit 0.02% [k] avc_policy_seqno 0.02% [k] selinux_file_permission 0.01% [k] selinux_inode_alloc_security 0.01% [k] selinux_file_alloc_security for a total of 1.24% for SELinux compared to 1.65% for __d_lookup_rcu(). After running the following command to insert this module: semodule -i neverauditpermissive.cil and then re-running the same perf commands from above yields the following non-zero percentages: 1.74% [k] __d_lookup_rcu 0.31% [k] selinux_inode_permission 0.03% [k] selinux_inode_getattr 0.03% [k] avc_policy_seqno 0.01% [k] avc_lookup 0.01% [k] selinux_file_permission 0.01% [k] selinux_file_open for a total of 0.40% for SELinux compared to 1.74% for __d_lookup_rcu(). Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-06-19selinux: introduce neveraudit typesStephen Smalley2-1/+7
Introduce neveraudit types i.e. types that should never trigger audit messages. This allows the AVC to skip all audit-related processing for such types. Note that neveraudit differs from dontaudit not only wrt being applied for all checks with a given source type but also in that it disables all auditing, not just permission denials. When a type is both a permissive type and a neveraudit type, the security server can short-circuit the security_compute_av() logic, allowing all permissions and not auditing any permissions. This change just introduces the basic support but does not yet further optimize the AVC or hook function logic when a type is both a permissive type and a dontaudit type. Suggested-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-05-28Merge tag 'net-next-6.16' of ↵Linus Torvalds1-2/+0
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Paolo Abeni: "Core: - Implement the Device Memory TCP transmit path, allowing zero-copy data transmission on top of TCP from e.g. GPU memory to the wire. - Move all the IPv6 routing tables management outside the RTNL scope, under its own lock and RCU. The route control path is now 3x times faster. - Convert queue related netlink ops to instance lock, reducing again the scope of the RTNL lock. This improves the control plane scalability. - Refactor the software crc32c implementation, removing unneeded abstraction layers and improving significantly the related micro-benchmarks. - Optimize the GRO engine for UDP-tunneled traffic, for a 10% performance improvement in related stream tests. - Cover more per-CPU storage with local nested BH locking; this is a prep work to remove the current per-CPU lock in local_bh_disable() on PREMPT_RT. - Introduce and use nlmsg_payload helper, combining buffer bounds verification with accessing payload carried by netlink messages. Netfilter: - Rewrite the procfs conntrack table implementation, improving considerably the dump performance. A lot of user-space tools still use this interface. - Implement support for wildcard netdevice in netdev basechain and flowtables. - Integrate conntrack information into nft trace infrastructure. - Export set count and backend name to userspace, for better introspection. BPF: - BPF qdisc support: BPF-qdisc can be implemented with BPF struct_ops programs and can be controlled in similar way to traditional qdiscs using the "tc qdisc" command. - Refactor the UDP socket iterator, addressing long standing issues WRT duplicate hits or missed sockets. Protocols: - Improve TCP receive buffer auto-tuning and increase the default upper bound for the receive buffer; overall this improves the single flow maximum thoughput on 200Gbs link by over 60%. - Add AFS GSSAPI security class to AF_RXRPC; it provides transport security for connections to the AFS fileserver and VL server. - Improve TCP multipath routing, so that the sources address always matches the nexthop device. - Introduce SO_PASSRIGHTS for AF_UNIX, to allow disabling SCM_RIGHTS, and thus preventing DoS caused by passing around problematic FDs. - Retire DCCP socket. DCCP only receives updates for bugs, and major distros disable it by default. Its removal allows for better organisation of TCP fields to reduce the number of cache lines hit in the fast path. - Extend TCP drop-reason support to cover PAWS checks. Driver API: - Reorganize PTP ioctl flag support to require an explicit opt-in for the drivers, avoiding the problem of drivers not rejecting new unsupported flags. - Converted several device drivers to timestamping APIs. - Introduce per-PHY ethtool dump helpers, improving the support for dump operations targeting PHYs. Tests and tooling: - Add support for classic netlink in user space C codegen, so that ynl-c can now read, create and modify links, routes addresses and qdisc layer configuration. - Add ynl sub-types for binary attributes, allowing ynl-c to output known struct instead of raw binary data, clarifying the classic netlink output. - Extend MPTCP selftests to improve the code-coverage. - Add tests for XDP tail adjustment in AF_XDP. New hardware / drivers: - OpenVPN virtual driver: offload OpenVPN data channels processing to the kernel-space, increasing the data transfer throughput WRT the user-space implementation. - Renesas glue driver for the gigabit ethernet RZ/V2H(P) SoC. - Broadcom asp-v3.0 ethernet driver. - AMD Renoir ethernet device. - ReakTek MT9888 2.5G ethernet PHY driver. - Aeonsemi 10G C45 PHYs driver. Drivers: - Ethernet high-speed NICs: - nVidia/Mellanox (mlx5): - refactor the steering table handling to significantly reduce the amount of memory used - add support for complex matches in H/W flow steering - improve flow streeing error handling - convert to netdev instance locking - Intel (100G, ice, igb, ixgbe, idpf): - ice: add switchdev support for LLDP traffic over VF - ixgbe: add firmware manipulation and regions devlink support - igb: introduce support for frame transmission premption - igb: adds persistent NAPI configuration - idpf: introduce RDMA support - idpf: add initial PTP support - Meta (fbnic): - extend hardware stats coverage - add devlink dev flash support - Broadcom (bnxt): - add support for RX-side device memory TCP - Wangxun (txgbe): - implement support for udp tunnel offload - complete PTP and SRIOV support for AML 25G/10G devices - Ethernet NICs embedded and virtual: - Google (gve): - add device memory TCP TX support - Amazon (ena): - support persistent per-NAPI config - Airoha: - add H/W support for L2 traffic offload - add per flow stats for flow offloading - RealTek (rtl8211): add support for WoL magic packet - Synopsys (stmmac): - dwmac-socfpga 1000BaseX support - add Loongson-2K3000 support - introduce support for hardware-accelerated VLAN stripping - Broadcom (bcmgenet): - expose more H/W stats - Freescale (enetc, dpaa2-eth): - enetc: add MAC filter, VLAN filter RSS and loopback support - dpaa2-eth: convert to H/W timestamping APIs - vxlan: convert FDB table to rhashtable, for better scalabilty - veth: apply qdisc backpressure on full ring to reduce TX drops - Ethernet switches: - Microchip (kzZ88x3): add ETS scheduler support - Ethernet PHYs: - RealTek (rtl8211): - add support for WoL magic packet - add support for PHY LEDs - CAN: - Adds RZ/G3E CANFD support to the rcar_canfd driver. - Preparatory work for CAN-XL support. - Add self-tests framework with support for CAN physical interfaces. - WiFi: - mac80211: - scan improvements with multi-link operation (MLO) - Qualcomm (ath12k): - enable AHB support for IPQ5332 - add monitor interface support to QCN9274 - add multi-link operation support to WCN7850 - add 802.11d scan offload support to WCN7850 - monitor mode for WCN7850, better 6 GHz regulatory - Qualcomm (ath11k): - restore hibernation support - MediaTek (mt76): - WiFi-7 improvements - implement support for mt7990 - Intel (iwlwifi): - enhanced multi-link single-radio (EMLSR) support on 5 GHz links - rework device configuration - RealTek (rtw88): - improve throughput for RTL8814AU - RealTek (rtw89): - add multi-link operation support - STA/P2P concurrency improvements - support different SAR configs by antenna - Bluetooth: - introduce HCI Driver protocol - btintel_pcie: do not generate coredump for diagnostic events - btusb: add HCI Drv commands for configuring altsetting - btusb: add RTL8851BE device 0x0bda:0xb850 - btusb: add new VID/PID 13d3/3584 for MT7922 - btusb: add new VID/PID 13d3/3630 and 13d3/3613 for MT7925 - btnxpuart: implement host-wakeup feature" * tag 'net-next-6.16' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (1611 commits) selftests/bpf: Fix bpf selftest build warning selftests: netfilter: Fix skip of wildcard interface test net: phy: mscc: Stop clearing the the UDPv4 checksum for L2 frames net: openvswitch: Fix the dead loop of MPLS parse calipso: Don't call calipso functions for AF_INET sk. selftests/tc-testing: Add a test for HFSC eltree double add with reentrant enqueue behaviour on netem net_sched: hfsc: Address reentrant enqueue adding class to eltree twice octeontx2-pf: QOS: Refactor TC_HTB_LEAF_DEL_LAST callback octeontx2-pf: QOS: Perform cache sync on send queue teardown net: mana: Add support for Multi Vports on Bare metal net: devmem: ncdevmem: remove unused variable net: devmem: ksft: upgrade rx test to send 1K data net: devmem: ksft: add 5 tuple FS support net: devmem: ksft: add exit_wait to make rx test pass net: devmem: ksft: add ipv4 support net: devmem: preserve sockc_err page_pool: fix ugly page_pool formatting net: devmem: move list_add to net_devmem_bind_dmabuf. selftests: netfilter: nft_queue.sh: include file transfer duration in log message net: phy: mscc: Fix memory leak when using one step timestamping ...
2025-04-11net: Retire DCCP socket.Kuniyuki Iwashima1-2/+0
DCCP was orphaned in 2021 by commit 054c4610bd05 ("MAINTAINERS: dccp: move Gerrit Renker to CREDITS"), which noted that the last maintainer had been inactive for five years. In recent years, it has become a playground for syzbot, and most changes to DCCP have been odd bug fixes triggered by syzbot. Apart from that, the only changes have been driven by treewide or networking API updates or adjustments related to TCP. Thus, in 2023, we announced we would remove DCCP in 2025 via commit b144fcaf46d4 ("dccp: Print deprecation notice."). Since then, only one individual has contacted the netdev mailing list. [0] There is ongoing research for Multipath DCCP. The repository is hosted on GitHub [1], and development is not taking place through the upstream community. While the repository is published under the GPLv2 license, the scheduling part remains proprietary, with a LICENSE file [2] stating: "This is not Open Source software." The researcher mentioned a plan to address the licensing issue, upstream the patches, and step up as a maintainer, but there has been no further communication since then. Maintaining DCCP for a decade without any real users has become a burden. Therefore, it's time to remove it. Removing DCCP will also provide significant benefits to TCP. It allows us to freely reorganize the layout of struct inet_connection_sock, which is currently shared with DCCP, and optimize it to reduce the number of cachelines accessed in the TCP fast path. Note that we keep DCCP netfilter modules as requested. [3] Link: https://lore.kernel.org/netdev/20230710182253.81446-1-kuniyu@amazon.com/T/#u #[0] Link: https://github.com/telekom/mp-dccp #[1] Link: https://github.com/telekom/mp-dccp/blob/mpdccp_v03_k5.10/net/dccp/non_gpl_scheduler/LICENSE #[2] Link: https://lore.kernel.org/netdev/Z_VQ0KlCRkqYWXa-@calendula/ #[3] Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com> Acked-by: Paul Moore <paul@paul-moore.com> (LSM and SELinux) Acked-by: Casey Schaufler <casey@schaufler-ca.com> Link: https://patch.msgid.link/20250410023921.11307-3-kuniyu@amazon.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2025-04-11selinux: reduce path walk overheadPaul Moore1-0/+14
Reduce the SELinux performance overhead during path walks through the use of a per-task directory access cache and some minor code optimizations. The directory access cache is per-task because it allows for a lockless cache while also fitting well with a common application pattern of heavily accessing a relatively small number of SELinux directory labels. The cache is inherited by child processes when the child runs with the same SELinux domain as the parent, and invalidated on changes to the task's SELinux domain or the loaded SELinux policy. A cache of four entries was chosen based on testing with the Fedora "targeted" policy, a SELinux Reference Policy variant, and 'make allmodconfig' on Linux v6.14. Code optimizations include better use of inline functions to reduce function calls in the common case, especially in the inode revalidation code paths, and elimination of redundant checks between the LSM and SELinux layers. As mentioned briefly above, aside from general use and regression testing with the selinux-testsuite, performance was measured using 'make allmodconfig' with Linux v6.14 as a base reference. As expected, there were variations from one test run to another, but the measurements below are a good representation of the test results seen on my test system. * Linux v6.14 REF 1.26% [k] __d_lookup_rcu SELINUX (1.31%) 0.58% [k] selinux_inode_permission 0.29% [k] avc_lookup 0.25% [k] avc_has_perm_noaudit 0.19% [k] __inode_security_revalidate * Linux v6.14 + patch REF 1.41% [k] __d_lookup_rcu SELINUX (0.89%) 0.65% [k] selinux_inode_permission 0.15% [k] avc_lookup 0.05% [k] avc_has_perm_noaudit 0.04% [k] avc_policy_seqno X.XX% [k] __inode_security_revalidate (now inline) In both cases the __d_lookup_rcu() function was used as a reference point to establish a context for the SELinux related functions. On a unpatched Linux v6.14 system we see the time spent in the combined SELinux functions exceeded that of __d_lookup_rcu(), 1.31% compared to 1.26%. However, with this patch applied the time spent in the combined SELinux functions dropped to roughly 65% of the time spent in __d_lookup_rcu(), 0.89% compared to 1.41%. Aside from the significant decrease in time spent in the SELinux AVC, it appears that any additional time spent searching and updating the cache is offset by other code improvements, e.g. time spent in selinux_inode_permission() + __inode_security_revalidate() + avc_policy_seqno() is less on the patched kernel than the unpatched kernel. It is worth noting that in this patch the use of the per-task cache is limited to the security_inode_permission() LSM callback, selinux_inode_permission(), but future work could expand the cache into inode_has_perm(), likely through consolidation of the two functions. While this would likely have little to no impact on path walks, it may benefit other operations. Reviewed-by: Stephen Smalley <stephen.smalley.work@gmail.com> Tested-by: Stephen Smalley <stephen.smalley.work@gmail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-11selinux: support wildcard match in genfsconTakaya Saeki2-0/+2
Currently, genfscon only supports string prefix match to label files. Thus, labeling numerous dynamic sysfs entries requires many specific path rules. For example, labeling device paths such as `/sys/devices/pci0000:00/0000:00:03.1/<...>/0000:04:00.1/wakeup` requires listing all specific PCI paths, which is challenging to maintain. While user-space restorecon can handle these paths with regular expression rules, relabeling thousands of paths under sysfs after it is mounted is inefficient compared to using genfscon. This commit adds wildcard matching to genfscon to make rules more efficient and expressive. This new behavior is enabled by genfs_seclabel_wildcard capability. With this capability, genfscon does wildcard matching instead of prefix matching. When multiple wildcard rules match against a path, then the longest rule (determined by the length of the rule string) will be applied. If multiple rules of the same length match, the first matching rule encountered in the given genfscon policy will be applied. Users are encouraged to write longer, more explicit path rules to avoid relying on this behavior. This change resulted in nice real-world performance improvements. For example, boot times on test Android devices were reduced by 15%. This improvement is due to the elimination of the "restorecon -R /sys" step during boot, which takes more than two seconds in the worst case. Signed-off-by: Takaya Saeki <takayas@chromium.org> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-11selinux: contify network namespace pointerChristian Göttsche1-1/+1
The network namespace is not modified. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-04-11selinux: constify network address pointerChristian Göttsche2-2/+2
The network address, either an IPv4 or IPv6 one, is not modified. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-03-25Merge tag 'selinux-pr-20250323' of ↵Linus Torvalds4-2/+12
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Add additional SELinux access controls for kernel file reads/loads The SELinux kernel file read/load access controls were never updated beyond the initial kernel module support, this pull request adds support for firmware, kexec, policies, and x.509 certificates. - Add support for wildcards in network interface names There are a number of userspace tools which auto-generate network interface names using some pattern of <XXXX>-<NN> where <XXXX> is a fixed string, e.g. "podman", and <NN> is a increasing counter. Supporting wildcards in the SELinux policy for network interfaces simplifies the policy associted with these interfaces. - Fix a potential problem in the kernel read file SELinux code SELinux should always check the file label in the security_kernel_read_file() LSM hook, regardless of if the file is being read in chunks. Unfortunately, the existing code only considered the file label on the first chunk; this pull request fixes this problem. There is more detail in the individual commit, but thankfully the existing code didn't expose a bug due to multi-stage reads only taking place in one driver, and that driver loading a file type that isn't targeted by the SELinux policy. - Fix the subshell error handling in the example policy loader Minor fix to SELinux example policy loader in scripts/selinux due to an undesired interaction with subshells and errexit. * tag 'selinux-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: get netif_wildcard policycap from policy instead of cache selinux: support wildcard network interface names selinux: Chain up tool resolving errors in install_policy.sh selinux: add permission checks for loading other kinds of kernel files selinux: always check the file label in selinux_kernel_read_file() selinux: fix spelling error
2025-03-25Merge tag 'lsm-pr-20250323' of ↵Linus Torvalds1-1/+1
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull lsm updates from Paul Moore: - Various minor updates to the LSM Rust bindings Changes include marking trivial Rust bindings as inlines and comment tweaks to better reflect the LSM hooks. - Add LSM/SELinux access controls to io_uring_allowed() Similar to the io_uring_disabled sysctl, add a LSM hook to io_uring_allowed() to enable LSMs a simple way to enforce security policy on the use of io_uring. This pull request includes SELinux support for this new control using the io_uring/allowed permission. - Remove an unused parameter from the security_perf_event_open() hook The perf_event_attr struct parameter was not used by any currently supported LSMs, remove it from the hook. - Add an explicit MAINTAINERS entry for the credentials code We've seen problems in the past where patches to the credentials code sent by non-maintainers would often languish on the lists for multiple months as there was no one explicitly tasked with the responsibility of reviewing and/or merging credentials related code. Considering that most of the code under security/ has a vested interest in ensuring that the credentials code is well maintained, I'm volunteering to look after the credentials code and Serge Hallyn has also volunteered to step up as an official reviewer. I posted the MAINTAINERS update as a RFC to LKML in hopes that someone else would jump up with an "I'll do it!", but beyond Serge it was all crickets. - Update Stephen Smalley's old email address to prevent confusion This includes a corresponding update to the mailmap file. * tag 'lsm-pr-20250323' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: mailmap: map Stephen Smalley's old email addresses lsm: remove old email address for Stephen Smalley MAINTAINERS: add Serge Hallyn as a credentials reviewer MAINTAINERS: add an explicit credentials entry cred,rust: mark Credential methods inline lsm,rust: reword "destroy" -> "release" in SecurityCtx lsm,rust: mark SecurityCtx methods inline perf: Remove unnecessary parameter of security check lsm: fix a missing security_uring_allowed() prototype io_uring,lsm,selinux: add LSM hooks for io_uring_setup() io_uring: refactor io_uring_allowed()
2025-03-07selinux: support wildcard network interface namesChristian Göttsche3-1/+9
Add support for wildcard matching of network interface names. This is useful for auto-generated interfaces, for example podman creates network interfaces for containers with the naming scheme podman0, podman1, podman2, ... To maintain backward compatibility guard this feature with a new policy capability 'netif_wildcard'. Netifcon definitions are compared against in the order given by the policy, so userspace tools should sort them in a reasonable order. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-27selinux: add FILE__WATCH_MOUNTNSMiklos Szeredi1-1/+1
Watching mount namespaces for changes (mount, umount, move mount) was added by previous patches. This patch adds the file/watch_mountns permission that can be applied to nsfs files (/proc/$$/ns/mnt), making it possible to allow or deny watching a particular namespace for changes. Suggested-by: Paul Moore <paul@paul-moore.com> Link: https://lore.kernel.org/all/CAHC9VhTOmCjCSE2H0zwPOmpFopheexVb6jyovz92ZtpKtoVv6A@mail.gmail.com/ Signed-off-by: Miklos Szeredi <mszeredi@redhat.com> Link: https://lore.kernel.org/r/20250224154836.958915-1-mszeredi@redhat.com Acked-by: Paul Moore <paul@paul-moore.com> Signed-off-by: Christian Brauner <brauner@kernel.org>
2025-02-26selinux: add permission checks for loading other kinds of kernel files"Kipp N. Davis"1-1/+3
Although the LSM hooks for loading kernel modules were later generalized to cover loading other kinds of files, SELinux didn't implement corresponding permission checks, leaving only the module case covered. Define and add new permission checks for these other cases. Signed-off-by: Cameron K. Williams <ckwilliams.work@gmail.com> Signed-off-by: Kipp N. Davis <kippndavis.work@gmx.com> Acked-by: Stephen Smalley <stephen.smalley.work@gmail.com> [PM: merge fuzz, line length, and spacing fixes] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-02-07io_uring,lsm,selinux: add LSM hooks for io_uring_setup()Hamza Mahfooz1-1/+1
It is desirable to allow LSM to configure accessibility to io_uring because it is a coarse yet very simple way to restrict access to it. So, add an LSM for io_uring_allowed() to guard access to io_uring. Cc: Paul Moore <paul@paul-moore.com> Signed-off-by: Hamza Mahfooz <hamzamahfooz@linux.microsoft.com> Acked-by: Jens Axboe <axboe@kernel.dk> [PM: merge fuzz due to changes in preceding patches, subj tweak] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-01-21Merge tag 'selinux-pr-20250121' of ↵Linus Torvalds3-5/+6
git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux Pull selinux updates from Paul Moore: - Extended permissions supported in conditional policy The SELinux extended permissions, aka "xperms", allow security admins to target individuals ioctls, and recently netlink messages, with their SELinux policy. Adding support for conditional policies allows admins to toggle the granular xperms using SELinux booleans, helping pave the way for greater use of xperms in general purpose SELinux policies. This change bumps the maximum SELinux policy version to 34. - Fix a SCTP/SELinux error return code inconsistency Depending on the loaded SELinux policy, specifically it's EXTSOCKCLASS support, the bind(2) LSM/SELinux hook could return different error codes due to the SELinux code checking the socket's SELinux object class (which can vary depending on EXTSOCKCLASS) and not the socket's sk_protocol field. We fix this by doing the obvious, and looking at the sock->sk_protocol field instead of the object class. - Makefile fixes to properly cleanup av_permissions.h Add av_permissions.h to "targets" so that it is properly cleaned up using the kbuild infrastructure. - A number of smaller improvements by Christian Göttsche A variety of straightforward changes to reduce code duplication, reduce pointer lookups, migrate void pointers to defined types, simplify code, constify function parameters, and correct iterator types. * tag 'selinux-pr-20250121' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/selinux: selinux: make more use of str_read() when loading the policy selinux: avoid unnecessary indirection in struct level_datum selinux: use known type instead of void pointer selinux: rename comparison functions for clarity selinux: rework match_ipv6_addrmask() selinux: constify and reconcile function parameter names selinux: avoid using types indicating user space interaction selinux: supply missing field initializers selinux: add netlink nlmsg_type audit message selinux: add support for xperms in conditional policies selinux: Fix SCTP error inconsistency in selinux_socket_bind() selinux: use native iterator types selinux: add generated av_permissions.h to targets
2025-01-07selinux: constify and reconcile function parameter namesChristian Göttsche2-3/+3
Align the parameter names between declarations and definitions, and constify read-only parameters. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: tweak the subject line] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-01-07selinux: supply missing field initializersChristian Göttsche1-1/+1
Please clang by supplying the missing field initializers in the secclass_map variable and sel_fill_super() function. Signed-off-by: Christian Göttsche <cgzones@googlemail.com> [PM: tweak subj and commit description] Signed-off-by: Paul Moore <paul@paul-moore.com>
2025-01-04selinux: match extended permissions to their base permissionsThiébaud Weksteen2-1/+7
In commit d1d991efaf34 ("selinux: Add netlink xperm support") a new extended permission was added ("nlmsg"). This was the second extended permission implemented in selinux ("ioctl" being the first one). Extended permissions are associated with a base permission. It was found that, in the access vector cache (avc), the extended permission did not keep track of its base permission. This is an issue for a domain that is using both extended permissions (i.e., a domain calling ioctl() on a netlink socket). In this case, the extended permissions were overlapping. Keep track of the base permission in the cache. A new field "base_perm" is added to struct extended_perms_decision to make sure that the extended permission refers to the correct policy permission. A new field "base_perms" is added to struct extended_perms to quickly decide if extended permissions apply. While it is in theory possible to retrieve the base permission from the access vector, the same base permission may not be mapped to the same bit for each class (e.g., "nlmsg" is mapped to a different bit for "netlink_route_socket" and "netlink_audit_socket"). Instead, use a constant (AVC_EXT_IOCTL or AVC_EXT_NLMSG) provided by the caller. Fixes: d1d991efaf34 ("selinux: Add netlink xperm support") Signed-off-by: Thiébaud Weksteen <tweek@google.com> Signed-off-by: Paul Moore <paul@paul-moore.com>
2024-12-13selinux: add support for xperms in conditional policiesChristian Göttsche1-1/+2
Add support for extended permission rules in conditional policies. Currently the kernel accepts such rules already, but evaluating a security decision