| Age | Commit message (Collapse) | Author | Files | Lines |
|
Pull Rust fixes from Miguel Ojeda:
- Soundness: make internal functions generated by the 'module!' macro
inaccessible, do not implement 'Zeroable' for 'Infallible' and
require 'Send' for the 'Module' trait.
- Build: avoid errors with "empty" files and workaround 'rustdoc' ICE.
- Kconfig: depend on '!CFI_CLANG' and avoid selecting 'CONSTRUCTORS'.
- Code docs: remove non-existing key from 'module!' macro example.
- Docs: trivial rendering fix in arch table.
* tag 'rust-fixes-6.9' of https://github.com/Rust-for-Linux/linux:
rust: remove `params` from `module` macro example
kbuild: rust: force `alloc` extern to allow "empty" Rust files
kbuild: rust: remove unneeded `@rustc_cfg` to avoid ICE
rust: kernel: require `Send` for `Module` implementations
rust: phy: implement `Send` for `Registration`
rust: make mutually exclusive with CFI_CLANG
rust: macros: fix soundness issue in `module!` macro
rust: init: remove impl Zeroable for Infallible
docs: rust: fix improper rendering in Arch Support page
rust: don't select CONSTRUCTORS
|
|
On RISC-V and arm64, and presumably x86, if CFI_CLANG is enabled,
loading a rust module will trigger a kernel panic. Support for
sanitisers, including kcfi (CFI_CLANG), is in the works, but for now
they're nightly-only options in rustc. Make RUST depend on !CFI_CLANG
to prevent configuring a kernel without symmetrical support for kfi.
[ Matthew Maurer writes [1]:
This patch is fine by me - the last patch needed for KCFI to be
functional in Rust just landed upstream last night, so we should
revisit this (in the form of enabling it) once we move to
`rustc-1.79.0` or later.
Ramon de C Valle also gave feedback [2] on the status of KCFI for
Rust and created a tracking issue [3] in upstream Rust. - Miguel ]
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support")
Cc: stable@vger.kernel.org
Signed-off-by: Conor Dooley <conor.dooley@microchip.com>
Acked-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/rust-for-linux/CAGSQo024u1gHJgzsO38Xg3c4or+JupoPABQx_+0BLEpPg0cOEA@mail.gmail.com/ [1]
Link: https://lore.kernel.org/rust-for-linux/CAOcBZOS2kPyH0Dm7Fuh4GC3=v7nZhyzBj_-dKu3PfAnrHZvaxg@mail.gmail.com/ [2]
Link: https://github.com/rust-lang/rust/issues/123479 [3]
Link: https://lore.kernel.org/r/20240404-providing-emporium-e652e359c711@spud
[ Added feedback from the list, links, and used Cc for the tag. ]
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
We allocate memory of size 'xlen + strlen(boot_command_line) + 1' for
static_command_line, but the strings copied into static_command_line are
extra_command_line and command_line, rather than extra_command_line and
boot_command_line.
When strlen(command_line) > strlen(boot_command_line), static_command_line
will overflow.
This patch just recovers strlen(command_line) which was miss-consolidated
with strlen(boot_command_line) in the commit f5c7310ac73e ("init/main: add
checks for the return value of memblock_alloc*()")
Link: https://lore.kernel.org/all/20240412081733.35925-2-ytcoode@gmail.com/
Fixes: f5c7310ac73e ("init/main: add checks for the return value of memblock_alloc*()")
Cc: stable@vger.kernel.org
Signed-off-by: Yuntao Wang <ytcoode@gmail.com>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
If the "bootconfig" kernel command-line argument was specified or if
the kernel was built with CONFIG_BOOT_CONFIG_FORCE, but if there are
no embedded kernel parameter, omit the "# Parameters from bootloader:"
comment from the /proc/bootconfig file. This will cause automation
to fall back to the /proc/cmdline file, which will be identical to the
comment in this no-embedded-kernel-parameters case.
Link: https://lore.kernel.org/all/20240409044358.1156477-2-paulmck@kernel.org/
Fixes: 8b8ce6c75430 ("fs/proc: remove redundant comments from /proc/bootconfig")
Signed-off-by: Masami Hiramatsu <mhiramat@kernel.org>
Signed-off-by: Paul E. McKenney <paulmck@kernel.org>
Cc: stable@vger.kernel.org
Acked-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Signed-off-by: Masami Hiramatsu (Google) <mhiramat@kernel.org>
|
|
If a member of a cpio archive for an initrd or initrams is larger than
2Gb, we'll eventually fail to write to that file when we get to that
limit, unless O_LARGEFILE is set.
The problem can be seen with this recipe, assuming that BLK_DEV_RAM
is not configured:
cd /tmp
dd if=/dev/zero of=BIGFILE bs=1048576 count=2200
echo BIGFILE | cpio -o -H newc -R root:root > initrd.img
kexec -l /boot/vmlinuz-$(uname -r) --initrd=initrd.img --reuse-cmdline
kexec -e
The console will show 'Initramfs unpacking failed: write error'. With
the patch, the error is gone.
Link: https://lkml.kernel.org/r/20240323152934.3307391-1-jsperbeck@google.com
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
This was originally part of commit 4b9a68f2e59a0 ("rust: add support for
static synchronisation primitives") from the old Rust branch, which used
module constructors to initialize globals containing various
synchronisation primitives with pin-init. That commit has never been
upstreamed, but the `select CONSTRUCTORS` statement ended up being
included in the patch that initially added Rust support to the Linux
Kernel.
We are not using module constructors, so let's remove the select.
Signed-off-by: Alice Ryhl <aliceryhl@google.com>
Reviewed-by: Benno Lossin <benno.lossin@proton.me>
Cc: stable@vger.kernel.org
Fixes: 2f7ab1267dc9 ("Kbuild: add Rust support")
Link: https://lore.kernel.org/r/20240308-constructors-v1-1-4c811342391c@google.com
Signed-off-by: Miguel Ojeda <ojeda@kernel.org>
|
|
If initrd data is larger than 2Gb, we'll eventually fail to write to the
/initrd.image file when we hit that limit, unless O_LARGEFILE is set.
Link: https://lkml.kernel.org/r/20240317221522.896040-1-jsperbeck@google.com
Signed-off-by: John Sperbeck <jsperbeck@google.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux
Pull RISC-V updates from Palmer Dabbelt:
- Support for various vector-accelerated crypto routines
- Hibernation is now enabled for portable kernel builds
- mmap_rnd_bits_max is larger on systems with larger VAs
- Support for fast GUP
- Support for membarrier-based instruction cache synchronization
- Support for the Andes hart-level interrupt controller and PMU
- Some cleanups around unaligned access speed probing and Kconfig
settings
- Support for ACPI LPI and CPPC
- Various cleanus related to barriers
- A handful of fixes
* tag 'riscv-for-linus-6.9-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (66 commits)
riscv: Fix syscall wrapper for >word-size arguments
crypto: riscv - add vector crypto accelerated AES-CBC-CTS
crypto: riscv - parallelize AES-CBC decryption
riscv: Only flush the mm icache when setting an exec pte
riscv: Use kcalloc() instead of kzalloc()
riscv/barrier: Add missing space after ','
riscv/barrier: Consolidate fence definitions
riscv/barrier: Define RISCV_FULL_BARRIER
riscv/barrier: Define __{mb,rmb,wmb}
RISC-V: defconfig: Enable CONFIG_ACPI_CPPC_CPUFREQ
cpufreq: Move CPPC configs to common Kconfig and add RISC-V
ACPI: RISC-V: Add CPPC driver
ACPI: Enable ACPI_PROCESSOR for RISC-V
ACPI: RISC-V: Add LPI driver
cpuidle: RISC-V: Move few functions to arch/riscv
riscv: Introduce set_compat_task() in asm/compat.h
riscv: Introduce is_compat_thread() into compat.h
riscv: add compile-time test into is_compat_task()
riscv: Replace direct thread flag check with is_compat_task()
riscv: Improve arch_get_mmap_end() macro
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild
Pull Kbuild updates from Masahiro Yamada:
- Generate a list of built DTB files (arch/*/boot/dts/dtbs-list)
- Use more threads when building Debian packages in parallel
- Fix warnings shown during the RPM kernel package uninstallation
- Change OBJECT_FILES_NON_STANDARD_*.o etc. to take a relative path to
Makefile
- Support GCC's -fmin-function-alignment flag
- Fix a null pointer dereference bug in modpost
- Add the DTB support to the RPM package
- Various fixes and cleanups in Kconfig
* tag 'kbuild-v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: (67 commits)
kconfig: tests: test dependency after shuffling choices
kconfig: tests: add a test for randconfig with dependent choices
kconfig: tests: support KCONFIG_SEED for the randconfig runner
kbuild: rpm-pkg: add dtb files in kernel rpm
kconfig: remove unneeded menu_is_visible() call in conf_write_defconfig()
kconfig: check prompt for choice while parsing
kconfig: lxdialog: remove unused dialog colors
kconfig: lxdialog: fix button color for blackbg theme
modpost: fix null pointer dereference
kbuild: remove GCC's default -Wpacked-bitfield-compat flag
kbuild: unexport abs_srctree and abs_objtree
kbuild: Move -Wenum-{compare-conditional,enum-conversion} into W=1
kconfig: remove named choice support
kconfig: use linked list in get_symbol_str() to iterate over menus
kconfig: link menus to a symbol
kbuild: fix inconsistent indentation in top Makefile
kbuild: Use -fmin-function-alignment when available
alpha: merge two entries for CONFIG_ALPHA_GAMMA
alpha: merge two entries for CONFIG_ALPHA_EV4
kbuild: change DTC_FLAGS_<basetarget>.o to take the path relative to $(obj)
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull non-MM updates from Andrew Morton:
- Kuan-Wei Chiu has developed the well-named series "lib min_heap: Min
heap optimizations".
- Kuan-Wei Chiu has also sped up the library sorting code in the series
"lib/sort: Optimize the number of swaps and comparisons".
- Alexey Gladkov has added the ability for code running within an IPC
namespace to alter its IPC and MQ limits. The series is "Allow to
change ipc/mq sysctls inside ipc namespace".
- Geert Uytterhoeven has contributed some dhrystone maintenance work in
the series "lib: dhry: miscellaneous cleanups".
- Ryusuke Konishi continues nilfs2 maintenance work in the series
"nilfs2: eliminate kmap and kmap_atomic calls"
"nilfs2: fix kernel bug at submit_bh_wbc()"
- Nathan Chancellor has updated our build tools requirements in the
series "Bump the minimum supported version of LLVM to 13.0.1".
- Muhammad Usama Anjum continues with the selftests maintenance work in
the series "selftests/mm: Improve run_vmtests.sh".
- Oleg Nesterov has done some maintenance work against the signal code
in the series "get_signal: minor cleanups and fix".
Plus the usual shower of singleton patches in various parts of the tree.
Please see the individual changelogs for details.
* tag 'mm-nonmm-stable-2024-03-14-09-36' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (77 commits)
nilfs2: prevent kernel bug at submit_bh_wbc()
nilfs2: fix failure to detect DAT corruption in btree and direct mappings
ocfs2: enable ocfs2_listxattr for special files
ocfs2: remove SLAB_MEM_SPREAD flag usage
assoc_array: fix the return value in assoc_array_insert_mid_shortcut()
buildid: use kmap_local_page()
watchdog/core: remove sysctl handlers from public header
nilfs2: use div64_ul() instead of do_div()
mul_u64_u64_div_u64: increase precision by conditionally swapping a and b
kexec: copy only happens before uchunk goes to zero
get_signal: don't initialize ksig->info if SIGNAL_GROUP_EXIT/group_exec_task
get_signal: hide_si_addr_tag_bits: fix the usage of uninitialized ksig
get_signal: don't abuse ksig->info.si_signo and ksig->sig
const_structs.checkpatch: add device_type
Normalise "name (ad@dr)" MODULE_AUTHORs to "name <ad@dr>"
dyndbg: replace kstrdup() + strchr() with kstrdup_and_replace()
list: leverage list_is_head() for list_entry_is_head()
nilfs2: MAINTAINERS: drop unreachable project mirror site
smp: make __smp_processor_id() 0-argument macro
fat: fix uninitialized field in nostale filehandles
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm
Pull MM updates from Andrew Morton:
- Sumanth Korikkar has taught s390 to allocate hotplug-time page frames
from hotplugged memory rather than only from main memory. Series
"implement "memmap on memory" feature on s390".
- More folio conversions from Matthew Wilcox in the series
"Convert memcontrol charge moving to use folios"
"mm: convert mm counter to take a folio"
- Chengming Zhou has optimized zswap's rbtree locking, providing
significant reductions in system time and modest but measurable
reductions in overall runtimes. The series is "mm/zswap: optimize the
scalability of zswap rb-tree".
- Chengming Zhou has also provided the series "mm/zswap: optimize zswap
lru list" which provides measurable runtime benefits in some
swap-intensive situations.
- And Chengming Zhou further optimizes zswap in the series "mm/zswap:
optimize for dynamic zswap_pools". Measured improvements are modest.
- zswap cleanups and simplifications from Yosry Ahmed in the series
"mm: zswap: simplify zswap_swapoff()".
- In the series "Add DAX ABI for memmap_on_memory", Vishal Verma has
contributed several DAX cleanups as well as adding a sysfs tunable to
control the memmap_on_memory setting when the dax device is
hotplugged as system memory.
- Johannes Weiner has added the large series "mm: zswap: cleanups",
which does that.
- More DAMON work from SeongJae Park in the series
"mm/damon: make DAMON debugfs interface deprecation unignorable"
"selftests/damon: add more tests for core functionalities and corner cases"
"Docs/mm/damon: misc readability improvements"
"mm/damon: let DAMOS feeds and tame/auto-tune itself"
- In the series "mm/mempolicy: weighted interleave mempolicy and sysfs
extension" Rakie Kim has developed a new mempolicy interleaving
policy wherein we allocate memory across nodes in a weighted fashion
rather than uniformly. This is beneficial in heterogeneous memory
environments appearing with CXL.
- Christophe Leroy has contributed some cleanup and consolidation work
against the ARM pagetable dumping code in the series "mm: ptdump:
Refactor CONFIG_DEBUG_WX and check_wx_pages debugfs attribute".
- Luis Chamberlain has added some additional xarray selftesting in the
series "test_xarray: advanced API multi-index tests".
- Muhammad Usama Anjum has reworked the selftest code to make its
human-readable output conform to the TAP ("Test Anything Protocol")
format. Amongst other things, this opens up the use of third-party
tools to parse and process out selftesting results.
- Ryan Roberts has added fork()-time PTE batching of THP ptes in the
series "mm/memory: optimize fork() with PTE-mapped THP". Mainly
targeted at arm64, this significantly speeds up fork() when the
process has a large number of pte-mapped folios.
- David Hildenbrand also gets in on the THP pte batching game in his
series "mm/memory: optimize unmap/zap with PTE-mapped THP". It
implements batching during munmap() and other pte teardown
situations. The microbenchmark improvements are nice.
- And in the series "Transparent Contiguous PTEs for User Mappings"
Ryan Roberts further utilizes arm's pte's contiguous bit ("contpte
mappings"). Kernel build times on arm64 improved nicely. Ryan's
series "Address some contpte nits" provides some followup work.
- In the series "mm/hugetlb: Restore the reservation" Breno Leitao has
fixed an obscure hugetlb race which was causing unnecessary page
faults. He has also added a reproducer under the selftest code.
- In the series "selftests/mm: Output cleanups for the compaction
test", Mark Brown did what the title claims.
- Kinsey Ho has added the series "mm/mglru: code cleanup and
refactoring".
- Even more zswap material from Nhat Pham. The series "fix and extend
zswap kselftests" does as claimed.
- In the series "Introduce cpu_dcache_is_aliasing() to fix DAX
regression" Mathieu Desnoyers has cleaned up and fixed rather a mess
in our handling of DAX on archiecctures which have virtually aliasing
data caches. The arm architecture is the main beneficiary.
- Lokesh Gidra's series "per-vma locks in userfaultfd" provides
dramatic improvements in worst-case mmap_lock hold times during
certain userfaultfd operations.
- Some page_owner enhancements and maintenance work from Oscar Salvador
in his series
"page_owner: print stacks and their outstanding allocations"
"page_owner: Fixup and cleanup"
- Uladzislau Rezki has contributed some vmalloc scalability
improvements in his series "Mitigate a vmap lock contention". It
realizes a 12x improvement for a certain microbenchmark.
- Some kexec/crash cleanup work from Baoquan He in the series "Split
crash out from kexec and clean up related config items".
- Some zsmalloc maintenance work from Chengming Zhou in the series
"mm/zsmalloc: fix and optimize objects/page migration"
"mm/zsmalloc: some cleanup for get/set_zspage_mapping()"
- Zi Yan has taught the MM to perform compaction on folios larger than
order=0. This a step along the path to implementaton of the merging
of large anonymous folios. The series is named "Enable >0 order folio
memory compaction".
- Christoph Hellwig has done quite a lot of cleanup work in the
pagecache writeback code in his series "convert write_cache_pages()
to an iterator".
- Some modest hugetlb cleanups and speedups in Vishal Moola's series
"Handle hugetlb faults under the VMA lock".
- Zi Yan has changed the page splitting code so we can split huge pages
into sizes other than order-0 to better utilize large folios. The
series is named "Split a folio to any lower order folios".
- David Hildenbrand has contributed the series "mm: remove
total_mapcount()", a cleanup.
- Matthew Wilcox has sought to improve the performance of bulk memory
freeing in his series "Rearrange batched folio freeing".
- Gang Li's series "hugetlb: parallelize hugetlb page init on boot"
provides large improvements in bootup times on large machines which
are configured to use large numbers of hugetlb pages.
- Matthew Wilcox's series "PageFlags cleanups" does that.
- Qi Zheng's series "minor fixes and supplement for ptdesc" does that
also. S390 is affected.
- Cleanups to our pagemap utility functions from Peter Xu in his series
"mm/treewide: Replace pXd_large() with pXd_leaf()".
- Nico Pache has fixed a few things with our hugepage selftests in his
series "selftests/mm: Improve Hugepage Test Handling in MM
Selftests".
- Also, of course, many singleton patches to many things. Please see
the individual changelogs for details.
* tag 'mm-stable-2024-03-13-20-04' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: (435 commits)
mm/zswap: remove the memcpy if acomp is not sleepable
crypto: introduce: acomp_is_async to expose if comp drivers might sleep
memtest: use {READ,WRITE}_ONCE in memory scanning
mm: prohibit the last subpage from reusing the entire large folio
mm: recover pud_leaf() definitions in nopmd case
selftests/mm: skip the hugetlb-madvise tests on unmet hugepage requirements
selftests/mm: skip uffd hugetlb tests with insufficient hugepages
selftests/mm: dont fail testsuite due to a lack of hugepages
mm/huge_memory: skip invalid debugfs new_order input for folio split
mm/huge_memory: check new folio order when split a folio
mm, vmscan: retry kswapd's priority loop with cache_trim_mode off on failure
mm: add an explicit smp_wmb() to UFFDIO_CONTINUE
mm: fix list corruption in put_pages_list
mm: remove folio from deferred split list before uncharging it
filemap: avoid unnecessary major faults in filemap_fault()
mm,page_owner: drop unnecessary check
mm,page_owner: check for null stack_record before bumping its refcount
mm: swap: fix race between free_swap_and_cache() and swapoff()
mm/treewide: align up pXd_leaf() retval across archs
mm/treewide: drop pXd_large()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux
Pull modules updates from Luis Chamberlain:
"Christophe Leroy did most of the work on this release, first with a
few cleanups on CONFIG_STRICT_KERNEL_RWX and ending with error
handling for when set_memory_XX() can fail.
This is part of a larger effort to clean up all these callers which
can fail, modules is just part of it"
* tag 'modules-6.9-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux:
module: Don't ignore errors from set_memory_XX()
lib/test_kmod: fix kernel-doc warnings
powerpc: Simplify strict_kernel_rwx_enabled()
modules: Remove #ifdef CONFIG_STRICT_MODULE_RWX around rodata_enabled
init: Declare rodata_enabled and mark_rodata_ro() at all time
module: Change module_enable_{nx/x/ro}() to more explicit names
module: Use set_memory_rox()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next
Pull networking updates from Jakub Kicinski:
"Core & protocols:
- Large effort by Eric to lower rtnl_lock pressure and remove locks:
- Make commonly used parts of rtnetlink (address, route dumps
etc) lockless, protected by RCU instead of rtnl_lock.
- Add a netns exit callback which already holds rtnl_lock,
allowing netns exit to take rtnl_lock once in the core instead
of once for each driver / callback.
- Remove locks / serialization in the socket diag interface.
- Remove 6 calls to synchronize_rcu() while holding rtnl_lock.
- Remove the dev_base_lock, depend on RCU where necessary.
- Support busy polling on a per-epoll context basis. Poll length and
budget parameters can be set independently of system defaults.
- Introduce struct net_hotdata, to make sure read-mostly global
config variables fit in as few cache lines as possible.
- Add optional per-nexthop statistics to ease monitoring / debug of
ECMP imbalance problems.
- Support TCP_NOTSENT_LOWAT in MPTCP.
- Ensure that IPv6 temporary addresses' preferred lifetimes are long
enough, compared to other configured lifetimes, and at least 2 sec.
- Support forwarding of ICMP Error messages in IPSec, per RFC 4301.
- Add support for the independent control state machine for bonding
per IEEE 802.1AX-2008 5.4.15 in addition to the existing coupled
control state machine.
- Add "network ID" to MCTP socket APIs to support hosts with multiple
disjoint MCTP networks.
- Re-use the mono_delivery_time skbuff bit for packets which user
space wants to be sent at a specified time. Maintain the timing
information while traversing veth links, bridge etc.
- Take advantage of MSG_SPLICE_PAGES for RxRPC DATA and ACK packets.
- Simplify many places iterating over netdevs by using an xarray
instead of a hash table walk (hash table remains in place, for use
on fastpaths).
- Speed up scanning for expired routes by keeping a dedicated list.
- Speed up "generic" XDP by trying harder to avoid large allocations.
- Support attaching arbitrary metadata to netconsole messages.
Things we sprinkled into general kernel code:
- Enforce VM_IOREMAP flag and range in ioremap_page_range and
introduce VM_SPARSE kind and vm_area_[un]map_pages (used by
bpf_arena).
- Rework selftest harness to enable the use of the full range of ksft
exit code (pass, fail, skip, xfail, xpass).
Netfilter:
- Allow userspace to define a table that is exclusively owned by a
daemon (via netlink socket aliveness) without auto-removing this
table when the userspace program exits. Such table gets marked as
orphaned and a restarting management daemon can re-attach/regain
ownership.
- Speed up element insertions to nftables' concatenated-ranges set
type. Compact a few related data structures.
BPF:
- Add BPF token support for delegating a subset of BPF subsystem
functionality from privileged system-wide daemons such as systemd
through special mount options for userns-bound BPF fs to a trusted
& unprivileged application.
- Introduce bpf_arena which is sparse shared memory region between
BPF program and user space where structures inside the arena can
have pointers to other areas of the arena, and pointers work
seamlessly for both user-space programs and BPF programs.
- Introduce may_goto instruction that is a contract between the
verifier and the program. The verifier allows the program to loop
assuming it's behaving well, but reserves the right to terminate
it.
- Extend the BPF verifier to enable static subprog calls in spin lock
critical sections.
- Support registration of struct_ops types from modules which helps
projects like fuse-bpf that seeks to implement a new struct_ops
type.
- Add support for retrieval of cookies for perf/kprobe multi links.
- Support arbitrary TCP SYN cookie generation / validation in the TC
layer with BPF to allow creating SYN flood handling in BPF
firewalls.
- Add code generation to inline the bpf_kptr_xchg() helper which
improves performance when stashing/popping the allocated BPF
objects.
Wireless:
- Add SPP (signaling and payload protected) AMSDU support.
- Support wider bandwidth OFDMA, as required for EHT operation.
Driver API:
- Major overhaul of the Energy Efficient Ethernet internals to
support new link modes (2.5GE, 5GE), share more code between
drivers (especially those using phylib), and encourage more
uniform behavior. Convert and clean up drivers.
- Define an API for querying per netdev queue statistics from
drivers.
- IPSec: account in global stats for fully offloaded sessions.
- Create a concept of Ethernet PHY Packages at the Device Tree level,
to allow parameterizing the existing PHY package code.
- Enable Rx hashing (RSS) on GTP protocol fields.
Misc:
- Improvements and refactoring all over networking selftests.
- Create uniform module aliases for TC classifiers, actions, and
packet schedulers to simplify creating modprobe policies.
- Address all missing MODULE_DESCRIPTION() warnings in networking.
- Extend the Netlink descriptions in YAML to cover message
encapsulation or "Netlink polymorphism", where interpretation of
nested attributes depends on link type, classifier type or some
other "class type".
Drivers:
- Ethernet high-speed NICs:
- Add a new driver for Marvell's Octeon PCI Endpoint NIC VF.
- Intel (100G, ice, idpf):
- support E825-C devices
- nVidia/Mellanox:
- support devices with one port and multiple PCIe links
- Broadcom (bnxt):
- support n-tuple filters
- support configuring the RSS key
- Wangxun (ngbe/txgbe):
- implement irq_domain for TXGBE's sub-interrupts
- Pensando/AMD:
- support XDP
- optimize queue submission and wakeup handling (+17% bps)
- optimize struct layout, saving 28% of memory on queues
- Ethernet NICs embedded and virtual:
- Google cloud vNIC:
- refactor driver to perform memory allocations for new queue
config before stopping and freeing the old queue memory
- Synopsys (stmmac):
- obey queueMaxSDU and implement counters required by 802.1Qbv
- Renesas (ravb):
- support packet checksum offload
- suspend to RAM and runtime PM support
- Ethernet switches:
- nVidia/Mellanox:
- support for nexthop group statistics
- Microchip:
- ksz8: implement PHY loopback
- add support for KSZ8567, a 7-port 10/100Mbps switch
- PTP:
- New driver for RENESAS FemtoClock3 Wireless clock generator.
- Support OCP PTP cards designed and built by Adva.
- CAN:
- Support recvmsg() flags for own, local and remote traffic on CAN
BCM sockets.
- Support for esd GmbH PCIe/402 CAN device family.
- m_can:
- Rx/Tx submission coalescing
- wake on frame Rx
- WiFi:
- Intel (iwlwifi):
- enable signaling and payload protected A-MSDUs
- support wider-bandwidth OFDMA
- support for new devices
- bump FW API to 89 for AX devices; 90 for BZ/SC devices
- MediaTek (mt76):
- mt7915: newer ADIE version support
- mt7925: radio temperature sensor support
- Qualcomm (ath11k):
- support 6 GHz station power modes: Low Power Indoor (LPI),
Standard Power) SP and Very Low Power (VLP)
- QCA6390 & WCN6855: support 2 concurrent station interfaces
- QCA2066 support
- Qualcomm (ath12k):
- refactoring in preparation for Multi-Link Operation (MLO)
support
- 1024 Block Ack window size support
- firmware-2.bin support
- support having multiple identical PCI devices (firmware needs
to have ATH12K_FW_FEATURE_MULTI_QRTR_ID)
- QCN9274: support split-PHY devices
- WCN7850: enable Power Save Mode in station mode
- WCN7850: P2P support
- RealTek:
- rtw88: support for more rtw8811cu and rtw8821cu devices
- rtw89: support SCAN_RANDOM_SN and SET_SCAN_DWELL
- rtlwifi: speed up USB firmware initialization
- rtwl8xxxu:
- RTL8188F: concurrent interface support
- Channel Switch Announcement (CSA) support in AP mode
- Broadcom (brcmfmac):
- per-vendor feature support
- per-vendor SAE password setup
- DMI nvram filename quirk for ACEPC W5 Pro"
* tag 'net-next-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2255 commits)
nexthop: Fix splat with CONFIG_DEBUG_PREEMPT=y
nexthop: Fix out-of-bounds access during attribute validation
nexthop: Only parse NHA_OP_FLAGS for dump messages that require it
nexthop: Only parse NHA_OP_FLAGS for get messages that require it
bpf: move sleepable flag from bpf_prog_aux to bpf_prog
bpf: hardcode BPF_PROG_PACK_SIZE to 2MB * num_possible_nodes()
selftests/bpf: Add kprobe multi triggering benchmarks
ptp: Move from simple ida to xarray
vxlan: Remove generic .ndo_get_stats64
vxlan: Do not alloc tstats manually
devlink: Add comments to use netlink gen tool
nfp: flower: handle acti_netdevs allocation failure
net/packet: Add getsockopt support for PACKET_COPY_THRESH
net/netlink: Add getsockopt support for NETLINK_LISTEN_ALL_NSID
selftests/bpf: Add bpf_arena_htab test.
selftests/bpf: Add bpf_arena_list test.
selftests/bpf: Add unit tests for bpf_arena_alloc/free_pages
bpf: Add helper macro bpf_addr_space_cast()
libbpf: Recognize __arena global variables.
bpftool: Recognize arena map type
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 cleanups from Ingo Molnar:
"Misc cleanups, including a large series from Thomas Gleixner to cure
sparse warnings"
* tag 'x86-cleanups-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/nmi: Drop unused declaration of proc_nmi_enabled()
x86/callthunks: Use EXPORT_PER_CPU_SYMBOL_GPL() for per CPU variables
x86/cpu: Provide a declaration for itlb_multihit_kvm_mitigation
x86/cpu: Use EXPORT_PER_CPU_SYMBOL_GPL() for x86_spec_ctrl_current
x86/uaccess: Add missing __force to casts in __access_ok() and valid_user_address()
x86/percpu: Cure per CPU madness on UP
smp: Consolidate smp_prepare_boot_cpu()
x86/msr: Add missing __percpu annotations
x86/msr: Prepare for including <linux/percpu.h> into <asm/msr.h>
perf/x86/amd/uncore: Fix __percpu annotation
x86/nmi: Remove an unnecessary IS_ENABLED(CONFIG_SMP)
x86/apm_32: Remove dead function apm_get_battery_status()
x86/insn-eval: Fix function param name in get_eff_addr_sib()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 APIC updates from Thomas Gleixner:
"Rework of APIC enumeration and topology evaluation.
The current implementation has a couple of shortcomings:
- It fails to handle hybrid systems correctly.
- The APIC registration code which handles CPU number assignents is
in the middle of the APIC code and detached from the topology
evaluation.
- The various mechanisms which enumerate APICs, ACPI, MPPARSE and
guest specific ones, tweak global variables as they see fit or in
case of XENPV just hack around the generic mechanisms completely.
- The CPUID topology evaluation code is sprinkled all over the vendor
code and reevaluates global variables on every hotplug operation.
- There is no way to analyze topology on the boot CPU before bringing
up the APs. This causes problems for infrastructure like PERF which
needs to size certain aspects upfront or could be simplified if
that would be possible.
- The APIC admission and CPU number association logic is
incomprehensible and overly complex and needs to be kept around
after boot instead of completing this right after the APIC
enumeration.
This update addresses these shortcomings with the following changes:
- Rework the CPUID evaluation code so it is common for all vendors
and provides information about the APIC ID segments in a uniform
way independent of the number of segments (Thread, Core, Module,
..., Die, Package) so that this information can be computed instead
of rewriting global variables of dubious value over and over.
- A few cleanups and simplifcations of the APIC, IO/APIC and related
interfaces to prepare for the topology evaluation changes.
- Seperation of the parser stages so the early evaluation which tries
to find the APIC address can be seperately overridden from the late
evaluation which enumerates and registers the local APIC as further
preparation for sanitizing the topology evaluation.
- A new registration and admission logic which
- encapsulates the inner workings so that parsers and guest logic
cannot longer fiddle in it
- uses the APIC ID segments to build topology bitmaps at
registration time
- provides a sane admission logic
- allows to detect the crash kernel case, where CPU0 does not run
on the real BSP, automatically. This is required to prevent
sending INIT/SIPI sequences to the real BSP which would reset
the whole machine. This was so far handled by a tedious command
line parameter, which does not even work in nested crash
scenarios.
- Associates CPU number after the enumeration completed and
prevents the late registration of APICs, which was somehow
tolerated before.
- Converting all parsers and guest enumeration mechanisms over to the
new interfaces.
This allows to get rid of all global variable tweaking from the
parsers and enumeration mechanisms and sanitizes the XEN[PV]
handling so it can use CPUID evaluation for the first time.
- Mopping up existing sins by taking the information from the APIC ID
segment bitmaps.
This evaluates hybrid systems correctly on the boot CPU and allows
for cleanups and fixes in the related drivers, e.g. PERF.
The series has been extensively tested and the minimal late fallout
due to a broken ACPI/MADT table has been addressed by tightening the
admission logic further"
* tag 'x86-apic-2024-03-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (76 commits)
x86/topology: Ignore non-present APIC IDs in a present package
x86/apic: Build the x86 topology enumeration functions on UP APIC builds too
smp: Provide 'setup_max_cpus' definition on UP too
smp: Avoid 'setup_max_cpus' namespace collision/shadowing
x86/bugs: Use fixed addressing for VERW operand
x86/cpu/topology: Get rid of cpuinfo::x86_max_cores
x86/cpu/topology: Provide __num_[cores|threads]_per_package
x86/cpu/topology: Rename topology_max_die_per_package()
x86/cpu/topology: Rename smp_num_siblings
x86/cpu/topology: Retrieve cores per package from topology bitmaps
x86/cpu/topology: Use topology logical mapping mechanism
x86/cpu/topology: Provide logical pkg/die mapping
x86/cpu/topology: Simplify cpu_mark_primary_thread()
x86/cpu/topology: Mop up primary thread mask handling
x86/cpu/topology: Use topology bitmaps for sizing
x86/cpu/topology: Let XEN/PV use topology from CPUID/MADT
x86/xen/smp_pv: Count number of vCPUs early
x86/cpu/topology: Assign hotpluggable CPUIDs during init
x86/cpu/topology: Reject unknown APIC IDs on ACPI hotplug
x86/topology: Add a mechanism to track topology via APIC IDs
...
|
|
Pull workqueue updates from Tejun Heo:
"This cycle, a lot of workqueue changes including some that are
significant and invasive.
- During v6.6 cycle, unbound workqueues were updated so that they are
more topology aware and flexible, which among other things improved
workqueue behavior on modern multi-L3 CPUs. In the process, commit
636b927eba5b ("workqueue: Make unbound workqueues to use per-cpu
pool_workqueues") switched unbound workqueues to use per-CPU
frontend pool_workqueues as a part of increasing front-back mapping
flexibility.
An unwelcome side effect of this change was that this made max
concurrency enforcement per-CPU blowing up the maximum number of
allowed concurrent executions. I incorrectly assumed that this
wouldn't cause practical problems as most unbound workqueue users
are self-regulate max concurrency; however, there definitely are
which don't (e.g. on IO paths) and the drastic increase in the
allowed max concurrency led to noticeable perf regressions in some
use cases.
This is now addressed by separating out max concurrency enforcement
to a separate struct - wq_node_nr_active - which makes @max_active
consistently mean system-wide max concurrency regardless of the
number of CPUs or (finally) NUMA nodes. This is a rather invasive
and, in places, a bit clunky; however, the clunkiness rises from
the the inherent requirement to handle the disagreement between the
execution locality domain and max concurrency enforcement domain on
some modern machines.
See commit 5797b1c18919 ("workqueue: Implement system-wide
nr_active enforcement for unbound workqueues") for more details.
- BH workqueue support is added.
They are similar to per-CPU workqueues but execute work items in
the softirq context. This is expected to replace tasklet. However,
currently, it's missing the ability to disable and enable work
items which is needed to convert many tasklet users. To avoid
crowding this merge window too much, this will be included in the
next merge window. A separate pull request will be sent for the
couple conversion patches that are currently pending.
- Waiman plugged a long-standing hole in workqueue CPU isolation
where ordered workqueues didn't follow wq_unbound_cpumask updates.
Ordered workqueues now follow the same rules as other unbound
workqueues.
- More CPU isolation improvements: Juri fixed another deficit in
workqueue isolation where unbound rescuers don't respect
wq_unbound_cpumask. Leonardo fixed delayed_work timers firing on
isolated CPUs.
- Other misc changes"
* tag 'wq-for-6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq: (54 commits)
workqueue: Drain BH work items on hot-unplugged CPUs
workqueue: Introduce from_work() helper for cleaner callback declarations
workqueue: Control intensive warning threshold through cmdline
workqueue: Make @flags handling consistent across set_work_data() and friends
workqueue: Remove clear_work_data()
workqueue: Factor out work_grab_pending() from __cancel_work_sync()
workqueue: Clean up enum work_bits and related constants
workqueue: Introduce work_cancel_flags
workqueue: Use variable name irq_flags for saving local irq flags
workqueue: Reorganize flush and cancel[_sync] functions
workqueue: Rename __cancel_work_timer() to __cancel_timer_sync()
workqueue: Use rcu_read_lock_any_held() instead of rcu_read_lock_held()
workqueue: Cosmetic changes
workqueue, irq_work: Build fix for !CONFIG_IRQ_WORK
workqueue: Fix queue_work_on() with BH workqueues
async: Use a dedicated unbound workqueue with raised min_active
workqueue: Implement workqueue_set_min_active()
workqueue: Fix kernel-doc comment of unplug_oldest_pwq()
workqueue: Bind unbound workqueue rescuer to wq_unbound_cpumask
kernel/workqueue: Let rescuers follow unbound wq cpumask changes
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux
Pull RCU updates from Boqun Feng:
- Eliminate deadlocks involving do_exit() and RCU tasks, by Paul:
Instead of SRCU read side critical sections, now a percpu list is
used in do_exit() for scaning yet-to-exit tasks
- Fix a deadlock due to the dependency between workqueue and RCU
expedited grace period, reported by Anna-Maria Behnsen and Thomas
Gleixner and fixed by Frederic: Now RCU expedited always uses its own
kthread worker instead of a workqueue
- RCU NOCB updates, code cleanups, unnecessary barrier removals and
minor bug fixes
- Maintain real-time response in rcu_tasks_postscan() and a minor fix
for tasks trace quiescence check
- Misc updates, comments and readibility improvement, boot time
parameter for lazy RCU and rcutorture improvement
- Documentation updates
* tag 'rcu.next.v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/boqun/linux: (34 commits)
rcu-tasks: Maintain real-time response in rcu_tasks_postscan()
rcu-tasks: Eliminate deadlocks involving do_exit() and RCU tasks
rcu-tasks: Maintain lists to eliminate RCU-tasks/do_exit() deadlocks
rcu-tasks: Initialize data to eliminate RCU-tasks/do_exit() deadlocks
rcu-tasks: Initialize callback lists at rcu_init() time
rcu-tasks: Add data to eliminate RCU-tasks/do_exit() deadlocks
rcu-tasks: Repair RCU Tasks Trace quiescence check
rcu/sync: remove un-used rcu_sync_enter_start function
rcutorture: Suppress rtort_pipe_count warnings until after stalls
srcu: Improve comments about acceleration leak
rcu: Provide a boot time parameter to control lazy RCU
rcu: Rename jiffies_till_flush to jiffies_lazy_flush
doc: Update checklist.rst discussion of callback execution
doc: Clarify use of slab constructors and SLAB_TYPESAFE_BY_RCU
context_tracking: Fix kerneldoc headers for __ct_user_{enter,exit}()
doc: Add EARLY flag to early-parsed kernel boot parameters
doc: Add CONFIG_RCU_STRICT_GRACE_PERIOD to checklist.rst
doc: Make checklist.rst note that spinlocks are implied RCU readers
doc: Make whatisRCU.rst note that spinlocks are RCU readers
doc: Spinlocks are implied RCU readers
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull block handle updates from Christian Brauner:
"Last cycle we changed opening of block devices, and opening a block
device would return a bdev_handle. This allowed us to implement
support for restricting and forbidding writes to mounted block
devices. It was accompanied by converting and adding helpers to
operate on bdev_handles instead of plain block devices.
That was already a good step forward but ultimately it isn't necessary
to have special purpose helpers for opening block devices internally
that return a bdev_handle.
Fundamentally, opening a block device internally should just be
equivalent to opening files. So now all internal opens of block
devices return files just as a userspace open would. Instead of
introducing a separate indirection into bdev_open_by_*() via struct
bdev_handle bdev_file_open_by_*() is made to just return a struct
file. Opening and closing a block device just becomes equivalent to
opening and closing a file.
This all works well because internally we already have a pseudo fs for
block devices and so opening block devices is simple. There's a few
places where we needed to be careful such as during boot when the
kernel is supposed to mount the rootfs directly without init doing it.
Here we need to take care to ensure that we flush out any asynchronous
file close. That's what we already do for opening, unpacking, and
closing the initramfs. So nothing new here.
The equivalence of opening and closing block devices to regular files
is a win in and of itself. But it also has various other advantages.
We can remove struct bdev_handle completely. Various low-level helpers
are now private to the block layer. Other helpers were simply
removable completely.
A follow-up series that is already reviewed build on this and makes it
possible to remove bdev->bd_inode and allows various clean ups of the
buffer head code as well. All places where we stashed a bdev_handle
now just stash a file and use simple accessors to get to the actual
block device which was already the case for bdev_handle"
* tag 'vfs-6.9.super' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (35 commits)
block: remove bdev_handle completely
block: don't rely on BLK_OPEN_RESTRICT_WRITES when yielding write access
bdev: remove bdev pointer from struct bdev_handle
bdev: make struct bdev_handle private to the block layer
bdev: make bdev_{release, open_by_dev}() private to block layer
bdev: remove bdev_open_by_path()
reiserfs: port block device access to file
ocfs2: port block device access to file
nfs: port block device access to files
jfs: port block device access to file
f2fs: port block device access to files
ext4: port block device access to file
erofs: port device access to file
btrfs: port device access to file
bcachefs: port block device access to file
target: port block device access to file
s390: port block device access to file
nvme: port block device access to file
block2mtd: port device access to files
bcache: port block device access to files
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull pdfd updates from Christian Brauner:
- Until now pidfds could only be created for thread-group leaders but
not for threads. There was no technical reason for this. We simply
had no users that needed support for this. Now we do have users that
need support for this.
This introduces a new PIDFD_THREAD flag for pidfd_open(). If that
flag is set pidfd_open() creates a pidfd that refers to a specific
thread.
In addition, we now allow clone() and clone3() to be called with
CLONE_PIDFD | CLONE_THREAD which wasn't possible before.
A pidfd that refers to an individual thread differs from a pidfd that
refers to a thread-group leader:
(1) Pidfds are pollable. A task may poll a pidfd and get notified
when the task has exited.
For thread-group leader pidfds the polling task is woken if the
thread-group is empty. In other words, if the thread-group
leader task exits when there are still threads alive in its
thread-group the polling task will not be woken when the
thread-group leader exits but rather when the last thread in the
thread-group exits.
For thread-specific pidfds the polling task is woken if the
thread exits.
(2) Passing a thread-group leader pidfd to pidfd_send_signal() will
generate thread-group directed signals like kill(2) does.
Passing a thread-specific pidfd to pidfd_send_signal() will
generate thread-specific signals like tgkill(2) does.
The default scope of the signal is thus determined by the type
of the pidfd.
Since use-cases exist where the default scope of the provided
pidfd needs to be overriden the following flags are added to
pidfd_send_signal():
- PIDFD_SIGNAL_THREAD
Send a thread-specific signal.
- PIDFD_SIGNAL_THREAD_GROUP
Send a thread-group directed signal.
- PIDFD_SIGNAL_PROCESS_GROUP
Send a process-group directed signal.
The scope change will only work if the struct pid is actually
used for this scope.
For example, in order to send a thread-group directed signal the
provided pidfd must be used as a thread-group leader and
similarly for PIDFD_SIGNAL_PROCESS_GROUP the struct pid must be
used as a process group leader.
- Move pidfds from the anonymous inode infrastructure to a tiny pseudo
filesystem. This will unblock further work that we weren't able to do
simply because of the very justified limitations of anonymous inodes.
Moving pidfds to a tiny pseudo filesystem allows for statx on pidfds
to become useful for the first time. They can now be compared by
inode number which are unique for the system lifetime.
Instead of stashing struct pid in file->private_data we can now stash
it in inode->i_private. This makes it possible to introduce concepts
that operate on a process once all file descriptors have been closed.
A concrete example is kill-on-last-close. Another side-effect is that
file->private_data is now freed up for per-file options for pidfds.
Now, each struct pid will refer to a different inode but the same
struct pid will refer to the same inode if it's opened multiple
times. In contrast to now where each struct pid refers to the same
inode.
The tiny pseudo filesystem is not visible anywhere in userspace
exactly like e.g., pipefs and sockfs. There's no lookup, there's no
complex inode operations, nothing. Dentries and inodes are always
deleted when the last pidfd is closed.
We allocate a new inode and dentry for each struct pid and we reuse
that inode and dentry for all pidfds that refer to the same struct
pid. The code is entirely optional and fairly small. If it's not
selected we fallback to anonymous inodes. Heavily inspired by nsfs.
The dentry and inode allocation mechanism is moved into generic
infrastructure that is now shared between nsfs and pidfs. The
path_from_stashed() helper must be provided with a stashing location,
an inode number, a mount, and the private data that is supposed to be
used and it will provide a path that can be passed to dentry_open().
The helper will try retrieve an existing dentry from the provided
stashing location. If a valid dentry is found it is reused. If not a
new one is allocated and we try to stash it in the provided location.
If this fails we retry until we either find an existing dentry or the
newly allocated dentry could be stashed. Subsequent openers of the
same namespace or task are then able to reuse it.
- Currently it is only possible to get notified when a task has exited,
i.e., become a zombie and userspace gets notified with EPOLLIN. We
now also support waiting until the task has been reaped, notifying
userspace with EPOLLHUP.
- Ensure that ESRCH is reported for getfd if a task is exiting instead
of the confusing EBADF.
- Various smaller cleanups to pidfd functions.
* tag 'vfs-6.9.pidfd' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (23 commits)
libfs: improve path_from_stashed()
libfs: add stashed_dentry_prune()
libfs: improve path_from_stashed() helper
pidfs: convert to path_from_stashed() helper
nsfs: convert to path_from_stashed() helper
libfs: add path_from_stashed()
pidfd: add pidfs
pidfd: move struct pidfd_fops
pidfd: allow to override signal scope in pidfd_send_signal()
pidfd: change pidfd_send_signal() to respect PIDFD_THREAD
signal: fill in si_code in prepare_kill_siginfo()
selftests: add ESRCH tests for pidfd_getfd()
pidfd: getfd should always report ESRCH if a task is exiting
pidfd: clone: allow CLONE_THREAD | CLONE_PIDFD together
pidfd: exit: kill the no longer used thread_group_exited()
pidfd: change do_notify_pidfd() to use __wake_up(poll_to_ |