aboutsummaryrefslogtreecommitdiff
path: root/kernel/rcu
AgeCommit message (Collapse)AuthorFilesLines
2024-11-19Merge tag 'irq-core-2024-11-18' of ↵Linus Torvalds1-0/+8
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull interrupt subsystem updates from Thomas Gleixner: "Tree wide: - Make nr_irqs static to the core code and provide accessor functions to remove existing and prevent future aliasing problems with local variables or function arguments of the same name. Core code: - Prevent freeing an interrupt in the devres code which is not managed by devres in the first place. - Use seq_put_decimal_ull_width() for decimal values output in /proc/interrupts which increases performance significantly as it avoids parsing the format strings over and over. - Optimize raising the timer and hrtimer soft interrupts by using the 'set bit only' variants instead of the combined version which checks whether ksoftirqd should be woken up. The latter is a pointless exercise as both soft interrupts are raised in the context of the timer interrupt and therefore never wake up ksoftirqd. - Delegate timer/hrtimer soft interrupt processing to a dedicated thread on RT. Timer and hrtimer soft interrupts are always processed in ksoftirqd on RT enabled kernels. This can lead to high latencies when other soft interrupts are delegated to ksoftirqd as well. The separate thread allows to run them seperately under a RT scheduling policy to reduce the latency overhead. Drivers: - New drivers or extensions of existing drivers to support Renesas RZ/V2H(P), Aspeed AST27XX, T-HEAD C900 and ATMEL sam9x7 interrupt chips - Support for multi-cluster GICs on MIPS. MIPS CPUs can come with multiple CPU clusters, where each CPU cluster has its own GIC (Generic Interrupt Controller). This requires to access the GIC of a remote cluster through a redirect register block. This is encapsulated into a set of helper functions to keep the complexity out of the actual code paths which handle the GIC details. - Support for encrypted guests in the ARM GICV3 ITS driver The ITS page needs to be shared with the hypervisor and therefore must be decrypted. - Small cleanups and fixes all over the place" * tag 'irq-core-2024-11-18' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits) irqchip/riscv-aplic: Prevent crash when MSI domain is missing genirq/proc: Use seq_put_decimal_ull_width() for decimal values softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT. timers: Use __raise_softirq_irqoff() to raise the softirq. hrtimer: Use __raise_softirq_irqoff() to raise the softirq riscv: defconfig: Enable T-HEAD C900 ACLINT SSWI drivers irqchip: Add T-HEAD C900 ACLINT SSWI driver dt-bindings: interrupt-controller: Add T-HEAD C900 ACLINT SSWI device irqchip/stm32mp-exti: Use of_property_present() for non-boolean properties irqchip/mips-gic: Fix selection of GENERIC_IRQ_EFFECTIVE_AFF_MASK irqchip/mips-gic: Prevent indirect access to clusters without CPU cores irqchip/mips-gic: Multi-cluster support irqchip/mips-gic: Setup defaults in each cluster irqchip/mips-gic: Support multi-cluster in for_each_online_cpu_gic() irqchip/mips-gic: Replace open coded online CPU iterations genirq/irqdesc: Use str_enabled_disabled() helper in wakeup_show() genirq/devres: Don't free interrupt which is not managed by devres irqchip/gic-v3-its: Fix over allocation in itt_alloc_pool() irqchip/aspeed-intc: Add AST27XX INTC support dt-bindings: interrupt-controller: Add support for ASPEED AST27XX INTC ...
2024-11-19Merge tag 'rcu.release.v6.13' of ↵Linus Torvalds12-173/+275
git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux Pull RCU updates from Frederic Weisbecker: "SRCU: - Introduction of the new SRCU-lite flavour with a new pair of srcu_read_[un]lock_lite() APIs. In practice the read side using this flavour becomes lighter by removing a full memory barrier on LOCK and a full memory barrier on UNLOCK. This comes at the expense of a higher latency write side with two (in the best case of a snaphot of unused read-sides) or more RCU grace periods on the update side which now assumes by itself the whole full ordering guarantee against the LOCK/UNLOCK counters on both indexes, along with the accesses performed inside. Uretprobes is a known potential user. Note this doesn't replace the default normal flavour of SRCU which still behaves the same as usual. - Add testing of SRCU-lite through rcutorture and rcuscale - Various cleanups on the way. Fixes: - Allow short-circuiting RCU-TASKS-RUDE grace periods on architectures that have sane noinstr boundaries forbidding tracing on low-level idle and kernel entry code. RCU-TASKS is enough on such configurations because it involves an RCU grace period that waits for all idle tasks to either schedule out voluntarily or enter into RCU unwatched noinstr code. - Allow and test start_poll_synchronize_rcu() with IRQs disabled. - Mention rcuog kthreads in relevant documentation and Kconfig help - Various fixes and consolidations rcutorture: - Add --no-affinity on tools to leave the affinity setting of guests up to the user. - Add guest_os_delay parameter to rcuscale for better warm-up control. - Fix and improve some rcuscale error handling. - Various cleanups and fixes stall: - Remove dead code - Stop dumping tasks if a stalled grace period eventually ended midway as that only produces confusing output. - Optimize detection of stalling CPUs and avoid useless node locking otherwise. NOCB: - Fix rcu_barrier() hang due to a race against callbacks deoffloading. This is not yet used, except by rcutorture, and waits for its promised cpusets interface. - Remove leftover function declaration" * tag 'rcu.release.v6.13' of git://git.kernel.org/pub/scm/linux/kernel/git/rcu/linux: (42 commits) rcuscale: Remove redundant WARN_ON_ONCE() splat rcuscale: Do a proper cleanup if kfree_scale_init() fails srcu: Unconditionally record srcu_read_lock_lite() in ->srcu_reader_flavor srcu: Check for srcu_read_lock_lite() across all CPUs srcu: Remove smp_mb() from srcu_read_unlock_lite() rcutorture: Avoid printing cpu=-1 for no-fault RCU boost failure rcuscale: Add guest_os_delay module parameter refscale: Correct affinity check torture: Add --no-affinity parameter to kvm.sh rcu/nocb: Fix missed RCU barrier on deoffloading rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcu rcu/srcutiny: don't return before reenabling preemption rcu-tasks: Remove open-coded one-byte cmpxchg() emulation doc: Remove kernel-parameters.txt entry for rcutorture.read_exit rcutorture: Test start-poll primitives with interrupts disabled rcu: Permit start_poll_synchronize_rcu*() with interrupts disabled rcu: Allow short-circuiting of synchronize_rcu_tasks_rude() doc: Add rcuog kthreads to kernel-per-CPU-kthreads.rst rcu: Add rcuog kthreads to RCU_NOCB_CPU help text rcu: Use the BITS_PER_LONG macro ...
2024-11-15Merge branches 'rcu/fixes', 'rcu/nocb', 'rcu/torture', 'rcu/stall' and ↵Frederic Weisbecker7-123/+219
'rcu/srcu' into rcu/dev
2024-11-15rcuscale: Remove redundant WARN_ON_ONCE() splatUladzislau Rezki (Sony)1-2/+0
There are two places where WARN_ON_ONCE() is called two times in the error paths. One which is encapsulated into if() condition and another one, which is unnecessary, is placed in the brackets. Remove an extra WARN_ON_ONCE() splat which is in brackets. Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-15rcuscale: Do a proper cleanup if kfree_scale_init() failsUladzislau Rezki (Sony)1-2/+4
A static analyzer for C, Smatch, reports and triggers below warnings: kernel/rcu/rcuscale.c:1215 rcu_scale_init() warn: inconsistent returns 'global &fullstop_mutex'. The checker complains about, we do not unlock the "fullstop_mutex" mutex, in case of hitting below error path: <snip> ... if (WARN_ON_ONCE(jiffies_at_lazy_cb - jif_start < 2 * HZ)) { pr_alert("ERROR: call_rcu() CBs are not being lazy as expected!\n"); WARN_ON_ONCE(1); return -1; ^^^^^^^^^^ ... <snip> it happens because "-1" is returned right away instead of doing a proper unwinding. Fix it by jumping to "unwind" label instead of returning -1. Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Closes: https://lore.kernel.org/rcu/ZxfTrHuEGtgnOYWp@pc636/T/ Fixes: 084e04fff160 ("rcuscale: Add laziness and kfree tests") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-15srcu: Unconditionally record srcu_read_lock_lite() in ->srcu_reader_flavorPaul E. McKenney1-4/+2
Currently, srcu_read_lock_lite() uses the SRCU_READ_FLAVOR_LITE bit in ->srcu_reader_flavor to communicate to the grace-period processing in srcu_readers_active_idx_check() that the smp_mb() must be replaced by a synchronize_rcu(). Unfortunately, ->srcu_reader_flavor is not updated unless the kernel is built with CONFIG_PROVE_RCU=y. Therefore in all kernels built with CONFIG_PROVE_RCU=n, srcu_readers_active_idx_check() incorrectly uses smp_mb() instead of synchronize_rcu() for srcu_struct structures whose readers use srcu_read_lock_lite(). This commit therefore causes Tree SRCU srcu_read_lock_lite() to unconditionally update ->srcu_reader_flavor so that srcu_readers_active_idx_check() can make the correct choice. Reported-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Closes: https://lore.kernel.org/all/d07e8f4a-d5ff-4c8e-8e61-50db285c57e9@amd.com/ Fixes: c0f08d6b5a61 ("srcu: Add srcu_read_lock_lite() and srcu_read_unlock_lite()") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Check for srcu_read_lock_lite() across all CPUsPaul E. McKenney1-5/+7
If srcu_read_lock_lite() is used on a given srcu_struct structure, then the grace-period processing must do synchronize_rcu() instead of smp_mb() between the scans of the ->srcu_unlock_count[] and ->srcu_lock_count[] counters. Currently, it does that by testing the SRCU_READ_FLAVOR_LITE bit of the ->srcu_reader_flavor mask, which works well. But only if the CPU running that srcu_struct structure's grace period has previously executed srcu_read_lock_lite(), which might not be the case, especially just after that srcu_struct structure has been created and initialized. This commit therefore updates the srcu_readers_unlock_idx() function to OR together the ->srcu_reader_flavor masks from all CPUs, and then make the srcu_readers_active_idx_check() function that test the SRCU_READ_FLAVOR_LITE bit in the resulting mask. Note that the srcu_readers_unlock_idx() function is already scanning all the CPUs to sum up the ->srcu_unlock_count[] fields and that this is on the grace-period slow path, hence no concerns about the small amount of extra work. Reported-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Closes: https://lore.kernel.org/all/d07e8f4a-d5ff-4c8e-8e61-50db285c57e9@amd.com/ Fixes: c0f08d6b5a61 ("srcu: Add srcu_read_lock_lite() and srcu_read_unlock_lite()") Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcutorture: Avoid printing cpu=-1 for no-fault RCU boost failurePaul E. McKenney1-2/+7
If a CPU runs throughout the stalled grace period without passing through a quiescent state, RCU priority boosting cannot help. The rcu_torture_boost_failed() function therefore prints a message flagging the first such CPU. However, if the stall was instead due to (for example) RCU's grace-period kthread being starved of CPU, there will be no such CPU, causing rcu_check_boost_fail() to instead pass back -1 through its cpup CPU-pointer parameter. Therefore, the current message complains about a mythical CPU -1. This commit therefore checks for this situation, and notes that all CPUs have passed through a quiescent state. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcuscale: Add guest_os_delay module parameterPaul E. McKenney1-0/+17
This commit adds a guest_os_delay module parameter that extends warm-up and cool-down the specified number of seconds before and after the series of test runs. This allows the data-collection intervals from any given rcuscale guest OSes to line up with active periods in the other rcuscale guest OSes, and also allows the thermal warm-up period required to obtain consistent results from one test to the next. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12refscale: Correct affinity checkPaul E. McKenney1-1/+1
The current affinity check works fine until there are more reader processes than CPUs, at which point the affinity check is looking for non-existent CPUs. This commit therefore applies the same modulus to the check as is present in the set_cpus_allowed_ptr() call. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu/nocb: Fix missed RCU barrier on deoffloadingZqiang1-1/+12
Currently, running rcutorture test with torture_type=rcu fwd_progress=8 n_barrier_cbs=8 nocbs_nthreads=8 nocbs_toggle=100 onoff_interval=60 test_boost=2, will trigger the following warning: WARNING: CPU: 19 PID: 100 at kernel/rcu/tree_nocb.h:1061 rcu_nocb_rdp_deoffload+0x292/0x2a0 RIP: 0010:rcu_nocb_rdp_deoffload+0x292/0x2a0 Call Trace: <TASK> ? __warn+0x7e/0x120 ? rcu_nocb_rdp_deoffload+0x292/0x2a0 ? report_bug+0x18e/0x1a0 ? handle_bug+0x3d/0x70 ? exc_invalid_op+0x18/0x70 ? asm_exc_invalid_op+0x1a/0x20 ? rcu_nocb_rdp_deoffload+0x292/0x2a0 rcu_nocb_cpu_deoffload+0x70/0xa0 rcu_nocb_toggle+0x136/0x1c0 ? __pfx_rcu_nocb_toggle+0x10/0x10 kthread+0xd1/0x100 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x2f/0x50 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> CPU0 CPU2 CPU3 //rcu_nocb_toggle //nocb_cb_wait //rcutorture // deoffload CPU1 // process CPU1's rdp rcu_barrier() rcu_segcblist_entrain() rcu_segcblist_add_len(1); // len == 2 // enqueue barrier // callback to CPU1's // rdp->cblist rcu_do_batch() // invoke CPU1's rdp->cblist // callback rcu_barrier_callback() rcu_barrier() mutex_lock(&rcu_state.barrier_mutex); // still see len == 2 // enqueue barrier callback // to CPU1's rdp->cblist rcu_segcblist_entrain() rcu_segcblist_add_len(1); // len == 3 // decrement len rcu_segcblist_add_len(-2); kthread_parkme() // CPU1's rdp->cblist len == 1 // Warn because there is // still a pending barrier // trigger warning WARN_ON_ONCE(rcu_segcblist_n_cbs(&rdp->cblist)); cpus_read_unlock(); // wait CPU1 to comes online and // invoke barrier callback on // CPU1 rdp's->cblist wait_for_completion(&rcu_state.barrier_completion); // deoffload CPU4 cpus_read_lock() rcu_barrier() mutex_lock(&rcu_state.barrier_mutex); // block on barrier_mutex // wait rcu_barrier() on // CPU3 to unlock barrier_mutex // but CPU3 unlock barrier_mutex // need to wait CPU1 comes online // when CPU1 going online will block on cpus_write_lock The above scenario will not only trigger a WARN_ON_ONCE(), but also trigger a deadlock. Thanks to nocb locking, a second racing rcu_barrier() on an offline CPU will either observe the decremented callback counter down to 0 and spare the callback enqueue, or rcuo will observe the new callback and keep rdp->nocb_cb_sleep to false. Therefore check rdp->nocb_cb_sleep before parking to make sure no further rcu_barrier() is waiting on the rdp. Fixes: 1fcb932c8b5c ("rcu/nocb: Simplify (de-)offloading state machine") Suggested-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu/kvfree: Fix data-race in __mod_timer / kvfree_call_rcuUladzislau Rezki (Sony)1-2/+12
KCSAN reports a data race when access the krcp->monitor_work.timer.expires variable in the schedule_delayed_monitor_work() function: <snip> BUG: KCSAN: data-race in __mod_timer / kvfree_call_rcu read to 0xffff888237d1cce8 of 8 bytes by task 10149 on cpu 1: schedule_delayed_monitor_work kernel/rcu/tree.c:3520 [inline] kvfree_call_rcu+0x3b8/0x510 kernel/rcu/tree.c:3839 trie_update_elem+0x47c/0x620 kernel/bpf/lpm_trie.c:441 bpf_map_update_value+0x324/0x350 kernel/bpf/syscall.c:203 generic_map_update_batch+0x401/0x520 kernel/bpf/syscall.c:1849 bpf_map_do_batch+0x28c/0x3f0 kernel/bpf/syscall.c:5143 __sys_bpf+0x2e5/0x7a0 __do_sys_bpf kernel/bpf/syscall.c:5741 [inline] __se_sys_bpf kernel/bpf/syscall.c:5739 [inline] __x64_sys_bpf+0x43/0x50 kernel/bpf/syscall.c:5739 x64_sys_call+0x2625/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:322 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xc9/0x1c0 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f write to 0xffff888237d1cce8 of 8 bytes by task 56 on cpu 0: __mod_timer+0x578/0x7f0 kernel/time/timer.c:1173 add_timer_global+0x51/0x70 kernel/time/timer.c:1330 __queue_delayed_work+0x127/0x1a0 kernel/workqueue.c:2523 queue_delayed_work_on+0xdf/0x190 kernel/workqueue.c:2552 queue_delayed_work include/linux/workqueue.h:677 [inline] schedule_delayed_monitor_work kernel/rcu/tree.c:3525 [inline] kfree_rcu_monitor+0x5e8/0x660 kernel/rcu/tree.c:3643 process_one_work kernel/workqueue.c:3229 [inline] process_scheduled_works+0x483/0x9a0 kernel/workqueue.c:3310 worker_thread+0x51d/0x6f0 kernel/workqueue.c:3391 kthread+0x1d1/0x210 kernel/kthread.c:389 ret_from_fork+0x4b/0x60 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 Reported by Kernel Concurrency Sanitizer on: CPU: 0 UID: 0 PID: 56 Comm: kworker/u8:4 Not tainted 6.12.0-rc2-syzkaller-00050-g5b7c893ed5ed #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 09/13/2024 Workqueue: events_unbound kfree_rcu_monitor <snip> kfree_rcu_monitor() rearms the work if a "krcp" has to be still offloaded and this is done without holding krcp->lock, whereas the kvfree_call_rcu() holds it. Fix it by acquiring the "krcp->lock" for kfree_rcu_monitor() so both functions do not race anymore. Reported-by: syzbot+061d370693bdd99f9d34@syzkaller.appspotmail.com Link: https://lore.kernel.org/lkml/ZxZ68KmHDQYU0yfD@pc636/T/ Fixes: 8fc5494ad5fa ("rcu/kvfree: Move need_offload_krc() out of krcp->lock") Signed-off-by: Uladzislau Rezki (Sony) <urezki@gmail.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu/srcutiny: don't return before reenabling preemptionMichal Schmidt1-1/+1
Code after the return statement is dead. Enable preemption before returning from srcu_drive_gp(). This will be important when/if PREEMPT_AUTO (lazy resched) gets merged. Fixes: 65b4a59557f6 ("srcu: Make Tiny SRCU explicitly disable preemption") Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Michal Schmidt <mschmidt@redhat.com> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu-tasks: Remove open-coded one-byte cmpxchg() emulationPaul E. McKenney1-16/+1
This commit removes the open-coded one-byte cmpxchg() emulation from rcu_trc_cmpxchg_need_qs(), replacing it with just cmpxchg() given the latter's new-found ability to handle single-byte arguments across all architectures. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcutorture: Test start-poll primitives with interrupts disabledPaul E. McKenney1-0/+10
This commit tests the ->start_poll() and ->start_poll_full() functions with interrupts disabled, but only for RCU variants setting the ->start_poll_irqsoff flag. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu: Permit start_poll_synchronize_rcu*() with interrupts disabledPaul E. McKenney1-7/+0
The header comment for both start_poll_synchronize_rcu() and start_poll_synchronize_rcu_full() state that interrupts must be enabled when calling these two functions, and there is a lockdep assertion in start_poll_synchronize_rcu_common() enforcing this restriction. However, there is no need for this restrictions, as can be seen in call_rcu(), which does wakeups when interrupts are disabled. This commit therefore removes the lockdep assertion and the comments. Reported-by: Kent Overstreet <kent.overstreet@linux.dev> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu: Allow short-circuiting of synchronize_rcu_tasks_rude()Paul E. McKenney1-1/+2
There are now architectures for which all deep-idle and entry-exit functions are properly inlined or marked noinstr. Such architectures do not need synchronize_rcu_tasks_rude(), or will not once RCU Tasks has been modified to pay attention to idle tasks. This commit therefore allows a CONFIG_ARCH_HAS_NOINSTR_MARKINGS Kconfig option to turn synchronize_rcu_tasks_rude() into a no-op. To facilitate testing, kernels built by rcutorture scripting will enable RCU Tasks Trace even on systems that do not need it. [ paulmck: Apply Peter Zijlstra feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu: Add rcuog kthreads to RCU_NOCB_CPU help textPaul E. McKenney1-10/+18
The RCU_NOCB_CPU help text currently fails to mention rcuog kthreads, so this commit adds this information. Reported-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu: Use the BITS_PER_LONG macroJinjie Ruan1-2/+1
sizeof(unsigned long) * 8 is the number of bits in an unsigned long variable, replace it with BITS_PER_LONG macro to make it simpler. Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com> Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcu: Use bitwise instead of arithmetic operator for flagsHongbo Li1-11/+11
This silences the following coccinelle warning: WARNING: sum of probable bitmasks, consider | Signed-off-by: Hongbo Li <lihongbo22@huawei.com> Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12refscale: Add srcu_read_lock_lite() support using "srcu-lite"Paul E. McKenney1-3/+34
This commit creates a new srcu-lite option for the refscale.scale_type module parameter that selects srcu_read_lock_lite() and srcu_read_unlock_lite(). [ paulmck: Apply Dan Carpenter feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcutorture: Add srcu_read_lock_lite() support to rcutorture.reader_flavorPaul E. McKenney1-0/+7
This commit causes bit 0x4 of rcutorture.reader_flavor to select the new srcu_read_lock_lite() and srcu_read_unlock_lite() functions. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcutorture: Add reader_flavor parameter for SRCU readersPaul E. McKenney1-8/+22
This commit adds an rcutorture.reader_flavor parameter whose bits correspond to reader flavors. For example, SRCU's readers are 0x1 for normal and 0x2 for NMI-safe. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12rcutorture: Expand RCUTORTURE_RDR_MASK_[12] to eight bitsPaul E. McKenney1-14/+14
This commit prepares for testing of multiple SRCU reader flavors by expanding RCUTORTURE_RDR_MASK_1 and RCUTORTURE_RDR_MASK_2 from a single bit to eight bits, allowing them to accommodate the return values from multiple calls to srcu_read_lock*(). This will in turn permit better testing coverage for these SRCU reader flavors, including testing of the diagnostics for inproper use of mixed reader flavors. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Allow inlining of __srcu_read_{,un}lock_lite()Paul E. McKenney1-41/+0
This commit moves __srcu_read_lock_lite() and __srcu_read_unlock_lite() into include/linux/srcu.h and marks them "static inline" so that they can be inlined into srcu_read_lock_lite() and srcu_read_unlock_lite(), respectively. They are not hand-inlined due to Tree SRCU and Tiny SRCU having different implementations. The earlier removal of smp_mb() combined with the inlining produce significant single-percentage performance wins. Link: https://lore.kernel.org/all/CAEf4BzYgiNmSb=ZKQ65tm6nJDi1UX2Gq26cdHSH1mPwXJYZj5g@mail.gmail.com/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Add srcu_read_lock_lite() and srcu_read_unlock_lite()Paul E. McKenney1-11/+71
This patch adds srcu_read_lock_lite() and srcu_read_unlock_lite(), which dispense with the read-side smp_mb() but also are restricted to code regions that RCU is watching. If a given srcu_struct structure uses srcu_read_lock_lite() and srcu_read_unlock_lite(), it is not permitted to use any other SRCU read-side marker, before, during, or after. Another price of light-weight readers is heavier weight grace periods. Such readers mean that SRCU grace periods on srcu_struct structures used by light-weight readers will incur at least two calls to synchronize_rcu(). In addition, normal SRCU grace periods for light-weight-reader srcu_struct structures never auto-expedite. Note that expedited SRCU grace periods for light-weight-reader srcu_struct structures still invoke synchronize_rcu(), not synchronize_srcu_expedited(). Something about wishing to keep the IPIs down to a dull roar. The srcu_read_lock_lite() and srcu_read_unlock_lite() functions may not (repeat, *not*) be used from NMI handlers, but if this is needed, an additional flavor of SRCU reader can be added by some future commit. [ paulmck: Apply Alexei Starovoitov expediting feedback. ] [ paulmck: Apply kernel test robot feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Tested-by: kernel test robot <oliver.sang@intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Create CPP macros for normal and NMI-safe SRCU readersPaul E. McKenney1-10/+11
This commit creates SRCU_READ_FLAVOR_NORMAL and SRCU_READ_FLAVOR_NMI C-preprocessor macros for srcu_read_lock() and srcu_read_lock_nmisafe(), respectively. These replace the old true/false values that were previously passed to srcu_check_read_flavor(). In addition, the srcu_check_read_flavor() function itself requires a bit of rework to handle bitmasks instead of true/false values. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Standardize srcu_data pointers to "sdp" and similarPaul E. McKenney1-10/+10
This commit changes a few "cpuc" variables to "sdp" to align with usage elsewhere. [ paulmck: Apply Neeraj Upadhyay feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Bit manipulation changes for additional reader flavorPaul E. McKenney1-3/+4
Currently, there are only two flavors of readers, normal and NMI-safe. Very straightforward state updates suffice to check for erroneous mixing of reader flavors on a given srcu_struct structure. This commit upgrades the checking in preparation for the addition of light-weight (as in memory-barrier-free) readers. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-12srcu: Renaming in preparation for additional reader flavorPaul E. McKenney1-11/+11
Currently, there are only two flavors of readers, normal and NMI-safe. A number of fields, functions, and types reflect this restriction. This renaming-only commit prepares for the addition of light-weight (as in memory-barrier-free) readers. OK, OK, there is also a drive-by white-space fixeup! Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-11-07softirq: Use a dedicated thread for timer wakeups on PREEMPT_RT.Sebastian Andrzej Siewior1-0/+8
The timer and hrtimer soft interrupts are raised in hard interrupt context. With threaded interrupts force enabled or on PREEMPT_RT this leads to waking the ksoftirqd for the processing of the soft interrupt. ksoftirqd runs as SCHED_OTHER task which means it will compete with other tasks for CPU resources. This can introduce long delays for timer processing on heavy loaded systems and is not desired. Split the TIMER_SOFTIRQ and HRTIMER_SOFTIRQ processing into a dedicated timers thread and let it run at the lowest SCHED_FIFO priority. Wake-ups for RT tasks happen from hardirq context so only timer_list timers and hrtimers for "regular" tasks are processed here. The higher priority ensures that wakeups are performed before scheduling SCHED_OTHER tasks. Using a dedicated variable to store the pending softirq bits values ensure that the timer are not accidentally picked up by ksoftirqd and other threaded interrupts. It shouldn't be picked up by ksoftirqd since it runs at lower priority. However if ksoftirqd is already running while a timer fires, then ksoftird will be PI-boosted due to the BH-lock to ktimer's priority. The timer thread can pick up pending softirqs from ksoftirqd but only if the softirq load is high. It is not be desired that the picked up softirqs are processed at SCHED_FIFO priority under high softirq load but this can already happen by a PI-boost by a force-threaded interrupt. [ frederic@kernel.org: rcutorture.c fixes, storm fix by introduction of local_timers_pending() for tick_nohz_next_event() ] [ junxiao.chang@intel.com: Ensure ktimersd gets woken up even if a softirq is currently served. ] Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> [rcutorture] Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Link: https://lore.kernel.org/all/20241106150419.2593080-4-bigeasy@linutronix.de
2024-11-03rcu: Finer-grained grace-period-end checks in rcu_dump_cpu_stacks()Paul E. McKenney1-7/+11
This commit pushes the grace-period-end checks further down into rcu_dump_cpu_stacks(), and also uses lockless checks coupled with finer-grained locking. The result is that the current leaf rcu_node structure's ->lock is acquired only if a stack backtrace might be needed from the current CPU, and is held across only that CPU's backtrace. As a result, if there are no stalled CPUs associated with a given rcu_node structure, then its ->lock will not be acquired at all. On large systems, it is usually (though not always) the case that a small number of CPUs are stalling the current grace period, which means that the ->lock need be acquired only for a small fraction of the rcu_node structures. [ paulmck: Apply Dan Carpenter feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-28srcu: Introduce srcu_gp_is_expedited() helper functionPaul E. McKenney1-2/+12
Even though the open-coded expressions usually fit on one line, this commit replaces them with a call to a new srcu_gp_is_expedited() helper function in order to improve readability. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-28srcu: Rename srcu_might_be_idle() to srcu_should_expedite()Paul E. McKenney1-7/+9
SRCU auto-expedites grace periods that follow a sufficiently long idle period, and the srcu_might_be_idle() function is used to make this decision. However, the upcoming light-weight SRCU readers will not do auto-expediting because doing so would cause the grace-period machinery to invoke synchronize_rcu_expedited() twice, with IPIs all around. However, software-engineering considerations force this determination to remain in srcu_might_be_idle(). This commit therefore changes the name of srcu_might_be_idle() to srcu_should_expedite(), thus moving from what it currently does to why it does it, this latter being more future-proof. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Kent Overstreet <kent.overstreet@linux.dev> Cc: <bpf@vger.kernel.org> Reviewed-by: Neeraj Upadhyay <Neeraj.Upadhyay@amd.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-23srcu: Replace WARN_ON_ONCE() with BUILD_BUG_ON() if possibleZhen Lei1-2/+2
The value of ARRAY_SIZE() can be determined at compile time, so if both sides of the equation are ARRAY_SIZE(), using BUILD_BUG_ON() can help us catch the problem earlier. While there are cases where unequal array sizes will work, there is no point in allowing them, so it makes more sense to force them to be equal using BUILD_BUG_ON(). Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-23rcu: Stop stall warning from dumping stacks if grace period endsPaul E. McKenney1-6/+11
Currently, once an RCU CPU stall warning decides to dump the stalling CPUs' stacks, the rcu_dump_cpu_stacks() function persists until it has gone through the full list. Unfortunately, if the stalled grace periods ends midway through, this function will be dumping stacks of innocent-bystander CPUs that happen to be blocking not the old grace period, but instead the new one. This can cause serious confusion. This commit therefore stops dumping stacks if and when the stalled grace period ends. [ paulmck: Apply Joel Fernandes feedback. ] Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-23rcu: Delete unused rcu_gp_might_be_stalled() functionPaul E. McKenney1-30/+0
The rcu_gp_might_be_stalled() function is no longer used, so this commit removes it. Signed-off-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Joel Fernandes (Google) <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-22rcu: Remove unused declaration rcu_segcblist_offload()Yue Haibing1-1/+0
Commit 17351eb59abd ("rcu/nocb: Simplify (de-)offloading state machine") removed the implementation but leave declaration. Signed-off-by: Yue Haibing <yuehaibing@huawei.com> Reviewed-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: "Paul E. McKenney" <paulmck@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2024-10-17Merge branch 'linus' into sched/urgent, to resolve conflictIngo Molnar2-5/+12
Conflicts: kernel/sched/ext.c There's a context conflict between this upstream commit: 3fdb9ebcec10 sched_ext: Start schedulers with consistent p->scx.slice values ... and this fix in sched/urgent: 98442f0ccd82 sched: Fix delayed_dequeue vs switched_from_fair() Resolve it. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-10-14sched/fair: Fix external p->on_rq usersPeter Zijlstra1-0/+9
Sean noted that ever since commit 152e11f6df29 ("sched/fair: Implement delayed dequeue") KVM's preemption notifiers have started mis-classifying preemption vs blocking. Notably p->on_rq is no longer sufficient to determine if a task is runnable or blocked -- the aforementioned commit introduces tasks that remain on the runqueue even through they will not run again, and should be considered blocked for many cases. Add the task_is_runnable() helper to classify things and audit all external users of the p->on_rq state. Also add a few comments. Fixes: 152e11f6df29 ("sched/fair: Implement delayed dequeue") Reported-by: Sean Christopherson <seanjc@google.com> Tested-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Ingo Molnar <mingo@kernel.org> Link: https://lkml.kernel.org/r/20241010091843.GK33184@noisy.programming.kicks-ass.net
2024-10-10rcu/nocb: Fix rcuog wake-up from offline softirqFrederic Weisbecker1-1/+7
After a CPU has set itself offline and before it eventually calls rcutree_report_cpu_dead(), there are still opportunities for callbacks to be enqueued, for example from a softirq. When that happens on NOCB, the rcuog wake-up is deferred through an IPI to an online CPU in order not to call into the scheduler and risk arming the RT-bandwidth after hrtimers have been migrated out and disabled. But performing a synchronized IPI from a softirq is buggy as reported in the following scenario: WARNING: CPU: 1 PID: 26 at kernel/smp.c:633 smp_call_function_single Modules linked in: rcutorture torture CPU: 1 UID: 0 PID: 26 Comm: migration/1 Not tainted 6.11.0-rc1-00012-g9139f93209d1 #1 Stopper: multi_cpu_stop+0x0/0x320 <- __stop_cpus+0xd0/0x120 RIP: 0010:smp_call_function_single <IRQ> swake_up_one_online __call_rcu_nocb_wake __call_rcu_common ? rcu_torture_one_read call_timer_fn __run_timers run_timer_softirq handle_softirqs irq_exit_rcu ? tick_handle_periodic sysvec_apic_timer_interrupt </IRQ> Fix this with forcing deferred rcuog wake up through the NOCB timer when the CPU is offline. The actual wake up will happen from rcutree_report_cpu_dead(). Reported-by: kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202409231644.4c55582d-lkp@intel.com Fixes: 9139f93209d1 ("rcu/nocb: Fix RT throttling hrtimer armed from offline CPU") Reviewed-by: "Joel Fernandes (Google)" <joel@joelfernandes.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Signed-off-by: Neeraj Upadhyay <neeraj.upadhyay@kernel.org>
2024-10-01rcu/kvfree: Refactor kvfree_rcu_queue_batch()Uladzislau Rezki (Sony)1-4/+5
Improve readability of kvfree_rcu_queue_bat