aboutsummaryrefslogtreecommitdiff
AgeCommit message (Collapse)AuthorFilesLines
2026-02-03doc: Add housekeeping documentationFrederic Weisbecker2-0/+112
Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com>
2026-02-03kthread: Document kthread_affine_preferred()Frederic Weisbecker1-0/+12
The documentation of this new API has been overlooked during its introduction. Fill the gap. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com>
2026-02-03kthread: Comment on the purpose and placement of kthread_affine_node() callFrederic Weisbecker1-0/+4
It may not appear obvious why kthread_affine_node() is not called before the kthread creation completion instead of after the first wake-up. The reason is that kthread_affine_node() applies a default affinity behaviour that only takes place if no affinity preference have already been passed by the kthread creation call site. Add a comment to clarify that. Reported-by: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03kthread: Honour kthreads preferred affinity after cpuset changesFrederic Weisbecker4-13/+37
When cpuset isolated partitions get updated, unbound kthreads get indifferently affine to all non isolated CPUs, regardless of their individual affinity preferences. For example kswapd is a per-node kthread that prefers to be affine to the node it refers to. Whenever an isolated partition is created, updated or deleted, kswapd's node affinity is going to be broken if any CPU in the related node is not isolated because kswapd will be affine globally. Fix this with letting the consolidated kthread managed affinity code do the affinity update on behalf of cpuset. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Michal Koutný <mkoutny@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03sched/arm64: Move fallback task cpumask to HK_TYPE_DOMAINFrederic Weisbecker2-7/+11
When none of the allowed CPUs of a task are online, it gets migrated to the fallback cpumask which is all the non nohz_full CPUs. However just like nohz_full CPUs, domain isolated CPUs don't want to be disturbed by tasks that have lost their CPU affinities. And since nohz_full rely on domain isolation to work correctly, the housekeeping mask of domain isolated CPUs should always be a subset of the housekeeping mask of nohz_full CPUs (there can be CPUs that are domain isolated but not nohz_full, OTOH there shouldn't be nohz_full CPUs that are not domain isolated): HK_TYPE_DOMAIN & HK_TYPE_KERNEL_NOISE == HK_TYPE_DOMAIN Therefore use HK_TYPE_DOMAIN as the appropriate fallback target for tasks. Note that cpuset isolated partitions are not supported on those systems and may result in undefined behaviour. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Will Deacon <will@kernel.org> Tested-by: Will Deacon <will@kernel.org> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: linux-arm-kernel@lists.infradead.org
2026-02-03sched: Switch the fallback task allowed cpumask to HK_TYPE_DOMAINFrederic Weisbecker1-1/+1
Tasks that have all their allowed CPUs offline don't want their affinity to fallback on either nohz_full CPUs or on domain isolated CPUs. And since nohz_full implies domain isolation, checking the latter is enough to verify both. Therefore exclude domain isolation from fallback task affinity. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Will Deacon <will@kernel.org> Cc: linux-arm-kernel@lists.infradead.org
2026-02-03kthread: Rely on HK_TYPE_DOMAIN for preferred affinity managementFrederic Weisbecker1-3/+5
Unbound kthreads want to run neither on nohz_full CPUs nor on domain isolated CPUs. And since nohz_full implies domain isolation, checking the latter is enough to verify both. Therefore exclude kthreads from domain isolation. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03kthread: Include kthreadd to the managed affinity listFrederic Weisbecker1-1/+2
The unbound kthreads affinity management performed by cpuset is going to be imported to the kthread core code for consolidation purposes. Treat kthreadd just like any other kthread. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03kthread: Include unbound kthreads in the managed affinity listFrederic Weisbecker1-28/+40
The managed affinity list currently contains only unbound kthreads that have affinity preferences. Unbound kthreads globally affine by default are outside of the list because their affinity is automatically managed by the scheduler (through the fallback housekeeping mask) and by cpuset. However in order to preserve the preferred affinity of kthreads, cpuset will delegate the isolated partition update propagation to the housekeeping and kthread code. Prepare for that with including all unbound kthreads in the managed affinity list. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03kthread: Refine naming of affinity related fieldsFrederic Weisbecker1-19/+19
The kthreads preferred affinity related fields use "hotplug" as the base of their naming because the affinity management was initially deemed to deal with CPU hotplug. The scope of this role is going to broaden now and also deal with cpuset isolated partition updates. Switch the naming accordingly. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03PCI: Remove superfluous HK_TYPE_WQ checkFrederic Weisbecker1-14/+3
It doesn't make sense to use nohz_full without also isolating the related CPUs from the domain topology, either through the use of isolcpus= or cpuset isolated partitions. And now HK_TYPE_DOMAIN includes all kinds of domain isolated CPUs. This means that HK_TYPE_DOMAIN should always be a subset of HK_TYPE_KERNEL_NOISE (of which HK_TYPE_WQ is only an alias). Therefore sane configurations verify: HK_TYPE_KERNEL_NOISE & HK_TYPE_DOMAIN == HK_TYPE_DOMAIN Simplify the PCI probe target election accordingly. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-pci@vger.kernel.org
2026-02-03sched/isolation: Remove HK_TYPE_TICK test from cpu_is_isolated()Frederic Weisbecker1-2/+1
It doesn't make sense to use nohz_full without also isolating the related CPUs from the domain topology, either through the use of isolcpus= or cpuset isolated partitions. And now HK_TYPE_DOMAIN includes all kinds of domain isolated CPUs. This means that HK_TYPE_DOMAIN should always be a subset of HK_TYPE_KERNEL_NOISE (of which HK_TYPE_TICK is only an alias). Therefore if a CPU is not HK_TYPE_DOMAIN, it shouldn't be HK_TYPE_KERNEL_NOISE either. Testing the former is then enough. Simplify cpu_is_isolated() accordingly. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Waiman Long <longman@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03cpuset: Remove cpuset_cpu_is_isolated()Frederic Weisbecker3-21/+1
The set of cpuset isolated CPUs is now included in HK_TYPE_DOMAIN housekeeping cpumask. There is no usecase left interested in just checking what is isolated by cpuset and not by the isolcpus= kernel boot parameter. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03timers/migration: Remove superfluous cpuset isolation testFrederic Weisbecker1-3/+2
Cpuset isolated partitions are now included in HK_TYPE_DOMAIN. Testing if a CPU is part of an isolated partition alone is now useless. Remove the superflous test. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com>
2026-02-03cpuset: Propagate cpuset isolation update to timers through housekeepingFrederic Weisbecker2-3/+4
Until now, cpuset would propagate isolated partition changes to timer migration so that unbound timers don't get migrated to isolated CPUs. Since housekeeping now centralizes, synchronize and propagates isolation cpumask changes, perform the work from that subsystem for consolidation and consistency purposes. Signed-off-by: Frederic Weisbecker <frederic@kernel.org>
2026-02-03cpuset: Propagate cpuset isolation update to workqueue through housekeepingFrederic Weisbecker5-12/+16
Until now, cpuset would propagate isolated partition changes to workqueues so that unbound workers get properly reaffined. Since housekeeping now centralizes, synchronize and propagates isolation cpumask changes, perform the work from that subsystem for consolidation and consistency purposes. For simplification purpose, the target function is adapted to take the new housekeeping mask instead of the isolated mask. Suggested-by: Tejun Heo <tj@kernel.org> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Acked-by: Tejun Heo <tj@kernel.org> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Lai Jiangshan <jiangshanlai@gmail.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03PCI: Flush PCI probe workqueue on cpuset isolated partition changeFrederic Weisbecker3-1/+21
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In order to synchronize against PCI probe works and make sure that no asynchronous probing is still pending or executing on a newly isolated CPU, the housekeeping subsystem must flush the PCI probe works. However the PCI probe works can't be flushed easily since they are queued to the main per-CPU workqueue pool. Solve this with creating a PCI probe-specific pool and provide and use the appropriate flushing API. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-pci@vger.kernel.org
2026-02-03sched/isolation: Flush vmstat workqueues on cpuset isolated partition changeFrederic Weisbecker4-0/+9
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In order to synchronize against vmstat workqueue to make sure that no asynchronous vmstat work is still pending or executing on a newly made isolated CPU, the housekeeping susbsystem must flush the vmstat workqueues. This involves flushing the whole mm_percpu_wq workqueue, shared with LRU drain, introducing here a welcome side effect. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-mm@kvack.org
2026-02-03sched/isolation: Flush memcg workqueues on cpuset isolated partition changeFrederic Weisbecker4-1/+18
The HK_TYPE_DOMAIN housekeeping cpumask is now modifiable at runtime. In order to synchronize against memcg workqueue to make sure that no asynchronous draining is still pending or executing on a newly made isolated CPU, the housekeeping susbsystem must flush the memcg workqueues. However the memcg workqueues can't be flushed easily since they are queued to the main per-CPU workqueue pool. Solve this with creating a memcg specific pool and provide and use the appropriate flushing API. Acked-by: Shakeel Butt <shakeel.butt@linux.dev> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org Cc: linux-mm@kvack.org
2026-02-03cpuset: Update HK_TYPE_DOMAIN cpumask from cpusetFrederic Weisbecker4-8/+80
Until now, HK_TYPE_DOMAIN used to only include boot defined isolated CPUs passed through isolcpus= boot option. Users interested in also knowing the runtime defined isolated CPUs through cpuset must use different APIs: cpuset_cpu_is_isolated(), cpu_is_isolated(), etc... There are many drawbacks to that approach: 1) Most interested subsystems want to know about all isolated CPUs, not just those defined on boot time. 2) cpuset_cpu_is_isolated() / cpu_is_isolated() are not synchronized with concurrent cpuset changes. 3) Further cpuset modifications are not propagated to subsystems Solve 1) and 2) and centralize all isolated CPUs within the HK_TYPE_DOMAIN housekeeping cpumask. Subsystems can rely on RCU to synchronize against concurrent changes. The propagation mentioned in 3) will be handled in further patches. [Chen Ridong: Fix cpu_hotplug_lock deadlock and use correct static branch API] Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Signed-off-by: Chen Ridong <chenridong@huawei.com> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03sched/isolation: Convert housekeeping cpumasks to rcu pointersFrederic Weisbecker2-23/+38
HK_TYPE_DOMAIN's cpumask will soon be made modifiable by cpuset. A synchronization mechanism is then needed to synchronize the updates with the housekeeping cpumask readers. Turn the housekeeping cpumasks into RCU pointers. Once a housekeeping cpumask will be modified, the update side will wait for an RCU grace period and propagate the change to interested subsystem when deemed necessary. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03cpuset: Provide lockdep check for cpuset lock heldFrederic Weisbecker2-0/+9
cpuset modifies partitions, including isolated, while holding the cpuset mutex. This means that holding the cpuset mutex is safe to synchronize against housekeeping cpumask changes. Provide a lockdep check to validate that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: "Michal Koutný" <mkoutny@suse.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org Cc: linux-kernel@vger.kernel.org
2026-02-03cpu: Provide lockdep check for CPU hotplug lock write-heldFrederic Weisbecker3-0/+7
cpuset modifies partitions, including isolated, while holding the cpu hotplug lock read-held. This means that write-holding the CPU hotplug lock is safe to synchronize against housekeeping cpumask changes. Provide a lockdep check to validate that. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Waiman Long <longman@redhat.com> Cc: linux-kernel@vger.kernel.org
2026-02-03timers/migration: Prevent from lockdep false positive warningFrederic Weisbecker1-5/+15
Testing housekeeping_cpu() will soon require that either the RCU "lock" is held or the cpuset mutex. When CPUs get isolated through cpuset, the change is propagated to timer migration such that isolation is also performed from the migration tree. However that propagation is done using workqueue which tests if the target is actually isolated before proceeding. Lockdep doesn't know that the workqueue caller holds cpuset mutex and that it waits for the work, making the housekeeping cpumask read safe. Shut down the future warning by removing this test. It is unecessary beyond hotplug, the workqueue is already targeted towards isolated CPUs. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Gabriele Monaco <gmonaco@redhat.com>
2026-02-03block: Protect against concurrent isolated cpuset changeFrederic Weisbecker1-1/+5
The block subsystem prevents running the workqueue to isolated CPUs, including those defined by cpuset isolated partitions. Since HK_TYPE_DOMAIN will soon contain both and be subject to runtime modifications, synchronize against housekeeping using the relevant lock. For full support of cpuset changes, the block subsystem may need to propagate changes to isolated cpumask through the workqueue in the future. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Cc: Jens Axboe <axboe@kernel.dk> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-block@vger.kernel.org
2026-02-03net: Keep ignoring isolated cpuset changeFrederic Weisbecker1-1/+1
RPS cpumask can be overriden through sysfs/syctl. The boot defined isolated CPUs are then excluded from that cpumask. However HK_TYPE_DOMAIN will soon integrate cpuset isolated CPUs updates and the RPS infrastructure needs more thoughts to be able to propagate such changes and synchronize against them. Keep handling only what was passed through "isolcpus=" for now. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: David S. Miller <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Simon Horman <horms@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: netdev@vger.kernel.org
2026-02-03driver core: cpu: Convert /sys/devices/system/cpu/isolated to use ↵Frederic Weisbecker1-1/+1
HK_TYPE_DOMAIN_BOOT Make sure /sys/devices/system/cpu/isolated only prints what was passed through the isolcpus= parameter before HK_TYPE_DOMAIN will also integrate cpuset isolated partitions. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Danilo Krummrich <dakr@kernel.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Rafael J. Wysocki <rafael@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03cpuset: Convert boot_hk_cpus to use HK_TYPE_DOMAIN_BOOTFrederic Weisbecker1-15/+7
boot_hk_cpus is an ad-hoc copy of HK_TYPE_DOMAIN_BOOT. Remove it and use the official version. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Chen Ridong <chenridong@huawei.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Michal Koutny <mkoutny@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: cgroups@vger.kernel.org
2026-02-03sched/isolation: Save boot defined domain flagsFrederic Weisbecker2-2/+7
HK_TYPE_DOMAIN will soon integrate not only boot defined isolcpus= CPUs but also cpuset isolated partitions. Housekeeping still needs a way to record what was initially passed to isolcpus= in order to keep these CPUs isolated after a cpuset isolated partition is modified or destroyed while containing some of them. Create a new HK_TYPE_DOMAIN_BOOT to keep track of those. Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Reviewed-by: Phil Auld <pauld@redhat.com> Reviewed-by: Waiman Long <longman@redhat.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03ALSA: aloop: Fix racy access at PCM triggerTakashi Iwai1-26/+36
The PCM trigger callback of aloop driver tries to check the PCM state and stop the stream of the tied substream in the corresponding cable. Since both check and stop operations are performed outside the cable lock, this may result in UAF when a program attempts to trigger frequently while opening/closing the tied stream, as spotted by fuzzers. For addressing the UAF, this patch changes two things: - It covers the most of code in loopback_check_format() with cable->lock spinlock, and add the proper NULL checks. This avoids already some racy accesses. - In addition, now we try to check the state of the capture PCM stream that may be stopped in this function, which was the major pain point leading to UAF. Reported-by: syzbot+5f8f3acdee1ec7a7ef7b@syzkaller.appspotmail.com Closes: https://lore.kernel.org/69783ba1.050a0220.c9109.0011.GAE@google.com Cc: <stable@vger.kernel.org> Link: https://patch.msgid.link/20260203141003.116584-1-tiwai@suse.de Signed-off-by: Takashi Iwai <tiwai@suse.de>
2026-02-03mm: vmstat: Prepare to protect against concurrent isolated cpuset changeFrederic Weisbecker1-4/+6
The HK_TYPE_DOMAIN housekeeping cpumask will soon be made modifiable at runtime. In order to synchronize against vmstat workqueue to make sure that no asynchronous vmstat work is pending or executing on a newly made isolated CPU, target and queue a vmstat work under the same RCU read side critical section. Whenever housekeeping will update the HK_TYPE_DOMAIN cpumask, a vmstat workqueue flush will also be issued in a further change to make sure that no work remains pending after a CPU has been made isolated. Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com> Cc: linux-mm@kvack.org
2026-02-03memcg: Prepare to protect against concurrent isolated cpuset changeFrederic Weisbecker1-4/+17
The HK_TYPE_DOMAIN housekeeping cpumask will soon be made modifiable at runtime. In order to synchronize against memcg workqueue to make sure that no asynchronous draining is pending or executing on a newly made isolated CPU, target and queue a drain work under the same RCU critical section. Whenever housekeeping will update the HK_TYPE_DOMAIN cpumask, a memcg workqueue flush will also be issued in a further change to make sure that no work remains pending after a CPU has been made isolated. Acked-by: Michal Hocko <mhocko@suse.com> Signed-off-by: Frederic Weisbecker <frederic@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Marco Crivellari <marco.crivellari@suse.com> Cc: Muchun Song <muchun.song@linux.dev> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Shakeel Butt <shakeel.butt@linux.dev> Cc: Tejun Heo <tj@kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Waiman Long <longman@redhat.com>
2026-02-03fs: add porting notes about readlink_copy()Mateusz Guzik1-0/+10
Calling convention has changed in ea382199071931d1 ("vfs: support caching symlink lengths in inodes") Signed-off-by: Mateusz Guzik <mjguzik@gmail.com> Link: https://patch.msgid.link/20260203130032.315177-1-mjguzik@gmail.com Signed-off-by: Christian Brauner <brauner@kernel.org>
2026-02-03block: don't use strcpy to copy blockdev nameJohannes Thumshirn1-1/+1
0-day bot flagged the use of strcpy() in blk_trace_setup(), because the source buffer can theoretically be bigger than the destination buffer. While none of the current callers pass a string bigger than BLKTRACE_BDEV_SIZE, use strscpy() to prevent eventual future misuse and silence the checker warnings. Reported-by: kernel test robot <lkp@intel.com> Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/r/202602020718.GUEIRyG9-lkp@intel.com/ Fixes: 113cbd62824a ("blktrace: pass blk_user_trace2 to setup functions") Signed-off-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>
2026-02-03Merge branch 'accecn-protocol-case-handling-series'Paolo Abeni76-102/+1680
Chia-Yu Chang says: ==================== AccECN protocol case handling series Plesae find the v13 AccECN case handling patch series, which covers several excpetional case handling of Accurate ECN spec (RFC9768), adds new identifiers to be used by CC modules, adds ecn_delta into rate_sample, and keeps the ACE counter for computation, etc. This patch series is part of the full AccECN patch series, which is at https://github.com/L4STeam/linux-net-next/commits/upstream_l4steam/ --- Chia-Yu Chang (13): selftests/net: gro: add self-test for TCP CWR flag tcp: ECT_1_NEGOTIATION and NEEDS_ACCECN identifiers tcp: disable RFC3168 fallback identifier for CC modules tcp: accecn: handle unexpected AccECN negotiation feedback tcp: accecn: retransmit downgraded SYN in AccECN negotiation tcp: add TCP_SYNACK_RETRANS synack_type tcp: accecn: retransmit SYN/ACK without AccECN option or non-AccECN SYN/ACK tcp: accecn: unset ECT if receive or send ACE=0 in AccECN negotiaion tcp: accecn: fallback outgoing half link to non-AccECN tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSIST tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_info tcp: accecn: enable AccECN selftests/net: packetdrill: add TCP Accurate ECN cases Ilpo Järvinen (2): tcp: try to avoid safer when ACKs are thinned gro: flushing when CWR is set negatively affects AccECN Documentation/networking/ip-sysctl.rst | 4 +- .../networking/net_cachelines/tcp_sock.rst | 1 + include/linux/tcp.h | 4 +- include/net/inet_ecn.h | 20 +++- include/net/tcp.h | 32 +++++- include/net/tcp_ecn.h | 103 ++++++++++++------ include/uapi/linux/tcp.h | 26 ++++- net/ipv4/inet_connection_sock.c | 3 + net/ipv4/sysctl_net_ipv4.c | 4 +- net/ipv4/tcp.c | 10 ++ net/ipv4/tcp_cong.c | 5 +- net/ipv4/tcp_input.c | 40 ++++++- net/ipv4/tcp_minisocks.c | 43 +++++--- net/ipv4/tcp_offload.c | 3 +- net/ipv4/tcp_output.c | 34 ++++-- net/ipv4/tcp_timer.c | 3 + tools/testing/selftests/drivers/net/gro.c | 81 ++++++++++---- tools/testing/selftests/drivers/net/gro.py | 3 +- .../tcp_accecn_2nd_data_as_first.pkt | 24 ++++ .../tcp_accecn_2nd_data_as_first_connect.pkt | 30 +++++ .../tcp_accecn_3rd_ack_after_synack_rxmt.pkt | 19 ++++ ..._accecn_3rd_ack_ce_updates_received_ce.pkt | 18 +++ .../tcp_accecn_3rd_ack_lost_data_ce.pkt | 22 ++++ .../net/packetdrill/tcp_accecn_3rd_dups.pkt | 26 +++++ .../tcp_accecn_acc_ecn_disabled.pkt | 13 +++ .../tcp_accecn_accecn_then_notecn_syn.pkt | 28 +++++ .../tcp_accecn_accecn_to_rfc3168.pkt | 18 +++ .../tcp_accecn_client_accecn_options_drop.pkt | 34 ++++++ .../tcp_accecn_client_accecn_options_lost.pkt | 38 +++++++ .../tcp_accecn_clientside_disabled.pkt | 12 ++ ...cecn_close_local_close_then_remote_fin.pkt | 25 +++++ .../tcp_accecn_delivered_2ndlargeack.pkt | 25 +++++ ..._accecn_delivered_falseoverflow_detect.pkt | 31 ++++++ .../tcp_accecn_delivered_largeack.pkt | 24 ++++ .../tcp_accecn_delivered_largeack2.pkt | 25 +++++ .../tcp_accecn_delivered_maxack.pkt | 25 +++++ .../tcp_accecn_delivered_updates.pkt | 70 ++++++++++++ .../net/packetdrill/tcp_accecn_ecn3.pkt | 12 ++ .../tcp_accecn_ecn_field_updates_opt.pkt | 35 ++++++ .../packetdrill/tcp_accecn_ipflags_drop.pkt | 14 +++ .../tcp_accecn_listen_opt_drop.pkt | 16 +++ .../tcp_accecn_multiple_syn_ack_drop.pkt | 28 +++++ .../tcp_accecn_multiple_syn_drop.pkt | 18 +++ .../tcp_accecn_negotiation_bleach.pkt | 23 ++++ .../tcp_accecn_negotiation_connect.pkt | 23 ++++ .../tcp_accecn_negotiation_listen.pkt | 26 +++++ .../tcp_accecn_negotiation_noopt_connect.pkt | 23 ++++ .../tcp_accecn_negotiation_optenable.pkt | 23 ++++ .../tcp_accecn_no_ecn_after_accecn.pkt | 20 ++++ .../net/packetdrill/tcp_accecn_noopt.pkt | 27 +++++ .../net/packetdrill/tcp_accecn_noprogress.pkt | 27 +++++ .../tcp_accecn_notecn_then_accecn_syn.pkt | 28 +++++ .../tcp_accecn_rfc3168_to_fallback.pkt | 18 +++ .../tcp_accecn_rfc3168_to_rfc3168.pkt | 18 +++ .../tcp_accecn_sack_space_grab.pkt | 28 +++++ .../tcp_accecn_sack_space_grab_with_ts.pkt | 39 +++++++ ...tcp_accecn_serverside_accecn_disabled1.pkt | 20 ++++ ...tcp_accecn_serverside_accecn_disabled2.pkt | 20 ++++ .../tcp_accecn_serverside_broken.pkt | 19 ++++ .../tcp_accecn_serverside_ecn_disabled.pkt | 19 ++++ .../tcp_accecn_serverside_only.pkt | 18 +++ ...n_syn_ace_flags_acked_after_retransmit.pkt | 18 +++ .../tcp_accecn_syn_ace_flags_drop.pkt | 16 +++ ...n_ack_ace_flags_acked_after_retransmit.pkt | 27 +++++ .../tcp_accecn_syn_ack_ace_flags_drop.pkt | 26 +++++ .../net/packetdrill/tcp_accecn_syn_ce.pkt | 13 +++ .../net/packetdrill/tcp_accecn_syn_ect0.pkt | 13 +++ .../net/packetdrill/tcp_accecn_syn_ect1.pkt | 13 +++ .../net/packetdrill/tcp_accecn_synack_ce.pkt | 27 +++++ ..._accecn_synack_ce_updates_delivered_ce.pkt | 22 ++++ .../packetdrill/tcp_accecn_synack_ect0.pkt | 24 ++++ .../packetdrill/tcp_accecn_synack_ect1.pkt | 24 ++++ .../packetdrill/tcp_accecn_synack_rexmit.pkt | 15 +++ .../packetdrill/tcp_accecn_synack_rxmt.pkt | 25 +++++ .../packetdrill/tcp_accecn_tsnoprogress.pkt | 26 +++++ .../net/packetdrill/tcp_accecn_tsprogress.pkt | 25 +++++ 76 files changed, 1680 insertions(+), 102 deletions(-) create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_2nd_data_as_first.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_2nd_data_as_first_connect.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_after_synack_rxmt.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_ce_updates_received_ce.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_ack_lost_data_ce.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_3rd_dups.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_acc_ecn_disabled.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_accecn_then_notecn_syn.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_accecn_to_rfc3168.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_client_accecn_options_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_client_accecn_options_lost.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_clientside_disabled.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_close_local_close_then_remote_fin.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_2ndlargeack.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_falseoverflow_detect.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_largeack.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_largeack2.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_maxack.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_delivered_updates.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ecn3.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ecn_field_updates_opt.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_ipflags_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_listen_opt_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_multiple_syn_ack_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_multiple_syn_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_bleach.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_connect.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_listen.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_noopt_connect.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_negotiation_optenable.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_no_ecn_after_accecn.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_noopt.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_noprogress.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_notecn_then_accecn_syn.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_rfc3168_to_fallback.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_rfc3168_to_rfc3168.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_sack_space_grab.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_sack_space_grab_with_ts.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_accecn_disabled1.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_accecn_disabled2.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_broken.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_ecn_disabled.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_serverside_only.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ace_flags_acked_after_retransmit.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ace_flags_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ack_ace_flags_acked_after_retransmit.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ack_ace_flags_drop.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ce.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ect0.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_syn_ect1.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ce.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ce_updates_delivered_ce.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ect0.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_ect1.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_rexmit.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_synack_rxmt.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_tsnoprogress.pkt create mode 100644 tools/testing/selftests/net/packetdrill/tcp_accecn_tsprogress.pkt ==================== Link: https://patch.msgid.link/20260131222515.8485-1-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03selftests/net: packetdrill: add TCP Accurate ECN casesChia-Yu Chang58-0/+1363
Linux Accurate ECN test sets using ACE counters and AccECN options to cover several scenarios: Connection teardown, different ACK conditions, counter wrapping, SACK space grabbing, fallback schemes, negotiation retransmission/reorder/loss, AccECN option drop/loss, different handshake reflectors, data with marking, and different sysctl values. The packetdrill used is commit cbe405666c9c8698ac1e72f5e8ffc551216dfa56 of repo: https://github.com/minuscat/packetdrill/tree/upstream_accecn. And corresponding patches are sent to google/packetdrill email list. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Co-developed-by: Ilpo Järvinen <ij@kernel.org> Signed-off-by: Ilpo Järvinen <ij@kernel.org> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-16-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: enable AccECNChia-Yu Chang1-1/+1
Enable Accurate ECN negotiation and request for incoming and outgoing connection by setting sysctl_tcp_ecn: +==============+===========================================+ | | Highest ECN variant (Accurate ECN, ECN, | | tcp_ecn | or no ECN) to be negotiated & requested | | +---------------------+---------------------+ | | Incoming connection | Outgoing connection | +==============+=====================+=====================+ | 0 | No ECN | No ECN | | 1 | ECN | ECN | | 2 | ECN | No ECN | +--------------+---------------------+---------------------+ | 3 | Accurate ECN | Accurate ECN | | 4 | Accurate ECN | ECN | | 5 | Accurate ECN | No ECN | +==============+=====================+=====================+ Refer Documentation/networking/ip-sysctl.rst for more details. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: Paolo Abeni <pabeni@redhat.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-15-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: add tcpi_ecn_mode and tcpi_option2 in tcp_infoChia-Yu Chang3-14/+31
Add 2-bit tcpi_ecn_mode feild within tcp_info to indicate which ECN mode is negotiated: ECN_MODE_DISABLED, ECN_MODE_RFC3168, ECN_MODE_ACCECN, or ECN_MODE_PENDING. This is done by utilizing available bits from tcpi_accecn_opt_seen (reduced from 16 bits to 2 bits) and tcpi_accecn_fail_mode (reduced from 16 bits to 4 bits). Also, an extra 24-bit tcpi_options2 field is identified to represent newer options and connection features, as all 8 bits of tcpi_options field have been used. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Co-developed-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: Neal Cardwell <ncardwell@google.com> Reviewed-by: Eric Dumazet <edumazet@google.com> Link: https://patch.msgid.link/20260131222515.8485-14-chia-yu.chang@nokia-bell-labs.com Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2026-02-03tcp: accecn: detect loss ACK w/ AccECN option and add TCP_ACCECN_OPTION_PERSISTChia-Yu Chang6-5/+26
Detect spurious retransmission of a previously sent ACK carrying the AccECN option after the second retransmission. Since this might be caused by the middlebox dropping ACK with options it does not recognize, disable the sending of the AccECN option in all subsequent ACKs. This patch follows Section 3.2.3.2.2 of AccECN spec (RFC9768), and a new field (accecn_opt_sent_w_dsack) is added to indicate that an AccECN option was sent with duplicate SACK info. Also, a new AccECN option sending mode is added to tcp_ecn_option sysctl: (TCP_ECN_OPTION_PERSIST), which ignores the AccECN fallback policy and persistently sends AccECN option once it fits into TCP option space. Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com> Acked-by: P