path: root/fs/dlm/lock.c
Age | Commit message | Author | Files | Lines
31 hours | Merge tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip | Linus Torvalds | 1 | -1/+1
Pull locking updates from Ingo Molnar:

 "Lock debugging:

   - Implement compiler-driven static analysis locking context checking,
     using the upcoming Clang 22 compiler's context analysis features
     (Marco Elver)

     We removed Sparse context analysis support, because prior to removal
     even a defconfig kernel produced 1,700+ context tracking Sparse
     warnings, the overwhelming majority of which are false positives.
     On an allmodconfig kernel the number of false positive context
     tracking Sparse warnings grows to over 5,200...

     On the plus side of the balance actual locking bugs found by Sparse
     context analysis is also rather ... sparse: I found only 3 such
     commits in the last 3 years.

     So the rate of false positives and the maintenance overhead is rather
     high and there appears to be no active policy in place to achieve a
     zero-warnings baseline to move the annotations & fixers to developers
     who introduce new code.

     Clang context analysis is more complete and more aggressive in trying
     to find bugs, at least in principle. Plus it has a different model to
     enabling it: it's enabled subsystem by subsystem, which results in
     zero warnings on all relevant kernel builds (as far as our testing
     managed to cover it). Which allowed us to enable it by default,
     similar to other compiler warnings, with the expectation that there
     are no warnings going forward. This enforces a zero-warnings baseline
     on clang-22+ builds (Which are still limited in distribution,
     admittedly)

     Hopefully the Clang approach can lead to a more maintainable
     zero-warnings status quo and policy, with more and more subsystems
     and drivers enabling the feature.

     Context tracking can be enabled for all kernel code via
     WARN_CONTEXT_ANALYSIS_ALL=y (default disabled), but this will
     generate a lot of false positives.

     (Having said that, Sparse support could still be added back, if
     anyone is interested - the removal patch is still relatively
     straightforward to revert at this stage.)

 Rust integration updates: (Alice Ryhl, Fujita Tomonori, Boqun Feng)

   - Add support for Atomic<i8/i16/bool> and replace most Rust native
     AtomicBool usages with Atomic<bool>
   - Clean up LockClassKey and improve its documentation
   - Add missing Send and Sync trait implementation for SetOnce
   - Make ARef Unpin as it is supposed to be
   - Add __rust_helper to a few Rust helpers as a preparation for helper LTO
   - Inline various lock related functions to avoid additional function calls

 WW mutexes:

   - Extend ww_mutex tests and other test-ww_mutex updates (John Stultz)

 Misc fixes and cleanups:

   - rcu: Mark lockdep_assert_rcu_helper() __always_inline (Arnd Bergmann)
   - locking/local_lock: Include more missing headers (Peter Zijlstra)
   - seqlock: fix scoped_seqlock_read kernel-doc (Randy Dunlap)
   - rust: sync: Replace `kernel::c_str!` with C-Strings (Tamir Duberstein)"

* tag 'locking-core-2026-02-08' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (90 commits)
  locking/rwlock: Fix write_trylock_irqsave() with CONFIG_INLINE_WRITE_TRYLOCK
  rcu: Mark lockdep_assert_rcu_helper() __always_inline
  compiler-context-analysis: Remove __assume_ctx_lock from initializers
  tomoyo: Use scoped init guard
  crypto: Use scoped init guard
  kcov: Use scoped init guard
  compiler-context-analysis: Introduce scoped init guards
  cleanup: Make __DEFINE_LOCK_GUARD handle commas in initializers
  seqlock: fix scoped_seqlock_read kernel-doc
  tools: Update context analysis macros in compiler_types.h
  rust: sync: Replace `kernel::c_str!` with C-Strings
  rust: sync: Inline various lock related methods
  rust: helpers: Move #define __rust_helper out of atomic.c
  rust: wait: Add __rust_helper to helpers
  rust: time: Add __rust_helper to helpers
  rust: task: Add __rust_helper to helpers
  rust: sync: Add __rust_helper to helpers
  rust: refcount: Add __rust_helper to helpers
  rust: rcu: Add __rust_helper to helpers
  rust: processor: Add __rust_helper to helpers
  ...
2026-01-20 | dlm: validate length in dlm_search_rsb_tree | Ezrak1e | 1 | -1/+2
The len parameter in dlm_dump_rsb_name() is not validated and comes from network messages. When it exceeds DLM_RESNAME_MAXLEN, it can cause an out-of-bounds write in dlm_search_rsb_tree(). Add length validation to prevent a potential buffer overflow. Signed-off-by: Ezrak1e <ezrakiez@gmail.com> Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
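A minimal sketch of the kind of bound check described here; the helper name and the surrounding structure are illustrative, not the literal patch:

#include <linux/errno.h>
#include <linux/string.h>

#define DLM_RESNAME_MAXLEN 64   /* on-wire maximum for a resource name */

/*
 * Illustrative only: reject a name length taken from a network message
 * before it is used to fill a fixed-size lookup key, so an oversized
 * value cannot write past the key buffer.
 */
static int demo_build_search_key(char *key, const char *name, int len)
{
        if (len <= 0 || len > DLM_RESNAME_MAXLEN)
                return -EINVAL;

        memset(key, 0, DLM_RESNAME_MAXLEN);
        memcpy(key, name, len);
        return 0;
}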
2026-01-20 | dlm: fix recovery pending middle conversion | Alexander Aring | 1 | -18/+1
During a workload involving conversions between lock modes PR and CW, lock recovery can create a "conversion deadlock" state between locks that have been recovered. When this occurs, kernel warning messages are logged, e.g. "dlm: WARN: pending deadlock 1e node 0 2 1bf21" "dlm: receive_rcom_lock_args 2e middle convert gr 3 rq 2 remote 2 1e" After this occurs, the deadlocked conversions both appear on the convert queue of the resource being locked, and the conversion requests do not complete. Outside of recovery, conversions that would produce a deadlock are resolved immediately, and return -EDEADLK. The locks are not placed on the convert queue in the deadlocked state. To fix this problem, an lkb under conversion between PR/CW is rebuilt during recovery on a new master's granted queue, with the currently granted mode, rather than being rebuilt on the new master's convert queue, with the currently granted mode and the newly requested mode. The in-progress convert is then resent to the new master after recovery, so the conversion deadlock will be processed outside of the recovery context and handled as described above. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2026-01-05 | compiler-context-analysis: Change __cond_acquires to take return value | Marco Elver | 1 | -1/+1
While Sparse is oblivious to the return value of conditional acquire functions, Clang's context analysis needs to know the return value which indicates successful acquisition. Add the additional argument, and convert existing uses. Notably, Clang's interpretation of the value merely relates to the use in a later conditional branch, i.e. 1 ==> context lock acquired in branch taken if condition non-zero, and 0 ==> context lock acquired in branch taken if condition is zero. Given the precise value does not matter, introduce symbolic variants to use instead of either 0 or 1, which should be more intuitive. No functional change intended. Signed-off-by: Marco Elver <elver@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://patch.msgid.link/20251219154418.3592607-10-elver@google.com
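A rough illustration of why the analysis needs the extra argument; the annotation spelling mentioned in the comment is a placeholder, and the exact macro names used in the tree may differ:

#include <linux/spinlock.h>

/*
 * Conceptual sketch: a trylock-style helper returns non-zero when the
 * lock was acquired.  Sparse did not care about this value, but Clang's
 * context analysis must be told which return value means "acquired" so
 * it knows the lock is held only in the corresponding branch.  The real
 * annotation would be something like __cond_acquires(<success>, lock);
 * the symbolic success value is hypothetical here.
 */
static bool demo_trylock(spinlock_t *lock)
{
        return spin_trylock(lock);      /* non-zero ==> lock held */
}

static void demo_user(spinlock_t *lock, int *counter)
{
        if (demo_trylock(lock)) {       /* branch taken ==> lock held */
                (*counter)++;
                spin_unlock(lock);
        }
}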
2025-08-14 | dlm: move to rinfo for all middle conversion cases | Alexander Aring | 1 | -1/+1
Since commit f74dacb4c8116 ("dlm: fix recovery of middle conversions") additional debugging information is logged via log_limit() when the middle-conversion case is hit. The DLM log_limit() functionality requires a DLM debug option to be enabled. As this case is so rare, and so that any potential newly introduced recovery issue is not missed, switch it to log_rinfo(), as this is ratelimited under the normal DLM loglevel. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-06-08 | treewide, timers: Rename from_timer() to timer_container_of() | Ingo Molnar | 1 | -1/+1
Move this API to the canonical timer_*() namespace. [ tglx: Redone against pre rc1 ] Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/all/aB2X0jCKQO56WdMt@gmail.com
2025-02-28 | dlm: fix error if active rsb is not hashed | Alexander Aring | 1 | -0/+1
If an active rsb is not hashed anymore, which can occur because locks were released and reacquired, we need to signal the following code that the lookup failed. Although the lookup itself was successful, the rsb is no longer part of the rsb hash, so signal this by setting the error to -EBADR, as dlm_search_rsb_tree() does. Cc: stable@vger.kernel.org Fixes: 5be323b0c64d ("dlm: move dlm_search_rsb_tree() out of lock") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2025-02-28 | dlm: fix error if inactive rsb is not hashed | Alexander Aring | 1 | -0/+1
If an inactive rsb is not hashed anymore, which can occur because locks were released and reacquired, we need to signal the following code that the lookup failed. Although the lookup itself was successful, the rsb is no longer part of the rsb hash, so signal this by setting the error to -EBADR, as dlm_search_rsb_tree() does. Cc: stable@vger.kernel.org Fixes: 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct lookup") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-12-19 | dlm: fix removal of rsb struct that is master and dir record | Alexander Aring | 1 | -16/+30
An rsb struct was not being removed in the case where it was both the master and the dir record. This case (master and dir node) was missed in the condition for doing add_scan() from deactivate_rsb(). Fixing this triggers a related WARN_ON that needs to be fixed, and requires adjusting where two del_scan() calls are made. Fixes: c217adfc8caa ("dlm: fix add_scan and del_scan usage") Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-11-15 | dlm: fix recovery of middle conversions | Alexander Aring | 1 | -8/+12
In one special case, recovery is unable to reliably rebuild lock state by simply recreating lkb structs as sent from the lock holders. That case is when the lkb's include conversions between PR and CW modes.

The recovery code has always recognized this special case, but the implementation has always been broken, and would set invalid modes in recovered lkb's. Unpredictable or bogus errors could then be returned for further locking calls on these locks.

This bug has gone unnoticed for so long due to some combination of:
- applications never or infrequently converting between PR/CW
- recovery not occurring during these conversions
- if the recovery bug does occur, the caller may not notice, depending on what further locking calls are made, e.g. if the lock is simply unlocked it may go unnoticed

However, a core analysis from a recent gfs2 bug report points to this broken code.

PR = Protected Read
CW = Concurrent Write
PR and CW are incompatible
PR and PR are compatible
CW and CW are compatible

Example 1

node C, resource R
granted: PR node A
granted: PR node B
granted: NL node C
granted: NL node D

- A sends convert PR->CW to C
- C fails before A gets a reply
- recovery occurs

At this point, A does not know if it still holds the lock in PR, or if its conversion to CW was granted:
- If A's conversion to CW was granted, then another node's CW lock may also have been granted.
- If A's conversion to CW was not granted, it still holds a PR lock, and other nodes may also hold PR locks.

So, the new master of R cannot simply recreate the lock from A using granted mode PR and requested mode CW. The new master must look at all the recovered locks to determine the correct granted modes, and ensure that all the recovered locks are recreated in compatible states.

The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node B sends D its lkb, granted PR
- node A sends D its lkb, convert PR->CW
- D determines the correct lock state is:
  granted: PR node B
  convert: PR->CW node A

The lkb sent by each node was recreated without any change on the new master node.

Example 2

node C, resource R
granted: PR node A
granted: NL node C
granted: NL node D
waiting: CW node B

- A sends convert PR->CW to C
- C grants the conversion to CW for A
- C grants the waiting request for CW to B
- C sends granted message to B, but fails before it can send the granted message to A
- B receives the granted message from C

At this point:
- A believes it is converting PR->CW
- B believes it is holding a CW lock

The correct lock recovery steps in this example are:
- node D becomes the new master of R
- node A sends D its lkb, convert PR->CW
- node B sends D its lkb, granted CW
- D determines the correct lock state is:
  granted: CW node B
  granted: CW node A

The lkb sent by B is recreated without change, but the lkb sent by A is changed because the granted mode was not compatible.

Fixes to make this work correctly:

recover_convert_waiter: should not make any changes to a converting lkb that is still waiting for a reply message. It was previously setting grmode to IV, which is an invalid state, so the lkb would not be handled correctly by other code.

receive_rcom_lock_args: was checking the wrong lkb field (wait_type instead of status) to determine if the lkb is being converted, and in need of inspection for this special recovery. It was also setting grmode to IV in the lkb, causing it to be mishandled by other code. Now, this function just puts the lkb, directly as sent, onto the convert queue of the resource being recovered, and corrects it in recover_conversion() later, if needed.

recover_conversion: the job of this function is to detect and correct lkb states for the special PR/CW conversions. The new code now checks for recovered lkbs on the granted queue with grmode PR or CW, and takes the real grmode from that. Then it looks for lkbs on the convert queue with an incompatible grmode (i.e. grmode PR when the real grmode is CW, or v.v.) These converting lkbs need to be fixed. They are fixed by temporarily setting their grmode to NL, so that grmodes are not incompatible and won't confuse other locking code. The converting lkb will then be granted at the end of recovery, replacing the temporary NL grmode.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
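A condensed sketch of the recover_conversion() logic described above, using simplified stand-in structures (the real dlm structs and helpers differ):

#include <linux/list.h>

#define DEMO_LOCK_NL 0  /* null */
#define DEMO_LOCK_CW 2  /* concurrent write */
#define DEMO_LOCK_PR 3  /* protected read */

struct demo_lkb {
        struct list_head list;
        int grmode;             /* currently granted mode */
        int rqmode;             /* requested (conversion target) mode */
};

struct demo_rsb {
        struct list_head grantqueue;
        struct list_head convertqueue;
};

/*
 * Take the real granted mode from any recovered PR/CW lock on the
 * granted queue, then demote converting lkbs whose granted mode is
 * incompatible with it to NL, so nothing looks incompatible during the
 * rest of recovery; the conversion is granted properly at the end.
 */
static void demo_recover_conversion(struct demo_rsb *r)
{
        struct demo_lkb *lkb;
        int real_grmode = -1;

        list_for_each_entry(lkb, &r->grantqueue, list) {
                if (lkb->grmode == DEMO_LOCK_PR || lkb->grmode == DEMO_LOCK_CW) {
                        real_grmode = lkb->grmode;
                        break;
                }
        }
        if (real_grmode < 0)
                return;

        list_for_each_entry(lkb, &r->convertqueue, list) {
                /* PR vs CW, in either direction, is the incompatible pair */
                if ((lkb->grmode == DEMO_LOCK_PR && real_grmode == DEMO_LOCK_CW) ||
                    (lkb->grmode == DEMO_LOCK_CW && real_grmode == DEMO_LOCK_PR))
                        lkb->grmode = DEMO_LOCK_NL;     /* temporary, re-granted later */
        }
}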
2024-10-04 | dlm: make add_to_waiters() that it can't fail | Alexander Aring | 1 | -29/+14
If add_to_waiters() fails we have a problem, because the previously called functions such as validate_lock_args() or validate_unlock_args() set specific lkb values for a request, and there is no way to revert those changes. When there is a pending lock request, the original request arguments will be overwritten with unknown consequences. The good news is that the cases in which add_to_waiters() can fail either cannot happen or are very unlikely to happen (only if the DLM user misuses the API), but if they do, we have the above mentioned problem. Two conditions are removed here. The first is the -EINVAL case, which checks is_overlap_unlock() or (is_overlap_cancel() and mstype == DLM_MSG_CANCEL). The is_overlap_unlock() check, missing for the normal UNLOCK case, is moved to validate_unlock_args(). The is_overlap_cancel() check already happens in validate_unlock_args() when DLM_LKF_CANCEL is set. In validate_lock_args() we check is_overlap() when it is not a new request; on a new request the lkb is always new and does not have those values set. The -EBUSY check cannot happen either, because for non-new lock requests (when DLM_LKF_CONVERT is set) we already check lkb_wait_type and is_overlap() in validate_lock_args(). That leaves only validate_unlock_args(), which will never hit the default case, because dlm_unlock() only produces DLM_MSG_UNLOCK and DLM_MSG_CANCEL messages. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-10-04 | dlm: fix possible lkb_resource null dereference | Alexander Aring | 1 | -6/+4
This patch fixes a possible null pointer dereference when this function is called from request_lock(), since lkb->lkb_resource is not assigned yet at that point; it is only assigned after validate_lock_args(), by attach_lkb(). Another issue is that a resource name can be a non-printable byte array, so it cannot be assumed to be ASCII coded. The log statement is probably never hit when DLM is used in the normal way and no debug logging is enabled. The null pointer dereference can only occur on a newly created lkb that does not have a resource assigned yet; it probably never triggers, but we should make sure that future changes do not make it reachable. This patch simply drops the printout of the resource name; the lkb id is enough to make a connection to a resource name, if one exists. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 | dlm: move lkb xarray lookup out of lock | Alexander Aring | 1 | -4/+14
This patch moves the xarray lookup of the lkb out of the ls_lkbxa_lock read lock. We can do that because the xarray can be accessed locklessly by readers such as xa_load(). We then confirm under ls_lkbxa_lock that the lkb is still part of the data structure, and take a reference while it is still part of ls_lkbxa, to avoid it being freed after the lookup. To check whether the lkb is still part of ls_lkbxa we use kref_read(), since the last put removes it from ls_lkbxa, so any remaining reference means it is still present. A similar approach was taken for the DLM rsb rhashtable, just with a flag instead of the refcounter, because the refcounter there has a slightly different meaning. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
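The general shape of the lookup-outside-the-lock pattern; this sketch uses the common RCU plus kref_get_unless_zero() idiom as an approximation of what the commit describes (the actual code confirms liveness under ls_lkbxa_lock with kref_read()), and all names below are illustrative:

#include <linux/xarray.h>
#include <linux/kref.h>
#include <linux/rcupdate.h>

struct demo_lkb {
        struct kref ref;
        struct rcu_head rcu;    /* lkb memory must be RCU-freed for this to be safe */
};

static struct demo_lkb *demo_find_lkb(struct xarray *xa, unsigned long id)
{
        struct demo_lkb *lkb;

        rcu_read_lock();
        lkb = xa_load(xa, id);          /* lockless reader */
        if (lkb && !kref_get_unless_zero(&lkb->ref))
                lkb = NULL;             /* last put already ran; treat as not found */
        rcu_read_unlock();

        return lkb;                     /* caller owns a reference, or NULL */
}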
2024-08-08 | dlm: move dlm_search_rsb_tree() out of lock | Alexander Aring | 1 | -30/+47
The rhashtable structure is lockless for readers such as rhashtable_lookup_fast(). It should be safe to call this lookup without holding ls_rsbtbl_lock to get the rsb pointer out of the hash, which reduces the contention time on ls_rsbtbl_lock in some cases. We still need to check whether the rsb is still part of the hash table, as this state can change while ls_rsbtbl_lock is not held. If it is still part of the rhashtable data structure, we take a reference to be sure it will not be freed after we drop the ls_rsbtbl_lock read lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 | dlm: use RSB_HASHED to avoid lookup twice | Alexander Aring | 1 | -3/+3
Since commit 01fdeca1cc2d ("dlm: use rcu to avoid an extra rsb struct lookup"), _dlm_master_lookup() is called under the rcu lock, which prevents the rsb structure from being freed. A change was missing to avoid an additional lookup and just check that the rsb is still part of the ls_rsbtbl structure. This patch does that check instead of looking up the rsb structure again. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 | dlm: warn about invalid nodeid comparsions | Alexander Aring | 1 | -3/+3
This patch adds warnings when is_master() and dlm_is_removed() are checked against invalid nodeid states that are probably not what the caller wants. For is_master(), checking r->res_nodeid is invalid when it is set to -1; dlm_is_removed() has a different meaning as a "nodeid member" check, and for it 0 is also invalid. This patch adjusts the callers that run into these cases, as the invalid states should never actually be hit. There should be no functional change, since the conditions return the same results. However, the callers now signal that there might be an "extra" case to handle here. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 | dlm: remove unnecessary refcounts | Alexander Aring | 1 | -16/+1
This patch removes refcounts that are obviously unnecessary: when the pointer is passed as a parameter, or the object is part of a list, we should already hold a reference to it. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-08-08 | dlm: cleanup memory allocation helpers | Alexander Aring | 1 | -2/+2
This patch removes an unnecessary parameter from the DLM memory allocation helpers and simplifies some functions by directly returning the pointer to the allocated memory. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-06-10 | dlm: use rcu to avoid an extra rsb struct lookup | Alexander Aring | 1 | -15/+87
Use rcu to free rsb structs, and hold the rcu read lock while looking up rsb structs. This allows us to avoid an extra hash table lookup for an rsb. A new rsb flag HASHED is added which is set while the rsb is in the hash table. This flag is checked in place of repeating the hash table lookup. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
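A small sketch of the two halves of this scheme, with made-up names; the real flag is RSB_HASHED on the dlm rsb:

#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/bitops.h>

struct demo_rsb {
        unsigned long flags;
#define DEMO_RSB_HASHED 0               /* set while the rsb is in the hash table */
        struct rcu_head rcu;
};

/* Removal side: clear the flag when taking the rsb out of the hash and
 * free the memory only after an RCU grace period. */
static void demo_remove_rsb(struct demo_rsb *r)
{
        clear_bit(DEMO_RSB_HASHED, &r->flags);
        kfree_rcu(r, rcu);
}

/* Reader side: under rcu_read_lock(), instead of repeating the hash
 * lookup to confirm the rsb is still hashed, just test the flag. */
static bool demo_rsb_still_hashed(struct demo_rsb *r)
{
        return test_bit(DEMO_RSB_HASHED, &r->flags);
}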
2024-06-10 | dlm: fix add_scan and del_scan usage | David Teigland | 1 | -22/+35
Remove a few calls to add_scan() and del_scan() in cases where the rsb is a dir record, so the rsb should never be placed on the scan list at all. Add WARN_ON to catch cases where this is done. Signed-off-by: David Teigland <teigland@redhat.com>
2024-06-10 | dlm: change list and timer names | David Teigland | 1 | -168/+140
The old terminology of "toss" and "keep" is no longer an accurate description of the rsb states and lists, so change the names to "inactive" and "active". The old names had also been copied into the scanning code, which is changed back to use the "scan" name.
- "active" rsb structs have lkb's attached, and are ref counted.
- "inactive" rsb structs have no lkb's attached, and are not ref counted.
- the "scan" list is for rsb's that can be freed after a timeout period.
- the "slow" lists are for infrequent iterations through active or inactive rsb structs.
- inactive rsb structs that are directory records will not be put on the scan list, since they are not freed based on timeouts.
- inactive rsb structs that are not directory records will be put on the scan list to be freed, since they are no longer needed.
Signed-off-by: David Teigland <teigland@redhat.com>
2024-05-31 | dlm: move lkb idr to xarray datastructure | Alexander Aring | 1 | -14/+16
According to the kernel documentation, idr is deprecated and xarrays should be used nowadays. This patch moves the lkb idr implementation to xarray. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-05-31 | dlm: drop own rsb pre allocation mechanism | Alexander Aring | 1 | -80/+12
This patch drops our own rsb pre-allocation mechanism, as this is already handled by the kmem caches; we don't need another layer on top of that running its own pre-allocation scheme. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-05-31 | dlm: don't kref_init rsbs created for toss list | Alexander Aring | 1 | -1/+0
This patch removes a kref_init() that isn't necessary because the rsb is created for the toss list. On the toss list the rsb should not have any reference counting logic. If the rsb later moves to the keep list, then kref_init() on res_ref will be called. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-17 | dlm: fix sleep in atomic context | Alexander Aring | 1 | -6/+6
This patch changes the orphans mutex to a spinlock, since commit c288745f1d4a ("dlm: avoid blocking receive at the end of recovery") uses a rwlock_t to lock the DLM message receive path, and do_purge() can be called while this lock is held, which forbids sleeping. We need to use spin_lock_bh() because do_purge() can also be called from user context via dlm_user_purge(), and since commit 92d59adfaf71 ("dlm: do message processing in softirq context") the DLM message receive path runs in softirq context. Fixes: c288745f1d4a ("dlm: avoid blocking receive at the end of recovery") Reported-by: Dan Carpenter <dan.carpenter@linaro.org> Closes: https://lore.kernel.org/gfs2/9ad928eb-2ece-4ad9-a79c-d2bce228e4bc@moroto.mountain/ Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
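A minimal illustration of the conversion described here (names are illustrative): a section that can now run both from user context and from the softirq receive path must not sleep, so the mutex becomes a bh-disabling spinlock:

#include <linux/spinlock.h>

static DEFINE_SPINLOCK(demo_orphans_lock);

/* Called both from user context (a dlm_user_purge()-like path) and with
 * the receive rwlock held in softirq context, so it may not sleep; the
 * _bh variant also keeps the softirq path from deadlocking against a
 * user-context holder on the same CPU. */
static void demo_purge(void)
{
        spin_lock_bh(&demo_orphans_lock);
        /* ... walk and release orphaned locks (elided) ... */
        spin_unlock_bh(&demo_orphans_lock);
}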
2024-04-16 | dlm: use rwlock for lkbidr | Alexander Aring | 1 | -37/+7
Convert the lock for lkbidr to an rwlock. Most idr lookups will use the read lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 | dlm: use rwlock for rsb hash table | Alexander Aring | 1 | -75/+194
The conversion to rhashtable introduced a hash table lock per lockspace, in place of per bucket locks. To make this more scalable, switch to using a rwlock for hash table access. The common case fast path uses it as a read lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 | dlm: drop dlm_scand kthread and use timers | Alexander Aring | 1 | -142/+240
Currently the scand kthread acts as a garbage collector for expired rsbs on the toss list, cleaning them up after a certain timeout. It triggers every couple of seconds and iterates over the toss list while holding ls_rsbtbl_lock for the whole hash bucket iteration. To reduce the amount of time ls_rsbtbl_lock is held, we now handle the disposal of expired rsbs using a per-lockspace timer that expires for the earliest tossed rsb on the lockspace toss queue. This toss queue is ordered by rsb res_toss_time, with the earliest tossed rsb as the first entry. The toss timer only trylock()s the necessary locks, since it is low-priority garbage collection, and rearms the timer if trylock() fails. If the timer function does not find any expired rsb's, it rearms the timer for the next earliest expiring rsb. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
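A rough sketch of the timer-based scheme described above, with simplified fields and names; it only shows the trylock-and-rearm behaviour, not the actual rsb disposal:

#include <linux/timer.h>
#include <linux/spinlock.h>
#include <linux/jiffies.h>

struct demo_ls {
        struct timer_list scan_timer;
        spinlock_t scan_lock;
        unsigned long next_expiry;      /* jiffies of the earliest expiring rsb */
};

static void demo_scan_timer_fn(struct timer_list *t)
{
        struct demo_ls *ls = timer_container_of(ls, t, scan_timer);

        /* Low-priority garbage collection: never wait for the lock,
         * just try again a bit later if it is contended. */
        if (!spin_trylock(&ls->scan_lock)) {
                mod_timer(&ls->scan_timer, jiffies + HZ);
                return;
        }

        /* ... free rsbs whose timeout has passed and recompute
         * ls->next_expiry from the ordered toss queue (elided) ... */

        spin_unlock(&ls->scan_lock);

        /* Rearm for the next earliest expiring rsb, if any remain. */
        mod_timer(&ls->scan_timer, ls->next_expiry);
}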
2024-04-16 | dlm: do not use ref counts for rsb in the toss state | Alexander Aring | 1 | -31/+30
In the past we had problems when an rsb had a reference counter greater than one while in the toss state. An rsb in the toss state is not actively used for locking, and should not have any other references apart from the single ref keeping it on the rsb hash. Shift to freeing rsb's directly rather than using kref_put to free them, since the ref counting is not meant to be used in this state. Add warnings if ref counting is seen while an rsb is in the toss state. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 | dlm: switch to use rhashtable for rsbs | Alexander Aring | 1 | -118/+54
Replace our own hash table with the more advanced rhashtable for keeping rsb structs. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 | dlm: add rsb lists for iteration | Alexander Aring | 1 | -26/+21
To prepare for using rhashtable, add two rsb lists for iterating through rsb's in two uncommon cases where this is necessary:
- when dumping rsb state from debugfs, now using seq_list.
- when looking at all rsb's during recovery.
Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-16 | dlm: merge toss and keep hash table lists into one list | Alexander Aring | 1 | -48/+55
There are several places where lock processing can perform two hash table lookups, first in the "keep" list, and if not found, in the "toss" list. This patch introduces a new rsb state flag "RSB_TOSS" to represent the difference between the state of being on keep vs toss list, so that the two lists can be combined. This avoids cases of two lookups. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
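Roughly, the change replaces the second table lookup with a flag test on the rsb found by the first lookup; the field and flag names below are illustrative:

#include <linux/bitops.h>

struct demo_rsb {
        unsigned long flags;
#define DEMO_RSB_TOSS 0 /* set: unused ("toss") state; clear: active ("keep") */
};

/* One hash table, one lookup: the caller checks the flag to learn which
 * of the two former lists the rsb would have been on. */
static bool demo_rsb_is_tossed(struct demo_rsb *r)
{
        return test_bit(DEMO_RSB_TOSS, &r->flags);
}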
2024-04-16 | dlm: change to single hashtable lock | Alexander Aring | 1 | -39/+38
Prepare to replace our own hash table with rhashtable by replacing the per-bucket locks in our own hash table with a single lock. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: use spin_lock_bh for message processing | Alexander Aring | 1 | -88/+118
Use spin_lock_bh for all spinlocks involved in message processing, in preparation for softirq message processing. DLM lock requests from user space involve dlm processing in user context, in addition to the standard kernel context, necessitating bh variants. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: remove schedule in receive path | Alexander Aring | 1 | -1/+0
Remove an explicit schedule() call in the message processing path, in preparation for softirq message processing. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: convert ls_recv_active from rw_semaphore to rwlock | Alexander Aring | 1 | -2/+2
Convert ls_recv_active rw_semaphore to an rwlock to avoid sleeping, in preparation for softirq message processing. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: avoid blocking receive at the end of recovery | Alexander Aring | 1 | -2/+14
The end of the recovery process transitioned to normal message processing by temporarily blocking the receiving context, processing saved messages, then unblocking the receiving context. To avoid blocking the receiving context, the old wait_queue and mutex are replaced by a new rwlock and the new RECV_MSG_BLOCKED flag. Received messages are added to the list of saved messages, protected by the rwlock, until the flag is cleared, which happens when all saved messages have been processed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
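A simplified sketch of the receive-side behaviour described above; the flag, lock, and list names are made up, and the saved-message list here gets its own spinlock since several receivers may hold the rwlock for reading at once:

#include <linux/spinlock.h>
#include <linux/list.h>
#include <linux/bitops.h>

struct demo_ls {
        rwlock_t recv_lock;             /* read: normal receive, write: recovery */
        unsigned long flags;
#define DEMO_RECV_MSG_BLOCKED 0
        spinlock_t saved_lock;
        struct list_head saved_msgs;
};

struct demo_msg {
        struct list_head list;
};

/* Returns true if the message was parked for later processing because
 * recovery still has receiving blocked; false means process it now.
 * The receive path only ever takes the read lock briefly, so it never
 * blocks the way the old mutex/wait_queue scheme could. */
static bool demo_receive(struct demo_ls *ls, struct demo_msg *msg)
{
        bool saved = false;

        read_lock(&ls->recv_lock);
        if (test_bit(DEMO_RECV_MSG_BLOCKED, &ls->flags)) {
                spin_lock(&ls->saved_lock);
                list_add_tail(&msg->list, &ls->saved_msgs);
                spin_unlock(&ls->saved_lock);
                saved = true;
        }
        read_unlock(&ls->recv_lock);

        return saved;
}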
2024-04-09 | dlm: convert res_lock to spinlock | Alexander Aring | 1 | -1/+1
Convert the rsb struct res_lock from a mutex to a spinlock in preparation for processing messages in softirq context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: convert ls_waiters_mutex to spinlock | Alexander Aring | 1 | -10/+10
Convert the waiters mutex to a spinlock in preparation for processing messages in softirq context. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: drop mutex use in waiters recovery | Alexander Aring | 1 | -8/+9
The waiters_mutex no longer needs to be used in the waiters recovery functions dlm_recover_waiters_pre() and dlm_recover_waiters_post(). During recovery, ordinary locking operations are paused, and the recovery thread is the only context accessing the waiters list, so the lock is not needed. Access to the waiters list from debugfs functions is avoided by taking the top level recovery lock in the debugfs dump function. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: move rsb root_list to ls_recover() stack | Alexander Aring | 1 | -4/+2
Move the rsb root_list from the lockspace to a stack variable since it is now only used by the ls_recover() function. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: use a new list for recovery of master rsb names | Alexander Aring | 1 | -0/+2
Add a new "masters_list" for master rsb structs, with a new rwlock. The new list is created and used during the recovery process to send the master rsb names to new nodes. With this change, the current "root_list" can be used without locking. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: switch to GFP_ATOMIC in dlm allocations | Alexander Aring | 1 | -2/+0
Replace GFP_NOFS with GFP_ATOMIC. Also stop using idr_preload which uses a non-bh spin_lock. This is further preparation for softirq message processing. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-09 | dlm: remove allocation parameter in msg allocation | Alexander Aring | 1 | -19/+12
Remove the context parameter for message allocations and always use GFP_ATOMIC. This prepares for softirq message processing. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01 | dlm: remove callback reference counting | Alexander Aring | 1 | -3/+5
Get rid of the unnecessary refcounting on callback structs. Copy interesting callback info into the lkb struct rather than maintaining pointers to callback structs from the lkb. This goes back to the way things were done prior to commit 61bed0baa4db ("fs: dlm: use a non-static queue for callbacks"). Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2024-04-01 | dlm: fix race between final callback and remove | Alexander Aring | 1 | -12/+8
This patch fixes the following issue:

node 1 is dir
node 2 is master
node 3 is other

1->2: unlock
   2: put final lkb, rsb moved to toss
2->1: unlock_reply
   1: queue lkb callback with EUNLOCK
2->1: remove
   1: receive_remove ignored (rsb on keep because of queued lkb callback)
   1: complete lkb callback, put_lkb, move rsb to toss
3->1: lookup
1->3: lookup_reply master=2
3->2: request
2->3: request_reply EBADR

In summary: An unexpected lkb reference causes the rsb to remain on the wrong list. The rsb being on the wrong list causes receive_remove to be ignored. An ignored receive_remove causes inconsistent dir and master state. This sequence requires an unusually long delay in delivering the unlock callback, because the remove message from 2->1 usually happens after some seconds. So, it's not known exactly how frequently this sequence occurs in practice. It's possible that the same end result could also have another unknown cause.

The solution for this issue is to further separate callback state from the lkb, so that an lkb reference (and from that, an rsb ref) are not held while a callback remains queued. Then, within the unlock_reply, the lkb will be freed and the rsb moved to the toss list. So, the receive_remove will not be ignored.

Signed-off-by: Alexander Aring <aahringo@redhat.com>
Signed-off-by: David Teigland <teigland@redhat.com>
2024-03-15 | dlm: add comments about forced waiters reset | David Teigland | 1 | -20/+58
When a lock is waiting for a reply for a remote operation, and recovery interrupts this "waiters" state, the remote operation is voided by the recovery, and no reply will be processed. The lkb waiters state for the remote operation is forcibly reset/cleared, so that the lock operation can be restarted after recovery. Improve the comments describing this. Signed-off-by: David Teigland <teigland@redhat.com>
2024-03-15 | dlm: revert atomic_t lkb_wait_count | David Teigland | 1 | -14/+18
Revert "fs: dlm: handle lkb wait count as atomic_t" This reverts commit 75a7d60134ce84209f2c61ec4619ee543aa8f466. This counter does not need to be atomic. As the comment in the reverted commit mentions, the counter is protected by the rsb lock. Signed-off-by: David Teigland <teigland@redhat.com>
2023-08-10 | fs: dlm: constify receive buffer | Alexander Aring | 1 | -51/+58
The dlm receive buffer should never be manipulated, as DLM is the last parsing layer. This patch constifies the whole receive buffer so we can be sure it never gets manipulated while it's being parsed. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
2023-08-10 | fs: dlm: drop rxbuf manipulation in dlm_recover_master_copy | Alexander Aring | 1 | -3/+7
Currently dlm_recover_master_copy() manipulates the receive buffer of an rcom lock message and modifies it on the fly, so that a later memcpy() to a new rcom message with the same layout picks up those new values. This patch avoids manipulating the received rcom message by storing the values for the new rcom message in parameters passed by reference. Later, when dlm_send_rcom_lock() constructs a new message and memcpy()s the receive buffer, those values are set on the newly constructed message. Signed-off-by: Alexander Aring <aahringo@redhat.com> Signed-off-by: David Teigland <teigland@redhat.com>
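A bare-bones sketch of the approach, with invented struct and parameter names: the value that the old code patched into the (now const) receive buffer is instead handed back to the caller and applied when the outgoing message is built:

#include <linux/types.h>

/* Illustrative only; the real rcom lock layout has many more fields. */
struct demo_rcom_lock {
        u32 remid;
};

/* Parse the const receive buffer and report the value the reply should
 * carry through an out-parameter instead of editing the buffer. */
static int demo_recover_master_copy(const struct demo_rcom_lock *in,
                                    u32 *out_remid)
{
        /* In the real code this would be the id of the lkb created or
         * found on the new master for the received lock; here it is
         * just a placeholder value. */
        *out_remid = 0x1234;
        return 0;
}

/* Build the outgoing message from an unmodified copy of the received
 * data, then apply the value computed earlier. */
static void demo_send_rcom_lock(const struct demo_rcom_lock *in,
                                u32 remid,
                                struct demo_rcom_lock *out)
{
        *out = *in;
        out->remid = remid;
}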