Summary:
The current locktree implementation stores the address of the
PessimisticTransactions object as the TXNID. However, when a transaction
is blocked on a lock, it records the list of waitees with conflicting
locks using the rocksdb assigned TransactionID. This is performed by
calling GetID() on PessimisticTransactions objects of the waitees,
and then recorded in the waiter's list.
However, there is no guarantee the objects are valid when recording the
waitee list during the conflict callbacks because the waitee
could have released the lock and freed the PessimisticTransactions
object.
The waitee/txnid values are only valid PessimisticTransaction objects
while the mutex for the root of the locktree is held.
The simplest fix for this problem is to use the address of the
PessimisticTransaction as the TransactionID so that it is consistent
with its usage in the locktree. The TXNID is only converted back to a
PessimisticTransaction for the report_wait callbacks. Since
these callbacks are now all made within the critical section where the
lock_request queue mutx is held, these conversions will be safe.
Otherwise, only the uint64_t TXNID of the waitee is registerd
with the waiter transaction. The PessimisitcTransaction object of the
waitee is never referenced.
The main downside of this approach is the TransactionID will not change
if the PessimisticTransaction object is reused for new transactions.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9898
Test Plan:
Add a new test case and run unit tests.
Also verified with MyRocks workloads using range locks that the
crash no longer happens.
Reviewed By: riversand963
Differential Revision: D35950376
Pulled By: hermanlee
fbshipit-source-id: 8c9cae272e23e487fc139b6a8ed5b8f8f24b1570
Summary:
Range Locking supports Lock Escalation. Lock Escalation is invoked when
lock memory is nearly exhausted and it reduced the amount of memory used
by joining adjacent locks.
Bridging the gap between certain locks has adverse effects. For example,
in MyRocks it is not a good idea to bridge the gap between locks in
different indexes, as that get the lock to cover large portions of
indexes, or even entire indexes.
Resolve this by introducing Escalation Barrier. The escalation process
will call the user-provided barrier callback function:
bool(const Endpoint& a, const Endpoint& b)
If the function returns true, there's a barrier between a and b and Lock
Escalation will not try to bridge the gap between a and b.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9290
Reviewed By: akankshamahajan15
Differential Revision: D33486753
Pulled By: riversand963
fbshipit-source-id: f97910b67aba0579ea1d35f523ca6863d3dd018e
Summary:
locktree is a module providing Range Locking. It has a counter for
the number of times a lock acquisition request was blocked by an
existing conflicting lock and had to wait for it to be released.
Expose this counter in RangeLockManagerHandle::Counters::lock_wait_count.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/9289
Reviewed By: jay-zhuang
Differential Revision: D33079182
Pulled By: riversand963
fbshipit-source-id: 25b1a362d9da247536ab5007bd15900b319f139e
Summary:
Fix this scenario:
trx1> acquire shared lock on $key
trx2> acquire shared lock on the same $key
trx1> attempt to acquire a unique lock on $key.
Lock acquisition will fail, and deadlock detection will start.
It will call iterate_and_get_overlapping_row_locks() which will
produce a list with two locks (shared locks by trx1 and trx2).
However the code in lock_request::build_wait_graph() was not prepared
to find the lock by the same transaction in the list of conflicting
locks. Fix it to ignore it.
(One may suggest to fix iterate_and_get_overlapping_row_locks() to not
include locks by trx1. This is not a good idea, because that function
is also used to report all locks currently held)
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7938
Reviewed By: zhichao-cao
Differential Revision: D26529374
Pulled By: ajkr
fbshipit-source-id: d89cbed008db1a97a8f2351b9bfb75310750d16a
Summary:
BasicLockEscalation will cause false-positive warnings under TSAN (this is a known issue in TSAN, see details in https://gist.github.com/spetrunia/77274cf2d5848e0a7e090d622695ed4e), skip this test if TSAN is enabled, or if we are not sure whether TSAN is enabled.
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7814
Test Plan: watch the tsan contrun test to pass.
Reviewed By: zhichao-cao
Differential Revision: D25708094
Pulled By: cheng-chang
fbshipit-source-id: 4fc813ff373301d033d086154cc7bb60a5e95889
Summary:
Range Locking - an implementation based on the locktree library
- Add a RangeTreeLockManager and RangeTreeLockTracker which implement
range locking using the locktree library.
- Point locks are handled as locks on single-point ranges.
- Add a unit test: range_locking_test
Pull Request resolved: https://github.com/facebook/rocksdb/pull/7506
Reviewed By: akankshamahajan15
Differential Revision: D25320703
Pulled By: cheng-chang
fbshipit-source-id: f86347384b42ba2b0257d67eca0f45f806b69da7