Push- instead of pull-model for managing Write stalls
Summary:
Introducing WriteController, which is a source of truth about per-DB write delays. Let's define a DB epoch as a period where there are no flushes and compactions (i.e. a new epoch starts when a flush or compaction finishes). Each epoch can either:
* proceed with all writes without delay
* delay all writes by fixed time
* stop all writes
The three modes are recomputed at each epoch change (flush, compaction), rather than on every write (which is currently the case).
When we have a lot of column families, our current pull behavior adds a big overhead, since we need to loop over every column family for every write. With the new push model, overhead on the write code path is minimal.
This is just the start. Next step is to also take care of stalls introduced by slow memtable flushes. The final goal is to eliminate function MakeRoomForWrite(), which currently needs to be called for every column family by every write.
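To illustrate the push model described above, here is a minimal, self-contained sketch. It is not the code from this diff: all names (SketchController, SketchToken, DecideWrite, WriteAction) are hypothetical, and the counter-based bookkeeping is an assumption about how such tokens could work. Like the real class, it assumes the caller holds a mutex, so it does no locking of its own.

```cpp
#include <cassert>
#include <stdint.h>
#include <memory>

class SketchController;

// Base token (hypothetical): destroying it undoes the request it represents.
class SketchToken {
 public:
  explicit SketchToken(SketchController* controller)
      : controller_(controller) {}
  virtual ~SketchToken() = default;
  SketchToken(const SketchToken&) = delete;
  SketchToken& operator=(const SketchToken&) = delete;

 protected:
  SketchController* controller_;
};

// Holds the aggregated state pushed by column families at epoch changes.
class SketchController {
 public:
  std::unique_ptr<SketchToken> GetStopToken();
  std::unique_ptr<SketchToken> GetDelayToken(uint64_t delay_us);
  bool IsStopped() const { return total_stopped_ > 0; }
  uint64_t GetDelay() const { return total_delay_us_; }

 private:
  friend class SketchStopToken;
  friend class SketchDelayToken;
  int total_stopped_ = 0;        // number of outstanding stop tokens
  uint64_t total_delay_us_ = 0;  // sum of outstanding delay requests
};

class SketchStopToken : public SketchToken {
 public:
  explicit SketchStopToken(SketchController* c) : SketchToken(c) {}
  ~SketchStopToken() override { --controller_->total_stopped_; }
};

class SketchDelayToken : public SketchToken {
 public:
  SketchDelayToken(SketchController* c, uint64_t delay_us)
      : SketchToken(c), delay_us_(delay_us) {}
  ~SketchDelayToken() override { controller_->total_delay_us_ -= delay_us_; }

 private:
  uint64_t delay_us_;
};

inline std::unique_ptr<SketchToken> SketchController::GetStopToken() {
  ++total_stopped_;
  return std::unique_ptr<SketchToken>(new SketchStopToken(this));
}

inline std::unique_ptr<SketchToken> SketchController::GetDelayToken(
    uint64_t delay_us) {
  total_delay_us_ += delay_us;
  return std::unique_ptr<SketchToken>(new SketchDelayToken(this, delay_us));
}

// Write path (hypothetical helper): two cheap queries, independent of the
// number of column families.
enum class WriteAction { kProceed, kDelay, kStop };

inline WriteAction DecideWrite(const SketchController& c) {
  if (c.IsStopped()) return WriteAction::kStop;
  if (c.GetDelay() > 0) return WriteAction::kDelay;
  return WriteAction::kProceed;
}
```

In this sketch a column family would acquire or release a token only when a flush or compaction finishes, while every write calls DecideWrite() and never iterates over column families. Tying the state change to the token's lifetime means a column family cannot forget to lift its stall: dropping the unique_ptr restores the counters.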
Test Plan: make check for now. I'll add some unit tests later. Also, perf test.
Reviewers: dhruba, yhchiang, MarkCallaghan, sdong, ljin
Reviewed By: ljin
Subscribers: leveldb
Differential Revision: https://reviews.facebook.net/D22791
10 years ago
//  Copyright (c) 2013, Facebook, Inc.  All rights reserved.
//  This source code is licensed under the BSD-style license found in the
//  LICENSE file in the root directory of this source tree. An additional grant
//  of patent rights can be found in the PATENTS file in the same directory.

#pragma once

#include <stdint.h>

#include <memory>

namespace rocksdb {

class WriteControllerToken;

// WriteController is controlling write stalls in our write code-path. Write
// stalls happen when compaction can't keep up with write rate.
// All of the methods here (including WriteControllerToken's destructors) need
// to be called while holding DB mutex
class WriteController {
 public:
  WriteController() : total_stopped_(0), total_delay_us_(0) {}
  ~WriteController() = default;

  // When an actor (column family) requests a stop token, all writes will be
  // stopped until the stop token is released (deleted)
  std::unique_ptr<WriteControllerToken> GetStopToken();
  // When an actor (column family) requests a delay token, total delay for all
  // writes will be increased by delay_us. The delay will last until delay token
  // is released
  std::unique_ptr<WriteControllerToken> GetDelayToken(uint64_t delay_us);

  // These two methods query the state of the WriteController
  bool IsStopped() const;
  uint64_t GetDelay() const;

 private:
  friend class WriteControllerToken;
  friend class StopWriteToken;
  friend class DelayWriteToken;

  int total_stopped_;
  uint64_t total_delay_us_;
};

class WriteControllerToken {
 public:
  explicit WriteControllerToken(WriteController* controller)
      : controller_(controller) {}
  virtual ~WriteControllerToken() {}

 protected:
  WriteController* controller_;

 private:
  // no copying allowed
  WriteControllerToken(const WriteControllerToken&) = delete;
  void operator=(const WriteControllerToken&) = delete;
};

class StopWriteToken : public WriteControllerToken {
 public:
  explicit StopWriteToken(WriteController* controller)
      : WriteControllerToken(controller) {}
  virtual ~StopWriteToken();
};

class DelayWriteToken : public WriteControllerToken {
 public:
  DelayWriteToken(WriteController* controller, uint64_t delay_us)
      : WriteControllerToken(controller), delay_us_(delay_us) {}
  virtual ~DelayWriteToken();

 private:
  uint64_t delay_us_;
};

}  // namespace rocksdb