LockTree library, originally from PerconaFT (#7753)
Summary: To be used for implementing Range Locking. Pull Request resolved: https://github.com/facebook/rocksdb/pull/7753 Reviewed By: zhichao-cao Differential Revision: D25378980 Pulled By: cheng-chang fbshipit-source-id: 801a9c5cd92a84654ca2586b73e8f69001e89320main
parent
7b2216c906
commit
98236fb10e
@ -0,0 +1,661 @@ |
||||
GNU AFFERO GENERAL PUBLIC LICENSE |
||||
Version 3, 19 November 2007 |
||||
|
||||
Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/> |
||||
Everyone is permitted to copy and distribute verbatim copies |
||||
of this license document, but changing it is not allowed. |
||||
|
||||
Preamble |
||||
|
||||
The GNU Affero General Public License is a free, copyleft license for |
||||
software and other kinds of works, specifically designed to ensure |
||||
cooperation with the community in the case of network server software. |
||||
|
||||
The licenses for most software and other practical works are designed |
||||
to take away your freedom to share and change the works. By contrast, |
||||
our General Public Licenses are intended to guarantee your freedom to |
||||
share and change all versions of a program--to make sure it remains free |
||||
software for all its users. |
||||
|
||||
When we speak of free software, we are referring to freedom, not |
||||
price. Our General Public Licenses are designed to make sure that you |
||||
have the freedom to distribute copies of free software (and charge for |
||||
them if you wish), that you receive source code or can get it if you |
||||
want it, that you can change the software or use pieces of it in new |
||||
free programs, and that you know you can do these things. |
||||
|
||||
Developers that use our General Public Licenses protect your rights |
||||
with two steps: (1) assert copyright on the software, and (2) offer |
||||
you this License which gives you legal permission to copy, distribute |
||||
and/or modify the software. |
||||
|
||||
A secondary benefit of defending all users' freedom is that |
||||
improvements made in alternate versions of the program, if they |
||||
receive widespread use, become available for other developers to |
||||
incorporate. Many developers of free software are heartened and |
||||
encouraged by the resulting cooperation. However, in the case of |
||||
software used on network servers, this result may fail to come about. |
||||
The GNU General Public License permits making a modified version and |
||||
letting the public access it on a server without ever releasing its |
||||
source code to the public. |
||||
|
||||
The GNU Affero General Public License is designed specifically to |
||||
ensure that, in such cases, the modified source code becomes available |
||||
to the community. It requires the operator of a network server to |
||||
provide the source code of the modified version running there to the |
||||
users of that server. Therefore, public use of a modified version, on |
||||
a publicly accessible server, gives the public access to the source |
||||
code of the modified version. |
||||
|
||||
An older license, called the Affero General Public License and |
||||
published by Affero, was designed to accomplish similar goals. This is |
||||
a different license, not a version of the Affero GPL, but Affero has |
||||
released a new version of the Affero GPL which permits relicensing under |
||||
this license. |
||||
|
||||
The precise terms and conditions for copying, distribution and |
||||
modification follow. |
||||
|
||||
TERMS AND CONDITIONS |
||||
|
||||
0. Definitions. |
||||
|
||||
"This License" refers to version 3 of the GNU Affero General Public License. |
||||
|
||||
"Copyright" also means copyright-like laws that apply to other kinds of |
||||
works, such as semiconductor masks. |
||||
|
||||
"The Program" refers to any copyrightable work licensed under this |
||||
License. Each licensee is addressed as "you". "Licensees" and |
||||
"recipients" may be individuals or organizations. |
||||
|
||||
To "modify" a work means to copy from or adapt all or part of the work |
||||
in a fashion requiring copyright permission, other than the making of an |
||||
exact copy. The resulting work is called a "modified version" of the |
||||
earlier work or a work "based on" the earlier work. |
||||
|
||||
A "covered work" means either the unmodified Program or a work based |
||||
on the Program. |
||||
|
||||
To "propagate" a work means to do anything with it that, without |
||||
permission, would make you directly or secondarily liable for |
||||
infringement under applicable copyright law, except executing it on a |
||||
computer or modifying a private copy. Propagation includes copying, |
||||
distribution (with or without modification), making available to the |
||||
public, and in some countries other activities as well. |
||||
|
||||
To "convey" a work means any kind of propagation that enables other |
||||
parties to make or receive copies. Mere interaction with a user through |
||||
a computer network, with no transfer of a copy, is not conveying. |
||||
|
||||
An interactive user interface displays "Appropriate Legal Notices" |
||||
to the extent that it includes a convenient and prominently visible |
||||
feature that (1) displays an appropriate copyright notice, and (2) |
||||
tells the user that there is no warranty for the work (except to the |
||||
extent that warranties are provided), that licensees may convey the |
||||
work under this License, and how to view a copy of this License. If |
||||
the interface presents a list of user commands or options, such as a |
||||
menu, a prominent item in the list meets this criterion. |
||||
|
||||
1. Source Code. |
||||
|
||||
The "source code" for a work means the preferred form of the work |
||||
for making modifications to it. "Object code" means any non-source |
||||
form of a work. |
||||
|
||||
A "Standard Interface" means an interface that either is an official |
||||
standard defined by a recognized standards body, or, in the case of |
||||
interfaces specified for a particular programming language, one that |
||||
is widely used among developers working in that language. |
||||
|
||||
The "System Libraries" of an executable work include anything, other |
||||
than the work as a whole, that (a) is included in the normal form of |
||||
packaging a Major Component, but which is not part of that Major |
||||
Component, and (b) serves only to enable use of the work with that |
||||
Major Component, or to implement a Standard Interface for which an |
||||
implementation is available to the public in source code form. A |
||||
"Major Component", in this context, means a major essential component |
||||
(kernel, window system, and so on) of the specific operating system |
||||
(if any) on which the executable work runs, or a compiler used to |
||||
produce the work, or an object code interpreter used to run it. |
||||
|
||||
The "Corresponding Source" for a work in object code form means all |
||||
the source code needed to generate, install, and (for an executable |
||||
work) run the object code and to modify the work, including scripts to |
||||
control those activities. However, it does not include the work's |
||||
System Libraries, or general-purpose tools or generally available free |
||||
programs which are used unmodified in performing those activities but |
||||
which are not part of the work. For example, Corresponding Source |
||||
includes interface definition files associated with source files for |
||||
the work, and the source code for shared libraries and dynamically |
||||
linked subprograms that the work is specifically designed to require, |
||||
such as by intimate data communication or control flow between those |
||||
subprograms and other parts of the work. |
||||
|
||||
The Corresponding Source need not include anything that users |
||||
can regenerate automatically from other parts of the Corresponding |
||||
Source. |
||||
|
||||
The Corresponding Source for a work in source code form is that |
||||
same work. |
||||
|
||||
2. Basic Permissions. |
||||
|
||||
All rights granted under this License are granted for the term of |
||||
copyright on the Program, and are irrevocable provided the stated |
||||
conditions are met. This License explicitly affirms your unlimited |
||||
permission to run the unmodified Program. The output from running a |
||||
covered work is covered by this License only if the output, given its |
||||
content, constitutes a covered work. This License acknowledges your |
||||
rights of fair use or other equivalent, as provided by copyright law. |
||||
|
||||
You may make, run and propagate covered works that you do not |
||||
convey, without conditions so long as your license otherwise remains |
||||
in force. You may convey covered works to others for the sole purpose |
||||
of having them make modifications exclusively for you, or provide you |
||||
with facilities for running those works, provided that you comply with |
||||
the terms of this License in conveying all material for which you do |
||||
not control copyright. Those thus making or running the covered works |
||||
for you must do so exclusively on your behalf, under your direction |
||||
and control, on terms that prohibit them from making any copies of |
||||
your copyrighted material outside their relationship with you. |
||||
|
||||
Conveying under any other circumstances is permitted solely under |
||||
the conditions stated below. Sublicensing is not allowed; section 10 |
||||
makes it unnecessary. |
||||
|
||||
3. Protecting Users' Legal Rights From Anti-Circumvention Law. |
||||
|
||||
No covered work shall be deemed part of an effective technological |
||||
measure under any applicable law fulfilling obligations under article |
||||
11 of the WIPO copyright treaty adopted on 20 December 1996, or |
||||
similar laws prohibiting or restricting circumvention of such |
||||
measures. |
||||
|
||||
When you convey a covered work, you waive any legal power to forbid |
||||
circumvention of technological measures to the extent such circumvention |
||||
is effected by exercising rights under this License with respect to |
||||
the covered work, and you disclaim any intention to limit operation or |
||||
modification of the work as a means of enforcing, against the work's |
||||
users, your or third parties' legal rights to forbid circumvention of |
||||
technological measures. |
||||
|
||||
4. Conveying Verbatim Copies. |
||||
|
||||
You may convey verbatim copies of the Program's source code as you |
||||
receive it, in any medium, provided that you conspicuously and |
||||
appropriately publish on each copy an appropriate copyright notice; |
||||
keep intact all notices stating that this License and any |
||||
non-permissive terms added in accord with section 7 apply to the code; |
||||
keep intact all notices of the absence of any warranty; and give all |
||||
recipients a copy of this License along with the Program. |
||||
|
||||
You may charge any price or no price for each copy that you convey, |
||||
and you may offer support or warranty protection for a fee. |
||||
|
||||
5. Conveying Modified Source Versions. |
||||
|
||||
You may convey a work based on the Program, or the modifications to |
||||
produce it from the Program, in the form of source code under the |
||||
terms of section 4, provided that you also meet all of these conditions: |
||||
|
||||
a) The work must carry prominent notices stating that you modified |
||||
it, and giving a relevant date. |
||||
|
||||
b) The work must carry prominent notices stating that it is |
||||
released under this License and any conditions added under section |
||||
7. This requirement modifies the requirement in section 4 to |
||||
"keep intact all notices". |
||||
|
||||
c) You must license the entire work, as a whole, under this |
||||
License to anyone who comes into possession of a copy. This |
||||
License will therefore apply, along with any applicable section 7 |
||||
additional terms, to the whole of the work, and all its parts, |
||||
regardless of how they are packaged. This License gives no |
||||
permission to license the work in any other way, but it does not |
||||
invalidate such permission if you have separately received it. |
||||
|
||||
d) If the work has interactive user interfaces, each must display |
||||
Appropriate Legal Notices; however, if the Program has interactive |
||||
interfaces that do not display Appropriate Legal Notices, your |
||||
work need not make them do so. |
||||
|
||||
A compilation of a covered work with other separate and independent |
||||
works, which are not by their nature extensions of the covered work, |
||||
and which are not combined with it such as to form a larger program, |
||||
in or on a volume of a storage or distribution medium, is called an |
||||
"aggregate" if the compilation and its resulting copyright are not |
||||
used to limit the access or legal rights of the compilation's users |
||||
beyond what the individual works permit. Inclusion of a covered work |
||||
in an aggregate does not cause this License to apply to the other |
||||
parts of the aggregate. |
||||
|
||||
6. Conveying Non-Source Forms. |
||||
|
||||
You may convey a covered work in object code form under the terms |
||||
of sections 4 and 5, provided that you also convey the |
||||
machine-readable Corresponding Source under the terms of this License, |
||||
in one of these ways: |
||||
|
||||
a) Convey the object code in, or embodied in, a physical product |
||||
(including a physical distribution medium), accompanied by the |
||||
Corresponding Source fixed on a durable physical medium |
||||
customarily used for software interchange. |
||||
|
||||
b) Convey the object code in, or embodied in, a physical product |
||||
(including a physical distribution medium), accompanied by a |
||||
written offer, valid for at least three years and valid for as |
||||
long as you offer spare parts or customer support for that product |
||||
model, to give anyone who possesses the object code either (1) a |
||||
copy of the Corresponding Source for all the software in the |
||||
product that is covered by this License, on a durable physical |
||||
medium customarily used for software interchange, for a price no |
||||
more than your reasonable cost of physically performing this |
||||
conveying of source, or (2) access to copy the |
||||
Corresponding Source from a network server at no charge. |
||||
|
||||
c) Convey individual copies of the object code with a copy of the |
||||
written offer to provide the Corresponding Source. This |
||||
alternative is allowed only occasionally and noncommercially, and |
||||
only if you received the object code with such an offer, in accord |
||||
with subsection 6b. |
||||
|
||||
d) Convey the object code by offering access from a designated |
||||
place (gratis or for a charge), and offer equivalent access to the |
||||
Corresponding Source in the same way through the same place at no |
||||
further charge. You need not require recipients to copy the |
||||
Corresponding Source along with the object code. If the place to |
||||
copy the object code is a network server, the Corresponding Source |
||||
may be on a different server (operated by you or a third party) |
||||
that supports equivalent copying facilities, provided you maintain |
||||
clear directions next to the object code saying where to find the |
||||
Corresponding Source. Regardless of what server hosts the |
||||
Corresponding Source, you remain obligated to ensure that it is |
||||
available for as long as needed to satisfy these requirements. |
||||
|
||||
e) Convey the object code using peer-to-peer transmission, provided |
||||
you inform other peers where the object code and Corresponding |
||||
Source of the work are being offered to the general public at no |
||||
charge under subsection 6d. |
||||
|
||||
A separable portion of the object code, whose source code is excluded |
||||
from the Corresponding Source as a System Library, need not be |
||||
included in conveying the object code work. |
||||
|
||||
A "User Product" is either (1) a "consumer product", which means any |
||||
tangible personal property which is normally used for personal, family, |
||||
or household purposes, or (2) anything designed or sold for incorporation |
||||
into a dwelling. In determining whether a product is a consumer product, |
||||
doubtful cases shall be resolved in favor of coverage. For a particular |
||||
product received by a particular user, "normally used" refers to a |
||||
typical or common use of that class of product, regardless of the status |
||||
of the particular user or of the way in which the particular user |
||||
actually uses, or expects or is expected to use, the product. A product |
||||
is a consumer product regardless of whether the product has substantial |
||||
commercial, industrial or non-consumer uses, unless such uses represent |
||||
the only significant mode of use of the product. |
||||
|
||||
"Installation Information" for a User Product means any methods, |
||||
procedures, authorization keys, or other information required to install |
||||
and execute modified versions of a covered work in that User Product from |
||||
a modified version of its Corresponding Source. The information must |
||||
suffice to ensure that the continued functioning of the modified object |
||||
code is in no case prevented or interfered with solely because |
||||
modification has been made. |
||||
|
||||
If you convey an object code work under this section in, or with, or |
||||
specifically for use in, a User Product, and the conveying occurs as |
||||
part of a transaction in which the right of possession and use of the |
||||
User Product is transferred to the recipient in perpetuity or for a |
||||
fixed term (regardless of how the transaction is characterized), the |
||||
Corresponding Source conveyed under this section must be accompanied |
||||
by the Installation Information. But this requirement does not apply |
||||
if neither you nor any third party retains the ability to install |
||||
modified object code on the User Product (for example, the work has |
||||
been installed in ROM). |
||||
|
||||
The requirement to provide Installation Information does not include a |
||||
requirement to continue to provide support service, warranty, or updates |
||||
for a work that has been modified or installed by the recipient, or for |
||||
the User Product in which it has been modified or installed. Access to a |
||||
network may be denied when the modification itself materially and |
||||
adversely affects the operation of the network or violates the rules and |
||||
protocols for communication across the network. |
||||
|
||||
Corresponding Source conveyed, and Installation Information provided, |
||||
in accord with this section must be in a format that is publicly |
||||
documented (and with an implementation available to the public in |
||||
source code form), and must require no special password or key for |
||||
unpacking, reading or copying. |
||||
|
||||
7. Additional Terms. |
||||
|
||||
"Additional permissions" are terms that supplement the terms of this |
||||
License by making exceptions from one or more of its conditions. |
||||
Additional permissions that are applicable to the entire Program shall |
||||
be treated as though they were included in this License, to the extent |
||||
that they are valid under applicable law. If additional permissions |
||||
apply only to part of the Program, that part may be used separately |
||||
under those permissions, but the entire Program remains governed by |
||||
this License without regard to the additional permissions. |
||||
|
||||
When you convey a copy of a covered work, you may at your option |
||||
remove any additional permissions from that copy, or from any part of |
||||
it. (Additional permissions may be written to require their own |
||||
removal in certain cases when you modify the work.) You may place |
||||
additional permissions on material, added by you to a covered work, |
||||
for which you have or can give appropriate copyright permission. |
||||
|
||||
Notwithstanding any other provision of this License, for material you |
||||
add to a covered work, you may (if authorized by the copyright holders of |
||||
that material) supplement the terms of this License with terms: |
||||
|
||||
a) Disclaiming warranty or limiting liability differently from the |
||||
terms of sections 15 and 16 of this License; or |
||||
|
||||
b) Requiring preservation of specified reasonable legal notices or |
||||
author attributions in that material or in the Appropriate Legal |
||||
Notices displayed by works containing it; or |
||||
|
||||
c) Prohibiting misrepresentation of the origin of that material, or |
||||
requiring that modified versions of such material be marked in |
||||
reasonable ways as different from the original version; or |
||||
|
||||
d) Limiting the use for publicity purposes of names of licensors or |
||||
authors of the material; or |
||||
|
||||
e) Declining to grant rights under trademark law for use of some |
||||
trade names, trademarks, or service marks; or |
||||
|
||||
f) Requiring indemnification of licensors and authors of that |
||||
material by anyone who conveys the material (or modified versions of |
||||
it) with contractual assumptions of liability to the recipient, for |
||||
any liability that these contractual assumptions directly impose on |
||||
those licensors and authors. |
||||
|
||||
All other non-permissive additional terms are considered "further |
||||
restrictions" within the meaning of section 10. If the Program as you |
||||
received it, or any part of it, contains a notice stating that it is |
||||
governed by this License along with a term that is a further |
||||
restriction, you may remove that term. If a license document contains |
||||
a further restriction but permits relicensing or conveying under this |
||||
License, you may add to a covered work material governed by the terms |
||||
of that license document, provided that the further restriction does |
||||
not survive such relicensing or conveying. |
||||
|
||||
If you add terms to a covered work in accord with this section, you |
||||
must place, in the relevant source files, a statement of the |
||||
additional terms that apply to those files, or a notice indicating |
||||
where to find the applicable terms. |
||||
|
||||
Additional terms, permissive or non-permissive, may be stated in the |
||||
form of a separately written license, or stated as exceptions; |
||||
the above requirements apply either way. |
||||
|
||||
8. Termination. |
||||
|
||||
You may not propagate or modify a covered work except as expressly |
||||
provided under this License. Any attempt otherwise to propagate or |
||||
modify it is void, and will automatically terminate your rights under |
||||
this License (including any patent licenses granted under the third |
||||
paragraph of section 11). |
||||
|
||||
However, if you cease all violation of this License, then your |
||||
license from a particular copyright holder is reinstated (a) |
||||
provisionally, unless and until the copyright holder explicitly and |
||||
finally terminates your license, and (b) permanently, if the copyright |
||||
holder fails to notify you of the violation by some reasonable means |
||||
prior to 60 days after the cessation. |
||||
|
||||
Moreover, your license from a particular copyright holder is |
||||
reinstated permanently if the copyright holder notifies you of the |
||||
violation by some reasonable means, this is the first time you have |
||||
received notice of violation of this License (for any work) from that |
||||
copyright holder, and you cure the violation prior to 30 days after |
||||
your receipt of the notice. |
||||
|
||||
Termination of your rights under this section does not terminate the |
||||
licenses of parties who have received copies or rights from you under |
||||
this License. If your rights have been terminated and not permanently |
||||
reinstated, you do not qualify to receive new licenses for the same |
||||
material under section 10. |
||||
|
||||
9. Acceptance Not Required for Having Copies. |
||||
|
||||
You are not required to accept this License in order to receive or |
||||
run a copy of the Program. Ancillary propagation of a covered work |
||||
occurring solely as a consequence of using peer-to-peer transmission |
||||
to receive a copy likewise does not require acceptance. However, |
||||
nothing other than this License grants you permission to propagate or |
||||
modify any covered work. These actions infringe copyright if you do |
||||
not accept this License. Therefore, by modifying or propagating a |
||||
covered work, you indicate your acceptance of this License to do so. |
||||
|
||||
10. Automatic Licensing of Downstream Recipients. |
||||
|
||||
Each time you convey a covered work, the recipient automatically |
||||
receives a license from the original licensors, to run, modify and |
||||
propagate that work, subject to this License. You are not responsible |
||||
for enforcing compliance by third parties with this License. |
||||
|
||||
An "entity transaction" is a transaction transferring control of an |
||||
organization, or substantially all assets of one, or subdividing an |
||||
organization, or merging organizations. If propagation of a covered |
||||
work results from an entity transaction, each party to that |
||||
transaction who receives a copy of the work also receives whatever |
||||
licenses to the work the party's predecessor in interest had or could |
||||
give under the previous paragraph, plus a right to possession of the |
||||
Corresponding Source of the work from the predecessor in interest, if |
||||
the predecessor has it or can get it with reasonable efforts. |
||||
|
||||
You may not impose any further restrictions on the exercise of the |
||||
rights granted or affirmed under this License. For example, you may |
||||
not impose a license fee, royalty, or other charge for exercise of |
||||
rights granted under this License, and you may not initiate litigation |
||||
(including a cross-claim or counterclaim in a lawsuit) alleging that |
||||
any patent claim is infringed by making, using, selling, offering for |
||||
sale, or importing the Program or any portion of it. |
||||
|
||||
11. Patents. |
||||
|
||||
A "contributor" is a copyright holder who authorizes use under this |
||||
License of the Program or a work on which the Program is based. The |
||||
work thus licensed is called the contributor's "contributor version". |
||||
|
||||
A contributor's "essential patent claims" are all patent claims |
||||
owned or controlled by the contributor, whether already acquired or |
||||
hereafter acquired, that would be infringed by some manner, permitted |
||||
by this License, of making, using, or selling its contributor version, |
||||
but do not include claims that would be infringed only as a |
||||
consequence of further modification of the contributor version. For |
||||
purposes of this definition, "control" includes the right to grant |
||||
patent sublicenses in a manner consistent with the requirements of |
||||
this License. |
||||
|
||||
Each contributor grants you a non-exclusive, worldwide, royalty-free |
||||
patent license under the contributor's essential patent claims, to |
||||
make, use, sell, offer for sale, import and otherwise run, modify and |
||||
propagate the contents of its contributor version. |
||||
|
||||
In the following three paragraphs, a "patent license" is any express |
||||
agreement or commitment, however denominated, not to enforce a patent |
||||
(such as an express permission to practice a patent or covenant not to |
||||
sue for patent infringement). To "grant" such a patent license to a |
||||
party means to make such an agreement or commitment not to enforce a |
||||
patent against the party. |
||||
|
||||
If you convey a covered work, knowingly relying on a patent license, |
||||
and the Corresponding Source of the work is not available for anyone |
||||
to copy, free of charge and under the terms of this License, through a |
||||
publicly available network server or other readily accessible means, |
||||
then you must either (1) cause the Corresponding Source to be so |
||||
available, or (2) arrange to deprive yourself of the benefit of the |
||||
patent license for this particular work, or (3) arrange, in a manner |
||||
consistent with the requirements of this License, to extend the patent |
||||
license to downstream recipients. "Knowingly relying" means you have |
||||
actual knowledge that, but for the patent license, your conveying the |
||||
covered work in a country, or your recipient's use of the covered work |
||||
in a country, would infringe one or more identifiable patents in that |
||||
country that you have reason to believe are valid. |
||||
|
||||
If, pursuant to or in connection with a single transaction or |
||||
arrangement, you convey, or propagate by procuring conveyance of, a |
||||
covered work, and grant a patent license to some of the parties |
||||
receiving the covered work authorizing them to use, propagate, modify |
||||
or convey a specific copy of the covered work, then the patent license |
||||
you grant is automatically extended to all recipients of the covered |
||||
work and works based on it. |
||||
|
||||
A patent license is "discriminatory" if it does not include within |
||||
the scope of its coverage, prohibits the exercise of, or is |
||||
conditioned on the non-exercise of one or more of the rights that are |
||||
specifically granted under this License. You may not convey a covered |
||||
work if you are a party to an arrangement with a third party that is |
||||
in the business of distributing software, under which you make payment |
||||
to the third party based on the extent of your activity of conveying |
||||
the work, and under which the third party grants, to any of the |
||||
parties who would receive the covered work from you, a discriminatory |
||||
patent license (a) in connection with copies of the covered work |
||||
conveyed by you (or copies made from those copies), or (b) primarily |
||||
for and in connection with specific products or compilations that |
||||
contain the covered work, unless you entered into that arrangement, |
||||
or that patent license was granted, prior to 28 March 2007. |
||||
|
||||
Nothing in this License shall be construed as excluding or limiting |
||||
any implied license or other defenses to infringement that may |
||||
otherwise be available to you under applicable patent law. |
||||
|
||||
12. No Surrender of Others' Freedom. |
||||
|
||||
If conditions are imposed on you (whether by court order, agreement or |
||||
otherwise) that contradict the conditions of this License, they do not |
||||
excuse you from the conditions of this License. If you cannot convey a |
||||
covered work so as to satisfy simultaneously your obligations under this |
||||
License and any other pertinent obligations, then as a consequence you may |
||||
not convey it at all. For example, if you agree to terms that obligate you |
||||
to collect a royalty for further conveying from those to whom you convey |
||||
the Program, the only way you could satisfy both those terms and this |
||||
License would be to refrain entirely from conveying the Program. |
||||
|
||||
13. Remote Network Interaction; Use with the GNU General Public License. |
||||
|
||||
Notwithstanding any other provision of this License, if you modify the |
||||
Program, your modified version must prominently offer all users |
||||
interacting with it remotely through a computer network (if your version |
||||
supports such interaction) an opportunity to receive the Corresponding |
||||
Source of your version by providing access to the Corresponding Source |
||||
from a network server at no charge, through some standard or customary |
||||
means of facilitating copying of software. This Corresponding Source |
||||
shall include the Corresponding Source for any work covered by version 3 |
||||
of the GNU General Public License that is incorporated pursuant to the |
||||
following paragraph. |
||||
|
||||
Notwithstanding any other provision of this License, you have |
||||
permission to link or combine any covered work with a work licensed |
||||
under version 3 of the GNU General Public License into a single |
||||
combined work, and to convey the resulting work. The terms of this |
||||
License will continue to apply to the part which is the covered work, |
||||
but the work with which it is combined will remain governed by version |
||||
3 of the GNU General Public License. |
||||
|
||||
14. Revised Versions of this License. |
||||
|
||||
The Free Software Foundation may publish revised and/or new versions of |
||||
the GNU Affero General Public License from time to time. Such new versions |
||||
will be similar in spirit to the present version, but may differ in detail to |
||||
address new problems or concerns. |
||||
|
||||
Each version is given a distinguishing version number. If the |
||||
Program specifies that a certain numbered version of the GNU Affero General |
||||
Public License "or any later version" applies to it, you have the |
||||
option of following the terms and conditions either of that numbered |
||||
version or of any later version published by the Free Software |
||||
Foundation. If the Program does not specify a version number of the |
||||
GNU Affero General Public License, you may choose any version ever published |
||||
by the Free Software Foundation. |
||||
|
||||
If the Program specifies that a proxy can decide which future |
||||
versions of the GNU Affero General Public License can be used, that proxy's |
||||
public statement of acceptance of a version permanently authorizes you |
||||
to choose that version for the Program. |
||||
|
||||
Later license versions may give you additional or different |
||||
permissions. However, no additional obligations are imposed on any |
||||
author or copyright holder as a result of your choosing to follow a |
||||
later version. |
||||
|
||||
15. Disclaimer of Warranty. |
||||
|
||||
THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY |
||||
APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT |
||||
HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY |
||||
OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, |
||||
THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR |
||||
PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM |
||||
IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF |
||||
ALL NECESSARY SERVICING, REPAIR OR CORRECTION. |
||||
|
||||
16. Limitation of Liability. |
||||
|
||||
IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING |
||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS |
||||
THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY |
||||
GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE |
||||
USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF |
||||
DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD |
||||
PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS), |
||||
EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF |
||||
SUCH DAMAGES. |
||||
|
||||
17. Interpretation of Sections 15 and 16. |
||||
|
||||
If the disclaimer of warranty and limitation of liability provided |
||||
above cannot be given local legal effect according to their terms, |
||||
reviewing courts shall apply local law that most closely approximates |
||||
an absolute waiver of all civil liability in connection with the |
||||
Program, unless a warranty or assumption of liability accompanies a |
||||
copy of the Program in return for a fee. |
||||
|
||||
END OF TERMS AND CONDITIONS |
||||
|
||||
How to Apply These Terms to Your New Programs |
||||
|
||||
If you develop a new program, and you want it to be of the greatest |
||||
possible use to the public, the best way to achieve this is to make it |
||||
free software which everyone can redistribute and change under these terms. |
||||
|
||||
To do so, attach the following notices to the program. It is safest |
||||
to attach them to the start of each source file to most effectively |
||||
state the exclusion of warranty; and each file should have at least |
||||
the "copyright" line and a pointer to where the full notice is found. |
||||
|
||||
<one line to give the program's name and a brief idea of what it does.> |
||||
Copyright (C) <year> <name of author> |
||||
|
||||
This program is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License as published by |
||||
the Free Software Foundation, either version 3 of the License, or |
||||
(at your option) any later version. |
||||
|
||||
This program is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with this program. If not, see <http://www.gnu.org/licenses/>. |
||||
|
||||
Also add information on how to contact you by electronic and paper mail. |
||||
|
||||
If your software can interact with users remotely through a computer |
||||
network, you should also make sure that it provides a way for users to |
||||
get its source. For example, if your program is a web application, its |
||||
interface could display a "Source" link that leads users to an archive |
||||
of the code. There are many ways you could offer source, and different |
||||
solutions will be better for different programs; see section 13 for the |
||||
specific requirements. |
||||
|
||||
You should also get your employer (if you work as a programmer) or school, |
||||
if any, to sign a "copyright disclaimer" for the program, if necessary. |
||||
For more information on this, and how to apply and follow the GNU AGPL, see |
||||
<http://www.gnu.org/licenses/>. |
@ -0,0 +1,174 @@ |
||||
Apache License |
||||
Version 2.0, January 2004 |
||||
http://www.apache.org/licenses/ |
||||
|
||||
TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION |
||||
|
||||
1. Definitions. |
||||
|
||||
"License" shall mean the terms and conditions for use, reproduction, |
||||
and distribution as defined by Sections 1 through 9 of this document. |
||||
|
||||
"Licensor" shall mean the copyright owner or entity authorized by |
||||
the copyright owner that is granting the License. |
||||
|
||||
"Legal Entity" shall mean the union of the acting entity and all |
||||
other entities that control, are controlled by, or are under common |
||||
control with that entity. For the purposes of this definition, |
||||
"control" means (i) the power, direct or indirect, to cause the |
||||
direction or management of such entity, whether by contract or |
||||
otherwise, or (ii) ownership of fifty percent (50%) or more of the |
||||
outstanding shares, or (iii) beneficial ownership of such entity. |
||||
|
||||
"You" (or "Your") shall mean an individual or Legal Entity |
||||
exercising permissions granted by this License. |
||||
|
||||
"Source" form shall mean the preferred form for making modifications, |
||||
including but not limited to software source code, documentation |
||||
source, and configuration files. |
||||
|
||||
"Object" form shall mean any form resulting from mechanical |
||||
transformation or translation of a Source form, including but |
||||
not limited to compiled object code, generated documentation, |
||||
and conversions to other media types. |
||||
|
||||
"Work" shall mean the work of authorship, whether in Source or |
||||
Object form, made available under the License, as indicated by a |
||||
copyright notice that is included in or attached to the work |
||||
(an example is provided in the Appendix below). |
||||
|
||||
"Derivative Works" shall mean any work, whether in Source or Object |
||||
form, that is based on (or derived from) the Work and for which the |
||||
editorial revisions, annotations, elaborations, or other modifications |
||||
represent, as a whole, an original work of authorship. For the purposes |
||||
of this License, Derivative Works shall not include works that remain |
||||
separable from, or merely link (or bind by name) to the interfaces of, |
||||
the Work and Derivative Works thereof. |
||||
|
||||
"Contribution" shall mean any work of authorship, including |
||||
the original version of the Work and any modifications or additions |
||||
to that Work or Derivative Works thereof, that is intentionally |
||||
submitted to Licensor for inclusion in the Work by the copyright owner |
||||
or by an individual or Legal Entity authorized to submit on behalf of |
||||
the copyright owner. For the purposes of this definition, "submitted" |
||||
means any form of electronic, verbal, or written communication sent |
||||
to the Licensor or its representatives, including but not limited to |
||||
communication on electronic mailing lists, source code control systems, |
||||
and issue tracking systems that are managed by, or on behalf of, the |
||||
Licensor for the purpose of discussing and improving the Work, but |
||||
excluding communication that is conspicuously marked or otherwise |
||||
designated in writing by the copyright owner as "Not a Contribution." |
||||
|
||||
"Contributor" shall mean Licensor and any individual or Legal Entity |
||||
on behalf of whom a Contribution has been received by Licensor and |
||||
subsequently incorporated within the Work. |
||||
|
||||
2. Grant of Copyright License. Subject to the terms and conditions of |
||||
this License, each Contributor hereby grants to You a perpetual, |
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable |
||||
copyright license to reproduce, prepare Derivative Works of, |
||||
publicly display, publicly perform, sublicense, and distribute the |
||||
Work and such Derivative Works in Source or Object form. |
||||
|
||||
3. Grant of Patent License. Subject to the terms and conditions of |
||||
this License, each Contributor hereby grants to You a perpetual, |
||||
worldwide, non-exclusive, no-charge, royalty-free, irrevocable |
||||
(except as stated in this section) patent license to make, have made, |
||||
use, offer to sell, sell, import, and otherwise transfer the Work, |
||||
where such license applies only to those patent claims licensable |
||||
by such Contributor that are necessarily infringed by their |
||||
Contribution(s) alone or by combination of their Contribution(s) |
||||
with the Work to which such Contribution(s) was submitted. If You |
||||
institute patent litigation against any entity (including a |
||||
cross-claim or counterclaim in a lawsuit) alleging that the Work |
||||
or a Contribution incorporated within the Work constitutes direct |
||||
or contributory patent infringement, then any patent licenses |
||||
granted to You under this License for that Work shall terminate |
||||
as of the date such litigation is filed. |
||||
|
||||
4. Redistribution. You may reproduce and distribute copies of the |
||||
Work or Derivative Works thereof in any medium, with or without |
||||
modifications, and in Source or Object form, provided that You |
||||
meet the following conditions: |
||||
|
||||
(a) You must give any other recipients of the Work or |
||||
Derivative Works a copy of this License; and |
||||
|
||||
(b) You must cause any modified files to carry prominent notices |
||||
stating that You changed the files; and |
||||
|
||||
(c) You must retain, in the Source form of any Derivative Works |
||||
that You distribute, all copyright, patent, trademark, and |
||||
attribution notices from the Source form of the Work, |
||||
excluding those notices that do not pertain to any part of |
||||
the Derivative Works; and |
||||
|
||||
(d) If the Work includes a "NOTICE" text file as part of its |
||||
distribution, then any Derivative Works that You distribute must |
||||
include a readable copy of the attribution notices contained |
||||
within such NOTICE file, excluding those notices that do not |
||||
pertain to any part of the Derivative Works, in at least one |
||||
of the following places: within a NOTICE text file distributed |
||||
as part of the Derivative Works; within the Source form or |
||||
documentation, if provided along with the Derivative Works; or, |
||||
within a display generated by the Derivative Works, if and |
||||
wherever such third-party notices normally appear. The contents |
||||
of the NOTICE file are for informational purposes only and |
||||
do not modify the License. You may add Your own attribution |
||||
notices within Derivative Works that You distribute, alongside |
||||
or as an addendum to the NOTICE text from the Work, provided |
||||
that such additional attribution notices cannot be construed |
||||
as modifying the License. |
||||
|
||||
You may add Your own copyright statement to Your modifications and |
||||
may provide additional or different license terms and conditions |
||||
for use, reproduction, or distribution of Your modifications, or |
||||
for any such Derivative Works as a whole, provided Your use, |
||||
reproduction, and distribution of the Work otherwise complies with |
||||
the conditions stated in this License. |
||||
|
||||
5. Submission of Contributions. Unless You explicitly state otherwise, |
||||
any Contribution intentionally submitted for inclusion in the Work |
||||
by You to the Licensor shall be under the terms and conditions of |
||||
this License, without any additional terms or conditions. |
||||
Notwithstanding the above, nothing herein shall supersede or modify |
||||
the terms of any separate license agreement you may have executed |
||||
with Licensor regarding such Contributions. |
||||
|
||||
6. Trademarks. This License does not grant permission to use the trade |
||||
names, trademarks, service marks, or product names of the Licensor, |
||||
except as required for reasonable and customary use in describing the |
||||
origin of the Work and reproducing the content of the NOTICE file. |
||||
|
||||
7. Disclaimer of Warranty. Unless required by applicable law or |
||||
agreed to in writing, Licensor provides the Work (and each |
||||
Contributor provides its Contributions) on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or |
||||
implied, including, without limitation, any warranties or conditions |
||||
of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A |
||||
PARTICULAR PURPOSE. You are solely responsible for determining the |
||||
appropriateness of using or redistributing the Work and assume any |
||||
risks associated with Your exercise of permissions under this License. |
||||
|
||||
8. Limitation of Liability. In no event and under no legal theory, |
||||
whether in tort (including negligence), contract, or otherwise, |
||||
unless required by applicable law (such as deliberate and grossly |
||||
negligent acts) or agreed to in writing, shall any Contributor be |
||||
liable to You for damages, including any direct, indirect, special, |
||||
incidental, or consequential damages of any character arising as a |
||||
result of this License or out of the use or inability to use the |
||||
Work (including but not limited to damages for loss of goodwill, |
||||
work stoppage, computer failure or malfunction, or any and all |
||||
other commercial damages or losses), even if such Contributor |
||||
has been advised of the possibility of such damages. |
||||
|
||||
9. Accepting Warranty or Additional Liability. While redistributing |
||||
the Work or Derivative Works thereof, You may choose to offer, |
||||
and charge a fee for, acceptance of support, warranty, indemnity, |
||||
or other liability obligations and/or rights consistent with this |
||||
License. However, in accepting such obligations, You may act only |
||||
on Your own behalf and on Your sole responsibility, not on behalf |
||||
of any other Contributor, and only if You agree to indemnify, |
||||
defend, and hold each Contributor harmless for any liability |
||||
incurred by, or claims asserted against, such Contributor by reason |
||||
of your accepting any such warranty or additional liability. |
@ -0,0 +1,339 @@ |
||||
GNU GENERAL PUBLIC LICENSE |
||||
Version 2, June 1991 |
||||
|
||||
Copyright (C) 1989, 1991 Free Software Foundation, Inc., |
||||
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA |
||||
Everyone is permitted to copy and distribute verbatim copies |
||||
of this license document, but changing it is not allowed. |
||||
|
||||
Preamble |
||||
|
||||
The licenses for most software are designed to take away your |
||||
freedom to share and change it. By contrast, the GNU General Public |
||||
License is intended to guarantee your freedom to share and change free |
||||
software--to make sure the software is free for all its users. This |
||||
General Public License applies to most of the Free Software |
||||
Foundation's software and to any other program whose authors commit to |
||||
using it. (Some other Free Software Foundation software is covered by |
||||
the GNU Lesser General Public License instead.) You can apply it to |
||||
your programs, too. |
||||
|
||||
When we speak of free software, we are referring to freedom, not |
||||
price. Our General Public Licenses are designed to make sure that you |
||||
have the freedom to distribute copies of free software (and charge for |
||||
this service if you wish), that you receive source code or can get it |
||||
if you want it, that you can change the software or use pieces of it |
||||
in new free programs; and that you know you can do these things. |
||||
|
||||
To protect your rights, we need to make restrictions that forbid |
||||
anyone to deny you these rights or to ask you to surrender the rights. |
||||
These restrictions translate to certain responsibilities for you if you |
||||
distribute copies of the software, or if you modify it. |
||||
|
||||
For example, if you distribute copies of such a program, whether |
||||
gratis or for a fee, you must give the recipients all the rights that |
||||
you have. You must make sure that they, too, receive or can get the |
||||
source code. And you must show them these terms so they know their |
||||
rights. |
||||
|
||||
We protect your rights with two steps: (1) copyright the software, and |
||||
(2) offer you this license which gives you legal permission to copy, |
||||
distribute and/or modify the software. |
||||
|
||||
Also, for each author's protection and ours, we want to make certain |
||||
that everyone understands that there is no warranty for this free |
||||
software. If the software is modified by someone else and passed on, we |
||||
want its recipients to know that what they have is not the original, so |
||||
that any problems introduced by others will not reflect on the original |
||||
authors' reputations. |
||||
|
||||
Finally, any free program is threatened constantly by software |
||||
patents. We wish to avoid the danger that redistributors of a free |
||||
program will individually obtain patent licenses, in effect making the |
||||
program proprietary. To prevent this, we have made it clear that any |
||||
patent must be licensed for everyone's free use or not licensed at all. |
||||
|
||||
The precise terms and conditions for copying, distribution and |
||||
modification follow. |
||||
|
||||
GNU GENERAL PUBLIC LICENSE |
||||
TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION |
||||
|
||||
0. This License applies to any program or other work which contains |
||||
a notice placed by the copyright holder saying it may be distributed |
||||
under the terms of this General Public License. The "Program", below, |
||||
refers to any such program or work, and a "work based on the Program" |
||||
means either the Program or any derivative work under copyright law: |
||||
that is to say, a work containing the Program or a portion of it, |
||||
either verbatim or with modifications and/or translated into another |
||||
language. (Hereinafter, translation is included without limitation in |
||||
the term "modification".) Each licensee is addressed as "you". |
||||
|
||||
Activities other than copying, distribution and modification are not |
||||
covered by this License; they are outside its scope. The act of |
||||
running the Program is not restricted, and the output from the Program |
||||
is covered only if its contents constitute a work based on the |
||||
Program (independent of having been made by running the Program). |
||||
Whether that is true depends on what the Program does. |
||||
|
||||
1. You may copy and distribute verbatim copies of the Program's |
||||
source code as you receive it, in any medium, provided that you |
||||
conspicuously and appropriately publish on each copy an appropriate |
||||
copyright notice and disclaimer of warranty; keep intact all the |
||||
notices that refer to this License and to the absence of any warranty; |
||||
and give any other recipients of the Program a copy of this License |
||||
along with the Program. |
||||
|
||||
You may charge a fee for the physical act of transferring a copy, and |
||||
you may at your option offer warranty protection in exchange for a fee. |
||||
|
||||
2. You may modify your copy or copies of the Program or any portion |
||||
of it, thus forming a work based on the Program, and copy and |
||||
distribute such modifications or work under the terms of Section 1 |
||||
above, provided that you also meet all of these conditions: |
||||
|
||||
a) You must cause the modified files to carry prominent notices |
||||
stating that you changed the files and the date of any change. |
||||
|
||||
b) You must cause any work that you distribute or publish, that in |
||||
whole or in part contains or is derived from the Program or any |
||||
part thereof, to be licensed as a whole at no charge to all third |
||||
parties under the terms of this License. |
||||
|
||||
c) If the modified program normally reads commands interactively |
||||
when run, you must cause it, when started running for such |
||||
interactive use in the most ordinary way, to print or display an |
||||
announcement including an appropriate copyright notice and a |
||||
notice that there is no warranty (or else, saying that you provide |
||||
a warranty) and that users may redistribute the program under |
||||
these conditions, and telling the user how to view a copy of this |
||||
License. (Exception: if the Program itself is interactive but |
||||
does not normally print such an announcement, your work based on |
||||
the Program is not required to print an announcement.) |
||||
|
||||
These requirements apply to the modified work as a whole. If |
||||
identifiable sections of that work are not derived from the Program, |
||||
and can be reasonably considered independent and separate works in |
||||
themselves, then this License, and its terms, do not apply to those |
||||
sections when you distribute them as separate works. But when you |
||||
distribute the same sections as part of a whole which is a work based |
||||
on the Program, the distribution of the whole must be on the terms of |
||||
this License, whose permissions for other licensees extend to the |
||||
entire whole, and thus to each and every part regardless of who wrote it. |
||||
|
||||
Thus, it is not the intent of this section to claim rights or contest |
||||
your rights to work written entirely by you; rather, the intent is to |
||||
exercise the right to control the distribution of derivative or |
||||
collective works based on the Program. |
||||
|
||||
In addition, mere aggregation of another work not based on the Program |
||||
with the Program (or with a work based on the Program) on a volume of |
||||
a storage or distribution medium does not bring the other work under |
||||
the scope of this License. |
||||
|
||||
3. You may copy and distribute the Program (or a work based on it, |
||||
under Section 2) in object code or executable form under the terms of |
||||
Sections 1 and 2 above provided that you also do one of the following: |
||||
|
||||
a) Accompany it with the complete corresponding machine-readable |
||||
source code, which must be distributed under the terms of Sections |
||||
1 and 2 above on a medium customarily used for software interchange; or, |
||||
|
||||
b) Accompany it with a written offer, valid for at least three |
||||
years, to give any third party, for a charge no more than your |
||||
cost of physically performing source distribution, a complete |
||||
machine-readable copy of the corresponding source code, to be |
||||
distributed under the terms of Sections 1 and 2 above on a medium |
||||
customarily used for software interchange; or, |
||||
|
||||
c) Accompany it with the information you received as to the offer |
||||
to distribute corresponding source code. (This alternative is |
||||
allowed only for noncommercial distribution and only if you |
||||
received the program in object code or executable form with such |
||||
an offer, in accord with Subsection b above.) |
||||
|
||||
The source code for a work means the preferred form of the work for |
||||
making modifications to it. For an executable work, complete source |
||||
code means all the source code for all modules it contains, plus any |
||||
associated interface definition files, plus the scripts used to |
||||
control compilation and installation of the executable. However, as a |
||||
special exception, the source code distributed need not include |
||||
anything that is normally distributed (in either source or binary |
||||
form) with the major components (compiler, kernel, and so on) of the |
||||
operating system on which the executable runs, unless that component |
||||
itself accompanies the executable. |
||||
|
||||
If distribution of executable or object code is made by offering |
||||
access to copy from a designated place, then offering equivalent |
||||
access to copy the source code from the same place counts as |
||||
distribution of the source code, even though third parties are not |
||||
compelled to copy the source along with the object code. |
||||
|
||||
4. You may not copy, modify, sublicense, or distribute the Program |
||||
except as expressly provided under this License. Any attempt |
||||
otherwise to copy, modify, sublicense or distribute the Program is |
||||
void, and will automatically terminate your rights under this License. |
||||
However, parties who have received copies, or rights, from you under |
||||
this License will not have their licenses terminated so long as such |
||||
parties remain in full compliance. |
||||
|
||||
5. You are not required to accept this License, since you have not |
||||
signed it. However, nothing else grants you permission to modify or |
||||
distribute the Program or its derivative works. These actions are |
||||
prohibited by law if you do not accept this License. Therefore, by |
||||
modifying or distributing the Program (or any work based on the |
||||
Program), you indicate your acceptance of this License to do so, and |
||||
all its terms and conditions for copying, distributing or modifying |
||||
the Program or works based on it. |
||||
|
||||
6. Each time you redistribute the Program (or any work based on the |
||||
Program), the recipient automatically receives a license from the |
||||
original licensor to copy, distribute or modify the Program subject to |
||||
these terms and conditions. You may not impose any further |
||||
restrictions on the recipients' exercise of the rights granted herein. |
||||
You are not responsible for enforcing compliance by third parties to |
||||
this License. |
||||
|
||||
7. If, as a consequence of a court judgment or allegation of patent |
||||
infringement or for any other reason (not limited to patent issues), |
||||
conditions are imposed on you (whether by court order, agreement or |
||||
otherwise) that contradict the conditions of this License, they do not |
||||
excuse you from the conditions of this License. If you cannot |
||||
distribute so as to satisfy simultaneously your obligations under this |
||||
License and any other pertinent obligations, then as a consequence you |
||||
may not distribute the Program at all. For example, if a patent |
||||
license would not permit royalty-free redistribution of the Program by |
||||
all those who receive copies directly or indirectly through you, then |
||||
the only way you could satisfy both it and this License would be to |
||||
refrain entirely from distribution of the Program. |
||||
|
||||
If any portion of this section is held invalid or unenforceable under |
||||
any particular circumstance, the balance of the section is intended to |
||||
apply and the section as a whole is intended to apply in other |
||||
circumstances. |
||||
|
||||
It is not the purpose of this section to induce you to infringe any |
||||
patents or other property right claims or to contest validity of any |
||||
such claims; this section has the sole purpose of protecting the |
||||
integrity of the free software distribution system, which is |
||||
implemented by public license practices. Many people have made |
||||
generous contributions to the wide range of software distributed |
||||
through that system in reliance on consistent application of that |
||||
system; it is up to the author/donor to decide if he or she is willing |
||||
to distribute software through any other system and a licensee cannot |
||||
impose that choice. |
||||
|
||||
This section is intended to make thoroughly clear what is believed to |
||||
be a consequence of the rest of this License. |
||||
|
||||
8. If the distribution and/or use of the Program is restricted in |
||||
certain countries either by patents or by copyrighted interfaces, the |
||||
original copyright holder who places the Program under this License |
||||
may add an explicit geographical distribution limitation excluding |
||||
those countries, so that distribution is permitted only in or among |
||||
countries not thus excluded. In such case, this License incorporates |
||||
the limitation as if written in the body of this License. |
||||
|
||||
9. The Free Software Foundation may publish revised and/or new versions |
||||
of the General Public License from time to time. Such new versions will |
||||
be similar in spirit to the present version, but may differ in detail to |
||||
address new problems or concerns. |
||||
|
||||
Each version is given a distinguishing version number. If the Program |
||||
specifies a version number of this License which applies to it and "any |
||||
later version", you have the option of following the terms and conditions |
||||
either of that version or of any later version published by the Free |
||||
Software Foundation. If the Program does not specify a version number of |
||||
this License, you may choose any version ever published by the Free Software |
||||
Foundation. |
||||
|
||||
10. If you wish to incorporate parts of the Program into other free |
||||
programs whose distribution conditions are different, write to the author |
||||
to ask for permission. For software which is copyrighted by the Free |
||||
Software Foundation, write to the Free Software Foundation; we sometimes |
||||
make exceptions for this. Our decision will be guided by the two goals |
||||
of preserving the free status of all derivatives of our free software and |
||||
of promoting the sharing and reuse of software generally. |
||||
|
||||
NO WARRANTY |
||||
|
||||
11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY |
||||
FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN |
||||
OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES |
||||
PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED |
||||
OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF |
||||
MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS |
||||
TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE |
||||
PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, |
||||
REPAIR OR CORRECTION. |
||||
|
||||
12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING |
||||
WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR |
||||
REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, |
||||
INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING |
||||
OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED |
||||
TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY |
||||
YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER |
||||
PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE |
||||
POSSIBILITY OF SUCH DAMAGES. |
||||
|
||||
END OF TERMS AND CONDITIONS |
||||
|
||||
How to Apply These Terms to Your New Programs |
||||
|
||||
If you develop a new program, and you want it to be of the greatest |
||||
possible use to the public, the best way to achieve this is to make it |
||||
free software which everyone can redistribute and change under these terms. |
||||
|
||||
To do so, attach the following notices to the program. It is safest |
||||
to attach them to the start of each source file to most effectively |
||||
convey the exclusion of warranty; and each file should have at least |
||||
the "copyright" line and a pointer to where the full notice is found. |
||||
|
||||
<one line to give the program's name and a brief idea of what it does.> |
||||
Copyright (C) <year> <name of author> |
||||
|
||||
This program is free software; you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License as published by |
||||
the Free Software Foundation; either version 2 of the License, or |
||||
(at your option) any later version. |
||||
|
||||
This program is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License along |
||||
with this program; if not, write to the Free Software Foundation, Inc., |
||||
51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA. |
||||
|
||||
Also add information on how to contact you by electronic and paper mail. |
||||
|
||||
If the program is interactive, make it output a short notice like this |
||||
when it starts in an interactive mode: |
||||
|
||||
Gnomovision version 69, Copyright (C) year name of author |
||||
Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'. |
||||
This is free software, and you are welcome to redistribute it |
||||
under certain conditions; type `show c' for details. |
||||
|
||||
The hypothetical commands `show w' and `show c' should show the appropriate |
||||
parts of the General Public License. Of course, the commands you use may |
||||
be called something other than `show w' and `show c'; they could even be |
||||
mouse-clicks or menu items--whatever suits your program. |
||||
|
||||
You should also get your employer (if you work as a programmer) or your |
||||
school, if any, to sign a "copyright disclaimer" for the program, if |
||||
necessary. Here is a sample; alter the names: |
||||
|
||||
Yoyodyne, Inc., hereby disclaims all copyright interest in the program |
||||
`Gnomovision' (which makes passes at compilers) written by James Hacker. |
||||
|
||||
<signature of Ty Coon>, 1 April 1989 |
||||
Ty Coon, President of Vice |
||||
|
||||
This General Public License does not permit incorporating your program into |
||||
proprietary programs. If your program is a subroutine library, you may |
||||
consider it more useful to permit linking proprietary applications with the |
||||
library. If this is what you want to do, use the GNU Lesser General |
||||
Public License instead of this License. |
@ -0,0 +1,13 @@ |
||||
The files in this directory originally come from |
||||
https://github.com/percona/PerconaFT/. |
||||
|
||||
This directory only includes the "locktree" part of PerconaFT, and its |
||||
dependencies. |
||||
|
||||
The following modifications were made: |
||||
- Make locktree usable outside of PerconaFT library |
||||
- Add shared read-only lock support |
||||
|
||||
The files named *_subst.* are substitutes of the PerconaFT's files, they |
||||
contain replacements of PerconaFT's functionality. |
||||
|
@ -0,0 +1,76 @@ |
||||
#ifndef _DB_H |
||||
#define _DB_H |
||||
|
||||
#include <stdint.h> |
||||
#include <sys/types.h> |
||||
|
||||
typedef struct __toku_dbt DBT; |
||||
|
||||
// port: this is currently not used
|
||||
struct simple_dbt { |
||||
uint32_t len; |
||||
void *data; |
||||
}; |
||||
|
||||
// engine status info
|
||||
// engine status is passed to handlerton as an array of
|
||||
// TOKU_ENGINE_STATUS_ROW_S[]
|
||||
typedef enum { |
||||
STATUS_FS_STATE = 0, // interpret as file system state (redzone) enum
|
||||
STATUS_UINT64, // interpret as uint64_t
|
||||
STATUS_CHARSTR, // interpret as char *
|
||||
STATUS_UNIXTIME, // interpret as time_t
|
||||
STATUS_TOKUTIME, // interpret as tokutime_t
|
||||
STATUS_PARCOUNT, // interpret as PARTITIONED_COUNTER
|
||||
STATUS_DOUBLE // interpret as double
|
||||
} toku_engine_status_display_type; |
||||
|
||||
typedef enum { |
||||
TOKU_ENGINE_STATUS = (1ULL << 0), // Include when asking for engine status
|
||||
TOKU_GLOBAL_STATUS = |
||||
(1ULL << 1), // Include when asking for information_schema.global_status
|
||||
} toku_engine_status_include_type; |
||||
|
||||
typedef struct __toku_engine_status_row { |
||||
const char *keyname; // info schema key, should not change across revisions
|
||||
// without good reason
|
||||
const char |
||||
*columnname; // column for mysql, e.g. information_schema.global_status.
|
||||
// TOKUDB_ will automatically be prefixed.
|
||||
const char *legend; // the text that will appear at user interface
|
||||
toku_engine_status_display_type type; // how to interpret the value
|
||||
toku_engine_status_include_type |
||||
include; // which kinds of callers should get read this row?
|
||||
union { |
||||
double dnum; |
||||
uint64_t num; |
||||
const char *str; |
||||
char datebuf[26]; |
||||
struct partitioned_counter *parcount; |
||||
} value; |
||||
} * TOKU_ENGINE_STATUS_ROW, TOKU_ENGINE_STATUS_ROW_S; |
||||
|
||||
#define DB_BUFFER_SMALL -30999 |
||||
#define DB_LOCK_DEADLOCK -30995 |
||||
#define DB_LOCK_NOTGRANTED -30994 |
||||
#define DB_NOTFOUND -30989 |
||||
#define DB_KEYEXIST -30996 |
||||
#define DB_DBT_MALLOC 8 |
||||
#define DB_DBT_REALLOC 64 |
||||
#define DB_DBT_USERMEM 256 |
||||
|
||||
/* PerconaFT specific error codes */ |
||||
#define TOKUDB_OUT_OF_LOCKS -100000 |
||||
|
||||
typedef void (*lock_wait_callback)(void *arg, uint64_t requesting_txnid, |
||||
uint64_t blocking_txnid); |
||||
|
||||
struct __toku_dbt { |
||||
void *data; |
||||
size_t size; |
||||
size_t ulen; |
||||
// One of DB_DBT_XXX flags
|
||||
uint32_t flags; |
||||
}; |
||||
|
||||
#endif |
@ -0,0 +1,124 @@ |
||||
/* -*- mode: C; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <string.h> |
||||
|
||||
#include "../db.h" |
||||
#include "../portability/memory.h" |
||||
#include "../util/dbt.h" |
||||
|
||||
typedef int (*ft_compare_func)(void *arg, const DBT *a, const DBT *b); |
||||
|
||||
int toku_keycompare(const void *key1, size_t key1len, const void *key2, |
||||
size_t key2len); |
||||
|
||||
int toku_builtin_compare_fun(const DBT *, const DBT *) |
||||
__attribute__((__visibility__("default"))); |
||||
|
||||
namespace toku { |
||||
|
||||
// a comparator object encapsulates the data necessary for
|
||||
// comparing two keys in a fractal tree. it further understands
|
||||
// that points may be positive or negative infinity.
|
||||
|
||||
class comparator { |
||||
void init(ft_compare_func cmp, void *cmp_arg, uint8_t memcmp_magic) { |
||||
_cmp = cmp; |
||||
_cmp_arg = cmp_arg; |
||||
_memcmp_magic = memcmp_magic; |
||||
} |
||||
|
||||
public: |
||||
// This magic value is reserved to mean that the magic has not been set.
|
||||
static const uint8_t MEMCMP_MAGIC_NONE = 0; |
||||
|
||||
void create(ft_compare_func cmp, void *cmp_arg, |
||||
uint8_t memcmp_magic = MEMCMP_MAGIC_NONE) { |
||||
init(cmp, cmp_arg, memcmp_magic); |
||||
} |
||||
|
||||
// inherit the attributes of another comparator, but keep our own
|
||||
// copy of fake_db that is owned separately from the one given.
|
||||
void inherit(const comparator &cmp) { |
||||
invariant_notnull(cmp._cmp); |
||||
init(cmp._cmp, cmp._cmp_arg, cmp._memcmp_magic); |
||||
} |
||||
|
||||
// like inherit, but doesn't require that the this comparator
|
||||
// was already created
|
||||
void create_from(const comparator &cmp) { inherit(cmp); } |
||||
|
||||
void destroy() {} |
||||
|
||||
ft_compare_func get_compare_func() const { return _cmp; } |
||||
|
||||
uint8_t get_memcmp_magic() const { return _memcmp_magic; } |
||||
|
||||
bool valid() const { return _cmp != nullptr; } |
||||
|
||||
inline bool dbt_has_memcmp_magic(const DBT *dbt) const { |
||||
return *reinterpret_cast<const char *>(dbt->data) == _memcmp_magic; |
||||
} |
||||
|
||||
int operator()(const DBT *a, const DBT *b) const { |
||||
if (__builtin_expect(toku_dbt_is_infinite(a) || toku_dbt_is_infinite(b), |
||||
0)) { |
||||
return toku_dbt_infinite_compare(a, b); |
||||
} else if (_memcmp_magic != MEMCMP_MAGIC_NONE |
||||
// If `a' has the memcmp magic..
|
||||
&& dbt_has_memcmp_magic(a) |
||||
// ..then we expect `b' to also have the memcmp magic
|
||||
&& __builtin_expect(dbt_has_memcmp_magic(b), 1)) { |
||||
assert(0); // psergey: this branch should not be taken.
|
||||
return toku_builtin_compare_fun(a, b); |
||||
} else { |
||||
// yikes, const sadness here
|
||||
return _cmp(_cmp_arg, a, b); |
||||
} |
||||
} |
||||
|
||||
private: |
||||
ft_compare_func _cmp; |
||||
void *_cmp_arg; |
||||
|
||||
uint8_t _memcmp_magic; |
||||
}; |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,88 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "../db.h" |
||||
#include "../portability/toku_race_tools.h" |
||||
#include "../util/status.h" |
||||
|
||||
//
|
||||
// Lock Tree Manager statistics
|
||||
//
|
||||
class LTM_STATUS_S { |
||||
public: |
||||
enum { |
||||
LTM_SIZE_CURRENT = 0, |
||||
LTM_SIZE_LIMIT, |
||||
LTM_ESCALATION_COUNT, |
||||
LTM_ESCALATION_TIME, |
||||
LTM_ESCALATION_LATEST_RESULT, |
||||
LTM_NUM_LOCKTREES, |
||||
LTM_LOCK_REQUESTS_PENDING, |
||||
LTM_STO_NUM_ELIGIBLE, |
||||
LTM_STO_END_EARLY_COUNT, |
||||
LTM_STO_END_EARLY_TIME, |
||||
LTM_WAIT_COUNT, |
||||
LTM_WAIT_TIME, |
||||
LTM_LONG_WAIT_COUNT, |
||||
LTM_LONG_WAIT_TIME, |
||||
LTM_TIMEOUT_COUNT, |
||||
LTM_WAIT_ESCALATION_COUNT, |
||||
LTM_WAIT_ESCALATION_TIME, |
||||
LTM_LONG_WAIT_ESCALATION_COUNT, |
||||
LTM_LONG_WAIT_ESCALATION_TIME, |
||||
LTM_STATUS_NUM_ROWS // must be last
|
||||
}; |
||||
|
||||
void init(void); |
||||
void destroy(void); |
||||
|
||||
TOKU_ENGINE_STATUS_ROW_S status[LTM_STATUS_NUM_ROWS]; |
||||
|
||||
private: |
||||
bool m_initialized = false; |
||||
}; |
||||
typedef LTM_STATUS_S* LTM_STATUS; |
||||
extern LTM_STATUS_S ltm_status; |
||||
|
||||
#define LTM_STATUS_VAL(x) ltm_status.status[LTM_STATUS_S::x].value.num |
||||
|
||||
void toku_status_init(void); // just call ltm_status.init();
|
||||
void toku_status_destroy(void); // just call ltm_status.destroy();
|
@ -0,0 +1,139 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
limitations under the License. |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "concurrent_tree.h" |
||||
|
||||
// PORT #include <toku_assert.h>
|
||||
namespace toku { |
||||
|
||||
void concurrent_tree::create(const comparator *cmp) { |
||||
// start with an empty root node. we do this instead of
|
||||
// setting m_root to null so there's always a root to lock
|
||||
m_root.create_root(cmp); |
||||
} |
||||
|
||||
void concurrent_tree::destroy(void) { m_root.destroy_root(); } |
||||
|
||||
bool concurrent_tree::is_empty(void) { return m_root.is_empty(); } |
||||
|
||||
uint64_t concurrent_tree::get_insertion_memory_overhead(void) { |
||||
return sizeof(treenode); |
||||
} |
||||
|
||||
void concurrent_tree::locked_keyrange::prepare(concurrent_tree *tree) { |
||||
// the first step in acquiring a locked keyrange is locking the root
|
||||
treenode *const root = &tree->m_root; |
||||
m_tree = tree; |
||||
m_subtree = root; |
||||
m_range = keyrange::get_infinite_range(); |
||||
root->mutex_lock(); |
||||
} |
||||
|
||||
void concurrent_tree::locked_keyrange::acquire(const keyrange &range) { |
||||
treenode *const root = &m_tree->m_root; |
||||
|
||||
treenode *subtree; |
||||
if (root->is_empty() || root->range_overlaps(range)) { |
||||
subtree = root; |
||||
} else { |
||||
// we do not have a precomputed comparison hint, so pass null
|
||||
const keyrange::comparison *cmp_hint = nullptr; |
||||
subtree = root->find_node_with_overlapping_child(range, cmp_hint); |
||||
} |
||||
|
||||
// subtree is locked. it will be unlocked when this is release()'d
|
||||
invariant_notnull(subtree); |
||||
m_range = range; |
||||
m_subtree = subtree; |
||||
} |
||||
|
||||
bool concurrent_tree::locked_keyrange::add_shared_owner(const keyrange &range, |
||||
TXNID new_owner) { |
||||
return m_subtree->insert(range, new_owner, /*is_shared*/ true); |
||||
} |
||||
|
||||
void concurrent_tree::locked_keyrange::release(void) { |
||||
m_subtree->mutex_unlock(); |
||||
} |
||||
|
||||
void concurrent_tree::locked_keyrange::insert(const keyrange &range, |
||||
TXNID txnid, bool is_shared) { |
||||
// empty means no children, and only the root should ever be empty
|
||||
if (m_subtree->is_empty()) { |
||||
m_subtree->set_range_and_txnid(range, txnid, is_shared); |
||||
} else { |
||||
m_subtree->insert(range, txnid, is_shared); |
||||
} |
||||
} |
||||
|
||||
void concurrent_tree::locked_keyrange::remove(const keyrange &range, |
||||
TXNID txnid) { |
||||
invariant(!m_subtree->is_empty()); |
||||
treenode *new_subtree = m_subtree->remove(range, txnid); |
||||
// if removing range changed the root of the subtree,
|
||||
// then the subtree must be the root of the entire tree.
|
||||
if (new_subtree == nullptr) { |
||||
invariant(m_subtree->is_root()); |
||||
invariant(m_subtree->is_empty()); |
||||
} |
||||
} |
||||
|
||||
void concurrent_tree::locked_keyrange::remove_all(void) { |
||||
m_subtree->recursive_remove(); |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,174 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=2:softtabstop=2:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
limitations under the License. |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "../ft/comparator.h" |
||||
#include "keyrange.h" |
||||
#include "treenode.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// A concurrent_tree stores non-overlapping ranges.
|
||||
// Access to disjoint parts of the tree usually occurs concurrently.
|
||||
|
||||
class concurrent_tree { |
||||
public: |
||||
// A locked_keyrange gives you exclusive access to read and write
|
||||
// operations that occur on any keys in that range. You only have
|
||||
// the right to operate on keys in that range or keys that were read
|
||||
// from the keyrange using iterate()
|
||||
//
|
||||
// Access model:
|
||||
// - user prepares a locked keyrange. all threads serialize behind prepare().
|
||||
// - user breaks the serialzation point by acquiring a range, or releasing.
|
||||
// - one thread operates on a certain locked_keyrange object at a time.
|
||||
// - when the thread is finished, it releases
|
||||
|
||||
class locked_keyrange { |
||||
public: |
||||
// effect: prepare to acquire a locked keyrange over the given
|
||||
// concurrent_tree, preventing other threads from preparing
|
||||
// until this thread either does acquire() or release().
|
||||
// note: operations performed on a prepared keyrange are equivalent
|
||||
// to ones performed on an acquired keyrange over -inf, +inf.
|
||||
// rationale: this provides the user with a serialization point for
|
||||
// descending
|
||||
// or modifying the the tree. it also proives a convenient way of
|
||||
// doing serializable operations on the tree.
|
||||
// There are two valid sequences of calls:
|
||||
// - prepare, acquire, [operations], release
|
||||
// - prepare, [operations],release
|
||||
void prepare(concurrent_tree *tree); |
||||
|
||||
// requires: the locked keyrange was prepare()'d
|
||||
// effect: acquire a locked keyrange over the given concurrent_tree.
|
||||
// the locked keyrange represents the range of keys overlapped
|
||||
// by the given range
|
||||
void acquire(const keyrange &range); |
||||
|
||||
// effect: releases a locked keyrange and the mutex it holds
|
||||
void release(void); |
||||
|
||||
// effect: iterate over each range this locked_keyrange represents,
|
||||
// calling function->fn() on each node's keyrange and txnid
|
||||
// until there are no more or the function returns false
|
||||
template <class F> |
||||
void iterate(F *function) const { |
||||
// if the subtree is non-empty, traverse it by calling the given
|
||||
// function on each range, txnid pair found that overlaps.
|
||||
if (!m_subtree->is_empty()) { |
||||
m_subtree->traverse_overlaps(m_range, function); |
||||
} |
||||
} |
||||
|
||||
// Adds another owner to the lock on the specified keyrange.
|
||||
// requires: the keyrange contains one treenode whose bounds are
|
||||
// exactly equal to the specifed range (no sub/supersets)
|
||||
bool add_shared_owner(const keyrange &range, TXNID new_owner); |
||||
|
||||
// inserts the given range into the tree, with an associated txnid.
|
||||
// requires: range does not overlap with anything in this locked_keyrange
|
||||
// rationale: caller is responsible for only inserting unique ranges
|
||||
void insert(const keyrange &range, TXNID txnid, bool is_shared); |
||||
|
||||
// effect: removes the given range from the tree.
|
||||
// - txnid=TXNID_ANY means remove the range no matter what its
|
||||
// owners are
|
||||
// - Other value means remove the specified txnid from
|
||||
// ownership (if the range has other owners, it will remain
|
||||
// in the tree)
|
||||
// requires: range exists exactly in this locked_keyrange
|
||||
// rationale: caller is responsible for only removing existing ranges
|
||||
void remove(const keyrange &range, TXNID txnid); |
||||
|
||||
// effect: removes all of the keys represented by this locked keyrange
|
||||
// rationale: we'd like a fast way to empty out a tree
|
||||
void remove_all(void); |
||||
|
||||
private: |
||||
// the concurrent tree this locked keyrange is for
|
||||
concurrent_tree *m_tree; |
||||
|
||||
// the range of keys this locked keyrange represents
|
||||
keyrange m_range; |
||||
|
||||
// the subtree under which all overlapping ranges exist
|
||||
treenode *m_subtree; |
||||
|
||||
friend class concurrent_tree_unit_test; |
||||
}; |
||||
|
||||
// effect: initialize the tree to an empty state
|
||||
void create(const comparator *cmp); |
||||
|
||||
// effect: destroy the tree.
|
||||
// requires: tree is empty
|
||||
void destroy(void); |
||||
|
||||
// returns: true iff the tree is empty
|
||||
bool is_empty(void); |
||||
|
||||
// returns: the memory overhead of a single insertion into the tree
|
||||
static uint64_t get_insertion_memory_overhead(void); |
||||
|
||||
private: |
||||
// the root needs to always exist so there's a lock to grab
|
||||
// even if the tree is empty. that's why we store a treenode
|
||||
// here and not a pointer to one.
|
||||
treenode m_root; |
||||
|
||||
friend class concurrent_tree_unit_test; |
||||
}; |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,221 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "keyrange.h" |
||||
|
||||
#include "../util/dbt.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// create a keyrange by borrowing the left and right dbt
|
||||
// pointers. no memory is copied. no checks for infinity needed.
|
||||
void keyrange::create(const DBT *left, const DBT *right) { |
||||
init_empty(); |
||||
m_left_key = left; |
||||
m_right_key = right; |
||||
} |
||||
|
||||
// destroy the key copies. if they were never set, then destroy does nothing.
|
||||
void keyrange::destroy(void) { |
||||
toku_destroy_dbt(&m_left_key_copy); |
||||
toku_destroy_dbt(&m_right_key_copy); |
||||
} |
||||
|
||||
// create a keyrange by copying the keys from the given range.
|
||||
void keyrange::create_copy(const keyrange &range) { |
||||
// start with an initialized, empty range
|
||||
init_empty(); |
||||
|
||||
// optimize the case where the left and right keys are the same.
|
||||
// we'd like to only have one copy of the data.
|
||||
if (toku_dbt_equals(range.get_left_key(), range.get_right_key())) { |
||||
set_both_keys(range.get_left_key()); |
||||
} else { |
||||
// replace our empty left and right keys with
|
||||
// copies of the range's left and right keys
|
||||
replace_left_key(range.get_left_key()); |
||||
replace_right_key(range.get_right_key()); |
||||
} |
||||
} |
||||
|
||||
// extend this keyrange by choosing the leftmost and rightmost
|
||||
// endpoints between this range and the given. replaced keys
|
||||
// in this range are freed and inherited keys are copied.
|
||||
void keyrange::extend(const comparator &cmp, const keyrange &range) { |
||||
const DBT *range_left = range.get_left_key(); |
||||
const DBT *range_right = range.get_right_key(); |
||||
if (cmp(range_left, get_left_key()) < 0) { |
||||
replace_left_key(range_left); |
||||
} |
||||
if (cmp(range_right, get_right_key()) > 0) { |
||||
replace_right_key(range_right); |
||||
} |
||||
} |
||||
|
||||
// how much memory does this keyrange take?
|
||||
// - the size of the left and right keys
|
||||
// --- ignore the fact that we may have optimized the point case.
|
||||
// it complicates things for little gain.
|
||||
// - the size of the keyrange class itself
|
||||
uint64_t keyrange::get_memory_size(void) const { |
||||
const DBT *left_key = get_left_key(); |
||||
const DBT *right_key = get_right_key(); |
||||
return left_key->size + right_key->size + sizeof(keyrange); |
||||
} |
||||
|
||||
// compare ranges.
|
||||
keyrange::comparison keyrange::compare(const comparator &cmp, |
||||
const keyrange &range) const { |
||||
if (cmp(get_right_key(), range.get_left_key()) < 0) { |
||||
return comparison::LESS_THAN; |
||||
} else if (cmp(get_left_key(), range.get_right_key()) > 0) { |
||||
return comparison::GREATER_THAN; |
||||
} else if (cmp(get_left_key(), range.get_left_key()) == 0 && |
||||
cmp(get_right_key(), range.get_right_key()) == 0) { |
||||
return comparison::EQUALS; |
||||
} else { |
||||
return comparison::OVERLAPS; |
||||
} |
||||
} |
||||
|
||||
bool keyrange::overlaps(const comparator &cmp, const keyrange &range) const { |
||||
// equality is a stronger form of overlapping.
|
||||
// so two ranges "overlap" if they're either equal or just overlapping.
|
||||
comparison c = compare(cmp, range); |
||||
return c == comparison::EQUALS || c == comparison::OVERLAPS; |
||||
} |
||||
|
||||
keyrange keyrange::get_infinite_range(void) { |
||||
keyrange range; |
||||
range.create(toku_dbt_negative_infinity(), toku_dbt_positive_infinity()); |
||||
return range; |
||||
} |
||||
|
||||
void keyrange::init_empty(void) { |
||||
m_left_key = nullptr; |
||||
m_right_key = nullptr; |
||||
toku_init_dbt(&m_left_key_copy); |
||||
toku_init_dbt(&m_right_key_copy); |
||||
m_point_range = false; |
||||
} |
||||
|
||||
const DBT *keyrange::get_left_key(void) const { |
||||
if (m_left_key) { |
||||
return m_left_key; |
||||
} else { |
||||
return &m_left_key_copy; |
||||
} |
||||
} |
||||
|
||||
const DBT *keyrange::get_right_key(void) const { |
||||
if (m_right_key) { |
||||
return m_right_key; |
||||
} else { |
||||
return &m_right_key_copy; |
||||
} |
||||
} |
||||
|
||||
// copy the given once and set both the left and right pointers.
|
||||
// optimization for point ranges, so the left and right ranges
|
||||
// are not copied twice.
|
||||
void keyrange::set_both_keys(const DBT *key) { |
||||
if (toku_dbt_is_infinite(key)) { |
||||
m_left_key = key; |
||||
m_right_key = key; |
||||
} else { |
||||
toku_clone_dbt(&m_left_key_copy, *key); |
||||
toku_copyref_dbt(&m_right_key_copy, m_left_key_copy); |
||||
} |
||||
m_point_range = true; |
||||
} |
||||
|
||||
// destroy the current left key. set and possibly copy the new one
|
||||
void keyrange::replace_left_key(const DBT *key) { |
||||
// a little magic:
|
||||
//
|
||||
// if this is a point range, then the left and right keys share
|
||||
// one copy of the data, and it lives in the left key copy. so
|
||||
// if we're replacing the left key, move the real data to the
|
||||
// right key copy instead of destroying it. now, the memory is
|
||||
// owned by the right key and the left key may be replaced.
|
||||
if (m_point_range) { |
||||
m_right_key_copy = m_left_key_copy; |
||||
} else { |
||||
toku_destroy_dbt(&m_left_key_copy); |
||||
} |
||||
|
||||
if (toku_dbt_is_infinite(key)) { |
||||
m_left_key = key; |
||||
} else { |
||||
toku_clone_dbt(&m_left_key_copy, *key); |
||||
m_left_key = nullptr; |
||||
} |
||||
m_point_range = false; |
||||
} |
||||
|
||||
// destroy the current right key. set and possibly copy the new one
|
||||
void keyrange::replace_right_key(const DBT *key) { |
||||
toku_destroy_dbt(&m_right_key_copy); |
||||
if (toku_dbt_is_infinite(key)) { |
||||
m_right_key = key; |
||||
} else { |
||||
toku_clone_dbt(&m_right_key_copy, *key); |
||||
m_right_key = nullptr; |
||||
} |
||||
m_point_range = false; |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,140 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "../ft/comparator.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// A keyrange has a left and right key as endpoints.
|
||||
//
|
||||
// When a keyrange is created it owns no memory, but when it copies
|
||||
// or extends another keyrange, it copies memory as necessary. This
|
||||
// means it is cheap in the common case.
|
||||
|
||||
class keyrange { |
||||
public: |
||||
// effect: constructor that borrows left and right key pointers.
|
||||
// no memory is allocated or copied.
|
||||
void create(const DBT *left_key, const DBT *right_key); |
||||
|
||||
// effect: constructor that allocates and copies another keyrange's points.
|
||||
void create_copy(const keyrange &range); |
||||
|
||||
// effect: destroys the keyrange, freeing any allocated memory
|
||||
void destroy(void); |
||||
|
||||
// effect: extends the keyrange by choosing the leftmost and rightmost
|
||||
// endpoints from this range and the given range.
|
||||
// replaced keys in this range are freed, new keys are copied.
|
||||
void extend(const comparator &cmp, const keyrange &range); |
||||
|
||||
// returns: the amount of memory this keyrange takes. does not account
|
||||
// for point optimizations or malloc overhead.
|
||||
uint64_t get_memory_size(void) const; |
||||
|
||||
// returns: pointer to the left key of this range
|
||||
const DBT *get_left_key(void) const; |
||||
|
||||
// returns: pointer to the right key of this range
|
||||
const DBT *get_right_key(void) const; |
||||
|
||||
// two ranges are either equal, lt, gt, or overlapping
|
||||
enum comparison { EQUALS, LESS_THAN, GREATER_THAN, OVERLAPS }; |
||||
|
||||
// effect: compares this range to the given range
|
||||
// returns: LESS_THAN if given range is strictly to the left
|
||||
// GREATER_THAN if given range is strictly to the right
|
||||
// EQUALS if given range has the same left and right endpoints
|
||||
// OVERLAPS if at least one of the given range's endpoints falls
|
||||
// between this range's endpoints
|
||||
comparison compare(const comparator &cmp, const keyrange &range) const; |
||||
|
||||
// returns: true if the range and the given range are equal or overlapping
|
||||
bool overlaps(const comparator &cmp, const keyrange &range) const; |
||||
|
||||
// returns: a keyrange representing -inf, +inf
|
||||
static keyrange get_infinite_range(void); |
||||
|
||||
private: |
||||
// some keys should be copied, some keys should not be.
|
||||
//
|
||||
// to support both, we use two DBTs for copies and two pointers
|
||||
// for temporaries. the access rule is:
|
||||
// - if a pointer is non-null, then it reprsents the key.
|
||||
// - otherwise the pointer is null, and the key is in the copy.
|
||||
DBT m_left_key_copy; |
||||
DBT m_right_key_copy; |
||||
const DBT *m_left_key; |
||||
const DBT *m_right_key; |
||||
|
||||
// if this range is a point range, then m_left_key == m_right_key
|
||||
// and the actual data is stored exactly once in m_left_key_copy.
|
||||
bool m_point_range; |
||||
|
||||
// effect: initializes a keyrange to be empty
|
||||
void init_empty(void); |
||||
|
||||
// effect: copies the given key once into the left key copy
|
||||
// and sets the right key copy to share the left.
|
||||
// rationale: optimization for point ranges to only do one malloc
|
||||
void set_both_keys(const DBT *key); |
||||
|
||||
// effect: destroys the current left key. sets and copies the new one.
|
||||
void replace_left_key(const DBT *key); |
||||
|
||||
// effect: destroys the current right key. sets and copies the new one.
|
||||
void replace_right_key(const DBT *key); |
||||
}; |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,534 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "lock_request.h" |
||||
|
||||
#include "../portability/toku_race_tools.h" |
||||
#include "../portability/txn_subst.h" |
||||
#include "../util/dbt.h" |
||||
#include "locktree.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// initialize a lock request's internals
|
||||
void lock_request::create(toku_external_mutex_factory_t mutex_factory) { |
||||
m_txnid = TXNID_NONE; |
||||
m_conflicting_txnid = TXNID_NONE; |
||||
m_start_time = 0; |
||||
m_left_key = nullptr; |
||||
m_right_key = nullptr; |
||||
toku_init_dbt(&m_left_key_copy); |
||||
toku_init_dbt(&m_right_key_copy); |
||||
|
||||
m_type = type::UNKNOWN; |
||||
m_lt = nullptr; |
||||
|
||||
m_complete_r = 0; |
||||
m_state = state::UNINITIALIZED; |
||||
m_info = nullptr; |
||||
|
||||
// psergey-todo: this condition is for interruptible wait
|
||||
// note: moved to here from lock_request::create:
|
||||
toku_external_cond_init(mutex_factory, &m_wait_cond); |
||||
|
||||
m_start_test_callback = nullptr; |
||||
m_start_before_pending_test_callback = nullptr; |
||||
m_retry_test_callback = nullptr; |
||||
} |
||||
|
||||
// destroy a lock request.
|
||||
void lock_request::destroy(void) { |
||||
invariant(m_state != state::PENDING); |
||||
invariant(m_state != state::DESTROYED); |
||||
m_state = state::DESTROYED; |
||||
toku_destroy_dbt(&m_left_key_copy); |
||||
toku_destroy_dbt(&m_right_key_copy); |
||||
toku_external_cond_destroy(&m_wait_cond); |
||||
} |
||||
|
||||
// set the lock request parameters. this API allows a lock request to be reused.
|
||||
void lock_request::set(locktree *lt, TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key, lock_request::type lock_type, |
||||
bool big_txn, void *extra) { |
||||
invariant(m_state != state::PENDING); |
||||
m_lt = lt; |
||||
|
||||
m_txnid = txnid; |
||||
m_left_key = left_key; |
||||
m_right_key = right_key; |
||||
toku_destroy_dbt(&m_left_key_copy); |
||||
toku_destroy_dbt(&m_right_key_copy); |
||||
m_type = lock_type; |
||||
m_state = state::INITIALIZED; |
||||
m_info = lt ? lt->get_lock_request_info() : nullptr; |
||||
m_big_txn = big_txn; |
||||
m_extra = extra; |
||||
} |
||||
|
||||
// get rid of any stored left and right key copies and
|
||||
// replace them with copies of the given left and right key
|
||||
void lock_request::copy_keys() { |
||||
if (!toku_dbt_is_infinite(m_left_key)) { |
||||
toku_clone_dbt(&m_left_key_copy, *m_left_key); |
||||
m_left_key = &m_left_key_copy; |
||||
} |
||||
if (!toku_dbt_is_infinite(m_right_key)) { |
||||
toku_clone_dbt(&m_right_key_copy, *m_right_key); |
||||
m_right_key = &m_right_key_copy; |
||||
} |
||||
} |
||||
|
||||
// what are the conflicts for this pending lock request?
|
||||
void lock_request::get_conflicts(txnid_set *conflicts) { |
||||
invariant(m_state == state::PENDING); |
||||
const bool is_write_request = m_type == type::WRITE; |
||||
m_lt->get_conflicts(is_write_request, m_txnid, m_left_key, m_right_key, |
||||
conflicts); |
||||
} |
||||
|
||||
// build a wait-for-graph for this lock request and the given conflict set
|
||||
// for each transaction B that blocks A's lock request
|
||||
// if B is blocked then
|
||||
// add (A,T) to the WFG and if B is new, fill in the WFG from B
|
||||
void lock_request::build_wait_graph(wfg *wait_graph, |
||||
const txnid_set &conflicts) { |
||||
uint32_t num_conflicts = conflicts.size(); |
||||
for (uint32_t i = 0; i < num_conflicts; i++) { |
||||
TXNID conflicting_txnid = conflicts.get(i); |
||||
lock_request *conflicting_request = find_lock_request(conflicting_txnid); |
||||
invariant(conflicting_txnid != m_txnid); |
||||
invariant(conflicting_request != this); |
||||
if (conflicting_request) { |
||||
bool already_exists = wait_graph->node_exists(conflicting_txnid); |
||||
wait_graph->add_edge(m_txnid, conflicting_txnid); |
||||
if (!already_exists) { |
||||
// recursively build the wait for graph rooted at the conflicting
|
||||
// request, given its set of lock conflicts.
|
||||
txnid_set other_conflicts; |
||||
other_conflicts.create(); |
||||
conflicting_request->get_conflicts(&other_conflicts); |
||||
conflicting_request->build_wait_graph(wait_graph, other_conflicts); |
||||
other_conflicts.destroy(); |
||||
} |
||||
} |
||||
} |
||||
} |
||||
|
||||
// returns: true if the current set of lock requests contains
|
||||
// a deadlock, false otherwise.
|
||||
bool lock_request::deadlock_exists(const txnid_set &conflicts) { |
||||
wfg wait_graph; |
||||
wait_graph.create(); |
||||
|
||||
build_wait_graph(&wait_graph, conflicts); |
||||
|
||||
std::function<void(TXNID)> reporter; |
||||
if (m_deadlock_cb) { |
||||
reporter = [this](TXNID a) { |
||||
lock_request *req = find_lock_request(a); |
||||
if (req) { |
||||
m_deadlock_cb(req->m_txnid, (req->m_type == lock_request::WRITE), |
||||
req->m_left_key, req->m_right_key); |
||||
} |
||||
}; |
||||
} |
||||
|
||||
bool deadlock = wait_graph.cycle_exists_from_txnid(m_txnid, reporter); |
||||
wait_graph.destroy(); |
||||
return deadlock; |
||||
} |
||||
|
||||
// try to acquire a lock described by this lock request.
|
||||
int lock_request::start(void) { |
||||
int r; |
||||
|
||||
txnid_set conflicts; |
||||
conflicts.create(); |
||||
if (m_type == type::WRITE) { |
||||
r = m_lt->acquire_write_lock(m_txnid, m_left_key, m_right_key, &conflicts, |
||||
m_big_txn); |
||||
} else { |
||||
invariant(m_type == type::READ); |
||||
r = m_lt->acquire_read_lock(m_txnid, m_left_key, m_right_key, &conflicts, |
||||
m_big_txn); |
||||
} |
||||
|
||||
// if the lock is not granted, save it to the set of lock requests
|
||||
// and check for a deadlock. if there is one, complete it as failed
|
||||
if (r == DB_LOCK_NOTGRANTED) { |
||||
copy_keys(); |
||||
m_state = state::PENDING; |
||||
m_start_time = toku_current_time_microsec() / 1000; |
||||
m_conflicting_txnid = conflicts.get(0); |
||||
if (m_start_before_pending_test_callback) |
||||
m_start_before_pending_test_callback(); |
||||
toku_external_mutex_lock(&m_info->mutex); |
||||
insert_into_lock_requests(); |
||||
if (deadlock_exists(conflicts)) { |
||||
remove_from_lock_requests(); |
||||
r = DB_LOCK_DEADLOCK; |
||||
} |
||||
toku_external_mutex_unlock(&m_info->mutex); |
||||
if (m_start_test_callback) m_start_test_callback(); // test callback
|
||||
} |
||||
|
||||
if (r != DB_LOCK_NOTGRANTED) { |
||||
complete(r); |
||||
} |
||||
|
||||
conflicts.destroy(); |
||||
return r; |
||||
} |
||||
|
||||
// sleep on the lock request until it becomes resolved or the wait time has
|
||||
// elapsed.
|
||||
int lock_request::wait(uint64_t wait_time_ms) { |
||||
return wait(wait_time_ms, 0, nullptr); |
||||
} |
||||
|
||||
int lock_request::wait(uint64_t wait_time_ms, uint64_t killed_time_ms, |
||||
int (*killed_callback)(void), |
||||
void (*lock_wait_callback)(void *, TXNID, TXNID), |
||||
void *callback_arg) { |
||||
uint64_t t_now = toku_current_time_microsec(); |
||||
uint64_t t_start = t_now; |
||||
uint64_t t_end = t_start + wait_time_ms * 1000; |
||||
|
||||
toku_external_mutex_lock(&m_info->mutex); |
||||
|
||||
// check again, this time locking out other retry calls
|
||||
if (m_state == state::PENDING) { |
||||
GrowableArray<TXNID> conflicts_collector; |
||||
conflicts_collector.init(); |
||||
retry(&conflicts_collector); |
||||
if (m_state == state::PENDING) { |
||||
report_waits(&conflicts_collector, lock_wait_callback, callback_arg); |
||||
} |
||||
conflicts_collector.deinit(); |
||||
} |
||||
|
||||
while (m_state == state::PENDING) { |
||||
// check if this thread is killed
|
||||
if (killed_callback && killed_callback()) { |
||||
remove_from_lock_requests(); |
||||
complete(DB_LOCK_NOTGRANTED); |
||||
continue; |
||||
} |
||||
|
||||
// compute the time until we should wait
|
||||
uint64_t t_wait; |
||||
if (killed_time_ms == 0) { |
||||
t_wait = t_end; |
||||
} else { |
||||
t_wait = t_now + killed_time_ms * 1000; |
||||
if (t_wait > t_end) t_wait = t_end; |
||||
} |
||||
|
||||
int r = toku_external_cond_timedwait(&m_wait_cond, &m_info->mutex, |
||||
(int64_t)(t_wait - t_now)); |
||||
invariant(r == 0 || r == ETIMEDOUT); |
||||
|
||||
t_now = toku_current_time_microsec(); |
||||
if (m_state == state::PENDING && (t_now >= t_end)) { |
||||
m_info->counters.timeout_count += 1; |
||||
|
||||
// if we're still pending and we timed out, then remove our
|
||||
// request from the set of lock requests and fail.
|
||||
remove_from_lock_requests(); |
||||
|
||||
// complete sets m_state to COMPLETE, breaking us out of the loop
|
||||
complete(DB_LOCK_NOTGRANTED); |
||||
} |
||||
} |
||||
|
||||
uint64_t t_real_end = toku_current_time_microsec(); |
||||
uint64_t duration = t_real_end - t_start; |
||||
m_info->counters.wait_count += 1; |
||||
m_info->counters.wait_time += duration; |
||||
if (duration >= 1000000) { |
||||
m_info->counters.long_wait_count += 1; |
||||
m_info->counters.long_wait_time += duration; |
||||
} |
||||
toku_external_mutex_unlock(&m_info->mutex); |
||||
|
||||
invariant(m_state == state::COMPLETE); |
||||
return m_complete_r; |
||||
} |
||||
|
||||
// complete this lock request with the given return value
|
||||
void lock_request::complete(int complete_r) { |
||||
m_complete_r = complete_r; |
||||
m_state = state::COMPLETE; |
||||
} |
||||
|
||||
const DBT *lock_request::get_left_key(void) const { return m_left_key; } |
||||
|
||||
const DBT *lock_request::get_right_key(void) const { return m_right_key; } |
||||
|
||||
TXNID lock_request::get_txnid(void) const { return m_txnid; } |
||||
|
||||
uint64_t lock_request::get_start_time(void) const { return m_start_time; } |
||||
|
||||
TXNID lock_request::get_conflicting_txnid(void) const { |
||||
return m_conflicting_txnid; |
||||
} |
||||
|
||||
int lock_request::retry(GrowableArray<TXNID> *conflicts_collector) { |
||||
invariant(m_state == state::PENDING); |
||||
int r; |
||||
txnid_set conflicts; |
||||
conflicts.create(); |
||||
|
||||
if (m_type == type::WRITE) { |
||||
r = m_lt->acquire_write_lock(m_txnid, m_left_key, m_right_key, &conflicts, |
||||
m_big_txn); |
||||
} else { |
||||
r = m_lt->acquire_read_lock(m_txnid, m_left_key, m_right_key, &conflicts, |
||||
m_big_txn); |
||||
} |
||||
|
||||
// if the acquisition succeeded then remove ourselves from the
|
||||
// set of lock requests, complete, and signal the waiting thread.
|
||||
if (r == 0) { |
||||
remove_from_lock_requests(); |
||||
complete(r); |
||||
if (m_retry_test_callback) m_retry_test_callback(); // test callback
|
||||
toku_external_cond_broadcast(&m_wait_cond); |
||||
} else { |
||||
m_conflicting_txnid = conflicts.get(0); |
||||
add_conflicts_to_waits(&conflicts, conflicts_collector); |
||||
} |
||||
conflicts.destroy(); |
||||
|
||||
return r; |
||||
} |
||||
|
||||
void lock_request::retry_all_lock_requests( |
||||
locktree *lt, void (*lock_wait_callback)(void *, TXNID, TXNID), |
||||
void *callback_arg, void (*after_retry_all_test_callback)(void)) { |
||||
lt_lock_request_info *info = lt->get_lock_request_info(); |
||||
|
||||
// if there are no pending lock requests than there is nothing to do
|
||||
// the unlocked data race on pending_is_empty is OK since lock requests
|
||||
// are retried after added to the pending set.
|
||||
if (info->pending_is_empty) return; |
||||
|
||||
// get my retry generation (post increment of retry_want)
|
||||
unsigned long long my_retry_want = (info->retry_want += 1); |
||||
|
||||
toku_mutex_lock(&info->retry_mutex); |
||||
|
||||
GrowableArray<TXNID> conflicts_collector; |
||||
conflicts_collector.init(); |
||||
|
||||
// here is the group retry algorithm.
|
||||
// get the latest retry_want count and use it as the generation number of
|
||||
// this retry operation. if this retry generation is > the last retry
|
||||
// generation, then do the lock retries. otherwise, no lock retries
|
||||
// are needed.
|
||||
if ((my_retry_want - 1) == info->retry_done) { |
||||
for (;;) { |
||||
if (!info->running_retry) { |
||||
info->running_retry = true; |
||||
info->retry_done = info->retry_want; |
||||
toku_mutex_unlock(&info->retry_mutex); |
||||
retry_all_lock_requests_info(info, &conflicts_collector); |
||||
if (after_retry_all_test_callback) after_retry_all_test_callback(); |
||||
toku_mutex_lock(&info->retry_mutex); |
||||
info->running_retry = false; |
||||
toku_cond_broadcast(&info->retry_cv); |
||||
break; |
||||
} else { |
||||
toku_cond_wait(&info->retry_cv, &info->retry_mutex); |
||||
} |
||||
} |
||||
} |
||||
toku_mutex_unlock(&info->retry_mutex); |
||||
|
||||
report_waits(&conflicts_collector, lock_wait_callback, callback_arg); |
||||
conflicts_collector.deinit(); |
||||
} |
||||
|
||||
void lock_request::retry_all_lock_requests_info( |
||||
lt_lock_request_info *info, GrowableArray<TXNID> *collector) { |
||||
toku_external_mutex_lock(&info->mutex); |
||||
// retry all of the pending lock requests.
|
||||
for (uint32_t i = 0; i < info->pending_lock_requests.size();) { |
||||
lock_request *request; |
||||
int r = info->pending_lock_requests.fetch(i, &request); |
||||
invariant_zero(r); |
||||
|
||||
// retry the lock request. if it didn't succeed,
|
||||
// move on to the next lock request. otherwise
|
||||
// the request is gone from the list so we may
|
||||
// read the i'th entry for the next one.
|
||||
r = request->retry(collector); |
||||
if (r != 0) { |
||||
i++; |
||||
} |
||||
} |
||||
|
||||
// future threads should only retry lock requests if some still exist
|
||||
info->should_retry_lock_requests = info->pending_lock_requests.size() > 0; |
||||
toku_external_mutex_unlock(&info->mutex); |
||||
} |
||||
|
||||
void lock_request::add_conflicts_to_waits( |
||||
txnid_set *conflicts, GrowableArray<TXNID> *wait_conflicts) { |
||||
uint32_t num_conflicts = conflicts->size(); |
||||
for (uint32_t i = 0; i < num_conflicts; i++) { |
||||
wait_conflicts->push(m_txnid); |
||||
wait_conflicts->push(conflicts->get(i)); |
||||
} |
||||
} |
||||
|
||||
void lock_request::report_waits(GrowableArray<TXNID> *wait_conflicts, |
||||
void (*lock_wait_callback)(void *, TXNID, |
||||
TXNID), |
||||
void *callback_arg) { |
||||
if (!lock_wait_callback) return; |
||||
size_t num_conflicts = wait_conflicts->get_size(); |
||||
for (size_t i = 0; i < num_conflicts; i += 2) { |
||||
TXNID blocked_txnid = wait_conflicts->fetch_unchecked(i); |
||||
TXNID blocking_txnid = wait_conflicts->fetch_unchecked(i + 1); |
||||
(*lock_wait_callback)(callback_arg, blocked_txnid, blocking_txnid); |
||||
} |
||||
} |
||||
|
||||
void *lock_request::get_extra(void) const { return m_extra; } |
||||
|
||||
void lock_request::kill_waiter(void) { |
||||
remove_from_lock_requests(); |
||||
complete(DB_LOCK_NOTGRANTED); |
||||
toku_external_cond_broadcast(&m_wait_cond); |
||||
} |
||||
|
||||
void lock_request::kill_waiter(locktree *lt, void *extra) { |
||||
lt_lock_request_info *info = lt->get_lock_request_info(); |
||||
toku_external_mutex_lock(&info->mutex); |
||||
for (uint32_t i = 0; i < info->pending_lock_requests.size(); i++) { |
||||
lock_request *request; |
||||
int r = info->pending_lock_requests.fetch(i, &request); |
||||
if (r == 0 && request->get_extra() == extra) { |
||||
request->kill_waiter(); |
||||
break; |
||||
} |
||||
} |
||||
toku_external_mutex_unlock(&info->mutex); |
||||
} |
||||
|
||||
// find another lock request by txnid. must hold the mutex.
|
||||
lock_request *lock_request::find_lock_request(const TXNID &txnid) { |
||||
lock_request *request; |
||||
int r = m_info->pending_lock_requests.find_zero<TXNID, find_by_txnid>( |
||||
txnid, &request, nullptr); |
||||
if (r != 0) { |
||||
request = nullptr; |
||||
} |
||||
return request; |
||||
} |
||||
|
||||
// insert this lock request into the locktree's set. must hold the mutex.
|
||||
void lock_request::insert_into_lock_requests(void) { |
||||
uint32_t idx; |
||||
lock_request *request; |
||||
int r = m_info->pending_lock_requests.find_zero<TXNID, find_by_txnid>( |
||||
m_txnid, &request, &idx); |
||||
invariant(r == DB_NOTFOUND); |
||||
r = m_info->pending_lock_requests.insert_at(this, idx); |
||||
invariant_zero(r); |
||||
m_info->pending_is_empty = false; |
||||
} |
||||
|
||||
// remove this lock request from the locktree's set. must hold the mutex.
|
||||
void lock_request::remove_from_lock_requests(void) { |
||||
uint32_t idx; |
||||
lock_request *request; |
||||
int r = m_info->pending_lock_requests.find_zero<TXNID, find_by_txnid>( |
||||
m_txnid, &request, &idx); |
||||
invariant_zero(r); |
||||
invariant(request == this); |
||||
r = m_info->pending_lock_requests.delete_at(idx); |
||||
invariant_zero(r); |
||||
if (m_info->pending_lock_requests.size() == 0) |
||||
m_info->pending_is_empty = true; |
||||
} |
||||
|
||||
int lock_request::find_by_txnid(lock_request *const &request, |
||||
const TXNID &txnid) { |
||||
TXNID request_txnid = request->m_txnid; |
||||
if (request_txnid < txnid) { |
||||
return -1; |
||||
} else if (request_txnid == txnid) { |
||||
return 0; |
||||
} else { |
||||
return 1; |
||||
} |
||||
} |
||||
|
||||
void lock_request::set_start_test_callback(void (*f)(void)) { |
||||
m_start_test_callback = f; |
||||
} |
||||
|
||||
void lock_request::set_start_before_pending_test_callback(void (*f)(void)) { |
||||
m_start_before_pending_test_callback = f; |
||||
} |
||||
|
||||
void lock_request::set_retry_test_callback(void (*f)(void)) { |
||||
m_retry_test_callback = f; |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,238 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "../db.h" |
||||
#include "../ft/comparator.h" |
||||
#include "../portability/toku_pthread.h" |
||||
#include "locktree.h" |
||||
#include "txnid_set.h" |
||||
#include "wfg.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// A lock request contains the db, the key range, the lock type, and
|
||||
// the transaction id that describes a potential row range lock.
|
||||
//
|
||||
// the typical use case is:
|
||||
// - initialize a lock request
|
||||
// - start to try to acquire the lock
|
||||
// - do something else
|
||||
// - wait for the lock request to be resolved on a timed condition
|
||||
// - destroy the lock request
|
||||
// a lock request is resolved when its state is no longer pending, or
|
||||
// when it becomes granted, or timedout, or deadlocked. when resolved, the
|
||||
// state of the lock request is changed and any waiting threads are awakened.
|
||||
|
||||
class lock_request { |
||||
public: |
||||
enum type { UNKNOWN, READ, WRITE }; |
||||
|
||||
// effect: Initializes a lock request.
|
||||
void create(toku_external_mutex_factory_t mutex_factory); |
||||
|
||||
// effect: Destroys a lock request.
|
||||
void destroy(void); |
||||
|
||||
// effect: Resets the lock request parameters, allowing it to be reused.
|
||||
// requires: Lock request was already created at some point
|
||||
void set(locktree *lt, TXNID txnid, const DBT *left_key, const DBT *right_key, |
||||
type lock_type, bool big_txn, void *extra = nullptr); |
||||
|
||||
// effect: Tries to acquire a lock described by this lock request.
|
||||
// returns: The return code of locktree::acquire_[write,read]_lock()
|
||||
// or DB_LOCK_DEADLOCK if this request would end up deadlocked.
|
||||
int start(void); |
||||
|
||||
// effect: Sleeps until either the request is granted or the wait time
|
||||
// expires. returns: The return code of locktree::acquire_[write,read]_lock()
|
||||
// or simply DB_LOCK_NOTGRANTED if the wait time expired.
|
||||
int wait(uint64_t wait_time_ms); |
||||
int wait(uint64_t wait_time_ms, uint64_t killed_time_ms, |
||||
int (*killed_callback)(void), |
||||
void (*lock_wait_callback)(void *, TXNID, TXNID) = nullptr, |
||||
void *callback_arg = nullptr); |
||||
|
||||
// return: left end-point of the lock range
|
||||
const DBT *get_left_key(void) const; |
||||
|
||||
// return: right end-point of the lock range
|
||||
const DBT *get_right_key(void) const; |
||||
|
||||
// return: the txnid waiting for a lock
|
||||
TXNID get_txnid(void) const; |
||||
|
||||
// return: when this lock request started, as milliseconds from epoch
|
||||
uint64_t get_start_time(void) const; |
||||
|
||||
// return: which txnid is blocking this request (there may be more, though)
|
||||
TXNID get_conflicting_txnid(void) const; |
||||
|
||||
// effect: Retries all of the lock requests for the given locktree.
|
||||
// Any lock requests successfully restarted is completed and woken
|
||||
// up.
|
||||
// The rest remain pending.
|
||||
static void retry_all_lock_requests( |
||||
locktree *lt, void (*lock_wait_callback)(void *, TXNID, TXNID) = nullptr, |
||||
void *callback_arg = nullptr, |
||||
void (*after_retry_test_callback)(void) = nullptr); |
||||
static void retry_all_lock_requests_info(lt_lock_request_info *info, |
||||
GrowableArray<TXNID> *collector); |
||||
|
||||
void set_start_test_callback(void (*f)(void)); |
||||
void set_start_before_pending_test_callback(void (*f)(void)); |
||||
void set_retry_test_callback(void (*f)(void)); |
||||
|
||||
void *get_extra(void) const; |
||||
|
||||
void kill_waiter(void); |
||||
static void kill_waiter(locktree *lt, void *extra); |
||||
|
||||
private: |
||||
enum state { |
||||
UNINITIALIZED, |
||||
INITIALIZED, |
||||
PENDING, |
||||
COMPLETE, |
||||
DESTROYED, |
||||
}; |
||||
|
||||
// The keys for a lock request are stored "unowned" in m_left_key
|
||||
// and m_right_key. When the request is about to go to sleep, it
|
||||
// copies these keys and stores them in m_left_key_copy etc and
|
||||
// sets the temporary pointers to null.
|
||||
TXNID m_txnid; |
||||
TXNID m_conflicting_txnid; |
||||
uint64_t m_start_time; |
||||
const DBT *m_left_key; |
||||
const DBT *m_right_key; |
||||
DBT m_left_key_copy; |
||||
DBT m_right_key_copy; |
||||
|
||||
// The lock request type and associated locktree
|
||||
type m_type; |
||||
locktree *m_lt; |
||||
|
||||
// If the lock request is in the completed state, then its
|
||||
// final return value is stored in m_complete_r
|
||||
int m_complete_r; |
||||
state m_state; |
||||
|
||||
toku_external_cond_t m_wait_cond; |
||||
|
||||
bool m_big_txn; |
||||
|
||||
// the lock request info state stored in the
|
||||
// locktree that this lock request is for.
|
||||
struct lt_lock_request_info *m_info; |
||||
|
||||
void *m_extra; |
||||
|
||||
// effect: tries again to acquire the lock described by this lock request
|
||||
// returns: 0 if retrying the request succeeded and is now complete
|
||||
int retry(GrowableArray<TXNID> *conflict_collector); |
||||
|
||||
void complete(int complete_r); |
||||
|
||||
// effect: Finds another lock request by txnid.
|
||||
// requires: The lock request info mutex is held
|
||||
lock_request *find_lock_request(const TXNID &txnid); |
||||
|
||||
// effect: Insert this lock request into the locktree's set.
|
||||
// requires: the locktree's mutex is held
|
||||
void insert_into_lock_requests(void); |
||||
|
||||
// effect: Removes this lock request from the locktree's set.
|
||||
// requires: The lock request info mutex is held
|
||||
void remove_from_lock_requests(void); |
||||
|
||||
// effect: Asks this request's locktree which txnids are preventing
|
||||
// us from getting the lock described by this request.
|
||||
// returns: conflicts is populated with the txnid's that this request
|
||||
// is blocked on
|
||||
void get_conflicts(txnid_set *conflicts); |
||||
|
||||
// effect: Builds a wait-for-graph for this lock request and the given
|
||||
// conflict set
|
||||
void build_wait_graph(wfg *wait_graph, const txnid_set &conflicts); |
||||
|
||||
// returns: True if this lock request is in deadlock with the given conflicts
|
||||
// set
|
||||
bool deadlock_exists(const txnid_set &conflicts); |
||||
|
||||
void copy_keys(void); |
||||
|
||||
static int find_by_txnid(lock_request *const &request, const TXNID &txnid); |
||||
|
||||
// Report list of conflicts to lock wait callback.
|
||||
static void report_waits(GrowableArray<TXNID> *wait_conflicts, |
||||
void (*lock_wait_callback)(void *, TXNID, TXNID), |
||||
void *callback_arg); |
||||
void add_conflicts_to_waits(txnid_set *conflicts, |
||||
GrowableArray<TXNID> *wait_conflicts); |
||||
|
||||
void (*m_start_test_callback)(void); |
||||
void (*m_start_before_pending_test_callback)(void); |
||||
void (*m_retry_test_callback)(void); |
||||
|
||||
public: |
||||
std::function<void(TXNID, bool, const DBT *, const DBT *)> m_deadlock_cb; |
||||
|
||||
friend class lock_request_unit_test; |
||||
}; |
||||
// PORT: lock_request is not a POD anymore due to use of toku_external_cond_t
|
||||
// This is ok as the PODness is not really required: lock_request objects are
|
||||
// not moved in memory or anything.
|
||||
// ENSURE_POD(lock_request);
|
||||
|
||||
} /* namespace toku */ |
File diff suppressed because it is too large
Load Diff
@ -0,0 +1,559 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <atomic> |
||||
|
||||
#include "../db.h" |
||||
#include "../ft/comparator.h" |
||||
#include "../portability/toku_external_pthread.h" |
||||
#include "../portability/toku_pthread.h" |
||||
#include "../portability/toku_time.h" |
||||
// PORT #include <ft/ft-ops.h> // just for DICTIONARY_ID..
|
||||
// PORT: ft-status for LTM_STATUS:
|
||||
#include "../ft/ft-status.h" |
||||
|
||||
struct DICTIONARY_ID { |
||||
uint64_t dictid; |
||||
}; |
||||
|
||||
#include "../util/omt.h" |
||||
#include "range_buffer.h" |
||||
#include "txnid_set.h" |
||||
#include "wfg.h" |
||||
|
||||
namespace toku { |
||||
|
||||
class locktree; |
||||
class locktree_manager; |
||||
class lock_request; |
||||
class concurrent_tree; |
||||
|
||||
typedef int (*lt_create_cb)(locktree *lt, void *extra); |
||||
typedef void (*lt_destroy_cb)(locktree *lt); |
||||
typedef void (*lt_escalate_cb)(TXNID txnid, const locktree *lt, |
||||
const range_buffer &buffer, void *extra); |
||||
|
||||
struct lt_counters { |
||||
uint64_t wait_count, wait_time; |
||||
uint64_t long_wait_count, long_wait_time; |
||||
uint64_t timeout_count; |
||||
|
||||
void add(const lt_counters &rhs) { |
||||
wait_count += rhs.wait_count; |
||||
wait_time += rhs.wait_time; |
||||
long_wait_count += rhs.long_wait_count; |
||||
long_wait_time += rhs.long_wait_time; |
||||
timeout_count += rhs.timeout_count; |
||||
} |
||||
}; |
||||
|
||||
// Lock request state for some locktree
|
||||
struct lt_lock_request_info { |
||||
omt<lock_request *> pending_lock_requests; |
||||
std::atomic_bool pending_is_empty; |
||||
toku_external_mutex_t mutex; |
||||
bool should_retry_lock_requests; |
||||
lt_counters counters; |
||||
std::atomic_ullong retry_want; |
||||
unsigned long long retry_done; |
||||
toku_mutex_t retry_mutex; |
||||
toku_cond_t retry_cv; |
||||
bool running_retry; |
||||
|
||||
void init(toku_external_mutex_factory_t mutex_factory); |
||||
void destroy(void); |
||||
}; |
||||
|
||||
// The locktree manager manages a set of locktrees, one for each open
|
||||
// dictionary. Locktrees are retrieved from the manager. When they are no
|
||||
// longer needed, they are be released by the user.
|
||||
class locktree_manager { |
||||
public: |
||||
// param: create_cb, called just after a locktree is first created.
|
||||
// destroy_cb, called just before a locktree is destroyed.
|
||||
// escalate_cb, called after a locktree is escalated (with extra
|
||||
// param)
|
||||
void create(lt_create_cb create_cb, lt_destroy_cb destroy_cb, |
||||
lt_escalate_cb escalate_cb, void *extra, |
||||
toku_external_mutex_factory_t mutex_factory_arg); |
||||
|
||||
void destroy(void); |
||||
|
||||
size_t get_max_lock_memory(void); |
||||
|
||||
int set_max_lock_memory(size_t max_lock_memory); |
||||
|
||||
// effect: Get a locktree from the manager. If a locktree exists with the
|
||||
// given
|
||||
// dict_id, it is referenced and then returned. If one did not exist,
|
||||
// it is created. It will use the comparator for comparing keys. The
|
||||
// on_create callback (passed to locktree_manager::create()) will be
|
||||
// called with the given extra parameter.
|
||||
locktree *get_lt(DICTIONARY_ID dict_id, const comparator &cmp, |
||||
void *on_create_extra); |
||||
|
||||
void reference_lt(locktree *lt); |
||||
|
||||
// effect: Releases one reference on a locktree. If the reference count
|
||||
// transitions
|
||||
// to zero, the on_destroy callback is called before it gets
|
||||
// destroyed.
|
||||
void release_lt(locktree *lt); |
||||
|
||||
void get_status(LTM_STATUS status); |
||||
|
||||
// effect: calls the iterate function on each pending lock request
|
||||
// note: holds the manager's mutex
|
||||
typedef int (*lock_request_iterate_callback)(DICTIONARY_ID dict_id, |
||||
TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key, |
||||
TXNID blocking_txnid, |
||||
uint64_t start_time, |
||||
void *extra); |
||||
int iterate_pending_lock_requests(lock_request_iterate_callback cb, |
||||
void *extra); |
||||
|
||||
// effect: Determines if too many locks or too much memory is being used,
|
||||
// Runs escalation on the manager if so.
|
||||
// param: big_txn, if the current transaction is 'big' (has spilled rollback
|
||||
// logs) returns: 0 if there enough resources to create a new lock, or
|
||||
// TOKUDB_OUT_OF_LOCKS
|
||||
// if there are not enough resources and lock escalation failed to
|
||||
// free up enough resources for a new lock.
|
||||
int check_current_lock_constraints(bool big_txn); |
||||
|
||||
bool over_big_threshold(void); |
||||
|
||||
void note_mem_used(uint64_t mem_used); |
||||
|
||||
void note_mem_released(uint64_t mem_freed); |
||||
|
||||
bool out_of_locks(void) const; |
||||
|
||||
// Escalate all locktrees
|
||||
void escalate_all_locktrees(void); |
||||
|
||||
// Escalate a set of locktrees
|
||||
void escalate_locktrees(locktree **locktrees, int num_locktrees); |
||||
|
||||
// effect: calls the private function run_escalation(), only ok to
|
||||
// do for tests.
|
||||
// rationale: to get better stress test coverage, we want a way to
|
||||
// deterministicly trigger lock escalation.
|
||||
void run_escalation_for_test(void); |
||||
void run_escalation(void); |
||||
|
||||
// Add time t to the escalator's wait time statistics
|
||||
void add_escalator_wait_time(uint64_t t); |
||||
|
||||
void kill_waiter(void *extra); |
||||
|
||||
private: |
||||
static const uint64_t DEFAULT_MAX_LOCK_MEMORY = 64L * 1024 * 1024; |
||||
|
||||
// tracks the current number of locks and lock memory
|
||||
uint64_t m_max_lock_memory; |
||||
uint64_t m_current_lock_memory; |
||||
|
||||
struct lt_counters m_lt_counters; |
||||
|
||||
// the create and destroy callbacks for the locktrees
|
||||
lt_create_cb m_lt_create_callback; |
||||
lt_destroy_cb m_lt_destroy_callback; |
||||
lt_escalate_cb m_lt_escalate_callback; |
||||
void *m_lt_escalate_callback_extra; |
||||
|
||||
omt<locktree *> m_locktree_map; |
||||
|
||||
toku_external_mutex_factory_t mutex_factory; |
||||
|
||||
// the manager's mutex protects the locktree map
|
||||
toku_mutex_t m_mutex; |
||||
|
||||
void mutex_lock(void); |
||||
|
||||
void mutex_unlock(void); |
||||
|
||||
// Manage the set of open locktrees
|
||||
locktree *locktree_map_find(const DICTIONARY_ID &dict_id); |
||||
void locktree_map_put(locktree *lt); |
||||
void locktree_map_remove(locktree *lt); |
||||
|
||||
static int find_by_dict_id(locktree *const <, const DICTIONARY_ID &dict_id); |
||||
|
||||
void escalator_init(void); |
||||
void escalator_destroy(void); |
||||
|
||||
// statistics about lock escalation.
|
||||
toku_mutex_t m_escalation_mutex; |
||||
uint64_t m_escalation_count; |
||||
tokutime_t m_escalation_time; |
||||
uint64_t m_escalation_latest_result; |
||||
uint64_t m_wait_escalation_count; |
||||
uint64_t m_wait_escalation_time; |
||||
uint64_t m_long_wait_escalation_count; |
||||
uint64_t m_long_wait_escalation_time; |
||||
|
||||
// the escalator coordinates escalation on a set of locktrees for a bunch of
|
||||
// threads
|
||||
class locktree_escalator { |
||||
public: |
||||
void create(void); |
||||
void destroy(void); |
||||
void run(locktree_manager *mgr, void (*escalate_locktrees_fun)(void *extra), |
||||
void *extra); |
||||
|
||||
private: |
||||
toku_mutex_t m_escalator_mutex; |
||||
toku_cond_t m_escalator_done; |
||||
bool m_escalator_running; |
||||
}; |
||||
|
||||
locktree_escalator m_escalator; |
||||
|
||||
friend class manager_unit_test; |
||||
}; |
||||
|
||||
// A locktree represents the set of row locks owned by all transactions
|
||||
// over an open dictionary. Read and write ranges are represented as
|
||||
// a left and right key which are compared with the given comparator
|
||||
//
|
||||
// Locktrees are not created and destroyed by the user. Instead, they are
|
||||
// referenced and released using the locktree manager.
|
||||
//
|
||||
// A sample workflow looks like this:
|
||||
// - Create a manager.
|
||||
// - Get a locktree by dictionaroy id from the manager.
|
||||
// - Perform read/write lock acquision on the locktree, add references to
|
||||
// the locktree using the manager, release locks, release references, etc.
|
||||
// - ...
|
||||
// - Release the final reference to the locktree. It will be destroyed.
|
||||
// - Destroy the manager.
|
||||
class locktree { |
||||
public: |
||||
// effect: Creates a locktree
|
||||
void create(locktree_manager *mgr, DICTIONARY_ID dict_id, |
||||
const comparator &cmp, |
||||
toku_external_mutex_factory_t mutex_factory); |
||||
|
||||
void destroy(void); |
||||
|
||||
// For thread-safe, external reference counting
|
||||
void add_reference(void); |
||||
|
||||
// requires: the reference count is > 0
|
||||
// returns: the reference count, after decrementing it by one
|
||||
uint32_t release_reference(void); |
||||
|
||||
// returns: the current reference count
|
||||
uint32_t get_reference_count(void); |
||||
|
||||
// effect: Attempts to grant a read lock for the range of keys between
|
||||
// [left_key, right_key]. returns: If the lock cannot be granted, return
|
||||
// DB_LOCK_NOTGRANTED, and populate the
|
||||
// given conflicts set with the txnids that hold conflicting locks in
|
||||
// the range. If the locktree cannot create more locks, return
|
||||
// TOKUDB_OUT_OF_LOCKS.
|
||||
// note: Read locks cannot be shared between txnids, as one would expect.
|
||||
// This is for simplicity since read locks are rare in MySQL.
|
||||
int acquire_read_lock(TXNID txnid, const DBT *left_key, const DBT *right_key, |
||||
txnid_set *conflicts, bool big_txn); |
||||
|
||||
// effect: Attempts to grant a write lock for the range of keys between
|
||||
// [left_key, right_key]. returns: If the lock cannot be granted, return
|
||||
// DB_LOCK_NOTGRANTED, and populate the
|
||||
// given conflicts set with the txnids that hold conflicting locks in
|
||||
// the range. If the locktree cannot create more locks, return
|
||||
// TOKUDB_OUT_OF_LOCKS.
|
||||
int acquire_write_lock(TXNID txnid, const DBT *left_key, const DBT *right_key, |
||||
txnid_set *conflicts, bool big_txn); |
||||
|
||||
// effect: populate the conflicts set with the txnids that would preventing
|
||||
// the given txnid from getting a lock on [left_key, right_key]
|
||||
void get_conflicts(bool is_write_request, TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key, txnid_set *conflicts); |
||||
|
||||
// effect: Release all of the lock ranges represented by the range buffer for
|
||||
// a txnid.
|
||||
void release_locks(TXNID txnid, const range_buffer *ranges, |
||||
bool all_trx_locks_hint = false); |
||||
|
||||
// effect: Runs escalation on this locktree
|
||||
void escalate(lt_escalate_cb after_escalate_callback, void *extra); |
||||
|
||||
// returns: The userdata associated with this locktree, or null if it has not
|
||||
// been set.
|
||||
void *get_userdata(void) const; |
||||
|
||||
void set_userdata(void *userdata); |
||||
|
||||
locktree_manager *get_manager(void) const; |
||||
|
||||
void set_comparator(const comparator &cmp); |
||||
|
||||
int compare(const locktree *lt) const; |
||||
|
||||
DICTIONARY_ID get_dict_id() const; |
||||
|
||||
// Private info struct for storing pending lock request state.
|
||||
// Only to be used by lock requests. We store it here as
|
||||
// something less opaque than usual to strike a tradeoff between
|
||||
// abstraction and code complexity. It is still fairly abstract
|
||||
// since the lock_request object is opaque
|
||||
struct lt_lock_request_info *get_lock_request_info(void); |
||||
|
||||
typedef void (*dump_callback)(void *cdata, const DBT *left, const DBT *right, |
||||
TXNID txnid, bool is_shared, |
||||
TxnidVector *owners); |
||||
void dump_locks(void *cdata, dump_callback cb); |
||||
|
||||
private: |
||||
locktree_manager *m_mgr; |
||||
DICTIONARY_ID m_dict_id; |
||||
uint32_t m_reference_count; |
||||
|
||||
// Since the memory referenced by this comparator is not owned by the
|
||||
// locktree, the user must guarantee it will outlive the locktree.
|
||||
//
|
||||
// The ydb API accomplishes this by opening an ft_handle in the on_create
|
||||
// callback, which will keep the underlying FT (and its descriptor) in memory
|
||||
// for as long as the handle is open. The ft_handle is stored opaquely in the
|
||||
// userdata pointer below. see locktree_manager::get_lt w/ on_create_extra
|
||||
comparator m_cmp; |
||||
|
||||
concurrent_tree *m_rangetree; |
||||
|
||||
void *m_userdata; |
||||
struct lt_lock_request_info m_lock_request_info; |
||||
|
||||
// psergey-todo:
|
||||
// Each transaction also keeps a list of ranges it has locked.
|
||||
// So, when a transaction is running in STO mode, two identical
|
||||
// lists are kept: the STO lock list and transaction's owned locks
|
||||
// list. Why can't we do with just one list?
|
||||
|
||||
// The following fields and members prefixed with "sto_" are for
|
||||
// the single txnid optimization, intended to speed up the case
|
||||
// when only one transaction is using the locktree. If we know
|
||||
// the locktree has only one transaction, then acquiring locks
|
||||
// takes O(1) work and releasing all locks takes O(1) work.
|
||||
//
|
||||
// How do we know that the locktree only has a single txnid?
|
||||
// What do we do if it does?
|
||||
//
|
||||
// When a txn with txnid T requests a lock:
|
||||
// - If the tree is empty, the optimization is possible. Set the single
|
||||
// txnid to T, and insert the lock range into the buffer.
|
||||
// - If the tree is not empty, check if the single txnid is T. If so,
|
||||
// append the lock range to the buffer. Otherwise, migrate all of
|
||||
// the locks in the buffer into the rangetree on behalf of txnid T,
|
||||
// and invalid the single txnid.
|
||||
//
|
||||
// When a txn with txnid T releases its locks:
|
||||
// - If the single txnid is valid, it must be for T. Destroy the buffer.
|
||||
// - If it's not valid, release locks the normal way in the rangetree.
|
||||
//
|
||||
// To carry out the optimization we need to record a single txnid
|
||||
// and a range buffer for each locktree, each protected by the root
|
||||
// lock of the locktree's rangetree. The root lock for a rangetree
|
||||
// is grabbed by preparing a locked keyrange on the rangetree.
|
||||
TXNID m_sto_txnid; |
||||
range_buffer m_sto_buffer; |
||||
|
||||
// The single txnid optimization speeds up the case when only one
|
||||
// transaction is using the locktree. But it has the potential to
|
||||
// hurt the case when more than one txnid exists.
|
||||
//
|
||||
// There are two things we need to do to make the optimization only
|
||||
// optimize the case we care about, and not hurt the general case.
|
||||
//
|
||||
// Bound the worst-case latency for lock migration when the
|
||||
// optimization stops working:
|
||||
// - Idea: Stop the optimization and migrate immediate if we notice
|
||||
// the single txnid has takes many locks in the range buffer.
|
||||
// - Implementation: Enforce a max size on the single txnid range buffer.
|
||||
// - Analysis: Choosing the perfect max value, M, is difficult to do
|
||||
// without some feedback from the field. Intuition tells us that M should
|
||||
// not be so small that the optimization is worthless, and it should not
|
||||
// be so big that it's unreasonable to have to wait behind a thread doing
|
||||
// the work of converting M buffer locks into rangetree locks.
|
||||
//
|
||||
// Prevent concurrent-transaction workloads from trying the optimization
|
||||
// in vain:
|
||||
// - Idea: Don't even bother trying the optimization if we think the
|
||||
// system is in a concurrent-transaction state.
|
||||
// - Implementation: Do something even simpler than detecting whether the
|
||||
// system is in a concurent-transaction state. Just keep a "score" value
|
||||
// and some threshold. If at any time the locktree is eligible for the
|
||||
// optimization, only do it if the score is at this threshold. When you
|
||||
// actually do the optimization but someone has to migrate locks in the buffer
|
||||
// (expensive), then reset the score back to zero. Each time a txn
|
||||
// releases locks, the score is incremented by 1.
|
||||
// - Analysis: If you let the threshold be "C", then at most 1 / C txns will
|
||||
// do the optimization in a concurrent-transaction system. Similarly, it
|
||||
// takes at most C txns to start using the single txnid optimzation, which
|
||||
// is good when the system transitions from multithreaded to single threaded.
|
||||
//
|
||||
// STO_BUFFER_MAX_SIZE:
|
||||
//
|
||||
// We choose the max value to be 1 million since most transactions are smaller
|
||||
// than 1 million and we can create a rangetree of 1 million elements in
|
||||
// less than a second. So we can be pretty confident that this threshold
|
||||
// enables the optimization almost always, and prevents super pathological
|
||||
// latency issues for the first lock taken by a second thread.
|
||||
//
|
||||
// STO_SCORE_THRESHOLD:
|
||||
//
|
||||
// A simple first guess at a good value for the score threshold is 100.
|
||||
// By our analysis, we'd end up doing the optimization in vain for
|
||||
// around 1% of all transactions, which seems reasonable. Further,
|
||||
// if the system goes single threaded, it ought to be pretty quick
|
||||
// for 100 transactions to go by, so we won't have to wait long before
|
||||
// we start doing the single txind optimzation again.
|
||||
static const int STO_BUFFER_MAX_SIZE = 50 * 1024; |
||||
static const int STO_SCORE_THRESHOLD = 100; |
||||
int m_sto_score; |
||||
|
||||
// statistics about time spent ending the STO early
|
||||
uint64_t m_sto_end_early_count; |
||||
tokutime_t m_sto_end_early_time; |
||||
|
||||
// effect: begins the single txnid optimizaiton, setting m_sto_txnid
|
||||
// to the given txnid.
|
||||
// requires: m_sto_txnid is invalid
|
||||
void sto_begin(TXNID txnid); |
||||
|
||||
// effect: append a range to the sto buffer
|
||||
// requires: m_sto_txnid is valid
|
||||
void sto_append(const DBT *left_key, const DBT *right_key, |
||||
bool is_write_request); |
||||
|
||||
// effect: ends the single txnid optimization, releaseing any memory
|
||||
// stored in the sto buffer, notifying the tracker, and
|
||||
// invalidating m_sto_txnid.
|
||||
// requires: m_sto_txnid is valid
|
||||
void sto_end(void); |
||||
|
||||
// params: prepared_lkr is a void * to a prepared locked keyrange. see below.
|
||||
// effect: ends the single txnid optimization early, migrating buffer locks
|
||||
// into the rangetree, calling sto_end(), and then setting the
|
||||
// sto_score back to zero.
|
||||
// requires: m_sto_txnid is valid
|
||||
void sto_end_early(void *prepared_lkr); |
||||
void sto_end_early_no_accounting(void *prepared_lkr); |
||||
|
||||
// params: prepared_lkr is a void * to a prepared locked keyrange. we can't
|
||||
// use
|
||||
// the real type because the compiler won't allow us to forward
|
||||
// declare concurrent_tree::locked_keyrange without including
|
||||
// concurrent_tree.h, which we cannot do here because it is a template
|
||||
// implementation.
|
||||
// requires: the prepared locked keyrange is for the locktree's rangetree
|
||||
// requires: m_sto_txnid is valid
|
||||
// effect: migrates each lock in the single txnid buffer into the locktree's
|
||||
// rangetree, notifying the memory tracker as necessary.
|
||||
void sto_migrate_buffer_ranges_to_tree(void *prepared_lkr); |
||||
|
||||
// effect: If m_sto_txnid is valid, then release the txnid's locks
|
||||
// by ending the optimization.
|
||||
// requires: If m_sto_txnid is valid, it is equal to the given txnid
|
||||
// returns: True if locks were released for this txnid
|
||||
bool sto_try_release(TXNID txnid); |
||||
|
||||
// params: prepared_lkr is a void * to a prepared locked keyrange. see above.
|
||||
// requires: the prepared locked keyrange is for the locktree's rangetree
|
||||
// effect: If m_sto_txnid is valid and equal to the given txnid, then
|
||||
// append a range onto the buffer. Otherwise, if m_sto_txnid is valid
|
||||
// but not equal to this txnid, then migrate the buffer's locks
|
||||
// into the rangetree and end the optimization, setting the score
|
||||
// back to zero.
|
||||
// returns: true if the lock was acquired for this txnid
|
||||
bool sto_try_acquire(void *prepared_lkr, TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key, bool is_write_request); |
||||
|
||||
// Effect:
|
||||
// Provides a hook for a helgrind suppression.
|
||||
// Returns:
|
||||
// true if m_sto_txnid is not TXNID_NONE
|
||||
bool sto_txnid_is_valid_unsafe(void) const; |
||||
|
||||
// Effect:
|
||||
// Provides a hook for a helgrind suppression.
|
||||
// Returns:
|
||||
// m_sto_score
|
||||
int sto_get_score_unsafe(void) const; |
||||
|
||||
void remove_overlapping_locks_for_txnid(TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key); |
||||
|
||||
int acquire_lock_consolidated(void *prepared_lkr, TXNID txnid, |
||||
const DBT *left_key, const DBT *right_key, |
||||
bool is_write_request, txnid_set *conflicts); |
||||
|
||||
int acquire_lock(bool is_write_request, TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key, txnid_set *conflicts); |
||||
|
||||
int try_acquire_lock(bool is_write_request, TXNID txnid, const DBT *left_key, |
||||
const DBT *right_key, txnid_set *conflicts, |
||||
bool big_txn); |
||||
|
||||
friend class locktree_unit_test; |
||||
friend class manager_unit_test; |
||||
friend class lock_request_unit_test; |
||||
|
||||
// engine status reaches into the locktree to read some stats
|
||||
friend void locktree_manager::get_status(LTM_STATUS status); |
||||
}; |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,526 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include <stdlib.h> |
||||
#include <string.h> |
||||
|
||||
#include "../portability/toku_pthread.h" |
||||
#include "../util/status.h" |
||||
#include "lock_request.h" |
||||
#include "locktree.h" |
||||
|
||||
namespace toku { |
||||
|
||||
void locktree_manager::create(lt_create_cb create_cb, lt_destroy_cb destroy_cb, |
||||
lt_escalate_cb escalate_cb, void *escalate_extra, |
||||
toku_external_mutex_factory_t mutex_factory_arg) { |
||||
mutex_factory = mutex_factory_arg; |
||||
m_max_lock_memory = DEFAULT_MAX_LOCK_MEMORY; |
||||
m_current_lock_memory = 0; |
||||
|
||||
m_locktree_map.create(); |
||||
m_lt_create_callback = create_cb; |
||||
m_lt_destroy_callback = destroy_cb; |
||||
m_lt_escalate_callback = escalate_cb; |
||||
m_lt_escalate_callback_extra = escalate_extra; |
||||
ZERO_STRUCT(m_mutex); |
||||
toku_mutex_init(manager_mutex_key, &m_mutex, nullptr); |
||||
|
||||
ZERO_STRUCT(m_lt_counters); |
||||
|
||||
escalator_init(); |
||||
} |
||||
|
||||
void locktree_manager::destroy(void) { |
||||
escalator_destroy(); |
||||
invariant(m_current_lock_memory == 0); |
||||
invariant(m_locktree_map.size() == 0); |
||||
m_locktree_map.destroy(); |
||||
toku_mutex_destroy(&m_mutex); |
||||
} |
||||
|
||||
void locktree_manager::mutex_lock(void) { toku_mutex_lock(&m_mutex); } |
||||
|
||||
void locktree_manager::mutex_unlock(void) { toku_mutex_unlock(&m_mutex); } |
||||
|
||||
size_t locktree_manager::get_max_lock_memory(void) { return m_max_lock_memory; } |
||||
|
||||
int locktree_manager::set_max_lock_memory(size_t max_lock_memory) { |
||||
int r = 0; |
||||
mutex_lock(); |
||||
if (max_lock_memory < m_current_lock_memory) { |
||||
r = EDOM; |
||||
} else { |
||||
m_max_lock_memory = max_lock_memory; |
||||
} |
||||
mutex_unlock(); |
||||
return r; |
||||
} |
||||
|
||||
int locktree_manager::find_by_dict_id(locktree *const <, |
||||
const DICTIONARY_ID &dict_id) { |
||||
if (lt->get_dict_id().dictid < dict_id.dictid) { |
||||
return -1; |
||||
} else if (lt->get_dict_id().dictid == dict_id.dictid) { |
||||
return 0; |
||||
} else { |
||||
return 1; |
||||
} |
||||
} |
||||
|
||||
locktree *locktree_manager::locktree_map_find(const DICTIONARY_ID &dict_id) { |
||||
locktree *lt; |
||||
int r = m_locktree_map.find_zero<DICTIONARY_ID, find_by_dict_id>(dict_id, <, |
||||
nullptr); |
||||
return r == 0 ? lt : nullptr; |
||||
} |
||||
|
||||
void locktree_manager::locktree_map_put(locktree *lt) { |
||||
int r = m_locktree_map.insert<DICTIONARY_ID, find_by_dict_id>( |
||||
lt, lt->get_dict_id(), nullptr); |
||||
invariant_zero(r); |
||||
} |
||||
|
||||
void locktree_manager::locktree_map_remove(locktree *lt) { |
||||
uint32_t idx; |
||||
locktree *found_lt; |
||||
int r = m_locktree_map.find_zero<DICTIONARY_ID, find_by_dict_id>( |
||||
lt->get_dict_id(), &found_lt, &idx); |
||||
invariant_zero(r); |
||||
invariant(found_lt == lt); |
||||
r = m_locktree_map.delete_at(idx); |
||||
invariant_zero(r); |
||||
} |
||||
|
||||
locktree *locktree_manager::get_lt(DICTIONARY_ID dict_id, const comparator &cmp, |
||||
void *on_create_extra) { |
||||
// hold the mutex around searching and maybe
|
||||
// inserting into the locktree map
|
||||
mutex_lock(); |
||||
|
||||
locktree *lt = locktree_map_find(dict_id); |
||||
if (lt == nullptr) { |
||||
XCALLOC(lt); |
||||
lt->create(this, dict_id, cmp, mutex_factory); |
||||
|
||||
// new locktree created - call the on_create callback
|
||||
// and put it in the locktree map
|
||||
if (m_lt_create_callback) { |
||||
int r = m_lt_create_callback(lt, on_create_extra); |
||||
if (r != 0) { |
||||
lt->release_reference(); |
||||
lt->destroy(); |
||||
toku_free(lt); |
||||
lt = nullptr; |
||||
} |
||||
} |
||||
if (lt) { |
||||
locktree_map_put(lt); |
||||
} |
||||
} else { |
||||
reference_lt(lt); |
||||
} |
||||
|
||||
mutex_unlock(); |
||||
|
||||
return lt; |
||||
} |
||||
|
||||
void locktree_manager::reference_lt(locktree *lt) { |
||||
// increment using a sync fetch and add.
|
||||
// the caller guarantees that the lt won't be
|
||||
// destroyed while we increment the count here.
|
||||
//
|
||||
// the caller can do this by already having an lt
|
||||
// reference or by holding the manager mutex.
|
||||
//
|
||||
// if the manager's mutex is held, it is ok for the
|
||||
// reference count to transition from 0 to 1 (no race),
|
||||
// since we're serialized with other opens and closes.
|
||||
lt->add_reference(); |
||||
} |
||||
|
||||
void locktree_manager::release_lt(locktree *lt) { |
||||
bool do_destroy = false; |
||||
DICTIONARY_ID dict_id = lt->get_dict_id(); |
||||
|
||||
// Release a reference on the locktree. If the count transitions to zero,
|
||||
// then we *may* need to do the cleanup.
|
||||
//
|
||||
// Grab the manager's mutex and look for a locktree with this locktree's
|
||||
// dictionary id. Since dictionary id's never get reused, any locktree
|
||||
// found must be the one we just released a reference on.
|
||||
//
|
||||
// At least two things could have happened since we got the mutex:
|
||||
// - Another thread gets a locktree with the same dict_id, increments
|
||||
// the reference count. In this case, we shouldn't destroy it.
|
||||
// - Another thread gets a locktree with the same dict_id and then
|
||||
// releases it quickly, transitioning the reference count from zero to
|
||||
// one and back to zero. In this case, only one of us should destroy it.
|
||||
// It doesn't matter which. We originally missed this case, see #5776.
|
||||
//
|
||||
// After 5776, the high level rule for release is described below.
|
||||
//
|
||||
// If a thread releases a locktree and notices the reference count transition
|
||||
// to zero, then that thread must immediately:
|
||||
// - assume the locktree object is invalid
|
||||
// - grab the manager's mutex
|
||||
// - search the locktree map for a locktree with the same dict_id and remove
|
||||
// it, if it exists. the destroy may be deferred.
|
||||
// - release the manager's mutex
|
||||
//
|
||||
// This way, if many threads transition the same locktree's reference count
|
||||
// from 1 to zero and wait behind the manager's mutex, only one of them will
|
||||
// do the actual destroy and the others will happily do nothing.
|
||||
uint32_t refs = lt->release_reference(); |
||||
if (refs == 0) { |
||||
mutex_lock(); |
||||
// lt may not have already been destroyed, so look it up.
|
||||
locktree *find_lt = locktree_map_find(dict_id); |
||||
if (find_lt != nullptr) { |
||||
// A locktree is still in the map with that dict_id, so it must be
|
||||
// equal to lt. This is true because dictionary ids are never reused.
|
||||
// If the reference count is zero, it's our responsibility to remove
|
||||
// it and do the destroy. Otherwise, someone still wants it.
|
||||
// If the locktree is still valid then check if it should be deleted.
|
||||
if (find_lt == lt) { |
||||
if (lt->get_reference_count() == 0) { |
||||
locktree_map_remove(lt); |
||||
do_destroy = true; |
||||
} |
||||
m_lt_counters.add(lt->get_lock_request_info()->counters); |
||||
} |
||||
} |
||||
mutex_unlock(); |
||||
} |
||||
|
||||
// if necessary, do the destroy without holding the mutex
|
||||
if (do_destroy) { |
||||
if (m_lt_destroy_callback) { |
||||
m_lt_destroy_callback(lt); |
||||
} |
||||
lt->destroy(); |
||||
toku_free(lt); |
||||
} |
||||
} |
||||
|
||||
void locktree_manager::run_escalation(void) { |
||||
struct escalation_fn { |
||||
static void run(void *extra) { |
||||
locktree_manager *mgr = (locktree_manager *)extra; |
||||
mgr->escalate_all_locktrees(); |
||||
}; |
||||
}; |
||||
m_escalator.run(this, escalation_fn::run, this); |
||||
} |
||||
|
||||
// test-only version of lock escalation
|
||||
void locktree_manager::run_escalation_for_test(void) { run_escalation(); } |
||||
|
||||
void locktree_manager::escalate_all_locktrees(void) { |
||||
uint64_t t0 = toku_current_time_microsec(); |
||||
|
||||
// get all locktrees
|
||||
mutex_lock(); |
||||
int num_locktrees = m_locktree_map.size(); |
||||
locktree **locktrees = new locktree *[num_locktrees]; |
||||
for (int i = 0; i < num_locktrees; i++) { |
||||
int r = m_locktree_map.fetch(i, &locktrees[i]); |
||||
invariant_zero(r); |
||||
reference_lt(locktrees[i]); |
||||
} |
||||
mutex_unlock(); |
||||
|
||||
// escalate them
|
||||
escalate_locktrees(locktrees, num_locktrees); |
||||
|
||||
delete[] locktrees; |
||||
|
||||
uint64_t t1 = toku_current_time_microsec(); |
||||
add_escalator_wait_time(t1 - t0); |
||||
} |
||||
|
||||
void locktree_manager::note_mem_used(uint64_t mem_used) { |
||||
(void)toku_sync_fetch_and_add(&m_current_lock_memory, mem_used); |
||||
} |
||||
|
||||
void locktree_manager::note_mem_released(uint64_t mem_released) { |
||||
uint64_t old_mem_used = |
||||
toku_sync_fetch_and_sub(&m_current_lock_memory, mem_released); |
||||
invariant(old_mem_used >= mem_released); |
||||
} |
||||
|
||||
bool locktree_manager::out_of_locks(void) const { |
||||
return m_current_lock_memory >= m_max_lock_memory; |
||||
} |
||||
|
||||
bool locktree_manager::over_big_threshold(void) { |
||||
return m_current_lock_memory >= m_max_lock_memory / 2; |
||||
} |
||||
|
||||
int locktree_manager::iterate_pending_lock_requests( |
||||
lock_request_iterate_callback callback, void *extra) { |
||||
mutex_lock(); |
||||
int r = 0; |
||||
uint32_t num_locktrees = m_locktree_map.size(); |
||||
for (uint32_t i = 0; i < num_locktrees && r == 0; i++) { |
||||
locktree *lt; |
||||
r = m_locktree_map.fetch(i, <); |
||||
invariant_zero(r); |
||||
if (r == EINVAL) // Shouldn't happen, avoid compiler warning
|
||||
continue; |
||||
|
||||
struct lt_lock_request_info *info = lt->get_lock_request_info(); |
||||
toku_external_mutex_lock(&info->mutex); |
||||
|
||||
uint32_t num_requests = info->pending_lock_requests.size(); |
||||
for (uint32_t k = 0; k < num_requests && r == 0; k++) { |
||||
lock_request *req; |
||||
r = info->pending_lock_requests.fetch(k, &req); |
||||
invariant_zero(r); |
||||
if (r == EINVAL) /* Shouldn't happen, avoid compiler warning */ |
||||
continue; |
||||
r = callback(lt->get_dict_id(), req->get_txnid(), req->get_left_key(), |
||||
req->get_right_key(), req->get_conflicting_txnid(), |
||||
req->get_start_time(), extra); |
||||
} |
||||
|
||||
toku_external_mutex_unlock(&info->mutex); |
||||
} |
||||
mutex_unlock(); |
||||
return r; |
||||
} |
||||
|
||||
int locktree_manager::check_current_lock_constraints(bool big_txn) { |
||||
int r = 0; |
||||
if (big_txn && over_big_threshold()) { |
||||
run_escalation(); |
||||
if (over_big_threshold()) { |
||||
r = TOKUDB_OUT_OF_LOCKS; |
||||
} |
||||
} |
||||
if (r == 0 && out_of_locks()) { |
||||
run_escalation(); |
||||
if (out_of_locks()) { |
||||
// return an error if we're still out of locks after escalation.
|
||||
r = TOKUDB_OUT_OF_LOCKS; |
||||
} |
||||
} |
||||
return r; |
||||
} |
||||
|
||||
void locktree_manager::escalator_init(void) { |
||||
ZERO_STRUCT(m_escalation_mutex); |
||||
toku_mutex_init(manager_escalation_mutex_key, &m_escalation_mutex, nullptr); |
||||
m_escalation_count = 0; |
||||
m_escalation_time = 0; |
||||
m_wait_escalation_count = 0; |
||||
m_wait_escalation_time = 0; |
||||
m_long_wait_escalation_count = 0; |
||||
m_long_wait_escalation_time = 0; |
||||
m_escalation_latest_result = 0; |
||||
m_escalator.create(); |
||||
} |
||||
|
||||
void locktree_manager::escalator_destroy(void) { |
||||
m_escalator.destroy(); |
||||
toku_mutex_destroy(&m_escalation_mutex); |
||||
} |
||||
|
||||
void locktree_manager::add_escalator_wait_time(uint64_t t) { |
||||
toku_mutex_lock(&m_escalation_mutex); |
||||
m_wait_escalation_count += 1; |
||||
m_wait_escalation_time += t; |
||||
if (t >= 1000000) { |
||||
m_long_wait_escalation_count += 1; |
||||
m_long_wait_escalation_time += t; |
||||
} |
||||
toku_mutex_unlock(&m_escalation_mutex); |
||||
} |
||||
|
||||
void locktree_manager::escalate_locktrees(locktree **locktrees, |
||||
int num_locktrees) { |
||||
// there are too many row locks in the system and we need to tidy up.
|
||||
//
|
||||
// a simple implementation of escalation does not attempt
|
||||
// to reduce the memory foot print of each txn's range buffer.
|
||||
// doing so would require some layering hackery (or a callback)
|
||||
// and more complicated locking. for now, just escalate each
|
||||
// locktree individually, in-place.
|
||||
tokutime_t t0 = toku_time_now(); |
||||
for (int i = 0; i < num_locktrees; i++) { |
||||
locktrees[i]->escalate(m_lt_escalate_callback, |
||||
m_lt_escalate_callback_extra); |
||||
release_lt(locktrees[i]); |
||||
} |
||||
tokutime_t t1 = toku_time_now(); |
||||
|
||||
toku_mutex_lock(&m_escalation_mutex); |
||||
m_escalation_count++; |
||||
m_escalation_time += (t1 - t0); |
||||
m_escalation_latest_result = m_current_lock_memory; |
||||
toku_mutex_unlock(&m_escalation_mutex); |
||||
} |
||||
|
||||
struct escalate_args { |
||||
locktree_manager *mgr; |
||||
locktree **locktrees; |
||||
int num_locktrees; |
||||
}; |
||||
|
||||
void locktree_manager::locktree_escalator::create(void) { |
||||
ZERO_STRUCT(m_escalator_mutex); |
||||
toku_mutex_init(manager_escalator_mutex_key, &m_escalator_mutex, nullptr); |
||||
toku_cond_init(manager_m_escalator_done_key, &m_escalator_done, nullptr); |
||||
m_escalator_running = false; |
||||
} |
||||
|
||||
void locktree_manager::locktree_escalator::destroy(void) { |
||||
toku_cond_destroy(&m_escalator_done); |
||||
toku_mutex_destroy(&m_escalator_mutex); |
||||
} |
||||
|
||||
void locktree_manager::locktree_escalator::run( |
||||
locktree_manager *mgr, void (*escalate_locktrees_fun)(void *extra), |
||||
void *extra) { |
||||
uint64_t t0 = toku_current_time_microsec(); |
||||
toku_mutex_lock(&m_escalator_mutex); |
||||
if (!m_escalator_running) { |
||||
// run escalation on this thread
|
||||
m_escalator_running = true; |
||||
toku_mutex_unlock(&m_escalator_mutex); |
||||
escalate_locktrees_fun(extra); |
||||
toku_mutex_lock(&m_escalator_mutex); |
||||
m_escalator_running = false; |
||||
toku_cond_broadcast(&m_escalator_done); |
||||
} else { |
||||
toku_cond_wait(&m_escalator_done, &m_escalator_mutex); |
||||
} |
||||
toku_mutex_unlock(&m_escalator_mutex); |
||||
uint64_t t1 = toku_current_time_microsec(); |
||||
mgr->add_escalator_wait_time(t1 - t0); |
||||
} |
||||
|
||||
void locktree_manager::get_status(LTM_STATUS statp) { |
||||
ltm_status.init(); |
||||
LTM_STATUS_VAL(LTM_SIZE_CURRENT) = m_current_lock_memory; |
||||
LTM_STATUS_VAL(LTM_SIZE_LIMIT) = m_max_lock_memory; |
||||
LTM_STATUS_VAL(LTM_ESCALATION_COUNT) = m_escalation_count; |
||||
LTM_STATUS_VAL(LTM_ESCALATION_TIME) = m_escalation_time; |
||||
LTM_STATUS_VAL(LTM_ESCALATION_LATEST_RESULT) = m_escalation_latest_result; |
||||
LTM_STATUS_VAL(LTM_WAIT_ESCALATION_COUNT) = m_wait_escalation_count; |
||||
LTM_STATUS_VAL(LTM_WAIT_ESCALATION_TIME) = m_wait_escalation_time; |
||||
LTM_STATUS_VAL(LTM_LONG_WAIT_ESCALATION_COUNT) = m_long_wait_escalation_count; |
||||
LTM_STATUS_VAL(LTM_LONG_WAIT_ESCALATION_TIME) = m_long_wait_escalation_time; |
||||
|
||||
uint64_t lock_requests_pending = 0; |
||||
uint64_t sto_num_eligible = 0; |
||||
uint64_t sto_end_early_count = 0; |
||||
tokutime_t sto_end_early_time = 0; |
||||
uint32_t num_locktrees = 0; |
||||
struct lt_counters lt_counters; |
||||
ZERO_STRUCT(lt_counters); // PORT: instead of ={}.
|
||||
|
||||
if (toku_mutex_trylock(&m_mutex) == 0) { |
||||
lt_counters = m_lt_counters; |
||||
num_locktrees = m_locktree_map.size(); |
||||
for (uint32_t i = 0; i < num_locktrees; i++) { |
||||
locktree *lt; |
||||
int r = m_locktree_map.fetch(i, <); |
||||
invariant_zero(r); |
||||
if (r == EINVAL) // Shouldn't happen, avoid compiler warning
|
||||
continue; |
||||
if (toku_external_mutex_trylock(<->m_lock_request_info.mutex) == 0) { |
||||
lock_requests_pending += |
||||
lt->m_lock_request_info.pending_lock_requests.size(); |
||||
lt_counters.add(lt->get_lock_request_info()->counters); |
||||
toku_external_mutex_unlock(<->m_lock_request_info.mutex); |
||||
} |
||||
sto_num_eligible += lt->sto_txnid_is_valid_unsafe() ? 1 : 0; |
||||
sto_end_early_count += lt->m_sto_end_early_count; |
||||
sto_end_early_time += lt->m_sto_end_early_time; |
||||
} |
||||
mutex_unlock(); |
||||
} |
||||
|
||||
LTM_STATUS_VAL(LTM_NUM_LOCKTREES) = num_locktrees; |
||||
LTM_STATUS_VAL(LTM_LOCK_REQUESTS_PENDING) = lock_requests_pending; |
||||
LTM_STATUS_VAL(LTM_STO_NUM_ELIGIBLE) = sto_num_eligible; |
||||
LTM_STATUS_VAL(LTM_STO_END_EARLY_COUNT) = sto_end_early_count; |
||||
LTM_STATUS_VAL(LTM_STO_END_EARLY_TIME) = sto_end_early_time; |
||||
LTM_STATUS_VAL(LTM_WAIT_COUNT) = lt_counters.wait_count; |
||||
LTM_STATUS_VAL(LTM_WAIT_TIME) = lt_counters.wait_time; |
||||
LTM_STATUS_VAL(LTM_LONG_WAIT_COUNT) = lt_counters.long_wait_count; |
||||
LTM_STATUS_VAL(LTM_LONG_WAIT_TIME) = lt_counters.long_wait_time; |
||||
LTM_STATUS_VAL(LTM_TIMEOUT_COUNT) = lt_counters.timeout_count; |
||||
*statp = ltm_status; |
||||
} |
||||
|
||||
void locktree_manager::kill_waiter(void *extra) { |
||||
mutex_lock(); |
||||
int r = 0; |
||||
uint32_t num_locktrees = m_locktree_map.size(); |
||||
for (uint32_t i = 0; i < num_locktrees; i++) { |
||||
locktree *lt; |
||||
r = m_locktree_map.fetch(i, <); |
||||
invariant_zero(r); |
||||
if (r) continue; // Get rid of "may be used uninitialized" warning
|
||||
lock_request::kill_waiter(lt, extra); |
||||
} |
||||
mutex_unlock(); |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,264 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "range_buffer.h" |
||||
|
||||
#include <string.h> |
||||
|
||||
#include "../portability/memory.h" |
||||
#include "../util/dbt.h" |
||||
|
||||
namespace toku { |
||||
|
||||
bool range_buffer::record_header::left_is_infinite(void) const { |
||||
return left_neg_inf || left_pos_inf; |
||||
} |
||||
|
||||
bool range_buffer::record_header::right_is_infinite(void) const { |
||||
return right_neg_inf || right_pos_inf; |
||||
} |
||||
|
||||
void range_buffer::record_header::init(const DBT *left_key, |
||||
const DBT *right_key, |
||||
bool is_exclusive) { |
||||
is_exclusive_lock = is_exclusive; |
||||
left_neg_inf = left_key == toku_dbt_negative_infinity(); |
||||
left_pos_inf = left_key == toku_dbt_positive_infinity(); |
||||
left_key_size = toku_dbt_is_infinite(left_key) ? 0 : left_key->size; |
||||
if (right_key) { |
||||
right_neg_inf = right_key == toku_dbt_negative_infinity(); |
||||
right_pos_inf = right_key == toku_dbt_positive_infinity(); |
||||
right_key_size = toku_dbt_is_infinite(right_key) ? 0 : right_key->size; |
||||
} else { |
||||
right_neg_inf = left_neg_inf; |
||||
right_pos_inf = left_pos_inf; |
||||
right_key_size = 0; |
||||
} |
||||
} |
||||
|
||||
const DBT *range_buffer::iterator::record::get_left_key(void) const { |
||||
if (_header.left_neg_inf) { |
||||
return toku_dbt_negative_infinity(); |
||||
} else if (_header.left_pos_inf) { |
||||
return toku_dbt_positive_infinity(); |
||||
} else { |
||||
return &_left_key; |
||||
} |
||||
} |
||||
|
||||
const DBT *range_buffer::iterator::record::get_right_key(void) const { |
||||
if (_header.right_neg_inf) { |
||||
return toku_dbt_negative_infinity(); |
||||
} else if (_header.right_pos_inf) { |
||||
return toku_dbt_positive_infinity(); |
||||
} else { |
||||
return &_right_key; |
||||
} |
||||
} |
||||
|
||||
size_t range_buffer::iterator::record::size(void) const { |
||||
return sizeof(record_header) + _header.left_key_size + _header.right_key_size; |
||||
} |
||||
|
||||
void range_buffer::iterator::record::deserialize(const char *buf) { |
||||
size_t current = 0; |
||||
|
||||
// deserialize the header
|
||||
memcpy(&_header, buf, sizeof(record_header)); |
||||
current += sizeof(record_header); |
||||
|
||||
// deserialize the left key if necessary
|
||||
if (!_header.left_is_infinite()) { |
||||
// point the left DBT's buffer into ours
|
||||
toku_fill_dbt(&_left_key, buf + current, _header.left_key_size); |
||||
current += _header.left_key_size; |
||||
} |
||||
|
||||
// deserialize the right key if necessary
|
||||
if (!_header.right_is_infinite()) { |
||||
if (_header.right_key_size == 0) { |
||||
toku_copyref_dbt(&_right_key, _left_key); |
||||
} else { |
||||
toku_fill_dbt(&_right_key, buf + current, _header.right_key_size); |
||||
} |
||||
} |
||||
} |
||||
|
||||
toku::range_buffer::iterator::iterator() |
||||
: _ma_chunk_iterator(nullptr), |
||||
_current_chunk_base(nullptr), |
||||
_current_chunk_offset(0), |
||||
_current_chunk_max(0), |
||||
_current_rec_size(0) {} |
||||
|
||||
toku::range_buffer::iterator::iterator(const range_buffer *buffer) |
||||
: _ma_chunk_iterator(&buffer->_arena), |
||||
_current_chunk_base(nullptr), |
||||
_current_chunk_offset(0), |
||||
_current_chunk_max(0), |
||||
_current_rec_size(0) { |
||||
reset_current_chunk(); |
||||
} |
||||
|
||||
void range_buffer::iterator::reset_current_chunk() { |
||||
_current_chunk_base = _ma_chunk_iterator.current(&_current_chunk_max); |
||||
_current_chunk_offset = 0; |
||||
} |
||||
|
||||
bool range_buffer::iterator::current(record *rec) { |
||||
if (_current_chunk_offset < _current_chunk_max) { |
||||
const char *buf = reinterpret_cast<const char *>(_current_chunk_base); |
||||
rec->deserialize(buf + _current_chunk_offset); |
||||
_current_rec_size = rec->size(); |
||||
return true; |
||||
} else { |
||||
return false; |
||||
} |
||||
} |
||||
|
||||
// move the iterator to the next record in the buffer
|
||||
void range_buffer::iterator::next(void) { |
||||
invariant(_current_chunk_offset < _current_chunk_max); |
||||
invariant(_current_rec_size > 0); |
||||
|
||||
// the next record is _current_rec_size bytes forward
|
||||
_current_chunk_offset += _current_rec_size; |
||||
// now, we don't know how big the current is, set it to 0.
|
||||
_current_rec_size = 0; |
||||
|
||||
if (_current_chunk_offset >= _current_chunk_max) { |
||||
// current chunk is exhausted, try moving to the next one
|
||||
if (_ma_chunk_iterator.more()) { |
||||
_ma_chunk_iterator.next(); |
||||
reset_current_chunk(); |
||||
} |
||||
} |
||||
} |
||||
|
||||
void range_buffer::create(void) { |
||||
// allocate buffer space lazily instead of on creation. this way,
|
||||
// no malloc/free is done if the transaction ends up taking no locks.
|
||||
_arena.create(0); |
||||
_num_ranges = 0; |
||||
} |
||||
|
||||
void range_buffer::append(const DBT *left_key, const DBT *right_key, |
||||
bool is_write_request) { |
||||
// if the keys are equal, then only one copy is stored.
|
||||
if (toku_dbt_equals(left_key, right_key)) { |
||||
invariant(left_key->size <= MAX_KEY_SIZE); |
||||
append_point(left_key, is_write_request); |
||||
} else { |
||||
invariant(left_key->size <= MAX_KEY_SIZE); |
||||
invariant(right_key->size <= MAX_KEY_SIZE); |
||||
append_range(left_key, right_key, is_write_request); |
||||
} |
||||
_num_ranges++; |
||||
} |
||||
|
||||
bool range_buffer::is_empty(void) const { return total_memory_size() == 0; } |
||||
|
||||
uint64_t range_buffer::total_memory_size(void) const { |
||||
return _arena.total_size_in_use(); |
||||
} |
||||
|
||||
int range_buffer::get_num_ranges(void) const { return _num_ranges; } |
||||
|
||||
void range_buffer::destroy(void) { _arena.destroy(); } |
||||
|
||||
void range_buffer::append_range(const DBT *left_key, const DBT *right_key, |
||||
bool is_exclusive) { |
||||
size_t record_length = |
||||
sizeof(record_header) + left_key->size + right_key->size; |
||||
char *buf = reinterpret_cast<char *>(_arena.malloc_from_arena(record_length)); |
||||
|
||||
record_header h; |
||||
h.init(left_key, right_key, is_exclusive); |
||||
|
||||
// serialize the header
|
||||
memcpy(buf, &h, sizeof(record_header)); |
||||
buf += sizeof(record_header); |
||||
|
||||
// serialize the left key if necessary
|
||||
if (!h.left_is_infinite()) { |
||||
memcpy(buf, left_key->data, left_key->size); |
||||
buf += left_key->size; |
||||
} |
||||
|
||||
// serialize the right key if necessary
|
||||
if (!h.right_is_infinite()) { |
||||
memcpy(buf, right_key->data, right_key->size); |
||||
} |
||||
} |
||||
|
||||
void range_buffer::append_point(const DBT *key, bool is_exclusive) { |
||||
size_t record_length = sizeof(record_header) + key->size; |
||||
char *buf = reinterpret_cast<char *>(_arena.malloc_from_arena(record_length)); |
||||
|
||||
record_header h; |
||||
h.init(key, nullptr, is_exclusive); |
||||
|
||||
// serialize the header
|
||||
memcpy(buf, &h, sizeof(record_header)); |
||||
buf += sizeof(record_header); |
||||
|
||||
// serialize the key if necessary
|
||||
if (!h.left_is_infinite()) { |
||||
memcpy(buf, key->data, key->size); |
||||
} |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,177 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <inttypes.h> |
||||
#include <stdint.h> |
||||
|
||||
#include "../util/dbt.h" |
||||
#include "../util/memarena.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// a key range buffer represents a set of key ranges that can
|
||||
// be stored, iterated over, and then destroyed all at once.
|
||||
class range_buffer { |
||||
private: |
||||
// the key range buffer is a bunch of records in a row.
|
||||
// each record has the following header, followed by the
|
||||
// left key and right key data payload, if applicable.
|
||||
// we limit keys to be 2^16, since we store lengths as 2 bytes.
|
||||
static const size_t MAX_KEY_SIZE = 1 << 16; |
||||
|
||||
struct record_header { |
||||
bool left_neg_inf; |
||||
bool left_pos_inf; |
||||
bool right_pos_inf; |
||||
bool right_neg_inf; |
||||
uint16_t left_key_size; |
||||
uint16_t right_key_size; |
||||
bool is_exclusive_lock; |
||||
|
||||
bool left_is_infinite(void) const; |
||||
|
||||
bool right_is_infinite(void) const; |
||||
|
||||
void init(const DBT *left_key, const DBT *right_key, bool is_exclusive); |
||||
}; |
||||
// PORT static_assert(sizeof(record_header) == 8, "record header format is
|
||||
// off");
|
||||
|
||||
public: |
||||
// the iterator abstracts reading over a buffer of variable length
|
||||
// records one by one until there are no more left.
|
||||
class iterator { |
||||
public: |
||||
iterator(); |
||||
iterator(const range_buffer *buffer); |
||||
|
||||
// a record represents the user-view of a serialized key range.
|
||||
// it handles positive and negative infinity and the optimized
|
||||
// point range case, where left and right points share memory.
|
||||
class record { |
||||
public: |
||||
// get a read-only pointer to the left key of this record's range
|
||||
const DBT *get_left_key(void) const; |
||||
|
||||
// get a read-only pointer to the right key of this record's range
|
||||
const DBT *get_right_key(void) const; |
||||
|
||||
// how big is this record? this tells us where the next record is
|
||||
size_t size(void) const; |
||||
|
||||
bool get_exclusive_flag() const { return _header.is_exclusive_lock; } |
||||
|
||||
// populate a record header and point our DBT's
|
||||
// buffers into ours if they are not infinite.
|
||||
void deserialize(const char *buf); |
||||
|
||||
private: |
||||
record_header _header; |
||||
DBT _left_key; |
||||
DBT _right_key; |
||||
}; |
||||
|
||||
// populate the given record object with the current
|
||||
// the memory referred to by record is valid for only
|
||||
// as long as the record exists.
|
||||
bool current(record *rec); |
||||
|
||||
// move the iterator to the next record in the buffer
|
||||
void next(void); |
||||
|
||||
private: |
||||
void reset_current_chunk(); |
||||
|
||||
// the key range buffer we are iterating over, the current
|
||||
// offset in that buffer, and the size of the current record.
|
||||
memarena::chunk_iterator _ma_chunk_iterator; |
||||
const void *_current_chunk_base; |
||||
size_t _current_chunk_offset; |
||||
size_t _current_chunk_max; |
||||
size_t _current_rec_size; |
||||
}; |
||||
|
||||
// allocate buffer space lazily instead of on creation. this way,
|
||||
// no malloc/free is done if the transaction ends up taking no locks.
|
||||
void create(void); |
||||
|
||||
// append a left/right key range to the buffer.
|
||||
// if the keys are equal, then only one copy is stored.
|
||||
void append(const DBT *left_key, const DBT *right_key, |
||||
bool is_write_request = false); |
||||
|
||||
// is this range buffer empty?
|
||||
bool is_empty(void) const; |
||||
|
||||
// how much memory is being used by this range buffer?
|
||||
uint64_t total_memory_size(void) const; |
||||
|
||||
// how many ranges are stored in this range buffer?
|
||||
int get_num_ranges(void) const; |
||||
|
||||
void destroy(void); |
||||
|
||||
private: |
||||
memarena _arena; |
||||
int _num_ranges; |
||||
|
||||
void append_range(const DBT *left_key, const DBT *right_key, |
||||
bool is_write_request); |
||||
|
||||
// append a point to the buffer. this is the space/time saving
|
||||
// optimization for key ranges where left == right.
|
||||
void append_point(const DBT *key, bool is_write_request); |
||||
}; |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,519 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "treenode.h" |
||||
|
||||
#include "../portability/toku_race_tools.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// TODO: source location info might have to be pulled up one caller
|
||||
// to be useful
|
||||
void treenode::mutex_lock(void) { toku_mutex_lock(&m_mutex); } |
||||
|
||||
void treenode::mutex_unlock(void) { toku_mutex_unlock(&m_mutex); } |
||||
|
||||
void treenode::init(const comparator *cmp) { |
||||
m_txnid = TXNID_NONE; |
||||
m_is_root = false; |
||||
m_is_empty = true; |
||||
m_cmp = cmp; |
||||
|
||||
m_is_shared = false; |
||||
m_owners = nullptr; |
||||
|
||||
// use an adaptive mutex at each node since we expect the time the
|
||||
// lock is held to be relatively short compared to a context switch.
|
||||
// indeed, this improves performance at high thread counts considerably.
|
||||
memset(&m_mutex, 0, sizeof(toku_mutex_t)); |
||||
toku_pthread_mutexattr_t attr; |
||||
toku_mutexattr_init(&attr); |
||||
toku_mutexattr_settype(&attr, TOKU_MUTEX_ADAPTIVE); |
||||
toku_mutex_init(treenode_mutex_key, &m_mutex, &attr); |
||||
toku_mutexattr_destroy(&attr); |
||||
m_left_child.set(nullptr); |
||||
m_right_child.set(nullptr); |
||||
} |
||||
|
||||
void treenode::create_root(const comparator *cmp) { |
||||
init(cmp); |
||||
m_is_root = true; |
||||
} |
||||
|
||||
void treenode::destroy_root(void) { |
||||
invariant(is_root()); |
||||
invariant(is_empty()); |
||||
toku_mutex_destroy(&m_mutex); |
||||
m_cmp = nullptr; |
||||
} |
||||
|
||||
void treenode::set_range_and_txnid(const keyrange &range, TXNID txnid, |
||||
bool is_shared) { |
||||
// allocates a new copy of the range for this node
|
||||
m_range.create_copy(range); |
||||
m_txnid = txnid; |
||||
m_is_shared = is_shared; |
||||
m_is_empty = false; |
||||
} |
||||
|
||||
bool treenode::is_root(void) { return m_is_root; } |
||||
|
||||
bool treenode::is_empty(void) { return m_is_empty; } |
||||
|
||||
bool treenode::range_overlaps(const keyrange &range) { |
||||
return m_range.overlaps(*m_cmp, range); |
||||
} |
||||
|
||||
treenode *treenode::alloc(const comparator *cmp, const keyrange &range, |
||||
TXNID txnid, bool is_shared) { |
||||
treenode *XCALLOC(node); |
||||
node->init(cmp); |
||||
node->set_range_and_txnid(range, txnid, is_shared); |
||||
return node; |
||||
} |
||||
|
||||
void treenode::swap_in_place(treenode *node1, treenode *node2) { |
||||
keyrange tmp_range = node1->m_range; |
||||
TXNID tmp_txnid = node1->m_txnid; |
||||
node1->m_range = node2->m_range; |
||||
node1->m_txnid = node2->m_txnid; |
||||
node2->m_range = tmp_range; |
||||
node2->m_txnid = tmp_txnid; |
||||
|
||||
bool tmp_is_shared = node1->m_is_shared; |
||||
node1->m_is_shared = node2->m_is_shared; |
||||
node2->m_is_shared = tmp_is_shared; |
||||
|
||||
auto tmp_m_owners = node1->m_owners; |
||||
node1->m_owners = node2->m_owners; |
||||
node2->m_owners = tmp_m_owners; |
||||
} |
||||
|
||||
bool treenode::add_shared_owner(TXNID txnid) { |
||||
assert(m_is_shared); |
||||
if (txnid == m_txnid) |
||||
return false; // acquiring a lock on the same range by the same trx
|
||||
|
||||
if (m_txnid != TXNID_SHARED) { |
||||
m_owners = new TxnidVector; |
||||
m_owners->insert(m_txnid); |
||||
m_txnid = TXNID_SHARED; |
||||
} |
||||
m_owners->insert(txnid); |
||||
return true; |
||||
} |
||||
|
||||
void treenode::free(treenode *node) { |
||||
// destroy the range, freeing any copied keys
|
||||
node->m_range.destroy(); |
||||
|
||||
if (node->m_owners) { |
||||
delete node->m_owners; |
||||
node->m_owners = nullptr; // need this?
|
||||
} |
||||
|
||||
// the root is simply marked as empty.
|
||||
if (node->is_root()) { |
||||
// PORT toku_mutex_assert_locked(&node->m_mutex);
|
||||
node->m_is_empty = true; |
||||
} else { |
||||
// PORT toku_mutex_assert_unlocked(&node->m_mutex);
|
||||
toku_mutex_destroy(&node->m_mutex); |
||||
toku_free(node); |
||||
} |
||||
} |
||||
|
||||
uint32_t treenode::get_depth_estimate(void) const { |
||||
const uint32_t left_est = m_left_child.depth_est; |
||||
const uint32_t right_est = m_right_child.depth_est; |
||||
return (left_est > right_est ? left_est : right_est) + 1; |
||||
} |
||||
|
||||
treenode *treenode::find_node_with_overlapping_child( |
||||
const keyrange &range, const keyrange::comparison *cmp_hint) { |
||||
// determine which child to look at based on a comparison. if we were
|
||||
// given a comparison hint, use that. otherwise, compare them now.
|
||||
keyrange::comparison c = |
||||
cmp_hint ? *cmp_hint : range.compare(*m_cmp, m_range); |
||||
|
||||
treenode *child; |
||||
if (c == keyrange::comparison::LESS_THAN) { |
||||
child = lock_and_rebalance_left(); |
||||
} else { |
||||
// The caller (locked_keyrange::acquire) handles the case where
|
||||
// the root of the locked_keyrange is the node that overlaps.
|
||||
// range is guaranteed not to overlap this node.
|
||||
invariant(c == keyrange::comparison::GREATER_THAN); |
||||
child = lock_and_rebalance_right(); |
||||
} |
||||
|
||||
// if the search would lead us to an empty subtree (child == nullptr),
|
||||
// or the child overlaps, then we know this node is the parent we want.
|
||||
// otherwise we need to recur into that child.
|
||||
if (child == nullptr) { |
||||
return this; |
||||
} else { |
||||
c = range.compare(*m_cmp, child->m_range); |
||||
if (c == keyrange::comparison::EQUALS || |
||||
c == keyrange::comparison::OVERLAPS) { |
||||
child->mutex_unlock(); |
||||
return this; |
||||
} else { |
||||
// unlock this node before recurring into the locked child,
|
||||
// passing in a comparison hint since we just comapred range
|
||||
// to the child's range.
|
||||
mutex_unlock(); |
||||
return child->find_node_with_overlapping_child(range, &c); |
||||
} |
||||
} |
||||
} |
||||
|
||||
bool treenode::insert(const keyrange &range, TXNID txnid, bool is_shared) { |
||||
int rc = true; |
||||
// choose a child to check. if that child is null, then insert the new node
|
||||
// there. otherwise recur down that child's subtree
|
||||
keyrange::comparison c = range.compare(*m_cmp, m_range); |
||||
if (c == keyrange::comparison::LESS_THAN) { |
||||
treenode *left_child = lock_and_rebalance_left(); |
||||
if (left_child == nullptr) { |
||||
left_child = treenode::alloc(m_cmp, range, txnid, is_shared); |
||||
m_left_child.set(left_child); |
||||
} else { |
||||
left_child->insert(range, txnid, is_shared); |
||||
left_child->mutex_unlock(); |
||||
} |
||||
} else if (c == keyrange::comparison::GREATER_THAN) { |
||||
// invariant(c == keyrange::comparison::GREATER_THAN);
|
||||
treenode *right_child = lock_and_rebalance_right(); |
||||
if (right_child == nullptr) { |
||||
right_child = treenode::alloc(m_cmp, range, txnid, is_shared); |
||||
m_right_child.set(right_child); |
||||
} else { |
||||
right_child->insert(range, txnid, is_shared); |
||||
right_child->mutex_unlock(); |
||||
} |
||||
} else if (c == keyrange::comparison::EQUALS) { |
||||
invariant(is_shared); |
||||
invariant(m_is_shared); |
||||
rc = add_shared_owner(txnid); |
||||
} else { |
||||
invariant(0); |
||||
} |
||||
return rc; |
||||
} |
||||
|
||||
treenode *treenode::find_child_at_extreme(int direction, treenode **parent) { |
||||
treenode *child = |
||||
direction > 0 ? m_right_child.get_locked() : m_left_child.get_locked(); |
||||
|
||||
if (child) { |
||||
*parent = this; |
||||
treenode *child_extreme = child->find_child_at_extreme(direction, parent); |
||||
child->mutex_unlock(); |
||||
return child_extreme; |
||||
} else { |
||||
return this; |
||||
} |
||||
} |
||||
|
||||
treenode *treenode::find_leftmost_child(treenode **parent) { |
||||
return find_child_at_extreme(-1, parent); |
||||
} |
||||
|
||||
treenode *treenode::find_rightmost_child(treenode **parent) { |
||||
return find_child_at_extreme(1, parent); |
||||
} |
||||
|
||||
treenode *treenode::remove_root_of_subtree() { |
||||
// if this node has no children, just free it and return null
|
||||
if (m_left_child.ptr == nullptr && m_right_child.ptr == nullptr) { |
||||
// treenode::free requires that non-root nodes are unlocked
|
||||
if (!is_root()) { |
||||
mutex_unlock(); |
||||
} |
||||
treenode::free(this); |
||||
return nullptr; |
||||
} |
||||
|
||||
// we have a child, so get either the in-order successor or
|
||||
// predecessor of this node to be our replacement.
|
||||
// replacement_parent is updated by the find functions as
|
||||
// they recur down the tree, so initialize it to this.
|
||||
treenode *child, *replacement; |
||||
treenode *replacement_parent = this; |
||||
if (m_left_child.ptr != nullptr) { |
||||
child = m_left_child.get_locked(); |
||||
replacement = child->find_rightmost_child(&replacement_parent); |
||||
invariant(replacement == child || replacement_parent != this); |
||||
|
||||
// detach the replacement from its parent
|
||||
if (replacement_parent == this) { |
||||
m_left_child = replacement->m_left_child; |
||||
} else { |
||||
replacement_parent->m_right_child = replacement->m_left_child; |
||||
} |
||||
} else { |
||||
child = m_right_child.get_locked(); |
||||
replacement = child->find_leftmost_child(&replacement_parent); |
||||
invariant(replacement == child || replacement_parent != this); |
||||
|
||||
// detach the replacement from its parent
|
||||
if (replacement_parent == this) { |
||||
m_right_child = replacement->m_right_child; |
||||
} else { |
||||
replacement_parent->m_left_child = replacement->m_right_child; |
||||
} |
||||
} |
||||
child->mutex_unlock(); |
||||
|
||||
// swap in place with the detached replacement, then destroy it
|
||||
treenode::swap_in_place(replacement, this); |
||||
treenode::free(replacement); |
||||
|
||||
return this; |
||||
} |
||||
|
||||
void treenode::recursive_remove(void) { |
||||
treenode *left = m_left_child.ptr; |
||||
if (left) { |
||||
left->recursive_remove(); |
||||
} |
||||
m_left_child.set(nullptr); |
||||
|
||||
treenode *right = m_right_child.ptr; |
||||
if (right) { |
||||
right->recursive_remove(); |
||||
} |
||||
m_right_child.set(nullptr); |
||||
|
||||
// we do not take locks on the way down, so we know non-root nodes
|
||||
// are unlocked here and the caller is required to pass a locked
|
||||
// root, so this free is correct.
|
||||
treenode::free(this); |
||||
} |
||||
|
||||
void treenode::remove_shared_owner(TXNID txnid) { |
||||
assert(m_owners->size() > 1); |
||||
m_owners->erase(txnid); |
||||
assert(m_owners->size() > 0); |
||||
/* if there is just one owner left, move it to m_txnid */ |
||||
if (m_owners->size() == 1) { |
||||
m_txnid = *m_owners->begin(); |
||||
delete m_owners; |
||||
m_owners = nullptr; |
||||
} |
||||
} |
||||
|
||||
treenode *treenode::remove(const keyrange &range, TXNID txnid) { |
||||
treenode *child; |
||||
// if the range is equal to this node's range, then just remove
|
||||
// the root of this subtree. otherwise search down the tree
|
||||
// in either the left or right children.
|
||||
keyrange::comparison c = range.compare(*m_cmp, m_range); |
||||
switch (c) { |
||||
case keyrange::comparison::EQUALS: { |
||||
// if we are the only owners, remove. Otherwise, just remove
|
||||
// us from the owners list.
|
||||
if (txnid != TXNID_ANY && has_multiple_owners()) { |
||||
remove_shared_owner(txnid); |
||||
return this; |
||||
} else { |
||||
return remove_root_of_subtree(); |
||||
} |
||||
} |
||||
case keyrange::comparison::LESS_THAN: |
||||
child = m_left_child.get_locked(); |
||||
invariant_notnull(child); |
||||
child = child->remove(range, txnid); |
||||
|
||||
// unlock the child if there still is one.
|
||||
// regardless, set the right child pointer
|
||||
if (child) { |
||||
child->mutex_unlock(); |
||||
} |
||||
m_left_child.set(child); |
||||
break; |
||||
case keyrange::comparison::GREATER_THAN: |
||||
child = m_right_child.get_locked(); |
||||
invariant_notnull(child); |
||||
child = child->remove(range, txnid); |
||||
|
||||
// unlock the child if there still is one.
|
||||
// regardless, set the right child pointer
|
||||
if (child) { |
||||
child->mutex_unlock(); |
||||
} |
||||
m_right_child.set(child); |
||||
break; |
||||
case keyrange::comparison::OVERLAPS: |
||||
// shouldn't be overlapping, since the tree is
|
||||
// non-overlapping and this range must exist
|
||||
abort(); |
||||
} |
||||
|
||||
return this; |
||||
} |
||||
|
||||
bool treenode::left_imbalanced(int threshold) const { |
||||
uint32_t left_depth = m_left_child.depth_est; |
||||
uint32_t right_depth = m_right_child.depth_est; |
||||
return m_left_child.ptr != nullptr && left_depth > threshold + right_depth; |
||||
} |
||||
|
||||
bool treenode::right_imbalanced(int threshold) const { |
||||
uint32_t left_depth = m_left_child.depth_est; |
||||
uint32_t right_depth = m_right_child.depth_est; |
||||
return m_right_child.ptr != nullptr && right_depth > threshold + left_depth; |
||||
} |
||||
|
||||
// effect: rebalances the subtree rooted at this node
|
||||
// using AVL style O(1) rotations. unlocks this
|
||||
// node if it is not the new root of the subtree.
|
||||
// requires: node is locked by this thread, children are not
|
||||
// returns: locked root node of the rebalanced tree
|
||||
treenode *treenode::maybe_rebalance(void) { |
||||
// if we end up not rotating at all, the new root is this
|
||||
treenode *new_root = this; |
||||
treenode *child = nullptr; |
||||
|
||||
if (left_imbalanced(IMBALANCE_THRESHOLD)) { |
||||
child = m_left_child.get_locked(); |
||||
if (child->right_imbalanced(0)) { |
||||
treenode *grandchild = child->m_right_child.get_locked(); |
||||
|
||||
child->m_right_child = grandchild->m_left_child; |
||||
grandchild->m_left_child.set(child); |
||||
|
||||
m_left_child = grandchild->m_right_child; |
||||
grandchild->m_right_child.set(this); |
||||
|
||||
new_root = grandchild; |
||||
} else { |
||||
m_left_child = child->m_right_child; |
||||
child->m_right_child.set(this); |
||||
new_root = child; |
||||
} |
||||
} else if (right_imbalanced(IMBALANCE_THRESHOLD)) { |
||||
child = m_right_child.get_locked(); |
||||
if (child->left_imbalanced(0)) { |
||||
treenode *grandchild = child->m_left_child.get_locked(); |
||||
|
||||
child->m_left_child = grandchild->m_right_child; |
||||
grandchild->m_right_child.set(child); |
||||
|
||||
m_right_child = grandchild->m_left_child; |
||||
grandchild->m_left_child.set(this); |
||||
|
||||
new_root = grandchild; |
||||
} else { |
||||
m_right_child = child->m_left_child; |
||||
child->m_left_child.set(this); |
||||
new_root = child; |
||||
} |
||||
} |
||||
|
||||
// up to three nodes may be locked.
|
||||
// - this
|
||||
// - child
|
||||
// - grandchild (but if it is locked, its the new root)
|
||||
//
|
||||
// one of them is the new root. we unlock everything except the new root.
|
||||
if (child && child != new_root) { |
||||
TOKU_VALGRIND_RESET_MUTEX_ORDERING_INFO(&child->m_mutex); |
||||
child->mutex_unlock(); |
||||
} |
||||
if (this != new_root) { |
||||
TOKU_VALGRIND_RESET_MUTEX_ORDERING_INFO(&m_mutex); |
||||
mutex_unlock(); |
||||
} |
||||
TOKU_VALGRIND_RESET_MUTEX_ORDERING_INFO(&new_root->m_mutex); |
||||
return new_root; |
||||
} |
||||
|
||||
treenode *treenode::lock_and_rebalance_left(void) { |
||||
treenode *child = m_left_child.get_locked(); |
||||
if (child) { |
||||
treenode *new_root = child->maybe_rebalance(); |
||||
m_left_child.set(new_root); |
||||
child = new_root; |
||||
} |
||||
return child; |
||||
} |
||||
|
||||
treenode *treenode::lock_and_rebalance_right(void) { |
||||
treenode *child = m_right_child.get_locked(); |
||||
if (child) { |
||||
treenode *new_root = child->maybe_rebalance(); |
||||
m_right_child.set(new_root); |
||||
child = new_root; |
||||
} |
||||
return child; |
||||
} |
||||
|
||||
void treenode::child_ptr::set(treenode *node) { |
||||
ptr = node; |
||||
depth_est = ptr ? ptr->get_depth_estimate() : 0; |
||||
} |
||||
|
||||
treenode *treenode::child_ptr::get_locked(void) { |
||||
if (ptr) { |
||||
ptr->mutex_lock(); |
||||
depth_est = ptr->get_depth_estimate(); |
||||
} |
||||
return ptr; |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,301 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=2:softtabstop=2:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <string.h> |
||||
|
||||
#include "../ft/comparator.h" |
||||
#include "../portability/memory.h" |
||||
#include "../portability/toku_pthread.h" |
||||
// PORT: we need LTM_STATUS
|
||||
#include "../ft/ft-status.h" |
||||
#include "../portability/txn_subst.h" |
||||
#include "keyrange.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// a node in a tree with its own mutex
|
||||
// - range is the "key" of this node
|
||||
// - txnid is the single txnid associated with this node
|
||||
// - left and right children may be null
|
||||
//
|
||||
// to build a tree on top of this abstraction, the user:
|
||||
// - provides memory for a root node, initializes it via create_root()
|
||||
// - performs tree operations on the root node. memory management
|
||||
// below the root node is handled by the abstraction, not the user.
|
||||
// this pattern:
|
||||
// - guaruntees a root node always exists.
|
||||
// - does not allow for rebalances on the root node
|
||||
|
||||
class treenode { |
||||
public: |
||||
// every treenode function has some common requirements:
|
||||
// - node is locked and children are never locked
|
||||
// - node may be unlocked if no other thread has visibility
|
||||
|
||||
// effect: create the root node
|
||||
void create_root(const comparator *cmp); |
||||
|
||||
// effect: destroys the root node
|
||||
void destroy_root(void); |
||||
|
||||
// effect: sets the txnid and copies the given range for this node
|
||||
void set_range_and_txnid(const keyrange &range, TXNID txnid, bool is_shared); |
||||
|
||||
// returns: true iff this node is marked as empty
|
||||
bool is_empty(void); |
||||
|
||||
// returns: true if this is the root node, denoted by a null parent
|
||||
bool is_root(void); |
||||
|
||||
// returns: true if the given range overlaps with this node's range
|
||||
bool range_overlaps(const keyrange &range); |
||||
|
||||
// effect: locks the node
|
||||
void mutex_lock(void); |
||||
|
||||
// effect: unlocks the node
|
||||
void mutex_unlock(void); |
||||
|
||||
// return: node whose child overlaps, or a child that is empty
|
||||
// and would contain range if it existed
|
||||
// given: if cmp_hint is non-null, then it is a precomputed
|
||||
// comparison of this node's range to the given range.
|
||||
treenode *find_node_with_overlapping_child( |
||||
const keyrange &range, const keyrange::comparison *cmp_hint); |
||||
|
||||
// effect: performs an in-order traversal of the ranges that overlap the
|
||||
// given range, calling function->fn() on each node that does
|
||||
// requires: function signature is: bool fn(const keyrange &range, TXNID
|
||||
// txnid) requires: fn returns true to keep iterating, false to stop iterating
|
||||
// requires: fn does not attempt to use any ranges read out by value
|
||||
// after removing a node with an overlapping range from the tree.
|
||||
template <class F> |
||||
void traverse_overlaps(const keyrange &range, F *function) { |
||||
keyrange::comparison c = range.compare(*m_cmp, m_range); |
||||
if (c == keyrange::comparison::EQUALS) { |
||||
// Doesn't matter if fn wants to keep going, there
|
||||
// is nothing left, so return.
|
||||
function->fn(m_range, m_txnid, m_is_shared, m_owners); |
||||
return; |
||||
} |
||||
|
||||
treenode *left = m_left_child.get_locked(); |
||||
if (left) { |
||||
if (c != keyrange::comparison::GREATER_THAN) { |
||||
// Target range is less than this node, or it overlaps this
|
||||
// node. There may be something on the left.
|
||||
left->traverse_overlaps(range, function); |
||||
} |
||||
left->mutex_unlock(); |
||||
} |
||||
|
||||
if (c == keyrange::comparison::OVERLAPS) { |
||||
bool keep_going = function->fn(m_range, m_txnid, m_is_shared, m_owners); |
||||
if (!keep_going) { |
||||
return; |
||||
} |
||||
} |
||||
|
||||
treenode *right = m_right_child.get_locked(); |
||||
if (right) { |
||||
if (c != keyrange::comparison::LESS_THAN) { |
||||
// Target range is greater than this node, or it overlaps this
|
||||
// node. There may be something on the right.
|
||||
right->traverse_overlaps(range, function); |
||||
} |
||||
right->mutex_unlock(); |
||||
} |
||||
} |
||||
|
||||
// effect: inserts the given range and txnid into a subtree, recursively
|
||||
// requires: range does not overlap with any node below the subtree
|
||||
bool insert(const keyrange &range, TXNID txnid, bool is_shared); |
||||
|
||||
// effect: removes the given range from the subtree
|
||||
// requires: range exists in the subtree
|
||||
// returns: the root of the resulting subtree
|
||||
treenode *remove(const keyrange &range, TXNID txnid); |
||||
|
||||
// effect: removes this node and all of its children, recursively
|
||||
// requires: every node at and below this node is unlocked
|
||||
void recursive_remove(void); |
||||
|
||||
private: |
||||
// the child_ptr is a light abstraction for the locking of
|
||||
// a child and the maintenence of its depth estimate.
|
||||
|
||||
struct child_ptr { |
||||
// set the child pointer
|
||||
void set(treenode *node); |
||||
|
||||
// get and lock this child if it exists
|
||||
treenode *get_locked(void); |
||||
|
||||
treenode *ptr; |
||||
uint32_t depth_est; |
||||
}; |
||||
|
||||
// the balance factor at which a node is considered imbalanced
|
||||
static const int32_t IMBALANCE_THRESHOLD = 2; |
||||
|
||||
// node-level mutex
|
||||
toku_mutex_t m_mutex; |
||||
|
||||
// the range and txnid for this node. the range contains a copy
|
||||
// of the keys originally inserted into the tree. nodes may
|
||||
// swap ranges. but at the end of the day, when a node is
|
||||
// destroyed, it frees the memory associated with whatever range
|
||||
// it has at the time of destruction.
|
||||
keyrange m_range; |
||||
|
||||
void remove_shared_owner(TXNID txnid); |
||||
|
||||
bool has_multiple_owners() { return (m_txnid == TXNID_SHARED); } |
||||
|
||||
private: |
||||
// Owner transaction id.
|
||||
// A value of TXNID_SHARED means this node has multiple owners
|
||||
TXNID m_txnid; |
||||
|
||||
// If true, this lock is a non-exclusive lock, and it can have either
|
||||
// one or several owners.
|
||||
bool m_is_shared; |
||||
|
||||
// List of the owners, or nullptr if there's just one owner.
|
||||
TxnidVector *m_owners; |
||||
|
||||
// two child pointers
|
||||
child_ptr m_left_child; |
||||
child_ptr m_right_child; |
||||
|
||||
// comparator for ranges
|
||||
// psergey-todo: Is there any sense to store the comparator in each tree
|
||||
// node?
|
||||
const comparator *m_cmp; |
||||
|
||||
// marked for the root node. the root node is never free()'d
|
||||
// when removed, but instead marked as empty.
|
||||
bool m_is_root; |
||||
|
||||
// marked for an empty node. only valid for the root.
|
||||
bool m_is_empty; |
||||
|
||||
// effect: initializes an empty node with the given comparator
|
||||
void init(const comparator *cmp); |
||||
|
||||
// requires: this is a shared node (m_is_shared==true)
|
||||
// effect: another transaction is added as an owner.
|
||||
// returns: true <=> added another owner
|
||||
// false <=> this transaction is already an owner
|
||||
bool add_shared_owner(TXNID txnid); |
||||
|
||||
// requires: *parent is initialized to something meaningful.
|
||||
// requires: subtree is non-empty
|
||||
// returns: the leftmost child of the given subtree
|
||||
// returns: a pointer to the parent of said child in *parent, only
|
||||
// if this function recurred, otherwise it is untouched.
|
||||
treenode *find_leftmost_child(treenode **parent); |
||||
|
||||
// requires: *parent is initialized to something meaningful.
|
||||
// requires: subtree is non-empty
|
||||
// returns: the rightmost child of the given subtree
|
||||
// returns: a pointer to the parent of said child in *parent, only
|
||||
// if this function recurred, otherwise it is untouched.
|
||||
treenode *find_rightmost_child(treenode **parent); |
||||
|
||||
// effect: remove the root of this subtree, destroying the old root
|
||||
// returns: the new root of the subtree
|
||||
treenode *remove_root_of_subtree(void); |
||||
|
||||
// requires: subtree is non-empty, direction is not 0
|
||||
// returns: the child of the subtree at either the left or rightmost extreme
|
||||
treenode *find_child_at_extreme(int direction, treenode **parent); |
||||
|
||||
// effect: retrieves and possibly rebalances the left child
|
||||
// returns: a locked left child, if it exists
|
||||
treenode *lock_and_rebalance_left(void); |
||||
|
||||
// effect: retrieves and possibly rebalances the right child
|
||||
// returns: a locked right child, if it exists
|
||||
treenode *lock_and_rebalance_right(void); |
||||
|
||||
// returns: the estimated depth of this subtree
|
||||
uint32_t get_depth_estimate(void) const; |
||||
|
||||
// returns: true iff left subtree depth is sufficiently less than the right
|
||||
bool left_imbalanced(int threshold) const; |
||||
|
||||
// returns: true iff right subtree depth is sufficiently greater than the left
|
||||
bool right_imbalanced(int threshold) const; |
||||
|
||||
// effect: performs an O(1) rebalance, which will "heal" an imbalance by at
|
||||
// most 1. effect: if the new root is not this node, then this node is
|
||||
// unlocked. returns: locked node representing the new root of the rebalanced
|
||||
// subtree
|
||||
treenode *maybe_rebalance(void); |
||||
|
||||
// returns: allocated treenode populated with a copy of the range and txnid
|
||||
static treenode *alloc(const comparator *cmp, const keyrange &range, |
||||
TXNID txnid, bool is_shared); |
||||
|
||||
// requires: node is a locked root node, or an unlocked non-root node
|
||||
static void free(treenode *node); |
||||
|
||||
// effect: swaps the range/txnid pairs for node1 and node2.
|
||||
static void swap_in_place(treenode *node1, treenode *node2); |
||||
|
||||
friend class concurrent_tree_unit_test; |
||||
}; |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,119 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "txnid_set.h" |
||||
|
||||
#include "../db.h" |
||||
|
||||
namespace toku { |
||||
|
||||
int find_by_txnid(const TXNID &txnid_a, const TXNID &txnid_b); |
||||
int find_by_txnid(const TXNID &txnid_a, const TXNID &txnid_b) { |
||||
if (txnid_a < txnid_b) { |
||||
return -1; |
||||
} else if (txnid_a == txnid_b) { |
||||
return 0; |
||||
} else { |
||||
return 1; |
||||
} |
||||
} |
||||
|
||||
void txnid_set::create(void) { |
||||
// lazily allocate the underlying omt, since it is common
|
||||
// to create a txnid set and never put anything in it.
|
||||
m_txnids.create_no_array(); |
||||
} |
||||
|
||||
void txnid_set::destroy(void) { m_txnids.destroy(); } |
||||
|
||||
// Return true if the given transaction id is a member of the set.
|
||||
// Otherwise, return false.
|
||||
bool txnid_set::contains(TXNID txnid) const { |
||||
TXNID find_txnid; |
||||
int r = m_txnids.find_zero<TXNID, find_by_txnid>(txnid, &find_txnid, nullptr); |
||||
return r == 0 ? true : false; |
||||
} |
||||
|
||||
// Add a given txnid to the set
|
||||
void txnid_set::add(TXNID txnid) { |
||||
int r = m_txnids.insert<TXNID, find_by_txnid>(txnid, txnid, nullptr); |
||||
invariant(r == 0 || r == DB_KEYEXIST); |
||||
} |
||||
|
||||
// Delete a given txnid from the set.
|
||||
void txnid_set::remove(TXNID txnid) { |
||||
uint32_t idx; |
||||
int r = m_txnids.find_zero<TXNID, find_by_txnid>(txnid, nullptr, &idx); |
||||
if (r == 0) { |
||||
r = m_txnids.delete_at(idx); |
||||
invariant_zero(r); |
||||
} |
||||
} |
||||
|
||||
// Return the size of the set
|
||||
uint32_t txnid_set::size(void) const { return m_txnids.size(); } |
||||
|
||||
// Get the ith id in the set, assuming that the set is sorted.
|
||||
TXNID txnid_set::get(uint32_t i) const { |
||||
TXNID txnid; |
||||
int r = m_txnids.fetch(i, &txnid); |
||||
if (r == EINVAL) /* Shouldn't happen, avoid compiler warning */ |
||||
return TXNID_NONE; |
||||
invariant_zero(r); |
||||
return txnid; |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,91 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "../portability/txn_subst.h" |
||||
#include "../util/omt.h" |
||||
|
||||
namespace toku { |
||||
|
||||
class txnid_set { |
||||
public: |
||||
// effect: Creates an empty set. Does not malloc space for
|
||||
// any entries yet. That is done lazily on add().
|
||||
void create(void); |
||||
|
||||
// effect: Destroy the set's internals.
|
||||
void destroy(void); |
||||
|
||||
// returns: True if the given txnid is a member of the set.
|
||||
bool contains(TXNID id) const; |
||||
|
||||
// effect: Adds a given txnid to the set if it did not exist
|
||||
void add(TXNID txnid); |
||||
|
||||
// effect: Deletes a txnid from the set if it exists.
|
||||
void remove(TXNID txnid); |
||||
|
||||
// returns: Size of the set
|
||||
uint32_t size(void) const; |
||||
|
||||
// returns: The "i'th" id in the set, as if it were sorted.
|
||||
TXNID get(uint32_t i) const; |
||||
|
||||
private: |
||||
toku::omt<TXNID> m_txnids; |
||||
|
||||
friend class txnid_set_unit_test; |
||||
}; |
||||
ENSURE_POD(txnid_set); |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,212 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "../db.h" |
||||
#include "../portability/memory.h" |
||||
// PORT #include <toku_assert.h>
|
||||
#include <memory.h> |
||||
#include <string.h> |
||||
|
||||
#include "txnid_set.h" |
||||
#include "wfg.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// Create a lock request graph
|
||||
void wfg::create(void) { m_nodes.create(); } |
||||
|
||||
// Destroy the internals of the lock request graph
|
||||
void wfg::destroy(void) { |
||||
uint32_t n_nodes = m_nodes.size(); |
||||
for (uint32_t i = 0; i < n_nodes; i++) { |
||||
node *n; |
||||
int r = m_nodes.fetch(i, &n); |
||||
invariant_zero(r); |
||||
invariant_notnull(n); |
||||
if (r) continue; // Get rid of "may be used uninitialized" warning
|
||||
node::free(n); |
||||
} |
||||
m_nodes.destroy(); |
||||
} |
||||
|
||||
// Add an edge (a_id, b_id) to the graph
|
||||
void wfg::add_edge(TXNID a_txnid, TXNID b_txnid) { |
||||
node *a_node = find_create_node(a_txnid); |
||||
node *b_node = find_create_node(b_txnid); |
||||
a_node->edges.add(b_node->txnid); |
||||
} |
||||
|
||||
// Return true if a node with the given transaction id exists in the graph.
|
||||
// Return false otherwise.
|
||||
bool wfg::node_exists(TXNID txnid) { |
||||
node *n = find_node(txnid); |
||||
return n != NULL; |
||||
} |
||||
|
||||
bool wfg::cycle_exists_from_node(node *target, node *head, |
||||
std::function<void(TXNID)> reporter) { |
||||
bool cycle_found = false; |
||||
head->visited = true; |
||||
uint32_t n_edges = head->edges.size(); |
||||
for (uint32_t i = 0; i < n_edges && !cycle_found; i++) { |
||||
TXNID edge_id = head->edges.get(i); |
||||
if (target->txnid == edge_id) { |
||||
cycle_found = true; |
||||
if (reporter) reporter(edge_id); |
||||
} else { |
||||
node *new_head = find_node(edge_id); |
||||
if (new_head && !new_head->visited) { |
||||
cycle_found = cycle_exists_from_node(target, new_head, reporter); |
||||
if (cycle_found && reporter) reporter(edge_id); |
||||
} |
||||
} |
||||
} |
||||
head->visited = false; |
||||
return cycle_found; |
||||
} |
||||
|
||||
// Return true if there exists a cycle from a given transaction id in the graph.
|
||||
// Return false otherwise.
|
||||
bool wfg::cycle_exists_from_txnid(TXNID txnid, |
||||
std::function<void(TXNID)> reporter) { |
||||
node *a_node = find_node(txnid); |
||||
bool cycles_found = false; |
||||
if (a_node) { |
||||
cycles_found = cycle_exists_from_node(a_node, a_node, reporter); |
||||
} |
||||
return cycles_found; |
||||
} |
||||
|
||||
// Apply a given function f to all of the nodes in the graph. The apply
|
||||
// function returns when the function f is called for all of the nodes in the
|
||||
// graph, or the function f returns non-zero.
|
||||
void wfg::apply_nodes(int (*fn)(TXNID id, void *extra), void *extra) { |
||||
int r = 0; |
||||
uint32_t n_nodes = m_nodes.size(); |
||||
for (uint32_t i = 0; i < n_nodes && r == 0; i++) { |
||||
node *n; |
||||
r = m_nodes.fetch(i, &n); |
||||
invariant_zero(r); |
||||
if (r) continue; // Get rid of "may be used uninitialized" warning
|
||||
r = fn(n->txnid, extra); |
||||
} |
||||
} |
||||
|
||||
// Apply a given function f to all of the edges whose origin is a given node id.
|
||||
// The apply function returns when the function f is called for all edges in the
|
||||
// graph rooted at node id, or the function f returns non-zero.
|
||||
void wfg::apply_edges(TXNID txnid, |
||||
int (*fn)(TXNID txnid, TXNID edge_txnid, void *extra), |
||||
void *extra) { |
||||
node *n = find_node(txnid); |
||||
if (n) { |
||||
int r = 0; |
||||
uint32_t n_edges = n->edges.size(); |
||||
for (uint32_t i = 0; i < n_edges && r == 0; i++) { |
||||
r = fn(txnid, n->edges.get(i), extra); |
||||
} |
||||
} |
||||
} |
||||
|
||||
// find node by id
|
||||
wfg::node *wfg::find_node(TXNID txnid) { |
||||
node *n = nullptr; |
||||
int r = m_nodes.find_zero<TXNID, find_by_txnid>(txnid, &n, nullptr); |
||||
invariant(r == 0 || r == DB_NOTFOUND); |
||||
return n; |
||||
} |
||||
|
||||
// this is the omt comparison function
|
||||
// nodes are compared by their txnid.
|
||||
int wfg::find_by_txnid(node *const &node_a, const TXNID &txnid_b) { |
||||
TXNID txnid_a = node_a->txnid; |
||||
if (txnid_a < txnid_b) { |
||||
return -1; |
||||
} else if (txnid_a == txnid_b) { |
||||
return 0; |
||||
} else { |
||||
return 1; |
||||
} |
||||
} |
||||
|
||||
// insert a new node
|
||||
wfg::node *wfg::find_create_node(TXNID txnid) { |
||||
node *n; |
||||
uint32_t idx; |
||||
int r = m_nodes.find_zero<TXNID, find_by_txnid>(txnid, &n, &idx); |
||||
if (r == DB_NOTFOUND) { |
||||
n = node::alloc(txnid); |
||||
r = m_nodes.insert_at(n, idx); |
||||
invariant_zero(r); |
||||
} |
||||
invariant_notnull(n); |
||||
return n; |
||||
} |
||||
|
||||
wfg::node *wfg::node::alloc(TXNID txnid) { |
||||
node *XCALLOC(n); |
||||
n->txnid = txnid; |
||||
n->visited = false; |
||||
n->edges.create(); |
||||
return n; |
||||
} |
||||
|
||||
void wfg::node::free(wfg::node *n) { |
||||
n->edges.destroy(); |
||||
toku_free(n); |
||||
} |
||||
|
||||
} /* namespace toku */ |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,123 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <functional> |
||||
|
||||
#include "../util/omt.h" |
||||
#include "txnid_set.h" |
||||
|
||||
namespace toku { |
||||
|
||||
// A wfg is a 'wait-for' graph. A directed edge in represents one
|
||||
// txn waiting for another to finish before it can acquire a lock.
|
||||
|
||||
class wfg { |
||||
public: |
||||
// Create a lock request graph
|
||||
void create(void); |
||||
|
||||
// Destroy the internals of the lock request graph
|
||||
void destroy(void); |
||||
|
||||
// Add an edge (a_id, b_id) to the graph
|
||||
void add_edge(TXNID a_txnid, TXNID b_txnid); |
||||
|
||||
// Return true if a node with the given transaction id exists in the graph.
|
||||
// Return false otherwise.
|
||||
bool node_exists(TXNID txnid); |
||||
|
||||
// Return true if there exists a cycle from a given transaction id in the
|
||||
// graph. Return false otherwise.
|
||||
bool cycle_exists_from_txnid(TXNID txnid, |
||||
std::function<void(TXNID)> reporter); |
||||
|
||||
// Apply a given function f to all of the nodes in the graph. The apply
|
||||
// function returns when the function f is called for all of the nodes in the
|
||||
// graph, or the function f returns non-zero.
|
||||
void apply_nodes(int (*fn)(TXNID txnid, void *extra), void *extra); |
||||
|
||||
// Apply a given function f to all of the edges whose origin is a given node
|
||||
// id. The apply function returns when the function f is called for all edges
|
||||
// in the graph rooted at node id, or the function f returns non-zero.
|
||||
void apply_edges(TXNID txnid, |
||||
int (*fn)(TXNID txnid, TXNID edge_txnid, void *extra), |
||||
void *extra); |
||||
|
||||
private: |
||||
struct node { |
||||
// txnid for this node and the associated set of edges
|
||||
TXNID txnid; |
||||
txnid_set edges; |
||||
bool visited; |
||||
|
||||
static node *alloc(TXNID txnid); |
||||
|
||||
static void free(node *n); |
||||
}; |
||||
ENSURE_POD(node); |
||||
|
||||
toku::omt<node *> m_nodes; |
||||
|
||||
node *find_node(TXNID txnid); |
||||
|
||||
node *find_create_node(TXNID txnid); |
||||
|
||||
bool cycle_exists_from_node(node *target, node *head, |
||||
std::function<void(TXNID)> reporter); |
||||
|
||||
static int find_by_txnid(node *const &node_a, const TXNID &txnid_b); |
||||
}; |
||||
ENSURE_POD(wfg); |
||||
|
||||
} /* namespace toku */ |
@ -0,0 +1,201 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <stdlib.h> |
||||
|
||||
#include "toku_portability.h" |
||||
|
||||
/* Percona memory allocation functions and macros.
|
||||
* These are functions for malloc and free */ |
||||
|
||||
int toku_memory_startup(void) __attribute__((constructor)); |
||||
void toku_memory_shutdown(void) __attribute__((destructor)); |
||||
|
||||
/* Generally: errno is set to 0 or a value to indicate problems. */ |
||||
|
||||
// Everything should call toku_malloc() instead of malloc(), and toku_calloc()
|
||||
// instead of calloc() That way the tests can can, e.g., replace the malloc
|
||||
// function using toku_set_func_malloc().
|
||||
void *toku_calloc(size_t nmemb, size_t size) |
||||
__attribute__((__visibility__("default"))); |
||||
void *toku_xcalloc(size_t nmemb, size_t size) |
||||
__attribute__((__visibility__("default"))); |
||||
void *toku_malloc(size_t size) __attribute__((__visibility__("default"))); |
||||
void *toku_malloc_aligned(size_t alignment, size_t size) |
||||
__attribute__((__visibility__("default"))); |
||||
|
||||
// xmalloc aborts instead of return NULL if we run out of memory
|
||||
void *toku_xmalloc(size_t size) __attribute__((__visibility__("default"))); |
||||
void *toku_xrealloc(void *, size_t size) |
||||
__attribute__((__visibility__("default"))); |
||||
void *toku_xmalloc_aligned(size_t alignment, size_t size) |
||||
__attribute__((__visibility__("default"))); |
||||
// Effect: Perform a os_malloc_aligned(size) with the additional property that
|
||||
// the returned pointer is a multiple of ALIGNMENT.
|
||||
// Fail with a resource_assert if the allocation fails (don't return an error
|
||||
// code). If the alloc_aligned function has been set then call it instead.
|
||||
// Requires: alignment is a power of two.
|
||||
|
||||
void toku_free(void *) __attribute__((__visibility__("default"))); |
||||
|
||||
size_t toku_malloc_usable_size(void *p) |
||||
__attribute__((__visibility__("default"))); |
||||
|
||||
/* MALLOC is a macro that helps avoid a common error:
|
||||
* Suppose I write |
||||
* struct foo *x = malloc(sizeof(struct foo)); |
||||
* That works fine. But if I change it to this, I've probably made an mistake: |
||||
* struct foo *x = malloc(sizeof(struct bar)); |
||||
* It can get worse, since one might have something like |
||||
* struct foo *x = malloc(sizeof(struct foo *)) |
||||
* which looks reasonable, but it allocoates enough to hold a pointer instead of |
||||
* the amount needed for the struct. So instead, write struct foo *MALLOC(x); |
||||
* and you cannot go wrong. |
||||
*/ |
||||
#define MALLOC(v) CAST_FROM_VOIDP(v, toku_malloc(sizeof(*v))) |
||||
/* MALLOC_N is like calloc(Except no 0ing of data): It makes an array. Write
|
||||
* int *MALLOC_N(5,x); |
||||
* to make an array of 5 integers. |
||||
*/ |
||||
#define MALLOC_N(n, v) CAST_FROM_VOIDP(v, toku_malloc((n) * sizeof(*v))) |
||||
#define MALLOC_N_ALIGNED(align, n, v) \ |
||||
CAST_FROM_VOIDP(v, toku_malloc_aligned((align), (n) * sizeof(*v))) |
||||
|
||||
// CALLOC_N is like calloc with auto-figuring out size of members
|
||||
#define CALLOC_N(n, v) CAST_FROM_VOIDP(v, toku_calloc((n), sizeof(*v))) |
||||
|
||||
#define CALLOC(v) CALLOC_N(1, v) |
||||
|
||||
// XMALLOC macros are like MALLOC except they abort if the operation fails
|
||||
#define XMALLOC(v) CAST_FROM_VOIDP(v, toku_xmalloc(sizeof(*v))) |
||||
#define XMALLOC_N(n, v) CAST_FROM_VOIDP(v, toku_xmalloc((n) * sizeof(*v))) |
||||
#define XCALLOC_N(n, v) CAST_FROM_VOIDP(v, toku_xcalloc((n), (sizeof(*v)))) |
||||
#define XCALLOC(v) XCALLOC_N(1, v) |
||||
#define XREALLOC(v, s) CAST_FROM_VOIDP(v, toku_xrealloc(v, s)) |
||||
#define XREALLOC_N(n, v) CAST_FROM_VOIDP(v, toku_xrealloc(v, (n) * sizeof(*v))) |
||||
|
||||
#define XMALLOC_N_ALIGNED(align, n, v) \ |
||||
CAST_FROM_VOIDP(v, toku_xmalloc_aligned((align), (n) * sizeof(*v))) |
||||
|
||||
#define XMEMDUP(dst, src) CAST_FROM_VOIDP(dst, toku_xmemdup(src, sizeof(*src))) |
||||
#define XMEMDUP_N(dst, src, len) CAST_FROM_VOIDP(dst, toku_xmemdup(src, len)) |
||||
|
||||
// ZERO_ARRAY writes zeroes to a stack-allocated array
|
||||
#define ZERO_ARRAY(o) \ |
||||
do { \
|
||||
memset((o), 0, sizeof(o)); \
|
||||
} while (0) |
||||
// ZERO_STRUCT writes zeroes to a stack-allocated struct
|
||||
#define ZERO_STRUCT(o) \ |
||||
do { \
|
||||
memset(&(o), 0, sizeof(o)); \
|
||||
} while (0) |
||||
|
||||
/* Copy memory. Analogous to strdup() */ |
||||
void *toku_memdup(const void *v, size_t len); |
||||
/* Toku-version of strdup. Use this so that it calls toku_malloc() */ |
||||
char *toku_strdup(const char *s) __attribute__((__visibility__("default"))); |
||||
/* Toku-version of strndup. Use this so that it calls toku_malloc() */ |
||||
char *toku_strndup(const char *s, size_t n) |
||||
__attribute__((__visibility__("default"))); |
||||
/* Copy memory. Analogous to strdup() Crashes instead of returning NULL */ |
||||
void *toku_xmemdup(const void *v, size_t len) |
||||
__attribute__((__visibility__("default"))); |
||||
/* Toku-version of strdup. Use this so that it calls toku_xmalloc() Crashes
|
||||
* instead of returning NULL */ |
||||
char *toku_xstrdup(const char *s) __attribute__((__visibility__("default"))); |
||||
|
||||
void toku_malloc_cleanup( |
||||
void); /* Before exiting, call this function to free up any internal data
|
||||
structures from toku_malloc. Otherwise valgrind will complain of |
||||
memory leaks. */ |
||||
|
||||
/* Check to see if everything malloc'd was free. Might be a no-op depending on
|
||||
* how memory.c is configured. */ |
||||
void toku_memory_check_all_free(void); |
||||
/* Check to see if memory is "sane". Might be a no-op. Probably better to
|
||||
* simply use valgrind. */ |
||||
void toku_do_memory_check(void); |
||||
|
||||
typedef void *(*malloc_fun_t)(size_t); |
||||
typedef void (*free_fun_t)(void *); |
||||
typedef void *(*realloc_fun_t)(void *, size_t); |
||||
typedef void *(*malloc_aligned_fun_t)(size_t /*alignment*/, size_t /*size*/); |
||||
typedef void *(*realloc_aligned_fun_t)(size_t /*alignment*/, void * /*pointer*/, |
||||
size_t /*size*/); |
||||
|
||||
void toku_set_func_malloc(malloc_fun_t f); |
||||
void toku_set_func_xmalloc_only(malloc_fun_t f); |
||||
void toku_set_func_malloc_only(malloc_fun_t f); |
||||
void toku_set_func_realloc(realloc_fun_t f); |
||||
void toku_set_func_xrealloc_only(realloc_fun_t f); |
||||
void toku_set_func_realloc_only(realloc_fun_t f); |
||||
void toku_set_func_free(free_fun_t f); |
||||
|
||||
typedef struct memory_status { |
||||
uint64_t malloc_count; // number of malloc operations
|
||||
uint64_t free_count; // number of free operations
|
||||
uint64_t realloc_count; // number of realloc operations
|
||||
uint64_t malloc_fail; // number of malloc operations that failed
|
||||
uint64_t realloc_fail; // number of realloc operations that failed
|
||||
uint64_t requested; // number of bytes requested
|
||||
uint64_t used; // number of bytes used (requested + overhead), obtained from
|
||||
// malloc_usable_size()
|
||||
uint64_t freed; // number of bytes freed;
|
||||
uint64_t max_requested_size; // largest attempted allocation size
|
||||
uint64_t last_failed_size; // size of the last failed allocation attempt
|
||||
volatile uint64_t |
||||
max_in_use; // maximum memory footprint (used - freed), approximate (not
|
||||
// worth threadsafety overhead for exact)
|
||||
const char *mallocator_version; |
||||
uint64_t mmap_threshold; |
||||
} LOCAL_MEMORY_STATUS_S, *LOCAL_MEMORY_STATUS; |
||||
|
||||
void toku_memory_get_status(LOCAL_MEMORY_STATUS s); |
||||
|
||||
// Effect: Like toku_memory_footprint, except instead of passing p,
|
||||
// we pass toku_malloc_usable_size(p).
|
||||
size_t toku_memory_footprint_given_usable_size(size_t touched, size_t usable); |
||||
|
||||
// Effect: Return an estimate how how much space an object is using, possibly by
|
||||
// using toku_malloc_usable_size(p).
|
||||
// If p is NULL then returns 0.
|
||||
size_t toku_memory_footprint(void *p, size_t touched); |
@ -0,0 +1,37 @@ |
||||
//
|
||||
// A replacement for toku_assert.h
|
||||
//
|
||||
#pragma once |
||||
|
||||
#include <assert.h> |
||||
#include <errno.h> |
||||
|
||||
#ifdef NDEBUG |
||||
|
||||
#define assert_zero(a) ((void)(a)) |
||||
#define invariant(a) ((void)(a)) |
||||
#define invariant_notnull(a) ((void)(a)) |
||||
#define invariant_zero(a) ((void)(a)) |
||||
|
||||
#else |
||||
|
||||
#define assert_zero(a) assert((a) == 0) |
||||
#define invariant(a) assert(a) |
||||
#define invariant_notnull(a) assert(a) |
||||
#define invariant_zero(a) assert_zero(a) |
||||
|
||||
#endif |
||||
|
||||
#define lazy_assert_zero(a) assert_zero(a) |
||||
|
||||
#define paranoid_invariant_zero(a) assert_zero(a) |
||||
#define paranoid_invariant_notnull(a) assert(a) |
||||
#define paranoid_invariant(a) assert(a) |
||||
|
||||
#define ENSURE_POD(type) \ |
||||
static_assert(std::is_pod<type>::value, #type "isn't POD") |
||||
|
||||
inline int get_error_errno(void) { |
||||
invariant(errno); |
||||
return errno; |
||||
} |
@ -0,0 +1,116 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
// PORT2: #include <portability/toku_config.h>
|
||||
#include <stdbool.h> |
||||
#include <stddef.h> |
||||
#include <stdint.h> |
||||
|
||||
#include "toku_assert_subst.h" |
||||
|
||||
__attribute__((const, always_inline)) static inline intptr_t which_cache_line( |
||||
intptr_t addr) { |
||||
static const size_t assumed_cache_line_size = 64; |
||||
return addr / assumed_cache_line_size; |
||||
} |
||||
template <typename T> |
||||
__attribute__((const, always_inline)) static inline bool crosses_boundary( |
||||
T *addr, size_t width) { |
||||
const intptr_t int_addr = reinterpret_cast<intptr_t>(addr); |
||||
const intptr_t last_byte = int_addr + width - 1; |
||||
return which_cache_line(int_addr) != which_cache_line(last_byte); |
||||
} |
||||
|
||||
template <typename T, typename U> |
||||
__attribute__((always_inline)) static inline T toku_sync_fetch_and_add(T *addr, |
||||
U diff) { |
||||
paranoid_invariant(!crosses_boundary(addr, sizeof *addr)); |
||||
return __sync_fetch_and_add(addr, diff); |
||||
} |
||||
template <typename T, typename U> |
||||
__attribute__((always_inline)) static inline T toku_sync_add_and_fetch(T *addr, |
||||
U diff) { |
||||
paranoid_invariant(!crosses_boundary(addr, sizeof *addr)); |
||||
return __sync_add_and_fetch(addr, diff); |
||||
} |
||||
template <typename T, typename U> |
||||
__attribute__((always_inline)) static inline T toku_sync_fetch_and_sub(T *addr, |
||||
U diff) { |
||||
paranoid_invariant(!crosses_boundary(addr, sizeof *addr)); |
||||
return __sync_fetch_and_sub(addr, diff); |
||||
} |
||||
template <typename T, typename U> |
||||
__attribute__((always_inline)) static inline T toku_sync_sub_and_fetch(T *addr, |
||||
U diff) { |
||||
paranoid_invariant(!crosses_boundary(addr, sizeof *addr)); |
||||
return __sync_sub_and_fetch(addr, diff); |
||||
} |
||||
template <typename T, typename U, typename V> |
||||
__attribute__((always_inline)) static inline T toku_sync_val_compare_and_swap( |
||||
T *addr, U oldval, V newval) { |
||||
paranoid_invariant(!crosses_boundary(addr, sizeof *addr)); |
||||
return __sync_val_compare_and_swap(addr, oldval, newval); |
||||
} |
||||
template <typename T, typename U, typename V> |
||||
__attribute__((always_inline)) static inline bool |
||||
toku_sync_bool_compare_and_swap(T *addr, U oldval, V newval) { |
||||
paranoid_invariant(!crosses_boundary(addr, sizeof *addr)); |
||||
return __sync_bool_compare_and_swap(addr, oldval, newval); |
||||
} |
||||
|
||||
// in case you include this but not toku_portability.h
|
||||
#pragma GCC poison __sync_fetch_and_add |
||||
#pragma GCC poison __sync_fetch_and_sub |
||||
#pragma GCC poison __sync_fetch_and_or |
||||
#pragma GCC poison __sync_fetch_and_and |
||||
#pragma GCC poison __sync_fetch_and_xor |
||||
#pragma GCC poison __sync_fetch_and_nand |
||||
#pragma GCC poison __sync_add_and_fetch |
||||
#pragma GCC poison __sync_sub_and_fetch |
||||
#pragma GCC poison __sync_or_and_fetch |
||||
#pragma GCC poison __sync_and_and_fetch |
||||
#pragma GCC poison __sync_xor_and_fetch |
||||
#pragma GCC poison __sync_nand_and_fetch |
||||
#pragma GCC poison __sync_bool_compare_and_swap |
||||
#pragma GCC poison __sync_val_compare_and_swap |
||||
#pragma GCC poison __sync_synchronize |
||||
#pragma GCC poison __sync_lock_test_and_set |
||||
#pragma GCC poison __sync_release |
@ -0,0 +1,82 @@ |
||||
/*
|
||||
A wrapper around rocksdb::TransactionDBMutexFactory-provided condition and |
||||
mutex that provides toku_pthread_*-like interface. The functions are named |
||||
|
||||
toku_external_{mutex|cond}_XXX |
||||
|
||||
Lock Tree uses this mutex and condition for interruptible (long) lock waits. |
||||
|
||||
(It also still uses toku_pthread_XXX calls for mutexes/conditions for |
||||
shorter waits on internal objects) |
||||
*/ |
||||
|
||||
#pragma once |
||||
|
||||
#include <pthread.h> |
||||
#include <stdint.h> |
||||
#include <time.h> |
||||
|
||||
#include "rocksdb/utilities/transaction_db.h" |
||||
#include "rocksdb/utilities/transaction_db_mutex.h" |
||||
#include "toku_portability.h" |
||||
|
||||
using ROCKSDB_NAMESPACE::TransactionDBCondVar; |
||||
using ROCKSDB_NAMESPACE::TransactionDBMutex; |
||||
|
||||
typedef std::shared_ptr<ROCKSDB_NAMESPACE::TransactionDBMutexFactory> |
||||
toku_external_mutex_factory_t; |
||||
|
||||
typedef std::shared_ptr<TransactionDBMutex> toku_external_mutex_t; |
||||
typedef std::shared_ptr<TransactionDBCondVar> toku_external_cond_t; |
||||
|
||||
static inline void toku_external_cond_init( |
||||
toku_external_mutex_factory_t mutex_factory, toku_external_cond_t *cond) { |
||||
*cond = mutex_factory->AllocateCondVar(); |
||||
} |
||||
|
||||
inline void toku_external_cond_destroy(toku_external_cond_t *cond) { |
||||
cond->reset(); // this will destroy the managed object
|
||||
} |
||||
|
||||
inline void toku_external_cond_signal(toku_external_cond_t *cond) { |
||||
(*cond)->Notify(); |
||||
} |
||||
|
||||
inline void toku_external_cond_broadcast(toku_external_cond_t *cond) { |
||||
(*cond)->NotifyAll(); |
||||
} |
||||
|
||||
inline int toku_external_cond_timedwait(toku_external_cond_t *cond, |
||||
toku_external_mutex_t *mutex, |
||||
int64_t timeout_microsec) { |
||||
auto res = (*cond)->WaitFor(*mutex, timeout_microsec); |
||||
if (res.ok()) |
||||
return 0; |
||||
else |
||||
return ETIMEDOUT; |
||||
} |
||||
|
||||
inline void toku_external_mutex_init(toku_external_mutex_factory_t factory, |
||||
toku_external_mutex_t *mutex) { |
||||
// Use placement new: the memory has been allocated but constructor wasn't
|
||||
// called
|
||||
new (mutex) toku_external_mutex_t; |
||||
*mutex = factory->AllocateMutex(); |
||||
} |
||||
|
||||
inline void toku_external_mutex_lock(toku_external_mutex_t *mutex) { |
||||
(*mutex)->Lock(); |
||||
} |
||||
|
||||
inline int toku_external_mutex_trylock(toku_external_mutex_t *mutex) { |
||||
(*mutex)->Lock(); |
||||
return 0; |
||||
} |
||||
|
||||
inline void toku_external_mutex_unlock(toku_external_mutex_t *mutex) { |
||||
(*mutex)->UnLock(); |
||||
} |
||||
|
||||
inline void toku_external_mutex_destroy(toku_external_mutex_t *mutex) { |
||||
mutex->reset(); // this will destroy the managed object
|
||||
} |
@ -0,0 +1,240 @@ |
||||
#pragma once |
||||
|
||||
#include <stdio.h> // FILE |
||||
|
||||
// Performance instrumentation object identifier type
|
||||
typedef unsigned int pfs_key_t; |
||||
|
||||
enum class toku_instr_object_type { mutex, rwlock, cond, thread, file }; |
||||
|
||||
struct PSI_file; |
||||
|
||||
struct TOKU_FILE { |
||||
/** The real file. */ |
||||
FILE *file; |
||||
struct PSI_file *key; |
||||
TOKU_FILE() : file(nullptr), key(nullptr) {} |
||||
}; |
||||
|
||||
struct PSI_mutex; |
||||
struct PSI_cond; |
||||
struct PSI_rwlock; |
||||
|
||||
struct toku_mutex_t; |
||||
struct toku_cond_t; |
||||
struct toku_pthread_rwlock_t; |
||||
|
||||
class toku_instr_key; |
||||
|
||||
class toku_instr_probe_empty { |
||||
public: |
||||
explicit toku_instr_probe_empty(UU(const toku_instr_key &key)) {} |
||||
|
||||
void start_with_source_location(UU(const char *src_file), UU(int src_line)) {} |
||||
|
||||
void stop() {} |
||||
}; |
||||
|
||||
#define TOKU_PROBE_START(p) p->start_with_source_location(__FILE__, __LINE__) |
||||
#define TOKU_PROBE_STOP(p) p->stop |
||||
|
||||
extern toku_instr_key toku_uninstrumented; |
||||
|
||||
#ifndef MYSQL_TOKUDB_ENGINE |
||||
|
||||
#include <pthread.h> |
||||
|
||||
class toku_instr_key { |
||||
public: |
||||
toku_instr_key(UU(toku_instr_object_type type), UU(const char *group), |
||||
UU(const char *name)) {} |
||||
|
||||
explicit toku_instr_key(UU(pfs_key_t key_id)) {} |
||||
// No-instrumentation constructor:
|
||||
toku_instr_key() {} |
||||
~toku_instr_key() {} |
||||
}; |
||||
|
||||
typedef toku_instr_probe_empty toku_instr_probe; |
||||
|
||||
enum class toku_instr_file_op { |
||||
file_stream_open, |
||||
file_create, |
||||
file_open, |
||||
file_delete, |
||||
file_rename, |
||||
file_read, |
||||
file_write, |
||||
file_sync, |
||||
file_stream_close, |
||||
file_close, |
||||
file_stat |
||||
}; |
||||
|
||||
struct PSI_file {}; |
||||
struct PSI_mutex {}; |
||||
|
||||
struct toku_io_instrumentation {}; |
||||
|
||||
inline int toku_pthread_create(UU(const toku_instr_key &key), pthread_t *thread, |
||||
const pthread_attr_t *attr, |
||||
void *(*start_routine)(void *), void *arg) { |
||||
return pthread_create(thread, attr, start_routine, arg); |
||||
} |
||||
|
||||
inline void toku_instr_register_current_thread() {} |
||||
|
||||
inline void toku_instr_delete_current_thread() {} |
||||
|
||||
// Instrument file creation, opening, closing, and renaming
|
||||
inline void toku_instr_file_open_begin(UU(toku_io_instrumentation &io_instr), |
||||
UU(const toku_instr_key &key), |
||||
UU(toku_instr_file_op op), |
||||
UU(const char *name), |
||||
UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_stream_open_end( |
||||
UU(toku_io_instrumentation &io_instr), UU(TOKU_FILE &file)) {} |
||||
|
||||
inline void toku_instr_file_open_end(UU(toku_io_instrumentation &io_instr), |
||||
UU(int fd)) {} |
||||
|
||||
inline void toku_instr_file_name_close_begin( |
||||
UU(toku_io_instrumentation &io_instr), UU(const toku_instr_key &key), |
||||
UU(toku_instr_file_op op), UU(const char *name), UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_stream_close_begin( |
||||
UU(toku_io_instrumentation &io_instr), UU(toku_instr_file_op op), |
||||
UU(TOKU_FILE &file), UU(const char *src_file), UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_fd_close_begin( |
||||
UU(toku_io_instrumentation &io_instr), UU(toku_instr_file_op op), |
||||
UU(int fd), UU(const char *src_file), UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_close_end(UU(toku_io_instrumentation &io_instr), |
||||
UU(int result)) {} |
||||
|
||||
inline void toku_instr_file_io_begin(UU(toku_io_instrumentation &io_instr), |
||||
UU(toku_instr_file_op op), UU(int fd), |
||||
UU(unsigned int count), |
||||
UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_name_io_begin( |
||||
UU(toku_io_instrumentation &io_instr), UU(const toku_instr_key &key), |
||||
UU(toku_instr_file_op op), UU(const char *name), UU(unsigned int count), |
||||
UU(const char *src_file), UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_stream_io_begin( |
||||
UU(toku_io_instrumentation &io_instr), UU(toku_instr_file_op op), |
||||
UU(TOKU_FILE &file), UU(unsigned int count), UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_file_io_end(UU(toku_io_instrumentation &io_instr), |
||||
UU(unsigned int count)) {} |
||||
|
||||
struct toku_mutex_t; |
||||
|
||||
struct toku_mutex_instrumentation {}; |
||||
|
||||
inline PSI_mutex *toku_instr_mutex_init(UU(const toku_instr_key &key), |
||||
UU(toku_mutex_t &mutex)) { |
||||
return nullptr; |
||||
} |
||||
|
||||
inline void toku_instr_mutex_destroy(UU(PSI_mutex *&mutex_instr)) {} |
||||
|
||||
inline void toku_instr_mutex_lock_start( |
||||
UU(toku_mutex_instrumentation &mutex_instr), UU(toku_mutex_t &mutex), |
||||
UU(const char *src_file), UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_mutex_trylock_start( |
||||
UU(toku_mutex_instrumentation &mutex_instr), UU(toku_mutex_t &mutex), |
||||
UU(const char *src_file), UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_mutex_lock_end( |
||||
UU(toku_mutex_instrumentation &mutex_instr), |
||||
UU(int pthread_mutex_lock_result)) {} |
||||
|
||||
inline void toku_instr_mutex_unlock(UU(PSI_mutex *mutex_instr)) {} |
||||
|
||||
struct toku_cond_instrumentation {}; |
||||
|
||||
enum class toku_instr_cond_op { |
||||
cond_wait, |
||||
cond_timedwait, |
||||
}; |
||||
|
||||
inline PSI_cond *toku_instr_cond_init(UU(const toku_instr_key &key), |
||||
UU(toku_cond_t &cond)) { |
||||
return nullptr; |
||||
} |
||||
|
||||
inline void toku_instr_cond_destroy(UU(PSI_cond *&cond_instr)) {} |
||||
|
||||
inline void toku_instr_cond_wait_start( |
||||
UU(toku_cond_instrumentation &cond_instr), UU(toku_instr_cond_op op), |
||||
UU(toku_cond_t &cond), UU(toku_mutex_t &mutex), UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_cond_wait_end(UU(toku_cond_instrumentation &cond_instr), |
||||
UU(int pthread_cond_wait_result)) {} |
||||
|
||||
inline void toku_instr_cond_signal(UU(toku_cond_t &cond)) {} |
||||
|
||||
inline void toku_instr_cond_broadcast(UU(toku_cond_t &cond)) {} |
||||
|
||||
#if 0 |
||||
// rw locks are not used
|
||||
// rwlock instrumentation
|
||||
struct toku_rwlock_instrumentation {}; |
||||
|
||||
inline PSI_rwlock *toku_instr_rwlock_init(UU(const toku_instr_key &key), |
||||
UU(toku_pthread_rwlock_t &rwlock)) { |
||||
return nullptr; |
||||
} |
||||
|
||||
inline void toku_instr_rwlock_destroy(UU(PSI_rwlock *&rwlock_instr)) {} |
||||
|
||||
inline void toku_instr_rwlock_rdlock_wait_start( |
||||
UU(toku_rwlock_instrumentation &rwlock_instr), |
||||
UU(toku_pthread_rwlock_t &rwlock), |
||||
UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_rwlock_wrlock_wait_start( |
||||
UU(toku_rwlock_instrumentation &rwlock_instr), |
||||
UU(toku_pthread_rwlock_t &rwlock), |
||||
UU(const char *src_file), |
||||
UU(int src_line)) {} |
||||
|
||||
inline void toku_instr_rwlock_rdlock_wait_end( |
||||
UU(toku_rwlock_instrumentation &rwlock_instr), |
||||
UU(int pthread_rwlock_wait_result)) {} |
||||
|
||||
inline void toku_instr_rwlock_wrlock_wait_end( |
||||
UU(toku_rwlock_instrumentation &rwlock_instr), |
||||
UU(int pthread_rwlock_wait_result)) {} |
||||
|
||||
inline void toku_instr_rwlock_unlock(UU(toku_pthread_rwlock_t &rwlock)) {} |
||||
#endif |
||||
|
||||
#else // MYSQL_TOKUDB_ENGINE
|
||||
// There can be not only mysql but also mongodb or any other PFS stuff
|
||||
#include <toku_instr_mysql.h> |
||||
#endif // MYSQL_TOKUDB_ENGINE
|
||||
|
||||
// Mutexes
|
||||
extern toku_instr_key manager_escalation_mutex_key; |
||||
extern toku_instr_key manager_escalator_mutex_key; |
||||
extern toku_instr_key manager_mutex_key; |
||||
extern toku_instr_key treenode_mutex_key; |
||||
extern toku_instr_key locktree_request_info_mutex_key; |
||||
extern toku_instr_key locktree_request_info_retry_mutex_key; |
||||
|
||||
// condition vars
|
||||
extern toku_instr_key lock_request_m_wait_cond_key; |
||||
extern toku_instr_key locktree_request_info_retry_cv_key; |
||||
extern toku_instr_key manager_m_escalator_done_key; // unused
|
@ -0,0 +1,73 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#if defined(__clang__) |
||||
#define constexpr_static_assert(a, b) |
||||
#else |
||||
#define constexpr_static_assert(a, b) static_assert(a, b) |
||||
#endif |
||||
|
||||
// include here, before they get deprecated
|
||||
#include <inttypes.h> |
||||
#include <stdint.h> |
||||
#include <stdio.h> |
||||
#include <sys/stat.h> |
||||
#include <sys/time.h> |
||||
#include <sys/types.h> |
||||
#include <unistd.h> |
||||
|
||||
#include "toku_atomic.h" |
||||
|
||||
#if defined(__cplusplus) |
||||
#include <type_traits> |
||||
#endif |
||||
|
||||
#if defined(__cplusplus) |
||||
// decltype() here gives a reference-to-pointer instead of just a pointer,
|
||||
// just use __typeof__
|
||||
#define CAST_FROM_VOIDP(name, value) name = static_cast<__typeof__(name)>(value) |
||||
#else |
||||
#define CAST_FROM_VOIDP(name, value) name = cast_to_typeof(name)(value) |
||||
#endif |
||||
|
||||
#define UU(x) x __attribute__((__unused__)) |
||||
|
||||
#include "toku_instrumentation.h" |
@ -0,0 +1,501 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <pthread.h> |
||||
#include <stdint.h> |
||||
#include <time.h> |
||||
|
||||
#include "toku_portability.h" |
||||
// PORT2: #include "toku_assert.h"
|
||||
|
||||
// TODO: some things moved toku_instrumentation.h, not necessarily the best
|
||||
// place
|
||||
typedef pthread_attr_t toku_pthread_attr_t; |
||||
typedef pthread_t toku_pthread_t; |
||||
typedef pthread_mutex_t toku_pthread_mutex_t; |
||||
typedef pthread_condattr_t toku_pthread_condattr_t; |
||||
typedef pthread_cond_t toku_pthread_cond_t; |
||||
typedef pthread_rwlockattr_t toku_pthread_rwlockattr_t; |
||||
typedef pthread_key_t toku_pthread_key_t; |
||||
typedef struct timespec toku_timespec_t; |
||||
|
||||
// TODO: break this include loop
|
||||
#include <pthread.h> |
||||
typedef pthread_mutexattr_t toku_pthread_mutexattr_t; |
||||
|
||||
struct toku_mutex_t { |
||||
pthread_mutex_t pmutex; |
||||
struct PSI_mutex *psi_mutex; /* The performance schema instrumentation hook */ |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
pthread_t owner; // = pthread_self(); // for debugging
|
||||
bool locked; |
||||
bool valid; |
||||
pfs_key_t instr_key_id; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
}; |
||||
|
||||
struct toku_cond_t { |
||||
pthread_cond_t pcond; |
||||
struct PSI_cond *psi_cond; |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
pfs_key_t instr_key_id; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
}; |
||||
|
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
#define TOKU_COND_INITIALIZER \ |
||||
{ .pcond = PTHREAD_COND_INITIALIZER, .psi_cond = nullptr, .instr_key_id = 0 } |
||||
#else |
||||
#define TOKU_COND_INITIALIZER \ |
||||
{ .pcond = PTHREAD_COND_INITIALIZER, .psi_cond = nullptr } |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
|
||||
struct toku_pthread_rwlock_t { |
||||
pthread_rwlock_t rwlock; |
||||
struct PSI_rwlock *psi_rwlock; |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
pfs_key_t instr_key_id; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
}; |
||||
|
||||
typedef struct toku_mutex_aligned { |
||||
toku_mutex_t aligned_mutex __attribute__((__aligned__(64))); |
||||
} toku_mutex_aligned_t; |
||||
|
||||
// Initializing with {} will fill in a struct with all zeros.
|
||||
// But you may also need a pragma to suppress the warnings, as follows
|
||||
//
|
||||
// #pragma GCC diagnostic push
|
||||
// #pragma GCC diagnostic ignored "-Wmissing-field-initializers"
|
||||
// toku_mutex_t foo = ZERO_MUTEX_INITIALIZER;
|
||||
// #pragma GCC diagnostic pop
|
||||
//
|
||||
// In general it will be a lot of busy work to make this codebase compile
|
||||
// cleanly with -Wmissing-field-initializers
|
||||
|
||||
#define ZERO_MUTEX_INITIALIZER \ |
||||
{} |
||||
|
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
#define TOKU_MUTEX_INITIALIZER \ |
||||
{ \
|
||||
.pmutex = PTHREAD_MUTEX_INITIALIZER, .psi_mutex = nullptr, .owner = 0, \
|
||||
.locked = false, .valid = true, .instr_key_id = 0 \
|
||||
} |
||||
#else |
||||
#define TOKU_MUTEX_INITIALIZER \ |
||||
{ .pmutex = PTHREAD_MUTEX_INITIALIZER, .psi_mutex = nullptr } |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
|
||||
// Darwin doesn't provide adaptive mutexes
|
||||
#if defined(__APPLE__) |
||||
#define TOKU_MUTEX_ADAPTIVE PTHREAD_MUTEX_DEFAULT |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
#define TOKU_ADAPTIVE_MUTEX_INITIALIZER \ |
||||
{ \
|
||||
.pmutex = PTHREAD_MUTEX_INITIALIZER, .psi_mutex = nullptr, .owner = 0, \
|
||||
.locked = false, .valid = true, .instr_key_id = 0 \
|
||||
} |
||||
#else |
||||
#define TOKU_ADAPTIVE_MUTEX_INITIALIZER \ |
||||
{ .pmutex = PTHREAD_MUTEX_INITIALIZER, .psi_mutex = nullptr } |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
#else // __FreeBSD__, __linux__, at least
|
||||
#define TOKU_MUTEX_ADAPTIVE PTHREAD_MUTEX_ADAPTIVE_NP |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
#define TOKU_ADAPTIVE_MUTEX_INITIALIZER \ |
||||
{ \
|
||||
.pmutex = PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP, .psi_mutex = nullptr, \
|
||||
.owner = 0, .locked = false, .valid = true, .instr_key_id = 0 \
|
||||
} |
||||
#else |
||||
#define TOKU_ADAPTIVE_MUTEX_INITIALIZER \ |
||||
{ .pmutex = PTHREAD_ADAPTIVE_MUTEX_INITIALIZER_NP, .psi_mutex = nullptr } |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
#endif // defined(__APPLE__)
|
||||
|
||||
// Different OSes implement mutexes as different amounts of nested structs.
|
||||
// C++ will fill out all missing values with zeroes if you provide at least one
|
||||
// zero, but it needs the right amount of nesting.
|
||||
#if defined(__FreeBSD__) |
||||
#define ZERO_COND_INITIALIZER \ |
||||
{ 0 } |
||||
#elif defined(__APPLE__) |
||||
#define ZERO_COND_INITIALIZER \ |
||||
{ \
|
||||
{ 0 } \
|
||||
} |
||||
#else // __linux__, at least
|
||||
#define ZERO_COND_INITIALIZER \ |
||||
{} |
||||
#endif |
||||
|
||||
static inline void toku_mutexattr_init(toku_pthread_mutexattr_t *attr) { |
||||
int r = pthread_mutexattr_init(attr); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
static inline void toku_mutexattr_settype(toku_pthread_mutexattr_t *attr, |
||||
int type) { |
||||
int r = pthread_mutexattr_settype(attr, type); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
static inline void toku_mutexattr_destroy(toku_pthread_mutexattr_t *attr) { |
||||
int r = pthread_mutexattr_destroy(attr); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
static inline void toku_mutex_assert_locked(const toku_mutex_t *mutex) { |
||||
invariant(mutex->locked); |
||||
invariant(mutex->owner == pthread_self()); |
||||
} |
||||
#else |
||||
static inline void toku_mutex_assert_locked(const toku_mutex_t *mutex |
||||
__attribute__((unused))) {} |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
|
||||
// asserting that a mutex is unlocked only makes sense
|
||||
// if the calling thread can guaruntee that no other threads
|
||||
// are trying to lock this mutex at the time of the assertion
|
||||
//
|
||||
// a good example of this is a tree with mutexes on each node.
|
||||
// when a node is locked the caller knows that no other threads
|
||||
// can be trying to lock its childrens' mutexes. the children
|
||||
// are in one of two fixed states: locked or unlocked.
|
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
static inline void toku_mutex_assert_unlocked(toku_mutex_t *mutex) { |
||||
invariant(mutex->owner == 0); |
||||
invariant(!mutex->locked); |
||||
} |
||||
#else |
||||
static inline void toku_mutex_assert_unlocked(toku_mutex_t *mutex |
||||
__attribute__((unused))) {} |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
|
||||
#define toku_mutex_lock(M) \ |
||||
toku_mutex_lock_with_source_location(M, __FILE__, __LINE__) |
||||
|
||||
static inline void toku_cond_init(toku_cond_t *cond, |
||||
const toku_pthread_condattr_t *attr) { |
||||
int r = pthread_cond_init(&cond->pcond, attr); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
#define toku_mutex_trylock(M) \ |
||||
toku_mutex_trylock_with_source_location(M, __FILE__, __LINE__) |
||||
|
||||
inline void toku_mutex_unlock(toku_mutex_t *mutex) { |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(mutex->owner == pthread_self()); |
||||
invariant(mutex->valid); |
||||
invariant(mutex->locked); |
||||
mutex->locked = false; |
||||
mutex->owner = 0; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
toku_instr_mutex_unlock(mutex->psi_mutex); |
||||
int r = pthread_mutex_unlock(&mutex->pmutex); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_mutex_lock_with_source_location(toku_mutex_t *mutex, |
||||
const char *src_file, |
||||
int src_line) { |
||||
toku_mutex_instrumentation mutex_instr; |
||||
toku_instr_mutex_lock_start(mutex_instr, *mutex, src_file, src_line); |
||||
|
||||
const int r = pthread_mutex_lock(&mutex->pmutex); |
||||
toku_instr_mutex_lock_end(mutex_instr, r); |
||||
|
||||
assert_zero(r); |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(mutex->valid); |
||||
invariant(!mutex->locked); |
||||
invariant(mutex->owner == 0); |
||||
mutex->locked = true; |
||||
mutex->owner = pthread_self(); |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
} |
||||
|
||||
inline int toku_mutex_trylock_with_source_location(toku_mutex_t *mutex, |
||||
const char *src_file, |
||||
int src_line) { |
||||
toku_mutex_instrumentation mutex_instr; |
||||
toku_instr_mutex_trylock_start(mutex_instr, *mutex, src_file, src_line); |
||||
|
||||
const int r = pthread_mutex_lock(&mutex->pmutex); |
||||
toku_instr_mutex_lock_end(mutex_instr, r); |
||||
|
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
if (r == 0) { |
||||
invariant(mutex->valid); |
||||
invariant(!mutex->locked); |
||||
invariant(mutex->owner == 0); |
||||
mutex->locked = true; |
||||
mutex->owner = pthread_self(); |
||||
} |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
return r; |
||||
} |
||||
|
||||
#define toku_cond_wait(C, M) \ |
||||
toku_cond_wait_with_source_location(C, M, __FILE__, __LINE__) |
||||
|
||||
#define toku_cond_timedwait(C, M, W) \ |
||||
toku_cond_timedwait_with_source_location(C, M, W, __FILE__, __LINE__) |
||||
|
||||
inline void toku_cond_init(const toku_instr_key &key, toku_cond_t *cond, |
||||
const pthread_condattr_t *attr) { |
||||
toku_instr_cond_init(key, *cond); |
||||
int r = pthread_cond_init(&cond->pcond, attr); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_cond_destroy(toku_cond_t *cond) { |
||||
toku_instr_cond_destroy(cond->psi_cond); |
||||
int r = pthread_cond_destroy(&cond->pcond); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_cond_wait_with_source_location(toku_cond_t *cond, |
||||
toku_mutex_t *mutex, |
||||
const char *src_file, |
||||
int src_line) { |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(mutex->locked); |
||||
mutex->locked = false; |
||||
mutex->owner = 0; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
|
||||
/* Instrumentation start */ |
||||
toku_cond_instrumentation cond_instr; |
||||
toku_instr_cond_wait_start(cond_instr, toku_instr_cond_op::cond_wait, *cond, |
||||
*mutex, src_file, src_line); |
||||
|
||||
/* Instrumented code */ |
||||
const int r = pthread_cond_wait(&cond->pcond, &mutex->pmutex); |
||||
|
||||
/* Instrumentation end */ |
||||
toku_instr_cond_wait_end(cond_instr, r); |
||||
|
||||
assert_zero(r); |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(!mutex->locked); |
||||
mutex->locked = true; |
||||
mutex->owner = pthread_self(); |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
} |
||||
|
||||
inline int toku_cond_timedwait_with_source_location(toku_cond_t *cond, |
||||
toku_mutex_t *mutex, |
||||
toku_timespec_t *wakeup_at, |
||||
const char *src_file, |
||||
int src_line) { |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(mutex->locked); |
||||
mutex->locked = false; |
||||
mutex->owner = 0; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
|
||||
/* Instrumentation start */ |
||||
toku_cond_instrumentation cond_instr; |
||||
toku_instr_cond_wait_start(cond_instr, toku_instr_cond_op::cond_timedwait, |
||||
*cond, *mutex, src_file, src_line); |
||||
|
||||
/* Instrumented code */ |
||||
const int r = pthread_cond_timedwait(&cond->pcond, &mutex->pmutex, wakeup_at); |
||||
|
||||
/* Instrumentation end */ |
||||
toku_instr_cond_wait_end(cond_instr, r); |
||||
|
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(!mutex->locked); |
||||
mutex->locked = true; |
||||
mutex->owner = pthread_self(); |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
return r; |
||||
} |
||||
|
||||
inline void toku_cond_signal(toku_cond_t *cond) { |
||||
toku_instr_cond_signal(*cond); |
||||
const int r = pthread_cond_signal(&cond->pcond); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_cond_broadcast(toku_cond_t *cond) { |
||||
toku_instr_cond_broadcast(*cond); |
||||
const int r = pthread_cond_broadcast(&cond->pcond); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_mutex_init(const toku_instr_key &key, toku_mutex_t *mutex, |
||||
const toku_pthread_mutexattr_t *attr) { |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
mutex->valid = true; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
toku_instr_mutex_init(key, *mutex); |
||||
const int r = pthread_mutex_init(&mutex->pmutex, attr); |
||||
assert_zero(r); |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
mutex->locked = false; |
||||
invariant(mutex->valid); |
||||
mutex->valid = true; |
||||
mutex->owner = 0; |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
} |
||||
|
||||
inline void toku_mutex_destroy(toku_mutex_t *mutex) { |
||||
#if defined(TOKU_PTHREAD_DEBUG) |
||||
invariant(mutex->valid); |
||||
mutex->valid = false; |
||||
invariant(!mutex->locked); |
||||
#endif // defined(TOKU_PTHREAD_DEBUG)
|
||||
toku_instr_mutex_destroy(mutex->psi_mutex); |
||||
int r = pthread_mutex_destroy(&mutex->pmutex); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
#define toku_pthread_rwlock_rdlock(RW) \ |
||||
toku_pthread_rwlock_rdlock_with_source_location(RW, __FILE__, __LINE__) |
||||
|
||||
#define toku_pthread_rwlock_wrlock(RW) \ |
||||
toku_pthread_rwlock_wrlock_with_source_location(RW, __FILE__, __LINE__) |
||||
|
||||
#if 0 |
||||
inline void toku_pthread_rwlock_init( |
||||
const toku_instr_key &key, |
||||
toku_pthread_rwlock_t *__restrict rwlock, |
||||
const toku_pthread_rwlockattr_t *__restrict attr) { |
||||
toku_instr_rwlock_init(key, *rwlock); |
||||
int r = pthread_rwlock_init(&rwlock->rwlock, attr); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_pthread_rwlock_destroy(toku_pthread_rwlock_t *rwlock) { |
||||
toku_instr_rwlock_destroy(rwlock->psi_rwlock); |
||||
int r = pthread_rwlock_destroy(&rwlock->rwlock); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_pthread_rwlock_rdlock_with_source_location( |
||||
toku_pthread_rwlock_t *rwlock, |
||||
const char *src_file, |
||||
uint src_line) { |
||||
|
||||
/* Instrumentation start */ |
||||
toku_rwlock_instrumentation rwlock_instr; |
||||
toku_instr_rwlock_rdlock_wait_start( |
||||
rwlock_instr, *rwlock, src_file, src_line); |
||||
/* Instrumented code */ |
||||
const int r = pthread_rwlock_rdlock(&rwlock->rwlock); |
||||
|
||||
/* Instrumentation end */ |
||||
toku_instr_rwlock_rdlock_wait_end(rwlock_instr, r); |
||||
|
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_pthread_rwlock_wrlock_with_source_location( |
||||
toku_pthread_rwlock_t *rwlock, |
||||
const char *src_file, |
||||
uint src_line) { |
||||
|
||||
/* Instrumentation start */ |
||||
toku_rwlock_instrumentation rwlock_instr; |
||||
toku_instr_rwlock_wrlock_wait_start( |
||||
rwlock_instr, *rwlock, src_file, src_line); |
||||
/* Instrumented code */ |
||||
const int r = pthread_rwlock_wrlock(&rwlock->rwlock); |
||||
|
||||
/* Instrumentation end */ |
||||
toku_instr_rwlock_wrlock_wait_end(rwlock_instr, r); |
||||
|
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_pthread_rwlock_rdunlock(toku_pthread_rwlock_t *rwlock) { |
||||
toku_instr_rwlock_unlock(*rwlock); |
||||
const int r = pthread_rwlock_unlock(&rwlock->rwlock); |
||||
assert_zero(r); |
||||
} |
||||
|
||||
inline void toku_pthread_rwlock_wrunlock(toku_pthread_rwlock_t *rwlock) { |
||||
toku_instr_rwlock_unlock(*rwlock); |
||||
const int r = pthread_rwlock_unlock(&rwlock->rwlock); |
||||
assert_zero(r); |
||||
} |
||||
#endif |
||||
|
||||
static inline int toku_pthread_join(toku_pthread_t thread, void **value_ptr) { |
||||
return pthread_join(thread, value_ptr); |
||||
} |
||||
|
||||
static inline int toku_pthread_detach(toku_pthread_t thread) { |
||||
return pthread_detach(thread); |
||||
} |
||||
|
||||
static inline int toku_pthread_key_create(toku_pthread_key_t *key, |
||||
void (*destroyf)(void *)) { |
||||
return pthread_key_create(key, destroyf); |
||||
} |
||||
|
||||
static inline int toku_pthread_key_delete(toku_pthread_key_t key) { |
||||
return pthread_key_delete(key); |
||||
} |
||||
|
||||
static inline void *toku_pthread_getspecific(toku_pthread_key_t key) { |
||||
return pthread_getspecific(key); |
||||
} |
||||
|
||||
static inline int toku_pthread_setspecific(toku_pthread_key_t key, void *data) { |
||||
return pthread_setspecific(key, data); |
||||
} |
||||
|
||||
int toku_pthread_yield(void) __attribute__((__visibility__("default"))); |
||||
|
||||
static inline toku_pthread_t toku_pthread_self(void) { return pthread_self(); } |
||||
|
||||
static inline void *toku_pthread_done(void *exit_value) { |
||||
toku_instr_delete_current_thread(); |
||||
pthread_exit(exit_value); |
||||
return nullptr; // Avoid compiler warning
|
||||
} |
@ -0,0 +1,165 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
// PORT2: #include <portability/toku_config.h>
|
||||
|
||||
#ifdef HAVE_valgrind |
||||
#undef USE_VALGRIND |
||||
#define USE_VALGRIND 1 |
||||
#endif |
||||
|
||||
#if defined(__linux__) && USE_VALGRIND |
||||
|
||||
#include <valgrind/drd.h> |
||||
#include <valgrind/helgrind.h> |
||||
|
||||
#define TOKU_ANNOTATE_NEW_MEMORY(p, size) ANNOTATE_NEW_MEMORY(p, size) |
||||
#define TOKU_VALGRIND_HG_ENABLE_CHECKING(p, size) \ |
||||
VALGRIND_HG_ENABLE_CHECKING(p, size) |
||||
#define TOKU_VALGRIND_HG_DISABLE_CHECKING(p, size) \ |
||||
VALGRIND_HG_DISABLE_CHECKING(p, size) |
||||
#define TOKU_DRD_IGNORE_VAR(v) DRD_IGNORE_VAR(v) |
||||
#define TOKU_DRD_STOP_IGNORING_VAR(v) DRD_STOP_IGNORING_VAR(v) |
||||
#define TOKU_ANNOTATE_IGNORE_READS_BEGIN() ANNOTATE_IGNORE_READS_BEGIN() |
||||
#define TOKU_ANNOTATE_IGNORE_READS_END() ANNOTATE_IGNORE_READS_END() |
||||
#define TOKU_ANNOTATE_IGNORE_WRITES_BEGIN() ANNOTATE_IGNORE_WRITES_BEGIN() |
||||
#define TOKU_ANNOTATE_IGNORE_WRITES_END() ANNOTATE_IGNORE_WRITES_END() |
||||
|
||||
/*
|
||||
* How to make helgrind happy about tree rotations and new mutex orderings: |
||||
* |
||||
* // Tell helgrind that we unlocked it so that the next call doesn't get a
|
||||
* "destroyed a locked mutex" error. |
||||
* // Tell helgrind that we destroyed the mutex.
|
||||
* VALGRIND_HG_MUTEX_UNLOCK_PRE(&locka); |
||||
* VALGRIND_HG_MUTEX_DESTROY_PRE(&locka); |
||||
* |
||||
* // And recreate it. It would be better to simply be able to say that the
|
||||
* order on these two can now be reversed, because this code forgets all the |
||||
* ordering information for this mutex. |
||||
* // Then tell helgrind that we have locked it again.
|
||||
* VALGRIND_HG_MUTEX_INIT_POST(&locka, 0); |
||||
* VALGRIND_HG_MUTEX_LOCK_POST(&locka); |
||||
* |
||||
* When the ordering of two locks changes, we don't need tell Helgrind about do |
||||
* both locks. Just one is good enough. |
||||
*/ |
||||
|
||||
#define TOKU_VALGRIND_RESET_MUTEX_ORDERING_INFO(mutex) \ |
||||
VALGRIND_HG_MUTEX_UNLOCK_PRE(mutex); \
|
||||
VALGRIND_HG_MUTEX_DESTROY_PRE(mutex); \
|
||||
VALGRIND_HG_MUTEX_INIT_POST(mutex, 0); \
|
||||
VALGRIND_HG_MUTEX_LOCK_POST(mutex); |
||||
|
||||
#else // !defined(__linux__) || !USE_VALGRIND
|
||||
|
||||
#define NVALGRIND 1 |
||||
#define TOKU_ANNOTATE_NEW_MEMORY(p, size) ((void)0) |
||||
#define TOKU_VALGRIND_HG_ENABLE_CHECKING(p, size) ((void)0) |
||||
#define TOKU_VALGRIND_HG_DISABLE_CHECKING(p, size) ((void)0) |
||||
#define TOKU_DRD_IGNORE_VAR(v) |
||||
#define TOKU_DRD_STOP_IGNORING_VAR(v) |
||||
#define TOKU_ANNOTATE_IGNORE_READS_BEGIN() ((void)0) |
||||
#define TOKU_ANNOTATE_IGNORE_READS_END() ((void)0) |
||||
#define TOKU_ANNOTATE_IGNORE_WRITES_BEGIN() ((void)0) |
||||
#define TOKU_ANNOTATE_IGNORE_WRITES_END() ((void)0) |
||||
#define TOKU_VALGRIND_RESET_MUTEX_ORDERING_INFO(mutex) |
||||
#undef RUNNING_ON_VALGRIND |
||||
#define RUNNING_ON_VALGRIND (0U) |
||||
#endif |
||||
|
||||
// Valgrind 3.10.1 (and previous versions).
|
||||
// Problems with VALGRIND_HG_DISABLE_CHECKING and VALGRIND_HG_ENABLE_CHECKING.
|
||||
// Helgrind's implementation of disable and enable checking causes false races
|
||||
// to be reported. In addition, the race report does not include ANY
|
||||
// information about the code that uses the helgrind disable and enable
|
||||
// functions. Therefore, it is very difficult to figure out the cause of the
|
||||
// race. DRD does implement the disable and enable functions.
|
||||
|
||||
// Problems with ANNOTATE_IGNORE_READS.
|
||||
// Helgrind does not implement ignore reads.
|
||||
// Annotate ignore reads is the way to inform DRD to ignore racy reads.
|
||||
|
||||
// FT code uses unsafe reads in several places. These unsafe reads have been
|
||||
// noted as valid since they use the toku_unsafe_fetch function. Unfortunately,
|
||||
// this causes helgrind to report erroneous data races which makes use of
|
||||
// helgrind problematic.
|
||||
|
||||
// Unsafely fetch and return a `T' from src, telling drd to ignore
|
||||
// racey access to src for the next sizeof(*src) bytes
|
||||
template <typename T> |
||||
T toku_unsafe_fetch(T *src) { |
||||
if (0) |
||||
TOKU_VALGRIND_HG_DISABLE_CHECKING(src, |
||||
sizeof *src); // disabled, see comment
|
||||
TOKU_ANNOTATE_IGNORE_READS_BEGIN(); |
||||
T r = *src; |
||||
TOKU_ANNOTATE_IGNORE_READS_END(); |
||||
if (0) |
||||
TOKU_VALGRIND_HG_ENABLE_CHECKING(src, |
||||
sizeof *src); // disabled, see comment
|
||||
return r; |
||||
} |
||||
|
||||
template <typename T> |
||||
T toku_unsafe_fetch(T &src) { |
||||
return toku_unsafe_fetch(&src); |
||||
} |
||||
|
||||
// Unsafely set a `T' value into *dest from src, telling drd to ignore
|
||||
// racey access to dest for the next sizeof(*dest) bytes
|
||||
template <typename T> |
||||
void toku_unsafe_set(T *dest, const T src) { |
||||
if (0) |
||||
TOKU_VALGRIND_HG_DISABLE_CHECKING(dest, |
||||
sizeof *dest); // disabled, see comment
|
||||
TOKU_ANNOTATE_IGNORE_WRITES_BEGIN(); |
||||
*dest = src; |
||||
TOKU_ANNOTATE_IGNORE_WRITES_END(); |
||||
if (0) |
||||
TOKU_VALGRIND_HG_ENABLE_CHECKING(dest, |
||||
sizeof *dest); // disabled, see comment
|
||||
} |
||||
|
||||
template <typename T> |
||||
void toku_unsafe_set(T &dest, const T src) { |
||||
toku_unsafe_set(&dest, src); |
||||
} |
@ -0,0 +1,158 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
// PORT2: #include "toku_config.h"
|
||||
|
||||
#include <stdint.h> |
||||
#include <sys/time.h> |
||||
#include <time.h> |
||||
#if defined(__powerpc__) |
||||
#include <sys/platform/ppc.h> |
||||
#endif |
||||
|
||||
#if 0 |
||||
static inline float toku_tdiff (struct timeval *a, struct timeval *b) { |
||||
return (float)((a->tv_sec - b->tv_sec) + 1e-6 * (a->tv_usec - b->tv_usec)); |
||||
} |
||||
// PORT2: temporary:
|
||||
#define HAVE_CLOCK_REALTIME |
||||
#if !defined(HAVE_CLOCK_REALTIME) |
||||
// OS X does not have clock_gettime, we fake clockid_t for the interface, and we'll implement it with clock_get_time.
|
||||
typedef int clockid_t; |
||||
// just something bogus, it doesn't matter, we just want to make sure we're
|
||||
// only supporting this mode because we're not sure we can support other modes
|
||||
// without a real clock_gettime()
|
||||
#define CLOCK_REALTIME 0x01867234 |
||||
#endif |
||||
int toku_clock_gettime(clockid_t clk_id, struct timespec *ts) __attribute__((__visibility__("default"))); |
||||
#endif |
||||
|
||||
// *************** Performance timers ************************
|
||||
// What do you really want from a performance timer:
|
||||
// (1) Can determine actual time of day from the performance time.
|
||||
// (2) Time goes forward, never backward.
|
||||
// (3) Same time on different processors (or even different machines).
|
||||
// (4) Time goes forward at a constant rate (doesn't get faster and slower)
|
||||
// (5) Portable.
|
||||
// (6) Getting the time is cheap.
|
||||
// Unfortuately it seems tough to get Properties 1-5. So we go for Property 6,,
|
||||
// but we abstract it. We offer a type tokutime_t which can hold the time. This
|
||||
// type can be subtracted to get a time difference. We can get the present time
|
||||
// cheaply. We can convert this type to seconds (but that can be expensive). The
|
||||
// implementation is to use RDTSC (hence we lose property 3: not portable).
|
||||
// Recent machines have constant_tsc in which case we get property (4).
|
||||
// Recent OSs on recent machines (that have RDTSCP) fix the per-processor clock
|
||||
// skew, so we get property (3). We get property 2 with RDTSC (as long as
|
||||
// there's not any skew). We don't even try to get propety 1, since we don't
|
||||
// need it. The decision here is that these times are really accurate only on
|
||||
// modern machines with modern OSs.
|
||||
typedef uint64_t tokutime_t; // Time type used in by tokutek timers.
|
||||
|
||||
#if 0 |
||||
// The value of tokutime_t is not specified here.
|
||||
// It might be microseconds since 1/1/1970 (if gettimeofday() is
|
||||
// used), or clock cycles since boot (if rdtsc is used). Or something
|
||||
// else.
|
||||
// Two tokutime_t values can be subtracted to get a time difference.
|
||||
// Use tokutime_to_seconds to that convert difference to seconds.
|
||||
// We want get_tokutime() to be fast, but don't care so much about tokutime_to_seconds();
|
||||
//
|
||||
// For accurate time calculations do the subtraction in the right order:
|
||||
// Right: tokutime_to_seconds(t1-t2);
|
||||
// Wrong tokutime_to_seconds(t1)-toku_time_to_seconds(t2);
|
||||
// Doing it the wrong way is likely to result in loss of precision.
|
||||
// A double can hold numbers up to about 53 bits. RDTSC which uses about 33 bits every second, so that leaves
|
||||
// 2^20 seconds from booting (about 2 weeks) before the RDTSC value cannot be represented accurately as a double.
|
||||
//
|
||||
double tokutime_to_seconds(tokutime_t) __attribute__((__visibility__("default"))); // Convert tokutime to seconds.
|
||||
|
||||
#endif |
||||
|
||||
// Get the value of tokutime for right now. We want this to be fast, so we
|
||||
// expose the implementation as RDTSC.
|
||||
static inline tokutime_t toku_time_now(void) { |
||||
#if defined(__x86_64__) || defined(__i386__) |
||||
uint32_t lo, hi; |
||||
__asm__ __volatile__("rdtsc" : "=a"(lo), "=d"(hi)); |
||||
return (uint64_t)hi << 32 | lo; |
||||
#elif defined(__aarch64__) |
||||
uint64_t result; |
||||
__asm __volatile__("mrs %[rt], cntvct_el0" : [ rt ] "=r"(result)); |
||||
return result; |
||||
#elif defined(__powerpc__) |
||||
return __ppc_get_timebase(); |
||||
#else |
||||
#error No timer implementation for this platform |
||||
#endif |
||||
} |
||||
|
||||
static inline uint64_t toku_current_time_microsec(void) { |
||||
struct timeval t; |
||||
gettimeofday(&t, NULL); |
||||
return t.tv_sec * (1UL * 1000 * 1000) + t.tv_usec; |
||||
} |
||||
|
||||
#if 0 |
||||
// sleep microseconds
|
||||
static inline void toku_sleep_microsec(uint64_t ms) { |
||||
struct timeval t; |
||||
|
||||
t.tv_sec = ms / 1000000; |
||||
t.tv_usec = ms % 1000000; |
||||
|
||||
select(0, NULL, NULL, NULL, &t); |
||||
} |
||||
#endif |
||||
|
||||
/*
|
||||
PORT: Usage of this file: |
||||
|
||||
uint64_t toku_current_time_microsec() // uses gettimeoday
|
||||
is used to track how much time various operations took (for example, lock |
||||
escalation). (TODO: it is not clear why these operations are tracked with |
||||
microsecond precision while others use nanoseconds) |
||||
|
||||
tokutime_t toku_time_now() // uses rdtsc
|
||||
seems to be used for a very similar purpose. This has greater precision |
||||
|
||||
RocksDB environment provides Env::Default()->NowMicros() and NowNanos() which |
||||
should be adequate substitutes. |
||||
*/ |
@ -0,0 +1,27 @@ |
||||
//
|
||||
// A substitute for ft/txn/txn.h
|
||||
//
|
||||
#pragma once |
||||
|
||||
#include <set> |
||||
|
||||
#include "../util/omt.h" |
||||
|
||||
typedef uint64_t TXNID; |
||||
#define TXNID_NONE ((TXNID)0) |
||||
|
||||
// A set of transactions
|
||||
// (TODO: consider using class toku::txnid_set. The reason for using STL
|
||||
// container was that its API is easier)
|
||||
class TxnidVector : public std::set<TXNID> { |
||||
public: |
||||
bool contains(TXNID txnid) { return find(txnid) != end(); } |
||||
}; |
||||
|
||||
// A value for lock structures with a meaning "the lock is owned by multiple
|
||||
// transactions (and one has to check the TxnidVector to get their ids)
|
||||
#define TXNID_SHARED (TXNID(-1)) |
||||
|
||||
// Auxiliary value meaning "any transaction id will do". No real transaction
|
||||
// may have this is as id.
|
||||
#define TXNID_ANY (TXNID(-2)) |
@ -0,0 +1,132 @@ |
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
/*
|
||||
This is a dump ground to make Lock Tree work without the rest of TokuDB. |
||||
*/ |
||||
#include <string.h> |
||||
|
||||
#include "db.h" |
||||
#include "ft/ft-status.h" |
||||
#include "portability/memory.h" |
||||
#include "util/dbt.h" |
||||
|
||||
// portability/os_malloc.cc
|
||||
|
||||
void toku_free(void *p) { free(p); } |
||||
|
||||
void *toku_xmalloc(size_t size) { return malloc(size); } |
||||
|
||||
void *toku_xrealloc(void *v, size_t size) { return realloc(v, size); } |
||||
|
||||
void *toku_xmemdup(const void *v, size_t len) { |
||||
void *p = toku_xmalloc(len); |
||||
memcpy(p, v, len); |
||||
return p; |
||||
} |
||||
|
||||
// TODO: what are the X-functions? Xcalloc, Xrealloc?
|
||||
void *toku_xcalloc(size_t nmemb, size_t size) { return calloc(nmemb, size); } |
||||
|
||||
// ft-ft-opts.cc:
|
||||
|
||||
// locktree
|
||||
toku_instr_key lock_request_m_wait_cond_key; |
||||
toku_instr_key manager_m_escalator_done_key; |
||||
toku_instr_key locktree_request_info_mutex_key; |
||||
toku_instr_key locktree_request_info_retry_mutex_key; |
||||
toku_instr_key locktree_request_info_retry_cv_key; |
||||
|
||||
toku_instr_key treenode_mutex_key; |
||||
toku_instr_key manager_mutex_key; |
||||
toku_instr_key manager_escalation_mutex_key; |
||||
toku_instr_key manager_escalator_mutex_key; |
||||
|
||||
// portability/memory.cc
|
||||
size_t toku_memory_footprint(void *, size_t touched) { return touched; } |
||||
|
||||
// ft/ft-status.c
|
||||
// PORT2: note: the @c parameter to TOKUFT_STATUS_INIT must not start with
|
||||
// "TOKU"
|
||||
LTM_STATUS_S ltm_status; |
||||
void LTM_STATUS_S::init() { |
||||
if (m_initialized) return; |
||||
#define LTM_STATUS_INIT(k, c, t, l) \ |
||||
TOKUFT_STATUS_INIT((*this), k, c, t, "locktree: " l, \
|
||||
TOKU_ENGINE_STATUS | TOKU_GLOBAL_STATUS) |
||||
LTM_STATUS_INIT(LTM_SIZE_CURRENT, LOCKTREE_MEMORY_SIZE, STATUS_UINT64, |
||||
"memory size"); |
||||
LTM_STATUS_INIT(LTM_SIZE_LIMIT, LOCKTREE_MEMORY_SIZE_LIMIT, STATUS_UINT64, |
||||
"memory size limit"); |
||||
LTM_STATUS_INIT(LTM_ESCALATION_COUNT, LOCKTREE_ESCALATION_NUM, STATUS_UINT64, |
||||
"number of times lock escalation ran"); |
||||
LTM_STATUS_INIT(LTM_ESCALATION_TIME, LOCKTREE_ESCALATION_SECONDS, |
||||
STATUS_TOKUTIME, "time spent running escalation (seconds)"); |
||||
LTM_STATUS_INIT(LTM_ESCALATION_LATEST_RESULT, |
||||
LOCKTREE_LATEST_POST_ESCALATION_MEMORY_SIZE, STATUS_UINT64, |
||||
"latest post-escalation memory size"); |
||||
LTM_STATUS_INIT(LTM_NUM_LOCKTREES, LOCKTREE_OPEN_CURRENT, STATUS_UINT64, |
||||
"number of locktrees open now"); |
||||
LTM_STATUS_INIT(LTM_LOCK_REQUESTS_PENDING, LOCKTREE_PENDING_LOCK_REQUESTS, |
||||
STATUS_UINT64, "number of pending lock requests"); |
||||
LTM_STATUS_INIT(LTM_STO_NUM_ELIGIBLE, LOCKTREE_STO_ELIGIBLE_NUM, |
||||
STATUS_UINT64, "number of locktrees eligible for the STO"); |
||||
LTM_STATUS_INIT(LTM_STO_END_EARLY_COUNT, LOCKTREE_STO_ENDED_NUM, |
||||
STATUS_UINT64, |
||||
"number of times a locktree ended the STO early"); |
||||
LTM_STATUS_INIT(LTM_STO_END_EARLY_TIME, LOCKTREE_STO_ENDED_SECONDS, |
||||
STATUS_TOKUTIME, "time spent ending the STO early (seconds)"); |
||||
LTM_STATUS_INIT(LTM_WAIT_COUNT, LOCKTREE_WAIT_COUNT, STATUS_UINT64, |
||||
"number of wait locks"); |
||||
LTM_STATUS_INIT(LTM_WAIT_TIME, LOCKTREE_WAIT_TIME, STATUS_UINT64, |
||||
"time waiting for locks"); |
||||
LTM_STATUS_INIT(LTM_LONG_WAIT_COUNT, LOCKTREE_LONG_WAIT_COUNT, STATUS_UINT64, |
||||
"number of long wait locks"); |
||||
LTM_STATUS_INIT(LTM_LONG_WAIT_TIME, LOCKTREE_LONG_WAIT_TIME, STATUS_UINT64, |
||||
"long time waiting for locks"); |
||||
LTM_STATUS_INIT(LTM_TIMEOUT_COUNT, LOCKTREE_TIMEOUT_COUNT, STATUS_UINT64, |
||||
"number of lock timeouts"); |
||||
LTM_STATUS_INIT(LTM_WAIT_ESCALATION_COUNT, LOCKTREE_WAIT_ESCALATION_COUNT, |
||||
STATUS_UINT64, "number of waits on lock escalation"); |
||||
LTM_STATUS_INIT(LTM_WAIT_ESCALATION_TIME, LOCKTREE_WAIT_ESCALATION_TIME, |
||||
STATUS_UINT64, "time waiting on lock escalation"); |
||||
LTM_STATUS_INIT(LTM_LONG_WAIT_ESCALATION_COUNT, |
||||
LOCKTREE_LONG_WAIT_ESCALATION_COUNT, STATUS_UINT64, |
||||
"number of long waits on lock escalation"); |
||||
LTM_STATUS_INIT(LTM_LONG_WAIT_ESCALATION_TIME, |
||||
LOCKTREE_LONG_WAIT_ESCALATION_TIME, STATUS_UINT64, |
||||
"long time waiting on lock escalation"); |
||||
|
||||
m_initialized = true; |
||||
#undef LTM_STATUS_INIT |
||||
} |
||||
void LTM_STATUS_S::destroy() { |
||||
if (!m_initialized) return; |
||||
for (int i = 0; i < LTM_STATUS_NUM_ROWS; ++i) { |
||||
if (status[i].type == STATUS_PARCOUNT) { |
||||
// PORT: TODO?? destroy_partitioned_counter(status[i].value.parcount);
|
||||
} |
||||
} |
||||
} |
||||
|
||||
int toku_keycompare(const void *key1, size_t key1len, const void *key2, |
||||
size_t key2len) { |
||||
size_t comparelen = key1len < key2len ? key1len : key2len; |
||||
int c = memcmp(key1, key2, comparelen); |
||||
if (__builtin_expect(c != 0, 1)) { |
||||
return c; |
||||
} else { |
||||
if (key1len < key2len) { |
||||
return -1; |
||||
} else if (key1len > key2len) { |
||||
return 1; |
||||
} else { |
||||
return 0; |
||||
} |
||||
} |
||||
} |
||||
|
||||
int toku_builtin_compare_fun(const DBT *a, const DBT *b) { |
||||
return toku_keycompare(a->data, a->size, b->data, b->size); |
||||
} |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,153 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "dbt.h" |
||||
|
||||
#include <string.h> |
||||
|
||||
#include "../db.h" |
||||
#include "../portability/memory.h" |
||||
|
||||
DBT *toku_init_dbt(DBT *dbt) { |
||||
memset(dbt, 0, sizeof(*dbt)); |
||||
return dbt; |
||||
} |
||||
|
||||
DBT toku_empty_dbt(void) { |
||||
static const DBT empty_dbt = {.data = 0, .size = 0, .ulen = 0, .flags = 0}; |
||||
return empty_dbt; |
||||
} |
||||
|
||||
DBT *toku_init_dbt_flags(DBT *dbt, uint32_t flags) { |
||||
toku_init_dbt(dbt); |
||||
dbt->flags = flags; |
||||
return dbt; |
||||
} |
||||
|
||||
void toku_destroy_dbt(DBT *dbt) { |
||||
switch (dbt->flags) { |
||||
case DB_DBT_MALLOC: |
||||
case DB_DBT_REALLOC: |
||||
toku_free(dbt->data); |
||||
toku_init_dbt(dbt); |
||||
break; |
||||
} |
||||
} |
||||
|
||||
DBT *toku_fill_dbt(DBT *dbt, const void *k, size_t len) { |
||||
toku_init_dbt(dbt); |
||||
dbt->size = len; |
||||
dbt->data = (char *)k; |
||||
return dbt; |
||||
} |
||||
|
||||
DBT *toku_memdup_dbt(DBT *dbt, const void *k, size_t len) { |
||||
toku_init_dbt_flags(dbt, DB_DBT_MALLOC); |
||||
dbt->size = len; |
||||
dbt->data = toku_xmemdup(k, len); |
||||
return dbt; |
||||
} |
||||
|
||||
DBT *toku_copyref_dbt(DBT *dst, const DBT src) { |
||||
dst->flags = 0; |
||||
dst->ulen = 0; |
||||
dst->size = src.size; |
||||
dst->data = src.data; |
||||
return dst; |
||||
} |
||||
|
||||
DBT *toku_clone_dbt(DBT *dst, const DBT &src) { |
||||
return toku_memdup_dbt(dst, src.data, src.size); |
||||
} |
||||
|
||||
void toku_sdbt_cleanup(struct simple_dbt *sdbt) { |
||||
if (sdbt->data) toku_free(sdbt->data); |
||||
memset(sdbt, 0, sizeof(*sdbt)); |
||||
} |
||||
|
||||
const DBT *toku_dbt_positive_infinity(void) { |
||||
static DBT positive_infinity_dbt = { |
||||
.data = 0, .size = 0, .ulen = 0, .flags = 0}; // port
|
||||
return &positive_infinity_dbt; |
||||
} |
||||
|
||||
const DBT *toku_dbt_negative_infinity(void) { |
||||
static DBT negative_infinity_dbt = { |
||||
.data = 0, .size = 0, .ulen = 0, .flags = 0}; // port
|
||||
return &negative_infinity_dbt; |
||||
} |
||||
|
||||
bool toku_dbt_is_infinite(const DBT *dbt) { |
||||
return dbt == toku_dbt_positive_infinity() || |
||||
dbt == toku_dbt_negative_infinity(); |
||||
} |
||||
|
||||
bool toku_dbt_is_empty(const DBT *dbt) { |
||||
// can't have a null data field with a non-zero size
|
||||
paranoid_invariant(dbt->data != nullptr || dbt->size == 0); |
||||
return dbt->data == nullptr; |
||||
} |
||||
|
||||
int toku_dbt_infinite_compare(const DBT *a, const DBT *b) { |
||||
if (a == b) { |
||||
return 0; |
||||
} else if (a == toku_dbt_positive_infinity()) { |
||||
return 1; |
||||
} else if (b == toku_dbt_positive_infinity()) { |
||||
return -1; |
||||
} else if (a == toku_dbt_negative_infinity()) { |
||||
return -1; |
||||
} else { |
||||
invariant(b == toku_dbt_negative_infinity()); |
||||
return 1; |
||||
} |
||||
} |
||||
|
||||
bool toku_dbt_equals(const DBT *a, const DBT *b) { |
||||
if (!toku_dbt_is_infinite(a) && !toku_dbt_is_infinite(b)) { |
||||
return a->data == b->data && a->size == b->size; |
||||
} else { |
||||
// a or b is infinite, so they're equal if they are the same infinite
|
||||
return a == b ? true : false; |
||||
} |
||||
} |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,84 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "../db.h" |
||||
|
||||
// TODO: John
|
||||
// Document this API a little better so that DBT
|
||||
// memory management can be morm widely understood.
|
||||
|
||||
DBT *toku_init_dbt(DBT *); |
||||
|
||||
// returns: an initialized but empty dbt (for which toku_dbt_is_empty() is true)
|
||||
DBT toku_empty_dbt(void); |
||||
|
||||
DBT *toku_init_dbt_flags(DBT *, uint32_t flags); |
||||
|
||||
void toku_destroy_dbt(DBT *); |
||||
|
||||
DBT *toku_fill_dbt(DBT *dbt, const void *k, size_t len); |
||||
|
||||
DBT *toku_memdup_dbt(DBT *dbt, const void *k, size_t len); |
||||
|
||||
DBT *toku_copyref_dbt(DBT *dst, const DBT src); |
||||
|
||||
DBT *toku_clone_dbt(DBT *dst, const DBT &src); |
||||
|
||||
void toku_sdbt_cleanup(struct simple_dbt *sdbt); |
||||
|
||||
// returns: special DBT pointer representing positive infinity
|
||||
const DBT *toku_dbt_positive_infinity(void); |
||||
|
||||
// returns: special DBT pointer representing negative infinity
|
||||
const DBT *toku_dbt_negative_infinity(void); |
||||
|
||||
// returns: true if the given dbt is either positive or negative infinity
|
||||
bool toku_dbt_is_infinite(const DBT *dbt); |
||||
|
||||
// returns: true if the given dbt has no data (ie: dbt->data == nullptr)
|
||||
bool toku_dbt_is_empty(const DBT *dbt); |
||||
|
||||
// effect: compares two potentially infinity-valued dbts
|
||||
// requires: at least one is infinite (assert otherwise)
|
||||
int toku_dbt_infinite_compare(const DBT *a, const DBT *b); |
||||
|
||||
// returns: true if the given dbts have the same data pointer and size
|
||||
bool toku_dbt_equals(const DBT *a, const DBT *b); |
@ -0,0 +1,143 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <memory.h> |
||||
|
||||
//******************************************************************************
|
||||
//
|
||||
// Overview: A growable array is a little bit like std::vector except that
|
||||
// it doesn't have constructors (hence can be used in static constructs, since
|
||||
// the google style guide says no constructors), and it's a little simpler.
|
||||
// Operations:
|
||||
// init and deinit (we don't have constructors and destructors).
|
||||
// fetch_unchecked to get values out.
|
||||
// store_unchecked to put values in.
|
||||
// push to add an element at the end
|
||||
// get_size to find out the size
|
||||
// get_memory_size to find out how much memory the data stucture is using.
|
||||
//
|
||||
//******************************************************************************
|
||||
|
||||
namespace toku { |
||||
|
||||
template <typename T> |
||||
class GrowableArray { |
||||
public: |
||||
void init(void) |
||||
// Effect: Initialize the array to contain no elements.
|
||||
{ |
||||
m_array = NULL; |
||||
m_size = 0; |
||||
m_size_limit = 0; |
||||
} |
||||
|
||||
void deinit(void) |
||||
// Effect: Deinitialize the array (freeing any memory it uses, for example).
|
||||
{ |
||||
toku_free(m_array); |
||||
m_array = NULL; |
||||
m_size = 0; |
||||
m_size_limit = 0; |
||||
} |
||||
|
||||
T fetch_unchecked(size_t i) const |
||||
// Effect: Fetch the ith element. If i is out of range, the system asserts.
|
||||
{ |
||||
return m_array[i]; |
||||
} |
||||
|
||||
void store_unchecked(size_t i, T v) |
||||
// Effect: Store v in the ith element. If i is out of range, the system
|
||||
// asserts.
|
||||
{ |
||||
paranoid_invariant(i < m_size); |
||||
m_array[i] = v; |
||||
} |
||||
|
||||
void push(T v) |
||||
// Effect: Add v to the end of the array (increasing the size). The amortized
|
||||
// cost of this operation is constant. Implementation hint: Double the size
|
||||
// of the array when it gets too big so that the amortized cost stays
|
||||
// constant.
|
||||
{ |
||||
if (m_size >= m_size_limit) { |
||||
if (m_array == NULL) { |
||||
m_size_limit = 1; |
||||
} else { |
||||
m_size_limit *= 2; |
||||
} |
||||
XREALLOC_N(m_size_limit, m_array); |
||||
} |
||||
m_array[m_size++] = v; |
||||
} |
||||
|
||||
size_t get_size(void) const |
||||
// Effect: Return the number of elements in the array.
|
||||
{ |
||||
return m_size; |
||||
} |
||||
size_t memory_size(void) const |
||||
// Effect: Return the size (in bytes) that the array occupies in memory. This
|
||||
// is really only an estimate.
|
||||
{ |
||||
return sizeof(*this) + sizeof(T) * m_size_limit; |
||||
} |
||||
|
||||
private: |
||||
T *m_array; |
||||
size_t m_size; |
||||
size_t m_size_limit; // How much space is allocated in array.
|
||||
}; |
||||
|
||||
} // namespace toku
|
@ -0,0 +1,187 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ifndef ROCKSDB_LITE |
||||
#ifndef OS_WIN |
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#include "memarena.h" |
||||
|
||||
#include <string.h> |
||||
|
||||
#include <algorithm> |
||||
|
||||
#include "../portability/memory.h" |
||||
|
||||
void memarena::create(size_t initial_size) { |
||||
_current_chunk = arena_chunk(); |
||||
_other_chunks = nullptr; |
||||
_size_of_other_chunks = 0; |
||||
_footprint_of_other_chunks = 0; |
||||
_n_other_chunks = 0; |
||||
|
||||
_current_chunk.size = initial_size; |
||||
if (_current_chunk.size > 0) { |
||||
XMALLOC_N(_current_chunk.size, _current_chunk.buf); |
||||
} |
||||
} |
||||
|
||||
void memarena::destroy(void) { |
||||
if (_current_chunk.buf) { |
||||
toku_free(_current_chunk.buf); |
||||
} |
||||
for (int i = 0; i < _n_other_chunks; i++) { |
||||
toku_free(_other_chunks[i].buf); |
||||
} |
||||
if (_other_chunks) { |
||||
toku_free(_other_chunks); |
||||
} |
||||
_current_chunk = arena_chunk(); |
||||
_other_chunks = nullptr; |
||||
_n_other_chunks = 0; |
||||
} |
||||
|
||||
static size_t round_to_page(size_t size) { |
||||
const size_t page_size = 4096; |
||||
const size_t r = page_size + ((size - 1) & ~(page_size - 1)); |
||||
assert((r & (page_size - 1)) == 0); // make sure it's aligned
|
||||
assert(r >= size); // make sure it's not too small
|
||||
assert(r < |
||||
size + page_size); // make sure we didn't grow by more than a page.
|
||||
return r; |
||||
} |
||||
|
||||
static const size_t MEMARENA_MAX_CHUNK_SIZE = 64 * 1024 * 1024; |
||||
|
||||
void *memarena::malloc_from_arena(size_t size) { |
||||
if (_current_chunk.buf == nullptr || |
||||
_current_chunk.size < _current_chunk.used + size) { |
||||
// The existing block isn't big enough.
|
||||
// Add the block to the vector of blocks.
|
||||
if (_current_chunk.buf) { |
||||
invariant(_current_chunk.size > 0); |
||||
int old_n = _n_other_chunks; |
||||
XREALLOC_N(old_n + 1, _other_chunks); |
||||
_other_chunks[old_n] = _current_chunk; |
||||
_n_other_chunks = old_n + 1; |
||||
_size_of_other_chunks += _current_chunk.size; |
||||
_footprint_of_other_chunks += |
||||
toku_memory_footprint(_current_chunk.buf, _current_chunk.used); |
||||
} |
||||
|
||||
// Make a new one. Grow the buffer size exponentially until we hit
|
||||
// the max chunk size, but make it at least `size' bytes so the
|
||||
// current allocation always fit.
|
||||
size_t new_size = |
||||
std::min(MEMARENA_MAX_CHUNK_SIZE, 2 * _current_chunk.size); |
||||
if (new_size < size) { |
||||
new_size = size; |
||||
} |
||||
new_size = round_to_page( |
||||
new_size); // at least size, but round to the next page size
|
||||
XMALLOC_N(new_size, _current_chunk.buf); |
||||
_current_chunk.used = 0; |
||||
_current_chunk.size = new_size; |
||||
} |
||||
invariant(_current_chunk.buf != nullptr); |
||||
|
||||
// allocate in the existing block.
|
||||
char *p = _current_chunk.buf + _current_chunk.used; |
||||
_current_chunk.used += size; |
||||
return p; |
||||
} |
||||
|
||||
void memarena::move_memory(memarena *dest) { |
||||
// Move memory to dest
|
||||
XREALLOC_N(dest->_n_other_chunks + _n_other_chunks + 1, dest->_other_chunks); |
||||
dest->_size_of_other_chunks += _size_of_other_chunks + _current_chunk.size; |
||||
dest->_footprint_of_other_chunks += |
||||
_footprint_of_other_chunks + |
||||
toku_memory_footprint(_current_chunk.buf, _current_chunk.used); |
||||
for (int i = 0; i < _n_other_chunks; i++) { |
||||
dest->_other_chunks[dest->_n_other_chunks++] = _other_chunks[i]; |
||||
} |
||||
dest->_other_chunks[dest->_n_other_chunks++] = _current_chunk; |
||||
|
||||
// Clear out this memarena's memory
|
||||
toku_free(_other_chunks); |
||||
_current_chunk = arena_chunk(); |
||||
_other_chunks = nullptr; |
||||
_size_of_other_chunks = 0; |
||||
_footprint_of_other_chunks = 0; |
||||
_n_other_chunks = 0; |
||||
} |
||||
|
||||
size_t memarena::total_memory_size(void) const { |
||||
return sizeof(*this) + total_size_in_use() + |
||||
_n_other_chunks * sizeof(*_other_chunks); |
||||
} |
||||
|
||||
size_t memarena::total_size_in_use(void) const { |
||||
return _size_of_other_chunks + _current_chunk.used; |
||||
} |
||||
|
||||
size_t memarena::total_footprint(void) const { |
||||
return sizeof(*this) + _footprint_of_other_chunks + |
||||
toku_memory_footprint(_current_chunk.buf, _current_chunk.used) + |
||||
_n_other_chunks * sizeof(*_other_chunks); |
||||
} |
||||
|
||||
////////////////////////////////////////////////////////////////////////////////
|
||||
|
||||
const void *memarena::chunk_iterator::current(size_t *used) const { |
||||
if (_chunk_idx < 0) { |
||||
*used = _ma->_current_chunk.used; |
||||
return _ma->_current_chunk.buf; |
||||
} else if (_chunk_idx < _ma->_n_other_chunks) { |
||||
*used = _ma->_other_chunks[_chunk_idx].used; |
||||
return _ma->_other_chunks[_chunk_idx].buf; |
||||
} |
||||
*used = 0; |
||||
return nullptr; |
||||
} |
||||
|
||||
void memarena::chunk_iterator::next() { _chunk_idx++; } |
||||
|
||||
bool memarena::chunk_iterator::more() const { |
||||
if (_chunk_idx < 0) { |
||||
return _ma->_current_chunk.buf != nullptr; |
||||
} |
||||
return _chunk_idx < _ma->_n_other_chunks; |
||||
} |
||||
#endif // OS_WIN
|
||||
#endif // ROCKSDB_LITE
|
@ -0,0 +1,127 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <stdlib.h> |
||||
|
||||
/*
|
||||
* A memarena is used to efficiently store a collection of objects that never |
||||
* move The pattern is allocate more and more stuff and free all of the items at |
||||
* once. The underlying memory will store 1 or more objects per chunk. Each |
||||
* chunk is contiguously laid out in memory but chunks are not necessarily |
||||
* contiguous with each other. |
||||
*/ |
||||
class memarena { |
||||
public: |
||||
memarena() |
||||
: _current_chunk(arena_chunk()), |
||||
_other_chunks(nullptr), |
||||
_n_other_chunks(0), |
||||
_size_of_other_chunks(0), |
||||
_footprint_of_other_chunks(0) {} |
||||
|
||||
// Effect: Create a memarena with the specified initial size
|
||||
void create(size_t initial_size); |
||||
|
||||
void destroy(void); |
||||
|
||||
// Effect: Allocate some memory. The returned value remains valid until the
|
||||
// memarena is cleared or closed.
|
||||
// In case of ENOMEM, aborts.
|
||||
void *malloc_from_arena(size_t size); |
||||
|
||||
// Effect: Move all the memory from this memarena into DEST.
|
||||
// When SOURCE is closed the memory won't be freed.
|
||||
// When DEST is closed, the memory will be freed, unless DEST moves
|
||||
// its memory to another memarena...
|
||||
void move_memory(memarena *dest); |
||||
|
||||
// Effect: Calculate the amount of memory used by a memory arena.
|
||||
size_t total_memory_size(void) const; |
||||
|
||||
// Effect: Calculate the used space of the memory arena (ie: excludes unused
|
||||
// space)
|
||||
size_t total_size_in_use(void) const; |
||||
|
||||
// Effect: Calculate the amount of memory used, according to
|
||||
// toku_memory_footprint(),
|
||||
// which is a more expensive but more accurate count of memory used.
|
||||
size_t total_footprint(void) const; |
||||
|
||||
// iterator over the underlying chunks that store objects in the memarena.
|
||||
// a chunk is represented by a pointer to const memory and a usable byte
|
||||
// count.
|
||||
class chunk_iterator { |
||||
public: |
||||
chunk_iterator(const memarena *ma) : _ma(ma), _chunk_idx(-1) {} |
||||
|
||||
// returns: base pointer to the current chunk
|
||||
// *used set to the number of usable bytes
|
||||
// if more() is false, returns nullptr and *used = 0
|
||||
const void *current(size_t *used) const; |
||||
|
||||
// requires: more() is true
|
||||
void next(); |
||||
|
||||
bool more() const; |
||||
|
||||
private: |
||||
// -1 represents the 'initial' chunk in a memarena, ie: ma->_current_chunk
|
||||
// >= 0 represents the i'th chunk in the ma->_other_chunks array
|
||||
const memarena *_ma; |
||||
int _chunk_idx; |
||||
}; |
||||
|
||||
private: |
||||
struct arena_chunk { |
||||
arena_chunk() : buf(nullptr), used(0), size(0) {} |
||||
char *buf; |
||||
size_t used; |
||||
size_t size; |
||||
}; |
||||
|
||||
struct arena_chunk _current_chunk; |
||||
struct arena_chunk *_other_chunks; |
||||
int _n_other_chunks; |
||||
size_t _size_of_other_chunks; // the buf_size of all the other chunks.
|
||||
size_t _footprint_of_other_chunks; // the footprint of all the other chunks.
|
||||
|
||||
friend class memarena_unit_test; |
||||
}; |
@ -0,0 +1,793 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
Licensed under the Apache License, Version 2.0 (the "License"); |
||||
you may not use this file except in compliance with the License. |
||||
You may obtain a copy of the License at |
||||
|
||||
http://www.apache.org/licenses/LICENSE-2.0
|
||||
|
||||
Unless required by applicable law or agreed to in writing, software |
||||
distributed under the License is distributed on an "AS IS" BASIS, |
||||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. |
||||
See the License for the specific language governing permissions and |
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include <memory.h> |
||||
#include <stdint.h> |
||||
|
||||
#include "../portability/toku_portability.h" |
||||
#include "../portability/toku_race_tools.h" |
||||
#include "growable_array.h" |
||||
|
||||
namespace toku { |
||||
|
||||
/**
|
||||
* Order Maintenance Tree (OMT) |
||||
* |
||||
* Maintains a collection of totally ordered values, where each value has an |
||||
* integer weight. The OMT is a mutable datatype. |
||||
* |
||||
* The Abstraction: |
||||
* |
||||
* An OMT is a vector of values, $V$, where $|V|$ is the length of the vector. |
||||
* The vector is numbered from $0$ to $|V|-1$. |
||||
* Each value has a weight. The weight of the $i$th element is denoted |
||||
* $w(V_i)$. |
||||
* |
||||
* We can create a new OMT, which is the empty vector. |
||||
* |
||||
* We can insert a new element $x$ into slot $i$, changing $V$ into $V'$ where |
||||
* $|V'|=1+|V|$ and |
||||
* |
||||
* V'_j = V_j if $j<i$ |
||||
* x if $j=i$ |
||||
* V_{j-1} if $j>i$. |
||||
* |
||||
* We can specify $i$ using a kind of function instead of as an integer. |
||||
* Let $b$ be a function mapping from values to nonzero integers, such that |
||||
* the signum of $b$ is monotically increasing. |
||||
* We can specify $i$ as the minimum integer such that $b(V_i)>0$. |
||||
* |
||||
* We look up a value using its index, or using a Heaviside function. |
||||
* For lookups, we allow $b$ to be zero for some values, and again the signum of |
||||
* $b$ must be monotonically increasing. When lookup up values, we can look up |
||||
* $V_i$ where $i$ is the minimum integer such that $b(V_i)=0$. (With a |
||||
* special return code if no such value exists.) (Rationale: Ordinarily we want |
||||
* $i$ to be unique. But for various reasons we want to allow multiple zeros, |
||||
* and we want the smallest $i$ in that case.) $V_i$ where $i$ is the minimum |
||||
* integer such that $b(V_i)>0$. (Or an indication that no such value exists.) |
||||
* $V_i$ where $i$ is the maximum integer such that $b(V_i)<0$. (Or an |
||||
* indication that no such value exists.) |
||||
* |
||||
* When looking up a value using a Heaviside function, we get the value and its |
||||
* index. |
||||
* |
||||
* We can also split an OMT into two OMTs, splitting the weight of the values |
||||
* evenly. Find a value $j$ such that the values to the left of $j$ have about |
||||
* the same total weight as the values to the right of $j$. The resulting two |
||||
* OMTs contain the values to the left of $j$ and the values to the right of $j$ |
||||
* respectively. All of the values from the original OMT go into one of the new |
||||
* OMTs. If the weights of the values don't split exactly evenly, then the |
||||
* implementation has the freedom to choose whether the new left OMT or the new |
||||
* right OMT is larger. |
||||
* |
||||
* Performance: |
||||
* Insertion and deletion should run with $O(\log |V|)$ time and $O(\log |V|)$ |
||||
* calls to the Heaviside function. The memory required is O(|V|). |
||||
* |
||||
* Usage: |
||||
* The omt is templated by two parameters: |
||||
* - omtdata_t is what will be stored within the omt. These could be pointers |
||||
* or real data types (ints, structs). |
||||
* - omtdataout_t is what will be returned by find and related functions. By |
||||
* default, it is the same as omtdata_t, but you can set it to (omtdata_t *). To |
||||
* create an omt which will store "TXNID"s, for example, it is a good idea to |
||||
* typedef the template: typedef omt<TXNID> txnid_omt_t; If you are storing |
||||
* structs, you may want to be able to get a pointer to the data actually stored |
||||
* in the omt (see find_zero). To do this, use the second template parameter: |
||||
* typedef omt<struct foo, struct foo *> foo_omt_t; |
||||
*/ |
||||
|
||||
namespace omt_internal { |
||||
|
||||
template <bool subtree_supports_marks> |
||||
class subtree_templated { |
||||
private: |
||||
uint32_t m_index; |
||||
|
||||
public: |
||||
static const uint32_t NODE_NULL = UINT32_MAX; |
||||
inline void set_to_null(void) { m_index = NODE_NULL; } |
||||
|
||||
inline bool is_null(void) const { return NODE_NULL == this->get_index(); } |
||||
|
||||
inline uint32_t get_index(void) const { return m_index; } |
||||
|
||||
inline void set_index(uint32_t index) { |
||||
paranoid_invariant(index != NODE_NULL); |
||||
m_index = index; |
||||
} |
||||
} __attribute__((__packed__, aligned(4))); |
||||
|
||||
template <> |
||||
class subtree_templated<true> { |
||||
private: |
||||
uint32_t m_bitfield; |
||||
static const uint32_t MASK_INDEX = ~(((uint32_t)1) << 31); |
||||
static const uint32_t MASK_BIT = ((uint32_t)1) << 31; |
||||
|
||||
inline void set_index_internal(uint32_t new_index) { |
||||
m_bitfield = (m_bitfield & MASK_BIT) | new_index; |
||||
} |
||||
|
||||
public: |
||||
static const uint32_t NODE_NULL = INT32_MAX; |
||||
inline void set_to_null(void) { this->set_index_internal(NODE_NULL); } |
||||
|
||||
inline bool is_null(void) const { return NODE_NULL == this->get_index(); } |
||||
|
||||
inline uint32_t get_index(void) const { |
||||
TOKU_DRD_IGNORE_VAR(m_bitfield); |
||||
const uint32_t bits = m_bitfield; |
||||
TOKU_DRD_STOP_IGNORING_VAR(m_bitfield); |
||||
return bits & MASK_INDEX; |
||||
} |
||||
|
||||
inline void set_index(uint32_t index) { |
||||
paranoid_invariant(index < NODE_NULL); |
||||
this->set_index_internal(index); |
||||
} |
||||
|
||||
inline bool get_bit(void) const { |
||||
TOKU_DRD_IGNORE_VAR(m_bitfield); |
||||
const uint32_t bits = m_bitfield; |
||||
TOKU_DRD_STOP_IGNORING_VAR(m_bitfield); |
||||
return (bits & MASK_BIT) != 0; |
||||
} |
||||
|
||||
inline void enable_bit(void) { |
||||
// These bits may be set by a thread with a write lock on some
|
||||
// leaf, and the index can be read by another thread with a (read
|
||||
// or write) lock on another thread. Also, the has_marks_below
|
||||
// bit can be set by two threads simultaneously. Neither of these
|
||||
// are real races, so if we are using DRD we should tell it to
|
||||
// ignore these bits just while we set this bit. If there were a
|
||||
// race in setting the index, that would be a real race.
|
||||
TOKU_DRD_IGNORE_VAR(m_bitfield); |
||||
m_bitfield |= MASK_BIT; |
||||
TOKU_DRD_STOP_IGNORING_VAR(m_bitfield); |
||||
} |
||||
|
||||
inline void disable_bit(void) { m_bitfield &= MASK_INDEX; } |
||||
} __attribute__((__packed__)); |
||||
|
||||
template <typename omtdata_t, bool subtree_supports_marks> |
||||
class omt_node_templated { |
||||
public: |
||||
omtdata_t value; |
||||
uint32_t weight; |
||||
subtree_templated<subtree_supports_marks> left; |
||||
subtree_templated<subtree_supports_marks> right; |
||||
|
||||
// this needs to be in both implementations because we don't have
|
||||
// a "static if" the caller can use
|
||||
inline void clear_stolen_bits(void) {} |
||||
}; // note: originally this class had __attribute__((__packed__, aligned(4)))
|
||||
|
||||
template <typename omtdata_t> |
||||
class omt_node_templated<omtdata_t, true> { |
||||
public: |
||||
omtdata_t value; |
||||
uint32_t weight; |
||||
subtree_templated<true> left; |
||||
subtree_templated<true> right; |
||||
inline bool get_marked(void) const { return left.get_bit(); } |
||||
inline void set_marked_bit(void) { return left.enable_bit(); } |
||||
inline void unset_marked_bit(void) { return left.disable_bit(); } |
||||
|
||||
inline bool get_marks_below(void) const { return right.get_bit(); } |
||||
inline void set_marks_below_bit(void) { |
||||
// This function can be called by multiple threads.
|
||||
// Checking first reduces cache invalidation.
|
||||
if (!this->get_marks_below()) { |
||||
right.enable_bit(); |
||||
} |
||||
} |
||||
inline void unset_marks_below_bit(void) { right.disable_bit(); } |
||||
|
||||
inline void clear_stolen_bits(void) { |
||||
this->unset_marked_bit(); |
||||
this->unset_marks_below_bit(); |
||||
} |
||||
}; // note: originally this class had __attribute__((__packed__, aligned(4)))
|
||||
|
||||
} // namespace omt_internal
|
||||
|
||||
template <typename omtdata_t, typename omtdataout_t = omtdata_t, |
||||
bool supports_marks = false> |
||||
class omt { |
||||
public: |
||||
/**
|
||||
* Effect: Create an empty OMT. |
||||
* Performance: constant time. |
||||
*/ |
||||
void create(void); |
||||
|
||||
/**
|
||||
* Effect: Create an empty OMT with no internal allocated space. |
||||
* Performance: constant time. |
||||
* Rationale: In some cases we need a valid omt but don't want to malloc. |
||||
*/ |
||||
void create_no_array(void); |
||||
|
||||
/**
|
||||
* Effect: Create a OMT containing values. The number of values is in |
||||
* numvalues. Stores the new OMT in *omtp. Requires: this has not been created |
||||
* yet Requires: values != NULL Requires: values is sorted Performance: |
||||
* time=O(numvalues) Rationale: Normally to insert N values takes O(N lg N) |
||||
* amortized time. If the N values are known in advance, are sorted, and the |
||||
* structure is empty, we can batch insert them much faster. |
||||
*/ |
||||
__attribute__((nonnull)) void create_from_sorted_array( |
||||
const omtdata_t *const values, const uint32_t numvalues); |
||||
|
||||
/**
|
||||
* Effect: Create an OMT containing values. The number of values is in |
||||
* numvalues. On success the OMT takes ownership of *values array, and sets |
||||
* values=NULL. Requires: this has not been created yet Requires: values != |
||||
* NULL Requires: *values is sorted Requires: *values was allocated with |
||||
* toku_malloc Requires: Capacity of the *values array is <= new_capacity |
||||
* Requires: On success, *values may not be accessed again by the caller. |
||||
* Performance: time=O(1) |
||||
* Rational: create_from_sorted_array takes O(numvalues) time. |
||||
* By taking ownership of the array, we save a malloc and |
||||
* memcpy, and possibly a free (if the caller is done with the array). |
||||
*/ |
||||
void create_steal_sorted_array(omtdata_t **const values, |
||||
const uint32_t numvalues, |
||||
const uint32_t new_capacity); |
||||
|
||||
/**
|
||||
* Effect: Create a new OMT, storing it in *newomt. |
||||
* The values to the right of index (starting at index) are moved to *newomt. |
||||
* Requires: newomt != NULL |
||||
* Returns |
||||
* 0 success, |
||||
* EINVAL if index > toku_omt_size(omt) |
||||
* On nonzero return, omt and *newomt are unmodified. |
||||
* Performance: time=O(n) |
||||
* Rationale: We don't need a split-evenly operation. We need to split items |
||||
* so that their total sizes are even, and other similar splitting criteria. |
||||
* It's easy to split evenly by calling size(), and dividing by two. |
||||
*/ |
||||
__attribute__((nonnull)) int split_at(omt *const newomt, const uint32_t idx); |
||||
|
||||
/**
|
||||
* Effect: Appends leftomt and rightomt to produce a new omt. |
||||
* Creates this as the new omt. |
||||
* leftomt and rightomt are destroyed. |
||||
* Performance: time=O(n) is acceptable, but one can imagine implementations |
||||
* that are O(\log n) worst-case. |
||||
*/ |
||||
__attribute__((nonnull)) void merge(omt *const leftomt, omt *const rightomt); |
||||
|
||||
/**
|
||||
* Effect: Creates a copy of an omt. |
||||
* Creates this as the clone. |
||||
* Each element is copied directly. If they are pointers, the underlying |
||||
* data is not duplicated. Performance: O(n) or the running time of |
||||
* fill_array_with_subtree_values() |
||||
*/ |
||||
void clone(const omt &src); |
||||
|
||||
/**
|
||||
* Effect: Set the tree to be empty. |
||||
* Note: Will not reallocate or resize any memory. |
||||
* Performance: time=O(1) |
||||
*/ |
||||
void clear(void); |
||||
|
||||
/**
|
||||
* Effect: Destroy an OMT, freeing all its memory. |
||||
* If the values being stored are pointers, their underlying data is not |
||||
* freed. See free_items() Those values may be freed before or after calling |
||||
* toku_omt_destroy. Rationale: Returns no values since free() cannot fail. |
||||
* Rationale: Does not free the underlying pointers to reduce complexity. |
||||
* Performance: time=O(1) |
||||
*/ |
||||
void destroy(void); |
||||
|
||||
/**
|
||||
* Effect: return |this|. |
||||
* Performance: time=O(1) |
||||
*/ |
||||
uint32_t size(void) const; |
||||
|
||||
/**
|
||||
* Effect: Insert value into the OMT. |
||||
* If there is some i such that $h(V_i, v)=0$ then returns DB_KEYEXIST. |
||||
* Otherwise, let i be the minimum value such that $h(V_i, v)>0$. |
||||
* If no such i exists, then let i be |V| |
||||
* Then this has the same effect as |
||||
* insert_at(tree, value, i); |
||||
* If idx!=NULL then i is stored in *idx |
||||
* Requires: The signum of h must be monotonically increasing. |
||||
* Returns: |
||||
* 0 success |
||||
* DB_KEYEXIST the key is present (h was equal to zero for some value) |
||||
* On nonzero return, omt is unchanged. |
||||
* Performance: time=O(\log N) amortized. |
||||
* Rationale: Some future implementation may be O(\log N) worst-case time, but |
||||
* O(\log N) amortized is good enough for now. |
||||
*/ |
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int insert(const omtdata_t &value, const omtcmp_t &v, uint32_t *const idx); |
||||
|
||||
/**
|
||||
* Effect: Increases indexes of all items at slot >= idx by 1. |
||||
* Insert value into the position at idx. |
||||
* Returns: |
||||
* 0 success |
||||
* EINVAL if idx > this->size() |
||||
* On error, omt is unchanged. |
||||
* Performance: time=O(\log N) amortized time. |
||||
* Rationale: Some future implementation may be O(\log N) worst-case time, but |
||||
* O(\log N) amortized is good enough for now. |
||||
*/ |
||||
int insert_at(const omtdata_t &value, const uint32_t idx); |
||||
|
||||
/**
|
||||
* Effect: Replaces the item at idx with value. |
||||
* Returns: |
||||
* 0 success |
||||
* EINVAL if idx>=this->size() |
||||
* On error, omt is unchanged. |
||||
* Performance: time=O(\log N) |
||||
* Rationale: The FT needs to be able to replace a value with another copy of |
||||
* the same value (allocated in a different location) |
||||
* |
||||
*/ |
||||
int set_at(const omtdata_t &value, const uint32_t idx); |
||||
|
||||
/**
|
||||
* Effect: Delete the item in slot idx. |
||||
* Decreases indexes of all items at slot > idx by 1. |
||||
* Returns |
||||
* 0 success |
||||
* EINVAL if idx>=this->size() |
||||
* On error, omt is unchanged. |
||||
* Rationale: To delete an item, first find its index using find or find_zero, |
||||
* then delete it. Performance: time=O(\log N) amortized. |
||||
*/ |
||||
int delete_at(const uint32_t idx); |
||||
|
||||
/**
|
||||
* Effect: Iterate over the values of the omt, from left to right, calling f |
||||
* on each value. The first argument passed to f is a ref-to-const of the |
||||
* value stored in the omt. The second argument passed to f is the index of |
||||
* the value. The third argument passed to f is iterate_extra. The indices run |
||||
* from 0 (inclusive) to this->size() (exclusive). Requires: f != NULL |
||||
* Returns: |
||||
* If f ever returns nonzero, then the iteration stops, and the value |
||||
* returned by f is returned by iterate. If f always returns zero, then |
||||
* iterate returns 0. Requires: Don't modify the omt while running. (E.g., f |
||||
* may not insert or delete values from the omt.) Performance: time=O(i+\log |
||||
* N) where i is the number of times f is called, and N is the number of |
||||
* elements in the omt. Rationale: Although the functional iterator requires |
||||
* defining another function (as opposed to C++ style iterator), it is much |
||||
* easier to read. Rationale: We may at some point use functors, but for now |
||||
* this is a smaller change from the old OMT. |
||||
*/ |
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate(iterate_extra_t *const iterate_extra) const; |
||||
|
||||
/**
|
||||
* Effect: Iterate over the values of the omt, from left to right, calling f |
||||
* on each value. The first argument passed to f is a ref-to-const of the |
||||
* value stored in the omt. The second argument passed to f is the index of |
||||
* the value. The third argument passed to f is iterate_extra. The indices run |
||||
* from 0 (inclusive) to this->size() (exclusive). We will iterate only over |
||||
* [left,right) |
||||
* |
||||
* Requires: left <= right |
||||
* Requires: f != NULL |
||||
* Returns: |
||||
* EINVAL if right > this->size() |
||||
* If f ever returns nonzero, then the iteration stops, and the value |
||||
* returned by f is returned by iterate_on_range. If f always returns zero, |
||||
* then iterate_on_range returns 0. Requires: Don't modify the omt while |
||||
* running. (E.g., f may not insert or delete values from the omt.) |
||||
* Performance: time=O(i+\log N) where i is the number of times f is called, |
||||
* and N is the number of elements in the omt. Rational: Although the |
||||
* functional iterator requires defining another function (as opposed to C++ |
||||
* style iterator), it is much easier to read. |
||||
*/ |
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_on_range(const uint32_t left, const uint32_t right, |
||||
iterate_extra_t *const iterate_extra) const; |
||||
|
||||
/**
|
||||
* Effect: Iterate over the values of the omt, and mark the nodes that are |
||||
* visited. Other than the marks, this behaves the same as iterate_on_range. |
||||
* Requires: supports_marks == true |
||||
* Performance: time=O(i+\log N) where i is the number of times f is called, |
||||
* and N is the number of elements in the omt. Notes: This function MAY be |
||||
* called concurrently by multiple threads, but not concurrently with any |
||||
* other non-const function. |
||||
*/ |
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_and_mark_range(const uint32_t left, const uint32_t right, |
||||
iterate_extra_t *const iterate_extra); |
||||
|
||||
/**
|
||||
* Effect: Iterate over the values of the omt, from left to right, calling f |
||||
* on each value whose node has been marked. Other than the marks, this |
||||
* behaves the same as iterate. Requires: supports_marks == true Performance: |
||||
* time=O(i+\log N) where i is the number of times f is called, and N is the |
||||
* number of elements in the omt. |
||||
*/ |
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_over_marked(iterate_extra_t *const iterate_extra) const; |
||||
|
||||
/**
|
||||
* Effect: Delete all elements from the omt, whose nodes have been marked. |
||||
* Requires: supports_marks == true |
||||
* Performance: time=O(N + i\log N) where i is the number of marked elements, |
||||
* {c,sh}ould be faster |
||||
*/ |
||||
void delete_all_marked(void); |
||||
|
||||
/**
|
||||
* Effect: Verify that the internal state of the marks in the tree are |
||||
* self-consistent. Crashes the system if the marks are in a bad state. |
||||
* Requires: supports_marks == true |
||||
* Performance: time=O(N) |
||||
* Notes: |
||||
* Even though this is a const function, it requires exclusive access. |
||||
* Rationale: |
||||
* The current implementation of the marks relies on a sort of |
||||
* "cache" bit representing the state of bits below it in the tree. |
||||
* This allows glass-box testing that these bits are correct. |
||||
*/ |
||||
void verify_marks_consistent(void) const; |
||||
|
||||
/**
|
||||
* Effect: None |
||||
* Returns whether there are any marks in the tree. |
||||
*/ |
||||
bool has_marks(void) const; |
||||
|
||||
/**
|
||||
* Effect: Iterate over the values of the omt, from left to right, calling f |
||||
* on each value. The first argument passed to f is a pointer to the value |
||||
* stored in the omt. The second argument passed to f is the index of the |
||||
* value. The third argument passed to f is iterate_extra. The indices run |
||||
* from 0 (inclusive) to this->size() (exclusive). Requires: same as for |
||||
* iterate() Returns: same as for iterate() Performance: same as for iterate() |
||||
* Rationale: In general, most iterators should use iterate() since they |
||||
* should not modify the data stored in the omt. This function is for |
||||
* iterators which need to modify values (for example, free_items). Rationale: |
||||
* We assume if you are transforming the data in place, you want to do it to |
||||
* everything at once, so there is not yet an iterate_on_range_ptr (but there |
||||
* could be). |
||||
*/ |
||||
template <typename iterate_extra_t, |
||||
int (*f)(omtdata_t *, const uint32_t, iterate_extra_t *const)> |
||||
void iterate_ptr(iterate_extra_t *const iterate_extra); |
||||
|
||||
/**
|
||||
* Effect: Set *value=V_idx |
||||
* Returns |
||||
* 0 success |
||||
* EINVAL if index>=toku_omt_size(omt) |
||||
* On nonzero return, *value is unchanged |
||||
* Performance: time=O(\log N) |
||||
*/ |
||||
int fetch(const uint32_t idx, omtdataout_t *const value) const; |
||||
|
||||
/**
|
||||
* Effect: Find the smallest i such that h(V_i, extra)>=0 |
||||
* If there is such an i and h(V_i,extra)==0 then set *idxp=i, set *value = |
||||
* V_i, and return 0. If there is such an i and h(V_i,extra)>0 then set |
||||
* *idxp=i and return DB_NOTFOUND. If there is no such i then set |
||||
* *idx=this->size() and return DB_NOTFOUND. Note: value is of type |
||||
* omtdataout_t, which may be of type (omtdata_t) or (omtdata_t *) but is |
||||
* fixed by the instantiation. If it is the value type, then the value is |
||||
* copied out (even if the value type is a pointer to something else) If it is |
||||
* the pointer type, then *value is set to a pointer to the data within the |
||||
* omt. This is determined by the type of the omt as initially declared. If |
||||
* the omt is declared as omt<foo_t>, then foo_t's will be stored and foo_t's |
||||
* will be returned by find and related functions. If the omt is declared as |
||||
* omt<foo_t, foo_t *>, then foo_t's will be stored, and pointers to the |
||||
* stored items will be returned by find and related functions. Rationale: |
||||
* Structs too small for malloc should be stored directly in the omt. |
||||
* These structs may need to be edited as they exist inside the omt, so we |
||||
* need a way to get a pointer within the omt. Using separate functions for |
||||
* returning pointers and values increases code duplication and reduces |
||||
* type-checking. That also reduces the ability of the creator of a data |
||||
* structure to give advice to its future users. Slight overloading in this |
||||
* case seemed to provide a better API and better type checking. |
||||
*/ |
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_zero(const omtcmp_t &extra, omtdataout_t *const value, |
||||
uint32_t *const idxp) const; |
||||
|
||||
/**
|
||||
* Effect: |
||||
* If direction >0 then find the smallest i such that h(V_i,extra)>0. |
||||
* If direction <0 then find the largest i such that h(V_i,extra)<0. |
||||
* (Direction may not be equal to zero.) |
||||
* If value!=NULL then store V_i in *value |
||||
* If idxp!=NULL then store i in *idxp. |
||||
* Requires: The signum of h is monotically increasing. |
||||
* Returns |
||||
* 0 success |
||||
* DB_NOTFOUND no such value is found. |
||||
* On nonzero return, *value and *idxp are unchanged |
||||
* Performance: time=O(\log N) |
||||
* Rationale: |
||||
* Here's how to use the find function to find various things |
||||
* Cases for find: |
||||
* find first value: ( h(v)=+1, direction=+1 ) |
||||
* find last value ( h(v)=-1, direction=-1 ) |
||||
* find first X ( h(v)=(v< x) ? -1 : 1 direction=+1 ) |
||||
* find last X ( h(v)=(v<=x) ? -1 : 1 direction=-1 ) |
||||
* find X or successor to X ( same as find first X. ) |
||||
* |
||||
* Rationale: To help understand heaviside functions and behavor of find: |
||||
* There are 7 kinds of heaviside functions. |
||||
* The signus of the h must be monotonically increasing. |
||||
* Given a function of the following form, A is the element |
||||
* returned for direction>0, B is the element returned |
||||
* for direction<0, C is the element returned for |
||||
* direction==0 (see find_zero) (with a return of 0), and D is the element |
||||
* returned for direction==0 (see find_zero) with a return of DB_NOTFOUND. |
||||
* If any of A, B, or C are not found, then asking for the |
||||
* associated direction will return DB_NOTFOUND. |
||||
* See find_zero for more information. |
||||
* |
||||
* Let the following represent the signus of the heaviside function. |
||||
* |
||||
* -...- |
||||
* A |
||||
* D |
||||
* |
||||
* +...+ |
||||
* B |
||||
* D |
||||
* |
||||
* 0...0 |
||||
* C |
||||
* |
||||
* -...-0...0 |
||||
* AC |
||||
* |
||||
* 0...0+...+ |
||||
* C B |
||||
* |
||||
* -...-+...+ |
||||
* AB |
||||
* D |
||||
* |
||||
* -...-0...0+...+ |
||||
* AC B |
||||
*/ |
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find(const omtcmp_t &extra, int direction, omtdataout_t *const value, |
||||
uint32_t *const idxp) const; |
||||
|
||||
/**
|
||||
* Effect: Return the size (in bytes) of the omt, as it resides in main |
||||
* memory. If the data stored are pointers, don't include the size of what |
||||
* they all point to. |
||||
*/ |
||||
size_t memory_size(void); |
||||
|
||||
private: |
||||
typedef uint32_t node_idx; |
||||
typedef omt_internal::subtree_templated<supports_marks> subtree; |
||||
typedef omt_internal::omt_node_templated<omtdata_t, supports_marks> omt_node; |
||||
ENSURE_POD(subtree); |
||||
|
||||
struct omt_array { |
||||
uint32_t start_idx; |
||||
uint32_t num_values; |
||||
omtdata_t *values; |
||||
}; |
||||
|
||||
struct omt_tree { |
||||
subtree root; |
||||
uint32_t free_idx; |
||||
omt_node *nodes; |
||||
}; |
||||
|
||||
bool is_array; |
||||
uint32_t capacity; |
||||
union { |
||||
struct omt_array a; |
||||
struct omt_tree t; |
||||
} d; |
||||
|
||||
__attribute__((nonnull)) void unmark(const subtree &subtree, |
||||
const uint32_t index, |
||||
GrowableArray<node_idx> *const indexes); |
||||
|
||||
void create_internal_no_array(const uint32_t new_capacity); |
||||
|
||||
void create_internal(const uint32_t new_capacity); |
||||
|
||||
uint32_t nweight(const subtree &subtree) const; |
||||
|
||||
node_idx node_malloc(void); |
||||
|
||||
void node_free(const node_idx idx); |
||||
|
||||
void maybe_resize_array(const uint32_t n); |
||||
|
||||
__attribute__((nonnull)) void fill_array_with_subtree_values( |
||||
omtdata_t *const array, const subtree &subtree) const; |
||||
|
||||
void convert_to_array(void); |
||||
|
||||
__attribute__((nonnull)) void rebuild_from_sorted_array( |
||||
subtree *const subtree, const omtdata_t *const values, |
||||
const uint32_t numvalues); |
||||
|
||||
void convert_to_tree(void); |
||||
|
||||
void maybe_resize_or_convert(const uint32_t n); |
||||
|
||||
bool will_need_rebalance(const subtree &subtree, const int leftmod, |
||||
const int rightmod) const; |
||||
|
||||
__attribute__((nonnull)) void insert_internal( |
||||
subtree *const subtreep, const omtdata_t &value, const uint32_t idx, |
||||
subtree **const rebalance_subtree); |
||||
|
||||
void set_at_internal_array(const omtdata_t &value, const uint32_t idx); |
||||
|
||||
void set_at_internal(const subtree &subtree, const omtdata_t &value, |
||||
const uint32_t idx); |
||||
|
||||
void delete_internal(subtree *const subtreep, const uint32_t idx, |
||||
omt_node *const copyn, |
||||
subtree **const rebalance_subtree); |
||||
|
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_internal_array(const uint32_t left, const uint32_t right, |
||||
iterate_extra_t *const iterate_extra) const; |
||||
|
||||
template <typename iterate_extra_t, |
||||
int (*f)(omtdata_t *, const uint32_t, iterate_extra_t *const)> |
||||
void iterate_ptr_internal(const uint32_t left, const uint32_t right, |
||||
const subtree &subtree, const uint32_t idx, |
||||
iterate_extra_t *const iterate_extra); |
||||
|
||||
template <typename iterate_extra_t, |
||||
int (*f)(omtdata_t *, const uint32_t, iterate_extra_t *const)> |
||||
void iterate_ptr_internal_array(const uint32_t left, const uint32_t right, |
||||
iterate_extra_t *const iterate_extra); |
||||
|
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_internal(const uint32_t left, const uint32_t right, |
||||
const subtree &subtree, const uint32_t idx, |
||||
iterate_extra_t *const iterate_extra) const; |
||||
|
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_and_mark_range_internal(const uint32_t left, const uint32_t right, |
||||
const subtree &subtree, |
||||
const uint32_t idx, |
||||
iterate_extra_t *const iterate_extra); |
||||
|
||||
template <typename iterate_extra_t, |
||||
int (*f)(const omtdata_t &, const uint32_t, iterate_extra_t *const)> |
||||
int iterate_over_marked_internal(const subtree &subtree, const uint32_t idx, |
||||
iterate_extra_t *const iterate_extra) const; |
||||
|
||||
uint32_t verify_marks_consistent_internal(const subtree &subtree, |
||||
const bool allow_marks) const; |
||||
|
||||
void fetch_internal_array(const uint32_t i, omtdataout_t *const value) const; |
||||
|
||||
void fetch_internal(const subtree &subtree, const uint32_t i, |
||||
omtdataout_t *const value) const; |
||||
|
||||
__attribute__((nonnull)) void fill_array_with_subtree_idxs( |
||||
node_idx *const array, const subtree &subtree) const; |
||||
|
||||
__attribute__((nonnull)) void rebuild_subtree_from_idxs( |
||||
subtree *const subtree, const node_idx *const idxs, |
||||
const uint32_t numvalues); |
||||
|
||||
__attribute__((nonnull)) void rebalance(subtree *const subtree); |
||||
|
||||
__attribute__((nonnull)) static void copyout(omtdata_t *const out, |
||||
const omt_node *const n); |
||||
|
||||
__attribute__((nonnull)) static void copyout(omtdata_t **const out, |
||||
omt_node *const n); |
||||
|
||||
__attribute__((nonnull)) static void copyout( |
||||
omtdata_t *const out, const omtdata_t *const stored_value_ptr); |
||||
|
||||
__attribute__((nonnull)) static void copyout( |
||||
omtdata_t **const out, omtdata_t *const stored_value_ptr); |
||||
|
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_internal_zero_array(const omtcmp_t &extra, omtdataout_t *const value, |
||||
uint32_t *const idxp) const; |
||||
|
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_internal_zero(const subtree &subtree, const omtcmp_t &extra, |
||||
omtdataout_t *const value, uint32_t *const idxp) const; |
||||
|
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_internal_plus_array(const omtcmp_t &extra, omtdataout_t *const value, |
||||
uint32_t *const idxp) const; |
||||
|
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_internal_plus(const subtree &subtree, const omtcmp_t &extra, |
||||
omtdataout_t *const value, uint32_t *const idxp) const; |
||||
|
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_internal_minus_array(const omtcmp_t &extra, |
||||
omtdataout_t *const value, |
||||
uint32_t *const idxp) const; |
||||
|
||||
template <typename omtcmp_t, int (*h)(const omtdata_t &, const omtcmp_t &)> |
||||
int find_internal_minus(const subtree &subtree, const omtcmp_t &extra, |
||||
omtdataout_t *const value, |
||||
uint32_t *const idxp) const; |
||||
}; |
||||
|
||||
} // namespace toku
|
||||
|
||||
// include the implementation here
|
||||
#include "omt_impl.h" |
File diff suppressed because it is too large
Load Diff
@ -0,0 +1,151 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
// Overview: A partitioned_counter provides a counter that can be incremented
|
||||
// and the running sum can be read at any time.
|
||||
// We assume that increments are frequent, whereas reading is infrequent.
|
||||
// Implementation hint: Use thread-local storage so each thread increments its
|
||||
// own data. The increment does not require a lock or atomic operation.
|
||||
// Reading the data can be performed by iterating over the thread-local
|
||||
// versions, summing them up. The data structure also includes a sum for all
|
||||
// the threads that have died. Use a pthread_key to create the thread-local
|
||||
// versions. When a thread finishes, the system calls pthread_key destructor
|
||||
// which can add that thread's copy into the sum_of_dead counter.
|
||||
// Rationale: For statistics such as are found in engine status, we need a
|
||||
// counter that requires no cache misses to increment. We've seen significant
|
||||
// performance speedups by removing certain counters. Rather than removing
|
||||
// those statistics, we would like to just make the counter fast. We generally
|
||||
// increment the counters frequently, and want to fetch the values
|
||||
// infrequently. The counters are monotonic. The counters can be split into
|
||||
// many counters, which can be summed up at the end. We don't care if we get
|
||||
// slightly out-of-date counter sums when we read the counter. We don't care
|
||||
// if there is a race on reading the a counter
|
||||
// variable and incrementing.
|
||||
// See tests/test_partitioned_counter.c for some performance measurements.
|
||||
// Operations:
|
||||
// create_partitioned_counter Create a counter initialized to zero.
|
||||
// destroy_partitioned_counter Destroy it.
|
||||
// increment_partitioned_counter Increment it. This is the frequent
|
||||
// operation. read_partitioned_counter Get the current value. This is
|
||||
// infrequent.
|
||||
// See partitioned_counter.cc for the abstraction function and representation
|
||||
// invariant.
|
||||
//
|
||||
// The google style guide says to avoid using constructors, and it appears that
|
||||
// constructors may have broken all the tests, because they called
|
||||
// pthread_key_create before the key was actually created. So the google style
|
||||
// guide may have some wisdom there...
|
||||
//
|
||||
// This version does not use constructors, essentially reverrting to the google
|
||||
// C++ style guide.
|
||||
//
|
||||
|
||||
// The old C interface. This required a bunch of explicit
|
||||
// ___attribute__((__destructor__)) functions to remember to destroy counters at
|
||||
// the end.
|
||||
#if defined(__cplusplus) |
||||
extern "C" { |
||||
#endif |
||||
|
||||
typedef struct partitioned_counter *PARTITIONED_COUNTER; |
||||
PARTITIONED_COUNTER create_partitioned_counter(void); |
||||
// Effect: Create a counter, initialized to zero.
|
||||
|
||||
void destroy_partitioned_counter(PARTITIONED_COUNTER); |
||||
// Effect: Destroy the counter. No operations on that counter are permitted
|
||||
// after this.
|
||||
|
||||
void increment_partitioned_counter(PARTITIONED_COUNTER, uint64_t amount); |
||||
// Effect: Increment the counter by amount.
|
||||
// Requires: No overflows. This is a 64-bit unsigned counter.
|
||||
|
||||
uint64_t read_partitioned_counter(PARTITIONED_COUNTER) |
||||
__attribute__((__visibility__("default"))); |
||||
// Effect: Return the current value of the counter.
|
||||
|
||||
void partitioned_counters_init(void); |
||||
// Effect: Initialize any partitioned counters data structures that must be set
|
||||
// up before any partitioned counters run.
|
||||
|
||||
void partitioned_counters_destroy(void); |
||||
// Effect: Destroy any partitioned counters data structures.
|
||||
|
||||
#if defined(__cplusplus) |
||||
}; |
||||
#endif |
||||
|
||||
#if 0 |
||||
#include <pthread.h> |
||||
|
||||
#include "fttypes.h" |
||||
|
||||
// Used inside the PARTITIONED_COUNTER.
|
||||
struct linked_list_head { |
||||
struct linked_list_element *first; |
||||
}; |
||||
|
||||
|
||||
class PARTITIONED_COUNTER { |
||||
public: |
||||
PARTITIONED_COUNTER(void); |
||||
// Effect: Construct a counter, initialized to zero.
|
||||
|
||||
~PARTITIONED_COUNTER(void); |
||||
// Effect: Destruct the counter.
|
||||
|
||||
void increment(uint64_t amount); |
||||
// Effect: Increment the counter by amount. This is a 64-bit unsigned counter, and if you overflow it, you will get overflowed results (that is mod 2^64).
|
||||
// Requires: Don't use this from a static constructor or destructor.
|
||||
|
||||
uint64_t read(void); |
||||
// Effect: Read the sum.
|
||||
// Requires: Don't use this from a static constructor or destructor.
|
||||
|
||||
private: |
||||
uint64_t _sum_of_dead; // The sum of all thread-local counts from threads that have terminated.
|
||||
pthread_key_t _key; // The pthread_key which gives us the hook to construct and destruct thread-local storage.
|
||||
struct linked_list_head _ll_counter_head; // A linked list of all the thread-local information for this counter.
|
||||
|
||||
// This function is used to destroy the thread-local part of the state when a thread terminates.
|
||||
// But it's not the destructor for the local part of the counter, it's a destructor on a "dummy" key just so that we get a notification when a thread ends.
|
||||
friend void destroy_thread_local_part_of_partitioned_counters (void *); |
||||
}; |
||||
#endif |
@ -0,0 +1,62 @@ |
||||
/* -*- mode: C++; c-basic-offset: 4; indent-tabs-mode: nil -*- */ |
||||
// vim: ft=cpp:expandtab:ts=8:sw=4:softtabstop=4:
|
||||
#ident "$Id$" |
||||
/*======
|
||||
This file is part of PerconaFT. |
||||
|
||||
|
||||
Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved. |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU General Public License, version 2, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
|
||||
---------------------------------------- |
||||
|
||||
PerconaFT is free software: you can redistribute it and/or modify |
||||
it under the terms of the GNU Affero General Public License, version 3, |
||||
as published by the Free Software Foundation. |
||||
|
||||
PerconaFT is distributed in the hope that it will be useful, |
||||
but WITHOUT ANY WARRANTY; without even the implied warranty of |
||||
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the |
||||
GNU Affero General Public License for more details. |
||||
|
||||
You should have received a copy of the GNU Affero General Public License |
||||
along with PerconaFT. If not, see <http://www.gnu.org/licenses/>.
|
||||
======= */ |
||||
|
||||
#ident \ |
||||
"Copyright (c) 2006, 2015, Percona and/or its affiliates. All rights reserved." |
||||
|
||||
#pragma once |
||||
|
||||
#include "partitioned_counter.h" |
||||
// PORT2: #include <util/constexpr.h>
|
||||
|
||||
#define TOKUFT_STATUS_INIT(array, k, c, t, l, inc) \ |
||||
do { \
|
||||
array.status[k].keyname = #k; \
|
||||
array.status[k].columnname = #c; \
|
||||
array.status[k].type = t; \
|
||||
array.status[k].legend = l; \
|
||||
constexpr_static_assert( \
|
||||
strcmp(#c, "NULL") && strcmp(#c, "0"), \
|
||||
"Use nullptr for no column name instead of NULL, 0, etc..."); \
|
||||
constexpr_static_assert( \
|
||||
(inc) == TOKU_ENGINE_STATUS || strcmp(#c, "nullptr"), \
|
||||
"Missing column name."); \
|
||||
array.status[k].include = \
|
||||
static_cast<toku_engine_status_include_type>(inc); \
|
||||
if (t == STATUS_PARCOUNT) { \
|
||||
array.status[k].value.parcount = create_partitioned_counter(); \
|
||||
} \
|
||||
} while (0) |
Loading…
Reference in new issue