fork of https://github.com/oxigraph/rocksdb and https://github.com/facebook/rocksdb for nextgraph and oxigraph
You can not select more than 25 topics
Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
68 lines
2.8 KiB
68 lines
2.8 KiB
13 years ago
|
Neutronium: a very dense encoding of Thrift objects
|
||
|
|
||
|
Neutronium is a Thrift encoding format optimized for space at the expense of
|
||
|
speed. It achieves high efficiency in a few ways:
|
||
|
|
||
|
1. It does not encode type and tag information. This is stored out of line in
|
||
|
a Schema object, which must be provided at both encoding and decoding time.
|
||
|
If encoding many Thrift objects, you can transmit / store the Schema only
|
||
|
once. The Schema (in thrift/lib/thrift/reflection.thrift) is itself a
|
||
|
Thrift object, and can be serialized / deserialized in the usual way.
|
||
|
|
||
|
2. It encodes data very compactly. Bytes use one byte each; larger numbers
|
||
|
(i16, i32, i64, double) use variable-length encoding (GroupVarint). Booleans
|
||
|
use one bit each. We also encode one bit for every optional field that exists
|
||
|
in the structure definition (indicating whether the field is set or not).
|
||
|
Strings can be encoded in a variety of formats, see below.
|
||
|
|
||
|
3. Aggregates (lists, maps, sets) are encoded efficiently -- they are encoded
|
||
|
like a structure with a variable number of fields. So list<i32> takes
|
||
|
advantage of GroupVarint encoding among consecutive values.
|
||
|
|
||
|
4. Strings can be interned: when encoding multiple strings, we can detect
|
||
|
duplicates and store only an ID. This requires using an InternTable and
|
||
|
passing the same InternTable to the encoder and decoder (the InternTable
|
||
|
can be easily serialized and deserialized).
|
||
|
|
||
|
Neutronium is backwards compatible as long as the schema is identical from
|
||
|
encoding to decoding, and the changes you made to the Thrift definition are
|
||
|
backwards compatible (that is, fields were added, removed, or renamed, but
|
||
|
field ids remained the same)
|
||
|
|
||
|
Configuration:
|
||
|
|
||
|
Neutronium can be configured by using field attributes in your Thrift
|
||
|
definition:
|
||
|
|
||
|
struct Foo {
|
||
|
1: i32 a (neutronium.fixed = 1),
|
||
|
}
|
||
|
|
||
|
Attributes for number fields:
|
||
|
neutronium.fixed = 1
|
||
|
Encode the number as a fixed-length value (i16 takes 2 bytes, i32
|
||
|
takes 4 bytes, i64 takes 8 bytes) instead of using Varint encoding
|
||
|
|
||
|
Attributes for string fields:
|
||
|
neutronium.fixed = <length>
|
||
|
neutronium.pad = 'X'
|
||
|
Do not encode the string length, assume that all strings have length
|
||
|
<length>. Strings longer than <length> are truncated; strings
|
||
|
shorter than <length> are padded with 'X' (default: the null byte, '\0').
|
||
|
Use this when you expect that all / most strings have a fixed length.
|
||
|
|
||
|
neutronium.terminator = 'X'
|
||
|
Do not encode the string length; store strings terminated with a
|
||
|
terminator ('\0' will likely be a popular choice). Encoding strings that
|
||
|
contain the terminator is an error.
|
||
|
|
||
|
neutronium.intern = 1
|
||
|
Intern strings; requires a non-NULL InternTable.
|
||
|
|
||
|
Attributes for enum fields:
|
||
|
neutronium.strict = 1
|
||
|
Encode the enum using as few bits as necessary to encode all possible
|
||
|
values; note that it becomes an error to encode an enum value that is not
|
||
|
specified in the Thrift definition.
|
||
|
|