Explicitly cast char to signed char in Hash()

Summary:
The compilers we use treat char as signed. However, this is not guarantee of C standard and some compilers (for ARM platform for example), treat char as unsigned. Code that assumes that char is either signed or unsigned is wrong.

This change explicitly casts the char to signed version. This will not break any of our use cases on x86, which, I believe are all of them. In case somebody out there is using RocksDB on ARM AND using bloom filters, they're going to have a bad time. However, it is very unlikely that this is the case.

Test Plan: sanity test with previous commit (with new sanity test)

Reviewers: yhchiang, ljin, sdong

Reviewed By: ljin

Subscribers: dhruba, leveldb

Differential Revision: https://reviews.facebook.net/D22767
main
Igor Canadi 10 years ago
parent 52311463e9
commit 88841bd007
  1. 2
      HISTORY.md
  2. 10
      tools/auto_sanity_test.sh
  3. 29
      tools/db_sanity_test.cc
  4. 18
      util/hash.cc

@ -1,6 +1,8 @@
# Rocksdb Change Log # Rocksdb Change Log
## Unreleased (will be released with 3.6) ## Unreleased (will be released with 3.6)
### Disk format changes
* If you're using RocksDB on ARM platforms and you're using default bloom filter, there is a disk format change you need to be aware of. There are three steps you need to do when you convert to new release: 1. turn off filter policy, 2. compact the whole database, 3. turn on filter policy
### Behavior changes ### Behavior changes
* We have refactored our system of stalling writes. Any stall-related statistics' meanings are changed. Instead of per-write stall counts, we now count stalls per-epoch, where epochs are periods between flushes and compactions. You'll find more information in our Tuning Perf Guide once we release RocksDB 3.6. * We have refactored our system of stalling writes. Any stall-related statistics' meanings are changed. Instead of per-write stall counts, we now count stalls per-epoch, where epochs are periods between flushes and compactions. You'll find more information in our Tuning Perf Guide once we release RocksDB 3.6.

@ -37,6 +37,11 @@ echo "Running db sanity check with commits $commit_new and $commit_old."
echo "=============================================================" echo "============================================================="
echo "Making build $commit_new" echo "Making build $commit_new"
git checkout $commit_new
if [ $? -ne 0 ]; then
echo "[ERROR] Can't checkout $commit_new"
exit 1
fi
makestuff makestuff
mv db_sanity_test new_db_sanity_test mv db_sanity_test new_db_sanity_test
echo "Creating db based on the new commit --- $commit_new" echo "Creating db based on the new commit --- $commit_new"
@ -44,6 +49,11 @@ echo "Creating db based on the new commit --- $commit_new"
echo "=============================================================" echo "============================================================="
echo "Making build $commit_old" echo "Making build $commit_old"
git checkout $commit_old
if [ $? -ne 0 ]; then
echo "[ERROR] Can't checkout $commit_old"
exit 1
fi
makestuff makestuff
mv db_sanity_test old_db_sanity_test mv db_sanity_test old_db_sanity_test
echo "Creating db based on the old commit --- $commit_old" echo "Creating db based on the old commit --- $commit_old"

@ -8,15 +8,15 @@
#include <vector> #include <vector>
#include <memory> #include <memory>
#include "include/rocksdb/db.h" #include "rocksdb/db.h"
#include "include/rocksdb/options.h" #include "rocksdb/options.h"
#include "include/rocksdb/env.h" #include "rocksdb/env.h"
#include "include/rocksdb/slice.h" #include "rocksdb/slice.h"
#include "include/rocksdb/status.h" #include "rocksdb/status.h"
#include "include/rocksdb/comparator.h" #include "rocksdb/comparator.h"
#include "include/rocksdb/table.h" #include "rocksdb/table.h"
#include "include/rocksdb/slice_transform.h" #include "rocksdb/slice_transform.h"
#include "include/rocksdb/filter_policy.h" #include "rocksdb/filter_policy.h"
namespace rocksdb { namespace rocksdb {
@ -50,7 +50,7 @@ class SanityTest {
return s; return s;
} }
} }
return Status::OK(); return db->Flush(FlushOptions());
} }
Status Verify() { Status Verify() {
DB* db; DB* db;
@ -149,10 +149,10 @@ class SanityTestPlainTableFactory : public SanityTest {
class SanityTestBloomFilter : public SanityTest { class SanityTestBloomFilter : public SanityTest {
public: public:
explicit SanityTestBloomFilter(const std::string& path) explicit SanityTestBloomFilter(const std::string& path) : SanityTest(path) {
: SanityTest(path) { BlockBasedTableOptions table_options;
table_options_.filter_policy.reset(NewBloomFilterPolicy(10)); table_options.filter_policy.reset(NewBloomFilterPolicy(10));
options_.table_factory.reset(NewBlockBasedTableFactory(table_options_)); options_.table_factory.reset(NewBlockBasedTableFactory(table_options));
} }
~SanityTestBloomFilter() {} ~SanityTestBloomFilter() {}
virtual Options GetOptions() const { return options_; } virtual Options GetOptions() const { return options_; }
@ -160,7 +160,6 @@ class SanityTestBloomFilter : public SanityTest {
private: private:
Options options_; Options options_;
BlockBasedTableOptions table_options_;
}; };
namespace { namespace {

@ -31,14 +31,26 @@ uint32_t Hash(const char* data, size_t n, uint32_t seed) {
// Pick up remaining bytes // Pick up remaining bytes
switch (limit - data) { switch (limit - data) {
// Note: It would be better if this was cast to unsigned char, but that
// would be a disk format change since we previously didn't have any cast
// at all (so gcc used signed char).
// To understand the difference between shifting unsigned and signed chars,
// let's use 250 as an example. unsigned char will be 250, while signed char
// will be -6. Bit-wise, they are equivalent: 11111010. However, when
// converting negative number (signed char) to int, it will be converted
// into negative int (of equivalent value, which is -6), while converting
// positive number (unsigned char) will be converted to 250. Bitwise,
// this looks like this:
// signed char 11111010 -> int 11111111111111111111111111111010
// unsigned char 11111010 -> int 00000000000000000000000011111010
case 3: case 3:
h += data[2] << 16; h += static_cast<uint32_t>(static_cast<signed char>(data[2]) << 16);
// fall through // fall through
case 2: case 2:
h += data[1] << 8; h += static_cast<uint32_t>(static_cast<signed char>(data[1]) << 8);
// fall through // fall through
case 1: case 1:
h += data[0]; h += static_cast<uint32_t>(static_cast<signed char>(data[0]));
h *= m; h *= m;
h ^= (h >> r); h ^= (h >> r);
break; break;

Loading…
Cancel
Save