You can not select more than 25 topics Topics must start with a letter or number, can include dashes ('-') and can be up to 35 characters long.
rocksdb/build_tools/build_detect_platform

620 lines
21 KiB

#!/usr/bin/env bash
#
# Detects OS we're compiling on and outputs a file specified by the first
# argument, which in turn gets read while processing Makefile.
#
# The output will set the following variables:
# CC C Compiler path
# CXX C++ Compiler path
# PLATFORM_LDFLAGS Linker flags
# JAVA_LDFLAGS Linker flags for RocksDBJava
# JAVA_STATIC_LDFLAGS Linker flags for RocksDBJava static build
# PLATFORM_SHARED_EXT Extension for shared libraries
# PLATFORM_SHARED_LDFLAGS Flags for building shared library
# PLATFORM_SHARED_CFLAGS Flags for compiling objects for shared library
# PLATFORM_CCFLAGS C compiler flags
# PLATFORM_CXXFLAGS C++ compiler flags. Will contain:
# PLATFORM_SHARED_VERSIONED Set to 'true' if platform supports versioned
# shared libraries, empty otherwise.
# FIND Command for the find utility
# WATCH Command for the watch utility
#
# The PLATFORM_CCFLAGS and PLATFORM_CXXFLAGS might include the following:
#
# -DROCKSDB_PLATFORM_POSIX if posix-platform based
# -DSNAPPY if the Snappy library is present
# -DLZ4 if the LZ4 library is present
# -DZSTD if the ZSTD library is present
Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353
10 years ago
# -DNUMA if the NUMA library is present
# -DTBB if the TBB library is present
#
# Using gflags in rocksdb:
# Our project depends on gflags, which requires users to take some extra steps
# before they can compile the whole repository:
# 1. Install gflags. You may download it from here:
# https://gflags.github.io/gflags/ (Mac users can `brew install gflags`)
# 2. Once installed, add the include path for gflags to your CPATH env var and
# the lib path to LIBRARY_PATH. If installed with default settings, the lib
# will be /usr/local/lib and the include path will be /usr/local/include
OUTPUT=$1
if test -z "$OUTPUT"; then
echo "usage: $0 <output-filename>" >&2
exit 1
fi
# we depend on C++11
PLATFORM_CXXFLAGS="-std=c++11"
# we currently depend on POSIX platform
COMMON_FLAGS="-DROCKSDB_PLATFORM_POSIX -DROCKSDB_LIB_IO_POSIX"
# Default to fbcode gcc on internal fb machines
if [ -z "$ROCKSDB_NO_FBCODE" -a -d /mnt/gvfs/third-party ]; then
FBCODE_BUILD="true"
# If we're compiling with TSAN we need pic build
PIC_BUILD=$COMPILE_WITH_TSAN
if [ -z "$ROCKSDB_FBCODE_BUILD_WITH_481" ]; then
source "$PWD/build_tools/fbcode_config.sh"
else
# we need this to build with MySQL. Don't use for other purposes.
source "$PWD/build_tools/fbcode_config4.8.1.sh"
fi
fi
# Delete existing output, if it exists
rm -f "$OUTPUT"
touch "$OUTPUT"
if test -z "$CC"; then
if [ -x "$(command -v cc)" ]; then
CC=cc
elif [ -x "$(command -v clang)" ]; then
CC=clang
else
CC=cc
fi
fi
if test -z "$CXX"; then
if [ -x "$(command -v g++)" ]; then
CXX=g++
elif [ -x "$(command -v clang++)" ]; then
CXX=clang++
else
CXX=g++
fi
fi
# Detect OS
if test -z "$TARGET_OS"; then
TARGET_OS=`uname -s`
fi
if test -z "$TARGET_ARCHITECTURE"; then
TARGET_ARCHITECTURE=`uname -m`
fi
if test -z "$CLANG_SCAN_BUILD"; then
CLANG_SCAN_BUILD=scan-build
fi
build: do not relink every single binary just for a timestamp Summary: Prior to this change, "make check" would always waste a lot of time relinking 60+ binaries. With this change, it does that only when the generated file, util/build_version.cc, changes, and that happens only when the date changes or when the current git SHA changes. This change makes some other improvements: before, there was no rule to build a deleted util/build_version.cc. If it was somehow removed, any attempt to link a program would fail. There is no longer any need for the separate file, build_tools/build_detect_version. Its functionality is now in the Makefile. * Makefile (DEPFILES): Don't filter-out util/build_version.cc. No need, and besides, removing that dependency was wrong. (date, git_sha, gen_build_version): New helper variables. (util/build_version.cc): New rule, to create this file and update it only if it would contain new information. * build_tools/build_detect_platform: Remove file. * db/db_impl.cc: Now, print only date (not the time). * util/build_version.h (rocksdb_build_compile_time): Remove declaration. No longer used. Test Plan: - Run "make check" twice, and note that the second time no linking is performed. - Remove util/build_version.cc and ensure that any "make" command regenerates it before doing anything else. - Run this: strings librocksdb.a|grep _build_. That prints output including the following: rocksdb_build_git_date:2015-02-19 rocksdb_build_git_sha:2.8.fb-1792-g3cb6cc0 Reviewers: ljin, sdong, igor Reviewed By: igor Subscribers: dhruba Differential Revision: https://reviews.facebook.net/D33591
10 years ago
if test -z "$CLANG_ANALYZER"; then
CLANG_ANALYZER=$(command -v clang++ 2> /dev/null)
fi
if test -z "$FIND"; then
FIND=find
fi
if test -z "$WATCH"; then
WATCH=watch
fi
COMMON_FLAGS="$COMMON_FLAGS ${CFLAGS}"
CROSS_COMPILE=
PLATFORM_CCFLAGS=
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS"
PLATFORM_SHARED_EXT="so"
PLATFORM_SHARED_LDFLAGS="-Wl,--no-as-needed -shared -Wl,-soname -Wl,"
PLATFORM_SHARED_CFLAGS="-fPIC"
PLATFORM_SHARED_VERSIONED=true
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# generic port files (working on all platform by #ifdef) go directly in /port
GENERIC_PORT_FILES=`cd "$ROCKSDB_ROOT"; find port -name '*.cc' | tr "\n" " "`
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# On GCC, we pick libc's memcmp over GCC's memcmp via -fno-builtin-memcmp
case "$TARGET_OS" in
Darwin)
PLATFORM=OS_MACOSX
COMMON_FLAGS="$COMMON_FLAGS -DOS_MACOSX"
PLATFORM_SHARED_EXT=dylib
PLATFORM_SHARED_LDFLAGS="-dynamiclib -install_name "
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/darwin/darwin_specific.cc
;;
IOS)
PLATFORM=IOS
COMMON_FLAGS="$COMMON_FLAGS -DOS_MACOSX -DIOS_CROSS_COMPILE -DROCKSDB_LITE"
PLATFORM_SHARED_EXT=dylib
PLATFORM_SHARED_LDFLAGS="-dynamiclib -install_name "
CROSS_COMPILE=true
PLATFORM_SHARED_VERSIONED=
;;
Linux)
PLATFORM=OS_LINUX
COMMON_FLAGS="$COMMON_FLAGS -DOS_LINUX"
if [ -z "$USE_CLANG" ]; then
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp"
else
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic"
fi
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt"
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/linux/linux_specific.cc
;;
SunOS)
PLATFORM=OS_SOLARIS
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_SOLARIS -m64"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt -static-libstdc++ -static-libgcc -m64"
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/sunos/sunos_specific.cc
;;
AIX)
PLATFORM=OS_AIX
CC=gcc
COMMON_FLAGS="$COMMON_FLAGS -maix64 -pthread -fno-builtin-memcmp -D_REENTRANT -DOS_AIX -D__STDC_FORMAT_MACROS"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -pthread -lpthread -lrt -maix64 -static-libstdc++ -static-libgcc"
# PORT_FILES=port/aix/aix_specific.cc
;;
FreeBSD)
PLATFORM=OS_FREEBSD
CXX=clang++
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_FREEBSD"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread"
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/freebsd/freebsd_specific.cc
;;
NetBSD)
PLATFORM=OS_NETBSD
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_NETBSD"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lgcc_s"
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/netbsd/netbsd_specific.cc
;;
OpenBSD)
PLATFORM=OS_OPENBSD
CXX=clang++
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_OPENBSD"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -pthread"
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/openbsd/openbsd_specific.cc
FIND=gfind
WATCH=gnuwatch
;;
DragonFly)
PLATFORM=OS_DRAGONFLYBSD
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_DRAGONFLYBSD"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread"
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/dragonfly/dragonfly_specific.cc
;;
Cygwin)
PLATFORM=CYGWIN
PLATFORM_SHARED_CFLAGS=""
PLATFORM_CXXFLAGS="-std=gnu++11"
COMMON_FLAGS="$COMMON_FLAGS -DCYGWIN"
if [ -z "$USE_CLANG" ]; then
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp"
else
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -latomic"
fi
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lpthread -lrt"
# PORT_FILES=port/linux/linux_specific.cc
;;
OS_ANDROID_CROSSCOMPILE)
PLATFORM=OS_ANDROID
COMMON_FLAGS="$COMMON_FLAGS -fno-builtin-memcmp -D_REENTRANT -DOS_ANDROID -DROCKSDB_PLATFORM_POSIX"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS " # All pthread features are in the Android C library
[RocksDB] Add stacktrace signal handler Summary: This diff provides the ability to print out a stacktrace when the process receives certain signals. Currently, we enable this for the following signals (program error related): SIGILL SIGSEGV SIGBUS SIGABRT Application simply #include "util/stack_trace.h" and call leveldb::InstallStackTraceHandler() during initialization, if signal handler is needed. It's not done automatically when openning db, because it's the application(process)'s responsibility to install signal handler and some applications might already have their own (like fbcode). Sample output: Received signal 11 (Segmentation fault) #0 0x408ff0 ./signal_test() [0x408ff0] /home/haobo/rocksdb/util/signal_test.cc:4 #1 0x40827d ./signal_test() [0x40827d] /home/haobo/rocksdb/util/signal_test.cc:24 #2 0x7f8bb183172e /usr/local/fbcode/gcc-4.7.1-glibc-2.14.1/lib/libc.so.6(__libc_start_main+0x10e) [0x7f8bb183172e] ??:0 #3 0x408ebc ./signal_test() [0x408ebc] /home/engshare/third-party/src/glibc/glibc-2.14.1/glibc-2.14.1/csu/../sysdeps/x86_64/elf/start.S:113 Segmentation fault (core dumped) For each frame, we print the raw pointer, the symbol provided by backtrace_symbols (still not good enough), and the source file/line. Note that address translation is done by directly shell out to addr2line. ??:0 means addr2line fails to do the translation. Hacky, but I think it's good for now. Test Plan: signal_test.cc Reviewers: dhruba, MarkCallaghan Reviewed By: dhruba CC: leveldb Differential Revision: https://reviews.facebook.net/D10173
12 years ago
# PORT_FILES=port/android/android.cc
CROSS_COMPILE=true
;;
*)
echo "Unknown platform!" >&2
exit 1
esac
PLATFORM_CXXFLAGS="$PLATFORM_CXXFLAGS ${CXXFLAGS}"
JAVA_LDFLAGS="$PLATFORM_LDFLAGS"
JAVA_STATIC_LDFLAGS="$PLATFORM_LDFLAGS"
if [ "$CROSS_COMPILE" = "true" -o "$FBCODE_BUILD" = "true" ]; then
# Cross-compiling; do not try any compilation tests.
# Also don't need any compilation tests if compiling on fbcode
true
else
if ! test $ROCKSDB_DISABLE_FALLOCATE; then
# Test whether fallocate is available
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <fcntl.h>
#include <linux/falloc.h>
int main() {
int fd = open("/dev/null", 0);
fallocate(fd, FALLOC_FL_KEEP_SIZE | FALLOC_FL_PUNCH_HOLE, 0, 1024);
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_FALLOCATE_PRESENT"
fi
fi
if ! test $ROCKSDB_DISABLE_SNAPPY; then
# Test whether Snappy library is installed
# http://code.google.com/p/snappy/
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <snappy.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DSNAPPY"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lsnappy"
JAVA_LDFLAGS="$JAVA_LDFLAGS -lsnappy"
fi
fi
if ! test $ROCKSDB_DISABLE_GFLAGS; then
# Test whether gflags library is installed
# http://gflags.github.io/gflags/
# check if the namespace is gflags
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null << EOF
#include <gflags/gflags.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DGFLAGS=1"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lgflags"
else
# check if namespace is google
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null << EOF
#include <gflags/gflags.h>
using namespace google;
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DGFLAGS=google"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lgflags"
fi
fi
fi
if ! test $ROCKSDB_DISABLE_ZLIB; then
# Test whether zlib library is installed
$CXX $CFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <zlib.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DZLIB"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lz"
JAVA_LDFLAGS="$JAVA_LDFLAGS -lz"
fi
fi
if ! test $ROCKSDB_DISABLE_BZIP; then
# Test whether bzip library is installed
$CXX $CFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <bzlib.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DBZIP2"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lbz2"
JAVA_LDFLAGS="$JAVA_LDFLAGS -lbz2"
fi
fi
if ! test $ROCKSDB_DISABLE_LZ4; then
# Test whether lz4 library is installed
$CXX $CFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <lz4.h>
#include <lz4hc.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DLZ4"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -llz4"
JAVA_LDFLAGS="$JAVA_LDFLAGS -llz4"
fi
fi
if ! test $ROCKSDB_DISABLE_ZSTD; then
# Test whether zstd library is installed
$CXX $CFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <zstd.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DZSTD"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lzstd"
JAVA_LDFLAGS="$JAVA_LDFLAGS -lzstd"
fi
fi
if ! test $ROCKSDB_DISABLE_NUMA; then
# Test whether numa is available
$CXX $CFLAGS -x c++ - -o /dev/null -lnuma 2>/dev/null <<EOF
#include <numa.h>
#include <numaif.h>
int main() {}
Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353
10 years ago
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DNUMA"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lnuma"
JAVA_LDFLAGS="$JAVA_LDFLAGS -lnuma"
fi
Adding NUMA support to db_bench tests Summary: Changes: - Adding numa_aware flag to db_bench.cc - Using numa.h library to bind memory and cpu of threads to a fixed NUMA node Result: There seems to be no significant change in the micros/op time with numa_aware enabled. I also tried this with other implementations, including a combination of pthread_setaffinity_np, sched_setaffinity and set_mempolicy methods. It'd be great if someone could point out where I'm going wrong and if we can achieve a better micors/op. Test Plan: Ran db_bench tests using following command: ./db_bench --db=/mnt/tmp --num_levels=6 --key_size=20 --prefix_size=20 --keys_per_prefix=0 --value_size=100 --block_size=4096 --cache_size=17179869184 --cache_numshardbits=6 --compression_type=none --compression_ratio=1 --min_level_to_compress=-1 --disable_seek_compaction=1 --hard_rate_limit=2 --write_buffer_size=134217728 --max_write_buffer_number=2 --level0_file_num_compaction_trigger=8 --target_file_size_base=134217728 --max_bytes_for_level_base=1073741824 --disable_wal=0 --wal_dir=/mnt/tmp --sync=0 --disable_data_sync=1 --verify_checksum=1 --delete_obsolete_files_period_micros=314572800 --max_grandparent_overlap_factor=10 --max_background_compactions=4 --max_background_flushes=0 --level0_slowdown_writes_trigger=16 --level0_stop_writes_trigger=24 --statistics=0 --stats_per_interval=0 --stats_interval=1048576 --histogram=0 --use_plain_table=1 --open_files=-1 --mmap_read=1 --mmap_write=0 --memtablerep=prefix_hash --bloom_bits=10 --bloom_locality=1 --perf_level=0 --duration=300 --benchmarks=readwhilewriting --use_existing_db=1 --num=157286400 --threads=24 --writes_per_second=10240 --numa_aware=[False/True] The tests were run in private devserver with 24 cores and the db was prepopulated using filluniquerandom test. The tests resulted in 0.145 us/op with numa_aware=False and 0.161 us/op with numa_aware=True. Reviewers: sdong, yhchiang, ljin, igor Reviewed By: ljin, igor Subscribers: igor, leveldb Differential Revision: https://reviews.facebook.net/D19353
10 years ago
fi
if ! test $ROCKSDB_DISABLE_TBB; then
# Test whether tbb is available
$CXX $CFLAGS $LDFLAGS -x c++ - -o /dev/null -ltbb 2>/dev/null <<EOF
#include <tbb/tbb.h>
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DTBB"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -ltbb"
JAVA_LDFLAGS="$JAVA_LDFLAGS -ltbb"
fi
fi
if ! test $ROCKSDB_DISABLE_JEMALLOC; then
# Test whether jemalloc is available
if echo 'int main() {}' | $CXX $CFLAGS -x c++ - -o /dev/null -ljemalloc \
2>/dev/null; then
# This will enable some preprocessor identifiers in the Makefile
JEMALLOC=1
# JEMALLOC can be enabled either using the flag (like here) or by
# providing direct link to the jemalloc library
WITH_JEMALLOC_FLAG=1
fi
fi
if ! test $JEMALLOC && ! test $ROCKSDB_DISABLE_TCMALLOC; then
# jemalloc is not available. Let's try tcmalloc
if echo 'int main() {}' | $CXX $CFLAGS -x c++ - -o /dev/null \
-ltcmalloc 2>/dev/null; then
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -ltcmalloc"
JAVA_LDFLAGS="$JAVA_LDFLAGS -ltcmalloc"
fi
fi
if ! test $ROCKSDB_DISABLE_MALLOC_USABLE_SIZE; then
# Test whether malloc_usable_size is available
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <malloc.h>
int main() {
size_t res = malloc_usable_size(0);
return 0;
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_MALLOC_USABLE_SIZE"
fi
fi
if ! test $ROCKSDB_DISABLE_PTHREAD_MUTEX_ADAPTIVE_NP; then
# Test whether PTHREAD_MUTEX_ADAPTIVE_NP mutex type is available
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <pthread.h>
int main() {
int x = PTHREAD_MUTEX_ADAPTIVE_NP;
return 0;
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_PTHREAD_ADAPTIVE_MUTEX"
fi
fi
if ! test $ROCKSDB_DISABLE_BACKTRACE; then
# Test whether backtrace is available
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <execinfo.h>>
int main() {
void* frames[1];
backtrace_symbols(frames, backtrace(frames, 1));
return 0;
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_BACKTRACE"
else
# Test whether execinfo library is installed
$CXX $CFLAGS -lexecinfo -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <execinfo.h>
int main() {
void* frames[1];
backtrace_symbols(frames, backtrace(frames, 1));
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_BACKTRACE"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS -lexecinfo"
JAVA_LDFLAGS="$JAVA_LDFLAGS -lexecinfo"
fi
fi
fi
if ! test $ROCKSDB_DISABLE_PG; then
# Test if -pg is supported
$CXX $CFLAGS -pg -x c++ - -o /dev/null 2>/dev/null <<EOF
int main() {
return 0;
}
EOF
if [ "$?" = 0 ]; then
PROFILING_FLAGS=-pg
fi
fi
if ! test $ROCKSDB_DISABLE_SYNC_FILE_RANGE; then
# Test whether sync_file_range is supported for compatibility with an old glibc
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <fcntl.h>
int main() {
int fd = open("/dev/null", 0);
sync_file_range(fd, 0, 1024, SYNC_FILE_RANGE_WRITE);
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_RANGESYNC_PRESENT"
fi
fi
if ! test $ROCKSDB_DISABLE_SCHED_GETCPU; then
# Test whether sched_getcpu is supported
$CXX $CFLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <sched.h>
int main() {
int cpuid = sched_getcpu();
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_SCHED_GETCPU_PRESENT"
fi
fi
fi
# TODO(tec): Fix -Wshorten-64-to-32 errors on FreeBSD and enable the warning.
# -Wshorten-64-to-32 breaks compilation on FreeBSD i386
if ! [ "$TARGET_OS" = FreeBSD -a "$TARGET_ARCHITECTURE" = i386 ]; then
# Test whether -Wshorten-64-to-32 is available
$CXX $CFLAGS -x c++ - -o /dev/null -Wshorten-64-to-32 2>/dev/null <<EOF
int main() {}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -Wshorten-64-to-32"
fi
fi
# shall we use HDFS?
if test "$USE_HDFS"; then
if test -z "$JAVA_HOME"; then
echo "JAVA_HOME has to be set for HDFS usage."
exit 1
fi
HDFS_CCFLAGS="$HDFS_CCFLAGS -I$JAVA_HOME/include -I$JAVA_HOME/include/linux -DUSE_HDFS"
HDFS_LDFLAGS="$HDFS_LDFLAGS -lhdfs -L$JAVA_HOME/jre/lib/amd64"
HDFS_LDFLAGS="$HDFS_LDFLAGS -L$JAVA_HOME/jre/lib/amd64/server -L$GLIBC_RUNTIME_PATH/lib"
HDFS_LDFLAGS="$HDFS_LDFLAGS -ldl -lverify -ljava -ljvm"
COMMON_FLAGS="$COMMON_FLAGS $HDFS_CCFLAGS"
PLATFORM_LDFLAGS="$PLATFORM_LDFLAGS $HDFS_LDFLAGS"
JAVA_LDFLAGS="$JAVA_LDFLAGS $HDFS_LDFLAGS"
fi
if test -z "$PORTABLE"; then
if test -n "`echo $TARGET_ARCHITECTURE | grep ^ppc64`"; then
# Tune for this POWER processor, treating '+' models as base models
POWER=`LD_SHOW_AUXV=1 /bin/true | grep AT_PLATFORM | grep -E -o power[0-9]+`
COMMON_FLAGS="$COMMON_FLAGS -mcpu=$POWER -mtune=$POWER "
elif test -n "`echo $TARGET_ARCHITECTURE | grep ^s390x`"; then
COMMON_FLAGS="$COMMON_FLAGS -march=z10 "
elif test -n "`echo $TARGET_ARCHITECTURE | grep ^arm`"; then
# TODO: Handle this with approprite options.
COMMON_FLAGS="$COMMON_FLAGS"
elif [ "$TARGET_OS" == "IOS" ]; then
COMMON_FLAGS="$COMMON_FLAGS"
elif [ "$TARGET_OS" != "AIX" ] && [ "$TARGET_OS" != "SunOS" ]; then
COMMON_FLAGS="$COMMON_FLAGS -march=native "
elif test "$USE_SSE"; then
Port 3 way SSE4.2 crc32c implementation from Folly Summary: **# Summary** RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain. This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used. **# Performance Test Results** We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm. Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s. 1) ReadRandom in db_bench overall metrics PER RUN Algorithm | run | micros/op | ops/sec |Throughput (MB/s) 3-way | 1 | 4.143 | 241387 | 26.7 3-way | 2 | 3.775 | 264872 | 29.3 3-way | 3 | 4.116 | 242929 | 26.9 FastCrc32c|1 | 4.037 | 247727 | 27.4 FastCrc32c|2 | 4.648 | 215166 | 23.8 FastCrc32c|3 | 4.352 | 229799 | 25.4 AVG Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s) 3-way | 4.01 | 249,729 | 27.63 FastCrc32c | 4.35 | 230,897 | 25.53 2) Crc32c computation CPU cost (inclusive samples percentage) PER RUN Implementation | run |  TotalSamples | Crc32c percentage 3-way   | 1    |  4,572,250,000 | 4.37% 3-way   | 2    |  3,779,250,000 | 4.62% 3-way   | 3    |  4,129,500,000 | 4.48% FastCrc32c     | 1    |  4,663,500,000 | 11.24% FastCrc32c     | 2    |  4,047,500,000 | 12.34% FastCrc32c     | 3    |  4,366,750,000 | 11.68% **# Test Plan** make -j64 corruption_test && ./corruption_test By default it uses 3-way SSE algorithm NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test make clean && DEBUG_LEVEL=0 make -j64 db_bench make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench Closes https://github.com/facebook/rocksdb/pull/3173 Differential Revision: D6330882 Pulled By: yingsu00 fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
7 years ago
COMMON_FLAGS="$COMMON_FLAGS -msse4.2 -mpclmul"
fi
elif test "$USE_SSE"; then
Port 3 way SSE4.2 crc32c implementation from Folly Summary: **# Summary** RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain. This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used. **# Performance Test Results** We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm. Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s. 1) ReadRandom in db_bench overall metrics PER RUN Algorithm | run | micros/op | ops/sec |Throughput (MB/s) 3-way | 1 | 4.143 | 241387 | 26.7 3-way | 2 | 3.775 | 264872 | 29.3 3-way | 3 | 4.116 | 242929 | 26.9 FastCrc32c|1 | 4.037 | 247727 | 27.4 FastCrc32c|2 | 4.648 | 215166 | 23.8 FastCrc32c|3 | 4.352 | 229799 | 25.4 AVG Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s) 3-way | 4.01 | 249,729 | 27.63 FastCrc32c | 4.35 | 230,897 | 25.53 2) Crc32c computation CPU cost (inclusive samples percentage) PER RUN Implementation | run |  TotalSamples | Crc32c percentage 3-way   | 1    |  4,572,250,000 | 4.37% 3-way   | 2    |  3,779,250,000 | 4.62% 3-way   | 3    |  4,129,500,000 | 4.48% FastCrc32c     | 1    |  4,663,500,000 | 11.24% FastCrc32c     | 2    |  4,047,500,000 | 12.34% FastCrc32c     | 3    |  4,366,750,000 | 11.68% **# Test Plan** make -j64 corruption_test && ./corruption_test By default it uses 3-way SSE algorithm NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test make clean && DEBUG_LEVEL=0 make -j64 db_bench make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench Closes https://github.com/facebook/rocksdb/pull/3173 Differential Revision: D6330882 Pulled By: yingsu00 fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
7 years ago
COMMON_FLAGS="$COMMON_FLAGS -msse4.2 -mpclmul"
fi
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
cross-platform compatibility improvements Summary: We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS. See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build. I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream. Closes https://github.com/facebook/rocksdb/pull/2199 Differential Revision: D5054042 Pulled By: yiwu-arbug fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
8 years ago
#include <cstdint>
#include <nmmintrin.h>
int main() {
volatile uint32_t x = _mm_crc32_u32(0, 0);
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DHAVE_SSE42"
elif test "$USE_SSE"; then
echo "warning: USE_SSE specified but compiler could not use SSE intrinsics, disabling"
Port 3 way SSE4.2 crc32c implementation from Folly Summary: **# Summary** RocksDB uses SSE crc32 intrinsics to calculate the crc32 values but it does it in single way fashion (not pipelined on single CPU core). Intel's whitepaper () published an algorithm that uses 3-way pipelining for the crc32 intrinsics, then use pclmulqdq intrinsic to combine the values. Because pclmulqdq has overhead on its own, this algorithm will show perf gains on buffers larger than 216 bytes, which makes RocksDB a perfect user, since most of the buffers RocksDB call crc32c on is over 4KB. Initial db_bench show tremendous CPU gain. This change uses the 3-way SSE algorithm by default. The old SSE algorithm is now behind a compiler tag NO_THREEWAY_CRC32C. If user compiles the code with NO_THREEWAY_CRC32C=1 then the old SSE Crc32c algorithm would be used. If the server does not have SSE4.2 at the run time the slow way (Non SSE) will be used. **# Performance Test Results** We ran the FillRandom and ReadRandom benchmarks in db_bench. ReadRandom is the point of interest here since it calculates the CRC32 for the in-mem buffers. We did 3 runs for each algorithm. Before this change the CRC32 value computation takes about 11.5% of total CPU cost, and with the new 3-way algorithm it reduced to around 4.5%. The overall throughput also improved from 25.53MB/s to 27.63MB/s. 1) ReadRandom in db_bench overall metrics PER RUN Algorithm | run | micros/op | ops/sec |Throughput (MB/s) 3-way | 1 | 4.143 | 241387 | 26.7 3-way | 2 | 3.775 | 264872 | 29.3 3-way | 3 | 4.116 | 242929 | 26.9 FastCrc32c|1 | 4.037 | 247727 | 27.4 FastCrc32c|2 | 4.648 | 215166 | 23.8 FastCrc32c|3 | 4.352 | 229799 | 25.4 AVG Algorithm | Average of micros/op | Average of ops/sec | Average of Throughput (MB/s) 3-way | 4.01 | 249,729 | 27.63 FastCrc32c | 4.35 | 230,897 | 25.53 2) Crc32c computation CPU cost (inclusive samples percentage) PER RUN Implementation | run |  TotalSamples | Crc32c percentage 3-way   | 1    |  4,572,250,000 | 4.37% 3-way   | 2    |  3,779,250,000 | 4.62% 3-way   | 3    |  4,129,500,000 | 4.48% FastCrc32c     | 1    |  4,663,500,000 | 11.24% FastCrc32c     | 2    |  4,047,500,000 | 12.34% FastCrc32c     | 3    |  4,366,750,000 | 11.68% **# Test Plan** make -j64 corruption_test && ./corruption_test By default it uses 3-way SSE algorithm NO_THREEWAY_CRC32C=1 make -j64 corruption_test && ./corruption_test make clean && DEBUG_LEVEL=0 make -j64 db_bench make clean && DEBUG_LEVEL=0 NO_THREEWAY_CRC32C=1 make -j64 db_bench Closes https://github.com/facebook/rocksdb/pull/3173 Differential Revision: D6330882 Pulled By: yingsu00 fbshipit-source-id: 8ec3d89719533b63b536a736663ca6f0dd4482e9
7 years ago
exit 1
fi
$CXX $PLATFORM_CXXFLAGS $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#include <cstdint>
#include <wmmintrin.h>
int main() {
const auto a = _mm_set_epi64x(0, 0);
const auto b = _mm_set_epi64x(0, 0);
const auto c = _mm_clmulepi64_si128(a, b, 0x00);
auto d = _mm_cvtsi128_si64(c);
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DHAVE_PCLMUL"
elif test "$USE_SSE"; then
echo "warning: USE_SSE specified but compiler could not use PCLMUL intrinsics, disabling"
exit 1
cross-platform compatibility improvements Summary: We've had a couple CockroachDB users fail to build RocksDB on exotic platforms, so I figured I'd try my hand at solving these issues upstream. The problems stem from a) `USE_SSE=1` being too aggressive about turning on SSE4.2, even on toolchains that don't support SSE4.2 and b) RocksDB attempting to detect support for thread-local storage based on OS, even though it can vary by compiler on the same OS. See the individual commit messages for details. Regarding SSE support, this PR should change virtually nothing for non-CMake based builds. `make`, `PORTABLE=1 make`, `USE_SSE=1 make`, and `PORTABLE=1 USE_SSE=1 make` function exactly as before, except that SSE support will be automatically disabled when a simple SSE4.2-using test program fails to compile, as it does on OpenBSD. (OpenBSD's ports GCC supports SSE4.2, but its binutils do not, so `__SSE_4_2__` is defined but an SSE4.2-using program will fail to assemble.) A warning is emitted in this case. The CMake build is modified to support the same set of options, except that `USE_SSE` is spelled `FORCE_SSE42` because `USE_SSE` is rather useless now that we can automatically detect SSE support, and I figure changing options in the CMake build is less disruptive than changing the non-CMake build. I've tested these changes on all the platforms I can get my hands on (macOS, Windows MSVC, Windows MinGW, and OpenBSD) and it all works splendidly. Let me know if there's anything you object to—I obviously don't mean to break any of your build pipelines in the process of fixing ours downstream. Closes https://github.com/facebook/rocksdb/pull/2199 Differential Revision: D5054042 Pulled By: yiwu-arbug fbshipit-source-id: 938e1fc665c049c02ae15698e1409155b8e72171
8 years ago
fi
# iOS doesn't support thread-local storage, but this check would erroneously
# succeed because the cross-compiler flags are added by the Makefile, not this
# script.
if [ "$PLATFORM" != IOS ]; then
$CXX $COMMON_FLAGS -x c++ - -o /dev/null 2>/dev/null <<EOF
#if defined(_MSC_VER) && !defined(__thread)
#define __thread __declspec(thread)
#endif
int main() {
static __thread int tls;
}
EOF
if [ "$?" = 0 ]; then
COMMON_FLAGS="$COMMON_FLAGS -DROCKSDB_SUPPORT_THREAD_LOCAL"
fi
fi
PLATFORM_CCFLAGS="$PLATFORM_CCFLAGS $COMMON_FLAGS"
PLATFORM_CXXFLAGS="$PLATFORM_CXXFLAGS $COMMON_FLAGS"
VALGRIND_VER="$VALGRIND_VER"
ROCKSDB_MAJOR=`build_tools/version.sh major`
ROCKSDB_MINOR=`build_tools/version.sh minor`
ROCKSDB_PATCH=`build_tools/version.sh patch`
echo "CC=$CC" >> "$OUTPUT"
echo "CXX=$CXX" >> "$OUTPUT"
echo "PLATFORM=$PLATFORM" >> "$OUTPUT"
echo "PLATFORM_LDFLAGS=$PLATFORM_LDFLAGS" >> "$OUTPUT"
echo "JAVA_LDFLAGS=$JAVA_LDFLAGS" >> "$OUTPUT"
echo "JAVA_STATIC_LDFLAGS=$JAVA_STATIC_LDFLAGS" >> "$OUTPUT"
echo "VALGRIND_VER=$VALGRIND_VER" >> "$OUTPUT"
echo "PLATFORM_CCFLAGS=$PLATFORM_CCFLAGS" >> "$OUTPUT"
echo "PLATFORM_CXXFLAGS=$PLATFORM_CXXFLAGS" >> "$OUTPUT"
echo "PLATFORM_SHARED_CFLAGS=$PLATFORM_SHARED_CFLAGS" >> "$OUTPUT"
echo "PLATFORM_SHARED_EXT=$PLATFORM_SHARED_EXT" >> "$OUTPUT"
echo "PLATFORM_SHARED_LDFLAGS=$PLATFORM_SHARED_LDFLAGS" >> "$OUTPUT"
echo "PLATFORM_SHARED_VERSIONED=$PLATFORM_SHARED_VERSIONED" >> "$OUTPUT"
echo "EXEC_LDFLAGS=$EXEC_LDFLAGS" >> "$OUTPUT"
echo "JEMALLOC_INCLUDE=$JEMALLOC_INCLUDE" >> "$OUTPUT"
echo "JEMALLOC_LIB=$JEMALLOC_LIB" >> "$OUTPUT"
echo "ROCKSDB_MAJOR=$ROCKSDB_MAJOR" >> "$OUTPUT"
echo "ROCKSDB_MINOR=$ROCKSDB_MINOR" >> "$OUTPUT"
echo "ROCKSDB_PATCH=$ROCKSDB_PATCH" >> "$OUTPUT"
echo "CLANG_SCAN_BUILD=$CLANG_SCAN_BUILD" >> "$OUTPUT"
echo "CLANG_ANALYZER=$CLANG_ANALYZER" >> "$OUTPUT"
echo "PROFILING_FLAGS=$PROFILING_FLAGS" >> "$OUTPUT"
echo "FIND=$FIND" >> "$OUTPUT"
echo "WATCH=$WATCH" >> "$OUTPUT"
# This will enable some related identifiers for the preprocessor
if test -n "$JEMALLOC"; then
echo "JEMALLOC=1" >> "$OUTPUT"
fi
# Indicates that jemalloc should be enabled using -ljemalloc flag
# The alternative is to porvide a direct link to the library via JEMALLOC_LIB
# and JEMALLOC_INCLUDE
if test -n "$WITH_JEMALLOC_FLAG"; then
echo "WITH_JEMALLOC_FLAG=$WITH_JEMALLOC_FLAG" >> "$OUTPUT"
fi
echo "LUA_PATH=$LUA_PATH" >> "$OUTPUT"