
RocksDB

William Zhang edited this page Nov 24, 2017 · 3 revisions

Common

https://github.com/facebook/rocksdb/INSTALL.md
https://github.com/facebook/rocksdb/wiki/RocksJava-Basics

Check out a stable branch or tag, because the latest version (master) is not stable.

$ git checkout -b v4.11.2 v4.11.2

Windows

Run the following in the VS2013 x64 Native Tools Command Prompt:
D:\> mkdir build
D:\> cd build
D:\> cmake .. -G "Visual Studio 12 2013 Win64" -DJNI=1
D:\> msbuild /m rocksdb.sln

Linux

$ scl enable devtoolset-3 bash
$ cmake .
$ make -j clean
$ make -j check
$ make -j rocksdbjava # For rocksdbjni

MacOS

  1. Remove “set(SYSTEM_LIBS ${CMAKE_THREAD_LIBS_INIT} rt)” from CMakeLists.txt if it exists.
  2. Clone gflags 2.0, then configure, build, and make install it. “brew install gflags” alone is not enough for development.
  3. Install the compression libraries: brew install zlib lz4 snappy zstd

gflags is needed by the tools and utilities. See CMakeLists.txt and thirdparty.inc for how to turn on features such as zstd: either change the feature switches in CMakeLists.txt from OFF to ON, or pass them on the command line as shown below.

$ rm -rf CMakeCache.txt CMakeFiles/
$ CXXFLAGS="-DGFLAGS=google" LDFLAGS="-lgflags" cmake .
$ CXXFLAGS="-DGFLAGS=google -DJEMALLOC -DZLIB -DSNAPPY -DLZ4 -DZSTD" LDFLAGS="-lgflags -ljemalloc -lz -lsnappy -llz4 -lzstd" cmake .
$ make VERBOSE=1 -j

jdb_bench.sh (Sample code for macOS.)

$ make -j rocksdbjava
$ cd java
$ make db_bench
$ ./jdb_bench.sh
  1. The target directory contains multiple jars, so pin the jar in jdb_bench.sh:
    -ROCKS_JAR=`find target -name rocksdbjni*.jar`
    +ROCKS_JAR=target/rocksdbjni-5.4.0-osx.jar

  2. The compression libraries must also be present in target/, for example as a symlink:
    libsnappy.dylib -> /usr/local/Cellar/snappy/1.1.4/lib/libsnappy.1.dylib
        

Utility

$ brew install rocksdb
# Assuming there is a db with <k, v> = <long in big endian, int in little endian>
$ rocksdb_ldb dump --db=/tmp/long-int/ --hex >x
$ head x
0x0000000000000000 ==> 0x00000000
0x0000000000000001 ==> 0x01000000
0x0000000000000002 ==> 0x02000000
0x0000000000000003 ==> 0x03000000
0x0000000000000004 ==> 0x04000000
0x0000000000000005 ==> 0x05000000
0x0000000000000006 ==> 0x06000000
0x0000000000000007 ==> 0x07000000
0x0000000000000008 ==> 0x08000000
0x0000000000000009 ==> 0x09000000

$ rocksdb_dump --db_path long-int/ --dump_location /tmp/dump.out
$ rocksdb_undump --db_path long-int-new/ --dump_location=/tmp/dump.out
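
The hex lines printed by the ldb dump can be turned back into integers with a few lines of Python. This is a minimal sketch, assuming the layout stated in the comment above (8-byte big-endian long key, 4-byte little-endian int value); decode_line is a hypothetical helper name, not part of RocksDB.

```python
import struct

def decode_line(line):
    """Decode one 'key ==> value' hex line into a (key, value) pair of ints.

    Assumption: the key is an 8-byte big-endian long and the value is a
    4-byte little-endian int, as in the sample database above.
    """
    key_hex, value_hex = (part.strip() for part in line.split("==>"))
    key_bytes = bytes.fromhex(key_hex[2:])        # drop the leading "0x"
    value_bytes = bytes.fromhex(value_hex[2:])
    (key,) = struct.unpack(">q", key_bytes)       # big-endian signed 64-bit
    (value,) = struct.unpack("<i", value_bytes)   # little-endian signed 32-bit
    return key, value

sample = [
    "0x0000000000000001 ==> 0x01000000",
    "0x0000000000000009 ==> 0x09000000",
]
decoded = [decode_line(line) for line in sample]
print(decoded)
```

For the two sample lines this prints [(1, 1), (9, 9)]: the value bytes 01 00 00 00 read as a little-endian int give 1, not 16777216.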

Flush

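Store 4-byte integer keys (with 4-byte integer values) in big-endian or little-endian form and keep inserting data continuously. Two things stand out:

  1. With big-endian keys, the L0 files written out are fairly small, about 1 MB each. With little-endian keys, the L0 files are larger, about 4 MB each.
  2. With big-endian keys there are many files, all of roughly the same size, as if no compaction had happened. With little-endian keys there are noticeably fewer files.

The RocksDB log shows: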

[default] [JOB 2] Flushing memtable with next log file: 6 EVENT_LOG_v1 {"time_micros": 2495769399809332, "job": 2, "event": "flush_started", "num_memtables": 1, "num_entries": 117247, "num_deletes": 0, "memory_usage": 4065400}

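In both cases the memtable's memory_usage is about 4 MB, which matches the configured max_write_buffer_size, yet the sizes of the flushed files differ markedly. Finding the exact cause requires reading the code, starting from Status FlushJob::WriteLevel0Table().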
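
As a rough sanity check on these numbers, the JSON payload of the flush_started event can be parsed to compare the memtable footprint with the raw key/value bytes. This is only arithmetic on the logged values, assuming 4-byte keys and 4-byte values as described above; it does not explain the size difference by itself.

```python
import json

# The flush_started line from the log above, with plain ASCII quotes.
log_line = (
    '[default] [JOB 2] Flushing memtable with next log file: 6 EVENT_LOG_v1 '
    '{"time_micros": 2495769399809332, "job": 2, "event": "flush_started", '
    '"num_memtables": 1, "num_entries": 117247, "num_deletes": 0, '
    '"memory_usage": 4065400}'
)

# Everything after the EVENT_LOG_v1 marker is a JSON object.
event = json.loads(log_line.split("EVENT_LOG_v1", 1)[1])

# Average memtable bytes consumed per entry, including per-entry overhead.
bytes_per_entry = event["memory_usage"] / event["num_entries"]

# Raw payload, assuming a 4-byte key plus a 4-byte value per entry.
raw_payload = event["num_entries"] * (4 + 4)

print(f"memtable bytes per entry: {bytes_per_entry:.1f}")
print(f"raw key/value payload: {raw_payload} bytes")
```

The memtable spends roughly 34.7 bytes per entry, while the raw payload is 937976 bytes, a little under 1 MB. Most of the 4 MB of memory_usage is therefore per-entry overhead rather than user data.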

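Records written with big-endian keys are essentially already in sorted order, so the key ranges of different L0 files do not overlap. When these files are merged into L1 and lower levels, they keep their original size (they are simply copied).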
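
The non-overlap argument can be illustrated with a small simulation. It is a sketch, not RocksDB code: it assumes keys are compared bytewise (as RocksDB's default comparator does), that integers are inserted in increasing order, and that every flush holds a fixed number of entries. The names flush_ranges and count_overlaps and the parameters total and per_flush are made up for the illustration.

```python
import struct

def flush_ranges(encode, total=4000, per_flush=1000):
    """Split sequentially inserted keys into flush-sized batches and return
    the (smallest, largest) encoded key of each batch."""
    ranges = []
    for start in range(0, total, per_flush):
        keys = [encode(i) for i in range(start, start + per_flush)]
        ranges.append((min(keys), max(keys)))  # bytes objects compare bytewise
    return ranges

def count_overlaps(ranges):
    """Count pairs of batches whose key ranges intersect."""
    overlaps = 0
    for i, (lo_a, hi_a) in enumerate(ranges):
        for lo_b, hi_b in ranges[i + 1:]:
            if lo_a <= hi_b and lo_b <= hi_a:
                overlaps += 1
    return overlaps

def big_endian(i):
    return struct.pack(">i", i)

def little_endian(i):
    return struct.pack("<i", i)

big_overlaps = count_overlaps(flush_ranges(big_endian))
little_overlaps = count_overlaps(flush_ranges(little_endian))
print("big-endian overlapping pairs:", big_overlaps)
print("little-endian overlapping pairs:", little_overlaps)
```

With these parameters the big-endian batches have no overlapping pairs, while all 6 pairs of little-endian batches overlap, because every little-endian batch spans first bytes from 0x00 to 0xff. Files with disjoint ranges can be placed into the next level without rewriting, whereas overlapping ones have to be merged.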
