MongoRocks is MongoDB with RocksDB as a storage engine. It plugs into MongoDB through the storage engine API, which was released as part of MongoDB 3.0: http://docs.mongodb.org/manual/faq/storage/
To report bugs, ask questions, or leave feedback, please use the MongoRocks Google Group: https://groups.google.com/forum/#!forum/mongo-rocks
In version 3.0, the RocksDB code is part of the main MongoDB repository. In 3.2 and going forward, the code for MongoRocks lives in a separate module. There are currently two repositories:
- https://github.com/mongodb-partners/mongo-rocks -- the repository for the MongoRocks module for versions 3.2 and going forward. It's still in active development, as is MongoDB version 3.2.
- https://github.com/mongodb-partners/mongo -- a fork of the MongoDB repository, used for developing version 3.0. There are two branches:
  - v3.0-fb -- the development branch. This is where all new commits and fixes go.
  - v3.0-mongorocks -- a somewhat more stable branch. We move v3.0-fb to v3.0-mongorocks after it has run in production for a while without issues.
All of this matters to you only if you wish to compile from source. If not, we will publish binaries to our Google Group regularly.
At this point, we have been running MongoRocks in production at Parse for a couple of months; 30% of Parse's replica sets are running with MongoRocks primaries. However, we are being a bit conservative and have not yet "released" a legit build with a version number. Feel free to benchmark and run shadow tests, but please don't use it in production yet. We hope to release a version of MongoRocks at the beginning of July, at which point you'll be able to run it in production.
For the most part, running MongoDB with the RocksDB storage engine should be transparent to the user. However, there are some cool features and configuration options that can make your experience even better.
RocksDB's files are immutable, which makes backups easy and fast: (1) get the list of live files, (2) hard-link them into a different directory (or copy, if the destination is on a different file system). To issue a backup, you can call:
```js
db.adminCommand({setParameter: 1, rocksdbBackup: "/var/lib/mongodb/backup/1"})
```
(Yes, we're aware that it's a bit silly to use the setParameter API to issue backups. We're planning to move to MongoDB's command API in 3.2.)
This will create the directory /var/lib/mongodb/backup/1 (it must not exist beforehand) and hard-link all the relevant files into it. You can then copy those files to S3 or HDFS in the background. We're building a tool that will incrementally back up MongoRocks to S3; keep an eye on the MongoRocks Google Group for the announcement.
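As a sketch of how this fits into a backup script (mongo shell; the timestamped directory naming and the error handling here are illustrative assumptions, not part of MongoRocks):

```js
// Issue a hard-link backup into a fresh directory.
// The parent directory must already exist; the target itself must not.
var dir = "/var/lib/mongodb/backup/" + Date.now();  // illustrative naming scheme
var res = db.adminCommand({setParameter: 1, rocksdbBackup: dir});
if (res.ok !== 1) {
    throw "rocksdbBackup failed: " + res.errmsg;
}
print("backup created in " + dir + "; copy it to S3/HDFS in the background");
```

Because the backup consists of hard links to immutable SST files, it consumes almost no extra disk space until compaction deletes the original files.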
RocksDB's writes are very fast because the bulk of the work happens in a background process called compaction. Compactions are triggered automatically when the shape of the LSM tree becomes non-ideal, but you can also trigger one manually. After the compaction is done, your reads will be faster and the space used on disk will be a bit smaller (by approximately 10%). To schedule a manual compaction, you can call:
```js
db.adminCommand({setParameter: 1, rocksdbCompact: 1})
```
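For example, you can eyeball the space savings by recording the storage size before and after (a mongo shell sketch; that the command returns before compaction finishes is an assumption here, so watch `db.serverStatus()["rocksdb"]` before comparing):

```js
// Record storage size, schedule a manual compaction, then compare later.
var before = db.stats().storageSize;
db.adminCommand({setParameter: 1, rocksdbCompact: 1});
// Compaction runs in the background; wait until the compaction stats in
// db.serverStatus()["rocksdb"] settle before reading the new size.
var after = db.stats().storageSize;
print("storage size: " + before + " -> " + after);
```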
Configuring RocksDB is a bit of an art. We hope that the default configuration will be good for most cases, but you can always get better performance by tuning, especially if your workload is unusual in some way.
There are a couple of parameters you can configure:
- `--rocksdbCacheSizeGB` or `storage.rocksdb.cacheSizeGB` -- the size of RocksDB's block cache, 50% of RAM by default. We keep uncompressed pages in the block cache and compressed pages in the kernel's page cache. You can also resize the block cache dynamically by calling `db.adminCommand({setParameter: 1, rocksdbRuntimeConfigCacheSizeGB: 10})`.
- `--rocksdbCompression` or `storage.rocksdb.compression` -- the compression algorithm, `snappy` by default. Other available options are `none` and `zlib`. If your binary doesn't support the requested compression, opening the database will fail.
- `--rocksdbConfigString` or `storage.rocksdb.configString` -- through this parameter you can configure all other RocksDB options. We'll need to write a separate wiki page to explain all the options available there :)
- `--rocksdbMaxWriteMBPerSec` or `storage.rocksdb.maxWriteMBPerSec` -- 1024 by default. RocksDB compactions can create spiky IO writes, which can cause higher P99 storage read latency. You can use this option to smooth out the writes. For example, if you set it to 100, RocksDB will never write more than 100MB/s to storage; writes will be smoother and there will be storage bandwidth left for reads to go through. You can also change this value dynamically by calling `db.adminCommand({setParameter: 1, rocksdbRuntimeConfigMaxWriteMBPerSec: 30})`.
- `--rocksdbCrashSafeCounters` or `storage.rocksdb.crashSafeCounters` -- false by default, which means that after an unclean shutdown the counters for the number of records in a collection might be wrong. You can correct them with MongoDB's `validate` command (see the sketch after this list); this matches WiredTiger's behavior. If you set this option to true, we'll make sure the counters are correct even after a crash, at some cost to write performance.
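Putting it together, these options map onto the usual mongod YAML config file like so (a sketch; the values are illustrative, and `storage.engine: rocksdb` is the engine name assumed here):

```yaml
# Illustrative mongod.conf snippet; values are examples, not recommendations.
storage:
  engine: rocksdb
  dbPath: /var/lib/mongodb
  rocksdb:
    cacheSizeGB: 8          # block cache for uncompressed pages
    compression: snappy     # none | snappy | zlib
    maxWriteMBPerSec: 100   # cap background write rate to protect read latency
```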
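And if you run with crash-safe counters off and hit an unclean shutdown, here's a sketch of fixing one collection's counters with `validate` (the database and collection names are placeholders):

```js
// Recompute the record counters for one collection after an unclean shutdown.
// "mydb" and "mycoll" are placeholders for your own names.
var res = db.getSiblingDB("mydb").runCommand({validate: "mycoll", full: true});
printjson(res);
```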
In this section, we'll describe issues that can come up when running MongoRocks and how to fix them. Currently, we're aware of one thing to be careful about.
By default, the Linux kernel allows each process to use only 1024 file descriptors. MongoRocks is configured to use 32MB files, so a 1TB database needs roughly 32K files. Before running MongoRocks, please increase the number of file descriptors the MongoDB process can use. Here are the recommended ulimit settings from MongoDB's docs: http://docs.mongodb.org/manual/reference/ulimit/#recommended-ulimit-settings
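For example (a shell sketch; 64000 is the open-files value from MongoDB's recommended ulimit settings, and the right mechanism to make it permanent depends on your init system):

```sh
# Check the limits of a running mongod (replace <pid> with the real pid):
cat /proc/<pid>/limits | grep "open files"

# Raise the limit in the shell that launches mongod:
ulimit -n 64000
```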
Run `db.serverStatus()["rocksdb"]` and enjoy. We'll write a separate wiki page explaining what all of it means. In the meantime, you can start here: https://github.com/facebook/rocksdb/wiki/RocksDB-Tuning-Guide#compaction-stats
These blog posts describe Parse's experience running MongoDB with RocksDB: