Environment
This page contains information that is intended to help you start hacking on Peloton. It covers a wide set of topics related to software development, such as the development environment, debugging, and version control. We are only attempting to provide biased suggestions that we think are invaluable for low-level software development. Feel free to deviate from them if you are comfortable with other tools that solve similar problems. Special thanks to Florian Funke. This guide is heavily inspired by a similar one written by him.
We prefer the Linux OS for development and testing. You can use any other OS for development as long as your modified source code can be built and run on Linux.
If you are a Mac OS or Windows user, we encourage you to consider using VirtualBox and Vagrant. VirtualBox is a hypervisor for x86 computers, and Vagrant is a tool for sharing easy-to-configure virtual development environments. More information is available here.
We are developing Peloton in C++. In particular, we are following the C++11 standard. C++ provides a lot of leeway in DBMS development compared to other high-level languages. For instance, it supports both manual and automated memory management, varied styles of programming, stronger type checking, different kinds of polymorphism, etc.
Here's a list of useful references:
- CPP Reference is an online reference for the powerful Standard Template Library (STL).
- C++ FAQ covers a lot of topics.
Here's a list of C++11 features that you might want to make use of:
- `auto` type inference
- Range-based `for` loops
- Smart pointers, in particular `unique_ptr`
- STL data structures, such as `unordered_set`, `unordered_map`, etc.
- Threads, deleted member functions, lambdas, etc.
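The following minimal sketch shows several of these features together under C++11 (it avoids `std::make_unique`, which only arrived in C++14). The `Tuple` class and the helper functions are hypothetical and exist only for illustration; they are not part of Peloton.

```cpp
#include <algorithm>
#include <memory>
#include <string>
#include <unordered_map>
#include <unordered_set>
#include <vector>

// Hypothetical type used only for illustration.
class Tuple {
 public:
  explicit Tuple(int id) : id_(id) {}

  // Deleted member functions: a Tuple can never be copied by accident.
  Tuple(const Tuple &) = delete;
  Tuple &operator=(const Tuple &) = delete;

  int get_id() const { return id_; }

 private:
  int id_;
};

// Counts word occurrences with an unordered_map, auto, and a range-based for.
std::unordered_map<std::string, int> count_words(
    const std::vector<std::string> &words) {
  std::unordered_map<std::string, int> counts;
  for (const auto &word : words) {
    ++counts[word];
  }
  return counts;
}

// Returns the distinct ids of the given tuples. Each Tuple is owned by a
// unique_ptr, so all of them are freed automatically when `tuples` goes out
// of scope.
std::unordered_set<int> distinct_ids(const std::vector<int> &raw_ids) {
  std::vector<std::unique_ptr<Tuple>> tuples;
  for (int id : raw_ids) {
    tuples.push_back(std::unique_ptr<Tuple>(new Tuple(id)));
  }
  std::unordered_set<int> ids;
  for (const auto &tuple : tuples) {
    ids.insert(tuple->get_id());
  }
  return ids;
}

// Counts values above `threshold`; the lambda captures `threshold` by value.
long count_above(const std::vector<int> &values, int threshold) {
  return std::count_if(values.begin(), values.end(),
                       [threshold](int v) { return v > threshold; });
}
```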
Please comment your code. Comment all the class definitions, non-trivial member functions and variables, and all the steps in your algorithms. Here's an example.
We follow the Google C++ style guide. As they mention in that guide, these rules exist to keep the code base manageable while still allowing coders to use C++ language features productively.
Make sure that you follow the naming rules. For instance, use `class UpperCaseCamelCase` for type names, and `int lower_case_with_underscores` for variable/method/function names.
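To illustrate the naming rules together with the commenting guideline, here is a hypothetical class (not from the Peloton code base). It follows the naming rules above; the trailing underscore on data members is an assumption borrowed from the Google C++ style guide, which this page does not spell out.

```cpp
#include <cstddef>
#include <vector>

// A fixed-capacity buffer of integer keys. (Hypothetical class used only to
// illustrate naming and commenting conventions.)
class KeyBuffer {
 public:
  explicit KeyBuffer(std::size_t capacity) : capacity_(capacity) {}

  // Appends `key` if there is room. Returns false when the buffer is full.
  bool insert_key(int key) {
    // Step 1: reject the key once the buffer has reached its capacity.
    if (keys_.size() >= capacity_) {
      return false;
    }
    // Step 2: otherwise store it at the end, preserving insertion order.
    keys_.push_back(key);
    return true;
  }

  // Returns the number of keys currently stored.
  std::size_t get_size() const { return keys_.size(); }

 private:
  // Maximum number of keys this buffer accepts.
  std::size_t capacity_;

  // Stored keys, in insertion order.
  std::vector<int> keys_;
};
```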
Please refrain from using any libraries other than the STL (and googletest for unit testing) without contacting me.
Organize source code files into relevant folders based on their functionality. Separate binary files from source files, and production code from testing code.
In general, the `src` directory contains all the production code, the `test` directory contains all the testing code, and the `build` directory contains all the built binary files. Within the `src` directory, the `src/backend` directory contains all the Peloton files, while the `src/postgres` directory contains all the Postgres files. Within the Peloton directory, `src/backend/storage` contains files related to our storage engine, while `src/backend/index` contains files related to the auxiliary access structures.
We exclusively use git for source code management, and GitHub for collaboration. Here's a simple guide for using it. If you have never used a tool like git, I am sure that you will soon wonder how you managed to live without it. If you want to learn more, here's a free book.
Please update or add a `.gitignore` file to exclude unwanted files (e.g., the build directory with binary files, backup/temporary files of your editor/IDE, large files containing test data, etc.) from the repository.
We use autotools for building Peloton. In particular, we use automake for automatically generating `Makefile.in` files. You will probably only need to add file names to existing automake files (with extension `.am`), or create an automake file similar to an already existing one in your project.
Each module in Peloton has its own automake file. For instance, here's the `Makefile.am` within the storage (`src/backend/storage`) module. If we want to add another file `foo.cpp` within the storage module, then we should add it to `storage_FILES` in that `Makefile.am`. On the other hand, if we want to add a new `baz` module, then we can create a new `Makefile.am` similar to the one in the storage module, store it in the `src/backend/baz` directory, and include it in the higher-level `Makefile.am` within the `src` directory.
In case you are curious about autotools, here's a short tutorial.
GCC's g++ is popular and proven, while LLVM's clang++ is also a great, free C++ compiler and a promising challenger to g++ (its comprehensible error and warning messages are especially compelling). For both, I recommend the latest version, in particular because C++11 support is constantly being improved.
Useful compiler flags:
- -std=c++11/-std=c++0x: Enable (experimental) C++11 (C++0x) support.
- -g: Generate debug symbols.
- -O0: Disable optimizations to allow for more reliable debugging (update: use -Og, if supported by your compiler). Use -O3 when running benchmarks.
- -Wall (GCC) or -Weverything (Clang): Generate helpful warnings. Do not ignore them! In fact, force yourself to deal with warnings by turning them into errors with -Werror.
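As a quick sanity check of these flags, a sketch like the one below could be compiled with something like `g++ -std=c++11 -g -Og -Wall -Werror -c check.cpp` (the file name and the `find_index` function are hypothetical). The `#error` guard stops the build if C++11 mode is off, assuming a compiler recent enough to report its language mode through `__cplusplus` (very old g++ releases did not). The comment in the loop describes the kind of signed/unsigned comparison that g++'s `-Wall` flags and `-Werror` turns into a build failure.

```cpp
#include <cstddef>
#include <vector>

// Refuse to build unless the compiler runs in C++11 (or later) mode.
#if __cplusplus < 201103L
#error "C++11 support required: compile with -std=c++11"
#endif

// Returns the index of `target` in `values`, or -1 if it is absent.
// Writing the loop as `for (int i = 0; i < values.size(); ++i)` compares a
// signed int with an unsigned size_t, which g++ reports under -Wall
// (-Wsign-compare); with -Werror that warning fails the build. Using
// std::size_t for the index avoids the problem.
long find_index(const std::vector<int> &values, int target) {
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] == target) {
      return static_cast<long>(i);
    }
  }
  return -1;
}
```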
Use a debugger to find bugs; don't rely on debug output. Good debuggers include GDB (the GNU debugger) and LLVM's LLDB. Most IDEs have a graphical debugger front-end, but the command line can already be very helpful when your program crashes. I can recommend cgdb, a curses-based interface for GDB. Little-known fact: GDB now supports (limited) reverse debugging.
If your program behaves in a seemingly nondeterministic or "mysterious" way, Valgrind is your friend. Valgrind's memcheck finds illegal memory accesses, uninitialized reads, and much more. In older Valgrind releases, the option --db-attach=yes starts the debugger when an error is found; newer releases no longer support it, so use --vgdb-error=0 with Valgrind's built-in gdbserver instead. Check out this blog post on the interaction between GDB and Valgrind. Valgrind's Helgrind and DRD can help you find thread-related problems. This short blog post gives some helpful advice on how to detect the cause of a deadlock. A significantly faster and only marginally less thorough alternative to Valgrind's memcheck is AddressSanitizer.
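For the thread-related checks, a small program like the following sketch (a hypothetical function, not Peloton code) is a convenient target. Compile it with `-g -pthread`, call it from a small `main`, and run it under `valgrind --tool=helgrind` or `valgrind --tool=drd`. If you delete the `std::lock_guard` line, the unsynchronized increment becomes a data race that these tools should report.

```cpp
#include <mutex>
#include <thread>
#include <vector>

// Increments a shared counter from several threads. Every increment holds
// `mutex`, so the threads never modify `counter` concurrently.
long parallel_count(int num_threads, int increments_per_thread) {
  long counter = 0;
  std::mutex mutex;
  std::vector<std::thread> workers;
  for (int t = 0; t < num_threads; ++t) {
    workers.emplace_back([&counter, &mutex, increments_per_thread]() {
      for (int i = 0; i < increments_per_thread; ++i) {
        // Removing this guard introduces the race Helgrind/DRD look for.
        std::lock_guard<std::mutex> guard(mutex);
        ++counter;
      }
    });
  }
  // Wait for all workers before reading the final value.
  for (auto &worker : workers) {
    worker.join();
  }
  return counter;
}
```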
Before making a commit in your SCM system, make sure your program is memchecked and passes all unit tests.
If you prefer an IDE over a setup with just an editor and a command line, Eclipse with CDT is a good (but heavyweight) cross-platform IDE. KDevelop is also a good choice for KDE users. Both have the advantage that you can easily import make-based projects and build your programs from within the IDE using make. C++ guru (and Microsoft employee) Herb Sutter recommends the free version of Microsoft Visual C++ for Windows users.
Unit tests help to improve the correctness of your code and prevent regressions. googletest is a great unit-testing framework for C++. Use it to write test cases for each class/algorithm that actually try to break it (this is easier if you write your unit tests before you implement the code itself). Include corner cases and try to find off-by-one errors. Use realistic parameters, e.g., dozens of threads and millions of elements in your data structures. bcov is a code coverage analysis tool that tells you how much of your code is covered by your unit tests. Using a code coverage tool is probably a case of using a sledgehammer to crack a nut for your (smallish) project, but I found it worth mentioning... especially since my boss wrote it.
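Below is a sketch of the kind of corner-case testing described above. `lower_bound_index` is a hypothetical function under test; to keep the example free of dependencies it uses plain `assert`, whereas in Peloton you would express the same checks with googletest's `TEST` and `EXPECT_EQ` macros.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical function under test: returns the first position in the sorted
// vector `values` whose element is >= `key`, or values.size() if none is.
std::size_t lower_bound_index(const std::vector<int> &values, int key) {
  std::size_t low = 0;
  std::size_t high = values.size();  // search the half-open range [low, high)
  while (low < high) {
    std::size_t mid = low + (high - low) / 2;  // avoids overflow of low + high
    if (values[mid] < key) {
      low = mid + 1;
    } else {
      high = mid;
    }
  }
  return low;
}

// Corner cases that tend to expose off-by-one errors.
void test_lower_bound_index() {
  assert(lower_bound_index({}, 5) == 0);         // empty input
  assert(lower_bound_index({1, 3, 5}, 0) == 0);  // before the first element
  assert(lower_bound_index({1, 3, 5}, 4) == 2);  // between two elements
  assert(lower_bound_index({1, 3, 5}, 5) == 2);  // exactly the last element
  assert(lower_bound_index({1, 3, 5}, 6) == 3);  // past the end
  assert(lower_bound_index({2, 2, 2}, 2) == 0);  // duplicates: first match
}
```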
Profilers help you to understand the performance of your program (and the environment it is running in). As profiling is probably not required for assignments/projects (but may help!), I keep this section short. To put it bluntly: Avoid oprofile (and gprof), prefer perf. It is easy to use perf and it helps you understand where you spend your CPU cycles, how many cache misses you produce and much more. If you qualify for a student license, you can get Intel® VTune™ for free. Check it out, it's complex but amazing.
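As an example of what perf can reveal, the two hypothetical functions below compute the same sum over an `n` x `n` matrix stored in row-major order but traverse memory in different orders. Calling each from a small driver with a large `n` (say, a few thousand) and running it under `perf stat -e cache-misses` should show noticeably more cache misses for the column-by-column version; the exact numbers depend on your hardware.

```cpp
#include <cstddef>
#include <vector>

// Visits the row-major matrix row by row: consecutive accesses touch
// adjacent memory, so each fetched cache line is fully used.
long sum_row_major(const std::vector<int> &matrix, std::size_t n) {
  long sum = 0;
  for (std::size_t row = 0; row < n; ++row) {
    for (std::size_t col = 0; col < n; ++col) {
      sum += matrix[row * n + col];
    }
  }
  return sum;
}

// Same result, but column by column: each access jumps n * sizeof(int)
// bytes, which for large n causes far more cache misses.
long sum_col_major(const std::vector<int> &matrix, std::size_t n) {
  long sum = 0;
  for (std::size_t col = 0; col < n; ++col) {
    for (std::size_t row = 0; row < n; ++row) {
      sum += matrix[row * n + col];
    }
  }
  return sum;
}
```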
We use Eclipse for development. First, install the EGit plug-in that enables Eclipse to work with Git repositories.
- Create a new workspace for Eclipse (if necessary).
- Select File -> Import -> Git -> Projects from Git.
- When the next panel comes up, click the Clone button. In the next window, enter the path to the GitHub repository into the URI field under Location: git@github.com:cmu-db/peloton.git. Then, click Next.
- In the next panel you can select which branches you wish to clone from the remote repository. You most likely only need to clone the master branch. Then, click Next.
- Select the location on your local machine where you wish to store your cloned repository. You can leave the other defaults. Then, click Finish.
- It will now begin to pull down the repository. Once it’s finished, select peloton. Then, click Next.
- In the next panel, select the Import Existing Projects option at the top. Then, click Next.
- In the next page, select the peloton checkbox. Click Finish.

Have fun hacking!
These tools are installed by the Vagrantfile:
- eclipse-4.4 Eclipse Luna
- tmux Terminal Multiplexer
- zsh Z Shell