This repository was archived by the owner on Sep 27, 2019. It is now read-only.

Environment

Joy Arulraj edited this page Dec 14, 2015 · 17 revisions

Prologue

This page contains information intended to help you start hacking on Peloton. It covers a wide set of topics related to software development, such as the development environment, debugging, and version control. We are only attempting to provide biased suggestions that we think are invaluable for low-level software development. Feel free to deviate from them if you are comfortable with other tools that solve similar problems. Special thanks to Florian Funke. This guide is heavily inspired by a similar one written by him.

Operating System

We prefer the Linux OS for development and testing. You can use any other OS for development as long as your modified source code can be built and run on Linux.

If you are a Mac OS or Windows user, we encourage you to consider using VirtualBox and Vagrant. VirtualBox is a hypervisor for x86 computers, and Vagrant is a tool for sharing easy-to-configure virtual development environments. More information is available here.

Language

We are developing Peloton in C++. In particular, we are following the C++11 standard. C++ provides a lot of leeway in DBMS development compared to other high-level languages. For instance, it supports both manual and automated memory management, varied styles of programming, stronger type checking, different kinds of polymorphism etc.

Here's a list of useful references:

  1. CPP Reference is an online reference of the powerful Standard Template Library (STL).
  2. C++ FAQ covers a lot of topics.

Here's a list of C++11 features that you might want to make use of:

  1. auto type inference
  2. Range-based for loops
  3. Smart pointers, in particular unique_ptr.
  4. STL data structures, such as unordered_set, unordered_map, etc.
  5. Threads, deleted member functions, lambdas, etc.
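
To make the list above concrete, here is a minimal sketch that combines auto, a range-based for loop, unique_ptr, unordered_map, and a lambda. The names WordStats and build_word_stats are hypothetical and exist only for illustration; they are not part of Peloton.

```cpp
#include <algorithm>
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical type for illustration only; not part of Peloton.
struct WordStats {
  std::unordered_map<std::string, std::size_t> counts;  // word -> occurrences
  std::size_t longest_length = 0;  // length of the longest word seen
};

// Builds word statistics and hands ownership to the caller via unique_ptr.
std::unique_ptr<WordStats> build_word_stats(
    const std::vector<std::string> &words) {
  // C++11 has no std::make_unique (it arrived in C++14), so use new directly.
  std::unique_ptr<WordStats> stats(new WordStats());

  // Range-based for loop with auto type inference.
  for (const auto &word : words) {
    ++stats->counts[word];
  }

  // Lambda used as a comparator for std::max_element.
  auto by_length = [](const std::string &a, const std::string &b) {
    return a.size() < b.size();
  };
  auto longest = std::max_element(words.begin(), words.end(), by_length);
  if (longest != words.end()) {
    stats->longest_length = longest->size();
  }
  return stats;
}
```

The returned unique_ptr releases the WordStats object automatically when it goes out of scope, so the caller never writes a matching delete.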

Comments, Formatting, and Libraries

Please comment your code. Comment all the class definitions, non-trivial member functions and variables, and all the steps in your algorithms. Here's an example.

We follow the Google C++ style guide. As they mention in that guide, these rules exist to keep the code base manageable while still allowing coders to use C++ language features productively. Make sure that you follow the naming rules. For instance, use class UpperCaseCamelCase for type names, and int lower_case_with_underscores for variable/method/function names.
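
As a small illustration of the commenting and naming rules above, here is a sketch of a commented class. BoundedBuffer and its members are hypothetical names chosen for this example, and the trailing underscore on data members is borrowed from the Google style guide rather than stated on this page.

```cpp
#include <cstddef>
#include <vector>

// Fixed-capacity buffer of integer values.
// Hypothetical class for illustration only; not part of Peloton.
class BoundedBuffer {
 public:
  // Creates an empty buffer that holds at most max_size values.
  explicit BoundedBuffer(std::size_t max_size) : max_size_(max_size) {}

  // Buffers are not copyable (C++11 deleted member functions).
  BoundedBuffer(const BoundedBuffer &) = delete;
  BoundedBuffer &operator=(const BoundedBuffer &) = delete;

  // Appends value if there is room left.
  // Returns false, without modifying the buffer, when it is already full.
  bool try_append(int value) {
    if (values_.size() >= max_size_) return false;
    values_.push_back(value);
    return true;
  }

  // Returns the number of values currently stored.
  std::size_t size() const { return values_.size(); }

 private:
  std::size_t max_size_;     // upper bound on the number of stored values
  std::vector<int> values_;  // stored values, in insertion order
};
```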

Please refrain from using any libraries other than the STL (and googletest for unit testing) without contacting me.

Directory Structure

Organize source code files into relevant folders based on their functionality. Separate binary files from source files, and production code from testing code.

In general, the src directory contains all the production code, the test directory contains all the testing code, and the build directory contains all the built binary files. Within the src directory, the src/backend directory contains all the Peloton files, while the src/postgres directory contains all the Postgres files. Within the Peloton directory, src/backend/storage contains files related to our storage engine, while src/backend/index contains files related to the auxiliary access structures.

Source Code Management

We exclusively use git for source code management, and GitHub for collaboration. Here's a simple guide for using it. If you have never used a tool like git, I am sure that you will soon wonder how you managed to live without it. If you want to learn more, here's a free book.

Please update or add a .gitignore file to exclude unwanted files (e.g. the build directory with binary files, backup/temporary files of your editor/IDE, large files containing test data, etc.) from the repository.

Build System

We use autotools for building Peloton. In particular, we use automake for automatically generating Makefile.in files. You will probably only need to add file names to existing automake files (with extension .am), or create an automake file similar to an already existing one in your project.

Each module in Peloton has its own automake file. For instance, here's the Makefile.am within the storage (src/backend/storage) module. If we want to add another file foo.cpp within the storage module, then we should add it to storage_FILES in the Makefile.am. On the other hand, if we want to add a new baz module, then we can create a new Makefile.am similar to the one in the storage module, and store it in the src/backend/baz directory. Then, we can include it in the higher-level Makefile.am within the src directory.

In case you are curious about autotools, here's a short tutorial.

Compiler

GCC's g++ is popular and proven, while LLVM's clang++ is also a great, free C++ compiler and a promising challenger to g++ (its comprehensible error and warning messages are especially compelling). For both, I recommend the latest version, in particular because C++11 support is constantly being improved.

Useful compiler flags:

-std=c++11/-std=c++0x: Enable (experimental) C++11 (C++0x) support
-g: Generate debug symbols
-O0: Disable optimizations to allow for more reliable debugging (update: use -Og, if supported by your compiler). Use -O3 when running benchmarks.
-Wall (GCC) or -Weverything (Clang): Generate helpful warnings. Do not ignore them! In fact, force yourself to deal with warnings by turning them into errors with -Werror.
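
As a rough sketch of how these flags fit together, the comment at the top of the following file shows two possible g++ command lines, and the loop shows a kind of mistake that -Wall reports. The file name example.cpp and the function count_positive are hypothetical.

```cpp
// Possible command lines (the file name example.cpp is hypothetical):
//   debugging:     g++ -std=c++11 -g -O0 -Wall -Werror -c example.cpp
//   benchmarking:  g++ -std=c++11 -O3 -Wall -Werror -c example.cpp
#include <cstddef>
#include <vector>

// Counts the strictly positive entries of values.
std::size_t count_positive(const std::vector<int> &values) {
  std::size_t count = 0;
  // Declaring i as int instead would compare a signed with an unsigned
  // value. g++ -Wall reports this in C++ (-Wsign-compare), and -Werror
  // turns the warning into a build failure.
  for (std::size_t i = 0; i < values.size(); ++i) {
    if (values[i] > 0) ++count;
  }
  return count;
}
```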

Debugging Tools

Use a debugger to find bugs; don't rely on debug output. Good debuggers: GDB (the GNU debugger) and LLVM's LLDB. Most IDEs have a graphical debugger front-end, but the command line can already be very helpful when your program crashes. There's a curses-based interface for GDB, called cgdb, that I can recommend. Little known fact: GDB now supports (limited) reverse debugging.

If your program behaves in a nondeterministic or otherwise mysterious way, Valgrind is your friend. Valgrind's memcheck finds illegal memory accesses, reads of uninitialized memory, and much more. The option --db-attach=yes starts the debugger when an error is found (it was removed in Valgrind 3.11; newer releases offer --vgdb-error instead). Check out this blog post on the interaction between GDB and Valgrind. Valgrind's Helgrind and DRD can help you find thread-related problems. This short blog post gives some helpful advice on how to detect the cause of a deadlock. A significantly faster and only marginally less thorough alternative to Valgrind's memcheck is AddressSanitizer.
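
To give a feel for the kind of bug these tools catch, here is a minimal sketch of a function with a latent out-of-bounds read. The function sum_prefix and the file name example.cpp are hypothetical. The code is well-defined as long as count does not exceed capacity, and reads past the heap allocation otherwise.

```cpp
// Possible ways to check a program that calls sum_prefix from main
// (the file name example.cpp is hypothetical):
//   g++ -std=c++11 -g -O0 example.cpp && valgrind ./a.out
//   g++ -std=c++11 -g -O0 -fsanitize=address example.cpp && ./a.out
#include <cstddef>
#include <memory>

// Allocates capacity ints set to 1 and sums the first count of them.
// Latent bug: nothing checks that count <= capacity.
int sum_prefix(std::size_t capacity, std::size_t count) {
  std::unique_ptr<int[]> buffer(new int[capacity]);
  for (std::size_t i = 0; i < capacity; ++i) {
    buffer[i] = 1;
  }
  int sum = 0;
  for (std::size_t i = 0; i < count; ++i) {
    sum += buffer[i];  // out-of-bounds read once i >= capacity
  }
  return sum;
}
```

A call such as sum_prefix(4, 4) is fine. A call such as sum_prefix(4, 5) may appear to work in a normal run, but memcheck reports it as an invalid read and AddressSanitizer as a heap-buffer-overflow, each with the offending source line when the program is built with -g.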

Before making a commit in your SCM system, make sure your program is memchecked and passes all unit tests.

Integrated Development Environments (IDEs)

If you prefer an IDE over a setup with just an editor and a command line, Eclipse with CDT is a good (but heavyweight) cross-platform IDE. KDevelop is also a good choice for KDE users. Both have the advantage that you can easily import make-based projects and build your programs from within the IDE using make. C++ guru (and Microsoft employee) Herb Sutter recommends the free version of Microsoft Visual C++ for Windows users.

Testing

Unit tests help to improve the correctness of your code and prevent regressions. googletest is a great unit-testing framework for C++. Use it to write test cases for each class/algorithm that actually try to break it (this is easier if you write your unit tests before you implement the code itself). Include corner cases and try to find off-by-one errors. Use realistic parameters, e.g. dozens of threads and millions of elements in your data structures. bcov is a code coverage analysis tool that tells you how much of your code is covered by your unit tests. Using a code coverage tool is probably a case of using a sledgehammer to crack a nut for your (smallish) project, but I found it worth mentioning... especially since my boss wrote it.
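
As a sketch of what corner cases and off-by-one checks can look like, here is a small binary search together with the inputs most likely to break it. Both function names are hypothetical. To keep the sketch free of dependencies it uses plain comparisons; in a googletest suite each line would typically become its own EXPECT_EQ inside a TEST.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of the first element that is not less than key,
// or sorted_values.size() if every element is less than key.
std::size_t lower_bound_index(const std::vector<int> &sorted_values, int key) {
  std::size_t low = 0;
  std::size_t high = sorted_values.size();  // one past the last candidate
  while (low < high) {
    const std::size_t mid = low + (high - low) / 2;
    if (sorted_values[mid] < key) {
      low = mid + 1;  // the answer lies strictly to the right of mid
    } else {
      high = mid;  // mid itself may still be the answer
    }
  }
  return low;
}

// Exercises the inputs where off-by-one errors usually hide.
bool corner_cases_pass() {
  const std::vector<int> empty;
  const std::vector<int> values = {1, 3, 3, 7};
  return lower_bound_index(empty, 5) == 0 &&   // empty input
         lower_bound_index(values, 0) == 0 &&  // key below every element
         lower_bound_index(values, 1) == 0 &&  // key equals first element
         lower_bound_index(values, 3) == 1 &&  // duplicates: first match
         lower_bound_index(values, 7) == 3 &&  // key equals last element
         lower_bound_index(values, 8) == 4;    // key above every element
}
```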

Profiling

Profilers help you to understand the performance of your program (and the environment it is running in). As profiling is probably not required for assignments/projects (but may help!), I keep this section short. To put it bluntly: Avoid oprofile (and gprof), prefer perf. It is easy to use perf and it helps you understand where you spend your CPU cycles, how many cache misses you produce and much more. If you qualify for a student license, you can get Intel® VTune™ for free. Check it out, it's complex but amazing.
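
As a toy workload for trying out perf, the following sketch sums the same matrix in two traversal orders. The function names are hypothetical. Assuming a matrix much larger than the CPU caches, the column-by-column version typically shows more cache misses when each variant is run under something like perf stat -e cache-misses, though the exact numbers depend on your hardware.

```cpp
#include <cstddef>
#include <vector>

// Sums a dimension x dimension matrix stored row by row in one vector,
// visiting the elements in the order in which they lie in memory.
long sum_row_major(const std::vector<int> &matrix, std::size_t dimension) {
  long sum = 0;
  for (std::size_t row = 0; row < dimension; ++row) {
    for (std::size_t column = 0; column < dimension; ++column) {
      sum += matrix[row * dimension + column];
    }
  }
  return sum;
}

// Sums the same matrix column by column, so consecutive accesses are
// dimension elements apart in memory.
long sum_column_major(const std::vector<int> &matrix, std::size_t dimension) {
  long sum = 0;
  for (std::size_t column = 0; column < dimension; ++column) {
    for (std::size_t row = 0; row < dimension; ++row) {
      sum += matrix[row * dimension + column];
    }
  }
  return sum;
}
```

Both functions return the same result; only the memory access pattern differs, which is what makes the pair convenient for comparing profiler output.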

Eclipse Setup

We use Eclipse for development. First, install the EGit plug-in that enables Eclipse to work with Git repositories.

  1. Create a new workspace for Eclipse (if necessary).
  2. Select File -> Import -> Git -> Projects from Git.
  3. When the next panel comes up, click the Clone button. In the next window, enter the path to the GitHub repository into the URI field at Location: git@github.com:cmu-db/peloton.git. Then, click Next.
  4. In the next panel you can select which branches you wish to clone from the remote repository. You most likely only need to clone the master branch. Then, click Next.
  5. Select the location on your local machine where you wish to store your cloned repository. You can leave the other defaults. Then, click Finish.
  6. It will now begin to pull down the repository. Once it’s finished, select peloton. Then, click Next.
  7. In the next panel, select the Import Existing Projects option at the top. Then, click Next.
  8. On the next page, select the peloton checkbox. Click Finish. Have fun hacking!

These tools are installed by the Vagrantfile.
