A relational database system in C++. Indexing, join, select, project. Supports - int, float and varchar data types
- The basic unit of data that is written/read from disk is a page of size 4096 bytes (4 KB).
- All records are stored in an unordered heap-file structure.
- Every record has a unique record id, which is a tuple consisting of the page number and the serial no. of the record in this page.
- Field access occurs in O(1) unit time.
The page structure is as follows:
[record1, record2, ..., record N, xx-freespace-xx, slot-table]
The record structure is as follows:
[version, null-byte array, field-offsets array, field1, field2, field3, ...]
For int and float, 4 bytes are used. For varchar, 4 bytes are used to store length of characters, then 1-byte per character.
Insert: The db-file is searched for a page, such that it has enough space to accomodate the record.
Delete: The corresponding record is deleted from the page. Subsequent records are moved to "the left", so that there aren't any gaps between records.
Update: First it is checked if the same page can accomadate the updated record. If yes, then the record is modified in place, and subsequent records are moved to "the left" or to "the right". If there is not enough space, the record is moved to another page and a tombstone is left in the original location. This tombstone indicates the RID: pagenumber, slotnum of the record in a new page.
B+ Tree indexes can be created on any attribute(s), for any data-type.
The non-leaf nodes of the tree contain 'pointers' to go to the appropriate leaf nodes. The leaf nodes contain the actual key values and the corresponding RIDs for that record.
So, record retrieval works in the following way:
P.S. Range selects/queries are also supported
The following diagram depicts the indexing structure:
These operators are applied 'on-the-fly' to records.