
On disk file format

vmx edited this page Mar 27, 2012 · 19 revisions

This wiki page is a draft of the new GeoCouch on-disk file format.

Blocks and Chunks

Same as for couchstore: https://github.com/couchbaselabs/couchstore/wiki/Format

File Header

The file header is prefixed with a 32-bit length and a hash, like other data chunks. Unlike other chunks, the length of a header includes the length of the hash.

Values in the file header

  • 8 bits -- File format version (Currently 3)
  • 48 bits -- Update sequence number. This is the sequence number new updates should start at.
  • 48 bits -- Purge sequence number.
  • 48 bits -- Purged documents pointer
  • 16 bits -- Size of R-tree root
  • The R-tree root, which is a node pointer as described in the "B-trees" section of the couchstore file format.
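
The field list above can be sketched as a small parser. This is only an illustration under stated assumptions: the fields are assumed to be packed back to back in big-endian order (as in couchstore), the input is assumed to be the header body with the length and hash prefix already stripped, and the function and key names are hypothetical.

```python
# Fixed-size part of the header: (8 + 48 + 48 + 48 + 16) bits = 21 bytes.
HEADER_FIXED_SIZE = 21


def parse_header(body: bytes) -> dict:
    """Parse a header body (length/hash prefix already removed).

    Assumes big-endian, tightly packed fields; names are hypothetical.
    """
    if len(body) < HEADER_FIXED_SIZE:
        raise ValueError("header body too short")
    root_size = int.from_bytes(body[19:21], "big")
    root = body[HEADER_FIXED_SIZE:HEADER_FIXED_SIZE + root_size]
    if len(root) != root_size:
        raise ValueError("truncated R-tree root")
    return {
        "version": body[0],                                   # 8 bits
        "update_seq": int.from_bytes(body[1:7], "big"),       # 48 bits
        "purge_seq": int.from_bytes(body[7:13], "big"),       # 48 bits
        "purged_docs_ptr": int.from_bytes(body[13:19], "big"),  # 48 bits
        "root": root,  # raw node pointer, left undecoded here
    }
```

The R-tree root is returned as raw bytes because its layout is defined by the couchstore node pointer format, not by this page.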

R-tree

The K/V and K/P nodes have the same format as in the B-trees (see https://github.com/couchbaselabs/couchstore/wiki/Format for details).

The Spatial index

The keys in this R-tree are raw blobs containing whatever comes from Erlang: a list of keys, one for every dimension.

Each value consists of:

  • 12 bits -- Size of the document ID
  • 28 bits -- Size of the geometry
  • 48 bits -- Position of the geometry content on disk
  • 32 bits -- Size of the document data
  • 48 bits -- Position of the document content on disk
  • Document ID
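
As a rough sketch of the value layout above: the first two fields (12 + 28 bits) share 5 bytes, so they have to be split with shifts and masks. The code assumes big-endian, tightly packed fields, which this page does not state explicitly; the function and key names are hypothetical.

```python
# Fixed-size part of a value: (12 + 28 + 48 + 32 + 48) bits = 21 bytes.
VALUE_FIXED_SIZE = 21


def pack_value(doc_id: bytes, geom_size: int, geom_pos: int,
               doc_size: int, doc_pos: int) -> bytes:
    """Pack a value; assumes big-endian, tightly packed fields."""
    if len(doc_id) >= 1 << 12 or geom_size >= 1 << 28:
        raise ValueError("size does not fit into its bit field")
    sizes = (len(doc_id) << 28) | geom_size  # 12 bits + 28 bits = 40 bits
    return (sizes.to_bytes(5, "big")
            + geom_pos.to_bytes(6, "big")
            + doc_size.to_bytes(4, "big")
            + doc_pos.to_bytes(6, "big")
            + doc_id)


def parse_value(value: bytes) -> dict:
    """Inverse of pack_value, under the same assumptions."""
    if len(value) < VALUE_FIXED_SIZE:
        raise ValueError("value too short")
    sizes = int.from_bytes(value[0:5], "big")
    id_size = sizes >> 28  # upper 12 bits
    doc_id = value[VALUE_FIXED_SIZE:VALUE_FIXED_SIZE + id_size]
    if len(doc_id) != id_size:
        raise ValueError("truncated document ID")
    return {
        "geom_size": sizes & ((1 << 28) - 1),  # lower 28 bits
        "geom_pos": int.from_bytes(value[5:11], "big"),
        "doc_size": int.from_bytes(value[11:15], "big"),
        "doc_pos": int.from_bytes(value[15:21], "big"),
        "doc_id": doc_id,
    }
```

With these widths a document ID can be at most 4095 bytes long and a geometry just under 256 MiB.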

The content type will always be JSON, as we emitted JSON from the emit() function.

The revision metadata is not needed.

The reduce value in this R-tree will be the bitmask for the superstar index and the value of the reduce function.

The format of the geometry still needs to be decided. Since calculations need to be made on it, Well-Known Binary (see Section 3.3 of the official standard or in more readable HTML) would make sense, as it is supported by every major geo library.
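
To give an idea of what Well-Known Binary looks like, here is a minimal sketch that encodes and decodes a 2D point. It covers only the Point type in little-endian byte order; the function names are hypothetical, and nothing here implies that this is the format GeoCouch will end up using.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte order marker (NDR)
WKB_POINT = 1          # geometry type code for Point


def encode_point(x: float, y: float) -> bytes:
    """Encode a 2D point as little-endian WKB (21 bytes)."""
    # 1 byte order marker, uint32 geometry type, two float64 coordinates.
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def decode_point(wkb: bytes) -> tuple:
    """Decode a 2D WKB point in either byte order."""
    order = "<" if wkb[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(order + "Idd", wkb[1:21])
    if geom_type != WKB_POINT:
        raise ValueError("not a WKB point")
    return (x, y)
```

A point therefore takes 21 bytes, which fits comfortably within the 28 bits reserved for the size of the geometry.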
