support hash-based object comparisons? #4

@perezd

So, as of CouchDB 0.10, it's known that document revs are a monotonically incrementing value plus a deterministic MD5 hash of the document at that moment in time. (more info: http://jchrisa.net/drl/_design/sofa/_list/post/post-page?startkey=%5B%22Deep-Couch-Deterministic-Revs-for-Idempotent-PUTs%22%5D)

The way Txn currently works is that it GETs the id, then uses obj_diff to decide whether we need to do a PUT. While this is a fine approach, its cost in I/O and computation is variable.

I'd like to propose an alternative approach, using HEAD. Here's how it would work:

  1. Txn does a HEAD request for a doc id. On success, the ETag header carries the rev value (1-122ade142b..)
  2. we strip the monotonic prefix (the 1-), and what we have left is the deterministic content hash
  3. we define a node.js implementation of the CouchDB rev hash function, and pass our candidate object into it to get a hash value
  4. we compare the fetched hash with our locally computed hash
  5. if the hashes match, there is nothing to do. If the hashes are different, we could do one of 2 things: either do a full GET and obj_diff (like normal) then a PUT, or simply do a PUT, because we already know it's different.
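
To make steps 1, 2, 4 and 5 concrete, here is a rough node.js sketch of the comparison logic. All of the function names (`revFromEtag`, `splitRev`, `needsPut`, `placeholderRevHash`) are hypothetical and not part of Txn. The hash function is passed in as a parameter because step 3 is the hard part: `placeholderRevHash` below is just an MD5 over `JSON.stringify` so the sketch runs, and it is not CouchDB's actual rev hash algorithm, so it should not be expected to match a real server's revs. The sketch also assumes the ETag arrives wrapped in double quotes.

```javascript
const crypto = require('crypto');

// Step 1: pull the rev out of an ETag header value such as "1-122ade..."
// (drops an optional weak-validator prefix and the surrounding quotes).
function revFromEtag(etag) {
  return etag.replace(/^W\//, '').replace(/^"|"$/g, '');
}

// Step 2: split a rev into its monotonic prefix and its content hash.
function splitRev(rev) {
  const i = rev.indexOf('-');
  if (i < 1) throw new Error('unexpected rev format: ' + rev);
  return { seq: parseInt(rev.slice(0, i), 10), hash: rev.slice(i + 1) };
}

// Step 3 (PLACEHOLDER, assumption): stands in for a real port of the
// CouchDB rev hash function. This is NOT CouchDB's algorithm.
function placeholderRevHash(doc) {
  return crypto.createHash('md5').update(JSON.stringify(doc)).digest('hex');
}

// Steps 4 and 5: compare the fetched hash with the locally computed one.
// Returns true when the hashes differ, i.e. a PUT (or GET + obj_diff) is needed.
function needsPut(etag, candidate, hashFn) {
  const remoteHash = splitRev(revFromEtag(etag)).hash;
  return remoteHash !== hashFn(candidate);
}
```

With something like this, the HEAD response's `etag` header would be handed to `needsPut` along with the candidate object, and the existing GET + obj_diff path would only run when it returns true.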

The benefit here is that we can rely on a much simpler diff comparator and use a lot less I/O during the comparison step.

Thoughts?
