Skip to content

How much could we compress the list of trust anchors? #64

@bwesterb

Description

@bwesterb

I do not think clients should send a lot of trust anchors. Nor do I think we should complicate the standard with a compression mechanism like this. It's good to think about though.

Coding subsets of integers

Theoretical limit

For the moment, let's simplify and assume trust anchors are numbers instead of OIDs. Say there are k=200 anchors with maximum value N=62,203 (number of PENs today). There are N choose k possible subsets of anchors to communicate. Thus, we cannot expect to code every subset with less than lg N choose k bits. We can approximate this well using Stirling's approximation: 243 bytes.

Rice coding

We can get quite close to this limit using a very simple method: Rice coding of differences. For each delta we compute q = floor(d / 2^b) and r = d mod 2^b, where b = round(-lg(-lg(1-k/N))) = 8. q is encoded in unary and r in binary, so 600 would be encoded as 0b11001011000. The 110 encodes q=2; the remainder is the remainder r. The average size using Rice coding for random subsets is about 244.3 bytes with a 95th percentile of 244.8 bytes.

Rice coding doesn't take advantage of big gaps. If we want to code {N-200, N-199, ..., N-1} that takes 255.25 bytes.

Huffman bitlength

A different approach is to write each delta in two parts: first code its bitlength, and then to write the delta (without its most significant bit). For the bitlengths we can use a Huffman code, which can be compressed efficiently using canonical trees and delta coding of the code lengths as in bzip2. (Here is an implementation.)

For random subsets this performs worse than Rice coding: 253 bytes on average with a 95th percentile of 255 bytes. However, it encodes {N-200, N-199, ..., N-1} in only 50 bytes, and there is room for improvement.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions