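Using token offsets makes us sensitive to the tokenization algorithm: switching tokenizers, or changing a tokenizer's rules, shifts the indices even when the underlying text is unchanged. Better to use character offsets instead, if that's reasonable to do.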
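To illustrate the concern, here is a minimal sketch, not taken from the code under review. The two regex tokenizers and the helper `char_span_to_token_span` are hypothetical stand-ins. It shows the same span getting different token indices under each tokenizer while its character offsets stay fixed, and how token indices could still be derived on demand.

```python
import re


def whitespace_tokens(text):
    """Hypothetical tokenizer A: split on whitespace only."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]


def word_punct_tokens(text):
    """Hypothetical tokenizer B: separate word characters from punctuation."""
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]


def char_span_to_token_span(tokens, start, end):
    """Map a character span [start, end) to the half-open range of token
    indices that overlap it, or None if no token overlaps."""
    covered = [i for i, (_, s, e) in enumerate(tokens) if s < end and e > start]
    return (covered[0], covered[-1] + 1) if covered else None


text = "Dr. Smith's lab, in Boston."
start, end = 20, 26  # character offsets of "Boston"; independent of tokenizer

print(text[start:end])
print(char_span_to_token_span(whitespace_tokens(text), start, end))
print(char_span_to_token_span(word_punct_tokens(text), start, end))
```

The stored annotation is just `(20, 26)`. The token ranges printed for the two tokenizers differ, which is the drift we would inherit by storing token offsets directly. One caveat with this approach: when a character span ends inside a token, as with tokenizer A here where "Boston." is a single token, the derived token range covers slightly more text than the original span.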