perhaps:
- rip out all the headers and put them in a separate document
- put together a tag:dataset mapping document