FYI, the download script used in the huggingface repo is broken (at least for me on Python 3.11.x): `AttributeError: 'DownloadConfig' object has no attribute 'use_auth_token'`. Manually grabbing the jsonl and using the DOCCI_AAR_URL_PATTERN works though.
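
In case it helps anyone, here is a rough sketch of the first half of that workaround: reading the manually downloaded jsonl without going through the loader script. `load_jsonl` is just a helper name I made up, and the path in the usage note is a placeholder. I'm not assuming anything about the field names in the records or about how DOCCI_AAR_URL_PATTERN is filled in, so check the script itself for that part.

```python
import json
from pathlib import Path


def load_jsonl(path):
    """Read a JSON Lines file into a list of dicts (hypothetical helper)."""
    records = []
    with Path(path).open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # skip blank lines
                records.append(json.loads(line))
    return records
```

Usage would be something like `records = load_jsonl("path/to/downloaded.jsonl")`, after which each record is a plain dict you can inspect.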