Description
Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, the utility of building links rather than storing small files directly in GitHub is questionable. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building git-annex links to every file took nearly twice as long as building links only to files larger than a cut-off of 200 kB and downloading the rest directly. The 200 kB cut-off was estimated by manually examining some subdirectories.
Do we want to consider size-based or other criteria for deciding which files get git-annex links (such as storing all text files directly)?
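To make the size-based option concrete, here is a minimal sketch of what such a rule could look like. It is not how the ingestion code currently works. The names `should_annex` and `partition` are hypothetical, and treating 200 kB as 200,000 bytes is an assumption (the test above did not specify whether kB means 1000 or 1024 bytes).

```python
from pathlib import PurePosixPath

# Assumption: 200 kB is read as 200 * 1000 bytes; adjust if 1024 was meant.
SIZE_CUTOFF_BYTES = 200 * 1000

# The two files the current policy already keeps out of git-annex.
ALWAYS_DIRECT = {"README.md", "DATS.json"}


def should_annex(path, size_bytes, cutoff=SIZE_CUTOFF_BYTES):
    """Return True if the file should get a git-annex link (hypothetical rule)."""
    if PurePosixPath(path).name in ALWAYS_DIRECT:
        return False
    # Only files strictly larger than the cut-off are linked.
    return size_bytes > cutoff


def partition(files, cutoff=SIZE_CUTOFF_BYTES):
    """Split a {path: size_in_bytes} mapping into (annexed, direct) path lists."""
    annexed, direct = [], []
    for path, size in sorted(files.items()):
        target = annexed if should_annex(path, size, cutoff) else direct
        target.append(path)
    return annexed, direct
```

A criterion such as "store all text files directly" could be added as another early return in `should_annex`, for example by checking the file suffix against an agreed list.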