
Size or other cut-off points for building git-annex links #674

@emmetaobrien

Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, the utility of building links rather than just storing small files directly in GitHub is questionable. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building git-annex links to every file took nearly twice as long as building links only to files larger than a 200 kB cut-off (estimated by manual examination of some subdirectories) and downloading the rest directly.

Do we want to consider size-based or other criteria (such as storing all text files directly) for deciding which files get git-annex links?
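
To make the size-based option concrete, here is a rough sketch of how files could be split into "annex" and "store directly" groups. It is not our ingestion code: the function names, the constants, and the reading of 200 kB as 200,000 bytes are all assumptions made for illustration.

```python
from pathlib import Path

# Assumption: "200 kB" is read as 200,000 bytes; the real cut-off is still open.
SIZE_CUTOFF_BYTES = 200 * 1000
# The two files that are already excluded from git-annex links today.
ALWAYS_DIRECT = {"README.md", "DATS.json"}


def needs_annex_link(path: Path, cutoff: int = SIZE_CUTOFF_BYTES) -> bool:
    """Return True if the file would get a git-annex link under a size rule."""
    if path.name in ALWAYS_DIRECT:
        return False
    return path.stat().st_size > cutoff


def partition_dataset(root: Path, cutoff: int = SIZE_CUTOFF_BYTES):
    """Split every regular file under root into (annexed, direct) lists."""
    annexed, direct = [], []
    for path in sorted(root.rglob("*")):
        # Skip directories and anything inside the repository's .git folder.
        if not path.is_file() or ".git" in path.relative_to(root).parts:
            continue
        (annexed if needs_annex_link(path, cutoff) else direct).append(path)
    return annexed, direct
```

Calling `partition_dataset(Path("some_dataset"))` on a local copy would show how many files fall on each side of a candidate cut-off before committing to one. git-annex itself also has an `annex.largefiles` setting that accepts size expressions such as `largerthan=`; whether that fits our ingestion workflow has not been checked here.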
