Size or other cut-off points for building git-annex links #674

Right now, our general policy when ingesting new datasets is to build a git-annex link to every file in a dataset by default, with a couple of specific exceptions (README.md and DATS.json). However, it is questionable whether building links is worthwhile for small files, which could simply be stored directly in GitHub. In tests with the microstructure_informed_connectomics dataset, which contains ~11,300 files, building a git-annex link to every file took nearly twice as long as building links only to files larger than a 200 KB cut-off and downloading the rest directly. (The 200 KB cut-off was estimated by manually examining some subdirectories.)
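To make the size-based alternative concrete, here is a minimal sketch of how an ingestion step might split a dataset into "annex" and "store directly" lists. It is hypothetical: the function name `partition_by_size` and the reading of "200 KB" as 200 × 1024 bytes are assumptions, and only the README.md/DATS.json exceptions come from the current policy.

```python
from pathlib import Path

# Assumed reading of the "200 KB" cut-off (could equally be 200 * 1000).
SIZE_CUTOFF_BYTES = 200 * 1024

# Files the current policy already stores directly, regardless of size.
ALWAYS_DIRECT = {"README.md", "DATS.json"}


def partition_by_size(root, cutoff=SIZE_CUTOFF_BYTES, always_direct=ALWAYS_DIRECT):
    """Hypothetical helper: split files under `root` into (annex, direct).

    Files strictly larger than `cutoff` bytes would get a git-annex link;
    smaller files and the named exceptions would be stored directly.
    Returns two sorted lists of POSIX-style paths relative to `root`.
    """
    root = Path(root)
    annex, direct = [], []
    for path in root.rglob("*"):
        rel_parts = path.relative_to(root).parts
        # Skip git's own metadata if `root` is already a repository.
        if ".git" in rel_parts or not path.is_file():
            continue
        rel = "/".join(rel_parts)
        if path.name in always_direct or path.stat().st_size <= cutoff:
            direct.append(rel)
        else:
            annex.append(rel)
    return sorted(annex), sorted(direct)
```

Calling `partition_by_size("/path/to/dataset")` returns the two lists, so the number of links that would be built under a given cut-off can be compared against the current link-everything policy before committing to a threshold.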

Do we want to consider size-based or other criteria (such as storing all text files directly) for deciding which files get git-annex links?
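A text-file criterion needs some way to decide what counts as "text". The sketch below assumes a simple heuristic (no NUL bytes and valid UTF-8 in the first 8 KB) and combines it with the size cut-off: text files are always stored directly, and other files get a link only above the cut-off. The names `looks_like_text` and `should_annex` and the 8 KB sniff window are hypothetical. git-annex's own `annex.largefiles` setting can express comparable rules (for example `largerthan=200kb` or `mimeencoding=binary`), which may be worth comparing against whatever the ingestion scripts do.

```python
import codecs
from pathlib import Path

SIZE_CUTOFF_BYTES = 200 * 1024  # assumed reading of "200 KB"
SNIFF_BYTES = 8192  # hypothetical: how much of each file the heuristic reads


def looks_like_text(path, sniff=SNIFF_BYTES):
    """Heuristic: no NUL bytes and valid UTF-8 in the first `sniff` bytes."""
    with open(path, "rb") as fh:
        chunk = fh.read(sniff)
    if b"\x00" in chunk:
        return False
    # An incremental decoder buffers a multi-byte character cut off at the
    # end of the chunk instead of raising, so such files still count as text.
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        decoder.decode(chunk, final=False)
    except UnicodeDecodeError:
        return False
    return True


def should_annex(path, cutoff=SIZE_CUTOFF_BYTES):
    """Hypothetical policy: text files are stored directly; other files get
    a git-annex link only when they exceed `cutoff` bytes."""
    path = Path(path)
    if looks_like_text(path):
        return False
    return path.stat().st_size > cutoff
```

One trade-off to weigh: under this rule a very large text file (for example a multi-megabyte TSV) would be committed directly to GitHub, so a combined rule that also applies the size cut-off to text files may be safer.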
