Add instructions for using DVC for versioning data #13

@shntnu

@gwaygenomics said this in broadinstitute/lincs-cell-painting#60 (comment):

We might at some point also consider moving from Git LFS to DVC. It was super easy to get set up, and it plays very nicely with AWS. I did this in the grit-benchmark repo (in broadinstitute/grit-benchmark#28)

The file pointer is in a readable format (YAML file)

outs:
- md5: c53856c1596f00a67a636389716d8219
  size: 26948901
  path: cellhealth_single_cell_umap_embeddings_SQ00014610_chr2.tsv.gz
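
Because the pointer stores a plain `md5` and `size`, a downloaded file can be checked against it by hand. The following is a minimal sketch, not part of DVC itself: the helper names `file_md5` and `matches_pointer` are hypothetical, and it assumes the recorded hash is the MD5 of the raw file bytes, which should hold for binary files such as `.gz` archives (some older DVC versions normalized line endings in text files before hashing).

```python
import hashlib
import os


def file_md5(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.md5()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_pointer(path, expected_md5, expected_size):
    """Check a local file against the md5 and size copied from a .dvc pointer.

    Assumption: the pointer's md5 is the hash of the raw bytes on disk.
    """
    # Compare sizes first; it is cheap and avoids hashing an obviously wrong file.
    if os.path.getsize(path) != expected_size:
        return False
    return file_md5(path) == expected_md5
```

For the pointer above, the call would be `matches_pointer("cellhealth_single_cell_umap_embeddings_SQ00014610_chr2.tsv.gz", "c53856c1596f00a67a636389716d8219", 26948901)`, run from the directory that holds the file.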

Steps

  1. Read the docs: https://dvc.org/doc/start
  2. Create a destination prefix (a "folder") on S3, which will be the remote storage location for DVC.
  3. Add the dvc and dvc-s3 dependencies to your environment.
  4. Update your .gitignore to ignore the files you previously tracked with Git LFS.
  5. Follow the steps at https://dvc.org/doc/start and https://dvc.org/doc/start/data-versioning
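
The steps above can be sketched as a command sequence. This is a rough outline under assumptions, not the exact procedure used in grit-benchmark: the function name `dvc_migration_commands`, the remote name `storage`, and the bucket and file paths in the usage note are all hypothetical. It only builds the commands so they can be reviewed before anything is run, and it does not cover removing Git LFS patterns from `.gitattributes` or committing the result.

```python
from pathlib import PurePosixPath


def dvc_migration_commands(remote_url, data_files, remote_name="storage"):
    """Build the git/dvc commands for moving files to DVC with an S3 remote.

    remote_url: the S3 prefix created in step 2, e.g. "s3://<bucket>/<prefix>".
    data_files: repo-relative paths of the files to hand over to DVC.
    remote_name: arbitrary label for the remote (hypothetical default).
    """
    commands = [
        ["dvc", "init"],
        # -d makes this the default remote for `dvc push` and `dvc pull`.
        ["dvc", "remote", "add", "-d", remote_name, remote_url],
        ["git", "add", ".dvc/config"],
    ]
    for path in data_files:
        parent = PurePosixPath(path).parent
        commands += [
            # Stop tracking the file in git; the copy on disk is left alone.
            ["git", "rm", "--cached", path],
            # Writes <path>.dvc and adds the file to a .gitignore beside it.
            ["dvc", "add", path],
            ["git", "add", path + ".dvc", str(parent / ".gitignore")],
        ]
    commands.append(["dvc", "push"])
    return commands
```

Printing `" ".join(cmd)` for each entry of `dvc_migration_commands("s3://example-bucket/dvc-store", ["data/profiles.tsv.gz"])` gives a checklist to run by hand from the repository root.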
