pMD design considerations
Kaelyn Long edited this page Mar 18, 2025 · 5 revisions
The preliminary implementation of data-raw/sampleMetadata.R (https://github.com/ASAP-MAC/parkinsonsMetagenomicData/pull/7/) contains hard-coded file names, dataset names, and joining code that is repeated for each dataset. This will be cumbersome to maintain as data are added or updated. Some design considerations for pMD:
- we need to choose "reference" cross-platform data that will be updated routinely and automatically as new or revised data become available. The simplest choice would be the files output directly by the NF pipeline, but further derived data can be used if needed for computational efficiency for package users.
- break down big functions into small, single-purpose functions. Small functions intended for internal use only can be left unexported (the easiest way is to start these function names with a `.`).
- don't hard-code names of individual datasets or files; functions should work independently of what data are available. Use some kind of iteration to join the samples requested by the user.
- consider where `sampleMetadata` should come from. Having it as a `.rda` file in `data/` has always been a maintenance hassle in cMD; it could be much more convenient if this were instead a function with a version argument that pulls metadata directly from the source.
- add roxygen2 markup for all exported functions.
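
As one possible illustration of the points above about small internal helpers, iteration, and roxygen2 markup, here is a minimal base R sketch. It is not the package's implementation: the function names `.readOneDataset` and `joinDatasets`, the `reader` argument, and the `uuid` key column are all hypothetical, and joining with `merge()` is an assumption about how datasets would be combined.

```r
# Hypothetical internal helper (leading dot, not exported): load one dataset.
# `reader` is any function that maps a dataset name to a data.frame.
.readOneDataset <- function(name, reader) {
    dat <- reader(name)
    stopifnot(is.data.frame(dat))
    dat
}

#' Join an arbitrary set of datasets by a shared key (illustrative sketch)
#'
#' @param datasetNames Character vector of dataset names requested by the user.
#' @param reader Function taking a dataset name and returning a data.frame.
#' @param by Name of the key column shared by all datasets (assumed: "uuid").
#' @return A single data.frame with the rows found in any of the datasets.
#' @export
joinDatasets <- function(datasetNames, reader, by = "uuid") {
    tables <- lapply(datasetNames, .readOneDataset, reader = reader)
    # Reduce() folds merge() over the list, so no dataset name is hard-coded.
    Reduce(function(x, y) merge(x, y, by = by, all = TRUE), tables)
}

# Usage with toy in-memory data standing in for real files
toy <- list(
    a = data.frame(uuid = c("s1", "s2"), age = c(61, 70)),
    b = data.frame(uuid = c("s2", "s3"), reads = c(1e6, 2e6))
)
joined <- joinDatasets(names(toy), reader = function(n) toy[[n]])
```

Passing the reader in as an argument keeps the joining logic independent of which datasets exist and makes it testable without network access.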
Use this wiki space to propose design/implementation in more detail.
Data Processing Results
Raw pipeline output stored at gs://metagenomics-mac
- The vignette provides an example of downloading a single file based on sample UUID and output file type. Downloading multiple files at once should also be supported, but this is not yet validated.
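
Multi-file download could be approached by building one target per combination of UUID and file type and then iterating over the targets. The sketch below is only an illustration under stated assumptions: the function names, the `downloader` argument, and especially the `<bucket>/<uuid>/<fileType>` path layout are hypothetical placeholders, not the actual bucket structure or package API.

```r
# Hypothetical internal helper: one row per (UUID, file type) combination.
# ASSUMPTION: the "<bucket>/<uuid>/<fileType>" layout is a placeholder only.
.buildOutputUris <- function(uuids, fileTypes,
                             bucket = "gs://metagenomics-mac") {
    combos <- expand.grid(uuid = uuids, fileType = fileTypes,
                          stringsAsFactors = FALSE)
    combos$uri <- paste(bucket, combos$uuid, combos$fileType, sep = "/")
    combos
}

# Hypothetical wrapper: `downloader` is any function(uri, dest) that fetches
# one file, so the iteration logic does not depend on a specific client.
downloadOutputs <- function(uuids, fileTypes, destDir, downloader) {
    targets <- .buildOutputUris(uuids, fileTypes)
    targets$dest <- file.path(destDir,
                              paste(targets$uuid, targets$fileType, sep = "_"))
    # Attempt each download independently so one failure does not stop the rest
    targets$ok <- vapply(seq_len(nrow(targets)), function(i) {
        tryCatch({
            downloader(targets$uri[i], targets$dest[i])
            TRUE
        }, error = function(e) FALSE)
    }, logical(1))
    targets
}

# Dry run with a stand-in downloader that only records what was requested
requested <- character(0)
fakeDownloader <- function(uri, dest) {
    requested <<- c(requested, uri)
    invisible(dest)
}
result <- downloadOutputs(c("uuid-1", "uuid-2"), c("typeA.tsv", "typeB.tsv"),
                          tempdir(), fakeDownloader)
```

Returning the table of targets with an `ok` column lets the caller see which files failed and retry only those.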
Curated Metadata
Currently maintained manually at parkinsonsManualCuration; transitioning to being sourced from ODM and the curation team.
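
The earlier suggestion that `sampleMetadata` be a function with a version argument, rather than a bundled `.rda` file, could look roughly like the sketch below. Everything here is hypothetical: the names `getSampleMetadata`, `.metadataCache`, and `fetcher` are invented for illustration, and the real source of the metadata is deliberately left as a pluggable function because it has not been decided.

```r
# Hypothetical per-session cache, so each version is fetched at most once
.metadataCache <- new.env(parent = emptyenv())

#' Retrieve sample metadata for a given version (illustrative sketch)
#'
#' @param version Single string identifying the metadata version.
#' @param fetcher Function taking a version string and returning a data.frame;
#'   stands in for whatever source the metadata is eventually pulled from.
#' @param cache Environment used to store versions already fetched.
#' @return A data.frame of sample metadata for the requested version.
#' @export
getSampleMetadata <- function(version, fetcher, cache = .metadataCache) {
    stopifnot(is.character(version), length(version) == 1L)
    if (!exists(version, envir = cache, inherits = FALSE)) {
        assign(version, fetcher(version), envir = cache)
    }
    get(version, envir = cache, inherits = FALSE)
}

# Usage with a toy fetcher that counts how often the source is contacted
calls <- 0L
toyFetcher <- function(version) {
    calls <<- calls + 1L
    data.frame(uuid = c("s1", "s2"), version = version)
}
meta1 <- getSampleMetadata("v1", fetcher = toyFetcher)
meta2 <- getSampleMetadata("v1", fetcher = toyFetcher)  # served from the cache
```

With this shape, updating the metadata does not require rebuilding the package, and users can pin a version for reproducibility.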