- Lab_Root/
  - Projects/ 📊
    - Project_Name_1/
      - data/
        - raw/
        - processed/
        - metadata/
      - code/
        - scripts/
        - notebooks/
        - pipelines/
      - results/
        - figures/
        - tables/
        - models/
      - manuscript/
      - README.md
    - Project_Name_2/
    - ...
  - People/ 👥
    - LastName_FirstName/
    - Smith_Jane/
    - Smith_John/
    - ...
  - Resources/
    - references/
    - databases/
    - standard_datasets/
  - Admin/ 📋
    - grants/
    - protocols/
    - meetings/
  - Archive/ 🗃️
    - Archived_Projects/
      - Project_Name_1_YYYY/
      - Project_Name_2_YYYY/
    - Alumni/
      - Wang_Dave_YYYY-YYYY/
      - Virgin_Skip_YYYY-YYYY/
      - ...

- Projects/ 📊: Contains all active research projects, each with standardized subdirectories for data, code, results, and documentation.
- People/ 👥: Individual directories for all current lab members, named LastName_FirstName.
- Resources/: Shared resources used across multiple projects, including references, databases, and standard datasets.
- Admin/ 📋: Administrative materials, including grants, protocols, and meeting notes.
- Archive/ 🗃️:
  - Archived_Projects: Completed projects, with the year of completion appended to the name
  - Alumni: Past lab members, with their tenure dates appended to the name
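
Creating the standard project layout by hand invites drift between projects. The following is a minimal sketch, assuming Python 3 and that projects live under `Lab_Root/Projects/`; the function name `scaffold_project` and the placeholder README text are hypothetical, not an existing lab tool.

```python
from pathlib import Path

# Hypothetical template mirroring the Projects/Project_Name_X/ layout above.
PROJECT_SUBDIRS = [
    "data/raw",
    "data/processed",
    "data/metadata",
    "code/scripts",
    "code/notebooks",
    "code/pipelines",
    "results/figures",
    "results/tables",
    "results/models",
    "manuscript",
]


def scaffold_project(lab_root: Path, project_name: str) -> Path:
    """Create Projects/<project_name>/ with the standard subdirectories.

    Existing directories and an existing README.md are left untouched,
    so rerunning the function on a live project is safe.
    """
    project_dir = lab_root / "Projects" / project_name
    for subdir in PROJECT_SUBDIRS:
        (project_dir / subdir).mkdir(parents=True, exist_ok=True)
    readme = project_dir / "README.md"
    if not readme.exists():
        readme.write_text(
            f"# {project_name}\n\nDescribe the project, data sources, and setup here.\n"
        )
    return project_dir
```

For example, `scaffold_project(Path("Lab_Root"), "Project_Name_2")` would create the full tree for a new project. Using `exist_ok=True` keeps the function idempotent instead of failing on folders that already exist.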
- Be consistent: Use either snake_case or kebab-case throughout, and avoid CamelCase
- Avoid spaces: Replace spaces with underscores or hyphens
- Include dates: Use ISO format (YYYY-MM-DD) when including dates
- Version numbers: Include version numbers (v1, v2.3) when applicable
- Descriptive names: Files should be self-descriptive without being excessively long
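
The naming rules above can be applied mechanically. This is a minimal sketch, assuming the snake_case variant and a date-first layout (which makes names sort chronologically in a file browser); the helpers `to_snake_case` and `make_filename` are hypothetical.

```python
import re
from datetime import date
from typing import Optional


def to_snake_case(text: str) -> str:
    """Lowercase text and turn spaces, hyphens, and punctuation into single underscores."""
    # Insert an underscore at CamelCase boundaries first ("CellCounts" -> "Cell_Counts").
    text = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", text)
    # Collapse every run of non-alphanumeric characters into one underscore.
    text = re.sub(r"[^A-Za-z0-9]+", "_", text)
    return text.strip("_").lower()


def make_filename(description: str, ext: str, when: Optional[date] = None,
                  version: Optional[str] = None) -> str:
    """Build a name such as '2024-03-15_cell_counts_v2.csv'."""
    parts = []
    if when is not None:
        parts.append(when.isoformat())  # ISO 8601 date: YYYY-MM-DD
    parts.append(to_snake_case(description))
    if version is not None:
        parts.append(version)  # e.g. "v1" or "v2.3"
    return "_".join(parts) + "." + ext.lstrip(".")
```

For example, `make_filename("Cell Counts", "csv", date(2024, 3, 15), "v2")` returns `"2024-03-15_cell_counts_v2.csv"`. A kebab-case variant would only need `"-"` in place of `"_"` when joining the description.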
- Raw data is sacred: Never modify raw data; always create processed copies
- Data provenance: Document the source of all datasets and preprocessing steps, preferably using an RMarkdown or Quarto document
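- Large files: Use Git LFS (Large File Storage) for files larger than 100 MB, since GitHub rejects regular files above that size. Use HTCF LFS for large local data and Harvard Dataverse for external hosting.
- Data validation: Record checksums (e.g., SHA-256) for data files and re-verify them to detect corruption or accidental changes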
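
Checksums make "never modify raw data" verifiable. The following is a minimal sketch using only the Python standard library, assuming a JSON manifest kept outside `raw/` (for example in `data/metadata/`); the function names and the manifest format are hypothetical choices, not a lab standard.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large raw files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(data_dir: Path, manifest: Path) -> dict:
    """Record a checksum for every file under data_dir, keyed by relative path."""
    checksums = {
        str(p.relative_to(data_dir)): sha256_of(p)
        for p in sorted(data_dir.rglob("*"))
        # Skip directories, and skip the manifest itself if it sits inside data_dir.
        if p.is_file() and p.resolve() != manifest.resolve()
    }
    manifest.write_text(json.dumps(checksums, indent=2))
    return checksums


def verify_manifest(data_dir: Path, manifest: Path) -> list:
    """Return the relative paths that are missing or whose checksum has changed."""
    expected = json.loads(manifest.read_text())
    problems = []
    for rel_path, checksum in expected.items():
        path = data_dir / rel_path
        if not path.is_file() or sha256_of(path) != checksum:
            problems.append(rel_path)
    return problems
```

A typical pattern would be to call `write_manifest` once when raw data arrives and `verify_manifest` at the start of each analysis; an empty list means every recorded file is unchanged.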
- Repository per project: Create a separate repository for each major project
- Branching strategy:
  - main: stable, working code
  - develop: integration branch
  - Feature branches: for new analyses
- README files: Include setup instructions, dependencies, and basic usage
- GitHub Actions: Set up automated testing and verification workflows
- Releases: Tag significant versions with semantic versioning (v1.0.0)
- Issues: Use for tracking tasks, bugs, and future work
- GitHub Pages: For project documentation and results sharing
- Project boards: For task management and project coordination
- GitHub Packages: For storing and sharing lab-developed packages
- Continuous Integration: Automatically test code on push
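
Choosing the next release tag is easy to get wrong by hand, for instance when `v1.10.0` sorts before `v1.9.0` as text. This is a minimal sketch, assuming tags of the form `vMAJOR.MINOR.PATCH` without pre-release or build suffixes; the helpers `parse_tag`, `bump`, and `latest` are hypothetical.

```python
import re
from typing import Iterable, Optional, Tuple

# MAJOR.MINOR.PATCH with an optional leading "v"; no leading zeros allowed.
# Pre-release (-rc.1) and build (+abc123) suffixes are deliberately not handled.
SEMVER_TAG = re.compile(r"v?(0|[1-9]\d*)\.(0|[1-9]\d*)\.(0|[1-9]\d*)")


def parse_tag(tag: str) -> Tuple[int, int, int]:
    """Turn 'v1.2.3' into (1, 2, 3) so tags compare numerically, not as text."""
    match = SEMVER_TAG.fullmatch(tag)
    if match is None:
        raise ValueError(f"not a MAJOR.MINOR.PATCH tag: {tag!r}")
    major, minor, patch = (int(g) for g in match.groups())
    return major, minor, patch


def bump(tag: str, part: str) -> str:
    """Next tag: 'major' for breaking changes, 'minor' for backward-compatible
    features, 'patch' for backward-compatible bug fixes."""
    major, minor, patch = parse_tag(tag)
    if part == "major":
        return f"v{major + 1}.0.0"
    if part == "minor":
        return f"v{major}.{minor + 1}.0"
    if part == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"part must be 'major', 'minor', or 'patch', not {part!r}")


def latest(tags: Iterable[str]) -> Optional[str]:
    """Highest valid tag, ignoring tags that are not MAJOR.MINOR.PATCH."""
    valid = [t for t in tags if SEMVER_TAG.fullmatch(t)]
    return max(valid, key=parse_tag) if valid else None
```

The tag list could come from `git tag --list`; the chosen tag would then be created with `git tag -a v1.3.0 -m "..."` and published with `git push origin v1.3.0`.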
- Modular code: Break code into logical, reusable components
- Script headers: Include purpose, author, date, and usage examples in comments
- Requirements: Include environment specifications (requirements.txt, environment.yml)
- Documentation: Document functions, parameters, and expected outputs
- Environment management: Use conda or Docker to ensure reproducibility when applicable
- Seed values: Set and document random seeds
- Dependency management: Specify exact versions in requirements
- Notebooks: Use RMarkdown/Quarto or Jupyter
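
A script header and a documented seed might look like the following. This is a minimal sketch, assuming Python's standard `random` module; the module purpose, author line, `SEED` value, and the functions `split_ids` and `run_metadata` are all hypothetical placeholders.

```python
"""
Purpose: Split sample IDs into reproducible training and test sets.
Author:  Jane Smith (hypothetical)
Date:    2024-03-15
Usage:   from split_samples import split_ids
         train, test = split_ids(sample_ids, test_fraction=0.2)
"""

import platform
import random

SEED = 20240315  # documented seed; record any change in the project README


def split_ids(ids, test_fraction=0.2, seed=SEED):
    """Shuffle a copy of ids with a dedicated RNG and split off a test set."""
    # A local Random instance gives the same split for the same seed and is not
    # affected by other code that uses the global random module.
    rng = random.Random(seed)
    shuffled = sorted(ids)  # sort first so the input order cannot change the result
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def run_metadata(seed=SEED):
    """Values worth saving next to each output so the run can be repeated."""
    return {"seed": seed, "python_version": platform.python_version()}
```

Saving `run_metadata()` alongside results (e.g., as JSON in `results/`) documents both the seed and the interpreter version; exact package versions still belong in `requirements.txt` or `environment.yml`.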
- Onboarding documents: Create guides for new lab members
- Compute resources: Document procedures for utilizing cluster/cloud resources
- Collaboration guidelines: Establish protocols for code sharing and authorship