Building a local database using OBITools v.1.2.12 on Brown's high-performance cluster, OSCAR.
Note: This is an optional step and only needs to be run if a) a local reference library is available for your study system and b) an updated version of the local reference library is needed.
The schematic below shows the entire bioinformatic pipeline for DNA metabarcoding data, but the step included in this repository is shown in the white box labeled "Step 2b".
- Log in to BOLD, ensuring you have access to the local library
- Click “select all”, then deselect any records that should not be included in the library
  - In particular, exclude records tagged as Mis-identified, Lab-Mixup, or Contaminated. To check, navigate to Record List, scroll to the Tags column on the far right, and click the arrow to sort by tags
- Then navigate to “Downloads” in the left menu bar and download two files:
  - “Data Spreadsheets”: select all specimen data options available at the time of analysis and select Multi-page format
  - “Sequences”:
    - Select the target marker (most frequently trnL) with alignment type “none”
    - Under Labels, for Voucher ensure the default Sequence/Process ID is selected. For Taxonomy, ensure the default Taxon is selected. Do not include any other parameters in the FASTA download
- Store the metadata and FASTA files in a common, dated directory
- If not on campus, make sure you are connected to the Brown VPN
- Navigate to the RStudio Server hosted on Open OnDemand and choose R version 4.3.1.
- Choose 48 hours, 4 cores, 24 GB memory (adjust up or down if needed)
- Under Modules put `git miniconda3`.
- Launch the session once it has been allocated.
- Go to the terminal pane in RStudio and `cd /oscar/data/tkartzin/<your folder>` (replace `<your folder>` with your user folder).
- In that terminal: `git clone https://github.com/trklab-metabarcoding/obitools2-localdb-build.git`
- Also in the terminal: `cd obitools2-localdb-build`
- In the Files pane of RStudio, use the menu at the top right to make sure you are also at the same path.
- Double-click the `.obitools2-localdb-build.Rproj` file to set the project working directory. All of the notebooks are built from this working directory.
Once created, local reference databases will be stored in the shared lab directory `/oscar/data/tkartzin/local_ref_lib`. The local reference library should be placed in a dated folder under the correct project code (e.g. YNP), locus (e.g. trnL), and region of interest (e.g. P6).
The input file for this step is a `.fas` file that has been downloaded from BOLD. This file should be saved in the parent directory of this repository. This file, along with the rest of the output, will be moved to the shared lab directory at the end of the step.
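Before opening the notebook, it can be useful to confirm that the BOLD download is in place and to note how many records it contains. The following is a minimal shell sketch and not part of this repository; the function name `count_fasta_records` and the example filename are hypothetical.

```shell
# Hypothetical helper (not part of this repository): count the records in a
# FASTA file by counting header lines, i.e. lines that begin with ">".
count_fasta_records() {
  fasta="$1"
  if [ ! -f "$fasta" ]; then
    echo "File not found: $fasta" >&2
    return 1
  fi
  # grep -c prints the number of matching lines. It exits non-zero when the
  # count is 0, so "|| true" keeps an empty file from looking like a failure.
  grep -c '^>' "$fasta" || true
}

# Example usage (the filename is a placeholder for your own BOLD download,
# saved in the parent directory of the repository):
# count_fasta_records ../my_bold_download.fas
```

Recording this number first gives a baseline to compare against the counts produced by later code chunks.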
The first code chunk updates all of the `params` in the YAML header of the notebook. This includes specifying the project code, locus (e.g. trnL), region of interest (e.g. P6), and ecoPCR parameters.
Step through each code chunk to investigate duplicate sequence identifiers, format BOLD headers, add taxonomy, and run an in silico PCR to build an ecoPCR database.
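As an illustration of what investigating duplicate sequence identifiers involves, the sketch below lists every identifier that occurs more than once in a FASTA file. It is not the notebook's own code. It assumes that the identifier is the first `|`-delimited field of each header (for example `>PROCESSID|Taxon name`), and the function name `list_duplicate_ids` is hypothetical.

```shell
# Hypothetical helper (not the notebook's code): print each sequence
# identifier that appears more than once in a FASTA file.
# Assumption: the identifier is the first "|"-delimited field of the header.
list_duplicate_ids() {
  awk -F'|' '
    /^>/ {
      id = substr($1, 2)          # drop the leading ">"
      if (++seen[id] == 2) {      # report an ID once, on its second sighting
        print id
      }
    }
  ' "$1" | sort
}

# Example usage (placeholder filename):
# list_duplicate_ids ../my_bold_download.fas
```

An empty result means that no identifier is repeated under the stated header assumption.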
As you move through the code, you will build a table where you can track the number of sequences that move through each code chunk to help identify problems with the database build and to use in publications.
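The notebook builds this tracking table itself. Purely to illustrate the idea, the shell sketch below appends one row per step to a tab-separated file. The function name `log_sequence_count`, the column names, and the file names in the usage example are all hypothetical.

```shell
# Hypothetical helper: append one row (step label, number of sequences) to a
# tab-separated tracking file, writing a header row the first time it is used.
log_sequence_count() {
  step="$1"
  fasta="$2"
  table="$3"
  if [ ! -f "$fasta" ]; then
    echo "File not found: $fasta" >&2
    return 1
  fi
  # Create the header only if the table is missing or empty.
  if [ ! -s "$table" ]; then
    printf 'step\tn_sequences\n' > "$table"
  fi
  # Count FASTA header lines; "|| true" tolerates a count of zero.
  n=$(grep -c '^>' "$fasta" || true)
  printf '%s\t%s\n' "$step" "$n" >> "$table"
}

# Example usage (placeholder file names):
# log_sequence_count "downloaded from BOLD" raw.fas sequence_counts.tsv
# log_sequence_count "after removing duplicates" dedup.fas sequence_counts.tsv
```

A sharp drop between two consecutive rows points to the step worth inspecting.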
At the end of this step, the output will be moved to a dated folder in the appropriate region-of-interest folder under each taxonomic division at /oscar/data/tkartzin/local_ref_lib/<project code>/<locus>/<region-of-interest>/.
If you need permissions, check with Tim Divoll or Tyler Kartzinel.
# make a destination folder (-p will check if it exists yet)
mkdir -p $reflib_path
chmod -R g+w $reflib_path
# make a destination project folder
mkdir -p $reflib_path/$project_code
chmod -R g+w $reflib_path/$project_code
# make a destination locus folder
mkdir -p $reflib_path/$project_code/$locus
chmod -R g+w $reflib_path/$project_code/$locus
# make a destination region of interest folder
mkdir -p $reflib_path/$project_code/$locus/$roi
chmod -R g+w $reflib_path/$project_code/$locus/$roi
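The output ultimately lands in a dated folder beneath the region-of-interest directory. A minimal sketch of creating that folder follows. It reuses the variable names from the commands above, while the function name `make_dated_dir` and the `YYYY-MM-DD` date format are assumptions, since the lab's naming convention is not stated here.

```shell
# Hypothetical helper: create a dated, group-writable folder inside the given
# region-of-interest directory and print its path.
# Assumption: folders are named with an ISO-style date (YYYY-MM-DD).
make_dated_dir() {
  roi_dir="$1"
  dated_dir="$roi_dir/$(date +%Y-%m-%d)"
  mkdir -p "$dated_dir"      # -p also creates any missing parent folders
  chmod g+w "$dated_dir"     # let other lab members write to the new folder
  printf '%s\n' "$dated_dir"
}

# Example usage with the variables defined in the notebook:
# dest=$(make_dated_dir "$reflib_path/$project_code/$locus/$roi")
```

Quoting the path variables, as done here, guards against folder names that contain spaces.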
Useful GitHub commands:
- `git add <file>` - add a file to the staging area
- `git commit -m "<descriptive message>"` - commit the staged changes with a message (required)
- `git switch <branch>` - change to a different branch
- `git checkout -b <branch>` - make a new branch; just be aware of which branch you are currently on
- `git pull` - pull the latest changes from the remote repo; a good habit every time you switch to main
- `git stash` - stash the changes so your branch is clean before you switch to another branch
- `git stash pop` - pop the changes back out after you have switched to the desired branch
- If it appears that your OSCAR session will time out before Step 2b has finished running, email CCV as soon as possible to request that more time be added to your session.
- Re-read the README and other GitHub resources.
- Contact a fellow lab member with your question.
- Email Tim!