BARTI (Barcode Tracking In vivo)

Pipeline for barcode sequence data analysis

Version 1.0 (stable, updated 16 May 2017) on master branch. (v1 and v2 branches are in development.)

Martin CJ, Cadena AM, Leung VW, Lin PL, Maiello P, Hicks N, Chase MR, Flynn JL, Fortune SM. 2017. Digitally barcoding Mycobacterium tuberculosis reveals in vivo infection dynamics in the macaque model of tuberculosis. mBio 8:e00312-17. https://doi.org/10.1128/mBio.00312-17.

Overview

Dependencies

I. Basic Needs (Command-Line Interface (CLI) and Python)

(For Windows users only.) Download a command-line interface (CLI) program. Recommended: Cygwin (cygwin.com).
Install Python v2 (version 2.7.11 or later; not v3). Recommended (if you have OS X): Anaconda (MiniConda version; anaconda.org).
Notes on pip

II. Packages

The following packages are required (versions in parentheses).

numpy (1.10)
pandas (0.18)
sqlalchemy
regex [see Note 1]
jupyter (1.0.0) [see Note 2]

Note 1 (regex): this is the third-party regex package, NOT Python's native re package. Install regex via binstar. It is available as a conda package for OS X and Linux; for Windows, it is only available through pip.

Note 2 (jupyter): Currently, Jupyter is required to read and execute the scripts (.ipynb type) while we test out our command-line .py version.
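To confirm that the packages above are present before opening the notebooks, a short check along the following lines may help. This is only a sketch: the names REQUIRED, version_tuple, and check_package are hypothetical and not part of the pipeline, the minimum versions are copied from the list above, and the check assumes each package can be imported under its listed name and exposes a __version__ attribute (where it does not, the package is simply reported as found).

```python
from __future__ import print_function

import importlib

# Minimum versions copied from the package list above.
# None means the list gives no version for that package.
REQUIRED = [
    ("numpy", "1.10"),
    ("pandas", "0.18"),
    ("sqlalchemy", None),
    ("regex", None),
    ("jupyter", "1.0.0"),
]


def version_tuple(text):
    """Turn a string such as '1.10.4' into (1, 10, 4).

    Parsing stops at the first piece that does not start with a digit,
    and any non-numeric suffix of a piece (e.g. 'rc1') is ignored.
    """
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def check_package(name, minimum=None):
    """Return (status, installed_version) for one package name."""
    try:
        module = importlib.import_module(name)
    except ImportError:
        return "missing", None
    installed = getattr(module, "__version__", None)
    if minimum is None or installed is None:
        # Nothing to compare against, so only report that it imports.
        return "found", installed
    if version_tuple(str(installed)) >= version_tuple(minimum):
        return "ok", installed
    return "too old", installed


if __name__ == "__main__":
    for name, minimum in REQUIRED:
        status, installed = check_package(name, minimum)
        print("{:<12}{:<10}installed: {}  required: {}".format(
            name, status, installed, minimum))
```

Any package reported as missing or too old can then be installed or upgraded with conda or pip as described in the notes that follow.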

(Quick Notes)

How to install packages

(If you're new to using command-lines and want more detail, read the expanded instructions in "How to install packages".)

In your CLI (e.g. Terminal or Cygwin)

Option A (preferred): to install with conda, type:

$ conda install [your_package_name]
Example: conda install my_new_package
...
Proceed ([y]/n)? (blinking cursor here) # type y to continue, n to cancel, and Enter to submit.

Option B: to install with pip (if conda doesn't work or package is unavailable on conda).

pip install [your_package_name]
Example: pip install conda_cant_find_me

If already installed,
pip install [your_package_name] --upgrade
Example: pip install im_already_here --upgrade

Downloading the pipeline.

Option 1: Download the scripts here.

Or, on the main repository page, select the green "Clone or download" button. In the dropdown menu that appears, select the "Download ZIP" option.

Extract the files from the downloaded zip into your desired folder, so that the file paths to your data are easy to reach from the scripts.

Option 2: Clone the repository to your computer.

If you would like to get the project in an existing folder, initialize the folder first with $ git init. (If not, skip this step.)

Then, clone and add remote for your repository.

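$ git clone https://github.com/sarahfortunelab/barcodetracking.git
$ git remote add [your_remote_name] [repository_address]

Here, [your_remote_name] is conventionally origin for the first remote, and [repository_address] is the Git address of the repository.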

Option 3: Fork repository to also stay up-to-date with our developments.

On the Github repository page, click the button Fork in the top-right corner to create a fork of the original project.

Then, navigate to your forked repository, click Clone or download, and copy the Git address.

Clone your repository to your local folder with the following, replacing YOUR_USER_NAME with your own user name:

$ git clone https://github.com/YOUR_USER_NAME/barcodetracking.git

To sync your repository with our repository to get updates, add an upstream remote to our repository:

$ git remote add upstream https://github.com/sarahfortunelab/barcodetracking.git

To receive updates to your local branch (e.g. master) through your upstream remote:

$ git fetch upstream
$ git checkout [local branch to sync]
$ git merge upstream/master

The local branch to sync is typically named the same as that of the main repository.

Comments on Installation

Dependencies

I. Basic Needs

Command-line interface.
- OS X. The application "Terminal" comes pre-installed on your computer.
- Windows. A Linux-like terminal is strongly preferred and is unfortunately not included in the Windows system. A good application is Cygwin (https://cygwin.com).
  (The native "Command Prompt" interface could be used, but is not supported, i.e. use at your own risk)
Python.
- This pipeline is written in and supports Python v2 versions 2.7.11 and later. (Python v3 can be used, though the pipeline has not been tested for changes between v2 and v3.)

MiniConda (a lighter version offered by Anaconda) includes fewer packages but takes up less storage space. (Anaconda also includes many standard packages used in the pipeline.)

II. Packages

Note on jupyter: there are other jupyter components that are suffixed. These are installed and updated alongside jupyter as dependencies, so you don't have to worry about those.

How to install packages

Using conda

Using your CLI (Terminal or Cygwin), type:

conda install [your_package_name]
Example: conda install my_new_package

The terminal will then print some information on what other dependencies are needed, and then ask:

Proceed ([y]/n)? (blinking cursor here)

(To be explicit: type n (to cancel) or y (to continue) and hit Enter.)

Using pip

If you don't have Anaconda, or in case conda can't find a package, it's possible to find and install it through pip. The syntax is similar to that for conda.

pip install [your_package_name]
Example: pip install conda_cant_find_me

If it's already installed, the terminal will print a message saying so and showing which version is installed. In that case, you can choose to update it by typing:

pip install [your_package_name] --upgrade
Example: pip install im_already_here --upgrade

(To clarify, Anaconda usually installs most of these packages while installing Python, but they are NOT a part of (i.e. native to) Python. If a package is already installed, check its version and update as necessary.)

Starting up the script(s)

The current version of this package executes through the Jupyter Notebook interface. To open the scripts, change to the root directory of the project and start up Jupyter:

cd DIRECTORY_PATH

where DIRECTORY_PATH is the root directory.

Example:

cd "My Documents/barcodetracking"

jupyter notebook

The interface will load in your default web browser as a local server (e.g. http://localhost:8888). In this window, you will see your directory and files. Navigate in the window to open the script you wish to run.
At the start of the script, there will be a list of user inputs. Fill these in according to the annotated instructions. Then, in the script's top menu bar, select

Cell > Run All

Each cell that is queued or currently running will be marked with an asterisk * on the left of the cell. When it is finished, a number will appear that corresponds to the order in which it was executed.
At the bottom of the script, a log will be written to show the script's progress and when it is finished.

Shutting down the script(s)

After using the script, close the script's window and shut down the script document as follows:

In the Jupyter file listing, tick the checkbox to the left of the script file.

A menu bar will appear under the regular top menu, with a few buttons. Select the orange "Shutdown" button. When the script has been shut down successfully, the notebook icon for that file will turn from green to grey.

Sample Data

Sample data have been provided in the data/sample_data folder. These data were generated from samples with known numbers (1, 5, 25, and 120) of barcoded plasmids.
The data presented are the raw, compressed FASTQ (.fastq.gz) files generated by Illumina sequencing.
Each indexed sample yields two files: one for forward read ("R1") and one for reverse read ("R2") data. (Read more about Illumina's file nomenclature in Illumina's CASAVA User Guide.)
Below are the samples included and the corresponding file pairs:

1 barcode:

NH001_S1_L001_R1_001.fastq.gz
NH001_S1_L001_R2_001.fastq.gz

5 barcodes:

NH005_S5_L001_R1_001.fastq.gz
NH005_S5_L001_R2_001.fastq.gz

25 barcodes:

NH025_S11_L001_R1_001.fastq.gz
NH025_S11_L001_R2_001.fastq.gz

120 barcodes:

NH120_S14_L001_R1_001.fastq
NH120_S14_L001_R2_001.fastq
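
The file pairs listed above follow Illumina's naming pattern (sample name, sample number, lane, read, and set number), so forward and reverse files can be matched by name. The sketch below shows one way to do that; it is not taken from the pipeline's scripts, and the names FASTQ_NAME and pair_fastq_files are hypothetical. It uses Python's native re package for simplicity and accepts both .fastq and .fastq.gz endings.

```python
from __future__ import print_function

import re
from collections import defaultdict

# SampleName_S<sample number>_L<lane>_<R1|R2>_<set number>.fastq[.gz]
FASTQ_NAME = re.compile(
    r"^(?P<sample>.+)_S(?P<number>\d+)_L(?P<lane>\d{3})"
    r"_(?P<read>R[12])_(?P<set>\d{3})\.fastq(?:\.gz)?$"
)


def pair_fastq_files(filenames):
    """Group file names by sample.

    Returns {(sample, lane, set): {"R1": name, "R2": name}}.
    Names that do not match the Illumina pattern are skipped.
    """
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match is None:
            continue  # not an Illumina-style FASTQ file name
        key = (match.group("sample"), match.group("lane"), match.group("set"))
        pairs[key][match.group("read")] = name
    return dict(pairs)


if __name__ == "__main__":
    sample_files = [
        "NH001_S1_L001_R1_001.fastq.gz",
        "NH001_S1_L001_R2_001.fastq.gz",
        "NH120_S14_L001_R1_001.fastq",
        "NH120_S14_L001_R2_001.fastq",
    ]
    for key, reads in sorted(pair_fastq_files(sample_files).items()):
        # A missing mate shows up as None, which flags an incomplete pair.
        print(key[0], reads.get("R1"), reads.get("R2"))
```

In practice the list of names could come from os.listdir on the data/sample_data folder; any sample with only one of R1 or R2 present is worth checking before running the notebooks.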
