Version 1.0 (stable, updated 16 May 2017) on master branch. (v1 and v2 branches are in development.)
Martin CJ, Cadena AM, Leung VW, Lin PL, Maiello P, Hicks N, Chase MR, Flynn JL, Fortune SM. 2017. Digitally barcoding Mycobacterium tuberculosis reveals in vivo infection dynamics in the macaque model of tuberculosis. mBio 8:e00312-17. https://doi.org/10.1128/mBio.00312-17.
- (For Windows users only.) Download a command-line interface (CLI) program. Recommended: Cygwin (cygwin.com).
- Install Python v2 (version 2.7.11 or later; not v3). Recommended (if you have OS X): Anaconda (MiniConda version; anaconda.org).
- Notes on pip
The following packages are required (versions in parentheses).
- numpy (1.10)
- pandas (0.18)
- sqlalchemy
- regex [see Note 1]
- jupyter (1.0.0) [see Note 2]
Note 1 (regex): this regex is NOT the native re package. Install regex via binstar. The package is available as a conda package for OS X and Linux; for Windows, only the pip package is available.
Note 2 (jupyter): Currently, Jupyter is required to read and execute the scripts (.ipynb type) while we test out our command-line .py version.
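To check which of the required packages are already installed, something like the following sketch may help. It is not part of the pipeline: the function names installed_version and report are hypothetical, and it assumes each package exposes a __version__ attribute (packages that do not are reported as "unknown"). Compare the printed versions against the list above by eye.

```python
import importlib

# Package names taken from the requirements list above.
REQUIRED = ["numpy", "pandas", "sqlalchemy", "regex", "jupyter"]


def installed_version(name):
    """Return the package's version string, "unknown", or None if it cannot be imported."""
    try:
        module = importlib.import_module(name)
    except Exception:
        # Any import failure is treated as "not usable" for this check.
        return None
    return getattr(module, "__version__", "unknown")


def report(names=REQUIRED):
    """Map each package name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in sorted(report().items()):
        print("%-12s %s" % (name, version or "NOT INSTALLED"))
```

Any package reported as NOT INSTALLED can then be installed with conda or pip as described in "How to install packages".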
How to install packages
(If you're new to using command-lines and want more detail, read the expanded instructions in "How to install packages".)
In your CLI (e.g. Terminal or Cygwin)
Option A (preferred): to install with conda, type:
$ conda install [your_package_name]
Example: conda install my_new_package
...
Proceed ([y]/n)?
(blinking cursor here) # type y to continue, n to cancel, and Enter to submit.
Option B: to install with pip (if conda doesn't work or the package is unavailable on conda), type:
$ pip install [your_package_name]
Example: pip install conda_cant_find_me
If the package is already installed, upgrade it with:
$ pip install [your_package_name] --upgrade
Example: pip install im_already_here --upgrade
Option 1: Download the scripts here.
Or, on the main repository page, select the green "Clone or download" button. A dropdown menu will appear; select the "Download ZIP" option.
Open and extract the files in the downloaded zip, and ensure they are in your desired folder (so that the file paths from the scripts to your data are easy to work with).
If you would like to get the project in an existing folder, initialize the folder first with $ git init. (If not, skip this step.)
Then, clone the repository and add a remote for it:
$ git clone https://github.com/sarahfortunelab/barcodetracking.git
$ git remote add [your_remote_name] [repository_URL]
The first remote is conventionally named origin.
On the GitHub repository page, click the Fork button in the top-right corner to create a fork of the original project.
Then, navigate to your forked repository, click Clone or download, and copy the Git address.
Clone your repository to your local folder with the following, replacing YOUR_USER_NAME with your own user name:
$ git clone https://github.com/YOUR_USER_NAME/barcodetracking.git
To sync your repository with our repository to get updates, add an upstream remote to our repository:
$ git remote add upstream https://github.com/sarahfortunelab/barcodetracking.git
To receive updates to your local branch (e.g. master) through your upstream remote:
$ git fetch upstream
$ git checkout [local branch to sync]
$ git merge upstream/master
The local branch to sync is typically named the same as that of the main repository.
- Command-line interface.
  - OS X. The application "Terminal" is factory-installed on your computer.
  - Windows. A Linux-like terminal is strongly preferred and is unfortunately not included in the Windows system. A good application is Cygwin (https://cygwin.com). (The native "Command Prompt" interface could be used, but is not supported, i.e. use at your own risk.)
- Python.
  - This pipeline is written in and supports Python v2, versions 2.7.11 and later. (Python v3 can be used, though the pipeline has not been tested for changes between v2 and v3.)
  - For a lighter version, MiniConda (offered by Anaconda) includes fewer packages but takes up less storage space. (Anaconda also includes many standard packages used in the pipeline.)
Note on jupyter: there are other jupyter components with suffixed names. These are installed and updated alongside jupyter as dependencies, so you don't have to worry about those.
How to install packages
Using conda
Using your CLI (Terminal or Cygwin), type:
conda install [your_package_name]
Example: conda install my_new_package
The terminal will then print some information on what other dependencies are needed, and then ask:
Proceed ([y]/n)?
(blinking cursor here)
(To make things explicit, type n (to cancel) or y (to continue) and hit Enter.)
Using pip
If you don't have Anaconda, or in case conda can't find a package, it's possible to find and install it through pip. The syntax is similar to that for conda.
pip install [your_package_name]
Example: pip install conda_cant_find_me
If it's already installed, the terminal will print a message saying so and showing which version it is. In that case, you can choose to update it by typing:
pip install [your_package_name] --upgrade
Example: pip install im_already_here --upgrade
(To clarify, Anaconda usually installs most of these packages while installing Python, but they are NOT a part of (i.e. native to) Python. If a package is already installed, check the version and update as necessary.)
The current version of this package executes through the Jupyter Notebook interface. To open the scripts, start up Jupyter as follows:
cd DIRECTORY_PATH
where DIRECTORY_PATH is the root directory
Example:
cd "My Documents/barcodetracking"
jupyter notebook
The interface will load in your default web browser as a local server (e.g. http://localhost:8888). In this window, you will see your directory and files. Navigate in the window to open the script you wish to run.
At the start of the script, there will be a list of user inputs. Fill these in according to the annotated instructions. Then, in the script's top menu bar, select Cell > Run All.
Each cell that is queued or currently running is marked with an asterisk * on the left of the cell. When a cell finishes, the asterisk is replaced by a number that corresponds to the order in which the cell was executed.
At the bottom of the script, a log will be written to show the script's progress and when it is finished.
After using the script, close the script's window and shut down the script document as follows:
- Tick the checkbox to the left of the script file.
- A menu bar with a few buttons will appear under the regular top menu. Select the orange "Shutdown" button. When the script has been successfully shut down, the notebook icon for that file will turn from green to grey.
Sample data have been provided in the data/sample_data folder. These data are generated from samples with known numbers (i.e. 1, 5, 25, and 120) of barcoded plasmids.
The data presented are the raw, compressed FASTQ (.fastq.gz) files generated by Illumina sequencing.
Each indexed sample yields two files: one for forward read ("R1") data and one for reverse read ("R2") data. (Read more about Illumina's file nomenclature in Illumina's CASAVA User Guide.)
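For a quick look inside one of these files, a minimal sketch such as the one below can be used. It is an illustration only, not the pipeline's own reader: the function name read_fastq is hypothetical, and it assumes the standard four-line FASTQ record layout (header, sequence, "+" separator, quality) with no wrapped lines.

```python
import gzip


def read_fastq(path):
    """Yield (header, sequence, quality) tuples from a FASTQ file.

    Files ending in .gz are decompressed on the fly; other files are
    read as plain FASTQ. Assumes four lines per record.
    """
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rb") as handle:
        while True:
            header = handle.readline()
            if not header.strip():
                break  # end of file (or trailing blank line)
            sequence = handle.readline()
            handle.readline()  # skip the '+' separator line
            quality = handle.readline()
            yield tuple(
                line.decode("ascii").strip()
                for line in (header, sequence, quality)
            )
```

For example, next(read_fastq(path)) returns the first record of the file at path without loading the rest of the file into memory.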
Below are samples included and the corresponding file pairs:
- 1 barcode:
  - NH001_S1_L001_R1_001.fastq.gz
  - NH001_S1_L001_R2_001.fastq.gz
- 5 barcodes:
  - NH005_S5_L001_R1_001.fastq.gz
  - NH005_S5_L001_R2_001.fastq.gz
- 25 barcodes:
  - NH025_S11_L001_R1_001.fastq.gz
  - NH025_S11_L001_R2_001.fastq.gz
- 120 barcodes:
  - NH120_S14_L001_R1_001.fastq
  - NH120_S14_L001_R2_001.fastq
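The R1/R2 pairing above can be recovered from the file names alone. The sketch below is one way to do it, assuming the names follow the pattern visible in the list (sample name, S index, lane, read number, then 001 and a .fastq or .fastq.gz extension). The names FILENAME_PATTERN and pair_reads are hypothetical and are not part of the pipeline.

```python
import re
from collections import defaultdict

# Assumed layout, inferred from the sample file names listed above:
# <sample>_S<index>_L<lane>_R<1|2>_001.fastq[.gz]
FILENAME_PATTERN = re.compile(
    r"^(?P<sample>.+)_S(?P<index>\d+)_L(?P<lane>\d+)_R(?P<read>[12])_001"
    r"\.fastq(?:\.gz)?$"
)


def pair_reads(filenames):
    """Group file names into {(sample, lane): (R1_file, R2_file)}.

    Names that do not match the assumed pattern, and samples missing
    either read file, are left out of the result.
    """
    grouped = defaultdict(dict)
    for name in filenames:
        match = FILENAME_PATTERN.match(name)
        if match is None:
            continue
        key = (match.group("sample"), match.group("lane"))
        grouped[key]["R" + match.group("read")] = name
    return {
        key: (reads["R1"], reads["R2"])
        for key, reads in grouped.items()
        if "R1" in reads and "R2" in reads
    }
```

Passing the contents of the sample data folder (e.g. from os.listdir) to pair_reads yields one entry per sample, keyed by sample name and lane.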