Skip to content

BiocomputeLab/segment

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

10 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

segment

A flexible analysis tool for classifying the order and orientation of sequence segments in recombinase-based genetic circuitry.

Installation

To use segment you will need to first compile the source code contained in this repository. This requires that the rustc compiler is available on your system. Further details about how to install rustc can be found at the following URL and we recommend the use of rustup to maintain your environment and ensure all other supporting tools are available.

https://www.rust-lang.org/tools/install

Once rustc is installed, compilation of segment is performed by running the following command from within the root of the code repository:

cargo build --release

This should take a couple minutes to complete and the generated segment executable can be found in the target/release directory. We recommend placing the segment executable in a location found in your PATH environment variable to making running the tool easier from the command line.

It should be possible to compile and run segment on any system for which rustc is available (e.g., Windows, MacOS and Linux) and the only dependancy is on the bio crate. Testing of the software has been performed exclusively on MacOS Sequoia (15.5).

Usage

To run segment it has the following usage:

segment REF_FASTA DATA_FASTQ MIN_READ_LEN MAX_READ_LEN OUTPUT_FILE

where REF_FASTA is the location of a file containing the reference segments in a FASTA format (this must include a start and end segment reference as a bare minimum), DATA_FASTQ is the sequencing data in a FASTQ format, MIN_READ_LEN and MAX_READ_LEN are the minimum and maximum read length, respectively, to carry out the classification for, and OUTPUT_FILE is the filename to output the results to.

Example

To run segment on the small example data set provided in the data directory, we would therefore use the command (running from the root of the code repository):

segment data/refs.fasta data/reads.fastq 500 2000 output.txt

This should generate a file classifying each read in the intput FASTQ data file that passes the read length filtering. The first few lines are:

ad6f445e-17e6-49ce-bf40-67da322006c8    start-attP_Bxb1*-end
124e2e64-ee6f-4349-972d-58e5a210c264    start-attP_Bxb1*-end
27bcf6c4-9226-4fa9-af0a-1ed4294d8768    start-end
715b0122-721e-4dc1-acff-ca723162392c    start-attP_Bxb1*-end
5f2bb513-496e-447b-a151-df14796f39bc    start-attP_Bxb1*-end
fe1b90af-4e41-4c33-ab87-eee17512462c    start-attP_Bxb1*-end
cb24e39a-8ee8-4988-a788-250ccd99349c    start-attB_Tp901-attB_Int5-attB_BxB1*-end
c689dd31-4e98-4154-ab05-a8b0abed9436    start-attP_Bxb1*-end
8c19bcd4-b228-4e0d-a257-e5116a0a4322    start-attP_Bxb1*-end
...

This should run very quickly and take less than a minute to run on a standard desktop machine.

About

Flexible analysis tool for recombinase-based circuits

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages